Distribution-Free, Risk-Controlling Prediction Sets

Thursday, September 23, 2021 - 4:15pm to 5:15pm

Event Calendar Category

ORC

Speaker Name

Stephen Bates

Affiliation

University of California, Berkeley

Building and Room Number

E51-149

Abstract

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying predictive models in consequential settings also requires analyzing and communicating their uncertainty. To give valid inference for prediction tasks, we show how to generate set-valued predictions from any black-box predictive model that control certain statistical error rates on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset and model by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in four large-scale prediction problems: (1) multi-label classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.
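To illustrate the calibration step described above, here is a minimal Python sketch of one way such a holdout-based threshold search can look for multi-label classification. The variable names, the grid of thresholds, and the use of a Hoeffding upper confidence bound are illustrative assumptions for this sketch, not the speaker's exact procedure, which relies on tighter finite-sample bounds.

```python
import numpy as np

def calibrate_lambda(cal_scores, cal_labels, alpha=0.1, delta=0.1):
    """Illustrative calibration on a holdout set: pick the largest threshold
    whose Hoeffding upper confidence bound on the false-negative rate stays
    below the target level alpha (with probability 1 - delta).

    cal_scores: list of per-label score vectors, one per holdout example.
    cal_labels: list of boolean masks marking the true labels of each example.
    """
    n = len(cal_scores)
    chosen = 0.0
    for lam in np.linspace(0.0, 1.0, 1001):
        # Loss per example: fraction of true labels missed by the set
        # {y : score_y >= lam}. Larger lam -> smaller sets -> larger loss.
        losses = np.array([
            1.0 - np.mean(scores[labels] >= lam) if labels.any() else 0.0
            for scores, labels in zip(cal_scores, cal_labels)
        ])
        # Hoeffding upper confidence bound on the expected loss.
        ucb = losses.mean() + np.sqrt(np.log(1.0 / delta) / (2.0 * n))
        if ucb <= alpha:
            chosen = lam      # smaller sets still satisfy the risk bound
        else:
            break             # bound violated; keep the last safe threshold
    return chosen

def prediction_set(scores, lam):
    """Return the labels whose scores clear the calibrated threshold."""
    return np.where(scores >= lam)[0]
```

In this sketch, prediction sets shrink as the threshold grows, and the calibration simply keeps the smallest sets whose estimated error rate, inflated by a concentration bound, remains below the user-specified level on the holdout data.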

Biography

Dr. Stephen Bates is a postdoctoral fellow working with Professor Michael I. Jordan at UC Berkeley and the Simons Institute for the Theory of Computing. His research interests include high-dimensional statistics, causal inference, and uncertainty quantification for predictive models. Previously, he earned his Ph.D. in statistics from Stanford University under the supervision of Professor Emmanuel Candès, where he received the best dissertation award and his thesis work appeared on the cover of the Proceedings of the National Academy of Sciences (USA).