Wednesday, October 23, 2019 - 4:00pm to 5:00pm
Event Calendar Category
LIDS & Stats Tea
Due to the ease of modern data collection, practitioners often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the amount of data is often not large relative to the number of covariates, leading to non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in the parameter dimension, limiting their applicability in increasingly widespread settings with tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data. We show that our method, which we call LR-GLM, still provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation via interpretable error bounds and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
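To make the low-rank idea concrete, here is a minimal sketch (not the authors' exact LR-GLM algorithm) of how a low-rank data approximation can speed up approximate Bayesian inference in a logistic-regression GLM: project the covariates onto the top-r right singular vectors of the design matrix, run a Laplace approximation in the r-dimensional space, and lift the resulting Gaussian back to the full parameter space. All function and variable names are illustrative assumptions, and the example uses only numpy.

```python
import numpy as np

def low_rank_laplace_logistic(X, y, rank, prior_var=1.0, n_newton=25):
    """Illustrative low-rank Laplace approximation for Bayesian logistic
    regression. This is a sketch of the low-rank idea, not the LR-GLM
    algorithm from the talk.

    X : (n, d) design matrix, y : (n,) binary labels in {0, 1}.
    Returns an approximate Gaussian posterior (mean, covariance) over the
    full d-dimensional parameter.
    """
    n, d = X.shape
    # Top-r SVD of the data: X is approximated by its rank-r truncation.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:rank].T              # (d, r) projection onto top singular vectors
    Z = X @ V                    # (n, r) reduced design matrix

    # Newton's method for the MAP estimate in the r-dimensional space,
    # with an isotropic Gaussian prior of variance `prior_var`.
    b = np.zeros(rank)
    H = np.eye(rank) / prior_var
    for _ in range(n_newton):
        p = 1.0 / (1.0 + np.exp(-Z @ b))               # predicted probabilities
        grad = Z.T @ (y - p) - b / prior_var
        W = p * (1.0 - p)                              # logistic Hessian weights
        H = Z.T @ (Z * W[:, None]) + np.eye(rank) / prior_var
        b = b + np.linalg.solve(H, grad)

    # Laplace covariance in the reduced space, lifted back to d dimensions;
    # directions orthogonal to V keep their prior variance.
    cov_r = np.linalg.inv(H)
    mean = V @ b
    cov = V @ cov_r @ V.T + prior_var * (np.eye(d) - V @ V.T)
    return mean, cov
```

The key computational point is that the Newton updates and the matrix inverse happen in r dimensions rather than d, so the per-iteration cost scales with r² and r³ instead of d² and d³; the choice of r is exactly the tunable computational–statistical trade-off the abstract describes.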
Brian Trippe is a third-year PhD student at MIT working with Tamara Broderick in the Machine Learning group. Before MIT, he completed an MPhil by Research at the University of Cambridge where he worked on Bayesian methods for machine learning, and a BA at Columbia College in New York.