Thursday, September 12, 2019 - 11:00am to 12:00pm
Event Calendar Category
Other LIDS Events
EECS, UC Berkeley
A continuing mystery in understanding the empirical success of deep neural networks has been their ability to achieve zero training error and yet generalize well, even when the training data is noisy and there are many more parameters than data points. Following the information-theoretic tradition of seeking understanding, this talk will share our three-part approach to shedding some light on this phenomenon. First, following the tradition of such distilled toy models as the BSC and AWGN channels, the Gaussian source, or scalar linear control systems, we zoom in on the classical linear regression problem in the underdetermined setting with more parameters than training data. Here, the solutions that minimize training error interpolate the data, including the noise. Second, following the tradition of converse bounds, we give a genie-aided bound on how well any interpolative solution can generalize to fresh test data, and show that this bound generically decays to zero (at a known rate) with the number of extra features, thus characterizing an explicit benefit of overparameterization. Third, we talk about what it takes to achieve such harmless interpolation in appropriately overparameterized limits. For appropriately sparse linear models, we provide a hybrid interpolating scheme (combining classical sparse recovery schemes with harmless noise-fitting) to achieve generalization error close to the bound on interpolative solutions. Along the way, we call out certain key concepts that we call "signal bleed" and "crab-pot regularization" that help us understand what is required to achieve harmless interpolation in general.
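To make the underdetermined setting concrete, here is a minimal NumPy sketch (not code from the talk) of the phenomenon the abstract describes: with more features than training points, the minimum-l2-norm solution fits the training labels, noise and all, exactly. The dimensions, noise level, and sparse "signal" direction are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: n training points, d >> n features (underdetermined).
rng = np.random.default_rng(0)
n, d = 20, 200
X = rng.standard_normal((n, d))

# Sparse ground-truth signal (assumption for illustration) plus label noise.
w_true = np.zeros(d)
w_true[0] = 1.0
y = X @ w_true + 0.5 * rng.standard_normal(n)

# Among all solutions of X w = y, the pseudoinverse selects the one with
# minimum Euclidean norm -- an interpolating solution.
w_hat = np.linalg.pinv(X) @ y

# Training error is zero (up to floating point): the noise has been fit exactly.
train_err = float(np.max(np.abs(X @ w_hat - y)))
print(train_err)
```

The point of the sketch is only the first part of the talk's story: interpolation of noisy data is easy in the overparameterized regime; whether it is harmless depends on how the interpolator behaves on fresh test data.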
Anant Sahai did his undergraduate work in EECS at UC Berkeley, and then went to MIT as a graduate student studying Electrical Engineering and Computer Science. After graduating with his PhD from LIDS, and before joining the Berkeley faculty, he was on the theoretical/algorithmic side of a team at the de facto LIDS startup Enuvis, Inc., developing new adaptive software radio techniques for GPS in very low SNR environments (such as those encountered indoors in urban areas).
His research interests span information theory, decentralized control, machine learning, and wireless communication, with a particular interest in the intersections of these fields. Recently, he has become very interested in machine learning for cooperation, control, and wireless communication. On the teaching side, he has been involved with revamping the core machine-learning-oriented curriculum at Berkeley.