Train Faster, Generalize Better: Stability of Stochastic Gradient Descent

Wednesday, November 18, 2015 - 4:00pm to Thursday, November 19, 2015 - 3:55pm

Event Calendar Category

LIDS Seminar Series

Speaker Name

Ben Recht


Univ. of California, Berkeley

Building and Room Number



The most widely used optimization method in machine learning practice is the Stochastic Gradient Method (SGM). This method has been used since the fifties to build statistical estimators, iteratively improving models by correcting errors observed on single data points. SGM is not only scalable, robust, and simple to implement, but also achieves state-of-the-art performance in many different domains. Today, SGM powers enterprise analytics systems and is the workhorse tool used to train complex pattern-recognition systems in speech and vision.
In this talk, I will explore why SGM has had such staying power, focusing on the notion of generalization. I will show that any model trained with a few SGM iterations has vanishing generalization error and performs as well on unseen data as on the training data. The analysis employs only elementary tools from convex and continuous optimization. Applying the results to the convex case provides new explanations for why multiple epochs of stochastic gradient descent generalize well in practice. In the nonconvex case, I will describe a new interpretation of common practices in neural networks, and provide a formal rationale for stability-promoting mechanisms in training large, deep models. Conceptually, these findings underscore the importance of reducing training time beyond its obvious benefit.
Joint work with Moritz Hardt and Yoram Singer.
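As an illustration of the single-data-point update described in the abstract, here is a minimal sketch of SGM applied to a one-dimensional least-squares problem. It is not taken from the talk: the function name `sgm_least_squares`, the synthetic data, the step size, and the number of steps are all assumptions chosen for the example.

```python
import random


def sgm_least_squares(data, step_size=0.05, num_steps=2000, seed=0):
    """Fit y = w * x + b with stochastic gradient steps on single examples.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(num_steps):
        x, y = rng.choice(data)       # sample one training point
        error = (w * x + b) - y       # prediction error on that point
        # Gradient of 0.5 * error**2 with respect to (w, b) is (error * x, error).
        w -= step_size * error * x
        b -= step_size * error
    return w, b


# Assumed synthetic, noiseless data generated from y = 2x + 1 on [-1, 1].
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
w, b = sgm_least_squares(data)
print(w, b)  # both should land close to the generating values 2 and 1
```

Each iteration touches only one example, which is what makes the method cheap per step. The number of iterations (`num_steps` here) is the quantity the abstract ties to generalization.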


Benjamin Recht is an Associate Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. Ben’s research focuses on scalable computational tools for large-scale data analysis and explores the intersections of convex optimization, mathematical statistics, and randomized algorithms. He is the recipient of a Presidential Early Career Award for Scientists and Engineers, an Alfred P. Sloan Research Fellowship, the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization, the 2014 Jamon Prize, and the 2015 William O. Baker Award for Initiatives in Research.