Acceleration by Stepsize Hedging

Monday, September 18, 2023 - 3:00pm

Event Calendar Category

Other LIDS Events

Speaker Name

Jason Altschuler

Affiliation

UPenn

Building and Room number

4-237

Can we accelerate convergence of gradient descent without changing the algorithm — just by optimizing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $\kappa^{\log_\rho 2} \approx \kappa^{0.7864}$ iterations, where $\rho = 1+\sqrt{2}$ is the silver ratio and $\kappa$ is the condition number. This is intermediate between the textbook unaccelerated rate $\kappa$ and the accelerated rate $\sqrt{\kappa}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate $\varepsilon^{-\log_\rho 2} \approx \varepsilon^{-0.7864}$. We conjecture, and provide partial evidence, that these rates are optimal among all possible stepsize schedules.

The Silver Stepsize Schedule is non-monotonic, fractal-like, and approximately periodic with period $\kappa^{\log_\rho 2}$. This leads to a phase transition in the convergence rate: super-exponential, then exponential. Why should such stepsizes help? The core intuition is “hedging” between individually suboptimal strategies — short steps and long steps — since bad cases for the former are good cases for the latter, and vice versa. Properly combining these stepsizes yields faster convergence due to the misalignment of worst-case functions.

This talk is based on work that publishes and extends my 2018 Master’s Thesis, which established for the first time that dynamic stepsizes can enable acceleration in convex optimization. Prior to this thesis, the only such result was for the special case of quadratics, due to Young in 1953.
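
As a concrete illustration of the hedging intuition (a standard quadratic calculation, not part of the abstract above): on $f(x) = \tfrac{1}{2} x^\top \mathrm{diag}(\lambda_1, \dots, \lambda_d)\, x$ with $0 < \mu \le \lambda_i \le L$, a gradient step with stepsize $h/L$ multiplies the component of the iterate along eigenvalue $\lambda_i$ by $1 - h\lambda_i/L$. A short step ($h \le 2$) shrinks every component but barely touches the low-curvature modes, contracting them only by about $1 - h/\kappa$; a long step ($h \gg 2$) contracts the low-curvature modes quickly but amplifies the high-curvature ones by roughly $h - 1$. After $n$ steps the net factor on each mode is $\prod_{t=1}^{n} (1 - h_t \lambda_i / L)$, so a well-chosen schedule can let the strong contraction of the short steps absorb the amplification caused by occasional long steps, beating any constant stepsize in the worst case over $\lambda_i$.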

Based on joint work with Pablo Parrilo.
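
For readers who want to experiment, here is a minimal, self-contained sketch (not code from the talk or the papers) of gradient descent driven by a non-monotonic, silver-ratio-based stepsize schedule on a toy quadratic. The specific formula $h_t = 1 + \rho^{\nu(t)-1}$, with $\nu(t)$ the 2-adic valuation of $t$, is my reconstruction of the fractal pattern described above and should be treated as an assumption; the strongly convex schedule discussed in the talk additionally depends on $\kappa$. All function names are illustrative.

```python
# Illustrative sketch only: gradient descent on a toy quadratic with a
# fractal, silver-ratio-based stepsize schedule. The formula for h_t is an
# assumption (my reconstruction), not necessarily the exact Silver Stepsize
# Schedule from the talk.

import numpy as np

RHO = 1 + np.sqrt(2)  # the silver ratio

def nu2(t: int) -> int:
    """2-adic valuation of t >= 1: exponent of the largest power of 2 dividing t."""
    return (t & -t).bit_length() - 1

def silver_like_stepsizes(n: int) -> list[float]:
    """First n stepsizes h_t = 1 + RHO**(nu2(t) - 1), in units of 1/L (assumed form)."""
    return [1 + RHO ** (nu2(t) - 1) for t in range(1, n + 1)]

def gradient_descent(grad, x0, stepsizes, L):
    """Plain gradient descent: x_{t+1} = x_t - (h_t / L) * grad(x_t)."""
    x = np.array(x0, dtype=float)
    for h in stepsizes:
        x = x - (h / L) * grad(x)
    return x

if __name__ == "__main__":
    # Ill-conditioned quadratic f(x) = 0.5 * x^T diag(lam) x, so grad f(x) = lam * x.
    lam = np.array([1.0, 100.0])   # eigenvalues: mu = 1, L = 100, condition number 100
    grad = lambda x: lam * x
    x0 = np.ones(2)
    n = 127                        # 2^7 - 1 steps; the schedule doubles recursively

    x_const = gradient_descent(grad, x0, [1.0] * n, L=lam.max())
    x_sched = gradient_descent(grad, x0, silver_like_stepsizes(n), L=lam.max())
    print("constant stepsize 1/L, ||x_n|| =", np.linalg.norm(x_const))
    print("silver-like schedule,  ||x_n|| =", np.linalg.norm(x_sched))
```

On this toy problem the schedule typically drives the iterate much closer to the minimizer than the same number of constant unit steps, since the occasional long steps handle the low-curvature direction while the surrounding short steps damp the high-curvature one.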

Jason Altschuler is an Assistant Professor at UPenn in the Department of Statistics and Data Science. He received his PhD from MIT and his BSE from Princeton. His research is at the interface of optimization, probability, and machine learning, with a focus on the design and analysis of large-scale optimization algorithms.

https://jasonaltschuler.github.io/