Provably Faster Convergence of Adaptive Gradient Methods

Wednesday, October 14, 2020 - 11:00am to 11:30am

Event Calendar Category

LIDS & Stats Tea

Speaker Name

Jingzhao Zhang

Affiliation

LIDS

Zoom meeting id

921 4123 0377

Join Zoom meeting

https://mit.zoom.us/j/92141230377

Abstract

While stochastic gradient descent (SGD) remains the de facto algorithm in deep learning, adaptive methods such as Adam have been observed to outperform SGD on important tasks, notably the training of NLP models. The settings in which SGD performs poorly compared to adaptive methods are not yet well understood; in fact, recent theoretical progress shows that SGD is minimax optimal in canonical settings. In this talk, we provide empirical and theoretical evidence that a relaxed smoothness condition or a heavy-tailed noise distribution can each cause SGD to perform poorly. Based on this observation, we study clipped variants of SGD that circumvent these issues; we then analyze their convergence and show that adaptive methods can be provably faster than SGD.
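To make the idea of a clipped SGD update concrete, below is a minimal PyTorch sketch in which the gradient norm is clipped before each SGD step. The model, synthetic data, learning rate, and clipping threshold are illustrative placeholders, not the setup analyzed in the talk.

import torch
import torch.nn as nn

# Placeholder model and synthetic regression data (for illustration only).
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Clipped SGD: rescale the gradient so its global norm is at most max_norm,
    # i.e. multiply it by min(1, max_norm / ||g||), then take the usual SGD step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

The clipping factor makes the effective step size shrink when gradients are large, which is one way of viewing the adaptivity that the abstract credits with faster convergence.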

Biography

Jingzhao is a PhD student working with Suvrit Sra and Ali Jadbabaie. His research interests lie broadly in the analysis and design of fast optimization algorithms.