Wednesday, October 4, 2023 - 4:00pm
LIDS & Stats Tea
In this talk, I will discuss interesting phenomena arising in deep learning optimization. In particular, based on a simple yet canonical model, we will discuss how neural networks learn the "correct" classifiers. Surprisingly, understanding this mechanism requires analyzing non-convex optimization dynamics that lie beyond the scope of conventional convex optimization theory.
This talk is based on the following two NeurIPS 2023 papers:
- Learning threshold neurons via the edge of stability (NeurIPS 2023; https://arxiv.org/abs/2212.07469) (joint with Sebastien Bubeck (MSR), Sinho Chewi (IAS), Yin Tat Lee (MSR), Felipe Suarez (CMU), Yi Zhang (MSR))
- The Crucial Role of Normalization in Sharpness-Aware Minimization (NeurIPS 2023; https://arxiv.org/abs/2305.15287) (joint with Yan Dai (Tsinghua), Suvrit Sra (MIT))
Kwangjun Ahn is a final-year PhD student at MIT in the Department of EECS (Electrical Engineering & Computer Science) and the Laboratory for Information and Decision Systems (LIDS). His advisors are Profs. Suvrit Sra and Ali Jadbabaie. He also works part-time at Google Research, where he focuses on accelerating LLM inference with the Speech & Language Algorithms Team. His current research interests include understanding LLM optimization and how to speed it up. He has worked on various topics over the years, including machine learning theory, optimization, statistics, and learning for control.