Monday, July 26, 2021 - 9:30am to 11:00am
Event Calendar Category: LIDS Thesis Defense
Chulhee “Charlie” Yun
LIDS & EECS
Join Zoom meeting
Thesis Committee: Prof. Suvrit Sra (Supervisor), Prof. Ali Jadbabaie (Supervisor), Prof. Asu Ozdaglar
The success of deep learning has been driven primarily by empirical breakthroughs, but these breakthroughs have raised many questions that existing theory cannot explain. For example, despite the nonconvexity of the training objectives, deep neural networks can be reliably trained to fully memorize the training data, yet they still perform well on unseen test data. Although progress has been made in recent years, the gap between theory and practice remains wide; this thesis takes a step toward closing it.
First, we discuss the optimization landscape of neural networks. We establish the existence of spurious local minima for general datasets and activation functions, which suggests that the convergence of optimization methods on neural networks cannot be explained by the training objectives alone. The second part of the thesis discusses implicit bias in neural network training, an approach that seeks to understand generalization in deep learning through the optimization trajectory. Through a unified analysis based on a tensor representation of neural networks, we show how different linear neural network architectures lead to different global minima in overparameterized setups. The last part addresses a major theory-practice gap in stochastic finite-sum optimization: practical algorithms shuffle the component indices and iterate through them, while most theoretical analyses assume the indices are sampled uniformly at random. To close this gap, we develop tight convergence rates for shuffling-based SGD that are faster than the rate of its uniform-sampling counterpart.
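To illustrate the distinction the last part of the abstract draws, here is a minimal sketch (not from the thesis; all names and the toy least-squares problem are illustrative) contrasting uniform-sampling SGD, which draws a component index with replacement at every step, with shuffling-based SGD, which permutes the indices once per epoch and visits each component exactly once:

```python
import numpy as np

# Toy finite-sum objective: f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)^2,
# a one-dimensional least-squares problem with n components.
rng = np.random.default_rng(0)
n = 100
a = rng.normal(size=n)
b = 2.0 * a + 0.1 * rng.normal(size=n)  # data generated around slope 2

def grad(x, i):
    """Gradient of the i-th component 0.5 * (a_i * x - b_i)^2."""
    return (a[i] * x - b[i]) * a[i]

def sgd_uniform(x, lr, epochs):
    # With-replacement SGD: each step samples an index uniformly at random,
    # as most classical analyses assume.
    for _ in range(epochs * n):
        i = rng.integers(n)
        x -= lr * grad(x, i)
    return x

def sgd_shuffle(x, lr, epochs):
    # Shuffling-based SGD: permute the indices once per epoch and sweep
    # through them, visiting every component exactly once per epoch,
    # as practical implementations do.
    for _ in range(epochs):
        for i in rng.permutation(n):
            x -= lr * grad(x, i)
    return x

x_uni = sgd_uniform(0.0, lr=0.01, epochs=50)
x_shuf = sgd_shuffle(0.0, lr=0.01, epochs=50)
print(x_uni, x_shuf)  # both approach the least-squares slope, near 2
```

Both schemes converge on this toy problem; the thesis's contribution concerns their convergence *rates*, showing that the shuffling scheme enjoys provably faster rates than its with-replacement counterpart.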