Three Facets of Understanding Pre-training: Loss, Inductive Bias, and Implicit Bias

Wednesday, October 12, 2022 - 4:00pm

Event Calendar Category

Other LIDS Events

Speaker Name

Tengyu Ma

Affiliation

Stanford University

Join Zoom meeting

https://mit.zoom.us/j/93673116988

Abstract

AI is undergoing a paradigm shift with the rise of models pre-trained with self-supervision and then adapted to a wide range of downstream tasks. However, how these models work largely remains a mystery; classical learning theory cannot explain why pre-training on an unsupervised task can help many different downstream tasks. This talk will first investigate the role of pre-training losses in extracting meaningful structural information from unlabeled data, especially in the infinite data regime. Concretely, I will show that the contrastive loss can give rise to embeddings whose Euclidean distance captures the manifold distance between raw data (or, more generally, the graph distance of a so-called positive-pair graph). Moreover, directions in the embedding space correspond to relationships between clusters in the positive-pair graph. Then, I will discuss two other elements that seem necessary for a sharp explanation of the behavior of practical pre-trained models: the inductive bias of architectures and the implicit bias of optimizers. I will introduce two recent, ongoing projects, where we (1) strengthen the previous theoretical framework by incorporating the inductive bias of architectures and (2) demonstrate, empirically and theoretically, the implicit bias of optimizers in pre-training, even with infinite pre-training data.

Based on https://arxiv.org/abs/2106.04156, https://arxiv.org/abs/2204.02683, and ongoing work.
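For readers unfamiliar with the setup referenced in the abstract, the sketch below shows a minimal, generic InfoNCE-style contrastive loss over positive pairs (two augmented views of the same raw datum). It is only an illustration of the positive-pair idea under assumed names and hyperparameters, not the specific loss analyzed in the referenced papers.

```python
# Minimal sketch of a generic contrastive loss over positive pairs.
# Hypothetical names/hyperparameters; not the exact loss studied in the talk.
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are embeddings of two views (a positive pair) of example i."""
    z1 = F.normalize(z1, dim=1)            # project embeddings onto the unit sphere
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature     # pairwise similarities; diagonal entries are positive pairs
    labels = torch.arange(z1.size(0))      # the positive for row i is column i
    return F.cross_entropy(logits, labels) # pull positives together, push others apart

# Usage sketch: z1, z2 would come from an encoder applied to two augmentations of a batch.
batch, dim = 8, 32
loss = contrastive_loss(torch.randn(batch, dim), torch.randn(batch, dim))
```

Positive pairs of this kind are the edges of the positive-pair graph mentioned in the abstract; the talk concerns what geometric structure of that graph the learned embeddings provably capture.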

Biography

Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of the ACM Doctoral Dissertation Award Honorable Mention, the Sloan Fellowship, and the NSF CAREER Award.