Friday, September 24, 2021 - 11:00am to 12:00pm
Self-supervised learning is an increasingly popular approach for learning representations of data that can be used for downstream tasks. A practical advantage of self-supervised learning is that it can be applied to unlabeled data. However, even when labels are available, self-supervised learning can be competitive with the more "traditional" approach of supervised learning.
In this talk, we consider "self-supervised + simple classifier (SSS)" algorithms, which are obtained by first learning a self-supervised representation of the data, and then fitting a simple (e.g., linear) classifier on these representations using the same data and its labels. We show that:
1) Unlike traditional end-to-end supervised learning algorithms, SSS algorithms have small generalization gaps in practice, even when the final classifier is highly over-parameterized.
2) Under natural assumptions, we can prove that the generalization gap tends to zero whenever the number of samples is sufficiently larger than the complexity of the simple classifier, independently of the complexity of the self-supervised model. We show that this bound yields non-vacuous guarantees for many popular representation-learning-based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM, and BigBiGAN.
3) We give evidence that self-supervised and fully supervised models learn similar representations, by showing that the self-supervised layers can be "stitched" to the bottom of a fully supervised model, and vice versa, without a significant loss of performance.
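The SSS pipeline described above can be illustrated with a minimal sketch. Here the "self-supervised" representation is a stand-in frozen random feature map (not an actual self-supervised model such as SimCLR), the data are hypothetical Gaussian blobs, and the simple classifier is a ridge-regularized linear fit; all names and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x, W):
    """Frozen representation: random ReLU features as a placeholder
    for a pretrained self-supervised encoder."""
    return np.maximum(x @ W, 0.0)

# Toy labeled data: two Gaussian blobs in 20 dimensions.
n, d, k = 200, 20, 64
X = np.concatenate([rng.normal(-1.0, 1.0, (n // 2, d)),
                    rng.normal(+1.0, 1.0, (n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

# "Pretrained" encoder weights, kept frozen while fitting the classifier.
W = rng.normal(size=(d, k)) / np.sqrt(d)
Z = phi(X, W)  # representations of the labeled data

# Simple classifier: ridge-regularized least squares on {-1, +1} targets.
t = 2.0 * y - 1.0
w = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(k), Z.T @ t)

pred = (Z @ w > 0).astype(int)
train_acc = (pred == y).mean()
```

The point of the sketch is structural: only the small linear classifier (`w`, with `k` parameters) is fit to the labels, which is why the generalization bound in (2) depends on the simple classifier's complexity rather than on the encoder's.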
Based on joint work with Bansal and Kaplun ( https://arxiv.org/abs/2010.08508 ) and with Bansal and Nakkiran ( https://arxiv.org/abs/2106.07682 ).
Boaz Barak is the Gordon McKay Professor of Computer Science at Harvard University's John A. Paulson School of Engineering and Applied Sciences. His research interests include all areas of theoretical computer science, in particular cryptography and computational complexity. Previously, he was a principal researcher at Microsoft Research New England, and before that an associate professor (with tenure) in Princeton University's computer science department. Barak has won the ACM Doctoral Dissertation Award and the Packard and Sloan fellowships, was selected for Foreign Policy magazine's list of 100 leading global thinkers for 2014, and was chosen as a Simons Investigator in 2017.