Efficient Learning from Untrusted Batches

Wednesday, October 2, 2019 - 4:00pm to 4:30pm

Event Calendar Category

LIDS & Stats Tea

Speaker Name

Sitan Chen

Affiliation

CSAIL

Building and Room Number

LIDS Lounge

Abstract

In recent years there has been an explosion of interest in designing unsupervised learning algorithms able to tolerate adversarial corruptions in the input data. A notable instantiation of this, first introduced to the theory community by Qiao and Valiant and motivated by the practical task of "federated learning," goes as follows:

There is an unknown distribution $\mathcal{D}$ over $n$ elements, and some set of servers, an $\epsilon$ fraction of whom are malicious. Each non-malicious server $i$ gives the learner a batch of $k$ independent draws from some distribution $\mathcal{D}_i$ for which $d_{TV}(\mathcal{D},\mathcal{D}_i)\le\eta$, and each malicious server gives an adversarially chosen batch of samples. There is an information-theoretic lower bound saying one cannot learn $\mathcal{D}$ to within total variation better than $O(\eta + \epsilon/\sqrt{k})$, and no algorithm was known to match this bound without suffering an exponential dependence on $\eta$.

In this talk, I will describe how to use the sum-of-squares (SoS) hierarchy to obtain the first efficient algorithm for this problem. Time permitting, I'll also describe how to port the technology of Haar wavelets and $\mathcal{A}_K$ norms from VC theory over to SoS to improve the sample complexity to sublinear in the support size of $\mathcal{D}$ when $\mathcal{D}$ is shape-constrained.

Joint work with Jerry Li and Ankur Moitra.

Biography

Sitan Chen is a fourth-year graduate student in theoretical computer science at MIT advised by Ankur Moitra. His research focuses on unsupervised learning in settings where data is drawn from heterogeneous sources, approximate inference in graphical models and connections to phase transitions in statistical physics, and the use and limitations of convex programming hierarchies for these and related problems. He has presented his work at venues including STOC, SODA, and the Simons Institute, and his research is supported by a PD Soros Fellowship.