Challenges in Reliable Machine Learning

Tuesday, February 23, 2021 - 3:00pm to Wednesday, February 24, 2021 - 3:55pm

Event Calendar Category

LIDS Seminar Series

Speaker Name

Kamalika Chaudhuri

Affiliation

University of California San Diego

Event Recording

Video

Zoom meeting id

965 2254 5847

Join Zoom meeting

https://mit.zoom.us/j/96522545847

As machine learning is increasingly deployed, there is a need for reliable and robust methods that go beyond simple test accuracy. In this talk, we will discuss two challenges that arise in reliable machine learning. The first is robustness to adversarial examples: small, imperceptible perturbations to legitimate test inputs that cause machine learning classifiers to misclassify them. While recent work has proposed many attacks and defenses, exactly why adversarial examples arise remains a mystery. We will take a closer look at this question.
The second challenge is overfitting, to which many generative models are known to be prone. Motivated by privacy concerns, we formalize a form of overfitting that we call data-copying, in which the generative model memorizes and outputs training samples or small variations of them. We provide a three-sample test for detecting data-copying and study its performance on several canonical models and datasets.

Kamalika Chaudhuri is an Associate Professor at the University of California, San Diego. She received a Bachelor of Technology degree in Computer Science and Engineering in 2002 from the Indian Institute of Technology, Kanpur, and a PhD in Computer Science from the University of California at Berkeley in 2007. After a postdoctoral stint at UCSD, she joined the CSE department at UC San Diego as an assistant professor in 2010. She received an NSF CAREER Award in 2013 and a Hellman Faculty Fellowship in 2012. She has served as the program co-chair for AISTATS 2019 and ICML 2019.

Kamalika's research interests lie in the foundations of trustworthy machine learning, or machine learning beyond accuracy. This includes problems such as learning from sensitive data while preserving privacy, learning under sampling bias, and learning in the presence of an adversary.