Monday, November 18, 2019 - 4:00pm to 5:00pm
Event Calendar Category
LIDS Seminar Series
University of Texas at Austin
Building and Room Number
We investigate a simple generic approach to correct for this, motivated by a classic statistical idea: trimmed loss. This advocates jointly (a) selecting which training samples to ignore, and (b) fitting a model on the remaining samples. Solved jointly, this problem is computationally infeasible even for linear regression. We propose and study the natural iterative variant that alternates between steps (a) and (b), each of which can be accomplished easily on its own in almost any statistical setting. We also study the batch-SGD variant of this idea. We demonstrate both theoretically (for generalized linear models) and empirically (for vision and NLP neural network models) that this approach effectively recovers accuracy in the presence of bad training data.
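To make the alternation between steps (a) and (b) concrete, here is a minimal sketch for one-dimensional linear regression. It is an illustration under stated assumptions, not the authors' implementation: the function names (fit_line, iterative_trimmed_fit, make_data), the parameters keep_fraction and num_rounds, and the synthetic data are all hypothetical choices made for this example.

```python
def fit_line(points):
    """Ordinary least squares fit of y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def iterative_trimmed_fit(points, keep_fraction=0.8, num_rounds=10):
    """Alternate between trimming high-loss samples and refitting.

    keep_fraction and num_rounds are illustrative assumptions, not values
    taken from the talk.
    """
    keep = max(2, int(keep_fraction * len(points)))
    a, b = fit_line(points)  # initialize by fitting on all samples
    for _ in range(num_rounds):
        # Step (a): keep the samples with the smallest squared loss
        # under the current model; the rest are ignored this round.
        ranked = sorted(points, key=lambda p: (p[1] - (a * p[0] + b)) ** 2)
        kept = ranked[:keep]
        # Step (b): refit the model on the kept samples only.
        a, b = fit_line(kept)
    return a, b


def make_data(n=50):
    """Synthetic data (an assumption): y = 2x + 1, with every fifth
    sample replaced by a corrupted label from y = -3x + 4."""
    data = []
    for i in range(n):
        x = -1.0 + 2.0 * i / (n - 1)
        y = -3.0 * x + 4.0 if i % 5 == 0 else 2.0 * x + 1.0
        data.append((x, y))
    return data


if __name__ == "__main__":
    data = make_data()
    print("fit on all samples:   ", fit_line(data))
    print("iterative trimmed fit:", iterative_trimmed_fit(data))
```

In this sketch the fit on all samples is pulled toward the corrupted labels, while the trimmed variant discards the highest-loss 20% of samples in each round before refitting. The fraction of samples to keep is assumed known here; in practice it would have to be chosen or estimated.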
This work is joint with Yanyao Shen and Vatsal Shah and appears in NeurIPS 2019 and ICML 2019.
Sujay is also interested in learning from and applying his ideas in industry. He has been a Visiting Scientist at Google Research and a senior quant at Engineers Gate, and is currently a Principal Scientist and Amazon Scholar at Amazon.