Autonomy Tea Talk

Tuesday, March 11, 2025 - 4:00pm

Event Calendar Category

Other LIDS Events

Speaker Name

Olawale Salaudeen

Affiliation

LIDS / Healthy ML Lab

Building and Room Number

32-D650 (LIDS Lounge)

"Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?"

Spurious correlations are unstable statistical associations that hinder robust decision-making. Conventional wisdom suggests that models relying on such correlations will fail to generalize out-of-distribution (OOD), particularly under strong distribution shifts. However, a growing body of empirical evidence challenges this view, as naive empirical risk minimizers often achieve the best OOD accuracy across popular OOD generalization benchmarks. In light of these counterintuitive results, we propose a different perspective: many widely used benchmarks for assessing the impact of spurious correlations on OOD generalization are misspecified. Specifically, they fail to include shifts in spurious correlations that meaningfully degrade OOD generalization, making them unsuitable for evaluating the benefits of removing such correlations. We establish sufficient—and in some cases necessary—conditions under which a distribution shift can reliably assess a model's reliance on spurious correlations. Crucially, under these conditions, we provably should not observe a strong positive correlation between in-distribution and out-of-distribution accuracy—often referred to as accuracy on the line. Yet, when we examine state-of-the-art OOD generalization benchmarks, we find that most exhibit accuracy on the line, suggesting they do not effectively assess robustness to spurious correlations. Our findings expose a limitation in evaluating algorithms for domain generalization, i.e., learning predictors that do not rely on spurious correlations, and highlight the need to rethink how we assess robustness to spurious correlations.

Olawale (Wale) Salaudeen is a postdoctoral associate at MIT in the Healthy ML Lab, led by Professor Marzyeh Ghassemi. Before joining MIT, he earned a PhD in Computer Science at the University of Illinois Urbana-Champaign, conducting research with the Stanford Trustworthy AI Research (STAIR) Lab at Stanford University, where he was advised by Professor Sanmi Koyejo. He has been recognized with several honors, including a Sloan Scholarship, a Beckman Graduate Research Fellowship, a GEM Associate Fellowship, and an NSF Miniature Brain Machinery Traineeship. His industry and research experience includes internships at Sandia National Laboratories (with Dr. Eric Goodman), Google Brain (with Dr. Alex D’Amour), Cruise LLC, and the Max Planck Institute for Intelligent Systems (with Dr. Moritz Hardt).

Before pursuing his PhD, Salaudeen completed a Bachelor of Science in Mechanical Engineering with minors in Computer Science and Mathematics at Texas A&M University.

His research focuses on the principles and practices of reliable and trustworthy AI for social and societal good. He primarily investigates the robustness of artificial intelligence (AI) in real-world decision-making. His prior work has addressed AI robustness under distribution shift, including generalization, adaptation, and evaluation, as well as the broader understanding of effective AI/ML evaluation practices. His research has applications in areas such as biological imaging, algorithmic fairness, healthcare, and AI policy.