Wednesday, October 1, 2025 - 4:00pm
Event Calendar Category
Other LIDS Events
Speaker Name
Marina Mancoridis
Building and Room Number
32-D650
Building and Room Number
LIDS Lounge
"Potemkin Understanding in Large Language Models"
Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key observation is that the benchmarks used to test LLMs—such as AP exams—are also the ones used to test people. This carries an implicit assumption: these benchmarks are valid tests for LLMs only if LLMs misunderstand concepts in the same ways that humans do. Otherwise, success on benchmarks demonstrates only potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept.
We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.
Marina Mancoridis is a PhD student in Electrical Engineering and Computer Science at MIT, advised by Sendhil Mullainathan. She studies the intersection of artificial intelligence and behavioral science. Her recent work, Potemkin Understanding in Large Language Models (published at ICML 2025), develops new ways to assess concept understanding in models. Since joining MIT, she has been supported by the MIT Presidential Graduate Fellowship and the Schwarzman College of Computing Amazon AI Research Innovation Fellowship.