Tuesday, April 20, 2021 - 3:00pm to 4:00pm
Event Calendar Category
LIDS Seminar Series
Zoom meeting ID
989 7723 8884
There are three key problems at the heart of reinforcement learning: How do you generalize to new, unseen observations? How do you assign value to the actions you take? And how do you explore so as to gather the information needed to do the first two? Traditional reinforcement learning frameworks (MDPs, Contextual Bandits, Policy Improvement) address two of these three. Over the last five years we have developed a new line of research that addresses all three simultaneously, creating new kinds of algorithms that can lead to a new kind of reinforcement learning.
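To make the exploration/value-assignment trade-off concrete, here is a minimal epsilon-greedy contextual bandit sketch. It is purely illustrative and not from the talk: the class name, the per-context reward tables, and the epsilon-greedy rule are assumptions, and real contextual bandit learners (e.g. in Vowpal Wabbit) generalize across contexts rather than tabulating them.

```python
import random

class EpsilonGreedyBandit:
    """Illustrative tabular contextual bandit (an assumption, not the talk's method).

    For each (context, action) pair it tracks an average observed reward;
    with probability epsilon it explores uniformly at random, otherwise it
    exploits the action with the highest current estimate.
    """

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (context, action) -> (total reward, pull count)
        self.stats = {}

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore uniformly

        # exploit: pick the action with the highest estimated reward
        def estimate(action):
            total, count = self.stats.get((context, action), (0.0, 0))
            return total / count if count else 0.0

        return max(range(self.n_actions), key=estimate)

    def update(self, context, action, reward):
        total, count = self.stats.get((context, action), (0.0, 0))
        self.stats[(context, action)] = (total + reward, count + 1)


# Usage: a toy world with two contexts, where the matching action pays off.
bandit = EpsilonGreedyBandit(n_actions=2, epsilon=0.2, seed=1)
for t in range(500):
    ctx = t % 2
    action = bandit.choose(ctx)
    bandit.update(ctx, action, 1.0 if action == ctx else 0.0)
```

After enough rounds, the greedy choice for each context converges to the rewarding action; the exploration rate epsilon controls how often information about the other actions keeps being gathered.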
John Langford is a computer scientist working in machine learning and learning theory, a field that he says "is shifting from an academic discipline to an industrial tool." He is well known for his work on the Isomap embedding algorithm, CAPTCHA challenges, Cover Trees for nearest-neighbor search, Contextual Bandits (a term he coined) for reinforcement learning applications, and learning reductions. John is the author of the blog hunch.net and the principal developer of Vowpal Wabbit. He works at Microsoft Research New York, of which he was a founding member, and was previously affiliated with Yahoo! Research, the Toyota Technological Institute at Chicago, and IBM's Watson Research Center. He studied physics and computer science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his Ph.D. in computer science from Carnegie Mellon University in 2002. John was program co-chair for the 2012 International Conference on Machine Learning (ICML), general chair for ICML 2016, and is President of ICML for 2019–2021. In August 2018, he participated in the Human-Level AI conference, hosted by GoodAI in Prague.