Advances in Off-policy Value Estimation in Reinforcement Learning

Monday, November 22, 2021 - 11:00am to 12:00pm

Speaker Name

Martha White

Affiliation

University of Alberta

Zoom meeting id

979338

Join Zoom meeting

https://mit.zoom.us/j/99849707728

Abstract

Temporal difference learning algorithms underlie most approaches in reinforcement learning, for both prediction and control. A well-known issue is that these approaches can diverge under nonlinear function approximation, such as with neural networks, and in the off-policy setting where data is generated by a different policy than the one being learned. Naturally, there has been a flurry of work towards resolving this issue. In this talk, I will discuss two key advances that largely resolve the problem: sound gradient-based methods and emphatic reweightings. I will discuss our generalized objective that unifies several approaches and facilitates creating easy-to-use algorithms that consistently outperform temporal difference learning approaches in our experiments.
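The divergence issue the abstract mentions can be seen in a tiny illustrative sketch (my own toy construction, not material from the talk): two states with linear features 1 and 2 sharing a single weight, zero reward, and off-policy sampling that only ever updates on the transition between them — the classic "w → 2w" counterexample. Semi-gradient TD(0) blows up, while TDC, one of the sound gradient-based methods the talk refers to, stays stable. The step sizes and iteration counts below are arbitrary choices for illustration.

```python
gamma = 0.99          # discount; divergence here needs gamma > 1/2
alpha = 0.01          # primary step size (illustrative value)
beta = 0.5            # secondary step size; h runs on a faster timescale
x, x_next, reward = 1.0, 2.0, 0.0   # features of the two states, zero reward

# Semi-gradient TD(0): on this repeatedly sampled transition the update
# is w += alpha * (2*gamma - 1) * w, so w grows without bound.
w_td = 0.1
for _ in range(1000):
    delta = reward + gamma * w_td * x_next - w_td * x
    w_td += alpha * delta * x

# TDC: a second weight h tracks the expected TD error and supplies a
# gradient correction term, so w is driven toward the true value (zero).
w_tdc, h = 0.1, 0.0
for _ in range(1000):
    delta = reward + gamma * w_tdc * x_next - w_tdc * x
    w_tdc += alpha * (delta * x - gamma * x_next * (x * h))
    h += beta * (delta - x * h) * x

print(w_td, w_tdc)    # TD diverges; TDC remains near zero
```

The correction term `-gamma * x_next * (x * h)` is what distinguishes TDC from plain TD: it approximates the missing gradient piece that semi-gradient methods drop, which is exactly what makes them unsound off-policy.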

Biography

Martha White is an Associate Professor of Computing Science at the University of Alberta and a PI of Amii, the Alberta Machine Intelligence Institute, one of the top machine learning centres in the world. She holds a Canada CIFAR AI Chair and received IEEE's "AI's 10 to Watch: The Future of AI" award in 2020. She has authored more than 50 papers in top journals and conferences. Martha is an associate editor for TPAMI, and has served as co-program chair for ICLR and as an area chair for many conferences in AI and ML, including ICML, NeurIPS, AAAI, and IJCAI. Her research focuses on developing algorithms for agents that continually learn from streams of data, with an emphasis on representation learning and reinforcement learning.