Why is RLHF Data-Efficient in Policy Optimization?

Monday, April 1, 2024 - 4:00pm

Event Calendar Category

LIDS Seminar Series

Speaker Name

R. Srikant

Affiliation

UIUC

Building and Room Number

32-155

Abstract

We consider a version of policy optimization in reinforcement learning in which the rewards must be learned through human feedback. We study the sample complexity of this approach and compare it to the sample complexity of an algorithm in which the rewards are known a priori. We show that the amount of additional data needed to infer rewards from human feedback is a small fraction of the total amount of data needed for policy optimization. Joint work with Yihan Du, Anna Winnicki, Gal Dalal, and Shie Mannor.

Biography

R. Srikant is a Grainger Chair in Engineering, Co-Director of the C3.ai Digital Transformation Institute, and a Professor in the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois Urbana-Champaign. His research interests span machine learning, applied probability, and communication networks. He is the recipient of the 2021 ACM SIGMETRICS Achievement Award, the 2019 IEEE Koji Kobayashi Computers and Communications Award, and the 2015 IEEE INFOCOM Achievement Award. He has also received several Best Paper awards, including the 2015 IEEE INFOCOM Best Paper Award and the 2017 Applied Probability Society Best Publication Award.