Wednesday, November 30, 2022 - 4:00pm
Event Calendar Category
LIDS & Stats Tea
ORC & LIDS
Building and Room Number
We study the question of learnability for contextual bandits when the reward function class is unrestricted, and provide consistent algorithms for large families of data-generating processes. Our analysis shows that achieving consistency irrespective of the reward function (referred to as universal consistency) is possible for large classes of non-i.i.d. processes. We give tight characterizations of learnable processes for stationary contextual bandits in this context, and provide algorithms that achieve universal consistency whenever possible (referred to as optimistically universal algorithms). These enjoy the strong property that if they fail to be universally consistent, then no other algorithm would be either. In particular, for finite action spaces, we show that learning with contextual bandits can be done without generalizability cost compared to the full-feedback case, i.e., standard supervised learning.
Moïse Blanchard is a fourth-year Ph.D. student in ORC and LIDS, advised by Prof. Patrick Jaillet. His research focuses on online decision-making problems and, more generally, learnability in machine learning.