Wednesday, October 6, 2021 - 4:00pm to 4:30pm
LIDS & Stats Tea
Sequential decision-making in a multi-agent environment arises frequently in the real world. We consider online learning in unknown Markov games: 1) a Markov game models the multi-player dynamic environment; 2) online learning means that we act as one of the players and play against the others (the "opponents"); 3) "unknown games" refers to the setup where the actions taken by the opponents are unobservable. The difficulty of the problem comes from three aspects: unknown environment dynamics, potentially adversarial opponents, and unobservable opponent actions. We show that in this challenging setting, achieving sublinear regret against the best response in hindsight is statistically hard. We then consider a weaker notion of regret, competing with the minimax value of the game, and present an algorithm that achieves sublinear regret. To our knowledge, this is the first sublinear regret bound for online learning in unknown Markov games. Importantly, the regret bound is independent of the size of the opponents' action spaces.
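To make the minimax-value benchmark concrete, here is a minimal sketch in a much simpler setting than the talk's: a single-state zero-sum matrix game (rock-paper-scissors, whose minimax value is 0) with full-information feedback, where a standard exponential-weights (Hedge) learner accumulates average payoff approaching the game's value. This is purely illustrative and is not the algorithm from the talk; in the talk's "unknown game" setting the opponents' actions are unobservable, which is precisely what makes the problem hard.

```python
import math
import random

# Payoff matrix for the row player in rock-paper-scissors (a zero-sum game).
# Its minimax value is 0: the uniform mixed strategy guarantees payoff >= 0.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def hedge_total_payoff(T=5000, eta=0.05, seed=0):
    """Run exponential weights (Hedge) for T rounds against a (here, uniformly
    random) opponent and return the learner's total expected payoff."""
    rng = random.Random(seed)
    w = [1.0, 1.0, 1.0]  # unnormalized weights over the three row actions
    total = 0.0
    for _ in range(T):
        s = sum(w)
        p = [x / s for x in w]            # current mixed strategy
        col = rng.randrange(3)            # opponent's action this round
        # Expected payoff of our mixed strategy against the observed column.
        total += sum(p[i] * A[i][col] for i in range(3))
        # Full-information update: here we assume the opponent's action is
        # observed, unlike the unknown-game setting of the talk.
        for i in range(3):
            w[i] *= math.exp(eta * A[i][col])
    return total

# Regret against the minimax value after T rounds is value * T - total payoff;
# for rock-paper-scissors the value is 0, so the average payoff per round
# should stay close to 0.
```

Competing with the minimax value is weaker than competing with the best response in hindsight: it only asks the learner to secure the payoff it could have guaranteed against a worst-case opponent, which is what makes sublinear regret achievable even without observing the opponents' actions.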
Joint work with Yuanhao Wang, Tiancheng Yu, and Suvrit Sra.
Yi Tian is a Ph.D. student at LIDS and EECS, advised by Prof. Suvrit Sra. His current research concerns sequential decision-making under uncertainty, and he is broadly interested in reinforcement learning and control.