Sequential Decision-Making in Non-Stationary Environments

Wednesday, May 19, 2021 - 4:00pm to 4:30pm

Event Calendar Category

LIDS & Stats Tea

Speaker Name

Ruihao Zhu



Zoom meeting id

966 2455 2278


Motivated by non-stationary environments in online advertising, dynamic pricing, and inventory control, we introduce data-driven decision-making algorithms that achieve state-of-the-art regret bounds for bandit optimization and reinforcement learning settings.

For bandit optimization, we show how the challenge posed by non-stationarity can be overcome by an unconventional marriage of stochastic and adversarial bandit algorithms. It is well known that the optimal regret bound can be achieved when the variation budget, which quantifies the total amount of temporal change in the latent environment, is known in advance. Recovering this bound without knowing the variation budget, however, remained an open problem until recently. We propose a novel Bandit-over-Bandit framework that uses an independent adversarial bandit algorithm to adaptively tune the parameter of a base stochastic bandit algorithm, and we show that this framework enjoys a nearly optimal regret bound. Numerical experiments on a dataset from an online auto-loan company demonstrate that our proposed algorithms outperform existing algorithms.
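
To make the idea of one bandit tuning another concrete, here is a minimal Python sketch. It is not the speaker's implementation. It assumes a simple K-armed Bernoulli setting, uses sliding-window UCB as the base stochastic algorithm, and uses EXP3 as the adversarial algorithm that picks the window size for each block of rounds. The function names, the block length, the candidate window sizes, the exploration rate `gamma`, and the toy reward function are all hypothetical choices made for illustration.

```python
import math
import random


def sliding_window_ucb(reward_fn, n_arms, start, length, window, rng):
    """Run sliding-window UCB for `length` rounds; return the total reward.

    Only the last `window` observations are used to build the indices,
    so older data from a changed environment is forgotten.
    """
    history = []  # (arm, reward) pairs observed in this block
    total = 0.0
    for step in range(length):
        recent = history[-window:]
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, r in recent:
            counts[arm] += 1
            sums[arm] += r
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]  # pull any arm with no data in the window
        else:
            log_n = math.log(len(recent))
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * log_n / counts[a]),
            )
        r = reward_fn(arm, start + step, rng)
        history.append((arm, r))
        total += r
    return total


def bandit_over_bandit(reward_fn, n_arms, horizon, block_len, windows,
                       gamma=0.1, seed=0):
    """EXP3 (outer layer) selects a window size for each block of rounds;
    sliding-window UCB (inner layer) is restarted in every block."""
    rng = random.Random(seed)
    k = len(windows)
    weights = [1.0] * k
    total, chosen = 0.0, []
    for start in range(0, horizon, block_len):
        length = min(block_len, horizon - start)
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / k for w in weights]
        j = rng.choices(range(k), weights=probs)[0]
        block_reward = sliding_window_ucb(
            reward_fn, n_arms, start, length, windows[j], rng)
        total += block_reward
        chosen.append(windows[j])
        # Importance-weighted EXP3 update; rewards lie in [0, 1], so the
        # average block reward is a valid [0, 1] payoff for the outer layer.
        estimate = (block_reward / length) / probs[j]
        weights[j] *= math.exp(gamma * estimate / k)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    return total, chosen


def drifting_bernoulli(arm, t, rng):
    """Toy environment (assumed): the better arm swaps every 500 rounds."""
    means = (0.8, 0.2) if (t // 500) % 2 == 0 else (0.2, 0.8)
    return 1.0 if rng.random() < means[arm] else 0.0


if __name__ == "__main__":
    reward, picks = bandit_over_bandit(
        drifting_bernoulli, n_arms=2, horizon=2000,
        block_len=100, windows=[10, 30, 100])
    print(reward, picks)
```

The outer layer never needs the variation budget: it treats each candidate window size as an arm and learns which one earns the most reward per block, which is the role the abstract assigns to the adversarial bandit algorithm.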

For reinforcement learning, we show that simply adopting techniques from bandit optimization might lead to linear regret, despite the similarity between the two settings. We overcome this challenge with a novel confidence-widening technique that incorporates extra optimism into the learning algorithm to ensure low regret. We demonstrate the power of our technique by applying it to the stochastic inventory model with lost sales, fixed cost, and zero lead time.


Ruihao Zhu is currently a 5th-year candidate for the Interdisciplinary Ph.D. in Statistics at the MIT Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information & Decision Systems (LIDS). He has the pleasure of being advised by David Simchi-Levi. Previously, Ruihao received his B.Eng. degrees in Electrical Engineering and Computer Science from the University of Michigan.

Ruihao's research seeks to help organizations improve decision-making in uncertain and dynamically changing environments by developing data-driven models and analytics methodologies. He currently focuses primarily on statistical learning, with applications in revenue management and supply chain management. Some of his research projects are carried out in collaboration with companies across different industries, such as consumer packaged goods and medical device manufacturing.