Regret of Queueing Bandits

Monday, September 17, 2018 - 4:00pm to Tuesday, September 18, 2018 - 4:55pm

Event Calendar Category

LIDS Seminar Series

Speaker Name

Sanjay Shakkottai

Affiliation

University of Texas, Austin

Building and Room Number

32-155

We consider a variant of the multiarmed bandit (MAB) problem where jobs or tasks queue for service, and service rates of different servers (agents) may be unknown. Such (queueing+learning) problems are motivated by a vast range of service systems, including supply and demand in online platforms (e.g., Uber, Lyft, Airbnb, Upwork, etc.), order flow in financial markets (e.g., limit order books), communication systems, and supply chains.

We study algorithms that minimize queue-regret: the expected difference between the queue-lengths (backlogs) obtained by the algorithm, and those obtained by a genie-aided matching algorithm that knows exact service rates. A naive view of this problem would suggest that queue-regret could grow logarithmically: since queue-regret cannot be larger than classical regret, results for the standard MAB problem give algorithms that ensure queue-regret increases no more than logarithmically in time. Our work shows surprisingly more complex behavior — specifically, the optimal queue-regret decreases with time and scales as O(1/t). We next consider holding-cost regret in multi-class (multiple types of tasks) multi-server (servers/agents have task-type dependent service rate) systems. Holding costs correspond to a system where a linear cost (with respect to time spent in the queue) is incurred for each incomplete task. We consider learning-based variants of the c-mu rule – a classic and well-studied scheduling policy that is used when server/agent service rates are known. We develop algorithms that result in constant expected holding-cost regret (independent of time). The key insight that allows such a regret bound is that service systems we consider exhibit explore-free learning, where no penalty is (eventually) incurred for exploring and learning server/agent rates. We finally discuss the implications of our results on building platforms for matching tasks to servers/agents.

Based on joint work with Subhashini Krishnasamy, Rajat Sen, Ari Arapostathis and Ramesh Johari.

Sanjay Shakkottai received his Ph.D. from the ECE Department at the University of Illinois at Urbana-Champaign in 2002. He is with The University of Texas at Austin, where he is currently the Ashley H. Priddy Centennial Professor in Engineering, the Director of the Wireless Networking and Communications Group (WNCG), and a Professor in the Department of Electrical and Computer Engineering. He received the NSF CAREER award in 2004, and was elected as an IEEE Fellow in 2014. His research interests lie at the intersection of algorithms for resource allocation, statistical learning and networks, with applications to wireless communication networks and online platforms.