Thursday, July 17, 2025 - 3:00pm
Event Calendar Category
Other LIDS Events
Speaker Name
Dimitri Bertsekas
Affiliation
MIT LIDS and Arizona State University
Building and Room number
45-500A
Join Zoom meeting
In this lecture, we address multiagent control problems, where the overall control action consists of multiple components, each selected, conceptually, by a different agent. We assume that all agents share a common objective function, and we initially consider the case where they also share full information.
Our focus is on rollout and approximate policy iteration methods that can handle the extremely large control space that arises even with a moderate number of agents. To manage this complexity, we reformulate the problem to trade off control space size for increased state space complexity. Based on this reformulation, we introduce a multiagent rollout algorithm that uses a new policy improvement approach: at each decision stage, agents take turns performing local improvements to a base policy, using coordinating information from the others. This sequential structure ensures that the total computational cost grows linearly with the number of agents. In contrast, standard rollout methods require computation that grows exponentially with the number of agents.
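The agent-by-agent improvement idea can be illustrated with a minimal sketch at a single decision stage. The cost function, action sets, and base policy below are hypothetical toy choices, not from the lecture; the point is only the sequential structure, where each agent optimizes its own component while earlier agents' choices are fixed and later agents follow the base policy, so the number of cost evaluations grows linearly rather than exponentially in the number of agents.

```python
def multiagent_rollout_step(cost, base_action, action_sets):
    """One stage of multiagent rollout (toy sketch).

    Agents improve the base action one at a time: agent i minimizes the
    cost with components 0..i-1 fixed at their already-improved values
    and components i+1..m-1 held at the base policy.  Evaluations grow
    linearly with the number of agents, versus exhaustive search over
    the joint action space, which grows exponentially.
    """
    action = list(base_action)
    for i, choices in enumerate(action_sets):
        action[i] = min(
            choices,
            key=lambda u: cost(action[:i] + [u] + action[i + 1:]),
        )
    return action

# Hypothetical example: 4 agents, each choosing 0 or 1; the cost
# penalizes deviation from a target pattern plus a small action cost.
target = [1, 0, 1, 1]
def cost(u):
    return sum((a - t) ** 2 for a, t in zip(u, target)) + 0.1 * sum(u)

base = [0, 0, 0, 0]  # base policy: every agent plays 0
improved = multiagent_rollout_step(cost, base, [[0, 1]] * 4)
assert cost(improved) <= cost(base)  # each agent's step can only lower cost
print(improved)
```

Because each agent's choice set includes its base component, every local minimization weakly decreases the stage cost, so the improved joint action is never worse than the base action; the lecture's stronger result is that the full rollout policy improves on the base policy over the entire horizon.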
Despite the substantial reduction in computation, our multiagent rollout algorithm preserves a key property of standard rollout: it guarantees an improvement over the base policy.
We also consider the more challenging setting where agents do not fully share their information. In this case, we discuss modified multiagent rollout schemes that allow agents to operate autonomously by relying on precomputed policies and limited signaling.
Dimitri Bertsekas' undergraduate studies were in engineering at the National Technical University of Athens, Greece. He obtained his MS in electrical engineering at George Washington University, Washington, DC, in 1969, and his PhD in system science in 1971 at the Massachusetts Institute of Technology (M.I.T.).
Dr. Bertsekas has held faculty positions with the Engineering-Economic Systems Dept., Stanford University (1971-1974) and the Electrical Engineering Dept. of the University of Illinois, Urbana (1974-1979). From 1979 to 2019 he was with the Electrical Engineering and Computer Science Department of M.I.T., where he served as McAfee Professor of Engineering. Since 2019 he has been Fulton Professor of Computational Decision Making and a full-time faculty member at the School of Computing and Augmented Intelligence at Arizona State University (ASU), Tempe. He has served as a consultant to various private companies, and as an editor for several scientific journals. In 1995 he founded a publishing company, Athena Scientific, which has published, among others, all of his books since that time. In 2023 he was appointed Chief Scientific Advisor of Bayforest Technologies, a London-based quantitative investment company.
Professor Bertsekas' research spans several fields, including optimization, control, large-scale computation, reinforcement learning, and artificial intelligence, and is closely tied to his teaching and book authoring activities. He has written numerous research papers, and twenty books and research monographs, several of which are used as textbooks in MIT and ASU classes.
Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming", the 2001 ACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 ACC Richard E. Bellman Control Heritage Award for "contributions to the foundations of deterministic and stochastic optimization-based methods in systems and control," the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, the SIAM/MOS 2015 George B. Dantzig Prize, and the 2022 IEEE Control Systems Award. Together with his coauthor John Tsitsiklis, he was awarded the 2018 INFORMS John von Neumann Theory Prize, for the contributions of the research monographs "Parallel and Distributed Computation" and "Neuro-Dynamic Programming". In 2001, he was elected to the United States National Academy of Engineering for "pioneering contributions to fundamental research, practice and education of optimization/control theory ..."
Dr. Bertsekas' recent books are "Introduction to Probability: 2nd Edition" (2008), "Convex Optimization Theory" (2009), "Dynamic Programming and Optimal Control," Vol. I (2017) and Vol. II (2012), "Convex Optimization Algorithms" (2015), "Nonlinear Programming" (2016), "Reinforcement Learning and Optimal Control" (2019), "Rollout, Policy Iteration, Distributed Reinforcement Learning" (2020), "Abstract Dynamic Programming" (2022, 3rd edition), "Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control" (2022), and "A Course in Reinforcement Learning" (2023), all published by Athena Scientific.