Tuesday, June 27, 2017 - 4:00pm
Event Calendar Category
Other LIDS Events
We provide the first sublinear running-time upper bound and a nearly matching lower bound for the discounted Markov decision problem. In particular, we propose a randomized linear programming algorithm for approximating the optimal policy of a discounted Markov decision process (MDP). We show that it finds an ε-optimal policy in nearly linear running time in the worst case. For MDPs that are ergodic under every stationary policy, we show that the algorithm finds an ε-optimal policy in running time linear in the total number of state-action pairs, which is sublinear in the input size. We also establish a computational lower bound on the running time of any randomized algorithm for solving the MDP, which suggests that the complexity of the MDP depends on the data structure of the input. These results provide new complexity benchmarks for solving stochastic dynamic programs. The proposed algorithm extends to reinforcement learning and yields sharp PAC sample complexity bounds.
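To fix the objects in the abstract, here is a minimal sketch of a discounted MDP and what an ε-optimal policy means. This is the textbook value-iteration baseline, not the randomized linear programming algorithm of the talk; the toy transition matrices, rewards, and the discount factor are all illustrative assumptions. Each iteration touches every transition-probability entry, i.e., work proportional to the full input size, which is the cost the talk's sublinear algorithm avoids.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (all numbers illustrative).
# P[a][s, s'] = transition probability; R[s, a] = expected one-step reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.3, 0.7]],   # transitions under action 1
])
R = np.array([
    [1.0, 0.0],   # rewards in state 0 for actions 0, 1
    [0.0, 2.0],   # rewards in state 1 for actions 0, 1
])
gamma = 0.9       # discount factor
eps = 1e-6        # target accuracy

def epsilon_optimal_policy(P, R, gamma, eps):
    """Classic value iteration: iterate the Bellman operator until the
    residual guarantees the greedy policy is eps-optimal."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    # Standard stopping rule: ||v_{k+1} - v_k||_inf < eps*(1-gamma)/(2*gamma)
    # implies the policy greedy w.r.t. v_{k+1} is eps-optimal.
    thresh = eps * (1 - gamma) / (2 * gamma)
    while True:
        # q[a, s] = R[s, a] + gamma * sum_t P[a, s, t] * v[t]
        q = R.T + gamma * np.einsum('ast,t->as', P, v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < thresh:
            return q.argmax(axis=0), v_new
        v = v_new

policy, values = epsilon_optimal_policy(P, R, gamma, eps)
# For this toy instance the greedy policy is action 0 in state 0,
# action 1 in state 1.
```

Each sweep here costs O(|S|^2 |A|) arithmetic operations, linear in the size of the transition data; the sublinear guarantee in the abstract refers to beating exactly this kind of full pass over the input, under the stated ergodicity assumption.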
Mengdi Wang is interested in data-driven stochastic optimization and its applications in machine learning and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. She became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization from the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, and the NSF CAREER Award in 2017.