Yichen Chen will present his FPO "On the complexity of Markov decision problems" on Friday, 4/3/2020 at 10:45am via Zoom.
Link to Zoom Meeting: https://princeton.zoom.us/j/271401282

The members of his committee are as follows: Mengdi Wang (Adviser); Readers: Yuxin Chen and Karthik Narasimhan; Examiners: Sanjeev Arora, Robert Schapire, and Mengdi Wang.

A copy of his thesis is available upon request (please email ngotsis@cs.princeton.edu).  All are welcome to attend.  The talk abstract follows below.  

The Markov decision problem (MDP) is one of the most basic models for sequential decision-making in a dynamic environment where outcomes are partly random. It models a stochastic control process in which a planner makes a sequence of decisions as the system evolves. The MDP provides a mathematical framework for dynamic programming, stochastic control, and reinforcement learning. In this thesis, we study the complexity of solving MDPs.

In the first part of the thesis, we propose a class of stochastic primal-dual methods for solving MDPs. We formulate the core policy optimization problem of the MDP as a stochastic saddle point problem. By exploiting the value-policy duality structure, the algorithm samples state-action transitions and makes alternating updates to the value and policy estimates. We prove that our algorithm finds approximately optimal policies for Markov decision problems with small space and computational complexity. Using linear models to represent the value functions and the policies, our algorithm scales to problems with infinite and continuous state spaces.

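As a rough illustration of the value-policy saddle point idea (a generic sketch, not the thesis's exact algorithm), the snippet below shows one stochastic primal-dual update for a tabular discounted MDP: the primal variable v is a value estimate, the dual variable mu is a state-action occupancy (policy) estimate, and each iteration uses a single sampled transition. The function name and step sizes (primal_dual_step, alpha, beta) are hypothetical.

import numpy as np

def primal_dual_step(v, mu, s, a, s_next, r, gamma, alpha, beta):
    # Sampled Lagrangian term: mu[s, a] * (r + gamma * v[s_next] - v[s]).
    # (The (1 - gamma) * rho^T v term of the full Lagrangian is omitted
    # here for brevity.)
    td = r + gamma * v[s_next] - v[s]
    # Primal (value) descent step on the sampled term.
    v[s] += alpha * mu[s, a]
    v[s_next] -= alpha * gamma * mu[s, a]
    # Dual (occupancy/policy) ascent step: exponentiated gradient on the
    # sampled state-action pair, then renormalize onto the simplex.
    mu[s, a] *= np.exp(beta * td)
    mu /= mu.sum()
    return v, mu

The exponentiated-gradient form of the dual update is one common way to keep the occupancy estimate nonnegative and normalized; the methods developed in the thesis may differ in their exact updates and analysis.
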
In the second part of the thesis, we establish computational complexity lower bounds for solving MDPs. We prove our results by modeling MDP algorithms as branching programs and then characterizing the properties of these programs via quantum arguments. The analysis is also extended to the complexity of solving two-player turn-based Markov games. Our results show that, given a simulator that can sample according to the transition probability function in O(1) time, the lower bounds have a reduced dependence on the number of states. These results suggest a fundamental difference between Markov decision problems with and without a simulator.

We believe that our results provide a new piece of theoretical evidence for the
success of simulation-based methods in solving MDPs and Markov games.