Yichen Chen will present his general exam on Monday, May 16, 2016 at 12pm in CS302
Yichen Chen will present his general exam on Monday, May 16, 2016 at 12pm in CS302 The members of his committee are Han Liu, adviser (ORFE), Mengdi Wang (ORFE), Elad Hazan, and Sanjeev Arora Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below. Title:Random multi-constraint projection and an online primal-dual method for discounted markovian decision process Abstract: In this talk I will first discuss the problem of solving an optimization problem subject to large number of constraints. Usual approach like stochastic gradient descent fails in the sense that the projection step is too computationally expensive. We propose a class of algorithms that perform both stochastic gradient descent and random feasibility updates simultaneously. Our algorithm does not subject to the heavy computation of projection. I will show the almost sure convergence of these algorithms, and analyze the iterates’ feasibility error and optimality error, respectively. The rate analysis and numerical experiments reveal that the algorithm using the polyhedral-set projection scheme is the most efficient one within known algorithms. Then I will discuss an on going work that uses online primal-dual method to recover the optimal policy under Markovian Decision Process. We focus on the black-box model where transition probabilities and state transition cost are unknown. We propose a stochastic primal-dual algorithm for solving the linear formulation of the Bellman equation. The algorithm updates the primal and dual iterates by using sample state transitions and sample costs generated by the simulator. We show that the t-th averaged dual iterate can be implemented as a near-optimal randomized policy, achieving an efficiency loss of the order 1/ sqrt(t). Moreover, we provide a thresholding procedure that recovers the exact optimal policy from the dual iterates with high probability. Book: Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning, volume 135. MIT Press Cambridge, 1998. Reading List: [1] Dimitri P Bertsekas and John N Tsitsiklis. Neuro-dynamic programming: an overview. In Decision and Control, 1995., Proceedings of the 34th IEEE Conference on, volume 1, pages 560–564. IEEE, 1995. [2] Nicolo Cesa-Bianchi, Ofer Dekel, and Ohad Shamir. Online learning with switching costs and other adaptive adversaries. In Advances in Neural Information Processing Systems, pages 1160–1168, 2013. [3] Randy Cogill. Primal-dual algorithms for discounted markov decision pro- cesses. In Control Conference (ECC), 2015 European, pages 260–265. IEEE, 2015. [4] Daniela Pucci de Farias and Benjamin Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850–865, 2003. [5] Vijay V Desai, Vivek F Farias, and Ciamac C Moallemi. Approximate dy- namic programming via a smoothed linear program. Operations Research, 60(3):655–674, 2012. [6] Luc Devroye, Gabor Lugosi, and Gergely Neu. Random-walk perturba- tions for online combinatorial optimization. Information Theory, IEEE Transactions on, 61(7):4099–4106, 2015. [7] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016. [8] John N Tsitsiklis and Benjamin Van Roy. An analysis of temporal- difference learning with function approximation. Automatic Control, IEEE Transactions on, 42(5):674–690, 1997. [9] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992. [10] Yinyu Ye. The simplex and policy-iteration methods are strongly polyno- mial for the markov decision problem with a fixed discount rate. Mathe- matics of Operations Research, 36(4):593–603, 2011.
participants (1)
-
Nicki Gotsis