Yuping Luo will present his General Exam on Thursday, May 23, 2019, at 4pm in CS 301.

The members of his committee are Sanjeev Arora (adviser), Karthik Narasimhan, and Elad Hazan.

Everyone is invited to attend his talk; faculty wishing to remain for the oral exam that follows are welcome to do so. His abstract and reading list appear below.
Abstract: 
In recent years, model-free reinforcement learning has achieved significant success in many domains. However, these algorithms suffer from high sample complexity and require a massive number of interactions with the environment, which can be expensive in the physical world.

Model-based reinforcement learning is considered a promising approach to reducing sample complexity. However, the theoretical understanding of such methods has been rather limited. In this talk, I'll present an algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. We design a meta-algorithm with a guarantee of monotone improvement to a local maximum of the expected reward. The meta-algorithm iteratively builds a lower bound of the expected reward based on the estimated dynamical model and sample trajectories, and then maximizes the lower bound jointly over the policy and the model. Instantiating our framework with simplifications yields a variant of model-based RL algorithms, Stochastic Lower Bounds Optimization. Experiments demonstrate that our algorithm achieves state-of-the-art performance on a range of continuous control benchmark tasks when one million or fewer samples are permitted.
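
To make the loop described in the abstract concrete, here is a minimal toy sketch of a model-based iteration of this flavor: collect real trajectories, fit a dynamics model, and improve the policy against a surrogate lower bound built from the model and the data. It is only an illustration under stated assumptions, not the algorithm from the talk: the 1-D task, the least-squares linear model, the specific model-error penalty, and the random-search policy update are all assumptions introduced for this sketch.

# Minimal toy sketch of the model-based loop described in the abstract.
# Assumptions (not from the talk): a 1-D "drive the state to zero" task,
# a least-squares linear dynamics model, a linear policy a = k*s, and a
# surrogate "lower bound" = model-based return minus a model-error penalty,
# maximized by random search over the policy gain only.
import numpy as np

rng = np.random.default_rng(0)
HORIZON, N_TRAJ, N_ITERS = 20, 10, 15

def env_step(s, a):
    # True (unknown) dynamics: noisy single integrator; reward penalizes |s| and |a|.
    return s + 0.1 * a + 0.01 * rng.normal(), -(s ** 2) - 0.01 * (a ** 2)

def rollout(k):
    # Collect one real trajectory with the linear policy a = k * s.
    s, traj = float(rng.normal()), []
    for _ in range(HORIZON):
        a = k * s
        s_next, r = env_step(s, a)
        traj.append((s, a, r, s_next))
        s = s_next
    return traj

def fit_model(data):
    # Least-squares fit of s' ~ A*s + B*a from all observed transitions.
    X = np.array([[s, a] for s, a, _, _ in data])
    y = np.array([s_next for _, _, _, s_next in data])
    (A, B), *_ = np.linalg.lstsq(X, y, rcond=None)
    return A, B

def lower_bound(k, model, data):
    # Surrogate lower bound: return of the policy under the learned model,
    # minus a penalty for the model's one-step error on the real data
    # (a stand-in for the bound constructed in the talk).
    A, B = model
    starts = [s for s, _, _, _ in data[::HORIZON]]  # one start state per trajectory
    total = 0.0
    for s in starts:
        for _ in range(HORIZON):
            a = k * s
            total += -(s ** 2) - 0.01 * (a ** 2)
            s = A * s + B * a
    err = np.mean([(s_next - (A * s + B * a)) ** 2 for s, a, _, s_next in data])
    return total / len(starts) - 100.0 * err

k, data = -0.5, []  # initial policy gain and empty replay data
for it in range(N_ITERS):
    # 1) Interact with the real environment using the current policy.
    for _ in range(N_TRAJ):
        data += rollout(k)
    # 2) Estimate the dynamics model from the collected transitions.
    model = fit_model(data)
    # 3) Maximize the surrogate lower bound over the policy (random search here;
    #    the talk's meta-algorithm optimizes policy and model jointly).
    best_k, best_lb = k, lower_bound(k, model, data)
    for cand in k + 0.2 * rng.normal(size=20):
        lb = lower_bound(cand, model, data)
        if lb > best_lb:
            best_k, best_lb = cand, lb
    k = best_k
    real_return = sum(r for _, _, r, _ in rollout(k))
    print(f"iter {it:2d}  lower_bound {best_lb:9.3f}  real return {real_return:9.3f}")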

Reading List:
[1] Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, and Pieter Abbeel. "Model-Ensemble Trust-Region Policy Optimization." arXiv preprint arXiv:1802.10592, 2018.
[2] Richard S. Sutton. "Dyna, an Integrated Architecture for Learning, Planning, and Reacting." ACM SIGART Bulletin 2(4):160-163, 1991.
[3] Shixiang Gu et al. "Continuous Deep Q-Learning with Model-Based Acceleration." International Conference on Machine Learning, 2016.
[4] Jacob Buckman et al. "Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion." Advances in Neural Information Processing Systems, 2018.
[5] Vladimir Feinberg et al. "Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning." arXiv preprint arXiv:1803.00101, 2018.
[6] Razvan Pascanu et al. "Learning Model-Based Planning from Scratch." arXiv preprint arXiv:1707.06170, 2017.
[7] Ignasi Clavera et al. "Model-Based Reinforcement Learning via Meta-Policy Optimization." arXiv preprint arXiv:1809.05214, 2018.
[8] Kurtland Chua et al. "Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models." Advances in Neural Information Processing Systems, 2018.
[9] Kavosh Asadi, Dipendra Misra, and Michael L. Littman. "Lipschitz Continuity in Model-Based Reinforcement Learning." arXiv preprint arXiv:1804.07193, 2018.
[10] Amir-massoud Farahmand, Andre Barreto, and Daniel Nikovski. "Value-Aware Loss Function for Model-Based Reinforcement Learning." Artificial Intelligence and Statistics, 2017.