Yuping Luo will present his FPO "Towards Designing Efficient and Effective Deep Model-based Reinforcement Learning Algorithms" on Tuesday, June 21, 2022 at 1:30 PM via Zoom.

Location: Zoom (https://princeton.zoom.us/j/94826666865)

The members of Yuping’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Karthik Narasimhan, Tengyu Ma (Stanford University)
Readers: Elad Hazan, Jason Lee

A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu if you would like one.
 
Everyone is invited to attend his talk. 
 
The abstract follows:
Recent advances in deep reinforcement learning have demonstrated its great potential for real-world problems. However, two concerns prevent reinforcement learning from being widely applied: efficiency and efficacy. This dissertation studies how to improve the efficiency and efficacy of reinforcement learning by designing deep model-based algorithms. Access to dynamics models empowers the algorithms to plan, which is key to sequential decision making. The dissertation covers four topics: online reinforcement learning, the expressivity of neural networks in deep reinforcement learning, offline reinforcement learning, and safe reinforcement learning. For online reinforcement learning, we present an algorithmic framework with theoretical guarantees that utilizes a lower bound on the performance that a policy learned in the learned environment can attain in the real environment, and we empirically verify the efficiency of the proposed method. For the expressivity of neural networks in deep reinforcement learning, we prove that in some scenarios model-based approaches can require much less representational power to approximate a near-optimal policy than model-free approaches, and we empirically show that this can be an issue in physical simulation environments and that a model-based planner can help. For offline reinforcement learning, we devise an algorithm that stays close to the provided expert demonstration set to reduce distribution shift, and we conduct experiments demonstrating that our method improves success rates in simulated robotics environments. For safe reinforcement learning, we propose a method that uses the learned dynamics model to certify safe states; our experiments show that it can learn a decent policy without a single safety violation during training on a set of simple but challenging tasks, while baseline algorithms incur hundreds of safety violations.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014