Yuping Luo will present his Pre-FPO, "On Designing Efficient and Effective Deep Model-based Reinforcement Learning Algorithms," on Friday, February 4, 2022, at 3pm via Zoom.

Zoom link: https://princeton.zoom.us/j/93379363621?pwd=QWFiV0JZT3VwTXVacnc4SEh5UVErdz09

Committee:
Sanjeev Arora (advisor, examiner)
Karthik Narasimhan (examiner)
Tengyu Ma (examiner, Stanford)
Elad Hazan (reader)
Jason Lee (reader, ECE)

All are welcome to attend.

Abstract: 
Reinforcement learning (RL) enables an agent to make decisions in a complex interactive environment. To handle large state spaces, deep reinforcement learning employs neural networks as powerful function approximators. Recent advances in deep reinforcement learning have demonstrated its great potential for real-world problems. However, two concerns prevent RL from being widely applied: efficiency and efficacy. Can we design algorithms that learn a policy with desired properties (effectively) from limited information (efficiently)? Such properties include high total rewards, safety, and interpretability. To this end, model-based reinforcement learning algorithms use a dynamics model, whether learned or given, to assist policy learning. Access to a dynamics model empowers the algorithm to plan, which is key to sequential decision making.
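As a concrete illustration of how a dynamics model enables planning, here is a minimal sketch of random-shooting model predictive control in Python. This is not code from the talk; the dynamics and reward callables are hypothetical stand-ins for a learned or given model.

import numpy as np

def plan_action(dynamics, reward, state, horizon=10, n_candidates=100,
                action_dim=2, rng=None):
    # Sample random action sequences, roll each one out in the model
    # (not the real environment), and return the first action of the
    # sequence with the highest predicted return.
    rng = rng or np.random.default_rng()
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            total += reward(s, a)   # model-predicted reward
            s = dynamics(s, a)      # model-predicted next state
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action        # execute one action, then replan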

In this talk, I will present our work on improving the efficiency and efficacy of reinforcement learning by designing model-based algorithms. I'll first talk about our work on offline reinforcement learning, which makes efficient use of expert demonstrations. The key idea is to learn a conservatively extrapolated value function together with a dynamics model that makes local corrections, so that the policy stays close to the demonstrations and achieves high total rewards. Next, I'll talk about our work on effectively learning a policy under safety constraints during training. The key idea is to learn barrier certificates along with a dynamics model to certify safe states, so that unsafe states can be avoided.
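As a rough illustration of the barrier-certificate idea, a learned certificate h (with h(s) >= 0 certifying state s as safe) and a dynamics model f can be used to filter actions during training. This is only a hedged sketch with hypothetical interfaces, not the actual algorithm presented in the talk.

def safe_action(h, f, state, proposed_action, backup_policy):
    # Accept the proposed action only if the model predicts the next
    # state stays inside the certified safe set {s : h(s) >= 0};
    # otherwise fall back to a known-safe policy.
    next_state = f(state, proposed_action)
    if h(next_state) >= 0.0:
        return proposed_action
    return backup_policy(state)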