Yuping Luo will present his Pre-FPO, "On Designing Efficient and Effective Deep Model-based Reinforcement Learning Algorithms," on Friday, February 4, 2022 at 3pm via Zoom.

Zoom link: https://princeton.zoom.us/j/93379363621?pwd=QWFiV0JZT3VwTXVacnc4SEh5UVErdz09

Committee:
Sanjeev Arora (advisor, examiner)
Karthik Narasimhan (examiner)
Tengyu Ma (examiner, Stanford)
Elad Hazan (reader)
Jason Lee (reader, ECE)

Tengyu Ma is an outside committee member and an Assistant Professor at Stanford University. He graduated from Princeton University, where he was also advised by Prof. Sanjeev Arora (my advisor). I have collaborated with him extensively and think he is a great fit for my committee.

All are welcome to attend.

Abstract:
Reinforcement learning (RL) enables an agent to make decisions in a complex interactive environment. To deal with large state spaces, deep reinforcement learning employs neural networks as powerful function approximators. Recent advances in deep reinforcement learning have demonstrated its great potential for real-world problems. However, two concerns prevent RL from being widely applied: efficiency and efficacy. Can we design algorithms that learn a policy with a desired property (effectively) from limited information (efficiently)? The property can be high total reward, safety, or interpretability. To this end, model-based reinforcement learning algorithms use a dynamics model, whether learned or given, to assist policy learning. Access to a dynamics model empowers an algorithm to plan, which is key to sequential decision making.

In this talk, I will present our work on improving the efficiency and efficacy of reinforcement learning by designing model-based algorithms. I will first discuss our work on offline reinforcement learning, which makes efficient use of expert demonstrations. The key idea is to learn a conservatively-extrapolated value function and a dynamics model to make local corrections, so that the policy stays close to the demonstrations and achieves high total reward. Next, I will discuss our work on effectively learning a policy with safety constraints during training. The key idea is to learn barrier certificates and a dynamics model to certify safe states so that unsafe states can be avoided.