Yuping Luo will present his Pre-FPO, "On Designing Efficient and Effective Deep Model-based Reinforcement Learning Algorithms," on Friday, February 4, 2022 at 3pm via Zoom.

Zoom link: https://princeton.zoom.us/j/93379363621?pwd=QWFiV0JZT3VwTXVacnc4SEh5UVErdz09

Committee:
Sanjeev Arora (advisor, examiner)
Karthik Narasimhan (examiner)
Tengyu Ma (examiner, Stanford)
Elad Hazan (reader)
Jason Lee (reader, ECE)

Tengyu Ma is an outside committee member and an Assistant Professor at Stanford University. He graduated from Princeton University, where he was also advised by Prof. Sanjeev Arora (my advisor). I have collaborated with him extensively and think he is a great fit for my committee.

All are welcome to attend.

Abstract:
Reinforcement learning (RL) enables an agent to make decisions in a complex interactive environment. To deal with large state spaces, deep reinforcement learning employs neural networks as powerful function approximators. Recent advances in deep reinforcement learning have demonstrated its great potential for real-world problems. However, two concerns prevent RL from being widely applied: efficiency and efficacy. Can we design algorithms that learn a policy with a desired property (effectively) from limited information (efficiently)? The property can be high total reward, safety, or interpretability. To this end, model-based reinforcement learning algorithms use a dynamics model, whether learned or given, to assist policy learning. Access to a dynamics model empowers an algorithm to plan, which is key to sequential decision making.

In this talk, I will present our work on improving the efficiency and efficacy of reinforcement learning by designing model-based algorithms. I will first discuss our work on offline reinforcement learning, which makes efficient use of expert demonstrations. The key idea is to learn a conservatively-extrapolated value function and a dynamics model to make local corrections, so that the policy stays close to the demonstrations and achieves high total reward. Next, I will discuss our work on effectively learning a policy with safety constraints during training. The key idea is to learn barrier certificates and a dynamics model to certify safe states so that unsafe states can be avoided.