Riley Simmons-Edler will present his Pre-FPO, "Overcoming Sampling and Exploration Challenges in Deep Reinforcement Learning," on Friday, March 12, 2021 at 4pm via Zoom.
Zoom link: https://princeton.zoom.us/j/92165806242?pwd=VWhvWW1DZXpqWUsvY0pVclE1eklVZz09

Committee:
Sebastian Seung (advisor, examiner)
Olga Russakovsky (examiner)
Szymon Rusinkiewicz (examiner)
Daniel Lee (external reader, Cornell)
Karthik Narasimhan (reader)

Talk abstract follows below. All are welcome to attend.

Abstract: The combination of deep neural networks with the algorithms and formalisms of reinforcement learning (RL) shows great promise for solving otherwise intractable learning tasks. However, practical applications of deep reinforcement learning remain scarce. The outstanding challenges of deep RL can be grouped into two broad categories: "What to learn from experiences?" and "What experiences to learn from?" In this thesis, I describe the work I have done to address the second category, historically the less-studied of the two. Specifically, I address the problem of sampling actions, states, or trajectories that contain information sufficient for learning a task. I examine this challenge at three levels of algorithm design and task complexity, from algorithmic submodules, to combinations of algorithms, to hybrid algorithms that violate common RL conventions.

First, I will present my work on stable and efficient sampling of actions that optimize a Q-function over continuous-valued actions. By combining a sample-based optimizer with neural network approximation, it is possible to obtain stability in training as well as computational efficiency and precision at inference.

Second, I will present my work on reward-aware exploration, the discovery of highly rewarding states in tasks where commonly used sampling methods are insufficient. A teacher "exploration" agent can discover states and trajectories that maximize how much a student "exploitation" agent learns from those experiences, enabling the student to solve hard tasks that are otherwise impossible for it.

Third, I will present my work combining reinforcement learning with heuristic search, for use in task domains where the transition model is known but the combinatorics of the state space are intractable for traditional search algorithms. I show that by combining deep Q-learning with a best-first tree search algorithm, it is possible to find solutions to simple program synthesis problems with dramatically fewer samples than common search algorithms require.

Lastly, I will conclude with a summary of the major takeaways of this work and discuss potential extensions and future directions for the efficient sampling of useful experiences in RL.
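As a concrete illustration of the first contribution, the following is a minimal sketch of sample-based maximization of a Q-function over continuous actions using the cross-entropy method. It is an assumption about one way such an optimizer can be structured, not the speaker's implementation; the function name cem_argmax_action and the generic q_fn callable are hypothetical.

import numpy as np

def cem_argmax_action(q_fn, state, action_dim, bounds=(-1.0, 1.0),
                      n_samples=64, n_elite=8, n_iters=4):
    """Approximate argmax_a Q(state, a) with the cross-entropy method.

    q_fn: callable (state, actions[n_samples, action_dim]) -> scores[n_samples].
    Illustrative sketch only.
    """
    low, high = bounds
    mean = np.zeros(action_dim)
    std = np.full(action_dim, (high - low) / 2.0)
    for _ in range(n_iters):
        # Sample candidate actions from the current Gaussian, clipped to bounds.
        actions = np.clip(
            np.random.normal(mean, std, size=(n_samples, action_dim)), low, high)
        scores = q_fn(state, actions)
        # Refit the sampling distribution to the highest-scoring candidates.
        elite = actions[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean  # approximate maximizer of Q(state, .)

# Toy usage with a placeholder Q-function (purely illustrative):
q_fn = lambda s, a: -np.sum((a - 0.3) ** 2, axis=-1)
print(cem_argmax_action(q_fn, state=None, action_dim=2))  # approx. [0.3, 0.3]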