Riley Simmons-Edler will present his FPO "Overcoming Sampling and Exploration Challenges in Deep Reinforcement Learning" on Tuesday, May 3, 2022 at 5:00 PM in CS 402 and via Zoom.

Location: CS 402 and Zoom: https://princeton.zoom.us/j/91838854729?pwd=MXhZOGtiem8wQkoyMjA4Z3hJNWxVQT09

The members of Riley’s committee are as follows:
Examiners: H. Sebastian Seung (Adviser), Olga Russakovsky, Szymon Rusinkiewicz
Readers: Karthik Narasimhan, Daniel Lee (Cornell Tech)

A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu if you would like one.
 
Everyone is invited to attend his talk. 
 
The abstract follows:
The combination of deep neural networks with reinforcement learning (RL) shows great promise for solving otherwise intractable learning tasks. However, practical demonstrations of deep reinforcement learning remain scarce. The challenges in using deep RL for a given task can be grouped into two broad categories: "What to learn from experience?" and "What experience to learn from?" In this thesis, I describe work addressing the second category: specifically, the problem of sampling actions, states, and trajectories that contain information relevant to the learning task. I examine this challenge at three levels of algorithm design and task complexity, ranging from algorithmic components to hybrid combination algorithms that break common RL conventions.

In the first chapter, I describe work on stable and efficient sampling of actions that maximize a Q-function over continuous-valued actions. By combining a sample-based optimizer with neural network approximation, it is possible to obtain stable training, computational efficiency, and precise inference.
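As a rough illustration of the general idea (a sample-based optimizer used to select actions against a learned Q-function), here is a minimal sketch of cross-entropy-style action selection over a continuous action space. The Q-function interface, dimensions, and hyperparameters are placeholders for illustration, not the thesis's implementation.

import numpy as np

def sample_based_action(q_function, state, action_dim, iters=3, pop=64, elite_frac=0.1):
    """Pick an action for `state` by sample-based maximization of q_function.
    q_function(state, actions) -> array of Q-values, one per candidate action.
    (Placeholder interface; a real version would batch a neural-network Q-function.)"""
    mean = np.zeros(action_dim)
    std = np.ones(action_dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        # Sample a population of candidate actions around the current distribution.
        candidates = np.clip(mean + std * np.random.randn(pop, action_dim), -1.0, 1.0)
        scores = q_function(state, candidates)
        elite = candidates[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the highest-scoring candidates.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Toy usage with a dummy quadratic "Q-function"; a trained Q-network would replace this.
dummy_q = lambda s, a: -np.sum((a - 0.3) ** 2, axis=1)
print(sample_based_action(dummy_q, state=None, action_dim=2))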

In the second chapter, I describe work on reward-aware exploration: discovering desirable behaviors in settings where common sampling methods are insufficient. A teacher "exploration" agent discovers states and trajectories that maximize how much a student "exploitation" agent learns from those experiences, which can enable the student agent to solve hard tasks it could not otherwise solve.
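One way to read the teacher/student split, in a deliberately toy form that is not the thesis's algorithm: the teacher is rewarded by the student's measured learning progress on the experience the teacher gathers. The sketch below uses a 10-state estimation task and tabular updates purely to make that reward signal concrete.

import numpy as np

rng = np.random.default_rng(0)
true_values = rng.normal(size=10)      # toy "task": 10 states with unknown values
student = np.zeros(10)                 # student's value estimates
teacher_pref = np.zeros(10)            # teacher's preference for visiting each state
lr, teacher_lr = 0.5, 0.1

for step in range(200):
    # Teacher "explores": samples a state it believes is informative for the student.
    probs = np.exp(teacher_pref) / np.exp(teacher_pref).sum()
    s = rng.choice(10, p=probs)
    # Student "exploits": updates its estimate from the sampled experience.
    error_before = abs(true_values[s] - student[s])
    student[s] += lr * (true_values[s] - student[s])
    error_after = abs(true_values[s] - student[s])
    # Teacher is rewarded by the student's learning progress on that experience.
    teacher_pref[s] += teacher_lr * (error_before - error_after)

print(np.abs(true_values - student).mean())  # student error shrinks under teacher-guided sampling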

In the third chapter, I describe work combining reinforcement learning with heuristic search for task domains where the transition model is known but the combinatorics of the state space are intractable for traditional search. By combining deep Q-learning with a best-first tree search algorithm, it is possible to find solutions to program synthesis problems with dramatically fewer samples than either common search algorithms or RL alone.
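To illustrate the combination in miniature (not the thesis's system), the sketch below runs a best-first search whose node priority comes from a value estimate; the `q_value` scorer here is a hand-written stand-in for what would be a trained deep Q-network over partial programs, and the "synthesis" task is a toy integer-construction problem.

import heapq

def q_guided_best_first(start, is_goal, expand, q_value, max_nodes=10000):
    """Best-first search where node priority comes from a (learned) Q-value.
    q_value is a placeholder scoring function standing in for a deep Q-network."""
    frontier = [(-q_value(start), start)]   # max-heap via negated scores
    visited = {start}
    while frontier and len(visited) <= max_nodes:
        _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node
        for child in expand(node):
            if child not in visited:
                visited.add(child)
                heapq.heappush(frontier, (-q_value(child), child))
    return None

# Toy usage: "synthesize" the number 37 from 1 using +1 and *2 steps.
target = 37
result = q_guided_best_first(
    start=1,
    is_goal=lambda n: n == target,
    expand=lambda n: [n + 1, n * 2],
    q_value=lambda n: -abs(target - n),     # stand-in for a learned value estimate
)
print(result)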

Lastly, I conclude with a summary of the major takeaways of this work and discuss extensions and future directions for efficient sampling in RL.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014