[talks] U Syed preFPO

Fri Feb 13 11:00:09 EST 2009

Umar Syed will present his preFPO on Friday February 20 at 10:30AM in Room 302.
The members of his committee are: Rob Schapire, advisor;  David Blei and Warren
Powell (ORF), readers;  Michael Littman (Rutgers) and Yael Niv (PSY), non-readers.
Everyone is invited to attend his talk.  His abstract follows below.
---------------------------
Title: Algorithms for Hybrid Supervised/Reinforcement Learning Problems

Abstract:

Reinforcement and supervised learning are two different approaches to machine learning.
To illustrate their differences, consider the task of learning to drive a car. A
reinforcement learning algorithm learns how to drive by trying several different driving
behaviors, and uses feedback from the environment to decide which behavior is best. The
feedback is delivered in the form of rewards, which reflect the desirability of various
events such as crashing, staying on the road, etc. By contrast, a supervised learning
algorithm passively observes an expert driver, and attempts to imitate his driving
behavior. In general, reinforcement learning algorithms are concerned with maximizing
reward, and supervised learning algorithms are concerned with matching observed behavior.

In this talk, I will discuss learning algorithms that combine aspects of both
reinforcement and supervised learning, leveraging the strengths of each approach.
Specifically, I will describe algorithms for (i) using observations from an expert to
learn high-reward behavior in environments where the true rewards are only partially
known; (ii) improving the imitation of an expert's behavior by imposing reward-based
constraints on the space of possible behaviors; and (iii) learning high-reward behavior in an
environment where the rewards can abruptly, but predictably, change. I will present
theoretical performance guarantees for each of our algorithms, which in some cases
represent substantial improvements over guarantees for earlier methods. Our algorithms are
motivated by problems in vehicle navigation, spoken dialog systems, and web search. I will
present experimental results from some of these domains.

Includes joint work with Rob Schapire, Michael Bowling (U Alberta), Jason Williams (AT&T),
Nina Mishra (Microsoft), and Alex Slivkins (Microsoft).