Umar Syed will present his preFPO on Friday, February 20 at 10:30 AM in Room 302. The members of his committee are: Rob Schapire, advisor; David Blei and Warren Powell (ORF), readers; Michael Littman (Rutgers) and Yael Niv (PSY), non-readers. Everyone is invited to attend his talk. His abstract follows below.

---------------------------

Title: Algorithms for Hybrid Supervised/Reinforcement Learning Problems

Abstract:

Reinforcement and supervised learning are two different approaches to machine learning. For an illustration of their differences, consider the task of learning to drive a car. A reinforcement learning algorithm learns how to drive by trying several different driving behaviors, and uses feedback from the environment to decide which behavior is best. The feedback is delivered in the form of rewards, which reflect the desirability of various events such as crashing, staying on the road, etc. By contrast, a supervised learning algorithm passively observes an expert driver, and attempts to imitate his driving behavior. In general, reinforcement learning algorithms are concerned with maximizing reward, and supervised learning algorithms are concerned with matching observed behavior.

In this talk, I will discuss learning algorithms that combine aspects of both reinforcement and supervised learning, leveraging the strengths of each approach. Specifically, I will describe algorithms for (i) using observations from an expert to learn high-reward behavior in environments where the true rewards are only partially known; (ii) improving the imitation of an expert's behavior by imposing reward-based constraints on the space of possible behaviors; and (iii) learning high-reward behavior in an environment where the rewards can abruptly, but predictably, change. I will present theoretical performance guarantees for each of our algorithms, which in some cases represent substantial improvements over guarantees for earlier methods.
Our algorithms are motivated by problems in vehicle navigation, spoken dialog systems, and web search. I will present experimental results from some of these domains. This talk includes joint work with Rob Schapire, Michael Bowling (U Alberta), Jason Williams (AT&T), Nina Mishra (Microsoft), and Alex Slivkins (Microsoft).