[talks] Yingfei Wang Generals: Thursday, May 15, 2014 at 1pm in Room 402

Nicki Gotsis ngotsis at CS.Princeton.EDU
Thu May 8 11:21:48 EDT 2014


Yingfei Wang will present the general exam on Thursday, May 15, 2014 at 1pm in Room 402. Yingfei Wang's committee members are Warren Powell, Robert Schapire, and David Blei.

Everyone is invited to attend the talk, and faculty wishing to remain for the oral exam that follows are welcome to do so. Yingfei's abstract and reading list follow below:

Abstract:
We consider theoretical analysis, algorithm development, and real-world applications of sequential design-of-experiments problems. One goal is to find the alternative with the best underlying average performance within a limited measurement budget, a setting often addressed as an online Bayesian ranking and selection problem. We derive the first finite-time bound on the knowledge-gradient policy for ranking and selection under the condition that the value of information is submodular. We also build a new, public-domain, Matlab-based testing environment for comparing a number of policies on a wide range of learning problems, providing the most comprehensive testbed that has yet appeared in the literature. We then consider decision-making problems, motivated by applications in experimental science, in which the learner repeatedly selects a batch, either a multi-set of size B over M possible alternatives or a nested batch in which the B alternatives share the same values of several controllable parameters, and then receives the payoffs for just the selected alternatives. The goal is to find the set of tunable control parameters that maximizes some utility function. We extend the sequential ranking and selection problem into a general framework for batch-mode and nested-batch-mode learning. By formulating the problem within a dynamic programming framework, we derive knowledge-gradient variants to handle both batch and nested-batch measurements. A Monte Carlo sampling procedure is applied to approximate the intractable expectations that arise.
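For readers unfamiliar with the knowledge-gradient policy that the abstract builds on, the following is a minimal sketch for the simplest setting: one measurement at a time, independent normal beliefs on each alternative, and known Gaussian measurement noise. This is the standard closed-form knowledge-gradient computation from the ranking-and-selection literature, not the batch or nested-batch variants the thesis develops, and all variable names and numbers below are illustrative.

```python
import math

def kg_values(mu, sigma2, noise_var):
    """Knowledge-gradient value of measuring each alternative once,
    under independent normal beliefs N(mu[i], sigma2[i]) and
    measurement noise variance noise_var."""
    def pdf(z):  # standard normal density
        return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    def cdf(z):  # standard normal cumulative distribution
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    values = []
    for i in range(len(mu)):
        # predictive change in the posterior mean from one more sample of i
        sigma_tilde = sigma2[i] / math.sqrt(sigma2[i] + noise_var)
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        # expected improvement in the best posterior mean
        values.append(sigma_tilde * (z * cdf(z) + pdf(z)))
    return values

mu = [1.0, 1.2, 0.8]      # current posterior means (illustrative)
sigma2 = [4.0, 0.5, 2.0]  # current posterior variances (illustrative)
vals = kg_values(mu, sigma2, noise_var=1.0)
best = max(range(len(vals)), key=vals.__getitem__)
```

The knowledge-gradient policy measures the alternative with the largest value (here alternative 0, whose mean is close to the best while its belief is still very uncertain). In the batch setting described in the abstract, the maximization is over multi-sets of alternatives and the expectation no longer has this closed form, which is where the Monte Carlo sampling procedure comes in.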

Readings:
Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Chapters 3, 4.1, 5, 7, 13-15, 17-18, 21).

Sébastien Bubeck and Nicolò Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems (Chapters 1-5).

Robert Schapire’s lecture notes for the course Theoretical Machine Learning.
http://www.cs.princeton.edu/courses/archive/spring13/cos511/index.html

John Gittins, Kevin Glazebrook, and Richard Weber, Multi-armed Bandit Allocation Indices, 2nd Edition (Chapter 2).

Ilya Ryzhov, Warren Powell, and Peter Frazier, The knowledge gradient algorithm for a general class of online learning problems. Operations Research, 2012, 60(1): 180-195.

Martijn Mes, Warren Powell, and Peter Frazier, Hierarchical knowledge gradient for sequential sampling. The Journal of Machine Learning Research, 2011, 12: 2931-2974.

Peter Frazier, and Warren Powell, Consistency of sequential Bayesian sampling policies. SIAM Journal on Control and Optimization, 2011, 49(2): 712-731.

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer, Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002, 47(2-3): 235-256.

Daniel Golovin and Andreas Krause, Adaptive submodularity: A new approach to active learning and stochastic optimization. COLT. 2010: 333-345.

Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire, The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002, 32(1): 48-77.
