Donghun Lee’s generals will be in **302**, not 402 as previously announced.

Sorry for any confusion.

--------------------------------------------------------------------------------

 

Donghun Lee will present his research seminar/general exam on Wednesday, May 4, at 1 PM in Room 302. The members of his committee are Warren Powell (ORFE, advisor), Rob Schapire, and Tom Funkhouser. Everyone is invited to attend his talk, and faculty who wish to remain for the oral exam that follows are welcome to do so. His abstract and reading list appear below.

-----------------------

Abstract

Value iteration and Q-learning algorithms overestimate their value functions because of the bias introduced by the maximum operator in their update rules. The bias can be substantial when these algorithms are applied to problems with large variance in rewards arising from innate stochasticity in the cost function, and it worsens as the action space grows. We aim to bound the value function when such bias is introduced by the maximum operator, and we seek insight into the rate of convergence of the value function in the presence of high levels of noise in the cost function.
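The bias the abstract refers to can be seen with a quick simulation. The sketch below (not part of the talk materials; parameter names and values are illustrative assumptions) draws noisy estimates for actions that all have the same true value of 0 and shows that the max over the estimates is positive on average, with the gap growing as the number of actions increases.

```python
# Illustrative sketch of max-operator overestimation bias (assumed setup,
# not taken from the talk): every action has true value 0, but a
# Q-learning/value-iteration style backup takes a max over noisy estimates.
import numpy as np

rng = np.random.default_rng(0)
true_value = 0.0          # identical true value for every action
noise_std = 1.0           # stochasticity in the observed rewards
num_trials = 10_000       # Monte Carlo repetitions

for num_actions in (2, 10, 100):
    # Noisy estimates of the (identical) action values.
    estimates = true_value + noise_std * rng.standard_normal((num_trials, num_actions))
    # The backup takes a max over actions, which is biased upward.
    biased_backup = estimates.max(axis=1).mean()
    print(f"{num_actions:>3} actions: E[max_a Q_hat(a)] ~ {biased_backup:.3f} "
          f"(true max is {true_value})")
```

Under these assumptions the printed averages rise from roughly 0.6 with 2 actions to about 2.5 with 100 actions, matching the abstract's point that the overestimation grows with both the noise level and the size of the action space.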

 

Reading List

Books

- Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality, chapter 6, pages 179–224. Wiley-Interscience.
- Bertsekas, D. P. (2009). Dynamic Programming and Optimal Control, chapter 6, sections 6.1–6.5, pages 327–446. Athena Scientific.
- Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2nd edition, chapter 21, pages 763–789. Prentice Hall.

Papers

- Even-Dar, E. and Mansour, Y. (2003). Learning rates for Q-learning. Journal of Machine Learning Research, 5:1–25.
- Kearns, M. and Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems, volume 12, pages 996–1002. MIT Press.
- Kulkarni, S. R. and Horn, C. S. (1996). An alternative proof for convergence of stochastic approximation algorithms. IEEE Transactions on Automatic Control, 41(3):419–424.
- Lim, S. H. and DeJong, G. (2005). Towards finite-sample convergence of direct reinforcement learning. In Proceedings of the European Conference on Machine Learning, pages 230–241.
- Strehl, A. L., Li, L., and Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413–2444.
- Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185–202.
- Wang, I.-J., Chong, E. K., and Kulkarni, S. R. (1996). Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms. Advances in Applied Probability, 28.