[talks] DH Lee General exam

Melissa Lawson mml at CS.Princeton.EDU
Thu Apr 28 11:32:40 EDT 2011

Donghun Lee will present his research seminar/general exam on Wednesday May 4 

at 1PM in Room 402.  The members of his committee are:  Warren Powell (ORFE, advisor), 

Rob Schapire, and Tom Funkhouser.  Everyone is invited to attend his talk and those 

faculty wishing to remain for the oral exam following are welcome to do so.  His 

abstract and reading list follow below.



Value iteration and Q learning algorithms overestimate their value functions due to bias
introduced by the maximum operator in the update rules. The bias can be substantial when
these algorithms are applied to problems which contain large variances in rewards due to
innate stochasticity in the cost function. This problem gets worse as the action space
gets larger. We aim to bound the value function when there exist such bias introduced by
the maximum operator in these algorithms. We seek to gain insights into the rate of
convergence of the value function in the presence of high levels of noise in the cost


Reading List


.        Powell,W. B. (2007). Approximate Dynamic Programming: Solving the Curses of
Dimensionality, chapter 6, pages 179-224. Wiley-Interscience

.        Bertsekas, D. P. (2009). Dynamic Programming and Optimal Control, chapter 6
sections 6.1-6.5, pages 327-446. Athena Scientific

.        Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach,
2nd Edition, chapter 21, pages 763-789. Prentice Hall


.        Even-Dar, E. and Mansour, Y. (2003). Learning rates for Q-learning. Journal of
Machine Learning Research, 5:1-25.

.        Kearns, M. and Singh, S. (1999). Finite-sample convergence rates for Q-learning
and indirect algorithms. In Neural Information Processing Systems, volume 12, pages
996-1002. MIT Press.

.        Kulkarni, S. R. and Horn, C. S. (1996). An alternative proof for convergence of
stochastic approximation algorithms. IEEE Transactions on Automatic Control,

.        Lim, S. H. and DeJong, G. (2005). Towards finite-sample convergence of direct
reinforcement learning. In Proceedings of European Conference on Machine Learning, pages

.        Strehl, A. L., Li, L., and Littman, M. L. (2009). Reinforcement learning in
finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413-2444.

.        Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning.
Machine Learning, 16:185-202.

.        Wang, I.-J., Chong, E. K., and Kulkarni, S. R. (1996). Equivalent necessary and
sufficient conditions on noise sequences for stochastic approximation algorithms. Advances
in Applied Probability, 28:1996.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.princeton.edu/pipermail/talks/attachments/20110428/d81fc115/attachment.html>

More information about the talks mailing list