<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
h3
        {mso-style-priority:9;
        mso-style-link:"Heading 3 Char";
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:13.5pt;
        font-family:"Times New Roman","serif";
        color:black;
        font-weight:bold;}
h4
        {mso-style-priority:9;
        mso-style-link:"Heading 4 Char";
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;
        font-weight:bold;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.Heading3Char
        {mso-style-name:"Heading 3 Char";
        mso-style-priority:9;
        mso-style-link:"Heading 3";
        font-family:"Cambria","serif";
        color:#4F81BD;
        font-weight:bold;}
span.Heading4Char
        {mso-style-name:"Heading 4 Char";
        mso-style-priority:9;
        mso-style-link:"Heading 4";
        font-family:"Cambria","serif";
        color:#4F81BD;
        font-weight:bold;
        font-style:italic;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:"Consolas","serif";
        color:black;}
span.EmailStyle22
        {mso-style-type:personal-reply;
        font-family:"Arial","sans-serif";
        color:blue;
        font-weight:normal;
        font-style:normal;
        text-decoration:none none;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:1218668551;
        mso-list-template-ids:1437878568;}
@list l0:level1
        {mso-level-number-format:bullet;
        mso-level-text:\F0B7;
        mso-level-tab-stop:.5in;
        mso-level-number-position:left;
        text-indent:-.25in;
        mso-ansi-font-size:10.0pt;
        font-family:Symbol;}
@list l1
        {mso-list-id:1871989696;
        mso-list-template-ids:2033373158;}
@list l1:level1
        {mso-level-number-format:bullet;
        mso-level-text:\F0B7;
        mso-level-tab-stop:.5in;
        mso-level-number-position:left;
        text-indent:-.25in;
        mso-ansi-font-size:10.0pt;
        font-family:Symbol;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body bgcolor=white lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:blue'>Donghun Lee will present his research seminar/general exam on Wednesday, May 4 <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:blue'>at 1PM in Room 402.  The members of his committee are:  Warren Powell (ORFE, advisor), <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:blue'>Rob Schapire, and Tom Funkhouser.  Everyone is invited to attend his talk, and those <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:blue'>faculty wishing to remain for the oral exam that follows are welcome to do so.  His <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:blue'>abstract and reading list follow below.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:blue'>-----------------------<o:p></o:p></span></p><h3>Abstract<o:p></o:p></h3><p style='margin-bottom:0in;margin-bottom:.0001pt'>Value iteration and Q-learning algorithms overestimate their value functions due to bias introduced by the maximum operator in the update rules. The bias can be substantial when these algorithms are applied to problems with large variance in rewards, arising from innate stochasticity in the cost function. The problem worsens as the action space grows. We aim to bound the value function when such bias is introduced by the maximum operator in these algorithms. 
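</p><p style='margin-bottom:0in;margin-bottom:.0001pt'>As a minimal illustration of this bias (a sketch under simplified assumptions, not material from the talk; all names and parameters below are made up): if every action's true value is zero but each estimate is a sample mean of noisy rewards, the maximum of the estimates is positive in expectation, and the gap grows with the number of actions.</p>

```python
import random
import statistics

# Illustrative sketch only: names and parameters are assumptions, not from
# the talk. Every action's true value is 0, but rewards are noisy; estimating
# each Q(a) from samples and then maximizing over actions overestimates
# max_a E[Q(a)] = 0.
random.seed(0)

def max_of_noisy_estimates(n_actions, n_samples, noise_std):
    # Sample-mean estimate of each action's value, then the max over actions.
    estimates = [
        statistics.fmean(random.gauss(0.0, noise_std) for _ in range(n_samples))
        for _ in range(n_actions)
    ]
    return max(estimates)

# Average the maximized estimate over many trials; the true maximum is 0.
trials = 2000
bias_2 = statistics.fmean(max_of_noisy_estimates(2, 10, 1.0) for _ in range(trials))
bias_20 = statistics.fmean(max_of_noisy_estimates(20, 10, 1.0) for _ in range(trials))

print(bias_2)   # positive even with only 2 actions
print(bias_20)  # larger: the bias grows with the action space
```

<p style='margin-bottom:0in;margin-bottom:.0001pt'>This is the same Jensen-type effect (the expectation of a maximum exceeds the maximum of expectations) that the abstract proposes to bound. 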
We seek to gain insights into the rate of convergence of the value function in the presence of high levels of noise in the cost function.<o:p></o:p></p><p style='margin-bottom:0in;margin-bottom:.0001pt'><o:p> </o:p></p><h3>Reading List<o:p></o:p></h3><h4>Books<o:p></o:p></h4><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality, chapter 6, pages 179–224. Wiley-Interscience.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Bertsekas, D. P. (2009). Dynamic Programming and Optimal Control, chapter 6, sections 6.1–6.5, pages 327–446. Athena Scientific.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2<sup>nd</sup> Edition, chapter 21, pages 763–789. 
Prentice Hall.<o:p></o:p></p><h4>Papers<o:p></o:p></h4><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Even-Dar, E. and Mansour, Y. (2003). Learning rates for Q-learning. Journal of Machine Learning Research, 5:1–25.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Kearns, M. and Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems, volume 12, pages 996–1002. MIT Press.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Kulkarni, S. R. and Horn, C. S. (1996). An alternative proof for convergence of stochastic approximation algorithms. IEEE Transactions on Automatic Control, 41(3):419–424.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Lim, S. H. and DeJong, G. (2005). 
Towards finite-sample convergence of direct reinforcement learning. In Proceedings of the European Conference on Machine Learning, pages 230–241.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Strehl, A. L., Li, L., and Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413–2444.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185–202.<o:p></o:p></p><p style='mso-margin-top-alt:5.0pt;margin-right:0in;margin-bottom:0in;margin-left:.5in;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>        </span></span></span><![endif]>Wang, I.-J., Chong, E. K., and Kulkarni, S. R. (1996). Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms. Advances in Applied Probability, 28.<o:p></o:p></p><p class=MsoNormal><br><br><o:p></o:p></p><pre><o:p> </o:p></pre></div></body></html>