Donghun Lee will present his FPO, "Learning To Learn Optimally: A Practical Framework for Machine Learning Applications With Finite Time Horizon," on Tuesday, 4/23/2019 at 2pm in CS 105.

The members of his committee are as follows: Examiners: Warren Powell (adviser), Ryan Adams, and Peter Ramadge (ELE); Readers: Mengdi Wang (ORFE), Yuxin Chen (ELE), and Elad Hazan.

A copy of his thesis is available upon request. All are welcome to attend. Abstract follows below.

Most machine learning algorithms with asymptotic guarantees leave finite time horizon issues such as initialization or tuning open to the end users, on whom this burden may cause undesirable outcomes in practice, where finite time horizon performance matters. As an inspirational case of such undesirable finite time behavior, we identify the finite time bias in the Q-learning algorithm and present a method to alleviate the bias on-the-fly. Motivated by the gap between the asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task with a finite time horizon objective function. To address the problem more generally, we develop the framework of learning to learn optimally (LTLO), which models the problem of optimally applying a machine learning algorithm to a given task in a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real world problem via an example of learning to learn how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge gradient based LTLO algorithm designed to solve online hyperparameter optimization approximately within a few trials, and demonstrate the practical sample efficiency of the algorithm. Answering the need for a robust anytime LTLO algorithm, we develop an online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound.
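For readers unfamiliar with the finite time bias mentioned in the abstract, the minimal tabular Q-learning sketch below (Python; illustrative only, not the thesis's method or its bias-alleviation technique, and all names such as q_learning_update are hypothetical) shows where the bias enters: the max over noisy action-value estimates in the update target is upward biased, since E[max_a Q(s',a)] >= max_a E[Q(s',a)].

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Standard tabular Q-learning step:
        # Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)).
        # The max below inflates early targets when Q is a noisy estimate,
        # which is the classic source of finite time (overestimation) bias.
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    # Hypothetical usage on a toy table with 5 states and 2 actions:
    Q = np.zeros((5, 2))
    q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)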