Donghun Lee will present his FPO, "Learning To Learn Optimally: A Practical Framework for Machine Learning Applications With Finite Time Horizon" on Tuesday, 4/23/2019 at 2pm in CS 105.

The members of his committee are as follows: Examiners: Warren Powell (adviser), Ryan Adams, and Peter Ramadge (ELE); Readers: Mengdi Wang (ORFE), Yuxin Chen (ELE), and Elad Hazan.

A copy of his thesis is available upon request. All are welcome to attend. The abstract follows.

Most machine learning algorithms with asymptotic guarantees leave finite-time-horizon issues, such as initialization or tuning, open to the end users, on whom this burden may cause undesirable outcomes in practice, where finite-time-horizon performance matters. As an inspirational case of undesirable finite-time behavior, we identify the finite-time bias in the Q-learning algorithm and present a method to alleviate the bias on the fly. Motivated by the gap between the asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task with a finite-time-horizon objective function. To address the problem more generally, we develop the framework of learning to learn optimally (LTLO), which models the problem of optimally applying a machine learning algorithm to a given task within a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real-world problem via an example of learning to learn how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge-gradient-based LTLO algorithm designed to approximately solve online hyperparameter optimization with a small number of trials, and we demonstrate the practical sample efficiency of the algorithm. Answering the need for a robust anytime LTLO algorithm, we develop the online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound.
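As background for the finite-time bias mentioned in the abstract, the sketch below illustrates one classic source of such bias in Q-learning: the max operator overestimates values computed from noisy finite-sample estimates, since E[max_a Q̂(s,a)] ≥ max_a E[Q̂(s,a)]. This is a generic, hedged illustration of the phenomenon, not the thesis's own analysis or method; the action values and noise level are made up for the demonstration.

```python
import random

# Three actions with identical true value 0.0; after finitely many
# samples, the agent only has noisy estimates of each action's value.
random.seed(0)
TRUE_VALUES = [0.0, 0.0, 0.0]
NOISE = 1.0      # stand-in for finite-sample estimation noise
TRIALS = 10_000

bias = 0.0
for _ in range(TRIALS):
    # Noisy value estimates, one per action.
    estimates = [v + random.gauss(0.0, NOISE) for v in TRUE_VALUES]
    # Q-learning's target uses max over estimates, which on average
    # exceeds the max of the true values when estimates are noisy.
    bias += max(estimates) - max(TRUE_VALUES)

bias /= TRIALS
print(f"average overestimation from the max operator: {bias:.3f}")
```

Because all true values are zero, any positive average here is pure finite-sample bias; it vanishes only asymptotically as the estimation noise shrinks, which is why finite-horizon behavior can differ markedly from what asymptotic guarantees suggest.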