Donghun Lee will present his FPO, "Learning To Learn Optimally: A Practical Framework for Machine Learning Applications With Finite Time Horizon," on Tuesday, 4/23/2019 at 2pm in CS 105.

The members of his committee are as follows: Examiners: Warren Powell (adviser), Ryan Adams, and Peter Ramadge (ELE); Readers: Mengdi Wang (ORFE), Yuxin Chen (ELE), and Elad Hazan.

A copy of his thesis is available upon request. All are welcome to attend. Abstract follows below.

Most machine learning algorithms with asymptotic guarantees leave finite time horizon issues such as initialization or tuning open to the end users, on whom this burden may cause undesirable outcomes in practice, where finite time horizon performance matters. As an inspirational case of such undesirable finite time behavior, we identify the finite time bias in the Q-learning algorithm and present a method to alleviate the bias on-the-fly. Motivated by the gap between the asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task with a finite time horizon objective function. To address the problem more generally, we develop the framework of learning to learn optimally (LTLO), which models the problem of optimally applying a machine learning algorithm to a given task in a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real world problem via an example of learning to learn how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge gradient based LTLO algorithm designed to solve online hyperparameter optimization approximately within a few trials, and demonstrate the practical sample efficiency of the algorithm. Answering the need for a robust anytime LTLO algorithm, we develop an online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound.
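For readers unfamiliar with the finite time bias mentioned in the abstract, the minimal tabular Q-learning sketch below (Python; illustrative only, not the thesis's method or its bias-alleviation technique, and all names such as q_learning_update are hypothetical) shows where the bias enters: the max over noisy action-value estimates in the update target is upward biased, since E[max_a Q(s',a)] >= max_a E[Q(s',a)].

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Standard tabular Q-learning step:
        # Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)).
        # The max below inflates early targets when Q is a noisy estimate,
        # which is the classic source of finite time (overestimation) bias.
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    # Hypothetical usage on a toy table with 5 states and 2 actions:
    Q = np.zeros((5, 2))
    q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)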