The members of his committee are as follows: Kai Li (adviser), Jonathan Cohen (PNI), Hugo Larochelle (Google Brain), Jia Deng, and Karthik Narasimhan.
All are welcome to attend. Please see below for talk title and abstract.
Title: Meta-Learning for Data and Processing Efficiency
Abstract:
Deep learning models have shown great success in a variety of areas, including computer vision, natural language processing, and reinforcement learning. However, deep learning methods still lack the efficiency and flexibility of human learning. Current methods require training on a large amount of data to produce a model specialized to the single task encoded by that data. Humans, on the other hand, constantly learn and master new concepts throughout their lives with comparatively little experience. To bridge this gap, previous work has suggested the use of meta-learning: rather than learning how to perform a specific task, a meta-learning model learns how to learn, and uses this knowledge to learn new tasks more effectively. This talk focuses on work using meta-learning to improve the data and processing efficiency of deep learning models when learning new tasks.
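As a rough formalization of this learning-to-learn idea (a sketch with assumed notation, not the talk's own), suppose tasks T are drawn from a distribution p(T), each with training data D_T^train and held-out data D_T^test. Meta-learning then seeks the parameters of a learning algorithm A_theta whose adapted models generalize well across tasks:

    \theta^\ast = \arg\min_{\theta} \; \mathbb{E}_{T \sim p(T)} \Big[ \mathcal{L}_T\big( A_\theta(D_T^{\mathrm{train}}),\, D_T^{\mathrm{test}} \big) \Big]

Here A_theta could be a learned optimizer, a learned initialization, or a learned prior; the parts of the talk below instantiate it in different ways.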
In the first part, we discuss a meta-learning model for the few-shot learning problem, where the aim is to learn a new classification task over previously unseen classes from only a few labeled examples. We use an LSTM-based meta-learner to learn both the initialization and the optimization algorithm used to train another neural network in the few-shot regime, and show that our method compares favorably to nearest-neighbor approaches.

The second part of the thesis deals with improving the predictive uncertainty of models in the few-shot setting. Well-calibrated predictive uncertainty is key to deploying machine learning models in the wild, especially in settings with little data, because human intervention can be applied when a model's prediction is known to be uncertain. Taking a Bayesian perspective, we propose a meta-learning method that efficiently amortizes hierarchical variational inference across tasks, learning a prior distribution over neural network weights such that a few steps of gradient descent produce a good task-specific approximate posterior.

Finally, we focus on applying meta-learning to choices that affect processing efficiency. When training a network on multiple tasks, we have a choice between interactive parallelism (processing tasks one after another) and independent parallelism (processing multiple tasks concurrently). For the simulation environment we consider, we show that these two processing modes trade off in deep neural networks: interactive parallelism allows faster learning, but at the cost of having to perform tasks serially. We then discuss a meta-learning algorithm by which an agent learns how to train itself with respect to this trade-off in an environment with unknown serialization cost.
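To sketch the first part's key observation (a standard identification for this line of work; the notation here is assumed, not quoted from the talk): a step of gradient descent on the learner's parameters has the same form as the LSTM cell-state update,

    \theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

with cell state c_t identified with theta_t, forget gate f_t = 1, input gate i_t = alpha_t, and candidate \tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t. Letting the meta-learner produce f_t and i_t as learned functions of the current loss and gradient, and treating the initial cell state as a learned parameter, yields both a learned update rule and a learned initialization.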
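For the second part, the Bayesian setup can be sketched as standard hierarchical variational inference (again with assumed notation): given a shared prior p(\theta | \varphi) over network weights and data D_t for task t, each task's marginal likelihood admits the evidence lower bound

    \log p(\mathcal{D}_t \mid \varphi) \;\ge\; \mathbb{E}_{q_t(\theta)}\big[\log p(\mathcal{D}_t \mid \theta)\big] \;-\; \mathrm{KL}\big(q_t(\theta) \,\big\|\, p(\theta \mid \varphi)\big)

Amortizing the inference means q_t is not fit from scratch for every task: it is produced by a few steps of gradient ascent on this bound starting from the learned prior, so a good task-specific approximate posterior is reached cheaply, and the prior parameters are trained so that this short procedure works well across tasks.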
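The third part's trade-off can be illustrated with a toy cost model (a minimal, entirely hypothetical Python sketch; the numbers and the cost model are illustrative assumptions, not the thesis's simulation environment):

    # Hypothetical cost model: total time = time spent learning + time
    # spent executing a batch of tasks. Interactive parallelism learns
    # fast but must execute tasks one at a time; independent parallelism
    # learns slowly but executes all tasks concurrently.
    def total_time(learning_time, n_tasks, exec_time, serial):
        execution = n_tasks * exec_time if serial else exec_time
        return learning_time + execution

    for exec_time in (1, 10, 100):  # per-task execution time, which drives the serialization cost
        interactive = total_time(learning_time=100, n_tasks=5, exec_time=exec_time, serial=True)
        independent = total_time(learning_time=400, n_tasks=5, exec_time=exec_time, serial=False)
        winner = "interactive" if interactive < independent else "independent"
        print(f"exec_time={exec_time}: interactive={interactive}, "
              f"independent={independent} -> prefer {winner}")

When per-task execution is cheap, the faster learning of interactive parallelism wins; as that cost grows, the comparison flips in favor of independent parallelism, which is why an agent facing an unknown serialization cost must learn which regime it is in.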