Sachin Ravi will present his FPO "Meta-Learning for Data and Processing Efficiency" on Friday, 5/17/2019 at 10 AM in CS 402
Sachin Ravi will present his FPO "Meta-Learning for Data and Processing Efficiency" on Friday, 5/17/2019 at 10 AM in CS 402.

The members of his committee are as follows:
Adviser: Kai Li
Nonreaders: Jonathan Cohen and Jia Deng
Readers: Karthik Narasimhan and Hugo Larochelle (Google Brain, University of Montreal)

All are welcome to attend. A copy of his thesis is available in CS 310. His abstract follows below.

Deep learning models have shown great success in a variety of machine learning benchmarks; however, these models still lack the efficiency and flexibility of humans. Current deep learning methods involve training on a large amount of data to produce a model that can then specialize to the specific task encoded by the training data. Humans, on the other hand, are able to learn new concepts throughout their lives with comparatively little feedback. In order to bridge this gap, previous work has suggested the use of meta-learning. Rather than learning how to do a specific task, meta-learning involves learning how to learn and utilizing this knowledge to learn new tasks more effectively. This thesis focuses on using meta-learning to improve the data and processing efficiency of deep learning models when learning new tasks.

First, we discuss a meta-learning model for the few-shot learning problem, where the aim is to learn a new classification task over unseen classes from only a few labeled examples. We use an LSTM-based meta-learner model to learn both the initialization and the optimization algorithm used to train another neural network, and show that our method compares favorably to nearest-neighbor approaches.

The second part of the thesis deals with improving the predictive uncertainty of models in the few-shot learning setting. Using a Bayesian perspective, we propose a meta-learning method which efficiently amortizes hierarchical variational inference across tasks, learning a prior distribution over neural network weights so that a few steps of gradient descent will produce a good task-specific approximate posterior.

Finally, we focus on applying meta-learning in the context of making choices that impact processing efficacy. When training a network on multiple tasks, we have a choice between interactive parallelism (training on different tasks one after another) and independent parallelism (using the network to process multiple tasks concurrently). For the simulation environment considered, we show that there is a trade-off between these two types of processing choices in deep neural networks. We then discuss a meta-learning algorithm for an agent to learn how to train itself with regard to this trade-off in an environment with unknown serialization cost.
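For readers unfamiliar with the idea of a learned optimizer mentioned in the abstract, the toy sketch below illustrates the general flavor of an LSTM-style parameter update, where gate values modulate how much of the old parameters to keep and how large a gradient step to take. It is not the thesis's actual method or code; the logistic-regression "learner," the placeholder gate computations, and all variable names (theta, W_f, W_i) are illustrative assumptions.

```python
import numpy as np

# Toy sketch of an LSTM-style learned update rule:
#     theta_t = f_t * theta_{t-1} - i_t * grad_t
# where f_t (forget gate) and i_t (input gate) would normally be produced
# by a trained meta-learner; here they are simple placeholder functions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(theta, X, y):
    """Binary cross-entropy loss and gradient for a linear classifier."""
    p = sigmoid(X @ theta)
    loss = -np.mean(y * np.log(p + 1e-8) + (1 - y) * np.log(1 - p + 1e-8))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # small "support set" of 20 examples
y = (X[:, 0] > 0).astype(float)         # toy labels

theta = rng.normal(scale=0.1, size=5)   # stands in for a meta-learned initialization
W_f, W_i = rng.normal(size=2) * 0.1     # stand-ins for meta-learner parameters

for step in range(5):                   # a few adaptation steps on the new task
    loss, grad = loss_and_grad(theta, X, y)
    f_t = sigmoid(W_f * loss + 4.0)     # close to 1: mostly retain old parameters
    i_t = sigmoid(W_i * loss - 1.0)     # acts as a learned step size
    theta = f_t * theta - i_t * grad
    print(f"step {step}: loss = {loss:.3f}")
```

In the meta-learning setting, the gate parameters themselves would be trained across many few-shot tasks so that these few update steps produce a good task-specific model.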