Sachin Ravi will present his FPO "Meta-Learning for Data and Processing Efficiency" on Friday, 5/17/2019 at 10 AM in CS 402
Sachin Ravi will present his FPO "Meta-Learning for Data and Processing Efficiency" on Friday, 5/17/2019 at 10 AM in CS 402.

The members of his committee are as follows:
Adviser: Kai Li
Nonreaders: Jonathan Cohen and Jia Deng
Readers: Karthik Narasimhan and Hugo Larochelle (Google Brain, University of Montreal)

All are welcome to attend. A copy of his thesis is available in CS 310. His abstract follows below.

Deep learning models have shown great success in a variety of machine learning benchmarks; however, these models still lack the efficiency and flexibility of humans. Current deep learning methods involve training on a large amount of data to produce a model that can then specialize to the specific task encoded by the training data. Humans, on the other hand, are able to learn new concepts throughout their lives with comparatively little feedback. In order to bridge this gap, previous work has suggested the use of meta-learning. Rather than learning how to do a specific task, meta-learning involves learning how to learn and utilizing this knowledge to learn new tasks more effectively. This thesis focuses on using meta-learning to improve the data and processing efficiency of deep learning models when learning new tasks.

First, we discuss a meta-learning model for the few-shot learning problem, where the aim is to learn a new classification task over unseen classes from only a few labeled examples. We use an LSTM-based meta-learner model to learn both the initialization and the optimization algorithm used to train another neural network, and show that our method compares favorably to nearest-neighbor approaches.

The second part of the thesis deals with improving the predictive uncertainty of models in the few-shot learning setting. Using a Bayesian perspective, we propose a meta-learning method which efficiently amortizes hierarchical variational inference across tasks, learning a prior distribution over neural network weights so that a few steps of gradient descent will produce a good task-specific approximate posterior.

Finally, we focus on applying meta-learning in the context of making choices that impact processing efficacy. When training a network on multiple tasks, we have a choice between interactive parallelism (training on different tasks one after another) and independent parallelism (using the network to process multiple tasks concurrently). For the simulation environment considered, we show that there is a trade-off between these two types of processing choices in deep neural networks. We then discuss a meta-learning algorithm for an agent to learn how to train itself with regard to this trade-off in an environment with unknown serialization cost.
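For readers unfamiliar with the idea of a learned optimizer mentioned in the abstract, the toy sketch below illustrates the general flavor of an LSTM-style parameter update, where gate values modulate how much of the old parameters to keep and how large a gradient step to take. It is not the thesis's actual method or code; the logistic-regression "learner," the placeholder gate computations, and all variable names (theta, W_f, W_i) are illustrative assumptions.

```python
import numpy as np

# Toy sketch of an LSTM-style learned update rule:
#     theta_t = f_t * theta_{t-1} - i_t * grad_t
# where f_t (forget gate) and i_t (input gate) would normally be produced
# by a trained meta-learner; here they are simple placeholder functions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(theta, X, y):
    """Binary cross-entropy loss and gradient for a linear classifier."""
    p = sigmoid(X @ theta)
    loss = -np.mean(y * np.log(p + 1e-8) + (1 - y) * np.log(1 - p + 1e-8))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # small "support set" of 20 examples
y = (X[:, 0] > 0).astype(float)         # toy labels

theta = rng.normal(scale=0.1, size=5)   # stands in for a meta-learned initialization
W_f, W_i = rng.normal(size=2) * 0.1     # stand-ins for meta-learner parameters

for step in range(5):                   # a few adaptation steps on the new task
    loss, grad = loss_and_grad(theta, X, y)
    f_t = sigmoid(W_f * loss + 4.0)     # close to 1: mostly retain old parameters
    i_t = sigmoid(W_i * loss - 1.0)     # acts as a learned step size
    theta = f_t * theta - i_t * grad
    print(f"step {step}: loss = {loss:.3f}")
```

In the meta-learning setting, the gate parameters themselves would be trained across many few-shot tasks so that these few update steps produce a good task-specific model.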