Tengyu Ma will present his FPO, "Non-convex Optimization for Machine Learning: Design, Analysis, and Understanding," on Thursday, 10/12/2017 at 3pm in CS 402.

The members of his committee are: Sanjeev Arora (adviser); Readers: Rong Ge (Duke), Sanjeev Arora, and David Steurer (ETH Zurich); Nonreaders: Sanjeev Arora, Elad Hazan, and Mark Braverman.

A copy of his thesis is available in Room 310. Everyone is invited to attend his talk. The talk abstract follows below:

Non-convex optimization is ubiquitous in modern machine learning: recent breakthroughs in deep learning require optimizing non-convex training objective functions, and problems that admit accurate convex relaxations can often be solved more efficiently with non-convex formulations. However, the theoretical understanding of optimizing non-convex functions has remained rather limited, largely to intractability results for optimizing worst-case non-convex functions. Can we extend the algorithmic frontier by efficiently optimizing a family of interesting non-convex functions? Can we successfully apply non-convex optimization to machine learning problems with provable guarantees? How do we interpret the complicated models in machine learning that demand non-convex optimizers?

Towards addressing these questions, in this thesis we theoretically study various machine learning models, including sparse coding, topic models, matrix completion, linear dynamical systems, and word embeddings.

We first consider how to find a coarse solution to serve as a good starting point for local improvement algorithms such as stochastic gradient descent. We propose efficient methods for sparse coding and topic inference with better provable guarantees.

Second, we propose a framework for analyzing local improvement algorithms initialized with a good starting point. We apply it successfully to the sparse coding problem and demonstrate its advantage over other related frameworks.

Then, we consider minimizing a family of non-convex functions on which local improvement algorithms succeed efficiently from random or arbitrary initialization: the family of functions whose local minima are all global. The challenge, in turn, becomes proving that the objective function belongs to this class. We establish results of this type for the natural learning objectives of matrix completion and learning linear dynamical systems.

Finally, we take steps towards interpreting the non-linear models that require non-convex training algorithms. We reflect on the principles of word embeddings in natural language processing. We give a generative model for text, using which we explain why different non-convex formulations such as word2vec and GloVe can learn similar word embeddings with surprisingly good performance: analogous words have embeddings with similar differences.
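To make the matrix completion result concrete, below is a minimal numerical sketch (not taken from the thesis): plain gradient descent from a small random starting point on the non-convex objective f(X) = 0.5 * sum over observed (i,j) of ((X X^T)_ij - M_ij)^2, in the symmetric, noiseless, exact-rank setting where all local minima of this objective are known to be global. The dimensions, sampling rate, step size, and iteration count are illustrative choices, not values from the thesis.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic ground truth: a rank-r PSD matrix M = U U^T.
    n, r = 50, 3
    U = rng.standard_normal((n, r))
    M = U @ U.T

    # Observe each entry independently with probability p.
    p = 0.5
    mask = rng.random((n, n)) < p

    def grad(X):
        # Gradient of f(X) = 0.5 * || mask * (X X^T - M) ||_F^2.
        R = mask * (X @ X.T - M)   # residual on observed entries only
        return (R + R.T) @ X

    # Gradient descent from a small random initialization.
    X = 0.01 * rng.standard_normal((n, r))
    eta = 0.002                    # illustrative step size
    for _ in range(5000):
        X -= eta * grad(X)

    rel_err = np.linalg.norm(X @ X.T - M) / np.linalg.norm(M)
    print(f"relative recovery error: {rel_err:.2e}")

Despite the non-convexity, the iterate typically converges to a factorization of M (relative error close to zero), which is the behavior that results of the kind established in the thesis explain.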