talks

Download

talks@lists.cs.princeton.edu

October 2017

4 participants
12 discussions

Tengyu Ma will present his FPO, "Non-convex Optimization for Machine Learning: Design, Analysis, and Understanding" on Thursday, 10/12/2017 at 3pm, in CS 402.
by Nicki Gotsis 05 Oct '17

05 Oct '17

Tengyu Ma will present his FPO, "Non-convex Optimization for Machine Learning: Design, Analysis, and Understanding" on Thursday, 10/12/2017 at 3pm, in CS 402. The members of his committee are: Sanjeev Arora (adviser); Readers: Rong GE (Duke), Sanjeev Arora, and David Steurer (ETH Zurich); Nonreaders: Sanjeev Arora, Elad Hazan, and Mark Braverman. A copy of his thesis, is available in Room 310. Everyone is invited to attend his talk. The talk abstract follows below: Non-convex optimization is ubiquitous in modern machine learning: recent breakthroughs in deep learning require optimizing non-convex training objective functions; problems that admit accurate convex relaxation can often be solved more efficiently with non-convex formulations. However, the theoretical understanding of optimizing non-convex functions remained rather limited to the intractability result of optimizing worst case non-convex functions. Can we extend the algorithmic frontier by effi- ciently optimizing a family of interesting non-convex functions? Can we successfully apply non-convex optimization to machine learning problems with provable guarantees? How do we interpret the complicated models in machine learning that demand non-convex optimizers? Towards addressing these questions, in this thesis, we theoretically studied various machine learning models including sparse coding, topic models, and matrix completion, linear dynamical systems, and word embeddings. We first consider how to find a coarse solution to serve as a good starting point for local improvement algorithms such as stochastic gradient descent. We propose ef- ficient methods for sparse coding and topic inference with better provable guarantees. Second, we propose a framework for analyzing local improvement algorithms with a good starting point. We apply it successfully to the sparse coding problem and demonstrate its advantage over other related frameworks. Then, we consider minimizing a family of non-convex functions that local improvement algorithms can succeed efficiently from random or arbitrary initialization — the family of functions of which all local minima are also global. The challenge, in turn, becomes proving the objective function belongs to this class. We establish such type of results for the natural learning objectives for matrix completion and learning linear dynamical systems. iii Finally, we make steps towards interpreting the non-linear models that require non-convex training algorithms. We reflect on the principles of word embeddings in natural language processing. We give a generative model for the texts, using which we explain why different non-convex formulations such as word2vec and GloVe can learn similar word embeddings with the surprising performance — analogous words have embeddings with similar differences.

1 0

Rajesh Ranganath will present his FPO, "Black Box Variational Inference: Scalable, Generic Bayesian Computation and its Applications" on Friday, 10/6/2017 at 11am in CS 105.
by Nicki Gotsis 02 Oct '17

02 Oct '17

Rajesh Ranganath will present his FPO, "Black Box Variational Inference: Scalable, Generic Bayesian Computation and its Applications" on Friday, 10/6/2017 at 11:00 in CS 105. The members of his committee are as follows: David Blei (Adviser); Examiners: Sanjeev Arora, Barbara Engelhardt, and David Blei; Non Examiners: David Blei and Peter Orbanz (Columbia University, Department of Statistics) A copy of his thesis, is available in Room 310. Everyone is invited to attend his talk. The talk abstract follows below. Probabilistic generative models are robust to noise, uncover unseen patterns, and make predictions about the future. These models have been used successfully to solve problems in neuroscience, astrophysics, genetics, and medicine. The main computational challenge is computing the hidden structure given the data—posterior inference. For most models of interest, computing the posterior distribution requires approximations like variational inference. Variational inference transforms posterior inference into optimization. Classically, this optimization problem was feasible to deploy in only a small fraction of models. This thesis develops black box variational inference. Black box variational inference is a variational inference algorithm that is easy to deploy on a broad class of models and has already found use in models for neuroscience and health care. It makes new kinds of models possible, ones that were too unruly for previous inference methods. One set of models we develop is deep exponential families. Deep exponential families uncover new kinds of hidden pattens while being predictive of future data. Many existing models are deep exponential families. Black box variational inference makes it possible for us to quickly study a broad range of deep exponential families with minimal added eort for each new type of deep exponential family. The ideas around black box variational inference also facilitate new kinds of variational methods. First, we develop hierarchical variational models. Hierarchical variational models improve the approximation quality of variational inference by building higher-fidelity approximations from coarser ones. We show that they help with inference in deep exponential families. Second, we introduce operator variational inference. Operator variational inference delves into the possible distance measures that can be used for the variational optimization problem. We show that this formulation categorizes various variational inference methods and enables variational approximations without tractable densities. By developing black box variational inference, we have opened doors to new models, better posterior approximations, and new varieties of variational inference algorithms.

1 0