Brian Bullins will present his Pre-FPO talk, "Efficient Higher-Order Optimization for Machine Learning," on Friday, March 15, 2019, at 10am in CS 402.

The members of his committee are as follows: Elad Hazan (adviser), Sanjeev Arora, Mark Braverman, Yuxin Chen (ELE), and Yoram Singer.

All are welcome to attend. The title and abstract appear below.

Title: Efficient Higher-Order Optimization for Machine Learning

Abstract:

In recent years, stochastic gradient descent (SGD) has taken center stage for training large-scale machine learning models. Although some higher-order methods improve iteration complexity in theory, their per-iteration costs render them impractical for problems with millions of parameters and training examples.
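For context, SGD's appeal is that each step touches only a single training example, so the per-step cost is linear in the dimension. A minimal sketch on a synthetic least-squares problem (all data and parameter choices here are illustrative, not from the talk):

```python
import numpy as np

# Minimal SGD sketch on synthetic least squares (illustrative setup):
# minimize f(w) = (1/2n) * sum_i (x_i . w - y_i)^2.
rng = np.random.default_rng(1)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                       # noiseless targets, so the optimum is w_true

w = np.zeros(d)
lr = 0.01
for _ in range(5000):
    i = rng.integers(n)              # one random example per step
    grad = (X[i] @ w - y[i]) * X[i]  # stochastic gradient: O(d) work, not O(nd)
    w -= lr * grad
```

Each iteration costs O(d) regardless of the number of examples n, which is exactly what a naive Newton step (forming and inverting a d-by-d Hessian at O(d^2) to O(d^3) per step) gives up.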

In this talk, I will present several of my works that enable higher-order optimization to be as scalable as first-order methods. The first is LiSSA, a stochastic second-order algorithm for convex optimization that uses Hessian information to construct an unbiased Newton step in time linear in the dimension. To bypass the typical efficiency barriers for second-order methods, we harness the empirical risk minimization (ERM) structure of standard machine learning tasks.
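As a rough illustration of the underlying idea (a sketch with invented data, not the algorithm from the talk): the Neumann-series identity H^{-1} = sum_k (I - H)^k suggests estimating the Newton direction H^{-1} g from sampled per-example Hessian-vector products, each costing time linear in the dimension when the per-example Hessian is rank-one plus a ridge term:

```python
import numpy as np

# Hypothetical ERM setup (names and constants are illustrative): per-example
# Hessians H_i = x_i x_i^T + ridge * I, scaled so the Neumann series for the
# inverse of their average H converges (eigenvalues of H lie in (0, 1)).
rng = np.random.default_rng(0)
n, d, ridge = 200, 5, 0.1
X = rng.normal(size=(n, d)) / np.sqrt(d)
H = X.T @ X / n + ridge * np.eye(d)          # average Hessian (for reference only)

def newton_direction_estimate(g, depth=300):
    """Estimate H^{-1} g via the recursion est <- g + (I - H_i) est,
    sampling one example per inner step; each step is O(d), never
    forming or inverting a d-by-d matrix."""
    est = g.copy()
    for _ in range(depth):
        x = X[rng.integers(n)]               # one sampled example
        h_est = x * (x @ est) + ridge * est  # (x x^T + ridge I) est in O(d)
        est = g + est - h_est
    return est

g = rng.normal(size=d)
# Average independent runs to reduce the variance of the unbiased estimate
newton = np.mean([newton_direction_estimate(g) for _ in range(100)], axis=0)
```

The recursion's fixed point is H^{-1} g in expectation, so averaging independent runs concentrates around the true Newton direction while every inner step remains linear in d.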

While convex problems allow for global convergence guarantees, recent state-of-the-art models, such as deep neural networks, highlight the need for a better understanding of the non-convex setting. To handle this challenging setting, I will present FastCubic, a Hessian-based method that provably converges to first-order critical points faster than gradient descent while additionally converging to second-order critical points. Finally, I will show how our algorithm FastQuartic combines even higher-order information with “highly smooth acceleration” to guarantee still faster convergence for a large class of convex quartic problems.