Please note a change in location.  The FPO will take place in Friend 101.
__________________________________________________

Brian Bullins will present his FPO, "Efficient Higher-Order Optimization for Machine Learning," on Tuesday, 7/23/2019 at 2pm in Friend 101.


The members of his committee are: Elad Hazan (adviser); Examiners: Mark Braverman and Sanjeev Arora; Readers: Yoram Singer and Yuxin Chen (EE).


A copy of his thesis is available upon request.


Everyone is invited to attend his talk. The talk abstract follows:


In recent years, stochastic gradient descent (SGD) has taken center stage for training large-scale models in machine learning. Although some higher-order methods have improved iteration complexity in theory, the per-iteration costs render them unusable when faced with millions of parameters and training examples.

In this thesis, I will present several works which enable higher-order optimization to be as scalable as first-order methods. The first method is a stochastic second-order algorithm for convex optimization, called LiSSA, which uses Hessian information to construct an unbiased Newton step in time linear in the problem dimension. To bypass the typical efficiency barriers for second-order methods, we harness the empirical risk minimization (ERM) structure in standard machine learning tasks.
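
For intuition, here is a minimal sketch of a LiSSA-style step, which estimates the Newton direction H^{-1} grad with a truncated Neumann series built from Hessian-vector products of randomly sampled loss terms. The names (lissa_newton_step, hvp_sample), the parameter choices, and the toy logistic-regression example are illustrative assumptions, not the exact procedure from the thesis.

import numpy as np

def lissa_newton_step(grad, hvp_sample, depth=30, scale=1.0):
    # Truncated Neumann-series estimate of the Newton step H^{-1} grad,
    # using only Hessian-vector products of randomly sampled loss terms.
    # scale must be chosen so each sampled Hessian has spectral norm <= 1/scale.
    u = grad.copy()
    for _ in range(depth):
        # u_j = grad + (I - scale * H_sample) u_{j-1}; in expectation this
        # approaches (scale * H)^{-1} grad as depth grows.
        u = grad + u - scale * hvp_sample(u)
    return scale * u  # rescale so the result approximates H^{-1} grad

# Usage sketch: ridge-regularized logistic regression on synthetic data.
rng = np.random.default_rng(0)
n, d, lam = 1000, 20, 0.1
X = rng.normal(size=(n, d)) / np.sqrt(d)  # normalized so sampled Hessians stay well-scaled
y = 2 * rng.integers(0, 2, size=n) - 1
w = np.zeros(d)

def full_grad(w):
    p = 1.0 / (1.0 + np.exp(y * (X @ w)))
    return -(X.T @ (y * p)) / n + lam * w

def hvp_sample(v):
    i = rng.integers(n)                      # one sampled training example
    s = 1.0 / (1.0 + np.exp(-X[i] @ w))
    return s * (1.0 - s) * (X[i] @ v) * X[i] + lam * v

for _ in range(20):
    w = w - lissa_newton_step(full_grad(w), hvp_sample)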

While convex problems allow for global convergence, recent state-of-the-art models, such as deep neural networks, highlight the importance of developing a better understanding of non-convex guarantees. In order to handle this challenging setting, I will present FastCubic, a Hessian-based method which provably converges to first-order critical points faster than gradient descent, while additionally converging to second-order critical points.
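
For context, a hedged illustration (a standard construction in this line of work, not necessarily the exact formulation in the thesis): Hessian-based non-convex methods of this kind typically build on the cubic-regularized Newton model. With \rho an upper bound on the Lipschitz constant of the Hessian, each step approximately minimizes

  m_t(y) = f(x_t) + \nabla f(x_t)^\top (y - x_t) + \tfrac{1}{2} (y - x_t)^\top \nabla^2 f(x_t) (y - x_t) + \tfrac{\rho}{6} \|y - x_t\|^3,

and scalability hinges on solving this subproblem using only gradient and Hessian-vector product evaluations rather than explicit Hessians.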

Finally, I will establish how to leverage even higher-order derivative information by means of the FastQuartic algorithm, which achieves faster convergence for both smooth and non-smooth convex optimization problems by combining efficient tensor methods with near-optimal higher-order acceleration.
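
As a generic illustration only (the exact construction and constants in the thesis may differ), p-th order tensor methods replace the quadratic model above with a regularized p-th order Taylor expansion, so each step approximately solves

  x_{t+1} \approx \arg\min_y \; \sum_{i=1}^{p} \tfrac{1}{i!} \nabla^i f(x_t)[y - x_t]^i + \tfrac{M}{p+1} \|y - x_t\|^{p+1},

where M is tied to the Lipschitz constant of the p-th derivative; acceleration schemes then wrap these steps to obtain faster rates, and for p = 3 the regularizer is quartic.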