[talks] Naman Agarwal will present a pre-FPO talk on March 5th at 2:00 pm in CS 401

Barbara A. Mooring bmooring at CS.Princeton.EDU
Tue Feb 27 11:32:03 EST 2018


Naman Agarwal will present a pre-FPO talk on March 5th at 2:00 pm in CS 401

Members of his committee are as follows: 
1. Elad Hazan (adviser and examiner) - Princeton University
2. Sanjeev Arora (examiner) - Princeton University
3. Mark Braverman (examiner) - Princeton University
4. Yoram Singer (reader) - Princeton University
5. Yuxin Chen (reader) - Princeton University

All are invited to attend.

=============

Title: Efficient Second-Order Methods for Machine Learning 

Abstract: With the advent of massive datasets and increasingly complex tasks, modern machine learning systems pose several new efficiency and scalability challenges for optimization methods. Stochastic gradient-based methods form the state of the art in large-scale machine learning optimization due to their extremely efficient per-iteration computational cost. Second-order methods, i.e., methods that use the second derivative of the optimization objective, are known to provably enable faster convergence as well as guarantee better solutions. The latter, however, have been less explored due to the high cost of computing the second-order information. In our research, we study second-order stochastic methods for optimization problems arising in machine learning that match the per-iteration cost of gradient-based methods yet enjoy the faster convergence properties of second-order optimization, overall leading to faster algorithms than the best known gradient-based methods.
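
To make the cost gap concrete, here is a minimal sketch (not from the talk) contrasting one gradient step with one exact Newton step on ridge regression; the data, the regularizer lam, and the step size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 50
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    lam = 1e-2
    w = np.zeros(d)

    # Gradient of the ridge objective (1/2n)||Xw - y||^2 + (lam/2)||w||^2: O(nd) per step
    grad = X.T @ (X @ w - y) / n + lam * w
    w_gd = w - 0.1 * grad                       # first-order (gradient) step

    # A Newton step uses the Hessian (1/n) X^T X + lam*I: O(nd^2 + d^3) to form and solve,
    # but for this quadratic objective a single Newton step lands on the exact minimizer.
    H = X.T @ X / n + lam * np.eye(d)
    w_newton = w - np.linalg.solve(H, grad)     # second-order (Newton) step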

In the first part of the talk, I will describe our algorithm LISSA (Linear Time Second-Order Stochastic Algorithm) for convex machine learning tasks such as ridge and logistic regression. LISSA incorporates second-order information from such objectives while running in time proportional to the sparsity of the underlying input. Further, we extend LISSA with careful matrix sampling techniques based on leverage scores to obtain the best known theoretical running times for optimization in these settings. In the second part of the talk, we will consider non-convex empirical risk minimization objectives, including deep neural networks. I will describe our algorithm FastCubic, an efficient second-order algorithm that provably converges to better local minima faster than known first-order methods.
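
For intuition, below is a rough sketch of the Neumann-series idea behind a LISSA-style Hessian-inverse estimate, not the exact algorithm from the talk. It assumes a regularized logistic-regression objective scaled so the Hessian's eigenvalues lie in (0, 1]; the names hvp_single, lissa_direction, and depth are made up for illustration.

    import numpy as np

    def hvp_single(x_i, w, v, lam):
        # Hessian-vector product of one example's logistic loss plus the L2 term.
        s = 1.0 / (1.0 + np.exp(-x_i @ w))        # sigmoid(x_i . w)
        return s * (1.0 - s) * (x_i @ v) * x_i + lam * v

    def lissa_direction(X, w, grad, lam, depth=100, rng=None):
        # Estimate H^{-1} grad via the recursion u_t = grad + (I - H_i) u_{t-1},
        # where H_i is the Hessian of one randomly sampled example.
        rng = rng or np.random.default_rng(0)
        u = grad.copy()
        for _ in range(depth):
            i = rng.integers(len(X))              # cost per step ~ nonzeros of X[i]
            u = grad + u - hvp_single(X[i], w, u, lam)
        return u                                  # approximate Newton direction H^{-1} grad

    # Usage (illustrative): w_next = w - lissa_direction(X, w, full_gradient, lam)

Each inner step touches only one sampled example, which is where a per-iteration cost proportional to the sparsity of the input comes from.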

