[talks] Wei Hu General Exam Presentation - TODAY Friday, May 25, 2018 1:00 pm CS 331

Barbara A. Mooring bmooring at CS.Princeton.EDU
Fri May 25 09:00:00 EDT 2018


Wei Hu will present his General Exam today, Friday, May 25, 2018, at 1:00 pm in CS 331.

Committee: Sanjeev Arora (adviser), Elad Hazan, Yoram Singer

Title: Advances in Convex and Non-Convex Approaches to Low-Rank Matrix Optimization

Abstract:

Low-rank matrix estimation is a fundamental class of problems with a wide range of applications in machine learning. In this talk, I will present two recent results on the optimization side of such problems, one using a convex approach and the other a non-convex approach.

Faster optimization over trace-norm balls
Minimizing a convex matrix function over a trace-norm ball usually serves as a convex surrogate for low-rank matrix optimization. In this work, we propose a rank-k variant of the classical Frank-Wolfe algorithm to solve this problem. Our algorithm replaces the top singular vector computation (1-SVD) in Frank-Wolfe with a top-k singular vector computation (k-SVD), which can be done by applying 1-SVD k times. Alternatively, our algorithm can be viewed as a rank-k restricted version of projected gradient descent. We show that our algorithm has a linear convergence rate when the objective function is smooth and strongly convex and the optimal solution has rank at most k. This improves the convergence rate and total time complexity of the Frank-Wolfe method and its variants.
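
As a rough illustration of the "rank-k restricted projected gradient descent" view, the sketch below shows one iteration that takes a plain gradient step, keeps only the top-k singular directions of the result, and projects the retained singular values back onto the trace-norm ball. This is a minimal interpretation of that description, not the algorithm from the paper; the toy objective 0.5*||X - M||_F^2, the step size eta, the rank cap k, and the radius theta are placeholder choices.

import numpy as np

def project_l1_ball_nonneg(s, theta):
    """Project nonnegative singular values s onto {x >= 0 : sum(x) <= theta}."""
    if s.sum() <= theta:
        return s
    # standard simplex-projection (soft-thresholding) step
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - theta) / (np.arange(len(u)) + 1) > 0)[0][-1]
    tau = (css[rho] - theta) / (rho + 1)
    return np.maximum(s - tau, 0.0)

def rank_k_pgd_step(X, grad, eta, k, theta):
    """One rank-k restricted projected-gradient step over the trace-norm ball."""
    Y = X - eta * grad                      # plain gradient step
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]   # k-SVD: keep only the top-k directions
    s = project_l1_ball_nonneg(s, theta)    # project singular values onto the ball
    return (U * s) @ Vt

# toy usage on f(X) = 0.5 * ||X - M||_F^2 (placeholder objective)
M = np.random.randn(30, 20)
X = np.zeros_like(M)
for _ in range(50):
    grad = X - M                            # gradient of 0.5 * ||X - M||_F^2
    X = rank_k_pgd_step(X, grad, eta=0.5, k=5, theta=10.0)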

Implicit regularization of gradient descent on learning multi-layer homogeneous models
We consider directly using gradient descent to solve the low-rank matrix factorization problem, i.e., minimizing ||UV - M||_F^2 over the two factors U and V simultaneously. This problem, together with many multi-layer models such as deep neural networks with ReLU activation, has the property of homogeneity. A direct consequence of homogeneity is that a solution can have a small function value while being unbounded, which poses significant difficulty in analyzing first-order optimization methods such as (stochastic) gradient descent. In this work, we prove that for a large class of deep homogeneous models, gradient descent with infinitesimal step size keeps the differences between the squared L2 norms of different layers invariant, so the layers are automatically balanced. Inspired by this finding, we show that for matrix factorization, gradient descent with random initialization and positive step size converges to a bounded global minimum.
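
As an informal check of this balancedness property (a sketch inspired by the statement above, not code from the paper; the dimensions, step size, and random target M below are arbitrary choices), one can run gradient descent with a small positive step size on ||UV - M||_F^2 and track ||U||_F^2 - ||V||_F^2. Under exact gradient flow this difference is conserved; with a small step size it drifts only slightly relative to the layer norms themselves, and the drift shrinks as the step size decreases.

import numpy as np

rng = np.random.default_rng(0)
m, r, n = 20, 5, 15
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r target
U = 0.1 * rng.standard_normal((m, r))                           # small random init
V = 0.1 * rng.standard_normal((r, n))

eta = 1e-3                                    # small step size mimics gradient flow
balance_0 = np.sum(U**2) - np.sum(V**2)       # initial imbalance between the layers

for _ in range(20000):
    R = U @ V - M                             # residual
    gU, gV = 2 * R @ V.T, 2 * U.T @ R         # gradients of ||UV - M||_F^2
    U, V = U - eta * gU, V - eta * gV

balance_T = np.sum(U**2) - np.sum(V**2)
print("final loss          :", np.sum((U @ V - M) ** 2))
print("||U||_F^2, ||V||_F^2:", np.sum(U**2), np.sum(V**2))
print("imbalance drift     :", abs(balance_T - balance_0))      # small vs. the norms above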

The first result is joint work with Zeyuan Allen-Zhu (MSR), Elad Hazan (Princeton) and Yuanzhi Li (Princeton). The second result is joint work with Simon S. Du (CMU) and Jason D. Lee (USC).


Barbara A. Mooring
Interim Graduate Coordinator
Computer Science Department
Princeton University
_______________________________________________
talks mailing list
talks at lists.cs.princeton.edu
To edit subscription settings or remove yourself, use this link:
https://lists.cs.princeton.edu/mailman/listinfo/talks

