Wei Hu will present his General Exam Presentation on TODAY Friday, May 25, 2018 at 1:00 pm in CS 331. Committee: Sanjeev Arora (adviser), Elad Hazan, Yoram Singer Title: Advances in Convex and Non-Convex Approaches to Low-Rank Matrix Optimization Abstract: Low-rank matrix estimation is a fundamental class of problems that find a wide range of applications in machine learning. In this talk, I will present two recent results on the optimization side of such problems, one using a convex approach and another using a non-convex approach. Faster optimization over trace-norm balls Minimizing a convex matrix function over a trace-norm ball usually serves as a convex surrogate to low-rank matrix optimization. In this work, we propose a rank-k variant of the classical Frank-Wolfe algorithm to solve this problem. Our algorithm replaces the top singular vector computation (1-SVD) in Frank-Wolfe with a top-k singular vector computation (k-SVD), which can be done by repeatedly applying 1-SVD k times. Alternatively, our algorithm can be viewed as a rank-k restricted version of projected gradient descent. We show that our algorithm has a linear convergence rate when the objective function is smooth and strongly convex, and the optimal solution has rank at most k. This improves the convergence rate and the total time complexity of the Frank-Wolfe method and its variants. Implicit regularization of gradient descent on learning multi-layer homogeneous models We consider directly using gradient descent to solve the low-rank matrix factorization problem, i.e., minimizing ||UV-M||_F^2 over the two factors U and V simultaneously. This problem, together with many multi-layer models such as deep neural networks with ReLU activation, have the property of homogeneity. A direct consequence of homogeneity is that a solution can produce small function value while being unbounded, which poses significant difficulty in analyzing first-order optimization methods like (stochastic) gradient descent. In this work, we prove that for a large class of deep homogeneous models, gradient descent with infinitesimal step size effectively enforces the differences between the squared L2 norms across different layers to remain invariant, and thus different layers are automatically balanced. Inspired by this finding, we show that for matrix factorization, gradient descent with random initialization and positive step size converges to a bounded global minimum . The first result is joint work with Zeyuan Allen-Zhu (MSR), Elad Hazan (Princeton) and Yuanzhi Li (Princeton). The second result is joint work with Simon S. Du (CMU) and Jason D. Lee (USC). Barbara A. Mooring Interim Graduate Coordinator Computer Science Department Princeton University _______________________________________________ talks mailing list talks@lists.cs.princeton.edu To edit subscription settings or remove yourself, use this link: https://lists.cs.princeton.edu/mailman/listinfo/talks