Abhishek Panigrahi will present his General Exam "Demystifying Gradient Descent in modern Deep Learning: Implicit training biases and Modular Generalization" on Wednesday, January 18, 2023 at 3:00 PM over Zoom.
Committee Members: Sanjeev Arora (advisor), Elad Hazan, Danqi Chen
Abstract:
Modern deep learning involves training large-scale neural networks, which comes at the cost of deciding on the best training recipe. Traditional machine learning fails to explain the hidden mechanisms of such models, owing to the high non-convexity of the model landscape. My research focuses on the training-time interplay between training algorithms and architecture that drives the generalization of these models. In this talk, I will focus on Gradient Descent and two of its novel mechanisms: (a) the Edge of Stability in deep learning, where the interplay between the learning rate and the model landscape leads to implicit regularization of Hessian flatness during training, and (b) modular skill acquisition, which drives generalization during language-model fine-tuning.
Everyone is invited to attend the talk, and faculty who wish to remain for the oral exam that follows are welcome to do so.