Alexander Wettig will present his General Exam "Towards a Better Understanding of Language Pre-training and Fine-tuning" on Thursday, May 18, 2023 at 3:00 PM in FC 114 and via Zoom.

Zoom link: https://princeton.zoom.us/j/93571660087

Committee Members: Danqi Chen (advisor), Karthik Narasimhan, Sanjeev Arora

Abstract:
The paradigm of pre-training and fine-tuning has become a cornerstone of Natural Language Processing (NLP) and achieves state-of-the-art performance on a wide range of tasks. In this talk, we will delve into key aspects of masked language model (MLM) pre-training and fine-tuning dynamics.

In the first part, we will challenge the conventional wisdom surrounding the masking rate in MLM pre-training. Traditionally, a 15% masking rate has been widely adopted, but we show that this is not universally optimal: 40% can outperform 15% under an efficient pre-training recipe. Surprisingly, even an 80% masking rate can still maintain 95% of fine-tuning performance and acquire non-trivial linguistic features. We explore how the optimal masking rate depends on the model size and the masking strategy, and shed light on the distinct effects of increasing the masking rate on task difficulty and optimization.

In the second part, we will investigate whether kernel-based dynamics can describe the fine-tuning of pre-trained language models, particularly in few-shot settings. Inspired by the Neural Tangent Kernel (NTK), our analysis characterizes the conditions under which gradient features can capture the fine-tuning updates. We conduct experiments across 14 NLP tasks and demonstrate that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning. The kernel view also provides an explanation for the success of parameter-efficient, subspace-based fine-tuning methods.

Reading List: https://docs.google.com/document/d/1HCyyShO_2ilodDbv9ibG2_84m3tkvatKHOlavLon...

Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam following are welcome to do so.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014