**Please note, lunch will be available at 12:00pm in the Friend Center Convocation room before the talk.**

CS Colloquium Series

Speaker: Luke Zettlemoyer, University of Washington
Date: Thursday, November 17
Time: 12:30pm
Location: Friend Center Convocation room
Host: Danqi Chen

Title: Large Language Models: Will they keep getting bigger? And, how will we use them if they do?

Abstract: The trend of building ever larger language models has dominated much research in NLP over the last few years. In this talk, I will discuss our recent efforts to (at least partially) answer two key questions in this area: Will we be able to keep scaling? And, how will we actually use the models if we do? I will cover our recent efforts on learning new types of sparse mixture-of-experts (MoE) models. Unlike model-parallel algorithms for learning dense models, which are very difficult to further scale with existing hardware, our sparse approaches have significantly reduced cross-node communication costs and could possibly provide the next big leap in performance, although finding a version that scales well in practice remains an open challenge. I will also present our recent work on prompting methods that better control for surface-form variation, to improve performance of models that are so big we can only afford to do inference, with little to no task-specific fine-tuning. Finally, time permitting, I will discuss work on new forms of supervision for language model training, including learning from the hypertext and multi-modal structure of web pages to provide new signals for both learning and prompting the model. Together, these methods represent our best guesses for how to keep the scaling trend alive as we move forward to the next generation of NLP models.
This talk describes work done at the University of Washington and Meta, primarily led by Armen Aghajanyan, Suchin Gururangan, Ari Holtzman, Mike Lewis, Margaret Li, Sewon Min, and Peter West.

Bio: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Director at Meta. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. His honors include being named an ACL Fellow as well as winning a PECASE award, an Allen Distinguished Investigator award, and multiple best paper awards. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.

This talk will be live streamed on Princeton University Media Central: https://mediacentrallive.princeton.edu/