Speaker: Noah Golowich, Massachusetts Institute of Technology
Date: Monday, March 3
Time: 12:30pm EST
Location: CS 105
Host: Elad Hazan
Event page: https://www.cs.princeton.edu/events/theoretical-foundations-multi-agent-learning
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_E0TDd9QdQfi2XAdSueJ_iQ
Title: Theoretical Foundations for Multi-Agent Learning
Abstract: As learning algorithms become increasingly capable of acting autonomously, it is important to better understand the behavior that results from their interactions. For example, a pervasive challenge in multi-agent learning settings, spanning both theory and practice and dating back decades, has been the failure of iterative algorithms such as gradient descent to converge. Accordingly, a longstanding central question with broad relevance is: how quickly can we compute solution concepts, i.e., equilibria, in multi-agent settings?
I will discuss results that address this question at a variety of levels, starting from foundational settings involving normal-form games and building up to complex problems, such as multi-agent reinforcement learning, that more aptly model realistic situations. First, I will present a result establishing a near-optimal convergence rate for a simple online learning algorithm in normal-form games, resolving a decade-long line of work that gave suboptimal bounds. I will then discuss a new algorithm for minimizing swap regret exponentially faster than previous approaches. This algorithm lets us resolve several open questions, for example by establishing the first PTAS for computing correlated equilibria in extensive-form games.
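The abstract above mentions simple online learning algorithms in normal-form games. As background only (this is a classical textbook dynamic, not the speaker's algorithm or rates), here is a minimal self-play sketch of multiplicative weights in rock-paper-scissors, where the time-averaged strategies approach the Nash equilibrium even though the iterates themselves cycle:

```python
import numpy as np

# Payoff matrix for rock-paper-scissors, from the row player's perspective.
# The game is zero-sum: the column player receives the negation.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def multiplicative_weights_selfplay(A, x0, y0, T=20000, eta=0.01):
    """Simultaneous multiplicative-weights updates for both players.

    Returns the time-averaged strategies, which converge to a Nash
    equilibrium in zero-sum games even though the iterates cycle.
    """
    n, m = A.shape
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x_sum += x
        y_sum += y
        gx = A @ y          # row player's payoff for each pure action
        gy = -(A.T @ x)     # column player's payoff for each pure action
        x = x * np.exp(eta * gx)
        y = y * np.exp(eta * gy)
        x /= x.sum()
        y /= y.sum()
    return x_sum / T, y_sum / T

# Uniform-vs-uniform is already the equilibrium, so start the row player off it.
x_avg, y_avg = multiplicative_weights_selfplay(A, [0.8, 0.1, 0.1], [1/3, 1/3, 1/3])
# Both averages approach the unique Nash equilibrium (1/3, 1/3, 1/3).
```

In general-sum games, such averaged no-regret dynamics are only guaranteed to approach coarse correlated equilibria, which is part of what makes the convergence questions in the talk nontrivial.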
Beyond contending with agents' differing incentives, the increasing use of machine learning algorithms presents other challenges, such as the proliferation of AI-generated content. In the latter part of the talk, I will discuss an approach to detect such content via watermarking. Our watermarking scheme is the first to embed a watermark in a language model's output in a way which only leads to negligible changes in the distribution of the output but which is robust to adversarial edits.
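For readers unfamiliar with language-model watermarking, here is a toy sketch of one classical "distortion-free" construction, the Gumbel-trick scheme; this is background for the abstract's claims, not the scheme from the talk, and all names in it are made up. Keying the pseudorandomness on absolute position, as below, is a simplification; schemes meant to be robust to edits key on context instead.

```python
import hashlib
import math

def token_scores(key: str, position: int, vocab_size: int):
    """One pseudorandom uniform in (0,1) per vocabulary token, derived
    from a secret key and the position via a hash."""
    scores = []
    for tok in range(vocab_size):
        digest = hashlib.sha256(f"{key}|{position}|{tok}".encode()).digest()
        u = int.from_bytes(digest[:8], "big") / 2**64
        scores.append(min(max(u, 1e-12), 1.0 - 1e-12))
    return scores

def watermarked_sample(probs, key: str, position: int) -> int:
    """Gumbel-trick sampling: pick argmax_i r_i ** (1/p_i).

    Over the randomness of the key, token i is selected with probability
    exactly p_i, so each token's marginal distribution is unchanged."""
    r = token_scores(key, position, len(probs))
    best_tok, best_val = 0, -math.inf
    for i, p in enumerate(probs):
        if p <= 0.0:
            continue
        val = math.log(r[i]) / p   # monotone transform of r_i ** (1/p_i)
        if val > best_val:
            best_tok, best_val = i, val
    return best_tok

def detection_score(tokens, key: str, vocab_size: int) -> float:
    """Sum of -log(1 - r) over each observed token's pseudorandom score.
    Roughly len(tokens) in expectation for unwatermarked text, and much
    larger for watermarked text, since sampling favors high-r tokens."""
    total = 0.0
    for position, tok in enumerate(tokens):
        r = token_scores(key, position, vocab_size)[tok]
        total -= math.log(1.0 - r)
    return total
```

The detector needs only the secret key and the text, not the model; achieving negligible distributional change across whole outputs together with robustness to adversarial edits, as claimed in the abstract, requires substantially more care than this sketch.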
Bio: Noah Golowich is a PhD Student at the Massachusetts Institute of Technology, advised by Constantinos Daskalakis and Ankur Moitra. His research interests lie in theoretical machine learning, with a particular focus on the connections between multi-agent learning, game theory, online learning, and theoretical reinforcement learning. He has also worked on CS theory more broadly, including on problems in differential privacy, learning theory, and watermarking. He was supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Fellowship.
Speaker: Rose Wang, Stanford University
Date: Tuesday, March 4
Time: 12:30pm EST
Location: CS 105
Host: Tom Griffiths
Event page: https://www.cs.princeton.edu/events/scaling-expertise-language-models-applications-education
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_8AaPm0K_RCG9PBMzxprLpg
Title: Scaling Expertise via Language Models: With Applications to Education
Abstract: Access to expertise shapes how individuals learn, develop, and succeed across society. For example, in education, experienced teachers teach students and train novice educators through effective interactions. However, access to expertise is limited, undermining learning at scale. While language models promise to democratize access, they often mimic surface-level patterns and lack the human touch needed to support learners through challenges. In this talk, I will present novel computational methods and interventions that embed expert-like thinking into language models and empower human novices in real-time interactions. First, I will present Bridge, an adaptation method that extracts latent expert reasoning to adapt language models for complex interactions. Then, I will introduce Tutor CoPilot, a novel Human-AI approach that provides expert-like guidance to tutors in real time. In the first randomized controlled trial of a Human-AI system for live tutoring, Tutor CoPilot significantly improves the quality of learning interactions for 900 tutors and 1,800 K-12 students from underserved communities.
Bio: Rose E. Wang is a Computer Science PhD candidate at Stanford University. She develops algorithms, benchmarks, and large-scale interventions to tackle challenges in real-world interactions, with a focus on education. Her work is deployed in industry and directly improves the education of underserved students through partnerships she has cultivated during her Ph.D., including Title I school districts and several education companies, impacting 200,000+ students, 1,700+ teachers, and 16,100+ tutors in millions of tutoring sessions across the U.S., UK, and India. Her work has been recognized by the 2025 Economic Report of the President, an NSF Graduate Research Fellowship, a CogSci Best Paper Award, a NeurIPS Cooperative AI Best Paper Award, an ICLR Oral, Rising Star in Data Science, the Building Educational Applications Ambassador Paper Award, and the Learning Engineering Tools Competition Award.
Speaker: Ruiqi Zhong, University of California, Berkeley
Date: Thursday, March 6
Time: 12:30pm EST
Location: CS 105
Host: Danqi Chen
Event page: https://www.cs.princeton.edu/events/building-expert-level-language-models-decomposed-weak-validations
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_05z1hr3MQKalcALbEK_clg
Title: Building Expert-Level Language Models from Decomposed Weak Validations
Abstract: Language models (LMs) can process large volumes of information and perform complex reasoning. They hold the promise of executing expert-level tasks, such as brainstorming scientific hypotheses or developing complex software. However, building these LMs requires humans to validate their outputs, which is challenging; e.g., developers cannot easily validate whether complex software is bug-free. If our validation is fallible, LMs may learn to "hack" the validators, convincing us that they are right even when they are wrong.
To address this, I show how to decompose complex validation tasks into "weaker" ones that are easier for humans or LMs: e.g., validating return values rather than entire programs, or validating discoveries on individual samples rather than on entire datasets. Through several examples, I show how these techniques allow us to use LMs for expert-level tasks more reliably. Looking forward, I discuss how to use LMs to automate these task decompositions, and how we can use these frameworks to monitor both individual AI systems and their broader impact within society.
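As a toy illustration of validating return values rather than entire programs (my example, not the speaker's framework): simple property checks on outputs can catch a subtly buggy "generated" sorting function without anyone reading its code.

```python
import random

def weak_validate_sort(candidate_sort, trials=200, seed=0):
    """Weak validation: never read the implementation; instead, check
    easy-to-verify properties of its return values on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 12))]
        out = candidate_sort(list(xs))
        # Property 1: the output is in nondecreasing order.
        if any(a > b for a, b in zip(out, out[1:])):
            return False
        # Property 2: the output is a permutation of the input.
        if sorted(xs) != sorted(out):
            return False
    return True

# Two hypothetical "LM-generated" candidates: one correct, one subtly wrong.
def good_sort(xs):
    return sorted(xs)

def bad_sort(xs):
    return sorted(set(xs))   # looks plausible, but silently drops duplicates
```

The same principle scales up: a reviewer who cannot audit a whole program can still spot-check return values, and a reviewer who cannot audit a dataset-level discovery can still validate it on individual samples.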
Bio: Ruiqi Zhong is a final-year Ph.D. student at UC Berkeley, co-advised by Jacob Steinhardt and Dan Klein. He was previously a part-time member of technical staff at Anthropic, where he worked on the automated red-teaming team. His research is at the intersection of machine learning and NLP, and he develops language model systems to advance the frontier of human capabilities. He developed the earliest prototype of instruction-tuning, and his research contributions have been scaled up by leading language model companies such as Google, OpenAI, and Anthropic.