Rohit Agarwal will present his General Exam "AI Alignment in Agentic Pipelines via Incentives and Correction" on Friday, May 15, 2026 at 2:00 PM in CS 301 and via Zoom.
Zoom link: https://princeton.zoom.us/j/7714894406
Committee Members: Elad Hazan (advisor), Sanjeev Arora, Peter Henderson
Abstract:
We propose a mechanism-design perspective on AI alignment in agentic pipelines. The key observation is that alignment in a multi-agent setting is a fixed-point problem: penalizing misbehavior may deter a solver, but it can also reduce an auditor's incentive to monitor, since auditing then mainly incurs cost on a population that already appears aligned. Thus, robust alignment requires not only discouraging bad behavior, but also preserving incentives for oversight. We also discuss applications of this viewpoint in real-world large language model settings: in coding, through an empirical reinforcement learning setup, and in mathematics, through empirical agentic pipeline design.
Reading List:
https://docs.google.com/document/d/1Ji1xpqK-0S5saeVjwodH07aXmU_iQo0XvYYVTcVIbgw/edit?usp=sharing
Everyone is invited to attend the talk, and faculty who wish to remain for the oral exam that follows are welcome to do so.