Committee Members:
Abstract:
Powerful large language models (LLMs) enable agentic systems that autonomously interact with their environments across numerous downstream applications, such as deep research and software engineering. Due to the complex, long-horizon nature of these tasks, agentic systems need to scale effectively to extremely long trajectories with many steps. Thus, the building block of these systems—the underlying LLMs—must be capable of both processing long sequences and interacting with environments through tool use.
In this talk, I will first establish the foundation for these systems by developing new evaluations for long-context language models (LCLMs) and retrieval systems, a key tool in many agentic systems. I introduce HELMET, a holistic and robust benchmark for LCLMs, and show that diverse evaluations provide more reliable signals for developing LCLMs and for understanding how they perform in real settings. Then, I propose reasoning-intensive retrieval, which extends the information retrieval task beyond lexical and semantic matching. Reasoning-intensive retrieval requires in-depth reasoning to identify relevant documents and enables agentic systems to tackle more complex problems.
Reading List:
https://docs.google.com/document/d/1XH4QblNAg9rEaeFWuCDBTcgl6iGbRJ6jgLRaT4RrXWo/edit?usp=sharing
Everyone is invited to attend the talk, and faculty wishing to remain for the oral exam that follows are welcome to do so.