
Howard Yen will present his General Exam "Long-Context Language Models for Long-Horizon Agents" on Friday, October 10, 2025 at 2:00 PM in CS 401.

Committee Members: Danqi Chen (advisor), Tri Dao, Karthik Narasimhan

Abstract:
Agentic systems that autonomously interact with their environments extend powerful large language models (LLMs) to numerous downstream applications, such as deep research and software engineering. Due to the complex, long-horizon nature of these tasks, agentic systems need to scale effectively to extremely long trajectories with many steps. Thus, the building blocks of these systems, the underlying LLMs, must be capable of both processing long sequences and interacting with environments through tool use. In this talk, I will first establish the foundation for these systems by developing new evaluations for long-context language models (LCLMs) and retrieval systems, a key tool in many agentic systems. I introduce HELMET, a holistic and robust benchmark for LCLMs, and show that diverse evaluations provide more reliable signals for developing LCLMs and for understanding how they perform in real settings. Then, I propose reasoning-intensive retrieval, which extends the information retrieval task beyond lexical and semantic matching. Reasoning-intensive retrieval requires in-depth reasoning to identify relevant documents and enables agentic systems to tackle more complex problems. I will conclude with context management for long-horizon agentic search, where agents must exhaustively search and reason over many sources to solve complex tasks. I first develop an automated analysis pipeline and error taxonomy to better understand long-horizon agentic systems and how they fail, finding that context mismanagement is one of the major failure modes. Then, I design a simple framework, SLIM, that enables these systems to scale to long trajectories through effective context management.

Reading List: https://docs.google.com/document/d/1XH4QblNAg9rEaeFWuCDBTcgl6iGbRJ6jgLRaT4Rr...

Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam following are welcome to do so.