
Jiayi Geng will present her MSE thesis "Understanding the Failure Modes in Large Language Models" on 4/16 at 2pm in Friend 005.

Date & Time: April 16th, 2:00–2:30 PM
Location: Friend Center, Room 005
Thesis Title: Understanding the Failure Modes in Large Language Models
Thesis Advisers: Danqi Chen, Tom Griffiths

Presentation Abstract: Using AI to create autonomous researchers has the potential to accelerate discovery. A prerequisite for this vision to become reality is assessing how well an AI model can identify the underlying structure of a system from its behavior. In this paper, we explore the question of whether a large language model (LLM) can learn from passive observations and actively collect informative data to refine its own hypotheses. To answer this question, we investigate the ability of LLMs to reverse-engineer four types of black-box systems, chosen to represent problems that might appear in different domains of research: list-mapping programs, formal languages, mathematical equations, and board games. We use Bayesian agents as a normative reference to study the gap between LLMs and ideal inference in reverse-engineering black-box systems. Through extensive experiments, we show that while LLMs have difficulty reverse-engineering these systems from observations alone, data generated by LLM-driven interventions can be effective. By testing edge cases, the LLM is able to refine its own hypotheses and avoid failure modes such as overcomplication, where the LLM falsely assumes prior knowledge about the black box, and overlooking, where the LLM fails to incorporate observations. These insights provide practical guidance for helping LLMs more effectively reverse-engineer black-box systems, supporting their use in making new discoveries.