Liv d'Aliberti will present their General Exam "Learning Machine Behavior: Measuring Exploration, Stability, and Alignment in Learned Agents" on Tuesday, May 26, 2026 at 1:00 PM in Sherrerd 306.

Committee Members: Manoel Horta Ribeiro (advisor), Peter Henderson, Tom Griffiths

Abstract:

Modern AI systems increasingly behave less like static predictors and more like learned agents: they reason over multiple steps, explore alternative responses, adapt through feedback, and participate in human institutions. As these systems become more capable and socially consequential, evaluation must move beyond average benchmark performance. We need tools for asking behavioral questions: Which behaviors are actually present? Are they stable across prompts, samples, and training runs? Do they reflect a genuine learned capability, or an artifact of prompting, sampling, optimization, or interpretation?

This talk centers on “The Illusion of Insight in Reasoning Models,” which studies whether large reasoning models exhibit intrinsic “Aha!” moments. Rather than treating insight as self-evident from a reasoning trace, I frame it as a behavioral construct that must be operationalized, measured, and tested. The goal is not to deny that reasoning models exhibit interesting behavior, but to separate robust behavioral evidence from compelling anthropomorphic stories. This perspective draws on cognitive science and statistical evaluation: claims about model cognition require construct validity, robustness checks, and careful consideration of alternatives to surface-level interpretation.

I then connect this measurement problem to broader questions in reinforcement learning (RL) and alignment. Ongoing work on behavior-consistent RL asks whether agents with similar returns actually learn similar behaviors across random seeds, while work on exploration in large language model (LLM) fine-tuning asks whether diversity should enter learning at the level of tokens, prompts, or candidate solutions. Together, these projects point toward a science of learned machine behavior: one that rigorously tests behavioral claims, identifies when evaluation artifacts masquerade as capabilities, and designs learning algorithms whose behavior is more reliable, interpretable, and aligned with human interests.

Reading List:

https://docs.google.com/document/d/1mSkrxB-SvCrvIhsne3VDMqYFxf3lMbCkg8z8QM3TkWk/edit?usp=sharing

Everyone is invited to attend the talk, and faculty who wish to remain for the oral exam that follows are welcome to do so.