The time and location for this General Exam have been updated to 10:00 AM in CS 301. Sorry for any confusion.

Berlin Chen will present his General Exam "Inference Efficiency for Sub-quadratic Models" on Friday, January 30, 2026 at 10:00 AM in CS 301 and via zoom.

Committee Members: Tri Dao (advisor), Elad Hazan, Kai Li

Abstract:

Recent progress in AI has seen a paradigm shift toward test-time compute, where LLM inference commands an increasingly large share of the compute budget. This shift presents a new opportunity for model designs tailored to the computational characteristics of inference. In this talk, I will present recent work on improving the decoding efficiency of Mamba, a subquadratic model family based on State Space Models (SSMs). In particular, I will highlight a key challenge that prevents subquadratic models from being hardware-efficient during decoding. I will then propose an adjustment to the model that addresses this challenge and demonstrate its key advantages from an inference-first perspective. I will further motivate the change by connecting it to classical SSMs. Drawing on this connection, I will highlight two additional architectural adjustments to Mamba that are naturally motivated by classical SSM formulations and demonstrate their roles in improving model quality.
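As background for the abstract's mention of SSM decoding: a minimal sketch of the per-token recurrence that makes SSM decoding cheap, since each step carries only a fixed-size hidden state rather than the full token history. The shapes, names, and diagonal-SSM simplification here are my own illustration, not the formulation from the talk:

```python
import numpy as np

def ssm_decode_step(h, x_t, A, B, C):
    """One recurrent decoding step of a diagonal linear SSM
    (an illustrative simplification, not Mamba's exact recurrence).

    h:   (d, n) hidden state carried across decoding steps
    x_t: (d,)   current token's input features
    A:   (d, n) diagonal state-transition coefficients
    B:   (n,)   input-projection coefficients
    C:   (n,)   output-projection coefficients
    """
    # State update: elementwise decay of the old state plus an
    # outer-product injection of the current input.
    h = A * h + np.outer(x_t, B)
    # Readout: contract the state dimension to produce output features.
    y_t = h @ C
    return h, y_t

# Decoding cost per token is independent of sequence length:
# only the fixed-size state h is carried between steps.
d, n = 4, 8
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, size=(d, n))
B = rng.standard_normal(n)
C = rng.standard_normal(n)
h = np.zeros((d, n))
for t in range(16):
    h, y = ssm_decode_step(h, rng.standard_normal(d), A, B, C)
```

The contrast with attention is that an attention decoder must attend over a KV cache that grows with the sequence, whereas the state h above stays a fixed size no matter how many tokens have been decoded.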

Reading List:

Everyone is invited to attend the talk; faculty wishing to remain for the oral exam that follows are welcome to do so.