Berlin Chen will present his General Exam, "Inference Efficiency for Sub-quadratic Models," on Friday, January 30, 2026, at 11:00 AM in Friend 108 and via Zoom.

Zoom link: https://princeton.zoom.us/j/92265989878

Committee Members: Tri Dao (advisor), Elad Hazan, Kai Li

Abstract:
Recent progress in AI has seen a paradigm shift toward test-time compute, in which LLM inference commands an increasingly large share of the compute budget. This shift presents a new opportunity for model designs tailored to the computational characteristics of inference. In this talk, I will present recent work on improving the decoding efficiency of Mamba, a sub-quadratic model based on State Space Models (SSMs). In particular, I will highlight a key challenge that prevents sub-quadratic models from being hardware-efficient during decoding. I will then propose an adjustment to the model that addresses this challenge and demonstrate its key advantages from an inference-first perspective. I will further motivate the change by connecting it to classic SSMs. Drawing on this connection, I will highlight two additional architectural adjustments to Mamba that are naturally motivated by classic SSM formulations and demonstrate their roles in improving model quality.

Reading List: https://docs.google.com/document/d/1T2z9TI_StxTt9tMNawjSo5842qr8JSBP2zgl7oS3...

Everyone is invited to attend the talk, and faculty wishing to remain for the oral exam that follows are welcome to do so.