Berlin Chen will present his General Exam "Inference Efficiency for Sub-quadratic Models" on Friday, January 30, 2026 at 11:00 AM in Friend 108 and via Zoom.

Zoom link: https://princeton.zoom.us/j/92265989878

Committee Members: Tri Dao (advisor), Elad Hazan, Kai Li

Abstract: Recent progress in AI has witnessed a paradigm shift toward test-time compute, where LLM inference commands an increasingly large share of the compute budget. This shift presents a new opportunity for model design tailored to the computational characteristics of inference. In this talk, I will present recent work on improving the decoding efficiency of Mamba, a subquadratic model based on State Space Models (SSMs). In particular, I will highlight a key challenge preventing subquadratic models from being hardware-efficient during decoding. I will then propose an adjustment to the model that addresses this challenge and demonstrate its key advantages from an inference-first perspective. I will further motivate the change by connecting it to classic SSMs. Drawing on this connection, I will highlight two additional architectural adjustments to Mamba that are naturally motivated by classic SSM formulations and demonstrate their roles in improving model quality.

Reading List: https://docs.google.com/document/d/1T2z9TI_StxTt9tMNawjSo5842qr8JSBP2zgl7oS3...

Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam that follows are welcome to do so.
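For readers unfamiliar with why SSM-based models such as Mamba are attractive for decoding, the following is a minimal illustrative sketch (not from the talk, and not Mamba's actual parameterization): each decode step updates a fixed-size recurrent state, so per-token cost is constant in sequence length, unlike attention, whose per-token cost grows with the cached context.

```python
# Toy linear SSM decode loop: h' = A @ h + B * x_t, y_t = C @ h'.
# The state h has a fixed size n, independent of how many tokens
# have been processed -- this is the source of subquadratic decoding.
import numpy as np

def ssm_decode_step(h, x_t, A, B, C):
    """One recurrent decode step with scalar input x_t."""
    h = A @ h + B * x_t   # fixed-size state update
    y_t = C @ h           # readout
    return h, y_t

n = 4                                 # state size (hypothetical, for illustration)
rng = np.random.default_rng(0)
A = 0.9 * np.eye(n)                   # toy stable state matrix
B = rng.standard_normal(n)
C = rng.standard_normal(n)

h = np.zeros(n)
for x_t in [1.0, 0.5, -0.2]:          # stream tokens one at a time
    h, y_t = ssm_decode_step(h, x_t, A, B, C)
# h is still size n after any number of tokens; no growing KV cache.
```

This is only a schematic of the recurrence; real selective SSMs make A, B, and C input-dependent, which is where the hardware-efficiency questions discussed in the talk arise.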
The time and location for this General Exam have been updated to 10:00 AM in CS 301. Sorry for any confusion.

Berlin Chen will present his General Exam "Inference Efficiency for Sub-quadratic Models" on Friday, January 30, 2026 at 10:00 AM in CS 301 and via Zoom.

Zoom link: https://princeton.zoom.us/j/92265989878