New subject: UPDATE: Sowmya Thanvantri will present her General Exam "Generalized Combinations of Experts in Multi-Head Attention" on Monday, April 27, 2026 at 2:30 PM in CS 401.

23 Apr 2026

Sowmya Thanvantri will present her General Exam "Generalized Combinations of Experts in Multi-Head Attention" on Monday, April 27, 2026 at 1:30 PM in CS 401.

Committee Members: Ryan Adams (advisor), Tom Griffiths, Karthik Narasimhan 

Abstract: 
The success of large language models today is attributed largely to the attention mechanism used in transformers, which assigns a measure of uncertainty over other tokens. This idea was extended to multi-head attention, allowing models to learn several different distributions of uncertainty over tokens. However, multi-head attention does not take full advantage of the different distributions because the outputs of the heads are concatenated and passed through a linear map, which prevents learning more expressive mappings. To enhance expressivity, a mixture model could aggregate information between these distributions. Alternatively, taking a Bayesian perspective, a product-based model across distributions could lead to a more accurate representation. To achieve both of these, we draw on ideas from mixture of experts (MoE) and product of experts (PoE) and propose a generalized combination of expert heads in multi-head attention to allow models to learn a richer representation of attention. 

Reading List: 
https://docs.google.com/document/d/1PaOPsZcP-rYhBvUYQBqSR-c_BAB3HIYI4rGCSE7J... 

Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam following are welcome to do so.

CS Grad Department

gradinfo@cs.princeton.edu
