The time for this General Exam has been updated to 2:30 PM. Other details remain the same.
Sowmya Thanvantri will present her General Exam "Generalized Combinations of Experts in Multi-Head Attention" on Monday, April 27, 2026 at 2:30 PM in CS 401.
Committee Members: Ryan Adams (advisor), Tom Griffiths, Karthik Narasimhan
Abstract:
The success of large language models today is largely attributed to the attention mechanism used in transformers, which assigns a measure of uncertainty (a probability distribution) over other tokens. Multi-head attention extends this idea, allowing models to learn several different distributions over tokens. However, multi-head attention does not take full advantage of these distributions: the outputs of the heads are concatenated and passed through a linear map, which prevents the model from learning more expressive mappings. To enhance expressivity, a mixture model could aggregate information across these distributions. Alternatively, from a Bayesian perspective, a product-based model across distributions could yield a more accurate representation. To achieve both, we draw on ideas from mixture of experts (MoE) and product of experts (PoE) and propose a generalized combination of expert heads in multi-head attention that allows models to learn a richer representation of attention.
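As a rough illustration of the two combination styles the abstract contrasts (not the method proposed in the talk), the sketch below assumes NumPy, a single query, and hypothetical helpers `head_distributions`, `mixture_of_heads`, and `product_of_heads`. It computes one attention distribution per head, then combines them either as a weighted mixture (arithmetic mean) or as a normalized weighted product (geometric mean, in the spirit of a product of experts).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def head_distributions(q, k, num_heads):
    """Per-head scaled dot-product attention distributions for one query.
    q: (d,), k: (T, d). Returns an (H, T) array whose rows sum to 1.
    Hypothetical helper for illustration; assumes d is divisible by num_heads."""
    d = q.shape[-1]
    dh = d // num_heads
    qh = q.reshape(num_heads, dh)                # (H, dh)
    kh = k.reshape(k.shape[0], num_heads, dh)    # (T, H, dh)
    scores = np.einsum("hd,thd->ht", qh, kh) / np.sqrt(dh)
    return softmax(scores, axis=-1)

def mixture_of_heads(p, w):
    """Mixture: weighted arithmetic mean of head distributions (w sums to 1)."""
    return w @ p                                 # (H,) @ (H, T) -> (T,)

def product_of_heads(p, w, eps=1e-12):
    """Product: prod_h p_h ** w_h, renormalized (a weighted geometric mean)."""
    log_p = w @ np.log(p + eps)                  # sum_h w_h * log p_h
    return softmax(log_p)

rng = np.random.default_rng(0)
T, d, H = 5, 8, 2                                # tokens, model dim, heads
q = rng.normal(size=d)
k = rng.normal(size=(T, d))

p = head_distributions(q, k, H)
mix = mixture_of_heads(p, np.full(H, 1.0 / H))   # uniform mixture weights
prod = product_of_heads(p, np.ones(H))           # plain (unweighted) product
print(mix.round(3), prod.round(3))
```

The mixture always lies between the heads' probabilities for each token, so it spreads mass wherever any head attends; the product concentrates mass on tokens that all heads agree on. The uniform weights here are an assumption; a learned model would presumably produce them from the input.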
Everyone is invited to attend the talk, and faculty wishing to remain for the oral exam that follows are welcome to do so.