Speaker: Simran Arora, Stanford University
Date: Tuesday, March 18
Time: 12:30pm EST
Location: CS 105
Host: Mae Milano
Event page: https://www.cs.princeton.edu/events/pareto-efficient-ai-systems-expanding-quality-and-efficiency-frontier-ai
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_3VSa9cXIQE24GdQVOrvwCQ
Title: Pareto-efficient AI systems: Expanding the quality and efficiency frontier of AI
Abstract: We have made exciting progress in AI by scaling massive models on massive amounts of data center compute. However, this represents a small fraction of AI’s potential. My work expands the Pareto frontier between the AI capabilities we can achieve and the long tail of compute constraints.
In this talk, we build up, piece by piece, to a language model architecture that expands the Pareto frontier between quality and throughput efficiency. The Transformer, AI’s current workhorse architecture, is memory hungry, limiting its throughput, or the amount of data it can process per second. This has led to a Cambrian explosion of alternative architecture candidates in prior work, which paints an exciting picture: there are architectures that are asymptotically faster than the Transformer while also matching its quality. However, I ask: if we’re using asymptotically faster building blocks, what, if anything, are we giving up in quality?
1. In part one, we examine the tradeoffs and show that there is, indeed, no free lunch. I present my work identifying and explaining the fundamental quality and efficiency tradeoffs between different classes of architectures. Methods I developed for this analysis are now ubiquitous in the development of efficient language models.
2. In part two, we measure how existing architecture candidates fare across the tradeoff space. While many proposed architectures are asymptotically fast, they are not wall-clock fast compared to the Transformer. I present ThunderKittens, a programming library that I built to help AI researchers develop hardware-efficient AI algorithms.
3. In part three, we expand the Pareto frontier of the tradeoff space. I present the BASED architecture, which is built from simple, hardware-efficient components. As a culmination, I released a suite of 8B-405B parameter Transformer-free language models that are state-of-the-art per standard evaluations, all on an academic budget.
Given the massive investment in AI models, this work blending AI and systems has had significant impact and adoption in research, open source, and industry.
Bio: Simran Arora is a PhD student at Stanford University advised by Chris Ré. Her research blends AI and systems towards expanding the Pareto frontier between AI capabilities and efficiency. Her machine learning research has appeared as Oral and Spotlight presentations at NeurIPS, ICML, and ICLR, including an Outstanding Paper award at NeurIPS and a Best Paper award at ICML ES-FoMo. Her systems work has appeared at VLDB, SIGMOD, CIDR, and CHI, and her systems artifacts are widely used in research, open source, and industry. In 2023, Simran created and taught the CS229s Systems for Machine Learning course at Stanford. She has also been supported by an SGF Sequoia Fellowship and the Stanford Computer Science Graduate Fellowship.
Speaker: Olivia Hsu, Stanford University
Date: Thursday, March 20
Time: 12:30pm EST
Location: CS 105
Host: Brian Kernighan
Event page: https://www.cs.princeton.edu/events/language-silicon-programming-systems-sparse-accelerators
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_j-QIWzFvR1mwBO3qOP9atg
Title: From Language to Silicon: Programming Systems for Sparse Accelerators
Abstract: In this era of specialization, modern hardware development focuses on domain-specific accelerator design due to the plateau in technology scaling combined with a continual need for performance. However, domain-specific programming systems for these accelerators require extreme engineering effort, and their complexity has largely caused them to lag behind. Fundamentally, the widespread usability, proliferation, and democratization of domain-specific accelerators hinge on their programming systems, especially when targeting new domains.
This talk presents research on accelerator programming systems for the emerging domain of sparse computation. The first system, the Sparse Abstract Machine (SAM), introduces a unified abstract machine model and compiler abstraction for sparse dataflow accelerators. SAM defines a novel streaming representation and abstract dataflow interfaces that serve as an abstraction to decouple sparse accelerator implementations from their programs, similar to a stable ISA but for dataflow. The second system, Mosaic, introduces modular and portable compilation solutions that can leverage heterogeneous sparse accelerators and high-performance systems within the same system. These systems are a first step towards usable and programmable heterogeneous hardware acceleration for all. I will conclude by discussing the next steps to reach this goal, which include programming systems for accelerators in other domains and interoperation between accelerators across domains.
Bio: Olivia Hsu is a final-year Ph.D. candidate at Stanford University in the Department of Computer Science, advised by Professors Kunle Olukotun and Fredrik Kjolstad. She received her B.S. in Electrical Engineering and Computer Science (EECS) at UC Berkeley. Her broad research interests include computer architecture, computer and programming systems, compilers, programming languages, and digital circuits/VLSI. Olivia is a 2024 Rising Star in EECS and an NSF Graduate Research Fellow, and her research won a Distinguished Paper Award at PLDI 2023. To learn more about her work, please visit her website at https://cs.stanford.edu/~owhsu.