CS Colloquium speakers

Speaker: Qianqian Wang, University of California, Berkeley
Date: Monday, March 17
Time: 12:30pm EDT
Location: CS 105
Host: Olga Russakovsky
Event page: https://www.cs.princeton.edu/events/learning-perceive-4d-world
Register for the online live-stream here: https://princeton.zoom.us/webinar/register/WN_h_qEBInjR5q-bD5R2unZqw

Title: Learning to Perceive the 4D World

Abstract: Perceiving the 4D world (i.e., 3D space over time) from visual input is essential for human interaction with the physical environment. While computer vision has made remarkable progress in 3D scene understanding, much of it remains piecemeal—for example, focusing solely on static scenes or specific categories of dynamic objects. How can we model diverse dynamic scenes in the wild? How can we achieve online perception with human-like capabilities? In this talk, I will first discuss holistic scene representations that enable long-range motion estimation and 4D reconstruction. I will then introduce a unified learning-based framework for online dense 3D perception, which continuously refines scene understanding with new observations. I will conclude by discussing future directions and challenges in advancing spatial intelligence.

Bio: Qianqian Wang is a postdoctoral researcher at UC Berkeley, working with Prof. Angjoo Kanazawa and Prof. Alexei A. Efros. She received her Ph.D. in Computer Science from Cornell University in 2023, advised by Prof. Noah Snavely and Prof. Bharath Hariharan. Her research lies at the intersection of computer vision, computer graphics, and machine learning. She is a recipient of the ICCV Best Student Paper Award, a CVPR Best Paper Honorable Mention, the Cornell CS Dissertation Award, and a Google PhD Fellowship, and was selected as an EECS Rising Star.


Speaker: Simran Arora, Stanford University
Date: Tuesday, March 18
Time: 12:30pm EDT
Location: CS 105
Host: Mae Milano
Event page: https://www.cs.princeton.edu/events/pareto-efficient-ai-systems-expanding-quality-and-efficiency-frontier-ai
Register for the online live-stream here: https://princeton.zoom.us/webinar/register/WN_3VSa9cXIQE24GdQVOrvwCQ

Title: Pareto-efficient AI systems: Expanding the quality and efficiency frontier of AI

Abstract: We have made exciting progress in AI by scaling massive models on massive amounts of data center compute. However, this represents only a small fraction of AI’s potential. My work expands the Pareto frontier between the AI capabilities we can achieve and the long tail of compute constraints.

In this talk, we build up, piece by piece, to a language model architecture that expands the Pareto frontier between quality and throughput efficiency. The Transformer, AI’s current workhorse architecture, is memory hungry, which limits its throughput — the amount of data it can process per second. This has led to a Cambrian explosion of alternate architecture candidates proposed in prior work, which paints an exciting picture: there are architectures that are asymptotically faster than the Transformer while also matching its quality. However, I ask: if we’re using asymptotically faster building blocks, what, if anything, are we giving up in quality?
1. In part one, we examine the tradeoffs and show that, indeed, there is no free lunch. I present my work identifying and explaining the fundamental quality and efficiency tradeoffs between different classes of architectures. The methods I developed for this analysis are now ubiquitous in the development of efficient language models.
2. In part two, we measure how existing architecture candidates fare in this tradeoff space. While many proposed architectures are asymptotically fast, they are not wall-clock fast compared to the Transformer. I present ThunderKittens, a programming library that I built to help AI researchers develop hardware-efficient AI algorithms.
3. In part three, we expand the Pareto frontier of the tradeoff space. I present the BASED architecture, which is built from simple, hardware-efficient components. As a culmination of this work, I released a suite of 8B–405B parameter Transformer-free language models that are state-of-the-art per standard evaluations, all on an academic budget.
Given the massive investment in AI models, this work blending AI and systems has had significant impact and adoption in research, open source, and industry.

Bio: Simran Arora is a PhD student at Stanford University advised by Chris Ré. Her research blends AI and systems to expand the Pareto frontier between AI capabilities and efficiency. Her machine learning research has appeared as Oral and Spotlight presentations at NeurIPS, ICML, and ICLR, including an Outstanding Paper award at NeurIPS and a Best Paper award at ICML ES-FoMo. Her systems work has appeared at VLDB, SIGMOD, CIDR, and CHI, and her systems artifacts are widely used in research, open source, and industry. In 2023, Simran created and taught the CS229s Systems for Machine Learning course at Stanford. She has also been supported by an SGF Sequoia Fellowship and the Stanford Computer Science Graduate Fellowship.

Speaker: Olivia Hsu, Stanford University
Date: Thursday, March 20
Time: 12:30pm EDT
Location: CS 105
Host: Brian Kernighan
Event page: https://www.cs.princeton.edu/events/language-silicon-programming-systems-sparse-accelerators
Register for the online live-stream here: https://princeton.zoom.us/webinar/register/WN_j-QIWzFvR1mwBO3qOP9atg

Title: From Language to Silicon: Programming Systems for Sparse Accelerators

Abstract: In this era of specialization, modern hardware development focuses on domain-specific accelerator design due to the plateau in technology scaling combined with a continual need for performance. However, domain-specific programming systems for these accelerators require extreme engineering effort, and their complexity has largely caused them to lag behind. Fundamentally, the widespread usability, proliferation, and democratization of domain-specific accelerators hinge on their programming systems, especially when targeting new domains.

This talk presents research on accelerator programming systems for the emerging domain of sparse computation. The first system, the Sparse Abstract Machine (SAM), introduces a unified abstract machine model and compiler abstraction for sparse dataflow accelerators. SAM defines a novel streaming representation and abstract dataflow interfaces that decouple sparse accelerator implementations from their programs, similar to a stable ISA but for dataflow. The second system, Mosaic, introduces modular and portable compilation solutions that can leverage heterogeneous sparse accelerators and high-performance software within a single compilation framework. These systems are a first step towards usable and programmable heterogeneous hardware acceleration for all. I will conclude by discussing the next steps to reach this goal, which include programming systems for accelerators in other domains and interoperation between accelerators across domains.

Bio: Olivia Hsu is a final-year Ph.D. candidate at Stanford University in the Department of Computer Science, advised by Professors Kunle Olukotun and Fredrik Kjolstad. She received her B.S. in Electrical Engineering and Computer Science (EECS) at UC Berkeley. Her broad research interests include computer architecture, computer and programming systems, compilers, programming languages, and digital circuits/VLSI. Olivia is a 2024 Rising Star in EECS and an NSF Graduate Research Fellow, and her research won a distinguished paper award at PLDI 2023. To learn more about her work, please visit her website at https://cs.stanford.edu/~owhsu.