The CS Department Colloquium Series kicks off next week. Announcements will be sent each Friday with the following week's talks. And the full calendar can be viewed here: [ https://www.cs.princeton.edu/general/newsevents/events | https://www.cs.princeton.edu/general/newsevents/events ] Speaker: Milijana Surbatovich, Carnegie Mellon University Date: Monday, February 27 Time: 12:30pm EST Location: CS 105 Host: Amit Levy Event page: https://www.cs.princeton.edu/events/26341 Title: Designing Formally Correct Intermittent Systems Abstract: "Extreme edge computing" is an emerging computing paradigm targeting application domains like medical wearables, disaster-monitoring tiny satellites, or smart infrastructure. This paradigm brings sophisticated sensing and data processing into an embedded device's deployment environment, enabling computing in environments that are too harsh, inaccessible, or dense to support frequent communication with a central server. Batteryless, energy harvesting devices (EHDs) are key to enabling extreme edge computing; instead of using batteries, which may be too costly or even impossible to replace, they can operate solely off energy collected from their environment. However, harvested energy is typically too weak to power a device continuously, causing frequent, arbitrary power failures that break traditional software and make correct programming difficult. Given the high assurance requirements of the envisioned application domains, EHDs must execute software without bugs that could render the device inoperable or leak sensitive information. While researchers have developed intermittent systems to support programming EHDs, they rely on informal, undefined correctness notions that preclude proving such necessary correctness and security properties. My research lays the foundation for designing formally correct intermittent systems that provide correctness guarantees. In this talk, I show how existing correctness notions are insufficient, leading to unaddressed bugs. I then present the first formal model of intermittent execution, along with correctness definitions for important memory consistency and timing properties. I use these definitions to design and implement both the language abstractions that programmers can use to specify their desired properties and the enforcement mechanisms that uphold them. Finally, I discuss my future research directions in intermittent system security and leveraging formal methods for full-stack correctness reasoning. Bio: Milijana Surbatovich is a PhD Candidate in the Electrical and Computer Engineering Department at Carnegie Mellon University, co-advised by Professors Brandon Lucia and Limin Jia. Her research interests are in applied formal methods, programming languages, and systems for intermittent computing and non-traditional computing platforms broadly. She is excited by research problems that require reasoning about correctness and security across the architecture, system, and language stack. She was awarded CMU's CyLab Presidential Fellowship in 2021 and was selected as a 2022 Rising Star in EECS. Previously, she received an MS in ECE from CMU in 2020 and a BS in Computer Science from the University of Rochester in 2017. Speaker: Stephanie Wang, University of California, Berkeley Date: Wednesday, March 1 Time: 12:30pm EST Location: CS 105 Host: Wyatt Lloyd Event page: https://www.cs.princeton.edu/events/26342 Title: The Design of a General-Purpose Distributed Execution System Abstract: Scaling applications with distributed execution has become the norm. With the rise of big data and machine learning, more and more developers must build applications that involve complex and data-intensive distributed processing. In this talk, I will discuss the design of a general-purpose distributed execution system that can serve as a common platform for such applications. Such a system offers two key benefits: (1) common system functionality such as distributed resource management can be shared across different application domains, and (2) by building on the same platform, applications across domains can easily interoperate. First, I will introduce the distributed futures interface, a powerful yet expressive distributed programming abstraction for remote execution and memory. Second, I will introduce ownership, an architecture for distributed futures systems that simultaneously provides horizontal scalability, low latency, and fault tolerance. Finally, I will present Exoshuffle, a large-scale shuffle system that builds on distributed futures and ownership to match the speed and reliability of specialized data processing frameworks while using an order of magnitude less code. These works have reached a broad audience through Ray, an open-source distributed futures system for Python that has more than 23,000 GitHub stars and that has been used to train ChatGPT and to break the world record for CloudSort. Bio: Stephanie Wang is a final-year PhD student at UC Berkeley, advised by Professor Ion Stoica. She is interested in distributed systems, with current focus on problems in cloud computing and fault tolerance. She is a co-creator and committer of the popular open-source project Ray for distributed Python. Stephanie has received the UC Berkeley Chancellor’s Fellowship, a Distinguished Artifact Award at SOSP’19, and was selected for Rising Stars in EECS in 2021. Speaker: Tri Dao, Stanford University Date: Thursday, March 2 Time: 12:30pm EST Location: CS 105 Host: Jia Deng Event page: https://www.cs.princeton.edu/events/26343 Title: Hardware-aware Algorithms for Efficient Machine Learning Abstract: Machine learning (ML) models training will continue to grow to consume more cycles, their inference will proliferate on more kinds of devices, and their capabilities will be used on more domains. Some goals central to this future are to make ML models efficient so they remain practical to train and deploy, and to unlock new application domains with new capabilities. We describe some recent developments in hardware-aware algorithms to improve the efficiency-quality tradeoff of ML models and equip them with long context. In the first half, we focus on structured sparsity, a natural approach to mitigate the extensive compute and memory cost of large ML models. We describe a line of work on learnable fast transforms which, thanks to their expressiveness and efficiency, yields some of the first sparse training methods to speed up large models in wall-clock time (2x) without compromising their quality. In the second half, we focus on efficient Transformer training and inference for long sequences. We describe FlashAttention, a fast and memory-efficient algorithm to compute attention with no approximation. By careful accounting of reads/writes between different levels of memory hierarchy, FlashAttention is 2-4x faster and uses 10-20x less memory compared to the best existing attention implementations, allowing us to train higher-quality Transformers with 8x longer context. FlashAttention is now widely used in some of the largest research labs and companies, in just 6 months after its release. We conclude with some exciting directions in ML and systems, such as software-hardware co-design, structured sparsity for scientific AI, and long context for new AI workflows and modalities. Bio: Tri Dao is a PhD student in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the interface of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding paper runner-up award.