Here are next week's CS Department Colloquium Series talks.  As always, you can find the full calendar of events here: https://www.cs.princeton.edu/general/newsevents/events 


Speaker: Yao Lu, Microsoft Research
Date: Monday, April 17
Time: 12:30pm EDT
Location: CS 105
Host: Kai Li
Event page: https://www.cs.princeton.edu/events/26389

Title: Towards Intelligent Data Systems

Abstract:  From single-box databases, data systems are evolving into multi-tenant compute and storage platforms that host not only structured data analytics but also AI workloads and AI-enhanced system components. The result of this evolution, which I call an “intelligent” data system, creates new opportunities and challenges for research and production at the intersection of machine learning and systems.

Key considerations in these systems include efficiency and cost, ML support, and a flexible runtime for heterogeneous jobs. I will describe our work on query optimizers both for AI and aided by AI. For ML inference workloads over unstructured data, our optimizer injects proxy models for queries with complex predicates, leading to a many-fold improvement in processing time; for query optimization in classic data analytics, our pre-trained models summarize structured datasets, answer cardinality estimation calls, and avoid the high training cost of recent instance-optimized database components. I will also describe our query processor and optimizer that enable and accelerate ML inference workflows on hybrid/IoT clouds. These efforts, combined with a few missing pieces that I will outline, contribute to better data systems where users can build, deploy, and optimize data analytics and AI applications with ease.
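
(For context on the proxy-model idea mentioned above, here is a minimal Python sketch of the general technique of proxy-based predicate filtering. It is an illustration only, not the speaker's actual system; the cheap_proxy_score heuristic, expensive_predicate function, and reject_below threshold are made-up placeholders.)

# Minimal sketch of proxy-model predicate filtering (illustrative only).
# A cheap proxy scores every row; only rows the proxy cannot confidently
# reject are passed to the expensive predicate (e.g., a large ML model).

def expensive_predicate(row):
    # Placeholder for a costly ML inference call (e.g., an object detector).
    return "dog" in row["caption"].lower()

def cheap_proxy_score(row):
    # Placeholder for a small, fast proxy model; here just a keyword heuristic.
    return 0.9 if "dog" in row["caption"].lower() else 0.05

def filter_with_proxy(rows, reject_below=0.1):
    for row in rows:
        if cheap_proxy_score(row) < reject_below:
            continue                     # proxy confidently rejects the row
        if expensive_predicate(row):     # expensive check runs only on survivors
            yield row

rows = [{"caption": "A dog in the park"}, {"caption": "A red car"}]
print(list(filter_with_proxy(rows)))

The point of the design is that the proxy need not be perfectly accurate; it only has to be cheap and to rarely reject rows the expensive predicate would have accepted.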

Bio: Yao Lu is a researcher in the Data Systems group at Microsoft Research Redmond. He works at the intersection of machine learning and data systems, both building improved data and compute platforms for cloud machine learning and using machine learning to improve current data platforms. He received his Ph.D. from the University of Washington in 2018.


Speaker: Ananya Kumar, Stanford University
Date: Tuesday, April 18
Time: 12:30pm EDT
Location: CS 105
Host: Tom Griffiths
Event page: https://www.cs.princeton.edu/events/26370

Title: Foundation Models for Robust Machine Learning

Abstract: Machine learning systems are not robust: they suffer large drops in accuracy when deployed in environments different from those they were trained on. In this talk, I show that the foundation model paradigm (adapting models that are pretrained on broad unlabeled data) is a principled solution that leads to state-of-the-art robustness. I will focus on the key ingredients: how we should pretrain and adapt models for robustness. (1) First, I show that contrastive pretraining on unlabeled data learns transferable representations that improve accuracy even on domains where we had no labels. We explain why pretraining works in a very different way from some classical intuitions of collapsing representations (domain invariance). Our theory predicts phenomena on real datasets and leads to improved pretraining methods. (2) Next, I will show that the standard approach of adaptation (updating all the model's parameters) can distort pretrained representations and perform poorly out-of-distribution. Our theoretical analysis leads to better methods for adaptation and state-of-the-art accuracies on ImageNet and in applications such as satellite remote sensing, wildlife conservation, and radiology.
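
(As background on the adaptation point in the abstract, the following PyTorch sketch shows one widely used two-stage recipe: fit a linear head on frozen pretrained features first, then unfreeze and fine-tune the whole model at a small learning rate. This is an illustrative assumption about how such adaptation can be done, not necessarily the exact method presented in the talk; the tiny backbone, random data, learning rates, and step counts are placeholders.)

import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())    # stand-in for a pretrained encoder
head = nn.Linear(64, 2)                                    # task-specific classification head

x = torch.randn(128, 32)                                   # toy labeled data
y = torch.randint(0, 2, (128,))
loss_fn = nn.CrossEntropyLoss()

# Stage 1: linear probe -- train only the head on frozen features.
for p in backbone.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(head(backbone(x)), y)
    loss.backward()
    opt.step()

# Stage 2: fine-tune everything, starting from the probed head,
# at a small learning rate to limit distortion of pretrained features.
for p in backbone.parameters():
    p.requires_grad = True
opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-5)
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(head(backbone(x)), y)
    loss.backward()
    opt.step()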

Bio: Ananya Kumar is a Ph.D. candidate in the Department of Computer Science at Stanford University, advised by Percy Liang and Tengyu Ma. His work focuses on representation learning, foundation models, and reliable machine learning. His papers have been recognized with several Spotlight and Oral presentations at NeurIPS, ICML, and ICLR, and his research is supported by a Stanford Graduate Fellowship.


Speaker: Saadia Gabriel, University of Washington
Date: Thursday, April 20
Time: 12:30pm EDT
Location: CS 105
Host: Olga Troyanskaya
Event page: https://www.cs.princeton.edu/events/26380

Title: Socially Responsible and Factual Reasoning for Equitable AI Systems

Abstract: Understanding the implications underlying a text is critical to assessing its impact. This requires endowing artificial intelligence (AI) systems with pragmatic reasoning, for example to infer that the statement “Epidemics and cases of disease in the 21st century are ‘staged’” relates to unfounded conspiracy theories. In this talk, I discuss how shortcomings in the ability of current AI systems to reason about pragmatics lead to inequitable detection of false or harmful language. I demonstrate how these shortcomings can be addressed by imposing human-interpretable structure on deep learning architectures using insights from linguistics.

In the first part of the talk, I describe how adversarial text generation algorithms can be used to improve model robustness. I then introduce a pragmatic formalism for reasoning about harmful implications conveyed by social media text. I show how this pragmatic approach can be combined with generative neural language models to uncover implications of news headlines. I also address the bottleneck to progress in text generation posed by gaps in the evaluation of factuality. I conclude with an interdisciplinary study showing how content moderation informed by pragmatics can be used to ensure safe interactions with conversational agents, and with my future vision for the development of context-aware systems.

Bio: Saadia Gabriel is a PhD candidate in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Prof. Yejin Choi and Prof. Franziska Roesner. Her research revolves around natural language processing and machine learning, with a particular focus on building systems for understanding how social commonsense manifests in text (i.e., how people typically behave in social scenarios), as well as on mitigating the spread of false or harmful text (e.g., COVID-19 misinformation). Her work has been covered by a wide range of media outlets, including Forbes and TechCrunch. It has also received a 2019 ACL best short paper nomination and a 2019 IROS RoboCup best paper nomination, and it won a best paper award at the 2020 WeCNLP summit. Prior to her PhD, Saadia received a BA summa cum laude in Computer Science and Mathematics from Mount Holyoke College.