CS Colloquium speakers

Speaker: Jialin Ding, Amazon Web Services
Date: Monday, February 26
Time: 12:30pm EST
Location: CS 105
Host: Wyatt Lloyd
Event page: https://www.cs.princeton.edu/events/26578
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_10z_l8MYRr21ZOuPqA8Odw

Title: Instance-Optimization: Rethinking Database Design for the Next 1000X

Abstract: Modern database systems aim to support a wide range of use cases while simultaneously achieving high performance. However, as a result of their generality, databases often achieve adequate performance for the average use case but not the best performance for any individual use case. In this talk, I will describe my work on designing databases that use machine learning and optimization techniques to automatically achieve performance much closer to optimal for each individual use case. In particular, I will present my work on instance-optimized database storage layouts, in which the co-design of data structures and optimization policies improves query performance in analytic databases by orders of magnitude. I will highlight how these instance-optimized data layouts address various challenges posed by real-world database workloads, and how I implemented and deployed them in production within Amazon Redshift, a widely used commercial database system.
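
For intuition about what an instance-optimized storage layout means in the simplest case, here is a minimal Python sketch, with a hypothetical cost model and made-up names rather than Redshift's actual implementation: given per-block min/max metadata ("zone maps"), pick the sort column that lets a specific workload of range predicates skip the most blocks.

    # Hypothetical, simplified sketch -- not Redshift's actual implementation.
    # Idea: given a table and a workload of range predicates, choose the sort
    # column whose per-block min/max metadata ("zone maps") skips the most blocks.
    from typing import Dict, List, Tuple

    BLOCK_SIZE = 1000  # rows per block; illustrative constant

    def blocks_scanned(column: List[int], predicates: List[Tuple[int, int]]) -> int:
        """Blocks read to answer range predicates (lo, hi) when the table is
        sorted by `column` and each block records its min/max value."""
        values = sorted(column)
        blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
        zone_maps = [(b[0], b[-1]) for b in blocks]  # (min, max) per block
        return sum(1 for lo, hi in predicates
                   for bmin, bmax in zone_maps if bmin <= hi and bmax >= lo)

    def workload_cost(table: Dict[str, List[int]], sort_col: str,
                      workload: Dict[str, List[Tuple[int, int]]]) -> int:
        """Total blocks read: predicates on the sort column skip blocks via
        zone maps; predicates on other columns must scan every block."""
        n_blocks = -(-len(table[sort_col]) // BLOCK_SIZE)  # ceiling division
        return sum(blocks_scanned(table[col], preds) if col == sort_col
                   else n_blocks * len(preds)
                   for col, preds in workload.items())

    def pick_sort_key(table, workload):
        """Instance-optimization in miniature: pick the layout (here, just a
        sort key) that minimizes cost for this specific workload."""
        return min(table, key=lambda col: workload_cost(table, col, workload))

A real system searches a far richer design space (multi-column sort orders, compression, partitioning) and learns its cost model from the observed workload rather than assuming one, but the workload-driven selection loop has this shape.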

Bio: Jialin Ding is an Applied Scientist at AWS. Before that, he received his PhD in computer science from MIT, advised by Tim Kraska. He works broadly on applying machine learning and optimization techniques to improve data management systems, with a focus on building databases that automatically self-optimize to achieve high performance for any specific application. His work has appeared in top conferences such as SIGMOD, VLDB, and CIDR, and has been recognized with a Meta Research PhD Fellowship. To learn more about Jialin’s work, please visit https://jialinding.github.io/.



CSML/CS/PSY Colloquium
Speaker: Brenden Lake, New York University
Date: Tuesday, February 27
Time: 12:30pm EST
Location: CS 105
Host: Tom Griffiths
Event page: https://csml.princeton.edu/events/csmlpsycs-seminar
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_aUfLfVQFSIqV2RDSYu4big

Title: Towards more human-like learning in machines: Bridging the data and generalization gaps

Abstract: There is an enormous data gap between how AI systems and children learn: The best LLMs now learn language from text with a word count in the trillions, whereas it would take a child roughly 100K years to reach those numbers through speech (Frank, 2023, "Bridging the data gap"). There is also a clear generalization gap: whereas machines struggle with systematic generalization, children can excel. For instance, once a child learns how to "skip," they immediately know how to "skip twice" or "skip around the room with their hands up" due to their compositional skills. In this talk, I'll describe two case studies in addressing these gaps.

1) The data gap: We train deep neural networks from scratch (using DINO, CLIP, etc.), not on large-scale data from the web, but through the eyes and ears of a single child. Using head-mounted video recordings from a child as training data (<200 hours of video slices over 26 months), we show how deep neural networks can perform challenging visual tasks, acquire many word-referent mappings, generalize to novel visual referents, and achieve multi-modal alignment. Our results demonstrate how today's AI models are capable of learning key aspects of children's early knowledge from realistic input.
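
The multi-modal alignment result rests on a CLIP-style contrastive objective over co-occurring (frame, utterance) pairs. Below is a minimal sketch of that objective, assuming PyTorch and precomputed embeddings; it is the generic recipe, not the authors' code.

    # Minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss,
    # assuming embeddings of paired child's-eye-view frames and transcribed
    # utterances. Generic recipe for illustration, not the authors' code.
    import torch
    import torch.nn.functional as F

    def contrastive_loss(frame_emb: torch.Tensor, text_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
        """frame_emb, text_emb: (batch, dim) embeddings of co-occurring pairs;
        matching pairs sit on the diagonal of the similarity matrix."""
        frame_emb = F.normalize(frame_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = frame_emb @ text_emb.t() / temperature  # (batch, batch)
        targets = torch.arange(len(logits))  # i-th frame matches i-th utterance
        # Symmetric: retrieve the right utterance per frame, and vice versa.
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2

Trained this way, frames and the utterances that accompany them are pulled together in embedding space, which is what allows word-referent mappings to emerge without explicit labels.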

2) The generalization gap: Can neural networks capture human-like systematic generalization? We address a 35-year-old debate catalyzed by Fodor and Pylyshyn's classic article, which argued that standard neural networks are not viable models of the mind because they lack systematic compositionality -- the algebraic ability to understand and produce novel combinations from known components. We'll show how neural networks can achieve human-like systematic generalization when trained through meta-learning for compositionality (MLC), a new method for optimizing the compositional skills of neural networks through practice. With MLC, neural networks can match human performance and solve several machine learning benchmarks.
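
To make the training setup concrete, here is a toy sketch of the kind of episode MLC optimizes over; the made-up words, output symbols, and single "twice" modifier are hypothetical stand-ins for the paper's richer grammars.

    # Toy sketch of a meta-learning-for-compositionality episode. Each episode
    # draws a fresh random lexicon; a seq2seq model trained across many such
    # episodes must infer the lexicon from study examples and compose it to
    # answer the query. Vocabulary and grammar here are illustrative only.
    import random

    PRIMITIVES = ["dax", "wif", "lug"]
    SYMBOLS = ["RED", "GREEN", "BLUE", "YELLOW"]

    def interpret(command, lexicon):
        """Compositional semantics: 'x twice' repeats x's output; a bare
        primitive maps through the episode-specific lexicon."""
        if len(command) == 2 and command[1] == "twice":
            return interpret(command[:1], lexicon) * 2
        return [lexicon[word] for word in command]

    def sample_episode():
        lexicon = dict(zip(PRIMITIVES, random.sample(SYMBOLS, len(PRIMITIVES))))
        # Study examples show each primitive in isolation...
        study = [([p], interpret([p], lexicon)) for p in PRIMITIVES]
        # ...while the query demands a novel combination ("skip twice", in effect).
        q = random.choice(PRIMITIVES)
        return study, ([q, "twice"], interpret([q, "twice"], lexicon))

Because the lexicon changes every episode, memorization is useless; the only strategy that succeeds across episodes is the compositional one, which is how practice instills the skill.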

Given these findings, we'll discuss the paths forward for building machines that learn, generalize, and interact in more human-like ways based on more natural input.

Related articles: 
Vong, W. K., Wang, W., Orhan, A. E., and Lake, B. M. (2024). Grounded language acquisition through the eyes and ears of a single child. Science, 383, 504-511.

Orhan, A. E., and Lake, B. M. (in press). Learning high-level visual representations from a child’s perspective without strong inductive biases. Nature Machine Intelligence.

Lake, B. M., and Baroni, M. (2023). Human-like systematic generalization through a meta-learning neural network. Nature, 623, 115-121.

Bio: Brenden M. Lake is an Assistant Professor of Data Science and Psychology at New York University. He received his M.S. and B.S. in Symbolic Systems from Stanford University in 2009, and his Ph.D. in Cognitive Science from MIT in 2014. He was a postdoctoral Data Science Fellow at NYU from 2014 to 2017. Brenden is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science and the MIT Technology Review 35 Innovators Under 35. His research was also selected by Scientific American as one of the 10 most important advances of 2016. Brenden's research focuses on computational problems that are easier for people than they are for machines, such as learning new concepts from just a few examples, learning by asking questions, learning by generating new goals, and learning by producing novel combinations of known components.


CS Colloquium
Speaker: Eric Mitchell, Stanford University
Date: Thursday, February 29
Time: 12:30pm EST
Location: CS 105
Host: Karthik Narasimhan
Event page: https://www.cs.princeton.edu/events/26587
Register for live-stream online here: TBA

Talk info TBA