Here is the full list of CS Colloquium talks for next week.
All talks will be recorded.
~~~~~

CS Colloquium Speaker
Speaker: Daniel Kang, Stanford University
Date: Monday, April 18, 2022
Time: 12:30pm EDT
Location: Zoom Webinar
Host: Amit Levy
Event page: https://www.cs.princeton.edu/events/26185
Please register here: https://princeton.zoom.us/webinar/register/WN_rQIK4ofARyidoIwnUSPT3w

Title: Efficient and Accurate Systems for Querying Unstructured Data

Abstract: Over the past 60 years, relational databases have been a runaway success: they are deployed at every major organization and have produced hundreds of billions of dollars in market capitalization. However, there is a growing demand for analytics over unstructured data (e.g., videos, audio, text) given the rise of ML capabilities: previously, unstructured data did not fit cleanly with the relational database model (e.g., selecting pixels vs semantic content about objects in an image). Unfortunately, ML can be prohibitively expensive to deploy (e.g., 10 orders of magnitude more expensive than standard relational analytics) and can produce incorrect results. These problems are exacerbated by the scale of data. For example, the Tesla fleet of vehicles produces exabytes of data per day.

In this talk, I'll describe my work on new ML-based query systems to tackle the cost and reliability of unstructured data analytics. My first line of work accelerates large classes of queries by orders of magnitude while providing strong guarantees on query accuracy. I accomplish this by developing novel query processing algorithms, indexing methods, and execution engines for unstructured data queries. I'll also describe how to find errors in human labels and ML model outputs using novel data management systems. My systems can be used to automatically improve ML models and, perhaps surprisingly, have discovered a large number of errors in a popular autonomous vehicle dataset. My research has been deployed at an autonomous vehicle company and has enabled new forms of video analytics for ecologists at the Jasper Ridge biological preserve.

Bio: Daniel Kang is a sixth-year PhD student in the Stanford DAWN lab, co-advised by Professors Peter Bailis and Matei Zaharia. His research focuses on systems to query unstructured data. In particular, he focuses on using cheap approximations to accelerate query processing algorithms and new programming models for ML data management. Daniel is collaborating with autonomous vehicle companies and ecologists to deploy his research. His work is supported in part by the NSF GRFP and the Google PhD fellowship.
~~~~~

CS Colloquium Speaker
Speaker: Amy Ousterhout ’13, University of California, Berkeley
Date: Thursday, April 21, 2022
Time: 12:30pm EDT
Location: CS 105
Host: Jennifer Rexford
Event page: https://www.cs.princeton.edu/events/26177
This talk will be live-streamed at https://mediacentrallive.princeton.edu/

Title: Optimizing CPU Efficiency and Tail Latency in Datacenters

Abstract: The slowing of Moore’s Law and increased concerns about the environmental impacts of computing are exerting pressure on datacenter operators to use resources such as CPUs and memory more efficiently. However, it is difficult to improve efficiency without degrading the performance of applications.

In this talk, I will focus on CPU efficiency and how we can increase efficiency while maintaining low tail latency for applications. The key innovation is to reallocate cores between applications on the same server very quickly, every few microseconds. First, I will describe Shenango, a system design that makes such frequent core reallocations possible. Then I will show how policy choices for core reallocation and load balancing impact CPU efficiency and tail latency, and present the policies that yield the best combination of both.

Bio: Amy is a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. She received her PhD in Computer Science from MIT and her BSE in Computer Science from Princeton University. Her research is on operating systems and distributed systems, and focuses on improving the efficiency, performance, and usability of applications in datacenters. She is a recipient of a Jacobs Presidential Fellowship at MIT, an NSF Graduate Research Fellowship, and a Hertz Foundation Fellowship.