Here is the full list of CS Colloquium talks for next week. All talks will be recorded.

~~~~~

CS Colloquium Speaker
Speaker: Daniel Kang, Stanford University
Date: Monday, April 18, 2022
Time: 12:30pm EST
Location: Zoom Webinar
Host: Amit Levy
Event page: https://www.cs.princeton.edu/events/26185
Please register here: https://princeton.zoom.us/webinar/register/WN_rQIK4ofARyidoIwnUSPT3w

Title: Efficient and Accurate Systems for Querying Unstructured Data

Abstract: Over the past 60 years, relational databases have been a runaway success: they are deployed at every major organization and have produced hundreds of billions of dollars in market capitalization. However, there is a growing demand for analytics over unstructured data (e.g., videos, audio, text) given the rise of ML capabilities: previously, unstructured data did not fit cleanly with the relational database model (e.g., selecting pixels vs semantic content about objects in an image). Unfortunately, ML can be prohibitively expensive to deploy (e.g., 10 orders of magnitude more expensive than standard relational analytics) and can produce incorrect results. These problems are exacerbated by the scale of data. For example, the Tesla fleet of vehicles produces exabytes of data per day. In this talk, I'll describe my work on new ML-based query systems to tackle the cost and reliability of unstructured data analytics. My first line of work accelerates large classes of queries by orders of magnitude while providing strong guarantees on query accuracy. I accomplish this by developing novel query processing algorithms, indexing methods, and execution engines for unstructured data queries. I'll also describe how to find errors in human labels and ML model outputs using novel data management systems. My systems can be used to automatically improve ML models and, perhaps surprisingly, have discovered a large number of errors in a popular autonomous vehicle dataset.
My research has been deployed at an autonomous vehicle company and has enabled new forms of video analytics for ecologists at the Jasper Ridge Biological Preserve.

Bio: Daniel Kang is a sixth-year PhD student in the Stanford DAWN lab, co-advised by Professors Peter Bailis and Matei Zaharia. His research focuses on systems to query unstructured data. In particular, he focuses on using cheap approximations to accelerate query processing algorithms and new programming models for ML data management. Daniel is collaborating with autonomous vehicle companies and ecologists to deploy his research. His work is supported in part by the NSF GRFP and the Google PhD fellowship.

~~~~~

CS Colloquium Speaker
Speaker: Amy Ousterhout ’13, University of California, Berkeley
Date: Thursday, April 21, 2022
Time: 12:30pm EST
Location: CS 105
Host: Jennifer Rexford
Event page: https://www.cs.princeton.edu/events/26177
This talk will be live-streamed at https://mediacentrallive.princeton.edu/

Title: Optimizing CPU Efficiency and Tail Latency in Datacenters

Abstract: The slowing of Moore’s Law and increased concerns about the environmental impacts of computing are exerting pressure on datacenter operators to use resources such as CPUs and memory more efficiently. However, it is difficult to improve efficiency without degrading the performance of applications. In this talk, I will focus on CPU efficiency and how we can increase efficiency while maintaining low tail latency for applications. The key innovation is to reallocate cores between applications on the same server very quickly, every few microseconds. First, I will describe Shenango, a system design that makes such frequent core reallocations possible. Then I will show how policy choices for core reallocation and load balancing impact CPU efficiency and tail latency, and present the policies that yield the best combination of both.
Bio: Amy is a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. She received her PhD in Computer Science from MIT and her BSE in Computer Science from Princeton University. Her research is on operating systems and distributed systems, and focuses on improving the efficiency, performance, and usability of applications in datacenters. She is a recipient of a Jacobs Presidential Fellowship at MIT, an NSF Graduate Research Fellowship, and a Hertz Foundation Fellowship.