[talks] Matvey Arye is presenting his Pre FPO titled "Data Processing Across Continents" on April 30, 2015 at 3pm in CS 401

Nicki Gotsis ngotsis at CS.Princeton.EDU
Fri Apr 24 11:10:33 EDT 2015

Matvey Arye is presenting his Pre FPO on April 30, 2015 at 3pm in CS 401.

The members of his committee are: Mike Freedman (advisor), Kai Li (reader), Andrea LaPaugh (reader), Jen Rexford (non-reader), Nick Feamster (non-reader)

Everyone is invited to attend his talk. The talk title and abstract follow:

Title: "Data Processing Across Continents" 

Abstract: An increasing number of data sources create data across the globe. These include everything from server logs owned by Internet-scale companies to military intelligence systems. This thesis addresses the question of how to enable near-real-time analytical queries on such data. Existing systems tend to centralize such data into a single datacenter before analyzing it. However, in light of low and asymmetric bandwidth provisioning in and between certain geographic regions, centralizing all data can be both slow and costly. 
I have addressed this problem with three complementary research directions. 
First, I describe a system that queries the data in a distributed manner and centralizes only the data that is needed to fulfill the query. It dynamically adjusts the accuracy of its answers to trade off data quality against responsiveness. Second, I explore some challenges due to the interaction between an application-level dynamic quality adaptation control loop and TCP (which has its own control loop). These two control loops can create negative feedback effects, and I propose a way to overcome them. We translate these insights into the domain of Internet video streaming, and show improvements in streaming performance over leading industry players. Finally, I present a case study of how to optimize the top-k query for wide-area analysis. The top-k query addresses questions of popularity and is thus ubiquitous in modern computer systems. My algorithms reduce both the bandwidth usage and the number of rounds used by such queries.
