Matvey Arye will present his FPO, "Data Processing Across Continents," on Tuesday, 5/17/2016 at 10:00am in CS 402.
The members of his committee are Michael Freedman (adviser); readers: Andrea LaPaugh and Kai Li; nonreaders: Jennifer Rexford and Nick Feamster. A copy of his thesis is available in Room 310. Everyone is invited to attend his talk. The talk abstract follows below.

An increasing number of data sources create data across the globe. These include everything from server logs owned by Internet-scale companies to military intelligence systems. This thesis addresses the question of how to enable near-real-time analytical queries on such data. Existing systems tend to centralize such data into a single datacenter before analyzing it. However, in light of low and asymmetric bandwidth provisioning in and between certain geographic regions, centralizing all data can be both slow and costly. This thesis addresses this problem with three complementary research directions.

First, we describe a system that queries the data in a distributed manner and centralizes only the data that is needed to fulfill the query. Our system incorporates edge storage and customizable degradation operators in its programming model. These elements allow the system to adjust the data volumes transferred to match available bandwidth.

Second, we explore some challenges due to the interaction between an application-level dynamic quality adaptation control loop and TCP (which has its own control loop). These two control loops can create negative feedback effects that reduce system throughput below what the network can sustain. These insights are translated into the domain of Internet video streaming, and several solutions are proposed. Our solutions enable video flows to achieve above 90% of their fair share of throughput, while industry players often achieve less than 50% of their fair share.

Finally, we present a case study of how to optimize queries for wide-area analysis. We optimize the top-k query, which addresses questions of popularity and is thus ubiquitous in modern computer systems. Our algorithms reduce both the bandwidth usage and the number of rounds used by such queries. In particular, we propose the first exact two-round top-k algorithm (which still transfers 19% fewer bytes than the best previously known exact three-round algorithm). Our two-or-three-round exact algorithm transfers 31% fewer bytes than the best previously known approximate algorithm. Finally, our approximate algorithm uses 40% less bandwidth than previous algorithms with stronger guarantees.