[talks] Matvey Arye will present his FPO, "Data Processing Across Continents" on Tuesday, 5/17/2016 at 10:00am in CS 402.

Nicki Gotsis ngotsis at CS.Princeton.EDU
Mon May 9 11:36:00 EDT 2016


Matvey Arye will present his FPO, "Data Processing Across Continents" on Tuesday, 5/17/2016 at 10:00am in CS 402.

The members of his committee are Michael Freedman (adviser); readers: Andrea LaPaugh and Kai Li; nonreaders: Jennifer Rexford and Nick Feamster.

A copy of his thesis is available in Room 310.

Everyone is invited to attend his talk. The talk abstract follows below.

An increasing number of data sources create data across the globe. These include
everything from server logs owned by Internet-scale companies to military intelligence
systems. This thesis addresses the question of how to enable near-real-time
analytical queries on such data. Existing systems tend to centralize such data into
a single datacenter before analyzing it. However, in light of low and asymmetric
bandwidth provisioning in and between certain geographic regions, centralizing all
data can be both slow and costly. This thesis addresses this problem with three
complementary research directions.
First, we describe a system that queries the data in a distributed manner and centralizes
only the data that is needed to fulfill the query. Our system incorporates edge
storage and customizable degradation operators in its programming model. These elements
allow the system to adjust the data-volumes transferred to match available
bandwidth.
Second, we explore challenges that arise from the interaction between an application-level
dynamic quality adaptation control loop and TCP (which has its own control
loop). These two control loops can create negative feedback effects that reduce
system throughput below what the network can sustain. We translate these insights
into the domain of Internet video streaming and propose several solutions. Our
solutions enable video flows to achieve above 90% of their fair share of throughput,
while industry players often achieve less than 50% of their fair share.
Finally, we present a case study of how to optimize queries for wide-area analysis.
We optimize the top-k query, which addresses questions of popularity and is thus
ubiquitous in modern computer systems. Our algorithms reduce both the bandwidth
usage and number of rounds used by such queries. In particular, we propose the first
exact two-round top-k algorithm (which still transfers 19% fewer bytes than the best
previously known exact three-round algorithm). Our 2-or-3-round exact algorithm transfers
31% fewer bytes than the best previously known approximate algorithm. In addition,
our approximate algorithm uses 40% less bandwidth than previous algorithms with
stronger guarantees.


