[talks] Matvey Arye general exam

Tue Apr 23 14:47:39 EDT 2013

Matvey Arye will present his research seminar/general exam on Monday, April 29 at 12:30 PM in
Room 401 (note room!). The members of his committee are Michael Freedman (advisor),
Vivek Pai, and Kai Li. Everyone is invited to attend his talk, and faculty wishing to remain
for the oral exam that follows are welcome to do so. His abstract and reading list follow below.

Abstract: 
Global-scale services generate data that is both widely distributed and big, such as system logs and video feeds. Unfortunately, traditional approaches for backhauling and analyzing this data centrally are slow and expensive, due to the high cost or limited availability of wide-area network bandwidth. Moreover, they require the analyst to commit to a data-collection policy up front, leaving that policy agnostic to current and future resource conditions.
Jetstream is a system that allows adaptive and real-time analysis of large, distributed data sets. It uses dispersed, structured storage to enable data collection without a fixed policy, and adapts the fidelity of collection in response to changes in network conditions. Namely, if a given user query cannot be satisfied within the available bandwidth, Jetstream automatically transforms the query, trading precision for bandwidth. One key ingredient in Jetstream’s architecture is its storage abstraction: a novel adaptation of the data cube from OLAP databases, which we use to represent aggregations and approximations of distributed data. The cube model helps us define a range of data-degradation transforms, all of which can be implemented as standard operators in a user’s query graph. The evaluation is conducted on a system stretching between clusters in Europe and North America and demonstrates the ability to maintain real-time responsiveness, save significant bandwidth through in-place aggregation and approximation, and dynamically adapt the data degradation policies based on changing resource constraints and input data rates. 

Current Reading List: 
(9 papers and 1 textbook) 

Principles of Computer System Design: An Introduction 
Saltzer and Kaashoek 

The Design of the Borealis Stream Processing Engine 
Abadi et al., CIDR 2005 

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks 
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly 
EuroSys, 2007 

TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks 
Madden et al., OSDI, 2002 

A Cost-Space Approach to Distributed Query Optimization in Stream Based Overlays 
Shneidman et al., NetDB 2005 

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data 
Sameer Agarwal, Aurojit Panda, Barzan Mozafari, Samuel Madden, Ion Stoica 
To appear in ACM EuroSys 2013

DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views 
Yanif Ahmad, Oliver Kennedy, Christoph Koch, and Milos Nikolic 
VLDB 2012 

MapReduce: Simplified Data Processing on Large Clusters 
Jeffrey Dean and Sanjay Ghemawat, OSDI 2004 

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals 
Jim Gray et al., Data Mining and Knowledge Discovery, 1997

Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters
Matei Zaharia et al., HotCloud 2012
