[talks] Matvey Arye general exam
Melissa M. Lawson
mml at CS.Princeton.EDU
Tue Apr 23 14:47:39 EDT 2013
Matvey Arye will present his research seminar/general exam on Monday, April 29 at 12:30 PM in
Room 401 (note room!). The members of his committee are: Michael Freedman (advisor),
Vivek Pai, and Kai Li. Everyone is invited to attend his talk, and those faculty wishing to remain
for the oral exam following are welcome to do so. His abstract and reading list follow below.
Global-scale services generate data that is both widely distributed and big, such as system logs and video feeds. Unfortunately, traditional approaches for backhauling and analyzing this data centrally are slow and expensive, due to the high cost or limited availability of wide-area network bandwidth. Moreover, they require the analyst to commit to a data-collection policy upfront, making that policy agnostic to current and future resource conditions.
Jetstream is a system that allows adaptive and real-time analysis of large, distributed data sets. It uses dispersed, structured storage to enable data collection without a fixed policy, and adapts the fidelity of collection in response to changes in network conditions. Namely, if a given user query cannot be satisfied within the available bandwidth, Jetstream automatically transforms the query, trading precision for bandwidth. One key ingredient in Jetstream’s architecture is its storage abstraction: a novel adaptation of the data cube from OLAP databases, which we use to represent aggregations and approximations of distributed data. The cube model helps us define a range of data-degradation transforms, all of which can be implemented as standard operators in a user’s query graph. The evaluation is conducted on a system stretching between clusters in Europe and North America and demonstrates the ability to maintain real-time responsiveness, save significant bandwidth through in-place aggregation and approximation, and dynamically adapt the data degradation policies based on changing resource constraints and input data rates.
Current Reading List:
(9 papers and 1 textbook)
Principles of Computer System Design: An Introduction
Saltzer and Kaashoek
The Design of the Borealis Stream Processing Engine
Abadi et al., CIDR 2005
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks
Madden et al., OSDI, 2002
A Cost-Space Approach to Distributed Query Optimization in Stream Based Overlays
Shneidman et al., NetDB 2005
BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data
Sameer Agarwal, Aurojit Panda, Barzan Mozafari, Samuel Madden, Ion Stoica
To Appear in ACM EuroSys 2013
DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views
Yanif Ahmad, Oliver Kennedy, Christoph Koch, and Milos Nikolic
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat, OSDI 2004
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Jim Gray, et al., Data Mining and Knowledge Discovery, 1997.
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters
Matei Zaharia, et al., HotCloud, 2012.