Enabling Large-Scale Data Intensive Computations
Chandu Thekkath, Microsoft Research
Friday, October 22, 1:30pm
This talk describes a set of distributed services developed at
Microsoft Research Silicon Valley to enable efficient parallel
programming on very large datasets. Parallel programs arise naturally
within scientific, data mining, and business applications. Central to
our philosophy is the notion that parallel programs do not have to be
difficult to write and that the same program must seamlessly run on a
laptop, a desktop, a small cluster, or a large data center without the
author having to worry about the details of parallelization,
synchronization, or fault tolerance. We have built several services
(Dryad, DryadLINQ, TidyFS, and Nectar) that embody this belief. Our
goal is to enable users, particularly scientists of all disciplines, to
treat a computer cluster as a forensic, diagnostic, or analytic tool.
The talk will describe the details of our infrastructure and the
characteristics of some of the applications that have been run on it.
Chandu Thekkath is a researcher at Microsoft Research
Silicon Valley. He received his Ph.D. in Computer Science from the
University of Washington in 1994. Since then, except for a sabbatical
year at Stanford in 2000, he has been in industrial research labs at
DEC, Compaq, and Microsoft. He is a fellow of the ACM.