Practical Elegance: Designing programming models for large-scale systems
Michael Isard, Microsoft Research
Monday, November 11, 4:30pm
Computer Science 105

The growing need for systems that operate on large datasets is well
known, but most programming models for such systems are either too low
level to appeal to non-expert programmers, or are specialized to tightly
restricted problem domains. I will describe the evolution of a
programming model, and its associated systems, that I have worked on at
Microsoft Research. The model encourages programmers to describe an
algorithm as a series of data-parallel steps, embedded within a familiar
language and programming environment (C#/.NET). To write code this way,
programmers must think explicitly about data dependencies, but can
leave out the messy details of concurrency. We believe this
elegantly balances our desire to insulate programmers from
implementation details with the need for the system to automatically
infer safe parallel and distributed execution strategies. We have been
able to design systems that take programs written in this model and
execute them efficiently on large datasets and clusters of hundreds of
computers. As the model has evolved, we have made it more expressive, so
that it now encompasses both incremental and iterative computation,
while keeping the ability to execute these richer programs efficiently,
at scale. Our ultimate goal is to integrate data-parallelism seamlessly
into all aspects of a general-purpose programming language.

Michael Isard started out as a computer vision researcher, but for the
last few years has mostly been building distributed execution engines
and thinking about how to program them. He received his D.Phil. in
computer vision from the Oxford University Engineering Science
Department in 1998, and worked at the Compaq Systems Research Center in
Palo Alto for three years before joining Microsoft Research in Silicon
Valley in 2002.