Sid Sen will present his preFPO on Thursday March 8 at 11AM in Room 402. 
The members of his committee are: Bob Tarjan, advisor, reader; Michael Freedman,
advisor, reader; Jen Rexford, reader; Vivek Pai and Bob Sedgewick, nonreaders.
Everyone is invited to attend his talk.  His abstract follows below.
----------------------------------
Title: New Data Structures and Algorithms for Highly-Available Distributed Systems

Abstract:

We design new data structures and algorithms to improve the availability of
distributed systems. Our work spans several settings, from individual database
servers to large-scale peer-to-peer systems. A unifying approach in our work is
to synergize theory and systems until a solution that is both practical and
provably good is attained.

In database systems, availability suffers when transactions contend for locks on
shared index structures. To increase transaction concurrency, many databases
such as Berkeley DB avoid rebalancing B-tree indices after deletions, only
rebalancing after insertions. But when one company tried this technique with
red-black trees, the result was an extended downtime and litigation. Using a
novel exponential-potential analysis, we theoretically justify why deletion
without rebalancing works in B-trees but fails in red-black trees, and devise
a new binary tree called Ravl trees that works.

In peer-to-peer systems, the availability of a service depends on unreliable
participant machines. These machines fail not just by crashing, but also in
arbitrary ways, e.g. due to bugs or malice. To scale such a system securely, we
must partition it into smaller groups that remain locally correct despite a
dynamic, global adversary. We expose serious practical limitations in the
current-best algorithm for this, and devise a new algorithm called Commensal
Cuckoo that keeps the system available two orders of magnitude longer.

In enterprise desktop and some server cluster settings, availability is often
desired in conjunction with energy efficiency. Unfortunately, existing
energy-saving solutions require special hardware or virtualization technology
that IT departments are unwilling to deploy. To address this, we built GreenUp,
a system that allows any machine to act as a proxy for other sleeping machines,
ensuring their availability to clients seamlessly. Notably, while optimizing
load distribution in GreenUp, we discovered and analyzed a new balls-in-bins
process with connections to population genetics and high-dimensional geometry.
GreenUp is deployed on over ~1,100 machines within Microsoft.

Though often ignored, networking is as critical to the availability of a system
as the servers themselves. Beyond connectivity, congestion in the network can
render services unavailable by starving traffic, even when sufficient bandwidth
exists. We address this problem in datacenter networks through LocalFlow, a flow
routing algorithm based on splitting flows. Unlike prior solutions, LocalFlow is
switch-local (and hence scalable) and routes optimally. Realizing LocalFlow
required a novel theoretical model as well as novel mechanisms for practical
deployment on programmable commodity switches.