Christopher Hodsdon will present his FPO "Stronger Abstractions and Performance Guarantees for Building Strongly Consistent Distributed Services" on May 9th, 2023 at 11am in CS 302.
His committee is as follows:
Examiners: Wyatt Lloyd (advisor), Ethan Katz-Bassett (advisor), Michael Freedman
Readers: Siddhartha Sen, Ravi Netravali
Everyone is invited to attend the talk.
Title: Stronger Abstractions and Performance Guarantees for Building Strongly Consistent Distributed Services
Abstract:
Strongly consistent distributed services require complex coordination among the services’ machines. Building these services thus requires designing and reasoning about complex coordination. To make building strongly consistent distributed services easier, designers build services atop abstractions that simplify reasoning about the service. Two abstractions used today are the multi-sequence abstraction and replicated state machines.
The multi-sequence abstraction simplifies ordering operations on sharded data; it explicitly orders the operations on each shard it accesses. Existing implementations have two shortcomings: failures can result in some multi-sequence numbers never being assigned, exposing a noncontiguous multi-sequence, which requires designing and implementing complex coordination in the form of consensus to handle, and existing implementations do not scale
The replicated state machine (RSM) abstraction is used by many of today’s services to ensure correctness and resilience despite failures. It provides the abstraction of “a single machine that does not fail”. However, with RSM protocols used today, an individual slow replica can slow the entire RSM. RSMs are commonly geo-distributed, but current state-of-the-art protocols that provide the stronger slowdown-tolerant abstraction do not have comparable latency to the state-of-practice in a wide-area network or are unable to handle certain types of slowdowns.
This dissertation shows that it is possible to build systems that strengthen both abstractions while providing performance guarantees.
First, we posit that sequencers should expose our new contiguous multi-sequence abstraction. Contiguity guarantees every sequence number is assigned an operation, simplifying and strengthening the abstraction. Without scalability in the implementation of the contiguous multi-sequence, a service would need to be redesigned once it has outgrown the implementation. We design and implement MASON, the first system to expose the contiguous multi-sequence abstraction and the first to provide a scalable multi-sequence. MASON is thus an ideal building block for consistent, scalable services. Our evaluation shows MASON unlocks scalable throughput for two strongly consistent services built on it.
Second, we seek to provide the stronger RSM abstraction of “a single machine that does not slow down”, with AVICENNA, a slowdown-tolerant replicated state machine protocol that has comparable latency to the current state-of-practice in the wide-area.