Jennifer Lam will present her FPO "Rethinking Distributed Systems for High Contention Workloads" on Monday, 5/4/26 at 10am in CS 301
Jennifer Lam will present her FPO "Rethinking Distributed Systems for High Contention Workloads" on Monday, 5/4/26 at 10am in CS 301 and Zoom Zoom link: [ https://princeton.zoom.us/j/91095935061?pwd=heXLEXo7YrQLfGSdK8ZSVN6r5zv3AF.1&jst=2 | https://princeton.zoom.us/j/91095935061?pwd=heXLEXo7YrQLfGSdK8ZSVN6r5zv3AF.1&jst=2 ] Committee: Examiners: Wyatt Lloyd (adviser), Mae Milano, and Jialin Ding, Zachary Kincaid Readers: Wyatt Llyod and Haonan Lu (co-adviser, external at SUNY Buffalo) Abstract: Strongly consistent distributed systems support large-scale applications by sharding data across many machines to provide capacity greater than what can fit on a single machine. They are designed to run on asynchronous datacenter networks, where transaction requests may be dropped, delayed, or interleaved across shards. Interleaving requests is a root cause of performance bottlenecks. When transactions overlap across shards, their requests arrive in opposite orders. If at least one request is a write, it becomes challenging for serializable systems to guarantee that transactions appear in a matching total order across all shards; writes make the execution order non-commutative, so a read ordered before a write cannot observe it, while a read ordered after must. Consequently, interleaving may result in a request order on one shard contradicting another. Such conflicts degrade performance by consuming extra CPU, memory, and network round-trips. The frequency of conflicts increases under workload skew, where access patterns favor a subset of keys. Skew causes overlap between transactions accessing the same shards and, combined with writes, leads to high contention. Consequently, concurrency control protocols that performed well under low contention often perform poorly under high contention. Many use conflict-based ordering, which orders overlapping transactions using conflict resolution mechanisms such as wait-wait or wait-die. These mechanisms typically allow at most one transaction to proceed at a time, forcing others to wait or be aborted. Under high contention, this worsens overall latency and throughput. This dissertation recognizes that high-contention workloads are challenging for existing distributed systems and calls for new architectures robust against them. We introduce two such systems. The first is TurboDB, a process-ordered serializable distributed database that mitigates conflict on the hottest keys by co-locating them on a hotshard and efficiently serializing requests to that shard. It improves performance by up to an order of magnitude. The second is Whiptail, a strictly serializable distributed database that eliminates conflicts. It exploits improved datacenter networks and tightly synchronized clocks to enable clients to predetermine the total order of transactions, eliminating conflicts. Whiptail achieves up to an order-of-magnitude improvement in throughput on high-contention workloads.
participants (1)
-
Gradinfo