[talks] Nanqinqin Li will present his FPO "System Idle Time Need Not Be Wasted" on Thursday, 5/7/2026 at 11:30 AM at 194 Nassau Street conference room.

23 Apr 2026

Nanqinqin Li will present his FPO "System Idle Time Need Not Be Wasted" on Thursday, 5/7/2026 at 11:30 AM in the 194 Nassau Street conference room.

Committee: 
Examiners: Mike Freedman (adviser), Wyatt Lloyd, Asaf Cidon (Columbia University) 
Readers: Mike Freedman and Jialin Ding 

Abstract: 
Modern computer systems routinely lose performance and availability to 
time spent waiting on blocking events: CPUs stall on memory, services 
block on storage and networks, and stateful applications wait through 
conservative failover protocols. This dissertation studies a common 
question: when a system must wait, how can it safely run other useful 
work in parallel with that wait? The dissertation develops this theme 
through two systems projects. The first, Speculative Recovery, targets 
failover for stateful applications using recovery from disaggregated 
storage (REDS). REDS is resource efficient because only one instance 
runs during normal operation, but failover is slow because timeout and 
recovery run sequentially. Speculative Recovery starts backup recovery 
as soon as the primary appears unavailable, while letting the primary 
continue in case it recovers first. The work introduces disk superposition 
and the super/collapse abstractions, allowing temporary divergence of 
disk state while ensuring only one version becomes externally 
observable. The design includes collocated-clone for near-normal clone 
performance and dirty-bit-based rules for correctness. Implemented in 
Ceph and evaluated with MySQL, PostgreSQL, and MariaDB, the 
approach improves failover while preserving the resource efficiency of 
REDS. The second project, LiteSwitch, targets sub-microsecond CPU 
stall cycles caused by CXL-attached memory. CXL expands memory 
capacity but increases access latency and amplifies memory-induced 
stalls. Existing harvesting techniques are mismatched: profiling-based 
methods struggle with CXL latency variation, and interrupt-based 
delivery is too expensive for hundreds-of-nanoseconds windows. 
LiteSwitch uses a lightweight hardware-software co-design. On the 
hardware side, location-dependent memory branching (LDMB) detects 
long-latency accesses online and delivers control via direct branching. 
On the software side, Bundled Handoff provides fast scavenger 
selection, and xstate-aware context switching avoids unnecessary 
SIMD/FP save/restore overhead. Evaluation shows substantial 
slowdown reductions across representative workloads and CXL latency 
settings. Taken together, these projects show that idle time can be an 
opportunity rather than an unavoidable loss. The central lesson is that 
running useful work in parallel with waiting is effective only when systems 
co-design performance mechanisms with correctness constraints. By 
combining overlap with careful control over observability, ordering, and 
runtime overhead, this dissertation demonstrates practical ways to 
improve both availability and performance in modern systems.

Gradinfo