Nanqinqin Li will present his FPO, "System Idle Time Need Not Be Wasted," on Thursday, 5/7/2026, at 11:30 AM in the 194 Nassau Street conference room.

Committee:
Examiners: Mike Freedman (adviser), Wyatt Lloyd, Asaf Cidon (Columbia University)
Readers: Mike Freedman and Jialin Ding

Abstract:
Modern computer systems routinely lose performance and availability to time spent waiting on blocking events: CPUs stall on memory, services block on storage and networks, and stateful applications wait through conservative failover protocols. This dissertation studies a common question: when a system must wait, how can it safely run other useful work in parallel with that wait?

The dissertation develops this theme through two systems projects. The first, Speculative Recovery, targets failover for stateful applications that use recovery from disaggregated storage (REDS). REDS is resource-efficient because only one instance runs during normal operation, but failover is slow because the timeout and recovery run sequentially. Speculative Recovery starts backup recovery as soon as the primary appears unavailable, while letting the primary continue in case it recovers first. The work introduces disk superposition and the super/collapse abstractions, which allow disk state to diverge temporarily while ensuring that only one version becomes externally observable. The design includes collocated-clone, which delivers near-normal clone performance, and dirty-bit-based rules that ensure correctness. Implemented in Ceph and evaluated with MySQL, PostgreSQL, and MariaDB, the approach speeds up failover while preserving the resource efficiency of REDS.

The second project, LiteSwitch, targets sub-microsecond CPU stalls caused by CXL-attached memory. CXL expands memory capacity but increases access latency and amplifies memory-induced stalls. Existing techniques for harvesting these stall cycles are mismatched: profiling-based methods struggle with CXL latency variation, and interrupt-based delivery is too expensive for windows of only hundreds of nanoseconds.
LiteSwitch uses a lightweight hardware-software co-design. On the hardware side, location-dependent memory branching (LDMB) detects long-latency accesses online and delivers control via direct branching rather than interrupts. On the software side, Bundled Handoff provides fast scavenger selection, and xstate-aware context switching avoids unnecessary SIMD/FP save/restore overhead. Evaluation shows substantial slowdown reductions across representative workloads and CXL latency settings.

Taken together, these projects show that idle time can be an opportunity rather than an unavoidable loss. The central lesson is that overlapping useful work with waiting is effective only when systems co-design performance mechanisms with correctness constraints. By combining overlap with careful control over observability, ordering, and runtime overhead, this dissertation demonstrates practical ways to improve both availability and performance in modern systems.