[talks] Haoyu Zhang will present his General Exam on May 7, 2015 at 10am in CS 401

Nicki Gotsis ngotsis at CS.Princeton.EDU
Thu Apr 30 14:25:18 EDT 2015


Haoyu Zhang will present his General Exam on May 7, 2015 at 10am in CS 401.

The members of his committee are Mike Freedman (Advisor), Jen Rexford, and Nick Feamster.

Everyone is invited to attend his talk, and faculty wishing to remain for the oral exam that follows are welcome to do so. His abstract and reading list follow below.

Abstract
> Software-defined networking (SDN) offers greater flexibility than
> traditional distributed network architectures, at the risk of the
> controller becoming a single point of failure. Unfortunately, existing
> fault-tolerance techniques, such as replicated state machines, are
> insufficient to ensure correct network behavior under controller
> failures. The challenge is that, in addition to the application state of
> the controllers, the switches maintain hard state that must be handled
> consistently. Thus, it is necessary to incorporate switch state into the
> system model to correctly offer a "logically centralized" controller.
>
> We introduce Ravana, a fault-tolerant SDN controller platform that
> processes the control messages transactionally and exactly once (at both
> the controllers and the switches). Ravana maintains these guarantees in
> the face of both controller and switch crashes. The key insight in
> Ravana is that replicated state machines can be extended with
> lightweight switch-side mechanisms to guarantee correctness, without
> involving the switches in an elaborate consensus protocol. Our prototype
> implementation of Ravana provides transparent fault tolerance:
> controller applications can run on Ravana without modifying a single
> line of code. Experiments show that Ravana achieves high throughput with
> reasonable overhead compared to a single controller, and a failover
> time under 100 ms. We also use verification tools to prove Ravana's
> correctness under controller failures.


Reading List
> [1] J. H. Saltzer and M. F. Kaashoek, Principles of Computer System
> Design: An Introduction. Morgan Kaufmann Publishers Inc., 2009.
>
> [2] L. Lamport, “Time, Clocks, and the Ordering of Events in a
> Distributed System,” Commun. ACM, July 1978.
>
> [3] M. Rosenblum and J. K. Ousterhout, “The Design and Implementation
> of a Log-structured File System,” ACM Trans. Comput. Syst., Feb. 1992.
>
> [4] B. Liskov and J. Cowling, “Viewstamped Replication Revisited,”
> tech. rep., MIT, July 2012.
>
> [5] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J.
> Spreitzer, and C. H. Hauser, “Managing Update Conflicts in Bayou, a
> Weakly Connected Replicated Storage System,” in SOSP, Dec. 1995.
>
> [6] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t
> Settle for Eventual: Scalable Causal Consistency for Wide-area Storage
> with COPS,” in SOSP, Oct. 2011.
>
> [7] D. Ongaro and J. Ousterhout, “In Search of an Understandable
> Consensus Algorithm,” in USENIX ATC, June 2014.
>
> [8] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J.
> Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh,
> S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle,
> S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R.
> Wang, and D. Woodford, “Spanner: Google’s Globally-distributed
> Database,” in OSDI, Oct. 2012.
>
> [9] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. Gude, N. McKeown,
> and S. Shenker, “Rethinking Enterprise Network Control,” IEEE/ACM
> Trans. Netw., Aug. 2009.
>
> [10] T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M.
> Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, and S. Shenker,
> “Onix: A Distributed Control Platform for Large-scale Production
> Networks,” in OSDI, Oct. 2010.
>
> [11] E. B. Nightingale, K. Veeraraghavan, P. M. Chen, and J. Flinn,
> “Rethink the Sync,” in OSDI, Nov. 2006.

