[talks] 2pm Fri Oct 1 talk on "predicting faults in heterogeneous federated distributed systems"

Jennifer Rexford jrex at CS.Princeton.EDU
Fri Sep 24 18:46:04 EDT 2010


Speaker: Professor Dejan Kostic, EPFL
Title: Predicting Faults in Heterogeneous, Federated Distributed Systems
Date/time: 2:00-3:00pm on Friday October 1
Location: CS 302

Abstract:

It is notoriously difficult to make distributed systems reliable. This
becomes even harder for widely deployed systems that become
heterogeneous and federated. The set of routers in charge of
inter-domain routing in the Internet is a prime example of such a
system. The unanticipated interaction of nodes under seemingly valid
configuration changes and local fault-handling can have a profound
effect. For example, the Internet has suffered from multiple IP prefix
hijackings, as well as performance and reliability problems due to
emergent behavior resulting from a local session reset. 

We argue that the key step in making these systems reliable is to
predict faults automatically. In this talk, I will describe
the design and implementation of DiCE, a system that uses temporal and
spatial awareness to predict faults in heterogeneous, federated
systems. Our live evaluation in a testbed shows that DiCE quickly
and successfully predicts two important classes of faults, operator
mistakes and programming errors, that have plagued BGP routing in the
Internet.

Joint work with Marco Canini, Vojin Jovanovic, and Gautam Kumar

Bio: Dejan Kostić obtained his Ph.D. in Computer Science at Duke University, under Amin Vahdat. He spent the last two years of his studies, followed by a brief stay as a postdoctoral scholar, at the University of California, San Diego. He received his Master of Science degree in Computer Science from the University of Texas at Dallas, and his Bachelor of Science degree in Computer Engineering and Information Technology from the University of Belgrade (ETF), Serbia. In January 2006, he started as a tenure-track assistant professor at the School of Computer and Communications Sciences at EPFL (Ecole Polytechnique Fédérale de Lausanne), Switzerland. In 2010, he received a European Research Council (ERC) Starting Investigator Award. His interests include Distributed Systems, Computer Networks, Operating Systems, and Mobile Computing.