Do not blame users for misconfigurations
Yuanyuan Zhou,
University of California, San Diego
Monday, December 9, 4:30pm
Computer Science 105
Today's data centers typically employ extensive redundancy to tolerate
hardware and software errors. Unfortunately, another type of error,
configuration errors (i.e., misconfigurations), can still take down
entire data centers. Misconfigurations account for more than 30% of
high-severity issues and have caused some of the recent major outages
at cloud service companies, including Amazon, Microsoft Azure, Google,
and Facebook.
Yet many software developers and system designers place most of the
blame for configuration errors on users, and pay too little attention
to handling misconfigurations proactively. Compared with software bugs,
configuration issues receive much less tooling support for error
detection, issue tracking, tolerance testing, and design review.
In this talk, I will present our recent work on characterizing
misconfigurations in commercial and open source systems (as well as at
a major commercial cloud service provider), along with some of our
solutions to the configuration problem. We approach it from the
perspective that configurations are part of the user interface, and
therefore need to be designed from the user's point of view. More
specifically, as a practical first step, systems need to avoid
error-prone configuration requirements and react gracefully to user
mistakes. Our solutions have been used by two commercial companies and
have influenced popular open source systems such as Squid to redesign
their configuration. I will also discuss some of our negative
interactions (and "challenges") with open source developers.
Yuanyuan Zhou is currently a Qualcomm Chair Professor at UC San Diego.
Before joining UCSD, she was a tenured associate professor at the
University of Illinois at Urbana-Champaign. Her research interests span
operating systems, software engineering, and system reliability and
maintainability. Three of her papers have been selected for the IEEE
Micro Special Issue on Top Picks from Architecture Conferences, and she
has received the Best Paper Award at SOSP '05 and a 2011 ACM SIGSOFT FSE
(Foundations of Software Engineering) Distinguished Paper Award. She has
co-founded two startups. Her most recent startup, Pattern Insight, has
successfully deployed software quality assurance tools at many
companies, including Cisco, Juniper, Qualcomm, Motorola, Intel, EMC,
Lucent, and Tellabs. Pattern Insight recently sold its Log Insight
business to VMware. In addition, Intel has licensed some of her
solutions for detecting concurrency bugs. She has graduated 14 Ph.D.
students so far, many of whom have joined top industrial companies or
academia, including tenure-track faculty positions at the University of
Wisconsin at Madison, the University of Toronto, and the University of
Waterloo.