Robert Harrison will present his Pre FPO on February 7, 2019 at 2pm in CS 402.
The members of his committee are as follows: Jennifer Rexford (adviser), Nick Feamster, Nate Foster (Cornell), Kyle Jamieson, and David Walker.
All are welcome to attend. The talk title and abstract follow below.
Title: Scalable Network-Wide Telemetry with Programmable Switches
Abstract:
Managing and securing modern networks requires collecting and analyzing network traffic from a distributed collection of switches in real time, i.e., performing network-wide telemetry. Telemetry systems must be flexible and fine-grained, both in timescale and traffic detail, to support the wide range of queries that analyze security, performance, and reliability properties of networks. Yet, they must also scale as the number of monitoring tasks, link speeds, and the size of the networks grow. Achieving fine-grained, network-wide telemetry that scales requires balancing the division of labor between high-speed network switches and general-purpose CPUs. Modern Protocol Independent Switch Architecture (PISA) switches combine the high-speed processing of network hardware with the programmability, albeit limited, of general-purpose CPUs and allow us to adjust this division of labor.
This thesis explores, first, how to partition telemetry tasks between a general-purpose CPU and a single PISA switch for a flexible query language, and second, how to distribute a subset of queries expressed in that language across a set of switches. To realize these goals, we must address two key challenges: (i) how to effectively use a programmable switch to perform customized data collection that minimizes the amount of data processed by the CPU, and (ii) how to efficiently coordinate among a distributed set of switches to compute the output of a global function. In addressing both of these challenges, we must operate within memory, processing, and bandwidth constraints. We describe our solutions to these two challenges in turn below.
First, we present Sonata, an expressive and scalable network telemetry system that performs the collection and analysis of network traffic using the compute resources of both stream-processing servers and a programmable switch. Sonata provides a declarative interface to express queries using dataflow operators for a range of common telemetry tasks. Sonata partitions queries across a stream processor and a switch data plane, running as much of the query as it can on the network switch at line rate. Sonata models the constraints of PISA switches and solves an optimization problem to compile the high-level dataflow operators in the query language to low-level PISA primitives. Sonata can support a wide range of monitoring tasks while reducing the workload on the stream processor by orders of magnitude compared to existing telemetry systems.
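As a rough illustration of the partitioning idea (not Sonata's actual compiler, which solves an optimization problem), the sketch below greedily places the longest prefix of a query's dataflow operators on the switch and leaves the rest to the stream processor. The operator names, the `switch_capable` flag, and the per-operator `stages` cost are all hypothetical stand-ins for the PISA constraints the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical dataflow operator with an assumed switch resource cost."""
    name: str
    switch_capable: bool  # assumption: whether a PISA switch can run it
    stages: int           # assumption: pipeline stages it would consume


def partition(query, stage_budget):
    """Split a query into (switch part, stream-processor part).

    Greedy stand-in for an optimizer: keep adding operators to the switch,
    in order, until one is unsupported or the stage budget would be exceeded.
    """
    used = 0
    split = 0
    for op in query:
        if not op.switch_capable or used + op.stages > stage_budget:
            break
        used += op.stages
        split += 1
    return query[:split], query[split:]


# Illustrative query: operator names and costs are made up for the example.
query = [
    Operator("filter", True, 1),
    Operator("map", True, 1),
    Operator("reduce", True, 2),
    Operator("join", False, 0),
]
on_switch, on_cpu = partition(query, stage_budget=3)
```

With a budget of three stages, only the first two operators fit on the switch in this sketch; raising the budget moves more of the query into the data plane, which is the division of labor the abstract describes.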
Second, we present Herd, a system for implementing a subset of Sonata queries distributed across a set of switches with high accuracy under resource constraints. Our solution counts relevant flows at distributed switches without maintaining per-flow state and probabilistically reports them to a central coordinator. Using these reports, the coordinator adapts the reporting threshold and probability at each switch according to the spatial locality of the flows. Simulations using real traffic traces show that our prototype can detect network-wide heavy hitters accurately with 17% savings in communication overhead and 38% savings in switch state compared to existing approaches. We then present an algorithm to tune system parameters in order to maximize detection accuracy under resource constraints.
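The probabilistic reporting idea can be sketched, under simplifying assumptions, as plain packet sampling: each switch forwards a packet's flow key to the coordinator with a fixed probability, and the coordinator scales each report by the inverse of that probability to estimate network-wide counts. This is not Herd's algorithm; the class names, the fixed `report_prob`, and the static `threshold` are hypothetical, and the adaptive tuning described above is omitted.

```python
import random
from collections import Counter


class Switch:
    """Hypothetical switch that keeps no per-flow state; it only samples."""

    def __init__(self, report_prob, rng):
        self.report_prob = report_prob
        self.rng = rng

    def process(self, flow_key):
        # Report the flow key with probability report_prob, else stay silent.
        if self.rng.random() < self.report_prob:
            return flow_key
        return None


class Coordinator:
    """Hypothetical coordinator that aggregates reports from all switches."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.estimates = Counter()

    def receive(self, flow_key, report_prob):
        # Each report stands for roughly 1 / report_prob packets.
        self.estimates[flow_key] += 1.0 / report_prob

    def heavy_hitters(self):
        return {k for k, v in self.estimates.items() if v >= self.threshold}


def run(traces, report_prob, threshold, seed=0):
    """Feed one packet trace per switch through the sampling pipeline."""
    rng = random.Random(seed)
    coordinator = Coordinator(threshold)
    for trace in traces:
        switch = Switch(report_prob, rng)
        for flow_key in trace:
            report = switch.process(flow_key)
            if report is not None:
                coordinator.receive(report, switch.report_prob)
    return coordinator
```

In this sketch, flow "a" in the traces `[["a"] * 60 + ["b"] * 5, ["a"] * 50 + ["c"] * 3]` is a heavy hitter only network-wide: neither switch sees 100 of its packets alone, but the coordinator's combined estimate crosses a threshold of 100. Lowering `report_prob` trades communication overhead for estimation error.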
Together, Sonata and Herd give network operators the ability to execute a set of network-wide telemetry queries from a single interface that combines the strengths of programmable data planes and general-purpose CPUs to achieve both flexibility and scalability.