[talks] S Ihm preFPO

Mon Dec 13 11:41:30 EST 2010

Sunghwan Ihm will present his preFPO on Friday, December 17, at 2 PM in Room 401 (note room!).
The members of his committee are: Vivek Pai, advisor; Jen Rexford and Larry Peterson,
readers; Mike Freedman and Andrea LaPaugh, nonreaders. Everyone is invited to attend his
talk. His abstract follows below.
----------------------

Title: Understanding and Improving Modern Web Traffic Caching

As Web sites move from relatively static displays of simple pages to
rich media applications with heavy client-side interaction, the nature
of the resulting Web traffic changes as well. At the same time, Web
traffic volume is growing with the popularity of social networking,
video streaming, and file-hosting sites, and with the consolidation of
existing applications onto the Web. Understanding these changes
requires a better characterization of the underlying traffic, which is
also necessary to improve response time, analyze caching effectiveness,
and design intermediary systems such as firewalls, security analyzers,
and reporting/management systems. Unfortunately, although the Web has
been studied heavily in the past, we still have little understanding
of today's Web: what has changed, and what we can do better.

In this thesis, we revisit these issues for today's Web, focusing on
characterizing traffic and developing a new caching technique that
outperforms previous approaches. Specifically, we analyze five years
(2006-2010) of real Web traffic from a globally distributed proxy
system, which captures the browsing behavior of over 50,000 users.
Using this data set, we also develop a new page detection algorithm
that is better suited to modern Web page interactions, and we
investigate the redundancy of this traffic using both traditional
object-level caching and content-based approaches. Among many
findings, we observe a large potential benefit from content-based
caching on today's Web: its byte hit rate is almost twice that of the
traditional object-level caching approach. Motivated by the promising benefit of content-based
caching, we also develop Wanax, a scalable and flexible wide-area
network (WAN) accelerator. Wanax uses a novel multiresolution chunking
(MRC) scheme that provides high compression rates and high disk
performance for a variety of content while using much less memory
than existing approaches. It exploits the design of MRC to perform
intelligent load shedding, maximizing throughput even when running on
resource-limited shared platforms. Finally, Wanax exploits mesh
network environments rather than only the star topologies common in
enterprise branch offices.
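The gap between object-level and content-based caching described in the abstract comes from how redundancy is detected. The Python sketch below is illustrative only: it is not Wanax or MRC, but a generic single-resolution content-defined chunker built on a Rabin-Karp-style rolling hash. Its parameters (WINDOW, MASK, MIN_CHUNK) and its toy workload are hypothetical, and it simplifies object-level caching to "hit only if the re-fetched object is byte-identical" instead of modeling URL-keyed caches.

```python
import hashlib
import random

# Hypothetical parameters for illustration only; they are not Wanax's or MRC's.
WINDOW = 16              # bytes covered by the rolling hash
MASK = (1 << 6) - 1      # boundary when the low 6 bits of the hash are zero (~1 in 64 positions)
MIN_CHUNK = WINDOW       # minimum chunk length in bytes
BASE, MOD = 257, (1 << 61) - 1


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries chosen by a rolling polynomial hash."""
    chunks, start, h = [], 0, 0
    out_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * out_weight) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                             # shift in the newest byte
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def fingerprint(block: bytes) -> bytes:
    """Name a block of bytes by its SHA-1 digest."""
    return hashlib.sha1(block).digest()


def byte_hit_rates(cached: bytes, requested: bytes) -> tuple[float, float]:
    """Return (object-level, content-based) byte hit rates for `requested`."""
    # Object-level caching (simplified): a hit only if the whole object is unchanged.
    object_hits = len(requested) if fingerprint(cached) == fingerprint(requested) else 0
    # Content-based caching: every chunk already seen in `cached` counts as a hit.
    seen = {fingerprint(c) for c in chunk(cached)}
    chunk_hits = sum(len(c) for c in chunk(requested) if fingerprint(c) in seen)
    return object_hits / len(requested), chunk_hits / len(requested)


# Toy workload: an object fetched again after a small insertion.
random.seed(0)
old = bytes(random.getrandbits(8) for _ in range(20_000))
new = old[:5_000] + b"<inserted banner>" + old[5_000:]
obj, cb = byte_hit_rates(old, new)
print(f"object-level byte hit rate:  {obj:.2f}")
print(f"content-based byte hit rate: {cb:.2f}")
```

Because chunk boundaries depend mostly on the bytes inside the hash window rather than on absolute offsets, the chunker resynchronizes shortly after the inserted bytes, so only the chunks around the edit miss while object-level caching misses the whole object. Per the abstract, MRC chunks at multiple resolutions; this sketch uses a single one.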