
Sunghwan Ihm will present his preFPO on Friday, December 17 at 2PM in Room 401 (note room!). The members of his committee are: Vivek Pai, advisor; Jen Rexford and Larry Peterson, readers; Mike Freedman and Andrea LaPaugh, nonreaders. Everyone is invited to attend his talk. His abstract follows below.

----------------------

Title: Understanding and Improving Modern Web Traffic Caching

As Web sites move from relatively static displays of simple pages to rich media applications with heavy client-side interaction, the nature of the resulting Web traffic changes as well. At the same time, Web traffic also increases due to the popularity of social networking, video streaming, file-hosting sites, and consolidation of existing applications to the Web. Understanding the nature of these changes grows more difficult without a better characterization of the underlying traffic, which is necessary to improve response time, analyze caching effectiveness, and design intermediary systems, such as firewalls, security analyzers, and reporting/management systems.

Unfortunately, while the Web has been studied heavily in the past, we still have little understanding of today's Web - what has changed, and what we can do better.

In this thesis, we revisit these previously studied issues in the context of today's Web, focusing on characterizing traffic and developing a new caching technique which outperforms previous approaches. Specifically, we analyze five years (2006-2010) of real Web traffic from a globally-distributed proxy system, which captures the browsing behavior of over 50,000 users. Using this data set, we also develop a new page detection algorithm that is better suited to modern Web page interactions, and investigate the redundancy of this traffic, using both traditional object-level caching and content-based approaches.
Among many findings, we observe a huge potential benefit of the content-based caching approach in today's Web - the byte hit rate is almost twice that of the traditional object-level caching approach.

Motivated by this potential benefit of content-based caching, we also develop Wanax, a scalable and flexible wide-area network (WAN) accelerator. It uses a novel multiresolution chunking (MRC) scheme that provides high compression rates and high disk performance for a variety of content, while using much less memory than existing approaches. Wanax exploits the design of MRC to perform intelligent load shedding to maximize throughput even when running on resource-limited shared platforms. Finally, Wanax exploits mesh network environments, instead of just the star topologies common in enterprise branch offices.

Melissa Lawson