Abstract:
Web caching is the caching of web documents in order to reduce bandwidth usage, server load, and perceived download time. A web cache stores copies of web objects as they are accessed through the cache, and the stored copies can then be used to serve subsequent requests. Web caching has been studied extensively, and a plethora of techniques exist today. Existing solutions focus either on cache replacement algorithms, which decide which objects are most worth storing, or on file system improvements, which typically optimize the file system for frequent additions, deletions, and accesses. However, these solutions scale poorly with respect to memory. Some techniques impose lower bounds on the amount of memory a web cache needs. Others have large virtual memory requirements, which leads to frequent swapping on low-memory systems. Cache replacement policies need metadata to work well, and file systems need large lists or trees (preferably in memory) to access the files they store. As a result, physical memory must grow with the number of web objects the cache can store, which restricts both the number of objects a web cache can hold and the affordability of the system. We therefore propose HashCache, a web caching mechanism based on file system optimization that uses hashing to reduce physical memory requirements. HashCache uses hashing to map files to disk locations, so that cache maintenance and use need almost no memory. Hashing also allows more flexible mechanisms for metadata maintenance and does not impose lower bounds on the memory required to provide performance guarantees. HashCache exploits the distribution of sizes and contents of web objects to use secondary storage efficiently (optimal bucket sizes).
Experiments on an implementation of the HashCache architecture demonstrate its usefulness.
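
The idea of mapping an object to a disk location by hashing, with no in-memory index, can be illustrated with a minimal sketch. This is not the HashCache implementation: the function name `bucket_offset`, the choice of SHA-1, and the bucket size and bucket count are all assumptions made here for illustration only.

```python
import hashlib

# Hypothetical parameters, chosen only for illustration.
BUCKET_SIZE = 8 * 1024   # bytes reserved on disk per bucket (assumption)
NUM_BUCKETS = 1 << 20    # number of buckets in the on-disk table (assumption)


def bucket_offset(url: str,
                  num_buckets: int = NUM_BUCKETS,
                  bucket_size: int = BUCKET_SIZE) -> int:
    """Return the byte offset of the disk bucket that a URL hashes to.

    The offset is computed from the URL alone, so no per-object
    entry has to be kept in memory to locate the object on disk.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % num_buckets
    return index * bucket_size
```

Under these assumptions, a lookup would compute `bucket_offset(url)`, seek to that offset in the cache file, and read one bucket. Because two URLs can hash to the same bucket, the bucket would also have to store enough information (for example, the URL itself) to confirm that it holds the requested object.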