TF-CACHE Martin Hamilton
INTERNET-DRAFT Loughborough University
Andrew Daviel
Vancouver Webpages
February 1998
Cachebusting - cause and prevention
draft-hamilton-cachebusting-00.txt
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress.''
To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ds.internic.net (US East Coast),
nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
munnari.oz.au (Pacific Rim).
Distribution of this memo is unlimited. Editorial comments should
be sent directly to the author. Technical discussion will take
place on the mailing list of the TERENA Web Caching Task Force -
TF-CACHE. For more information see
.
This Internet Draft expires August 1998.
Abstract
Cachebusting is the sometimes deliberate, sometimes inadvertent,
practice of defeating caching. This document explains the nature of
the problem in relation to proxy cache servers using the World-Wide
Web's HTTP protocol, and outlines some simple measures which may be
taken to make an HTTP-based service more ``cache friendly''. Since Web
caching is still a novel concept, we also explain the basic
principles behind it. This document should be read by developers of
HTTP-based products and services - we assume that the reader is
already familiar with HTTP.
1. The rationale for Web Caching
Caching is a technique widely used in both computer systems hardware
and software to improve performance and work around bottlenecks.
General examples include physical memory devoted to caching transient
data on disk drives and controllers, and operating system features
such as the directory name lookup cache. Web caching operates at a
higher level, often referred to as "middleware". This typically
implies caching of transient WWW objects by the end user's Web
browser, or using a separate "proxy cache" server which sits between
the end user's browser and the "origin server" which they are trying
to contact. Figure 1 illustrates this relationship.
+---------+ +---------+ +---------+
| End | ----------> | Proxy | ----------> | Origin |
| user's | HTTP | cache | HTTP/FTP/.. | |
| browser | <---------- | server | <---------- | server |
+---------+ +---------+ +---------+
Figure 1 - a simple proxy cache configuration
Proxy cache servers typically speak HTTP [1,2] to the end user's WWW
browser, and a variety of protocols to the origin servers. In
addition to caching WWW objects, they may also elect to cache other
information such as reachability metrics (when choosing between
multiple origin servers) and the results of domain name lookups.
Recent developments have focussed on linking proxy cache servers
together so as to pool their storage capacity - typically using the
Internet Cache Protocol [3]. This is discussed further in [4].
Proxy caches offer additional functionality above and beyond the WWW
browser's own built-in cache, since cached objects may be shared with
the entire population of users and with cooperating proxy cache
servers. By contrast, browser caches are typically private to the
individual, or can only be shared with those browsers which have
access to the filesystem on which the cached objects are found.
Figure 2 illustrates the operation of the proxy cache server in the
case that the requested WWW object (usually identified by its URL, or
the URL plus the HTTP request headers sent by the WWW browser) has
already been cached.
+---------+ +---------+ +---------+
| End | ----------> | Proxy | < No need > | Origin |
| user's | HTTP | cache | < to > | |
| browser | <---------- | server | < contact > | server |
+---------+ +---------+ +---------+
Figure 2 - fetching a cached object
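The operation shown in Figures 1 and 2 amounts to a table lookup
keyed on the request. A minimal Python sketch (our illustration, not
taken from any real cache, ignoring freshness checks and request
headers):

```python
# Sketch of the lookup logic behind Figure 2: the proxy keeps a
# table of objects keyed by URL and only contacts the origin
# server on a miss.

cache = {}

def fetch_from_origin(url):
    # Placeholder for a real HTTP/FTP client talking to the
    # origin server named in the URL.
    return "body of " + url

def proxy_request(url):
    """Return (body, was_hit); serve from cache when possible."""
    if url in cache:
        return cache[url], True       # hit: origin not contacted
    body = fetch_from_origin(url)     # miss: fetch and keep a copy
    cache[url] = body
    return body, False
```

A second request for the same URL is then served entirely from the
proxy's store, which is what makes the hit rate discussed below
possible.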
A cache's effectiveness is usually measured in terms of its "hit
rate" - the ratio of requests which may be satisfied using cached
objects. The goal of the cache administrator is to make this figure
as high as possible, without serving a significant volume of stale
material to the cache's users.
Cache hit rates of 40% to 50% are commonly reported for WWW traffic,
for example [5]. Caching also helps to make more effective use of the
available bandwidth by allowing TCP congestion control algorithms to
work properly - conventional HTTP traffic takes the form of a very
large number of short-lived TCP connections, which often defeats TCP
"slow-start" [6] on busy lines.
It follows that proxy caching should be highly attractive to Internet
Service Providers and organisations which buy connectivity from them,
on a cost/benefit basis. Cache hits are typically delivered an order
of magnitude faster than cache misses, since the objects requested do
not have to be fetched from the origin server. This means that a
site which encourages caching can provide the end user with a much
higher perceived quality of service whilst at the same time getting
better value for money from their leased line(s).
The World-Wide Web community is standardising a new version of HTTP -
1.1 - which specifically addresses a number of caching issues. At
the time of writing, HTTP/1.1 has yet to be widely deployed, and its
specification is still being developed. In this document we discuss
only the best of current practice.
2. The cachebusting problem
Support in the HTTP protocol and its implementations for proxies and
caching is something which has essentially been retro-fitted. As a
result, there are many common practices which are incompatible with
it, and either defeat caching completely or reduce the benefits which
derive from it. This is primarily an educational issue involving
developers of HTTP based services and systems.
Caching at the HTTP level can cause problems for services which make
heavy use of usage statistics - e.g. to provide "hit counts" for
advertisers. Users of cached copies of an object are effectively
invisible to the provider of the original service. This may provide
a strong motivation to defeat caching.
There is also the case that a product comes with an out-of-the-box
configuration which defeats caching, perhaps unintentionally on the
part of the vendor or its developers. If the product works for most
users with few if any modifications to the default settings, there
will be no incentive to dig deeper into its configuration
possibilities.
3. How to be friendly to proxy cache servers
The following sections outline some simple measures which developers
of HTTP-based systems and services can take to make their products
more cache-friendly.
3.1 Tips for HTTP server administrators
Use a server which supports HTTP 1.1 - this has a number of
additional features to support caching.
Send the Expires header on documents and images where feasible
- this will help caches to decide when your objects are stale.
Use an HTTP server which supports the GET method with the
If-Modified-Since header - this will help browsers and proxy
caches to figure out whether their cached copy of a file is
out of date.
Ensure that the time is set correctly on the server machine, e.g.
via NTP [7], so that the timestamp information carried in the
HTTP headers makes sense.
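As a sketch of how these tips fit together, the following Python
fragment (our illustration; the timestamps and the one-day lifetime
are invented values) generates Last-Modified and Expires headers, and
answers a conditional If-Modified-Since request with 304 Not
Modified:

```python
# Cache-friendly response handling: freshness headers on a 200,
# and a bodyless 304 when the client's copy is still current.
import time
from email.utils import formatdate, parsedate_tz, mktime_tz

LIFETIME = 24 * 3600    # assumed: content stays fresh for a day

def build_headers(mtime):
    """Headers a cache can use to judge the object's freshness."""
    return {
        "Last-Modified": formatdate(mtime, usegmt=True),
        "Expires": formatdate(time.time() + LIFETIME, usegmt=True),
    }

def respond(mtime, if_modified_since=None):
    """Return (status, headers); 304 if the cached copy is current."""
    if if_modified_since:
        client_time = mktime_tz(parsedate_tz(if_modified_since))
        if mtime <= client_time:
            return 304, {}    # Not Modified: body is not resent
    return 200, build_headers(mtime)
```

The 304 path is where the bandwidth saving comes from: the proxy
revalidates its copy with a tiny exchange instead of re-fetching the
whole object.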
3.2 Tips for content providers (e.g. HTML authors)
Encourage the sharing of links to common graphics and applets, so
that only one URL is used for a given object.
Use client-side imagemaps (USEMAP - [8]) where feasible, since
server-side imagemaps generate HTTP Redirects which are typically
uncacheable.
Use trailing slashes (/) for directory names to avoid extra
redirects.
Where you are using a file which is returned when the directory
name is requested (typically index.html or index.htm) "./" can
usually be written instead of referring to the file by name.
Try to use a single name for a server in the hostname part of the
URL in the HTML which you create.
Don't rename files to age them - give them unique names in the
first place and update the links which point to them.
Use the Internet domain name in the host component of the URLs you
create, rather than the host's IP address.
If you really want to count every access to a given page, embed a
tiny non-cacheable image in it. This will give you an access
count for the page without requiring the whole thing to be
downloaded again by each user of a given proxy cache.
3.3 Dynamic content (e.g. CGI) developers
Make results cacheable where practical:
Use GET instead of POST for simple queries, since POST results
aren't cached.
Use the path component of the URL to pass information instead of
QUERY_STRING - caches may treat objects with a ? in their URL
as uncacheable.
Use a directory name other than "cgi-bin", since caches can be
expected to treat URLs containing this as uncacheable.
Generate valid Last-Modified and Expires headers.
Handle If-Modified-Since requests.
Use applet and scripting technologies such as Javascript or Java
instead of CGI for form validation, where feasible.
If you use cookies, try to restrict them to the portions of your
server where they're essential, since objects returned with a
Set-Cookie header are commonly treated as uncacheable. Be aware
that cookies may not interact well with proxy cache servers.
Try not to parse the HTTP User-Agent header to select
browser-specific capabilities, since the cached HTML will be
browser-specific, and may be returned to a browser which doesn't
know what to do with it. Use features like instead.
Don't use server-side includes unless your server can send the
Last-Modified HTTP header with them.
Don't use redirects, since their results may be uncacheable.
Try to keep the size and complexity of pages on secure servers
to a minimum, since secure HTTP requests are not cached in proxy
caches and may not be cached in many browsers. Try to avoid
using secure servers for general pages where feasible.
Don't set the objects your server returns to expire immediately,
or at some time in the recent past, unless you want to be held
up to public ridicule!
Don't use content-negotiation until HTTP 1.1 is more widely
deployed, since in HTTP/1.0 it interacts badly with proxy caches.
Don't specify port 80 in the URL, e.g. when generating URLs
programmatically.
Don't use server modules or scripts to convert a document's character
set on the server side. Leave it to the client.
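To illustrate the QUERY_STRING tip, a script can carry its parameters
in the URL path (recovered from the CGI PATH_INFO variable), so the
URL contains no "?". A small Python sketch, with invented URLs and
field names:

```python
# Rewriting a query-string URL into a cacheable path-style URL:
# /cgi-bin/map?city=leeds  ->  /map/leeds   (hypothetical names)

def path_style(script, params):
    """Encode parameter values into the URL path, in a fixed order."""
    return script + "/" + "/".join(params[k] for k in sorted(params))

def parse_path_info(path_info, keys):
    """Recover parameters from PATH_INFO: '/leeds' -> {'city': 'leeds'}."""
    values = path_info.lstrip("/").split("/")
    return dict(zip(sorted(keys), values))
```

Because the resulting URL contains neither "?" nor "cgi-bin", proxy
caches have no heuristic reason to refuse to cache the response.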
3.4 Developers of stand-alone applications
Implement proxy support.
Give users of your application the ability to configure
proxying, preferably allowing for a different proxy server and
port number on a protocol by protocol basis, and allowing for
some Internet domains and/or IP addresses to be exempted from
the proxy configuration.
Make use of user/admin configured preferences for HTTP proxying
which may already have been set up before your application is
installed, where these are available.
Ideally any new URL protocol schemes, such as "urn:", should be
passed to an HTTP proxy server, making it possible to support
new protocols without having to upgrade individual software
installations.
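The configuration advice above might look like this in practice. A
Python sketch, with an invented proxy host and exemption list, in
which unrecognised URL schemes fall back to the HTTP proxy as
suggested:

```python
# Per-protocol proxy selection with domain exemptions. A real
# application would read PROXIES and NO_PROXY from user or system
# configuration (e.g. the http_proxy environment variable) rather
# than hard-coding them as here.

PROXIES = {                    # one (host, port) per protocol
    "http": ("proxy.example.org", 3128),
    "ftp": ("proxy.example.org", 3128),
}
NO_PROXY = ["localhost", ".example.com"]   # contacted directly

def proxy_for(scheme, host):
    """Return the (host, port) of the proxy to use, or None for direct."""
    for pattern in NO_PROXY:
        if host == pattern or (pattern.startswith(".")
                               and host.endswith(pattern)):
            return None        # exempted domain: no proxy
    # Unrecognised schemes fall back to the HTTP proxy, so new
    # protocols can work without upgrading the client.
    return PROXIES.get(scheme, PROXIES.get("http"))
```

Keeping this logic in one place makes it easy to honour preferences
that were configured before the application was installed.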
4. Security considerations
Cachebusting is clearly justified in those cases where the use of
caching has, in itself, security and privacy implications. The end
user has no way of knowing what information - e.g. bank account or
credit card numbers - is being logged, or where it will end up.
Proxy servers tend to subvert firewalls and access controls based on
IP addresses and/or domain names.
Proxy servers can be useful as a central mechanism for laundering
incoming WWW traffic to (for example) remove or block offensive
material, or to check applications and applets being downloaded for
problems such as viruses and denial of service attacks.
5. Acknowledgements
Thanks to Duane Wessels, Vinod Valloppilli, George Michaelson, Donald
Neal, Ernst Heiri, Wojtek Sylwestrzak, Alan J. Flavell and Jens-S
Voeckler for their contributions to this document.
6. References
[1] A. Luotonen and K. Altis, "World-Wide Web proxies", In
WWW94 Conference Proceedings (Elsevier), 1994.
[2] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T.
Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
RFC 2068 (Proposed Standard), January 1997.
[3] D. Wessels, K. Claffy, "Internet Cache Protocol (ICP),
version 2", RFC 2186 (Informational), September 1997.
[4] D. Wessels, K. Claffy. "Application of Internet Cache
Protocol (ICP), version 2", RFC 2187 (Informational),
September 1997.
[5] K. Claffy, "NLANR Caching Workshop Report", June 1997.
[6] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
Retransmit, and Fast Recovery Algorithms", RFC 2001
(Proposed Standard), January 1997.
[7] D. Mills, "Network Time Protocol (Version 3)", RFC 1305
(Proposed Standard), March 1992.
[8] J. Seidman, "A Proposed Extension to HTML: Client-Side
Image Maps", RFC 1980 (Informational), August 1996.
7. Authors' addresses
Martin Hamilton
Department of Computer Studies
Loughborough University of Technology
Leics. LE11 3TU, UK
Email: m.t.hamilton@lut.ac.uk
Andrew Daviel
Vancouver Webpages
Box 357, 185-9040 Blundell Road
Richmond, BC V6Y1K3, CA
Email: andrew@vancouver-webpages.com
This Internet Draft expires August 1998.