2 TF-CACHE Martin Hamilton
3 INTERNET-DRAFT Loughborough University
4 Andrew Daviel
5 Vancouver Webpages
6 February 1998
8 Cachebusting - cause and prevention
10 draft-hamilton-cachebusting-00.txt
12 Status of This Memo
14 This document is an Internet-Draft. Internet-Drafts are working
15 documents of the Internet Engineering Task Force (IETF), its
16 areas, and its working groups. Note that other groups may also
17 distribute working documents as Internet-Drafts.
19 Internet-Drafts are draft documents valid for a maximum of six
20 months and may be updated, replaced, or obsoleted by other
21 documents at any time. It is inappropriate to use Internet-Drafts
22 as reference material or to cite them other than as ``work in
23 progress.''
25 To learn the current status of any Internet-Draft, please check
26 the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
27 Shadow Directories on ds.internic.net (US East Coast),
28 nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
29 munnari.oz.au (Pacific Rim).
31 Distribution of this memo is unlimited. Editorial comments should
32 be sent directly to the author. Technical discussion will take
33 place on the mailing list of the TERENA Web Caching Task Force -
34 TF-CACHE. For more information see
35 .
37 This Internet Draft expires August 1998.
39 Abstract
41 Cachebusting is the sometimes deliberate, sometimes inadvertent,
42 practice of defeating caching. This document explains the nature of
43 the problem with relation to proxy cache servers using the World-Wide
44 Web's HTTP protocol, and outlines some simple measures which may be
45 taken to make an HTTP based service more "cache friendly". Since Web
46 caching is still a novel concept, we also explain the basic
47 principles behind it. This document should be read by developers of
48 HTTP based products and services - we assume that the reader is
49 already familiar with HTTP.
51 1. The rationale for Web Caching
53 Caching is a technique widely used in both computer systems hardware
54 and software to improve performance and work around bottlenecks.
55 General examples include physical memory devoted to caching transient
56 data on disk drives and controllers, and operating system features
57 such as directory name lookup cache. Web Caching operates at a
58 higher level often referred to as "middleware". This typically
59 implies caching of transient WWW objects by the end user's Web
60 browser, or using a separate "proxy cache" server which sits between
61 the end user's browser and the "origin server" which they are trying
62 to contact. Figure 1 illustrates this relationship.
64       +---------+             +---------+             +---------+
65       |  End    | ----------> |  Proxy  | ----------> | Origin  |
66       | user's  |    HTTP     |  cache  | HTTP/FTP/.. |         |
67       | browser | <---------- | server  | <---------- | server  |
68       +---------+             +---------+             +---------+
70 Figure 1 - a simple proxy cache configuration
72 Proxy cache servers typically speak HTTP [1,2] to the end user's WWW
73 browser, and a variety of protocols to the origin servers. In
74 addition to caching WWW objects, they may also elect to cache other
75 information such as reachability metrics (when choosing between
76 multiple origin servers) and the results of domain name lookups.
77 Recent developments have focussed on linking proxy cache servers
78 together so as to pool their storage capacity - typically using the
79 Internet Cache Protocol [3]. This is discussed further in [4].
81 Proxy caches offer additional functionality above and beyond the WWW
82 browser's own built-in cache, since cached objects may be shared with
83 the entire population of users and with cooperating proxy cache
84 servers. By contrast - browser caches are typically private to the
85 individual, or can only be shared with those browsers which have
86 access to the filesystem on which the cached objects are found.
87 Figure 2 illustrates the operation of the proxy cache server in the
88 case that the requested WWW object (usually identified by its URL, or
89 the URL plus the HTTP request headers sent by the WWW browser) has
90 already been cached.
92       +---------+             +---------+             +---------+
93       |  End    | ----------> |  Proxy  | < No need > | Origin  |
94       | user's  |    HTTP     |  cache  | <   to    > |         |
95       | browser | <---------- | server  | < contact > | server  |
96       +---------+             +---------+             +---------+
98 Figure 2 - fetching a cached object
100 A cache's effectiveness is usually measured in terms of its "hit
101 rate" - the proportion of requests which can be satisfied using
102 cached objects. The goal of the cache administrator is to make this
103 figure as high as possible, without serving a significant volume of
104 stale material to the cache's users.
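As a rough illustration of the hit rate calculation, the following
Python sketch counts hits in a request log. The log format (a list of
URL and hit/miss pairs) and the example.org URLs are assumptions made
for this example only; real proxy cache logs carry more detail.

```python
def hit_rate(log_entries):
    """Return the fraction of requests that were served from the cache.

    Each entry is a (url, served_from_cache) pair. This log format is
    hypothetical and exists only to illustrate the calculation.
    """
    entries = list(log_entries)
    if not entries:
        return 0.0
    hits = sum(1 for _url, from_cache in entries if from_cache)
    return hits / len(entries)


# Hypothetical sample: three of the five requests were cache hits.
sample_log = [
    ("http://www.example.org/", True),
    ("http://www.example.org/logo.gif", True),
    ("http://www.example.org/news.html", False),
    ("http://www.example.org/logo.gif", True),
    ("http://www.example.org/cgi-bin/search?q=x", False),
]

print(f"hit rate: {hit_rate(sample_log):.0%}")
```

Note that this counts requests rather than bytes, so a cache which
serves many small objects can show a high hit rate while saving
comparatively little bandwidth.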
106 Cache hit rates of 40% to 50% for WWW related traffic are common -
107 see [5] for example. Caching also helps to make more effective use
108 of the available bandwidth by allowing TCP congestion control
109 algorithms to work properly - conventional HTTP traffic takes the
110 form of a very large number of short lived TCP connections, which
111 often defeats TCP "slow-start" [6] on busy lines.
113 It follows that proxy caching should be highly attractive to Internet
114 Service Providers and organisations which buy connectivity from them,
115 on a cost/benefit basis. Cache hits are typically delivered an order
116 of magnitude faster than cache misses, since the objects requested do
117 not have to be fetched from the origin server. This means that a
118 site which encourages caching can provide the end user with a much
119 higher perceived quality of service whilst at the same time getting
120 better value for money from their leased line(s).
122 The World-Wide Web community is standardising a new version of HTTP -
123 1.1 - which specifically addresses a number of caching issues. At
124 the time of writing, this had yet to be widely deployed, and the
125 specification was still being developed. In this document we only
126 discuss the best of current practice.
128 2. The cachebusting problem
130 Support in the HTTP protocol and its implementations for proxies and
131 caching is something which has essentially been retro-fitted. As a
132 result, there are many common practices which are incompatible with
133 it, and either defeat caching completely or reduce the benefits which
134 derive from it. This is primarily an educational issue involving
135 developers of HTTP based services and systems.
137 Caching at the HTTP level can cause problems for services which make
138 heavy use of usage statistics - e.g. to provide "hit counts" for
139 advertisers. Users of cached copies of an object are effectively
140 invisible to the provider of the original service. This may provide
141 a strong motivation to defeat caching.
143 There is also the case that a product comes with an out-of-the-box
144 configuration which defeats caching, perhaps unintentionally on the
145 part of the vendor or its developers. If the product works for most
146 users with few if any modifications to the default settings, there
147 will be no incentive to dig deeper into its configuration
148 possibilities.
150 3. How to be friendly to proxy cache servers
152 We will go on to outline some simple measures which the developers of
153 HTTP based systems and services can take to make their products more
154 cache-friendly.
156 3.1 Tips for HTTP server administrators
158 Use a server which supports HTTP 1.1 - this has a number of
159 additional features to support caching.
161 Send the Expires header on documents and images where feasible
162 - this will help caches to decide when your objects are stale.
164 Use an HTTP server which supports the GET method with the
165 If-Modified-Since header - this will help browsers and proxy
166 caches to figure out whether their cached copy of a file is
167 out of date.
169 Ensure that the time is set correctly on the server machine, e.g.
170 via NTP [7], so that the timestamp information carried in the
171 HTTP headers makes sense.
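The header handling described in these tips might be sketched in
Python as follows. This is an illustration under stated assumptions,
not a server implementation: the function names cache_headers and
status_for, and the one hour default lifetime, are invented for the
example. All timestamps are assumed to be timezone-aware UTC values,
which is why a correctly set server clock matters.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def cache_headers(last_modified, now, lifetime=timedelta(hours=1)):
    """Build Date, Last-Modified and Expires headers as HTTP dates.

    Both datetimes must be timezone-aware UTC values. The one hour
    default lifetime is an arbitrary choice for this example.
    """
    return {
        "Date": format_datetime(now, usegmt=True),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }


def status_for(if_modified_since, last_modified):
    """Return 304 if the requester's copy is still current, else 200."""
    if if_modified_since is None:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: send the full object
    if client_time.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


modified = datetime(1998, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
current = datetime(1998, 2, 2, 9, 30, 0, tzinfo=timezone.utc)
print(cache_headers(modified, current))
print(status_for("Sun, 01 Feb 1998 12:00:00 GMT", modified))
```

A cache holding a copy dated on or after the object's modification
time gets the short 304 response and can go on serving its copy; any
other request receives the full object with fresh headers.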
173 3.2 Tips for content providers (e.g. HTML authors)
175 Encourage the sharing of links to common graphics and applets, so
176 that only one URL is used for a given object.
178 Use client-side imagemaps (USEMAP - [8]) where feasible, since
179 server-side imagemaps generate HTTP Redirects which are typically
180 uncacheable.
182 Use trailing slashes (/) for directory names to avoid extra
183 redirects.
185 Where you are using a file which is returned when the directory
186 name is requested (typically index.html or index.htm) "./" can
187 usually be written instead of referring to the file by name.
189 Try to use a single name for a server in the hostname part of the
190 URL in the HTML which you create.
192 Don't rename files to age them - give them unique names in the
193 first place and update the links which point to them.
195 Use the Internet domain name in the host component of the URLs you
196 create, rather than the host's IP address.
198 If you really want to count every access to a given page, embed a
199 tiny non-cacheable image into it. This will give you an access
200 count for the page without requiring the whole thing to be
201 downloaded again by each user of a given proxy cache.
203 3.3 Dynamic content (e.g. CGI) developers
205 Make results cacheable where practical :-
206    Use GET instead of POST for simple queries, since POST results
207      aren't cached.
208    Use the path component of the URL to pass information instead of
209      QUERY_STRING - caches may treat objects with a ? in their URL
210      as uncacheable.
211    Use a directory name other than "cgi-bin", since caches can be
212      expected to treat URLs containing this as uncacheable.
213    Generate valid Last-Modified and Expires headers.
214    Handle If-Modified-Since requests.
216 Use applet and scripting technologies such as JavaScript or Java
217 instead of CGI for form validation, where feasible.
219 If you use cookies, try to restrict them to the portions of your
220 server where they're essential, since objects returned with a
221 Set-Cookie header are commonly treated as uncacheable. Be aware
222 that cookies may not interact well with proxy cache servers.
224 Try not to parse the HTTP User-Agent header to select browser
225 specific capabilities, since the cached HTML will be browser
226 specific, and may be returned to a browser which doesn't know
227 what to do with it. Use features like instead.
229 Don't use server-side includes unless your server can send the
230 Last-Modified HTTP header with them.
232 Don't use redirects, since their results may be uncacheable.
234 Try to keep the size and complexity of pages on secure servers
235 to a minimum, since secure HTTP requests are not cached in proxy
236 caches and may not be cached in many browsers. Try to avoid
237 using secure servers for general pages where feasible.
239 Don't set the objects your server returns to expire immediately,
240 or at some time in the recent past, unless you want to be held
241 up to public ridicule!
243 Don't use content-negotiation until HTTP 1.1 is more widely
244 deployed, since in HTTP/1.0 it interacts badly with proxy caches.
246 Don't specify port 80 in the URL, e.g. when generating URLs
247 programmatically.
249 Don't use server modules or scripts to convert a document's character
250 set on the server side. Leave it to the client.
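Several of the URL related tips above lend themselves to a mechanical
check. The Python sketch below flags URLs which a cache might treat
as uncacheable or store under more than one name. The function name
cache_warnings and the wording of its messages are assumptions made
for this example, and it only covers the checks which can be made by
looking at the URL alone.

```python
import ipaddress
from urllib.parse import urlsplit


def cache_warnings(url):
    """Return a list of reasons why a cache might handle this URL badly."""
    parts = urlsplit(url)
    warnings = []
    if "?" in url:
        warnings.append("query string present")
    if "cgi-bin" in parts.path.split("/"):
        warnings.append("path contains cgi-bin")
    if parts.scheme == "http" and parts.port == 80:
        warnings.append("default port 80 given explicitly")
    try:
        ipaddress.ip_address(parts.hostname or "")
        warnings.append("IP address used instead of domain name")
    except ValueError:
        pass  # not an IP address, so presumably a domain name
    return warnings


for candidate in (
    "http://www.example.org/reports/1998/02/",
    "http://192.0.2.1:80/cgi-bin/search?q=x",
):
    print(candidate, cache_warnings(candidate))
```

A check of this kind could be run over generated links before they
are published. It cannot detect problems which depend on server
behaviour, such as missing headers or redirects.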
252 3.4 Developers of stand-alone applications
254 Implement proxy support.
256 Give users of your application the ability to configure
257 proxying, preferably allowing for a different proxy server and
258 port number on a protocol by protocol basis, and allowing for
259 some Internet domains and/or IP addresses to be exempted from
260 the proxy configuration.
262 Make use of user/admin configured preferences for HTTP proxying
263 which may already have been set up before your application is
264 installed, where these are available.
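The per-protocol proxy settings and exemptions described above might
be represented as in the following Python sketch. The function name
choose_proxy, the proxy host cache.example.org, its port and the
exempted domain are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def choose_proxy(url, proxies, exempt_domains=()):
    """Pick the proxy for a URL, or None for a direct connection.

    proxies maps a URL scheme to a proxy URL, so that each protocol
    can use a different proxy server and port. Hosts within any of
    the exempt domains bypass the proxy entirely.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for domain in exempt_domains:
        domain = domain.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return None
    return proxies.get(parts.scheme)


# Hypothetical configuration: one proxy for HTTP, another for FTP.
PROXIES = {
    "http": "http://cache.example.org:3128",
    "ftp": "http://cache.example.org:3129",
}
EXEMPT = ["intranet.example.org"]

for target in (
    "http://www.example.com/",
    "ftp://ftp.example.com/pub/",
    "http://wiki.intranet.example.org/",
):
    print(target, "->", choose_proxy(target, PROXIES, EXEMPT))
```

To pick up preferences which have already been configured, a Python
application could seed the mapping from the standard library's
urllib.request.getproxies(), which returns a dictionary of the same
shape.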
266 Ideally any new URL protocol schemes, such as "urn:", should be
267 passed to an HTTP proxy server, making it possible to support
268 new protocols without having to upgrade individual software
269 installations.
271 4. Security considerations
273 Cachebusting is clearly justified in those cases where the use of
274 caching has, in itself, security and privacy implications. The end
275 user has no way of knowing what information is being logged, or where
276 it will end up - e.g. bank account or credit card numbers.
278 Proxy servers tend to subvert firewalls and access controls based on
279 IP addresses and/or domain names.
281 Proxy servers can be useful as a central mechanism for laundering
282 incoming WWW traffic to (for example) remove or block offensive
283 material, or to check applications and applets being downloaded for
284 problems such as viruses and denial of service attacks.
286 5. Acknowledgements
288 Thanks to Duane Wessels, Vinod Valloppilli, George Michaelson, Donald
289 Neal, Ernst Heiri, Wojtek Sylwestrzak, Alan J. Flavell and Jens-S
290 Voeckler for their contributions to this document.
292 6. References
294 [1] A. Luotonen and K. Altis, "World-Wide Web proxies", In
295 WWW94 Conference Proceedings (Elsevier), 1994.
297 [2] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T.
298 Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
299 RFC 2068 (Proposed Standard), 01/03/1997.
301 [3] D. Wessels, K. Claffy, "Internet Cache Protocol (ICP),
302 version 2", RFC 2186 (Informational), September 1997.
304 [4] D. Wessels, K. Claffy, "Application of Internet Cache
305 Protocol (ICP), version 2", RFC 2187 (Informational),
306 September 1997.
308 [5] K. Claffy, "NLANR Caching Workshop Report", June 1997.
309
311 [6] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
312 Retransmit, and Fast Recovery Algorithms", RFC 2001
313 (Proposed Standard), 01/24/1997.
315 [7] D. Mills, "Network Time Protocol (v3)", RFC 1305
316 (Proposed Standard), 04/09/1992.
318 [8] J. Seidman, "A Proposed Extension to HTML: Client-Side
319 Image Maps", RFC 1980 (Informational), 08/14/1996.
321 7. Authors' addresses
323 Martin Hamilton
324 Department of Computer Studies
325 Loughborough University of Technology
326 Leics. LE11 3TU, UK
328 Email: m.t.hamilton@lut.ac.uk
330 Andrew Daviel
331 Vancouver Webpages
332 Box 357, 185-9040 Blundell Road
333 Richmond, BC V6Y1K3, CA
335 Email: andrew@vancouver-webpages.com