idnits 2.17.1 

draft-hamilton-cachebusting-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 2 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 1998) is 9566 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 2068 (ref. '2') (Obsoleted by RFC 2616)

  ** Downref: Normative reference to an Informational RFC: RFC 2186 (ref. '3')

  ** Downref: Normative reference to an Informational RFC: RFC 2187 (ref. '4')

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  ** Obsolete normative reference: RFC 2001 (ref. '6') (Obsoleted by RFC 2581)

  ** Obsolete normative reference: RFC 1305 (ref. '7') (Obsoleted by RFC 5905)

  ** Obsolete normative reference: RFC 1980 (ref. '8') (Obsoleted by RFC 2854)


     Summary: 16 errors (**), 0 flaws (~~), 1 warning (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TF-CACHE                                                 Martin Hamilton
3	INTERNET-DRAFT                                   Loughborough University
4	                                                           Andrew Daviel
5	                                                      Vancouver Webpages
6	                                                           February 1998

8	                  Cachebusting - cause and prevention

10	                   draft-hamilton-cachebusting-00.txt

12	                          Status of This Memo

14	      This document is an Internet-Draft.  Internet-Drafts are working
15	      documents of the Internet Engineering Task Force (IETF), its
16	      areas, and its working groups.  Note that other groups may also
17	      distribute working documents as Internet-Drafts.

19	      Internet-Drafts are draft documents valid for a maximum of six
20	      months and may be updated, replaced, or obsoleted by other
21	      documents at any time.  It is inappropriate to use Internet-Drafts
22	      as reference material or to cite them other than as ``work in
23	      progress.''

25	      To learn the current status of any Internet-Draft, please check
26	      the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
27	      Shadow Directories on ds.internic.net (US East Coast),
28	      nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
29	      munnari.oz.au (Pacific Rim).

31	      Distribution of this memo is unlimited.  Editorial comments should
32	      be sent directly to the author.  Technical discussion will take
33	      place on the mailing list of the TERENA Web Caching Task Force -
34	      TF-CACHE.  For more information see
35	      <URL:http://www.terena.nl/task-forces/tf-cache/>.

37	      This Internet Draft expires August 1998.

39	Abstract

41	   Cachebusting is the sometimes deliberate, sometimes inadvertent,
42	   practice of defeating caching.  This document explains the nature of
43	   the problem with relation to proxy cache servers using the World-Wide
44	   Web's HTTP protocol, and outlines some simple measures which may be
45	   taken to make an HTTP based service more "cache friendly".  Since Web
46	   caching is still a novel concept, we also explain the basic
47	   principles behind it.  This document should be read by developers of
48	   HTTP based products and services - we assume that the reader is
49	   already familiar with HTTP.

51	1. The rationale for Web Caching

53	   Caching is a technique widely used in both computer systems hardware
54	   and software to improve performance and work around bottlenecks.
55	   General examples include physical memory devoted to caching transient
56	   data on disk drives and controllers, and operating system features
57	   such as the directory name lookup cache.  Web Caching operates at a
58	   higher level often referred to as "middleware".  This typically
59	   implies caching of transient WWW objects by the end user's Web
60	   browser, or using a separate "proxy cache" server which sits between
61	   the end user's browser and the "origin server" which they are trying
62	   to contact.  Figure 1 illustrates this relationship.

64	      +---------+             +---------+             +---------+
65	      | End     | ----------> | Proxy   | ----------> | Origin  |
66	      | user's  |    HTTP     | cache   | HTTP/FTP/.. |         |
67	      | browser | <---------- | server  | <---------- | server  |
68	      +---------+             +---------+             +---------+

70	             Figure 1 - a simple proxy cache configuration

72	   Proxy cache servers typically speak HTTP [1,2] to the end user's WWW
73	   browser, and a variety of protocols to the origin servers.  In
74	   addition to caching WWW objects, they may also elect to cache other
75	   information such as reachability metrics (when choosing between
76	   multiple origin servers) and the results of domain name lookups.
77	   Recent developments have focussed on linking proxy cache servers
78	   together so as to pool their storage capacity - typically using the
79	   Internet Cache Protocol [3].  This is discussed further in [4].

81	   Proxy caches offer additional functionality above and beyond the WWW
82	   browser's own built-in cache, since cached objects may be shared with
83	   the entire population of users and with cooperating proxy cache
84	   servers.  By contrast, browser caches are typically private to the
85	   individual, or can only be shared with those browsers which have
86	   access to the filesystem on which the cached objects are found.
87	   Figure 2 illustrates the operation of the proxy cache server in the
88	   case that the requested WWW object (usually identified by its URL, or
89	   the URL plus the HTTP request headers sent by the WWW browser) has
90	   already been cached.

92	      +---------+             +---------+             +---------+
93	      | End     | ----------> | Proxy   | < No need > | Origin  |
94	      | user's  |    HTTP     | cache   | <   to    > |         |
95	      | browser | <---------- | server  | < contact > | server  |
96	      +---------+             +---------+             +---------+

98	                   Figure 2 - fetching a cached object

100	   A cache's effectiveness is usually measured in terms of its "hit
101	   rate" - the fraction of requests which can be satisfied using cached
102	   objects.  The goal of the cache administrator is to make this figure
103	   as high as possible, without serving a significant volume of stale
104	   material to the cache's users.
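
   The behaviour shown in Figures 1 and 2, and the hit rate measure,
   can be sketched with a toy in-memory cache.  This is only an
   illustration: the class name ToyProxyCache and the fetch_from_origin
   callback are hypothetical, objects are keyed by URL alone, and no
   attempt is made to detect stale objects.

```python
from typing import Callable, Dict


class ToyProxyCache:
    """Toy in-memory proxy cache keyed by URL (no staleness checks)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1                   # Figure 2: origin not contacted
            return self._store[url]
        self.misses += 1                     # Figure 1: fetch, then keep a copy
        body = self._fetch(url)
        self._store[url] = body
        return body

    def hit_rate(self) -> float:
        """Fraction of requests so far that were served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

   Four requests of which two repeat an earlier URL give a hit rate of
   0.5, and the origin server is contacted only twice.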

106	   Cache hit rates of 40% to 50% for WWW related traffic are common
107	   (see, for example, [5]).  Caching also helps to make more effective
108	   use of the available bandwidth by allowing TCP congestion control
109	   algorithms to work properly - conventional HTTP traffic takes the
110	   form of a very large number of short lived TCP connections, which
111	   often defeats TCP "slow-start" [6] on busy lines.

113	   It follows that proxy caching should be highly attractive to Internet
114	   Service Providers and organisations which buy connectivity from them,
115	   on a cost/benefit basis.  Cache hits are typically delivered an order
116	   of magnitude faster than cache misses, since the objects requested do
117	   not have to be fetched from the origin server.  This means that a
118	   site which encourages caching can provide the end user with a much
119	   higher perceived quality of service whilst at the same time getting
120	   better value for money from their leased line(s).

122	   The World-Wide Web community is standardising a new version of HTTP -
123	   1.1 - which specifically addresses a number of caching issues.  At
124	   the time of writing, this had yet to be widely deployed, and the
125	   specification was still being developed.  In this document we only
126	   discuss the best of current practice.

128	2. The cachebusting problem

130	   Support in the HTTP protocol and its implementations for proxies and
131	   caching is something which has essentially been retro-fitted.  As a
132	   result, there are many common practices which are incompatible with
133	   it, and either defeat caching completely or reduce the benefits which
134	   derive from it.  This is primarily an educational issue involving
135	   developers of HTTP based services and systems.

137	   Caching at the HTTP level can cause problems for services which make
138	   heavy use of usage statistics - e.g. to provide "hit counts" for
139	   advertisers.  Users of cached copies of an object are effectively
140	   invisible to the provider of the original service.  This may provide
141	   a strong motivation to defeat caching.

143	   There is also the case that a product comes with an out-of-the-box
144	   configuration which defeats caching, perhaps unintentionally on the
145	   part of the vendor or its developers.  If the product works for most
146	   users with few if any modifications to the default settings, there
147	   will be no incentive to dig deeper into its configuration
148	   possibilities.

150	3. How to be friendly to proxy cache servers

152	   We will go on to outline some simple measures which the developers
153	   of HTTP based systems and services can take to make their products
154	   more cache-friendly.

156	3.1 Tips for HTTP server administrators

158	     Use a server which supports HTTP/1.1 - this has a number of
159	       additional features to support caching.

161	     Send the Expires header on documents and images where feasible
162	       - this will help caches to decide when your objects are stale.

164	     Use an HTTP server which supports the GET method with the
165	       If-Modified-Since header - this will help browsers and proxy
166	       caches to figure out whether their cached copy of a file is
167	       out of date.

169	     Ensure that the time is set correctly on the server machine, e.g.
170	       via NTP [7], so that the timestamp information carried in the
171	       HTTP headers makes sense.
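
   The header handling recommended above might look roughly like the
   following sketch.  It is not taken from any particular server: the
   function name conditional_get and the lifetime parameter (a
   freshness period chosen by the administrator) are assumptions made
   for illustration.  Timestamps are seconds since the epoch and are
   only meaningful if the server clock is correct.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_get(
    if_modified_since: Optional[str],
    last_modified: float,
    now: float,
    lifetime: int = 3600,
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET on an object changed at last_modified."""
    headers = {
        "Date": formatdate(now, usegmt=True),
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Expires": formatdate(now + lifetime, usegmt=True),
    }
    if if_modified_since:
        try:
            parsed = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            parsed = None                    # unparseable date: send the object
        if parsed is not None:
            if parsed.tzinfo is None:        # treat zone-less dates as GMT
                parsed = parsed.replace(tzinfo=timezone.utc)
            # HTTP dates have one second resolution, so compare whole seconds.
            if int(last_modified) <= int(parsed.timestamp()):
                return 304, headers          # Not Modified: cached copy is good
    return 200, headers
```

   A cache which revalidates with the Last-Modified value it was given
   receives a 304 and no body; an older or unparseable date results in
   a full 200 response.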

173	3.2 Tips for content providers (e.g. HTML authors)

175	     Encourage the sharing of links to common graphics and applets, so
176	       that only one URL is used for a given object.

178	     Use client-side imagemaps (USEMAP - [8]) where feasible, since
179	       server-side imagemaps generate HTTP Redirects which are typically
180	       uncacheable.

182	     Use trailing slashes (/) for directory names to avoid extra
183	       redirects.

185	     Where you are using a file which is returned when the directory
186	       name is requested (typically index.html or index.htm) "./" can
187	       usually be written instead of referring to the file by name.

189	     Try to use a single name for a server in the hostname part of the
190	       URL in the HTML which you create.

192	     Don't rename files to age them - give them unique names in the
193	       first place and update the links which point to them.

195	     Use the Internet domain name in the host component of the URLs you
196	       create, rather than the host's IP address.

198	     If you really want to count every access to a given page, embed a
199	       tiny non-cacheable image into it.  This will give you an access
200	       count for the page without requiring the whole thing to be
201	       downloaded again by each user of a given proxy cache.
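
   Several of the tips above amount to always spelling a URL the same
   way, since a cache will normally treat two different spellings as
   two different objects.  The sketch below shows one possible
   normalisation for http URLs.  The function name canonical_url and
   the list of index file names are assumptions; dropping an explicit
   default port is included as a further simplification.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_NAMES = ("index.html", "index.htm")    # assumed directory index names


def canonical_url(url: str) -> str:
    """Return one consistent spelling of a simple http URL (sketch only)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()    # host names are case-insensitive
    if parts.port is not None and parts.port != 80:
        host = f"{host}:{parts.port}"        # keep only non-default ports
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail in INDEX_NAMES:
        path = head + "/"                    # name the directory, not the file
    # Any user information in the URL is dropped by this sketch.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, parts.fragment))
```

   For example, "HTTP://WWW.Example.ORG:80/docs/index.html" and
   "http://www.example.org/docs/" both normalise to the latter.  The
   sketch cannot tell whether a path without a trailing slash names a
   directory, nor replace an IP address with a domain name; both need
   knowledge of the server.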

203	3.3 Dynamic content (e.g. CGI) developers

205	     Make results cacheable where practical :-
206	       Use GET instead of POST for simple queries, since POST results
207	         aren't cached.
208	       Use the path component of the URL to pass information instead of
209	         QUERY_STRING - caches may treat objects with a ? in their URL
210	         as uncacheable.
211	       Use a directory name other than "cgi-bin", since caches can be
212	         expected to treat URLs containing this as uncacheable.
213	       Generate valid Last-Modified and Expires headers.
214	       Handle If-Modified-Since requests.

216	     Use applet and scripting technologies such as Javascript or Java
217	       instead of CGI for form validation, where feasible.

219	     If you use cookies, try to restrict them to the portions of your
220	       server where they're essential, since objects returned with a
221	       Set-Cookie header are commonly treated as uncacheable.  Be aware
222	       that cookies may not interact well with proxy cache servers.

224	     Try not to parse the User-Agent header (HTTP_USER_AGENT in CGI) to
225	       select browser specific capabilities, since the cached HTML will
226	       be browser specific and may be returned to a browser which can't
227	       handle it.  Use features like <NOFRAMES> instead.

229	     Don't use server-side includes unless your server can send the
230	       Last-Modified HTTP header with them.

232	     Don't use redirects, since their results may be uncacheable.

234	     Try to keep the size and complexity of pages on secure servers
235	       to a minimum, since secure HTTP requests are not cached in proxy
236	       caches and may not be cached in many browsers. Try to avoid
237	       using secure servers for general pages where feasible.

239	     Don't set the objects your server returns to expire immediately,
240	       or at some time in the recent past, unless you want to be held
241	       up to public ridicule!

243	     Don't use content-negotiation until HTTP/1.1 is more widely
244	       deployed, since in HTTP/1.0 it interacts badly with proxy caches.

246	     Don't specify port 80 in the URL, e.g. when generating URLs
247	       programmatically.

249	     Don't use server modules or scripts to convert a document's
250	       character set on the server side.  Leave it to the client.
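
   To see how a response might fare, a developer could apply the rules
   of thumb from this section as a rough self-check.  The function
   below is a hypothetical sketch which simply restates those rules; it
   is not the policy of any particular proxy cache.  The final rule,
   which requires a Last-Modified or Expires header, is an assumption
   about a cautious cache.

```python
from typing import Mapping
from urllib.parse import urlsplit


def likely_cacheable(method: str, url: str, response_headers: Mapping[str, str]) -> bool:
    """Guess whether a cautious proxy cache would store this response."""
    names = {name.lower() for name in response_headers}
    parts = urlsplit(url)
    if method.upper() != "GET":
        return False                 # POST results aren't cached
    if parts.scheme.lower() == "https":
        return False                 # secure requests bypass proxy caches
    if "?" in url:
        return False                 # query strings may be treated as dynamic
    if "cgi-bin" in parts.path:
        return False                 # conventional marker of dynamic content
    if "set-cookie" in names:
        return False                 # commonly treated as uncacheable
    # Assumption: with neither header the cache has too little to go on.
    return "last-modified" in names or "expires" in names
```

   Under these rules a GET for "http://www.example.org/search/cats"
   which returns a Last-Modified header passes, whereas the same query
   written as "/search?q=cats" or served from "/cgi-bin/" does not.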

252	3.4 Developers of stand-alone applications

254	     Implement proxy support.

256	     Give users of your application the ability to configure
257	       proxying, preferably allowing for a different proxy server and
258	       port number on a protocol by protocol basis, and allowing for
259	       some Internet domains and/or IP addresses to be exempted from
260	       the proxy configuration.

262	     Make use of user/admin configured preferences for HTTP proxying
263	       which may already have been set up before your application is
264	       installed, where these are available.

266	     Ideally any new URL protocol schemes, such as "urn:", should be
267	       passed to an HTTP proxy server, making it possible to support
268	       new protocols without having to upgrade individual software
269	       installations.
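
   The configuration model described above, with a proxy per protocol
   and a list of exemptions, can be sketched as follows.  The class
   ProxySettings and its method proxy_for are hypothetical names, and
   the matching of exempt domains is deliberately simple.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


class ProxySettings:
    """Hypothetical per-application proxy preferences."""

    def __init__(self, proxies: Dict[str, str], no_proxy: Optional[List[str]] = None) -> None:
        self.proxies = dict(proxies)         # protocol scheme -> proxy address
        self.no_proxy = list(no_proxy or [])  # exempt domains or IP addresses

    def proxy_for(self, url: str) -> Optional[str]:
        """Return the proxy to use for url, or None to connect directly."""
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        for entry in self.no_proxy:
            entry = entry.lower().lstrip(".")
            if host == entry or host.endswith("." + entry):
                return None                  # exempt host or domain
        return self.proxies.get(parts.scheme.lower())
```

   In Python, preferences which the user or administrator has already
   set up can be read with the standard library function
   urllib.request.getproxies(), whose result could be passed as the
   proxies argument here.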

271	4. Security considerations

273	   Cachebusting is clearly justified in those cases where the use of
274	   caching has, in itself, security and privacy implications.  The end
275	   user cannot know what information (e.g. bank account or credit card
276	   numbers) is being logged, or where it will end up.

278	   Proxy servers tend to subvert firewalls and access controls based on
279	   IP addresses and/or domain names.

281	   Proxy servers can be useful as a central mechanism for laundering
282	   incoming WWW traffic to (for example) remove or block offensive
283	   material, or to check applications and applets being downloaded for
284	   problems such as viruses and denial of service attacks.

286	5. Acknowledgements

288	   Thanks to Duane Wessels, Vinod Valloppilli, George Michaelson, Donald
289	   Neal, Ernst Heiri, Wojtek Sylwestrzak, Alan J. Flavell and Jens-S
290	   Voeckler for their contributions to this document.

292	6. References

294	   [1]         A. Luotonen and K. Altis, "World-Wide Web proxies", In
295	               WWW94 Conference Proceedings (Elsevier), 1994.

297	   [2]         R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T.
298	               Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
299	               RFC 2068 (Proposed Standard), 01/03/1997.

301	   [3]         D. Wessels, K. Claffy, "Internet Cache Protocol (ICP),
302	               version 2", RFC 2186 (Informational), September 1997.

304	   [4]         D. Wessels, K. Claffy, "Application of Internet Cache
305	               Protocol (ICP), version 2", RFC 2187 (Informational),
306	               September 1997.

308	   [5]         K. Claffy, "NLANR Caching Workshop Report", June 1997.
309	               <URL:http://ircache.nlanr.net/Cache/Workshop97/minutes.html>

311	   [6]         W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
312	               Retransmit, and Fast Recovery Algorithms", RFC 2001
313	               (Proposed Standard), 01/24/1997.

315	   [7]         D. Mills, "Network Time Protocol (v3)", RFC 1305
316	               (Proposed Standard), 04/09/1992.

318	   [8]         J. Seidman, "A Proposed Extension to HTML: Client-Side
319	               Image Maps", RFC 1980 (Informational), 08/14/1996.

321	7. Authors' addresses

323	   Martin Hamilton
324	   Department of Computer Studies
325	   Loughborough University of Technology
326	   Leics. LE11 3TU, UK

328	   Email: m.t.hamilton@lut.ac.uk

330	   Andrew Daviel
331	   Vancouver Webpages
332	   Box 357, 185-9040 Blundell Road
333	   Richmond, BC V6Y1K3, CA

335	   Email: andrew@vancouver-webpages.com

337	                 This Internet Draft expires August 1998.