TF-CACHE                                             Martin Hamilton
INTERNET-DRAFT                                Loughborough University
                                                       Andrew Daviel
                                                  Vancouver Webpages
                                                       February 1998


                 Cachebusting - cause and prevention

                  draft-hamilton-cachebusting-00.txt


Status of This Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ds.internic.net (US East Coast),
   nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim).

   Distribution of this memo is unlimited.  Editorial comments should
   be sent directly to the authors.  Technical discussion will take
   place on the mailing list of the TERENA Web Caching Task Force
   (TF-CACHE).  For more information see
   .

   This Internet Draft expires August 1998.

Abstract

   Cachebusting is the sometimes deliberate, sometimes inadvertent,
   practice of defeating caching.  This document explains the nature
   of the problem as it relates to proxy cache servers using the
   World-Wide Web's HTTP protocol, and outlines some simple measures
   which may be taken to make an HTTP based service more "cache
   friendly".  Since Web caching is still a novel concept, we also
   explain the basic principles behind it.  This document should be
   read by developers of HTTP based products and services; we assume
   that the reader is already familiar with HTTP.

1. The rationale for Web Caching

   Caching is a technique widely used in both computer hardware and
   software to improve performance and work around bottlenecks.
   General examples include physical memory devoted to caching
   transient data on disk drives and controllers, and operating system
   features such as the directory name lookup cache.  Web caching
   operates at a higher level, often referred to as "middleware".
   This typically means caching transient WWW objects either in the
   end user's Web browser, or in a separate "proxy cache" server which
   sits between the end user's browser and the "origin server" which
   the user is trying to contact.  Figure 1 illustrates this
   relationship.

   +---------+             +---------+             +---------+
   |   End   | ----------> |  Proxy  | ----------> | Origin  |
   | user's  |    HTTP     |  cache  | HTTP/FTP/.. |         |
   | browser | <---------- | server  | <---------- | server  |
   +---------+             +---------+             +---------+

           Figure 1 - a simple proxy cache configuration

   Proxy cache servers typically speak HTTP [1,2] to the end user's
   WWW browser, and a variety of protocols to the origin servers.  In
   addition to caching WWW objects, they may also cache other
   information, such as reachability metrics (when choosing between
   multiple origin servers) and the results of domain name lookups.
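   As an illustration of the browser-to-proxy leg in Figure 1: a
   browser configured to use a proxy cache sends the complete absolute
   URL in the HTTP request line, rather than only the path it would
   send to the origin server, so that the proxy knows which origin
   server (and which protocol) to use.  The Python sketch below
   contrasts the two request forms.  The function name build_request
   and the example host name are hypothetical, and only a minimal
   GET request is shown.

```python
from urllib.parse import urlsplit


def build_request(url: str, via_proxy: bool) -> str:
    """Return the text of a minimal HTTP/1.0 GET request for `url`.

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(url)
    if via_proxy:
        # A proxy needs the absolute URL to know which origin server
        # (and which protocol, e.g. http: or ftp:) to contact.
        target = url
    else:
        # The origin server only needs the path and any query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    request_line = f"GET {target} HTTP/1.0"
    # Host is optional in HTTP/1.0 but widely sent by browsers.
    host_header = f"Host: {parts.netloc}"
    # Each line ends in CRLF; an empty line terminates the headers.
    return request_line + "\r\n" + host_header + "\r\n\r\n"


if __name__ == "__main__":
    url = "http://www.example.com/index.html"
    print(build_request(url, via_proxy=False).splitlines()[0])
    # GET /index.html HTTP/1.0
    print(build_request(url, via_proxy=True).splitlines()[0])
    # GET http://www.example.com/index.html HTTP/1.0
```

   Because the proxy receives the whole URL, a browser can also hand
   it an ftp: URL in the same way and let the proxy fetch the object
   over FTP, which is the "HTTP/FTP/.." leg shown in Figure 1.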
   Recent developments have focussed on linking proxy cache servers
   together so as to pool their storage capacity, typically using the
   Internet Cache Protocol (ICP) [3].  This is discussed further in
   [4].

   Proxy caches offer functionality beyond that of the WWW browser's
   own built-in cache, since cached objects may be shared with the
   entire population of users and with cooperating proxy cache
   servers.  By contrast, browser caches are typically private to the
   individual, or can only be shared with those browsers which have
   access to the filesystem on which the cached objects are stored.
   Figure 2 illustrates the operation of the proxy cache server when
   the requested WWW object (usually identified by its URL, or by the
   URL plus the HTTP request headers sent by the WWW browser) has
   already been cached.

   +---------+             +---------+             +---------+
   |   End   | ----------> |  Proxy  | < No need > | Origin  |
   | user's  |    HTTP     |  cache  | <   to    > |         |
   | browser | <---------- | server  | < contact > | server  |
   +---------+             +---------+             +---------+

                 Figure 2 - fetching a cached object

   A cache's effectiveness is usually measured in terms of its "hit
   rate": the proportion of requests which can be satisfied using
   cached objects.  The cache administrator's goal is to make this
   figure as high as possible, without serving a significant volume
   of stale material to the cache's users.

   Cache hit rates of 40% to 50% for WWW related traffic are common;
   see for example [5].  Caching also helps to make more effective use
   of the available bandwidth by allowing TCP congestion control
   algorithms to work properly.  Conventional HTTP traffic takes the
   form of a very large number of short-lived TCP connections, which
   often defeats TCP "slow-start" [6] on busy lines.
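   The lookup in Figure 2 and the hit rate defined above can be
   illustrated with a toy in-memory cache.  This is a minimal Python
   sketch under simplifying assumptions, not a description of any
   real proxy cache: objects are keyed on the URL alone (ignoring
   request headers), freshness is judged only from an absolute expiry
   time such as an Expires header would supply, and
   fetch_from_origin is a hypothetical stand-in for the request to
   the origin server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedObject:
    body: bytes
    expires: float  # absolute expiry time, e.g. derived from Expires


class ProxyCache:
    """Toy proxy cache keyed on the request URL (hypothetical sketch)."""

    def __init__(self, fetch_from_origin: Callable[[str], Tuple[bytes, float]]):
        # fetch_from_origin(url) -> (body, expires) stands in for the
        # HTTP/FTP/... request to the origin server in Figure 1.
        self._fetch = fetch_from_origin
        self._store: Dict[str, CachedObject] = {}
        self.hits = 0
        self.requests = 0

    def get(self, url: str, now: float) -> bytes:
        self.requests += 1
        obj = self._store.get(url)
        if obj is not None and now < obj.expires:
            # Figure 2: a fresh cached copy, no need to contact the origin.
            self.hits += 1
            return obj.body
        # Cache miss, or the cached copy is stale: fetch and store it.
        body, expires = self._fetch(url)
        self._store[url] = CachedObject(body, expires)
        return body

    @property
    def hit_rate(self) -> float:
        # Proportion of requests satisfied from the cache.
        return self.hits / self.requests if self.requests else 0.0


if __name__ == "__main__":
    # Hypothetical origin whose object expires at time 3600.
    cache = ProxyCache(lambda url: (b"<html>...</html>", 3600.0))
    for t in (0, 10, 20, 4000):
        cache.get("http://www.example.com/", now=t)
    print(cache.hit_rate)  # 0.5: hits at t=10 and t=20, misses at 0 and 4000
```

   The final request misses because the cached copy has expired,
   which is why origin servers that send sensible expiry information
   help to raise the hit rate.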
   It follows that proxy caching should be highly attractive, on a
   cost/benefit basis, to Internet Service Providers and to the
   organisations which buy connectivity from them.  Cache hits are
   typically delivered an order of magnitude faster than cache misses,
   since the objects requested do not have to be fetched from the
   origin server.  This means that a site which encourages caching can
   provide its end users with a much higher perceived quality of
   service, whilst at the same time getting better value for money
   from its leased line(s).

   The World-Wide Web community is standardising a new version of
   HTTP, version 1.1 [2], which specifically addresses a number of
   caching issues.  At the time of writing, HTTP/1.1 had yet to be
   widely deployed, and its specification was still being developed.
   In this document we discuss only current best practice.

2. The cachebusting problem

   Support for proxies and caching in the HTTP protocol and its
   implementations has essentially been retro-fitted.  As a result,
   there are many common practices which are incompatible with
   caching, and which either defeat it completely or reduce the
   benefits which derive from it.  This is primarily an educational
   issue involving the developers of HTTP based services and systems.

   Caching at the HTTP level can cause problems for services which
   make heavy use of usage statistics, e.g. to provide "hit counts"
   for advertisers.  Users of cached copies of an object are
   effectively invisible to the provider of the original service.
   This may provide a strong motivation to defeat caching.

   A product may also ship with an out-of-the-box configuration which
   defeats caching, perhaps unintentionally on the part of the vendor
   or its developers.
   If the product works for most users with few if any changes to the
   default settings, there will be little incentive to dig deeper into
   its configuration options.

3. How to be friendly to proxy cache servers

   This section outlines some simple measures which the developers of
   HTTP based systems and services can take to make their products
   more cache-friendly.

3.1 Tips for HTTP server administrators

   Use a server which supports HTTP/1.1, since this version of the
   protocol has a number of additional features to support caching.

   Send the Expires header on documents and images where feasible;
   this will help caches to decide when your objects are stale.

   Use an HTTP server which supports the GET method with the
   If-Modified-Since header (a "conditional GET"); this will help
   browsers and proxy caches to find out whether their cached copy of
   a file is out of date without having to fetch it again.

   Ensure that the time is set correctly on the server machine, e.g.
   via NTP [7], so that the timestamp information carried in the HTTP
   headers makes sense.

3.2 Tips for content providers (e.g. HTML authors)

   Encourage the sharing of links to common graphics and applets, so
   that only one URL is used for a given object.

   Use client-side imagemaps (USEMAP [8]) where feasible, since
   server-side imagemaps generate HTTP redirects, which are typically
   uncacheable.

   Use trailing slashes (/) in URLs which refer to directories, to
   avoid extra redirects.

   Where you are linking to the file which is returned when the
   directory name is requested (typically index.html or index.htm),
   "./" can usually be written instead of the file name, so that the
   object is known by a single URL.

   Try to use a single name for each server in the hostname part of
   the URLs in the HTML which you create.

   Don't rename files to age them; give them unique names in the
   first place and update the links which point to them.

   Use the Internet domain name in the host component of the URLs you
   create, rather than the host's IP address.

   If you really want to count every access to a given page, embed a
   tiny non-cacheable image in it.  This will give you an access count
   for the page without requiring the whole page to be downloaded
   again by each user of a given proxy cache.

3.3 Dynamic content (e.g. CGI) developers

   Make results cacheable where practical:

      Use GET instead of POST for simple queries, since POST results
      are not cached.

      Use the path component of the URL to pass information instead
      of QUERY_STRING, since caches may treat objects with a "?" in
      their URL as uncacheable.

      Use a directory name other than "cgi-bin", since caches can be
      expected to treat URLs containing it as uncacheable.

      Generate valid Last-Modified and Expires headers.

      Handle If-Modified-Since requests.

   Use applet and scripting technologies such as JavaScript or Java
   instead of CGI for form validation, where feasible.

   If you use cookies, try to restrict them to the portions of your
   server where they are essential, since objects returned with a
   Set-Cookie header are commonly treated as uncacheable.  Be aware
   that cookies may not interact well with proxy cache servers.

   Try not to parse the User-Agent request header (HTTP_USER_AGENT in
   CGI) to select browser specific capabilities, since the cached HTML
   will then be browser specific, and may be returned to a browser
   which doesn't know what to do with it.  Use features like instead.

   Don't use server-side includes unless your server can send the
   Last-Modified HTTP header with them.

   Don't use redirects, since their results may be uncacheable.

   Try to keep the size and complexity of pages on secure servers to
   a minimum, since secure HTTP requests are not cached in proxy
   caches and may not be cached in many browsers.  Try to avoid using
   secure servers for general pages where feasible.

   Don't set the objects your server returns to expire immediately,
   or at some time in the recent past, unless you want to be held up
   to public ridicule!

   Don't use content negotiation until HTTP/1.1 is more widely
   deployed, since with HTTP/1.0 it interacts badly with proxy caches.

   Don't specify port 80 in the URL, e.g. when generating URLs
   programmatically, since a cache may then treat "http://host:80/"
   and "http://host/" as two different objects.

   Don't use server modules or scripts to convert a document's
   character set on the server side.  Leave it to the client.

3.4 Developers of stand-alone applications

   Implement proxy support.

   Give users of your application the ability to configure proxying,
   preferably allowing for a different proxy server and port number
   on a protocol by protocol basis, and allowing for some Internet
   domains and/or IP addresses to be exempted from the proxy
   configuration.

   Make use of user/admin configured preferences for HTTP proxying
   which may already have been set up before your application is
   installed, where these are available.

   Ideally any new URL protocol schemes, such as "urn:", should be
   passed to an HTTP proxy server, making it possible to support new
   protocols without having to upgrade individual software
   installations.

4. Security considerations

   Cachebusting is clearly justified in those cases where the use of
   caching has, in itself, security and privacy implications.  The end
   user has no way of knowing what information (e.g. bank account or
   credit card numbers) is being logged by a proxy cache, or where it
   will end up.

   Proxy servers tend to subvert firewalls and access controls based
   on IP addresses and/or domain names, since requests forwarded by a
   proxy appear to the origin server to come from the proxy rather
   than from the end user's machine.

   Proxy servers can be useful as a central mechanism for laundering
   incoming WWW traffic, for example to remove or block offensive
   material, or to check applications and applets being downloaded
   for problems such as viruses and denial of service attacks.

5. Acknowledgements

   Thanks to Duane Wessels, Vinod Valloppilli, George Michaelson,
   Donald Neal, Ernst Heiri, Wojtek Sylwestrzak, Alan J. Flavell and
   Jens-S Voeckler for their contributions to this document.

6. References

   [1] A. Luotonen and K. Altis, "World-Wide Web proxies", in WWW94
       Conference Proceedings (Elsevier), 1994.

   [2] R. Fielding, J. Gettys, J. Mogul, H. Frystyk and
       T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
       RFC 2068 (Proposed Standard), January 1997.

   [3] D. Wessels and K. Claffy, "Internet Cache Protocol (ICP),
       version 2", RFC 2186 (Informational), September 1997.

   [4] D. Wessels and K. Claffy, "Application of Internet Cache
       Protocol (ICP), version 2", RFC 2187 (Informational),
       September 1997.

   [5] K. Claffy, "NLANR Caching Workshop Report", June 1997.
       <URL:http://ircache.nlanr.net/Cache/Workshop97/minutes.html>

   [6] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
       Retransmit, and Fast Recovery Algorithms", RFC 2001 (Proposed
       Standard), January 1997.

   [7] D. Mills, "Network Time Protocol (v3)", RFC 1305 (Proposed
       Standard), March 1992.

   [8] J. Seidman, "A Proposed Extension to HTML: Client-Side Image
       Maps", RFC 1980 (Informational), August 1996.

7. Authors' addresses

   Martin Hamilton
   Department of Computer Studies
   Loughborough University of Technology
   Leics. LE11 3TU, UK

   Email: m.t.hamilton@lut.ac.uk

   Andrew Daviel
   Vancouver Webpages
   Box 357, 185-9040 Blundell Road
   Richmond, BC V6Y1K3, CA

   Email: andrew@vancouver-webpages.com

This Internet Draft expires August 1998.