INTERNET-DRAFT                                   Vinod Valloppillil
                                                 Microsoft Corporation
                                                 Josh Cohen
                                                 Netscape Communications
                                                 21 April 1997
                                                 Expires October 1997

                  Hierarchical HTTP Routing Protocol

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   Recent interest in finding solutions for traffic problems stemming
   from HTTP has centered around the use of cooperating proxy-caches.

   We contend that by using a deterministic, hash-based approach for
   routing URLs within an "array" of proxy servers, many of the benefits
   of alternative cache cooperation protocols (such as ICP) may be
   realized.

   As an example of such an implementation we propose the use of
   "Proxy Client Configuration Files" between proxy servers in order
   to exchange routing information.
   This implementation is motivated
   in part by the adoption of this file by existing, popular web
   browsers to provide intelligent URL request routing.

   This draft discusses adopting this well-understood, widely
   implemented browser protocol by web proxies in order to facilitate
   intelligent routing of requests within a network of proxy servers.

1. Introduction

   There is significant interest in the Internet community and the
   ICP working group in particular in finding mechanisms where these
   public caches on individual proxy servers can be further aggregated
   and shared by as many browsers as possible.

   Philosophically, protocols such as ICPv2 are based on dynamic
   "pinging" of neighboring proxy servers in an attempt to locate
   copies of cached objects.

   We propose an alternate approach based on hash-based routing of
   URLs. The hash-based routing approach documented here uses a known
   "request resolution path" through a network of proxies that is
   determined by the URL of the request. An interesting side effect of
   this deterministic mechanism is that cache duplication is avoided.

   Hashing distributes the URL space among several proxies which are
   assumed to be relatively equidistant from each other. Additionally,
   this hash-based approach is more tuned for "hierarchical" deployments
   of proxy servers. One example of this might be a departmental level
   proxy which routes into an "array" of top level proxies in a
   corporation which provide the gateway to an ISP. The ISP, in turn,
   might operate another "array" of proxies at his/her POP.

   By contrast, ICP networks typically involve peered caches which
   may operate at the top level of many ISP hierarchies.

   As an example of an implementation of hash-based routing, we propose
   extending the existing "Proxy Client Configuration File" protocol used
   by browsers to intelligently route HTTP requests.
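
   The idea of distributing the URL space across an "array" by hash
   can be illustrated with a minimal sketch. This is not a routing
   script proposed by this draft. The host names, the port, the choice
   of SHA-256, and the modulo mapping are all assumptions made for
   illustration, and Python is used only for readability; a deployed
   script would be written in the configuration file's own scripting
   language.

```python
import hashlib

# Hypothetical array members; the draft names no hosts or ports.
PROXIES = [
    "proxy-a.example.net:8080",
    "proxy-b.example.net:8080",
    "proxy-c.example.net:8080",
]


def route(url, proxies=PROXIES):
    """Return the proxies in the order they should be tried for this URL.

    The first entry is selected deterministically from a hash of the URL,
    so every client evaluating the same list sends a given URL to the same
    proxy. The remaining entries follow as ordered fallbacks.
    """
    # SHA-256 is an illustrative choice; any well-distributed hash would do.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(proxies)
    # Rotate the list so the hashed proxy comes first.
    return proxies[start:] + proxies[:start]


if __name__ == "__main__":
    for url in ("http://www.example.com/", "http://www.example.org/index.html"):
        print(url, "->", route(url))
```

   Because the mapping depends only on the URL and the member list,
   two clients holding the same list agree on the first-choice proxy
   without exchanging any query messages. A plain modulo mapping
   reassigns most URLs when the list changes; the draft does not
   address that case.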

   Our proposal would implement this protocol on proxy servers in order
   to provide a vendor-independent mechanism for specifying sophisticated
   hop-by-hop HTTP routing between groups of proxy servers.

   We also demonstrate that intelligent utilization of this routing
   protocol can yield almost all of the benefits of alternative cache
   cooperation protocols.

   We do NOT propose any specific routing scripts and instead leave
   determination of such scripts up to individual vendor
   implementations.

   Although there are clear advantages to the use of the
   Proxy Client Configuration File as the vehicle for transporting
   routing information, there may be interest in the working group
   in exploring other vehicles (e.g. publishing a static data table
   containing proxies in an "array" implementing a well-known hash
   function within proxies).

2. Proxy Client Configuration File

   The Proxy Client Configuration File is described in [1] and [2].
   Additionally, multiple interoperable implementations of this protocol
   are available in popular client browsers.

   As originally constructed, this file is intended for consumption by
   client programs (web browsers) and is evaluated per URL to be
   retrieved by the browser. The output of this script provides an
   ordered series of proxy servers to be used by the browser to retrieve
   the object specified by the URL.

   One of the excellent properties of the HTTP-proxy protocol [5] is
   that it exposes proxy servers to upstream servers & upstream proxies
   as regular clients. Because the administrator of a group of proxies
   may wish to make assumptions about a downstream client's ability
   to interpret a script, we wish to extend the metaphor to include
   use of the configuration file by proxies as well as "classical"
   clients.

3. Example implementation

   Researchers have documented the concept of using client-side
   hash-based routing to spread load across multiple proxy servers.
   The deterministic nature of many of these algorithms has the
   additional benefit of improving cache hit rates by creating the
   image of a single logical cache spread over many proxies [4].

   In this proposal, the administrator of an "array" of proxies at an
   ISP may wish to construct a script that hashes URLs and distributes
   the hash space across each of his/her proxy servers. Using the same
   downstream script, the administrator should be able to service both
   dial-in clients (whose browsers already support the protocol) as well
   as leased lines to corporate proxies.

   The hop-by-hop nature of the routing provides additional flexibility
   in this example. The corporation may wish to use one particular
   routing script internally (one which tells clients to directly access
   intranet content, for example) whereas the ISP may wish for the
   corporation's proxy servers to use a different script to route into
   the ISP's proxies (one which routes all requests through the caches
   for maximum hit rates).

4. Security Considerations

   Security issues are not directly addressed in this document. Any
   security functionality is derived from the underlying HTTP layer.

   Some consideration may need to be given to ensure the integrity /
   security of the initial script passing. More specifically, this
   draft doesn't address issues that may stem from the possibility that
   malicious scripts may be constructed.

5. Advantages of script-based routing vs. ICP v2

   We now provide a comparison of this proposal vs. the current Internet
   Cache Protocol draft [3].

   a. Symmetric protocol between client -> proxy and proxy -> proxy

      This preserves the symmetry of HTTP's presentation of proxy servers
      as "mega clients" to upstream servers / proxies.

      ICP is not currently processed / generated by client browsers.

   b. Eliminate messages for cache 'miss' events.

      A very significant percentage of all ICP messages exchanged in the
      field are cache "misses." [NLANR's field experience indicates that
      85-90% of all ICP transactions are "misses".]

      Because this protocol eliminates querying, miss messages no longer
      occur (the outcome of every forward is now either "cache hit" or
      "continue resolving upstream").

   c. Takes advantage of all HTTP work including options, cache-control,
      authentication, etc.

      HTTP already provides protocol options to perform functions such as
      proxy to proxy authentication, etc. These functions don't have to
      be re-invented.

      Additionally, much of the new behavior in the HTTP 1.1 cache-control
      headers is not expressible in ICPv2. Forwarding the entire HTTP
      request to the next upstream/neighboring proxy allows it to be
      privy to these options.

   d. Already implemented on the browser

      Eases compliance testing and demonstrates soundness of the protocol
      (in a limited case).

   e. Sorted requests between proxies = single logical cache

      Over time, assuming that URL requests are randomly routed (e.g.
      round robin DNS) to a set of peer ICP neighbors (e.g. on a LAN
      within an ISP's head-end), the contents of these neighboring
      caches will eventually become roughly identical.

      A deterministic hash-based routing scheme, however, provides for a
      single logical cache image across 'n' proxies instead of 'n'
      identical caches.

      ICP's peer to peer queries are replaced by intelligent request
      routing in the previous level of the hierarchy.

   f. No new transport protocols

      The behavior of HTTP is already well understood by system
      administrators and passed through firewalls, etc. By contrast,
      ICP is relatively unknown in the vast majority of intranets
      which may affect speed of deployment.

      In general, the development and deployment of new wire protocols
      should be a carefully evaluated endeavor due to huge support
      costs and "entropy" effects on corporate networks.

6. Advantages of ICP v2 vs. script-based routing

   a. Exchange of messages over WAN

      ICP is sometimes used across very wide area links to perform
      cache look-ups. An example of this might be peered top-level
      caches between two overseas ISPs. This protocol is more
      intended for use by proxies that are in relative proximity to each
      other.

      One critical question is whether these transoceanic cache
      look-ups are worth their cost. This is especially a concern given
      the opportunity to build larger caches within a traditional cache
      hierarchy. Do large local caches "skim" most of the potential
      cache hits? This question could be answered with some idea of the
      hit rate for ICP over WAN links between very large peer caches.

   b. Exchange of messages across peer administrative domains

      Correct implementation of the proxy configuration script is in part
      dependent on having a series of proxies within the same
      administrative domain which share their logical cache.

      Because ICP maintains a very loose relationship between neighbors,
      it is easier to implement across such domains. However, once
      again, the question of whether anything more than 2 or 3 levels of
      cache look-ups is valuable becomes pertinent. If not, then a 2-3
      level hierarchical array of proxies within corporations & ISPs
      might be sufficient for maximum cache hit rates.

   c. Binary protocol

      ICP is clearly faster and easier to parse than HTTP due to its
      binary nature. However, the construction of efficient HTTP engines
      is already at a premium due to the wide deployment of the protocol.

   d. Connectionless transport

      ICP can be, and often is, transported over UDP, which is lighter
      weight than HTTP's TCP connection.
      Many of these disadvantages may be
      mitigated by performance optimizations such as keep-alives and
      pipelining.

      Additionally, notice that in the case of a cache hit, ICP may
      require construction of a TCP connection to transport the requested
      object.

      Furthermore, the lack of congestion control on ICP messages is
      the obvious downside of connectionless transport. In this scheme
      connections between proxy servers would almost certainly be HTTP
      Keep-Alive sessions.

   e. Failure case benefit.

      If for some reason the ICP cache that holds a URL is too slow to
      respond or is down, an alternate cache will be used to fulfill
      the request. It is likely that this cache will cache the
      results. At any later point in time, this cache will respond
      with a HIT message when queried about the URL. This allows
      very busy URLs to be spread among multiple caches and stems from
      the non-deterministic nature of the protocol.

      In the hashing scheme, if a busy set of URLs is assigned to one
      cache via the hash, and that server is too slow or down, another
      cache will handle and cache that request. Unfortunately, that
      cached version is of no use to any clients or proxies anymore
      since the clients/proxies will never go to that proxy again if it
      does not match the hash function.

   f. Server distance determination

      In the field, a secondary benefit of ICP has been use of its
      UDP round-trip times as a means of gauging relative distance
      between peer caches. Because hash-based routing relies on TCP
      and implies hierarchies known a priori, this feature of ICP
      isn't realized.

   g. Current installed base

      ICP currently has an installed base of ~3000 proxies.

7. Open Issues

   As specified via Proxy Client Configuration files, there are
   two primary open issues associated with this protocol:

   1) Standardization of the Proxy-client configuration file.

      Currently, this protocol is only a de facto standard and has not
      been formally accepted / endorsed by the IETF.

   2) Performance of script evaluation on proxy servers.

      There are potentially significant issues with evaluating proxy
      configuration scripts per URL processed by a proxy server.
      Requiring an interpreter for JavaScript [1] may be outside of
      the bounds of the working group.

      Additionally, performance of the script + script interpreter may
      be a significant cost for proxy servers which need to handle high
      transaction volumes.

8. Acknowledgements

   The authors would like to thank Brian Smith, Kip Compton, Ari
   Luotonen, and Kerry Schwartz for their assistance in preparing
   this document.

9. References

   [1] Luotonen, Ari, "Navigator Proxy Auto-Config File Format",
       Netscape Corporation,
       http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html,
       March 1996.

   [2] Microsoft Corporation, "Automatic Proxy Configuration",
       http://www.microsoft.com/ie/ieak/autosys.htm, March 21, 1997.

   [3] Wessels, Duane, "Internet Cache Protocol Version 2",
       http://ds.internic.net/internet-drafts/draft-wessels-icp-v2-00.txt,
       March 21, 1997.

   [4] Sharp Corporation, "Super Proxy Script",
       http://naragw.sharp.co.jp/sps/, August 9, 1996.

   [5] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
       RFC 2068, UC Irvine, January 1997.

10. Author Information

   Vinod Valloppillil
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052

   Phone: 1.206.703.3460
   Email: VinodV@Microsoft.Com

   Josh Cohen
   Netscape Communications Corporation
   501 E. Middlefield Rd.
   Mountain View, CA 94043

   Phone: 1.415.937.4157
   Email: Josh@Netscape.Com