idnits 2.17.1 

draft-vinod-icp-traffic-dist-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 4 instances of lines with control characters in the document.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 135 has weird spacing: '...oration  may w...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 1997) is 9683 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  ** Obsolete normative reference: RFC 2068 (ref. '5') (Obsoleted by RFC 2616)


     Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET-DRAFT                                       Vinod Valloppillil
2	<draft-vinod-icp-traffic-dist-00.txt>		  Microsoft Corporation
3					                             Josh Cohen
4	                                        	Netscape Communications
5	                                                          21 April 1997
6	                                                   Expires October 1997

8	         	Hierarchical HTTP Routing Protocol

10	Status of this Memo

12	  This document is an Internet-Draft.  Internet-Drafts are working
13	  documents of the Internet Engineering Task Force (IETF), its areas,
14	  and its working groups.  Note that other groups may also distribute
15	  working documents as Internet-Drafts.

17	  Internet-Drafts are draft documents valid for a maximum of six months
18	  and may be updated, replaced, or obsoleted by other documents at any
19	  time.  It is inappropriate to use Internet-Drafts as reference
20	  material or to cite them other than as ``work in progress.''

22	  To learn the current status of any Internet-Draft, please check the
23	  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
24	  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
25	  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
26	  ftp.isi.edu (US West Coast).

28	Abstract

30	  Recent interest in finding solutions for traffic problems stemming
31	  from HTTP has centered around the use of cooperating proxy-caches.

33	  We contend that by using a deterministic, hash-based approach for
34	  routing URLs within an "array" of proxy servers, many of the benefits
35	  of alternative cache cooperation protocols (such as ICP) may be
36	  realized.

38	  As an example of such an implementation we propose the use of
39	  "Proxy Client Configuration Files" between proxy servers in order
40	  to exchange routing information.  This implementation is motivated
41	  in part by the adoption of this file by existing, popular web
42	  browsers to provide intelligent URL request routing.

44	  This draft discusses adopting this well-understood, widely
45	  implemented browser protocol by web proxies in order to facilitate
46	  intelligent routing of requests within a network of proxy servers.

48	1. Introduction

50	  There is significant interest in the Internet community and the
51	  ICP working group in particular in finding mechanisms by which the
52	  public caches on individual proxy servers can be further aggregated
53	  and shared by as many browsers as possible.

55	  Philosophically, protocols such as ICPv2 are based on dynamic
56	  "pinging" of neighboring proxy servers in an attempt to locate
57	  copies of cached objects.

59	  We propose an alternate approach based on hash-based routing of
60	  URLs.  The hash-based routing approach documented here uses a known
61	  "request resolution path" through a network of proxies that is
62	  determined by the URL of the request.  An interesting side effect of
63	  this deterministic mechanism is that cache duplication is avoided.

65	  Hashing distributes the URL space among several proxies which are
66	  assumed to be relatively equidistant from each other.  Additionally,
67	  this hash-based approach is more tuned for "hierarchical" deployments
68	  of proxy servers.  One example of this might be a departmental level
69	  proxy which routes into an "array" of top level proxies in a
70	  corporation which provide the gateway to an ISP.  The ISP, in turn,
71	  might operate another "array" of proxies at its POP.

73	  By contrast, ICP networks typically involve peered caches which
74	  may operate at the top level of many ISP hierarchies.

76	  As an example of an implementation of hash-based routing, we propose
77	  extending the existing "Proxy Client Configuration File" protocol used
78	  by browsers to intelligently route HTTP requests.

80	  Our proposal would implement this protocol on proxy servers in order
81	  to provide a vendor independent mechanism for specifying sophisticated
82	  hop-by-hop HTTP routing between groups of proxy servers.

84	  We also demonstrate that intelligent utilization of this routing
85	  protocol can yield almost all of the benefits of alternative cache
86	  cooperation protocols.

88	  We do NOT propose any specific routing scripts and instead leave
89	  determination of such scripts up to individual vendor
90	  implementations.

92	  Although there are clear advantages to the use of the
93	  Proxy Client Configuration File as the vehicle for transporting
94	  routing information, there may be interest in the working group
95	  in exploring other vehicles (e.g. publishing a static data table
96	  containing proxies in an "array" implementing a well-known hash
97	  function within proxies).

99	2. Proxy Client Configuration File

101	  The Proxy Client Configuration File is described in [1] and [2].
102	  Additionally, multiple interoperable implementations of this protocol
103	  are available in popular client browsers.

105	  As originally constructed, this file is intended for consumption by
106	  client programs (web browsers) and is evaluated per URL to be
107	  retrieved by the browser.  The output of this script provides an
108	  ordered series of proxy servers to be used by the browser to retrieve
109	  the object specified by the URL.
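
As a minimal sketch of the file format described in [1], a configuration
file defines a JavaScript function FindProxyForURL(url, host) whose return
value is an ordered, semicolon-separated list of proxies.  The host names
and port below are hypothetical, and parseProxyList is an illustrative
helper that is not part of the file format:

```javascript
// Evaluated once per URL.  The returned entries are tried in order:
// first proxy1, then proxy2, then a direct connection.
// (Host names and ports are placeholders, not taken from the draft.)
function FindProxyForURL(url, host) {
  return "PROXY proxy1.example.com:8080; " +
         "PROXY proxy2.example.com:8080; " +
         "DIRECT";
}

// Illustrative helper: split the script's output into its ordered entries.
function parseProxyList(result) {
  return result
    .split(";")
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return entry.length > 0; });
}

console.log(parseProxyList(
  FindProxyForURL("http://www.example.com/", "www.example.com")));
```

A proxy consuming the same file would walk this list exactly as a browser
does, moving to the next entry only when the previous one is unavailable.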

111	  One of the excellent properties of the HTTP proxy protocol [5] is that it
112	  exposes proxy servers to upstream servers & upstream proxies as
113	  regular clients.  Because the administrator of a group of proxies may
114	  wish to make assumptions about a downstream client's ability
115	  to interpret a script, we wish to extend the metaphor to include
116	  use of the configuration file by proxies as well as "classical"
117	  clients.

119	3. Example implementation

121	  Researchers have documented the concept of using client-side
122	  hash-based routing to spread load across multiple proxy servers.
123	  The deterministic nature of many of these algorithms has the
124	  additional benefit of improving cache hit rates by creating the
125	  image of a single logical cache spread over many proxies. [4]

127	  In this proposal, the administrator of an "array" of proxies at an
128	  ISP may wish to construct a script that hashes URLs and distributes
129	  the hash space across each of his/her proxy servers.  Using the same
130	  downstream script, the administrator should be able to service both
131	  dial-in clients (whose browsers already support the protocol) as well
132	  as leased lines to corporate proxies.
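
One possible shape for such a script is sketched below.  This draft does
not propose any specific routing script, so everything here is an
assumption made for illustration: the proxy names, the three-member
array, and the simple multiplicative string hash (any hash that spreads
URLs evenly would serve).

```javascript
// Hypothetical members of the ISP's proxy "array".
var ARRAY = [
  "PROXY cache1.isp.example:8080",
  "PROXY cache2.isp.example:8080",
  "PROXY cache3.isp.example:8080"
];

// Illustrative string hash, kept within unsigned 32 bits.
// Not a hash function specified by this draft.
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// The same URL always maps to the same array member, whether the
// script is evaluated by a dial-in browser or a corporate proxy.
function FindProxyForURL(url, host) {
  return ARRAY[hashUrl(url) % ARRAY.length];
}

console.log(FindProxyForURL("http://www.example.com/a.html", "www.example.com"));
console.log(FindProxyForURL("http://www.example.com/b.html", "www.example.com"));
```

Because the mapping depends only on the URL, every downstream client that
evaluates the script sends a given URL to the same member of the array.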

134	  The hop-by-hop nature of the routing provides additional flexibility
135	  in this example.  The corporation  may wish to use one particular
136	  routing script internally (one which tells clients to directly access
137	  intranet content, for example) whereas the ISP may wish for the
138	  corporation's proxy servers to use a different script to route into
139	  the ISP's proxies (one which routes all requests through the caches
140	  for maximum hit rates).
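
The two-script arrangement can be sketched as follows.  The function
names, the domain "corp.example", the proxy names, and the hash are all
hypothetical; in practice each script would be a separate configuration
file defining its own FindProxyForURL.

```javascript
// Hypothetical ISP cache array.
var ISP_ARRAY = ["cache1.isp.example:8080", "cache2.isp.example:8080"];

// Illustrative string hash, kept within unsigned 32 bits.
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// True when host is the domain itself or a name beneath it.
function hostInDomain(host, domain) {
  return host === domain ||
         host.slice(-(domain.length + 1)) === "." + domain;
}

// Script used inside the corporation: intranet content is fetched
// directly, everything else goes to the corporate proxy.
function corporateScript(url, host) {
  if (host.indexOf(".") === -1 || hostInDomain(host, "corp.example")) {
    return "DIRECT";
  }
  return "PROXY proxy.corp.example:8080";
}

// Script the ISP hands to the corporate proxy: every request is
// routed into the ISP's caches.
function ispScript(url, host) {
  return "PROXY " + ISP_ARRAY[hashUrl(url) % ISP_ARRAY.length];
}

// Hop-by-hop resolution: the browser evaluates the corporate script,
// and the corporate proxy then evaluates the ISP's script.
function resolutionPath(url, host) {
  var hops = [corporateScript(url, host)];
  if (hops[0] !== "DIRECT") {
    hops.push(ispScript(url, host));
  }
  return hops;
}

console.log(resolutionPath("http://payroll/", "payroll"));
console.log(resolutionPath("http://www.example.org/", "www.example.org"));
```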

142	4. Security Considerations

144	  Security issues are not directly addressed in this document.  Any
145	  security functionality is derived from the underlying HTTP layer.

147	  Some consideration may need to be given to ensuring the integrity /
148	  security of the initial transfer of the script.  More specifically,
149	  this draft doesn't address issues that may stem from the possibility
150	  that malicious scripts may be constructed.

152	5. Advantages of script-based routing vs. ICP v2

154	  We now provide a comparison of this proposal vs. the current Internet
155	  Cache Protocol draft [3].

157	  a. Symmetric protocol between client -> proxy and proxy -> proxy

159	    This preserves the symmetry of HTTP's presentation of proxy servers
160	    as "mega clients" to upstream servers / proxies.

162	    ICP is not currently processed / generated by client browsers.

164	  b. Eliminate messages for cache 'miss' events.

166	    A very significant percentage of all ICP messages exchanged in the
167	    field are cache "misses." [NLANR's field experience indicates that
168	    85-90% of all ICP transactions are "misses".]

170	    Because this protocol eliminates querying, miss messages no longer
171	    occur (the outcome of every forward is now either "cache hit" or
172	    "continue resolving upstream").

174	  c. Takes advantage of all HTTP work including options, cache-control,
175	  authentication, etc.

177	    HTTP already provides protocol options to perform functions such as
178	    proxy to proxy authentication, etc.  These functions don't have to
179	    be re-invented.

181	    Additionally, much of the new behavior in the HTTP 1.1 cache-control
182	    headers is not expressible in ICPv2.  Forwarding the entire HTTP
183	    request to the next upstream/neighboring proxy allows it to be
184	    privy to these options.

186	  d. Already implemented on the browser

188	    Eases compliance testing and demonstrates soundness of the protocol
189	    (in a limited case).

191	  e. Sorted requests between proxies = single logical cache

193	    Over time, assuming that URL requests are randomly routed (e.g.
194	    round robin DNS) to a set of peer ICP neighbors (e.g. on a LAN
195	    within an ISP's head-end), the contents of these neighboring
196	    caches will eventually become roughly identical.

198	    A deterministic hash-based routing scheme, however, provides for a
199	    single logical cache image across 'n' proxies instead of 'n'
200	    identical caches.
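
The difference can be illustrated with a deliberately simplified model,
sketched below under stated assumptions: 4 proxies, 25 distinct URLs each
requested 4 times, unlimited cache space, and each proxy keeping a copy of
whatever it serves.  The URLs, the hash, and the function names are
hypothetical.

```javascript
// Illustrative string hash, kept within unsigned 32 bits.
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Round robin: the proxy depends only on the request sequence number.
function roundRobinRoute(url, requestNumber, proxyCount) {
  return requestNumber % proxyCount;
}

// Hash-based: the proxy depends only on the URL.
function hashRoute(url, requestNumber, proxyCount) {
  return hashUrl(url) % proxyCount;
}

// Replays the request stream and returns the total number of cached
// copies held across all proxies.
function totalCopies(route, proxyCount, urls, rounds) {
  var caches = [];
  for (var p = 0; p < proxyCount; p++) {
    caches.push(new Set());
  }
  var requestNumber = 0;
  for (var r = 0; r < rounds; r++) {
    for (var u = 0; u < urls.length; u++) {
      caches[route(urls[u], requestNumber, proxyCount)].add(urls[u]);
      requestNumber++;
    }
  }
  return caches.reduce(function (sum, c) { return sum + c.size; }, 0);
}

var urls = [];
for (var i = 0; i < 25; i++) {
  urls.push("http://www.example.com/object" + i);
}

var roundRobinCopies = totalCopies(roundRobinRoute, 4, urls, 4);
var hashedCopies = totalCopies(hashRoute, 4, urls, 4);
console.log(roundRobinCopies); // 100: every proxy holds all 25 objects
console.log(hashedCopies);     // 25: each object is held exactly once
```

In this model the round-robin array spends four times the storage to hold
the same 25 objects, which is the duplication that hash-based routing
avoids.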

202	    ICP's peer to peer queries are replaced by intelligent request
203	    routing in the previous level of the hierarchy.

205	  f.  No new transport protocols

207	    The behavior of HTTP is already well understood by system
208	    administrators and passed through firewalls, etc.  By contrast,
209	    ICP is relatively unknown in the vast majority of intranets
210	    which may affect speed of deployment.

212	    In general, the development and deployment of new wire protocols
213	    should be a carefully evaluated endeavor due to huge support
214	    costs and "entropy" effects on corporate networks.

216	6. Advantages of ICP v2 vs. script-based routing

218	  a. Exchange of messages over WAN

220	    ICP is sometimes used across very wide area links to perform
221	    cache look-ups.  An example of this might be peered top-level
222	    caches between two overseas ISPs.  Script-based routing is
223	    intended more for use by proxies that are in relative proximity to
224	    each other.

226	    One critical question is whether these transoceanic cache
227	    look-ups are worth their cost.  This is especially a concern given
228	    the opportunity to build larger caches within a traditional cache
229	    hierarchy.  Do large local caches "skim" most of the potential
230	    cache hits?  This question could be answered with some idea of the
231	    hit rate for ICP over WAN links between very large peer caches.

233	  b. Exchange of messages across peer administrative domains

235	    Correct implementation of the proxy configuration script is in part
236	    dependent on having a series of proxies within the same
237	    administrative domain which share their logical cache.

239	    Because ICP maintains a very loose relationship between neighbors,
240	    it is easier to implement across such domains.  However, once
241	    again, the question of whether anything more than 2 or 3 levels of
242	    cache look-ups is valuable becomes pertinent.  If not, then a 2-3
243	    level hierarchical array of proxies within corporations & ISPs
244	    might be sufficient for maximum cache hit rates.

246	  c. Binary protocol

248	    ICP is clearly faster and easier to parse than HTTP due to its
249	    binary nature.  However, the construction of efficient HTTP engines
250	    is already at a premium due to the wide deployment of the protocol.

252	  d. Connectionless transport

254	    ICP can be, and often is, transported over UDP, which is lighter
255	    weight than HTTP's TCP connection.  Much of TCP's overhead may be
256	    mitigated by performance optimizations such as keep-alives and
257	    pipelining.

259	    Additionally, notice that in the case of a cache hit, ICP may
260	    require construction of a TCP connection to transport the requested
261	    object.

263	    Furthermore, the lack of congestion control on ICP messages is
264	    the obvious downside of connectionless transport.  In this scheme
265	    connections between proxy servers would almost certainly be HTTP
266	    Keep-Alive sessions.

268	  e. Failure case benefit.

270	    If, for some reason, the ICP cache that holds a URL is too slow to
271	    respond or is down, an alternate cache will be used to fulfill
272	    the request.  It is likely that this cache will cache the
273	    results.  At any later point in time, this cache will respond
274	    with a HIT message when queried about the URL.  This allows
275	    very busy URLs to be spread among multiple caches and stems from
276	    the non-deterministic nature of the protocol.

278	    In the hashing scheme, if a busy set of URLs is assigned to one
279	    cache via the hash, and that server is too slow or down, another
280	    cache will handle and cache that request.  Unfortunately, that
281	    cached copy is of no further use to any clients or proxies,
282	    since they will never go to that proxy again for a URL that the
283	    hash function does not assign to it.

285	  f.  Server distance determination

287	    In the field, a secondary benefit of ICP has been use of its
288	    UDP round-trip times as a means of gauging relative distance
289	    between peer caches.  Because hash-based routing relies on TCP
290	    and implies hierarchies known a priori, this feature of ICP
291	    isn't realized.

293	  g.  Current installed base

295	    ICP currently has an installed base of ~3000 proxies.

297	7. Open Issues

299	  As specified via Proxy Client Configuration files, there are
300	  two primary open issues associated with this protocol:

302	  1)  Standardization of the Proxy-client configuration file.

304	    Currently, this protocol is only a de facto standard and has not
305	    been formally accepted / endorsed by the IETF.

307	  2)  Performance of script evaluation on proxy servers.

309	    There are potentially significant issues with evaluating proxy
310	    configuration scripts per URL processed by a proxy server.
311	    Requiring an interpreter for JavaScript [1] may be outside of
312	    the bounds of the working group.

314	    Additionally, performance of the script + script interpreter may
315	    be a significant cost for proxy servers which need to handle high
316	    transaction volumes.

318	8. Acknowledgements

320	  The authors would like to thank Brian Smith, Kip Compton, Ari
321	  Luotonen, and Kerry Schwartz for their assistance in preparing
322	  this document.

324	9. References

326	  [1] Luotonen, Ari., "Navigator Proxy Auto-Config File Format",
327	 Netscape Corporation, http://home.netscape.com/eng/mozilla/2.0/
328	 relnotes/demo/proxy-live.html, March 1996.

330	  [2] Microsoft Corporation., "Automatic Proxy Configuration",
331	 http://www.microsoft.com/ie/ieak/autosys.htm, March 21, 1997.

333	  [3] Wessels, Duane., "Internet Cache Protocol Version 2", http://ds.
334	 internic.net/internet-drafts/draft-wessels-icp-v2-00.txt, March 21,
335	 1997.

337	  [4] Sharp Corporation., "Super Proxy Script",
338	 http://naragw.sharp.co.jp/sps/, August 9, 1996.

340	  [5] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
341	 RFC 2068, UC Irvine, January 1997.

343	10.  Author Information

345	    Vinod Valloppillil
346	    Microsoft Corporation
347	    One Microsoft Way
348	    Redmond, WA 98052

350	    Phone:  1.206.703.3460
351	    Email:  VinodV@Microsoft.Com

353	    Josh Cohen
354	    Netscape Communications Corporation
355	    501 E. Middlefield Rd.
356	    Mountain View, CA 94043

358	    Phone: 1.415.937.4157
359	    Email: Josh@Netscape.Com

361	Expires October 1997