idnits 2.17.1

draft-nottingham-surrogates-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 7, 2000) is 8688 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 2616 (ref. '2') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 2396 (ref. '3') (Obsoleted by RFC
     3986)

  ** Obsolete normative reference: RFC 2145 (ref. '5') (Obsoleted by RFC
     7230)

     Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

Network Working Group                                      M. Nottingham
Internet-Draft                                       Akamai Technologies
Expires: January 5, 2001                                    July 7, 2000

        Requirements for Demand-Driven Surrogate Origin Servers
                    draft-nottingham-surrogates-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 5, 2001.

Copyright Notice

   Copyright (C) The Internet Society (2000).
   All Rights Reserved.

Abstract

   This document states requirements for demand-driven surrogate origin
   servers, also known as reverse proxies and Web accelerators.

1. Introduction

   A surrogate origin server (also known as a reverse proxy or HTTP
   accelerator) is a device that authoritatively serves requests on
   behalf of an origin server (known as its master origin server)[1].

   Demand-driven surrogate origin servers are populated by the traffic
   flowing through them; when a client requests an object which is not
   resident, they will fetch it from the master origin server.

   It may be useful to conceptualize a demand-driven surrogate as an
   origin server that happens to be populated via HTTP on the back
   end.

   In many ways, they are similar to proxy/caches, and often leverage
   proxy/cache software. However, surrogates serve content
   authoritatively, and therefore take the role of an origin server,
   not a proxy, to downstream clients.

   Unfortunately, the use of a proxy/cache as a surrogate origin server
   introduces several problems in protocol implementation, due to this
   changing of roles. This document attempts to rectify such
   inconsistencies.

   Additionally, master origin server administrators usually have a
   greater degree of control over the activity and use of surrogates
   than they would over proxies. Because of this close relationship,
   more precise control over the behavior of the surrogate can be given
   to the administrator.

   This document specifies acceptable mechanisms for doing so.

1.1 Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119[4].

   An implementation is not compliant if it fails to satisfy one or
   more of the MUST or REQUIRED level requirements.
   An implementation
   that satisfies all the MUST or REQUIRED level and all the SHOULD
   level requirements is said to be "unconditionally compliant"; one
   that satisfies all the MUST level requirements but not all the
   SHOULD level requirements is said to be "conditionally compliant".

1.2 Terminology

   This document uses terms defined and explained in the WREC
   Taxonomy[1], and the HTTP/1.1 specification[2]. The reader is
   expected to be familiar with both.

   In this document, the term "surrogate" is shorthand for a
   demand-driven surrogate origin server, unless explicitly stated
   otherwise. Similarly, "origin server" refers to a surrogate's master
   origin server.

2. Overview of Demand-Driven Surrogate Origin Servers

2.1 Uses and Characteristics

   In normal operation, demand-driven surrogate origin servers are
   deployed and maintained by (or on behalf of) the publisher of a Web
   site, rather than directly for end users (as a proxy would be). This
   is often done for a number of reasons, including (a non-exhaustive
   list):
   o  Reduction of load on the master origin server
   o  Reduction of network traffic to the master origin server
   o  Distribution of objects, in order to improve perceived latency by
      storing them closer to end users
   o  Introduction of content transformation or other value-added
      services

   Surrogate deployments may vary in several ways, including:
   o  Proximity - surrogates may be deployed close to the master origin
      server to reduce load on it, or near end users to reduce network
      traffic and improve perceived latency.
   o  Selection of surrogate objects - entire Web sites may be routed
      through surrogates, or a subset of a site's objects may be
      nominated for publication through them, depending on the effect
      desired, and the nature of the surrogate.
   o  Number of surrogates - surrogates may be deployed in any number.
      Localized surrogates may use any of a number of mechanisms to
      distribute requests between them, while distributed surrogates
      usually use wide-area DNS load balancing.

   By their nature, surrogates are never the parent or child of other
   surrogates. However, they MAY have such relationships with
   proxy/caches.

2.2 General Operation

2.2.1 Configuration

   In order to accept and properly handle requests on behalf of a
   master origin server, a surrogate needs to be aware of its master's
   identity, and the profile of traffic that will be served on its
   behalf.

   Additionally, it may be desirable to configure surrogates with other
   information, including:
   o  Any encryption or authentication information required by the
      master origin server
   o  Default object handling information, including coherence
   o  Specific object handling information
   o  Other special instructions to the surrogate

   Surrogates may be configured by a variety of mechanisms, including
   manual, out-of-band, or vendor-specific.

   Some types of surrogate configuration may be communicated in-band,
   by HTTP headers described in this document. However, such
   information is not necessarily limited to that form of
   communication.

   Manual and out-of-band configuration mechanisms may vary in
   implementation; specification of them is out of scope for this
   document.

2.2.2 Request Handling

   A surrogate is configured to forward traffic to a master origin
   server, so that the hostname of the surrogate may be used in
   published URLs.

   A surrogate MAY be configured to forward traffic to multiple master
   origin servers by using the Host request header to differentiate
   requests. In this scenario, requests without a Host header SHOULD be
   replied to with a 502 Bad Gateway response status code.
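
   As an illustration only, Host-based selection of a master origin
   server might be sketched as follows. The MASTERS table, the route()
   function and all hostnames are hypothetical and are not defined by
   this document. Rejecting a Host value that has no configured master
   with a 502 is an assumption of this sketch; the text above only
   covers the missing-header case.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical configuration: published hostname -> master origin server.
MASTERS = {
    "www.example.com": "origin.example.com",
    "images.example.com": "img-origin.example.com",
}


def route(headers: Mapping[str, str],
          masters: Mapping[str, str] = MASTERS) -> Tuple[str, object]:
    """Return ("forward", master) or ("reject", status_code)."""
    host: Optional[str] = None
    for name, value in headers.items():
        if name.lower() == "host":  # header names are case-insensitive
            host = value.strip().lower()
            break
    if not host:
        # No Host header: the surrogate cannot tell the masters apart.
        return ("reject", 502)
    # Drop an optional ":port" suffix (IPv6 literals are not handled here).
    hostname = host.partition(":")[0]
    master = masters.get(hostname)
    if master is None:
        # Assumption of this sketch: unknown hosts are refused, not forwarded.
        return ("reject", 502)
    return ("forward", master)
```

   For example, route({"Host": "www.example.com:80"}) selects
   origin.example.com, while route({}) is rejected with a 502.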

   Surrogates MUST accept Absolute-URI[3] as well as Relative-URI
   requests and forward them to the master origin server, as
   configured. They MUST NOT forward Absolute-URI requests to origin
   servers that they have not been configured to serve.

   Surrogates MAY use encryption (SSL or TLS) on downstream, upstream
   or both connections.

2.3 Origin Server to Surrogate Optimizations

   Surrogates serve content on behalf of nominated origin servers,
   implying that the origin server administrator has access to
   configure, monitor and receive logs from the surrogate.

   Because of this, a greater degree of trust exists between them than
   there would be between an origin server and third-party proxies.
   This allows modification or extension of the relationship between
   them, to offer greater control and functionality.

2.3.1 Separation of Coherence

   Origin server administrators are wary of trusting third-party caches
   to keep objects coherent, because they do not always implement
   coherence in a predictable or correct manner.

   Surrogate coherence behavior can be both predicted and tested by
   origin server administrators. However, there is still a need to be
   able to describe object coherence to downstream caches.

   This leads to the need for separate coherence mechanisms; one
   between the master origin server and surrogates, and another between
   surrogates and their clients.

   This is accomplished by defining new, surrogate-specific mechanisms,
   while traditional coherence mechanisms retain their meaning for
   downstream caches. While the new mechanisms are introduced as HTTP
   headers here, they MAY also be communicated by separate
   configuration of the surrogate.

2.3.2 Protocol Feature Manipulation

   Surrogates MAY add end-to-end protocol features that are not
   supported by the origin server, in order to offer greater
   functionality to downstream clients.
   For example, a surrogate could
   add ETag validators to objects, to improve downstream cacheability.

   Surrogates may also implement hop-by-hop mechanisms (such as
   transfer encoding for compression and persistent connections) that
   are lacking on the master origin server, to offer improved quality
   of service to their clients.

   When offering extended end-to-end features, surrogates MUST defer to
   support on the origin server; if a feature is present there, it
   cannot be overridden by the surrogate implementation.

2.4 Problems Introduced by Use of Proxies as Origin Servers

2.4.1 Dates and Age Calculation

   In HTTP/1.1[2], the Date response header is required to reflect the
   time that an object is generated on its origin server. Since
   surrogates serve content authoritatively, objects obtained from them
   can always be considered fresh, and SHOULD contain a current Date
   header.

   Passing non-current Date headers causes downstream caches to handle
   objects with an overly conservative freshness lifetime, if it is
   derived from either Cache-Control: max-age or some heuristic-based
   freshness algorithms.

2.4.2 Interpretation of Proxy-Specific Information

   Request headers such as Pragma: no-cache and some Cache-Control
   headers, if honored by surrogates, may cause excessive and
   unnecessary load on the master origin server.

2.4.3 Logging

   Proxy-specific log formats may not be appropriate for use by a
   surrogate. In particular, master origin servers often log
   information such as the User-Agent and Referer presented by the
   client.

   Surrogates SHOULD be capable of logging such information, in a
   manner compatible with common origin server logs.

3. Specific Requirements

   Requirements for a surrogate are the same as those for a gateway or
   proxy in HTTP/1.1[2], except as noted.

3.1 Protocol Version Interpretation

   Implementations MUST satisfy the requirements of RFC 2145[5],
   including those behaviors specific to proxies.

3.2 Methods

   A surrogate MUST NOT accept CONNECT requests, or forward them to the
   master origin server.

   TRACE requests MAY be responded to as if Max-Forwards: 0 were
   present, to keep the surrogate's relationship with the origin server
   private.

3.3 Status Codes

3.3.1 Redirections

   Surrogates receiving redirections (301, 302 and 307 status codes)
   SHOULD resolve them and serve the resulting object to clients.

   If surrogate-specific coherence is specified in a redirect, but not
   available for the resulting object, the coherence information from
   the redirect SHOULD be applied to the resulting object.

3.3.2 Error Conditions

   Surrogates MUST NOT change the semantics of 4xx and 5xx series
   status codes obtained from origin servers. However, these responses
   MAY be cached for a short period.

   401 Unauthorized status codes MAY be generated to propagate HTTP
   authentication; see "Working with Protocol Extensions".

   Surrogates SHOULD send a 502 Bad Gateway error when
   surrogate-specific directives are incomplete, contradict themselves
   or do not parse correctly.

   A 504 Gateway Timeout response SHOULD be sent under any of the
   following conditions:
   o  DNS failure when resolving the origin server
   o  no route to origin server
   o  refused connection to origin server
   o  connection timeout to origin server

   However, a surrogate MAY be configured to use a cached resource, a
   different resource, or redirect to a different location under these
   conditions.

3.4 Cache Coherence and Correctness

   The RECOMMENDED mechanism for assuring coherence on surrogates is
   use of Surrogate-Control request and response headers.
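
   As a rough sketch, a surrogate might parse a Surrogate-Control
   response header and map unusable directives to the 502 response
   described in 3.3.2. This document does not define a wire syntax for
   the header; the comma-separated name[=value] form below is an
   assumption borrowed from Cache-Control, and parse_surrogate_control(),
   status_for() and DirectiveError are hypothetical names.

```python
from typing import Dict, Optional


class DirectiveError(ValueError):
    """Raised when a Surrogate-Control value cannot be used."""


def parse_surrogate_control(value: str) -> Dict[str, Optional[str]]:
    """Parse an assumed Cache-Control-like syntax: name[=value], ..."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        name = name.strip().lower()
        if not name:
            raise DirectiveError("empty directive")
        if name in directives:
            # Treat repeated directives as self-contradictory.
            raise DirectiveError("duplicate directive: " + name)
        directives[name] = arg.strip() if sep else None
    if "max-age" in directives:
        raw = directives["max-age"]
        # max-age needs a non-negative integer argument to be complete.
        if raw is None or not (raw.isascii() and raw.isdigit()):
            raise DirectiveError("max-age needs an integer value")
    return directives


def status_for(value: str) -> int:
    """Return 502 when the directives are unusable, otherwise 200."""
    try:
        parse_surrogate_control(value)
    except DirectiveError:
        return 502
    return 200
```

   Under these assumptions, "max-age=300, must-revalidate" parses
   cleanly, while "max-age=abc" or a bare "max-age" yields a 502.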

   Surrogates MAY be configured to fall back to HTTP cache coherence
   (such as Expires and Cache-Control response headers), if
   surrogate-specific mechanisms are not available.

   Surrogate origin servers MAY also be configured to use a heuristic
   freshness algorithm to ensure coherence if no other freshness
   information is available.

   Because surrogates separate upstream and downstream coherence, they
   MAY also implement proprietary mechanisms for assuring coherence
   with the master origin server.

3.5 End-to-End Headers

   Because a surrogate assumes the role of an origin server in
   downstream connections, the scope of end-to-end headers is changed.
   Although many headers can be propagated from the origin server, some
   must be changed in order to ensure protocol compliance, and others
   can be changed to enhance or optimize downstream connections.

3.5.1 Age

   Surrogates MUST strip any Age header from responses before
   forwarding them to clients.

   Surrogates MUST NOT add Age headers to responses.

   Age headers SHOULD be used by surrogates in Age calculations, when
   determining coherence with the master origin server.

3.5.2 Cache-Control Request Header

   Cache-Control headers in requests MUST NOT be honored by surrogates.

3.5.3 Cache-Control Response Header

   By default, Cache-Control headers in responses from a master origin
   server MUST NOT be honored by surrogates, and MUST be forwarded to
   clients.

   Surrogates SHOULD be able to be configured to honor Cache-Control
   response headers.

3.5.4 Date

   Surrogate origin servers MUST serve a current Date header with each
   response; they MUST NOT serve a cached Date header.

3.5.5 ETag

   If no ETag is present, a surrogate MAY insert a weak ETag as a
   validator, if separate coherence with the master origin server has
   been established.
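
   The Age, Cache-Control and Date rules in 3.5.1 through 3.5.4 can be
   sketched as a single pass over the headers received from the master
   origin server. This is a minimal illustration, not a complete
   implementation; downstream_headers() and the list-of-pairs header
   representation are assumptions of the example.

```python
from email.utils import formatdate
from typing import List, Optional, Tuple

Header = Tuple[str, str]


def downstream_headers(origin_headers: List[Header],
                       now: Optional[float] = None) -> List[Header]:
    """Build the headers a surrogate sends to its client.

    Age is stripped, any Date received from the master is replaced by a
    current one, and everything else (including Cache-Control) is
    forwarded unchanged. `now` is seconds since the epoch; None means
    the current time.
    """
    out: List[Header] = []
    for name, value in origin_headers:
        if name.lower() in ("age", "date"):
            continue  # never forward Age or a cached Date
        out.append((name, value))
    out.append(("Date", formatdate(timeval=now, usegmt=True)))
    return out
```

   Passing now=0 makes the output deterministic for testing; in normal
   use the argument is omitted so that the current time is used.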

3.5.6 Expires

   By default, Expires response headers SHOULD NOT be honored by
   surrogates, unless configured to do so. Surrogates MUST forward
   Expires headers to clients.

   It has been observed that if a Cache-Control: max-age response
   header is set, many origin servers will set a complementary Expires
   value, to duplicate the intended freshness effect for HTTP/1.0
   clients. To accommodate this, surrogates SHOULD recalculate the
   Expires header to match the delta communicated in Cache-Control:
   max-age, but only if both are present in a response, and are
   equivalent.

   Some older Web servers have been observed to set an Expires header
   based on an offset from the Date, without setting a Cache-Control:
   max-age header. This is problematic, as it is difficult to
   distinguish these responses from those which wish to expire content
   at an absolute date. Surrogates MAY compensate for this by
   considering objects which specify an Expires without a
   Cache-Control: max-age directive stale when the Expires time is
   reached; however, this may have undesirable effects in some
   situations.

3.5.7 Host

   Surrogate origin servers MUST replace any Host header in requests
   with the name of the appropriate master origin server before
   forwarding it.

3.5.8 Last-Modified

   Last-Modified response headers MUST NOT be modified by a surrogate.

3.5.9 Pragma

   Surrogate origin servers MUST NOT honor Pragma request directives.

3.5.10 Proxy-Authenticate

   Surrogates MUST NOT include a Proxy-Authenticate header in responses
   to clients.

3.5.11 Proxy-Authorization

   Surrogates MUST ignore Proxy-Authorization headers in requests from
   clients.

3.5.12 Server

   Surrogates MAY set their own Server response header, replacing any
   present.

3.5.13 Via

   Surrogates SHOULD append a Via header to requests, as outlined in
   RFC2616[2].
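
   The Expires recalculation described in 3.5.6 might look like the
   following sketch. It reads "equivalent" as "Expires minus Date
   equals max-age", which is an interpretation rather than something
   this document spells out; recalculated_expires() is a hypothetical
   name.

```python
import re
from email.utils import formatdate, parsedate_to_datetime
from typing import Optional

_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)


def recalculated_expires(date_hdr: Optional[str],
                         expires_hdr: Optional[str],
                         cache_control: Optional[str],
                         now: float) -> Optional[str]:
    """Return the Expires value to send downstream.

    The header is rewritten only when Date, Expires and max-age are all
    present and Expires - Date equals max-age; otherwise the original
    Expires value is returned untouched. `now` is seconds since the epoch.
    """
    if not (date_hdr and expires_hdr and cache_control):
        return expires_hdr
    match = _MAX_AGE.search(cache_control)
    if match is None:
        return expires_hdr
    max_age = int(match.group(1))
    try:
        delta = (parsedate_to_datetime(expires_hdr)
                 - parsedate_to_datetime(date_hdr)).total_seconds()
    except (TypeError, ValueError):
        return expires_hdr  # unparseable dates: leave the header alone
    if delta != max_age:
        return expires_hdr  # not equivalent, so forward as received
    return formatdate(now + max_age, usegmt=True)
```

   Because the surrogate also serves a current Date header (3.5.4),
   shifting Expires by the same delta keeps the freshness lifetime seen
   by HTTP/1.0 clients consistent with max-age.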

3.6 Surrogate-Control HTTP Headers

   Surrogate-specific HTTP headers allow specification of metadata in
   requests or responses to the surrogate. These can be thought of as
   analogues of cache-affecting headers such as Cache-Control.

   Surrogate-specific headers MUST be consumed before forwarding a
   request or response.

3.6.1 Surrogate-Control Request Header

   Surrogate-Control request directives have semantics and effects
   similar to those of Cache-Control request headers. Defined
   directives are:

   no-cache
      Has same meaning as a Cache-Control: no-cache request directive to
      a proxy.
   only-if-cached
      Has same meaning as a Cache-Control: only-if-cached request
      directive to a proxy.

3.6.2 Surrogate-Control Response Header

   Surrogate-Control response directives have semantics and effects
   similar to those of Cache-Control response headers. Defined
   directives are:

   max-age
      Has same meaning as a Cache-Control: max-age response directive
      to a proxy.
   no-cache
      Has same meaning as a Cache-Control: no-cache response directive
      to a proxy.
   must-revalidate
      Has same meaning as a Cache-Control: must-revalidate response
      directive to a proxy.

   Surrogates SHOULD require some form of client authentication when
   honoring Surrogate-Control response directives.

3.7 Surrogate-Generated Headers

3.7.1 X-Forwarded-For Request Header

   Surrogates SHOULD be capable of adding a header that denotes the
   client which requested the object.

3.7.2 X-Served-For Response Header

   Surrogates MAY add a response header which denotes the name of the
   master origin server, if it is not obvious in the Request-URI, in
   order to enable third parties to identify the source of the content.

4.
Working with Protocol Extensions

4.1 HTTP Authentication

   Surrogates receiving responses with WWW-Authenticate headers MUST
   NOT serve them without assuring that the client has presented proper
   credentials.

   HTTP Authentication may also be used to prevent access to the origin
   server by unauthorized clients, while allowing unauthenticated
   access to the objects through the surrogate. To accomplish this, a
   surrogate MAY be configured to send Authorization request headers,
   with a predetermined authentication realm.

5. Controlling Effects of Upstream Proxies

   Surrogates SHOULD append appropriate Cache-Control and Pragma
   request headers to assure that any intermediate proxy/caches do not
   serve a response without validation on the master origin server.

6. Security Considerations

6.1 Surrogate to Origin Authentication and Security

   Surrogates SHOULD allow use of SSL on the connection to the origin
   server, while serving objects unencrypted, to increase security
   between them.

   They SHOULD also support at least one of the following
   authentication mechanisms for origin server access:
   o  Client-Side SSL Certificates
   o  HTTP Authentication into a specific realm (see "HTTP
      Authentication")
   o  Cookie-based authentication (using cookie value as shared secret)

6.2 Knowledge of Surrogate/Origin Relationship

   It may or may not be necessary to hide the relationship between
   surrogates and origin servers, depending on the nature of their use.

   Surrogates SHOULD allow configuration to accomplish this.
   Specifically, this includes all HTTP headers that identify responses
   as coming from a surrogate, TRACE requests, and error responses and
   warnings that identify the surrogate.

References

   [1]  Cooper, I., Melve, I. and G. Tomlinson, "Internet Web
        Replication and Caching Taxonomy", November 1999.

   [2]  Fielding, R., Gettys, J., Mogul, J.
        C., Frystyk, H., Masinter,
        L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol
        - HTTP/1.1", RFC 2616, June 1999.

   [3]  Berners-Lee, T., Fielding, R.T. and L. Masinter, "Uniform
        Resource Identifiers (URI): Generic Syntax", RFC 2396, August
        1998.

   [4]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", RFC 2119, March 1997.

   [5]  Fielding, R., Gettys, J., Mogul, J. C. and H. Frystyk, "Use and
        Interpretation of HTTP Version Numbers", RFC 2145, May 1997.

Author's Address

   Mark Nottingham
   Akamai Technologies
   Suite 703, 1400 Fashion Island Blvd
   San Mateo, CA 94404
   US

   EMail: mnot@akamai.com
   URI:   http://www.akamai.com/

Appendix A. Acknowledgements

   The author gratefully acknowledges the contributions of: John
   Dilley, John Martin, Joel Wein, Peter Danzig, Chuck Neerdaels, and
   David Karger.

Full Copyright Statement

   Copyright (C) The Internet Society (2000). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC editor function is currently provided by the
   Internet Society.