Network Working Group                                  C. Drechsler, Ed.
Internet-Draft                          Technische Universitaet Chemnitz
Intended status: Standards Track                            May 16, 2016
Expires: November 17, 2016

           Hypertext Transfer Protocol: Improved HTTP Caching
              draft-drechsler-httpbis-improved-caching-05

Abstract

   This document describes an improved HTTP caching method which can be
   applied in addition to the standard caching behavior for HTTP. It
   defines the associated header field that controls this improved
   caching mechanism and a modified caching operation that differs
   slightly from the standard caching operation for HTTP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 17, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Specification
     2.1.  HTTP header field extension
     2.2.  Modified cache operation
       2.2.1.  Incoming Request Messages
       2.2.2.  Incoming Response Messages
     2.3.  Suggestions
   3.  IANA Considerations
     3.1.  Header Field Registration
     3.2.  Cache Directive Registration
   4.  Security Considerations
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Author's Address

1. Introduction

   HTTP caching has significant potential for reducing interdomain
   traffic, especially when shared caches are used within operator
   networks. Recent studies have shown very promising results regarding
   the cacheability of HTTP traffic (see [Ager], [Erman]).

   Unfortunately, this potential cannot be fully exploited by the
   standard caching behavior described in [RFC7234]. The following two
   reasons mainly limit the benefit of caching today:

   1.  Different URLs for one specific resource:

       For cache systems which follow the instructions in [RFC7234],
       the URL mainly serves as an identifier for the cached content.
       Unfortunately, due to mechanisms like load balancing and/or the
       use of CDNs, the URL for one specific resource can vary. From
       the point of view of the cache system, two different URLs mean
       two different cache items, even though the cache items can be
       identical in their bit representation. Therefore caching
       systems usually store one specific content several times and
       use storage capacity which could otherwise be used for caching
       other content.

   2.  Personalization of HTTP messages in the header:

       When HTTP messages carry personal information like cookies,
       session IDs in the query string (this also affects point 1) or
       other header attributes for the purpose of personalization (or
       managing state), shared caches cannot reuse these responses for
       subsequent requests.
       In this context content producers allow caching only in the
       browser of the user (e.g. via Cache-Control: private) or deny
       caching altogether. If a specific representation is requested
       several times by different clients, this results in HTTP
       messages which differ in their headers while their bodies are
       equal. According to [Ager], personalization is also one of the
       main reasons for the unused potential of caching.

   The goal of this proposal is to address these challenges and to
   reconcile caching with varying URLs and personalization.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Specification

   The approach to improved HTTP caching in this proposal is twofold.

   Section 2.1 introduces a new header field with a hash value. This is
   used for precisely identifying the transferred content in the body of
   HTTP messages and to signal the permission for caching and reusing
   the body in intermediate cache systems.

   The modified caching operation described in Section 2.2 uses the
   above-mentioned header field and ensures that all headers (of HTTP
   request and response messages) are exchanged between client and
   server even if the body of a response message comes from an
   intermediate cache system.

2.1. HTTP header field extension

   For precisely identifying the transferred content independent of the
   URL used and independent of additional header fields in the context
   of content negotiation, the following header field is used:

      Cache-NT: sha-256 "=" <base64-encoded SHA-256 hash value>

   The new header field carries a SHA-256 value (algorithm as in [SHS])
   which is computed and encoded in the following way:

   1.  When a client wants to retrieve specific content, it uses an
       HTTP GET request with a URL to address the resource.
       Additionally, the client can use further header fields to
       negotiate the representation of the resource that fits the
       client best (this mechanism is called content negotiation in
       [RFC7231]). The SHA-256 value MUST be computed over the
       representation of the resource that the server would send to
       the client in a successful response with status code 200 OK.

   2.  The SHA-256 hash value MUST be computed before the modifications
       of the possibly present header fields Content-Encoding,
       Content-Range and Transfer-Encoding are applied.

   3.  The SHA-256 hash value MUST always be computed over the full
       representation even if only parts of it are transferred to the
       client (e.g. partial content, delta encoding). The hash value
       serves as a unique identifier that also lets intermediate cache
       systems identify parts of the full representation.

   4.  The SHA-256 hash value MUST be computed by the origin server. It
       SHOULD be computed only once (when the resource is made
       available on the server or when the resource has changed). It
       SHOULD NOT be computed at the moment the server receives the
       request, so as not to delay the response.

   5.  After computing the SHA-256 hash value, its output MUST be
       base64 encoded without line wrapping.

   The Cache-NT header field is sent by the server in successful
   responses with status codes 200 or 206. If the header field is
   present, the server signals that the body of the response can be
   used for caching by intermediate cache systems for subsequent
   requests in compliance with the cache operation described in
   Section 2.2.

   Some examples are given in the following:

   Example header field:

      Cache-NT: sha-256=ZDJhODRmNGI4YjY1M ... DgyMjlkYTgwNGEyNiAgLQo=

   Example for computation of the hash value under UNIX (the file is
   read from standard input so that its name does not become part of
   the encoded value):

      sha256sum < PopularVideo.mp4 | base64 -w0

   Several examples of request-response pairs:

   a)
      +---------------------------------------------+
      | GET /videos/PopularVideo.webm HTTP/1.1      |
      | Host: example.com                           |
      +---------------------------------------------+

      +---------------------------------------------+
      | HTTP/1.1 200 OK                             |
      | Content-Type: video/webm                    |
      | Cache-NT: sha-256=AAAAAAAAAA...AAAAAAAAAA   |
      | ...                                         |
      +---------------------------------------------+

   b)
      +---------------------------------------------+
      | GET /videos/PopularVideo.webm HTTP/1.1      |
      | Host: example.com                           |
      | Range: bytes=0-499                          |
      +---------------------------------------------+

      +---------------------------------------------+
      | HTTP/1.1 206 Partial Content                |
      | Content-Type: video/webm                    |
      | Content-Range: bytes 0-499/1000             |
      | Cache-NT: sha-256=AAAAAAAAAA...AAAAAAAAAA   |
      | ...                                         |
      +---------------------------------------------+

      => same hash value as in a) because only a part of the
         representation is requested

   c)
      +---------------------------------------------+
      | GET /videos/PopularVideo HTTP/1.1           |
      | Host: example.com                           |
      | Accept: video/mp4                           |
      +---------------------------------------------+

      +---------------------------------------------+
      | HTTP/1.1 200 OK                             |
      | Content-Type: video/mp4                     |
      | Cache-NT: sha-256=BBBBBBBBBB...BBBBBBBBBB   |
      | ...                                         |
      +---------------------------------------------+

      => a different representation than in a) and b) results in a
         different hash value

2.2. Modified cache operation

   The modified cache operation differs slightly from the one in
   [RFC7234].
   It uses the header field described in Section 2.1 and ensures that
   all headers (of HTTP request and response messages) are exchanged
   between client and server even if the body of a response message
   comes from an intermediate cache system. Unlike in [RFC7234], client
   requests never terminate at intermediate cache systems.

2.2.1. Incoming Request Messages

   Incoming request messages MUST always be forwarded to the origin
   server by the intermediate cache system.

   For HTTP/1.0 or HTTP/1.1 requests the cache system SHOULD keep track
   of the desired connection state by evaluating the Connection header
   field.

   For HTTP/1.1 requests the cache system MUST keep track of all
   pipelined requests.

2.2.2. Incoming Response Messages

   The cache system analyzes the header of incoming response messages.
   If the status code IS NOT 200 or 206 then the response is forwarded
   to the client without modifications. If the status code IS 200 or
   206 then the cache system looks for the Cache-NT header field
   (described in Section 2.1). Two situations can arise:

   a.  The Cache-NT header field IS NOT present:

          Then the response message is forwarded to the client without
          modifications.

   b.  The Cache-NT header field IS present:

          Then the cache system analyzes the hash value in the Cache-NT
          header field. Two situations can arise:

       1.  The cache system has NO cache entry which matches the hash
           value in the Cache-NT header field (cache miss):

              Then the response message is forwarded to the client
              without modifications. To prevent cache poisoning the
              cache system computes the hash value over the
              transferred representation in the body (as described in
              Section 2.1). If it matches the hash value in the
              Cache-NT header field of the response from the server,
              a copy of the message body is stored in the cache
              system. Figure 2 visualizes this cache operation in
              case of a cache miss.

       2.  The cache system has a cache entry which matches the hash
           value in the Cache-NT header field (cache hit):

              After receiving the whole message header the cache
              system aborts the transfer of the message body from the
              server:

              o  HTTP/2: By sending RST_STREAM to the server. As
                 each HTTP request-response exchange is assigned to a
                 single stream, no side effects will arise.

              o  HTTP/1.0: By closing the TCP connection to the
                 server (and sending a TCP RST segment). If the TCP
                 connection was intended to stay open (signaled via
                 the Connection header field) then the cache system
                 SHOULD immediately open a new TCP connection (with a
                 new TCP port) to the server for subsequent requests
                 by the client.

              o  HTTP/1.1: By closing the TCP connection (and
                 sending a TCP RST segment). If the TCP connection
                 was intended to stay open (signaled via the
                 Connection header field) then the cache system
                 SHOULD immediately open a new TCP connection (with a
                 new TCP port) to the server for subsequent requests
                 by the client. If pipelining was used then the cache
                 system MUST send all requests that followed the
                 current request once again.

              After that the cache system takes the message header
              already received from the server and concatenates it
              with the locally stored body. In this process the cache
              system MUST follow the possibly present header fields

              o  Content-Encoding

              o  Content-Range

              o  Transfer-Encoding

              and MUST transform the body accordingly. This means
              that the client will receive exactly the same HTTP
              response message which was originally sent out by the
              server. Figure 1 visualizes this cache operation in
              case of a cache hit.
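
   The handling of incoming response messages described in this
   section can be sketched in Python as follows. This is a minimal
   illustration under stated assumptions, not part of the
   specification: the names cache_nt_value and BodyCache are
   hypothetical, the store is a plain in-memory dictionary, and the
   hash encoding mirrors the example header field value in Section 2.1
   (the text line that sha256sum prints for standard input, base64
   encoded); encoding the raw digest would be another possible reading
   of that section. Aborting the upstream transfer and applying
   Content-Encoding, Content-Range and Transfer-Encoding on a cache hit
   are left out.

```python
import base64
import hashlib
from dataclasses import dataclass, field


def cache_nt_value(representation: bytes) -> str:
    """Build a Cache-NT value for a full representation.

    Assumption: the encoding follows the example value in Section 2.1,
    i.e. base64 over the line "<hex digest>  -\\n" printed by sha256sum.
    """
    line = hashlib.sha256(representation).hexdigest() + "  -\n"
    return "sha-256=" + base64.b64encode(line.encode("ascii")).decode("ascii")


@dataclass
class BodyCache:
    """Hypothetical body store keyed by the Cache-NT header field value."""

    entries: dict = field(default_factory=dict)

    def handle_response(self, status: int, headers: dict, body: bytes):
        """Return (action, body_to_forward) for one response from the server."""
        key = headers.get("Cache-NT")
        # Other status codes or a missing Cache-NT field: forward unchanged.
        if status not in (200, 206) or key is None:
            return "forward", body
        # Cache hit: the stored body replaces the body from the server.
        # (A real cache would abort the upstream transfer at this point.)
        if key in self.entries:
            return "hit", self.entries[key]
        # Cache miss: forward unchanged; store only a full, verified body.
        if status == 200 and cache_nt_value(body) == key:
            self.entries[key] = body
        return "miss", body


cache = BodyCache()
video = b"popular video bytes"
response_headers = {"Cache-NT": cache_nt_value(video)}
first = cache.handle_response(200, response_headers, video)   # miss, stored
second = cache.handle_response(200, response_headers, b"")    # hit
```

   In this sketch a body whose recomputed hash does not match the
   Cache-NT value is still forwarded but never stored, which corresponds
   to the cache poisoning check in the cache miss case.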

   +-----------------+                             +-----------------+
   | HEADER (Client) | <-------------------------- | HEADER (Client) |
   |-----------------|    request is forwarded     |-----------------|
   | BODY (Client)   | <-------------------------- | BODY (Client)   |
   +-----------------+                             +-----------------+

   ############               ############                ############
   #          # <------------ #          # <------------- #          #
   #  Server  #               #  Cache   #                #  Client  #
   #          # ------------> #          # -------------> #          #
   ############               ############                ############

   +-----------------+                             +-----------------+
   | HEADER (Server) | --------------------------> | HEADER (Server) |
   |-----------------|   HEADER (Server) + BODY    |-----------------|
   | BODY (Server)   |   (Cache) is forwarded      |                 |
   |                 |             --------------> | BODY (Cache)    |
           ...                     |               |                 |
                                   |               +-----------------+
                                   |
                                   | local stored copy of the body is
                                   | used and concatenated with the
                                   | header from the server
                                   |
                                   |
                      ============ | ============
                      ||           |           ||
                      ||           |           ||
                      ||  +-----------------+  ||
                      ||  |                 |  ||
                      ||  |  BODY (Cache)   |  ||
                      ||  |                 |  ||
                      ||  +-----------------+  ||
                      ||                       ||
                      ||                       ||
                      ||                       ||
                      ||     cache storage     ||
                      ||                       ||
                      ===========================

                 Cache operation in case of cache hit.

                                Figure 1

   +-----------------+                             +-----------------+
   | HEADER (Client) | <-------------------------- | HEADER (Client) |
   |-----------------|    request is forwarded     |-----------------|
   | BODY (Client)   | <-------------------------- | BODY (Client)   |
   +-----------------+                             +-----------------+

   ############               ############                ############
   #          # <------------ #          # <------------- #          #
   #  Server  #               #  Cache   #                #  Client  #
   #          # ------------> #          # -------------> #          #
   ############               ############                ############

   +-----------------+                             +-----------------+
   | HEADER (Server) | --------------------------> | HEADER (Server) |
   |-----------------|  response (HEADER + BODY)   |-----------------|
   |                 |        is forwarded         |                 |
   | BODY (Server)   | --------------------------> | BODY (Server)   |
   |                 |             |               |                 |
   +-----------------+             |               +-----------------+
                                   |
                                   | copy of body is stored in cache
                                   |
                                   |
                      ============ | ============
                      ||           |           ||
                      ||           V           ||
                      ||  +-----------------+  ||
                      ||  |                 |  ||
                      ||  |  BODY (Server)  |  ||
                      ||  |                 |  ||
                      ||  +-----------------+  ||
                      ||                       ||
                      ||                       ||
                      ||                       ||
                      ||     cache storage     ||
                      ||                       ||
                      ===========================

                 Cache operation in case of cache miss.

                                Figure 2

2.3. Suggestions

   In case of a cache hit the cache system aborts the transfer of the
   response body from the server after the whole header has been
   received (see Section 2.2). As the transfer of the body cannot be
   aborted immediately, the server will still send some parts of the
   body. How many kilobytes are transferred depends mainly on the
   congestion window of the underlying TCP connection. If the
   congestion window is small then only a few kilobytes of the response
   will go over the wire.
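
   As a rough illustration of this dependency, the following Python
   sketch estimates how many body bytes may already be on the wire when
   the abort takes effect. It assumes (this is not stated in this
   document) that the server can send at most one full congestion
   window before the abort reaches it. The function name, the default
   maximum segment size of 1460 bytes and the window of 10 segments are
   illustrative values only.

```python
def body_bytes_before_abort(body_size: int, cwnd_segments: int,
                            mss: int = 1460) -> int:
    """Estimate the body bytes sent before an abort takes effect.

    Assumption: at most one congestion window (cwnd_segments * mss bytes)
    leaves the server before the abort arrives; header bytes are ignored.
    """
    return min(body_size, cwnd_segments * mss)


# A 5 kB body fits into the window and is transferred completely.
small = body_bytes_before_abort(5_000, cwnd_segments=10)
# A 50 MB body is cut off after roughly one window.
large = body_bytes_before_abort(50_000_000, cwnd_segments=10)
```

   Under these assumptions a small resource is transferred in full
   despite the cache hit, while a large resource costs roughly one
   window of traffic between server and cache.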

   Evaluations at Technische Universitaet Chemnitz have shown that at
   least around 20 kilobytes are transferred between origin server and
   cache system in case of a cache hit (this is for an HTTP/1.0 or
   HTTP/1.1 request right after opening a TCP connection). Therefore
   including the Cache-NT header field for small resources does not
   make much sense from a caching point of view, as the whole body has
   already been transferred before the cache system can abort it.

3. IANA Considerations

3.1. Header Field Registration

   HTTP header fields are registered within the Message Header Field
   Registry maintained by IANA.

   This document defines the following HTTP header fields, so their
   associated registry entries shall be updated according to the
   permanent registrations below (see [BCP90]):

   +-------------------+----------+-------------------+-------------+
   | Header Field Name | Protocol | Status            | Reference   |
   +-------------------+----------+-------------------+-------------+
   | Cache-NT          | http     | proposed standard | Section 2.1 |
   +-------------------+----------+-------------------+-------------+

   The change controller is: "IETF (iesg@ietf.org) - Internet
   Engineering Task Force".

3.2. Cache Directive Registration

   This document defines the following HTTP header field directives:

   +-----------------+-------------+
   | Cache Directive | Reference   |
   +-----------------+-------------+
   | sha-256         | Section 2.1 |
   +-----------------+-------------+

4. Security Considerations

   This section is meant to inform developers, information providers,
   and users of known security concerns specific to the caching
   mechanism described in this proposal. More general security
   considerations of HTTP caching are discussed in Section 8 of
   [RFC7234].

   The cache operation in Section 2.2 uses the Cache-NT header field
   (see Section 2.1) in incoming response messages. If the hash value
   in the Cache-NT header field of the (server) response does not
   correspond to the representation in the body of that response, a
   wrong body may be concatenated to the header from the server and
   sent to the client (this occurs when the cache system has a cache
   entry which matches the hash value in the response of the server).
   Origin servers SHOULD always include the correct hash value in the
   Cache-NT header field, matching the representation in the body.
   Intermediaries MUST NOT change the hash value in the Cache-NT header
   field. In addition the client can compute the hash value over the
   full representation (in case of responses with 200 OK) itself and
   can re-validate it against the value in the Cache-NT header field.

   If a cache system does not have a cache entry which matches the hash
   value in the Cache-NT header field then it forwards the response to
   the client and stores a local copy of the body (see Section 2.2). To
   prevent cache poisoning the cache system SHOULD compute the hash
   value over the full representation in the body (in case of responses
   with 200 OK) itself and SHOULD re-validate it against the value in
   the Cache-NT header field.

   Another security concern will arise if significant security flaws in
   the hash algorithm used (currently SHA-256) are detected. Then the
   cache can easily be poisoned. In this case origin servers and
   intermediate cache systems MUST switch to another hash algorithm
   (e.g. SHA-512 or the upcoming SHA-3 family).

5. References

5.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
              RFC 7234, June 2014.

5.2. Informative References

   [Ager]     Ager, B., Schneider, F., Juhoon, K., and A. Feldmann,
              "Revisiting Cacheability in Times of User Generated
              Content", IEEE Conference on Computer Communications,
              Workshops pp. 1-6, March 2010.

   [BCP90]    Klyne, G., Nottingham, M., and J. Mogul, "Registration
              Procedures for Message Header Fields", BCP 90, RFC 3864,
              September 2004.

   [Erman]    Erman, J., Gerber, A., Hajiaghayi, M., Pei, D., and O.
              Spatscheck, "Network-aware forward caching", Proceedings
              of the 18th international conference on World wide web
              pp. 291-300, 2009.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              June 2014.

   [SHS]      National Institute of Standards and Technology, "Secure
              Hash Standard (SHS)", Federal Information Processing
              Standards Publication 180-4, U.S. Department of Commerce,
              March 2012.

Author's Address

   Chris Drechsler (editor)
   Technische Universitaet Chemnitz
   09107 Chemnitz
   Germany

   Email: chris.drechsler@etit.tu-chemnitz.de