idnits 2.17.1

draft-vkrasnov-h2-compression-dictionaries-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'SHOULD not' in this paragraph:

     A server that wishes to apply protocol level compression on a stream
     or use a stream as a dictionary SHOULD not apply non-identity
     content-coding (see [RFC7231], section 3.1.2.1) to that stream.

  -- The document date (March 5, 2018) is 2244 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'NDict' is mentioned on line 363, but not defined

  == Unused Reference: 'BREACH' is defined on line 547, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 7231 (Obsoleted by RFC 9110)

  ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113)

     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         V. Krasnov
Internet-Draft                                          Cloudflare, Inc.
Intended status: Informational                                  Y. Weiss
Expires: September 6, 2018                     Akamai Technologies, Inc.
                                                           March 5, 2018

                  Compression Dictionaries for HTTP/2
             draft-vkrasnov-h2-compression-dictionaries-03

Abstract

   This document specifies new HTTP/2 frame types and new HTTP/2
   settings values that enable the use of previously transferred data as
   compression dictionaries, significantly improving overall compression
   ratio for a given connection.

   In addition, this document proposes to define a set of industry
   standard, static, dictionaries to be used with any Lempel-Ziv based
   compression for the common textual MIME types prevalent on the web.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions and Terminology
   2.  Preliminaries
     2.1.  Security Considerations
     2.2.  Content Coding
     2.3.  Compression Contexts
     2.4.  Server Push Interaction
     2.5.  HTTP/QUIC
   3.  HTTP/2 Extension
     3.1.  Extension Settings
     3.2.  Extension Frames
       3.2.1.  The SET_COMPRESSION_CONTEXT frame
       3.2.2.  The SET_DICTIONARY Frame
       3.2.3.  The USE_DICTIONARY Frame
     3.3.  Static Dictionaries
   4.  Dictionary State
     4.1.  Attack scenarios and mitigations
       4.1.1.  Cross-origin secret leak
       4.1.2.  Same-origin secret leak
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Authors' Addresses

1.  Introduction

   The HTTP/2 [RFC7540] protocol encourages the use of many small assets
   for CSS/JS/HTML, due to its multiplexed nature.  Prior to HTTP/2,
   asset inlining was encouraged, resulting in fewer, larger assets per
   website.

   The HTTP/2 protocol also allows for transmitted data to be compressed
   with a lossless compression format.  The format used is specified in
   the "Content-Encoding" (see [RFC2616], section 14.11) header field.
   For example, "Content-Encoding: br" means the data was compressed
   using the Brotli format.

   The compression algorithms used with HTTP in practice, such as
   DEFLATE [RFC1951] and Brotli [RFC7932], require a certain "window" of
   data to perform backward matching.  Larger files therefore achieve a
   much better compression ratio.  To improve compression for smaller
   files, these algorithms allow a chunk of arbitrary data to be used as
   a "Custom Dictionary" that functions as the initial sliding window.

   Note: While that is no longer true for the latest stable version of
   Brotli, there's work underway to re-enable use of arbitrary
   compression dictionaries.

   Compression is a compute-heavy operation, where investing additional
   compute power results in diminishing returns (in terms of compression
   ratio/CPU cycles).
   The "Custom Dictionary" technique is known to improve compression
   ratio significantly, with little additional computational cost.  It
   is also supported by most Lempel-Ziv based compression formats.

   This document introduces a mechanism for using previously transmitted
   data over HTTP/2 as a dictionary to be used with an underlying
   compression algorithm.

1.1.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119].

2.  Preliminaries

2.1.  Security Considerations

   Compression over an encrypted connection can be exploited by
   malicious actors to leak sensitive information.  We will collaborate
   with industry experts to identify any additional attack vectors
   introduced by this draft, and include a set of best practices for
   both servers and clients that implement it.

   A list of attack vectors and potential mitigations is described later
   in this document.

2.2.  Content Coding

   A server that wishes to apply protocol level compression on a stream
   or use a stream as a dictionary SHOULD NOT apply non-identity
   content-coding (see [RFC7231], section 3.1.2.1) to that stream.

2.3.  Compression Contexts

   In the scope of this document, a compression context is a set of
   non-overlapping streams that SHALL only be used as compression
   dictionaries for streams within the same compression context.  While
   it is the responsibility of the server to implement best-practice
   techniques to mitigate cross-compression side channel attacks,
   compression contexts let the client mitigate some of the risks of
   such attacks by explicitly stating which requests can be
   cross-compressed with which requests.

   For example, a client may choose to disable compression for
   cross-site requests by assigning them to different compression
   contexts.

2.4.  Server Push Interaction

   Pushed streams may be cross-stream compressed or used as
   dictionaries, same as a regular stream.  In some scenarios it may
   benefit the server to push a dummy resource to prime a dictionary.

2.5.  HTTP/QUIC

   Due to the nature of this draft, it is expected that a strict order
   is maintained between the definition and consumption of dictionaries.
   The nature of QUIC is such that frames and streams might not be
   delivered in the order they are sent; therefore, head-of-line
   blocking may occur when implementing compression dictionaries in
   HTTP/QUIC.  This is similar to the tradeoff present in the HPACK/QUIC
   mapping.

3.  HTTP/2 Extension

3.1.  Extension Settings

   The extension introduces a new SETTINGS value.

   SETTINGS_COMPRESSION(0xTBA):  For greater compression, and to prevent
      setting identifier depletion, the 32-bit value for this setting is
      defined as follows:

   +---------------+---------+-----------+-----------+
   | SDVersion (8) | Fmt (8) | DSize (8) | NDict (8) |
   +---------------+---------+-----------+-----------+

   NDict:  Indicates the number of dictionaries the client is willing to
      maintain.  The default value is 0, the maximal value is 255.

   DSize:  Log2 of the maximal size of each dictionary.  The default
      value is 0, the maximal value is 255.  For example, a value of 17
      indicates that each dictionary MUST be no larger than 2^17
      (131,072) octets.

   Fmt:  Compression format to use, as a bitmask.  The first bit
      indicates Brotli, the second bit indicates zlib.  Other bits are
      reserved for future compression methods.  A value of 0 indicates
      no support for cross-stream compression.

   SDVersion:  If greater than 0, indicates the version of static
      dictionaries to use.
      Maximal value is 255, the default value is 0, which indicates no
      static dictionaries are used.

3.2.  Extension Frames

3.2.1.  The SET_COMPRESSION_CONTEXT frame

   The SET_COMPRESSION_CONTEXT frame (type=0xTBA).

   +-------------+
   | Context (8) |
   +-------------+

   The SET_COMPRESSION_CONTEXT frame can be sent by the client on any
   stream in the idle state.  The frame indicates the compression
   context ID for the given stream.  Frames with an assigned context
   SHALL NOT be compressed using dictionaries from a different context.
   Frames with an assigned context SHALL NOT be used as a dictionary for
   streams from a different context.

   The SET_COMPRESSION_CONTEXT frame contains the following fields:

   Context:  An 8-bit context ID that indicates the compression context
      for the stream.  If the frame is omitted, then the context value
      is assumed to be 0.  The allowed context values are 0 through 255.
      A special context ID of 255 indicates the stream can only be
      compressed using the static dictionaries.

3.2.2.  The SET_DICTIONARY Frame

   The SET_DICTIONARY frame (type=0xTBA) contains one or more
   Dictionary-Entry fields.

   +---------------+---------------+
   |   Dictionary-Entry (+)      ...
   +---------------+---------------+

   A Dictionary-Entry field is encoded as follows:

   +-------------------------------+
   |       Dictionary-ID (8)       |
   +---+---------------------------+
   | P |         Size (7+)         |
   +---+---------------------------+
   | E?| D?|     Truncate? (6+)    |
   +---+---------------------------+
   |         Offset? (8+)          |
   +-------------------------------+

   The SET_DICTIONARY frame can be sent from the server to the client,
   on any client initiated stream in the open or half-closed (remote)
   states, or on any server initiated stream in the reserved (local)
   state.  The SET_DICTIONARY frame MUST precede any DATA frames on that
   stream.
   The SET_DICTIONARY frame SHOULD be followed by sufficient DATA frames
   to build the dictionaries.  If an RST_STREAM frame is received for
   the stream before sufficient DATA was sent, the dictionaries are
   reset.

   The Dictionary-Entry contains the following fields:

   Dictionary-ID:  An 8-bit ID that identifies the dictionary.  MUST be
      lower than the value agreed by the SETTINGS_COMPRESSION setting.

   Size:  Indicates how many octets of the stream will be used for the
      dictionary.  Size is represented as an integer with 7-bit prefix
      (see [RFC7541], Section 5.1).  If P is set, the actual number of
      octets to use is 2 to the power of Size.  If the computed value is
      greater than the length of the decompressed DATA, use all the
      available DATA.

   Truncate:  An optional field, represented as an integer with 6-bit
      prefix.  Present when the APPEND flag is set.  Truncate indicates
      the number of octets to keep of the existing dictionary, before
      appending the new data to it.  If E is set, then Truncate is
      ignored, and new data is appended at the end.  If Truncate is
      zero, then the dictionary is replaced, as if APPEND was unset.  If
      the optional field D is set, then the first Truncate octets of the
      previous dictionary are used, otherwise the last Truncate octets
      are used.

   Offset:  An optional field, represented as an integer with 8-bit
      prefix.  Present when the OFFSET flag is set.  Offset indicates
      that the first Offset octets of the stream are ignored when
      building the dictionary.

   The flags defined for the SET_DICTIONARY frame apply to each
   Dictionary-Entry in the frame.  The SET_DICTIONARY frame defines the
   following flags:

   APPEND (0x1):  Indicates that the data is to be appended to the
      existing dictionary with the given ID, as opposed to replacing it
      with the new data.  Also indicates that fields E, D and Truncate
      are present.

   OFFSET (0x2):  Indicates the presence of the Offset field.
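
   As an illustration of the Dictionary-Entry layout above, the
   following Python sketch decodes a single entry.  It is not part of
   this specification.  It assumes that P, E and D occupy the most
   significant bits of their octets (the diagram suggests this but the
   text does not state it) and that prefix integers are encoded as in
   [RFC7541], Section 5.1.  The names decode_prefix_int,
   parse_dictionary_entry and effective_size are hypothetical.

```python
def decode_prefix_int(buf, pos, prefix_bits):
    """Decode an integer with an N-bit prefix (RFC 7541, Section 5.1).

    Returns a (value, next_position) tuple.
    """
    mask = (1 << prefix_bits) - 1
    value = buf[pos] & mask
    pos += 1
    if value < mask:
        # The value fits entirely in the prefix.
        return value, pos
    shift = 0
    while True:
        octet = buf[pos]
        pos += 1
        value += (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:
            # Continuation bit clear: this was the last octet.
            return value, pos


def parse_dictionary_entry(buf, pos, append=False, offset=False):
    """Parse one Dictionary-Entry starting at buf[pos].

    `append` and `offset` mirror the frame-level APPEND and OFFSET
    flags. Returns an (entry_dict, next_position) tuple.
    """
    entry = {"id": buf[pos]}
    pos += 1
    # Assumption: P is the most significant bit of the Size octet.
    entry["p"] = bool(buf[pos] & 0x80)
    entry["size"], pos = decode_prefix_int(buf, pos, 7)
    if append:
        # Assumption: E is bit 7 and D is bit 6 of the Truncate octet.
        entry["e"] = bool(buf[pos] & 0x80)
        entry["d"] = bool(buf[pos] & 0x40)
        entry["truncate"], pos = decode_prefix_int(buf, pos, 6)
    if offset:
        entry["offset"], pos = decode_prefix_int(buf, pos, 8)
    return entry, pos


def effective_size(entry):
    """Number of stream octets requested: 2^Size if P is set, else Size."""
    return 1 << entry["size"] if entry["p"] else entry["size"]
```

   For instance, under these assumptions the four octets 0x03 0x91 0x45
   0xC8, parsed with both APPEND and OFFSET set, describe dictionary 3
   with P set and Size 17 (2^17 octets), D set, Truncate 5 and
   Offset 200.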

3.2.3.  The USE_DICTIONARY Frame

   The USE_DICTIONARY frame (type=0xTBA).

   +-------------+
   | Dict ID (8) |
   +-------------+

   The USE_DICTIONARY frame indicates that the current stream is
   compressed with the indicated dictionary.  The USE_DICTIONARY frame
   MUST be sent prior to any DATA frame on a given stream.
   SET_DICTIONARY and USE_DICTIONARY frames MAY be sent on the same
   stream.  Only one USE_DICTIONARY frame MAY be sent for a stream.

   The USE_DICTIONARY frame contains the following fields:

   Dict ID:  An 8-bit ID that indicates which dictionary to use.  The
      dictionary MUST be previously defined by a SET_DICTIONARY frame,
      or by a static dictionary.

3.3.  Static Dictionaries

   This document proposes to generate a set of up to 8 standard
   dictionaries to be optionally bundled with supporting
   implementations.  Each dictionary should be 32,768 or 65,536 octets
   long.

   Each static dictionary will be identified by an integer ID in the
   range {0..7}.

   If either endpoint supports the use of static dictionaries, it will
   indicate this by setting the SDVersion value of SETTINGS_COMPRESSION
   to greater than 0.  The number will indicate the highest version of
   the dictionaries known.

   The actual version used will be the lower of the two values set by
   the endpoints.

   If the client and the server agree on the use of static dictionaries,
   then both will initialize the first 8 dictionaries (IDs 0 through 7),
   with the contents of the static dictionaries.  The static
   dictionaries belong to context 0.

   If the value of the field NDict is lower than 8, then up to NDict
   dictionaries will be initialized.

4.  Dictionary State

   Both the server and the client MUST process the SET_DICTIONARY and
   USE_DICTIONARY frames in the order they are sent/received, except
   when both are sent over the same stream.
   In that case USE_DICTIONARY is processed prior to the SET_DICTIONARY
   frames.

   Doing otherwise will result in an illegal state of the dictionaries.
   This is similar to the way HEADERS frames are processed in order to
   maintain legal HPACK state on the server and the client.

   A possible dictionary implementation can be described as follows:

   struct {
       u8 id;
       u8 ctx;
       u64 size;
       u8 dict[size];
   } D;

   The collection of dictionaries could then be described as:

   D dictionaries[NDict];

   Initially all the dictionaries are uninitialized:

   for (i = 0; i < NDict; i++) {
       dictionaries[i] = {id = i, ctx = 0, size = 0, dict = {}};
   }

   Client side USE_DICTIONARY frame behaviour pseudo code:

   dictionary = dictionaries[frame.DictID]

   // A dictionary in context 0 may be used by any stream.
   if (dictionary.ctx != 0 && dictionary.ctx != stream.ctx)
       return PROTOCOL_ERROR

   // Decompress the stream with the selected dictionary.
   stream.decompressed_data = decompress(dictionary.dict, stream.data)

   Client side SET_DICTIONARY frame behaviour pseudo code:

   foreach entry in frame.DictionaryEntries {
       dictionary = dictionaries[entry.DictionaryID]

       // A Size of zero resets the dictionary.
       if (entry.Size == 0) {
           dictionary.size = 0
           dictionary.ctx = 0
           dictionary.dict = {}
           continue
       }

       if (dictionary.ctx != 0 && dictionary.ctx != stream.ctx) {
           return PROTOCOL_ERROR
       }

       dictionary.ctx = stream.ctx

       // P selects between a literal size and a power of two.
       if (entry.P == 1) {
           size = 1 << entry.Size
       } else {
           size = entry.Size
       }

       // Number of octets of the existing dictionary to keep.
       if (frame.APPEND) {
           if (entry.E == 1) {
               truncate = dictionary.size
           } else {
               truncate = entry.Truncate
           }
       } else {
           truncate = 0
       }

       if (frame.OFFSET) {
           offset = entry.Offset
       } else {
           offset = 0
       }

       new_dict_data = stream.decompressed_data[offset:offset + size]
       if (entry.D == 1) {
           old_dict_data = head(dictionary.dict, truncate)
       } else {
           old_dict_data = tail(dictionary.dict, truncate)
       }

       dict_data = append(old_dict_data, new_dict_data)

       // Keep at most 2^DSize octets, dropping the oldest data.
       dictionary.dict = tail(dict_data, 1 << settings.DSize)
       dictionary.size = len(dictionary.dict)
   }

   The server behaviour mirrors the client behaviour, but it is up to
   the server to choose the best dictionary.

4.1.  Attack scenarios and mitigations

   A single HTTP/2 connection is likely to be shared among multiple
   origins (over which it is authoritative) and among different
   navigation contexts to the same origin.  When such sharing happens,
   and if compression contexts are shared between those instances, an
   attacker can use a BREACH-style attack [BREACH] in order to
   exfiltrate secrets from the context.  Such secrets may include:

   o  Cookies set using JavaScript (and in particular "httponly" cookies
      set from anonymous functions in external JS, which are not
      otherwise accessible to scripts)

   o  CSRF tokens

   o  CSP nonces

   o  Application level secrets (e.g. financial information, stored
      credit card numbers, codes, etc.)

   Such data theft is possible if the attacker can:

   o  Download multiple payloads that are similar to the target page
      modulo the actual secret, while trying out multiple permutations
      of the secret.

   o  Observe the on-the-wire transfer size using Resource Timing's
      "transferSize" property.

   The rest of this section will describe different scenarios where
   those conditions are met as well as potential mitigations for them.

4.1.1.  Cross-origin secret leak

   An HTTP/2 session can be used to deliver resources from multiple
   origins over which the session has proved to be authoritative,
   through connection reuse (see [RFC7540] section 9.1.1 for more
   details).  As a result, sharing compression contexts between such
   origins can theoretically be used to leak secrets from one of these
   origins to another.

4.1.1.1.  Mitigation

   Limit each compression context to use within the confines of a
   single origin.

4.1.2.  Same-origin secret leak

   Malicious pages on the origin as well as an XSS attacker can normally
   use "fetch()" or "XMLHttpRequest()" in order to inspect in-content
   secrets.  This could be limited with CSP by only permitting the
   download of specific files, using nonces or using "connect-src
   'none'" in order to prevent arbitrary scripts from downloading files
   that contain secrets.  However, sharing dictionaries between secret
   resources and malicious ones can enable an attacker to guess said
   secrets and exfiltrate them (e.g. using other deficiencies in the
   defined CSP, if there are any).

   Furthermore, such a malicious page or XSS attack can also use, as a
   dictionary, resources fetched from the same origin in a different
   browsing context, enabling it to inspect resources that cannot be
   fetched at all from its base page.

4.1.2.1.  Mitigation

   There's no obvious mitigation for this kind of attack, but a few
   options are:

   o  Limiting compression contexts to be used only within a single
      navigation context can limit the opportunity for a separate
      navigation context to inspect secrets from resources it is not
      allowed to fetch.  At the same time this can be complex to
      implement, as the network layer is not aware of the navigation
      context and is expected, for example, to deduplicate outgoing
      requests from different compression contexts.

   o  "transferSize" padding/bucketing in such cases (e.g. pages with
      the above-mentioned CSP limitations) may be enough to render this
      attack impractical.

   o  Limit dictionary sharing (or "transferSize" accuracy for resources
      that use shared dictionaries) to non-credentialed resource
      fetches only.

5.  References

5.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616,
              DOI 10.17487/RFC2616, June 1999.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC7541]  Peon, R. and H. Ruellan, "HPACK: Header Compression for
              HTTP/2", RFC 7541, DOI 10.17487/RFC7541, May 2015.

5.2.  Informative References

   [BREACH]   Prado, A., Harris, N., and Y. Gluck, "BREACH: SSL, Gone in
              30 Seconds", 2013.

   [RFC1951]  Deutsch, P., "DEFLATE Compressed Data Format Specification
              version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996.

   [RFC7932]  Alakuijala, J. and Z. Szabadka, "Brotli Compressed Data
              Format", RFC 7932, DOI 10.17487/RFC7932, July 2016.

Authors' Addresses

   Vlad Krasnov
   Cloudflare, Inc.

   Email: vlad@cloudflare.com

   Yoav Weiss
   Akamai Technologies, Inc.

   Email: yoav@yoav.ws