idnits 2.17.1

draft-yasskin-wpack-bundled-exchanges-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 2 instances of too long lines in the document, the longest one
     being 7 characters in excess of 72.

  ** The abstract seems to contain references ([2], [1]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 26, 2019) is 1664 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 1224

  -- Looks like a reference, but probably isn't: '2' on line 1226

  -- Looks like a reference, but probably isn't: '3' on line 1228

  -- Looks like a reference, but probably isn't: '4' on line 1230

  -- Looks like a reference, but probably isn't: '5' on line 1232

  == Outdated reference: A later version (-16) exists of
     draft-ietf-cbor-7049bis-07

  -- Possible downref: Normative reference to a draft: ref. 'CBORbis'

  -- Possible downref: Non-RFC (?) normative reference: ref.
     'FETCH'

  == Outdated reference: A later version (-06) exists of
     draft-ietf-httpbis-variants-05

  -- Possible downref: Normative reference to a draft: ref.
     'I-D.ietf-httpbis-variants'

  == Outdated reference: A later version (-09) exists of
     draft-yasskin-http-origin-signed-responses-06

  -- Possible downref: Non-RFC (?) normative reference: ref. 'INFRA'

  ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'URL'

  == Outdated reference: A later version (-02) exists of
     draft-yasskin-webpackage-use-cases-01

     Summary: 3 errors (**), 0 flaws (~~), 5 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                        J. Yasskin
3    Internet-Draft                                                   Google
4    Intended status: Standards Track                     September 26, 2019
5    Expires: March 29, 2020

7                            Bundled HTTP Exchanges
8                   draft-yasskin-wpack-bundled-exchanges-02

10   Abstract

12      Bundled exchanges provide a way to bundle up groups of HTTP
13      request+response pairs to transmit or store them together. They can
14      include multiple top-level resources with one identified as the
15      default by a manifest, provide random access to their component
16      exchanges, and efficiently store 8-bit resources.

18   Note to Readers

20      Discussion of this draft takes place on the wpack mailing list
21      (wpack@ietf.org), which is archived at
22      https://www.ietf.org/mailman/listinfo/wpack [1].

24      The source code and issues list for this draft can be found in
25      https://github.com/WICG/webpackage [2].

27   Status of This Memo

29      This Internet-Draft is submitted in full conformance with the
30      provisions of BCP 78 and BCP 79.

32      Internet-Drafts are working documents of the Internet Engineering
33      Task Force (IETF). Note that other groups may also distribute
34      working documents as Internet-Drafts.
        The list of current Internet-Drafts
35      is at https://datatracker.ietf.org/drafts/current/.

37      Internet-Drafts are draft documents valid for a maximum of six months
38      and may be updated, replaced, or obsoleted by other documents at any
39      time. It is inappropriate to use Internet-Drafts as reference
40      material or to cite them other than as "work in progress."

42      This Internet-Draft will expire on March 29, 2020.

44   Copyright Notice

46      Copyright (c) 2019 IETF Trust and the persons identified as the
47      document authors. All rights reserved.

49      This document is subject to BCP 78 and the IETF Trust's Legal
50      Provisions Relating to IETF Documents
51      (https://trustee.ietf.org/license-info) in effect on the date of
52      publication of this document. Please review these documents
53      carefully, as they describe your rights and restrictions with respect
54      to this document. Code Components extracted from this document must
55      include Simplified BSD License text as described in Section 4.e of
56      the Trust Legal Provisions and are provided without warranty as
57      described in the Simplified BSD License.

59   Table of Contents

61      1. Introduction
62        1.1. Terminology
63        1.2. Mode of specification
64      2. Semantics
65        2.1. Stream attributes and operations
66        2.2. Load a bundle's metadata
67          2.2.1. Load a bundle's metadata from the end
68        2.3. Load a response from a bundle
69      3. Format
70        3.1. Top-level structure
71        3.2. Serving constraints
72        3.3. Load a bundle's metadata
73          3.3.1. Parsing the index section
74          3.3.2. Parsing the manifest section
75          3.3.3. Parsing the signatures section
76          3.3.4. Parsing the critical section
77          3.3.5. The responses section
78          3.3.6. Starting from the end
79        3.4. Load a response from a bundle
80        3.5. Parsing CBOR items
81          3.5.1. Parse a known-length item
82          3.5.2. Parsing variable-length data from a bytestring
83          3.5.3. Parsing the type and argument of a CBOR item
84        3.6. Interpreting CBOR HTTP headers
85      4. Guidelines for bundle authors
86      5. Security Considerations
87        5.1. Version skew
88        5.2. Content sniffing
89      6. IANA considerations
90        6.1. Internet Media Type Registration
91        6.2. Web Bundle Section Name Registry
92      7. References
93        7.1. Normative References
94        7.2. Informative References
95        7.3. URIs
96      Appendix A. Change Log
97      Appendix B. Acknowledgements
98      Author's Address

100  1. Introduction

102     To satisfy the use cases in [I-D.yasskin-webpackage-use-cases], this
103     document proposes a new bundling format to group HTTP resources.
104     Several of the use cases require the resources to be signed: that's
105     provided by bundling signed exchanges
106     ([I-D.yasskin-http-origin-signed-responses]) rather than natively in
107     this format.

109  1.1. Terminology

111     Exchange (noun)  An HTTP request+response pair. This can either be a
112        request from a client and the matching response from a server or
113        the request in a PUSH_PROMISE and its matching response stream.
114        Defined by Section 8 of [RFC7540].

116     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
117     "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
118     "OPTIONAL" in this document are to be interpreted as described in BCP
119     14 [RFC2119] [RFC8174] when, and only when, they appear in all
120     capitals, as shown here.

122  1.2. Mode of specification

124     This specification defines how conformant bundle parsers work. It
125     does not constrain how encoders produce a bundle: although there are
126     some guidelines in Section 4, encoders MAY produce any sequence of
127     bytes that a conformant parser would parse into the intended
128     semantics.

130     This specification uses the conventions and terminology defined in
131     the Infra Standard ([INFRA]).

133  2. Semantics

135     A bundle is logically a set of HTTP exchanges, with a URL identifying
136     the manifest(s) of the bundle itself.

138     While the order of the exchanges is not semantically meaningful, it
139     can significantly affect performance when the bundle is loaded from a
140     network stream.

142     A bundle is parsed from a stream of bytes, which is assumed to have
143     the attributes and operations described in Section 2.1.

145     Bundle parsers support two operations, Load a bundle's metadata
146     (Section 2.2) and Load a response from a bundle (Section 2.3), each of
147     which can return an error instead of its normal result.

149     A client is expected to load the metadata for a bundle as soon as it
150     starts downloading it or otherwise discovers it.
        Then, when fetching
151     ([FETCH]) a request, the client is expected to match it against the
152     requests in the metadata, and if one matches, load that request's
153     response.

155  2.1. Stream attributes and operations

157     o  A sequence of *available bytes*. As the stream delivers bytes,
158        these are appended to the available bytes.

160     o  An *EOF* flag that's true if the available bytes include the
161        entire stream.

163     o  A *current offset* within the available bytes.

165     o  A *seek to offset N* operation to set the current offset to N
166        bytes past the beginning of the available bytes. A seek past the
167        end of the available bytes blocks until N bytes are available. If
168        the stream ends before enough bytes are received, either due to a
169        network error or because the stream has a finite length, the seek
170        fails.

172     o  A *read N bytes* operation, which blocks until N bytes are
173        available past the current offset, and then returns them and seeks
174        forward by N bytes. If the stream ends before enough bytes are
175        received, either due to a network error or because the stream has
176        a finite length, the read operation returns an error instead.

178  2.2. Load a bundle's metadata

180     This takes the bundle's stream and returns either an error (where an
181     error is a "format error" or a "version error"), an error with a
182     fallback URL (which is also the primaryUrl when the bundle parses
183     successfully), or a map ([INFRA]) of metadata containing at least
184     keys named:

186     primaryUrl  The URL of the main resource in the bundle. If the
187        client can't process the bundle for any reason, this is also the
188        fallback URL, a reasonable URL to try to load instead.
190     requests  A map ([INFRA]) whose keys are URLs and whose values
191        consist of either:

193        *  A single "ResponseMetadata" value for a non-content-negotiated
194           resource or

196        *  A set of content-negotiated resources represented by

198           +  A "Variants" header field value
199              ([I-D.ietf-httpbis-variants]) and

201           +  A map ([INFRA]) from each of the possible combinations of
202              one available-value for each variant-axis to a
203              "ResponseMetadata" structure. Load a response from a bundle
204              can use the "ResponseMetadata" structures to find the
205              matching response.

207     manifest  The URL of the bundle's manifest(s). This is a URL to
208        support bundles with multiple different manifests, where the
209        client uses content negotiation to select the most appropriate
210        one.

212     The map may include other items added by sections defined in the
213     Web Bundle Section Name Registry.

215     This operation only waits for a prefix of the stream that, if the
216     bundle is encoded with the "responses" section last, ends before the
217     first response.

219     This operation's implementation is in Section 3.3.

221  2.2.1. Load a bundle's metadata from the end

223     If a bundle's bytes are embedded in a longer sequence rather than
224     being streamed, a parser can also load them starting from a pointer
225     to the last byte of the bundle. This returns the same data as
226     Section 2.2.

228     This operation's implementation is in Section 3.3.6.

230  2.3. Load a response from a bundle

232     This takes the stream of bytes representing the bundle, a request
233     ([FETCH]), and the "ResponseMetadata" returned from Section 2.2 for
234     the appropriate content-negotiated resource within the request's URL,
235     and returns the response ([FETCH]) matching that request.
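
As a non-normative illustration, the metadata map described in Section 2.2 and the lookup a client performs before invoking this operation might be modeled roughly as follows. This is only a sketch: the names `ResponseMetadata` (as a Python class), `NegotiatedResource`, and `find_response_metadata`, as well as the example URLs and byte offsets, are hypothetical, and variant selection is reduced to an exact key match rather than the algorithm in [I-D.ietf-httpbis-variants].

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class ResponseMetadata:
    """Byte range of one response, measured from the start of the stream."""
    offset: int
    length: int


@dataclass
class NegotiatedResource:
    """A content-negotiated resource (hypothetical name, not from the draft)."""
    variants: str  # the "Variants" header field value
    by_variant_key: Dict[Tuple[str, ...], ResponseMetadata]


RequestsMap = Dict[str, Union[ResponseMetadata, NegotiatedResource]]


def find_response_metadata(
    requests: RequestsMap,
    url: str,
    variant_key: Optional[Tuple[str, ...]] = None,
) -> Optional[ResponseMetadata]:
    """Return the byte range to pass to "Load a response", or None."""
    entry = requests.get(url)
    if entry is None:
        return None  # the bundle has no exchange for this URL
    if isinstance(entry, ResponseMetadata):
        return entry  # non-content-negotiated resource
    if variant_key is None:
        return None  # a negotiated resource needs a selected variant
    return entry.by_variant_key.get(variant_key)


# Hypothetical metadata for a bundle holding one plain resource and one
# resource negotiated on Accept-Encoding.
metadata = {
    "primaryUrl": "https://example.com/",
    "manifest": "https://example.com/manifest.json",
    "requests": {
        "https://example.com/": ResponseMetadata(offset=200, length=1500),
        "https://example.com/app.js": NegotiatedResource(
            variants="Accept-Encoding;gzip;br",
            by_variant_key={
                ("gzip",): ResponseMetadata(offset=1700, length=900),
                ("br",): ResponseMetadata(offset=2600, length=800),
            },
        ),
    },
}
```

Keeping only (offset, length) pairs in the metadata is what lets a parser serve one response without touching the bytes of any other.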
237     This operation can be completed without inspecting bytes other than
238     those that make up the loaded response, although higher-level
239     operations like proving that an exchange is correctly signed
240     ([I-D.yasskin-http-origin-signed-responses]) may need to load other
241     responses.

243     A client will generally want to load the response for a request that
244     the client generated. For a URL with multiple variants, the client
245     SHOULD use the algorithm in Section 4 of [I-D.ietf-httpbis-variants]
246     to select the best variant.

248     This operation's implementation is in Section 3.4.

250  3. Format

252  3.1. Top-level structure

254     _This section is non-normative._

256     A bundle holds a series of named sections. The beginning of the
257     bundle maps section names to the range of bytes holding that section.
258     The most important section is the "index" (Section 3.3.1), which
259     similarly maps serialized HTTP requests to the range of bytes holding
260     that request's serialized response. Byte ranges are represented
261     using an offset from some point in the bundle _after_ the encoding of
262     the range itself, to reduce the amount of work needed to use the
263     shortest possible encoding of the range.

265     Future specifications can define new sections with extra data, and if
266     necessary, these sections can be marked "critical" (Section 3.3.4) to
267     prevent older parsers from using the rest of the bundle incorrectly.

269     The bundle is a CBOR item ([CBORbis]) with the following CDDL
270     ([CDDL]) schema:

272     webbundle = [
273       ; 🌐📦 in UTF-8.
274       magic: h'F0 9F 8C 90 F0 9F 93 A6',
275       version: bytes .size 4,
276       primary-url: whatwg-url,
277       section-lengths: bytes .cbor [* (section-name: tstr, length: uint) ],
278       sections: [* any ],
279       length: bytes .size 8, ; Big-endian number of bytes in the bundle.
280     ]

282     $section-name /= "index" / "manifest" / "signatures" / "critical" / "responses"

284     $section /= index / manifest / signatures / critical / responses

286     responses = [*response]

288     whatwg-url = tstr

290  3.2. Serving constraints

292     When served over HTTP, a response containing an
293     "application/webbundle" payload MUST include at least the following response
294     header fields, to reduce content sniffing vulnerabilities
295     (Section 5.2):

297     o  Content-Type: application/webbundle

299     o  X-Content-Type-Options: nosniff

301  3.3. Load a bundle's metadata

303     A bundle holds a series of sections, which can be accessed randomly
304     using the information in the "section-lengths" CBOR item, which holds
305     a list of alternating section names and section lengths:

307     section-lengths = [* (section-name: tstr, length: uint) ],

309     To implement Section 2.2, the parser MUST run the following steps,
310     taking the "stream" as input.

312     1.   Seek to offset 0 in "stream". Assert: this operation doesn't
313          fail.

315     2.   If reading 10 bytes from "stream" returns an error or doesn't
316          return the bytes with hex encoding "86 48 F0 9F 8C 90 F0 9F 93
317          A6" (the CBOR encoding of the 6-item array initial byte and
318          8-byte bytestring initial byte, followed by 🌐📦
319          in UTF-8), return a "format error".

321     3.   Let "version" be the result of reading 5 bytes from "stream".
322          If this is an error, return a "format error".

324     4.   Let "urlType" and "urlLength" be the result of reading the type
325          and argument of a CBOR item from "stream" (Section 3.5.3). If
326          this is an error or "urlType" is not 3 (a CBOR text string),
327          return a "format error".

329     5.   Let "fallbackUrlBytes" be the result of reading "urlLength"
330          bytes from "stream". If this is an error, return a "format
331          error".

333     6.   Let "fallbackUrl" be the result of parsing ([URL]) the UTF-8
334          decoding of "fallbackUrlBytes" with no base URL. If either the
335          UTF-8 decoding or parsing fails, return a "format error".
337          Note: From this point forward, errors also include the fallback
338          URL to help clients recover.

340     7.   If "version" does not have the hex encoding "44 31 00 00 00"
341          (the CBOR encoding of a 4-byte byte string holding an ASCII "1"
342          followed by three 0 bytes), return a "version error" with
343          "fallbackUrl".

345          Note: RFC EDITOR PLEASE DELETE THIS NOTE; Implementations of
346          drafts of this specification MUST NOT use the version "1" in
347          this byte string, and MUST instead define an
348          implementation-specific string to identify which draft is implemented. This
349          string SHOULD match the version used in the draft's MIME type
350          (Section 6.1).

352     8.   Let "sectionLengthsLength" be the result of getting the length
353          of the CBOR bytestring header from "stream" (Section 3.5.2). If
354          this is an error, return a "format error" with "fallbackUrl".

356     9.   If "sectionLengthsLength" is 8192 (8*1024) or greater, return a
357          "format error" with "fallbackUrl".

359     10.  Let "sectionLengthsBytes" be the result of reading
360          "sectionLengthsLength" bytes from "stream". If
361          "sectionLengthsBytes" is an error, return a "format error" with
362          "fallbackUrl".

364     11.  Let "sectionLengths" be the result of parsing one CBOR item
365          (Section 3.5) from "sectionLengthsBytes", matching the
366          section-lengths rule in the CDDL ([CDDL]) above. If "sectionLengths" is
367          an error, return a "format error" with "fallbackUrl".

369     12.  Let ("sectionsType", "numSections") be the result of parsing the
370          type and argument of a CBOR item from "stream" (Section 3.5.3).

372     13.  If "sectionsType" is not "4" (a CBOR array) or "numSections" is
373          not half of the length of "sectionLengths", return a "format
374          error" with "fallbackUrl".

376     14.  Let "sectionsStart" be the current offset within "stream".
378          For example, if "sectionLengthsLength" were 52 and
379          "sectionLengths" contained 4 items (2 sections), "sectionsStart"
380          would be 65 (10 initial bytes + a 2-byte bytestring header to
381          describe a 52-byte bytestring + 52 bytes of section lengths + a
382          1-byte array header for the 2 sections).

384     15.  Let "knownSections" be the subset of the Web Bundle Section Name
385          Registry (Section 6.2) that this client has implemented.

387     16.  Let "ignoredSections" be an empty set.

389     17.  Let "sectionOffsets" be an empty map ([INFRA]) from section
390          names to (offset, length) pairs. These offsets are relative to
391          the start of "stream".

393     18.  Let "currentOffset" be "sectionsStart".

395     19.  For each (""name"", "length") pair of adjacent elements in
396          "sectionLengths":

398          1.  If ""name""'s specification in "knownSections" says not to
399              process other sections, add those sections' names to
400              "ignoredSections".

402              Note: The "ignoredSections" set enables sections that supersede
403              other sections to be introduced in the future.
404              Implementations that don't implement any such sections are
405              free to omit the relevant steps.

407          2.  If "sectionOffsets["name"]" exists, return a "format error"
408              with "fallbackUrl". That is, duplicate sections are
409              forbidden.

411          3.  Set "sectionOffsets["name"]" to ("currentOffset", "length").

413          4.  Set "currentOffset" to "currentOffset + length".

415     20.  If the "responses" section is not last in "sectionLengths",
416          return a "format error" with "fallbackUrl". This allows a
417          streaming parser to assume that it'll know the requests by the
418          time their responses arrive.

420     21.  Let "metadata" be a map ([INFRA]) initially containing the
421          single key/value pair ""primaryUrl""/"fallbackUrl".

423     22.  For each ""name"" --> ("offset", "length") triple in
424          "sectionOffsets":

426          1.  If ""name"" isn't in "knownSections", continue to the next
427              triple.

429          2.  If ""name""'s Metadata field (Section 6.2) is "No", continue
430              to the next triple.

432          3.
              If ""name"" is in "ignoredSections", continue to the next
433           triple.

435          4.  Seek to offset "offset" in "stream". If this fails, return
436              a "format error" with "fallbackUrl".

438          5.  Let "sectionContents" be the result of reading "length"
439              bytes from "stream". If "sectionContents" is an error,
440              return a "format error" with "fallbackUrl".

442          6.  Follow ""name""'s specification from "knownSections" to
443              process the section, passing "sectionContents", "stream",
444              "sectionOffsets", and "metadata". If this returns an error,
445              return a "format error" with "fallbackUrl".

447     23.  Assert: "metadata" has an entry with the key "primaryUrl".

449     24.  If "metadata" doesn't have entries with keys "requests" and
450          "manifest", return a "format error" with "fallbackUrl".

452     25.  Return "metadata".

454  3.3.1. Parsing the index section

456     The "index" section defines the set of HTTP requests in the bundle
457     and identifies their locations in the "responses" section. It
458     consists of a map from URL strings to arrays consisting of a
459     "Variants" header field value ([I-D.ietf-httpbis-variants]) followed
460     by one "location-in-responses" pair for each of the possible
461     combinations of available-values within the "Variants" value in
462     lexicographic (row-major) order.

464     For example, given a "variants-value" of "Accept-Encoding;gzip;br,
465     Accept-Language;en;fr;ja", the list of "location-in-responses" pairs
466     will correspond to the "VariantKey"s:

468     o  gzip;en

470     o  gzip;fr

472     o  gzip;ja

474     o  br;en

476     o  br;fr

478     o  br;ja

480     The order of variant-axes is important.
        If the "variants-value" were
481     "Accept-Language;en;fr;ja, Accept-Encoding;gzip;br" instead, the
482     "location-in-responses" pairs would instead correspond to:

484     o  en;gzip

486     o  en;br

488     o  fr;gzip

490     o  fr;br

492     o  ja;gzip

494     o  ja;br

496     As a special case, an empty "variants-value" indicates that there is
497     only one resource at the specified URL and that no content
498     negotiation is performed.

500     index = {* whatwg-url => [ variants-value, +location-in-responses ] }
501     variants-value = bstr
502     location-in-responses = (offset: uint, length: uint)

504     A "ResponseMetadata" struct identifies a byte range within the bundle
505     stream, defined by an integer offset from the start of the stream and
506     the integer number of bytes in the range.

508     To parse the index section, given its "sectionContents", the
509     "sectionOffsets" map, and the "metadata" map to fill in, the parser
510     MUST do the following:

512     1.  Let "index" be the result of parsing "sectionContents" as a CBOR
513         item matching the "index" rule in the above CDDL (Section 3.5).
514         If "index" is an error, return an error.

516     2.  Let "requests" be an initially-empty map ([INFRA]) from URLs to
517         response descriptions, each of which is either a single
518         "location-in-stream" value or a pair of a "Variants" header field
519         value ([I-D.ietf-httpbis-variants]) and a map from that value's
520         possible "Variant-Key"s to "location-in-stream" values, as
521         described in Section 2.2.

523     3.  Let "MakeRelativeToStream" be a function that takes a
524         "location-in-responses" value ("offset", "length") and returns a
525         "ResponseMetadata" struct or error by running the following
526         sub-steps:

528         1.  If "offset" + "length" is larger than
529             "sectionOffsets["responses"].length", return an error.

531         2.  Otherwise, return a "ResponseMetadata" struct whose offset is
532             "sectionOffsets["responses"].offset" + "offset" and whose
533             length is "length".

535     4.
         For each ("url", "responses") entry in the "index" map:

537         1.  Let "parsedUrl" be the result of parsing ([URL]) "url" with
538             no base URL.

540         2.  If "parsedUrl" is a failure, its fragment is not null, or it
541             includes credentials, return an error.

543         3.  If the first element of "responses" is the empty string:

545             1.  If the length of "responses" is not 3 (i.e. there is more
546                 than one "location-in-responses" in responses), return an
547                 error.

549             2.  Otherwise, assert that "requests"["parsedUrl"] does not
550                 exist, and set "requests"["parsedUrl"] to
551                 "MakeRelativeToStream(location-in-responses)", where
552                 "location-in-responses" is the second and third elements
553                 of "responses". If that returns an error, return an
554                 error.

556         4.  Otherwise:

558             1.  Let "variants" be the result of parsing the first element
559                 of "responses" as the value of the "Variants" HTTP header
560                 field (Section 2 of [I-D.ietf-httpbis-variants]). If
561                 this fails, return an error.

563             2.  Let "variantKeys" be the Cartesian product of the lists
564                 of available-values for each variant-axis in
565                 lexicographic (row-major) order. See the examples above.

567             3.  If the length of "responses" is not "2 * len(variantKeys)
568                 + 1", return an error.

570             4.  Set "requests"["parsedUrl"] to a map from
571                 "variantKeys"["i"] to the result of calling
572                 "MakeRelativeToStream" on the "location-in-responses" at
573                 "responses"["2*i+1"] and "responses"["2*i+2"], for "i" in
574                 ["0", "len(variantKeys)"). If any "MakeRelativeToStream"
575                 call returns an error, return an error.

577     5.  Set "metadata["requests"]" to "requests".

579  3.3.2. Parsing the manifest section

581     The "manifest" section records a single URL identifying the manifest
582     of the bundle. The URL MUST refer to one or more responses
583     contained in the bundle itself.

585     The bundle can contain multiple resources at this URL, and the client
586     is expected to content-negotiate for the best one.
        For example, a
587     client might select the one with an "accept" header of
588     "application/manifest+json" ([appmanifest]) and an "accept-language" header of
589     "es-419".

591     manifest = whatwg-url

593     To parse the manifest section, given its "sectionContents" and the
594     "metadata" map to fill in, the parser MUST do the following:

596     1.  Let "urlString" be the result of parsing "sectionContents" as a
597         CBOR item matching the above "manifest" rule (Section 3.5). If
598         "urlString" is an error, return that error.

600     2.  Let "url" be the result of parsing ([URL]) "urlString" with no
601         base URL.

603     3.  If "url" is a failure, its fragment is not null, or it includes
604         credentials, return an error.

606     4.  Set "metadata["manifest"]" to "url".

608  3.3.3. Parsing the signatures section

610     The "signatures" section vouches for the resources in the bundle.

612     The section can contain as many signatures as needed, each by some
613     authority, and each covering an arbitrary subset of the resources in
614     the bundle. Intermediates, including attackers, can remove
615     signatures from the bundle without breaking the other signatures.

617     The bundle parser's client is responsible for determining the validity
618     and meaning of each authority's signatures. In particular, the
619     algorithm below does not check that signatures are valid. For
620     example, a client might:

622     o  Use the ecdsa_secp256r1_sha256 algorithm defined in Section 4.2.3
623        of [TLS1.3] to check the validity of any signature with an EC
624        public key on the secp256r1 curve.

626     o  Reject all signatures by an RSA public key.

628     o  Treat an X.509 certificate with the CanSignHttpExchanges extension
629        (Section 4.2 of [I-D.yasskin-http-origin-signed-responses]) and a
630        valid chain to a trusted root as an authority that vouches for the
631        authenticity of resources claimed to come from that certificate's
632        domains.
634     o  Treat an X.509 certificate with another extension or EKU as
635        vouching that a particular analysis has run over the signed
636        resources without finding malicious behavior.

638     A client might also choose different behavior for those kinds of
639     authorities and keys.

641     signatures = [
642       authorities: [*authority],
643       vouched-subsets: [*{
644         authority: index-in-authorities,
645         sig: bstr,
646         signed: bstr  ; Expected to hold a signed-subset item.
647       }],
648     ]
649     authority = augmented-certificate
650     index-in-authorities = uint

652     signed-subset = {
653       validity-url: whatwg-url,
654       auth-sha256: bstr,
655       date: uint,
656       expires: uint,
657       subset-hashes: {+
658         whatwg-url => [variants-value, +resource-integrity]
659       },
660       * tstr => any,
661     }
662     resource-integrity = (header-sha256: bstr, payload-integrity-header: tstr)

664     The "augmented-certificate" CDDL rule comes from Section 3.3 of
665     [I-D.yasskin-http-origin-signed-responses].

667     To parse the signatures section, given its "sectionContents", the
668     "sectionOffsets" map, and the "metadata" map to fill in, the parser
669     MUST do the following:

671     1.  Let "signatures" be the result of parsing "sectionContents" as a
672         CBOR item matching the "signatures" rule in the above CDDL
673         (Section 3.5).

675     2.  Set "metadata["authorities"]" to the list of authorities in the
676         first element of the "signatures" array.

678     3.  Set "metadata["vouched-subsets"]" to the second element of the
679         "signatures" array.

681  3.3.4. Parsing the critical section

683     The "critical" section lists sections of the bundle that the client
684     needs to understand in order to load the bundle correctly. Other
685     sections are assumed to be optional.

687     critical = [*tstr]

688     To parse the critical section, given its "sectionContents" and the
689     "metadata" map to fill in, the parser MUST do the following:

691     1.
         Let "critical" be the result of parsing "sectionContents" as a
692         CBOR item matching the above "critical" rule (Section 3.5). If
693         "critical" is an error, return that error.

695     2.  For each value "sectionName" in the "critical" list, if the
696         client has not implemented sections named "sectionName", return
697         an error.

699     This section does not modify the returned metadata.

701  3.3.5. The responses section

703     The responses section does not add any items to the bundle metadata
704     map. Instead, its offset and length are used in processing the index
705     section (Section 3.3.1).

707  3.3.6. Starting from the end

709     The length of a bundle is encoded as a big-endian integer inside a
710     CBOR byte string at the end of the bundle.

712      +------------+-----+----+----+----+----+----+----+----+----+----+
713      | first byte | ... | 48 | 00 | 00 | 00 | 00 | 00 | BC | 61 | 4E |
714      +------------+-----+----+----+----+----+----+----+----+----+----+
715                   /     \
716            0xBC614E-10=12345668 omitted bytes

718                     Figure 1: Example trailing bytes

720     Parsing from the end allows the bundle to be appended to another
721     format such as a self-extracting executable.

723     To implement Section 2.2.1, taking a sequence of bytes "bytes", the
724     client MUST:

726     1.  Let "byteStringHeader" be "bytes[bytes.length - 9]". If
727         "byteStringHeader" is not "0x48" (the CBOR ([CBORbis]) initial
728         byte for an 8-byte byte string), return an error.

730     2.  Let "bundleLength" be "[bytes[bytes.length - 8],
731         bytes[bytes.length])" (the last 8 bytes) interpreted as a
732         big-endian integer.

734     3.  If "bundleLength > bytes.length", return an error.

736     4.  Let "stream" be a new stream whose:

738         *  Available bytes are "[bytes[bytes.length - bundleLength],
739            bytes[bytes.length])".

741         *  EOF flag is set.

743         *  Current offset is initially 0.

745         *  The seek to offset N and read N bytes operations succeed
746            immediately if "currentOffset + N <= bundleLength" and fail
747            otherwise.

749     5.
Return the result of running Section 3.3 with "stream" as input. 751 3.4. Load a response from a bundle 753 The result of Load a bundle's metadata maps each URL and Variant-Key 754 ([I-D.ietf-httpbis-variants]) to a response, which consists of 755 headers and a payload. The headers can be loaded from the bundle's 756 stream before waiting for the payload, and similarly the payload can 757 be streamed to downstream consumers. 759 response = [headers: bstr .cbor headers, payload: bstr] 761 To implement Section 2.3, the parser MUST run the following steps, 762 taking the bundle's "stream", a "request" ([FETCH]), and a 763 "responseMetadata" returned by Section 2.2. 765 1. Seek to offset "responseMetadata.offset" in "stream". If this 766 fails, return an error. 768 2. Read 1 byte from "stream". If this is an error or isn't "0x82", 769 return an error. 771 3. Let "headerLength" be the result of getting the length of a CBOR 772 bytestring header from "stream" (Section 3.5.2). If 773 "headerLength" is an error, return that error. 775 4. If "headerLength" is 524288 (512*1024) or greater, return an 776 error. 778 5. Let "headerCbor" be the result of reading "headerLength" bytes 779 from "stream" and parsing a CBOR item from them matching the 780 "headers" CDDL rule. If either the read or parse returns an 781 error, return that error. 783 6. Let ("headers", "pseudos") be the result of converting 784 "headerCbor" to a header list and pseudoheaders using the 785 algorithm in Section 3.6. If this returns an error, return that 786 error. 788 7. If "pseudos" does not have a key named ':status' or its size 789 isn't 1, return an error. 791 8. If "pseudos[':status']" isn't exactly 3 ASCII decimal digits, 792 return an error. 794 9. Let "payloadLength" be the result of getting the length of a 795 CBOR bytestring header from "stream" (Section 3.5.2). If 796 "payloadLength" is an error, return that error. 798 10.
If "payloadLength" is greater than 0 and "headers" does not 799 contain a "Content-Type" header, return an error. 801 The client MUST interpret the following payload as this 802 specified media type instead of trying to sniff a media type 803 from the bytes of the payload, for example by appending an 804 artificial "X-Content-Type-Options: nosniff" header field 805 ([FETCH]) to "headers". 807 11. If "stream.currentOffset + payloadLength != 808 responseMetadata.offset + responseMetadata.length", return an 809 error. 811 12. Let "body" be a new body ([FETCH]) whose stream is a tee'd copy 812 of "stream" starting at the current offset and ending after 813 "payloadLength" bytes. 815 TODO: Add the rest of the details of creating a "ReadableStream" 816 object. 818 13. Let "response" be a new response ([FETCH]) whose: 820 * Url list is "request"'s url list, 822 * status is "pseudos[':status']", 824 * header list is "headers", and 826 * body is "body". 828 14. Return "response". 830 3.5. Parsing CBOR items 832 Parsing a bundle involves parsing many CBOR items. All of these 833 items need to be deterministically encoded. 835 3.5.1. Parse a known-length item 837 To parse a CBOR item ([CBORbis]), optionally matching a CDDL rule 838 ([CDDL]), from a sequence of bytes, "bytes", the parser MUST do the 839 following: 841 1. If "bytes" are not a well-formed CBOR item, return an error. 843 2. If "bytes" does not satisfy the core deterministic encoding 844 requirements from Section 4.2.1 of [CBORbis], return an error. 845 This format does not use floating point values or tags, so this 846 specification does not add any deterministic encoding rules for 847 them. 849 3. If "bytes" includes extra bytes after the encoding of a CBOR 850 item, return an error. 852 4. Let "item" be the result of decoding "bytes" as a CBOR item. 854 5. If a CDDL rule was specified, but "item" does not match it, 855 return an error. 857 6. Return "item". 859 3.5.2. 
Parsing variable-length data from a bytestring 861 Bundles encode variable-length data in CBOR bytestrings, which are 862 prefixed with their length. This algorithm returns the number of 863 bytes in the variable-length item and sets the stream's current 864 offset to the first byte of the contents. 866 To get the length of a CBOR bytestring header from a bundle's stream, 867 the parser MUST do the following: 869 1. Let ("type", "argument") be the result of parsing the type and 870 argument of a CBOR item from the stream (Section 3.5.3). If this 871 returns an error, return that error. 873 2. If "type" is not "2", the item is not a bytestring. Return an 874 error. 876 3. Return "argument". 878 3.5.3. Parsing the type and argument of a CBOR item 880 To parse the type and argument of a CBOR item from a bundle's stream, 881 the parser MUST do the following. This algorithm returns a pair of 882 the CBOR major type 0-7 inclusive, and a 64-bit integral argument for 883 the CBOR item: 885 1. Let "firstByte" be the result of reading 1 byte from the stream. 886 If "firstByte" is an error, return that error. 888 2. Let "type" be "(firstByte & 0xE0) / 0x20". 890 3. If "firstByte & 0x1F" is: 892 0..23, inclusive Return ("type", "firstByte & 0x1F"). 894 24 Let "content" be the result of reading 1 byte from the stream. 895 If "content" is an error or is less than 24, return an error. 897 25 Let "content" be the result of reading 2 bytes from the 898 stream. If "content" is an error or its first byte is 0, 899 return an error. 901 26 Let "content" be the result of reading 4 bytes from the 902 stream. If "content" is an error or its first two bytes are 903 0, return an error. 905 27 Let "content" be the result of reading 8 bytes from the 906 stream. If "content" is an error or its first four bytes are 907 0, return an error. 909 28..31, inclusive Return an error. Note: This intentionally 910 does not support indefinite-length items. 912 4. 
Let "argument" be the big-endian integer encoded in "content". 914 5. Return ("type", "argument"). 916 3.6. Interpreting CBOR HTTP headers 918 Bundles represent HTTP requests and responses as a list of headers, 919 matching the following CDDL ([CDDL]): 921 headers = {* bstr => bstr} 923 Pseudo-headers starting with a ":" provide the non-header information 924 needed to create a request or response as appropriate. 925 To convert a CBOR item "item" into a [FETCH] header list and 926 pseudoheaders, parsers MUST do the following: 928 1. If "item" doesn't match the "headers" rule in the above CDDL, 929 return an error. 931 2. Let "headers" be a new header list ([FETCH]). 933 3. Let "pseudos" be an empty map ([INFRA]). 935 4. For each pair ("name", "value") in "item": 937 1. If "name" contains any upper-case or non-ASCII characters, 938 return an error. This matches the requirement in 939 Section 8.1.2 of [RFC7540]. 941 2. If "name" starts with a ':': 943 1. Assert: "pseudos[name]" does not exist, because CBOR maps 944 cannot contain duplicate keys. 946 2. Set "pseudos[name]" to "value". 948 3. Continue. 950 3. If "name" or "value" doesn't satisfy the requirements for a 951 header in [FETCH], return an error. 953 4. Assert: "headers" does not contain ([FETCH]) "name", because 954 CBOR maps cannot contain duplicate keys and an earlier step 955 rejected upper-case bytes. 957 Note: This means that a response cannot set more than one 958 cookie, because the "Set-Cookie" header ([RFC6265]) has to 959 appear multiple times to set multiple cookies. 961 5. Append ("name", "value") to "headers". 963 5. Return ("headers", "pseudos"). 965 4. Guidelines for bundle authors 967 Bundles SHOULD consist of a single CBOR item satisfying the core 968 deterministic encoding requirements (Section 3.5) and matching the 969 "webbundle" CDDL rule in Section 3.1. 971 5. Security Considerations 973 5.1.
Version skew 975 Bundles currently have no mechanism for ensuring that the signed 976 exchanges they contain constitute a consistent version of those 977 resources. Even if a website never has a security vulnerability when 978 resources are fetched at a single time, an attacker might be able to 979 combine a set of resources pulled from different versions of the 980 website to build a vulnerable site. While the vulnerable site could 981 have occurred by chance on a client's machine due to normal HTTP 982 caching, bundling allows an attacker to guarantee that it happens. 983 Future work in this specification might allow a bundle to constrain 984 its resources to come from a consistent version. 986 5.2. Content sniffing 988 While modern browsers tend to trust the "Content-Type" header sent 989 with a resource, especially when accompanied by "X-Content-Type- 990 Options: nosniff", plugins will sometimes search for executable 991 content buried inside a resource and execute it in the context of the 992 origin that served the resource, leading to XSS vulnerabilities. For 993 example, some PDF reader plugins look for "%PDF" anywhere in the 994 first 1kB and execute the code that follows it. 996 The "application/webbundle" format defined above includes URLs and 997 request headers early in the format, which an attacker could use to 998 cause these plugins to sniff a bad content type. 1000 To avoid vulnerabilities, in addition to the response header 1001 requirements in Section 3.2, servers are advised to only serve an 1002 "application/webbundle" resource from a domain if it would also be 1003 safe for that domain to serve the bundle's content directly, and to 1004 follow at least one of the following strategies: 1006 1. Only serve bundles from dedicated domains that don't have access 1007 to sensitive cookies or user storage. 1009 2. 
Generate bundles "offline", that is, in response to a trusted 1010 author submitting content or existing signatures reaching a 1011 certain age, rather than in response to untrusted-reader queries. 1013 3. Do all of: 1015 1. If the bundle's contained URLs (e.g. in the manifest and 1016 index) are derived from the request for the bundle, percent- 1017 encode [3] ([URL]) any bytes that are greater than 0x7E or 1018 are not URL code points [4] ([URL]) in these URLs. It is 1019 particularly important to make sure no unescaped nulls (0x00) 1020 or angle brackets (0x3C and 0x3E) appear. 1022 2. Similarly, if the request headers for any contained resource 1023 are based on the headers sent while requesting the bundle, 1024 only include request header field names *and values* that 1025 appear in a static allowlist. Keep the set of allowed 1026 request header fields smaller than 24 elements to prevent 1027 attackers from controlling a whole CBOR length byte. 1029 3. Restrict the number of items a request can direct the server 1030 to include in a bundle to less than 12, again to prevent 1031 attackers from controlling a whole CBOR length byte. 1033 4. Do not reflect request header fields into the set of response 1034 headers. 1036 If the server serves responses that are written by a potential 1037 attacker but then escaped, the "application/webbundle" format allows 1038 the attacker to use the length of the response to control a few bytes 1039 before the start of the response. Any existing mechanisms that 1040 prevent polyglot documents probably keep working in the face of this 1041 new attack, but we don't have a guarantee of that. 1043 To encourage servers to include the "X-Content-Type-Options: nosniff" 1044 header field, clients SHOULD reject bundles served without it. 1046 6. IANA considerations 1048 6.1. 
Internet Media Type Registration 1050 IANA is requested to register the MIME media type 1051 ([IANA.media-types]) for bundled exchanges, application/webbundle, as 1052 follows: 1054 o Type name: application 1056 o Subtype name: webbundle 1058 o Required parameters: 1060 * v: A string denoting the version of the file format. 1061 ([RFC5234] ABNF: "version = 1*(DIGIT/%x61-7A)") The version 1062 defined in this specification is "1". 1064 Note: RFC EDITOR PLEASE DELETE THIS NOTE; Implementations of 1065 drafts of this specification MUST NOT use simple integers to 1066 describe their versions, and MUST instead define 1067 implementation-specific strings to identify which draft is 1068 implemented. 1070 o Optional parameters: N/A 1072 o Encoding considerations: binary 1074 o Security considerations: See Section 5 of this document. 1076 o Interoperability considerations: N/A 1078 o Published specification: This document 1080 o Applications that use this media type: None yet, but it is 1081 expected that web browsers will use this format. 1083 o Fragment identifier considerations: N/A 1085 o Additional information: 1087 * Deprecated alias names for this type: N/A 1089 * Magic number(s): 86 48 F0 9F 8C 90 F0 9F 93 A6 1091 * File extension(s): .wbn 1093 * Macintosh file type code(s): N/A 1095 o Person & email address to contact for further information: See the 1096 Author's Address section of this specification. 1098 o Intended usage: COMMON 1100 o Restrictions on usage: N/A 1102 o Author: See the Author's Address section of this specification. 1104 o Change controller: The IESG iesg@ietf.org [5] 1106 o Provisional registration? Yes. 1108 6.2. 
Web Bundle Section Name Registry 1110 IANA is directed to create a new registry with the following 1111 attributes: 1113 Name: Web Bundle Section Names 1114 Review Process: Specification Required 1116 Initial Assignments: 1118 +--------------+---------------+----------+-------------------------+ 1119 | Section Name | Specification | Metadata | Metadata Fields | 1120 +--------------+---------------+----------+-------------------------+ 1121 | "index" | Section 3.3.1 | Yes | "requests" | 1122 | | | | | 1123 | "manifest" | Section 3.3.2 | Yes | "manifest" | 1124 | | | | | 1125 | "signatures" | Section 3.3.3 | Yes | "authorities", | 1126 | | | | "vouched-subsets" | 1127 | | | | | 1128 | "critical" | Section 3.3.4 | Yes | | 1129 | | | | | 1130 | "responses" | Section 3.3.5 | No | | 1131 +--------------+---------------+----------+-------------------------+ 1133 Requirements on new assignments: 1135 Section Names MUST be encoded in UTF-8. 1137 Assignments must specify whether the section is parsed during 1138 Load a bundle's metadata (Metadata=Yes) or not (Metadata=No). 1140 The section's specification can use the bytes making up the section, 1141 the bundle's stream (Section 2.1), and the "sectionOffsets" map 1142 (Section 3.3), as input, and MUST say if an error is returned, and 1143 otherwise what items, if any, are added to the map that Section 3.3 1144 returns. A section's specification MAY say that, if it is present, 1145 another section is not processed. 1147 7. References 1149 7.1. Normative References 1151 [appmanifest] 1152 Caceres, M., Christiansen, K., Lamouri, M., Kostiainen, 1153 A., Dolin, R., and M. Giuca, "Web App Manifest", World 1154 Wide Web Consortium WD WD-appmanifest-20180523, May 2018, 1155 . 1157 [CBORbis] Bormann, C. and P. Hoffman, "Concise Binary Object 1158 Representation (CBOR)", draft-ietf-cbor-7049bis-07 (work 1159 in progress), August 2019. 1161 [CDDL] Birkholz, H., Vigano, C., and C. 
Bormann, "Concise Data 1162 Definition Language (CDDL): A Notational Convention to 1163 Express Concise Binary Object Representation (CBOR) and 1164 JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 1165 June 2019, . 1167 [FETCH] WHATWG, "Fetch", September 2019, 1168 . 1170 [I-D.ietf-httpbis-variants] 1171 Nottingham, M., "HTTP Representation Variants", draft- 1172 ietf-httpbis-variants-05 (work in progress), March 2019. 1174 [I-D.yasskin-http-origin-signed-responses] 1175 Yasskin, J., "Signed HTTP Exchanges", draft-yasskin-http- 1176 origin-signed-responses-06 (work in progress), July 2019. 1178 [IANA.media-types] 1179 IANA, "Media Types", 1180 . 1182 [INFRA] WHATWG, "Infra", September 2019, 1183 . 1185 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1186 Requirement Levels", BCP 14, RFC 2119, 1187 DOI 10.17487/RFC2119, March 1997, 1188 . 1190 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 1191 Specifications: ABNF", STD 68, RFC 5234, 1192 DOI 10.17487/RFC5234, January 2008, 1193 . 1195 [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext 1196 Transfer Protocol Version 2 (HTTP/2)", RFC 7540, 1197 DOI 10.17487/RFC7540, May 2015, 1198 . 1200 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1201 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1202 May 2017, . 1204 [URL] WHATWG, "URL", September 2019, 1205 . 1207 7.2. Informative References 1209 [I-D.yasskin-webpackage-use-cases] 1210 Yasskin, J., "Use Cases and Requirements for Web 1211 Packages", draft-yasskin-webpackage-use-cases-01 (work in 1212 progress), March 2018. 1214 [RFC6265] Barth, A., "HTTP State Management Mechanism", RFC 6265, 1215 DOI 10.17487/RFC6265, April 2011, 1216 . 1218 [TLS1.3] Rescorla, E., "The Transport Layer Security (TLS) Protocol 1219 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 1220 . 1222 7.3. 
URIs 1224 [1] https://www.ietf.org/mailman/listinfo/wpack 1226 [2] https://github.com/WICG/webpackage 1228 [3] https://url.spec.whatwg.org/#percent-encode 1230 [4] https://url.spec.whatwg.org/#url-code-points 1232 [5] mailto:iesg@ietf.org 1234 Appendix A. Change Log 1236 RFC EDITOR PLEASE DELETE THIS SECTION. 1238 draft-02 1240 o Fix the initial bytes of the format. 1242 o Allow empty responses to omit their content type. 1244 o Provisionally register application/webbundle. 1246 draft-01 1248 o Include only section lengths in the section index, requiring 1249 sections to be listed in order. 1251 o Have the "index" section map URLs to sets of responses negotiated 1252 using the Variants system ([I-D.ietf-httpbis-variants]). 1254 o Require the "manifest" to be embedded into the bundle. 1256 o Add a content sniffing security consideration. 1258 o Add a version string to the format and its mime type. 1260 o Add a fallback URL in a fixed location in the format, and use that 1261 fallback URL as the primary URL of the bundle. 1263 o Add a "signatures" section to let authorities (like domain-trusted 1264 X.509 certificates) vouch for subsets of a bundle. 1266 o Use the CBORbis "deterministic encoding" requirements instead of 1267 "canonicalization" requirements. 1269 Appendix B. Acknowledgements 1271 Thanks to the Chrome loading team, especially Kinuko Yasuda and 1272 Kouhei Ueno for making the format work well when streamed. 1274 Author's Address 1276 Jeffrey Yasskin 1277 Google 1279 Email: jyasskin@chromium.org
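Illustrative sketch (non-normative)

The byte-level algorithms in Sections 3.5.3, 3.5.2 and 3.3.6 can be sketched in Python roughly as follows. This is a minimal reading of those steps, not a reference implementation. The names "ParseError", "parse_type_and_argument", "bytestring_length" and "bundle_bytes_from_end" are hypothetical and do not appear in the specification, and the stream is assumed to be any object with a blocking "read(n)" method, such as "io.BytesIO".

```python
import io


class ParseError(Exception):
    """Stands in for "return an error" in the algorithms (hypothetical name)."""


def parse_type_and_argument(stream):
    """Sketch of Section 3.5.3: return (major type, argument)."""
    first = stream.read(1)
    if len(first) != 1:
        raise ParseError("could not read the initial byte")
    major_type = (first[0] & 0xE0) >> 5      # same as dividing by 0x20
    info = first[0] & 0x1F
    if info <= 23:
        return major_type, info              # argument is stored inline
    if info >= 28:
        raise ParseError("reserved or indefinite-length encoding")
    size = 1 << (info - 24)                  # 24, 25, 26, 27 -> 1, 2, 4, 8 bytes
    content = stream.read(size)
    if len(content) != size:
        raise ParseError("truncated argument")
    # Reject arguments that would have fit in a shorter encoding.
    if size == 1:
        if content[0] < 24:
            raise ParseError("argument not in shortest form")
    elif not any(content[: size // 2]):      # leading half is all zero
        raise ParseError("argument not in shortest form")
    return major_type, int.from_bytes(content, "big")


def bytestring_length(stream):
    """Sketch of Section 3.5.2: length of the bytestring that follows."""
    major_type, argument = parse_type_and_argument(stream)
    if major_type != 2:
        raise ParseError("item is not a bytestring")
    return argument


def bundle_bytes_from_end(data):
    """Sketch of Section 3.3.6, steps 1-4: slice the bundle out of data."""
    if len(data) < 9 or data[-9] != 0x48:
        raise ParseError("missing 8-byte bytestring header")
    bundle_length = int.from_bytes(data[-8:], "big")
    if bundle_length > len(data):
        raise ParseError("declared length exceeds available bytes")
    return data[len(data) - bundle_length:]


# Usage: a 14-byte stand-in bundle appended to unrelated leading bytes.
body = b"\xAA" * 5 + b"\x48" + (14).to_bytes(8, "big")
recovered = bundle_bytes_from_end(b"MZ-stub" + body)
```

Raising an exception is only one way to model the error returns; an implementation could equally return a tagged result. The sketch reads eagerly and does not attempt the streaming behavior that Section 3.4 relies on.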