HTTP Working Group                                         M. Nottingham
Internet-Draft                                                    Fastly
Intended status: Standards Track                               P-H. Kamp
Expires: May 31, 2018                          The Varnish Cache Project
                                                       November 27, 2017


                      Structured Headers for HTTP
                 draft-ietf-httpbis-header-structure-02

Abstract

   This document describes Structured Headers, a way of simplifying HTTP
   header field definition and parsing. It is intended for use by new
   specifications of HTTP header fields. This includes revisions of
   existing specifications when doing so does not cause interoperability
   issues.

Note to Readers

   Discussion of this draft takes place on the HTTP working group
   mailing list (ietf-http-wg@w3.org), which is archived at
   https://lists.w3.org/Archives/Public/ietf-http-wg/ [1].

   _RFC EDITOR: please remove this section before publication_

   Working Group information can be found at https://httpwg.github.io/
   [2]; source code and issues list for this draft can be found at
   https://github.com/httpwg/http-extensions/labels/header-structure
   [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 31, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Introduction

   Specifying the syntax of new HTTP header fields is an onerous task;
   even with the guidance in [RFC7231], Section 8.3.1, there are many
   decisions - and pitfalls - for a prospective HTTP header field
   author.

   Likewise, bespoke parsers often need to be written for specific HTTP
   headers, because each has slightly different handling of what looks
   like common syntax.

   This document introduces structured HTTP header field values
   (hereafter, Structured Headers) to address these problems.
   Structured Headers define a generic, abstract model for data, along
   with a concrete serialisation for expressing that model in textual
   HTTP headers, as used by HTTP/1 [RFC7230] and HTTP/2 [RFC7540].

   HTTP headers that are defined as Structured Headers use the types
   defined in this specification to define their syntax and basic
   handling rules, thereby simplifying both their definition and
   parsing.

   Additionally, future versions of HTTP can define alternative
   serialisations of the abstract model of Structured Headers, allowing
   headers that use it to be transmitted more efficiently without being
   redefined.

   Note that it is not a goal of this document to redefine the syntax of
   existing HTTP headers; the mechanisms described herein are only
   intended to be used with headers that explicitly opt into them.

   To specify a header field that uses Structured Headers, see
   Section 2.

   Section 4 defines a number of abstract data types that can be used in
   Structured Headers, of which only three are allowed at the "top"
   level: lists, dictionaries, or items.

   Those abstract types can be serialised into textual headers - such as
   those used in HTTP/1 and HTTP/2 - using the algorithms described in
   Section 3.

1.1. Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC5234], including the DIGIT, ALPHA and DQUOTE rules from that
   document. It also includes the OWS rule from [RFC7230].

2. Specifying Structured Headers

   HTTP headers that use Structured Headers need to be defined to do so
   explicitly; recipients and generators need to know that the
   requirements of this document are in effect. The simplest way to do
   that is by referencing this document in its definition.

   The field's definition will also need to specify the field-value's
   allowed syntax, in terms of the types described in Section 4, along
   with their associated semantics.

   Field definitions MUST NOT relax or otherwise modify the requirements
   of this specification; doing so would preclude handling by generic
   software.

   However, field definitions are encouraged to clearly state additional
   constraints upon the syntax, as well as the consequences when those
   constraints are violated.

   For example:

   # FooExample Header

   The FooExample HTTP header field conveys a list of numbers about how
   much Foo the sender has.

   FooExample is a Structured header [RFCxxxx]. Its value MUST be a
   dictionary ([RFCxxxx], Section Y.Y).

   The dictionary MUST contain:

   * A member whose key is "foo", and whose value is an integer
     ([RFCxxxx], Section Y.Y), indicating the number of foos in
     the message.
   * A member whose key is "bar", and whose value is a string
     ([RFCxxxx], Section Y.Y), conveying the characteristic bar-ness
     of the message.

   If the parsed header field does not contain both, it MUST be ignored.

   Note that empty header field values are not allowed by the syntax,
   and therefore will be considered errors.

3. Parsing Requirements for Textual Headers

   When a receiving implementation parses textual HTTP header fields
   (e.g., in HTTP/1 or HTTP/2) that are known to be Structured Headers,
   it is important that care be taken, as there are a number of edge
   cases that can cause interoperability or even security problems.
   This section specifies the algorithm for doing so.

   Given an ASCII string input_string that represents the chosen
   header's field-value, return the parsed header value. Note that
   input_string may incorporate multiple header lines combined into one
   comma-separated field-value, as per [RFC7230], Section 3.2.2.

   1.  Discard any OWS from the beginning of input_string.

   2.  If the field-value is defined to be a dictionary, return the
       result of Parsing a Dictionary from Textual Headers
       (Section 4.7).

   3.  If the field-value is defined to be a list, return the result of
       Parsing a List from Textual Headers (Section 4.8).

   4.  If the field-value is defined to be a parameterised label, return
       the result of Parsing a Parameterised Label from Textual Headers
       (Section 4.4).

   5.  Otherwise, return the result of Parsing an Item from Textual
       Headers (Section 4.6).

   Note that in the case of lists and dictionaries, this has the effect
   of combining multiple instances of the header field into one.
   However, for singular items and parameterised labels, it has the
   effect of selecting the first value and ignoring any subsequent
   instances of the field, as well as extraneous text afterwards.

   Additionally, note that the effect of the parsing algorithms as
   specified is generally intolerant of syntax errors; if one is
   encountered, the typical response is to throw an error, thereby
   discarding the entire header field value. This includes any
   non-ASCII characters in input_string.

4. Structured Header Data Types

   This section defines the abstract value types that can be composed
   into Structured Headers, along with the textual HTTP serialisations
   of them.

4.1. Numbers

   Abstractly, numbers are integers with an optional fractional part.
   They have a maximum of fifteen digits available to be used in one or
   both of the parts, as reflected in the ABNF below; this allows them
   to be stored as IEEE 754 double precision numbers (binary64)
   ([IEEE754]).

   The textual HTTP serialisation of numbers allows a maximum of fifteen
   digits between the integer and fractional part, along with an
   optional "-" indicating negative numbers.

   number   = ["-"] (         "." 1*15DIGIT /
                        DIGIT "." 1*14DIGIT /
                       2DIGIT "." 1*13DIGIT /
                       3DIGIT "." 1*12DIGIT /
                       4DIGIT "." 1*11DIGIT /
                       5DIGIT "." 1*10DIGIT /
                       6DIGIT "." 1*9DIGIT /
                       7DIGIT "." 1*8DIGIT /
                       8DIGIT "." 1*7DIGIT /
                       9DIGIT "." 1*6DIGIT /
                      10DIGIT "." 1*5DIGIT /
                      11DIGIT "." 1*4DIGIT /
                      12DIGIT "." 1*3DIGIT /
                      13DIGIT "." 1*2DIGIT /
                      14DIGIT "." 1DIGIT /
                      15DIGIT )

   integer  = ["-"] 1*15DIGIT
   unsigned = 1*15DIGIT

   integer and unsigned are defined as conveniences to specification
   authors; if their use is specified and their ABNF is not matched, a
   parser MUST consider it to be invalid.

   For example, a header whose value is defined as a number could look
   like:

   ExampleNumberHeader: 4.5

4.1.1. Parsing Numbers from Textual Headers

   TBD

4.2. Strings

   Abstractly, strings are ASCII strings [RFC0020], excluding control
   characters (i.e., the range 0x20 to 0x7E). Note that this excludes
   tabs, newlines and carriage returns. They may be at most 1024
   characters long.

   The textual HTTP serialisation of strings uses a backslash ("\") to
   escape double quotes and backslashes in strings.
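
   As a rough illustration of the escaping rule above, the following
   Python sketch serialises an abstract string into its quoted textual
   form. The function name serialise_string is hypothetical and is not
   defined by this document; the length and character-range checks
   follow the limits stated in this section and are one possible way to
   enforce them, not a required implementation.

```python
def serialise_string(value: str) -> str:
    """Serialise an abstract string into its quoted textual form.

    Hypothetical helper for illustration; not defined by this document.
    """
    # The string ABNF in this section allows between 1 and 1024 characters.
    if not 1 <= len(value) <= 1024:
        raise ValueError("string must contain 1 to 1024 characters")
    pieces = []
    for ch in value:
        # Only 0x20 to 0x7E is allowed: no tabs, newlines or non-ASCII.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("character outside the range 0x20-0x7E")
        # DQUOTE and backslash are the only characters that get escaped.
        if ch in ('"', "\\"):
            pieces.append("\\")
        pieces.append(ch)
    return '"' + "".join(pieces) + '"'


print(serialise_string("hello world"))
print(serialise_string('say "hi"'))
```

   Rejecting out-of-range characters in the serialiser, rather than
   silently dropping them, mirrors the strictness of the parsing
   algorithms in this document.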

   string    = DQUOTE 1*1024(char) DQUOTE
   char      = unescaped / escape ( DQUOTE / "\" )
   unescaped = %x20-21 / %x23-5B / %x5D-7E
   escape    = "\"

   For example, a header whose value is defined as a string could look
   like:

   ExampleStringHeader: "hello world"

   Note that strings only use DQUOTE as a delimiter; single quotes do
   not delimit strings. Furthermore, only DQUOTE and "\" can be escaped;
   other sequences MUST generate an error.

   Unicode is not directly supported in Structured Headers, because it
   causes a number of interoperability issues, and - with few exceptions
   - header values do not require it.

   When it is necessary for a field value to convey non-ASCII string
   content, binary content (Section 4.5) SHOULD be specified, along with
   a character encoding (most likely, UTF-8).

4.2.1. Parsing a String from Textual Headers

   Given an ASCII string input_string, return an unquoted string.
   input_string is modified to remove the parsed value.

   1.  Let output_string be an empty string.

   2.  If the first character of input_string is not DQUOTE, throw an
       error.

   3.  Discard the first character of input_string.

   4.  If input_string contains more than 1025 characters, throw an
       error.

   5.  While input_string is not empty:

       1.  Let char be the result of removing the first character of
           input_string.

       2.  If char is a backslash ("\"):

           1.  If input_string is now empty, throw an error.

           2.  Else:

               1.  Let next_char be the result of removing the first
                   character of input_string.

               2.  If next_char is not DQUOTE or "\", throw an error.

               3.  Append next_char to output_string.

       3.  Else, if char is DQUOTE, return output_string.

       4.  Else, append char to output_string.

   6.  Otherwise, throw an error.

4.3. Labels

   Labels are short (up to 256 characters) textual identifiers; their
   abstract model is identical to their expression in the textual HTTP
   serialisation.

   label   = lcalpha *255( lcalpha / DIGIT / "_" / "-" / "*" / "/" )
   lcalpha = %x61-7A ; a-z

   Note that labels can only contain lowercase letters.

   For example, a header whose value is defined as a label could look
   like:

   ExampleLabelHeader: foo/bar

4.3.1. Parsing a Label from Textual Headers

   Given an ASCII string input_string, return a label. input_string is
   modified to remove the parsed value.

   1.  If input_string contains more than 256 characters, throw an
       error.

   2.  If the first character of input_string is not lcalpha, throw an
       error.

   3.  Let output_string be an empty string.

   4.  While input_string is not empty:

       1.  Let char be the result of removing the first character of
           input_string.

       2.  If char is not one of lcalpha, DIGIT, "_", "-", "*" or "/":

           1.  Prepend char to input_string.

           2.  Return output_string.

       3.  Append char to output_string.

   5.  Return output_string.

4.4. Parameterised Labels

   Parameterised Labels are labels (Section 4.3) with up to 256
   parameters; each parameter has a label and an optional value that is
   an item (Section 4.6). Ordering between parameters is not
   significant, and duplicate parameters MUST be considered an error.

   The textual HTTP serialisation uses semicolons (";") to delimit the
   parameters from each other, and equals ("=") to delimit the parameter
   name from its value.

   parameterised = label *256( OWS ";" OWS label [ "=" item ] )

   For example,

   ExampleParamHeader: abc; a=1; b=2; c

4.4.1. Parsing a Parameterised Label from Textual Headers

   Given an ASCII string input_string, return a label with a mapping of
   parameters. input_string is modified to remove the parsed value.

   1.  Let primary_label be the result of Parsing a Label from Textual
       Headers (Section 4.3) from input_string.

   2.  Let parameters be an empty mapping.

   3.  In a loop:

       1.   Consume any OWS from the beginning of input_string.

       2.   If the first character of input_string is not ";", exit the
            loop.

       3.   Consume a ";" character from the beginning of input_string.

       4.   Consume any OWS from the beginning of input_string.

       5.   Let param_name be the result of Parsing a Label from Textual
            Headers (Section 4.3) from input_string.

       6.   If param_name is already present in parameters, throw an
            error.

       7.   Let param_value be a null value.

       8.   If the first character of input_string is "=":

            1.  Consume the "=" character at the beginning of
                input_string.

            2.  Let param_value be the result of Parsing an Item from
                Textual Headers (Section 4.6) from input_string.

       9.   If parameters has more than 255 members, throw an error.

       10.  Add param_name to parameters with the value param_value.

   4.  Return the tuple (primary_label, parameters).

4.5. Binary Content

   Arbitrary binary content up to 16K in size can be conveyed in
   Structured Headers.

   The textual HTTP serialisation indicates their presence by a leading
   "*", with the data encoded using Base 64 Encoding [RFC4648], without
   padding (as "=" might be confused with the use of dictionaries).

   binary = "*" 1*21846(base64)
   base64 = ALPHA / DIGIT / "+" / "/"

   For example, a header whose value is defined as binary content could
   look like:

   ExampleBinaryHeader: *cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg

4.5.1. Parsing Binary Content from Textual Headers

   Given an ASCII string input_string, return binary content.
   input_string is modified to remove the parsed value.

   1.  If the first character of input_string is not "*", throw an
       error.

   2.  Discard the first character of input_string.

   3.  Let b64_content be the result of removing content of input_string
       up to but not including the first character that is not in ALPHA,
       DIGIT, "+" or "/".

   4.  Let binary_content be the result of Base 64 Decoding [RFC4648]
       b64_content, synthesising padding if necessary. If an error is
       encountered, throw it.

   5.  Return binary_content.

4.6. Items

   An item can be a number (Section 4.1), string (Section 4.2), label
   (Section 4.3) or binary content (Section 4.5).

   item = number / string / label / binary

4.6.1. Parsing an Item from Textual Headers

   Given an ASCII string input_string, return an item. input_string is
   modified to remove the parsed value.

   1.  Discard any OWS from the beginning of input_string.

   2.  If the first character of input_string is a "-" or a DIGIT,
       process input_string as a number (Section 4.1) and return the
       result, throwing any errors encountered.

   3.  If the first character of input_string is a DQUOTE, process
       input_string as a string (Section 4.2) and return the result,
       throwing any errors encountered.

   4.  If the first character of input_string is "*", process
       input_string as binary content (Section 4.5) and return the
       result, throwing any errors encountered.

   5.  If the first character of input_string is an lcalpha, process
       input_string as a label (Section 4.3) and return the result,
       throwing any errors encountered.

   6.  Otherwise, throw an error.

4.7. Dictionaries

   Dictionaries are unordered maps of key-value pairs, where the keys
   are labels (Section 4.3) and the values are items (Section 4.6).
   There can be between 1 and 1024 members, and keys are required to be
   unique.

   In the textual HTTP serialisation, keys and values are separated by
   "=" (without whitespace), and key/value pairs are separated by a
   comma with optional whitespace.
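
   The serialisation just described can be sketched in Python as
   follows. This is an illustration only: serialise_dictionary and
   LABEL_RE are hypothetical names, the regular expression is one
   reading of the label syntax in Section 4.3, and each value is assumed
   to have already been serialised as an item (Section 4.6) by other
   code.

```python
import re
from typing import Mapping

# Assumed rendering of the label syntax from Section 4.3:
# one lowercase letter, then up to 255 of a-z, 0-9, "_", "-", "*", "/".
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]{0,255}")


def serialise_dictionary(members: Mapping[str, str]) -> str:
    """Join label keys and pre-serialised item values into a field value.

    Hypothetical helper for illustration; not defined by this document.
    """
    # A dictionary has between 1 and 1024 members.
    if not 1 <= len(members) <= 1024:
        raise ValueError("dictionary must have 1 to 1024 members")
    pairs = []
    for key, item_text in members.items():
        if LABEL_RE.fullmatch(key) is None:
            raise ValueError("key is not a valid label: %r" % key)
        # No whitespace is allowed around "=".
        pairs.append(key + "=" + item_text)
    # Pairs are separated by a comma; the space after it is optional.
    return ", ".join(pairs)


print(serialise_dictionary({"foo": "1.23", "en": '"Applepie"'}))
```

   Because a mapping type is used as input, key uniqueness is guaranteed
   by construction; the emitted order is simply the mapping's iteration
   order, which carries no meaning since dictionaries are unordered.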

   dictionary = label "=" item *1023( OWS "," OWS label "=" item )

   For example, a header field whose value is defined as a dictionary
   could look like:

   ExampleDictHeader: foo=1.23, en="Applepie", da=*w4ZibGV0w6ZydGUK

   Typically, a header field specification will define the semantics of
   individual keys, as well as whether their presence is required or
   optional. Recipients MUST ignore keys that are undefined or unknown,
   unless the header field's specification specifically disallows them.

4.7.1. Parsing a Dictionary from Textual Headers

   Given an ASCII string input_string, return a mapping of (label,
   item). input_string is modified to remove the parsed value.

   1.  Let dictionary be an empty mapping.

   2.  While input_string is not empty:

       1.  Let this_key be the result of running Parse Label from
           Textual Headers (Section 4.3) with input_string. If an error
           is encountered, throw it.

       2.  If dictionary already contains this_key, raise an error.

       3.  Consume a "=" from input_string; if none is present, raise an
           error.

       4.  Let this_value be the result of running Parse Item from
           Textual Headers (Section 4.6) with input_string. If an error
           is encountered, throw it.

       5.  Add key this_key with value this_value to dictionary.

       6.  Discard any leading OWS from input_string.

       7.  If input_string is empty, return dictionary.

       8.  Consume a COMMA from input_string; if no comma is present,
           raise an error.

       9.  Discard any leading OWS from input_string.

   3.  Return dictionary.

4.8. Lists

   Lists are arrays of items (Section 4.6) or parameterised labels
   (Section 4.4), with one to 1024 members.

   In the textual HTTP serialisation, each member is separated by a
   comma and optional whitespace.
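
   To illustrate the comma-and-optional-whitespace separation described
   above, the following Python sketch splits a field value into list
   members. It is deliberately narrow: it assumes every member is a
   plain label (Section 4.3), and it rejects a trailing comma, which is
   a choice made for this sketch. The names parse_label_list, LABEL_RE
   and OWS_CHARS are hypothetical and not defined by this document.

```python
import re
from typing import List

# Assumed rendering of the label syntax from Section 4.3.
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]{0,255}")
# OWS is any run of spaces and horizontal tabs.
OWS_CHARS = " \t"


def parse_label_list(field_value: str) -> List[str]:
    """Split a field value into label members.

    Hypothetical helper for illustration; handles labels only.
    """
    rest = field_value.lstrip(OWS_CHARS)
    members: List[str] = []
    while True:
        match = LABEL_RE.match(rest)
        if match is None:
            # Covers empty input, a trailing comma and non-label members.
            raise ValueError("expected a label")
        members.append(match.group(0))
        if len(members) > 1024:
            raise ValueError("list has more than 1024 members")
        # Drop the label and any whitespace that follows it.
        rest = rest[match.end():].lstrip(OWS_CHARS)
        if not rest:
            return members
        if not rest.startswith(","):
            raise ValueError("expected a comma between members")
        # Drop the comma and any whitespace that follows it.
        rest = rest[1:].lstrip(OWS_CHARS)


print(parse_label_list("foo, bar, baz_45"))
```

   A complete parser would replace the label match with a call that
   accepts any item or parameterised label.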

   list        = list_member *1023( OWS "," OWS list_member )
   list_member = item / parameterised

   For example, a header field whose value is defined as a list of
   labels could look like:

   ExampleLabelListHeader: foo, bar, baz_45

   and a header field whose value is defined as a list of parameterised
   labels could look like:

   ExampleParamListHeader: abc/def; g="hi";j, klm/nop

4.8.1. Parsing a List from Textual Headers

   Given an ASCII string input_string, return a list of items.
   input_string is modified to remove the parsed value.

   1.  Let items be an empty array.

   2.  While input_string is not empty:

       1.  Let item be the result of running Parse Item from Textual
           Headers (Section 4.6) with input_string. If an error is
           encountered, throw it.

       2.  Append item to items.

       3.  Discard any leading OWS from input_string.

       4.  If input_string is empty, return items.

       5.  Consume a COMMA from input_string; if no comma is present,
           raise an error.

       6.  Discard any leading OWS from input_string.

   3.  Return items.

5. IANA Considerations

   This draft has no actions for IANA.

6. Security Considerations

   TBD

7. References

7.1. Normative References

   [RFC0020]  Cerf, V., "ASCII format for network interchange", STD 80,
              RFC 20, DOI 10.17487/RFC0020, October 1969.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

7.2. Informative References

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", 2008.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

7.3. URIs

   [1] https://lists.w3.org/Archives/Public/ietf-http-wg/

   [2] https://httpwg.github.io/

   [3] https://github.com/httpwg/http-extensions/labels/header-structure

Appendix A. Changes

A.1. Since draft-ietf-httpbis-header-structure-01

   Replaced with draft-nottingham-structured-headers.

A.2. Since draft-ietf-httpbis-header-structure-00

   Added signed 64bit integer type.

   Drop UTF8, and settle on BCP137 ::EmbeddedUnicodeChar for
   h1-unicode-string.

   Change h1_blob delimiter to ":" since "'" is valid t_char

Authors' Addresses

   Mark Nottingham
   Fastly

   Email: mnot@mnot.net
   URI:   https://www.mnot.net/

   Poul-Henning Kamp
   The Varnish Cache Project

   Email: phk@varnish-cache.org