HTTP Working Group                                            P-H. Kamp
Internet-Draft                                The Varnish Cache Project
Intended status: Standards Track                         April 24, 2017
Expires: October 26, 2017


                      HTTP Header Common Structure
                 draft-ietf-httpbis-header-structure-01

Abstract

   An abstract data model for HTTP headers, "Common Structure", and an
   HTTP/1 serialization of it, generalized from current HTTP headers.

Note to Readers

   Discussion of this draft takes place on the HTTP working group
   mailing list (ietf-http-wg@w3.org), which is archived at
   https://lists.w3.org/Archives/Public/ietf-http-wg/ .

   Working Group information can be found at http://httpwg.github.io/ ;
   source code and issues list for this draft can be found at
   https://github.com/httpwg/http-extensions/labels/header-structure .

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 26, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   The HTTP protocol does not impose any structure or datamodel on the
   information in HTTP headers; the HTTP/1 serialization is the
   datamodel: an ASCII string without control characters.

   HTTP header definitions specify how the string must be formatted and,
   while families of similar headers exist, it still requires an
   uncomfortably large number of bespoke parser and validation routines
   to process HTTP traffic correctly.

   In order to improve performance, HTTP/2 and HPACK use naive text
   compression, which incidentally decoupled the on-the-wire
   serialization from the data model.

   During the development of HPACK it became evident that significantly
   bigger gains were available if semantic compression could be used,
   most notably with timestamps. However, the lack of a common data
   structure for HTTP headers would make semantic compression one long
   list of special cases.

   Parallel to this, various proposals were floated for how to fulfill
   data-transportation needs and, to a lesser degree, to impose some
   kind of order on HTTP headers, at least going forward.

   All of these proposals (JSON, CBOR, etc.) run into the same basic
   problem: their serialization is incompatible with RFC 7230's
   [RFC7230] ABNF definition of 'field-value'.

   For binary formats, such as CBOR, a wholesale base64/85
   reserialization would be needed, with negative results for both
   debuggability and bandwidth.

   For textual formats, such as JSON, the format must first be neutered
   to not violate field-value's ABNF, and then workarounds added to
   reintroduce the features just lost, for instance UNICODE strings.

   The post-surgery format is no longer JSON, and experience indicates
   that almost-but-not-quite compatibility is worse than no
   compatibility.

   This proposal starts from the other end, and builds and generalizes a
   data structure definition from existing HTTP headers, which means
   that HTTP/1 serialization and 'field-value' compatibility is built
   in.

   If all future HTTP headers are defined to fit into this Common
   Structure we have at least halted the proliferation of bespoke
   parsers and started to pave the road for semantic compression
   serializations of HTTP traffic.

1.1.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2.  Definition of HTTP Header Common Structure

   The data model of Common Structure is an ordered sequence of named
   dictionaries. Please see Appendix A for how this model was derived.

   The definition of the data model is on purpose abstract, uncoupled
   from any protocol serialization or programming environment
   representation; it is meant as the foundation on which all such
   manifestations of the model can be built.

   Common Structure in ABNF (slightly bastardized relative to RFC5234
   [RFC5234]):

      import token from RFC7230
      import DIGIT from RFC5234

      common-structure = 1* ( identifier dictionary )

      dictionary = * ( identifier [ value ] )

      value = identifier /
              integer /
              number /
              ascii-string /
              unicode-string /
              blob /
              timestamp /
              common-structure

   Recursion is included as a way to support deep and more general
   data structures, but its use is highly discouraged and where it is
   used the depth of recursion SHALL always be explicitly limited in the
   specifications of the HTTP headers which allow it.

      identifier = token [ "/" token ]

      integer = ["-"] 1*19 DIGIT

   Integers SHALL be in the range +/- 2^63-1 (= +/- 9223372036854775807)

      number = ["-"] DIGIT '.' 1*14DIGIT /
               ["-"] 2DIGIT '.' 1*13DIGIT /
               ["-"] 3DIGIT '.' 1*12DIGIT /
               ... /
               ["-"] 12DIGIT '.' 1*3DIGIT /
               ["-"] 13DIGIT '.' 1*2DIGIT /
               ["-"] 14DIGIT '.' 1DIGIT

   The limit of 15 significant digits is chosen so that numbers can be
   correctly represented by IEEE754 64 bit binary floating point.

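   The 'integer' and 'number' rules above can be checked mechanically.
   The following Python sketch is illustrative only and not part of the
   proposal; the function names is_integer and is_number are
   assumptions made for this example. It applies the digit-count limits
   of the ABNF and the +/- 2^63-1 range restriction.

```python
import re

# integer = ["-"] 1*19 DIGIT, further limited to +/- (2^63 - 1).
_INTEGER_RE = re.compile(r"-?[0-9]{1,19}")
# number: optional sign, at least one digit on each side of the ".".
_NUMBER_RE = re.compile(r"-?([0-9]+)\.([0-9]+)")

INT_LIMIT = 2**63 - 1


def is_integer(text: str) -> bool:
    """True if text matches 'integer' and lies in the permitted range."""
    if _INTEGER_RE.fullmatch(text) is None:
        return False
    return -INT_LIMIT <= int(text) <= INT_LIMIT


def is_number(text: str) -> bool:
    """True if text matches 'number': 1-14 digits before the ".",
    at least one after it, and at most 15 digits in total."""
    match = _NUMBER_RE.fullmatch(text)
    if match is None:
        return False
    whole, fraction = match.groups()
    return len(whole) <= 14 and len(whole) + len(fraction) <= 15
```

   The total-digit check mirrors the way each alternative of 'number'
   trades one fractional digit for one integral digit.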
      ascii-string = * %x20-7e

   This is intended to be an efficient, "safe" and uncomplicated string
   type, for uses where the string content is culturally neutral or
   where it will not be user visible.

      unicode-string = * UNICODE

      UNICODE =
      # UNICODE nicked from draft-seantek-unicode-in-abnf-02

   Unicode-strings are unrestricted because there is no sane and/or
   culturally neutral way to subset or otherwise make unicode "safe",
   and Unicode is still evolving new and interesting code points.

   Users of unicode-string SHALL be prepared for the full gamut of
   glyph-gymnastics in order to avoid U+1F4A9 U+08 U+1F574.

      blob = * %x00-ff

   Blobs are intended primarily for cryptographic data, but can be used
   for any otherwise unsatisfied needs.

      timestamp = number

   A timestamp counts seconds since the UNIX time_t epoch, including the
   "invisible leap-seconds" misfeature.

3.  HTTP/1 Serialization of HTTP Header Common Structure

   In ABNF:

      import OWS from RFC7230
      import HEXDIG, DQUOTE from RFC5234
      import EmbeddedUnicodeChar from RFC5137

      h1-common-structure-header =
              h1-common-structure-legacy-header /
              h1-common-structure-self-identifying-header

      h1-common-structure-legacy-header =
              field-name ":" OWS h1-common-structure

   Only white-listed legacy headers (see Section 8) can use this format.

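   As an illustration of the white-list rule for legacy headers, the
   following Python sketch separates a header line into field-name and
   payload, and only does so when the field-name is white-listed. The
   function name split_legacy_header and the contents of the white-list
   are assumptions made for this example; a real implementation would
   load a snapshot of the registry described in Section 8.

```python
from typing import Optional, Tuple

# Hypothetical registry snapshot: a few of the names listed in
# Appendix A.4, lower-cased because field names are case-insensitive.
COMMON_STRUCTURE_WHITELIST = frozenset({"accept", "connection", "vary"})


def split_legacy_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (field-name, payload) for a white-listed legacy header.

    Returns None when the line has no ":" or when the field name is not
    white-listed, in which case a bespoke parser is still needed.
    """
    name, sep, rest = line.partition(":")
    if not sep or name.lower() not in COMMON_STRUCTURE_WHITELIST:
        return None
    # Remove optional whitespace (spaces and tabs) around the payload.
    return name, rest.strip(" \t")
```

   For example, split_legacy_header("Accept: text/html;q=0.5") yields
   the pair ("Accept", "text/html;q=0.5"), while a Date header is
   rejected because it is not in the assumed white-list.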
      h1-common-structure-self-identifying-header =
              field-name ":" OWS ">" h1-common-structure "<"

      h1-common-structure = h1-element * ("," h1-element)

      h1-element = identifier * (";" identifier ["=" h1-value])

      h1-value = identifier /
              integer /
              number /
              h1-ascii-string /
              h1-unicode-string /
              h1-blob /
              h1-timestamp /
              ">" h1-common-structure "<"

      h1-ascii-string = DQUOTE *(
                        ( "\" DQUOTE ) /
                        ( "\" "\" ) /
                        %x20-21 /
                        %x23-5B /
                        %x5D-7E
                        ) DQUOTE

      h1-unicode-string = DQUOTE *(
                          ( "\" DQUOTE ) /
                          ( "\" "\" ) /
                          EmbeddedUnicodeChar /
                          %x20-21 /
                          %x23-5B /
                          %x5D-7E
                          ) DQUOTE

   The dim prospects of ever getting a majority of HTTP/1 paths 8-bit
   clean make UTF-8 unviable as H1 serialization. Given that very
   little of the information in HTTP headers is presented to users in
   the first place, improving H1 and HPACK efficiency by inventing
   more efficient RFC5137 compliant escape sequences seems unwarranted.

      h1-blob = ":" base64 ":"
      # XXX: where to import base64 from ?

      h1-timestamp = number

   XXX: Allow OWS in parsers, but not in generators ?

   In programming environments which do not define a native
   representation or serialization of Common Structure, the HTTP/1
   serialization should be used.

4.  When to use Common Structure Parser

   All future standardized and all private HTTP headers using Common
   Structure should self-identify as such. In the HTTP/1 serialization
   this is done by making the first character ">" and the last "<".
   (These two characters are deliberately "the wrong way" to not clash
   with existing usages.)

   Legacy HTTP headers which fit into Common Structure are marked as
   such in the IANA Message Header Registry (see Section 8), and a
   snapshot of the registry can be used to trigger parsing of these
   headers according to Common Structure.

5.  Desired Normative Effects

   All new HTTP headers SHOULD use the Common Structure if at all
   possible.

6.  Open/Outstanding issues to resolve

6.1.  Single/Multiple Headers

   Should we allow splitting common structure data over multiple
   headers ?

   Pro:

      Avoids size restrictions, easier on-the-fly editing

   Contra:

      Cannot act on any such header until all headers have been
      received.

      We must define where headers can be split (between identifier and
      dictionary ?, in the middle of dictionaries ?)

      Most on-the-fly editing is hackish at best.

7.  Future Work

7.1.  Redefining existing headers for better performance

   The HTTP/1 serialization's self-identification mechanism makes it
   possible to extend the definition of existing Appendix A.5 headers
   into Common Structure.

   For instance one could imagine:

      Date: >1475061449.201<

   Which would be faster to parse and validate than the current
   definition of the Date header and more precise too.

   Some kind of signal/negotiation mechanism would be required to make
   this work in practice.

7.2.  Define a validation dictionary

   A machine-readable specification of the legal contents of HTTP
   headers would go a long way to improve efficiency and security in
   HTTP implementations.

8.  IANA Considerations

   The IANA Message Header Registry will be extended with an additional
   field named "Common Structure" which can have the values "True",
   "False" or "Unknown".

   The RFC723x headers listed in Appendix A.4 will get the value "True"
   in the new field.

   The RFC723x headers listed in Appendix A.5 will get the value "False"
   in the new field.

   All other existing entries in the registry will be set to "Unknown"
   unless and until the owner of the entry requests otherwise.

9.  Security Considerations

   Unique dictionary keys are required to reduce the risk of smuggling
   attacks.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5137]  Klensin, J., "ASCII Escaping of Unicode Characters",
              BCP 137, RFC 5137, DOI 10.17487/RFC5137, February 2008.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

10.2.  Informative References

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014.

   [RFC7232]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Conditional Requests", RFC 7232,
              DOI 10.17487/RFC7232, June 2014.

   [RFC7233]  Fielding, R., Ed., Lafon, Y., Ed., and J. Reschke, Ed.,
              "Hypertext Transfer Protocol (HTTP/1.1): Range Requests",
              RFC 7233, DOI 10.17487/RFC7233, June 2014.

   [RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
              RFC 7234, DOI 10.17487/RFC7234, June 2014.

   [RFC7235]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Authentication", RFC 7235,
              DOI 10.17487/RFC7235, June 2014.

   [RFC7239]  Petersson, A. and M. Nilsson, "Forwarded HTTP Extension",
              RFC 7239, DOI 10.17487/RFC7239, June 2014.

   [RFC7694]  Reschke, J., "Hypertext Transfer Protocol (HTTP)
              Client-Initiated Content-Encoding", RFC 7694,
              DOI 10.17487/RFC7694, November 2015.

Appendix A.  Do HTTP headers have any common structure ?

   Several proposals have been floated in recent years to use some
   preexisting structured data serialization or other for HTTP headers,
   to impose some sanity.

   None of these proposals have gained traction and no obvious candidate
   data serializations have been left unexamined.

   This effort tries to tackle the question from the other side, by
   asking if there is a common structure in existing HTTP headers we can
   generalize for this purpose.

A.1.  Survey of HTTP header structure

   The RFC723x family of HTTP/1 standards controls 49 entries in the
   IANA Message Header Registry, and they share two common motifs.

   The majority of RFC723x HTTP headers are lists. A few of them are
   ordered ('Content-Encoding'), some are unordered ('Connection') and
   some are ordered by 'q=%f' weight parameters ('Accept').

   In most cases, the list elements are some kind of identifier, usually
   derived from ABNF 'token' as defined by [RFC7230].

   A subgroup of headers, mostly related to MIME, uses what one could
   call a 'qualified token':

      qualified-token = token-or-asterix [ "/" token-or-asterix ]

   The second motif is parameterized list elements. The best known is
   the "q=0.5" weight parameter, but other parameters exist as well.

   Generalizing from these motifs, our candidate "Common Structure" data
   model becomes an ordered list of named dictionaries.

   In pidgin ABNF, ignoring white-space for the sake of clarity, the
   HTTP/1.1 serialization of Common Structure is something like:

      token-or-asterix = token from RFC7230, but also allowing "*"

      qualified-token = token-or-asterix [ "/" token-or-asterix ]

      field-name, see RFC7230

      Common-Structure-Header = field-name ":" 1#named-dictionary

      named-dictionary = qualified-token [ *(";" param) ]

      param = token [ "=" value ]

      value = we'll get back to this in a moment.

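   To make the pidgin ABNF above concrete, the following Python sketch
   turns a field value into an ordered list of named dictionaries. It
   is a rough illustration under stated assumptions, not a conforming
   parser: the function name parse_common_structure is invented for
   this example, values are kept as uninterpreted text, and quoted
   strings containing "," or ";" are not handled.

```python
from typing import Dict, List, Optional, Tuple

NamedDictionary = Tuple[str, Dict[str, Optional[str]]]


def parse_common_structure(field_value: str) -> List[NamedDictionary]:
    """Split a field value into (qualified-token, params) pairs.

    Simplification: quoted strings are not handled, so "," and ";"
    inside a quoted value would be split incorrectly.
    """
    result: List[NamedDictionary] = []
    for element in field_value.split(","):
        parts = [part.strip() for part in element.split(";")]
        name = parts[0]
        if not name:
            continue  # the "#" list rule tolerates empty elements
        params: Dict[str, Optional[str]] = {}
        for param in parts[1:]:
            if not param:
                continue  # skip empty parameters such as "a;;b"
            key, sep, value = param.partition("=")
            # A parameter without "=" has no value at all.
            params[key.strip()] = value.strip() if sep else None
        result.append((name, params))
    return result
```

   Applied to the field value "text/html;q=0.5, */*;q=0.1", the sketch
   returns two named dictionaries, "text/html" and "*/*", each holding
   a single "q" parameter, in the order in which they appeared.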
   Nineteen out of the RFC723x's 49 headers, almost 40%, can already be
   parsed using this definition, and none of the rest have requirements
   which could not be met by this data model. See Appendix A.4 and
   Appendix A.5 for the full survey details.

A.2.  Survey of values in HTTP headers

   Surveying the datatypes of HTTP headers, standardized as well as
   private, the following picture emerges:

A.2.1.  Numbers

   Integer and floating point are both used. Range and precision are
   mostly unspecified in controlling documents.

   Scientific notation (9.192631770e9) does not seem to be used
   anywhere.

   The ranges used seem to be minus several thousand to plus a couple of
   billions, the high end almost exclusively being POSIX time_t
   timestamps.

A.2.2.  Timestamps

   RFC723x text format, but POSIX time_t represented as integer or
   floating point is not uncommon. ISO8601 has also been spotted.

A.2.3.  Strings

   The vast majority are pure ASCII strings, with either no escapes, %xx
   URL-like escapes or C-style back-slash escapes, possibly with the
   addition of \uxxxx UNICODE escapes.

   Where non-ASCII character sets are used, they are almost always
   implicit, rather than explicit. UTF8 and ISO-8859-1 seem to be most
   common.

A.2.4.  Binary blobs

   Often used for cryptographic data. Usually in base64 encoding,
   sometimes ""-quoted, more often not. base85 encoding is also seen,
   usually quoted.

A.2.5.  Identifiers

   Seem to almost always fit the RFC723x 'token' definition.

A.3.  Is this actually a useful thing to generalize ?

   The number one wishlist item seems to be UNICODE strings, with a big
   side order of not having to write a new parser routine every time
   somebody comes up with a new header.

   Having a common parser would indeed be a good thing, and having an
   underlying data model which makes it possible to define a compressed
   serialization, rather than rely on serialization to text followed by
   text compression (i.e. HPACK), seems like a good idea too.

   However, when using a datamodel and a parser general enough to
   transport useful data, it will have to be followed by a validation
   step, which checks that the data also makes sense.

   Today validation, such as it is, is often done by the bespoke
   parsers.

   This then is probably where the next big potential for improvement
   lies:

   Ideally a machine readable "data dictionary" which makes it possible
   to copy that text out of RFCs and run it through a code generator
   which spits out validation code which operates on the output of the
   common parser.

   But history has been particularly unkind to that idea.

   Most attempts studied as part of this effort have sunk under
   complexity caused by reaching for generality, but where scope has
   been wisely limited, it seems to be possible.

   So file that idea under "future work".

A.4.  RFC723x headers with "common structure"

   o  Accept [RFC7231], Section 5.3.2

   o  Accept-Charset [RFC7231], Section 5.3.3

   o  Accept-Encoding [RFC7231], Section 5.3.4, [RFC7694], Section 3

   o  Accept-Language [RFC7231], Section 5.3.5

   o  Age [RFC7234], Section 5.1

   o  Allow [RFC7231], Section 7.4.1

   o  Connection [RFC7230], Section 6.1

   o  Content-Encoding [RFC7231], Section 3.1.2.2

   o  Content-Language [RFC7231], Section 3.1.3.2

   o  Content-Length [RFC7230], Section 3.3.2

   o  Content-Type [RFC7231], Section 3.1.1.5

   o  Expect [RFC7231], Section 5.1.1

   o  Max-Forwards [RFC7231], Section 5.1.2

   o  MIME-Version [RFC7231], Appendix A.1

   o  TE [RFC7230], Section 4.3

   o  Trailer [RFC7230], Section 4.4

   o  Transfer-Encoding [RFC7230], Section 3.3.1

   o  Upgrade [RFC7230], Section 6.7

   o  Vary [RFC7231], Section 7.1.4

A.5.  RFC723x headers with "uncommon structure"

   1 of the RFC723x headers is only reserved, and therefore has no
   structure at all:

   o  Close [RFC7230], Section 8.1

   5 of the RFC723x headers are HTTP dates:

   o  Date [RFC7231], Section 7.1.1.2

   o  Expires [RFC7234], Section 5.3

   o  If-Modified-Since [RFC7232], Section 3.3

   o  If-Unmodified-Since [RFC7232], Section 3.4

   o  Last-Modified [RFC7232], Section 2.2

   24 of the RFC723x headers use bespoke formats which only a single or
   in rare cases two headers share:

   o  Accept-Ranges [RFC7233], Section 2.3

      *  bytes-unit / other-range-unit

   o  Authorization [RFC7235], Section 4.2

   o  Proxy-Authorization [RFC7235], Section 4.4

      *  credentials

   o  Cache-Control [RFC7234], Section 5.2

      *  1#cache-directive

   o  Content-Location [RFC7231], Section 3.1.4.2

      *  absolute-URI / partial-URI

   o  Content-Range [RFC7233], Section 4.2

      *  byte-content-range / other-content-range

   o  ETag [RFC7232], Section 2.3

      *  entity-tag

   o  Forwarded [RFC7239]

      *  1#forwarded-element

   o  From [RFC7231], Section 5.5.1

      *  mailbox

   o  If-Match [RFC7232], Section 3.1

   o  If-None-Match [RFC7232], Section 3.2

      *  "*" / 1#entity-tag

   o  If-Range [RFC7233], Section 3.2

      *  entity-tag / HTTP-date

   o  Host [RFC7230], Section 5.4

      *  uri-host [ ":" port ]

   o  Location [RFC7231], Section 7.1.2

      *  URI-reference

   o  Pragma [RFC7234], Section 5.4

      *  1#pragma-directive

   o  Range [RFC7233], Section 3.1

      *  byte-ranges-specifier / other-ranges-specifier

   o  Referer [RFC7231], Section 5.5.2

      *  absolute-URI / partial-URI

   o  Retry-After [RFC7231], Section 7.1.3

      *  HTTP-date / delay-seconds

   o  Server [RFC7231], Section 7.4.2

   o  User-Agent [RFC7231], Section 5.5.3

      *  product *( RWS ( product / comment ) )

   o  Via [RFC7230], Section 5.7.1

      *  1#( received-protocol RWS received-by [ RWS comment ] )

   o  Warning [RFC7234], Section 5.5

      *  1#warning-value

   o  Proxy-Authenticate [RFC7235], Section 4.3

   o  WWW-Authenticate [RFC7235], Section 4.1

      *  1#challenge

Appendix B.  Changes

B.1.  Since draft-ietf-httpbis-header-structure-00

   Added signed 64bit integer type.

   Drop UTF8, and settle on BCP137 [RFC5137]::EmbeddedUnicodeChar for
   h1-unicode-string.

   Change h1_blob delimiter to ":" since "'" is valid t_char

Author's Address

   Poul-Henning Kamp
   The Varnish Cache Project

   Email: phk@varnish-cache.org