idnits 2.17.1

draft-ietf-httpbis-header-structure-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document date (December 10, 2016) is 2687 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '' and '' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC7694' is mentioned on line 460, but not defined

  ** Obsolete undefined reference: RFC 7694 (Obsoleted by RFC 9110)

  == Missing Reference: 'Section 3' is mentioned on line 460, but not defined

  == Missing Reference: 'RFC7239' is mentioned on line 515, but not defined

  ** Obsolete normative reference: RFC 7230 (Obsoleted by RFC 9110, RFC 9112)

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   HTTP Working Group                                             P-H. Kamp
3   Internet-Draft                                 The Varnish Cache Project
4   Intended status: Standards Track                       December 10, 2016
5   Expires: June 13, 2017

7                         HTTP Header Common Structure
8                    draft-ietf-httpbis-header-structure-00

10  Abstract

12     An abstract data model for HTTP headers, "Common Structure", and a
13     HTTP/1 serialization of it, generalized from current HTTP headers.

15  Note to Readers

17     Discussion of this draft takes place on the HTTP working group
18     mailing list (ietf-http-wg@w3.org), which is archived at
19     https://lists.w3.org/Archives/Public/ietf-http-wg/ .

21     Working Group information can be found at http://httpwg.github.io/ ;
22     source code and issues list for this draft can be found at
23     https://github.com/httpwg/http-extensions/labels/header-structure .

25  Status of This Memo

27     This Internet-Draft is submitted in full conformance with the
28     provisions of BCP 78 and BCP 79.

30     Internet-Drafts are working documents of the Internet Engineering
31     Task Force (IETF).  Note that other groups may also distribute
32     working documents as Internet-Drafts.  The list of current Internet-
33     Drafts is at http://datatracker.ietf.org/drafts/current/.

35     Internet-Drafts are draft documents valid for a maximum of six months
36     and may be updated, replaced, or obsoleted by other documents at any
37     time.  It is inappropriate to use Internet-Drafts as reference
38     material or to cite them other than as "work in progress."

40     This Internet-Draft will expire on June 13, 2017.

42  Copyright Notice

44     Copyright (c) 2016 IETF Trust and the persons identified as the
45     document authors.  All rights reserved.
47     This document is subject to BCP 78 and the IETF Trust's Legal
48     Provisions Relating to IETF Documents
49     (http://trustee.ietf.org/license-info) in effect on the date of
50     publication of this document.  Please review these documents
51     carefully, as they describe your rights and restrictions with respect
52     to this document.  Code Components extracted from this document must
53     include Simplified BSD License text as described in Section 4.e of
54     the Trust Legal Provisions and are provided without warranty as
55     described in the Simplified BSD License.

57  1.  Introduction

59     The HTTP protocol does not impose any structure or datamodel on the
60     information in HTTP headers; the HTTP/1 serialization is the
61     datamodel: An ASCII string without control characters.

63     HTTP header definitions specify how the string must be formatted and
64     while families of similar headers exist, it still requires an
65     uncomfortably large number of bespoke parser and validation routines
66     to process HTTP traffic correctly.

68     In order to improve performance HTTP/2 and HPACK use naive text-
69     compression, which incidentally decoupled the on-the-wire
70     serialization from the data model.

72     During the development of HPACK it became evident that significantly
73     bigger gains were available if semantic compression could be used,
74     most notably with timestamps.  However, the lack of a common data
75     structure for HTTP headers would make semantic compression one long
76     list of special cases.

78     Parallel to this, various proposals for how to fulfill data-
79     transportation needs, and to a lesser degree to impose some kind of
80     order on HTTP headers, at least going forward, were floated.

82     All of these proposals, JSON, CBOR etc. run into the same basic
83     problem: Their serialization is incompatible with [RFC7230]'s ABNF
84     definition of 'field-value'.
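The incompatibility can be illustrated with a small check. The sketch below is not part of the draft: it approximates 'field-value' as visible ASCII with interior SP / HTAB, following the Introduction's "ASCII string without control characters", and deliberately leaves out the obs-text and obs-fold productions of [RFC7230]. The names fits_field_value, json_value and cbor_value are hypothetical.

```python
import json
import re

# Assumption: field-value is approximated as visible ASCII with interior
# SP / HTAB; obs-text and obs-fold from RFC 7230 are deliberately excluded.
_FIELD_VALUE = re.compile(rb"[\x21-\x7e](?:[\x09\x20-\x7e]*[\x21-\x7e])?")


def fits_field_value(raw: bytes) -> bool:
    """Return True if raw could be sent unmodified as an HTTP/1 header value."""
    return raw == b"" or _FIELD_VALUE.fullmatch(raw) is not None


# JSON serialized as raw UTF-8 carries bytes above 0x7e.
json_value = json.dumps({"name": "caf\u00e9"}, ensure_ascii=False).encode("utf-8")

# The three bytes a1 01 02 are the CBOR encoding of the map {1: 2};
# the 0x01 and 0x02 bytes are control characters.
cbor_value = bytes([0xA1, 0x01, 0x02])

print(fits_field_value(b"text/html;q=0.5"))
print(fits_field_value(json_value))
print(fits_field_value(cbor_value))
```

Under these assumptions the existing-style header value passes, while the raw JSON and CBOR byte strings are rejected.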
86     For binary formats, such as CBOR, a wholesale base64/85
87     reserialization would be needed, with negative results for both
88     debuggability and bandwidth.

90     For textual formats, such as JSON, the format must first be neutered
91     to not violate field-value's ABNF, and then workarounds added to
92     reintroduce the features just lost, for instance UNICODE strings, and
93     suddenly it is no longer JSON.

95     This proposal starts from the other end, and builds and generalizes a
96     data structure definition from existing HTTP headers, which means
97     that HTTP/1 serialization and 'field-value' compatibility are built
98     in.

100    If all future HTTP headers are defined to fit into this Common
101    Structure we have at least halted the proliferation of bespoke
102    parsers and started to pave the road for semantic compression
103    serializations of HTTP traffic.

105 1.1.  Terminology

107    In this document, the key words "MUST", "MUST NOT", "REQUIRED",
108    "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
109    and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
110    [RFC2119].

112 2.  Definition of HTTP Header Common Structure

114    The data model of Common Structure is an ordered sequence of named
115    dictionaries.  Please see Appendix A for how this model was derived.

117    The definition of the data model is on purpose abstract, uncoupled
118    from any protocol serialization or programming environment
119    representation, meant as the foundation on which all such
120    manifestations of the model can be built.

122    Common Structure in ABNF:

124      import token from RFC7230
125      import DIGIT from RFC5234

127      common-structure = 1* ( identifier dictionary )

129      dictionary = * ( identifier value )

131      value = identifier /
132              number /
133              ascii_string /
134              unicode_string /
135              blob /
136              timestamp /
137              common-structure

139      identifier = token  [ "/" token ]

141      number = ["-"] 1*15 DIGIT
142      # XXX: Not sure how to do this in ABNF:
143      # XXX: A single "."
         allowed between any two digits
144      # The range is limited to ensure it can be
145      # correctly represented in IEEE754 64 bit
146      # binary floating point format.

148      ascii_string = * %x20-7e
149      # This is a "safe" string in the sense that it
150      # contains no control characters or multi-byte
151      # sequences.  If that is not fancy enough, use
152      # unicode_string.

154      unicode_string = * unicode_codepoint
155      # XXX: Is there a place to import this from ?
156      # Unrestricted unicode, because there is no sane
157      # way to restrict or otherwise make unicode "safe".

159      blob = * %x00-ff
160      # Intended for cryptographic data and as a general
161      # escape mechanism for unmet requirements.

163      timestamp = POSIX time_t with optional millisecond resolution
164      # XXX: Is there a place to import this from ?

166 3.  HTTP/1 Serialization of HTTP Header Common Structure

168    In ABNF:

170      import OWS from RFC7230
171      import HEXDIG, DQUOTE from RFC5234
172      import UTF8-2, UTF8-3, UTF8-4 from RFC3629

174      h1_common-structure-header =
175              ( field-name ":" OWS ">" h1_common_structure "<" ) /
176                      # Self-identifying HTTP headers
177              ( field-name ":" OWS h1_common_structure )
178                      # legacy HTTP headers on white-list, see Section 8

180      h1_common_structure = h1_element * ("," h1_element)

182      h1_element = identifier * (";" identifier ["=" h1_value])

184      h1_value = identifier /
185              number /
186              h1_ascii_string /
187              h1_unicode_string /
188              h1_blob /
189              h1_timestamp /
190              h1_common-structure

192      h1_ascii_string = DQUOTE *(
193                        ( "\" DQUOTE ) /
194                        ( "\" "\" ) /
195                        %x20-21 /
196                        %x23-5B /
197                        %x5D-7E
198                        ) DQUOTE
199      # This is a proper subset of h1_unicode_string
200      # NB only allowed backslash escapes are \" and \\

202      h1_unicode_string = DQUOTE *(
203                          ( "\" DQUOTE ) /
204                          ( "\" "\" ) /
205                          ( "\" "u" 4*HEXDIG ) /
206                          %x20-21 /
207                          %x23-5B /
208                          %x5D-7E /
209                          UTF8-2 /
210                          UTF8-3 /
211                          UTF8-4
212                          ) DQUOTE
213      # This is UTF8 with HTTP1 unfriendly codepoints
214      # (00-1f, 7f) neutered with \uXXXX
         escapes.

216      h1_blob = "'" base64 "'"
217      # XXX: where to import base64 from ?

218      h1_timestamp = number
219      # UNIX/POSIX time_t semantics.
220      # fractional seconds allowed.

222      h1_common-structure = ">" h1_common_structure "<"

224    XXX: Allow OWS in parsers, but not in generators ?

226    In programming environments which do not define a native
227    representation or serialization of Common Structure, the HTTP/1
228    serialization should be used.

230 4.  When to use Common Structure Parser

232    All future standardized and all private HTTP headers using Common
233    Structure should self-identify as such, in the HTTP/1 serialization
234    by making the first character ">" and the last "<".  (These two
235    characters are deliberately "the wrong way" to not clash with
236    existing usages.)

238    Legacy HTTP headers which fit into Common Structure are marked as
239    such in the IANA Message Header Registry (see Section 8), and a
240    snapshot of the registry can be used to trigger parsing according to
241    Common Structure of these headers.

243 5.  Desired Normative Effects

245    All new HTTP headers SHOULD use the Common Structure if at all
246    possible.

248 6.  Open/Outstanding issues to resolve

250 6.1.  Single/Multiple Headers

252    Should we allow splitting common structure data over multiple headers
253    ?

255    Pro:

257    Avoids size restrictions, easier on-the-fly editing

259    Contra:

261    Cannot act on any such header until all headers have been received.

263    We must define where headers can be split (between identifier and
264    dictionary ?, in the middle of dictionaries ?)
265    Most on-the-fly editing is hackish at best.

267 7.  Future Work

269 7.1.  Redefining existing headers for better performance

271    The HTTP/1 serialization's self-identification mechanism makes it
272    possible to extend the definition of existing Appendix A.5 headers
273    into Common Structure.
275    For instance one could imagine:

277      Date: >1475061449.201<

279    Which would be faster to parse and validate than the current
280    definition of the Date header and more precise too.

282    Some kind of signal/negotiation mechanism would be required to make
283    this work in practice.

285 7.2.  Define a validation dictionary

287    A machine-readable specification of the legal contents of HTTP
288    headers would go a long way to improve efficiency and security in
289    HTTP implementations.

291 8.  IANA Considerations

293    The IANA Message Header Registry will be extended with an additional
294    field named "Common Structure" which can have the values "True",
295    "False" or "Unknown".

297    The RFC723x headers listed in Appendix A.4 will get the value "True"
298    in the new field.

300    The RFC723x headers listed in Appendix A.5 will get the value "False"
301    in the new field.

303    All other existing entries in the registry will be set to "Unknown"
304    until and if the owner of the entry requests otherwise.

306 9.  Security Considerations

308    TBD

310 10.  Normative References

312    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
313               Requirement Levels", BCP 14, RFC 2119,
314               DOI 10.17487/RFC2119, March 1997,
315               .

317    [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
318               Protocol (HTTP/1.1): Message Syntax and Routing",
319               RFC 7230, DOI 10.17487/RFC7230, June 2014,
320               .

322 Appendix A.  Do HTTP headers have any common structure ?

324    Several proposals have been floated in recent years to use some
325    preexisting structured data serialization or other for HTTP headers,
326    to impose some sanity.

328    None of these proposals have gained traction and no obvious candidate
329    data serializations have been left unexamined.

331    This effort tries to tackle the question from the other side, by
332    asking if there is a common structure in existing HTTP headers we can
333    generalize for this purpose.

335 A.1.
      Survey of HTTP header structure

337    The RFC723x family of HTTP/1 standards control 49 entries in the IANA
338    Message Header Registry, and they share two common motifs.

340    The majority of RFC723x HTTP headers are lists.  A few of them are
341    ordered ('Content-Encoding'), some are unordered ('Connection') and
342    some are ordered by 'q=%f' weight parameters ('Accept').

344    In most cases, the list elements are some kind of identifier, usually
345    derived from ABNF 'token' as defined by [RFC7230].

347    A subgroup of headers, mostly related to MIME, uses what one could
348    call a 'qualified token':

350      qualified_token = token_or_asterix [ "/" token_or_asterix ]

352    The second motif is parameterized list elements.  The best known is
353    the "q=0.5" weight parameter, but other parameters exist as well.

355    Generalizing from these motifs, our candidate "Common Structure" data
356    model becomes an ordered list of named dictionaries.

358    In pidgin ABNF, ignoring white-space for the sake of clarity, the
359    HTTP/1.1 serialization of Common Structure is something like:

361      token_or_asterix = token from [RFC7230], but also allowing "*"

363      qualified_token = token_or_asterix [ "/" token_or_asterix ]

365      field-name, see [RFC7230]

367      Common_Structure_Header = field-name ":" 1#named_dictionary

369      named_dictionary = qualified_token [ *(";" param) ]

371      param = token [ "=" value ]

373      value = we'll get back to this in a moment.

375    Nineteen out of the RFC723x's 49 headers, almost 40%, can already be
376    parsed using this definition, and none of the rest have requirements
377    which could not be met by this data model.  See Appendix A.4 and
378    Appendix A.5 for the full survey details.

380 A.2.  Survey of values in HTTP headers

382    Surveying the datatypes of HTTP headers, standardized as well as
383    private, the following picture emerges:

385 A.2.1.  Numbers

387    Integer and floating point are both used.  Range and precision are
388    mostly unspecified in controlling documents.
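Section 2 caps 'number' at 15 digits so that every value can be represented in IEEE754 64 bit binary floating point. The sketch below is one possible reading of that rule and is not part of the draft: the helper names is_cs_number and survives_float64 are hypothetical, and the regular expression is an assumption about what "a single '.' allowed between any two digits" means.

```python
import re
from decimal import Decimal

# Assumption: optional "-", digits, and at most one "." with a digit on
# both sides of it.
_NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def is_cs_number(text: str) -> bool:
    """Check the 'number' rule: well-formed and between 1 and 15 digits."""
    if _NUMBER.fullmatch(text) is None:
        return False
    digits = sum(ch in "0123456789" for ch in text)
    return 1 <= digits <= 15


def survives_float64(text: str) -> bool:
    """Check that text -> float64 -> 15 significant digits gives text back."""
    return Decimal(format(float(text), ".15g")) == Decimal(text)


for sample in ("1475061449.201", "-0.5", "999999999999999", "9007199254740993"):
    print(sample, is_cs_number(sample), survives_float64(sample))
```

The last sample has 16 digits: it fails the length check, and converting it to a 64 bit float silently changes its value, which is the situation the 15 digit limit avoids.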
390    Scientific notation (9.192631770e9) does not seem to be used
391    anywhere.

393    The ranges used seem to be minus several thousand to plus a couple of
394    billions, the high end almost exclusively being POSIX time_t
395    timestamps.

397 A.2.2.  Timestamps

399    RFC723x text format, but POSIX time_t represented as integer or
400    floating point is not uncommon.  ISO8601 has also been spotted.

402 A.2.3.  Strings

404    The vast majority are pure ASCII strings, with either no escapes, %xx
405    URL-like escapes or C-style back-slash escapes, possibly with the
406    addition of \uxxxx UNICODE escapes.

408    Where non-ASCII character sets are used, they are almost always
409    implicit, rather than explicit.  UTF8 and ISO-8859-1 seem to be most
410    common.

412 A.2.4.  Binary blobs

414    Often used for cryptographic data.  Usually in base64 encoding,
415    sometimes ""-quoted, more often not.  base85 encoding is also seen,
416    usually quoted.

418 A.2.5.  Identifiers

420    Seems to almost always fit in the RFC723x 'token' definition.

422 A.3.  Is this actually a useful thing to generalize ?

424    The number one wishlist item seems to be UNICODE strings, with a big
425    side order of not having to write a new parser routine every time
426    somebody comes up with a new header.

428    Having a common parser would indeed be a good thing, and having an
429    underlying data model which makes it possible to define a compressed
430    serialization, rather than rely on serialization to text followed by
431    text compression (i.e. HPACK) seems like a good idea too.

433    However, when using a datamodel and a parser general enough to
434    transport useful data, it will have to be followed by a validation
435    step, which checks that the data also makes sense.

437    Today validation, such as it is, is often done by the bespoke
438    parsers.
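The split between a common parser and a separate validation step can be sketched as follows. This is an illustration only, not an implementation of the draft: parse_common_structure handles just the pidgin grammar of Appendix A.1 (list elements separated by ",", parameters by ";"), assumes no quoted strings are present, and both function names are hypothetical.

```python
def parse_common_structure(field_value):
    """Parse '1#named_dictionary' into an ordered list of (name, params).

    Assumption: values are plain tokens or numbers, so splitting on ","
    and ";" is safe.  Quoted strings are not handled in this sketch.
    """
    result = []
    for element in field_value.split(","):
        name, *params = [part.strip() for part in element.split(";")]
        if not name:
            raise ValueError("empty list element")
        dictionary = {}
        for param in params:
            key, sep, value = param.partition("=")
            # A parameter without "=" is stored with the value None.
            dictionary[key.strip()] = value.strip() if sep else None
        result.append((name, dictionary))
    return result


def validate_accept(parsed):
    """Header-specific validation: any q weight must lie in 0..1."""
    for name, params in parsed:
        q = params.get("q")
        if q is not None and not 0.0 <= float(q) <= 1.0:
            raise ValueError("q out of range for " + name)
    return parsed


print(validate_accept(parse_common_structure("text/html;q=0.5, */*;q=0.1")))
```

The parser knows nothing about 'Accept'; only validate_accept carries header-specific knowledge, which is the part a bespoke parser would otherwise mix into its parsing code.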
440    This then is probably where the next big potential for improvement
441    lies:

443    Ideally a machine readable "data dictionary" which makes it possible
444    to copy that text out of RFCs, run it through a code generator which
445    spits out validation code which operates on the output of the common
446    parser.

448    But history has been particularly unkind to that idea.

450    Most attempts studied as part of this effort have sunk under
451    complexity caused by reaching for generality, but where scope has
452    been wisely limited, it seems to be possible.

454    So file that idea under "future work".

456 A.4.  RFC723x headers with "common structure"

458      Accept              [RFC7231, Section 5.3.2]
459      Accept-Charset      [RFC7231, Section 5.3.3]
460      Accept-Encoding     [RFC7231, Section 5.3.4][RFC7694, Section 3]
461      Accept-Language     [RFC7231, Section 5.3.5]
462      Age                 [RFC7234, Section 5.1]
463      Allow               [RFC7231, Section 7.4.1]
464      Connection          [RFC7230, Section 6.1]
465      Content-Encoding    [RFC7231, Section 3.1.2.2]
466      Content-Language    [RFC7231, Section 3.1.3.2]
467      Content-Length      [RFC7230, Section 3.3.2]
468      Content-Type        [RFC7231, Section 3.1.1.5]
469      Expect              [RFC7231, Section 5.1.1]
470      Max-Forwards        [RFC7231, Section 5.1.2]
471      MIME-Version        [RFC7231, Appendix A.1]
472      TE                  [RFC7230, Section 4.3]
473      Trailer             [RFC7230, Section 4.4]
474      Transfer-Encoding   [RFC7230, Section 3.3.1]
475      Upgrade             [RFC7230, Section 6.7]
476      Vary                [RFC7231, Section 7.1.4]

478 A.5.
      RFC723x headers with "uncommon structure"

480    1 of the RFC723x headers is only reserved, and therefore has no
481    structure at all:

483      Close               [RFC7230, Section 8.1]

485    5 of the RFC723x headers are HTTP dates:

487      Date                [RFC7231, Section 7.1.1.2]
488      Expires             [RFC7234, Section 5.3]
489      If-Modified-Since   [RFC7232, Section 3.3]
490      If-Unmodified-Since [RFC7232, Section 3.4]
491      Last-Modified       [RFC7232, Section 2.2]

493    24 of the RFC723x headers use bespoke formats which only a single or
494    in rare cases two headers share:

496      Accept-Ranges       [RFC7233, Section 2.3]
497          bytes-unit / other-range-unit

499      Authorization       [RFC7235, Section 4.2]
500      Proxy-Authorization [RFC7235, Section 4.4]
501          credentials

503      Cache-Control       [RFC7234, Section 5.2]
504          1#cache-directive

506      Content-Location    [RFC7231, Section 3.1.4.2]
507          absolute-URI / partial-URI

509      Content-Range       [RFC7233, Section 4.2]
510          byte-content-range / other-content-range

512      ETag                [RFC7232, Section 2.3]
513          entity-tag

515      Forwarded           [RFC7239]
516          1#forwarded-element

518      From                [RFC7231, Section 5.5.1]
519          mailbox

521      If-Match            [RFC7232, Section 3.1]
522      If-None-Match       [RFC7232, Section 3.2]
523          "*" / 1#entity-tag

525      If-Range            [RFC7233, Section 3.2]
526          entity-tag / HTTP-date

528      Host                [RFC7230, Section 5.4]
529          uri-host [ ":" port ]

531      Location            [RFC7231, Section 7.1.2]
532          URI-reference

534      Pragma              [RFC7234, Section 5.4]
535          1#pragma-directive

537      Range               [RFC7233, Section 3.1]
538          byte-ranges-specifier / other-ranges-specifier

540      Referer             [RFC7231, Section 5.5.2]
541          absolute-URI / partial-URI

543      Retry-After         [RFC7231, Section 7.1.3]
544          HTTP-date / delay-seconds

546      Server              [RFC7231, Section 7.4.2]
547      User-Agent          [RFC7231, Section 5.5.3]
548          product *( RWS ( product / comment ) )

550      Via                 [RFC7230, Section 5.7.1]
551          1#( received-protocol RWS received-by [ RWS comment ] )

553      Warning             [RFC7234, Section 5.5]
554          1#warning-value

556      Proxy-Authenticate  [RFC7235, Section 4.3]
557      WWW-Authenticate    [RFC7235, Section 4.1]
558
             1#challenge

560 Author's Address

562    Poul-Henning Kamp
563    The Varnish Cache Project

565    Email: phk@varnish-cache.org