idnits 2.17.1

draft-kamp-httpbis-structure-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document date (October 05, 2016) is 2758 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.
  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '5' on line 399

  == Missing Reference: 'RFC7694' is mentioned on line 450, but not defined

  ** Obsolete undefined reference: RFC 7694 (Obsoleted by RFC 9110)

  == Missing Reference: 'Section 3' is mentioned on line 450, but not defined

  == Missing Reference: 'RFC7239' is mentioned on line 505, but not defined

  == Unused Reference: 'RFC5234' is defined on line 307, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 7230 (Obsoleted by RFC 9110, RFC 9112)

     Summary: 3 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                           PH. Kamp
3    Internet-Draft                                 The Varnish Cache Project
4    Intended status: Informational                          October 05, 2016
5    Expires: April 8, 2017

7                         HTTP header common structure
8                        draft-kamp-httpbis-structure-00

10   Abstract

12      An abstract data model for HTTP headers, "Common Structure", and an
13      HTTP/1 serialization of it, generalized from current HTTP headers.

15   Status of This Memo

17      This Internet-Draft is submitted in full conformance with the
18      provisions of BCP 78 and BCP 79.

20      Internet-Drafts are working documents of the Internet Engineering
21      Task Force (IETF). Note that other groups may also distribute
22      working documents as Internet-Drafts. The list of current Internet-
23      Drafts is at http://datatracker.ietf.org/drafts/current/.

25      Internet-Drafts are draft documents valid for a maximum of six months
26      and may be updated, replaced, or obsoleted by other documents at any
27      time. It is inappropriate to use Internet-Drafts as reference
28      material or to cite them other than as "work in progress."

30      This Internet-Draft will expire on April 8, 2017.
32   Copyright Notice

34      Copyright (c) 2016 IETF Trust and the persons identified as the
35      document authors. All rights reserved.

37      This document is subject to BCP 78 and the IETF Trust's Legal
38      Provisions Relating to IETF Documents
39      (http://trustee.ietf.org/license-info) in effect on the date of
40      publication of this document. Please review these documents
41      carefully, as they describe your rights and restrictions with respect
42      to this document. Code Components extracted from this document must
43      include Simplified BSD License text as described in Section 4.e of
44      the Trust Legal Provisions and are provided without warranty as
45      described in the Simplified BSD License.

47   1. Introduction

49      The HTTP protocol does not impose any structure or datamodel on the
50      information in HTTP headers; the HTTP/1 serialization is the
51      datamodel: An ASCII string without control characters.

53      HTTP header definitions specify how the string must be formatted and
54      while families of similar headers exist, it still requires an
55      uncomfortably large number of bespoke parser and validation routines
56      to process HTTP traffic correctly.

58      In order to improve performance HTTP/2 and HPACK use naive text-
59      compression, which incidentally decoupled the on-the-wire
60      serialization from the data model.

62      During the development of HPACK it became evident that significantly
63      bigger gains were available if semantic compression could be used,
64      most notably with timestamps. However, the lack of a common data
65      structure for HTTP headers would make semantic compression one long
66      list of special cases.

68      Parallel to this, various proposals for how to fulfill data-
69      transportation needs, and to a lesser degree to impose some kind of
70      order on HTTP headers, at least going forward, were floated.

72      All of these proposals (JSON, CBOR etc.) run into the same basic
73      problem: Their serialization is incompatible with [RFC7230]'s ABNF
74      definition of 'field-value'.
76      For binary formats, such as CBOR, a wholesale base64/85
77      reserialization would be needed, with negative results for both
78      debuggability and bandwidth.

80      For textual formats, such as JSON, the format must first be neutered
81      to not violate field-value's ABNF, and then workarounds added to
82      reintroduce the features just lost, for instance UNICODE strings, and
83      suddenly it is no longer JSON.

85      This proposal starts from the other end, and builds and generalizes a
86      data structure definition from existing HTTP headers, which means
87      that HTTP/1 serialization and 'field-value' compatibility are built
88      in.

90      If all new HTTP headers are defined to fit into this Common Structure
91      we have at least halted the proliferation of bespoke parsers and
92      started to pave the road for semantic compression serializations of
93      HTTP traffic.

95   1.1. Terminology

97      In this document, the key words "MUST", "MUST NOT", "REQUIRED",
98      "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
99      and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
100     [RFC2119].

102  2. Definition of HTTP header Common Structure

104     The data model of Common Structure is an ordered sequence of named
105     dictionaries. Please see Appendix A for how this model was derived.

107     The definition of the data model is deliberately abstract, uncoupled
108     from any protocol serialization or programming environment
109     representation, meant as the foundation on which all such
110     manifestations of the model can be built.

112     Common Structure in ABNF:

114       import token from RFC7230
115       import DIGIT from RFC5234

117       common-structure = 1* ( identifier dictionary )

119       dictionary = * ( identifier value )

121       value = identifier /
122               number /
123               ascii_string /
124               unicode_string /
125               blob /
126               timestamp /
127               common-structure

129       identifier = (token / "*") [ token / "*" ]

131       number = ["-"] 1*15 DIGIT
132               # XXX: Not sure how to do this in ABNF:
133               # XXX: A single "." allowed between any two digits
134               # The range is limited to ensure it can be
135               # correctly represented in IEEE754 64 bit
136               # binary floating point format.

138       ascii_string = * %x20-7e
139               # This is a "safe" string in the sense that it
140               # contains no control characters or multi-byte
141               # sequences. If that is not fancy enough, use
142               # unicode_string.

144       unicode_string = * unicode_codepoint
145               # XXX: Is there a place to import this from ?
146               # Unrestricted unicode, because there is no sane
147               # way to restrict or otherwise make unicode "safe".

149       blob = * %x00-ff
150               # Intended for cryptographic data and as a general
151               # escape mechanism for unmet requirements.

153       timestamp = POSIX time_t with optional millisecond resolution
154               # XXX: Is there a place to import this from ?

156  3. HTTP/1 serialization of HTTP header Common Structure

158     In ABNF:

160       import OWS from RFC7230
161       import HEXDIG, DQUOTE from RFC5234
162       h1_common-structure-header =
163               ( field-name ":" OWS ">" h1_common_structure "<" ) /
164                       # Self-identifying HTTP headers
165               ( field-name ":" OWS h1_common_structure )
166                       # legacy HTTP headers on white-list, see Section 8

168       h1_common_structure = h1_element * ("," h1_element)

170       h1_element = identifier * (";" identifier ["=" h1_value])

172       h1_value = identifier /
173               number /
174               h1_ascii_string /
175               h1_unicode_string /
176               h1_blob /
177               h1_timestamp /
178               h1_common-structure

180       h1_ascii_string = DQUOTE *(
181                         ( "\" DQUOTE ) /
182                         ( "\" "\" ) /
183                         %x20-21 /
184                         %x23-5B /
185                         %x5D-7E
186                         ) DQUOTE
187               # This is a proper subset of h1_unicode_string
188               # NB only allowed backslash escapes are \" and \\

190       h1_unicode_string = DQUOTE *(
191                           ( "\" DQUOTE ) /
192                           ( "\" "\" ) /
193                           ( "\" "u" 4*HEXDIG ) /
194                           %x20-21 /
195                           %x23-5B /
196                           %x5D-7E /
197                           %x80-F7
198                           ) DQUOTE
199               # XXX: how to say/import "UTF-8 encoding" ?
200               # HTTP1 unfriendly codepoints (00-1f, 7f) must be
201               # encoded with \uXXXX escapes

203       h1_blob = "'" base64 "'"
204               # XXX: where to import base64 from ?

206       h1_timestamp = number
207               # UNIX/POSIX time_t semantics.
208               # fractional seconds allowed.

210       h1_common-structure = ">" h1_common_structure "<"

212     XXX: Allow OWS in parsers, but not in generators ?

214     In programming environments which do not define a native
215     representation or serialization of Common Structure, the HTTP/1
216     serialization should be used.

218  4. When to use Common Structure parser

220     All future standardized and all private HTTP headers using Common
221     Structure should self-identify as such, in the HTTP/1 serialization
222     by making the first character ">" and the last "<". (These two
223     characters are deliberately "the wrong way" to not clash with
224     existing usages.)

226     Legacy HTTP headers which fit into Common Structure are marked as
227     such in the IANA Message Header Registry (see Section 8), and a snapshot
228     of the registry can be used to trigger parsing according to Common
229     Structure of these headers.

231  5. Desired normative effects

233     All new HTTP headers SHOULD use the Common Structure if at all
234     possible.

236  6. Open/Outstanding issues to resolve

238  6.1. Single/multiple headers

240     Should we allow splitting common structure data over multiple headers
241     ?

243     Pro:

245     Avoids size restrictions, easier on-the-fly editing

247     Contra:

249     Cannot act on any such header until all headers have been received.

251     We must define where headers can be split (between identifier and
252     dictionary ?, in the middle of dictionaries ?)

254     Most on-the-fly editing is hackish at best.

256  7. Future work

258  7.1. Redefining existing headers for better performance

260     The HTTP/1 serialization's self-identification mechanism makes it
261     possible to extend the definition of existing Appendix C headers into
262     Common Structure.
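The self-identification check that such a redefinition relies on (Section 4) could be sketched as follows. This is an illustrative Python sketch, not part of the draft: the function name `common_structure_payload` and the contents of `LEGACY_WHITELIST` are assumptions made for the example, standing in for a snapshot of the registry entries marked as Common Structure.

```python
# Hypothetical snapshot of registry entries marked "Common Structure: True".
# The header names are taken from Appendix B; the set itself is illustrative.
LEGACY_WHITELIST = frozenset({"accept", "accept-encoding", "connection", "vary"})


def common_structure_payload(field_name, field_value):
    """Return the text to hand to a Common Structure parser, or None."""
    value = field_value.strip(" \t")  # drop optional whitespace around the value
    if len(value) >= 2 and value[0] == ">" and value[-1] == "<":
        return value[1:-1]  # self-identifying header: strip the ">" ... "<" wrapper
    if field_name.lower() in LEGACY_WHITELIST:  # field names are case-insensitive
        return value  # legacy header marked in the registry snapshot
    return None  # bespoke format, needs its own parser
```

Under these assumptions a receiver can decide per header whether the common parser applies, without knowing the header's definition in advance.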
264     For instance one could imagine:

266       Date: >1475061449.201<

268     Which would be faster to parse and validate than the current
269     definition of the Date header and more precise too.

271     Some kind of signal/negotiation mechanism would be required to make
272     this work in practice.

274  7.2. Define a validation dictionary

276     A machine-readable specification of the legal contents of HTTP
277     headers would go a long way to improve efficiency and security in
278     HTTP implementations.

280  8. IANA considerations

282     The IANA Message Header Registry will be extended with an additional
283     field named "Common Structure" which can have the values "True",
284     "False" or "Unknown".

286     The RFC723x headers listed in Appendix B will get the value "True" in
287     the new field.

289     The RFC723x headers listed in Appendix C will get the value "False"
290     in the new field.

292     All other existing entries in the registry will be set to "Unknown"
293     until and if the owner of the entry requests otherwise.

295  9. Normative References

297     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
298                Requirement Levels", BCP 14, RFC 2119,
299                DOI 10.17487/RFC2119, March 1997,
300                <http://www.rfc-editor.org/info/rfc2119>.

302     [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
303                Protocol (HTTP/1.1): Message Syntax and Routing",
304                RFC 7230, DOI 10.17487/RFC7230, June 2014,
305                <http://www.rfc-editor.org/info/rfc7230>.

307     [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
308                Specifications: ABNF", STD 68, RFC 5234,
309                DOI 10.17487/RFC5234, January 2008,
310                <http://www.rfc-editor.org/info/rfc5234>.

312  Appendix A. Do HTTP headers have any common structure ?

314     Several proposals have been floated in recent years to use some
315     preexisting structured data serialization or other for HTTP headers,
316     to impose some sanity.

318     None of these proposals have gained traction and no obvious candidate
319     data serializations have been left unexamined.
321     This effort tries to tackle the question from the other side, by
322     asking if there is a common structure in existing HTTP headers we can
323     generalize for this purpose.

325  A.1. Survey of HTTP header structure

327     The RFC723x family of HTTP/1 standards controls 49 entries in the IANA
328     Message Header Registry, and they share two common motifs.

330     The majority of RFC723x HTTP headers are lists. A few of them are
331     ordered ('Content-Encoding'), some are unordered ('Connection') and
332     some are ordered by 'q=%f' weight parameters ('Accept').

334     In most cases, the list elements are some kind of identifier, usually
335     derived from ABNF 'token' as defined by [RFC7230].

337     A subgroup of headers, mostly related to MIME, uses what one could
338     call a 'qualified token':

340       qualified_token = token_or_asterix [ "/" token_or_asterix ]

342     The second motif is parameterized list elements. The best known is
343     the "q=0.5" weight parameter, but other parameters exist as well.

345     Generalizing from these motifs, our candidate "Common Structure" data
346     model becomes an ordered list of named dictionaries.

348     In pidgin ABNF, ignoring white-space for the sake of clarity, the
349     HTTP/1.1 serialization of Common Structure is something like:

351       token_or_asterix = token from [RFC7230], but also allowing "*"

353       qualified_token = token_or_asterix [ "/" token_or_asterix ]

355       field-name, see [RFC7230]

357       Common_Structure_Header = field-name ":" 1#named_dictionary

359       named_dictionary = qualified_token [ *(";" param) ]

361       param = token [ "=" value ]

363       value = we'll get back to this in a moment.

365     Nineteen out of the RFC723x's 49 headers, almost 40%, can already be
366     parsed using this definition, and none of the rest have requirements
367     which could not be met by this data model. See Appendix B and
368     Appendix C for the full survey details.

370  A.2. Survey of values in HTTP headers

372     Surveying the datatypes of HTTP headers, standardized as well as
373     private, the following picture emerges:

375  A.2.1. Numbers

377     Integer and floating point are both used. Range and precision are
378     mostly unspecified in controlling documents.

380     Scientific notation (9.192631770e9) does not seem to be used
381     anywhere.

383     The ranges used seem to be minus several thousand to plus a couple of
384     billions, the high end almost exclusively being POSIX time_t
385     timestamps.

387  A.2.2. Timestamps

389     RFC723x text format, but POSIX time_t represented as integer or
390     floating point is not uncommon. ISO8601 has also been spotted.

392  A.2.3. Strings

394     The vast majority are pure ASCII strings, with either no escapes, %xx
395     URL-like escapes or C-style back-slash escapes, possibly with the
396     addition of \uxxxx UNICODE escapes.

398     Where non-ASCII character sets are used, they are almost always
399     implicit, rather than explicit. UTF8 and ISO-8859-1[5] seem to be
400     most common.

402  A.2.4. Binary blobs

404     Often used for cryptographic data. Usually in base64 encoding,
405     sometimes ""-quoted, more often not. base85 encoding is also seen,
406     usually quoted.

408  A.2.5. Identifiers

410     Seem to almost always fit in the RFC723x 'token' definition.

412  A.3. Is this actually a useful thing to generalize ?

414     The number one wishlist item seems to be UNICODE strings, with a big
415     side order of not having to write a new parser routine every time
416     somebody comes up with a new header.

418     Having a common parser would indeed be a good thing, and having an
419     underlying data model which makes it possible to define a compressed
420     serialization, rather than rely on serialization to text followed by
421     text compression (i.e. HPACK) seems like a good idea too.
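To make the idea of a common parser concrete, the pidgin ABNF from A.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name `parse_common_structure` is hypothetical, 'value' is limited here to a token or a double-quoted string and returned as raw text, and the tchar set is the one RFC 7230 gives for 'token' (which already contains "*").

```python
import re

# RFC 7230 'tchar' set; "*" is already part of it.
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
QUALIFIED = re.compile(rf"{TCHAR}+(?:/{TCHAR}+)?")  # qualified_token
TOKEN = re.compile(rf"{TCHAR}+")
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')  # assumed value form, kept as raw text
WS = re.compile(r"[ \t]*")


def parse_common_structure(field_value):
    """Parse '1#named_dictionary' into a list of (name, params) pairs."""
    pos = 0
    result = []

    def skip_ws():
        nonlocal pos
        pos = WS.match(field_value, pos).end()

    def expect(pattern, what):
        nonlocal pos
        m = pattern.match(field_value, pos)
        if not m:
            raise ValueError(f"expected {what} at offset {pos}")
        pos = m.end()
        return m.group(0)

    while True:
        skip_ws()
        name = expect(QUALIFIED, "qualified_token")
        params = {}
        skip_ws()
        while field_value.startswith(";", pos):  # *( ";" param )
            pos += 1
            skip_ws()
            key = expect(TOKEN, "param name")
            value = None  # a param without "=" carries no value
            if field_value.startswith("=", pos):
                pos += 1
                if field_value.startswith('"', pos):
                    value = expect(QUOTED, "quoted value")
                else:
                    value = expect(TOKEN, "value")
            params[key] = value
            skip_ws()
        result.append((name, params))
        if pos == len(field_value):
            return result
        if not field_value.startswith(",", pos):
            raise ValueError(f"unexpected character at offset {pos}")
        pos += 1  # consume "," and parse the next named_dictionary
```

The same routine handles an 'Accept' value such as `text/html;q=0.8, */*;q=0.1` and a 'Connection' value such as `keep-alive, Upgrade`; what it cannot do is decide whether the parsed data makes sense for a given header.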
423     However, when using a datamodel and a parser general enough to
424     transport useful data, it will have to be followed by a validation
425     step, which checks that the data also makes sense.

427     Today validation, such as it is, is often done by the bespoke
428     parsers.

430     This then is probably where the next big potential for improvement
431     lies:

433     Ideally a machine readable "data dictionary" which makes it possible
434     to copy that text out of RFCs, run it through a code generator which
435     spits out validation code which operates on the output of the common
436     parser.

438     But history has been particularly unkind to that idea.

440     Most attempts studied as part of this effort have sunk under
441     complexity caused by reaching for generality, but where scope has
442     been wisely limited, it seems to be possible.

444     So file that idea under "future work".

446  Appendix B. RFC723x headers with "common structure"

448       Accept              [RFC7231, Section 5.3.2]
449       Accept-Charset      [RFC7231, Section 5.3.3]
450       Accept-Encoding     [RFC7231, Section 5.3.4][RFC7694, Section 3]
451       Accept-Language     [RFC7231, Section 5.3.5]
452       Age                 [RFC7234, Section 5.1]
453       Allow               [RFC7231, Section 7.4.1]
454       Connection          [RFC7230, Section 6.1]
455       Content-Encoding    [RFC7231, Section 3.1.2.2]
456       Content-Language    [RFC7231, Section 3.1.3.2]
457       Content-Length      [RFC7230, Section 3.3.2]
458       Content-Type        [RFC7231, Section 3.1.1.5]
459       Expect              [RFC7231, Section 5.1.1]
460       Max-Forwards        [RFC7231, Section 5.1.2]
461       MIME-Version        [RFC7231, Appendix A.1]
462       TE                  [RFC7230, Section 4.3]
463       Trailer             [RFC7230, Section 4.4]
464       Transfer-Encoding   [RFC7230, Section 3.3.1]
465       Upgrade             [RFC7230, Section 6.7]
466       Vary                [RFC7231, Section 7.1.4]

468  Appendix C. RFC723x headers with "uncommon structure"

470     1 of the RFC723x headers is only reserved, and therefore has no
471     structure at all:

473       Close               [RFC7230, Section 8.1]

475     5 of the RFC723x headers are HTTP dates:

477       Date                [RFC7231, Section 7.1.1.2]
478       Expires             [RFC7234, Section 5.3]
479       If-Modified-Since   [RFC7232, Section 3.3]
480       If-Unmodified-Since [RFC7232, Section 3.4]
481       Last-Modified       [RFC7232, Section 2.2]

483     24 of the RFC723x headers use bespoke formats which only a single or
484     in rare cases two headers share:

486       Accept-Ranges       [RFC7233, Section 2.3]
487          bytes-unit / other-range-unit

489       Authorization       [RFC7235, Section 4.2]
490       Proxy-Authorization [RFC7235, Section 4.4]
491          credentials

493       Cache-Control       [RFC7234, Section 5.2]
494          1#cache-directive

496       Content-Location    [RFC7231, Section 3.1.4.2]
497          absolute-URI / partial-URI

499       Content-Range       [RFC7233, Section 4.2]
500          byte-content-range / other-content-range

502       ETag                [RFC7232, Section 2.3]
503          entity-tag

505       Forwarded           [RFC7239]
506          1#forwarded-element

508       From                [RFC7231, Section 5.5.1]
509          mailbox

511       If-Match            [RFC7232, Section 3.1]
512       If-None-Match       [RFC7232, Section 3.2]
513          "*" / 1#entity-tag

515       If-Range            [RFC7233, Section 3.2]
516          entity-tag / HTTP-date

518       Host                [RFC7230, Section 5.4]
519          uri-host [ ":" port ]

521       Location            [RFC7231, Section 7.1.2]
522          URI-reference

524       Pragma              [RFC7234, Section 5.4]
525          1#pragma-directive

527       Range               [RFC7233, Section 3.1]
528          byte-ranges-specifier / other-ranges-specifier

530       Referer             [RFC7231, Section 5.5.2]
531          absolute-URI / partial-URI

533       Retry-After         [RFC7231, Section 7.1.3]
534          HTTP-date / delay-seconds

536       Server              [RFC7231, Section 7.4.2]
537       User-Agent          [RFC7231, Section 5.5.3]
538          product *( RWS ( product / comment ) )

540       Via                 [RFC7230, Section 5.7.1]
541          1#( received-protocol RWS received-by [ RWS comment ] )

543       Warning             [RFC7234, Section 5.5]
544          1#warning-value

546       Proxy-Authenticate  [RFC7235, Section 4.3]
547       WWW-Authenticate    [RFC7235, Section 4.1]
548          1#challenge

550  Author's Address

552     Poul-Henning Kamp
553     The Varnish Cache Project

555     Email: phk@varnish-cache.org