Network Working Group                                           PH. Kamp
Internet-Draft                                 The Varnish Cache Project
Intended status: Informational                          October 30, 2016
Expires: May 3, 2017

                      HTTP header common structure
                    draft-kamp-httpbis-structure-01

Abstract

   An abstract data model for HTTP headers, "Common Structure", and an
   HTTP/1 serialization of it, generalized from current HTTP headers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 3, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Introduction

   HTTP does not impose any structure or data model on the information
   in HTTP headers; the HTTP/1 serialization is the data model: an
   ASCII string without control characters.

   HTTP header definitions specify how the string must be formatted,
   and while families of similar headers exist, it still requires an
   uncomfortably large number of bespoke parser and validation routines
   to process HTTP traffic correctly.

   To improve performance, HTTP/2 and HPACK use naive text compression,
   which incidentally decoupled the on-the-wire serialization from the
   data model.

   During the development of HPACK it became evident that significantly
   bigger gains were available if semantic compression could be used,
   most notably with timestamps. However, the lack of a common data
   structure for HTTP headers would make semantic compression one long
   list of special cases.

   In parallel, various proposals were floated for how to fulfill
   data-transportation needs and, to a lesser degree, to impose some
   kind of order on HTTP headers, at least going forward.

   All of these proposals (JSON, CBOR, etc.) run into the same basic
   problem: their serialization is incompatible with [RFC7230]'s ABNF
   definition of 'field-value'.

   For binary formats, such as CBOR, a wholesale base64/85
   reserialization would be needed, with negative results for both
   debuggability and bandwidth.

   For textual formats, such as JSON, the format must first be neutered
   to not violate field-value's ABNF, and then workarounds added to
   reintroduce the features just lost, for instance UNICODE strings, at
   which point it is no longer JSON.

   This proposal starts from the other end, and builds and generalizes a
   data structure definition from existing HTTP headers, which means
   that HTTP/1 serialization and 'field-value' compatibility are built
   in.

   If all future HTTP headers are defined to fit into this Common
   Structure, we have at least halted the proliferation of bespoke
   parsers and started to pave the road for semantic compression
   serializations of HTTP traffic.

1.1. Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2. Definition of HTTP header Common Structure

   The data model of Common Structure is an ordered sequence of named
   dictionaries. Please see Appendix A for how this model was derived.

   The definition of the data model is deliberately abstract and
   uncoupled from any protocol serialization or programming environment
   representation; it is meant as the foundation on which all such
   manifestations of the model can be built.

   Common Structure in ABNF:

     import token from [RFC7230]
     import DIGIT from [RFC5234]

     common-structure = 1* ( identifier dictionary )

     dictionary = * ( identifier value )

     value = identifier /
             number /
             ascii_string /
             unicode_string /
             blob /
             timestamp /
             common-structure

     identifier = token [ "/" token ]

     number = ["-"] 1*15 DIGIT
             # XXX: Not sure how to do this in ABNF:
             # XXX: A single "."
             # allowed between any two digits
             # The range is limited to ensure it can be
             # correctly represented in IEEE754 64 bit
             # binary floating point format.

     ascii_string = * %x20-7e
             # This is a "safe" string in the sense that it
             # contains no control characters or multi-byte
             # sequences. If that is not fancy enough, use
             # unicode_string.

     unicode_string = * unicode_codepoint
             # XXX: Is there a place to import this from?
             # Unrestricted unicode, because there is no sane
             # way to restrict or otherwise make unicode "safe".

     blob = * %x00-ff
             # Intended for cryptographic data and as a general
             # escape mechanism for unmet requirements.

     timestamp = POSIX time_t with optional millisecond resolution
             # XXX: Is there a place to import this from?

3. HTTP/1 serialization of HTTP header Common Structure

   In ABNF:

     import OWS from [RFC7230]
     import HEXDIG, DQUOTE from [RFC5234]
     import UTF8-2, UTF8-3, UTF8-4 from [RFC3629]

     h1_common-structure-header =
             ( field-name ":" OWS ">" h1_common_structure "<" ) /
                     # Self-identifying HTTP headers
             ( field-name ":" OWS h1_common_structure )
                     # legacy HTTP headers on white-list, see Section 8

     h1_common_structure = h1_element * ("," h1_element)

     h1_element = identifier * (";" identifier ["=" h1_value])

     h1_value = identifier /
             number /
             h1_ascii_string /
             h1_unicode_string /
             h1_blob /
             h1_timestamp /
             h1_common-structure

     h1_ascii_string = DQUOTE *(
                       ( "\" DQUOTE ) /
                       ( "\" "\" ) /
                       %x20-21 /
                       %x23-5B /
                       %x5D-7E
                       ) DQUOTE
             # This is a proper subset of h1_unicode_string
             # NB only allowed backslash escapes are \" and \\

     h1_unicode_string = DQUOTE *(
                       ( "\" DQUOTE ) /
                       ( "\" "\" ) /
                       ( "\" "u" 4*HEXDIG ) /
                       %x20-21 /
                       %x23-5B /
                       %x5D-7E /
                       UTF8-2 /
                       UTF8-3 /
                       UTF8-4
                       ) DQUOTE
             # This is UTF8 with HTTP1 unfriendly codepoints
             # (00-1f, 7f) neutered with
             # \uXXXX escapes.

     h1_blob = "'" base64 "'"
             # XXX: where to import base64 from?

     h1_timestamp = number
             # UNIX/POSIX time_t semantics.
             # fractional seconds allowed.

     h1_common-structure = ">" h1_common_structure "<"

   XXX: Allow OWS in parsers, but not in generators?

   In programming environments which do not define a native
   representation or serialization of Common Structure, the HTTP/1
   serialization should be used.

4. When to use Common Structure parser

   All future standardized and all private HTTP headers using Common
   Structure should self-identify as such. In the HTTP/1 serialization
   this is done by making the first character ">" and the last "<".
   (These two characters are deliberately "the wrong way" to not clash
   with existing usages.)

   Legacy HTTP headers which fit into Common Structure are marked as
   such in the IANA Message Header Registry (see Section 8), and a
   snapshot of the registry can be used to trigger parsing according to
   Common Structure of these headers.

5. Desired normative effects

   All new HTTP headers SHOULD use the Common Structure if at all
   possible.

6. Open/Outstanding issues to resolve

6.1. Single/multiple headers

   Should we allow splitting common structure data over multiple
   headers?

   Pro:

      Avoids size restrictions, easier on-the-fly editing

   Contra:

      Cannot act on any such header until all headers have been
      received.

      We must define where headers can be split (between identifier and
      dictionary? in the middle of dictionaries?)

      Most on-the-fly editing is hackish at best.

7. Future work

7.1. Redefining existing headers for better performance

   The HTTP/1 serialization's self-identification mechanism makes it
   possible to extend the definition of existing Appendix C headers into
   Common Structure.

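   As a minimal sketch of how a receiver might act on this
   self-identification, the check amounts to looking at the first and
   last character of the field value. The function name
   unwrap_common_structure is hypothetical and not defined by this
   document, and stripping surrounding whitespace is an assumption,
   since the OWS question in Section 3 is still open.

```python
def unwrap_common_structure(field_value):
    """Return the text between '>' and '<' if the value self-identifies
    as Common Structure, otherwise None (hypothetical helper)."""
    # Assumption: tolerate optional whitespace around the field value.
    value = field_value.strip(" \t")
    if len(value) >= 2 and value[0] == ">" and value[-1] == "<":
        return value[1:-1]
    # Not self-identifying: fall back to the legacy, per-header parser.
    return None


print(unwrap_common_structure(">1475061449.201<"))  # 1475061449.201
print(unwrap_common_structure("gzip, chunked"))     # None
```

   A value that does not self-identify would still need the registry
   snapshot described in Section 4 to decide which parser applies.
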
   For instance one could imagine:

     Date: >1475061449.201<

   This would be faster to parse and validate than the current
   definition of the Date header, and more precise too.

   Some kind of signal/negotiation mechanism would be required to make
   this work in practice.

7.2. Define a validation dictionary

   A machine-readable specification of the legal contents of HTTP
   headers would go a long way to improve efficiency and security in
   HTTP implementations.

8. IANA considerations

   The IANA Message Header Registry will be extended with an additional
   field named "Common Structure" which can have the values "True",
   "False" or "Unknown".

   The RFC723x headers listed in Appendix B will get the value "True" in
   the new field.

   The RFC723x headers listed in Appendix C will get the value "False"
   in the new field.

   All other existing entries in the registry will be set to "Unknown"
   unless and until the owner of the entry requests otherwise.

9. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

Appendix A. Do HTTP headers have any common structure?

   Several proposals have been floated in recent years to use some
   preexisting structured data serialization or other for HTTP headers,
   to impose some sanity.

   None of these proposals have gained traction and no obvious candidate
   data serializations have been left unexamined.

   This effort tries to tackle the question from the other side, by
   asking if there is a common structure in existing HTTP headers we can
   generalize for this purpose.

A.1. Survey of HTTP header structure

   The RFC723x family of HTTP/1 standards controls 49 entries in the
   IANA Message Header Registry, and they share two common motifs.

   The majority of RFC723x HTTP headers are lists. A few of them are
   ordered ('Content-Encoding'), some are unordered ('Connection') and
   some are ordered by 'q=%f' weight parameters ('Accept').

   In most cases, the list elements are some kind of identifier, usually
   derived from ABNF 'token' as defined by [RFC7230].

   A subgroup of headers, mostly related to MIME, uses what one could
   call a 'qualified token':

     qualified_token = token_or_asterix [ "/" token_or_asterix ]

   The second motif is parameterized list elements. The best known is
   the "q=0.5" weight parameter, but other parameters exist as well.

   Generalizing from these motifs, our candidate "Common Structure" data
   model becomes an ordered list of named dictionaries.

   In pidgin ABNF, ignoring white-space for the sake of clarity, the
   HTTP/1.1 serialization of Common Structure is something like:

     token_or_asterix = token from [RFC7230], but also allowing "*"

     qualified_token = token_or_asterix [ "/" token_or_asterix ]

     field-name, see [RFC7230]

     Common_Structure_Header = field-name ":" 1#named_dictionary

     named_dictionary = qualified_token [ *(";" param) ]

     param = token [ "=" value ]

     value = we'll get back to this in a moment.

   Nineteen out of the RFC723x's 49 headers, almost 40%, can already be
   parsed using this definition, and none of the rest have requirements
   which could not be met by this data model. See Appendix B and
   Appendix C for the full survey details.

A.2.
Survey of values in HTTP headers

   Surveying the datatypes of HTTP headers, standardized as well as
   private, the following picture emerges:

A.2.1. Numbers

   Integer and floating point are both used. Range and precision are
   mostly unspecified in controlling documents.

   Scientific notation (9.192631770e9) does not seem to be used
   anywhere.

   The ranges used seem to be minus several thousand to plus a couple of
   billions, the high end almost exclusively being POSIX time_t
   timestamps.

A.2.2. Timestamps

   Usually the RFC723x text format, but POSIX time_t represented as an
   integer or floating point number is not uncommon. ISO 8601 has also
   been spotted.

A.2.3. Strings

   The vast majority are pure ASCII strings, with either no escapes, %xx
   URL-like escapes or C-style back-slash escapes, possibly with the
   addition of \uxxxx UNICODE escapes.

   Where non-ASCII character sets are used, they are almost always
   implicit, rather than explicit. UTF8 and ISO-8859-1[5] seem to be
   most common.

A.2.4. Binary blobs

   Often used for cryptographic data. Usually in base64 encoding,
   sometimes ""-quoted, more often not. Base85 encoding is also seen,
   usually quoted.

A.2.5. Identifiers

   These seem to almost always fit in the RFC723x 'token' definition.

A.3. Is this actually a useful thing to generalize?

   The number one wishlist item seems to be UNICODE strings, with a big
   side order of not having to write a new parser routine every time
   somebody comes up with a new header.

   Having a common parser would indeed be a good thing, and having an
   underlying data model which makes it possible to define a compressed
   serialization, rather than rely on serialization to text followed by
   text compression (i.e., HPACK), seems like a good idea too.

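   To illustrate what such a common parser might look like, here is a
   minimal sketch for the pidgin ABNF of Appendix A.1. It is not part
   of this proposal: the name parse_named_dictionaries is hypothetical,
   values are kept as raw strings rather than typed, and quoted strings
   containing "," or ";" are assumed not to occur.

```python
def parse_named_dictionaries(field_value):
    """Parse a field value such as 'text/html;q=0.8, */*;q=0.1' into an
    ordered list of (name, dictionary) pairs (hypothetical helper)."""
    result = []
    for element in field_value.split(","):
        element = element.strip(" \t")
        if not element:
            continue  # tolerate empty list elements
        name, *params = [part.strip(" \t") for part in element.split(";")]
        dictionary = {}
        for param in params:
            if not param:
                continue  # tolerate a stray ';'
            key, sep, value = param.partition("=")
            # A parameter without '=' is recorded with the value None.
            dictionary[key.strip(" \t")] = value.strip(" \t") if sep else None
        result.append((name, dictionary))
    return result


print(parse_named_dictionaries("text/html;q=0.8, */*;q=0.1"))
print(parse_named_dictionaries("gzip, chunked"))
```

   Note that the sketch accepts any parameter name and any value; it
   only recovers structure and says nothing about whether the content
   is meaningful for a given header.
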
   However, a data model and a parser general enough to transport
   useful data will have to be followed by a validation step which
   checks that the data also makes sense.

   Today validation, such as it is, is often done by the bespoke
   parsers.

   This then is probably where the next big potential for improvement
   lies:

   Ideally there would be a machine-readable "data dictionary" which
   makes it possible to copy that text out of RFCs and run it through a
   code generator which spits out validation code operating on the
   output of the common parser.

   But history has been particularly unkind to that idea.

   Most attempts studied as part of this effort have sunk under
   complexity caused by reaching for generality, but where scope has
   been wisely limited, it seems to be possible.

   So file that idea under "future work".

Appendix B. RFC723x headers with "common structure"

   Accept            [RFC7231, Section 5.3.2]
   Accept-Charset    [RFC7231, Section 5.3.3]
   Accept-Encoding   [RFC7231, Section 5.3.4][RFC7694, Section 3]
   Accept-Language   [RFC7231, Section 5.3.5]
   Age               [RFC7234, Section 5.1]
   Allow             [RFC7231, Section 7.4.1]
   Connection        [RFC7230, Section 6.1]
   Content-Encoding  [RFC7231, Section 3.1.2.2]
   Content-Language  [RFC7231, Section 3.1.3.2]
   Content-Length    [RFC7230, Section 3.3.2]
   Content-Type      [RFC7231, Section 3.1.1.5]
   Expect            [RFC7231, Section 5.1.1]
   Max-Forwards      [RFC7231, Section 5.1.2]
   MIME-Version      [RFC7231, Appendix A.1]
   TE                [RFC7230, Section 4.3]
   Trailer           [RFC7230, Section 4.4]
   Transfer-Encoding [RFC7230, Section 3.3.1]
   Upgrade           [RFC7230, Section 6.7]
   Vary              [RFC7231, Section 7.1.4]

Appendix C.
RFC723x headers with "uncommon structure"

   One of the RFC723x headers is only reserved, and therefore has no
   structure at all:

   Close [RFC7230, Section 8.1]

   Five of the RFC723x headers are HTTP dates:

   Date [RFC7231, Section 7.1.1.2]
   Expires [RFC7234, Section 5.3]
   If-Modified-Since [RFC7232, Section 3.3]
   If-Unmodified-Since [RFC7232, Section 3.4]
   Last-Modified [RFC7232, Section 2.2]

   Twenty-four of the RFC723x headers use bespoke formats which only a
   single or in rare cases two headers share:

   Accept-Ranges [RFC7233, Section 2.3]
           bytes-unit / other-range-unit

   Authorization [RFC7235, Section 4.2]
   Proxy-Authorization [RFC7235, Section 4.4]
           credentials

   Cache-Control [RFC7234, Section 5.2]
           1#cache-directive

   Content-Location [RFC7231, Section 3.1.4.2]
           absolute-URI / partial-URI

   Content-Range [RFC7233, Section 4.2]
           byte-content-range / other-content-range

   ETag [RFC7232, Section 2.3]
           entity-tag

   Forwarded [RFC7239]
           1#forwarded-element

   From [RFC7231, Section 5.5.1]
           mailbox

   If-Match [RFC7232, Section 3.1]
   If-None-Match [RFC7232, Section 3.2]
           "*" / 1#entity-tag

   If-Range [RFC7233, Section 3.2]
           entity-tag / HTTP-date

   Host [RFC7230, Section 5.4]
           uri-host [ ":" port ]

   Location [RFC7231, Section 7.1.2]
           URI-reference

   Pragma [RFC7234, Section 5.4]
           1#pragma-directive

   Range [RFC7233, Section 3.1]
           byte-ranges-specifier / other-ranges-specifier

   Referer [RFC7231, Section 5.5.2]
           absolute-URI / partial-URI

   Retry-After [RFC7231, Section 7.1.3]
           HTTP-date / delay-seconds

   Server [RFC7231, Section 7.4.2]
   User-Agent [RFC7231, Section 5.5.3]
           product *( RWS ( product / comment ) )

   Via [RFC7230, Section 5.7.1]
           1#( received-protocol RWS received-by [ RWS comment ] )

   Warning [RFC7234, Section 5.5]
           1#warning-value

   Proxy-Authenticate [RFC7235, Section 4.3]
   WWW-Authenticate [RFC7235, Section 4.1]
           1#challenge

Author's Address

   Poul-Henning Kamp
   The Varnish Cache Project

   Email: phk@varnish-cache.org