HTTP Working Group                                         M. Nottingham
Internet-Draft                                                    Fastly
Intended status: Standards Track                                P-H. Kamp
Expires: May 31, 2018                          The Varnish Cache Project
                                                       November 27, 2017

                      Structured Headers for HTTP
                 draft-ietf-httpbis-header-structure-02

Abstract

   This document describes Structured Headers, a way of simplifying HTTP
   header field definition and parsing.  It is intended for use by new
   specifications of HTTP header fields.  This includes revisions of
   existing specifications when doing so does not cause interoperability
   issues.

Note to Readers

   Discussion of this draft takes place on the HTTP working group
   mailing list (ietf-http-wg@w3.org), which is archived at
   https://lists.w3.org/Archives/Public/ietf-http-wg/ [1].

   _RFC EDITOR: please remove this section before publication_

   Working Group information can be found at https://httpwg.github.io/
   [2]; source code and issues list for this draft can be found at
   https://github.com/httpwg/http-extensions/labels/header-structure
   [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 31, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   Specifying the syntax of new HTTP header fields is an onerous task;
   even with the guidance in [RFC7231], Section 8.3.1, there are many
   decisions - and pitfalls - for a prospective HTTP header field
   author.

   Likewise, bespoke parsers often need to be written for specific HTTP
   headers, because each has slightly different handling of what looks
   like common syntax.

   This document introduces structured HTTP header field values
   (hereafter, Structured Headers) to address these problems.
   Structured Headers define a generic, abstract model for data, along
   with a concrete serialisation for expressing that model in textual
   HTTP headers, as used by HTTP/1 [RFC7230] and HTTP/2 [RFC7540].

   HTTP headers that are defined as Structured Headers use the types
   defined in this specification to define their syntax and basic
   handling rules, thereby simplifying both their definition and
   parsing.

   Additionally, future versions of HTTP can define alternative
   serialisations of the abstract model of Structured Headers, allowing
   headers that use it to be transmitted more efficiently without being
   redefined.

   Note that it is not a goal of this document to redefine the syntax of
   existing HTTP headers; the mechanisms described herein are only
   intended to be used with headers that explicitly opt into them.

   To specify a header field that uses Structured Headers, see
   Section 2.

   Section 4 defines a number of abstract data types that can be used in
   Structured Headers, of which only three are allowed at the "top"
   level: lists, dictionaries, or items.

   Those abstract types can be serialised into textual headers - such as
   those used in HTTP/1 and HTTP/2 - using the algorithms described in
   Section 3.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC5234], including the DIGIT, ALPHA and DQUOTE rules from that
   document.  It also includes the OWS rule from [RFC7230].

2.  Specifying Structured Headers

   HTTP headers that use Structured Headers need to be defined to do so
   explicitly; recipients and generators need to know that the
   requirements of this document are in effect.  The simplest way to do
   that is by referencing this document in its definition.

   The field's definition will also need to specify the field-value's
   allowed syntax, in terms of the types described in Section 4, along
   with their associated semantics.

   Field definitions MUST NOT relax or otherwise modify the requirements
   of this specification; doing so would preclude handling by generic
   software.

   However, field definitions are encouraged to clearly state additional
   constraints upon the syntax, as well as the consequences when those
   constraints are violated.

   For example:

   # FooExample Header

   The FooExample HTTP header field conveys information about how much
   Foo the sender has.

   FooExample is a Structured header [RFCxxxx]. Its value MUST be a
   dictionary ([RFCxxxx], Section Y.Y).

   The dictionary MUST contain:

   * A member whose key is "foo", and whose value is an integer
     ([RFCxxxx], Section Y.Y), indicating the number of foos in
     the message.
   * A member whose key is "bar", and whose value is a string
     ([RFCxxxx], Section Y.Y), conveying the characteristic bar-ness
     of the message.

   If the parsed header field does not contain both, it MUST be ignored.

   Note that empty header field values are not allowed by the syntax,
   and therefore will be considered errors.
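
   As a rough, non-normative illustration of how a recipient might
   enforce the FooExample rules, the following Python sketch assumes the
   field value has already been parsed into a dict mapping keys to items.
   The function name check_foo_example is hypothetical.

```python
from typing import Optional


def check_foo_example(parsed: dict) -> Optional[dict]:
    """Apply the FooExample rules to an already-parsed dictionary.

    Returns the dictionary if it satisfies the definition, or None to
    signal that the header field MUST be ignored.
    """
    foo = parsed.get("foo")
    bar = parsed.get("bar")
    # "foo" must be an integer; bool is excluded because it subclasses int.
    if not isinstance(foo, int) or isinstance(foo, bool):
        return None
    # "bar" must be a string.
    if not isinstance(bar, str):
        return None
    # Any other keys are passed through; recipients ignore unknown keys.
    return parsed
```

   Returning None stands for "ignore the whole header field", which is
   the consequence the example definition states for a missing member.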

3.  Parsing Requirements for Textual Headers

   When a receiving implementation parses textual HTTP header fields
   (e.g., in HTTP/1 or HTTP/2) that are known to be Structured Headers,
   it is important that care be taken, as there are a number of edge
   cases that can cause interoperability or even security problems.
   This section specifies the algorithm for doing so.

   Given an ASCII string input_string that represents the chosen
   header's field-value, return the parsed header value.  Note that
   input_string may incorporate multiple header lines combined into one
   comma-separated field-value, as per [RFC7230], Section 3.2.2.

   1.  Discard any OWS from the beginning of input_string.

   2.  If the field-value is defined to be a dictionary, return the
       result of Parsing a Dictionary from Textual Headers
       (Section 4.7).

   3.  If the field-value is defined to be a list, return the result of
       Parsing a List from Textual Headers (Section 4.8).

   4.  If the field-value is defined to be a parameterised label, return
       the result of Parsing a Parameterised Label from Textual Headers
       (Section 4.4).

   5.  Otherwise, return the result of Parsing an Item from Textual
       Headers (Section 4.6).

   Note that in the case of lists and dictionaries, this has the effect
   of combining multiple instances of the header field into one.
   However, for singular items and parameterised labels, it has the
   effect of selecting the first value and ignoring any subsequent
   instances of the field, as well as extraneous text afterwards.
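
   The combination of multiple header lines into one comma-separated
   field-value, followed by step 1, can be sketched in Python as below.
   This assumes the caller already holds each field line's value as a
   separate string; combine_field_lines is a hypothetical helper, and it
   rejects non-ASCII input, which this section treats as an error.

```python
from typing import List

OWS = " \t"  # optional whitespace: SP / HTAB


def combine_field_lines(values: List[str]) -> str:
    """Join the values of repeated header lines with ", " (RFC 7230,
    Section 3.2.2) and discard leading OWS, producing input_string."""
    combined = ", ".join(value.strip(OWS) for value in values)
    # Any non-ASCII character causes the whole field value to be discarded.
    if not combined.isascii():
        raise ValueError("non-ASCII characters in field value")
    return combined.lstrip(OWS)
```

   For a list header received as two lines, "foo, bar" and "baz", the
   combined input_string "foo, bar, baz" is then parsed as one list.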

   Additionally, note that the effect of the parsing algorithms as
   specified is generally intolerant of syntax errors; if one is
   encountered, the typical response is to throw an error, thereby
   discarding the entire header field value.  This includes any non-
   ASCII characters in input_string.

4.  Structured Header Data Types

   This section defines the abstract value types that can be composed
   into Structured Headers, along with the textual HTTP serialisations
   of them.

4.1.  Numbers

   Abstractly, numbers are integers with an optional fractional part.
   They have a maximum of fifteen digits available to be used in one or
   both of the parts, as reflected in the ABNF below; this allows them
   to be stored as IEEE 754 double precision numbers (binary64)
   ([IEEE754]).

   The textual HTTP serialisation of numbers allows a maximum of fifteen
   digits between the integer and fractional part, along with an
   optional "-" indicating negative numbers.

   number   = ["-"] ( "." 1*15DIGIT /
                DIGIT "." 1*14DIGIT /
               2DIGIT "." 1*13DIGIT /
               3DIGIT "." 1*12DIGIT /
               4DIGIT "." 1*11DIGIT /
               5DIGIT "." 1*10DIGIT /
               6DIGIT "." 1*9DIGIT /
               7DIGIT "." 1*8DIGIT /
               8DIGIT "." 1*7DIGIT /
               9DIGIT "." 1*6DIGIT /
              10DIGIT "." 1*5DIGIT /
              11DIGIT "." 1*4DIGIT /
              12DIGIT "." 1*3DIGIT /
              13DIGIT "." 1*2DIGIT /
              14DIGIT "." 1DIGIT /
              1*15DIGIT )

   integer  = ["-"] 1*15DIGIT
   unsigned = 1*15DIGIT

   integer and unsigned are defined as conveniences to specification
   authors; if their use is specified and their ABNF is not matched, a
   parser MUST consider it to be invalid.

   For example, a header whose value is defined as a number could look
   like:

   ExampleNumberHeader: 4.5
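
   Section 4.1.1 leaves the parsing algorithm to be defined, so the
   following Python sketch only checks a candidate value against the
   number ABNF and converts it.  It reads the integer-only alternative as
   1 to 15 digits, matching the integer rule; the names is_valid_number
   and to_python_number are hypothetical.

```python
import re
from typing import Union

# Optional "-", an integer part, and an optional "." plus fraction digits.
_NUMBER = re.compile(r"-?([0-9]*)(?:\.([0-9]+))?")


def is_valid_number(text: str) -> bool:
    """Check text against the number ABNF (at most 15 digits in total)."""
    m = _NUMBER.fullmatch(text)
    if m is None:
        return False
    int_digits, frac_digits = m.group(1), m.group(2)
    if frac_digits is None:
        # Integer-only form: between 1 and 15 digits.
        return 1 <= len(int_digits) <= 15
    # Fractional form: the integer part may be empty (e.g. ".5"), but
    # integer and fractional digits together are limited to 15.
    return len(int_digits) + len(frac_digits) <= 15


def to_python_number(text: str) -> Union[int, float]:
    """Convert a valid textual number to int, or to float (binary64)."""
    if not is_valid_number(text):
        raise ValueError(f"invalid number: {text!r}")
    return float(text) if "." in text else int(text)
```

   The 15-digit cap is what lets every accepted value fit in an IEEE 754
   binary64 float, so the float conversion does not need extra checks.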

4.1.1.  Parsing Numbers from Textual Headers

   TBD

4.2.  Strings

   Abstractly, strings are ASCII strings [RFC0020], excluding control
   characters (i.e., the range 0x20 to 0x7E).  Note that this excludes
   tabs, newlines and carriage returns.  They may be at most 1024
   characters long.

   The textual HTTP serialisation of strings uses a backslash ("\") to
   escape double quotes and backslashes in strings.

   string    = DQUOTE 1*1024(char) DQUOTE
   char      = unescaped / escape ( DQUOTE / "\" )
   unescaped = %x20-21 / %x23-5B / %x5D-7E
   escape    = "\"

   For example, a header whose value is defined as a string could look
   like:

   ExampleStringHeader: "hello world"

   Note that strings only use DQUOTE as a delimiter; single quotes do
   not delimit strings.  Furthermore, only DQUOTE and "\" can be escaped;
   other sequences MUST generate an error.

   Unicode is not directly supported in Structured Headers, because it
   causes a number of interoperability issues, and - with few exceptions
   - header values do not require it.

   When it is necessary for a field value to convey non-ASCII string
   content, binary content (Section 4.5) SHOULD be specified, along with
   a character encoding (most likely, UTF-8).

4.2.1.  Parsing a String from Textual Headers

   Given an ASCII string input_string, return an unquoted string.
   input_string is modified to remove the parsed value.

   1.  Let output_string be an empty string.

   2.  If the first character of input_string is not DQUOTE, throw an
       error.

   3.  Discard the first character of input_string.

   4.  If input_string contains more than 1025 characters, throw an
       error.

   5.  While input_string is not empty:

       1.  Let char be the result of removing the first character of
           input_string.

       2.  If char is a backslash ("\"):

           1.  If input_string is now empty, throw an error.

           2.  Else:

               1.  Let next_char be the result of removing the first
                   character of input_string.

               2.  If next_char is not DQUOTE or "\", throw an error.

               3.  Append next_char to output_string.

       3.  Else, if char is DQUOTE, return output_string.

       4.  Else, append char to output_string.

   6.  Otherwise, throw an error.
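
   The string algorithm can be sketched in Python as follows.  Because
   Python strings are immutable, the hypothetical parse_string returns
   the unquoted value together with the unconsumed remainder of
   input_string, which begins immediately after the closing DQUOTE.  Two
   details are assumptions: the 1024-character limit is checked against
   the unescaped value instead of via step 4's check on the remaining
   input, and characters outside %x20-7E are rejected, following the
   string ABNF.

```python
from typing import Tuple

DQUOTE = '"'
BACKSLASH = "\\"


def parse_string(input_string: str) -> Tuple[str, str]:
    """Parse a quoted string; return (unquoted value, remaining input)."""
    # Steps 2-3: require and discard the opening DQUOTE.
    if not input_string.startswith(DQUOTE):
        raise ValueError("string must start with DQUOTE")
    rest = input_string[1:]
    output = []
    i = 0
    while i < len(rest):
        char = rest[i]  # step 5.1: take the next character
        i += 1
        if char == BACKSLASH:
            # Step 5.2: only DQUOTE and backslash may follow a backslash.
            if i >= len(rest):
                raise ValueError("backslash at end of input")
            next_char = rest[i]
            i += 1
            if next_char not in (DQUOTE, BACKSLASH):
                raise ValueError(f"invalid escape: {BACKSLASH}{next_char}")
            output.append(next_char)
        elif char == DQUOTE:
            # Step 5.3: closing DQUOTE ends the string.
            return "".join(output), rest[i:]
        elif not 0x20 <= ord(char) <= 0x7E:
            # Assumption: reject characters the string ABNF does not allow.
            raise ValueError("character outside %x20-7E in string")
        else:
            output.append(char)  # step 5.4
        if len(output) > 1024:
            # Assumption: the length limit applies to the unescaped value.
            raise ValueError("string longer than 1024 characters")
    # Step 6: input ended before a closing DQUOTE was found.
    raise ValueError("unterminated string")
```

   For example, parse_string('"hello world"') returns ("hello world",
   ""), and any text after the closing DQUOTE is handed back for the
   caller (such as a dictionary or list parser) to continue with.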

4.3.  Labels

   Labels are short (up to 256 characters) textual identifiers; their
   abstract model is identical to their expression in the textual HTTP
   serialisation.

   label   = lcalpha *255( lcalpha / DIGIT / "_" / "-" / "*" / "/" )
   lcalpha = %x61-7A ; a-z

   Note that any letters in a label must be lowercase.

   For example, a header whose value is defined as a label could look
   like:

   ExampleLabelHeader: foo/bar

4.3.1.  Parsing a Label from Textual Headers

   Given an ASCII string input_string, return a label.  input_string is
   modified to remove the parsed value.

   1.  If input_string contains more than 256 characters, throw an
       error.

   2.  If the first character of input_string is not lcalpha, throw an
       error.

   3.  Let output_string be an empty string.

   4.  While input_string is not empty:

       1.  Let char be the result of removing the first character of
           input_string.

       2.  If char is not one of lcalpha, DIGIT, "_", "-", "*" or "/":

           1.  Prepend char to input_string.

           2.  Return output_string.

       3.  Append char to output_string.

   5.  Return output_string.
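
   A Python sketch of the label algorithm follows.  The hypothetical
   parse_label returns the label and the unconsumed remainder of
   input_string; stopping at the first non-label character plays the role
   of "prepend char to input_string".  As an assumption, the 256-character
   limit is applied to the label itself rather than to the whole remaining
   input.

```python
import string
from typing import Tuple

LCALPHA = set(string.ascii_lowercase)
# lcalpha / DIGIT / "_" / "-" / "*" / "/"
LABEL_CHARS = LCALPHA | set(string.digits) | set("_-*/")


def parse_label(input_string: str) -> Tuple[str, str]:
    """Parse a label; return (label, remaining input)."""
    # Step 2: a label must start with a lowercase letter.
    if not input_string or input_string[0] not in LCALPHA:
        raise ValueError("label must start with a lowercase letter")
    end = 1
    # Step 4: consume characters until one is not a valid label character.
    while end < len(input_string) and input_string[end] in LABEL_CHARS:
        end += 1
    # Assumed reading of step 1: the label itself is limited to 256 chars.
    if end > 256:
        raise ValueError("label longer than 256 characters")
    return input_string[:end], input_string[end:]
```

   For instance, parse_label("abc; a=1") returns ("abc", "; a=1"),
   leaving the parameters for a parameterised label parser.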

4.4.  Parameterised Labels

   Parameterised Labels are labels (Section 4.3) with up to 256
   parameters; each parameter has a label and an optional value that is
   an item (Section 4.6).  Ordering between parameters is not
   significant, and duplicate parameters MUST be considered an error.

   The textual HTTP serialisation uses semicolons (";") to delimit the
   parameters from each other, and equals ("=") to delimit the parameter
   name from its value.

   parameterised = label *256( OWS ";" OWS label [ "=" item ] )

   For example,

   ExampleParamHeader: abc; a=1; b=2; c

4.4.1.  Parsing a Parameterised Label from Textual Headers

   Given an ASCII string input_string, return a label with a mapping of
   parameters.  input_string is modified to remove the parsed value.

   1.  Let primary_label be the result of Parsing a Label from Textual
       Headers (Section 4.3) from input_string.

   2.  Let parameters be an empty mapping.

   3.  In a loop:

       1.   Consume any OWS from the beginning of input_string.

       2.   If the first character of input_string is not ";", exit the
            loop.

       3.   Consume a ";" character from the beginning of input_string.

       4.   Consume any OWS from the beginning of input_string.

       5.   Let param_name be the result of Parsing a Label from Textual
            Headers (Section 4.3) from input_string.

       6.   If param_name is already present in parameters, throw an
            error.

       7.   Let param_value be a null value.

       8.   If the first character of input_string is "=":

            1.  Consume the "=" character at the beginning of
                input_string.

            2.  Let param_value be the result of Parsing an Item from
                Textual Headers (Section 4.6) from input_string.

       9.   If parameters has more than 255 members, throw an error.

       10.  Add param_name to parameters with the value param_value.

   4.  Return the tuple (primary_label, parameters).
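
   A Python sketch of the parameterised label algorithm follows.  To
   keep it self-contained, it takes the item parser as a parameter and
   defaults to a deliberately simplified, hypothetical stand-in
   (parse_simple_item) that only understands labels and unsigned
   integers; a complete implementation would use the item parsing of
   Section 4.6.1 instead.

```python
import re
import string
from typing import Callable, Dict, Optional, Tuple, Union

OWS = " \t"
LCALPHA = set(string.ascii_lowercase)
LABEL_CHARS = LCALPHA | set(string.digits) | set("_-*/")
Item = Union[int, str]
ItemParser = Callable[[str], Tuple[Item, str]]


def parse_label(s: str) -> Tuple[str, str]:
    """Parse a label (Section 4.3.1); return (label, remainder)."""
    if not s or s[0] not in LCALPHA:
        raise ValueError("expected a label")
    end = 1
    while end < len(s) and s[end] in LABEL_CHARS:
        end += 1
    if end > 256:
        raise ValueError("label longer than 256 characters")
    return s[:end], s[end:]


def parse_simple_item(s: str) -> Tuple[Item, str]:
    """Hypothetical stand-in for item parsing: labels and unsigned ints only."""
    s = s.lstrip(OWS)
    m = re.match(r"[0-9]{1,15}", s)
    if m:
        return int(m.group()), s[m.end():]
    return parse_label(s)


def parse_parameterised(
    s: str, parse_item: ItemParser = parse_simple_item
) -> Tuple[Tuple[str, Dict[str, Optional[Item]]], str]:
    """Parse a parameterised label; return ((label, params), remainder)."""
    primary_label, s = parse_label(s)  # step 1
    parameters: Dict[str, Optional[Item]] = {}  # step 2
    while True:  # step 3
        s = s.lstrip(OWS)
        if not s.startswith(";"):
            break
        s = s[1:].lstrip(OWS)  # consume ";" and any OWS after it
        param_name, s = parse_label(s)
        if param_name in parameters:
            raise ValueError(f"duplicate parameter: {param_name}")
        param_value: Optional[Item] = None
        if s.startswith("="):
            param_value, s = parse_item(s[1:])
        if len(parameters) > 255:
            raise ValueError("more than 256 parameters")
        parameters[param_name] = param_value
    return (primary_label, parameters), s  # step 4
```

   With the ExampleParamHeader value, parse_parameterised("abc; a=1; b=2;
   c") returns (("abc", {"a": 1, "b": 2, "c": None}), ""); a parameter
   without "=" maps to None, standing in for the null value of step 7.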

4.5.  Binary Content

   Arbitrary binary content up to 16K in size can be conveyed in
   Structured Headers.

   The textual HTTP serialisation indicates their presence by a leading
   "*", with the data encoded using Base 64 Encoding [RFC4648], without
   padding (as "=" might be confused with the use of dictionaries).

   binary = "*" 1*21846(base64)
   base64 = ALPHA / DIGIT / "+" / "/"

   For example, a header whose value is defined as binary content could
   look like:

   ExampleBinaryHeader: *cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg
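
   Going the other way, a generator could produce the textual form with
   Python's standard base64 module, as in this sketch.  The function name
   serialise_binary is hypothetical, and "16K" is assumed to mean 16384
   bytes.

```python
import base64

MAX_BINARY_BYTES = 16384  # assumption: "16K" read as 16 * 1024 bytes


def serialise_binary(data: bytes) -> str:
    """Serialise bytes as "*" followed by unpadded base64."""
    if len(data) > MAX_BINARY_BYTES:
        raise ValueError("binary content larger than 16K")
    encoded = base64.b64encode(data).decode("ascii")
    # Drop the trailing "=" padding, which could be mistaken for a
    # dictionary's key/value separator.
    return "*" + encoded.rstrip("=")
```

   Applied to the bytes of "pretend this is binary content.", it yields
   the ExampleBinaryHeader value shown above.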

4.5.1.  Parsing Binary Content from Textual Headers

   Given an ASCII string input_string, return binary content.
   input_string is modified to remove the parsed value.

   1.  If the first character of input_string is not "*", throw an
       error.

   2.  Discard the first character of input_string.

   3.  Let b64_content be the result of removing content of input_string
       up to but not including the first character that is not in ALPHA,
       DIGIT, "+" or "/".

   4.  Let binary_content be the result of Base 64 Decoding [RFC4648]
       b64_content, synthesising padding if necessary.  If an error is
       encountered, throw it.

   5.  Return binary_content.
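
   The binary content steps can be sketched in Python as follows; the
   hypothetical parse_binary returns the decoded bytes and the unconsumed
   remainder of input_string.  Padding is synthesised by appending "="
   until the length is a multiple of four, and base64.b64decode with
   validate=True rejects malformed input, such as content whose length is
   one more than a multiple of four.

```python
import base64
import binascii
import string
from typing import Tuple

# ALPHA / DIGIT / "+" / "/"
B64_CHARS = set(string.ascii_letters + string.digits + "+/")


def parse_binary(input_string: str) -> Tuple[bytes, str]:
    """Parse textual binary content; return (bytes, remaining input)."""
    # Steps 1-2: require and discard the leading "*".
    if not input_string.startswith("*"):
        raise ValueError("binary content must start with '*'")
    rest = input_string[1:]
    # Step 3: take characters up to the first one that is not base64.
    end = 0
    while end < len(rest) and rest[end] in B64_CHARS:
        end += 1
    b64_content = rest[:end]
    # Step 4: synthesise the "=" padding the serialisation leaves out.
    padded = b64_content + "=" * (-len(b64_content) % 4)
    try:
        binary_content = base64.b64decode(padded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 content: {exc}") from exc
    return binary_content, rest[end:]
```

   Parsing the ExampleBinaryHeader value returns the bytes of "pretend
   this is binary content." with an empty remainder.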

4.6.  Items

   An item is a number (Section 4.1), string (Section 4.2), label
   (Section 4.3) or binary content (Section 4.5).

   item = number / string / label / binary

4.6.1.  Parsing an Item from Textual Headers

   Given an ASCII string input_string, return an item.  input_string is
   modified to remove the parsed value.

   1.  Discard any OWS from the beginning of input_string.

   2.  If the first character of input_string is a "-" or a DIGIT,
       process input_string as a number (Section 4.1) and return the
       result, throwing any errors encountered.

   3.  If the first character of input_string is a DQUOTE, process
       input_string as a string (Section 4.2) and return the result,
       throwing any errors encountered.

   4.  If the first character of input_string is "*", process
       input_string as binary content (Section 4.5) and return the
       result, throwing any errors encountered.

   5.  If the first character of input_string is an lcalpha, process
       input_string as a label (Section 4.3) and return the result,
       throwing any errors encountered.

   6.  Otherwise, throw an error.
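
   Putting the pieces together, the item algorithm can be sketched in
   Python as below.  It bundles compact versions of the number, string,
   label and binary parsers so that it runs on its own; all function
   names are hypothetical, each parser returns (value, remainder), and
   number parsing in particular is an assumption, since Section 4.1.1 is
   still TBD.

```python
import base64
import binascii
import re
import string
from typing import Tuple, Union

Item = Union[int, float, str, bytes]
OWS = " \t"
LCALPHA = set(string.ascii_lowercase)
LABEL_CHARS = LCALPHA | set(string.digits) | set("_-*/")
B64_CHARS = set(string.ascii_letters + string.digits + "+/")
NUMBER = re.compile(r"-?([0-9]*)(?:\.([0-9]+))?")


def parse_number(s: str) -> Tuple[Union[int, float], str]:
    m = NUMBER.match(s)  # always matches, possibly zero characters
    int_digits, frac_digits = m.group(1), m.group(2)
    if frac_digits is None:
        if not 1 <= len(int_digits) <= 15:
            raise ValueError("invalid integer")
        return int(m.group()), s[m.end():]
    if len(int_digits) + len(frac_digits) > 15:
        raise ValueError("number has more than 15 digits")
    return float(m.group()), s[m.end():]


def parse_string(s: str) -> Tuple[str, str]:
    out, i = [], 1  # i = 1 skips the opening DQUOTE
    while i < len(s):
        char = s[i]
        i += 1
        if char == "\\":
            if i >= len(s) or s[i] not in '"\\':
                raise ValueError("invalid escape sequence")
            out.append(s[i])
            i += 1
        elif char == '"':
            return "".join(out), s[i:]
        elif not " " <= char <= "~":
            raise ValueError("invalid character in string")
        else:
            out.append(char)
        if len(out) > 1024:
            raise ValueError("string longer than 1024 characters")
    raise ValueError("unterminated string")


def parse_label(s: str) -> Tuple[str, str]:
    end = 1  # the caller has checked that s[0] is lcalpha
    while end < len(s) and s[end] in LABEL_CHARS:
        end += 1
    if end > 256:
        raise ValueError("label longer than 256 characters")
    return s[:end], s[end:]


def parse_binary(s: str) -> Tuple[bytes, str]:
    end = 1  # skip the leading "*"
    while end < len(s) and s[end] in B64_CHARS:
        end += 1
    b64_content = s[1:end]
    padded = b64_content + "=" * (-len(b64_content) % 4)
    try:
        return base64.b64decode(padded, validate=True), s[end:]
    except binascii.Error as exc:
        raise ValueError(f"invalid base64: {exc}") from exc


def parse_item(input_string: str) -> Tuple[Item, str]:
    s = input_string.lstrip(OWS)  # step 1
    first = s[:1]
    if first == "-" or (first and first in string.digits):  # step 2
        return parse_number(s)
    if first == '"':  # step 3
        return parse_string(s)
    if first == "*":  # step 4
        return parse_binary(s)
    if first and first in LCALPHA:  # step 5
        return parse_label(s)
    raise ValueError(f"cannot parse an item from {input_string!r}")  # step 6
```

   For instance, parse_item(" 4.5") yields (4.5, "") and
   parse_item("foo/bar, baz") yields ("foo/bar", ", baz"), leaving the
   comma for an enclosing list or dictionary parser to consume.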

4.7.  Dictionaries

   Dictionaries are unordered maps of key-value pairs, where the keys
   are labels (Section 4.3) and the values are items (Section 4.6).
   There can be between 1 and 1024 members, and keys are required to be
   unique.

   In the textual HTTP serialisation, keys and values are separated by
   "=" (without whitespace), and key/value pairs are separated by a
   comma with optional whitespace.

   dictionary = label "=" item *1023( OWS "," OWS label "=" item )

   For example, a header field whose value is defined as a dictionary
   could look like:

   ExampleDictHeader: foo=1.23, en="Applepie", da=*w4ZibGV0w6ZydGUK

   Typically, a header field specification will define the semantics of
   individual keys, as well as whether their presence is required or
   optional.  Recipients MUST ignore keys that are undefined or unknown,
   unless the header field's specification specifically disallows them.

4.7.1.  Parsing a Dictionary from Textual Headers

   Given an ASCII string input_string, return a mapping of (label,
   item).  input_string is modified to remove the parsed value.

   1.  Let dictionary be an empty mapping.

   2.  While input_string is not empty:

       1.  Let this_key be the result of running Parse Label from
           Textual Headers (Section 4.3) with input_string.  If an error
           is encountered, throw it.

       2.  If dictionary already contains this_key, raise an error.

       3.  Consume a "=" from input_string; if none is present, raise an
           error.

       4.  Let this_value be the result of running Parse Item from
           Textual Headers (Section 4.6) with input_string.  If an error
           is encountered, throw it.

       5.  Add key this_key with value this_value to dictionary.

       6.  Discard any leading OWS from input_string.

       7.  If input_string is empty, return dictionary.

       8.  Consume a COMMA from input_string; if no comma is present,
           raise an error.

       9.  Discard any leading OWS from input_string.

   3.  Return dictionary.
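
   The dictionary algorithm can be sketched in Python as below.  The item
   parser is passed in, and the default is a hypothetical stand-in that
   only handles labels and unsigned integers, so values such as those in
   ExampleDictHeader would need the full item parsing of Section 4.6.1.
   Like the steps above, the sketch does not enforce the 1024-member
   limit.

```python
import re
import string
from typing import Callable, Dict, Tuple, Union

OWS = " \t"
LCALPHA = set(string.ascii_lowercase)
LABEL_CHARS = LCALPHA | set(string.digits) | set("_-*/")
Item = Union[int, str]


def parse_label(s: str) -> Tuple[str, str]:
    """Parse a label (Section 4.3.1); return (label, remainder)."""
    if not s or s[0] not in LCALPHA:
        raise ValueError("expected a label")
    end = 1
    while end < len(s) and s[end] in LABEL_CHARS:
        end += 1
    return s[:end], s[end:]


def parse_simple_item(s: str) -> Tuple[Item, str]:
    """Hypothetical stand-in for item parsing: labels and unsigned ints only."""
    s = s.lstrip(OWS)
    m = re.match(r"[0-9]{1,15}", s)
    if m:
        return int(m.group()), s[m.end():]
    return parse_label(s)


def parse_dictionary(
    s: str, parse_item: Callable[[str], Tuple[Item, str]] = parse_simple_item
) -> Dict[str, Item]:
    """Parse a dictionary following the steps of Section 4.7.1."""
    dictionary: Dict[str, Item] = {}  # step 1
    while s:  # step 2
        this_key, s = parse_label(s)  # step 2.1
        if this_key in dictionary:  # step 2.2: keys must be unique
            raise ValueError(f"duplicate key: {this_key}")
        if not s.startswith("="):  # step 2.3
            raise ValueError("expected '=' after key")
        this_value, s = parse_item(s[1:])  # step 2.4
        dictionary[this_key] = this_value  # step 2.5
        s = s.lstrip(OWS)  # step 2.6
        if not s:  # step 2.7
            return dictionary
        if not s.startswith(","):  # step 2.8
            raise ValueError("expected ',' between members")
        s = s[1:].lstrip(OWS)  # step 2.9
    return dictionary  # step 3
```

   For example, parse_dictionary("a=1, b=2") returns {"a": 1, "b": 2},
   while "a=1, a=2" raises an error because of the duplicate key.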

4.8.  Lists

   Lists are arrays of items (Section 4.6) or parameterised labels
   (Section 4.4), with one to 1024 members.

   In the textual HTTP serialisation, each member is separated by a
   comma and optional whitespace.

   list = list_member *1023( OWS "," OWS list_member )
   list_member = item / parameterised

   For example, a header field whose value is defined as a list of
   labels could look like:

   ExampleLabelListHeader: foo, bar, baz_45

   and a header field whose value is defined as a list of parameterised
   labels could look like:

   ExampleParamListHeader: abc/def; g="hi";j, klm/nop

4.8.1.  Parsing a List from Textual Headers

   Given an ASCII string input_string, return a list of items.
   input_string is modified to remove the parsed value.

   1.  Let items be an empty array.

   2.  While input_string is not empty:

       1.  Let item be the result of running Parse Item from Textual
           Headers (Section 4.6) with input_string.  If an error is
           encountered, throw it.

       2.  Append item to items.

       3.  Discard any leading OWS from input_string.

       4.  If input_string is empty, return items.

       5.  Consume a COMMA from input_string; if no comma is present,
           raise an error.

       6.  Discard any leading OWS from input_string.

   3.  Return items.
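
   A Python sketch of the list algorithm follows.  Step 2.1 parses only
   items, so members that are parameterised labels (as in
   ExampleParamListHeader) are not handled by the steps as written; the
   sketch therefore takes the member parser as a parameter, defaulting to
   a hypothetical stand-in that only handles labels and unsigned
   integers.

```python
import re
import string
from typing import Callable, List, Tuple, Union

OWS = " \t"
LCALPHA = set(string.ascii_lowercase)
LABEL_CHARS = LCALPHA | set(string.digits) | set("_-*/")
Item = Union[int, str]


def parse_simple_item(s: str) -> Tuple[Item, str]:
    """Hypothetical stand-in for item parsing: labels and unsigned ints only."""
    s = s.lstrip(OWS)
    m = re.match(r"[0-9]{1,15}", s)
    if m:
        return int(m.group()), s[m.end():]
    if not s or s[0] not in LCALPHA:
        raise ValueError("expected a label or an integer")
    end = 1
    while end < len(s) and s[end] in LABEL_CHARS:
        end += 1
    return s[:end], s[end:]


def parse_list(
    s: str, parse_member: Callable[[str], Tuple[Item, str]] = parse_simple_item
) -> List[Item]:
    """Parse a list following the steps of Section 4.8.1."""
    items: List[Item] = []  # step 1
    while s:  # step 2
        item, s = parse_member(s)  # step 2.1
        items.append(item)  # step 2.2
        s = s.lstrip(OWS)  # step 2.3
        if not s:  # step 2.4
            return items
        if not s.startswith(","):  # step 2.5
            raise ValueError("expected ',' between members")
        s = s[1:].lstrip(OWS)  # step 2.6
    return items  # step 3
```

   With the ExampleLabelListHeader value, parse_list("foo, bar, baz_45")
   returns ["foo", "bar", "baz_45"].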

5.  IANA Considerations

   This draft has no actions for IANA.

6.  Security Considerations

   Unique dictionary keys are required to reduce the risk of smuggling
   attacks.

7.  References

7.1.  Normative References

   [RFC0020]  Cerf, V., "ASCII format for network interchange", STD 80,
              RFC 20, DOI 10.17487/RFC0020, October 1969,
              <https://www.rfc-editor.org/info/rfc20>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
              <https://www.rfc-editor.org/info/rfc4648>.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008,
              <https://www.rfc-editor.org/info/rfc5234>.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014,
              <https://www.rfc-editor.org/info/rfc7230>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2.  Informative References

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", 2008,
              <http://grouper.ieee.org/groups/754/>.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014,
              <https://www.rfc-editor.org/info/rfc7231>.

   [RFC7232]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Conditional Requests", RFC 7232,
              DOI 10.17487/RFC7232, June 2014,
              <https://www.rfc-editor.org/info/rfc7232>.

   [RFC7233]  Fielding, R., Ed., Lafon, Y., Ed., and J. Reschke, Ed.,
              "Hypertext Transfer Protocol (HTTP/1.1): Range Requests",
              RFC 7233, DOI 10.17487/RFC7233, June 2014,
              <https://www.rfc-editor.org/info/rfc7233>.

   [RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
              RFC 7234, DOI 10.17487/RFC7234, June 2014,
              <https://www.rfc-editor.org/info/rfc7234>.

   [RFC7235]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Authentication", RFC 7235,
              DOI 10.17487/RFC7235, June 2014,
              <https://www.rfc-editor.org/info/rfc7235>.

   [RFC7239]  Petersson, A. and M. Nilsson, "Forwarded HTTP Extension",
              RFC 7239, DOI 10.17487/RFC7239, June 2014,
              <https://www.rfc-editor.org/info/rfc7239>.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015,
              <https://www.rfc-editor.org/info/rfc7540>.

   [RFC7694]  Reschke, J., "Hypertext Transfer Protocol (HTTP)
              Client-Initiated Content-Encoding", RFC 7694,
              DOI 10.17487/RFC7694, November 2015,
              <https://www.rfc-editor.org/info/rfc7694>.

7.3.  URIs

   [1] https://lists.w3.org/Archives/Public/ietf-http-wg/

   [2] https://httpwg.github.io/

   [3] https://github.com/httpwg/http-extensions/labels/header-structure

Appendix A.  Do HTTP headers have any common structure ?

   Several proposals have been floated in recent years to use one
   preexisting structured data serialization or another for HTTP
   headers, in order to impose some sanity.

   None of these proposals has gained traction, and no obvious candidate
   data serialization has been left unexamined.

   This effort tries to tackle the question from the other side, by
   asking if there is a common structure in existing HTTP headers we can
   generalize for this purpose.

A.1.  Survey of HTTP header structure

   The RFC723x family of HTTP/1 standards controls 49 entries in the
   IANA Message Header Registry, and they share two common motifs.

   The majority of RFC723x HTTP headers are lists.  A few of them are
   ordered ('Content-Encoding'), some are unordered ('Connection'), and
   some are ordered by 'q=%f' weight parameters ('Accept').

   In most cases, the list elements are some kind of identifier, usually
   derived from ABNF 'token' as defined by [RFC7230].

   A subgroup of headers, mostly related to MIME, uses what one could
   call a 'qualified token':

     qualified-token = token-or-asterix [ "/" token-or-asterix ]

   The second motif is parameterized list elements.  The best known is
   the "q=0.5" weight parameter, but other parameters exist as well.

   Generalizing from these motifs, our candidate "Common Structure" data
   model becomes an ordered list of named dictionaries.

   In pidgin ABNF, ignoring white-space for the sake of clarity, the
   HTTP/1.1 serialization of Common Structure is something like:

     token-or-asterix = token from RFC7230, but also allowing "*"

     qualified-token = token-or-asterix [ "/" token-or-asterix ]

     field-name, see RFC7230

     Common-Structure-Header = field-name ":" 1#named-dictionary

     named-dictionary = qualified-token [ *(";" param) ]

     param = token [ "=" value ]

     value = we'll get back to this in a moment.
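
   To make the pidgin ABNF concrete, the Python sketch below parses a
   field value (the part after field-name ":") into a list of
   (qualified-token, parameters) pairs.  Because "value" is left open
   above, the sketch assumes, purely for illustration, that a parameter
   value is either a token or a simple quoted-string without escapes.
   It also skips optional whitespace around delimiters.  The function
   name parse_common_structure is hypothetical.

```python
import re

# tchar from RFC 7230, Section 3.2.6.  It already includes "*", so
# token-or-asterix needs no special case in this sketch.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# qualified-token = token-or-asterix [ "/" token-or-asterix ], with optional
# surrounding whitespace.
QUALIFIED_TOKEN = re.compile(rf"\s*({TOKEN})(?:/({TOKEN}))?\s*")
# param = token [ "=" value ]; ASSUMPTION: value is a token or a
# quoted-string without backslash escapes.
PARAM = re.compile(rf';\s*({TOKEN})(?:\s*=\s*({TOKEN}|"[^"]*"))?\s*')


def parse_common_structure(field_value):
    """Parse 1#named-dictionary into [(qualified_token, {param: value})]."""
    result = []
    pos = 0
    while True:
        m = QUALIFIED_TOKEN.match(field_value, pos)
        if m is None:
            raise ValueError(f"expected a token at offset {pos}")
        name = m.group(1) if m.group(2) is None else f"{m.group(1)}/{m.group(2)}"
        pos = m.end()
        params = {}
        while (p := PARAM.match(field_value, pos)) is not None:
            key, value = p.group(1), p.group(2)
            if key in params:
                raise ValueError(f"duplicate parameter {key!r}")
            if value is not None and value.startswith('"'):
                value = value[1:-1]          # drop the surrounding DQUOTEs
            params[key] = value              # None: parameter had no "=value"
            pos = p.end()
        result.append((name, params))
        if pos == len(field_value):
            return result
        if field_value[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1
```

   For example, the Accept value 'text/html;q=0.8, */*;q=0.1' yields
   [('text/html', {'q': '0.8'}), ('*/*', {'q': '0.1'})].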

   Nineteen out of the 49 RFC723x headers, almost 40%, can already be
   parsed using this definition, and none of the rest have requirements
   which could not be met by this data model.  See Appendix A.4 and
   Appendix A.5 for the full survey details.

A.2.  Survey of values in HTTP headers

   Surveying the datatypes of HTTP headers, standardized as well as
   private, the following picture emerges:

A.2.1.  Numbers

   Integer and floating point are both used.  Range and precision are
   mostly unspecified in the controlling documents.

   Scientific notation (9.192631770e9) does not seem to be used
   anywhere.

   The ranges used seem to be from minus several thousand to plus a
   couple of billion, the high end almost exclusively being POSIX time_t
   timestamps.
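
   The observed ranges fit comfortably in a signed 64-bit integer.  The
   Python sketch below is a hypothetical numeric parser shaped by this
   survey.  It accepts an optional minus sign, decimal digits and an
   optional fraction, and rejects scientific notation, since none was
   observed.  Integers are range-checked against the signed 64-bit
   limits.  The grammar and the name parse_number are assumptions made
   for illustration.

```python
import re

# HYPOTHETICAL grammar based on the survey: optional "-", digits, optional
# fraction.  No exponent, since scientific notation was not observed.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def parse_number(text):
    """Return an int or a float, or raise ValueError."""
    if NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a plain decimal number: {text!r}")
    if "." in text:
        return float(text)           # ASSUMPTION: IEEE 754 double precision
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"integer outside signed 64-bit range: {text}")
    return value
```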

A.2.2.  Timestamps

   Usually the RFC723x text format (HTTP-date), but POSIX time_t
   represented as an integer or floating point number is not uncommon.
   ISO 8601 has also been spotted.
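
   A recipient that has to cope with all three forms might normalise
   them to POSIX time.  The Python sketch below uses only the standard
   library: email.utils.parsedate_to_datetime for the RFC723x format and
   datetime.fromisoformat for ISO 8601.  The detection heuristic, the
   treatment of an ISO 8601 value without an offset as UTC, and the
   function name are assumptions.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# HYPOTHETICAL heuristic: ISO 8601 values start "YYYY-MM-DDT".
ISO8601_PREFIX = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T")


def to_posix_time(text):
    """Normalise an RFC723x date, POSIX time_t or ISO 8601 value to seconds."""
    try:
        return float(text)                 # time_t as integer or floating point
    except ValueError:
        pass
    if ISO8601_PREFIX.match(text):
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    else:
        dt = parsedate_to_datetime(text)   # e.g. "Sun, 06 Nov 1994 08:49:37 GMT"
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)   # ASSUMPTION: no offset means UTC
    return dt.timestamp()
```

   All three of "784111777", "Sun, 06 Nov 1994 08:49:37 GMT" and
   "1994-11-06T08:49:37Z" map to 784111777.0.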

A.2.3.  Strings

   The vast majority are pure ASCII strings, with either no escapes, %xx
   URL-like escapes or C-style back-slash escapes, possibly with the
   addition of \uxxxx UNICODE escapes.

   Where non-ASCII character sets are used, they are almost always
   implicit, rather than explicit.  UTF8 and ISO-8859-1 seem to be most
   common.
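
   As an illustration of the escape styles listed above, the Python
   sketch below decodes C-style backslash escapes, including \uXXXX, and
   hands %xx escapes to the standard urllib.parse.unquote.  The set of
   single-character escapes, the restriction of \u to exactly four hex
   digits (surrogate pairs are not joined), and the function names are
   assumptions, since the surveyed headers do not agree on them.

```python
import re
from urllib.parse import unquote

# ASSUMED escape set: \uXXXX (exactly four hex digits) or one character.
_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|.)", re.DOTALL)
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"'}


def unescape_c_style(s):
    """Decode C-style backslash escapes, including \\uXXXX."""
    def replace(m):
        esc = m.group(1)
        if len(esc) == 5:                    # the "u" + four-hex-digit branch
            return chr(int(esc[1:], 16))
        if esc in _SIMPLE:
            return _SIMPLE[esc]
        raise ValueError(f"unsupported escape: \\{esc}")
    return _ESCAPE.sub(replace, s)


def unescape_percent(s):
    """Decode %xx escapes; the character set (UTF-8 here) is an assumption."""
    return unquote(s, encoding="utf-8", errors="strict")
```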

A.2.4.  Binary blobs

   Often used for cryptographic data.  Usually in base64 encoding,
   sometimes ""-quoted, more often not.  Base85 encoding is also seen,
   usually quoted.
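
   The Python sketch below strips optional surrounding DQUOTEs and then
   decodes the blob with the standard base64 module.  "Base85" is not a
   single format.  base64.b85decode implements the git-style variant,
   while base64.a85decode handles Ascii85, so which one a given header
   uses is an assumption.  The helper name is also hypothetical.

```python
import base64


def decode_blob(text, encoding="base64"):
    """HYPOTHETICAL helper: strip optional DQUOTEs, then decode."""
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]
    if encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True)
    if encoding == "base85":
        return base64.b85decode(text)  # git-style base85 alphabet
    raise ValueError(f"unsupported encoding: {encoding}")
```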

A.2.5.  Identifiers

   Seems to almost always fit in the RFC723x 'token' definition.
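
   For reference, RFC 7230 defines token = 1*tchar, where tchar is drawn
   from a fixed set of ASCII letters, digits and punctuation.  A direct
   Python transcription might read as follows; the function name is
   illustrative.

```python
import string

# tchar, RFC 7230 Section 3.2.6: "!" / "#" / "$" / "%" / "&" / "'" / "*" /
# "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def is_token(s):
    """True if s matches token = 1*tchar."""
    return len(s) > 0 and all(c in TCHAR for c in s)
```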

A.3.  Is this actually a useful thing to generalize ?

   The number one wishlist item seems to be UNICODE strings, with a big
   side order of not having to write a new parser routine every time
   somebody comes up with a new header.

   Having a common parser would indeed be a good thing, and having an
   underlying data model which makes it possible to define a compressed
   serialization, rather than relying on serialization to text followed
   by text compression (i.e., HPACK), seems like a good idea too.

   However, when using a data model and a parser general enough to
   transport useful data, the parser will have to be followed by a
   validation step which checks that the data also makes sense.
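
   As a small illustration of such a validation step, the Python sketch
   below checks the output of a common parser for an Accept-style
   header.  It tests each member against the RFC 7231 qvalue syntax and
   the type/subtype shape.  The input shape, a list of (qualified-token,
   parameters) pairs, and the function name are assumptions made for
   the example.

```python
import re

# qvalue, RFC 7231 Section 5.3.1:
#   ( "0" [ "." 0*3DIGIT ] ) / ( "1" [ "." 0*3("0") ] )
QVALUE = re.compile(r"0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?")


def validate_accept(parsed):
    """Return a list of problems found in already-parsed Accept members.

    ASSUMPTION: parsed is [(media_range, {param: value_or_None}), ...].
    An empty list means the value passed validation.
    """
    problems = []
    for media_range, params in parsed:
        if "/" not in media_range:
            problems.append(f"{media_range!r}: expected type/subtype")
        if "q" in params:
            q = params["q"]
            if q is None or QVALUE.fullmatch(q) is None:
                problems.append(f"{media_range!r}: invalid qvalue {q!r}")
    return problems
```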

   Today validation, such as it is, is often done by the bespoke
   parsers.

   This then is probably where the next big potential for improvement
   lies:

   Ideally, a machine-readable "data dictionary" would make it possible
   to copy that text out of RFCs and run it through a code generator
   which spits out validation code operating on the output of the common
   parser.

   But history has been particularly unkind to that idea.

   Most attempts studied as part of this effort have sunk under
   complexity caused by reaching for generality, but where scope has
   been wisely limited, it seems to be possible.

   So file that idea under "future work".

A.4.  RFC723x headers with "common structure"

   o  Accept [RFC7231], Section 5.3.2

   o  Accept-Charset [RFC7231], Section 5.3.3

   o  Accept-Encoding [RFC7231], Section 5.3.4, [RFC7694], Section 3

   o  Accept-Language [RFC7231], Section 5.3.5

   o  Age [RFC7234], Section 5.1

   o  Allow [RFC7231], Section 7.4.1

   o  Connection [RFC7230], Section 6.1

   o  Content-Encoding [RFC7231], Section 3.1.2.2

   o  Content-Language [RFC7231], Section 3.1.3.2

   o  Content-Length [RFC7230], Section 3.3.2

   o  Content-Type [RFC7231], Section 3.1.1.5

   o  Expect [RFC7231], Section 5.1.1

   o  Max-Forwards [RFC7231], Section 5.1.2

   o  MIME-Version [RFC7231], Appendix A.1

   o  TE [RFC7230], Section 4.3

   o  Trailer [RFC7230], Section 4.4

   o  Transfer-Encoding [RFC7230], Section 3.3.1

   o  Upgrade [RFC7230], Section 6.7

   o  Vary [RFC7231], Section 7.1.4

A.5.  RFC723x headers with "uncommon structure"

   1 of the RFC723x headers is only reserved, and therefore has no
   structure at all:

   o  Close [RFC7230], Section 8.1

   5 of the RFC723x headers are HTTP dates:

   o  Date [RFC7231], Section 7.1.1.2

   o  Expires [RFC7234], Section 5.3

   o  If-Modified-Since [RFC7232], Section 3.3

   o  If-Unmodified-Since [RFC7232], Section 3.4

   o  Last-Modified [RFC7232], Section 2.2

   24 of the RFC723x headers use bespoke formats which only a single
   header or, in rare cases, two headers share:

   o  Accept-Ranges [RFC7233], Section 2.3

      *  bytes-unit / other-range-unit

   o  Authorization [RFC7235], Section 4.2

   o  Proxy-Authorization [RFC7235], Section 4.4

      *  credentials

   o  Cache-Control [RFC7234], Section 5.2

      *  1#cache-directive

   o  Content-Location [RFC7231], Section 3.1.4.2

      *  absolute-URI / partial-URI

   o  Content-Range [RFC7233], Section 4.2

      *  byte-content-range / other-content-range

   o  ETag [RFC7232], Section 2.3

      *  entity-tag

   o  Forwarded [RFC7239]

      *  1#forwarded-element

   o  From [RFC7231], Section 5.5.1

      *  mailbox

   o  If-Match [RFC7232], Section 3.1
   o  If-None-Match [RFC7232], Section 3.2

      *  "*" / 1#entity-tag

   o  If-Range [RFC7233], Section 3.2

      *  entity-tag / HTTP-date

   o  Host [RFC7230], Section 5.4

      *  uri-host [ ":" port ]

   o  Location [RFC7231], Section 7.1.2

      *  URI-reference

   o  Pragma [RFC7234], Section 5.4

      *  1#pragma-directive

   o  Range [RFC7233], Section 3.1

      *  byte-ranges-specifier / other-ranges-specifier

   o  Referer [RFC7231], Section 5.5.2

      *  absolute-URI / partial-URI

   o  Retry-After [RFC7231], Section 7.1.3

      *  HTTP-date / delay-seconds

   o  Server [RFC7231], Section 7.4.2

   o  User-Agent [RFC7231], Section 5.5.3

      *  product *( RWS ( product / comment ) )

   o  Via [RFC7230], Section 5.7.1

      *  1#( received-protocol RWS received-by [ RWS comment ] )

   o  Warning [RFC7234], Section 5.5

      *  1#warning-value

   o  Proxy-Authenticate [RFC7235], Section 4.3
   o  WWW-Authenticate [RFC7235], Section 4.1

      *  1#challenge

Appendix B.  Changes

B.1.  Since draft-ietf-httpbis-header-structure-01

   Replaced with draft-nottingham-structured-headers.

B.2.  Since draft-ietf-httpbis-header-structure-00

   Added signed 64bit integer type.

   Drop UTF8, and settle on BCP137 [RFC5137] ::EmbeddedUnicodeChar for
   h1-unicode-string.

   Change h1_blob delimiter to ":" since "'" is valid t_char.

Authors' Addresses

   Mark Nottingham
   Fastly

   Email: mnot@mnot.net
   URI:   https://www.mnot.net/

   Poul-Henning Kamp
   The Varnish Cache Project

   Email: phk@varnish-cache.org