Network Working Group                                       S. McQuistin
Internet-Draft                                                   V. Band
Intended status: Experimental                              C. S. Perkins
Expires: 6 May 2020                                University of Glasgow
                                                         3 November 2019

  Describing Protocol Data Units with Augmented Packet Header Diagrams
              draft-mcquistin-augmented-ascii-diagrams-01

Abstract

   This document describes a machine-readable format for specifying the
   syntax of protocol data units within a protocol specification.  This
   format comprises a consistently formatted packet header diagram,
   followed by structured explanatory text.  It is designed to maintain
   human readability while enabling support for automated parser
   generation from the specification document.  This document is itself
   an example of how the format can be used.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 6 May 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
     2.1.  Limitations of Current Packet Format Diagrams
     2.2.  Formal languages in standards documents
   3.  Design Principles
   4.  Augmented Packet Header Diagrams
     4.1.  PDUs with Fixed and Variable-Width Fields
     4.2.  PDUs That Cross-Reference Previously Defined Fields
     4.3.  PDUs with Non-Contiguous Fields
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Informative References
   Appendix A.  ABNF specification
     A.1.  Constraint Expressions
     A.2.  Augmented packet diagrams
   Appendix B.  Source code repository
   Authors' Addresses

1.  Introduction

   Packet header diagrams have become a widely used format for
   describing the syntax of binary protocols.  In otherwise largely
   textual documents, they allow for the visualisation of packet
   formats, reducing human error, and aiding in the implementation of
   parsers for the protocols that they specify.

   Figure 1 gives an example of how packet header diagrams are used to
   define binary protocol formats.  The format has an obvious structure:
   the diagram clearly delineates each field, showing its width and its
   position within the header.  This type of diagram is designed for
   human readers, but is consistent enough that it should be possible to
   develop a tool that generates a parser for the packet format from the
   diagram.

   :    0                   1                   2                   3
   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |          Source Port          |       Destination Port        |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                        Sequence Number                        |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                    Acknowledgment Number                      |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |  Data |           |U|A|P|R|S|F|                               |
   :   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   :   |       |           |G|K|H|T|N|N|                               |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |           Checksum            |         Urgent Pointer        |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                    Options                    |    Padding    |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                             data                              |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 1: TCP's header format (from [RFC793])

   Unfortunately, the format of such packet diagrams varies both within
   and between documents.  This variation makes it difficult to build
   tools to generate parsers from the specifications.  Better tooling
   could be developed if protocol specifications adopted a consistent
   format for their packet descriptions.  Indeed, this underpins the
   format described by this draft: we want to retain the benefits that
   packet header diagrams provide, while identifying the benefits of
   adopting a consistent format.

   This document describes a consistent packet header diagram format and
   accompanying structured text constructs that allow for the parsing
   process of protocol headers to be fully specified.  This provides
   support for the automatic generation of parser code.  Broad design
   principles, that seek to maintain the primacy of human readability
   and flexibility in authorship, are described, before the format
   itself is given.

   This document is itself an example of the approach that it describes,
   with the packet header diagrams and structured text format described
   by example.

   This draft describes early work.  As consensus builds around the
   particular syntax of the format described, both a formal ABNF
   specification and code that parses it (and, as described above, this
   document) will be provided.

   :   The RESET_STREAM frame is as follows:
   :
   :    0                   1                   2                   3
   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                        Stream ID (i)                        ...
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |  Application Error Code (16)  |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                        Final Size (i)                       ...
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   :   RESET_STREAM frames contain the following fields:
   :
   :   Stream ID:  A variable-length integer encoding of the Stream ID
   :      of the stream being terminated.
   :
   :   Application Protocol Error Code:  A 16-bit application protocol
   :      error code (see Section 20.1) which indicates why the stream
   :      is being closed.
   :
   :   Final Size: A variable-length integer indicating the final size
   :      of the stream by the RESET_STREAM sender, in unit of bytes.

     Figure 2: QUIC's RESET_STREAM frame format (from [QUIC-TRANSPORT])

2.  Background

   This section begins by considering how packet header diagrams are
   used in existing documents.  This exposes the limitations that the
   current usage has in terms of machine-readability, guiding the design
   of the format that this document proposes.

   While this document focuses on the machine-readability of packet
   format diagrams, this section also discusses the use of other
   structured or formal languages within IETF documents.  Considering
   how and why these languages are used provides an instructive contrast
   to the relatively incremental approach proposed here.

2.1.  Limitations of Current Packet Format Diagrams

   Packet header diagrams are frequently used in IETF standards to
   describe the format of binary protocols.  While there is no standard
   for how these diagrams should be formatted, they have a broadly
   similar structure, where the layout of a protocol data unit (PDU) or
   structure is shown in diagrammatic form, followed by a description
   list of the fields that it contains.  An example of this format,
   taken from the QUIC specification, is given in Figure 2.

   These packet header diagrams, and the accompanying descriptions, are
   formatted for human readers rather than for automated processing.  As
   a result, while there is rough consistency in how packet header
   diagrams are formatted, there are a number of limitations that make
   them difficult to work with programmatically:

   Inconsistent syntax:  There are two classes of consistency that are
      needed to support automated processing of specifications: internal
      consistency within a diagram or document, and external consistency
      across all documents.

      Figure 2 gives an example of internal inconsistency.  Here, the
      packet diagram shows a field labelled "Application Error Code",
      while the accompanying description lists the field as "Application
      Protocol Error Code".  The use of an abbreviated name is suitable
      for human readers, but makes parsing the structure difficult for
      machines.  Figure 3 gives a further example, where the description
      includes an "Option-Code" field that does not appear in the packet
      diagram; and where the description states that each field is 16
      bits in length, but the diagram shows the OPTION_RELAY_PORT as 13
      bits, and Option-Len as 19 bits.  Another example of the diagram
      and accompanying text disagreeing is in [RFC6958], where the
      packet format diagram showing the structure of the Burst/Gap Loss
      Metrics Report Block shows the Number of Bursts field as being 12
      bits wide but the corresponding text describes it as 16 bits.

      Comparing Figure 2 with Figure 3 exposes external inconsistency
      across documents.  While the packet format diagrams are broadly
      similar, the surrounding text is formatted differently.  If
      machine parsing is to be made possible, then this text must be
      structured consistently.

   Ambiguous constraints:  The constraints that are enforced on a
      particular field are often described ambiguously, or in a way that
      cannot be parsed easily.  In Figure 3, each of the three fields in
      the structure is constrained.  The first two fields ("Option-Code"
      and "Option-Len") are to be set to constant values (note the
      inconsistency in how these constraints are expressed in the
      description).  However, the third field ("Downstream Source Port")
      can take a value from a constrained set.  This constraint is
      expressed in prose that cannot readily be understood by machine.

   Poor linking between sub-structures:  Protocol data units and other
      structures are often comprised of sub-structures that are defined
      elsewhere, either in the same document, or within another
      document.  Chaining these structures together is essential for
      machine parsing: the parsing process for a protocol data unit is
      only fully expressed if all elements can be parsed.

      Figure 2 highlights the difficulty that machine parsers have in
      chaining structures together.  Two fields ("Stream ID" and "Final
      Size") are described as being encoded as variable-length integers;
      this is a structure described elsewhere in the same document.
      Structured text is required both alongside the definition of the
      containing structure and with the definition of the sub-structure,
      to allow a parser to link the two together.

   :   The format of the "Relay Source Port Option" is shown below:
   :
   :    0                   1                   2                   3
   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |    OPTION_RELAY_PORT    |         Option-Len                  |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |    Downstream Source Port     |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   :   Where:
   :
   :   Option-Code:  OPTION_RELAY_PORT. 16-bit value, 135.
   :
   :   Option-Len:  16-bit value to be set to 2.
   :
   :   Downstream Source Port:  16-bit value.  To be set by the IPv6
   :      relay either to the downstream relay agent's UDP source port
   :      used for the UDP packet, or to zero if only the local relay
   :      agent uses the non-DHCP UDP port (not 547).

        Figure 3: DHCPv6's Relay Source Port Option (from [RFC8357])
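
   The internal inconsistency in Figure 3 can be detected mechanically.
   The following Python sketch is illustrative only and is not part of
   the format described by this document.  It assumes the common
   convention that each bit occupies two characters in a diagram row
   and that fields are delimited by "|" characters; the function and
   variable names are hypothetical.

```python
# Sketch only: assumes each bit occupies two characters in the diagram and
# that fields within a row are delimited by '|' characters.

# Field rows copied from the DHCPv6 Relay Source Port Option diagram.
DIAGRAM_ROWS = [
    "|    OPTION_RELAY_PORT    |         Option-Len                  |",
    "|    Downstream Source Port     |",
]

# Field widths (in bits) as stated in the accompanying description list.
DESCRIBED_WIDTHS = {
    "Option-Code": 16,
    "Option-Len": 16,
    "Downstream Source Port": 16,
}


def row_field_widths(row):
    """Return a list of (label, width_in_bits) for one diagram row."""
    row = row.strip()
    bars = [i for i, ch in enumerate(row) if ch == "|"]
    return [
        (row[start + 1:end].strip(), (end - start) // 2)
        for start, end in zip(bars, bars[1:])
    ]


def find_inconsistencies(rows, described):
    """Compare the fields drawn in the diagram with those described."""
    drawn = dict(field for row in rows for field in row_field_widths(row))
    problems = []
    for label, bits in drawn.items():
        if label not in described:
            problems.append(f"{label}: drawn ({bits} bits) but not described")
        elif described[label] != bits:
            problems.append(
                f"{label}: drawn as {bits} bits, described as "
                f"{described[label]} bits"
            )
    for label in described:
        if label not in drawn:
            problems.append(f"{label}: described but not drawn")
    return problems


if __name__ == "__main__":
    for problem in find_inconsistencies(DIAGRAM_ROWS, DESCRIBED_WIDTHS):
        print(problem)
```

   A check of this kind only works when labels and widths follow a
   predictable layout, which is the motivation for the consistent
   format proposed in this document.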

2.2.  Formal languages in standards documents

   A small proportion of IETF standards documents contain structured and
   formal languages, including ABNF [RFC5234], ASN.1 [ASN1], C, CBOR
   [RFC7049], JSON, the TLS presentation language [RFC8446], YANG models
   [RFC7950], and XML.  While this broad range of languages may be
   problematic for the development of tooling to parse specifications,
   these, and other, languages serve a range of different use cases.
   ABNF, for example, is typically used to specify text protocols, while
   ASN.1 is used to specify data structure serialisation.  This document
   specifies a structured language for specifying the parsing of binary
   protocol data units.

3.  Design Principles

   The use of structures that are designed to support machine
   readability may potentially interfere with the existing ways in which
   protocol specifications are used and authored.  To the extent that
   these existing uses are more important than machine readability, such
   interference must be minimised.

   In this section, the broad design principles that underpin the format
   described by this document are given.  However, these principles
   apply more generally to any approach that introduces structured and
   formal languages into standards documents.

   It should be noted that these are design principles: they expose the
   trade-offs that are inherent within any given approach.  Violating
   these principles is sometimes necessary and beneficial, and this
   document sets out the potential consequences of doing so.

   The central tenet that underpins these design principles is a
   recognition that the standardisation process is not broken, and so
   does not need to be fixed.  Failure to recognise this will likely
   lead to approaches that are incompatible with the standards process,
   or that will see limited adoption.  However, the standards process
   can be improved with appropriate approaches, as guided by the
   following broad design principles:

   Most readers are human:  Primarily, standards documents should be
      written for people, who require text and diagrams that they can
      understand.  Structures that cannot be easily parsed by people
      should be avoided, and if included, should be clearly delineated
      from human-readable content.

      Any approach that shifts this balance -- that is, that primarily
      targets machine readers -- is likely to be disruptive to the
      standardisation process, which relies upon discussion centered
      around documents written in prose.

   Authorship tools are diverse:  Authorship is a distributed process
      that involves a diverse set of tools and workflows.  The
      introduction of machine-readable structures into specifications
      should not require that specific tools are used to produce
      standards documents, to ensure that disruption to existing
      workflows is minimised.  This does not preclude the development of
      optional, supplementary tools that aid in authoring
      machine-readable structures.

      The immediate impact of requiring specific tooling is that
      adoption is likely to be limited.  A long-term impact might be
      that authors whose workflows are incompatible might be alienated
      from the process.

   Canonical specifications:  As far as possible, machine-readable
      structures should not replicate the human readable specification
      of the protocol within the same document.  Such structures should
      form part of a canonical specification of the protocol.  Adding
      supplementary machine-readable structures, in parallel to the
      existing human readable text, is undesirable because it creates
      the potential for inconsistency.

      As an example, program code that describes how a protocol data
      unit can be parsed might be provided as an appendix within a
      standards document.  This code would provide a specification of
      the protocol that is separate to the prose description in the main
      body of the document.  This has the undesirable effect of
      introducing the potential for the program code to specify
      behaviour that the prose-based specification does not, and vice-
      versa.

   Expressiveness:  Any approach should be expressive enough to capture
      the syntax and parsing process for the majority of binary
      protocols.  If a given language is not sufficiently expressive,
      then adoption is likely to be limited.  At the limits of what can
      be expressed by the language, authors are likely to revert to
      defining the protocol in prose: this undermines the broad goal of
      using structured and formal languages.  Equally, though,
      understandable specifications and ease of use are critical for
      adoption.  A tool that is simple to use and addresses the most
      common use cases might be preferred to a complex tool that
      addresses all use cases.

   Minimise required change:  Any approach should require as few changes
      as possible to the way that documents are formatted, authored, and
      published.  Forcing adoption of a particular structured or formal
      language is incompatible with the IETF's standardisation process:
      there are very few components of standards documents that are non-
      optional.

4.  Augmented Packet Header Diagrams

   The design principles described in Section 3 can largely be met by
   the existing uses of packet header diagrams.  These diagrams aid
   human readability, do not require new or specialised authorship
   tools, do not split the specification into multiple parts, can
   express most binary protocol features, and require no changes to the
   existing publication processes.

   However, as discussed in Section 2.1 there are limitations to how
   packet header diagrams are used that must be addressed if they are to
   be parsed by machine.  In this section, an augmented packet header
   diagram format is described.

   The concept is first illustrated by example.  This is appropriate,
   given the visual nature of the language.  In future drafts, these
   examples will be parsable using provided tools, and a formal
   specification of the augmented packet diagrams will be given in
   Appendix A.

   In the augmented packet diagrams, each protocol data unit is
   described in its own section of the document.  This enables cross-
   referencing between data units using section numbering.  In this
   specification-by-example, each element of the format will be
   described as part of a separate PDU.

4.1.  PDUs with Fixed and Variable-Width Fields

   The simplest PDU is one that contains only a set of fixed-width
   fields in a known order, with no optional fields or variation in the
   packet format.

   Some packet formats include variable-width fields, where the size of
   a field is either derived from the value of some previous field, or
   is unspecified and inferred from the total size of the packet and the
   size of the other fields.  A packet can contain only one unspecified
   length field, to ensure there is no ambiguity.
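
   The rule that a packet can contain only one unspecified length field
   can be sketched as follows.  This Python fragment is illustrative
   only: the function name and the use of None to mark an unspecified
   width are assumptions, not part of the format.

```python
def resolve_widths(total_bits, widths):
    """Resolve the single unspecified field width, if there is one.

    widths is a list of (name, bits) pairs, where bits is None for a
    field whose length is unspecified.  Returns a dict of name -> bits.
    """
    unknown = [name for name, bits in widths if bits is None]
    if len(unknown) > 1:
        # Two unspecified fields could share the remaining bits in many
        # ways, so the description would be ambiguous.
        raise ValueError("more than one unspecified length field")

    known_bits = sum(bits for _, bits in widths if bits is not None)
    resolved = dict(widths)
    if unknown:
        remaining = total_bits - known_bits
        if remaining < 0:
            raise ValueError("packet is shorter than its fixed fields")
        resolved[unknown[0]] = remaining
    return resolved
```

   For example, with a 64-bit packet and two 8-bit fields, the single
   unspecified field is inferred to be 48 bits wide.  With two
   unspecified fields there is no unique answer, so the sketch rejects
   the description.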

   A PDU description is introduced by the exact phrase "A/An _______ is
   formatted as follows:" at the end of a paragraph.  This is followed
   by the PDU description itself, as a packet diagram within an
   <artwork> element in the XML representation, starting with a header
   line to show the bit width of the diagram.  The description of the
   fields follows the diagram, as an XML <dl> list, after a paragraph
   containing the text "where:".

   Each field of the description starts with a <dt> tag comprising the
   field name and an optional short name in parentheses.  These are
   followed by a colon, the field length, and a terminating period.  The
   following <dd> tag contains a prose description of the field.
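
   The structure of a <dt> label described above is regular enough to
   be matched with a single pattern.  The following Python sketch is
   illustrative only; the regular expression and function name are
   assumptions made for this example, and the formal grammar is left to
   Appendix A.

```python
import re

# Sketch of a pattern for labels of the form
#   "Full Name (SHORT): <length> bits."  or  "Full Name: <length> bytes."
LABEL_RE = re.compile(
    r"^(?P<name>[^():]+?)\s*"          # full field name
    r"(?:\((?P<short>[^()]+)\)\s*)?"   # optional short name in parentheses
    r":\s*(?P<length>.+?)\s+"          # constant or constraint expression
    r"(?P<unit>bits?|bytes?)\.$"       # unit, then the terminating period
)


def parse_field_label(label):
    """Split a field label into name, short name, length and unit."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a valid field label: {label!r}")
    unit = match.group("unit")
    if not unit.endswith("s"):
        unit += "s"  # normalise "bit"/"byte" to "bits"/"bytes"
    return {
        "name": match.group("name").strip(),
        "short": match.group("short"),
        "length": match.group("length"),
        "unit": unit,
    }
```

   The length is returned as text because it may be a constraint
   expression rather than a constant; evaluating it is a separate step.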

   For example, this can be illustrated using the IPv4 Header Format
   [RFC791].  An IPv4 Header is formatted as follows:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Version|   IHL |    DSCP   |ECN|         Total Length          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |         Identification        |Flags|     Fragment Offset     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Time to Live  |    Protocol   |        Header Checksum        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         Source Address                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      Destination Address                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                            Options                          ...
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               :
       :                            Payload                            :
       :                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 4 bits.  This is a fixed-width field, whose full label
      is shown in the diagram.  The field's width -- 4 bits -- is given
      in the label of the description list, separated from the field's
      label by a colon.

   Internet Header Length (IHL): 4 bits.  This is a shorter field, whose
      full label is too large to be shown in the diagram.  A short label
      (IHL) is used in the diagram, and this short label is provided, in
      brackets, after the full label in the description list.

   Differentiated Services Code Point (DSCP): 6 bits.  This is a
      fixed-width field, as previously defined.

   Explicit Congestion Notification (ECN): 2 bits.  This is a
      fixed-width field, as previously defined.

   Total Length (TL): 2 bytes.  This is a fixed-width field, as
      previously defined.  Where fields are an integral number of bytes
      in size, the field length can be given in bytes rather than in
      bits.

   Identification: 2 bytes.  This is a fixed-width field, as previously
      defined.

   Flags: 3 bits.  This is a fixed-width field, as previously defined.

   Fragment Offset: 13 bits.  This is a fixed-width field, as previously
      defined.

   Time To Live (TTL): 1 byte.  This is a fixed-width field, as
      previously defined.

   Protocol: 1 byte.  This is a fixed-width field, as previously
      defined.

   Header Checksum: 2 bytes.  This is a fixed-width field, as previously
      defined.

   Source Address: 32 bits.  This is a fixed-width field, as previously
      defined.

   Destination Address: 32 bits.  This is a fixed-width field, as
      previously defined.

   Options: (IHL-5)*32 bits.  This is a variable-length field, whose
      length is defined by the value of the field with short label IHL
      (Internet Header Length).  Constraint expressions can be used in
      place of constant values: the grammar for the expression language
      is defined in Appendix A.1.  Constraints can include a previously
      defined field's short or full label, where one has been defined.
      Short variable-length fields are indicated by "..." instead of a
      pipe at the end of the row.

   Payload: TL - ((IHL*32)/8) bytes.  This is a multi-row
      variable-length field, constrained by the values of fields TL and
      IHL.
      Instead of the "..." notation, ":" is used to indicate that the
      field is variable-length.  The use of ":" instead of "..."
      indicates the field is likely to be a longer, multi-row field.
      However, semantically, there is no difference: these different
      notations are for the benefit of human readers.
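
   The two width constraints above, (IHL-5)*32 bits for Options and
   TL - ((IHL*32)/8) bytes for Payload, can be evaluated directly once
   IHL and TL have been parsed.  The following is a minimal sketch in
   Python; the function name and the error handling are illustrative
   assumptions, not part of the format.

```python
def ipv4_variable_widths(ihl: int, total_length: int) -> tuple[int, int]:
    """Return (options_bits, payload_bytes) from the IHL and TL fields.

    Hypothetical helper: it only evaluates the two constraint
    expressions given in the field descriptions.
    """
    if ihl < 5:
        # The fixed-width fields alone occupy five 32-bit words.
        raise ValueError("IHL must be at least 5")
    options_bits = (ihl - 5) * 32                    # Options: (IHL-5)*32 bits
    payload_bytes = total_length - (ihl * 32) // 8   # Payload: TL - ((IHL*32)/8) bytes
    if payload_bytes < 0:
        raise ValueError("TL is smaller than the header length")
    return options_bits, payload_bytes
```

   For example, a header with IHL = 6 and TL = 60 has 32 bits of Options
   and a 36-byte Payload.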

4.2.  PDUs That Cross-Reference Previously Defined Fields

   Binary formats often reference sub-structures that have been defined
   earlier in the specification.  For example, in RTP [RFC3550], the
   Contributing Source Identifiers in an RTP Data Packet are defined as
   comprising a list of Source Identifier elements.  A Source Identifier
   is formatted as follows:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      Source Identifier                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Source Identifier: 32 bits.  This is a fixed-width field, as
      described previously.

   The following example shows how a Source Identifier can be referenced
   in the description of an RTP Data Packet.  It also shows how the
   presence of some fields in a format may be dependent on the values of
   an earlier field.

   An RTP Data Packet is formatted as follows:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | V |P|X|  CC   |M|     PT      |       Sequence Number         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           Timestamp                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                Synchronization Source identifier              |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                [Contributing Source identifiers]              |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       Header Extension                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                             Payload                           :
       :                                                               :
       :                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           Padding             | Padding Count |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 2 bits.  This is a fixed-width field, as described
      previously.

   Padding (P): 1 bit.  This is a fixed-width field, as described
      previously.

   Extension (X): 1 bit.  This is a fixed-width field, as described
      previously.

   CSRC count (CC): 4 bits.  This is a fixed-width field, as described
      previously.

   Marker (M): 1 bit.  This is a fixed-width field, as described
      previously.

   Payload Type (PT): 7 bits.  This is a fixed-width field, as described
      previously.

   Sequence Number: 16 bits.  This is a fixed-width field, as described
      previously.

   Timestamp: 32 bits.  This is a fixed-width field, as described
      previously.

   Synchronization Source identifier: 1 * Source Identifier.  This is a
      field whose structure is a previously defined PDU format.  To
      indicate this, the width of the field is expressed in terms of the
      cross-referenced structure (here, Source Identifier).  When used
      in constraint expressions, PDU names refer to the length of that
      PDU structure.

   Contributing Source identifiers: CC * Source Identifier.  Where a
      field is comprised of a sequence of previously defined structures,
      square brackets can be used to indicate this in the diagram.  The
      length of the sequence can be defined using the constraint
      expression grammar as described earlier.

   Header Extension: 32 bits; present only when X == 1.  This is a field
      whose presence is predicated on an expression given using the
      constraint expression grammar described earlier.  Optional fields
      can be of any previously defined format (e.g., fixed- or variable-
      width).  Optional fields are indicated by the phrase
      "present only when [expr]" in the first line of their description.

      [Note that this example deviates from the format as described in
      [RFC3550].  As specified in that document, the Header Extension
      would be a cross-referenced structure.  This is not shown here for
      brevity.]

   Payload.  The length of the Payload is not specified, and hence needs
      to be inferred from the total length of the packet and the lengths
      of the known fields.  There can only be one field of unspecified
      size in a PDU.

   Padding: Padding Count bytes; present only when (P == 1) &&
      (Padding Count > 0).  This is a variable-size field, with size
      dependent on a later field in the packet.  Fields can only depend
      on the value of a later field if they follow a field with
      unspecified size.

   Padding Count: 1 byte; present only when P == 1.  This is a fixed-
      width field, as previously defined.
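
   The field descriptions above can be read as a parsing procedure:
   fixed-width fields first, then CC Source Identifiers, then the
   optional Header Extension, with the trailing Padding fields located
   from the end of the packet so that the Payload length can be
   inferred.  The following Python sketch illustrates this under stated
   assumptions: the function name and the returned dictionary are
   hypothetical, and the layout is the simplified one described in this
   section (a 32-bit Header Extension, and a Padding Count that is
   counted separately from Padding), not a full [RFC3550] parser.

```python
import struct


def parse_rtp_data_packet(data: bytes) -> dict:
    """Parse the simplified RTP Data Packet layout of this section."""
    if len(data) < 12:
        raise ValueError("packet shorter than the fixed-width fields")
    first, second, seq, timestamp, ssrc = struct.unpack_from("!BBHII", data, 0)
    pdu = {
        "V": first >> 6,
        "P": (first >> 5) & 0x1,
        "X": (first >> 4) & 0x1,
        "CC": first & 0xF,
        "M": second >> 7,
        "PT": second & 0x7F,
        "Sequence Number": seq,
        "Timestamp": timestamp,
        "Synchronization Source identifier": ssrc,
    }
    offset, end = 12, len(data)

    # Padding and Padding Count follow the unspecified-length Payload,
    # so they are located by working backwards from the end.
    pdu["Padding"] = b""
    if pdu["P"] == 1:
        padding_count = data[end - 1]
        end -= 1
        if end - padding_count < offset:
            raise ValueError("Padding Count exceeds the packet length")
        pdu["Padding Count"] = padding_count
        pdu["Padding"] = data[end - padding_count:end]
        end -= padding_count

    # Contributing Source identifiers: CC * Source Identifier (32 bits each).
    csrc_bytes = 4 * pdu["CC"]
    extension_bytes = 4 if pdu["X"] == 1 else 0
    if offset + csrc_bytes + extension_bytes > end:
        raise ValueError("packet too short for its optional fields")
    pdu["Contributing Source identifiers"] = list(
        struct.unpack_from(f"!{pdu['CC']}I", data, offset))
    offset += csrc_bytes

    # Header Extension: 32 bits; present only when X == 1.
    if pdu["X"] == 1:
        pdu["Header Extension"] = data[offset:offset + 4]
        offset += 4

    # Payload: whatever remains between the known fields.
    pdu["Payload"] = data[offset:end]
    return pdu
```

   Because only one field has unspecified size, the Payload boundaries
   are unambiguous once the fields on either side have been consumed.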

4.3.  PDUs with Non-Contiguous Fields

   In some binary formats, fields are striped across multiple non-
   contiguous bits.  This is often to allow for backwards compatibility
   with previous definitions of the same fields in earlier documents:
   striping in this way allows for careful use of the possible range of
   values.

   This format is illustrated using the STUN Message Type
   [draft-ietf-tram-stunbis-21].  A STUN Message Type is formatted as
   follows:

        0                   1
        0 1 2 3 4 5 6 7 8 9 0 1 2 3
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |M|M|M|M|M|C|M|M|M|C|M|M|M|M|
       |B|A|9|8|7|1|6|5|4|0|3|2|1|0|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Method (M): 12 bits.  This field is comprised of multiple sub-fields
      (M0 through MB) as shown in the diagram.  That these sub-fields
      should be concatenated, after parsing, into a single field is
      indicated by their being labelled using the 'M' short field name
      followed by a single hexadecimal digit, with the least significant
      bit labelled with 0, and subsequent bits labelled in sequence.

   Class (C): 2 bits.  This field follows the same format as M described
      above.
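
   The concatenation rule for striped fields can be sketched as follows.
   This Python example is illustrative only: the LAYOUT table is
   transcribed from the diagram above, and the function name is a
   hypothetical choice.  Each label's first character selects the field,
   and its hexadecimal digit gives the bit position within that field.

```python
# Bit labels of the STUN Message Type, most significant bit first,
# as they appear left to right in the diagram.
LAYOUT = ["MB", "MA", "M9", "M8", "M7", "C1", "M6",
          "M5", "M4", "C0", "M3", "M2", "M1", "M0"]


def destripe(value: int) -> dict[str, int]:
    """Reassemble the non-contiguous M and C fields from a 14-bit value."""
    fields: dict[str, int] = {}
    for position, label in enumerate(LAYOUT):
        # Extract the bit drawn at this position in the diagram.
        bit = (value >> (len(LAYOUT) - 1 - position)) & 0x1
        name, index = label[0], int(label[1], 16)
        # Place it at the bit index named by the label's hex digit.
        fields[name] = fields.get(name, 0) | (bit << index)
    return fields
```

   For instance, destripe(0x0101) yields M = 1 and C = 2.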

5.  IANA Considerations

   This document contains no actions for IANA.

6.  Security Considerations

   Poorly implemented parsers are a frequent source of security
   vulnerabilities in protocol implementations.  Structuring the
   description of a protocol data unit so that a parser can be
   automatically derived from the specification can reduce the
   likelihood of vulnerable implementations.

7.  Acknowledgements

   The authors would like to thank David Southgate for preparing a
   prototype implementation of some of the ideas described here.

   The authors would like to thank Marc Petit-Huguenin for feedback on
   the draft.

   This work has received funding from the UK Engineering and Physical
   Sciences Research Council under grant EP/R04144X/1.

8.  Informative References

   [ASN1]     ITU-T, "ITU-T Recommendation X.680, X.681, X.682, and
              X.683", ITU-T Recommendation X.680, X.681, X.682, and
              X.683.

   [QUIC-TRANSPORT]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", Work in Progress, Internet-Draft,
              draft-ietf-quic-transport-20, 23 April 2019,
              <http://www.ietf.org/internet-drafts/draft-ietf-quic-
              transport-20.txt>.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 5234, January 2008,
              <https://www.rfc-editor.org/info/rfc5234>.

   [RFC6958]  Clark, A., Zhang, S., Zhao, J., and Q. Wu, "RTP Control
              Protocol (RTCP) Extended Report (XR) Block for Burst/Gap
              Loss Metric Reporting", RFC 6958, May 2013,
              <https://www.rfc-editor.org/info/rfc6958>.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, October 2013,
              <https://www.rfc-editor.org/info/rfc7049>.

   [RFC7950]  Bjorklund, M., "The YANG 1.1 Data Modeling Language",
              RFC 7950, August 2016,
              <https://www.rfc-editor.org/info/rfc7950>.

   [RFC8357]  Deering, S. and R. Hinden, "Generalized UDP Source Port
              for DHCP Relay", RFC 8357, March 2018,
              <https://www.rfc-editor.org/info/rfc8357>.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, August 2018,
              <https://www.rfc-editor.org/info/rfc8446>.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", RFC 3550, July 2003,
              <https://www.rfc-editor.org/info/rfc3550>.

   [draft-ietf-tram-stunbis-21]
              Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing,
              D., Mahy, R., and P. Matthews, "Session Traversal
              Utilities for NAT (STUN)", Work in Progress, Internet-
              Draft, draft-ietf-tram-stunbis-21, 21 March 2019,
              <http://www.ietf.org/internet-drafts/draft-ietf-tram-
              stunbis-21.txt>.

   [RFC791]   Postel, J., "Internet Protocol", RFC 791, September 1981,
              <https://www.rfc-editor.org/info/rfc791>.

   [RFC793]   Postel, J., "Transmission Control Protocol", RFC 793,
              September 1981, <https://www.rfc-editor.org/info/rfc793>.

Appendix A.  ABNF specification

A.1.  Constraint Expressions

   cond-expr = eq-expr "?" cond-expr ":" eq-expr
   eq-expr   = bool-expr eq-op   bool-expr
   bool-expr = ord-expr  bool-op ord-expr
   ord-expr  = add-expr  ord-op  add-expr
   add-expr  = mul-expr  add-op  mul-expr
   mul-expr  = expr      mul-op  expr
   expr      = *DIGIT / field-name / field-name-ws / "(" expr ")"

   field-name    = *ALPHA
   field-name-ws = *(field-name " ")

   mul-op  = "*" / "/" / "%"
   add-op  = "+" / "-"
   ord-op  = "<=" / "<" / ">=" / ">"
   bool-op = "&&" / "||" / "!"
   eq-op   = "==" / "!="
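
   As an illustration of how such expressions might be evaluated once
   field values are known, the following Python sketch implements a
   small evaluator.  It is an assumption-laden example rather than a
   reference implementation: the names TOKEN, LEVELS, and evaluate are
   hypothetical, "/" is treated as integer division, operators at the
   same level associate to the left, and the conditional ("?" ":") and
   "!" forms are omitted.  Operator precedence follows the nesting of
   the rules above, with eq-op binding most loosely.

```python
import operator
import re

# Numbers, field names (optionally containing single spaces), operators.
TOKEN = re.compile(
    r"\s*(\d+|[A-Za-z]+(?: [A-Za-z]+)*|==|!=|<=|>=|&&|\|\||[-+*/%()<>])")

# One table per grammar level, loosest-binding operators first.
LEVELS = [
    {"==": operator.eq, "!=": operator.ne},
    {"&&": lambda a, b: bool(a) and bool(b),
     "||": lambda a, b: bool(a) or bool(b)},
    {"<": operator.lt, "<=": operator.le,
     ">": operator.gt, ">=": operator.ge},
    {"+": operator.add, "-": operator.sub},
    {"*": operator.mul, "/": operator.floordiv, "%": operator.mod},
]


def evaluate(expression: str, fields: dict[str, int]):
    """Evaluate a constraint expression against parsed field values."""
    tokens = TOKEN.findall(expression)
    if "".join(tokens).replace(" ", "") != expression.replace(" ", ""):
        raise ValueError("unrecognised characters in expression")
    pos = 0

    def parse(level: int):
        nonlocal pos
        if level == len(LEVELS):
            return primary()
        value = parse(level + 1)
        while pos < len(tokens) and tokens[pos] in LEVELS[level]:
            op = LEVELS[level][tokens[pos]]
            pos += 1
            value = op(value, parse(level + 1))
        return value

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token.isdigit():
            return int(token)
        if token in fields:
            return fields[token]
        raise ValueError(f"unknown field or token: {token!r}")

    result = parse(0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

   For example, evaluate("(IHL-5)*32", {"IHL": 6}) gives the width of
   the IPv4 Options field in bits for a header with IHL = 6.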

A.2.  Augmented ASCII packet diagrams

   Future revisions of this draft will include an ABNF specification for
   the augmented ASCII packet diagram format described in Section 4.  Such a
   specification is omitted from this draft given that the format is
   likely to change as its syntax is developed.  Given the visual nature
   of the format, it is more appropriate for discussion to focus on the
   examples given in Section 4.

Appendix B.  Source code repository

   The source code for tooling that can be used to parse this document
   is available from https://github.com/lumisota/improving-protocol-
   standards.

Authors' Addresses

   Stephen McQuistin
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: sm@smcquistin.uk

   Vivian Band
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: vivianband0@gmail.com

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: csp@csperkins.org