idnits 2.17.1 

draft-mcquistin-augmented-ascii-diagrams-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 831 has weird spacing: '...r eq-op   bool...'

  == Line 832 has weird spacing: '...rd-expr  bool-...'

  == Line 833 has weird spacing: '...dd-expr  ord-o...'

  == Line 835 has weird spacing: '...ul-expr  add-o...'

  == Line 836 has weird spacing: '... mul-op  expr...'

  -- The document date (9 March 2020) is 1508 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-20

  -- Obsolete informational reference (is this intentional?): RFC 7049
     (Obsoleted by RFC 8949)

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)


     Summary: 0 errors (**), 0 flaws (~~), 7 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                       S. McQuistin
3	Internet-Draft                                                   V. Band
4	Intended status: Experimental                                   D. Jacob
5	Expires: 10 September 2020                                 C. S. Perkins
6	                                                   University of Glasgow
7	                                                            9 March 2020

9	  Describing Protocol Data Units with Augmented Packet Header Diagrams
10	              draft-mcquistin-augmented-ascii-diagrams-03

12	Abstract

14	   This document describes a machine-readable format for specifying the
15	   syntax of protocol data units within a protocol specification.  This
16	   format comprises a consistently formatted packet header
17	   diagram, followed by structured explanatory text.  It is designed to
18	   maintain human readability while enabling support for automated
19	   parser generation from the specification document.  This document is
20	   itself an example of how the format can be used.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at https://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on 10 September 2020.

39	Copyright Notice

41	   Copyright (c) 2020 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
46	   license-info) in effect on the date of publication of this document.
47	   Please review these documents carefully, as they describe your rights
48	   and restrictions with respect to this document.  Code Components
49	   extracted from this document must include Simplified BSD License text
50	   as described in Section 4.e of the Trust Legal Provisions and are
51	   provided without warranty as described in the Simplified BSD License.

53	Table of Contents

55	   1.  Introduction
56	   2.  Background
57	     2.1.  Limitations of Current Packet Format Diagrams
58	     2.2.  Formal languages in standards documents
59	   3.  Design Principles
60	   4.  Augmented Packet Header Diagrams
61	     4.1.  PDUs with Fixed and Variable-Width Fields
62	     4.2.  PDUs That Cross-Reference Previously Defined Fields
63	     4.3.  PDUs with Non-Contiguous Fields
64	     4.4.  Importing PDU Definitions from Other Documents
65	   5.  Open Issues
66	   6.  IANA Considerations
67	   7.  Security Considerations
68	   8.  Acknowledgements
69	   9.  Informative References
70	   Appendix A.  ABNF specification
71	     A.1.  Constraint Expressions
72	     A.2.  Augmented packet diagrams
73	   Appendix B.  Source code repository
74	   Authors' Addresses

76	1.  Introduction

78	   Packet header diagrams have become a widely used format for
79	   describing the syntax of binary protocols.  In otherwise largely
80	   textual documents, they allow for the visualisation of packet
81	   formats, reducing human error, and aiding in the implementation of
82	   parsers for the protocols that they specify.

84	   Figure 1 gives an example of how packet header diagrams are used to
85	   define binary protocol formats.  The format has an obvious structure:
86	   the diagram clearly delineates each field, showing its width and its
87	   position within the header.  This type of diagram is designed for
88	   human readers, but is consistent enough that it should be possible to
89	   develop a tool that generates a parser for the packet format from the
90	   diagram.

92	   :    0                   1                   2                   3
93	   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
94	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
95	   :   |          Source Port          |       Destination Port        |
96	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
97	   :   |                        Sequence Number                        |
98	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
99	   :   |                    Acknowledgment Number                      |
100	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
101	   :   |  Data |           |U|A|P|R|S|F|                               |
102	   :   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
103	   :   |       |           |G|K|H|T|N|N|                               |
104	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
105	   :   |           Checksum            |         Urgent Pointer        |
106	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
107	   :   |                    Options                    |    Padding    |
108	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
109	   :   |                             data                              |
110	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

112	               Figure 1: TCP's header format (from [RFC793])
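
   As a rough illustration of the kind of parser such a tool might
   generate for the fixed part of Figure 1, the following Python sketch
   unpacks the first 20 bytes of a TCP header.  The function name
   parse_tcp_header and the dictionary keys are hypothetical and are
   not defined by this document; Options, Padding, and data are not
   handled.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of the header shown in Figure 1."""
    if len(segment) < 20:
        raise ValueError("need at least 20 bytes for the fixed header")
    # Network byte order: two 16-bit ports, two 32-bit numbers, then
    # four 16-bit words (offset/reserved/flags, window, checksum, urgent).
    (src, dst, seq, ack, off_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "data_offset": off_flags >> 12,        # top 4 bits
        "reserved": (off_flags >> 6) & 0x3F,   # next 6 bits
        "flags": off_flags & 0x3F,             # URG, ACK, PSH, RST, SYN, FIN
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

   Every shift and mask in the sketch is read by hand from the widths
   and positions drawn in the diagram.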

114	   Unfortunately, the format of such packet diagrams varies both within
115	   and between documents.  This variation makes it difficult to build
116	   tools to generate parsers from the specifications.  Better tooling
117	   could be developed if protocol specifications adopted a consistent
118	   format for their packet descriptions.  Indeed, this underpins the
119	   format described by this draft: we want to retain the benefits that
120	   packet header diagrams provide, while gaining the benefits of
121	   adopting a consistent format.

123	   This document describes a consistent packet header diagram format and
124	   accompanying structured text constructs that allow for the parsing
125	   process of protocol headers to be fully specified.  This provides
126	   support for the automatic generation of parser code.  Broad design
127	   principles, which seek to maintain the primacy of human readability
128	   and flexibility in writing, are described before the format itself
129	   is given.

131	   This document is itself an example of the approach that it describes,
132	   with the packet header diagrams and structured text format described
133	   by example.  Examples that do not form part of the protocol
134	   description language are marked by a colon at the beginning of each
135	   line; this prevents them from being parsed by the accompanying
136	   tooling.

138	   This draft describes early work.  As consensus builds around the
139	   particular syntax of the format described, both a formal ABNF
140	   specification (Appendix A) and code (Appendix B) that parses it (and,
141	   as described above, this document) will be provided.

143	2.  Background

145	   This section begins by considering how packet header diagrams are
146	   used in existing documents.  This exposes the limitations that the
147	   current usage has in terms of machine-readability, guiding the design
148	   of the format that this document proposes.

150	   While this document focuses on the machine-readability of packet
151	   format diagrams, this section also discusses the use of other
152	   structured or formal languages within IETF documents.  Considering
153	   how and why these languages are used provides an instructive contrast
154	   to the relatively incremental approach proposed here.

156	2.1.  Limitations of Current Packet Format Diagrams

158	   :   The RESET_STREAM frame is as follows:
159	   :
160	   :    0                   1                   2                   3
161	   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
162	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
163	   :   |                        Stream ID (i)                        ...
164	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
165	   :   |  Application Error Code (16)  |
166	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
167	   :   |                        Final Size (i)                       ...
168	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
169	   :
170	   :   RESET_STREAM frames contain the following fields:
171	   :
172	   :   Stream ID:  A variable-length integer encoding of the Stream ID
173	   :      of the stream being terminated.
174	   :
175	   :   Application Protocol Error Code:  A 16-bit application protocol
176	   :      error code (see Section 20.1) which indicates why the stream
177	   :      is being closed.
178	   :
179	   :   Final Size: A variable-length integer indicating the final size
180	   :      of the stream by the RESET_STREAM sender, in unit of bytes.

182	     Figure 2: QUIC's RESET_STREAM frame format (from [QUIC-TRANSPORT])

184	   Packet header diagrams are frequently used in IETF standards to
185	   describe the format of binary protocols.  While there is no standard
186	   for how these diagrams should be formatted, they have a broadly
187	   similar structure, where the layout of a protocol data unit (PDU) or
188	   structure is shown in diagrammatic form, followed by a description
189	   list of the fields that it contains.  An example of this format,
190	   taken from the QUIC specification, is given in Figure 2.

192	   These packet header diagrams, and the accompanying descriptions, are
193	   formatted for human readers rather than for automated processing.  As
194	   a result, while there is rough consistency in how packet header
195	   diagrams are formatted, there are a number of limitations that make
196	   them difficult to work with programmatically:

198	   Inconsistent syntax:  There are two classes of consistency that are
199	      needed to support automated processing of specifications: internal
200	      consistency within a diagram or document, and external consistency
201	      across all documents.

203	      Figure 2 gives an example of internal inconsistency.  Here, the
204	      packet diagram shows a field labelled "Application Error Code",
205	      while the accompanying description lists the field as "Application
206	      Protocol Error Code".  The use of an abbreviated name is suitable
207	      for human readers, but makes parsing the structure difficult for
208	      machines.  Figure 3 gives a further example, where the description
209	      includes an "Option-Code" field that does not appear in the packet
210	      diagram; and where the description states that each field is 16
211	      bits in length, but the diagram shows the OPTION_RELAY_PORT as 13
212	      bits, and Option-Len as 19 bits.  Another example is [RFC6958],
213	      where the packet format diagram showing the structure of the
214	      Burst/Gap Loss Metrics Report Block shows the Number of Bursts
215	      field as being 12 bits wide but the corresponding text describes
216	      it as 16 bits.

218	      Comparing Figure 2 with Figure 3 exposes external inconsistency
219	      across documents.  While the packet format diagrams are broadly
220	      similar, the surrounding text is formatted differently.  If
221	      machine parsing is to be made possible, then this text must be
222	      structured consistently.

224	   Ambiguous constraints:  The constraints that are enforced on a
225	      particular field are often described ambiguously, or in a way that
226	      cannot be parsed easily.  In Figure 3, each of the three fields in
227	      the structure is constrained.  The first two fields ("Option-Code"
228	      and "Option-Len") are to be set to constant values (note the
229	      inconsistency in how these constraints are expressed in the
230	      description).  However, the third field ("Downstream Source Port")
231	      can take a value from a constrained set.  This constraint is
232	      expressed in prose that cannot readily be understood by machine.

234	   Poor linking between sub-structures:  Protocol data units and other
235	      structures are often comprised of sub-structures that are defined
236	      elsewhere, either in the same document, or within another
237	      document.  Chaining these structures together is essential for
238	      machine parsing: the parsing process for a protocol data unit is
239	      only fully expressed if all elements can be parsed.

241	      Figure 2 highlights the difficulty that machine parsers have in
242	      chaining structures together.  Two fields ("Stream ID" and "Final
243	      Size") are described as being encoded as variable-length integers;
244	      this is a structure described elsewhere in the same document.
245	      Structured text is required both alongside the definition of the
246	      containing structure and with the definition of the sub-structure,
247	      to allow a parser to link the two together.

249	   Lack of extension and evolution syntax:  Protocols are often
250	      specified across multiple documents, either because the protocol
251	      explicitly includes extension points (e.g., profiles and payload
252	      format specifications in RTP [RFC3550]) or because definition of a
253	      protocol data unit has changed and evolved over time.  As a
254	      result, it is essential that syntax be provided to allow for a
255	      complete definition of a protocol's parsing process to be
256	      constructed across multiple documents.

258	   :   The format of the "Relay Source Port Option" is shown below:
259	   :
260	   :    0                   1                   2                   3
261	   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
262	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
263	   :   |    OPTION_RELAY_PORT    |         Option-Len                  |
264	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
265	   :   |    Downstream Source Port     |
266	   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
267	   :
268	   :   Where:
269	   :
270	   :   Option-Code:  OPTION_RELAY_PORT. 16-bit value, 135.
271	   :
272	   :   Option-Len:  16-bit value to be set to 2.
273	   :
274	   :   Downstream Source Port:  16-bit value.  To be set by the IPv6
275	   :      relay either to the downstream relay agent's UDP source port
276	   :      used for the UDP packet, or to zero if only the local relay
277	   :      agent uses the non-DHCP UDP port (not 547).

279	        Figure 3: DHCPv6's Relay Source Port Option (from [RFC8357])
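
   The width mismatch in Figure 3 can be detected mechanically.  The
   Python sketch below is a minimal illustration, assuming the usual
   drawing convention in which a field of n bits occupies 2n - 1
   characters between its "|" delimiters.  The function name
   row_field_widths and the "declared" table are hypothetical; the
   declared widths are copied from the description in Figure 3, using
   the labels as drawn in the diagram.

```python
def row_field_widths(row: str) -> list[tuple[str, int]]:
    """Return (label, width in bits) for each field on one diagram row.

    Assumes one bit is drawn as two characters, so that a field of
    n bits spans 2n - 1 characters between its delimiters.
    """
    row = row.strip()
    if not (row.startswith("|") and row.endswith("|")):
        raise ValueError("row must start and end with '|'")
    cells = row[1:-1].split("|")
    return [(cell.strip(), (len(cell) + 1) // 2) for cell in cells]


# First row of the diagram in Figure 3, copied verbatim.
row = "|    OPTION_RELAY_PORT    |         Option-Len                  |"

# Widths stated by the accompanying description (16 bits each).
declared = {"OPTION_RELAY_PORT": 16, "Option-Len": 16}

for label, drawn in row_field_widths(row):
    if drawn != declared[label]:
        print(f"{label}: drawn as {drawn} bits, described as {declared[label]}")
```

   For this row the sketch reports 13 and 19 bits, the same widths
   noted in Section 2.1.  The "Option-Code" label is a different
   problem, since it does not appear in the diagram at all.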

281	2.2.  Formal languages in standards documents

283	   A small proportion of IETF standards documents contain structured and
284	   formal languages, including ABNF [RFC5234], ASN.1 [ASN1], C, CBOR
285	   [RFC7049], JSON, the TLS presentation language [RFC8446], YANG models
286	   [RFC7950], and XML.  While this broad range of languages may be
287	   problematic for the development of tooling to parse specifications,
288	   these, and other, languages serve a range of different use cases.
289	   ABNF, for example, is typically used to specify text protocols, while
290	   ASN.1 is used to specify data structure serialisation.  This document
291	   specifies a structured language for specifying the parsing of binary
292	   protocol data units.

294	3.  Design Principles

296	   The use of structures that are designed to support machine
297	   readability might potentially interfere with the existing ways in
298	   which protocol specifications are used and authored.  To the extent
299	   that these existing uses are more important than machine readability,
300	   such interference must be minimised.

302	   In this section, the broad design principles that underpin the format
303	   described by this document are given.  However, these principles
304	   apply more generally to any approach that introduces structured and
305	   formal languages into standards documents.

307	   It should be noted that these are design principles: they expose the
308	   trade-offs that are inherent within any given approach.  Violating
309	   these principles is sometimes necessary and beneficial, and this
310	   document sets out the potential consequences of doing so.

312	   The central tenet that underpins these design principles is a
313	   recognition that the standardisation process is not broken, and so
314	   does not need to be fixed.  Failure to recognise this will likely
315	   lead to approaches that are incompatible with the standards process,
316	   or that will see limited adoption.  However, the standards process
317	   can be improved with appropriate approaches, as guided by the
318	   following broad design principles:

320	   Most readers are human:  Primarily, standards documents should be
321	      written for people, who require text and diagrams that they can
322	      understand.  Structures that cannot be easily parsed by people
323	      should be avoided, and if included, should be clearly delineated
324	      from human-readable content.

326	      Any approach that shifts this balance -- that is, that primarily
327	      targets machine readers -- is likely to be disruptive to the
328	      standardisation process, which relies upon discussion centred
329	      around documents written in prose.

331	   Writing tools are diverse:  Standards document writing is a
332	      distributed process that involves a diverse set of tools and
333	      workflows.  The introduction of machine-readable structures into
334	      specifications should not require that specific tools are used to
335	      produce standards documents, to ensure that disruption to existing
336	      workflows is minimised.  This does not preclude the development of
337	      optional, supplementary tools that aid in authoring machine-
338	      readable structures.

340	      The immediate impact of requiring specific tooling is that
341	      adoption is likely to be limited.  A long-term impact might be
342	      that authors whose workflows are incompatible might be alienated
343	      from the process.

345	   Canonical specifications:  As far as possible, machine-readable
346	      structures should not replicate the human readable specification
347	      of the protocol within the same document.  Machine-readable
348	      structures should form part of a canonical specification of the
349	      protocol.  Adding supplementary machine-readable structures, in
350	      parallel to the existing human readable text, is undesirable
351	      because it creates the potential for inconsistency.

353	      As an example, program code that describes how a protocol data
354	      unit can be parsed might be provided as an appendix within a
355	      standards document.  This code would provide a specification of
356	      the protocol that is separate to the prose description in the main
357	      body of the document.  This has the undesirable effect of
358	      introducing the potential for the program code to specify
359	      behaviour that the prose-based specification does not, and vice
360	      versa.

362	   Expressiveness:  Any approach should be expressive enough to capture
363	      the syntax and parsing process for the majority of binary
364	      protocols.  If a given language is not sufficiently expressive,
365	      then adoption is likely to be limited.  At the limits of what can
366	      be expressed by the language, authors are likely to revert to
367	      defining the protocol in prose: this undermines the broad goal of
368	      using structured and formal languages.  Equally, though,
369	      understandable specifications and ease of use are critical for
370	      adoption.  A tool that is simple to use and addresses the most
371	      common use cases might be preferred to a complex tool that
372	      addresses all use cases.

374	      It may be desirable to restrict expressiveness, however, to
375	      guarantee intrinsic safety, security, and computability properties
376	      of both the generated parser code for the protocol, and the parser
377	      of the description language itself.  In much the same way as the
378	      language-theoretic security ([LANGSEC]) community advocates for
379	      programming language design to be informed by the desired
380	      properties of the parsers for those languages, protocol designers
381	      should be aware of the implications of their design choices.  The
382	      expressiveness of the protocol description languages that they use
383	      to define their protocols can force such awareness.

385	      Broadly, those languages that have grammars which are more
386	      expressive tend to have parsers that are more complex and less
387	      safe.  As a result, while considering the other goals described in
388	      this document, protocol description languages should attempt to be
389	      minimally expressive, and either restrict protocol designs to
390	      those for which safe and secure parsers can be generated, or as a
391	      minimum, ensure that protocol designers are aware of the
392	      boundaries their designs cross, in terms of computability and
393	      decidability [SASSAMAN].

395	   Minimise required change:  Any approach should require as few changes
396	      as possible to the way that documents are formatted, authored, and
397	      published.  Forcing adoption of a particular structured or formal
398	      language is incompatible with the IETF's standardisation process:
399	      there are very few components of standards documents that are non-
400	      optional.

402	4.  Augmented Packet Header Diagrams

404	   The design principles described in Section 3 can largely be met by
405	   the existing uses of packet header diagrams.  These diagrams aid
406	   human readability, do not require new or specialised tools to write,
407	   do not split the specification into multiple parts, can express most
408	   binary protocol features, and require no changes to existing
409	   publication processes.

411	   However, as discussed in Section 2.1 there are limitations to how
412	   packet header diagrams are used that must be addressed if they are to
413	   be parsed by machine.  In this section, an augmented packet header
414	   diagram format is described.

416	   The concept is first illustrated by example.  This is appropriate,
417	   given the visual nature of the language.  In future drafts, these
418	   examples will be parsable using provided tools, and a formal
419	   specification of the augmented packet diagrams will be given in
420	   Appendix A.

422	4.1.  PDUs with Fixed and Variable-Width Fields

424	   The simplest PDU is one that contains only a set of fixed-width
425	   fields in a known order, with no optional fields or variation in the
426	   packet format.

428	   Some packet formats include variable-width fields, where the size of
429	   a field is either derived from the value of some previous field, or
430	   is unspecified and inferred from the total size of the packet and the
431	   size of the other fields.

433	   To ensure that there is no ambiguity, a PDU description can contain
434	   only one field whose length is unspecified.  The length of a single
435	   field, where all other fields are of known (but perhaps variable)
436	   length, can be inferred from the total size of the containing PDU.
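
   The inference rule can be sketched in a few lines of Python.  This
   is an illustration only: the function name infer_field_lengths and
   the use of None to mark an unspecified length are assumptions, and
   all lengths are taken to be in bits.

```python
from typing import Optional


def infer_field_lengths(pdu_bits: int, lengths: list[Optional[int]]) -> list[int]:
    """Fill in the single unspecified (None) field length, in bits."""
    unknown = [i for i, n in enumerate(lengths) if n is None]
    if len(unknown) > 1:
        # Two or more unspecified lengths cannot be told apart.
        raise ValueError("at most one field may have an unspecified length")
    known = sum(n for n in lengths if n is not None)
    if known > pdu_bits:
        raise ValueError("known fields exceed the size of the PDU")
    resolved = list(lengths)
    if unknown:
        # The remaining bits belong to the one unspecified field.
        resolved[unknown[0]] = pdu_bits - known
    elif known != pdu_bits:
        raise ValueError("field lengths do not add up to the PDU size")
    return resolved
```

   A 64-bit PDU with fields of 16, unspecified, and 8 bits resolves to
   16, 40, and 8 bits.  With two unspecified fields the sketch raises
   an error, because the split between them would be ambiguous.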

438	   A PDU description is introduced by the exact phrase "A/An _______ is
439	   formatted as follows:" at the end of a paragraph.  This is followed
440	   by the PDU description itself, as a packet diagram within an
441	   <artwork> element in the XML representation, starting with a header
442	   line to show the bit width of the diagram.  The description of the
443	   fields follows the diagram, as an XML <dl> list, after a paragraph
444	   containing the text "where:".

446	   PDU names must be unique, both within a document, and across all
447	   documents that are linked together (i.e., using the structured
448	   language defined in Section 4.4).

450	   Each field of the description starts with a <dt> tag comprising the
451	   field name and an optional short name in parentheses.  These are
452	   followed by a colon, the field length, an optional presence
453	   expression (described in Section 4.2), and a terminating period.  The
454	   following <dd> tag contains a prose description of the field.  Field
455	   names cannot be the same as a previously defined PDU name, and must
456	   be unique within a given structure definition.
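
   A field label of this shape can be split with a regular expression.
   The Python sketch below is one possible reading of the rules above
   and is not the grammar of Appendix A.  The names FIELD_LABEL and
   parse_field_label are hypothetical, the input is assumed to be the
   text of a single <dt> element, and the optional presence expression
   is deliberately not handled.

```python
import re

FIELD_LABEL = re.compile(
    r"^(?P<name>[^():]+?)"            # full field name
    r"(?:\s*\((?P<short>[^()]+)\))?"  # optional short name in parentheses
    r":\s*(?P<length>.+?)\s+"         # constant or constraint expression
    r"(?P<unit>bits?|bytes?)\.$"      # unit and terminating period
)


def parse_field_label(label: str) -> dict:
    """Split a field label into name, short name, length, and unit."""
    match = FIELD_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a field label: {label!r}")
    # "short" is None when the label carries no short name.
    return match.groupdict()
```

   For instance, parse_field_label("Options: (IHL-5)*32 bits.") yields
   the length expression "(IHL-5)*32", the unit "bits", and no short
   name.  The length is returned as text and is not evaluated.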

458	   For example, this can be illustrated using the IPv4 Header Format
459	   [RFC791].  An IPv4 Header is formatted as follows:

461	        0                   1                   2                   3
462	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
463	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
464	       |Version|   IHL |    DSCP   |ECN|         Total Length          |
465	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466	       |         Identification        |Flags|     Fragment Offset     |
467	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
468	       | Time to Live  |    Protocol   |        Header Checksum        |
469	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
470	       |                         Source Address                        |
471	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
472	       |                      Destination Address                      |
473	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
474	       |                            Options                          ...
475	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
476	       |                                                               :
477	       :                            Payload                            :
478	       :                                                               |
479	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

481	   where:

483	   Version (V): 4 bits.  This is a fixed-width field, whose full label
484	      is shown in the diagram.  The field's width -- 4 bits -- is given
485	      in the label of the description list, separated from the field's
486	      label by a colon.

488	   Internet Header Length (IHL): 4 bits.  This is a shorter field, whose
489	      full label is too large to be shown in the diagram.  A short label
490	      (IHL) is used in the diagram, and this short label is provided, in
491	      brackets, after the full label in the description list.

493	   Differentiated Services Code Point (DSCP): 6 bits.  This is a fixed-
494	      width field, as previously discussed.

496	   Explicit Congestion Notification (ECN): 2 bits.  This is a fixed-
497	      width field, as previously discussed.

499	   Total Length (TL): 2 bytes.  This is a fixed-width field, as
500	      previously discussed.  Where fields are an integral number of
501	      bytes in size, the field length can be given in bytes rather than
502	      in bits.

504	   Identification: 2 bytes.  This is a fixed-width field, as previously
505	      discussed.

507	   Flags: 3 bits.  This is a fixed-width field, as previously discussed.

509	   Fragment Offset: 13 bits.  This is a fixed-width field, as previously
510	      discussed.

512	   Time to Live (TTL): 1 byte.  This is a fixed-width field, as
513	      previously discussed.

515	   Protocol: 1 byte.  This is a fixed-width field, as previously
516	      discussed.

518	   Header Checksum: 2 bytes.  This is a fixed-width field, as previously
519	      discussed.

521	   Source Address: 32 bits.  This is a fixed-width field, as previously
522	      discussed.

524	   Destination Address: 32 bits.  This is a fixed-width field, as
525	      previously discussed.

527	   Options: (IHL-5)*32 bits.  This is a variable-length field, whose
528	      length is defined by the value of the field with short label IHL
529	      (Internet Header Length).  Constraint expressions can be used in
530	      place of constant values: the grammar for the expression language
531	      is defined in Appendix A.1.  Constraints can include a previously
532	      defined field's short or full label, where one has been defined.
533	      Short variable-length fields are indicated by "..." instead of a
534	      pipe at the end of the row.

536	   Payload: TL - ((IHL*32)/8) bytes.  This is a multi-row variable-
537	      length field, constrained by the values of fields TL and IHL.
538	      Instead of the "..." notation, ":" is used to indicate that the
539	      field is variable-length.  The use of ":" instead of "..."
540	      indicates the field is likely to be a longer, multi-row field.
541	      However, semantically, there is no difference: these different
542	      notations are for the benefit of human readers.
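
   The two constraint expressions above can be checked against a
   concrete header.  The following is a minimal sketch, not part of the
   format itself: the function name and the returned dictionary keys
   are hypothetical, and "/" is assumed to mean integer division.

```python
import struct


def ipv4_field_lengths(packet: bytes) -> dict:
    """Derive the Options and Payload lengths from the IHL and TL fields."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # First four bytes: Version/IHL, DSCP/ECN, Total Length (network order).
    version_ihl, _dscp_ecn, total_length = struct.unpack("!BBH", packet[:4])
    ihl = version_ihl & 0x0F  # IHL is the low 4 bits of the first byte
    if ihl < 5:
        raise ValueError("IHL must be at least 5")
    options_bits = (ihl - 5) * 32                    # Options: (IHL-5)*32 bits
    payload_bytes = total_length - (ihl * 32) // 8   # Payload: TL - ((IHL*32)/8) bytes
    if payload_bytes < 0:
        raise ValueError("TL is smaller than the header length")
    return {
        "IHL": ihl,
        "TL": total_length,
        "options_bits": options_bits,
        "payload_bytes": payload_bytes,
    }
```

   For a header with IHL = 6 and TL = 40, this gives 32 bits of Options
   and a 16-byte Payload.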

544	4.2.  PDUs That Cross-Reference Previously Defined Fields

546	   Binary formats often reference sub-structures that have been defined
547	   earlier in the specification.  For example, in RTP [RFC3550], the
548	   Contributing Source Identifiers in an RTP Data Packet are defined as
549	   comprising a list of Source Identifier elements.  A Source Identifier
550	   is formatted as follows:

552	        0                   1                   2                   3
553	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
554	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
555	       |                               SSRC                            |
556	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

558	   where:

560	   SSRC: 32 bits.  This is a fixed-width field, as described previously.

562	   The following example shows how a Source Identifier can be referenced
563	   in the description of an RTP Data Packet.  It also shows how the
564	   presence of some fields in a format may be dependent on the values of
565	   an earlier field.

567	   An RTP Data Packet is formatted as follows:

569	        0                   1                   2                   3
570	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
571	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
572	       | V |P|X|  CC   |M|     PT      |       Sequence Number         |
573	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
574	       |                           Timestamp                           |
575	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
576	       |                Synchronization Source identifier              |
577	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
578	       |                [Contributing Source identifiers]              |
579	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
580	       |                       Header Extension                        |
581	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
582	       |                             Payload                           :
583	       :                                                               :
584	       :                                                               |
585	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
586	       |                           Padding             | Padding Count |
587	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

589	   where:

591	   Version (V): 2 bits.  This is a fixed-width field, as described
592	      previously.

594	   Padding (P): 1 bit.  This is a fixed-width field, as described
595	      previously.

597	   Extension (X): 1 bit.  This is a fixed-width field, as described
598	      previously.

600	   CSRC count (CC): 4 bits.  This is a fixed-width field, as described
601	      previously.

603	   Marker (M): 1 bit.  This is a fixed-width field, as described
604	      previously.

606	   Payload Type (PT): 7 bits.  This is a fixed-width field, as described
607	      previously.

609	   Sequence Number: 16 bits.  This is a fixed-width field, as
610	      described previously.

612	   Timestamp: 32 bits.  This is a fixed-width field, as described
613	      previously.

615	   Synchronization Source identifier: 1 * Source Identifier.  This is a
616	      field whose structure is a previously defined PDU format (Source
617	      Identifier).  To indicate this, the width of the field is
618	      expressed in terms of the cross-referenced structure.  When used
619	      in constraint expressions, PDU names refer to the length of that
620	      PDU structure.

622	   Contributing Source identifiers: CC * Source Identifier.  Where a
623	      field is comprised of a sequence of previously defined structures,
624	      square brackets can be used to indicate this in the diagram.  The
625	      length of the sequence can be defined using the constraint
626	      expression grammar as described earlier.

628	      In this example, both a PDU name (Source Identifier) and a field
629	      name (CC) are used in the constraint expression.  The PDU name
630	      refers to the length of the PDU, while the field name refers to
631	      the value of the field.  This is possible because field names
632	      cannot be the same as previously defined PDU names.

634	   Header Extension: 32 bits; present only when X == 1.  This is a field
635	      whose presence is predicated on an expression given using the
636	      constraint expression grammar described earlier.  Optional fields
637	      can be of any previously defined format (e.g., fixed- or variable-
638	      width).  Optional fields are indicated by the presence of ";
639	      present only when [expr]." at the end of the definition term
640	      (i.e., the text contained within the <dt> tag).

642	      [Note that this example deviates from the format as described in
643	      [RFC3550].  As specified in that document, the Header Extension
644	      would be a cross-referenced structure.  This is not shown here for
645	      brevity.]

647	   Payload.  The length of the Payload is not specified, and hence needs
648	      to be inferred from the total length of the packet and the lengths
649	      of the known fields.  There can only be one field of unspecified
650	      size in a PDU.

652	   Padding: Padding Count bytes; present only when (P == 1) and
653	   (Padding Count > 0).

655	      This is a variable-size field, with size dependent on a later
656	      field in the packet.  Fields can only depend on the value of a
657	      later field if they follow a field with unspecified size.

659	   Padding Count: 1 byte; present only when P == 1.  This is a fixed-
660	      width field, as previously discussed.
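
   The field rules above determine a parsing order: fixed-width fields
   first, then the CC-dependent list, then the optional Header
   Extension, with the trailing Padding fields located from the end of
   the packet so that the Payload length can be inferred.  The sketch
   below follows the simplified layout given in this section, not the
   full format in [RFC3550]; the function name and dictionary keys are
   hypothetical.

```python
import struct


def parse_rtp(packet: bytes) -> dict:
    """Parse the simplified RTP Data Packet layout described above."""
    if len(packet) < 12:
        raise ValueError("truncated fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    fields = {
        "V": b0 >> 6, "P": (b0 >> 5) & 1, "X": (b0 >> 4) & 1, "CC": b0 & 0x0F,
        "M": b1 >> 7, "PT": b1 & 0x7F,
        "Sequence Number": seq, "Timestamp": timestamp,
        "Synchronization Source identifier": ssrc,
    }
    offset = 12

    # Contributing Source identifiers: CC * Source Identifier (32 bits each).
    end = offset + 4 * fields["CC"]
    if end > len(packet):
        raise ValueError("truncated CSRC list")
    fields["Contributing Source identifiers"] = [
        int.from_bytes(packet[i:i + 4], "big") for i in range(offset, end, 4)
    ]
    offset = end

    # Header Extension: 32 bits; present only when X == 1.
    if fields["X"] == 1:
        if offset + 4 > len(packet):
            raise ValueError("truncated Header Extension")
        fields["Header Extension"] = packet[offset:offset + 4]
        offset += 4

    # Padding and Padding Count follow the unspecified-size Payload, so
    # they are located by working backwards from the end of the packet.
    tail = len(packet)
    if fields["P"] == 1:
        if tail - offset < 1:
            raise ValueError("missing Padding Count")
        count = packet[-1]
        tail -= 1
        if count > tail - offset:
            raise ValueError("Padding Count exceeds remaining bytes")
        fields["Padding Count"] = count
        if count > 0:  # Padding is present only when Padding Count > 0
            fields["Padding"] = packet[tail - count:tail]
        tail -= count

    # Payload: whatever remains between the header and the padding.
    fields["Payload"] = packet[offset:tail]
    return fields
```

   The Payload is the only field without a stated length, which is why
   a single pass from both ends of the packet is enough to delimit it.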

662	4.3.  PDUs with Non-Contiguous Fields

664	   In some binary formats, fields are striped across multiple non-
665	   contiguous bits.  This is often to allow for backwards compatibility
666	   with previous definitions of the same fields in earlier documents:
667	   striping in this way allows for careful use of the possible range of
668	   values.

670	   This format is illustrated using the STUN Message Type
671	   [draft-ietf-tram-stunbis-21].  A STUN Message Type is formatted as
672	   follows:

674	        0                   1
675	        0 1 2 3 4 5 6 7 8 9 0 1 2 3
676	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
677	       |M|M|M|M|M|C|M|M|M|C|M|M|M|M|
678	       |B|A|9|8|7|1|6|5|4|0|3|2|1|0|
679	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

681	   where:

683	   Method (M): 12 bits.  This field is comprised of multiple sub-fields
684	      (M0 through MB) as shown in the diagram.  That these sub-fields
685	      should be concatenated, after parsing, into a single field is
686	      indicated by their being labelled using the 'M' short field name
687	      followed by a single hexadecimal digit, with the least significant
688	      bit labelled with 0, and subsequent bits labelled in sequence.

690	   Class (C): 2 bits.  This field follows the same format as M described
691	      above.
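
   The concatenation rule can be illustrated by decoding the 14-bit
   value directly from the diagram's labels.  This is a sketch only:
   the LAYOUT list is transcribed from the diagram above, and the
   function name is hypothetical.

```python
# Labels from the diagram, leftmost (most significant) bit first.
LAYOUT = ["MB", "MA", "M9", "M8", "M7", "C1", "M6",
          "M5", "M4", "C0", "M3", "M2", "M1", "M0"]


def decode_striped(value: int, layout=LAYOUT) -> dict:
    """Reassemble non-contiguous sub-fields into whole fields.

    Each label is a short field name followed by one hexadecimal digit
    giving the bit's position within the reassembled field (0 = least
    significant bit).
    """
    width = len(layout)
    fields = {}
    for position, label in enumerate(layout):
        bit = (value >> (width - 1 - position)) & 1
        name, index = label[:-1], int(label[-1], 16)
        fields[name] = fields.get(name, 0) | (bit << index)
    return fields
```

   For example, decode_striped(0x0101) sets only the C1 and M0 bits, so
   it yields a Method of 1 and a Class of 2.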

693	4.4.  Importing PDU Definitions from Other Documents

695	   Protocols are often specified across multiple documents, either
696	   because the specification of a protocol's data units has changed over
697	   time, or because of explicit extension points contained in the
698	   protocol's original specification.  To allow a document to make use
699	   of a previous PDU definition, it is possible to import PDU
700	   definitions (written in the format described in this document) from
701	   other documents.

703	   A PDU definition is imported using the exact phrase "A/An ________ is
704	   formatted as described in <document identifier>".  The document
705	   identifier must refer, unambiguously, to an existing document.  An
706	   Internet-Draft is identified by its name.  RFCs are identified by
707	   "RFC" followed by their number.
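
   Tooling could recognise the import phrase with a pattern match.  The
   sketch below is illustrative only: the regular expression, the
   optional space after "RFC", the trailing full stop, and the assumed
   shape of Internet-Draft names (lower-case letters, digits, and
   hyphens after "draft-") are assumptions, not rules defined by this
   document.

```python
import re

# "A/An <PDU name> is formatted as described in <document identifier>"
IMPORT_PHRASE = re.compile(
    r"An? (?P<pdu>.+?) is formatted as described in "
    r"(?P<doc>RFC ?[0-9]+|draft-[a-z0-9-]+)\.?"
)


def parse_import(sentence: str):
    """Return (PDU name, document identifier), or None if no import."""
    match = IMPORT_PHRASE.fullmatch(sentence.strip())
    if match is None:
        return None
    return match.group("pdu"), match.group("doc")
```

   A sentence such as "A Source Identifier is formatted as follows:"
   does not match, so local definitions and imports remain distinct.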

709	5.  Open Issues

711	   *  Need a simple syntax for defining a list of identical objects, and
712	      a way of referring to the size of the enclosing packet.  The
713	      format cannot currently represent RFC 6716 section 3.2.3, and
714	      should be able to (the underlying type system can do so).

716	   *  Need some discussion about the checks that the tooling might
717	      perform, and the implications of those checks.  For example, the
718	      tooling checks for consistency between the diagram and the
719	      description list of fields, ensuring that fields match by name and
720	      width.  Version -01 of this draft had a field that mismatched
721	      because of case: should the tooling identify this?  More
722	      broadly, what is the trade-off between the rigour that the tooling
723	      can enforce, and the flexibility desired/needed by authors?

725	   *  Need to describe the rules governing the import of PDU definitions
726	      from other documents.

728	6.  IANA Considerations

730	   This document contains no actions for IANA.

732	7.  Security Considerations

734	   Poorly implemented parsers are a frequent source of security
735	   vulnerabilities in protocol implementations.  Structuring the
736	   description of a protocol data unit so that a parser can be
737	   automatically derived from the specification can reduce the
738	   likelihood of vulnerable implementations.

740	   As described in Section 3, the expressiveness of a protocol
741	   description language has implications for the safety, security, and
742	   computability properties of the parser for the protocol description
743	   language itself, and on the generated parser code for the protocols
744	   described using it.  The language-theoretic security ([LANGSEC])
745	   community explores the security implications of programming language
746	   design; the principles developed in that community should guide the
747	   development of protocol description languages.

749	8.  Acknowledgements

751	   The authors would like to thank David Southgate for preparing a
752	   prototype implementation of some of the ideas described here.

754	   The authors would like to thank Marc Petit-Huguenin for feedback on
755	   the draft.

757	   This work has received funding from the UK Engineering and Physical
758	   Sciences Research Council under grant EP/R04144X/1.

760	9.  Informative References

762	   [RFC8357]  Shen, N. and E. Chen, "Generalized UDP Source Port for
763	              DHCP Relay", RFC 8357, March 2018,
764	              <https://www.rfc-editor.org/info/rfc8357>.

766	   [QUIC-TRANSPORT]
767	              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
768	              and Secure Transport", Work in Progress, Internet-Draft,
769	              draft-ietf-quic-transport-20, 23 April 2019,
770	              <http://www.ietf.org/internet-drafts/draft-ietf-quic-
771	              transport-20.txt>.

773	   [RFC6958]  Clark, A., Zhang, S., Zhao, J., and Q. Wu, "RTP Control
774	              Protocol (RTCP) Extended Report (XR) Block for Burst/Gap
775	              Loss Metric Reporting", RFC 6958, May 2013,
776	              <https://www.rfc-editor.org/info/rfc6958>.

778	   [RFC7950]  Bjorklund, M., "The YANG 1.1 Data Modeling Language",
779	              RFC 7950, August 2016,
780	              <https://www.rfc-editor.org/info/rfc7950>.

782	   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
783	              Version 1.3", RFC 8446, August 2018,
784	              <https://www.rfc-editor.org/info/rfc8446>.

786	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
787	              Specifications: ABNF", RFC 5234, January 2008,
788	              <https://www.rfc-editor.org/info/rfc5234>.

790	   [ASN1]     ITU-T, "ITU-T Recommendation X.680, X.681, X.682, and
791	              X.683", ITU-T Recommendation X.680, X.681, X.682, and
792	              X.683.

794	   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
795	              Representation (CBOR)", RFC 7049, October 2013,
796	              <https://www.rfc-editor.org/info/rfc7049>.

798	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
799	              Jacobson, "RTP: A Transport Protocol for Real-Time
800	              Applications", RFC 3550, July 2003,
801	              <https://www.rfc-editor.org/info/rfc3550>.

803	   [draft-ietf-tram-stunbis-21]
804	              Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing,
805	              D., Mahy, R., and P. Matthews, "Session Traversal
806	              Utilities for NAT (STUN)", Work in Progress, Internet-
807	              Draft, draft-ietf-tram-stunbis-21, 21 March 2019,
808	              <http://www.ietf.org/internet-drafts/draft-ietf-tram-
809	              stunbis-21.txt>.

811	   [RFC791]   Postel, J., "Internet Protocol", RFC 791, September 1981,
812	              <https://www.rfc-editor.org/info/rfc791>.

814	   [RFC793]   Postel, J., "Transmission Control Protocol", RFC 793,
815	              September 1981, <https://www.rfc-editor.org/info/rfc793>.

817	   [LANGSEC]  LANGSEC, "LANGSEC: Language-theoretic Security",
818	              <http://langsec.org>.

820	   [SASSAMAN] Sassaman, L., Patterson, M. L., Bratus, S., and A.
821	              Shubina, "The Halting Problems of Network Stack
822	              Insecurity", ;login: -- December 2011, Volume 36, Number
823	              6, <https://www.usenix.org/publications/login/december-
824	              2011-volume-36-number-6/halting-problems-network-stack-
825	              insecurity>.

827	Appendix A.  ABNF specification

829	A.1.  Constraint Expressions
830	       cond-expr = eq-expr "?" cond-expr ":" eq-expr
831	       eq-expr   = bool-expr eq-op   bool-expr
832	       bool-expr = ord-expr  bool-op ord-expr
833	       ord-expr  = add-expr  ord-op  add-expr

835	       add-expr  = mul-expr  add-op  mul-expr
836	       mul-expr  = expr      mul-op  expr
837	       expr      = *DIGIT / field-name /
838	                   field-name-ws / "(" expr ")"

840	       field-name    = *ALPHA
841	       field-name-ws = *(field-name " ")

843	       mul-op  = "*" / "/" / "%"
844	       add-op  = "+" / "-"
845	       ord-op  = "<=" / "<" / ">=" / ">"
846	       bool-op = "&&" / "||" / "!"
847	       eq-op   = "==" / "!="
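
   The following sketch shows one way the constraint expressions used
   in Section 4 might be evaluated.  It is not the authors' tooling.
   It assumes that the operator at each level of the grammar is
   optional, that the levels nest in the order the rules are listed
   (eq-op binding loosest, mul-op tightest), that any expression may
   appear inside parentheses, and that "/" is integer division.  The
   unary "!" operator is not handled.  All names are hypothetical.

```python
import re

# A number, a field name (words joined by single spaces), or an operator.
TOKEN = re.compile(
    r"\s*(\d+|[A-Za-z]+(?: [A-Za-z]+)*|==|!=|<=|>=|&&|\|\||[-+*/%<>()?:])"
)

# Binary operator levels, loosest first, in the order the ABNF nests them.
LEVELS = [
    {"==": lambda a, b: a == b, "!=": lambda a, b: a != b},
    {"&&": lambda a, b: bool(a) and bool(b),
     "||": lambda a, b: bool(a) or bool(b)},
    {"<=": lambda a, b: a <= b, "<": lambda a, b: a < b,
     ">=": lambda a, b: a >= b, ">": lambda a, b: a > b},
    {"+": lambda a, b: a + b, "-": lambda a, b: a - b},
    {"*": lambda a, b: a * b, "/": lambda a, b: a // b,
     "%": lambda a, b: a % b},
]


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, fields):
    """Evaluate a constraint expression against a dict of field values."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def primary():
        token = take()
        if token == "(":
            value = cond()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        if token[0].isalpha():
            return fields[token]  # field value, or PDU length for PDU names
        raise ValueError(f"unexpected token: {token!r}")

    def binary(level):
        if level == len(LEVELS):
            return primary()
        value = binary(level + 1)
        while peek() in LEVELS[level]:
            operator = LEVELS[level][take()]
            value = operator(value, binary(level + 1))
        return value

    def cond():
        test = binary(0)
        if peek() != "?":
            return test
        take()
        if_true = cond()
        if take() != ":":
            raise ValueError("expected ':'")
        if_false = cond()
        return if_true if test else if_false

    result = cond()
    if peek() is not None:
        raise ValueError(f"trailing input: {peek()!r}")
    return result
```

   With this sketch, evaluate("(IHL-5)*32", {"IHL": 6}) gives the
   Options length in bits for the IPv4 example, and PDU names can be
   supplied in the same dictionary with their lengths as values.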

849	A.2.  Augmented packet diagrams

851	   Future revisions of this draft will include an ABNF specification for
852	   the augmented packet diagram format described in Section 4.  Such a
853	   specification is omitted from this draft given that the format is
854	   likely to change as its syntax is developed.  Given the visual nature
855	   of the format, it is more appropriate for discussion to focus on the
856	   examples given in Section 4.

858	Appendix B.  Source code repository

860	   The source for this draft is available from https://github.com/
861	   glasgow-ipl/draft-mcquistin-augmented-ascii-diagrams.

863	   The source code for tooling that can be used to parse this document
864	   is available from https://github.com/glasgow-ipl/ips-protodesc-code.

866	Authors' Addresses

868	   Stephen McQuistin
869	   University of Glasgow
870	   School of Computing Science
871	   Glasgow
872	   G12 8QQ
873	   United Kingdom

875	   Email: sm@smcquistin.uk
876	   Vivian Band
877	   University of Glasgow
878	   School of Computing Science
879	   Glasgow
880	   G12 8QQ
881	   United Kingdom

883	   Email: vivianband0@gmail.com

885	   Dejice Jacob
886	   University of Glasgow
887	   School of Computing Science
888	   Glasgow
889	   G12 8QQ
890	   United Kingdom

892	   Email: d.jacob.1@research.gla.ac.uk

894	   Colin Perkins
895	   University of Glasgow
896	   School of Computing Science
897	   Glasgow
898	   G12 8QQ
899	   United Kingdom

901	   Email: csp@csperkins.org