Network Working Group                                       S. McQuistin
Internet-Draft                                                   V. Band
Intended status: Experimental                              C. S. Perkins
Expires: 7 August 2020                             University of Glasgow
                                                         4 February 2020


  Describing Protocol Data Units with Augmented Packet Header Diagrams
              draft-mcquistin-augmented-ascii-diagrams-02

Abstract

   This document describes a machine-readable format for specifying the
   syntax of protocol data units within a protocol specification.  This
   format is comprised of a consistently formatted packet header
   diagram, followed by structured explanatory text.  It is designed to
   maintain human readability while enabling support for automated
   parser generation from the specification document.  This document is
   itself an example of how the format can be used.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 August 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
     2.1.  Limitations of Current Packet Format Diagrams
     2.2.  Formal languages in standards documents
   3.  Design Principles
   4.  Augmented Packet Header Diagrams
     4.1.  PDUs with Fixed and Variable-Width Fields
     4.2.  PDUs That Cross-Reference Previously Defined Fields
     4.3.  PDUs with Non-Contiguous Fields
     4.4.  Importing PDU Definitions from Other Documents
   5.  Open Issues
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Informative References
   Appendix A.  ABNF specification
     A.1.  Constraint Expressions
     A.2.  Augmented packet diagrams
   Appendix B.  Source code repository
   Authors' Addresses

1.  Introduction

   Packet header diagrams have become a widely used format for
   describing the syntax of binary protocols.  In otherwise largely
   textual documents, they allow for the visualisation of packet
   formats, reducing human error, and aiding in the implementation of
   parsers for the protocols that they specify.

   Figure 1 gives an example of how packet header diagrams are used to
   define binary protocol formats.  The format has an obvious structure:
   the diagram clearly delineates each field, showing its width and its
   position within the header.  This type of diagram is designed for
   human readers, but is consistent enough that it should be possible to
   develop a tool that generates a parser for the packet format from the
   diagram.

   :    0                   1                   2                   3
   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |          Source Port          |       Destination Port        |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                        Sequence Number                        |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                    Acknowledgment Number                      |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |  Data |           |U|A|P|R|S|F|                               |
   :   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   :   |       |           |G|K|H|T|N|N|                               |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |           Checksum            |         Urgent Pointer        |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                    Options                    |    Padding    |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                             data                              |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 1: TCP's header format (from [RFC793])

   Unfortunately, the format of such packet diagrams varies both within
   and between documents.  This variation makes it difficult to build
   tools to generate parsers from the specifications.  Better tooling
   could be developed if protocol specifications adopted a consistent
   format for their packet descriptions.  Indeed, this underpins the
   format described by this draft: we want to retain the benefits that
   packet header diagrams provide, while identifying the benefits of
   adopting a consistent format.

   This document describes a consistent packet header diagram format and
   accompanying structured text constructs that allow for the parsing
   process of protocol headers to be fully specified.  This provides
   support for the automatic generation of parser code.  Broad design
   principles, that seek to maintain the primacy of human readability
   and flexibility in authorship, are described, before the format
   itself is given.

   This document is itself an example of the approach that it describes,
   with the packet header diagrams and structured text format described
   by example.  Examples that do not form part of the protocol
   description language are marked by a colon at the beginning of each
   line; this prevents them from being parsed by the accompanying
   tooling.

   This draft describes early work.  As consensus builds around the
   particular syntax of the format described, both a formal ABNF
   specification (Appendix A) and code (Appendix B) that parses it (and,
   as described above, this document) will be provided.

2.  Background

   This section begins by considering how packet header diagrams are
   used in existing documents.  This exposes the limitations that the
   current usage has in terms of machine-readability, guiding the design
   of the format that this document proposes.

   While this document focuses on the machine-readability of packet
   format diagrams, this section also discusses the use of other
   structured or formal languages within IETF documents.  Considering
   how and why these languages are used provides an instructive contrast
   to the relatively incremental approach proposed here.

2.1.  Limitations of Current Packet Format Diagrams

   : The RESET_STREAM frame is as follows:
   :
   :    0                   1                   2                   3
   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                        Stream ID (i)                        ...
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |  Application Error Code (16)  |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |                        Final Size (i)                       ...
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   : RESET_STREAM frames contain the following fields:
   :
   : Stream ID:  A variable-length integer encoding of the Stream ID
   :    of the stream being terminated.
   :
   : Application Protocol Error Code:  A 16-bit application protocol
   :    error code (see Section 20.1) which indicates why the stream
   :    is being closed.
   :
   : Final Size: A variable-length integer indicating the final size
   :    of the stream by the RESET_STREAM sender, in unit of bytes.

     Figure 2: QUIC's RESET_STREAM frame format (from [QUIC-TRANSPORT])

   Packet header diagrams are frequently used in IETF standards to
   describe the format of binary protocols.  While there is no standard
   for how these diagrams should be formatted, they have a broadly
   similar structure, where the layout of a protocol data unit (PDU) or
   structure is shown in diagrammatic form, followed by a description
   list of the fields that it contains.  An example of this format,
   taken from the QUIC specification, is given in Figure 2.

   These packet header diagrams, and the accompanying descriptions, are
   formatted for human readers rather than for automated processing.  As
   a result, while there is rough consistency in how packet header
   diagrams are formatted, there are a number of limitations that make
   them difficult to work with programmatically:

   Inconsistent syntax:  There are two classes of consistency that are
      needed to support automated processing of specifications: internal
      consistency within a diagram or document, and external consistency
      across all documents.

      Figure 2 gives an example of internal inconsistency.
      Here, the packet diagram shows a field labelled "Application Error
      Code", while the accompanying description lists the field as
      "Application Protocol Error Code".  The use of an abbreviated name
      is suitable for human readers, but makes parsing the structure
      difficult for machines.  Figure 3 gives a further example, where
      the description includes an "Option-Code" field that does not
      appear in the packet diagram; and where the description states
      that each field is 16 bits in length, but the diagram shows the
      OPTION_RELAY_PORT as 13 bits, and Option-Len as 19 bits.  Another
      example is [RFC6958], where the packet format diagram showing the
      structure of the Burst/Gap Loss Metrics Report Block shows the
      Number of Bursts field as being 12 bits wide but the corresponding
      text describes it as 16 bits.

      Comparing Figure 2 with Figure 3 exposes external inconsistency
      across documents.  While the packet format diagrams are broadly
      similar, the surrounding text is formatted differently.  If
      machine parsing is to be made possible, then this text must be
      structured consistently.

   Ambiguous constraints:  The constraints that are enforced on a
      particular field are often described ambiguously, or in a way that
      cannot be parsed easily.  In Figure 3, each of the three fields in
      the structure is constrained.  The first two fields ("Option-Code"
      and "Option-Len") are to be set to constant values (note the
      inconsistency in how these constraints are expressed in the
      description).  However, the third field ("Downstream Source Port")
      can take a value from a constrained set.  This constraint is
      expressed in prose that cannot readily be understood by machine.

   Poor linking between sub-structures:  Protocol data units and other
      structures are often comprised of sub-structures that are defined
      elsewhere, either in the same document, or within another
      document.
      Chaining these structures together is essential for machine
      parsing: the parsing process for a protocol data unit is only
      fully expressed if all elements can be parsed.

      Figure 2 highlights the difficulty that machine parsers have in
      chaining structures together.  Two fields ("Stream ID" and "Final
      Size") are described as being encoded as variable-length integers;
      this is a structure described elsewhere in the same document.
      Structured text is required both alongside the definition of the
      containing structure and with the definition of the sub-structure,
      to allow a parser to link the two together.

   Lack of extension and evolution syntax:  Protocols are often
      specified across multiple documents, either because the protocol
      explicitly includes extension points (e.g., profiles and payload
      format specifications in RTP [RFC3550]) or because definition of a
      protocol data unit has changed and evolved over time.  As a
      result, it is essential that syntax be provided to allow for a
      complete definition of a protocol's parsing process to be
      constructed across multiple documents.

   : The format of the "Relay Source Port Option" is shown below:
   :
   :    0                   1                   2                   3
   :    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |    OPTION_RELAY_PORT    |         Option-Len                  |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :   |    Downstream Source Port     |
   :   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   : Where:
   :
   : Option-Code:  OPTION_RELAY_PORT. 16-bit value, 135.
   :
   : Option-Len:  16-bit value to be set to 2.
   :
   : Downstream Source Port:  16-bit value.  To be set by the IPv6
   :    relay either to the downstream relay agent's UDP source port
   :    used for the UDP packet, or to zero if only the local relay
   :    agent uses the non-DHCP UDP port (not 547).

       Figure 3: DHCPv6's Relay Source Port Option (from [RFC8357])

2.2.  Formal languages in standards documents

   A small proportion of IETF standards documents contain structured and
   formal languages, including ABNF [RFC5234], ASN.1 [ASN1], C, CBOR
   [RFC7049], JSON, the TLS presentation language [RFC8446], YANG models
   [RFC7950], and XML.  While this broad range of languages may be
   problematic for the development of tooling to parse specifications,
   these, and other, languages serve a range of different use cases.
   ABNF, for example, is typically used to specify text protocols, while
   ASN.1 is used to specify data structure serialisation.  This document
   specifies a structured language for specifying the parsing of binary
   protocol data units.

3.  Design Principles

   The use of structures that are designed to support machine
   readability might potentially interfere with the existing ways in
   which protocol specifications are used and authored.  To the extent
   that these existing uses are more important than machine readability,
   such interference must be minimised.

   In this section, the broad design principles that underpin the format
   described by this document are given.  However, these principles
   apply more generally to any approach that introduces structured and
   formal languages into standards documents.

   It should be noted that these are design principles: they expose the
   trade-offs that are inherent within any given approach.  Violating
   these principles is sometimes necessary and beneficial, and this
   document sets out the potential consequences of doing so.

   The central tenet that underpins these design principles is a
   recognition that the standardisation process is not broken, and so
   does not need to be fixed.  Failure to recognise this will likely
   lead to approaches that are incompatible with the standards process,
   or that will see limited adoption.
   However, the standards process can be improved with appropriate
   approaches, as guided by the following broad design principles:

   Most readers are human:  Primarily, standards documents should be
      written for people, who require text and diagrams that they can
      understand.  Structures that cannot be easily parsed by people
      should be avoided, and if included, should be clearly delineated
      from human-readable content.

      Any approach that shifts this balance -- that is, that primarily
      targets machine readers -- is likely to be disruptive to the
      standardisation process, which relies upon discussion centered
      around documents written in prose.

   Authorship tools are diverse:  Authorship is a distributed process
      that involves a diverse set of tools and workflows.  The
      introduction of machine-readable structures into specifications
      should not require that specific tools are used to produce
      standards documents, to ensure that disruption to existing
      workflows is minimised.  This does not preclude the development of
      optional, supplementary tools that aid in authoring
      machine-readable structures.

      The immediate impact of requiring specific tooling is that
      adoption is likely to be limited.  A long-term impact might be
      that authors whose workflows are incompatible might be alienated
      from the process.

   Canonical specifications:  As far as possible, machine-readable
      structures should not replicate the human readable specification
      of the protocol within the same document.  Machine-readable
      structures should form part of a canonical specification of the
      protocol.  Adding supplementary machine-readable structures, in
      parallel to the existing human readable text, is undesirable
      because it creates the potential for inconsistency.

      As an example, program code that describes how a protocol data
      unit can be parsed might be provided as an appendix within a
      standards document.
      This code would provide a specification of the protocol that is
      separate to the prose description in the main body of the
      document.  This has the undesirable effect of introducing the
      potential for the program code to specify behaviour that the
      prose-based specification does not, and vice-versa.

   Expressiveness:  Any approach should be expressive enough to capture
      the syntax and parsing process for the majority of binary
      protocols.  If a given language is not sufficiently expressive,
      then adoption is likely to be limited.  At the limits of what can
      be expressed by the language, authors are likely to revert to
      defining the protocol in prose: this undermines the broad goal of
      using structured and formal languages.  Equally, though,
      understandable specifications and ease of use are critical for
      adoption.  A tool that is simple to use and addresses the most
      common use cases might be preferred to a complex tool that
      addresses all use cases.

      It may be desirable to restrict expressiveness, however, to
      guarantee intrinsic safety, security, and computability properties
      of both the generated parser code for the protocol, and the parser
      of the description language itself.  In much the same way as the
      language-theoretic security ([LANGSEC]) community advocates for
      programming language design to be informed by the desired
      properties of the parsers for those languages, protocol designers
      should be aware of the implications of their design choices.  The
      expressiveness of the protocol description languages that they use
      to define their protocols can force such awareness.

      Broadly, those languages that are more expressive tend to have
      parsers that are more complex and less safe.
      As a result, while considering the other goals described in this
      document, protocol description languages should attempt to be
      minimally expressive, and restrict protocol designs to those for
      which safe and secure parsers can be generated.

   Minimise required change:  Any approach should require as few changes
      as possible to the way that documents are formatted, authored, and
      published.  Forcing adoption of a particular structured or formal
      language is incompatible with the IETF's standardisation process:
      there are very few components of standards documents that are
      non-optional.

4.  Augmented Packet Header Diagrams

   The design principles described in Section 3 can largely be met by
   the existing uses of packet header diagrams.  These diagrams aid
   human readability, do not require new or specialised authorship
   tools, do not split the specification into multiple parts, can
   express most binary protocol features, and require no changes to
   existing publication processes.

   However, as discussed in Section 2.1 there are limitations to how
   packet header diagrams are used that must be addressed if they are to
   be parsed by machine.  In this section, an augmented packet header
   diagram format is described.

   The concept is first illustrated by example.  This is appropriate,
   given the visual nature of the language.  In future drafts, these
   examples will be parsable using provided tools, and a formal
   specification of the augmented packet diagrams will be given in
   Appendix A.

4.1.  PDUs with Fixed and Variable-Width Fields

   The simplest PDU is one that contains only a set of fixed-width
   fields in a known order, with no optional fields or variation in the
   packet format.
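
   As a rough illustration of why such a PDU is straightforward to
   parse, the following Python sketch reads a list of fixed-width fields
   in order.  It is not part of the format described by this document,
   nor of its accompanying tooling: the function name parse_fixed_width,
   the EXAMPLE_FIELDS table, and the field names and widths in that
   table are all hypothetical, and network bit order (most significant
   bit first) is assumed.

```python
from typing import Dict, List, Tuple

# Hypothetical field table: (name, width in bits), in wire order.
# These names and widths are invented for this sketch only.
EXAMPLE_FIELDS: List[Tuple[str, int]] = [
    ("version", 4),
    ("flags", 4),
    ("length", 16),
    ("identifier", 8),
]


def parse_fixed_width(data: bytes,
                      fields: List[Tuple[str, int]]) -> Dict[str, int]:
    """Read each field in order, most significant bit first."""
    total_bits = sum(width for _, width in fields)
    if len(data) * 8 < total_bits:
        raise ValueError("not enough data for the PDU")
    # Treat the input as one big-endian integer and slice fields
    # off the most significant end; trailing bits are ignored.
    value = int.from_bytes(data, "big")
    remaining = len(data) * 8
    result: Dict[str, int] = {}
    for name, width in fields:
        remaining -= width
        result[name] = (value >> remaining) & ((1 << width) - 1)
    return result


parsed = parse_fixed_width(bytes([0x45, 0x00, 0x54, 0x2A]), EXAMPLE_FIELDS)
print(parsed)
# {'version': 4, 'flags': 5, 'length': 84, 'identifier': 42}
```

   Because every width is known in advance, the sketch needs no
   look-ahead and no state beyond a bit offset.  Variable-width and
   optional fields, introduced next, are what require the constraint
   expressions used by the format.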

   Some packet formats include variable-width fields, where the size of
   a field is either derived from the value of some previous field, or
   is unspecified and inferred from the total size of the packet and the
   size of the other fields.  A packet can contain only one unspecified
   length field, to ensure there is no ambiguity.

   A PDU description is introduced by the exact phrase "A/An _______ is
   formatted as follows:" at the end of a paragraph.  This is followed
   by the PDU description itself, as a packet diagram within an
   <artwork> element in the XML representation, starting with a header
   line to show the bit width of the diagram.  The description of the
   fields follows the diagram, as an XML <dl> list, after a paragraph
   containing the text "where:".

   Each field of the description starts with a <dt> tag comprising the
   field name and an optional short name in parentheses.  These are
   followed by a colon, the field length, an optional presence
   expression (described in Section 4.2), and a terminating period.  The
   following <dd> tag contains a prose description of the field.  Field
   names cannot be the same as a previously defined PDU name.

   For example, this can be illustrated using the IPv4 Header Format
   [RFC791].  An IPv4 Header is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |    DSCP   |ECN|         Total Length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|     Fragment Offset     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Time to Live  |    Protocol   |        Header Checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Source Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Destination Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Options                          ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               :
   :                             Payload                           :
   :                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 4 bits.  This is a fixed-width field, whose full label
      is shown in the diagram.  The field's width -- 4 bits -- is given
      in the label of the description list, separated from the field's
      label by a colon.

   Internet Header Length (IHL): 4 bits.  This is a shorter field, whose
      full label is too large to be shown in the diagram.  A short label
      (IHL) is used in the diagram, and this short label is provided, in
      brackets, after the full label in the description list.

   Differentiated Services Code Point (DSCP): 6 bits.  This is a
      fixed-width field, as previously discussed.

   Explicit Congestion Notification (ECN): 2 bits.  This is a
      fixed-width field, as previously discussed.

   Total Length (TL): 2 bytes.  This is a fixed-width field, as
      previously discussed.
      Where fields are an integral number of bytes in size, the field
      length can be given in bytes rather than in bits.

   Identification: 2 bytes.  This is a fixed-width field, as previously
      discussed.

   Flags: 3 bits.  This is a fixed-width field, as previously discussed.

   Fragment Offset: 13 bits.  This is a fixed-width field, as previously
      discussed.

   Time to Live (TTL): 1 byte.  This is a fixed-width field, as
      previously discussed.

   Protocol: 1 byte.  This is a fixed-width field, as previously
      discussed.

   Header Checksum: 2 bytes.  This is a fixed-width field, as previously
      discussed.

   Source Address: 32 bits.  This is a fixed-width field, as previously
      discussed.

   Destination Address: 32 bits.  This is a fixed-width field, as
      previously discussed.

   Options: (IHL-5)*32 bits.  This is a variable-length field, whose
      length is defined by the value of the field with short label IHL
      (Internet Header Length).  Constraint expressions can be used in
      place of constant values: the grammar for the expression language
      is defined in Appendix A.1.  Constraints can include a previously
      defined field's short or full label, where one has been defined.

      Short variable-length fields are indicated by "..." instead of a
      pipe at the end of the row.

   Payload: TL - ((IHL*32)/8) bytes.  This is a multi-row
      variable-length field, constrained by the values of fields TL and
      IHL.  Instead of the "..." notation, ":" is used to indicate that
      the field is variable-length.  The use of ":" instead of "..."
      indicates the field is likely to be a longer, multi-row field.
      However, semantically, there is no difference: these different
      notations are for the benefit of human readers.

4.2.  PDUs That Cross-Reference Previously Defined Fields

   Binary formats often reference sub-structures that have been defined
   earlier in the specification.
   For example, in RTP [RFC3550], the Contributing Source Identifiers in
   an RTP Data Packet are defined as comprising a list of Source
   Identifier elements.  A Source Identifier is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   SSRC: 32 bits.  This is a fixed-width field, as described previously.

   The following example shows how a Source Identifier can be referenced
   in the description of an RTP Data Packet.  It also shows how the
   presence of some fields in a format may be dependent on the values of
   an earlier field.

   An RTP Data Packet is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |       Sequence Number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Synchronization Source identifier               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               [Contributing Source identifiers]               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Header Extension                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Payload                            :
   :                                                               :
   :                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Padding                    | Padding Count |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 2 bits.  This is a fixed-width field, as described
      previously.

   Padding (P): 1 bit.  This is a fixed-width field, as described
      previously.

   Extension (X): 1 bit.  This is a fixed-width field, as described
      previously.

   CSRC count (CC): 4 bits.
      This is a fixed-width field, as described previously.

   Marker (M): 1 bit.  This is a fixed-width field, as described
      previously.

   Payload Type (PT): 7 bits.  This is a fixed-width field, as described
      previously.

   Sequence Number: 16 bits.  This is a fixed-width field, as described
      previously.

   Timestamp: 32 bits.  This is a fixed-width field, as described
      previously.

   Synchronization Source identifier: 1 * Source Identifier.  This is a
      field whose structure is a previously defined PDU format (Source
      Identifier).  To indicate this, the width of the field is
      expressed in terms of the cross-referenced structure.  When used
      in constraint expressions, PDU names refer to the length of that
      PDU structure.

   Contributing Source identifiers: CC * Source Identifier.  Where a
      field is comprised of a sequence of previously defined structures,
      square brackets can be used to indicate this in the diagram.  The
      length of the sequence can be defined using the constraint
      expression grammar as described earlier.

      In this example, both a PDU name (Source Identifier) and a field
      name (CC) are used in the constraint expression.  The PDU name
      refers to the length of the PDU, while the field name refers to
      the value of the field.  This is possible because field names
      cannot be the same as previously defined PDU names.

   Header Extension: 32 bits; present only when X == 1.  This is a field
      whose presence is predicated on an expression given using the
      constraint expression grammar described earlier.  Optional fields
      can be of any previously defined format (e.g., fixed- or
      variable-width).  Optional fields are indicated by the presence
      of "; present only when [expr]." at the end of the definition
      term (i.e., the text contained within the <dt> tag).

      [Note that this example deviates from the format as described in
      [RFC3550].  As specified in that document, the Header Extension
      would be a cross-referenced structure.  This is not shown here for
      brevity.]

   Payload.  The length of the Payload is not specified, and hence needs
      to be inferred from the total length of the packet and the lengths
      of the known fields.  There can only be one field of unspecified
      size in a PDU.

   Padding: Padding Count bytes; present only when (P == 1) and
   (Padding Count > 0).
      This is a variable size field, with size dependent on a later
      field in the packet.  Fields can only depend on the value of a
      later field if they follow a field with unspecified size.

   Padding Count: 1 byte; present only when P == 1.  This is a
      fixed-width field, as previously discussed.

4.3.  PDUs with Non-Contiguous Fields

   In some binary formats, fields are striped across multiple
   non-contiguous bits.  This is often to allow for backwards
   compatibility with previous definitions of the same fields in earlier
   documents: striping in this way allows for careful use of the
   possible range of values.

   This format is illustrated using the STUN Message Type
   [draft-ietf-tram-stunbis-21].  A STUN Message Type is formatted as
   follows:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|M|M|M|M|C|M|M|M|C|M|M|M|M|
   |B|A|9|8|7|1|6|5|4|0|3|2|1|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Method (M): 12 bits.  This field is comprised of multiple sub-fields
      (M0 through MB) as shown in the diagram.  That these sub-fields
      should be concatenated, after parsing, into a single field is
      indicated by their being labelled using the 'M' short field name
      followed by a single hexadecimal digit, with the least significant
      bit labelled with 0, and subsequent bits labelled in sequence.

   Class (C): 2 bits.
          This field follows the same format as M described
678       above.

680 4.4. Importing PDU Definitions from Other Documents

682    Protocols are often specified across multiple documents, either
683    because the specification of a protocol's data units has changed over
684    time, or because of explicit extension points contained in the
685    protocol's original specification. To allow a document to make use
686    of a previous PDU definition, it is possible to import PDU
687    definitions (written in the format described in this document) from
688    other documents.

690    A PDU definition is imported using the exact phrase "A/An ________ is
691    formatted as described in <document identifier>". The document
692    identifier must refer, unambiguously, to an existing document. An
693    Internet-Draft is identified by its name. RFCs are identified by
694    "RFC" followed by their number.

696 5. Open Issues

698    *  Need a simple syntax for defining a list of identical objects, and
699       a way of referring to the size of the enclosing packet. The
700       format cannot currently represent RFC 6716 section 3.2.3, and
701       should be able to (the underlying type system can do so).

703    *  Need some discussion about the checks that the tooling might
704       perform, and the implications of those checks. For example, the
705       tooling checks for consistency between the diagram and the
706       description list of fields, ensuring that fields match by name and
707       width. -01 of this draft had a field that mismatched because of
708       case: is this something that the tooling should identify? More
709       broadly, what is the trade-off between the rigour that the tooling
710       can enforce, and the flexibility desired/needed by authors?

712    *  Need to describe the rules governing the import of PDU definitions
713       from other documents.

715 6. IANA Considerations

717    This document contains no actions for IANA.

719 7. Security Considerations

721    Poorly implemented parsers are a frequent source of security
722    vulnerabilities in protocol implementations.
       Structuring the
723    description of a protocol data unit so that a parser can be
724    automatically derived from the specification can reduce the
725    likelihood of vulnerable implementations.

727    As described in Section 3, the expressiveness of a protocol
728    description language has implications for the safety, security, and
729    computability properties of the parser for the protocol description
730    language itself, and on the generated parser code for the protocols
731    described using it. The language-theoretic security ([LANGSEC])
732    community explores the security implications of programming language
733    design; the principles developed in that community should guide the
734    development of protocol description languages.

736 8. Acknowledgements

738    The authors would like to thank David Southgate for preparing a
739    prototype implementation of some of the ideas described here.

741    The authors would like to thank Marc Petit-Huguenin for feedback on
742    the draft.

744    This work has received funding from the UK Engineering and Physical
745    Sciences Research Council under grant EP/R04144X/1.

747 9. Informative References

749    [RFC8357]  Deering, S. and R. Hinden, "Generalized UDP Source Port
750               for DHCP Relay", RFC 8357, March 2018.

753    [QUIC-TRANSPORT]
754               Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
755               and Secure Transport", Work in Progress, Internet-Draft,
756               draft-ietf-quic-transport-20, 23 April 2019.

760    [RFC6958]  Clark, A., Zhang, S., Zhao, J., and Q. Wu, "RTP Control
761               Protocol (RTCP) Extended Report (XR) Block for Burst/Gap
762               Loss Metric Reporting", RFC 6958, May 2013.

765    [RFC7950]  Bjorklund, M., "The YANG 1.1 Data Modeling Language",
766               RFC 7950, August 2016.

769    [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
770               Version 1.3", RFC 8446, August 2018.

773    [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
774               Specifications: ABNF", RFC 5234, January 2008.
777    [ASN1]     ITU-T, "ITU-T Recommendation X.680, X.681, X.682, and
778               X.683", ITU-T Recommendation X.680, X.681, X.682, and
779               X.683.

781    [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
782               Representation (CBOR)", RFC 7049, October 2013.

785    [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
786               Jacobson, "RTP: A Transport Protocol for Real-Time
787               Applications", RFC 3550, July 2003.

790    [draft-ietf-tram-stunbis-21]
791               Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing,
792               D., Mahy, R., and P. Matthews, "Session Traversal
793               Utilities for NAT (STUN)", Work in Progress,
794               Internet-Draft, draft-ietf-tram-stunbis-21, 21 March 2019.

798    [RFC791]   Postel, J., "Internet Protocol", RFC 791, September 1981.

801    [RFC793]   Postel, J., "Transmission Control Protocol", RFC 793,
802               September 1981.

804    [LANGSEC]  LANGSEC, "LANGSEC: Language-theoretic Security".

807 Appendix A. ABNF specification

809 A.1. Constraint Expressions

811    cond-expr = eq-expr "?" cond-expr ":" eq-expr
812    eq-expr   = bool-expr eq-op   bool-expr
813    bool-expr = ord-expr  bool-op ord-expr
814    ord-expr  = add-expr  ord-op  add-expr

816    add-expr  = mul-expr  add-op  mul-expr
817    mul-expr  = expr      mul-op  expr
818    expr      = *DIGIT / field-name /
819                field-name-ws / "(" expr ")"

821    field-name    = *ALPHA
822    field-name-ws = *(field-name " ")

824    mul-op  = "*" / "/" / "%"
825    add-op  = "+" / "-"
826    ord-op  = "<=" / "<" / ">=" / ">"
827    bool-op = "&&" / "||" / "!"
828    eq-op   = "==" / "!="

830 A.2. Augmented packet diagrams

832    Future revisions of this draft will include an ABNF specification for
833    the augmented packet diagram format described in Section 4. Such a
834    specification is omitted from this draft given that the format is
835    likely to change as its syntax is developed. Given the visual nature
836    of the format, it is more appropriate for discussion to focus on the
837    examples given in Section 4.

839 Appendix B. Source code repository

841    The source code for tooling that can be used to parse this document
842    is available from
843    https://github.com/lumisota/improving-protocol-standards.

845 Authors' Addresses

847    Stephen McQuistin
848    University of Glasgow
849    School of Computing Science
850    Glasgow
851    G12 8QQ
852    United Kingdom

854    Email: sm@smcquistin.uk

856    Vivian Band
857    University of Glasgow
858    School of Computing Science
859    Glasgow
860    G12 8QQ
861    United Kingdom

863    Email: vivianband0@gmail.com

865    Colin Perkins
866    University of Glasgow
867    School of Computing Science
868    Glasgow
869    G12 8QQ
870    United Kingdom

872    Email: csp@csperkins.org
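
Illustrative note on Section 4.3 (not part of the draft's tooling): the
non-contiguous STUN Message Type layout can be sketched in Python as
below. The function names decode_message_type and encode_message_type
are hypothetical, chosen only for this example. The bit masks are read
from the Section 4.3 diagram under the assumption that the leftmost bit
is the most significant of the 14 bits; a generated parser might well
structure this differently.

```python
from typing import Tuple


def decode_message_type(msg_type: int) -> Tuple[int, int]:
    """Split a 14-bit STUN Message Type into (method, class).

    Assumed layout, most significant bit first, as in the diagram:
    MB MA M9 M8 M7 C1 M6 M5 M4 C0 M3 M2 M1 M0
    """
    if not 0 <= msg_type <= 0x3FFF:
        raise ValueError("STUN Message Type must fit in 14 bits")
    method = (
        (msg_type & 0x000F)            # M0-M3 are already in place
        | ((msg_type & 0x00E0) >> 1)   # M4-M6: close the gap left by C0
        | ((msg_type & 0x3E00) >> 2)   # M7-MB: close the gaps left by C0, C1
    )
    cls = (
        ((msg_type & 0x0010) >> 4)     # C0 becomes bit 0 of the class
        | ((msg_type & 0x0100) >> 7)   # C1 becomes bit 1 of the class
    )
    return method, cls


def encode_message_type(method: int, cls: int) -> int:
    """Inverse of decode_message_type: interleave method and class bits."""
    if not 0 <= method <= 0xFFF:
        raise ValueError("method must fit in 12 bits")
    if not 0 <= cls <= 0x3:
        raise ValueError("class must fit in 2 bits")
    return (
        (method & 0x00F)               # M0-M3
        | ((method & 0x070) << 1)      # M4-M6, leaving room for C0
        | ((method & 0xF80) << 2)      # M7-MB, leaving room for C0 and C1
        | ((cls & 0x1) << 4)           # C0
        | ((cls & 0x2) << 7)           # C1
    )
```

Under this sketch, decode_message_type(0x0101) yields method 1 and
class 2, and encode_message_type(1, 2) returns 0x0101 again.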