Network Working Group                                       S. McQuistin
Internet-Draft                                                   V. Band
Intended status: Experimental                                   D. Jacob
Expires: 10 September 2020                                 C. S. Perkins
                                                   University of Glasgow
                                                            9 March 2020

  Describing Protocol Data Units with Augmented Packet Header Diagrams
              draft-mcquistin-augmented-ascii-diagrams-03

Abstract

   This document describes a machine-readable format for specifying the
   syntax of protocol data units within a protocol specification. This
   format consists of a consistently formatted packet header diagram,
   followed by structured explanatory text. It is designed to maintain
   human readability while enabling support for automated parser
   generation from the specification document. This document is itself
   an example of how the format can be used.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 September 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
     2.1.  Limitations of Current Packet Format Diagrams
     2.2.  Formal languages in standards documents
   3.  Design Principles
   4.  Augmented Packet Header Diagrams
     4.1.  PDUs with Fixed and Variable-Width Fields
     4.2.  PDUs That Cross-Reference Previously Defined Fields
     4.3.  PDUs with Non-Contiguous Fields
     4.4.  Importing PDU Definitions from Other Documents
   5.  Open Issues
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Informative References
   Appendix A.  ABNF specification
     A.1.  Constraint Expressions
     A.2.  Augmented packet diagrams
   Appendix B.  Source code repository
   Authors' Addresses

1.  Introduction

   Packet header diagrams have become a widely used format for
   describing the syntax of binary protocols. In otherwise largely
   textual documents, they allow for the visualisation of packet
   formats, reducing human error, and aiding in the implementation of
   parsers for the protocols that they specify.

   Figure 1 gives an example of how packet header diagrams are used to
   define binary protocol formats. The format has an obvious structure:
   the diagram clearly delineates each field, showing its width and its
   position within the header. This type of diagram is designed for
   human readers, but is consistent enough that it should be possible to
   develop a tool that generates a parser for the packet format from the
   diagram.

   :  0                   1                   2                   3
   :  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |          Source Port          |       Destination Port        |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                        Sequence Number                        |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                    Acknowledgment Number                      |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |  Data |           |U|A|P|R|S|F|                               |
   : | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   : |       |           |G|K|H|T|N|N|                               |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |           Checksum            |         Urgent Pointer        |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                    Options                    |    Padding    |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                             data                              |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 1: TCP's header format (from [RFC793])

   Unfortunately, the format of such packet diagrams varies both within
   and between documents. This variation makes it difficult to build
   tools to generate parsers from the specifications. Better tooling
   could be developed if protocol specifications adopted a consistent
   format for their packet descriptions. This observation underpins the
   format described by this draft: we want to retain the benefits that
   packet header diagrams provide, while also gaining the benefits of
   adopting a consistent format.
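   As a concrete illustration of generating a parser from a diagram, the
   following Python sketch recovers field labels and widths from a
   single row of a diagram such as Figure 1. It is illustrative only.
   It assumes the conventional layout, in which each bit occupies two
   character columns and every field boundary is a "|". The function
   name row_fields is hypothetical and is not part of any tool described
   in this document. Rows whose labels span several lines (such as the
   flags row in Figure 1), or that end in "...", would need additional
   rules. That is exactly the kind of variation described above.

```python
def row_fields(row: str) -> list[tuple[str, int]]:
    """Return (label, width in bits) for each "|"-delimited cell in a row.

    Hypothetical helper for illustration. Assumes one bit per two
    character columns, as in Figure 1.
    """
    prev = row.index("|")  # column of the first field boundary
    fields = []
    for col in range(prev + 1, len(row)):
        if row[col] != "|":
            continue
        span = col - prev
        if span % 2:
            # A boundary that does not fall on a bit position is the kind
            # of inconsistency a tool should report rather than guess at.
            raise ValueError(f"misaligned field boundary at column {col}")
        fields.append((row[prev + 1:col].strip(), span // 2))
        prev = col
    return fields


tcp_row = "|          Source Port          |       Destination Port        |"
print(row_fields(tcp_row))
# [('Source Port', 16), ('Destination Port', 16)]
```

   Treating a misaligned boundary as an error, rather than rounding it,
   mirrors the goal of making diagrams consistent enough for tools to
   check.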

   This document describes a consistent packet header diagram format and
   accompanying structured text constructs that allow for the parsing
   process of protocol headers to be fully specified. This provides
   support for the automatic generation of parser code. Broad design
   principles, which seek to maintain the primacy of human readability
   and flexibility in writing, are described before the format itself
   is given.

   This document is itself an example of the approach that it describes,
   with the packet header diagrams and structured text format described
   by example. Examples that do not form part of the protocol
   description language are marked by a colon at the beginning of each
   line; this prevents them from being parsed by the accompanying
   tooling.

   This draft describes early work. As consensus builds around the
   particular syntax of the format described, both a formal ABNF
   specification (Appendix A) and code (Appendix B) that parses it (and,
   as described above, this document) will be provided.

2.  Background

   This section begins by considering how packet header diagrams are
   used in existing documents. This exposes the limitations that the
   current usage has in terms of machine-readability, guiding the design
   of the format that this document proposes.

   While this document focuses on the machine-readability of packet
   format diagrams, this section also discusses the use of other
   structured or formal languages within IETF documents. Considering
   how and why these languages are used provides an instructive contrast
   to the relatively incremental approach proposed here.

2.1.  Limitations of Current Packet Format Diagrams

   : The RESET_STREAM frame is as follows:
   :
   :  0                   1                   2                   3
   :  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                        Stream ID (i)                        ...
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |  Application Error Code (16)  |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                        Final Size (i)                       ...
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   : RESET_STREAM frames contain the following fields:
   :
   : Stream ID:  A variable-length integer encoding of the Stream ID
   :    of the stream being terminated.
   :
   : Application Protocol Error Code:  A 16-bit application protocol
   :    error code (see Section 20.1) which indicates why the stream
   :    is being closed.
   :
   : Final Size:  A variable-length integer indicating the final size
   :    of the stream by the RESET_STREAM sender, in unit of bytes.

   Figure 2: QUIC's RESET_STREAM frame format (from [QUIC-TRANSPORT])

   Packet header diagrams are frequently used in IETF standards to
   describe the format of binary protocols. While there is no standard
   for how these diagrams should be formatted, they have a broadly
   similar structure, where the layout of a protocol data unit (PDU) or
   structure is shown in diagrammatic form, followed by a description
   list of the fields that it contains. An example of this format,
   taken from the QUIC specification, is given in Figure 2.

   These packet header diagrams, and the accompanying descriptions, are
   formatted for human readers rather than for automated processing. As
   a result, while there is rough consistency in how packet header
   diagrams are formatted, there are a number of limitations that make
   them difficult to work with programmatically:

   Inconsistent syntax:  There are two classes of consistency that are
      needed to support automated processing of specifications: internal
      consistency within a diagram or document, and external consistency
      across all documents.

      Figure 2 gives an example of internal inconsistency.
      Here, the packet diagram shows a field labelled "Application Error
      Code", while the accompanying description lists the field as
      "Application Protocol Error Code". The use of an abbreviated name
      is suitable for human readers, but makes parsing the structure
      difficult for machines. Figure 3 gives a further example: the
      description includes an "Option-Code" field that does not appear
      in the packet diagram, and the description states that each field
      is 16 bits in length, while the diagram shows OPTION_RELAY_PORT as
      13 bits and Option-Len as 19 bits. Another example is [RFC6958],
      where the packet format diagram showing the structure of the
      Burst/Gap Loss Metrics Report Block shows the Number of Bursts
      field as being 12 bits wide, but the corresponding text describes
      it as 16 bits.

      Comparing Figure 2 with Figure 3 exposes external inconsistency
      across documents. While the packet format diagrams are broadly
      similar, the surrounding text is formatted differently. If
      machine parsing is to be made possible, then this text must be
      structured consistently.

   Ambiguous constraints:  The constraints that are enforced on a
      particular field are often described ambiguously, or in a way that
      cannot be parsed easily. In Figure 3, each of the three fields in
      the structure is constrained. The first two fields ("Option-Code"
      and "Option-Len") are to be set to constant values (note the
      inconsistency in how these constraints are expressed in the
      description). However, the third field ("Downstream Source Port")
      can take a value from a constrained set. This constraint is
      expressed in prose that cannot readily be understood by machines.

   Poor linking between sub-structures:  Protocol data units and other
      structures are often composed of sub-structures that are defined
      elsewhere, either in the same document, or within another
      document.
      Chaining these structures together is essential for machine
      parsing: the parsing process for a protocol data unit is only
      fully expressed if all elements can be parsed.

      Figure 2 highlights the difficulty that machine parsers have in
      chaining structures together. Two fields ("Stream ID" and "Final
      Size") are described as being encoded as variable-length integers;
      this is a structure described elsewhere in the same document.
      Structured text is required both alongside the definition of the
      containing structure and with the definition of the sub-structure,
      to allow a parser to link the two together.

   Lack of extension and evolution syntax:  Protocols are often
      specified across multiple documents, either because the protocol
      explicitly includes extension points (e.g., profiles and payload
      format specifications in RTP [RFC3550]) or because the definition
      of a protocol data unit has changed and evolved over time. As a
      result, it is essential that syntax be provided to allow for a
      complete definition of a protocol's parsing process to be
      constructed across multiple documents.

   : The format of the "Relay Source Port Option" is shown below:
   :
   :  0                   1                   2                   3
   :  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |    OPTION_RELAY_PORT    |         Option-Len                  |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |    Downstream Source Port     |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   : Where:
   :
   : Option-Code:  OPTION_RELAY_PORT. 16-bit value, 135.
   :
   : Option-Len:  16-bit value to be set to 2.
   :
   : Downstream Source Port:  16-bit value. To be set by the IPv6
   :    relay either to the downstream relay agent's UDP source port
   :    used for the UDP packet, or to zero if only the local relay
   :    agent uses the non-DHCP UDP port (not 547).

      Figure 3: DHCPv6's Relay Source Port Option (from [RFC8357])

2.2.  Formal languages in standards documents

   A small proportion of IETF standards documents contain structured and
   formal languages, including ABNF [RFC5234], ASN.1 [ASN1], C, CBOR
   [RFC7049], JSON, the TLS presentation language [RFC8446], YANG models
   [RFC7950], and XML. While this broad range of languages may be
   problematic for the development of tooling to parse specifications,
   these, and other, languages serve a range of different use cases.
   ABNF, for example, is typically used to specify text protocols, while
   ASN.1 is used to specify data structure serialisation. This document
   specifies a structured language for specifying the parsing of binary
   protocol data units.

3.  Design Principles

   The use of structures that are designed to support machine
   readability might interfere with the existing ways in which protocol
   specifications are used and authored. To the extent that these
   existing uses are more important than machine readability, such
   interference must be minimised.

   In this section, the broad design principles that underpin the format
   described by this document are given. These principles also apply
   more generally to any approach that introduces structured and formal
   languages into standards documents.

   It should be noted that these are design principles: they expose the
   trade-offs that are inherent within any given approach. Violating
   these principles is sometimes necessary and beneficial, and this
   document sets out the potential consequences of doing so.

   The central tenet that underpins these design principles is a
   recognition that the standardisation process is not broken, and so
   does not need to be fixed. Failure to recognise this will likely
   lead to approaches that are incompatible with the standards process,
   or that will see limited adoption.
   However, the standards process can be improved with appropriate
   approaches, as guided by the following broad design principles:

   Most readers are human:  Primarily, standards documents should be
      written for people, who require text and diagrams that they can
      understand. Structures that cannot be easily parsed by people
      should be avoided, and if included, should be clearly delineated
      from human-readable content.

      Any approach that shifts this balance -- that is, that primarily
      targets machine readers -- is likely to be disruptive to the
      standardisation process, which relies upon discussion centered
      around documents written in prose.

   Writing tools are diverse:  Standards document writing is a
      distributed process that involves a diverse set of tools and
      workflows. To minimise disruption to existing workflows, the
      introduction of machine-readable structures into specifications
      should not require the use of specific tools to produce standards
      documents. This does not preclude the development of optional,
      supplementary tools that aid in authoring machine-readable
      structures.

      The immediate impact of requiring specific tooling is that
      adoption is likely to be limited. A long-term impact might be
      that authors whose workflows are incompatible are alienated from
      the process.

   Canonical specifications:  As far as possible, machine-readable
      structures should not replicate the human-readable specification
      of the protocol within the same document. Machine-readable
      structures should form part of a canonical specification of the
      protocol. Adding supplementary machine-readable structures, in
      parallel to the existing human-readable text, is undesirable
      because it creates the potential for inconsistency.

      As an example, program code that describes how a protocol data
      unit can be parsed might be provided as an appendix within a
      standards document. This code would provide a specification of
      the protocol that is separate from the prose description in the
      main body of the document. This has the undesirable effect of
      introducing the potential for the program code to specify
      behaviour that the prose-based specification does not, and vice
      versa.

   Expressiveness:  Any approach should be expressive enough to capture
      the syntax and parsing process for the majority of binary
      protocols. If a given language is not sufficiently expressive,
      then adoption is likely to be limited. At the limits of what can
      be expressed by the language, authors are likely to revert to
      defining the protocol in prose: this undermines the broad goal of
      using structured and formal languages. Equally, though,
      understandable specifications and ease of use are critical for
      adoption. A tool that is simple to use and addresses the most
      common use cases might be preferred to a complex tool that
      addresses all use cases.

      It may be desirable to restrict expressiveness, however, to
      guarantee intrinsic safety, security, and computability properties
      of both the generated parser code for the protocol, and the parser
      of the description language itself. In much the same way as the
      language-theoretic security ([LANGSEC]) community advocates for
      programming language design to be informed by the desired
      properties of the parsers for those languages, protocol designers
      should be aware of the implications of their design choices. The
      expressiveness of the protocol description languages that they use
      to define their protocols can force such awareness.

      Broadly, those languages that have grammars which are more
      expressive tend to have parsers that are more complex and less
      safe.
      As a result, while considering the other goals described in
      this document, protocol description languages should attempt to be
      minimally expressive. They should either restrict protocol designs
      to those for which safe and secure parsers can be generated, or,
      at a minimum, ensure that protocol designers are aware of the
      boundaries their designs cross, in terms of computability and
      decidability [SASSAMAN].

   Minimise required change:  Any approach should require as few changes
      as possible to the way that documents are formatted, authored, and
      published. Forcing adoption of a particular structured or formal
      language is incompatible with the IETF's standardisation process:
      there are very few components of standards documents that are
      non-optional.

4.  Augmented Packet Header Diagrams

   The design principles described in Section 3 can largely be met by
   the existing uses of packet header diagrams. These diagrams aid
   human readability, do not require new or specialised tools to write,
   do not split the specification into multiple parts, can express most
   binary protocol features, and require no changes to existing
   publication processes.

   However, as discussed in Section 2.1, there are limitations to how
   packet header diagrams are used that must be addressed if they are to
   be parsed by machine. In this section, an augmented packet header
   diagram format is described.

   The concept is first illustrated by example. This is appropriate,
   given the visual nature of the language. In future drafts, these
   examples will be parsable using provided tools, and a formal
   specification of the augmented packet diagrams will be given in
   Appendix A.

4.1.  PDUs with Fixed and Variable-Width Fields

   The simplest PDU is one that contains only a set of fixed-width
   fields in a known order, with no optional fields or variation in the
   packet format.
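   For such a PDU, a generated parser needs little more than the ordered
   list of field names and bit widths. The following Python sketch shows
   one hypothetical way to do this; the function parse_fixed and its
   layout argument are illustrative names, not part of the format. The
   sketch assumes fields are packed most significant bit first in
   network byte order, and it rejects input that is too short. The
   example layout is the first 32-bit row of the IPv4 header described
   later in this section.

```python
def parse_fixed(data: bytes, layout: list[tuple[str, int]]) -> dict[str, int]:
    """Extract fixed-width fields, in order, most significant bit first.

    Hypothetical sketch: `layout` is a list of (field name, width in bits).
    """
    total_bits = sum(width for _, width in layout)
    nbytes = (total_bits + 7) // 8          # bytes needed to hold every field
    if len(data) < nbytes:
        raise ValueError(f"need {nbytes} bytes, got {len(data)}")
    value = int.from_bytes(data[:nbytes], "big")  # network byte order
    remaining = 8 * nbytes                  # bits not yet consumed
    fields = {}
    for name, width in layout:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


# First row of the IPv4 header: Version, IHL, DSCP, ECN, Total Length.
IPV4_FIRST_ROW = [("Version", 4), ("IHL", 4), ("DSCP", 6), ("ECN", 2),
                  ("Total Length", 16)]
print(parse_fixed(bytes.fromhex("45b80054"), IPV4_FIRST_ROW))
# {'Version': 4, 'IHL': 5, 'DSCP': 46, 'ECN': 0, 'Total Length': 84}
```

   Because every width is known before parsing starts, no field depends
   on the value of another field. The variable-width fields discussed
   next are what introduce such dependencies.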

   Some packet formats include variable-width fields, where the size of
   a field is either derived from the value of some previous field, or
   is unspecified and inferred from the total size of the packet and the
   size of the other fields.

   To ensure that there is no ambiguity, a PDU description can contain
   only one field whose length is unspecified. The length of a single
   field, where all other fields are of known (but perhaps variable)
   length, can be inferred from the total size of the containing PDU.

   A PDU description is introduced by the exact phrase "A/An _______ is
   formatted as follows:" at the end of a paragraph. This is followed
   by the PDU description itself, as a packet diagram within an
   <artwork> element in the XML representation, starting with a header
   line to show the bit width of the diagram. The description of the
   fields follows the diagram, as an XML <dl> list, after a paragraph
   containing the text "where:".

   PDU names must be unique, both within a document, and across all
   documents that are linked together (i.e., using the structured
   language defined in Section 4.4).

   Each field of the description starts with a <dt> tag comprising the
   field name and an optional short name in parentheses. These are
   followed by a colon, the field length, an optional presence
   expression (described in Section 4.2), and a terminating period. The
   following <dd> tag contains a prose description of the field. Field
   names cannot be the same as a previously defined PDU name, and must
   be unique within a given structure definition.

   For example, this can be illustrated using the IPv4 Header Format
   [RFC791]. An IPv4 Header is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |    DSCP   |ECN|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|     Fragment Offset     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Options                          ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               :
   :                            Payload                            :
   :                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 4 bits.  This is a fixed-width field, whose full label
      is shown in the diagram. The field's width -- 4 bits -- is given
      in the label of the description list, separated from the field's
      label by a colon.

   Internet Header Length (IHL): 4 bits.  This is a fixed-width field
      whose full label is too long to be shown in the diagram. A short
      label (IHL) is used in the diagram, and this short label is
      provided, in parentheses, after the full label in the description
      list.

   Differentiated Services Code Point (DSCP): 6 bits.  This is a
      fixed-width field, as previously discussed.

   Explicit Congestion Notification (ECN): 2 bits.  This is a
      fixed-width field, as previously discussed.

   Total Length (TL): 2 bytes.
      This is a fixed-width field, as previously discussed. Where fields
      are an integral number of bytes in size, the field length can be
      given in bytes rather than in bits.

   Identification: 2 bytes.  This is a fixed-width field, as previously
      discussed.

   Flags: 3 bits.  This is a fixed-width field, as previously discussed.

   Fragment Offset: 13 bits.  This is a fixed-width field, as previously
      discussed.

   Time to Live (TTL): 1 byte.  This is a fixed-width field, as
      previously discussed.

   Protocol: 1 byte.  This is a fixed-width field, as previously
      discussed.

   Header Checksum: 2 bytes.  This is a fixed-width field, as previously
      discussed.

   Source Address: 32 bits.  This is a fixed-width field, as previously
      discussed.

   Destination Address: 32 bits.  This is a fixed-width field, as
      previously discussed.

   Options: (IHL-5)*32 bits.  This is a variable-length field, whose
      length is defined by the value of the field with short label IHL
      (Internet Header Length). Constraint expressions can be used in
      place of constant values: the grammar for the expression language
      is defined in Appendix A.1. Constraints can include a previously
      defined field's short or full label, where one has been defined.
      Short variable-length fields are indicated by "..." instead of a
      pipe at the end of the row.

   Payload: TL - ((IHL*32)/8) bytes.  This is a multi-row
      variable-length field, constrained by the values of fields TL and
      IHL. Instead of the "..." notation, ":" is used to indicate that
      the field is variable-length. The use of ":" instead of "..."
      indicates the field is likely to be a longer, multi-row field.
      However, semantically, there is no difference: these different
      notations are for the benefit of human readers.

4.2.  PDUs That Cross-Reference Previously Defined Fields

   Binary formats often reference sub-structures that have been defined
   earlier in the specification.
   For example, in RTP [RFC3550], the Contributing Source Identifiers in
   an RTP Data Packet are defined as comprising a list of Source
   Identifier elements. A Source Identifier is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   SSRC: 32 bits.  This is a fixed-width field, as described previously.

   The following example shows how a Source Identifier can be referenced
   in the description of an RTP Data Packet. It also shows how the
   presence of some fields in a format may be dependent on the values of
   an earlier field.

   An RTP Data Packet is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Synchronization Source identifier               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               [Contributing Source identifiers]               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Header Extension                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Payload                             :
   :                                                               :
   :                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Padding                    | Padding Count |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 2 bits.  This is a fixed-width field, as described
      previously.

   Padding (P): 1 bit.  This is a fixed-width field, as described
      previously.

   Extension (X): 1 bit.  This is a fixed-width field, as described
      previously.

   CSRC count (CC): 4 bits.
This is a fixed-width field, as described 601 previously. 603 Marker (M): 1 bit. This is a fixed-width field, as described 604 previously. 606 Payload Type (PT): 7 bits. This is a fixed-width field, as described 607 previously. 609 Sequence Number (PT): 16 bits. This is a fixed-width field, as 610 described previously. 612 Timestamp (PT): 32 bits. This is a fixed-width field, as described 613 previously. 615 Synchronization Source identifier: 1 * Source Identifier. This is a 616 field whose structure is a previously defined PDU format (Source 617 Identifier). To indicate this, the width of the field is 618 expressed in terms of cross-referenced structure. When used in 619 constraint expressions, PDU names refer to the length of that PDU 620 structure. 622 Contributing Source identifiers: CC * Source Identifier. Where a 623 field is comprised of a sequence of previously defined structures, 624 square brackets can be used to indicate this in the diagram. The 625 length of the sequence can be defined using the constraint 626 expression grammar as described earlier. 628 In this example, both a PDU name (Source Identifier) and a field 629 name (CC) are used in the constraint expression. The PDU name 630 refers to the length of the PDU, while the field name refers to 631 the value of the field. This is possible because field names 632 cannot be the same as previously defined PDU names. 634 Header Extension: 32 bits; present only when X == 1. This is a field 635 whose presence is predicated on an expression given using the 636 constraint expression grammar described earlier. Optional fields 637 can be of any previously defined format (e.g., fixed- or variable- 638 width). Optional fields are indicated by the presence of "; 639 present only when [expr]." at the end of the definition term 640 (i.e., the text contained within the
      tag).

      [Note that this example deviates from the format as described in
      [RFC3550].  As specified in that document, the Header Extension
      would be a cross-referenced structure.  This is not shown here
      for brevity.]

   Payload: The length of the Payload is not specified, and hence
      needs to be inferred from the total length of the packet and the
      lengths of the known fields.  There can only be one field of
      unspecified size in a PDU.

   Padding: Padding Count bytes; present only when (P == 1) &&
      (Padding Count > 0).

      This is a variable-size field, with size dependent on a later
      field in the packet.  Fields can only depend on the value of a
      later field if they follow a field with unspecified size.

   Padding Count: 1 byte; present only when P == 1.  This is a
      fixed-width field, as previously discussed.

4.3.  PDUs with Non-Contiguous Fields

   In some binary formats, fields are striped across multiple
   non-contiguous bits.  This is often to allow for backwards
   compatibility with previous definitions of the same fields in
   earlier documents: striping in this way allows for careful use of
   the possible range of values.

   This format is illustrated using the STUN Message Type
   [draft-ietf-tram-stunbis-21].  A STUN Message Type is formatted as
   follows:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|M|M|M|M|C|M|M|M|C|M|M|M|M|
   |B|A|9|8|7|1|6|5|4|0|3|2|1|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Method (M): 12 bits.  This field consists of multiple sub-fields
      (M0 through MB), as shown in the diagram.  The diagram indicates
      that these sub-fields are to be concatenated, after parsing, into
      a single field by labelling each of them with the short field
      name 'M' followed by a single hexadecimal digit, with the least
      significant bit labelled 0 and subsequent bits labelled in
      sequence.

   Class (C): 2 bits.
      This field follows the same labelling convention as Method (M),
      described above.

4.4.  Importing PDU Definitions from Other Documents

   Protocols are often specified across multiple documents, either
   because the specification of a protocol's data units has changed
   over time, or because of explicit extension points contained in the
   protocol's original specification.  To allow a document to make use
   of a previous PDU definition, it is possible to import PDU
   definitions (written in the format described in this document) from
   other documents.

   A PDU definition is imported using the exact phrase "A/An ________
   is formatted as described in <document identifier>".  The document
   identifier must refer, unambiguously, to an existing document.  An
   Internet-Draft is identified by its name.  RFCs are identified by
   "RFC" followed by their number.

5.  Open Issues

   *  Need a simple syntax for defining a list of identical objects,
      and a way of referring to the size of the enclosing packet.  The
      format cannot currently represent RFC 6716 section 3.2.3, and
      should be able to (the underlying type system can do so).

   *  Need some discussion about the checks that the tooling might
      perform, and the implications of those checks.  For example, the
      tooling checks for consistency between the diagram and the
      description list of fields, ensuring that fields match by name
      and width.  Version -01 of this draft had a field whose name
      differed only in case between the diagram and the description:
      is this something that the tooling should identify?  More
      broadly, what is the trade-off between the rigour that the
      tooling can enforce, and the flexibility desired/needed by
      authors?

   *  Need to describe the rules governing the import of PDU
      definitions from other documents.

6.  IANA Considerations

   This document contains no actions for IANA.

7.  Security Considerations

   Poorly implemented parsers are a frequent source of security
   vulnerabilities in protocol implementations.
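   The following Python sketch illustrates the kind of defensive checks
   at stake, using the RTP Data Packet example from Section 4.  It is a
   hand-written illustration of what a parser derived from that
   description might look like, not output of the tooling referenced
   in Appendix B; the names RtpDataPacket, ParseError, and
   parse_rtp_data_packet are hypothetical.  It follows the simplified
   field descriptions given in this document (a 32-bit Header
   Extension, and a Padding field of Padding Count bytes followed by
   the Padding Count byte) rather than [RFC3550].  Each length taken
   from the packet itself (CC and Padding Count) is checked against the
   bytes actually received before it is used.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional


class ParseError(ValueError):
    """Raised when the input cannot be an RTP Data Packet as described."""


@dataclass
class RtpDataPacket:
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int] = field(default_factory=list)
    header_extension: Optional[int] = None
    payload: bytes = b""
    padding_bytes: bytes = b""
    padding_count: Optional[int] = None


def parse_rtp_data_packet(data: bytes) -> RtpDataPacket:
    # Fixed-width fields: V, P, X, CC, M, PT, Sequence Number, Timestamp and
    # the Synchronization Source identifier occupy the first 12 bytes.
    if len(data) < 12:
        raise ParseError("truncated fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    pkt = RtpDataPacket(
        version=b0 >> 6,
        padding=(b0 >> 5) & 0x1,
        extension=(b0 >> 4) & 0x1,
        csrc_count=b0 & 0x0F,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
    offset = 12

    # Contributing Source identifiers: CC * Source Identifier (4 bytes each).
    # CC comes from the packet, so the implied length is checked before use.
    end = offset + 4 * pkt.csrc_count
    if end > len(data):
        raise ParseError("CSRC list extends past end of packet")
    pkt.csrcs = [int.from_bytes(data[i:i + 4], "big") for i in range(offset, end, 4)]
    offset = end

    # Header Extension: 32 bits; present only when X == 1.
    if pkt.extension == 1:
        if offset + 4 > len(data):
            raise ParseError("Header Extension extends past end of packet")
        pkt.header_extension = int.from_bytes(data[offset:offset + 4], "big")
        offset += 4

    # Payload has unspecified size, so the trailing fields are located by
    # working backwards from the end of the packet.
    tail = len(data)
    if pkt.padding == 1:
        # Padding Count: 1 byte; present only when P == 1.
        if tail <= offset:
            raise ParseError("missing Padding Count")
        tail -= 1
        pkt.padding_count = data[tail]
        # Padding: Padding Count bytes; present only when P == 1 and
        # Padding Count > 0 (an empty slice when the count is zero).
        if tail - pkt.padding_count < offset:
            raise ParseError("Padding Count exceeds the bytes available")
        pkt.padding_bytes = data[tail - pkt.padding_count:tail]
        tail -= pkt.padding_count

    # Payload: whatever lies between the leading and trailing fields.
    pkt.payload = data[offset:tail]
    return pkt


if __name__ == "__main__":
    # V=2, P=0, X=0, CC=0, M=0, PT=96, Sequence Number=1, then a 2-byte payload.
    example = bytes([0x80, 0x60, 0x00, 0x01]) + bytes(8) + b"hi"
    print(parse_rtp_data_packet(example).payload)  # prints b'hi'
```

   Because Python slicing silently truncates rather than failing,
   omitting these checks would not crash the parser; it would instead
   accept a malformed packet with a short CSRC list or a misplaced
   Payload.  That kind of silent misinterpretation is what deriving the
   parser from the specification is intended to prevent.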
   Structuring the description of a protocol data unit so that a parser
   can be automatically derived from the specification can reduce the
   likelihood of vulnerable implementations.

   As described in Section 3, the expressiveness of a protocol
   description language has implications for the safety, security, and
   computability properties of the parser for the protocol description
   language itself, and for the generated parser code for the protocols
   described using it.  The language-theoretic security ([LANGSEC])
   community explores the security implications of programming language
   design; the principles developed in that community should guide the
   development of protocol description languages.

8.  Acknowledgements

   The authors would like to thank David Southgate for preparing a
   prototype implementation of some of the ideas described here.

   The authors would like to thank Marc Petit-Huguenin for feedback on
   the draft.

   This work has received funding from the UK Engineering and Physical
   Sciences Research Council under grant EP/R04144X/1.

9.  Informative References

   [RFC8357]  Deering, S. and R. Hinden, "Generalized UDP Source Port
              for DHCP Relay", RFC 8357, March 2018.

   [QUIC-TRANSPORT]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", Work in Progress, Internet-Draft,
              draft-ietf-quic-transport-20, 23 April 2019.

   [RFC6958]  Clark, A., Zhang, S., Zhao, J., and Q. Wu, "RTP Control
              Protocol (RTCP) Extended Report (XR) Block for Burst/Gap
              Loss Metric Reporting", RFC 6958, May 2013.

   [RFC7950]  Bjorklund, M., "The YANG 1.1 Data Modeling Language",
              RFC 7950, August 2016.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, August 2018.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 5234, January 2008.

   [ASN1]     ITU-T, "ITU-T Recommendation X.680, X.681, X.682, and
              X.683", ITU-T Recommendation X.680, X.681, X.682, and
              X.683.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, October 2013.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", RFC 3550, July 2003.

   [draft-ietf-tram-stunbis-21]
              Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing,
              D., Mahy, R., and P. Matthews, "Session Traversal
              Utilities for NAT (STUN)", Work in Progress,
              Internet-Draft, draft-ietf-tram-stunbis-21, 21 March 2019.

   [RFC791]   Postel, J., "Internet Protocol", RFC 791, September 1981.

   [RFC793]   Postel, J., "Transmission Control Protocol", RFC 793,
              September 1981.

   [LANGSEC]  LANGSEC, "LANGSEC: Language-theoretic Security".

   [SASSAMAN] Sassaman, L., Patterson, M. L., Bratus, S., and A.
              Shubina, "The Halting Problems of Network Stack
              Insecurity", ;login: -- December 2011, Volume 36,
              Number 6.

Appendix A.  ABNF specification

A.1.  Constraint Expressions

   cond-expr     = eq-expr "?" cond-expr ":" eq-expr
   eq-expr       = bool-expr eq-op bool-expr
   bool-expr     = ord-expr bool-op ord-expr
   ord-expr      = add-expr ord-op add-expr
   add-expr      = mul-expr add-op mul-expr
   mul-expr      = expr mul-op expr
   expr          = *DIGIT / field-name /
                   field-name-ws / "(" expr ")"

   field-name    = *ALPHA
   field-name-ws = *(field-name " ")

   mul-op        = "*" / "/" / "%"
   add-op        = "+" / "-"
   ord-op        = "<=" / "<" / ">=" / ">"
   bool-op       = "&&" / "||" / "!"
   eq-op         = "==" / "!="

A.2.  Augmented packet diagrams

   Future revisions of this draft will include an ABNF specification
   for the augmented packet diagram format described in Section 4.
   Such a specification is omitted from this draft given that the
   format is likely to change as its syntax is developed.
   Given the visual nature of the format, it is more appropriate for
   discussion to focus on the examples given in Section 4.

Appendix B.  Source code repository

   The source for this draft is available from
   https://github.com/glasgow-ipl/draft-mcquistin-augmented-ascii-diagrams.

   The source code for tooling that can be used to parse this document
   is available from https://github.com/glasgow-ipl/ips-protodesc-code.

Authors' Addresses

   Stephen McQuistin
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: sm@smcquistin.uk


   Vivian Band
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: vivianband0@gmail.com


   Dejice Jacob
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: d.jacob.1@research.gla.ac.uk


   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow
   G12 8QQ
   United Kingdom

   Email: csp@csperkins.org