idnits 2.17.1 draft-petithuguenin-computerate-specifying-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There is 1 instance of too long lines in the document, the longest one being 4 characters in excess of 72. == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (17 November 2019) is 1622 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Outdated reference: A later version (-13) exists of draft-mcquistin-augmented-ascii-diagrams-01 Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Petit-Huguenin 3 Internet-Draft Impedance Mismatch LLC 4 Intended status: Experimental 17 November 2019 5 Expires: 20 May 2020 7 The Computerate Specifying Paradigm 8 draft-petithuguenin-computerate-specifying-02 10 Abstract 12 This document specifies a paradigm named Computerate Specifying, 13 designed to simultaneously document and formally specify 14 communication protocols. This paradigm can be applied to any 15 document produced by any Standard Developing Organization (SDO), but 16 this document targets specifically documents produced by the IETF. 18 Status of This Memo 20 This Internet-Draft is submitted in full conformance with the 21 provisions of BCP 78 and BCP 79. 23 Internet-Drafts are working documents of the Internet Engineering 24 Task Force (IETF). Note that other groups may also distribute 25 working documents as Internet-Drafts. The list of current Internet- 26 Drafts is at https://datatracker.ietf.org/drafts/current/. 28 Internet-Drafts are draft documents valid for a maximum of six months 29 and may be updated, replaced, or obsoleted by other documents at any 30 time. It is inappropriate to use Internet-Drafts as reference 31 material or to cite them other than as "work in progress." 33 This Internet-Draft will expire on 20 May 2020. 35 Copyright Notice 37 Copyright (c) 2019 IETF Trust and the persons identified as the 38 document authors. All rights reserved. 40 This document is subject to BCP 78 and the IETF Trust's Legal 41 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 42 license-info) in effect on the date of publication of this document. 43 Please review these documents carefully, as they describe your rights 44 and restrictions with respect to this document. Code Components 45 extracted from this document must include Simplified BSD License text 46 as described in Section 4.e of the Trust Legal Provisions and are 47 provided without warranty as described in the Simplified BSD License. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 52 2. Overview of Operations . . . . . . . . . . . . . . . . . . . 4 53 2.1. Libraries . . . . . . . . . . . . . . . . . . . . . . . . 6 54 2.2. Retrofitting Specifications . . . . . . . . . . . . . . . 7 55 2.3. Revision of Standards . . . . . . . . . . . . . . . . . . 8 56 2.4. Content of a Computerate Specification . . . . . . . . . 8 57 3. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 58 3.1. Syntax Examples . . . . . . . . . . . . . . . . . . . . . 9 59 3.1.1. Data Type . . . . . . . . . . . . . . . . . . . . . . 9 60 3.1.2. Serializer . . . . . . . . . . . . . . . . . . . . . 10 61 3.1.3. Presentation Format . . . . . . . . . . . . . . . . . 10 62 3.2. Formal Syntax Language . . . . . . . . . . . . . . . . . 11 63 3.2.1. Augmented BNF (ABNF) . . . . . . . . . . . . . . . . 11 64 3.2.2. Augmented Packet Header Diagrams (APHD) . . . . . . . 12 65 3.2.3. Mathematical Formulas . . . . . . . . . . . . . . . . 15 66 3.2.4. RELAX NG Format . . . . . . . . . . . . . . . . . . . 15 67 3.2.5. ASN.1 . . . . . . . . . . . . . . . . . . . . . . . . 15 68 3.2.6. TLS Description Language . . . . . . . . . . . . . . 15 69 3.3. Proofs for Syntax . . . . . . . . . . . . . . . . . . . . 15 70 3.3.1. Isomorphism Between Type and Formal Language . . . . 15 71 3.3.2. Data Format Conversion . . . . . . . . . . . . . . . 18 72 3.3.3. Interoperability with Previous Versions . . . . . . . 18 73 3.3.4. Postel's Law . . . . . . . . . . . . . . . . . . . . 19 74 4. Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . 21 75 4.1. Typed Petri Nets . . . . . . . . . . . . . . . . . . . . 21 76 4.2. Semantics Examples . . . . . . . . . . . . . . . . . . . 24 77 4.2.1. Data Type . . . . . . . . . . . . . . . . . . . . . . 24 78 4.2.2. Serializer . . . . . . . . . . . . . . . . . . . . . 24 79 4.2.3. Presentation Format . . . . . . . . . . . . . . . . . 25 80 4.3. Formal Semantics Language . . . . . . . . . . . . . . . . 25 81 4.3.1. Cosmogol . . . . . . . . . . . . . . . . . . . . . . 25 82 4.4. Proofs for Semantics . . . . . . . . . . . . . . . . . . 25 83 4.4.1. Isomorphism . . . . . . . . . . . . . . . . . . . . . 26 84 4.4.2. Postel's Law . . . . . . . . . . . . . . . . . . . . 26 85 4.4.3. Termination . . . . . . . . . . . . . . . . . . . . . 26 86 4.4.4. Liveness . . . . . . . . . . . . . . . . . . . . . . 26 87 4.4.5. Verified Code . . . . . . . . . . . . . . . . . . . . 26 88 5. Informative References . . . . . . . . . . . . . . . . . . . 26 89 Appendix A. Command Line Tools . . . . . . . . . . . . . . . . . 28 90 A.1. Installation . . . . . . . . . . . . . . . . . . . . . . 28 91 A.1.1. Download the Docker Image . . . . . . . . . . . . . . 29 92 A.2. Using the "computerate" Command . . . . . . . . . . . . . 29 93 A.3. Using the Idris REPL . . . . . . . . . . . . . . . . . . 30 94 A.4. Using Other Commands . . . . . . . . . . . . . . . . . . 31 95 A.5. Bugs and Workarounds . . . . . . . . . . . . . . . . . . 31 96 A.6. TODO List . . . . . . . . . . . . . . . . . . . . . . . . 32 98 Appendix B. Computerate Specifications Library . . . . . . . . . 32 99 B.1. Installation . . . . . . . . . . . . . . . . . . . . . . 32 100 B.2. Catalog . . . . . . . . . . . . . . . . . . . . . . . . . 33 101 B.2.1. RFC5234 . . . . . . . . . . . . . . . . . . . . . . . 33 102 Appendix C. Errata Statistics . . . . . . . . . . . . . . . . . 33 103 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 35 104 Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 105 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 37 107 1. Introduction 109 If, as the unofficial IETF motto states, we believe that "running 110 code" is an important part of the feedback provided to the 111 standardization process, then as per the Curry-Howard equivalence 112 [Curry-Howard] (that states that code and mathematical proofs are the 113 same), we ought to also believe that "verified proof" is an equally 114 important part of that feedback. A verified proof is a mathematical 115 proof of a logical proposition that was mechanically verified by a 116 computer, as opposed to just peer-reviewed. 118 The "Experiences with Protocol Description" paper from Pamela Zave 119 [Zave] gives three conclusions about the usage of formal 120 specifications for a protocol standard. The first conclusion states 121 that informal methods (i.e. the absence of verified proofs) are 122 inadequate for widely used protocols. This document is based on the 123 assumption that this conclusion is correct, so its validity will not 124 be discussed further. 126 The second conclusion states that formal specifications are useful 127 even if they fall short of the "gold standard" of a complete formal 128 specification. We will show that a formal specification can be 129 incrementally added to a standard. 131 The third conclusion from Zave's paper states that the normative 132 English language should be paraphrasing the formal specification. 133 The difficulty here is that to be able to keep the formal 134 specification and the normative language synchronized at all times, 135 these two should be kept as physically close as possible to each 136 other. 138 To do that we introduce the concept of "Computerate Specifying" (note 139 that Computerate is a British English word). "Computerate 140 Specifying" is a play on "Literate Computing", itself a play on 141 "Structured Computing" (see [Knuth92] page 99). In the same way that 142 Literate Programming enriches code by interspersing it with its own 143 documentation, Computerate Specifying enriches a standard 144 specification by interspersing it with code (or with proofs, as they 145 are the same thing), making it a computerate specification. 147 Note that computerate specifying is not specific to the IETF, just 148 like literate computing is not restricted to the combination of Tex 149 and Pascal described in Knuth's paper. What this document describes 150 is a specific instance of computerate specifying that combines 151 [AsciiDoc] as formatting language and [Idris] as programming language 152 with the goal of formally specifying IETF protocols. 154 2. Overview of Operations 156 Nowadays specifications at the IETF are written in a format named 157 xml2rfc v3 [RFC7991] but unfortunately making that format 158 "Computerable" is not trivial, mostly because there is no simple 159 solution to mix code and XML together in the same file. Instead, we 160 chose the AsciiDoc format as the basis for computerate specifications 161 as it permits the generation of specifications in the xmlrfc v3 162 format (among other formats) and also because it can be enriched with 163 code in the same file. 165 [I-D.ribose-asciirfc] describes a backend for the [Asciidoctor] tool 166 that converts an AsciiDoc document into an xmlrfc3 document. The 167 AsciiRFC document states various reasons why AsciiDoc is a superior 168 format for the purpose of writing standards, so we will not discuss 169 these further. Note that the same team developed Asciidoctor 170 backends for other Standards Developing Organizations (SDO) 171 [Metanorma], making it easy to develop computerate specifications 172 targeting the standards developed by these SDOs. 174 The code in a computerate specification uses the programming language 175 Idris in literate programming [Literate] mode using the Bird-style, 176 by having each line of code starting with a ">" mark in the first 177 column. 179 That same symbol was also used by AsciiDoc as an alternate way of 180 defining a blockquote [Blockquotes] way which is no longer available 181 in a computerate specification. Bird-style code will simply not 182 appear in the rendered document. 184 The result of Idris code execution can be inserted inside the 185 document part by putting that code fragment in the document between 186 the "{`" string and the "`}" string. 188 A computerate specification is processed by an Asciidoctor 189 preprocessor that does the following: 191 1. Loads the whole document as an Idris program, including importing 192 modules. 194 2. For each instance of an inline code fragment, evaluates that 195 fragment and replace it (including the delimiters) by the result 196 of that evaluation. 198 3. Continues with the normal processing of the modified document. 200 For instance the following computerate specification fragment taken 201 from the computerate specification of STUNbis 203 204 > rto : Int 205 > rto = 500 206 > 207 > rc : Nat 208 > rc = 7 209 > 210 > rm : Int 211 > rm = 16 212 > 213 > -- A stream of transmission times 214 > transmissions : Int -> Int -> Stream Int 215 > transmissions value rto = value :: transmissions (value + rto) 216 > (rto * 2) 217 > 218 > -- Returns a specific transmission time 219 > transmission : Int -> Nat -> Int 220 > transmission timeout i = index i $ transmissions 0 timeout 221 > 222 > a1 : String 223 > a1 = show rto 224 > 225 > a2 : String 226 > a2 = concat (take (rc - 1) (map (\t => show t ++ " ms, ") 227 > (transmissions 0 rto))) ++ "and " ++ show (transmission rto 228 > (rc - 1)) ++ " ms" 229 > 230 > a3 : String 231 > a3 = show $ transmission rto (rc - 1) + rto * rm 233 For example, assuming an RTO of {`a1`}ms, requests would be sent at 234 times {`a2`}. 235 If the client has not received a response after {`a3`} ms, the 236 client will consider the transaction to have timed out. 237 239 Figure 1 241 is rendered as 242 " For example, assuming an 243 RTO of 500ms, requests would be sent at times 0 ms, 500 ms, 1500 ms, 244 3500 ms, 7500 ms, 15500 ms, and 31500 ms. If the client has not 245 received a response after 39500 ms, the client will consider the 246 transaction to have timed out." 248 Figure 2 250 Appendix A explains how to install the command line tools to process 251 a computerate specification. 253 The Idris programming language has been chosen because its type 254 system supports dependent and linear types, and that type system is 255 the language in which formal specifications are written. 257 Following Zave's second conclusion, a computerate specification does 258 not have to be about just formally specifying a protocol and proving 259 properties about it. There is a whole spectrum of formalism that can 260 be introduced in a specification, and we will present it in the 261 remaining sections by increasing order of complexity. Note that 262 because the formal language is a programming language, these usages 263 are not exhaustive, and plenty of other usages can and will be found 264 after the publication of this document. 266 2.1. Libraries 268 A computerate specification does not disappear as soon the standard 269 it describes is published. Quite the opposite, each specification is 270 designed to be used as an Idris module that can be imported in 271 subsequent specifications, reducing over time the amount of code that 272 needs to be written. At the difference of an RFC that is immutable 273 after publication, the code in a specification will be improved over 274 time, especially as new properties are proved or disproved. The 275 latter will happen when a bug is discovered in a specification and a 276 proof of negation is added to the specification, paving the way to a 277 revision of the standard. 279 This document is itself a computerate specification that contains 280 data types and functions that can be reused in future specifications, 281 and as a whole can be considered as the standard library for 282 computerate specifying. 284 For convenience, each public computerate specification, including the 285 one behind this document, will be made available as an individual git 286 repository. Appendix B explains how to gain access to these 287 computerate specifications. 289 2.2. Retrofitting Specifications 291 RFCs, Internet-Drafts and standard documents published by other SDOs 292 did not start their life as computerate specifications, so to be able 293 to use them as Idris modules they will need to be progressively 294 retrofitted. This is done by converting the documents into an 295 AsciiDoc documents and then enriching them with code, in the same way 296 that would have been done if the standard was developed directly as a 297 computerate specification. 299 Converting the whole document in AsciiDoc and enriching it with code, 300 instead of just maintaining a library of code, seems a waste of 301 resources. The reason for doing so is to be able to verify that the 302 rendered text is equivalent to the original standard, which will 303 validate the examples and formal languages. 305 Retrofitted specifications will also be made available as individual 306 git repositories as they are converted. 308 Because the IETF Trust does not permit modifying an RFC as a whole 309 (except for translation purposes), a retrofitted RFC uses 310 transclusion, a mechanism that includes parts of a separate document 311 at runtime. This way, a retrofitted RFC is distributed as two 312 separate files, the original RFC in text form, and a computerate 313 specification that contains only code and transclusions. 315 Transclusion is a special form of AsciiDoc include that takes a range 316 of lines as parameters: 318 include::rfc5234.txt[lines=26..35] 320 Figure 3 322 Here the "include" macro will be replaced by the content of lines 26 323 to 35 (included) of RFC 5234. 325 The "sub" parameter permits modifying the copied content according to 326 a regular expression. For instance the following converts references 327 into the AsciiDoc format: 329 include::rfc5234.txt[lines=121..131,sub="/\[([^\]])\]/<<\1>>/"] 331 Figure 4 333 In the following example, the text is converted into a note: 335 include::rfc5234.txt[lines=151,sub="/^.*$/NOTE: \0/"] 336 Figure 5 338 2.3. Revision of Standards 340 Standards evolve, but because RFCs are immutable, revisions for a 341 standard are done by publishing new RFCs. 343 The matching computerate specifications need to reflect that 344 relationship by extending the data type of syntax and semantics in 345 the new version, instead of recreating new data types from scratch. 346 There are two diametrically opposed directions when extending a type: 348 * The new standard is adding constraints. This is done by indexing 349 the new type over the old type. 351 * The new standard is removing constraints. This is done by 352 defining the new type as a sum type, with one of the alternatives 353 being the old type. 355 // This is correct in theory, but in practice creating new // 356 specifications from old ones as described above is not very // 357 convenient. Maybe an alternate solution is to define the new // 358 specifications from scratch, and use an isomorphism proof to // 359 precisely define the differences between the two. An Idris // 360 elaboration script may permit duplicating a type and modifying it // 361 without having to manually copy it. 363 2.4. Content of a Computerate Specification 365 Communication protocol specifications are generally split in two 366 distinct parts, syntax (the data layout of the messages exchanged) 367 and semantics (the rules that drive the exchange of messages). 369 Section 3 will discuss in detail the application of computerate 370 specifying to syntax descriptions, and Section 4 will be about 371 specifying semantics. 373 3. Syntax 375 The syntax of a communication protocol determines how data is laid 376 out before it is sent over a communication link. Generally the 377 syntax is described only in the context of the layer that this 378 particular protocol is operating at, e.g. an application protocol 379 syntax only describes the data as sent over UDP or TCP, not over 380 Ethernet or Wi-Fi. 382 Syntaxes can generally be split into two broad categories, binary and 383 text, and generally a protocol syntax falls completely into one of 384 these two categories. 386 Syntax descriptions can be formalized for at least three reasons that 387 will be presented in the following sections. 389 3.1. Syntax Examples 391 Examples in protocol documentation are frequently incorrect, which 392 should not be that much of an issue but for the fact that most 393 developers do not read the normative text when an example is 394 available. See Appendix C for statistics about the frequency of 395 incorrect examples in RFC errata. 397 Moving the examples into appendices or adding caution notices may 398 show limited success in preventing that problem. 400 So ensuring that examples match the normative text seems like a good 401 starting point for a computerate specification. This is done by 402 having the possibility of adding the result of a computation directly 403 inside the document. If that computation is done from a type that is 404 (physically and conceptually) close to the normative text, then we 405 gain some level of assurance that both the normative text and the 406 derived examples will match. Note that examples can be inserted in 407 the document as whole blocks of text, or as inline text. 409 Appendix B.2.1 showcases the conversion of the examples in [RFC5234]. 411 3.1.1. Data Type 413 The first step is to define an Idris type that completely defines the 414 layout of the messages exchanged. By "completely define" we mean 415 that the type checker will prevent creating any invalid value of this 416 type. That ensures that all values are correct by construction. 418 E.g. here is the definition of a DNS label per [RFC1034]: 420 421 > data PartialLabel' : List Char -> Type where 422 > Empty : PartialLabel' [] 423 > More : (c : Char) -> (prf1 : isAlphaNum c || c == '-' = True) -> 424 > PartialLabel' s -> (prf2 : length s < 61 = True) -> 425 > PartialLabel' (c :: s) 426 > 427 > data Label' : List Char -> Type where 428 > One : (c : Char) -> (prf1 : isAlpha c = True) -> Label' [c] 429 > Many : (begin : Char) -> (prf1 : isAlpha begin = True) -> 430 > (middle : PartialLabel' xs) -> 431 > (end : Char) -> (prf2 : isAlphaNum end = True) -> 432 > Label' ([begin] ++ xs ++ [end]) 433 > 434 > data Label : {a : Type} -> a -> Type where 435 > MkLabel : {xs : String} -> Label' (unpack xs) -> Label xs 436 438 Figure 6 440 3.1.2. Serializer 442 The second step is writing a serializer from that type into the wire 443 representation. For a text format, it is done by implementing the 444 Show interface: 446 447 > Show (Label xs) where 448 > show _ = xs 449 451 Figure 7 453 // Define binary serializer. 455 3.1.3. Presentation Format 457 Instead of directly generating character strings, the serializer will 458 use as target a dependent type that formalizes the AsciiDoc language. 459 This will permit to know the context in which a character string will 460 be subsequently generated and to correctly escape special characters 461 in that string. 463 Using an intermediary type will also permit to correctly generate 464 AsciiDoc that can generate an xmlrfc 3 file that supports both text 465 and graphical versions of a figure. This will be done by having 466 AsciiDoc blocks converted into elements that contains both 467 the 72 column formatted text and an equivalent SVG file, even for 468 code source (instead of using the element). 470 3.2. Formal Syntax Language 472 Some specifications use a formal language to describe the data 473 layout. One shared property of these languages is that they cannot 474 always formalize all the constraints of a specific data layout, so 475 they have to be enriched with comments. One consequence of this is 476 that they cannot be used as a replacement for the Idris data type 477 described in Section 3.1.1, data type that is purposely complete. 479 The following sections describe how these formal languages have been 480 or will be themselves formalized with the goal of using them in 481 computerate specifications. 483 3.2.1. Augmented BNF (ABNF) 485 Augmented Backus-Naur Form [RFC5234] (ABNF) is a formal language used 486 to describe a text based data layout. 488 The [RFC5234] document has been retrofitted as a computerate 489 specification to provide an internal Domain Specific Language (DSL) 490 that permits specifying an ABNF for a specification. The encoding of 491 an example from Section 2.3 of [RFC5234] looks like this: 493 494 > rulename : Rule 495 > rulename = "rulename" `Eq` (Concat (TermDec 97 []) (TermDec 98 []) 496 > [TermDec 99 []]) 497 499 Figure 8 501 A serializer, also defined in the same specification, permits 502 converting that description into proper ABNF text that can be 503 inserted into the document such as in the following fragment: 505 506 [source,abnf] 507 ---- 508 {`show rulename`} 509 ---- 510 512 Figure 9 514 is rendered as 515 rulename = %d97 %d98 %d99 517 Figure 10 519 See Appendix B.2.1 for access to the source of the retrofitted 520 specification for [RFC5234]. 522 3.2.2. Augmented Packet Header Diagrams (APHD) 524 Augmented Packet Header Diagram 525 [I-D.mcquistin-augmented-ascii-diagrams] (APHD) is a formal language 526 used to describe a binary data layout in a machine-readable format. 528 The [I-D.mcquistin-augmented-ascii-diagrams] document will be 529 retrofitted as a computerate specification to provide an internal 530 Domain Specific Language (DSL) that permits specifying an APHD for a 531 specification. The partial encoding of an example from section 4.1 532 looks like this: 534 535 > ipv4 : Aphd 536 > ipv4 = MkAphd "IPv4 Header" [ 537 > MkField "Version" (Just "V") (Number 4) Bits "This is a" ++ 538 > " fixed-width field, whose full label is shown in the" ++ 539 > " diagram. The field's width -- 4 bits -- is given in" ++ 540 > " the label of the description list, separated from the" ++ 541 > " field's label by a colon.", 542 > ... 543 > MkField "Options" Nothing (Mult (Sub (Name "IHL") (Number 5)) 544 > (Number 32)) Bits "This is a variable-length field, whose" ++ 545 > " length is defined by the value of the field with short" ++ 546 > " label IHL (Internet Header Length). Constraint" ++ 547 > " expressions can be used in place of constant values: the" ++ 548 > " grammar for the expression language is defined in" ++ 549 > " Appendix A.1. Constraints can include a previously" ++ 550 > " defined field's short or full label, where one has been" ++ 551 > " defined. Short variable-length fields are indicated by" ++ 552 > " \"...\" instead of a pipe at the end of the row." 553 > ... 554 > ] 555 557 Figure 11 559 A serializer, also defined in the same specification, permits 560 converting that description into proper ABNF text that can be 561 inserted into the document such as in the following fragment: 563 564 .... 565 {`show ipv4`} 566 .... 567 569 Figure 12 571 is rendered as 572 An IPv4 Header is formatted as follows: 574 0 1 2 3 575 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 576 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 577 |Version| IHL | DSCP |ECN| Total Length | 578 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 579 | Identification |Flags| Fragment Offset | 580 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 581 | Time to Live | Protocol | Header Checksum | 582 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 583 | Source Address | 584 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 585 | Destination Address | 586 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 587 | Options ... 588 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 589 | : 590 : Payload : 591 : | 592 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 594 where: 596 Version (V): 4 bits. This is a fixed-width field, whose full label 597 is shown in the diagram. The field's width -- 4 bits -- is given 598 in the label of the description list, separated from the field's 599 label by a colon. 601 ... 603 Options: (IHL-5)*32 bits. This is a variable-length field, whose 604 length is defined by the value of the field with short label IHL 605 (Internet Header Length). Constraint expressions can be used in 606 place of constant values: the grammar for the expression language 607 is defined in Appendix A.1. Constraints can include a previously 608 defined field's short or full label, where one has been defined. 609 Short variable-length fields are indicated by "..." instead of a 610 pipe at the end of the row. 612 ... 614 Figure 13 616 Here the serializer generates an instance of the intermediary 617 AsciiDoc type that describes the header line (which can be 618 concatenated to previous lines), the block containing the diagram 619 itself, and then a list of all the field definitions. 621 3.2.3. Mathematical Formulas 623 AsciiDoc supports writing equations using either asciimath or 624 latexmath. The rendering for RFCs will generate an artwork element 625 that contains both the text version of the equation and a graphical 626 version in an SVG file.// Not sure what to do with inline formulas, 627 as we cannot generate an // artwork element in that case. 629 An Idris type will be used to described equations at the type level. 630 An interpreter will be used to calculate and insert examples in the 631 document. 633 A serializer will be used to generate the asciimath code that is 634 inserted inside a stem block. 636 3.2.4. RELAX NG Format 638 TBD 640 3.2.5. ASN.1 642 TBD 644 3.2.6. TLS Description Language 646 TBD 648 3.3. Proofs for Syntax 650 The kind of proofs that one would want in a specification are related 651 to isomorphism, i.e. a guarantee that two or more descriptions of a 652 data layout contain exactly the same information. 654 3.3.1. Isomorphism Between Type and Formal Language 656 We saw above that when a data layout is described with a formal 657 language, we end up with two descriptions of that data layout, one 658 using the Idris dependent type (and used to generate examples) and 659 one using the formal language. 661 Proving isomorphism requires generating an Idris type from the formal 662 language instance, which is done using an Idris elaborator script. 664 In Idris, Elaborator Reflection [Elab] is a metaprogramming facility 665 that permits writing code generating type declarations and code 666 (including proofs) automatically. 668 For instance the ABNF language is itself defined using ABNF, so after 669 converting that ABNF into an instance of the Syntax type (which is an 670 holder for a list of instances of the Rule type), it is possible to 671 generate a suite of types that represents the same language: 673 674 > abnf : Syntax 675 > abnf = MkSyntax [ 676 > "rulelist" `Eq` (Repeat (Just 1) Nothing (Group (Altern 677 > (TermName "rule") (Group (Concat (Repeat Nothing Nothing 678 > (TermName "c-wsp")) (TermName "c-nl") [])) []))), 679 > ... 680 > ] 681 > 682 > %runElab (generateType "Abnf" abnf) 683 685 Figure 14 687 The result of the elaboration can then be used to construct a value 688 of type Iso, which requires four total functions, two for the 689 conversion between types, and another two to prove that sequencing 690 the conversions results in the same original value. 692 The following example generates an Idris type "SessionDescription" 693 from the SDP ABNF. It then proves that this type and the Sdp type 694 contain exactly the same information (the proofs themselves have been 695 removed, leaving only the propositions): 697 698 > import Data.Control.Isomorphism 699 > 700 > sdp : Syntax 701 > sdp = MkSyntax [ 702 > "session-description" `Eq` (Concat (TermName "version-field") 703 > (TermName "origin-field") [ 704 > TermName "session-name-field", 705 > Optional (TermName "information-field"), 706 > Optional (TermName "uri-field"), 707 > Repeat Nothing Nothing (TermName "email-field"), 708 > Repeat Nothing Nothing (TermName "phone-field"), 709 > Optional (TermName "connection-field"), 710 > Repeat Nothing Nothing (TermName "bandwidth-field"), 711 > Repeat (Just 1) Nothing (TermName "time-description"), 712 > Optional (TermName "key-field"), 713 > Repeat Nothing Nothing (TermName "attribute-field"), 714 > Repeat Nothing Nothing (TermName "media-description") 715 > ]), 716 > ... 717 > ] 718 > 719 > %runElab (generateType "Sdp" sdp) 720 > 721 > same : Iso Sdp SessionDescription 722 > same = MkIso to from toFrom fromTo 723 > where 724 > to : Sdp -> SessionDescription 725 > 726 > from : SessionDescription -> Abnf 727 > 728 > toFrom : (x : SessionDescription ) -> to (from x) = x 729 > 730 > fromTo : (x : Sdp) -> from (to x) = x 731 > 732 734 Figure 15 736 As stated in Section 3.2, the Idris type and the type generated from 737 the formal language are not always isomorphic, because some 738 constraints cannot be expressed in that formal language. In that 739 case isomorphism can be used to precisely define what is missing 740 information in the formal language type. To do so, the generated 741 type is augmented with a delta type, like so: 743 744 > data DeltaSessionDescription : Type where 745 > ... 746 > 747 > same : Iso Sdp (SessionDescription, DeltaSessionDescription) 748 > ... 749 751 Figure 16 753 Then the DeltaSessionDescription type can be modified to include the 754 missing information until the same function type checks. After this 755 we have a guarantee that we know all about the constraints that 756 cannot be encoded in that formal language, and can check manually 757 that each of them is described as comment. 759 Idris elaborator scripts will be developped for each formal 760 languages. 762 3.3.2. Data Format Conversion 764 For specifications that describe a conversion between different data 765 layouts, having a proof that guarantees that no information is lost 766 in the process can be beneficial. For instance, we observe that 767 syntax encoding tends to be replaced each ten years or so by 768 something "better". Here again isomorphism can tell us exactly what 769 kind of information we lost and gained during that replacement. 771 Here, for example, the definition of a function that would verify an 772 isomorphism between an XML format and a JSON format: 774 775 > isXmlAndJsonSame: Iso (XML, DeltaXML) (JSON, DeltaJson) 776 > ... 777 779 Figure 17 781 Here DeltaXML expresses what is gained by switching from XML to JSON, 782 and DeltaJson expresses what is lost. 784 3.3.3. Interoperability with Previous Versions 786 The syntax of the data layout may be modified as part of the 787 evolution of a standard. In most case a version number prevents old 788 format to be used with the new format, but in cases where that it is 789 not possible, the new specification can ensure that both formats can 790 co-exist by using the same techniques as above. 792 Conversely these techniques can be used during the design phase of a 793 new version of a format, to check if a new version number is 794 warranted. 796 3.3.4. Postel's Law 798 | Be conservative in what you do, be liberal in what you accept from 799 | others. 800 | 801 | -- Jon Postel 803 One of the downsides of formal specifications is that there is no 804 wiggle room possible when implementing it. An implementation either 805 conforms to the specification or does not. 807 One analogy would be specifying a pair of gears. If one decides to 808 have both of them made with tolerances that are too small, then it is 809 very likely that they will not be able to move when put together. A 810 bit of slack is needed to get the gear smoothly working together but 811 more importantly the cost of making these gears is directly 812 proportional to their tolerance. There is an inflexion point where 813 the cost of an high precision gear outweighs its purpose. 815 We have a similar issue when implementing a formal specification, 816 where having an absolutely conformant implementation may cost more 817 money than it is worth spending. On the other hand a specification 818 exists for the purpose of interoperability, so we need some 819 guidelines on what to ignore in a formal specification to make it 820 cost effective. 822 Postel's law proposes an informal way of defining that wiggle room by 823 actually having two different specifications, one that defines a data 824 layout for the purpose of sending it, and another one that defines a 825 data layout for the purpose of receiving that data layout. 827 Existing specifications express that dichotomy in the form of the 828 usage of SHOULD/SHOULD NOT/RECOMMENDED/NOT RECOMMENDED [RFC2119] 829 keywords. For example the SDP spec says that "[t]he sequence CRLF 830 (0x0d0a) is used to end a line, although parsers SHOULD be tolerant 831 and also accept lines terminated with a single newline character." 832 This directly infers two specifications, one used to define an SDP 833 when sending it, that enforces using only CRLF, and a second 834 specification, used to define an SDP when receiving it (or parsing 835 it), that accepts both CRLF and LF. 837 Note that the converse is not necessarily true, i.e. not all usages 838 of these keywords are related to Postel's Law. 840 To ensure that the differences between the sending specification and 841 the receiving specification do not create interoperability problems, 842 we can use a variant of isomorphism, as shown in the following 843 example (data constructors and code elided): 845 846 > data Sending : Type where 847 > 848 > data Receiving : Type where 849 > 850 > to : Sending -> List Receiving 851 > 852 > from : Receiving -> Sending 853 > 854 > toFrom : (y : Receiving) -> Elem y (to (from y)) 855 > 856 > fromTo : (y : Sending) -> True = all (== y) [from x | x <- to y] 857 > 858 860 Figure 18 862 Here we define two data types, one that describes the data layout 863 that is permitted to be sent (Sending) and one that describes the 864 data layout that is permitted to be received (Receiving). For each 865 data layout that is possible to send, there is one or more matching 866 receiving data layouts. This is expressed by the function "to" that 867 takes as input one Sending value and returns a list of Receiving 868 values. 870 Conversely, the "from" function maps a Receiving data layout onto a 871 Sending data layout. Note the asymmetry there, which prevents to use 872 a standard proof of isomorphism. 874 Then the "toFrom" and "fromTo" proofs verify that there is no 875 interoperability issue by guaranteeing that each Receiving value maps 876 to one and only one Sending instance and that this mapping is 877 isomorphic. 879 All of this will provide a clear guidance of when and where to use a 880 SHOULD keyword or its variants, without loss of interoperability. 882 As an trivial example, the following proves that accepting LF 883 characters in addition to CRLF characters as end of line markers does 884 not break interoperability: 886 887 > data Sending : Type where 888 > S_CRLF : Sending 889 > 890 > Eq Sending where 891 > (==) S_CRLF S_CRLF = True 892 > 893 > data Receiving : Type where 894 > R_CRLF : Receiving 895 > R_LF : Receiving 896 > 897 > to : Sending -> List Receiving 898 > to S_CRLF = [R_CRLF, R_LF] 899 > 900 > from : Receiving -> Sending 901 > from R_CRLF = S_CRLF 902 > from R_LF = S_CRLF 903 > 904 > toFrom : (y : Receiving) -> Elem y (to (from y)) 905 > toFrom R_CRLF = Here 906 > toFrom R_LF = There Here 907 > 908 > fromTo : (y : Sending) -> True = all (== y) [from x | x <- to y] 909 > fromTo S_CRLF = Refl 910 912 Figure 19 914 4. Semantics 916 The semantics of a communication protocol determine what messages are 917 exchanged over a communication link and the relationship between 918 them. The semantics are generally described only in the context of 919 the layer that this particular protocol is operating at. 921 4.1. Typed Petri Nets 923 The semantics of a specification require to define an Idris type that 924 strictly enforces these semantics. This can be done in an ad hoc way 925 [Type-Driven], particularly by using linear types that express 926 resources' consumption. 928 But a better solution is to design these graphically, particularly by 929 using Petri Nets. This specification defines a DSL that permits 930 describing a Typed Petri Net (TPN) which is heavily influenced by 931 Coloured Petri Nets [CPN] (CPN). A CPN adds some restriction on the 932 types that can be used in a Petri Net because of limitations in the 933 underlying programming language, SML. The underlying programming 934 used in TPN, Idris, does not have these limitations, so any well- 935 formed Idris type (including polymorphic, linear and dependent types) 936 can be directly used in TPN.// A graphical editor for TPN is planned 937 as part of the integration // tooling. The graphical tool will use 938 the document directly as // storage. 940 Here's an example of TPN (from figure 2.10 in [CPN]): 942 943 > NO : Type 944 > NO = Int 945 > 946 > DATA : Type 947 > DATA = String 948 > 949 > NOxDATA : Type 950 > NOxDATA = (NO, DATA) 951 > 952 > PTS : Place 953 > PTS = MkPlace "Packets To Send" NOxDATA (\() => [(1, "COL"), 954 > (2, "OUR"), (3, "ED "), (4, "PET"), (5, "RI "), (6, "NET")]) 955 > 956 > NS : Place 957 > NS = MkPlace "NextSend" NO (\() => [1]) 958 > 959 > A : Place 960 > A = MkPlace "A" NOxDATA (\() => []) 961 > 962 > input1 : Input 963 > input1 = MkInput PTS (NO, DATA) pure 964 > 965 > input2 : Input 966 > input2 = MkInput NS NO pure 967 > 968 > output1 : Output 969 > output1 = MkOutput PTS (NO, DATA) pure 970 > 971 > output2 : Output 972 > output2 = MkOutput NS NO pure 973 > 974 > output3 : Output 975 > output3 = MkOutput A (NO, DATA) pure 976 > 977 > sendPacket : Transition 978 > sendPacket = MkTransition [input1, input2] [output1, output2, 979 > output3] (\((n, d), n') => if n == n' 980 > then pure ((n, d), n, (n, d)) 981 > else empty) 982 984 Figure 20 986 // The DSL is being currently designed, so the example shows the // 987 generated value. 989 From there it is easy to generate (using the non-deterministic monad 990 in Idris) an interpreter for debugging and simulation purposes: 992 993 > interpret : MS NOxDATA -> MS NO -> MS NOxDATA -> 994 > ND (MS NOxDATA, MS NO, MS NOxDATA) 995 > interpret pts ns a = do 996 > (pts1, pts2) <- sel pts 997 > (ns1, ns2) <- sel ns 998 > i1 <- input' input1 pts1 999 > i2 <- input' input2 ns1 1000 > (pts3, ns3, a3) <- transition' sendPacket (i1, i2) 1001 > let o1 = output' output1 pts3 1002 > let o2 = output' output2 ns3 1003 > let o3 = output' output3 a3 1004 > pure (o1 ++ pts2, o2 ++ ns2, o3 ++ a) 1005 1007 Figure 21 1009 // Replace by the generic variant of the interpreter. 1011 A Petri Net has the advantage that the same graph can be reused to 1012 derive other Petri Nets, e.g., Timed Petri Nets (that can be used to 1013 collect performance metrics) or Stochastic Petri Nets.// The 1014 traditional way of verifying a Petri Net is by using model // 1015 checking. There is nothing in the design that prevents doing // 1016 that, but because that takes quite some time to run and so cannot // 1017 be part of the document processing, how do we store in the // 1018 document a proof that the model checking was successful? 1020 4.2. Semantics Examples 1022 Semantics examples can be wrong, so it is useful to be sure that they 1023 match the specification. 1025 4.2.1. Data Type 1027 As explained above, semantics can be described in an ad hoc manner, 1028 or using the TPN DSL. 1030 4.2.2. Serializer 1032 At the difference of syntax, where there are more or less as many 1033 ways to display them than there are syntaxes, semantics examples 1034 generally use sequence diagrams, eventually augmented with the 1035 content of the packets exchanged (and so using the techniques 1036 described in Section 3.1). 1038 Similarly to what is done in Section 3.2.2, an Asciidoctor block 1039 processor similar to the "msc" type of diagram used by the 1040 asciidoctor-diagram extension will be designed.// We unfortunately 1041 cannot reuse the asciidoctor-diagram extension // because it cannot 1042 generate both text and SVG versions of a // sequence diagram. 1044 The serializer for an example derived from a TPN generates the 1045 content of the msc AsciiDoc block, by selecting one particular path 1046 and its associated bindings through the Petri Net.// We probably want 1047 to use AsciiDoc callouts for these, although that // would require a 1048 modification in AsciiRfc. In fact callout would // be a far better 1049 technique for other diagrams, like AAD, as it will // let the 1050 renderer take care of the best way to place elements // depending on 1051 the output format. 1053 4.2.3. Presentation Format 1055 TBD. 1057 4.3. Formal Semantics Language 1059 Some specifications use a formal language to describe the state 1060 machines. One shared property of these languages is that they cannot 1061 always formalize all the constraints of specific semantics, so they 1062 have to be enriched with comments. One consequence of this is that 1063 they cannot be used as a replacement for the Idris data type 1064 described in Section 4.1, a data type that is purposely complete. 1066 4.3.1. Cosmogol 1068 Cosmogol [I-D.bortzmeyer-language-state-machines] is a formal 1069 language designed to define states machines. The Internet-Draft will 1070 be retrofitted as a computerate specification to provide an internal 1071 Domain Specific Language (DSL) that permits specifying an instance of 1072 that language. A serializer and elaborator script will also be 1073 defined. 1075 Finally, an Asciidoctor block processor would be used to convert the 1076 language into both a text and a graphical view of the state 1077 machine.// Add examples there. 1079 4.4. Proofs for Semantics 1081 Like for syntax formal languages, an elaborator script permits 1082 generating a type from a TPN instance. That type can then be used to 1083 write proofs of the properties that we expect from the semantics. 1085 4.4.1. Isomorphism 1087 An isomorphism proof can be used between two types derived from the 1088 semantics of a specification, for example to prove that no 1089 information is lost in the converting between the underlying 1090 processes, or when upgrading a process. 1092 An example of that would be to prove (or more likely disprove) that 1093 the SIP state machines are isomorphic to the WebRTC state machines. 1095 4.4.2. Postel's Law 1097 Like for the syntax, semantics can introduce wiggle room between the 1098 state machines on the sending side and the state machines on the 1099 receiving side. A similar isomorphism proof can be used to ensure 1100 that this is done without loss of interoperability. 1102 4.4.3. Termination 1104 The TPN type can be used to verify that the protocol actually 1105 terminates, or that it always returns to its initial state. This is 1106 equivalent to proving that a program terminates. 1108 4.4.4. Liveness 1110 The TPN type can be used to verify that the protocol is productive, 1111 i.e. that it does not loop without making progress. 1113 4.4.5. Verified Code 1115 A TPN that covers a whole protocol (i.e. client, network, and server) 1116 is useful to prove the properties listed in the previous sections. 1117 But the TPN is also designed in a way that each of these parts can be 1118 defined separately from the others (making it a Hierarchical TPN). 1119 This permits using the type generated from these (through an 1120 elaborator script) as a type for real code, and thus verifying that 1121 this code conforms to both the syntax and the semantics 1122 specifications. 1124 5. Informative References 1126 [RFC7991] Hoffman, P., "The "xml2rfc" Version 3 Vocabulary", 1127 RFC 7991, DOI 10.17487/RFC7991, December 2016, 1128 . 1130 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 1131 Specifications: ABNF", STD 68, RFC 5234, 1132 DOI 10.17487/RFC5234, January 2008, 1133 . 1135 [I-D.bortzmeyer-language-state-machines] 1136 Bortzmeyer, S., "Cosmogol: a language to describe finite 1137 state machines", Work in Progress, Internet-Draft, draft- 1138 bortzmeyer-language-state-machines-01, 11 November 2006, 1139 . 1142 [Curry-Howard] 1143 "Curry-Howard correspondence", 1144 . 1147 [I-D.mcquistin-augmented-ascii-diagrams] 1148 McQuistin, S., Band, V., and C. Perkins, "Describing 1149 Protocol Data Units with Augmented Packet Header 1150 Diagrams", Work in Progress, Internet-Draft, draft- 1151 mcquistin-augmented-ascii-diagrams-01, 3 November 2019, 1152 . 1155 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 1156 STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, 1157 . 1159 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1160 Requirement Levels", BCP 14, RFC 2119, 1161 DOI 10.17487/RFC2119, March 1997, 1162 . 1164 [I-D.ribose-asciirfc] 1165 Tse, R., Nicholas, N., and P. Brasolin, "AsciiRFC: 1166 Authoring Internet-Drafts And RFCs Using AsciiDoc", Work 1167 in Progress, Internet-Draft, draft-ribose-asciirfc-08, 17 1168 April 2018, 1169 . 1171 [Zave] Zave, P., "Experiences with Protocol Description", 1172 Rigorous Protocol Engineering (WRiPE'11), October 2011, 1173 . 1175 [Knuth92] Knuth, D., "Literate Programming", Center for the Study of 1176 Language and Information, 1992. 1178 [Elab] Christiansen, D. and E.C. Brady, "Elaborator reflection: 1179 extending Idris in Idris", In Proceedings of the 21st ACM 1180 SIGPLAN International Conference on Functional 1181 Programming. ACM Press-Association for Computing 1182 Machinery, 2016. 1184 [AsciiDoc] "AsciiDoc", . 1186 [Asciidoctor] 1187 "Asciidoctor", 1188 . 1190 [Metanorma] 1191 "Metanorma", . 1193 [Idris] "Idris: A Language with Dependent Types", 1194 . 1196 [Literate] "Literate programming", . 1200 [Blockquotes] 1201 "Markdown-style blockquotes", 1202 . 1205 [CPN] Jensen, K. and L. M. Kristensen, "Coloured Petri Nets: 1206 Modelling and Validation of Concurrent Systems", Springer, 1207 2009. 1209 [Type-Driven] 1210 Brady, E., "Type-Driven Development with Idris", Manning, 1211 2017. 1213 Appendix A. Command Line Tools 1215 A.1. Installation 1217 The computerate command line tools are run inside a Docker image, so 1218 the first step is to install the Docker software or verify that it is 1219 up to date (https://docs.docker.com/install/). 1221 Note that for the usage described in this document there is no need 1222 for Docker EE or for having a Docker account. 1224 The following instructions assume a Unix based OS, i.e. Linux or 1225 MacOS. Lines separated by a "\" character are meant to be executed 1226 as one single line, with the "\" character removed. 1228 A.1.1. Download the Docker Image 1230 To install the computerate tools, the fastest way is to download and 1231 install the Docker image using a temporary image containing the dat 1232 tool: 1234 docker pull veggiemonk/dat-docker 1235 mkdir computerate 1236 cd computerate 1237 docker run --rm -u $(id -u):$(id -g) -v \ 1238 $(pwd):/tools veggiemonk/dat-docker dat clone \ 1239 dat://6a33cb289d5818e3709f62e95df41be537edba5f4dc26593e2cb870c7982345b \ 1240 tools 1242 Figure 22 1244 After this, the image can be loaded in Docker. The newly installed 1245 Docker image also contains the dat command, so there is no need to 1246 keep the veggiemonk/dat-docker image after this: 1248 docker load -i tools.tar.xz 1249 docker image rm --force veggiemonk/dat-docker 1251 Figure 23 1253 A new version of the tools is released at the same time a new version 1254 of this document is released. After this, running the following 1255 command in the computerate directory will pull any new version of the 1256 tool tar file: 1258 docker run --rm -u $(id -u):$(id -g) \ 1259 -v $(pwd):/computerate computerate/tools dat pull --exit 1261 Figure 24 1263 The docker image can then be loaded as above. 1265 A.2. Using the "computerate" Command 1267 The Docker image main command is "computerate", which takes the same 1268 parameters as the "metanorma" command from the Metanorma tooling: 1270 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1271 computerate/tools computerate -t ietf -x txt 1273 Figure 25 1275 The differences with the "metanorma" command are: 1277 * The "computerate" command can process Literate Idris files (files 1278 with a "lidr" extension, aka lidr files), in addition to AsciiDoc 1279 files (files with an "adoc" extension, aka adoc files). When a 1280 lidr file is processed, all embedded code fragments (text between 1281 prefix "{`" and suffix "`}") are evaluated in the context of the 1282 Idris code contained in this file. Each code fragment (including 1283 the prefix and suffix) are then substituted by the result of that 1284 evaluation. 1286 * The "computerate" command can process included lidr files in the 1287 same way. The embedded code fragments in the imported file are 1288 processed in the context of the included lidr file, not in the 1289 context of the including file. Idris modules (either from an idr 1290 or lidr file) can be imported the usual way. 1292 * The literate code (which is all the text that is starting by a ">" 1293 symbol in column 1) in a lidr file will not be part of the 1294 rendered document. 1296 * The computerate command can process transclusions, as explained in 1297 Section 2.2. 1299 * Lookup of external references is disabled. Use either raw XML 1300 references or an external directory. 1302 * Instead of generating a file based on the name of the input file, 1303 the "computerate" command generates a file based on the ":name:" 1304 attribute in the header of the document. 1306 The "computerate" command can also be used to generate an xmlrfc v3 1307 file, ready for submission to the IETF: 1309 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1310 computerate/tools computerate -t ietf -x xmlrfc3 1312 Figure 26 1314 A.3. Using the Idris REPL 1316 idr and lidr files can be loaded directly in the Idris REPL for 1317 debugging: 1319 docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1320 computerate/tools idris 1322 Figure 27 1324 It is possible to directly modify the source code in the REPL by 1325 entering the ":e" command, which will load the file in an instance of 1326 VIM preconfigured to interact with the REPL. 1328 Alternatively the Idris REPL can be started as a server: 1330 docker run --rm -it -u $(id -u):$(id -g) -p 127.0.0.1:4294:4294 \ 1331 -v $(pwd):/computerate computerate/tools idris 1333 Figure 28 1335 Then if a source file is loaded in a separate console with the VIM 1336 instance inside the Docker image, it can interact with the REPL 1337 server: 1339 docker run --rm -u $(id -u):$(id -g) --net=host \ 1340 -v $(pwd):/computerate computerate/tools vim 1342 Figure 29 1344 Note that both commands must be run from the same directory. 1346 A.4. Using Other Commands 1348 For convenience, the docker image provides the latest version of the 1349 xml2rfc, idnits, aspell, and languagetool tools. 1351 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1352 computerate/tools xml2rfc 1353 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1354 computerate/tools idnits --help 1355 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1356 computerate/tools aspell 1357 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1358 computerate/tools languagetool 1360 Figure 30 1362 The Docker image also contains a extended version of git that will be 1363 used to retrieve the computerate specifications in Appendix B. 1365 A.5. Bugs and Workarounds 1367 * Errors in embedded code do not stop the process but replace the 1368 text by the error message, which can be easily overlooked. 1370 * backticks are not escaped in code fragments. 1372 * The current version of Docker in Ubuntu fails, but this can be 1373 fixed with the following commands: 1375 sudo apt-get install containerd.io=1.2.6-3 1376 sudo systemctl restart docker.service 1378 Figure 31 1380 * The Asciidoctor processor does not correctly format the output in 1381 all cases (e.g. "++"). The escaping can be done in Idris until 1382 this is fixed. 1384 * Sometimes the Idris processing fails with an error "Module needs 1385 reloading". Deleting all the files with the ibc extension will 1386 solve that problem. 1388 * Trying to fetch nonexistent new commits on a git repository will 1389 block for 12 seconds. 1391 * xml2rfc does not support PDF output. 1393 A.6. TODO List 1395 * Embedded blocks. 1397 * Test on Windows. 1399 * Using recursive modules with Idris. 1401 Appendix B. Computerate Specifications Library 1403 B.1. Installation 1405 The git repositories that compose the Computerate Specification 1406 Library are distributed over a peer-to-peer protocol based on dat. 1408 This requires an extension to git, extension that is already 1409 installed in the Docker image described in Appendix A. The following 1410 command can be used to retrieve a computerate specification: 1412 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1413 computerate/tools git clone --recursive dat:// 1415 Figure 32 1417 Here is the dat public key for a specific computerate 1418 specification and is the recommended name. Do not use the dat 1419 URIs given in Appendix A, as only the dat public keys listed in 1420 Appendix B.2 can be used with a git clone. 1422 Updating the repository also requires using the Docker image: 1424 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1425 computerate/tools git pull 1426 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1427 computerate/tools git submodule update 1429 Figure 33 1431 All the git commands that do not require access to the remote can be 1432 run natively or from the Docker image. 1434 Note that for the computerate specification library the "computerate" 1435 command must be run from the directory that is one level above the 1436 git repository. The name of the root document is always "Main.lidr" 1437 or "Main.adoc": 1439 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1440 computerate/tools computerate -t ietf -x txt \ 1441 /Main.lidr 1443 Figure 34 1445 B.2. Catalog 1447 For the time being this document will serve as a catalog of available 1448 computerate specifications. 1450 B.2.1. RFC5234 1452 Name: RFC5234 1453 Public key: 1454 994e52b29a7bf4f7590b0f0369a7d55d29fb22befd065e462b2185a8207e21f1 1456 Figure 35 1458 Appendix C. Errata Statistics 1460 In an effort to quantify the potential benefits of using formal 1461 methods at the IETF, an effort to relabel the Errata database is 1462 under way. 1464 The relabeling uses the following labels: 1466 +----------+----------------------------------------------+ 1467 | Label | Description | 1468 +==========+==============================================+ 1469 | AAD | Error in an ASCII bit diagram | 1470 +----------+----------------------------------------------+ 1471 | ABNF | Error in an ABNF | 1472 +----------+----------------------------------------------+ 1473 | Absent | The errata was probably removed | 1474 +----------+----------------------------------------------+ 1475 | ASN.1 | Error in ASN.1 | 1476 +----------+----------------------------------------------+ 1477 | C | Error in C code | 1478 +----------+----------------------------------------------+ 1479 | Diagram | Error in a generic diagram | 1480 +----------+----------------------------------------------+ 1481 | Example | An example does not match the normative text | 1482 +----------+----------------------------------------------+ 1483 | Formula | Error preventable by using Idris code | 1484 +----------+----------------------------------------------+ 1485 | Ladder | Error in a ladder diagram | 1486 +----------+----------------------------------------------+ 1487 | Rejected | The erratum was rejected | 1488 +----------+----------------------------------------------+ 1489 | Text | Error in the text itself, no remedy | 1490 +----------+----------------------------------------------+ 1491 | TLS | Error in the TLS language | 1492 +----------+----------------------------------------------+ 1494 Table 1 1496 At the time of publication the first 700 errata, which represents 1497 11.88% of the total, have been relabeled. On these, 34 were rejected 1498 and 27 were deleted, leaving 639 valid errata. 1500 +---------+-------+------------+ 1501 | Label | Count | Percentage | 1502 +=========+=======+============+ 1503 | Text | 396 | 61.97% | 1504 +---------+-------+------------+ 1505 | Formula | 66 | 10.32% | 1506 +---------+-------+------------+ 1507 | Example | 64 | 10.0% | 1508 +---------+-------+------------+ 1509 | ABNF | 38 | 5.94% | 1510 +---------+-------+------------+ 1511 | AAD | 32 | 5.00% | 1512 +---------+-------+------------+ 1513 | ASN.1 | 27 | 4.22% | 1514 +---------+-------+------------+ 1515 | C | 9 | 1.40% | 1516 +---------+-------+------------+ 1517 | Diagram | 4 | 0.62% | 1518 +---------+-------+------------+ 1519 | TLS | 2 | 0.31% | 1520 +---------+-------+------------+ 1521 | Ladder | 1 | 0.15% | 1522 +---------+-------+------------+ 1524 Table 2 1526 Note that as the relabeling is done in in order of erratum number, at 1527 this point it covers mostly older RFCs. A change in tooling (e.g. 1528 ABNF verifiers) means that these numbers may drastically change as 1529 more errata are relabeled. But at this point it seems that 38.02% of 1530 errata could have been prevented with a more pervasive use of formal 1531 methods. 1533 Acknowledgements 1535 Thanks to Jim Kleck, Stephane Bryant, Eric Petit-Huguenin, Nicolas 1536 Gironi, Stephen McQuistin, and Greg Skinner for the comments, 1537 suggestions, questions, and testing that helped improve this 1538 document. 1540 Changelog 1542 * draft-petithuguenin-computerate-specifying-02 1544 - Document 1546 o Switch to rfcxml3. 1548 o Status is now experimental. 1550 o Many nits. 1552 o Fix incorrect errata stats. 1554 o Move acknowledgment section at the end. 1556 o Rewrite the APHD section (formerly known as AAD) to match 1557 draft-mcquistin-augmented-diagrams-01. 1559 o Fix non-ascii characters in the references. 1561 o Intermediate AsciiDoc representation for serializers. 1563 - Tooling 1565 o xmlrfc3 is now the default extension. 1567 o "docName" and "category" attributes are now generated, and 1568 the "prepTime" is removed. 1570 o Update xml2rfc to 2.35.0. 1572 o Remove LanguageTool. 1574 o Update Metanorma to version 0.3.17. 1576 o Update Asciidoctor to 2.0.10. 1578 o Update list of Working Groups. 1580 - Library 1582 o No update. 1584 * draft-petithuguenin-computerate-specifying-01 1586 - Document 1588 o New changelog appendix. 1590 o Fix incorrect reference, formatting in Idris code. 1592 o Add option to remove container in all "docker run" command. 1594 o Add explanations to use the Idris REPL and VIM inside the 1595 Docker image. 1597 o Add placeholders for ASN.1 and RELAX NG languages. 1599 o New Errata appendix. 1601 o Nits. 1603 o Improve Syntax Examples section. 1605 - Tooling 1607 o Update Metanorma to version 0.3.16 1609 o Update MetaNorma-cli to version 1.2.7.1 1611 o Switch to patched version of Idris 1.3.2 that supports 1612 remote REPL in Docker. 1614 o Add VIM and idris-vim extension. 1616 o Remove some debug statements. 1618 - Library 1620 o No update 1622 Author's Address 1624 Marc Petit-Huguenin 1625 Impedance Mismatch LLC 1627 Email: marc@petit-huguenin.org