idnits 2.17.1 draft-petithuguenin-computerate-specifying-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There is 1 instance of too long lines in the document, the longest one being 4 characters in excess of 72. == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 932: '... usage of SHOULD/SHOULD NOT/RECOMMENDED/NOT RECOMMENDED [RFC2119]...' RFC 2119 keyword, line 934: '...e, although parsers SHOULD be tolerant...' RFC 2119 keyword, line 984: '... SHOULD keyword or its variants, wit...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (9 March 2020) is 1502 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- -- Missing reference section? 'Curry-Howard' on line 1388 looks like a reference -- Missing reference section? 'Zave' on line 1393 looks like a reference -- Missing reference section? 'Knuth92' on line 1395 looks like a reference -- Missing reference section? 'AsciiDoc' on line 1399 looks like a reference -- Missing reference section? 'Idris' on line 1408 looks like a reference -- Missing reference section? 'RFC7991' on line 1343 looks like a reference -- Missing reference section? 'I-D.ribose-asciirfc' on line 1383 looks like a reference -- Missing reference section? 'Asciidoctor' on line 1401 looks like a reference -- Missing reference section? 'Metanorma' on line 1405 looks like a reference -- Missing reference section? 'Literate' on line 1411 looks like a reference -- Missing reference section? 'Blockquotes' on line 1415 looks like a reference -- Missing reference section? 'RFC8489' on line 1356 looks like a reference -- Missing reference section? 'RFC8656' on line 1362 looks like a reference -- Missing reference section? 'RFC5234' on line 1347 looks like a reference -- Missing reference section? 'RFC1034' on line 1352 looks like a reference -- Missing reference section? 'Brinkmann02' on line 1426 looks like a reference -- Missing reference section? 'I-D.mcquistin-augmented-ascii-diagrams' on line 1373 looks like a reference -- Missing reference section? 'Elab' on line 1397 looks like a reference -- Missing reference section? 'RFC2119' on line 1379 looks like a reference -- Missing reference section? 'Type-Driven' on line 1423 looks like a reference -- Missing reference section? 'CPN' on line 1420 looks like a reference -- Missing reference section? 'I-D.bortzmeyer-language-state-machines' on line 1368 looks like a reference -- Missing reference section? 'RFC0761' on line 1339 looks like a reference Summary: 4 errors (**), 0 flaws (~~), 3 warnings (==), 24 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Petit-Huguenin 3 Internet-Draft Impedance Mismatch LLC 4 Intended status: Experimental 9 March 2020 5 Expires: 10 September 2020 7 The Computerate Specifying Paradigm 8 draft-petithuguenin-computerate-specifying-03 10 Abstract 12 This document specifies a paradigm named Computerate Specifying, 13 designed to simultaneously document and formally specify 14 communication protocols. This paradigm can be applied to any 15 document produced by any Standard Developing Organization (SDO), but 16 this document targets specifically documents produced by the IETF. 18 Status of This Memo 20 This Internet-Draft is submitted in full conformance with the 21 provisions of BCP 78 and BCP 79. 23 Internet-Drafts are working documents of the Internet Engineering 24 Task Force (IETF). Note that other groups may also distribute 25 working documents as Internet-Drafts. The list of current Internet- 26 Drafts is at https://datatracker.ietf.org/drafts/current/. 28 Internet-Drafts are draft documents valid for a maximum of six months 29 and may be updated, replaced, or obsoleted by other documents at any 30 time. It is inappropriate to use Internet-Drafts as reference 31 material or to cite them other than as "work in progress." 33 This Internet-Draft will expire on 10 September 2020. 35 Copyright Notice 37 Copyright (c) 2020 IETF Trust and the persons identified as the 38 document authors. All rights reserved. 40 This document is subject to BCP 78 and the IETF Trust's Legal 41 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 42 license-info) in effect on the date of publication of this document. 43 Please review these documents carefully, as they describe your rights 44 and restrictions with respect to this document. Code Components 45 extracted from this document must include Simplified BSD License text 46 as described in Section 4.e of the Trust Legal Provisions and are 47 provided without warranty as described in the Simplified BSD License. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 52 2. Overview of Operations . . . . . . . . . . . . . . . . . . . 4 53 2.1. Libraries . . . . . . . . . . . . . . . . . . . . . . . . 6 54 2.2. Retrofitting Specifications . . . . . . . . . . . . . . . 7 55 2.3. Implementation-Oriented Standards . . . . . . . . . . . . 8 56 2.4. Revision of Standards . . . . . . . . . . . . . . . . . . 8 57 2.5. Content of a Computerate Specification . . . . . . . . . 9 58 3. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 59 3.1. Syntax Examples . . . . . . . . . . . . . . . . . . . . . 9 60 3.1.1. Data Type . . . . . . . . . . . . . . . . . . . . . . 10 61 3.1.2. Serializer . . . . . . . . . . . . . . . . . . . . . 10 62 3.1.3. Presentation Format . . . . . . . . . . . . . . . . . 11 63 3.2. Designing a Data Type . . . . . . . . . . . . . . . . . . 11 64 3.3. Formal Syntax Language . . . . . . . . . . . . . . . . . 12 65 3.3.1. Augmented BNF (ABNF) . . . . . . . . . . . . . . . . 13 66 3.3.2. Augmented Packet Header Diagrams (APHD) . . . . . . . 14 67 3.3.3. Mathematical Formulas . . . . . . . . . . . . . . . . 16 68 3.3.4. RELAX NG Format . . . . . . . . . . . . . . . . . . . 16 69 3.3.5. ASN.1 . . . . . . . . . . . . . . . . . . . . . . . . 16 70 3.3.6. TLS Description Language . . . . . . . . . . . . . . 16 71 3.4. Proofs for Syntax . . . . . . . . . . . . . . . . . . . . 16 72 3.4.1. Isomorphism Between Type and Formal Language . . . . 16 73 3.4.2. Data Format Conversion . . . . . . . . . . . . . . . 19 74 3.4.3. Interoperability with Previous Versions . . . . . . . 19 75 3.4.4. Postel's Law . . . . . . . . . . . . . . . . . . . . 20 76 3.5. Extended Registries . . . . . . . . . . . . . . . . . . . 22 77 4. Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . 23 78 4.1. Typed Petri Nets . . . . . . . . . . . . . . . . . . . . 24 79 4.2. Semantics Examples . . . . . . . . . . . . . . . . . . . 26 80 4.2.1. Data Type . . . . . . . . . . . . . . . . . . . . . . 26 81 4.2.2. Serializer . . . . . . . . . . . . . . . . . . . . . 27 82 4.2.3. Presentation Format . . . . . . . . . . . . . . . . . 27 83 4.3. Formal Semantics Language . . . . . . . . . . . . . . . . 27 84 4.3.1. Cosmogol . . . . . . . . . . . . . . . . . . . . . . 27 85 4.4. Proofs for Semantics . . . . . . . . . . . . . . . . . . 28 86 4.4.1. Isomorphism . . . . . . . . . . . . . . . . . . . . . 28 87 4.4.2. Postel's Law . . . . . . . . . . . . . . . . . . . . 28 88 4.4.3. Termination . . . . . . . . . . . . . . . . . . . . . 28 89 4.4.4. Liveness . . . . . . . . . . . . . . . . . . . . . . 28 90 5. Verified Code . . . . . . . . . . . . . . . . . . . . . . . . 28 91 6. Bibliography . . . . . . . . . . . . . . . . . . . . . . . . 29 92 Appendix A. Command Line Tools . . . . . . . . . . . . . . . . . 31 93 A.1. Installation . . . . . . . . . . . . . . . . . . . . . . 31 94 A.1.1. Download the Docker Image . . . . . . . . . . . . . . 32 95 A.2. Using the "computerate" Command . . . . . . . . . . . . . 32 96 A.3. Using the Idris REPL . . . . . . . . . . . . . . . . . . 33 97 A.4. Using Other Commands . . . . . . . . . . . . . . . . . . 34 98 A.5. Bugs and Workarounds . . . . . . . . . . . . . . . . . . 34 99 A.6. TODO List . . . . . . . . . . . . . . . . . . . . . . . . 35 100 Appendix B. Computerate Specifications Library . . . . . . . . . 35 101 B.1. Installation . . . . . . . . . . . . . . . . . . . . . . 35 102 B.2. Catalog . . . . . . . . . . . . . . . . . . . . . . . . . 36 103 B.2.1. RFC 5234 . . . . . . . . . . . . . . . . . . . . . . 36 104 Appendix C. Extended Registry . . . . . . . . . . . . . . . . . 36 105 Appendix D. Errata Statistics . . . . . . . . . . . . . . . . . 37 106 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 39 107 Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 108 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 42 110 1. Introduction 112 If, as the unofficial IETF motto states, we believe that "running 113 code" is an important part of the feedback provided to the 114 standardization process, then as per the Curry-Howard equivalence 115 [Curry-Howard] (that states that code and mathematical proofs are the 116 same), we ought to also believe that "verified proof" is an equally 117 important part of that feedback. A verified proof is a mathematical 118 proof of a logical proposition that was mechanically verified by a 119 computer, as opposed to just peer-reviewed. 121 The "Experiences with Protocol Description" paper from Pamela Zave 122 [Zave] gives three conclusions about the usage of formal 123 specifications for a protocol standard. The first conclusion states 124 that informal methods (i.e. the absence of verified proofs) are 125 inadequate for widely used protocols. This document is based on the 126 assumption that this conclusion is correct, so its validity will not 127 be discussed further. 129 The second conclusion states that formal specifications are useful 130 even if they fall short of the "gold standard" of a complete formal 131 specification. We will show that a formal specification can be 132 incrementally added to a standard. 134 The third conclusion from Zave's paper states that the normative 135 English language should be paraphrasing the formal specification. 136 The difficulty here is that to be able to keep the formal 137 specification and the normative language synchronized at all times, 138 these two should be kept as physically close as possible to each 139 other. 141 To do that we introduce the concept of "Computerate Specifying" (note 142 that Computerate is a British English word). "Computerate 143 Specifying" is a play on "Literate Computing", itself a play on 144 "Structured Computing" (see [Knuth92] page 99). In the same way that 145 Literate Programming enriches code by interspersing it with its own 146 documentation, Computerate Specifying enriches a standard 147 specification by interspersing it with code (or with proofs, as they 148 are the same thing), making it a computerate specification. 150 Note that computerate specifying is not specific to the IETF, just 151 like literate computing is not restricted to the combination of Tex 152 and Pascal described in Knuth's paper. What this document describes 153 is a specific instance of computerate specifying that combines 154 [AsciiDoc] as formatting language and [Idris] as programming language 155 with the goal of formally specifying IETF protocols. 157 2. Overview of Operations 159 Nowadays specifications at the IETF are written in a format named 160 xml2rfc v3 [RFC7991] but unfortunately making that format 161 "Computerable" is not trivial, mostly because there is no simple 162 solution to mix code and XML together in the same file. Instead, we 163 chose the AsciiDoc format as the basis for computerate specifications 164 as it permits the generation of specifications in the xmlrfc v3 165 format (among other formats) and also because it can be enriched with 166 code in the same file. 168 [I-D.ribose-asciirfc] describes a backend for the [Asciidoctor] tool 169 that converts an AsciiDoc document into an xmlrfc3 document. The 170 AsciiRFC document states various reasons why AsciiDoc is a superior 171 format for the purpose of writing standards, so we will not discuss 172 these further. Note that the same team developed Asciidoctor 173 backends for other Standards Developing Organizations (SDO) 174 [Metanorma], making it easy to develop computerate specifications 175 targeting the standards developed by these SDOs. 177 The code in a computerate specification uses the programming language 178 Idris in literate programming [Literate] mode using the Bird-style, 179 by having each line of code starting with a ">" mark in the first 180 column. 182 That same symbol was also used by AsciiDoc as an alternate way of 183 defining a blockquote [Blockquotes] way which is no longer available 184 in a computerate specification. Bird-style code will simply not 185 appear in the rendered document. 187 The result of Idris code execution can be inserted inside the 188 document part by putting that code fragment in the document between 189 the "{`" string and the "`}" string. 191 A computerate specification is processed by an Asciidoctor 192 preprocessor that does the following: 194 1. Loads the whole document as an Idris program, including importing 195 modules. 197 2. For each instance of an inline code fragment, evaluates that 198 fragment and replace it (including the delimiters) by the result 199 of that evaluation. 201 3. Continues with the normal processing of the modified document. 203 For instance the following computerate specification fragment taken 204 from the computerate specification of STUNbis 206 207 > rto : Int 208 > rto = 500 209 > 210 > rc : Nat 211 > rc = 7 212 > 213 > rm : Int 214 > rm = 16 215 > 216 > -- A stream of transmission times 217 > transmissions : Int -> Int -> Stream Int 218 > transmissions value rto = value :: transmissions (value + rto) 219 > (rto * 2) 220 > 221 > -- Returns a specific transmission time 222 > transmission : Int -> Nat -> Int 223 > transmission timeout i = index i $ transmissions 0 timeout 224 > 225 > a1 : String 226 > a1 = show rto 227 > 228 > a2 : String 229 > a2 = concat (take (rc - 1) (map (\t => show t ++ " ms, ") 230 > (transmissions 0 rto))) ++ "and " ++ show (transmission rto 231 > (rc - 1)) ++ " ms" 232 > 233 > a3 : String 234 > a3 = show $ transmission rto (rc - 1) + rto * rm 235 For example, assuming an RTO of {`a1`}ms, requests would be sent at 236 times {`a2`}. 237 If the client has not received a response after {`a3`} ms, the 238 client will consider the transaction to have timed out. 239 241 Figure 1 243 is rendered as 245 " For example, assuming an 246 RTO of 500ms, requests would be sent at times 0 ms, 500 ms, 1500 ms, 247 3500 ms, 7500 ms, 15500 ms, and 31500 ms. If the client has not 248 received a response after 39500 ms, the client will consider the 249 transaction to have timed out." 251 Figure 2 253 Appendix A explains how to install the command line tools to process 254 a computerate specification. 256 The Idris programming language has been chosen because its type 257 system supports dependent and linear types, and that type system is 258 the language in which formal specifications are written. 260 Following Zave's second conclusion, a computerate specification does 261 not have to be about just formally specifying a protocol and proving 262 properties about it. There is a whole spectrum of formalism that can 263 be introduced in a specification, and we will present it in the 264 remaining sections by increasing order of complexity. Note that 265 because the formal language is a programming language, these usages 266 are not exhaustive, and plenty of other usages can and will be found 267 after the publication of this document. 269 2.1. Libraries 271 A computerate specification does not disappear as soon the standard 272 it describes is published. Quite the opposite, each specification is 273 designed to be used as an Idris module that can be imported in 274 subsequent specifications, reducing over time the amount of code that 275 needs to be written. At the difference of an RFC that is immutable 276 after publication, the code in a specification will be improved over 277 time, especially as new properties are proved or disproved. The 278 latter will happen when a bug is discovered in a specification and a 279 proof of negation is added to the specification, paving the way to a 280 revision of the standard. 282 This document is itself a computerate specification that contains 283 data types and functions that can be reused in future specifications, 284 and as a whole can be considered as the standard library for 285 computerate specifying. 287 For convenience, each public computerate specification, including the 288 one behind this document, will be made available as an individual git 289 repository. Appendix B explains how to gain access to these 290 computerate specifications. 292 2.2. Retrofitting Specifications 294 RFCs, Internet-Drafts and standard documents published by other SDOs 295 did not start their life as computerate specifications, so to be able 296 to use them as Idris modules they will need to be progressively 297 retrofitted. This is done by converting the documents into an 298 AsciiDoc documents and then enriching them with code, in the same way 299 that would have been done if the standard was developed directly as a 300 computerate specification. 302 Converting the whole document in AsciiDoc and enriching it with code, 303 instead of just maintaining a library of code, seems a waste of 304 resources. The reason for doing so is to be able to verify that the 305 rendered text is equivalent to the original standard, which will 306 validate the examples and formal languages. 308 Retrofitted specifications will also be made available as individual 309 git repositories as they are converted. 311 Because the IETF Trust does not permit modifying an RFC as a whole 312 (except for translation purposes), a retrofitted RFC uses 313 transclusion, a mechanism that includes parts of a separate document 314 at runtime. This way, a retrofitted RFC is distributed as two 315 separate files, the original RFC in text form, and a computerate 316 specification that contains only code and transclusions. 318 Transclusion is a special form of AsciiDoc include that takes a range 319 of lines as parameters: 321 include::rfc5234.txt[lines=26..35] 323 Figure 3 325 Here the "include" macro will be replaced by the content of lines 26 326 to 35 (included) of RFC 5234. 328 The "sub" parameter permits modifying the copied content according to 329 a regular expression. For instance the following converts references 330 into the AsciiDoc format: 332 include::rfc5234.txt[lines=121..131,sub="/\[([^\]])\]/<<\1>>/"] 334 Figure 4 336 In the following example, the text is converted into a note: 338 include::rfc5234.txt[lines=151,sub="/^.*$/NOTE: \0/"] 339 Figure 5 341 2.3. Implementation-Oriented Standards 343 After spending a few years implementing standard communication 344 protocols, it becomes quite obvious that not all standards are meant 345 to be directly converted into code. In most case an expert in both 346 communication protocols and software development has to rearrange a 347 standard and its set of dependencies into a form that can be 348 implemented. 350 One sure sign that a standard has first to be rearranged is that the 351 information needed to implement one single network element are spread 352 all over the standard. For instance if for implementing the client 353 side of a client-server protocol one has to collect information from 354 most of the normative sections then that standard is not directly 355 ready for implementation and requires first to put together all these 356 pieces in a convenient form. 358 On the other hand some standards have been structured in a way that 359 matches the workflow of a software implementer. [RFC8489] and 360 [RFC8656] are examples of standards that are meant to be easily 361 implemented. 363 Assuming that a standard is meant to be implemented, it follows that 364 it should be a goal to publish it in a form that makes 365 implementations easier on the software developers. On the other 366 hand, writing a complete implementation as part of the development of 367 an standard is a difficult task, especially as a standard will change 368 over time during its development. 370 Because of the constraint that in a computerate specification the 371 specification should be as close as possible to the equivalent 372 normative text, standards developed using that technique tends to be 373 naturally ready for implementation. This contrasts with the 374 difficulties encounters when retrofitting an existing standard as a 375 computerate specification, where trying to keep the specification 376 close to the text is especially challenging. 378 2.4. Revision of Standards 380 Standards evolve, but because RFCs are immutable, revisions for a 381 standard are done by publishing new RFCs. 383 The matching computerate specifications need to reflect that 384 relationship by extending the data type of syntax and semantics in 385 the new version, instead of recreating new data types from scratch. 386 There are two diametrically opposed directions when extending a type: 388 * The new standard is adding constraints. This is done by indexing 389 the new type over the old type. 391 * The new standard is removing constraints. This is done by 392 defining the new type as a sum type, with one of the alternatives 393 being the old type. 395 | NOTE: This is correct in theory, but in practice creating new 396 | specifications from old ones as described above is not very 397 | convenient. Maybe an alternate solution is to define the new 398 | specifications from scratch, and use an isomorphism proof to 399 | precisely define the differences between the two. An Idris 400 | elaboration script may permit duplicating a type and modifying 401 | it without having to manually copy it. 403 2.5. Content of a Computerate Specification 405 Communication protocol specifications are generally split in two 406 distinct parts, syntax (the data layout of the messages exchanged) 407 and semantics (the rules that drive the exchange of messages). 409 Section 3 will discuss in detail the application of computerate 410 specifying to syntax descriptions, and Section 4 will be about 411 specifying semantics. 413 3. Syntax 415 The syntax of a communication protocol determines how data is laid 416 out before it is sent over a communication link. Generally the 417 syntax is described only in the context of the layer that this 418 particular protocol is operating at, e.g. an application protocol 419 syntax only describes the data as sent over UDP or TCP, not over 420 Ethernet or Wi-Fi. 422 Syntaxes can generally be split into two broad categories, binary and 423 text, and generally a protocol syntax falls completely into one of 424 these two categories. 426 Syntax descriptions can be formalized for at least three reasons that 427 will be presented in the following sections. 429 3.1. Syntax Examples 431 Examples in protocol documentation are frequently incorrect, which 432 should not be that much of an issue but for the fact that most 433 developers do not read the normative text when an example is 434 available. See Appendix D for statistics about the frequency of 435 incorrect examples in RFC errata. 437 Moving the examples into appendices or adding caution notices may 438 show limited success in preventing that problem. 440 So ensuring that examples match the normative text seems like a good 441 starting point for a computerate specification. This is done by 442 having the possibility of adding the result of a computation directly 443 inside the document. If that computation is done from a type that is 444 (physically and conceptually) close to the normative text, then we 445 gain some level of assurance that both the normative text and the 446 derived examples will match. Note that examples can be inserted in 447 the document as whole blocks of text, or as inline text. 449 Appendix B.2.1 showcases the conversion of the examples in [RFC5234]. 451 3.1.1. Data Type 453 The first step is to define an Idris type that completely defines the 454 layout of the messages exchanged. By "completely define" we mean 455 that the type checker will prevent creating any invalid value of this 456 type. That ensures that all values are correct by construction. 458 E.g. here is the definition of a DNS label per [RFC1034]: 460 461 > data PartialLabel' : List Char -> Type where 462 > Empty : PartialLabel' [] 463 > More : (c : Char) -> (prf1 : isAlphaNum c || c == '-' = True) -> 464 > PartialLabel' s -> (prf2 : length s < 61 = True) -> 465 > PartialLabel' (c :: s) 466 > 467 > data Label' : List Char -> Type where 468 > One : (c : Char) -> (prf1 : isAlpha c = True) -> Label' [c] 469 > Many : (begin : Char) -> (prf1 : isAlpha begin = True) -> 470 > (middle : PartialLabel' xs) -> 471 > (end : Char) -> (prf2 : isAlphaNum end = True) -> 472 > Label' ([begin] ++ xs ++ [end]) 473 > 474 > data Label : {a : Type} -> a -> Type where 475 > MkLabel : {xs : String} -> Label' (unpack xs) -> Label xs 476 478 Figure 6 480 3.1.2. Serializer 482 The second step is writing a serializer from that type into the wire 483 representation. For a text format, it is done by implementing the 484 Show interface: 486 487 > Show (Label xs) where 488 > show _ = xs 489 491 Figure 7 493 | NOTE: Define binary serializer. 495 3.1.3. Presentation Format 497 Instead of directly generating character strings, the serializer will 498 use as target a dependent type that formalizes the AsciiDoc language. 499 This will permit to know the context in which a character string will 500 be subsequently generated and to correctly escape special characters 501 in that string. 503 Using an intermediary type will also permit to correctly generate 504 AsciiDoc that can generate an xmlrfc 3 file that supports both text 505 and graphical versions of a figure. This will be done by having 506 AsciiDoc blocks converted into elements that contains both 507 the 72 column formatted text and an equivalent SVG file, even for 508 code source (instead of using the element). 510 3.2. Designing a Data Type 512 Builtin data types in Idris are convenient but as a general rule 513 should not be used in the design of data types meant to be used in 514 the context of computerate specifications. Builtin types are 515 abstract data types so most basic proofs about them have to be 516 axioms, making it difficult to reason about them, which does not 517 constitute a solid ground on which to build a verifiable system. 519 The builtin types in Idris are "Int", "Integer", "Double", "Char" and 520 "String". 522 On the other hand user-defined data types like "Nat", "Fin", "Dec" (a 523 proof carrying variant of "Bool"), "Maybe", "Either", "List", "Vect", 524 etc. have already plenty of proofs that can be reused, so should be 525 used in designing data types for computerate specifications. 527 Designing a user-defined data type is done by combining five basic 528 types: 530 * Product type: This is a type built from concatenation of types, 531 similar to Java's "class" or C's "struct". 533 * Sum type: This is a type built from alternative types. It can be 534 simulated with inheritance in Java, or a combination of "struct" 535 and "union" in C. A Java "enum" is a very limited form of Sum 536 type. 538 * Exponential type: This is the type of a total function (which 539 implies free of side-effect and use of external variables). 541 * Pi type: This is a product type (or a exponential type) where the 542 type depends on a value. This is equivalent to the universal 543 quantifier. 545 * Sigma type: This is a sum type where the type depends on a value. 546 This is equivalent to the existential quantifier. 548 In addition the underlying computerate specification of this document 549 defines a set of data types that are suited for computerate 550 specifications: 552 * The "Bitvector" type describes a fixed number of bits that can be 553 manipulated together. The library contains functions to 554 manipulate bitvectors that are inspired by the language in 555 [Brinkmann02]. 557 * The "Unsigned" type describes an unsigned integer data type that 558 is built on top on the Bitvector library. The library contains 559 ordering and arithmetic operators for unsigned numbers. 561 * The "Signed" type describes a signed integer data type that is 562 built on top on the Bitvector library. The library contains 563 ordering and arithmetic operators for signed numbers. 565 | NOTE: Other RFCs are in the process of been retrofitted to 566 | implement data types built on top of the "Bitvector" type. 567 | This is the case for IPv4 and IPv6 address, MAC address, UUID, 568 | Unicode and many more. A future version of this document will 569 | add the IdrisDoc rendering of these libraries as appendices. 571 3.3. Formal Syntax Language 573 Some specifications use a formal language to describe the data 574 layout. One shared property of these languages is that they cannot 575 always formalize all the constraints of a specific data layout, so 576 they have to be enriched with comments. One consequence of this is 577 that they cannot be used as a replacement for the Idris data type 578 described in Section 3.1.1, data type that is purposely complete. 580 The following sections describe how these formal languages have been 581 or will be themselves formalized with the goal of using them in 582 computerate specifications. 584 3.3.1. Augmented BNF (ABNF) 586 Augmented Backus-Naur Form [RFC5234] (ABNF) is a formal language used 587 to describe a text based data layout. 589 The [RFC5234] document has been retrofitted as a computerate 590 specification to provide an internal Domain Specific Language (DSL) 591 that permits specifying an ABNF for a specification. The encoding of 592 an example from Section 2.3 of [RFC5234] looks like this: 594 595 > rulename : Rule 596 > rulename = "rulename" `Eq` (Concat (TermDec 97 []) (TermDec 98 []) 597 > [TermDec 99 []]) 598 600 Figure 8 602 A serializer, also defined in the same specification, permits 603 converting that description into proper ABNF text that can be 604 inserted into the document such as in the following fragment: 606 607 [source,abnf] 608 ---- 609 {`show rulename`} 610 ---- 611 613 Figure 9 615 is rendered as 617 rulename = %d97 %d98 %d99 619 Figure 10 621 See Appendix B.2.1 for access to the source of the retrofitted 622 specification for [RFC5234]. 624 3.3.2. Augmented Packet Header Diagrams (APHD) 626 Augmented Packet Header Diagram 627 [I-D.mcquistin-augmented-ascii-diagrams] (APHD) is a formal language 628 used to describe a binary data layout in a machine-readable format. 630 The [I-D.mcquistin-augmented-ascii-diagrams] document will be 631 retrofitted as a computerate specification to provide an internal 632 Domain Specific Language (DSL) that permits specifying an APHD for a 633 specification. The partial encoding of an example from section 4.1 634 looks like this: 636 637 > ipv4 : Aphd 638 > ipv4 = MkAphd "IPv4 Header" [ 639 > MkField "Version" (Just "V") (Number 4) Bits "This is a" ++ 640 > " fixed-width field, whose full label is shown in the" ++ 641 > " diagram. The field's width -- 4 bits -- is given in" ++ 642 > " the label of the description list, separated from the" ++ 643 > " field's label by a colon.", 644 > ... 645 > MkField "Options" Nothing (Mult (Sub (Name "IHL") (Number 5)) 646 > (Number 32)) Bits "This is a variable-length field, whose" ++ 647 > " length is defined by the value of the field with short" ++ 648 > " label IHL (Internet Header Length). Constraint" ++ 649 > " expressions can be used in place of constant values: the" ++ 650 > " grammar for the expression language is defined in" ++ 651 > " Appendix A.1. Constraints can include a previously" ++ 652 > " defined field's short or full label, where one has been" ++ 653 > " defined. Short variable-length fields are indicated by" ++ 654 > " \"...\" instead of a pipe at the end of the row." 655 > ... 656 > ] 657 659 Figure 11 661 A serializer, also defined in the same specification, permits 662 converting that description into proper ABNF text that can be 663 inserted into the document such as in the following fragment: 665 666 .... 667 {`show ipv4`} 668 .... 669 671 Figure 12 673 is rendered as 675 An IPv4 Header is formatted as follows: 676 0 1 2 3 677 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 678 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 679 |Version| IHL | DSCP |ECN| Total Length | 680 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 681 | Identification |Flags| Fragment Offset | 682 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 683 | Time to Live | Protocol | Header Checksum | 684 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 685 | Source Address | 686 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 687 | Destination Address | 688 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 689 | Options ... 690 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 691 | : 692 : Payload : 693 : | 694 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 696 where: 698 Version (V): 4 bits. This is a fixed-width field, whose full label 699 is shown in the diagram. The field's width -- 4 bits -- is given 700 in the label of the description list, separated from the field's 701 label by a colon. 703 ... 705 Options: (IHL-5)*32 bits. This is a variable-length field, whose 706 length is defined by the value of the field with short label IHL 707 (Internet Header Length). Constraint expressions can be used in 708 place of constant values: the grammar for the expression language 709 is defined in Appendix A.1. Constraints can include a previously 710 defined field's short or full label, where one has been defined. 711 Short variable-length fields are indicated by "..." instead of a 712 pipe at the end of the row. 714 ... 716 Figure 13 718 Here the serializer generates an instance of the intermediary 719 AsciiDoc type that describes the header line (which can be 720 concatenated to previous lines), the block containing the diagram 721 itself, and then a list of all the field definitions. 723 3.3.3. Mathematical Formulas 725 AsciiDoc supports writing equations using either asciimath or 726 latexmath. The rendering for RFCs will generate an artwork element 727 that contains both the text version of the equation and a graphical 728 version in an SVG file. 730 | NOTE: Not sure what to do with inline formulas, as we cannot 731 | generate an artwork element in that case. 733 An Idris type will be used to described equations at the type level. 734 An interpreter will be used to calculate and insert examples in the 735 document. 737 A serializer will be used to generate the asciimath code that is 738 inserted inside a stem block. 740 3.3.4. RELAX NG Format 742 TBD 744 3.3.5. ASN.1 746 TBD 748 3.3.6. TLS Description Language 750 TBD 752 3.4. Proofs for Syntax 754 The kind of proofs that one would want in a specification are related 755 to isomorphism, i.e. a guarantee that two or more descriptions of a 756 data layout contain exactly the same information. 758 3.4.1. Isomorphism Between Type and Formal Language 760 We saw above that when a data layout is described with a formal 761 language, we end up with two descriptions of that data layout, one 762 using the Idris dependent type (and used to generate examples) and 763 one using the formal language. 765 Proving isomorphism requires generating an Idris type from the formal 766 language instance, which is done using an Idris elaborator script. 768 In Idris, Elaborator Reflection [Elab] is a metaprogramming facility 769 that permits writing code generating type declarations and code 770 (including proofs) automatically. 772 For instance the ABNF language is itself defined using ABNF, so after 773 converting that ABNF into an instance of the Syntax type (which is an 774 holder for a list of instances of the Rule type), it is possible to 775 generate a suite of types that represents the same language: 777 778 > abnf : Syntax 779 > abnf = MkSyntax [ 780 > "rulelist" `Eq` (Repeat (Just 1) Nothing (Group (Altern 781 > (TermName "rule") (Group (Concat (Repeat Nothing Nothing 782 > (TermName "c-wsp")) (TermName "c-nl") [])) []))), 783 > ... 784 > ] 785 > 786 > %runElab (generateType "Abnf" abnf) 787 789 Figure 14 791 The result of the elaboration can then be used to construct a value 792 of type Iso, which requires four total functions, two for the 793 conversion between types, and another two to prove that sequencing 794 the conversions results in the same original value. 796 The following example generates an Idris type "SessionDescription" 797 from the SDP ABNF. It then proves that this type and the Sdp type 798 contain exactly the same information (the proofs themselves have been 799 removed, leaving only the propositions): 801 802 > import Data.Control.Isomorphism 803 > 804 > sdp : Syntax 805 > sdp = MkSyntax [ 806 > "session-description" `Eq` (Concat (TermName "version-field") 807 > (TermName "origin-field") [ 808 > TermName "session-name-field", 809 > Optional (TermName "information-field"), 810 > Optional (TermName "uri-field"), 811 > Repeat Nothing Nothing (TermName "email-field"), 812 > Repeat Nothing Nothing (TermName "phone-field"), 813 > Optional (TermName "connection-field"), 814 > Repeat Nothing Nothing (TermName "bandwidth-field"), 815 > Repeat (Just 1) Nothing (TermName "time-description"), 816 > Optional (TermName "key-field"), 817 > Repeat Nothing Nothing (TermName "attribute-field"), 818 > Repeat Nothing Nothing (TermName "media-description") 819 > ]), 820 > ... 821 > ] 822 > 823 > %runElab (generateType "Sdp" sdp) 824 > 825 > same : Iso Sdp SessionDescription 826 > same = MkIso to from toFrom fromTo 827 > where 828 > to : Sdp -> SessionDescription 829 > 830 > from : SessionDescription -> Abnf 831 > 832 > toFrom : (x : SessionDescription ) -> to (from x) = x 833 > 834 > fromTo : (x : Sdp) -> from (to x) = x 835 > 836 838 Figure 15 840 As stated in Section 3.3, the Idris type and the type generated from 841 the formal language are not always isomorphic, because some 842 constraints cannot be expressed in that formal language. In that 843 case isomorphism can be used to precisely define what is missing 844 information in the formal language type. To do so, the generated 845 type is augmented with a delta type, like so: 847 848 > data DeltaSessionDescription : Type where 849 > ... 850 > 851 > same : Iso Sdp (SessionDescription, DeltaSessionDescription) 852 > ... 853 855 Figure 16 857 Then the DeltaSessionDescription type can be modified to include the 858 missing information until the same function type checks. After this 859 we have a guarantee that we know all about the constraints that 860 cannot be encoded in that formal language, and can check manually 861 that each of them is described as comment. 863 Idris elaborator scripts will be developped for each formal 864 languages. 866 3.4.2. Data Format Conversion 868 For specifications that describe a conversion between different data 869 layouts, having a proof that guarantees that no information is lost 870 in the process can be beneficial. For instance, we observe that 871 syntax encoding tends to be replaced each ten years or so by 872 something "better". Here again isomorphism can tell us exactly what 873 kind of information we lost and gained during that replacement. 875 Here, for example, the definition of a function that would verify an 876 isomorphism between an XML format and a JSON format: 878 879 > isXmlAndJsonSame: Iso (XML, DeltaXML) (JSON, DeltaJson) 880 > ... 881 883 Figure 17 885 Here DeltaXML expresses what is gained by switching from XML to JSON, 886 and DeltaJson expresses what is lost. 888 3.4.3. Interoperability with Previous Versions 890 The syntax of the data layout may be modified as part of the 891 evolution of a standard. In most case a version number prevents old 892 format to be used with the new format, but in cases where that it is 893 not possible, the new specification can ensure that both formats can 894 co-exist by using the same techniques as above. 896 Conversely these techniques can be used during the design phase of a 897 new version of a format, to check if a new version number is 898 warranted. 900 3.4.4. Postel's Law 902 | Be conservative in what you do, be liberal in what you accept from 903 | others. 904 | 905 | -- Jon Postel 907 One of the downsides of formal specifications is that there is no 908 wiggle room possible when implementing it. An implementation either 909 conforms to the specification or does not. 911 One analogy would be specifying a pair of gears. If one decides to 912 have both of them made with tolerances that are too small, then it is 913 very likely that they will not be able to move when put together. A 914 bit of slack is needed to get the gear smoothly working together but 915 more importantly the cost of making these gears is directly 916 proportional to their tolerance. There is an inflexion point where 917 the cost of an high precision gear outweighs its purpose. 919 We have a similar issue when implementing a formal specification, 920 where having an absolutely conformant implementation may cost more 921 money than it is worth spending. On the other hand a specification 922 exists for the purpose of interoperability, so we need some 923 guidelines on what to ignore in a formal specification to make it 924 cost effective. 926 Postel's law proposes an informal way of defining that wiggle room by 927 actually having two different specifications, one that defines a data 928 layout for the purpose of sending it, and another one that defines a 929 data layout for the purpose of receiving that data layout. 931 Existing specifications express that dichotomy in the form of the 932 usage of SHOULD/SHOULD NOT/RECOMMENDED/NOT RECOMMENDED [RFC2119] 933 keywords. For example the SDP spec says that "[t]he sequence CRLF 934 (0x0d0a) is used to end a line, although parsers SHOULD be tolerant 935 and also accept lines terminated with a single newline character." 936 This directly infers two specifications, one used to define an SDP 937 when sending it, that enforces using only CRLF, and a second 938 specification, used to define an SDP when receiving it (or parsing 939 it), that accepts both CRLF and LF. 941 Note that the converse is not necessarily true, i.e. not all usages 942 of these keywords are related to Postel's Law. 944 To ensure that the differences between the sending specification and 945 the receiving specification do not create interoperability problems, 946 we can use a variant of isomorphism, as shown in the following 947 example (data constructors and code elided): 949 950 > data Sending : Type where 951 > 952 > data Receiving : Type where 953 > 954 > to : Sending -> List Receiving 955 > 956 > from : Receiving -> Sending 957 > 958 > toFrom : (y : Receiving) -> Elem y (to (from y)) 959 > 960 > fromTo : (y : Sending) -> True = all (== y) [from x | x <- to y] 961 > 962 964 Figure 18 966 Here we define two data types, one that describes the data layout 967 that is permitted to be sent (Sending) and one that describes the 968 data layout that is permitted to be received (Receiving). For each 969 data layout that is possible to send, there is one or more matching 970 receiving data layouts. This is expressed by the function "to" that 971 takes as input one Sending value and returns a list of Receiving 972 values. 974 Conversely, the "from" function maps a Receiving data layout onto a 975 Sending data layout. Note the asymmetry there, which prevents to use 976 a standard proof of isomorphism. 978 Then the "toFrom" and "fromTo" proofs verify that there is no 979 interoperability issue by guaranteeing that each Receiving value maps 980 to one and only one Sending instance and that this mapping is 981 isomorphic. 983 All of this will provide a clear guidance of when and where to use a 984 SHOULD keyword or its variants, without loss of interoperability. 986 As an trivial example, the following proves that accepting LF 987 characters in addition to CRLF characters as end of line markers does 988 not break interoperability: 990 991 > data Sending : Type where 992 > S_CRLF : Sending 993 > 994 > Eq Sending where 995 > (==) S_CRLF S_CRLF = True 996 > 997 > data Receiving : Type where 998 > R_CRLF : Receiving 999 > R_LF : Receiving 1000 > 1001 > to : Sending -> List Receiving 1002 > to S_CRLF = [R_CRLF, R_LF] 1003 > 1004 > from : Receiving -> Sending 1005 > from R_CRLF = S_CRLF 1006 > from R_LF = S_CRLF 1007 > 1008 > toFrom : (y : Receiving) -> Elem y (to (from y)) 1009 > toFrom R_CRLF = Here 1010 > toFrom R_LF = There Here 1011 > 1012 > fromTo : (y : Sending) -> True = all (== y) [from x | x <- to y] 1013 > fromTo S_CRLF = Refl 1014 1016 Figure 19 1018 3.5. Extended Registries 1020 Often parts of a data layout are left unspecified, so they can be 1021 defined in separate specifications. This is mainly used for 1022 extensibility purpose. 1024 In most cases, an external registry is maintained with the list of 1025 specifications that can be used to make sense of these unspecified 1026 parts. 1028 To make sense of an unspecified part in a data layout, 3 pieces of 1029 information are needed: 1031 * The specification that defined the registry. 1033 * The content of the registry itself. 1035 * Each specification that defined one element of that registry. 1037 Defining a data layout for a protocol means also bringing together 1038 these pieces of information so a Sum type of all the specified parts 1039 can be put together. Unfortunately the information available in 1040 these registry is not sufficient to to bring the exact data layout 1041 from the specification of a single element in that Sum type. One 1042 reason if that multiple data layouts can be defined in the 1043 computerate specification of a standard, but there is no indication 1044 in the registry to pin point the exact data type. 1046 For instance IANA is the organization that is maintaining the 1047 registries for the IETF. The "Assigned Internet Protocol Number" 1048 (https://www.iana.org/assignments/protocol-numbers/protocol- 1049 numbers.xml) is an example of a registry that contains the list of 1050 all protocols that can be carried by the Internet Protocol, and was 1051 most recently defined by RFC 5237. The first entry in that registry 1052 was defined by RFC 8200, the next on by RFC 791, and so on for each 1053 entry of that registry. RFC 8200 contains the description for 1054 multiple entries in the registry, and so an additional mechanism is 1055 needed to differentiate them. 1057 That additional mechanism is abstracted as an extended registry that 1058 complements the existing registry, but for the sole purpose of 1059 generating that Sum type. This abstract registry is filled by 1060 information coming from the type-checking of the data layout types in 1061 the respective computerate specifications. It links a specific entry 1062 in the registry with the type pf the data layout in the 1063 specification. 1065 When the specification for a registry is type-checked, it 1066 automatically download the external registry, and access the abstract 1067 extended registry. For each extended entry, the type-checker can 1068 automatically generate a data constructor that reference the data 1069 layout defining in the entry specification. For entries that does 1070 not have an extension (i.e. no computerate specifications exist at 1071 this time), a placeholder data constructor is generated. 1073 Appendix C describes an implementation of the extended registry 1074 mechanism. 1076 4. Semantics 1078 The semantics of a communication protocol determine what messages are 1079 exchanged over a communication link and the relationship between 1080 them. The semantics are generally described only in the context of 1081 the layer that this particular protocol is operating at. 1083 4.1. Typed Petri Nets 1085 The semantics of a specification require to define an Idris type that 1086 strictly enforces these semantics. This can be done in an ad hoc way 1087 [Type-Driven], particularly by using linear types that express 1088 resources' consumption. 1090 But a better solution is to design these graphically, particularly by 1091 using Petri Nets. This specification defines a DSL that permits 1092 describing a Typed Petri Net (TPN) which is heavily influenced by 1093 Coloured Petri Nets [CPN] (CPN). A CPN adds some restriction on the 1094 types that can be used in a Petri Net because of limitations in the 1095 underlying programming language, SML. The underlying programming 1096 used in TPN, Idris, does not have these limitations, so any well- 1097 formed Idris type (including polymorphic, linear and dependent types) 1098 can be directly used in TPN. 1100 | NOTE 1: A graphical editor for TPN is planned as part of the 1101 | integration tooling. The graphical tool will use the document 1102 | directly as storage. 1104 Here's an example of TPN (from figure 2.10 in [CPN]): 1106 1107 > NO : Type 1108 > NO = Int 1109 > 1110 > DATA : Type 1111 > DATA = String 1112 > 1113 > NOxDATA : Type 1114 > NOxDATA = (NO, DATA) 1115 > 1116 > PTS : Place 1117 > PTS = MkPlace "Packets To Send" NOxDATA (\() => [(1, "COL"), 1118 > (2, "OUR"), (3, "ED "), (4, "PET"), (5, "RI "), (6, "NET")]) 1119 > 1120 > NS : Place 1121 > NS = MkPlace "NextSend" NO (\() => [1]) 1122 > 1123 > A : Place 1124 > A = MkPlace "A" NOxDATA (\() => []) 1125 > 1126 > input1 : Input 1127 > input1 = MkInput PTS (NO, DATA) pure 1128 > 1129 > input2 : Input 1130 > input2 = MkInput NS NO pure 1131 > 1132 > output1 : Output 1133 > output1 = MkOutput PTS (NO, DATA) pure 1134 > 1135 > output2 : Output 1136 > output2 = MkOutput NS NO pure 1137 > 1138 > output3 : Output 1139 > output3 = MkOutput A (NO, DATA) pure 1140 > 1141 > sendPacket : Transition 1142 > sendPacket = MkTransition [input1, input2] [output1, output2, 1143 > output3] (\((n, d), n') => if n == n' 1144 > then pure ((n, d), n, (n, d)) 1145 > else empty) 1146 1148 Figure 20 1150 | NOTE: The DSL is being currently designed, so the example shows 1151 | the generated value. 1153 From there it is easy to generate (using the non-deterministic monad 1154 in Idris) an interpreter for debugging and simulation purposes: 1156 1157 > interpret : MS NOxDATA -> MS NO -> MS NOxDATA -> 1158 > ND (MS NOxDATA, MS NO, MS NOxDATA) 1159 > interpret pts ns a = do 1160 > (pts1, pts2) <- sel pts 1161 > (ns1, ns2) <- sel ns 1162 > i1 <- input' input1 pts1 1163 > i2 <- input' input2 ns1 1164 > (pts3, ns3, a3) <- transition' sendPacket (i1, i2) 1165 > let o1 = output' output1 pts3 1166 > let o2 = output' output2 ns3 1167 > let o3 = output' output3 a3 1168 > pure (o1 ++ pts2, o2 ++ ns2, o3 ++ a) 1169 1171 Figure 21 1173 | NOTE: Replace by the generic variant of the interpreter. 1175 A Petri Net has the advantage that the same graph can be reused to 1176 derive other Petri Nets, e.g., Timed Petri Nets (that can be used to 1177 collect performance metrics) or Stochastic Petri Nets. 1179 | NOTE 2: The traditional way of verifying a Petri Net is by 1180 | using model checking. There is nothing in the design that 1181 | prevents doing that, but because that takes quite some time to 1182 | run and so cannot be part of the document processing, how do we 1183 | store in the document a proof that the model checking was 1184 | successful? 1186 A TPN that covers a whole protocol (i.e. client, network, and server) 1187 is useful to prove the properties listed in the previous sections. 1188 But the TPN is also designed in a way that each of these parts can be 1189 defined separately from the others (making it a Hierarchical TPN). 1191 4.2. Semantics Examples 1193 Semantics examples can be wrong, so it is useful to be sure that they 1194 match the specification. 1196 4.2.1. Data Type 1198 As explained above, semantics can be described in an ad hoc manner, 1199 or using the TPN DSL. 1201 4.2.2. Serializer 1203 At the difference of syntax, where there are more or less as many 1204 ways to display them than there are syntaxes, semantics examples 1205 generally use sequence diagrams, eventually augmented with the 1206 content of the packets exchanged (and so using the techniques 1207 described in Section 3.1). 1209 Similarly to what is done in Section 3.3.2, an Asciidoctor block 1210 processor similar to the "msc" type of diagram used by the 1211 asciidoctor-diagram extension will be designed. 1213 | NOTE 1: We unfortunately cannot reuse the asciidoctor-diagram 1214 | extension because it cannot generate both text and SVG versions 1215 | of a sequence diagram. 1217 The serializer for an example derived from a TPN generates the 1218 content of the msc AsciiDoc block, by selecting one particular path 1219 and its associated bindings through the Petri Net. 1221 | NOTE 2: We probably want to use AsciiDoc callouts for these, 1222 | although that would require a modification in AsciiRfc. In 1223 | fact callout would be a far better technique for other 1224 | diagrams, like AAD, as it will let the renderer take care of 1225 | the best way to place elements depending on the output format. 1227 4.2.3. Presentation Format 1229 TBD. 1231 4.3. Formal Semantics Language 1233 Some specifications use a formal language to describe the state 1234 machines. One shared property of these languages is that they cannot 1235 always formalize all the constraints of specific semantics, so they 1236 have to be enriched with comments. One consequence of this is that 1237 they cannot be used as a replacement for the Idris data type 1238 described in Section 4.1, a data type that is purposely complete. 1240 4.3.1. Cosmogol 1242 Cosmogol [I-D.bortzmeyer-language-state-machines] is a formal 1243 language designed to define states machines. The Internet-Draft will 1244 be retrofitted as a computerate specification to provide an internal 1245 Domain Specific Language (DSL) that permits specifying an instance of 1246 that language. A serializer and elaborator script will also be 1247 defined. 1249 Finally, an Asciidoctor block processor would be used to convert the 1250 language into both a text and a graphical view of the state machine. 1252 | NOTE: Add examples there. 1254 4.4. Proofs for Semantics 1256 Like for syntax formal languages, an elaborator script permits 1257 generating a type from a TPN instance. That type can then be used to 1258 write proofs of the properties that we expect from the semantics. 1260 4.4.1. Isomorphism 1262 An isomorphism proof can be used between two types derived from the 1263 semantics of a specification, for example to prove that no 1264 information is lost in the converting between the underlying 1265 processes, or when upgrading a process. 1267 An example of that would be to prove (or more likely disprove) that 1268 the SIP state machines are isomorphic to the WebRTC state machines. 1270 4.4.2. Postel's Law 1272 Like for the syntax, semantics can introduce wiggle room between the 1273 state machines on the sending side and the state machines on the 1274 receiving side. A similar isomorphism proof can be used to ensure 1275 that this is done without loss of interoperability. 1277 4.4.3. Termination 1279 The TPN type can be used to verify that the protocol actually 1280 terminates, or that it always returns to its initial state. This is 1281 equivalent to proving that a program terminates. 1283 4.4.4. Liveness 1285 The TPN type can be used to verify that the protocol is productive, 1286 i.e. that it does not loop without making progress. 1288 5. Verified Code 1290 When applied, the techniques described in Section 3 and Section 4 1291 result in a formal specification, in the form of a set of types. 1292 Types are logical propositions so proofs and disproofs can be written 1293 about them. Interpreted as code, these eventual proofs happen to be 1294 proofs that a specification is implementable. 1296 To make these pieces of code composable, a specification is split in 1297 multiple modules, each one represented as a unique function. The 1298 type of each of these functions is derived from the hierarchical TPNs 1299 described in Section 4, by bundling together all the inputs of the 1300 TPN module as the input for that function, and bundling all the 1301 outputs of the TPN module as the output of this function. 1303 For instance the IPv4 layer is really 4 different functions: 1305 * A function that converts between a byte array and a tree 1306 representation (parsing). 1308 * A function that takes a tree representation and a maximum MTU and 1309 returns a list of tree representations, each one fitting inside 1310 the MTU. 1312 * A function that accumulates tree representations of an IP fragment 1313 until a tree representation of a full IP packet can be returned. 1315 * A function that convert a tree representation into a byte array. 1317 The description of each function is incomplete, as in addition to the 1318 input and the output listed, these functions needs some ancillary 1319 data, in the form of: 1321 * state, which is basically values stored between evaluations of a 1322 function, 1324 * an optional signal, that can be used as an API request or 1325 response. As timers are a fundamental building block for 1326 communication protocols, one common uses for that signal are to 1327 request the arming of a timer, and to receive the indication of 1328 the expiration of that timer. 1330 To unclutter the signature of these function, these ancillary data 1331 are passed to the function using an ad-hoc monad named the Impl Monad 1332 (Impl is short for Implementation or Implementable). To unclutter 1333 even further the signature of these functions, that same Impl monad 1334 can be used to store the various proofs consumed and produced by 1335 these functions, making these function more easily composable. 1337 6. Bibliography 1339 [RFC0761] Postel, J., "DoD standard Transmission Control Protocol", 1340 IETF RFC 0761, DOI 10.17487/RFC0761, January 1980, 1341 . 1343 [RFC7991] Hoffman, P., "The "xml2rfc" Version 3 Vocabulary", 1344 IETF RFC 7991, DOI 10.17487/RFC7991, December 2016, 1345 . 1347 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 1348 Specifications: ABNF", IETF RFC 5234, 1349 DOI 10.17487/RFC5234, January 2008, 1350 . 1352 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 1353 IETF RFC 1034, DOI 10.17487/RFC1034, November 1987, 1354 . 1356 [RFC8489] Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing, 1357 D., Mahy, R., and P. Matthews, "Session Traversal 1358 Utilities for NAT (STUN)", IETF RFC 8489, 1359 DOI 10.17487/RFC8489, February 2020, 1360 . 1362 [RFC8656] Reddy, T., Ed., Johnston, A., Ed., Matthews, P., and J. 1363 Rosenberg, "Traversal Using Relays around NAT (TURN): 1364 Relay Extensions to Session Traversal Utilities for NAT 1365 (STUN)", IETF RFC 8656, DOI 10.17487/RFC8656, February 1366 2020, . 1368 [I-D.bortzmeyer-language-state-machines] 1369 Bortzmeyer, S., "Cosmogol: a language to describe finite 1370 state machines", IETF I-D.bortzmeyer-language-state- 1371 machines, November 2006. 1373 [I-D.mcquistin-augmented-ascii-diagrams] 1374 McQuistin, S., Band, V., and C. Perkins, "Describing 1375 Protocol Data Units with Augmented Packet Header 1376 Diagrams", IETF I-D.mcquistin-augmented-ascii-diagrams, 1377 February 2020. 1379 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1380 Requirement Levels", IETF RFC 2119, DOI 10.17487/RFC2119, 1381 March 1997, . 1383 [I-D.ribose-asciirfc] 1384 Tse, R., Nicholas, N., and P. Brasolin, "AsciiRFC: 1385 Authoring Internet-Drafts And RFCs Using AsciiDoc", 1386 IETF I-D.ribose-asciirfc, April 2018. 1388 [Curry-Howard] 1389 "Curry-Howard correspondence", 1390 . 1393 [Zave] "Experiences with Protocol Description", October 2011. 1395 [Knuth92] "Literate Programming", 1992. 1397 [Elab] "Elaborator reflection: extending Idris in Idris", 2016. 1399 [AsciiDoc] "AsciiDoc", . 1401 [Asciidoctor] 1402 "Asciidoctor", 1403 . 1405 [Metanorma] 1406 "Metanorma", . 1408 [Idris] "Idris: A Language with Dependent Types", 1409 . 1411 [Literate] "Literate programming", . 1415 [Blockquotes] 1416 "Markdown-style blockquotes", 1417 . 1420 [CPN] "Coloured Petri Nets: Modelling and Validation of 1421 Concurrent Systems", 2009. 1423 [Type-Driven] 1424 "Type-Driven Development with Idris", 2017. 1426 [Brinkmann02] 1427 "RTL-datapath verification using integer linear 1428 programming", 2009. 1430 Appendix A. Command Line Tools 1432 A.1. Installation 1434 The computerate command line tools are run inside a Docker image, so 1435 the first step is to install the Docker software or verify that it is 1436 up to date (https://docs.docker.com/install/). 1438 Note that for the usage described in this document there is no need 1439 for Docker EE or for having a Docker account. 1441 The following instructions assume a Unix based OS, i.e. Linux or 1442 MacOS. Lines separated by a "\" character are meant to be executed 1443 as one single line, with the "\" character removed. 1445 A.1.1. Download the Docker Image 1447 To install the computerate tools, the fastest way is to download and 1448 install the Docker image using a temporary image containing the dat 1449 tool: 1451 docker pull veggiemonk/dat-docker 1452 mkdir computerate 1453 cd computerate 1454 docker run --rm -u $(id -u):$(id -g) -v \ 1455 $(pwd):/tools veggiemonk/dat-docker dat clone \ 1456 dat://6a33cb289d5818e3709f62e95df41be537edba5f4dc26593e2cb870c7982345b \ 1457 tools 1459 Figure 22 1461 After this, the image can be loaded in Docker. The newly installed 1462 Docker image also contains the dat command, so there is no need to 1463 keep the veggiemonk/dat-docker image after this: 1465 docker load -i tools.tar.xz 1466 docker image rm --force veggiemonk/dat-docker 1468 Figure 23 1470 A new version of the tools is released at the same time a new version 1471 of this document is released. After this, running the following 1472 command in the computerate directory will pull any new version of the 1473 tool tar file: 1475 docker run --rm -u $(id -u):$(id -g) \ 1476 -v $(pwd):/computerate computerate/tools dat pull --exit 1478 Figure 24 1480 The docker image can then be loaded as above. 1482 A.2. Using the "computerate" Command 1484 The Docker image main command is "computerate", which takes the same 1485 parameters as the "metanorma" command from the Metanorma tooling: 1487 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1488 computerate/tools computerate -t ietf -x txt 1490 Figure 25 1492 The differences with the "metanorma" command are: 1494 * The "computerate" command can process Literate Idris files (files 1495 with a "lidr" extension, aka lidr files), in addition to AsciiDoc 1496 files (files with an "adoc" extension, aka adoc files). When a 1497 lidr file is processed, all embedded code fragments (text between 1498 prefix "{`" and suffix "`}") are evaluated in the context of the 1499 Idris code contained in this file. Each code fragment (including 1500 the prefix and suffix) are then substituted by the result of that 1501 evaluation. 1503 * The "computerate" command can process included lidr files in the 1504 same way. The embedded code fragments in the imported file are 1505 processed in the context of the included lidr file, not in the 1506 context of the including file. Idris modules (either from an idr 1507 or lidr file) can be imported the usual way. 1509 * The literate code (which is all the text that is starting by a ">" 1510 symbol in column 1) in a lidr file will not be part of the 1511 rendered document. 1513 * The computerate command can process transclusions, as explained in 1514 Section 2.2. 1516 * Lookup of external references is disabled. Use either raw XML 1517 references or an external directory. 1519 * Instead of generating a file based on the name of the input file, 1520 the "computerate" command generates a file based on the ":name:" 1521 attribute in the header of the document. 1523 The "computerate" command can also be used to generate an xmlrfc v3 1524 file, ready for submission to the IETF: 1526 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1527 computerate/tools computerate -t ietf -x rfc 1529 Figure 26 1531 A.3. Using the Idris REPL 1533 idr and lidr files can be loaded directly in the Idris REPL for 1534 debugging: 1536 docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1537 computerate/tools idris 1539 Figure 27 1541 It is possible to directly modify the source code in the REPL by 1542 entering the ":e" command, which will load the file in an instance of 1543 VIM preconfigured to interact with the REPL. 1545 Alternatively the Idris REPL can be started as a server: 1547 docker run --rm -it -u $(id -u):$(id -g) -p 127.0.0.1:4294:4294 \ 1548 -v $(pwd):/computerate computerate/tools idris 1550 Figure 28 1552 Then if a source file is loaded in a separate console with the VIM 1553 instance inside the Docker image, it can interact with the REPL 1554 server: 1556 docker run --rm -u $(id -u):$(id -g) --net=host \ 1557 -v $(pwd):/computerate computerate/tools vim 1559 Figure 29 1561 Note that both commands must be run from the same directory. 1563 A.4. Using Other Commands 1565 For convenience, the docker image provides the latest version of the 1566 xml2rfc, aspell, and idnits tools. 1568 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1569 computerate/tools xml2rfc 1570 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1571 computerate/tools idnits --help 1572 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1573 computerate/tools aspell 1575 Figure 30 1577 The Docker image also contains a extended version of git that will be 1578 used to retrieve the computerate specifications in Appendix B. 1580 A.5. Bugs and Workarounds 1582 * Errors in embedded code do not stop the process but replace the 1583 text by the error message, which can be easily overlooked. 1585 * backticks are not escaped in code fragments. 1587 * The current version of Docker in Ubuntu fails, but this can be 1588 fixed with the following commands: 1590 sudo apt-get install containerd.io=1.2.6-3 1591 sudo systemctl restart docker.service 1593 Figure 31 1595 * The Asciidoctor processor does not correctly format the output in 1596 all cases (e.g. "++"). The escaping can be done in Idris until 1597 this is fixed. 1599 * Sometimes the Idris processing fails with an error "Module needs 1600 reloading". Deleting all the files with the ibc extension will 1601 solve that problem. 1603 * Trying to fetch nonexistent new commits on a git repository will 1604 block for 12 seconds. 1606 A.6. TODO List 1608 * Embedded blocks. 1610 * Test on Windows. 1612 * Using recursive modules with Idris. 1614 Appendix B. Computerate Specifications Library 1616 B.1. Installation 1618 The git repositories that compose the Computerate Specification 1619 Library are distributed over a peer-to-peer protocol based on dat. 1621 This requires an extension to git, extension that is already 1622 installed in the Docker image described in Appendix A. The following 1623 command can be used to retrieve a computerate specification: 1625 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1626 computerate/tools git clone --recursive dat:// 1628 Figure 32 1630 Here is the dat public key for a specific computerate 1631 specification and is the recommended name. Do not use the dat 1632 URIs given in Appendix A, as only the dat public keys listed in 1633 Appendix B.2 can be used with a git clone. 1635 Updating the repository also requires using the Docker image: 1637 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1638 computerate/tools git pull 1639 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1640 computerate/tools git submodule update 1642 Figure 33 1644 All the git commands that do not require access to the remote can be 1645 run natively or from the Docker image. 1647 Note that for the computerate specification library the "computerate" 1648 command must be run from the directory that is one level above the 1649 git repository. The name of the root document is always "Main.adoc", 1650 or rarely "Main.lidr": 1652 docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \ 1653 computerate/tools computerate -t ietf -x txt \ 1654 /Main.lidr 1656 Figure 34 1658 B.2. Catalog 1660 For the time being this document will serve as a catalog of available 1661 computerate specifications. 1663 B.2.1. RFC 5234 1665 Name: RFC5234 1666 Public key: 1667 994e52b29a7bf4f7590b0f0369a7d55d29fb22befd065e462b2185a8207e21f1 1669 Figure 35 1671 Appendix C. Extended Registry 1673 When a data layout is typechecked, a Type Provider is used to write 1674 the data layout in the extended registry. 1676 %provide (T : Type) with extendRegistry 1677 "https://www.iana.org/assignments/cose/cose.xhtml#algorithms" 1678 "1" Icmp 1679 Figure 36 1681 The "extendRegistry" function takes 3 parameters: 1683 * The URL or identifier for the registry to extend. 1685 * A unique identifier for the entry to extend. The entry must 1686 exist. 1688 * The data layout Type to bind with that entry. 1690 In the computerate specification that defines a registry, the 1691 following code will automatically generates the corresponding Sum 1692 type: 1694 %provide (MyType : Type) with registry 1695 "https://www.iana.org/assignments/cose/cose.xhtml#algorithms" 1697 Figure 37 1699 That code will download the current content of the registry 1700 identified by the URL passed as parameter, will locate the extended 1701 registry, if any, and will generate the Sum type of type MyType. 1703 The implementations of the "extendRegistry" and "registry" functions 1704 currently only recognize IANA registries. 1706 Appendix D. Errata Statistics 1708 In an effort to quantify the potential benefits of using formal 1709 methods at the IETF, an effort to relabel the Errata database is 1710 under way. 1712 The relabeling uses the following labels: 1714 +----------+----------------------------------------------+ 1715 | Label | Description | 1716 +==========+==============================================+ 1717 | AAD | Error in an ASCII bit diagram | 1718 +----------+----------------------------------------------+ 1719 | ABNF | Error in an ABNF | 1720 +----------+----------------------------------------------+ 1721 | Absent | The errata was probably removed | 1722 +----------+----------------------------------------------+ 1723 | ASN.1 | Error in ASN.1 | 1724 +----------+----------------------------------------------+ 1725 | C | Error in C code | 1726 +----------+----------------------------------------------+ 1727 | Diagram | Error in a generic diagram | 1728 +----------+----------------------------------------------+ 1729 | Example | An example does not match the normative text | 1730 +----------+----------------------------------------------+ 1731 | Formula | Error preventable by using Idris code | 1732 +----------+----------------------------------------------+ 1733 | Ladder | Error in a ladder diagram | 1734 +----------+----------------------------------------------+ 1735 | Rejected | The erratum was rejected | 1736 +----------+----------------------------------------------+ 1737 | Text | Error in the text itself, no remedy | 1738 +----------+----------------------------------------------+ 1739 | TLS | Error in the TLS language | 1740 +----------+----------------------------------------------+ 1742 Table 1 1744 At the time of publication the first 700 errata, which represents 1745 11.88% of the total, have been relabeled. On these, 34 were rejected 1746 and 27 were deleted, leaving 639 valid errata. 1748 +---------+-------+------------+ 1749 | Label | Count | Percentage | 1750 +=========+=======+============+ 1751 | Text | 396 | 61.97% | 1752 +---------+-------+------------+ 1753 | Formula | 66 | 10.32% | 1754 +---------+-------+------------+ 1755 | Example | 64 | 10.0% | 1756 +---------+-------+------------+ 1757 | ABNF | 38 | 5.94% | 1758 +---------+-------+------------+ 1759 | AAD | 32 | 5.00% | 1760 +---------+-------+------------+ 1761 | ASN.1 | 27 | 4.22% | 1762 +---------+-------+------------+ 1763 | C | 9 | 1.40% | 1764 +---------+-------+------------+ 1765 | Diagram | 4 | 0.62% | 1766 +---------+-------+------------+ 1767 | TLS | 2 | 0.31% | 1768 +---------+-------+------------+ 1769 | Ladder | 1 | 0.15% | 1770 +---------+-------+------------+ 1772 Table 2 1774 Note that as the relabeling is done in in order of erratum number, at 1775 this point it covers mostly older RFCs. A change in tooling (e.g. 1776 ABNF verifiers) means that these numbers may drastically change as 1777 more errata are relabeled. But at this point it seems that 38.02% of 1778 errata could have been prevented with a more pervasive use of formal 1779 methods. 1781 Acknowledgements 1783 Thanks to Jim Kleck, Stephane Bryant, Eric Petit-Huguenin, Nicolas 1784 Gironi, Stephen McQuistin, and Greg Skinner for the comments, 1785 suggestions, questions, and testing that helped improve this 1786 document. 1788 Changelog 1790 * draft-petithuguenin-computerate-specifying-03 1792 - Document 1794 o Notes are now correctly displayed. 1796 o Add "Implementations Oriented Standards" section. 1798 o Add "Extended Registries" section and appendix. 1800 o Add paragraph about hierarchical petri nets. 1802 o Convert "Verified Code" section into a top level section, 1803 and expand it. 1805 o Add "Implementation-Oriented Standards" section. 1807 - Tooling 1809 o Many bug fixes in metanorma-ietf. 1811 o Update xml2rfc to 2.40.1. 1813 o Rebuilding text for an RFC with xml2rfc now uses pagination. 1815 o Update metanorma-ietf to version 2.0.5. 1817 o The "computerate" command can directly generate PDF files. 1819 o Add support in xml2rfc for generating PDF files. 1821 o Add asciidoctor-revealjs. 1823 o Update metanorma to version 1.0.0. 1825 o Update metanorma-cli to version 1.2.10.1. 1827 - Library 1829 o No update 1831 * draft-petithuguenin-computerate-specifying-02 1833 - Document 1835 o Switch to rfcxml3. 1837 o Status is now experimental. 1839 o Many nits. 1841 o Fix incorrect errata stats. 1843 o Move acknowledgment section at the end. 1845 o Rewrite the APHD section (formerly known as AAD) to match 1846 draft-mcquistin-augmented-diagrams-01. 1848 o Fix non-ascii characters in the references. 1850 o Intermediate AsciiDoc representation for serializers. 1852 - Tooling 1854 o xmlrfc3 is now the default extension. 1856 o "docName" and "category" attributes are now generated, and 1857 the "prepTime" is removed. 1859 o Update xml2rfc to 2.35.0. 1861 o Remove LanguageTool. 1863 o Update Metanorma to version 0.3.17. 1865 o Update Asciidoctor to 2.0.10. 1867 o Update list of Working Groups. 1869 - Library 1871 o No update. 1873 * draft-petithuguenin-computerate-specifying-01 1875 - Document 1877 o New changelog appendix. 1879 o Fix incorrect reference, formatting in Idris code. 1881 o Add option to remove container in all "docker run" command. 1883 o Add explanations to use the Idris REPL and VIM inside the 1884 Docker image. 1886 o Add placeholders for ASN.1 and RELAX NG languages. 1888 o New Errata appendix. 1890 o Nits. 1892 o Improve Syntax Examples section. 1894 - Tooling 1896 o Update Metanorma to version 0.3.16 1898 o Update MetaNorma-cli to version 1.2.7.1 1900 o Switch to patched version of Idris 1.3.2 that supports 1901 remote REPL in Docker. 1903 o Add VIM and idris-vim extension. 1905 o Remove some debug statements. 1907 - Library 1909 o No update 1911 Author's Address 1913 Marc Petit-Huguenin 1914 Impedance Mismatch LLC 1916 Email: marc@petit-huguenin.org