idnits 2.17.1 

draft-petithuguenin-computerate-specifying-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 4 characters in excess of 72.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document date (17 November 2019) is 1622 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-13) exists of
     draft-mcquistin-augmented-ascii-diagrams-01


     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                  M. Petit-Huguenin
3	Internet-Draft                                    Impedance Mismatch LLC
4	Intended status: Experimental                           17 November 2019
5	Expires: 20 May 2020

7	                  The Computerate Specifying Paradigm
8	             draft-petithuguenin-computerate-specifying-02

10	Abstract

12	   This document specifies a paradigm named Computerate Specifying,
13	   designed to simultaneously document and formally specify
14	   communication protocols.  This paradigm can be applied to any
15	   document produced by any Standard Developing Organization (SDO), but
16	   this document targets specifically documents produced by the IETF.

18	Status of This Memo

20	   This Internet-Draft is submitted in full conformance with the
21	   provisions of BCP 78 and BCP 79.

23	   Internet-Drafts are working documents of the Internet Engineering
24	   Task Force (IETF).  Note that other groups may also distribute
25	   working documents as Internet-Drafts.  The list of current Internet-
26	   Drafts is at https://datatracker.ietf.org/drafts/current/.

28	   Internet-Drafts are draft documents valid for a maximum of six months
29	   and may be updated, replaced, or obsoleted by other documents at any
30	   time.  It is inappropriate to use Internet-Drafts as reference
31	   material or to cite them other than as "work in progress."

33	   This Internet-Draft will expire on 20 May 2020.

35	Copyright Notice

37	   Copyright (c) 2019 IETF Trust and the persons identified as the
38	   document authors.  All rights reserved.

40	   This document is subject to BCP 78 and the IETF Trust's Legal
41	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
42	   license-info) in effect on the date of publication of this document.
43	   Please review these documents carefully, as they describe your rights
44	   and restrictions with respect to this document.  Code Components
45	   extracted from this document must include Simplified BSD License text
46	   as described in Section 4.e of the Trust Legal Provisions and are
47	   provided without warranty as described in the Simplified BSD License.

49	Table of Contents

51	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
52	   2.  Overview of Operations  . . . . . . . . . . . . . . . . . . .   4
53	     2.1.  Libraries . . . . . . . . . . . . . . . . . . . . . . . .   6
54	     2.2.  Retrofitting Specifications . . . . . . . . . . . . . . .   7
55	     2.3.  Revision of Standards . . . . . . . . . . . . . . . . . .   8
56	     2.4.  Content of a Computerate Specification  . . . . . . . . .   8
57	   3.  Syntax  . . . . . . . . . . . . . . . . . . . . . . . . . . .   8
58	     3.1.  Syntax Examples . . . . . . . . . . . . . . . . . . . . .   9
59	       3.1.1.  Data Type . . . . . . . . . . . . . . . . . . . . . .   9
60	       3.1.2.  Serializer  . . . . . . . . . . . . . . . . . . . . .  10
61	       3.1.3.  Presentation Format . . . . . . . . . . . . . . . . .  10
62	     3.2.  Formal Syntax Language  . . . . . . . . . . . . . . . . .  11
63	       3.2.1.  Augmented BNF (ABNF)  . . . . . . . . . . . . . . . .  11
64	       3.2.2.  Augmented Packet Header Diagrams (APHD) . . . . . . .  12
65	       3.2.3.  Mathematical Formulas . . . . . . . . . . . . . . . .  15
66	       3.2.4.  RELAX NG Format . . . . . . . . . . . . . . . . . . .  15
67	       3.2.5.  ASN.1 . . . . . . . . . . . . . . . . . . . . . . . .  15
68	       3.2.6.  TLS Description Language  . . . . . . . . . . . . . .  15
69	     3.3.  Proofs for Syntax . . . . . . . . . . . . . . . . . . . .  15
70	       3.3.1.  Isomorphism Between Type and Formal Language  . . . .  15
71	       3.3.2.  Data Format Conversion  . . . . . . . . . . . . . . .  18
72	       3.3.3.  Interoperability with Previous Versions . . . . . . .  18
73	       3.3.4.  Postel's Law  . . . . . . . . . . . . . . . . . . . .  19
74	   4.  Semantics . . . . . . . . . . . . . . . . . . . . . . . . . .  21
75	     4.1.  Typed Petri Nets  . . . . . . . . . . . . . . . . . . . .  21
76	     4.2.  Semantics Examples  . . . . . . . . . . . . . . . . . . .  24
77	       4.2.1.  Data Type . . . . . . . . . . . . . . . . . . . . . .  24
78	       4.2.2.  Serializer  . . . . . . . . . . . . . . . . . . . . .  24
79	       4.2.3.  Presentation Format . . . . . . . . . . . . . . . . .  25
80	     4.3.  Formal Semantics Language . . . . . . . . . . . . . . . .  25
81	       4.3.1.  Cosmogol  . . . . . . . . . . . . . . . . . . . . . .  25
82	     4.4.  Proofs for Semantics  . . . . . . . . . . . . . . . . . .  25
83	       4.4.1.  Isomorphism . . . . . . . . . . . . . . . . . . . . .  26
84	       4.4.2.  Postel's Law  . . . . . . . . . . . . . . . . . . . .  26
85	       4.4.3.  Termination . . . . . . . . . . . . . . . . . . . . .  26
86	       4.4.4.  Liveness  . . . . . . . . . . . . . . . . . . . . . .  26
87	       4.4.5.  Verified Code . . . . . . . . . . . . . . . . . . . .  26
88	   5.  Informative References  . . . . . . . . . . . . . . . . . . .  26
89	   Appendix A.  Command Line Tools . . . . . . . . . . . . . . . . .  28
90	     A.1.  Installation  . . . . . . . . . . . . . . . . . . . . . .  28
91	       A.1.1.  Download the Docker Image . . . . . . . . . . . . . .  29
92	     A.2.  Using the "computerate" Command . . . . . . . . . . . . .  29
93	     A.3.  Using the Idris REPL  . . . . . . . . . . . . . . . . . .  30
94	     A.4.  Using Other Commands  . . . . . . . . . . . . . . . . . .  31
95	     A.5.  Bugs and Workarounds  . . . . . . . . . . . . . . . . . .  31
96	     A.6.  TODO List . . . . . . . . . . . . . . . . . . . . . . . .  32

98	   Appendix B.  Computerate Specifications Library . . . . . . . . .  32
99	     B.1.  Installation  . . . . . . . . . . . . . . . . . . . . . .  32
100	     B.2.  Catalog . . . . . . . . . . . . . . . . . . . . . . . . .  33
101	       B.2.1.  RFC5234 . . . . . . . . . . . . . . . . . . . . . . .  33
102	   Appendix C.  Errata Statistics  . . . . . . . . . . . . . . . . .  33
103	   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  35
104	   Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . . .  35
105	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  37

107	1.  Introduction

109	   If, as the unofficial IETF motto states, we believe that "running
110	   code" is an important part of the feedback provided to the
111	   standardization process, then as per the Curry-Howard equivalence
112	   [Curry-Howard] (that states that code and mathematical proofs are the
113	   same), we ought to also believe that "verified proof" is an equally
114	   important part of that feedback.  A verified proof is a mathematical
115	   proof of a logical proposition that was mechanically verified by a
116	   computer, as opposed to just peer-reviewed.

118	   The "Experiences with Protocol Description" paper from Pamela Zave
119	   [Zave] gives three conclusions about the usage of formal
120	   specifications for a protocol standard.  The first conclusion states
121	   that informal methods (i.e. the absence of verified proofs) are
122	   inadequate for widely used protocols.  This document is based on the
123	   assumption that this conclusion is correct, so its validity will not
124	   be discussed further.

126	   The second conclusion states that formal specifications are useful
127	   even if they fall short of the "gold standard" of a complete formal
128	   specification.  We will show that a formal specification can be
129	   incrementally added to a standard.

131	   The third conclusion from Zave's paper states that the normative
132	   English language should be paraphrasing the formal specification.
133	   The difficulty here is that to be able to keep the formal
134	   specification and the normative language synchronized at all times,
135	   these two should be kept as physically close as possible to each
136	   other.

138	   To do that we introduce the concept of "Computerate Specifying" (note
139	   that Computerate is a British English word).  "Computerate
140	   Specifying" is a play on "Literate Computing", itself a play on
141	   "Structured Computing" (see [Knuth92] page 99).  In the same way that
142	   Literate Programming enriches code by interspersing it with its own
143	   documentation, Computerate Specifying enriches a standard
144	   specification by interspersing it with code (or with proofs, as they
145	   are the same thing), making it a computerate specification.

147	   Note that computerate specifying is not specific to the IETF, just
148	   like literate computing is not restricted to the combination of Tex
149	   and Pascal described in Knuth's paper.  What this document describes
150	   is a specific instance of computerate specifying that combines
151	   [AsciiDoc] as formatting language and [Idris] as programming language
152	   with the goal of formally specifying IETF protocols.

154	2.  Overview of Operations

156	   Nowadays specifications at the IETF are written in a format named
157	   xml2rfc v3 [RFC7991] but unfortunately making that format
158	   "Computerable" is not trivial, mostly because there is no simple
159	   solution to mix code and XML together in the same file.  Instead, we
160	   chose the AsciiDoc format as the basis for computerate specifications
161	   as it permits the generation of specifications in the xmlrfc v3
162	   format (among other formats) and also because it can be enriched with
163	   code in the same file.

165	   [I-D.ribose-asciirfc] describes a backend for the [Asciidoctor] tool
166	   that converts an AsciiDoc document into an xmlrfc3 document.  The
167	   AsciiRFC document states various reasons why AsciiDoc is a superior
168	   format for the purpose of writing standards, so we will not discuss
169	   these further.  Note that the same team developed Asciidoctor
170	   backends for other Standards Developing Organizations (SDO)
171	   [Metanorma], making it easy to develop computerate specifications
172	   targeting the standards developed by these SDOs.

174	   The code in a computerate specification uses the programming language
175	   Idris in literate programming [Literate] mode using the Bird-style,
176	   by having each line of code starting with a ">" mark in the first
177	   column.

179	   That same symbol was also used by AsciiDoc as an alternate way of
180	   defining a blockquote [Blockquotes] way which is no longer available
181	   in a computerate specification.  Bird-style code will simply not
182	   appear in the rendered document.

184	   The result of Idris code execution can be inserted inside the
185	   document part by putting that code fragment in the document between
186	   the "{`" string and the "`}" string.

188	   A computerate specification is processed by an Asciidoctor
189	   preprocessor that does the following:

191	   1.  Loads the whole document as an Idris program, including importing
192	       modules.

194	   2.  For each instance of an inline code fragment, evaluates that
195	       fragment and replace it (including the delimiters) by the result
196	       of that evaluation.

198	   3.  Continues with the normal processing of the modified document.

200	   For instance the following computerate specification fragment taken
201	   from the computerate specification of STUNbis

203	   <CODE BEGINS>
204	   > rto : Int
205	   > rto = 500
206	   >
207	   > rc : Nat
208	   > rc = 7
209	   >
210	   > rm : Int
211	   > rm = 16
212	   >
213	   > -- A stream of transmission times
214	   > transmissions : Int -> Int -> Stream Int
215	   > transmissions value rto = value :: transmissions (value + rto)
216	   >   (rto * 2)
217	   >
218	   > -- Returns a specific transmission time
219	   > transmission : Int -> Nat -> Int
220	   > transmission timeout i = index i $ transmissions 0 timeout
221	   >
222	   > a1 : String
223	   > a1 = show rto
224	   >
225	   > a2 : String
226	   > a2 = concat (take (rc - 1) (map (\t => show t ++ " ms, ")
227	   >   (transmissions 0 rto))) ++ "and " ++ show (transmission rto
228	   >     (rc - 1)) ++ " ms"
229	   >
230	   > a3 : String
231	   > a3 = show $ transmission rto (rc - 1) + rto * rm

233	   For example, assuming an RTO of {`a1`}ms, requests would be sent at
234	   times {`a2`}.
235	   If the client has not received a response after {`a3`} ms, the
236	   client will consider the transaction to have timed out.
237	   <CODE ENDS>

239	                                  Figure 1

241	   is rendered as
242	   "                                            For example, assuming an
243	    RTO of 500ms, requests would be sent at times 0 ms, 500 ms, 1500 ms,
244	    3500 ms, 7500 ms, 15500 ms, and 31500 ms.  If the client has not
245	    received a response after 39500 ms, the client will consider the
246	    transaction to have timed out."

248	                                  Figure 2

250	   Appendix A explains how to install the command line tools to process
251	   a computerate specification.

253	   The Idris programming language has been chosen because its type
254	   system supports dependent and linear types, and that type system is
255	   the language in which formal specifications are written.

257	   Following Zave's second conclusion, a computerate specification does
258	   not have to be about just formally specifying a protocol and proving
259	   properties about it.  There is a whole spectrum of formalism that can
260	   be introduced in a specification, and we will present it in the
261	   remaining sections by increasing order of complexity.  Note that
262	   because the formal language is a programming language, these usages
263	   are not exhaustive, and plenty of other usages can and will be found
264	   after the publication of this document.

266	2.1.  Libraries

268	   A computerate specification does not disappear as soon the standard
269	   it describes is published.  Quite the opposite, each specification is
270	   designed to be used as an Idris module that can be imported in
271	   subsequent specifications, reducing over time the amount of code that
272	   needs to be written.  At the difference of an RFC that is immutable
273	   after publication, the code in a specification will be improved over
274	   time, especially as new properties are proved or disproved.  The
275	   latter will happen when a bug is discovered in a specification and a
276	   proof of negation is added to the specification, paving the way to a
277	   revision of the standard.

279	   This document is itself a computerate specification that contains
280	   data types and functions that can be reused in future specifications,
281	   and as a whole can be considered as the standard library for
282	   computerate specifying.

284	   For convenience, each public computerate specification, including the
285	   one behind this document, will be made available as an individual git
286	   repository.  Appendix B explains how to gain access to these
287	   computerate specifications.

289	2.2.  Retrofitting Specifications

291	   RFCs, Internet-Drafts and standard documents published by other SDOs
292	   did not start their life as computerate specifications, so to be able
293	   to use them as Idris modules they will need to be progressively
294	   retrofitted.  This is done by converting the documents into an
295	   AsciiDoc documents and then enriching them with code, in the same way
296	   that would have been done if the standard was developed directly as a
297	   computerate specification.

299	   Converting the whole document in AsciiDoc and enriching it with code,
300	   instead of just maintaining a library of code, seems a waste of
301	   resources.  The reason for doing so is to be able to verify that the
302	   rendered text is equivalent to the original standard, which will
303	   validate the examples and formal languages.

305	   Retrofitted specifications will also be made available as individual
306	   git repositories as they are converted.

308	   Because the IETF Trust does not permit modifying an RFC as a whole
309	   (except for translation purposes), a retrofitted RFC uses
310	   transclusion, a mechanism that includes parts of a separate document
311	   at runtime.  This way, a retrofitted RFC is distributed as two
312	   separate files, the original RFC in text form, and a computerate
313	   specification that contains only code and transclusions.

315	   Transclusion is a special form of AsciiDoc include that takes a range
316	   of lines as parameters:

318	   include::rfc5234.txt[lines=26..35]

320	                                  Figure 3

322	   Here the "include" macro will be replaced by the content of lines 26
323	   to 35 (included) of RFC 5234.

325	   The "sub" parameter permits modifying the copied content according to
326	   a regular expression.  For instance the following converts references
327	   into the AsciiDoc format:

329	   include::rfc5234.txt[lines=121..131,sub="/\[([^\]])\]/<<\1>>/"]

331	                                  Figure 4

333	   In the following example, the text is converted into a note:

335	   include::rfc5234.txt[lines=151,sub="/^.*$/NOTE: \0/"]
336	                                  Figure 5

338	2.3.  Revision of Standards

340	   Standards evolve, but because RFCs are immutable, revisions for a
341	   standard are done by publishing new RFCs.

343	   The matching computerate specifications need to reflect that
344	   relationship by extending the data type of syntax and semantics in
345	   the new version, instead of recreating new data types from scratch.
346	   There are two diametrically opposed directions when extending a type:

348	   *  The new standard is adding constraints.  This is done by indexing
349	      the new type over the old type.

351	   *  The new standard is removing constraints.  This is done by
352	      defining the new type as a sum type, with one of the alternatives
353	      being the old type.

355	   // This is correct in theory, but in practice creating new //
356	   specifications from old ones as described above is not very //
357	   convenient.  Maybe an alternate solution is to define the new //
358	   specifications from scratch, and use an isomorphism proof to //
359	   precisely define the differences between the two.  An Idris //
360	   elaboration script may permit duplicating a type and modifying it //
361	   without having to manually copy it.

363	2.4.  Content of a Computerate Specification

365	   Communication protocol specifications are generally split in two
366	   distinct parts, syntax (the data layout of the messages exchanged)
367	   and semantics (the rules that drive the exchange of messages).

369	   Section 3 will discuss in detail the application of computerate
370	   specifying to syntax descriptions, and Section 4 will be about
371	   specifying semantics.

373	3.  Syntax

375	   The syntax of a communication protocol determines how data is laid
376	   out before it is sent over a communication link.  Generally the
377	   syntax is described only in the context of the layer that this
378	   particular protocol is operating at, e.g. an application protocol
379	   syntax only describes the data as sent over UDP or TCP, not over
380	   Ethernet or Wi-Fi.

382	   Syntaxes can generally be split into two broad categories, binary and
383	   text, and generally a protocol syntax falls completely into one of
384	   these two categories.

386	   Syntax descriptions can be formalized for at least three reasons that
387	   will be presented in the following sections.

389	3.1.  Syntax Examples

391	   Examples in protocol documentation are frequently incorrect, which
392	   should not be that much of an issue but for the fact that most
393	   developers do not read the normative text when an example is
394	   available.  See Appendix C for statistics about the frequency of
395	   incorrect examples in RFC errata.

397	   Moving the examples into appendices or adding caution notices may
398	   show limited success in preventing that problem.

400	   So ensuring that examples match the normative text seems like a good
401	   starting point for a computerate specification.  This is done by
402	   having the possibility of adding the result of a computation directly
403	   inside the document.  If that computation is done from a type that is
404	   (physically and conceptually) close to the normative text, then we
405	   gain some level of assurance that both the normative text and the
406	   derived examples will match.  Note that examples can be inserted in
407	   the document as whole blocks of text, or as inline text.

409	   Appendix B.2.1 showcases the conversion of the examples in [RFC5234].

411	3.1.1.  Data Type

413	   The first step is to define an Idris type that completely defines the
414	   layout of the messages exchanged.  By "completely define" we mean
415	   that the type checker will prevent creating any invalid value of this
416	   type.  That ensures that all values are correct by construction.

418	   E.g. here is the definition of a DNS label per [RFC1034]:

420	   <CODE STARTS>
421	   > data PartialLabel' : List Char -> Type where
422	   >   Empty : PartialLabel' []
423	   >   More : (c : Char) -> (prf1 : isAlphaNum c || c == '-' = True) ->
424	   >     PartialLabel' s -> (prf2 : length s < 61 = True) ->
425	   >     PartialLabel' (c :: s)
426	   >
427	   > data Label' : List Char -> Type where
428	   >   One : (c : Char) -> (prf1 : isAlpha c = True) -> Label' [c]
429	   >   Many : (begin : Char) -> (prf1 : isAlpha begin = True) ->
430	   >     (middle : PartialLabel' xs) ->
431	   >     (end : Char) -> (prf2 : isAlphaNum end = True) ->
432	   >     Label' ([begin] ++ xs ++ [end])
433	   >
434	   > data Label : {a : Type} -> a -> Type where
435	   >   MkLabel : {xs : String} -> Label' (unpack xs) -> Label xs
436	   <CODE ENDS>

438	                                  Figure 6

440	3.1.2.  Serializer

442	   The second step is writing a serializer from that type into the wire
443	   representation.  For a text format, it is done by implementing the
444	   Show interface:

446	   <CODE STARTS>
447	   > Show (Label xs) where
448	   >   show _ = xs
449	   <CODE ENDS>

451	                                  Figure 7

453	   // Define binary serializer.

455	3.1.3.  Presentation Format

457	   Instead of directly generating character strings, the serializer will
458	   use as target a dependent type that formalizes the AsciiDoc language.
459	   This will permit to know the context in which a character string will
460	   be subsequently generated and to correctly escape special characters
461	   in that string.

463	   Using an intermediary type will also permit to correctly generate
464	   AsciiDoc that can generate an xmlrfc 3 file that supports both text
465	   and graphical versions of a figure.  This will be done by having
466	   AsciiDoc blocks converted into <artwork> elements that contains both
467	   the 72 column formatted text and an equivalent SVG file, even for
468	   code source (instead of using the <sourcecode> element).

470	3.2.  Formal Syntax Language

472	   Some specifications use a formal language to describe the data
473	   layout.  One shared property of these languages is that they cannot
474	   always formalize all the constraints of a specific data layout, so
475	   they have to be enriched with comments.  One consequence of this is
476	   that they cannot be used as a replacement for the Idris data type
477	   described in Section 3.1.1, data type that is purposely complete.

479	   The following sections describe how these formal languages have been
480	   or will be themselves formalized with the goal of using them in
481	   computerate specifications.

483	3.2.1.  Augmented BNF (ABNF)

485	   Augmented Backus-Naur Form [RFC5234] (ABNF) is a formal language used
486	   to describe a text based data layout.

488	   The [RFC5234] document has been retrofitted as a computerate
489	   specification to provide an internal Domain Specific Language (DSL)
490	   that permits specifying an ABNF for a specification.  The encoding of
491	   an example from Section 2.3 of [RFC5234] looks like this:

493	   <CODE BEGINS>
494	   > rulename : Rule
495	   > rulename = "rulename" `Eq` (Concat (TermDec 97 []) (TermDec 98 [])
496	   >   [TermDec 99 []])
497	   <CODE ENDS>

499	                                  Figure 8

501	   A serializer, also defined in the same specification, permits
502	   converting that description into proper ABNF text that can be
503	   inserted into the document such as in the following fragment:

505	   <CODE BEGINS>
506	   [source,abnf]
507	   ----
508	   {`show rulename`}
509	   ----
510	   <CODE ENDS>

512	                                  Figure 9

514	   is rendered as
515	   rulename = %d97 %d98 %d99

517	                                 Figure 10

519	   See Appendix B.2.1 for access to the source of the retrofitted
520	   specification for [RFC5234].

522	3.2.2.  Augmented Packet Header Diagrams (APHD)

524	   Augmented Packet Header Diagram
525	   [I-D.mcquistin-augmented-ascii-diagrams] (APHD) is a formal language
526	   used to describe a binary data layout in a machine-readable format.

528	   The [I-D.mcquistin-augmented-ascii-diagrams] document will be
529	   retrofitted as a computerate specification to provide an internal
530	   Domain Specific Language (DSL) that permits specifying an APHD for a
531	   specification.  The partial encoding of an example from section 4.1
532	   looks like this:

534	   <CODE BEGINS>
535	   > ipv4 : Aphd
536	   > ipv4 = MkAphd "IPv4 Header" [
537	   >   MkField "Version" (Just "V") (Number 4) Bits "This is a" ++
538	   >     " fixed-width field, whose full label is shown in the" ++
539	   >     " diagram.  The field's width -- 4 bits -- is given in" ++
540	   >     " the label of the description list, separated from the" ++
541	   >     " field's label by a colon.",
542	   > ...
543	   >   MkField "Options" Nothing (Mult (Sub (Name "IHL") (Number 5))
544	   >     (Number 32)) Bits "This is a variable-length field, whose" ++
545	   >     " length is defined by the value of the field with short" ++
546	   >     " label IHL (Internet Header Length).  Constraint" ++
547	   >     " expressions can be used in place of constant values: the" ++
548	   >     " grammar for the expression language is defined in" ++
549	   >     " Appendix A.1.  Constraints can include a previously" ++
550	   >     " defined field's short or full label, where one has been" ++
551	   >     " defined.  Short variable-length fields are indicated by" ++
552	   >     " \"...\" instead of a pipe at the end of the row."
553	   > ...
554	   > ]
555	   <CODE ENDS>

557	                                 Figure 11

559	   A serializer, also defined in the same specification, permits
560	   converting that description into proper ABNF text that can be
561	   inserted into the document such as in the following fragment:

563	   <CODE BEGINS>
564	    ....
565	    {`show ipv4`}
566	    ....
567	   <CODE ENDS>

569	                                 Figure 12

571	   is rendered as
572	   An IPv4 Header is formatted as follows:

574	    0                   1                   2                   3
575	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
576	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
577	   |Version|   IHL |    DSCP   |ECN|         Total Length          |
578	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
579	   |         Identification        |Flags|     Fragment Offset     |
580	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
581	   | Time to Live  |    Protocol   |        Header Checksum        |
582	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
583	   |                         Source Address                        |
584	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
585	   |                      Destination Address                      |
586	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
587	   |                            Options                          ...
588	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
589	   |                                                               :
590	   :                            Payload                            :
591	   :                                                               |
592	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

594	   where:

596	   Version (V): 4 bits.  This is a fixed-width field, whose full label
597	     is shown in the diagram.  The field's width -- 4 bits -- is given
598	     in the label of the description list, separated from the field's
599	     label by a colon.

601	   ...

603	   Options: (IHL-5)*32 bits.  This is a variable-length field, whose
604	     length is defined by the value of the field with short label IHL
605	     (Internet Header Length).  Constraint expressions can be used in
606	     place of constant values: the grammar for the expression language
607	     is defined in Appendix A.1.  Constraints can include a previously
608	     defined field's short or full label, where one has been defined.
609	     Short variable-length fields are indicated by "..." instead of a
610	     pipe at the end of the row.

612	    ...

614	                                 Figure 13

616	   Here the serializer generates an instance of the intermediary
617	   AsciiDoc type that describes the header line (which can be
618	   concatenated to previous lines), the block containing the diagram
619	   itself, and then a list of all the field definitions.

621	3.2.3.  Mathematical Formulas

623	   AsciiDoc supports writing equations using either asciimath or
624	   latexmath.  The rendering for RFCs will generate an artwork element
625	   that contains both the text version of the equation and a graphical
626	   version in an SVG file.// Not sure what to do with inline formulas,
627	   as we cannot generate an // artwork element in that case.

629	   An Idris type will be used to described equations at the type level.
630	   An interpreter will be used to calculate and insert examples in the
631	   document.

633	   A serializer will be used to generate the asciimath code that is
634	   inserted inside a stem block.

636	3.2.4.  RELAX NG Format

638	   TBD

640	3.2.5.  ASN.1

642	   TBD

644	3.2.6.  TLS Description Language

646	   TBD

648	3.3.  Proofs for Syntax

650	   The kind of proofs that one would want in a specification are related
651	   to isomorphism, i.e. a guarantee that two or more descriptions of a
652	   data layout contain exactly the same information.

654	3.3.1.  Isomorphism Between Type and Formal Language

656	   We saw above that when a data layout is described with a formal
657	   language, we end up with two descriptions of that data layout, one
658	   using the Idris dependent type (and used to generate examples) and
659	   one using the formal language.

661	   Proving isomorphism requires generating an Idris type from the formal
662	   language instance, which is done using an Idris elaborator script.

664	   In Idris, Elaborator Reflection [Elab] is a metaprogramming facility
665	   that permits writing code generating type declarations and code
666	   (including proofs) automatically.

668	   For instance the ABNF language is itself defined using ABNF, so after
669	   converting that ABNF into an instance of the Syntax type (which is an
670	   holder for a list of instances of the Rule type), it is possible to
671	   generate a suite of types that represents the same language:

673	   <CODE BEGINS>
674	   > abnf : Syntax
675	   > abnf = MkSyntax [
676	   >   "rulelist" `Eq` (Repeat (Just 1) Nothing (Group (Altern
677	   >     (TermName "rule") (Group (Concat (Repeat Nothing Nothing
678	   >     (TermName "c-wsp")) (TermName "c-nl") [])) []))),
679	   >     ...
680	   >   ]
681	   >
682	   > %runElab (generateType "Abnf" abnf)
683	   <CODE ENDS>

685	                                 Figure 14

687	   The result of the elaboration can then be used to construct a value
688	   of type Iso, which requires four total functions, two for the
689	   conversion between types, and another two to prove that sequencing
690	   the conversions results in the same original value.

692	   The following example generates an Idris type "SessionDescription"
693	   from the SDP ABNF.  It then proves that this type and the Sdp type
694	   contain exactly the same information (the proofs themselves have been
695	   removed, leaving only the propositions):

697	   <CODE BEGINS>
698	   > import Data.Control.Isomorphism
699	   >
700	   > sdp : Syntax
701	   > sdp = MkSyntax [
702	   >   "session-description" `Eq` (Concat (TermName "version-field")
703	   >     (TermName "origin-field") [
704	   >       TermName "session-name-field",
705	   >       Optional (TermName "information-field"),
706	   >       Optional (TermName "uri-field"),
707	   >       Repeat Nothing Nothing (TermName "email-field"),
708	   >       Repeat Nothing Nothing (TermName "phone-field"),
709	   >       Optional (TermName "connection-field"),
710	   >       Repeat Nothing Nothing (TermName "bandwidth-field"),
711	   >       Repeat (Just 1) Nothing (TermName "time-description"),
712	   >       Optional (TermName "key-field"),
713	   >       Repeat Nothing Nothing (TermName "attribute-field"),
714	   >       Repeat Nothing Nothing (TermName "media-description")
715	   >       ]),
716	   >   ...
717	   >   ]
718	   >
719	   > %runElab (generateType "Sdp" sdp)
720	   >
721	   > same : Iso Sdp SessionDescription
722	   > same = MkIso to from toFrom fromTo
723	   >   where
724	   >     to : Sdp -> SessionDescription
725	   >
726	   >     from : SessionDescription -> Abnf
727	   >
728	   >     toFrom : (x : SessionDescription ) -> to (from x) = x
729	   >
730	   >     fromTo : (x : Sdp) -> from (to x) = x
731	   >
732	   <CODE ENDS>

734	                                 Figure 15

736	   As stated in Section 3.2, the Idris type and the type generated from
737	   the formal language are not always isomorphic, because some
738	   constraints cannot be expressed in that formal language.  In that
739	   case isomorphism can be used to precisely define what is missing
740	   information in the formal language type.  To do so, the generated
741	   type is augmented with a delta type, like so:

743	   <CODE BEGINS>
744	   > data DeltaSessionDescription : Type where
745	   >   ...
746	   >
747	   > same : Iso Sdp (SessionDescription, DeltaSessionDescription)
748	   >   ...
749	   <CODE ENDS>

751	                                 Figure 16

753	   Then the DeltaSessionDescription type can be modified to include the
754	   missing information until the same function type checks.  After this
755	   we have a guarantee that we know all about the constraints that
756	   cannot be encoded in that formal language, and can check manually
757	   that each of them is described as comment.

759	   Idris elaborator scripts will be developped for each formal
760	   languages.

762	3.3.2.  Data Format Conversion

764	   For specifications that describe a conversion between different data
765	   layouts, having a proof that guarantees that no information is lost
766	   in the process can be beneficial.  For instance, we observe that
767	   syntax encoding tends to be replaced each ten years or so by
768	   something "better".  Here again isomorphism can tell us exactly what
769	   kind of information we lost and gained during that replacement.

771	   Here, for example, the definition of a function that would verify an
772	   isomorphism between an XML format and a JSON format:

774	   <CODE BEGINS>
775	   > isXmlAndJsonSame: Iso (XML, DeltaXML) (JSON, DeltaJson)
776	   >   ...
777	   <CODE ENDS>

779	                                 Figure 17

781	   Here DeltaXML expresses what is gained by switching from XML to JSON,
782	   and DeltaJson expresses what is lost.

784	3.3.3.  Interoperability with Previous Versions

786	   The syntax of the data layout may be modified as part of the
787	   evolution of a standard.  In most case a version number prevents old
788	   format to be used with the new format, but in cases where that it is
789	   not possible, the new specification can ensure that both formats can
790	   co-exist by using the same techniques as above.

792	   Conversely these techniques can be used during the design phase of a
793	   new version of a format, to check if a new version number is
794	   warranted.

796	3.3.4.  Postel's Law

798	   |  Be conservative in what you do, be liberal in what you accept from
799	   |  others.
800	   |
801	   |  -- Jon Postel

803	   One of the downsides of formal specifications is that there is no
804	   wiggle room possible when implementing it.  An implementation either
805	   conforms to the specification or does not.

807	   One analogy would be specifying a pair of gears.  If one decides to
808	   have both of them made with tolerances that are too small, then it is
809	   very likely that they will not be able to move when put together.  A
810	   bit of slack is needed to get the gear smoothly working together but
811	   more importantly the cost of making these gears is directly
812	   proportional to their tolerance.  There is an inflexion point where
813	   the cost of an high precision gear outweighs its purpose.

815	   We have a similar issue when implementing a formal specification,
816	   where having an absolutely conformant implementation may cost more
817	   money than it is worth spending.  On the other hand a specification
818	   exists for the purpose of interoperability, so we need some
819	   guidelines on what to ignore in a formal specification to make it
820	   cost effective.

822	   Postel's law proposes an informal way of defining that wiggle room by
823	   actually having two different specifications, one that defines a data
824	   layout for the purpose of sending it, and another one that defines a
825	   data layout for the purpose of receiving that data layout.

827	   Existing specifications express that dichotomy in the form of the
828	   usage of SHOULD/SHOULD NOT/RECOMMENDED/NOT RECOMMENDED [RFC2119]
829	   keywords.  For example the SDP spec says that "[t]he sequence CRLF
830	   (0x0d0a) is used to end a line, although parsers SHOULD be tolerant
831	   and also accept lines terminated with a single newline character."
832	   This directly infers two specifications, one used to define an SDP
833	   when sending it, that enforces using only CRLF, and a second
834	   specification, used to define an SDP when receiving it (or parsing
835	   it), that accepts both CRLF and LF.

837	   Note that the converse is not necessarily true, i.e. not all usages
838	   of these keywords are related to Postel's Law.

840	   To ensure that the differences between the sending specification and
841	   the receiving specification do not create interoperability problems,
842	   we can use a variant of isomorphism, as shown in the following
843	   example (data constructors and code elided):

845	   <CODE BEGINS>
846	   > data Sending : Type where
847	   >
848	   > data Receiving : Type where
849	   >
850	   > to : Sending -> List Receiving
851	   >
852	   > from : Receiving -> Sending
853	   >
854	   > toFrom : (y : Receiving) -> Elem y (to (from y))
855	   >
856	   > fromTo : (y : Sending) -> True = all (== y) [from x | x <- to y]
857	   >
858	   <CODE ENDS>

860	                                 Figure 18

862	   Here we define two data types, one that describes the data layout
863	   that is permitted to be sent (Sending) and one that describes the
864	   data layout that is permitted to be received (Receiving).  For each
865	   data layout that is possible to send, there is one or more matching
866	   receiving data layouts.  This is expressed by the function "to" that
867	   takes as input one Sending value and returns a list of Receiving
868	   values.

870	   Conversely, the "from" function maps a Receiving data layout onto a
871	   Sending data layout.  Note the asymmetry there, which prevents to use
872	   a standard proof of isomorphism.

874	   Then the "toFrom" and "fromTo" proofs verify that there is no
875	   interoperability issue by guaranteeing that each Receiving value maps
876	   to one and only one Sending instance and that this mapping is
877	   isomorphic.

879	   All of this will provide a clear guidance of when and where to use a
880	   SHOULD keyword or its variants, without loss of interoperability.

882	   As an trivial example, the following proves that accepting LF
883	   characters in addition to CRLF characters as end of line markers does
884	   not break interoperability:

886	   <CODE BEGINS>
887	   > data Sending : Type where
888	   >   S_CRLF : Sending
889	   >
890	   > Eq Sending where
891	   >   (==) S_CRLF S_CRLF = True
892	   >
893	   > data Receiving : Type where
894	   >   R_CRLF : Receiving
895	   >   R_LF : Receiving
896	   >
897	   > to : Sending -> List Receiving
898	   > to S_CRLF = [R_CRLF, R_LF]
899	   >
900	   > from : Receiving -> Sending
901	   > from R_CRLF = S_CRLF
902	   > from R_LF = S_CRLF
903	   >
904	   > toFrom : (y : Receiving) -> Elem y (to (from y))
905	   > toFrom R_CRLF = Here
906	   > toFrom R_LF = There Here
907	   >
908	   > fromTo : (y : Sending) -> True = all (== y) [from x | x <- to y]
909	   > fromTo S_CRLF = Refl
910	   <CODE ENDS>

912	                                 Figure 19

914	4.  Semantics

916	   The semantics of a communication protocol determine what messages are
917	   exchanged over a communication link and the relationship between
918	   them.  The semantics are generally described only in the context of
919	   the layer that this particular protocol is operating at.

921	4.1.  Typed Petri Nets

923	   The semantics of a specification require to define an Idris type that
924	   strictly enforces these semantics.  This can be done in an ad hoc way
925	   [Type-Driven], particularly by using linear types that express
926	   resources' consumption.

928	   But a better solution is to design these graphically, particularly by
929	   using Petri Nets.  This specification defines a DSL that permits
930	   describing a Typed Petri Net (TPN) which is heavily influenced by
931	   Coloured Petri Nets [CPN] (CPN).  A CPN adds some restriction on the
932	   types that can be used in a Petri Net because of limitations in the
933	   underlying programming language, SML.  The underlying programming
934	   used in TPN, Idris, does not have these limitations, so any well-
935	   formed Idris type (including polymorphic, linear and dependent types)
936	   can be directly used in TPN.// A graphical editor for TPN is planned
937	   as part of the integration // tooling.  The graphical tool will use
938	   the document directly as // storage.

940	   Here's an example of TPN (from figure 2.10 in [CPN]):

942	   <CODE BEGINS>
943	   > NO : Type
944	   > NO = Int
945	   >
946	   > DATA : Type
947	   > DATA = String
948	   >
949	   > NOxDATA : Type
950	   > NOxDATA = (NO, DATA)
951	   >
952	   > PTS : Place
953	   > PTS = MkPlace "Packets To Send" NOxDATA (\() => [(1, "COL"),
954	   >   (2, "OUR"), (3, "ED "), (4, "PET"), (5, "RI "), (6, "NET")])
955	   >
956	   > NS : Place
957	   > NS = MkPlace "NextSend" NO (\() => [1])
958	   >
959	   > A : Place
960	   > A = MkPlace "A" NOxDATA (\() => [])
961	   >
962	   > input1 : Input
963	   > input1 = MkInput PTS (NO, DATA) pure
964	   >
965	   > input2 : Input
966	   > input2 = MkInput NS NO pure
967	   >
968	   > output1 : Output
969	   > output1 = MkOutput PTS (NO, DATA) pure
970	   >
971	   > output2 : Output
972	   > output2 = MkOutput NS NO pure
973	   >
974	   > output3 : Output
975	   > output3 = MkOutput A (NO, DATA) pure
976	   >
977	   > sendPacket : Transition
978	   > sendPacket = MkTransition [input1, input2] [output1, output2,
979	   >   output3] (\((n, d), n') => if n == n'
980	   >                              then pure ((n, d), n, (n, d))
981	   >                              else empty)
982	   <CODE ENDS>

984	                                 Figure 20

986	   // The DSL is being currently designed, so the example shows the //
987	   generated value.

989	   From there it is easy to generate (using the non-deterministic monad
990	   in Idris) an interpreter for debugging and simulation purposes:

992	   <CODE BEGINS>
993	   > interpret : MS NOxDATA -> MS NO -> MS NOxDATA ->
994	   >   ND (MS NOxDATA, MS NO, MS NOxDATA)
995	   > interpret pts ns a = do
996	   >   (pts1, pts2) <- sel pts
997	   >   (ns1, ns2) <- sel ns
998	   >   i1 <- input' input1 pts1
999	   >   i2 <- input' input2 ns1
1000	   >   (pts3, ns3, a3) <- transition' sendPacket (i1, i2)
1001	   >   let o1 = output' output1 pts3
1002	   >   let o2 = output' output2 ns3
1003	   >   let o3 = output' output3 a3
1004	   >   pure (o1 ++ pts2, o2 ++ ns2, o3 ++ a)
1005	   <CODE ENDS>

1007	                                 Figure 21

1009	   // Replace by the generic variant of the interpreter.

1011	   A Petri Net has the advantage that the same graph can be reused to
1012	   derive other Petri Nets, e.g., Timed Petri Nets (that can be used to
1013	   collect performance metrics) or Stochastic Petri Nets.// The
1014	   traditional way of verifying a Petri Net is by using model //
1015	   checking.  There is nothing in the design that prevents doing //
1016	   that, but because that takes quite some time to run and so cannot //
1017	   be part of the document processing, how do we store in the //
1018	   document a proof that the model checking was successful?

1020	4.2.  Semantics Examples

1022	   Semantics examples can be wrong, so it is useful to be sure that they
1023	   match the specification.

1025	4.2.1.  Data Type

1027	   As explained above, semantics can be described in an ad hoc manner,
1028	   or using the TPN DSL.

1030	4.2.2.  Serializer

1032	   At the difference of syntax, where there are more or less as many
1033	   ways to display them than there are syntaxes, semantics examples
1034	   generally use sequence diagrams, eventually augmented with the
1035	   content of the packets exchanged (and so using the techniques
1036	   described in Section 3.1).

1038	   Similarly to what is done in Section 3.2.2, an Asciidoctor block
1039	   processor similar to the "msc" type of diagram used by the
1040	   asciidoctor-diagram extension will be designed.// We unfortunately
1041	   cannot reuse the asciidoctor-diagram extension // because it cannot
1042	   generate both text and SVG versions of a // sequence diagram.

1044	   The serializer for an example derived from a TPN generates the
1045	   content of the msc AsciiDoc block, by selecting one particular path
1046	   and its associated bindings through the Petri Net.// We probably want
1047	   to use AsciiDoc callouts for these, although that // would require a
1048	   modification in AsciiRfc.  In fact callout would // be a far better
1049	   technique for other diagrams, like AAD, as it will // let the
1050	   renderer take care of the best way to place elements // depending on
1051	   the output format.

1053	4.2.3.  Presentation Format

1055	   TBD.

1057	4.3.  Formal Semantics Language

1059	   Some specifications use a formal language to describe the state
1060	   machines.  One shared property of these languages is that they cannot
1061	   always formalize all the constraints of specific semantics, so they
1062	   have to be enriched with comments.  One consequence of this is that
1063	   they cannot be used as a replacement for the Idris data type
1064	   described in Section 4.1, a data type that is purposely complete.

1066	4.3.1.  Cosmogol

1068	   Cosmogol [I-D.bortzmeyer-language-state-machines] is a formal
1069	   language designed to define states machines.  The Internet-Draft will
1070	   be retrofitted as a computerate specification to provide an internal
1071	   Domain Specific Language (DSL) that permits specifying an instance of
1072	   that language.  A serializer and elaborator script will also be
1073	   defined.

1075	   Finally, an Asciidoctor block processor would be used to convert the
1076	   language into both a text and a graphical view of the state
1077	   machine.// Add examples there.

1079	4.4.  Proofs for Semantics

1081	   Like for syntax formal languages, an elaborator script permits
1082	   generating a type from a TPN instance.  That type can then be used to
1083	   write proofs of the properties that we expect from the semantics.

1085	4.4.1.  Isomorphism

1087	   An isomorphism proof can be used between two types derived from the
1088	   semantics of a specification, for example to prove that no
1089	   information is lost in the converting between the underlying
1090	   processes, or when upgrading a process.

1092	   An example of that would be to prove (or more likely disprove) that
1093	   the SIP state machines are isomorphic to the WebRTC state machines.

1095	4.4.2.  Postel's Law

1097	   Like for the syntax, semantics can introduce wiggle room between the
1098	   state machines on the sending side and the state machines on the
1099	   receiving side.  A similar isomorphism proof can be used to ensure
1100	   that this is done without loss of interoperability.

1102	4.4.3.  Termination

1104	   The TPN type can be used to verify that the protocol actually
1105	   terminates, or that it always returns to its initial state.  This is
1106	   equivalent to proving that a program terminates.

1108	4.4.4.  Liveness

1110	   The TPN type can be used to verify that the protocol is productive,
1111	   i.e. that it does not loop without making progress.

1113	4.4.5.  Verified Code

1115	   A TPN that covers a whole protocol (i.e. client, network, and server)
1116	   is useful to prove the properties listed in the previous sections.
1117	   But the TPN is also designed in a way that each of these parts can be
1118	   defined separately from the others (making it a Hierarchical TPN).
1119	   This permits using the type generated from these (through an
1120	   elaborator script) as a type for real code, and thus verifying that
1121	   this code conforms to both the syntax and the semantics
1122	   specifications.

1124	5.  Informative References

1126	   [RFC7991]  Hoffman, P., "The "xml2rfc" Version 3 Vocabulary",
1127	              RFC 7991, DOI 10.17487/RFC7991, December 2016,
1128	              <https://www.rfc-editor.org/info/rfc7991>.

1130	   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
1131	              Specifications: ABNF", STD 68, RFC 5234,
1132	              DOI 10.17487/RFC5234, January 2008,
1133	              <https://www.rfc-editor.org/info/rfc5234>.

1135	   [I-D.bortzmeyer-language-state-machines]
1136	              Bortzmeyer, S., "Cosmogol: a language to describe finite
1137	              state machines", Work in Progress, Internet-Draft, draft-
1138	              bortzmeyer-language-state-machines-01, 11 November 2006,
1139	              <https://tools.ietf.org/html/draft-bortzmeyer-language-
1140	              state-machines-01>.

1142	   [Curry-Howard]
1143	              "Curry-Howard correspondence",
1144	              <https://en.wikipedia.org/wiki/Curry-
1145	              Howard_correspondence>.

1147	   [I-D.mcquistin-augmented-ascii-diagrams]
1148	              McQuistin, S., Band, V., and C. Perkins, "Describing
1149	              Protocol Data Units with Augmented Packet Header
1150	              Diagrams", Work in Progress, Internet-Draft, draft-
1151	              mcquistin-augmented-ascii-diagrams-01, 3 November 2019,
1152	              <https://tools.ietf.org/html/draft-mcquistin-augmented-
1153	              ascii-diagrams-01>.

1155	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
1156	              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
1157	              <https://www.rfc-editor.org/info/rfc1034>.

1159	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1160	              Requirement Levels", BCP 14, RFC 2119,
1161	              DOI 10.17487/RFC2119, March 1997,
1162	              <https://www.rfc-editor.org/info/rfc2119>.

1164	   [I-D.ribose-asciirfc]
1165	              Tse, R., Nicholas, N., and P. Brasolin, "AsciiRFC:
1166	              Authoring Internet-Drafts And RFCs Using AsciiDoc", Work
1167	              in Progress, Internet-Draft, draft-ribose-asciirfc-08, 17
1168	              April 2018,
1169	              <https://tools.ietf.org/html/draft-ribose-asciirfc-08>.

1171	   [Zave]     Zave, P., "Experiences with Protocol Description",
1172	              Rigorous Protocol Engineering (WRiPE'11), October 2011,
1173	              <http://www.pamelazave.com/wripe.pdf>.

1175	   [Knuth92]  Knuth, D., "Literate Programming", Center for the Study of
1176	              Language and Information, 1992.

1178	   [Elab]     Christiansen, D. and E.C. Brady, "Elaborator reflection:
1179	              extending Idris in Idris", In Proceedings of the 21st ACM
1180	              SIGPLAN International Conference on Functional
1181	              Programming. ACM Press-Association for Computing
1182	              Machinery, 2016.

1184	   [AsciiDoc] "AsciiDoc", <https://en.wikipedia.org/wiki/AsciiDoc>.

1186	   [Asciidoctor]
1187	              "Asciidoctor",
1188	              <https://asciidoctor.org/docs/user-manual/>.

1190	   [Metanorma]
1191	              "Metanorma", <https://www.metanorma.com/>.

1193	   [Idris]    "Idris: A Language with Dependent Types",
1194	              <https://www.idris-lang.org/>.

1196	   [Literate] "Literate programming", <http://docs.idris-
1197	              lang.org/en/latest/tutorial/miscellany.html#literate-
1198	              programming>.

1200	   [Blockquotes]
1201	              "Markdown-style blockquotes",
1202	              <https://asciidoctor.org/docs/user-manual/#markdown-style-
1203	              blockquotes>.

1205	   [CPN]      Jensen, K. and L. M. Kristensen, "Coloured Petri Nets:
1206	              Modelling and Validation of Concurrent Systems", Springer,
1207	              2009.

1209	   [Type-Driven]
1210	              Brady, E., "Type-Driven Development with Idris", Manning,
1211	              2017.

1213	Appendix A.  Command Line Tools

1215	A.1.  Installation

1217	   The computerate command line tools are run inside a Docker image, so
1218	   the first step is to install the Docker software or verify that it is
1219	   up to date (https://docs.docker.com/install/).

1221	   Note that for the usage described in this document there is no need
1222	   for Docker EE or for having a Docker account.

1224	   The following instructions assume a Unix based OS, i.e. Linux or
1225	   MacOS.  Lines separated by a "\" character are meant to be executed
1226	   as one single line, with the "\" character removed.

1228	A.1.1.  Download the Docker Image

1230	   To install the computerate tools, the fastest way is to download and
1231	   install the Docker image using a temporary image containing the dat
1232	   tool:

1234	   docker pull veggiemonk/dat-docker
1235	   mkdir computerate
1236	   cd computerate
1237	   docker run --rm -u $(id -u):$(id -g) -v \
1238	    $(pwd):/tools veggiemonk/dat-docker dat clone \
1239	    dat://6a33cb289d5818e3709f62e95df41be537edba5f4dc26593e2cb870c7982345b \
1240	    tools

1242	                                 Figure 22

1244	   After this, the image can be loaded in Docker.  The newly installed
1245	   Docker image also contains the dat command, so there is no need to
1246	   keep the veggiemonk/dat-docker image after this:

1248	   docker load -i tools.tar.xz
1249	   docker image rm --force veggiemonk/dat-docker

1251	                                 Figure 23

1253	   A new version of the tools is released at the same time a new version
1254	   of this document is released.  After this, running the following
1255	   command in the computerate directory will pull any new version of the
1256	   tool tar file:

1258	   docker run --rm -u $(id -u):$(id -g) \
1259	    -v $(pwd):/computerate computerate/tools dat pull --exit

1261	                                 Figure 24

1263	   The docker image can then be loaded as above.

1265	A.2.  Using the "computerate" Command

1267	   The Docker image main command is "computerate", which takes the same
1268	   parameters as the "metanorma" command from the Metanorma tooling:

1270	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1271	     computerate/tools computerate -t ietf -x txt <file>

1273	                                 Figure 25

1275	   The differences with the "metanorma" command are:

1277	   *  The "computerate" command can process Literate Idris files (files
1278	      with a "lidr" extension, aka lidr files), in addition to AsciiDoc
1279	      files (files with an "adoc" extension, aka adoc files).  When a
1280	      lidr file is processed, all embedded code fragments (text between
1281	      prefix "{`" and suffix "`}") are evaluated in the context of the
1282	      Idris code contained in this file.  Each code fragment (including
1283	      the prefix and suffix) are then substituted by the result of that
1284	      evaluation.

1286	   *  The "computerate" command can process included lidr files in the
1287	      same way.  The embedded code fragments in the imported file are
1288	      processed in the context of the included lidr file, not in the
1289	      context of the including file.  Idris modules (either from an idr
1290	      or lidr file) can be imported the usual way.

1292	   *  The literate code (which is all the text that is starting by a ">"
1293	      symbol in column 1) in a lidr file will not be part of the
1294	      rendered document.

1296	   *  The computerate command can process transclusions, as explained in
1297	      Section 2.2.

1299	   *  Lookup of external references is disabled.  Use either raw XML
1300	      references or an external directory.

1302	   *  Instead of generating a file based on the name of the input file,
1303	      the "computerate" command generates a file based on the ":name:"
1304	      attribute in the header of the document.

1306	   The "computerate" command can also be used to generate an xmlrfc v3
1307	   file, ready for submission to the IETF:

1309	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1310	     computerate/tools computerate -t ietf -x xmlrfc3 <file>

1312	                                 Figure 26

1314	A.3.  Using the Idris REPL

1316	   idr and lidr files can be loaded directly in the Idris REPL for
1317	   debugging:

1319	   docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/computerate \
1320	     computerate/tools idris <lidr-file>

1322	                                 Figure 27

1324	   It is possible to directly modify the source code in the REPL by
1325	   entering the ":e" command, which will load the file in an instance of
1326	   VIM preconfigured to interact with the REPL.

1328	   Alternatively the Idris REPL can be started as a server:

1330	   docker run --rm -it -u $(id -u):$(id -g) -p 127.0.0.1:4294:4294 \
1331	     -v $(pwd):/computerate computerate/tools idris

1333	                                 Figure 28

1335	   Then if a source file is loaded in a separate console with the VIM
1336	   instance inside the Docker image, it can interact with the REPL
1337	   server:

1339	   docker run --rm -u $(id -u):$(id -g) --net=host \
1340	     -v $(pwd):/computerate computerate/tools vim <file>

1342	                                 Figure 29

1344	   Note that both commands must be run from the same directory.

1346	A.4.  Using Other Commands

1348	   For convenience, the docker image provides the latest version of the
1349	   xml2rfc, idnits, aspell, and languagetool tools.

1351	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1352	     computerate/tools xml2rfc
1353	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1354	     computerate/tools idnits --help
1355	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1356	     computerate/tools aspell
1357	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1358	     computerate/tools languagetool

1360	                                 Figure 30

1362	   The Docker image also contains a extended version of git that will be
1363	   used to retrieve the computerate specifications in Appendix B.

1365	A.5.  Bugs and Workarounds

1367	   *  Errors in embedded code do not stop the process but replace the
1368	      text by the error message, which can be easily overlooked.

1370	   *  backticks are not escaped in code fragments.

1372	   *  The current version of Docker in Ubuntu fails, but this can be
1373	      fixed with the following commands:

1375	   sudo apt-get install containerd.io=1.2.6-3
1376	   sudo systemctl restart docker.service

1378	                                 Figure 31

1380	   *  The Asciidoctor processor does not correctly format the output in
1381	      all cases (e.g. "++").  The escaping can be done in Idris until
1382	      this is fixed.

1384	   *  Sometimes the Idris processing fails with an error "Module needs
1385	      reloading".  Deleting all the files with the ibc extension will
1386	      solve that problem.

1388	   *  Trying to fetch nonexistent new commits on a git repository will
1389	      block for 12 seconds.

1391	   *  xml2rfc does not support PDF output.

1393	A.6.  TODO List

1395	   *  Embedded blocks.

1397	   *  Test on Windows.

1399	   *  Using recursive modules with Idris.

1401	Appendix B.  Computerate Specifications Library

1403	B.1.  Installation

1405	   The git repositories that compose the Computerate Specification
1406	   Library are distributed over a peer-to-peer protocol based on dat.

1408	   This requires an extension to git, extension that is already
1409	   installed in the Docker image described in Appendix A.  The following
1410	   command can be used to retrieve a computerate specification:

1412	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1413	     computerate/tools git clone --recursive dat://<public-key> <name>

1415	                                 Figure 32

1417	   Here <public-key> is the dat public key for a specific computerate
1418	   specification and <name> is the recommended name.  Do not use the dat
1419	   URIs given in Appendix A, as only the dat public keys listed in
1420	   Appendix B.2 can be used with a git clone.

1422	   Updating the repository also requires using the Docker image:

1424	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1425	     computerate/tools git pull
1426	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1427	     computerate/tools git submodule update

1429	                                 Figure 33

1431	   All the git commands that do not require access to the remote can be
1432	   run natively or from the Docker image.

1434	   Note that for the computerate specification library the "computerate"
1435	   command must be run from the directory that is one level above the
1436	   git repository.  The name of the root document is always "Main.lidr"
1437	   or "Main.adoc":

1439	   docker run --rm -u $(id -u):$(id -g) -v $(pwd):/computerate \
1440	     computerate/tools computerate -t ietf -x txt \
1441	     <git-repository>/Main.lidr

1443	                                 Figure 34

1445	B.2.  Catalog

1447	   For the time being this document will serve as a catalog of available
1448	   computerate specifications.

1450	B.2.1.  RFC5234

1452	   Name: RFC5234
1453	   Public key:
1454	     994e52b29a7bf4f7590b0f0369a7d55d29fb22befd065e462b2185a8207e21f1

1456	                                 Figure 35

1458	Appendix C.  Errata Statistics

1460	   In an effort to quantify the potential benefits of using formal
1461	   methods at the IETF, an effort to relabel the Errata database is
1462	   under way.

1464	   The relabeling uses the following labels:

1466	        +----------+----------------------------------------------+
1467	        | Label    | Description                                  |
1468	        +==========+==============================================+
1469	        | AAD      | Error in an ASCII bit diagram                |
1470	        +----------+----------------------------------------------+
1471	        | ABNF     | Error in an ABNF                             |
1472	        +----------+----------------------------------------------+
1473	        | Absent   | The errata was probably removed              |
1474	        +----------+----------------------------------------------+
1475	        | ASN.1    | Error in ASN.1                               |
1476	        +----------+----------------------------------------------+
1477	        | C        | Error in C code                              |
1478	        +----------+----------------------------------------------+
1479	        | Diagram  | Error in a generic diagram                   |
1480	        +----------+----------------------------------------------+
1481	        | Example  | An example does not match the normative text |
1482	        +----------+----------------------------------------------+
1483	        | Formula  | Error preventable by using Idris code        |
1484	        +----------+----------------------------------------------+
1485	        | Ladder   | Error in a ladder diagram                    |
1486	        +----------+----------------------------------------------+
1487	        | Rejected | The erratum was rejected                     |
1488	        +----------+----------------------------------------------+
1489	        | Text     | Error in the text itself, no remedy          |
1490	        +----------+----------------------------------------------+
1491	        | TLS      | Error in the TLS language                    |
1492	        +----------+----------------------------------------------+

1494	                                  Table 1

1496	   At the time of publication the first 700 errata, which represents
1497	   11.88% of the total, have been relabeled.  On these, 34 were rejected
1498	   and 27 were deleted, leaving 639 valid errata.

1500	                     +---------+-------+------------+
1501	                     | Label   | Count | Percentage |
1502	                     +=========+=======+============+
1503	                     | Text    | 396   | 61.97%     |
1504	                     +---------+-------+------------+
1505	                     | Formula | 66    | 10.32%     |
1506	                     +---------+-------+------------+
1507	                     | Example | 64    | 10.0%      |
1508	                     +---------+-------+------------+
1509	                     | ABNF    | 38    | 5.94%      |
1510	                     +---------+-------+------------+
1511	                     | AAD     | 32    | 5.00%      |
1512	                     +---------+-------+------------+
1513	                     | ASN.1   | 27    | 4.22%      |
1514	                     +---------+-------+------------+
1515	                     | C       | 9     | 1.40%      |
1516	                     +---------+-------+------------+
1517	                     | Diagram | 4     | 0.62%      |
1518	                     +---------+-------+------------+
1519	                     | TLS     | 2     | 0.31%      |
1520	                     +---------+-------+------------+
1521	                     | Ladder  | 1     | 0.15%      |
1522	                     +---------+-------+------------+

1524	                                 Table 2

1526	   Note that as the relabeling is done in in order of erratum number, at
1527	   this point it covers mostly older RFCs.  A change in tooling (e.g.
1528	   ABNF verifiers) means that these numbers may drastically change as
1529	   more errata are relabeled.  But at this point it seems that 38.02% of
1530	   errata could have been prevented with a more pervasive use of formal
1531	   methods.

1533	Acknowledgements

1535	   Thanks to Jim Kleck, Stephane Bryant, Eric Petit-Huguenin, Nicolas
1536	   Gironi, Stephen McQuistin, and Greg Skinner for the comments,
1537	   suggestions, questions, and testing that helped improve this
1538	   document.

1540	Changelog

1542	   *  draft-petithuguenin-computerate-specifying-02

1544	      -  Document

1546	         o  Switch to rfcxml3.

1548	         o  Status is now experimental.

1550	         o  Many nits.

1552	         o  Fix incorrect errata stats.

1554	         o  Move acknowledgment section at the end.

1556	         o  Rewrite the APHD section (formerly known as AAD) to match
1557	            draft-mcquistin-augmented-diagrams-01.

1559	         o  Fix non-ascii characters in the references.

1561	         o  Intermediate AsciiDoc representation for serializers.

1563	      -  Tooling

1565	         o  xmlrfc3 is now the default extension.

1567	         o  "docName" and "category" attributes are now generated, and
1568	            the "prepTime" is removed.

1570	         o  Update xml2rfc to 2.35.0.

1572	         o  Remove LanguageTool.

1574	         o  Update Metanorma to version 0.3.17.

1576	         o  Update Asciidoctor to 2.0.10.

1578	         o  Update list of Working Groups.

1580	      -  Library

1582	         o  No update.

1584	   *  draft-petithuguenin-computerate-specifying-01

1586	      -  Document

1588	         o  New changelog appendix.

1590	         o  Fix incorrect reference, formatting in Idris code.

1592	         o  Add option to remove container in all "docker run" command.

1594	         o  Add explanations to use the Idris REPL and VIM inside the
1595	            Docker image.

1597	         o  Add placeholders for ASN.1 and RELAX NG languages.

1599	         o  New Errata appendix.

1601	         o  Nits.

1603	         o  Improve Syntax Examples section.

1605	      -  Tooling

1607	         o  Update Metanorma to version 0.3.16

1609	         o  Update MetaNorma-cli to version 1.2.7.1

1611	         o  Switch to patched version of Idris 1.3.2 that supports
1612	            remote REPL in Docker.

1614	         o  Add VIM and idris-vim extension.

1616	         o  Remove some debug statements.

1618	      -  Library

1620	         o  No update

1622	Author's Address

1624	   Marc Petit-Huguenin
1625	   Impedance Mismatch LLC

1627	   Email: marc@petit-huguenin.org