CBOR                                                         H. Birkholz
Internet-Draft                                            Fraunhofer SIT
Intended status: Standards Track                               C. Vigano
Expires: February 24, 2019                           Universitaet Bremen
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                         August 23, 2018


  Concise data definition language (CDDL): a notational convention to
                 express CBOR and JSON data structures
                        draft-ietf-cbor-cddl-05

Abstract

   This document proposes a notational convention to express CBOR data
   structures (RFC 7049).  Its main goal is to provide an easy and
   unambiguous way to express structures for protocol messages and data
   formats that use CBOR or JSON.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 24, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
     1.2.  Terminology
   2.  The Style of Data Structure Specification
     2.1.  Groups and Composition in CDDL
       2.1.1.  Usage
       2.1.2.  Syntax
     2.2.  Types
       2.2.1.  Values
       2.2.2.  Choices
       2.2.3.  Representation Types
       2.2.4.  Root type
   3.  Syntax
     3.1.  General conventions
     3.2.  Occurrence
     3.3.  Predefined names for types
     3.4.  Arrays
     3.5.  Maps
       3.5.1.  Structs
       3.5.2.  Tables
       3.5.3.  Non-deterministic order
       3.5.4.  Cuts in Maps
     3.6.  Tags
     3.7.  Unwrapping
     3.8.  Controls
       3.8.1.  Control operator .size
       3.8.2.  Control operator .bits
       3.8.3.  Control operator .regexp
       3.8.4.  Control operators .cbor and .cborseq
       3.8.5.  Control operators .within and .and
       3.8.6.  Control operators .lt, .le, .gt, .ge, .eq, .ne, and
               .default
     3.9.  Socket/Plug
     3.10. Generics
     3.11. Operator Precedence
   4.  Making Use of CDDL
     4.1.  As a guide to a human user
     4.2.  For automated checking of CBOR data structure
     4.3.  For data analysis tools
   5.  Security considerations
   6.  IANA Considerations
     6.1.  CDDL control operator registry
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Examples
     A.1.  RFC 7071
     A.2.  Examples from JSON Content Rules
   Appendix B.  ABNF grammar
   Appendix C.  Matching rules
   Appendix D.  Standard Prelude
   Appendix E.  Use with JSON
   Appendix F.  The CDDL tool
   Appendix G.  Extended Diagnostic Notation
     G.1.  White space in byte string notation
     G.2.  Text in byte string notation
     G.3.  Embedded CBOR and CBOR sequences in byte strings
     G.4.  Concatenated Strings
     G.5.  Hexadecimal, octal, and binary numbers
     G.6.  Comments
   Contributors
   Acknowledgements
   Authors' Addresses

1.  Introduction

   In this document, a notational convention to express CBOR [RFC7049]
   data structures is defined.

   The main goal for the convention is to provide a unified notation
   that can be used when defining protocols that use CBOR.  We term the
   convention "Concise data definition language", or CDDL.

   The CBOR notational convention has the following goals:

   (G1)  Provide an unambiguous description of the overall structure of
         a CBOR data item.

   (G2)  Be flexible in expressing the multiple ways in which data can
         be represented in the CBOR data format.

   (G3)  Be able to express common CBOR datatypes and structures.

   (G4)  Provide a single format that is both readable and editable for
         humans and processable by machine.

   (G5)  Enable automatic checking of CBOR data items for data format
         compliance.

   (G6)  Enable extraction of specific elements from CBOR data for
         further processing.

   Not an original goal per se, but a convenient side effect of the JSON
   generic data model being a subset of the CBOR generic data model, is
   the fact that CDDL can also be used for describing JSON data
   structures (see Appendix E).

   This document has the following structure:

   The syntax of CDDL is defined in Section 3.  Examples of CDDL and
   related CBOR data items ("instances", which all happen to be in JSON
   form) are given in Appendix A.
   Section 4 discusses usage of CDDL.  Examples are provided early in
   the text to better illustrate concept definitions.  A formal
   definition of CDDL using ABNF grammar is provided in Appendix B.
   Finally, a _prelude_ of standard CDDL definitions that is
   automatically prepended to, and thus available in, every CDDL
   specification is listed in Appendix D.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2.  Terminology

   New terms are introduced in _cursive_.  CDDL text in the running text
   is in "typewriter".

   In this specification, the term "byte" is used in its now customary
   sense as a synonym for "octet".

2.  The Style of Data Structure Specification

   CDDL focuses on styles of specification that are in use in the
   community employing the data model as pioneered by JSON and now
   refined in CBOR.

   There are a number of more or less atomic elements of a CBOR data
   model, such as numbers, simple values (false, true, nil), text and
   byte strings; CDDL does not focus on specifying their structure.
   CDDL of course also allows adding a CBOR tag to a data item.

   The more important components of a data structure definition language
   are the data types used for composition: arrays and maps in CBOR
   (called arrays and objects in JSON).  While these are only two
   representation formats, they are used to specify four loosely
   distinguishable styles of composition:

   o  A _vector_, an array of elements that are mostly of the same
      semantics.  The set of signatures associated with a signed data
      item is a typical application of a vector.

   o  A _record_, an array the elements of which have different,
      positionally defined semantics, as detailed in the data structure
      definition.  A 2D point, specified as an array of an x coordinate
      (which comes first) and a y coordinate (coming second) is an
      example of a record, as is the pair of exponent (first) and
      mantissa (second) in a CBOR decimal fraction.

   o  A _table_, a map from a domain of map keys to a domain of map
      values, that are mostly of the same semantics.  A set of language
      tags, each mapped to a text string translated to that specific
      language, is an example of a table.  The key domain is usually not
      limited to a specific set by the specification, but open for the
      application, e.g., in a table mapping IP addresses to MAC
      addresses, the specification does not attempt to foresee all
      possible IP addresses.  In a language such as JavaScript, a "Map"
      (as opposed to a plain "Object") would often be employed to
      achieve the generality of the key domain.

   o  A _struct_, a map from a domain of map keys as defined by the
      specification to a domain of map values the semantics of each of
      which is bound to a specific map key.  This is what many people
      have in mind when they think about JSON objects; CBOR adds the
      ability to use map keys that are not just text strings.  Structs
      can be used to solve similar problems as records; the use of
      explicit map keys facilitates optionality and extensibility.

   Two important concepts provide the foundation for CDDL:

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

   2.  The other important concept is that of a _type_.
       The entire CDDL specification defines a type (the one defined by
       its first _rule_), which formally is the set of CBOR data items
       that are acceptable as "instances" for this specification.  CDDL
       predefines a number of basic types such as "uint" (unsigned
       integer) or "tstr" (text string), often making use of a simple
       formal notation for CBOR data items.  Each value that can be
       expressed as a CBOR data item also is a type in its own right,
       e.g., "1".  A type can be built as a _choice_ of other types,
       e.g., an "int" is either a "uint" or a "nint" (negative integer).
       Finally, a type can be built as an array or a map from a group.

   The rest of this section introduces a number of basic concepts of
   CDDL, and Section 3 defines additional syntax.  Appendix C gives a
   concise summary of the semantics of CDDL.

2.1.  Groups and Composition in CDDL

   CDDL Groups are lists of group _entries_, each of which can be a
   name/value pair or a more complex group expression (which then in
   turn stands for a sequence of name/value pairs).  A CDDL group is a
   production in a grammar that matches certain sequences of name/value
   pairs but not others.  The grammar is based on the concepts of
   Parsing Expression Grammars [PEG].

   In an array context, only the value of the name/value pair is
   represented; the name is annotation only (and can be left off from
   the group specification if not needed).  In a map context, the names
   become the map keys ("member keys").

   In an array context, the actual sequence of elements in the group is
   important, as that sequence is the information that allows
   associating actual array elements with entries in the group.  In a
   map context, the sequence of entries in a group is not relevant (but
   there is still a need to write down group entries in a sequence).
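
   The difference between the two contexts can be sketched in a few
   lines of Python.  This is an illustration only, not the matching
   procedure of Appendix C: the list-of-pairs model of a group, the
   names PII, is_instance, matches_array, and matches_map, and the use
   of Python types in place of CDDL types are all assumptions made for
   this sketch.

```python
# Hypothetical model: a group is an ordered list of (name, type) entries.
# Python types stand in for CDDL types; this is not the Appendix C algorithm.
PII = [("age", int), ("name", str), ("employer", str)]


def is_instance(value, expected):
    """Exact type test, so that bool is not silently accepted as int."""
    return type(value) is expected


def matches_array(group, items):
    """Array context: names are ignored; values must line up in group order."""
    return len(items) == len(group) and all(
        is_instance(item, expected) for item, (_, expected) in zip(items, group)
    )


def matches_map(group, mapping):
    """Map context: names become the keys; entry order carries no meaning."""
    names = {name for name, _ in group}
    return set(mapping) == names and all(
        is_instance(mapping[name], expected) for name, expected in group
    )
```

   In this model, the array [42, "Alice", "ACME"] matches the group only
   in that order, whereas a map with the keys "age", "name", and
   "employer" matches no matter how its entries are ordered.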

   An array matches a specification given as a group when the group
   matches a sequence of name/value pairs the value parts of which
   exactly match the elements of the array in order.

   A map matches a specification given as a group when the group matches
   a sequence of name/value pairs such that all of these name/value
   pairs are present in the map and the map has no name/value pair that
   is not covered by the group.

   A simple example of using a group directly in a map definition is:

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

              Figure 1: Using a group directly in a map

   The three entries of the group are written between the curly braces
   that create the map: Here, "age", "name", and "employer" are the
   names that turn into the map key text strings, and "int" and "tstr"
   (text string) are the types of the map values under these keys.

   A group by itself (without creating a map around it) can be placed in
   (round) parentheses, and given a name by using it in a rule:

   pii = (
     age: int,
     name: tstr,
     employer: tstr,
   )

                        Figure 2: A basic group

   This separate, named group definition allows us to rephrase Figure 1
   as:

   person = {
     pii
   }

                    Figure 3: Using a group by name

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.

   As shown in Figure 1, the parentheses for groups are optional when
   there is some other set of brackets present.
   Note that they can still be used, leading to the not so realistic,
   but perfectly valid example:

   person = {(
     age: int,
     name: tstr,
     employer: tstr,
   )}

             Figure 4: Using a parenthesized group in a map

   Groups can be used to factor out common parts of structs, e.g.,
   instead of writing copy/paste style specifications such as in
   Figure 5, one can factor out the common subgroup, choose a name for
   it, and write only the specific parts into the individual maps
   (Figure 6).

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

   dog = {
     age: int,
     name: tstr,
     leash-length: float,
   }

                     Figure 5: Maps with copy/paste

   person = {
     identity,
     employer: tstr,
   }

   dog = {
     identity,
     leash-length: float,
   }

   identity = (
     age: int,
     name: tstr,
   )

               Figure 6: Using a group for factorization

   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group.

2.1.1.  Usage

   Groups are the instrument used in composing data structures with
   CDDL.  It is a matter of style in defining those structures whether
   to define groups (anonymously) right in their contexts or whether to
   define them in a separate rule and to reference them with their
   respective name (possibly more than once).

   With this, one is allowed to define all small parts of their data
   structures and compose bigger protocol units with those or to have
   only one big protocol data unit that has all definitions ad hoc where
   needed.

2.1.2.  Syntax

   The composition syntax intends to be concise and easy to read:

   o  The start and end of a group can be marked by '(' and ')'

   o  Definitions of entries inside of a group are noted as follows:
      _keytype => valuetype,_ (read "keytype maps to valuetype").
      The comma is actually optional (not just in the final entry), but
      it is considered good style to set it.  The double arrow can be
      replaced by a colon in the common case of directly using a text
      string or integer literal as a key (see Section 3.5.1; this is
      also the common way of naming elements of an array just for
      documentation, see Section 3.4).

   A basic entry consists of a _keytype_ and a _valuetype_, both of
   which are types (Section 2.2); this entry matches any name-value pair
   the name of which is in the keytype and the value of which is in the
   valuetype.

   A group defined as a sequence of group entries matches any sequence
   of name-value pairs that is composed by concatenation in order of
   what the entries match.

   A group definition can also contain choices between groups, see
   Section 2.2.2.

2.2.  Types

2.2.1.  Values

   Values such as numbers and strings can be used in place of a type.
   (For instance, this is a very common thing to do for a keytype,
   common enough that CDDL provides additional convenience syntax for
   this.)

   The value notation is based on the C language, but does not offer all
   the syntactic variations (see Appendix B for details).  The value
   notation for numbers inherits from C the distinction between integer
   values (no fractional part or exponent given -- NR1 [ISO6093]) and
   floating point values (where a fractional part and/or an exponent is
   present -- NR2 or NR3), so the type "1" does not include any floating
   point numbers while the types "1e3" and "1.5" are both floating point
   numbers and do not include any integer numbers.

2.2.2.  Choices

   Many places that allow a type also allow a choice between types,
   delimited by a "/" (slash).  The entire choice construct can be put
   into parentheses if this is required to make the construction
   unambiguous (please see Appendix B for the details).
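
   How values used as types and choices between them behave can be
   sketched roughly in Python.  The helper names matches_value and
   matches_choice and the PROTOCOL list are hypothetical, and the Python
   int/float distinction is only a stand-in for the integer/floating
   point distinction of the value notation.

```python
def matches_value(literal, item):
    """A literal used as a type matches only an equal item of the same kind.

    Comparing the Python types keeps the integer 1 apart from the float 1.0,
    mirroring the integer/floating-point distinction of the value notation.
    """
    return type(literal) is type(item) and literal == item


def matches_choice(alternatives, item):
    """A choice (t1 / t2 / ...) matches if any single alternative matches."""
    return any(matches_value(alt, item) for alt in alternatives)


# Hypothetical two-value integer enumeration, used only for illustration.
PROTOCOL = [6, 17]
```

   With these helpers, matches_choice(PROTOCOL, 17) holds, while the
   floating point item 17.0 is rejected because neither alternative is a
   floating point value.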

   Choices of values can be used to express enumerations:

   attire = "bow tie" / "necktie" / "Internet attire"
   protocol = 6 / 17

   As with types, CDDL also allows choices between groups, delimited by
   a "//" (double slash).  Note that the "//" operator binds much more
   weakly than the other CDDL operators, so each line within "delivery"
   in the following example is its own alternative in the group choice:

   address = { delivery }

   delivery = (
     street: tstr, ? number: uint, city //
     po-box: uint, city //
     per-pickup: true )

   city = (
     name: tstr, zip-code: uint
   )

   A group choice matches the union of the sets of name-value pair
   sequences that the alternatives in the choice can match.

   Both for type choices and for group choices, additional alternatives
   can be added to a rule later in separate rules by using "/=" and
   "//=", respectively, instead of "=":

   attire /= "swimwear"

   delivery //= (
     lat: float, long: float, drone-type: tstr
   )

   It is not an error if a name is first used with a "/=" or "//="
   (there is no need to "create it" with "=").

2.2.2.1.  Ranges

   Instead of naming all the values that make up a choice, CDDL allows
   building a _range_ out of two values that are in an ordering
   relationship.  A range can be inclusive of both ends given (denoted
   by joining two values by ".."), or include the first and exclude the
   second (denoted by instead using "...").

   device-address = byte
   max-byte = 255
   byte = 0..max-byte ; inclusive range
   first-non-byte = 256
   byte1 = 0...first-non-byte ; byte1 is equivalent to byte

   CDDL currently only allows ranges between integers (matching integer
   values) or between floating point values (matching floating point
   values).
   If both are needed in a type, a type choice between the two kinds of
   ranges can be (clumsily) used:

   int-range = 0..10 ; only integers match
   float-range = 0.0..10.0 ; only floats match
   BAD-range1 = 0..10.0 ; NOT DEFINED
   BAD-range2 = 0.0..10 ; NOT DEFINED
   numeric-range = int-range / float-range

   (See also the control operators .lt/.ge and .le/.gt in
   Section 3.8.6.)

   Note that the dot is a valid name continuation character in CDDL, so
   "min..max" is not a range expression but a single name.  When using a
   name as the left hand side of a range operator, use spacing as in
   "min .. max" to separate off the range operator.

2.2.2.2.  Turning a group into a choice

   Some choices are built out of large numbers of values, often
   integers, each of which is best given a semantic name in the
   specification.  Instead of naming each of these integers and then
   accumulating these into a choice, CDDL allows building a choice from
   a group by prefixing it with a "&" character:

   terminal-color = &basecolors
   basecolors = (
     black: 0, red: 1, green: 2, yellow: 3,
     blue: 4, magenta: 5, cyan: 6, white: 7,
   )
   extended-color = &(
     basecolors,
     orange: 8, pink: 9, purple: 10, brown: 11,
   )

   As with the use of groups in arrays (Section 3.4), the member names
   have only documentary value (in particular, they might be used by a
   tool when displaying integers that are taken from that choice).

2.2.3.  Representation Types

   CDDL allows the specification of a data item type by referring to the
   CBOR representation (major and minor numbers).  How this is used
   should be evident from the prelude (Appendix D): a hash mark ("#")
   optionally followed by a number from 0 to 7 identifying the major
   type, which then can be followed by a dot and a number specifying the
   additional information.
   This construction specifies the set of values that can be serialized
   in CBOR (i.e., "any"), by the given major type if one is given, or by
   the given major type with the additional information if both are
   given.  Where a major type of 6 (Tag) is used, the type of the tagged
   item can be specified by appending it in parentheses.

   Note that although this notation is based on the CBOR serialization,
   it is about a set of values at the data model level, e.g., "#7.25"
   specifies the set of values that can be represented as half-precision
   floats; it does not mandate that these values also do have to be
   serialized as half-precision floats: CDDL does not provide any
   language means to restrict the choice of serialization variants.
   This also enables the use of CDDL with JSON, which uses a
   fundamentally different way of serializing (some of) the same values.

   It may be necessary to make use of representation types outside the
   prelude, e.g., a specification could start by making use of an
   existing tag in a more specific way, or define a new tag not defined
   in the prelude:

   my_breakfast = #6.55799(breakfast) ; cbor-any is too general!
   breakfast = cereal / porridge
   cereal = #6.998(tstr)
   porridge = #6.999([liquid, solid])
   liquid = milk / water
   milk = 0
   water = 1
   solid = tstr

2.2.4.  Root type

   There is no special syntax to identify the root of a CDDL data
   structure definition: that role is simply taken by the first rule
   defined in the file.

   This is motivated by the usual top-down approach for defining data
   structures, decomposing a big data structure unit into smaller parts;
   however, except for the root type, there is no need to strictly
   follow this sequence.

   (Note that there is no way to use a group as a root - it must be a
   type.
   Using a group as the root might be employed as a way to specify a
   CBOR sequence in a future version of this specification; this would
   act as if that group is used in an array and the data items in that
   fictional array form the members of the CBOR sequence.)

3.  Syntax

   In this section, the overall syntax of CDDL is shown, alongside some
   examples just illustrating syntax.  (The definition will not attempt
   to be overly formal; refer to Appendix B for the details.)

3.1.  General conventions

   The basic syntax is inspired by ABNF [RFC5234], with the following
   conventions:

   o  Rules, whether they define groups or types, are defined with a
      name, followed by an equals sign "=" and the actual definition
      according to the respective syntactic rules of that definition.

   o  A name can consist of any of the characters from the set {'A',
      ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},
      starting with an alphabetic character (including '@', '_', '$')
      and ending in such a character or a digit.

      *  Names are case sensitive.

      *  It is preferred style to start a name with a lower case letter.

      *  The hyphen is preferred over the underscore (except in a
         "bareword" (Section 3.5.1), where the semantics may actually
         require an underscore).

      *  The period may be useful for larger specifications, to express
         some module structure (as in "tcp.throughput" vs.
         "udp.throughput").

      *  A number of names are predefined in the CDDL prelude, as listed
         in Appendix D.

      *  Rule names (types or groups) do not appear in the actual CBOR
         encoding, but names used as "barewords" in member keys do.

   o  Comments are started by a ';' (semicolon) character and finish at
      the end of a line (LF or CRLF).

   o  Outside strings, whitespace (spaces, newlines, and comments) is
      used to separate syntactic elements for readability (and to
      separate identifiers or numbers that follow each other); it is
      otherwise completely optional.

   o  Hexadecimal numbers are preceded by '0x' (without quotes, lower
      case x), and are case insensitive.  Similarly, binary numbers are
      preceded by '0b'.

   o  Text strings are enclosed by double quotation '"' characters.
      They follow the conventions for strings as defined in section 7 of
      [RFC8259].  (ABNF users may want to note that there is no support
      in CDDL for the concept of case insensitivity in text strings; if
      necessary, regular expressions can be used (Section 3.8.3).)

   o  Byte strings are enclosed by single quotation "'" characters and
      may be prefixed by "h" or "b64".  If unprefixed, the string is
      interpreted as with a text string, except that single quotes must
      be escaped and that the UTF-8 bytes resulting are marked as a byte
      string (major type 2).  If prefixed as "h" or "b64", the string is
      interpreted as a sequence of pairs of hex digits (base16) or a
      base64(url) string, respectively (as with the diagnostic notation
      in section 6 of [RFC7049]; cf. Appendix G.2); any white space
      present within the string (including comments) is ignored in the
      prefixed case.

   o  CDDL uses UTF-8 [RFC3629] for its encoding.

   Example:

   ; This is a comment
   person = { g }

   g = (
     "name": tstr,
     age: int,  ; "age" is a bareword
   )

3.2.  Occurrence

   An optional _occurrence_ indicator can be given in front of a group
   entry.  It is either one of the characters '?' (optional), '*' (zero
   or more), or '+' (one or more), or is of the form n*m, where n and m
   are optional unsigned integers and n is the lower limit (default 0)
   and m is the upper limit (default no limit) of occurrences.

   If no occurrence indicator is specified, the group entry is to occur
   exactly once (as if 1*1 were specified).
   A group entry with an occurrence indicator matches sequences of
   name-value pairs that are composed by concatenating a number of
   sequences that the basic group entry matches, where the number needs
   to be allowed by the occurrence indicator.

   Note that CDDL, outside any directives/annotations that could
   possibly be defined, does not make any prescription as to whether
   arrays or maps use the definite length or indefinite length encoding.
   I.e., there is no correlation between leaving the size of an array
   "open" in the spec and the fact that it is then interchanged with
   definite or indefinite length.

   Please also note that CDDL can describe flexibility that the data
   model of the target representation does not have.  This is rather
   obvious for JSON, but also is relevant for CBOR:

   apartment = {
     kitchen: size,
     * bedroom: size,
   }
   size = float ; in m2

   The previous specification does not mean that CBOR is changed to
   allow the key "bedroom" to be used more than once.  In other words,
   due to the restrictions imposed by the data model, the third line
   pretty much turns into:

     ? bedroom: size,

   (Occurrence indicators beyond one still are useful in maps for groups
   that allow a variety of keys.)

3.3.  Predefined names for types

   CDDL predefines a number of names.  This subsection summarizes these
   names, but please see Appendix D for the exact definitions.

   The following keywords for primitive datatypes are defined:

   "bool"  Boolean value (major type 7, additional information 20 or
      21).

   "uint"  An unsigned integer (major type 0).

   "nint"  A negative integer (major type 1).

   "int"  An unsigned integer or a negative integer.

   "float16"  A number representable as an IEEE 754 half-precision float
      (major type 7, additional information 25).

   "float32"  A number representable as an IEEE 754 single-precision
      float (major type 7, additional information 26).
725 "float64" A number representable as an IEEE 754 double-precision 726 float (major type 7, additional information 27). 728 "float" One of float16, float32, or float64. 730 "bstr" or "bytes" A byte string (major type 2). 732 "tstr" or "text" Text string (major type 3) 734 (Note that there are no predefined names for arrays or maps; these 735 are defined with the syntax given below.) 737 In addition, a number of types are defined in the prelude that are 738 associated with CBOR tags, such as "tdate", "bigint", "regexp" etc. 740 3.4. Arrays 742 Array definitions surround a group with square brackets. 744 For each entry, an occurrence indicator as specified in Section 3.2 745 is permitted. 747 For example: 749 unlimited-people = [* person] 750 one-or-two-people = [1*2 person] 751 at-least-two-people = [2* person] 752 person = ( 753 name: tstr, 754 age: uint, 755 ) 757 The group "person" is defined in such a way that repeating it in the 758 array each time generates alternating names and ages, so these are 759 four valid values for a data item of type "unlimited-people": 761 ["roundlet", 1047, "psychurgy", 2204, "extrarhythmical", 2231] 762 [] 763 ["aluminize", 212, "climograph", 4124] 764 ["penintime", 1513, "endocarditis", 4084, "impermeator", 1669, 765 "coextension", 865] 767 3.5. Maps 769 The syntax for specifying maps merits special attention, as well as a 770 number of optimizations and conveniences, as it is likely to be the 771 focal point of many specifications employing CDDL. While the syntax 772 does not strictly distinguish struct and table usage of maps, it 773 caters specifically to each of them. 775 But first, let's reiterate a feature of CBOR that it has inherited 776 from JSON: The key/value pairs in CBOR maps have no fixed ordering. 777 (One could imagine situations where fixing the ordering may be of 778 use. For example, a decoder could look for values related with 779 integer keys 1, 3 and 7. 
If the order were fixed and the decoder 780 encounters the key 4 without having encountered key 3, it could 781 conclude that key 3 is not available without doing more complicated 782 bookkeeping. Unfortunately, neither JSON nor CBOR support this, so 783 no attempt was made to support this in CDDL either.) 785 3.5.1. Structs 787 The "struct" usage of maps is similar to the way JSON objects are 788 used in many JSON applications. 790 A map is defined in the same way as defining an array (see 791 Section 3.4), except for using curly braces "{}" instead of square 792 brackets "[]". 794 An occurrence indicator as specified in Section 3.2 is permitted for 795 each group entry. 797 The following is an example of a structure: 799 Geography = [ 800 city : tstr, 801 gpsCoordinates : GpsCoordinates, 802 ] 804 GpsCoordinates = { 805 longitude : uint, ; multiplied by 10^7 806 latitude : uint, ; multiplied by 10^7 807 } 809 When encoding, the Geography structure is encoded using a CBOR array 810 with two entries (the keys for the group entries are ignored), 811 whereas the GpsCoordinates are encoded as a CBOR map with two key/ 812 value pairs. 814 Types used in a structure can be defined in separate rules or just in 815 place (potentially placed inside parentheses, such as for choices). 816 E.g.: 818 located-samples = { 819 sample-point: int, 820 samples: [+ float], 821 } 823 where "located-samples" is the datatype to be used when referring to 824 the struct, and "sample-point" and "samples" are the keys to be used. 825 This is actually a complete example: an identifier that is followed 826 by a colon can be directly used as the text string for a member key 827 (we speak of a "bareword" member key), as can a double-quoted string 828 or a number. (When other types, in particular multi-valued ones, are 829 used as the types of keys, they are followed by a double arrow, see 830 below.) 
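As a non-normative illustration of the Geography example above, the following Python sketch models a group as an ordered list of (member key, value) pairs. It shows how such a group yields an array without keys in one context and a map with keys in the other. The helper names "as_array" and "as_map" and the sample values are assumptions made for this illustration; they are not part of CDDL or of any CBOR library.

```python
# Hypothetical helpers for illustration only: a group is modelled as an
# ordered list of (member key, value) pairs.

def as_array(group):
    """Array context: member keys only document the entries, so drop them."""
    return [value for _key, value in group]


def as_map(group):
    """Map context: member keys become the actual map keys."""
    return dict(group)


# Illustrative values; coordinates are multiplied by 10^7 as in the example.
gps_group = [("longitude", 88000000), ("latitude", 530700000)]
gps_coordinates = as_map(gps_group)      # GpsCoordinates uses curly braces

geography_group = [("city", "Bremen"), ("gpsCoordinates", gps_coordinates)]
geography = as_array(geography_group)    # Geography uses square brackets

print(geography)
```

A CBOR encoder would serialize the resulting list as an array with two entries and the nested dictionary as a map with two key/value pairs.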
832 If a text string key does not match the syntax for an identifier (or 833 if the specifier just happens to prefer using double quotes), the 834 text string syntax can also be used in the member key position, 835 followed by a colon. The above example could therefore have been 836 written with quoted strings in the member key positions. More 837 generally, all the types defined can be used in a keytype position by 838 following them with a double arrow -- in particular, the double arrow 839 is necessary if a type is named by an identifier (which would be 840 interpreted as a string before a colon). A string also is a (single- 841 valued) type, so another form for this example is: 843 located-samples = { 844 "sample-point" => int, 845 "samples" => [+ float], 846 } 848 See Section 3.5.4 below for how the colon shortcut described here 849 also adds some implied semantics. 851 A better way to demonstrate the double-arrow use may be: 853 located-samples = { 854 sample-point: int, 855 samples: [+ float], 856 * equipment-type => equipment-tolerances, 857 } 858 equipment-type = [name: tstr, manufacturer: tstr] 859 equipment-tolerances = [+ [float, float]] 861 The example below defines a struct with optional entries: display 862 name (as a text string), the name components first name and family 863 name (as text strings), and age information (as an unsigned integer). 865 PersonalData = { 866 ? displayName: tstr, 867 NameComponents, 868 ? age: uint, 869 } 871 NameComponents = ( 872 ? firstName: tstr, 873 ? familyName: tstr, 874 ) 876 Note that the group definition for NameComponents does not generate 877 another map; instead, all four keys are directly in the struct built 878 by PersonalData. 880 In this example, all key/value pairs are optional from the 881 perspective of CDDL. With no occurrence indicator, an entry is 882 mandatory. 
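The following Python sketch, which is illustrative only, hand-codes the matching rules of the PersonalData example above: the entries of the NameComponents group are spliced into the same map, every entry is optional, and no other keys are accepted. The function and table names are assumptions made for this sketch; they are not produced by any CDDL tool.

```python
# Hand-written check mirroring the PersonalData rule; names are hypothetical.

def is_tstr(value):
    return isinstance(value, str)


def is_uint(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# NameComponents is a group, so its entries land directly in the outer map.
NAME_COMPONENTS = {"firstName": is_tstr, "familyName": is_tstr}
PERSONAL_DATA = {"displayName": is_tstr, **NAME_COMPONENTS, "age": is_uint}


def matches_personal_data(item):
    """Return True if item is a map that the PersonalData rule would accept."""
    if not isinstance(item, dict):
        return False
    # Every entry carries "?", so absent keys are fine; unknown keys are not.
    return all(
        key in PERSONAL_DATA and PERSONAL_DATA[key](value)
        for key, value in item.items()
    )


print(matches_personal_data({"firstName": "Ada", "age": 36}))
print(matches_personal_data({"NameComponents": {"firstName": "Ada"}}))
```

The second call is rejected because the group name itself never appears as a key in the data.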
884 If the addition of more entries not specified by the current 885 specification is desired, one can add this possibility explicitly: 887 PersonalData = { 888 ? displayName: tstr, 889 NameComponents, 890 ? age: uint, 891 * tstr => any 892 } 894 NameComponents = ( 895 ? firstName: tstr, 896 ? familyName: tstr, 897 ) 899 Figure 7: Personal Data: Example for extensibility 901 The cddl tool (Appendix F) generated as one acceptable instance for 902 this specification: 904 {"familyName": "agust", "antiforeignism": "pretzel", 905 "springbuck": "illuminatingly", "exuviae": "ephemeris", 906 "kilometrage": "frogfish"} 908 (See Section 3.9 for one way to explicitly identify an extension 909 point.) 911 3.5.2. Tables 913 A table can be specified by defining a map with entries where the 914 keytype is not single-valued, e.g.: 916 square-roots = {* x => y} 917 x = int 918 y = float 920 Here, the key in each key/value pair has datatype x (defined as int), 921 and the value has datatype y (defined as float). 923 If the specification does not need to restrict one of x or y (i.e., 924 the application is free to choose per entry), it can be replaced by 925 the predefined name "any". 927 As another example, the following could be used as a conversion table 928 converting from an integer or float to a string: 930 tostring = {* mynumber => tstr} 931 mynumber = int / float 933 3.5.3. Non-deterministic order 935 While the way arrays are matched is fully determined by the Parsing 936 Expression Grammar (PEG) algorithm, matching is more complicated for 937 maps, as maps do not have an inherent order. For each candidate 938 name/value pair that the PEG algorithm would try, a matching member 939 is picked out of the entire map. For certain group expressions, more 940 than one member in the map may match. Most often, this is 941 inconsequential, as the group expression tends to consume all 942 matches: 944 labeled-values = { 945 ? 
fritz: number, 946 * label => value 947 } 948 label = text 949 value = number 951 Here, if any member with the key "fritz" is present, this will be 952 picked by the first entry of the group; all remaining text/number 953 members will be picked by the second entry (and if anything remains 954 unpicked, the map does not match). 956 However, it is possible to construct group expressions where what is 957 actually picked is indeterminate, and does matter: 959 do-not-do-this = { 960 int => int, 961 int => 6, 962 } 964 When this expression is matched against "{3: 5, 4: 6}", the first 965 group entry might pick off the "3: 5", leaving "4: 6" for matching 966 the second one. Or it might pick off "4: 6", leaving nothing for the 967 second entry. This pathological non-determinism is caused by 968 specifying a more general entry before a more specific one, and by having a general 969 rule that only consumes a subset of the map key/value pairs that it 970 is able to match -- both tend not to occur in real-world 971 specifications of maps. At the time of writing, CDDL tools cannot 972 detect such cases automatically, and for the present version of the 973 CDDL specification, the specification writer is simply urged not to 974 write pathologically non-deterministic specifications. 976 (The astute reader will be reminded of what was called "ambiguous 977 content models" in SGML and "non-deterministic content models" in 978 XML. That problem is related to the one described here, but the 979 problem here is specifically caused by the lack of order in maps, 980 something that the XML schema languages do not have to contend with. 982 Note that Relax-NG's "interleave" pattern handles lack of order 983 explicitly on the specification side, while the instances in XML 984 always have determinate order.) 986 3.5.4. Cuts in Maps 988 The extensibility idiom discussed above for structs has one problem: 990 extensible-map-example = { 991 ?
"optional-key" => int, 992 * tstr => any 993 } 995 In this example, there is one optional key "optional-key", which, 996 when present, maps to an integer. There is also a wild card for any 997 future additions. 999 Unfortunately, the data item 1001 { "optional-key": "nonsense" } 1003 does match this specification: While the first entry of the group 1004 does not match, the second one (the wildcard) does. This may be very 1005 well desirable (e.g., if a future extension is to be allowed to 1006 extend the type of "optional-key"), but in many cases isn't. 1008 In anticipation of a more general potential feature called "cuts", 1009 CDDL allows inserting a cut "^" into the definition of the map entry: 1011 extensible-map-example = { 1012 ? "optional-key" ^ => int, 1013 * tstr => any 1014 } 1016 A cut in this position means that once the member key matches the 1017 name part of an entry that carries a cut, other potential matches for 1018 the key of the member that occur in later entries in the group of the 1019 map are no longer allowed. In other words, when a group entry would 1020 pick a key/value pair based on just a matching key, it "locks in" the 1021 pick -- this rule applies independent of whether the value matches as 1022 well, so when it does not, the entire map fails to match. In 1023 summary, the example above no longer matches the specification as 1024 modified with the cut. 1026 Since the desire for this kind of exclusive matching is so frequent, 1027 the ":" shortcut is actually defined to include the cut semantics. 1028 So the preceding example (including the cut) can be written more 1029 simply as: 1031 extensible-map-example = { 1032 ? "optional-key": int, 1033 * tstr => any 1034 } 1036 or even shorter, using a bareword for the key: 1038 extensible-map-example = { 1039 ? optional-key: int, 1040 * tstr => any 1041 } 1043 3.6. 
Tags 1045 A type can make use of a CBOR tag (major type 6) by using the 1046 representation type notation, giving #6.nnn(type) where nnn is an 1047 unsigned integer giving the tag number and "type" is the type of the 1048 data item being tagged. 1050 For example, the following line from the CDDL prelude (Appendix D) 1051 defines "biguint" as a type name for a positive bignum N: 1053 biguint = #6.2(bstr) 1055 The tags defined by [RFC7049] are included in the prelude. 1056 Additional tags since registered need to be added to a CDDL 1057 specification as needed; e.g., a binary UUID tag could be referenced 1058 as "buuid" in a specification after defining 1060 buuid = #6.37(bstr) 1062 In the following example, usage of the tag 32 for URIs is optional: 1064 my_uri = #6.32(tstr) / tstr 1066 3.7. Unwrapping 1068 The group that is used to define a map or an array can often be 1069 reused in the definition of another map or array. Similarly, a type 1070 defined as a tag carries an internal data item that one would like to 1071 refer to. In these cases, it is expedient to simply use the name of 1072 the map, array, or tag type as a handle for the group or type defined 1073 inside it. 1075 The "unwrap" operator (written by preceding a name by a tilde 1076 character "~") can be used to strip the type defined for a name by 1077 one layer, exposing the underlying group (for maps and arrays) or 1078 type (for tags). 1080 For example, an application might want to define a basic and an 1081 advanced header. 
Without unwrapping, this might be done as follows: 1083 basic-header-group = ( 1084 field1: int, 1085 field2: text, 1086 ) 1088 basic-header = [ basic-header-group ] 1090 advanced-header = [ 1091 basic-header-group, 1092 field3: bytes, 1093 field4: number, ; as in the tagged type "time" 1094 ] 1096 Unwrapping simplifies this to: 1098 basic-header = [ 1099 field1: int, 1100 field2: text, 1101 ] 1103 advanced-header = [ 1104 ~basic-header, 1105 field3: bytes, 1106 field4: ~time, 1107 ] 1109 (Note that leaving out the first unwrap operator in the latter 1110 example would lead to nesting the basic-header in its own array 1111 inside the advanced-header, while, with the unwrapped basic-header, 1112 the definition of the group inside basic-header is essentially 1113 repeated inside advanced-header, leading to a single array. This can 1114 be used for various applications often solved by inheritance in 1115 programming languages. The effect of unwrapping can also be 1116 described as "threading in" the group or type inside the referenced 1117 type, which suggested the thread-like "~" character.) 1119 3.8. Controls 1121 A _control_ relates a _target_ type to a _controller_ type 1122 via a _control operator_. 1124 The syntax for a control type is "target .control-operator 1125 controller", where control operators are special identifiers prefixed 1126 by a dot. (Note that _target_ or _controller_ might need to be 1127 parenthesized.) 1128 A number of control operators are defined at this point. Further 1129 control operators may be defined by new versions of this 1130 specification or by registering them according to the procedures in 1131 Section 6.1. 1133 3.8.1. Control operator .size 1135 A ".size" control constrains the size of the target, in bytes, to the values given by the 1136 control type. The control is defined for text and byte strings, 1137 where it directly controls the number of bytes in the string. It is 1138 also defined for unsigned integers (see below).
Figure 8 shows 1139 example usage for byte strings. 1141 full-address = [[+ label], ip4, ip6] 1142 ip4 = bstr .size 4 1143 ip6 = bstr .size 16 1144 label = bstr .size (1..63) 1146 Figure 8: Control for size in bytes 1148 When applied to an unsigned integer, the ".size" control restricts 1149 the range of that integer by giving a maximum number of bytes that 1150 should be needed in a computer representation of that unsigned 1151 integer. In other words, "uint .size N" is equivalent to 1152 "0...BYTES_N", where BYTES_N == 256**N. 1154 audio_sample = uint .size 3 ; 24-bit, equivalent to 0..16777215 1156 Figure 9: Control for integer size in bytes 1158 Note that, as with value restrictions in CDDL, this control is not a 1159 representation constraint; a number that fits into fewer bytes can 1160 still be represented in that form, and an inefficient implementation 1161 could use a longer form (unless that is restricted by some format 1162 constraints outside of CDDL, such as the rules in Section 3.9 of 1163 [RFC7049]). 1165 3.8.2. Control operator .bits 1167 A ".bits" control on a byte string indicates that, in the target, 1168 only the bits numbered by a number in the control type are allowed to 1169 be set. (Bits are counted the usual way, bit number "n" being set in 1170 "str" meaning that "(str[n >> 3] & (1 << (n & 7))) != 0".) 1171 Similarly, a ".bits" control on an unsigned integer "i" indicates 1172 that for all unsigned integers "n" where "(i & (1 << n)) != 0", "n" 1173 must be in the control type. 
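The bit numbering just described can be sketched in Python as follows. This is an illustration only; the function names and the representation of the control type as a set of allowed bit numbers are assumptions of this sketch, not part of CDDL.

```python
# Illustrative checks for the ".bits" control; all names are hypothetical.

def bits_ok_bstr(data, allowed):
    """Byte string case: bit n is set if data[n >> 3] & (1 << (n & 7)) != 0."""
    for n in range(len(data) * 8):
        if data[n >> 3] & (1 << (n & 7)) and n not in allowed:
            return False
    return True


def bits_ok_uint(i, allowed):
    """Unsigned integer case: bit n is set if i & (1 << n) != 0."""
    return all(
        n in allowed
        for n in range(i.bit_length())
        if i & (1 << n)
    )


allowed = {0, 1, 2}                       # e.g., three permission bits
print(bits_ok_uint(5, allowed))           # bits 0 and 2 are set
print(bits_ok_uint(8, allowed))           # bit 3 is set
print(bits_ok_bstr(bytes.fromhex("0001"), allowed))  # bit 8 is set
```

Note that neither check constrains the length of a byte string; an empty string or a string of zero bytes sets no bits at all and therefore always passes.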
1175 tcpflagbytes = bstr .bits flags 1176 flags = &( 1177 fin: 8, 1178 syn: 9, 1179 rst: 10, 1180 psh: 11, 1181 ack: 12, 1182 urg: 13, 1183 ece: 14, 1184 cwr: 15, 1185 ns: 0, 1186 ) / (4..7) ; data offset bits 1188 rwxbits = uint .bits rwx 1189 rwx = &(r: 2, w: 1, x: 0) 1191 Figure 10: Control for what bits can be set 1193 The CDDL tool generates the following ten example instances for 1194 "tcpflagbytes": 1196 h'906d' h'01fc' h'8145' h'01b7' h'013d' h'409f' h'018e' h'c05f' 1197 h'01fa' h'01fe' 1199 These examples do not illustrate that the above CDDL specification 1200 does not explicitly specify a size of two bytes: A valid all clear 1201 instance of flag bytes could be "h''" or "h'00'" or even "h'000000'" 1202 as well. 1204 3.8.3. Control operator .regexp 1206 A ".regexp" control indicates that the text string given as a target 1207 needs to match the XSD regular expression given as a value in the 1208 control type. XSD regular expressions are defined in Appendix F of 1209 [W3C.REC-xmlschema-2-20041028]. 1211 nai = tstr .regexp "[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+" 1213 Figure 11: Control with an XSD regexp 1215 The CDDL tool proposes: 1217 "N1@CH57HF.4Znqe0.dYJRN.igjf" 1219 3.8.3.1. Usage considerations 1221 Note that XSD regular expressions do not support the usual \x or \u 1222 escapes for hexadecimal expression of bytes or unicode code points. 1223 However, in CDDL the XSD regular expressions are contained in text 1224 strings, the literal notation for which provides \u escapes; this 1225 should suffice for most applications that use regular expressions for 1226 text strings. (Note that this also means that there is one level of 1227 string escaping before the XSD escaping rules are applied.) 1229 XSD regular expressions support character class subtraction, a 1230 feature often not found in regular expression libraries; 1231 specification writers may want to use this feature sparingly. 
1232 Similar considerations apply to Unicode character classes; where 1233 these are used, the specification SHOULD identify which Unicode 1234 versions are addressed. 1236 Other surprises for infrequent users of XSD regular expressions may 1237 include: 1239 o No direct support for case insensitivity. While case 1240 insensitivity has gone mostly out of fashion in protocol design, 1241 it is sometimes needed and then needs to be expressed manually as 1242 in "[Cc][Aa][Ss][Ee]". 1244 o The support for popular character classes such as \w and \d is 1245 based on Unicode character properties, which is often not what is 1246 desired in an ASCII-based protocol and thus might lead to 1247 surprises. (\s and \S do have their more conventional meanings, 1248 and "." matches any character but the line ending characters \r or 1249 \n.) 1251 3.8.3.2. Discussion 1253 There are many flavors of regular expression in use in the 1254 programming community. For instance, perl-compatible regular 1255 expressions (PCRE) are widely used and probably are more useful than 1256 XSD regular expressions. However, there is no normative reference 1257 for PCRE that could be used in the present document. Instead, we opt 1258 for XSD regular expressions for now. There is precedent for that 1259 choice in the IETF, e.g., in YANG [RFC7950]. 1261 Note that CDDL uses controls as its main extension point. This 1262 creates the opportunity to add further regular expression formats in 1263 addition to the one referenced here if desired. As an example, a 1264 control ".pcre" is defined in [I-D.bormann-cbor-cddl-freezer]. 1266 3.8.4. Control operators .cbor and .cborseq 1268 A ".cbor" control on a byte string indicates that the byte string 1269 carries a CBOR encoded data item. Decoded, the data item matches the 1270 type given as the right-hand side argument (type1 in the following 1271 example). 
1273 "bytes .cbor type1" 1275 Similarly, a ".cborseq" control on a byte string indicates that the 1276 byte string carries a sequence of CBOR encoded data items. When the 1277 data items are taken as an array, the array matches the type given as 1278 the right-hand side argument (type2 in the following example). 1280 "bytes .cborseq type2" 1282 (The conversion of the encoded sequence to an array can be effected 1283 for instance by wrapping the byte string between the two bytes 0x9f 1284 and 0xff and decoding the wrapped byte string as a CBOR encoded data 1285 item.) 1287 3.8.5. Control operators .within and .and 1289 A ".and" control on a type indicates that the data item matches both 1290 that left hand side type and the type given as the right hand side. 1291 (Formally, the resulting type is the intersection of the two types 1292 given.) 1294 "type1 .and type2" 1296 A variant of the ".and" control is the ".within" control, which 1297 expresses an additional intent: the left hand side type is meant to 1298 be a subset of the right-hand-side type. 1300 "type1 .within type2" 1302 While both forms have the identical formal semantics (intersection), 1303 the intention of the ".within" form is that the right hand side gives 1304 guidance to the types allowed on the left hand side, which typically 1305 is a socket (Section 3.9): 1307 message = $message .within message-structure 1308 message-structure = [message_type, *message_option] 1309 message_type = 0..255 1310 message_option = any 1312 $message /= [3, dough: text, topping: [* text]] 1313 $message /= [4, noodles: text, sauce: text, parmesan: bool] 1314 For ".within", a tool might flag an error if type1 allows data items 1315 that are not allowed by type2. In contrast, for ".and", there is no 1316 expectation that type1 already is a subset of type2. 1318 3.8.6. 
Control operators .lt, .le, .gt, .ge, .eq, .ne, and .default 1320 The controls .lt, .le, .gt, .ge, .eq, .ne specify a constraint on the 1321 left hand side type to be a value less than, less than or equal, 1322 greater than, greater than or equal, equal, or not equal, to a value 1323 given as a (single-valued) right hand side type. In the present 1324 specification, the first four controls (.lt, .le, .gt, .ge) are 1325 defined only for numeric types, as these have a natural ordering 1326 relationship. 1328 speed = number .ge 0 ; unit: m/s 1330 .ne and .eq are defined both for numeric values and values of other 1331 types. If one of the values is not of a numeric type, equality is 1332 determined as follows: Text strings are equal (satisfy .eq/do not 1333 satisfy .ne) if they are byte-wise identical; the same applies for 1334 byte strings. Arrays are equal if they have the same number of 1335 elements, all of which are equal pairwise in order between the 1336 arrays. Maps are equal if they have the same number of key/value 1337 pairs, and there is pairwise equality between the key/value pairs 1338 between the two maps. Tagged values are equal if they both have the 1339 same tag and the values are equal. Values of simple types match if 1340 they are the same values. Numeric types that occur within arrays, 1341 maps, or tagged values are equal if their numeric value is equal and 1342 they are both integers or both floating point values. All other 1343 cases are not equal (e.g., comparing a text string with a byte 1344 string). 1346 A variant of the ".ne" control is the ".default" control, which 1347 expresses an additional intent: the value specified by the right- 1348 hand-side type is intended as a default value for the left hand side 1349 type given, and the implied .ne control is there to prevent this 1350 value from being sent over the wire. 
This control is only meaningful 1351 when the control type is used in an optional context; otherwise there 1352 would be no way to express the default value. 1354 timer = { 1355 time: uint, 1356 ? displayed-step: (number .gt 0) .default 1 1357 } 1359 3.9. Socket/Plug 1361 Both for type choices and group choices, a mechanism is defined that 1362 facilitates starting out with empty choices and assembling them 1363 later, potentially in separate files that are concatenated to build 1364 the full specification. 1366 Per convention, CDDL extension points are marked with a leading 1367 dollar sign (types) or two leading dollar signs (groups). Tools 1368 honor that convention by not raising an error if such a type or group 1369 is not defined at all; the symbol is then taken to be an empty type 1370 choice (group choice), i.e., no choice is available. 1372 tcp-header = {seq: uint, ack: uint, * $$tcp-option} 1374 ; later, in a different file 1376 $$tcp-option //= ( 1377 sack: [+(left: uint, right: uint)] 1378 ) 1380 ; and, maybe in another file 1382 $$tcp-option //= ( 1383 sack-permitted: true 1384 ) 1386 Names that start with a single "$" are "type sockets", names with a 1387 double "$$" are "group sockets". It is not an error if there is no 1388 definition for a socket at all; this then means there is no way to 1389 satisfy the rule (i.e., the choice is empty). 1391 As a convention, all definitions (plugs) for socket names must be 1392 augmentations, i.e., they must be using "/=" and "//=", respectively. 1394 To pick up the example illustrated in Figure 7, the socket/plug 1395 mechanism could be used as shown in Figure 12: 1397 PersonalData = { 1398 ? displayName: tstr, 1399 NameComponents, 1400 ? age: uint, 1401 * $$personaldata-extensions 1402 } 1404 NameComponents = ( 1405 ? firstName: tstr, 1406 ? familyName: tstr, 1407 ) 1409 ; The above already works as is. 
1410 ; But then, we can add later: 1412 $$personaldata-extensions //= ( 1413 favorite-salsa: tstr, 1414 ) 1416 ; and again, somewhere else: 1418 $$personaldata-extensions //= ( 1419 shoesize: uint, 1420 ) 1422 Figure 12: Personal Data example: Using socket/plug extensibility 1424 3.10. Generics 1426 Using angle brackets, the left hand side of a rule can add formal 1427 parameters after the name being defined, as in: 1429 messages = message<"reboot", "now"> / message<"sleep", 1..100> 1430 message<t, v> = {type: t, value: v} 1432 When using a generic rule, the formal parameters are bound to the 1433 actual arguments supplied (also using angle brackets), within the 1434 scope of the generic rule (as if there were a rule of the form 1435 parameter = argument). 1437 Generic rules can be used for establishing names for both types and 1438 groups. 1440 (There are some limitations to nesting of generics in the tool 1441 described in Appendix F at this time.) 1443 3.11. Operator Precedence 1445 As with any language that has multiple syntactic features such as 1446 prefix and infix operators, CDDL has operators that bind more tightly 1447 than others. This is more complicated than, say, in ABNF, 1448 as CDDL has both types and groups, with operators that are specific 1449 to these concepts. Type operators (such as "/" for type choice) 1450 operate on types, while group operators (such as "//" for group 1451 choice) operate on groups. Types can simply be used in groups, but 1452 groups need to be bracketed (as arrays or maps) to become types. So, 1453 type operators naturally bind closer than group operators. 1455 For instance, in 1457 t = [group1] 1458 group1 = (a / b // c / d) 1459 a = 1 b = 2 c = 3 d = 4 1461 group1 is a group choice between the type choice of a and b and the 1462 type choice of c and d. This becomes more relevant once member keys 1463 and/or occurrences are added in: 1465 t = {group2} 1466 group2 = (?
ab: a / b // cd: c / d) 1467 a = 1 b = 2 c = 3 d = 4 1469 is a group choice between the optional member "ab" of type a or b and 1470 the member "cd" of type c or d. Note that the optionality is 1471 attached to the first choice ("ab"), not to the second choice. 1473 Similarly, in 1475 t = [group3] 1476 group3 = (+ a / b / c) 1477 a = 1 b = 2 c = 3 1479 group3 is a repetition of a type choice between a, b, and c; if just 1480 a is to be repeatable, a group choice is needed to focus the 1481 occurrence: 1483 (A comment has been that this could be counter-intuitive. The 1484 specification writer is encouraged to use parentheses liberally to 1485 guide readers that are not familiar with CDDL precedence rules.) 1487 t = [group4] 1488 group4 = (+ a // b / c) 1489 a = 1 b = 2 c = 3 1490 group4 is a group choice between a repeatable a and a single b or c. 1492 In general, as with many other languages with operator precedence 1493 rules, it is best not to rely on them, but to insert parentheses for 1494 readability: 1496 t = [group4a] 1497 group4a = ((+ a) // (b / c)) 1498 a = 1 b = 2 c = 3 1500 The operator precedences, in sequence of loose to tight binding, are 1501 defined in Appendix B and summarized in Table 1. (Arities given are 1502 1 for unary prefix operators and 2 for binary infix operators.) 1504 +----------+----+---------------------------+------+ 1505 | Operator | Ar | Operates on | Prec | 1506 +----------+----+---------------------------+------+ 1507 | = | 2 | name = type, name = group | 1 | 1508 | /= | 2 | name /= type | 1 | 1509 | //= | 2 | name //= group | 1 | 1510 | // | 2 | group // group | 2 | 1511 | , | 2 | group, group | 3 | 1512 | * | 1 | * group | 4 | 1513 | N*M | 1 | N*M group | 4 | 1514 | + | 1 | + group | 4 | 1515 | ? | 1 | ? group | 4 | 1516 | => | 2 | type => type | 5 | 1517 | : | 2 | name: type | 5 | 1518 | / | 2 | type / type | 6 | 1519 | .. | 2 | type..type | 7 | 1520 | ... 
| 2 | type...type | 7 | 1521 | .ctrl | 2 | type .ctrl type | 7 | 1522 | & | 1 | &group | 8 | 1523 | ~ | 1 | ~type | 8 | 1524 +----------+----+---------------------------+------+ 1526 Table 1: Summary of operator precedences 1528 4. Making Use of CDDL 1530 In this section, we discuss several potential ways to employ CDDL. 1532 4.1. As a guide to a human user 1534 CDDL can be used to efficiently define the layout of CBOR data, such 1535 that a human implementer can easily see how data is supposed to be 1536 encoded. 1538 Since CDDL maps parts of the CBOR data to human readable names, tools 1539 could be built that use CDDL to provide a human friendly 1540 representation of the CBOR data, and allow users to edit such data 1541 while remaining compliant with its CDDL definition. 1543 4.2. For automated checking of CBOR data structure 1545 CDDL has been specified such that a machine can handle the CDDL 1546 definition and related CBOR data (and, thus, also JSON data). For 1547 example, a machine could use CDDL to check whether or not CBOR data 1548 is compliant with its definition. 1550 The need for thoroughness of such compliance checking depends on the 1551 application. For example, an application may decide not to check the 1552 data structure at all, and use the CDDL definition solely as a means 1553 to indicate the structure of the data to the programmer. 1555 At the other end, the application may also implement a checking 1556 mechanism that goes as far as checking that all mandatory map members 1557 are available. 1559 The extent to which the data description must be enforced by an 1560 application is left to the designers and implementers of that 1561 application, keeping in mind related security considerations. 1563 In no case is the intention that a CDDL tool would be "writing code" 1564 for an implementation. 1566 4.3. For data analysis tools 1568 In the long run, it can be expected that more and more data will be 1569 stored using the CBOR data format.
1571 Where there is data, there is data analysis and the need to process 1572 such data automatically. CDDL can be used for such automated data 1573 processing, allowing tools to verify data, clean it, and extract 1574 particular parts of interest from it. 1576 Since CBOR is designed with constrained devices in mind, a likely use 1577 of it would be in small sensors. An interesting use would thus be 1578 automated analysis of sensor data. 1580 5. Security considerations 1582 This document presents a content rules language for expressing CBOR 1583 data structures. As such, it does not by itself introduce any security issues, 1584 although specifications of protocols that use CBOR naturally 1585 need security analysis when defined. 1587 Topics that could be considered in the security considerations section 1588 of a specification that uses CDDL to define CBOR structures include the following: 1590 o Where could the language cause confusion in a way that 1591 enables security issues? 1593 o Where a CDDL matcher is part of the implementation of a system, 1594 the security of the system ought not to depend on the correctness of 1595 the CDDL specification or CDDL implementation without any further 1596 defenses in place. 1598 o Where the CDDL includes extension points, the impact of extensions 1599 on the security of the system needs to be carefully considered. 1601 Writers of CDDL specifications are strongly encouraged to value 1602 simplicity and transparency of the specification over its elegance. 1603 Keep it as simple as possible while still expressing the needed data 1604 model. 1606 A related observation about formal description techniques in general 1607 that writers of CDDL specifications are strongly recommended to keep in mind: 1608 Just because CDDL makes it easier to handle 1609 complexity in a specification, that does not make that complexity 1610 somehow less bad (except maybe on the level of the humans having to 1611 grasp the complex structure while reading the spec).
1613 6. IANA Considerations 1615 6.1. CDDL control operator registry 1617 IANA is requested to create a registry for control operators 1618 (Section 3.8). The name of this registry is "CDDL Control Operators". 1620 Each entry in the registry must include the name of the control 1621 operator (by convention given with the leading dot) and a reference 1622 to its documentation. Names must be composed of the leading dot 1623 followed by a text string conforming to the production "id" in 1624 Appendix B. 1626 Initial entries in this registry are as follows: 1628 +----------+---------------+ 1629 | name | documentation | 1630 +----------+---------------+ 1631 | .size | [RFCthis] | 1632 | .bits | [RFCthis] | 1633 | .regexp | [RFCthis] | 1634 | .cbor | [RFCthis] | 1635 | .cborseq | [RFCthis] | 1636 | .within | [RFCthis] | 1637 | .and | [RFCthis] | 1638 | .lt | [RFCthis] | 1639 | .le | [RFCthis] | 1640 | .gt | [RFCthis] | 1641 | .ge | [RFCthis] | 1642 | .eq | [RFCthis] | 1643 | .ne | [RFCthis] | 1644 | .default | [RFCthis] | 1645 +----------+---------------+ 1647 All other control operator names are Unassigned. 1649 The IANA policy for additions to this registry is "Specification 1650 Required" as defined in [RFC8126] (which involves an Expert Review) 1651 for names that do not include an internal dot, and "IETF Review" for 1652 names that do include an internal dot. The Expert is specifically 1653 instructed that other Standards Development Organizations (SDOs) may 1654 want to define control operators that are specific to their fields 1655 (e.g., based on a binary syntax already in use at the SDO); the 1656 review process should strive to facilitate such an undertaking. 1658 7. References 1660 7.1. Normative References 1662 [ISO6093] ISO, "Information processing -- Representation of 1663 numerical values in character strings for information 1664 interchange", ISO 6093, 1985.
1666 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1667 Requirement Levels", BCP 14, RFC 2119, 1668 DOI 10.17487/RFC2119, March 1997, 1669 . 1671 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 1672 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 1673 2003, . 1675 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 1676 Specifications: ABNF", STD 68, RFC 5234, 1677 DOI 10.17487/RFC5234, January 2008, 1678 . 1680 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 1681 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 1682 October 2013, . 1684 [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, 1685 DOI 10.17487/RFC7493, March 2015, 1686 . 1688 [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for 1689 Writing an IANA Considerations Section in RFCs", BCP 26, 1690 RFC 8126, DOI 10.17487/RFC8126, June 2017, 1691 . 1693 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1694 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1695 May 2017, . 1697 [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 1698 Interchange Format", STD 90, RFC 8259, 1699 DOI 10.17487/RFC8259, December 2017, 1700 . 1702 [W3C.REC-xmlschema-2-20041028] 1703 Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes 1704 Second Edition", World Wide Web Consortium Recommendation 1705 REC-xmlschema-2-20041028, October 2004, 1706 . 1708 7.2. Informative References 1710 [I-D.bormann-cbor-cddl-freezer] 1711 Bormann, C., "A feature freezer for the Concise Data 1712 Definition Language (CDDL)", draft-bormann-cbor-cddl- 1713 freezer-01 (work in progress), August 2018. 1715 [I-D.ietf-anima-grasp] 1716 Bormann, C., Carpenter, B., and B. Liu, "A Generic 1717 Autonomic Signaling Protocol (GRASP)", draft-ietf-anima- 1718 grasp-15 (work in progress), July 2017. 1720 [I-D.ietf-core-senml] 1721 Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C. 
1722 Bormann, "Sensor Measurement Lists (SenML)", draft-ietf- 1723 core-senml-16 (work in progress), May 2018. 1725 [I-D.newton-json-content-rules] 1726 Newton, A. and P. Cordell, "A Language for Rules 1727 Describing JSON Content", draft-newton-json-content- 1728 rules-09 (work in progress), September 2017. 1730 [PEG] Ford, B., "Parsing expression grammars", Proceedings of 1731 the 31st ACM SIGPLAN-SIGACT symposium on Principles of 1732 programming languages - POPL '04, 1733 DOI 10.1145/964001.964011, 2004. 1735 [RELAXNG] ISO/IEC, "Information technology -- Document Schema 1736 Definition Language (DSDL) -- Part 2: Regular-grammar- 1737 based validation -- RELAX NG", ISO/IEC 19757-2, December 1738 2008. 1740 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 1741 Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, 1742 . 1744 [RFC7071] Borenstein, N. and M. Kucherawy, "A Media Type for 1745 Reputation Interchange", RFC 7071, DOI 10.17487/RFC7071, 1746 November 2013, . 1748 [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", 1749 RFC 7950, DOI 10.17487/RFC7950, August 2016, 1750 . 1752 [RFC8007] Murray, R. and B. Niven-Jenkins, "Content Delivery Network 1753 Interconnection (CDNI) Control Interface / Triggers", 1754 RFC 8007, DOI 10.17487/RFC8007, December 2016, 1755 . 1757 [RFC8152] Schaad, J., "CBOR Object Signing and Encryption (COSE)", 1758 RFC 8152, DOI 10.17487/RFC8152, July 2017, 1759 . 1761 7.3. URIs 1763 [1] https://github.com/cabo/cbor-diag 1765 Appendix A. Examples 1767 This section contains a few examples of structures defined using 1768 CDDL. 1770 The theme for the first example is taken from [RFC7071], which 1771 defines certain JSON structures in English. For a similar example, 1772 it may also be of interest to examine Appendix A of [RFC8007], which 1773 contains a CDDL definition for a JSON structure defined in the main 1774 body of the RFC. 
1776 The second subsection in this appendix translates examples from 1777 [I-D.newton-json-content-rules] into CDDL. 1779 These examples all happen to describe data that is interchanged in 1780 JSON. Examples for CDDL definitions of data that is interchanged in 1781 CBOR can be found in [RFC8152], [I-D.ietf-anima-grasp], or 1782 [I-D.ietf-core-senml]. 1784 A.1. RFC 7071 1786 [RFC7071] defines the Reputon structure for JSON using somewhat 1787 formalized English text. Here is a (somewhat verbose) equivalent 1788 definition using the same terms, but notated in CDDL: 1790 reputation-object = { 1791 reputation-context, 1792 reputon-list 1793 } 1795 reputation-context = ( 1796 application: text 1797 ) 1799 reputon-list = ( 1800 reputons: reputon-array 1801 ) 1803 reputon-array = [* reputon] 1805 reputon = { 1806 rater-value, 1807 assertion-value, 1808 rated-value, 1809 rating-value, 1810 ? conf-value, 1811 ? normal-value, 1812 ? sample-value, 1813 ? gen-value, 1814 ? expire-value, 1815 * ext-value, 1816 } 1818 rater-value = ( rater: text ) 1819 assertion-value = ( assertion: text ) 1820 rated-value = ( rated: text ) 1821 rating-value = ( rating: float16 ) 1822 conf-value = ( confidence: float16 ) 1823 normal-value = ( normal-rating: float16 ) 1824 sample-value = ( sample-size: uint ) 1825 gen-value = ( generated: uint ) 1826 expire-value = ( expires: uint ) 1827 ext-value = ( text => any ) 1829 An equivalent, more compact form of this example would be: 1831 reputation-object = { 1832 application: text 1833 reputons: [* reputon] 1834 } 1836 reputon = { 1837 rater: text 1838 assertion: text 1839 rated: text 1840 rating: float16 1841 ? confidence: float16 1842 ? normal-rating: float16 1843 ? sample-size: uint 1844 ? generated: uint 1845 ? expires: uint 1846 * text => any 1847 } 1849 Note how this rather clearly delineates the structure somewhat 1850 shrouded by so many words in section 6.2.2. of [RFC7071]. 
Also, this 1851 definition makes it clear that several ext-values are allowed (by 1852 definition with different member names); RFC 7071 could be read to 1853 forbid the repetition of ext-value ("A specific reputon-element MUST 1854 NOT appear more than once" is ambiguous.) 1856 The CDDL tool (which hasn't quite been trained for polite 1857 conversation) says: 1859 { 1860 "application": "tridentiferous", 1861 "reputons": [ 1862 { 1863 "rater": "loamily", 1864 "assertion": "Dasyprocta", 1865 "rated": "uncommensurableness", 1866 "rating": 0.05055809746548934, 1867 "confidence": 0.7484706448605812, 1868 "normal-rating": 0.8677887734049299, 1869 "sample-size": 4059, 1870 "expires": 3969, 1871 "bearer": "nitty", 1872 "faucal": "postulnar", 1873 "naturalism": "sarcotic" 1874 }, 1875 { 1876 "rater": "precreed", 1877 "assertion": "xanthosis", 1878 "rated": "balsamy", 1879 "rating": 0.36091333590593955, 1880 "confidence": 0.3700759808403371, 1881 "sample-size": 3904 1882 }, 1883 { 1884 "rater": "urinosexual", 1885 "assertion": "malacostracous", 1886 "rated": "arenariae", 1887 "rating": 0.9210673488013762, 1888 "normal-rating": 0.4778762617112776, 1889 "sample-size": 4428, 1890 "generated": 3294, 1891 "backfurrow": "enterable", 1892 "fruitgrower": "flannelflower" 1893 }, 1894 { 1895 "rater": "pedologistically", 1896 "assertion": "unmetaphysical", 1897 "rated": "elocutionist", 1898 "rating": 0.42073613384304287, 1899 "misimagine": "retinaculum", 1900 "snobbish": "contradict", 1901 "Bosporanic": "periostotomy", 1902 "dayworker": "intragyral" 1903 } 1904 ] 1905 } 1907 A.2. Examples from JSON Content Rules 1909 Although JSON Content Rules [I-D.newton-json-content-rules] seems to 1910 address a more general problem than CDDL, it is still a worthwhile 1911 resource to explore for examples (beyond all the inspiration the 1912 format itself has had for CDDL). 
1914 Figure 2 of the JCR I-D looks very similar, if slightly less noisy, 1915 in CDDL: 1917 root = [2*2 { 1918 precision: text, 1919 Latitude: float, 1920 Longitude: float, 1921 Address: text, 1922 City: text, 1923 State: text, 1924 Zip: text, 1925 Country: text 1926 }] 1928 Figure 13: JCR, Figure 2, in CDDL 1930 Apart from the lack of a need to quote the member names, text strings 1931 are called "text" or "tstr" in CDDL ("string" would be ambiguous as 1932 CBOR also provides byte strings). 1934 The CDDL tool creates the below example instance for this: 1936 [{"precision": "pyrosphere", "Latitude": 0.5399712314350172, 1937 "Longitude": 0.5157523963028087, "Address": "resow", 1938 "City": "problemwise", "State": "martyrlike", "Zip": "preprove", 1939 "Country": "Pace"}, 1940 {"precision": "unrigging", "Latitude": 0.10422704368372193, 1941 "Longitude": 0.6279808663725834, "Address": "picturedom", 1942 "City": "decipherability", "State": "autometry", "Zip": "pout", 1943 "Country": "wimple"}] 1945 Figure 4 of the JCR I-D in CDDL: 1947 root = { image } 1949 image = ( 1950 Image: { 1951 size, 1952 Title: text, 1953 thumbnail, 1954 IDs: [* int] 1955 } 1956 ) 1958 size = ( 1959 Width: 0..1280 1960 Height: 0..1024 1961 ) 1963 thumbnail = ( 1964 Thumbnail: { 1965 size, 1966 Url: ~uri 1967 } 1968 ) 1970 This shows how the group concept can be used to keep related elements 1971 (here: width, height) together, and to emulate the JCR style of 1972 specification. (It also shows referencing a type by unwrapping a tag 1973 from the prelude, "uri" - this could be done differently.) 
The more 1974 compact form of Figure 5 of the JCR I-D could be emulated like this: 1976 root = { 1977 Image: { 1978 size, Title: text, 1979 Thumbnail: { size, Url: ~uri }, 1980 IDs: [* int] 1981 } 1982 } 1984 size = ( 1985 Width: 0..1280, 1986 Height: 0..1024, 1987 ) 1989 The CDDL tool creates the below example instance for this: 1991 {"Image": {"Width": 566, "Height": 516, "Title": "leisterer", 1992 "Thumbnail": {"Width": 1111, "Height": 176, "Url": 32("scrog")}, 1993 "IDs": []}} 1995 Appendix B. ABNF grammar 1997 The following is a formal definition of the CDDL syntax in Augmented 1998 Backus-Naur Form (ABNF, [RFC5234]). 2000 cddl = S 1*(rule S) 2001 rule = typename [genericparm] S assignt S type 2002 / groupname [genericparm] S assigng S grpent 2004 typename = id 2005 groupname = id 2007 assignt = "=" / "/=" 2008 assigng = "=" / "//=" 2010 genericparm = "<" S id S *("," S id S ) ">" 2011 genericarg = "<" S type1 S *("," S type1 S ) ">" 2013 type = type1 *(S "/" S type1) 2015 type1 = type2 [S (rangeop / ctlop) S type2] 2017 type2 = value 2018 / typename [genericarg] 2019 / "(" S type S ")" 2020 / "{" S group S "}" 2021 / "[" S group S "]" 2022 / "~" S typename [genericarg] 2023 / "&" S "(" S group S ")" 2024 / "&" S groupname [genericarg] 2025 / "#" "6" ["." uint] "(" S type S ")" ; note no space! 2026 / "#" DIGIT ["." uint] ; major/ai 2027 / "#" ; any 2029 rangeop = "..." / ".." 2031 ctlop = "." id 2033 group = grpchoice *(S "//" S grpchoice) 2035 grpchoice = *(grpent optcom) 2037 grpent = [occur S] [memberkey S] type 2038 / [occur S] groupname [genericarg] ; preempted by above 2039 / [occur S] "(" S group S ")" 2041 memberkey = type1 S ["^" S] "=>" 2042 / bareword S ":" 2043 / value S ":" 2045 bareword = id 2047 optcom = S ["," S] 2049 occur = [uint] "*" [uint] 2050 / "+" 2051 / "?" 
2053 uint = DIGIT1 *DIGIT 2054 / "0x" 1*HEXDIG 2055 / "0b" 1*BINDIG 2056 / "0" 2058 value = number 2059 / text 2060 / bytes 2062 int = ["-"] uint 2064 ; This is a float if it has fraction or exponent; int otherwise 2065 number = hexfloat / (int ["." fraction] ["e" exponent ]) 2066 hexfloat = "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent 2067 fraction = 1*DIGIT 2068 exponent = ["+"/"-"] 1*DIGIT 2070 text = %x22 *SCHAR %x22 2071 SCHAR = %x20-21 / %x23-5B / %x5D-10FFFD / SESC 2072 SESC = "\" %x20-10FFFD 2074 bytes = [bsqual] %x27 *BCHAR %x27 2075 BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF 2076 bsqual = "h" / "b64" 2078 id = EALPHA *(*("-" / ".") (EALPHA / DIGIT)) 2079 ALPHA = %x41-5A / %x61-7A 2080 EALPHA = ALPHA / "@" / "_" / "$" 2081 DIGIT = %x30-39 2082 DIGIT1 = %x31-39 2083 HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F" 2084 BINDIG = %x30-31 2086 S = *WS 2087 WS = SP / NL 2088 SP = %x20 2089 NL = COMMENT / CRLF 2090 COMMENT = ";" *PCHAR CRLF 2091 PCHAR = %x20-10FFFD 2092 CRLF = %x0A / %x0D.0A 2094 Figure 14: CDDL ABNF 2096 Note that this ABNF does not attempt to reflect the detailed rules of 2097 what can be in a prefixed byte string. 2099 Appendix C. Matching rules 2101 In this appendix, we go through the ABNF syntax rules defined in 2102 Appendix B and briefly describe the matching semantics of each 2103 syntactic feature. In this context, an instance (data item) 2104 "matches" a CDDL specification if it is allowed by the CDDL 2105 specification; this is then broken down to parts of specifications 2106 (type and group expressions) and parts of instances (data items). 2108 cddl = S 1*rule 2110 A CDDL specification is a sequence of one or more rules. Each rule 2111 gives a name to a right hand side expression, either a CDDL type or a 2112 CDDL group. Rule names can be used in the rule itself and/or other 2113 rules (and tools can output warnings if that is not the case). The 2114 order of the rules is significant only in two cases: 2116 1. 
The first rule defines the semantics of the entire specification; 2117 hence, there is no need to give that root rule a special name or 2118 special syntax in the language (as, e.g., with "start" in Relax- 2119 NG); its name can therefore be chosen to be descriptive. (As 2120 with all other rule names, the name of the initial rule may be 2121 used in itself or in other rules.) 2123 2. Where a rule contributes to a type or group choice (using "/=" or 2124 "//="), that choice is populated in the order the rules are 2125 given; see below. 2127 rule = typename [genericparm] S assignt S type S 2128 / groupname [genericparm] S assigng S grpent S 2130 typename = id 2131 groupname = id 2133 A rule defines a name for a type expression (production "type") or 2134 for a group expression (production "grpent"), with the intention that 2135 the semantics do not change when the name is replaced by its 2136 (parenthesized if needed) definition. Note that whether the name 2137 defined by a rule stands for a type or a group isn't always 2138 determined by syntax alone: e.g., "a = b" can make "a" a type if "b" 2139 is a type, or a group if "b" is a group. More subtly, in "a = (b)", "a" may 2140 be used as a type if "b" is a type, or as a group both when "b" is a 2141 group and when "b" is a type (a good convention to make the latter 2142 case stand out to the human reader is to write "a = (b,)"). (Note 2143 that the same dual meaning of parentheses applies within an 2144 expression, but often can be resolved by the context of the 2145 parenthesized expression. On the more general point, it may also not be 2146 immediately clear whether "b" stands for a group or a type -- 2147 this semantic processing may need to span several levels of rule 2148 definitions before a determination can be made.) 2150 assignt = "=" / "/=" 2151 assigng = "=" / "//=" 2153 A plain equals sign defines the rule name as the equivalent of the 2154 expression to the right.
A "/=" or "//=" extends a named type or a 2155 group by additional choices; a number of these could be replaced by 2156 collecting all the right hand sides and creating a single rule with a 2157 type choice or a group choice built from the right hand sides in the 2158 order of the rules given. (It is not an error to extend a rule name 2159 that has not yet been defined; this makes the right hand side the 2160 first entry in the choice being created.) 2162 genericparm = "<" S id S *("," S id S ) ">" 2163 genericarg = "<" S type1 S *("," S type1 S ) ">" 2165 Rule names can have generic parameters, which cause temporary 2166 assignments within the right hand sides to the parameter names from 2167 the arguments given when citing the rule name. 2169 type = type1 S *("/" S type1 S) 2171 A type can be given as a choice between one or more types. The 2172 choice matches a data item if the data item matches any one of the 2173 types given in the choice. The choice uses Parsing Expression 2174 Grammar [PEG] semantics: The first choice that matches wins. (As a 2175 result, the order of rules that contribute to a single rule name can 2176 very well matter.) 2178 type1 = type2 [S (rangeop / ctlop) S type2] 2180 Two types can be combined with a range operator (which see below) or 2181 a control operator (see Section 3.8). 
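The first-match-wins behaviour of a type choice can be sketched in Python. This is only an illustration under simplifying assumptions: each alternative is modelled as a hypothetical (name, predicate) pair, and the helper names (match_choice, is_uint, is_nint, is_tstr) are invented here and are not part of CDDL or of any CDDL tool.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical model: one alternative of a type choice is a name plus a
# predicate that decides whether a data item matches that alternative.
Alternative = Tuple[str, Callable[[Any], bool]]


def is_uint(item: Any) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(item, int) and not isinstance(item, bool) and item >= 0


def is_nint(item: Any) -> bool:
    return isinstance(item, int) and not isinstance(item, bool) and item < 0


def is_tstr(item: Any) -> bool:
    return isinstance(item, str)


def match_choice(alternatives: List[Alternative], item: Any) -> Optional[str]:
    """Return the name of the first alternative that matches, else None."""
    for name, predicate in alternatives:
        if predicate(item):
            return name  # first match wins; later alternatives are not tried
    return None


# Models the type choice "uint / nint / tstr".
INT_OR_TEXT: List[Alternative] = [
    ("uint", is_uint),
    ("nint", is_nint),
    ("tstr", is_tstr),
]
```

Appending a further pair to the list corresponds to extending the choice with "/=": the new alternative is only consulted after all earlier ones have failed.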
2183 type2 = value 2184 A type can be just a single value (such as 1 or "icecream" or 2185 h'0815'), which matches only a data item with that specific value (no 2186 conversions defined), 2188 / typename [genericarg] 2190 or be defined by a rule giving a meaning to a name (possibly after 2191 supplying generic arguments as required by the generic parameters), 2193 / "(" S type S ")" 2195 or be defined in a parenthesized type expression (parentheses may be 2196 necessary to override some operator precedence), or 2198 / "{" S group S "}" 2200 a map expression, which matches a valid CBOR map the key/value pairs 2201 of which can be ordered in such a way that the resulting sequence 2202 matches the group expression, or 2204 / "[" S group S "]" 2206 an array expression, which matches a CBOR array the elements of 2207 which, when taken as values and complemented by a wildcard (matches 2208 anything) key each, match the group, or 2210 / "~" S typename [genericarg] 2212 an "unwrapped" group (see Section 3.7), which matches the group 2213 inside a type defined as a map or an array by wrapping the group, or 2215 / "&" S "(" S group S ")" 2216 / "&" S groupname [genericarg] 2218 an enumeration expression, which matches any value that is within 2219 the set of values that the values of the group given can take, or 2221 / "#" "6" ["." uint] "(" S type S ")" ; note no space! 2223 a tagged data item, tagged with the "uint" given and containing the 2224 type given as the tagged value, or 2226 / "#" DIGIT ["." uint] ; major/ai 2228 a data item of a major type (given by the DIGIT), optionally 2229 constrained to the additional information given by the uint, or 2231 / "#" ; any 2233 any data item. 2235 rangeop = "..." / ".."
2237 A range operator can be used to join two type expressions that stand 2238 for either two integer values or two floating point values; it 2239 matches any value that is between the two values, where the first 2240 value is always included in the matching set and the second value is 2241 included for ".." and excluded for "...". 2243 ctlop = "." id 2245 A control operator ties a _target_ type to a _controller_ type as 2246 defined in Section 3.8. Note that control operators are an extension 2247 point for CDDL; additional documents may want to define additional 2248 control operators. 2250 group = grpchoice S *("//" S grpchoice S) 2252 A group matches any sequence of key/value pairs that matches any of 2253 the choices given (again using Parsing Expression Grammar semantics). 2255 grpchoice = *grpent 2257 Each of the component groups is given as a sequence of group entries. 2258 For a match, the sequence of key/value pairs given needs to match the 2259 sequence of group entries in the sequence given. 2261 grpent = [occur S] [memberkey S] type optcom 2263 A group entry can be given by a value type, which needs to be matched 2264 by the value part of a single element, and optionally a memberkey 2265 type, which needs to be matched by the key part of the element, if 2266 the memberkey is given. If the memberkey is not given, the entry can 2267 only be used for matching arrays, not for maps. (See below how that 2268 is modified by the occurrence indicator.) 2270 / [occur S] groupname [genericarg] optcom ; preempted by above 2272 A group entry can be built from a named group, or 2274 / [occur S] "(" S group S ")" optcom 2276 from a parenthesized group, again with a possible occurrence 2277 indicator. 
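The range operator semantics described above (lower bound always included, upper bound included for ".." and excluded for "...") can be sketched in Python. The function name matches_range is hypothetical. The sketch further assumes the CBOR generic data model, in which integers and floating point values are distinct, so an integer range never matches a float and vice versa.

```python
from typing import Union

Number = Union[int, float]


def matches_range(value: object, low: Number, high: Number, op: str = "..") -> bool:
    """Check a value against 'low..high' (inclusive) or 'low...high' (exclusive)."""
    if op not in ("..", "..."):
        raise ValueError("range operator must be '..' or '...'")
    # Both bounds must be integers, or both must be floating point values.
    if type(low) is not type(high) or type(low) not in (int, float):
        raise TypeError("bounds must be two integers or two floats")
    # Assumption: the value must be of the same kind as the bounds
    # (this also rejects bool, which Python treats as an int subclass).
    if type(value) is not type(low):
        return False
    if op == "..":
        return low <= value <= high
    return low <= value < high
```

With the "Width: 0..1280" entry from Appendix A.2 in mind, matches_range(1280, 0, 1280) holds, while the same value fails against the exclusive form matches_range(1280, 0, 1280, "...").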
2279 memberkey = type1 S ["^" S] "=>" 2280 / bareword S ":" 2281 / value S ":" 2283 Key types can be given by a type expression, a bareword (which stands 2284 for a type that just contains a string value created from this 2285 bareword), or a value (which stands for a type that just contains 2286 this value). A key value matches its key type if the key value is a 2287 member of the key type, unless a cut preceding it in the group 2288 applies (see Section 3.5.4 for how map matching is influenced by the 2289 presence of the cuts denoted by "^" or ":" in previous entries). 2291 bareword = id 2293 A bareword is an alternative way to write a type with a single text 2294 string value; it can only be used in the syntactic context given 2295 above. 2297 optcom = S ["," S] 2299 (Optional commas do not influence the matching.) 2301 occur = [uint] "*" [uint] 2302 / "+" 2303 / "?" 2305 An occurrence indicator modifies the group given to its right by 2306 requiring the group to match the sequence to be matched exactly a 2307 certain number of times in succession (see Section 3.2); i.e., it acts 2308 as a (possibly infinite) group choice containing one choice for each 2309 allowed number of repetitions of the group. 2311 The rest of the ABNF describes syntax for value notation that should 2312 be familiar from programming languages, with the possible exception 2313 of h'..' and b64'..' for byte strings, as well as syntactic elements 2314 such as comments and line ends. 2316 Appendix D. Standard Prelude 2318 The following prelude is automatically added to each CDDL file. 2319 (Note that technically, it is a postlude, as it does not disturb the 2320 selection of the first rule as the root of the definition.)
2321 any = # 2323 uint = #0 2324 nint = #1 2325 int = uint / nint 2327 bstr = #2 2328 bytes = bstr 2329 tstr = #3 2330 text = tstr 2332 tdate = #6.0(tstr) 2333 time = #6.1(number) 2334 number = int / float 2335 biguint = #6.2(bstr) 2336 bignint = #6.3(bstr) 2337 bigint = biguint / bignint 2338 integer = int / bigint 2339 unsigned = uint / biguint 2340 decfrac = #6.4([e10: int, m: integer]) 2341 bigfloat = #6.5([e2: int, m: integer]) 2342 eb64url = #6.21(any) 2343 eb64legacy = #6.22(any) 2344 eb16 = #6.23(any) 2345 encoded-cbor = #6.24(bstr) 2346 uri = #6.32(tstr) 2347 b64url = #6.33(tstr) 2348 b64legacy = #6.34(tstr) 2349 regexp = #6.35(tstr) 2350 mime-message = #6.36(tstr) 2351 cbor-any = #6.55799(any) 2353 float16 = #7.25 2354 float32 = #7.26 2355 float64 = #7.27 2356 float16-32 = float16 / float32 2357 float32-64 = float32 / float64 2358 float = float16-32 / float64 2360 false = #7.20 2361 true = #7.21 2362 bool = false / true 2363 nil = #7.22 2364 null = nil 2365 undefined = #7.23 2367 Figure 15: CDDL Prelude 2369 Note that the prelude is deemed to be fixed. This means, for 2370 instance, that additional tags beyond [RFC7049], as registered, need 2371 to be defined in each CDDL file that is using them. 2373 A common stumbling point is that the prelude does not define a type 2374 "string". CBOR has byte strings ("bytes" in the prelude) and text 2375 strings ("text"), so a type that is simply called "string" would be 2376 ambiguous. 2378 Appendix E. Use with JSON 2380 The JSON generic data model (implicit in [RFC8259]) is a subset of 2381 the generic data model of CBOR. So one can use CDDL with JSON by 2382 limiting oneself to what can be represented in JSON. 
Roughly 2383 speaking, this means leaving out byte strings, tags, and simple 2384 values other than "false", "true", and "null", leading to the 2385 following limited prelude: 2387 any = # 2389 uint = #0 2390 nint = #1 2391 int = uint / nint 2393 tstr = #3 2394 text = tstr 2396 number = int / float 2398 float16 = #7.25 2399 float32 = #7.26 2400 float64 = #7.27 2401 float16-32 = float16 / float32 2402 float32-64 = float32 / float64 2403 float = float16-32 / float64 2405 false = #7.20 2406 true = #7.21 2407 bool = false / true 2408 nil = #7.22 2409 null = nil 2411 Figure 16: JSON-compatible subset of CDDL Prelude 2413 (The major types given here do not have a direct meaning in JSON, but 2414 they can be interpreted as CBOR major types translated through 2415 Section 4 of [RFC7049].) 2416 There are a few fine points in using CDDL with JSON. First, JSON 2417 does not distinguish between integers and floating point numbers; 2418 there is only one kind of number (which may happen to be integral). 2419 In this context, specifying a type as "uint", "nint" or "int" then 2420 becomes a predicate that the number be integral. As an example, this 2421 means that the following JSON numbers all match "uint": 2423 10 10.0 1e1 1.0e1 100e-1 2425 (The fact that these are all integers may be surprising to users 2426 accustomed to the long tradition in programming languages of using 2427 decimal points or exponents in a number to indicate a floating point 2428 literal.) 2430 CDDL distinguishes the various CBOR number types, but there is only 2431 one number type in JSON. The effect of specifying a floating point 2432 precision (float16/float32/float64) is only to restrict the set of 2433 permissible values to those expressible with binary16/binary32/ 2434 binary64; this is unlikely to be very useful when using CDDL for 2435 specifying JSON data structures.
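The "integral number" predicate that "uint" becomes for JSON can be sketched in Python by evaluating the number literal exactly with decimal arithmetic. The function name json_number_matches_uint is hypothetical, and the sketch assumes that its argument is already a syntactically valid JSON number literal; it does not validate JSON syntax itself. The upper bound reflects the 64-bit unsigned range of CBOR major type 0.

```python
from decimal import Decimal

# Largest value representable in CBOR major type 0 (64-bit unsigned).
UINT_MAX = 2**64 - 1


def json_number_matches_uint(literal: str) -> bool:
    """Return True if a JSON number literal denotes an integer in 0..2**64-1."""
    value = Decimal(literal)  # exact decimal value, no binary rounding
    return (
        value.is_finite()
        and value == value.to_integral_value()  # no fractional part remains
        and 0 <= value <= UINT_MAX
    )
```

Applied to the five literals listed above (10, 10.0, 1e1, 1.0e1, 100e-1), the predicate holds in each case, whereas a literal such as 10.5 or -1 is rejected.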
2437 Fundamentally, the number system of JSON itself is based on decimal 2438 numbers and decimal fractions and does not have limits to its 2439 precision or range. In practice, JSON numbers are often parsed into 2440 a number type that is called float64 here, creating a number of 2441 limitations to the generic data model [RFC7493]. In particular, this 2442 means that integers can only be expressed with interoperable 2443 exactness when they lie in the range [-(2**53)+1, (2**53)-1] -- a 2444 smaller range than that covered by CDDL "int". 2446 JSON applications that want to stay compatible with I-JSON therefore 2447 may want to define integer types with more limited ranges, such as in 2448 Figure 17. Note that the types given here are not part of the 2449 prelude; they need to be copied into the CDDL specification if 2450 needed. 2452 ij-uint = 0..9007199254740991 2453 ij-nint = -9007199254740991..-1 2454 ij-int = -9007199254740991..9007199254740991 2456 Figure 17: I-JSON types for CDDL (not part of prelude) 2458 JSON applications that do not need to stay compatible with I-JSON and 2459 that actually may need to go beyond the 64-bit unsigned and negative 2460 integers supported by "int" (= "uint"/"nint") may want to use the 2461 following additional types from the standard prelude, which are 2462 expressed in terms of tags but can straightforwardly be mapped into 2463 JSON (but not I-JSON) numbers: 2465 biguint = #6.2(bstr) 2466 bignint = #6.3(bstr) 2467 bigint = biguint / bignint 2468 integer = int / bigint 2469 unsigned = uint / biguint 2471 CDDL at this point does not have a way to express the unlimited 2472 floating point precision that is theoretically possible with JSON; at 2473 the time of writing, this is rarely used in protocols in practice. 
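The I-JSON integer limits of Figure 17 can be checked numerically. The following Python sketch uses hypothetical helper names (is_ij_uint, is_ij_nint, is_ij_int) that mirror the ij-uint, ij-nint and ij-int types; it also shows why the limit sits at (2**53)-1 when numbers are parsed into float64.

```python
# Largest integer magnitude with interoperable exactness in I-JSON.
IJ_MAX = 2**53 - 1  # 9007199254740991, the bound used in Figure 17


def _is_plain_int(n: object) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(n, int) and not isinstance(n, bool)


def is_ij_uint(n: object) -> bool:
    """Mirror of ij-uint = 0..9007199254740991."""
    return _is_plain_int(n) and 0 <= n <= IJ_MAX


def is_ij_nint(n: object) -> bool:
    """Mirror of ij-nint = -9007199254740991..-1."""
    return _is_plain_int(n) and -IJ_MAX <= n <= -1


def is_ij_int(n: object) -> bool:
    """Mirror of ij-int = -9007199254740991..9007199254740991."""
    return _is_plain_int(n) and -IJ_MAX <= n <= IJ_MAX


# Beyond the limit, distinct integers collapse to the same float64 value.
collapses = float(2**53) == float(2**53 + 1)
```

Here the value of collapses is True: 2**53 and 2**53 + 1 are no longer distinguishable once converted to float64, which is the loss of exactness that the narrower ranges avoid.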
2475 Note that a data model described in CDDL is always restricted by what 2476 can be expressed in the serialization; e.g., floating point values 2477 such as NaN (not a number) and the infinities cannot be represented 2478 in JSON even if they are allowed in the CDDL generic data model. 2480 Appendix F. The CDDL tool 2482 A rough CDDL tool is available. For CDDL specifications, it can 2483 check the syntax, generate one or more instances (expressed in CBOR 2484 diagnostic notation or in pretty-printed JSON), and validate an 2485 existing instance against the specification: 2487 Usage: 2488 cddl spec.cddl generate [n] 2489 cddl spec.cddl json-generate [n] 2490 cddl spec.cddl validate instance.cbor 2491 cddl spec.cddl validate instance.json 2493 Figure 18: CDDL tool usage 2495 Install on a system with a modern Ruby via: 2497 gem install cddl 2499 Figure 19: CDDL tool installation 2501 The accompanying CBOR diagnostic tools (which are automatically 2502 installed by the above) are described in https://github.com/cabo/ 2503 cbor-diag [1]; they can be used to convert between binary CBOR, a 2504 pretty-printed form of that, CBOR diagnostic notation, JSON, and 2505 YAML. 2507 Appendix G. Extended Diagnostic Notation 2509 Section 6 of [RFC7049] defines a "diagnostic notation" in order to be 2510 able to converse about CBOR data items without having to resort to 2511 binary data. Diagnostic notation is based on JSON, with extensions 2512 for representing CBOR constructs such as binary data and tags. 2514 (Standardizing this together with the actual interchange format does 2515 not serve to create another interchange format, but enables the use 2516 of a shared diagnostic notation in tools for and documents about 2517 CBOR.) 2519 This section discusses a few extensions to the diagnostic notation 2520 that have turned out to be useful since RFC 7049 was written. We 2521 refer to the result as extended diagnostic notation (EDN). 2523 G.1. 
White space in byte string notation 2525 Examples often benefit from some white space (spaces, line breaks) in 2526 byte strings. In extended diagnostic notation, white space is 2527 ignored in prefixed byte strings; for instance, the following are 2528 equivalent: 2530 h'48656c6c6f20776f726c64' 2531 h'48 65 6c 6c 6f 20 77 6f 72 6c 64' 2532 h'4 86 56c 6c6f 2533 20776 f726c64' 2535 G.2. Text in byte string notation 2537 Diagnostic notation notates byte strings in one of the [RFC4648] base 2538 encodings, enclosed in single quotes, prefixed by >h< for base16, 2539 >b32< for base32, >h32< for base32hex, >b64< for base64 or base64url. 2540 Quite often, byte strings carry bytes that are meaningfully 2541 interpreted as UTF-8 text. Extended Diagnostic Notation allows the 2542 use of single quotes without a prefix to express byte strings with 2543 UTF-8 text; for instance, the following are equivalent: 2545 'hello world' 2546 h'68656c6c6f20776f726c64' 2548 The escaping rules of JSON strings are applied equivalently for text- 2549 based byte strings, e.g., \\ stands for a single backslash and \' 2550 stands for a single quote. White space is included literally, i.e., 2551 the rule of the previous section does not apply to text-based byte strings. 2553 G.3. Embedded CBOR and CBOR sequences in byte strings 2555 Where a byte string is to carry an embedded CBOR-encoded item, or 2556 more generally a sequence of zero or more such items, the diagnostic 2557 notation for these zero or more CBOR data items, separated by 2558 commas, can be enclosed in << and >> to notate the byte string 2559 resulting from encoding the data items and concatenating the result. 2560 For instance, the two entries in each of the following rows are equivalent: 2562 <<1>> h'01' 2563 <<1, 2>> h'0102' 2564 <<"foo", null>> h'63666F6FF6' 2565 <<>> h'' 2567 G.4.
Concatenated Strings 2569 While the ability to include white space enables line-breaking of 2570 encoded byte strings, a mechanism is needed to be able to include 2571 text strings as well as byte strings in direct UTF-8 representation 2572 into line-based documents (such as RFCs and source code). 2574 We extend the diagnostic notation by allowing multiple text strings 2575 or multiple byte strings to be notated separated by white space, 2576 these are then concatenated into a single text or byte string, 2577 respectively. Text strings and byte strings do not mix within such a 2578 concatenation, except that byte string notation can be used inside a 2579 sequence of concatenated text string notation to encode characters 2580 that may be better represented in an encoded way. The following four 2581 values are equivalent: 2583 "Hello world" 2584 "Hello " "world" 2585 "Hello" h'20' "world" 2586 "" h'48656c6c6f20776f726c64' "" 2588 Similarly, the following byte string values are equivalent 2590 'Hello world' 2591 'Hello ' 'world' 2592 'Hello ' h'776f726c64' 2593 'Hello' h'20' 'world' 2594 '' h'48656c6c6f20776f726c64' '' b64'' 2595 h'4 86 56c 6c6f' h' 20776 f726c64' 2597 (Note that the approach of separating by whitespace, while familiar 2598 from the C language, requires some attention - a single comma makes a 2599 big difference here.) 2601 G.5. Hexadecimal, octal, and binary numbers 2603 In addition to JSON's decimal numbers, EDN provides hexadecimal, 2604 octal and binary numbers in the usual C-language notation (octal with 2605 0o prefix present only). 2607 The following are equivalent: 2609 4711 2610 0x1267 2611 0o11147 2612 0b1001001100111 2614 As are: 2616 1.5 2617 0x1.8p0 2618 0x18p-4 2620 G.6. Comments 2622 Longer pieces of diagnostic notation may benefit from comments. JSON 2623 famously does not provide for comments, and basic RFC 7049 diagnostic 2624 notation inherits this property. 
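
   The lack of comments in JSON is easy to observe with any strict JSON
   parser.  The following is a minimal sketch using Python's standard
   json module; the helper name parses_as_json is hypothetical and
   introduced only for this illustration.  It shows a plain array being
   accepted and the same array with slash-delimited annotations being
   rejected.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is accepted by Python's strict JSON parser.

    Hypothetical helper, used only to illustrate the point above.
    """
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


plain = '[1, 10584416]'
annotated = '[/M_DISCOVERY/ 1, /session-id/ 10584416]'

print(parses_as_json(plain))      # True
print(parses_as_json(annotated))  # False: "/" cannot start a JSON value
```

   A tool that accepts annotated input therefore has to treat such
   annotations as an extension of its own rather than rely on a JSON
   parser to skip them.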

   In extended diagnostic notation, comments can be included, delimited
   by slashes ("/").  Any text within and including a pair of slashes is
   considered a comment.

   Comments are considered white space.  Hence, they are allowed in
   prefixed byte strings; for instance, the following are equivalent:

      h'68656c6c6f20776f726c64'
      h'68 65 6c /doubled l!/ 6c 6f /hello/
        20 /space/
        77 6f 72 6c 64' /world/

   This can be used to annotate a CBOR structure as in:

      /grasp-message/ [/M_DISCOVERY/ 1, /session-id/ 10584416,
                       /objective/ [/objective-name/ "opsonize",
                                    /D, N, S/ 7, /loop-count/ 105]]

   (There are currently no end-of-line comments.  If we want to add
   them, "//" sounds like a reasonable delimiter given that we already
   use slashes for comments, but we could also go for, e.g., "#".)

Contributors

   CDDL was originally conceived by Bert Greevenbosch, who also wrote
   the original five versions of this document.

Acknowledgements

   Inspiration was taken from the C and Pascal languages, MPEG's
   conventions for describing structures in the ISO base media file
   format, Relax-NG and its compact syntax [RELAXNG], and in particular
   from Andrew Lee Newton's "JSON Content Rules"
   [I-D.newton-json-content-rules].

   Lots of highly useful feedback came from members of the IETF CBOR WG,
   in particular Ari Keraenen, Brian Carpenter, Burt Harris, Jeffrey
   Yasskin, Jim Hague, Jim Schaad, Joe Hildebrand, Max Pritikin, Michael
   Richardson, Pete Cordell, Sean Leonard, and Yaron Sheffer.  Also,
   Francesca Palombini and Joe volunteered to chair the WG when it was
   created, providing the framework for generating and processing this
   feedback; with Barry Leiba having taken over from Joe since.

   The CDDL tool was written by Carsten Bormann, building on previous
   work by Troy Heninger and Tom Lord.

Authors' Addresses

   Henk Birkholz
   Fraunhofer SIT
   Rheinstrasse 75
   Darmstadt  64295
   Germany

   Email: henk.birkholz@sit.fraunhofer.de


   Christoph Vigano
   Universitaet Bremen

   Email: christoph.vigano@uni-bremen.de


   Carsten Bormann
   Universitaet Bremen TZI
   Bibliothekstr. 1
   Bremen  D-28359
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org