Network Working Group                                        H. Birkholz
Internet-Draft                                            Fraunhofer SIT
Intended status: Informational                                 C. Vigano
Expires: September 14, 2017                          Universitaet Bremen
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                          March 13, 2017


CBOR data definition language (CDDL): a notational convention to express
                          CBOR data structures
                draft-greevenbosch-appsawg-cbor-cddl-10

Abstract

   This document proposes a notational convention to express CBOR data
   structures (RFC 7049).
   Its main goal is to provide an easy and
   unambiguous way to express structures for protocol messages and data
   formats that use CBOR.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
     1.2.  Terminology
   2.  The Style of Data Structure Specification
     2.1.  Groups and Composition in CDDL
       2.1.1.  Usage
       2.1.2.  Syntax
     2.2.  Types
       2.2.1.  Values
       2.2.2.  Choices
       2.2.3.  Representation Types
       2.2.4.  Root type
   3.  Syntax
     3.1.  General conventions
     3.2.  Occurrence
     3.3.  Predefined names for types
     3.4.  Arrays
     3.5.  Maps
       3.5.1.  Structs
       3.5.2.  Tables
     3.6.  Tags
     3.7.  Annotations
       3.7.1.  Annotation .size
       3.7.2.  Annotation .bits
       3.7.3.  Annotation .regexp
       3.7.4.  Annotations .cbor and .cborseq
       3.7.5.  Annotations .within and .and
       3.7.6.  Annotations .lt, .le, .gt, .ge, .eq, .ne, and .default
     3.8.  Socket/Plug
     3.9.  Operator Precedence
   4.  Examples
     4.1.  Fruit
     4.2.  RFC 7071
     4.3.  Examples from JSON Content Rules
   5.  Making Use of CDDL
     5.1.  As a guide to a human user
     5.2.  For automated checking of CBOR data structure
     5.3.  For data analysis tools
   6.  Security considerations
   7.  IANA considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Cemetery
     A.1.  Resolved Issues
   Appendix B.  Nursery
     B.1.  Generics
   Appendix C.  Change Log
   Appendix D.  ABNF grammar
   Appendix E.  Standard Prelude
   Appendix F.  The CDDL tool
   Appendix G.  Extended Diagnostic Notation
     G.1.  White space in binary strings
     G.2.  Text in binary strings
     G.3.  Concatenated Strings
     G.4.  Hexadecimal, octal, and binary numbers
     G.5.  Comments
   Authors' Addresses

1.  Introduction

   In this document, a notational convention to express CBOR [RFC7049]
   data structures is defined.

   The main goal for the convention is to provide a unified notation
   that can be used when defining protocols that use CBOR.  We term the
   convention "CBOR data definition language", or CDDL.

   The CBOR notational convention has the following goals:

   (G1)  Provide an unambiguous description of the overall structure of
         a CBOR data structure.

   (G2)  Flexibility to express the freedoms of choice in the CBOR data
         format.

   (G3)  Possibility to restrict format choices where appropriate
         [_format].

   (G4)  Able to express common CBOR datatypes and structures.

   (G5)  Human and machine readable and processable.

   (G6)  Automatic checking of data format compliance.

   (G7)  Extraction of specific elements from CBOR data for further
         processing.

   This document has the following structure:

   The syntax of CDDL is defined in Section 3.  Examples of CDDL and
   related CBOR data instances are defined in Section 4.  Section 5
   discusses usage of CDDL.  Examples are provided early in the text to
   better illustrate concept definitions.  A formal definition of CDDL
   using ABNF grammar is provided in Appendix D.  Finally, a prelude of
   standard CDDL definitions available in every CBOR specification is
   listed in Appendix E.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119, BCP 14 [RFC2119].

1.2.  Terminology

   New terms are introduced in _cursive_.  CDDL text in the running text
   is in "typewriter".

2.  The Style of Data Structure Specification

   CDDL focuses on styles of specification that are in use in the
   community employing the data model as pioneered by JSON and now
   refined in CBOR.

   There are a number of more or less atomic elements of a CBOR data
   model, such as numbers, simple values (false, true, nil), text and
   byte strings; CDDL does not focus on specifying their structure.
   CDDL of course also allows adding a CBOR tag to a data item.

   The more important components of a data structure definition language
   are the data types used for composition: arrays and maps in CBOR
   (called arrays and objects in JSON).  While these are only two
   representation formats, they are used to specify four loosely
   distinguishable styles of composition:

   o  A _vector_, an array of elements that are mostly of the same
      semantics.  The set of signatures associated with a signed data
      item is a typical application of a vector.

   o  A _record_, an array the elements of which have different,
      positionally defined semantics, as detailed in the data structure
      definition.  A 2D point, specified as an array of an x coordinate
      (which comes first) and a y coordinate (coming second) is an
      example of a record, as is the pair of exponent (first) and
      mantissa (second) in a CBOR decimal fraction.

   o  A _table_, a map from a domain of map keys to a domain of map
      values, that are mostly of the same semantics.  A set of language
      tags, each mapped to a text string translated to that specific
      language, is an example of a table.  The key domain is usually not
      limited to a specific set by the specification, but open for the
      application, e.g., in a table mapping IP addresses to MAC
      addresses, the specification does not attempt to foresee all
      possible IP addresses.

   o  A _struct_, a map from a domain of map keys as defined by the
      specification to a domain of map values the semantics of each of
      which is bound to a specific map key.  This is what many people
      have in mind when they think about JSON objects; CBOR adds the
      ability to use map keys that are not just text strings.  Structs
      can be used to solve similar problems as records; the use of
      explicit map keys facilitates optionality and extensibility.

   Two important concepts provide the foundation for CDDL:

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

   2.  The other important concept is that of a _type_.  The entire CDDL
       specification defines a type (the one defined by its first
       _rule_), which formally is the set of CBOR instances that are
       acceptable for this specification.  CDDL predefines a number of
       basic types such as "uint" (unsigned integer) or "tstr" (text
       string), often making use of a simple formal notation for CBOR
       data items.  Each value that can be expressed as a CBOR data item
       also is a type in its own right, e.g. "1".  A type can be built
       as a _choice_ of other types, e.g., an "int" is either a "uint"
       or a "nint" (negative integer).  Finally, a type can be built as
       an array or a map from a group.

2.1.  Groups and Composition in CDDL

   CDDL Groups are lists of name/value pairs (group _entries_).

   In an array context, only the value of the entry is represented; the
   name is annotation only (and can be left off if not needed).  In a
   map context, the names become the map keys ("member keys").

   In an array context, the sequence of elements in the group is
   important, as it is the information that allows associating actual
   array elements with entries in the group.  In a map context, the
   sequence of entries in a group is not relevant (but there is still a
   need to write down group entries in a sequence).
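
   The difference between the two contexts can be sketched in a few
   lines of Python.  This is only an illustration, not part of CDDL:
   the helper names "as_array" and "as_map" and the sample values are
   hypothetical, and a group is modeled simply as an ordered list of
   (name, value) pairs.

```python
# Illustrative model only: a group is an ordered list of (name, value) entries.
# The entry names and values below are made-up sample data.
group = [("age", 42), ("name", "Ada"), ("employer", "ACME")]


def as_array(entries):
    """Array context: only the values are represented, in group order."""
    return [value for _name, value in entries]


def as_map(entries):
    """Map context: the names become the map keys; order carries no meaning."""
    return {name: value for name, value in entries}


reordered = list(reversed(group))

print(as_array(group))                       # values only, names dropped
print(as_map(group) == as_map(reordered))    # same map regardless of order
print(as_array(group) == as_array(reordered))  # arrays differ when reordered
```

   Reordering the entries changes the array but not the map, which is
   the asymmetry the two paragraphs above describe.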

   A group can be placed in (round) parentheses, and given a name by
   using it in a rule:

   pii = (
     age: int,
     name: tstr,
     employer: tstr,
   )

                       Figure 1: A basic group

   Or a group can just be used in the definition of something else:

   person = {(
     age: int,
     name: tstr,
     employer: tstr,
   )}

                   Figure 2: Using a group in a map

   which, given the above rule for pii, is identical to:

   person = {
     pii
   }

                   Figure 3: Using a group by name

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.

   The parentheses for groups are optional when there is some other set
   of brackets present, so it would be slightly more natural to express
   Figure 2 as:

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

   Groups can be used to factor out common parts of structs, e.g.,
   instead of writing:

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

   dog = {
     age: int,
     name: tstr,
     leash-length: float,
   }

   one can choose a name for the common subgroup and write:

   person = {
     identity,
     employer: tstr,
   }

   dog = {
     identity,
     leash-length: float,
   }

   identity = (
     age: int,
     name: tstr,
   )

              Figure 4: Using a group for factorization

   Note that the contents of the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group.

2.1.1.  Usage

   Groups are the instrument used in composing data structures with
   CDDL.  It is a matter of style in defining those structures whether
   to define groups (anonymously) right in their contexts or whether to
   define them in a separate rule and to reference them with their
   respective name (possibly more than once).

   With this, one is allowed to define all small parts of their data
   structures and compose bigger protocol units with those or to have
   only one big protocol data unit that has all definitions ad hoc where
   needed.

2.1.2.  Syntax

   The composition syntax intends to be concise and easy to read:

   o  The start of a group can be marked by '('

   o  The end of a group can be marked by ')'

   o  Definitions of entries inside of a group are noted as follows:
      _keytype => valuetype,_ (read "keytype maps to valuetype").  The
      comma is actually optional (not just in the final entry), but it
      is considered good style to set it.  The double arrow can be
      replaced by a colon in the common case of directly using a text
      string as a key (see Section 3.5.1).

   An entry consists of a _keytype_ and a _valuetype_:

   o  _keytype_ is either an atom used as the actual key or a type in
      general.  The latter case may be needed when using groups in a
      table context, where the actual keys are of lesser importance than
      the key types, e.g., in contexts verifying incoming data.

   o  _valuetype_ is a type, which could be derived from the major types
      defined in [RFC7049], could be a convenience valuetype defined in
      this document (Appendix E) or the name of a type defined in the
      specification.

   A group definition can also contain choices between groups, see
   Section 2.2.2.

2.2.  Types

2.2.1.  Values

   Values such as numbers and strings can be used in place of a type.
   (For instance, this is a very common thing to do for a keytype,
   common enough that CDDL provides additional convenience syntax for
   this.)

2.2.2.  Choices

   Many places that allow a type also allow a choice between types,
   delimited by a "/" (slash).  The entire choice construct can be put
   into parentheses if this is required to make the construction
   unambiguous (please see Appendix D for the details).
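
   One way to read the choice construct is to treat every type as a
   test on a data item and a choice as "any alternative accepts".  The
   Python sketch below follows that reading.  It is an informal model
   under stated assumptions, not how any CDDL tool is implemented: the
   helper names "value" and "choice" are hypothetical, and Python
   integers stand in for CBOR integers.

```python
# Informal model: a "type" is a predicate that accepts or rejects a data item.
# All helper names here are hypothetical and exist only for illustration.

def value(v):
    """A single value used as a type: accepts exactly that value."""
    return lambda item: type(item) is type(v) and item == v


def choice(*alternatives):
    """A choice (written with "/" in CDDL): accepts if any alternative does."""
    return lambda item: any(alt(item) for alt in alternatives)


# Python's bool is a subclass of int, so it is excluded explicitly.
uint = lambda item: type(item) is int and item >= 0
nint = lambda item: type(item) is int and item < 0

# Mirrors the statement that an "int" is either a "uint" or a "nint".
int_ = choice(uint, nint)

# A choice of values, in the style of an enumeration.
protocol = choice(value(6), value(17))

print(int_(5), int_(-3), int_("5"))
print(protocol(6), protocol(7))
```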

   Choices of values can be used to express enumerations:

   attire = "bow tie" / "necktie" / "Internet attire"
   protocol = 6 / 17

   Similarly as for types, CDDL also allows choices between groups,
   delimited by a "//" (double slash).

   address = { delivery }

   delivery = (
   street: tstr, ? number: uint, city //
   po-box: uint, city //
   per-pickup: true )

   city = (
   name: tstr, zip-code: uint
   )

   Both for type choices and for group choices, additional alternatives
   can be added to a rule later in separate rules by using "/=" and
   "//=", respectively, instead of "=":

   attire /= "swimwear"

   delivery //= (
   lat: float, long: float, drone-type: tstr
   )

   It is not an error if a name is first used with a "/=" or "//="
   (there is no need to "create it" with "=").

2.2.2.1.  Ranges

   Instead of naming all the values that make up a choice, CDDL allows
   building a _range_ out of two values that are in an ordering
   relationship.  A range can be inclusive of both ends given (denoted
   by joining two values by ".."), or include the first and exclude the
   second (denoted by instead using "...").

   device-address = byte
   max-byte = 255
   byte = 0..max-byte ; inclusive range
   first-non-byte = 256
   byte1 = 0...first-non-byte ; byte1 is equivalent to byte

   CDDL currently only allows ranges between numbers [_range].

2.2.2.2.  Turning a group into a choice

   Some choices are built out of large numbers of values, often
   integers, each of which is best given a semantic name in the
   specification.
   Instead of naming each of these integers and then
   accumulating these into a choice, CDDL allows building a choice from
   a group by prefixing it with a "&" character:

   terminal-color = &basecolors
   basecolors = (
     black: 0,  red: 1,  green: 2,  yellow: 3,
     blue: 4,  magenta: 5,  cyan: 6,  white: 7,
   )
   extended-color = &(
     basecolors,
     orange: 8,  pink: 9,  purple: 10,  brown: 11,
   )

   As with the use of groups in arrays (Section 3.4), the membernames
   have only documentary value (in particular, they might be used by a
   tool when displaying integers that are taken from that choice).

2.2.3.  Representation Types

   CDDL allows the specification of a data item type by referring to the
   CBOR representation (major and minor numbers).  How this is used
   should be evident from the prelude (Appendix E).

   It may be necessary to make use of representation types outside the
   prelude, e.g., a specification could start by making use of an
   existing tag in a more specific way, or define a new tag not defined
   in the prelude:

   my_breakfast = #6.55799(breakfast) ; cbor-any is too general!
   breakfast = cereal / porridge
   cereal = #6.998(tstr)
   porridge = #6.999([liquid, solid])
   liquid = milk / water
   milk = 0
   water = 1
   solid = tstr

2.2.4.  Root type

   There is no special syntax to identify the root of a CDDL data
   structure definition: that role is simply taken by the first rule
   defined in the file.

   This is motivated by the usual top-down approach for defining data
   structures, decomposing a big data structure unit into smaller parts;
   however, except for the root type, there is no need to strictly
   follow this sequence.

   (Note that there is no way to use a group as a root - it must be a
   type.  Using a group as the root might be employed as a way to
   specify a CBOR sequence in a future version of this specification;
   this would act as if that group is used in an array and the data
   items in that fictional array form the members of the CBOR sequence.)

3.  Syntax

   In this section, the overall syntax of CDDL is shown, alongside some
   examples just illustrating syntax.  (The definition will not attempt
   to be overly formal; refer to Appendix D for the details.)

3.1.  General conventions

   The basic syntax is inspired by ABNF [RFC5234], with

   o  Rules, whether they define groups or types, are defined with a
      name, followed by an equals sign "=" and the actual definition
      according to the respective syntactic rules of that definition.

   o  A name can consist of any of the characters from the set {'A',
      ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},
      starting with an alphabetic character (including '@', '_', '$')
      and ending in such a character or a digit.

      *  Names are case sensitive.

      *  It is preferred style to start a name with a lower case letter.

      *  The hyphen is preferred over the underscore (except in a
         "bareword" (Section 3.5.1), where the semantics may actually
         require an underscore).

      *  The period may be useful for larger specifications, to express
         some module structure (as in "tcp.throughput" vs.
         "udp.throughput").

      *  A number of names are predefined in the CDDL prelude, as listed
         in Appendix E.

      *  Rule names (types or groups) do not appear in the actual CBOR
         encoding, but names used as "barewords" in member keys do.

   o  Comments are started by a ';' (semicolon) character and finish at
      the end of a line (LF or CRLF).

   o  Outside strings, whitespace (spaces, newlines, and comments) is
      used to separate syntactic elements for readability (and to
      separate identifiers or numbers that follow each other); it is
      otherwise completely optional.

   o  Hexadecimal numbers are preceded by '0x' (without quotes, lower
      case x), and are case insensitive.  Similarly, binary numbers are
      preceded by '0b'.

   o  Text strings are enclosed by double quotation '"' characters.
      They follow the conventions for strings as defined in section 7 of
      [RFC7159].  (ABNF users may want to note that there is no support
      in CDDL for the concept of case insensitivity in text strings; if
      necessary, regular expressions can be used (Section 3.7.3).)

   o  Byte strings are enclosed by single quotation "'" characters and
      may be prefixed by "h" or "b64".  If unprefixed, the string is
      interpreted as with a text string, except that single quotes must
      be escaped and that the UTF-8 bytes resulting are marked as a byte
      string (major type 2).  If prefixed as "h" or "b64", the string is
      interpreted as a sequence of hex digits or a base64(url) string,
      respectively (as with the diagnostic notation in section 6 of
      [RFC7049]; cf. Appendix G.2); any white space present within the
      string (including comments) is ignored in the prefixed case.
      [_strings]

   o  CDDL uses UTF-8 [RFC3629] for its encoding.

   Example:

   ; This is a comment
   person = { g }

   g = (
     "name": tstr,
     age: int,  ; "age" is a bareword
   )

3.2.  Occurrence

   An optional _occurrence_ indicator can be given in front of a group
   entry.  It is either one of the characters '?' (optional), '*' (zero
   or more), or '+' (one or more), or is of the form n*m, where n and m
   are optional unsigned integers and n is the lower limit (default 0)
   and m is the upper limit (default no limit) of occurrences.

   If no occurrence indicator is specified, the group entry is to occur
   exactly once (as if 1*1 were specified).
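
   The rules above map every indicator to a pair of bounds.  The
   following Python sketch spells out that mapping.  It is illustrative
   only: the function name "occurrence_bounds" is hypothetical, "None"
   is used to stand for "no upper limit", and only decimal digits are
   handled for n and m, which is a simplifying assumption.

```python
import re
from typing import Optional, Tuple


def occurrence_bounds(indicator: Optional[str]) -> Tuple[int, Optional[int]]:
    """Return (lower, upper) for an occurrence indicator.

    An upper bound of None means "no limit".  Hypothetical helper for
    illustration; only decimal n and m are recognized here.
    """
    if not indicator:
        return (1, 1)          # no indicator: exactly once, as if 1*1
    if indicator == "?":
        return (0, 1)          # optional
    if indicator == "+":
        return (1, None)       # one or more
    # Covers "*" as well as the general n*m form with optional n and m.
    match = re.fullmatch(r"(\d*)\*(\d*)", indicator)
    if match is None:
        raise ValueError(f"not an occurrence indicator: {indicator!r}")
    lower = int(match.group(1)) if match.group(1) else 0
    upper = int(match.group(2)) if match.group(2) else None
    return (lower, upper)


for text in (None, "?", "*", "+", "1*2", "2*", "*3"):
    print(text, occurrence_bounds(text))
```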

   Note that CDDL, outside any directives/annotations that could
   possibly be defined, does not make any prescription as to whether
   arrays or maps use the definite length or indefinite length encoding.
   I.e., there is no correlation between leaving the size of an array
   "open" in the spec and the fact that it is then interchanged with
   definite or indefinite length.

   Please also note that CDDL can describe flexibility that the data
   model of the target representation does not have.  This is rather
   obvious for JSON, but also is relevant for CBOR:

   apartment = {
     kitchen: size,
     * bedroom: size,
   }
   size = float ; in m2

   The previous specification does not mean that CBOR is changed to
   allow the key "bedroom" to be used more than once.  In other words,
   due to the restrictions imposed by the data model, the third line
   pretty much turns into:

     ? bedroom: size,

   (Occurrence indicators beyond one still are useful in maps for groups
   that allow a variety of keys.)

3.3.  Predefined names for types

   CDDL predefines a number of names.  This subsection summarizes these
   names, but please see Appendix E for the exact definitions.

   The following keywords for primitive datatypes are defined:

   "bool"  Boolean value (major type 7, additional information 20 or
      21).

   "uint"  An unsigned integer (major type 0).

   "nint"  A negative integer (major type 1).

   "int"  An unsigned integer or a negative integer.

   "float16"  IEEE 754 half-precision float (major type 7, additional
      information 25).

   "float32"  IEEE 754 single-precision float (major type 7, additional
      information 26).

   "float64"  IEEE 754 double-precision float (major type 7, additional
      information 27).

   "float"  One of float16, float32, or float64.

   "bstr" or "bytes"  A byte string (major type 2).

   "tstr" or "text"  Text string (major type 3).

   (Note that there are no predefined names for arrays or maps; these
   are defined with the syntax given below.)

   In addition, a number of types are defined in the prelude that are
   associated with CBOR tags, such as "tdate", "bigint", "regexp" etc.

3.4.  Arrays

   Array definitions surround a group with square brackets.

   For each entry, an occurrence indicator as specified in Section 3.2
   is permitted.

   For example:

   unlimited-people = [* person]
   one-or-two-people = [1*2 person]
   at-least-two-people = [2* person]
   person = (
       name: tstr,
       age: uint,
   )

   The group "person" is defined in such a way that repeating it in the
   array each time generates alternating names and ages, so these are
   four valid values for a data item of type "unlimited-people":

   ["roundlet", 1047, "psychurgy", 2204, "extrarhythmical", 2231]
   []
   ["aluminize", 212, "climograph", 4124]
   ["penintime", 1513, "endocarditis", 4084, "impermeator", 1669,
    "coextension", 865]

3.5.  Maps

   The syntax for specifying maps merits special attention, as well as a
   number of optimizations and conveniences, as it is likely to be the
   focal point of many specifications employing CDDL.  While the syntax
   does not strictly distinguish struct and table usage of maps, it
   caters specifically to each of them.

3.5.1.  Structs

   The "struct" usage of maps is similar to the way JSON objects are
   used in many JSON applications.

   A map is defined in the same way as defining an array (see
   Section 3.4), except for using curly braces "{}" instead of square
   brackets "[]".

   An occurrence indicator as specified in Section 3.2 is permitted for
   each group entry.

   The following is an example of a structure:

   Geography = [
     city           : tstr,
     gpsCoordinates : GpsCoordinates,
   ]

   GpsCoordinates = {
     longitude      : uint,            ; multiplied by 10^7
     latitude       : uint,            ; multiplied by 10^7
   }

   When encoding, the Geography structure is encoded using a CBOR array
   with two entries (the keys for the group entries are ignored),
   whereas the GpsCoordinates are encoded as a CBOR map with two key-
   value pairs.

   Types used in a structure can be defined in separate rules or just in
   place (potentially placed inside parentheses, such as for choices).
   E.g.:

   located-samples = {
     sample-point: int,
     samples: [+ float],
   }

   where "located-samples" is the datatype to be used when referring to
   the struct, and "sample-point" and "samples" are the keys to be used.
   This is actually a complete example: an identifier that is followed
   by a colon can be directly used as the text string for a member key
   (we speak of a "bareword" member key), as can a double-quoted string
   or a number.  (When other types, in particular multi-valued ones, are
   used as keytypes, they are followed by a double arrow, see below.)

   If a text string key does not match the syntax for an identifier (or
   if the specifier just happens to prefer using double quotes), the
   text string syntax can also be used in the member key position,
   followed by a colon.  The above example could therefore have been
   written with quoted strings in the member key positions.

   All the types defined can be used in a keytype position by following
   them with a double arrow.
   A string also is a (single-valued) type, so another form for this
   example is:

      located-samples = {
        "sample-point" => int,
        "samples" => [+ float],
      }

   A better way to demonstrate the double-arrow use may be:

      located-samples = {
        sample-point: int,
        samples: [+ float],
        * equipment-type => equipment-tolerances,
      }
      equipment-type = [name: tstr, manufacturer: tstr]
      equipment-tolerances = [+ [float, float]]

   The example below defines a struct with optional entries: display
   name (as a text string), the name components first name and family
   name (as a map of text strings), and age information (as an unsigned
   integer).

      PersonalData = {
        ? displayName: tstr,
        NameComponents,
        ? age: uint,
      }

      NameComponents = (
        ? firstName: tstr,
        ? familyName: tstr,
      )

   Note that the group definition for NameComponents does not generate
   another map; instead, all four keys are directly in the struct built
   by PersonalData.

   In this example, all key/value pairs are optional from the
   perspective of CDDL.  With no occurrence indicator, an entry is
   mandatory.

   If the addition of more entries not specified by the current
   specification is desired, one can add this possibility explicitly:

      PersonalData = {
        ? displayName: tstr,
        NameComponents,
        ? age: uint,
        * tstr => any
      }

      NameComponents = (
        ? firstName: tstr,
        ? familyName: tstr,
      )

            Figure 5: Personal Data: Example for extensibility

   The cddl tool (Appendix F) generated as one acceptable instance for
   this specification:

      {"familyName": "agust", "antiforeignism": "pretzel",
       "springbuck": "illuminatingly", "exuviae": "ephemeris",
       "kilometrage": "frogfish"}

   (See Section 3.8 for one way to explicitly identify an extension
   point.)

3.5.2.  Tables

   A table can be specified by defining a map with entries where the
   keytype is not single-valued, e.g.:

      square-roots = {* x => y}
      x = int
      y = float

   Here, the key in each key/value pair has datatype x (defined as int),
   and the value has datatype y (defined as float).

   If the specification does not need to restrict one of x or y (i.e.,
   the application is free to choose per entry), it can be replaced by
   the predefined name "any".

   As another example, the following could be used as a conversion table
   converting from an integer or float to a string:

      tostring = {* mynumber => tstr}
      mynumber = int / float

3.6.  Tags

   A type can make use of a CBOR tag (major type 6) by using the
   representation type notation, giving #6.nnn(type) where nnn is an
   unsigned integer giving the tag number and "type" is the type of the
   data item being tagged.

   For example, the following line from the CDDL prelude (Appendix E)
   defines "biguint" as a type name for a positive bignum N:

      biguint = #6.2(bstr)

   The tags defined by [RFC7049] are included in the prelude.  Tags
   registered since then need to be added to a CDDL specification as
   needed; e.g., a binary UUID tag could be referenced as "buuid" in a
   specification after defining

      buuid = #6.37(bstr)

   In the following example, usage of the tag 32 for URIs is optional:

      my_uri = #6.32(tstr) / tstr

3.7.  Annotations

   An _annotation_ attaches a _control_ type to a _target_ type via an
   _annotator_.

   The syntax for an annotated type is "target .annotator control",
   where annotators are special identifiers prefixed by a dot.  (Note
   that _target_ or _control_ might need to be parenthesized.)

   Several annotators are defined at this point.  Note that the CDDL
   tool does not currently support combining multiple annotations on a
   single target.

3.7.1.  Annotation .size

   A ".size" annotation controls the size of the target in bytes by the
   control type.  Examples:

      full-address = [[+ label], ip4, ip6]
      ip4 = bstr .size 4
      ip6 = bstr .size 16
      label = bstr .size (1..63)

                 Figure 6: Annotation for size in bytes

   When applied to an unsigned integer, the ".size" annotation restricts
   the range of that integer by giving a maximum number of bytes that
   should be needed in a computer representation of that unsigned
   integer.  In other words, "uint .size N" is equivalent to
   "0...BYTES_N", where BYTES_N == 256**N.

      audio_sample = uint .size 3 ; 24-bit, equivalent to 0..16777215

             Figure 7: Annotation for integer size in bytes

   Note that, as with value restrictions in CDDL, this annotation is not
   a representation constraint; a number that fits into fewer bytes can
   still be represented in that form, and an inefficient implementation
   could use a longer form (unless that is restricted by some format
   constraints outside of CDDL, such as the rules in Section 3.9 of
   [RFC7049]).

3.7.2.  Annotation .bits

   A ".bits" annotation on a byte string indicates that, in the target,
   only the bits numbered by a number in the control type are allowed to
   be set.  (Bits are counted the usual way, bit number "n" being set in
   "str" meaning that "(str[n >> 3] & (1 << (n & 7))) != 0".)
   [_bitsendian]

   Similarly, a ".bits" annotation on an unsigned integer "i" indicates
   that for all unsigned integers "n" where "(i & (1 << n)) != 0", "n"
   must be in the control type.
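   The ".size" equivalence and the two bit-numbering rules above can be
   made concrete with a short sketch.  The following Python is
   illustrative only and is not part of CDDL or of the CDDL tool; the
   function names (size_ok, bits_set_in_bytes, bits_set_in_uint,
   bits_ok) are hypothetical, and the allowed bit sets are copied by
   hand from the "flags" and "rwx" definitions of Figure 8.

```python
def size_ok(value: int, n: int) -> bool:
    """uint .size n: value must lie in 0...256**n (upper bound excluded)."""
    return 0 <= value < 256 ** n


def bits_set_in_bytes(data: bytes) -> set:
    """Bit n is set when (data[n >> 3] & (1 << (n & 7))) != 0."""
    return {n for n in range(8 * len(data)) if data[n >> 3] & (1 << (n & 7))}


def bits_set_in_uint(i: int) -> set:
    """Bit n is set when (i & (1 << n)) != 0."""
    return {n for n in range(i.bit_length()) if i & (1 << n)}


def bits_ok(set_bits: set, allowed: set) -> bool:
    """A .bits annotation holds when every set bit is in the control type."""
    return set_bits <= allowed


# Hand-copied (assumption) from the "flags" and "rwx" definitions.
TCP_FLAGS = {0} | set(range(4, 8)) | set(range(8, 16))
RWX = {0, 1, 2}
```

   Under these assumptions, the generated instance h'906d' passes the
   check against TCP_FLAGS, as does the empty byte string, while h'02'
   is rejected because bit 1 is not in "flags".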

      tcpflagbytes = bstr .bits flags
      flags = &(
        fin: 8,
        syn: 9,
        rst: 10,
        psh: 11,
        ack: 12,
        urg: 13,
        ece: 14,
        cwr: 15,
        ns: 0,
      ) / (4..7) ; data offset bits

      rwxbits = uint .bits rwx
      rwx = &(r: 2, w: 1, x: 0)

             Figure 8: Annotation for what bits can be set

   The CDDL tool generates the following ten example instances for
   "tcpflagbytes":

      h'906d' h'01fc' h'8145' h'01b7' h'013d' h'409f' h'018e' h'c05f'
      h'01fa' h'01fe'

   These examples do not illustrate that the above CDDL specification
   does not explicitly specify a size of two bytes: A valid all clear
   instance of flag bytes could be "h''" or "h'00'" or even "h'000000'"
   as well.

3.7.3.  Annotation .regexp

   A ".regexp" annotation indicates that the text string given as a
   target needs to match the PCRE regular expression given as a value in
   the control type, where that regular expression is anchored on both
   sides.  (If anchoring is not desired for a side, ".*" needs to be
   inserted there.)

      nai = tstr .regexp "\\w+@\\w+(\\.\\w+)+"

                Figure 9: Annotation with a PCRE regexp

   The CDDL tool proposes:

      "N1@CH57HF.4Znqe0.dYJRN.igjf"

3.7.4.  Annotations .cbor and .cborseq

   A ".cbor" annotation on a byte string indicates that the byte string
   carries a CBOR encoded data item.  Decoded, the data item matches the
   type given as the right-hand side argument (type1 in the following
   example).

      "bytes .cbor type1"

   Similarly, a ".cborseq" annotation on a byte string indicates that
   the byte string carries a sequence of CBOR encoded data items.  When
   the data items are taken as an array, the array matches the type
   given as the right-hand side argument (type2 in the following
   example).

      "bytes .cborseq type2"

   (The conversion of the encoded sequence to an array can be effected
   for instance by wrapping the byte string between the two bytes 0x9f
   and 0xff and decoding the wrapped byte string as a CBOR encoded data
   item.)

3.7.5.  Annotations .within and .and

   A ".and" annotation on a type indicates that the data item matches
   both that left hand side type and the type given as the right hand
   side.  (Formally, the resulting type is the intersection of the two
   types given.)

      "type1 .and type2"

   A variant of the ".and" annotation is the ".within" annotation, which
   expresses an additional intent: the left hand side type is meant to
   be a subset of the right hand side type.

      "type1 .within type2"

   While both forms have the identical formal semantics (intersection),
   the intention of the ".within" form is that the right hand side gives
   guidance to the types allowed on the left hand side, which typically
   is a socket (Section 3.8):

      message = $message .within message-structure
      message-structure = [message_type, *message_option]
      message_type = 0..255
      message_option = any

      $message /= [3, dough: text, topping: [* text]]
      $message /= [4, noodles: text, sauce: text, parmesan: bool]

   For ".within", a tool might flag an error if type1 allows data items
   that are not allowed by type2.  In contrast, for ".and", there is no
   expectation that type1 already is a subset of type2.

3.7.6.  Annotations .lt, .le, .gt, .ge, .eq, .ne, and .default

   The annotations .lt, .le, .gt, .ge, .eq, .ne specify a constraint on
   the left hand side type to be a value less than, less than or equal
   to, greater than, greater than or equal to, equal to, or not equal
   to a value given as a (single-valued) right hand side type.  In the
   present specification, the first four annotations (.lt, .le, .gt,
   .ge) are defined only for numeric types, as these have a natural
   ordering relationship.

      speed = number .ge 0  ; unit: m/s

   A variant of the ".ne" annotation is the ".default" annotation, which
   expresses an additional intent: the value specified by the right hand
   side type is intended as a default value for the left hand side type
   given, and the implied .ne annotation is there to prevent this value
   from being sent over the wire.  This annotation is only meaningful
   when the annotated type is used in an optional context; otherwise
   there would be no way to express the default value.

      timer = {
        time: uint,
        ? displayed-step: (number .gt 0) .default 1
      }

3.8.  Socket/Plug

   Both for type choices and group choices, a mechanism is defined that
   facilitates starting out with empty choices and assembling them
   later, potentially in separate files that are concatenated to build
   the full specification.

   Per convention, CDDL extension points are marked with a leading
   dollar sign (types) or two leading dollar signs (groups).  Tools
   honor that convention by not raising an error if such a type or group
   is not defined at all; the symbol is then taken to be an empty type
   choice (group choice), i.e., no choice is available.

      tcp-header = {seq: uint, ack: uint, * $$tcp-option}

      ; later, in a different file

      $$tcp-option //= (
        sack: [+(left: uint, right: uint)]
      )

      ; and, maybe in another file

      $$tcp-option //= (
        sack-permitted: true
      )

   Names that start with a single "$" are "type sockets", names with a
   double "$$" are "group sockets".  It is not an error if there is no
   definition for a socket at all; this then means there is no way to
   satisfy the rule (i.e., the choice is empty).
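
   The socket convention can be pictured as a table of named choices
   that starts out empty.  The following Python sketch only illustrates
   that idea and is not how the CDDL tool is implemented; the class
   SpecTable and its methods are hypothetical names, and alternatives
   are modeled as plain predicate functions over a (key, value) pair.

```python
class SpecTable:
    """Toy model of rule names mapping to lists of alternatives."""

    def __init__(self):
        self.rules = {}

    def augment(self, name, alternative):
        """Model "/=" or "//=": append one more alternative to a name."""
        self.rules.setdefault(name, []).append(alternative)

    def alternatives(self, name):
        """Look up a name; an undefined socket yields an empty choice."""
        if name in self.rules:
            return self.rules[name]
        if name.startswith("$"):  # covers both "$" and "$$" sockets
            return []
        raise KeyError("undefined rule: " + name)

    def matches(self, name, item):
        """An item matches if any alternative accepts it."""
        return any(alt(item) for alt in self.alternatives(name))


spec = SpecTable()

# No plug yet: the lookup is not an error, but nothing can satisfy it.
empty_before = spec.matches("$$tcp-option", ("sack-permitted", True))

# Later, possibly from another file:
#   $$tcp-option //= ( sack-permitted: true )
spec.augment("$$tcp-option", lambda kv: kv == ("sack-permitted", True))
filled_after = spec.matches("$$tcp-option", ("sack-permitted", True))
```

   In this model, looking up an undefined name without a leading "$"
   raises an error, whereas an undefined socket simply behaves as a
   choice with no alternatives.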

   All definitions (plugs) for socket names must be augmentations, i.e.,
   they must be using "/=" and "//=", respectively.

   To pick up the example illustrated in Figure 5, the socket/plug
   mechanism could be used as shown in Figure 10:

      PersonalData = {
        ? displayName: tstr,
        NameComponents,
        ? age: uint,
        * $$personaldata-extensions
      }

      NameComponents = (
        ? firstName: tstr,
        ? familyName: tstr,
      )

      ; The above already works as is.
      ; But then, we can add later:

      $$personaldata-extensions //= (
        favorite-salsa: tstr,
      )

      ; and again, somewhere else:

      $$personaldata-extensions //= (
        shoesize: uint,
      )

     Figure 10: Personal Data example: Using socket/plug extensibility

3.9.  Operator Precedence

   As with any language that has multiple syntactic features such as
   prefix and infix operators, CDDL has operators that bind more tightly
   than others.  This is more complicated than, say, in ABNF, as CDDL
   has both types and groups, with operators that are specific to these
   concepts.  Type operators (such as "/" for type choice) operate on
   types, while group operators (such as "//" for group choice) operate
   on groups.  Types can simply be used in groups, but groups need to be
   bracketed (as arrays or maps) to become types.  So, type operators
   naturally bind closer than group operators.

   For instance, in

      t = [group1]
      group1 = (a / b // c / d)
      a = 1 b = 2 c = 3 d = 4

   group1 is a group choice between the type choice of a and b and the
   type choice of c and d.  This becomes more relevant once member keys
   and/or occurrences are added in:

      t = {group2}
      group2 = (? ab: a / b // cd: c / d)
      a = 1 b = 2 c = 3 d = 4

   is a group choice between the optional member "ab" of type a or b and
   the member "cd" of type c or d.  Note that the optionality is
   attached to the first choice ("ab"), not to the second choice.

   Similarly, in

      t = [group3]
      group3 = (+ a / b / c)
      a = 1 b = 2 c = 3

   group3 is a repetition of a type choice between a, b, and c [unflex];
   if just a is to be repeatable, a group choice is needed to focus the
   occurrence:

      t = [group4]
      group4 = (+ a // b / c)
      a = 1 b = 2 c = 3

   group4 is a group choice between a repeatable a and a single b or c.

   In general, as with many other languages with operator precedence
   rules, it is best not to rely on them, but to insert parentheses for
   readability:

      t = [group4a]
      group4a = ((+ a) // (b / c))
      a = 1 b = 2 c = 3

   The operator precedences, in sequence of loose to tight binding, are
   defined in Appendix D and summarized in Table 1.  (Arities given are
   1 for unary prefix operators and 2 for binary infix operators.)

      +----------+----+---------------------------+------+
      | Operator | Ar | Operates on               | Prec |
      +----------+----+---------------------------+------+
      | =        | 2  | name = type, name = group | 1    |
      | /=       | 2  | name /= type              | 1    |
      | //=      | 2  | name //= group            | 1    |
      | //       | 2  | group // group            | 2    |
      | ,        | 2  | group, group              | 3    |
      | *        | 1  | * group                   | 4    |
      | N*M      | 1  | N*M group                 | 4    |
      | +        | 1  | + group                   | 4    |
      | ?        | 1  | ? group                   | 4    |
      | =>       | 2  | type => type              | 5    |
      | :        | 2  | name: type                | 5    |
      | /        | 2  | type / type               | 6    |
      | &        | 1  | &group                    | 6    |
      | ..       | 2  | type..type                | 7    |
      | ...      | 2  | type...type               | 7    |
      | .anno    | 2  | type .anno type           | 7    |
      +----------+----+---------------------------+------+

                Table 1: Summary of operator precedences

4.  Examples

   This section contains various examples of structures defined using
   CDDL.

4.1.  Fruit

   Figure 11 contains an example for a CBOR structure that contains
   information about fruit.

      fruitlist = [* Fruit]

      Fruit = {
        name : tstr,
        colour : [* color],
        avg_weight : float16,
        price : uint,
        international_names : International,
        rfu : bstr, ; reserved for future use
      }

      International = {
        "DE" : tstr,      ; German
        "EN" : tstr,      ; English
        "FR" : tstr,      ; French
        "NL" : tstr,      ; Dutch
        "ZH-HANS" : tstr, ; Chinese
      }

      color = &(
        black: 0, red: 1, green: 2, yellow: 3,
        blue: 4, magenta: 5, cyan: 6, white: 7,
      )

                   Figure 11: Example CBOR structure

4.2.  RFC 7071

   [RFC7071] defines the Reputon structure for JSON using somewhat
   formalized English text.  Here is a (somewhat verbose) equivalent
   definition using the same terms, but notated in CDDL:

      reputation-object = {
        reputation-context,
        reputon-list
      }

      reputation-context = (
        application: text
      )

      reputon-list = (
        reputons: reputon-array
      )

      reputon-array = [* reputon]

      reputon = {
        rater-value,
        assertion-value,
        rated-value,
        rating-value,
        ? conf-value,
        ? normal-value,
        ? sample-value,
        ? gen-value,
        ? expire-value,
        * ext-value,
      }

      rater-value = ( rater: text )
      assertion-value = ( assertion: text )
      rated-value = ( rated: text )
      rating-value = ( rating: float16 )
      conf-value = ( confidence: float16 )
      normal-value = ( normal-rating: float16 )
      sample-value = ( sample-size: uint )
      gen-value = ( generated: uint )
      expire-value = ( expires: uint )
      ext-value = ( text => any )

   An equivalent, more compact form of this example would be:

      reputation-object = {
        application: text
        reputons: [* reputon]
      }

      reputon = {
        rater: text
        assertion: text
        rated: text
        rating: float16
        ? confidence: float16
        ? normal-rating: float16
        ? sample-size: uint
        ? generated: uint
        ? expires: uint
        * text => any
      }

   Note how this rather clearly delineates the structure somewhat
   shrouded by so many words in section 6.2.2. of [RFC7071].  Also, this
   definition makes it clear that several ext-values are allowed (by
   definition with different member names); RFC 7071 could be read to
   forbid the repetition of ext-value ("A specific reputon-element MUST
   NOT appear more than once" is ambiguous.)

   The CDDL tool (which hasn't quite been trained for polite
   conversation) says:

      {
        "application": "tridentiferous",
        "reputons": [
          {
            "rater": "loamily",
            "assertion": "Dasyprocta",
            "rated": "uncommensurableness",
            "rating": 0.05055809746548934,
            "confidence": 0.7484706448605812,
            "normal-rating": 0.8677887734049299,
            "sample-size": 4059,
            "expires": 3969,
            "bearer": "nitty",
            "faucal": "postulnar",
            "naturalism": "sarcotic"
          },
          {
            "rater": "precreed",
            "assertion": "xanthosis",
            "rated": "balsamy",
            "rating": 0.36091333590593955,
            "confidence": 0.3700759808403371,
            "sample-size": 3904
          },
          {
            "rater": "urinosexual",
            "assertion": "malacostracous",
            "rated": "arenariae",
            "rating": 0.9210673488013762,
            "normal-rating": 0.4778762617112776,
            "sample-size": 4428,
            "generated": 3294,
            "backfurrow": "enterable",
            "fruitgrower": "flannelflower"
          },
          {
            "rater": "pedologistically",
            "assertion": "unmetaphysical",
            "rated": "elocutionist",
            "rating": 0.42073613384304287,
            "misimagine": "retinaculum",
            "snobbish": "contradict",
            "Bosporanic": "periostotomy",
            "dayworker": "intragyral"
          }
        ]
      }

4.3.  Examples from JSON Content Rules

   Although JSON Content Rules [I-D.newton-json-content-rules] seems to
   address a more general problem than CDDL, it is still a worthwhile
   resource to explore for examples (beyond all the inspiration the
   format itself has had for CDDL).

   Figure 2 of the JCR I-D looks very similar, if slightly less noisy,
   in CDDL:

      root = [2*2 {
        precision: text,
        Latitude: float,
        Longitude: float,
        Address: text,
        City: text,
        State: text,
        Zip: text,
        Country: text
      }]

                   Figure 12: JCR, Figure 2, in CDDL

   Apart from the lack of a need to quote the member names, text strings
   are called "text" or "tstr" in CDDL ("string" would be ambiguous as
   CBOR also provides byte strings).

   The CDDL tool creates the below example instance for this:

      [{"precision": "pyrosphere", "Latitude": 0.5399712314350172,
        "Longitude": 0.5157523963028087, "Address": "resow",
        "City": "problemwise", "State": "martyrlike", "Zip": "preprove",
        "Country": "Pace"},
       {"precision": "unrigging", "Latitude": 0.10422704368372193,
        "Longitude": 0.6279808663725834, "Address": "picturedom",
        "City": "decipherability", "State": "autometry", "Zip": "pout",
        "Country": "wimple"}]

   Figure 4 of the JCR I-D in CDDL:

      root = { image }

      image = (
        Image: {
          size,
          Title: text,
          thumbnail,
          IDs: [* int]
        }
      )

      size = (
        Width: 0..1280
        Height: 0..1024
      )

      thumbnail = (
        Thumbnail: {
          size,
          Url: uri
        }
      )

   This shows how the group concept can be used to keep related elements
   (here: width, height) together, and to emulate the JCR style of
   specification.  (It also shows using a tag from the prelude, "uri" -
   this could be done differently.)  The more compact form of Figure 5
   of the JCR I-D could be emulated like this:

      root = {
        Image: {
          size, Title: text,
          Thumbnail: { size, Url: uri },
          IDs: [* int]
        }
      }

      size = (
        Width: 0..1280,
        Height: 0..1024,
      )

   The CDDL tool creates the below example instance for this:

      {"Image": {"Width": 566, "Height": 516, "Title": "leisterer",
       "Thumbnail": {"Width": 1111, "Height": 176, "Url": 32("scrog")},
       "IDs": []}}

5.  Making Use of CDDL

   In this section, we discuss several potential ways to employ CDDL.

5.1.  As a guide to a human user

   CDDL can be used to efficiently define the layout of CBOR data, such
   that a human implementer can easily see how data is supposed to be
   encoded.

   Since CDDL maps parts of the CBOR data to human readable names, tools
   could be built that use CDDL to provide a human friendly
   representation of the CBOR data, and allow users to edit such data
   while remaining compliant with its CDDL definition.

5.2.  For automated checking of CBOR data structure

   CDDL has been specified such that a machine can handle the CDDL
   definition and related CBOR data.  For example, a machine could use
   CDDL to check whether or not CBOR data is compliant with its
   definition.

   The need for thoroughness of such compliance checking depends on the
   application.  For example, an application may decide not to check the
   data structure at all, and use the CDDL definition solely as a means
   to indicate the structure of the data to the programmer.

   At the other end, the application may also implement a checking
   mechanism that goes as far as checking that all mandatory map pairs
   are available.

   The extent to which the data description must be enforced by an
   application is left to the designers and implementers of that
   application, keeping in mind related security considerations.
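
   As an illustration of the stricter end of that spectrum, the
   following Python sketch checks an already decoded map against a
   hand-written table of mandatory and optional keys.  It is not
   generated from CDDL and uses no CBOR library; the table TIMER_SPEC
   and the function check_map are hypothetical, the map layout (a
   mandatory uint "time" and an optional positive "displayed-step") is
   borrowed from the timer example in Section 3.7.6, and the ".default"
   annotation of that example is deliberately ignored.

```python
def is_uint(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0


def is_positive_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool) and v > 0


# key -> (predicate, mandatory?)  (hand-written assumption, not derived)
TIMER_SPEC = {
    "time": (is_uint, True),
    "displayed-step": (is_positive_number, False),
}


def check_map(data, spec):
    """Return a list of problems; an empty list means the map conforms."""
    problems = []
    for key, (predicate, mandatory) in spec.items():
        if key not in data:
            if mandatory:
                problems.append("missing mandatory key: " + key)
        elif not predicate(data[key]):
            problems.append("bad value for key: " + key)
    for key in data:
        if key not in spec:
            problems.append("unexpected key: " + key)
    return problems
```

   An application that only wants the definition as documentation would
   skip such a check entirely; one that needs strictness might reject
   any map for which the returned list is not empty.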

   In no case is the intention that a CDDL tool would be "writing code"
   for an implementation.

5.3.  For data analysis tools

   In the long run, it can be expected that more and more data will be
   stored using the CBOR data format.

   Where there is data, there is data analysis and the need to process
   such data automatically.  CDDL can be used for such automated data
   processing, allowing tools to verify data, clean it, and extract
   particular parts of interest from it.

   Since CBOR is designed with constrained devices in mind, a likely use
   of it would be small sensors.  An interesting use would thus be
   automated analysis of sensor data.

6.  Security considerations

   This document presents a content rules language for expressing CBOR
   data structures.  As such, it does not bring any security issues by
   itself, although specifications of protocols that use CBOR naturally
   need security analysis when defined.

   Topics that could be considered in the security considerations
   section of a document that uses CDDL to define CBOR structures
   include the following:

   o  Where could the language maybe cause confusion in a way that will
      enable security issues?

7.  IANA considerations

   This document does not require any IANA registrations.

8.  Acknowledgements

   CDDL was originally conceived by Bert Greevenbosch, who also wrote
   the original five versions of this document.

   Inspiration was taken from the C and Pascal languages, MPEG's
   conventions for describing structures in the ISO base media file
   format, Relax-NG and its compact syntax [RELAXNG], and in particular
   from Andrew Lee Newton's "JSON Content Rules"
   [I-D.newton-json-content-rules].

   Useful feedback came from Joe Hildebrand, Sean Leonard and Jim
   Schaad.

   The CDDL tool was written by Carsten Bormann, building on previous
   work by Troy Heninger and Tom Lord.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC7159]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, DOI 10.17487/RFC7159, March
              2014.

9.2.  Informative References

   [RELAXNG]  OASIS, "RELAX-NG Compact Syntax", November 2002.

   [RFC7071]  Borenstein, N. and M. Kucherawy, "A Media Type for
              Reputation Interchange", RFC 7071, DOI 10.17487/RFC7071,
              November 2013.

   [I-D.newton-json-content-rules]
              Newton, A. and P. Cordell, "A Language for Rules
              Describing JSON Content", draft-newton-json-content-
              rules-07 (work in progress), September 2016.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

Appendix A.  Cemetery

   The following ideas have been buried in the discussions leading up to
   the present specification:

   o  <...> as syntax for enumerations.  We view values to be just
      another type (a very specific type with just one member), so that
      an enumeration can be denoted as a choice using "/" as the
      delimiter of choices.  Because of this, no evidence is present
      that a separate syntax for enumerations is needed.

A.1.  Resolved Issues

   o  The key/value pairs in maps have no fixed ordering.  One could
      imagine situations where fixing the ordering may be of use.  For
      example, a decoder could look for values associated with integer
      keys 1, 3 and 7.  If the order were fixed and the decoder
      encounters the key 4 without having encountered key 3, it could
      conclude that key 3 is not available without doing more
      complicated bookkeeping.  Unfortunately, neither JSON nor CBOR
      supports this, so no attempt was made to support this in CDDL
      either.

   o  CDDL distinguishes the various CBOR number types, but there is
      only one number type in JSON.  There is no effect in specifying a
      precision (float16/float32/float64) when using CDDL for specifying
      JSON data structures.  (The current validator implementation
      (Appendix F) does not handle this very well, either.)

Appendix B.  Nursery

   This appendix describes advanced features that are still under review
   as they have not yet been heavily used in specifications.

B.1.  Generics

   Using angle brackets, the left hand side of a rule can add formal
   parameters after the name being defined, as in:

      messages = message<"reboot", "now"> / message<"sleep", 1..100>
      message<t, v> = {type: t, value: v}

   When using a generic rule, the formal parameters are bound to the
   actual arguments supplied (also using angle brackets), within the
   scope of the generic rule (as if there were a rule of the form
   parameter = argument).

   (There are some limitations to nesting of generics in Appendix F at
   this time.)

Appendix C.  Change Log

   Changes from version 00 to version 01:

   o  Removed constants

   o  Updated the tag mechanism

   o  Extended the map structure

   o  Added examples

   Changes from version 01 to version 02:

   o  Fixed example

   Changes from version 02 to version 03:

   o  Added information about characters used in names

   o  Added text about an overall data structure and order of definition
      of fields

   o  Added text about encoding of keys

   o  Added table with keywords

   o  Strings and integer writing conventions

   o  Added ABNF

   Changes from version 03 to version 04:

   o  Removed optional fields for non-maps

   o  Defined all key/value pairs in maps are considered optional from
      the CDDL perspective

   o  Allow omission of type of keys for maps with only text string and
      integer keys

   o  Changed order of definitions

   o  Updated fruit and moves examples

   o  Renamed the "Philosophy" section to "Using CDDL", and added more
      text about CDDL usage

   o  Several editorials

   Changes from version 04 to version 05:

   o  Added text about alternative datatypes and any datatype

   o  Fixed typos

   o  Restructured syntax and semantics

   Changes from version 05 to version 06:

   o  Fixed the ABNF for choices (no longer need to write a: (b/c))

   o  Added group choices (//)

   o  Added /= and //=

   o  Added experimental socket/plug

   o  Added aliases text, bytes, null to prelude

   o  Documented generics

   o  Fixed more typos

   Changes from 06 to 07:

   o  .cbor, .cborseq, .within, .and

   o  Define .size on uint

   o  Extended Diagnostic Notation

   o  Precedence discussion and table

   o  Remove some of the "issues" that can only be understood with
      historical context

   o  Prefer "text" over "tstr" in some of the examples

   o  Add "unsigned" to the prelude

   Changes from 07 to 08:

   o  .lt, .le, .eq, .ne, .gt, .ge

   o  .default

   Changes from 08 to 09:

   o  Take annotations and socket/plug out of the nursery; they have
      been battle-proven enough.

   o  Define a value notation for byte strings as well.

   o  Removed discussion section that was no longer relevant; move
      "Resolved Issues" to appendix.

   Changes from 09 to 10:

   o  Remove a long but not very elucidating example.  (Maybe we'll add
      back some shorter examples later.)

   o  A few clarifications.

   o  Updated author list.

Appendix D.  ABNF grammar

   The following is a formal definition of the CDDL syntax in Augmented
   Backus-Naur Form (ABNF, [RFC5234]).  [_abnftodo]

     cddl = S 1*rule
     rule = typename [genericparm] S assign S type S
          / groupname [genericparm] S assign S grpent S

     typename = id
     groupname = id

     assign = "=" / "/=" / "//="

     genericparm = "<" S id S *("," S id S ) ">"
     genericarg = "<" S type1 S *("," S type1 S ) ">"

     type = type1 S *("/" S type1 S)

     type1 = type2 [S (rangeop / annotator) S type2]
           / "#" "6" ["." uint] "(" S type S ")" ; note no space!
           / "#" DIGIT ["." uint]                ; major/ai
           / "#"                                 ; any
           / "{" S group S "}"
           / "[" S group S "]"
           / "&" S "(" S group S ")"
           / "&" S groupname [genericarg]

     type2 = value
           / typename [genericarg]
           / "(" type ")"

     rangeop = "..." / ".."

     annotator = "." id

     group = grpchoice S *("//" S grpchoice S)

     grpchoice = *grpent

     grpent = [occur S] [memberkey S] type optcom
            / [occur S] groupname [genericarg] optcom ; preempted by above
            / [occur S] "(" S group S ")" optcom

     memberkey = type1 S "=>"
               / bareword S ":"
               / value S ":"

     bareword = id

     optcom = S ["," S]

     occur = [uint] "*" [uint]
           / "+"
           / "?"

     uint = ["0x" / "0b"] "0"
          / ["0x" / "0b"] DIGIT1 *DIGIT

     value = number
           / text
           / bytes

     int = ["-"] uint

     ; This is a float if it has fraction or exponent; int otherwise
     number = int ["." fraction] ["e" exponent ]
     fraction = 1*DIGIT
     exponent = int

     text = %x22 *SCHAR %x22
     SCHAR = %x20-21 / %x23-7E / SESC
     SESC = "\" %x20-7E

     bytes = [bsqual] %x27 *BCHAR %x27
     BCHAR = %x20-26 / %x28-7E / SESC / CRLF
     bsqual = %x68 ; "h"
            / %x62.36.34 ; "b64"

     id = EALPHA *(*("-" / ".") (EALPHA / DIGIT))
     ALPHA = %x41-5A / %x61-7A
     EALPHA = %x41-5A / %x61-7A / "@" / "_" / "$"
     DIGIT = %x30-39
     DIGIT1 = %x31-39
     S = *WS
     WS = SP / NL
     SP = %x20
     NL = COMMENT / CRLF
     COMMENT = ";" *(SP / VCHAR) CRLF
     VCHAR = %x21-7E
     CRLF = %x0A / %x0D.0A

                         Figure 13: CDDL ABNF

Appendix E.  Standard Prelude

   The following prelude is automatically added to each CDDL file
   [tdate].  (Note that technically, it is a postlude, as it does not
   disturb the selection of the first rule as the root of the
   definition.)

     any = #

     uint = #0
     nint = #1
     int = uint / nint

     bstr = #2
     bytes = bstr
     tstr = #3
     text = tstr

     tdate = #6.0(tstr)
     time = #6.1(number)
     number = int / float
     biguint = #6.2(bstr)
     bignint = #6.3(bstr)
     bigint = biguint / bignint
     integer = int / bigint
     unsigned = uint / biguint
     decfrac = #6.4([e10: int, m: integer])
     bigfloat = #6.5([e2: int, m: integer])
     eb64url = #6.21(any)
     eb64legacy = #6.22(any)
     eb16 = #6.23(any)
     encoded-cbor = #6.24(bstr)
     uri = #6.32(tstr)
     b64url = #6.33(tstr)
     b64legacy = #6.34(tstr)
     regexp = #6.35(tstr)
     mime-message = #6.36(tstr)
     cbor-any = #6.55799(any)

     float16 = #7.25
     float32 = #7.26
     float64 = #7.27
     float16-32 = float16 / float32
     float32-64 = float32 / float64
     float = float16-32 / float64

     false = #7.20
     true = #7.21
     bool = false / true
     nil = #7.22
     null = nil
     undefined = #7.23

                        Figure 14: CDDL Prelude

   Note that the prelude is deemed to be fixed.
This means, for 1857 instance, that additional tags beyond [RFC7049], as registered, need 1858 to be defined in each CDDL file that is using them. 1860 A common stumbling point is that the prelude does not define a type 1861 "string". CBOR has byte strings ("bytes" in the prelude) and text 1862 strings ("text"), so a type that is simply called "string" would be 1863 ambiguous. 1865 Appendix F. The CDDL tool 1867 A rough CDDL tool is available. For CDDL specifications that do not 1868 use recursion, it can check the syntax, generate one or more 1869 instances (expressed in CBOR diagnostic notation or in pretty-printed 1870 JSON), and validate an existing instance against the specification: 1872 Usage: 1873 cddl spec.cddl generate [n] 1874 cddl spec.cddl json-generate [n] 1875 cddl spec.cddl validate instance.cbor 1876 cddl spec.cddl validate instance.json 1878 Figure 15: CDDL tool usage 1880 Install on a system with a modern Ruby via: 1882 gem install cddl 1884 Figure 16 1886 The accompanying CBOR diagnostic tools (which are automatically 1887 installed by the above) are described in https://github.com/cabo/ 1888 cbor-diag ; they can be used to convert between binary CBOR, a 1889 pretty-printed form of that, CBOR diagnostic notation, JSON, and 1890 YAML. 1892 Appendix G. Extended Diagnostic Notation 1894 Section 6 of [RFC7049] defines a "diagnostic notation" in order to be 1895 able to converse about CBOR data items without having to resort to 1896 binary data. Diagnostic notation is based on JSON, with extensions 1897 for representing CBOR constructs such as binary data and tags. 1899 (Standardizing this together with the actual interchange format does 1900 not serve to create another interchange format, but enables the use 1901 of a shared diagnostic notation in tools for and documents about 1902 CBOR.) 1903 This section discusses a few extensions to the diagnostic notation 1904 that have turned out to be useful since RFC 7049 was written. 
We 1905 refer to the result as extended diagnostic notation (EDN). 1907 G.1. White space in binary strings 1909 Examples often benefit from some white space (spaces, line breaks) in 1910 binary strings. In extended diagnostic notation, white space is 1911 ignored in prefixed binary strings; for instance, the following are 1912 equivalent: 1914 h'48656c6c6f20776f726c64' 1915 h'48 65 6c 6c 6f 20 77 6f 72 6c 64' 1916 h'4 86 56c 6c6f 1917 20776 f726c64' 1919 G.2. Text in binary strings 1921 Diagnostic notation notates byte strings in one of the [RFC4648] base 1922 encodings, enclosed in single quotes, prefixed by >h< for base16, 1923 >b32< for base32, >h32< for base32hex, >b64< for base64 or base64url. 1924 Quite often, binary strings carry bytes that are meaningfully 1925 interpreted as UTF-8 text. Extended Diagnostic Notation allows the 1926 use of single quotes without a prefix to express byte strings with 1927 UTF-8 text; for instance, the following are equivalent: 1929 'hello world' 1930 h'68656c6c6f20776f726c64' 1932 The escaping rules of JSON strings are applied equivalently for text- 1933 based binary strings, e.g., \\ stands for a single backslash and \' 1934 stands for a single quote. White space is included literally, i.e., 1935 the previous section does not apply to text-based binary strings. 1937 G.3. Concatenated Strings 1939 While the ability to include white space enables line-breaking of 1940 encoded binary strings, a mechanism is needed to be able to include 1941 text strings as well as binary strings in direct UTF-8 representation 1942 into line-based documents (such as RFCs and source code). 1944 We extend the diagnostic notation by allowing multiple text strings 1945 or multiple byte strings to be notated separated by white space; 1946 these are then concatenated into a single text or byte string, 1947 respectively.
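As a non-normative illustration, the white space rule of G.1, the unprefixed text form of G.2, and the concatenation rule just stated can be sketched in Python. The names CHUNK, decode_chunk and decode_bytes are hypothetical and are not part of the CDDL tool or of any CBOR library. The sketch assumes byte string notation only, handles just the >h< and >b64< prefixes, and does not process escape sequences.

```python
import base64
import re

# One byte string chunk: optional prefix (h or b64), then a single-quoted body.
# Assumption: the body contains no escaped quotes.
CHUNK = re.compile(r"(h|b64)?'([^']*)'")


def decode_chunk(prefix, body):
    """Decode one chunk of byte string notation into bytes."""
    if prefix == "h":
        # G.1: white space in a prefixed string is ignored.
        return bytes.fromhex("".join(body.split()))
    if prefix == "b64":
        # Accept base64 and base64url; restore any omitted padding.
        compact = "".join(body.split()).translate(str.maketrans("-_", "+/"))
        return base64.b64decode(compact + "=" * (-len(compact) % 4))
    # G.2: no prefix means UTF-8 text, with white space kept literally.
    return body.encode("utf-8")


def decode_bytes(notation):
    """G.3: chunks separated by white space concatenate into one byte string."""
    return b"".join(decode_chunk(p, b) for p, b in CHUNK.findall(notation))
```

Under these assumptions, decode_bytes("'Hello' h'20' 'world'") returns the same eleven bytes as decode_bytes("h'48656c6c6f20776f726c64'").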
Text strings and binary strings do not mix within such 1948 a concatenation, except that binary string notation can be used 1949 inside a sequence of concatenated text string notation to encode 1950 characters that may be better represented in an encoded way. The 1951 following four values are equivalent: 1953 "Hello world" 1954 "Hello " "world" 1955 "Hello" h'20' "world" 1956 "" h'48656c6c6f20776f726c64' "" 1958 Similarly, the following byte string values are equivalent: 1960 'Hello world' 1961 'Hello ' 'world' 1962 'Hello ' h'776f726c64' 1963 'Hello' h'20' 'world' 1964 '' h'48656c6c6f20776f726c64' '' b64'' 1965 h'4 86 56c 6c6f' h' 20776 f726c64' 1967 (Note that the approach of separating by white space, while familiar 1968 from the C language, requires some attention: a single comma makes a 1969 big difference here.) 1971 G.4. Hexadecimal, octal, and binary numbers 1973 In addition to JSON's decimal numbers, EDN provides hexadecimal, 1974 octal, and binary numbers in the usual C-language notation (octal only 1975 with the 0o prefix present). 1977 The following are equivalent: 1979 4711 1980 0x1267 1981 0o11147 1982 0b1001001100111 1984 As are: 1986 1.5 1987 0x1.8p0 1988 0x18p-4 1990 G.5. Comments 1992 Longer pieces of diagnostic notation may benefit from comments. JSON 1993 famously does not provide for comments, and basic RFC 7049 diagnostic 1994 notation inherits this property. 1996 In extended diagnostic notation, comments can be included, delimited 1997 by slashes ("/"). Any text within and including a pair of slashes is 1998 considered a comment. 2000 Comments are considered white space.
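The number forms of G.4 and the comment rule above can be checked with a short, non-normative Python sketch. The names COMMENT, parse_number and hex_body_to_bytes are hypothetical and do not come from the CDDL tool. The sketch assumes unsigned number tokens and applies the comment rule only to the body of an >h< string.

```python
import re

# G.5: a comment is any text within and including a pair of slashes.
COMMENT = re.compile(r"/[^/]*/")


def parse_number(token):
    """Parse an unsigned EDN number token (decimal, 0x, 0o, 0b, hex float)."""
    lowered = token.lower()
    if lowered.startswith("0x") and ("." in lowered or "p" in lowered):
        return float.fromhex(token)   # hexadecimal float such as 0x1.8p0
    try:
        return int(token, 0)          # base 0 honours the 0x, 0o and 0b prefixes
    except ValueError:
        return float(token)           # decimal fraction such as 1.5


def hex_body_to_bytes(body):
    """Decode the body of an h'...' string, treating comments as white space."""
    without_comments = COMMENT.sub(" ", body)
    return bytes.fromhex("".join(without_comments.split()))
```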
Hence, they are allowed in 2001 prefixed binary strings; for instance, the following are equivalent: 2003 h'68656c6c6f20776f726c64' 2004 h'68 65 6c /doubled l!/ 6c 6f /hello/ 2005 20 /space/ 2006 77 6f 72 6c 64' /world/ 2008 This can be used to annotate a CBOR structure as in: 2010 /grasp-message/ [/M_DISCOVERY/ 1, /session-id/ 10584416, 2011 /objective/ [/objective-name/ "opsonize", 2012 /D, N, S/ 7, /loop-count/ 105]] 2014 (There are currently no end-of-line comments. If we want to add 2015 them, "//" sounds like a reasonable delimiter given that we already 2016 use slashes for comments, but we could also use, e.g., "#".) 2018 Editorial Comments 2020 [_format] So far, the ability to restrict format choices has not been 2021 needed beyond the floating point formats. Those can be 2022 applied to ranges using the new .and annotation now. It is 2023 not clear we want to add more format control before we have a 2024 use case. 2026 [_range] TO DO: define this precisely. This clearly includes integers 2027 and floats. Strings (as in "a".."z") could be added if 2028 desired, but this would require adopting a definition of string 2029 ordering and possibly a successor function so "a".."z" does not 2030 include "bb". 2032 [_strings] TO DO: This still needs to be fully realized in the ABNF and 2033 in the CDDL tool. 2035 [_bitsendian] How useful would it be to have another variant that counts 2036 bits like in RFC box notation? (Or at least per-byte? 2037 32-bit words don't always perfectly mesh with byte 2038 strings.) 2040 [unflex] A comment has been that this is counter-intuitive. One 2041 solution would be to simply disallow unparenthesized usage of 2042 occurrence indicators in front of type choices unless a member 2043 key is also present like in group2 above.
2045 [_abnftodo] TO DO: This doesn't allow non-ASCII characters in the text 2046 or byte strings yet (and the prefixed byte strings are more 2047 liberally specified than they actually are); representation 2048 indicators are missing as well. 2050 [tdate] The prelude as included here does not yet have a .regexp 2051 annotation on tdate, but we probably do want to have one. 2053 Authors' Addresses 2055 Henk Birkholz 2056 Fraunhofer SIT 2057 Rheinstrasse 75 2058 Darmstadt 64295 2059 Germany 2061 Email: henk.birkholz@sit.fraunhofer.de 2063 Christoph Vigano 2064 Universitaet Bremen 2066 Email: christoph.vigano@uni-bremen.de 2068 Carsten Bormann 2069 Universitaet Bremen TZI 2070 Bibliothekstr. 1 2071 Bremen D-28359 2072 Germany 2074 Phone: +49-421-218-63921 2075 Email: cabo@tzi.org