idnits 2.17.1

draft-ietf-cbor-cddl-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 02, 2018) is 2118 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Cc' is mentioned on line 1114, but not defined

  == Missing Reference: 'Aa' is mentioned on line 1114, but not defined

  == Missing Reference: 'Ss' is mentioned on line 1114, but not defined

  == Missing Reference: 'Ee' is mentioned on line 1114, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 2055

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO6093'

  ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949)

  == Outdated reference: A later version (-13) exists of
     draft-bormann-cbor-cddl-freezer-00

  -- Obsolete informational reference (is this intentional?): RFC 8152
     (Obsoleted by RFC 9052, RFC 9053)

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                        H. Birkholz
Internet-Draft                                            Fraunhofer SIT
Intended status: Standards Track                               C. Vigano
Expires: January 3, 2019                             Universitaet Bremen
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                           July 02, 2018

  Concise data definition language (CDDL): a notational convention to
                     express CBOR data structures
                        draft-ietf-cbor-cddl-03

Abstract

   This document proposes a notational convention to express CBOR data
   structures (RFC 7049).  Its main goal is to provide an easy and
   unambiguous way to express structures for protocol messages and data
   formats that use CBOR.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 3, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
     1.2.  Terminology
   2.  The Style of Data Structure Specification
     2.1.  Groups and Composition in CDDL
       2.1.1.  Usage
       2.1.2.  Syntax
     2.2.  Types
       2.2.1.  Values
       2.2.2.  Choices
       2.2.3.  Representation Types
       2.2.4.  Root type
   3.  Syntax
     3.1.  General conventions
     3.2.  Occurrence
     3.3.  Predefined names for types
     3.4.  Arrays
     3.5.  Maps
       3.5.1.  Structs
       3.5.2.  Tables
       3.5.3.  Cuts in Maps
     3.6.  Tags
     3.7.  Unwrapping
     3.8.  Controls
       3.8.1.  Control operator .size
       3.8.2.  Control operator .bits
       3.8.3.  Control operator .regexp
       3.8.4.  Control operators .cbor and .cborseq
       3.8.5.  Control operators .within and .and
       3.8.6.  Control operators .lt, .le, .gt, .ge, .eq, .ne, and
               .default
     3.9.  Socket/Plug
     3.10. Generics
     3.11. Operator Precedence
   4.  Making Use of CDDL
     4.1.  As a guide to a human user
     4.2.  For automated checking of CBOR data structure
     4.3.  For data analysis tools
   5.  Security considerations
   6.  IANA considerations
     6.1.  CDDL control operator registry
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  (Not used.)
   Appendix B.  ABNF grammar
   Appendix C.  Matching rules
   Appendix D.  (Not used.)
   Appendix E.  Standard Prelude
     E.1.  Use with JSON
   Appendix F.  The CDDL tool
   Appendix G.  Extended Diagnostic Notation
     G.1.  White space in byte string notation
     G.2.  Text in byte string notation
     G.3.  Embedded CBOR and CBOR sequences in byte strings
     G.4.  Concatenated Strings
     G.5.  Hexadecimal, octal, and binary numbers
     G.6.  Comments
   Appendix H.  Examples
     H.1.  RFC 7071
       H.1.1.  Examples from JSON Content Rules
   Acknowledgements
   Authors' Addresses

1.  Introduction

   In this document, a notational convention to express CBOR [RFC7049]
   data structures is defined.

   The main goal for the convention is to provide a unified notation
   that can be used when defining protocols that use CBOR.  We term the
   convention "Concise data definition language", or CDDL.

   The CBOR notational convention has the following goals:

   (G1)  Provide an unambiguous description of the overall structure of
         a CBOR data structure.

   (G2)  Flexibility to express the freedoms of choice in the CBOR data
         format.

   (G3)  Possibility to restrict format choices where appropriate
         [_format].

   (G4)  Able to express common CBOR datatypes and structures.

   (G5)  Human and machine readable and processable.

   (G6)  Automatic checking of data format compliance.

   (G7)  Extraction of specific elements from CBOR data for further
         processing.

   Not an explicit goal per se, but a convenient side effect of the JSON
   generic data model being a subset of the CBOR generic data model, is
   the fact that CDDL can also be used for describing JSON data
   structures (see Appendix E.1).

   This document has the following structure:

   The syntax of CDDL is defined in Section 3.  Examples of CDDL and
   related CBOR data items ("instances") are given in Appendix H.
   Section 4 discusses usage of CDDL.
   Examples are provided early in
   the text to better illustrate concept definitions.  A formal
   definition of CDDL using ABNF grammar is provided in Appendix B.
   Finally, a _prelude_ of standard CDDL definitions that is
   automatically prepended to and thus available in every CDDL
   specification is listed in Appendix E.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119, BCP 14 [RFC2119].

1.2.  Terminology

   New terms are introduced in _cursive_.  CDDL text in the running text
   is in "typewriter".

2.  The Style of Data Structure Specification

   CDDL focuses on styles of specification that are in use in the
   community employing the data model as pioneered by JSON and now
   refined in CBOR.

   There are a number of more or less atomic elements of a CBOR data
   model, such as numbers, simple values (false, true, nil), text and
   byte strings; CDDL does not focus on specifying their structure.
   CDDL of course also allows adding a CBOR tag to a data item.

   The more important components of a data structure definition language
   are the data types used for composition: arrays and maps in CBOR
   (called arrays and objects in JSON).  While these are only two
   representation formats, they are used to specify four loosely
   distinguishable styles of composition:

   o  A _vector_, an array of elements that are mostly of the same
      semantics.  The set of signatures associated with a signed data
      item is a typical application of a vector.

   o  A _record_, an array the elements of which have different,
      positionally defined semantics, as detailed in the data structure
      definition.
      A 2D point, specified as an array of an x coordinate
      (which comes first) and a y coordinate (coming second) is an
      example of a record, as is the pair of exponent (first) and
      mantissa (second) in a CBOR decimal fraction.

   o  A _table_, a map from a domain of map keys to a domain of map
      values, that are mostly of the same semantics.  A set of language
      tags, each mapped to a text string translated to that specific
      language, is an example of a table.  The key domain is usually not
      limited to a specific set by the specification, but open for the
      application, e.g., in a table mapping IP addresses to MAC
      addresses, the specification does not attempt to foresee all
      possible IP addresses.  In a language such as JavaScript, a "Map"
      (as opposed to a plain "Object") would often be employed to
      achieve the generality of the key domain.

   o  A _struct_, a map from a domain of map keys as defined by the
      specification to a domain of map values the semantics of each of
      which is bound to a specific map key.  This is what many people
      have in mind when they think about JSON objects; CBOR adds the
      ability to use map keys that are not just text strings.  Structs
      can be used to solve similar problems as records; the use of
      explicit map keys facilitates optionality and extensibility.

   Two important concepts provide the foundation for CDDL:

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

   2.  The other important concept is that of a _type_.  The entire CDDL
       specification defines a type (the one defined by its first
       _rule_), which formally is the set of CBOR data items that are
       acceptable as "instances" for this specification.
       CDDL
       predefines a number of basic types such as "uint" (unsigned
       integer) or "tstr" (text string), often making use of a simple
       formal notation for CBOR data items.  Each value that can be
       expressed as a CBOR data item also is a type in its own right,
       e.g., "1".  A type can be built as a _choice_ of other types,
       e.g., an "int" is either a "uint" or a "nint" (negative integer).
       Finally, a type can be built as an array or a map from a group.

   The rest of this section introduces a number of basic concepts of
   CDDL, and Section 3 defines additional syntax.  Appendix C
   gives a concise summary of the semantics of CDDL.

2.1.  Groups and Composition in CDDL

   CDDL Groups are lists of group _entries_, each of which can be a
   name/value pair or a more complex group expression composed of name/
   value pairs.

   In an array context, only the value of the entry is represented; the
   name is annotation only (and can be left off if not needed).  In a
   map context, the names become the map keys ("member keys").

   In an array context, the sequence of elements in the group is
   important, as it is the information that allows associating actual
   array elements with entries in the group.  In a map context, the
   sequence of entries in a group is not relevant (but there is still a
   need to write down group entries in a sequence).

   A simple example of using a group right in a map definition is:

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

                     Figure 1: Using a group in a map

   The three entries of the group are written between the curly braces
   that create the map: Here, "age", "name", and "employer" are the
   names that turn into the map key text strings, and "int" and "tstr"
   (text string) are the types of the map values under these keys.
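
   As an informal illustration (not part of CDDL), the following Python
   sketch checks whether a decoded data item has the shape described by
   the "person" map of Figure 1.  The names PERSON_GROUP and
   matches_person are hypothetical, and Python's int and str are used
   only as stand-ins for the CDDL types "int" and "tstr".

```python
# Hypothetical helper, not part of CDDL or of any CDDL tool.
# Python types stand in for the CDDL types "int" and "tstr".
PERSON_GROUP = {"age": int, "name": str, "employer": str}


def matches_person(item):
    """Return True if item has exactly the three members of "person"."""
    # In a map context the order of entries is irrelevant, so the key
    # sets are compared rather than any sequence.
    if not isinstance(item, dict) or set(item) != set(PERSON_GROUP):
        return False
    for key, expected in PERSON_GROUP.items():
        value = item[key]
        # bool is a subclass of int in Python, but a CBOR boolean is not
        # an integer, so booleans are rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


print(matches_person({"name": "Ada", "employer": "ACME", "age": 36}))
print(matches_person({"name": "Ada", "age": "36", "employer": "ACME"}))
```

   The first call uses a different key order than Figure 1 and is still
   accepted; the second is rejected because "age" is a text string.
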

   A group by itself (without creating a map around it) can be placed in
   (round) parentheses, and given a name by using it in a rule:

   pii = (
     age: int,
     name: tstr,
     employer: tstr,
   )

                          Figure 2: A basic group

   This separate, named group definition allows us to rephrase Figure 1
   as:

   person = {
     pii
   }

                      Figure 3: Using a group by name

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.

   As shown in Figure 1, the parentheses for groups are optional when
   there is some other set of brackets present.  Note that they can
   still be used, leading to the not so realistic, but perfectly valid
   example:

   person = {(
     age: int,
     name: tstr,
     employer: tstr,
   )}

   Groups can be used to factor out common parts of structs, e.g.,
   instead of writing copy/paste style specifications such as in
   Figure 4, one can factor out the common subgroup, choose a name for
   it, and write only the specific parts into the individual maps
   (Figure 5).

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

   dog = {
     age: int,
     name: tstr,
     leash-length: float,
   }

                      Figure 4: Maps with copy/paste

   person = {
     identity,
     employer: tstr,
   }

   dog = {
     identity,
     leash-length: float,
   }

   identity = (
     age: int,
     name: tstr,
   )

                 Figure 5: Using a group for factorization

   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group.

2.1.1.  Usage

   Groups are the instrument used in composing data structures with
   CDDL.  It is a matter of style in defining those structures whether
   to define groups (anonymously) right in their contexts or whether to
   define them in a separate rule and to reference them with their
   respective name (possibly more than once).
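
   The effect of referencing a named group more than once can be
   pictured with a small Python sketch.  This is only an analogy under
   stated assumptions: each group is modelled as a dict from member name
   to a Python type, and the variable names simply mirror Figure 5; none
   of this is CDDL syntax or tool behaviour.

```python
# Analogy only: a group is modelled as a dict of member name -> type.
identity = {"age": int, "name": str}       # the named group "identity"

# Referencing the group by name splices its entries into the enclosing
# map; no nested map is created.
person = {**identity, "employer": str}
dog = {**identity, "leash-length": float}

print(list(person))
print(list(dog))
```

   Both maps end up with the entries of "identity" directly among their
   own members, which is the same flattening that Figure 5 expresses.
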

   Either style is allowed: one can define all the small parts of a data
   structure and compose bigger protocol units from those, or have
   only one big protocol data unit that has all definitions ad hoc where
   needed.

2.1.2.  Syntax

   The composition syntax intends to be concise and easy to read:

   o  The start of a group can be marked by '('

   o  The end of a group can be marked by ')'

   o  Definitions of entries inside of a group are noted as follows:
      _keytype => valuetype,_ (read "keytype maps to valuetype").  The
      comma is actually optional (not just in the final entry), but it
      is considered good style to set it.  The double arrow can be
      replaced by a colon in the common case of directly using a text
      string or integer literal as a key (see Section 3.5.1).

   A basic entry consists of a _keytype_ and a _valuetype_, both of
   which are types (Section 2.2).

   A group definition can also contain choices between groups, see
   Section 2.2.2.

2.2.  Types

2.2.1.  Values

   Values such as numbers and strings can be used in place of a type.
   (For instance, this is a very common thing to do for a keytype,
   common enough that CDDL provides additional convenience syntax for
   this.)

   The value notation is based on the C language, but does not offer all
   the syntactic variations (see Appendix B).  The value notation for
   numbers inherits from C the distinction between integer values (no
   fractional part or exponent given -- NR1 [ISO6093]) and floating
   point values (where a fractional part and/or an exponent is present
   -- NR2 or NR3), so the type "1" does not include any floating point
   numbers while the types "1e3" and "1.5" are both floating point
   numbers and do not include any integer numbers.

2.2.2.  Choices

   Many places that allow a type also allow a choice between types,
   delimited by a "/" (slash).
   The entire choice construct can be put
   into parentheses if this is required to make the construction
   unambiguous (please see Appendix B for the details).

   Choices of values can be used to express enumerations:

   attire = "bow tie" / "necktie" / "Internet attire"
   protocol = 6 / 17

   As with types, CDDL also allows choices between groups,
   delimited by a "//" (double slash).

   address = { delivery }

   delivery = (
     street: tstr, ? number: uint, city //
     po-box: uint, city //
     per-pickup: true )

   city = (
     name: tstr, zip-code: uint
   )

   Both for type choices and for group choices, additional alternatives
   can be added to a rule later in separate rules by using "/=" and
   "//=", respectively, instead of "=":

   attire /= "swimwear"

   delivery //= (
     lat: float, long: float, drone-type: tstr
   )

   It is not an error if a name is first used with a "/=" or "//="
   (there is no need to "create it" with "=").

2.2.2.1.  Ranges

   Instead of naming all the values that make up a choice, CDDL allows
   building a _range_ out of two values that are in an ordering
   relationship.  A range can be inclusive of both ends given (denoted
   by joining two values by ".."), or include the first and exclude the
   second (denoted by instead using "...").

   device-address = byte
   max-byte = 255
   byte = 0..max-byte ; inclusive range
   first-non-byte = 256
   byte1 = 0...first-non-byte ; byte1 is equivalent to byte

   CDDL currently only allows ranges between numbers [_range].

2.2.2.2.  Turning a group into a choice

   Some choices are built out of large numbers of values, often
   integers, each of which is best given a semantic name in the
   specification.
   Instead of naming each of these integers and then
   accumulating these into a choice, CDDL allows building a choice from
   a group by prefixing it with a "&" character:

   terminal-color = &basecolors
   basecolors = (
     black: 0, red: 1, green: 2, yellow: 3,
     blue: 4, magenta: 5, cyan: 6, white: 7,
   )
   extended-color = &(
     basecolors,
     orange: 8, pink: 9, purple: 10, brown: 11,
   )

   As with the use of groups in arrays (Section 3.4), the member names
   have only documentary value (in particular, they might be used by a
   tool when displaying integers that are taken from that choice).

2.2.3.  Representation Types

   CDDL allows the specification of a data item type by referring to the
   CBOR representation (major and minor numbers).  How this is used
   should be evident from the prelude (Appendix E).

   It may be necessary to make use of representation types outside the
   prelude, e.g., a specification could start by making use of an
   existing tag in a more specific way, or define a new tag not defined
   in the prelude:

   my_breakfast = #6.55799(breakfast) ; cbor-any is too general!
   breakfast = cereal / porridge
   cereal = #6.998(tstr)
   porridge = #6.999([liquid, solid])
   liquid = milk / water
   milk = 0
   water = 1
   solid = tstr

2.2.4.  Root type

   There is no special syntax to identify the root of a CDDL data
   structure definition: that role is simply taken by the first rule
   defined in the file.

   This is motivated by the usual top-down approach for defining data
   structures, decomposing a big data structure unit into smaller parts;
   however, except for the root type, there is no need to strictly
   follow this sequence.

   (Note that there is no way to use a group as a root - it must be a
   type.
   Using a group as the root might be employed as a way to
   specify a CBOR sequence in a future version of this specification;
   this would act as if that group is used in an array and the data
   items in that fictional array form the members of the CBOR sequence.)

3.  Syntax

   In this section, the overall syntax of CDDL is shown, alongside some
   examples just illustrating syntax.  (The definition will not attempt
   to be overly formal; refer to Appendix B for the details.)

3.1.  General conventions

   The basic syntax is inspired by ABNF [RFC5234], with the following
   conventions:

   o  Rules, whether they define groups or types, are defined with a
      name, followed by an equals sign "=" and the actual definition
      according to the respective syntactic rules of that definition.

   o  A name can consist of any of the characters from the set {'A',
      ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},
      starting with an alphabetic character (including '@', '_', '$')
      and ending in such a character or a digit.

      *  Names are case sensitive.

      *  It is preferred style to start a name with a lower case letter.

      *  The hyphen is preferred over the underscore (except in a
         "bareword" (Section 3.5.1), where the semantics may actually
         require an underscore).

      *  The period may be useful for larger specifications, to express
         some module structure (as in "tcp.throughput" vs.
         "udp.throughput").

      *  A number of names are predefined in the CDDL prelude, as listed
         in Appendix E.

      *  Rule names (types or groups) do not appear in the actual CBOR
         encoding, but names used as "barewords" in member keys do.

   o  Comments are started by a ';' (semicolon) character and finish at
      the end of a line (LF or CRLF).

   o  Outside strings, whitespace (spaces, newlines, and comments) is
      used to separate syntactic elements for readability (and to
      separate identifiers or numbers that follow each other); it is
      otherwise completely optional.

   o  Hexadecimal numbers are preceded by '0x' (without quotes, lower
      case x), and are case insensitive.  Similarly, binary numbers are
      preceded by '0b'.

   o  Text strings are enclosed by double quotation '"' characters.
      They follow the conventions for strings as defined in section 7 of
      [RFC8259].  (ABNF users may want to note that there is no support
      in CDDL for the concept of case insensitivity in text strings; if
      necessary, regular expressions can be used (Section 3.8.3).)

   o  Byte strings are enclosed by single quotation "'" characters and
      may be prefixed by "h" or "b64".  If unprefixed, the string is
      interpreted as with a text string, except that single quotes must
      be escaped and that the UTF-8 bytes resulting are marked as a byte
      string (major type 2).  If prefixed as "h" or "b64", the string is
      interpreted as a sequence of hex digits or a base64(url) string,
      respectively (as with the diagnostic notation in section 6 of
      [RFC7049]; cf. Appendix G.2); any white space present within the
      string (including comments) is ignored in the prefixed case.
      [_strings]

   o  CDDL uses UTF-8 [RFC3629] for its encoding.

   Example:

   ; This is a comment
   person = { g }

   g = (
     "name": tstr,
     age: int,  ; "age" is a bareword
   )

3.2.  Occurrence

   An optional _occurrence_ indicator can be given in front of a group
   entry.  It is either one of the characters '?' (optional), '*' (zero
   or more), or '+' (one or more), or is of the form n*m, where n and m
   are optional unsigned integers and n is the lower limit (default 0)
   and m is the upper limit (default no limit) of occurrences.

   If no occurrence indicator is specified, the group entry is to occur
   exactly once (as if 1*1 were specified).
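
   The rules above can be summarized as a mapping from an occurrence
   indicator to a pair of lower and upper limits.  The following Python
   sketch is an informal illustration only: the function name
   occurrence_bounds is hypothetical, None is used here to mean "no
   upper limit", and only decimal digits are accepted for n and m, which
   is a simplification made for this example.

```python
import re


def occurrence_bounds(indicator):
    """Map an occurrence indicator to (lower, upper); upper None = no limit."""
    if indicator == "":
        return (1, 1)          # no indicator: exactly once, as if 1*1
    if indicator == "?":
        return (0, 1)          # optional
    if indicator == "+":
        return (1, None)       # one or more
    # n*m with both n and m optional; a bare "*" is the case where both
    # are omitted (lower limit 0, no upper limit).
    match = re.fullmatch(r"(\d*)\*(\d*)", indicator)
    if match is None:
        raise ValueError("not an occurrence indicator: %r" % indicator)
    lower = int(match.group(1)) if match.group(1) else 0
    upper = int(match.group(2)) if match.group(2) else None
    return (lower, upper)


for text in ["", "?", "*", "+", "1*2", "2*", "*3"]:
    print(repr(text), occurrence_bounds(text))
```
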

   Note that CDDL, outside any directives/annotations that could
   possibly be defined, does not make any prescription as to whether
   arrays or maps use the definite length or indefinite length encoding.
   I.e., there is no correlation between leaving the size of an array
   "open" in the spec and the fact that it is then interchanged with
   definite or indefinite length.

   Please also note that CDDL can describe flexibility that the data
   model of the target representation does not have.  This is rather
   obvious for JSON, but also is relevant for CBOR:

   apartment = {
     kitchen: size,
     * bedroom: size,
   }
   size = float ; in m2

   The previous specification does not mean that CBOR is changed to
   allow the key "bedroom" to be used more than once.  In other words,
   due to the restrictions imposed by the data model, the third line
   pretty much turns into:

     ? bedroom: size,

   (Occurrence indicators beyond one still are useful in maps for groups
   that allow a variety of keys.)

3.3.  Predefined names for types

   CDDL predefines a number of names.  This subsection summarizes these
   names, but please see Appendix E for the exact definitions.

   The following keywords for primitive datatypes are defined:

   "bool"  Boolean value (major type 7, additional information 20 or
      21).

   "uint"  An unsigned integer (major type 0).

   "nint"  A negative integer (major type 1).

   "int"  An unsigned integer or a negative integer.

   "float16"  A number representable as an IEEE 754 half-precision float
      (major type 7, additional information 25).

   "float32"  A number representable as an IEEE 754 single-precision
      float (major type 7, additional information 26).

   "float64"  A number representable as an IEEE 754 double-precision
      float (major type 7, additional information 27).

   "float"  One of float16, float32, or float64.

   "bstr" or "bytes"  A byte string (major type 2).

   "tstr" or "text"  A text string (major type 3).

   (Note that there are no predefined names for arrays or maps; these
   are defined with the syntax given below.)

   In addition, a number of types are defined in the prelude that are
   associated with CBOR tags, such as "tdate", "bigint", "regexp" etc.

3.4.  Arrays

   Array definitions surround a group with square brackets.

   For each entry, an occurrence indicator as specified in Section 3.2
   is permitted.

   For example:

   unlimited-people = [* person]
   one-or-two-people = [1*2 person]
   at-least-two-people = [2* person]
   person = (
       name: tstr,
       age: uint,
   )

   The group "person" is defined in such a way that repeating it in the
   array each time generates alternating names and ages, so these are
   four valid values for a data item of type "unlimited-people":

   ["roundlet", 1047, "psychurgy", 2204, "extrarhythmical", 2231]
   []
   ["aluminize", 212, "climograph", 4124]
   ["penintime", 1513, "endocarditis", 4084, "impermeator", 1669,
    "coextension", 865]

3.5.  Maps

   The syntax for specifying maps merits special attention, as well as a
   number of optimizations and conveniences, as it is likely to be the
   focal point of many specifications employing CDDL.  While the syntax
   does not strictly distinguish struct and table usage of maps, it
   caters specifically to each of them.

   But first, let's reiterate a feature of CBOR that it has inherited
   from JSON: The key/value pairs in CBOR maps have no fixed ordering.
   (One could imagine situations where fixing the ordering may be of
   use.  For example, a decoder could look for values associated with
   integer keys 1, 3 and 7.  If the order were fixed and the decoder
   encounters the key 4 without having encountered key 3, it could
   conclude that key 3 is not available without doing more complicated
   bookkeeping.
   Unfortunately, neither JSON nor CBOR supports this, so
   no attempt was made to support this in CDDL either.)

3.5.1.  Structs

   The "struct" usage of maps is similar to the way JSON objects are
   used in many JSON applications.

   A map is defined in the same way as defining an array (see
   Section 3.4), except for using curly braces "{}" instead of square
   brackets "[]".

   An occurrence indicator as specified in Section 3.2 is permitted for
   each group entry.

   The following is an example of a structure:

   Geography = [
     city           : tstr,
     gpsCoordinates : GpsCoordinates,
   ]

   GpsCoordinates = {
     longitude      : uint,            ; multiplied by 10^7
     latitude       : uint,            ; multiplied by 10^7
   }

   When encoding, the Geography structure is encoded using a CBOR array
   with two entries (the keys for the group entries are ignored),
   whereas the GpsCoordinates are encoded as a CBOR map with two key/
   value pairs.

   Types used in a structure can be defined in separate rules or just in
   place (potentially placed inside parentheses, such as for choices).
   E.g.:

   located-samples = {
     sample-point: int,
     samples: [+ float],
   }

   where "located-samples" is the datatype to be used when referring to
   the struct, and "sample-point" and "samples" are the keys to be used.
   This is actually a complete example: an identifier that is followed
   by a colon can be directly used as the text string for a member key
   (we speak of a "bareword" member key), as can a double-quoted string
   or a number.  (When other types, in particular multi-valued ones, are
   used as keytypes, they are followed by a double arrow, see below.)

   If a text string key does not match the syntax for an identifier (or
   if the specifier just happens to prefer using double quotes), the
   text string syntax can also be used in the member key position,
   followed by a colon.
The above example could therefore have been 765 written with quoted strings in the member key positions. More 766 generally, all the types defined can be used in a keytype position by 767 following them with a double arrow -- in particular, the double arrow 768 is necessary if a type is named by an identifier (which would be 769 interpreted as a string before a colon). A string also is a (single- 770 valued) type, so another form for this example is: 772 located-samples = { 773 "sample-point" => int, 774 "samples" => [+ float], 775 } 777 See Section 3.5.3 below for how the colon shortcut described here 778 also adds some implied semantics. 780 A better way to demonstrate the double-arrow use may be: 782 located-samples = { 783 sample-point: int, 784 samples: [+ float], 785 * equipment-type => equipment-tolerances, 786 } 787 equipment-type = [name: tstr, manufacturer: tstr] 788 equipment-tolerances = [+ [float, float]] 790 The example below defines a struct with optional entries: display 791 name (as a text string), the name components first name and family 792 name (as a map of text strings), and age information (as an unsigned 793 integer). 795 PersonalData = { 796 ? displayName: tstr, 797 NameComponents, 798 ? age: uint, 799 } 801 NameComponents = ( 802 ? firstName: tstr, 803 ? familyName: tstr, 804 ) 806 Note that the group definition for NameComponents does not generate 807 another map; instead, all four keys are directly in the struct built 808 by PersonalData. 810 In this example, all key/value pairs are optional from the 811 perspective of CDDL. With no occurrence indicator, an entry is 812 mandatory. 814 If the addition of more entries not specified by the current 815 specification is desired, one can add this possibility explicitly: 817 PersonalData = { 818 ? displayName: tstr, 819 NameComponents, 820 ? age: uint, 821 * tstr => any 822 } 824 NameComponents = ( 825 ? firstName: tstr, 826 ? 
familyName: tstr, 827 ) 829 Figure 6: Personal Data: Example for extensibility 831 The cddl tool (Appendix F) generated as one acceptable instance for 832 this specification: 834 {"familyName": "agust", "antiforeignism": "pretzel", 835 "springbuck": "illuminatingly", "exuviae": "ephemeris", 836 "kilometrage": "frogfish"} 838 (See Section 3.9 for one way to explicitly identify an extension 839 point.) 841 3.5.2. Tables 843 A table can be specified by defining a map with entries where the 844 keytype is not single-valued, e.g.: 846 square-roots = {* x => y} 847 x = int 848 y = float 850 Here, the key in each key/value pair has datatype x (defined as int), 851 and the value has datatype y (defined as float). 853 If the specification does not need to restrict one of x or y (i.e., 854 the application is free to choose per entry), it can be replaced by 855 the predefined name "any". 857 As another example, the following could be used as a conversion table 858 converting from an integer or float to a string: 860 tostring = {* mynumber => tstr} 861 mynumber = int / float 863 3.5.3. Cuts in Maps 865 The extensibility idiom discussed above for structs has one problem: 867 extensible-map-example = { 868 ? "optional-key" => int, 869 * tstr => any 870 } 872 In this example, there is one optional key "optional-key", which, 873 when present, maps to an integer. There is also a wild card for any 874 future additions. 876 Unfortunately, the data item 878 { "optional-key": "nonsense" } 880 does match this specification: While the first entry of the group 881 does not match, the second one (the wildcard) does. This may be very 882 well desirable (e.g., if a future extension is to be allowed to 883 extend the type of "optional-key"), but in many cases isn't. 885 In anticipation of a more general potential feature called "cuts", 886 CDDL allows inserting a cut "^" into the definition of the map entry: 888 extensible-map-example = { 889 ? 
"optional-key" ^ => int, 890 * tstr => any 891 } 893 A cut in this position means that once the map key matches the entry 894 carrying the cut, other potential matches for the key that occur in 895 later entries in the group of the map are no longer allowed. (This 896 rule applies independent of whether the value matches, too.) So the 897 example above no longer matches the version modified with a cut. 899 Since the desire for this kind of exclusive matching is so frequent, 900 the ":" shortcut is actually defined to include the cut semantics. 901 So the preceding example (including the cut) can be written more 902 simply as: 904 extensible-map-example = { 905 ? "optional-key": int, 906 * tstr => any 907 } 909 or even shorter, using a bareword for the key: 911 extensible-map-example = { 912 ? optional-key: int, 913 * tstr => any 914 } 916 3.6. Tags 918 A type can make use of a CBOR tag (major type 6) by using the 919 representation type notation, giving #6.nnn(type) where nnn is an 920 unsigned integer giving the tag number and "type" is the type of the 921 data item being tagged. 923 For example, the following line from the CDDL prelude (Appendix E) 924 defines "biguint" as a type name for a positive bignum N: 926 biguint = #6.2(bstr) 928 The tags defined by [RFC7049] are included in the prelude. 929 Additional tags since registered need to be added to a CDDL 930 specification as needed; e.g., a binary UUID tag could be referenced 931 as "buuid" in a specification after defining 933 buuid = #6.37(bstr) 935 In the following example, usage of the tag 32 for URIs is optional: 937 my_uri = #6.32(tstr) / tstr 939 3.7. Unwrapping 941 The group that is used to define a map or an array can often be 942 reused in the definition of another map or array. Similarly, a type 943 defined as a tag carries an internal data item that one would like to 944 refer to. 
In these cases, it is expedient to simply use the name of 945 the map, array, or tag type as a handle for the group or type defined 946 inside it. 948 The "unwrap" operator (written by preceding a name by a tilde 949 character "~") can be used to strip the type defined for a name by 950 one layer, exposing the underlying group (for maps and arrays) or 951 type (for tags). 953 For example, an application might want to define a basic and an 954 advanced header. Without unwrapping, this might be done as follows: 956 basic-header-group = ( 957 field1: int, 958 field2: text, 959 ) 961 basic-header = [ basic-header-group ] 963 advanced-header = [ 964 basic-header-group, 965 field3: bytes, 966 field4: number, ; as in the tagged type "time" 967 ] 969 Unwrapping simplifies this to: 971 basic-header = [ 972 field1: int, 973 field2: text, 974 ] 976 advanced-header = [ 977 ~basic-header, 978 field3: bytes, 979 field4: ~time, 980 ] 982 (Note that leaving out the first unwrap operator in the latter 983 example would lead to nesting the basic-header in its own array 984 inside the advanced-header, while, with the unwrapped basic-header, 985 the definition of the group inside basic-header is essentially 986 repeated inside advanced-header, leading to a single array. This can 987 be used for various applications often solved by inheritance in 988 programming languages. The effect of unwrapping can also be 989 described as "threading in" the group or type inside the referenced 990 type, which suggested the thread-like "~" character.) 992 3.8. Controls 994 A _control_ allows relating a _target_ type to a _controller_ type 995 via a _control operator_. 997 The syntax for a control type is "target .control-operator 998 controller", where control operators are special identifiers prefixed 999 by a dot. (Note that _target_ or _controller_ might need to be 1000 parenthesized.) 1002 A number of control operators are defined at this point.
Note that 1003 the CDDL tool does not currently support combining multiple controls 1004 on a single target. 1006 3.8.1. Control operator .size 1008 A ".size" control controls the size of the target in bytes by the 1009 control type. Examples: 1011 full-address = [[+ label], ip4, ip6] 1012 ip4 = bstr .size 4 1013 ip6 = bstr .size 16 1014 label = bstr .size (1..63) 1016 Figure 7: Control for size in bytes 1018 When applied to an unsigned integer, the ".size" control restricts 1019 the range of that integer by giving a maximum number of bytes that 1020 should be needed in a computer representation of that unsigned 1021 integer. In other words, "uint .size N" is equivalent to 1022 "0...BYTES_N", where BYTES_N == 256**N. 1024 audio_sample = uint .size 3 ; 24-bit, equivalent to 0..16777215 1026 Figure 8: Control for integer size in bytes 1028 Note that, as with value restrictions in CDDL, this control is not a 1029 representation constraint; a number that fits into fewer bytes can 1030 still be represented in that form, and an inefficient implementation 1031 could use a longer form (unless that is restricted by some format 1032 constraints outside of CDDL, such as the rules in Section 3.9 of 1033 [RFC7049]). 1035 3.8.2. Control operator .bits 1037 A ".bits" control on a byte string indicates that, in the target, 1038 only the bits numbered by a number in the control type are allowed to 1039 be set. (Bits are counted the usual way, bit number "n" being set in 1040 "str" meaning that "(str[n >> 3] & (1 << (n & 7))) != 0".) 1041 [_bitsendian] 1043 Similarly, a ".bits" control on an unsigned integer "i" indicates 1044 that for all unsigned integers "n" where "(i & (1 << n)) != 0", "n" 1045 must be in the control type. 
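The bit-numbering rules above can be made concrete with a short Python sketch. It is illustrative only: the helper names are hypothetical and not part of CDDL or of the CDDL tool, and the sets of allowed bit numbers are copied by hand from the "flags" and "rwx" definitions of Figure 9 rather than derived from a control type.

```python
from typing import Set, Union


def bits_set_in_bytes(data: bytes) -> Set[int]:
    """Bit numbers set in a byte string, counted as in the text:
    bit n is set if data[n >> 3] & (1 << (n & 7)) is non-zero."""
    return {n for n in range(len(data) * 8) if data[n >> 3] & (1 << (n & 7))}


def bits_set_in_uint(i: int) -> Set[int]:
    """Bit numbers n for which (i & (1 << n)) != 0."""
    return {n for n in range(i.bit_length()) if i & (1 << n)}


def matches_bits(value: Union[bytes, int], allowed: Set[int]) -> bool:
    """True if every bit set in value is numbered by a member of allowed."""
    if isinstance(value, bytes):
        return bits_set_in_bytes(value) <= allowed
    return bits_set_in_uint(value) <= allowed


# Assumption: bit numbers transcribed by hand from Figure 9.
TCP_FLAG_BITS = {0} | set(range(4, 8)) | set(range(8, 16))  # ns, 4..7, fin..cwr
RWX_BITS = {0, 1, 2}                                         # x, w, r
```

Under these assumptions, a byte string of any length with no bits set passes the check, which is consistent with the observation that ".bits" by itself does not fix the size of the target.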
1047 tcpflagbytes = bstr .bits flags 1048 flags = &( 1049 fin: 8, 1050 syn: 9, 1051 rst: 10, 1052 psh: 11, 1053 ack: 12, 1054 urg: 13, 1055 ece: 14, 1056 cwr: 15, 1057 ns: 0, 1058 ) / (4..7) ; data offset bits 1060 rwxbits = uint .bits rwx 1061 rwx = &(r: 2, w: 1, x: 0) 1063 Figure 9: Control for what bits can be set 1065 The CDDL tool generates the following ten example instances for 1066 "tcpflagbytes": 1068 h'906d' h'01fc' h'8145' h'01b7' h'013d' h'409f' h'018e' h'c05f' 1069 h'01fa' h'01fe' 1071 These examples do not illustrate that the above CDDL specification 1072 does not explicitly specify a size of two bytes: A valid all clear 1073 instance of flag bytes could be "h''" or "h'00'" or even "h'000000'" 1074 as well. 1076 3.8.3. Control operator .regexp 1078 A ".regexp" control indicates that the text string given as a target 1079 needs to match the XSD regular expression given as a value in the 1080 control type. XSD regular expressions are defined in Appendix F of 1081 [W3C.REC-xmlschema-2-20041028]. 1083 nai = tstr .regexp "[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+" 1085 Figure 10: Control with an XSD regexp 1087 The CDDL tool proposes: 1089 "N1@CH57HF.4Znqe0.dYJRN.igjf" 1091 3.8.3.1. Usage considerations 1093 Note that XSD regular expressions do not support the usual \x or \u 1094 escapes for hexadecimal expression of bytes or unicode code points. 1095 However, in CDDL the XSD regular expressions are contained in text 1096 strings, the literal notation for which provides \u escapes; this 1097 should suffice for most applications that use regular expressions for 1098 text strings. (Note that this also means that there is one level of 1099 string escaping before the XSD escaping rules are applied.) 1101 XSD regular expressions support character class subtraction, a 1102 feature often not found in regular expression libraries; 1103 specification writers may want to use this feature sparingly. 
1104 Similar considerations apply to Unicode character classes; where 1105 these are used, the specification SHOULD identify which Unicode 1106 versions are addressed. 1108 Other surprises for infrequent users of XSD regular expressions may 1109 include: 1111 o No direct support for case insensitivity. While case 1112 insensitivity has gone mostly out of fashion in protocol design, 1113 it is sometimes needed and then needs to be expressed manually as 1114 in "[Cc][Aa][Ss][Ee]". 1116 o The support for popular character classes such as \w and \d is 1117 based on Unicode character properties, which is often not what is 1118 desired in an ASCII-based protocol and thus might lead to 1119 surprises. (\s and \S do have their more conventional meanings, 1120 and "." matches any character but the line ending characters \r or 1121 \n.) 1123 3.8.3.2. Discussion 1125 There are many flavors of regular expression in use in the 1126 programming community. For instance, perl-compatible regular 1127 expressions (PCRE) are widely used and probably are more useful than 1128 XSD regular expressions. However, there is no normative reference 1129 for PCRE that could be used in the present document. Instead, we opt 1130 for XSD regular expressions for now. There is precedent for that 1131 choice in the IETF, e.g., in YANG [RFC7950]. 1133 Note that CDDL uses controls as its main extension point. This 1134 creates the opportunity to add further regular expression formats in 1135 addition to the one referenced here if desired. As an example, a 1136 control ".pcre" is defined in [I-D.bormann-cbor-cddl-freezer]. 1138 3.8.4. Control operators .cbor and .cborseq 1140 A ".cbor" control on a byte string indicates that the byte string 1141 carries a CBOR encoded data item. Decoded, the data item matches the 1142 type given as the right-hand side argument (type1 in the following 1143 example). 
1145 "bytes .cbor type1" 1147 Similarly, a ".cborseq" control on a byte string indicates that the 1148 byte string carries a sequence of CBOR encoded data items. When the 1149 data items are taken as an array, the array matches the type given as 1150 the right-hand side argument (type2 in the following example). 1152 "bytes .cborseq type2" 1154 (The conversion of the encoded sequence to an array can be effected 1155 for instance by wrapping the byte string between the two bytes 0x9f 1156 and 0xff and decoding the wrapped byte string as a CBOR encoded data 1157 item.) 1159 3.8.5. Control operators .within and .and 1161 A ".and" control on a type indicates that the data item matches both 1162 that left hand side type and the type given as the right hand side. 1163 (Formally, the resulting type is the intersection of the two types 1164 given.) 1166 "type1 .and type2" 1168 A variant of the ".and" control is the ".within" control, which 1169 expresses an additional intent: the left hand side type is meant to 1170 be a subset of the right-hand-side type. 1172 "type1 .within type2" 1174 While both forms have the identical formal semantics (intersection), 1175 the intention of the ".within" form is that the right hand side gives 1176 guidance to the types allowed on the left hand side, which typically 1177 is a socket (Section 3.9): 1179 message = $message .within message-structure 1180 message-structure = [message_type, *message_option] 1181 message_type = 0..255 1182 message_option = any 1184 $message /= [3, dough: text, topping: [* text]] 1185 $message /= [4, noodles: text, sauce: text, parmesan: bool] 1187 For ".within", a tool might flag an error if type1 allows data items 1188 that are not allowed by type2. In contrast, for ".and", there is no 1189 expectation that type1 already is a subset of type2. 1191 3.8.6. 
Control operators .lt, .le, .gt, .ge, .eq, .ne, and .default 1193 The controls .lt, .le, .gt, .ge, .eq, .ne specify a constraint on the 1194 left hand side type to be a value less than, less than or equal to, 1195 greater than, greater than or equal to, equal to, or not equal to a 1196 value given as a (single-valued) right hand side type. In the 1197 present specification, the first four controls (.lt, .le, .gt, .ge) 1198 are defined only for numeric types, as these have a natural ordering 1199 relationship. 1201 speed = number .ge 0 ; unit: m/s 1203 A variant of the ".ne" control is the ".default" control, which 1204 expresses an additional intent: the value specified by the 1205 right-hand-side type is intended as a default value for the left hand side 1206 type given, and the implied .ne control is there to prevent this 1207 value from being sent over the wire. This control is only meaningful 1208 when the control type is used in an optional context; otherwise there 1209 would be no way to express the default value. 1211 timer = { 1212 time: uint, 1213 ? displayed-step: (number .gt 0) .default 1 1214 } 1216 3.9. Socket/Plug 1218 Both for type choices and group choices, a mechanism is defined that 1219 facilitates starting out with empty choices and assembling them 1220 later, potentially in separate files that are concatenated to build 1221 the full specification. 1223 By convention, CDDL extension points are marked with a leading 1224 dollar sign (types) or two leading dollar signs (groups). Tools 1225 honor that convention by not raising an error if such a type or group 1226 is not defined at all; the symbol is then taken to be an empty type 1227 choice (group choice), i.e., no choice is available.
1229 tcp-header = {seq: uint, ack: uint, * $$tcp-option} 1231 ; later, in a different file 1233 $$tcp-option //= ( 1234 sack: [+(left: uint, right: uint)] 1235 ) 1237 ; and, maybe in another file 1239 $$tcp-option //= ( 1240 sack-permitted: true 1241 ) 1243 Names that start with a single "$" are "type sockets", names with a 1244 double "$$" are "group sockets". It is not an error if there is no 1245 definition for a socket at all; this then means there is no way to 1246 satisfy the rule (i.e., the choice is empty). 1248 All definitions (plugs) for socket names must be augmentations, i.e., 1249 they must be using "/=" and "//=", respectively. 1251 To pick up the example illustrated in Figure 6, the socket/plug 1252 mechanism could be used as shown in Figure 11: 1254 PersonalData = { 1255 ? displayName: tstr, 1256 NameComponents, 1257 ? age: uint, 1258 * $$personaldata-extensions 1259 } 1261 NameComponents = ( 1262 ? firstName: tstr, 1263 ? familyName: tstr, 1264 ) 1266 ; The above already works as is. 1267 ; But then, we can add later: 1269 $$personaldata-extensions //= ( 1270 favorite-salsa: tstr, 1271 ) 1273 ; and again, somewhere else: 1275 $$personaldata-extensions //= ( 1276 shoesize: uint, 1277 ) 1279 Figure 11: Personal Data example: Using socket/plug extensibility 1281 3.10. Generics 1283 Using angle brackets, the left hand side of a rule can add formal 1284 parameters after the name being defined, as in: 1286 messages = message<"reboot", "now"> / message<"sleep", 1..100> 1287 message = {type: t, value: v} 1289 When using a generic rule, the formal parameters are bound to the 1290 actual arguments supplied (also using angle brackets), within the 1291 scope of the generic rule (as if there were a rule of the form 1292 parameter = argument). 1294 (There are some limitations to nesting of generics in Appendix F at 1295 this time.) 1297 3.11. 
Operator Precedence 1299 As with any language that has multiple syntactic features such as 1300 prefix and infix operators, CDDL has operators that bind more tightly 1301 than others. This is becoming more complicated than, say, in ABNF, 1302 as CDDL has both types and groups, with operators that are specific 1303 to these concepts. Type operators (such as "/" for type choice) 1304 operate on types, while group operators (such as "//" for group 1305 choice) operate on groups. Types can simply be used in groups, but 1306 groups need to be bracketed (as arrays or maps) to become types. So, 1307 type operators naturally bind closer than group operators. 1309 For instance, in 1311 t = [group1] 1312 group1 = (a / b // c / d) 1313 a = 1 b = 2 c = 3 d = 4 1315 group1 is a group choice between the type choice of a and b and the 1316 type choice of c and d. This becomes more relevant once member keys 1317 and/or occurrences are added in: 1319 t = {group2} 1320 group2 = (? ab: a / b // cd: c / d) 1321 a = 1 b = 2 c = 3 d = 4 1323 is a group choice between the optional member "ab" of type a or b and 1324 the member "cd" of type c or d. Note that the optionality is 1325 attached to the first choice ("ab"), not to the second choice. 1327 Similarly, in 1329 t = [group3] 1330 group3 = (+ a / b / c) 1331 a = 1 b = 2 c = 3 1333 group3 is a repetition of a type choice between a, b, and c [unflex]; 1334 if just a is to be repeatable, a group choice is needed to focus the 1335 occurrence: 1337 t = [group4] 1338 group4 = (+ a // b / c) 1339 a = 1 b = 2 c = 3 1341 group4 is a group choice between a repeatable a and a single b or c. 
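The difference between group3 and group4 can be checked with two hand-written predicates. The following Python sketch hard-codes the two groups from the examples above (with a = 1, b = 2, c = 3) and applies them to the elements of an array; it is not a general CDDL matcher, and the function names are hypothetical.

```python
A, B, C = 1, 2, 3  # a = 1  b = 2  c = 3, as in the examples


def matches_group3(items: list) -> bool:
    """group3 = (+ a / b / c): one or more items, each of which is a, b, or c."""
    return len(items) >= 1 and all(x in (A, B, C) for x in items)


def matches_group4(items: list) -> bool:
    """group4 = (+ a // b / c): either one or more a, or a single b or c."""
    repeated_a = len(items) >= 1 and all(x == A for x in items)
    single_b_or_c = len(items) == 1 and items[0] in (B, C)
    return repeated_a or single_b_or_c
```

An array such as [1, 2, 3] is accepted by the group3 predicate but rejected by the group4 predicate, because in group4 the occurrence indicator applies to a only.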
1343 In general, as with many other languages with operator precedence 1344 rules, it is best not to rely on them, but to insert parentheses for 1345 readability: 1347 t = [group4a] 1348 group4a = ((+ a) // (b / c)) 1349 a = 1 b = 2 c = 3 1350 The operator precedences, in sequence of loose to tight binding, are 1351 defined in Appendix B and summarized in Table 1. (Arities given are 1352 1 for unary prefix operators and 2 for binary infix operators.) 1354 +----------+----+---------------------------+------+ 1355 | Operator | Ar | Operates on | Prec | 1356 +----------+----+---------------------------+------+ 1357 | = | 2 | name = type, name = group | 1 | 1358 | /= | 2 | name /= type | 1 | 1359 | //= | 2 | name //= group | 1 | 1360 | // | 2 | group // group | 2 | 1361 | , | 2 | group, group | 3 | 1362 | * | 1 | * group | 4 | 1363 | N*M | 1 | N*M group | 4 | 1364 | + | 1 | + group | 4 | 1365 | ? | 1 | ? group | 4 | 1366 | => | 2 | type => type | 5 | 1367 | : | 2 | name: type | 5 | 1368 | / | 2 | type / type | 6 | 1369 | & | 1 | &group | 6 | 1370 | .. | 2 | type..type | 7 | 1371 | ... | 2 | type...type | 7 | 1372 | .anno | 2 | type .anno type | 7 | 1373 +----------+----+---------------------------+------+ 1375 Table 1: Summary of operator precedences 1377 4. Making Use of CDDL 1379 In this section, we discuss several potential ways to employ CDDL. 1381 4.1. As a guide to a human user 1383 CDDL can be used to efficiently define the layout of CBOR data, such 1384 that a human implementer can easily see how data is supposed to be 1385 encoded. 1387 Since CDDL maps parts of the CBOR data to human readable names, tools 1388 could be built that use CDDL to provide a human friendly 1389 representation of the CBOR data, and allow them to edit such data 1390 while remaining compliant to its CDDL definition. 1392 4.2. 
For automated checking of CBOR data structure 1394 CDDL has been specified such that a machine can handle the CDDL 1395 definition and related CBOR data (and, thus, also JSON data). For 1396 example, a machine could use CDDL to check whether or not CBOR data 1397 is compliant with its definition. 1399 The need for thoroughness of such compliance checking depends on the 1400 application. For example, an application may decide not to check the 1401 data structure at all, and use the CDDL definition solely as a means 1402 to indicate the structure of the data to the programmer. 1404 At the other extreme, the application may also implement a checking 1405 mechanism that goes as far as checking that all mandatory map members 1406 are available. 1408 The extent to which the data description must be enforced by an 1409 application is left to the designers and implementers of that 1410 application, keeping in mind related security considerations. 1412 In no case is the intention that a CDDL tool would be "writing code" 1413 for an implementation. 1415 4.3. For data analysis tools 1417 In the long run, it can be expected that more and more data will be 1418 stored using the CBOR data format. 1420 Where there is data, there is data analysis and the need to process 1421 such data automatically. CDDL can be used for such automated data 1422 processing, allowing tools to verify data, clean it, and extract 1423 particular parts of interest from it. 1425 Since CBOR is designed with constrained devices in mind, a likely use 1426 of it would be in small sensors. An interesting use would thus be 1427 automated analysis of sensor data. 1429 5. Security considerations 1431 This document presents a content rules language for expressing CBOR 1432 data structures. As such, it does not bring any security issues by 1433 itself, although specifications of protocols that use CBOR naturally 1434 need security analysis when defined.
1436 Topics that could be considered in a security considerations section 1437 that uses CDDL to define CBOR structures include the following: 1439 o Where could the language maybe cause confusion in a way that will 1440 enable security issues? 1442 o Where a CDDL matcher is part of the implementation of a system, 1443 the security of the system ought not depend on the correctness of 1444 the CDDL specification or CDDL implementation without any further 1445 defenses in place. 1447 Writers of CDDL specifications are strongly encouraged to value 1448 simplicity and transparency of the specification over its elegance. 1449 Keep it as simple as possible while still expressing the needed data 1450 model. 1452 A related observation about formal description techniques in general 1453 that is strongly recommended to be kept in mind by writers of CDDL 1454 specifications: Just because CDDL makes it easier to handle 1455 complexity in a specification, that does not make that complexity 1456 somehow less bad (except maybe on the level of the humans having to 1457 grasp the complex structure while reading the spec). 1459 6. IANA considerations 1461 6.1. CDDL control operator registry 1463 IANA is requested ... 1465 (TBD: define a registry of control operators. Policy to be defined, 1466 definitely at least specification required. Designated expert should 1467 be instructed to require a workable specification that enables 1468 interoperability of implementations of CDDL specifications making use 1469 of the control operator. Define initial table from the present 1470 document.) 1472 7. References 1474 7.1. Normative References 1476 [ISO6093] ISO, "Information processing -- Representation of 1477 numerical values in character strings for information 1478 interchange", ISO 6093, 1985. 1480 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1481 Requirement Levels", BCP 14, RFC 2119, 1482 DOI 10.17487/RFC2119, March 1997, 1483 . 
1485 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 1486 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 1487 2003, . 1489 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 1490 Specifications: ABNF", STD 68, RFC 5234, 1491 DOI 10.17487/RFC5234, January 2008, 1492 . 1494 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 1495 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 1496 October 2013, . 1498 [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, 1499 DOI 10.17487/RFC7493, March 2015, 1500 . 1502 [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 1503 Interchange Format", STD 90, RFC 8259, 1504 DOI 10.17487/RFC8259, December 2017, 1505 . 1507 [W3C.REC-xmlschema-2-20041028] 1508 Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes 1509 Second Edition", World Wide Web Consortium Recommendation 1510 REC-xmlschema-2-20041028, October 2004, 1511 . 1513 7.2. Informative References 1515 [I-D.bormann-cbor-cddl-freezer] 1516 Bormann, C., "A feature freezer for the Concise Data 1517 Definition Language (CDDL)", draft-bormann-cbor-cddl- 1518 freezer-00 (work in progress), January 2018. 1520 [I-D.ietf-anima-grasp] 1521 Bormann, C., Carpenter, B., and B. Liu, "A Generic 1522 Autonomic Signaling Protocol (GRASP)", draft-ietf-anima- 1523 grasp-15 (work in progress), July 2017. 1525 [I-D.ietf-core-senml] 1526 Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C. 1527 Bormann, "Sensor Measurement Lists (SenML)", draft-ietf- 1528 core-senml-16 (work in progress), May 2018. 1530 [I-D.newton-json-content-rules] 1531 Newton, A. and P. Cordell, "A Language for Rules 1532 Describing JSON Content", draft-newton-json-content- 1533 rules-09 (work in progress), September 2017. 1535 [RELAXNG] ISO/IEC, "Information technology -- Document Schema 1536 Definition Language (DSDL) -- Part 2: Regular-grammar- 1537 based validation -- RELAX NG", ISO/IEC 19757-2, December 1538 2008. 
1540 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 1541 Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, 1542 . 1544 [RFC7071] Borenstein, N. and M. Kucherawy, "A Media Type for 1545 Reputation Interchange", RFC 7071, DOI 10.17487/RFC7071, 1546 November 2013, . 1548 [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", 1549 RFC 7950, DOI 10.17487/RFC7950, August 2016, 1550 . 1552 [RFC8007] Murray, R. and B. Niven-Jenkins, "Content Delivery Network 1553 Interconnection (CDNI) Control Interface / Triggers", 1554 RFC 8007, DOI 10.17487/RFC8007, December 2016, 1555 . 1557 [RFC8152] Schaad, J., "CBOR Object Signing and Encryption (COSE)", 1558 RFC 8152, DOI 10.17487/RFC8152, July 2017, 1559 . 1561 7.3. URIs 1563 [1] https://github.com/cabo/cbor-diag 1565 Appendix A. (Not used.) 1567 Appendix B. ABNF grammar 1569 The following is a formal definition of the CDDL syntax in Augmented 1570 Backus-Naur Form (ABNF, [RFC5234]). [_abnftodo] 1572 cddl = S 1*rule 1573 rule = typename [genericparm] S assign S type S 1574 / groupname [genericparm] S assign S grpent S 1576 typename = id 1577 groupname = id 1579 assign = "=" / "/=" / "//=" 1581 genericparm = "<" S id S *("," S id S ) ">" 1582 genericarg = "<" S type1 S *("," S type1 S ) ">" 1584 type = type1 S *("/" S type1 S) 1586 type1 = type2 [S (rangeop / ctlop) S type2] 1587 type2 = value 1588 / typename [genericarg] 1589 / "(" type ")" 1590 / "~" S groupname [genericarg] 1591 / "#" "6" ["." uint] "(" S type S ")" ; note no space! 1592 / "#" DIGIT ["." uint] ; major/ai 1593 / "#" ; any 1594 / "{" S group S "}" 1595 / "[" S group S "]" 1596 / "&" S "(" S group S ")" 1597 / "&" S groupname [genericarg] 1599 rangeop = "..." / ".." 1601 ctlop = "." 
id 1603 group = grpchoice S *("//" S grpchoice S) 1605 grpchoice = *grpent 1607 grpent = [occur S] [memberkey S] type optcom 1608 / [occur S] groupname [genericarg] optcom ; preempted by above 1609 / [occur S] "(" S group S ")" optcom 1611 memberkey = type1 S ["^" S] "=>" 1612 / bareword S ":" 1613 / value S ":" 1615 bareword = id 1617 optcom = S ["," S] 1619 occur = [uint] "*" [uint] 1620 / "+" 1621 / "?" 1623 uint = ["0x" / "0b"] "0" 1624 / DIGIT1 *DIGIT 1625 / "0x" 1*HEXDIG 1626 / "0b" 1*BINDIG 1628 value = number 1629 / text 1630 / bytes 1632 int = ["-"] uint 1634 ; This is a float if it has fraction or exponent; int otherwise 1635 number = hexfloat / (int ["." fraction] ["e" exponent ]) 1636 hexfloat = "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent 1637 fraction = 1*DIGIT 1638 exponent = ["+"/"-"] 1*DIGIT 1640 text = %x22 *SCHAR %x22 1641 SCHAR = %x20-21 / %x23-5B / %x5D-10FFFD / SESC 1642 SESC = "\" %x20-10FFFD 1644 bytes = [bsqual] %x27 *BCHAR %x27 1645 BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF 1646 bsqual = %x68 ; "h" 1647 / %x62.36.34 ; "b64" 1649 id = EALPHA *(*("-" / ".") (EALPHA / DIGIT)) 1650 ALPHA = %x41-5A / %x61-7A 1651 EALPHA = %x41-5A / %x61-7A / "@" / "_" / "$" 1652 DIGIT = %x30-39 1653 DIGIT1 = %x31-39 1654 HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F" 1655 BINDIG = %x30-31 1657 S = *WS 1658 WS = SP / NL 1659 SP = %x20 1660 NL = COMMENT / CRLF 1661 COMMENT = ";" *PCHAR CRLF 1662 PCHAR = %x20-10FFFD 1663 CRLF = %x0A / %x0D.0A 1665 Figure 12: CDDL ABNF 1667 Appendix C. Matching rules 1669 In this appendix, we go through the ABNF syntax rules defined in 1670 Appendix B and briefly describe the matching semantics of each 1671 syntactic feature. In this context, an instance (data item) 1672 "matches" a CDDL specification if it is allowed by the CDDL 1673 specification; this is then broken down to parts of specifications 1674 (type and group expressions) and parts of instances (data items). 
1676 cddl = S 1*rule 1678 A CDDL specification is a sequence of one or more rules. Each rule 1679 gives a name to a right hand side expression, either a CDDL type or a 1680 CDDL group. Rule names can be used in the rule itself and/or other 1681 rules (and tools can output warnings if that is not the case). The 1682 order of the rules is significant only in two cases, including the 1683 following: The first rule defines the semantics of the entire 1684 specification; hence, its name may be descriptive only (or may be 1685 used in itself or other rules as with the other rule names). 1687 rule = typename [genericparm] S assign S type S 1688 / groupname [genericparm] S assign S grpent S 1690 typename = id 1691 groupname = id 1693 A rule defines a name for a type expression (production "type") or 1694 for a group expression (production "grpent"), with the intention that 1695 the semantics does not change when the name is replaced by its 1696 (parenthesized if needed) definition. 1698 assign = "=" / "/=" / "//=" 1700 A plain equals sign defines the rule name as the equivalent of the 1701 expression to the right. A "/=" or "//=" extends a named type or a 1702 group by additional choices; a number of these could be replaced by 1703 collecting all the right hand sides and creating a single rule with a 1704 type choice or a group choice built from the right hand sides in the 1705 order of the rules given. (It is not an error to extend a rule name 1706 that has not yet been defined; this makes the right hand side the 1707 first entry in the choice being created.) The creation of the type 1708 choices and group choices from the right hand sides of rules is the 1709 other case where rule order can be significant. 
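The folding of several "/=" or "//=" rules into a single choice, in rule order, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the behavior of any particular tool: rules are represented as hypothetical (name, assign, right-hand-side) triples with the right-hand side kept as an uninterpreted string, and rejecting a second plain "=" for the same name is a choice made by this sketch only.

```python
def collect_choices(rules):
    """Fold (name, assign, rhs) triples into name -> [rhs, ...], keeping rule order."""
    choices = {}
    for name, assign, rhs in rules:
        if assign == "=":
            if name in choices:
                # Assumption of this sketch: a repeated plain definition is rejected.
                raise ValueError("duplicate definition of " + name)
            choices[name] = [rhs]
        elif assign in ("/=", "//="):
            # Extending a name that is not yet defined starts a new choice.
            choices.setdefault(name, []).append(rhs)
        else:
            raise ValueError("unknown assignment operator " + assign)
    return choices


def as_single_rule(name, alternatives, group=False):
    """Render the collected right hand sides as one type choice or group choice."""
    separator = " // " if group else " / "
    return name + " = " + separator.join(alternatives)
```

Because the alternatives are appended in the order in which the rules appear, reordering the rules changes the order of the resulting choice, which is the second case in which rule order can matter.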
1711       genericparm = "<" S id S *("," S id S ) ">"
1712       genericarg = "<" S type1 S *("," S type1 S ) ">"

1714     Rule names can have generic parameters, which cause temporary
1715     assignments within the right hand sides to the parameter names from
1716     the arguments given when citing the rule name.

1718       type = type1 S *("/" S type1 S)

1720     A type can be given as a choice between one or more types. The
1721     choice matches a data item if the data item matches any one of the
1722     types given in the choice. The choice uses Parsing Expression Grammar
1723     (PEG) semantics: The first choice that matches wins. (As a result,
1724     the order of rules that contribute to a single rule name can very
1725     well matter.)

1727       type1 = type2 [S (rangeop / ctlop) S type2]

1729     Two types can be combined with a range operator (see below) or
1730     a control operator (see Section 3.8).

1732       type2 = value

1734     A type can be just a single value (such as 1 or "icecream" or
1735     h'0815'), which matches only a data item with that specific value (no
1736     conversions defined),

1738             / typename [genericarg]

1740     or be defined by a rule giving a meaning to a name (possibly after
1741     supplying generic args as required by the generic parameters),

1743             / "(" type ")"

1745     or be defined in a parenthesized type expression (parentheses may be
1746     necessary to override some operator precedence), or

1748             / "~" S groupname [genericarg]

1750     an "unwrapped" group (see Section 3.7), which matches the group
1751     inside a type defined as a map or an array by wrapping the group, or

1753             / "#" "6" ["." uint] "(" S type S ")" ; note no space!

1755     a tagged data item, tagged with the "uint" given and containing the
1756     type given as the tagged value, or

1758             / "#" DIGIT ["." uint] ; major/ai

1760     a data item of a major type (given by the DIGIT), optionally
1761     constrained to the additional information given by the uint, or

1763             / "#" ; any

1765     any data item, or

1767             / "{" S group S "}"

1769     a map expression, which matches a valid CBOR map the key/value pairs
1770     of which can be ordered in such a way that the resulting sequence
1771     matches the group expression, or

1773             / "[" S group S "]"

1775     an array expression, which matches a CBOR array the elements of
1776     which, when taken as values and complemented by a wildcard (matches
1777     anything) key each, match the group, or
1778             / "&" S "(" S group S ")"
1779             / "&" S groupname [genericarg]

1781     an enumeration expression, which matches any value that is within
1782     the set of values that the values of the group given can take.

1784       rangeop = "..." / ".."

1786     A range operator can be used to join two type expressions that stand
1787     for either two integer values or two floating point values; it
1788     matches any value that is between the two values, where the first
1789     value is always included in the matching set and the second value is
1790     included for ".." and excluded for "...".

1792       ctlop = "." id

1794     A control operator ties a _target_ type to a _controller_ type as
1795     defined in Section 3.8. Note that control operators are an extension
1796     point for CDDL; additional documents may want to define additional
1797     control operators.

1799       group = grpchoice S *("//" S grpchoice S)

1801     A group matches any sequence of key/value pairs that matches any of
1802     the choices given (again using Parsing Expression Grammar semantics).

1804       grpchoice = *grpent

1806     Each of the component groups is given as a sequence of group entries.
1807     For a match, the sequence of key/value pairs given needs to match the
1808     sequence of group entries in the sequence given.
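
Returning to the range operator described above, the difference between ".." and "..." lies only in whether the second value is part of the matching set. A minimal Python sketch follows; the function name in_range is hypothetical, and the sketch covers plain numeric comparison only.

```python
def in_range(value, low, high, rangeop):
    """Return True if value lies in the range low <rangeop> high.

    The first value is always included; the second value is included
    for ".." and excluded for "...".
    """
    if rangeop == "..":
        return low <= value <= high
    if rangeop == "...":
        return low <= value < high
    raise ValueError(f"not a range operator: {rangeop!r}")
```

With these definitions, 1280 matches 0..1280 but does not match 0...1280, while 0 matches both.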
1810       grpent = [occur S] [memberkey S] type optcom

1812     A group entry can be given by a value type, which needs to be matched
1813     by the value part of a single element, and optionally a memberkey
1814     type, which needs to be matched by the key part of the element, if
1815     the memberkey is given. If the memberkey is not given, the entry can
1816     only be used for matching arrays, not for maps. (See below for how
1817     that is modified by the occurrence indicator.)

1819              / [occur S] groupname [genericarg] optcom ; preempted by above

1821     A group entry can be built from a named group, or

1823              / [occur S] "(" S group S ")" optcom

1825     from a parenthesized group, again with a possible occurrence
1826     indicator.

1828       memberkey = type1 S ["^" S] "=>"
1829                 / bareword S ":"
1830                 / value S ":"

1832     Key types can be given by a type expression, a bareword (which stands
1833     for a type that just contains a string value created from this
1834     bareword), or a value (which stands for a type that just contains
1835     this value). A key value matches its key type if the key value is a
1836     member of the key type, unless a cut preceding it in the group
1837     applies (see Section 3.5.3 for how map matching is influenced by the
1838     presence of the cuts denoted by "^" or ":" in previous entries).

1840       bareword = id

1842     A bareword is an alternative way to write a type with a single text
1843     string value; it can only be used in the syntactic context given
1844     above.

1846       optcom = S ["," S]

1848     (Optional commas do not influence the matching.)

1850       occur = [uint] "*" [uint]
1851             / "+"
1852             / "?"

1854     An occurrence indicator modifies the group given to its right by
1855     requiring the group to match the sequence to be matched exactly for a
1856     certain number of times (see Section 3.2) in sequence, i.e. it acts
1857     as a (possibly infinite) group choice that contains choices with the
1858     group repeated each of the occurrences times.
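
The repetition counts implied by an occurrence indicator can be illustrated with a short Python sketch. It assumes the defaults of Section 3.2 (a missing lower limit is 0, a missing upper limit means no limit) and, as a simplification made for this sketch only, accepts decimal limits but not the "0x" and "0b" forms that the "uint" production also permits. The names OCCUR_RE and occurrence_bounds are hypothetical.

```python
import math
import re

# [uint] "*" [uint], restricted to decimal digits in this sketch.
OCCUR_RE = re.compile(r"(\d*)\*(\d*)")


def occurrence_bounds(indicator):
    """Map an occurrence indicator to (minimum, maximum) repetitions."""
    if indicator == "?":
        return (0, 1)
    if indicator == "+":
        return (1, math.inf)
    match = OCCUR_RE.fullmatch(indicator)
    if match is None:
        raise ValueError(f"not an occurrence indicator: {indicator!r}")
    low = int(match.group(1)) if match.group(1) else 0
    high = int(match.group(2)) if match.group(2) else math.inf
    return (low, high)
```

For instance, "2*2" yields exactly two repetitions, and a bare "*" yields zero or more.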
1860     The rest of the ABNF describes syntax for value notation that should
1861     be familiar from programming languages, with the possible exception
1862     of h'..' and b64'..' for byte strings, as well as syntactic elements
1863     such as comments and line ends.

1865  Appendix D. (Not used.)

1867  Appendix E. Standard Prelude

1869     The following prelude is automatically added to each CDDL file
1870     [tdate]. (Note that technically, it is a postlude, as it does not
1871     disturb the selection of the first rule as the root of the
1872     definition.)
1873       any = #

1875       uint = #0
1876       nint = #1
1877       int = uint / nint

1879       bstr = #2
1880       bytes = bstr
1881       tstr = #3
1882       text = tstr

1884       tdate = #6.0(tstr)
1885       time = #6.1(number)
1886       number = int / float
1887       biguint = #6.2(bstr)
1888       bignint = #6.3(bstr)
1889       bigint = biguint / bignint
1890       integer = int / bigint
1891       unsigned = uint / biguint
1892       decfrac = #6.4([e10: int, m: integer])
1893       bigfloat = #6.5([e2: int, m: integer])
1894       eb64url = #6.21(any)
1895       eb64legacy = #6.22(any)
1896       eb16 = #6.23(any)
1897       encoded-cbor = #6.24(bstr)
1898       uri = #6.32(tstr)
1899       b64url = #6.33(tstr)
1900       b64legacy = #6.34(tstr)
1901       regexp = #6.35(tstr)
1902       mime-message = #6.36(tstr)
1903       cbor-any = #6.55799(any)

1905       float16 = #7.25
1906       float32 = #7.26
1907       float64 = #7.27
1908       float16-32 = float16 / float32
1909       float32-64 = float32 / float64
1910       float = float16-32 / float64

1912       false = #7.20
1913       true = #7.21
1914       bool = false / true
1915       nil = #7.22
1916       null = nil
1917       undefined = #7.23

1919                         Figure 13: CDDL Prelude

1921     Note that the prelude is deemed to be fixed. This means, for
1922     instance, that additional tags beyond [RFC7049], as registered, need
1923     to be defined in each CDDL file that is using them.

1925     A common stumbling point is that the prelude does not define a type
1926     "string".
         CBOR has byte strings ("bytes" in the prelude) and text
1927     strings ("text"), so a type that is simply called "string" would be
1928     ambiguous.

1930  E.1. Use with JSON

1932     The JSON generic data model (implicit in [RFC8259]) is a subset of
1933     the generic data model of CBOR. So one can use CDDL with JSON by
1934     limiting oneself to what can be represented in JSON. Roughly
1935     speaking, this means leaving out byte strings, tags, and simple
1936     values other than "false", "true", and "null", leading to the
1937     following limited prelude:

1939       any = #

1941       uint = #0
1942       nint = #1
1943       int = uint / nint

1945       tstr = #3
1946       text = tstr

1948       number = int / float

1950       float16 = #7.25
1951       float32 = #7.26
1952       float64 = #7.27
1953       float16-32 = float16 / float32
1954       float32-64 = float32 / float64
1955       float = float16-32 / float64

1957       false = #7.20
1958       true = #7.21
1959       bool = false / true
1960       nil = #7.22
1961       null = nil

1963             Figure 14: JSON compatible subset of CDDL Prelude

1965     (The major types given here do not have a direct meaning in JSON, but
1966     they can be interpreted as CBOR major types translated through
1967     Section 4 of [RFC7049].)
1968     There are a few fine points in using CDDL with JSON. First, JSON
1969     does not distinguish between integers and floating point numbers;
1970     there is only one kind of number (which may happen to be integral).
1971     In this context, specifying a type as "uint", "nint" or "int" then
1972     becomes a predicate that the number be integral. As an example, this
1973     means that the following JSON numbers are all matching "uint":

1975       10 10.0 1e1 1.0e1 100e-1

1977     (The fact that these are all integers may be surprising to users
1978     accustomed to the long tradition in programming languages of using
1979     decimal points or exponents in a number to indicate a floating point
1980     literal.)

1982     CDDL distinguishes the various CBOR number types, but there is only
1983     one number type in JSON.
         The effect of specifying a floating point
1984     precision (float16/float32/float64) is only to restrict the set of
1985     permissible values to those expressible with binary16/binary32/
1986     binary64; this is unlikely to be very useful when using CDDL for
1987     specifying JSON data structures.

1989     Fundamentally, the number system of JSON itself is based on decimal
1990     numbers and decimal fractions and does not have limits to its
1991     precision or range. In practice, JSON numbers are often parsed into
1992     a number type that is called float64 here, creating a number of
1993     limitations to the generic data model [RFC7493]. In particular, this
1994     means that integers can only be expressed with interoperable
1995     exactness when they lie in the range [-(2**53)+1, (2**53)-1] -- a
1996     smaller range than that covered by CDDL "int".

1998     JSON applications that want to stay compatible with I-JSON therefore
1999     may want to define integer types with more limited ranges, such as in
2000     Figure 15. Note that the types given here are not part of the
2001     prelude; they need to be copied into the CDDL specification if
2002     needed.
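
The integrality predicate and the I-JSON limit discussed above can be made concrete with a small Python sketch. It treats the text of a JSON number as an exact decimal, so that "10", "10.0" and "1e1" are all recognized as the same integer. The function names are hypothetical, and the upper bound 2**64-1 for "uint" is taken from the range of CBOR major type 0; neither is defined by this appendix.

```python
from decimal import Decimal

MAX_UINT = 2**64 - 1   # largest value of CBOR major type 0
IJ_MAX = 2**53 - 1     # 9007199254740991, the I-JSON limit quoted above


def _is_integral(number):
    return number == number.to_integral_value()


def json_number_matches_uint(text):
    """True if the JSON number text is integral and within uint range."""
    number = Decimal(text)
    return _is_integral(number) and 0 <= number <= MAX_UINT


def json_number_matches_ij_uint(text):
    """True if the JSON number text is integral and within [0, 2**53-1]."""
    number = Decimal(text)
    return _is_integral(number) and 0 <= number <= IJ_MAX
```

Under these assumptions, all five spellings in the example above (10, 10.0, 1e1, 1.0e1, 100e-1) satisfy the "uint" predicate, whereas 10.5 and -1 do not, and 9007199254740992 falls just outside the I-JSON range.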
2004       ij-uint = 0..9007199254740991
2005       ij-nint = -9007199254740991..-1
2006       ij-int = -9007199254740991..9007199254740991

2008          Figure 15: I-JSON types for CDDL (not part of prelude)

2010     JSON applications that do not need to stay compatible with I-JSON and
2011     that actually may need to go beyond the 64-bit unsigned and negative
2012     integers supported by "int" (= "uint"/"nint") may want to use the
2013     following additional types from the standard prelude, which are
2014     expressed in terms of tags but can straightforwardly be mapped into
2015     JSON (but not I-JSON) numbers:

2017       biguint = #6.2(bstr)
2018       bignint = #6.3(bstr)
2019       bigint = biguint / bignint
2020       integer = int / bigint
2021       unsigned = uint / biguint

2023     CDDL at this point does not have a way to express the unlimited
2024     floating point precision that is theoretically possible with JSON; at
2025     the time of writing, this is rarely used in protocols in practice.

2027     Note that a data model described in CDDL is always restricted by what
2028     can be expressed in the serialization; e.g., floating point values
2029     such as NaN (not a number) and the infinities cannot be represented
2030     in JSON even if they are allowed in the CDDL generic data model.

2032  Appendix F. The CDDL tool

2034     A rough CDDL tool is available.
         For CDDL specifications, it can
2035     check the syntax, generate one or more instances (expressed in CBOR
2036     diagnostic notation or in pretty-printed JSON), and validate an
2037     existing instance against the specification:

2039     Usage:
2040       cddl spec.cddl generate [n]
2041       cddl spec.cddl json-generate [n]
2042       cddl spec.cddl validate instance.cbor
2043       cddl spec.cddl validate instance.json

2045                        Figure 16: CDDL tool usage

2047     Install on a system with a modern Ruby via:

2049       gem install cddl

2051                     Figure 17: CDDL tool installation

2053     The accompanying CBOR diagnostic tools (which are automatically
2054     installed by the above) are described in https://github.com/cabo/
2055     cbor-diag [1]; they can be used to convert between binary CBOR, a
2056     pretty-printed form of that, CBOR diagnostic notation, JSON, and
2057     YAML.

2059  Appendix G. Extended Diagnostic Notation

2061     Section 6 of [RFC7049] defines a "diagnostic notation" in order to be
2062     able to converse about CBOR data items without having to resort to
2063     binary data. Diagnostic notation is based on JSON, with extensions
2064     for representing CBOR constructs such as binary data and tags.

2066     (Standardizing this together with the actual interchange format does
2067     not serve to create another interchange format, but enables the use
2068     of a shared diagnostic notation in tools for and documents about
2069     CBOR.)

2071     This section discusses a few extensions to the diagnostic notation
2072     that have turned out to be useful since RFC 7049 was written. We
2073     refer to the result as extended diagnostic notation (EDN).

2075  G.1. White space in byte string notation

2077     Examples often benefit from some white space (spaces, line breaks) in
2078     byte strings. In extended diagnostic notation, white space is
2079     ignored in prefixed byte strings; for instance, the following are
2080     equivalent:

2082       h'48656c6c6f20776f726c64'
2083       h'48 65 6c 6c 6f 20 77 6f 72 6c 64'
2084       h'4 86 56c 6c6f
2085         20776 f726c64'

2087  G.2. Text in byte string notation

2089     Diagnostic notation notates byte strings in one of the [RFC4648] base
2090     encodings, enclosed in single quotes, prefixed by >h< for base16,
2091     >b32< for base32, >h32< for base32hex, >b64< for base64 or base64url.
2092     Quite often, byte strings carry bytes that are meaningfully
2093     interpreted as UTF-8 text. Extended Diagnostic Notation allows the
2094     use of single quotes without a prefix to express byte strings with
2095     UTF-8 text; for instance, the following are equivalent:

2097       'hello world'
2098       h'68656c6c6f20776f726c64'

2100     The escaping rules of JSON strings are applied equivalently for text-
2101     based byte strings, e.g., \\ stands for a single backslash and \'
2102     stands for a single quote. White space is included literally, i.e.,
2103     the previous section does not apply to text-based byte strings.

2105  G.3. Embedded CBOR and CBOR sequences in byte strings

2107     Where a byte string is to carry an embedded CBOR-encoded item, or
2108     more generally a sequence of zero or more such items, the diagnostic
2109     notation for these zero or more CBOR data items, separated by
2110     commata, can be enclosed in << and >> to notate the byte string
2111     resulting from encoding the data items and concatenating the result.
2112     For instance, each pair of columns in the following are equivalent:

2114       <<1>>              h'01'
2115       <<1, 2>>           h'0102'
2116       <<"foo", null>>    h'63666F6FF6'
2117       <<>>               h''

2119  G.4. Concatenated Strings

2121     While the ability to include white space enables line-breaking of
2122     encoded byte strings, a mechanism is needed to be able to include
2123     text strings as well as byte strings in direct UTF-8 representation
2124     into line-based documents (such as RFCs and source code).

2126     We extend the diagnostic notation by allowing multiple text strings
2127     or multiple byte strings to be notated separated by white space;
2128     these are then concatenated into a single text or byte string,
2129     respectively.
         Text strings and byte strings do not mix within such a
2130     concatenation, except that byte string notation can be used inside a
2131     sequence of concatenated text string notation to encode characters
2132     that may be better represented in an encoded way. The following four
2133     values are equivalent:

2135       "Hello world"
2136       "Hello " "world"
2137       "Hello" h'20' "world"
2138       "" h'48656c6c6f20776f726c64' ""

2140     Similarly, the following byte string values are equivalent:

2142       'Hello world'
2143       'Hello ' 'world'
2144       'Hello ' h'776f726c64'
2145       'Hello' h'20' 'world'
2146       '' h'48656c6c6f20776f726c64' '' b64''
2147       h'4 86 56c 6c6f' h' 20776 f726c64'

2149     (Note that the approach of separating by whitespace, while familiar
2150     from the C language, requires some attention - a single comma makes a
2151     big difference here.)

2153  G.5. Hexadecimal, octal, and binary numbers

2155     In addition to JSON's decimal numbers, EDN provides hexadecimal,
2156     octal and binary numbers in the usual C-language notation (octal with
2157     0o prefix present only).

2159     The following are equivalent:

2161       4711
2162       0x1267
2163       0o11147
2164       0b1001001100111

2166     As are:

2168       1.5
2169       0x1.8p0
2170       0x18p-4

2172  G.6. Comments

2174     Longer pieces of diagnostic notation may benefit from comments. JSON
2175     famously does not provide for comments, and basic RFC 7049 diagnostic
2176     notation inherits this property.

2178     In extended diagnostic notation, comments can be included, delimited
2179     by slashes ("/"). Any text within and including a pair of slashes is
2180     considered a comment.

2182     Comments are considered white space.
         Hence, they are allowed in
2183     prefixed byte strings; for instance, the following are equivalent:

2185       h'68656c6c6f20776f726c64'
2186       h'68 65 6c /doubled l!/ 6c 6f /hello/
2187         20 /space/
2188         77 6f 72 6c 64' /world/

2190     This can be used to annotate a CBOR structure as in:

2192       /grasp-message/ [/M_DISCOVERY/ 1, /session-id/ 10584416,
2193                        /objective/ [/objective-name/ "opsonize",
2194                                     /D, N, S/ 7, /loop-count/ 105]]

2196     (There are currently no end-of-line comments. If we want to add
2197     them, "//" sounds like a reasonable delimiter given that we already
2198     use slashes for comments, but we also could go e.g. for "#".)

2200  Appendix H. Examples

2202     This section contains various examples of structures defined using
2203     CDDL.

2205     The theme for the first example is taken from [RFC7071], which
2206     defines certain JSON structures in English. For a similar example,
2207     it may also be of interest to examine Appendix A of [RFC8007], which
2208     contains a CDDL definition for a JSON structure defined in the main
2209     body of the RFC.

2211     The second subsection in this appendix translates examples from
2212     [I-D.newton-json-content-rules] into CDDL.

2214     These examples all happen to describe data that is interchanged in
2215     JSON. Examples for CDDL definitions of data that is interchanged in
2216     CBOR can be found in [RFC8152], [I-D.ietf-anima-grasp], or
2217     [I-D.ietf-core-senml].

2219  H.1. RFC 7071

2221     [RFC7071] defines the Reputon structure for JSON using somewhat
2222     formalized English text. Here is a (somewhat verbose) equivalent
2223     definition using the same terms, but notated in CDDL:

2225       reputation-object = {
2226         reputation-context,
2227         reputon-list
2228       }

2230       reputation-context = (
2231         application: text
2232       )

2234       reputon-list = (
2235         reputons: reputon-array
2236       )

2238       reputon-array = [* reputon]

2240       reputon = {
2241         rater-value,
2242         assertion-value,
2243         rated-value,
2244         rating-value,
2245         ? conf-value,
2246         ? normal-value,
2247         ? sample-value,
2248         ? gen-value,
2249         ? expire-value,
2250         * ext-value,
2251       }

2253       rater-value = ( rater: text )
2254       assertion-value = ( assertion: text )
2255       rated-value = ( rated: text )
2256       rating-value = ( rating: float16 )
2257       conf-value = ( confidence: float16 )
2258       normal-value = ( normal-rating: float16 )
2259       sample-value = ( sample-size: uint )
2260       gen-value = ( generated: uint )
2261       expire-value = ( expires: uint )
2262       ext-value = ( text => any )

2264     An equivalent, more compact form of this example would be:

2266       reputation-object = {
2267         application: text
2268         reputons: [* reputon]
2269       }

2271       reputon = {
2272         rater: text
2273         assertion: text
2274         rated: text
2275         rating: float16
2276         ? confidence: float16
2277         ? normal-rating: float16
2278         ? sample-size: uint
2279         ? generated: uint
2280         ? expires: uint
2281         * text => any
2282       }

2284     Note how this rather clearly delineates the structure somewhat
2285     shrouded by so many words in Section 6.2.2 of [RFC7071]. Also, this
2286     definition makes it clear that several ext-values are allowed (by
2287     definition with different member names); RFC 7071 could be read to
2288     forbid the repetition of ext-value ("A specific reputon-element MUST
2289     NOT appear more than once" is ambiguous.)
2291     The CDDL tool (which hasn't quite been trained for polite
2292     conversation) says:

2294       {
2295         "application": "tridentiferous",
2296         "reputons": [
2297           {
2298             "rater": "loamily",
2299             "assertion": "Dasyprocta",
2300             "rated": "uncommensurableness",
2301             "rating": 0.05055809746548934,
2302             "confidence": 0.7484706448605812,
2303             "normal-rating": 0.8677887734049299,
2304             "sample-size": 4059,
2305             "expires": 3969,
2306             "bearer": "nitty",
2307             "faucal": "postulnar",
2308             "naturalism": "sarcotic"
2309           },
2310           {
2311             "rater": "precreed",
2312             "assertion": "xanthosis",
2313             "rated": "balsamy",
2314             "rating": 0.36091333590593955,
2315             "confidence": 0.3700759808403371,
2316             "sample-size": 3904
2317           },
2318           {
2319             "rater": "urinosexual",
2320             "assertion": "malacostracous",
2321             "rated": "arenariae",
2322             "rating": 0.9210673488013762,
2323             "normal-rating": 0.4778762617112776,
2324             "sample-size": 4428,
2325             "generated": 3294,
2326             "backfurrow": "enterable",
2327             "fruitgrower": "flannelflower"
2328           },
2329           {
2330             "rater": "pedologistically",
2331             "assertion": "unmetaphysical",
2332             "rated": "elocutionist",
2333             "rating": 0.42073613384304287,
2334             "misimagine": "retinaculum",
2335             "snobbish": "contradict",
2336             "Bosporanic": "periostotomy",
2337             "dayworker": "intragyral"
2338           }
2339         ]
2340       }

2342  H.1.1. Examples from JSON Content Rules

2344     Although JSON Content Rules [I-D.newton-json-content-rules] seems to
2345     address a more general problem than CDDL, it is still a worthwhile
2346     resource to explore for examples (beyond all the inspiration the
2347     format itself has had for CDDL).
2349     Figure 2 of the JCR I-D looks very similar, if slightly less noisy,
2350     in CDDL:

2352       root = [2*2 {
2353         precision: text,
2354         Latitude: float,
2355         Longitude: float,
2356         Address: text,
2357         City: text,
2358         State: text,
2359         Zip: text,
2360         Country: text
2361       }]

2363                     Figure 18: JCR, Figure 2, in CDDL

2365     Apart from the lack of a need to quote the member names, text strings
2366     are called "text" or "tstr" in CDDL ("string" would be ambiguous as
2367     CBOR also provides byte strings).

2369     The CDDL tool creates the below example instance for this:

2371       [{"precision": "pyrosphere", "Latitude": 0.5399712314350172,
2372         "Longitude": 0.5157523963028087, "Address": "resow",
2373         "City": "problemwise", "State": "martyrlike", "Zip": "preprove",
2374         "Country": "Pace"},
2375        {"precision": "unrigging", "Latitude": 0.10422704368372193,
2376         "Longitude": 0.6279808663725834, "Address": "picturedom",
2377         "City": "decipherability", "State": "autometry", "Zip": "pout",
2378         "Country": "wimple"}]

2380     Figure 4 of the JCR I-D in CDDL:

2382       root = { image }

2384       image = (
2385         Image: {
2386           size,
2387           Title: text,
2388           thumbnail,
2389           IDs: [* int]
2390         }
2391       )

2393       size = (
2394         Width: 0..1280
2395         Height: 0..1024
2396       )

2398       thumbnail = (
2399         Thumbnail: {
2400           size,
2401           Url: ~uri
2402         }
2403       )

2405     This shows how the group concept can be used to keep related elements
2406     (here: width, height) together, and to emulate the JCR style of
2407     specification. (It also shows referencing a type by unwrapping a tag
2408     from the prelude, "uri" - this could be done differently.)
         The more
2409     compact form of Figure 5 of the JCR I-D could be emulated like this:

2411       root = {
2412         Image: {
2413           size, Title: text,
2414           Thumbnail: { size, Url: ~uri },
2415           IDs: [* int]
2416         }
2417       }

2419       size = (
2420         Width: 0..1280,
2421         Height: 0..1024,
2422       )

2424     The CDDL tool creates the below example instance for this:

2426       {"Image": {"Width": 566, "Height": 516, "Title": "leisterer",
2427         "Thumbnail": {"Width": 1111, "Height": 176, "Url": 32("scrog")},
2428         "IDs": []}}

2430  Acknowledgements

2432     CDDL was originally conceived by Bert Greevenbosch, who also wrote
2433     the original five versions of this document.

2435     Inspiration was taken from the C and Pascal languages, MPEG's
2436     conventions for describing structures in the ISO base media file
2437     format, Relax-NG and its compact syntax [RELAXNG], and in particular
2438     from Andrew Lee Newton's "JSON Content Rules"
2439     [I-D.newton-json-content-rules].

2441     Useful feedback came from members of the IETF CBOR WG, in particular
2442     Joe Hildebrand, Sean Leonard and Jim Schaad. Also, Francesca
2443     Palombini and Joe volunteered to chair this WG, providing the
2444     framework for generating and processing this feedback.

2446     The CDDL tool was written by Carsten Bormann, building on previous
2447     work by Troy Heninger and Tom Lord.

2449  Editorial Comments

2451  [_format] So far, the ability to restrict format choices has not been
2452            needed beyond the floating point formats. Those can be
2453            applied to ranges using the new .and control now. It is not
2454            clear we want to add more format control before we have a use
2455            case.

2457  [_range] TO DO: define this precisely. This clearly includes integers
2458           and floats. Strings - as in "a".."z" - could be added if
2459           desired, but this would require adopting a definition of string
2460           ordering and possibly a successor function so "a".."z" does not
2461           include "bb".
2463  [_strings] TO DO: This still needs to be fully realized in the ABNF and
2464             in the CDDL tool.

2466  [_bitsendian] How useful would it be to have another variant that counts
2467                bits like in RFC box notation? (Or at least per-byte?
2468                32-bit words don't always perfectly mesh with byte
2469                strings.)

2471  [unflex] A comment has been that this is counter-intuitive. One
2472           solution would be to simply disallow unparenthesized usage of
2473           occurrence indicators in front of type choices unless a member
2474           key is also present like in group2 above.

2476  [_abnftodo] Potential improvements: the prefixed byte strings are more
2477              liberally specified than they actually are.

2479  [tdate] The prelude as included here does not yet have a .regexp control
2480          on tdate, but we probably do want to have one.

2482  Authors' Addresses

2484     Henk Birkholz
2485     Fraunhofer SIT
2486     Rheinstrasse 75
2487     Darmstadt 64295
2488     Germany

2490     Email: henk.birkholz@sit.fraunhofer.de

2492     Christoph Vigano
2493     Universitaet Bremen

2495     Email: christoph.vigano@uni-bremen.de

2497     Carsten Bormann
2498     Universitaet Bremen TZI
2499     Bibliothekstr. 1
2500     Bremen D-28359
2501     Germany

2503     Phone: +49-421-218-63921
2504     Email: cabo@tzi.org