idnits 2.17.1

draft-ietf-cbor-cddl-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 26, 2018) is 2244 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Cc' is mentioned on line 1106, but not defined

  == Missing Reference: 'Aa' is mentioned on line 1106, but not defined

  == Missing Reference: 'Ss' is mentioned on line 1106, but not defined

  == Missing Reference: 'Ee' is mentioned on line 1106, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 2016

  ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949)

  == Outdated reference: A later version (-13) exists of
     draft-bormann-cbor-cddl-freezer-00

  == Outdated reference: A later version (-16) exists of
     draft-ietf-core-senml-12

  -- Obsolete informational reference (is this intentional?): RFC 8152
     (Obsoleted by RFC 9052, RFC 9053)

     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                        H. Birkholz
Internet-Draft                                            Fraunhofer SIT
Intended status: Standards Track                               C. Vigano
Expires: August 30, 2018                             Universitaet Bremen
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                       February 26, 2018


   Concise data definition language (CDDL): a notational convention to
                       express CBOR data structures
                         draft-ietf-cbor-cddl-02

Abstract

   This document proposes a notational convention to express CBOR data
   structures (RFC 7049).  Its main goal is to provide an easy and
   unambiguous way to express structures for protocol messages and data
   formats that use CBOR.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 30, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
     1.2.  Terminology
   2.  The Style of Data Structure Specification
     2.1.  Groups and Composition in CDDL
       2.1.1.  Usage
       2.1.2.  Syntax
     2.2.  Types
       2.2.1.  Values
       2.2.2.  Choices
       2.2.3.  Representation Types
       2.2.4.  Root type
   3.  Syntax
     3.1.  General conventions
     3.2.  Occurrence
     3.3.  Predefined names for types
     3.4.  Arrays
     3.5.  Maps
       3.5.1.  Structs
       3.5.2.  Tables
       3.5.3.  Cuts in Maps
     3.6.  Tags
     3.7.  Unwrapping
     3.8.  Controls
       3.8.1.  Control operator .size
       3.8.2.  Control operator .bits
       3.8.3.  Control operator .regexp
       3.8.4.  Control operators .cbor and .cborseq
       3.8.5.  Control operators .within and .and
       3.8.6.  Control operators .lt, .le, .gt, .ge, .eq, .ne, and
               .default
     3.9.  Socket/Plug
     3.10. Generics
     3.11. Operator Precedence
   4.  Making Use of CDDL
     4.1.  As a guide to a human user
     4.2.  For automated checking of CBOR data structure
     4.3.  For data analysis tools
   5.  Security considerations
   6.  IANA considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  (Not used.)
   Appendix B.  ABNF grammar
   Appendix C.  Matching rules
   Appendix D.  (Not used.)
   Appendix E.  Standard Prelude
     E.1.  Use with JSON
   Appendix F.  The CDDL tool
   Appendix G.  Extended Diagnostic Notation
     G.1.  White space in byte string notation
     G.2.  Text in byte string notation
     G.3.  Embedded CBOR and CBOR sequences in byte strings
     G.4.  Concatenated Strings
     G.5.  Hexadecimal, octal, and binary numbers
     G.6.  Comments
   Appendix H.  Examples
     H.1.  RFC 7071
       H.1.1.  Examples from JSON Content Rules
   Acknowledgements
   Authors' Addresses

1.  Introduction

   In this document, a notational convention to express CBOR [RFC7049]
   data structures is defined.

   The main goal for the convention is to provide a unified notation
   that can be used when defining protocols that use CBOR.  We term the
   convention "Concise data definition language", or CDDL.

   The CBOR notational convention has the following goals:

   (G1) Provide an unambiguous description of the overall structure of
        a CBOR data structure.

   (G2) Flexibility to express the freedoms of choice in the CBOR data
        format.

   (G3) Possibility to restrict format choices where appropriate
        [_format].

   (G4) Able to express common CBOR datatypes and structures.

   (G5) Human and machine readable and processable.

   (G6) Automatic checking of data format compliance.

   (G7) Extraction of specific elements from CBOR data for further
        processing.

   Not an explicit goal per se, but a convenient side effect of the JSON
   generic data model being a subset of the CBOR generic data model, is
   the fact that CDDL can also be used for describing JSON data
   structures (see Appendix E.1).

   This document has the following structure:

   The syntax of CDDL is defined in Section 3.  Examples of CDDL and
   related CBOR data items ("instances") are given in Appendix H.
   Section 4 discusses usage of CDDL.  Examples are provided early in
   the text to better illustrate concept definitions.
   A formal definition of CDDL using ABNF grammar is provided in
   Appendix B.  Finally, a prelude of standard CDDL definitions
   available in every CBOR specification is listed in Appendix E.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119, BCP 14 [RFC2119].

1.2.  Terminology

   New terms are introduced in _cursive_.  CDDL text in the running text
   is in "typewriter".

2.  The Style of Data Structure Specification

   CDDL focuses on styles of specification that are in use in the
   community employing the data model as pioneered by JSON and now
   refined in CBOR.

   There are a number of more or less atomic elements of a CBOR data
   model, such as numbers, simple values (false, true, nil), text and
   byte strings; CDDL does not focus on specifying their structure.
   CDDL of course also allows adding a CBOR tag to a data item.

   The more important components of a data structure definition language
   are the data types used for composition: arrays and maps in CBOR
   (called arrays and objects in JSON).  While these are only two
   representation formats, they are used to specify four loosely
   distinguishable styles of composition:

   o  A _vector_, an array of elements that are mostly of the same
      semantics.  The set of signatures associated with a signed data
      item is a typical application of a vector.

   o  A _record_, an array the elements of which have different,
      positionally defined semantics, as detailed in the data structure
      definition.  A 2D point, specified as an array of an x coordinate
      (which comes first) and a y coordinate (coming second) is an
      example of a record, as is the pair of exponent (first) and
      mantissa (second) in a CBOR decimal fraction.

   o  A _table_, a map from a domain of map keys to a domain of map
      values, that are mostly of the same semantics.  A set of language
      tags, each mapped to a text string translated to that specific
      language, is an example of a table.  The key domain is usually not
      limited to a specific set by the specification, but open for the
      application, e.g., in a table mapping IP addresses to MAC
      addresses, the specification does not attempt to foresee all
      possible IP addresses.

   o  A _struct_, a map from a domain of map keys as defined by the
      specification to a domain of map values the semantics of each of
      which is bound to a specific map key.  This is what many people
      have in mind when they think about JSON objects; CBOR adds the
      ability to use map keys that are not just text strings.  Structs
      can be used to solve problems similar to those solved by records;
      the use of explicit map keys facilitates optionality and
      extensibility.

   Two important concepts provide the foundation for CDDL:

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

   2.  The other important concept is that of a _type_.  The entire CDDL
       specification defines a type (the one defined by its first
       _rule_), which formally is the set of CBOR data items that are
       acceptable as "instances" for this specification.  CDDL
       predefines a number of basic types such as "uint" (unsigned
       integer) or "tstr" (text string), often making use of a simple
       formal notation for CBOR data items.  Each value that can be
       expressed as a CBOR data item also is a type in its own right,
       e.g., "1".  A type can be built as a _choice_ of other types,
       e.g., an "int" is either a "uint" or a "nint" (negative integer).
       Finally, a type can be built as an array or a map from a group.

   The rest of this section introduces a number of basic concepts of
   CDDL, and Section 3 defines additional syntax.  Appendix C gives a
   concise summary of the semantics of CDDL.

2.1.  Groups and Composition in CDDL

   CDDL Groups are lists of name/value pairs (group _entries_).

   In an array context, only the value of the entry is represented; the
   name is annotation only (and can be left off if not needed).  In a
   map context, the names become the map keys ("member keys").

   In an array context, the sequence of elements in the group is
   important, as it is the information that allows associating actual
   array elements with entries in the group.  In a map context, the
   sequence of entries in a group is not relevant (but there is still a
   need to write down group entries in a sequence).

   A simple example of using a group right in a map definition is:

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

                    Figure 1: Using a group in a map

   The three entries of the group are written between the curly braces
   that create the map: here, "age", "name", and "employer" are the
   names that turn into the map key text strings, and "int" and "tstr"
   (text string) are the types of the map values under these keys.

   A group by itself (without creating a map around it) can be placed in
   (round) parentheses, and given a name by using it in a rule:

   pii = (
     age: int,
     name: tstr,
     employer: tstr,
   )

                        Figure 2: A basic group

   This separate, named group definition allows us to rephrase Figure 1
   as:

   person = {
     pii
   }

                    Figure 3: Using a group by name

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.
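
   To illustrate the two contexts described above, the following Python
   sketch checks an already-decoded CBOR data item against the "pii"
   group of Figure 2, once as a map and once as an array.  It rests on
   several assumptions.  Decoding has already happened, so text strings
   are Python str and integers are Python int.  The group is modeled as
   a hypothetical ordered list of (name, predicate) entries.  Every
   entry is mandatory, and occurrence indicators and choices are not
   modeled.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical model: a group is an ordered list of (name, predicate) entries.
Entry = Tuple[str, Callable[[Any], bool]]


def is_int(v: Any) -> bool:
    # Python's bool is a subclass of int; a CBOR boolean is not an int.
    return isinstance(v, int) and not isinstance(v, bool)


def is_tstr(v: Any) -> bool:
    return isinstance(v, str)


# pii = (age: int, name: tstr, employer: tstr)
PII: List[Entry] = [("age", is_int), ("name", is_tstr), ("employer", is_tstr)]


def matches_as_map(group: List[Entry], item: Any) -> bool:
    """Map context: entry names become text-string keys; order is irrelevant."""
    if not isinstance(item, dict):
        return False
    if set(item.keys()) != {name for name, _ in group}:
        return False
    return all(check(item[name]) for name, check in group)


def matches_as_array(group: List[Entry], item: Any) -> bool:
    """Array context: only the values appear, positionally, in group order."""
    if not isinstance(item, list) or len(item) != len(group):
        return False
    return all(check(value) for (_, check), value in zip(group, item))


if __name__ == "__main__":
    print(matches_as_map(PII, {"name": "Alice", "employer": "ACME", "age": 42}))
    print(matches_as_array(PII, [42, "Alice", "ACME"]))
    print(matches_as_array(PII, ["Alice", 42, "ACME"]))
```

   The same PII list drives both functions.  This mirrors the point that
   a group is neutral until braces or brackets place it in a map or an
   array.  The names matter only in the map case, and the order matters
   only in the array case.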

   As shown in Figure 1, the parentheses for groups are optional when
   there is some other set of brackets present.  Note that they can
   still be used, leading to the not-so-realistic, but perfectly valid
   example:

   person = {(
     age: int,
     name: tstr,
     employer: tstr,
   )}

   Groups can be used to factor out common parts of structs, e.g.,
   instead of writing copy/paste style specifications such as in
   Figure 4, one can factor out the common subgroup, choose a name for
   it, and write only the specific parts into the individual maps
   (Figure 5).

   person = {
     age: int,
     name: tstr,
     employer: tstr,
   }

   dog = {
     age: int,
     name: tstr,
     leash-length: float,
   }

                     Figure 4: Maps with copy/paste

   person = {
     identity,
     employer: tstr,
   }

   dog = {
     identity,
     leash-length: float,
   }

   identity = (
     age: int,
     name: tstr,
   )

               Figure 5: Using a group for factorization

   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group.

2.1.1.  Usage

   Groups are the instrument used in composing data structures with
   CDDL.  It is a matter of style in defining those structures whether
   to define groups (anonymously) right in their contexts or whether to
   define them in a separate rule and to reference them with their
   respective name (possibly more than once).

   With this, one can define all the small parts of a data structure
   and compose bigger protocol units from them, or have only one big
   protocol data unit that has all definitions ad hoc where needed.

2.1.2.  Syntax

   The composition syntax intends to be concise and easy to read:

   o  The start of a group can be marked by '('

   o  The end of a group can be marked by ')'

   o  Definitions of entries inside of a group are noted as follows:
      _keytype => valuetype,_ (read "keytype maps to valuetype").
      The comma is actually optional (not just in the final entry), but
      it is considered good style to set it.  The double arrow can be
      replaced by a colon in the common case of directly using a text
      string or integer literal as a key (see Section 3.5.1).

   An entry consists of a _keytype_ and a _valuetype_:

   o  _keytype_ is either an atom used as the actual key or a type in
      general.  The latter case may be needed when using groups in a
      table context, where the actual keys are of lesser importance than
      the key types, e.g., in contexts verifying incoming data.

   o  _valuetype_ is a type, which could be derived from the major types
      defined in [RFC7049], could be a convenience valuetype defined in
      this document (Appendix E) or the name of a type defined in the
      specification.

   A group definition can also contain choices between groups, see
   Section 2.2.2.

2.2.  Types

2.2.1.  Values

   Values such as numbers and strings can be used in place of a type.
   (For instance, this is a very common thing to do for a keytype,
   common enough that CDDL provides additional convenience syntax for
   this.)

2.2.2.  Choices

   Many places that allow a type also allow a choice between types,
   delimited by a "/" (slash).  The entire choice construct can be put
   into parentheses if this is required to make the construction
   unambiguous (please see Appendix B for the details).

   Choices of values can be used to express enumerations:

   attire = "bow tie" / "necktie" / "Internet attire"
   protocol = 6 / 17

   As with types, CDDL also allows choices between groups, delimited by
   a "//" (double slash).

   address = { delivery }

   delivery = (
   street: tstr, ? number: uint, city //
   po-box: uint, city //
   per-pickup: true )

   city = (
   name: tstr, zip-code: uint
   )

   Both for type choices and for group choices, additional alternatives
   can be added to a rule later in separate rules by using "/=" and
   "//=", respectively, instead of "=":

   attire /= "swimwear"

   delivery //= (
   lat: float, long: float, drone-type: tstr
   )

   It is not an error if a name is first used with a "/=" or "//="
   (there is no need to "create it" with "=").

2.2.2.1.  Ranges

   Instead of naming all the values that make up a choice, CDDL allows
   building a _range_ out of two values that are in an ordering
   relationship.  A range can be inclusive of both ends given (denoted
   by joining two values by ".."), or include the first and exclude the
   second (denoted by instead using "...").

   device-address = byte
   max-byte = 255
   byte = 0..max-byte ; inclusive range
   first-non-byte = 256
   byte1 = 0...first-non-byte ; byte1 is equivalent to byte

   CDDL currently only allows ranges between numbers [_range].

2.2.2.2.  Turning a group into a choice

   Some choices are built out of large numbers of values, often
   integers, each of which is best given a semantic name in the
   specification.  Instead of naming each of these integers and then
   accumulating these into a choice, CDDL allows building a choice from
   a group by prefixing it with a "&" character:

   terminal-color = &basecolors
   basecolors = (
     black: 0, red: 1, green: 2, yellow: 3,
     blue: 4, magenta: 5, cyan: 6, white: 7,
   )
   extended-color = &(
     basecolors,
     orange: 8, pink: 9, purple: 10, brown: 11,
   )

   As with the use of groups in arrays (Section 3.4), the membernames
   have only documentary value (in particular, they might be used by a
   tool when displaying integers that are taken from that choice).

2.2.3.  Representation Types

   CDDL allows the specification of a data item type by referring to the
   CBOR representation (major and minor numbers).  How this is used
   should be evident from the prelude (Appendix E).

   It may be necessary to make use of representation types outside the
   prelude, e.g., a specification could start by making use of an
   existing tag in a more specific way, or define a new tag not defined
   in the prelude:

   my_breakfast = #6.55799(breakfast)   ; cbor-any is too general!
   breakfast = cereal / porridge
   cereal = #6.998(tstr)
   porridge = #6.999([liquid, solid])
   liquid = milk / water
   milk = 0
   water = 1
   solid = tstr

2.2.4.  Root type

   There is no special syntax to identify the root of a CDDL data
   structure definition: that role is simply taken by the first rule
   defined in the file.

   This is motivated by the usual top-down approach for defining data
   structures, decomposing a big data structure unit into smaller parts;
   however, except for the root type, there is no need to strictly
   follow this sequence.

   (Note that there is no way to use a group as a root; it must be a
   type.  Using a group as the root might be employed as a way to
   specify a CBOR sequence in a future version of this specification;
   this would act as if that group is used in an array and the data
   items in that fictional array form the members of the CBOR sequence.)

3.  Syntax

   In this section, the overall syntax of CDDL is shown, alongside some
   examples just illustrating syntax.  (The definition will not attempt
   to be overly formal; refer to Appendix B for the details.)

3.1.  General conventions

   The basic syntax is inspired by ABNF [RFC5234], with

   o  Rules, whether they define groups or types, are defined with a
      name, followed by an equals sign "=" and the actual definition
      according to the respective syntactic rules of that definition.

   o  A name can consist of any of the characters from the set {'A',
      ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},
      starting with an alphabetic character (including '@', '_', '$')
      and ending in such a character or a digit.

      *  Names are case sensitive.

      *  It is preferred style to start a name with a lower case letter.

      *  The hyphen is preferred over the underscore (except in a
         "bareword" (Section 3.5.1), where the semantics may actually
         require an underscore).

      *  The period may be useful for larger specifications, to express
         some module structure (as in "tcp.throughput" vs.
         "udp.throughput").

      *  A number of names are predefined in the CDDL prelude, as listed
         in Appendix E.

      *  Rule names (types or groups) do not appear in the actual CBOR
         encoding, but names used as "barewords" in member keys do.

   o  Comments are started by a ';' (semicolon) character and finish at
      the end of a line (LF or CRLF).

   o  Outside strings, whitespace (spaces, newlines, and comments) is
      used to separate syntactic elements for readability (and to
      separate identifiers or numbers that follow each other); it is
      otherwise completely optional.

   o  Hexadecimal numbers are preceded by '0x' (without quotes, lower
      case x), and are case insensitive.  Similarly, binary numbers are
      preceded by '0b'.

   o  Text strings are enclosed by double quotation '"' characters.
      They follow the conventions for strings as defined in section 7 of
      [RFC8259].  (ABNF users may want to note that there is no support
      in CDDL for the concept of case insensitivity in text strings; if
      necessary, regular expressions can be used (Section 3.8.3).)

   o  Byte strings are enclosed by single quotation "'" characters and
      may be prefixed by "h" or "b64".
      If unprefixed, the string is interpreted as with a text string,
      except that single quotes must be escaped and that the resulting
      UTF-8 bytes are marked as a byte string (major type 2).  If
      prefixed as "h" or "b64", the string is interpreted as a sequence
      of hex digits or a base64(url) string, respectively (as with the
      diagnostic notation in section 6 of [RFC7049]; cf. Appendix G.2);
      any white space present within the string (including comments) is
      ignored in the prefixed case.  [_strings]

   o  CDDL uses UTF-8 [RFC3629] for its encoding.

   Example:

   ; This is a comment
   person = { g }

   g = (
     "name": tstr,
     age: int,  ; "age" is a bareword
   )

3.2.  Occurrence

   An optional _occurrence_ indicator can be given in front of a group
   entry.  It is either one of the characters '?' (optional), '*' (zero
   or more), or '+' (one or more), or is of the form n*m, where n and m
   are optional unsigned integers and n is the lower limit (default 0)
   and m is the upper limit (default no limit) of occurrences.

   If no occurrence indicator is specified, the group entry is to occur
   exactly once (as if 1*1 were specified).

   Note that CDDL, outside any directives/annotations that could
   possibly be defined, does not make any prescription as to whether
   arrays or maps use the definite length or indefinite length encoding.
   I.e., there is no correlation between leaving the size of an array
   "open" in the spec and the fact that it is then interchanged with
   definite or indefinite length.

   Please also note that CDDL can describe flexibility that the data
   model of the target representation does not have.  This is rather
   obvious for JSON, but also is relevant for CBOR:

   apartment = {
     kitchen: size,
     * bedroom: size,
   }
   size = float ; in m2

   The previous specification does not mean that CBOR is changed to
   allow the key "bedroom" to be used more than once.
   In other words, due to the restrictions imposed by the data model,
   the third line effectively turns into:

   ? bedroom: size,

   (Occurrence indicators beyond one are still useful in maps for groups
   that allow a variety of keys.)

3.3.  Predefined names for types

   CDDL predefines a number of names.  This subsection summarizes these
   names, but please see Appendix E for the exact definitions.

   The following keywords for primitive datatypes are defined:

   "bool"  A Boolean value (major type 7, additional information 20 or
      21).

   "uint"  An unsigned integer (major type 0).

   "nint"  A negative integer (major type 1).

   "int"  An unsigned integer or a negative integer.

   "float16"  A number representable as an IEEE 754 half-precision float
      (major type 7, additional information 25).

   "float32"  A number representable as an IEEE 754 single-precision
      float (major type 7, additional information 26).

   "float64"  A number representable as an IEEE 754 double-precision
      float (major type 7, additional information 27).

   "float"  One of float16, float32, or float64.

   "bstr" or "bytes"  A byte string (major type 2).

   "tstr" or "text"  A text string (major type 3).

   (Note that there are no predefined names for arrays or maps; these
   are defined with the syntax given below.)

   In addition, a number of types are defined in the prelude that are
   associated with CBOR tags, such as "tdate", "bigint", "regexp" etc.

3.4.  Arrays

   Array definitions surround a group with square brackets.

   For each entry, an occurrence indicator as specified in Section 3.2
   is permitted.
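
   Every occurrence form from Section 3.2 reduces to a lower and an
   upper bound on how often an entry may occur.  The following Python
   sketch shows one way a checker might normalize the forms.  The
   function names are hypothetical.  As a simplifying assumption, n
   and m in "n*m" may only be decimal digits.

```python
import math
import re
from typing import Tuple

_RANGE = re.compile(r"(\d*)\*(\d*)")


def occurrence_bounds(indicator: str) -> Tuple[int, float]:
    """Return (min, max) for a CDDL occurrence indicator; max may be math.inf.

    Simplifying assumption: n and m in "n*m" are decimal digits only.
    """
    if indicator == "":
        return (1, 1)              # no indicator: exactly once, as if 1*1
    if indicator == "?":
        return (0, 1)              # optional
    if indicator == "+":
        return (1, math.inf)       # one or more
    match = _RANGE.fullmatch(indicator)
    if match is None:
        raise ValueError(f"not an occurrence indicator: {indicator!r}")
    low, high = match.groups()     # either side may be empty
    return (int(low) if low else 0, int(high) if high else math.inf)


def occurrences_ok(count: int, indicator: str) -> bool:
    """Check whether an entry occurring `count` times satisfies `indicator`."""
    low, high = occurrence_bounds(indicator)
    return low <= count <= high


if __name__ == "__main__":
    for ind in ["", "?", "*", "+", "1*2", "2*", "*3"]:
        print(repr(ind), occurrence_bounds(ind))
```

   For instance, an entry marked "1*2" is accepted when it occurs once
   or twice.  It is rejected when it is absent or occurs three times.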

   For example:

   unlimited-people = [* person]
   one-or-two-people = [1*2 person]
   at-least-two-people = [2* person]
   person = (
       name: tstr,
       age: uint,
   )

   The group "person" is defined in such a way that repeating it in the
   array each time generates alternating names and ages, so these are
   four valid values for a data item of type "unlimited-people":

   ["roundlet", 1047, "psychurgy", 2204, "extrarhythmical", 2231]
   []
   ["aluminize", 212, "climograph", 4124]
   ["penintime", 1513, "endocarditis", 4084, "impermeator", 1669,
    "coextension", 865]

3.5.  Maps

   The syntax for specifying maps merits special attention, as well as a
   number of optimizations and conveniences, as it is likely to be the
   focal point of many specifications employing CDDL.  While the syntax
   does not strictly distinguish struct and table usage of maps, it
   caters specifically to each of them.

   But first, let's reiterate a feature of CBOR that it has inherited
   from JSON: The key/value pairs in CBOR maps have no fixed ordering.
   (One could imagine situations where fixing the ordering may be of
   use.  For example, a decoder could look for values related to
   integer keys 1, 3 and 7.  If the order were fixed and the decoder
   encounters the key 4 without having encountered key 3, it could
   conclude that key 3 is not available without doing more complicated
   bookkeeping.  Unfortunately, neither JSON nor CBOR support this, so
   no attempt was made to support this in CDDL either.)

3.5.1.  Structs

   The "struct" usage of maps is similar to the way JSON objects are
   used in many JSON applications.

   A map is defined in the same way as defining an array (see
   Section 3.4), except for using curly braces "{}" instead of square
   brackets "[]".

   An occurrence indicator as specified in Section 3.2 is permitted for
   each group entry.
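
   A map can hold a given key at most once.  An occurrence indicator on
   an entry with a fixed key therefore reduces in practice to "exactly
   once" or "?", as with the "apartment" example in Section 3.2.  The
   following Python sketch checks a decoded map against such entries.
   The dictionary-based model of the group and the helper names are
   assumptions for illustration.  Extension points such as
   "* tstr => any" are deliberately not modeled, so unknown keys are
   rejected.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical model: member key -> (is_optional, value predicate).
Spec = Dict[str, Tuple[bool, Callable[[Any], bool]]]


def is_float(v: Any) -> bool:
    # Assumes CBOR floating-point values were decoded to Python float.
    return isinstance(v, float)


# apartment = { kitchen: size, * bedroom: size }, size = float.
# In a map, "* bedroom" can only match zero or one time, i.e., "? bedroom".
APARTMENT: Spec = {
    "kitchen": (False, is_float),
    "bedroom": (True, is_float),
}


def matches_struct(spec: Spec, item: Any) -> bool:
    """Check a decoded map against fixed text-string member keys."""
    if not isinstance(item, dict):
        return False
    if not set(item) <= set(spec):
        return False                  # a key not covered by any entry
    for key, (optional, check) in spec.items():
        if key not in item:
            if not optional:
                return False          # mandatory entry is missing
        elif not check(item[key]):
            return False              # key present, value has the wrong type
    return True
```

   The check never looks at the order in which the keys were encoded.
   This is consistent with CBOR maps having no fixed ordering.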

   The following is an example of a structure:

   Geography = [
     city           : tstr,
     gpsCoordinates : GpsCoordinates,
   ]

   GpsCoordinates = {
     longitude      : uint,            ; multiplied by 10^7
     latitude       : uint,            ; multiplied by 10^7
   }

   When encoding, the Geography structure is encoded using a CBOR array
   with two entries (the keys for the group entries are ignored),
   whereas the GpsCoordinates are encoded as a CBOR map with two
   key/value pairs.

   Types used in a structure can be defined in separate rules or just in
   place (potentially placed inside parentheses, such as for choices).
   E.g.:

   located-samples = {
     sample-point: int,
     samples: [+ float],
   }

   where "located-samples" is the datatype to be used when referring to
   the struct, and "sample-point" and "samples" are the keys to be used.
   This is actually a complete example: an identifier that is followed
   by a colon can be directly used as the text string for a member key
   (we speak of a "bareword" member key), as can a double-quoted string
   or a number.  (When other types, in particular multi-valued ones, are
   used as keytypes, they are followed by a double arrow, see below.)

   If a text string key does not match the syntax for an identifier (or
   if the specifier just happens to prefer using double quotes), the
   text string syntax can also be used in the member key position,
   followed by a colon.  The above example could therefore have been
   written with quoted strings in the member key positions.  More
   generally, all the types defined can be used in a keytype position by
   following them with a double arrow.  A string also is a
   (single-valued) type, so another form for this example is:

   located-samples = {
     "sample-point" => int,
     "samples" => [+ float],
   }

   See Section 3.5.3 below for how the colon shortcut described here
   also adds some implied semantics.
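
   A bareword, a quoted string followed by a colon, and a quoted string
   followed by "=>" all denote the same text-string key here.  A
   checker therefore sees identical keys for both forms of
   "located-samples".  The following Python sketch validates an
   already-decoded data item against that rule.  It is a hand-written
   illustration and is not generated by any CDDL tool.  It assumes
   CBOR floats decode to Python float and rejects any key beyond the
   two listed.

```python
from typing import Any


def is_int(v: Any) -> bool:
    # Python's bool is a subclass of int; a CBOR boolean is not an int.
    return isinstance(v, int) and not isinstance(v, bool)


def is_located_samples(item: Any) -> bool:
    """Hand-written check for:
    located-samples = { sample-point: int, samples: [+ float] }
    Bareword and quoted member keys both become the text strings below.
    """
    if not isinstance(item, dict):
        return False
    if set(item.keys()) != {"sample-point", "samples"}:
        return False  # both entries are mandatory; no other keys allowed
    samples = item["samples"]
    return (is_int(item["sample-point"])
            and isinstance(samples, list)
            and len(samples) >= 1  # "+" means one or more
            and all(isinstance(s, float) for s in samples))
```

   Note that "sample_point" (with an underscore) would be a different
   key.  This is one reason the choice between hyphen and underscore
   matters for barewords.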
   A better way to demonstrate the double-arrow use may be:

   located-samples = {
     sample-point: int,
     samples: [+ float],
     * equipment-type => equipment-tolerances,
   }
   equipment-type = [name: tstr, manufacturer: tstr]
   equipment-tolerances = [+ [float, float]]

   The example below defines a struct with optional entries: display
   name (as a text string), the name components first name and family
   name (as text strings), and age information (as an unsigned
   integer).

   PersonalData = {
     ? displayName: tstr,
     NameComponents,
     ? age: uint,
   }

   NameComponents = (
     ? firstName: tstr,
     ? familyName: tstr,
   )

   Note that the group definition for NameComponents does not generate
   another map; instead, all four keys are directly in the struct built
   by PersonalData.

   In this example, all key/value pairs are optional from the
   perspective of CDDL.  With no occurrence indicator, an entry is
   mandatory.

   If the addition of more entries not specified by the current
   specification is desired, one can add this possibility explicitly:

   PersonalData = {
     ? displayName: tstr,
     NameComponents,
     ? age: uint,
     * tstr => any
   }

   NameComponents = (
     ? firstName: tstr,
     ? familyName: tstr,
   )

           Figure 6: Personal Data: Example for extensibility

   The cddl tool (Appendix F) generated the following as one acceptable
   instance for this specification:

   {"familyName": "agust", "antiforeignism": "pretzel",
    "springbuck": "illuminatingly", "exuviae": "ephemeris",
    "kilometrage": "frogfish"}

   (See Section 3.9 for one way to explicitly identify an extension
   point.)

3.5.2.  Tables

   A table can be specified by defining a map with entries where the
   keytype is not single-valued, e.g.:

   square-roots = {* x => y}
   x = int
   y = float

   Here, the key in each key/value pair has datatype x (defined as int),
   and the value has datatype y (defined as float).
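   The following Python sketch shows the table semantics.  An entry of
   the form "* x => y" allows any number of key/value pairs, including
   none, as long as every key matches x and every value matches y.  The
   helper names are hypothetical.  The float check is deliberately
   strict: CBOR keeps integers and floating-point numbers in different
   major types, so the integer 2 does not match "float".

```python
from typing import Any, Callable


def is_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def is_float(value: Any) -> bool:
    # CBOR keeps integers (major types 0/1) and floats (major type 7) apart,
    # so the integer 2 does not match "float" even though 2 == 2.0 in Python.
    return isinstance(value, float)


def matches_table(item: Any, key_check: Callable[[Any], bool],
                  value_check: Callable[[Any], bool]) -> bool:
    """{* key => value}: any number of pairs (including none), each of
    which must match both the key type and the value type."""
    return isinstance(item, dict) and all(
        key_check(k) and value_check(v) for k, v in item.items())
```

   In this sketch, "square-roots" corresponds to matches_table(item,
   is_int, is_float).  Replacing one of the checks with "lambda _: True"
   plays the role of the predefined name "any".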
   If the specification does not need to restrict one of x or y (i.e.,
   the application is free to choose per entry), it can be replaced by
   the predefined name "any".

   As another example, the following could be used as a conversion table
   converting from an integer or float to a string:

   tostring = {* mynumber => tstr}
   mynumber = int / float

3.5.3.  Cuts in Maps

   The extensibility idiom discussed above for structs has one problem:

   extensible-map-example = {
     ? "optional-key" => int,
     * tstr => any
   }

   In this example, there is one optional key "optional-key", which,
   when present, maps to an integer.  There is also a wildcard for any
   future additions.

   Unfortunately, the data item

   { "optional-key": "nonsense" }

   does match this specification: While the first entry of the group
   does not match, the second one (the wildcard) does.  This may well
   be desirable (e.g., if a future extension is to be allowed to extend
   the type of "optional-key"), but in many cases it isn't.

   In anticipation of a more general potential feature called "cuts",
   CDDL allows inserting a cut "^" into the definition of the map entry:

   extensible-map-example = {
     ? "optional-key" ^ => int,
     * tstr => any
   }

   A cut in this position means that once the map key matches the entry
   carrying the cut, other potential matches for the key that occur in
   later entries in the group of the map are no longer allowed.  (This
   rule applies regardless of whether the value matches as well.)  So
   the data item above no longer matches the version modified with a
   cut.

   Since the desire for this kind of exclusive matching is so frequent,
   the ":" shortcut is actually defined to include the cut semantics.
   So the preceding example (including the cut) can be written more
   simply as:

   extensible-map-example = {
     ? "optional-key": int,
     * tstr => any
   }

   or even shorter, using a bareword for the key:

   extensible-map-example = {
     ? optional-key: int,
     * tstr => any
   }

3.6.  Tags

   A type can make use of a CBOR tag (major type 6) by using the
   representation type notation, giving #6.nnn(type) where nnn is an
   unsigned integer giving the tag number and "type" is the type of the
   data item being tagged.

   For example, the following line from the CDDL prelude (Appendix E)
   defines "biguint" as a type name for a positive bignum N:

   biguint = #6.2(bstr)

   The tags defined by [RFC7049] are included in the prelude.
   Additional tags registered since then need to be added to a CDDL
   specification as needed; e.g., a binary UUID tag could be referenced
   as "buuid" in a specification after defining

   buuid = #6.37(bstr)

   In the following example, usage of the tag 32 for URIs is optional:

   my_uri = #6.32(tstr) / tstr

3.7.  Unwrapping

   The group that is used to define a map or an array can often be
   reused in the definition of another map or array.  Similarly, a type
   defined as a tag carries an internal data item that one would like to
   refer to.  In these cases, it is expedient to simply use the name of
   the map, array, or tag type as a handle for the group or type defined
   inside it.

   The "unwrap" operator (written by preceding a name by a tilde
   character "~") can be used to strip the type defined for a name by
   one layer, exposing the underlying group (for maps and arrays) or
   type (for tags).

   For example, an application might want to define a basic and an
   advanced header.
   Without unwrapping, this might be done as follows:

   basic-header-group = (
     field1: int,
     field2: text,
   )

   basic-header = { basic-header-group }

   advanced-header = {
     basic-header-group,
     field3: bytes,
     field4: number, ; as in the tagged type "time"
   }

   Unwrapping simplifies this to:

   basic-header = {
     field1: int,
     field2: text,
   }

   advanced-header = {
     ~basic-header,
     field3: bytes,
     field4: ~time,
   }

   (Note that leaving out the first unwrap operator in the latter
   example would lead to nesting the basic-header in its own map inside
   the advanced-header, while, with the unwrapped basic-header, the
   definition of the group inside basic-header is essentially repeated
   inside advanced-header, leading to a single map.  This can be used
   for various applications often solved by inheritance in programming
   languages.  The effect of unwrapping can also be described as
   "threading in" the group or type inside the referenced type, which
   suggested the thread-like "~" character.)

3.8.  Controls

   A _control_ allows relating a _target_ type with a _controller_ type
   via a _control operator_.

   The syntax for a control type is "target .control-operator
   controller", where control operators are special identifiers prefixed
   by a dot.  (Note that _target_ or _controller_ might need to be
   parenthesized.)

   A number of control operators are defined at this point.  Note that
   the CDDL tool does not currently support combining multiple controls
   on a single target.

3.8.1.  Control operator .size

   A ".size" control controls the size of the target in bytes by the
   control type.
   Examples:

   full-address = [[+ label], ip4, ip6]
   ip4 = bstr .size 4
   ip6 = bstr .size 16
   label = bstr .size (1..63)

                  Figure 7: Control for size in bytes

   When applied to an unsigned integer, the ".size" control restricts
   the range of that integer by giving a maximum number of bytes that
   should be needed in a computer representation of that unsigned
   integer.  In other words, "uint .size N" is equivalent to
   "0...BYTES_N", where BYTES_N == 256**N.

   audio_sample = uint .size 3 ; 24-bit, equivalent to 0..16777215

              Figure 8: Control for integer size in bytes

   Note that, as with value restrictions in CDDL, this control is not a
   representation constraint; a number that fits into fewer bytes can
   still be represented in that form, and an inefficient implementation
   could use a longer form (unless that is restricted by some format
   constraints outside of CDDL, such as the rules in Section 3.9 of
   [RFC7049]).

3.8.2.  Control operator .bits

   A ".bits" control on a byte string indicates that, in the target,
   only the bits numbered by a number in the control type are allowed to
   be set.  (Bits are counted the usual way, bit number "n" being set in
   "str" meaning that "(str[n >> 3] & (1 << (n & 7))) != 0".)
   [_bitsendian]

   Similarly, a ".bits" control on an unsigned integer "i" indicates
   that for all unsigned integers "n" where "(i & (1 << n)) != 0", "n"
   must be in the control type.
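   The ".size" rule for unsigned integers and the two ".bits" rules
   above can be written out directly.  The following Python sketch does
   that.  The function names are hypothetical, and the allowed bit sets
   were transcribed by hand from Figure 9 instead of being derived from
   CDDL.  The bit numbering follows the expression given for byte
   strings: bits 0-7 are in the first byte (least significant bit
   first), and bits 8-15 are in the second byte.

```python
from typing import Collection


def uint_size_ok(value: int, n: int) -> bool:
    """uint .size n  ==  0...256**n (upper bound excluded)."""
    return 0 <= value < 256 ** n


def bstr_bits_ok(data: bytes, allowed: Collection[int]) -> bool:
    """bstr .bits: only bit numbers in `allowed` may be set, where bit n
    is set if (data[n >> 3] & (1 << (n & 7))) != 0."""
    for index, byte in enumerate(data):
        for bit in range(8):
            if byte & (1 << bit) and index * 8 + bit not in allowed:
                return False
    return True


def uint_bits_ok(value: int, allowed: Collection[int]) -> bool:
    """uint .bits: for every n with (value & (1 << n)) != 0, n must be allowed."""
    return all(n in allowed for n in range(value.bit_length()) if value >> n & 1)


# From Figure 9: fin..cwr are bits 8-15, ns is bit 0, data offset bits are 4-7.
TCP_FLAG_BITS = frozenset(range(8, 16)) | {0} | frozenset(range(4, 8))
# rwx = &(r: 2, w: 1, x: 0)
RWX_BITS = frozenset({0, 1, 2})
```

   With these definitions, each of the ten tool-generated "tcpflagbytes"
   instances passes bstr_bits_ok.  The empty and all-zero byte strings
   pass as well.  A byte string with bit 1 set, such as h'02', fails.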
   tcpflagbytes = bstr .bits flags
   flags = &(
     fin: 8,
     syn: 9,
     rst: 10,
     psh: 11,
     ack: 12,
     urg: 13,
     ece: 14,
     cwr: 15,
     ns: 0,
   ) / (4..7) ; data offset bits

   rwxbits = uint .bits rwx
   rwx = &(r: 2, w: 1, x: 0)

               Figure 9: Control for what bits can be set

   The CDDL tool generates the following ten example instances for
   "tcpflagbytes":

   h'906d' h'01fc' h'8145' h'01b7' h'013d' h'409f' h'018e' h'c05f'
   h'01fa' h'01fe'

   What these examples do not show is that the above CDDL specification
   does not explicitly specify a size of two bytes: A valid all-clear
   instance of flag bytes could be "h''" or "h'00'" or even "h'000000'"
   as well.

3.8.3.  Control operator .regexp

   A ".regexp" control indicates that the text string given as a target
   needs to match the XSD regular expression given as a value in the
   control type.  XSD regular expressions are defined in Appendix F of
   [W3C.REC-xmlschema-2-20041028].

   nai = tstr .regexp "[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+"

                 Figure 10: Control with an XSD regexp

   The CDDL tool generates:

   "N1@CH57HF.4Znqe0.dYJRN.igjf"

3.8.3.1.  Usage considerations

   Note that XSD regular expressions do not support the usual \x or \u
   escapes for hexadecimal expression of bytes or Unicode code points.
   However, in CDDL the XSD regular expressions are contained in text
   strings, the literal notation for which provides \u escapes; this
   should suffice for most applications that use regular expressions for
   text strings.  (Note that this also means that there is one level of
   string escaping before the XSD escaping rules are applied.)

   XSD regular expressions support character class subtraction, a
   feature often not found in regular expression libraries;
   specification writers may want to use this feature sparingly.
   Similar considerations apply to Unicode character classes; where
   these are used, the specification SHOULD identify which Unicode
   versions are addressed.

   Other surprises for infrequent users of XSD regular expressions may
   include:

   o  No direct support for case insensitivity.  While case
      insensitivity has gone mostly out of fashion in protocol design,
      it is sometimes needed and then needs to be expressed manually as
      in "[Cc][Aa][Ss][Ee]".

   o  The support for popular character classes such as \w and \d is
      based on Unicode character properties, which is often not what is
      desired in an ASCII-based protocol and thus might lead to
      surprises.  (\s and \S do have their more conventional meanings,
      and "." matches any character but the line ending characters \r or
      \n.)

3.8.3.2.  Discussion

   There are many flavors of regular expression in use in the
   programming community.  For instance, Perl-compatible regular
   expressions (PCRE) are widely used and probably are more useful than
   XSD regular expressions.  However, there is no normative reference
   for PCRE that could be used in the present document.  Instead, we opt
   for XSD regular expressions for now.  There is precedent for that
   choice in the IETF, e.g., in YANG [RFC7950].

   Note that CDDL uses controls as its main extension point.  This
   creates the opportunity to add further regular expression formats in
   addition to the one referenced here if desired.  As an example, a
   control ".pcre" is defined in [I-D.bormann-cbor-cddl-freezer].

3.8.4.  Control operators .cbor and .cborseq

   A ".cbor" control on a byte string indicates that the byte string
   carries a CBOR encoded data item.  Decoded, the data item matches the
   type given as the right-hand side argument (type1 in the following
   example).

   "bytes .cbor type1"

   Similarly, a ".cborseq" control on a byte string indicates that the
   byte string carries a sequence of CBOR encoded data items.  When the
   data items are taken as an array, the array matches the type given as
   the right-hand side argument (type2 in the following example).

   "bytes .cborseq type2"

   (The conversion of the encoded sequence to an array can be effected
   for instance by wrapping the byte string between the two bytes 0x9f
   and 0xff and decoding the wrapped byte string as a CBOR encoded data
   item.)

3.8.5.  Control operators .within and .and

   A ".and" control on a type indicates that the data item matches both
   that left hand side type and the type given as the right hand side.
   (Formally, the resulting type is the intersection of the two types
   given.)

   "type1 .and type2"

   A variant of the ".and" control is the ".within" control, which
   expresses an additional intent: the left hand side type is meant to
   be a subset of the right-hand-side type.

   "type1 .within type2"

   While both forms have the identical formal semantics (intersection),
   the intention of the ".within" form is that the right hand side gives
   guidance to the types allowed on the left hand side, which typically
   is a socket (Section 3.9):

   message = $message .within message-structure
   message-structure = [message_type, *message_option]
   message_type = 0..255
   message_option = any

   $message /= [3, dough: text, topping: [* text]]
   $message /= [4, noodles: text, sauce: text, parmesan: bool]

   For ".within", a tool might flag an error if type1 allows data items
   that are not allowed by type2.  In contrast, for ".and", there is no
   expectation that type1 already is a subset of type2.

3.8.6.  Control operators .lt, .le, .gt, .ge, .eq, .ne, and .default

   The controls .lt, .le, .gt, .ge, .eq, and .ne specify a constraint on
   the left hand side type to be a value less than, less than or equal
   to, greater than, greater than or equal to, equal to, or not equal to
   a value given as a (single-valued) right hand side type.  In the
   present specification, the first four controls (.lt, .le, .gt, .ge)
   are defined only for numeric types, as these have a natural ordering
   relationship.

   speed = number .ge 0  ; unit: m/s

   A variant of the ".ne" control is the ".default" control, which
   expresses an additional intent: the value specified by the
   right-hand-side type is intended as a default value for the left hand
   side type given, and the implied .ne control is there to prevent this
   value from being sent over the wire.  This control is only meaningful
   when the controlled type is used in an optional context; otherwise
   there would be no way to express the default value.

   timer = {
     time: uint,
     ? displayed-step: (number .gt 0) .default 1
   }

3.9.  Socket/Plug

   Both for type choices and group choices, a mechanism is defined that
   facilitates starting out with empty choices and assembling them
   later, potentially in separate files that are concatenated to build
   the full specification.

   By convention, CDDL extension points are marked with a leading dollar
   sign (types) or two leading dollar signs (groups).  Tools honor that
   convention by not raising an error if such a type or group is not
   defined at all; the symbol is then taken to be an empty type choice
   (group choice), i.e., no choice is available.
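   The following Python sketch models the type-socket behavior just
   described.  A socket starts out as an empty choice.  An undefined
   socket is not an error, but nothing matches it.  Each "/=" plug adds
   one more alternative.  The class "SocketRegistry" and its methods are
   hypothetical and are not taken from any CDDL tool.  The plug used
   here is a hand-written check that corresponds roughly to the first
   "$message /=" rule in Section 3.8.5.

```python
from typing import Any, Callable, Dict, List

Check = Callable[[Any], bool]


class SocketRegistry:
    """Hypothetical model of CDDL type sockets ($name) and plugs ($name /= type)."""

    def __init__(self) -> None:
        self._plugs: Dict[str, List[Check]] = {}

    def plug(self, socket: str, check: Check) -> None:
        # Each "/=" plug adds one more alternative to the type choice, in order.
        self._plugs.setdefault(socket, []).append(check)

    def matches(self, socket: str, item: Any) -> bool:
        # An undefined socket is not an error: it is an empty choice,
        # so no data item matches it.
        return any(check(item) for check in self._plugs.get(socket, []))


def is_text_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


# Roughly: $message /= [3, dough: text, topping: [* text]]
def is_pizza_message(item: Any) -> bool:
    return (isinstance(item, list) and len(item) == 3
            and type(item[0]) is int and item[0] == 3
            and isinstance(item[1], str) and is_text_list(item[2]))


registry = SocketRegistry()
before = registry.matches("$message", [3, "thin", ["olive"]])  # False: no plugs yet
registry.plug("$message", is_pizza_message)
after = registry.matches("$message", [3, "thin", ["olive"]])   # True
```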
   tcp-header = {seq: uint, ack: uint, * $$tcp-option}

   ; later, in a different file

   $$tcp-option //= (
     sack: [+(left: uint, right: uint)]
   )

   ; and, maybe in another file

   $$tcp-option //= (
     sack-permitted: true
   )

   Names that start with a single "$" are "type sockets", names with a
   double "$$" are "group sockets".  It is not an error if there is no
   definition for a socket at all; this then means there is no way to
   satisfy the rule (i.e., the choice is empty).

   All definitions (plugs) for socket names must be augmentations, i.e.,
   they must be using "/=" and "//=", respectively.

   To pick up the example illustrated in Figure 6, the socket/plug
   mechanism could be used as shown in Figure 11:

   PersonalData = {
     ? displayName: tstr,
     NameComponents,
     ? age: uint,
     * $$personaldata-extensions
   }

   NameComponents = (
     ? firstName: tstr,
     ? familyName: tstr,
   )

   ; The above already works as is.
   ; But then, we can add later:

   $$personaldata-extensions //= (
     favorite-salsa: tstr,
   )

   ; and again, somewhere else:

   $$personaldata-extensions //= (
     shoesize: uint,
   )

    Figure 11: Personal Data example: Using socket/plug extensibility

3.10.  Generics

   Using angle brackets, the left hand side of a rule can add formal
   parameters after the name being defined, as in:

   messages = message<"reboot", "now"> / message<"sleep", 1..100>
   message<t, v> = {type: t, value: v}

   When using a generic rule, the formal parameters are bound to the
   actual arguments supplied (also using angle brackets), within the
   scope of the generic rule (as if there were a rule of the form
   parameter = argument).

   (There are some limitations to nesting of generics in the tool
   described in Appendix F at this time.)

3.11.  Operator Precedence

   As with any language that has multiple syntactic features such as
   prefix and infix operators, CDDL has operators that bind more tightly
   than others.  This is more complicated than, say, in ABNF, as CDDL
   has both types and groups, with operators that are specific to these
   concepts.  Type operators (such as "/" for type choice) operate on
   types, while group operators (such as "//" for group choice) operate
   on groups.  Types can simply be used in groups, but groups need to be
   bracketed (as arrays or maps) to become types.  So, type operators
   naturally bind closer than group operators.

   For instance, in

   t = [group1]
   group1 = (a / b // c / d)
   a = 1 b = 2 c = 3 d = 4

   group1 is a group choice between the type choice of a and b and the
   type choice of c and d.  This becomes more relevant once member keys
   and/or occurrences are added in:

   t = {group2}
   group2 = (? ab: a / b // cd: c / d)
   a = 1 b = 2 c = 3 d = 4

   is a group choice between the optional member "ab" of type a or b and
   the member "cd" of type c or d.  Note that the optionality is
   attached to the first choice ("ab"), not to the second choice.

   Similarly, in

   t = [group3]
   group3 = (+ a / b / c)
   a = 1 b = 2 c = 3

   group3 is a repetition of a type choice between a, b, and c [unflex];
   if just a is to be repeatable, a group choice is needed to focus the
   occurrence:

   t = [group4]
   group4 = (+ a // b / c)
   a = 1 b = 2 c = 3

   group4 is a group choice between a repeatable a and a single b or c.
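   The difference between group3 and group4 can be checked
   mechanically.  The following Python sketch is a deliberate
   simplification and not a CDDL matcher.  Because a, b, and c are the
   single-digit values 1, 2, and 3, an array of them can be written as a
   string of digits, and a regular expression can stand in for each
   group.  The names GROUP3, GROUP4, and array_matches are hypothetical.

```python
import re
from typing import List

# Simplification: a = 1, b = 2, c = 3 are single digits, so an array of
# them can be written as a digit string and matched with a regex.
GROUP3 = re.compile(r"[123]+")   # (+ a / b / c): one or more of a, b, or c
GROUP4 = re.compile(r"1+|[23]")  # (+ a // b / c): repeated a, or a single b or c


def array_matches(group: "re.Pattern[str]", array: List[int]) -> bool:
    # fullmatch: the whole array must be consumed by the group
    return group.fullmatch("".join(str(x) for x in array)) is not None
```

   For example, [1, 2] matches group3 but not group4.  In group4 the
   "+" applies only to a, and b or c may appear only alone.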
   In general, as with many other languages with operator precedence
   rules, it is best not to rely on them, but to insert parentheses for
   readability:

   t = [group4a]
   group4a = ((+ a) // (b / c))
   a = 1 b = 2 c = 3

   The operator precedences, in sequence of loose to tight binding, are
   defined in Appendix B and summarized in Table 1.  (Arities given are
   1 for unary prefix operators and 2 for binary infix operators.)

   +----------+----+---------------------------+------+
   | Operator | Ar | Operates on               | Prec |
   +----------+----+---------------------------+------+
   | =        | 2  | name = type, name = group | 1    |
   | /=       | 2  | name /= type              | 1    |
   | //=      | 2  | name //= group            | 1    |
   | //       | 2  | group // group            | 2    |
   | ,        | 2  | group, group              | 3    |
   | *        | 1  | * group                   | 4    |
   | N*M      | 1  | N*M group                 | 4    |
   | +        | 1  | + group                   | 4    |
   | ?        | 1  | ? group                   | 4    |
   | =>       | 2  | type => type              | 5    |
   | :        | 2  | name: type                | 5    |
   | /        | 2  | type / type               | 6    |
   | &        | 1  | &group                    | 6    |
   | ..       | 2  | type..type                | 7    |
   | ...      | 2  | type...type               | 7    |
   | .anno    | 2  | type .anno type           | 7    |
   +----------+----+---------------------------+------+

                Table 1: Summary of operator precedences

4.  Making Use of CDDL

   In this section, we discuss several potential ways to employ CDDL.

4.1.  As a guide to a human user

   CDDL can be used to efficiently define the layout of CBOR data, such
   that a human implementer can easily see how data is supposed to be
   encoded.

   Since CDDL maps parts of the CBOR data to human-readable names, tools
   could be built that use CDDL to provide a human-friendly
   representation of the CBOR data, and allow users to edit such data
   while remaining compliant with its CDDL definition.

4.2.  For automated checking of CBOR data structure

   CDDL has been specified such that a machine can handle the CDDL
   definition and related CBOR data (and, thus, also JSON data).  For
   example, a machine could use CDDL to check whether or not CBOR data
   is compliant with its definition.

   The need for thoroughness of such compliance checking depends on the
   application.  For example, an application may decide not to check the
   data structure at all, and use the CDDL definition solely as a means
   to indicate the structure of the data to the programmer.

   On the other end, the application may also implement a checking
   mechanism that goes as far as checking that all mandatory map members
   are available.

   The extent to which the data description must be enforced by an
   application is left to the designers and implementers of that
   application, keeping in mind related security considerations.

   It is in no case the intention that a CDDL tool would be "writing
   code" for an implementation.

4.3.  For data analysis tools

   In the long run, it can be expected that more and more data will be
   stored using the CBOR data format.

   Where there is data, there is data analysis and the need to process
   such data automatically.  CDDL can be used for such automated data
   processing, allowing tools to verify data, clean it, and extract
   particular parts of interest from it.

   Since CBOR is designed with constrained devices in mind, a likely use
   of it would be small sensors.  An interesting use would thus be
   automated analysis of sensor data.

5.  Security considerations

   This document presents a content rules language for expressing CBOR
   data structures.  As such, it does not bring any security issues by
   itself, although specifications of protocols that use CBOR naturally
   need security analysis when defined.
   Topics that could be considered in a security considerations section
   that uses CDDL to define CBOR structures include the following:

   o  Where could the language cause confusion in a way that enables
      security issues?

6.  IANA considerations

   This document does not require any IANA registrations.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC7493]  Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
              DOI 10.17487/RFC7493, March 2015.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017.

   [W3C.REC-xmlschema-2-20041028]
              Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes
              Second Edition", World Wide Web Consortium Recommendation
              REC-xmlschema-2-20041028, October 2004.

7.2.  Informative References

   [I-D.bormann-cbor-cddl-freezer]
              Bormann, C., "A feature freezer for the Concise Data
              Definition Language (CDDL)",
              draft-bormann-cbor-cddl-freezer-00 (work in progress),
              January 2018.

   [I-D.ietf-anima-grasp]
              Bormann, C., Carpenter, B., and B. Liu, "A Generic
              Autonomic Signaling Protocol (GRASP)",
              draft-ietf-anima-grasp-15 (work in progress), July 2017.
   [I-D.ietf-core-senml]
              Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C.
              Bormann, "Media Types for Sensor Measurement Lists
              (SenML)", draft-ietf-core-senml-12 (work in progress),
              December 2017.

   [I-D.newton-json-content-rules]
              Newton, A. and P. Cordell, "A Language for Rules
              Describing JSON Content",
              draft-newton-json-content-rules-09 (work in progress),
              September 2017.

   [RELAXNG]  ISO/IEC, "Information technology -- Document Schema
              Definition Language (DSDL) -- Part 2:
              Regular-grammar-based validation -- RELAX NG", ISO/IEC
              19757-2, December 2008.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC7071]  Borenstein, N. and M. Kucherawy, "A Media Type for
              Reputation Interchange", RFC 7071, DOI 10.17487/RFC7071,
              November 2013.

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016.

   [RFC8007]  Murray, R. and B. Niven-Jenkins, "Content Delivery Network
              Interconnection (CDNI) Control Interface / Triggers",
              RFC 8007, DOI 10.17487/RFC8007, December 2016.

   [RFC8152]  Schaad, J., "CBOR Object Signing and Encryption (COSE)",
              RFC 8152, DOI 10.17487/RFC8152, July 2017.

7.3.  URIs

   [1] https://github.com/cabo/cbor-diag

Appendix A.  (Not used.)

Appendix B.  ABNF grammar

   The following is a formal definition of the CDDL syntax in Augmented
   Backus-Naur Form (ABNF, [RFC5234]).
   [_abnftodo]

   cddl = S 1*rule
   rule = typename [genericparm] S assign S type S
        / groupname [genericparm] S assign S grpent S

   typename = id
   groupname = id

   assign = "=" / "/=" / "//="

   genericparm = "<" S id S *("," S id S ) ">"
   genericarg = "<" S type1 S *("," S type1 S ) ">"

   type = type1 S *("/" S type1 S)

   type1 = type2 [S (rangeop / ctlop) S type2]

   type2 = value
         / typename [genericarg]
         / "(" type ")"
         / "~" S groupname [genericarg]
         / "#" "6" ["." uint] "(" S type S ")" ; note no space!
         / "#" DIGIT ["." uint]                ; major/ai
         / "#"                                 ; any
         / "{" S group S "}"
         / "[" S group S "]"
         / "&" S "(" S group S ")"
         / "&" S groupname [genericarg]

   rangeop = "..." / ".."

   ctlop = "." id

   group = grpchoice S *("//" S grpchoice S)

   grpchoice = *grpent

   grpent = [occur S] [memberkey S] type optcom
          / [occur S] groupname [genericarg] optcom ; preempted by above
          / [occur S] "(" S group S ")" optcom

   memberkey = type1 S ["^" S] "=>"
             / bareword S ":"
             / value S ":"

   bareword = id

   optcom = S ["," S]

   occur = [uint] "*" [uint]
         / "+"
         / "?"

   uint = ["0x" / "0b"] "0"
        / DIGIT1 *DIGIT
        / "0x" 1*HEXDIG
        / "0b" 1*BINDIG

   value = number
         / text
         / bytes

   int = ["-"] uint

   ; This is a float if it has fraction or exponent; int otherwise
   number = hexfloat / (int ["." fraction] ["e" exponent ])
   hexfloat = "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent
   fraction = 1*DIGIT
   exponent = ["+"/"-"] 1*DIGIT

   text = %x22 *SCHAR %x22
   SCHAR = %x20-21 / %x23-5B / %x5D-10FFFD / SESC
   SESC = "\" %x20-10FFFD

   bytes = [bsqual] %x27 *BCHAR %x27
   BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
   bsqual = %x68 ; "h"
          / %x62.36.34 ; "b64"

   id = EALPHA *(*("-" / ".") (EALPHA / DIGIT))
   ALPHA = %x41-5A / %x61-7A
   EALPHA = %x41-5A / %x61-7A / "@" / "_" / "$"
   DIGIT = %x30-39
   DIGIT1 = %x31-39
   HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
   BINDIG = %x30-31

   S = *WS
   WS = SP / NL
   SP = %x20
   NL = COMMENT / CRLF
   COMMENT = ";" *PCHAR CRLF
   PCHAR = %x20-10FFFD
   CRLF = %x0A / %x0D.0A

                         Figure 12: CDDL ABNF

Appendix C.  Matching rules

   In this appendix, we go through the ABNF syntax rules defined in
   Appendix B and briefly describe the matching semantics of each
   syntactic feature.  In this context, an instance (data item)
   "matches" a CDDL specification if it is allowed by the CDDL
   specification; this is then broken down to parts of specifications
   (type and group expressions) and parts of instances (data items).

   cddl = S 1*rule

   A CDDL specification is a sequence of one or more rules.  Each rule
   gives a name to a right hand side expression, either a CDDL type or a
   CDDL group.  Rule names can be used in the rule itself and/or other
   rules (and tools can output warnings if that is not the case).  The
   order of the rules is significant only in two cases, including the
   following: The first rule defines the semantics of the entire
   specification; hence, its name may be descriptive only (or may be
   used in itself or other rules as with the other rule names).
   rule = typename [genericparm] S assign S type S
        / groupname [genericparm] S assign S grpent S

   typename = id
   groupname = id

   A rule defines a name for a type expression (production "type") or
   for a group expression (production "grpent"), with the intention that
   the semantics does not change when the name is replaced by its
   (parenthesized if needed) definition.

   assign = "=" / "/=" / "//="

   A plain equals sign defines the rule name as the equivalent of the
   expression to the right.  A "/=" or "//=" extends a named type or a
   group by additional choices; a number of these could be replaced by
   collecting all the right hand sides and creating a single rule with a
   type choice or a group choice built from the right hand sides in the
   order of the rules given.  (It is not an error to extend a rule name
   that has not yet been defined; this makes the right hand side the
   first entry in the choice being created.)  The creation of the type
   choices and group choices from the right hand sides of rules is the
   other case where rule order can be significant.

   genericparm = "<" S id S *("," S id S ) ">"
   genericarg = "<" S type1 S *("," S type1 S ) ">"

   Rule names can have generic parameters, which cause temporary
   assignments within the right hand sides to the parameter names from
   the arguments given when citing the rule name.

   type = type1 S *("/" S type1 S)

   A type can be given as a choice between one or more types.  The
   choice matches a data item if the data item matches any one of the
   types given in the choice.  The choice uses Parsing Expression
   Grammar (PEG) semantics: The first choice that matches wins.  (As a
   result, the order of rules that contribute to a single rule name can
   very well matter.)
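   The following Python sketch shows the ordered, first-match behavior
   of a type choice.  For a plain type choice, whether a data item
   matches at all does not depend on the order of the alternatives.  The
   order does decide which alternative is credited with the match.  The
   helper "first_match" and the two example choices are hypothetical.
   Group choices, where PEG ordering has further effects, are not
   modeled here.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Alternative = Tuple[str, Callable[[Any], bool]]


def is_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def is_uint(value: Any) -> bool:
    return is_int(value) and value >= 0


def first_match(choice: Sequence[Alternative], item: Any) -> Optional[str]:
    """Ordered (PEG-style) type choice: try the alternatives left to right
    and return the name of the first one that matches, or None."""
    for name, check in choice:
        if check(item):
            return name
    return None


# Roughly "count = uint / int" versus "count = int / uint"
UINT_FIRST = [("uint", is_uint), ("int", is_int)]
INT_FIRST = [("int", is_int), ("uint", is_uint)]
```

   For example, the value 5 is attributed to "uint" in the first choice
   and to "int" in the second, even though both choices accept it.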
1688 type1 = type2 [S (rangeop / ctlop) S type2] 1690 Two types can be combined with a range operator (see below) or 1691 a control operator (see Section 3.8). 1693 type2 = value 1695 A type can be just a single value (such as 1 or "icecream" or 1696 h'0815'), which matches only a data item with that specific value (no 1697 conversions defined), 1699 / typename [genericarg] 1701 or be defined by a rule giving a meaning to a name (possibly after 1702 supplying generic args as required by the generic parameters), 1704 / "(" type ")" 1706 or be defined in a parenthesized type expression (parentheses may be 1707 necessary to override some operator precedence), or 1709 / "~" S groupname [genericarg] 1711 an "unwrapped" group (see Section 3.7), which matches the group 1712 inside a type defined as a map or an array by wrapping the group, or 1714 / "#" "6" ["." uint] "(" S type S ")" ; note no space! 1716 a tagged data item, tagged with the "uint" given and containing the 1717 type given as the tagged value, or 1719 / "#" DIGIT ["." uint] ; major/ai 1721 a data item of a major type (given by the DIGIT), optionally 1722 constrained to the additional information given by the uint, or 1723 / "#" ; any 1725 any data item, or 1727 / "{" S group S "}" 1729 a map expression, which matches a valid CBOR map the key/value pairs 1730 of which can be ordered in such a way that the resulting sequence 1731 matches the group expression, or 1733 / "[" S group S "]" 1735 an array expression, which matches a CBOR array the elements of 1736 which, when taken as values and complemented by a wildcard (matches 1737 anything) key each, match the group, or 1739 / "&" S "(" S group S ")" 1740 / "&" S groupname [genericarg] 1742 an enumeration expression, which matches any value that is within 1743 the set of values that the values of the group given can take. 1745 rangeop = "..." / ".."
1747 A range operator can be used to join two type expressions that stand 1748 for either two integer values or two floating point values; it 1749 matches any value that is between the two values, where the first 1750 value is always included in the matching set and the second value is 1751 included for ".." and excluded for "...". 1753 ctlop = "." id 1755 A control operator ties a _target_ type to a _controller_ type as 1756 defined in Section 3.8. Note that control operators are an extension 1757 point for CDDL; additional documents may want to define additional 1758 control operators. 1760 group = grpchoice S *("//" S grpchoice S) 1762 A group matches any sequence of key/value pairs that matches any of 1763 the choices given (again using Parse Expression Grammar semantics). 1765 grpchoice = *grpent 1767 Each of the component groups is given as a sequence of group entries. 1768 For a match, the sequence of key/value pairs given needs to match the 1769 sequence of group entries in the sequence given. 1771 grpent = [occur S] [memberkey S] type optcom 1773 A group entry can be given by a value type, which needs to be matched 1774 by the value part of a single element, and optionally a memberkey 1775 type, which needs to be matched by the key part of the element, if 1776 the memberkey is given. If the memberkey is not given, the entry can 1777 only be used for matching arrays, not for maps. (See below how that 1778 is modified by the occurrence indicator.) 1780 / [occur S] groupname [genericarg] optcom ; preempted by above 1782 A group entry can be built from a named group, or 1784 / [occur S] "(" S group S ")" optcom 1786 from a parenthesized group, again with a possible occurrence 1787 indicator. 
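Returning to the range operators described earlier in this appendix, the inclusive ".." and exclusive "..." forms can be sketched for plain Python numbers. The function in_range is hypothetical. As an assumption mirroring the "no conversions defined" rule for single values, an integer range here matches only integers and a float range only floats, and booleans are never treated as numbers.

```python
# A minimal sketch of the two range operators, for plain Python numbers.
# Assumption (mirroring "no conversions defined" for single values): an
# integer range only matches integers and a float range only floats;
# booleans are never treated as numbers.


def _kind(x):
    if isinstance(x, bool):
        return None
    if isinstance(x, int):
        return int
    if isinstance(x, float):
        return float
    return None


def in_range(value, low, high, op: str) -> bool:
    kind = _kind(low)
    if kind is None or _kind(high) is not kind:
        raise TypeError("range bounds must both be integers or both be floats")
    if _kind(value) is not kind:
        return False
    if op == "..":  # both bounds included
        return low <= value <= high
    if op == "...":  # upper bound excluded
        return low <= value < high
    raise ValueError(f"not a range operator: {op!r}")


# "Width: 0..1280" from Appendix H
print(in_range(1280, 0, 1280, ".."))   # True
print(in_range(1280, 0, 1280, "..."))  # False
```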
1789 memberkey = type1 S ["^" S] "=>" 1790 / bareword S ":" 1791 / value S ":" 1793 Key types can be given by a type expression, a bareword (which stands 1794 for a string value created from this bareword), or a value (which 1795 stands for a type that just contains this value). A key value 1796 matches its key type if the key value is a member of the key type, 1797 unless a cut preceding it in the group applies (see Section 3.5.3 for how 1798 map matching is influenced by the presence of the cuts denoted by "^" 1799 or ":" in previous entries). 1801 bareword = id 1803 A bareword is an alternative way to write a type with a single text 1804 string value; it can only be used in the syntactic context given 1805 above. 1807 optcom = S ["," S] 1809 (Optional commas do not influence the matching.) 1811 occur = [uint] "*" [uint] 1812 / "+" 1813 / "?" 1815 An occurrence indicator modifies the group given to its right by 1816 requiring the group to match the sequence to be matched exactly for a 1817 certain number of times (see Section 3.2) in sequence, i.e., it acts 1818 as a (possibly infinite) group choice that contains one choice for 1819 each permitted number of repetitions of the group. 1821 The rest of the ABNF describes syntax for value notation that should 1822 be familiar from programming languages, with the possible exception 1823 of h'..' and b64'..' for byte strings, as well as syntactic elements 1824 such as comments and line ends. 1826 Appendix D. (Not used.) 1828 Appendix E. Standard Prelude 1830 The following prelude is automatically added to each CDDL file 1831 [tdate]. (Note that technically, it is a postlude, as it does not 1832 disturb the selection of the first rule as the root of the 1833 definition.)
1834 any = # 1836 uint = #0 1837 nint = #1 1838 int = uint / nint 1840 bstr = #2 1841 bytes = bstr 1842 tstr = #3 1843 text = tstr 1845 tdate = #6.0(tstr) 1846 time = #6.1(number) 1847 number = int / float 1848 biguint = #6.2(bstr) 1849 bignint = #6.3(bstr) 1850 bigint = biguint / bignint 1851 integer = int / bigint 1852 unsigned = uint / biguint 1853 decfrac = #6.4([e10: int, m: integer]) 1854 bigfloat = #6.5([e2: int, m: integer]) 1855 eb64url = #6.21(any) 1856 eb64legacy = #6.22(any) 1857 eb16 = #6.23(any) 1858 encoded-cbor = #6.24(bstr) 1859 uri = #6.32(tstr) 1860 b64url = #6.33(tstr) 1861 b64legacy = #6.34(tstr) 1862 regexp = #6.35(tstr) 1863 mime-message = #6.36(tstr) 1864 cbor-any = #6.55799(any) 1866 float16 = #7.25 1867 float32 = #7.26 1868 float64 = #7.27 1869 float16-32 = float16 / float32 1870 float32-64 = float32 / float64 1871 float = float16-32 / float64 1873 false = #7.20 1874 true = #7.21 1875 bool = false / true 1876 nil = #7.22 1877 null = nil 1878 undefined = #7.23 1880 Figure 13: CDDL Prelude 1882 Note that the prelude is deemed to be fixed. This means, for 1883 instance, that additional tags beyond [RFC7049], as registered, need 1884 to be defined in each CDDL file that is using them. 1886 A common stumbling point is that the prelude does not define a type 1887 "string". CBOR has byte strings ("bytes" in the prelude) and text 1888 strings ("text"), so a type that is simply called "string" would be 1889 ambiguous. 1891 E.1. Use with JSON 1893 The JSON generic data model (implicit in [RFC8259]) is a subset of 1894 the generic data model of CBOR. So one can use CDDL with JSON by 1895 limiting oneself to what can be represented in JSON. 
Roughly 1896 speaking, this means leaving out byte strings, tags, and simple 1897 values other than "false", "true", and "null", leading to the 1898 following limited prelude: 1900 any = # 1902 uint = #0 1903 nint = #1 1904 int = uint / nint 1906 tstr = #3 1907 text = tstr 1909 number = int / float 1911 float16 = #7.25 1912 float32 = #7.26 1913 float64 = #7.27 1914 float16-32 = float16 / float32 1915 float32-64 = float32 / float64 1916 float = float16-32 / float64 1918 false = #7.20 1919 true = #7.21 1920 bool = false / true 1921 nil = #7.22 1922 null = nil 1924 Figure 14: JSON compatible subset of CDDL Prelude 1926 (The major types given here do not have a direct meaning in JSON, but 1927 they can be interpreted as CBOR major types translated through 1928 Section 4 of [RFC7049].) 1929 There are a few fine points in using CDDL with JSON. First, JSON 1930 does not distinguish between integers and floating point numbers; 1931 there is only one kind of number (which may happen to be integral). 1932 In this context, specifying a type as "uint", "nint" or "int" then 1933 becomes a predicate that the number be integral. As an example, this 1934 means that the following JSON numbers all match "uint": 1936 10 10.0 1e1 1.0e1 100e-1 1938 (The fact that these are all integers may be surprising to users 1939 accustomed to the long tradition in programming languages of using 1940 decimal points or exponents in a number to indicate a floating point 1941 literal.) 1943 CDDL distinguishes the various CBOR number types, but there is only 1944 one number type in JSON. The effect of specifying a floating point 1945 precision (float16/float32/float64) is only to restrict the set of 1946 permissible values to those expressible with binary16/binary32/ 1947 binary64; this is unlikely to be very useful when using CDDL for 1948 specifying JSON data structures.
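The "uint" predicate for JSON numbers can be sketched with Python's decimal module, which reads a JSON number literal exactly, without binary floating-point rounding. The function name json_number_matches_uint is hypothetical. Besides integrality and non-negativity, this sketch also applies the 64-bit limit of CBOR major type 0, which the paragraph above does not spell out for JSON.

```python
from decimal import Decimal

# Sketch: when CDDL is applied to JSON, "uint" is a predicate on the numeric
# value, not on how the literal is written. Decimal parses the literal
# exactly, so no binary floating-point rounding is involved.
# The upper bound is the largest CBOR major type 0 value (uint = #0).
UINT_MAX = 2**64 - 1


def json_number_matches_uint(literal: str) -> bool:
    value = Decimal(literal)
    return (
        value.is_finite()
        and value == value.to_integral_value()
        and 0 <= value <= UINT_MAX
    )


for literal in ["10", "10.0", "1e1", "1.0e1", "100e-1", "10.5", "-1"]:
    print(literal, json_number_matches_uint(literal))
```

Using Decimal rather than float matters here: a float-based check would silently accept literals whose decimal value is not integral but rounds to an integer in binary64.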
1950 Fundamentally, the number system of JSON itself is based on decimal 1951 numbers and decimal fractions and does not have limits to its 1952 precision or range. In practice, JSON numbers are often parsed into 1953 a number type that is called float64 here, creating a number of 1954 limitations to the generic data model [RFC7493]. In particular, this 1955 means that integers can only be expressed with interoperable 1956 exactness when they lie in the range [-(2**53)+1, (2**53)-1] -- a 1957 smaller range than that covered by CDDL "int". 1959 JSON applications that want to stay compatible with I-JSON therefore 1960 may want to define integer types with more limited ranges, such as in 1961 Figure 15. Note that the types given here are not part of the 1962 prelude; they need to be copied into the CDDL specification if 1963 needed. 1965 ij-uint = 0..9007199254740991 1966 ij-nint = -9007199254740991..-1 1967 ij-int = -9007199254740991..9007199254740991 1969 Figure 15: I-JSON types for CDDL (not part of prelude) 1971 JSON applications that do not need to stay compatible with I-JSON and 1972 that actually may need to go beyond the 64-bit unsigned and negative 1973 integers supported by "int" (= "uint"/"nint") may want to use the 1974 following additional types from the standard prelude, which are 1975 expressed in terms of tags but can straightforwardly be mapped into 1976 JSON (but not I-JSON) numbers: 1978 biguint = #6.2(bstr) 1979 bignint = #6.3(bstr) 1980 bigint = biguint / bignint 1981 integer = int / bigint 1982 unsigned = uint / biguint 1984 CDDL at this point does not have a way to express the unlimited 1985 floating point precision that is theoretically possible with JSON; at 1986 the time of writing, this is rarely used in protocols in practice. 
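The I-JSON limit can be checked directly in Python, whose float is an IEEE 754 binary64 value (the "float64" above). The helper names fits_ij_int and survives_float64 are hypothetical; the first mirrors "ij-int" from Figure 15, the second shows why larger integers lose exactness when parsed into float64.

```python
# Sketch: why ij-int stops at 2**53 - 1. JSON numbers are often parsed into
# float64 (IEEE 754 binary64), which has a 53-bit significand.
IJSON_MAX = 2**53 - 1  # 9007199254740991, as in Figure 15


def fits_ij_int(n: int) -> bool:
    """Mirror of "ij-int = -9007199254740991..9007199254740991"."""
    return -IJSON_MAX <= n <= IJSON_MAX


def survives_float64(n: int) -> bool:
    """Does n come back unchanged after a round trip through float64?"""
    return int(float(n)) == n


print(fits_ij_int(IJSON_MAX), survives_float64(IJSON_MAX))  # True True
print(fits_ij_int(2**53 + 1), survives_float64(2**53 + 1))  # False False
```

Note that 2**53 itself survives the round trip but is still excluded: a receiver that sees 2**53 after float64 parsing cannot tell whether the sender wrote 2**53 or 2**53 + 1.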
1988 Note that a data model described in CDDL is always restricted by what 1989 can be expressed in the serialization; e.g., floating point values 1990 such as NaN (not a number) and the infinities cannot be represented 1991 in JSON even if they are allowed in the CDDL generic data model. 1993 Appendix F. The CDDL tool 1995 A rough CDDL tool is available. For CDDL specifications, it can 1996 check the syntax, generate one or more instances (expressed in CBOR 1997 diagnostic notation or in pretty-printed JSON), and validate an 1998 existing instance against the specification: 2000 Usage: 2001 cddl spec.cddl generate [n] 2002 cddl spec.cddl json-generate [n] 2003 cddl spec.cddl validate instance.cbor 2004 cddl spec.cddl validate instance.json 2006 Figure 16: CDDL tool usage 2008 Install on a system with a modern Ruby via: 2010 gem install cddl 2012 Figure 17: CDDL tool installation 2014 The accompanying CBOR diagnostic tools (which are automatically 2015 installed by the above) are described in https://github.com/cabo/ 2016 cbor-diag [1]; they can be used to convert between binary CBOR, a 2017 pretty-printed form of that, CBOR diagnostic notation, JSON, and 2018 YAML. 2020 Appendix G. Extended Diagnostic Notation 2022 Section 6 of [RFC7049] defines a "diagnostic notation" in order to be 2023 able to converse about CBOR data items without having to resort to 2024 binary data. Diagnostic notation is based on JSON, with extensions 2025 for representing CBOR constructs such as binary data and tags. 2027 (Standardizing this together with the actual interchange format does 2028 not serve to create another interchange format, but enables the use 2029 of a shared diagnostic notation in tools for and documents about 2030 CBOR.) 2032 This section discusses a few extensions to the diagnostic notation 2033 that have turned out to be useful since RFC 7049 was written. We 2034 refer to the result as extended diagnostic notation (EDN). 2036 G.1. 
White space in byte string notation 2038 Examples often benefit from some white space (spaces, line breaks) in 2039 byte strings. In extended diagnostic notation, white space is 2040 ignored in prefixed byte strings; for instance, the following are 2041 equivalent: 2043 h'48656c6c6f20776f726c64' 2044 h'48 65 6c 6c 6f 20 77 6f 72 6c 64' 2045 h'4 86 56c 6c6f 2046 20776 f726c64' 2048 G.2. Text in byte string notation 2050 Diagnostic notation notates byte strings in one of the [RFC4648] base 2051 encodings, enclosed in single quotes and prefixed by "h" for base16, 2052 "b32" for base32, "h32" for base32hex, or "b64" for base64 or base64url. 2053 Quite often, byte strings carry bytes that are meaningfully 2054 interpreted as UTF-8 text. Extended Diagnostic Notation allows the 2055 use of single quotes without a prefix to express byte strings with 2056 UTF-8 text; for instance, the following are equivalent: 2058 'hello world' 2059 h'68656c6c6f20776f726c64' 2061 The escaping rules of JSON strings are applied equivalently for 2062 text-based byte strings, e.g., \\ stands for a single backslash and \' 2063 stands for a single quote. White space is included literally, i.e., 2064 the previous section does not apply to text-based byte strings. 2066 G.3. Embedded CBOR and CBOR sequences in byte strings 2068 Where a byte string is to carry an embedded CBOR-encoded item, or 2069 more generally a sequence of zero or more such items, the diagnostic 2070 notation for these zero or more CBOR data items, separated by 2071 commata, can be enclosed in << and >> to notate the byte string 2072 resulting from encoding the data items and concatenating the result. 2073 For instance, the two entries on each line of the following are equivalent: 2075 <<1>> h'01' 2076 <<1, 2>> h'0102' 2077 <<"foo", null>> h'63666F6FF6' 2078 <<>> h'' 2080 G.4.
Concatenated Strings 2082 While the ability to include white space enables line-breaking of 2083 encoded byte strings, a mechanism is needed to be able to include 2084 text strings as well as byte strings in direct UTF-8 representation 2085 into line-based documents (such as RFCs and source code). 2087 We extend the diagnostic notation by allowing multiple text strings 2088 or multiple byte strings to be notated separated by white space; 2089 these are then concatenated into a single text or byte string, 2090 respectively. Text strings and byte strings do not mix within such a 2091 concatenation, except that byte string notation can be used inside a 2092 sequence of concatenated text string notation to encode characters 2093 that may be better represented in an encoded way. The following four 2094 values are equivalent: 2096 "Hello world" 2097 "Hello " "world" 2098 "Hello" h'20' "world" 2099 "" h'48656c6c6f20776f726c64' "" 2101 Similarly, the following byte string values are equivalent: 2103 'Hello world' 2104 'Hello ' 'world' 2105 'Hello ' h'776f726c64' 2106 'Hello' h'20' 'world' 2107 '' h'48656c6c6f20776f726c64' '' b64'' 2108 h'4 86 56c 6c6f' h' 20776 f726c64' 2110 (Note that the approach of separating by white space, while familiar 2111 from the C language, requires some attention: a single comma makes a 2112 big difference here.) 2114 G.5. Hexadecimal, octal, and binary numbers 2116 In addition to JSON's decimal numbers, EDN provides hexadecimal, 2117 octal and binary numbers in the usual C-language notation (octal only 2118 with the 0o prefix). 2120 The following are equivalent: 2122 4711 2123 0x1267 2124 0o11147 2125 0b1001001100111 2127 As are: 2129 1.5 2130 0x1.8p0 2131 0x18p-4 2133 G.6. Comments 2135 Longer pieces of diagnostic notation may benefit from comments. JSON 2136 famously does not provide for comments, and basic RFC 7049 diagnostic 2137 notation inherits this property.
2139 In extended diagnostic notation, comments can be included, delimited 2140 by slashes ("/"). Any text within and including a pair of slashes is 2141 considered a comment. 2143 Comments are considered white space. Hence, they are allowed in 2144 prefixed byte strings; for instance, the following are equivalent: 2146 h'68656c6c6f20776f726c64' 2147 h'68 65 6c /doubled l!/ 6c 6f /hello/ 2148 20 /space/ 2149 77 6f 72 6c 64' /world/ 2151 This can be used to annotate a CBOR structure as in: 2153 /grasp-message/ [/M_DISCOVERY/ 1, /session-id/ 10584416, 2154 /objective/ [/objective-name/ "opsonize", 2155 /D, N, S/ 7, /loop-count/ 105]] 2157 (There are currently no end-of-line comments. If we want to add 2158 them, "//" sounds like a reasonable delimiter given that we already 2159 use slashes for comments, but we also could go e.g. for "#".) 2161 Appendix H. Examples 2163 This section contains various examples of structures defined using 2164 CDDL. 2166 The theme for the first example is taken from [RFC7071], which 2167 defines certain JSON structures in English. For a similar example, 2168 it may also be of interest to examine Appendix A of [RFC8007], which 2169 contains a CDDL definition for a JSON structure defined in the main 2170 body of the RFC. 2172 The second subsection in this appendix translates examples from 2173 [I-D.newton-json-content-rules] into CDDL. 2175 These examples all happen to describe data that is interchanged in 2176 JSON. Examples for CDDL definitions of data that is interchanged in 2177 CBOR can be found in [RFC8152], [I-D.ietf-anima-grasp], or 2178 [I-D.ietf-core-senml]. 2180 H.1. RFC 7071 2182 [RFC7071] defines the Reputon structure for JSON using somewhat 2183 formalized English text. 
Here is a (somewhat verbose) equivalent 2184 definition using the same terms, but notated in CDDL: 2186 reputation-object = { 2187 reputation-context, 2188 reputon-list 2189 } 2191 reputation-context = ( 2192 application: text 2193 ) 2195 reputon-list = ( 2196 reputons: reputon-array 2197 ) 2199 reputon-array = [* reputon] 2201 reputon = { 2202 rater-value, 2203 assertion-value, 2204 rated-value, 2205 rating-value, 2206 ? conf-value, 2207 ? normal-value, 2208 ? sample-value, 2209 ? gen-value, 2210 ? expire-value, 2211 * ext-value, 2212 } 2214 rater-value = ( rater: text ) 2215 assertion-value = ( assertion: text ) 2216 rated-value = ( rated: text ) 2217 rating-value = ( rating: float16 ) 2218 conf-value = ( confidence: float16 ) 2219 normal-value = ( normal-rating: float16 ) 2220 sample-value = ( sample-size: uint ) 2221 gen-value = ( generated: uint ) 2222 expire-value = ( expires: uint ) 2223 ext-value = ( text => any ) 2225 An equivalent, more compact form of this example would be: 2227 reputation-object = { 2228 application: text 2229 reputons: [* reputon] 2230 } 2232 reputon = { 2233 rater: text 2234 assertion: text 2235 rated: text 2236 rating: float16 2237 ? confidence: float16 2238 ? normal-rating: float16 2239 ? sample-size: uint 2240 ? generated: uint 2241 ? expires: uint 2242 * text => any 2243 } 2245 Note how this rather clearly delineates the structure somewhat 2246 shrouded by so many words in section 6.2.2. of [RFC7071]. Also, this 2247 definition makes it clear that several ext-values are allowed (by 2248 definition with different member names); RFC 7071 could be read to 2249 forbid the repetition of ext-value ("A specific reputon-element MUST 2250 NOT appear more than once" is ambiguous.) 
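For readers who want to see what the compact definition demands of an actual JSON document, here is a hand-written Python sketch of the same checks. It is not generated from the CDDL and is not what the CDDL tool does internally; the function names are hypothetical. Simplifications: "float16" is checked only as "some JSON number" (the binary16 precision restriction is ignored), and, following Appendix E.1, an integral JSON number such as 10.0 is accepted where "uint" is expected.

```python
import json

# Hand-written sketch of the compact reputon definition, for JSON data parsed
# with Python's json module. Not generated from the CDDL. Simplifications:
# "float16" is checked only as "some JSON number" (binary16 precision is
# ignored), and an integral number such as 10.0 is accepted as "uint".


def is_number(v) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def is_uint(v) -> bool:
    if not is_number(v) or v < 0:
        return False
    return isinstance(v, int) or v.is_integer()


def check_reputon(obj) -> list[str]:
    if not isinstance(obj, dict):
        return ["not an object"]
    problems = []
    for key in ("rater", "assertion", "rated"):
        if not isinstance(obj.get(key), str):
            problems.append(f"{key}: text required")
    if not is_number(obj.get("rating")):
        problems.append("rating: number required")
    for key in ("confidence", "normal-rating"):
        if key in obj and not is_number(obj[key]):
            problems.append(f"{key}: number expected")
    for key in ("sample-size", "generated", "expires"):
        if key in obj and not is_uint(obj[key]):
            problems.append(f"{key}: uint expected")
    # "* text => any": any further members are allowed (JSON keys are text).
    return problems


def check_reputation_object(obj) -> list[str]:
    if not isinstance(obj, dict):
        return ["not an object"]
    # The top-level map has no "* text => any", so extra members are errors.
    problems = [f"unexpected member {k!r}" for k in obj
                if k not in ("application", "reputons")]
    if not isinstance(obj.get("application"), str):
        problems.append("application: text required")
    reputons = obj.get("reputons")
    if not isinstance(reputons, list):
        problems.append("reputons: array required")
    else:
        for i, reputon in enumerate(reputons):
            problems += [f"reputons[{i}]: {p}" for p in check_reputon(reputon)]
    return problems


doc = json.loads(
    '{"application": "tridentiferous", "reputons": [{"rater": "precreed",'
    ' "assertion": "xanthosis", "rated": "balsamy", "rating": 0.36,'
    ' "confidence": 0.37, "sample-size": 3904}]}'
)
print(check_reputation_object(doc))
```

The shape of the checks follows the CDDL closely: required members are tested unconditionally, "?" members only when present, and "* text => any" turns into simply not rejecting unknown keys inside a reputon.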
2252 The CDDL tool (which hasn't quite been trained for polite 2253 conversation) says: 2255 { 2256 "application": "tridentiferous", 2257 "reputons": [ 2258 { 2259 "rater": "loamily", 2260 "assertion": "Dasyprocta", 2261 "rated": "uncommensurableness", 2262 "rating": 0.05055809746548934, 2263 "confidence": 0.7484706448605812, 2264 "normal-rating": 0.8677887734049299, 2265 "sample-size": 4059, 2266 "expires": 3969, 2267 "bearer": "nitty", 2268 "faucal": "postulnar", 2269 "naturalism": "sarcotic" 2270 }, 2271 { 2272 "rater": "precreed", 2273 "assertion": "xanthosis", 2274 "rated": "balsamy", 2275 "rating": 0.36091333590593955, 2276 "confidence": 0.3700759808403371, 2277 "sample-size": 3904 2278 }, 2279 { 2280 "rater": "urinosexual", 2281 "assertion": "malacostracous", 2282 "rated": "arenariae", 2283 "rating": 0.9210673488013762, 2284 "normal-rating": 0.4778762617112776, 2285 "sample-size": 4428, 2286 "generated": 3294, 2287 "backfurrow": "enterable", 2288 "fruitgrower": "flannelflower" 2289 }, 2290 { 2291 "rater": "pedologistically", 2292 "assertion": "unmetaphysical", 2293 "rated": "elocutionist", 2294 "rating": 0.42073613384304287, 2295 "misimagine": "retinaculum", 2296 "snobbish": "contradict", 2297 "Bosporanic": "periostotomy", 2298 "dayworker": "intragyral" 2299 } 2300 ] 2301 } 2303 H.1.1. Examples from JSON Content Rules 2305 Although JSON Content Rules [I-D.newton-json-content-rules] seems to 2306 address a more general problem than CDDL, it is still a worthwhile 2307 resource to explore for examples (beyond all the inspiration the 2308 format itself has had for CDDL). 
2310 Figure 2 of the JCR I-D looks very similar, if slightly less noisy, 2311 in CDDL: 2313 root = [2*2 { 2314 precision: text, 2315 Latitude: float, 2316 Longitude: float, 2317 Address: text, 2318 City: text, 2319 State: text, 2320 Zip: text, 2321 Country: text 2322 }] 2324 Figure 18: JCR, Figure 2, in CDDL 2326 Apart from the lack of a need to quote the member names, text strings 2327 are called "text" or "tstr" in CDDL ("string" would be ambiguous as 2328 CBOR also provides byte strings). 2330 The CDDL tool creates the below example instance for this: 2332 [{"precision": "pyrosphere", "Latitude": 0.5399712314350172, 2333 "Longitude": 0.5157523963028087, "Address": "resow", 2334 "City": "problemwise", "State": "martyrlike", "Zip": "preprove", 2335 "Country": "Pace"}, 2336 {"precision": "unrigging", "Latitude": 0.10422704368372193, 2337 "Longitude": 0.6279808663725834, "Address": "picturedom", 2338 "City": "decipherability", "State": "autometry", "Zip": "pout", 2339 "Country": "wimple"}] 2341 Figure 4 of the JCR I-D in CDDL: 2343 root = { image } 2345 image = ( 2346 Image: { 2347 size, 2348 Title: text, 2349 thumbnail, 2350 IDs: [* int] 2351 } 2352 ) 2354 size = ( 2355 Width: 0..1280 2356 Height: 0..1024 2357 ) 2359 thumbnail = ( 2360 Thumbnail: { 2361 size, 2362 Url: ~uri 2363 } 2364 ) 2366 This shows how the group concept can be used to keep related elements 2367 (here: width, height) together, and to emulate the JCR style of 2368 specification. (It also shows referencing a type by unwrapping a tag 2369 from the prelude, "uri" - this could be done differently.) 
The more 2370 compact form of Figure 5 of the JCR I-D could be emulated like this: 2372 root = { 2373 Image: { 2374 size, Title: text, 2375 Thumbnail: { size, Url: ~uri }, 2376 IDs: [* int] 2377 } 2378 } 2380 size = ( 2381 Width: 0..1280, 2382 Height: 0..1024, 2383 ) 2385 The CDDL tool creates the below example instance for this: 2387 {"Image": {"Width": 566, "Height": 516, "Title": "leisterer", 2388 "Thumbnail": {"Width": 1111, "Height": 176, "Url": 32("scrog")}, 2389 "IDs": []}} 2391 Acknowledgements 2393 CDDL was originally conceived by Bert Greevenbosch, who also wrote 2394 the original five versions of this document. 2396 Inspiration was taken from the C and Pascal languages, MPEG's 2397 conventions for describing structures in the ISO base media file 2398 format, Relax-NG and its compact syntax [RELAXNG], and in particular 2399 from Andrew Lee Newton's "JSON Content Rules" 2400 [I-D.newton-json-content-rules]. 2402 Useful feedback came from members of the IETF CBOR WG, in particular 2403 Joe Hildebrand, Sean Leonard and Jim Schaad. Also, Francesca 2404 Palombini and Joe volunteered to chair this WG, providing the 2405 framework for generating and processing this feedback. 2407 The CDDL tool was written by Carsten Bormann, building on previous 2408 work by Troy Heninger and Tom Lord. 2410 Editorial Comments 2412 [_format] So far, the ability to restrict format choices has not been 2413 needed beyond the floating point formats. Those can be 2414 applied to ranges using the new .and control now. It is not 2415 clear we want to add more format control before we have a use 2416 case. 2418 [_range] TO DO: define this precisely. This clearly includes integers 2419 and floats. Strings (as in "a".."z") could be added if 2420 desired, but this would require adopting a definition of string 2421 ordering and possibly a successor function so "a".."z" does not 2422 include "bb".
2424 [_strings] TO DO: This still needs to be fully realized in the ABNF and 2425 in the CDDL tool. 2427 [_bitsendian] How useful would it be to have another variant that counts 2428 bits like in RFC box notation? (Or at least per-byte? 2429 32-bit words don't always perfectly mesh with byte 2430 strings.) 2432 [unflex] A comment has been that this is counter-intuitive. One 2433 solution would be to simply disallow unparenthesized usage of 2434 occurrence indicators in front of type choices unless a member 2435 key is also present like in group2 above. 2437 [_abnftodo] Potential improvements: the prefixed byte strings are more 2438 liberally specified than they actually are. 2440 [tdate] The prelude as included here does not yet have a .regexp control 2441 on tdate, but we probably do want to have one. 2443 Authors' Addresses 2445 Henk Birkholz 2446 Fraunhofer SIT 2447 Rheinstrasse 75 2448 Darmstadt 64295 2449 Germany 2451 Email: henk.birkholz@sit.fraunhofer.de 2453 Christoph Vigano 2454 Universitaet Bremen 2456 Email: christoph.vigano@uni-bremen.de 2458 Carsten Bormann 2459 Universitaet Bremen TZI 2460 Bibliothekstr. 1 2461 Bremen D-28359 2462 Germany 2464 Phone: +49-421-218-63921 2465 Email: cabo@tzi.org