Network Working Group                                         C. Bormann
Internet-Draft                                    Universität Bremen TZI
Intended status: Informational                              25 June 2021
Expires: 27 December 2021


   A feature freezer for the Concise Data Definition Language (CDDL)
                   draft-bormann-cbor-cddl-freezer-08

Abstract

   In defining the Concise Data Definition Language (CDDL), some
   features have turned up that would be nice to have.  In the interest
   of completing this specification in a timely manner, the present
   document was started to collect nice-to-have features that did not
   make it into the first RFC for CDDL, RFC 8610.

   It is now time to discuss thawing some of the concepts discussed
   here.  A number of additional proposals have been added.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 December 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Base language features
     2.1.  Cuts
   3.  Literal syntax
     3.1.  Tag-oriented Literals
     3.2.  Regular Expression Literals
     3.3.  Clarifications
       3.3.1.  Err6527
       3.3.2.  Err6543
   4.  Controls
     4.1.  Control operator .pcre
     4.2.  Endianness in .bits
     4.3.  .bitfield control
   5.  Co-occurrence Constraints
   6.  Module superstructure
     6.1.  Namespacing
     6.2.  Cross-universe references
       6.2.1.  IANA references
   7.  Alternative Representations
   8.  IANA Considerations
   9.  Security considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Acknowledgements
   Author's Address

1.  Introduction

   In defining the Concise Data Definition Language (CDDL), some
   features have turned up that would be nice to have.  In the interest
   of completing this specification in a timely manner, the present
   document was started to collect nice-to-have features that did not
   make it into the first RFC for CDDL [RFC8610].

   It is now time to discuss thawing some of the concepts discussed
   here.  A number of additional proposals have been added.

   There is always a danger that a document like this becomes a
   shopping list; the intention is to develop this document further
   based on real-world experience with the first CDDL standard.

2.  Base language features

2.1.  Cuts

   Section 3.5.4 of [RFC8610] alludes to a new language feature, _cuts_,
   and defines it in a fashion that is rather focused on a single
   application in the context of maps and generating better diagnostic
   information about them.

   The present document is expected to grow a more complete definition
   of cuts, with the expectation that it will be upward-compatible with
   the existing one in [RFC8610], before this possibly becomes a
   mainline language feature in a future version of CDDL.

3.  Literal syntax

3.1.  Tag-oriented Literals

   Some CBOR tags would often be most natural to use in a CDDL spec with
   a literal syntax that is tailored to their semantics rather than to
   their serialization in CBOR.  There is currently no way to add such
   syntaxes, and no defined extension point either.

   The text form of CoRAL [I-D.ietf-core-coral] defines literals of the
   form

      dt'2019-07-21T19:53Z'

   for datetime items.  (Similar advances should then probably be made
   in diagnostic notation.)

3.2.  Regular Expression Literals

   Regular expressions are currently notated as strings in CDDL, with
   all the string escaping rules applied once.
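   As a rough illustration of what a single round of string escaping
   means in practice, the following Python sketch unescapes a text
   string literal and then hands the result to a regular expression
   engine.  It assumes that JSON string unescaping (json.loads) is an
   adequate stand-in for the CDDL text literal rules, and that Python's
   re module agrees with the XSD flavor for this simple pattern; both
   are assumptions of the sketch, not statements about any CDDL
   implementation.  The helper name unescape_cddl_text is hypothetical.

```python
import json
import re


def unescape_cddl_text(literal: str) -> str:
    """Apply one round of string unescaping to a quoted text literal.

    Assumption: JSON unescaping approximates CDDL text string literals.
    """
    return json.loads(literal)


# The literal exactly as it would be typed in CDDL source, quotes included.
# The backslash that escapes the dot for the regexp has to be doubled.
source_literal = r'"[0-9]+\\.[0-9]+"'

# After unescaping, the regexp engine sees a single backslash before the dot.
pattern = unescape_cddl_text(source_literal)
print(pattern)

# .regexp matches the whole text string, so fullmatch is used here.
print(bool(re.fullmatch(pattern, "3.14")))
print(bool(re.fullmatch(pattern, "3x14")))
```

   The doubled backslash in the source literal is the visible cost of
   notating the pattern as an ordinary string.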
   It might be convenient to have a more conventional literal format
   for regular expressions, possibly also providing a place to add
   modifiers such as "/i".  This might also imply "text .regexp ...",
   which, together with the proposal in Section 4.1, raises the
   question of how to indicate the regular expression flavor.

3.3.  Clarifications

   A number of errata reports have been made around some details of text
   string and byte string literal syntax: [Err6527] and [Err6543].
   These need to be addressed by re-examining the details of these
   literal syntaxes.  Also, [Err6526] needs to be applied.

3.3.1.  Err6527

   The ABNF used in [RFC8610] for the content of text string literals is
   rather permissive:

   text = %x22 *SCHAR %x22
   SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
   SESC = "\" (%x20-7E / %x80-10FFFD)

   This allows almost any non-C0 character to be escaped by a backslash,
   but critically omits the "\uXXXX" and "\uHHHH\uLLLL" forms that JSON
   provides for specifying characters in hex.  Both issues can be
   addressed by updating the SESC production to:

   SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
                %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
                (%x75 hexchar) )                   ; \u
   hexchar = non-surrogate / (high-surrogate "\" %x75 low-surrogate)
   non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                   ("D" %x30-37 2HEXDIG )
   high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
   low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG

   Now that SESC is more restrictively formulated, this also requires an
   update to the BCHAR production used in the ABNF syntax for byte
   string literals:

   bytes = [bsqual] %x27 *BCHAR %x27
   BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
   bsqual = "h" / "b64"

   The updated version explicitly allows "\'", which is no longer
   allowed in the updated SESC:

   BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / "\'" / CRLF

3.3.2.  Err6543

   The ABNF used in [RFC8610] for the content of byte string literals
   lumps together byte strings notated as text with byte strings notated
   in base16 (hex) or base64 (but see also the updated BCHAR production
   above):

   bytes = [bsqual] %x27 *BCHAR %x27
   BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF

   Errata report 6543 proposes to handle the two cases in separate
   productions (where, with an updated SESC, BCHAR obviously needs to be
   updated as above):

   bytes = %x27 *BCHAR %x27
         / bsqual %x27 *QCHAR %x27
   BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
   QCHAR = DIGIT / ALPHA / "+" / "/" / "-" / "_" / "=" / WS

   This potentially causes a subtle change, which is hidden in the WS
   production:

   WS = SP / NL
   SP = %x20
   NL = COMMENT / CRLF
   COMMENT = ";" *PCHAR CRLF
   PCHAR = %x20-7E / %x80-10FFFD
   CRLF = %x0A / %x0D.0A

   This allows any non-C0 character in a comment, so this fragment
   becomes possible:

   foo = h'
      43424F52  ; 'CBOR'
      0A        ; LF, but don't use CR!
   '

   The current text does not say unambiguously whether the three
   apostrophes need to be escaped with a "\" or not, as in:

   foo = h'
      43424F52  ; \'CBOR\'
      0A        ; LF, but don\'t use CR!
   '

   ... which would be supported by the existing ABNF in [RFC8610].

4.  Controls

   Controls are the main extension point of the CDDL language.  It is
   relatively painless to add controls to CDDL.  Several candidates have
   been identified that aren't quite ready for adoption, some of which
   are listed here.

4.1.  Control operator .pcre

   There are many variants of regular expression languages.
   Section 3.8.3 of [RFC8610] defines the .regexp control, which is
   based on XSD [XSD2] regular expressions.
   As discussed in that section, the most desirable form of regular
   expressions in many cases is the family called "Perl-Compatible
   Regular Expressions" ([PCRE]); however, no formally stable
   definition of PCRE is available at this time that could be
   referenced normatively from an RFC.

   The present document defines the control operator .pcre, which is
   similar to .regexp, but uses PCRE2 regular expressions.  More
   specifically, a ".pcre" control indicates that the text string given
   as a target needs to match the PCRE regular expression given as a
   value in the control type, where that regular expression is anchored
   on both sides.  (If anchoring is not desired for a side, ".*" needs
   to be inserted there.)

   Similarly, ".es2018re" could be defined for ECMAScript 2018 regular
   expressions with anchors added.

   See also [I-D.draft-bormann-jsonpath-iregexp], which could be
   specifically called out via ".iregexp" (even though ".regexp" as per
   Section 3.8.3 of [RFC8610] would also have the same semantics).

4.2.  Endianness in .bits

   How useful would it be to have another variant of .bits that counts
   bits as in RFC box notation?  (Or at least per-byte?  32-bit words
   don't always perfectly mesh with byte strings.)

4.3.  .bitfield control

   Provide a way to specify bitfields in byte strings and uints to a
   higher level of detail than is possible with .bits.  Strawman:

   Field = uint .bitfield Fieldbits

   Fieldbits = [
     flag1: [1, bool],
     val: [4, Vals],
     flag2: [1, bool],
   ]

   Vals = &(A: 0, B: 1, C: 2, D: 3)

   Note that the group within the controlling array can have choices,
   enabling the whole power of a context-free grammar (but not much
   more).

5.  Co-occurrence Constraints

   While there are no co-occurrence constraints in CDDL, many actual use
   cases can be addressed by using the fact that a group is a grammar:

   postal = {
     ( street: text,
       housenumber: text) //
     ( pobox: text .regexp "[0-9]+" )
   }

   However, constraints that are not just structural/tree-based but are
   predicates combining parts of the structure cannot be expressed:

   session = {
     timeout: uint,
   }

   other-session = {
     timeout: uint .lt [somehow refer to session.timeout],
   }

   As a minimum, this requires the ability to reach over to other parts
   of the tree in a control.  Compare JSON Pointer [RFC6901] and JSON
   Relative Pointer [I-D.handrews-relative-json-pointer].  Stefan
   Goessner's jsonpath is a JSON variant of XPath that has not been
   formally standardized [jsonpath].

   More generally, something akin to what Schematron is to Relax-NG may
   be needed.

6.  Module superstructure

   CDDL rules could be packaged as modules and referenced from other
   modules.  There could be some control of namespace pollution, as well
   as unambiguous referencing ("versioning").

   This is probably best achieved by a pragma-like syntax that could be
   carried in CDDL comments, leaving each module valid CDDL (albeit
   missing some rule definitions that are to be imported).

6.1.  Namespacing

   A convention for mapping CDDL-internal names to external ones could
   be developed, possibly steered by some pragma-like constructs.
   External names would likely be URI-based, with some conventions as
   they are used in RDF or CURIEs.  Internal names might look similar to
   XML QNames.  Note that the identifier character set for CDDL
   deliberately includes $ and @, which could be used in such a
   convention.

6.2.  Cross-universe references

   Often, a CDDL specification needs to import from specifications in a
   different language or platform.

6.2.1.  IANA references

   In many cases, CDDL specifications make use of values that are
   specified in IANA registries.  The ".iana" control operator can be
   used to reference such a set of values.

   The reference needs to be able to point to a draft, the registry of
   which has not been established yet, as well as to an established IANA
   registry.

   An example of such a usage might be:

   cose-algorithm = int .iana ["cose", "algorithms", "value"]

   Unfortunately, the vocabulary employed in IANA registries has not
   been designed for machine references.  In this case, the potential
   values would come from applying the XPath expression

   //iana:registry[@id='algorithms']/iana:record/iana:value

   to "https://www.iana.org/assignments/cose/cose.xml", plus some
   filtering on the records returned that only leaves actual
   allocations.  Additional functionality may be needed for filtering
   with respect to other columns of the registry record, e.g.,
   "" in the case of this example.

7.  Alternative Representations

   For CDDL, alternative representations, e.g., in JSON (and thus in
   YAML), could be defined, similar to the way YANG defines an XML-based
   serialization called YIN in Section 11 of [RFC6020].  One proposal
   for such a syntax is provided by the "cddlc" tool [cddlc]; this could
   be written up and agreed upon.

   cddlj = ["cddl", +rule]
   rule = ["=" / "/=" / "//=", namep, type]
   namep = ["name", id] / ["gen", id, +id]
   id = text .regexp "[A-Za-z@_$](([-.])*[A-Za-z0-9@_$])*"
   op = ".." / "..." /
        text .regexp "\\.[A-Za-z@_$](([-.])*[A-Za-z0-9@_$])*"
   namea = ["name", id] / ["gen", id, +type]
   type = value / namea / ["op", op, type, type] /
          ["map", group] / ["ary", group] / ["tcho", 2*type] /
          ["unwrap", namea] / ["enum", group / namea] /
          ["prim", ?(0..7, ?uint)]
   group = ["mem", null/type, type] /
           ["rep", uint, uint/false, group] /
           ["seq", 2*group] / ["gcho", 2*group]
   value = ["number"/"text"/"bytes", text]

8.  IANA Considerations

   This document makes no requests of IANA.

9.  Security considerations

   The security considerations of [RFC8610] apply.

10.  References

10.1.  Normative References

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019.

10.2.  Informative References

   [cddlc]    "CDDL conversion utilities", n.d.

   [Err6526]  "Errata Report 6526", RFC 8610.

   [Err6527]  "Errata Report 6527", RFC 8610.

   [Err6543]  "Errata Report 6543", RFC 8610.

   [I-D.draft-bormann-jsonpath-iregexp]
              Bormann, C., "I-Regexp: An Interoperable Regexp Format",
              Work in Progress, Internet-Draft,
              draft-bormann-jsonpath-iregexp-00, 12 May 2021.

   [I-D.handrews-relative-json-pointer]
              Luff, G. and H. Andrews, "Relative JSON Pointers", Work in
              Progress, Internet-Draft,
              draft-handrews-relative-json-pointer-02, 18 September
              2019.

   [I-D.ietf-core-coral]
              Hartke, K., "The Constrained RESTful Application Language
              (CoRAL)", Work in Progress, Internet-Draft,
              draft-ietf-core-coral-03, 9 March 2020.

   [jsonpath] "jsonpath online evaluator", n.d.

   [PCRE]     "Perl-compatible Regular Expressions (revised API:
              PCRE2)", n.d.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010.

   [RFC6901]  Bryan, P., Ed., Zyp, K., and M. Nottingham, Ed.,
              "JavaScript Object Notation (JSON) Pointer", RFC 6901,
              DOI 10.17487/RFC6901, April 2013.

   [XSD2]     Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes
              Second Edition", World Wide Web Consortium Recommendation
              REC-xmlschema-2-20041028, 28 October 2004.

Acknowledgements

   Many people have asked for CDDL to be completed, soon.  These are
   usually also the people who have brought up observations that led to
   the proposals discussed here.  Sean Leonard has campaigned for a
   regexp literal syntax.

Author's Address

   Carsten Bormann
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org