Network Working Group                                         C. Bormann
Internet-Draft                                   Universitaet Bremen TZI
Updates: 7049 (if approved)                                   S. Leonard
Intended status: Standards Track                           Penango, Inc.
Expires: January 29, 2017                                  July 28, 2016

  Concise Binary Object Representation (CBOR) Tags and Techniques for
   Object Identifiers, UUIDs, Enumerations, Binary Entities, Regular
                         Expressions, and Sets
                     draft-bormann-cbor-tags-oid-05

Abstract

   The Concise Binary Object Representation (CBOR, RFC 7049) is a data
   format whose design goals include the possibility of extremely small
   code size, fairly small message size, and extensibility without the
   need for version negotiation.

   Useful tags and techniques have emerged since the publication of RFC
   7049; the present document makes use of CBOR's built-in major types
   to define and refine several useful constructs, without changing the
   wire protocol. This document adds object identifiers (OIDs) to CBOR
   with CBOR tags <> and <> [values TBD]. It is intended as the
   reference document for the IANA registration of the CBOR tags so
   defined. Useful techniques for enumerations and sets are presented
   (without new tags). Because RFC 7049 left much unspecified about
   binary UUIDs (tag 37), MIME entities (tag 36), and regular
   expressions (tag 35), this document provides more comprehensive
   specifications.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 29, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Object Identifiers
   3.  Examples
   4.  Discussion
   5.  Diagnostic Notation
   6.  A New Arc for Concise OIDs
   7.  Tag Factoring and Tag Stacking with OID Arrays and Maps
   8.  Applications and Examples of OIDs
   9.  Universally Unique Identifiers in CBOR
   10. Enumerations in CBOR
   11. Binary Internet Messages and MIME Entities
   12. Applications and Examples of Messages and Entities
   13. X.690 Series Tags
   14. Regular Expression Clarification
   15. Set and Multiset Technique
   16. Fruits Basket Example
   17. IANA Considerations
   18. Security Considerations
   19. References
   Appendix A.  Changes from -03 to -04
   Appendix B.  Changes from -02 to -03
   Authors' Addresses

1.  Introduction

   The Concise Binary Object Representation (CBOR, [RFC7049]) provides
   for the interchange of structured data without a requirement for a
   pre-agreed schema. RFC 7049 defines a basic set of data types, as
   well as a tagging mechanism that enables extending the set of data
   types supported via an IANA registry.

   Useful tags and techniques have emerged since the publication of
   [RFC7049]. This document makes use of CBOR's built-in major types to
   provide for several useful constructs without changing the wire
   protocol.

   The original focus of this work was to add support for object
   identifiers (OIDs, [X.660]), which many IETF protocols carry. The
   ASN.1 Basic Encoding Rules (BER, [X.690]) specify the binary
   encodings of both object identifiers and relative object identifiers.
   The contents of these encodings can be carried in a CBOR byte string.
   This document defines two CBOR tags that cover the two kinds of ASN.1
   object identifiers encoded in this way. The tags can also be applied
   to arrays and maps for more articulated identification purposes. It
   is intended as the reference document for the IANA registration of
   the tags so defined. To promote the use and usefulness of OIDs in
   CBOR, a new arc is also proposed.

   This document covers several useful techniques that have been or are
   being developed as implementers are applying CBOR to practical
   problems. Enumerations have found wide utility in CBOR, despite
   CBOR's lack of a native enumerated type.
   A section covers the
   advantages of choosing built-in types, with additional consideration
   for using the newly-defined object identifier (OID) and universally
   unique identifier (UUID) types in enumerations. CBOR also lacks a
   native set type (in the mathematical sense of an arbitrary unordered
   collection of items), but has a more powerful alternative in its
   native map type. A section covers how to adapt the map type to
   express set and multiset semantics.

   Finally, this document covers the semantics of existing tags in
   [RFC7049] that were somewhat underspecified. "Tag 36 is for MIME
   messages", but the reference [RFC2045] actually defines a different
   construct, the MIME entity, that finds expression in a variety of
   message-oriented Internet protocols. Similarly, "Tag 35 is for
   regular expressions", but the references to Perl Compatible Regular
   Expressions (PCRE) and JavaScript syntax (ECMA-262) are not
   compatible with each other. Two sections cover the subtleties of
   items tagged with these tags, and so update [RFC7049] without
   changing the basic CBOR wire protocol. One section enhances UUIDs.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119].

   The terminology of RFC 7049 applies; in particular the term "byte" is
   used in its now customary sense as a synonym for "octet".

2.  Object Identifiers

   The International Object Identifier tree [X.660] is a hierarchically
   managed space of identifiers, each of which is uniquely represented
   as a sequence of primary integer values [X.680].
   While these
   sequences can easily be represented in CBOR arrays of unsigned
   integers, a more compact representation can often be achieved by
   adopting the widely used representation of object identifiers defined
   in BER; this representation may also be more amenable to processing
   by other software making use of object identifiers.

   BER represents the sequence of unsigned integers by concatenating
   self-delimiting [RFC6256] representations of each of the primary
   integer values in sequence.

   ASN.1 distinguishes absolute object identifiers (ASN.1 Type
   "OBJECT IDENTIFIER"), which begin at a root arc ([X.660] Clause
   3.5.21), from relative object identifiers (ASN.1 Type
   "RELATIVE-OID"), which begin relative to some object identifier known
   from context ([X.680] Clause 3.8.63). As a special optimization, BER
   combines the first two integers in an absolute object identifier into
   one numeric identifier by making use of the property of the hierarchy
   that the first arc has only three integer values (0, 1, and 2), and
   the second arcs under 0 and 1 are limited to the integer values
   between 0 and 39. (The root arc "joint-iso-itu-t(2)" has no such
   limitations on its second arc.) If X and Y are the first two
   integers, the single integer actually encoded is computed as:

      X * 40 + Y

   The inverse transformation (again making use of the known ranges of X
   and Y) is applied when decoding the object identifier.

   Since the semantics of absolute and relative object identifiers
   differ, this specification defines two tags:

   Tag <> (value TBD): tags a byte string as the [X.690] encoding of
   an absolute object identifier (simply "object identifier" or "OID").

   Tag <> (value TBD): tags a byte string as the [X.690] encoding of
   a relative object identifier (also "relative OID").

2.1.  Requirements on the byte string being tagged

   A byte string tagged by <> or <> MUST be a syntactically valid
   BER representation of an object identifier. Specifically:

   o  its first byte, and any byte that follows a byte that has the most
      significant bit unset, MUST NOT be 0x80 (this requirement excludes
      expressing the primary integer values with anything but the
      shortest form)

   o  its last byte MUST NOT have the most significant bit set (this
      requirement excludes an incomplete final primary integer value)

   If either of these invalid conditions is encountered, it MUST be
   treated as a decoding error. Comparing two OIDs or relative OIDs for
   equality in a byte-for-byte fashion may not be safe before these
   checks succeed on at least one of them (this includes the case where
   one of them is a local constant); a process implementing an exclusion
   list MUST check for decoding errors first.

   [X.680] restricts RELATIVE-OID values to have at least one arc. This
   specification permits empty relative object identifiers; they may
   still be excluded by application semantics.

   [RFC7049] permits byte strings to be indefinite-length, with chunks
   divided at arbitrary byte boundaries. This contrasts with text
   strings, where each chunk in an indefinite-length text string is
   required to be well-formed UTF-8 on its own: splitting the octets of
   a UTF-8 character encoding between chunks is not allowed.

   By analogy to this principle and to Clauses 8.9.1 and 8.20.1 of
   [X.690], the byte strings carrying the OIDs and relative OIDs are
   also to be treated as indivisible units: They MUST be encoded in
   definite-length form; indefinite-length form is treated as an
   encoding error (and the same considerations as above apply). (An
   added convenience is that CBOR encodings can be searched through
   efficiently for specific object identifiers without initiating the
   decoding process.)
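
   As an illustration only, the two byte-level requirements listed in
   this section can be checked with a short loop. The following Python
   sketch is not part of the specification; the function name
   is_valid_oid_bytes and its "relative" parameter are hypothetical,
   and the sketch assumes that an empty byte string is acceptable only
   for a relative OID.

```python
def is_valid_oid_bytes(data: bytes, relative: bool = False) -> bool:
    """Return True if data passes the byte-level checks for a tagged OID."""
    if not data:
        # Empty relative OIDs are permitted by this specification.
        # Assumption of this sketch: an empty absolute OID is rejected.
        return relative
    if data[-1] & 0x80:
        # Last byte has the most significant bit set:
        # the final primary integer value is incomplete.
        return False
    at_value_start = True  # the first byte starts a primary integer value
    for byte in data:
        if at_value_start and byte == 0x80:
            # A leading 0x80 is padding, so this is not the shortest form.
            return False
        # A byte with the most significant bit unset ends the current value,
        # so the next byte (if any) starts a new one.
        at_value_start = not (byte & 0x80)
    return True
```

   A decoder would run such a check before any byte-for-byte comparison
   of OIDs, and treat a False result as a decoding error.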

   We provide "binary regular expression" forms for implementation
   convenience. Unlike typical regular expressions that operate on
   character sequences, the following regular expressions take bytes as
   their domain, so they can be applied directly to CBOR byte strings.

   For byte strings with tag <>:

      "/^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])+$/"

   For byte strings with tag <>:

      "/^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])*$/"

   Putative CBOR data that fails these tests SHALL be rejected as
   improperly coded.

   Another (possibly more efficient) way to validate the byte strings is
   to hunt for prohibited patterns.

   For byte strings with tag <>:

      "/^$|(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/"

   or with lookbehind:

      "/^$|^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/"

   For byte strings with tag <>:

      "/(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/"

   or with lookbehind:

      "/^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/"

   Putative CBOR data that passes these tests SHALL be rejected as
   improperly coded.

   (It is worth pointing out that these tests, when optimally
   implemented, ought to be markedly faster than UTF-8 validation.)

3.  Examples

   In the following examples, we are using tag number 6 for <> and
   tag number 7 for <>. See Section 17.2.

3.1.  Encoding of the SHA-256 OID

   ASN.1 Value Notation
      { joint-iso-itu-t(2) country(16) us(840) organization(1) gov(101)
        csor(3) nistalgorithm(4) hashalgs(2) sha256(1) }

   Dotted Decimal Notation (also XML Value Notation)
      2.16.840.1.101.3.4.2.1

   06                                # UNIVERSAL TAG 6
      09                             # 9 bytes, primitive
         60 86 48 01 65 03 04 02 01  # X.690 Clause 8.19
   #     |   840  1  |  3  4  2  1     show component encoding
   #    2.16         101

                     Figure 1: SHA-256 OID in BER

   C6                                # 0b110_00110: mt 6, tag 6
      49                             # 0b010_01001: mt 2, 9 bytes
         60 86 48 01 65 03 04 02 01  # X.690 Clause 8.19

                     Figure 2: SHA-256 OID in CBOR

3.2.  Encoding of a UUID OID

   UUID
      8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b

   ASN.1 Value Notation
      { joint-iso-itu-t(2) uuid(25)
        geomicaGPAS(184830721219540099336690027854602552603) }

   Dotted Decimal Notation (also XML Value Notation)
      2.25.184830721219540099336690027854602552603

   06                                # UNIVERSAL TAG 6
      14                             # 20 bytes, primitive
         69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B
   #     |   184830721219540099336690027854602552603
   #    2.25

             Figure 3: UUID in an object identifier, in BER

   C6                                # 0b110_00110: mt 6, tag 6
      54                             # 0b010_10100: mt 2, 20 bytes
         69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B

             Figure 4: UUID in an object identifier, in CBOR

3.3.  Encoding of a MIB Relative OID

   Given some OID (e.g., "lowpanMib", assumed to be "1.3.6.1.2.1.226"
   [RFC7388]), to which the following is added:

   ASN.1 Value Notation (not suitable for diagnostic notation)
      { lowpanObjects(1) lowpanStats(1) lowpanOutTransmits(29) }

   Dotted Decimal Notation (diagnostic notation; see Section 5)
      .1.1.29

   0D                                # UNIVERSAL TAG 13
      03                             # 3 bytes, primitive
         01 01 1D                    # X.690 Clause 8.20
   #      1  1 29                      show component encoding

            Figure 5: MIB relative object identifier, in BER

   C7                                # 0b110_00111: mt 6, tag 7
      43                             # 0b010_00011: mt 2 (bstr), 3 bytes
         01 01 1D                    # X.690 Clause 8.20

            Figure 6: MIB relative object identifier, in CBOR

   This relative OID saves seven bytes compared to the full OID
   encoding.

4.  Discussion

   Staying close to the way object identifiers are encoded in ASN.1 BER
   makes back-and-forth translation easy. Object identifiers in IETF
   protocols are serialized in dotted decimal form or BER form, so there
   is an advantage in not inventing a third form. Also, expectations of
   the cost of encoding object identifiers are based on BER; using a
   different encoding might not be aligned with these expectations.
   If
   additional information about an OID is desired, lookup services such
   as the OID Resolution Service (ORS) [X.672] and the OID Repository
   [OID-INFO] are available.

   This specification allocates two numbers out of the single-byte tag
   space. This use of code point space is justified by the wide use of
   object identifiers in data interchange. For most common OIDs in use
   (namely those whose contents encode to less than 24 bytes), the CBOR
   encoding will match the efficiency of [X.690]. (This preliminary
   conclusion is likely to generate some discussion, see Section 17.2.)

5.  Diagnostic Notation

   Implementers will likely want to see OIDs and relative OIDs in their
   "natural forms" (as sequences of decimal unsigned integers) for
   diagnostic purposes. Accordingly, this section defines additional
   syntactic elements that can be used in conjunction with the
   diagnostic notation described in Section 6 of [RFC7049].

   An object identifier may be written in ASN.1 value notation (with
   enclosing braces and secondary identifiers, ObjectIdentifierValue of
   Clause 32.3 of [X.680]), or in dotted decimal notation with at least
   three arcs. Both examples are shown in Section 3. The surrounding
   tag notation is not to be used, because the tag is implied. The
   ASN.1 value notation for OIDs does not overlap with JSON object
   notation for CBOR maps, because at least two arcs are required for a
   valid OID.

   A relative object identifier may be written in dotted decimal
   notation or in ASN.1 value notation, in both cases prefixed with a
   dot as shown in Section 3.3. The surrounding tag notation is not to
   be used, because the tag is implied.

   The notation in this section may be employed in addition to the basic
   notation, which would be a tagged binary string.
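
   For illustration, a diagnostic tool could derive the dotted decimal
   form from the tagged byte string roughly as follows. This Python
   sketch is not part of the specification; the function name
   oid_to_dotted and its "relative" parameter are hypothetical, and the
   input is assumed to have already passed the validity checks of
   Section 2.1 (in particular, an absolute OID is assumed to be
   non-empty).

```python
def oid_to_dotted(data: bytes, relative: bool = False) -> str:
    """Render BER OID contents in the dotted decimal notation of Section 5."""
    arcs = []
    value = 0
    for byte in data:
        # Each byte contributes 7 bits; the most significant bit marks
        # that more bytes of the same primary integer value follow.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(value)
            value = 0
    if relative:
        # Relative OIDs are written with a leading dot before every arc.
        return "".join(f".{arc}" for arc in arcs)
    # Undo the X * 40 + Y combination of the first two arcs (Section 2).
    first = arcs[0]
    x = min(first // 40, 2)
    y = first - 40 * x
    return ".".join(str(arc) for arc in [x, y] + arcs[1:])
```

   With this sketch, oid_to_dotted(bytes.fromhex("2b0601")) yields
   "1.3.6.1", and oid_to_dotted(bytes.fromhex("0601"), relative=True)
   yields ".6.1".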

   +------------------------------+--------------+------------+
   | RFC 7049 diagnostic notation | 6(h'2b0601') | 7(h'0601') |
   +------------------------------+--------------+------------+
   | Dotted decimal notation      | 1.3.6.1      | .6.1       |
   | ASN.1 value notation         | {1 3 6 1}    | .{6 1}     |
   +------------------------------+--------------+------------+

           Table 1: Examples for extended diagnostic notation

6.  A New Arc for Concise OIDs

   Object identifiers in [X.690] form are remarkably compact.
   Nevertheless, for some applications (and engineers), they are simply
   not compact enough, at least when compared to certain alternatives
   such as very small unsigned integers (see Section 10). The shortest
   object identifier under the IETF's control is 1.3.6.1 (4 bytes),
   although no assignment directly under that arc has been made since
   1999 [RFC2506], and none has ever been assigned directly to a
   protocol element. The shortest IETF-controlled, First-Come,
   First-Served OID arc is 8 bytes, obtained by getting a Private
   Enterprise Number from IANA, which is assigned under 1.3.6.1.4.1.
   To promote object identifier usage in CBOR and to make OIDs as
   competitive as possible, (the authors / the IETF / ISOC) have
   secured a very short arc "{ x y z }" that only occupies (1, 2, 3)
   byte(s).

   [[NB: Registration procedures under that arc.]]

   The history of OIDs suggests that the human mind tends to excessive
   taxonomy around them. "Excessive taxonomy" means that while
   classifying purposes are served, the detailed taxonomy comes at the
   expense of concise encoding to the point that other implementers
   complain that the OIDs are "too long". OIDs also lose mnemonic
   properties when the arcs are so long that implementers cannot keep
   track of all of the divisions.
   Unlike assignments in the 1.3.6.1
   range, this document suggests that registrants acquire OIDs under
   this short arc "laterally" rather than hierarchically, in keeping
   with CBOR's design goal to have concise serializations.

7.  Tag Factoring and Tag Stacking with OID Arrays and Maps

   A common use of object identifiers in ASN.1 is to identify the kind
   of data in an open type (Clause 3.8.57 of [X.680]), using information
   object classes [X.681]. CBOR is schema-neutral, and (although not
   fully discussed in [RFC7049]) semantic tagging was originally
   intended to identify items in a global, context-free way (i.e., where
   a specification would not repurpose a tag with different semantics
   than its IANA registration). Therefore, using OIDs to identify
   contextual data in a similar fashion to [X.681] is RECOMMENDED.

7.1.  Tag Factoring

   <> and <> can tag CBOR arrays and maps. The idea is that the
   tag is factored out from each individual byte string; the tag is
   placed in front of the array or map instead. The tags <> and
   <> are left-distributive.

   When the <> or <> tag is applied to an array, it means that the
   respective tag is imputed to all items in the array. For example,
   when the array is tagged with <>, every array item that is a
   binary string is an OID.

   When the <> or <> tag is applied to a map, it means that the
   respective tag is imputed to all keys in the map. The values in the
   map are not considered specially tagged.

   Array and map stacking is permitted. For example, a 3-dimensional
   array of OIDs can be composed by using a single <> tag, followed
   by an array of arrays of arrays of binary strings. All such binary
   strings are considered OIDs.

7.2.  Switching OID and Relative OID

   If an individual item in a <> or <> tagged array, or an
   individual key in a <> or <> tagged map, is tagged with the
   opposite tag (<> or <>) of the array or map itself, that tag
   cancels and replaces the outer tag for that item. Like tags MUST NOT
   be used on such individual items; such tagging is a coding error.
   For example, if <> is the outer tag on an array and <> is the
   inner tag on a binary string, semantically the inner item is treated
   as a regular OID, not as a relative OID.

   The purpose is to create more compact and flexible identifier spaces,
   especially when object identifiers are used as enumerated items.
   Examples:

   <> outside, <> inside: An implementation that strives for a
   compact representation does not have to emit base OID arcs
   repeatedly for each item. At the same time, if a private
   organization or standards body separate from the specification needs
   to identify something that the specification maintainers disagree
   with, the separate body does not need to request registration of an
   identifier under a controlled arc (i.e., the base arc of the relative
   OIDs).

   <> outside, <> inside: A collection of OIDs is supposed to be
   open to all comers, but a certain set of OIDs issued under a
   particular arc is foreseeable for the majority of implementations.
   For example, an OID protocol slot may identify cryptographic
   algorithms: anyone can write (and has written) an algorithm with an
   arbitrary OID. However, the protocol slot designer may wish to
   privilege certain algorithms (and therefore OIDs) that are well-known
   in that field of use.

7.3.  Tag Stacking

   CBOR permits tag stacking (tagging a tagged item), although this
   technique has not been used much yet. This specification anticipates
   that OIDs and relative OIDs will be associated with values with
   uniform semantics.
   This section provides specific semantics when
   tags are "stacked", that is, a CBOR item starts with tag <> or
   <>, followed by one or more arbitrary tags ("subsequent tags"),
   followed by a map or array.

7.3.1.  Map

   The overall gist is that the first tag applies to the keys in a map;
   the subsequent tags apply to the values in a map.

   When <> or <> is the first tag in a stack of tags, followed by
   a map:

   o  The <> or <> tag indicates that the keys of the map are byte
      string OIDs, byte string relative OIDs, or tag-factored arrays or
      maps of the same.

   o  The subsequent tags uniformly apply to all of the values.

   For example, if tag 32 (URI) is the subsequent tag, then all values
   in the map are treated semantically as if tag 32 is applied to them
   individually. See Figure 7.

   It is possible that individual values can be tagged. Semantically,
   these tags cumulate with the outer subsequent tags; inner value tags
   do not cancel or replace the outer tags.

7.3.2.  Array

   The overall gist is that the first tag applies to the ordered "keys"
   in the array (even-numbered items, assuming that the index starts at
   0); the subsequent tags apply to the ordered "values" in the array
   (odd-numbered items). This tagging technique creates an ordered
   associative array. [[NB: Some call this the FORTRAN approach. need
   to cite]]

   When <> or <> is the first tag in a stack of tags, followed by
   an array:

   o  The <> or <> tag indicates that alternating items, starting
      with the first item, are byte string OIDs, byte string relative
      OIDs, or tag-factored arrays or maps of the same.

   o  The subsequent tags uniformly apply to the alternating items,
      starting with the second item.

   o  The array MUST have an even number of items; an array that has an
      odd number of items is a coding error.
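
   As a non-normative illustration, an application that has already
   decoded the array content into a Python list could recover the
   ordered key/value pairs as sketched below. The function name
   pair_alternating is hypothetical, and the sketch deliberately leaves
   out how the tags themselves are decoded and applied.

```python
def pair_alternating(items: list) -> list:
    """Split a decoded array into ordered (key, value) pairs.

    Items at even indices (starting at 0) are the "keys"; items at odd
    indices are the "values".
    """
    if len(items) % 2 != 0:
        # An array with an odd number of items is a coding error.
        raise ValueError("stacked-tag array must have an even number of items")
    return list(zip(items[0::2], items[1::2]))
```

   Unlike a map, the resulting list of pairs preserves the order of the
   entries and allows a key to occur more than once.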

   To create an ordered associative array wherein the values
   (odd-numbered items, counting from 0) are arbitrarily tagged, stack
   tag 55799, self-describe CBOR (Section 2.4.5 of [RFC7049]), after
   the <> or <> tag. Tag 55799 imparts no special semantics, so it is
   an effective placeholder. (This sequence is mainly provided for
   completeness: it is a more compact alternative to an array of
   duple-arrays that each contain an OID or relative OID, and an
   arbitrary value.)

7.4.  Diagnostic Notation for OID Arrays and Maps

   There are no syntactic changes to diagnostic notation beyond
   Section 5. Using <> or <> with arrays and maps, however, leads
   to some sublime results.

   When an array or map is tagged, that item is embraced with the usual
   tag format: "<>()" or "<>()". This syntax
   indicates the presence of the tag on the outer item. Inner items in
   the array or keys in the map are noted in Section 5 form, but are not
   individually tagged on-the-wire when the tag is the same as the outer
   tag, because like-tagging is a coding error.

   An array or map that involves a stack of tags is notated the usual
   way. For example, the CBOR diagnostic notation of a map of OIDs to
   URIs is:

   6(32({0.9.2342.7776.1: "http://example.com/",
         0.9.2342.7776.2: "ftp://ftp.example.com/pub/"}))

       Figure 7: Map of OIDs to URIs, in CBOR Diagnostic Notation

8.  Applications and Examples of OIDs

8.1.  GPU Farm

   Consider a 3-dimensional OID array, indicating certain operations to
   perform on a matrix of values in a GPU farm. Default operations are
   under the OID arc 0.9.2342.7777 (such as .1, .2, .124, etc.); the arc
   0.9.2342.7777 itself represents the identity operation. Certain
   cryptographic operations like SHA-256 hashing
   (2.16.840.1.101.3.4.2.1) are also permitted.
   The resulting notation
   would be:

   7([[[.1, .2, .3],
        [.1, .2, .3],
        [.1, .2, .3]],
       [[.124, .125, .126],
        [.95,  .96,  .97 ],
        [.11,  .12,  .13 ]],
       [[h'', .6, .4.2],
        [.6, h'', .4.2],
        [.6, 2.16.840.1.101.3.4.2.1, h'']]])

   Figure 8: GPU Farm Matrix Operations, in CBOR Diagnostic Notation

   c7                                     # tag(7)
      83                                  # array(3)
         83                               # array(3)
            83                            # array(3)
               41 01                      # .1 (2)
               41 02                      # .2 (2)
               41 03                      # .3 (2)
            83                            # array(3)
               41 01                      # .1 (2)
               41 02                      # .2 (2)
               41 03                      # .3 (2)
            83                            # array(3)
               41 01                      # .1 (2)
               41 02                      # .2 (2)
               41 03                      # .3 (2)
         83                               # array(3)
            83                            # array(3)
               41 7c                      # .124 (2)
               41 7d                      # .125 (2)
               41 7e                      # .126 (2)
            83                            # array(3)
               41 5f                      # .95 (2)
               41 60                      # .96 (2)
               41 61                      # .97 (2)
            83                            # array(3)
               41 0b                      # .11 (2)
               41 0c                      # .12 (2)
               41 0d                      # .13 (2)
         83                               # array(3)
            83                            # array(3)
               40                         # (empty) (1)
               41 06                      # .6 (2)
               42 0402                    # .4.2 (3)
            83                            # array(3)
               41 06                      # .6 (2)
               40                         # (empty) (1)
               42 0402                    # .4.2 (3)
            83                            # array(3)
               41 06                      # .6 (2)
               c6 49 608648016503040201   # 2.16.840.1.101.3.4.2.1 (11)
               40                         # (empty) (1)

       Figure 9: GPU Farm Matrix Operations, in CBOR (76 bytes)

8.2.  X.500 Distinguished Name

   Consider the X.500 distinguished name:

   +-------------------------------------------+------------------+
   | Attribute Types                           | Attribute Values |
   +-------------------------------------------+------------------+
   | c (2.5.4.6)                               | US               |
   +-------------------------------------------+------------------+
   | l (2.5.4.7)                               | Los Angeles      |
   | s (2.5.4.8)                               | CA               |
   | postalCode (2.5.4.17)                     | 90013            |
   +-------------------------------------------+------------------+
   | street (2.5.4.9)                          | 532 S Olive St   |
   +-------------------------------------------+------------------+
   | businessCategory (2.5.4.15)               | Public Park      |
   | buildingName (0.9.2342.19200300.100.1.48) | Pershing Square  |
   +-------------------------------------------+------------------+

               Table 2: Example X.500 Distinguished Name

   Table 2 has four RDNs. The country and street RDNs are
   single-valued. The second and fourth RDNs are multi-valued.
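
   For illustration, the attribute type OIDs of Table 2 can be turned
   into the byte strings carried in CBOR with a few lines of code. The
   following Python sketch is not part of the specification; the names
   encode_base128 and dotted_to_oid are hypothetical, and the sketch
   handles absolute OIDs in dotted decimal form only.

```python
def encode_base128(value: int) -> bytes:
    """Encode one primary integer value in shortest base-128 form."""
    out = [value & 0x7F]            # final byte: most significant bit unset
    value >>= 7
    while value:
        out.append(0x80 | (value & 0x7F))   # earlier bytes: bit set
        value >>= 7
    return bytes(reversed(out))


def dotted_to_oid(dotted: str) -> bytes:
    """Convert an absolute OID such as "2.5.4.6" to its BER contents."""
    arcs = [int(part) for part in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an absolute OID needs at least two arcs")
    x, y = arcs[0], arcs[1]
    if x > 2 or (x < 2 and y > 39):
        raise ValueError("first two arcs out of range")
    # The first two arcs are combined as X * 40 + Y (Section 2).
    values = [x * 40 + y] + arcs[2:]
    return b"".join(encode_base128(value) for value in values)
```

   For example, dotted_to_oid("2.5.4.6").hex() gives "550406", and
   dotted_to_oid("0.9.2342.19200300.100.1.48").hex() gives
   "0992268993f22c640130".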

   The equivalent representations in CBOR diagnostic notation and CBOR
   are:

   6([{ 2.5.4.6: "US" },
      { 2.5.4.7: "Los Angeles", 2.5.4.8: "CA", 2.5.4.17: "90013" },
      { 2.5.4.9: "532 S Olive St" },
      { 2.5.4.15: "Public Park",
        0.9.2342.19200300.100.1.48: "Pershing Square" }])

        Figure 10: Distinguished Name, in CBOR Diagnostic Notation

   6([{ h'550406': "US" },
      { h'550407': "Los Angeles", h'550408': "CA", h'550411': "90013" },
      { h'550409': "532 S Olive St" },
      { h'55040f': "Public Park",
        h'0992268993f22c640130': "Pershing Square" }])

    Figure 11: Distinguished Name, in CBOR Diagnostic Notation (RFC 7049
                                   only)

   c6                                   # tag(6)
      84                                # array(4)
         a1                             # map(1)
            43 550406                   # 2.5.4.6 (4)
            62                          # text(2)
               5553                     # "US"
         a3                             # map(3)
            43 550407                   # 2.5.4.7 (4)
            6b                          # text(11)
               4c6f7320416e67656c6573   # "Los Angeles"
            43 550408                   # 2.5.4.8 (4)
            62                          # text(2)
               4341                     # "CA"
            43 550411                   # 2.5.4.17 (4)
            65                          # text(5)
               3930303133               # "90013"
         a1                             # map(1)
            43 550409                   # 2.5.4.9 (4)
            6e                          # text(14)
               3533322053204f6c697665205374  # "532 S Olive St"
         a2                             # map(2)
            43 55040f                   # 2.5.4.15 (4)
            6b                          # text(11)
               5075626c6963205061726b   # "Public Park"
            4a 0992268993f22c640130     # 0.9.2342.19200300.100.1.48 (11)
            6f                          # text(15)
               5065727368696e6720537175617265  # "Pershing Square"

            Figure 12: Distinguished Name, in CBOR (108 bytes)

   (This example encoding assumes that all attribute values are UTF-8
   strings, or can be represented as UTF-8 strings with no loss of
   information.)

   For reference, the [RFC4514] LDAP string encoding of such data would
   be:

   buildingName=Pershing Square+businessCategory=Public Park,
   street=532 S Olive St,l=Los Angeles+postalCode=90013+st=CA,c=US

    Figure 13: Distinguished Name, in LDAP String Encoding (121 bytes)

9.  Universally Unique Identifiers in CBOR

   This section provides guidance on the Universally Unique Identifier
   (UUID) type, which was introduced into CBOR with tag <> (currently
   tag 37, reassignment to be discussed in view of this section).  A
   UUID [RFC4122] is 128 bits long and requires no central registration
   process.  UUIDs were originally used in the Apollo Network Computing
   System and later in the Open Software Foundation's (OSF) Distributed
   Computing Environment (DCE), for Remote Procedure Calls (RPC)
   [DCE-RPC].

   As a tagged binary string identifier type in CBOR, the UUID type
   shares several characteristics with OID types.  The main differences
   are that a UUID is always 16 bytes (anything less or more is a coding
   error), there is no central assignment process, and every 128-bit
   combination is valid.  ([RFC4122] calls out the nil UUID, which is
   special but perfectly valid.)  Optional registries have cropped up
   over the years; one such registry is [OID-INFO].  Users who use UUIDs
   in CBOR are strongly encouraged to document their UUIDs in such
   registries.

   To provide parity with OIDs, UUIDs MUST be encoded in definite-length
   form (see Section 2).  Consequently, individual UUIDs can be easily
   searched for by looking for "d8 25" (major type 6, tag 37), "50"
   (major type 2, additional information 16), and 16 bytes.  Therefore,
   a directly encoded UUID in CBOR occupies 19 bytes.  In contrast,
   stuffing a UUID in an OID in CBOR requires 22 bytes (see Figure 4);
   conversion between OID-UUID form and binary or string UUID forms
   requires bit-shifting (but mercifully not base-shifting, see
   Section 18.1).  An example based on Figure 4 is below:

   D8 25                          # tag(37)
      50                          # 0b010_10000: mt 2, 16 bytes
         8B 0D 1A 20 DC C5 11 D9 BD A9 00 02 A5 D5 C5 1B

                     Figure 14: Binary UUID in CBOR

9.1.  Diagnostic Notation

   Implementers will likely want to see UUIDs in their "natural forms"
   for diagnostic purposes.  Accordingly, this section defines
   additional syntactic elements that can be used in conjunction with
   the diagnostic notation described in Section 6 of [RFC7049].

   A universally unique identifier may be written in "string
   representation" as that term is defined in [RFC4122].  An example of
   such a string is "8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b" (see Figure 4
   and Figure 14).  Lowercase is the preferred form.  (TBD: permit,
   require, or prohibit curly brace form?)

   The notation in this section may be employed in addition to the basic
   notation, which would be a tagged binary string.

9.2.  Tag Factoring and Tag Stacking

   Tag Factoring and Tag Stacking are hereby permitted with the UUID
   type, with the same semantics as Section 7.

10.  Enumerations in CBOR

   This section provides a roadmap to using enumerated items in CBOR,
   including design considerations for choosing between OIDs, UUIDs,
   integers, and UTF-8 strings.

   CBOR does not have an ENUMERATED type like ASN.1 to identify named
   values in a protocol element with three or more states (Clause 20 and
   Clause G.2.3 of [X.680]).  ASN.1 ENUMERATED turns out to be
   superfluous because ASN.1 INTEGER values can get named (and have
   historically been used for finite, multistate variables, such as
   version numbers), while ASN.1 ENUMERATED types can be defined to be
   extensible with the ellipsis lexical item.  Practically, the named
   integers are not serialized in the binary encodings anyway; they
   merely serve as semantic hints for designers and debuggers.

   CBOR expects that protocol designers will use one of the basic major
   types for multistate variables, assigning semantics to particular
   values using higher-level schemas.
   The obvious choices for the basic types are integers (particularly
   unsigned integers) and UTF-8 strings.  However, these major types
   are not without drawbacks.

   Integers are compact for small values, but have a flat namespace so
   there are mis-assignment and collision risks that can only be
   mitigated with protocol-specific registries.  Arrays of integers are
   possible, but arrays require more processing logic for equality
   comparisons, and the JSON conversion is not intuitive when the
   enumerated value serves as a key in a map.

   UTF-8 strings are less compact when the strings are supposed to
   resemble their semantics, and there are normalization issues if the
   strings contain characters beyond the ASCII range.  UTF-8 strings
   also comprise a flat namespace like integers unless the higher-level
   schema employs delimiters, which makes the string even larger.  If
   conciseness is a design goal, other perceived advantages of a string
   as an identifier are pretty much blown out the moment one has to tack
   "https://" onto the front.

   This section provides novel alternatives in OIDs and UUIDs.  It
   compares and contrasts these binary types to other enumerants, namely
   integers and text (UTF-8) strings.

10.1.  Factors Favoring OID Enumerations

   A protocol designer might choose OIDs or relative OIDs for an
   enumerated item in view of the following observations:

   1.  OIDs and relative OIDs are quite compact: a single-arc relative
       OID encoded according to this specification occupies just two
       bytes for primary integer values 0-127 (excluding the semantic
       tag <>), and three bytes for primary integer values 128-16383.
       (In contrast, an unsigned integer requires one byte for 0-23, two
       bytes for 24-255, and three bytes for 256-65535.)

   2.  OIDs and relative OIDs (with base) are persistent and globally
       unambiguous.

   3.  OIDs and relative OIDs have built-in semantics for designers and
       debuggers.  Specifically, the advent of universal OID
       repositories such as [OID-INFO] makes it easy for a designer or
       debugger to pull up useful information about the object of
       interest (Clause 3.5.10 of [X.660]).  This useful information
       (for humans) does not have to bleed into the encoded
       representation (for machines).

   4.  OIDs and relative OIDs are always compared for exact equality: no
       need to deal with case folding, case sensitivity, or other
       normalization issues.  ("Overlong" encodings are PROHIBITED;
       therefore overlong encodings MUST be treated as coding errors.)

   5.  OIDs and relative OIDs have a built-in hierarchy, so if
       implementers want to extend an enumeration without assigning new
       values "horizontally", they have the option of assigning new
       values "vertically", possibly with more or less stringent
       assignment rules.

   6.  Because OIDs and relative OIDs (with base) are part of the
       so-called International Object Identifier tree [X.660], any other
       protocol specification can reuse the enumeration if the designers
       find it useful.

   7.  OIDs and relative OIDs have natural JSON representations in the
       dotted decimal notations prescribed in Section 5.  OIDs and
       relative OIDs can be distinguished from each other by the
       presence or absence of the leading dot ".".  As the resulting
       JSON string is entirely numeric in the ASCII range, case and
       normalization are irrelevant to the comparison.  (An object
       identifier also has a semantic string representation in the form
       of an OID-IRI [X.680], for those who really want that type of
       thing.)

   8.  OIDs and relative OIDs are human language-neutral.  A protocol
       designer working in US-English might name an enumerated value
       "sig" for "signature", but "sig" could also stand for
       "significand", "signal", or "special interest group".  In Swedish
       and Norwegian, "sig" is a pronoun that means "himself, herself,
       itself, one, them", etc.--an entirely different meaning.

10.2.  Factors Favoring UUID Enumerations

   A Universally Unique Identifier (UUID) is a 128-bit identifier that
   is unique across both space and time with a very high degree of
   probability; one intent is to identify "very persistent objects
   across a network", such as remote procedure call interfaces
   [DCE-RPC].

   A protocol designer might choose UUIDs for an enumerated item in view
   of the following observations:

   1.  UUIDs are always 16 bytes.  This means that while they are not
       particularly short, they also cannot be overly long.  Space is
       constant and predictable.  (As great as OIDs are, an OID that
       exceeds 17 bytes is simply excessive compared to a
       randomly-assigned UUID.)

   2.  Any 128-bit combination is a valid UUID.  The other types in this
       section have to be validated, even integers (e.g., to avoid
       overflow and out-of-range conditions).

   3.  There is no registration authority that serves as a roadblock,
       and (for all practical purposes) no semantic or aesthetic values
       are implied by lower bit combinations.

   4.  Many platforms can compare UUIDs (128-bit values) in one atomic
       operation.  The comparison can be done without regard to
       endianness, provided that the endianness is the same between two
       UUIDs in memory.  (On the wire, a CBOR UUID is big-endian.)  For
       this reason, UUIDs may be faster than (naive) integer
       enumerations.

   5.  UUIDs have natural JSON representations in the string
       representations prescribed by [RFC4122].  The resulting JSON
       strings are entirely in the ASCII range and occupy exactly 36
       characters; however, normalization (to lowercase) is a
       complicating factor.

   6.  UUIDs are human language-neutral.  (However, unlike OIDs, UUIDs
       are too long to be described as mnemonic in any practical sense.)

10.3.  Factors Favoring Integer Enumerations

   A protocol designer might choose integers for an enumerated item in
   view of the following observations:

   1.  The CBOR encoding of unsigned integers 0-23 is the most compact,
       occupying exactly one byte (excluding any semantic tags).

   2.  A protocol designer may wish to prohibit extensibility as a
       matter of course.  Integers comprise a single flat namespace:
       there is no hierarchy.

   3.  If greater range is desired while sticking to one byte, a
       protocol designer may double the range of possible values by
       allowing negative integers.  However, enumerating values using
       negative integers may have unintended side-effects, because some
       programming environments (e.g., C/C++) make
       implementation-defined assumptions about the number of bits
       needed for an enumerated type.

10.4.  Factors Favoring UTF-8 String Enumerations

   A protocol designer might choose UTF-8 strings for an enumerated item
   in view of the following observations:

   1.  A specification can practically limit the content of UTF-8
       strings to the ASCII range (or narrower), mitigating some
       normalization problems.

   2.  UTF-8 strings are easier to read on-the-wire for humans.

   3.  UTF-8 strings can contain arbitrary textual identifiers, which
       can be hierarchical, e.g., URIs.

10.5.  OID Enumeration Example

   An enumerated item indicates the revision level of a data format.
   Revision levels are issued by year, such as 2011, 2012, etc.
   However, in the year 2013, two revisions were issued: the first one
   and an important update in June that needs to be distinguished.
   The revision levels are assigned to some OID arc:

   "{2 25 6464646464 revs(4)}"

   In this arc, the following sub-arcs are assigned:

                          +--------------------+
                          | Sub-Arc            |
                          +--------------------+
                          | {v2011(1)}         |
                          | {v2012(2)}         |
                          | {v2013(3)}         |
                          | {v2013(3) june(6)} |
                          | {v2014(4)}         |
                          | {v2015(5)}         |
                          +--------------------+

                        Table 3: Example Sub-Arcs

   In CBOR, the enumeration is encoded as a relative OID.  The schema
   specifies the base OID arc, which is omitted:

   c7                                 # tag(7)
      41 03                           # .3

   c7                                 # tag(7)
      42 0306                         # .3.6

                   Figure 15: Enumerated Items in CBOR

   .3
   .{v2013(3) june(6)}

         Figure 16: Enumerated Items in CBOR Diagnostic Notation

   ".3"
   ".3.6"

           Figure 17: Enumerated Items in JSON (possibility 1)

   "v2013"
   "v2013/june"

           Figure 18: Enumerated Items in JSON (possibility 2)

11.  Binary Internet Messages and MIME Entities

   Section 2.4.4.3 of [RFC7049] assigns tag 36 to "MIME messages
   (including all headers)" [RFC2045], and prescribes UTF-8 strings,
   without further elaboration.  In fact, MIME encompasses several
   different formats, and is not limited to UTF-8 strings.  This section
   updates tag 36.

11.1.  CBOR Byte String and Binary MIME

   Tag 36 is to be used with byte strings.  When the tagged item is a
   byte string, any octet can be used in the content.  Arbitrary octets
   are supported by [RFC2045] and can be supported in protocols such as
   SMTP using BINARYMIME [RFC3030].

   A conforming implementation that purports to process tag 36-tagged
   items MUST accept byte strings as well as UTF-8 strings.  Byte
   strings, rather than UTF-8 strings, SHOULD be considered the default.
   (While binary Content-Transfer-Encoding is not particularly common as
   of this writing, 8-bit encoding is, and it is foreseeable that many
   8-bit encoded messages will still have charsets other than UTF-8.)

11.2.  Internet Messages, MIME Messages, and MIME Entities

   Definitions: "MIME message" is not explicitly defined in [RFC2045],
   but a careful read suggests that a MIME message is: "either a
   (complete or "top-level") RFC 822 message being transferred on a
   network, or a message encapsulated in a body of type "message/rfc822"
   or "message/partial"," that also contains MIME header fields, namely
   the MIME-Version field, which MUST be present (Section 4 of
   [RFC2045]).  Other MIME header fields such as Content-Type and
   Content-Transfer-Encoding are assumed to have their [RFC2045] default
   values, if not present in the data.

   When the contents have a From field (a type of "originator address
   field") and a Date field (the lone "origination date field")
   (Section 3.6 of [RFC5322]), the item is concluded to have a
   Content-Type of message/rfc822 or message/global, as appropriate,
   except as otherwise specified in this section.

   (TBD: Do we need a separate tag for a MIME entity?)  (Alternate
   proposal: When the tagged data does not include a MIME-Version field
   or other fields required by RFC822 (5322) (e.g., no From field), it
   is presumed to be a MIME entity, rather than a MIME message.
   Therefore, it has no top-level content-type: instead it is simply a
   "MIME entity", consisting of one element, whose Content-Type is the
   content of the Content-Type header field, if present, or the
   [RFC2045] default of "text/plain; charset=us-ascii", if absent.
   Content-Transfer-Encoding SHALL be assumed to be 8bit when the CBOR
   item is a UTF-8 string, and SHALL be assumed to be binary when the
   CBOR item is a byte string.  (Or should all be considered CTE:
   binary?)  And, when the tagged data has RFC822 required fields but no
   MIME-Version, shall we assume it's a MIME entity, or shall we assume
   it's an Internet message that does not conform to MIME?)
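
   The From-plus-Date rule above can be sketched in Python using the
   standard library's "email.parser.BytesHeaderParser".  This is only
   an illustration under stated assumptions: the function name
   "classify_tag36" and its return values are hypothetical, and because
   the treatment of content without a MIME-Version field is still
   marked TBD above, the sketch merely reports whether that field is
   present rather than acting on it.

```python
from email.parser import BytesHeaderParser


def classify_tag36(data: bytes) -> tuple[str, bool]:
    """Classify the content of a tag 36 item (hypothetical helper).

    Returns (kind, has_mime_version), where kind is "message" when both
    a From and a Date header field are present, and "entity" otherwise.
    """
    # Parse only the header section; the body may hold arbitrary octets.
    headers = BytesHeaderParser().parsebytes(data)

    # Header field lookup in the email package is case-insensitive.
    has_from = headers["From"] is not None
    has_date = headers["Date"] is not None
    has_mime_version = headers["MIME-Version"] is not None

    kind = "message" if (has_from and has_date) else "entity"
    return kind, has_mime_version


if __name__ == "__main__":
    sample = (b"From: a@example.com\r\n"
              b"Date: Thu, 28 Jul 2016 12:00:00 +0000\r\n"
              b"Subject: Hi\r\n\r\nHello World!\r\n")
    print(classify_tag36(sample))
```

   A receiver would still need the additional checks of Sections 11.3
   and 11.4 before settling on a media type; the sketch covers only the
   header-presence test.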

   Content that has no headers whatsoever is valid, and implementations
   that process tag 36 MUST permit this case: in such a case, the data
   starts with CRLF CRLF, followed by the body.  In such a case, the
   content is assumed to be a MIME entity of Content-Type "text/plain;
   charset=us-ascii", and not an RFC822 (RFC5322) Internet message.
   (TBD: Confirm.)

11.3.  Netnews, HTTP, and SIP Messages

   Other message types that are MIME-related are message/news,
   message/http, and message/sip.

   [RFC5537] specifies that message/news is deprecated (marked as
   obsolete) and that message/rfc822 SHOULD be used in its place;
   presumably this also extends to message/global over time.  Netnews
   Article Format [RFC5536] is a strict subset of Internet Message
   Format; it can be detected by the presence of the six mandatory
   header fields: Date, From, Message-ID, Newsgroups, Path, and Subject.
   (Newsgroups and Path fields are specific to Netnews.)

   message/http [RFC7230] is the media type for HTTP requests and
   responses.  It can be detected by analyzing the first line of the
   body, which is an HTTP Start Line (Section 3.1 of [RFC7230]): it does
   not conform to the syntax of an Internet Message Format header field.
   The optional parameter "msgtype" can be inferred from the Start Line.
   Implementers need to be aware that the default character encoding for
   message/http is ISO-8859-1, not UTF-8.  Therefore, implementations
   SHOULD NOT encode HTTP messages with CBOR UTF-8 strings.

   Similarly, message/sip [RFC3261] is the media type of SIP request and
   response messages.  It can be detected by analyzing the first line of
   the body, which is a SIP start-line (Section 7.1 of [RFC3261]): it
   does not conform to the syntax of an Internet Message Format header
   field.  The optional parameter can be inferred from the start-line.

11.4.  Other Messages

   The CBOR binary or UTF-8 string MAY contain other types of messages.
   An implementation MAY send such a message as a MIME entity with the
   Content-Type field appropriately set, or alternatively, MAY send the
   message at the top-level directly.  However, if a purported message
   type is ambiguous with a message/rfc822 (or message/global) message,
   a receiver SHALL treat the message as message/rfc822 (or
   message/global).  If a purported message type is ambiguous with a
   MIME entity (and unambiguously not message/rfc822 or message/global),
   a receiver SHALL treat the message as a MIME entity.

12.  Applications and Examples of Messages and Entities

   Tag 36 is the RECOMMENDED way to convey data with MIME-related
   metadata, including messages (which may or may not actually be
   MIME-enabled) and MIME entities.

   Example 1: A legacy RFC822 message is encoded as a UTF-8 string or
   byte string with tag 36.  The contents have From, To, Date, and
   Subject header fields, two CRLFs, and a single line "Hello World!",
   terminated with a CRLF.

   Example 2a: A [RFC5280] certificate is encoded as a byte string with
   tag 36.  The contents consist of "Content-Type:
   application/pkix-cert", two CRLFs, and the DER encoding of the
   certificate.  (The "Content-Transfer-Encoding: binary" header is not
   necessary.)

   Example 2b: A [RFC5280] certificate is encoded as a UTF-8 string or
   byte string with tag 36.  The contents consist of "Content-Type:
   application/pkix-cert", a CRLF, "Content-Transfer-Encoding: base64",
   two CRLFs, and the base64 encoding of the DER encoding of the
   certificate, conforming to Section 6.8 of [RFC2045].  In particular,
   base64 lines are limited to 76 characters, separated by CRLF, and the
   final line is supposed to end with CRLF.  Needless to say, this is
   not nearly as efficient as Example 2a.

13.  X.690 Series Tags

   [[NB: Carsten probably won't like this.  Plan on removing this
   section.  It is mainly provided to contrast with Section 10.]]

   It is foreseeable that CBOR applications will need to send and
   receive ASN.1 data, for example, for legacy or security applications.
   While a native representation in CBOR is preferred, preserving the
   data in an ASN.1 encoding may be necessary, for example, to preserve
   cryptographic verification.  A tag <> is allocated for this
   purpose.

   When the tagged item is a byte string, the byte string contents are
   encoded according to [X.690], i.e., BER, CER, or DER.  CBOR
   implementations are not required to validate conformance of the
   contained data to [X.690].

   When the tagged item is an array with 3 items:

   1.  The first item SHALL be an OID (with tag <> omitted; it SHALL
       NOT be a relative OID), indicating the ASN.1 module containing
       the type of the PDU.  [[NB: this is a good example of a
       non-trivial structure in which an element is well-defined to be
       an OID, which has a tag.  Is the CBOR philosophy to tag the item,
       or omit the tag on the item, when the item's semantics are
       already fixed by the outer tag?  Similar situations can apply to
       tag 32 (URI), etc.]]

   2.  The second item SHALL be a UTF-8 string indicating the ASN.1
       value's _type reference name_ (Clause 3.8.88 of [X.680])
       conforming to the "typereference" production (Clause 12.2 of
       [X.680]).

   3.  The third item SHALL be a byte string, whose contents are encoded
       per the prior paragraph.

   (TBD: Use of tagged UTF-8 string is reserved for ASN.1 textual
   formats such as XER and ASN.1 value notation?  Probably not
   necessary.  Just omit.)

   Implementation note: DER-encoded items are always definite-length, so
   there is very little reason to use CBOR byte string indefinite
   encoding when encoding such DER-encoded items.
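
   The implementation note can be illustrated with a short Python
   sketch: because a DER item announces its own length in its header,
   an encoder can emit the definite-length CBOR byte string head before
   copying any content.  The helper names "der_total_length" and
   "cbor_bstr_head" are hypothetical, and the sketch assumes a single
   identifier octet (low tag numbers only).

```python
def der_total_length(der: bytes) -> int:
    """Total size (header plus content) of the DER item at the start of
    `der`.  Hypothetical helper; handles single-octet identifiers only.
    """
    if len(der) < 2 or der[0] & 0x1F == 0x1F:
        raise ValueError("truncated or multi-octet identifier")
    first = der[1]
    if first < 0x80:                      # short form: length in one octet
        return 2 + first
    n = first & 0x7F                      # long form: n length octets follow
    if n == 0:
        raise ValueError("indefinite length is not permitted in DER")
    if len(der) < 2 + n:
        raise ValueError("truncated length octets")
    return 2 + n + int.from_bytes(der[2:2 + n], "big")


def cbor_bstr_head(length: int) -> bytes:
    """Definite-length CBOR head for a byte string (major type 2)."""
    if length < 24:
        return bytes([0x40 | length])
    if length < 0x100:
        return bytes([0x58, length])
    if length < 0x10000:
        return b"\x59" + length.to_bytes(2, "big")
    if length < 0x100000000:
        return b"\x5a" + length.to_bytes(4, "big")
    return b"\x5b" + length.to_bytes(8, "big")


if __name__ == "__main__":
    der = bytes.fromhex("3003020105")     # SEQUENCE { INTEGER 5 }
    total = der_total_length(der)
    print((cbor_bstr_head(total) + der[:total]).hex())
```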

   Example: A [RFC5280] certificate can be encoded:

   1.  as a byte string with tag <>, or

   2.  as an array with tag <>, with three elements:

       (1)  a byte string "h'2B 06 01 05 05 07 00 12'", which is the BER
            encoding of 1.3.6.1.5.5.7.0.18,

       (2)  a UTF-8 string "Certificate", and

       (3)  a byte string containing the DER encoding of the
            certificate.

14.  Regular Expression Clarification

   (TODO: better specify conformance to actual regular expression
   standards with tag 35.  PCRE and JavaScript/ECMAScript regular
   expressions are very different; [RFC7049] is not specific enough
   about this.)

15.  Set and Multiset Technique

   CBOR has no native type for a set, which is an arbitrary unordered
   collection of items.  The following technique is RECOMMENDED to
   express set and multiset semantics concisely in native CBOR data.

   In computer science, a _set_ is a collection of distinct items; there
   is no ordering to the items.  Thus, implementations can optimize set
   storage in many ways that are not available with ordered elements in
   arrays.  Sets can be stored in hashtables, bit fields, trees, or
   other abstract data types.

   In computer science, a _multiset_ allows multiple instances of a
   set's elements.  Put another way, each distinct item has a
   cardinality property indicating the number of these items in the
   multiset.

   To store items in a set or multiset, it is RECOMMENDED to store the
   CBOR items as keys in a map; the values SHALL all be positive
   integers (major type 0, value/additional information greater than or
   equal to 1).  In the special case of a set, the values SHALL be the
   integer 1.  This technique has no special tag associated with it.  As
   with arrays that schemas classify as "records" (i.e., arrays with
   positionally defined elements), schemas are likewise free to classify
   maps as sets in particular instances.

16.  Fruits Basket Example

   Consider a basket of fruits.  The basket can contain any number of
   fruits; each fruit of the same species is considered identical.  This
   basket has two apples, four bananas, six pears, and one pineapple:

   {"\u{1F34E}": 2, "\u{1F34C}": 4,
    "\u{1F350}": 6, "\u{1F34D}": 1}

          Figure 19: Fruits Basket in CBOR Diagnostic Notation

   A4                                 # map(4)
      64                              # text(4)
         f09f8d8e                     # "\u{1F34E}"
      02                              # unsigned(2)
      64                              # text(4)
         f09f8d8c                     # "\u{1F34C}"
      04                              # unsigned(4)
      64                              # text(4)
         f09f8d90                     # "\u{1F350}"
      06                              # unsigned(6)
      64                              # text(4)
         f09f8d8d                     # "\u{1F34D}"
      01                              # unsigned(1)

               Figure 20: Fruits Basket in CBOR (25 bytes)

   [[TODO: Consider a Merkle Tree example: set of sets of sets of sets
   of things. ???]]

17.  IANA Considerations

   (This section to be edited by the RFC editor.)

17.1.  CBOR Tags

   IANA is requested to assign the CBOR tags in Table 4, with the
   present document as the specification reference.

   +----------+-------------+------------------------------------------+
   | Tag      | Data Item   | Semantics                                |
   +----------+-------------+------------------------------------------+
   | 6<>      | multiple    | object identifier (BER encoding)         |
   | 7<>      | multiple    | relative object identifier (BER          |
   |          |             | encoding)                                |
   +----------+-------------+------------------------------------------+

                      Table 4: Values for New Tags

17.2.  Discussion

   (This subsection to be removed by the RFC editor.)

   The space for single-byte tags in CBOR (0..23) is severely limited.
   It is not clear that the benefits of encoding OIDs/relative OIDs with
   one less byte per instance outweigh the consumption of two values in
   this code point space.

   Procedurally, this space is also reserved for standards action.

   An alternative would be to go for the specification required space,
   e.g. tag number 40 for <> and tag number 41 for <>.
   As an example this would change Figure 2 into:

   d8 28                              # tag(40)
      49                              # bytes(9)
         60 86 48 01 65 03 04 02 01   #

    Figure 21: SHA-256 OID in CBOR (using specification required tag)

17.3.  Pre-Existing Tags

   (TODO: complete.)  IANA is requested to modify the registrations for
   the following CBOR tags:

            +-----+-------------+----------------------------+
            | Tag | Data Item   | Semantics                  |
            +-----+-------------+----------------------------+
            | 35  | <>          | regular expression <>      |
            | 36  | multiple    | message or MIME entity     |
            | 37  | multiple    | binary UUID                |
            +-----+-------------+----------------------------+

                    Table 5: Values for Existing Tags

17.4.  New Tags

   (TODO: complete.)

18.  Security Considerations

   The security considerations of RFC 7049 apply.

   The encodings in Clauses 8.19 and 8.20 of [X.690] are extremely
   compact and unambiguous, but MUST be followed precisely to avoid
   security pitfalls.  In particular, the requirements set out in
   Section 2.1 of this document need to be followed; otherwise, an
   attacker may be able to subvert a checking process by submitting
   alternative representations that are later taken as the original (or
   even something else entirely) by another decoder supposed to be
   protected by the checking process.

   OIDs and relative OIDs can always be treated as opaque byte strings.
   Actually understanding the structure that was used for generating
   them is not necessary, and, except for checking the structure
   requirements, it is strongly NOT RECOMMENDED to perform any
   processing of this kind (e.g., converting into dotted notation and
   back) unless absolutely necessary.  If the OIDs are translated into
   other representations, the usual security considerations for
   non-trivial representation conversions apply; the primary integer
   values are unlimited in range (cf. Figure 4).

18.1.  Conversions Between BER and Dotted Decimal Notation

   [PKILCAKE] uncovers exploit vectors for the illegal values above, as
   well as for cases in which conversion to or from the dotted decimal
   notation goes awry.  Neither [X.660] nor [X.680] places an upper
   bound on the range of unsigned integer values for an arc; the
   integers are arbitrarily valued.  An implementation SHOULD NOT
   attempt to convert each component using a fixed-size accumulator, as
   an attacker will certainly be able to cause the accumulator to
   overflow.  Compact and efficient techniques for such conversions,
   such as the double dabble algorithm [DOUBLEDABBLE], are well-known in
   the art; their application to this field is left as an exercise to
   the reader.

19.  References

19.1.  Normative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002.

   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
              Unique IDentifier (UUID) URN Namespace", RFC 4122,
              DOI 10.17487/RFC4122, July 2005.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              DOI 10.17487/RFC5322, October 2008.

   [RFC5536]  Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews
              Article Format", RFC 5536, DOI 10.17487/RFC5536, November
              2009.

   [RFC5537]  Allbery, R., Ed. and C. Lindsey, "Netnews Architecture and
              Protocols", RFC 5537, DOI 10.17487/RFC5537, November 2009.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [X.660]    International Telecommunications Union, "Information
              technology -- Procedures for the operation of object
              identifier registration authorities: General procedures
              and top arcs of the international object identifier tree",
              ITU-T Recommendation X.660, July 2011.

   [X.680]    International Telecommunications Union, "Information
              technology -- Abstract Syntax Notation One (ASN.1):
              Specification of basic notation", ITU-T Recommendation
              X.680, August 2015.

   [X.690]    International Telecommunications Union, "Information
              technology -- ASN.1 encoding rules: Specification of Basic
              Encoding Rules (BER), Canonical Encoding Rules (CER) and
              Distinguished Encoding Rules (DER)", ITU-T Recommendation
              X.690, August 2015.

19.2.  Informative References

   [DCE-RPC]  Open Group CAE, "DCE: Remote Procedure Call",
              Specification C309, ISBN 1-85912-041-5, August 1994.

   [DOUBLEDABBLE]
              Gao, S., Al-Khalili, D., and N. Chabini, "An improved BCD
              adder using 6-LUT FPGAs", IEEE 10th International New
              Circuits and Systems Conference (NEWCAS 2012), pp. 13-16,
              DOI: 10.1109/NEWCAS.2012.6328944, June 2012.

   [OID-INFO]
              Orange SA, "OID Repository", 2016.

   [PKILCAKE]
              Kaminsky, D., Patterson, M., and L. Sassaman, "PKI Layer
              Cake: New Collision Attacks Against the Global X.509
              Infrastructure", FC 2010, Lecture Notes in Computer
              Science 6052 289-303, DOI: 10.1007/978-3-642-14577-3_22,
              January 2010.

   [RFC2506]  Holtman, K., Mutz, A., and T. Hardie, "Media Feature Tag
              Registration Procedure", BCP 31, RFC 2506,
              DOI 10.17487/RFC2506, March 1999.

   [RFC3030]  Vaudreuil, G., "SMTP Service Extensions for Transmission
              of Large and Binary MIME Messages", RFC 3030,
              DOI 10.17487/RFC3030, December 2000.

   [RFC4514]  Zeilenga, K., Ed., "Lightweight Directory Access Protocol
              (LDAP): String Representation of Distinguished Names",
              RFC 4514, DOI 10.17487/RFC4514, June 2006.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008.

   [RFC6256]  Eddy, W. and E. Davies, "Using Self-Delimiting Numeric
              Values in Protocols", RFC 6256, DOI 10.17487/RFC6256, May
              2011.

   [RFC7388]  Schoenwaelder, J., Sehgal, A., Tsou, T., and C. Zhou,
              "Definition of Managed Objects for IPv6 over Low-Power
              Wireless Personal Area Networks (6LoWPANs)", RFC 7388,
              DOI 10.17487/RFC7388, October 2014.

   [X.672]    International Telecommunications Union, "Information
              technology -- Open systems interconnection -- Object
              identifier resolution system", ITU-T Recommendation X.672,
              August 2010.

   [X.681]    International Telecommunications Union, "Information
              technology -- Abstract Syntax Notation One (ASN.1):
              Information object specification", ITU-T Recommendation
              X.681, August 2015.

Appendix A.  Changes from -03 to -04

   Changes occurred based on limited feedback, mainly centered around
   the abstract and introduction, rather than substantive technical
   changes.  These changes include:

   o  Changed the title so that it is about tags and techniques.

   o  Rewrote the abstract to describe the content more accurately, and
      to point out that no changes to the wire protocol are being
      proposed.

   o  Removed "ASN.1" from "object identifiers", as OIDs are independent
      of ASN.1.

   o  Rewrote the introduction to be more about the present text.

   o  Proposed a concise OID arc.

   o  Provided binary regular expression forms for OID validation.

   o  Updated IANA registration tables.

Appendix B.  Changes from -02 to -03

   Many significant changes occurred in this version.  These changes
   include:

   o  Expanded the draft scope to be a comprehensive CBOR update.

   o  Added OID-related sections: OID Enumerations, OID Maps and Arrays,
      and Applications and Examples of OIDs.

   o  Added Tag 36 update (binary MIME, better definitions).

   o  Added stub/experimental sections for X.690 Series Tags (tag <>)
      and Regular Expressions (tag 35).

   o  Added technique for representing sets and multisets.

   o  Added references and fixed typos.

Authors' Addresses

   Carsten Bormann
   Universitaet Bremen TZI
   Postfach 330440
   Bremen  D-28359
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org


   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   Email: dev+ietf@seantek.com
   URI:   http://www.penango.com/