Network Working Group                                         C. Bormann
Internet-Draft                                   Universitaet Bremen TZI
Updates: 7049 (if approved)                                   S. Leonard
Intended status: Standards Track                           Penango, Inc.
Expires: September 14, 2017                               March 13, 2017


  Concise Binary Object Representation (CBOR) Tags and Techniques for
   Object Identifiers, UUIDs, Enumerations, Binary Entities, Regular
                         Expressions, and Sets
                    draft-bormann-cbor-tags-oid-06

Abstract

   The Concise Binary Object Representation (CBOR, RFC 7049) is a data
   format whose design goals include the possibility of extremely small
   code size, fairly small message size, and extensibility without the
   need for version negotiation.

   Useful tags and techniques have emerged since the publication of RFC
   7049; the present document makes use of CBOR's built-in major types
   to define and refine several useful constructs, without changing the
   wire protocol.  This document adds object identifiers (OIDs) to CBOR
   with CBOR tags <> and <> [values TBD].  It is intended as the
   reference document for the IANA registration of the CBOR tags so
   defined.  Useful techniques for enumerations and sets are presented
   (without new tags).  As RFC 7049 left much out of the documentation
   for binary UUIDs (tag 37), MIME entities (tag 36), and regular
   expressions (tag 35), this document provides more comprehensive
   specifications for them.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Object Identifiers
   3.  Examples
   4.  Discussion
   5.  Diagnostic Notation
   6.  A New Arc for Concise OIDs
   7.  Tag Factoring and Tag Stacking with OID Arrays and Maps
   8.  Applications and Examples of OIDs
   9.  Universally Unique Identifiers in CBOR
   10. Enumerations in CBOR
   11. Binary Internet Messages and MIME Entities
   12. Applications and Examples of Messages and Entities
   13. X.690 Series Tags
   14. Regular Expression Clarification
   15. Set and Multiset Technique
   16. Fruits Basket Example
   17. IANA Considerations
   18. Security Considerations
   19. References
   Appendix A.  Changes from -05 to -06
   Appendix B.  Changes from -04 to -05
   Appendix C.  Changes from -03 to -04
   Appendix D.  Changes from -02 to -03
   Authors' Addresses

1.  Introduction

   The Concise Binary Object Representation (CBOR, [RFC7049]) provides
   for the interchange of structured data without a requirement for a
   pre-agreed schema.  RFC 7049 defines a basic set of data types, as
   well as a tagging mechanism that enables extending the set of data
   types supported via an IANA registry.

   Useful tags and techniques have emerged since the publication of
   [RFC7049].  This document makes use of CBOR's built-in major types to
   provide for several useful constructs without changing the wire
   protocol.

   The original focus of this work was to add support for object
   identifiers (OIDs, [X.660]), which many IETF protocols carry.  The
   ASN.1 Basic Encoding Rules (BER, [X.690]) specify the binary
   encodings of both object identifiers and relative object identifiers.
   The contents of these encodings can be carried in a CBOR byte string.
   This document defines two CBOR tags that cover the two kinds of ASN.1
   object identifiers encoded in this way.  The tags can also be applied
   to arrays and maps for more articulated identification purposes.
   This document is intended as the reference document for the IANA
   registration of the tags so defined.  To promote the use and
   usefulness of OIDs in CBOR, a new arc is also proposed.
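   As an illustration of the encoding just described, the following
   Python sketch assembles the BER contents octets of an absolute or
   relative OID from its arcs, combining the first two arcs as
   X * 40 + Y (Section 2) and writing every subidentifier as base-128
   digits with the high bit set on all but the last byte.  It then
   wraps the result in a CBOR tag and a definite-length byte string.
   The tag numbers 6 and 7 are only the placeholder values used in the
   examples of Section 3, and all function names are hypothetical; this
   is a sketch, not a normative implementation.

```python
def encode_subidentifier(value: int) -> bytes:
    """One BER subidentifier: base-128 digits, most significant first,
    high bit set on every byte except the last (shortest form)."""
    if value < 0:
        raise ValueError("arcs must be non-negative")
    groups = [value & 0x7F]                    # final group: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # leading groups: high bit set
        value >>= 7
    return bytes(reversed(groups))


def ber_oid_contents(arcs: list[int], relative: bool = False) -> bytes:
    """BER contents octets of an OID (X.690 8.19) or relative OID (8.20)."""
    if relative:
        subids = list(arcs)                    # no first-arc combination
    else:
        if len(arcs) < 2:
            raise ValueError("an absolute OID needs at least two arcs")
        x, y = arcs[0], arcs[1]
        if x not in (0, 1, 2) or y < 0 or (x < 2 and y > 39):
            raise ValueError("invalid first two arcs")
        subids = [x * 40 + y, *arcs[2:]]
    return b"".join(encode_subidentifier(a) for a in subids)


def cbor_head(major: int, argument: int) -> bytes:
    """CBOR initial byte plus argument, shortest form (RFC 7049, 2.1)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument too large")


def cbor_tagged_oid(arcs: list[int], relative: bool = False,
                    oid_tag: int = 6, rel_oid_tag: int = 7) -> bytes:
    """Tag (major type 6) followed by a definite-length byte string."""
    content = ber_oid_contents(arcs, relative)
    tag = rel_oid_tag if relative else oid_tag
    return cbor_head(6, tag) + cbor_head(2, len(content)) + content


sha256 = [2, 16, 840, 1, 101, 3, 4, 2, 1]
print(cbor_tagged_oid(sha256).hex(" "))
# c6 49 60 86 48 01 65 03 04 02 01
print(cbor_tagged_oid([1, 1, 29], relative=True).hex(" "))
# c7 43 01 01 1d
```

   The two printed encodings correspond to the SHA-256 OID and the MIB
   relative OID worked through in Section 3.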
   This document covers several useful techniques that have been or are
   being developed as implementers are applying CBOR to practical
   problems.  Enumerations have found wide utility in CBOR, despite
   CBOR's lack of a native enumerated type.  A section covers the
   advantages of choosing built-in types, with additional consideration
   for using the newly-defined object identifier (OID) and universally
   unique identifier (UUID) types in enumerations.  CBOR also lacks a
   native set type (in the mathematical sense of an arbitrary unordered
   collection of items), but has a more powerful alternative in its
   native map type.  A section covers how to adapt the map type to
   express set and multiset semantics.

   Finally, this document covers the semantics of existing tags in
   [RFC7049] that were somewhat underspecified.  "Tag 36 is for MIME
   messages", but the reference [RFC2045] actually defines a different
   construct, the MIME entity, that finds expression in a variety of
   message-oriented Internet protocols.  Similarly, "Tag 35 is for
   regular expressions", but the references to Perl Compatible Regular
   Expressions (PCRE) and JavaScript syntax (ECMA-262) are not
   compatible with each other.  Two sections cover the subtleties of
   items tagged with these tags, and so update [RFC7049] without
   changing the basic CBOR wire protocol.  One section enhances UUIDs.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119].

   The terminology of RFC 7049 applies; in particular the term "byte" is
   used in its now customary sense as a synonym for "octet".

2.  Object Identifiers

   The International Object Identifier tree [X.660] is a hierarchically
   managed space of identifiers, each of which is uniquely represented
   as a sequence of primary integer values [X.680].  While these
   sequences can easily be represented in CBOR arrays of unsigned
   integers, a more compact representation can often be achieved by
   adopting the widely used representation of object identifiers defined
   in BER; this representation may also be more amenable to processing
   by other software making use of object identifiers.

   BER represents the sequence of unsigned integers by concatenating
   self-delimiting [RFC6256] representations of each of the primary
   integer values in sequence: each value is written as base-128 digits,
   most significant first, with the most significant bit set in every
   byte except the last one of that value.

   ASN.1 distinguishes absolute object identifiers (ASN.1 Type "OBJECT
   IDENTIFIER"), which begin at a root arc ([X.660] Clause 3.5.21), from
   relative object identifiers (ASN.1 Type "RELATIVE-OID"), which begin
   relative to some object identifier known from context ([X.680] Clause
   3.8.63).  As a special optimization, BER combines the first two
   integers in an absolute object identifier into one numeric identifier
   by making use of the property of the hierarchy that the first arc has
   only three integer values (0, 1, and 2), and the second arcs under 0
   and 1 are limited to the integer values between 0 and 39.  (The root
   arc "joint-iso-itu-t(2)" has no such limitations on its second arc.)
   If X and Y are the first two integers, the single integer actually
   encoded is computed as:

      X * 40 + Y

   The inverse transformation (again making use of the known ranges of X
   and Y) is applied when decoding the object identifier: values below
   40 yield X = 0, values from 40 to 79 yield X = 1, and values of 80 or
   more yield X = 2 with Y = value - 80.

   Since the semantics of absolute and relative object identifiers
   differ, this specification defines two tags:

   Tag <> (value TBD):  tags a byte string as the [X.690] encoding of
      an absolute object identifier (simply "object identifier" or
      "OID").
   Tag <> (value TBD):  tags a byte string as the [X.690] encoding of
      a relative object identifier (also "relative OID").

2.1.  Requirements on the byte string being tagged

   A byte string tagged by <> or <> MUST be a syntactically valid
   BER representation of an object identifier.  Specifically:

   o  its first byte, and any byte that follows a byte that has the most
      significant bit unset, MUST NOT be 0x80 (this requirement excludes
      expressing the primary integer values with anything but the
      shortest form)

   o  its last byte MUST NOT have the most significant bit set (this
      requirement excludes an incomplete final primary integer value)

   If either of these invalid conditions is encountered, it MUST be
   treated as a decoding error.  Comparing two OIDs or relative OIDs for
   equality in a byte-for-byte fashion may not be safe before these
   checks succeed on at least one of them (this includes the case where
   one of them is a local constant); a process implementing an exclusion
   list MUST check for decoding errors first.

   [X.680] restricts RELATIVE-OID values to have at least one arc.  This
   specification permits empty relative object identifiers; they may
   still be excluded by application semantics.

   [RFC7049] permits byte strings to be indefinite-length, with chunks
   divided at arbitrary byte boundaries.  This contrasts with text
   strings, where each chunk in an indefinite-length text string is
   required to be well-formed UTF-8 on its own: splitting the octets of
   a UTF-8 character encoding between chunks is not allowed.

   By analogy to this principle and to Clauses 8.19.1 and 8.20.1 of
   [X.690], the byte strings carrying the OIDs and relative OIDs are
   also to be treated as indivisible units: They MUST be encoded in
   definite-length form; indefinite-length form is treated as an
   encoding error (and the same considerations as above apply).  (An
   added convenience is that CBOR encodings can be searched through
   efficiently for specific object identifiers without initiating the
   decoding process.)

   We provide "binary regular expression" forms for implementation
   convenience.  Unlike typical regular expressions that operate on
   character sequences, the following regular expressions take bytes as
   their domain, so they can be applied directly to CBOR byte strings.

   For byte strings with tag <>:

      "/^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])+$/"

   For byte strings with tag <>:

      "/^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])*$/"

   Putative CBOR data that fails these tests SHALL be rejected as
   improperly coded.

   Another (possibly more efficient) way to validate the byte strings is
   to hunt for prohibited patterns.

   For byte strings with tag <>:

      "/^$|(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/"

   or with lookbehind:

      "/^$|^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/"

   For byte strings with tag <>:

      "/(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/"

   or with lookbehind:

      "/^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/"

   Putative CBOR data in which any of these prohibited patterns is found
   SHALL be rejected as improperly coded.

   (It is worth pointing out that these tests, when optimally
   implemented, ought to be markedly faster than UTF-8 validation.)

3.  Examples

   In the following examples, we are using tag number 6 for <> and
   tag number 7 for <>.  See Section 17.2.

3.1.  Encoding of the SHA-256 OID

   ASN.1 Value Notation
     { joint-iso-itu-t(2) country(16) us(840) organization(1) gov(101)
       csor(3) nistalgorithm(4) hashalgs(2) sha256(1) }

   Dotted Decimal Notation (also XML Value Notation)
     2.16.840.1.101.3.4.2.1

   06                            # UNIVERSAL TAG 6
   09                            # 9 bytes, primitive
   60 86 48 01 65 03 04 02 01    # X.690 Clause 8.19
#  |   840   1 |   3  4  2  1      show component encoding
# 2.16         101

                     Figure 1: SHA-256 OID in BER

   C6                            # 0b110_00110: mt 6, tag 6
   49                            # 0b010_01001: mt 2, 9 bytes
   60 86 48 01 65 03 04 02 01    # X.690 Clause 8.19

                     Figure 2: SHA-256 OID in CBOR

3.2.  Encoding of a UUID OID

   UUID
     8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b

   ASN.1 Value Notation
     { joint-iso-itu-t(2) uuid(25)
       geomicaGPAS(184830721219540099336690027854602552603) }

   Dotted Decimal Notation (also XML Value Notation)
     2.25.184830721219540099336690027854602552603

   06                            # UNIVERSAL TAG 6
   14                            # 20 bytes, primitive
   69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B
#  |  184830721219540099336690027854602552603
# 2.25

            Figure 3: UUID in an object identifier, in BER

   C6                            # 0b110_00110: mt 6, tag 6
   54                            # 0b010_10100: mt 2, 20 bytes
   69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B

            Figure 4: UUID in an object identifier, in CBOR

3.3.  Encoding of a MIB Relative OID

   Given some OID (e.g., "lowpanMib", assumed to be "1.3.6.1.2.1.226"
   [RFC7388]), to which the following is added:

   ASN.1 Value Notation (not suitable for diagnostic notation)
     { lowpanObjects(1) lowpanStats(1) lowpanOutTransmits(29) }

   Dotted Decimal Notation (diagnostic notation; see Section 5)
     .1.1.29

   0D                            # UNIVERSAL TAG 13
   03                            # 3 bytes, primitive
   01 01 1D                      # X.690 Clause 8.20
#   1  1 29                        show component encoding

           Figure 5: MIB relative object identifier, in BER

   C7                            # 0b110_00111: mt 6, tag 7
   43                            # 0b010_00011: mt 2 (bstr), 3 bytes
   01 01 1D                      # X.690 Clause 8.20

           Figure 6: MIB relative object identifier, in CBOR

   This relative OID saves seven bytes compared to the full OID
   encoding: the contents of the absolute OID 1.3.6.1.2.1.226.1.1.29
   take 10 bytes, while the relative OID takes only 3.

4.  Discussion

   Staying close to the way object identifiers are encoded in ASN.1 BER
   makes back-and-forth translation easy.  Object identifiers in IETF
   protocols are serialized in dotted decimal form or BER form, so there
   is an advantage in not inventing a third form.  Also, expectations of
   the cost of encoding object identifiers are based on BER; using a
   different encoding might not be aligned with these expectations.  If
   additional information about an OID is desired, lookup services such
   as the OID Resolution Service (ORS) [X.672] and the OID Repository
   [OID-INFO] are available.

   This specification allocates two numbers out of the single-byte tag
   space.  This use of code point space is justified by the wide use of
   object identifiers in data interchange.  For most common OIDs in use
   (namely those whose contents encode to fewer than 24 bytes), the CBOR
   encoding will match the efficiency of [X.690]: in both cases, the
   contents are preceded by exactly one identifying byte and one length
   byte.  (This preliminary conclusion is likely to generate some
   discussion, see Section 17.2.)

5.  Diagnostic Notation

   Implementers will likely want to see OIDs and relative OIDs in their
   "natural forms" (as sequences of decimal unsigned integers) for
   diagnostic purposes.  Accordingly, this section defines additional
   syntactic elements that can be used in conjunction with the
   diagnostic notation described in Section 6 of [RFC7049].

   An object identifier may be written in ASN.1 value notation (with
   enclosing braces and secondary identifiers, ObjectIdentifierValue of
   Clause 32.3 of [X.680]), or in dotted decimal notation with at least
   three arcs.  Examples of both forms are shown in Section 3.  The
   surrounding tag notation is not to be used, because the tag is
   implied.  The ASN.1 value notation for OIDs does not overlap with
   JSON object notation for CBOR maps, because at least two arcs are
   required for a valid OID.

   A relative object identifier may be written in dotted decimal
   notation or in ASN.1 value notation, in both cases prefixed with a
   dot as shown in Section 3.3.  The surrounding tag notation is not to
   be used, because the tag is implied.

   The notation in this section may be employed in addition to the basic
   notation, which would be a tagged binary string.

   +------------------------------+--------------+--------------+
   | Notation                     | OID          | Relative OID |
   +------------------------------+--------------+--------------+
   | RFC 7049 diagnostic notation | 6(h'2b0601') | 7(h'0601')   |
   | Dotted decimal notation      | 1.3.6.1      | .6.1         |
   | ASN.1 value notation         | {1 3 6 1}    | .{6 1}       |
   +------------------------------+--------------+--------------+

          Table 1: Examples for extended diagnostic notation

6.  A New Arc for Concise OIDs

   Object identifiers in [X.690] form are remarkably compact.
   Nevertheless, for some applications (and engineers), they are simply
   not compact enough, at least when compared to certain alternatives
   such as very small unsigned integers (see Section 10).
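   The size comparison above can be made concrete with a little
   arithmetic.  The following Python sketch counts the BER contents
   bytes of an absolute OID (one byte per started group of 7 bits in
   each subidentifier) and adds the CBOR tag and byte string heads.
   For instance, an OID one arc below 1.3.6.1 (here 1.3.6.1.4) has 4
   contents bytes, and an OID under a Private Enterprise Number such as
   32473 (the PEN reserved for documentation use) has 8, while an
   unsigned integer from 0 to 23 occupies a single byte.  The function
   names are hypothetical, and tag 6 is only the placeholder value used
   in Section 3.

```python
def subid_len(value: int) -> int:
    """Bytes in one BER subidentifier: one per started 7-bit group."""
    return max(1, -(-value.bit_length() // 7))


def oid_contents_len(arcs: list[int]) -> int:
    """Length of the BER contents octets of an absolute OID."""
    first = arcs[0] * 40 + arcs[1]   # first two arcs share one subidentifier
    return subid_len(first) + sum(subid_len(a) for a in arcs[2:])


def cbor_head_len(argument: int) -> int:
    """Size of a CBOR initial byte plus any argument bytes."""
    if argument < 24:
        return 1
    for extra in (1, 2, 4, 8):
        if argument < 1 << (8 * extra):
            return 1 + extra
    raise ValueError("argument too large")


def tagged_oid_len(arcs: list[int], tag: int = 6) -> int:
    """Total CBOR size: tag head + byte string head + contents."""
    n = oid_contents_len(arcs)
    return cbor_head_len(tag) + cbor_head_len(n) + n


examples = {
    "1.3.6.1.4": [1, 3, 6, 1, 4],
    "1.3.6.1.4.1.32473": [1, 3, 6, 1, 4, 1, 32473],
    "2.16.840.1.101.3.4.2.1": [2, 16, 840, 1, 101, 3, 4, 2, 1],
}
for name, arcs in examples.items():
    print(f"{name}: {oid_contents_len(arcs)} contents bytes, "
          f"{tagged_oid_len(arcs)} bytes as tagged CBOR")
print("unsigned integer 0..23:", cbor_head_len(5), "byte")
```

   Every OID therefore carries at least two bytes of CBOR overhead on
   top of its contents, which is the gap a very short arc tries to
   narrow.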
   The shortest object identifier under the IETF's control is 1.3.6.1
   (4 bytes), although no assignment directly under that arc has been
   made since 1999 [RFC2506], and no assignment directly under that arc
   has ever been made for a protocol element.  The shortest
   IETF-controlled OID arc that can be obtained on a First-Come,
   First-Served basis is 8 bytes: it results from getting a Private
   Enterprise Number from IANA, for which an OID is assigned under
   1.3.6.1.4.1.  To promote object identifier usage in CBOR and to make
   OIDs as competitive as possible, (the authors / the IETF / ISOC) have
   secured a very short arc "{ x y z }" that only occupies (1, 2, 3)
   byte(s).

   [[NB: Registration procedures under that arc.]]

   The history of OIDs suggests that the human mind tends toward
   excessive taxonomy around them.  "Excessive taxonomy" means that
   while classifying purposes are served, the detailed taxonomy comes at
   the expense of concise encoding, to the point that other implementers
   complain that the OIDs are "too long".  OIDs also lose mnemonic
   properties when the arcs are so long that implementers cannot keep
   track of all of the divisions.  Unlike assignments in the 1.3.6.1
   range, this document suggests that registrants acquire OIDs under
   this short arc "laterally" rather than hierarchically, in keeping
   with CBOR's design goal to have concise serializations.

7.  Tag Factoring and Tag Stacking with OID Arrays and Maps

   A common use of object identifiers in ASN.1 is to identify the kind
   of data in an open type (Clause 3.8.57 of [X.680]), using information
   object classes [X.681].  CBOR is schema-neutral, and (although not
   fully discussed in [RFC7049]) semantic tagging was originally
   intended to identify items in a global, context-free way (i.e., where
   a specification would not repurpose a tag with different semantics
   than its IANA registration).
   Therefore, using OIDs to identify contextual data in a similar
   fashion to [X.681] is RECOMMENDED.

7.1.  Tag Factoring

   <> and <> can tag CBOR arrays and maps.  The idea is that the
   tag is factored out from each individual byte string; the tag is
   placed in front of the array or map instead.  The tags <> and
   <> are left-distributive.

   When the <> or <> tag is applied to an array, it means that the
   respective tag is imputed to all items in the array.  For example,
   when the array is tagged with <>, every array item that is a
   binary string is an OID.

   When the <> or <> tag is applied to a map, it means that the
   respective tag is imputed to all keys in the map.  The values in the
   map are not considered specially tagged.

   Array and map stacking is permitted.  For example, a 3-dimensional
   array of OIDs can be composed by using a single <> tag, followed
   by an array of arrays of arrays of binary strings.  All such binary
   strings are considered OIDs.

7.2.  Switching OID and Relative OID

   If an individual item in a <> or <> tagged array, or an
   individual key in a <> or <> tagged map, is tagged with the
   opposite tag (<> or <>) of the array or map itself, that tag
   cancels and replaces the outer tag for that item.  A tag identical
   to the outer tag MUST NOT be used on such individual items; such
   tagging is a coding error.  For example, if <> is the outer tag on
   an array and <> is the inner tag on a binary string, semantically
   the inner item is treated as a regular OID, not as a relative OID.

   The purpose is to create more compact and flexible identifier spaces,
   especially when object identifiers are used as enumerated items.
   Examples:

   <> outside, <> inside:  An implementation that strives for a
      compact representation does not have to emit base OID arcs
      repeatedly for each item.
      At the same time, if a private organization or standards body
      separate from the specification needs to identify something that
      the specification maintainers disagree with, the separate body does
      not need to request registration of an identifier under a
      controlled arc (i.e., the base arc of the relative OIDs).

   <> outside, <> inside:  A collection of OIDs is supposed to be
      open to all comers, but a certain set of OIDs issued under a
      particular arc is foreseeable for the majority of implementations.
      For example, an OID protocol slot may identify cryptographic
      algorithms: anyone can write (and has written) an algorithm with an
      arbitrary OID.  However, the protocol slot designer may wish to
      privilege certain algorithms (and therefore OIDs) that are
      well-known in that field of use.

7.3.  Tag Stacking

   CBOR permits tag stacking (tagging a tagged item), although this
   technique has not been used much yet.  This specification anticipates
   that OIDs and relative OIDs will be associated with values with
   uniform semantics.  This section provides specific semantics when
   tags are "stacked", that is, a CBOR item starts with tag <> or
   <>, followed by one or more arbitrary tags ("subsequent tags"),
   followed by a map or array.

7.3.1.  Map

   The overall gist is that the first tag applies to the keys in a map;
   the subsequent tags apply to the values in a map.

   When <> or <> is the first tag in a stack of tags, followed by
   a map:

   o  The <> or <> tag indicates that the keys of the map are byte
      string OIDs, byte string relative OIDs, or tag-factored arrays or
      maps of the same.

   o  The subsequent tags uniformly apply to all of the values.

   For example, if tag 32 (URL) is the subsequent tag, then all values
   in the map are treated semantically as if tag 32 is applied to them
   individually.  See Figure 7.

   Individual values can also be tagged.
   Semantically, these tags cumulate with the outer subsequent tags;
   inner value tags do not cancel or replace the outer tags.

7.3.2.  Array

   The overall gist is that the first tag applies to the ordered "keys"
   in the array (even-numbered items, assuming that the index starts at
   0); the subsequent tags apply to the ordered "values" in the array
   (odd-numbered items).  This tagging technique creates an ordered
   associative array.  [[NB: Some call this the FORTRAN approach.  need
   to cite]]

   When <> or <> is the first tag in a stack of tags, followed by
   an array:

   o  The <> or <> tag indicates that alternating items, starting
      with the first item, are byte string OIDs, byte string relative
      OIDs, or tag-factored arrays or maps of the same.

   o  The subsequent tags uniformly apply to the alternating items,
      starting with the second item.

   o  The array MUST have an even number of items; an array that has an
      odd number of items is a coding error.

   To create an ordered associative array wherein the values
   (odd-numbered items) are arbitrarily tagged, stack tag 55799,
   Self-Describe CBOR (Section 2.4.5 of [RFC7049]), after the <> or
   <> tag.  Tag 55799 imparts no special semantics, so it is an
   effective placeholder.  (This sequence is mainly provided for
   completeness: it is a more compact alternative to an array of
   two-element arrays that each contain an OID or relative OID, and an
   arbitrary value.)

7.4.  Diagnostic Notation for OID Arrays and Maps

   There are no syntactic changes to diagnostic notation beyond
   Section 5.  Using <> or <> with arrays and maps, however, leads
   to some subtle results.

   When an array or map is tagged, that item is embraced with the usual
   tag format: "<>()" or "<>()".  This syntax
   indicates the presence of the tag on the outer item.
   Inner items in the array or keys in the map are noted in Section 5
   form, but are not individually tagged on-the-wire when the tag is the
   same as the outer tag, because like-tagging is a coding error.

   An array or map that involves a stack of tags is notated the usual
   way.  For example, the CBOR diagnostic notation of a map of OIDs to
   URIs is:

   6(32({0.9.2342.7776.1: "http://example.com/",
         0.9.2342.7776.2: "ftp://ftp.example.com/pub/"}))

       Figure 7: Map of OIDs to URIs, in CBOR Diagnostic Notation

8.  Applications and Examples of OIDs

8.1.  GPU Farm

   Consider a 3-dimensional OID array, indicating certain operations to
   perform on a matrix of values in a GPU farm.  Default operations are
   under the OID arc 0.9.2342.7777 (such as .1, .2, .124, etc.); the arc
   0.9.2342.7777 itself represents the identity operation.  Certain
   cryptographic operations like SHA-256 hashing
   (2.16.840.1.101.3.4.2.1) are also permitted.  The resulting notation
   would be:

   7([[[.1, .2, .3],
       [.1, .2, .3],
       [.1, .2, .3]],
      [[.124, .125, .126],
       [.95, .96, .97 ],
       [.11, .12, .13 ]],
      [[h'', .6, .4.2],
       [.6, h'', .4.2],
       [.6, 2.16.840.1.101.3.4.2.1, h'']]])

    Figure 8: GPU Farm Matrix Operations, in CBOR Diagnostic Notation

   c7                                # tag(7)
     83                              # array(3)
       83                            # array(3)
         83                          # array(3)
           41 01                     # .1 (2)
           41 02                     # .2 (2)
           41 03                     # .3 (2)
         83                          # array(3)
           41 01                     # .1 (2)
           41 02                     # .2 (2)
           41 03                     # .3 (2)
         83                          # array(3)
           41 01                     # .1 (2)
           41 02                     # .2 (2)
           41 03                     # .3 (2)
       83                            # array(3)
         83                          # array(3)
           41 7c                     # .124 (2)
           41 7d                     # .125 (2)
           41 7e                     # .126 (2)
         83                          # array(3)
           41 5f                     # .95 (2)
           41 60                     # .96 (2)
           41 61                     # .97 (2)
         83                          # array(3)
           41 0b                     # .11 (2)
           41 0c                     # .12 (2)
           41 0d                     # .13 (2)
       83                            # array(3)
         83                          # array(3)
           40                        # (empty) (1)
           41 06                     # .6 (2)
           42 0402                   # .4.2 (3)
         83                          # array(3)
           41 06                     # .6 (2)
           40                        # (empty) (1)
           42 0402                   # .4.2 (3)
         83                          # array(3)
           41 06                     # .6 (2)
           c6 49 608648016503040201  # 2.16.840.1.101.3.4.2.1 (10)
           40                        # (empty) (1)

        Figure 9: GPU Farm Matrix Operations, in CBOR (76 bytes)

8.2.  X.500 Distinguished Name

   Consider the X.500 distinguished name:

   +----------------------------------------------+--------------------+
   | Attribute Types                              | Attribute Values   |
   +----------------------------------------------+--------------------+
   | c (2.5.4.6)                                  | US                 |
   +----------------------------------------------+--------------------+
   | l (2.5.4.7)                                  | Los Angeles        |
   | s (2.5.4.8)                                  | CA                 |
   | postalCode (2.5.4.17)                        | 90013              |
   +----------------------------------------------+--------------------+
   | street (2.5.4.9)                             | 532 S Olive St     |
   +----------------------------------------------+--------------------+
   | businessCategory (2.5.4.15)                  | Public Park        |
   | buildingName (0.9.2342.19200300.100.1.48)    | Pershing Square    |
   +----------------------------------------------+--------------------+

              Table 2: Example X.500 Distinguished Name

   Table 2 has four RDNs.  The country and street RDNs are
   single-valued.  The second and fourth RDNs are multi-valued.
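   A minimal CBOR encoder is enough to reproduce the encoding of this
   distinguished name byte for byte.  The Python sketch below handles
   only the major types this example needs (it is not a general CBOR
   library), writes the attribute-type keys as the BER contents octets
   used in the RFC 7049 diagnostic form of this example, and factors the
   OID tag out to the outer array as described in Section 7.1.  The
   helper names are hypothetical, tag 6 is only the placeholder value
   from Section 3, and, as in the text that follows, all attribute
   values are assumed to be UTF-8 strings.

```python
def head(major: int, argument: int) -> bytes:
    """CBOR initial byte plus argument, shortest form (RFC 7049, 2.1)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode(item) -> bytes:
    """Encode the small subset of CBOR used in this example."""
    if isinstance(item, bytes):      # major type 2: byte string
        return head(2, len(item)) + item
    if isinstance(item, str):        # major type 3: UTF-8 text string
        data = item.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(item, list):       # major type 4: array
        return head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):       # major type 5: map, insertion order kept
        return head(5, len(item)) + b"".join(
            encode(k) + encode(v) for k, v in item.items())
    raise TypeError(f"unsupported type: {type(item).__name__}")


def tagged(tag: int, item) -> bytes:
    """Major type 6: tag head followed by the encoded item."""
    return head(6, tag) + encode(item)


OID_TAG = 6  # placeholder from Section 3, not a registered tag number

# One map per RDN; keys are the BER contents octets of attribute types.
dn = [
    {bytes.fromhex("550406"): "US"},
    {bytes.fromhex("550407"): "Los Angeles",
     bytes.fromhex("550408"): "CA",
     bytes.fromhex("550411"): "90013"},
    {bytes.fromhex("550409"): "532 S Olive St"},
    {bytes.fromhex("55040f"): "Public Park",
     bytes.fromhex("0992268993f22c640130"): "Pershing Square"},
]

encoded = tagged(OID_TAG, dn)
print(len(encoded), encoded[:8].hex(" "))
# 108 c6 84 a1 43 55 04 06 62
```

   Because the single outer tag covers every key, none of the ten OIDs
   needs its own tag byte.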

   The equivalent representations in CBOR diagnostic notation and CBOR
   are:

   6([{ 2.5.4.6: "US" },
      { 2.5.4.7: "Los Angeles", 2.5.4.8: "CA", 2.5.4.17: "90013" },
      { 2.5.4.9: "532 S Olive St" },
      { 2.5.4.15: "Public Park",
        0.9.2342.19200300.100.1.48: "Pershing Square" }])

       Figure 10: Distinguished Name, in CBOR Diagnostic Notation

   6([{ h'550406': "US" },
      { h'550407': "Los Angeles", h'550408': "CA", h'550411': "90013" },
      { h'550409': "532 S Olive St" },
      { h'55040f': "Public Park",
        h'0992268993f22c640130': "Pershing Square" }])

     Figure 11: Distinguished Name, in CBOR Diagnostic Notation (RFC 7049
                                  only)

   c6                                # tag(6)
   84                                # array(4)
   a1                                # map(1)
   43 550406                         # 2.5.4.6 (4)
   62                                # text(2)
   5553                              # "US"
   a3                                # map(3)
   43 550407                         # 2.5.4.7 (4)
   6b                                # text(11)
   4c6f7320416e67656c6573            # "Los Angeles"
   43 550408                         # 2.5.4.8 (4)
   62                                # text(2)
   4341                              # "CA"
   43 550411                         # 2.5.4.17 (4)
   65                                # text(5)
   3930303133                        # "90013"
   a1                                # map(1)
   43 550409                         # 2.5.4.9 (4)
   6e                                # text(14)
   3533322053204f6c697665205374      # "532 S Olive St"
   a2                                # map(2)
   43 55040f                         # 2.5.4.15 (4)
   6b                                # text(11)
   5075626c6963205061726b            # "Public Park"
   4a 0992268993f22c640130           # 0.9.2342.19200300.100.1.48 (11)
   6f                                # text(15)
   5065727368696e6720537175617265    # "Pershing Square"

           Figure 12: Distinguished Name, in CBOR (108 bytes)

   (This example encoding assumes that all attribute values are UTF-8
   strings, or can be represented as UTF-8 strings with no loss of
   information.)

   For reference, the [RFC4514] LDAP string encoding of such data would
   be:

   buildingName=Pershing Square+businessCategory=Public Park,
   street=532 S Olive St,l=Los Angeles+postalCode=90013+st=CA,c=US

    Figure 13: Distinguished Name, in LDAP String Encoding (121 bytes)

9.  Universally Unique Identifiers in CBOR

   This section provides guidance on the Universally Unique Identifier
   (UUID) type, which was introduced into CBOR with tag <> (currently
   tag 37; reassignment to be discussed in view of this section).  A
   UUID [RFC4122] is 128 bits long and requires no central registration
   process.  UUIDs were originally used in the Apollo Network Computing
   System and later in the Open Software Foundation's (OSF) Distributed
   Computing Environment (DCE), for Remote Procedure Calls (RPC)
   [DCE-RPC].

   As a tagged binary string identifier type in CBOR, the UUID type
   shares several characteristics with OID types.  The main differences
   are that a UUID is always 16 bytes (anything shorter or longer is a
   coding error), there is no central assignment process, and every
   128-bit combination is valid.  ([RFC4122] calls out the nil UUID,
   which is special but perfectly valid.)  Optional registries have
   cropped up over the years; one such registry is [OID-INFO].  Users
   of UUIDs in CBOR are strongly encouraged to document their UUIDs in
   such registries.

   To provide parity with OIDs, UUIDs MUST be encoded in definite-length
   form (see Section 2).  Consequently, individual UUIDs can be easily
   searched for by looking for "d8 25" (major type 6, tag 37), "50"
   (major type 2, additional information 16), and 16 bytes.  Therefore,
   a directly encoded UUID in CBOR occupies 19 bytes.  In contrast,
   stuffing a UUID into an OID in CBOR requires 22 bytes (see Figure 4);
   conversion between OID-UUID form and binary or string UUID forms
   requires bit-shifting (but mercifully not base-shifting; see
   Section 18.1).  An example based on Figure 4 is below:

   D8 25                             # tag(37)
   50                                # 0b010_10000: mt 2, 16 bytes
   8B 0D 1A 20 DC C5 11 D9 BD A9 00 02 A5 D5 C5 1B

                    Figure 14: Binary UUID in CBOR

9.1.  Diagnostic Notation

   Implementers will likely want to see UUIDs in their "natural forms"
   for diagnostic purposes.  Accordingly, this section defines
   additional syntactic elements that can be used in conjunction with
   the diagnostic notation described in Section 6 of [RFC7049].

   A universally unique identifier may be written in "string
   representation" as that term is defined in [RFC4122].  An example of
   such a string is "8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b" (see Figure 4
   and Figure 14).  Lowercase is the preferred form.  (TBD: permit,
   require, or prohibit curly brace form?)

   The notation in this section may be employed in addition to the basic
   notation, which would be a tagged binary string.

9.2.  Tag Factoring and Tag Stacking

   Tag Factoring and Tag Stacking are hereby permitted with the UUID
   type, with the same semantics as Section 7.

10.  Enumerations in CBOR

   This section provides a roadmap to using enumerated items in CBOR,
   including design considerations for choosing between OIDs, UUIDs,
   integers, and UTF-8 strings.

   CBOR does not have an ENUMERATED type like ASN.1 to identify named
   values in a protocol element with three or more states (Clause 20 and
   Clause G.2.3 of [X.680]).  ASN.1 ENUMERATED turns out to be
   superfluous because ASN.1 INTEGER values can be named (and have
   historically been used for finite, multistate variables, such as
   version numbers), while ASN.1 ENUMERATED types can be defined to be
   extensible with the ellipsis lexical item.  Practically, the named
   integers are not serialized in the binary encodings anyway; they
   merely serve as semantic hints for designers and debuggers.

   CBOR expects that protocol designers will use one of the basic major
   types for multistate variables, assigning semantics to particular
   values using higher-level schemas.
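
   To make the size trade-offs discussed in this section concrete, the
   following Python sketch encodes one enumerant in each of the four
   candidate forms: an unsigned integer, a single-arc relative OID
   (semantic tag excluded, as in item 1 of Section 10.1), a binary UUID
   with tag 37, and a text string.  It is an illustration only, not part
   of this specification; the helper names are hypothetical, and the
   head encoding follows Section 2.1 of [RFC7049] for definite lengths
   only.

```python
import uuid


def cbor_head(major: int, value: int) -> bytes:
    """Initial byte plus argument for major type `major` (definite length)."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_uint(n: int) -> bytes:
    return cbor_head(0, n)  # major type 0


def encode_text(s: str) -> bytes:
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data  # major type 3


def encode_relative_oid_arc(arc: int) -> bytes:
    """Single-arc relative OID as a byte string; the semantic tag is omitted."""
    if arc < 0:
        raise ValueError("arcs are unsigned")
    digits = [arc & 0x7F]
    arc >>= 7
    while arc:
        digits.append(0x80 | (arc & 0x7F))
        arc >>= 7
    body = bytes(reversed(digits))
    return cbor_head(2, len(body)) + body  # major type 2


def encode_uuid(u: uuid.UUID) -> bytes:
    return cbor_head(6, 37) + cbor_head(2, 16) + u.bytes  # d8 25 50 + 16 bytes


u = uuid.UUID("8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b")  # the UUID of Figure 14
for label, item in [("uint 3", encode_uint(3)), ("uint 300", encode_uint(300)),
                    ("rel-OID .3", encode_relative_oid_arc(3)),
                    ("rel-OID .300", encode_relative_oid_arc(300)),
                    ("UUID", encode_uuid(u)), ('text "sig"', encode_text("sig"))]:
    print(f"{label:12} {len(item):2} bytes  {item.hex()}")
```

   With these helpers, "uint 300" and "rel-OID .300" both come out at
   three bytes, while the tagged UUID always occupies 19 bytes, matching
   the figures given in Sections 9 and 10.1.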

   The obvious choices for the basic types are integers (particularly
   unsigned integers) and UTF-8 strings.  However, these major types are
   not without drawbacks.

   Integers are compact for small values, but have a flat namespace so
   there are mis-assignment and collision risks that can only be
   mitigated with protocol-specific registries.  Arrays of integers are
   possible, but arrays require more processing logic for equality
   comparisons, and the JSON conversion is not intuitive when the
   enumerated value serves as a key in a map.

   UTF-8 strings are less compact when the strings are supposed to
   resemble their semantics, and there are normalization issues if the
   strings contain characters beyond the ASCII range.  UTF-8 strings
   also comprise a flat namespace like integers unless the higher-level
   schema employs delimiters, which makes the string even larger.  If
   conciseness is a design goal, other perceived advantages of a string
   as an identifier are pretty much blown out the moment one has to tack
   "https://" onto the front.

   This section provides novel alternatives in OIDs and UUIDs.  It
   compares and contrasts these binary types to other enumerants, namely
   integers and text (UTF-8) strings.

10.1.  Factors Favoring OID Enumerations

   A protocol designer might choose OIDs or relative OIDs for an
   enumerated item in view of the following observations:

   1.  OIDs and relative OIDs are quite compact: a single-arc relative
       OID encoded according to this specification occupies just two
       bytes for primary integer values 0-127 (excluding the semantic
       tag <>), and three bytes for primary integer values 128-16383.
       (In contrast, an unsigned integer requires one byte for 0-23, two
       bytes for 24-255, and three bytes for 256-65535.)

   2.  OIDs and relative OIDs (with base) are persistent and globally
       unambiguous.

   3.  OIDs and relative OIDs have built-in semantics for designers and
       debuggers.  Specifically, the advent of universal OID
       repositories such as [OID-INFO] makes it easy for a designer or
       debugger to pull up useful information about the object of
       interest (Clause 3.5.10 of [X.660]).  This useful information
       (for humans) does not have to bleed into the encoded
       representation (for machines).

   4.  OIDs and relative OIDs are always compared for exact equality: no
       need to deal with case folding, case sensitivity, or other
       normalization issues.  ("Overlong" encodings are PROHIBITED;
       therefore overlong encodings MUST be treated as coding errors.)

   5.  OIDs and relative OIDs have a built-in hierarchy, so if
       implementers want to extend an enumeration without assigning new
       values "horizontally", they have the option of assigning new
       values "vertically", possibly with more or less stringent
       assignment rules.

   6.  Because OIDs and relative OIDs (with base) are part of the
       so-called International Object Identifier tree [X.660], any other
       protocol specification can reuse the enumeration if the designers
       find it useful.

   7.  OIDs and relative OIDs have natural JSON representations in the
       dotted decimal notations prescribed in Section 5.  OIDs and
       relative OIDs can be distinguished from each other by the
       presence or absence of the leading dot ".".  As the resulting
       JSON string is entirely numeric in the ASCII range, case and
       normalization are irrelevant to the comparison.  (An object
       identifier also has a semantic string representation in the form
       of an OID-IRI [X.680], for those who really want that type of
       thing.)

   8.  OIDs and relative OIDs are human language-neutral.  A protocol
       designer working in US-English might name an enumerated value
       "sig" for "signature", but "sig" could also stand for
       "significand", "signal", or "special interest group".
       In Swedish and Norwegian, "sig" is a pronoun that means "himself,
       herself, itself, one, them", etc., which is an entirely different
       meaning.

10.2.  Factors Favoring UUID Enumerations

   A Universally Unique Identifier (UUID) is a 128-bit identifier that
   is unique across both space and time with a very high degree of
   probability; one intent is to identify "very persistent objects
   across a network", such as remote procedure call interfaces
   [DCE-RPC].

   A protocol designer might choose UUIDs for an enumerated item in view
   of the following observations:

   1.  UUIDs are always 16 bytes.  This means that while they are not
       particularly short, they also cannot be overly long.  Space is
       constant and predictable.  (As great as OIDs are, an OID that
       exceeds 17 bytes is simply excessive compared to a randomly
       assigned UUID.)

   2.  Any 128-bit combination is a valid UUID.  The other types in this
       section have to be validated, even integers (e.g., to avoid
       overflow and out-of-range conditions).

   3.  There is no registration authority that serves as a roadblock,
       and (for all practical purposes) no semantic or aesthetic values
       are implied by lower bit combinations.

   4.  Many platforms can compare UUIDs (128-bit values) in one atomic
       operation.  The comparison can be done without regard to
       endianness, provided that the endianness is the same between two
       UUIDs in memory.  (On the wire, a CBOR UUID is big-endian.)  For
       this reason, UUIDs may be faster than (naive) integer
       enumerations.

   5.  UUIDs have natural JSON representations in the string
       representations prescribed by [RFC4122].  The resulting JSON
       strings are entirely in the ASCII range and occupy exactly 36
       characters; however, normalization (to lowercase) is a
       complicating factor.

   6.  UUIDs are human language-neutral.  (However, unlike OIDs, UUIDs
       are too long to be described as mnemonic in any practical sense.)

10.3.  Factors Favoring Integer Enumerations

   A protocol designer might choose integers for an enumerated item in
   view of the following observations:

   1.  The CBOR encoding of unsigned integers 0-23 is the most compact,
       occupying exactly one byte (excluding any semantic tags).

   2.  A protocol designer may wish to prohibit extensibility as a
       matter of course.  Integers comprise a single flat namespace:
       there is no hierarchy.

   3.  If greater range is desired while sticking to one byte, a
       protocol designer may double the range of possible values by
       allowing negative integers.  However, enumerating values using
       negative integers may have unintended side effects, because some
       programming environments (e.g., C/C++) make
       implementation-defined assumptions about the number of bits
       needed for an enumerated type.

10.4.  Factors Favoring UTF-8 String Enumerations

   A protocol designer might choose UTF-8 strings for an enumerated item
   in view of the following observations:

   1.  A specification can practically limit the content of UTF-8
       strings to the ASCII range (or narrower), mitigating some
       normalization problems.

   2.  UTF-8 strings are easier for humans to read on the wire.

   3.  UTF-8 strings can contain arbitrary textual identifiers, which
       can be hierarchical, e.g., URIs.

10.5.  OID Enumeration Example

   An enumerated item indicates the revision level of a data format.
   Revision levels are issued by year, such as 2011, 2012, etc.
   However, in the year 2013, two revisions were issued: the first one
   and an important update in June that needs to be distinguished.
   The revision levels are assigned to some OID arc:

   "{2 25 6464646464 revs(4)}"

   In this arc, the following sub-arcs are assigned:

                         +--------------------+
                         | Sub-Arc            |
                         +--------------------+
                         | {v2011(1)}         |
                         | {v2012(2)}         |
                         | {v2013(3)}         |
                         | {v2013(3) june(6)} |
                         | {v2014(4)}         |
                         | {v2015(5)}         |
                         +--------------------+

                       Table 3: Example Sub-Arcs

   In CBOR, the enumeration is encoded as a relative OID.  The schema
   specifies the base OID arc, which is omitted:

   c7                                # tag(7)
   41 03                             # .3

   c7                                # tag(7)
   42 0306                           # .3.6

                  Figure 15: Enumerated Items in CBOR

   .3
   .{v2013(3) june(6)}

         Figure 16: Enumerated Items in CBOR Diagnostic Notation

   ".3"
   ".3.6"

          Figure 17: Enumerated Items in JSON (possibility 1)

   "v2013"
   "v2013/june"

          Figure 18: Enumerated Items in JSON (possibility 2)

11.  Binary Internet Messages and MIME Entities

   Section 2.4.4.3 of [RFC7049] assigns tag 36 to "MIME messages
   (including all headers)" [RFC2045], and prescribes UTF-8 strings,
   without further elaboration.  Actually, MIME encompasses several
   different formats, and is not limited to UTF-8 strings.  This section
   updates tag 36.

11.1.  CBOR Byte String and Binary MIME

   Tag 36 is to be used with byte strings.  When the tagged item is a
   byte string, any octet can be used in the content.  Arbitrary octets
   are supported by [RFC2045] and can be supported in protocols such as
   SMTP using BINARYMIME [RFC3030].

   A conforming implementation that purports to process tag 36-tagged
   items MUST accept byte strings as well as UTF-8 strings.  Byte
   strings, rather than UTF-8 strings, SHOULD be considered the default.
   (While binary Content-Transfer-Encoding is not particularly common as
   of this writing, 8-bit encoding is, and it is foreseeable that many
   8-bit encoded messages will still have charsets other than UTF-8.)

11.2.  Internet Messages, MIME Messages, and MIME Entities

   Definitions: "MIME message" is not explicitly defined in [RFC2045],
   but a careful read suggests that a MIME message is "either a
   (complete or "top-level") RFC 822 message being transferred on a
   network, or a message encapsulated in a body of type "message/rfc822"
   or "message/partial"" that also contains MIME header fields, namely
   the MIME-Version field, which MUST be present (Section 4 of
   [RFC2045]).  Other MIME header fields such as Content-Type and
   Content-Transfer-Encoding are assumed to take their [RFC2045] default
   values if not present in the data.

   When the contents have a From field (a type of "originator address
   field") and a Date field (the lone "origination date field")
   (Section 3.6 of [RFC5322]), the item is concluded to have a
   Content-Type of message/rfc822 or message/global, as appropriate,
   except as otherwise specified in this section.

   (TBD: Do we need a separate tag for a MIME entity?)  (Alternate
   proposal: When the tagged data does not include a MIME-Version field
   or other fields required by RFC822 (5322) (e.g., no From field), it
   is presumed to be a MIME entity, rather than a MIME message.
   Therefore, it has no top-level content-type: instead it is simply a
   "MIME entity", consisting of one element, whose Content-Type is the
   content of the Content-Type header field, if present, or the
   [RFC2045] default of "text/plain; charset=us-ascii", if absent.
   Content-Transfer-Encoding SHALL be assumed to be 8bit when the CBOR
   item is a UTF-8 string, and SHALL be assumed to be binary when the
   CBOR item is a byte string.  (Or should all be considered CTE:
   binary?)  And, when the tagged data has RFC822 required fields but no
   MIME-Version, shall we assume it's a MIME entity, or shall we assume
   it's an Internet message that does not conform to MIME?)
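
   The classification rules of this section, including the still-open
   alternate proposal and the no-header case described next, can be
   sketched as follows.  This Python code is a simplified, hypothetical
   illustration rather than a normative algorithm: the function names
   are invented here, only the first occurrence of each header field is
   kept, and choosing message/global whenever the header section
   contains non-ASCII octets is an assumption standing in for "as
   appropriate".  Detection of HTTP and SIP start-lines (Section 11.3)
   is not attempted.

```python
def parse_header_fields(content: bytes) -> dict:
    """Collect header fields (lowercased names) up to the first empty line.

    Simplified: stops at the first line that is not a syntactically valid
    field, and keeps only the first occurrence of each field name.
    """
    fields, last = {}, None
    for line in content.split(b"\r\n"):
        if line == b"":
            break  # the empty line separates the header section from the body
        if line[:1] in (b" ", b"\t"):
            if last is not None:  # folded continuation of the previous field
                fields[last] += " " + line.strip().decode("utf-8", "replace")
            continue
        name, colon, value = line.partition(b":")
        # RFC 5322 field names are printable US-ASCII (33-126) without ':'.
        if not colon or not name or any(c < 33 or c > 126 for c in name):
            break
        key = name.decode("ascii").lower()
        if key in fields:
            last = None  # ignore repeated fields and their continuations
            continue
        fields[key] = value.strip().decode("utf-8", "replace")
        last = key
    return fields


def classify_tag36(content: bytes, is_text_string: bool) -> tuple:
    """Return (Content-Type, assumed Content-Transfer-Encoding or None)."""
    fields = parse_header_fields(content)
    if "from" in fields and "date" in fields:
        # Internet message; the message/global choice is an assumption.
        header_section = content.split(b"\r\n\r\n", 1)[0]
        ctype = "message/rfc822" if header_section.isascii() else "message/global"
        return (ctype, None)
    # Alternate proposal: a MIME entity with RFC 2045 defaults.
    ctype = fields.get("content-type", "text/plain; charset=us-ascii")
    return (ctype, "8bit" if is_text_string else "binary")


msg = (b"From: alice@example.com\r\nTo: bob@example.com\r\n"
       b"Date: Mon, 13 Mar 2017 10:00:00 +0000\r\nSubject: Hi\r\n\r\n"
       b"Hello World!\r\n")
print(classify_tag36(msg, is_text_string=False))
```

   For a message shaped like Example 1 of Section 12 (From, To, Date,
   and Subject fields), the sketch yields message/rfc822; for content
   that starts with CRLF CRLF, it falls back to "text/plain;
   charset=us-ascii".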

   Content that has no headers whatsoever is valid, and implementations
   that process tag 36 MUST permit this case: the data then starts with
   CRLF CRLF, followed by the body.  Such content is assumed to be a
   MIME entity of Content-Type "text/plain; charset=us-ascii", and not
   an RFC822 (RFC5322) Internet message.  (TBD: Confirm.)

11.3.  Netnews, HTTP, and SIP Messages

   Other message types that are MIME-related are message/news,
   message/http, and message/sip.

   [RFC5537] specifies that message/news is deprecated (marked as
   obsolete) and that message/rfc822 SHOULD be used in its place;
   presumably this also extends to message/global over time.  Netnews
   Article Format [RFC5536] is a strict subset of Internet Message
   Format; it can be detected by the presence of the six mandatory
   header fields: Date, From, Message-ID, Newsgroups, Path, and Subject.
   (Newsgroups and Path fields are specific to Netnews.)

   message/http [RFC7230] is the media type for HTTP requests and
   responses.  It can be detected by analyzing the first line of the
   body, which is an HTTP Start Line (Section 3.1 of [RFC7230]): it does
   not conform to the syntax of an Internet Message Format header field.
   The optional parameter "msgtype" can be inferred from the Start Line.
   Implementers need to be aware that the default character encoding for
   message/http is ISO-8859-1, not UTF-8.  Therefore, implementations
   SHOULD NOT encode HTTP messages with CBOR UTF-8 strings.

   Similarly, message/sip [RFC3261] is the media type of SIP request and
   response messages.  It can be detected by analyzing the first line of
   the body, which is a SIP start-line (Section 7.1 of [RFC3261]): it
   does not conform to the syntax of an Internet Message Format header
   field.  The optional parameter can be inferred from the start-line.

11.4.  Other Messages

   The CBOR byte string or UTF-8 string MAY contain other types of
   messages.  An implementation MAY send such a message as a MIME entity
   with the Content-Type field appropriately set, or alternatively, MAY
   send the message at the top level directly.  However, if a purported
   message type is ambiguous with a message/rfc822 (or message/global)
   message, a receiver SHALL treat the message as message/rfc822 (or
   message/global).  If a purported message type is ambiguous with a
   MIME entity (and unambiguously not message/rfc822 or message/global),
   a receiver SHALL treat the message as a MIME entity.

12.  Applications and Examples of Messages and Entities

   Tag 36 is the RECOMMENDED way to convey data with MIME-related
   metadata, including messages (which may or may not actually be
   MIME-enabled) and MIME entities.

   Example 1: A legacy RFC822 message is encoded as a UTF-8 string or
   byte string with tag 36.  The contents have From, To, Date, and
   Subject header fields, two CRLFs, and a single line "Hello World!",
   terminated with a CRLF.

   Example 2a: A [RFC5280] certificate is encoded as a byte string with
   tag 36.  The contents consist of "Content-Type:
   application/pkix-cert", two CRLFs, and the DER encoding of the
   certificate.  (The "Content-Transfer-Encoding: binary" header is not
   necessary.)

   Example 2b: A [RFC5280] certificate is encoded as a UTF-8 string or
   byte string with tag 36.  The contents consist of "Content-Type:
   application/pkix-cert", a CRLF, "Content-Transfer-Encoding: base64",
   two CRLFs, and the base64 encoding of the DER encoding of the
   certificate, conforming to Section 6.8 of [RFC2045].  In particular,
   base64 lines are limited to 76 characters, separated by CRLF, and the
   final line is supposed to end with CRLF.  Needless to say, this is
   not nearly as efficient as Example 2a.

13.  X.690 Series Tags

   [[NB: Carsten probably won't like this.  Plan on removing this
   section.  It is mainly provided to contrast with Section 10.]]

   It is foreseeable that CBOR applications will need to send and
   receive ASN.1 data, for example, for legacy or security applications.
   While a native representation in CBOR is preferred, preserving the
   data in an ASN.1 encoding may be necessary, for example, to preserve
   cryptographic verification.  A tag <> is allocated for this
   purpose.

   When the tagged item is a byte string, the byte string contents are
   encoded according to [X.690], i.e., BER, CER, or DER.  CBOR
   implementations are not required to validate conformance of the
   contained data to [X.690].

   When the tagged item is an array with 3 items:

   1.  The first item SHALL be an OID (with tag <> omitted; it SHALL
       NOT be a relative OID), indicating the ASN.1 module containing
       the type of the PDU.  [[NB: this is a good example of a
       non-trivial structure in which an element is well-defined to be
       an OID, which has a tag.  Is the CBOR philosophy to tag the item,
       or omit the tag on the item, when the item's semantics are
       already fixed by the outer tag?  Similar situations can apply to
       tag 32 (URI), etc.]]

   2.  The second item SHALL be a UTF-8 string indicating the ASN.1
       value's _type reference name_ (Clause 3.8.88 of [X.680])
       conforming to the "typereference" production (Clause 12.2 of
       [X.680]).

   3.  The third item SHALL be a byte string, whose contents are encoded
       per the prior paragraph.

   (TBD: Use of tagged UTF-8 string is reserved for ASN.1 textual
   formats such as XER and ASN.1 value notation?  Probably not
   necessary.  Just omit.)

   Implementation note: DER-encoded items are always definite-length, so
   there is very little reason to use CBOR byte string indefinite
   encoding when encoding such DER-encoded items.
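
   As an illustration of the three-element array form, the following
   Python sketch builds the untagged array for the certificate example
   below.  The enclosing tag is omitted because its value has not been
   assigned, the helper names are hypothetical, and the two-byte DER
   value is a placeholder rather than a real certificate.  The type
   reference name is checked against the "typereference" production of
   Clause 12.2 of [X.680] as this sketch reads it: an uppercase initial
   letter, then letters, digits, and single hyphens, with no trailing
   hyphen.

```python
import re

# typereference (X.680, Clause 12.2), as read for this sketch: uppercase
# initial letter, then letters/digits joined by single, non-trailing hyphens.
_TYPEREFERENCE = re.compile(r"[A-Z](?:-?[A-Za-z0-9])*")


def cbor_head(major: int, value: int) -> bytes:
    """CBOR initial byte plus argument, definite length only."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("value too large")


def x690_array(module_oid: bytes, type_name: str, encoding: bytes) -> bytes:
    """Untagged [OID content octets, type reference name, X.690 encoding]."""
    if not _TYPEREFERENCE.fullmatch(type_name):
        raise ValueError(f"not an ASN.1 typereference: {type_name!r}")
    name = type_name.encode("utf-8")
    return (cbor_head(4, 3)                                # array(3)
            + cbor_head(2, len(module_oid)) + module_oid   # OID, tag omitted
            + cbor_head(3, len(name)) + name               # UTF-8 string
            + cbor_head(2, len(encoding)) + encoding)      # definite-length bytes


# Module OID 1.3.6.1.5.5.7.0.18 from the example; b"\x30\x00" is a placeholder.
item = x690_array(bytes.fromhex("2b06010505070012"), "Certificate", b"\x30\x00")
print(item.hex())
```

   All three elements use definite-length encodings, in line with the
   implementation note above.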

   Example: A [RFC5280] certificate can be encoded:

   1.  as a byte string with tag <>, or

   2.  as an array with tag <>, with three elements:

       (1)  a byte string "h'2B 06 01 05 05 07 00 12'", which is the BER
            encoding of 1.3.6.1.5.5.7.0.18,

       (2)  a UTF-8 string "Certificate", and

       (3)  a byte string containing the DER encoding of the
            certificate.

14.  Regular Expression Clarification

   (TODO: better specify conformance to actual regular expression
   standards with tag 35.  PCRE and JavaScript/ECMAScript regular
   expressions are very different; [RFC7049] is not specific enough
   about this.)

15.  Set and Multiset Technique

   CBOR has no native type for a set, which is an arbitrary unordered
   collection of items.  The following technique is RECOMMENDED to
   express set and multiset semantics concisely in native CBOR data.

   In computer science, a _set_ is a collection of distinct items; there
   is no ordering to the items.  Thus, implementations can optimize set
   storage in many ways that are not available with ordered elements in
   arrays.  Sets can be stored in hashtables, bit fields, trees, or
   other abstract data types.

   In computer science, a _multiset_ allows multiple instances of a
   set's elements.  Put another way, each distinct item has a
   cardinality property indicating the number of these items in the
   multiset.

   To store items in a set or multiset, it is RECOMMENDED to store the
   CBOR items as keys in a map; the values SHALL all be positive
   integers (major type 0, value/additional information greater than or
   equal to 1).  In the special case of a set, the values SHALL be the
   integer 1.  This technique has no special tag associated with it.  As
   with arrays that schemas classify as "records" (i.e., arrays with
   positionally defined elements), schemas are likewise free to classify
   maps as sets in particular instances.

16.  Fruits Basket Example

   Consider a basket of fruits.  The basket can contain any number of
   fruits; each fruit of the same species is considered identical.  This
   basket has two apples, four bananas, six pears, and one pineapple:

   {"\u{1F34E}": 2, "\u{1F34C}": 4,
    "\u{1F350}": 6, "\u{1F34D}": 1}

          Figure 19: Fruits Basket in CBOR Diagnostic Notation

   A4                                # map(4)
   64                                # text(4)
   f09f8d8e                          # "\u{1F34E}"
   02                                # unsigned(2)
   64                                # text(4)
   f09f8d8c                          # "\u{1F34C}"
   04                                # unsigned(4)
   64                                # text(4)
   f09f8d90                          # "\u{1F350}"
   06                                # unsigned(6)
   64                                # text(4)
   f09f8d8d                          # "\u{1F34D}"
   01                                # unsigned(1)

               Figure 20: Fruits Basket in CBOR (25 bytes)

   [[TODO: Consider a Merkle Tree example: set of sets of sets of sets
   of things.  ???]]

17.  IANA Considerations

   (This section to be edited by the RFC editor.)

17.1.  CBOR Tags

   IANA is requested to assign the CBOR tags in Table 4, with the
   present document as the specification reference.

   +----------+-------------+------------------------------------------+
   | Tag      | Data Item   | Semantics                                |
   +----------+-------------+------------------------------------------+
   | 6<>      | multiple    | object identifier (BER encoding)         |
   | 7<>      | multiple    | relative object identifier (BER          |
   |          |             | encoding)                                |
   +----------+-------------+------------------------------------------+

                     Table 4: Values for New Tags

17.2.  Discussion

   (This subsection to be removed by the RFC editor.)

   The space for single-byte tags in CBOR (0..23) is severely limited.
   It is not clear that the benefits of encoding OIDs/relative OIDs with
   one less byte per instance outweigh the consumption of two values in
   this code point space.

   Procedurally, this space is also reserved for standards action.

   An alternative would be to go for the specification required space,
   e.g., tag number 40 for <> and tag number 41 for <>.
   As an example, this would change Figure 2 into:

   d8 28                             # tag(40)
   49                                # bytes(9)
   60 86 48 01 65 03 04 02 01        # 2.16.840.1.101.3.4.2.1

    Figure 21: SHA-256 OID in CBOR (using specification required tag)

17.3.  Pre-Existing Tags

   (TODO: complete.)  IANA is requested to modify the registrations for
   the following CBOR tags:

              +-----+-------------+----------------------------+
              | Tag | Data Item   | Semantics                  |
              +-----+-------------+----------------------------+
              | 35  | <>          | regular expression <>      |
              | 36  | multiple    | message or MIME entity     |
              | 37  | multiple    | binary UUID                |
              +-----+-------------+----------------------------+

                   Table 5: Values for Existing Tags

17.4.  New Tags

   (TODO: complete.)

18.  Security Considerations

   The security considerations of RFC 7049 apply.

   The encodings in Clauses 8.19 and 8.20 of [X.690] are extremely
   compact and unambiguous, but MUST be followed precisely to avoid
   security pitfalls.  In particular, the requirements set out in
   Section 2.1 of this document need to be followed; otherwise, an
   attacker may be able to subvert a checking process by submitting
   alternative representations that are later taken as the original (or
   even something else entirely) by another decoder supposed to be
   protected by the checking process.

   OIDs and relative OIDs can always be treated as opaque byte strings.
   Actually understanding the structure that was used for generating
   them is not necessary, and, except for checking the structure
   requirements, it is strongly NOT RECOMMENDED to perform any
   processing of this kind (e.g., converting into dotted notation and
   back) unless absolutely necessary.  If the OIDs are translated into
   other representations, the usual security considerations for
   non-trivial representation conversions apply; the primary integer
   values are unlimited in range (cf. Figure 4).

18.1.  Conversions Between BER and Dotted Decimal Notation

   [PKILCAKE] uncovers exploit vectors for the illegal values above, as
   well as for cases in which conversion to or from the dotted decimal
   notation goes awry.  Neither [X.660] nor [X.680] places an upper
   bound on the range of unsigned integer values for an arc; the
   integers are arbitrarily valued.  An implementation SHOULD NOT
   attempt to convert each component using a fixed-size accumulator, as
   an attacker will certainly be able to cause the accumulator to
   overflow.  Compact and efficient techniques for such conversions,
   such as the double dabble algorithm [DOUBLEDABBLE], are well known
   in the art; their application to this field is left as an exercise
   to the reader.

19.  References

19.1.  Normative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002.

   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
              Unique IDentifier (UUID) URN Namespace", RFC 4122,
              DOI 10.17487/RFC4122, July 2005.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              DOI 10.17487/RFC5322, October 2008.

   [RFC5536]  Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews
              Article Format", RFC 5536, DOI 10.17487/RFC5536,
              November 2009.

   [RFC5537]  Allbery, R., Ed. and C. Lindsey, "Netnews Architecture and
              Protocols", RFC 5537, DOI 10.17487/RFC5537, November 2009.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [X.660]    International Telecommunications Union, "Information
              technology -- Procedures for the operation of object
              identifier registration authorities: General procedures
              and top arcs of the international object identifier tree",
              ITU-T Recommendation X.660, July 2011.

   [X.680]    International Telecommunications Union, "Information
              technology -- Abstract Syntax Notation One (ASN.1):
              Specification of basic notation", ITU-T Recommendation
              X.680, August 2015.

   [X.690]    International Telecommunications Union, "Information
              technology -- ASN.1 encoding rules: Specification of Basic
              Encoding Rules (BER), Canonical Encoding Rules (CER) and
              Distinguished Encoding Rules (DER)", ITU-T Recommendation
              X.690, August 2015.

19.2.  Informative References

   [DCE-RPC]  Open Group CAE, "DCE: Remote Procedure Call",
              Specification C309, ISBN 1-85912-041-5, August 1994.

   [DOUBLEDABBLE]
              Gao, S., Al-Khalili, D., and N. Chabini, "An improved BCD
              adder using 6-LUT FPGAs", IEEE 10th International New
              Circuits and Systems Conference (NEWCAS 2012), pp. 13-16,
              DOI 10.1109/NEWCAS.2012.6328944, June 2012.

   [OID-INFO]
              Orange SA, "OID Repository", 2016.

   [PKILCAKE]
              Kaminsky, D., Patterson, M., and L. Sassaman, "PKI Layer
              Cake: New Collision Attacks Against the Global X.509
              Infrastructure", FC 2010, Lecture Notes in Computer
              Science 6052, pp. 289-303,
              DOI 10.1007/978-3-642-14577-3_22, January 2010.

   [RFC2506]  Holtman, K., Mutz, A., and T.
              Hardie, "Media Feature Tag Registration Procedure", BCP 31,
              RFC 2506, DOI 10.17487/RFC2506, March 1999.

   [RFC3030]  Vaudreuil, G., "SMTP Service Extensions for Transmission
              of Large and Binary MIME Messages", RFC 3030,
              DOI 10.17487/RFC3030, December 2000.

   [RFC4514]  Zeilenga, K., Ed., "Lightweight Directory Access Protocol
              (LDAP): String Representation of Distinguished Names",
              RFC 4514, DOI 10.17487/RFC4514, June 2006.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008.

   [RFC6256]  Eddy, W. and E. Davies, "Using Self-Delimiting Numeric
              Values in Protocols", RFC 6256, DOI 10.17487/RFC6256,
              May 2011.

   [RFC7388]  Schoenwaelder, J., Sehgal, A., Tsou, T., and C. Zhou,
              "Definition of Managed Objects for IPv6 over Low-Power
              Wireless Personal Area Networks (6LoWPANs)", RFC 7388,
              DOI 10.17487/RFC7388, October 2014.

   [X.672]    International Telecommunications Union, "Information
              technology -- Open systems interconnection -- Object
              identifier resolution system", ITU-T Recommendation X.672,
              August 2010.

   [X.681]    International Telecommunications Union, "Information
              technology -- Abstract Syntax Notation One (ASN.1):
              Information object specification", ITU-T Recommendation
              X.681, August 2015.

Appendix A.  Changes from -05 to -06

   Refreshed the draft to the current date ("keep-alive").

Appendix B.  Changes from -04 to -05

   Discussed UUID usage in CBOR, and incorporated fixes proposed by
   Olivier Dubuisson, including fixes regarding OID nomenclature.

Appendix C.  Changes from -03 to -04

   Changes occurred based on limited feedback, mainly centered around
   the abstract and introduction, rather than substantive technical
   changes.  These changes include:

   o  Changed the title so that it is about tags and techniques.

   o  Rewrote the abstract to describe the content more accurately, and
      to point out that no changes to the wire protocol are being
      proposed.

   o  Removed "ASN.1" from "object identifiers", as OIDs are independent
      of ASN.1.

   o  Rewrote the introduction to be more about the present text.

   o  Proposed a concise OID arc.

   o  Provided binary regular expression forms for OID validation.

   o  Updated IANA registration tables.

Appendix D.  Changes from -02 to -03

   Many significant changes occurred in this version.  These changes
   include:

   o  Expanded the draft scope to be a comprehensive CBOR update.

   o  Added OID-related sections: OID Enumerations, OID Maps and Arrays,
      and Applications and Examples of OIDs.

   o  Added Tag 36 update (binary MIME, better definitions).

   o  Added stub/experimental sections for X.690 Series Tags (tag <>)
      and Regular Expressions (tag 35).

   o  Added technique for representing sets and multisets.

   o  Added references and fixed typos.

Authors' Addresses

   Carsten Bormann
   Universitaet Bremen TZI
   Postfach 330440
   Bremen  D-28359
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org


   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   Email: dev+ietf@seantek.com
   URI:   http://www.penango.com/