idnits 2.17.1 

draft-bormann-cbor-tags-oid-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 15 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.

  -- The draft header indicates that this document updates RFC7049, but the
     abstract doesn't seem to directly say this.  It does mention RFC7049
     though, so this could be OK.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 13, 2017) is 2600 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949)

  ** Obsolete normative reference: RFC 7230 (Obsoleted by RFC 9110, RFC 9112)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Updates: 7049 (if approved)                                   S. Leonard
5	Intended status: Standards Track                           Penango, Inc.
6	Expires: September 14, 2017                               March 13, 2017

8	  Concise Binary Object Representation (CBOR) Tags and Techniques for
9	   Object Identifiers, UUIDs, Enumerations, Binary Entities, Regular
10	                         Expressions, and Sets
11	                     draft-bormann-cbor-tags-oid-06

13	Abstract

15	   The Concise Binary Object Representation (CBOR, RFC 7049) is a data
16	   format whose design goals include the possibility of extremely small
17	   code size, fairly small message size, and extensibility without the
18	   need for version negotiation.

20	   Useful tags and techniques have emerged since the publication of RFC
21	   7049; the present document makes use of CBOR's built-in major types
22	   to define and refine several useful constructs, without changing the
23	   wire protocol.  This document adds object identifiers (OIDs) to CBOR
24	   with CBOR tags <<O>> and <<R>> [values TBD].  It is intended as the
25	   reference document for the IANA registration of the CBOR tags so
26	   defined.  Useful techniques for enumerations and sets are presented
27	   (without new tags).  Because RFC 7049 underspecified binary UUIDs
28	   (tag 37), MIME entities (tag 36), and regular expressions (tag 35),
29	   this document provides more comprehensive specifications.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at http://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on September 14, 2017.

48	Copyright Notice

50	   Copyright (c) 2017 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (http://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	   2.  Object Identifiers
67	   3.  Examples
68	   4.  Discussion
69	   5.  Diagnostic Notation
70	   6.  A New Arc for Concise OIDs
71	   7.  Tag Factoring and Tag Stacking with OID Arrays and Maps
72	   8.  Applications and Examples of OIDs
73	   9.  Universally Unique Identifiers in CBOR
74	   10. Enumerations in CBOR
75	   11. Binary Internet Messages and MIME Entities
76	   12. Applications and Examples of Messages and Entities
77	   13. X.690 Series Tags
78	   14. Regular Expression Clarification
79	   15. Set and Multiset Technique
80	   16. Fruits Basket Example
81	   17. IANA Considerations
82	   18. Security Considerations
83	   19. References
84	   Appendix A.  Changes from -05 to -06
85	   Appendix B.  Changes from -04 to -05
86	   Appendix C.  Changes from -03 to -04
87	   Appendix D.  Changes from -02 to -03
88	   Authors' Addresses

90	1.  Introduction

92	   The Concise Binary Object Representation (CBOR, [RFC7049]) provides
93	   for the interchange of structured data without a requirement for a
94	   pre-agreed schema.  RFC 7049 defines a basic set of data types, as
95	   well as a tagging mechanism that enables extending the set of data
96	   types supported via an IANA registry.

98	   Useful tags and techniques have emerged since the publication of
99	   [RFC7049].  This document makes use of CBOR's built-in major types to
100	   provide for several useful constructs without changing the wire
101	   protocol.

103	   The original focus of this work was to add support for object
104	   identifiers (OIDs, [X.660]), which many IETF protocols carry.  The
105	   ASN.1 Basic Encoding Rules (BER, [X.690]) specify the binary
106	   encodings of both object identifiers and relative object identifiers.
107	   The contents of these encodings can be carried in a CBOR byte string.
108	   This document defines two CBOR tags that cover the two kinds of ASN.1
109	   object identifiers encoded in this way.  The tags can also be applied
110	   to arrays and maps for more articulated identification purposes.  It
111	   is intended as the reference document for the IANA registration of
112	   the tags so defined.  To promote the use and usefulness of OIDs in
113	   CBOR, a new arc is also proposed.

115	   This document covers several useful techniques that have been or are
116	   being developed as implementers are applying CBOR to practical
117	   problems.  Enumerations have found wide utility in CBOR, despite
118	   CBOR's lack of a native enumerated type.  A section covers the
119	   advantages of choosing built-in types, with additional consideration
120	   for using the newly-defined object identifier (OID) and universally
121	   unique identifier (UUID) types in enumerations.  CBOR also lacks a
122	   native set type (in the mathematical sense of an arbitrary unordered
123	   collection of items), but has a more powerful alternative in its
124	   native map type.  A section covers how to adapt the map type to
125	   express set and multiset semantics.

127	   Finally, this document covers the semantics of existing tags in
128	   [RFC7049] that were somewhat underspecified.  "Tag 36 is for MIME
129	   messages", but the reference [RFC2045] actually defines a different
130	   construct, the MIME entity, that finds expression in a variety of
131	   message-oriented Internet protocols.  Similarly, "Tag 35 is for
132	   regular expressions", but the references to Perl Compatible Regular
133	   Expressions (PCRE) and JavaScript syntax (ECMA-262) are not
134	   compatible with each other.  Two sections cover the subtleties of
135	   items tagged with these tags, and so update [RFC7049] without
136	   changing the basic CBOR wire protocol.  One section enhances UUIDs.

138	1.1.  Terminology

140	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
141	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
142	   "OPTIONAL" in this document are to be interpreted as described in RFC
143	   2119 [RFC2119].

145	   The terminology of RFC 7049 applies; in particular the term "byte" is
146	   used in its now customary sense as a synonym for "octet".

148	2.  Object Identifiers

150	   The International Object Identifier tree [X.660] is a hierarchically
151	   managed space of identifiers, each of which is uniquely represented
152	   as a sequence of primary integer values [X.680].  While these
153	   sequences can easily be represented in CBOR arrays of unsigned
154	   integers, a more compact representation can often be achieved by
155	   adopting the widely used representation of object identifiers defined
156	   in BER; this representation may also be more amenable to processing
157	   by other software making use of object identifiers.

159	   BER represents the sequence of unsigned integers by concatenating
160	   self-delimiting [RFC6256] representations of each of the primary
161	   integer values in sequence.

163	   ASN.1 distinguishes absolute object identifiers (ASN.1 Type
164	   "OBJECT IDENTIFIER"), which begin at a root arc ([X.660] Clause
165	   3.5.21), from relative object identifiers (ASN.1 Type "RELATIVE-
166	   OID"), which begin relative to some object identifier known from
167	   context ([X.680] Clause 3.8.63).  As a special optimization, BER
168	   combines the first two integers in an absolute object identifier into
169	   one numeric identifier by making use of the property of the hierarchy
170	   that the first arc has only three integer values (0, 1, and 2), and
171	   the second arcs under 0 and 1 are limited to the integer values
172	   between 0 and 39.  (The root arc "joint-iso-itu-t(2)" has no such
173	   limitations on its second arc.)  If X and Y are the first two
174	   integers, the single integer actually encoded is computed as:

176	      X * 40 + Y

178	   The inverse transformation (again making use of the known ranges of X
179	   and Y) is applied when decoding the object identifier.
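
   The combination of the first two arcs, and its inverse, can be
   sketched as follows.  This is an illustration only, not part of the
   specification: the function names (encode_base128, encode_oid,
   decode_oid) are hypothetical, and the decoder assumes that its input
   has already passed the checks of Section 2.1.

```python
def encode_base128(value: int) -> bytes:
    """Encode one non-negative integer as a self-delimiting base-128 group."""
    if value < 0:
        raise ValueError("arc values must be non-negative")
    out = [value & 0x7F]                      # final byte: high bit clear
    value >>= 7
    while value:
        out.append(0x80 | (value & 0x7F))     # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))


def encode_oid(arcs: list) -> bytes:
    """Return the BER contents octets for an absolute OID given as arcs."""
    if len(arcs) < 2:
        raise ValueError("an absolute OID needs at least two arcs")
    x, y = arcs[0], arcs[1]
    if x not in (0, 1, 2) or y < 0 or (x < 2 and y > 39):
        raise ValueError("first two arcs are out of range")
    # The first two arcs share one encoded integer: X * 40 + Y.
    return b"".join(encode_base128(v) for v in [x * 40 + y, *arcs[2:]])


def decode_oid(contents: bytes) -> list:
    """Inverse of encode_oid; assumes the input was validated beforehand."""
    values, current = [], 0
    for byte in contents:
        current = (current << 7) | (byte & 0x7F)
        if byte < 0x80:                       # high bit clear ends a value
            values.append(current)
            current = 0
    if not values:
        raise ValueError("empty contents cannot be an absolute OID")
    # Arcs 0 and 1 allow Y in 0..39 only, so anything from 80 up is arc 2.
    x = min(values[0] // 40, 2)
    return [x, values[0] - 40 * x, *values[1:]]
```

   Under these assumptions, encode_oid([2, 16, 840, 1, 101, 3, 4, 2, 1])
   produces the nine contents bytes shown in Figure 1.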

181	   Since the semantics of absolute and relative object identifiers
182	   differ, this specification defines two tags:

184	   Tag <<O>> (value TBD): tags a byte string as the [X.690] encoding of
185	   an absolute object identifier (simply "object identifier" or "OID").

187	   Tag <<R>> (value TBD): tags a byte string as the [X.690] encoding of
188	   a relative object identifier (also "relative OID").

190	2.1.  Requirements on the byte string being tagged

192	   A byte string tagged by <<O>> or <<R>> MUST be a syntactically valid
193	   BER representation of an object identifier.  Specifically:

195	   o  its first byte, and any byte that follows a byte that has the most
196	      significant bit unset, MUST NOT be 0x80 (this requirement excludes
197	      expressing the primary integer values with anything but the
198	      shortest form)

200	   o  its last byte MUST NOT have the most significant bit set (this
201	      requirement excludes an incomplete final primary integer value)

203	   If either of these invalid conditions is encountered, it MUST be
204	   treated as a decoding error.  Comparing two OIDs or relative OIDs for
205	   equality in a byte-for-byte fashion may not be safe before these
206	   checks succeed on at least one of them (this includes the case where
207	   one of them is a local constant); a process implementing an exclusion
208	   list MUST check for decoding errors first.

210	   [X.680] restricts RELATIVE-OID values to have at least one arc.  This
211	   specification permits empty relative object identifiers; they may
212	   still be excluded by application semantics.
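
   The two byte-level requirements above can be checked in a single
   pass.  The following is a minimal sketch, not a normative algorithm;
   the function name and the "relative" parameter are hypothetical.
   Rejecting an empty string for an absolute OID is an assumption of
   this sketch, based on an absolute OID needing at least one encoded
   value.

```python
def check_oid_contents(data: bytes, relative: bool = False) -> bool:
    """Return True if data satisfies the two byte-level requirements."""
    if not data:
        # Empty contents are only meaningful for a relative OID.
        return relative
    at_value_start = True          # the first byte starts a value
    for byte in data:
        # A value must not begin with 0x80 (non-shortest form).
        if at_value_start and byte == 0x80:
            return False
        # A byte with the high bit clear ends the current value.
        at_value_start = byte < 0x80
    # The last byte must have ended a value (high bit clear).
    return at_value_start
```

   A caller would run such a check before any byte-for-byte comparison
   and treat a False result as a decoding error.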

214	   [RFC7049] permits byte strings to be indefinite-length, with chunks
215	   divided at arbitrary byte boundaries.  This contrasts with text
216	   strings, where each chunk in an indefinite-length text string is
217	   required to be well-formed UTF-8 on its own: splitting the octets of
218	   a UTF-8 character encoding between chunks is not allowed.

220	   By analogy to this principle and to Clauses 8.9.1 and 8.20.1 of
221	   [X.690], the byte strings carrying the OIDs and relative OIDs are
222	   also to be treated as indivisible units: They MUST be encoded in
223	   definite-length form; indefinite-length form is treated as an
224	   encoding error (and the same considerations as above apply).  (An
225	   added convenience is that CBOR encodings can be searched through
226	   efficiently for specific object identifiers without initiating the
227	   decoding process.)

229	   We provide "binary regular expression" forms for implementation
230	   convenience.  Unlike typical regular expressions that operate on
231	   character sequences, the following regular expressions take bytes as
232	   their domain, so they can be applied directly to CBOR byte strings.

234	   For byte strings with tag <<O>>:

236	      "/^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])+$/"

238	   For byte strings with tag <<R>>:

240	      "/^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])*$/"

242	   Putative CBOR data that fails these tests SHALL be rejected as
243	   improperly coded.

245	   Another (possibly more efficient) way to validate the byte strings is
246	   to hunt for prohibited patterns.

248	   For byte strings with tag <<O>>:

250	      "/^$|(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/"

252	   or with lookbehind:

254	      "/^$|^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/"

256	   For byte strings with tag <<R>>:

258	      "/(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/"

260	   or with lookbehind:

262	      "/^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/"

264	   Putative CBOR data that passes these tests SHALL be rejected as
265	   improperly coded.
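
   As an illustration, the accepting and the prohibited-pattern forms
   can be written as byte-oriented patterns for Python's "re" module.
   This is a sketch under stated assumptions: the names are
   hypothetical, the accepting forms rely on fullmatch() instead of
   explicit anchors, and the prohibited forms use \A and \Z in place of
   ^ and $, because "$" in Python also matches just before a trailing
   newline byte (0x0A), which is a legitimate OID content byte.

```python
import re

# Accepting forms: the whole byte string must match.
OID_OK = re.compile(rb"(?:(?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])+")
REL_OID_OK = re.compile(rb"(?:(?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])*")

# Prohibited-pattern forms: any match anywhere means "reject".
OID_BAD = re.compile(rb"\A\Z|(?:\A|[\x00-\x7F])\x80|[\x80-\xFF]\Z")
REL_OID_BAD = re.compile(rb"(?:\A|[\x00-\x7F])\x80|[\x80-\xFF]\Z")


def accepts(data: bytes, relative: bool = False) -> bool:
    """True if data fully matches the accepting form for its tag."""
    pattern = REL_OID_OK if relative else OID_OK
    return pattern.fullmatch(data) is not None


def rejects(data: bytes, relative: bool = False) -> bool:
    """True if data contains one of the prohibited patterns."""
    pattern = REL_OID_BAD if relative else OID_BAD
    return pattern.search(data) is not None
```

   For any input, the two approaches are expected to disagree: a byte
   string that accepts() allows should never be flagged by rejects(),
   and vice versa.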

267	   (It is worth pointing out that these tests, when optimally
268	   implemented, ought to be markedly faster than UTF-8 validation.)

270	3.  Examples

272	   In the following examples, we are using tag number 6 for <<O>> and
273	   tag number 7 for <<R>>.  See Section 17.2.

275	3.1.  Encoding of the SHA-256 OID

277	   ASN.1 Value Notation
278	   { joint-iso-itu-t(2) country(16) us(840) organization(1) gov(101)
279	   csor(3) nistalgorithm(4) hashalgs(2) sha256(1) }

281	   Dotted Decimal Notation (also XML Value Notation)
282	   2.16.840.1.101.3.4.2.1
283	   06                                # UNIVERSAL TAG 6
284	      09                             # 9 bytes, primitive
285	         60 86 48 01 65 03 04 02 01  # X.690 Clause 8.19
286	   #      |   840  1  |  3  4  2  1    show component encoding
287	   #   2.16         101

289	                       Figure 1: SHA-256 OID in BER

291	   C6                                # 0b110_00110: mt 6, tag 6
292	      49                             # 0b010_01001: mt 2, 9 bytes
293	         60 86 48 01 65 03 04 02 01  # X.690 Clause 8.19

295	                       Figure 2: SHA-256 OID in CBOR
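
   The step from the BER contents to the CBOR item of Figure 2 only
   prepends two CBOR heads: one for the tag (major type 6) and one for
   the byte string (major type 2).  A minimal sketch follows; the
   function names are hypothetical, and tag number 6 is the provisional
   value used in these examples, not an assigned one.

```python
def cbor_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR initial byte plus any following argument bytes."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        # Small arguments live in the low five bits of the initial byte.
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            initial = bytes([(major_type << 5) | additional_info])
            return initial + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def tagged_byte_string(tag: int, contents: bytes) -> bytes:
    """Wrap contents in a definite-length byte string under the given tag."""
    return cbor_head(6, tag) + cbor_head(2, len(contents)) + contents


sha256_contents = bytes.fromhex("608648016503040201")
encoded = tagged_byte_string(6, sha256_contents)
print(encoded.hex())  # c649608648016503040201
```

   Because the contents here are shorter than 24 bytes, the byte string
   head fits in a single byte, so the CBOR item carries the same two
   bytes of overhead as the BER encoding in Figure 1.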

297	3.2.  Encoding of a UUID OID

299	   UUID
300	   8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b

302	   ASN.1 Value Notation
303	   { joint-iso-itu-t(2) uuid(25)
304	   geomicaGPAS(184830721219540099336690027854602552603) }

306	   Dotted Decimal Notation (also XML Value Notation)
307	   2.25.184830721219540099336690027854602552603

309	   06                                # UNIVERSAL TAG 6
310	      14                             # 20 bytes, primitive
311	         69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B
312	   #      |                  184830721219540099336690027854602552603
313	   #   2.25

315	              Figure 3: UUID in an object identifier, in BER

317	   C6                                # 0b110_00110: mt 6, tag 6
318	      54                             # 0b010_10100: mt 2, 20 bytes
319	         69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B

321	              Figure 4: UUID in an object identifier, in CBOR

323	3.3.  Encoding of a MIB Relative OID

325	   Given some OID (e.g., "lowpanMib", assumed to be "1.3.6.1.2.1.226"
326	   [RFC7388]), to which the following is added:

328	   ASN.1 Value Notation (not suitable for diagnostic notation)
329	   { lowpanObjects(1) lowpanStats(1) lowpanOutTransmits(29) }
330	   Dotted Decimal Notation (diagnostic notation; see Section 5)
331	   .1.1.29

333	   0D                                # UNIVERSAL TAG 13
334	      03                             # 3 bytes, primitive
335	         01 01 1D                    # X.690 Clause 8.20
336	   #      1  1 29                      show component encoding

338	             Figure 5: MIB relative object identifier, in BER

340	   C7                                # 0b110_00111: mt 6, tag 7
341	      43                             # 0b010_00011: mt 2 (bstr), 3 bytes
342	         01 01 1D                    # X.690 Clause 8.20

344	             Figure 6: MIB relative object identifier, in CBOR

346	   This relative OID saves seven bytes compared to the full OID
347	   encoding.

349	4.  Discussion

351	   Staying close to the way object identifiers are encoded in ASN.1 BER
352	   makes back-and-forth translation easy.  Object identifiers in IETF
353	   protocols are serialized in dotted decimal form or BER form, so there
354	   is an advantage in not inventing a third form.  Also, expectations of
355	   the cost of encoding object identifiers are based on BER; using a
356	   different encoding might not be aligned with these expectations.  If
357	   additional information about an OID is desired, lookup services such
358	   as the OID Resolution Service (ORS) [X.672] and the OID Repository
359	   [OID-INFO] are available.

361	   This specification allocates two numbers out of the single-byte tag
362	   space.  This use of code point space is justified by the wide use of
363	   object identifiers in data interchange.  For most common OIDs in use
364	   (namely those whose contents encode to less than 24 bytes), the CBOR
365	   encoding will match the efficiency of [X.690].  (This preliminary
366	   conclusion is likely to generate some discussion, see Section 17.2.)

368	5.  Diagnostic Notation

370	   Implementers will likely want to see OIDs and relative OIDs in their
371	   "natural forms" (as sequences of decimal unsigned integers) for
372	   diagnostic purposes.  Accordingly, this section defines additional
373	   syntactic elements that can be used in conjunction with the
374	   diagnostic notation described in Section 6 of [RFC7049].

376	   An object identifier may be written in ASN.1 value notation (with
377	   enclosing braces and secondary identifiers, ObjectIdentifierValue of
378	   Clause 32.3 of [X.680]), or in dotted decimal notation with at least
379	   three arcs.  Both notations are shown in Section 3.  The surrounding
380	   tag notation is not to be used, because the tag is implied.  The
381	   ASN.1 value notation for OIDs does not overlap with JSON object
382	   notation for CBOR maps, because at least two arcs are required for a
383	   valid OID.

385	   A relative object identifier may be written in dotted decimal
386	   notation or in ASN.1 value notation, in both cases prefixed with a
387	   dot as shown in Section 3.3.  The surrounding tag notation is not to
388	   be used, because the tag is implied.

390	   The notation in this section may be employed in addition to the basic
391	   notation, which would be a tagged binary string.

393	       +------------------------------+--------------+------------+
394	       | RFC 7049 diagnostic notation | 6(h'2b0601') | 7(h'0601') |
395	       +------------------------------+--------------+------------+
396	       | Dotted decimal notation      | 1.3.6.1      | .6.1       |
397	       | ASN.1 value notation         | {1 3 6 1}    | .{6 1}     |
398	       +------------------------------+--------------+------------+

400	            Table 1: Examples for extended diagnostic notation
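
   A diagnostic printer could derive the notations of Table 1 from a
   tagged byte string roughly as follows.  This is a sketch only: the
   function name is hypothetical, tags 6 and 7 are the provisional
   numbers used in the examples, the input is assumed to be already
   validated, and falling back to ASN.1 value notation for an absolute
   OID with fewer than three arcs is a choice made for this sketch.

```python
OID_TAG, REL_OID_TAG = 6, 7   # provisional numbers used in the examples


def diagnostic(tag: int, contents: bytes, asn1: bool = False) -> str:
    """Render OID contents in dotted decimal or ASN.1 value notation."""
    arcs, current = [], 0
    for byte in contents:
        current = (current << 7) | (byte & 0x7F)
        if byte < 0x80:                # high bit clear ends a value
            arcs.append(current)
            current = 0
    if tag == OID_TAG:
        if not arcs:
            raise ValueError("an absolute OID cannot be empty")
        # Undo the X * 40 + Y combination of the first two arcs.
        x = min(arcs[0] // 40, 2)
        arcs = [x, arcs[0] - 40 * x] + arcs[1:]
        prefix = ""
        # Dotted decimal needs at least three arcs; otherwise use braces.
        asn1 = asn1 or len(arcs) < 3
    elif tag == REL_OID_TAG:
        prefix = "."                   # relative OIDs are dot-prefixed
    else:
        raise ValueError("not an OID or relative OID tag")
    text = [str(arc) for arc in arcs]
    if asn1:
        return prefix + "{" + " ".join(text) + "}"
    return prefix + ".".join(text)
```

   With the byte strings from Table 1, diagnostic(6, h'2b0601') gives
   "1.3.6.1" and diagnostic(7, h'0601') gives ".6.1".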

402	6.  A New Arc for Concise OIDs

404	   Object identifiers in [X.690] form are remarkably compact.
405	   Nevertheless, for some applications (and engineers), they are simply
406	   not compact enough, at least when compared to certain alternatives
407	   such as very small unsigned integers (see Section 10).  The shortest
408	   object identifier under the IETF's control is 1.3.6.1 (4 bytes),
409	   although nothing has been assigned directly under that arc since
410	   1999 [RFC2506], and no assignment directly under that arc has ever
411	   been made to a protocol element.  The shortest IETF-controlled,
412	   First-Come, First-Served OID arc occupies 8 bytes; it is obtained
413	   with a Private Enterprise Number from IANA, whose OID is assigned
414	   under 1.3.6.1.4.1.  To promote object identifier usage in CBOR and to
415	   make OIDs as competitive as possible, (the authors / the IETF / ISOC)
416	   have secured a very short arc "{ x y z }" that only occupies (1, 2,
417	   3) byte(s).

419	   [[NB: Registration procedures under that arc.]]

421	   The history of OIDs suggests that the human mind tends to excessive
422	   taxonomy around them.  "Excessive taxonomy" means that while
423	   classifying purposes are served, the detailed taxonomy comes at the
424	   expense of concise encoding to the point that other implementers
425	   complain that the OIDs are "too long".  OIDs also lose mnemonic
426	   properties when the arcs are so long that implementers cannot keep
427	   track of all of the divisions.  Unlike assignments in the 1.3.6.1
428	   range, this document suggests that registrants acquire OIDs under
429	   this short arc "laterally" rather than hierarchically, in keeping
430	   with CBOR's design goal to have concise serializations.

432	7.  Tag Factoring and Tag Stacking with OID Arrays and Maps

434	   A common use of object identifiers in ASN.1 is to identify the kind
435	   of data in an open type (Clause 3.8.57 of [X.680]), using information
436	   object classes [X.681].  CBOR is schema-neutral, and (although not
437	   fully discussed in [RFC7049]) semantic tagging was originally
438	   intended to identify items in a global, context-free way (i.e., where
439	   a specification would not repurpose a tag with different semantics
440	   than its IANA registration).  Therefore, using OIDs to identify
441	   contextual data in a similar fashion to [X.681] is RECOMMENDED.

443	7.1.  Tag Factoring

445	   <<O>> and <<R>> can tag CBOR arrays and maps.  The idea is that the
446	   tag is factored out from each individual byte string; the tag is
447	   placed in front of the array or map instead.  The tags <<O>> and
448	   <<R>> are left-distributive.

450	   When the <<O>> or <<R>> tag is applied to an array, it means that the
451	   respective tag is imputed to all items in the array.  For example,
452	   when the array is tagged with <<O>>, every array item that is a
453	   binary string is an OID.

455	   When the <<O>> or <<R>> tag is applied to a map, it means that the
456	   respective tag is imputed to all keys in the map.  The values in the
457	   map are not considered specially tagged.

459	   Array and map stacking is permitted.  For example, a 3-dimensional
460	   array of OIDs can be composed by using a single <<O>> tag, followed
461	   by an array of arrays of arrays of binary strings.  All such binary
462	   strings are considered OIDs.

464	7.2.  Switching OID and Relative OID

466	   If an individual item in a <<O>> or <<R>> tagged array, or an
467	   individual key in a <<O>> or <<R>> tagged map, is tagged with the
468	   opposite tag (<<R>> or <<O>>) of the array or map itself, that tag
469	   cancels and replaces the outer tag for that item.  Like tags MUST NOT
470	   be used on such individual items; such tagging is a coding error.
471	   For example, if <<R>> is the outer tag on an array and <<O>> is the
472	   inner tag on a binary string, semantically the inner item is treated
473	   as a regular OID, not as a relative OID.
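
   The factoring and switching rules can be sketched as a recursive
   walk over a decoded item.  Everything here is an assumption made for
   illustration: decoded arrays are modelled as Python lists, maps as
   dicts, byte strings as bytes, and tagged items as a hypothetical
   "Tagged" class; tags 6 and 7 are the provisional numbers used in the
   examples.

```python
from dataclasses import dataclass
from typing import Any

OID_TAG, REL_OID_TAG = 6, 7   # provisional numbers used in the examples


@dataclass(frozen=True)
class Tagged:
    """Hypothetical stand-in for a decoded CBOR tagged item."""
    tag: int
    value: Any


def resolve(outer: int, item: Any) -> Any:
    """Impute the outer OID tag to every byte string it governs."""
    if isinstance(item, bytes):
        return Tagged(outer, item)             # factored-out tag applies
    if isinstance(item, Tagged) and item.tag in (OID_TAG, REL_OID_TAG):
        if item.tag == outer:
            raise ValueError("like tag on an inner item is a coding error")
        # The opposite tag cancels and replaces the outer tag.
        return resolve(item.tag, item.value)
    if isinstance(item, list):
        return [resolve(outer, element) for element in item]
    if isinstance(item, dict):
        # Only the keys of a map receive the tag; values are untouched.
        return {resolve(outer, key): value for key, value in item.items()}
    return item
```

   Applied to an array tagged <<R>> that contains one <<O>>-tagged byte
   string, every plain byte string comes out as a relative OID and the
   inner-tagged one as an absolute OID.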

475	   The purpose is to create more compact and flexible identifier spaces,
476	   especially when object identifiers are used as enumerated items.
477	   Examples:

479	   <<R>> outside, <<O>> inside: An implementation that strives for a
480	   compact representation does not have to emit base OID arcs
481	   repeatedly for each item.  At the same time, if a private
482	   organization or standards body separate from the specification needs
483	   to identify something that the specification maintainers disagree
484	   with, the separate body does not need to request registration of an
485	   identifier under a controlled arc (i.e., the base arc of the relative
486	   OIDs).

488	   <<O>> outside, <<R>> inside: A collection of OIDs is supposed to be
489	   open to all-comers, but a certain set of OIDs issued under a
490	   particular arc is foreseeable for the majority of implementations.
491	   For example, an OID protocol slot may identify cryptographic
492	   algorithms: anyone can write (and has written) an algorithm with an
493	   arbitrary OID.  However, the protocol slot designer may wish to
494	   privilege certain algorithms (and therefore OIDs) that are well-known
495	   in that field of use.

497	7.3.  Tag Stacking

499	   CBOR permits tag stacking (tagging a tagged item), although this
500	   technique has not been used much yet.  This specification anticipates
501	   that OIDs and relative OIDs will be associated with values with
502	   uniform semantics.  This section provides specific semantics when
503	   tags are "stacked", that is, a CBOR item starts with tag <<O>> or
504	   <<R>>, followed by one or more arbitrary tags ("subsequent tags"),
505	   followed by a map or array.

507	7.3.1.  Map

509	   The overall gist is that the first tag applies to the keys in a map;
510	   the subsequent tags apply to the values in a map.

512	   When <<O>> or <<R>> is the first tag in a stack of tags, followed by
513	   a map:

515	   o  The <<O>> or <<R>> tag indicates that the keys of the map are byte
516	      string OIDs, byte string relative OIDs, or tag-factored arrays or
517	      maps of the same.

519	   o  The subsequent tags uniformly apply to all of the values.

521	   For example, if tag 32 (URI) is the subsequent tag, then all values
522	   in the map are treated semantically as if tag 32 is applied to them
523	   individually.  See Figure 7.

525	   Individual values can also be tagged.  Semantically, these tags
526	   accumulate with the outer subsequent tags; inner value tags do not
527	   cancel or replace the outer tags.

529	7.3.2.  Array

531	   The overall gist is that the first tag applies to the ordered "keys"
532	   in the array (even-numbered items, assuming that the index starts at
533	   0); the subsequent tags apply to the ordered "values" in the array
534	   (odd-numbered items).  This tagging technique creates an ordered
535	   associative array.  [[NB: Some call this the FORTRAN approach. need
536	   to cite]]

538	   When <<O>> or <<R>> is the first tag in a stack of tags, followed by
539	   an array:

541	   o  The <<O>> or <<R>> tag indicates that alternating items, starting
542	      with the first item, are byte string OIDs, byte string relative
543	      OIDs, or tag-factored arrays or maps of the same.

545	   o  The subsequent tags uniformly apply to the alternating items,
546	      starting with the second item.

548	   o  The array MUST have an even number of items; an array that has an
549	      odd number of items is a coding error.
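
   The three rules above amount to splitting the array into alternating
   keys and values.  A minimal sketch follows, with assumptions stated
   plainly: the function name is hypothetical, a tagged item is
   modelled as a (tag, value) tuple, and keys are taken to be plain
   byte strings (tag-factored arrays or maps as keys are not handled).

```python
def unstack_array(first_tag: int, subsequent_tags: list, items: list) -> list:
    """Split a tag-stacked array into (tagged key, tagged value) pairs."""
    if len(items) % 2 != 0:
        raise ValueError("an odd number of items is a coding error")
    pairs = []
    # Items 0, 2, 4, ... are keys; items 1, 3, 5, ... are values.
    for key, value in zip(items[0::2], items[1::2]):
        # Apply the subsequent tags innermost-first so the first of
        # them ends up outermost, as in the stacked encoding.
        for tag in reversed(subsequent_tags):
            value = (tag, value)
        pairs.append(((first_tag, key), value))
    return pairs
```

   The result preserves the order of the array, which is what
   distinguishes this ordered associative array from a tagged map.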

551	   To create an ordered associative array whose values (odd-numbered
552	   items) are arbitrarily tagged, stack tag 55799, self-describe CBOR
553	   (Section 2.4.5 of [RFC7049]), after the <<O>> or <<R>> tag.  Tag
554	   55799 imparts no special semantics, so it is an effective
555	   placeholder.  (This sequence is mainly provided for completeness: it
556	   is a more compact alternative to an array of duple-arrays that each
557	   contain an OID or relative OID, and an arbitrary value.)

559	7.4.  Diagnostic Notation for OID Arrays and Maps

561	   There are no syntactic changes to diagnostic notation beyond
562	   Section 5.  Using <<O>> or <<R>> with arrays and maps, however, leads
563	   to some sublime results.

565	   When an array or map is tagged, that item is embraced with the usual
566	   tag format: "<<O>>(<item>)" or "<<R>>(<item>)".  This syntax
567	   indicates the presence of the tag on the outer item.  Inner items in
568	   the array or keys in the map are noted in Section 5 form, but are not
569	   individually tagged on-the-wire when the tag is the same as the outer
570	   tag, because like-tagging is a coding error.

572	   An array or map that involves a stack of tags is notated the usual
573	   way.  For example, the CBOR diagnostic notation of a map of OIDs to
574	   URIs is:

576	   6(32({0.9.2342.7776.1: "http://example.com/",
577	         0.9.2342.7776.2: "ftp://ftp.example.com/pub/"}))

579	   Figure 7: Map of OIDs to URIs, in CBOR Diagnostic Notation

581	8.  Applications and Examples of OIDs

583	8.1.  GPU Farm

585	   Consider a 3-dimensional OID array, indicating certain operations to
586	   perform on a matrix of values in a GPU farm.  Default operations are
587	   under the OID arc 0.9.2342.7777 (such as .1, .2, .124, etc.); the arc
588	   0.9.2342.7777 itself represents the identity operation.  Certain
589	   cryptographic operations like SHA-256 hashing
590	   (2.16.840.1.101.3.4.2.1) are also permitted.  The resulting notation
591	   would be:

593	   7([[[.1,   .2,   .3],
594	       [.1,   .2,   .3],
595	       [.1,   .2,   .3]],
596	      [[.124, .125, .126],
597	       [.95,  .96,  .97 ],
598	       [.11,  .12,  .13 ]],
599	      [[h'',  .6,   .4.2],
600	       [.6,   h'',  .4.2],
601	       [.6,   2.16.840.1.101.3.4.2.1, h'']]])

603	     Figure 8: GPU Farm Matrix Operations, in CBOR Diagnostic Notation

605	   c7                                   # tag(7)
606	      83                                # array(3)
607	         83                             # array(3)
608	            83                          # array(3)
609	               41 01                    # .1 (2)
610	               41 02                    # .2 (2)
611	               41 03                    # .3 (2)
612	            83                          # array(3)
613	               41 01                    # .1 (2)
614	               41 02                    # .2 (2)
615	               41 03                    # .3 (2)
616	            83                          # array(3)
617	               41 01                    # .1 (2)
618	               41 02                    # .2 (2)
619	               41 03                    # .3 (2)
620	         83                             # array(3)
621	            83                          # array(3)
622	               41 7c                    # .124 (2)
623	               41 7d                    # .125 (2)
624	               41 7e                    # .126 (2)
625	            83                          # array(3)
626	               41 5f                    # .95 (2)
627	               41 60                    # .96 (2)
628	               41 61                    # .97 (2)
629	            83                          # array(3)
630	               41 0b                    # .11 (2)
631	               41 0c                    # .12 (2)
632	               41 0d                    # .13 (2)
633	         83                             # array(3)
634	            83                          # array(3)
635	               40                       # (empty) (1)
636	               41 06                    # .6 (2)
637	               42 0402                  # .4.2 (3)
638	            83                          # array(3)
639	               41 06                    # .6 (2)
640	               40                       # (empty) (1)
641	               42 0402                  # .4.2 (3)
642	            83                          # array(3)
643	               41 06                    # .6 (2)
644	               c6 49 608648016503040201 # 2.16.840.1.101.3.4.2.1 (11)
645	               40                       # (empty) (1)

647	         Figure 9: GPU Farm Matrix Operations, in CBOR (76 bytes)
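
   The byte strings in Figure 9 follow from the base-128 arc encoding
   of [X.690].  The following Python fragment is a minimal sketch, not
   part of this specification, that reproduces three of the entries.
   The function names are illustrative assumptions, and the sketch
   handles only byte strings shorter than 24 bytes.

```python
def encode_arc(value: int) -> bytes:
    """Encode one non-negative arc in base 128, most significant group first."""
    if value < 0:
        raise ValueError("arcs are non-negative")
    groups = [value & 0x7F]                   # last octet: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier octets: continuation bit set
        value >>= 7
    return bytes(reversed(groups))


def encode_relative_oid(arcs) -> bytes:
    """Concatenate the base-128 encodings of the arcs of a relative OID."""
    return b"".join(encode_arc(a) for a in arcs)


def encode_oid(arcs) -> bytes:
    """Encode an absolute OID; the first two arcs merge into 40*X + Y."""
    first, second, *rest = arcs
    return encode_arc(40 * first + second) + encode_relative_oid(rest)


def byte_string(payload: bytes) -> bytes:
    """Prefix a payload with a CBOR definite-length byte string head."""
    if len(payload) >= 24:
        raise ValueError("sketch handles payloads under 24 bytes only")
    return bytes([0x40 | len(payload)]) + payload


print(byte_string(encode_relative_oid([124])).hex())     # entry ".124"
print(byte_string(encode_relative_oid([4, 2])).hex())    # entry ".4.2"
sha256 = encode_oid([2, 16, 840, 1, 101, 3, 4, 2, 1])
print((b"\xc6" + byte_string(sha256)).hex())             # tagged absolute OID
```

   The absolute OID carries its own tag because it sits inside an
   array whose outer tag denotes relative OIDs.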

649	8.2.  X.500 Distinguished Name

651	   Consider the X.500 distinguished name:

653	   +----------------------------------------------+--------------------+
654	   | Attribute Types                              | Attribute Values   |
655	   +----------------------------------------------+--------------------+
656	   | c (2.5.4.6)                                  | US                 |
657	   +----------------------------------------------+--------------------+
658	   | l (2.5.4.7)                                  | Los Angeles        |
659	   | st (2.5.4.8)                                 | CA                 |
660	   | postalCode (2.5.4.17)                        | 90013              |
661	   +----------------------------------------------+--------------------+
662	   | street (2.5.4.9)                             | 532 S Olive St     |
663	   +----------------------------------------------+--------------------+
664	   | businessCategory (2.5.4.15)                  | Public Park        |
665	   | buildingName (0.9.2342.19200300.100.1.48)    | Pershing Square    |
666	   +----------------------------------------------+--------------------+

668	                 Table 2: Example X.500 Distinguished Name

670	   Table 2 has four RDNs.  The country and street RDNs are single-
671	   valued.  The second and fourth RDNs are multi-valued.

673	   The equivalent representations in CBOR diagnostic notation and CBOR
674	   are:

676	   6([{ 2.5.4.6: "US" },
677	      { 2.5.4.7: "Los Angeles", 2.5.4.8: "CA", 2.5.4.17: "90013" },
678	      { 2.5.4.9: "532 S Olive St" },
679	      { 2.5.4.15: "Public Park",
680	        0.9.2342.19200300.100.1.48: "Pershing Square" }])

682	        Figure 10: Distinguished Name, in CBOR Diagnostic Notation

684	   6([{ h'550406': "US" },
685	      { h'550407': "Los Angeles", h'550408': "CA", h'550411': "90013" },
686	      { h'550409': "532 S Olive St" },
687	      { h'55040f': "Public Park",
688	        h'0992268993f22c640130': "Pershing Square" }])

690	   Figure 11: Distinguished Name, in CBOR Diagnostic Notation (RFC 7049
691	                                   only)

693	   c6                                         # tag(6)
694	      84                                      # array(4)
695	         a1                                   # map(1)
696	            43 550406                         # 2.5.4.6 (4)
697	            62                                # text(2)
698	               5553                           # "US"
699	         a3                                   # map(3)
700	            43 550407                         # 2.5.4.7 (4)
701	            6b                                # text(11)
702	               4c6f7320416e67656c6573         # "Los Angeles"
703	            43 550408                         # 2.5.4.8 (4)
704	            62                                # text(2)
705	               4341                           # "CA"
706	            43 550411                         # 2.5.4.17 (4)
707	            65                                # text(5)
708	               3930303133                     # "90013"
709	         a1                                   # map(1)
710	            43 550409                         # 2.5.4.9 (4)
711	            6e                                # text(14)
712	               3533322053204f6c697665205374   # "532 S Olive St"
713	         a2                                   # map(2)
714	            43 55040f                         # 2.5.4.15 (4)
715	            6b                                # text(11)
716	               5075626c6963205061726b         # "Public Park"
717	            4a 0992268993f22c640130    # 0.9.2342.19200300.100.1.48 (11)
718	            6f                                # text(15)
719	               5065727368696e6720537175617265 # "Pershing Square"

721	            Figure 12: Distinguished Name, in CBOR (108 bytes)

723	   (This example encoding assumes that all attribute values are UTF-8
724	   strings, or can be represented as UTF-8 strings with no loss of
725	   information.)
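
   As a cross-check of Figure 12, the following Python fragment is a
   minimal sketch that serializes the same distinguished name.  It
   assumes the pre-encoded OID byte strings of Figure 11 as map keys
   and the tag value 6 used in the figures; the helper names are
   illustrative and not part of any library.

```python
def head(major: int, n: int) -> bytes:
    """CBOR initial byte(s) for a major type and an argument below 256."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    raise ValueError("sketch handles arguments below 256 only")


def bstr(data: bytes) -> bytes:
    return head(2, len(data)) + data


def tstr(text: str) -> bytes:
    utf8 = text.encode("utf-8")
    return head(3, len(utf8)) + utf8


# One dict per RDN; keys are the BER-encoded attribute type OIDs in hex.
DN = [
    {"550406": "US"},
    {"550407": "Los Angeles", "550408": "CA", "550411": "90013"},
    {"550409": "532 S Olive St"},
    {"55040f": "Public Park", "0992268993f22c640130": "Pershing Square"},
]


def encode_dn(rdns) -> bytes:
    out = head(6, 6) + head(4, len(rdns))    # tag 6 on an array of maps
    for rdn in rdns:
        out += head(5, len(rdn))             # map header for this RDN
        for oid_hex, value in rdn.items():
            out += bstr(bytes.fromhex(oid_hex)) + tstr(value)
    return out


encoded = encode_dn(DN)
print(len(encoded))
print(encoded.hex())
```

   The printed length should agree with the byte count given in the
   caption of Figure 12.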

727	   For reference, the [RFC4514] LDAP string encoding of such data would
728	   be:

730	   buildingName=Pershing Square+businessCategory=Public Park,
731	   street=532 S Olive St,l=Los Angeles+postalCode=90013+st=CA,c=US

733	    Figure 13: Distinguished Name, in LDAP String Encoding (121 bytes)

735	9.  Universally Unique Identifiers in CBOR

737	   This section provides guidance on the Universally Unique Identifier
738	   (UUID) type, which was introduced into CBOR with tag <<U>> (currently
739	   tag 37, reassignment to be discussed in view of this section).  A
740	   UUID [RFC4122] is 128 bits long and requires no central registration
741	   process.  UUIDs were originally used in the Apollo Network Computing
742	   System and later in the Open Software Foundation's (OSF) Distributed
743	   Computing Environment (DCE), for Remote Procedure Calls (RPC)
744	   [DCE-RPC].

746	   As a tagged binary string identifier type in CBOR, the UUID type
747	   shares several characteristics with OID types.  The main differences
748	   are that a UUID is always 16 bytes (anything less or more is a coding
749	   error), there is no central assignment process, and every 128-bit
750	   combination is valid.  ([RFC4122] calls out the nil UUID, which is
751	   special but perfectly valid.)  Optional registries have cropped up
752	   over the years; one such registry is [OID-INFO].  Users who use UUIDs
753	   in CBOR are strongly encouraged to document their UUIDs in such
754	   registries.

756	   To provide parity with OIDs, UUIDs MUST be encoded in definite-length
757	   form (see Section 2).  Consequently, individual UUIDs can be easily
758	   searched for by looking for "d8 25" (major type 6, tag 37), "50"
759	   (major type 2, additional information 16), and 16 bytes.  Therefore,
760	   a directly encoded UUID in CBOR occupies 19 bytes.  In contrast,
761	   stuffing a UUID in an OID in CBOR requires 22 bytes (see Figure 4);
762	   conversion between OID-UUID form and binary or string UUID forms
763	   requires bit-shifting (but mercifully not base-shifting, see
764	   Section 18.1).  An example based on Figure 4 is below:

766	   D8 25                             # tag(37)
767	      50                             # 0b010_10000: mt 2, 16 bytes
768	         8B 0D 1A 20 DC C5 11 D9 BD A9 00 02 A5 D5 C5 1B

770	                      Figure 14: Binary UUID in CBOR
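
   The fixed 19-byte layout described above can be sketched with the
   Python standard library "uuid" module.  The function names below
   are illustrative assumptions; the sketch accepts only the
   definite-length form with tag 37.

```python
import uuid

UUID_TAG = bytes([0xD8, 0x25])    # major type 6, tag 37
BSTR_16 = bytes([0x50])           # major type 2, additional information 16


def encode_uuid(u: uuid.UUID) -> bytes:
    """Tag 37, a 16-byte byte string head, then the UUID in network order."""
    return UUID_TAG + BSTR_16 + u.bytes


def decode_uuid(item: bytes) -> uuid.UUID:
    """Reject anything that is not exactly the 19-byte form."""
    if len(item) != 19 or item[:3] != UUID_TAG + BSTR_16:
        raise ValueError("not a definite-length, 16-byte tag 37 item")
    return uuid.UUID(bytes=item[3:])


example = uuid.UUID("8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b")
wire = encode_uuid(example)
print(len(wire), wire.hex())
print(decode_uuid(wire))          # string representation, lowercase
```

   Because the prefix is constant, a receiver can locate a UUID by
   matching the three leading bytes and then reading 16 more.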

772	9.1.  Diagnostic Notation

774	   Implementers will likely want to see UUIDs in their "natural forms"
775	   for diagnostic purposes.  Accordingly, this section defines
776	   additional syntactic elements that can be used in conjunction with
777	   the diagnostic notation described in Section 6 of [RFC7049].

779	   A universally unique identifier may be written in "string
780	   representation" as that term is defined in [RFC4122].  An example of
781	   such a string is "8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b" (see Figure 4
782	   and Figure 14).  Lowercase is the preferred form.  (TBD: permit,
783	   require, or prohibit curly brace form?)

785	   The notation in this section may be employed in addition to the basic
786	   notation, which would be a tagged binary string.

788	9.2.  Tag Factoring and Tag Stacking

790	   Tag Factoring and Tag Stacking are hereby permitted with the UUID
791	   type, with the same semantics as Section 7.

793	10.  Enumerations in CBOR

795	   This section provides a roadmap to using enumerated items in CBOR,
796	   including design considerations for choosing between OIDs, UUIDs,
797	   integers, and UTF-8 strings.

799	   CBOR does not have an ENUMERATED type like ASN.1 to identify named
800	   values in a protocol element with three or more states (Clause 20 and
801	   Clause G.2.3 of [X.680]).  ASN.1 ENUMERATED turns out to be
802	   superfluous because ASN.1 INTEGER values can get named (and have
803	   historically been used for finite, multistate variables, such as
804	   version numbers), while ASN.1 ENUMERATED types can be defined to be
805	   extensible with the ellipsis lexical item.  Practically, the named
806	   integers are not serialized in the binary encodings anyway; they
807	   merely serve as semantic hints for designers and debuggers.

809	   CBOR expects that protocol designers will use one of the basic major
810	   types for multistate variables, assigning semantics to particular
811	   values using higher-level schemas.  The obvious choices for the basic
812	   types are integers (particularly unsigned integers) and UTF-8
813	   strings.  However, these major types are not without drawbacks.

815	   Integers are compact for small values, but have a flat namespace so
816	   there are mis-assignment and collision risks that can only be
817	   mitigated with protocol-specific registries.  Arrays of integers are
818	   possible, but arrays require more processing logic for equality
819	   comparisons, and the JSON conversion is not intuitive when the
820	   enumerated value serves as a key in a map.

822	   UTF-8 strings are less compact when the strings are supposed to
823	   resemble their semantics, and there are normalization issues if the
824	   strings contain characters beyond the ASCII range.  UTF-8 strings
825	   also comprise a flat namespace like integers unless the higher-level
826	   schema employs delimiters, which makes the string even larger.  If
827	   conciseness is a design goal, other perceived advantages of a string
828	   as an identifier are pretty much blown out the moment one has to tack
829	   "https://" onto the front.

831	   This section provides novel alternatives in OIDs and UUIDs.  It
832	   compares and contrasts these binary types to other enumerants, namely
833	   integers and text (UTF-8) strings.

835	10.1.  Factors Favoring OID Enumerations

837	   A protocol designer might choose OIDs or relative OIDs for an
838	   enumerated item in view of the following observations:

840	   1.  OIDs and relative OIDs are quite compact: a single-arc relative
841	       OID encoded according to this specification occupies just two
842	       bytes for primary integer values 0-127 (excluding the semantic
843	       tag <<R>>), and three bytes for primary integer values 128-16383.
844	       (In contrast, an unsigned integer requires one byte for 0-23, two
845	       bytes for 24-255, and three bytes for 256-65535.)

847	   2.  OIDs and relative OIDs (with base) are persistent and globally
848	       unambiguous.

850	   3.  OIDs and relative OIDs have built-in semantics for designers and
851	       debuggers.  Specifically, the advent of universal OID
852	       repositories such as [OID-INFO] makes it easy for a designer or
853	       debugger to pull up useful information about the object of
854	       interest (Clause 3.5.10 of [X.660]).  This useful information
855	       (for humans) does not have to bleed into the encoded
856	       representation (for machines).

858	   4.  OIDs and relative OIDs are always compared for exact equality: no
859	       need to deal with case folding, case sensitivity, or other
860	       normalization issues.  ("Overlong" encodings are PROHIBITED;
861	       therefore overlong encodings MUST be treated as coding errors.)

863	   5.  OIDs and relative OIDs have a built-in hierarchy, so if
864	       implementers want to extend an enumeration without assigning new
865	       values "horizontally", they have the option of assigning new
866	       values "vertically", possibly with more or less stringent
867	       assignment rules.

869	   6.  Because OIDs and relative OIDs (with base) are part of the so-
870	       called International Object Identifier tree [X.660], any other
871	       protocol specification can reuse the enumeration if the designers
872	       find it useful.

874	   7.  OIDs and relative OIDs have natural JSON representations in the
875	       dotted decimal notations prescribed in Section 5.  OIDs and
876	       relative OIDs can be distinguished from each other by the
877	       presence or absence of the leading dot ".".  As the resulting
878	       JSON string is entirely numeric in the ASCII range, case and
879	       normalization are irrelevant to the comparison.  (An object
880	       identifier also has a semantic string representation in the form
881	       of an OID-IRI [X.680], for those who really want that type of
882	       thing.)

884	   8.  OIDs and relative OIDs are human language-neutral.  A protocol
885	       designer working in US-English might name an enumerated value
886	       "sig" for "signature", but "sig" could also stand for
887	       "significand", "signal", or "special interest group".  In Swedish
888	       and Norwegian, "sig" is a pronoun that means "himself, herself,
889	       itself, one, them", etc.--an entirely different meaning.
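
   The size comparison in item 1 of the list above can be checked
   with a short calculation.  The following Python fragment is a
   sketch under the stated assumptions: sizes exclude any semantic
   tag, and the function names are illustrative.

```python
def relative_oid_size(arc: int) -> int:
    """Bytes for a single-arc relative OID: one head byte plus base-128 octets."""
    octets = max(1, (arc.bit_length() + 6) // 7)   # 7 payload bits per octet
    return 1 + octets                              # head is 1 byte under 24 octets


def unsigned_size(value: int) -> int:
    """Bytes for a CBOR unsigned integer (major type 0)."""
    if value < 24:
        return 1
    if value < 2 ** 8:
        return 2
    if value < 2 ** 16:
        return 3
    if value < 2 ** 32:
        return 5
    if value < 2 ** 64:
        return 9
    raise ValueError("value exceeds 64 bits")


for value in (0, 23, 24, 127, 128, 255, 256, 16383, 16384, 65535):
    print(value, relative_oid_size(value), unsigned_size(value))
```

   For a single arc the integer is never larger below 65536; the
   relative OID earns its extra byte through the hierarchy that
   further arcs provide.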

891	10.2.  Factors Favoring UUID Enumerations

893	   A Universally Unique Identifier (UUID) is a 128-bit identifier that
894	   is unique across both space and time with a very high degree of
895	   probability; one intent is to identify "very persistent objects
896	   across a network", such as remote procedure call interfaces
897	   [DCE-RPC].

899	   A protocol designer might choose UUIDs for an enumerated item in view
900	   of the following observations:

902	   1.  UUIDs are always 16 bytes.  This means that while they are not
903	       particularly short, they also cannot be overly long.  Space is
904	       constant and predictable.  (As great as OIDs are, an OID that
905	       exceeds 17 bytes is simply excessive compared to a randomly-
906	       assigned UUID.)

908	   2.  Any 128-bit combination is a valid UUID.  The other types in this
909	       section have to be validated, even integers (e.g., to avoid
910	       overflow and out-of-range conditions).

912	   3.  There is no registration authority that serves as a roadblock,
913	       and (for all practical purposes) no semantic or aesthetic values
914	       are implied by lower bit combinations.

916	   4.  Many platforms can compare UUIDs (128-bit values) in one atomic
917	       operation.  The comparison can be done without regard to
918	       endianness, provided that the endianness is the same between two
919	       UUIDs in memory.  (On the wire, a CBOR UUID is big-endian.)  For
920	       this reason, UUIDs may be faster than (naive) integer
921	       enumerations.

923	   5.  UUIDs have natural JSON representations in the string
924	       representations prescribed by [RFC4122].  The resulting JSON
925	       strings are entirely in the ASCII range and occupy exactly 36
926	       characters; however, normalization (to lowercase) is a
927	       complicating factor.

929	   6.  UUIDs are human language-neutral.  (However, unlike OIDs, UUIDs
930	       are too long to be described as mnemonic in any practical sense.)

932	10.3.  Factors Favoring Integer Enumerations

934	   A protocol designer might choose integers for an enumerated item in
935	   view of the following observations:

937	   1.  The CBOR encoding of unsigned integers 0-23 is the most compact,
938	       occupying exactly one byte (excluding any semantic tags).

940	   2.  A protocol designer may wish to prohibit extensibility as a
941	       matter of course.  Integers comprise a single flat namespace:
942	       there is no hierarchy.

944	   3.  If greater range is desired while sticking to one byte, a
945	       protocol designer may double the range of possible values by
946	       allowing negative integers.  However, enumerating values using
947	       negative integers may have unintended side-effects, because some
948	       programming environments (e.g., C/C++) make implementation-
949	       defined assumptions about the number of bits needed for an
950	       enumerated type.

952	10.4.  Factors Favoring UTF-8 String Enumerations

954	   A protocol designer might choose UTF-8 strings for an enumerated item
955	   in view of the following observations:

957	   1.  A specification can practically limit the content of UTF-8
958	       strings to the ASCII range (or narrower), mitigating some
959	       normalization problems.

961	   2.  UTF-8 strings are easier to read on-the-wire for humans.

963	   3.  UTF-8 strings can contain arbitrary textual identifiers, which
964	       can be hierarchical, e.g., URIs.

966	10.5.  OID Enumeration Example

968	   An enumerated item indicates the revision level of a data format.
969	   Revision levels are issued by year, such as 2011, 2012, etc.
970	   However, in the year 2013, two revisions were issued: the first one
971	   and an important update in June that needs to be distinguished.  The
972	   revision levels are assigned to some OID arc:

974	   "{2 25 6464646464 revs(4)}"

976	   In this arc, the following sub-arcs are assigned:

978	                          +--------------------+
979	                          | Sub-Arc            |
980	                          +--------------------+
981	                          | {v2011(1)}         |
982	                          | {v2012(2)}         |
983	                          | {v2013(3)}         |
984	                          | {v2013(3) june(6)} |
985	                          | {v2014(4)}         |
986	                          | {v2015(5)}         |
987	                          +--------------------+

989	                         Table 3: Example Sub-Arcs

991	   In CBOR, the enumeration is encoded as a relative OID.  The schema
992	   specifies the base OID arc, which is omitted:

994	   c7         # tag(7)
995	      41 03   # .3

997	   c7         # tag(7)
998	      42 0306 # .3.6

1000	                    Figure 15: Enumerated Items in CBOR

1002	   .3
1003	   .{v2013(3) june(6)}

1005	          Figure 16: Enumerated Items in CBOR Diagnostic Notation

1007	   ".3"
1008	   ".3.6"

1010	            Figure 17: Enumerated Items in JSON (possibility 1)

1012	   "v2013"
1013	   "v2013/june"

1015	            Figure 18: Enumerated Items in JSON (possibility 2)
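
   A receiver that knows the base arc from the schema can recover the
   full OID and a display name.  The following Python fragment is a
   minimal sketch; the lookup table, the function names, and the use
   of the "possibility 2" names are assumptions made for illustration.

```python
BASE_ARC = (2, 25, 6464646464, 4)        # supplied by the schema, not the wire
NAMES = {
    (1,): "v2011", (2,): "v2012", (3,): "v2013",
    (3, 6): "v2013/june", (4,): "v2014", (5,): "v2015",
}


def decode_arcs(payload: bytes) -> tuple:
    """Split a base-128 payload into arcs, rejecting overlong encodings."""
    arcs, value, pending = [], 0, False
    for octet in payload:
        if not pending and octet == 0x80:
            raise ValueError("overlong arc encoding")
        value = (value << 7) | (octet & 0x7F)
        pending = bool(octet & 0x80)     # continuation bit: more octets follow
        if not pending:
            arcs.append(value)
            value = 0
    if pending:
        raise ValueError("truncated arc")
    return tuple(arcs)


def decode_enum(item: bytes):
    """Decode a tag 7 item into (full OID arcs, name or None)."""
    if len(item) < 2 or item[0] != 0xC7 or item[1] >> 5 != 2:
        raise ValueError("expected tag 7 on a byte string")
    length = item[1] & 0x1F
    if length >= 24 or len(item) != 2 + length:
        raise ValueError("sketch handles short definite-length strings only")
    relative = decode_arcs(item[2:])
    return BASE_ARC + relative, NAMES.get(relative)


print(decode_enum(bytes.fromhex("c74103")))
print(decode_enum(bytes.fromhex("c7420306")))
```

   An unassigned sub-arc yields a name of None, so unknown values
   remain distinguishable without failing to decode.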

1017	11.  Binary Internet Messages and MIME Entities

1019	   Section 2.4.4.3 of [RFC7049] assigns tag 36 to "MIME messages
1020	   (including all headers)" [RFC2045], and prescribes UTF-8 strings,
1021	   without further elaboration.  In fact, MIME encompasses several
1022	   different formats, and is not limited to UTF-8 strings.  This section
1023	   updates tag 36.

1025	11.1.  CBOR Byte String and Binary MIME

1027	   Tag 36 is to be used with byte strings.  When the tagged item is a
1028	   byte string, any octet can be used in the content.  Arbitrary octets
1029	   are supported by [RFC2045] and can be supported in protocols such as
1030	   SMTP using BINARYMIME [RFC3030].

1032	   A conforming implementation that purports to process tag 36-tagged
1033	   items MUST accept byte strings as well as UTF-8 strings.  Byte
1034	   strings, rather than UTF-8 strings, SHOULD be considered the default.
1035	   (While binary Content-Transfer-Encoding is not particularly common as
1036	   of this writing, 8-bit encoding is, and it is foreseeable that many
1037	   8-bit encoded messages will still have charsets other than UTF-8.)

1039	11.2.  Internet Messages, MIME Messages, and MIME Entities

1041	   Definitions: "MIME message" is not explicitly defined in [RFC2045],
1042	   but a careful read suggests that a MIME message is: "either a
1043	   (complete or "top-level") RFC 822 message being transferred on a
1044	   network, or a message encapsulated in a body of type "message/rfc822"
1045	   or "message/partial"," that also contains MIME header fields, namely
1046	   the MIME-Version field, which MUST be present (Section 4 of
1047	   [RFC2045]).  Other MIME header fields such as Content-Type and
1048	   Content-Transfer-Encoding are assumed to have their [RFC2045] default
1049	   values if not present in the data.

1051	   When the contents have a From field (a type of "originator address
1052	   field") and a Date field (the lone "origination date field")
1053	   (Section 3.6 of [RFC5322]), the item is concluded to have a Content-
1054	   Type of message/rfc822 or message/global, as appropriate, except as
1055	   otherwise specified in this section.

1057	   (TBD: Do we need a separate tag for a MIME entity?)  (Alternate
1058	   proposal: When the tagged data does not include a MIME-Version field
1059	   or other fields required by RFC822 (5322) (e.g., no From field), it
1060	   is presumed to be a MIME entity, rather than a MIME message.
1061	   Therefore, it has no top-level content-type: instead it is simply a
1062	   "MIME entity", consisting of one element, whose Content-Type is the
1063	   content of the Content-Type header field, if present, or the
1064	   [RFC2045] default of "text/plain; charset=us-ascii", if absent.
1065	   Content-Transfer-Encoding SHALL be assumed to be 8bit when the CBOR
1066	   item is a UTF-8 string, and SHALL be assumed to be binary when the
1067	   CBOR item is a byte string.  (Or should all be considered CTE:
1068	   binary?)  And, when the tagged data has RFC822 required fields but no
1069	   MIME-Version, shall we assume it's a MIME entity, or shall we assume
1070	   it's an Internet message that does not conform to MIME?)
1071	   Content that has no headers whatsoever is valid, and implementations
1072	   that process tag 36 MUST permit this case: in such a case, the data
1073	   starts with CRLF CRLF, followed by the body.  In such a case, the
1074	   content is assumed to be a MIME entity of Content-Type "text/plain;
1075	   charset=us-ascii", and not an RFC822 (RFC5322) Internet message.
1076	   (TBD: Confirm.)

1078	11.3.  Netnews, HTTP, and SIP Messages

1080	   Other message types that are MIME-related are message/news, message/
1081	   http, and message/sip.

1083	   [RFC5537] specifies that message/news is deprecated (marked as
1084	   obsolete) and that message/rfc822 SHOULD be used in its place;
1085	   presumably this also extends to message/global over time.  Netnews
1086	   Article Format [RFC5536] is a strict subset of Internet Message
1087	   Format; it can be detected by the presence of the six mandatory
1088	   header fields: Date, From, Message-ID, Newsgroups, Path, and Subject.
1089	   (Newsgroups and Path fields are specific to Netnews.)

1091	   message/http [RFC7230] is the media type for HTTP requests and
1092	   responses.  It can be detected by analyzing the first line of the
1093	   body, which is an HTTP Start Line (Section 3.1 of [RFC7230]): it does
1094	   not conform to the syntax of an Internet Message Format header field.
1095	   The optional parameter "msgtype" can be inferred from the Start Line.
1096	   Implementers need to be aware that the default character encoding for
1097	   message/http is ISO-8859-1, not UTF-8.  Therefore, implementations
1098	   SHOULD NOT encode HTTP messages with CBOR UTF-8 strings.

1100	   Similarly, message/sip [RFC3261] is the media type of SIP request and
1101	   response messages.  It can be detected by analyzing the first line of
1102	   the body, which is a SIP start-line (Section 7.1 of [RFC3261]): it
1103	   does not conform to the syntax of an Internet Message Format header
1104	   field.  The optional parameter can be inferred from the start-line.
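
   The detection steps described in this section and in Section 11.2
   can be sketched as a first-line heuristic.  The following Python
   fragment is illustrative only: the regular expressions are
   simplified approximations of the relevant grammars, the function
   names are assumptions, and Netnews detection is omitted.

```python
import re

# Simplified patterns; none of these is a normative grammar.
TOKEN = rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
HEADER_FIELD = re.compile(rb"^([!-9;-~]+)[ \t]*:")
HTTP_START = re.compile(
    rb"^(?:HTTP/\d\.\d \d{3}(?: .*)?|" + TOKEN + rb" \S+ HTTP/\d\.\d)$")
SIP_START = re.compile(
    rb"^(?:SIP/\d\.\d \d{3} .*|" + TOKEN + rb" \S+ SIP/\d\.\d)$")


def header_names(content: bytes) -> set:
    """Lowercased field names found before the first empty line."""
    if content.startswith(b"\r\n"):
        return set()                        # no header fields at all
    block = content.split(b"\r\n\r\n", 1)[0]
    names = set()
    for line in block.split(b"\r\n"):
        match = HEADER_FIELD.match(line)    # folded lines do not match
        if match:
            names.add(match.group(1).lower())
    return names


def guess_media_type(content: bytes) -> str:
    """Classify tag 36 content by its start line or header fields."""
    first_line = content.split(b"\r\n", 1)[0]
    if HTTP_START.match(first_line):
        return "message/http"
    if SIP_START.match(first_line):
        return "message/sip"
    if {b"from", b"date"} <= header_names(content):
        return "message/rfc822"             # or message/global
    return "MIME entity"


print(guess_media_type(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
print(guess_media_type(b"INVITE sip:bob@example.com SIP/2.0\r\n\r\n"))
print(guess_media_type(b"Content-Type: text/plain\r\n\r\nHello World!\r\n"))
```

   Start lines are tested first because they never match the header
   field pattern, which keeps the two checks from conflicting.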

1106	11.4.  Other Messages

1108	   The CBOR binary or UTF-8 string MAY contain other types of messages.
1109	   An implementation MAY send such a message as a MIME entity with the
1110	   Content-Type field appropriately set, or alternatively, MAY send the
1111	   message at the top-level directly.  However, if a purported message
1112	   type is ambiguous with a message/rfc822 (or message/global) message,
1113	   a receiver SHALL treat the message as message/rfc822 (or message/
1114	   global).  If a purported message type is ambiguous with a MIME entity
1115	   (and unambiguously not message/rfc822 or message/global), a receiver
1116	   SHALL treat the message as a MIME entity.

1118	12.  Applications and Examples of Messages and Entities

1120	   Tag 36 is the RECOMMENDED way to convey data with MIME-related
1121	   metadata, including messages (which may or may not actually be MIME-
1122	   enabled) and MIME entities.

1124	   Example 1: A legacy RFC822 message is encoded as a UTF-8 string or
1125	   byte string with tag 36.  The contents have From, To, Date, and
1126	   Subject header fields, two CRLFs, and a single line "Hello World!",
1127	   terminated with a CRLF.

1129	   Example 2a: A [RFC5280] certificate is encoded as a byte string with
1130	   tag 36.  The contents consist of "Content-Type: application/
1131	   pkix-cert", two CRLFs, and the DER encoding of the certificate.  (The
1132	   "Content-Transfer-Encoding: binary" header is not necessary.)

1134	   Example 2b: A [RFC5280] certificate is encoded as a UTF-8 string or
1135	   byte string with tag 36.  The contents consist of "Content-
1136	   Type: application/pkix-cert", a CRLF, "Content-Transfer-Encoding:
1137	   base64", two CRLFs, and the base64 encoding of the DER encoding of
1138	   the certificate, conforming to Section 6.8 of [RFC2045].  In
1139	   particular, base64 lines are limited to 76 characters, separated by
1140	   CRLF, and the final line is supposed to end with CRLF.  Needless to
1141	   say, this is not nearly as efficient as Example 2a.
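
   The difference between Examples 2a and 2b can be made concrete
   with the following Python sketch.  The "der" value is a stand-in
   byte sequence, not a real certificate, and the function names are
   illustrative assumptions.

```python
import base64

CONTENT_TYPE = b"Content-Type: application/pkix-cert\r\n"


def entity_binary(der: bytes) -> bytes:
    """Example 2a: header field, empty line, then the raw DER octets."""
    return CONTENT_TYPE + b"\r\n" + der


def entity_base64(der: bytes) -> bytes:
    """Example 2b: base64 body in 76-character lines, each ending in CRLF."""
    body = base64.encodebytes(der).replace(b"\n", b"\r\n")
    return (CONTENT_TYPE
            + b"Content-Transfer-Encoding: base64\r\n\r\n"
            + body)


def tag36(content: bytes) -> bytes:
    """Wrap content in a definite-length byte string with tag 36."""
    n = len(content)                        # assumed to be below 2**32
    if n < 24:
        head = bytes([0x40 | n])
    elif n < 2 ** 8:
        head = bytes([0x58, n])
    elif n < 2 ** 16:
        head = b"\x59" + n.to_bytes(2, "big")
    else:
        head = b"\x5a" + n.to_bytes(4, "big")
    return b"\xd8\x24" + head + content


der = bytes(range(256)) * 4                 # placeholder for DER octets
print(len(tag36(entity_binary(der))))
print(len(tag36(entity_base64(der))))
```

   The base64 form grows by roughly a third plus line breaks, which
   is the inefficiency that Example 2b refers to.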

1143	13.  X.690 Series Tags

1145	   [[NB: Carsten probably won't like this.  Plan on removing this
1146	   section.  It is mainly provided to contrast with Section 10.]]

1148	   It is foreseeable that CBOR applications will need to send and
1149	   receive ASN.1 data, for example, for legacy or security applications.
1150	   While a native representation in CBOR is preferred, preserving the
1151	   data in an ASN.1 encoding may be necessary, for example, to preserve
1152	   cryptographic verification.  A tag <<X>> is allocated for this
1153	   purpose.

1155	   When the tagged item is a byte string, the byte string contents are
1156	   encoded according to [X.690], i.e., BER, CER, or DER.  CBOR
1157	   implementations are not required to validate conformance of the
1158	   contained data to [X.690].

1160	   When the tagged item is an array with 3 items:

1162	   1.  The first item SHALL be an OID (with tag <<O>> omitted; it SHALL
1163	       NOT be a relative OID), indicating the ASN.1 module containing
1164	       the type of the PDU.  [[NB: this is a good example of a non-
1165	       trivial structure in which an element is well-defined to be an
1166	       OID, which has a tag.  Is the CBOR philosophy to tag the item, or
1167	       omit the tag on the item, when the item's semantics are already
1168	       fixed by the outer tag?  Similar situations can apply to tag 32
1169	       (URI), etc.]]

1171	   2.  The second item SHALL be a UTF-8 string indicating the ASN.1
1172	       value's _type reference name_ (Clause 3.8.88 of [X.680])
1173	       conforming to the "typereference" production (Clause 12.2 of
1174	       [X.680]).

1176	   3.  The third item SHALL be a byte string, whose contents are encoded
1177	       per the prior paragraph.

1179	   (TBD: Use of tagged UTF-8 string is reserved for ASN.1 textual
1180	   formats such as XER and ASN.1 value notation?  Probably not
1181	   necessary.  Just omit.)

1183	   Implementation note: DER-encoded items are always definite-length, so
1184	   there is very little reason to use CBOR byte string indefinite
1185	   encoding when encoding such DER-encoded items.

1187	   Example: A [RFC5280] certificate can be encoded:

1189	   1.  as a byte string with tag <<X>>, or

1191	   2.  as an array with tag <<X>>, with three elements:

1193	       (1)  a byte string "h'2B 06 01 05 05 07 00 12'", which is the BER
1194	            encoding of 1.3.6.1.5.5.7.0.18,

1196	       (2)  a UTF-8 string "Certificate", and

1198	       (3)  a byte string containing the DER encoding of the
1199	            certificate.

1201	14.  Regular Expression Clarification

1203	   (TODO: better specify conformance to actual regular expression
1204	   standards with tag 35.  PCRE and JavaScript/ECMAScript regular
1205	   expressions are very different; [RFC7049] is not specific enough
1206	   about this.)

1208	15.  Set and Multiset Technique

1210	   CBOR has no native type for a set, which is an arbitrary unordered
1211	   collection of items.  The following technique is RECOMMENDED to
1212	   express set and multiset semantics concisely in native CBOR data.

1214	   In computer science, a _set_ is a collection of distinct items; there
1215	   is no ordering to the items.  Thus, implementations can optimize set
1216	   storage in many ways that are not available with ordered elements in
1217	   arrays.  Sets can be stored in hashtables, bit fields, trees, or
1218	   other abstract data types.

1220	   In computer science, a _multiset_ allows multiple instances of a
1221	   set's elements.  Put another way, each distinct item has a
1222	   cardinality property indicating the number of these items in the
1223	   multiset.

1225	   To store items in a set or multiset, it is RECOMMENDED to store the
1226	   CBOR items as keys in a map; the values SHALL all be positive
1227	   integers (major type 0, value/additional information greater than or
1228	   equal to 1).  In the special case of a set, the values SHALL be the
1229	   integer 1.  This technique has no special tag associated with it.  As
1230	   with arrays that schemas classify as "records" (i.e., arrays with
1231	   positionally defined elements), schemas are likewise free to classify
1232	   maps as sets in particular instances.

1234	16.  Fruits Basket Example

1236	   Consider a basket of fruits.  The basket can contain any number of
1237	   fruits; each fruit of the same species is considered identical.  This
1238	   basket has two apples, four bananas, six pears, and one pineapple:

1240	   {"\u{1F34E}": 2, "\u{1F34C}": 4,
1241	    "\u{1F350}": 6, "\u{1F34D}": 1}

1243	           Figure 19: Fruits Basket in CBOR Diagnostic Notation

1245	   A4                       # map(4)
1246	      64                    # text(4)
1247	         f09f8d8e           # "\u{1F34E}"
1248	      02                    # unsigned(2)
1249	      64                    # text(4)
1250	         f09f8d8c           # "\u{1F34C}"
1251	      04                    # unsigned(4)
1252	      64                    # text(4)
1253	         f09f8d90           # "\u{1F350}"
1254	      06                    # unsigned(6)
1255	      64                    # text(4)
1256	         f09f8d8d           # "\u{1F34D}"
1257	      01                    # unsigned(1)

1259	                Figure 20: Fruits Basket in CBOR (25 bytes)
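
The encoding in Figure 20 can be reproduced with a small hand-written
encoder.  The sketch below assumes text-string keys and unsigned
integer values only; the function names (encode_head, encode_multiset)
are hypothetical and do not come from any CBOR library.

```python
def encode_head(major, value):
    """Encode a CBOR initial byte plus argument for 0 <= value < 2**64."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("argument too large for a CBOR head")


def encode_multiset(counts):
    """Encode {text: positive int} as a CBOR map (major type 5)."""
    out = bytearray(encode_head(5, len(counts)))
    for key, count in counts.items():
        if not isinstance(count, int) or count < 1:
            raise ValueError("multiset values must be positive integers")
        utf8 = key.encode("utf-8")
        out += encode_head(3, len(utf8)) + utf8   # text string key
        out += encode_head(0, count)              # unsigned integer value
    return bytes(out)


# Apple, banana, pear, pineapple, in the order used by Figure 19.
basket = {"\U0001F34E": 2, "\U0001F34C": 4, "\U0001F350": 6, "\U0001F34D": 1}
encoded = encode_multiset(basket)
print(encoded.hex(), len(encoded))
```

The map head takes one byte and each of the four entries takes six
(one-byte text head, four bytes of UTF-8, one-byte integer), so the
length can be checked by hand against the listing.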

1261	   [[TODO: Consider a Merkle Tree example: set of sets of sets of sets
1262	   of things. ???]]

1264	17.  IANA Considerations

1266	   (This section to be edited by the RFC editor.)

1268	17.1.  CBOR Tags

1270	   IANA is requested to assign the CBOR tags in Table 4, with the
1271	   present document as the specification reference.

1273	   +----------+-------------+------------------------------------------+
1274	   | Tag      | Data Item   |                                Semantics |
1275	   +----------+-------------+------------------------------------------+
1276	   | 6<<TBD>> | multiple    |         object identifier (BER encoding) |
1277	   | 7<<TBD>> | multiple    |          relative object identifier (BER |
1278	   |          |             |                                encoding) |
1279	   +----------+-------------+------------------------------------------+

1281	                       Table 4: Values for New Tags

1283	17.2.  Discussion

1285	   (This subsection to be removed by the RFC editor.)

1287	   The space for single-byte tags in CBOR (0..23) is severely limited.
1288	   It is not clear that the benefits of encoding OIDs/relative OIDs with
1289	   one fewer byte per instance outweigh the consumption of two values in
1290	   this code point space.

1292	   Procedurally, this space is also reserved for Standards Action.

1294	   An alternative would be to go for the Specification Required space,
1295	   e.g., tag number 40 for <<O>> and tag number 41 for <<R>>.  As an
1296	   example, this would change Figure 2 into:

1298	   d8 28                            # tag(40)
1299	      49                            # bytes(9)
1300	         60 86 48 01 65 03 04 02 01 #

1302	     Figure 21: SHA-256 OID in CBOR (using specification required tag)
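
The bytes in Figure 21 can be derived as sketched below.  Tag number 40
is only the example value discussed above, not an assignment.  The
function names are hypothetical, and the tagging helper deliberately
covers only the two-byte tag head (tag numbers 24 to 255) and byte
strings shorter than 24 bytes, which is all this example needs.

```python
def encode_oid_content(dotted):
    """Encode a dotted-decimal OID as BER contents octets (X.690, 8.19)."""
    arcs = [int(part) for part in dotted.split(".")]
    if len(arcs) < 2 or any(arc < 0 for arc in arcs):
        raise ValueError("an OID needs at least two non-negative arcs")
    if arcs[0] > 2 or (arcs[0] < 2 and arcs[1] > 39):
        raise ValueError("invalid first or second arc")
    # The first two arcs share one subidentifier: 40 * first + second.
    subidentifiers = [40 * arcs[0] + arcs[1]] + arcs[2:]
    out = bytearray()
    for value in subidentifiers:
        chunk = [value & 0x7F]            # last byte: high bit clear
        value >>= 7
        while value:
            chunk.append(0x80 | (value & 0x7F))
            value >>= 7
        out += bytes(reversed(chunk))
    return bytes(out)


def tag_short_bytes(tag, payload):
    """Wrap a short byte string in a CBOR tag with a one-byte tag number."""
    if not 24 <= tag <= 255 or len(payload) >= 24:
        raise ValueError("outside the range this sketch supports")
    return bytes([0xD8, tag, 0x40 | len(payload)]) + payload


sha256 = tag_short_bytes(40, encode_oid_content("2.16.840.1.101.3.4.2.1"))
print(sha256.hex())
```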

1304	17.3.  Pre-Existing Tags

1306	   (TODO: complete.)  IANA is requested to modify the registrations for
1307	   the following CBOR tags:

1309	            +-----+-------------+----------------------------+
1310	            | Tag | Data Item   |                  Semantics |
1311	            +-----+-------------+----------------------------+
1312	            | 35  | <<TBD>>     | regular expression <<TBD>> |
1313	            | 36  | multiple    |     message or MIME entity |
1314	            | 37  | multiple    |                binary UUID |
1315	            +-----+-------------+----------------------------+

1317	                     Table 5: Values for Existing Tags

1319	17.4.  New Tags

1321	   (TODO: complete.)

1323	18.  Security Considerations

1325	   The security considerations of [RFC7049] apply.

1327	   The encodings in Clauses 8.19 and 8.20 of [X.690] are extremely
1328	   compact and unambiguous, but MUST be followed precisely to avoid
1329	   security pitfalls.  In particular, the requirements set out in
1330	   Section 2.1 of this document need to be followed; otherwise, an
1331	   attacker may be able to subvert a checking process by submitting
1332	   alternative representations that are later taken as the original (or
1333	   even something else entirely) by another decoder supposed to be
1334	   protected by the checking process.

1336	   OIDs and relative OIDs can always be treated as opaque byte strings.
1337	   Actually understanding the structure that was used for generating
1338	   them is not necessary, and, except for checking the structure
1339	   requirements, it is strongly NOT RECOMMENDED to perform any
1340	   processing of this kind (e.g., converting into dotted notation and
1341	   back) unless absolutely necessary.  If the OIDs are translated into
1342	   other representations, the usual security considerations for non-
1343	   trivial representation conversions apply; the primary integer values
1344	   are unlimited in range (cf.  Figure 4).
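
A structure check that treats the OID as an opaque byte string, with no
conversion to integers, might look like the sketch below.  The rules it
applies are assumptions taken from the base-128 encoding in [X.690]: no
subidentifier may begin with 0x80 (a redundant leading zero), and the
final byte must have its high bit clear.  Rejecting an empty string is
also an assumption of this sketch.  The function name is hypothetical.

```python
def is_well_formed_oid_content(data: bytes) -> bool:
    """Check BER OID contents octets without decoding any arc value."""
    if not data:
        return False                  # assumption: empty input is rejected
    if data[-1] & 0x80:
        return False                  # last subidentifier is truncated
    at_start = True                   # True when the next byte begins a subidentifier
    for byte in data:
        if at_start and byte == 0x80:
            return False              # non-minimal encoding (leading zero)
        at_start = not (byte & 0x80)  # high bit clear ends the subidentifier
    return True


print(is_well_formed_oid_content(bytes.fromhex("608648016503040201")))
```

Because the check never accumulates a value, it runs in constant space
regardless of how large an individual arc is.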

1346	18.1.  Conversions Between BER and Dotted Decimal Notation

1348	   [PKILCAKE] uncovers exploit vectors for the illegal values above, as
1349	   well as for cases in which conversion to or from the dotted decimal
1350	   notation goes awry.  Neither [X.660] nor [X.680] places an upper
1351	   bound on the range of unsigned integer values for an arc; the
1352	   integers can be arbitrarily large.  An implementation SHOULD NOT
1353	   attempt to convert each component using a fixed-size accumulator, as
1354	   an attacker can cause the accumulator to overflow.  Compact and
1355	   efficient techniques for such conversions, such as the double dabble
1356	   algorithm [DOUBLEDABBLE], are well known in the art; their
1357	   application to this field is left as an exercise to the reader.
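
Where conversion to dotted decimal notation cannot be avoided, one way
to sidestep the fixed-size accumulator problem is to use a language
whose integers are arbitrary-precision.  The sketch below relies on
Python's built-in int for that; it is not the double dabble algorithm,
and the function name is hypothetical.  Malformed input raises
ValueError instead of being silently repaired.

```python
def oid_content_to_dotted(data: bytes) -> str:
    """Decode BER OID contents octets to dotted decimal, without overflow."""
    if not data or data[-1] & 0x80:
        raise ValueError("empty or truncated OID encoding")
    subidentifiers = []
    value = 0                         # Python ints grow as needed
    at_start = True
    for byte in data:
        if at_start and byte == 0x80:
            raise ValueError("non-minimal subidentifier encoding")
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            at_start = False          # more bytes follow for this arc
        else:
            subidentifiers.append(value)
            value = 0
            at_start = True
    # The first subidentifier packs the first two arcs as 40 * X + Y.
    first = subidentifiers[0]
    if first < 40:
        arcs = [0, first]
    elif first < 80:
        arcs = [1, first - 40]
    else:
        arcs = [2, first - 80]
    return ".".join(str(arc) for arc in arcs + subidentifiers[1:])


print(oid_content_to_dotted(bytes.fromhex("2b06010505070012")))
```

Unbounded integers remove the overflow risk but not the resource risk:
a caller would still be wise to cap the length of the input before
decoding it.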

1359	19.  References

1361	19.1.  Normative References

1363	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
1364	              Extensions (MIME) Part One: Format of Internet Message
1365	              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
1366	              <http://www.rfc-editor.org/info/rfc2045>.

1368	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1369	              Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/
1370	              RFC2119, March 1997,
1371	              <http://www.rfc-editor.org/info/rfc2119>.

1373	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1374	              A., Peterson, J., Sparks, R., Handley, M., and E.
1375	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1376	              DOI 10.17487/RFC3261, June 2002,
1377	              <http://www.rfc-editor.org/info/rfc3261>.

1379	   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
1380	              Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI
1381	              10.17487/RFC4122, July 2005,
1382	              <http://www.rfc-editor.org/info/rfc4122>.

1384	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322, DOI
1385	              10.17487/RFC5322, October 2008,
1386	              <http://www.rfc-editor.org/info/rfc5322>.

1388	   [RFC5536]  Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews
1389	              Article Format", RFC 5536, DOI 10.17487/RFC5536, November
1390	              2009, <http://www.rfc-editor.org/info/rfc5536>.

1392	   [RFC5537]  Allbery, R., Ed. and C. Lindsey, "Netnews Architecture and
1393	              Protocols", RFC 5537, DOI 10.17487/RFC5537, November 2009,
1394	              <http://www.rfc-editor.org/info/rfc5537>.

1396	   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
1397	              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
1398	              October 2013, <http://www.rfc-editor.org/info/rfc7049>.

1400	   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
1401	              Protocol (HTTP/1.1): Message Syntax and Routing", RFC
1402	              7230, DOI 10.17487/RFC7230, June 2014,
1403	              <http://www.rfc-editor.org/info/rfc7230>.

1405	   [X.660]    International Telecommunications Union, "Information
1406	              technology -- Procedures for the operation of object
1407	              identifier registration authorities: General procedures
1408	              and top arcs of the international object identifier tree",
1409	              ITU-T Recommendation X.660, July 2011.

1411	   [X.680]    International Telecommunications Union, "Information
1412	              technology -- Abstract Syntax Notation One (ASN.1):
1413	              Specification of basic notation", ITU-T Recommendation
1414	              X.680, August 2015.

1416	   [X.690]    International Telecommunications Union, "Information
1417	              technology -- ASN.1 encoding rules: Specification of Basic
1418	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
1419	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
1420	              X.690, August 2015.

1422	19.2.  Informative References

1424	   [DCE-RPC]  Open Group CAE, "DCE: Remote Procedure Call",
1425	              Specification C309, ISBN 1-85912-041-5, August 1994.

1427	   [DOUBLEDABBLE]
1428	              Gao, S., Al-Khalili, D., and N. Chabini, "An improved BCD
1429	              adder using 6-LUT FPGAs", IEEE 10th International New
1430	              Circuits and Systems Conference (NEWCAS 2012), pp. 13-16,
1431	              DOI: 10.1109/NEWCAS.2012.6328944, June 2012.

1433	   [OID-INFO]
1434	              Orange SA, "OID Repository", 2016,
1435	              <http://www.oid-info.com/>.

1437	   [PKILCAKE]
1438	              Kaminsky, D., Patterson, M., and L. Sassaman, "PKI Layer
1439	              Cake: New Collision Attacks Against the Global X.509
1440	              Infrastructure", FC 2010, Lecture Notes in Computer
1441	              Science 6052 289-303, DOI: 10.1007/978-3-642-14577-3_22,
1442	              January 2010, <http://dl.acm.org/citation.cfm?id=2163593>.

1444	   [RFC2506]  Holtman, K., Mutz, A., and T. Hardie, "Media Feature Tag
1445	              Registration Procedure", BCP 31, RFC 2506, DOI 10.17487/
1446	              RFC2506, March 1999,
1447	              <http://www.rfc-editor.org/info/rfc2506>.

1449	   [RFC3030]  Vaudreuil, G., "SMTP Service Extensions for Transmission
1450	              of Large and Binary MIME Messages", RFC 3030, DOI
1451	              10.17487/RFC3030, December 2000,
1452	              <http://www.rfc-editor.org/info/rfc3030>.

1454	   [RFC4514]  Zeilenga, K., Ed., "Lightweight Directory Access Protocol
1455	              (LDAP): String Representation of Distinguished Names", RFC
1456	              4514, DOI 10.17487/RFC4514, June 2006,
1457	              <http://www.rfc-editor.org/info/rfc4514>.

1459	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
1460	              Housley, R., and W. Polk, "Internet X.509 Public Key
1461	              Infrastructure Certificate and Certificate Revocation List
1462	              (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008,
1463	              <http://www.rfc-editor.org/info/rfc5280>.

1465	   [RFC6256]  Eddy, W. and E. Davies, "Using Self-Delimiting Numeric
1466	              Values in Protocols", RFC 6256, DOI 10.17487/RFC6256, May
1467	              2011, <http://www.rfc-editor.org/info/rfc6256>.

1469	   [RFC7388]  Schoenwaelder, J., Sehgal, A., Tsou, T., and C. Zhou,
1470	              "Definition of Managed Objects for IPv6 over Low-Power
1471	              Wireless Personal Area Networks (6LoWPANs)", RFC 7388, DOI
1472	              10.17487/RFC7388, October 2014,
1473	              <http://www.rfc-editor.org/info/rfc7388>.

1475	   [X.672]    International Telecommunications Union, "Information
1476	              technology -- Open systems interconnection -- Object
1477	              identifier resolution system", ITU-T Recommendation X.672,
1478	              August 2010.

1480	   [X.681]    International Telecommunications Union, "Information
1481	              technology -- Abstract Syntax Notation One (ASN.1):
1482	              Information object specification", ITU-T Recommendation
1483	              X.681, August 2015.

1485	Appendix A.  Changes from -05 to -06

1487	   Refreshed the draft to the current date ("keep-alive").

1489	Appendix B.  Changes from -04 to -05

1491	   Discussed UUID usage in CBOR, and incorporated fixes proposed by
1492	   Olivier Dubuisson, including fixes regarding OID nomenclature.

1494	Appendix C.  Changes from -03 to -04

1496	   Changes occurred based on limited feedback, mainly centered around
1497	   the abstract and introduction, rather than substantive technical
1498	   changes.  These changes include:

1500	   o  Changed the title so that it is about tags and techniques.

1502	   o  Rewrote the abstract to describe the content more accurately, and
1503	      to point out that no changes to the wire protocol are being
1504	      proposed.

1506	   o  Removed "ASN.1" from "object identifiers", as OIDs are independent
1507	      of ASN.1.

1509	   o  Rewrote the introduction to be more about the present text.

1511	   o  Proposed a concise OID arc.

1513	   o  Provided binary regular expression forms for OID validation.

1515	   o  Updated IANA registration tables.

1517	Appendix D.  Changes from -02 to -03

1519	   Many significant changes occurred in this version.  These changes
1520	   include:

1522	   o  Expanded the draft scope to be a comprehensive CBOR update.

1524	   o  Added OID-related sections: OID Enumerations, OID Maps and Arrays,
1525	      and Applications and Examples of OIDs.

1527	   o  Added Tag 36 update (binary MIME, better definitions).

1529	   o  Added stub/experimental sections for X.690 Series Tags (tag <<X>>)
1530	      and Regular Expressions (tag 35).

1532	   o  Added technique for representing sets and multisets.

1534	   o  Added references and fixed typos.

1536	Authors' Addresses

1538	   Carsten Bormann
1539	   Universitaet Bremen TZI
1540	   Postfach 330440
1541	   Bremen  D-28359
1542	   Germany

1544	   Phone: +49-421-218-63921
1545	   Email: cabo@tzi.org

1546	   Sean Leonard
1547	   Penango, Inc.
1548	   5900 Wilshire Boulevard
1549	   21st Floor
1550	   Los Angeles, CA  90036
1551	   USA

1553	   Email: dev+ietf@seantek.com
1554	   URI:   http://www.penango.com/