idnits 2.17.1 

draft-bormann-cbor-tags-oid-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 15 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.

  -- The draft header indicates that this document updates RFC7049, but the
     abstract doesn't seem to directly say this.  It does mention RFC7049
     though, so this could be OK.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 08, 2016) is 2839 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949)

  ** Obsolete normative reference: RFC 7230 (Obsoleted by RFC 9110, RFC 9112)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Updates: 7049 (if approved)                                   S. Leonard
5	Intended status: Standards Track                           Penango, Inc.
6	Expires: January 9, 2017                                   July 08, 2016

8	  Concise Binary Object Representation (CBOR) Tags and Techniques for
9	Object Identifiers, Enumerations, Binary Entities, Regular Expressions,
10	                                and Sets
11	                     draft-bormann-cbor-tags-oid-04

13	Abstract

15	   The Concise Binary Object Representation (CBOR, RFC 7049) is a data
16	   format whose design goals include the possibility of extremely small
17	   code size, fairly small message size, and extensibility without the
18	   need for version negotiation.

20	   Useful tags and techniques have emerged since the publication of RFC
21	   7049; the present document makes use of CBOR's built-in major types
22	   to define and refine several useful constructs, without changing the
23	   wire protocol.  This document adds object identifiers (OIDs) to CBOR
24	   with CBOR tags <<O>> and <<R>> [values TBD].  It is intended as the
25	   reference document for the IANA registration of the CBOR tags so
26	   defined.  Useful techniques for enumerations and sets are presented
27	   (without new tags).  Because RFC 7049 left the documentation for MIME
28	   entities (tag 36) and regular expressions (tag 35) incomplete, this
29	   document provides more comprehensive specifications.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at http://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on January 9, 2017.

48	Copyright Notice

50	   Copyright (c) 2016 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (http://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	   2.  Object Identifiers
67	   3.  Examples
68	   4.  Discussion
69	   5.  Diagnostic Notation
70	   6.  A New Arc for Concise OIDs
71	   7.  Enumerations in CBOR
72	   8.  Tag Factoring and Tag Stacking with OID Arrays and Maps
73	   9.  Applications and Examples of OIDs
74	   10. Binary Internet Messages and MIME Entities
75	   11. Applications and Examples of Messages and Entities
76	   12. X.690 Series Tags
77	   13. Regular Expression Clarification
78	   14. Set and Multiset Technique
79	   15. Fruits Basket Example
80	   16. IANA Considerations
81	   17. Security Considerations
82	   18. References
83	   Appendix A.  Changes from -03 to -04
84	   Appendix B.  Changes from -02 to -03
85	   Authors' Addresses

87	1.  Introduction

89	   The Concise Binary Object Representation (CBOR, [RFC7049]) provides
90	   for the interchange of structured data without a requirement for a
91	   pre-agreed schema.  RFC 7049 defines a basic set of data types, as
92	   well as a tagging mechanism that enables extending the set of data
93	   types supported via an IANA registry.

95	   Useful tags and techniques have emerged since the publication of
96	   [RFC7049].  This document makes use of CBOR's built-in major types to
97	   provide for several useful constructs without changing the wire
98	   protocol.

100	   The original focus of this work was to add support for object
101	   identifiers (OIDs, [X.680]), which many IETF protocols carry.  The
102	   ASN.1 Basic Encoding Rules (BER, [X.690]) specify the binary
103	   encodings of both object identifiers and relative object identifiers.
104	   The contents of these encodings can be carried in a CBOR byte string.
105	   This document defines two CBOR tags that cover the two kinds of ASN.1
106	   object identifiers encoded in this way.  The tags can also be applied
107	   to arrays and maps for more articulated identification purposes.  It
108	   is intended as the reference document for the IANA registration of
109	   the tags so defined.  To promote the use and usefulness of OIDs in
110	   CBOR, a new arc is also proposed.

112	   This document covers several useful techniques that have been or are
113	   being developed as implementers are applying CBOR to practical
114	   problems.  Enumerations have found wide utility in CBOR, despite
115	   CBOR's lack of a native enumerated type.  A section covers the
116	   advantages of choosing built-in types, with additional consideration
117	   for using the newly-defined object identifier types in enumerations.
118	   CBOR also lacks a native set type (in the mathematical sense of an
119	   arbitrary unordered collection of items), but has a more powerful
120	   alternative in its native map type.  A section covers how to adapt
121	   the map type to express set and multiset semantics.

123	   Finally, this document covers the semantics of existing tags in
124	   [RFC7049] that were somewhat underspecified.  "Tag 36 is for MIME
125	   messages", but the reference [RFC2045] actually defines a different
126	   construct, the MIME entity, that finds expression in a variety of
127	   message-oriented Internet protocols.  Similarly, "Tag 35 is for
128	   regular expressions", but the references to Perl Compatible Regular
129	   Expressions (PCRE) and JavaScript syntax (ECMA-262) are not
130	   compatible with each other.  Two sections cover the subtleties of
131	   items tagged with these tags, and so update [RFC7049] without
132	   changing the basic CBOR wire protocol.

134	1.1.  Terminology

136	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
137	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
138	   "OPTIONAL" in this document are to be interpreted as described in RFC
139	   2119 [RFC2119].

141	   The terminology of RFC 7049 applies; in particular the term "byte" is
142	   used in its now customary sense as a synonym for "octet".

144	2.  Object Identifiers

146	   The International Object Identifier tree [X.660] is a hierarchically
147	   managed space of identifiers, each of which is uniquely represented
148	   as a sequence of unsigned integers ("sub-identifiers") [X.680].
149	   While these sequences can easily be represented in CBOR arrays of
150	   unsigned integers, a more compact representation can often be
151	   achieved by adopting the widely used representation of object
152	   identifiers defined in BER; this representation may also be more
153	   amenable to processing by other software making use of object
154	   identifiers.

156	   BER represents the sequence of unsigned integers by concatenating
157	   self-delimiting [RFC6256] representations of each of the sub-
158	   identifiers in sequence.

160	   ASN.1 distinguishes absolute object identifiers (ASN.1 Type
161	   "OBJECT IDENTIFIER"), which begin at a root arc ([X.660] Clause
162	   3.5.21), from relative object identifiers (ASN.1 Type "RELATIVE-
163	   OID"), which begin relative to some object identifier known from
164	   context ([X.680] Clause 3.8.63).  As a special optimization, BER
165	   combines the first two integers in an absolute object identifier into
166	   one numeric identifier by making use of the property of the hierarchy
167	   that the first arc has only three integer values (0, 1, and 2), and
168	   the second arcs under 0 and 1 are limited to the integer values
169	   between 0 and 39.  (The root arc "joint-iso-itu-t(2)" has no such
170	   limitations on its second arc.)  If X and Y are the first two
171	   integers, the single integer actually encoded is computed as:

173	      X * 40 + Y

175	   The inverse transformation (again making use of the known ranges of X
176	   and Y) is applied when decoding the object identifier.
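
	   As a non-normative illustration, the combination of the first two
	   arcs and its inverse can be sketched in Python as follows.  The
	   function names are hypothetical and are not defined by this document
	   or by [X.690]; the sketch handles only the contents bytes (no BER
	   tag or length) and assumes that decoding input was validated
	   beforehand.

```python
def encode_subid(value: int) -> bytes:
    """Encode one sub-identifier in base 128, most significant group first."""
    if value < 0:
        raise ValueError("sub-identifiers are unsigned")
    groups = [value & 0x7F]                    # final byte: high bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))   # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))


def encode_oid(arcs: list[int]) -> bytes:
    """Return the BER contents bytes of an absolute OID."""
    if len(arcs) < 2:
        raise ValueError("an absolute OID has at least two arcs")
    x, y = arcs[0], arcs[1]
    if x not in (0, 1, 2) or y < 0 or (x < 2 and y > 39):
        raise ValueError("first two arcs are out of range")
    combined = x * 40 + y                      # the formula given above
    return b"".join(encode_subid(v) for v in [combined, *arcs[2:]])


def decode_oid(contents: bytes) -> list[int]:
    """Inverse of encode_oid; assumes the contents were validated already."""
    subids, value = [], 0
    for byte in contents:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                    # high bit clear ends a sub-identifier
            subids.append(value)
            value = 0
    if not subids:
        raise ValueError("no complete sub-identifier found")
    combined = subids[0]
    x = min(combined // 40, 2)                 # arcs 0 and 1 only allow Y in 0..39
    y = combined - 40 * x
    return [x, y, *subids[1:]]
```

	   The inverse relies on the known ranges: any combined value of 80 or
	   more can only belong to the root arc 2, so "min(combined // 40, 2)"
	   recovers X without ambiguity.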

178	   Since the semantics of absolute and relative object identifiers
179	   differ, this specification defines two tags:

181	   Tag <<O>> (value TBD): tags a byte string as the [X.690] encoding of
182	   an absolute object identifier (simply "object identifier" or "OID").

184	   Tag <<R>> (value TBD): tags a byte string as the [X.690] encoding of
185	   a relative object identifier (also "relative OID").

187	2.1.  Requirements on the byte string being tagged

189	   A byte string tagged by <<O>> or <<R>> MUST be a syntactically valid
190	   BER representation of an object identifier.  Specifically:

192	   o  its first byte, and any byte that follows a byte that has the most
193	      significant bit unset, MUST NOT be 0x80 (this requirement excludes
194	      expressing the sub-identifiers with anything but the shortest
195	      form)

197	   o  its last byte MUST NOT have the most significant bit set (this
198	      requirement excludes an incomplete final sub-identifier)

200	   If either of these invalid conditions is encountered, it MUST be
201	   treated as a decoding error.  Comparing two OIDs or relative OIDs for
202	   equality in a byte-for-byte fashion may not be safe before these
203	   checks succeed on at least one of them (this includes the case where
204	   one of them is a local constant); a process implementing an exclusion
205	   list MUST check for decoding errors first.

207	   [X.680] restricts RELATIVE-OID values to have at least one sub-
208	   identifier (array element).  This specification permits empty
209	   relative object identifiers; they may still be excluded by
210	   application semantics.
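
	   The two byte-level requirements can also be checked with a single
	   pass over the byte string.  The following Python sketch is
	   illustrative only; the function names are hypothetical.  It permits
	   an empty relative OID, as stated above, and rejects an empty
	   absolute OID, which matches the regular expressions given later in
	   this section.

```python
def check_oid_contents(data: bytes, relative: bool = False) -> None:
    """Raise ValueError if `data` breaks the byte-level rules above."""
    if not data:
        if relative:
            return                       # empty relative OIDs are permitted
        raise ValueError("empty absolute OID")
    at_subid_start = True                # the first byte starts a sub-identifier
    for offset, byte in enumerate(data):
        if at_subid_start and byte == 0x80:
            raise ValueError(f"non-shortest form at offset {offset}")
        # A byte with the high bit clear ends the current sub-identifier.
        at_subid_start = not (byte & 0x80)
    if data[-1] & 0x80:
        raise ValueError("incomplete final sub-identifier")


def is_valid_oid_contents(data: bytes, relative: bool = False) -> bool:
    """Boolean wrapper around check_oid_contents."""
    try:
        check_oid_contents(data, relative)
    except ValueError:
        return False
    return True
```

	   An application would run such a check before any byte-for-byte
	   comparison, in line with the requirement on exclusion lists above.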

212	   [RFC7049] permits byte strings to be indefinite-length, with chunks
213	   divided at arbitrary byte boundaries.  This contrasts with text
214	   strings, where each chunk in an indefinite-length text string is
215	   required to be well-formed UTF-8 on its own: splitting the octets of a
216	   UTF-8 character encoding between chunks is not allowed.

218	   By analogy to this principle and to Clauses 8.9.1 and 8.20.1 of
219	   [X.690], the byte strings carrying the OIDs and relative OIDs are
220	   also to be treated as indivisible units: They MUST be encoded in
221	   definite-length form; indefinite-length form is treated as an
222	   encoding error (and the same considerations as above apply).  (An
223	   added convenience is that CBOR encodings can be searched through
224	   efficiently for specific object identifiers without initiating the
225	   decoding process.)
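
	   The search convenience mentioned above can be sketched as follows.
	   This Python fragment is illustrative only: the helper name is
	   hypothetical, tag number 6 is the placeholder used in the examples
	   of this document (the real value is TBD), and only tag numbers and
	   byte string lengths below 24 are handled.

```python
def cbor_oid_needle(contents: bytes, tag: int = 6) -> bytes:
    """Build the bytes of a tagged definite-length byte string (short forms only)."""
    if not 0 <= tag <= 23:
        raise ValueError("sketch covers single-byte tags only")
    if len(contents) > 23:
        raise ValueError("sketch covers byte strings shorter than 24 bytes only")
    return bytes([0xC0 | tag, 0x40 | len(contents)]) + contents


# SHA-256 OID contents, taken from the examples in this document.
sha256 = bytes.fromhex("608648016503040201")
needle = cbor_oid_needle(sha256)

# Hypothetical message: a one-entry map {1: <tagged OID>}.
message = bytes.fromhex("a101") + needle
offset = message.find(needle)            # plain byte search, no CBOR decoding
```

	   A hit found this way is only a candidate: the same byte sequence
	   could in principle occur inside an unrelated byte or text string, so
	   an application that needs certainty still has to decode the item at
	   the reported offset.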

227	   We provide "binary regular expression" forms for implementation
228	   convenience.  Unlike typical regular expressions that operate on
229	   character sequences, the following regular expressions take bytes as
230	   their domain, so they can be applied directly to CBOR byte strings.

232	   For byte strings with tag <<O>>:

234	      /^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])+$/

236	   For byte strings with tag <<R>>:

238	      /^((?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])*$/

240	   Putative CBOR data that fails these tests SHALL be rejected as
241	   improperly coded.

243	   Another (possibly more efficient) way to validate the byte strings is
244	   to hunt for prohibited patterns.

246	   For byte strings with tag <<O>>:

248	      /^$|(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/

250	   or with lookbehind:

252	      /^$|^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/

254	   For byte strings with tag <<R>>:

256	      /(?:^|[\x00-\x7F])\x80|[\x80-\xFF]$/

258	   or with lookbehind:

260	      /^\x80|(?<=[\x00-\x7F])\x80|(?<=[\x80-\xFF])$/

262	   Putative CBOR data that matches any of these prohibited patterns
263	   SHALL be rejected as improperly coded.
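
	   As an illustration, both styles of test can be applied with Python's
	   "re" module on bytes patterns.  The names below are hypothetical.
	   One adaptation is an assumption of this sketch rather than part of
	   the expressions above: in Python, "$" also matches just before a
	   trailing 0x0A byte, which is a legitimate final sub-identifier, so
	   the sketch uses fullmatch() for the allowed forms and "\A" and "\Z"
	   in the prohibited patterns.

```python
import re

# Allowed forms; fullmatch() supplies the anchoring of ^...$.
OID_FORM = re.compile(rb"(?:(?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])+")
RELATIVE_OID_FORM = re.compile(rb"(?:(?:[\x81-\xFF][\x80-\xFF]*)?[\x00-\x7F])*")

# Prohibited patterns; \A and \Z stand in for ^ and $.
OID_PROHIBITED = re.compile(rb"\A\Z|(?:\A|[\x00-\x7F])\x80|[\x80-\xFF]\Z")
RELATIVE_OID_PROHIBITED = re.compile(rb"(?:\A|[\x00-\x7F])\x80|[\x80-\xFF]\Z")


def valid_by_form(data: bytes, relative: bool = False) -> bool:
    """True if the whole byte string matches the allowed form."""
    form = RELATIVE_OID_FORM if relative else OID_FORM
    return form.fullmatch(data) is not None


def valid_by_prohibition(data: bytes, relative: bool = False) -> bool:
    """True if no prohibited pattern occurs anywhere in the byte string."""
    pattern = RELATIVE_OID_PROHIBITED if relative else OID_PROHIBITED
    return pattern.search(data) is None
```

	   Both functions are meant to give the same verdict for every input;
	   which one is faster depends on the regular expression engine.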

265	   (It is worth pointing out that these tests, when optimally
266	   implemented, ought to be markedly faster than UTF-8 validation.)

268	3.  Examples

270	   In the following examples, we are using tag number 6 for <<O>> and
271	   tag number 7 for <<R>>.  See Section 16.2.

273	3.1.  Encoding of the SHA-256 OID

275	   ASN.1 Value Notation
276	   { joint-iso-itu-t(2) country(16) us(840) organization(1) gov(101)
277	   csor(3) nistalgorithm(4) hashalgs(2) sha256(1) }

279	   Dotted Decimal Notation (also XML Value Notation)
280	   2.16.840.1.101.3.4.2.1
281	   06                                # UNIVERSAL TAG 6
282	      09                             # 9 bytes, primitive
283	         60 86 48 01 65 03 04 02 01  # X.690 Clause 8.19
284	   #      |   840  1  |  3  4  2  1    show component encoding
285	   #   2.16         101

287	                       Figure 1: SHA-256 OID in BER

289	   C6                                # 0b110_00110: mt 6, tag 6
290	      49                             # 0b010_01001: mt 2, 9 bytes
291	         60 86 48 01 65 03 04 02 01  # X.690 Clause 8.19

293	                       Figure 2: SHA-256 OID in CBOR

295	3.2.  Encoding of a UUID OID

297	   UUID
298	   8b0d1a20-dcc5-11d9-bda9-0002a5d5c51b

300	   ASN.1 Value Notation
301	   { joint-iso-itu-t(2) uuid(25)
302	   geomicaGPAS(184830721219540099336690027854602552603) }

304	   Dotted Decimal Notation (also XML Value Notation)
305	   2.25.184830721219540099336690027854602552603

307	   06                                # UNIVERSAL TAG 6
308	      14                             # 20 bytes, primitive
309	         69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B
310	   #      |                  184830721219540099336690027854602552603
311	   #   2.25

313	              Figure 3: UUID in an object identifier, in BER

315	   C6                                # 0b110_00110: mt 6, tag 6
316	      54                             # 0b010_10100: mt 2, 20 bytes
317	         69 82 96 8D 8D 88 9B CC A8 C7 B3 BD D4 C0 80 AA AE D7 8A 1B

319	              Figure 4: UUID in an object identifier, in CBOR

321	3.3.  Encoding of a MIB Relative OID

323	   Given some OID (e.g., "lowpanMib", assumed to be "1.3.6.1.2.1.226"
324	   [RFC7388]), to which the following is added:

326	   ASN.1 Value Notation (not suitable for diagnostic notation)
327	   { lowpanObjects(1) lowpanStats(1) lowpanOutTransmits(29) }
328	   Dotted Decimal Notation (diagnostic notation; see Section 5)
329	   .1.1.29

331	   0D                                # UNIVERSAL TAG 13
332	      03                             # 3 bytes, primitive
333	         01 01 1D                    # X.690 Clause 8.20
334	   #      1  1 29                      show component encoding

336	             Figure 5: MIB relative object identifier, in BER

338	   C7                                # 0b110_00111: mt 6, tag 7
339	      43                             # 0b010_00011: mt 2 (bstr), 3 bytes
340	         01 01 1D                    # X.690 Clause 8.20

342	             Figure 6: MIB relative object identifier, in CBOR

344	   This relative OID saves seven bytes compared to the full OID
345	   encoding.
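
	   The bit-level annotations on the initial bytes in the CBOR figures
	   of this section can be reproduced mechanically.  The following
	   Python sketch is illustrative only (the function names are
	   hypothetical); it splits an initial byte into its 3-bit major type
	   and 5-bit additional information as defined in [RFC7049].

```python
def split_initial_byte(byte: int) -> tuple[int, int]:
    """Return (major type, additional information) of a CBOR initial byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a single byte")
    return byte >> 5, byte & 0x1F


def annotate(byte: int) -> str:
    """Render the split in the style used by the figure comments."""
    major, info = split_initial_byte(byte)
    return f"0b{major:03b}_{info:05b}: mt {major}, additional info {info}"


# Initial bytes that appear in the CBOR figures of this section.
for initial in (0xC6, 0x49, 0x54, 0xC7, 0x43):
    print(f"{initial:02X}  # {annotate(initial)}")
```

	   For major type 6 the additional information is the tag number; for
	   major type 2 it is the byte string length, as long as that length
	   is below 24.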

347	4.  Discussion

349	   Staying close to the way object identifiers are encoded in ASN.1 BER
350	   makes back-and-forth translation easy.  Object identifiers in IETF
351	   protocols are serialized in dotted decimal form or BER form, so there
352	   is an advantage in not inventing a third form.  Also, expectations of
353	   the cost of encoding object identifiers are based on BER; using a
354	   different encoding might not be aligned with these expectations.  If
355	   additional information about an OID is desired, lookup services such
356	   as the OID Resolution Service (ORS) [X.672] and the OID Repository
357	   [OID-INFO] are available.

359	   This specification allocates two numbers out of the single-byte tag
360	   space.  This use of code point space is justified by the wide use of
361	   object identifiers in data interchange.  For most common OIDs in use
362	   (namely those whose contents encode to fewer than 24 bytes), the CBOR
363	   encoding will match the efficiency of [X.690].  (This preliminary
364	   conclusion is likely to generate some discussion, see Section 16.2.)

366	5.  Diagnostic Notation

368	   Implementers will likely want to see OIDs and relative OIDs in their
369	   "natural forms" (as sequences of decimal unsigned integers) for
370	   diagnostic purposes.  Accordingly, this section defines additional
371	   syntactic elements that can be used in conjunction with the
372	   diagnostic notation described in Section 6 of [RFC7049].

374	   An object identifier may be written in ASN.1 value notation (with
375	   enclosing braces and secondary identifiers, ObjectIdentifierValue of
376	   Clause 32.3 of [X.680]), or in dotted decimal notation with at least
377	   three arcs.  Both notations are shown in Section 3.  The surrounding
378	   tag notation is not to be used, because the tag is implied.  The
379	   ASN.1 value notation for OIDs does not overlap with JSON object
380	   notation for CBOR maps, because at least two arcs are required for a
381	   valid OID.

383	   A relative object identifier may be written in dotted decimal
384	   notation or in ASN.1 value notation, in both cases prefixed with a
385	   dot as shown in Section 3.3.  The surrounding tag notation is not to
386	   be used, because the tag is implied.

388	   The notation in this section may be employed in addition to the basic
389	   notation, which would be a tagged binary string.

391	       +------------------------------+--------------+------------+
392	       | RFC 7049 diagnostic notation | 6(h'2b0601') | 7(h'0601') |
393	       +------------------------------+--------------+------------+
394	       | Dotted decimal notation      | 1.3.6.1      | .6.1       |
395	       | ASN.1 value notation         | {1 3 6 1}    | .{6 1}     |
396	       +------------------------------+--------------+------------+

398	            Table 1: Examples for extended diagnostic notation
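
	   A diagnostic tool could derive the extended notations of Table 1
	   from a tagged byte string along the following lines.  This Python
	   sketch is illustrative only: the function names are hypothetical,
	   tag numbers 6 and 7 are the placeholders used in Section 3, and the
	   byte string is assumed to have passed the checks of Section 2.1.

```python
OID_TAG, RELATIVE_OID_TAG = 6, 7    # placeholder tag numbers; real values are TBD


def subidentifiers(contents: bytes) -> list[int]:
    """Split validated contents bytes into sub-identifiers."""
    subids, value = [], 0
    for byte in contents:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            subids.append(value)
            value = 0
    return subids


def arcs_of(tag: int, contents: bytes) -> list[int]:
    """Return the arcs, undoing the X * 40 + Y step for absolute OIDs."""
    subids = subidentifiers(contents)
    if tag == RELATIVE_OID_TAG:
        return subids
    if tag != OID_TAG or not subids:
        raise ValueError("not a usable OID item")
    x = min(subids[0] // 40, 2)
    return [x, subids[0] - 40 * x, *subids[1:]]


def dotted(tag: int, contents: bytes) -> str:
    """Dotted decimal notation; relative OIDs get a leading dot."""
    text = ".".join(str(arc) for arc in arcs_of(tag, contents))
    return "." + text if tag == RELATIVE_OID_TAG else text


def asn1_value(tag: int, contents: bytes) -> str:
    """ASN.1 value notation without secondary identifiers."""
    text = "{" + " ".join(str(arc) for arc in arcs_of(tag, contents)) + "}"
    return "." + text if tag == RELATIVE_OID_TAG else text
```

	   Applied to the two byte strings of Table 1, h'2b0601' under tag 6
	   and h'0601' under tag 7, these functions give the four notations
	   listed in the table.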

400	6.  A New Arc for Concise OIDs

402	   Object identifiers in [X.690] form are remarkably compact.
403	   Nevertheless, for some applications (and engineers), they are simply
404	   not compact enough, at least when compared to certain alternatives
405	   such as very small unsigned integers (see Section 7).  The shortest
406	   object identifier under the IETF's control is 1.3.6.1 (4 bytes),
407	   although no assignment directly under that arc has been made since
408	   1999 [RFC2506], and no arc directly under it has ever been assigned
409	   directly to a protocol element.  The shortest IETF-controlled,
410	   First-Come, First-Served OID arc is 8 bytes long: it is obtained by
411	   getting a Private Enterprise Number from IANA, which is assigned
412	   under 1.3.6.1.4.1.  To promote object identifier usage in CBOR and to
413	   make OIDs as competitive as possible, (the authors / the IETF / ISOC)
414	   have secured a very short arc "{ x y z }" that only occupies (1, 2,
415	   3) byte(s).

417	   [[NB: Registration procedures under that arc.]]

419	   The history of OIDs suggests that the human mind tends to excessive
420	   taxonomy around them.  Unlike assignments in the 1.3.6.1 range, this
421	   document suggests that registrants acquire OIDs under this short arc
422	   "laterally" rather than hierarchically, in keeping with CBOR's design
423	   goal to have concise serializations.

425	7.  Enumerations in CBOR

427	   This section provides a roadmap to using enumerated items in CBOR,
428	   including design considerations for choosing between OIDs, integers,
429	   and UTF-8 strings.

431	   CBOR does not have an ENUMERATED type like ASN.1 to identify named
432	   values in a protocol element with three or more states (Clause 20 and
433	   Clause G.2.3 of [X.680]).  ASN.1 ENUMERATED turns out to be
434	   superfluous because ASN.1 INTEGER values can get named (and have
435	   historically been used for finite, multistate variables, such as
436	   version numbers), while ASN.1 ENUMERATED types can be defined to be
437	   extensible with the ellipsis lexical item.  Practically, the named
438	   integers are not serialized in the binary encodings anyway; they
439	   merely serve as semantic hints for designers and debuggers.

441	   CBOR expects that protocol designers will use one of the basic major
442	   types for multistate variables, assigning semantics to particular
443	   values using higher-level schemas.  The obvious choices for the basic
444	   types are integers (particularly unsigned integers) and UTF-8
445	   strings.  However, these major types are not without drawbacks.

447	   Integers are compact for small values, but have a flat namespace so
448	   there are mis-assignment and collision risks that can only be
449	   mitigated with protocol-specific registries.  Arrays of integers are
450	   possible, but arrays require more processing logic for equality
451	   comparisons, and the JSON conversion is not intuitive when the
452	   enumerated value serves as a key in a map.

454	   UTF-8 strings are less compact when the strings are supposed to
455	   resemble their semantics, and there are normalization issues if the
456	   strings contain characters beyond the ASCII range.  UTF-8 strings
457	   also comprise a flat namespace like integers unless the higher-level
458	   schema employs delimiters, which makes the string even larger.  If
459	   conciseness is a design goal, other perceived advantages of a string
460	   as an identifier are pretty much blown out the moment one has to tack
461	   "https://" onto the front.

463	   This section provides a novel alternative in OIDs.

465	7.1.  Factors Favoring OID Enumerations

467	   A protocol designer might choose OIDs or relative OIDs for an
468	   enumerated item in view of the following observations:

470	   1.  OIDs and relative OIDs are quite compact: a single-arc relative
471	       OID encoded according to this specification occupies just two
472	       bytes for primary integer values 0-127 (excluding the semantic
473	       tag <<R>>), and three bytes for primary integer values 128-16383.
474	       (In contrast, an unsigned integer requires one byte for 0-23, two
475	       bytes for 24-255, and three bytes for 256-65535.)

477	   2.  OIDs and relative OIDs (with base) are persistent and globally
478	       unambiguous.

480	   3.  OIDs and relative OIDs have built-in semantics for designers and
481	       debuggers.  Specifically, the advent of universal OID
482	       repositories such as [OID-INFO] makes it easy for a designer or
483	       debugger to pull up useful information about the object of
484	       interest (Clause 3.5.10 of [X.660]).  This useful information
485	       (for humans) does not have to bleed into the encoded
486	       representation (for machines).

488	   4.  OIDs and relative OIDs are always compared for exact equality: no
489	       need to deal with case folding, case sensitivity, or other
490	       normalization issues.  ("Overlong" encodings are PROHIBITED;
491	       therefore overlong encodings MUST be treated as coding errors.)

493	   5.  OIDs and relative OIDs have a built-in hierarchy, so if
494	       implementers want to extend an enumeration without assigning new
495	       values "horizontally", they have the option of assigning new
496	       values "vertically", possibly with more or less stringent
497	       assignment rules.

499	   6.  Because OIDs and relative OIDs (with base) are part of the so-
500	       called International Object Identifier tree [X.660], any other
501	       protocol specification can reuse the enumeration if the designers
502	       find it useful.

504	   7.  OIDs and relative OIDs have natural JSON representations in the
505	       dotted decimal notations prescribed in Section 5.  OIDs and
506	       relative OIDs can be distinguished from each other by the
507	       presence or absence of the leading dot ".".  As the resulting
508	       JSON string is entirely numeric in the ASCII range, case and
509	       normalization are irrelevant to the comparison.  (An object
510	       identifier also has a semantic string representation in the form
511	       of an OID-IRI [X.680], for those who really want that type of
512	       thing.)

514	   8.  OIDs and relative OIDs are human language-neutral.  A protocol
515	       designer working in US-English might name an enumerated value
516	       "sig" for "signature", but "sig" could also stand for
517	       "significand", "signal", or "special interest group".  In Swedish
518	       and Norwegian, "sig" is a pronoun that means "himself, herself,
519	       itself, one, them", etc.--an entirely different meaning.
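
	   The size figures quoted in item 1 of the list above can be checked
	   with a short calculation.  The following Python sketch is
	   illustrative only; the function names are hypothetical, and, as in
	   item 1, the semantic tag <<R>> is not counted.

```python
def cbor_uint_size(value: int) -> int:
    """Encoded size in bytes of a CBOR unsigned integer (major type 0)."""
    if value < 0:
        raise ValueError("unsigned values only")
    if value < 24:
        return 1                                # value fits in the initial byte
    for extra_bytes in (1, 2, 4, 8):
        if value < 1 << (8 * extra_bytes):
            return 1 + extra_bytes
    raise ValueError("does not fit a CBOR unsigned integer")


def single_arc_relative_oid_size(arc: int) -> int:
    """Encoded size of a byte string holding one base-128 sub-identifier."""
    if arc < 0:
        raise ValueError("sub-identifiers are unsigned")
    content_length = max(1, (arc.bit_length() + 6) // 7)   # 7 bits per byte
    if content_length > 23:
        raise ValueError("sketch covers short byte strings only")
    return 1 + content_length                   # one-byte head plus contents
```

	   The relative OID is one byte larger than the integer for values
	   0-23, the same size for 24-127 and 256-16383, and one byte larger
	   again for 128-255.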

521	7.2.  Factors Favoring Integer Enumerations

523	   A protocol designer might choose integers for an enumerated item in
524	   view of the following observations:

526	   1.  The CBOR encoding of unsigned integers 0-23 is the most compact,
527	       occupying exactly one byte (excluding any semantic tags).

529	   2.  A protocol designer may wish to prohibit extensibility as a
530	       matter of course.  Integers comprise a single flat namespace:
531	       there is no hierarchy.

533	   3.  If greater range is desired while sticking to one byte, a
534	       protocol designer may double the range of possible values by
535	       allowing negative integers.  However, enumerating values using
536	       negative integers may have unintended side-effects, because some
537	       programming environments (e.g., C/C++) make implementation-
538	       defined assumptions about the number of bits needed for an
539	       enumerated type.

541	7.3.  Factors Favoring UTF-8 String Enumerations

543	   A protocol designer might choose UTF-8 strings for an enumerated item
544	   in view of the following observations:

546	   1.  A specification can practically limit the content of UTF-8
547	       strings to the ASCII range (or narrower), mitigating some
548	       normalization problems.

550	   2.  UTF-8 strings are easier to read on-the-wire for humans.

552	   3.  UTF-8 strings can contain arbitrary textual identifiers, which
553	       can be hierarchical, e.g., URIs.

555	7.4.  OID Enumeration Example

557	   An enumerated item indicates the revision level of a data format.
558	   Revision levels are issued by year, such as 2011, 2012, etc.
559	   However, in the year 2013, two revisions were issued: the first one
560	   and an important update in June that needs to be distinguished.  The
561	   revision levels are assigned to some OID arc:

563	   "{2 25 6464646464 revs(4)}"

565	   In this arc, the following sub-arcs are assigned:

567	                          +--------------------+
568	                          | Sub-Arc            |
569	                          +--------------------+
570	                          | {v2011(1)}         |
571	                          | {v2012(2)}         |
572	                          | {v2013(3)}         |
573	                          | {v2013(3) june(6)} |
574	                          | {v2014(4)}         |
575	                          | {v2015(5)}         |
576	                          +--------------------+

578	                         Table 2: Example Sub-Arcs

580	   In CBOR, the enumeration is encoded as a relative OID.  The schema
581	   specifies the base OID arc, which is omitted:

583	   c7         # tag(7)
584	      41 03   # .3

586	   c7         # tag(7)
587	      42 0306 # .3.6

589	                    Figure 7: Enumerated Items in CBOR

591	   .3
592	   .{v2013(3) june(6)}

594	          Figure 8: Enumerated Items in CBOR Diagnostic Notation

596	   ".3"
597	   ".3.6"

599	            Figure 9: Enumerated Items in JSON (possibility 1)

601	   "v2013"
602	   "v2013/june"

604	            Figure 10: Enumerated Items in JSON (possibility 2)
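
	   The example can be exercised with a small lookup table.  The
	   following Python sketch is illustrative only: the function names
	   are hypothetical, tag number 7 is the placeholder for <<R>> used in
	   Section 3, the names are those of Figure 10, and the shortcut of
	   storing each sub-arc as one byte works only because every sub-arc
	   in Table 2 is below 128.

```python
RELATIVE_OID_TAG = 7    # placeholder tag number for <<R>>; real value is TBD

# Sub-arcs of Table 2, relative to the base arc fixed by the schema.
REVISIONS = {
    (1,): "v2011",
    (2,): "v2012",
    (3,): "v2013",
    (3, 6): "v2013/june",
    (4,): "v2014",
    (5,): "v2015",
}


def encode_revision(arcs: tuple[int, ...]) -> bytes:
    """Encode an assigned revision as a tagged relative OID."""
    if arcs not in REVISIONS:
        raise ValueError("unassigned revision")
    # Every assigned sub-arc is below 128, so one byte per sub-arc suffices.
    return bytes([0xC0 | RELATIVE_OID_TAG, 0x40 | len(arcs), *arcs])


def decode_revision(item: bytes) -> str:
    """Map a tagged relative OID back to its revision name."""
    if len(item) < 2 or item[0] != 0xC0 | RELATIVE_OID_TAG:
        raise ValueError("not a relative OID item")
    major, length = item[1] >> 5, item[1] & 0x1F
    if major != 2 or length >= 24 or length != len(item) - 2:
        raise ValueError("not a short definite-length byte string")
    name = REVISIONS.get(tuple(item[2:]))
    if name is None:
        raise ValueError("unassigned revision")
    return name
```

	   Because the table is keyed on whole sub-arc sequences, ".3" and
	   ".3.6" are distinct entries and the comparison is exact.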

606	8.  Tag Factoring and Tag Stacking with OID Arrays and Maps

608	   A common use of object identifiers in ASN.1 is to identify the kind
609	   of data in an open type (Clause 3.8.57 of [X.680]), using information
610	   object classes [X.681].  CBOR is schema-neutral, and (although not
611	   fully discussed in [RFC7049]) semantic tagging was originally
612	   intended to identify items in a global, context-free way (i.e., where
613	   a specification would not repurpose a tag with different semantics
614	   than its IANA registration).  Therefore, using OIDs to identify
615	   contextual data in a similar fashion to [X.681] is RECOMMENDED.

617	8.1.  Tag Factoring

619	   <<O>> and <<R>> can tag CBOR arrays and maps.  The idea is that the
620	   tag is factored out from each individual byte string; the tag is
621	   placed in front of the array or map instead.  The tags <<O>> and
622	   <<R>> are left-distributive.

624	   When the <<O>> or <<R>> tag is applied to an array, it means that the
625	   respective tag is imputed to all items in the array.  For example,
626	   when the array is tagged with <<O>>, every array item that is a
627	   byte string is an OID.

629	   When the <<O>> or <<R>> tag is applied to a map, it means that the
630	   respective tag is imputed to all keys in the map.  The values in the
631	   map are not considered specially tagged.

633	   Array and map stacking is permitted.  For example, a 3-dimensional
634	   array of OIDs can be composed by using a single <<O>> tag, followed
635	   by an array of arrays of arrays of byte strings.  All such byte
636	   strings are considered OIDs.
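
   As an informal illustration, the distribution of a factored-out tag
   over stacked arrays can be sketched in Python.  The in-memory model
   is an assumption made for this example only: CBOR arrays are Python
   lists, byte strings are "bytes" objects, and an OID is shown as a
   (kind, bytes) pair.  Leaving other item types untouched is also an
   assumption of the sketch.

```python
def impute_tag(item, kind):
    """Distribute a factored-out tag ('O' or 'R') over nested arrays."""
    if isinstance(item, bytes):
        return (kind, item)          # a byte string becomes an OID
    if isinstance(item, list):
        return [impute_tag(child, kind) for child in item]
    return item                      # assumption: other items unchanged


# A small 3-dimensional array tagged once on the outside with <<O>>.
cube = [[[b"\x55\x04\x06", b"\x55\x04\x07"]], [[b"\x55\x04\x08"]]]
print(impute_tag(cube, "O"))
```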

638	8.2.  Switching OID and Relative OID

640	   If an individual item in a <<O>> or <<R>> tagged array, or an
641	   individual key in a <<O>> or <<R>> tagged map, is tagged with the
642	   opposite tag (<<R>> or <<O>>) of the array or map itself, that tag
643	   cancels and replaces the outer tag for that item.  Like tags MUST NOT
644	   be used on such individual items; such tagging is a coding error.
645	   For example, if <<R>> is the outer tag on an array and <<O>> is the
646	   inner tag on a byte string, semantically the inner item is treated
647	   as a regular OID, not as a relative OID.
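
   The cancel-and-replace rule can be sketched as follows.  This is a
   hypothetical model, not a normative algorithm: "Tagged" stands for
   an explicitly tagged inner item, and the strings "O" and "R" stand
   in for the <<O>> and <<R>> tag numbers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tagged:
    tag: str      # "O" or "R"
    value: Any


def resolve(item, outer):
    """Return (effective kind, bytes) for one item of a tagged array."""
    if isinstance(item, Tagged) and item.tag in ("O", "R"):
        if item.tag == outer:
            raise ValueError("like tag on an inner item is a coding error")
        return (item.tag, item.value)    # opposite tag replaces the outer tag
    if isinstance(item, bytes):
        return (outer, item)             # outer tag is imputed
    return item


def resolve_array(items, outer):
    return [resolve(item, outer) for item in items]


sha256 = bytes.fromhex("608648016503040201")   # 2.16.840.1.101.3.4.2.1
row = [b"\x06", Tagged("O", sha256), b""]
print(resolve_array(row, "R"))
```

   In the example row, the first and last items remain relative OIDs,
   while the explicitly tagged middle item is treated as a regular OID.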

649	   The purpose is to create more compact and flexible identifier spaces,
650	   especially when object identifiers are used as enumerated items.
651	   Examples:

653	   <<R>> outside, <<O>> inside: An implementation that strives for a
654	   compact representation does not have to emit base OID arcs
655	   repeatedly for each item.  At the same time, if a private
656	   organization or standards body separate from the specification needs
657	   to identify something that the specification maintainers disagree
658	   with, the separate body does not need to request registration of an
659	   identifier under a controlled arc (i.e., the base arc of the relative
660	   OIDs).

662	   <<O>> outside, <<R>> inside: A collection of OIDs is supposed to be
663	   open to all comers, but a certain set of OIDs issued under a
664	   particular arc is foreseeable for the majority of implementations.
665	   For example, an OID protocol slot may identify cryptographic
666	   algorithms: anyone can write (and has written) an algorithm with an
667	   arbitrary OID.  However, the protocol slot designer may wish to
668	   privilege certain algorithms (and therefore OIDs) that are well-known
669	   in that field of use.

671	8.3.  Tag Stacking

673	   CBOR permits tag stacking (tagging a tagged item), although this
674	   technique has not been used much yet.  This specification anticipates
675	   that OIDs and relative OIDs will be associated with values with
676	   uniform semantics.  This section provides specific semantics when
677	   tags are "stacked", that is, a CBOR item starts with tag <<O>> or
678	   <<R>>, followed by one or more arbitrary tags ("subsequent tags"),
679	   followed by a map or array.

681	8.3.1.  Map

683	   The overall gist is that the first tag applies to the keys in a map;
684	   the subsequent tags apply to the values in a map.

686	   When <<O>> or <<R>> is the first tag in a stack of tags, followed by
687	   a map:

689	   o  The <<O>> or <<R>> tag indicates that the keys of the map are byte
690	      string OIDs, byte string relative OIDs, or tag-factored arrays or
691	      maps of the same.

693	   o  The subsequent tags uniformly apply to all of the values.

695	   For example, if tag 32 (URI) is the subsequent tag, then all values
696	   in the map are treated semantically as if tag 32 is applied to them
697	   individually.  See Figure 11.

699	   It is possible that individual values can be tagged.  Semantically,
700	   these tags cumulate with the outer subsequent tags; inner value tags
701	   do not cancel or replace the outer tags.
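
   The map semantics above can be illustrated with a short Python
   sketch.  The model is an assumption of the example: a tagged value
   is shown as a (tag, value) pair, a key as a (kind, bytes) pair, and
   the function name is hypothetical.

```python
def apply_tag_stack(first, subsequent, mapping):
    """Impute `first` to every key and the subsequent tags to every value."""
    result = {}
    for key, value in mapping.items():
        # Wrap innermost first, so the first subsequent tag ends up outermost.
        for tag in reversed(subsequent):
            value = (tag, value)
        result[(first, key)] = value
    return result


# Keys are the BER contents of 0.9.2342.7776.1 and 0.9.2342.7776.2.
uris = {
    bytes.fromhex("09 92 26 bc 60 01"): "http://example.com/",
    bytes.fromhex("09 92 26 bc 60 02"): "ftp://ftp.example.com/pub/",
}
for key, value in apply_tag_stack("O", [32], uris).items():
    print(key, value)
```

   A value that already carries its own tag is simply wrapped again, so
   the inner tag cumulates with the outer subsequent tags rather than
   replacing them.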

703	8.3.2.  Array

705	   The overall gist is that the first tag applies to the ordered "keys"
706	   in the array (even-numbered items, assuming that the index starts at
707	   0); the subsequent tags apply to the ordered "values" in the array
708	   (odd-numbered items).  This tagging technique creates an ordered
709	   associative array.  [[NB: Some call this the FORTRAN approach. need
710	   to cite]]

712	   When <<O>> or <<R>> is the first tag in a stack of tags, followed by
713	   an array:

715	   o  The <<O>> or <<R>> tag indicates that alternating items, starting
716	      with the first item, are byte string OIDs, byte string relative
717	      OIDs, or tag-factored arrays or maps of the same.

719	   o  The subsequent tags uniformly apply to the alternating items,
720	      starting with the second item.

722	   o  The array MUST have an even number of items; an array that has an
723	      odd number of items is a coding error.

725	   To create an ordered associative array wherein the values (the
726	   odd-numbered items) are arbitrarily tagged, stack tag 55799,
727	   self-describe CBOR (Section 2.4.5 of [RFC7049]), after the <<O>> or
728	   <<R>> tag.  Tag 55799 imparts no special semantics, so it is an
729	   effective placeholder.  (This sequence is provided mainly for
730	   completeness: it is a more compact alternative to an array of pairs,
731	   each containing an OID or relative OID and an arbitrary value.)
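
   A receiver might recover the ordered key/value pairs from such an
   array along the following lines.  This is a minimal sketch with a
   hypothetical function name; indices are 0-based, so keys sit at even
   indices and values at odd indices.

```python
def ordered_pairs(items):
    """Split a flat alternating array into (key, value) pairs, in order."""
    if len(items) % 2 != 0:
        raise ValueError("an odd number of items is a coding error")
    return list(zip(items[0::2], items[1::2]))


flat = [b"\x55\x04\x06", "US", b"\x55\x04\x08", "CA"]
print(ordered_pairs(flat))
```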

733	8.4.  Diagnostic Notation for OID Arrays and Maps

735	   There are no syntactic changes to diagnostic notation beyond
736	   Section 5.  Using <<O>> or <<R>> with arrays and maps, however, leads
737	   to some subtle results.

739	   When an array or map is tagged, that item is embraced with the usual
740	   tag format: "<<O>>(<item>)" or "<<R>>(<item>)".  This syntax
741	   indicates the presence of the tag on the outer item.  Inner items in
742	   the array or keys in the map are noted in Section 5 form, but are not
743	   individually tagged on-the-wire when the tag is the same as the outer
744	   tag, because like-tagging is a coding error.

746	   An array or map that involves a stack of tags is notated the usual
747	   way.  For example, the CBOR diagnostic notation of a map of OIDs to
748	   URIs is:

750	   6(32({0.9.2342.7776.1: "http://example.com/",
751	         0.9.2342.7776.2: "ftp://ftp.example.com/pub/"}))

753	       Figure 11: Map of OIDs to URIs, in CBOR Diagnostic Notation

756	9.  Applications and Examples of OIDs

758	9.1.  GPU Farm

760	   Consider a 3-dimensional OID array, indicating certain operations to
761	   perform on a matrix of values in a GPU farm.  Default operations are
762	   under the OID arc 0.9.2342.7777 (such as .1, .2, .124, etc.); the arc
763	   0.9.2342.7777 itself represents the identity operation.  Certain
764	   cryptographic operations like SHA-256 hashing
765	   (2.16.840.1.101.3.4.2.1) are also permitted.  The resulting notation
766	   would be:

768	   7([[[.1,   .2,   .3],
769	       [.1,   .2,   .3],
770	       [.1,   .2,   .3]],
771	      [[.124, .125, .126],
772	       [.95,  .96,  .97 ],
773	       [.11,  .12,  .13 ]],
774	      [[h'',  .6,   .4.2],
775	       [.6,   h'',  .4.2],
776	       [.6,   2.16.840.1.101.3.4.2.1, h'']]])

778	    Figure 12: GPU Farm Matrix Operations, in CBOR Diagnostic Notation

780	   c7                                   # tag(7)
781	      83                                # array(3)
782	         83                             # array(3)
783	            83                          # array(3)
784	               41 01                    # .1 (2)
785	               41 02                    # .2 (2)
786	               41 03                    # .3 (2)
787	            83                          # array(3)
788	               41 01                    # .1 (2)
789	               41 02                    # .2 (2)
790	               41 03                    # .3 (2)
791	            83                          # array(3)
792	               41 01                    # .1 (2)
793	               41 02                    # .2 (2)
794	               41 03                    # .3 (2)
795	         83                             # array(3)
796	            83                          # array(3)
797	               41 7c                    # .124 (2)
798	               41 7d                    # .125 (2)
799	               41 7e                    # .126 (2)
800	            83                          # array(3)
801	               41 5f                    # .95 (2)
802	               41 60                    # .96 (2)
803	               41 61                    # .97 (2)
804	            83                          # array(3)
805	               41 0b                    # .11 (2)
806	               41 0c                    # .12 (2)
807	               41 0d                    # .13 (2)
808	         83                             # array(3)
809	            83                          # array(3)
810	               40                       # (empty) (1)
811	               41 06                    # .6 (2)
812	               42 0402                  # .4.2 (3)
813	            83                          # array(3)
814	               41 06                    # .6 (2)
815	               40                       # (empty) (1)
816	               42 0402                  # .4.2 (3)
817	            83                          # array(3)
818	               41 06                    # .6 (2)
819	               c6 49 608648016503040201 # 2.16.840.1.101.3.4.2.1 (11)
820	               40                       # (empty) (1)

822	         Figure 13: GPU Farm Matrix Operations, in CBOR (76 bytes)
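
   The encoding in Figure 13 can be cross-checked with a small encoder.
   The sketch below is not a general CBOR implementation: it assumes
   the provisional tag numbers 6 and 7 used in the figures, supports
   only short heads, and all names in it are hypothetical.

```python
TAG_NUMBER = {"O": 6, "R": 7}     # provisional values used in the figures


def head(major, n):
    """One- or two-byte CBOR head; sufficient for this example."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 256:
        return bytes([major << 5 | 24, n])
    raise ValueError("argument too large for this sketch")


def arc(value):
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))


def rel(*arcs):
    return ("R", b"".join(arc(a) for a in arcs))


def oid(first, second, *rest):
    return ("O", arc(40 * first + second) + b"".join(arc(a) for a in rest))


def encode(item, outer):
    """Encode nested arrays; an item is tagged only if it differs from outer."""
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(encode(i, outer) for i in item)
    kind, body = item
    data = head(2, len(body)) + body
    return data if kind == outer else head(6, TAG_NUMBER[kind]) + data


IDENTITY = rel()                  # empty relative OID: the base arc itself
SHA256 = oid(2, 16, 840, 1, 101, 3, 4, 2, 1)
matrix = [
    [[rel(1), rel(2), rel(3)]] * 3,
    [[rel(124), rel(125), rel(126)],
     [rel(95), rel(96), rel(97)],
     [rel(11), rel(12), rel(13)]],
    [[IDENTITY, rel(6), rel(4, 2)],
     [rel(6), IDENTITY, rel(4, 2)],
     [rel(6), SHA256, IDENTITY]],
]
encoded = head(6, TAG_NUMBER["R"]) + encode(matrix, "R")
print(len(encoded), encoded.hex())
```

   The printed length is 76, the total given in the caption of
   Figure 13.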

824	9.2.  X.500 Distinguished Name

826	   Consider the X.500 distinguished name:

828	   +----------------------------------------------+--------------------+
829	   | Attribute Types                              | Attribute Values   |
830	   +----------------------------------------------+--------------------+
831	   | c (2.5.4.6)                                  | US                 |
832	   +----------------------------------------------+--------------------+
833	   | l (2.5.4.7)                                  | Los Angeles        |
834	   | s (2.5.4.8)                                  | CA                 |
835	   | postalCode (2.5.4.17)                        | 90013              |
836	   +----------------------------------------------+--------------------+
837	   | street (2.5.4.9)                             | 532 S Olive St     |
838	   +----------------------------------------------+--------------------+
839	   | businessCategory (2.5.4.15)                  | Public Park        |
840	   | buildingName (0.9.2342.19200300.100.1.48)    | Pershing Square    |
841	   +----------------------------------------------+--------------------+

843	                 Table 3: Example X.500 Distinguished Name

845	   Table 3 has four RDNs.  The country and street RDNs are single-
846	   valued.  The second and fourth RDNs are multi-valued.

848	   The equivalent representations in CBOR diagnostic notation and CBOR
849	   are:

851	   6([{ 2.5.4.6: "US" },
852	      { 2.5.4.7: "Los Angeles", 2.5.4.8: "CA", 2.5.4.17: "90013" },
853	      { 2.5.4.9: "532 S Olive St" },
854	      { 2.5.4.15: "Public Park",
855	        0.9.2342.19200300.100.1.48: "Pershing Square" }])

857	        Figure 14: Distinguished Name, in CBOR Diagnostic Notation

859	   6([{ h'550406': "US" },
860	      { h'550407': "Los Angeles", h'550408': "CA", h'550411': "90013" },
861	      { h'550409': "532 S Olive St" },
862	      { h'55040f': "Public Park",
863	        h'0992268993f22c640130': "Pershing Square" }])

865	   Figure 15: Distinguished Name, in CBOR Diagnostic Notation (RFC 7049
866	                                   only)

868	   c6                                         # tag(6)
869	      84                                      # array(4)
870	         a1                                   # map(1)
871	            43 550406                         # 2.5.4.6 (4)
872	            62                                # text(2)
873	               5553                           # "US"
874	         a3                                   # map(3)
875	            43 550407                         # 2.5.4.7 (4)
876	            6b                                # text(11)
877	               4c6f7320416e67656c6573         # "Los Angeles"
878	            43 550408                         # 2.5.4.8 (4)
879	            62                                # text(2)
880	               4341                           # "CA"
881	            43 550411                         # 2.5.4.17 (4)
882	            65                                # text(5)
883	               3930303133                     # "90013"
884	         a1                                   # map(1)
885	            43 550409                         # 2.5.4.9 (4)
886	            6e                                # text(14)
887	               3533322053204f6c697665205374   # "532 S Olive St"
888	         a2                                   # map(2)
889	            43 55040f                         # 2.5.4.15 (4)
890	            6b                                # text(11)
891	               5075626c6963205061726b         # "Public Park"
892	            4a 0992268993f22c640130    # 0.9.2342.19200300.100.1.48 (11)
893	            6f                                # text(15)
894	               5065727368696e6720537175617265 # "Pershing Square"

896	            Figure 16: Distinguished Name, in CBOR (108 bytes)

898	   (This example encoding assumes that all attribute values are UTF-8
899	   strings, or can be represented as UTF-8 strings with no loss of
900	   information.)

902	   For reference, the [RFC4514] LDAP string encoding of such data would
903	   be:

905	   buildingName=Pershing Square+businessCategory=Public Park,
906	   street=532 S Olive St,l=Los Angeles+postalCode=90013+st=CA,c=US

908	    Figure 17: Distinguished Name, in LDAP String Encoding (121 bytes)

910	10.  Binary Internet Messages and MIME Entities

912	   Section 2.4.4.3 of [RFC7049] assigns tag 36 to "MIME messages
913	   (including all headers)" [RFC2045], and prescribes UTF-8 strings,
914	   without further elaboration.  In fact, MIME encompasses several
915	   different formats, and is not limited to UTF-8 strings.  This section
916	   updates tag 36.

918	10.1.  CBOR Byte String and Binary MIME

920	   Tag 36 is to be used with byte strings.  When the tagged item is a
921	   byte string, any octet can be used in the content.  Arbitrary octets
922	   are supported by [RFC2045] and can be supported in protocols such as
923	   SMTP using BINARYMIME [RFC3030].

925	   A conforming implementation that purports to process tag 36-tagged
926	   items MUST accept byte strings as well as UTF-8 strings.  Byte
927	   strings, rather than UTF-8 strings, SHOULD be considered the default.
928	   (While binary Content-Transfer-Encoding is not particularly common as
929	   of this writing, 8-bit encoding is, and it is foreseeable that many
930	   8-bit encoded messages will still have charsets other than UTF-8.)

932	10.2.  Internet Messages, MIME Messages, and MIME Entities

934	   Definitions: "MIME message" is not explicitly defined in [RFC2045],
935	   but a careful read suggests that a MIME message is: "either a
936	   (complete or "top-level") RFC 822 message being transferred on a
937	   network, or a message encapsulated in a body of type "message/rfc822"
938	   or "message/partial"," that also contains MIME header fields: the
939	   MIME-Version field MUST be present (Section 4 of [RFC2045]).
940	   Other MIME header fields such as Content-Type and Content-Transfer-
941	   Encoding are assumed to be their [RFC2045] default values, if not
942	   present in the data.

944	   When the contents have a From field (a type of "originator address
945	   field") and a Date field (the lone "origination date field")
946	   (Section 3.6 of [RFC5322]), the item is concluded to have a Content-
947	   Type of message/rfc822 or message/global, as appropriate, except as
948	   otherwise specified in this section.

950	   (TBD: Do we need a separate tag for a MIME entity?)  (Alternate
951	   proposal: When the tagged data does not include a MIME-Version field
952	   or other fields required by RFC822 (5322) (e.g., no From field), it
953	   is presumed to be a MIME entity, rather than a MIME message.
954	   Therefore, it has no top-level content-type: instead it is simply a
955	   "MIME entity", consisting of one element, whose Content-Type is the
956	   content of the Content-Type header field, if present, or the
957	   [RFC2045] default of "text/plain; charset=us-ascii", if absent.
958	   Content-Transfer-Encoding SHALL be assumed to be 8bit when the CBOR
959	   item is a UTF-8 string, and SHALL be assumed to be binary when the
960	   CBOR item is a byte string.  (Or should all be considered CTE:
961	   binary?)  And, when the tagged data has RFC822 required fields but no
962	   MIME-Version, shall we assume it's a MIME entity, or shall we assume
963	   it's an Internet message that does not conform to MIME?)

965	   Content that has no headers whatsoever is valid, and implementations
966	   that process tag 36 MUST permit this case: in such a case, the data
967	   starts with CRLF CRLF, followed by the body.  In such a case, the
968	   content is assumed to be a MIME entity of Content-Type "text/plain;
969	   charset=us-ascii", and not an RFC822 (RFC5322) Internet message.
970	   (TBD: Confirm.)

972	10.3.  Netnews, HTTP, and SIP Messages

974	   Other message types that are MIME-related are message/news, message/
975	   http, and message/sip.

977	   [RFC5537] specifies that message/news is deprecated (marked as
978	   obsolete) and that message/rfc822 SHOULD be used in its place;
979	   presumably this also extends to message/global over time.  Netnews
980	   Article Format [RFC5536] is a strict subset of Internet Message
981	   Format; it can be detected by the presence of the six mandatory
982	   header fields: Date, From, Message-ID, Newsgroups, Path, and Subject.
983	   (Newsgroups and Path fields are specific to Netnews.)

985	   message/http [RFC7230] is the media type for HTTP requests and
986	   responses.  It can be detected by analyzing the first line of the
987	   body, which is an HTTP Start Line (Section 3.1 of [RFC7230]): it does
988	   not conform to the syntax of an Internet Message Format header field.
989	   The optional parameter "msgtype" can be inferred from the Start Line.
990	   Implementers need to be aware that the default character encoding for
991	   message/http is ISO-8859-1, not UTF-8.  Therefore, implementations
992	   SHOULD NOT encode HTTP messages with CBOR UTF-8 strings.

994	   Similarly, message/sip [RFC3261] is the media type of SIP request and
995	   response messages.  It can be detected by analyzing the first line of
996	   the body, which is a SIP start-line (Section 7.1 of [RFC3261]): it
997	   does not conform to the syntax of an Internet Message Format header
998	   field.  The optional parameter can be inferred from the start-line.
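
   The detection rules of Sections 10.2 and 10.3 could be approximated
   as shown below.  This is a heuristic sketch under stated
   assumptions, not a conforming parser: the start-line patterns are
   simplified (methods are assumed to be upper-case), message/global
   is not distinguished from message/rfc822, and the function name and
   return strings are hypothetical.

```python
import re

# Simplified start-line patterns (assumption: upper-case method names).
HTTP_START = re.compile(
    rb"^(?:[A-Z][A-Z-]* \S+ HTTP/\d\.\d|HTTP/\d\.\d \d{3}(?: .*)?)$")
SIP_START = re.compile(
    rb"^(?:[A-Z][A-Z-]* \S+ SIP/\d\.\d|SIP/\d\.\d \d{3}(?: .*)?)$")
NETNEWS = {b"date", b"from", b"message-id", b"newsgroups", b"path",
           b"subject"}


def classify(data):
    """Guess what kind of message or entity a tag 36 item contains."""
    first_line = data.partition(b"\r\n")[0]
    if HTTP_START.match(first_line):
        return "message/http"
    if SIP_START.match(first_line):
        return "message/sip"
    # Content that starts with CRLF has no header fields at all.
    block = b"" if data.startswith(b"\r\n") else data.partition(b"\r\n\r\n")[0]
    names = set()
    for line in block.split(b"\r\n"):
        if line[:1] in (b" ", b"\t") or b":" not in line:
            continue                     # folded continuation or not a field
        names.add(line.split(b":", 1)[0].strip().lower())
    if {b"from", b"date"} <= names:
        if NETNEWS <= names:
            return "message/rfc822 (Netnews article)"
        return "message/rfc822"
    return "MIME entity"


print(classify(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
print(classify(b"\r\n\r\nHello World!\r\n"))
```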

1000	10.4.  Other Messages

1002	   The CBOR byte or UTF-8 string MAY contain other types of messages.
1003	   An implementation MAY send such a message as a MIME entity with the
1004	   Content-Type field appropriately set, or alternatively, MAY send the
1005	   message at the top-level directly.  However, if a purported message
1006	   type is ambiguous with a message/rfc822 (or message/global) message,
1007	   a receiver SHALL treat the message as message/rfc822 (or message/
1008	   global).  If a purported message type is ambiguous with a MIME entity
1009	   (and unambiguously not message/rfc822 or message/global), a receiver
1010	   SHALL treat the message as a MIME entity.

1012	11.  Applications and Examples of Messages and Entities

1014	   Tag 36 is the RECOMMENDED way to convey data with MIME-related
1015	   metadata, including messages (which may or may not actually be MIME-
1016	   enabled) and MIME entities.

1018	   Example 1: A legacy RFC822 message is encoded as a UTF-8 string or
1019	   byte string with tag 36.  The contents have From, To, Date, and
1020	   Subject header fields, two CRLFs, and a single line "Hello World!",
1021	   terminated with a CRLF.

1023	   Example 2a: A [RFC5280] certificate is encoded as a byte string with
1024	   tag 36.  The contents consist of "Content-Type: application/
1025	   pkix-cert", two CRLFs, and the DER encoding of the certificate.  (The
1026	   "Content-Transfer-Encoding: binary" header is not necessary.)

1028	   Example 2b: A [RFC5280] certificate is encoded as a UTF-8 string or
1029	   byte string with tag 36.  The contents consist of "Content-
1030	   Type: application/pkix-cert", a CRLF, "Content-Transfer-Encoding:
1031	   base64", two CRLFs, and the base64 encoding of the DER encoding of
1032	   the certificate, conforming to Section 6.8 of [RFC2045].  In
1033	   particular, base64 lines are limited to 76 characters, separated by
1034	   CRLF, and the final line is supposed to end with CRLF.  Needless to
1035	   say, this is not nearly as efficient as Example 2a.
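
   The difference between Examples 2a and 2b can be made concrete with
   a short Python sketch.  The certificate octets below are an
   arbitrary placeholder, not a real DER encoding, and the function
   names are hypothetical.

```python
import base64

HEADER = b"Content-Type: application/pkix-cert\r\n"


def entity_binary(der):
    """Example 2a: header field, blank line, raw DER octets."""
    return HEADER + b"\r\n" + der


def entity_base64(der):
    """Example 2b: base64 body in lines of at most 76 characters."""
    b64 = base64.b64encode(der)
    lines = [b64[i:i + 76] for i in range(0, len(b64), 76)]
    body = b"".join(line + b"\r\n" for line in lines)
    return HEADER + b"Content-Transfer-Encoding: base64\r\n\r\n" + body


placeholder = bytes(range(256)) * 4     # stand-in for real DER octets
print(len(entity_binary(placeholder)), len(entity_base64(placeholder)))
```

   The second length is larger because base64 expands the data by about
   one third and adds a CRLF to every line.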

1037	12.  X.690 Series Tags

1039	   [[NB: Carsten probably won't like this.  Plan on removing this
1040	   section.  It is mainly provided to contrast with Section 10.]]

1042	   It is foreseeable that CBOR applications will need to send and
1043	   receive ASN.1 data, for example, for legacy or security applications.
1044	   While a native representation in CBOR is preferred, preserving the
1045	   data in an ASN.1 encoding may be necessary, for example, to preserve
1046	   cryptographic verification.  A tag <<X>> is allocated for this
1047	   purpose.

1049	   When the tagged item is a byte string, the byte string contents are
1050	   encoded according to [X.690], i.e., BER, CER, or DER.  CBOR
1051	   implementations are not required to validate conformance of the
1052	   contained data to [X.690].

1054	   When the tagged item is an array with 3 items:

1056	   1.  The first item SHALL be an OID (with tag <<O>> omitted; it SHALL
1057	       NOT be a relative OID), indicating the ASN.1 module containing
1058	       the type of the PDU.  [[NB: this is a good example of a non-
1059	       trivial structure in which an element is well-defined to be an
1060	       OID, which has a tag.  Is the CBOR philosophy to tag the item, or
1061	       omit the tag on the item, when the item's semantics are already
1062	       fixed by the outer tag?  Similar situations can apply to tag 32
1063	       (URI), etc.]]

1065	   2.  The second item SHALL be a UTF-8 string indicating the ASN.1
1066	       value's _type reference name_ (Clause 3.8.88 of [X.680])
1067	       conforming to the "typereference" production (Clause 12.2 of
1068	       [X.680]).

1070	   3.  The third item SHALL be a byte string, whose contents are encoded
1071	       per the prior paragraph.

1073	   (TBD: Use of tagged UTF-8 string is reserved for ASN.1 textual
1074	   formats such as XER and ASN.1 value notation?  Probably not
1075	   necessary.  Just omit.)

1077	   Implementation note: DER-encoded items are always definite-length, so
1078	   there is very little reason to use CBOR byte string indefinite
1079	   encoding when encoding such DER-encoded items.

1081	   Example: A [RFC5280] certificate can be encoded:

1083	   1.  as a byte string with tag <<X>>, or

1085	   2.  as an array with tag <<X>>, with three elements:

1087	       (1)  a byte string "h'2B 06 01 05 05 07 00 12'", which is the BER
1088	            encoding of 1.3.6.1.5.5.7.0.18,

1090	       (2)  a UTF-8 string "Certificate", and

1092	       (3)  a byte string containing the DER encoding of the
1093	            certificate.

1095	13.  Regular Expression Clarification

1097	   (TODO: better specify conformance to actual regular expression
1098	   standards with tag 35.  PCRE and JavaScript/ECMAScript regular
1099	   expressions are very different; [RFC7049] is not specific enough
1100	   about this.)

1102	14.  Set and Multiset Technique

1104	   CBOR has no native type for a set, which is an arbitrary unordered
1105	   collection of items.  The following technique is RECOMMENDED to
1106	   express set and multiset semantics concisely in native CBOR data.

1108	   In computer science, a _set_ is a collection of distinct items; there
1109	   is no ordering to the items.  Thus, implementations can optimize set
1110	   storage in many ways that are not available with ordered elements in
1111	   arrays.  Sets can be stored in hashtables, bit fields, trees, or
1112	   other abstract data types.

1114	   In computer science, a _multiset_ allows multiple instances of a
1115	   set's elements.  Put another way, each distinct item has a
1116	   cardinality property indicating the number of these items in the
1117	   multiset.

1119	   To store items in a set or multiset, it is RECOMMENDED to store the
1120	   CBOR items as keys in a map; the values SHALL all be positive
1121	   integers (major type 0, value/additional information greater than or
1122	   equal to 1).  In the special case of a set, the values SHALL be the
1123	   integer 1.  This technique has no special tag associated with it.  As
1124	   with arrays that schemas classify as "records" (i.e., arrays with
1125	   positionally defined elements), schemas are likewise free to classify
1126	   maps as sets in particular instances.

1128	15.  Fruits Basket Example

1130	   Consider a basket of fruits.  The basket can contain any number of
1131	   fruits; each fruit of the same species is considered identical.  This
1132	   basket has two apples, four bananas, six pears, and one pineapple:

1134	   {"\u{1F34E}": 2, "\u{1F34C}": 4,
1135	    "\u{1F350}": 6, "\u{1F34D}": 1}

1137	           Figure 18: Fruits Basket in CBOR Diagnostic Notation

1139	   a4                       # map(4)
1140	      64                    # text(4)
1141	         f09f8d8e           # "\u{1F34E}"
1142	      02                    # unsigned(2)
1143	      64                    # text(4)
1144	         f09f8d8c           # "\u{1F34C}"
1145	      04                    # unsigned(4)
1146	      64                    # text(4)
1147	         f09f8d90           # "\u{1F350}"
1148	      06                    # unsigned(6)
1149	      64                    # text(4)
1150	         f09f8d8d           # "\u{1F34D}"
1151	      01                    # unsigned(1)

1153	                Figure 19: Fruits Basket in CBOR (25 bytes)
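
   The multiset technique maps naturally onto a counting dictionary.
   The following sketch encodes the basket with a deliberately minimal
   encoder; it assumes text keys and counts small enough for one-byte
   CBOR heads, emits keys in insertion order, and uses a hypothetical
   function name.

```python
from collections import Counter


def encode_multiset(counts):
    """Encode {text item: cardinality} as a CBOR map (short heads only)."""
    if len(counts) > 23:
        raise ValueError("this sketch handles at most 23 distinct items")
    out = bytearray([0xA0 | len(counts)])          # map(n)
    for item, count in counts.items():
        text = item.encode("utf-8")
        if len(text) > 23 or not 1 <= count <= 23:
            raise ValueError("value out of range for this sketch")
        out += bytes([0x60 | len(text)]) + text    # text(n) key
        out.append(count)                          # unsigned cardinality
    return bytes(out)


basket = Counter({"\U0001F34E": 2, "\U0001F34C": 4,
                  "\U0001F350": 6, "\U0001F34D": 1})
encoded = encode_multiset(basket)
print(len(encoded), encoded.hex())
```

   For a plain set, every cardinality would be the integer 1.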

1155	   [[TODO: Consider a Merkle Tree example: set of sets of sets of sets
1156	   of things. ???]]

1158	16.  IANA Considerations

1160	   (This section to be edited by the RFC editor.)

1162	16.1.  CBOR Tags

1164	   IANA is requested to assign the CBOR tags in Table 4, with the
1165	   present document as the specification reference.

1167	   +----------+-------------+------------------------------------------+
1168	   | Tag      | Data Item   |                                Semantics |
1169	   +----------+-------------+------------------------------------------+
1170	   | 6<<TBD>> | multiple    |         object identifier (BER encoding) |
1171	   | 7<<TBD>> | multiple    |          relative object identifier (BER |
1172	   |          |             |                                encoding) |
1173	   +----------+-------------+------------------------------------------+

1175	                       Table 4: Values for New Tags

1177	16.2.  Discussion

1179	   (This subsection to be removed by the RFC editor.)

1181	   The space for single-byte tags in CBOR (0..23) is severely limited.
1182	   It is not clear that the benefits of encoding OIDs/relative OIDs with
1183	   one less byte per instance outweigh the consumption of two values in
1184	   this code point space.

1186	   Procedurally, this space is also reserved for standards action.

1188	   An alternative would be to go for the specification required space,
1189	   e.g. tag number 40 for <<O>> and tag number 41 for <<R>>.  As an
1190	   example this would change Figure 2 into:

1192	   d8 28                            # tag(40)
1193	      49                            # bytes(9)
1194	         60 86 48 01 65 03 04 02 01 #

1196	     Figure 20: SHA-256 OID in CBOR (using specification required tag)
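
   The one-byte cost per instance follows directly from how CBOR
   encodes the head of a data item.  The sketch below is illustrative
   only; tag numbers 6 and 40 are the provisional and alternative
   values discussed above, and the function names are hypothetical.

```python
def cbor_head(major, argument):
    """Encode a CBOR head: major type plus unsigned argument."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        return bytes([major << 5 | argument])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | extra]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def tagged_bytes(tag, body):
    """Tag (major type 6) followed by a byte string (major type 2)."""
    return cbor_head(6, tag) + cbor_head(2, len(body)) + body


sha256 = bytes.fromhex("608648016503040201")
for tag in (6, 40):
    item = tagged_bytes(tag, sha256)
    print(tag, len(item), item.hex())
```

   Tag numbers below 24 fit in the initial byte, whereas tag 40 needs
   one additional byte.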

1198	16.3.  Pre-Existing Tags

1200	   (TODO: complete.)  IANA is requested to modify the registrations for
1201	   the following CBOR tags:

1203	            +-----+-------------+----------------------------+
1204	            | Tag | Data Item   |                  Semantics |
1205	            +-----+-------------+----------------------------+
1206	            | 35  | <<TBD>>     | regular expression <<TBD>> |
1207	            | 36  | multiple    |     message or MIME entity |
1208	            +-----+-------------+----------------------------+

1210	                     Table 5: Values for Existing Tags

1212	16.4.  New Tags

1214	   (TODO: complete.)

1216	17.  Security Considerations

1218	   The security considerations of [RFC7049] apply.

1220	   The encodings in Clauses 8.19 and 8.20 of [X.690] are extremely
1221	   compact and unambiguous, but MUST be followed precisely to avoid
1222	   security pitfalls.  In particular, the requirements set out in
1223	   Section 2.1 of this document need to be followed; otherwise, an
1224	   attacker may be able to subvert a checking process by submitting
1225	   alternative representations that are later taken as the original (or
1226	   even something else entirely) by another decoder supposed to be
1227	   protected by the checking process.

1229	   OIDs and relative OIDs can always be treated as opaque byte strings.
1230	   Actually understanding the structure that was used for generating
1231	   them is not necessary, and, except for checking the structure
1232	   requirements, it is strongly NOT RECOMMENDED to perform any
1233	   processing of this kind (e.g., converting into dotted notation and
1234	   back) unless absolutely necessary.  If the OIDs are translated into
1235	   other representations, the usual security considerations for non-
1236	   trivial representation conversions apply; the integers of the sub-
1237	   identifiers need to be handled as unlimited-range integers (cf.
1238	   Figure 4).

1240	17.1.  Conversions Between BER and Dotted Decimal Notation

1242	   [PKILCAKE] uncovers exploit vectors for the illegal values above, as
1243	   well as for cases in which conversion to or from the dotted decimal
1244	   notation goes awry.  Neither [X.660] nor [X.680] places an upper bound
1245	   on the range of unsigned integer values for an arc; the integers are
1246	   arbitrarily valued.  An implementation SHOULD NOT attempt to convert
1247	   each component using a fixed-size accumulator, as an attacker will
1248	   certainly be able to cause the accumulator to overflow.  Compact and
1249	   efficient techniques for such conversions, such as the double dabble
1250	   algorithm [DOUBLEDABBLE], are well known in the art; their
1251	   application to this field is left as an exercise to the reader.
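
   In languages with arbitrary-precision integers, the overflow concern
   does not arise.  The following Python sketch decodes the BER
   contents octets into arcs without a fixed-size accumulator.  It is a
   hedged illustration, not a vetted implementation: the structural
   checks shown (no leading 0x80 octet in a sub-identifier, no
   truncated final sub-identifier) are this example's reading of the
   encoding rules, and the function names are hypothetical.

```python
def ber_to_arcs(body, relative=False):
    """Decode OID (or relative OID) contents octets into a list of arcs."""
    arcs = []
    value = 0
    in_progress = False
    for octet in body:
        if not in_progress and octet == 0x80:
            raise ValueError("sub-identifier starts with a padding octet")
        value = (value << 7) | (octet & 0x7F)   # Python ints do not overflow
        in_progress = bool(octet & 0x80)
        if not in_progress:
            arcs.append(value)
            value = 0
    if in_progress:
        raise ValueError("last sub-identifier is truncated")
    if relative:
        return arcs
    if not arcs:
        raise ValueError("an OID needs at least one sub-identifier")
    # The first sub-identifier combines the first two arcs as 40 * X + Y.
    first = arcs[0]
    if first < 40:
        leading = [0, first]
    elif first < 80:
        leading = [1, first - 40]
    else:
        leading = [2, first - 80]
    return leading + arcs[1:]


def dotted(arcs):
    return ".".join(str(a) for a in arcs)


print(dotted(ber_to_arcs(bytes.fromhex("608648016503040201"))))
print(dotted(ber_to_arcs(bytes([0xFF] * 20 + [0x7F]), relative=True)))
```

   The second call decodes a single 147-bit arc, which would overflow
   any 64-bit or 128-bit accumulator.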

1253	18.  References

1255	18.1.  Normative References

1257	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
1258	              Extensions (MIME) Part One: Format of Internet Message
1259	              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
1260	              <http://www.rfc-editor.org/info/rfc2045>.

1262	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1263	              Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/
1264	              RFC2119, March 1997,
1265	              <http://www.rfc-editor.org/info/rfc2119>.

1267	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1268	              A., Peterson, J., Sparks, R., Handley, M., and E.
1269	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1270	              DOI 10.17487/RFC3261, June 2002,
1271	              <http://www.rfc-editor.org/info/rfc3261>.

1273	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322, DOI
1274	              10.17487/RFC5322, October 2008,
1275	              <http://www.rfc-editor.org/info/rfc5322>.

1277	   [RFC5536]  Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews
1278	              Article Format", RFC 5536, DOI 10.17487/RFC5536, November
1279	              2009, <http://www.rfc-editor.org/info/rfc5536>.

1281	   [RFC5537]  Allbery, R., Ed. and C. Lindsey, "Netnews Architecture and
1282	              Protocols", RFC 5537, DOI 10.17487/RFC5537, November 2009,
1283	              <http://www.rfc-editor.org/info/rfc5537>.

1285	   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
1286	              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
1287	              October 2013, <http://www.rfc-editor.org/info/rfc7049>.

1289	   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
1290	              Protocol (HTTP/1.1): Message Syntax and Routing", RFC
1291	              7230, DOI 10.17487/RFC7230, June 2014,
1292	              <http://www.rfc-editor.org/info/rfc7230>.

1294	   [X.660]    International Telecommunications Union, "Information
1295	              technology -- Procedures for the operation of object
1296	              identifier registration authorities: General procedures
1297	              and top arcs of the international object identifier tree",
1298	              ITU-T Recommendation X.660, July 2011.

1300	   [X.680]    International Telecommunications Union, "Information
1301	              technology -- Abstract Syntax Notation One (ASN.1):
1302	              Specification of basic notation", ITU-T Recommendation
1303	              X.680, August 2015.

1305	   [X.690]    International Telecommunications Union, "Information
1306	              technology -- ASN.1 encoding rules: Specification of Basic
1307	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
1308	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
1309	              X.690, August 2015.

1311	18.2.  Informative References

1313	   [DOUBLEDABBLE]
1314	              Gao, S., Al-Khalili, D., and N. Chabini, "An improved BCD
1315	              adder using 6-LUT FPGAs", IEEE 10th International New
1316	              Circuits and Systems Conference (NEWCAS 2012), pp. 13-16,
1317	              DOI: 10.1109/NEWCAS.2012.6328944, June 2012.

1319	   [OID-INFO]
1320	              Orange SA, "OID Repository", 2016,
1321	              <http://www.oid-info.com/>.

1323	   [PKILCAKE]
1324	              Kaminsky, D., Patterson, M., and L. Sassaman, "PKI Layer
1325	              Cake: New Collision Attacks Against the Global X.509
1326	              Infrastructure", FC 2010, Lecture Notes in Computer
1327	              Science 6052 289-303, DOI: 10.1007/978-3-642-14577-3_22,
1328	              January 2010, <http://dl.acm.org/citation.cfm?id=2163593>.

1330	   [RFC2506]  Holtman, K., Mutz, A., and T. Hardie, "Media Feature Tag
1331	              Registration Procedure", BCP 31, RFC 2506, DOI 10.17487/
1332	              RFC2506, March 1999,
1333	              <http://www.rfc-editor.org/info/rfc2506>.

1335	   [RFC3030]  Vaudreuil, G., "SMTP Service Extensions for Transmission
1336	              of Large and Binary MIME Messages", RFC 3030, DOI
1337	              10.17487/RFC3030, December 2000,
1338	              <http://www.rfc-editor.org/info/rfc3030>.

1340	   [RFC4514]  Zeilenga, K., Ed., "Lightweight Directory Access Protocol
1341	              (LDAP): String Representation of Distinguished Names", RFC
1342	              4514, DOI 10.17487/RFC4514, June 2006,
1343	              <http://www.rfc-editor.org/info/rfc4514>.

1345	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
1346	              Housley, R., and W. Polk, "Internet X.509 Public Key
1347	              Infrastructure Certificate and Certificate Revocation List
1348	              (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008,
1349	              <http://www.rfc-editor.org/info/rfc5280>.

1351	   [RFC6256]  Eddy, W. and E. Davies, "Using Self-Delimiting Numeric
1352	              Values in Protocols", RFC 6256, DOI 10.17487/RFC6256, May
1353	              2011, <http://www.rfc-editor.org/info/rfc6256>.

1355	   [RFC7388]  Schoenwaelder, J., Sehgal, A., Tsou, T., and C. Zhou,
1356	              "Definition of Managed Objects for IPv6 over Low-Power
1357	              Wireless Personal Area Networks (6LoWPANs)", RFC 7388, DOI
1358	              10.17487/RFC7388, October 2014,
1359	              <http://www.rfc-editor.org/info/rfc7388>.

1361	   [X.672]    International Telecommunications Union, "Information
1362	              technology -- Open systems interconnection -- Object
1363	              identifier resolution system", ITU-T Recommendation X.672,
1364	              August 2010.

1366	   [X.681]    International Telecommunications Union, "Information
1367	              technology -- Abstract Syntax Notation One (ASN.1):
1368	              Information object specification", ITU-T Recommendation
1369	              X.681, August 2015.

1371	Appendix A.  Changes from -03 to -04

1373	   Changes occurred based on limited feedback, mainly centered around
1374	   the abstract and introduction, rather than substantive technical
1375	   changes.  These changes include:

1377	   o  Changed the title so that it is about tags and techniques.

1379	   o  Rewrote the abstract to describe the content more accurately, and
1380	      to point out that no changes to the wire protocol are being
1381	      proposed.

1383	   o  Removed "ASN.1" from "object identifiers", as OIDs are independent
1384	      of ASN.1.

1386	   o  Rewrote the introduction to be more about the present text.

1388	   o  Proposed a concise OID arc.

1390	   o  Provided binary regular expression forms for OID validation.

1392	   o  Updated IANA registration tables.

1394	Appendix B.  Changes from -02 to -03

1396	   Many significant changes occurred in this version.  These changes
1397	   include:

1399	   o  Expanded the draft scope to be a comprehensive CBOR update.

1401	   o  Added OID-related sections: OID Enumerations, OID Maps and Arrays,
1402	      and Applications and Examples of OIDs.

1404	   o  Added Tag 36 update (binary MIME, better definitions).

1406	   o  Added stub/experimental sections for X.690 Series Tags (tag <<X>>)
1407	      and Regular Expressions (tag 35).

1409	   o  Added technique for representing sets and multisets.

1411	   o  Added references and fixed typos.

1413	Authors' Addresses

1415	   Carsten Bormann
1416	   Universitaet Bremen TZI
1417	   Postfach 330440
1418	   Bremen  D-28359
1419	   Germany

1421	   Phone: +49-421-218-63921
1422	   Email: cabo@tzi.org

1423	   Sean Leonard
1424	   Penango, Inc.
1425	   5900 Wilshire Boulevard
1426	   21st Floor
1427	   Los Angeles, CA  90036
1428	   USA

1430	   Email: dev+ietf@seantek.com
1431	   URI:   http://www.penango.com/