idnits 2.17.1

draft-ietf-cbor-7049bis-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 23, 2018) is 2012 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '2' on line 2379

  -- Looks like a reference, but probably isn't: '3' on line 2379

  -- Looks like a reference, but probably isn't: '4' on line 2377

  -- Looks like a reference, but probably isn't: '5' on line 2377

  -- Looks like a reference, but probably isn't: '100' on line 1577

  == Missing Reference: '-1' is mentioned on line 1573, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 2656

  == Missing Reference: 'RFCthis' is mentioned on line 2009, but not defined

  == Missing Reference: 'TM' is mentioned on line 2196, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 2672

  == Missing Reference: 'RFC4267' is mentioned on line 2807, but not defined

  == Missing Reference: 'CNN-TERMS' is mentioned on line 2809, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECMA262'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IEEE.754.2008'

  -- Obsolete informational reference (is this intentional?): RFC 7049
     (Obsoleted by RFC 8949)


     Summary: 0 errors (**), 0 flaws (~~), 7 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Intended status: Standards Track                              P. Hoffman
5	Expires: April 26, 2019                                            ICANN
6	                                                        October 23, 2018

8	              Concise Binary Object Representation (CBOR)
9	                       draft-ietf-cbor-7049bis-04

11	Abstract

13	   The Concise Binary Object Representation (CBOR) is a data format
14	   whose design goals include the possibility of extremely small code
15	   size, fairly small message size, and extensibility without the need
16	   for version negotiation.  These design goals make it different from
17	   earlier binary serializations such as ASN.1 and MessagePack.

19	Contributing

21	   This document is being worked on in the CBOR Working Group.  Please
22	   contribute on the mailing list there, or in the GitHub repository for
23	   this draft: https://github.com/cbor-wg/CBORbis

25	   The charter for the CBOR Working Group says that the WG will update
26	   RFC 7049 to fix verified errata.  Security issues and clarifications
27	   may be addressed, but changes to this document will ensure backward
28	   compatibility for popular deployed codebases.  This document will be
29	   targeted at becoming an Internet Standard.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at https://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on April 26, 2019.

48	Copyright Notice

50	   Copyright (c) 2018 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (https://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	     1.1.  Objectives
67	     1.2.  Terminology
68	   2.  CBOR Data Models
69	     2.1.  Extended Generic Data Models
70	     2.2.  Specific Data Models
71	   3.  Specification of the CBOR Encoding
72	     3.1.  Major Types
73	     3.2.  Indefinite Lengths for Some Major Types
74	       3.2.1.  Indefinite-Length Arrays and Maps
75	       3.2.2.  Indefinite-Length Byte Strings and Text Strings
76	     3.3.  Floating-Point Numbers and Values with No Content
77	     3.4.  Optional Tagging of Items
78	       3.4.1.  Date and Time
79	       3.4.2.  Standard Date/Time String
80	       3.4.3.  Epoch-based Date/Time
81	       3.4.4.  Bignums
82	       3.4.5.  Decimal Fractions and Bigfloats
83	       3.4.6.  Content Hints
84	         3.4.6.1.  Encoded CBOR Data Item
85	         3.4.6.2.  Expected Later Encoding for CBOR-to-JSON
86	                   Converters
87	         3.4.6.3.  Encoded Text
88	       3.4.7.  Self-Describe CBOR
89	   4.  Creating CBOR-Based Protocols
90	     4.1.  CBOR in Streaming Applications
91	     4.2.  Generic Encoders and Decoders
92	     4.3.  Syntax Errors
93	       4.3.1.  Incomplete CBOR Data Items
94	       4.3.2.  Malformed Indefinite-Length Items
95	       4.3.3.  Unknown Additional Information Values
97	     4.4.  Other Decoding Errors
98	     4.5.  Handling Unknown Simple Values and Tags
99	     4.6.  Numbers
100	     4.7.  Specifying Keys for Maps
101	       4.7.1.  Equivalence of Keys
102	     4.8.  Undefined Values
103	     4.9.  Preferred Serialization
104	     4.10. Canonical CBOR
105	       4.10.1.  Length-first map key ordering
106	     4.11. Strict Mode
107	   5.  Converting Data between CBOR and JSON
108	     5.1.  Converting from CBOR to JSON
109	     5.2.  Converting from JSON to CBOR
110	   6.  Future Evolution of CBOR
111	     6.1.  Extension Points
112	     6.2.  Curating the Additional Information Space
113	   7.  Diagnostic Notation
114	     7.1.  Encoding Indicators
115	   8.  IANA Considerations
116	     8.1.  Simple Values Registry
117	     8.2.  Tags Registry
118	     8.3.  Media Type ("MIME Type")
119	     8.4.  CoAP Content-Format
120	     8.5.  The +cbor Structured Syntax Suffix Registration
121	   9.  Security Considerations
122	   10. Acknowledgements
123	   11. References
124	     11.1.  Normative References
125	     11.2.  Informative References
126	   Appendix A.  Examples
127	   Appendix B.  Jump Table
128	   Appendix C.  Pseudocode
129	   Appendix D.  Half-Precision
130	   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
131	                Objectives
132	     E.1.  ASN.1 DER, BER, and PER
133	     E.2.  MessagePack
134	     E.3.  BSON
135	     E.4.  MSDTP: RFC 713
136	     E.5.  Conciseness on the Wire
137	   Appendix F.  Changes from RFC 7049
138	   Authors' Addresses

140	1.  Introduction

142	   There are hundreds of standardized formats for binary representation
143	   of structured data (also known as binary serialization formats).  Of
144	   those, some are for specific domains of information, while others are
145	   generalized for arbitrary data.  In the IETF, probably the best-known
146	   formats in the latter category are ASN.1's BER and DER [ASN.1].

148	   The format defined here follows some specific design goals that are
149	   not well met by current formats.  The underlying data model is an
150	   extended version of the JSON data model [RFC8259].  It is important
151	   to note that this is not a proposal that the grammar in RFC 8259 be
152	   extended in general, since doing so would cause a significant
153	   backwards incompatibility with already deployed JSON documents.
154	   Instead, this document simply defines its own data model that starts
155	   from JSON.

157	   Appendix E lists some existing binary formats and discusses how well
158	   they do or do not fit the design objectives of the Concise Binary
159	   Object Representation (CBOR).

161	1.1.  Objectives

163	   The objectives of CBOR, roughly in decreasing order of importance,
164	   are:

166	   1.  The representation must be able to unambiguously encode most
167	       common data formats used in Internet standards.

169	       *  It must represent a reasonable set of basic data types and
170	          structures using binary encoding.  "Reasonable" here is
171	          largely influenced by the capabilities of JSON, with the major
172	          addition of binary byte strings.  The structures supported are
173	          limited to arrays and trees; loops and lattice-style graphs
174	          are not supported.

176	       *  There is no requirement that all data formats be uniquely
177	          encoded; that is, it is acceptable that the number "7" might
178	          be encoded in multiple different ways.

180	   2.  The code for an encoder or decoder must be able to be compact in
181	       order to support systems with very limited memory, processor
182	       power, and instruction sets.

184	       *  An encoder and a decoder need to be implementable in a very
185	          small amount of code (for example, in class 1 constrained
186	          nodes as defined in [RFC7228]).

188	       *  The format should use contemporary machine representations of
189	          data (for example, not requiring binary-to-decimal
190	          conversion).

192	   3.  Data must be able to be decoded without a schema description.

194	       *  Similar to JSON, encoded data should be self-describing so
195	          that a generic decoder can be written.

197	   4.  The serialization must be reasonably compact, but data
198	       compactness is secondary to code compactness for the encoder and
199	       decoder.

201	       *  "Reasonable" here is bounded above in size by JSON, and
202	          bounded below by the limit on implementation complexity.
203	          Using either general compression schemes or extensive
204	          bit-fiddling violates the complexity goals.

206	   5.  The format must be applicable to both constrained nodes and high-
207	       volume applications.

209	       *  This means it must be reasonably frugal in CPU usage for both
210	          encoding and decoding.  This is relevant both for constrained
211	          nodes and for potential usage in applications with a very high
212	          volume of data.

214	   6.  The format must support all JSON data types for conversion to and
215	       from JSON.

217	       *  It must support a reasonable level of conversion as long as
218	          the data represented is within the capabilities of JSON.  It
219	          must be possible to define a unidirectional mapping towards
220	          JSON for all types of data.

222	   7.  The format must be extensible, and the extended data must be
223	       decodable by earlier decoders.

225	       *  The format is designed for decades of use.

227	       *  The format must support a form of extensibility that allows
228	          fallback so that a decoder that does not understand an
229	          extension can still decode the message.

231	       *  The format must be able to be extended in the future by later
232	          IETF standards.

234	1.2.  Terminology

236	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
237	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
238	   document are to be interpreted as described in RFC 2119, BCP 14
239	   [RFC2119] and indicate requirement levels for compliant CBOR
240	   implementations.

242	   The term "byte" is used in its now-customary sense as a synonym for
243	   "octet".  All multi-byte values are encoded in network byte order
244	   (that is, most significant byte first, also known as "big-endian").

246	   This specification makes use of the following terminology:

248	   Data item:  A single piece of CBOR data.  The structure of a data
249	      item may contain zero, one, or more nested data items.  The term
250	      is used both for the data item in representation format and for
251	      the abstract idea that can be derived from that by a decoder.

253	   Decoder:  A process that decodes a CBOR data item and makes it
254	      available to an application.  Formally speaking, a decoder
255	      contains a parser to break up the input using the syntax rules of
256	      CBOR, as well as a semantic processor to prepare the data in a
257	      form suitable to the application.

259	   Encoder:  A process that generates the representation format of a
260	      CBOR data item from application information.

262	   Data Stream:  A sequence of zero or more data items, not further
263	      assembled into a larger containing data item.  The independent
264	      data items that make up a data stream are sometimes also referred
265	      to as "top-level data items".

267	   Well-formed:  A data item that follows the syntactic structure of
268	      CBOR.  A well-formed data item uses the initial bytes and the byte
269	      strings and/or data items that are implied by their values as
270	      defined in CBOR and is not followed by extraneous data.

272	   Valid:  A data item that is well-formed and also follows the semantic
273	      restrictions that apply to CBOR data items.

275	   Stream decoder:  A process that decodes a data stream and makes each
276	      of the data items in the sequence available to an application as
277	      they are received.

279	   Where bit arithmetic or data types are explained, this document uses
280	   the notation familiar from the programming language C, except that
281	   "**" denotes exponentiation.  Similar to the "0x" notation for
282	   hexadecimal numbers, numbers in binary notation are prefixed with
283	   "0b".  Underscores can be added to such a number solely for
284	   readability, so 0b00100001 (0x21) might be written 0b001_00001 to
285	   emphasize the desired interpretation of the bits in the byte; in this
286	   case, it is split into three bits and five bits.  Encoded CBOR data
287	   items are sometimes given in the "0x" or "0b" notation; these values
288	   are first interpreted as numbers as in C and are then interpreted as
289	   byte strings in network byte order, including any leading zero bytes
290	   expressed in the notation.
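
   As a small illustration of this notation, the sketch below uses
   Python, whose integer literals happen to accept the same "0x", "0b",
   and underscore forms.  It checks the example byte from the paragraph
   above and converts a "0x" literal to a byte string in network byte
   order.  The helper name notation_to_bytes is hypothetical and not
   part of this specification; it assumes an even number of hexadecimal
   digits.

```python
# The byte from the text, written with an underscore to separate the
# three high-order bits from the five low-order bits.
value = 0b001_00001

# Underscores are purely cosmetic: all three spellings are the same number.
assert value == 0b00100001 == 0x21

# The two bit fields that the underscore is meant to emphasize.
high_three = value >> 5       # 0b001
low_five = value & 0b11111    # 0b00001


def notation_to_bytes(literal: str) -> bytes:
    """Interpret a "0x..." literal as a byte string in network byte order.

    The literal is taken as text, not as an already-parsed integer, so
    that leading zero bytes written in the notation are preserved.
    Assumption of this sketch: an even number of hexadecimal digits.
    """
    digits = literal[2:].replace("_", "")
    return bytes.fromhex(digits)
```

   With this helper, "0x01f4" becomes the two bytes 0x01 and 0xf4, and
   "0x0001" keeps its leading zero byte.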

292	2.  CBOR Data Models

294	   CBOR is explicit about its generic data model, which defines the set
295	   of all data items that can be represented in CBOR.  Its basic generic
296	   data model is extensible by the registration of simple type values
297	   and tags.  Applications can then subset the resulting extended
298	   generic data model to build their specific data models.

300	   Within environments that can represent the data items in the generic
301	   data model, generic CBOR encoders and decoders can be implemented
302	   (which usually involves defining additional implementation data types
303	   for those data items that do not already have a natural
304	   representation in the environment).  The ability to provide generic
305	   encoders and decoders is an explicit design goal of CBOR; however,
306	   many applications will provide their own application-specific
307	   encoders and/or decoders.

309	   In the basic (un-extended) generic data model, a data item is one of:

311	   o  an integer in the range -2**64..2**64-1 inclusive

313	   o  a simple value, identified by a number between 0 and 255, but
314	      distinct from that number

316	   o  a floating-point value, distinct from an integer, out of the set
317	      representable by IEEE 754 binary64 (including non-finites)
318	      [IEEE.754.2008]

320	   o  a sequence of zero or more bytes ("byte string")

322	   o  a sequence of zero or more Unicode code points ("text string")

324	   o  a sequence of zero or more data items ("array")

326	   o  a mapping (mathematical function) from zero or more data items
327	      ("keys"), each to a data item (its "value") ("map")

329	   o  a tagged data item, comprising a tag (an integer in the range
330	      0..2**64-1) and a value (a data item)

332	   Note that integer and floating-point values are distinct in this
333	   model, even if they have the same numeric value.

335	   Also note that serialization variants are not visible at the generic
336	   data model level.  These include the number of bytes of the encoded
337	   floating-point value and the choice of one of the ways in which an
338	   integer, the length of a text or byte string, the number of elements
339	   in an array or pairs in a map, or a tag value (collectively, "the
340	   argument"; see Section 3) can be encoded.

342	2.1.  Extended Generic Data Models

344	   This basic generic data model comes pre-extended by the registration
345	   of a number of simple values and tags right in this document, such
346	   as:

348	   o  "false", "true", "null", and "undefined" (simple values identified
349	      by 20..23)

351	   o  integer and floating-point values with a larger range and
352	      precision than the above (tags 2 to 5)

354	   o  application data types such as a point in time or an RFC 3339
355	      date/time string (tags 1, 0)

357	   Further elements of the extended generic data model can be (and have
358	   been) defined via the IANA registries created for CBOR.  Even if such
359	   an extension is unknown to a generic encoder or decoder, data items
360	   using that extension can be passed to or from the application by
361	   representing them at the interface to the application within the
362	   basic generic data model, i.e., as generic values of a simple type or
363	   generic tagged items.

365	   In other words, the basic generic data model is stable as defined in
366	   this document, while the extended generic data model expands by the
367	   registration of new simple values or tags, but never shrinks.

369	   While there is a strong expectation that generic encoders and
370	   decoders can represent "false", "true", and "null" ("undefined" is
371	   intentionally omitted) in the form appropriate for their programming
372	   environment, implementation of the data model extensions created by
373	   tags is truly optional and a matter of implementation quality.

375	2.2.  Specific Data Models

377	   The specific data model for a CBOR-based protocol usually subsets the
378	   extended generic data model and assigns application semantics to the
379	   data items within this subset and its components.  When documenting
380	   such specific data models, where it is desired to specify the types
381	   of data items, it is preferred to identify the types by the names
382	   they have in the generic data model ("negative integer", "array")
383	   instead of by referring to aspects of their CBOR representation
384	   ("major type 1", "major type 4").

386	   Specific data models can also specify what values (including values
387	   of different types) are equivalent for the purposes of map keys and
388	   encoder freedom.  For example, in the generic data model, a valid map
389	   MAY have both "0" and "0.0" as keys, and an encoder MUST NOT encode
390	   "0.0" as an integer (major type 0, Section 3.1).  However, if a
391	   specific data model declares that floating-point and integer
392	   representations of integral values are equivalent, using both map
393	   keys "0" and "0.0" in a single map would be considered duplicates and
394	   so invalid, and an encoder could encode integral-valued floats as
395	   integers or vice versa, perhaps to save encoded bytes.
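
   The difference between the two models can be sketched as follows.
   This is only an illustration in Python under stated assumptions: the
   function names encode_generic and encode_equating_model are
   hypothetical, integers are limited to 0..23, and floats are always
   written as 8-byte values (major type 7, additional information 27;
   see Section 3).

```python
import struct


def encode_generic(value) -> bytes:
    """Generic data model: integers and floats stay distinct."""
    if isinstance(value, float):
        # Major type 7, additional information 27: an 8-byte float follows.
        return bytes([0b111_11011]) + struct.pack(">d", value)
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 24:
        # Major type 0, argument carried directly in the initial byte.
        return bytes([0b000_00000 | value])
    raise ValueError("outside the scope of this sketch")


def encode_equating_model(value) -> bytes:
    """Hypothetical specific data model that declares integral floats and
    integers equivalent, so the encoder may choose the shorter form."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return encode_generic(value)
```

   Under the generic model, 0 and 0.0 produce different encodings and so
   remain distinct map keys; under the equating model both collapse to
   the single byte 0x00, which is why using both as keys in one map
   would count as a duplicate there.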

397	3.  Specification of the CBOR Encoding

399	   A CBOR data item (Section 2) is encoded to or decoded from a byte
400	   string as described in this section.  The encoding is summarized in
401	   Table 5.

403	   The initial byte of each encoded data item contains both information
404	   about the major type (the high-order 3 bits, described in
405	   Section 3.1) and additional information (the low-order 5 bits).
406	   Additional information value 31 is used for indefinite-length items,
407	   described in Section 3.2.  Additional information values 28 to 30 are
408	   reserved for future expansion.

410	   Additional information values from 0 to 27 describe how to construct
411	   an "argument", possibly consuming additional bytes.  For major type 7
412	   and additional information 25 to 27 (floating-point numbers), there
413	   is a special case; in all other cases, the argument constructed from
414	   the additional information value, possibly combined with following
415	   bytes, is an unsigned integer.

417	   When the value of the additional information is less than 24, it is
418	   directly used as the argument's value.  When it is 24 to 27, the
419	   argument's value is held in the following 1, 2, 4, or 8 bytes,
420	   respectively, in network byte order.
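
   The rules for the initial byte and the argument can be sketched as a
   single function.  This is a minimal illustration in Python, not a
   reference implementation; the name decode_head and its return shape
   are assumptions made for this example.

```python
import struct


def decode_head(data: bytes, pos: int = 0):
    """Return (major_type, additional_info, argument, next_pos).

    The argument is None for additional information 31, which carries no
    argument.  A ValueError is raised for the reserved values 28 to 30.
    """
    if pos >= len(data):
        raise ValueError("no initial byte")
    initial = data[pos]
    major_type = initial >> 5        # high-order 3 bits
    info = initial & 0b000_11111     # low-order 5 bits
    pos += 1

    if info < 24:
        # The additional information is the argument itself.
        return major_type, info, info, pos
    if info <= 27:
        # 24, 25, 26, 27 -> 1, 2, 4, 8 following bytes, network byte order.
        size = 1 << (info - 24)
        raw = data[pos:pos + size]
        if len(raw) < size:
            raise ValueError("input ends inside the argument")
        if major_type == 7 and info >= 25:
            # Special case: the bytes hold a floating-point number.
            argument = struct.unpack({2: ">e", 4: ">f", 8: ">d"}[size], raw)[0]
        else:
            argument = int.from_bytes(raw, "big")
        return major_type, info, argument, pos + size
    if info == 31:
        return major_type, info, None, pos
    raise ValueError("additional information 28 to 30 is reserved")
```

   For instance, the single byte 0x0a yields major type 0 with argument
   10, and the three bytes 0x1901f4 yield major type 0 with argument
   500.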

422	   The meaning of this argument depends on the major type.  For example,
423	   in major type 0, the argument is the value of the data item itself
424	   (and in major type 1 the value of the data item is computed from the
425	   argument); in major types 2 and 3 it gives the length of the string
426	   data in bytes that follows; and in major types 4 and 5 it is used to
427	   determine the number of data items enclosed.

429	   If the encoded sequence of bytes ends before the end of a data item
430	   would be reached, that encoding is not well-formed.  If the encoded
431	   sequence of bytes still has bytes remaining after the outermost
432	   encoded item is parsed, that encoding is not a single well-formed
433	   CBOR item.
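
   Both conditions can be checked by walking over one item and comparing
   the end position with the input length.  The following Python sketch
   is illustrative only: the names skip_item and
   is_single_well_formed_item are hypothetical, and it deliberately
   rejects indefinite-length items (additional information 31) along
   with the reserved values 28 to 30 to stay short.

```python
def skip_item(data: bytes, pos: int) -> int:
    """Return the position just past the data item that starts at pos.

    Sketch only: definite-length items; additional information values
    28 to 31 are rejected.
    """
    if pos >= len(data):
        raise ValueError("input ends before the data item is complete")
    major_type, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        argument = info
    elif info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("input ends before the data item is complete")
        argument = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("not handled by this sketch")

    if major_type in (2, 3):          # argument = bytes of string content
        if pos + argument > len(data):
            raise ValueError("input ends before the data item is complete")
        return pos + argument
    if major_type == 4:               # argument = number of nested items
        for _ in range(argument):
            pos = skip_item(data, pos)
    elif major_type == 5:             # argument = number of key/value pairs
        for _ in range(2 * argument):
            pos = skip_item(data, pos)
    elif major_type == 6:             # exactly one tagged item follows
        pos = skip_item(data, pos)
    return pos                        # types 0, 1, 7: nothing else follows


def is_single_well_formed_item(data: bytes) -> bool:
    """True if data is exactly one item with no bytes left over."""
    try:
        return skip_item(data, 0) == len(data)
    except ValueError:
        return False
```

   A truncated input fails inside skip_item, while an input with
   trailing bytes fails the final length comparison.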

435	   A CBOR decoder implementation can be based on a jump table with all
436	   256 defined values for the initial byte (Table 5).  A decoder in a
437	   constrained implementation can instead use the structure of the
438	   initial byte and following bytes for more compact code (see
439	   Appendix C for a rough impression of how this could look).

441	3.1.  Major Types

443	   The following lists the major types and the additional information
444	   and other bytes associated with the type.

446	   Major type 0:  an integer in the range 0..2**64-1 inclusive.  The
447	      value of the encoded item is the argument itself.  For example,
448	      the integer 10 is denoted as the one byte 0b000_01010 (major type
449	      0, additional information 10).  The integer 500 would be
450	      0b000_11001 (major type 0, additional information 25) followed by
451	      the two bytes 0x01f4, which is 500 in decimal.

453	   Major type 1:  a negative integer in the range -2**64..-1 inclusive.
454	      The value of the item is -1 minus the argument.  For example, the
455	      integer -500 would be 0b001_11001 (major type 1, additional
456	      information 25) followed by the two bytes 0x01f3, which is 499 in
457	      decimal.

459	   Major type 2:  a byte string.  The number of bytes in the string is
460	      equal to the argument.  For example, a byte string whose length is
461	      5 would have an initial byte of 0b010_00101 (major type 2,
462	      additional information 5 for the length), followed by 5 bytes of
463	      binary content.  A byte string whose length is 500 would have an
464	      initial byte of 0b010_11001 (major type 2, additional information
465	      25 to indicate a two-byte length) followed by the two bytes 0x01f4
466	      for a length of 500, followed by 500 bytes of binary content.

468	   Major type 3:  a text string (Section 2), encoded as UTF-8
469	      ([RFC3629]).  The number of bytes in the string is equal to the
470	      argument.  A string containing an invalid UTF-8 sequence is well-
471	      formed but invalid.  This type is provided for systems that need
472	      to interpret or display human-readable text, and allows the
473	      differentiation between unstructured bytes and text that has a
474	      specified repertoire and encoding.  In contrast to formats such as
475	      JSON, the Unicode characters in this type are never escaped.
476	      Thus, a newline character (U+000A) is always represented in a
477	      string as the byte 0x0a, and never as the bytes 0x5c6e (the
478	      characters "\" and "n") or as 0x5c7530303061 (the characters "\",
479	      "u", "0", "0", "0", and "a").

481	   Major type 4:  an array of data items.  Arrays are also called lists,
482	      sequences, or tuples.  The argument is the number of data items in
483	      the array.  Items in an array do not need to all be of the same
484	      type.  For example, an array that contains 10 items of any type
485	      would have an initial byte of 0b100_01010 (major type of 4,
486	      additional information of 10 for the length) followed by the 10
487	      remaining items.

489	   Major type 5:  a map of pairs of data items.  Maps are also called
490	      tables, dictionaries, hashes, or objects (in JSON).  A map is
491	      composed of pairs of data items, each pair consisting of a key
492	      that is immediately followed by a value.  The argument is the
493	      number of _pairs_ of data items in the map.  For example, a map
494	      that contains 9 pairs would have an initial byte of 0b101_01001
495	      (major type of 5, additional information of 9 for the number of
496	      pairs) followed by the 18 remaining items.  The first item is the
497	      first key, the second item is the first value, the third item is
498	      the second key, and so on.  A map that has duplicate keys may be
499	      well-formed, but it is not valid, and thus it causes indeterminate
500	      decoding; see also Section 4.7.

502	   Major type 6:  a tagged data item whose tag is the argument and whose
503	      value is the single following encoded item.  See Section 3.4.

505	   Major type 7:  floating-point numbers and simple values, as well as
506	      the "break" stop code.  See Section 3.3.
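
   The examples given above for major types 0 to 3 can be reproduced
   with a short encoder.  This Python sketch is not normative: the
   function names are hypothetical, and it always picks the shortest
   argument encoding that fits, which is only one of several well-formed
   choices.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Initial byte plus the shortest argument encoding that fits."""
    if argument < 0 or argument >= 2**64:
        raise ValueError("argument out of range")
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * size):
            return bytes([major_type << 5 | info]) + argument.to_bytes(size, "big")


def encode_integer(value: int) -> bytes:
    """Major type 0 for 0..2**64-1, major type 1 for -2**64..-1."""
    if value >= 0:
        return encode_head(0, value)
    # For negative integers the argument is -1 minus the value.
    return encode_head(1, -1 - value)


def encode_byte_string(content: bytes) -> bytes:
    """Major type 2: the argument is the number of bytes."""
    return encode_head(2, len(content)) + content


def encode_text_string(text: str) -> bytes:
    """Major type 3: UTF-8 content, never escaped; the argument counts bytes."""
    content = text.encode("utf-8")
    return encode_head(3, len(content)) + content
```

   Here encode_integer(500) gives 0x1901f4, encode_integer(-500) gives
   0x3901f3, and a text string holding only a newline is the two bytes
   0x610a.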

508	   These eight major types lead to a simple table showing which of the
509	   256 possible values for the initial byte of a data item are used
510	   (Table 5).

512	   In major types 6 and 7, many of the possible values are reserved for
513	   future specification.  See Section 8 for more information on these
514	   values.

516	3.2.  Indefinite Lengths for Some Major Types

518	   Four CBOR items (arrays, maps, byte strings, and text strings) can be
519	   encoded with an indefinite length using additional information value
520	   31.  This is useful if the encoding of the item needs to begin before
521	   the number of items inside the array or map, or the total length of
522	   the string, is known.  (The application of this is often referred to
523	   as "streaming" within a data item.)

525	   Indefinite-length arrays and maps are dealt with differently than
526	   indefinite-length byte strings and text strings.

528	3.2.1.  Indefinite-Length Arrays and Maps

530	   Indefinite-length arrays and maps are simply opened without
531	   indicating the number of data items that will be included in the
532	   array or map, using the additional information value of 31.  The
533	   initial major type and additional information byte is followed by the
534	   elements of the array or map, just as they would be in other arrays
535	   or maps.  The end of the array or map is indicated by encoding a
536	   "break" stop code in a place where the next data item would normally
537	   have been included.  The "break" is encoded with major type 7 and
538	   additional information value 31 (0b111_11111) but is not itself a
539	   data item: it is just a syntactic feature to close the array or map.
540	   That is, the "break" stop code comes after the last item in the array
541	   or map, and it cannot occur anywhere else in place of a data item.
542	   In this way, indefinite-length arrays and maps look identical to
543	   other arrays and maps except for beginning with the additional
544	   information value 31 and ending with the "break" stop code.

546	   Arrays and maps with indefinite lengths allow any number of items
547	   (for arrays) and key/value pairs (for maps) to be given before the
548	   "break" stop code.  There is no restriction against nesting
549	   indefinite-length array or map items.  A "break" only terminates a
550	   single item, so nested indefinite-length items need exactly as many
551	   "break" stop codes as there are type bytes starting an indefinite-
552	   length item.

554	   For example, assume an encoder wants to represent the abstract array
555	   [1, [2, 3], [4, 5]].  The definite-length encoding would be
556	   0x8301820203820405:

558	   83        -- Array of length 3
559	      01     -- 1
560	      82     -- Array of length 2
561	         02  -- 2
562	         03  -- 3
563	      82     -- Array of length 2
564	         04  -- 4
565	         05  -- 5

567	   Indefinite-length encoding could be applied independently to each of
568	   the three arrays encoded in this data item, as required, leading to
569	   representations such as:

571	   0x9f018202039f0405ffff
572	   9F        -- Start indefinite-length array
573	      01     -- 1
574	      82     -- Array of length 2
575	         02  -- 2
576	         03  -- 3
577	      9F     -- Start indefinite-length array
578	         04  -- 4
579	         05  -- 5
580	         FF  -- "break" (inner array)
581	      FF     -- "break" (outer array)

583	   0x9f01820203820405ff
584	   9F        -- Start indefinite-length array
585	      01     -- 1
586	      82     -- Array of length 2
587	         02  -- 2
588	         03  -- 3
589	      82     -- Array of length 2
590	         04  -- 4
591	         05  -- 5
592	      FF     -- "break"

594	   0x83018202039f0405ff
595	   83        -- Array of length 3
596	      01     -- 1
597	      82     -- Array of length 2
598	         02  -- 2
599	         03  -- 3
600	      9F     -- Start indefinite-length array
601	         04  -- 4
602	         05  -- 5
603	         FF  -- "break"

605	   0x83019f0203ff820405
606	   83        -- Array of length 3
607	      01     -- 1
608	      9F     -- Start indefinite-length array
609	         02  -- 2
610	         03  -- 3
611	         FF  -- "break"
612	      82     -- Array of length 2
613	         04  -- 4
614	         05  -- 5

616	   An example of an indefinite-length map (that happens to have two key/
617	   value pairs) might be:

619	   0xbf6346756ef563416d7421ff
620	   BF           -- Start indefinite-length map
621	      63        -- First key, UTF-8 string length 3
622	         46756e --   "Fun"
623	      F5        -- First value, true
624	      63        -- Second key, UTF-8 string length 3
625	         416d74 --   "Amt"
626	      21        -- Second value, -2
627	      FF        -- "break"
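
   The nesting rules above can be illustrated with a minimal decoder
   sketch.  It is not a complete CBOR decoder: it assumes only
   integers, definite-length text strings, False/True, arrays, and maps
   occur, and the names BREAK, read_argument, and decode_item are
   hypothetical helpers introduced for this illustration.

```python
BREAK = object()  # hypothetical sentinel standing for the "break" stop code


def read_argument(data, pos, info):
    """Return (argument, next_pos) for additional information 0..27."""
    if info < 24:
        return info, pos
    if info > 27:
        raise ValueError("additional information %d not supported here" % info)
    size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
    if pos + size > len(data):
        raise ValueError("truncated argument")
    return int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode_item(data, pos, allow_break=False):
    """Decode one item at pos; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of data")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        # "break" is only legal directly inside an indefinite-length item
        if not allow_break:
            raise ValueError('"break" where a data item is required')
        return BREAK, pos
    if major in (4, 5):
        entries = []
        if info == 31:  # indefinite length: read until "break"
            while True:
                value, pos = decode_item(data, pos, allow_break=True)
                if value is BREAK:
                    break
                entries.append(value)
        else:  # definite length: a map holds two items per pair
            count, pos = read_argument(data, pos, info)
            for _ in range(count if major == 4 else 2 * count):
                value, pos = decode_item(data, pos)
                entries.append(value)
        if major == 4:
            return entries, pos
        if len(entries) % 2:
            raise ValueError('"break" after a map key (value missing)')
        return dict(zip(entries[0::2], entries[1::2])), pos
    if initial in (0xF4, 0xF5):  # simple values False and True
        return initial == 0xF5, pos
    arg, pos = read_argument(data, pos, info)
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major == 3:
        if pos + arg > len(data):
            raise ValueError("truncated text string")
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    raise ValueError("major type %d not covered by this sketch" % major)
```

   Applied to the examples above, all five array encodings should
   decode to the same nested list, because each "break" closes only the
   innermost open indefinite-length item.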

629	3.2.2.  Indefinite-Length Byte Strings and Text Strings

631	   Indefinite-length byte strings and text strings are actually a
632	   concatenation of zero or more definite-length byte or text strings
633	   ("chunks") that are together treated as one contiguous string.
634	   Indefinite-length strings are opened with the major type and
635	   additional information value of 31, but what follows are a series of
636	   byte or text strings that have definite lengths (the chunks).  The
637	   end of the series of chunks is indicated by encoding the "break" stop
638	   code (0b111_11111) in a place where the next chunk in the series
639	   would occur.  The contents of the chunks are concatenated together,
640	   and the overall length of the indefinite-length string will be the
641	   sum of the lengths of all of the chunks.  In summary, an indefinite-
642	   length string is encoded similarly to how an indefinite-length array
643	   of its chunks would be encoded, except that the major type of the
644	   indefinite-length string is that of a (text or byte) string and
645	   matches the major types of its chunks.

647	   For indefinite-length byte strings, every data item (chunk) between
648	   the indefinite-length indicator and the "break" MUST be a definite-
649	   length byte string item; if the parser sees any item type other than
650	   a byte string before it sees the "break", it is an error.

652	   For example, assume the sequence:

654	   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

656	   5F              -- Start indefinite-length byte string
657	      44           -- Byte string of length 4
658	         aabbccdd  -- Bytes content
659	      43           -- Byte string of length 3
660	         eeff99    -- Bytes content
661	      FF           -- "break"

663	   After decoding, this results in a single byte string with seven
664	   bytes: 0xaabbccddeeff99.

666	   Text strings with indefinite lengths act the same as byte strings
667	   with indefinite lengths, except that all their chunks MUST be
668	   definite-length text strings.  Note that this implies that the bytes
669	   of a single UTF-8 character cannot be spread between chunks: a new
670	   chunk can only be started at a character boundary.
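
   As an illustration of the chunk rules, the following sketch
   reassembles one indefinite-length string.  The function name
   decode_indefinite_string is hypothetical, and the sketch assumes the
   item starts at the first byte of its input.  Decoding each text
   chunk separately is one way to enforce the rule that a chunk
   boundary must fall on a character boundary.

```python
def decode_indefinite_string(data):
    """Decode one indefinite-length byte or text string at data[0].

    Returns (joined_string, number_of_bytes_consumed).
    """
    if not data:
        raise ValueError("no data")
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (2, 3) or info != 31:
        raise ValueError("not an indefinite-length string")
    chunks = []
    pos = 1
    while True:
        if pos >= len(data):
            raise ValueError('end of data before "break"')
        initial = data[pos]
        pos += 1
        if initial == 0xFF:  # "break" ends the series of chunks
            break
        chunk_info = initial & 0x1F
        if initial >> 5 != major or chunk_info == 31:
            raise ValueError("chunk must be a definite-length string "
                             "of the same major type")
        if chunk_info < 24:
            length = chunk_info
        elif chunk_info <= 27:
            size = 1 << (chunk_info - 24)
            if pos + size > len(data):
                raise ValueError("truncated chunk length")
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("reserved additional information")
        if pos + length > len(data):
            raise ValueError("truncated chunk")
        chunk = data[pos:pos + length]
        pos += length
        if major == 3:
            # Fails if a UTF-8 character is split across chunks.
            chunk = chunk.decode("utf-8")
        chunks.append(chunk)
    joined = "".join(chunks) if major == 3 else b"".join(chunks)
    return joined, pos
```

   For the byte sequence shown above, this should yield the seven-byte
   string 0xaabbccddeeff99.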

672	3.3.  Floating-Point Numbers and Values with No Content

674	   Major type 7 is for two types of data: floating-point numbers and
675	   "simple values" that do not need any content.  Each value of the
676	   5-bit additional information in the initial byte has its own separate
677	   meaning, as defined in Table 1.  Like the major types for integers,
678	   items of this major type do not carry content data; all the
679	   information is in the initial bytes.

681	    +-------------+--------------------------------------------------+
682	    | 5-Bit Value | Semantics                                        |
683	    +-------------+--------------------------------------------------+
684	    | 0..23       | Simple value (value 0..23)                       |
685	    |             |                                                  |
686	    | 24          | Simple value (value 32..255 in following byte)   |
687	    |             |                                                  |
688	    | 25          | IEEE 754 Half-Precision Float (16 bits follow)   |
689	    |             |                                                  |
690	    | 26          | IEEE 754 Single-Precision Float (32 bits follow) |
691	    |             |                                                  |
692	    | 27          | IEEE 754 Double-Precision Float (64 bits follow) |
693	    |             |                                                  |
694	    | 28-30       | (Unassigned)                                     |
695	    |             |                                                  |
696	    | 31          | "break" stop code for indefinite-length items    |
697	    +-------------+--------------------------------------------------+

699	        Table 1: Values for Additional Information in Major Type 7

701	   As with all other major types, the 5-bit value 24 signifies a single-
702	   byte extension: it is followed by an additional byte to represent the
703	   simple value.  (To minimize confusion, only the values 32 to 255 are
704	   used.)  This maintains the structure of the initial bytes: as for the
705	   other major types, the length of these always depends on the
706	   additional information in the first byte.  Table 2 lists the values
707	   assigned and available for simple types.

709	                       +---------+-----------------+
710	                       | Value   | Semantics       |
711	                       +---------+-----------------+
712	                       | 0..19   | (Unassigned)    |
713	                       |         |                 |
714	                       | 20      | False           |
715	                       |         |                 |
716	                       | 21      | True            |
717	                       |         |                 |
718	                       | 22      | Null            |
719	                       |         |                 |
720	                       | 23      | Undefined value |
721	                       |         |                 |
722	                       | 24..31  | (Reserved)      |
723	                       |         |                 |
724	                       | 32..255 | (Unassigned)    |
725	                       +---------+-----------------+

727	                          Table 2: Simple Values

729	   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
730	   IEEE 754 binary floating-point values [IEEE.754.2008].  These
731	   floating-point values are encoded in the additional bytes of the
732	   appropriate size.  (See Appendix D for some information about 16-bit
733	   floating point.)

735	   An encoder MUST NOT encode False as the two-byte sequence of 0xf814,
736	   MUST NOT encode True as the two-byte sequence of 0xf815, MUST NOT
737	   encode Null as the two-byte sequence of 0xf816, and MUST NOT encode
738	   Undefined value as the two-byte sequence of 0xf817.  A decoder MUST
739	   treat these two-byte sequences as an error.  Similar prohibitions
740	   apply to the unassigned simple values as well.
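
   The rules of Table 1, together with the prohibition of the two-byte
   forms for values below 32, can be sketched as follows.  The function
   name decode_major7 and the ("simple", n) tuple used to report simple
   values are assumptions of this example, not part of the format.

```python
import struct

# struct format and size for additional information 25, 26, and 27
FLOAT_FORMATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}


def decode_major7(data):
    """Decode one major type 7 item at data[0]; return (value, length)."""
    if not data or data[0] >> 5 != 7:
        raise ValueError("not a major type 7 item")
    info = data[0] & 0x1F
    if info < 24:
        return ("simple", info), 1
    if info == 24:
        if len(data) < 2:
            raise ValueError("truncated simple value")
        if data[1] < 32:
            # e.g. 0xf814..0xf817 for False, True, Null, Undefined
            raise ValueError("simple value %d must not use the "
                             "two-byte form" % data[1])
        return ("simple", data[1]), 2
    if info in FLOAT_FORMATS:
        fmt, size = FLOAT_FORMATS[info]
        if len(data) < 1 + size:
            raise ValueError("truncated floating-point value")
        return struct.unpack(fmt, data[1:1 + size])[0], 1 + size
    if info == 31:
        raise ValueError('"break" stop code is not a data item')
    raise ValueError("additional information %d is unassigned" % info)
```

   A caller would map ("simple", 20) through ("simple", 23) to its own
   representations of False, True, Null, and Undefined.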

742	3.4.  Optional Tagging of Items

744	   In CBOR, a data item can optionally be preceded by a tag to give it
745	   additional semantics while retaining its structure.  The tag is major
746	   type 6, and represents an integer number as indicated by the tag's
747	   argument (Section 3); the (sole) data item is carried as content
748	   data.  If a tag requires structured data, this structure is encoded
749	   into the nested data item.  The definition of a tag usually restricts
750	   what kinds of nested data item or items are valid.

752	   The initial bytes of the tag follow the rules for positive integers
753	   (major type 0).  The tag is followed by a single data item of any
754	   type.  For example, assume that a byte string of length 12 is marked
755	   with a tag to indicate it is a positive bignum (Section 3.4.4).  This
756	   would be marked as 0b110_00010 (major type 6, additional information
757	   2 for the tag) followed by 0b010_01100 (major type 2, additional
758	   information of 12 for the length) followed by the 12 bytes of the
759	   bignum.

761	   Decoders do not need to understand tags, and thus tags may be of
762	   little value in applications where the implementation creating a
763	   particular CBOR data item and the implementation decoding that stream
764	   know the semantic meaning of each item in the data flow.  Their
765	   primary purpose in this specification is to define common data types
766	   such as dates.  A secondary purpose is to allow optional tagging when
767	   the decoder is a generic CBOR decoder that might be able to benefit
768	   from hints about the content of items.  Understanding the semantic
769	   tags is optional for a decoder; it can just jump over the initial
770	   bytes of the tag and interpret the tagged data item itself.

772	   A tag always applies to the item that directly follows it.
773	   Thus, if tag A is followed by tag B, which is followed by data item
774	   C, tag A applies to the result of applying tag B on data item C.
775	   That is, a tagged item is a data item consisting of a tag and a
776	   value.  The content of the tagged item is the data item (the value)
777	   that is being tagged.

779	   IANA maintains a registry of tag values as described in Section 8.2.
780	   Table 3 provides a list of initial values, with definitions in the
781	   rest of this section.

783	   +-----------+--------------+----------------------------------------+
784	   | Tag       | Data Item    | Semantics                              |
785	   +-----------+--------------+----------------------------------------+
786	   | 0         | UTF-8 string | Standard date/time string; see Section |
787	   |           |              | 3.4.2                                  |
788	   |           |              |                                        |
789	   | 1         | multiple     | Epoch-based date/time; see Section     |
790	   |           |              | 3.4.3                                  |
791	   |           |              |                                        |
792	   | 2         | byte string  | Positive bignum; see Section 3.4.4     |
793	   |           |              |                                        |
794	   | 3         | byte string  | Negative bignum; see Section 3.4.4     |
795	   |           |              |                                        |
796	   | 4         | array        | Decimal fraction; see Section 3.4.5    |
797	   |           |              |                                        |
798	   | 5         | array        | Bigfloat; see Section 3.4.5            |
799	   |           |              |                                        |
800	   | 6..20     | (Unassigned) | (Unassigned)                           |
801	   |           |              |                                        |
802	   | 21        | multiple     | Expected conversion to base64url       |
803	   |           |              | encoding; see Section 3.4.6.2          |
804	   |           |              |                                        |
805	   | 22        | multiple     | Expected conversion to base64          |
806	   |           |              | encoding; see Section 3.4.6.2          |
807	   |           |              |                                        |
808	   | 23        | multiple     | Expected conversion to base16          |
809	   |           |              | encoding; see Section 3.4.6.2          |
810	   |           |              |                                        |
811	   | 24        | byte string  | Encoded CBOR data item; see Section    |
812	   |           |              | 3.4.6.1                                |
813	   |           |              |                                        |
814	   | 25..31    | (Unassigned) | (Unassigned)                           |
815	   |           |              |                                        |
816	   | 32        | UTF-8 string | URI; see Section 3.4.6.3               |
817	   |           |              |                                        |
818	   | 33        | UTF-8 string | base64url; see Section 3.4.6.3         |
819	   |           |              |                                        |
820	   | 34        | UTF-8 string | base64; see Section 3.4.6.3            |
821	   |           |              |                                        |
822	   | 35        | UTF-8 string | Regular expression; see Section        |
823	   |           |              | 3.4.6.3                                |
824	   |           |              |                                        |
825	   | 36        | UTF-8 string | MIME message; see Section 3.4.6.3      |
826	   |           |              |                                        |
827	   | 37..55798 | (Unassigned) | (Unassigned)                           |
828	   |           |              |                                        |
829	   | 55799     | multiple     | Self-describe CBOR; see Section 3.4.7  |
830	   |           |              |                                        |
831	   | 55800+    | (Unassigned) | (Unassigned)                           |
832	   +-----------+--------------+----------------------------------------+

834	                         Table 3: Values for Tags

836	3.4.1.  Date and Time

838	   Protocols using tag values 0 and 1 extend the generic data model
839	   (Section 2) with data items representing points in time.

841	3.4.2.  Standard Date/Time String

843	   Tag value 0 is for date/time strings that follow the standard format
844	   described in [RFC3339], as refined by Section 3.3 of [RFC4287].

846	3.4.3.  Epoch-based Date/Time

848	   Tag value 1 is for numerical representation of civil time expressed
849	   in seconds relative to 1970-01-01T00:00Z (in UTC time).

851	   The tagged item MUST be an unsigned or negative integer (major types
852	   0 and 1), or a floating-point number (major type 7 with additional
853	   information 25, 26, or 27).

855	   Non-negative values (major type 0 and non-negative floating-point
856	   numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
857	   are interpreted according to POSIX [TIME_T].  (POSIX time is also
858	   known as UNIX Epoch time.  Note that leap seconds are handled
859	   specially by POSIX time and this results in a 1 second discontinuity
860	   several times per decade.)  Note that applications that require the
861	   expression of times beyond early 2106 cannot leave out support of
862	   64-bit integers for the tagged value.

864	   Negative values (major type 1 and negative floating-point numbers)
865	   are interpreted as determined by the application requirements as
866	   there is no universal standard for UTC count-of-seconds time before
867	   1970-01-01T00:00Z (this is particularly true for points in time that
868	   precede discontinuities in national calendars).

870	   To indicate fractional seconds, floating point values can be used
871	   within Tag 1 instead of integer values.  Note that this generally
872	   requires binary64 support, as binary16 and binary32 provide non-zero
873	   fractions of seconds only for a short period of time around early
874	   1970.  An application that requires Tag 1 support may restrict the
875	   tagged value to be an integer (or a floating-point value) only.
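
   The precision remark can be checked with a short sketch.  The
   helper names encode_head, encode_epoch, and survives_binary32 are
   hypothetical; the sketch assumes floating-point times are always
   written as binary64.

```python
import struct


def encode_head(major, argument):
    """Initial byte(s) for a major type and its unsigned argument."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_epoch(seconds):
    """Tag 1 followed by an integer or a binary64 floating-point value."""
    if isinstance(seconds, int):
        if seconds >= 0:
            return b"\xc1" + encode_head(0, seconds)
        return b"\xc1" + encode_head(1, -1 - seconds)
    return b"\xc1\xfb" + struct.pack(">d", seconds)


def survives_binary32(seconds):
    """True if the value is unchanged by a round trip through binary32."""
    return struct.unpack(">f", struct.pack(">f", seconds))[0] == seconds
```

   For a time in 2013 such as 1363896240.5, survives_binary32 should
   report False, since binary32 cannot hold a half-second fraction at
   that magnitude, whereas the binary64 form keeps it.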

877	3.4.4.  Bignums

879	   Protocols using tag values 2 and 3 extend the generic data model
880	   (Section 2) with "bignums" representing arbitrary integers.  In the
881	   generic data model, bignum values are not equal to integers from the
882	   basic data model, but specific data models can define that
883	   equivalence.

885	   Bignums are encoded as a byte string data item, which is interpreted
886	   as an unsigned integer n in network byte order.  For tag value 2, the
887	   value of the bignum is n.  For tag value 3, the value of the bignum
888	   is -1 - n.  Decoders that understand these tags MUST be able to
889	   decode bignums that have leading zeroes.

891	   For example, the number 18446744073709551616 (2**64) is represented
892	   as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major
893	   type 2, length 9), followed by 0x010000000000000000 (one byte 0x01
894	   and eight bytes 0x00).  In hexadecimal:

896	   C2                        -- Tag 2
897	      49                     -- Byte string of length 9
898	         010000000000000000  -- Bytes content
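
   A minimal sketch of the bignum mapping follows.  The helper names
   encode_head, encode_bignum, and decode_bignum are hypothetical, and
   the decoder assumes the tag is followed by a definite-length byte
   string.

```python
def encode_head(major, argument):
    """Initial byte(s) for a major type and its unsigned argument."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_bignum(value):
    """Encode an integer as tag 2 (value = n) or tag 3 (value = -1 - n)."""
    tag, n = (0xC2, value) if value >= 0 else (0xC3, -1 - value)
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([tag]) + encode_head(2, len(payload)) + payload


def decode_bignum(data):
    """Decode a tag 2 or tag 3 bignum that starts at data[0]."""
    if len(data) < 2 or data[0] not in (0xC2, 0xC3):
        raise ValueError("not a tag 2 or tag 3 item")
    if data[1] >> 5 != 2:
        raise ValueError("tagged item is not a byte string")
    info = data[1] & 0x1F
    if info < 24:
        length, pos = info, 2
    elif info <= 27:
        size = 1 << (info - 24)
        length, pos = int.from_bytes(data[2:2 + size], "big"), 2 + size
    else:
        raise ValueError("indefinite or reserved length not handled here")
    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated byte string")
    n = int.from_bytes(payload, "big")  # leading zeroes do not change n
    return n if data[0] == 0xC2 else -1 - n
```

   Because the byte string is read as an unsigned integer in network
   byte order, a payload with leading zero bytes decodes to the same
   value as the shortest form.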

900	3.4.5.  Decimal Fractions and Bigfloats

902	   Protocols using tag value 4 extend the generic data model with data
903	   items representing arbitrary-length decimal fractions m*(10**e).
904	   Protocols using tag value 5 extend the generic data model with data
905	   items representing arbitrary-length binary fractions m*(2**e).  As
906	   with bignums, values of different types are not equal in the generic
907	   data model.

909	   Decimal fractions combine an integer mantissa with a base-10 scaling
910	   factor.  They are most useful if an application needs the exact
911	   representation of a decimal fraction such as 1.1 because there is no
912	   exact representation for many decimal fractions in binary floating
913	   point.

915	   Bigfloats combine an integer mantissa with a base-2 scaling factor.
916	   They are binary floating-point values that can exceed the range or
917	   the precision of the three IEEE 754 formats supported by CBOR
918	   (Section 3.3).  Bigfloats may also be used by constrained
919	   applications that need some basic binary floating-point capability
920	   without the need for supporting IEEE 754.

922	   A decimal fraction or a bigfloat is represented as a tagged array
923	   that contains exactly two integer numbers: an exponent e and a
924	   mantissa m.  Decimal fractions (tag 4) use base-10 exponents; the
925	   value of a decimal fraction data item is m*(10**e).  Bigfloats (tag
926	   5) use base-2 exponents; the value of a bigfloat data item is
927	   m*(2**e).  The exponent e MUST be represented in an integer of major
928	   type 0 or 1, while the mantissa also can be a bignum (Section 3.4.4).

930	   An example of a decimal fraction is that the number 273.15 could be
931	   represented as 0b110_00100 (major type of 6 for the tag, additional
932	   information of 4 for the type of tag), followed by 0b100_00010 (major
933	   type of 4 for the array, additional information of 2 for the length
934	   of the array), followed by 0b001_00001 (major type of 1 for the first
935	   integer, additional information of 1 for the value of -2), followed
936	   by 0b000_11001 (major type of 0 for the second integer, additional
937	   information of 25 for a two-byte value), followed by
938	   0b0110101010110011 (27315 in two bytes).  In hexadecimal:

940	   C4             -- Tag 4
941	      82          -- Array of length 2
942	         21       -- -2
943	         19 6ab3  -- 27315

945	   An example of a bigfloat is that the number 1.5 could be represented
946	   as 0b110_00101 (major type of 6 for the tag, additional information
947	   of 5 for the type of tag), followed by 0b100_00010 (major type of 4
948	   for the array, additional information of 2 for the length of the
949	   array), followed by 0b001_00000 (major type of 1 for the first
950	   integer, additional information of 0 for the value of -1), followed
951	   by 0b000_00011 (major type of 0 for the second integer, additional
952	   information of 3 for the value of 3).  In hexadecimal:

954	   C5             -- Tag 5
955	      82          -- Array of length 2
956	         20       -- -1
957	         03       -- 3
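
   The two examples can be reproduced with a small sketch that uses
   exact rational arithmetic.  The names encode_int, encode_fraction,
   and fraction_value are hypothetical, and the sketch assumes that
   both the exponent and the mantissa fit in major type 0 or 1 (no
   bignum mantissa).

```python
from fractions import Fraction


def encode_int(value):
    """Encode an integer as major type 0 or 1."""
    major, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("value would need a bignum")


def encode_fraction(tag, exponent, mantissa):
    """Tag 4 or 5 around the two-element array [exponent, mantissa]."""
    if tag not in (4, 5):
        raise ValueError("tag must be 4 or 5")
    return bytes([0xC0 | tag, 0x82]) + encode_int(exponent) + encode_int(mantissa)


def fraction_value(tag, exponent, mantissa):
    """Exact value m*(10**e) for tag 4 or m*(2**e) for tag 5."""
    base = {4: 10, 5: 2}[tag]
    return mantissa * Fraction(base) ** exponent
```

   Note that the array carries the exponent first and the mantissa
   second, so 273.15 is written as [-2, 27315].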

959	   Decimal fractions and bigfloats provide no representation of
960	   Infinity, -Infinity, or NaN; if these are needed in place of a
961	   decimal fraction or bigfloat, the IEEE 754 half-precision
962	   representations from Section 3.3 can be used.  For constrained
963	   applications, where there is a choice between representing a specific
964	   number as an integer and as a decimal fraction or bigfloat (such as
965	   when the exponent is small and non-negative), there is a quality-of-
966	   implementation expectation that the integer representation is used
967	   directly.

969	3.4.6.  Content Hints

971	   The tags in this section are for content hints that might be used by
972	   generic CBOR processors.  These content hints do not extend the
973	   generic data model.

975	3.4.6.1.  Encoded CBOR Data Item

977	   Sometimes it is beneficial to carry an embedded CBOR data item that
978	   is not meant to be decoded immediately at the time the enclosing data
979	   item is being parsed.  Tag 24 (CBOR data item) can be used to tag the
980	   embedded byte string as a data item encoded in CBOR format.

982	3.4.6.2.  Expected Later Encoding for CBOR-to-JSON Converters

984	   Tags 21 to 23 indicate that a byte string might require a specific
985	   encoding when interoperating with a text-based representation.  These
986	   tags are useful when an encoder knows that the byte string data it is
987	   writing is likely to be later converted to a particular JSON-based
988	   usage.  That usage specifies that some strings are encoded as base64,
989	   base64url, and so on.  The encoder uses byte strings instead of doing
990	   the encoding itself to reduce the message size, to reduce the code
991	   size of the encoder, or both.  The encoder does not know whether or
992	   not the converter will be generic, and therefore wants to say what it
993	   believes is the proper way to convert binary strings to JSON.

995	   The data item tagged can be a byte string or any other data item.  In
996	   the latter case, the tag applies to all of the byte string data items
997	   contained in the data item, except for those contained in a nested
998	   data item tagged with an expected conversion.

1000	   These three tag types suggest conversions to three of the base data
1001	   encodings defined in [RFC4648].  For base64url encoding, padding is
1002	   not used (see Section 3.2 of RFC 4648); that is, all trailing equals
1003	   signs ("=") are removed from the base64url-encoded string.  Later
1004	   tags might be defined for other data encodings of RFC 4648 or for
1005	   other ways to encode binary data in strings.
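
   A converter honoring these hints might be sketched as follows, using
   the Python standard library's base64 module.  The function name
   expected_text is hypothetical.  Keeping the padding for tag 22 is an
   assumption of this sketch; the text above only addresses padding for
   base64url.

```python
import base64


def expected_text(tag, payload):
    """Text form suggested by tag 21, 22, or 23 for a byte string."""
    if tag == 21:
        # base64url, with all trailing "=" removed
        return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")
    if tag == 22:
        # base64; padding kept (assumption of this sketch)
        return base64.b64encode(payload).decode("ascii")
    if tag == 23:
        # base16 (uppercase hexadecimal digits)
        return base64.b16encode(payload).decode("ascii")
    raise ValueError("tag %d is not an expected-conversion tag" % tag)
```

   For the two-byte string 0xfbff, the base64url and base64 forms
   differ in both alphabet and padding.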

1007	3.4.6.3.  Encoded Text

1009	   Some text strings hold data that have formats widely used on the
1010	   Internet, and sometimes those formats can be validated and presented
1011	   to the application in appropriate form by the decoder.  There are
1012	   tags for some of these formats.

1014	   o  Tag 32 is for URIs, as defined in [RFC3986];

1016	   o  Tags 33 and 34 are for base64url- and base64-encoded text strings,
1017	      as defined in [RFC4648];

1019	   o  Tag 35 is for regular expressions that are roughly in Perl
1020	      Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
1021	      version of the JavaScript regular expression syntax [ECMA262].
1022	      (Note that more specific identification may be necessary if the
1023	      actual version of the specification underlying the regular
1024	      expression, or more than just the text of the regular expression
1025	      itself, need to be conveyed.)

1027	   o  Tag 36 is for MIME messages (including all headers), as defined in
1028	      [RFC2045].

1030	   Note that tags 33 and 34 differ from 21 and 22 in that the data is
1031	   transported in base-encoded form for the former and in raw byte
1032	   string form for the latter.

1034	3.4.7.  Self-Describe CBOR

1036	   In many applications, it will be clear from the context that CBOR is
1037	   being employed for encoding a data item.  For instance, a specific
1038	   protocol might specify the use of CBOR, or a media type is indicated
1039	   that specifies its use.  However, there may be applications where
1040	   such context information is not available, such as when CBOR data is
1041	   stored in a file and disambiguating metadata is not in use.  Here, it
1042	   may help to have some distinguishing characteristics for the data
1043	   itself.

1045	   Tag 55799 is defined for this purpose.  It does not impart any
1046	   special semantics on the data item that follows; that is, the
1047	   semantics of a data item tagged with tag 55799 is exactly identical
1048	   to the semantics of the data item itself.

1050	   The serialization of this tag is 0xd9d9f7, which appears not to be in
1051	   use as a distinguishing mark for frequently used file types.  In
1052	   particular, it is not a valid start of a Unicode text in any Unicode
1053	   encoding if followed by a valid CBOR data item.

1055	   For instance, a decoder might be able to parse both CBOR and JSON.
1056	   Such a decoder would need to mechanically distinguish the two
1057	   formats.  An easy way for an encoder to help the decoder would be to
1058	   tag the entire CBOR item with tag 55799, the serialization of which
1059	   will never be found at the beginning of a JSON text.
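
   A minimal sketch of such a format check follows.  The function
   names are hypothetical, and the check is only a heuristic: it
   recognizes data that was written with the tag and says nothing
   about CBOR data written without it.

```python
SELF_DESCRIBE = bytes.fromhex("d9d9f7")  # serialization of tag 55799


def add_self_describe(encoded_item):
    """Prefix an already encoded CBOR data item with tag 55799."""
    return SELF_DESCRIBE + encoded_item


def has_self_describe(data):
    """True if the data starts with the tag 55799 serialization."""
    return data.startswith(SELF_DESCRIBE)


def strip_self_describe(data):
    """Return the enclosed item's bytes; the tag adds no semantics."""
    return data[len(SELF_DESCRIBE):] if has_self_describe(data) else data
```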

1061	4.  Creating CBOR-Based Protocols

1063	   Data formats such as CBOR are often used in environments where there
1064	   is no format negotiation.  A specific design goal of CBOR is to not
1065	   need any included or assumed schema: a decoder can take a CBOR item
1066	   and decode it with no other knowledge.

1068	   Of course, in real-world implementations, the encoder and the decoder
1069	   will have a shared view of what should be in a CBOR data item.  For
1070	   example, an agreed-to format might be "the item is an array whose
1071	   first value is a UTF-8 string, second value is an integer, and
1072	   subsequent values are zero or more floating-point numbers" or "the
1073	   item is a map that has byte strings for keys and contains at least
1074	   one pair whose key is 0xab01".

1076	   This specification puts no restrictions on CBOR-based protocols.  An
1077	   encoder can be capable of encoding as many or as few types of values
1078	   as is required by the protocol in which it is used; a decoder can be
1079	   capable of understanding as many or as few types of values as is
1080	   required by the protocols in which it is used.  This lack of
1081	   restrictions allows CBOR to be used in extremely constrained
1082	   environments.

1084	   This section discusses some considerations in creating CBOR-based
1085	   protocols.  It is advisory only and explicitly excludes any language
1086	   from RFC 2119 other than words that could be interpreted as "MAY" in
1087	   the sense of RFC 2119.

1089	4.1.  CBOR in Streaming Applications

1091	   In a streaming application, a data stream may be composed of a
1092	   sequence of CBOR data items concatenated back-to-back.  In such an
1093	   environment, the decoder immediately begins decoding a new data item
1094	   if data is found after the end of a previous data item.

1096	   Not all of the bytes making up a data item may be immediately
1097	   available to the decoder; some decoders will buffer additional data
1098	   until a complete data item can be presented to the application.
1099	   Other decoders can present partial information about a top-level data
1100	   item to an application, such as the nested data items that could
1101	   already be decoded, or even parts of a byte string that hasn't
1102	   completely arrived yet.
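
   A buffering decoder of the first kind might locate item boundaries
   as in the following sketch.  The names Incomplete, skip_item, and
   split_stream are hypothetical.  The sketch only measures the extent
   of each item; it does not check every well-formedness rule (for
   example, the types of chunks in indefinite-length strings).

```python
class Incomplete(Exception):
    """Hypothetical signal: more bytes are needed to finish the item."""


def skip_item(data, pos, allow_break=False):
    """Return (next_pos, is_break) for the item that starts at pos."""
    if pos >= len(data):
        raise Incomplete()
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        if allow_break:
            return pos, True
        raise ValueError('"break" outside an indefinite-length item')
    if info == 31:
        if major not in (2, 3, 4, 5):
            raise ValueError("indefinite length not defined for this type")
        while True:  # skip nested items until the matching "break"
            pos, is_break = skip_item(data, pos, allow_break=True)
            if is_break:
                return pos, False
    if info > 27:
        raise ValueError("reserved additional information")
    argument = info
    if info >= 24:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise Incomplete()
        argument = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    if major in (2, 3):  # string content follows
        pos += argument
        if pos > len(data):
            raise Incomplete()
    elif major in (4, 5):  # a map holds two items per pair
        for _ in range(argument * (major - 3)):
            pos, _unused = skip_item(data, pos)
    elif major == 6:  # a tag is followed by exactly one item
        pos, _unused = skip_item(data, pos)
    return pos, False


def split_stream(buffer):
    """Split a buffer into complete encoded items plus leftover bytes."""
    items, pos = [], 0
    while pos < len(buffer):
        try:
            end, _unused = skip_item(buffer, pos)
        except Incomplete:
            break
        items.append(buffer[pos:end])
        pos = end
    return items, buffer[pos:]
```

   The leftover bytes would be kept and prepended to the next data
   that arrives, so that a partially received item is retried once it
   is complete.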

1104	   Note that some applications and protocols will not want to use
1105	   indefinite-length encoding.  Using indefinite-length encoding spares
1106	   an encoder from marshaling all the data just to count it, but it
1107	   requires a decoder to allocate increasing amounts of memory while
1108	   waiting for the end of the item.  This might be fine for some
1109	   applications but not others.

1111	4.2.  Generic Encoders and Decoders

1113	   A generic CBOR decoder can decode all well-formed CBOR data and
1114	   present them to an application.  CBOR data is well-formed if it uses
1115	   the initial bytes, as well as the byte strings and/or data items that
1116	   are implied by their values, in the manner defined by CBOR, and no
1117	   extraneous data follows (Appendix C).

1119	   Even though CBOR attempts to minimize these cases, not all well-
1120	   formed CBOR data is valid: for example, the format excludes simple
1121	   values below 32 that are encoded with an extension byte.  Also,
1122	   specific tags may impose semantic constraints that may be violated,
1123	   such as a bignum tag enclosing another tag or a date tag enclosing
1124	   a byte string.  Finally, the data may be invalid, such as
1125	   invalid UTF-8 strings or date strings that do not conform to
1126	   [RFC3339].  There is no requirement that generic encoders and
1127	   decoders make unnatural choices for their application interface to
1128	   enable the processing of invalid data.  Generic encoders and decoders
1129	   are expected to forward simple values and tags even if their specific
1130	   codepoints are not registered at the time the encoder/decoder is
1131	   written (Section 4.5).

1133	   Generic decoders provide ways to present well-formed CBOR values,
1134	   both valid and invalid, to an application.  The diagnostic notation
1135	   (Section 7) may be used to present well-formed CBOR values to humans.

1137	   Generic encoders provide an application interface that allows the
1138	   application to specify any well-formed value, including simple values
1139	   and tags unknown to the encoder.

1141	4.3.  Syntax Errors

1143	   A decoder encountering a CBOR data item that is not well-formed
1144	   generally can choose to completely fail the decoding (issue an error
1145	   and/or stop processing altogether), substitute the problematic data
1146	   and data items using a decoder-specific convention that clearly
1147	   indicates there has been a problem, or take some other action.

1149	4.3.1.  Incomplete CBOR Data Items

1151	   The representation of a CBOR data item has a specific length,
1152	   determined by its initial bytes and by the structure of any data
1153	   items enclosed in the data items.  If less data is available, this
1154	   can be treated as a syntax error.  A decoder may also implement
1155	   incremental parsing, that is, decode the data item as far as it is
1156	   available and present the data found so far (such as in an event-
1157	   based interface), with the option of continuing the decoding once
1158	   further data is available.

1160	   Examples of incomplete data items include:

1162	   o  A decoder expects a certain number of array or map entries but
1163	      instead encounters the end of the data.

1165	   o  A decoder processes what it expects to be the last pair in a map
1166	      and comes to the end of the data.

1168	   o  A decoder has just seen a tag and then encounters the end of the
1169	      data.

1171	   o  A decoder has seen the beginning of an indefinite-length item but
1172	      encounters the end of the data before it sees the "break" stop
1173	      code.

1175	4.3.2.  Malformed Indefinite-Length Items

1177	   Examples of malformed indefinite-length data items include:

1179	   o  Within an indefinite-length byte string or text string, a decoder
1180	      finds an item that is not of the appropriate major type before it
1181	      finds the "break" stop code.

1183	   o  Within an indefinite-length map, a decoder encounters the "break"
1184	      stop code immediately after reading a key (the value is missing).

1186	   Another error is finding a "break" stop code at a point in the data
1187	   where there is no immediately enclosing (unclosed) indefinite-length
1188	   item.

1190	4.3.3.  Unknown Additional Information Values

1192	   At the time of writing, some additional information values are
1193	   unassigned and reserved for future versions of this document (see
1194	   Section 6.2).  Since the overall syntax for these additional
1195	   information values is not yet defined, a decoder that sees an
1196	   additional information value that it does not understand cannot
1197	   continue parsing.
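
   The three classes of syntax error in Sections 4.3.1 to 4.3.3 can be
   told apart by a small well-formedness scanner.  The following Python
   sketch is illustrative only; the names Incomplete, Malformed,
   read_head, and item_end are assumptions made for this example and do
   not come from this specification or from any CBOR library.

```python
class Incomplete(Exception):
    """The input ended before the data item was complete (Section 4.3.1)."""


class Malformed(Exception):
    """The input cannot be well-formed CBOR (Sections 4.3.2 and 4.3.3)."""


def read_head(data, pos):
    """Return (major type, additional information, argument, next position)."""
    if pos >= len(data):
        raise Incomplete("missing initial byte")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 argument bytes
        if pos + size > len(data):
            raise Incomplete("truncated argument")
        argument = int.from_bytes(data[pos:pos + size], "big")
        return major, info, argument, pos + size
    if info == 31:
        return major, info, None, pos
    # Values 28 to 30 are unassigned, so parsing cannot continue.
    raise Malformed("reserved additional information value %d" % info)


def item_end(data, pos=0):
    """Return the position just past the data item that starts at pos."""
    major, info, arg, pos = read_head(data, pos)
    if info == 31:
        if major not in (2, 3, 4, 5):
            # Covers a "break" with no enclosing indefinite-length item.
            raise Malformed("unexpected additional information 31")
        entries = 0
        while True:
            if pos >= len(data):
                raise Incomplete("missing break stop code")
            if data[pos] == 0xFF:
                if major == 5 and entries % 2:
                    raise Malformed("break after a map key; value missing")
                return pos + 1
            if major in (2, 3):
                chunk_major, chunk_info, length, pos = read_head(data, pos)
                if chunk_major != major or chunk_info == 31:
                    raise Malformed("chunk has the wrong type")
                if pos + length > len(data):
                    raise Incomplete("truncated chunk")
                pos += length
            else:
                pos = item_end(data, pos)
            entries += 1
    if major in (2, 3):
        if pos + arg > len(data):
            raise Incomplete("truncated string")
        return pos + arg
    # Arrays enclose arg items, maps 2 * arg items, tags exactly one item.
    nested = {4: arg, 5: 2 * arg, 6: 1}.get(major, 0)
    for _ in range(nested):
        pos = item_end(data, pos)
    return pos
```

   A caller that requires a single well-formed item would additionally
   compare the returned position with len(data) to detect extraneous
   data.  An incremental parser could catch Incomplete and retry once
   more bytes have arrived.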

1199	4.4.  Other Decoding Errors

1201	   A CBOR data item may be syntactically well-formed but present a
1202	   problem with interpreting the data encoded in it in the CBOR data
1203	   model.  Generally speaking, a decoder that finds a data item with
1204	   such a problem might issue a warning, might stop processing
1205	   altogether, might handle the error and make the problematic value
1206	   available to the application as such, or take some other type of
1207	   action.

1209	   Such problems might include:

1211	   Duplicate keys in a map:  Generic decoders (Section 4.2) make data
1212	      available to applications using the native CBOR data model.  That
1213	      data model includes maps (key-value mappings with unique keys),
1214	      not multimaps (key-value mappings where multiple entries can have
1215	      the same key).  Thus, a generic decoder that gets a CBOR map item
1216	      that has duplicate keys will decode to a map with only one
1217	      instance of that key, or it might stop processing altogether.  On
1218	      the other hand, a "streaming decoder" may not even be able to
1219	      notice (Section 4.7).

1221	   Inadmissible type on the value following a tag:  Tags (Section 3.4)
1222	      specify what type of data item is supposed to follow the tag; for
1223	      example, the tags for positive or negative bignums are supposed to
1224	      be put on byte strings.  A decoder that decodes the tagged data
1225	      item into a native representation (a native big integer in this
1226	      example) is expected to check the type of the data item being
1227	      tagged.  Even decoders that don't have such native representations
1228	      available in their environment may perform the check on those tags
1229	      known to them and react appropriately.

1231	   Invalid UTF-8 string:  A decoder might or might not want to verify
1232	      that the sequence of bytes in a UTF-8 string (major type 3) is
1233	      actually valid UTF-8 and react appropriately.

1235	4.5.  Handling Unknown Simple Values and Tags

1237	   A decoder that comes across a simple value (Section 3.3) that it does
1238	   not recognize, such as a value that was added to the IANA registry
1239	   after the decoder was deployed or a value that the decoder chose not
1240	   to implement, might issue a warning, might stop processing
1241	   altogether, might handle the error by making the unknown value
1242	   available to the application as such (as is expected of generic
1243	   decoders), or take some other type of action.

1245	   A decoder that comes across a tag (Section 3.4) that it does not
1246	   recognize, such as a tag that was added to the IANA registry after
1247	   the decoder was deployed or a tag that the decoder chose not to
1248	   implement, might issue a warning, might stop processing altogether,
1249	   might handle the error and present the unknown tag value together
1250	   with the contained data item to the application (as is expected of
1251	   generic decoders), might ignore the tag and simply present the
1252	   contained data item only to the application, or take some other type
1253	   of action.

1255	4.6.  Numbers

1257	   An application or protocol that uses CBOR might restrict the
1258	   representations of numbers.  For instance, a protocol that only deals
1259	   with integers might say that floating-point numbers may not be used
1260	   and that decoders of that protocol do not need to be able to handle
1261	   floating-point numbers.  Similarly, a protocol or application that
1262	   uses CBOR might say that decoders need to be able to handle either
1263	   type of number.

1265	   CBOR-based protocols should take into account that different language
1266	   environments pose different restrictions on the range and precision
1267	   of numbers that are representable.  For example, the JavaScript
1268	   number system treats all numbers as floating point, which may result
1269	   in silent loss of precision in decoding integers with more than 53
1270	   significant bits.  A protocol that uses numbers should define its
1271	   expectations on the handling of non-trivial numbers in decoders and
1272	   receiving applications.

1274	   A CBOR-based protocol that includes floating-point numbers can
1275	   restrict which of the three formats (half-precision, single-
1276	   precision, and double-precision) are to be supported.  For an
1277	   integer-only application, a protocol may want to completely exclude
1278	   the use of floating-point values.

1280	   A CBOR-based protocol designed for compactness may want to exclude
1281	   specific integer encodings that are longer than necessary for the
1282	   application, such as to save the need to implement 64-bit integers.

1284	   There is an expectation that encoders will use the most compact
1285	   integer representation that can represent a given value.  However, a
1286	   compact application should accept values that use a longer-than-
1287	   needed encoding (such as encoding "0" as 0b000_11001 followed by two
1288	   bytes of 0x00) as long as the application can decode an integer of
1289	   the given size.

1291	   The preferred encoding for a floating point value is the shortest
1292	   floating point encoding that preserves its value, e.g., 0xf94580 for
1293	   the number 5.5, and 0xfa45ad9c00 for the number 5555.5, unless the
1294	   CBOR-based protocol specifically excludes the use of the shorter
1295	   floating point encodings.  For NaN values, a shorter encoding is
1296	   preferred if zero-padding the shorter significand towards the right
1297	   reconstitutes the original NaN value (for many applications, the
1298	   single NaN encoding 0xf97e00 will suffice).
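
   As an illustration of the preferred floating-point encoding, the
   following Python sketch tries the half-, single-, and double-
   precision formats in turn and keeps the first one that round-trips.
   The function name encode_float_preferred is an assumption made for
   this example.  The sketch also assumes that the application does not
   need NaN payloads, so every NaN is emitted as 0xf97e00.

```python
import math
import struct


def encode_float_preferred(value):
    """Return the shortest CBOR float encoding that preserves value."""
    if math.isnan(value):
        # Assumption: NaN payloads are not significant to the application.
        return bytes.fromhex("f97e00")
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this format
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)
```

   With this sketch, 5.5 encodes as 0xf94580 and 5555.5 encodes as
   0xfa45ad9c00, matching the examples above.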

1300	4.7.  Specifying Keys for Maps

1302	   The encoding and decoding applications need to agree on what types of
1303	   keys are going to be used in maps.  In applications that need to
1304	   interwork with JSON-based applications, keys probably should be
1305	   limited to UTF-8 strings only; otherwise, there has to be a specified
1306	   mapping from the other CBOR types to Unicode characters, and this
1307	   often leads to implementation errors.  In applications where keys are
1308	   numeric in nature and numeric ordering of keys is important to the
1309	   application, directly using the numbers for the keys is useful.

1311	   If multiple types of keys are to be used, consideration should be
1312	   given to how these types would be represented in the specific
1313	   programming environments that are to be used.  For example, in
1314	   JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
1315	   from a key of floating point 1.0.  This means that, if integer keys
1316	   are used, the protocol needs to avoid use of floating-point keys the
1317	   values of which happen to be integer numbers in the same map.

1319	   Decoders that deliver data items nested within a CBOR data item
1320	   immediately on decoding them ("streaming decoders") often do not keep
1321	   the state that is necessary to ascertain uniqueness of a key in a
1322	   map.  Similarly, an encoder that can start encoding data items before
1323	   the enclosing data item is completely available ("streaming encoder")
1324	   may want to reduce its overhead significantly by relying on its data
1325	   source to maintain uniqueness.

1327	   A CBOR-based protocol should make an intentional decision about what
1328	   to do when a receiving application does see multiple identical keys
1329	   in a map.  The resulting rule in the protocol should respect the CBOR
1330	   data model: it cannot prescribe a specific handling of the entries
1331	   with the identical keys, except that it might have a rule that having
1332	   identical keys in a map indicates a malformed map and that the
1333	   decoder has to stop with an error.  Duplicate keys are also
1334	   prohibited by CBOR decoders that are using strict mode
1335	   (Section 4.11).

1337	   The CBOR data model for maps does not allow ascribing semantics to
1338	   the order of the key/value pairs in the map representation.  Thus, a
1339	   CBOR-based protocol MUST NOT specify that changing the key/value pair
1340	   order in a map would change the semantics, except to specify that
1341	   some orders, e.g., non-canonical ones, are disallowed.  Timing, cache
1342	   usage, and other side channels are not considered part of the
1343	   semantics.

1345	   Applications for constrained devices that have maps with 24 or fewer
1346	   frequently used keys should consider using small integers (and those
1347	   with up to 48 frequently used keys should consider also using small
1348	   negative integers) because the keys can then be encoded in a single
1349	   byte.

1351	4.7.1.  Equivalence of Keys

1353	   The specific data model applying to a CBOR data item is used to
1354	   determine whether keys occurring in maps are duplicates or distinct.

1356	   At the generic data model level, numerically equivalent integer and
1357	   floating point values are distinct from each other, as they are from
1358	   the various big numbers (Tags 2 to 5).  Similarly, text strings are
1359	   distinct from byte strings, even if composed of the same bytes.  A
1360	   tagged value is distinct from an untagged value or from a value
1361	   tagged with a different tag.

1363	   Within each of these groups, numeric values are distinct unless they
1364	   are numerically equal (specifically, -0.0 is equal to 0.0); for the
1365	   purpose of map key equivalence, NaN (not a number) values are
1366	   equivalent if they have the same significand after zero-extending
1367	   both significands at the right to 64 bits.

1369	   (Byte and text) strings are compared byte by byte, arrays element by
1370	   element, and are equal if they have the same number of bytes/elements
1371	   and the same values at the same positions.  Two maps are equal if
1372	   they have the same set of pairs regardless of their order; pairs are
1373	   equal if both the key and value are equal.

1375	   Tagged values are equal if both the tag and the value are equal.
1376	   Simple values are equal if they simply have the same value.  Nothing
1377	   else is equal in the generic data model; a simple value 2 is not
1378	   equivalent to an integer 2, and an array is never equivalent to a map.

1380	   As discussed in Section 2.2, specific data models can make values
1381	   equivalent for the purpose of comparing map keys that are distinct in
1382	   the generic data model.  Note that this implies that a generic
1383	   decoder may deliver a decoded map to an application that needs to be
1384	   checked for duplicate map keys by that application (alternatively,
1385	   the decoder may provide a programming interface to perform this
1386	   service for the application).  Specific data models cannot
1387	   distinguish values for map keys that are equal for this purpose at
1388	   the generic data model level.
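
   The equivalence rules of the generic data model can be sketched in
   Python by mapping each decoded key to a hashable "identity" that is
   equal exactly when two keys are equivalent.  Everything named here
   (Tagged, key_identity, find_duplicate_key) is an assumption made for
   this example.  The sketch assumes a decoder that delivers integers as
   int, floating-point values as float, byte strings as bytes, text
   strings as str, and arrays as list.  It does not handle maps used as
   keys or simple values other than false, true, and null, and it does
   not normalize leading zero bytes in bignums.

```python
import math
import struct
from collections import namedtuple

# Hypothetical container for a tagged data item; not from any CBOR library.
Tagged = namedtuple("Tagged", ["tag", "value"])


def key_identity(key):
    """Return a hashable value that is equal for equivalent map keys."""
    if isinstance(key, bool) or key is None:
        return ("simple", key)  # checked before int: bool is an int subclass
    if isinstance(key, int):
        return ("integer", key)
    if isinstance(key, float):
        if math.isnan(key):
            # Compare NaNs by the 52 significand bits of the double.
            bits = struct.unpack(">Q", struct.pack(">d", key))[0]
            return ("nan", bits & ((1 << 52) - 1))
        return ("float", key)  # -0.0 == 0.0, so both give one identity
    if isinstance(key, bytes):
        return ("bytes", key)
    if isinstance(key, str):
        return ("text", key)
    if isinstance(key, list):
        return ("array", tuple(key_identity(element) for element in key))
    if isinstance(key, Tagged):
        return ("tag", key.tag, key_identity(key.value))
    raise TypeError("key type not covered by this sketch")


def find_duplicate_key(pairs):
    """Return the first key that repeats an earlier one, or None."""
    seen = set()
    for key, _value in pairs:
        identity = key_identity(key)
        if identity in seen:
            return key
        seen.add(identity)
    return None
```

   A decoder in strict mode (Section 4.11) could raise an error whenever
   find_duplicate_key returns a key.  Note that Python's own dict would
   treat 1 and 1.0 as the same key, which is why the sketch works on a
   list of key/value pairs instead.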

1390	4.8.  Undefined Values

1392	   In some CBOR-based protocols, the simple value (Section 3.3) of
1393	   Undefined might be used by an encoder as a substitute for a data item
1394	   with an encoding problem, in order to allow the rest of the enclosing
1395	   data items to be encoded without harm.

1397	4.9.  Preferred Serialization

1399	   For some values at the data model level, CBOR provides multiple
1400	   serializations.  For many applications, it is desirable that an
1401	   encoder always chooses a preferred serialization; however, the
1402	   present specification does not put the burden of enforcing this
1403	   preference on either encoder or decoder.

1405	   Some constrained decoders may be limited in their ability to decode
1406	   non-preferred serializations: For example, if only integers below
1407	   1_000_000_000 are expected in an application, the decoder may leave
1408	   out the code that would be needed to decode 64-bit arguments in
1409	   integers.  An encoder that always uses preferred serialization
1410	   ("preferred encoder") interoperates with this decoder for the numbers
1411	   that can occur in this application.  More generally speaking, it
1412	   therefore can be said that a preferred encoder is more universally
1413	   interoperable (and also less wasteful) than one that, say, always
1414	   uses 64-bit integers.

1416	   Similarly, a constrained encoder may be limited in the variety of
1417	   representation variants it supports in such a way that it does not
1418	   emit preferred serializations ("variant encoder"): Say, it could be
1419	   designed to always use the 32-bit variant for an integer that it
1420	   encodes even if a short representation is available (again, assuming
1421	   that there is no application need for integers that can only be
1422	   represented with the 64-bit variant).  A decoder that does not rely
1423	   on only ever receiving preferred serializations ("variation-tolerant
1424	   decoder") can therefore be said to be more universally interoperable
1425	   (it might very well optimize for the case of receiving preferred
1426	   serializations, though).  Full implementations of CBOR decoders are
1427	   by definition variation-tolerant; the distinction is only relevant if
1428	   a constrained implementation of a CBOR decoder meets a variant
1429	   encoder.

1431	   The preferred serialization always uses the shortest form of
1432	   representing the argument (Section 3); it also uses the shortest
1433	   floating point encoding that preserves the value being encoded (see
1434	   Section 4.6).  Definite length encoding is preferred whenever the
1435	   length is known at the time the serialization of the item starts.

1437	4.10.  Canonical CBOR

1439	   Some protocols may want encoders to only emit CBOR in a particular
1440	   canonical format; those protocols might also have the decoders check
1441	   that their input is canonical.  Those protocols are free to define
1442	   what they mean by a canonical format and what encoders and decoders
1443	   are expected to do.  This section defines a set of restrictions that
1444	   can serve as the base of such a canonical format.

1446	   A CBOR encoding satisfies the "core canonicalization requirements" if
1447	   it satisfies the following restrictions:

1449	   o  Arguments (see Section 3) for integers, lengths in major types 2
1450	      through 5, and tags MUST be as short as possible.  In particular:

1452	      *  0 to 23 and -1 to -24 MUST be expressed in the same byte as the
1453	         major type;

1455	      *  24 to 255 and -25 to -256 MUST be expressed only with an
1456	         additional uint8_t;

1458	      *  256 to 65535 and -257 to -65536 MUST be expressed only with an
1459	         additional uint16_t;

1461	      *  65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
1462	         only with an additional uint32_t.

1464	   o  The keys in every map MUST be sorted in the bytewise lexicographic
1465	      order of their canonical encodings.  For example, the following
1466	      keys are sorted correctly:

1468	      1.  10, encoded as 0x0a.

1470	      2.  100, encoded as 0x1864.

1472	      3.  -1, encoded as 0x20.

1474	      4.  "z", encoded as 0x617a.

1476	      5.  "aa", encoded as 0x626161.

1478	      6.  [100], encoded as 0x811864.

1480	      7.  [-1], encoded as 0x8120.

1482	      8.  false, encoded as 0xf4.

1484	   o  Indefinite-length items MUST NOT appear.  They can be encoded as
1485	      definite-length items instead.
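
   The first two restrictions can be sketched in Python as follows.  The
   function names are assumptions made for this example.  The sketch
   relies on the fact that Python compares bytes objects in bytewise
   lexicographic order, so sorting the encoded keys directly yields the
   required map key order.

```python
def encode_head(major, argument):
    """Encode a major type and argument in the shortest possible form."""
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument does not fit in 64 bits")
    if argument < 24:
        # The argument travels in the same byte as the major type.
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")


def encode_int(n):
    """Encode an integer as major type 0 (n >= 0) or 1 (n < 0)."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_text(s):
    """Encode a text string (major type 3) with a shortest-form length."""
    utf8 = s.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8


def sort_encoded_keys(encoded_keys):
    """Sort already-encoded keys in bytewise lexicographic order."""
    return sorted(encoded_keys)
```

   For instance, sorting the encodings of "aa", -1, 100, "z", and 10
   with sort_encoded_keys gives 0x0a, 0x1864, 0x20, 0x617a, 0x626161, in
   agreement with the example list above.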

1487	   If a protocol allows for IEEE floats, then additional
1488	   canonicalization rules might need to be added.  One example rule
1489	   might be to have all floats start as a 64-bit float, then do a test
1490	   conversion to a 32-bit float; if the result is the same numeric
1491	   value, use the shorter value and repeat the process with a test
1492	   conversion to a 16-bit float.  (This rule selects 16-bit float for
1493	   positive and negative Infinity as well.)  Also, there are many
1494	   representations for NaN.  If NaN is an allowed value, it must always
1495	   be represented as 0xf97e00.

1497	   CBOR tags present additional considerations for canonicalization.
1498	   The absence or presence of tags in a canonical format is determined
1499	   by the optionality of the tags in the protocol.  In a CBOR-based
1500	   protocol that allows optional tagging anywhere, the canonical format
1501	   must not allow them.  In a protocol that requires tags in certain
1502	   places, the tag needs to appear in the canonical format.  A CBOR-
1503	   based protocol that uses canonicalization might instead say that all
1504	   tags that appear in a message must be retained regardless of whether
1505	   they are optional.

1507	   Protocols that include floating, big integer, or other complex values
1508	   need to define extra requirements on their canonical encodings.  For
1509	   example:

1511	   o  If a protocol includes a field that can express floating values
1512	      (Section 3.3), the protocol's canonicalization needs to specify
1513	      whether the integral value 1.0 is encoded as 0x01, 0xf93c00,
1514	      0xfa3f800000, or 0xfb3ff0000000000000.  Three sensible rules for
1515	      this are:

1517	      1.  Encode integral values that fit in 64 bits as values from
1518	          major types 0 and 1, and other values as the smallest of 16-,
1519	          32-, or 64-bit floating point that accurately represents the
1520	          value,

1522	      2.  Encode all values as the smallest of 16-, 32-, or 64-bit
1523	          floating point that accurately represents the value, even for
1524	          integral values, or

1526	      3.  Encode all values as 64-bit floating point.

1528	      If NaN is an allowed value, the protocol needs to pick a single
1529	      representation, for example 0xf97e00.

1531	   o  If a protocol includes a field that can express integers larger
1532	      than 2^64 using tag 2 (Section 3.4.4), the protocol's
1533	      canonicalization needs to specify whether small integers are
1534	      expressed using the tag or major types 0 and 1.

1536	   o  A protocol might give encoders the choice of representing a URL as
1537	      either a text string or, using Section 3.4.6.3, tag 32 containing
1538	      a text string.  This protocol's canonicalization needs to either
1539	      require that the tag is present or require that it's absent, not
1540	      allow either one.

1542	4.10.1.  Length-first map key ordering

1544	   The core canonicalization requirements sort map keys in a different
1545	   order from the one suggested by [RFC7049].  Protocols that need to be
1546	   compatible with [RFC7049]'s order can instead be specified in terms
1547	   of this specification's "length-first core canonicalization
1548	   requirements":

1550	   A CBOR encoding satisfies the "length-first core canonicalization
1551	   requirements" if it satisfies the core canonicalization requirements
1552	   except that the keys in every map MUST be sorted such that:

1554	   1.  If two keys have different lengths, the shorter one sorts
1555	       earlier;

1557	   2.  If two keys have the same length, the one with the lower value in
1558	       (byte-wise) lexical order sorts earlier.

1560	   For example, under the length-first core canonicalization
1561	   requirements, the following keys are sorted correctly:

1563	   1.  10, encoded as 0x0a.

1565	   2.  -1, encoded as 0x20.

1567	   3.  false, encoded as 0xf4.

1569	   4.  100, encoded as 0x1864.

1571	   5.  "z", encoded as 0x617a.

1573	   6.  [-1], encoded as 0x8120.

1575	   7.  "aa", encoded as 0x626161.

1577	   8.  [100], encoded as 0x811864.

1579	4.11.  Strict Mode

1581	   Some areas of application of CBOR do not require canonicalization
1582	   (Section 4.10) but may require that different decoders reach the same
1583	   (semantically equivalent) results, even in the presence of
1584	   potentially malicious data.  This can be required if one application
1585	   (such as a firewall or other protecting entity) makes a decision
1586	   based on the data that another application, which independently
1587	   decodes the data, relies on.

1589	   Normally, it is the responsibility of the sender to avoid ambiguously
1590	   decodable data.  However, the sender might be an attacker specially
1591	   making up CBOR data such that it will be interpreted differently by
1592	   different decoders in an attempt to exploit that as a vulnerability.
1593	   Generic decoders used in applications where this might be a problem
1594	   need to support a strict mode in which it is also the responsibility
1595	   of the receiver to reject ambiguously decodable data.  It is expected
1596	   that firewalls and other security systems that decode CBOR will only
1597	   decode in strict mode.

1599	   A decoder in strict mode will reliably reject any data that could be
1600	   interpreted by other decoders in different ways.  It will reliably
1601	   reject data items with syntax errors (Section 4.3).  It will also
1602	   expend the effort to reliably detect other decoding errors
1603	   (Section 4.4).  In particular, a strict decoder needs to have an API
1604	   that reports an error (and does not return data) for a CBOR data item
1605	   that contains any of the following:

1607	   o  a map (major type 5) that has more than one entry with the same
1608	      key

1610	   o  a tag that is used on a data item of the incorrect type

1612	   o  a data item that is incorrectly formatted for the type given to
1613	      it, such as invalid UTF-8 or data that cannot be interpreted with
1614	      the specific tag that it has been tagged with

1616	   A decoder in strict mode can do one of two things when it encounters
1617	   a tag or simple value that it does not recognize:

1619	   o  It can report an error (and not return data).

1621	   o  It can emit the unknown item (type, value, and, for tags, the
1622	      decoded tagged data item) to the application calling the decoder
1623	      with an indication that the decoder did not recognize that tag or
1624	      simple value.

1626	   The latter approach, which is also appropriate for non-strict
1627	   decoders, supports forward compatibility with newly registered tags
1628	   and simple values without the requirement to update the decoder at
1629	   the same time as the calling application.  (For this, the API for the
1630	   decoder needs to have a way to mark unknown items so that the calling
1631	   application can handle them in a manner appropriate for the program.)

1633	   Since some of this processing may have an appreciable cost (in
1634	   particular with duplicate detection for maps), support of strict mode
1635	   is not a requirement placed on all CBOR decoders.

1637	   Some encoders will rely on their applications to provide input data
1638	   in such a way that unambiguously decodable CBOR results.  A generic
1639	   encoder also may want to provide a strict mode where it reliably
1640	   limits its output to unambiguously decodable CBOR, independent of
1641	   whether or not its application is providing API-conformant data.

1643	5.  Converting Data between CBOR and JSON

1645	   This section gives non-normative advice about converting between CBOR
1646	   and JSON.  Implementations of converters are free to use whichever
1647	   advice here they want.

1649	   It is worth noting that a JSON text is a sequence of characters, not
1650	   an encoded sequence of bytes, while a CBOR data item consists of
1651	   bytes, not characters.

1653	5.1.  Converting from CBOR to JSON

1655	   Most of the types in CBOR have direct analogs in JSON.  However, some
1656	   do not, and someone implementing a CBOR-to-JSON converter has to
1657	   consider what to do in those cases.  The following non-normative
1658	   advice deals with these by converting them to a single substitute
1659	   value, such as a JSON null.

1661	   o  An integer (major type 0 or 1) becomes a JSON number.

1663	   o  A byte string (major type 2) that is not embedded in a tag that
1664	      specifies a proposed encoding is encoded in base64url without
1665	      padding and becomes a JSON string.

1667	   o  A UTF-8 string (major type 3) becomes a JSON string.  Note that
1668	      JSON requires escaping certain characters ([RFC8259], Section 7):
1669	      quotation mark (U+0022), reverse solidus (U+005C), and the "C0
1670	      control characters" (U+0000 through U+001F).  All other characters
1671	      are copied unchanged into the JSON UTF-8 string.

1673	   o  An array (major type 4) becomes a JSON array.

1675	   o  A map (major type 5) becomes a JSON object.  This is possible
1676	      directly only if all keys are UTF-8 strings.  A converter might
1677	      also convert other keys into UTF-8 strings (such as by converting
1678	      integers into strings containing their decimal representation);
1679	      however, doing so introduces a danger of key collision.

1681	   o  False (major type 7, additional information 20) becomes a JSON
1682	      false.

1684	   o  True (major type 7, additional information 21) becomes a JSON
1685	      true.

1687	   o  Null (major type 7, additional information 22) becomes a JSON
1688	      null.

1690	   o  A floating-point value (major type 7, additional information 25
1691	      through 27) becomes a JSON number if it is finite (that is, it can
1692	      be represented in a JSON number); if the value is non-finite (NaN,
1693	      or positive or negative Infinity), it is represented by the
1694	      substitute value.

1696	   o  Any other simple value (major type 7, any additional information
1697	      value not yet discussed) is represented by the substitute value.

1699	   o  A bignum (major type 6, tag value 2 or 3) is represented by
1700	      encoding its byte string in base64url without padding and becomes
1701	      a JSON string.  For tag value 3 (negative bignum), a "~" (ASCII
1702	      tilde) is inserted before the base-encoded value.  (The conversion
1703	      to a binary blob instead of a number is to prevent a likely
1704	      numeric overflow for the JSON decoder.)

1706	   o  A byte string with an encoding hint (major type 6, tag value 21
1707	      through 23) is encoded as described and becomes a JSON string.

1709	   o  For all other tags (major type 6, any other tag value), the
1710	      embedded CBOR item is represented as a JSON value; the tag value
1711	      is ignored.

1713	   o  Indefinite-length items are made definite before conversion.
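
   The advice above can be sketched in Python for data that a generic
   decoder has already turned into native values.  The names Tagged,
   b64url, and to_json_value are assumptions made for this example, as
   is the mapping of CBOR types to int, float, bytes, str, list, dict,
   True, False, and None.  The sketch does not implement the encoding
   hints of tags 21 through 23, and it rejects maps with keys that are
   not text strings rather than converting those keys.

```python
import base64
import math
from collections import namedtuple

# Hypothetical container for a tagged data item; not from any CBOR library.
Tagged = namedtuple("Tagged", ["tag", "value"])


def b64url(data):
    """Encode bytes in base64url without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def to_json_value(item, substitute=None):
    """Convert a decoded CBOR item to a value that json.dumps accepts."""
    if isinstance(item, bool) or item is None:
        return item
    if isinstance(item, int) or isinstance(item, str):
        return item
    if isinstance(item, float):
        return item if math.isfinite(item) else substitute
    if isinstance(item, bytes):
        return b64url(item)
    if isinstance(item, list):
        return [to_json_value(element, substitute) for element in item]
    if isinstance(item, dict):
        if not all(isinstance(key, str) for key in item):
            raise TypeError("only text string keys convert directly")
        return {key: to_json_value(value, substitute)
                for key, value in item.items()}
    if isinstance(item, Tagged):
        if item.tag == 2:
            return b64url(item.value)
        if item.tag == 3:
            return "~" + b64url(item.value)  # negative bignum
        if item.tag in (21, 22, 23):
            raise NotImplementedError("encoding hints are not sketched")
        return to_json_value(item.value, substitute)  # tag is ignored
    return substitute  # any other simple value, such as undefined
```

   The result can be passed to json.dumps from the Python standard
   library, which takes care of the string escaping that JSON requires.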

1715	5.2.  Converting from JSON to CBOR

1717	   All JSON values, once decoded, directly map into one or more CBOR
1718	   values.  As with any kind of CBOR generation, decisions have to be
1719	   made with respect to number representation.  In a suggested
1720	   conversion:

1722	   o  JSON numbers without fractional parts (integer numbers) are
1723	      represented as integers (major types 0 and 1, possibly major type
1724	      6 tag value 2 and 3), choosing the shortest form; integers longer
1725	      than an implementation-defined threshold (which is usually either
1726	      32 or 64 bits) may instead be represented as floating-point
1727	      values.  (If the JSON was generated from a JavaScript
1728	      implementation, its precision is already limited to 53 bits
1729	      maximum.)

1731	   o  Numbers with fractional parts are represented as floating-point
1732	      values.  Preferably, the shortest exact floating-point
1733	      representation is used; for instance, 1.5 is represented in a
1734	      16-bit floating-point value (not all implementations will be
1735	      capable of efficiently finding the minimum form, though).  There
1736	      may be an implementation-defined limit to the precision that will
1737	      affect the precision of the represented values.  Decimal
1738	      representation should only be used if that is specified in a
1739	      protocol.

1741	   CBOR has been designed to generally provide a more compact encoding
1742	   than JSON.  One implementation strategy that might come to mind is to
1743	   perform a JSON-to-CBOR encoding in place in a single buffer.  This
1744	   strategy would need to carefully consider a number of pathological
1745	   cases, such as that some strings represented with no or very few
1746	   escapes and longer (or much longer) than 255 bytes may expand when
1747	   encoded as UTF-8 strings in CBOR.  Similarly, a few of the binary
1748	   floating-point representations might cause expansion from some short
1749	   decimal representations (1.1, 1e9) in JSON.  This may be hard to get
1750	   right, and any ensuing vulnerabilities may be exploited by an
1751	   attacker.

1753	6.  Future Evolution of CBOR

1755	   Successful protocols evolve over time.  New ideas appear,
1756	   implementation platforms improve, related protocols are developed and
1757	   evolve, and new requirements from applications and protocols are
1758	   added.  Facilitating protocol evolution is therefore an important
1759	   design consideration for any protocol development.

1761	   For protocols that will use CBOR, CBOR provides some useful
1762	   mechanisms to facilitate their evolution.  Best practices for this
1763	   are well known, particularly from the development of JSON-based
1764	   protocols.  Therefore, such best practices are outside the
1765	   scope of this specification.

1767	   However, facilitating the evolution of CBOR itself is very well
1768	   within its scope.  CBOR is designed both to provide a stable basis
1769	   for development of CBOR-based protocols and to be able to evolve.
1770	   Since a successful protocol may live for decades, CBOR needs to be
1771	   designed for decades of use and evolution.  This section provides
1772	   some guidance for the evolution of CBOR.  It is necessarily more
1773	   subjective than other parts of this document.  It is also necessarily
1774	   incomplete, lest it turn into a textbook on protocol development.

1776	6.1.  Extension Points

1778	   In a protocol design, opportunities for evolution are often included
1779	   in the form of extension points.  For example, there may be a
1780	   codepoint space that is not fully allocated from the outset, and the
1781	   protocol is designed to tolerate and embrace implementations that
1782	   start using more codepoints than initially allocated.

1784	   Sizing the codepoint space may be difficult because the range
1785	   required may be hard to predict.  An attempt should be made to make
1786	   the codepoint space large enough so that it can slowly be filled over
1787	   the intended lifetime of the protocol.

1789	   CBOR has three major extension points:

1791	   o  the "simple" space (values in major type 7).  Of the 24 efficient
1792	      (and 224 slightly less efficient) values, only a small number have
1793	      been allocated.  Implementations receiving an unknown simple data
1794	      item may be able to process it as such, given that the structure
1795	      of the value is indeed simple.  The IANA registry in Section 8.1
1796	      is the appropriate way to address the extensibility of this
1797	      codepoint space.

1799	   o  the "tag" space (values in major type 6).  Again, only a small
1800	      part of the codepoint space has been allocated, and the space is
1801	      abundant (although the early numbers are more efficient than the
1802	      later ones).  Implementations receiving an unknown tag can choose
1803	      to simply ignore it or to process it as an unknown tag wrapping
1804	      the following data item.  The IANA registry in Section 8.2 is the
1805	      appropriate way to address the extensibility of this codepoint
1806	      space.

1808	   o  the "additional information" space.  An implementation receiving
1809	      an unknown additional information value has no way to continue
1810	      parsing, so allocating codepoints to this space is a major step.
1811	      There are also very few codepoints left.
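
   The difference between these extension points shows up when a
   decoder reads the head of a data item.  The Python sketch below is
   illustrative only; the function name read_head and its return shape
   are assumptions made for this example.  An unknown tag or simple
   value still yields a complete head, so a generic decoder can carry
   it along, whereas an unassigned additional information value gives
   no indication of how many bytes follow.

```python
def read_head(data: bytes, pos: int = 0):
    """Return (major_type, additional_info, argument, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        # The argument is carried in the initial byte itself.
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 bytes follow
        if pos + size > len(data):
            raise ValueError("truncated input")
        argument = int.from_bytes(data[pos:pos + size], "big")
        return major, info, argument, pos + size
    if info == 31:
        # Indefinite length (or the "break" stop code in major type 7).
        return major, info, None, pos
    # Values 28 to 30 are unassigned: the decoder cannot know how
    # much data to skip, so parsing has to stop here.
    raise ValueError("unassigned additional information value %d" % info)
```

   For example, read_head(bytes.fromhex("d74401020304")) reports major
   type 6 with tag number 23, after which the enclosed byte string can
   be decoded as usual even if tag 23 were unknown to the application.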

1813	6.2.  Curating the Additional Information Space

1815	   The human mind is sometimes drawn to filling in little perceived gaps
1816	   to make something neat.  We expect the remaining gaps in the
1817	   codepoint space for the additional information values to be an
1818	   attractor for new ideas, just because they are there.

1820	   The present specification does not manage the additional information
1821	   codepoint space by an IANA registry.  Instead, allocations out of
1822	   this space can only be done by updating this specification.

1824	   For an additional information value of n >= 24, the size of the
1825	   additional data typically is 2**(n-24) bytes.  Therefore, additional
1826	   information values 28 and 29 should be viewed as candidates for
1827	   128-bit and 256-bit quantities, in case a need arises to add them to
1828	   the protocol.  Additional information value 30 is then the only
1829	   additional information value available for general allocation, and
1830	   there should be a very good reason for allocating it before assigning
1831	   it through an update of this specification.
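
   The sizing rule above is simple arithmetic, as the following Python
   sketch shows.  The function name is hypothetical, and the entries
   for values 28 and 29 describe candidate uses only, not allocations.

```python
def additional_data_bits(additional_info: int) -> int:
    """Bits of additional data implied by the 2**(n-24) byte pattern."""
    if not 24 <= additional_info <= 29:
        raise ValueError("pattern is only considered here for 24..29")
    return 8 * 2 ** (additional_info - 24)


# Values 24..27 are defined; 28 and 29 would continue the pattern.
sizes = {n: additional_data_bits(n) for n in range(24, 30)}
```

   The resulting table maps 24, 25, 26, and 27 to 8, 16, 32, and 64
   bits, and continues with 128 and 256 bits for 28 and 29.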

1833	7.  Diagnostic Notation

1835	   CBOR is a binary interchange format.  To facilitate documentation and
1836	   debugging, and in particular to facilitate communication between
1837	   entities cooperating in debugging, this section defines a simple
1838	   human-readable diagnostic notation.  Actual interchange always
1839	   happens in the binary format.

1841	   Note that this truly is a diagnostic format; it is not meant to be
1842	   parsed.  Therefore, no formal definition (as in ABNF) is given in
1843	   this document.  (Implementers looking for a text-based format for
1844	   representing CBOR data items in configuration files may also want to
1845	   consider YAML [YAML].)

1847	   The diagnostic notation is loosely based on JSON as it is defined in
1848	   RFC 8259, extending it where needed.

1850	   The notation borrows the JSON syntax for numbers (integer and
1851	   floating point), True (>true<), False (>false<), Null (>null<), UTF-8
1852	   strings, arrays, and maps (maps are called objects in JSON; the
1853	   diagnostic notation extends JSON here by allowing any data item in
1854	   the key position).  Undefined is written >undefined< as in
1855	   JavaScript.  The non-finite floating-point numbers Infinity,
1856	   -Infinity, and NaN are written exactly as in this sentence (this is
1857	   also a way they can be written in JavaScript, although JSON does not
1858	   allow them).  A tagged item is written as an integer number for the
1859	   tag followed by the item in parentheses; for instance, an RFC 3339
1860	   (ISO 8601) date could be notated as:

1862	      0("2013-03-21T20:04:00Z")

1864	   or the equivalent relative time as

1866	      1(1363896240)
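
   That the two notations above denote the same instant can be
   confirmed with a short Python sketch using the standard "datetime"
   module.  The helper names diag_tag0 and diag_tag1 are hypothetical.

```python
from datetime import datetime, timezone

moment = datetime(2013, 3, 21, 20, 4, 0, tzinfo=timezone.utc)


def diag_tag0(when: datetime) -> str:
    """Diagnostic notation for tag 0 (date/time string) in UTC."""
    return '0("%s")' % when.strftime("%Y-%m-%dT%H:%M:%SZ")


def diag_tag1(when: datetime) -> str:
    """Diagnostic notation for tag 1 (seconds since the epoch)."""
    return "1(%d)" % int(when.timestamp())
```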

1868	   Byte strings are notated in one of the base encodings, without
1869	   padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
1870	   for base32, >h32< for base32hex, >b64< for base64 or base64url (the
1871	   actual encodings do not overlap, so the string remains unambiguous).
1872	   For example, the byte string 0x12345678 could be written h'12345678',
1873	   b32'CI2FM6A', or b64'EjRWeA'.
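
   A diagnostic printer might produce these forms as in the following
   Python sketch, which relies on the standard "base64" module and
   strips the padding characters.  The function name diag_bytes is
   hypothetical, and the sketch always emits classic base64 rather
   than base64url for the b64 prefix.

```python
import base64

# Translation from the base32 alphabet to the base32hex alphabet
# (both defined in RFC 4648).
_B32_TO_HEX = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    "0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def diag_bytes(data: bytes, prefix: str = "h") -> str:
    """Render a byte string in diagnostic notation without padding."""
    if prefix == "h":
        text = data.hex()
    elif prefix == "b32":
        text = base64.b32encode(data).decode("ascii")
    elif prefix == "h32":
        text = base64.b32encode(data).decode("ascii").translate(_B32_TO_HEX)
    elif prefix == "b64":
        text = base64.b64encode(data).decode("ascii")
    else:
        raise ValueError("unknown prefix: %r" % prefix)
    return "%s'%s'" % (prefix, text.rstrip("="))
```

   Applied to the four bytes 0x12345678, this sketch reproduces the
   three spellings given in the example above.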

1875	   Unassigned simple values are given as "simple()" with the appropriate
1876	   integer in the parentheses.  For example, "simple(42)" indicates
1877	   major type 7, value 42.

1879	7.1.  Encoding Indicators

1881	   Sometimes it is useful to indicate in the diagnostic notation which
1882	   of several alternative representations was actually used; for
1883	   example, a data item written >1.5< by a diagnostic decoder might have
1884	   been encoded as a half-, single-, or double-precision float.

1886	   By convention, an encoding indicator consists of an underscore
1887	   together with all immediately following characters that are
1888	   alphanumeric or underscore.  Encoding indicators can be ignored by
1889	   anyone not interested in this information, and they are always
1890	   optional.

1892	   A single underscore can be written after the opening brace of a map
1893	   or the opening bracket of an array to indicate that the data item was
1894	   represented in indefinite-length format.  For example, [_ 1, 2]
1895	   contains an indicator that an indefinite-length representation was
1896	   used to represent the data item [1, 2].

1898	   An underscore followed by a decimal digit n indicates that the
1899	   preceding item (or, for arrays and maps, the item starting with the
1900	   preceding bracket or brace) was encoded with an additional
1901	   information value of 24+n.  For example, 1.5_1 is a half-precision
1902	   floating-point number, while 1.5_3 is encoded as double precision.
1903	   This encoding indicator is not shown in Appendix A.  (Note that the
1904	   encoding indicator "_" is thus an abbreviation of the full form "_7",
1905	   which is not used.)
1906	   As a special case, byte and text strings of indefinite length can be
1907	   notated in the form (_ h'0123', h'4567') and (_ "foo", "bar").
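
   As an illustration of the 24+n rule for floating-point values, the
   following Python sketch decodes a single encoded float and appends
   the matching encoding indicator.  The function name is hypothetical,
   and the sketch covers finite values only (Python prints non-finite
   values differently from the diagnostic notation).

```python
import struct

# Initial byte -> struct format for half, single, and double precision.
_FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def diag_float_with_indicator(encoded: bytes) -> str:
    """Render one encoded finite float with its _n encoding indicator."""
    initial = encoded[0]
    fmt = _FLOAT_FORMATS.get(initial)
    if fmt is None:
        raise ValueError("not a floating-point data item")
    size = struct.calcsize(fmt)
    (value,) = struct.unpack(fmt, encoded[1:1 + size])
    n = (initial & 0x1F) - 24  # additional information is 24+n
    return "%r_%d" % (value, n)
```

   The same value 1.5 is then shown as 1.5_1, 1.5_2, or 1.5_3 depending
   on whether it arrived as 0xf93e00, 0xfa3fc00000, or
   0xfb3ff8000000000000.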

1909	8.  IANA Considerations

1911	   IANA has created two registries for new CBOR values.  The registries
1912	   are separate, that is, not under an umbrella registry, and follow the
1913	   rules in [RFC8126].  IANA has also assigned a new MIME media type and
1914	   an associated Constrained Application Protocol (CoAP) Content-Format
1915	   entry.

1917	8.1.  Simple Values Registry

1919	   IANA has created the "Concise Binary Object Representation (CBOR)
1920	   Simple Values" registry.  The initial values are shown in Table 2.

1922	   New entries in the range 0 to 19 are assigned by Standards Action.
1923	   It is suggested that these Standards Actions allocate values starting
1924	   with the number 16 in order to reserve the lower numbers for
1925	   contiguous blocks (if any).

1927	   New entries in the range 32 to 255 are assigned by Specification
1928	   Required.

1930	8.2.  Tags Registry

1932	   IANA has created the "Concise Binary Object Representation (CBOR)
1933	   Tags" registry.  The initial values are shown in Table 3.

1935	   New entries in the range 0 to 23 are assigned by Standards Action.
1936	   New entries in the range 24 to 255 are assigned by Specification
1937	   Required.  New entries in the range 256 to 18446744073709551615 are
1938	   assigned by First Come First Served.  The template for registration
1939	   requests is:

1941	   o  Data item

1943	   o  Semantics (short form)

1945	   In addition, First Come First Served requests should include:

1947	   o  Point of contact

1949	   o  Description of semantics (URL) - This description is optional; the
1950	      URL can point to something like an Internet-Draft or a web page.

1952	8.3.  Media Type ("MIME Type")

1954	   The Internet media type [RFC6838] for CBOR data is application/cbor.

1956	   Type name: application

1958	   Subtype name: cbor

1960	   Required parameters: n/a

1962	   Optional parameters: n/a

1964	   Encoding considerations:  binary

1966	   Security considerations:  See Section 9 of this document

1968	   Interoperability considerations: n/a

1970	   Published specification: This document

1972	   Applications that use this media type:  None yet, but it is expected
1973	      that this format will be deployed in protocols and applications.

1975	   Additional information:
1976	     Magic number(s): n/a
1977	     File extension(s): .cbor
1978	     Macintosh file type code(s): n/a

1980	   Person & email address to contact for further information:
1981	     Carsten Bormann
1982	     cabo@tzi.org

1984	   Intended usage: COMMON

1986	   Restrictions on usage: none

1988	   Author:
1989	     Carsten Bormann 

1991	   Change controller:
1992	     The IESG 

1994	8.4.  CoAP Content-Format

1996	   Media Type: application/cbor

1998	   Encoding: -
1999	   Id: 60

2001	   Reference: [RFCthis]

2003	8.5.  The +cbor Structured Syntax Suffix Registration

2005	   Name: Concise Binary Object Representation (CBOR)

2007	   +suffix: +cbor

2009	   References: [RFCthis]

2011	   Encoding Considerations: CBOR is a binary format.

2013	   Interoperability Considerations: n/a

2015	   Fragment Identifier Considerations:
2016	     The syntax and semantics of fragment identifiers specified for
2017	     +cbor SHOULD be as specified for "application/cbor".  (At
2018	     publication of this document, there is no fragment identification
2019	     syntax defined for "application/cbor".)

2021	     The syntax and semantics for fragment identifiers for a specific
2022	     "xxx/yyy+cbor" SHOULD be processed as follows:

2024	     For cases defined in +cbor, where the fragment identifier resolves
2025	     per the +cbor rules, then process as specified in +cbor.

2027	     For cases defined in +cbor, where the fragment identifier does
2028	     not resolve per the +cbor rules, then process as specified in
2029	     "xxx/yyy+cbor".

2031	     For cases not defined in +cbor, then process as specified in
2032	     "xxx/yyy+cbor".

2034	   Security Considerations:  See Section 9 of this document

2036	   Contact:
2037	     Apps Area Working Group (apps-discuss@ietf.org)

2039	   Author/Change Controller:
2040	     The Apps Area Working Group.
2041	     The IESG has change control over this registration.

2043	9.  Security Considerations

2045	   A network-facing application can exhibit vulnerabilities in its
2046	   processing logic for incoming data.  Complex parsers are well known
2047	   as a likely source of such vulnerabilities, such as the ability to
2048	   remotely crash a node, or even remotely execute arbitrary code on it.
2049	   CBOR attempts to narrow the opportunities for introducing such
2050	   vulnerabilities by reducing parser complexity and by giving the entire
2051	   range of encodable values a meaning where possible.

2053	   Resource exhaustion attacks might attempt to lure a decoder into
2054	   allocating very big data items (strings, arrays, maps) or exhaust the
2055	   stack depth by setting up deeply nested items.  Decoders need to have
2056	   appropriate resource management to mitigate these attacks.  (Items
2057	   for which very large sizes are given can also attempt to exploit
2058	   integer overflow vulnerabilities.)
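
   The following Python sketch indicates the kind of checks a decoder
   might apply before committing resources.  It only walks over
   definite-length items without building any values; the name
   skip_item and the limit MAX_DEPTH are assumptions chosen for this
   example, and a real decoder would need further limits suited to its
   environment.  (Python integers do not overflow, so the overflow
   concern applies to implementations using fixed-width arithmetic.)

```python
MAX_DEPTH = 32  # hypothetical nesting limit for this sketch


def skip_item(data: bytes, pos: int = 0, depth: int = 0) -> int:
    """Skip one definite-length data item and return the next position."""
    if depth > MAX_DEPTH:
        raise ValueError("nesting too deep")
    if pos >= len(data):
        raise ValueError("truncated input")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        argument = info
    elif info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("truncated input")
        argument = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("indefinite or unassigned encoding not handled")
    remaining = len(data) - pos
    if major in (2, 3):  # byte string or text string
        if argument > remaining:
            raise ValueError("declared string length exceeds input")
        return pos + argument
    if major in (4, 5):  # array or map
        count = argument * (2 if major == 5 else 1)
        if count > remaining:  # every item occupies at least one byte
            raise ValueError("declared element count exceeds input")
        for _ in range(count):
            pos = skip_item(data, pos, depth + 1)
        return pos
    if major == 6:  # tag: exactly one enclosed item follows
        return skip_item(data, pos, depth + 1)
    return pos  # integers, simple values, floats: the head is the item
```

   Because declared sizes are compared against the bytes actually
   present, an array head claiming billions of elements is rejected
   before any allocation, and deeply nested input fails at a fixed
   depth instead of exhausting the stack.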

2060	   Applications where a CBOR data item is examined by a gatekeeper
2061	   function and later used by a different application may exhibit
2062	   vulnerabilities when multiple interpretations of the data item are
2063	   possible.  For example, an attacker could make use of duplicate keys
2064	   in maps and precision issues in numbers to make the gatekeeper base
2065	   its decisions on a different interpretation than the one that will be
2066	   used by the second application.  Protocols that are used in a
2067	   security context should be defined in such a way that these multiple
2068	   interpretations are reliably reduced to a single one.  To facilitate
2069	   this, encoder and decoder implementations used in such contexts
2070	   should provide at least one strict mode of operation (Section 4.11).
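
   The duplicate-key problem can be illustrated with a short Python
   sketch.  The key and values are invented for this example, and
   strict_map is a hypothetical helper, not a definition of the strict
   mode in Section 4.11.

```python
def strict_map(pairs):
    """Build a dict from decoded key/value pairs, rejecting duplicates."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate map key: %r" % (key,))
        result[key] = value
    return result


# Key/value pairs as they might come out of a decoded CBOR map.
pairs = [("role", "guest"), ("role", "admin")]

# A gatekeeper that keeps the first occurrence of each key.
first_wins = {}
for key, value in pairs:
    first_wins.setdefault(key, value)

# A second application that keeps the last occurrence.
last_wins = dict(pairs)
```

   Here the gatekeeper sees "guest" while the second application sees
   "admin".  Rejecting the map outright, as strict_map does, removes
   the ambiguity.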

2072	10.  Acknowledgements

2074	   CBOR was inspired by MessagePack.  MessagePack was developed and
2075	   promoted by Sadayuki Furuhashi ("frsyuki").  This reference to
2076	   MessagePack is solely for attribution; CBOR is not intended as a
2077	   version of or replacement for MessagePack, as it has different design
2078	   goals and requirements.

2080	   The need for functionality beyond the original MessagePack
2081	   Specification became obvious to many people at about the same time
2082	   around the year 2012.  BinaryPack is a minor derivation of
2083	   MessagePack that was developed by Eric Zhang for the binaryjs
2084	   project.  A similar, but different, extension was made by Tim Caswell
2085	   for his msgpack-js and msgpack-js-browser projects.  Many people have
2086	   contributed to the recent discussion about extending MessagePack to
2087	   separate text string representation from byte string representation.

2089	   The encoding of the additional information in CBOR was inspired by
2090	   the encoding of length information designed by Klaus Hartke for CoAP.

2092	   This document also incorporates suggestions made by many people,
2093	   notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore,
2094	   Laurence Lundblade, Matthew Lepinski, Michael Richardson, Nico
2095	   Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray, Tony Finch, Tony
2096	   Hansen, and Yaron Sheffer.

2098	11.  References

2100	11.1.  Normative References

2102	   [ECMA262]  Ecma International, "ECMAScript 2018 Language
2103	              Specification", ECMA Standard ECMA-262, 9th Edition, June
2104	              2018, .

2108	   [IEEE.754.2008]
2109	              Institute of Electrical and Electronics Engineers, "IEEE
2110	              Standard for Floating-Point Arithmetic", IEEE
2111	              Standard 754-2008, August 2008.

2113	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
2114	              Extensions (MIME) Part One: Format of Internet Message
2115	              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
2116	              .

2118	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2119	              Requirement Levels", BCP 14, RFC 2119,
2120	              DOI 10.17487/RFC2119, March 1997,
2121	              .

2123	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2124	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
2125	              .

2127	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
2128	              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2129	              2003, .

2131	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
2132	              Resource Identifier (URI): Generic Syntax", STD 66,
2133	              RFC 3986, DOI 10.17487/RFC3986, January 2005,
2134	              .

2136	   [RFC4287]  Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
2137	              Syndication Format", RFC 4287, DOI 10.17487/RFC4287,
2138	              December 2005, .

2140	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
2141	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
2142	              .

2144	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
2145	              Writing an IANA Considerations Section in RFCs", BCP 26,
2146	              RFC 8126, DOI 10.17487/RFC8126, June 2017,
2147	              .

2149	   [TIME_T]   The Open Group Base Specifications, "Vol. 1: Base
2150	              Definitions, Issue 7", Section 4.15 'Seconds Since the
2151	              Epoch', IEEE Std 1003.1, 2013 Edition, 2013,
2152	              .

2155	11.2.  Informative References

2157	   [ASN.1]    International Telecommunication Union, "Information
2158	              Technology -- ASN.1 encoding rules: Specification of Basic
2159	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
2160	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
2161	              X.690, 1994.

2163	   [BSON]     Various, "BSON - Binary JSON", 2013,
2164	              .

2166	   [MessagePack]
2167	              Furuhashi, S., "MessagePack", 2013, .

2169	   [PCRE]     Ho, A., "PCRE - Perl Compatible Regular Expressions",
2170	              2018, .

2172	   [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
2173	              Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976,
2174	              .

2176	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
2177	              Specifications and Registration Procedures", BCP 13,
2178	              RFC 6838, DOI 10.17487/RFC6838, January 2013,
2179	              .

2181	   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
2182	              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
2183	              October 2013, .

2185	   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
2186	              Constrained-Node Networks", RFC 7228,
2187	              DOI 10.17487/RFC7228, May 2014,
2188	              .

2190	   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
2191	              Interchange Format", STD 90, RFC 8259,
2192	              DOI 10.17487/RFC8259, December 2017,
2193	              .

2195	   [YAML]     Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup
2196	              Language (YAML[TM]) Version 1.2", 3rd Edition, October
2197	              2009, .

2199	Appendix A.  Examples

2201	   The following table provides some CBOR-encoded values in hexadecimal
2202	   (right column), together with diagnostic notation for these values
2203	   (left column).  Note that the string "\u00fc" is one form of
2204	   diagnostic notation for a UTF-8 string containing the single Unicode
2205	   character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut).
2206	   Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
2207	   single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often
2208	   representing "water"), and "\ud800\udd51" is a UTF-8 string in
2209	   diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
2210	   ATTIC FIFTY STATERS).  (Note that all these single-character strings
2211	   could also be represented in native UTF-8 in diagnostic notation,
2212	   just not in an ASCII-only specification like the present one.)  In
2213	   the diagnostic notation provided for bignums, their intended numeric
2214	   value is shown as a decimal number (such as 18446744073709551616)
2215	   instead of showing a tagged byte string (such as
2216	   2(h'010000000000000000')).

2218	   +------------------------------+------------------------------------+
2219	   | Diagnostic                   | Encoded                            |
2220	   +------------------------------+------------------------------------+
2221	   | 0                            | 0x00                               |
2222	   |                              |                                    |
2223	   | 1                            | 0x01                               |
2224	   |                              |                                    |
2225	   | 10                           | 0x0a                               |
2226	   |                              |                                    |
2227	   | 23                           | 0x17                               |
2228	   |                              |                                    |
2229	   | 24                           | 0x1818                             |
2230	   |                              |                                    |
2231	   | 25                           | 0x1819                             |
2232	   |                              |                                    |
2233	   | 100                          | 0x1864                             |
2234	   |                              |                                    |
2235	   | 1000                         | 0x1903e8                           |
2236	   |                              |                                    |
2237	   | 1000000                      | 0x1a000f4240                       |
2238	   |                              |                                    |
2239	   | 1000000000000                | 0x1b000000e8d4a51000               |
2240	   |                              |                                    |
2241	   | 18446744073709551615         | 0x1bffffffffffffffff               |
2242	   |                              |                                    |
2243	   | 18446744073709551616         | 0xc249010000000000000000           |
2244	   |                              |                                    |
2245	   | -18446744073709551616        | 0x3bffffffffffffffff               |
2246	   |                              |                                    |
2247	   | -18446744073709551617        | 0xc349010000000000000000           |
2248	   |                              |                                    |
2249	   | -1                           | 0x20                               |
2250	   |                              |                                    |
2251	   | -10                          | 0x29                               |
2252	   |                              |                                    |
2253	   | -100                         | 0x3863                             |
2254	   |                              |                                    |
2255	   | -1000                        | 0x3903e7                           |
2256	   |                              |                                    |
2257	   | 0.0                          | 0xf90000                           |
2258	   |                              |                                    |
2259	   | -0.0                         | 0xf98000                           |
2260	   |                              |                                    |
2261	   | 1.0                          | 0xf93c00                           |
2262	   |                              |                                    |
2263	   | 1.1                          | 0xfb3ff199999999999a               |
2264	   |                              |                                    |
2265	   | 1.5                          | 0xf93e00                           |
2266	   |                              |                                    |
2267	   | 65504.0                      | 0xf97bff                           |
2268	   |                              |                                    |
2269	   | 100000.0                     | 0xfa47c35000                       |
2270	   |                              |                                    |
2271	   | 3.4028234663852886e+38       | 0xfa7f7fffff                       |
2272	   |                              |                                    |
2273	   | 1.0e+300                     | 0xfb7e37e43c8800759c               |
2274	   |                              |                                    |
2275	   | 5.960464477539063e-8         | 0xf90001                           |
2276	   |                              |                                    |
2277	   | 0.00006103515625             | 0xf90400                           |
2278	   |                              |                                    |
2279	   | -4.0                         | 0xf9c400                           |
2280	   |                              |                                    |
2281	   | -4.1                         | 0xfbc010666666666666               |
2282	   |                              |                                    |
2283	   | Infinity                     | 0xf97c00                           |
2284	   |                              |                                    |
2285	   | NaN                          | 0xf97e00                           |
2286	   |                              |                                    |
2287	   | -Infinity                    | 0xf9fc00                           |
2288	   |                              |                                    |
2289	   | Infinity                     | 0xfa7f800000                       |
2290	   |                              |                                    |
2291	   | NaN                          | 0xfa7fc00000                       |
2292	   |                              |                                    |
2293	   | -Infinity                    | 0xfaff800000                       |
2294	   |                              |                                    |
2295	   | Infinity                     | 0xfb7ff0000000000000               |
2296	   |                              |                                    |
2297	   | NaN                          | 0xfb7ff8000000000000               |
2298	   |                              |                                    |
2299	   | -Infinity                    | 0xfbfff0000000000000               |
2300	   |                              |                                    |
2301	   | false                        | 0xf4                               |
2302	   |                              |                                    |
2303	   | true                         | 0xf5                               |
2304	   |                              |                                    |
2305	   | null                         | 0xf6                               |
2306	   |                              |                                    |
2307	   | undefined                    | 0xf7                               |
2308	   |                              |                                    |
2309	   | simple(16)                   | 0xf0                               |
2310	   |                              |                                    |
2311	   | simple(24)                   | 0xf818                             |
2312	   |                              |                                    |
2313	   | simple(255)                  | 0xf8ff                             |
2314	   |                              |                                    |
2315	   | 0("2013-03-21T20:04:00Z")    | 0xc074323031332d30332d32315432303a |
2316	   |                              | 30343a30305a                       |
2317	   |                              |                                    |
2318	   | 1(1363896240)                | 0xc11a514b67b0                     |
2319	   |                              |                                    |
2320	   | 1(1363896240.5)              | 0xc1fb41d452d9ec200000             |
2321	   |                              |                                    |
2322	   | 23(h'01020304')              | 0xd74401020304                     |
2323	   |                              |                                    |
2324	   | 24(h'6449455446')            | 0xd818456449455446                 |
2325	   |                              |                                    |
2326	   | 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
2327	   |                              | 616d706c652e636f6d                 |
2328	   |                              |                                    |
2329	   | h''                          | 0x40                               |
2330	   |                              |                                    |
2331	   | h'01020304'                  | 0x4401020304                       |
2332	   |                              |                                    |
2333	   | ""                           | 0x60                               |
2334	   |                              |                                    |
2335	   | "a"                          | 0x6161                             |
2336	   |                              |                                    |
2337	   | "IETF"                       | 0x6449455446                       |
2338	   |                              |                                    |
2339	   | "\"\\"                       | 0x62225c                           |
2340	   |                              |                                    |
2341	   | "\u00fc"                     | 0x62c3bc                           |
2342	   |                              |                                    |
2343	   | "\u6c34"                     | 0x63e6b0b4                         |
2344	   |                              |                                    |
2345	   | "\ud800\udd51"               | 0x64f0908591                       |
2346	   |                              |                                    |
2347	   | []                           | 0x80                               |
2348	   |                              |                                    |
2349	   | [1, 2, 3]                    | 0x83010203                         |
2350	   |                              |                                    |
2351	   | [1, [2, 3], [4, 5]]          | 0x8301820203820405                 |
2352	   |                              |                                    |
2353	   | [1, 2, 3, 4, 5, 6, 7, 8, 9,  | 0x98190102030405060708090a0b0c0d0e |
2354	   | 10, 11, 12, 13, 14, 15, 16,  | 0f101112131415161718181819         |
2355	   | 17, 18, 19, 20, 21, 22, 23,  |                                    |
2356	   | 24, 25]                      |                                    |
2357	   |                              |                                    |
2358	   | {}                           | 0xa0                               |
2359	   |                              |                                    |
2360	   | {1: 2, 3: 4}                 | 0xa201020304                       |
2361	   |                              |                                    |
2362	   | {"a": 1, "b": [2, 3]}        | 0xa26161016162820203               |
2363	   |                              |                                    |
2364	   | ["a", {"b": "c"}]            | 0x826161a161626163                 |
2365	   |                              |                                    |
2366	   | {"a": "A", "b": "B", "c":    | 0xa5616161416162614261636143616461 |
2367	   | "C", "d": "D", "e": "E"}     | 4461656145                         |
2368	   |                              |                                    |
2369	   | (_ h'0102', h'030405')       | 0x5f42010243030405ff               |
2370	   |                              |                                    |
2371	   | (_ "strea", "ming")          | 0x7f657374726561646d696e67ff       |
2372	   |                              |                                    |
2373	   | [_ ]                         | 0x9fff                             |
2374	   |                              |                                    |
2375	   | [_ 1, [2, 3], [_ 4, 5]]      | 0x9f018202039f0405ffff             |
2376	   |                              |                                    |
2377	   | [_ 1, [2, 3], [4, 5]]        | 0x9f01820203820405ff               |
2378	   |                              |                                    |
2379	   | [1, [2, 3], [_ 4, 5]]        | 0x83018202039f0405ff               |
2380	   |                              |                                    |
2381	   | [1, [_ 2, 3], [4, 5]]        | 0x83019f0203ff820405               |
2382	   |                              |                                    |
2383	   | [_ 1, 2, 3, 4, 5, 6, 7, 8,   | 0x9f0102030405060708090a0b0c0d0e0f |
2384	   | 9, 10, 11, 12, 13, 14, 15,   | 101112131415161718181819ff         |
2385	   | 16, 17, 18, 19, 20, 21, 22,  |                                    |
2386	   | 23, 24, 25]                  |                                    |
2387	   |                              |                                    |
2388	   | {_ "a": 1, "b": [_ 2, 3]}    | 0xbf61610161629f0203ffff           |
2389	   |                              |                                    |
2390	   | ["a", {_ "b": "c"}]          | 0x826161bf61626163ff               |
2391	   |                              |                                    |
2392	   | {_ "Fun": true, "Amt": -2}   | 0xbf6346756ef563416d7421ff         |
2393	   +------------------------------+------------------------------------+

2395	               Table 4: Examples of Encoded CBOR Data Items
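
   As a rough cross-check of the definite-length rows in Table 4, the
   following Python sketch encodes a small subset of values.  The names
   encode_head() and encode() are hypothetical helpers introduced only
   for this illustration; they are not part of this specification.  The
   sketch assumes integers fit in 64 bits, emits map entries in
   insertion order, and does not cover floating-point values, tags, or
   indefinite-length items.

```python
import struct

def encode_head(major_type, value):
    """Encode the initial byte plus any following length/value bytes."""
    mt = major_type << 5
    if value < 24:
        return bytes([mt | value])            # value fits in the initial byte
    if value < 0x100:
        return bytes([mt | 24, value])        # one-byte uint8_t follows
    if value < 0x10000:
        return bytes([mt | 25]) + struct.pack("!H", value)
    if value < 0x100000000:
        return bytes([mt | 26]) + struct.pack("!I", value)
    return bytes([mt | 27]) + struct.pack("!Q", value)

def encode(item):
    """Encode a subset of Python values as definite-length CBOR (sketch)."""
    # bool is a subclass of int in Python, so test the simple values first.
    if item is True:
        return b"\xf5"
    if item is False:
        return b"\xf4"
    if item is None:
        return b"\xf6"
    if isinstance(item, int):
        if item >= 0:
            return encode_head(0, item)
        return encode_head(1, -1 - item)      # major type 1 carries -1-n
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    if isinstance(item, str):
        utf8 = item.encode("utf-8")
        return encode_head(3, len(utf8)) + utf8
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body
    raise TypeError("unsupported type: %r" % type(item))
```

   For example, encode([1, [2, 3], [4, 5]]).hex() can be compared with
   the corresponding row of Table 4 after removing the "0x" prefix.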

2397	Appendix B.  Jump Table

2399	   For brevity, this jump table does not show initial bytes that are
2400	   reserved for future extension.  It also only shows a selection of the
2401	   initial bytes that can be used for optional features.  (All unsigned
2402	   integers are in network byte order.)

2404	   +------------+------------------------------------------------------+
2405	   | Byte       | Structure/Semantics                                  |
2406	   +------------+------------------------------------------------------+
2407	   | 0x00..0x17 | Integer 0x00..0x17 (0..23)                           |
2408	   |            |                                                      |
2409	   | 0x18       | Unsigned integer (one-byte uint8_t follows)          |
2410	   |            |                                                      |
2411	   | 0x19       | Unsigned integer (two-byte uint16_t follows)         |
2412	   |            |                                                      |
2413	   | 0x1a       | Unsigned integer (four-byte uint32_t follows)        |
2414	   |            |                                                      |
2415	   | 0x1b       | Unsigned integer (eight-byte uint64_t follows)       |
2416	   |            |                                                      |
2417	   | 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24)          |
2418	   |            |                                                      |
2419	   | 0x38       | Negative integer -1-n (one-byte uint8_t for n        |
2420	   |            | follows)                                             |
2421	   |            |                                                      |
2422	   | 0x39       | Negative integer -1-n (two-byte uint16_t for n       |
2423	   |            | follows)                                             |
2424	   |            |                                                      |
2425	   | 0x3a       | Negative integer -1-n (four-byte uint32_t for n      |
2426	   |            | follows)                                             |
2427	   |            |                                                      |
2428	   | 0x3b       | Negative integer -1-n (eight-byte uint64_t for n     |
2429	   |            | follows)                                             |
2430	   |            |                                                      |
2431	   | 0x40..0x57 | byte string (0x00..0x17 bytes follow)                |
2432	   |            |                                                      |
2433	   | 0x58       | byte string (one-byte uint8_t for n, and then n      |
2434	   |            | bytes follow)                                        |
2435	   |            |                                                      |
2436	   | 0x59       | byte string (two-byte uint16_t for n, and then n     |
2437	   |            | bytes follow)                                        |
2438	   |            |                                                      |
2439	   | 0x5a       | byte string (four-byte uint32_t for n, and then n    |
2440	   |            | bytes follow)                                        |
2441	   |            |                                                      |
2442	   | 0x5b       | byte string (eight-byte uint64_t for n, and then n   |
2443	   |            | bytes follow)                                        |
2444	   |            |                                                      |
2445	   | 0x5f       | byte string, byte strings follow, terminated by      |
2446	   |            | "break"                                              |
2447	   |            |                                                      |
2448	   | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow)               |
2449	   |            |                                                      |
2450	   | 0x78       | UTF-8 string (one-byte uint8_t for n, and then n     |
2451	   |            | bytes follow)                                        |
2452	   |            |                                                      |
2453	   | 0x79       | UTF-8 string (two-byte uint16_t for n, and then n    |
2454	   |            | bytes follow)                                        |
2455	   |            |                                                      |
2456	   | 0x7a       | UTF-8 string (four-byte uint32_t for n, and then n   |
2457	   |            | bytes follow)                                        |
2458	   |            |                                                      |
2459	   | 0x7b       | UTF-8 string (eight-byte uint64_t for n, and then n  |
2460	   |            | bytes follow)                                        |
2461	   |            |                                                      |
2462	   | 0x7f       | UTF-8 string, UTF-8 strings follow, terminated by    |
2463	   |            | "break"                                              |
2464	   |            |                                                      |
2465	   | 0x80..0x97 | array (0x00..0x17 data items follow)                 |
2466	   |            |                                                      |
2467	   | 0x98       | array (one-byte uint8_t for n, and then n data items |
2468	   |            | follow)                                              |
2469	   |            |                                                      |
2470	   | 0x99       | array (two-byte uint16_t for n, and then n data      |
2471	   |            | items follow)                                        |
2472	   |            |                                                      |
2473	   | 0x9a       | array (four-byte uint32_t for n, and then n data     |
2474	   |            | items follow)                                        |
2475	   |            |                                                      |
2476	   | 0x9b       | array (eight-byte uint64_t for n, and then n data    |
2477	   |            | items follow)                                        |
2478	   |            |                                                      |
2479	   | 0x9f       | array, data items follow, terminated by "break"      |
2480	   |            |                                                      |
2481	   | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow)          |
2482	   |            |                                                      |
2483	   | 0xb8       | map (one-byte uint8_t for n, and then n pairs of     |
2484	   |            | data items follow)                                   |
2485	   |            |                                                      |
2486	   | 0xb9       | map (two-byte uint16_t for n, and then n pairs of    |
2487	   |            | data items follow)                                   |
2488	   |            |                                                      |
2489	   | 0xba       | map (four-byte uint32_t for n, and then n pairs of   |
2490	   |            | data items follow)                                   |
2491	   |            |                                                      |
2492	   | 0xbb       | map (eight-byte uint64_t for n, and then n pairs of  |
2493	   |            | data items follow)                                   |
2494	   |            |                                                      |
2495	   | 0xbf       | map, pairs of data items follow, terminated by       |
2496	   |            | "break"                                              |
2497	   |            |                                                      |
2498	   | 0xc0       | Text-based date/time (data item follows; see Section |
2499	   |            | 3.4.2)                                               |
2500	   |            |                                                      |
2501	   | 0xc1       | Epoch-based date/time (data item follows; see        |
2502	   |            | Section 3.4.3)                                       |
2503	   |            |                                                      |
2504	   | 0xc2       | Positive bignum (data item "byte string" follows)    |
2505	   |            |                                                      |
2506	   | 0xc3       | Negative bignum (data item "byte string" follows)    |
2507	   |            |                                                      |
2508	   | 0xc4       | Decimal Fraction (data item "array" follows; see     |
2509	   |            | Section 3.4.5)                                       |
2510	   |            |                                                      |
2511	   | 0xc5       | Bigfloat (data item "array" follows; see Section     |
2512	   |            | 3.4.5)                                               |
2513	   |            |                                                      |
2514	   | 0xc6..0xd4 | (tagged item)                                        |
2515	   |            |                                                      |
2516	   | 0xd5..0xd7 | Expected Conversion (data item follows; see Section  |
2517	   |            | 3.4.6.2)                                             |
2518	   |            |                                                      |
2519	   | 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data    |
2520	   |            | item follow)                                         |
2521	   |            |                                                      |
2522	   | 0xe0..0xf3 | (simple value)                                       |
2523	   |            |                                                      |
2524	   | 0xf4       | False                                                |
2525	   |            |                                                      |
2526	   | 0xf5       | True                                                 |
2527	   |            |                                                      |
2528	   | 0xf6       | Null                                                 |
2529	   |            |                                                      |
2530	   | 0xf7       | Undefined                                            |
2531	   |            |                                                      |
2532	   | 0xf8       | (simple value, one byte follows)                     |
2533	   |            |                                                      |
2534	   | 0xf9       | Half-Precision Float (two-byte IEEE 754)             |
2535	   |            |                                                      |
2536	   | 0xfa       | Single-Precision Float (four-byte IEEE 754)          |
2537	   |            |                                                      |
2538	   | 0xfb       | Double-Precision Float (eight-byte IEEE 754)         |
2539	   |            |                                                      |
2540	   | 0xff       | "break" stop code                                    |
2541	   +------------+------------------------------------------------------+

2543	                   Table 5: Jump Table for Initial Byte
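
   The regular structure behind Table 5 can be illustrated by splitting
   an initial byte into its major type (high-order 3 bits) and
   additional information (low-order 5 bits).  The Python sketch below
   is only an illustration; describe_initial_byte() and the wording of
   its descriptions are assumptions made for this example, and the
   per-tag and per-simple-value rows of the table are not reproduced.

```python
MAJOR_TYPE_NAMES = [
    "unsigned integer", "negative integer", "byte string", "UTF-8 string",
    "array", "map", "tag", "simple/float",
]

def describe_initial_byte(ib):
    """Return (major type, additional information, short description)."""
    mt = ib >> 5          # high-order 3 bits
    ai = ib & 0x1f        # low-order 5 bits
    name = MAJOR_TYPE_NAMES[mt]
    if ib == 0xff:
        return mt, ai, '"break" stop code'
    if ai < 24:
        return mt, ai, "%s, value %d in initial byte" % (name, ai)
    if ai <= 27:
        # 24, 25, 26, 27 announce 1, 2, 4, 8 following bytes
        return mt, ai, "%s, %d-byte value follows" % (name, 1 << (ai - 24))
    if ai == 31 and 2 <= mt <= 5:
        return mt, ai, "%s, indefinite length" % name
    return mt, ai, "reserved"
```

   For instance, describe_initial_byte(0x9f) reports an
   indefinite-length array, matching the 0x9f row of the table.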

2545	Appendix C.  Pseudocode

2547	   The well-formedness of a CBOR item can be checked by the pseudocode
2548	   in Figure 1.  The data is well-formed if and only if:

2550	   o  the pseudocode does not "fail";

2552	   o  after execution of the pseudocode, no bytes are left in the input
2553	      (except in streaming applications).

2555	   The pseudocode has the following prerequisites:

2557	   o  take(n) reads n bytes from the input data and returns them as a
2558	      byte string.  If n bytes are no longer available, take(n) fails.

2560	   o  uint() converts a byte string into an unsigned integer by
2561	      interpreting the byte string in network byte order.

2563	   o  Arithmetic works as in C.

2565	   o  All variables are unsigned integers of sufficient range.

2567	   well_formed (breakable = false) {
2568	     // process initial bytes
2569	     ib = uint(take(1));
2570	     mt = ib >> 5;
2571	     val = ai = ib & 0x1f;
2572	     switch (ai) {
2573	       case 24: val = uint(take(1)); break;
2574	       case 25: val = uint(take(2)); break;
2575	       case 26: val = uint(take(4)); break;
2576	       case 27: val = uint(take(8)); break;
2577	       case 28: case 29: case 30: fail();
2578	       case 31:
2579	         return well_formed_indefinite(mt, breakable);
2580	     }
2581	     // process content
2582	     switch (mt) {
2583	       // case 0, 1, 7 do not have content; just use val
2584	       case 2: case 3: take(val); break; // bytes/UTF-8
2585	       case 4: for (i = 0; i < val; i++) well_formed(); break;
2586	       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
2587	       case 6: well_formed(); break;     // 1 embedded data item
2588	     }
2589	     return mt;                    // finite data item
2590	   }

2592	   well_formed_indefinite(mt, breakable) {
2593	     switch (mt) {
2594	       case 2: case 3:
2595	         while ((it = well_formed(true)) != -1)
2596	           if (it != mt)           // need finite embedded
2597	             fail();               //    of same type
2598	         break;
2599	       case 4: while (well_formed(true) != -1); break;
2600	       case 5: while (well_formed(true) != -1) well_formed(); break;
2601	       case 7:
2602	         if (breakable)
2603	           return -1;              // signal break out
2604	         else fail();              // no enclosing indefinite
2605	       default: fail();            // wrong mt
2606	     }
2607	     return 0;                     // no break out
2608	   }

2610	              Figure 1: Pseudocode for Well-Formedness Check
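
   A runnable rendering of Figure 1 may be useful for experimentation.
   The following Python sketch follows the pseudocode closely, with
   some assumptions made for this example: the input is passed as a
   bytes object together with a position, failure is signaled by a
   hypothetical NotWellFormed exception, and is_well_formed() is a
   convenience wrapper that also applies the "no bytes are left" rule
   for non-streaming use.

```python
class NotWellFormed(Exception):
    """Raised where the pseudocode in Figure 1 would "fail"."""

BREAK = -1  # returned as the item type when a "break" stop code is read

def well_formed(data, pos=0, breakable=False):
    """Check one data item at data[pos]; return (type, next position)."""
    def take(n):
        nonlocal pos
        if pos + n > len(data):
            raise NotWellFormed("input ends early")
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    # process initial bytes
    ib = take(1)[0]
    mt, ai = ib >> 5, ib & 0x1f
    val = ai
    if 24 <= ai <= 27:
        val = int.from_bytes(take(1 << (ai - 24)), "big")
    elif 28 <= ai <= 30:
        raise NotWellFormed("reserved additional information")
    elif ai == 31:
        return well_formed_indefinite(data, pos, mt, breakable)

    # process content; major types 0, 1, and 7 have none
    if mt in (2, 3):
        take(val)                               # bytes/UTF-8
    elif mt == 4:
        for _ in range(val):
            _, pos = well_formed(data, pos)
    elif mt == 5:
        for _ in range(val * 2):
            _, pos = well_formed(data, pos)
    elif mt == 6:
        _, pos = well_formed(data, pos)         # 1 embedded data item
    return mt, pos                              # finite data item

def well_formed_indefinite(data, pos, mt, breakable):
    if mt in (2, 3, 4, 5):
        while True:
            it, pos = well_formed(data, pos, breakable=True)
            if it == BREAK:
                break
            if mt in (2, 3) and it != mt:
                # chunks must be finite strings of the same type
                raise NotWellFormed("bad chunk in indefinite-length string")
            if mt == 5:
                _, pos = well_formed(data, pos)  # value for the key just read
    elif mt == 7:
        if not breakable:
            raise NotWellFormed("no enclosing indefinite-length item")
        return BREAK, pos                       # signal break out
    else:
        raise NotWellFormed("wrong major type for indefinite length")
    return 0, pos                               # no break out

def is_well_formed(data):
    """True if data holds exactly one well-formed item and nothing else."""
    try:
        _, end = well_formed(data)
    except NotWellFormed:
        return False
    return end == len(data)
```

   As a usage example, is_well_formed(bytes.fromhex("9f018202039f0405ffff"))
   accepts the nested indefinite-length array from Table 4, while a
   lone "break" byte (0xff) is rejected.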

2612	   Note that the remaining complexity of a complete CBOR decoder is
2613	   about presenting data that has been parsed to the application in an
2614	   appropriate form.

2616	   Major types 0 and 1 are designed in such a way that they can be
2617	   encoded in C from a signed integer without actually doing an if-then-
2618	   else for positive/negative (Figure 2).  This uses the fact that
2619	   (-1-n), the transformation for major type 1, is the same as ~n
2620	   (bitwise complement) in C unsigned arithmetic; ~n can then be
2621	   expressed as (-1)^n for the negative case, while 0^n leaves n
2622	   unchanged for non-negative.  The sign of a number can be converted to
2623	   -1 for negative and 0 for non-negative (0 or positive) by arithmetic-
2624	   shifting the number by one bit less than the bit length of the number
2625	   (for example, by 63 for 64-bit numbers).

2627	   void encode_sint(int64_t n) {
2628	     uint64_t ui = n >> 63;   // extend sign to whole length
2629	     unsigned mt = ui & 0x20; // extract (shifted) major type
2630	     ui ^= n;                 // complement negatives
2631	     if (ui < 24)
2632	       *p++ = mt + ui;
2633	     else if (ui < 256) {
2634	       *p++ = mt + 24;
2635	       *p++ = ui;
2636	     } else
2637	          ...

2639	            Figure 2: Pseudocode for Encoding a Signed Integer
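
   The same trick can be tried out in Python, where ">>" on a negative
   integer is also an arithmetic shift.  The sketch below is an
   illustration under stated assumptions: it returns the encoded bytes
   instead of writing through a pointer, it restricts n to the int64_t
   range used in Figure 2, and the branches for two-, four-, and
   eight-byte values are filled in from the 0x19..0x1b and 0x39..0x3b
   rows of Table 5 rather than taken from Figure 2 itself.

```python
import struct

def encode_sint(n):
    """Encode a signed 64-bit integer as CBOR major type 0 or 1."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("n does not fit in int64_t")
    sign = n >> 63       # -1 for negative n, 0 otherwise (arithmetic shift)
    mt = sign & 0x20     # 0x20 = major type 1 (shifted), 0x00 = major type 0
    ui = sign ^ n        # (-1)^n == ~n == -1-n; 0^n leaves n unchanged
    if ui < 24:
        return bytes([mt + ui])
    if ui < 0x100:
        return bytes([mt + 24, ui])
    if ui < 0x10000:
        return bytes([mt + 25]) + struct.pack("!H", ui)
    if ui < 0x100000000:
        return bytes([mt + 26]) + struct.pack("!I", ui)
    return bytes([mt + 27]) + struct.pack("!Q", ui)
```

   For example, encode_sint(-2) yields the single byte 0x21, which is
   the encoding of -2 that appears in the last row of Table 4.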

2641	Appendix D.  Half-Precision

2643	   As half-precision floating-point numbers were only added to IEEE 754
2644	   in 2008 [IEEE.754.2008], today's programming platforms often still
2645	   only have limited support for them.  It is easy to include at least
2646	   decoding support for them even without platform support.  An
2647	   example of a small decoder for half-precision floating-point numbers
2648	   in the C language is shown in Figure 3.  A similar program for Python
2649	   is in Figure 4; this code assumes that the 2-byte value has already
2650	   been decoded as an (unsigned short) integer in network byte order (as
2651	   would be done by the pseudocode in Appendix C).

2653	   #include <math.h>

2655	   double decode_half(unsigned char *halfp) {
2656	     int half = (halfp[0] << 8) + halfp[1];
2657	     int exp = (half >> 10) & 0x1f;
2658	     int mant = half & 0x3ff;
2659	     double val;
2660	     if (exp == 0) val = ldexp(mant, -24);
2661	     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
2662	     else val = mant == 0 ? INFINITY : NAN;
2663	     return half & 0x8000 ? -val : val;
2664	   }

2666	               Figure 3: C Code for a Half-Precision Decoder

2668	   import struct
2669	   from math import ldexp

2671	   def decode_single(single):
2672	       return struct.unpack("!f", struct.pack("!I", single))[0]

2674	   def decode_half(half):
2675	       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
2676	       if ((half & 0x7c00) != 0x7c00):
2677	           return ldexp(decode_single(valu), 112)
2678	       return decode_single(valu | 0x7f800000)

2680	            Figure 4: Python Code for a Half-Precision Decoder
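
   To experiment with the arithmetic of Figure 3 without a C compiler,
   the same steps can be transliterated into Python.  The function name
   decode_half_arith() is hypothetical and chosen only to distinguish
   this sketch from decode_half() in Figure 4; like Figure 4, it
   assumes that the 2-byte value has already been converted to an
   unsigned integer.

```python
from math import ldexp, inf, nan

def decode_half_arith(half):
    """Decode a half-precision value given as a 16-bit unsigned integer."""
    exp = (half >> 10) & 0x1f        # 5-bit biased exponent
    mant = half & 0x3ff              # 10-bit mantissa
    if exp == 0:
        val = ldexp(mant, -24)       # zero and subnormal numbers
    elif exp != 31:
        val = ldexp(mant + 1024, exp - 25)   # normal numbers, implicit 1
    else:
        val = inf if mant == 0 else nan
    return -val if half & 0x8000 else val
```

   For example, decode_half_arith(0x3c00) returns 1.0 and
   decode_half_arith(0x7bff) returns 65504.0, the largest finite
   half-precision value.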

2682	Appendix E.  Comparison of Other Binary Formats to CBOR's Design
2683	             Objectives

2685	   The proposal for CBOR follows a history of binary formats that is as
2686	   long as the history of computers themselves.  Different formats have
2687	   had different objectives.  In most cases, the objectives of the
2688	   format were never stated, although they can sometimes be implied by
2689	   the context where the format was first used.  Some formats were meant
2690	   to be universally usable, although history has proven that no binary
2691	   format meets the needs of all protocols and applications.

2693	   CBOR differs from many of these formats because it starts with a set
2694	   of objectives and attempts to meet just those.  This section
2695	   compares a few of the dozens of formats with CBOR's objectives in
2696	   order to help the reader decide if they want to use CBOR or a
2697	   different format for a particular protocol or application.

2699	   Note that the discussion here is not meant to be a criticism of any
2700	   format: to the best of our knowledge, no format before CBOR was meant
2701	   to cover CBOR's objectives in the priority we have assigned them.  A
2702	   brief recap of the objectives from Section 1.1 is:

2704	   1.  unambiguous encoding of most common data formats from Internet
2705	       standards

2707	   2.  code compactness for encoder or decoder

2709	   3.  no schema description needed

2711	   4.  reasonably compact serialization

2713	   5.  applicability to constrained and unconstrained applications

2715	   6.  good JSON conversion

2717	   7.  extensibility

2719	E.1.  ASN.1 DER, BER, and PER

2721	   [ASN.1] has many serializations.  In the IETF, DER and BER are the
2722	   most common.  The serialized output is not particularly compact for
2723	   many items, and the code needed to decode numeric items can be
2724	   complex on a constrained device.

2726	   Few (if any) IETF protocols have adopted one of the several variants
2727	   of Packed Encoding Rules (PER).  There could be many reasons for
2728	   this, but one that is commonly stated is that PER makes use of the
2729	   schema even for parsing the surface structure of the data stream,
2730	   requiring significant tool support.  There are different versions of
2731	   the ASN.1 schema language in use, which has also hampered adoption.

2733	E.2.  MessagePack

2735	   [MessagePack] is a concise, widely implemented counted binary
2736	   serialization format, similar in many properties to CBOR, although
2737	   somewhat less regular.  While the data model can be used to represent
2738	   JSON data, MessagePack has also been used in many remote procedure
2739	   call (RPC) applications and for long-term storage of data.

2741	   MessagePack has been essentially stable since it was first published
2742	   around 2011; it has not yet had a transition.  The evolution of
2743	   MessagePack is impeded by an imperative to maintain complete
2744	   backwards compatibility with existing stored data, while only a few
2745	   bytecodes are still available for extension.  Repeated requests over
2746	   the years from the MessagePack user community to separate out binary
2747	   and text strings in the encoding recently have led to an extension
2748	   proposal that would leave MessagePack's "raw" data ambiguous between
2749	   its usages for binary and text data.  The extension mechanism for
2750	   MessagePack remains unclear.

2752	E.3.  BSON

2754	   [BSON] is a data format that was developed for the storage of JSON-
2755	   like maps (JSON objects) in the MongoDB database.  Its major
2756	   distinguishing feature is the capability for in-place update,
2757	   forgoing a compact representation.  BSON uses a counted
2758	   representation except for map keys, which are null-byte terminated.
2759	   While BSON can be used for the representation of JSON-like objects on
2760	   the wire, its specification is dominated by the requirements of the
2761	   database application and has become somewhat baroque.  How BSON
2762	   extensions will be implemented remains unclear.

2764	E.4.  MSDTP: RFC 713

2766	   Message Services Data Transmission Protocol (MSDTP) is a very early
2767	   example of a compact message format; it is described in [RFC0713],
2768	   written in 1976.  It is included here for its historical value, not
2769	   because it was ever widely used.

2771	E.5.  Conciseness on the Wire

2773	   While CBOR's design objective of code compactness for encoders and
2774	   decoders is a higher priority than its objective of conciseness on
2775	   the wire, many people focus on the wire size.  Table 6 shows some
2776	   encoding examples for the simple nested array [1, [2, 3]]; where some
2777	   form of indefinite-length encoding is supported by the encoding,
2778	   [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.

2780	   +-------------+--------------------------+--------------------------+
2781	   | Format      | [1, [2, 3]]              | [_ 1, [2, 3]]            |
2782	   +-------------+--------------------------+--------------------------+
2783	   | RFC 713     | c2 05 81 c2 02 82 83     |                          |
2784	   |             |                          |                          |
2785	   | ASN.1 BER   | 30 0b 02 01 01 30 06 02  | 30 80 02 01 01 30 06 02  |
2786	   |             | 01 02 02 01 03           | 01 02 02 01 03 00 00     |
2787	   |             |                          |                          |
2788	   | MessagePack | 92 01 92 02 03           |                          |
2789	   |             |                          |                          |
2790	   | BSON        | 22 00 00 00 10 30 00 01  |                          |
2791	   |             | 00 00 00 04 31 00 13 00  |                          |
2792	   |             | 00 00 10 30 00 02 00 00  |                          |
2793	   |             | 00 10 31 00 03 00 00 00  |                          |
2794	   |             | 00 00                    |                          |
2795	   |             |                          |                          |
2796	   | CBOR        | 82 01 82 02 03           | 9f 01 82 02 03 ff        |
2797	   +-------------+--------------------------+--------------------------+

2799	           Table 6: Examples for Different Levels of Conciseness
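
   The sizes implied by the definite-length column of Table 6 can be
   tallied mechanically.  The following Python sketch simply counts the
   bytes copied from the table; ENCODINGS, encoded_sizes(), and SIZES
   are names introduced for this illustration only.

```python
# Hex strings copied from the "[1, [2, 3]]" column of Table 6.
ENCODINGS = {
    "RFC 713": "c2 05 81 c2 02 82 83",
    "ASN.1 BER": "30 0b 02 01 01 30 06 02 01 02 02 01 03",
    "MessagePack": "92 01 92 02 03",
    "BSON": ("22 00 00 00 10 30 00 01 00 00 00 04 31 00 13 00 "
             "00 00 10 30 00 02 00 00 00 10 31 00 03 00 00 00 00 00"),
    "CBOR": "82 01 82 02 03",
}

def encoded_sizes(encodings):
    """Map each format name to the number of bytes in its encoding."""
    return {name: len(bytes.fromhex(text)) for name, text in encodings.items()}

SIZES = encoded_sizes(ENCODINGS)
```

   With the table's data, CBOR and MessagePack each need 5 bytes,
   RFC 713 needs 7, ASN.1 BER needs 13, and BSON needs 34.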

2801	Appendix F.  Changes from RFC 7049

2803	   The following is a list of known changes from RFC 7049.  This list is
2804	   non-authoritative.  It is meant to help reviewers see the significant
2805	   differences.

2807	   o  Updated reference for [RFC4627] to [RFC8259] in many places

2809	   o  Updated reference for [CNN-TERMS] to [RFC7228]

2811	   o  Added a comment to the last example in Section 2.2.1 (added
2812	      "Second value")

2814	   o  Fixed a bug in the example in Section 2.4.2 ("29" -> "49")

2816	   o  Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" ->
2817	      "0b000_11001")

2819	Authors' Addresses
2820	   Carsten Bormann
2821	   Universitaet Bremen TZI
2822	   Postfach 330440
2823	   D-28359 Bremen
2824	   Germany

2826	   Phone: +49-421-218-63921
2827	   EMail: cabo@tzi.org

2829	   Paul Hoffman
2830	   ICANN

2832	   EMail: paul.hoffman@icann.org