idnits 2.17.1

draft-ietf-cbor-7049bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     o  Indefinite-length items MUST not appear.  They can be encoded as
     definite-length items instead.

  -- The document date (March 02, 2018) is 2241 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '2' on line 2365

  -- Looks like a reference, but probably isn't: '3' on line 2365

  -- Looks like a reference, but probably isn't: '4' on line 2363

  -- Looks like a reference, but probably isn't: '5' on line 2363

  -- Looks like a reference, but probably isn't: '100' on line 1569

  == Missing Reference: '-1' is mentioned on line 1565, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 2642

  == Missing Reference: 'RFCthis' is mentioned on line 2003, but not defined

  == Missing Reference: 'TM' is mentioned on line 2182, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 2658

  == Missing Reference: 'RFC4267' is mentioned on line 2810, but not defined

  == Missing Reference: 'CNN-TERMS' is mentioned on line 2812, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECMA262'

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 7049
     (Obsoleted by RFC 8949)

  -- Obsolete informational reference (is this intentional?): RFC 7159
     (Obsoleted by RFC 8259)


     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Intended status: Standards Track                              P. Hoffman
5	Expires: September 3, 2018                                         ICANN
6	                                                          March 02, 2018

8	              Concise Binary Object Representation (CBOR)
9	                     draft-ietf-cbor-7049bis-02

11	Abstract

13	   The Concise Binary Object Representation (CBOR) is a data format
14	   whose design goals include the possibility of extremely small code
15	   size, fairly small message size, and extensibility without the need
16	   for version negotiation.  These design goals make it different from
17	   earlier binary serializations such as ASN.1 and MessagePack.

19	Contributing

21	   This document is being worked on in the CBOR Working Group.  Please
22	   contribute on the mailing list there, or in the GitHub repository for
23	   this draft: https://github.com/cbor-wg/CBORbis

25	   The charter for the CBOR Working Group says that the WG will update
26	   RFC 7049 to fix verified errata.  Security issues and clarifications
27	   may be addressed, but changes to this document will ensure backward
28	   compatibility for popular deployed codebases.  This document will be
29	   targeted at becoming an Internet Standard.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at https://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on September 3, 2018.

48	Copyright Notice

50	   Copyright (c) 2018 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (https://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	     1.1.  Objectives
67	     1.2.  Terminology
68	   2.  CBOR Data Models
69	     2.1.  Extended Generic Data Models
70	     2.2.  Specific Data Models
71	   3.  Specification of the CBOR Encoding
72	     3.1.  Major Types
73	     3.2.  Indefinite Lengths for Some Major Types
74	       3.2.1.  Indefinite-Length Arrays and Maps
75	       3.2.2.  Indefinite-Length Byte Strings and Text Strings
76	     3.3.  Floating-Point Numbers and Values with No Content
77	     3.4.  Optional Tagging of Items
78	       3.4.1.  Date and Time
79	       3.4.2.  Bignums
80	       3.4.3.  Decimal Fractions and Bigfloats
81	       3.4.4.  Content Hints
82	         3.4.4.1.  Encoded CBOR Data Item
83	         3.4.4.2.  Expected Later Encoding for CBOR-to-JSON
84	                   Converters
85	         3.4.4.3.  Encoded Text
86	       3.4.5.  Self-Describe CBOR
87	     3.5.  CBOR Data Models
88	   4.  Creating CBOR-Based Protocols
89	     4.1.  CBOR in Streaming Applications
90	     4.2.  Generic Encoders and Decoders
91	     4.3.  Syntax Errors
92	       4.3.1.  Incomplete CBOR Data Items
93	       4.3.2.  Malformed Indefinite-Length Items
94	       4.3.3.  Unknown Additional Information Values
95	     4.4.  Other Decoding Errors
96	     4.5.  Handling Unknown Simple Values and Tags
97	     4.6.  Numbers
98	     4.7.  Specifying Keys for Maps
99	       4.7.1.  Equivalence of Keys
100	     4.8.  Undefined Values
101	     4.9.  Canonical CBOR
102	       4.9.1.  Length-first map key ordering
103	     4.10. Strict Mode
104	   5.  Converting Data between CBOR and JSON
105	     5.1.  Converting from CBOR to JSON
106	     5.2.  Converting from JSON to CBOR
107	   6.  Future Evolution of CBOR
108	     6.1.  Extension Points
109	     6.2.  Curating the Additional Information Space
110	   7.  Diagnostic Notation
111	     7.1.  Encoding Indicators
112	   8.  IANA Considerations
113	     8.1.  Simple Values Registry
114	     8.2.  Tags Registry
115	     8.3.  Media Type ("MIME Type")
116	     8.4.  CoAP Content-Format
117	     8.5.  The +cbor Structured Syntax Suffix Registration
118	   9.  Security Considerations
119	   10. Acknowledgements
120	   11. References
121	     11.1.  Normative References
122	     11.2.  Informative References
123	   Appendix A.  Examples
124	   Appendix B.  Jump Table
125	   Appendix C.  Pseudocode
126	   Appendix D.  Half-Precision
127	   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
128	                Objectives
129	     E.1.  ASN.1 DER, BER, and PER
130	     E.2.  MessagePack
131	     E.3.  BSON
132	     E.4.  UBJSON
133	     E.5.  MSDTP: RFC 713
134	     E.6.  Conciseness on the Wire
135	   Appendix F.  Changes from RFC 7049
136	   Authors' Addresses

138	1.  Introduction

140	   There are hundreds of standardized formats for binary representation
141	   of structured data (also known as binary serialization formats).  Of
142	   those, some are for specific domains of information, while others are
143	   generalized for arbitrary data.  In the IETF, probably the best-known
144	   formats in the latter category are ASN.1's BER and DER [ASN.1].

146	   The format defined here follows some specific design goals that are
147	   not well met by current formats.  The underlying data model is an
148	   extended version of the JSON data model [RFC7159].  It is important
149	   to note that this is not a proposal that the grammar in RFC 7159 be
150	   extended in general, since doing so would cause a significant
151	   backwards incompatibility with already deployed JSON documents.
152	   Instead, this document simply defines its own data model that starts
153	   from JSON.

155	   Appendix E lists some existing binary formats and discusses how well
156	   they do or do not fit the design objectives of the Concise Binary
157	   Object Representation (CBOR).

159	1.1.  Objectives

161	   The objectives of CBOR, roughly in decreasing order of importance,
162	   are:

164	   1.  The representation must be able to unambiguously encode most
165	       common data formats used in Internet standards.

167	       *  It must represent a reasonable set of basic data types and
168	          structures using binary encoding.  "Reasonable" here is
169	          largely influenced by the capabilities of JSON, with the major
170	          addition of binary byte strings.  The structures supported are
171	          limited to arrays and trees; loops and lattice-style graphs
172	          are not supported.

174	       *  There is no requirement that all data formats be uniquely
175	          encoded; that is, it is acceptable that the number "7" might
176	          be encoded in multiple different ways.

178	   2.  The code for an encoder or decoder must be able to be compact in
179	       order to support systems with very limited memory, processor
180	       power, and instruction sets.

182	       *  An encoder and a decoder need to be implementable in a very
183	          small amount of code (for example, in class 1 constrained
184	          nodes as defined in [RFC7228]).

186	       *  The format should use contemporary machine representations of
187	          data (for example, not requiring binary-to-decimal
188	          conversion).

190	   3.  Data must be able to be decoded without a schema description.

192	       *  Similar to JSON, encoded data should be self-describing so
193	          that a generic decoder can be written.

195	   4.  The serialization must be reasonably compact, but data
196	       compactness is secondary to code compactness for the encoder and
197	       decoder.

199	       *  "Reasonable" here means no larger than the equivalent JSON,
200	          and only as small as a low-complexity implementation can
201	          achieve.  Using either general compression schemes or
202	          extensive bit-fiddling violates the complexity goals.

204	   5.  The format must be applicable to both constrained nodes and high-
205	       volume applications.

207	       *  This means it must be reasonably frugal in CPU usage for both
208	          encoding and decoding.  This is relevant both for constrained
209	          nodes and for potential usage in applications with a very high
210	          volume of data.

212	   6.  The format must support all JSON data types for conversion to and
213	       from JSON.

215	       *  It must support a reasonable level of conversion as long as
216	          the data represented is within the capabilities of JSON.  It
217	          must be possible to define a unidirectional mapping towards
218	          JSON for all types of data.

220	   7.  The format must be extensible, and the extended data must be
221	       decodable by earlier decoders.

223	       *  The format is designed for decades of use.

225	       *  The format must support a form of extensibility that allows
226	          fallback so that a decoder that does not understand an
227	          extension can still decode the message.

229	       *  The format must be able to be extended in the future by later
230	          IETF standards.

232	1.2.  Terminology

234	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
235	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
236	   document are to be interpreted as described in RFC 2119, BCP 14
237	   [RFC2119] and indicate requirement levels for compliant CBOR
238	   implementations.

240	   The term "byte" is used in its now-customary sense as a synonym for
241	   "octet".  All multi-byte values are encoded in network byte order
242	   (that is, most significant byte first, also known as "big-endian").

244	   This specification makes use of the following terminology:

246	   Data item:  A single piece of CBOR data.  The structure of a data
247	      item may contain zero, one, or more nested data items.  The term
248	      is used both for the data item in representation format and for
249	      the abstract idea that can be derived from that by a decoder.

251	   Decoder:  A process that decodes a CBOR data item and makes it
252	      available to an application.  Formally speaking, a decoder
253	      contains a parser to break up the input using the syntax rules of
254	      CBOR, as well as a semantic processor to prepare the data in a
255	      form suitable to the application.

257	   Encoder:  A process that generates the representation format of a
258	      CBOR data item from application information.

260	   Data Stream:  A sequence of zero or more data items, not further
261	      assembled into a larger containing data item.  The independent
262	      data items that make up a data stream are sometimes also referred
263	      to as "top-level data items".

265	   Well-formed:  A data item that follows the syntactic structure of
266	      CBOR.  A well-formed data item uses the initial bytes and the byte
267	      strings and/or data items that are implied by their values as
268	      defined in CBOR and is not followed by extraneous data.

270	   Valid:  A data item that is well-formed and also follows the semantic
271	      restrictions that apply to CBOR data items.

273	   Stream decoder:  A process that decodes a data stream and makes each
274	      of the data items in the sequence available to an application as
275	      they are received.

277	   Where bit arithmetic or data types are explained, this document uses
278	   the notation familiar from the programming language C, except that
279	   "**" denotes exponentiation.  Similar to the "0x" notation for
280	   hexadecimal numbers, numbers in binary notation are prefixed with
281	   "0b".  Underscores can be added to such a number solely for
282	   readability, so 0b00100001 (0x21) might be written 0b001_00001 to
283	   emphasize the desired interpretation of the bits in the byte; in this
284	   case, it is split into three bits and five bits.

286	2.  CBOR Data Models

288	   CBOR is explicit about its generic data model, which defines the set
289	   of all data items that can be represented in CBOR.  Its basic generic
290	   data model is extensible by the registration of simple type values
291	   and tags.  Applications can then subset the resulting extended
292	   generic data model to build their specific data models.

294	   Within environments that can represent the data items in the generic
295	   data model, generic CBOR encoders and decoders can be implemented
296	   (which usually involves defining additional implementation data types
297	   for those data items that do not already have a natural
298	   representation in the environment).  The ability to provide generic
299	   encoders and decoders is an explicit design goal of CBOR; however,
300	   many applications will provide their own application-specific
301	   encoders and/or decoders.

303	   In the basic (un-extended) generic data model, a data item is one of:

305	   o  an integer in the range -2**64..2**64-1 inclusive

307	   o  a simple value, identified by a number between 0 and 255, but
308	      distinct from that number

310	   o  a floating point value, distinct from an integer, out of the set
311	      representable by IEEE 754 binary64 (including non-finites)

313	   o  a sequence of zero or more bytes ("byte string")

315	   o  a sequence of zero or more Unicode code points ("text string")

317	   o  a sequence of zero or more data items ("array")

319	   o  a mapping (mathematical function) from zero or more data items
320	      ("keys") each to a data item ("values"), ("map")

322	   o  a tagged data item, comprising a tag (an integer in the range
323	      0..2**64-1) and a value (a data item)

325	   Note that integer and floating-point values are distinct in this
326	   model, even if they have the same numeric value.

328	2.1.  Extended Generic Data Models

330	   This basic generic data model comes pre-extended by the registration
331	   of a number of simple values and tags right in this document, such
332	   as:

334	   o  "false", "true", "null", and "undefined" (simple values identified
335	      by 20..23)

337	   o  integer and floating point values with a larger range and
338	      precision than the above (tags 2 to 5)

340	   o  application data types such as a point in time (tags 1, 0)

342	   Further elements of the extended generic data model can be (and have
343	   been) defined via the IANA registries created for CBOR.  Even if such
344	   an extension is unknown to a generic encoder or decoder, data items
345	   using that extension can be passed to or from the application by
346	   representing them at the interface to the application within the
347	   basic generic data model, i.e., as generic values of a simple type or
348	   generic tagged items.

350	   In other words, the basic generic data model is stable as defined in
351	   this document, while the extended generic data model expands by the
352	   registration of new simple values or tags, but never shrinks.

354	   While there is a strong expectation that generic encoders and
355	   decoders can represent "false", "true", and "null" in the form
356	   appropriate for their programming environment, implementation of the
357	   data model extensions created by tags is truly optional and a matter
358	   of implementation quality.

360	2.2.  Specific Data Models

362	   The specific data model for a CBOR-based protocol usually subsets the
363	   extended generic data model and assigns application semantics to the
364	   data items within this subset and its components.  When documenting
365	   such specific data models, where it is desired to specify the types
366	   of data items, it is preferred to identify the types by their names
367	   in the generic data model ("negative integer", "array") instead of by
368	   referring to aspects of their CBOR representation ("major type 1",
369	   "major type 4").

371	   Specific data models can also specify that values of different types
372	   are equivalent for the purposes of map keys and encoder freedom.  For
373	   example, in the generic data model, a valid map MAY have both "0" and
374	   "0.0" as keys, and an encoder MUST NOT encode "0.0" as an integer
375	   (major type 0, Section 3.1).  However, if a specific data model
376	   declares that floating point and integer representations of integral
377	   values are equivalent, map keys "0" and "0.0" would be considered
378	   duplicates and so invalid, and an encoder could encode integral-
379	   valued floats as integers or vice versa, perhaps to save encoded
380	   bytes.

382	3.  Specification of the CBOR Encoding

384	   A CBOR data item (Section 2) is encoded to or decoded from a byte
385	   string as described in this section.  The encoding is summarized in
386	   Table 5.

388	   The initial byte of each data item contains both information about
389	   the major type (the high-order 3 bits, described in Section 3.1) and
390	   additional information (the low-order 5 bits).  When the value of the
391	   additional information is less than 24, it is directly used as a
392	   small unsigned integer.  When it is 24 to 27, the additional bytes
393	   for a variable-length integer immediately follow; the values 24 to 27
394	   of the additional information specify that this integer is a 1-, 2-,
395	   4-, or 8-byte unsigned integer, respectively.  Additional information
396	   value 31 is used for indefinite-length items, described in
397	   Section 3.2.  Additional information values 28 to 30 are reserved for
398	   future expansion.

400	   In all additional information values, the resulting integer is
401	   interpreted depending on the major type.  It may represent the actual
402	   data: for example, in integer types, the resulting integer is used
403	   for the value itself.  It may instead supply length information: for
404	   example, in byte strings it gives the length of the byte string data
405	   that follows.
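
   As a rough illustration of the two paragraphs above, the following
   Python sketch splits an initial byte into its major type and
   additional information and then reads the integer that follows, if
   any.  The function name decode_head and its return shape are
   assumptions made for this example only; they are not defined by this
   document.

```python
import struct

# struct formats for the 1-, 2-, 4-, and 8-byte unsigned integers selected
# by additional information values 24 to 27 (network byte order).
_ARGUMENT_FORMATS = {24: ">B", 25: ">H", 26: ">I", 27: ">Q"}


def decode_head(data, pos=0):
    """Return (major_type, additional_info, integer, next_pos).

    `integer` is None when the additional information is 31
    (indefinite length).  Hypothetical helper, for illustration only.
    """
    initial = data[pos]
    major_type = initial >> 5            # high-order 3 bits
    info = initial & 0b000_11111         # low-order 5 bits
    pos += 1
    if info < 24:
        # The additional information is itself the small unsigned integer.
        return major_type, info, info, pos
    if info in _ARGUMENT_FORMATS:
        fmt = _ARGUMENT_FORMATS[info]
        size = struct.calcsize(fmt)
        if pos + size > len(data):
            raise ValueError("input ends inside the integer")
        (integer,) = struct.unpack_from(fmt, data, pos)
        return major_type, info, integer, pos + size
    if info == 31:
        return major_type, info, None, pos
    # Values 28 to 30 are reserved for future expansion.
    raise ValueError("reserved additional information value %d" % info)
```

   How the returned integer is used (as the value itself, as a length,
   or as a count) is left to the caller, since that depends on the
   major type.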

407	   A CBOR decoder implementation can be based on a jump table with all
408	   256 defined values for the initial byte (Table 5).  A decoder in a
409	   constrained implementation can instead use the structure of the
410	   initial byte and following bytes for more compact code (see
411	   Appendix C for a rough impression of how this could look).

413	3.1.  Major Types

415	   The following lists the major types and the additional information
416	   and other bytes associated with the type.

418	   Major type 0:  an unsigned integer.  The 5-bit additional information
419	      is either the integer itself (for additional information values 0
420	      through 23) or the length of additional data.  Additional
421	      information 24 means the value is represented in an additional
422	      uint8_t, 25 means a uint16_t, 26 means a uint32_t, and 27 means a
423	      uint64_t.  For example, the integer 10 is denoted as the one byte
424	      0b000_01010 (major type 0, additional information 10).  The
425	      integer 500 would be 0b000_11001 (major type 0, additional
426	      information 25) followed by the two bytes 0x01f4, which is 500 in
427	      decimal.

429	   Major type 1:  a negative integer.  The encoding follows the rules
430	      for unsigned integers (major type 0), except that the value is
431	      then -1 minus the encoded unsigned integer.  For example, the
432	      integer -500 would be 0b001_11001 (major type 1, additional
433	      information 25) followed by the two bytes 0x01f3, which is 499 in
434	      decimal.

436	   Major type 2:  a byte string.  The string's length in bytes is
437	      represented following the rules for unsigned integers (major type
438	      0).  For example, a byte string whose length is 5 would have an
439	      initial byte of 0b010_00101 (major type 2, additional information
440	      5 for the length), followed by 5 bytes of binary content.  A byte
441	      string whose length is 500 would have 3 initial bytes: the byte
442	      0b010_11001 (major type 2, additional information 25 to indicate a
443	      two-byte length) followed by the two bytes 0x01f4 for a length of
444	      500; these are followed by 500 bytes of binary content.

446	   Major type 3:  a text string, specifically a string of Unicode
447	      characters that is encoded as UTF-8 [RFC3629].  The format of this
448	      type is identical to that of byte strings (major type 2), that is,
449	      as with major type 2, the length gives the number of bytes.  This
450	      type is provided for systems that need to interpret or display
451	      human-readable text, and allows the differentiation between
452	      unstructured bytes and text that has a specified repertoire and
453	      encoding.  In contrast to formats such as JSON, the Unicode
454	      characters in this type are never escaped.  Thus, a newline
455	      character (U+000A) is always represented in a string as the byte
456	      0x0a, and never as the bytes 0x5c6e (the characters "\" and "n")
457	      or as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and
458	      "a").

460	   Major type 4:  an array of data items.  Arrays are also called lists,
461	      sequences, or tuples.  The array's length follows the rules for
462	      byte strings (major type 2), except that the length denotes the
463	      number of data items, not the length in bytes that the array takes
464	      up.  Items in an array do not need to all be of the same type.
465	      For example, an array that contains 10 items of any type would
466	      have an initial byte of 0b100_01010 (major type of 4, additional
467	      information of 10 for the length) followed by the 10 remaining
468	      items.

470	   Major type 5:  a map of pairs of data items.  Maps are also called
471	      tables, dictionaries, hashes, or objects (in JSON).  A map
472	      comprises pairs of data items, each pair consisting of a key
473	      that is immediately followed by a value.  The map's length follows
474	      the rules for byte strings (major type 2), except that the length
475	      denotes the number of pairs, not the length in bytes that the map
476	      takes up.  For example, a map that contains 9 pairs would have an
477	      initial byte of 0b101_01001 (major type of 5, additional
478	      information of 9 for the number of pairs) followed by the 18
479	      remaining items.  The first item is the first key, the second item
480	      is the first value, the third item is the second key, and so on.
481	      A map that has duplicate keys may be well-formed, but it is not
482	      valid, and thus it causes indeterminate decoding; see also
483	      Section 4.7.

485	   Major type 6:  optional semantic tagging of other major types.  See
486	      Section 3.4.

488	   Major type 7:  floating-point numbers and simple data types that need
489	      no content, as well as the "break" stop code.  See Section 3.3.
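
   The examples given for major types 0 through 5 can be reproduced
   with a short Python sketch.  All function names below are
   hypothetical and exist only for this illustration.  The sketch also
   always chooses the shortest form of the length or value, which is an
   assumption of the example rather than something this section
   requires.

```python
import struct


def encode_head(major_type, argument):
    """Initial byte plus any following bytes for an unsigned argument."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be 0..7")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must fit in 64 bits")
    high = major_type << 5
    if argument < 24:
        return bytes([high | argument])
    if argument < 2**8:
        return bytes([high | 24]) + struct.pack(">B", argument)
    if argument < 2**16:
        return bytes([high | 25]) + struct.pack(">H", argument)
    if argument < 2**32:
        return bytes([high | 26]) + struct.pack(">I", argument)
    return bytes([high | 27]) + struct.pack(">Q", argument)


def encode_int(value):
    """Major type 0 for value >= 0, major type 1 (-1 - n) otherwise."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)


def encode_bytes(value):
    """Major type 2: length in bytes, then the bytes themselves."""
    return encode_head(2, len(value)) + value


def encode_text(value):
    """Major type 3: length of the UTF-8 form in bytes, never escaped."""
    utf8 = value.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8


def encode_array(encoded_items):
    """Major type 4: the count is the number of items, not bytes."""
    return encode_head(4, len(encoded_items)) + b"".join(encoded_items)


def encode_map(encoded_pairs):
    """Major type 5: the count is the number of key/value pairs."""
    body = b"".join(key + value for key, value in encoded_pairs)
    return encode_head(5, len(encoded_pairs)) + body
```

   With these helpers, encode_int(500) yields the bytes 0x1901f4 and
   encode_int(-500) yields 0x3901f3, matching the examples given for
   major types 0 and 1.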

491	   These eight major types lead to a simple table showing which of the
492	   256 possible values for the initial byte of a data item are used
493	   (Table 5).

495	   In major types 6 and 7, many of the possible values are reserved for
496	   future specification.  See Section 8 for more information on these
497	   values.

499	3.2.  Indefinite Lengths for Some Major Types

501	   Four CBOR items (arrays, maps, byte strings, and text strings) can be
502	   encoded with an indefinite length using additional information value
503	   31.  This is useful if the encoding of the item needs to begin before
504	   the number of items inside the array or map, or the total length of
505	   the string, is known.  (The application of this is often referred to
506	   as "streaming" within a data item.)

508	   Indefinite-length arrays and maps are dealt with differently than
509	   indefinite-length byte strings and text strings.

511	3.2.1.  Indefinite-Length Arrays and Maps

513	   Indefinite-length arrays and maps are simply opened without
514	   indicating the number of data items that will be included in the
515	   array or map, using the additional information value of 31.  The
516	   initial major type and additional information byte is followed by the
517	   elements of the array or map, just as they would be in other arrays
518	   or maps.  The end of the array or map is indicated by encoding a
519	   "break" stop code in a place where the next data item would normally
520	   have been included.  The "break" is encoded with major type 7 and
521	   additional information value 31 (0b111_11111) but is not itself a
522	   data item: it is just a syntactic feature to close the array or map.
523	   That is, the "break" stop code comes after the last item in the array
524	   or map, and it cannot occur anywhere else in place of a data item.

526	   In this way, indefinite-length arrays and maps look identical to
527	   other arrays and maps except for beginning with the additional
528	   information value 31 and ending with the "break" stop code.
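
   A minimal Python sketch of the encoder side follows.  It assumes
   that the items arrive one at a time, already encoded, from an
   iterable whose length is not known in advance.  The name
   encode_indefinite_array is hypothetical.

```python
def encode_indefinite_array(encoded_items):
    """Yield the byte chunks of an indefinite-length array.

    `encoded_items` is any iterable of already-encoded data items;
    nothing needs to be known about how many there will be.
    """
    yield bytes([0b100_11111])    # major type 4, additional information 31
    for item in encoded_items:
        yield item                # elements appear as in any other array
    yield bytes([0b111_11111])    # "break" stop code closes the array


def small_uints(values):
    """Generator of one-byte encodings for unsigned integers 0..23."""
    for value in values:
        if not 0 <= value < 24:
            raise ValueError("sketch only covers 0..23")
        yield bytes([value])


# The inner array [4, 5] is streamed from a generator.
inner = b"".join(encode_indefinite_array(small_uints([4, 5])))
```

   Because the chunks are yielded one by one, a caller could write each
   chunk to a connection as soon as it is produced instead of joining
   them first.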

530	   Arrays and maps with indefinite lengths allow any number of items
531	   (for arrays) and key/value pairs (for maps) to be given before the
532	   "break" stop code.  There is no restriction against nesting
533	   indefinite-length array or map items.  A "break" only terminates a
534	   single item, so nested indefinite-length items need exactly as many
535	   "break" stop codes as there are type bytes starting an indefinite-
536	   length item.

538	   For example, assume an encoder wants to represent the abstract array
539	   [1, [2, 3], [4, 5]].  The definite-length encoding would be
540	   0x8301820203820405:

542	   83        -- Array of length 3
543	      01     -- 1
544	      82     -- Array of length 2
545	         02  -- 2
546	         03  -- 3
547	      82     -- Array of length 2
548	         04  -- 4
549	         05  -- 5

551	   Indefinite-length encoding could be applied independently to each of
552	   the three arrays encoded in this data item, as required, leading to
553	   representations such as:

555	   0x9f018202039f0405ffff
556	   9F        -- Start indefinite-length array
557	      01     -- 1
558	      82     -- Array of length 2
559	         02  -- 2
560	         03  -- 3
561	      9F     -- Start indefinite-length array
562	         04  -- 4
563	         05  -- 5
564	         FF  -- "break" (inner array)
565	      FF     -- "break" (outer array)

567	   0x9f01820203820405ff
568	   9F        -- Start indefinite-length array
569	      01     -- 1
570	      82     -- Array of length 2
571	         02  -- 2
572	         03  -- 3
573	      82     -- Array of length 2
574	         04  -- 4
575	         05  -- 5
576	      FF     -- "break"

578	   0x83018202039f0405ff
579	   83        -- Array of length 3
580	      01     -- 1
581	      82     -- Array of length 2
582	         02  -- 2
583	         03  -- 3
584	      9F     -- Start indefinite-length array
585	         04  -- 4
586	         05  -- 5
587	         FF  -- "break"

589	   0x83019f0203ff820405
590	   83        -- Array of length 3
591	      01     -- 1
592	      9F     -- Start indefinite-length array
593	         02  -- 2
594	         03  -- 3
595	         FF  -- "break"
596	      82     -- Array of length 2
597	         04  -- 4
598	         05  -- 5

600	   An example of an indefinite-length map (that happens to have two key/
601	   value pairs) might be:

603	   0xbf6346756ef563416d7421ff
604	   BF           -- Start indefinite-length map
605	      63        -- First key, UTF-8 string length 3
606	         46756e --   "Fun"
607	      F5        -- First value, true
608	      63        -- Second key, UTF-8 string length 3
609	         416d74 --   "Amt"
610	      21        -- Second value, -2
611	      FF        -- "break"
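
	   The array and map examples above can be checked with a small
	   decoder.  The following Python sketch is illustrative only and
	   is not part of this specification: the names BREAK, decode, and
	   loads are hypothetical, and only small integers, short text
	   strings, false/true, arrays, and maps are handled.

```python
BREAK = object()  # sentinel returned for the "break" stop code (0xff)


def decode(data: bytes, pos: int = 0):
    """Decode one item starting at data[pos]; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        return BREAK, pos
    if major == 0 and info < 24:          # small unsigned integer
        return info, pos
    if major == 1 and info < 24:          # small negative integer
        return -1 - info, pos
    if major == 3 and info < 24:          # short definite-length text
        return data[pos:pos + info].decode("utf-8"), pos + info
    if major == 7 and info in (20, 21):   # false, true
        return info == 21, pos
    if major in (4, 5) and (info < 24 or info == 31):
        items = []
        # A map of n pairs holds 2*n items; None means "until break".
        remaining = None if info == 31 else info * (2 if major == 5 else 1)
        while remaining is None or remaining > 0:
            item, pos = decode(data, pos)
            if item is BREAK:
                if remaining is not None:
                    raise ValueError('"break" in a definite-length item')
                break
            items.append(item)
            if remaining is not None:
                remaining -= 1
        if major == 4:
            return items, pos
        if len(items) % 2:
            raise ValueError("map ended after a key without a value")
        return dict(zip(items[0::2], items[1::2])), pos
    raise ValueError(f"unsupported initial byte 0x{initial:02x}")


def loads(data: bytes):
    """Decode exactly one data item occupying all of data."""
    value, end = decode(data)
    if value is BREAK or end != len(data):
        raise ValueError("not exactly one well-formed data item")
    return value
```

	   With this sketch, all five encodings of [1, [2, 3], [4, 5]]
	   shown above decode to the same nested list, and the map example
	   decodes to {"Fun": True, "Amt": -2}.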

613	3.2.2.  Indefinite-Length Byte Strings and Text Strings

615	   Indefinite-length byte strings and text strings are actually a
616	   concatenation of zero or more definite-length byte or text strings
617	   ("chunks") that are together treated as one contiguous string.
618	   Indefinite-length strings are opened with the major type and
619	   additional information value of 31, but what follows are a series of
620	   byte or text strings that have definite lengths (the chunks).  The
621	   end of the series of chunks is indicated by encoding the "break" stop
622	   code (0b111_11111) in a place where the next chunk in the series
623	   would occur.  The contents of the chunks are concatenated together,
624	   and the overall length of the indefinite-length string will be the
625	   sum of the lengths of all of the chunks.  In summary, an indefinite-
626	   length string is encoded similarly to how an indefinite-length array
627	   of its chunks would be encoded, except that the major type of the
628	   indefinite-length string is that of a (text or byte) string and
629	   matches the major types of its chunks.

631	   For indefinite-length byte strings, every data item (chunk) between
632	   the indefinite-length indicator and the "break" MUST be a definite-
633	   length byte string item; if the parser sees any item type other than
634	   a byte string before it sees the "break", it is an error.

636	   For example, assume the sequence:

638	   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

640	   5F              -- Start indefinite-length byte string
641	      44           -- Byte string of length 4
642	         aabbccdd  -- Bytes content
643	      43           -- Byte string of length 3
644	         eeff99    -- Bytes content
645	      FF           -- "break"

647	   After decoding, this results in a single byte string with seven
648	   bytes: 0xaabbccddeeff99.
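
	   The chunk handling described above can be sketched in Python as
	   follows.  This is an illustration under simplifying assumptions,
	   not a normative algorithm: the function name is hypothetical and
	   only chunk lengths of 0 to 23 bytes are covered.

```python
def decode_indefinite_bytes(data: bytes) -> bytes:
    """Concatenate the chunks of an indefinite-length byte string."""
    if not data or data[0] != 0x5F:
        raise ValueError("not an indefinite-length byte string")
    pos, out = 1, bytearray()
    while True:
        if pos >= len(data):
            raise ValueError('missing "break" stop code')
        initial = data[pos]
        pos += 1
        if initial == 0xFF:               # "break": the string is complete
            return bytes(out)
        major, info = initial >> 5, initial & 0x1F
        if major != 2 or info == 31:      # every chunk must be a
            raise ValueError(             # definite-length byte string
                "chunk is not a definite-length byte string")
        if info >= 24:
            raise NotImplementedError("longer chunks are not covered here")
        if pos + info > len(data):
            raise ValueError("chunk is truncated")
        out += data[pos:pos + info]
        pos += info
```

	   Applied to the sequence above (0x5f44aabbccdd43eeff99ff), the
	   sketch returns the seven bytes 0xaabbccddeeff99; a text string
	   chunk such as 0x6161 in the same position raises an error.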

650	   Text strings with indefinite lengths act the same as byte strings
651	   with indefinite lengths, except that all their chunks MUST be
652	   definite-length text strings.  Note that this implies that the bytes
653	   of a single UTF-8 character cannot be spread between chunks: a new
654	   chunk can only be started at a character boundary.
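
	   One way an encoder might respect the character-boundary rule is
	   to build chunks one character at a time.  The Python sketch
	   below is an assumption-laden illustration: the function name
	   and the max_chunk parameter are hypothetical, and chunks are
	   limited to 23 bytes so that each length fits in the initial
	   byte.

```python
def encode_indefinite_text(text: str, max_chunk: int = 23) -> bytes:
    """Encode text as an indefinite-length text string (0x7f ... 0xff).

    Each chunk holds at most max_chunk bytes of UTF-8 and always ends
    on a character boundary.  max_chunk must be in 4..23: a single
    character needs at most 4 bytes in UTF-8, and lengths up to 23
    fit directly in the additional information.
    """
    if not 4 <= max_chunk <= 23:
        raise ValueError("max_chunk must be between 4 and 23")
    out = bytearray([0x7F])               # start indefinite-length text
    chunk = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(chunk) + len(encoded) > max_chunk:
            out.append(0x60 | len(chunk))  # major type 3, chunk length
            out += chunk
            chunk = bytearray()
        chunk += encoded                   # a character is never split
    if chunk:
        out.append(0x60 | len(chunk))
        out += chunk
    out.append(0xFF)                       # "break"
    return bytes(out)
```

	   Because a character is only ever appended whole, every chunk
	   produced this way is itself a valid UTF-8 text string.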

656	3.3.  Floating-Point Numbers and Values with No Content

658	   Major type 7 is for two types of data: floating-point numbers and
659	   "simple values" that do not need any content.  Each value of the
660	   5-bit additional information in the initial byte has its own separate
661	   meaning, as defined in Table 1.  Like the major types for integers,
662	   items of this major type do not carry content data; all the
663	   information is in the initial bytes.

665	    +-------------+--------------------------------------------------+
666	    | 5-Bit Value | Semantics                                        |
667	    +-------------+--------------------------------------------------+
668	    | 0..23       | Simple value (value 0..23)                       |
669	    |             |                                                  |
670	    | 24          | Simple value (value 32..255 in following byte)   |
671	    |             |                                                  |
672	    | 25          | IEEE 754 Half-Precision Float (16 bits follow)   |
673	    |             |                                                  |
674	    | 26          | IEEE 754 Single-Precision Float (32 bits follow) |
675	    |             |                                                  |
676	    | 27          | IEEE 754 Double-Precision Float (64 bits follow) |
677	    |             |                                                  |
678	    | 28-30       | (Unassigned)                                     |
679	    |             |                                                  |
680	    | 31          | "break" stop code for indefinite-length items    |
681	    +-------------+--------------------------------------------------+

683	        Table 1: Values for Additional Information in Major Type 7

685	   As with all other major types, the 5-bit value 24 signifies a single-
686	   byte extension: it is followed by an additional byte to represent the
687	   simple value.  (To minimize confusion, only the values 32 to 255 are
688	   used.)  This maintains the structure of the initial bytes: as for the
689	   other major types, the length of these always depends on the
690	   additional information in the first byte.  Table 2 lists the values
691	   assigned and available for simple types.

693	                       +---------+-----------------+
694	                       | Value   | Semantics       |
695	                       +---------+-----------------+
696	                       | 0..19   | (Unassigned)    |
697	                       |         |                 |
698	                       | 20      | False           |
699	                       |         |                 |
700	                       | 21      | True            |
701	                       |         |                 |
702	                       | 22      | Null            |
703	                       |         |                 |
704	                       | 23      | Undefined value |
705	                       |         |                 |
706	                       | 24..31  | (Reserved)      |
707	                       |         |                 |
708	                       | 32..255 | (Unassigned)    |
709	                       +---------+-----------------+

711	                          Table 2: Simple Values

713	   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
714	   IEEE 754 binary floating-point values.  These floating-point values
715	   are encoded in the additional bytes of the appropriate size.  (See
716	   Appendix D for some information about 16-bit floating point.)

718	   An encoder MUST NOT encode False as the two-byte sequence of 0xf814,
719	   MUST NOT encode True as the two-byte sequence of 0xf815, MUST NOT
720	   encode Null as the two-byte sequence of 0xf816, and MUST NOT encode
721	   Undefined value as the two-byte sequence of 0xf817.  A decoder MUST
722	   treat these two-byte sequences as an error.  Similar prohibitions
723	   apply to the unassigned simple values as well.
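
	   The rules for major type 7 can be summarized in a short Python
	   sketch.  It is illustrative only; the function name and the
	   ("simple", n) / ("float", x) return convention are assumptions
	   made for this example.  The struct format characters "e", "f",
	   and "d" denote IEEE 754 binary16, binary32, and binary64.

```python
import struct


def decode_major7(data: bytes):
    """Decode one major type 7 item at the start of data."""
    initial = data[0]
    if initial >> 5 != 7:
        raise ValueError("not major type 7")
    info = initial & 0x1F
    if info < 24:
        return ("simple", info)           # e.g. 20 = False, 21 = True
    if info == 24:
        value = data[1]
        if value < 32:
            # 0xf814..0xf817 and similar two-byte forms are errors
            raise ValueError("simple value below 32 in two-byte form")
        return ("simple", value)
    if info in (25, 26, 27):
        fmt, size = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}[info]
        return ("float", struct.unpack(fmt, data[1:1 + size])[0])
    if info == 31:
        raise ValueError('"break" stop code is not a data item')
    raise ValueError("unassigned additional information value")
```

	   For instance, 0xf5 yields ("simple", 21), 0xf93e00 yields
	   ("float", 1.5), and 0xf815 is rejected.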

725	3.4.  Optional Tagging of Items

727	   In CBOR, a data item can optionally be preceded by a tag to give it
728	   additional semantics while retaining its structure.  The tag is major
729	   type 6, and represents an integer number as indicated by the tag's
730	   integer value; the (sole) data item is carried as content data.  If a
731	   tag requires structured data, this structure is encoded into the
732	   nested data item.  The definition of a tag usually restricts what
733	   kinds of nested data item or items can be carried by a tag.

735	   The initial bytes of the tag follow the rules for positive integers
736	   (major type 0).  The tag is followed by a single data item of any
737	   type.  For example, assume that a byte string of length 12 is marked
738	   with a tag to indicate it is a positive bignum (Section 3.4.2).  This
739	   would be marked as 0b110_00010 (major type 6, additional information
740	   2 for the tag) followed by 0b010_01100 (major type 2, additional
741	   information of 12 for the length) followed by the 12 bytes of the
742	   bignum.

744	   Decoders do not need to understand tags, and thus tags may be of
745	   little value in applications where the implementation creating a
746	   particular CBOR data item and the implementation decoding that stream
747	   know the semantic meaning of each item in the data flow.  Their
748	   primary purpose in this specification is to define common data types
749	   such as dates.  A secondary purpose is to allow optional tagging when
750	   the decoder is a generic CBOR decoder that might be able to benefit
751	   from hints about the content of items.  Understanding the semantic
752	   tags is optional for a decoder; it can just jump over the initial
753	   bytes of the tag and interpret the tagged data item itself.

755	   A tag always applies to the data item that directly follows it.
756	   Thus, if tag A is followed by tag B, which is followed by data item
757	   C, tag A applies to the result of applying tag B on data item C.
758	   That is, a tagged item is a data item consisting of a tag and a
759	   value.  The content of the tagged item is the data item (the value)
760	   that is being tagged.

762	   IANA maintains a registry of tag values as described in Section 8.2.
763	   Table 3 provides a list of initial values, with definitions in the
764	   rest of this section.

766	   +-----------+--------------+----------------------------------------+
767	   | Tag       | Data Item    | Semantics                              |
768	   +-----------+--------------+----------------------------------------+
769	   | 0         | UTF-8 string | Standard date/time string; see         |
770	   |           |              | Section 3.4.1                          |
771	   |           |              |                                        |
772	   | 1         | multiple     | Epoch-based date/time; see             |
773	   |           |              | Section 3.4.1                          |
774	   |           |              |                                        |
775	   | 2         | byte string  | Positive bignum; see Section 3.4.2     |
776	   |           |              |                                        |
777	   | 3         | byte string  | Negative bignum; see Section 3.4.2     |
778	   |           |              |                                        |
779	   | 4         | array        | Decimal fraction; see Section 3.4.3    |
780	   |           |              |                                        |
781	   | 5         | array        | Bigfloat; see Section 3.4.3            |
782	   |           |              |                                        |
783	   | 6..20     | (Unassigned) | (Unassigned)                           |
784	   |           |              |                                        |
785	   | 21        | multiple     | Expected conversion to base64url       |
786	   |           |              | encoding; see Section 3.4.4.2          |
787	   |           |              |                                        |
788	   | 22        | multiple     | Expected conversion to base64          |
789	   |           |              | encoding; see Section 3.4.4.2          |
790	   |           |              |                                        |
791	   | 23        | multiple     | Expected conversion to base16          |
792	   |           |              | encoding; see Section 3.4.4.2          |
793	   |           |              |                                        |
794	   | 24        | byte string  | Encoded CBOR data item; see            |
795	   |           |              | Section 3.4.4.1                        |
796	   |           |              |                                        |
797	   | 25..31    | (Unassigned) | (Unassigned)                           |
798	   |           |              |                                        |
799	   | 32        | UTF-8 string | URI; see Section 3.4.4.3               |
800	   |           |              |                                        |
801	   | 33        | UTF-8 string | base64url; see Section 3.4.4.3         |
802	   |           |              |                                        |
803	   | 34        | UTF-8 string | base64; see Section 3.4.4.3            |
804	   |           |              |                                        |
805	   | 35        | UTF-8 string | Regular expression; see                |
806	   |           |              | Section 3.4.4.3                        |
807	   |           |              |                                        |
808	   | 36        | UTF-8 string | MIME message; see Section 3.4.4.3      |
809	   |           |              |                                        |
810	   | 37..55798 | (Unassigned) | (Unassigned)                           |
811	   |           |              |                                        |
812	   | 55799     | multiple     | Self-describe CBOR; see Section 3.4.5  |
813	   |           |              |                                        |
814	   | 55800+    | (Unassigned) | (Unassigned)                           |
815	   +-----------+--------------+----------------------------------------+

817	                         Table 3: Values for Tags

819	3.4.1.  Date and Time

821	   Protocols using tag values 0 and 1 extend the generic data model
822	   (Section 2) with data items representing points in time.

824	   Tag value 0 is for date/time strings that follow the standard format
825	   described in [RFC3339], as refined by Section 3.3 of [RFC4287].

827	   Tag value 1 is for numerical representation of seconds relative to
828	   1970-01-01T00:00Z in UTC time.  (For the non-negative values that the
829	   Portable Operating System Interface (POSIX) defines, the number of
830	   seconds is counted in the same way as for POSIX "seconds since the
831	   epoch" [TIME_T].)  The tagged item can be a positive or negative
832	   integer (major types 0 and 1), or a floating-point number (major type
833	   7 with additional information 25, 26, or 27).  Note that the number
834	   can be negative (time before 1970-01-01T00:00Z) and, if a floating-
835	   point number, indicate fractional seconds.
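
	   As an illustration of tag value 1, the following Python sketch
	   encodes and decodes epoch-based times.  The function names are
	   hypothetical, and the sketch makes simplifying assumptions:
	   integers are always written in the four-byte form and floating-
	   point values always as 64-bit floats, which is well-formed but
	   not necessarily the shortest encoding.

```python
import struct
from datetime import datetime, timezone


def encode_epoch(seconds) -> bytes:
    """Encode seconds relative to 1970-01-01T00:00Z under tag 1 (0xc1)."""
    if isinstance(seconds, float):
        return b"\xc1\xfb" + struct.pack(">d", seconds)
    if 0 <= seconds < 2**32:              # major type 0, 4-byte value
        return b"\xc1\x1a" + seconds.to_bytes(4, "big")
    if -2**32 <= seconds < 0:             # major type 1 encodes -1 - n
        return b"\xc1\x3a" + (-1 - seconds).to_bytes(4, "big")
    raise NotImplementedError("outside the range covered by this sketch")


def decode_epoch(data: bytes) -> datetime:
    """Decode an item produced by encode_epoch into a UTC datetime."""
    if data[0] != 0xC1:
        raise ValueError("not tagged with tag 1")
    kind = data[1]
    if kind == 0x1A:
        seconds = int.from_bytes(data[2:6], "big")
    elif kind == 0x3A:
        seconds = -1 - int.from_bytes(data[2:6], "big")
    elif kind == 0xFB:
        seconds = struct.unpack(">d", data[2:10])[0]
    else:
        raise NotImplementedError("form not covered by this sketch")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

	   With these assumptions, 1363896240 seconds is encoded as
	   0xc11a514b67b0, which decodes to 2013-03-21T20:04:00Z.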

837	3.4.2.  Bignums

839	   Protocols using tag values 2 and 3 extend the generic data model
840	   (Section 2) with "bignums" representing arbitrary integers.  In the
841	   generic data model, bignum values are not equal to integers from the
842	   basic data model, but specific data models can define that
843	   equivalence.

845	   Bignums are encoded as a byte string data item, which is interpreted
846	   as an unsigned integer n in network byte order.  For tag value 2, the
847	   value of the bignum is n.  For tag value 3, the value of the bignum
848	   is -1 - n.  Decoders that understand these tags MUST be able to
849	   decode bignums that have leading zeroes.

851	   For example, the number 18446744073709551616 (2**64) is represented
852	   as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major
853	   type 2, length 9), followed by 0x010000000000000000 (one byte 0x01
854	   and eight bytes 0x00).  In hexadecimal:

856	   C2                        -- Tag 2
857	      49                     -- Byte string of length 9
858	         010000000000000000  -- Bytes content
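
	   The bignum rules map directly onto arbitrary-precision integer
	   support where the environment has it.  The Python sketch below
	   is illustrative, with hypothetical function names, and is
	   limited to byte strings of at most 23 bytes.

```python
def encode_bignum(value: int) -> bytes:
    """Encode value as tag 2 (n) or tag 3 (-1 - n) around a byte string."""
    tag, n = (0xC2, value) if value >= 0 else (0xC3, -1 - value)
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(content) > 23:
        raise NotImplementedError("longer bignums are not covered here")
    return bytes([tag, 0x40 | len(content)]) + content


def decode_bignum(data: bytes) -> int:
    """Decode a tag 2 or tag 3 bignum with a byte string of 0..23 bytes."""
    tag, head = data[0], data[1]
    if tag not in (0xC2, 0xC3) or head >> 5 != 2 or (head & 0x1F) > 23:
        raise ValueError("not a bignum in the form covered by this sketch")
    length = head & 0x1F
    if len(data) < 2 + length:
        raise ValueError("byte string is truncated")
    # Network byte order; leading zeroes do not change the value.
    n = int.from_bytes(data[2:2 + length], "big")
    return n if tag == 0xC2 else -1 - n
```

	   Here encode_bignum(2**64) produces 0xc249010000000000000000,
	   matching the example above, and decode_bignum accepts the same
	   content with additional leading zero bytes.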

860	3.4.3.  Decimal Fractions and Bigfloats

862	   Protocols using tag value 4 extend the generic data model with data
863	   items representing arbitrary-length decimal fractions m*(10**e).
864	   Protocols using tag value 5 extend the generic data model with data
865	   items representing arbitrary-length binary fractions m*(2**e).  As
866	   with bignums, values of different types are not equal in the generic
867	   data model.

869	   Decimal fractions combine an integer mantissa with a base-10 scaling
870	   factor.  They are most useful if an application needs the exact
871	   representation of a decimal fraction such as 1.1 because there is no
872	   exact representation for many decimal fractions in binary floating
873	   point.

875	   Bigfloats combine an integer mantissa with a base-2 scaling factor.
876	   They are binary floating-point values that can exceed the range or
877	   the precision of the three IEEE 754 formats supported by CBOR
878	   (Section 3.3).  Bigfloats may also be used by constrained
879	   applications that need some basic binary floating-point capability
880	   without the need for supporting IEEE 754.

882	   A decimal fraction or a bigfloat is represented as a tagged array
883	   that contains exactly two integer numbers: an exponent e and a
884	   mantissa m.  Decimal fractions (tag 4) use base-10 exponents; the
885	   value of a decimal fraction data item is m*(10**e).  Bigfloats (tag
886	   5) use base-2 exponents; the value of a bigfloat data item is
887	   m*(2**e).  The exponent e MUST be represented in an integer of major
888	   type 0 or 1, while the mantissa also can be a bignum (Section 3.4.2).

890	   An example of a decimal fraction is that the number 273.15 could be
891	   represented as 0b110_00100 (major type of 6 for the tag, additional
892	   information of 4 for the type of tag), followed by 0b100_00010 (major
893	   type of 4 for the array, additional information of 2 for the length
894	   of the array), followed by 0b001_00001 (major type of 1 for the first
895	   integer, additional information of 1 for the value of -2), followed
896	   by 0b000_11001 (major type of 0 for the second integer, additional
897	   information of 25 for a two-byte value), followed by
898	   0b0110101010110011 (27315 in two bytes).  In hexadecimal:

900	   C4             -- Tag 4
901	      82          -- Array of length 2
902	         21       -- -2
903	         19 6ab3  -- 27315

905	   An example of a bigfloat is that the number 1.5 could be represented
906	   as 0b110_00101 (major type of 6 for the tag, additional information
907	   of 5 for the type of tag), followed by 0b100_00010 (major type of 4
908	   for the array, additional information of 2 for the length of the
909	   array), followed by 0b001_00000 (major type of 1 for the first
910	   integer, additional information of 0 for the value of -1), followed
911	   by 0b000_00011 (major type of 0 for the second integer, additional
912	   information of 3 for the value of 3).  In hexadecimal:

914	   C5             -- Tag 5
915	      82          -- Array of length 2
916	         20       -- -1
917	         03       -- 3
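
	   The two examples above can be evaluated exactly with rational
	   arithmetic.  The following Python sketch is an illustration
	   only: the function names are hypothetical, and bignum mantissas
	   are not covered.

```python
from fractions import Fraction


def _read_int(data: bytes, pos: int):
    """Read an integer of major type 0 or 1; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if major not in (0, 1) or info > 27:
        raise ValueError("expected an integer of major type 0 or 1")
    if info < 24:
        n = info
    else:
        size = 1 << (info - 24)   # 24, 25, 26, 27 -> 1, 2, 4, 8 bytes
        n = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    return (n if major == 0 else -1 - n), pos


def decode_fraction(data: bytes) -> Fraction:
    """Return m*(10**e) for tag 4 or m*(2**e) for tag 5, exactly."""
    if data[0] not in (0xC4, 0xC5) or data[1] != 0x82:
        raise ValueError("expected tag 4 or 5 around an array of length 2")
    base = 10 if data[0] == 0xC4 else 2
    exponent, pos = _read_int(data, 2)     # first element: exponent e
    mantissa, pos = _read_int(data, pos)   # second element: mantissa m
    return mantissa * Fraction(base) ** exponent
```

	   The decimal fraction 0xc48221196ab3 evaluates to 27315/100
	   (273.15), and the bigfloat 0xc5822003 evaluates to 3/2 (1.5).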

919	   Decimal fractions and bigfloats provide no representation of
920	   Infinity, -Infinity, or NaN; if these are needed in place of a
921	   decimal fraction or bigfloat, the IEEE 754 half-precision
922	   representations from Section 3.3 can be used.  For constrained
923	   applications, where there is a choice between representing a specific
924	   number as an integer and as a decimal fraction or bigfloat (such as
925	   when the exponent is small and non-negative), there is a quality-of-
926	   implementation expectation that the integer representation is used
927	   directly.

929	3.4.4.  Content Hints

931	   The tags in this section are for content hints that might be used by
932	   generic CBOR processors.  These content hints do not extend the
933	   generic data model.

935	3.4.4.1.  Encoded CBOR Data Item

937	   Sometimes it is beneficial to carry an embedded CBOR data item that
938	   is not meant to be decoded immediately at the time the enclosing data
939	   item is being parsed.  Tag 24 (CBOR data item) can be used to tag the
940	   embedded byte string as a data item encoded in CBOR format.
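
	   As a minimal illustration, an already-encoded data item can be
	   wrapped for later decoding as shown in the Python sketch below.
	   The function name is hypothetical, and only embedded items of
	   up to 255 bytes are covered.

```python
def wrap_encoded_cbor(item: bytes) -> bytes:
    """Wrap an encoded CBOR data item in a byte string under tag 24."""
    if len(item) < 24:
        head = bytes([0x40 | len(item)])   # length in the initial byte
    elif len(item) < 256:
        head = bytes([0x58, len(item)])    # one-byte length follows
    else:
        raise NotImplementedError("longer items are not covered here")
    # Tag 24 is major type 6 with a one-byte tag value: 0xd8 0x18.
    return b"\xd8\x18" + head + item
```

	   A decoder that does not need the embedded item can skip the
	   byte string as a whole without parsing its content.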

942	3.4.4.2.  Expected Later Encoding for CBOR-to-JSON Converters

944	   Tags 21 to 23 indicate that a byte string might require a specific
945	   encoding when interoperating with a text-based representation.  These
946	   tags are useful when an encoder knows that the byte string data it is
947	   writing is likely to be later converted to a particular JSON-based
948	   usage.  That usage specifies that some strings are encoded as base64,
949	   base64url, and so on.  The encoder uses byte strings instead of doing
950	   the encoding itself to reduce the message size, to reduce the code
951	   size of the encoder, or both.  The encoder does not know whether or
952	   not the converter will be generic, and therefore wants to say what it
953	   believes is the proper way to convert binary strings to JSON.

955	   The data item tagged can be a byte string or any other data item.  In
956	   the latter case, the tag applies to all of the byte string data items
957	   contained in the data item, except for those contained in a nested
958	   data item tagged with an expected conversion.

960	   These three tag types suggest conversions to three of the base data
961	   encodings defined in [RFC4648].  For base64url encoding, padding is
962	   not used (see Section 3.2 of RFC 4648); that is, all trailing equals
963	   signs ("=") are removed from the base64url-encoded string.  Later
964	   tags might be defined for other data encodings of RFC 4648 or for
965	   other ways to encode binary data in strings.
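
	   A converter that honors these hints might apply them as in the
	   following Python sketch, which uses the standard base64 module.
	   The function name is hypothetical, and how the resulting text
	   is placed into JSON is outside the scope of the example.

```python
import base64


def expected_conversion(tag: int, raw: bytes) -> str:
    """Convert a byte string to text as suggested by tag 21, 22, or 23."""
    if tag == 21:
        # base64url; all trailing "=" padding is removed
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    if tag == 22:
        return base64.b64encode(raw).decode("ascii")   # base64
    if tag == 23:
        return base64.b16encode(raw).decode("ascii")   # base16 (hex)
    raise ValueError("not an expected-conversion tag")
```

	   For the two bytes 0xfbff, this yields "-_8" for tag 21, "+/8="
	   for tag 22, and "FBFF" for tag 23.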

967	3.4.4.3.  Encoded Text

969	   Some text strings hold data that have formats widely used on the
970	   Internet, and sometimes those formats can be validated and presented
971	   to the application in appropriate form by the decoder.  There are
972	   tags for some of these formats.

974	   o  Tag 32 is for URIs, as defined in [RFC3986];
975	   o  Tags 33 and 34 are for base64url- and base64-encoded text strings,
976	      as defined in [RFC4648];

978	   o  Tag 35 is for regular expressions in Perl Compatible Regular
979	      Expressions (PCRE) / JavaScript syntax [ECMA262];

981	   o  Tag 36 is for MIME messages (including all headers), as defined in
982	      [RFC2045].

984	   Note that tags 33 and 34 differ from 21 and 22 in that the data is
985	   transported in base-encoded form for the former and in raw byte
986	   string form for the latter.

988	3.4.5.  Self-Describe CBOR

990	   In many applications, it will be clear from the context that CBOR is
991	   being employed for encoding a data item.  For instance, a specific
992	   protocol might specify the use of CBOR, or a media type is indicated
993	   that specifies its use.  However, there may be applications where
994	   such context information is not available, such as when CBOR data is
995	   stored in a file and disambiguating metadata is not in use.  Here, it
996	   may help to have some distinguishing characteristics for the data
997	   itself.

999	   Tag 55799 is defined for this purpose.  It does not impart any
1000	   special semantics on the data item that follows; that is, the
1001	   semantics of a data item tagged with tag 55799 is exactly identical
1002	   to the semantics of the data item itself.

1004	   The serialization of this tag is 0xd9d9f7, which appears not to be in
1005	   use as a distinguishing mark for frequently used file types.  In
1006	   particular, it is not a valid start of a Unicode text in any Unicode
1007	   encoding if followed by a valid CBOR data item.

1009	   For instance, a decoder might be able to parse both CBOR and JSON.
1010	   Such a decoder would need to mechanically distinguish the two
1011	   formats.  An easy way for an encoder to help the decoder would be to
1012	   tag the entire CBOR item with tag 55799, the serialization of which
1013	   will never be found at the beginning of a JSON text.
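
	   A format sniffer based on this tag could be as simple as the
	   following Python sketch.  The names are hypothetical, and the
	   check only recognizes data that actually carries the tag; CBOR
	   data without it is not detected.

```python
SELF_DESCRIBE = bytes.fromhex("d9d9f7")   # serialization of tag 55799


def add_self_describe(item: bytes) -> bytes:
    """Prefix an encoded data item with tag 55799."""
    return SELF_DESCRIBE + item


def is_self_described_cbor(data: bytes) -> bool:
    """Report whether data starts with the serialization of tag 55799."""
    return data.startswith(SELF_DESCRIBE)


def strip_self_describe(data: bytes) -> bytes:
    """Remove a leading tag 55799; the item's semantics are unchanged."""
    if is_self_described_cbor(data):
        return data[len(SELF_DESCRIBE):]
    return data
```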

1015	3.5.  CBOR Data Models

1017	   CBOR is explicit about its generic data model, which defines the set
1018	   of all data items that can be represented in CBOR.  Its basic generic
1019	   data model is extensible by the registration of simple type values
1020	   and tags.  Applications can then subset the resulting extended
1021	   generic data model to build their specific data models.

1023	   Within environments that can represent the data items in the generic
1024	   data model, generic CBOR encoders and decoders can be implemented
1025	   (which usually involves defining additional implementation data types
1026	   for those data items that do not already have a natural
1027	   representation in the environment).  The ability to provide generic
1028	   encoders and decoders is an explicit design goal of CBOR; however,
1029	   many applications will provide their own application-specific
1030	   encoders and/or decoders.

1032	   In the basic (un-extended) generic data model, a data item is one of:

1034	   o  an integer in the range -2**64..2**64-1 inclusive

1036	   o  a simple value, identified by a number between 0 and 255, but
1037	      distinct from that number

1039	   o  a floating point value, distinct from an integer, out of the set
1040	      representable by IEEE 754 binary64 (including non-finites)

1042	   o  a sequence of zero or more bytes ("byte string")

1044	   o  a sequence of zero or more Unicode code points ("text string")

1046	   o  a sequence of zero or more data items ("array")

1048	   o  a mapping (mathematical function) from zero or more data items
1049	      ("keys"), each to a data item (its "value") ("map")

1051	   o  a tagged data item, comprising a tag (an integer in the range
1052	      0..2**64-1) and a value (a data item)

1054	   Note that integer and floating-point values are distinct in this
1055	   model, even if they have the same numeric value.

1057	   This basic generic data model comes pre-extended by the registration
1058	   of a number of simple values and tags right in this document, such
1059	   as:

1061	   o  "false", "true", "null", and "undefined" (simple values identified
1062	      by 20..23)

1064	   o  integer and floating point values with a larger range and
1065	      precision than the above (tags 2 to 5)

1067	   o  application data types such as a point in time or an RFC 3339
1068	      date/time string (tags 1, 0)

1070	   Further elements of the extended generic data model can be (and have
1071	   been) defined via the IANA registries created for CBOR.  Even if such
1072	   an extension is unknown to a generic encoder or decoder, data items
1073	   using that extension can be passed to or from the application by
1074	   representing them at the interface to the application within the
1075	   basic generic data model, i.e., as generic values of a simple type or
1076	   generic tagged items.

1078	   In other words, the basic generic data model is stable as defined in
1079	   this document, while the extended generic data model expands by the
1080	   registration of new simple values or tags, but never shrinks.

1082	   While there is a strong expectation that generic encoders and
1083	   decoders can represent "false", "true", and "null" ("undefined" is
1084	   intentionally omitted) in the form appropriate for their programming
1085	   environment, implementation of the data model extensions created by
1086	   tags is truly optional and a matter of implementation quality.

1088	   A specific data model usually subsets the extended generic data model
1089	   and assigns application semantics to the data items within this
1090	   subset and its components.  When documenting such specific data
1091	   models, where it is desired to specify the types of data items, it is
1092	   preferred to identify the types by their names in the generic data
1093	   model ("negative integer", "array") instead of by referring to
1094	   aspects of their CBOR representation ("major type 1", "major type
1095	   4").

1097	4.  Creating CBOR-Based Protocols

1099	   Data formats such as CBOR are often used in environments where there
1100	   is no format negotiation.  A specific design goal of CBOR is to not
1101	   need any included or assumed schema: a decoder can take a CBOR item
1102	   and decode it with no other knowledge.

1104	   Of course, in real-world implementations, the encoder and the decoder
1105	   will have a shared view of what should be in a CBOR data item.  For
1106	   example, an agreed-to format might be "the item is an array whose
1107	   first value is a UTF-8 string, second value is an integer, and
1108	   subsequent values are zero or more floating-point numbers" or "the
1109	   item is a map that has byte strings for keys and contains at least
1110	   one pair whose key is 0xab01".

1112	   This specification puts no restrictions on CBOR-based protocols.  An
1113	   encoder can be capable of encoding as many or as few types of values
1114	   as is required by the protocol in which it is used; a decoder can be
1115	   capable of understanding as many or as few types of values as is
1116	   required by the protocols in which it is used.  This lack of
1117	   restrictions allows CBOR to be used in extremely constrained
1118	   environments.

1120	   This section discusses some considerations in creating CBOR-based
1121	   protocols.  It is advisory only and explicitly excludes any language
1122	   from RFC 2119 other than words that could be interpreted as "MAY" in
1123	   the sense of RFC 2119.

1125	4.1.  CBOR in Streaming Applications

1127	   In a streaming application, a data stream may be composed of a
1128	   sequence of CBOR data items concatenated back-to-back.  In such an
1129	   environment, the decoder immediately begins decoding a new data item
1130	   if data is found after the end of a previous data item.
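
	   Because every data item is self-delimiting, a stream of this
	   kind can be split without further framing.  The Python sketch
	   below illustrates the idea under simplifying assumptions: the
	   whole stream is already in memory, the names are hypothetical,
	   and an incomplete trailing item is reported as an error, where
	   a real streaming decoder would wait for more data.

```python
def _head(data: bytes, pos: int):
    """Return (major, info, argument, next_pos) for the head at pos."""
    if pos >= len(data):
        raise ValueError("truncated item")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)           # 1, 2, 4, or 8 bytes follow
        if pos + size > len(data):
            raise ValueError("truncated item")
        arg = int.from_bytes(data[pos:pos + size], "big")
        return major, info, arg, pos + size
    if info == 31 and major in (2, 3, 4, 5):
        return major, info, None, pos     # indefinite length
    raise ValueError("initial byte 0x%02x cannot start an item" % initial)


def item_end(data: bytes, pos: int = 0) -> int:
    """Return the index just past the data item starting at pos."""
    major, info, arg, pos = _head(data, pos)
    if info == 31:
        count = 0
        while True:
            if pos >= len(data):
                raise ValueError('missing "break" stop code')
            if data[pos] == 0xFF:
                if major == 5 and count % 2:
                    raise ValueError("map key without a value")
                return pos + 1
            if major in (2, 3) and (data[pos] >> 5 != major
                                    or data[pos] & 0x1F == 31):
                raise ValueError("chunk is not a definite-length string")
            pos = item_end(data, pos)
            count += 1
    if major in (2, 3):                   # string content follows
        if pos + arg > len(data):
            raise ValueError("truncated item")
        return pos + arg
    if major in (0, 1, 7):                # no content after the head
        return pos
    nested = {4: arg, 5: 2 * arg, 6: 1}[major]
    for _ in range(nested):
        pos = item_end(data, pos)
    return pos


def split_items(stream: bytes):
    """Split back-to-back data items into their individual encodings."""
    items, pos = [], 0
    while pos < len(stream):
        end = item_end(stream, pos)
        items.append(stream[pos:end])
        pos = end
    return items
```

	   The sketch only finds item boundaries; it does not interpret
	   the items, so it works the same for tags and simple values it
	   does not know.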

1132	   Not all of the bytes making up a data item may be immediately
1133	   available to the decoder; some decoders will buffer additional data
1134	   until a complete data item can be presented to the application.
1135	   Other decoders can present partial information about a top-level data
1136	   item to an application, such as the nested data items that could
1137	   already be decoded, or even parts of a byte string that hasn't
1138	   completely arrived yet.

1140	   Note that some applications and protocols will not want to use
1141	   indefinite-length encoding.  Using indefinite-length encoding allows
1142	   an encoder to not need to marshal all the data for counting, but it
1143	   requires a decoder to allocate increasing amounts of memory while
1144	   waiting for the end of the item.  This might be fine for some
1145	   applications but not others.

1147	4.2.  Generic Encoders and Decoders

1149	   A generic CBOR decoder can decode all well-formed CBOR data and
1150	   present them to an application.  CBOR data is well-formed if it uses
1151	   the initial bytes, as well as the byte strings and/or data items that
1152	   are implied by their values, in the manner defined by CBOR, and no
1153	   extraneous data follows (Appendix C).

1155	   Even though CBOR attempts to minimize these cases, not all well-
1156	   formed CBOR data is valid: for example, the format excludes simple
1157	   values below 32 that are encoded with an extension byte.  Also,
1158	   specific tags may impose semantic constraints that may be violated,
1159	   such as by enclosing a tag in a bignum tag or by enclosing a byte
1160	   string in a date tag.  Finally, the data may be invalid, such as
1161	   invalid UTF-8 strings or date strings that do not conform to
1162	   [RFC3339].  There is no requirement that generic encoders and
1163	   decoders make unnatural choices for their application interface to
1164	   enable the processing of invalid data.  Generic encoders and decoders
1165	   are expected to forward simple values and tags even if their specific
1166	   codepoints are not registered at the time the encoder/decoder is
1167	   written (Section 4.5).

1169	   Generic decoders provide ways to present well-formed CBOR values,
1170	   both valid and invalid, to an application.  The diagnostic notation
1171	   (Section 7) may be used to present well-formed CBOR values to humans.

1173	   Generic encoders provide an application interface that allows the
1174	   application to specify any well-formed value, including simple values
1175	   and tags unknown to the encoder.

1177	4.3.  Syntax Errors

1179	   A decoder encountering a CBOR data item that is not well-formed
1180	   generally can choose to completely fail the decoding (issue an error
1181	   and/or stop processing altogether), substitute the problematic data
1182	   and data items using a decoder-specific convention that clearly
1183	   indicates there has been a problem, or take some other action.

1185	4.3.1.  Incomplete CBOR Data Items

1187	   The representation of a CBOR data item has a specific length,
1188	   determined by its initial bytes and by the structure of any data
1189	   items enclosed in the data items.  If less data is available, this
1190	   can be treated as a syntax error.  A decoder may also implement
1191	   incremental parsing, that is, decode the data item as far as it is
1192	   available and present the data found so far (such as in an event-
1193	   based interface), with the option of continuing the decoding once
1194	   further data is available.

1196	   Examples of incomplete data items include:

1198	   o  A decoder expects a certain number of array or map entries but
1199	      instead encounters the end of the data.

1201	   o  A decoder processes what it expects to be the last pair in a map
1202	      and comes to the end of the data.

1204	   o  A decoder has just seen a tag and then encounters the end of the
1205	      data.

1207	   o  A decoder has seen the beginning of an indefinite-length item but
1208	      encounters the end of the data before it sees the "break" stop
1209	      code.

1211	4.3.2.  Malformed Indefinite-Length Items

1213	   Examples of malformed indefinite-length data items include:

1215	   o  Within an indefinite-length byte string or text, a decoder finds
1216	      an item that is not of the appropriate major type before it finds
1217	      the "break" stop code.

1219	   o  Within an indefinite-length map, a decoder encounters the "break"
1220	      stop code immediately after reading a key (the value is missing).

1222	   Another error is finding a "break" stop code at a point in the data
1223	   where there is no immediately enclosing (unclosed) indefinite-length
1224	   item.
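
   As an illustration only, the conditions listed in Sections 4.3.1,
   4.3.2, and 4.3.3 can be sketched in Python as a routine that finds
   the end of one data item.  The names Incomplete, Malformed, item_end,
   and _indefinite_end are hypothetical and are not defined by this
   specification; the sketch checks structure only and does not decode
   values.

```python
class Incomplete(Exception):
    """The buffer ends before the data item does (Section 4.3.1)."""


class Malformed(Exception):
    """The bytes cannot form a well-formed item (Sections 4.3.2, 4.3.3)."""


def item_end(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past the data item that starts at buf[pos]."""
    if pos >= len(buf):
        raise Incomplete("no initial byte")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info == 31:
        if major in (0, 1, 6):
            raise Malformed("additional information 31 not allowed here")
        if major == 7:
            raise Malformed("break without enclosing indefinite-length item")
        return _indefinite_end(buf, pos, major)
    if info >= 28:
        raise Malformed("unassigned additional information value")
    if info < 24:
        arg = info
    else:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 bytes follow
        if pos + size > len(buf):
            raise Incomplete("argument truncated")
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    if major in (0, 1, 7):
        return pos
    if major in (2, 3):
        if pos + arg > len(buf):
            raise Incomplete("string content truncated")
        return pos + arg
    # Arrays hold arg items, maps hold arg pairs, tags wrap one item.
    nested = {4: arg, 5: 2 * arg, 6: 1}[major]
    for _ in range(nested):
        pos = item_end(buf, pos)
    return pos


def _indefinite_end(buf: bytes, pos: int, major: int) -> int:
    """Scan nested items until the "break" stop code (0xff)."""
    count = 0
    while True:
        if pos >= len(buf):
            raise Incomplete('missing "break" stop code')
        if buf[pos] == 0xFF:
            if major == 5 and count % 2:
                raise Malformed("break after a key, value is missing")
            return pos + 1
        if major in (2, 3) and (buf[pos] >> 5 != major
                                or buf[pos] & 0x1F == 31):
            raise Malformed("chunk is not a definite-length string "
                            "of the same major type")
        pos = item_end(buf, pos)
        count += 1
```

   A caller that buffers input might catch Incomplete and wait for more
   data, while treating Malformed as a syntax error; a caller handling a
   stream of concatenated items might call item_end repeatedly, starting
   each call at the offset returned by the previous one.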

1226	4.3.3.  Unknown Additional Information Values

1228	   At the time of writing, some additional information values are
1229	   unassigned and reserved for future versions of this document (see
1230	   Section 6.2).  Since the overall syntax for these additional
1231	   information values is not yet defined, a decoder that sees an
1232	   additional information value that it does not understand cannot
1233	   continue parsing.

1235	4.4.  Other Decoding Errors

1237	   A CBOR data item may be syntactically well-formed but present a
1238	   problem with interpreting the data encoded in it in the CBOR data
1239	   model.  Generally speaking, a decoder that finds a data item with
1240	   such a problem might issue a warning, might stop processing
1241	   altogether, might handle the error and make the problematic value
1242	   available to the application as such, or take some other type of
1243	   action.

1245	   Such problems might include:

1247	   Duplicate keys in a map:  Generic decoders (Section 4.2) make data
1248	      available to applications using the native CBOR data model.  That
1249	      data model includes maps (key-value mappings with unique keys),
1250	      not multimaps (key-value mappings where multiple entries can have
1251	      the same key).  Thus, a generic decoder that gets a CBOR map item
1252	      that has duplicate keys will decode to a map with only one
1253	      instance of that key, or it might stop processing altogether.  On
1254	      the other hand, a "streaming decoder" may not even be able to
1255	      notice (Section 4.7).

1257	   Inadmissible type on the value following a tag:  Tags (Section 3.4)
1258	      specify what type of data item is supposed to follow the tag; for
1259	      example, the tags for positive or negative bignums are supposed to
1260	      be put on byte strings.  A decoder that decodes the tagged data
1261	      item into a native representation (a native big integer in this
1262	      example) is expected to check the type of the data item being
1263	      tagged.  Even decoders that don't have such native representations
1264	      available in their environment may perform the check on those tags
1265	      known to them and react appropriately.

1267	   Invalid UTF-8 string:  A decoder might or might not want to verify
1268	      that the sequence of bytes in a UTF-8 string (major type 3) is
1269	      actually valid UTF-8 and react appropriately.

1271	4.5.  Handling Unknown Simple Values and Tags

1273	   A decoder that comes across a simple value (Section 3.3) that it does
1274	   not recognize, such as a value that was added to the IANA registry
1275	   after the decoder was deployed or a value that the decoder chose not
1276	   to implement, might issue a warning, might stop processing
1277	   altogether, might handle the error by making the unknown value
1278	   available to the application as such (as is expected of generic
1279	   decoders), or take some other type of action.

1281	   A decoder that comes across a tag (Section 3.4) that it does not
1282	   recognize, such as a tag that was added to the IANA registry after
1283	   the decoder was deployed or a tag that the decoder chose not to
1284	   implement, might issue a warning, might stop processing altogether,
1285	   might handle the error and present the unknown tag value together
1286	   with the contained data item to the application (as is expected of
1287	   generic decoders), might ignore the tag and simply present the
1288	   contained data item only to the application, or take some other type
1289	   of action.

1291	4.6.  Numbers

1293	   An application or protocol that uses CBOR might restrict the
1294	   representations of numbers.  For instance, a protocol that only deals
1295	   with integers might say that floating-point numbers may not be used
1296	   and that decoders of that protocol do not need to be able to handle
1297	   floating-point numbers.  Similarly, a protocol or application that
1298	   uses CBOR might say that decoders need to be able to handle either
1299	   type of number.

1301	   CBOR-based protocols should take into account that different language
1302	   environments pose different restrictions on the range and precision
1303	   of numbers that are representable.  For example, the JavaScript
1304	   number system treats all numbers as floating point, which may result
1305	   in silent loss of precision in decoding integers with more than 53
1306	   significant bits.  A protocol that uses numbers should define its
1307	   expectations on the handling of non-trivial numbers in decoders and
1308	   receiving applications.
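
   The precision limit mentioned above can be illustrated with a short
   Python sketch.  Python floats are IEEE 754 double-precision values,
   the same representation the JavaScript number system uses; the helper
   name survives_double is hypothetical.

```python
def survives_double(n: int) -> bool:
    """True if integer n converts to a double and back unchanged."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # n is too large to be represented as a double at all.
        return False
```

   Under these assumptions, 2**53 survives the round trip while
   2**53 + 1 does not, which is the silent loss of precision a protocol
   may want to rule out or document.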

1310	   A CBOR-based protocol that includes floating-point numbers can
1311	   restrict which of the three formats (half-precision, single-
1312	   precision, and double-precision) are to be supported.  For an
1313	   integer-only application, a protocol may want to completely exclude
1314	   the use of floating-point values.

1316	   A CBOR-based protocol designed for compactness may want to exclude
1317	   specific integer encodings that are longer than necessary for the
1318	   application, such as to save the need to implement 64-bit integers.
1319	   There is an expectation that encoders will use the most compact
1320	   integer representation that can represent a given value.  However, a
1321	   compact application should accept values that use a longer-than-
1322	   needed encoding (such as encoding "0" as 0b000_11001 followed by two
1323	   bytes of 0x00) as long as the application can decode an integer of
1324	   the given size.
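
   A minimal Python sketch of this expectation follows: the encoder
   picks the most compact integer representation, and the decoder
   accepts longer-than-needed encodings as well.  The function names
   encode_head, encode_int, and decode_int are hypothetical, and the
   sketch covers only major types 0 and 1.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a major type and argument in the shortest possible form."""
    if not 0 <= arg < 1 << 64:
        raise ValueError("argument does not fit in 64 bits")
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | info]) + arg.to_bytes(size, "big")
    raise AssertionError("unreachable")


def encode_int(n: int) -> bytes:
    """Encode an integer as major type 0 (n >= 0) or 1 (n < 0)."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def decode_int(data: bytes) -> int:
    """Decode one integer, accepting longer-than-needed encodings."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (0, 1) or info > 27:
        raise ValueError("not an integer data item")
    if info < 24:
        arg = info
    else:
        size = 1 << (info - 24)
        if len(data) < 1 + size:
            raise ValueError("truncated")
        arg = int.from_bytes(data[1:1 + size], "big")
    return arg if major == 0 else -1 - arg
```

   With this sketch, decode_int(bytes.fromhex("190000")) yields 0, the
   longer-than-needed encoding of "0" mentioned above, while
   encode_int(0) emits the single byte 0x00.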

1326	4.7.  Specifying Keys for Maps

1328	   The encoding and decoding applications need to agree on what types of
1329	   keys are going to be used in maps.  In applications that need to
1330	   interwork with JSON-based applications, keys probably should be
1331	   limited to UTF-8 strings only; otherwise, there has to be a specified
1332	   mapping from the other CBOR types to Unicode characters, and this
1333	   often leads to implementation errors.  In applications where keys are
1334	   numeric in nature and numeric ordering of keys is important to the
1335	   application, directly using the numbers for the keys is useful.

1337	   If multiple types of keys are to be used, consideration should be
1338	   given to how these types would be represented in the specific
1339	   programming environments that are to be used.  For example, in
1340	   JavaScript objects, a key of integer 1 cannot be distinguished from a
1341	   key of string "1".  This means that, if integer keys are used, the
1342	   simultaneous use of string keys that look like numbers needs to be
1343	   avoided.  Again, this leads to the conclusion that keys should be of
1344	   a single CBOR type.

1346	   Decoders that deliver data items nested within a CBOR data item
1347	   immediately on decoding them ("streaming decoders") often do not keep
1348	   the state that is necessary to ascertain uniqueness of a key in a
1349	   map.  Similarly, an encoder that can start encoding data items before
1350	   the enclosing data item is completely available ("streaming encoder")
1351	   may want to reduce its overhead significantly by relying on its data
1352	   source to maintain uniqueness.

1354	   A CBOR-based protocol should make an intentional decision about what
1355	   to do when a receiving application does see multiple identical keys
1356	   in a map.  The resulting rule in the protocol should respect the CBOR
1357	   data model: it cannot prescribe a specific handling of the entries
1358	   with the identical keys, except that it might have a rule that having
1359	   identical keys in a map indicates a malformed map and that the
1360	   decoder has to stop with an error.  Duplicate keys are also
1361	   prohibited by CBOR decoders that are using strict mode
1362	   (Section 4.10).
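
   One possible shape of such a rule, sketched in Python under the
   assumption that a decoder hands the application a list of already
   decoded (key, value) pairs, is shown below.  The name pairs_to_map is
   hypothetical.  Python's own dictionary equality is used as a
   stand-in for key comparison and differs from Section 4.7.1 in places
   (for instance, it treats True and 1 as the same key, and a byte
   string and a text string with the same content as different keys).

```python
def pairs_to_map(pairs):
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result
```

   A streaming decoder that does not keep the set of keys seen so far
   cannot perform this check, which is why the decision is left to the
   protocol.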

1364	   The CBOR data model for maps does not allow ascribing semantics to
1365	   the order of the key/value pairs in the map representation.  Thus, it
1366	   would be a very bad practice to define a CBOR-based protocol in such
1367	   a way that changing the key/value pair order in a map would change
1368	   the semantics, apart from trivial aspects (cache usage, etc.).  (A
1369	   CBOR-based protocol can prescribe a specific order of serialization,
1370	   such as for canonicalization.)

1372	   Applications for constrained devices that have maps with 24 or fewer
1373	   frequently used keys should consider using small integers (and those
1374	   with up to 48 frequently used keys should consider also using small
1375	   negative integers) because the keys can then be encoded in a single
1376	   byte.

1378	4.7.1.  Equivalence of Keys

1380	   This notion of equivalence must be used to determine whether keys in
1381	   maps are duplicates or distinct.

1383	   o  All numbers are compared by their numeric value.

1385	      *  Integer data items with the same value are equal regardless of
1386	         how many bytes are used to encode them.

1388	      *  Floating point data items with the same value are equal
1389	         regardless of how many bytes are used to encode them.

1391	      *  An integer value encoded as a floating point data item is
1392	         equivalent to the same value encoded as an integer.

1394	   o  Byte strings and text strings are compared by their binary
1395	      content.

1397	      *  A different length encoding has no effect on equivalence.

1399	      *  A byte string is equal to a text string if they have the same
1400	         binary content.

1402	   o  Two arrays are equal if all their items are in the same order and
1403	      equal.

1405	   o  Two maps are equal if they have the same set of pairs regardless
1406	      of their order; pairs are equal if both the key and value are
1407	      equal.

1409	   o  Tags have no effect in determining equality of a data item; if two
1410	      items are equal, then they are equal irrespective of any tags that
1411	      either or both may have.

1413	   o  Simple values are equal if they simply have the same value.

1415	   Nothing else is equal: a simple value 2 is not equivalent to an
1416	   integer 2 and an array cannot be equivalent to a map with the same
1417	   values and sequential integer keys.

1419	4.8.  Undefined Values

1421	   In some CBOR-based protocols, the simple value (Section 3.3) of
1422	   Undefined might be used by an encoder as a substitute for a data item
1423	   with an encoding problem, in order to allow the rest of the enclosing
1424	   data items to be encoded without harm.

1426	4.9.  Canonical CBOR

1428	   Some protocols may want encoders to only emit CBOR in a particular
1429	   canonical format; those protocols might also have the decoders check
1430	   that their input is canonical.  Those protocols are free to define
1431	   what they mean by a canonical format and what encoders and decoders
1432	   are expected to do.  This section defines a set of restrictions that
1433	   can serve as the base of such a canonical format.

1435	   A CBOR encoding satisfies the "core canonicalization requirements" if
1436	   it satisfies the following restrictions:

1438	   o  Integers MUST be as short as possible.  In particular:

1440	      *  0 to 23 and -1 to -24 MUST be expressed in the same byte as the
1441	         major type;

1443	      *  24 to 255 and -25 to -256 MUST be expressed only with an
1444	         additional uint8_t;

1446	      *  256 to 65535 and -257 to -65536 MUST be expressed only with an
1447	         additional uint16_t;

1449	      *  65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
1450	         only with an additional uint32_t.

1452	   o  The expression of lengths in major types 2 through 5 MUST be as
1453	      short as possible.  The rules for these lengths follow the above
1454	      rule for integers.

1456	   o  The keys in every map MUST be sorted in the bytewise lexicographic
1457	      order of their canonical encodings.  For example, the following
1458	      keys are sorted correctly:

1460	      1.  10, encoded as 0x0a.

1462	      2.  100, encoded as 0x1864.

1464	      3.  -1, encoded as 0x20.

1466	      4.  "z", encoded as 0x617a.

1468	      5.  "aa", encoded as 0x626161.

1470	      6.  [100], encoded as 0x811864.

1472	      7.  [-1], encoded as 0x8120.

1474	      8.  false, encoded as 0xf4.

1476	   o  Indefinite-length items MUST NOT appear.  They can be encoded as
1477	      definite-length items instead.
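
   The map key ordering rule above can be sketched in Python, where
   comparing bytes objects is already bytewise lexicographic.  The
   sketch assumes the keys have been encoded canonically beforehand; the
   function name sort_keys_core is hypothetical.

```python
def sort_keys_core(encoded_keys):
    """Sort canonically encoded map keys in bytewise lexicographic order."""
    return sorted(encoded_keys)


# The example keys from the list above, deliberately out of order.
example_keys = [bytes.fromhex(h) for h in
                ("f4", "8120", "626161", "0a", "811864", "20", "617a", "1864")]
sorted_hex = [k.hex() for k in sort_keys_core(example_keys)]
```

   For the example keys, sorted_hex comes out as 0a, 1864, 20, 617a,
   626161, 811864, 8120, f4, matching the order given in the list.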

1479	   If a protocol allows for IEEE floats, then additional
1480	   canonicalization rules might need to be added.  One example rule
1481	   might be to have all floats start as a 64-bit float, then do a test
1482	   conversion to a 32-bit float; if the result is the same numeric
1483	   value, use the shorter value and repeat the process with a test
1484	   conversion to a 16-bit float.  (This rule selects 16-bit float for
1485	   positive and negative Infinity as well.)  Also, there are many
1486	   representations for NaN.  If NaN is an allowed value, it must always
1487	   be represented as 0xf97e00.
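
   The example rule for floats can be sketched in Python with the
   standard struct module, whose "e", "f", and "d" format characters
   correspond to the 16-, 32-, and 64-bit IEEE 754 binary formats.  The
   function name encode_float_shortest is hypothetical, and the sketch
   implements only the example rule described above.

```python
import math
import struct


def encode_float_shortest(x: float) -> bytes:
    """Encode x as the shortest float that keeps the same numeric value."""
    if math.isnan(x):
        return bytes.fromhex("f97e00")  # the single NaN representation
    encoded = b"\xfb" + struct.pack(">d", x)  # start as a 64-bit float
    # Test conversion to 32 bits, then (if that was exact) to 16 bits.
    for initial, fmt in ((0xFA, ">f"), (0xF9, ">e")):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            break  # magnitude too large for the shorter format
        if struct.unpack(fmt, packed)[0] != x:
            break  # the shorter format would change the value
        encoded = bytes([initial]) + packed
    return encoded
```

   Under this sketch, 1.5 is encoded as 0xf93e00, 100000.0 as
   0xfa47c35000, and 1.1 as 0xfb3ff199999999999a; positive Infinity
   becomes 0xf97c00.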

1489	   CBOR tags present additional considerations for canonicalization.
1490	   The absence or presence of tags in a canonical format is determined
1491	   by the optionality of the tags in the protocol.  In a CBOR-based
1492	   protocol that allows optional tagging anywhere, the canonical format
1493	   must not allow them.  In a protocol that requires tags in certain
1494	   places, the tag needs to appear in the canonical format.  A CBOR-
1495	   based protocol that uses canonicalization might instead say that all
1496	   tags that appear in a message must be retained regardless of whether
1497	   they are optional.

1499	   Protocols that include floating, big integer, or other complex values
1500	   need to define extra requirements on their canonical encodings.  For
1501	   example:

1503	   o  If a protocol includes a field that can express floating values
1504	      (Section 3.3), the protocol's canonicalization needs to specify
1505	      whether the value 1.0 is encoded as 0x01, 0xf93c00,
1506	      0xfa3f800000, or 0xfb3ff0000000000000.  Three sensible rules for
1507	      this are:

1509	      1.  Encode integral values that fit in 64 bits as values from
1510	          major types 0 and 1, and other values as the smallest of 16-,
1511	          32-, or 64-bit floating point that accurately represents the
1512	          value,

1514	      2.  Encode all values as the smallest of 16-, 32-, or 64-bit
1515	          floating point that accurately represents the value, even for
1516	          integral values, or

1518	      3.  Encode all values as 64-bit floating point.

1520	      If NaN is an allowed value, the protocol needs to pick a single
1521	      representation, for example 0xf97e00.

1523	   o  If a protocol includes a field that can express integers larger
1524	      than 2^64 using tag 2 (Section 3.4.2), the protocol's
1525	      canonicalization needs to specify whether small integers are
1526	      expressed using the tag or major types 0 and 1.

1528	   o  A protocol might give encoders the choice of representing a URL as
1529	      either a text string or tag 32 (Section 3.4.4.3) containing a
1530	      text string.  This protocol's canonicalization needs to either
1531	      require that the tag is present or require that it's absent, not
1532	      allow either one.

1534	4.9.1.  Length-first map key ordering

1536	   The core canonicalization requirements sort map keys in a different
1537	   order from the one suggested by [RFC7049].  Protocols that need to be
1538	   compatible with [RFC7049]'s order can instead be specified in terms
1539	   of this specification's "length-first core canonicalization
1540	   requirements":

1542	   A CBOR encoding satisfies the "length-first core canonicalization
1543	   requirements" if it satisfies the core canonicalization requirements
1544	   except that the keys in every map MUST be sorted such that:

1546	   1.  If two keys have different lengths, the shorter one sorts
1547	       earlier;

1549	   2.  If two keys have the same length, the one with the lower value in
1550	       (byte-wise) lexical order sorts earlier.

1552	   For example, under the length-first core canonicalization
1553	   requirements, the following keys are sorted correctly:

1555	   1.  10, encoded as 0x0a.

1557	   2.  -1, encoded as 0x20.

1559	   3.  false, encoded as 0xf4.

1561	   4.  100, encoded as 0x1864.

1563	   5.  "z", encoded as 0x617a.

1565	   6.  [-1], encoded as 0x8120.

1567	   7.  "aa", encoded as 0x626161.

1569	   8.  [100], encoded as 0x811864.
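
   The length-first ordering can be sketched in Python by sorting on the
   pair (length, bytes), so that shorter encodings sort first and ties
   are broken bytewise.  The sketch assumes the keys are already
   encoded; the function name sort_keys_length_first is hypothetical.

```python
def sort_keys_length_first(encoded_keys):
    """Sort encoded map keys by length first, then bytewise."""
    return sorted(encoded_keys, key=lambda k: (len(k), k))


# The example keys from the list above, deliberately out of order.
example_keys = [bytes.fromhex(h) for h in
                ("811864", "617a", "f4", "626161", "0a", "8120", "1864", "20")]
length_first_hex = [k.hex() for k in sort_keys_length_first(example_keys)]
```

   For the example keys, length_first_hex comes out as 0a, 20, f4, 1864,
   617a, 8120, 626161, 811864, matching the order given in the list.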

1571	4.10.  Strict Mode

1573	   Some areas of application of CBOR do not require canonicalization
1574	   (Section 4.9) but may require that different decoders reach the same
1575	   (semantically equivalent) results, even in the presence of
1576	   potentially malicious data.  This can be required if one application
1577	   (such as a firewall or other protecting entity) makes a decision
1578	   based on the data that another application, which independently
1579	   decodes the data, relies on.

1581	   Normally, it is the responsibility of the sender to avoid ambiguously
1582	   decodable data.  However, the sender might be an attacker specially
1583	   making up CBOR data such that it will be interpreted differently by
1584	   different decoders in an attempt to exploit that as a vulnerability.
1585	   Generic decoders used in applications where this might be a problem
1586	   need to support a strict mode in which it is also the responsibility
1587	   of the receiver to reject ambiguously decodable data.  It is expected
1588	   that firewalls and other security systems that decode CBOR will only
1589	   decode in strict mode.

1591	   A decoder in strict mode will reliably reject any data that could be
1592	   interpreted by other decoders in different ways.  It will reliably
1593	   reject data items with syntax errors (Section 4.3).  It will also
1594	   expend the effort to reliably detect other decoding errors
1595	   (Section 4.4).  In particular, a strict decoder needs to have an API
1596	   that reports an error (and does not return data) for a CBOR data item
1597	   that contains any of the following:

1599	   o  a map (major type 5) that has more than one entry with the same
1600	      key

1602	   o  a tag that is used on a data item of the incorrect type

1604	   o  a data item that is incorrectly formatted for the type given to
1605	      it, such as invalid UTF-8 or data that cannot be interpreted with
1606	      the specific tag that it has been tagged with

1608	   A decoder in strict mode can do one of two things when it encounters
1609	   a tag or simple value that it does not recognize:

1611	   o  It can report an error (and not return data).

1613	   o  It can emit the unknown item (type, value, and, for tags, the
1614	      decoded tagged data item) to the application calling the decoder
1615	      with an indication that the decoder did not recognize that tag or
1616	      simple value.

1618	   The latter approach, which is also appropriate for non-strict
1619	   decoders, supports forward compatibility with newly registered tags
1620	   and simple values without the requirement to update the decoder at
1621	   the same time as the calling application.  (For this, the API for the
1622	   decoder needs to have a way to mark unknown items so that the calling
1623	   application can handle them in a manner appropriate for the program.)

1625	   Since some of this processing may have an appreciable cost (in
1626	   particular with duplicate detection for maps), support of strict mode
1627	   is not a requirement placed on all CBOR decoders.

1629	   Some encoders will rely on their applications to provide input data
1630	   in such a way that unambiguously decodable CBOR results.  A generic
1631	   encoder also may want to provide a strict mode where it reliably
1632	   limits its output to unambiguously decodable CBOR, independent of
1633	   whether or not its application is providing API-conformant data.

1635	5.  Converting Data between CBOR and JSON

1637	   This section gives non-normative advice about converting between CBOR
1638	   and JSON.  Implementations of converters are free to use whichever
1639	   advice here they want.

1641	   It is worth noting that a JSON text is a sequence of characters, not
1642	   an encoded sequence of bytes, while a CBOR data item consists of
1643	   bytes, not characters.

1645	5.1.  Converting from CBOR to JSON

1647	   Most of the types in CBOR have direct analogs in JSON.  However, some
1648	   do not, and someone implementing a CBOR-to-JSON converter has to
1649	   consider what to do in those cases.  The following non-normative
1650	   advice deals with these by converting them to a single substitute
1651	   value, such as a JSON null.

1653	   o  An integer (major type 0 or 1) becomes a JSON number.

1655	   o  A byte string (major type 2) that is not embedded in a tag that
1656	      specifies a proposed encoding is encoded in base64url without
1657	      padding and becomes a JSON string.

1659	   o  A UTF-8 string (major type 3) becomes a JSON string.  Note that
1660	      JSON requires escaping certain characters (RFC 7159, Section 7):
1661	      quotation mark (U+0022), reverse solidus (U+005C), and the "C0
1662	      control characters" (U+0000 through U+001F).  All other characters
1663	      are copied unchanged into the JSON UTF-8 string.

1665	   o  An array (major type 4) becomes a JSON array.

1667	   o  A map (major type 5) becomes a JSON object.  This is possible
1668	      directly only if all keys are UTF-8 strings.  A converter might
1669	      also convert other keys into UTF-8 strings (such as by converting
1670	      integers into strings containing their decimal representation);
1671	      however, doing so introduces a danger of key collision.

1673	   o  False (major type 7, additional information 20) becomes a JSON
1674	      false.

1676	   o  True (major type 7, additional information 21) becomes a JSON
1677	      true.

1679	   o  Null (major type 7, additional information 22) becomes a JSON
1680	      null.

1682	   o  A floating-point value (major type 7, additional information 25
1683	      through 27) becomes a JSON number if it is finite (that is, it can
1684	      be represented in a JSON number); if the value is non-finite (NaN,
1685	      or positive or negative Infinity), it is represented by the
1686	      substitute value.

1688	   o  Any other simple value (major type 7, any additional information
1689	      value not yet discussed) is represented by the substitute value.

1691	   o  A bignum (major type 6, tag value 2 or 3) is represented by
1692	      encoding its byte string in base64url without padding and becomes
1693	      a JSON string.  For tag value 3 (negative bignum), a "~" (ASCII
1694	      tilde) is inserted before the base-encoded value.  (The conversion
1695	      to a binary blob instead of a number is to prevent a likely
1696	      numeric overflow for the JSON decoder.)

1698	   o  A byte string with an encoding hint (major type 6, tag value 21
1699	      through 23) is encoded as described and becomes a JSON string.

1701	   o  For all other tags (major type 6, any other tag value), the
1702	      embedded CBOR item is represented as a JSON value; the tag value
1703	      is ignored.

1705	   o  Indefinite-length items are made definite before conversion.
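
   Some of the conversions in the list above can be sketched in Python
   using only the standard library.  The function names are
   hypothetical, and JSON null is assumed as the substitute value; a
   converter is free to choose differently.

```python
import base64
import json
import math


def b64url_nopad(data: bytes) -> str:
    """Encode a byte string in base64url without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def bignum_to_json_string(tag: int, content: bytes) -> str:
    """Represent a bignum (tag 2 or 3) as the content of a JSON string."""
    if tag not in (2, 3):
        raise ValueError("not a bignum tag")
    text = b64url_nopad(content)
    # A tilde marks the negative bignum (tag value 3).
    return "~" + text if tag == 3 else text


def float_to_json_text(x: float) -> str:
    """Finite floats become JSON numbers; others the substitute value."""
    return json.dumps(x) if math.isfinite(x) else "null"
```

   For instance, the nine-byte bignum content for 2**64 (0x01 followed
   by eight bytes of 0x00) becomes "AQAAAAAAAAAA" under tag 2 and
   "~AQAAAAAAAAAA" under tag 3.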

1707	5.2.  Converting from JSON to CBOR

1709	   All JSON values, once decoded, directly map into one or more CBOR
1710	   values.  As with any kind of CBOR generation, decisions have to be
1711	   made with respect to number representation.  In a suggested
1712	   conversion:

1714	   o  JSON numbers without fractional parts (integer numbers) are
1715	      represented as integers (major types 0 and 1, possibly major type
1716	      6 tag value 2 and 3), choosing the shortest form; integers longer
1717	      than an implementation-defined threshold (which is usually either
1718	      32 or 64 bits) may instead be represented as floating-point
1719	      values.  (If the JSON was generated from a JavaScript
1720	      implementation, its precision is already limited to 53 bits
1721	      maximum.)

1723	   o  Numbers with fractional parts are represented as floating-point
1724	      values.  Preferably, the shortest exact floating-point
1725	      representation is used; for instance, 1.5 is represented in a
1726	      16-bit floating-point value (not all implementations will be
1727	      capable of efficiently finding the minimum form, though).  There
1728	      may be an implementation-defined limit to the precision that will
1729	      affect the precision of the represented values.  Decimal
1730	      representation should only be used if that is specified in a
1731	      protocol.

1733	   CBOR has been designed to generally provide a more compact encoding
1734	   than JSON.  One implementation strategy that might come to mind is to
1735	   perform a JSON-to-CBOR encoding in place in a single buffer.  This
1736	   strategy would need to carefully consider a number of pathological
1737	   cases, such as that some strings represented with no or very few
1738	   escapes and longer (or much longer) than 255 bytes may expand when
1739	   encoded as UTF-8 strings in CBOR.  Similarly, a few of the binary
1740	   floating-point representations might cause expansion from some short
1741	   decimal representations (1.1, 1e9) in JSON.  This may be hard to get
1742	   right, and any ensuing vulnerabilities may be exploited by an
1743	   attacker.
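
   The string case can be made concrete with a small size comparison,
   sketched here in Python.  The function names are hypothetical; the
   JSON size is taken from the output of the standard json module with
   non-ASCII characters left unescaped, which is only one of several
   possible JSON serializations.

```python
import json


def cbor_text_size(s: str) -> int:
    """Size in bytes of s as a definite-length CBOR text string."""
    n = len(s.encode("utf-8"))
    if n < 24:
        head = 1
    elif n < 1 << 8:
        head = 2
    elif n < 1 << 16:
        head = 3
    elif n < 1 << 32:
        head = 5
    else:
        head = 9
    return head + n


def json_text_size(s: str) -> int:
    """Size in bytes of s as a JSON string, including the quotes."""
    return len(json.dumps(s, ensure_ascii=False).encode("utf-8"))
```

   A string of 255 ASCII letters takes 257 bytes in both forms, but a
   string of 256 ASCII letters takes 258 bytes in JSON and 259 bytes in
   CBOR, so an in-place conversion would overrun its input by one byte.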

1745	6.  Future Evolution of CBOR

1747	   Successful protocols evolve over time.  New ideas appear,
1748	   implementation platforms improve, related protocols are developed and
1749	   evolve, and new requirements from applications and protocols are
1750	   added.  Facilitating protocol evolution is therefore an important
1751	   design consideration for any protocol development.

1753	   For protocols that will use CBOR, CBOR provides some useful
1754	   mechanisms to facilitate their evolution.  Best practices for this
1755	   are well known, particularly from the development of JSON-based
1756	   protocols.  Therefore, such best practices are outside the
1757	   scope of this specification.

1759	   However, facilitating the evolution of CBOR itself is very well
1760	   within its scope.  CBOR is designed to both provide a stable basis
1761	   for development of CBOR-based protocols and to be able to evolve.
1762	   Since a successful protocol may live for decades, CBOR needs to be
1763	   designed for decades of use and evolution.  This section provides
1764	   some guidance for the evolution of CBOR.  It is necessarily more
1765	   subjective than other parts of this document.  It is also necessarily
1766	   incomplete, lest it turn into a textbook on protocol development.

1768	6.1.  Extension Points

1770	   In a protocol design, opportunities for evolution are often included
1771	   in the form of extension points.  For example, there may be a
1772	   codepoint space that is not fully allocated from the outset, and the
1773	   protocol is designed to tolerate and embrace implementations that
1774	   start using more codepoints than initially allocated.

1776	   Sizing the codepoint space may be difficult because the range
1777	   required may be hard to predict.  An attempt should be made to make
1778	   the codepoint space large enough so that it can slowly be filled over
1779	   the intended lifetime of the protocol.

1781	   CBOR has three major extension points:

1783	   o  the "simple" space (values in major type 7).  Of the 24 efficient
1784	      (and 224 slightly less efficient) values, only a small number have
1785	      been allocated.  Implementations receiving an unknown simple data
1786	      item may be able to process it as such, given that the structure
1787	      of the value is indeed simple.  The IANA registry in Section 8.1
1788	      is the appropriate way to address the extensibility of this
1789	      codepoint space.

1791	   o  the "tag" space (values in major type 6).  Again, only a small
1792	      part of the codepoint space has been allocated, and the space is
1793	      abundant (although the early numbers are more efficient than the
1794	      later ones).  Implementations receiving an unknown tag can choose
1795	      to simply ignore it or to process it as an unknown tag wrapping
1796	      the following data item.  The IANA registry in Section 8.2 is the
1797	      appropriate way to address the extensibility of this codepoint
1798	      space.

1800	   o  the "additional information" space.  An implementation receiving
1801	      an unknown additional information value has no way to continue
1802	      parsing, so allocating codepoints to this space is a major step.
1803	      There are also very few codepoints left.
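
   The efficiency remarks in the list above can be made concrete with
   a small sketch.  It is an illustration only, not part of the
   specification; the function name head_size is an assumption made
   for this example.  It computes how many bytes the initial byte and
   its argument occupy for a given tag number (or other argument).

```python
def head_size(argument: int) -> int:
    """Bytes needed for the initial byte plus argument (hypothetical helper)."""
    if argument < 0 or argument >= 1 << 64:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        return 1   # value carried directly in the additional information
    if argument < 1 << 8:
        return 2   # additional information 24, one byte follows
    if argument < 1 << 16:
        return 3   # additional information 25, two bytes follow
    if argument < 1 << 32:
        return 5   # additional information 26, four bytes follow
    return 9       # additional information 27, eight bytes follow


# Early tag numbers are cheaper to encode than later ones.
for tag in (0, 23, 24, 255, 256, 65536, 1 << 32):
    print(tag, head_size(tag))

# The 24 "efficient" simple values are 0..23; the 224 "slightly less
# efficient" ones are 32..255 and need one extra byte.
print(len(range(0, 24)), len(range(32, 256)))
```

   Under these assumptions, tags 0 to 23 cost one byte, while tags
   from 256 upward cost three bytes or more.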

1805	6.2.  Curating the Additional Information Space

1807	   The human mind is sometimes drawn to filling in little perceived gaps
1808	   to make something neat.  We expect the remaining gaps in the
1809	   codepoint space for the additional information values to be an
1810	   attractor for new ideas, just because they are there.

1812	   The present specification does not manage the additional information
1813	   codepoint space by an IANA registry.  Instead, allocations out of
1814	   this space can only be done by updating this specification.

1816	   For an additional information value of n >= 24, the size of the
1817	   additional data typically is 2**(n-24) bytes.  Therefore, additional
1818	   information values 28 and 29 should be viewed as candidates for
1819	   128-bit and 256-bit quantities, in case a need arises to add them to
1820	   the protocol.  Additional information value 30 is then the only
1821	   additional information value available for general allocation, and
1822	   there should be a very good reason for allocating it before assigning
1823	   it through an update of this protocol.
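
   The arithmetic behind this reasoning can be sketched as follows.
   This is an illustration only; the function names are assumptions
   made for this example, and the widths shown for values 28 and 29
   are only what the 2**(n-24) pattern would suggest, not an
   allocation.

```python
def argument_size(ai: int) -> int:
    """Bytes following the initial byte for the assigned values 24..27."""
    if not 24 <= ai <= 27:
        raise ValueError("only 24..27 have an assigned size")
    return 2 ** (ai - 24)


def candidate_bits(ai: int) -> int:
    """Bit width suggested by the 2**(n-24) pattern (hypothetical for 28, 29)."""
    if ai < 24:
        raise ValueError("pattern applies to n >= 24 only")
    return 8 * 2 ** (ai - 24)


for ai in (24, 25, 26, 27):
    print(ai, argument_size(ai), "byte(s)")
for ai in (28, 29):
    print(ai, candidate_bits(ai), "bits, if ever allocated")
```

   Value 31 does not follow the pattern, which is why the text above
   says "typically".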

1825	7.  Diagnostic Notation

1827	   CBOR is a binary interchange format.  To facilitate documentation and
1828	   debugging, and in particular to facilitate communication between
1829	   entities cooperating in debugging, this section defines a simple
1830	   human-readable diagnostic notation.  All actual interchange happens
1831	   in the binary format.

1833	   Note that this truly is a diagnostic format; it is not meant to be
1834	   parsed.  Therefore, no formal definition (as in ABNF) is given in
1835	   this document.  (Implementers looking for a text-based format for
1836	   representing CBOR data items in configuration files may also want to
1837	   consider YAML [YAML].)

1839	   The diagnostic notation is loosely based on JSON as it is defined in
1840	   RFC 7159, extending it where needed.

1842	   The notation borrows the JSON syntax for numbers (integer and
1843	   floating point), True (>true<), False (>false<), Null (>null<), UTF-8
1844	   strings, arrays, and maps (maps are called objects in JSON; the
1845	   diagnostic notation extends JSON here by allowing any data item in
1846	   the key position).  Undefined is written >undefined< as in
1847	   JavaScript.  The non-finite floating-point numbers Infinity,
1848	   -Infinity, and NaN are written exactly as in this sentence (this is
1849	   also a way they can be written in JavaScript, although JSON does not
1850	   allow them).  A tagged item is written as an integer number for the
1851	   tag followed by the item in parentheses; for instance, an RFC 3339
1852	   (ISO 8601) date could be notated as:

1854	      0("2013-03-21T20:04:00Z")

1856	   or the equivalent relative time as

1858	      1(1363896240)
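
   That the two notations above describe the same instant can be
   checked with a short sketch.  It is an illustration only and uses
   nothing beyond the Python standard library; the variable names are
   chosen for this example.

```python
from datetime import datetime

text_form = "2013-03-21T20:04:00Z"

# Replace the trailing "Z" with an explicit offset so that older
# versions of datetime.fromisoformat() accept the string.
moment = datetime.fromisoformat(text_form.replace("Z", "+00:00"))
epoch_seconds = int(moment.timestamp())

print(f'0("{text_form}")')
print(f"1({epoch_seconds})")
```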

1860	   Byte strings are notated in one of the base encodings, without
1861	   padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
1862	   for base32, >h32< for base32hex, >b64< for base64 or base64url (the
1863	   actual encodings do not overlap, so the string remains unambiguous).
1864	   For example, the byte string 0x12345678 could be written h'12345678',
1865	   b32'CI2FM6A', or b64'EjRWeA'.
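
   The three example notations above can be reproduced with the
   standard Python base64 module.  This is a sketch for illustration
   only; the helper name unpadded is an assumption made for this
   example.

```python
import base64

data = bytes.fromhex("12345678")


def unpadded(encoded: bytes) -> str:
    """Drop the '=' padding, which the diagnostic notation omits."""
    return encoded.decode("ascii").rstrip("=")


notations = {
    "h": unpadded(base64.b16encode(data)),
    "b32": unpadded(base64.b32encode(data)),
    "b64": unpadded(base64.b64encode(data)),
}
for prefix, body in notations.items():
    print(f"{prefix}'{body}'")
```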

1867	   Unassigned simple values are given as "simple()" with the appropriate
1868	   integer in the parentheses.  For example, "simple(42)" indicates
1869	   major type 7, value 42.

1871	7.1.  Encoding Indicators

1873	   Sometimes it is useful to indicate in the diagnostic notation which
1874	   of several alternative representations was actually used; for
1875	   example, a data item written >1.5< by a diagnostic decoder might have
1876	   been encoded as a half-, single-, or double-precision float.

1878	   The convention for encoding indicators is that an underscore,
1879	   together with all immediately following characters that are
1880	   alphanumeric or underscore, is an encoding indicator and can be
1881	   ignored by anyone not interested in this information.  Encoding
1882	   indicators are always optional.

1884	   A single underscore can be written after the opening brace of a map
1885	   or the opening bracket of an array to indicate that the data item was
1886	   represented in indefinite-length format.  For example, [_ 1, 2]
1887	   contains an indicator that an indefinite-length representation was
1888	   used to represent the data item [1, 2].

1890	   An underscore followed by a decimal digit n indicates that the
1891	   preceding item (or, for arrays and maps, the item starting with the
1892	   preceding bracket or brace) was encoded with an additional
1893	   information value of 24+n.  For example, 1.5_1 is a half-precision
1894	   floating-point number, while 1.5_3 is encoded as double precision.
1895	   This encoding indicator is not shown in Appendix A.  (Note that the
1896	   encoding indicator "_" is thus an abbreviation of the full form "_7",
1897	   which is not used.)
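
   The relationship between the digit n and the encoding can be
   sketched for floating-point values as follows.  This is an
   illustration only; the names FLOAT_FORMATS and encode_float are
   assumptions made for this example.

```python
import struct

# struct format codes for half, single, and double precision,
# keyed by the digit n of the encoding indicator _n.
FLOAT_FORMATS = {1: ">e", 2: ">f", 3: ">d"}


def encode_float(value: float, n: int) -> bytes:
    """Encode value in major type 7 with additional information 24+n."""
    initial_byte = (7 << 5) | (24 + n)
    return bytes([initial_byte]) + struct.pack(FLOAT_FORMATS[n], value)


for n in (1, 2, 3):
    print(f"1.5_{n}", "0x" + encode_float(1.5, n).hex())
```

   All three encodings would be shown as 1.5 by a diagnostic decoder
   that does not emit encoding indicators.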

1899	   As a special case, byte and text strings of indefinite length can be
1900	   notated in the form (_ h'0123', h'4567') and (_ "foo", "bar").
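
   The bytes behind this notation can be sketched as follows.  This is
   an illustration only; the function name indefinite_string is an
   assumption made for this example, and for brevity the sketch only
   accepts chunks shorter than 24 bytes.

```python
def indefinite_string(chunks, major_type):
    """Encode chunks as one indefinite-length string (major type 2 or 3)."""
    out = bytearray([(major_type << 5) | 31])        # 0x5f or 0x7f
    for chunk in chunks:
        if len(chunk) >= 24:
            raise ValueError("sketch only handles chunks shorter than 24 bytes")
        out.append((major_type << 5) | len(chunk))   # definite-length chunk head
        out += chunk
    out.append(0xFF)                                 # "break" stop code
    return bytes(out)


byte_chunks = [bytes.fromhex("0123"), bytes.fromhex("4567")]
text_chunks = ["foo".encode("utf-8"), "bar".encode("utf-8")]
print(indefinite_string(byte_chunks, 2).hex())
print(indefinite_string(text_chunks, 3).hex())
```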

1902	8.  IANA Considerations

1904	   IANA has created two registries for new CBOR values.  The registries
1905	   are separate, that is, not under an umbrella registry, and follow the
1906	   rules in [RFC5226].  IANA has also assigned a new MIME media type and
1907	   an associated Constrained Application Protocol (CoAP) Content-Format
1908	   entry.

1910	8.1.  Simple Values Registry

1912	   IANA has created the "Concise Binary Object Representation (CBOR)
1913	   Simple Values" registry.  The initial values are shown in Table 2.

1915	   New entries in the range 0 to 19 are assigned by Standards Action.
1916	   It is suggested that these Standards Actions allocate values starting
1917	   with the number 16 in order to reserve the lower numbers for
1918	   contiguous blocks (if any).

1920	   New entries in the range 32 to 255 are assigned by Specification
1921	   Required.

1923	8.2.  Tags Registry

1925	   IANA has created the "Concise Binary Object Representation (CBOR)
1926	   Tags" registry.  The initial values are shown in Table 3.

1928	   New entries in the range 0 to 23 are assigned by Standards Action.
1929	   New entries in the range 24 to 255 are assigned by Specification
1930	   Required.  New entries in the range 256 to 18446744073709551615 are
1931	   assigned by First Come First Served.  The template for registration
1932	   requests is:

1934	   o  Data item

1936	   o  Semantics (short form)

1938	   In addition, First Come First Served requests should include:

1940	   o  Point of contact

1942	   o  Description of semantics (URL) - This description is optional; the
1943	      URL can point to something like an Internet-Draft or a web page.
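
   The range boundaries given above for the Tags registry can be
   summarized in a short sketch.  This is an illustration only; the
   function name tag_registration_policy is an assumption made for
   this example and is not part of any registry tooling.

```python
MAX_TAG = 18446744073709551615  # largest tag number, 2**64 - 1


def tag_registration_policy(tag: int) -> str:
    """Return the registration policy that applies to a tag number."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag numbers are unsigned 64-bit integers")
    if tag <= 23:
        return "Standards Action"
    if tag <= 255:
        return "Specification Required"
    return "First Come First Served"


for tag in (0, 23, 24, 255, 256, MAX_TAG):
    print(tag, tag_registration_policy(tag))
```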

1945	8.3.  Media Type ("MIME Type")

1947	   The Internet media type [RFC6838] for CBOR data is application/cbor.

1949	   Type name: application

1951	   Subtype name: cbor

1953	   Required parameters: n/a

1955	   Optional parameters: n/a

1957	   Encoding considerations:  binary

1959	   Security considerations:  See Section 9 of this document

1961	   Interoperability considerations: n/a

1963	   Published specification: This document

1965	   Applications that use this media type:  None yet, but it is expected
1966	      that this format will be deployed in protocols and applications.

1968	   Additional information:
1969	     Magic number(s): n/a
1970	     File extension(s): .cbor
1971	     Macintosh file type code(s): n/a

1973	   Person & email address to contact for further information:
1974	     Carsten Bormann
1975	     cabo@tzi.org

1977	   Intended usage: COMMON

1979	   Restrictions on usage: none

1981	   Author:
1982	     Carsten Bormann <cabo@tzi.org>

1984	   Change controller:
1985	     The IESG 

1987	8.4.  CoAP Content-Format

1989	   Media Type: application/cbor

1991	   Encoding: -

1993	   Id: 60

1995	   Reference: [RFCthis]

1997	8.5.  The +cbor Structured Syntax Suffix Registration

1999	   Name: Concise Binary Object Representation (CBOR)

2001	   +suffix: +cbor

2003	   References: [RFCthis]

2005	   Encoding Considerations: CBOR is a binary format.

2007	   Interoperability Considerations: n/a

2008	   Fragment Identifier Considerations:
2009	     The syntax and semantics of fragment identifiers specified for
2010	     +cbor SHOULD be as specified for "application/cbor".  (At
2011	     publication of this document, there is no fragment identification
2012	     syntax defined for "application/cbor".)

2014	     The syntax and semantics for fragment identifiers for a specific
2015	     "xxx/yyy+cbor" SHOULD be processed as follows:

2017	     For cases defined in +cbor, where the fragment identifier resolves
2018	     per the +cbor rules, then process as specified in +cbor.

2020	     For cases defined in +cbor, where the fragment identifier does
2021	     not resolve per the +cbor rules, then process as specified in
2022	     "xxx/yyy+cbor".

2024	     For cases not defined in +cbor, then process as specified in
2025	     "xxx/yyy+cbor".

2027	   Security Considerations:  See Section 9 of this document

2029	   Contact:
2030	     Apps Area Working Group (apps-discuss@ietf.org)

2032	   Author/Change Controller:
2033	     The Apps Area Working Group.
2034	     The IESG has change control over this registration.

2036	9.  Security Considerations

2038	   A network-facing application can exhibit vulnerabilities in its
2039	   processing logic for incoming data.  Complex parsers are well known
2040	   as a likely source of such vulnerabilities, such as the ability to
2041	   remotely crash a node, or even remotely execute arbitrary code on it.
2042	   CBOR attempts to narrow the opportunities for introducing such
2043	   vulnerabilities by reducing parser complexity, by giving the entire
2044	   range of encodable values a meaning where possible.

2046	   Resource exhaustion attacks might attempt to lure a decoder into
2047	   allocating very big data items (strings, arrays, maps) or exhaust the
2048	   stack depth by setting up deeply nested items.  Decoders need to have
2049	   appropriate resource management to mitigate these attacks.  (Items
2050	   for which very large sizes are given can also attempt to exploit
2051	   integer overflow vulnerabilities.)
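
   One possible shape of such resource management is sketched below.
   This is an illustration under stated assumptions, not a complete
   decoder: the names MAX_DEPTH, ResourceLimitError, read_head, and
   skip_item are made up for this example, the depth limit of 32 is
   arbitrary, and indefinite-length items are rejected for brevity.

```python
MAX_DEPTH = 32  # hypothetical nesting limit chosen for illustration


class ResourceLimitError(ValueError):
    """Raised when input declares more than the decoder is willing to handle."""


def read_head(data: bytes, pos: int):
    """Return (major type, argument, position after the head)."""
    if pos >= len(data):
        raise ValueError("truncated input")
    major, ai = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, pos
    if ai > 27:
        raise ValueError("indefinite or reserved value not handled in this sketch")
    size = 1 << (ai - 24)
    if size > len(data) - pos:
        raise ValueError("truncated input")
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size


def skip_item(data: bytes, pos: int = 0, depth: int = 0) -> int:
    """Skip one definite-length item and return the position after it."""
    if depth > MAX_DEPTH:
        raise ResourceLimitError("nesting too deep")
    major, arg, pos = read_head(data, pos)
    if major in (0, 1, 7):
        return pos                      # integers, simple values, floats
    if major in (2, 3):
        # Compare against the remaining input instead of computing
        # pos + arg, which could overflow in fixed-width languages.
        if arg > len(data) - pos:
            raise ResourceLimitError("declared string length exceeds input")
        return pos + arg
    if major == 6:
        return skip_item(data, pos, depth + 1)
    # Arrays hold arg items; maps hold arg key/value pairs.
    count = arg if major == 4 else 2 * arg
    if count > len(data) - pos:         # each item needs at least one byte
        raise ResourceLimitError("declared item count exceeds input")
    for _ in range(count):
        pos = skip_item(data, pos, depth + 1)
    return pos
```

   The length checks happen before anything is allocated, so a head
   that announces a huge string or array is rejected based only on the
   amount of input actually present.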

2053	   Applications where a CBOR data item is examined by a gatekeeper
2054	   function and later used by a different application may exhibit
2055	   vulnerabilities when multiple interpretations of the data item are
2056	   possible.  For example, an attacker could make use of duplicate keys
2057	   in maps and precision issues in numbers to make the gatekeeper base
2058	   its decisions on a different interpretation than the one that will be
2059	   used by the second application.  Protocols that are used in a
2060	   security context should be defined in such a way that these multiple
2061	   interpretations are reliably reduced to a single one.  To facilitate
2062	   this, encoder and decoder implementations used in such contexts
2063	   should provide at least one strict mode of operation (Section 4.10).
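
   The duplicate-key case can be illustrated with a short sketch.  It
   assumes a decoder that hands map entries to the application as a
   list of key/value pairs; the function names and the example keys
   are made up for this illustration.

```python
def first_wins(pairs):
    """Interpretation A: keep the first value seen for each key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result


def last_wins(pairs):
    """Interpretation B: later values overwrite earlier ones."""
    return dict(pairs)


def strict_map(pairs):
    """Strict interpretation: reject maps that contain duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


decoded_pairs = [("role", "guest"), ("role", "admin")]
print(first_wins(decoded_pairs))   # what a gatekeeper might see
print(last_wins(decoded_pairs))    # what a later consumer might see
```

   A gatekeeper using the first interpretation and an application
   using the second would disagree about the same data item; the
   strict variant removes the ambiguity by rejecting the input.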

2065	10.  Acknowledgements

2067	   CBOR was inspired by MessagePack.  MessagePack was developed and
2068	   promoted by Sadayuki Furuhashi ("frsyuki").  This reference to
2069	   MessagePack is solely for attribution; CBOR is not intended as a
2070	   version of or replacement for MessagePack, as it has different design
2071	   goals and requirements.

2073	   The need for functionality beyond the original MessagePack
2074	   specification became obvious to many people at about the same time,
2075	   around the year 2012.  BinaryPack is a minor derivation of
2076	   MessagePack that was developed by Eric Zhang for the binaryjs
2077	   project.  A similar, but different, extension was made by Tim Caswell
2078	   for his msgpack-js and msgpack-js-browser projects.  Many people have
2079	   contributed to the recent discussion about extending MessagePack to
2080	   separate text string representation from byte string representation.

2082	   The encoding of the additional information in CBOR was inspired by
2083	   the encoding of length information designed by Klaus Hartke for CoAP.

2085	   This document also incorporates suggestions made by many people,
2086	   notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, Matthew
2087	   Lepinski, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray,
2088	   Tony Finch, Tony Hansen, and Yaron Sheffer.

2090	11.  References

2092	11.1.  Normative References

2094	   [ECMA262]  European Computer Manufacturers Association, "ECMAScript
2095	              Language Specification 5.1 Edition", ECMA Standard ECMA-
2096	              262, June 2011.

2100	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
2101	              Extensions (MIME) Part One: Format of Internet Message
2102	              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996.

2105	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2106	              Requirement Levels", BCP 14, RFC 2119,
2107	              DOI 10.17487/RFC2119, March 1997.

2110	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2111	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.

2114	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
2115	              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2116	              2003.

2118	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
2119	              Resource Identifier (URI): Generic Syntax", STD 66,
2120	              RFC 3986, DOI 10.17487/RFC3986, January 2005.

2123	   [RFC4287]  Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
2124	              Syndication Format", RFC 4287, DOI 10.17487/RFC4287,
2125	              December 2005.

2127	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
2128	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

2131	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
2132	              IANA Considerations Section in RFCs", RFC 5226,
2133	              DOI 10.17487/RFC5226, May 2008.

2136	   [TIME_T]   The Open Group Base Specifications, "Vol. 1: Base
2137	              Definitions, Issue 7", Section 4.15 'Seconds Since the
2138	              Epoch', IEEE Std 1003.1, 2013 Edition, 2013.

2142	11.2.  Informative References

2144	   [ASN.1]    International Telecommunication Union, "Information
2145	              Technology -- ASN.1 encoding rules: Specification of Basic
2146	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
2147	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
2148	              X.690, 1994.

2150	   [BSON]     Various, "BSON - Binary JSON", 2013.

2153	   [MessagePack]
2154	              Furuhashi, S., "MessagePack", 2013.

2156	   [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
2157	              Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976.

2160	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
2161	              Specifications and Registration Procedures", BCP 13,
2162	              RFC 6838, DOI 10.17487/RFC6838, January 2013.

2165	   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
2166	              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
2167	              October 2013.

2169	   [RFC7159]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
2170	              Interchange Format", RFC 7159, DOI 10.17487/RFC7159, March
2171	              2014.

2173	   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
2174	              Constrained-Node Networks", RFC 7228,
2175	              DOI 10.17487/RFC7228, May 2014.

2178	   [UBJSON]   The Buzz Media, "Universal Binary JSON Specification",
2179	              2013.

2181	   [YAML]     Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup
2182	              Language (YAML[TM]) Version 1.2", 3rd Edition, October
2183	              2009.

2185	Appendix A.  Examples

2187	   The following table provides some CBOR-encoded values in hexadecimal
2188	   (right column), together with diagnostic notation for these values
2189	   (left column).  Note that the string "\u00fc" is one form of
2190	   diagnostic notation for a UTF-8 string containing the single Unicode
2191	   character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut).
2192	   Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
2193	   single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often
2194	   representing "water"), and "\ud800\udd51" is a UTF-8 string in
2195	   diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
2196	   ATTIC FIFTY STATERS).  (Note that all these single-character strings
2197	   could also be represented in native UTF-8 in diagnostic notation,
2198	   just not in an ASCII-only specification like the present one.)  In
2199	   the diagnostic notation provided for bignums, their intended numeric
2200	   value is shown as a decimal number (such as 18446744073709551616)
2201	   instead of showing a tagged byte string (such as
2202	   2(h'010000000000000000')).

2204	   +------------------------------+------------------------------------+
2205	   | Diagnostic                   | Encoded                            |
2206	   +------------------------------+------------------------------------+
2207	   | 0                            | 0x00                               |
2208	   |                              |                                    |
2209	   | 1                            | 0x01                               |
2210	   |                              |                                    |
2211	   | 10                           | 0x0a                               |
2212	   |                              |                                    |
2213	   | 23                           | 0x17                               |
2214	   |                              |                                    |
2215	   | 24                           | 0x1818                             |
2216	   |                              |                                    |
2217	   | 25                           | 0x1819                             |
2218	   |                              |                                    |
2219	   | 100                          | 0x1864                             |
2220	   |                              |                                    |
2221	   | 1000                         | 0x1903e8                           |
2222	   |                              |                                    |
2223	   | 1000000                      | 0x1a000f4240                       |
2224	   |                              |                                    |
2225	   | 1000000000000                | 0x1b000000e8d4a51000               |
2226	   |                              |                                    |
2227	   | 18446744073709551615         | 0x1bffffffffffffffff               |
2228	   |                              |                                    |
2229	   | 18446744073709551616         | 0xc249010000000000000000           |
2230	   |                              |                                    |
2231	   | -18446744073709551616        | 0x3bffffffffffffffff               |
2232	   |                              |                                    |
2233	   | -18446744073709551617        | 0xc349010000000000000000           |
2234	   |                              |                                    |
2235	   | -1                           | 0x20                               |
2236	   |                              |                                    |
2237	   | -10                          | 0x29                               |
2238	   |                              |                                    |
2239	   | -100                         | 0x3863                             |
2240	   |                              |                                    |
2241	   | -1000                        | 0x3903e7                           |
2242	   |                              |                                    |
2243	   | 0.0                          | 0xf90000                           |
2244	   |                              |                                    |
2245	   | -0.0                         | 0xf98000                           |
2246	   |                              |                                    |
2247	   | 1.0                          | 0xf93c00                           |
2248	   |                              |                                    |
2249	   | 1.1                          | 0xfb3ff199999999999a               |
2250	   |                              |                                    |
2251	   | 1.5                          | 0xf93e00                           |
2252	   |                              |                                    |
2253	   | 65504.0                      | 0xf97bff                           |
2254	   |                              |                                    |
2255	   | 100000.0                     | 0xfa47c35000                       |
2256	   |                              |                                    |
2257	   | 3.4028234663852886e+38       | 0xfa7f7fffff                       |
2258	   |                              |                                    |
2259	   | 1.0e+300                     | 0xfb7e37e43c8800759c               |
2260	   |                              |                                    |
2261	   | 5.960464477539063e-8         | 0xf90001                           |
2262	   |                              |                                    |
2263	   | 0.00006103515625             | 0xf90400                           |
2264	   |                              |                                    |
2265	   | -4.0                         | 0xf9c400                           |
2266	   |                              |                                    |
2267	   | -4.1                         | 0xfbc010666666666666               |
2268	   |                              |                                    |
2269	   | Infinity                     | 0xf97c00                           |
2270	   |                              |                                    |
2271	   | NaN                          | 0xf97e00                           |
2272	   |                              |                                    |
2273	   | -Infinity                    | 0xf9fc00                           |
2274	   |                              |                                    |
2275	   | Infinity                     | 0xfa7f800000                       |
2276	   |                              |                                    |
2277	   | NaN                          | 0xfa7fc00000                       |
2278	   |                              |                                    |
2279	   | -Infinity                    | 0xfaff800000                       |
2280	   |                              |                                    |
2281	   | Infinity                     | 0xfb7ff0000000000000               |
2282	   |                              |                                    |
2283	   | NaN                          | 0xfb7ff8000000000000               |
2284	   |                              |                                    |
2285	   | -Infinity                    | 0xfbfff0000000000000               |
2286	   |                              |                                    |
2287	   | false                        | 0xf4                               |
2288	   |                              |                                    |
2289	   | true                         | 0xf5                               |
2290	   |                              |                                    |
2291	   | null                         | 0xf6                               |
2292	   |                              |                                    |
2293	   | undefined                    | 0xf7                               |
2294	   |                              |                                    |
2295	   | simple(16)                   | 0xf0                               |
2296	   |                              |                                    |
2297	   | simple(24)                   | 0xf818                             |
2298	   |                              |                                    |
2299	   | simple(255)                  | 0xf8ff                             |
2300	   |                              |                                    |
2301	   | 0("2013-03-21T20:04:00Z")    | 0xc074323031332d30332d32315432303a |
2302	   |                              | 30343a30305a                       |
2303	   |                              |                                    |
2304	   | 1(1363896240)                | 0xc11a514b67b0                     |
2305	   |                              |                                    |
2306	   | 1(1363896240.5)              | 0xc1fb41d452d9ec200000             |
2307	   |                              |                                    |
2308	   | 23(h'01020304')              | 0xd74401020304                     |
2309	   |                              |                                    |
2310	   | 24(h'6449455446')            | 0xd818456449455446                 |
2311	   |                              |                                    |
2312	   | 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
2313	   |                              | 616d706c652e636f6d                 |
2314	   |                              |                                    |
2315	   | h''                          | 0x40                               |
2316	   |                              |                                    |
2317	   | h'01020304'                  | 0x4401020304                       |
2318	   |                              |                                    |
2319	   | ""                           | 0x60                               |
2320	   |                              |                                    |
2321	   | "a"                          | 0x6161                             |
2322	   |                              |                                    |
2323	   | "IETF"                       | 0x6449455446                       |
2324	   |                              |                                    |
2325	   | "\"\\"                       | 0x62225c                           |
2326	   |                              |                                    |
2327	   | "\u00fc"                     | 0x62c3bc                           |
2328	   |                              |                                    |
2329	   | "\u6c34"                     | 0x63e6b0b4                         |
2330	   |                              |                                    |
2331	   | "\ud800\udd51"               | 0x64f0908591                       |
2332	   |                              |                                    |
2333	   | []                           | 0x80                               |
2334	   |                              |                                    |
2335	   | [1, 2, 3]                    | 0x83010203                         |
2336	   |                              |                                    |
2337	   | [1, [2, 3], [4, 5]]          | 0x8301820203820405                 |
2338	   |                              |                                    |
2339	   | [1, 2, 3, 4, 5, 6, 7, 8, 9,  | 0x98190102030405060708090a0b0c0d0e |
2340	   | 10, 11, 12, 13, 14, 15, 16,  | 0f101112131415161718181819         |
2341	   | 17, 18, 19, 20, 21, 22, 23,  |                                    |
2342	   | 24, 25]                      |                                    |
2343	   |                              |                                    |
2344	   | {}                           | 0xa0                               |
2345	   |                              |                                    |
2346	   | {1: 2, 3: 4}                 | 0xa201020304                       |
2347	   |                              |                                    |
2348	   | {"a": 1, "b": [2, 3]}        | 0xa26161016162820203               |
2349	   |                              |                                    |
2350	   | ["a", {"b": "c"}]            | 0x826161a161626163                 |
2351	   |                              |                                    |
2352	   | {"a": "A", "b": "B", "c":    | 0xa5616161416162614261636143616461 |
2353	   | "C", "d": "D", "e": "E"}     | 4461656145                         |
2354	   |                              |                                    |
2355	   | (_ h'0102', h'030405')       | 0x5f42010243030405ff               |
2356	   |                              |                                    |
2357	   | (_ "strea", "ming")          | 0x7f657374726561646d696e67ff       |
2358	   |                              |                                    |
2359	   | [_ ]                         | 0x9fff                             |
2360	   |                              |                                    |
2361	   | [_ 1, [2, 3], [_ 4, 5]]      | 0x9f018202039f0405ffff             |
2362	   |                              |                                    |
2363	   | [_ 1, [2, 3], [4, 5]]        | 0x9f01820203820405ff               |
2364	   |                              |                                    |
2365	   | [1, [2, 3], [_ 4, 5]]        | 0x83018202039f0405ff               |
2366	   |                              |                                    |
2367	   | [1, [_ 2, 3], [4, 5]]        | 0x83019f0203ff820405               |
2368	   |                              |                                    |
2369	   | [_ 1, 2, 3, 4, 5, 6, 7, 8,   | 0x9f0102030405060708090a0b0c0d0e0f |
2370	   | 9, 10, 11, 12, 13, 14, 15,   | 101112131415161718181819ff         |
2371	   | 16, 17, 18, 19, 20, 21, 22,  |                                    |
2372	   | 23, 24, 25]                  |                                    |
2373	   |                              |                                    |
2374	   | {_ "a": 1, "b": [_ 2, 3]}    | 0xbf61610161629f0203ffff           |
2375	   |                              |                                    |
2376	   | ["a", {_ "b": "c"}]          | 0x826161bf61626163ff               |
2377	   |                              |                                    |
2378	   | {_ "Fun": true, "Amt": -2}   | 0xbf6346756ef563416d7421ff         |
2379	   +------------------------------+------------------------------------+

2381	               Table 4: Examples of Encoded CBOR Data Items
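
The indefinite-length string rows of Table 4 can be reproduced with a few lines of code. The following is a minimal sketch, assuming Python 3; the function name chunked_string and its parameters are illustrative only and are not defined by this document. It handles only chunks shorter than 24 bytes, so that each chunk length fits in the initial byte.

```python
def chunked_string(chunks, text):
    """Encode chunks as one indefinite-length byte or text string.

    chunks: list of bytes objects (text=False) or str objects (text=True).
    Only chunks shorter than 24 bytes are supported in this sketch.
    """
    mt = 3 if text else 2                 # major type 3 = text, 2 = bytes
    out = bytes([(mt << 5) | 31])         # additional information 31: indefinite
    for chunk in chunks:
        raw = chunk.encode("utf-8") if text else chunk
        if len(raw) >= 24:
            raise ValueError("sketch only supports chunks under 24 bytes")
        out += bytes([(mt << 5) | len(raw)]) + raw   # definite-length chunk
    return out + b"\xff"                  # "break" stop code


print(chunked_string(["strea", "ming"], text=True).hex())
print(chunked_string([b"\x01\x02", b"\x03\x04\x05"], text=False).hex())
```

The two printed values correspond to the (_ "strea", "ming") and (_ h'0102', h'030405') rows of the table.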

2383	Appendix B.  Jump Table

2385	   For brevity, this jump table does not show initial bytes that are
2386	   reserved for future extension.  It also only shows a selection of the
2387	   initial bytes that can be used for optional features.  (All unsigned
2388	   integers are in network byte order.)

2390	   +------------+------------------------------------------------------+
2391	   | Byte       | Structure/Semantics                                  |
2392	   +------------+------------------------------------------------------+
2393	   | 0x00..0x17 | Unsigned integer 0x00..0x17 (0..23)                  |
2394	   |            |                                                      |
2395	   | 0x18       | Unsigned integer (one-byte uint8_t follows)          |
2396	   |            |                                                      |
2397	   | 0x19       | Unsigned integer (two-byte uint16_t follows)         |
2398	   |            |                                                      |
2399	   | 0x1a       | Unsigned integer (four-byte uint32_t follows)        |
2400	   |            |                                                      |
2401	   | 0x1b       | Unsigned integer (eight-byte uint64_t follows)       |
2402	   |            |                                                      |
2403	   | 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24)          |
2404	   |            |                                                      |
2405	   | 0x38       | Negative integer -1-n (one-byte uint8_t for n        |
2406	   |            | follows)                                             |
2407	   |            |                                                      |
2408	   | 0x39       | Negative integer -1-n (two-byte uint16_t for n       |
2409	   |            | follows)                                             |
2410	   |            |                                                      |
2411	   | 0x3a       | Negative integer -1-n (four-byte uint32_t for n      |
2412	   |            | follows)                                             |
2413	   |            |                                                      |
2414	   | 0x3b       | Negative integer -1-n (eight-byte uint64_t for n     |
2415	   |            | follows)                                             |
2416	   |            |                                                      |
2417	   | 0x40..0x57 | byte string (0x00..0x17 bytes follow)                |
2418	   |            |                                                      |
2419	   | 0x58       | byte string (one-byte uint8_t for n, and then n      |
2420	   |            | bytes follow)                                        |
2421	   |            |                                                      |
2422	   | 0x59       | byte string (two-byte uint16_t for n, and then n     |
2423	   |            | bytes follow)                                        |
2424	   |            |                                                      |
2425	   | 0x5a       | byte string (four-byte uint32_t for n, and then n    |
2426	   |            | bytes follow)                                        |
2427	   |            |                                                      |
2428	   | 0x5b       | byte string (eight-byte uint64_t for n, and then n   |
2429	   |            | bytes follow)                                        |
2430	   |            |                                                      |
2431	   | 0x5f       | byte string, byte strings follow, terminated by      |
2432	   |            | "break"                                              |
2433	   |            |                                                      |
2434	   | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow)               |
2435	   |            |                                                      |
2436	   | 0x78       | UTF-8 string (one-byte uint8_t for n, and then n     |
2437	   |            | bytes follow)                                        |
2438	   |            |                                                      |
2439	   | 0x79       | UTF-8 string (two-byte uint16_t for n, and then n    |
2440	   |            | bytes follow)                                        |
2441	   |            |                                                      |
2442	   | 0x7a       | UTF-8 string (four-byte uint32_t for n, and then n   |
2443	   |            | bytes follow)                                        |
2444	   |            |                                                      |
2445	   | 0x7b       | UTF-8 string (eight-byte uint64_t for n, and then n  |
2446	   |            | bytes follow)                                        |
2447	   |            |                                                      |
2448	   | 0x7f       | UTF-8 string, UTF-8 strings follow, terminated by    |
2449	   |            | "break"                                              |
2450	   |            |                                                      |
2451	   | 0x80..0x97 | array (0x00..0x17 data items follow)                 |
2452	   |            |                                                      |
2453	   | 0x98       | array (one-byte uint8_t for n, and then n data items |
2454	   |            | follow)                                              |
2455	   |            |                                                      |
2456	   | 0x99       | array (two-byte uint16_t for n, and then n data      |
2457	   |            | items follow)                                        |
2458	   |            |                                                      |
2459	   | 0x9a       | array (four-byte uint32_t for n, and then n data     |
2460	   |            | items follow)                                        |
2461	   |            |                                                      |
2462	   | 0x9b       | array (eight-byte uint64_t for n, and then n data    |
2463	   |            | items follow)                                        |
2464	   |            |                                                      |
2465	   | 0x9f       | array, data items follow, terminated by "break"      |
2466	   |            |                                                      |
2467	   | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow)          |
2468	   |            |                                                      |
2469	   | 0xb8       | map (one-byte uint8_t for n, and then n pairs of     |
2470	   |            | data items follow)                                   |
2471	   |            |                                                      |
2472	   | 0xb9       | map (two-byte uint16_t for n, and then n pairs of    |
2473	   |            | data items follow)                                   |
2474	   |            |                                                      |
2475	   | 0xba       | map (four-byte uint32_t for n, and then n pairs of   |
2476	   |            | data items follow)                                   |
2477	   |            |                                                      |
2478	   | 0xbb       | map (eight-byte uint64_t for n, and then n pairs of  |
2479	   |            | data items follow)                                   |
2480	   |            |                                                      |
2481	   | 0xbf       | map, pairs of data items follow, terminated by       |
2482	   |            | "break"                                              |
2483	   |            |                                                      |
2484	   | 0xc0       | Text-based date/time (data item follows; see         |
2485	   |            | Section 3.4.1)                                       |
2486	   |            |                                                      |
2487	   | 0xc1       | Epoch-based date/time (data item follows; see        |
2488	   |            | Section 3.4.1)                                       |
2489	   |            |                                                      |
2490	   | 0xc2       | Positive bignum (data item "byte string" follows)    |
2491	   |            |                                                      |
2492	   | 0xc3       | Negative bignum (data item "byte string" follows)    |
2493	   |            |                                                      |
2494	   | 0xc4       | Decimal Fraction (data item "array" follows; see     |
2495	   |            | Section 3.4.3)                                       |
2496	   |            |                                                      |
2497	   | 0xc5       | Bigfloat (data item "array" follows; see             |
2498	   |            | Section 3.4.3)                                       |
2499	   |            |                                                      |
2500	   | 0xc6..0xd4 | (tagged item)                                        |
2501	   |            |                                                      |
2502	   | 0xd5..0xd7 | Expected Conversion (data item follows; see          |
2503	   |            | Section 3.4.4.2)                                     |
2504	   |            |                                                      |
2505	   | 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data    |
2506	   |            | item follow)                                         |
2507	   |            |                                                      |
2508	   | 0xe0..0xf3 | (simple value)                                       |
2509	   |            |                                                      |
2510	   | 0xf4       | False                                                |
2511	   |            |                                                      |
2512	   | 0xf5       | True                                                 |
2513	   |            |                                                      |
2514	   | 0xf6       | Null                                                 |
2515	   |            |                                                      |
2516	   | 0xf7       | Undefined                                            |
2517	   |            |                                                      |
2518	   | 0xf8       | (simple value, one byte follows)                     |
2519	   |            |                                                      |
2520	   | 0xf9       | Half-Precision Float (two-byte IEEE 754)             |
2521	   |            |                                                      |
2522	   | 0xfa       | Single-Precision Float (four-byte IEEE 754)          |
2523	   |            |                                                      |
2524	   | 0xfb       | Double-Precision Float (eight-byte IEEE 754)         |
2525	   |            |                                                      |
2526	   | 0xff       | "break" stop code                                    |
2527	   +------------+------------------------------------------------------+

2529	                   Table 5: Jump Table for Initial Byte
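
The regularity behind Table 5 comes from the split of the initial byte into a 3-bit major type and 5 bits of additional information. The following minimal sketch, assuming Python 3, makes that split explicit; the names split_initial_byte, describe, and MAJOR_TYPE_NAMES are illustrative only. It describes the initial byte in isolation and does not check whether the combination is actually permitted (for example, additional information 31 is not valid for major types 0, 1, and 6).

```python
MAJOR_TYPE_NAMES = [
    "unsigned integer", "negative integer", "byte string", "UTF-8 string",
    "array", "map", "tag", "simple/float",
]


def split_initial_byte(ib):
    """Return (major type, additional information) for an initial byte."""
    return ib >> 5, ib & 0x1f


def describe(ib):
    """Give a rough textual description of an initial byte."""
    mt, ai = split_initial_byte(ib)
    if ai < 24:
        arg = "value %d in initial byte" % ai
    elif ai < 28:
        arg = "%d-byte argument follows" % (1 << (ai - 24))
    elif ai < 31:
        arg = "reserved"
    else:
        arg = "indefinite length or break"
    return "%s, %s" % (MAJOR_TYPE_NAMES[mt], arg)


for ib in (0x17, 0x19, 0x7b, 0x9f, 0xff):
    print("0x%02x: %s" % (ib, describe(ib)))
```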

2531	Appendix C.  Pseudocode

2533	   The well-formedness of a CBOR item can be checked by the pseudocode
2534	   in Figure 1.  The data is well-formed if and only if:

2536	   o  the pseudocode does not "fail";

2538	   o  after execution of the pseudocode, no bytes are left in the input
2539	      (except in streaming applications).

2541	   The pseudocode has the following prerequisites:

2543	   o  take(n) reads n bytes from the input data and returns them as a
2544	      byte string.  If n bytes are no longer available, take(n) fails.

2546	   o  uint() converts a byte string into an unsigned integer by
2547	      interpreting the byte string in network byte order.

2549	   o  Arithmetic works as in C.

2551	   o  All variables are unsigned integers of sufficient range.

2553	   well_formed (breakable = false) {
2554	     // process initial bytes
2555	     ib = uint(take(1));
2556	     mt = ib >> 5;
2557	     val = ai = ib & 0x1f;
2558	     switch (ai) {
2559	       case 24: val = uint(take(1)); break;
2560	       case 25: val = uint(take(2)); break;
2561	       case 26: val = uint(take(4)); break;
2562	       case 27: val = uint(take(8)); break;
2563	       case 28: case 29: case 30: fail();
2564	       case 31:
2565	         return well_formed_indefinite(mt, breakable);
2566	     }
2567	     // process content
2568	     switch (mt) {
2569	       // case 0, 1, 7 do not have content; just use val
2570	       case 2: case 3: take(val); break; // bytes/UTF-8
2571	       case 4: for (i = 0; i < val; i++) well_formed(); break;
2572	       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
2573	       case 6: well_formed(); break;     // 1 embedded data item
2574	     }
2575	     return mt;                    // finite data item
2576	   }

2578	   well_formed_indefinite(mt, breakable) {
2579	     switch (mt) {
2580	       case 2: case 3:
2581	         while ((it = well_formed(true)) != -1)
2582	           if (it != mt)           // need finite embedded
2583	             fail();               //    of same type
2584	         break;
2585	       case 4: while (well_formed(true) != -1); break;
2586	       case 5: while (well_formed(true) != -1) well_formed(); break;
2587	       case 7:
2588	         if (breakable)
2589	           return -1;              // signal break out
2590	         else fail();              // no enclosing indefinite
2591	       default: fail();            // wrong mt
2592	     }
2593	     return 0;                     // no break out
2594	   }

2596	              Figure 1: Pseudocode for Well-Formedness Check

2598	   Note that the remaining complexity of a complete CBOR decoder is
2599	   about presenting data that has been parsed to the application in an
2600	   appropriate form.
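
For experimentation, the pseudocode of Figure 1 can be transliterated fairly directly into a runnable form. The following is a sketch only, assuming Python 3; the names Checker, NotWellFormed, BREAK, and is_well_formed are illustrative and not part of this specification. An exception takes the place of fail(), and the wrapper applies the "no bytes left" condition for the non-streaming case.

```python
class NotWellFormed(Exception):
    """Raised where the pseudocode would call fail()."""


BREAK = -1  # returned when a "break" stop code ends an indefinite item


class Checker:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise NotWellFormed("input truncated")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable=False):
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1f
        val = ai
        if 24 <= ai <= 27:            # 1-, 2-, 4-, or 8-byte argument
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.indefinite(mt, breakable)
        if mt in (2, 3):              # byte string or UTF-8 string
            self.take(val)
        elif mt == 4:                 # array of val items
            for _ in range(val):
                self.well_formed()
        elif mt == 5:                 # map of val key/value pairs
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:                 # tag: one embedded data item
            self.well_formed()
        return mt                     # finite data item

    def indefinite(self, mt, breakable):
        if mt in (2, 3):
            while True:
                it = self.well_formed(True)
                if it == BREAK:
                    break
                if it != mt:          # chunks must be finite, same type
                    raise NotWellFormed("bad chunk in indefinite string")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()    # value belonging to the key just read
        elif mt == 7:
            if breakable:
                return BREAK
            raise NotWellFormed("break without enclosing indefinite item")
        else:
            raise NotWellFormed("indefinite length not allowed here")
        return 0                      # no break out


def is_well_formed(data):
    checker = Checker(data)
    try:
        checker.well_formed()
    except NotWellFormed:
        return False
    return checker.pos == len(data)   # no bytes may be left over


print(is_well_formed(bytes.fromhex("9f018202039f0405ffff")))
print(is_well_formed(bytes.fromhex("9f01")))
```

The first input is an example from Table 4; the second is the same kind of array cut off before its "break" stop code.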

2602	   Major types 0 and 1 are designed in such a way that they can be
2603	   encoded in C from a signed integer without actually doing an if-then-
2604	   else for positive/negative (Figure 2).  This uses the fact that
2605	   (-1-n), the transformation for major type 1, is the same as ~n
2606	   (bitwise complement) in C unsigned arithmetic; ~n can then be
2607	   expressed as (-1)^n for the negative case, while 0^n leaves n
2608	   unchanged for non-negative.  The sign of a number can be converted to
2609	   -1 for negative and 0 for non-negative (0 or positive) by arithmetic-
2610	   shifting the number by one bit less than the bit length of the number
2611	   (for example, by 63 for 64-bit numbers).

2613	   void encode_sint(int64_t n) {
2614	     uint64_t ui = n >> 63;   // extend sign to whole length
2615	     unsigned mt = ui & 0x20; // extract (shifted) major type
2616	     ui ^= n;                 // complement negatives
2617	     if (ui < 24)
2618	       *p++ = mt + ui;
2619	     else if (ui < 256) {
2620	       *p++ = mt + 24;
2621	       *p++ = ui;
2622	     } else
2623	          ...

2625	            Figure 2: Pseudocode for Encoding a Signed Integer
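
The branch-free sign handling of Figure 2 can be tried out without a C compiler. The following minimal sketch, assuming Python 3, emulates 64-bit unsigned arithmetic with an explicit mask and also covers the longer argument sizes that Figure 2 elides; the function name mirrors the figure, but the code is an illustration, not a reference implementation.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def encode_sint(n):
    """Encode a signed 64-bit integer as a major type 0 or 1 data item."""
    if not -2**63 <= n < 2**63:
        raise ValueError("value outside the int64_t range")
    ui = (n >> 63) & MASK64    # all ones if negative, zero otherwise
    mt = ui & 0x20             # major type, already shifted into place
    ui ^= n & MASK64           # complement negatives: yields -1-n
    if ui < 24:
        return bytes([mt + ui])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if ui < 1 << (8 * size):
            return bytes([mt + ai]) + ui.to_bytes(size, "big")
    # not reached: ui always fits in 8 bytes after masking


for n in (0, 23, 24, -1, -2, -25, 1000, -1000):
    print(n, encode_sint(n).hex())
```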

2627	Appendix D.  Half-Precision

2629	   As half-precision floating-point numbers were only added to IEEE 754
2630	   in 2008, today's programming platforms often still have only limited
2631	   support for them.  It is nevertheless easy to include at least
2632	   decoding support, even on platforms that lack native support.  An
2633	   example of a small decoder for half-precision floating-point numbers
2634	   in the C language is shown in Figure 3.  A similar program for Python
2635	   is in Figure 4; this code assumes that the 2-byte value has already
2636	   been decoded as an (unsigned short) integer in network byte order (as
2637	   would be done by the pseudocode in Appendix C).

2639	   #include <math.h>

2641	   double decode_half(unsigned char *halfp) {
2642	     int half = (halfp[0] << 8) + halfp[1];
2643	     int exp = (half >> 10) & 0x1f;
2644	     int mant = half & 0x3ff;
2645	     double val;
2646	     if (exp == 0) val = ldexp(mant, -24);
2647	     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
2648	     else val = mant == 0 ? INFINITY : NAN;
2649	     return half & 0x8000 ? -val : val;
2650	   }

2652	               Figure 3: C Code for a Half-Precision Decoder

2654	   import struct
2655	   from math import ldexp

2657	   def decode_single(single):
2658	       return struct.unpack("!f", struct.pack("!I", single))[0]

2660	   def decode_half(half):
2661	       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
2662	       if ((half & 0x7c00) != 0x7c00):
2663	           return ldexp(decode_single(valu), 112)
2664	       return decode_single(valu | 0x7f800000)

2666	            Figure 4: Python Code for a Half-Precision Decoder
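
Where a platform does offer half-precision support, it can serve as a cross-check for a hand-written decoder. The sketch below, assuming Python 3.6 or later (where the struct module provides the "e" format for IEEE 754 binary16), restates the logic of Figure 3 on an integer input and compares it against struct for every 16-bit pattern. The function names decode_half_bits and decode_half_struct are illustrative only.

```python
import math
import struct


def decode_half_bits(half):
    """Decode a 16-bit integer holding IEEE 754 half-precision bits."""
    exp = (half >> 10) & 0x1f
    mant = half & 0x3ff
    if exp == 0:
        val = math.ldexp(mant, -24)              # zero and subnormals
    elif exp != 31:
        val = math.ldexp(mant + 1024, exp - 25)  # normal numbers
    else:
        val = math.inf if mant == 0 else math.nan
    return -val if half & 0x8000 else val


def decode_half_struct(half):
    """Decode the same bits using the struct module's "e" format."""
    return struct.unpack("!e", struct.pack("!H", half))[0]


mismatches = 0
for half in range(0x10000):
    a, b = decode_half_bits(half), decode_half_struct(half)
    same = (math.isnan(a) and math.isnan(b)) or a == b
    if not same:
        mismatches += 1
print("mismatches:", mismatches)
```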

2668	Appendix E.  Comparison of Other Binary Formats to CBOR's Design
2669	             Objectives

2671	   The proposal for CBOR follows a history of binary formats that is as
2672	   long as the history of computers themselves.  Different formats have
2673	   had different objectives.  In most cases, the objectives of the
2674	   format were never stated, although they can sometimes be implied by
2675	   the context where the format was first used.  Some formats were meant
2676	   to be universally usable, although history has proven that no binary
2677	   format meets the needs of all protocols and applications.

2679	   CBOR differs from many of these formats in that it starts with a set
2680	   of objectives and attempts to meet just those.  This section
2681	   compares a few of the dozens of formats with CBOR's objectives in
2682	   order to help the reader decide if they want to use CBOR or a
2683	   different format for a particular protocol or application.

2685	   Note that the discussion here is not meant to be a criticism of any
2686	   format: to the best of our knowledge, no format before CBOR was meant
2687	   to cover CBOR's objectives in the priority we have assigned them.  A
2688	   brief recap of the objectives from Section 1.1 is:

2690	   1.  unambiguous encoding of most common data formats from Internet
2691	       standards

2693	   2.  code compactness for encoder or decoder

2695	   3.  no schema description needed

2697	   4.  reasonably compact serialization

2699	   5.  applicability to constrained and unconstrained applications

2701	   6.  good JSON conversion

2703	   7.  extensibility

2705	E.1.  ASN.1 DER, BER, and PER

2707	   [ASN.1] has many serializations.  In the IETF, DER and BER are the
2708	   most common.  The serialized output is not particularly compact for
2709	   many items, and the code needed to decode numeric items can be
2710	   complex on a constrained device.

2712	   Few (if any) IETF protocols have adopted one of the several variants
2713	   of Packed Encoding Rules (PER).  There could be many reasons for
2714	   this, but one that is commonly stated is that PER makes use of the
2715	   schema even for parsing the surface structure of the data stream,
2716	   requiring significant tool support.  There are different versions of
2717	   the ASN.1 schema language in use, which has also hampered adoption.

2719	E.2.  MessagePack

2721	   [MessagePack] is a concise, widely implemented counted binary
2722	   serialization format, similar in many properties to CBOR, although
2723	   somewhat less regular.  While the data model can be used to represent
2724	   JSON data, MessagePack has also been used in many remote procedure
2725	   call (RPC) applications and for long-term storage of data.

2727	   MessagePack has been essentially stable since it was first published
2728	   around 2011; it has not yet had a transition.  The evolution of
2729	   MessagePack is impeded by an imperative to maintain complete
2730	   backwards compatibility with existing stored data, while only a few
2731	   bytecodes are still available for extension.  Repeated requests over
2732	   the years from the MessagePack user community to separate out binary
2733	   and text strings in the encoding recently have led to an extension
2734	   proposal that would leave MessagePack's "raw" data ambiguous between
2735	   its usages for binary and text data.  The extension mechanism for
2736	   MessagePack remains unclear.

2738	E.3.  BSON

2740	   [BSON] is a data format that was developed for the storage of JSON-
2741	   like maps (JSON objects) in the MongoDB database.  Its major
2742	   distinguishing feature is the capability for in-place update,
2743	   forgoing a compact representation.  BSON uses a counted
2744	   representation except for map keys, which are null-byte terminated.
2745	   While BSON can be used for the representation of JSON-like objects on
2746	   the wire, its specification is dominated by the requirements of the
2747	   database application and has become somewhat baroque.  The status of
2748	   how BSON extensions will be implemented remains unclear.

2750	E.4.  UBJSON

2752	   [UBJSON] has a design goal to make JSON faster and somewhat smaller,
2753	   using a binary format that is limited to exactly the data model JSON
2754	   uses.  Thus, there is expressly no intention to support, for example,
2755	   binary data; however, there is a "high-precision number", expressed
2756	   as a character string in JSON syntax.  UBJSON is not optimized for
2757	   code compactness, and its type byte coding is optimized for human
2758	   recognition and not for compact representation of native types such
2759	   as small integers.  Although UBJSON is mostly counted, it provides a
2760	   reserved "unknown-length" value to support streaming of arrays and
2761	   maps (JSON objects).  Within these containers, UBJSON also has a
2762	   "Noop" type for padding.

2764	E.5.  MSDTP: RFC 713

2766	   Message Services Data Transmission Protocol (MSDTP) is a very early
2767	   example of a compact message format; it is described in [RFC0713],
2768	   written in 1976.  It is included here for its historical value, not
2769	   because it was ever widely used.

2771	E.6.  Conciseness on the Wire

2773	   While CBOR's design objective of code compactness for encoders and
2774	   decoders is a higher priority than its objective of conciseness on
2775	   the wire, many people focus on the wire size.  Table 6 shows some
2776	   encoding examples for the simple nested array [1, [2, 3]]; where some
2777	   form of indefinite-length encoding is supported by the encoding,
2778	   [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.

2780	   +-------------+--------------------------+--------------------------+
2781	   | Format      | [1, [2, 3]]              | [_ 1, [2, 3]]            |
2782	   +-------------+--------------------------+--------------------------+
2783	   | RFC 713     | c2 05 81 c2 02 82 83     |                          |
2784	   |             |                          |                          |
2785	   | ASN.1 BER   | 30 0b 02 01 01 30 06 02  | 30 80 02 01 01 30 06 02  |
2786	   |             | 01 02 02 01 03           | 01 02 02 01 03 00 00     |
2787	   |             |                          |                          |
2788	   | MessagePack | 92 01 92 02 03           |                          |
2789	   |             |                          |                          |
2790	   | BSON        | 22 00 00 00 10 30 00 01  |                          |
2791	   |             | 00 00 00 04 31 00 13 00  |                          |
2792	   |             | 00 00 10 30 00 02 00 00  |                          |
2793	   |             | 00 10 31 00 03 00 00 00  |                          |
2794	   |             | 00 00                    |                          |
2795	   |             |                          |                          |
2796	   | UBJSON      | 61 02 42 01 61 02 42 02  | 61 ff 42 01 61 02 42 02  |
2797	   |             | 42 03                    | 42 03 45                 |
2798	   |             |                          |                          |
2799	   | CBOR        | 82 01 82 02 03           | 9f 01 82 02 03 ff        |
2800	   +-------------+--------------------------+--------------------------+

2802	           Table 6: Examples for Different Levels of Conciseness
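
The CBOR row of Table 6 can be reproduced with a very small encoder. The following is a minimal sketch, assuming Python 3; the names head and encode are illustrative only, and the encoder handles just non-negative integers and (nested) lists, which is all the example needs.

```python
def head(mt, n):
    """Encode major type mt with argument n in the shortest form."""
    if n < 24:
        return bytes([(mt << 5) | n])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(mt << 5) | ai]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode(item, indefinite=False):
    """Encode a non-negative int or a list of such items.

    If indefinite is true, only the outermost list uses
    indefinite-length encoding; nested lists stay definite.
    """
    if isinstance(item, int):
        return head(0, item)              # major type 0: unsigned integer
    body = b"".join(encode(x) for x in item)
    if indefinite:
        return b"\x9f" + body + b"\xff"   # array start ... "break"
    return head(4, len(item)) + body      # major type 4: array


print(encode([1, [2, 3]]).hex())
print(encode([1, [2, 3]], indefinite=True).hex())
```

The two outputs are five and six bytes long, respectively; the extra byte in the second is the "break" stop code.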

2804	Appendix F.  Changes from RFC 7049

2806	   The following is a list of known changes from RFC 7049.  This list is
2807	   non-authoritative.  It is meant to help reviewers see the significant
2808	   differences.

2810	   o  Updated reference for [RFC4627] to [RFC7159] in many places

2812	   o  Updated reference for [CNN-TERMS] to [RFC7228]

2814	   o  Added a comment to the last example in Section 2.2.1 (added
2815	      "Second value")

2817	   o  Fixed a bug in the example in Section 2.4.2 ("29" -> "49")

2819	   o  Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" ->
2820	      "0b000_11001")

2822	Authors' Addresses
2823	   Carsten Bormann
2824	   Universitaet Bremen TZI
2825	   Postfach 330440
2826	   D-28359 Bremen
2827	   Germany

2829	   Phone: +49-421-218-63921
2830	   EMail: cabo@tzi.org

2832	   Paul Hoffman
2833	   ICANN

2835	   EMail: paul.hoffman@icann.org