idnits 2.17.1

draft-ietf-cbor-7049bis-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (8 March 2020) is 1510 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '2' on line 2757

  -- Looks like a reference, but probably isn't: '3' on line 2757

  -- Looks like a reference, but probably isn't: '4' on line 2755

  -- Looks like a reference, but probably isn't: '5' on line 2755

  -- Looks like a reference, but probably isn't: '100' on line 1518

  == Missing Reference: '-1' is mentioned on line 1514, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 3044

  == Missing Reference: 'RFCthis' is mentioned on line 2281, but not defined

  == Missing Reference: 'TM' is mentioned on line 2576, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 3060

  == Missing Reference: 'RFC4627' is mentioned on line 3202, but not defined

  ** Obsolete undefined reference: RFC 4627 (Obsoleted by RFC 7158, RFC 7159)

  == Missing Reference: 'CNN-TERMS' is mentioned on line 3203, but not defined

  == Unused Reference: 'RFC8746' is defined on line 2560, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECMA262'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IEEE754'

  -- Obsolete informational reference (is this intentional?): RFC 7049
     (Obsoleted by RFC 8949)


     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Obsoletes: 7049 (if approved)                                 P. Hoffman
5	Intended status: Standards Track                                   ICANN
6	Expires: 9 September 2020                                   8 March 2020

8	              Concise Binary Object Representation (CBOR)
9	                       draft-ietf-cbor-7049bis-13

11	Abstract

13	   The Concise Binary Object Representation (CBOR) is a data format
14	   whose design goals include the possibility of extremely small code
15	   size, fairly small message size, and extensibility without the need
16	   for version negotiation.  These design goals make it different from
17	   earlier binary serializations such as ASN.1 and MessagePack.

19	   This document is a revised edition of RFC 7049, with editorial
20	   improvements, added detail, and fixed errata.  This revision formally
21	   obsoletes RFC 7049, while keeping full compatibility of the
22	   interchange format from RFC 7049.  It does not create a new version
23	   of the format.

25	Contributing

27	   This document is being worked on in the CBOR Working Group.  Please
28	   contribute on the mailing list there, or in the GitHub repository for
29	   this draft: https://github.com/cbor-wg/CBORbis

31	   The charter for the CBOR Working Group says that the WG will update
32	   RFC 7049 to fix verified errata.  Security issues and clarifications
33	   may be addressed, but changes to this document will ensure backward
34	   compatibility for popular deployed codebases.  This document will be
35	   targeted at becoming an Internet Standard.

37	   [RFC editor: please remove this note.]

39	Status of This Memo

41	   This Internet-Draft is submitted in full conformance with the
42	   provisions of BCP 78 and BCP 79.

44	   Internet-Drafts are working documents of the Internet Engineering
45	   Task Force (IETF).  Note that other groups may also distribute
46	   working documents as Internet-Drafts.  The list of current Internet-
47	   Drafts is at https://datatracker.ietf.org/drafts/current/.

49	   Internet-Drafts are draft documents valid for a maximum of six months
50	   and may be updated, replaced, or obsoleted by other documents at any
51	   time.  It is inappropriate to use Internet-Drafts as reference
52	   material or to cite them other than as "work in progress."

54	   This Internet-Draft will expire on 9 September 2020.

56	Copyright Notice

58	   Copyright (c) 2020 IETF Trust and the persons identified as the
59	   document authors.  All rights reserved.

61	   This document is subject to BCP 78 and the IETF Trust's Legal
62	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
63	   license-info) in effect on the date of publication of this document.
64	   Please review these documents carefully, as they describe your rights
65	   and restrictions with respect to this document.  Code Components
66	   extracted from this document must include Simplified BSD License text
67	   as described in Section 4.e of the Trust Legal Provisions and are
68	   provided without warranty as described in the Simplified BSD License.

70	Table of Contents

72	   1.  Introduction
73	     1.1.  Objectives
74	     1.2.  Terminology
75	   2.  CBOR Data Models
76	     2.1.  Extended Generic Data Models
77	     2.2.  Specific Data Models
78	   3.  Specification of the CBOR Encoding
79	     3.1.  Major Types
80	     3.2.  Indefinite Lengths for Some Major Types
81	       3.2.1.  The "break" Stop Code
82	       3.2.2.  Indefinite-Length Arrays and Maps
83	       3.2.3.  Indefinite-Length Byte Strings and Text Strings
84	       3.2.4.  Summary of indefinite-length use of major types
85	     3.3.  Floating-Point Numbers and Values with No Content
86	     3.4.  Tagging of Items
87	       3.4.1.  Standard Date/Time String
88	       3.4.2.  Epoch-based Date/Time
89	       3.4.3.  Bignums
90	       3.4.4.  Decimal Fractions and Bigfloats
91	       3.4.5.  Content Hints
92	         3.4.5.1.  Encoded CBOR Data Item
93	         3.4.5.2.  Expected Later Encoding for CBOR-to-JSON
94	                 Converters
95	         3.4.5.3.  Encoded Text
96	       3.4.6.  Self-Described CBOR
98	   4.  Serialization Considerations
99	     4.1.  Preferred Serialization
100	     4.2.  Deterministically Encoded CBOR
101	       4.2.1.  Core Deterministic Encoding Requirements
102	       4.2.2.  Additional Deterministic Encoding Considerations
103	       4.2.3.  Length-first Map Key Ordering
104	   5.  Creating CBOR-Based Protocols
105	     5.1.  CBOR in Streaming Applications
106	     5.2.  Generic Encoders and Decoders
107	     5.3.  Validity of Items
108	       5.3.1.  Basic validity
109	       5.3.2.  Tag validity
110	     5.4.  Validity and Evolution
111	     5.5.  Numbers
112	     5.6.  Specifying Keys for Maps
113	       5.6.1.  Equivalence of Keys
114	     5.7.  Undefined Values
115	   6.  Converting Data between CBOR and JSON
116	     6.1.  Converting from CBOR to JSON
117	     6.2.  Converting from JSON to CBOR
118	   7.  Future Evolution of CBOR
119	     7.1.  Extension Points
120	     7.2.  Curating the Additional Information Space
121	   8.  Diagnostic Notation
122	     8.1.  Encoding Indicators
123	   9.  IANA Considerations
124	     9.1.  Simple Values Registry
125	     9.2.  Tags Registry
126	     9.3.  Media Type ("MIME Type")
127	     9.4.  CoAP Content-Format
128	     9.5.  The +cbor Structured Syntax Suffix Registration
129	   10. Security Considerations
130	   11. References
131	     11.1.  Normative References
132	     11.2.  Informative References
133	   Appendix A.  Examples
134	   Appendix B.  Jump Table
135	   Appendix C.  Pseudocode
136	   Appendix D.  Half-Precision
137	   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
138	           Objectives
139	     E.1.  ASN.1 DER, BER, and PER
140	     E.2.  MessagePack
141	     E.3.  BSON
142	     E.4.  MSDTP: RFC 713
143	     E.5.  Conciseness on the Wire
144	   Appendix F.  Changes from RFC 7049
145	   Appendix G.  Well-formedness errors and examples
146	     G.1.  Examples for CBOR data items that are not well-formed
147	   Acknowledgements
148	   Authors' Addresses

150	1.  Introduction

152	   There are hundreds of standardized formats for binary representation
153	   of structured data (also known as binary serialization formats).  Of
154	   those, some are for specific domains of information, while others are
155	   generalized for arbitrary data.  In the IETF, probably the best-known
156	   formats in the latter category are ASN.1's BER and DER [ASN.1].

158	   The format defined here follows some specific design goals that are
159	   not well met by current formats.  The underlying data model is an
160	   extended version of the JSON data model [RFC8259].  It is important
161	   to note that this is not a proposal that the grammar in RFC 8259 be
162	   extended in general, since doing so would cause a significant
163	   backwards incompatibility with already deployed JSON documents.
164	   Instead, this document simply defines its own data model that starts
165	   from JSON.

167	   Appendix E lists some existing binary formats and discusses how well
168	   they do or do not fit the design objectives of the Concise Binary
169	   Object Representation (CBOR).

171	   This document is a revised edition of [RFC7049], with editorial
172	   improvements, added detail, and fixed errata.  This revision formally
173	   obsoletes RFC 7049, while keeping full compatibility of the
174	   interchange format from RFC 7049.  It does not create a new version
175	   of the format.

177	1.1.  Objectives

179	   The objectives of CBOR, roughly in decreasing order of importance,
180	   are:

182	   1.  The representation must be able to unambiguously encode most
183	       common data formats used in Internet standards.

185	       *  It must represent a reasonable set of basic data types and
186	          structures using binary encoding.  "Reasonable" here is
187	          largely influenced by the capabilities of JSON, with the major
188	          addition of binary byte strings.  The structures supported are
189	          limited to arrays and trees; loops and lattice-style graphs
190	          are not supported.

192	       *  There is no requirement that all data formats be uniquely
193	          encoded; that is, it is acceptable that the number "7" might
194	          be encoded in multiple different ways.

196	   2.  The code for an encoder or decoder must be able to be compact in
197	       order to support systems with very limited memory, processor
198	       power, and instruction sets.

200	       *  An encoder and a decoder need to be implementable in a very
201	          small amount of code (for example, in class 1 constrained
202	          nodes as defined in [RFC7228]).

204	       *  The format should use contemporary machine representations of
205	          data (for example, not requiring binary-to-decimal
206	          conversion).

208	   3.  Data must be able to be decoded without a schema description.

210	       *  Similar to JSON, encoded data should be self-describing so
211	          that a generic decoder can be written.

213	   4.  The serialization must be reasonably compact, but data
214	       compactness is secondary to code compactness for the encoder and
215	       decoder.

217	       *  "Reasonable" here is bounded by JSON as an upper bound in
218	          size, and by the implementation complexity limiting how much
219	          effort can go into achieving that compactness.  Using either
220	          general compression schemes or extensive bit-fiddling violates
221	          the complexity goals.

223	   5.  The format must be applicable to both constrained nodes and high-
224	       volume applications.

226	       *  This means it must be reasonably frugal in CPU usage for both
227	          encoding and decoding.  This is relevant both for constrained
228	          nodes and for potential usage in applications with a very high
229	          volume of data.

231	   6.  The format must support all JSON data types for conversion to and
232	       from JSON.

234	       *  It must support a reasonable level of conversion as long as
235	          the data represented is within the capabilities of JSON.  It
236	          must be possible to define a unidirectional mapping towards
237	          JSON for all types of data.

239	   7.  The format must be extensible, and the extended data must be
240	       decodable by earlier decoders.

242	       *  The format is designed for decades of use.

244	       *  The format must support a form of extensibility that allows
245	          fallback so that a decoder that does not understand an
246	          extension can still decode the message.

248	       *  The format must be able to be extended in the future by later
249	          IETF standards.

251	1.2.  Terminology

253	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
254	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
255	   "OPTIONAL" in this document are to be interpreted as described in
256	   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
257	   capitals, as shown here.

259	   The term "byte" is used in its now-customary sense as a synonym for
260	   "octet".  All multi-byte values are encoded in network byte order
261	   (that is, most significant byte first, also known as "big-endian").

263	   This specification makes use of the following terminology:

265	   Data item:  A single piece of CBOR data.  The structure of a data
266	      item may contain zero, one, or more nested data items.  The term
267	      is used both for the data item in representation format and for
268	      the abstract idea that can be derived from that by a decoder; the
269	      former can be addressed specifically by using "encoded data item".

271	   Decoder:  A process that decodes a well-formed encoded CBOR data item
272	      and makes it available to an application.  Formally speaking, a
273	      decoder contains a parser to break up the input using the syntax
274	      rules of CBOR, as well as a semantic processor to prepare the data
275	      in a form suitable to the application.

277	   Encoder:  A process that generates the (well-formed) representation
278	      format of a CBOR data item from application information.

280	   Data Stream:  A sequence of zero or more data items, not further
281	      assembled into a larger containing data item.  The independent
282	      data items that make up a data stream are sometimes also referred
283	      to as "top-level data items".

285	   Well-formed:  A data item that follows the syntactic structure of
286	      CBOR.  A well-formed data item uses the initial bytes and the byte
287	      strings and/or data items that are implied by their values as
288	      defined in CBOR and does not include following extraneous data.

290	      CBOR decoders by definition only return contents from well-formed
291	      data items.

293	   Valid:  A data item that is well-formed and also follows the semantic
294	      restrictions that apply to CBOR data items (Section 5.3).

296	   Expected:  Besides its normal English meaning, the term "expected" is
297	      used to describe requirements beyond CBOR validity that an
298	      application has on its input data.  Well-formed (processable at
299	      all), valid (checked by a validity-checking generic decoder), and
300	      expected (checked by the application) form a hierarchy of layers
301	      of acceptability.

303	   Stream decoder:  A process that decodes a data stream and makes each
304	      of the data items in the sequence available to an application as
305	      they are received.

307	   Terms and concepts for floating-point values such as Infinity, NaN
308	   (not a number), negative zero, and subnormal are defined in
309	   [IEEE754].

311	   Where bit arithmetic or data types are explained, this document uses
312	   the notation familiar from the programming language C, except that
313	   "**" denotes exponentiation.  Similar to the "0x" notation for
314	   hexadecimal numbers, numbers in binary notation are prefixed with
315	   "0b".  Underscores can be added to a number solely for readability,
316	   so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the
317	   desired interpretation of the bits in the byte; in this case, it is
318	   split into three bits and five bits.  Encoded CBOR data items are
319	   sometimes given in the "0x" or "0b" notation; these values are first
320	   interpreted as numbers as in C and are then interpreted as byte
321	   strings in network byte order, including any leading zero bytes
322	   expressed in the notation.
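
   As an informal illustration of this notation (not part of the
   specification), the following Python sketch turns a "0x" or "0b"
   string into the byte string it denotes, keeping leading zero bytes,
   and splits a byte into its upper three and lower five bits.  The
   function names "notation_to_bytes" and "split_initial_byte" are
   hypothetical helpers introduced only for this example.

```python
def notation_to_bytes(text: str) -> bytes:
    """Interpret "0x..." or "0b..." notation as bytes in network byte order."""
    if text.startswith("0x"):
        base, bits_per_digit = 16, 4
    elif text.startswith("0b"):
        base, bits_per_digit = 2, 1
    else:
        raise ValueError("expected a 0x or 0b prefix")
    # Underscores are for readability only and carry no value.
    digits = text[2:].replace("_", "")
    total_bits = len(digits) * bits_per_digit
    if total_bits == 0 or total_bits % 8:
        raise ValueError("notation must describe a whole number of bytes")
    # The digit count fixes the length, so leading zero bytes are kept.
    return int(digits, base).to_bytes(total_bits // 8, "big")


def split_initial_byte(byte: int) -> tuple:
    """Split one byte into (upper 3 bits, lower 5 bits)."""
    return byte >> 5, byte & 0x1F


print(notation_to_bytes("0b001_00001").hex())  # 21
print(notation_to_bytes("0x01f4").hex())       # 01f4
print(split_initial_byte(0b001_00001))         # (1, 1)
```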

324	   Words may be _italicized_ for emphasis; in the plain text form of
325	   this specification this is indicated by surrounding words with
326	   underscore characters.  Verbatim text (e.g., names from a programming
327	   language) may be set in "monospace" type; in plain text this is
328	   approximated somewhat ambiguously by surrounding the text in double
329	   quotes (which also retain their usual meaning).

331	2.  CBOR Data Models

333	   CBOR is explicit about its generic data model, which defines the set
334	   of all data items that can be represented in CBOR.  Its basic generic
335	   data model is extensible by the registration of simple type values
336	   and tags.  Applications can then subset the resulting extended
337	   generic data model to build their specific data models.

339	   Within environments that can represent the data items in the generic
340	   data model, generic CBOR encoders and decoders can be implemented
341	   (which usually involves defining additional implementation data types
342	   for those data items that do not already have a natural
343	   representation in the environment).  The ability to provide generic
344	   encoders and decoders is an explicit design goal of CBOR; however,
345	   many applications will provide their own application-specific
346	   encoders and/or decoders.

348	   In the basic (un-extended) generic data model, a data item is one of:

350	   *  an integer in the range -2**64..2**64-1 inclusive

352	   *  a simple value, identified by a number between 0 and 255, but
353	      distinct from that number itself

355	   *  a floating-point value, distinct from an integer, out of the set
356	      representable by IEEE 754 binary64 (including non-finites)
357	      [IEEE754]

359	   *  a sequence of zero or more bytes ("byte string")

361	   *  a sequence of zero or more Unicode code points ("text string")

363	   *  a sequence of zero or more data items ("array")

365	   *  a mapping (mathematical function) from zero or more data items
366	      ("keys") each to a data item ("values"), ("map")

368	   *  a tagged data item ("tag"), comprising a tag number (an integer in
369	      the range 0..2**64-1) and the tag content (a data item)

371	   Note that integer and floating-point values are distinct in this
372	   model, even if they have the same numeric value.

374	   Also note that serialization variants, such as the number of bytes of
375	   the encoded floating-point value, or the choice of one of the ways in
376	   which an integer, the length of a text or byte string, the number of
377	   elements in an array or pairs in a map, or a tag number,
378	   (collectively "the argument", see Section 3) can be encoded, are not
379	   visible at the generic data model level.
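
   One possible way to picture the basic generic data model in a
   programming environment is sketched below in Python.  This is an
   assumption-laden illustration, not a normative mapping: the class
   names "Simple" and "Tag" and the function "is_basic_item" are
   hypothetical, Python's "float" stands in for binary64, and a Python
   "dict" is used for maps even though it cannot hold unhashable keys
   and treats 0 and 0.0 as the same key.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Simple:
    """A simple value: identified by a number 0..255, distinct from it."""
    number: int

    def __post_init__(self):
        if not 0 <= self.number <= 255:
            raise ValueError("simple value number must be in 0..255")


@dataclass(frozen=True)
class Tag:
    """A tagged data item: a tag number and its tag content."""
    number: int
    content: Any

    def __post_init__(self):
        if not 0 <= self.number <= 2**64 - 1:
            raise ValueError("tag number must be in 0..2**64-1")


def is_basic_item(x: Any) -> bool:
    """Check membership in the basic generic data model (sketch only)."""
    if isinstance(x, bool):
        # Python's bool is a subclass of int; exclude it so that only
        # plain integers are treated as CBOR integers here.
        return False
    if isinstance(x, int):
        return -2**64 <= x <= 2**64 - 1
    if isinstance(x, (float, bytes, str, Simple)):
        return True
    if isinstance(x, list):
        return all(is_basic_item(i) for i in x)
    if isinstance(x, dict):
        return all(is_basic_item(k) and is_basic_item(v)
                   for k, v in x.items())
    if isinstance(x, Tag):
        return is_basic_item(x.content)
    return False


print(is_basic_item(-2**64))   # True
print(is_basic_item(2**64))    # False
print(Simple(20) == 20)        # False
```

   Note that nothing in this representation records how many bytes an
   integer or floating-point value occupied when encoded, which matches
   the statement that serialization variants are not visible at this
   level.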

381	2.1.  Extended Generic Data Models

383	   This basic generic data model comes pre-extended by the registration
384	   of a number of simple values and tag numbers right in this document,
385	   such as:

387	   *  "false", "true", "null", and "undefined" (simple values identified
388	      by 20..23)

390	   *  integer and floating-point values with a larger range and
391	      precision than the above (tag numbers 2 to 5)

393	   *  application data types such as a point in time or an RFC 3339
394	      date/time string (tag numbers 1, 0)

396	   Further elements of the extended generic data model can be (and have
397	   been) defined via the IANA registries created for CBOR.  Even if such
398	   an extension is unknown to a generic encoder or decoder, data items
399	   using that extension can be passed to or from the application by
400	   representing them at the interface to the application within the
401	   basic generic data model, i.e., as generic values of a simple type or
402	   generic tags.

404	   In other words, the basic generic data model is stable as defined in
405	   this document, while the extended generic data model expands by the
406	   registration of new simple values or tag numbers, but never shrinks.

408	   While there is a strong expectation that generic encoders and
409	   decoders can represent "false", "true", and "null" ("undefined" is
410	   intentionally omitted) in the form appropriate for their programming
411	   environment, implementation of the data model extensions created by
412	   tags is truly optional and a matter of implementation quality.

414	2.2.  Specific Data Models

416	   The specific data model for a CBOR-based protocol usually subsets the
417	   extended generic data model and assigns application semantics to the
418	   data items within this subset and its components.  When documenting
419	   such specific data models, where it is desired to specify the types
420	   of data items, it is preferred to identify the types by the names
421	   they have in the generic data model ("negative integer", "array")
422	   instead of by referring to aspects of their CBOR representation
423	   ("major type 1", "major type 4").

425	   Specific data models can also specify what values (including values
426	   of different types) are equivalent for the purposes of map keys and
427	   encoder freedom.  For example, in the generic data model, a valid map
428	   MAY have both "0" and "0.0" as keys, and an encoder MUST NOT encode
429	   "0.0" as an integer (major type 0, Section 3.1).  However, if a
430	   specific data model declares that floating-point and integer
431	   representations of integral values are equivalent, using both map
432	   keys "0" and "0.0" in a single map would be considered duplicates,
433	   even while encoded as different major types, and so invalid; and an
434	   encoder could encode integral-valued floats as integers or vice
435	   versa, perhaps to save encoded bytes.
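
   The effect of such a declaration on duplicate detection can be
   sketched as follows.  This Python fragment is illustrative only and
   handles just integer and floating-point keys; the names
   "generic_identity", "numeric_identity", and "has_duplicate_keys" are
   hypothetical.  Maps are modeled as lists of key/value pairs so that
   both keys survive long enough to be compared.

```python
def generic_identity(key):
    """Generic data model: integers and floats are distinct keys."""
    return (type(key).__name__, key)


def numeric_identity(key):
    """Hypothetical specific data model: equal numeric values are the same key."""
    return key  # in Python, 0 == 0.0 and both hash alike


def has_duplicate_keys(pairs, identity):
    """Report whether any two keys coincide under the given identity rule."""
    seen = set()
    for key, _value in pairs:
        ident = identity(key)
        if ident in seen:
            return True
        seen.add(ident)
    return False


pairs = [(0, "integer zero"), (0.0, "floating-point zero")]
print(has_duplicate_keys(pairs, generic_identity))  # False
print(has_duplicate_keys(pairs, numeric_identity))  # True
```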

437	3.  Specification of the CBOR Encoding

439	   A CBOR data item (Section 2) is encoded to or decoded from a byte
440	   string carrying a well-formed encoded data item as described in this
441	   section.  The encoding is summarized in Table 7, indexed by the
442	   initial byte.  An encoder MUST produce only well-formed encoded data
443	   items.  A decoder MUST NOT return a decoded data item when it
444	   encounters input that is not a well-formed encoded CBOR data item
445	   (this does not detract from the usefulness of diagnostic and recovery
446	   tools that might make available some information from a damaged
447	   encoded CBOR data item).

449	   The initial byte of each encoded data item contains both information
450	   about the major type (the high-order 3 bits, described in
451	   Section 3.1) and additional information (the low-order 5 bits).  With
452	   a few exceptions, the additional information's value describes how to
453	   load an unsigned integer "argument":

455	   Less than 24:  The argument's value is the value of the additional
456	      information.

458	   24, 25, 26, or 27:  The argument's value is held in the following 1,
459	      2, 4, or 8 bytes, respectively, in network byte order.  For major
460	      type 7 and additional information value 25, 26, 27, these bytes
461	      are not used as an integer argument, but as a floating-point value
462	      (see Section 3.3).

464	   28, 29, 30:  These values are reserved for future additions to the
465	      CBOR format.  In the present version of CBOR, the encoded item is
466	      not well-formed.

468	   31:  No argument value is derived.  If the major type is 0, 1, or 6,
469	      the encoded item is not well-formed.  For major types 2 to 5, the
470	      item's length is indefinite, and for major type 7, the byte does
471	      not constitute a data item at all but terminates an indefinite-
472	      length item; both are described in Section 3.2.

474	   The initial byte and any additional bytes consumed to construct the
475	   argument are collectively referred to as the "head" of the data item.
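
   The rules above for loading the argument can be sketched in Python as
   follows.  This is a minimal illustration rather than a reference
   implementation; the function name "read_head" and its return shape
   are assumptions made for this example.  For major type 7 with
   additional information 25, 26, or 27, the returned integer is just
   the raw bytes of the floating-point value (Section 3.3) read as an
   unsigned integer.

```python
def read_head(data: bytes, pos: int = 0):
    """Return (major_type, additional_info, argument, next_pos).

    argument is None when additional information is 31.
    """
    if pos >= len(data):
        raise ValueError("not well-formed: input ends before the initial byte")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F  # high 3 bits, low 5 bits
    pos += 1
    if info < 24:
        # The argument is the additional information itself.
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        if pos + size > len(data):
            raise ValueError("not well-formed: input ends inside the head")
        argument = int.from_bytes(data[pos:pos + size], "big")
        return major, info, argument, pos + size
    if info <= 30:
        raise ValueError("not well-formed: additional information 28..30 is reserved")
    # info == 31: no argument value is derived.
    if major in (0, 1, 6):
        raise ValueError("not well-formed: 31 is not allowed for this major type")
    return major, info, None, pos


print(read_head(bytes.fromhex("1901f4")))  # (0, 25, 500, 3)
print(read_head(bytes.fromhex("0a")))      # (0, 10, 10, 1)
```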

477	   The meaning of this argument depends on the major type.  For example,
478	   in major type 0, the argument is the value of the data item itself
479	   (and in major type 1 the value of the data item is computed from the
480	   argument); in major types 2 and 3 it gives the length of the string
481	   data in bytes that follows; and in major types 4 and 5 it is used to
482	   determine the number of data items enclosed.
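
   Going in the other direction, an encoder writes the argument into a
   head.  The Python sketch below is one way to do this; the names
   "encode_head" and "encode_int" are hypothetical, and always choosing
   the shortest head is a choice made for this example, not something
   required for well-formedness.  The sample values are those used for
   the examples in Section 3.1, including the rule that a major type 1
   item has the value -1 minus the argument.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Encode a head for major types 0..6 using the shortest form."""
    if not 0 <= major <= 6:
        raise ValueError("this sketch covers major types 0..6 only")
    if not 0 <= argument <= 2**64 - 1:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        # The argument fits into the additional information directly.
        return bytes([(major << 5) | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 or 1."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)  # value == -1 - argument


print(encode_int(10).hex())      # 0a
print(encode_int(500).hex())     # 1901f4
print(encode_int(-500).hex())    # 3901f3
print(encode_head(2, 500).hex())  # 5901f4
```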

484	   If the encoded sequence of bytes ends before the end of a data item,
485	   that item is not well-formed.  If the encoded sequence of bytes still
486	   has bytes remaining after the outermost encoded item is decoded, that
487	   encoding is not a single well-formed CBOR item; depending on the
488	   application, the decoder may either treat the encoding as not well-
489	   formed or just identify the start of the remaining bytes to the
490	   application.
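
   The two cases in this paragraph (input that ends too early, and
   input with bytes left over) can be told apart by finding where the
   outermost item ends.  The following Python sketch does so for
   definite-length items only; indefinite lengths (Section 3.2) and the
   additional checks for major type 7 (Section 3.3) are deliberately
   left out.  The names "skip_item" and "split_first_item" are
   hypothetical.

```python
def skip_item(data: bytes, pos: int = 0) -> int:
    """Return the offset just past the definite-length item at pos."""
    if pos >= len(data):
        raise ValueError("not well-formed: input ends before the item")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("not well-formed: input ends inside the head")
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    elif info == 31:
        if major in (0, 1, 6):
            raise ValueError("not well-formed: 31 not allowed here")
        raise NotImplementedError("indefinite lengths are outside this sketch")
    else:
        raise ValueError("not well-formed: reserved additional information")
    if major in (2, 3):
        # The argument is the length of the string content in bytes.
        if pos + arg > len(data):
            raise ValueError("not well-formed: input ends inside a string")
        return pos + arg
    if major in (4, 5):
        # Arrays enclose arg items; maps enclose arg pairs of items.
        for _ in range(arg if major == 4 else 2 * arg):
            pos = skip_item(data, pos)
        return pos
    if major == 6:
        return skip_item(data, pos)  # exactly one enclosed item
    return pos  # major types 0, 1, and 7 consist of the head only


def split_first_item(data: bytes):
    """Return (encoded first item, remaining bytes)."""
    end = skip_item(data, 0)
    return data[:end], data[end:]


item, rest = split_first_item(bytes.fromhex("820102ff"))
print(item.hex(), rest.hex())  # 820102 ff
```

   Whether a non-empty remainder is treated as an error or handed back
   to the application is, as stated above, up to the application.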

492	   A CBOR decoder implementation can be based on a jump table with all
493	   256 defined values for the initial byte (Table 7).  A decoder in a
494	   constrained implementation can instead use the structure of the
495	   initial byte and following bytes for more compact code (see
496	   Appendix C for a rough impression of how this could look).
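
   To give a rough idea of the jump-table approach, the Python sketch
   below precomputes one entry per possible initial byte.  It does not
   reproduce Table 7; the entry layout (major type, a short description,
   and the number of argument bytes that follow) and the name
   "build_jump_table" are assumptions made for this illustration, and
   the descriptions reflect only the head rules given in this section.

```python
def build_jump_table():
    """Build a 256-entry table indexed by the initial byte."""
    table = []
    for initial_byte in range(256):
        major, info = initial_byte >> 5, initial_byte & 0x1F
        if info < 24:
            entry = (major, "argument in initial byte", 0)
        elif info <= 27:
            entry = (major, "argument follows", 1 << (info - 24))
        elif info <= 30:
            entry = (major, "reserved, not well-formed", 0)
        elif major in (0, 1, 6):
            entry = (major, "not well-formed", 0)
        elif major == 7:
            entry = (major, "break", 0)
        else:
            entry = (major, "indefinite length", 0)
        table.append(entry)
    return table


JUMP_TABLE = build_jump_table()

# A decoder would index the table once per item instead of
# re-deriving the structure of the initial byte each time.
print(JUMP_TABLE[0x19])  # (0, 'argument follows', 2)
print(JUMP_TABLE[0xFF])  # (7, 'break', 0)
```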

498	3.1.  Major Types

500	   The following lists the major types and the additional information
501	   and other bytes associated with the type.

503	   Major type 0:  an integer in the range 0..2**64-1 inclusive.  The
504	      value of the encoded item is the argument itself.  For example,
505	      the integer 10 is denoted as the one byte 0b000_01010 (major type
506	      0, additional information 10).  The integer 500 would be
507	      0b000_11001 (major type 0, additional information 25) followed by
508	      the two bytes 0x01f4, which is 500 in decimal.

510	   Major type 1:  a negative integer in the range -2**64..-1 inclusive.
511	      The value of the item is -1 minus the argument.  For example, the
512	      integer -500 would be 0b001_11001 (major type 1, additional
513	      information 25) followed by the two bytes 0x01f3, which is 499 in
514	      decimal.

516	   Major type 2:  a byte string.  The number of bytes in the string is
517	      equal to the argument.  For example, a byte string whose length is
518	      5 would have an initial byte of 0b010_00101 (major type 2,
519	      additional information 5 for the length), followed by 5 bytes of
520	      binary content.  A byte string whose length is 500 would have an
521	      initial byte of 0b010_11001 (major type 2, additional information
522	      25 to indicate a two-byte length) followed by the two bytes 0x01f4
523	      for a length of 500, followed by 500 bytes of binary content.

525	   Major type 3:  a text string (Section 2), encoded as UTF-8
526	      ([RFC3629]).  The number of bytes in the string is equal to the
527	      argument.  A string containing an invalid UTF-8 sequence is well-
528	      formed but invalid.  This type is provided for systems that need
529	      to interpret or display human-readable text, and allows the
530	      differentiation between unstructured bytes and text that has a
531	      specified repertoire and encoding.  In contrast to formats such as
532	      JSON, the Unicode characters in this type are never escaped.
533	      Thus, a newline character (U+000A) is always represented in a
534	      string as the byte 0x0a, and never as the bytes 0x5c6e (the
535	      characters "\" and "n") or as 0x5c7530303061 (the characters "\",
536	      "u", "0", "0", "0", and "a").

538	   Major type 4:  an array of data items.  In other formats, arrays are
539	      also called lists, sequences, or tuples (a "CBOR sequence" is
540	      something slightly different, though [RFC8742]).  The argument is
541	      the number of data items in the array.  Items in an array do not
542	      need to all be of the same type.  For example, an array that
543	      contains 10 items of any type would have an initial byte of
544	      0b100_01010 (major type of 4, additional information of 10 for the
545	      length) followed by the 10 remaining items.

547	   Major type 5:  a map of pairs of data items.  Maps are also called
548	      tables, dictionaries, hashes, or objects (in JSON).  A map is
549	      composed of pairs of data items, each pair consisting of a key
550	      that is immediately followed by a value.  The argument is the
551	      number of _pairs_ of data items in the map.  For example, a map
552	      that contains 9 pairs would have an initial byte of 0b101_01001
553	      (major type of 5, additional information of 9 for the number of
554	      pairs) followed by the 18 remaining items.  The first item is the
555	      first key, the second item is the first value, the third item is
556	      the second key, and so on.  Because items in a map come in pairs,
557	      their total number is always even: A map that contains an odd
558	      number of items (no value data present after the last key data
559	      item) is not well-formed.  A map that has duplicate keys may be
560	      well-formed, but it is not valid, and thus it causes indeterminate
561	      decoding; see also Section 5.6.

563	   Major type 6:  a tagged data item ("tag") whose tag number, an
564	      integer in the range 0..2**64-1 inclusive, is the argument and
565	      whose enclosed data item ("tag content") is the single encoded
566	      data item that follows the head.  See Section 3.4.

568	   Major type 7:  floating-point numbers and simple values, as well as
569	      the "break" stop code.  See Section 3.3.
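   As a non-normative illustration of the head encoding used by the
   major types above, the following Python sketch reproduces the
   integer and byte string examples from this list.  The names
   encode_head and encode_int are hypothetical and chosen only for this
   example; the sketch always picks the shortest argument form and
   leaves major type 7 out, because its additional information is
   interpreted differently (Section 3.3).

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode the head (initial byte plus argument bytes) of a data item.

    Hypothetical helper for major types 0..6; always uses the shortest form.
    """
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch covers major types 0..6 only")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in 0..2**64-1")
    initial = major_type << 5
    if argument < 24:
        # The argument fits directly into the additional information.
        return bytes([initial | argument])
    # Additional information 24..27 announces 1, 2, 4, or 8 argument bytes.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument range was checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 (unsigned) or 1 (negative)."""
    if value >= 0:
        return encode_head(0, value)
    # For major type 1, the value of the item is -1 minus the argument.
    return encode_head(1, -1 - value)
```

   With these definitions, encode_int(500) yields the bytes 0x1901f4
   and encode_int(-500) yields 0x3901f3, matching the examples above.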

571	   These eight major types lead to a simple table showing which of the
572	   256 possible values for the initial byte of a data item are used
573	   (Table 7).

575	   In major types 6 and 7, many of the possible values are reserved for
576	   future specification.  See Section 9 for more information on these
577	   values.

579	   Table 1 summarizes the major types defined by CBOR, ignoring the next
580	   section for now.  The number N in this table stands for the argument,
581	   mt for the major type.

583	     +----+-----------------------+---------------------------------+
584	     | mt | Meaning               | Content                         |
585	     +====+=======================+=================================+
586	     | 0  | unsigned integer N    | -                               |
587	     +----+-----------------------+---------------------------------+
588	     | 1  | negative integer -1-N | -                               |
589	     +----+-----------------------+---------------------------------+
590	     | 2  | byte string           | N bytes                         |
591	     +----+-----------------------+---------------------------------+
592	     | 3  | text string           | N bytes (UTF-8 text)            |
593	     +----+-----------------------+---------------------------------+
594	     | 4  | array                 | N data items (elements)         |
595	     +----+-----------------------+---------------------------------+
596	     | 5  | map                   | 2N data items (key/value pairs) |
597	     +----+-----------------------+---------------------------------+
598	     | 6  | tag of number N       | 1 data item                     |
599	     +----+-----------------------+---------------------------------+
600	     | 7  | simple/float          | -                               |
601	     +----+-----------------------+---------------------------------+

603	       Table 1: Overview of the definite-length use of CBOR major
604	                  types (mt = major type, N = argument)
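   The reverse direction can be sketched in the same hedged spirit: the
   following Python fragment splits an initial byte into major type and
   additional information and reads the argument N that Table 1 refers
   to.  The function name decode_head and its return shape are
   assumptions made for this example only.

```python
def decode_head(data: bytes, offset: int = 0):
    """Return (major_type, additional_info, argument, next_offset).

    Hypothetical helper; the argument is None for additional information 31.
    """
    initial = data[offset]
    major_type = initial >> 5          # high-order 3 bits
    additional_info = initial & 0x1F   # low-order 5 bits
    offset += 1
    if additional_info < 24:
        # The argument is the additional information itself.
        return major_type, additional_info, additional_info, offset
    if additional_info <= 27:
        # 24, 25, 26, 27 announce 1, 2, 4, or 8 following argument bytes.
        size = 1 << (additional_info - 24)
        end = offset + size
        if end > len(data):
            raise ValueError("truncated head")
        argument = int.from_bytes(data[offset:end], "big")
        return major_type, additional_info, argument, end
    if additional_info == 31:
        # Indefinite length (or "break" in major type 7): no argument.
        return major_type, additional_info, None, offset
    raise ValueError("additional information 28..30 is not well-formed")
```

   For instance, decode_head(bytes.fromhex("1901f4")) returns major
   type 0 with the argument 500 and the offset 3 of the next item.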

606	3.2.  Indefinite Lengths for Some Major Types

608	   Four CBOR items (arrays, maps, byte strings, and text strings) can be
609	   encoded with an indefinite length using additional information value
610	   31.  This is useful if the encoding of the item needs to begin before
611	   the number of items inside the array or map, or the total length of
612	   the string, is known.  (The ability to start sending a data item
613	   before all of it is known is often referred to as "streaming" within
614	   that data item.)

616	   Indefinite-length arrays and maps are dealt with differently than
617	   indefinite-length byte strings and text strings.

619	3.2.1.  The "break" Stop Code

621	   The "break" stop code is encoded with major type 7 and additional
622	   information value 31 (0b111_11111).  It is not itself a data item: it
623	   is just a syntactic feature to close an indefinite-length item.

625	   If the "break" stop code appears anywhere that a data item is
626	   expected, other than directly inside an indefinite-length string,
627	   array, or map -- for example directly inside a definite-length array
628	   or map -- the enclosing item is not well-formed.

630	3.2.2.  Indefinite-Length Arrays and Maps

632	   Indefinite-length arrays and maps are represented using their major
633	   type with the additional information value of 31, followed by an
634	   arbitrary-length sequence of zero or more items for an array or key/
635	   value pairs for a map, followed by the "break" stop code
636	   (Section 3.2.1).  In other words, indefinite-length arrays and maps
637	   look identical to other arrays and maps except for beginning with the
638	   additional information value of 31 and ending with the "break" stop
639	   code.

641	   If the break stop code appears after a key in a map, in place of that
642	   key's value, the map is not well-formed.

644	   There is no restriction against nesting indefinite-length array or
645	   map items.  A "break" only terminates a single item, so nested
646	   indefinite-length items need exactly as many "break" stop codes as
647	   there are type bytes starting an indefinite-length item.

649	   For example, assume an encoder wants to represent the abstract array
650	   [1, [2, 3], [4, 5]].  The definite-length encoding would be
651	   0x8301820203820405:

653	   83        -- Array of length 3
654	      01     -- 1
655	      82     -- Array of length 2
656	         02  -- 2
657	         03  -- 3
658	      82     -- Array of length 2
659	         04  -- 4
660	         05  -- 5

662	   Indefinite-length encoding could be applied independently to each of
663	   the three arrays encoded in this data item, as required, leading to
664	   representations such as:

666	   0x9f018202039f0405ffff
667	   9F        -- Start indefinite-length array
668	      01     -- 1
669	      82     -- Array of length 2
670	         02  -- 2
671	         03  -- 3
672	      9F     -- Start indefinite-length array
673	         04  -- 4
674	         05  -- 5
675	         FF  -- "break" (inner array)
676	      FF     -- "break" (outer array)

678	   0x9f01820203820405ff
679	   9F        -- Start indefinite-length array
680	      01     -- 1
681	      82     -- Array of length 2
682	         02  -- 2
683	         03  -- 3
684	      82     -- Array of length 2
685	         04  -- 4
686	         05  -- 5
687	      FF     -- "break"

689	   0x83018202039f0405ff
690	   83        -- Array of length 3
691	      01     -- 1
692	      82     -- Array of length 2
693	         02  -- 2
694	         03  -- 3
695	      9F     -- Start indefinite-length array
696	         04  -- 4
697	         05  -- 5
698	         FF  -- "break"

700	   0x83019f0203ff820405
701	   83        -- Array of length 3
702	      01     -- 1
703	      9F     -- Start indefinite-length array
704	         02  -- 2
705	         03  -- 3
706	         FF  -- "break"
707	      82     -- Array of length 2
708	         04  -- 4
709	         05  -- 5

711	   An example of an indefinite-length map (that happens to have two key/
712	   value pairs) might be:

714	   0xbf6346756ef563416d7421ff
715	   BF           -- Start indefinite-length map
716	      63        -- First key, UTF-8 string length 3
717	         46756e --   "Fun"
718	      F5        -- First value, true
719	      63        -- Second key, UTF-8 string length 3
720	         416d74 --   "Amt"
721	      21        -- Second value, -2
722	      FF        -- "break"
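   To illustrate the streaming use case, the following Python sketch
   emits an indefinite-length map piece by piece from an iterable whose
   length need not be known in advance.  The helper names are
   hypothetical, and the sketch only supports short text string keys,
   booleans, and integers in the range -24..23.

```python
def encode_text(text: str) -> bytes:
    """Encode a short text string (major type 3, length below 24)."""
    raw = text.encode("utf-8")
    if len(raw) >= 24:
        raise ValueError("sketch only handles lengths below 24")
    return bytes([0x60 | len(raw)]) + raw


def encode_small(value) -> bytes:
    """Encode true, false, or an integer in -24..23."""
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int) and 0 <= value < 24:
        return bytes([value])
    if isinstance(value, int) and -24 <= value < 0:
        return bytes([0x20 | (-1 - value)])
    raise ValueError("outside the scope of this sketch")


def stream_map(pairs):
    """Yield the encoding of an indefinite-length map piece by piece."""
    yield b"\xbf"  # major type 5, additional information 31
    for key, value in pairs:
        # A key is always emitted together with its value, so "break"
        # can never take the place of a value.
        yield encode_text(key) + encode_small(value)
    yield b"\xff"  # "break" stop code
```

   Joining the pieces produced for the pairs ("Fun", true) and
   ("Amt", -2) gives the byte sequence shown in the example above.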

724	3.2.3.  Indefinite-Length Byte Strings and Text Strings

726	   Indefinite-length strings are represented by a byte containing the
727	   major type and additional information value of 31, followed by a
728	   series of zero or more byte or text strings ("chunks") that have
729	   definite lengths, followed by the "break" stop code (Section 3.2.1).
730	   The data item represented by the indefinite-length string is the
731	   concatenation of the chunks (i.e., the empty byte or text string,
732	   respectively, if no chunk is present).  (Note that zero-length
733	   chunks, while not particularly useful, are permitted.)

735	   If any item between the indefinite-length string indicator
736	   (0b010_11111 or 0b011_11111) and the "break" stop code is not a
737	   definite-length string item of the same major type, the string is not
738	   well-formed.

740	   If any definite-length text string inside an indefinite-length text
741	   string is invalid, the indefinite-length text string is invalid.
742	   Note that this implies that the bytes of a single UTF-8 character
743	   cannot be split up between chunks: a new chunk of a text string can
744	   only be started at a character boundary.

746	   For example, assume an encoded data item consisting of the bytes:

748	   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

750	   5F              -- Start indefinite-length byte string
751	      44           -- Byte string of length 4
752	         aabbccdd  -- Bytes content
753	      43           -- Byte string of length 3
754	         eeff99    -- Bytes content
755	      FF           -- "break"

757	   After decoding, this results in a single byte string with seven
758	   bytes: 0xaabbccddeeff99.
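   The chunk rules above can be sketched as follows.  This Python
   fragment is an illustration under stated assumptions, not a complete
   decoder: the name decode_chunked_string is hypothetical, chunk
   lengths are limited to values below 24, and any bytes after the
   "break" stop code are ignored.

```python
def decode_chunked_string(data: bytes) -> bytes:
    """Concatenate the chunks of an indefinite-length byte or text string."""
    if not data or data[0] not in (0x5F, 0x7F):
        raise ValueError("not an indefinite-length string")
    major_type = data[0] >> 5
    pos, chunks = 1, []
    while True:
        if pos >= len(data):
            raise ValueError('missing "break": not well-formed')
        initial = data[pos]
        pos += 1
        if initial == 0xFF:
            break
        if (initial >> 5) != major_type or (initial & 0x1F) == 31:
            raise ValueError("chunk is not a definite-length string "
                             "of the same major type: not well-formed")
        length = initial & 0x1F
        if length >= 24:
            raise ValueError("sketch only handles chunk lengths below 24")
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        if major_type == 3:
            # Each text chunk has to be valid UTF-8 on its own, so a
            # character cannot be split between chunks.
            chunk.decode("utf-8")
        chunks.append(chunk)
        pos += length
    return b"".join(chunks)
```

   For the example above, the function returns the seven bytes
   0xaabbccddeeff99.  A text string that splits the two bytes of a
   UTF-8 character between chunks is rejected with UnicodeDecodeError.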

760	3.2.4.  Summary of Indefinite-Length Use of Major Types

762	   Table 2 summarizes the major types defined by CBOR as used for
763	   indefinite length encoding (with additional information set to 31).
764	   mt stands for the major type.

766	       +----+-------------------+----------------------------------+
767	       | mt | Meaning           | enclosed up to "break" stop code |
768	       +====+===================+==================================+
769	       | 0  | (not well-formed) | -                                |
770	       +----+-------------------+----------------------------------+
771	       | 1  | (not well-formed) | -                                |
772	       +----+-------------------+----------------------------------+
773	       | 2  | byte string       | definite-length byte strings     |
774	       +----+-------------------+----------------------------------+
775	       | 3  | text string       | definite-length text strings     |
776	       +----+-------------------+----------------------------------+
777	       | 4  | array             | data items (elements)            |
778	       +----+-------------------+----------------------------------+
779	       | 5  | map               | data items (key/value pairs)     |
780	       +----+-------------------+----------------------------------+
781	       | 6  | (not well-formed) | -                                |
782	       +----+-------------------+----------------------------------+
783	       | 7  | "break" stop code | -                                |
784	       +----+-------------------+----------------------------------+

786	          Table 2: Overview of the indefinite-length use of CBOR
787	           major types (mt = major type, additional information =
788	                                    31)

790	3.3.  Floating-Point Numbers and Values with No Content

792	   Major type 7 is for two types of data: floating-point numbers and
793	   "simple values" that do not need any content.  Each value of the
794	   5-bit additional information in the initial byte has its own separate
795	   meaning, as defined in Table 3.  Like the major types for integers,
796	   items of this major type do not carry content data; all the
797	   information is in the initial bytes.

799	    +-------------+---------------------------------------------------+
800	    | 5-Bit Value | Semantics                                         |
801	    +=============+===================================================+
802	    | 0..23       | Simple value (value 0..23)                        |
803	    +-------------+---------------------------------------------------+
804	    | 24          | Simple value (value 32..255 in following byte)    |
805	    +-------------+---------------------------------------------------+
806	    | 25          | IEEE 754 Half-Precision Float (16 bits follow)    |
807	    +-------------+---------------------------------------------------+
808	    | 26          | IEEE 754 Single-Precision Float (32 bits follow)  |
809	    +-------------+---------------------------------------------------+
810	    | 27          | IEEE 754 Double-Precision Float (64 bits follow)  |
811	    +-------------+---------------------------------------------------+
812	    | 28-30       | Reserved, not well-formed in the present document |
813	    +-------------+---------------------------------------------------+
814	    | 31          | "break" stop code for indefinite-length items     |
815	    |             | (Section 3.2.1)                                   |
816	    +-------------+---------------------------------------------------+

818	         Table 3: Values for Additional Information in Major Type 7

820	   As with all other major types, the 5-bit value 24 signifies a single-
821	   byte extension: it is followed by an additional byte to represent the
822	   simple value.  (To minimize confusion, only the values 32 to 255 are
823	   used.)  This maintains the structure of the initial bytes: as for the
824	   other major types, the length of these always depends on the
825	   additional information in the first byte.  Table 4 lists the values
826	   assigned and available for simple types.

828	                       +---------+-----------------+
829	                       | Value   | Semantics       |
830	                       +=========+=================+
831	                       | 0..19   | (Unassigned)    |
832	                       +---------+-----------------+
833	                       | 20      | False           |
834	                       +---------+-----------------+
835	                       | 21      | True            |
836	                       +---------+-----------------+
837	                       | 22      | Null            |
838	                       +---------+-----------------+
839	                       | 23      | Undefined value |
840	                       +---------+-----------------+
841	                       | 24..31  | (Reserved)      |
842	                       +---------+-----------------+
843	                       | 32..255 | (Unassigned)    |
844	                       +---------+-----------------+

846	                           Table 4: Simple Values

848	   An encoder MUST NOT issue two-byte sequences that start with 0xf8
849	   (major type = 7, additional information = 24) and continue with a
850	   byte less than 0x20 (32 decimal).  Such sequences are not well-
851	   formed.  (This implies that an encoder cannot encode false, true,
852	   null, or undefined in two-byte sequences; only the one-byte variants
853	   of these are well-formed.)
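   A minimal Python sketch of this rule follows.  The function names
   are hypothetical; the encoder refuses the reserved values 24..31,
   and the decoder treats a two-byte sequence with a second byte below
   0x20 as not well-formed.

```python
def encode_simple(value: int) -> bytes:
    """Encode a simple value (Table 4) in its only permitted form."""
    if 0 <= value <= 23:
        return bytes([0xE0 | value])   # one byte: major type 7, value inline
    if 32 <= value <= 255:
        return bytes([0xF8, value])    # two bytes: 0xf8, then the value
    raise ValueError("simple values must be in 0..23 or 32..255")


def decode_simple(data: bytes) -> int:
    """Decode a one- or two-byte simple value; reject ill-formed input."""
    if len(data) == 1 and 0xE0 <= data[0] <= 0xF7:
        return data[0] & 0x1F
    if len(data) == 2 and data[0] == 0xF8:
        if data[1] < 0x20:
            raise ValueError("0xf8 followed by a byte below 0x20 "
                             "is not well-formed")
        return data[1]
    raise ValueError("not a simple value")
```

   For example, encode_simple(21) yields the single byte 0xf5 (true),
   while decode_simple(bytes.fromhex("f815")) raises an error instead of
   returning 21.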

855	   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
856	   IEEE 754 binary floating-point values [IEEE754].  These floating-
857	   point values are encoded in the additional bytes of the appropriate
858	   size.  (See Appendix D for some information about 16-bit floating-
859	   point numbers.)
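   As an illustration, the three floating-point sizes can be produced
   with the struct module of the Python standard library, whose "e",
   "f", and "d" formats correspond to binary16, binary32, and binary64.
   Choosing the shortest size that preserves the value is a design
   choice of this sketch, and encode_float is a hypothetical name.

```python
import struct


def encode_float(value: float) -> bytes:
    """Encode a float using the shortest size that round-trips exactly."""
    for initial, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            # Finite value too large for this size; try the next one.
            continue
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed
    # NaN never compares equal, so this sketch always emits NaN as binary64.
    return b"\xfb" + struct.pack(">d", value)
```

   With this sketch, 1.5 is emitted in three bytes (0xf93e00), 100000.0
   in five bytes (0xfa47c35000), and 1.1 in nine bytes
   (0xfb3ff199999999999a).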

861	3.4.  Tagging of Items

863	   In CBOR, a data item can be enclosed by a tag to give it some
864	   additional semantics, as uniquely identified by a "tag number".  The
865	   tag is major type 6, its argument (Section 3) indicates the tag
866	   number, and it contains a single enclosed data item, the "tag
867	   content".  (If a tag requires further structure to its content, this
868	   structure is provided by the enclosed data item.)  We use the term
869	   "tag" for the entire data item consisting of both a tag number and
870	   the tag content: the tag content is the data item that is being
871	   tagged.

873	   For example, assume that a byte string of length 12 is marked with a
874	   tag of number 2 to indicate it is a positive "bignum"
875	   (Section 3.4.3).  The encoded data item would start with a byte
876	   0b110_00010 (major type 6, additional information 2 for the tag
877	   number) followed by the encoded tag content: 0b010_01100 (major type
878	   2, additional information of 12 for the length) followed by the 12
879	   bytes of the bignum.
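   The structure of this example can be sketched in Python as follows.
   The helper names are hypothetical; encode_tag simply prefixes
   already-encoded tag content with a major type 6 head.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Hypothetical helper: shortest head for major types 0..6."""
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("argument must be below 2**64")


def encode_bytes(content: bytes) -> bytes:
    """Encode a definite-length byte string (major type 2)."""
    return encode_head(2, len(content)) + content


def encode_tag(tag_number: int, encoded_content: bytes) -> bytes:
    """Enclose one already-encoded data item in a tag (major type 6)."""
    return encode_head(6, tag_number) + encoded_content


# A 12-byte placeholder value (all zero) stands in for the bignum bytes.
item = encode_tag(2, encode_bytes(bytes(12)))
```

   Here item starts with the bytes 0b110_00010 and 0b010_01100 and is
   14 bytes long in total.  Because a tag is itself a data item, tags
   nest by calling encode_tag on the output of another encode_tag.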

881	   The definition of a tag number describes the additional semantics
882	   conveyed for tags with this tag number in the extended generic data
883	   model.  These semantics may include equivalence of some tagged data
884	   items with other data items, including some that can already be
885	   represented in the basic generic data model.  For instance, 0xc24101,
886	   a bignum the tag content of which is the byte string with the single
887	   byte 0x01, is equivalent to an integer 1, which could also be encoded
888	   for instance as 0x01, 0x1801, or 0x190001.  The tag definition may
889	   include the definition of a preferred serialization (Section 4.1)
890	   that is recommended for generic encoders; this may prefer basic
891	   generic data model representations over ones that employ a tag.

893	   The tag definition usually restricts what kinds of nested data item
894	   or items are valid for such tags.  Tag definitions may restrict their
895	   content to a very specific syntactic structure, as the tags defined
896	   in this document do, or they may aim at a more semantically defined
897	   definition of their content, as for instance tags 40 and 1040 do
898	   [RFC8746]: These accept a number of different ways of representing
899	   arrays.

901	   As a matter of convention, many tags do not accept null or undefined
902	   values as tag content; instead, the expectation is that a null or
903	   undefined value can be used in place of the entire tag; Section 3.4.2
904	   provides some further considerations for one specific tag about the
905	   handling of this convention in application protocols and in mapping
906	   to platform types.

908	   Decoders do not need to understand tags of every tag number, and tags
909	   may be of little value in applications where the implementation
910	   creating a particular CBOR data item and the implementation decoding
911	   that stream know the semantic meaning of each item in the data flow.
912	   Their primary purpose in this specification is to define common data
913	   types such as dates.  A secondary purpose is to provide conversion
914	   hints when it is foreseen that the CBOR data item needs to be
915	   translated into a different format, requiring hints about the content
916	   of items.  Understanding the semantics of tags is optional for a
917	   decoder; it can simply present both the tag number and the tag
918	   content to the application, without interpreting the additional
919	   semantics of the tag.

921	   A tag applies semantics to the data item it encloses.  Tags can nest:
922	   If tag A encloses tag B, which encloses data item C, tag A applies to
923	   the result of applying tag B on data item C.

925	   IANA maintains a registry of tag numbers as described in Section 9.2.
926	   Table 5 provides a list of tag numbers that were defined in
927	   [RFC7049], with definitions in the rest of this section.  Note that
928	   many other tag numbers have been defined since the publication of
929	   [RFC7049]; see the registry described at Section 9.2 for the complete
930	   list.

932	      +------------+-------------+----------------------------------+
933	      | Tag Number | Data Item   | Semantics                        |
934	      +============+=============+==================================+
935	      | 0          | text string | Standard date/time string; see   |
936	      |            |             | Section 3.4.1                    |
937	      +------------+-------------+----------------------------------+
938	      | 1          | integer or  | Epoch-based date/time; see       |
939	      |            | float       | Section 3.4.2                    |
940	      +------------+-------------+----------------------------------+
941	      | 2          | byte string | Positive bignum; see             |
942	      |            |             | Section 3.4.3                    |
943	      +------------+-------------+----------------------------------+
944	      | 3          | byte string | Negative bignum; see             |
945	      |            |             | Section 3.4.3                    |
946	      +------------+-------------+----------------------------------+
947	      | 4          | array       | Decimal fraction; see            |
948	      |            |             | Section 3.4.4                    |
949	      +------------+-------------+----------------------------------+
950	      | 5          | array       | Bigfloat; see Section 3.4.4      |
951	      +------------+-------------+----------------------------------+
952	      | 21         | (any)       | Expected conversion to base64url |
953	      |            |             | encoding; see Section 3.4.5.2    |
954	      +------------+-------------+----------------------------------+
955	      | 22         | (any)       | Expected conversion to base64    |
956	      |            |             | encoding; see Section 3.4.5.2    |
957	      +------------+-------------+----------------------------------+
958	      | 23         | (any)       | Expected conversion to base16    |
959	      |            |             | encoding; see Section 3.4.5.2    |
960	      +------------+-------------+----------------------------------+
961	      | 24         | byte string | Encoded CBOR data item; see      |
962	      |            |             | Section 3.4.5.1                  |
963	      +------------+-------------+----------------------------------+
964	      | 32         | text string | URI; see Section 3.4.5.3         |
965	      +------------+-------------+----------------------------------+
966	      | 33         | text string | base64url; see Section 3.4.5.3   |
967	      +------------+-------------+----------------------------------+
968	      | 34         | text string | base64; see Section 3.4.5.3      |
969	      +------------+-------------+----------------------------------+
970	      | 35         | text string | Regular expression; see          |
971	      |            |             | Section 3.4.5.3                  |
972	      +------------+-------------+----------------------------------+
973	      | 36         | text string | MIME message; see                |
974	      |            |             | Section 3.4.5.3                  |
975	      +------------+-------------+----------------------------------+
976	      | 55799      | (any)       | Self-described CBOR; see         |
977	      |            |             | Section 3.4.6                    |
978	      +------------+-------------+----------------------------------+

980	                  Table 5: Tag numbers defined in RFC 7049

982	   Conceptually, tags are interpreted in the generic data model, not at
983	   (de-)serialization time.  A small number of tags (specifically, tag
984	   number 25 and tag number 29) have been registered with semantics that
985	   may require processing at (de-)serialization time: The decoder needs
986	   to be aware and the encoder needs to be in control of the exact
987	   sequence in which data items are encoded into the CBOR data stream.
988	   This means these tags cannot be implemented on top of every generic
989	   CBOR encoder/decoder (which might not reflect the serialization order
990	   for entries in a map at the data model level and vice versa); their
991	   implementation therefore typically needs to be integrated into the
992	   generic encoder/decoder.  The definition of new tags with this
993	   property is NOT RECOMMENDED.

995	   Protocols using tag numbers 0 and 1 extend the generic data model
996	   (Section 2) with data items representing points in time; tag numbers
997	   2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5,
998	   with floating-point values of arbitrary size and precision.

1000	3.4.1.  Standard Date/Time String

1002	   Tag number 0 contains a text string in the standard format described
1003	   by the "date-time" production in [RFC3339], as refined by Section 3.3
1004	   of [RFC4287], representing the point in time described there.  A
1005	   nested item of another type or that doesn't match the [RFC4287]
1006	   format is invalid.

1008	3.4.2.  Epoch-based Date/Time

1010	   Tag number 1 contains a numerical value counting the number of
1011	   seconds from 1970-01-01T00:00Z in UTC time to the represented point
1012	   in civil time.

1014	   The tag content MUST be an unsigned or negative integer (major types
1015	   0 and 1), or a floating-point number (major type 7 with additional
1016	   information 25, 26, or 27).  Other contained types are invalid.

1018	   Non-negative values (major type 0 and non-negative floating-point
1019	   numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
1020	   are interpreted according to POSIX [TIME_T].  (POSIX time is also
1021	   known as UNIX Epoch time.  Note that leap seconds are handled
1022	   specially by POSIX time and this results in a 1 second discontinuity
1023	   several times per decade.)  Note that applications that require the
1024	   expression of times beyond early 2106 cannot leave out support of
1025	   64-bit integers for the tag content.
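   The remark about early 2106 can be checked with a short Python
   sketch.  The names encode_head and encode_epoch are hypothetical; the
   sketch handles integer tag content only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_head(major_type: int, argument: int) -> bytes:
    """Hypothetical helper: shortest head for major types 0..6."""
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("argument must be below 2**64")


def encode_epoch(seconds: int) -> bytes:
    """Encode tag number 1 with an integer count of seconds."""
    if seconds >= 0:
        return encode_head(6, 1) + encode_head(0, seconds)
    return encode_head(6, 1) + encode_head(1, -1 - seconds)


# The largest count that still fits a 4-byte argument, as a point in time.
last_32_bit = EPOCH + timedelta(seconds=2**32 - 1)
```

   Here last_32_bit falls in February 2106.  encode_epoch(2**32 - 1)
   still uses a 4-byte argument, whereas encode_epoch(2**32) needs the
   8-byte form (additional information 27).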

1027	   Negative values (major type 1 and negative floating-point numbers)
1028	   are interpreted as determined by the application requirements as
1029	   there is no universal standard for UTC count-of-seconds time before
1030	   1970-01-01T00:00Z (this is particularly true for points in time that
1031	   precede discontinuities in national calendars).  The same applies to
1032	   non-finite values.

1034	   To indicate fractional seconds, floating-point values can be used
1035	   within tag number 1 instead of integer values.  Note that this
1036	   generally requires binary64 support, as binary16 and binary32 provide
1037	   non-zero fractions of seconds only for a short period of time around
1038	   early 1970.  An application that requires tag number 1 support may
1039	   restrict the tag content to be an integer (or a floating-point value)
1040	   only.
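   The precision limit mentioned above can be observed with the struct
   module of the Python standard library.  The helper name round_trip
   and the sample timestamp are assumptions chosen for this
   illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a float in the given IEEE 754 size and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Half a second past an integral timestamp in March 2013.
timestamp = 1363896240.5

as_binary64 = round_trip(timestamp, ">d")  # keeps the half second
as_binary32 = round_trip(timestamp, ">f")  # rounds to a whole number
```

   In binary64 the half second survives, while the binary32 result is
   rounded to a whole number of seconds because 24 bits of precision
   cannot hold both the magnitude and the fraction.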

1042	   Note that platform types for date/time may include null or undefined
1043	   values, which may also be desirable at an application protocol level.
1044	   While emitting tag number 1 values with non-finite tag content values
1045	   (e.g., with NaN for undefined date/time values or with Infinity for
1046	   an expiry date that is not set) may seem an obvious way to handle
1047	   this, using untagged null or undefined is often a better solution.
1048	   Application protocol designers are encouraged to consider these cases
1049	   and include clear guidelines for handling them.

1051	3.4.3.  Bignums

1053	   Protocols using tag numbers 2 and 3 extend the generic data model
1054	   (Section 2) with "bignums" representing arbitrarily sized integers.
1055	   In the basic generic data model, bignum values are not equal to
1056	   integers from the same model, but the extended generic data model
1057	   created by this tag definition defines equivalence based on numeric
1058	   value, and preferred serialization (Section 4.1) never makes use of
1059	   bignums that also can be expressed as basic integers (see below).

1061	   Bignums are encoded as a byte string data item, which is interpreted
1062	   as an unsigned integer n in network byte order.  Contained items of
1063	   other types are invalid.  For tag number 2, the value of the bignum
1064	   is n.  For tag number 3, the value of the bignum is -1 - n.  The
1065	   preferred serialization of the byte string is to leave out any
1066	   leading zeroes (note that this means the preferred serialization for
1067	   n = 0 is the empty byte string, but see below).  Decoders that
1068	   understand these tags MUST be able to decode bignums that do have
1069	   leading zeroes.  The preferred serialization of an integer that can
1070	   be represented using major type 0 or 1 is to encode it this way
1071	   instead of as a bignum (which means that the empty string never
1072	   occurs in a bignum when using preferred serialization).  Note that
1073	   this means the non-preferred choice of a bignum representation
1074	   instead of a basic integer for encoding a number is not intended to
1075	   have application semantics (just as the choice of a longer basic
1076	   integer representation than needed, such as 0x1800 for 0x00 does
1077	   not).

1079	   For example, the number 18446744073709551616 (2**64) is represented
1080	   as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001
1081	   (major type 2, length 9), followed by 0x010000000000000000 (one byte
1082	   0x01 and eight bytes 0x00).  In hexadecimal:

1084	   C2                        -- Tag 2
1085	      49                     -- Byte string of length 9
1086	         010000000000000000  -- Bytes content
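   The bignum rules above can be sketched in Python, whose integers
   are of arbitrary size.  The function names are hypothetical.  The
   encoder follows the preferred serialization described in this
   section: a basic integer is used whenever the value fits major type
   0 or 1, and leading zeroes are left out otherwise.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Hypothetical helper: shortest head for major types 0..6."""
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("argument must be below 2**64")


def encode_integer(value: int) -> bytes:
    """Encode any integer, using a bignum only when it is needed."""
    negative = value < 0
    n = -1 - value if negative else value
    if n < 2**64:
        return encode_head(1 if negative else 0, n)
    # Network byte order, without leading zero bytes.
    magnitude = n.to_bytes((n.bit_length() + 7) // 8, "big")
    tag_number = 3 if negative else 2
    return encode_head(6, tag_number) + encode_head(2, len(magnitude)) + magnitude


def decode_bignum(tag_number: int, content: bytes) -> int:
    """Interpret tag content of tag 2 or 3; leading zeroes are accepted."""
    n = int.from_bytes(content, "big")
    if tag_number == 2:
        return n
    if tag_number == 3:
        return -1 - n
    raise ValueError("not a bignum tag")
```

   For example, encode_integer(2**64) yields the bytes
   0xc249010000000000000000 shown above, while decode_bignum(2, ...)
   returns 1 both for the content 0x01 and for 0x000001.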

1088	3.4.4.  Decimal Fractions and Bigfloats

1090	   Protocols using tag number 4 extend the generic data model with data
1091	   items representing arbitrary-length decimal fractions of the form
1092	   m*(10**e).  Protocols using tag number 5 extend the generic data
1093	   model with data items representing arbitrary-length binary fractions
1094	   of the form m*(2**e).  As with bignums, values of different types are
1095	   not equal in the generic data model.

1097	   Decimal fractions combine an integer mantissa with a base-10 scaling
1098	   factor.  They are most useful if an application needs the exact
1099	   representation of a decimal fraction such as 1.1 because there is no
1100	   exact representation for many decimal fractions in binary floating-
1101	   point representations.

1103	   "Bigfloats" combine an integer mantissa with a base-2 scaling factor.
1104	   They are binary floating-point values that can exceed the range or
1105	   the precision of the three IEEE 754 formats supported by CBOR
1106	   (Section 3.3).  Bigfloats may also be used by constrained
1107	   applications that need some basic binary floating-point capability
1108	   without the need for supporting IEEE 754.

1110	   A decimal fraction or a bigfloat is represented as a tagged array
1111	   that contains exactly two integer numbers: an exponent e and a
1112	   mantissa m.  Decimal fractions (tag number 4) use base-10 exponents;
1113	   the value of a decimal fraction data item is m*(10**e).  Bigfloats
1114	   (tag number 5) use base-2 exponents; the value of a bigfloat data
1115	   item is m*(2**e).  The exponent e MUST be represented in an integer
1116	   of major type 0 or 1, while the mantissa can also be a bignum
1117	   (Section 3.4.3).  Contained items with other structures are invalid.

1119	   An example of a decimal fraction is that the number 273.15 could be
1120	   represented as 0b110_00100 (major type of 6 for the tag, additional
1121	   information of 4 for the tag number), followed by 0b100_00010
1122	   (major type of 4 for the array, additional information of 2 for the
1123	   length of the array), followed by 0b001_00001 (major type of 1 for
1124	   the first integer, additional information of 1 for the value of -2),
1125	   followed by 0b000_11001 (major type of 0 for the second integer,
1126	   additional information of 25 for a two-byte value), followed by
1127	   0b0110101010110011 (27315 in two bytes).  In hexadecimal:

1129	   C4             -- Tag 4
1130	      82          -- Array of length 2
1131	         21       -- -2
1132	         19 6ab3  -- 27315

1134	   An example of a bigfloat is that the number 1.5 could be represented
1135	   as 0b110_00101 (major type of 6 for the tag, additional information
1136	   of 5 for the tag number), followed by 0b100_00010 (major type of 4
1137	   for the array, additional information of 2 for the length of the
1138	   array), followed by 0b001_00000 (major type of 1 for the first
1139	   integer, additional information of 0 for the value of -1), followed
1140	   by 0b000_00011 (major type of 0 for the second integer, additional
1141	   information of 3 for the value of 3).  In hexadecimal:

1143	   C5             -- Tag 5
1144	      82          -- Array of length 2
1145	         20       -- -1
1146	         03       -- 3
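
   The two examples above can be reproduced with a minimal Python
   sketch.  The helper names (head, encode_int, encode_fraction,
   fraction_value) are hypothetical and not defined by this
   specification, and the sketch assumes that exponent and mantissa
   both fit major types 0 and 1 (bignum mantissas are left out).

```python
import struct
from fractions import Fraction

def head(major, arg):
    """Encode a CBOR head using the shortest argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("argument needs a bignum, which this sketch omits")

def encode_int(n):
    """Major type 0 for n >= 0, major type 1 (argument -1 - n) otherwise."""
    return head(0, n) if n >= 0 else head(1, -1 - n)

def encode_fraction(tag, exponent, mantissa):
    """Tag 4 (decimal fraction) or 5 (bigfloat) around [exponent, mantissa]."""
    if tag not in (4, 5):
        raise ValueError("only tag numbers 4 and 5 are handled here")
    return (head(6, tag) + head(4, 2)
            + encode_int(exponent) + encode_int(mantissa))

def fraction_value(tag, exponent, mantissa):
    """Exact value m*(10**e) for tag 4 or m*(2**e) for tag 5."""
    base = 10 if tag == 4 else 2
    return mantissa * Fraction(base) ** exponent
```

   For instance, encode_fraction(4, -2, 27315) yields the six bytes of
   the decimal fraction example, and fraction_value(5, -1, 3) is the
   exact fraction 3/2.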

1148	   Decimal fractions and bigfloats provide no representation of
1149	   Infinity, -Infinity, or NaN; if these are needed in place of a
1150	   decimal fraction or bigfloat, the IEEE 754 half-precision
1151	   representations from Section 3.3 can be used.

1153	3.4.5.  Content Hints

1155	   The tags in this section are for content hints that might be used by
1156	   generic CBOR processors.  These content hints do not extend the
1157	   generic data model.

1159	3.4.5.1.  Encoded CBOR Data Item

1161	   Sometimes it is beneficial to carry an embedded CBOR data item that
1162	   is not meant to be decoded immediately at the time the enclosing data
1163	   item is being decoded.  Tag number 24 (CBOR data item) can be used to
1164	   tag the embedded byte string as a data item encoded in CBOR format.
1165	   Contained items that aren't byte strings are invalid.  A contained
1166	   byte string is valid if it encodes a well-formed CBOR item; validity
1167	   checking of the decoded CBOR item is not required for tag validity
1168	   (but could be offered by a generic decoder as a special option).
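
   As an illustration, an encoder might wrap an already encoded data
   item as follows.  This is a sketch only: the function names are
   hypothetical, and it does not check that the wrapped bytes are a
   well-formed CBOR data item.

```python
import struct

def bstr_head(length):
    """Head of a definite-length byte string (major type 2), shortest form."""
    if length < 24:
        return bytes([0x40 | length])
    if length < 1 << 8:
        return bytes([0x58, length])
    if length < 1 << 16:
        return b"\x59" + struct.pack(">H", length)
    if length < 1 << 32:
        return b"\x5a" + struct.pack(">I", length)
    return b"\x5b" + struct.pack(">Q", length)

def wrap_encoded_cbor(encoded_item):
    """Tag number 24 (head 0xd8 0x18) around a byte string of encoded CBOR."""
    return b"\xd8\x18" + bstr_head(len(encoded_item)) + encoded_item
```

   Wrapping the four-byte encoding of the array [1, 2, 3] (0x83010203)
   in this way gives 0xd8184483010203.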

1170	3.4.5.2.  Expected Later Encoding for CBOR-to-JSON Converters

1172	   Tag numbers 21 to 23 indicate that a byte string might require a
1173	   specific encoding when interoperating with a text-based
1174	   representation.  These tags are useful when an encoder knows that the
1175	   byte string data it is writing is likely to be later converted to a
1176	   particular JSON-based usage.  That usage specifies that some strings
1177	   are encoded as base64, base64url, and so on.  The encoder uses byte
1178	   strings instead of doing the encoding itself to reduce the message
1179	   size, to reduce the code size of the encoder, or both.  The encoder
1180	   does not know whether or not the converter will be generic, and
1181	   therefore wants to say what it believes is the proper way to convert
1182	   binary strings to JSON.

1184	   The data item tagged can be a byte string or any other data item.  In
1185	   the latter case, the tag applies to all of the byte string data items
1186	   contained in the data item, except for those contained in a nested
1187	   data item tagged with an expected conversion.

1189	   These three tag numbers suggest conversions to three of the base data
1190	   encodings defined in [RFC4648].  Tag number 21 suggests conversion to
1191	   base64url encoding (Section 5 of RFC 4648), where padding is not used
1192	   (see Section 3.2 of RFC 4648); that is, all trailing equals signs
1193	   ("=") are removed from the encoded string.  Tag number 22 suggests
1194	   conversion to classical base64 encoding (Section 4 of RFC 4648), with
1195	   padding as defined in RFC 4648.  For both base64url and base64,
1196	   padding bits are set to zero (see Section 3.5 of RFC 4648), and
1197	   encoding is performed without the inclusion of any line breaks,
1198	   whitespace, or other additional characters.  Tag number 23 suggests
1199	   conversion to base16 (hex) encoding, with uppercase alphabetics (see
1200	   Section 8 of RFC 4648).  Note that, for all three tag numbers, the
1201	   encoding of the empty byte string is the empty text string.
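
   A converter could implement the three suggested conversions along
   the following lines.  This is a sketch using Python's standard
   base64 module; the function name expected_text is hypothetical.

```python
import base64

def expected_text(tag_number, data):
    """Text form suggested by tag numbers 21, 22, and 23 for a byte string."""
    if tag_number == 21:
        # base64url without padding: strip trailing "=" characters
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag_number == 22:
        # classical base64 with padding and without line breaks
        return base64.b64encode(data).decode("ascii")
    if tag_number == 23:
        # base16 with uppercase letters
        return base64.b16encode(data).decode("ascii")
    raise ValueError("not an expected-conversion tag number")
```

   For the two bytes 0xfbff, this gives "-_8" for tag number 21, "+/8="
   for tag number 22, and "FBFF" for tag number 23.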

1203	3.4.5.3.  Encoded Text

1205	   Some text strings hold data that have formats widely used on the
1206	   Internet, and sometimes those formats can be validated and presented
1207	   to the application in appropriate form by the decoder.  There are
1208	   tags for some of these formats.  As with tag numbers 21 to 23, if
1209	   these tags are applied to an item other than a text string, they
1210	   apply to all text string data items it contains.

1212	   *  Tag number 32 is for URIs, as defined in [RFC3986].  If the text
1213	      string doesn't match the "URI-reference" production, the string is
1214	      invalid.

1216	   *  Tag numbers 33 and 34 are for base64url- and base64-encoded text
1217	      strings, respectively, as defined in [RFC4648].  If any of:

1219	      -  the encoded text string contains non-alphabet characters or
1220	         only 1 character in the last block of 4, or

1222	      -  the padding bits in a 2- or 3-character block are not 0, or

1224	      -  the base64 encoding has the wrong number of padding characters,
1225	         or

1227	      -  the base64url encoding has padding characters,

1229	      the string is invalid.

1231	   *  Tag number 35 is for regular expressions that are roughly in Perl
1232	      Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
1233	      version of the JavaScript regular expression syntax [ECMA262].
1234	      (Note that more specific identification may be necessary if the
1235	      actual version of the specification underlying the regular
1236	      expression, or more than just the text of the regular expression
1237	      itself, need to be conveyed.)  Any contained string value is
1238	      valid.

1240	   *  Tag number 36 is for MIME messages (including all headers), as
1241	      defined in [RFC2045].  A text string that isn't a valid MIME
1242	      message is invalid.  (For this tag, validity checking may be
1243	      particularly onerous for a generic decoder and might therefore not
1244	      be offered.  Note that many MIME messages are general binary data
1245	      and can therefore not be represented in a text string;
1246	      [IANA.cbor-tags] lists a registration for tag number 257 that is
1247	      similar to tag number 36 but uses a byte string as its tag
1248	      content.)

1250	   Note that tag numbers 33 and 34 differ from 21 and 22 in that the
1251	   data is transported in base-encoded form for the former and in raw
1252	   byte string form for the latter.
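
   The validity conditions listed above for tag numbers 33 and 34 could
   be checked roughly as follows.  This is a sketch under the
   assumption that the text string has already been decoded to a Python
   str; the names used are hypothetical.

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
B64_ALPHABET = _COMMON + "+/"
B64URL_ALPHABET = _COMMON + "-_"

def is_valid_base_text(text, tag_number):
    """Check a text string against the tag 33 (base64url) / 34 (base64) rules."""
    if tag_number == 33:
        # base64url: any "=" fails the alphabet test below
        alphabet, body = B64URL_ALPHABET, text
    elif tag_number == 34:
        alphabet = B64_ALPHABET
        body = text.rstrip("=")
        pad = len(text) - len(body)
        # the padded length must be a multiple of 4, with at most two "="
        if len(text) % 4 != 0 or pad > 2:
            return False
    else:
        raise ValueError("only tag numbers 33 and 34 are handled here")
    if any(ch not in alphabet for ch in body):
        return False
    tail = len(body) % 4
    if tail == 1:
        return False          # only 1 character in the last block of 4
    if tail == 0:
        return True
    # a 2-character block has 4 padding bits, a 3-character block has 2
    unused_bits = 4 if tail == 2 else 2
    return (alphabet.index(body[-1]) & ((1 << unused_bits) - 1)) == 0
```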

1254	3.4.6.  Self-Described CBOR

1256	   In many applications, it will be clear from the context that CBOR is
1257	   being employed for encoding a data item.  For instance, a specific
1258	   protocol might specify the use of CBOR, or a media type is indicated
1259	   that specifies its use.  However, there may be applications where
1260	   such context information is not available, such as when CBOR data is
1261	   stored in a file that does not have disambiguating metadata.  Here,
1262	   it may help to have some distinguishing characteristics for the data
1263	   itself.

1265	   Tag number 55799 is defined for this purpose.  It does not impart any
1266	   special semantics on the data item that it encloses; that is, the
1267	   semantics of the tag content enclosed in tag number 55799 is exactly
1268	   identical to the semantics of the tag content itself.

1270	   The serialization of this tag's head is 0xd9d9f7, which does not
1271	   appear to be in use as a distinguishing mark for any frequently used
1272	   file types.  In particular, 0xd9d9f7 is not a valid start of a
1273	   Unicode text in any Unicode encoding if it is followed by a valid
1274	   CBOR data item.

1276	   For instance, a decoder might be able to decode both CBOR and JSON.
1277	   Such a decoder would need to mechanically distinguish the two
1278	   formats.  An easy way for an encoder to help the decoder would be to
1279	   tag the entire CBOR item with tag number 55799, the serialization of
1280	   which will never be found at the beginning of a JSON text.
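
   A decoder that accepts both formats might test for the three-byte
   prefix before choosing a parser.  The following sketch uses
   hypothetical function names and only inspects the prefix; it does
   not decode the enclosed item.

```python
SELF_DESCRIBED_HEAD = bytes.fromhex("d9d9f7")  # head of tag number 55799

def add_self_description(encoded_item):
    """Prefix an encoded CBOR data item with the tag 55799 head."""
    return SELF_DESCRIBED_HEAD + encoded_item

def looks_like_self_described_cbor(data):
    """True if the data starts with the serialization of tag 55799's head."""
    return data.startswith(SELF_DESCRIBED_HEAD)
```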

1282	4.  Serialization Considerations

1284	4.1.  Preferred Serialization

1286	   For some values at the data model level, CBOR provides multiple
1287	   serializations.  For many applications, it is desirable that an
1288	   encoder always chooses a preferred serialization (preferred
1289	   encoding); however, the present specification does not put the burden
1290	   of enforcing this preference on either encoder or decoder.

1292	   Some constrained decoders may be limited in their ability to decode
1293	   non-preferred serializations: For example, if only integers below
1294	   1_000_000_000 (one billion) are expected in an application, the
1295	   decoder may leave out the code that would be needed to decode 64-bit
1296	   arguments in integers.  An encoder that always uses preferred
1297	   serialization ("preferred encoder") interoperates with this decoder
1298	   for the numbers that can occur in this application.  More generally
1299	   speaking, it therefore can be said that a preferred encoder is more
1300	   universally interoperable (and also less wasteful) than one that,
1301	   say, always uses 64-bit integers.

1303	   Similarly, a constrained encoder may be limited in the variety of
1304	   representation variants it supports in such a way that it does not
1305	   emit preferred serializations ("variant encoder"): Say, it could be
1306	   designed to always use the 32-bit variant for an integer that it
1307	   encodes even if a short representation is available (again, assuming
1308	   that there is no application need for integers that can only be
1309	   represented with the 64-bit variant).  A decoder that does not rely
1310	   on only ever receiving preferred serializations ("variation-tolerant
1311	   decoder") can therefore be said to be more universally interoperable
1312	   (it might very well optimize for the case of receiving preferred
1313	   serializations, though).  Full implementations of CBOR decoders are
1314	   by definition variation-tolerant; the distinction is only relevant if
1315	   a constrained implementation of a CBOR decoder meets a variant
1316	   encoder.

1318	   The preferred serialization always uses the shortest form of
1319	   representing the argument (Section 3); it also uses the shortest
1320	   floating-point encoding that preserves the value being encoded.

1322	   The preferred serialization for a floating-point value is the
1323	   shortest floating-point encoding that preserves its value, e.g.,
1324	   0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5.
1325	   For NaN values, a shorter encoding is preferred if zero-padding the
1326	   shorter significand towards the right reconstitutes the original NaN
1327	   value (for many applications, the single NaN encoding 0xf97e00 will
1328	   suffice).

1330	   Definite length encoding is preferred whenever the length is known at
1331	   the time the serialization of the item starts.
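
   The floating-point part of preferred serialization can be sketched
   with test conversions, here using Python's struct module.  The
   function name is hypothetical, and the sketch assumes that NaN
   payloads do not need to be preserved, so that every NaN is encoded
   as 0xf97e00.

```python
import math
import struct

def encode_float_preferred(value):
    """Shortest of the 16-, 32-, and 64-bit encodings that preserves value."""
    if math.isnan(value):
        # Assumption: NaN payloads are not preserved; use the single NaN.
        return bytes.fromhex("f97e00")
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue          # finite value too large for this format
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)
```

   With this sketch, 5.5 is encoded as 0xf94580 and 5555.5 as
   0xfa45ad9c00, matching the values quoted above.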

1333	4.2.  Deterministically Encoded CBOR

1335	   Some protocols may want encoders to only emit CBOR in a particular
1336	   deterministic format; those protocols might also have the decoders
1337	   check that their input is in that deterministic format.  Those
1338	   protocols are free to define what they mean by a "deterministic
1339	   format" and what encoders and decoders are expected to do.  This
1340	   section defines a set of restrictions that can serve as the base of
1341	   such a deterministic format.

1343	4.2.1.  Core Deterministic Encoding Requirements

1345	   A CBOR encoding satisfies the "core deterministic encoding
1346	   requirements" if it satisfies the following restrictions:

1348	   *  Preferred serialization MUST be used.  In particular, this means
1349	      that arguments (see Section 3) for integers, lengths in major
1350	      types 2 through 5, and tags MUST be as short as possible, for
1351	      instance:

1353	      -  0 to 23 and -1 to -24 MUST be expressed in the same byte as the
1354	         major type;

1356	      -  24 to 255 and -25 to -256 MUST be expressed only with an
1357	         additional uint8_t;

1359	      -  256 to 65535 and -257 to -65536 MUST be expressed only with an
1360	         additional uint16_t;

1362	      -  65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
1363	         only with an additional uint32_t.

1365	      Floating-point values also MUST use the shortest form that
1366	      preserves the value, e.g., 1.5 is encoded as 0xf93e00 and
1367	      1000000.5 as 0xfa49742408.  (One implementation of this is to
1368	      have all floats start as a 64-bit float, then do a test
1369	      conversion to a 32-bit float; if the result is the same numeric
1370	      value, use the shorter form and repeat the process with a test
1371	      conversion to a 16-bit float.  This also works to select 16-bit
1372	      float for positive and negative Infinity.)

1374	   *  Indefinite-length items MUST NOT appear.  They can be encoded as
1375	      definite-length items instead.

1377	   *  The keys in every map MUST be sorted in the bytewise lexicographic
1378	      order of their deterministic encodings.  For example, the
1379	      following keys are sorted correctly:

1381	      1.  10, encoded as 0x0a.

1383	      2.  100, encoded as 0x1864.

1385	      3.  -1, encoded as 0x20.

1387	      4.  "z", encoded as 0x617a.

1389	      5.  "aa", encoded as 0x626161.

1391	      6.  [100], encoded as 0x811864.

1393	      7.  [-1], encoded as 0x8120.

1395	      8.  false, encoded as 0xf4.
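
   The shortest-argument rule for integers and the map key ordering can
   be sketched as follows.  The function names are hypothetical; the
   sort relies on the fact that Python compares bytes objects in
   bytewise lexicographic order, and it assumes that each key has
   already been encoded deterministically.

```python
import struct

def encode_int(n):
    """Shortest-form encoding of an integer in major type 0 or 1."""
    major, arg = (0, n) if n >= 0 else (1, -1 - n)
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, fmt, bits in ((24, ">B", 8), (25, ">H", 16),
                            (26, ">I", 32), (27, ">Q", 64)):
        if arg < (1 << bits):
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("outside the range of major types 0 and 1")

def sort_keys_core(encoded_keys):
    """Bytewise lexicographic order of the keys' deterministic encodings."""
    return sorted(encoded_keys)
```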

1397	4.2.2.  Additional Deterministic Encoding Considerations

1399	   CBOR tags present additional considerations for deterministic
1400	   encoding.  If a CBOR-based protocol were to provide the same
1401	   semantics for the presence and absence of a specific tag (e.g., by
1402	   allowing both tag 1 data items and raw numbers in a date/time
1403	   position, treating the latter as if they were tagged), the
1404	   deterministic format would not allow the presence of the tag, based
1405	   on the "shortest form" principle.  For example, a protocol might give
1406	   encoders the choice of representing a URL as either a text string or,
1407	   using Section 3.4.5.3, tag number 32 containing a text string.  This
1408	   protocol's deterministic encoding needs to either require that the
1409	   tag is present or require that it is absent, not allow either one.

1411	   In a protocol that does require tags in certain places to obtain
1412	   specific semantics, the tag needs to appear in the deterministic
1413	   format as well.  Deterministic encoding considerations also apply to
1414	   the content of tags.

1416	   If a protocol includes a field that can express integers with an
1417	   absolute value of 2^64 or larger using tag numbers 2 or 3
1418	   (Section 3.4.3), the protocol's deterministic encoding needs to
1419	   specify whether smaller integers are also expressed using these tags
1420	   or using major types 0 and 1.  Preferred serialization uses the
1421	   latter choice, which is therefore recommended.

1423	   Protocols that include floating-point values, whether represented
1424	   using basic floating-point values (Section 3.3) or using tags (or
1425	   both), may need to define extra requirements on their deterministic
1426	   encodings, such as:

1428	   *  Although IEEE floating-point values can represent both positive
1429	      and negative zero as distinct values, the application might not
1430	      distinguish these and might decide to represent all zero values
1431	      with a positive sign, disallowing negative zero.  (The application
1432	      may also want to restrict the precision of floating point values
1433	      in such a way that there is never a need to represent 64-bit -- or
1434	      even 32-bit -- floating-point values.)

1436	   *  If a protocol includes a field that can express floating-point
1437	      values, with a specific data model that declares integer and
1438	      floating-point values to be interchangeable, the protocol's
1439	      deterministic encoding needs to specify whether, for example,
1440	      the integral value 1.0 is encoded as 0x01, 0xf93c00,
1441	      0xfa3f800000, or 0xfb3ff0000000000000.  Example rules are:

1443	      1.  Encode integral values that fit in 64 bits as values from
1444	          major types 0 and 1, and other values as the preferred
1445	          (smallest of 16-, 32-, or 64-bit) floating-point
1446	          representation that accurately represents the value,

1448	      2.  Encode all values as the preferred floating-point
1449	          representation that accurately represents the value, even for
1450	          integral values, or

1452	      3.  Encode all values as 64-bit floating-point representations.

1454	      Rule 1 straddles the boundaries between integers and floating-
1455	      point values, and Rule 3 does not use preferred serialization, so
1456	      Rule 2 may be a good choice in many cases.

1458	   *  If NaN is an allowed value and there is no intent to support NaN
1459	      payloads or signaling NaNs, the protocol needs to pick a single
1460	      representation, typically 0xf97e00.  If that simple choice is not
1461	      possible, specific attention will be needed for NaN handling.

1463	   *  Subnormal numbers (nonzero numbers with the lowest possible
1464	      exponent of a given IEEE 754 number format) may be flushed to zero
1465	      outputs or be treated as zero inputs in some floating-point
1466	      implementations.  A protocol's deterministic encoding may want to
1467	      specifically accommodate such implementations while creating an
1468	      onus on other implementations, by excluding subnormal numbers from
1469	      interchange, interchanging zero instead.

1471	   *  The same number can be represented by different decimal fractions,
1472	      by different bigfloats, and by different forms under other tags
1473	      that may be defined to express numeric values.  Depending on the
1474	      implementation, it may not always be practical to determine
1475	      whether any of these forms (or forms in the basic generic data
1476	      model) are equivalent.  An application protocol that presents
1477	      choices of this kind for the representation format of numbers
1478	      needs to be explicit in how the formats are to be chosen for
1479	      deterministic encoding.
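
   To make the choice between integer and floating-point encodings
   concrete, the four candidate encodings of the value 1.0 named in the
   list above can be generated with the following sketch.  The function
   name is hypothetical, and the sketch is limited to integral values
   from 0 to 23 so that the integer form fits in a single byte.

```python
import struct

def candidate_encodings(value):
    """The four encodings of a small non-negative integral value."""
    if value != int(value) or not 0 <= value < 24:
        raise ValueError("sketch only covers integral values 0 to 23")
    return {
        "major type 0": bytes([int(value)]),
        "16-bit float": b"\xf9" + struct.pack(">e", value),
        "32-bit float": b"\xfa" + struct.pack(">f", value),
        "64-bit float": b"\xfb" + struct.pack(">d", value),
    }
```

   A deterministic encoding has to single out exactly one of these four
   results for each such value.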

1481	4.2.3.  Length-first Map Key Ordering

1483	   The core deterministic encoding requirements (Section 4.2.1) sort map
1484	   keys in a different order from the one suggested by Section 3.9 of
1485	   [RFC7049] (called "Canonical CBOR" there).  Protocols that need to be
1486	   compatible with [RFC7049]'s order can instead be specified in terms
1487	   of this specification's "length-first core deterministic encoding
1488	   requirements":

1490	   A CBOR encoding satisfies the "length-first core deterministic
1491	   encoding requirements" if it satisfies the core deterministic
1492	   encoding requirements except that the keys in every map MUST be
1493	   sorted such that:

1495	   1.  If two keys have different lengths, the shorter one sorts
1496	       earlier;

1498	   2.  If two keys have the same length, the one with the lower value in
1499	       bytewise lexicographic order sorts earlier.

1501	   For example, under the length-first core deterministic encoding
1502	   requirements, the following keys are sorted correctly:

1504	   1.  10, encoded as 0x0a.

1506	   2.  -1, encoded as 0x20.

1508	   3.  false, encoded as 0xf4.

1510	   4.  100, encoded as 0x1864.

1512	   5.  "z", encoded as 0x617a.

1514	   6.  [-1], encoded as 0x8120.

1516	   7.  "aa", encoded as 0x626161.

1518	   8.  [100], encoded as 0x811864.
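
   The length-first ordering can be sketched as a sort on the pair
   (length, bytes) of each encoded key.  The function name is
   hypothetical, and the keys are assumed to be encoded already.

```python
def sort_keys_length_first(encoded_keys):
    """Shorter encodings first; equal lengths in bytewise order."""
    return sorted(encoded_keys, key=lambda key: (len(key), key))
```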

1520	   (Although [RFC7049] used the term "Canonical CBOR" for its form of
1521	   requirements on deterministic encoding, this document avoids this
1522	   term because "canonicalization" is often associated with specific
1523	   uses of deterministic encoding only.  The terms are essentially
1524	   interchangeable, however, and the set of core requirements in this
1525	   document could also be called "Canonical CBOR", while the length-
1526	   first-ordered version of that could be called "Old Canonical CBOR".)

1528	5.  Creating CBOR-Based Protocols

1530	   Data formats such as CBOR are often used in environments where there
1531	   is no format negotiation.  A specific design goal of CBOR is to not
1532	   need any included or assumed schema: a decoder can take a CBOR item
1533	   and decode it with no other knowledge.

1535	   Of course, in real-world implementations, the encoder and the decoder
1536	   will have a shared view of what should be in a CBOR data item.  For
1537	   example, an agreed-to format might be "the item is an array whose
1538	   first value is a UTF-8 string, second value is an integer, and
1539	   subsequent values are zero or more floating-point numbers" or "the
1540	   item is a map that has byte strings for keys and contains at least
1541	   one pair whose key is 0xab01".

1543	   CBOR-based protocols MUST specify how their decoders handle invalid
1544	   and other unexpected data.  CBOR-based protocols MAY specify that
1545	   they treat arbitrary valid data as unexpected.  Encoders for CBOR-
1546	   based protocols MUST produce only valid items, that is, the protocol
1547	   cannot be designed to make use of invalid items.  An encoder can be
1548	   capable of encoding as many or as few types of values as is required
1549	   by the protocol in which it is used; a decoder can be capable of
1550	   understanding as many or as few types of values as is required by the
1551	   protocols in which it is used.  This lack of restrictions allows CBOR
1552	   to be used in extremely constrained environments.

1554	   The rest of this section discusses some considerations in creating
1555	   CBOR-based protocols.  With few exceptions, it is advisory only and
1556	   explicitly excludes any language from BCP 14 other than words that
1557	   could be interpreted as "MAY" in the sense of BCP 14.  The exceptions
1558	   aim at facilitating interoperability of CBOR-based protocols while
1559	   making use of a wide variety of both generic and application-specific
1560	   encoders and decoders.

1562	5.1.  CBOR in Streaming Applications

1564	   In a streaming application, a data stream may be composed of a
1565	   sequence of CBOR data items concatenated back-to-back.  In such an
1566	   environment, the decoder immediately begins decoding a new data item
1567	   if data is found after the end of a previous data item.

1569	   Not all of the bytes making up a data item may be immediately
1570	   available to the decoder; some decoders will buffer additional data
1571	   until a complete data item can be presented to the application.
1572	   Other decoders can present partial information about a top-level data
1573	   item to an application, such as the nested data items that could
1574	   already be decoded, or even parts of a byte string that hasn't
1575	   completely arrived yet.
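
   A buffering decoder of the first kind might locate item boundaries
   as in the following sketch.  All names are hypothetical; the sketch
   handles definite-length items only, rejects indefinite-length and
   reserved heads, and does not perform validity checking.

```python
class Incomplete(Exception):
    """Raised when the buffer ends before the current data item does."""

def skip_item(buf, pos):
    """Return the offset just past the data item starting at buf[pos]."""
    if pos >= len(buf):
        raise Incomplete()
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)      # 1, 2, 4, or 8 argument bytes
        if pos + size > len(buf):
            raise Incomplete()
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("reserved or indefinite-length head, not handled")
    if major in (2, 3):              # byte or text string: arg content bytes
        if pos + arg > len(buf):
            raise Incomplete()
        return pos + arg
    if major in (4, 5, 6):           # array, map, tag: nested items follow
        count = {4: arg, 5: 2 * arg, 6: 1}[major]
        for _ in range(count):
            pos = skip_item(buf, pos)
        return pos
    return pos                       # major types 0, 1, and 7: head only

def split_stream(buf):
    """Split a buffer into complete items plus the unconsumed remainder."""
    items, pos = [], 0
    while pos < len(buf):
        try:
            end = skip_item(buf, pos)
        except Incomplete:
            break
        items.append(bytes(buf[pos:end]))
        pos = end
    return items, bytes(buf[pos:])
```

   The remainder returned by split_stream would be kept and extended
   with newly arrived bytes before the next attempt.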

1577	   Note that some applications and protocols will not want to use
1578	   indefinite-length encoding.  Using indefinite-length encoding allows
1579	   an encoder to not need to marshal all the data for counting, but it
1580	   requires a decoder to allocate increasing amounts of memory while
1581	   waiting for the end of the item.  This might be fine for some
1582	   applications but not others.

1584	5.2.  Generic Encoders and Decoders

1586	   A generic CBOR decoder can decode all well-formed CBOR data and
1587	   present them to an application.  See Appendix C.

1589	   Even though CBOR attempts to minimize these cases, not all well-
1590	   formed CBOR data is valid: for example, the encoded text string
1591	   0x62c0ae does not contain valid UTF-8 and so is not a valid CBOR
1592	   item.  Also, specific tags may make semantic constraints that may be
1593	   violated, such as a bignum tag enclosing another tag, or an instance
1594	   of tag number 0 containing a byte string, or containing a text string
1595	   with contents that do not match [RFC3339]'s "date-time" production.
1596	   There is no requirement that generic encoders and decoders make
1597	   unnatural choices for their application interface to enable the
1598	   processing of invalid data.  Generic encoders and decoders are
1599	   expected to forward simple values and tags even if their specific
1600	   codepoints are not registered at the time the encoder/decoder is
1601	   written (Section 5.4).

1603	   Generic decoders provide ways to present well-formed CBOR values,
1604	   both valid and invalid, to an application.  The diagnostic notation
1605	   (Section 8) may be used to present well-formed CBOR values to humans.

1607	   Generic encoders provide an application interface that allows the
1608	   application to specify any well-formed value, including simple values
1609	   and tags unknown to the encoder.

1611	5.3.  Validity of Items

1613	   A well-formed but invalid CBOR data item presents a problem with
1614	   interpreting the data encoded in it in the CBOR data model.  A CBOR-
1615	   based protocol could be specified in several layers, in which the
1616	   lower layers don't process the semantics of some of the CBOR data
1617	   they forward.  These layers can't notice any validity errors in data
1618	   they don't process and MUST forward that data as-is.  The first layer
1619	   that does process the semantics of an invalid CBOR item MUST take one
1620	   of two choices:

1622	   1.  Replace the problematic item with an error marker and continue
1623	       with the next item, or

1625	   2.  Issue an error and stop processing altogether.

1627	   A CBOR-based protocol MUST specify which of these options its
1628	   decoders take, for each kind of invalid item they might encounter.

1630	   Such problems might occur at the basic validity level of CBOR or in
1631	   the context of tags (tag validity).

1633	5.3.1.  Basic validity

1635	   Two kinds of validity errors can occur in the basic generic data
1636	   model:

1638	   Duplicate keys in a map:  Generic decoders (Section 5.2) make data
1639	      available to applications using the native CBOR data model.  That
1640	      data model includes maps (key-value mappings with unique keys),
1641	      not multimaps (key-value mappings where multiple entries can have
1642	      the same key).  Thus, a generic decoder that gets a CBOR map item
1643	      that has duplicate keys will decode to a map with only one
1644	      instance of that key, or it might stop processing altogether.  On
1645	      the other hand, a "streaming decoder" may not even be able to
1646	      notice (Section 5.6).

1648	   Invalid UTF-8 string:  A decoder might or might not want to verify
1649	      that the sequence of bytes in a UTF-8 string (major type 3) is
1650	      actually valid UTF-8 and react appropriately.
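
   Both checks can be sketched in a few lines.  The function names are
   hypothetical.  The duplicate check compares encoded keys bytewise,
   which assumes that equal keys are always encoded identically (for
   example, because preferred serialization is in use).

```python
def is_valid_utf8(content):
    """True if the content bytes of a text string are valid UTF-8."""
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def duplicate_keys(encoded_keys):
    """Return the encoded keys that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for key in encoded_keys:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

   For example, the content bytes 0xc0ae of the text string 0x62c0ae
   mentioned in Section 5.2 are rejected by is_valid_utf8.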

1652	5.3.2.  Tag validity

1654	   Two additional kinds of validity errors are introduced by adding tags
1655	   to the basic generic data model:

1657	   Inadmissible type for tag content:  Tag numbers (Section 3.4) specify
1658	      what type of data item is supposed to be used as their tag
1659	      content; for example, the tag numbers for positive or negative
1660	      bignums are supposed to be put on byte strings.  A decoder that
1661	      decodes the tagged data item into a native representation (a
1662	      native big integer in this example) is expected to check the type
1663	      of the data item being tagged.  Even decoders that don't have such
1664	      native representations available in their environment may perform
1665	      the check on those tags known to them and react appropriately.

1667	   Inadmissible value for tag content:  The type of data item may be
1668	      admissible for a tag's content, but the specific value may not be;
1669	      e.g., a value of "yesterday" is not acceptable for the content of
1670	      tag 0, even though it properly is a text string.  A decoder that
1671	      normally ingests such tags into equivalent platform types might
1672	      present this tag to the application in a similar way to how it
1673	      would present a tag with an unknown tag number (Section 5.4).
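
   The two kinds of tag validity errors can be told apart as in the
   following sketch.  It assumes a hypothetical decoder that hands over
   a tag number together with its already decoded content (bytes for
   byte strings, str for text strings).  The regular expression is only
   an approximation of the "date-time" production of [RFC3339]; it does
   not check field ranges.

```python
import re

# Approximation of the RFC 3339 "date-time" production (no range checks).
_DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)

def check_tag(tag_number, content):
    """Return None if acceptable, else a short description of the problem."""
    if tag_number in (2, 3):
        if not isinstance(content, bytes):
            return "inadmissible type for tag content"
    elif tag_number == 0:
        if not isinstance(content, str):
            return "inadmissible type for tag content"
        if not _DATE_TIME.fullmatch(content):
            return "inadmissible value for tag content"
    return None   # tag numbers unknown to this sketch are not checked
```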

1675	5.4.  Validity and Evolution

1677	   A decoder with validity checking will expend the effort to reliably
1678	   detect data items with validity errors.  For example, such a decoder
1679	   needs to have an API that reports an error (and does not return data)
1680	   for a CBOR data item that contains any of the validity errors listed
1681	   in the previous subsection.

1683	   The set of tags defined in the tag registry (Section 9.2), as well as
1684	   the set of simple values defined in the simple values registry
1685	   (Section 9.1), can grow at any time beyond the set understood by a
1686	   generic decoder.  A validity-checking decoder can do one of two
1687	   things when it encounters such a case that it does not recognize:

1689	   *  It can report an error (and not return data).  Note that this
1690	      error is not a validity error per se.  This kind of error is more
1691	      likely to be raised by a decoder that would be performing validity
1692	      checking if this were a known case.

1694	   *  It can emit the unknown item (type, value, and, for tags, the
1695	      decoded tagged data item) to the application calling the decoder,
1696	      with an indication that the decoder did not recognize that tag
1697	      number or simple value.

1699	   The latter approach, which is also appropriate for decoders that do
1700	   not support validity checking, provides forward compatibility with
1701	   newly registered tags and simple values without the requirement to
1702	   update the encoder at the same time as the calling application.  (For
1703	   this, the API for the decoder needs to have a way to mark unknown
1704	   items so that the calling application can handle them in a manner
1705	   appropriate for the program.)

1706	   Since some of the processing needed for validity checking may have an
1707	   appreciable cost (in particular with duplicate detection for maps),
1708	   support of validity checking is not a requirement placed on all CBOR
1709	   decoders.

1711	   Some encoders will rely on their applications to provide input data
1712	   in such a way that valid CBOR results from the encoder.  A generic
1713	   encoder may also want to provide a validity-checking mode where it
1714	   reliably limits its output to valid CBOR, independent of whether or
1715	   not its application is indeed providing API-conformant data.

1717	5.5.  Numbers

1719	   CBOR-based protocols should take into account that different language
1720	   environments pose different restrictions on the range and precision
1721	   of numbers that are representable.  For example, the basic JavaScript
1722	   number system treats all numbers as floating-point values, which may
1723	   result in silent loss of precision in decoding integers with more
1724	   than 53 significant bits.  A protocol that uses numbers should define
1725	   its expectations on the handling of non-trivial numbers in decoders
1726	   and receiving applications.

1728	   A CBOR-based protocol that includes floating-point numbers can
1729	   restrict which of the three formats (half-precision, single-
1730	   precision, and double-precision) are to be supported.  For an
1731	   integer-only application, a protocol may want to completely exclude
1732	   the use of floating-point values.

1734	   A CBOR-based protocol designed for compactness may want to exclude
1735	   specific integer encodings that are longer than necessary for the
1736	   application, such as to save the need to implement 64-bit integers.
1737	   There is an expectation that encoders will use the most compact
1738	   integer representation that can represent a given value.  However, a
1739	   compact application that does not require deterministic encoding
1740	   should accept values that use a longer-than-needed encoding (such as
1741	   encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as
1742	   the application can decode an integer of the given size.  Similar
1743	   considerations apply to floating-point values; decoding both
1744	   preferred serializations and longer-than-needed ones is recommended.
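
   As a minimal sketch of the leniency recommended above (an
   illustration, not part of the specification), the following Python
   function decodes an unsigned integer (major type 0) and accepts both
   the shortest and longer-than-needed encodings.  The name decode_uint
   and the convention of returning the number of bytes consumed are
   assumptions made for this example.

```python
def decode_uint(data: bytes) -> tuple[int, int]:
    """Decode a major type 0 item; return (value, bytes consumed)."""
    if not data:
        raise ValueError("empty input")
    major, ai = data[0] >> 5, data[0] & 0x1F
    if major != 0:
        raise ValueError("not an unsigned integer")
    if ai < 24:
        return ai, 1                      # value carried in the initial byte
    if ai > 27:
        raise ValueError("additional information 28..31 carries no integer")
    size = 1 << (ai - 24)                 # 1, 2, 4, or 8 argument bytes
    if len(data) < 1 + size:
        raise ValueError("truncated input")
    # No minimality check here: 0x19 0x00 0x00 decodes to 0 just as 0x00 does.
    return int.from_bytes(data[1:1 + size], "big"), 1 + size
```

   A protocol that requires deterministic encoding would add a check
   that rejects any argument that would have fit in a shorter form.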

1746	   CBOR-based protocols for constrained applications that provide a
1747	   choice between representing a specific number as an integer and as a
1748	   decimal fraction or bigfloat (such as when the exponent is small and
1749	   non-negative) might express a quality-of-implementation expectation
1750	   that the integer representation is used directly.

1752	5.6.  Specifying Keys for Maps

1754	   The encoding and decoding applications need to agree on what types of
1755	   keys are going to be used in maps.  In applications that need to
1756	   interwork with JSON-based applications, conversion is simplified by
1757	   limiting keys to text strings only; otherwise, there has to be a
1758	   specified mapping from the other CBOR types to text strings, and this
1759	   often leads to implementation errors.  In applications where keys are
1760	   numeric in nature and numeric ordering of keys is important to the
1761	   application, directly using the numbers for the keys is useful.

1763	   If multiple types of keys are to be used, consideration should be
1764	   given to how these types would be represented in the specific
1765	   programming environments that are to be used.  For example, in
1766	   JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
1767	   from a key of floating-point 1.0.  This means that, if integer keys
1768	   are used, the protocol needs to avoid use of floating-point keys the
1769	   values of which happen to be integer numbers in the same map.

1771	   Decoders that deliver data items nested within a CBOR data item
1772	   immediately on decoding them ("streaming decoders") often do not keep
1773	   the state that is necessary to ascertain uniqueness of a key in a
1774	   map.  Similarly, an encoder that can start encoding data items before
1775	   the enclosing data item is completely available ("streaming encoder")
1776	   may want to reduce its overhead significantly by relying on its data
1777	   source to maintain uniqueness.

1779	   A CBOR-based protocol MUST define what to do when a receiving
1780	   application does see multiple identical keys in a map.  The resulting
1781	   rule in the protocol MUST respect the CBOR data model: it cannot
1782	   prescribe a specific handling of the entries with the identical keys,
1783	   except that it might have a rule that having identical keys in a map
1784	   indicates a malformed map and that the decoder has to stop with an
1785	   error.  When processing maps that exhibit entries with duplicate
1786	   keys, a generic decoder might do one of the following:

1788	   *  Not accept maps with duplicate keys (that is, enforce validity for
1789	      maps, see also Section 5.4).  These generic decoders are
1790	      universally useful.  An application may still need to perform
1791	      its own duplicate checking based on application rules (for
1792	      instance, if the application equates integers and floating-point
1793	      values in map key positions for specific maps).

1795	   *  Pass all map entries to the application, including ones with
1796	      duplicate keys.  This requires the application to handle (check
1797	      against) duplicate keys, even if the application rules are
1798	      identical to the generic data model rules.

1800	   *  Lose some entries with duplicate keys, e.g., by only delivering the
1801	      final (or first) entry out of the entries with the same key.  With
1802	      such a generic decoder, applications may get different results for
1803	      a specific key on different runs and with different generic
1804	      decoders, as which value is returned is based on generic decoder
1805	      implementation and the actual order of keys in the map.  In
1806	      particular, applications cannot validate key uniqueness on their
1807	      own as they do not necessarily see all entries; they may not be
1808	      able to use such a generic decoder if they do need to validate key
1809	      uniqueness.  These generic decoders can only be used in situations
1810	      where the data source and transfer can be relied upon to always
1811	      provide valid maps; this is not possible if the data source and
1812	      transfer can be attacked.

1814	   Generic decoders need to document which of these three approaches
1815	   they implement.
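
   The three approaches can be sketched as a single policy switch.
   This is a hypothetical illustration only: the function name, the
   policy strings, and the representation of a decoded map as a list of
   (key, value) pairs are assumptions, and keys are assumed to be
   hashable Python values.  Keys are identified by (type, value) so
   that integer 1 and floating-point 1.0 stay distinct, as in the
   generic data model; NaN keys, maps, and tags are not handled.

```python
def apply_duplicate_policy(pairs, policy):
    """Return the map entries a generic decoder would deliver."""
    if policy not in ("reject", "pass", "last"):
        raise ValueError("unknown policy")
    if policy == "pass":
        # Deliver everything; the application must check for duplicates.
        return list(pairs)
    seen = {}
    for key, value in pairs:
        ident = (type(key), key)          # keeps 1 and 1.0 apart
        if ident in seen and policy == "reject":
            raise ValueError(f"duplicate map key: {key!r}")
        seen[ident] = (key, value)        # "last": later entry replaces earlier
    return list(seen.values())
```

   With the "last" policy the application can no longer detect that a
   duplicate was present, which is why that approach is only suitable
   when the data source and transfer can be trusted.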

1817	   The CBOR data model for maps does not allow ascribing semantics to
1818	   the order of the key/value pairs in the map representation.  Thus, a
1819	   CBOR-based protocol MUST NOT specify that changing the key/value pair
1820	   order in a map would change the semantics, except to specify that
1821	   some orders are disallowed, for example where they would not meet the
1822	   requirements of a deterministic encoding (Section 4.2).  (Any
1823	   secondary effects of map ordering such as on timing, cache usage, and
1824	   other potential side channels are not considered part of the
1825	   semantics but may be enough reason on their own for a protocol to
1826	   require a deterministic encoding format.)

1828	   Applications for constrained devices that have maps where a small
1829	   number of frequently used keys can be identified should consider
1830	   using small integers as keys; for instance, a set of 24 or fewer
1831	   frequent keys can be encoded in a single byte as unsigned integers,
1832	   up to 48 if negative integers are also used.  Less frequently
1833	   occurring keys can then use integers with longer encodings.
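
   The byte counts mentioned above can be checked with a short sketch
   of the shortest-form integer encoding.  The function name encode_int
   is an assumption made for illustration; only major types 0 and 1 are
   covered.

```python
def encode_int(n: int) -> bytes:
    """Encode n with the shortest head for major type 0 or 1."""
    major = 0 if n >= 0 else 1
    arg = n if n >= 0 else -1 - n         # negative integers encode -1 - n
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("outside the 64-bit range of major types 0 and 1")


# Keys that fit in a single byte: 24 unsigned plus 24 negative values.
one_byte_keys = [n for n in range(-100, 100) if len(encode_int(n)) == 1]
```

   Here one_byte_keys covers -24 through 23, that is, 48 values.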

1835	5.6.1.  Equivalence of Keys

1837	   The specific data model applying to a CBOR data item is used to
1838	   determine whether keys occurring in maps are duplicates or distinct.

1840	   At the generic data model level, numerically equivalent integer and
1841	   floating-point values are distinct from each other, as they are from
1842	   the various big numbers (Tags 2 to 5).  Similarly, text strings are
1843	   distinct from byte strings, even if composed of the same bytes.  A
1844	   tagged value is distinct from an untagged value or from a value
1845	   tagged with a different tag number.

1847	   Within each of these groups, numeric values are distinct unless they
1848	   are numerically equal (specifically, -0.0 is equal to 0.0); for the
1849	   purpose of map key equivalence, NaN (not a number) values are
1850	   equivalent if they have the same significand after zero-extending
1851	   both significands at the right to 64 bits.

1853	   (Byte and text) strings are compared byte by byte, arrays element by
1854	   element, and are equal if they have the same number of bytes/elements
1855	   and the same values at the same positions.  Two maps are equal if
1856	   they have the same set of pairs regardless of their order; pairs are
1857	   equal if both the key and value are equal.

1859	   Tagged values are equal if both the tag number and the tag content
1860	   are equal.  (Note that a generic decoder that provides processing for
1861	   a specific tag may not be able to distinguish some semantically
1862	   equivalent values, e.g., if leading zeroes occur in the content of tag
1863	   2/3 (Section 3.4.3).)  Simple values are equal if they simply have
1864	   the same value.  Nothing else is equal in the generic data model; a
1865	   simple value 2 is not equivalent to an integer 2, and an array is
1866	   never equivalent to a map.

1868	   As discussed in Section 2.2, specific data models can make values
1869	   equivalent for the purpose of comparing map keys that are distinct in
1870	   the generic data model.  Note that this implies that a generic
1871	   decoder may deliver a decoded map to an application that needs to be
1872	   checked for duplicate map keys by that application (alternatively,
1873	   the decoder may provide a programming interface to perform this
1874	   service for the application).  Specific data models cannot
1875	   distinguish values for map keys that are equal for this purpose at
1876	   the generic data model level.
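
   A partial sketch of key equivalence at the generic data model level
   follows.  It assumes a hypothetical decoder that delivers integers
   as Python int, floating-point values as Python float (binary64),
   text strings as str, byte strings as bytes, arrays as list, and
   false/true/null as False/True/None.  Maps and tagged values are
   deliberately left out, and the function name keys_equal is an
   assumption.

```python
import math
import struct


def nan_significand(x: float) -> int:
    """Return the 52 significand bits of a binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits & ((1 << 52) - 1)


def keys_equal(a, b) -> bool:
    # Different kinds of item are never equal: 1 vs 1.0, "a" vs b"a", True vs 1.
    if type(a) is not type(b):
        return False
    if isinstance(a, float):
        if math.isnan(a) or math.isnan(b):
            return (math.isnan(a) and math.isnan(b)
                    and nan_significand(a) == nan_significand(b))
        return a == b                     # numeric equality, so -0.0 equals 0.0
    if isinstance(a, list):
        return len(a) == len(b) and all(keys_equal(x, y) for x, y in zip(a, b))
    return a == b                         # int, str, bytes, bool, None
```

   Python's own == operator would treat 1 and 1.0 (and True and 1) as
   equal, which is why the type check comes first.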

1878	5.7.  Undefined Values

1880	   In some CBOR-based protocols, the simple value (Section 3.3) of
1881	   Undefined might be used by an encoder as a substitute for a data item
1882	   with an encoding problem, in order to allow the rest of the enclosing
1883	   data items to be encoded without harm.

1885	6.  Converting Data between CBOR and JSON

1887	   This section gives non-normative advice about converting between CBOR
1888	   and JSON.  Implementations of converters are free to use whichever
1889	   advice here they want.

1891	   It is worth noting that a JSON text is a sequence of characters, not
1892	   an encoded sequence of bytes, while a CBOR data item consists of
1893	   bytes, not characters.

1895	6.1.  Converting from CBOR to JSON

1897	   Most of the types in CBOR have direct analogs in JSON.  However, some
1898	   do not, and someone implementing a CBOR-to-JSON converter has to
1899	   consider what to do in those cases.  The following non-normative
1900	   advice deals with these by converting them to a single substitute
1901	   value, such as a JSON null.

1903	   *  An integer (major type 0 or 1) becomes a JSON number.

1905	   *  A byte string (major type 2) that is not embedded in a tag that
1906	      specifies a proposed encoding is encoded in base64url without
1907	      padding and becomes a JSON string.

1909	   *  A UTF-8 string (major type 3) becomes a JSON string.  Note that
1910	      JSON requires escaping certain characters ([RFC8259], Section 7):
1911	      quotation mark (U+0022), reverse solidus (U+005C), and the "C0
1912	      control characters" (U+0000 through U+001F).  All other characters
1913	      are copied unchanged into the JSON UTF-8 string.

1915	   *  An array (major type 4) becomes a JSON array.

1917	   *  A map (major type 5) becomes a JSON object.  This is possible
1918	      directly only if all keys are UTF-8 strings.  A converter might
1919	      also convert other keys into UTF-8 strings (such as by converting
1920	      integers into strings containing their decimal representation);
1921	      however, doing so introduces a danger of key collision.  Note also
1922	      that, if tags on UTF-8 strings are ignored as proposed below, this
1923	      will cause a key collision if the tags are different but the
1924	      strings are the same.

1926	   *  False (major type 7, additional information 20) becomes a JSON
1927	      false.

1929	   *  True (major type 7, additional information 21) becomes a JSON
1930	      true.

1932	   *  Null (major type 7, additional information 22) becomes a JSON
1933	      null.

1935	   *  A floating-point value (major type 7, additional information 25
1936	      through 27) becomes a JSON number if it is finite (that is, it can
1937	      be represented in a JSON number); if the value is non-finite (NaN,
1938	      or positive or negative Infinity), it is represented by the
1939	      substitute value.

1941	   *  Any other simple value (major type 7, any additional information
1942	      value not yet discussed) is represented by the substitute value.

1944	   *  A bignum (major type 6, tag number 2 or 3) is represented by
1945	      encoding its byte string in base64url without padding and becomes
1946	      a JSON string.  For tag number 3 (negative bignum), a "~" (ASCII
1947	      tilde) is inserted before the base-encoded value.  (The conversion
1948	      to a binary blob instead of a number is to prevent a likely
1949	      numeric overflow for the JSON decoder.)

1951	   *  A byte string with an encoding hint (major type 6, tag number 21
1952	      through 23) is encoded as described and becomes a JSON string.

1954	   *  For all other tags (major type 6, any other tag number), the tag
1955	      content is represented as a JSON value; the tag number is ignored.

1957	   *  Indefinite-length items are made definite before conversion.
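
   The advice above can be sketched for already-decoded data.  The
   sketch assumes a hypothetical decoder that has produced Python
   values (int, float, str, bytes, list, dict, bool, None) with
   indefinite-length items already made definite; bignums and other
   tags are not modeled, and anything unrecognized is mapped to the
   substitute value.  The names to_json_value and SUBSTITUTE are
   assumptions.

```python
import base64
import math

SUBSTITUTE = None  # the single substitute value, here JSON null


def to_json_value(item):
    """Map a decoded CBOR value to a value that JSON can represent."""
    if item is None or isinstance(item, (bool, str)):
        return item
    if isinstance(item, int):
        return item
    if isinstance(item, float):
        return item if math.isfinite(item) else SUBSTITUTE
    if isinstance(item, bytes):
        # base64url without padding
        return base64.urlsafe_b64encode(item).rstrip(b"=").decode("ascii")
    if isinstance(item, list):
        return [to_json_value(x) for x in item]
    if isinstance(item, dict):
        if not all(isinstance(k, str) for k in item):
            raise TypeError("only text-string keys convert directly")
        return {k: to_json_value(v) for k, v in item.items()}
    return SUBSTITUTE
```

   Raising an error for non-string keys is one possible choice; a
   converter that stringifies such keys instead has to deal with the
   key collisions mentioned above.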

1959	6.2.  Converting from JSON to CBOR

1961	   All JSON values, once decoded, directly map into one or more CBOR
1962	   values.  As with any kind of CBOR generation, decisions have to be
1963	   made with respect to number representation.  In a suggested
1964	   conversion:

1966	   *  JSON numbers without fractional parts (integer numbers) are
1967	      represented as integers (major types 0 and 1, possibly major type
1968	      6 tag number 2 and 3), choosing the shortest form; integers longer
1969	      than an implementation-defined threshold may instead be
1970	      represented as floating-point values.  The default range that is
1971	      represented as integer is -2**53+1..2**53-1 (fully exploiting the
1972	      range for exact integers in the binary64 representation often used
1973	      for decoding JSON [RFC7493]).  A CBOR-based protocol, or a generic
1974	      converter implementation, may choose -2**32..2**32-1 or
1975	      -2**64..2**64-1 (fully using the integer ranges available in CBOR
1976	      with uint32_t or uint64_t, respectively) or even -2**31..2**31-1
1977	      or -2**63..2**63-1 (using popular ranges for two's complement
1978	      signed integers).  (If the JSON was generated from a JavaScript
1979	      implementation, its precision is already limited to 53 bits
1980	      maximum.)

1982	   *  Numbers with fractional parts are represented as floating-point
1983	      values, performing the decimal-to-binary conversion based on the
1984	      precision provided by IEEE 754 binary64.  Then, when encoding in
1985	      CBOR, the preferred serialization uses the shortest floating-point
1986	      representation exactly representing this conversion result; for
1987	      instance, 1.5 is represented in a 16-bit floating-point value (not
1988	      all implementations will be capable of efficiently finding the
1989	      minimum form, though).  Instead of using the default binary64
1990	      precision, there may be an implementation-defined limit to the
1991	      precision of the conversion that will affect the precision of the
1992	      represented values.  Decimal representation should only be used on
1993	      the CBOR side if that is specified in a protocol.
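
   Finding the shortest floating-point representation that exactly
   represents a binary64 value can be sketched by trial conversion.
   The function name encode_float_preferred is an assumption; the
   sketch relies on the "e" (binary16) and "f" (binary32) formats of
   Python's struct module.  NaN values fall through to binary64 in this
   sketch because NaN never compares equal to itself; this does not
   arise for input that comes from JSON.

```python
import struct


def encode_float_preferred(x: float) -> bytes:
    """Encode x in the shortest float format that round-trips exactly."""
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, x)
        except (OverflowError, struct.error):
            continue                      # too large for this format
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", x)
```

   For example, 1.5 fits in three bytes, 100000.0 needs the
   single-precision format, and 1.1 needs all nine bytes of the
   double-precision encoding.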

1995	   CBOR has been designed to generally provide a more compact encoding
1996	   than JSON.  One implementation strategy that might come to mind is to
1997	   perform a JSON-to-CBOR encoding in place in a single buffer.  This
1998	   strategy would need to carefully consider a number of pathological
1999	   cases, such as that some strings represented with no or very few
2000	   escapes and longer (or much longer) than 255 bytes may expand when
2001	   encoded as UTF-8 strings in CBOR.  Similarly, a few of the binary
2002	   floating-point representations might cause expansion from some short
2003	   decimal representations (1.1, 1e9) in JSON.  This may be hard to get
2004	   right, and any ensuing vulnerabilities may be exploited by an
2005	   attacker.

2007	7.  Future Evolution of CBOR

2009	   Successful protocols evolve over time.  New ideas appear,
2010	   implementation platforms improve, related protocols are developed and
2011	   evolve, and new requirements from applications and protocols are
2012	   added.  Facilitating protocol evolution is therefore an important
2013	   design consideration for any protocol development.

2015	   For protocols that will use CBOR, CBOR provides some useful
2016	   mechanisms to facilitate their evolution.  Best practices for this
2017	   are well known, particularly from the development of JSON-based
2018	   protocols.  Therefore, such best practices are outside the
2019	   scope of this specification.

2021	   However, facilitating the evolution of CBOR itself is very well
2022	   within its scope.  CBOR is designed to both provide a stable basis
2023	   for development of CBOR-based protocols and to be able to evolve.
2024	   Since a successful protocol may live for decades, CBOR needs to be
2025	   designed for decades of use and evolution.  This section provides
2026	   some guidance for the evolution of CBOR.  It is necessarily more
2027	   subjective than other parts of this document.  It is also necessarily
2028	   incomplete, lest it turn into a textbook on protocol development.

2030	7.1.  Extension Points

2032	   In a protocol design, opportunities for evolution are often included
2033	   in the form of extension points.  For example, there may be a
2034	   codepoint space that is not fully allocated from the outset, and the
2035	   protocol is designed to tolerate and embrace implementations that
2036	   start using more codepoints than initially allocated.

2038	   Sizing the codepoint space may be difficult because the range
2039	   required may be hard to predict.  An attempt should be made to make
2040	   the codepoint space large enough so that it can slowly be filled over
2041	   the intended lifetime of the protocol.

2043	   CBOR has three major extension points:

2045	   *  the "simple" space (values in major type 7).  Of the 24 efficient
2046	      (and 224 slightly less efficient) values, only a small number have
2047	      been allocated.  Implementations receiving an unknown simple data
2048	      item may be able to process it as such, given that the structure
2049	      of the value is indeed simple.  The IANA registry in Section 9.1
2050	      is the appropriate way to address the extensibility of this
2051	      codepoint space.

2053	   *  the "tag" space (values in major type 6).  Again, only a small
2054	      part of the codepoint space has been allocated, and the space is
2055	      abundant (although the early numbers are more efficient than the
2056	      later ones).  Implementations receiving an unknown tag number can
2057	      choose to simply ignore it (process just the enclosed tag content)
2058	      or to process it as an unknown tag number wrapping the tag
2059	      content.  The IANA registry in Section 9.2 is the appropriate way
2060	      to address the extensibility of this codepoint space.

2062	   *  the "additional information" space.  An implementation receiving
2063	      an unknown additional information value has no way to continue
2064	      decoding, so allocating codepoints to this space is a major step.
2065	      There are also very few codepoints left.  See also Section 7.2.

2067	7.2.  Curating the Additional Information Space

2069	   The human mind is sometimes drawn to filling in little perceived gaps
2070	   to make something neat.  We expect the remaining gaps in the
2071	   codepoint space for the additional information values to be an
2072	   attractor for new ideas, just because they are there.

2074	   The present specification does not manage the additional information
2075	   codepoint space by an IANA registry.  Instead, allocations out of
2076	   this space can only be done by updating this specification.

2078	   For an additional information value of n >= 24, the size of the
2079	   additional data typically is 2**(n-24) bytes.  Therefore, additional
2080	   information values 28 and 29 should be viewed as candidates for
2081	   128-bit and 256-bit quantities, in case a need arises to add them to
2082	   the protocol.  Additional information value 30 is then the only
2083	   additional information value available for general allocation, and
2084	   there should be a very good reason for allocating it before assigning
2085	   it through an update of the present specification.

2087	8.  Diagnostic Notation

2089	   CBOR is a binary interchange format.  To facilitate documentation and
2090	   debugging, and in particular to facilitate communication between
2091	   entities cooperating in debugging, this section defines a simple
2092	   human-readable diagnostic notation.  All actual interchange always
2093	   happens in the binary format.

2095	   Note that this truly is a diagnostic format; it is not meant to be
2096	   parsed.  Therefore, no formal definition (as in ABNF) is given in
2097	   this document.  (Implementers looking for a text-based format for
2098	   representing CBOR data items in configuration files may also want to
2099	   consider YAML [YAML].)

2101	   The diagnostic notation is loosely based on JSON as it is defined in
2102	   RFC 8259, extending it where needed.

2104	   The notation borrows the JSON syntax for numbers (integer and
2105	   floating-point), True (>true<), False (>false<), Null (>null<), UTF-8
2106	   strings, arrays, and maps (maps are called objects in JSON; the
2107	   diagnostic notation extends JSON here by allowing any data item in
2108	   the key position).  Undefined is written >undefined< as in
2109	   JavaScript.  The non-finite floating-point numbers Infinity,
2110	   -Infinity, and NaN are written exactly as in this sentence (this is
2111	   also a way they can be written in JavaScript, although JSON does not
2112	   allow them).  A tag is written as an integer number for the tag
2113	   number, followed by the tag content in parentheses; for instance, an
2114	   RFC 3339 (ISO 8601) date could be notated as:

2116	      0("2013-03-21T20:04:00Z")

2118	   or the equivalent relative time as

2120	      1(1363896240)
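
   The correspondence between the two notations can be checked with a
   few lines of Python.  This is only an illustration; the variable
   names are assumptions.

```python
from datetime import datetime

text_form = "2013-03-21T20:04:00Z"
# datetime.fromisoformat() accepts a trailing "Z" only in newer Python
# versions, so the UTC designator is rewritten as an explicit offset.
moment = datetime.fromisoformat(text_form.replace("Z", "+00:00"))
epoch_seconds = int(moment.timestamp())

print(f'0("{text_form}")')   # tag 0: date/time string
print(f"1({epoch_seconds})")  # tag 1: seconds since 1970-01-01T00:00Z
```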

2122	   Byte strings are notated in one of the base encodings, without
2123	   padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
2124	   for base32, >h32< for base32hex, >b64< for base64 or base64url (the
2125	   actual encodings do not overlap, so the string remains unambiguous).
2126	   For example, the byte string 0x12345678 could be written h'12345678',
2127	   b32'CI2FM6A', or b64'EjRWeA'.
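
   The example notations above can be reproduced with Python's base64
   module.  The helper name diag_bytes is an assumption, and base32hex
   is omitted from this sketch to keep it short.

```python
import base64


def diag_bytes(data: bytes) -> dict[str, str]:
    """Render a byte string in three diagnostic notations, unpadded."""
    def strip(encoded: bytes) -> str:
        return encoded.rstrip(b"=").decode("ascii")

    return {
        "h": f"h'{data.hex()}'",
        "b32": f"b32'{strip(base64.b32encode(data))}'",
        "b64": f"b64'{strip(base64.urlsafe_b64encode(data))}'",
    }


notations = diag_bytes(bytes.fromhex("12345678"))
```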

2129	   Unassigned simple values are given as "simple()" with the appropriate
2130	   integer in the parentheses.  For example, "simple(42)" indicates
2131	   major type 7, value 42.

2133	   A number of useful extensions to the diagnostic notation defined here
2134	   are provided in Appendix G of [RFC8610], "Extended Diagnostic
2135	   Notation" (EDN).

2137	8.1.  Encoding Indicators

2139	   Sometimes it is useful to indicate in the diagnostic notation which
2140	   of several alternative representations were actually used; for
2141	   example, a data item written >1.5< by a diagnostic decoder might have
2142	   been encoded as a half-, single-, or double-precision float.

2144	   By convention, an underscore together with all immediately
2145	   following characters that are alphanumeric or underscore forms an
2146	   encoding indicator, which can be ignored by anyone not interested in
2147	   this information.  Examples are "_" and "_3".  Encoding indicators
2148	   are always optional.

2150	   A single underscore can be written after the opening brace of a map
2151	   or the opening bracket of an array to indicate that the data item was
2152	   represented in indefinite-length format.  For example, [_ 1, 2]
2153	   contains an indicator that an indefinite-length representation was
2154	   used to represent the data item [1, 2].

2156	   An underscore followed by a decimal digit n indicates that the
2157	   preceding item (or, for arrays and maps, the item starting with the
2158	   preceding bracket or brace) was encoded with an additional
2159	   information value of 24+n.  For example, 1.5_1 is a half-precision
2160	   floating-point number, while 1.5_3 is encoded as double precision.
2161	   This encoding indicator is not shown in Appendix A.  (Note that the
2162	   encoding indicator "_" is thus an abbreviation of the full form "_7",
2163	   which is not used.)

2165	   Byte and text strings of indefinite length can be notated in the form
2166	   (_ h'0123', h'4567') and (_ "foo", "bar").

2168	9.  IANA Considerations

2170	   IANA has created two registries for new CBOR values.  The registries
2171	   are separate, that is, not under an umbrella registry, and follow the
2172	   rules in [RFC8126].  IANA has also assigned a new MIME media type and
2173	   an associated Constrained Application Protocol (CoAP) Content-Format
2174	   entry.

2176	   [To be removed by RFC editor:] IANA is requested to update these
2177	   registries to point to the present document instead of RFC 7049.

2179	9.1.  Simple Values Registry

2181	   IANA has created the "Concise Binary Object Representation (CBOR)
2182	   Simple Values" registry at [IANA.cbor-simple-values].  The initial
2183	   values are shown in Table 4.

2185	   New entries in the range 0 to 19 are assigned by Standards Action.
2186	   It is suggested that these Standards Actions allocate values starting
2187	   with the number 16 in order to reserve the lower numbers for
2188	   contiguous blocks (if any).

2190	   New entries in the range 32 to 255 are assigned by Specification
2191	   Required.

2193	9.2.  Tags Registry

2195	   IANA has created the "Concise Binary Object Representation (CBOR)
2196	   Tags" registry at [IANA.cbor-tags].  The tags that were defined in
2197	   [RFC7049] are described in detail in Section 3.4, and other tags have
2198	   already been defined.

2200	   New entries in the range 0 to 23 are assigned by Standards Action.
2201	   New entries in the range 24 to 255 are assigned by Specification
2202	   Required.  New entries in the range 256 to 18446744073709551615 are
2203	   assigned by First Come First Served.  The template for registration
2204	   requests is:

2206	   *  Data item

2208	   *  Semantics (short form)

2210	   In addition, First Come First Served requests should include:

2212	   *  Point of contact

2214	   *  Description of semantics (URL) - This description is optional; the
2215	      URL can point to something like an Internet-Draft or a web page.

2217	9.3.  Media Type ("MIME Type")

2219	   The Internet media type [RFC6838] for a single encoded CBOR data item
2220	   is application/cbor, as defined in [IANA.media-types]:

2222	   Type name: application

2224	   Subtype name: cbor

2226	   Required parameters: n/a

2227	   Optional parameters: n/a

2229	   Encoding considerations:  binary

2231	   Security considerations:  See Section 10 of this document

2233	   Interoperability considerations: n/a

2235	   Published specification: This document

2237	   Applications that use this media type:  None yet, but it is expected
2238	      that this format will be deployed in protocols and applications.

2240	   Additional information:  *  Magic number(s): n/a

2242	      *  File extension(s): .cbor

2244	      *  Macintosh file type code(s): n/a

2246	   Person & email address to contact for further information:  IETF CBOR
2247	      Working Group cbor@ietf.org (mailto:cbor@ietf.org) or IETF
2248	      Applications and Real-Time Area art@ietf.org (mailto:art@ietf.org)

2250	   Intended usage: COMMON

2252	   Restrictions on usage: none

2254	   Author:  IETF CBOR Working Group cbor@ietf.org (mailto:cbor@ietf.org)

2256	   Change controller:  The IESG iesg@ietf.org (mailto:iesg@ietf.org)

2258	9.4.  CoAP Content-Format

2260	   The CoAP Content-Format for CBOR is defined in
2261	   [IANA.core-parameters]:

2263	   Media Type: application/cbor

2265	   Encoding: -

2267	   Id: 60

2269	   Reference: [RFCthis]

2271	9.5.  The +cbor Structured Syntax Suffix Registration

2273	   The Structured Syntax Suffix [RFC6838] for media types based on a
2274	   single encoded CBOR data item is +cbor, as defined in
2275	   [IANA.media-type-structured-suffix]:

2277	   Name: Concise Binary Object Representation (CBOR)

2279	   +suffix: +cbor

2281	   References: [RFCthis]

2283	   Encoding Considerations: CBOR is a binary format.

2285	   Interoperability Considerations: n/a

2287	   Fragment Identifier Considerations:  The syntax and semantics of
2288	      fragment identifiers specified for +cbor SHOULD be as specified
2289	      for "application/cbor".  (At publication of this document, there
2290	      is no fragment identification syntax defined for "application/
2291	      cbor".)

2293	      The syntax and semantics for fragment identifiers for a specific
2294	      "xxx/yyy+cbor" SHOULD be processed as follows:

2296	      *  For cases defined in +cbor, where the fragment identifier
2297	         resolves per the +cbor rules, then process as specified in
2298	         +cbor.

2300	      *  For cases defined in +cbor, where the fragment identifier does
2301	         not resolve per the +cbor rules, then process as specified in
2302	         "xxx/yyy+cbor".

2304	      *  For cases not defined in +cbor, then process as specified in
2305	         "xxx/yyy+cbor".

2307	   Security Considerations:  See Section 10 of this document

2309	   Contact:  IETF CBOR Working Group cbor@ietf.org
2310	      (mailto:cbor@ietf.org) or IETF Applications and Real-Time Area
2311	      art@ietf.org (mailto:art@ietf.org)

2313	   Author/Change Controller:  The IESG iesg@ietf.org
2314	      (mailto:iesg@ietf.org)

2315	      // Editors' note: RFC 6838 has a template field Author/Change
2316	      // controller, the descriptive text of which makes clear that this is
2317	      // the change controller, not the author.  Go figure.  There is no
2318	      // separate author entry as in the media types registry.  (RFC
2319	      // editor: Please remove this note before publication.)

2326	10.  Security Considerations

2328	   A network-facing application can exhibit vulnerabilities in its
2329	   processing logic for incoming data.  Complex parsers are well known
2330	   as a likely source of such vulnerabilities, such as the ability to
2331	   remotely crash a node, or even remotely execute arbitrary code on it.
2332	   CBOR attempts to narrow the opportunities for introducing such
2333	   vulnerabilities by reducing parser complexity, by giving the entire
2334	   range of encodable values a meaning where possible.

2336	   Because CBOR decoders are often used as a first step in processing
2337	   unvalidated input, they need to be fully prepared for all types of
2338	   hostile input that may be designed to corrupt, overrun, or achieve
2339	   control of the system decoding the CBOR data item.  A CBOR decoder
2340	   needs to assume that all input may be hostile even if it has been
2341	   checked by a firewall, has come over a secure channel such as TLS, is
2342	   encrypted or signed, or has come from some other source that is
2343	   presumed trusted.

2345	   Hostile input may be constructed to overrun buffers, overflow or
2346	   underflow integer arithmetic, or cause other decoding disruption.
2347	   CBOR data items might have lengths or sizes that are intentionally
2348	   extremely large or too short.  Resource exhaustion attacks might
2349	   attempt to lure a decoder into allocating very big data items
2350	   (strings, arrays, maps, or even arbitrary precision numbers) or
2351	   exhaust the stack depth by setting up deeply nested items.  Decoders
2352	   need to have appropriate resource management to mitigate these
2353	   attacks.  (Items for which very large sizes are given can also
2354	   attempt to exploit integer overflow vulnerabilities.)
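
   As a non-normative illustration of the resource management described
   above, the following Python sketch walks a single encoded data item
   without building any in-memory representation.  It compares every
   declared length or item count against the number of input bytes that
   remain before acting on it, and it limits nesting depth.  The names
   DecodeError, read_head, and skip_item, and the limit MAX_DEPTH = 32,
   are assumptions made for this example and are not defined by this
   specification.  The sketch also omits some well-formedness checks
   (for example, on two-byte simple values).

```python
MAX_DEPTH = 32  # hypothetical nesting limit; choose per application


class DecodeError(ValueError):
    """Raised when the input is rejected by this sketch."""


def read_head(data, pos):
    """Return (major type, argument, next position).

    The argument is None for an indefinite-length head or a "break".
    """
    if pos >= len(data):
        raise DecodeError("truncated input")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 argument bytes
        if size > len(data) - pos:
            raise DecodeError("truncated argument")
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    if info == 31 and major in (2, 3, 4, 5, 7):
        return major, None, pos
    raise DecodeError("reserved additional information value")


def skip_item(data, pos=0, depth=0):
    """Walk one data item and return the position just after it."""
    if depth > MAX_DEPTH:
        raise DecodeError("nesting too deep")
    major, arg, pos = read_head(data, pos)
    if major in (0, 1):
        return pos
    if major == 7:
        if arg is None:
            raise DecodeError("unexpected break")
        return pos
    if major in (2, 3):
        if arg is None:  # indefinite length: definite chunks, then break
            while True:
                if pos < len(data) and data[pos] == 0xFF:
                    return pos + 1
                chunk_major, length, pos = read_head(data, pos)
                if chunk_major != major or length is None:
                    raise DecodeError("bad chunk in indefinite-length string")
                if length > len(data) - pos:
                    raise DecodeError("chunk longer than remaining input")
                pos += length
        if arg > len(data) - pos:  # check before any allocation or slicing
            raise DecodeError("string longer than remaining input")
        return pos + arg
    if major == 6:  # tag: exactly one enclosed data item
        return skip_item(data, pos, depth + 1)
    per_entry = 1 if major == 4 else 2  # array items, or map key/value pairs
    if arg is None:
        while True:
            if pos < len(data) and data[pos] == 0xFF:
                return pos + 1
            for _ in range(per_entry):
                pos = skip_item(data, pos, depth + 1)
    count = arg * per_entry
    if count > len(data) - pos:  # every item needs at least one byte
        raise DecodeError("more items declared than bytes remain")
    for _ in range(count):
        pos = skip_item(data, pos, depth + 1)
    return pos
```

   In this sketch, a head such as 0x5bffffffffffffffff (a byte string
   claiming 2^64-1 bytes) is rejected by a single comparison, before
   any memory is reserved for the claimed content.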

2356	   A CBOR decoder, by definition, only accepts well-formed CBOR; this is
2357	   the first step to its robustness.  Input that is not well-formed CBOR
2358	   causes no further processing from the point where the lack of well-
2359	   formedness was detected.  If possible, any data decoded up to this
2360	   point should have no impact on the application using the CBOR
2361	   decoder.

2363	   In addition to ascertaining well-formedness, a CBOR decoder might
2364	   also perform validity checks on the CBOR data.  Alternatively, it can
2365	   leave those checks to the application using the decoder.  This choice
2366	   needs to be clearly documented in the decoder.  Beyond the validity
2367	   at the CBOR level, an application also needs to ascertain that the
2368	   input is in alignment with the application protocol that is
2369	   serialized in CBOR.

2371	   The input check itself may consume resources.  This is usually linear
2372	   in the size of the input, which means that an attacker has to spend
2373	   resources that are commensurate to the resources spent by the
2374	   defender on input validation.  Processing for arbitrary-precision
2375	   numbers may exceed linear effort.  Also, some hash-table
2376	   implementations that are used by decoders to build in-memory
2377	   representations of maps can be attacked to spend quadratic effort,
2378	   unless a secret key is employed (see Section 7 of [SIPHASH]).  Such
2379	   superlinear efforts can be employed by an attacker to exhaust
2380	   resources at or before the input validator; they therefore need to be
2381	   avoided in a CBOR decoder implementation.  Note that tag number
2382	   definitions and their implementations can add security considerations
2383	   of this kind; this should then be discussed in the security
2384	   considerations of the tag number definition.

2386	   CBOR encoders do not receive input directly from the network and are
2387	   thus not directly attackable in the same way as CBOR decoders.
2388	   However, CBOR encoders often have an API that takes input from
2389	   another level in the implementation and can be attacked through that
2390	   API.  The design and implementation of that API should assume the
2391	   behavior of its caller may be based on hostile input or on coding
2392	   mistakes.  It should check inputs for buffer overruns, overflow and
2393	   underflow of integer arithmetic, and other such errors that could
2394	   disrupt the encoder.

2396	   Protocols should be defined in such a way that potential multiple
2397	   interpretations are reliably reduced to a single interpretation.  For
2398	   example, an attacker could make use of invalid input such as
2399	   duplicate keys in maps, or exploit different precision in processing
2400	   numbers to make one application base its decisions on a different
2401	   interpretation than the one that will be used by a second
2402	   application.  To facilitate consistent interpretation, encoder and
2403	   decoder implementations should provide a validity checking mode of
2404	   operation (Section 5.4).  Note, however, that a generic decoder
2405	   cannot know about all requirements that an application poses on its
2406	   input data; it therefore does not relieve the application of
2407	   performing its own input checking.  Also, since the set of defined
2408	   tag numbers evolves, the application may employ a tag number that is
2409	   not yet supported for validity checking by the generic decoder it
2410	   uses.  Generic decoders therefore need to document which
2411	   tag numbers they support and what validity checking they can provide
2412	   for each of them as well as for basic CBOR validity (UTF-8 checking,
2413	   duplicate map key checking).
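
   The two basic validity checks named above can be sketched as
   follows.  This is a minimal, non-normative Python illustration; the
   function names checked_text and reject_duplicate_keys are
   hypothetical, keys are assumed to have already been decoded into
   hashable Python values, and an application's data model may define
   key equivalence differently from the comparison used here.

```python
def checked_text(raw):
    """Decode the content bytes of a text string, rejecting invalid UTF-8."""
    return raw.decode("utf-8", errors="strict")


def reject_duplicate_keys(pairs):
    """Raise ValueError if two (key, value) pairs carry the same key.

    The Python type is part of the comparison so that, for example, the
    integer 1 and the floating-point value 1.0 are not treated as the
    same key (an assumption of this sketch).
    """
    seen = set()
    for key, _value in pairs:
        ident = (type(key), key)
        if ident in seen:
            raise ValueError(f"duplicate map key: {key!r}")
        seen.add(ident)
```

   A decoder that offers such checks as an optional mode would still
   need to document that it does so, as discussed above.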

2415	11.  References

2417	11.1.  Normative References

2419	   [ECMA262]  Ecma International, "ECMAScript 2018 Language
2420	              Specification", ECMA Standard ECMA-262, 9th Edition, June
2421	              2018.

2425	   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
2426	              Std 754-2008.

2428	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
2429	              Extensions (MIME) Part One: Format of Internet Message
2430	              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996.

2433	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2434	              Requirement Levels", BCP 14, RFC 2119,
2435	              DOI 10.17487/RFC2119, March 1997.

2438	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
2439	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.

2442	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
2443	              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2444	              2003.

2446	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
2447	              Resource Identifier (URI): Generic Syntax", STD 66,
2448	              RFC 3986, DOI 10.17487/RFC3986, January 2005.

2451	   [RFC4287]  Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
2452	              Syndication Format", RFC 4287, DOI 10.17487/RFC4287,
2453	              December 2005.

2455	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
2456	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

2459	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
2460	              Writing an IANA Considerations Section in RFCs", BCP 26,
2461	              RFC 8126, DOI 10.17487/RFC8126, June 2017.

2464	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2465	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2466	              May 2017.

2468	   [TIME_T]   The Open Group Base Specifications, "Vol. 1: Base
2469	              Definitions, Issue 7", 2013 Edition, IEEE Std 1003.1,
2470	              Section 4.15 'Seconds Since the Epoch', 2013.

2474	11.2.  Informative References

2476	   [ASN.1]    International Telecommunication Union, "Information
2477	              Technology -- ASN.1 encoding rules: Specification of Basic
2478	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
2479	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
2480	              X.690, 1994.

2482	   [BSON]     Various, "BSON - Binary JSON", 2013.

2485	   [IANA.cbor-simple-values]
2486	              IANA, "Concise Binary Object Representation (CBOR) Simple
2487	              Values".

2490	   [IANA.cbor-tags]
2491	              IANA, "Concise Binary Object Representation (CBOR) Tags".

2494	   [IANA.core-parameters]
2495	              IANA, "Constrained RESTful Environments (CoRE)
2496	              Parameters".

2499	   [IANA.media-type-structured-suffix]
2500	              IANA, "Structured Syntax Suffix Registry".

2504	   [IANA.media-types]
2505	              IANA, "Media Types".

2508	   [MessagePack]
2509	              Furuhashi, S., "MessagePack", 2013.

2511	   [PCRE]     Ho, A., "PCRE - Perl Compatible Regular Expressions",
2512	              2018.

2514	   [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
2515	              Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976.

2518	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
2519	              Specifications and Registration Procedures", BCP 13,
2520	              RFC 6838, DOI 10.17487/RFC6838, January 2013.

2523	   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
2524	              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
2525	              October 2013.

2527	   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
2528	              Constrained-Node Networks", RFC 7228,
2529	              DOI 10.17487/RFC7228, May 2014.

2532	   [RFC7493]  Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
2533	              DOI 10.17487/RFC7493, March 2015.

2536	   [RFC7991]  Hoffman, P., "The "xml2rfc" Version 3 Vocabulary",
2537	              RFC 7991, DOI 10.17487/RFC7991, December 2016.

2540	   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
2541	              Interchange Format", STD 90, RFC 8259,
2542	              DOI 10.17487/RFC8259, December 2017.

2545	   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
2546	              Definition Language (CDDL): A Notational Convention to
2547	              Express Concise Binary Object Representation (CBOR) and
2548	              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
2549	              June 2019.

2551	   [RFC8618]  Dickinson, J., Hague, J., Dickinson, S., Manderson, T.,
2552	              and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS
2553	              Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September
2554	              2019.

2556	   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR)
2557	              Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020.

2560	   [RFC8746]  Bormann, C., Ed., "Concise Binary Object Representation
2561	              (CBOR) Tags for Typed Arrays", RFC 8746,
2562	              DOI 10.17487/RFC8746, February 2020.

2565	   [rfc8746]  Bormann, C., Ed., "Concise Binary Object Representation
2566	              (CBOR) Tags for Typed Arrays", RFC 8746,
2567	              DOI 10.17487/RFC8746, February 2020.

2570	   [SIPHASH]  Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
2571	              Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture
2572	              Notes in Computer Science pp. 489-508, 2012.

2575	   [YAML]     Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
2576	              Language (YAML[TM]) Version 1.2", 3rd Edition, October
2577	              2009.

2579	Appendix A.  Examples

2581	   The following table provides some CBOR-encoded values in hexadecimal
2582	   (right column), together with diagnostic notation for these values
2583	   (left column).  Note that the string "\u00fc" is one form of
2584	   diagnostic notation for a UTF-8 string containing the single Unicode
2585	   character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut).
2586	   Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
2587	   single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often
2588	   representing "water"), and "\ud800\udd51" is a UTF-8 string in
2589	   diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
2590	   ATTIC FIFTY STATERS).  (Note that all these single-character strings
2591	   could also be represented in native UTF-8 in diagnostic notation,
2592	   just not in an ASCII-only specification like the present one.)  In
2593	   the diagnostic notation provided for bignums, their intended numeric
2594	   value is shown as a decimal number (such as 18446744073709551616)
2595	   instead of showing a tagged byte string (such as
2596	   2(h'010000000000000000')).

2598	   +------------------------------+------------------------------------+
2599	   | Diagnostic                   | Encoded                            |
2600	   +==============================+====================================+
2601	   | 0                            | 0x00                               |
2602	   +------------------------------+------------------------------------+
2603	   | 1                            | 0x01                               |
2604	   +------------------------------+------------------------------------+
2605	   | 10                           | 0x0a                               |
2606	   +------------------------------+------------------------------------+
2607	   | 23                           | 0x17                               |
2608	   +------------------------------+------------------------------------+
2609	   | 24                           | 0x1818                             |
2610	   +------------------------------+------------------------------------+
2611	   | 25                           | 0x1819                             |
2612	   +------------------------------+------------------------------------+
2613	   | 100                          | 0x1864                             |
2614	   +------------------------------+------------------------------------+
2615	   | 1000                         | 0x1903e8                           |
2616	   +------------------------------+------------------------------------+
2617	   | 1000000                      | 0x1a000f4240                       |
2618	   +------------------------------+------------------------------------+
2619	   | 1000000000000                | 0x1b000000e8d4a51000               |
2620	   +------------------------------+------------------------------------+
2621	   | 18446744073709551615         | 0x1bffffffffffffffff               |
2622	   +------------------------------+------------------------------------+
2623	   | 18446744073709551616         | 0xc249010000000000000000           |
2624	   +------------------------------+------------------------------------+
2625	   | -18446744073709551616        | 0x3bffffffffffffffff               |
2626	   +------------------------------+------------------------------------+
2627	   | -18446744073709551617        | 0xc349010000000000000000           |
2628	   +------------------------------+------------------------------------+
2629	   | -1                           | 0x20                               |
2630	   +------------------------------+------------------------------------+
2631	   | -10                          | 0x29                               |
2632	   +------------------------------+------------------------------------+
2633	   | -100                         | 0x3863                             |
2634	   +------------------------------+------------------------------------+
2635	   | -1000                        | 0x3903e7                           |
2636	   +------------------------------+------------------------------------+
2637	   | 0.0                          | 0xf90000                           |
2638	   +------------------------------+------------------------------------+
2639	   | -0.0                         | 0xf98000                           |
2640	   +------------------------------+------------------------------------+
2641	   | 1.0                          | 0xf93c00                           |
2642	   +------------------------------+------------------------------------+
2643	   | 1.1                          | 0xfb3ff199999999999a               |
2644	   +------------------------------+------------------------------------+
2645	   | 1.5                          | 0xf93e00                           |
2646	   +------------------------------+------------------------------------+
2647	   | 65504.0                      | 0xf97bff                           |
2648	   +------------------------------+------------------------------------+
2649	   | 100000.0                     | 0xfa47c35000                       |
2650	   +------------------------------+------------------------------------+
2651	   | 3.4028234663852886e+38       | 0xfa7f7fffff                       |
2652	   +------------------------------+------------------------------------+
2653	   | 1.0e+300                     | 0xfb7e37e43c8800759c               |
2654	   +------------------------------+------------------------------------+
2655	   | 5.960464477539063e-8         | 0xf90001                           |
2656	   +------------------------------+------------------------------------+
2657	   | 0.00006103515625             | 0xf90400                           |
2658	   +------------------------------+------------------------------------+
2659	   | -4.0                         | 0xf9c400                           |
2660	   +------------------------------+------------------------------------+
2661	   | -4.1                         | 0xfbc010666666666666               |
2662	   +------------------------------+------------------------------------+
2663	   | Infinity                     | 0xf97c00                           |
2664	   +------------------------------+------------------------------------+
2665	   | NaN                          | 0xf97e00                           |
2666	   +------------------------------+------------------------------------+
2667	   | -Infinity                    | 0xf9fc00                           |
2668	   +------------------------------+------------------------------------+
2669	   | Infinity                     | 0xfa7f800000                       |
2670	   +------------------------------+------------------------------------+
2671	   | NaN                          | 0xfa7fc00000                       |
2672	   +------------------------------+------------------------------------+
2673	   | -Infinity                    | 0xfaff800000                       |
2674	   +------------------------------+------------------------------------+
2675	   | Infinity                     | 0xfb7ff0000000000000               |
2676	   +------------------------------+------------------------------------+
2677	   | NaN                          | 0xfb7ff8000000000000               |
2678	   +------------------------------+------------------------------------+
2679	   | -Infinity                    | 0xfbfff0000000000000               |
2680	   +------------------------------+------------------------------------+
2681	   | false                        | 0xf4                               |
2682	   +------------------------------+------------------------------------+
2683	   | true                         | 0xf5                               |
2684	   +------------------------------+------------------------------------+
2685	   | null                         | 0xf6                               |
2686	   +------------------------------+------------------------------------+
2687	   | undefined                    | 0xf7                               |
2688	   +------------------------------+------------------------------------+
2689	   | simple(16)                   | 0xf0                               |
2690	   +------------------------------+------------------------------------+
2691	   | simple(255)                  | 0xf8ff                             |
2692	   +------------------------------+------------------------------------+
2693	   | 0("2013-03-21T20:04:00Z")    | 0xc074323031332d30332d32315432303a |
2694	   |                              | 30343a30305a                       |
2695	   +------------------------------+------------------------------------+
2696	   | 1(1363896240)                | 0xc11a514b67b0                     |
2697	   +------------------------------+------------------------------------+
2698	   | 1(1363896240.5)              | 0xc1fb41d452d9ec200000             |
2699	   +------------------------------+------------------------------------+
2700	   | 23(h'01020304')              | 0xd74401020304                     |
2701	   +------------------------------+------------------------------------+
2702	   | 24(h'6449455446')            | 0xd818456449455446                 |
2703	   +------------------------------+------------------------------------+
2704	   | 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
2705	   |                              | 616d706c652e636f6d                 |
2706	   +------------------------------+------------------------------------+
2707	   | h''                          | 0x40                               |
2708	   +------------------------------+------------------------------------+
2709	   | h'01020304'                  | 0x4401020304                       |
2710	   +------------------------------+------------------------------------+
2711	   | ""                           | 0x60                               |
2712	   +------------------------------+------------------------------------+
2713	   | "a"                          | 0x6161                             |
2714	   +------------------------------+------------------------------------+
2715	   | "IETF"                       | 0x6449455446                       |
2716	   +------------------------------+------------------------------------+
2717	   | "\"\\"                       | 0x62225c                           |
2718	   +------------------------------+------------------------------------+
2719	   | "\u00fc"                     | 0x62c3bc                           |
2720	   +------------------------------+------------------------------------+
2721	   | "\u6c34"                     | 0x63e6b0b4                         |
2722	   +------------------------------+------------------------------------+
2723	   | "\ud800\udd51"               | 0x64f0908591                       |
2724	   +------------------------------+------------------------------------+
2725	   | []                           | 0x80                               |
2726	   +------------------------------+------------------------------------+
2727	   | [1, 2, 3]                    | 0x83010203                         |
2728	   +------------------------------+------------------------------------+
2729	   | [1, [2, 3], [4, 5]]          | 0x8301820203820405                 |
2730	   +------------------------------+------------------------------------+
2731	   | [1, 2, 3, 4, 5, 6, 7, 8, 9,  | 0x98190102030405060708090a0b0c0d0e |
2732	   | 10, 11, 12, 13, 14, 15, 16,  | 0f101112131415161718181819         |
2733	   | 17, 18, 19, 20, 21, 22, 23,  |                                    |
2734	   | 24, 25]                      |                                    |
2735	   +------------------------------+------------------------------------+
2736	   | {}                           | 0xa0                               |
2737	   +------------------------------+------------------------------------+
2738	   | {1: 2, 3: 4}                 | 0xa201020304                       |
2739	   +------------------------------+------------------------------------+
2740	   | {"a": 1, "b": [2, 3]}        | 0xa26161016162820203               |
2741	   +------------------------------+------------------------------------+
2742	   | ["a", {"b": "c"}]            | 0x826161a161626163                 |
2743	   +------------------------------+------------------------------------+
2744	   |{"a": "A", "b": "B", "c": "C",| 0xa5616161416162614261636143616461 |
2745	   | "d": "D", "e": "E"}          | 4461656145                         |
2746	   +------------------------------+------------------------------------+
2747	   | (_ h'0102', h'030405')       | 0x5f42010243030405ff               |
2748	   +------------------------------+------------------------------------+
2749	   | (_ "strea", "ming")          | 0x7f657374726561646d696e67ff       |
2750	   +------------------------------+------------------------------------+
2751	   | [_ ]                         | 0x9fff                             |
2752	   +------------------------------+------------------------------------+
2753	   | [_ 1, [2, 3], [_ 4, 5]]      | 0x9f018202039f0405ffff             |
2754	   +------------------------------+------------------------------------+
2755	   | [_ 1, [2, 3], [4, 5]]        | 0x9f01820203820405ff               |
2756	   +------------------------------+------------------------------------+
2757	   | [1, [2, 3], [_ 4, 5]]        | 0x83018202039f0405ff               |
2758	   +------------------------------+------------------------------------+
2759	   | [1, [_ 2, 3], [4, 5]]        | 0x83019f0203ff820405               |
2760	   +------------------------------+------------------------------------+
2761	   |[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f |
2762	   | 10, 11, 12, 13, 14, 15, 16,  | 101112131415161718181819ff         |
2763	   | 17, 18, 19, 20, 21, 22, 23,  |                                    |
2764	   | 24, 25]                      |                                    |
2765	   +------------------------------+------------------------------------+
2766	   | {_ "a": 1, "b": [_ 2, 3]}    | 0xbf61610161629f0203ffff           |
2767	   +------------------------------+------------------------------------+
2768	   | ["a", {_ "b": "c"}]          | 0x826161bf61626163ff               |
2769	   +------------------------------+------------------------------------+
2770	   | {_ "Fun": true, "Amt": -2}   | 0xbf6346756ef563416d7421ff         |
2771	   +------------------------------+------------------------------------+

2773	                Table 6: Examples of Encoded CBOR Data Items
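
   Several rows of the table above can be reproduced with a few lines
   of code.  The following non-normative Python sketch encodes
   integers and text strings using the shortest form of the argument
   and decodes the two content bytes of a half-precision value.  The
   function names (encode_head, encode_int, encode_text, decode_half)
   are hypothetical and chosen for this example only; integers outside
   the 64-bit range, which the table represents as bignums, are
   deliberately rejected here.

```python
import struct


def encode_head(major, argument):
    """Encode a major type and its argument in the shortest form."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode_int(n):
    """Encode an integer as major type 0 (n >= 0) or major type 1 (n < 0)."""
    if n >= 0:
        return encode_head(0, n)
    return encode_head(1, -1 - n)


def encode_text(s):
    """Encode a text string as major type 3; the length counts UTF-8 bytes."""
    utf8 = s.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8


def decode_half(two_bytes):
    """Decode the two bytes that follow initial byte 0xf9."""
    return struct.unpack(">e", two_bytes)[0]
```

   With these definitions, encode_int(1000).hex() yields "1903e8" and
   encode_int(-1000).hex() yields "3903e7", matching the corresponding
   rows of the table.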

2775	Appendix B.  Jump Table

2777	   For brevity, this jump table does not show initial bytes that are
2778	   reserved for future extension.  It also only shows a selection of the
2779	   initial bytes that can be used for optional features.  (All unsigned
2780	   integers are in network byte order.)

2782	      +------------+------------------------------------------------+
2783	      | Byte       | Structure/Semantics                            |
2784	      +============+================================================+
2785	      | 0x00..0x17 | Unsigned integer 0x00..0x17 (0..23)            |
2786	      +------------+------------------------------------------------+
2787	      | 0x18       | Unsigned integer (one-byte uint8_t follows)    |
2788	      +------------+------------------------------------------------+
2789	      | 0x19       | Unsigned integer (two-byte uint16_t follows)   |
2790	      +------------+------------------------------------------------+
2791	      | 0x1a       | Unsigned integer (four-byte uint32_t follows)  |
2792	      +------------+------------------------------------------------+
2793	      | 0x1b       | Unsigned integer (eight-byte uint64_t follows) |
2794	      +------------+------------------------------------------------+
2795	      | 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24)    |
2796	      +------------+------------------------------------------------+
2797	      | 0x38       | Negative integer -1-n (one-byte uint8_t for n  |
2798	      |            | follows)                                       |
2799	      +------------+------------------------------------------------+
2800	      | 0x39       | Negative integer -1-n (two-byte uint16_t for n |
2801	      |            | follows)                                       |
2802	      +------------+------------------------------------------------+
2803	      | 0x3a       | Negative integer -1-n (four-byte uint32_t for  |
2804	      |            | n follows)                                     |
2805	      +------------+------------------------------------------------+
2806	      | 0x3b       | Negative integer -1-n (eight-byte uint64_t for |
2807	      |            | n follows)                                     |
2808	      +------------+------------------------------------------------+
2809	      | 0x40..0x57 | byte string (0x00..0x17 bytes follow)          |
2810	      +------------+------------------------------------------------+
2811	      | 0x58       | byte string (one-byte uint8_t for n, and then  |
2812	      |            | n bytes follow)                                |
2813	      +------------+------------------------------------------------+
2814	      | 0x59       | byte string (two-byte uint16_t for n, and then |
2815	      |            | n bytes follow)                                |
2816	      +------------+------------------------------------------------+
2817	      | 0x5a       | byte string (four-byte uint32_t for n, and     |
2818	      |            | then n bytes follow)                           |
2819	      +------------+------------------------------------------------+
2820	      | 0x5b       | byte string (eight-byte uint64_t for n, and    |
2821	      |            | then n bytes follow)                           |
2822	      +------------+------------------------------------------------+
2823	      | 0x5f       | byte string, byte strings follow, terminated   |
2824	      |            | by "break"                                     |
2825	      +------------+------------------------------------------------+
2826	      | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow)         |
2827	      +------------+------------------------------------------------+
2828	      | 0x78       | UTF-8 string (one-byte uint8_t for n, and then |
2829	      |            | n bytes follow)                                |
2830	      +------------+------------------------------------------------+
2831	      | 0x79       | UTF-8 string (two-byte uint16_t for n, and     |
2832	      |            | then n bytes follow)                           |
2833	      +------------+------------------------------------------------+
2834	      | 0x7a       | UTF-8 string (four-byte uint32_t for n, and    |
2835	      |            | then n bytes follow)                           |
2836	      +------------+------------------------------------------------+
2837	      | 0x7b       | UTF-8 string (eight-byte uint64_t for n, and   |
2838	      |            | then n bytes follow)                           |
2839	      +------------+------------------------------------------------+
2840	      | 0x7f       | UTF-8 string, UTF-8 strings follow, terminated |
2841	      |            | by "break"                                     |
2842	      +------------+------------------------------------------------+
2843	      | 0x80..0x97 | array (0x00..0x17 data items follow)           |
2844	      +------------+------------------------------------------------+
2845	      | 0x98       | array (one-byte uint8_t for n, and then n data |
2846	      |            | items follow)                                  |
2847	      +------------+------------------------------------------------+
2848	      | 0x99       | array (two-byte uint16_t for n, and then n     |
2849	      |            | data items follow)                             |
2850	      +------------+------------------------------------------------+
2851	      | 0x9a       | array (four-byte uint32_t for n, and then n    |
2852	      |            | data items follow)                             |
2853	      +------------+------------------------------------------------+
2854	      | 0x9b       | array (eight-byte uint64_t for n, and then n   |
2855	      |            | data items follow)                             |
2856	      +------------+------------------------------------------------+
2857	      | 0x9f       | array, data items follow, terminated by        |
2858	      |            | "break"                                        |
2859	      +------------+------------------------------------------------+
2860	      | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow)    |
2861	      +------------+------------------------------------------------+
2862	      | 0xb8       | map (one-byte uint8_t for n, and then n pairs  |
2863	      |            | of data items follow)                          |
2864	      +------------+------------------------------------------------+
2865	      | 0xb9       | map (two-byte uint16_t for n, and then n pairs |
2866	      |            | of data items follow)                          |
2867	      +------------+------------------------------------------------+
2868	      | 0xba       | map (four-byte uint32_t for n, and then n      |
2869	      |            | pairs of data items follow)                    |
2870	      +------------+------------------------------------------------+
2871	      | 0xbb       | map (eight-byte uint64_t for n, and then n     |
2872	      |            | pairs of data items follow)                    |
2873	      +------------+------------------------------------------------+
2874	      | 0xbf       | map, pairs of data items follow, terminated by |
2875	      |            | "break"                                        |
2876	      +------------+------------------------------------------------+
2877	      | 0xc0       | Text-based date/time (data item follows; see   |
2878	      |            | Section 3.4.1)                                 |
2879	      +------------+------------------------------------------------+
2880	      | 0xc1       | Epoch-based date/time (data item follows; see  |
2881	      |            | Section 3.4.2)                                 |
2882	      +------------+------------------------------------------------+
2883	      | 0xc2       | Positive bignum (data item "byte string"       |
2884	      |            | follows)                                       |
2885	      +------------+------------------------------------------------+
2886	      | 0xc3       | Negative bignum (data item "byte string"       |
2887	      |            | follows)                                       |
2888	      +------------+------------------------------------------------+
2889	      | 0xc4       | Decimal Fraction (data item "array" follows;   |
2890	      |            | see Section 3.4.4)                             |
2891	      +------------+------------------------------------------------+
2892	      | 0xc5       | Bigfloat (data item "array" follows; see       |
2893	      |            | Section 3.4.4)                                 |
2894	      +------------+------------------------------------------------+
2895	      | 0xc6..0xd4 | (tag)                                          |
2896	      +------------+------------------------------------------------+
2897	      | 0xd5..0xd7 | Expected Conversion (data item follows; see    |
2898	      |            | Section 3.4.5.2)                               |
2899	      +------------+------------------------------------------------+
2900	      | 0xd8..0xdb | (more tags, 1/2/4/8 bytes and then a data item |
2901	      |            | follow)                                        |
2902	      +------------+------------------------------------------------+
2903	      | 0xe0..0xf3 | (simple value)                                 |
2904	      +------------+------------------------------------------------+
2905	      | 0xf4       | False                                          |
2906	      +------------+------------------------------------------------+
2907	      | 0xf5       | True                                           |
2908	      +------------+------------------------------------------------+
2909	      | 0xf6       | Null                                           |
2910	      +------------+------------------------------------------------+
2911	      | 0xf7       | Undefined                                      |
2912	      +------------+------------------------------------------------+
2913	      | 0xf8       | (simple value, one byte follows)               |
2914	      +------------+------------------------------------------------+
2915	      | 0xf9       | Half-Precision Float (two-byte IEEE 754)       |
2916	      +------------+------------------------------------------------+
2917	      | 0xfa       | Single-Precision Float (four-byte IEEE 754)    |
2918	      +------------+------------------------------------------------+
2919	      | 0xfb       | Double-Precision Float (eight-byte IEEE 754)   |
2920	      +------------+------------------------------------------------+
2921	      | 0xff       | "break" stop code                              |
2922	      +------------+------------------------------------------------+

2924	                    Table 7: Jump Table for Initial Byte
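
   As an illustrative sketch (not part of the specification), the rows
   of Table 7 can be reproduced by splitting an initial byte into its
   major type (the high-order 3 bits) and its additional information
   (the low-order 5 bits), as the pseudocode in Appendix C does with
   "ib >> 5" and "ib & 0x1f".  The names split_initial_byte and
   MAJOR_TYPES are hypothetical and were chosen for this example only.

```python
# Hypothetical helper names; only the bit layout comes from the document.
MAJOR_TYPES = (
    "unsigned integer", "negative integer", "byte string", "UTF-8 string",
    "array", "map", "tag", "simple value, float, or break",
)


def split_initial_byte(ib):
    """Return (major type, additional information) for an initial byte."""
    if not 0 <= ib <= 0xFF:
        raise ValueError("initial byte must be in 0..255")
    return ib >> 5, ib & 0x1F


if __name__ == "__main__":
    # A few initial bytes taken from Table 7.
    for ib in (0x7B, 0x9F, 0xB8, 0xC2, 0xF5, 0xFF):
        mt, ai = split_initial_byte(ib)
        print(f"0x{ib:02x}: major type {mt} ({MAJOR_TYPES[mt]}), "
              f"additional information {ai}")
```

   For instance, 0x9f splits into major type 4 (array) with additional
   information 31 (indefinite length), matching the table row for 0x9f.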

2926	Appendix C.  Pseudocode

2928	   The well-formedness of a CBOR item can be checked by the pseudocode
2929	   in Figure 1.  The data is well-formed if and only if:

2931	   *  the pseudocode does not "fail";

2933	   *  after execution of the pseudocode, no bytes are left in the input
2934	      (except in streaming applications).

2936	   The pseudocode has the following prerequisites:

2938	   *  take(n) reads n bytes from the input data and returns them as a
2939	      byte string.  If n bytes are no longer available, take(n) fails.

2941	   *  uint() converts a byte string into an unsigned integer by
2942	      interpreting the byte string in network byte order.

2944	   *  Arithmetic works as in C.

2946	   *  All variables are unsigned integers of sufficient range.

2948	   Note that "well_formed" returns the major type for well-formed
2949	   definite length items, but 0 for an indefinite length item (or -1 for
2950	   a break stop code, only if "breakable" is set).  This is used in
2951	   "well_formed_indefinite" to ascertain that indefinite length strings
2952	   only contain definite length strings as chunks.

2954	   well_formed (breakable = false) {
2955	     // process initial bytes
2956	     ib = uint(take(1));
2957	     mt = ib >> 5;
2958	     val = ai = ib & 0x1f;
2959	     switch (ai) {
2960	       case 24: val = uint(take(1)); break;
2961	       case 25: val = uint(take(2)); break;
2962	       case 26: val = uint(take(4)); break;
2963	       case 27: val = uint(take(8)); break;
2964	       case 28: case 29: case 30: fail();
2965	       case 31:
2966	         return well_formed_indefinite(mt, breakable);
2967	     }
2968	     // process content
2969	     switch (mt) {
2970	       // case 0, 1, 7 do not have content; just use val
2971	       case 2: case 3: take(val); break; // bytes/UTF-8
2972	       case 4: for (i = 0; i < val; i++) well_formed(); break;
2973	       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
2974	       case 6: well_formed(); break;     // 1 embedded data item
2975	       case 7: if (ai == 24 && val < 32) fail(); // bad simple
2976	     }
2977	     return mt;                    // finite data item
2978	   }

2980	   well_formed_indefinite(mt, breakable) {
2981	     switch (mt) {
2982	       case 2: case 3:
2983	         while ((it = well_formed(true)) != -1)
2984	           if (it != mt)           // need finite-length chunk
2985	             fail();               //    of same type
2986	         break;
2987	       case 4: while (well_formed(true) != -1); break;
2988	       case 5: while (well_formed(true) != -1) well_formed(); break;
2989	       case 7:
2990	         if (breakable)
2991	           return -1;              // signal break out
2992	         else fail();              // no enclosing indefinite
2993	       default: fail();            // wrong mt
2994	     }
2995	     return 0;                     // no break out
2996	   }

2998	               Figure 1: Pseudocode for Well-Formedness Check

3000	   Note that the remaining complexity of a complete CBOR decoder is
3001	   about presenting data that has been decoded to the application in an
3002	   appropriate form.

3004	   Major types 0 and 1 are designed in such a way that they can be
3005	   encoded in C from a signed integer without actually doing an if-then-
3006	   else for positive/negative (Figure 2).  This uses the fact that
3007	   (-1-n), the transformation for major type 1, is the same as ~n
3008	   (bitwise complement) in C unsigned arithmetic; ~n can then be
3009	   expressed as (-1)^n for the negative case, while 0^n leaves n
3010	   unchanged for non-negative.  The sign of a number can be converted to
3011	   -1 for negative and 0 for non-negative (0 or positive) by arithmetic-
3012	   shifting the number by one bit less than the bit length of the number
3013	   (for example, by 63 for 64-bit numbers).

3015	   void encode_sint(int64_t n) {
3016	     uint64_t ui = n >> 63;   // extend sign to whole length
3017	     unsigned mt = ui & 0x20; // extract major type
3018	     ui ^= n;                 // complement negatives
3019	     if (ui < 24)
3020	       *p++ = mt + ui;
3021	     else if (ui < 256) {
3022	       *p++ = mt + 24;
3023	       *p++ = ui;
3024	     } else
3025	          ...

3027	             Figure 2: Pseudocode for Encoding a Signed Integer
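
   A Python rendering of the same idea might look as follows.  This is
   a sketch under stated assumptions, not a reference implementation:
   Python integers are unbounded, so an explicit range check stands in
   for the int64_t parameter type; ">>" on a negative Python integer
   behaves as an arithmetic shift; and the 2-, 4-, and 8-byte cases,
   which Figure 2 elides, are filled in here from the argument sizes
   listed in Table 7 rather than taken from the figure.  The function
   returns the encoded bytes instead of writing through a pointer.

```python
def encode_sint(n):
    """Encode a signed 64-bit integer as a CBOR major type 0 or 1 item."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("n must fit in int64_t")
    sign = n >> 63     # -1 for negative, 0 otherwise (arithmetic shift)
    mt = sign & 0x20   # 0x00 for major type 0, 0x20 for major type 1
    ui = sign ^ n      # ~n == -1 - n for negatives, n unchanged otherwise
    if ui < 24:
        return bytes([mt + ui])
    # Assumed continuation of the elided part of Figure 2:
    # additional information 24..27 announces a 1/2/4/8-byte argument.
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if ui < 1 << (8 * size):
            return bytes([mt + ai]) + ui.to_bytes(size, "big")
    raise AssertionError("unreachable for int64_t input")


if __name__ == "__main__":
    for n in (0, 23, 24, -1, -25, 1000, -1000):
        print(n, encode_sint(n).hex())
```

   Note that no branch depends on the sign of n; the only conditionals
   select the size of the argument.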

3029	Appendix D.  Half-Precision

3031	   As half-precision floating-point numbers were only added to IEEE 754
3032	   in 2008 [IEEE754], today's programming platforms often still only
3033	   have limited support for them.  It is very easy to include at least
3034	   decoding support for them even without such support.  An example of a
3035	   small decoder for half-precision floating-point numbers in the C
3036	   language is shown in Figure 3.  A similar program for Python is in
3037	   Figure 4; this code assumes that the 2-byte value has already been
3038	   decoded as an (unsigned short) integer in network byte order (as
3039	   would be done by the pseudocode in Appendix C).

3041	   #include <math.h>

3043	   double decode_half(unsigned char *halfp) {
3044	     int half = (halfp[0] << 8) + halfp[1];
3045	     int exp = (half >> 10) & 0x1f;
3046	     int mant = half & 0x3ff;
3047	     double val;
3048	     if (exp == 0) val = ldexp(mant, -24);
3049	     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
3050	     else val = mant == 0 ? INFINITY : NAN;
3051	     return half & 0x8000 ? -val : val;
3052	   }

3054	               Figure 3: C Code for a Half-Precision Decoder

3056	   import struct
3057	   from math import ldexp

3059	   def decode_single(single):
3060	       return struct.unpack("!f", struct.pack("!I", single))[0]

3062	   def decode_half(half):
3063	       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
3064	       if ((half & 0x7c00) != 0x7c00):
3065	           return ldexp(decode_single(valu), 112)
3066	       return decode_single(valu | 0x7f800000)

3068	             Figure 4: Python Code for a Half-Precision Decoder
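
   Where a platform does offer half-precision support, a hand-written
   decoder can be cross-checked against it.  The sketch below assumes
   Python 3.6 or later, whose struct module provides the "e" format
   for IEEE 754 binary16.  Here decode_half restates the logic of
   Figure 3 in Python, and decode_half_struct is a hypothetical helper
   name introduced for the comparison.

```python
import struct
from math import isnan, ldexp


def decode_half(half):
    """Decode a 16-bit unsigned integer as a half-precision float."""
    exp = (half >> 10) & 0x1F
    mant = half & 0x3FF
    if exp == 0:                       # zero and subnormal numbers
        val = ldexp(mant, -24)
    elif exp != 31:                    # normal numbers
        val = ldexp(mant + 1024, exp - 25)
    else:                              # infinity and NaN
        val = float("inf") if mant == 0 else float("nan")
    return -val if half & 0x8000 else val


def decode_half_struct(half):
    """Same result via the struct module's binary16 format (assumed 3.6+)."""
    return struct.unpack("!e", struct.pack("!H", half))[0]


def same(a, b):
    """Equality that treats any two NaNs as equal."""
    return (isnan(a) and isnan(b)) or a == b


if __name__ == "__main__":
    mismatches = [h for h in range(0x10000)
                  if not same(decode_half(h), decode_half_struct(h))]
    print("mismatching bit patterns:", len(mismatches))
```

   Comparing all 65536 bit patterns is cheap and exercises the
   subnormal, infinity, and NaN branches as well as the normal case.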

3070	Appendix E.  Comparison of Other Binary Formats to CBOR's Design
3071	             Objectives

3073	   The proposal for CBOR follows a history of binary formats that is as
3074	   long as the history of computers themselves.  Different formats have
3075	   had different objectives.  In most cases, the objectives of the
3076	   format were never stated, although they can sometimes be implied by
3077	   the context where the format was first used.  Some formats were meant
3078	   to be universally usable, although history has proven that no binary
3079	   format meets the needs of all protocols and applications.

3081	   CBOR differs from many of these formats in that it starts with a set
3082	   of objectives and attempts to meet just those.  This section
3083	   compares a few of the dozens of formats with CBOR's objectives in
3084	   order to help the reader decide if they want to use CBOR or a
3085	   different format for a particular protocol or application.

3087	   Note that the discussion here is not meant to be a criticism of any
3088	   format: to the best of our knowledge, no format before CBOR was meant
3089	   to cover CBOR's objectives in the priority we have assigned them.  A
3090	   brief recap of the objectives from Section 1.1 is:

3092	   1.  unambiguous encoding of most common data formats from Internet
3093	       standards

3095	   2.  code compactness for encoder or decoder

3097	   3.  no schema description needed

3099	   4.  reasonably compact serialization

3101	   5.  applicability to constrained and unconstrained applications

3103	   6.  good JSON conversion

3105	   7.  extensibility

3107	   A discussion of CBOR and other formats with respect to a different
3108	   set of design objectives is provided in Section 5 and Appendix C of
3109	   [RFC8618].

3111	E.1.  ASN.1 DER, BER, and PER

3113	   [ASN.1] has many serializations.  In the IETF, DER and BER are the
3114	   most common.  The serialized output is not particularly compact for
3115	   many items, and the code needed to decode numeric items can be
3116	   complex on a constrained device.

3118	   Few (if any) IETF protocols have adopted one of the several variants
3119	   of Packed Encoding Rules (PER).  There could be many reasons for
3120	   this, but one that is commonly stated is that PER makes use of the
3121	   schema even for parsing the surface structure of the data stream,
3122	   requiring significant tool support.  There are different versions of
3123	   the ASN.1 schema language in use, which has also hampered adoption.

3125	E.2.  MessagePack

3127	   [MessagePack] is a concise, widely implemented counted binary
3128	   serialization format, similar in many properties to CBOR, although
3129	   somewhat less regular.  While the data model can be used to represent
3130	   JSON data, MessagePack has also been used in many remote procedure
3131	   call (RPC) applications and for long-term storage of data.

3133	   MessagePack has been essentially stable since it was first published
3134	   around 2011; it has not yet had a transition.  The evolution of
3135	   MessagePack is impeded by an imperative to maintain complete
3136	   backwards compatibility with existing stored data, while only a few
3137	   bytecodes are still available for extension.  Repeated requests over
3138	   the years from the MessagePack user community to separate out binary
3139	   and text strings in the encoding have recently led to an extension
3140	   proposal that would leave MessagePack's "raw" data ambiguous between
3141	   its usages for binary and text data.  The extension mechanism for
3142	   MessagePack remains unclear.

3144	E.3.  BSON

3146	   [BSON] is a data format that was developed for the storage of JSON-
3147	   like maps (JSON objects) in the MongoDB database.  Its major
3148	   distinguishing feature is the capability for in-place update, which
3149	   prevents a compact representation.  BSON uses a counted
3150	   representation except for map keys, which are null-byte terminated.
3151	   While BSON can be used for the representation of JSON-like objects on
3152	   the wire, its specification is dominated by the requirements of the
3153	   database application and has become somewhat baroque.  The status of
3154	   how BSON extensions will be implemented remains unclear.

3156	E.4.  MSDTP: RFC 713

3158	   Message Services Data Transmission Protocol (MSDTP) is a very early
3159	   example of a compact message format; it is described in [RFC0713],
3160	   written in 1976.  It is included here for its historical value, not
3161	   because it was ever widely used.

3163	E.5.  Conciseness on the Wire

3165	   While CBOR's design objective of code compactness for encoders and
3166	   decoders is a higher priority than its objective of conciseness on
3167	   the wire, many people focus on the wire size.  Table 8 shows some
3168	   encoding examples for the simple nested array [1, [2, 3]]; where some
3169	   form of indefinite-length encoding is supported by the encoding,
3170	   [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.

3172	       +-------------+----------------------------+----------------+
3173	       | Format      | [1, [2, 3]]                | [_ 1, [2, 3]]  |
3174	       +=============+============================+================+
3175	       | RFC 713     | c2 05 81 c2 02 82 83       |                |
3176	       +-------------+----------------------------+----------------+
3177	       | ASN.1 BER   | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 |
3178	       |             | 02 02 01 03                | 30 06 02 01 02 |
3179	       |             |                            | 02 01 03 00 00 |
3180	       +-------------+----------------------------+----------------+
3181	       | MessagePack | 92 01 92 02 03             |                |
3182	       +-------------+----------------------------+----------------+
3183	       | BSON        | 22 00 00 00 10 30 00 01 00 |                |
3184	       |             | 00 00 04 31 00 13 00 00 00 |                |
3185	       |             | 10 30 00 02 00 00 00 10 31 |                |
3186	       |             | 00 03 00 00 00 00 00       |                |
3187	       +-------------+----------------------------+----------------+
3188	       | CBOR        | 82 01 82 02 03             | 9f 01 82 02 03 |
3189	       |             |                            | ff             |
3190	       +-------------+----------------------------+----------------+

3192	           Table 8: Examples for Different Levels of Conciseness
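
   The CBOR row of Table 8 can be reproduced with a few lines of code.
   The following is a minimal sketch that handles only non-negative
   integers and nested lists, which is all the example needs; the
   function names head and encode and the indefinite_outer parameter
   are hypothetical and not part of any CBOR library.

```python
def head(mt, arg):
    """Encode major type mt with argument arg in the shortest form."""
    if arg < 24:
        return bytes([(mt << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(mt << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode(item, indefinite_outer=False):
    """Encode non-negative ints and (nested) lists; nothing else."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return head(0, item)
    if isinstance(item, list):
        body = b"".join(encode(x) for x in item)
        if indefinite_outer:
            # 0x9f opens an indefinite-length array, 0xff is "break".
            return b"\x9f" + body + b"\xff"
        return head(4, len(item)) + body
    raise TypeError("sketch only handles non-negative ints and lists")


if __name__ == "__main__":
    print(encode([1, [2, 3]]).hex())
    print(encode([1, [2, 3]], indefinite_outer=True).hex())
```

   The indefinite-length form costs one extra byte here, because the
   single count byte is replaced by an opening byte plus a "break".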

3194	Appendix F.  Changes from RFC 7049

3196	   The following is a list of known changes from RFC 7049.  This list is
3197	   non-authoritative.  It is meant to help reviewers see the significant
3198	   differences.

3200	   *  Made some use of new RFCXML functionality [RFC7991]

3202	   *  Updated references, e.g. for [RFC4627] to [RFC8259] in many
3203	      places, for [CNN-TERMS] to [RFC7228]; added missing reference to
3204	      [IEEE754] and updated to [ECMA262]

3206	   *  Fixed errata: in the example in Section 2.4.2 ("29" -> "49"), and
3207	      in the last paragraph of Section 3.6 ("0b000_11101" ->
3208	      "0b000_11001")

3210	   *  Added a comment to the last example in Section 3.2.2 (added
3211	      "Second value")

3213	   *  Applied numerous small editorial changes

3215	   *  Added a few tables for illustration

3217	   *  More stringently used terminology for well-formed and valid data,
3218	      avoiding less well-defined alternative terms such as "syntax
3219	      error", "decoding error" and "strict mode" outside examples

3221	   *  Streamlined terminology to talk about tags, tag numbers, and tag
3222	      content

3224	   *  Clarified the restrictions on tag content, in general and
3225	      specifically for tag 1

3227	   *  Added text about the CBOR data model and its small variations
3228	      (basic generic, extended generic, specific)

3230	   *  More clearly separated integers from floating-point values;
3231	      provided a suggestion (based on I-JSON [RFC7493]) for handling
3232	      these types when converting JSON to CBOR

3234	   *  Added term "preferred serialization" and defined it for various
3235	      kinds of data items

3237	   *  Added comment about tags with semantics that depend on
3238	      serialization order

3240	   *  Defined "deterministic encoding", making use of "preferred
3241	      serialization", and simplified the suggested map ordering for the
3242	      "Core Deterministic Encoding Requirements", easing implementation,
3243	      while keeping RFC 7049 map ordering as an alternative "length-
3244	      first map key ordering"; now avoiding the terms "canonical" and
3245	      "canonicalization"

3247	   *  Clarified map validity (handling of duplicate keys) and explained
3248	      the domain of applicability of certain implementation choices

3250	   *  Updated IANA considerations

3252	   *  Added security considerations

3254	   *  Clarified handling of non-well-formed simple values in text and
3255	      pseudocode

3257	   *  Added Appendix G, well-formedness errors and examples

3259	   *  Removed UBJSON from Appendix E, as that format has completely
3260	      changed since RFC 7049; added reference to [RFC8618]

3262	Appendix G.  Well-Formedness Errors and Examples

3264	   There are three basic kinds of well-formedness errors that can occur
3265	   in decoding a CBOR data item:

3267	   *  Too much data: There are input bytes left that were not consumed.
3268	      This is only an error if the application assumed that the input
3269	      bytes would span exactly one data item.  Where the application
3270	      uses the self-delimiting nature of CBOR encoding to permit
3271	      additional data after the data item, as is for example done in
3272	      CBOR sequences [RFC8742], the CBOR decoder can simply indicate
3273	      what part of the input has not been consumed.

3275	   *  Too little data: The input data available would need additional
3276	      bytes added at their end for a complete CBOR data item.  This may
3277	      indicate the input is truncated; it is also a common error when
3278	      trying to decode random data as CBOR.  For some applications
3279	      however, this may not actually be an error, as the application may
3280	      not be certain it has all the data yet and can obtain or wait for
3281	      additional input bytes.  Some of these applications may have an
3282	      upper limit for how much additional data can show up; here the
3283	      decoder may be able to indicate that the encoded CBOR data item
3284	      cannot be completed within this limit.

3286	   *  Syntax error: The input data are not consistent with the
3287	      requirements of the CBOR encoding, and this cannot be remedied by
3288	      adding (or removing) data at the end.

3290	   In Appendix C, errors of the first kind are addressed in the first
3291	   paragraph/bullet list (requiring "no bytes are left"), and errors of
3292	   the second kind are addressed in the second paragraph/bullet list
3293	   (failing "if n bytes are no longer available").  Errors of the third
3294	   kind are identified in the pseudocode by specific instances of
3295	   calling fail(), in order:

3297	   *  a reserved value is used for additional information (28, 29, 30)

3299	   *  major type 7, additional information 24, value < 32 (incorrect or
3300	      incorrectly encoded simple type)

3302	   *  incorrect substructure of indefinite length byte/text string (may
3303	      only contain definite length strings of the same major type)

3305	   *  break stop code (mt=7, ai=31) occurs in a value position of a map,
3306	      or anywhere other than a position directly in an indefinite length
3307	      item where another enclosed data item could also occur

3309	   *  additional information 31 used with major type 0, 1, or 6

3311	G.1.  Examples of CBOR Data Items That Are Not Well-Formed

3313	   This subsection shows a few examples for CBOR data items that are not
3314	   well-formed.  Each example is a sequence of bytes each shown in
3315	   hexadecimal; multiple examples in a list are separated by commas.

3317	   Examples for well-formedness error kind 1 (too much data) can easily
3318	   be formed by adding data to a well-formed encoded CBOR data item.

3320	   Similarly, examples for well-formedness error kind 2 (too little
3321	   data) can be formed by truncating a well-formed encoded CBOR data
3322	   item.  In test suites, it may be beneficial to specifically test with
3323	   incomplete data items that would require large amounts of additional
3324	   data to be completed (for instance, by starting the encoding of a
3325	   string of a very large size).

3327	   A premature end of the input can occur in a head or within the
3328	   enclosed data, which may be bare strings or enclosed data items that
3329	   are either counted or should have been ended by a break stop code.

3331	   *  End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02
3332	      03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa
3333	      00 00, fb 00 00 00

3335	   *  Definite length strings with short data: 41, 61, 5a ff ff ff ff
3336	      00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
3337	      ff ff ff ff ff ff ff 01 02 03

3339	   *  Definite length maps and arrays not closed with enough items: 81,
3340	      81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
3341	      00

3343	   *  Tag number not followed by tag content: c0

3345	   *  Indefinite length strings not closed by a break stop code: 5f 41
3346	      00, 7f 61 00

3348	   *  Indefinite length maps and arrays not closed by a break stop code:
3349	      9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f
3350	      ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff

3352	   A few examples for the five subkinds of well-formedness error kind 3
3353	   (syntax error) are shown below.

3355	   Subkind 1:

3357	   *  Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
3358	      5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
3359	      fd, fe

3361	   Subkind 2:

3363	   *  Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18,
3364	      f8 1f

3366	   Subkind 3:

3368	   *  Indefinite length string chunks not of the correct type: 5f 00 ff,
3369	      5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff,
3370	      7f 41 00 ff

3372	   *  Indefinite length string chunks not definite length: 5f 5f 41 00
3373	      ff ff, 7f 7f 61 00 ff ff

3375	   Subkind 4:

3377	   *  Break occurring on its own outside of an indefinite length item:
3378	      ff

3380	   *  Break occurring in a definite length array or map or a tag: 81 ff,
3381	      82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82
3382	      9f 81 9f 9f ff ff ff ff

3384	   *  Break in indefinite length map would lead to odd number of items
3385	      (break in a value position): bf 00 ff, bf 00 00 00 ff

3387	   Subkind 5:

3389	   *  Major type 0, 1, 6 with additional information 31: 1f, 3f, df

3391	Acknowledgements

3393	   CBOR was inspired by MessagePack.  MessagePack was developed and
3394	   promoted by Sadayuki Furuhashi ("frsyuki").  This reference to
3395	   MessagePack is solely for attribution; CBOR is not intended as a
3396	   version of or replacement for MessagePack, as it has different design
3397	   goals and requirements.

3399	   The need for functionality beyond the original MessagePack
3400	   Specification became obvious to many people at about the same time
3401	   around the year 2012.  BinaryPack is a minor derivation of
3402	   MessagePack that was developed by Eric Zhang for the binaryjs
3403	   project.  A similar, but different, extension was made by Tim Caswell
3404	   for his msgpack-js and msgpack-js-browser projects.  Many people have
3405	   contributed to the discussion about extending MessagePack to separate
3406	   text string representation from byte string representation.

3408	   The encoding of the additional information in CBOR was inspired by
3409	   the encoding of length information designed by Klaus Hartke for CoAP.

3411	   This document also incorporates suggestions made by many people,
3412	   notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand,
3413	   Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael
3414	   Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray
3415	   Polk, Tim Bray, Tony Finch, Tony Hansen, and Yaron Sheffer.

3417	Authors' Addresses

3419	   Carsten Bormann
3420	   Universitaet Bremen TZI
3421	   Postfach 330440
3422	   D-28359 Bremen
3423	   Germany

3425	   Phone: +49-421-218-63921
3426	   Email: cabo@tzi.org

3428	   Paul Hoffman
3429	   ICANN

3431	   Email: paul.hoffman@icann.org