idnits 2.17.1

draft-ietf-cbor-7049bis-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 12, 2017) is 2572 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '2' on line 2041

  -- Looks like a reference, but probably isn't: '3' on line 2041

  -- Looks like a reference, but probably isn't: '4' on line 2039

  -- Looks like a reference, but probably isn't: '5' on line 2039

  -- Looks like a reference, but probably isn't: '1' on line 2318

  == Missing Reference: 'RFCthis' is mentioned on line 1683, but not defined

  == Missing Reference: 'TM' is mentioned on line 1858, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 2334

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECMA262'

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 7159
     (Obsoleted by RFC 8259)


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Intended status: Standards Track                              P. Hoffman
5	Expires: October 14, 2017                                          ICANN
6	                                                          April 12, 2017

8	              Concise Binary Object Representation (CBOR)
9	                     draft-ietf-cbor-7049bis-00

11	Abstract

13	   The Concise Binary Object Representation (CBOR) is a data format
14	   whose design goals include the possibility of extremely small code
15	   size, fairly small message size, and extensibility without the need
16	   for version negotiation.  These design goals make it different from
17	   earlier binary serializations such as ASN.1 and MessagePack.

19	Contributing

21	   This document is being worked on in the CBOR Working Group.  Please
22	   contribute on the mailing list there, or in the GitHub repository for
23	   this draft: https://github.com/cbor-wg/CBORbis

25	   The charter for the CBOR Working Group says that the WG will update
26	   RFC 7049 to fix verified errata.  Security issues and clarifications
27	   may be addressed, but changes to this document will ensure backward
28	   compatibility for popular deployed codebases.  This document will be
29	   targeted at becoming an Internet Standard.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at http://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on October 14, 2017.

48	Copyright Notice

50	   Copyright (c) 2017 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (http://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	     1.1.  Objectives
67	     1.2.  Terminology
68	   2.  Specification of the CBOR Encoding
69	     2.1.  Major Types
70	     2.2.  Indefinite Lengths for Some Major Types
71	       2.2.1.  Indefinite-Length Arrays and Maps
72	       2.2.2.  Indefinite-Length Byte Strings and Text Strings
73	     2.3.  Floating-Point Numbers and Values with No Content
74	     2.4.  Optional Tagging of Items
75	       2.4.1.  Date and Time
76	       2.4.2.  Bignums
77	       2.4.3.  Decimal Fractions and Bigfloats
78	       2.4.4.  Content Hints
79	         2.4.4.1.  Encoded CBOR Data Item
80	         2.4.4.2.  Expected Later Encoding for CBOR-to-JSON
81	                   Converters
82	         2.4.4.3.  Encoded Text
83	       2.4.5.  Self-Describe CBOR
84	   3.  Creating CBOR-Based Protocols
85	     3.1.  CBOR in Streaming Applications
86	     3.2.  Generic Encoders and Decoders
87	     3.3.  Syntax Errors
88	       3.3.1.  Incomplete CBOR Data Items
89	       3.3.2.  Malformed Indefinite-Length Items
90	       3.3.3.  Unknown Additional Information Values
91	     3.4.  Other Decoding Errors
92	     3.5.  Handling Unknown Simple Values and Tags
93	     3.6.  Numbers
94	     3.7.  Specifying Keys for Maps
95	     3.8.  Undefined Values
96	     3.9.  Canonical CBOR
97	     3.10. Strict Mode
98	   4.  Converting Data between CBOR and JSON
99	     4.1.  Converting from CBOR to JSON
100	     4.2.  Converting from JSON to CBOR
101	   5.  Future Evolution of CBOR
102	     5.1.  Extension Points
103	     5.2.  Curating the Additional Information Space
104	   6.  Diagnostic Notation
105	     6.1.  Encoding Indicators
106	   7.  IANA Considerations
107	     7.1.  Simple Values Registry
108	     7.2.  Tags Registry
109	     7.3.  Media Type ("MIME Type")
110	     7.4.  CoAP Content-Format
111	     7.5.  The +cbor Structured Syntax Suffix Registration
112	   8.  Security Considerations
113	   9.  Acknowledgements
114	   10. References
115	     10.1.  Normative References
116	     10.2.  Informative References
117	   Appendix A.  Examples
118	   Appendix B.  Jump Table
119	   Appendix C.  Pseudocode
120	   Appendix D.  Half-Precision
121	   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
122	                Objectives
123	     E.1.  ASN.1 DER, BER, and PER
124	     E.2.  MessagePack
125	     E.3.  BSON
126	     E.4.  UBJSON
127	     E.5.  MSDTP: RFC 713
128	     E.6.  Conciseness on the Wire
129	   Authors' Addresses

131	1.  Introduction

133	   There are hundreds of standardized formats for binary representation
134	   of structured data (also known as binary serialization formats).  Of
135	   those, some are for specific domains of information, while others are
136	   generalized for arbitrary data.  In the IETF, probably the best-known
137	   formats in the latter category are ASN.1's BER and DER [ASN.1].

139	   The format defined here follows some specific design goals that are
140	   not well met by current formats.  The underlying data model is an
141	   extended version of the JSON data model [RFC7159].  It is important
142	   to note that this is not a proposal that the grammar in RFC 7159 be
143	   extended in general, since doing so would cause a significant
144	   backwards incompatibility with already deployed JSON documents.
145	   Instead, this document simply defines its own data model that starts
146	   from JSON.

148	   Appendix E lists some existing binary formats and discusses how well
149	   they do or do not fit the design objectives of the Concise Binary
150	   Object Representation (CBOR).

152	1.1.  Objectives

154	   The objectives of CBOR, roughly in decreasing order of importance,
155	   are:

157	   1.  The representation must be able to unambiguously encode most
158	       common data formats used in Internet standards.

160	       *  It must represent a reasonable set of basic data types and
161	          structures using binary encoding.  "Reasonable" here is
162	          largely influenced by the capabilities of JSON, with the major
163	          addition of binary byte strings.  The structures supported are
164	          limited to arrays and trees; loops and lattice-style graphs
165	          are not supported.

167	       *  There is no requirement that all data formats be uniquely
168	          encoded; that is, it is acceptable that the number "7" might
169	          be encoded in multiple different ways.

171	   2.  The code for an encoder or decoder must be able to be compact in
172	       order to support systems with very limited memory, processor
173	       power, and instruction sets.

175	       *  An encoder and a decoder need to be implementable in a very
176	          small amount of code (for example, in class 1 constrained
177	          nodes as defined in [RFC7228]).

179	       *  The format should use contemporary machine representations of
180	          data (for example, not requiring binary-to-decimal
181	          conversion).

183	   3.  Data must be able to be decoded without a schema description.

185	       *  Similar to JSON, encoded data should be self-describing so
186	          that a generic decoder can be written.

188	   4.  The serialization must be reasonably compact, but data
189	       compactness is secondary to code compactness for the encoder and
190	       decoder.

192	       *  "Reasonable" here has JSON as an upper bound on size, while
193	          the need to limit implementation complexity sets a lower
194	          bound.  Using either general compression schemes or extensive
195	          bit-fiddling violates the complexity goals.

197	   5.  The format must be applicable to both constrained nodes and high-
198	       volume applications.

200	       *  This means it must be reasonably frugal in CPU usage for both
201	          encoding and decoding.  This is relevant both for constrained
202	          nodes and for potential usage in applications with a very high
203	          volume of data.

205	   6.  The format must support all JSON data types for conversion to and
206	       from JSON.

208	       *  It must support a reasonable level of conversion as long as
209	          the data represented is within the capabilities of JSON.  It
210	          must be possible to define a unidirectional mapping towards
211	          JSON for all types of data.

213	   7.  The format must be extensible, and the extended data must be
214	       decodable by earlier decoders.

216	       *  The format is designed for decades of use.

218	       *  The format must support a form of extensibility that allows
219	          fallback so that a decoder that does not understand an
220	          extension can still decode the message.

222	       *  The format must be able to be extended in the future by later
223	          IETF standards.

225	1.2.  Terminology

227	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
228	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
229	   document are to be interpreted as described in RFC 2119, BCP 14
230	   [RFC2119] and indicate requirement levels for compliant CBOR
231	   implementations.

233	   The term "byte" is used in its now-customary sense as a synonym for
234	   "octet".  All multi-byte values are encoded in network byte order
235	   (that is, most significant byte first, also known as "big-endian").

237	   This specification makes use of the following terminology:

239	   Data item:  A single piece of CBOR data.  The structure of a data
240	      item may contain zero, one, or more nested data items.  The term
241	      is used both for the data item in representation format and for
242	      the abstract idea that can be derived from that by a decoder.

244	   Decoder:  A process that decodes a CBOR data item and makes it
245	      available to an application.  Formally speaking, a decoder
246	      contains a parser to break up the input using the syntax rules of
247	      CBOR, as well as a semantic processor to prepare the data in a
248	      form suitable to the application.

250	   Encoder:  A process that generates the representation format of a
251	      CBOR data item from application information.

253	   Data Stream:  A sequence of zero or more data items, not further
254	      assembled into a larger containing data item.  The independent
255	      data items that make up a data stream are sometimes also referred
256	      to as "top-level data items".

258	   Well-formed:  A data item that follows the syntactic structure of
259	      CBOR.  A well-formed data item uses the initial bytes and the byte
260	      strings and/or data items that are implied by their values as
261	      defined in CBOR and is not followed by extraneous data.

263	   Valid:  A data item that is well-formed and also follows the semantic
264	      restrictions that apply to CBOR data items.

266	   Stream decoder:  A process that decodes a data stream and makes each
267	      of the data items in the sequence available to an application as
268	      they are received.

270	   Where bit arithmetic or data types are explained, this document uses
271	   the notation familiar from the programming language C, except that
272	   "**" denotes exponentiation.  Similar to the "0x" notation for
273	   hexadecimal numbers, numbers in binary notation are prefixed with
274	   "0b".  Underscores can be added to such a number solely for
275	   readability, so 0b00100001 (0x21) might be written 0b001_00001 to
276	   emphasize the desired interpretation of the bits in the byte; in this
277	   case, it is split into three bits and five bits.

279	2.  Specification of the CBOR Encoding

281	   A CBOR-encoded data item is structured and encoded as described in
282	   this section.  The encoding is summarized in Table 5.

284	   The initial byte of each data item contains both information about
285	   the major type (the high-order 3 bits, described in Section 2.1) and
286	   additional information (the low-order 5 bits).  When the value of the
287	   additional information is less than 24, it is directly used as a
288	   small unsigned integer.  When it is 24 to 27, the additional bytes
289	   for a variable-length integer immediately follow; the values 24 to 27
290	   of the additional information specify that the integer is a 1-, 2-,
291	   4-, or 8-byte unsigned integer, respectively.  Additional information
292	   value 31 is used for indefinite-length items, described in
293	   Section 2.2.  Additional information values 28 to 30 are reserved for
294	   future expansion.
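
   The structure of the initial byte described above can be sketched in
   Python as follows.  This is an illustration only, not part of the
   specification; the function name decode_head and its return shape
   are assumptions made for this example.

```python
def decode_head(data: bytes, pos: int = 0):
    """Split the initial byte at `pos` and read the integer that follows.

    Returns (major_type, additional_info, argument, next_pos).
    `argument` is None for additional information 31 (indefinite length).
    Hypothetical helper for illustration; not defined by the specification.
    """
    initial = data[pos]
    major_type = initial >> 5      # high-order 3 bits
    info = initial & 0x1F          # low-order 5 bits
    pos += 1
    if info < 24:
        # Values below 24 are used directly as a small unsigned integer.
        return major_type, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)    # 24..27 -> 1, 2, 4, or 8 bytes
        raw = data[pos:pos + size]
        if len(raw) < size:
            raise ValueError("input ends inside the initial bytes")
        # Multi-byte values are in network byte order (big-endian).
        return major_type, info, int.from_bytes(raw, "big"), pos + size
    if info == 31:
        return major_type, info, None, pos
    raise ValueError("additional information 28 to 30 is reserved")
```

   For instance, decode_head(bytes.fromhex("1901f4")) yields major type
   0, additional information 25, and the integer 500, having consumed
   three bytes.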

296	   In all additional information values, the resulting integer is
297	   interpreted depending on the major type.  It may represent the actual
298	   data: for example, in integer types, the resulting integer is used
299	   for the value itself.  It may instead supply length information: for
300	   example, in byte strings it gives the length of the byte string data
301	   that follows.

303	   A CBOR decoder implementation can be based on a jump table with all
304	   256 defined values for the initial byte (Table 5).  A decoder in a
305	   constrained implementation can instead use the structure of the
306	   initial byte and following bytes for more compact code (see
307	   Appendix C for a rough impression of how this could look).

309	2.1.  Major Types

311	   The following lists the major types and the additional information
312	   and other bytes associated with the type.

314	   Major type 0:  an unsigned integer.  The 5-bit additional information
315	      is either the integer itself (for additional information values 0
316	      through 23) or the length of additional data.  Additional
317	      information 24 means the value is represented in an additional
318	      uint8_t, 25 means a uint16_t, 26 means a uint32_t, and 27 means a
319	      uint64_t.  For example, the integer 10 is denoted as the one byte
320	      0b000_01010 (major type 0, additional information 10).  The
321	      integer 500 would be 0b000_11001 (major type 0, additional
322	      information 25) followed by the two bytes 0x01f4, which is 500 in
323	      decimal.

325	   Major type 1:  a negative integer.  The encoding follows the rules
326	      for unsigned integers (major type 0), except that the value is
327	      then -1 minus the encoded unsigned integer.  For example, the
328	      integer -500 would be 0b001_11001 (major type 1, additional
329	      information 25) followed by the two bytes 0x01f3, which is 499 in
330	      decimal.

332	   Major type 2:  a byte string.  The string's length in bytes is
333	      represented following the rules for unsigned integers (major type
334	      0).  For example, a byte string whose length is 5 would have an
335	      initial byte of 0b010_00101 (major type 2, additional information
336	      5 for the length), followed by 5 bytes of binary content.  A byte
337	      string whose length is 500 would have 3 initial bytes:
338	      0b010_11001 (major type 2, additional information 25 to indicate a
339	      two-byte length) and the two bytes 0x01f4 for a length of 500.
340	      These are followed by 500 bytes of binary content.

342	   Major type 3:  a text string, specifically a string of Unicode
343	      characters that is encoded as UTF-8 [RFC3629].  The format of this
344	      type is identical to that of byte strings (major type 2), that is,
345	      as with major type 2, the length gives the number of bytes.  This
346	      type is provided for systems that need to interpret or display
347	      human-readable text, and allows the differentiation between
348	      unstructured bytes and text that has a specified repertoire and
349	      encoding.  In contrast to formats such as JSON, the Unicode
350	      characters in this type are never escaped.  Thus, a newline
351	      character (U+000A) is always represented in a string as the byte
352	      0x0a, and never as the bytes 0x5c6e (the characters "\" and "n")
353	      or as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and
354	      "a").

356	   Major type 4:  an array of data items.  Arrays are also called lists,
357	      sequences, or tuples.  The array's length follows the rules for
358	      byte strings (major type 2), except that the length denotes the
359	      number of data items, not the length in bytes that the array takes
360	      up.  Items in an array do not need to all be of the same type.
361	      For example, an array that contains 10 items of any type would
362	      have an initial byte of 0b100_01010 (major type of 4, additional
363	      information of 10 for the length) followed by the 10 remaining
364	      items.

366	   Major type 5:  a map of pairs of data items.  Maps are also called
367	      tables, dictionaries, hashes, or objects (in JSON).  A map is
368	      made up of pairs of data items, each pair consisting of a key
369	      that is immediately followed by a value.  The map's length follows
370	      the rules for byte strings (major type 2), except that the length
371	      denotes the number of pairs, not the length in bytes that the map
372	      takes up.  For example, a map that contains 9 pairs would have an
373	      initial byte of 0b101_01001 (major type of 5, additional
374	      information of 9 for the number of pairs) followed by the 18
375	      remaining items.  The first item is the first key, the second item
376	      is the first value, the third item is the second key, and so on.
377	      A map that has duplicate keys may be well-formed, but it is not
378	      valid, and thus it causes indeterminate decoding; see also
379	      Section 3.7.

381	   Major type 6:  optional semantic tagging of other major types.  See
382	      Section 2.4.

384	   Major type 7:  floating-point numbers and simple data types that need
385	      no content, as well as the "break" stop code.  See Section 2.3.
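
   As an illustration of the integer rules for major types 0 and 1, the
   following Python sketch encodes an integer.  The names encode_head
   and encode_int are hypothetical, and always choosing the shortest
   form is a choice made for this example; other encodings of the same
   integer are also acceptable (see Section 1.1).

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Build the initial byte(s) for `major_type` carrying `argument`.

    Hypothetical helper: always picks the shortest additional information.
    """
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be 0..7")
    if not 0 <= argument < 2**64:
        raise ValueError("argument does not fit in 8 bytes")
    if argument < 24:
        # The value fits directly in the 5-bit additional information.
        return bytes([major_type << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            # Following bytes are in network byte order (big-endian).
            return bytes([major_type << 5 | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: range checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 (unsigned) or 1 (negative)."""
    if value >= 0:
        return encode_head(0, value)
    # A negative integer n is carried as the unsigned integer -1 - n.
    return encode_head(1, -1 - value)
```

   With these definitions, encode_int(500) produces the bytes 0x1901f4
   and encode_int(-500) produces 0x3901f3, matching the examples given
   for major types 0 and 1.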

387	   These eight major types lead to a simple table showing which of the
388	   256 possible values for the initial byte of a data item are used
389	   (Table 5).

391	   In major types 6 and 7, many of the possible values are reserved for
392	   future specification.  See Section 7 for more information on these
393	   values.
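
   The length rules for major types 2 through 5 can be illustrated with
   a small Python encoder.  This is a sketch under stated assumptions:
   the names encode_head and encode are hypothetical, only
   definite-length items are produced, and the mapping from Python
   types to major types is chosen for this example only.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Initial byte(s) for `major_type` with the shortest form of `argument`."""
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 8 bytes")


def encode(item) -> bytes:
    """Encode ints, bytes, str, list, and dict as definite-length CBOR."""
    if isinstance(item, bool):
        raise TypeError("simple values are outside this sketch")
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item          # length in bytes
    if isinstance(item, str):
        utf8 = item.encode("utf-8")                      # never escaped
        return encode_head(3, len(utf8)) + utf8          # length in bytes
    if isinstance(item, list):
        # The length counts data items, not bytes.
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        # The length counts pairs; each key is followed by its value.
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body
    raise TypeError(f"unsupported type: {type(item).__name__}")
```

   Note that encode("\n") gives the two bytes 0x610a: the newline is
   carried as the single byte 0x0a and is not escaped.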

395	2.2.  Indefinite Lengths for Some Major Types

397	   Four CBOR items (arrays, maps, byte strings, and text strings) can be
398	   encoded with an indefinite length using additional information value
399	   31.  This is useful if the encoding of the item needs to begin before
400	   the number of items inside the array or map, or the total length of
401	   the string, is known.  (The application of this is often referred to
402	   as "streaming" within a data item.)

404	   Indefinite-length arrays and maps are dealt with differently than
405	   indefinite-length byte strings and text strings.

407	2.2.1.  Indefinite-Length Arrays and Maps

409	   Indefinite-length arrays and maps are simply opened without
410	   indicating the number of data items that will be included in the
411	   array or map, using the additional information value of 31.  The
412	   initial major type and additional information byte is followed by the
413	   elements of the array or map, just as they would be in other arrays
414	   or maps.  The end of the array or map is indicated by encoding a
415	   "break" stop code in a place where the next data item would normally
416	   have been included.  The "break" is encoded with major type 7 and
417	   additional information value 31 (0b111_11111) but is not itself a
418	   data item: it is just a syntactic feature to close the array or map.
419	   That is, the "break" stop code comes after the last item in the array
420	   or map, and it cannot occur anywhere else in place of a data item.
421	   In this way, indefinite-length arrays and maps look identical to
422	   other arrays and maps except for beginning with the additional
423	   information value 31 and ending with the "break" stop code.

425	   Arrays and maps with indefinite lengths allow any number of items
426	   (for arrays) and key/value pairs (for maps) to be given before the
427	   "break" stop code.  There is no restriction against nesting
428	   indefinite-length array or map items.  A "break" only terminates a
429	   single item, so nested indefinite-length items need exactly as many
430	   "break" stop codes as there are type bytes starting an indefinite-
431	   length item.

433	   For example, assume an encoder wants to represent the abstract array
434	   [1, [2, 3], [4, 5]].  The definite-length encoding would be
435	   0x8301820203820405:

437	   83        -- Array of length 3
438	      01     -- 1
439	      82     -- Array of length 2
440	         02  -- 2
441	         03  -- 3
442	      82     -- Array of length 2
443	         04  -- 4
444	         05  -- 5

446	   Indefinite-length encoding could be applied independently to each of
447	   the three arrays encoded in this data item, as required, leading to
448	   representations such as:

450	   0x9f018202039f0405ffff
451	   9F        -- Start indefinite-length array
452	      01     -- 1
453	      82     -- Array of length 2
454	         02  -- 2
455	         03  -- 3
456	      9F     -- Start indefinite-length array
457	         04  -- 4
458	         05  -- 5
459	         FF  -- "break" (inner array)
460	      FF     -- "break" (outer array)

462	   0x9f01820203820405ff
463	   9F        -- Start indefinite-length array
464	      01     -- 1
465	      82     -- Array of length 2
466	         02  -- 2
467	         03  -- 3
468	      82     -- Array of length 2
469	         04  -- 4
470	         05  -- 5
471	      FF     -- "break"

473	   0x83018202039f0405ff
474	   83        -- Array of length 3
475	      01     -- 1
476	      82     -- Array of length 2
477	         02  -- 2
478	         03  -- 3
479	      9F     -- Start indefinite-length array
480	         04  -- 4
481	         05  -- 5
482	         FF  -- "break"

484	   0x83019f0203ff820405
485	   83        -- Array of length 3
486	      01     -- 1
487	      9F     -- Start indefinite-length array
488	         02  -- 2
489	         03  -- 3
490	         FF  -- "break"
491	      82     -- Array of length 2
492	         04  -- 4
493	         05  -- 5

495	   An example of an indefinite-length map (that happens to have two key/
496	   value pairs) might be:

498	   0xbf6346756ef563416d7421ff
499	   BF           -- Start indefinite-length map
500	      63        -- First key, UTF-8 string length 3
501	         46756e --   "Fun"
502	      F5        -- First value, true
503	      63        -- Second key, UTF-8 string length 3
504	         416d74 --   "Amt"
505	      21        -- Second value, -2
506	      FF        -- "break"
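
   The handling of the "break" stop code in the examples above can be
   sketched as a small recursive decoder in Python.  This is an
   illustration, not a complete decoder: the names decode_item and
   BREAK are hypothetical, the input is assumed to be complete, and
   only integers, text strings, arrays, maps, and the simple values
   false and true are handled.

```python
BREAK = object()  # sentinel: the "break" stop code is not a data item


def decode_item(data: bytes, pos: int = 0, allow_break: bool = False):
    """Decode one item at `pos`; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        if not allow_break:
            raise ValueError('"break" in place of a data item')
        return BREAK, pos
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    elif info == 31 and major in (4, 5):
        arg = None                      # indefinite-length array or map
    else:
        raise ValueError("additional information not handled in this sketch")
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major == 3:
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:
        items = []
        while arg is None or len(items) < arg:
            item, pos = decode_item(data, pos, allow_break=arg is None)
            if item is BREAK:
                break                   # closes only this array
            items.append(item)
        return items, pos
    if major == 5:
        result, pairs = {}, 0
        while arg is None or pairs < arg:
            key, pos = decode_item(data, pos, allow_break=arg is None)
            if key is BREAK:
                break                   # closes only this map
            value, pos = decode_item(data, pos)  # no "break" between key and value
            result[key] = value
            pairs += 1
        return result, pos
    if major == 7 and info in (20, 21):
        return info == 21, pos
    raise ValueError("major type not handled in this sketch")
```

   Each "break" is consumed by the innermost open indefinite-length
   item, so all five array encodings shown above decode to the same
   nested array.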

508	2.2.2.  Indefinite-Length Byte Strings and Text Strings

510	   Indefinite-length byte strings and text strings are actually a
511	   concatenation of zero or more definite-length byte or text strings
512	   ("chunks") that are together treated as one contiguous string.
513	   Indefinite-length strings are opened with the major type and
514	   additional information value of 31, but what follows are a series of
515	   byte or text strings that have definite lengths (the chunks).  The
516	   end of the series of chunks is indicated by encoding the "break" stop
517	   code (0b111_11111) in a place where the next chunk in the series
518	   would occur.  The contents of the chunks are concatenated together,
519	   and the overall length of the indefinite-length string will be the
520	   sum of the lengths of all of the chunks.  In summary, an indefinite-
521	   length string is encoded similarly to how an indefinite-length array
522	   of its chunks would be encoded, except that the major type of the
523	   indefinite-length string is that of a (text or byte) string and
524	   matches the major types of its chunks.

526	   For indefinite-length byte strings, every data item (chunk) between
527	   the indefinite-length indicator and the "break" MUST be a definite-
528	   length byte string item; if the parser sees any item type other than
529	   a byte string before it sees the "break", it is an error.

531	   For example, assume the sequence:

533	   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

535	   5F              -- Start indefinite-length byte string
536	      44           -- Byte string of length 4
537	         aabbccdd  -- Bytes content
538	      43           -- Byte string of length 3
539	         eeff99    -- Bytes content
540	      FF           -- "break"

542	   After decoding, this results in a single byte string with seven
543	   bytes: 0xaabbccddeeff99.
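
   The concatenation of chunks can be sketched in Python as follows.
   The function name decode_chunked_bytes is hypothetical, and the
   sketch assumes that the input starts with the indefinite-length byte
   string indicator (0x5f).

```python
def decode_chunked_bytes(data: bytes) -> bytes:
    """Decode an indefinite-length byte string into one contiguous string."""
    if not data or data[0] != 0b010_11111:
        raise ValueError("not an indefinite-length byte string")
    pos = 1
    chunks = []
    while True:
        if pos >= len(data):
            raise ValueError('input ends before the "break"')
        initial = data[pos]
        pos += 1
        if initial == 0b111_11111:      # "break" ends the series of chunks
            break
        major, info = initial >> 5, initial & 0x1F
        if major != 2 or info == 31:
            raise ValueError("chunk is not a definite-length byte string")
        if info < 24:
            length = info
        elif info <= 27:
            size = 1 << (info - 24)
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("additional information 28 to 30 is reserved")
        if pos + length > len(data):
            raise ValueError("input ends inside a chunk")
        chunks.append(data[pos:pos + length])
        pos += length
    # The overall length is the sum of the chunk lengths.
    return b"".join(chunks)
```

   Applied to the sequence above, the function returns the seven bytes
   0xaabbccddeeff99; a text string chunk in the same position is
   rejected as an error.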

545	   Text strings with indefinite lengths act the same as byte strings
546	   with indefinite lengths, except that all their chunks MUST be
547	   definite-length text strings.  Note that this implies that the bytes
548	   of a single UTF-8 character cannot be spread between chunks: a new
549	   chunk can only be started at a character boundary.
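
   One way for an encoder to respect the character boundary rule is to
   split the text by characters rather than by bytes.  The following
   Python sketch does this; the name encode_text_chunks and the
   max_chunk_bytes parameter are hypothetical, and the chunk size is
   restricted to 4 through 23 bytes so that each chunk needs only a
   single initial byte.

```python
def encode_text_chunks(text: str, max_chunk_bytes: int) -> bytes:
    """Encode `text` as an indefinite-length text string of small chunks."""
    # A UTF-8 character takes at most 4 bytes; lengths up to 23 fit in
    # the additional information of the chunk's initial byte.
    if not 4 <= max_chunk_bytes <= 23:
        raise ValueError("max_chunk_bytes must be 4..23 in this sketch")
    out = bytearray([0b011_11111])      # major type 3, additional information 31
    chunk = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(chunk) + len(encoded) > max_chunk_bytes:
            # Close the chunk before this character rather than splitting it.
            out.append(0b011_00000 | len(chunk))
            out += chunk
            chunk = bytearray()
        chunk += encoded
    if chunk:
        out.append(0b011_00000 | len(chunk))
        out += chunk
    out.append(0b111_11111)             # "break"
    return bytes(out)
```

   For example, with a limit of 4 bytes the text "aaa" followed by
   U+00E9 is emitted as a 3-byte chunk and a 2-byte chunk, because the
   two-byte character would not fit whole in the first chunk.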

551	2.3.  Floating-Point Numbers and Values with No Content

553	   Major type 7 is for two types of data: floating-point numbers and
554	   "simple values" that do not need any content.  Each value of the
555	   5-bit additional information in the initial byte has its own separate
556	   meaning, as defined in Table 1.  Like the major types for integers,
557	   items of this major type do not carry content data; all the
558	   information is in the initial bytes.

560	    +-------------+--------------------------------------------------+
561	    | 5-Bit Value | Semantics                                        |
562	    +-------------+--------------------------------------------------+
563	    | 0..23       | Simple value (value 0..23)                       |
564	    |             |                                                  |
565	    | 24          | Simple value (value 32..255 in following byte)   |
566	    |             |                                                  |
567	    | 25          | IEEE 754 Half-Precision Float (16 bits follow)   |
568	    |             |                                                  |
569	    | 26          | IEEE 754 Single-Precision Float (32 bits follow) |
570	    |             |                                                  |
571	    | 27          | IEEE 754 Double-Precision Float (64 bits follow) |
572	    |             |                                                  |
573	    | 28..30      | (Unassigned)                                     |
574	    |             |                                                  |
575	    | 31          | "break" stop code for indefinite-length items    |
576	    +-------------+--------------------------------------------------+

578	        Table 1: Values for Additional Information in Major Type 7

580	   As with all other major types, the 5-bit value 24 signifies a single-
581	   byte extension: it is followed by an additional byte to represent the
582	   simple value.  (To minimize confusion, only the values 32 to 255 are
583	   used.)  This maintains the structure of the initial bytes: as for the
584	   other major types, the length of these always depends on the
585	   additional information in the first byte.  Table 2 lists the values
586	   assigned and available for simple types.

588	                       +---------+-----------------+
589	                       | Value   | Semantics       |
590	                       +---------+-----------------+
591	                       | 0..19   | (Unassigned)    |
592	                       |         |                 |
593	                       | 20      | False           |
594	                       |         |                 |
595	                       | 21      | True            |
596	                       |         |                 |
597	                       | 22      | Null            |
598	                       |         |                 |
599	                       | 23      | Undefined value |
600	                       |         |                 |
601	                       | 24..31  | (Reserved)      |
602	                       |         |                 |
603	                       | 32..255 | (Unassigned)    |
604	                       +---------+-----------------+

606	                          Table 2: Simple Values

608	   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
609	   IEEE 754 binary floating-point values.  These floating-point values
610	   are encoded in the additional bytes of the appropriate size.  (See
611	   Appendix D for some information about 16-bit floating point.)
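
   As an illustration of Tables 1 and 2, the following Python sketch
   decodes a single item of major type 7.  The function name
   decode_major7 and the ("simple", n) tuple used for simple values
   without a Python equivalent are assumptions of this example.  The
   standard struct module provides the three IEEE 754 formats.

```python
import struct

# Simple values with a direct Python equivalent (Table 2).
SIMPLE = {20: False, 21: True, 22: None}
FLOAT_FORMATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}


def decode_major7(data: bytes):
    """Decode one major type 7 item found at the start of data."""
    head = data[0]
    if head >> 5 != 7:
        raise ValueError("not major type 7")
    info = head & 0x1F
    if info < 24:
        # 23 ("undefined") and unassigned values are returned as a tuple
        return SIMPLE.get(info, ("simple", info))
    if info == 24:
        value = data[1]
        if value < 32:
            raise ValueError("only 32..255 are used with the extension byte")
        return ("simple", value)
    if info in FLOAT_FORMATS:
        fmt, size = FLOAT_FORMATS[info]
        return struct.unpack(fmt, data[1:1 + size])[0]
    if info == 31:
        raise ValueError('"break" stop code is not a value')
    raise ValueError("unassigned additional information")


half = decode_major7(bytes.fromhex("f93e00"))    # 1.5 as a 16-bit float
null = decode_major7(bytes.fromhex("f6"))        # simple value 22
```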

613	2.4.  Optional Tagging of Items

615	   In CBOR, a data item can optionally be preceded by a tag to give it
616	   additional semantics while retaining its structure.  The tag is major
617	   type 6, and represents an integer number as indicated by the tag's
618	   integer value; the (sole) data item is carried as content data.  If a
619	   tag requires structured data, this structure is encoded into the
620	   nested data item.  The definition of a tag usually restricts what
621	   kinds of nested data item or items can be carried by a tag.

623	   The initial bytes of the tag follow the rules for positive integers
624	   (major type 0).  The tag is followed by a single data item of any
625	   type.  For example, assume that a byte string of length 12 is marked
626	   with a tag to indicate it is a positive bignum (Section 2.4.2).  This
627	   would be marked as 0b110_00010 (major type 6, additional information
628	   2 for the tag) followed by 0b010_01100 (major type 2, additional
629	   information of 12 for the length) followed by the 12 bytes of the
630	   bignum.
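
   The byte layout in this example can be reproduced with a short
   Python sketch.  The helper names encode_head and tagged_byte_string
   are hypothetical, and the 12 payload bytes are arbitrary sample
   data.

```python
def encode_head(major: int, value: int) -> bytes:
    """Encode the initial bytes for major types 0 to 6 with the given value."""
    if not 0 <= major <= 6:
        raise ValueError("this sketch covers major types 0 to 6 only")
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    if value < 24:
        return bytes([major << 5 | value])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * width):
            return bytes([major << 5 | info]) + value.to_bytes(width, "big")
    raise AssertionError("unreachable")


def tagged_byte_string(tag: int, payload: bytes) -> bytes:
    """Encode a tag followed by a definite-length byte string."""
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload


bignum = tagged_byte_string(2, bytes(range(1, 13)))  # 12 sample bytes
```

   Here bignum starts with 0xc2 (0b110_00010) and 0x4c (0b010_01100),
   followed by the 12 content bytes.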

632	   Decoders do not need to understand tags, and thus tags may be of
633	   little value in applications where the implementation creating a
634	   particular CBOR data item and the implementation decoding that stream
635	   know the semantic meaning of each item in the data flow.  Their
636	   primary purpose in this specification is to define common data types
637	   such as dates.  A secondary purpose is to allow optional tagging when
638	   the decoder is a generic CBOR decoder that might be able to benefit
639	   from hints about the content of items.  Understanding the semantic
640	   tags is optional for a decoder; it can just jump over the initial
641	   bytes of the tag and interpret the tagged data item itself.

643	   A tag always applies to the data item that directly follows it.
644	   Thus, if tag A is followed by tag B, which is followed by data item
645	   C, tag A applies to the result of applying tag B on data item C.
646	   That is, a tagged item is a data item consisting of a tag and a
647	   value.  The content of the tagged item is the data item (the value)
648	   that is being tagged.

650	   IANA maintains a registry of tag values as described in Section 7.2.
651	   Table 3 provides a list of initial values, with definitions in the
652	   rest of this section.

654	   +-----------+--------------+----------------------------------------+
655	   | Tag       | Data Item    | Semantics                              |
656	   +-----------+--------------+----------------------------------------+
657	   | 0         | UTF-8 string | Standard date/time string; see         |
658	   |           |              | Section 2.4.1                          |
659	   |           |              |                                        |
660	   | 1         | multiple     | Epoch-based date/time; see             |
661	   |           |              | Section 2.4.1                          |
662	   |           |              |                                        |
663	   | 2         | byte string  | Positive bignum; see Section 2.4.2     |
664	   |           |              |                                        |
665	   | 3         | byte string  | Negative bignum; see Section 2.4.2     |
666	   |           |              |                                        |
667	   | 4         | array        | Decimal fraction; see Section 2.4.3    |
668	   |           |              |                                        |
669	   | 5         | array        | Bigfloat; see Section 2.4.3            |
670	   |           |              |                                        |
671	   | 6..20     | (Unassigned) | (Unassigned)                           |
672	   |           |              |                                        |
673	   | 21        | multiple     | Expected conversion to base64url       |
674	   |           |              | encoding; see Section 2.4.4.2          |
675	   |           |              |                                        |
676	   | 22        | multiple     | Expected conversion to base64          |
677	   |           |              | encoding; see Section 2.4.4.2          |
678	   |           |              |                                        |
679	   | 23        | multiple     | Expected conversion to base16          |
680	   |           |              | encoding; see Section 2.4.4.2          |
681	   |           |              |                                        |
682	   | 24        | byte string  | Encoded CBOR data item; see            |
683	   |           |              | Section 2.4.4.1                        |
684	   |           |              |                                        |
685	   | 25..31    | (Unassigned) | (Unassigned)                           |
686	   |           |              |                                        |
687	   | 32        | UTF-8 string | URI; see Section 2.4.4.3               |
688	   |           |              |                                        |
689	   | 33        | UTF-8 string | base64url; see Section 2.4.4.3         |
690	   |           |              |                                        |
691	   | 34        | UTF-8 string | base64; see Section 2.4.4.3            |
692	   |           |              |                                        |
693	   | 35        | UTF-8 string | Regular expression; see                |
694	   |           |              | Section 2.4.4.3                        |
695	   |           |              |                                        |
696	   | 36        | UTF-8 string | MIME message; see Section 2.4.4.3      |
697	   |           |              |                                        |
698	   | 37..55798 | (Unassigned) | (Unassigned)                           |
699	   |           |              |                                        |
700	   | 55799     | multiple     | Self-describe CBOR; see Section 2.4.5  |
701	   |           |              |                                        |
702	   | 55800+    | (Unassigned) | (Unassigned)                           |
703	   +-----------+--------------+----------------------------------------+
704	                         Table 3: Values for Tags

706	2.4.1.  Date and Time

708	   Tag value 0 is for date/time strings that follow the standard format
709	   described in [RFC3339], as refined by Section 3.3 of [RFC4287].

711	   Tag value 1 is for numerical representation of seconds relative to
712	   1970-01-01T00:00Z in UTC time.  (For the non-negative values that the
713	   Portable Operating System Interface (POSIX) defines, the number of
714	   seconds is counted in the same way as for POSIX "seconds since the
715	   epoch" [TIME_T].)  The tagged item can be a positive or negative
716	   integer (major types 0 and 1), or a floating-point number (major type
717	   7 with additional information 25, 26, or 27).  Note that the number
718	   can be negative (time before 1970-01-01T00:00Z) and, if a floating-
719	   point number, indicate fractional seconds.
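
   A decoder that understands tag 1 might convert the tagged number
   into a native timestamp along the following lines.  This Python
   sketch is illustrative only; the name decode_epoch_time is an
   assumption, and values outside the range of datetime raise an
   OverflowError.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FLOAT_FORMATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}


def decode_epoch_time(data: bytes) -> datetime:
    """Decode tag 1 followed by an integer or floating-point number."""
    if len(data) < 2 or data[0] != 0xC1:
        raise ValueError("expected tag 1 (0xc1) followed by a data item")
    major, info = data[1] >> 5, data[1] & 0x1F
    body = data[2:]
    if major in (0, 1) and info <= 27:
        width = 0 if info < 24 else 1 << (info - 24)
        if len(body) < width:
            raise ValueError("truncated integer")
        n = info if info < 24 else int.from_bytes(body[:width], "big")
        seconds = n if major == 0 else -1 - n  # major type 1 is -1 - n
    elif major == 7 and info in FLOAT_FORMATS:
        fmt, size = FLOAT_FORMATS[info]
        if len(body) < size:
            raise ValueError("truncated floating-point number")
        seconds = struct.unpack(fmt, body[:size])[0]
    else:
        raise ValueError("tag 1 needs an integer or a floating-point number")
    return EPOCH + timedelta(seconds=seconds)


when = decode_epoch_time(bytes.fromhex("c11a514b67b0"))  # 1363896240 seconds
```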

721	2.4.2.  Bignums

723	   Bignums are integers that do not fit into the basic integer
724	   representations provided by major types 0 and 1.  They are encoded as
725	   a byte string data item, which is interpreted as an unsigned integer
726	   n in network byte order.  For tag value 2, the value of the bignum is
727	   n.  For tag value 3, the value of the bignum is -1 - n.  Decoders
728	   that understand these tags MUST be able to decode bignums that have
729	   leading zeroes.

731	   For example, the number 18446744073709551616 (2**64) is represented
732	   as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major
733	   type 2, length 9), followed by 0x010000000000000000 (one byte 0x01
734	   and eight bytes 0x00).  In hexadecimal:

736	   C2                        -- Tag 2
737	      49                     -- Byte string of length 9
738	         010000000000000000  -- Bytes content
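
   The bignum rules map directly onto arbitrary-precision integers.
   The following Python sketch uses the hypothetical names
   encode_bignum and decode_bignum and, to stay short, only handles
   byte strings shorter than 24 bytes.

```python
def encode_bignum(value: int) -> bytes:
    """Encode value as tag 2 (non-negative) or tag 3 (negative, n = -1 - value)."""
    tag, n = (2, value) if value >= 0 else (3, -1 - value)
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")  # network byte order
    if len(payload) >= 24:
        raise ValueError("this sketch only handles payloads below 24 bytes")
    return bytes([0xC0 | tag, 0x40 | len(payload)]) + payload


def decode_bignum(data: bytes) -> int:
    """Decode a tag 2 or tag 3 bignum with a short definite-length byte string."""
    if len(data) < 2 or data[0] not in (0xC2, 0xC3):
        raise ValueError("expected tag 2 or tag 3")
    if data[1] >> 5 != 2 or data[1] & 0x1F >= 24:
        raise ValueError("expected a byte string shorter than 24 bytes")
    length = data[1] & 0x1F
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated byte string")
    n = int.from_bytes(payload, "big")  # leading zero bytes do not change n
    return n if data[0] == 0xC2 else -1 - n


two_to_64 = encode_bignum(2**64)
```

   The value of two_to_64 is the sequence shown above, C2 49 followed
   by 0x010000000000000000.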

740	2.4.3.  Decimal Fractions and Bigfloats

742	   Decimal fractions combine an integer mantissa with a base-10 scaling
743	   factor.  They are most useful if an application needs the exact
744	   representation of a decimal fraction such as 1.1 because there is no
745	   exact representation for many decimal fractions in binary floating
746	   point.

748	   Bigfloats combine an integer mantissa with a base-2 scaling factor.
749	   They are binary floating-point values that can exceed the range or
750	   the precision of the three IEEE 754 formats supported by CBOR
751	   (Section 2.3).  Bigfloats may also be used by constrained
752	   applications that need some basic binary floating-point capability
753	   without the need for supporting IEEE 754.

755	   A decimal fraction or a bigfloat is represented as a tagged array
756	   that contains exactly two integer numbers: an exponent e and a
757	   mantissa m.  Decimal fractions (tag 4) use base-10 exponents; the
758	   value of a decimal fraction data item is m*(10**e).  Bigfloats (tag
759	   5) use base-2 exponents; the value of a bigfloat data item is
760	   m*(2**e).  The exponent e MUST be represented in an integer of major
761	   type 0 or 1, while the mantissa also can be a bignum (Section 2.4.2).

763	   An example of a decimal fraction is that the number 273.15 could be
764	   represented as 0b110_00100 (major type of 6 for the tag, additional
765	   information of 4 for the type of tag), followed by 0b100_00010 (major
766	   type of 4 for the array, additional information of 2 for the length
767	   of the array), followed by 0b001_00001 (major type of 1 for the first
768	   integer, additional information of 1 for the value of -2), followed
769	   by 0b000_11001 (major type of 0 for the second integer, additional
770	   information of 25 for a two-byte value), followed by
771	   0b0110101010110011 (27315 in two bytes).  In hexadecimal:

773	   C4             -- Tag 4
774	      82          -- Array of length 2
775	         21       -- -2
776	         19 6ab3  -- 27315

778	   An example of a bigfloat is that the number 1.5 could be represented
779	   as 0b110_00101 (major type of 6 for the tag, additional information
780	   of 5 for the type of tag), followed by 0b100_00010 (major type of 4
781	   for the array, additional information of 2 for the length of the
782	   array), followed by 0b001_00000 (major type of 1 for the first
783	   integer, additional information of 0 for the value of -1), followed
784	   by 0b000_00011 (major type of 0 for the second integer, additional
785	   information of 3 for the value of 3).  In hexadecimal:

787	   C5             -- Tag 5
788	      82          -- Array of length 2
789	         20       -- -1
790	         03       -- 3
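
   The values of the two examples can be checked with exact rational
   arithmetic.  This Python sketch uses the standard fractions module;
   the function names decimal_fraction and bigfloat are chosen for this
   illustration only.

```python
from fractions import Fraction


def decimal_fraction(e: int, m: int) -> Fraction:
    """Value of a tag 4 item [e, m]: m * (10 ** e), computed exactly."""
    return m * Fraction(10) ** e


def bigfloat(e: int, m: int) -> Fraction:
    """Value of a tag 5 item [e, m]: m * (2 ** e), computed exactly."""
    return m * Fraction(2) ** e


temperature = decimal_fraction(-2, 27315)  # C4 82 21 19 6ab3
one_and_a_half = bigfloat(-1, 3)           # C5 82 20 03
```

   Here temperature equals 27315/100 (273.15) and one_and_a_half
   equals 3/2 (1.5), with no binary rounding involved.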

792	   Decimal fractions and bigfloats provide no representation of
793	   Infinity, -Infinity, or NaN; if these are needed in place of a
794	   decimal fraction or bigfloat, the IEEE 754 half-precision
795	   representations from Section 2.3 can be used.  For constrained
796	   applications, where there is a choice between representing a specific
797	   number as an integer and as a decimal fraction or bigfloat (such as
798	   when the exponent is small and non-negative), there is a quality-of-
799	   implementation expectation that the integer representation is used
800	   directly.

802	2.4.4.  Content Hints

804	   The tags in this section are for content hints that might be used by
805	   generic CBOR processors.

807	2.4.4.1.  Encoded CBOR Data Item

809	   Sometimes it is beneficial to carry an embedded CBOR data item that
810	   is not meant to be decoded immediately at the time the enclosing data
811	   item is being parsed.  Tag 24 (CBOR data item) can be used to tag the
812	   embedded byte string as a data item encoded in CBOR format.

814	2.4.4.2.  Expected Later Encoding for CBOR-to-JSON Converters

816	   Tags 21 to 23 indicate that a byte string might require a specific
817	   encoding when interoperating with a text-based representation.  These
818	   tags are useful when an encoder knows that the byte string data it is
819	   writing is likely to be later converted to a particular JSON-based
820	   usage.  That usage specifies that some strings are encoded as base64,
821	   base64url, and so on.  The encoder uses byte strings instead of doing
822	   the encoding itself to reduce the message size, to reduce the code
823	   size of the encoder, or both.  The encoder does not know whether or
824	   not the converter will be generic, and therefore wants to say what it
825	   believes is the proper way to convert binary strings to JSON.

827	   The data item tagged can be a byte string or any other data item.  In
828	   the latter case, the tag applies to all of the byte string data items
829	   contained in the data item, except for those contained in a nested
830	   data item tagged with an expected conversion.

832	   These three tag types suggest conversions to three of the base data
833	   encodings defined in [RFC4648].  For base64url encoding, padding is
834	   not used (see Section 3.2 of RFC 4648); that is, all trailing equals
835	   signs ("=") are removed from the base64url-encoded string.  Later
836	   tags might be defined for other data encodings of RFC 4648 or for
837	   other ways to encode binary data in strings.
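
   A generic CBOR-to-JSON converter might apply these hints as
   sketched below in Python, using the standard base64 module.  The
   function name expected_text is an assumption of this example; note
   that b16encode produces uppercase hexadecimal digits.

```python
import base64


def expected_text(tag: int, data: bytes) -> str:
    """Convert a byte string to text as suggested by tag 21, 22, or 23."""
    if tag == 21:
        # base64url without padding: trailing "=" characters are removed
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag == 22:
        return base64.b64encode(data).decode("ascii")
    if tag == 23:
        return base64.b16encode(data).decode("ascii")
    raise ValueError("not an expected-conversion tag")


raw = bytes.fromhex("01020304")
as_base64url = expected_text(21, raw)
as_base64 = expected_text(22, raw)
as_base16 = expected_text(23, raw)
```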

839	2.4.4.3.  Encoded Text

841	   Some text strings hold data that have formats widely used on the
842	   Internet, and sometimes those formats can be validated and presented
843	   to the application in appropriate form by the decoder.  There are
844	   tags for some of these formats.

846	   o  Tag 32 is for URIs, as defined in [RFC3986];

847	   o  Tags 33 and 34 are for base64url- and base64-encoded text strings,
848	      as defined in [RFC4648];

850	   o  Tag 35 is for regular expressions in Perl Compatible Regular
851	      Expressions (PCRE) / JavaScript syntax [ECMA262];

853	   o  Tag 36 is for MIME messages (including all headers), as defined in
854	      [RFC2045].

856	   Note that tags 33 and 34 differ from 21 and 22 in that the data is
857	   transported in base-encoded form for the former and in raw byte
858	   string form for the latter.

860	2.4.5.  Self-Describe CBOR

862	   In many applications, it will be clear from the context that CBOR is
863	   being employed for encoding a data item.  For instance, a specific
864	   protocol might specify the use of CBOR, or a media type is indicated
865	   that specifies its use.  However, there may be applications where
866	   such context information is not available, such as when CBOR data is
867	   stored in a file and disambiguating metadata is not in use.  Here, it
868	   may help to have some distinguishing characteristics for the data
869	   itself.

871	   Tag 55799 is defined for this purpose.  It does not impart any
872	   special semantics on the data item that follows; that is, the
873	   semantics of a data item tagged with tag 55799 is exactly identical
874	   to the semantics of the data item itself.

876	   The serialization of this tag is 0xd9d9f7, which appears not to be in
877	   use as a distinguishing mark for frequently used file types.  In
878	   particular, it is not a valid start of a Unicode text in any Unicode
879	   encoding if followed by a valid CBOR data item.

881	   For instance, a decoder might be able to parse both CBOR and JSON.
882	   Such a decoder would need to mechanically distinguish the two
883	   formats.  An easy way for an encoder to help the decoder would be to
884	   tag the entire CBOR item with tag 55799, the serialization of which
885	   will never be found at the beginning of a JSON text.
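
   A decoder that handles both formats might test for the tag as in
   the following Python sketch.  The function names are hypothetical,
   and the check is only a hint: data that lacks the prefix may still
   be CBOR.

```python
SELF_DESCRIBE = bytes.fromhex("d9d9f7")  # serialization of tag 55799


def add_self_describe(item: bytes) -> bytes:
    """Prefix an encoded CBOR data item with tag 55799."""
    return SELF_DESCRIBE + item


def has_self_describe(data: bytes) -> bool:
    """Return True if data starts with the serialization of tag 55799."""
    return data.startswith(SELF_DESCRIBE)


def strip_self_describe(data: bytes) -> bytes:
    """Remove the tag; the semantics of the tagged item are unchanged."""
    return data[len(SELF_DESCRIBE):] if has_self_describe(data) else data


marked = add_self_describe(bytes.fromhex("8301020 3".replace(" ", "")))
```

   In this example, marked is 0xd9d9f783010203, that is, the array
   [1, 2, 3] preceded by the tag.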

887	3.  Creating CBOR-Based Protocols

889	   Data formats such as CBOR are often used in environments where there
890	   is no format negotiation.  A specific design goal of CBOR is to not
891	   need any included or assumed schema: a decoder can take a CBOR item
892	   and decode it with no other knowledge.

894	   Of course, in real-world implementations, the encoder and the decoder
895	   will have a shared view of what should be in a CBOR data item.  For
896	   example, an agreed-to format might be "the item is an array whose
897	   first value is a UTF-8 string, second value is an integer, and
898	   subsequent values are zero or more floating-point numbers" or "the
899	   item is a map that has byte strings for keys and contains at least
900	   one pair whose key is 0xab01".

902	   This specification puts no restrictions on CBOR-based protocols.  An
903	   encoder can be capable of encoding as many or as few types of values
904	   as is required by the protocol in which it is used; a decoder can be
905	   capable of understanding as many or as few types of values as is
906	   required by the protocols in which it is used.  This lack of
907	   restrictions allows CBOR to be used in extremely constrained
908	   environments.

910	   This section discusses some considerations in creating CBOR-based
911	   protocols.  It is advisory only and explicitly excludes any language
912	   from RFC 2119 other than words that could be interpreted as "MAY" in
913	   the sense of RFC 2119.

915	3.1.  CBOR in Streaming Applications

917	   In a streaming application, a data stream may be composed of a
918	   sequence of CBOR data items concatenated back-to-back.  In such an
919	   environment, the decoder immediately begins decoding a new data item
920	   if data is found after the end of a previous data item.

922	   Not all of the bytes making up a data item may be immediately
923	   available to the decoder; some decoders will buffer additional data
924	   until a complete data item can be presented to the application.
925	   Other decoders can present partial information about a top-level data
926	   item to an application, such as the nested data items that could
927	   already be decoded, or even parts of a byte string that hasn't
928	   completely arrived yet.

930	   Note that some applications and protocols will not want to use
931	   indefinite-length encoding.  Indefinite-length encoding spares an
932	   encoder from marshaling all the data just to count it, but it
933	   requires a decoder to allocate increasing amounts of memory while
934	   waiting for the end of the item.  This might be fine for some
935	   applications but not others.

937	3.2.  Generic Encoders and Decoders

939	   A generic CBOR decoder can decode all well-formed CBOR data and
940	   present it to an application.  CBOR data is well-formed if it uses
941	   the initial bytes, as well as the byte strings and/or data items that
942	   are implied by their values, in the manner defined by CBOR, and no
943	   extraneous data follows (Appendix C).

945	   Even though CBOR attempts to minimize these cases, not all well-
946	   formed CBOR data is valid: for example, the format excludes simple
947	   values below 32 that are encoded with an extension byte.  Also,
948	   specific tags may impose semantic constraints that may be violated,
949	   such as by including a tag in a bignum tag or by placing a byte
950	   string within a date tag.  Finally, the data may be invalid, such as
951	   invalid UTF-8 strings or date strings that do not conform to
952	   [RFC3339].  There is no requirement that generic encoders and
953	   decoders make unnatural choices for their application interface to
954	   enable the processing of invalid data.  Generic encoders and decoders
955	   are expected to forward simple values and tags even if their specific
956	   codepoints are not registered at the time the encoder/decoder is
957	   written (Section 3.5).

959	   Generic decoders provide ways to present well-formed CBOR values,
960	   both valid and invalid, to an application.  The diagnostic notation
961	   (Section 6) may be used to present well-formed CBOR values to humans.

963	   Generic encoders provide an application interface that allows the
964	   application to specify any well-formed value, including simple values
965	   and tags unknown to the encoder.

967	3.3.  Syntax Errors

969	   A decoder encountering a CBOR data item that is not well-formed
970	   generally can choose to completely fail the decoding (issue an error
971	   and/or stop processing altogether), substitute the problematic data
972	   and data items using a decoder-specific convention that clearly
973	   indicates there has been a problem, or take some other action.

975	3.3.1.  Incomplete CBOR Data Items

977	   The representation of a CBOR data item has a specific length,
978	   determined by its initial bytes and by the structure of any data
979	   items enclosed in the data items.  If less data is available, this
980	   can be treated as a syntax error.  A decoder may also implement
981	   incremental parsing, that is, decode the data item as far as it is
982	   available and present the data found so far (such as in an event-
983	   based interface), with the option of continuing the decoding once
984	   further data is available.

986	   Examples of incomplete data items include:

988	   o  A decoder expects a certain number of array or map entries but
989	      instead encounters the end of the data.

991	   o  A decoder processes what it expects to be the last pair in a map
992	      and comes to the end of the data.

994	   o  A decoder has just seen a tag and then encounters the end of the
995	      data.

997	   o  A decoder has seen the beginning of an indefinite-length item but
998	      encounters the end of the data before it sees the "break" stop
999	      code.

1001	3.3.2.  Malformed Indefinite-Length Items

1003	   Examples of malformed indefinite-length data items include:

1005	   o  Within an indefinite-length byte string or text, a decoder finds
1006	      an item that is not of the appropriate major type before it finds
1007	      the "break" stop code.

1009	   o  Within an indefinite-length map, a decoder encounters the "break"
1010	      stop code immediately after reading a key (the value is missing).

1012	   Another error is finding a "break" stop code at a point in the data
1013	   where there is no immediately enclosing (unclosed) indefinite-length
1014	   item.

1016	3.3.3.  Unknown Additional Information Values

1018	   At the time of writing, some additional information values are
1019	   unassigned and reserved for future versions of this document (see
1020	   Section 5.2).  Since the overall syntax for these additional
1021	   information values is not yet defined, a decoder that sees an
1022	   additional information value that it does not understand cannot
1023	   continue parsing.
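
   The syntax errors described in Section 3.3 can be detected without
   building any values.  The following Python sketch walks over one
   data item and reports whether it is well-formed.  The names
   NotWellFormed, skip_item, and is_well_formed are assumptions of
   this example, and the recursion means that very deeply nested input
   can exceed Python's recursion limit.

```python
class NotWellFormed(ValueError):
    pass


def _head(data, pos):
    """Read initial bytes at pos; return (major, info, value, next_pos)."""
    if pos >= len(data):
        raise NotWellFormed("end of data where an item was expected")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        width = 1 << (info - 24)
        if pos + width > len(data):
            raise NotWellFormed("truncated initial bytes")
        return major, info, int.from_bytes(data[pos:pos + width], "big"), pos + width
    if info == 31 and major in (2, 3, 4, 5, 7):
        return major, info, None, pos  # indefinite length or "break"
    raise NotWellFormed("unknown additional information value")


def skip_item(data, pos=0, allow_break=False):
    """Return (end, is_break) for the item that starts at pos."""
    major, info, value, pos = _head(data, pos)
    if major == 7 and info == 31:
        if not allow_break:
            raise NotWellFormed('"break" without an enclosing indefinite-length item')
        return pos, True
    if major in (0, 1, 7):
        return pos, False
    if major == 6:
        return skip_item(data, pos)  # a tag is followed by exactly one item
    if major in (2, 3):
        if value is None:  # indefinite-length string: definite chunks, then "break"
            while True:
                c_major, c_info, c_len, pos = _head(data, pos)
                if c_major == 7 and c_info == 31:
                    return pos, False
                if c_major != major or c_len is None:
                    raise NotWellFormed("chunk of the wrong type")
                if pos + c_len > len(data):
                    raise NotWellFormed("truncated chunk")
                pos += c_len
        if pos + value > len(data):
            raise NotWellFormed("truncated string")
        return pos + value, False
    per_entry = 1 if major == 4 else 2  # a map entry is a key and a value
    if value is None:
        count = 0
        while True:
            pos, is_break = skip_item(data, pos, allow_break=True)
            if is_break:
                if count % per_entry:
                    raise NotWellFormed('"break" after a key (value is missing)')
                return pos, False
            count += 1
    for _ in range(value * per_entry):
        pos = skip_item(data, pos)[0]
    return pos, False


def is_well_formed(data: bytes) -> bool:
    """True if data is exactly one well-formed item with no extraneous data."""
    try:
        end, _ = skip_item(data)
    except NotWellFormed:
        return False
    return end == len(data)
```

   A decoder that implements incremental parsing would instead keep
   its state when the data ends early and resume once more bytes
   arrive.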

1025	3.4.  Other Decoding Errors

1027	   A CBOR data item may be syntactically well-formed but present a
1028	   problem with interpreting the data encoded in it in the CBOR data
1029	   model.  Generally speaking, a decoder that finds a data item with
1030	   such a problem might issue a warning, might stop processing
1031	   altogether, might handle the error and make the problematic value
1032	   available to the application as such, or take some other type of
1033	   action.

1035	   Such problems might include:

1037	   Duplicate keys in a map:  Generic decoders (Section 3.2) make data
1038	      available to applications using the native CBOR data model.  That
1039	      data model includes maps (key-value mappings with unique keys),
1040	      not multimaps (key-value mappings where multiple entries can have
1041	      the same key).  Thus, a generic decoder that gets a CBOR map item
1042	      that has duplicate keys will decode to a map with only one
1043	      instance of that key, or it might stop processing altogether.  On
1044	      the other hand, a "streaming decoder" may not even be able to
1045	      notice (Section 3.7).

1047	   Inadmissible type on the value following a tag:  Tags (Section 2.4)
1048	      specify what type of data item is supposed to follow the tag; for
1049	      example, the tags for positive or negative bignums are supposed to
1050	      be put on byte strings.  A decoder that decodes the tagged data
1051	      item into a native representation (a native big integer in this
1052	      example) is expected to check the type of the data item being
1053	      tagged.  Even decoders that don't have such native representations
1054	      available in their environment may perform the check on those tags
1055	      known to them and react appropriately.

1057	   Invalid UTF-8 string:  A decoder might or might not want to verify
1058	      that the sequence of bytes in a UTF-8 string (major type 3) is
1059	      actually valid UTF-8 and react appropriately.
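
   Two of these checks are simple to sketch in Python.  The function
   names build_map and check_text and the on_duplicate parameter are
   hypothetical.  Note that Python treats the keys 1, 1.0, and True as
   equal, so an implementation whose key comparison must follow a
   different rule would need its own key representation.

```python
def build_map(pairs, on_duplicate="error"):
    """Build a dict from decoded (key, value) pairs, detecting duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            if on_duplicate == "error":
                raise ValueError("duplicate map key: %r" % (key,))
            continue  # otherwise keep only the first instance of the key
        result[key] = value
    return result


def check_text(raw: bytes) -> str:
    """Return the text of a major type 3 string, rejecting invalid UTF-8."""
    return raw.decode("utf-8")  # strict decoding raises UnicodeDecodeError


first_only = build_map([("a", 1), ("a", 2)], on_duplicate="first")
```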

1061	3.5.  Handling Unknown Simple Values and Tags

1063	   A decoder that comes across a simple value (Section 2.3) that it does
1064	   not recognize, such as a value that was added to the IANA registry
1065	   after the decoder was deployed or a value that the decoder chose not
1066	   to implement, might issue a warning, might stop processing
1067	   altogether, might handle the error by making the unknown value
1068	   available to the application as such (as is expected of generic
1069	   decoders), or take some other type of action.

1071	   A decoder that comes across a tag (Section 2.4) that it does not
1072	   recognize, such as a tag that was added to the IANA registry after
1073	   the decoder was deployed or a tag that the decoder chose not to
1074	   implement, might issue a warning, might stop processing altogether,
1075	   might handle the error and present the unknown tag value together
1076	   with the contained data item to the application (as is expected of
1077	   generic decoders), might ignore the tag and simply present the
1078	   contained data item only to the application, or take some other type
1079	   of action.

1081	3.6.  Numbers

1083	   For the purposes of this specification, all number representations
1084	   for the same numeric value are equivalent.  This means that an
1085	   encoder can encode a floating-point value of 0.0 as the integer 0.
1086	   However, it also means that an application that expects to find
1087	   integer values only might find floating-point values if the encoder
1088	   decides these are desirable, such as when the floating-point value is
1089	   more compact than a 64-bit integer.

1091	   An application or protocol that uses CBOR might restrict the
1092	   representations of numbers.  For instance, a protocol that only deals
1093	   with integers might say that floating-point numbers may not be used
1094	   and that decoders of that protocol do not need to be able to handle
1095	   floating-point numbers.  Similarly, a protocol or application that
1096	   uses CBOR might say that decoders need to be able to handle either
1097	   type of number.

1099	   CBOR-based protocols should take into account that different language
1100	   environments pose different restrictions on the range and precision
1101	   of numbers that are representable.  For example, the JavaScript
1102	   number system treats all numbers as floating point, which may result
1103	   in silent loss of precision in decoding integers with more than 53
1104	   significant bits.  A protocol that uses numbers should define its
1105	   expectations on the handling of non-trivial numbers in decoders and
1106	   receiving applications.
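
   The 53-bit limit can be demonstrated with Python, whose float type
   is an IEEE 754 double-precision number on common platforms, the
   same representation that JavaScript uses for its numbers.  The
   function name survives_double is an assumption of this sketch.

```python
def survives_double(n: int) -> bool:
    """True if integer n converts to a double and back without change."""
    try:
        return int(float(n)) == n
    except OverflowError:  # magnitude too large for a double
        return False


exact = survives_double(2**53)        # 54-bit value, but only one bit set
lossy = survives_double(2**53 + 1)    # needs 54 significant bits
```

   A protocol could use a check of this kind to decide which integers
   are safe to send to receivers that only have double-precision
   numbers.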

1108	   A CBOR-based protocol that includes floating-point numbers can
1109	   restrict which of the three formats (half-precision, single-
1110	   precision, and double-precision) are to be supported.  For an
1111	   integer-only application, a protocol may want to completely exclude
1112	   the use of floating-point values.

1114	   A CBOR-based protocol designed for compactness may want to exclude
1115	   specific integer encodings that are longer than necessary for the
1116	   application, such as to save the need to implement 64-bit integers.
1117	   There is an expectation that encoders will use the most compact
1118	   integer representation that can represent a given value.  However, a
1119	   compact application should accept values that use a longer-than-
1120	   needed encoding (such as encoding "0" as 0b000_11001 followed by two
1121	   bytes of 0x00) as long as the application can decode an integer of
1122	   the given size.
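
   The two expectations, compact output and tolerant input, can be
   sketched in Python as follows.  The names encode_int and decode_int
   are hypothetical; integers that need more than 64 bits are left to
   bignums (Section 2.4.2).

```python
def encode_int(value: int) -> bytes:
    """Encode an integer (major type 0 or 1) in its most compact form."""
    major, n = (0, value) if value >= 0 else (1, -1 - value)
    if n < 24:
        return bytes([major << 5 | n])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * width):
            return bytes([major << 5 | info]) + n.to_bytes(width, "big")
    raise OverflowError("value needs a bignum")


def decode_int(data: bytes) -> int:
    """Decode an integer, accepting longer-than-needed encodings."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (0, 1) or info > 27:
        raise ValueError("not an integer of major type 0 or 1")
    if info < 24:
        n = info
    else:
        width = 1 << (info - 24)
        if len(data) < 1 + width:
            raise ValueError("truncated integer")
        n = int.from_bytes(data[1:1 + width], "big")
    return n if major == 0 else -1 - n


compact_zero = encode_int(0)                       # single byte 0x00
long_zero = decode_int(bytes.fromhex("190000"))    # "0" in a two-byte form
```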

1124	3.7.  Specifying Keys for Maps

1126	   The encoding and decoding applications need to agree on what types of
1127	   keys are going to be used in maps.  In applications that need to
1128	   interwork with JSON-based applications, keys probably should be
1129	   limited to UTF-8 strings only; otherwise, there has to be a specified
1130	   mapping from the other CBOR types to Unicode characters, and this
1131	   often leads to implementation errors.  In applications where keys are
1132	   numeric in nature and numeric ordering of keys is important to the
1133	   application, directly using the numbers for the keys is useful.

1135	   If multiple types of keys are to be used, consideration should be
1136	   given to how these types would be represented in the specific
1137	   programming environments that are to be used.  For example, in
1138	   JavaScript objects, a key of integer 1 cannot be distinguished from a
1139	   key of string "1".  This means that, if integer keys are used, the
1140	   simultaneous use of string keys that look like numbers needs to be
1141	   avoided.  Again, this leads to the conclusion that keys should be of
1142	   a single CBOR type.

1144	   Decoders that deliver data items nested within a CBOR data item
1145	   immediately on decoding them ("streaming decoders") often do not keep
1146	   the state that is necessary to ascertain uniqueness of a key in a
1147	   map.  Similarly, an encoder that can start encoding data items before
1148	   the enclosing data item is completely available ("streaming encoder")
1149	   may want to reduce its overhead significantly by relying on its data
1150	   source to maintain uniqueness.

1152	   A CBOR-based protocol should make an intentional decision about what
1153	   to do when a receiving application does see multiple identical keys
1154	   in a map.  The resulting rule in the protocol should respect the CBOR
1155	   data model: it cannot prescribe a specific handling of the entries
1156	   with the identical keys, except that it might have a rule that having
1157	   identical keys in a map indicates a malformed map and that the
1158	   decoder has to stop with an error.  Duplicate keys are also
1159	   prohibited by CBOR decoders that are using strict mode
1160	   (Section 3.10).
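
   One possible way to implement the "stop with an error" rule is
   sketched below in Python.  It assumes that the decoder has already
   produced a sequence of (key, value) pairs and that all keys are of a
   single hashable type, such as all integers or all text strings; the
   function name build_map is hypothetical.

```python
def build_map(pairs):
    """Build a dict from decoded (key, value) pairs, rejecting duplicates.

    Assumes keys of a single type: Python treats 1, 1.0, and True as the
    same dict key, which would blur distinct CBOR keys of mixed types.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate map key: %r" % (key,))
        result[key] = value
    return result
```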

1162	   The CBOR data model for maps does not allow ascribing semantics to
1163	   the order of the key/value pairs in the map representation.
1164	   Thus, it would be a very bad practice to define a CBOR-based protocol
1165	   in such a way that changing the key/value pair order in a map would
1166	   change the semantics, apart from trivial aspects (cache usage, etc.).
1167	   (A CBOR-based protocol can prescribe a specific order of
1168	   serialization, such as for canonicalization.)

1170	   Applications for constrained devices that have maps with 24 or fewer
1171	   frequently used keys should consider using small integers (and those
1172	   with up to 48 frequently used keys should consider also using small
1173	   negative integers) because the keys can then be encoded in a single
1174	   byte.
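
   The single-byte claim can be checked with a short Python sketch.
   The function name encode_small_int is an assumption for this
   example.

```python
def encode_small_int(n):
    """Encode an integer in the range -24..23 as a single CBOR byte."""
    if 0 <= n <= 23:
        return bytes([n])                  # major type 0, value in low bits
    if -24 <= n <= -1:
        return bytes([0x20 | (-1 - n)])    # major type 1 carries -1 - n
    raise ValueError("needs more than one byte")

# 24 non-negative plus 24 negative values fit in one byte each.
ONE_BYTE_KEYS = list(range(-24, 24))
```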

1176	3.8.  Undefined Values

1178	   In some CBOR-based protocols, the simple value (Section 2.3) of
1179	   Undefined might be used by an encoder as a substitute for a data item
1180	   with an encoding problem, in order to allow the rest of the enclosing
1181	   data items to be encoded without harm.

1183	3.9.  Canonical CBOR

1185	   Some protocols may want encoders to only emit CBOR in a particular
1186	   canonical format; those protocols might also have the decoders check
1187	   that their input is canonical.  Those protocols are free to define
1188	   what they mean by a canonical format and what encoders and decoders
1189	   are expected to do.  This section lists some suggestions for such
1190	   protocols.

1192	   If a protocol considers "canonical" to mean that two encoder
1193	   implementations starting with the same input data will produce the
1194	   same CBOR output, the following four rules would suffice:

1196	   o  Integers must be as small as possible.

1198	      *  0 to 23 and -1 to -24 must be expressed in the same byte as the
1199	         major type;

1201	      *  24 to 255 and -25 to -256 must be expressed only with an
1202	         additional uint8_t;

1204	      *  256 to 65535 and -257 to -65536 must be expressed only with an
1205	         additional uint16_t;

1207	      *  65536 to 4294967295 and -65537 to -4294967296 must be expressed
1208	         only with an additional uint32_t.

1210	   o  The expression of lengths in major types 2 through 5 must be as
1211	      short as possible.  The rules for these lengths follow the above
1212	      rule for integers.

1214	   o  The keys in every map must be sorted lowest value to highest.
1215	      Sorting is performed on the bytes of the representation of the key
1216	      data items without paying attention to the 3/5 bit splitting for
1217	      major types.  (Note that this rule allows maps that have keys of
1218	      different types, even though that is probably a bad practice that
1219	      could lead to errors in some canonicalization implementations.)
1220	      The sorting rules are:

1222	      *  If two keys have different lengths, the shorter one sorts
1223	         earlier;

1225	      *  If two keys have the same length, the one with the lower value
1226	         in (byte-wise) lexical order sorts earlier.

1228	   o  Indefinite-length items must be made into definite-length items.
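
   The integer and key-ordering rules above can be sketched in Python
   as follows.  The helper names (encode_head, encode_int,
   sort_encoded_keys) are assumptions for this example, and the sketch
   covers only integers and the ordering of keys that have already
   been encoded.

```python
def encode_head(major, value):
    """Encode a major type and a non-negative argument in shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")

def encode_int(n):
    """Encode an integer using major type 0 or 1."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)

def sort_encoded_keys(encoded_keys):
    """Order encoded keys: shorter first, then bytewise lexical order."""
    return sorted(encoded_keys, key=lambda k: (len(k), k))
```

   The same encode_head helper would also serve for the lengths in
   major types 2 through 5, since those follow the integer rule.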

1230	   If a protocol allows for IEEE floats, then additional
1231	   canonicalization rules might need to be added.  One example rule
1232	   might be to have all floats start as a 64-bit float, then do a test
1233	   conversion to a 32-bit float; if the result is the same numeric
1234	   value, use the shorter value and repeat the process with a test
1235	   conversion to a 16-bit float.  (This rule selects 16-bit float for
1236	   positive and negative Infinity as well.)  Also, there are many
1237	   representations for NaN.  If NaN is an allowed value, it must always
1238	   be represented as 0xf97e00.
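
   A minimal Python sketch of this example rule follows.  It tries the
   narrowest format first, which gives the same result as the stepwise
   test described above because every value that is exact in 16 bits
   is also exact in 32 bits.  The function name
   encode_float_canonical is an assumption for this example.

```python
import math
import struct

def encode_float_canonical(x):
    """Encode a float in the shortest format that preserves its value."""
    if math.isnan(x):
        return bytes.fromhex("f97e00")      # the single NaN representation
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, x)
        except (OverflowError, struct.error):
            continue                        # too large for this format
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial]) + packed
    return bytes([0xFB]) + struct.pack(">d", x)
```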

1240	   CBOR tags present additional considerations for canonicalization.
1241	   The absence or presence of tags in a canonical format is determined
1242	   by the optionality of the tags in the protocol.  In a CBOR-based
1243	   protocol that allows optional tagging anywhere, the canonical format
1244	   must not allow them.  In a protocol that requires tags in certain
1245	   places, the tag needs to appear in the canonical format.  A CBOR-
1246	   based protocol that uses canonicalization might instead say that all
1247	   tags that appear in a message must be retained regardless of whether
1248	   they are optional.

1250	3.10.  Strict Mode

1252	   Some areas of application of CBOR do not require canonicalization
1253	   (Section 3.9) but may require that different decoders reach the same
1254	   (semantically equivalent) results, even in the presence of
1255	   potentially malicious data.  This can be required if one application
1256	   (such as a firewall or other protecting entity) makes a decision
1257	   based on the data that another application, which independently
1258	   decodes the data, relies on.

1260	   Normally, it is the responsibility of the sender to avoid ambiguously
1261	   decodable data.  However, the sender might be an attacker specially
1262	   making up CBOR data such that it will be interpreted differently by
1263	   different decoders in an attempt to exploit that as a vulnerability.
1264	   Generic decoders used in applications where this might be a problem
1265	   need to support a strict mode in which it is also the responsibility
1266	   of the receiver to reject ambiguously decodable data.  It is expected
1267	   that firewalls and other security systems that decode CBOR will only
1268	   decode in strict mode.

1270	   A decoder in strict mode will reliably reject any data that could be
1271	   interpreted by other decoders in different ways.  It will reliably
1272	   reject data items with syntax errors (Section 3.3).  It will also
1273	   expend the effort to reliably detect other decoding errors
1274	   (Section 3.4).  In particular, a strict decoder needs to have an API
1275	   that reports an error (and does not return data) for a CBOR data item
1276	   that contains any of the following:

1278	   o  a map (major type 5) that has more than one entry with the same
1279	      key

1281	   o  a tag that is used on a data item of the incorrect type

1283	   o  a data item that is incorrectly formatted for the type given to
1284	      it, such as invalid UTF-8 or data that cannot be interpreted with
1285	      the specific tag that it has been tagged with
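
   Two of these checks can be sketched in Python as follows.  The
   function names and the EXPECTED_TYPE table are assumptions for this
   example; the table lists only a few tags mentioned in this document
   and is not a complete registry.

```python
def strict_text(raw):
    """Return the text of a major type 3 item, rejecting invalid UTF-8."""
    try:
        return raw.decode("utf-8")      # strict error handling is the default
    except UnicodeDecodeError as exc:
        raise ValueError("invalid UTF-8 in text string") from exc

# Python type expected for the tagged item (illustrative subset):
# tag 0 wraps a text string, tags 2 and 3 wrap a byte string.
EXPECTED_TYPE = {0: str, 2: bytes, 3: bytes}

def strict_tag(tag, item):
    """Return (tag, item), rejecting a tag on an item of the wrong type."""
    expected = EXPECTED_TYPE.get(tag)
    if expected is not None and not isinstance(item, expected):
        raise ValueError("tag %d used on an item of the wrong type" % tag)
    return tag, item
```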

1287	   A decoder in strict mode can do one of two things when it encounters
1288	   a tag or simple value that it does not recognize:

1290	   o  It can report an error (and not return data).

1292	   o  It can emit the unknown item (type, value, and, for tags, the
1293	      decoded tagged data item) to the application calling the decoder
1294	      with an indication that the decoder did not recognize that tag or
1295	      simple value.

1297	   The latter approach, which is also appropriate for non-strict
1298	   decoders, supports forward compatibility with newly registered tags
1299	   and simple values without the requirement to update the decoder at
1300	   the same time as the calling application.  (For this, the API for the
1301	   decoder needs to have a way to mark unknown items so that the calling
1302	   application can handle them in a manner appropriate for the program.)
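
   One way such an API could mark unknown items is sketched below in
   Python.  The UnknownTag class, the KNOWN_TAGS set, and the
   reject_unknown flag are all hypothetical names chosen for this
   example.

```python
from dataclasses import dataclass
from typing import Any

KNOWN_TAGS = {0, 1, 2, 3}       # illustrative subset, not a full registry

@dataclass(frozen=True)
class UnknownTag:
    """Marks a tag that the decoder did not recognize."""
    tag: int
    item: Any

def deliver_tag(tag, item, reject_unknown=False):
    """Hand a tagged item to the application, marking unknown tags."""
    if tag in KNOWN_TAGS:
        return (tag, item)
    if reject_unknown:
        raise ValueError("unrecognized tag %d" % tag)
    return UnknownTag(tag, item)
```

   The calling application can then test for UnknownTag and decide how
   to treat the enclosed item.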

1304	   Since some of this processing may have an appreciable cost (in
1305	   particular with duplicate detection for maps), support of strict mode
1306	   is not a requirement placed on all CBOR decoders.

1308	   Some encoders will rely on their applications to provide input data
1309	   in a form that yields unambiguously decodable CBOR.  A generic
1310	   encoder also may want to provide a strict mode where it reliably
1311	   limits its output to unambiguously decodable CBOR, independent of
1312	   whether or not its application is providing API-conformant data.

1314	4.  Converting Data between CBOR and JSON

1316	   This section gives non-normative advice about converting between CBOR
1317	   and JSON.  Implementations of converters are free to use whichever
1318	   advice here they want.

1320	   It is worth noting that a JSON text is a sequence of characters, not
1321	   an encoded sequence of bytes, while a CBOR data item consists of
1322	   bytes, not characters.

1324	4.1.  Converting from CBOR to JSON

1326	   Most of the types in CBOR have direct analogs in JSON.  However, some
1327	   do not, and someone implementing a CBOR-to-JSON converter has to
1328	   consider what to do in those cases.  The following non-normative
1329	   advice deals with these by converting them to a single substitute
1330	   value, such as a JSON null.

1332	   o  An integer (major type 0 or 1) becomes a JSON number.

1334	   o  A byte string (major type 2) that is not embedded in a tag that
1335	      specifies a proposed encoding is encoded in base64url without
1336	      padding and becomes a JSON string.

1338	   o  A UTF-8 string (major type 3) becomes a JSON string.  Note that
1339	      JSON requires escaping certain characters (RFC 7159, Section 7):
1340	      quotation mark (U+0022), reverse solidus (U+005C), and the "C0
1341	      control characters" (U+0000 through U+001F).  All other characters
1342	      are copied unchanged into the JSON UTF-8 string.

1344	   o  An array (major type 4) becomes a JSON array.

1346	   o  A map (major type 5) becomes a JSON object.  This is possible
1347	      directly only if all keys are UTF-8 strings.  A converter might
1348	      also convert other keys into UTF-8 strings (such as by converting
1349	      integers into strings containing their decimal representation);
1350	      however, doing so introduces a danger of key collision.

1352	   o  False (major type 7, additional information 20) becomes a JSON
1353	      false.

1355	   o  True (major type 7, additional information 21) becomes a JSON
1356	      true.

1358	   o  Null (major type 7, additional information 22) becomes a JSON
1359	      null.

1361	   o  A floating-point value (major type 7, additional information 25
1362	      through 27) becomes a JSON number if it is finite (that is, it can
1363	      be represented in a JSON number); if the value is non-finite (NaN,
1364	      or positive or negative Infinity), it is represented by the
1365	      substitute value.

1367	   o  Any other simple value (major type 7, any additional information
1368	      value not yet discussed) is represented by the substitute value.

1370	   o  A bignum (major type 6, tag value 2 or 3) is represented by
1371	      encoding its byte string in base64url without padding and becomes
1372	      a JSON string.  For tag value 3 (negative bignum), a "~" (ASCII
1373	      tilde) is inserted before the base-encoded value.  (The conversion
1374	      to a binary blob instead of a number is to prevent a likely
1375	      numeric overflow for the JSON decoder.)

1377	   o  A byte string with an encoding hint (major type 6, tag value 21
1378	      through 23) is encoded as described and becomes a JSON string.

1380	   o  For all other tags (major type 6, any other tag value), the
1381	      embedded CBOR item is represented as a JSON value; the tag value
1382	      is ignored.

1384	   o  Indefinite-length items are made definite before conversion.
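
   Part of this advice can be sketched in Python as shown below.  The
   sketch assumes that CBOR items have already been decoded into plain
   Python values (int, float, str, bytes, list, dict, bool, None) and
   does not handle tags; the function name to_json_value and the
   substitute parameter are assumptions for this example.

```python
import base64
import json
import math

def to_json_value(item, substitute=None):
    """Map a decoded CBOR value to a value that json.dumps accepts."""
    if item is None or isinstance(item, (bool, int, str)):
        return item
    if isinstance(item, float):
        return item if math.isfinite(item) else substitute
    if isinstance(item, bytes):
        # base64url without padding
        return base64.urlsafe_b64encode(item).rstrip(b"=").decode("ascii")
    if isinstance(item, list):
        return [to_json_value(v, substitute) for v in item]
    if isinstance(item, dict):
        out = {}
        for key, value in item.items():
            if not isinstance(key, str):
                raise ValueError("only text string keys convert directly")
            out[key] = to_json_value(value, substitute)
        return out
    return substitute               # anything else: substitute value
```

   Rejecting non-string keys, rather than converting them, is a design
   choice of this sketch that sidesteps the key collision danger noted
   above.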

1386	4.2.  Converting from JSON to CBOR

1388	   All JSON values, once decoded, directly map into one or more CBOR
1389	   values.  As with any kind of CBOR generation, decisions have to be
1390	   made with respect to number representation.  In a suggested
1391	   conversion:

1393	   o  JSON numbers without fractional parts (integer numbers) are
1394	      represented as integers (major types 0 and 1, possibly major type
1395	      6 tag value 2 and 3), choosing the shortest form; integers longer
1396	      than an implementation-defined threshold (which is usually either
1397	      32 or 64 bits) may instead be represented as floating-point
1398	      values.  (If the JSON was generated from a JavaScript
1399	      implementation, its precision is already limited to 53 bits
1400	      maximum.)

1402	   o  Numbers with fractional parts are represented as floating-point
1403	      values.  Preferably, the shortest exact floating-point
1404	      representation is used; for instance, 1.5 is represented in a
1405	      16-bit floating-point value (not all implementations will be
1406	      capable of efficiently finding the minimum form, though).  There
1407	      may be an implementation-defined limit to the precision that will
1408	      affect the precision of the represented values.  Decimal
1409	      representation should only be used if that is specified in a
1410	      protocol.

1412	   CBOR has been designed to generally provide a more compact encoding
1413	   than JSON.  One implementation strategy that might come to mind is to
1414	   perform a JSON-to-CBOR encoding in place in a single buffer.  This
1415	   strategy would need to carefully consider a number of pathological
1416	   cases.  For example, a string that is longer (or much longer) than
1417	   255 bytes and contains no or very few escapes may expand when
1418	   encoded as a UTF-8 string in CBOR.  Similarly, a few of the binary
1419	   floating-point representations might cause expansion from some short
1420	   decimal representations (1.1, 1e9) in JSON.  This may be hard to get
1421	   right, and any ensuing vulnerabilities may be exploited by an
1422	   attacker.
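
   The expansion cases can be made concrete with a small Python
   sketch.  The helper cbor_text is an assumption for this example and
   handles only definite-length strings shorter than 2**32 bytes.

```python
import json
import struct

def cbor_text(s):
    """Encode a text string as a definite-length major type 3 item."""
    raw = s.encode("utf-8")
    n = len(raw)
    if n < 24:
        head = bytes([0x60 | n])
    elif n < 0x100:
        head = bytes([0x78, n])
    elif n < 0x10000:
        head = b"\x79" + n.to_bytes(2, "big")
    else:
        head = b"\x7a" + n.to_bytes(4, "big")
    return head + raw

text = "a" * 256
json_len = len(json.dumps(text))            # two quotes plus 256 characters
cbor_len = len(cbor_text(text))             # three-byte head plus 256 bytes

json_float_len = len("1.1")                 # three characters in JSON
cbor_float_len = len(b"\xfb" + struct.pack(">d", 1.1))   # double precision
```

   In this sketch, the 256-character string takes 258 bytes in JSON
   but 259 bytes in CBOR, and 1.1 grows from 3 bytes to 9 bytes, so an
   in-place conversion would overrun its input in both cases.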

1424	5.  Future Evolution of CBOR

1426	   Successful protocols evolve over time.  New ideas appear,
1427	   implementation platforms improve, related protocols are developed and
1428	   evolve, and new requirements from applications and protocols are
1429	   added.  Facilitating protocol evolution is therefore an important
1430	   design consideration for any protocol development.

1432	   For protocols that will use CBOR, CBOR provides some useful
1433	   mechanisms to facilitate their evolution.  Best practices for this
1434	   are well known, particularly from the development of JSON-based
1435	   protocols.  Therefore, such best practices are outside the scope
1436	   of this specification.

1438	   However, facilitating the evolution of CBOR itself is very well
1439	   within its scope.  CBOR is designed to both provide a stable basis
1440	   for development of CBOR-based protocols and to be able to evolve.
1441	   Since a successful protocol may live for decades, CBOR needs to be
1442	   designed for decades of use and evolution.  This section provides
1443	   some guidance for the evolution of CBOR.  It is necessarily more
1444	   subjective than other parts of this document.  It is also necessarily
1445	   incomplete, lest it turn into a textbook on protocol development.

1447	5.1.  Extension Points

1449	   In a protocol design, opportunities for evolution are often included
1450	   in the form of extension points.  For example, there may be a
1451	   codepoint space that is not fully allocated from the outset, and the
1452	   protocol is designed to tolerate and embrace implementations that
1453	   start using more codepoints than initially allocated.

1455	   Sizing the codepoint space may be difficult because the range
1456	   required may be hard to predict.  An attempt should be made to make
1457	   the codepoint space large enough so that it can slowly be filled over
1458	   the intended lifetime of the protocol.

1460	   CBOR has three major extension points:

1462	   o  the "simple" space (values in major type 7).  Of the 24 efficient
1463	      (and 224 slightly less efficient) values, only a small number have
1464	      been allocated.  Implementations receiving an unknown simple data
1465	      item may be able to process it as such, given that the structure
1466	      of the value is indeed simple.  The IANA registry in Section 7.1
1467	      is the appropriate way to address the extensibility of this
1468	      codepoint space.

1470	   o  the "tag" space (values in major type 6).  Again, only a small
1471	      part of the codepoint space has been allocated, and the space is
1472	      abundant (although the early numbers are more efficient than the
1473	      later ones).  Implementations receiving an unknown tag can choose
1474	      to simply ignore it or to process it as an unknown tag wrapping
1475	      the following data item.  The IANA registry in Section 7.2 is the
1476	      appropriate way to address the extensibility of this codepoint
1477	      space.

1479	   o  the "additional information" space.  An implementation receiving
1480	      an unknown additional information value has no way to continue
1481	      parsing, so allocating codepoints to this space is a major step.
1482	      There are also very few codepoints left.

1484	5.2.  Curating the Additional Information Space

1486	   The human mind is sometimes drawn to filling in little perceived gaps
1487	   to make something neat.  We expect the remaining gaps in the
1488	   codepoint space for the additional information values to be an
1489	   attractor for new ideas, just because they are there.

1491	   The present specification does not manage the additional information
1492	   codepoint space by an IANA registry.  Instead, allocations out of
1493	   this space can only be done by updating this specification.

1495	   For an additional information value of n >= 24, the size of the
1496	   additional data typically is 2**(n-24) bytes.  Therefore, additional
1497	   information values 28 and 29 should be viewed as candidates for
1498	   128-bit and 256-bit quantities, in case a need arises to add them to
1499	   the protocol.  Additional information value 30 is then the only
1500	   additional information value available for general allocation, and
1501	   there should be a very good reason for allocating it before assigning
1502	   it through an update of this protocol.
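
   The size relationship can be written out as a short Python sketch.
   The function name following_bytes and the CANDIDATE_BITS table are
   assumptions for this example; values 28 and 29 are listed only as
   the candidates discussed above, not as allocations.

```python
def following_bytes(info):
    """Number of argument bytes that follow the initial byte."""
    if not 0 <= info <= 31:
        raise ValueError("additional information is a 5-bit value")
    if info < 24:
        return 0                    # argument is in the initial byte
    if info <= 27:
        return 2 ** (info - 24)     # 1, 2, 4, or 8 bytes
    raise ValueError("no fixed-size argument defined for this value")

# Sizes in bits that the same formula would give for values 28 and 29.
CANDIDATE_BITS = {n: 8 * 2 ** (n - 24) for n in (28, 29)}
```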

1504	6.  Diagnostic Notation

1506	   CBOR is a binary interchange format.  To facilitate documentation and
1507	   debugging, and in particular to facilitate communication between
1508	   entities cooperating in debugging, this section defines a simple
1509	   human-readable diagnostic notation.  All actual interchange always
1510	   happens in the binary format.

1512	   Note that this truly is a diagnostic format; it is not meant to be
1513	   parsed.  Therefore, no formal definition (as in ABNF) is given in
1514	   this document.  (Implementers looking for a text-based format for
1515	   representing CBOR data items in configuration files may also want to
1516	   consider YAML [YAML].)

1518	   The diagnostic notation is loosely based on JSON as it is defined in
1519	   RFC 7159, extending it where needed.

1521	   The notation borrows the JSON syntax for numbers (integer and
1522	   floating point), True (>true<), False (>false<), Null (>null<), UTF-8
1523	   strings, arrays, and maps (maps are called objects in JSON; the
1524	   diagnostic notation extends JSON here by allowing any data item in
1525	   the key position).  Undefined is written >undefined< as in
1526	   JavaScript.  The non-finite floating-point numbers Infinity,
1527	   -Infinity, and NaN are written exactly as in this sentence (this is
1528	   also a way they can be written in JavaScript, although JSON does not
1529	   allow them).  A tagged item is written as an integer number for the
1530	   tag followed by the item in parentheses; for instance, an RFC 3339
1531	   (ISO 8601) date could be notated as:

1533	      0("2013-03-21T20:04:00Z")

1535	   or the equivalent relative time as

1537	      1(1363896240)
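
   The equivalence of the two notations above can be checked with a
   short Python sketch.  The function names diag_tag0 and diag_tag1
   are assumptions for this example, and the input is assumed to be a
   timezone-aware datetime.

```python
from datetime import datetime, timezone

def diag_tag0(dt):
    """Diagnostic notation for a tag 0 (RFC 3339 text) date/time."""
    utc = dt.astimezone(timezone.utc)
    return '0("%s")' % utc.strftime("%Y-%m-%dT%H:%M:%SZ")

def diag_tag1(dt):
    """Diagnostic notation for a tag 1 (seconds since the epoch) time."""
    return "1(%d)" % int(dt.timestamp())

moment = datetime(2013, 3, 21, 20, 4, 0, tzinfo=timezone.utc)
```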

1539	   Byte strings are notated in one of the base encodings, without
1540	   padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
1541	   for base32, >h32< for base32hex, >b64< for base64 or base64url (the
1542	   actual encodings do not overlap, so the string remains unambiguous).
1543	   For example, the byte string 0x12345678 could be written h'12345678',
1544	   b32'CI2FM6A', or b64'EjRWeA'.
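
   The three example notations can be reproduced with Python's
   standard base64 module, as in the sketch below.  The function name
   diag_bytes is an assumption for this example, and base32hex is left
   out for brevity.

```python
import base64

def diag_bytes(data, encoding="h"):
    """Write a byte string in diagnostic notation, without padding."""
    if encoding == "h":
        text = data.hex()
    elif encoding == "b32":
        text = base64.b32encode(data).decode("ascii").rstrip("=")
    elif encoding == "b64":
        text = base64.b64encode(data).decode("ascii").rstrip("=")
    else:
        raise ValueError("unsupported encoding prefix")
    return "%s'%s'" % (encoding, text)
```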

1546	   Unassigned simple values are given as "simple()" with the appropriate
1547	   integer in the parentheses.  For example, "simple(42)" indicates
1548	   major type 7, value 42.

1550	6.1.  Encoding Indicators

1552	   Sometimes it is useful to indicate in the diagnostic notation which
1553	   of several alternative representations was actually used; for
1554	   example, a data item written >1.5< by a diagnostic decoder might have
1555	   been encoded as a half-, single-, or double-precision float.

1557	   The convention for encoding indicators is that an underscore,
1558	   together with all following characters that are alphanumeric or
1559	   underscore, is an encoding indicator and can be ignored by anyone
1560	   not interested in this information.  Encoding indicators are always
1561	   optional.

1563	   A single underscore can be written after the opening brace of a map
1564	   or the opening bracket of an array to indicate that the data item was
1565	   represented in indefinite-length format.  For example, [_ 1, 2]
1566	   contains an indicator that an indefinite-length representation was
1567	   used to represent the data item [1, 2].

1569	   An underscore followed by a decimal digit n indicates that the
1570	   preceding item (or, for arrays and maps, the item starting with the
1571	   preceding bracket or brace) was encoded with an additional
1572	   information value of 24+n.  For example, 1.5_1 is a half-precision
1573	   floating-point number, while 1.5_3 is encoded as double precision.
1574	   This encoding indicator is not shown in Appendix A.  (Note that the
1575	   encoding indicator "_" is thus an abbreviation of the full form "_7",
1576	   which is not used.)

1578	   As a special case, byte and text strings of indefinite length can be
1579	   notated in the form (_ h'0123', h'4567') and (_ "foo", "bar").

1581	7.  IANA Considerations

1583	   IANA has created two registries for new CBOR values.  The registries
1584	   are separate, that is, not under an umbrella registry, and follow the
1585	   rules in [RFC5226].  IANA has also assigned a new MIME media type and
1586	   an associated Constrained Application Protocol (CoAP) Content-Format
1587	   entry.

1589	7.1.  Simple Values Registry

1591	   IANA has created the "Concise Binary Object Representation (CBOR)
1592	   Simple Values" registry.  The initial values are shown in Table 2.

1594	   New entries in the range 0 to 19 are assigned by Standards Action.
1595	   It is suggested that these Standards Actions allocate values starting
1596	   with the number 16 in order to reserve the lower numbers for
1597	   contiguous blocks (if any).

1599	   New entries in the range 32 to 255 are assigned by Specification
1600	   Required.

1602	7.2.  Tags Registry

1604	   IANA has created the "Concise Binary Object Representation (CBOR)
1605	   Tags" registry.  The initial values are shown in Table 3.

1607	   New entries in the range 0 to 23 are assigned by Standards Action.
1608	   New entries in the range 24 to 255 are assigned by Specification
1609	   Required.  New entries in the range 256 to 18446744073709551615 are
1610	   assigned by First Come First Served.  The template for registration
1611	   requests is:

1613	   o  Data item

1615	   o  Semantics (short form)

1617	   In addition, First Come First Served requests should include:

1619	   o  Point of contact

1621	   o  Description of semantics (URL)
1622	      This description is optional; the URL can point to something like
1623	      an Internet-Draft or a web page.

1625	7.3.  Media Type ("MIME Type")

1627	   The Internet media type [RFC6838] for CBOR data is application/cbor.

1629	   Type name: application

1631	   Subtype name: cbor

1633	   Required parameters: n/a

1635	   Optional parameters: n/a

1637	   Encoding considerations:  binary

1639	   Security considerations:  See Section 8 of this document

1641	   Interoperability considerations: n/a

1643	   Published specification: This document

1645	   Applications that use this media type:  None yet, but it is expected
1646	      that this format will be deployed in protocols and applications.

1648	   Additional information:
1649	     Magic number(s): n/a
1650	     File extension(s): .cbor
1651	     Macintosh file type code(s): n/a

1653	   Person & email address to contact for further information:
1654	     Carsten Bormann
1655	     cabo@tzi.org

1657	   Intended usage: COMMON

1659	   Restrictions on usage: none

1661	   Author:
1662	     Carsten Bormann 

1664	   Change controller:
1665	     The IESG 

1667	7.4.  CoAP Content-Format

1669	   Media Type: application/cbor

1671	   Encoding: -

1673	   Id: 60

1675	   Reference: [RFCthis]

1677	7.5.  The +cbor Structured Syntax Suffix Registration

1679	   Name: Concise Binary Object Representation (CBOR)

1681	   +suffix: +cbor

1683	   References: [RFCthis]

1685	   Encoding Considerations: CBOR is a binary format.

1687	   Interoperability Considerations: n/a
1688	   Fragment Identifier Considerations:
1689	     The syntax and semantics of fragment identifiers specified for
1690	     +cbor SHOULD be as specified for "application/cbor".  (At
1691	     publication of this document, there is no fragment identification
1692	     syntax defined for "application/cbor".)

1694	     The syntax and semantics for fragment identifiers for a specific
1695	     "xxx/yyy+cbor" SHOULD be processed as follows:

1697	     For cases defined in +cbor, where the fragment identifier resolves
1698	     per the +cbor rules, then process as specified in +cbor.

1700	     For cases defined in +cbor, where the fragment identifier does
1701	     not resolve per the +cbor rules, then process as specified in
1702	     "xxx/yyy+cbor".

1704	     For cases not defined in +cbor, then process as specified in
1705	     "xxx/yyy+cbor".

1707	   Security Considerations:  See Section 8 of this document

1709	   Contact:
1710	     Apps Area Working Group (apps-discuss@ietf.org)

1712	   Author/Change Controller:
1713	     The Apps Area Working Group.
1714	     The IESG has change control over this registration.

1716	8.  Security Considerations

1718	   A network-facing application can exhibit vulnerabilities in its
1719	   processing logic for incoming data.  Complex parsers are well known
1720	   as a likely source of such vulnerabilities, such as the ability to
1721	   remotely crash a node, or even remotely execute arbitrary code on it.
1722	   CBOR attempts to narrow the opportunities for introducing such
1723	   vulnerabilities by reducing parser complexity, by giving the entire
1724	   range of encodable values a meaning where possible.

1726	   Resource exhaustion attacks might attempt to lure a decoder into
1727	   allocating very big data items (strings, arrays, maps) or exhaust the
1728	   stack depth by setting up deeply nested items.  Decoders need to have
1729	   appropriate resource management to mitigate these attacks.  (Items
1730	   for which very large sizes are given can also attempt to exploit
1731	   integer overflow vulnerabilities.)
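
   A minimal Python sketch of such resource checks follows.  The limit
   values and function names are hypothetical and would be chosen by
   the application.  Python integers do not overflow; in a language
   with fixed-width integers, the comparisons should be made before
   any arithmetic on the declared length.

```python
MAX_ITEM_BYTES = 1 << 20    # hypothetical per-item limit (1 MiB)
MAX_DEPTH = 32              # hypothetical nesting limit

def check_string_length(declared, remaining, limit=MAX_ITEM_BYTES):
    """Validate a declared string length before allocating for it."""
    if declared > remaining:
        raise ValueError("declared length exceeds remaining input")
    if declared > limit:
        raise ValueError("item exceeds configured size limit")
    return declared

def enter_nested(depth, limit=MAX_DEPTH):
    """Return the new nesting depth, refusing to exceed the limit."""
    if depth + 1 > limit:
        raise ValueError("nesting too deep")
    return depth + 1
```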

1733	   Applications where a CBOR data item is examined by a gatekeeper
1734	   function and later used by a different application may exhibit
1735	   vulnerabilities when multiple interpretations of the data item are
1736	   possible.  For example, an attacker could make use of duplicate keys
1737	   in maps and precision issues in numbers to make the gatekeeper base
1738	   its decisions on a different interpretation than the one that will be
1739	   used by the second application.  Protocols that are used in a
1740	   security context should be defined in such a way that these multiple
1741	   interpretations are reliably reduced to a single one.  To facilitate
1742	   this, encoder and decoder implementations used in such contexts
1743	   should provide at least one strict mode of operation (Section 3.10).

1745	9.  Acknowledgements

1747	   CBOR was inspired by MessagePack.  MessagePack was developed and
1748	   promoted by Sadayuki Furuhashi ("frsyuki").  This reference to
1749	   MessagePack is solely for attribution; CBOR is not intended as a
1750	   version of or replacement for MessagePack, as it has different design
1751	   goals and requirements.

1753	   The need for functionality beyond the original MessagePack
1754	   Specification became obvious to many people at about the same time
1755	   around the year 2012.  BinaryPack is a minor derivation of
1756	   MessagePack that was developed by Eric Zhang for the binaryjs
1757	   project.  A similar, but different, extension was made by Tim Caswell
1758	   for his msgpack-js and msgpack-js-browser projects.  Many people have
1759	   contributed to the recent discussion about extending MessagePack to
1760	   separate text string representation from byte string representation.

1762	   The encoding of the additional information in CBOR was inspired by
1763	   the encoding of length information designed by Klaus Hartke for CoAP.

1765	   This document also incorporates suggestions made by many people,
1766	   notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, Matthew
1767	   Lepinski, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray,
1768	   Tony Finch, Tony Hansen, and Yaron Sheffer.

1770	10.  References

1772	10.1.  Normative References

1774	   [ECMA262]  European Computer Manufacturers Association, "ECMAScript
1775	              Language Specification 5.1 Edition", ECMA Standard ECMA-
1776	              262, June 2011.

1780	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
1781	              Extensions (MIME) Part One: Format of Internet Message
1782	              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996.

1785	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1786	              Requirement Levels", BCP 14, RFC 2119,
1787	              DOI 10.17487/RFC2119, March 1997.

1790	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
1791	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.

1794	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
1795	              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
1796	              2003.

1798	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1799	              Resource Identifier (URI): Generic Syntax", STD 66,
1800	              RFC 3986, DOI 10.17487/RFC3986, January 2005.

1803	   [RFC4287]  Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
1804	              Syndication Format", RFC 4287, DOI 10.17487/RFC4287,
1805	              December 2005.

1807	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
1808	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

1811	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
1812	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
1813	              DOI 10.17487/RFC5226, May 2008.

1816	   [TIME_T]   The Open Group Base Specifications, "Vol. 1: Base
1817	              Definitions, Issue 7", Section 4.15 'Seconds Since the
1818	              Epoch', IEEE Std 1003.1, 2013 Edition, 2013.

1822	10.2.  Informative References

1824	   [ASN.1]    International Telecommunication Union, "Information
1825	              Technology -- ASN.1 encoding rules: Specification of Basic
1826	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
1827	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
1828	              X.690, 1994.

1830	   [BSON]     Various, "BSON - Binary JSON", 2013.

1833	   [MessagePack]
1834	              Furuhashi, S., "MessagePack", 2013.

1836	   [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
1837	              Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976.

1840	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
1841	              Specifications and Registration Procedures", BCP 13,
1842	              RFC 6838, DOI 10.17487/RFC6838, January 2013.

1845	   [RFC7159]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
1846	              Interchange Format", RFC 7159, DOI 10.17487/RFC7159, March
1847	              2014.

1849	   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
1850	              Constrained-Node Networks", RFC 7228,
1851	              DOI 10.17487/RFC7228, May 2014.

1854	   [UBJSON]   The Buzz Media, "Universal Binary JSON Specification",
1855	              2013.

1857	   [YAML]     Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup
1858	              Language (YAML[TM]) Version 1.2", 3rd Edition, October
1859	              2009.

1861	Appendix A.  Examples

1863	   The following table provides some CBOR-encoded values in hexadecimal
1864	   (right column), together with diagnostic notation for these values
1865	   (left column).  Note that the string "\u00fc" is one form of
1866	   diagnostic notation for a UTF-8 string containing the single Unicode
1867	   character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut).
1868	   Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
1869	   single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often
1870	   representing "water"), and "\ud800\udd51" is a UTF-8 string in
1871	   diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
1872	   ATTIC FIFTY STATERS).  (Note that all these single-character strings
1873	   could also be represented in native UTF-8 in diagnostic notation,
1874	   just not in an ASCII-only specification like the present one.)  In
1875	   the diagnostic notation provided for bignums, their intended numeric
1876	   value is shown as a decimal number (such as 18446744073709551616)
1877	   instead of showing a tagged byte string (such as
1878	   2(h'010000000000000000')).

1880	   +------------------------------+------------------------------------+
1881	   | Diagnostic                   | Encoded                            |
1882	   +------------------------------+------------------------------------+
1883	   | 0                            | 0x00                               |
1884	   |                              |                                    |
1885	   | 1                            | 0x01                               |
1886	   |                              |                                    |
1887	   | 10                           | 0x0a                               |
1888	   |                              |                                    |
1889	   | 23                           | 0x17                               |
1890	   |                              |                                    |
1891	   | 24                           | 0x1818                             |
1892	   |                              |                                    |
1893	   | 25                           | 0x1819                             |
1894	   |                              |                                    |
1895	   | 100                          | 0x1864                             |
1896	   |                              |                                    |
1897	   | 1000                         | 0x1903e8                           |
1898	   |                              |                                    |
1899	   | 1000000                      | 0x1a000f4240                       |
1900	   |                              |                                    |
1901	   | 1000000000000                | 0x1b000000e8d4a51000               |
1902	   |                              |                                    |
1903	   | 18446744073709551615         | 0x1bffffffffffffffff               |
1904	   |                              |                                    |
1905	   | 18446744073709551616         | 0xc249010000000000000000           |
1906	   |                              |                                    |
1907	   | -18446744073709551616        | 0x3bffffffffffffffff               |
1908	   |                              |                                    |
1909	   | -18446744073709551617        | 0xc349010000000000000000           |
1910	   |                              |                                    |
1911	   | -1                           | 0x20                               |
1912	   |                              |                                    |
1913	   | -10                          | 0x29                               |
1914	   |                              |                                    |
1915	   | -100                         | 0x3863                             |
1916	   |                              |                                    |
1917	   | -1000                        | 0x3903e7                           |
1918	   |                              |                                    |
1919	   | 0.0                          | 0xf90000                           |
1920	   |                              |                                    |
1921	   | -0.0                         | 0xf98000                           |
1922	   |                              |                                    |
1923	   | 1.0                          | 0xf93c00                           |
1924	   |                              |                                    |
1925	   | 1.1                          | 0xfb3ff199999999999a               |
1926	   |                              |                                    |
1927	   | 1.5                          | 0xf93e00                           |
1928	   |                              |                                    |
1929	   | 65504.0                      | 0xf97bff                           |
1930	   |                              |                                    |
1931	   | 100000.0                     | 0xfa47c35000                       |
1932	   |                              |                                    |
1933	   | 3.4028234663852886e+38       | 0xfa7f7fffff                       |
1934	   |                              |                                    |
1935	   | 1.0e+300                     | 0xfb7e37e43c8800759c               |
1936	   |                              |                                    |
1937	   | 5.960464477539063e-8         | 0xf90001                           |
1938	   |                              |                                    |
1939	   | 0.00006103515625             | 0xf90400                           |
1940	   |                              |                                    |
1941	   | -4.0                         | 0xf9c400                           |
1942	   |                              |                                    |
1943	   | -4.1                         | 0xfbc010666666666666               |
1944	   |                              |                                    |
1945	   | Infinity                     | 0xf97c00                           |
1946	   |                              |                                    |
1947	   | NaN                          | 0xf97e00                           |
1948	   |                              |                                    |
1949	   | -Infinity                    | 0xf9fc00                           |
1950	   |                              |                                    |
1951	   | Infinity                     | 0xfa7f800000                       |
1952	   |                              |                                    |
1953	   | NaN                          | 0xfa7fc00000                       |
1954	   |                              |                                    |
1955	   | -Infinity                    | 0xfaff800000                       |
1956	   |                              |                                    |
1957	   | Infinity                     | 0xfb7ff0000000000000               |
1958	   |                              |                                    |
1959	   | NaN                          | 0xfb7ff8000000000000               |
1960	   |                              |                                    |
1961	   | -Infinity                    | 0xfbfff0000000000000               |
1962	   |                              |                                    |
1963	   | false                        | 0xf4                               |
1964	   |                              |                                    |
1965	   | true                         | 0xf5                               |
1966	   |                              |                                    |
1967	   | null                         | 0xf6                               |
1968	   |                              |                                    |
1969	   | undefined                    | 0xf7                               |
1970	   |                              |                                    |
1971	   | simple(16)                   | 0xf0                               |
1972	   |                              |                                    |
1973	   | simple(24)                   | 0xf818                             |
1974	   |                              |                                    |
1975	   | simple(255)                  | 0xf8ff                             |
1976	   |                              |                                    |
1977	   | 0("2013-03-21T20:04:00Z")    | 0xc074323031332d30332d32315432303a |
1978	   |                              | 30343a30305a                       |
1979	   |                              |                                    |
1980	   | 1(1363896240)                | 0xc11a514b67b0                     |
1981	   |                              |                                    |
1982	   | 1(1363896240.5)              | 0xc1fb41d452d9ec200000             |
1983	   |                              |                                    |
1984	   | 23(h'01020304')              | 0xd74401020304                     |
1985	   |                              |                                    |
1986	   | 24(h'6449455446')            | 0xd818456449455446                 |
1987	   |                              |                                    |
1988	   | 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
1989	   |                              | 616d706c652e636f6d                 |
1990	   |                              |                                    |
1991	   | h''                          | 0x40                               |
1992	   |                              |                                    |
1993	   | h'01020304'                  | 0x4401020304                       |
1994	   |                              |                                    |
1995	   | ""                           | 0x60                               |
1996	   |                              |                                    |
1997	   | "a"                          | 0x6161                             |
1998	   |                              |                                    |
1999	   | "IETF"                       | 0x6449455446                       |
2000	   |                              |                                    |
2001	   | "\"\\"                       | 0x62225c                           |
2002	   |                              |                                    |
2003	   | "\u00fc"                     | 0x62c3bc                           |
2004	   |                              |                                    |
2005	   | "\u6c34"                     | 0x63e6b0b4                         |
2006	   |                              |                                    |
2007	   | "\ud800\udd51"               | 0x64f0908591                       |
2008	   |                              |                                    |
2009	   | []                           | 0x80                               |
2010	   |                              |                                    |
2011	   | [1, 2, 3]                    | 0x83010203                         |
2012	   |                              |                                    |
2013	   | [1, [2, 3], [4, 5]]          | 0x8301820203820405                 |
2014	   |                              |                                    |
2015	   | [1, 2, 3, 4, 5, 6, 7, 8, 9,  | 0x98190102030405060708090a0b0c0d0e |
2016	   | 10, 11, 12, 13, 14, 15, 16,  | 0f101112131415161718181819         |
2017	   | 17, 18, 19, 20, 21, 22, 23,  |                                    |
2018	   | 24, 25]                      |                                    |
2019	   |                              |                                    |
2020	   | {}                           | 0xa0                               |
2021	   |                              |                                    |
2022	   | {1: 2, 3: 4}                 | 0xa201020304                       |
2023	   |                              |                                    |
2024	   | {"a": 1, "b": [2, 3]}        | 0xa26161016162820203               |
2025	   |                              |                                    |
2026	   | ["a", {"b": "c"}]            | 0x826161a161626163                 |
2027	   |                              |                                    |
2028	   | {"a": "A", "b": "B", "c":    | 0xa5616161416162614261636143616461 |
2029	   | "C", "d": "D", "e": "E"}     | 4461656145                         |
2030	   |                              |                                    |
2031	   | (_ h'0102', h'030405')       | 0x5f42010243030405ff               |
2032	   |                              |                                    |
2033	   | (_ "strea", "ming")          | 0x7f657374726561646d696e67ff       |
2034	   |                              |                                    |
2035	   | [_ ]                         | 0x9fff                             |
2036	   |                              |                                    |
2037	   | [_ 1, [2, 3], [_ 4, 5]]      | 0x9f018202039f0405ffff             |
2038	   |                              |                                    |
2039	   | [_ 1, [2, 3], [4, 5]]        | 0x9f01820203820405ff               |
2040	   |                              |                                    |
2041	   | [1, [2, 3], [_ 4, 5]]        | 0x83018202039f0405ff               |
2042	   |                              |                                    |
2043	   | [1, [_ 2, 3], [4, 5]]        | 0x83019f0203ff820405               |
2044	   |                              |                                    |
2045	   | [_ 1, 2, 3, 4, 5, 6, 7, 8,   | 0x9f0102030405060708090a0b0c0d0e0f |
2046	   | 9, 10, 11, 12, 13, 14, 15,   | 101112131415161718181819ff         |
2047	   | 16, 17, 18, 19, 20, 21, 22,  |                                    |
2048	   | 23, 24, 25]                  |                                    |
2049	   |                              |                                    |
2050	   | {_ "a": 1, "b": [_ 2, 3]}    | 0xbf61610161629f0203ffff           |
2051	   |                              |                                    |
2052	   | ["a", {_ "b": "c"}]          | 0x826161bf61626163ff               |
2053	   |                              |                                    |
2054	   | {_ "Fun": true, "Amt": -2}   | 0xbf6346756ef563416d7421ff         |
2055	   +------------------------------+------------------------------------+

2057	               Table 4: Examples of Encoded CBOR Data Items
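
   Several rows of Table 4 can be reproduced with a few lines of
   Python.  The following is a minimal sketch, not a complete encoder;
   the helper names encode_head, encode_text, and encode_bignum are
   illustrative and do not appear elsewhere in this document.  Note
   that Python writes the character U+10151 as "\U00010151", whereas
   the diagnostic notation above uses the surrogate-pair escape
   "\ud800\udd51".

```python
def encode_head(major_type, value):
    """Initial byte plus any following argument bytes (network byte order)."""
    if value < 24:
        return bytes([major_type << 5 | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + value.to_bytes(size, "big")
    raise ValueError("value does not fit in 64 bits")


def encode_text(s):
    """Major type 3: length of the UTF-8 encoding, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_head(3, len(data)) + data


def encode_bignum(n):
    """Tag 2 (positive bignum) or tag 3 (negative bignum, carrying -1-n)."""
    tag, magnitude = (2, n) if n >= 0 else (3, -1 - n)
    data = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return encode_head(6, tag) + encode_head(2, len(data)) + data


print(encode_head(0, 1000000).hex())           # unsigned integer 1000000
print(encode_head(1, 999).hex())               # negative integer -1000
print(encode_text("\U00010151").hex())         # single character U+10151
print(encode_bignum(18446744073709551616).hex())
```

   For negative integers, the argument passed to encode_head is -1-n,
   which is why -1000 is encoded with the value 999 (0x03e7).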

2059	Appendix B.  Jump Table

2061	   For brevity, this jump table does not show initial bytes that are
2062	   reserved for future extension.  It also only shows a selection of the
2063	   initial bytes that can be used for optional features.  (All unsigned
2064	   integers are in network byte order.)

2066	   +------------+------------------------------------------------------+
2067	   | Byte       | Structure/Semantics                                  |
2068	   +------------+------------------------------------------------------+
2069	   | 0x00..0x17 | Integer 0x00..0x17 (0..23)                           |
2070	   |            |                                                      |
2071	   | 0x18       | Unsigned integer (one-byte uint8_t follows)          |
2072	   |            |                                                      |
2073	   | 0x19       | Unsigned integer (two-byte uint16_t follows)         |
2074	   |            |                                                      |
2075	   | 0x1a       | Unsigned integer (four-byte uint32_t follows)        |
2076	   |            |                                                      |
2077	   | 0x1b       | Unsigned integer (eight-byte uint64_t follows)       |
2078	   |            |                                                      |
2079	   | 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24)          |
2080	   |            |                                                      |
2081	   | 0x38       | Negative integer -1-n (one-byte uint8_t for n        |
2082	   |            | follows)                                             |
2083	   |            |                                                      |
2084	   | 0x39       | Negative integer -1-n (two-byte uint16_t for n       |
2085	   |            | follows)                                             |
2086	   |            |                                                      |
2087	   | 0x3a       | Negative integer -1-n (four-byte uint32_t for n      |
2088	   |            | follows)                                             |
2089	   |            |                                                      |
2090	   | 0x3b       | Negative integer -1-n (eight-byte uint64_t for n     |
2091	   |            | follows)                                             |
2092	   |            |                                                      |
2093	   | 0x40..0x57 | byte string (0x00..0x17 bytes follow)                |
2094	   |            |                                                      |
2095	   | 0x58       | byte string (one-byte uint8_t for n, and then n      |
2096	   |            | bytes follow)                                        |
2097	   |            |                                                      |
2098	   | 0x59       | byte string (two-byte uint16_t for n, and then n     |
2099	   |            | bytes follow)                                        |
2100	   |            |                                                      |
2101	   | 0x5a       | byte string (four-byte uint32_t for n, and then n    |
2102	   |            | bytes follow)                                        |
2103	   |            |                                                      |
2104	   | 0x5b       | byte string (eight-byte uint64_t for n, and then n   |
2105	   |            | bytes follow)                                        |
2106	   |            |                                                      |
2107	   | 0x5f       | byte string, byte strings follow, terminated by      |
2108	   |            | "break"                                              |
2109	   |            |                                                      |
2110	   | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow)               |
2111	   |            |                                                      |
2112	   | 0x78       | UTF-8 string (one-byte uint8_t for n, and then n     |
2113	   |            | bytes follow)                                        |
2114	   |            |                                                      |
2115	   | 0x79       | UTF-8 string (two-byte uint16_t for n, and then n    |
2116	   |            | bytes follow)                                        |
2117	   |            |                                                      |
2118	   | 0x7a       | UTF-8 string (four-byte uint32_t for n, and then n   |
2119	   |            | bytes follow)                                        |
2120	   |            |                                                      |
2121	   | 0x7b       | UTF-8 string (eight-byte uint64_t for n, and then n  |
2122	   |            | bytes follow)                                        |
2123	   |            |                                                      |
2124	   | 0x7f       | UTF-8 string, UTF-8 strings follow, terminated by    |
2125	   |            | "break"                                              |
2126	   |            |                                                      |
2127	   | 0x80..0x97 | array (0x00..0x17 data items follow)                 |
2128	   |            |                                                      |
2129	   | 0x98       | array (one-byte uint8_t for n, and then n data items |
2130	   |            | follow)                                              |
2131	   |            |                                                      |
2132	   | 0x99       | array (two-byte uint16_t for n, and then n data      |
2133	   |            | items follow)                                        |
2134	   |            |                                                      |
2135	   | 0x9a       | array (four-byte uint32_t for n, and then n data     |
2136	   |            | items follow)                                        |
2137	   |            |                                                      |
2138	   | 0x9b       | array (eight-byte uint64_t for n, and then n data    |
2139	   |            | items follow)                                        |
2140	   |            |                                                      |
2141	   | 0x9f       | array, data items follow, terminated by "break"      |
2142	   |            |                                                      |
2143	   | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow)          |
2144	   |            |                                                      |
2145	   | 0xb8       | map (one-byte uint8_t for n, and then n pairs of     |
2146	   |            | data items follow)                                   |
2147	   |            |                                                      |
2148	   | 0xb9       | map (two-byte uint16_t for n, and then n pairs of    |
2149	   |            | data items follow)                                   |
2150	   |            |                                                      |
2151	   | 0xba       | map (four-byte uint32_t for n, and then n pairs of   |
2152	   |            | data items follow)                                   |
2153	   |            |                                                      |
2154	   | 0xbb       | map (eight-byte uint64_t for n, and then n pairs of  |
2155	   |            | data items follow)                                   |
2156	   |            |                                                      |
2157	   | 0xbf       | map, pairs of data items follow, terminated by       |
2158	   |            | "break"                                              |
2159	   |            |                                                      |
2160	   | 0xc0       | Text-based date/time (data item follows; see         |
2161	   |            | Section 2.4.1)                                       |
2162	   |            |                                                      |
2163	   | 0xc1       | Epoch-based date/time (data item follows; see        |
2164	   |            | Section 2.4.1)                                       |
2165	   |            |                                                      |
2166	   | 0xc2       | Positive bignum (data item "byte string" follows)    |
2167	   |            |                                                      |
2168	   | 0xc3       | Negative bignum (data item "byte string" follows)    |
2169	   |            |                                                      |
2170	   | 0xc4       | Decimal Fraction (data item "array" follows; see     |
2171	   |            | Section 2.4.3)                                       |
2172	   |            |                                                      |
2173	   | 0xc5       | Bigfloat (data item "array" follows; see             |
2174	   |            | Section 2.4.3)                                       |
2175	   |            |                                                      |
2176	   | 0xc6..0xd4 | (tagged item)                                        |
2177	   |            |                                                      |
2178	   | 0xd5..0xd7 | Expected Conversion (data item follows; see          |
2179	   |            | Section 2.4.4.2)                                     |
2180	   |            |                                                      |
2181	   | 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data    |
2182	   |            | item follow)                                         |
2183	   |            |                                                      |
2184	   | 0xe0..0xf3 | (simple value)                                       |
2185	   |            |                                                      |
2186	   | 0xf4       | False                                                |
2187	   |            |                                                      |
2188	   | 0xf5       | True                                                 |
2189	   |            |                                                      |
2190	   | 0xf6       | Null                                                 |
2191	   |            |                                                      |
2192	   | 0xf7       | Undefined                                            |
2193	   |            |                                                      |
2194	   | 0xf8       | (simple value, one byte follows)                     |
2195	   |            |                                                      |
2196	   | 0xf9       | Half-Precision Float (two-byte IEEE 754)             |
2197	   |            |                                                      |
2198	   | 0xfa       | Single-Precision Float (four-byte IEEE 754)          |
2199	   |            |                                                      |
2200	   | 0xfb       | Double-Precision Float (eight-byte IEEE 754)         |
2201	   |            |                                                      |
2202	   | 0xff       | "break" stop code                                    |
2203	   +------------+------------------------------------------------------+

2205	                   Table 5: Jump Table for Initial Byte
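
   The ranges in Table 5 follow from splitting the initial byte into a
   3-bit major type and 5 bits of additional information.  The sketch
   below illustrates that split; the names split_initial_byte and
   describe are hypothetical, and the descriptions are deliberately
   coarser than the table (for example, they do not single out False,
   True, Null, or the tags).

```python
MAJOR_TYPES = ("unsigned integer", "negative integer", "byte string",
               "UTF-8 string", "array", "map", "tag",
               "simple value or float")


def split_initial_byte(ib):
    """Return (major type, additional information) for an initial byte."""
    return ib >> 5, ib & 0x1f


def describe(ib):
    """Coarse description of an initial byte, in the spirit of Table 5."""
    mt, ai = split_initial_byte(ib)
    if ai < 24:
        detail = "value %d in initial byte" % ai
    elif ai < 28:
        detail = "%d-byte argument follows" % (1 << (ai - 24))
    elif ai < 31:
        detail = "reserved"
    else:
        detail = "indefinite length or break"
    return "%s, %s" % (MAJOR_TYPES[mt], detail)


for ib in (0x17, 0x19, 0x5f, 0x98, 0xfb):
    print("0x%02x: %s" % (ib, describe(ib)))
```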

2207	Appendix C.  Pseudocode

2209	   The well-formedness of a CBOR item can be checked by the pseudocode
2210	   in Figure 1.  The data is well-formed if and only if:

2212	   o  the pseudocode does not "fail";

2214	   o  after execution of the pseudocode, no bytes are left in the input
2215	      (except in streaming applications).

2217	   The pseudocode has the following prerequisites:

2219	   o  take(n) reads n bytes from the input data and returns them as a
2220	      byte string.  If n bytes are no longer available, take(n) fails.

2222	   o  uint() converts a byte string into an unsigned integer by
2223	      interpreting the byte string in network byte order.

2225	   o  Arithmetic works as in C.

2227	   o  All variables are unsigned integers of sufficient range.

2229	   well_formed (breakable = false) {
2230	     // process initial bytes
2231	     ib = uint(take(1));
2232	     mt = ib >> 5;
2233	     val = ai = ib & 0x1f;
2234	     switch (ai) {
2235	       case 24: val = uint(take(1)); break;
2236	       case 25: val = uint(take(2)); break;
2237	       case 26: val = uint(take(4)); break;
2238	       case 27: val = uint(take(8)); break;
2239	       case 28: case 29: case 30: fail();
2240	       case 31:
2241	         return well_formed_indefinite(mt, breakable);
2242	     }
2243	     // process content
2244	     switch (mt) {
2245	       // case 0, 1, 7 do not have content; just use val
2246	       case 2: case 3: take(val); break; // bytes/UTF-8
2247	       case 4: for (i = 0; i < val; i++) well_formed(); break;
2248	       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
2249	       case 6: well_formed(); break;     // 1 embedded data item
2250	     }
2251	     return mt;                    // finite data item
2252	   }

2254	   well_formed_indefinite(mt, breakable) {
2255	     switch (mt) {
2256	       case 2: case 3:
2257	         while ((it = well_formed(true)) != -1)
2258	           if (it != mt)           // need finite embedded
2259	             fail();               //    of same type
2260	         break;
2261	       case 4: while (well_formed(true) != -1); break;
2262	       case 5: while (well_formed(true) != -1) well_formed(); break;
2263	       case 7:
2264	         if (breakable)
2265	           return -1;              // signal break out
2266	         else fail();              // no enclosing indefinite
2267	       default: fail();            // wrong mt
2268	     }
2269	     return 0;                     // no break out
2270	   }

2272	              Figure 1: Pseudocode for Well-Formedness Check
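
   For readers who want to experiment, the pseudocode in Figure 1 can
   be transcribed into runnable Python fairly directly.  The following
   is one possible sketch under stated assumptions: the names Checker,
   NotWellFormed, BREAK, and is_well_formed are illustrative, fail()
   is modeled as an exception, and the "no bytes are left" condition
   is checked in is_well_formed (so the streaming exception is not
   covered).

```python
class NotWellFormed(Exception):
    """Raised wherever the pseudocode calls fail()."""


BREAK = -1  # returned when a "break" stop code has been consumed


class Checker:
    def __init__(self, data):
        self.data, self.pos = data, 0

    def take(self, n):
        # Read n bytes; fail if fewer than n bytes remain.
        if self.pos + n > len(self.data):
            raise NotWellFormed("input exhausted")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable=False):
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1f
        val = ai
        if 24 <= ai <= 27:      # 1, 2, 4, or 8 argument bytes follow
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        if mt in (2, 3):        # bytes/UTF-8: skip the content
            self.take(val)
        elif mt == 4:           # array: val data items
            for _ in range(val):
                self.well_formed()
        elif mt == 5:           # map: val pairs of data items
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:           # tag: one embedded data item
            self.well_formed()
        return mt               # finite data item

    def well_formed_indefinite(self, mt, breakable):
        if mt in (2, 3):
            while True:
                it = self.well_formed(True)
                if it == BREAK:
                    break
                if it != mt:    # chunks must be finite and of same type
                    raise NotWellFormed("chunk of wrong type")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()
        elif mt == 7:
            if breakable:
                return BREAK    # signal break out
            raise NotWellFormed("no enclosing indefinite-length item")
        else:
            raise NotWellFormed("wrong major type for indefinite length")
        return 0                # no break out


def is_well_formed(data):
    checker = Checker(data)
    try:
        checker.well_formed()
    except NotWellFormed:
        return False
    return checker.pos == len(data)   # no bytes may be left over
```

   As a usage example, is_well_formed(bytes.fromhex("9f018202039f0405ffff"))
   accepts one of the indefinite-length arrays from Table 4, while a
   lone "break" stop code (0xff) or a truncated array (0x8301) is
   rejected.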

2274	   Note that the remaining complexity of a complete CBOR decoder is
2275	   about presenting data that has been parsed to the application in an
2276	   appropriate form.

2278	   Major types 0 and 1 are designed in such a way that they can be
2279	   encoded in C from a signed integer without actually doing an
2280	   if-then-else for positive/negative (Figure 2).  This uses the fact
2281	   that (-1-n), the transformation for major type 1, is the same as ~n
2282	   (bitwise complement) in C unsigned arithmetic; ~n can then be
2283	   expressed as (-1)^n for the negative case, while 0^n leaves n
2284	   unchanged for non-negative (here, ^ is the C bitwise exclusive-or
2285	   operator, not exponentiation).  The sign of a number can be
2286	   converted to -1 for negative and 0 for non-negative (0 or positive)
2287	   by arithmetic-shifting the number by one bit less than the bit
2288	   length of the number (for example, by 63 for 64-bit numbers).

2289	   void encode_sint(int64_t n) {
2290	     uint64_t ui = n >> 63;   // extend sign to whole length
2291	     mt = ui & 0x20;          // extract major type
2292	     ui ^= n;                 // complement negatives
2293	     if (ui < 24)
2294	       *p++ = mt + ui;
2295	     else if (ui < 256) {
2296	       *p++ = mt + 24;
2297	       *p++ = ui;
2298	     } else
2299	          ...

2301	            Figure 2: Pseudocode for Encoding a Signed Integer
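
The same sign trick can be tried out in Python, whose >> on integers
is an arithmetic shift.  The following is a minimal sketch, assuming
the input lies in the int64_t range used by Figure 2; the function
returns the encoded bytes instead of writing through a pointer, and
the forms longer than one byte (elided in Figure 2) are filled in here
on the assumption that they follow the usual 2-, 4-, and 8-byte
argument encoding.

```python
import struct


def encode_sint(n):
    """Encode a signed 64-bit integer as CBOR major type 0 or 1."""
    if not -2**63 <= n < 2**63:
        raise ValueError("outside the int64_t range assumed here")
    ui = n >> 63      # -1 for negative n, 0 otherwise
    mt = ui & 0x20    # 0x20 (major type 1) for negative n, else 0
    ui ^= n           # (-1)^n == ~n == -1-n for negatives; 0^n == n
    if ui < 24:
        return bytes([mt + ui])
    # Pick the shortest argument width that can hold ui.
    for ai, fmt, bits in ((24, "!B", 8), (25, "!H", 16),
                          (26, "!I", 32), (27, "!Q", 64)):
        if ui < 1 << bits:
            return bytes([mt + ai]) + struct.pack(fmt, ui)
```

For instance, encode_sint(-1000) yields the bytes 39 03 e7: the value
999 (that is, -1 - (-1000)) in a 2-byte argument with major type 1.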

2303	Appendix D.  Half-Precision

2305	   As half-precision floating-point numbers were only added to IEEE 754
2306	   in 2008, today's programming platforms often still only have limited
2307	   support for them.  It is easy to include at least decoding support
2308	   for them even on platforms that lack it natively.  An example of a
2309	   small decoder for half-precision floating-point numbers in the C
2310	   language is shown in Figure 3.  A similar program for Python is in
2311	   Figure 4; the Python code assumes that the 2-byte value has already
2312	   been decoded as an (unsigned short) integer in network byte order
2313	   (as would be done by the pseudocode in Appendix C).

2315	   #include <math.h>

2317	   double decode_half(unsigned char *halfp) {
2318	     int half = (halfp[0] << 8) + halfp[1];
2319	     int exp = (half >> 10) & 0x1f;
2320	     int mant = half & 0x3ff;
2321	     double val;
2322	     if (exp == 0) val = ldexp(mant, -24);
2323	     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
2324	     else val = mant == 0 ? INFINITY : NAN;
2325	     return half & 0x8000 ? -val : val;
2326	   }

2328	               Figure 3: C Code for a Half-Precision Decoder

2330	   import struct
2331	   from math import ldexp

2333	   def decode_single(single):
2334	       return struct.unpack("!f", struct.pack("!I", single))[0]

2336	   def decode_half(half):
2337	       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
2338	       if ((half & 0x7c00) != 0x7c00):
2339	           return ldexp(decode_single(valu), 112)
2340	       return decode_single(valu | 0x7f800000)

2342	            Figure 4: Python Code for a Half-Precision Decoder
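
One way to gain confidence in such a decoder is to compare it with a
platform that does support half precision.  The sketch below is an
illustration only: decode_half_ldexp is a Python transcription of the
logic in Figure 3 (the name is hypothetical), and decode_half_struct
relies on the "e" format code that the Python struct module offers
from version 3.6 on.  Both take the 16-bit value as an integer.

```python
import struct
from math import inf, isnan, ldexp, nan


def decode_half_ldexp(half):
    """Decode a 16-bit half-precision pattern using only ldexp."""
    exp = (half >> 10) & 0x1f
    mant = half & 0x3ff
    if exp == 0:                    # zero or subnormal
        val = ldexp(mant, -24)
    elif exp != 31:                 # normal: restore the implicit 1 bit
        val = ldexp(mant + 1024, exp - 25)
    else:                           # all-ones exponent
        val = inf if mant == 0 else nan
    return -val if half & 0x8000 else val


def decode_half_struct(half):
    """Decode the same pattern with struct's native "e" format."""
    return struct.unpack("!e", struct.pack("!H", half))[0]


def same(a, b):
    """Equality that treats two NaNs as matching."""
    return (isnan(a) and isnan(b)) or a == b
```

Since there are only 65536 bit patterns, the two functions can be
compared exhaustively, for example with
all(same(decode_half_ldexp(h), decode_half_struct(h)) for h in
range(0x10000)).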

2344	Appendix E.  Comparison of Other Binary Formats to CBOR's Design
2345	             Objectives

2347	   The proposal for CBOR follows a history of binary formats that is as
2348	   long as the history of computers themselves.  Different formats have
2349	   had different objectives.  In most cases, the objectives of the
2350	   format were never stated, although they can sometimes be inferred
2351	   from the context where the format was first used.  Some formats were
2352	   meant to be universally usable, although history has proven that no
2353	   binary format meets the needs of all protocols and applications.

2355	   CBOR differs from many of these formats in that it started with a
2356	   set of objectives and attempts to meet just those.  This section
2357	   compares a few of the dozens of formats with CBOR's objectives in
2358	   order to help the reader decide if they want to use CBOR or a
2359	   different format for a particular protocol or application.

2361	   Note that the discussion here is not meant to be a criticism of any
2362	   format: to the best of our knowledge, no format before CBOR was meant
2363	   to cover CBOR's objectives in the priority we have assigned them.  A
2364	   brief recap of the objectives from Section 1.1 is:

2366	   1.  unambiguous encoding of most common data formats from Internet
2367	       standards

2369	   2.  code compactness for encoder or decoder

2371	   3.  no schema description needed

2373	   4.  reasonably compact serialization

2375	   5.  applicability to constrained and unconstrained applications

2377	   6.  good JSON conversion

2379	   7.  extensibility

2381	E.1.  ASN.1 DER, BER, and PER

2383	   [ASN.1] has many serializations.  In the IETF, DER and BER are the
2384	   most common.  The serialized output is not particularly compact for
2385	   many items, and the code needed to decode numeric items can be
2386	   complex on a constrained device.

2388	   Few (if any) IETF protocols have adopted one of the several variants
2389	   of Packed Encoding Rules (PER).  There could be many reasons for
2390	   this, but one that is commonly stated is that PER makes use of the
2391	   schema even for parsing the surface structure of the data stream,
2392	   requiring significant tool support.  There are different versions of
2393	   the ASN.1 schema language in use, which has also hampered adoption.

2395	E.2.  MessagePack

2397	   [MessagePack] is a concise, widely implemented counted binary
2398	   serialization format, similar in many properties to CBOR, although
2399	   somewhat less regular.  While the data model can be used to represent
2400	   JSON data, MessagePack has also been used in many remote procedure
2401	   call (RPC) applications and for long-term storage of data.

2403	   MessagePack has been essentially stable since it was first published
2404	   around 2011; it has not yet had a transition.  The evolution of
2405	   MessagePack is impeded by an imperative to maintain complete
2406	   backwards compatibility with existing stored data, while only a few
2407	   bytecodes are still available for extension.  Repeated requests over
2408	   the years from the MessagePack user community to separate out binary
2409	   and text strings in the encoding have recently led to an extension
2410	   proposal that would leave MessagePack's "raw" data ambiguous between
2411	   its usages for binary and text data.  The extension mechanism for
2412	   MessagePack remains unclear.

2414	E.3.  BSON

2416	   [BSON] is a data format that was developed for the storage of JSON-
2417	   like maps (JSON objects) in the MongoDB database.  Its major
2418	   distinguishing feature is the capability for in-place update,
2419	   forgoing a compact representation.  BSON uses a counted
2420	   representation except for map keys, which are null-byte terminated.
2421	   While BSON can be used for the representation of JSON-like objects on
2422	   the wire, its specification is dominated by the requirements of the
2423	   database application and has become somewhat baroque.  The status of
2424	   how BSON extensions will be implemented remains unclear.

2426	E.4.  UBJSON

2428	   [UBJSON] has a design goal to make JSON faster and somewhat smaller,
2429	   using a binary format that is limited to exactly the data model JSON
2430	   uses.  Thus, there is expressly no intention to support, for example,
2431	   binary data; however, there is a "high-precision number", expressed
2432	   as a character string in JSON syntax.  UBJSON is not optimized for
2433	   code compactness, and its type byte coding is optimized for human
2434	   recognition and not for compact representation of native types such
2435	   as small integers.  Although UBJSON is mostly counted, it provides a
2436	   reserved "unknown-length" value to support streaming of arrays and
2437	   maps (JSON objects).  Within these containers, UBJSON also has a
2438	   "Noop" type for padding.

2440	E.5.  MSDTP: RFC 713

2442	   Message Services Data Transmission Protocol (MSDTP) is a very early
2443	   example of a compact message format; it is described in [RFC0713],
2444	   written in 1976.  It is included here for its historical value, not
2445	   because it was ever widely used.

2447	E.6.  Conciseness on the Wire

2449	   While CBOR's design objective of code compactness for encoders and
2450	   decoders is a higher priority than its objective of conciseness on
2451	   the wire, many people focus on the wire size.  Table 6 shows some
2452	   encoding examples for the simple nested array [1, [2, 3]]; where some
2453	   form of indefinite-length encoding is supported by the encoding,
2454	   [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.

2456	   +-------------+--------------------------+--------------------------+
2457	   | Format      | [1, [2, 3]]              | [_ 1, [2, 3]]            |
2458	   +-------------+--------------------------+--------------------------+
2459	   | RFC 713     | c2 05 81 c2 02 82 83     |                          |
2460	   |             |                          |                          |
2461	   | ASN.1 BER   | 30 0b 02 01 01 30 06 02  | 30 80 02 01 01 30 06 02  |
2462	   |             | 01 02 02 01 03           | 01 02 02 01 03 00 00     |
2463	   |             |                          |                          |
2464	   | MessagePack | 92 01 92 02 03           |                          |
2465	   |             |                          |                          |
2466	   | BSON        | 22 00 00 00 10 30 00 01  |                          |
2467	   |             | 00 00 00 04 31 00 13 00  |                          |
2468	   |             | 00 00 10 30 00 02 00 00  |                          |
2469	   |             | 00 10 31 00 03 00 00 00  |                          |
2470	   |             | 00 00                    |                          |
2471	   |             |                          |                          |
2472	   | UBJSON      | 61 02 42 01 61 02 42 02  | 61 ff 42 01 61 02 42 02  |
2473	   |             | 42 03                    | 42 03 45                 |
2474	   |             |                          |                          |
2475	   | CBOR        | 82 01 82 02 03           | 9f 01 82 02 03 ff        |
2476	   +-------------+--------------------------+--------------------------+

2478	           Table 6: Examples for Different Levels of Conciseness
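
The CBOR row of Table 6 can be reproduced with a few lines of Python.
This is a deliberately tiny sketch, not a general encoder: it assumes
only unsigned integers below 24 and arrays of fewer than 24 items, and
the Indefinite class is a hypothetical marker for arrays that are to
be written in indefinite-length form.

```python
class Indefinite(list):
    """Hypothetical marker: encode this list with indefinite length."""


def encode(item):
    """Encode small unsigned integers and short (nested) arrays."""
    if isinstance(item, int) and 0 <= item < 24:
        return bytes([item])                  # major type 0, immediate
    if isinstance(item, Indefinite):
        # 0x9f opens an indefinite-length array, 0xff is the "break"
        return b"\x9f" + b"".join(encode(x) for x in item) + b"\xff"
    if isinstance(item, list) and len(item) < 24:
        # major type 4 with the item count in the initial byte
        return bytes([0x80 + len(item)]) + b"".join(
            encode(x) for x in item)
    raise ValueError("not covered by this sketch")
```

With these definitions, encode([1, [2, 3]]).hex() gives "8201820203"
and encode(Indefinite([1, [2, 3]])).hex() gives "9f01820203ff",
matching the two CBOR entries in the table.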

2480	Authors' Addresses

2482	   Carsten Bormann
2483	   Universitaet Bremen TZI
2484	   Postfach 330440
2485	   D-28359 Bremen
2486	   Germany

2488	   Phone: +49-421-218-63921
2489	   EMail: cabo@tzi.org

2491	   Paul Hoffman
2492	   ICANN

2494	   EMail: paul.hoffman@icann.org