2	Network Working Group                                         C. Bormann
3	Internet-Draft                                   Universitaet Bremen TZI
4	Intended status: Standards Track                              P. Hoffman
5	Expires: March 16, 2014                                   VPN Consortium
6	                                                      September 12, 2013

8	              Concise Binary Object Representation (CBOR)
9	                         draft-bormann-cbor-09

11	Abstract

13	   The Concise Binary Object Representation (CBOR) is a data format
14	   whose design goals include the possibility of extremely small code
15	   size, fairly small message size, and extensibility without the need
16	   for version negotiation.  These design goals make it different from
17	   earlier binary serializations such as ASN.1 and MessagePack.

19	Status of This Memo

21	   This Internet-Draft is submitted in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at http://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on March 16, 2014.

36	Copyright Notice

38	   Copyright (c) 2013 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents
43	   (http://trustee.ietf.org/license-info) in effect on the date of
44	   publication of this document.  Please review these documents
45	   carefully, as they describe your rights and restrictions with respect
46	   to this document.  Code Components extracted from this document must
47	   include Simplified BSD License text as described in Section 4.e of
48	   the Trust Legal Provisions and are provided without warranty as
49	   described in the Simplified BSD License.

51	Table of Contents

53	   1.  Introduction
54	     1.1.  Objectives
55	     1.2.  Terminology
56	   2.  Specification of the CBOR Encoding
57	     2.1.  Major Types
58	     2.2.  Indefinite Lengths for Some Major Types
59	       2.2.1.  Indefinite Length Arrays and Maps
60	       2.2.2.  Indefinite Length Byte Strings and Text Strings
61	     2.3.  Floating Point Numbers and Values with No Content
62	     2.4.  Optional Tagging of Items
63	       2.4.1.  Date and Time
64	       2.4.2.  Bignums
65	       2.4.3.  Decimal Fractions and Bigfloats
66	       2.4.4.  Content Hints
67	         2.4.4.1.  Encoded CBOR data item
68	         2.4.4.2.  Expected Later Encoding for CBOR to JSON
69	                   Converters
70	         2.4.4.3.  Encoded Text
71	       2.4.5.  Self-describe CBOR
72	   3.  Creating CBOR-Based Protocols
73	     3.1.  CBOR in Streaming Applications
74	     3.2.  Generic Encoders and Decoders
75	     3.3.  Syntax Errors
76	       3.3.1.  Incomplete CBOR data items
77	       3.3.2.  Malformed Indefinite Length Items
78	       3.3.3.  Unknown Additional Information Values
79	     3.4.  Other Decoding Errors
80	     3.5.  Handling Unknown Simple Values and Tags
81	     3.6.  Numbers
82	     3.7.  Specifying Keys for Maps
83	     3.8.  Undefined Values
84	     3.9.  Canonical CBOR
85	     3.10. Strict Mode
86	   4.  Converting Data Between CBOR and JSON
87	     4.1.  Converting From CBOR to JSON
88	     4.2.  Converting From JSON to CBOR
90	   5.  Future Evolution of CBOR
91	     5.1.  Extension Points
92	     5.2.  Curating the Additional Information Space
93	   6.  Diagnostic Notation
94	     6.1.  Encoding indicators
95	   7.  IANA Considerations
96	     7.1.  Simple Values Registry
97	     7.2.  Tags Registry
98	     7.3.  Media Type ("MIME Type")
99	     7.4.  CoAP Content-Format
100	     7.5.  The +cbor Structured Syntax Suffix Registration
101	   8.  Security Considerations
102	   9.  Acknowledgements
103	   10. References
104	     10.1.  Normative References
105	     10.2.  Informative References
106	   Appendix A.  Examples
107	   Appendix B.  Jump Table
108	   Appendix C.  Pseudocode
109	   Appendix D.  Half-precision
110	   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
111	                Objectives
112	     E.1.  ASN.1 DER, BER, and PER
113	     E.2.  MessagePack
114	     E.3.  BSON
115	     E.4.  UBJSON
116	     E.5.  MSDTP: RFC 713
117	     E.6.  Conciseness On The Wire
118	   Authors' Addresses

120	1.  Introduction

122	   There are hundreds of standardized formats for binary representation
123	   of structured data (also known as binary serialization formats).  Of
124	   those, some are for specific domains of information, while others are
125	   generalized for arbitrary data.  In the IETF, probably the best-known
126	   formats in the latter category are ASN.1's BER and DER [ASN.1].

128	   The format defined here follows some specific design goals that are
129	   not well met by current formats.  The underlying data model is an
130	   extended version of the JSON data model [RFC4627].  It is important
131	   to note that this is not a proposal that the grammar in RFC 4627 be
132	   extended in general, since doing so would cause a significant
133	   backwards incompatibility with already-deployed JSON documents.
134	   Instead, this document simply defines its own data model which starts
135	   from JSON.

137	   Appendix E lists some existing binary formats and discusses how well
138	   they do or do not fit the design objectives of CBOR.

140	1.1.  Objectives

142	   The objectives of the Concise Binary Object Representation (CBOR),
143	   roughly in decreasing order of importance, are:

145	   1.  The representation must be able to unambiguously encode most
146	       common data formats used in Internet standards.

148	       *  Representing a reasonable set of basic data types and
149	          structures using binary encoding.  "Reasonable" here is
150	          largely influenced by the capabilities of JSON, with the major
151	          addition of binary byte strings.  The structures supported are
152	          limited to arrays and trees; loops and lattice-style graphs
153	          are not supported.

155	       *  There is no requirement that all data formats be uniquely
156	          encoded; that is, it is acceptable that the number "7" might
157	          be encoded in multiple different ways.

159	   2.  The code for an encoder or decoder must be able to be compact in
160	       order to support systems with very limited memory, processor
161	       power, and instruction sets.

163	       *  An encoder and a decoder need to be implementable in a very
164	          small amount of code (for example, in class 1 constrained
165	          nodes as defined in [I-D.ietf-lwig-terminology]).

167	       *  The format should use contemporary machine representations of
168	          data (for example, not requiring binary-to-decimal
169	          conversion).

171	   3.  Data must be able to be decoded without a schema description.

173	       *  Similar to JSON, encoded data should be self-describing so
174	          that a generic decoder can be written.

176	   4.  The serialization must be reasonably compact, but data
177	       compactness is secondary to code compactness for the encoder and
178	       decoder.

180	       *  "Reasonable" here means that JSON sets the upper bound on
181	          size, while implementation complexity limits how small the
182	          encoding can get.  Using either general compression schemes or
183	          extensive bit-fiddling violates the complexity goals.

185	   5.  The format must be applicable to both constrained nodes and high-
186	       volume applications.

188	       *  This means it must be reasonably frugal in CPU usage for both
189	          encoding and decoding.  This is relevant both for constrained
190	          nodes and for potential usage in applications with a very high
191	          volume of data.

193	   6.  The format must support all JSON data types for conversion to and
194	       from JSON.

196	       *  It must support a reasonable level of conversion as long as
197	          the data represented are within the capabilities of JSON.  It
198	          must be possible to define a unidirectional mapping towards
199	          JSON for all types of data.

201	   7.  The format must be extensible, with the extended data being able
202	       to be decoded by earlier decoders.

204	       *  The format is designed for decades of use.

206	       *  The format must support a form of extensibility that allows
207	          fallback so that a decoder that does not understand an
208	          extension can still decode the message.

210	       *  The format must be able to be extended in the future by later
211	          IETF standards.

213	1.2.  Terminology

215	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
216	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
217	   document are to be interpreted as described in RFC 2119, BCP 14
218	   [RFC2119] and indicate requirement levels for compliant CBOR
219	   implementations.

221	   The term "byte" is used in its now-customary sense as a synonym for
222	   "octet".  All multi-byte values are encoded in network byte order
223	   (that is, most significant byte first, also known as "big-endian").

225	   This specification makes use of the following terminology:

227	   Data item:  A single piece of CBOR data.  The structure of a data
228	      item may contain zero, one or more nested data items.  The term is
229	      used both for the data item in representation format and for the
230	      abstract idea that can be derived from that by a decoder.

232	   Decoder:  A process that decodes a CBOR data item and makes it
233	      available to an application.  Formally speaking, a decoder
234	      contains a parser to break up the input using the syntax rules of
235	      CBOR, as well as a semantic processor to prepare the data in a
236	      form suitable to the application.

238	   Encoder:  A process that generates the representation format of a
239	      CBOR data item from application information.

241	   Data Stream:  A sequence of zero or more data items, not further
242	      assembled into a larger containing data item.  The independent
243	      data items that make up a data stream are sometimes also referred
244	      to as "top-level data items".

246	   Well-formed:  A data item that follows the syntactic structure of
247	      CBOR.  A well-formed data item uses the initial bytes and the byte
248	      strings and/or data items that are implied by their values as
249	      defined in CBOR and is not followed by extraneous data.

251	   Valid:  A data item that is well-formed and also follows the semantic
252	      restrictions that apply to CBOR data items.

254	   Stream decoder:  A process that decodes a data stream and makes each
255	      of the data items in the sequence available to an application as
256	      they are received.

258	   Where bit arithmetic or data types are explained, this document uses
259	   the notation familiar from the programming language C, except that **
260	   denotes exponentiation.  Similar to the "0x" notation for hexadecimal
261	   numbers, numbers in binary notation are prefixed with "0b".
262	   Underscores can be added to such a number solely for readability, so
263	   0b00100001 (0x21) might be written 0b001_00001 to emphasize the
264	   desired interpretation of the bits in the byte, in this case split
265	   into three bits and five bits.

267	2.  Specification of the CBOR Encoding

269	   A CBOR encoded data item is structured and encoded as described in
270	   this section.  The encoding is summarized in Table 4.

272	   The initial byte of each data item contains both information about
273	   the major type (the high-order 3 bits, described in Section 2.1) and
274	   additional information (the low-order 5 bits).  When the value of the
275	   additional information is less than 24, it is directly used as a
276	   small unsigned integer.  When it is 24 to 27, the additional bytes
277	   for a variable-length integer immediately follow; the values 24 to 27
278	   of the additional information specify that its length is a 1-, 2-, 4-
279	   or 8-byte unsigned integer, respectively.  Additional information
280	   value 31 is used for indefinite length items, described in
281	   Section 2.2.  Additional information values 28 to 30 are reserved for
282	   future expansion.

284	   In all additional information values, the resulting integer is
285	   interpreted depending on the major type.  It may represent the actual
286	   data: for example, in integer types the resulting integer is used for
287	   the value itself.  It may instead supply length information: for
288	   example, in byte strings it gives the length of the byte string data
289	   that follows.
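
   As an illustration only (not part of the specification), the split
   of the initial byte described above can be sketched in Python.  The
   function name decode_head and the shape of its return value are
   assumptions made for this sketch; interpreting the resulting integer
   according to the major type is left to the caller.

```python
def decode_head(data: bytes, offset: int = 0):
    """Read the initial byte and any argument bytes that follow it.

    Returns (major_type, additional_info, argument, next_offset).
    argument is None for additional information value 31.
    """
    initial = data[offset]
    major_type = initial >> 5        # high-order 3 bits
    info = initial & 0b000_11111     # low-order 5 bits
    offset += 1
    if info < 24:
        # The additional information is itself the small unsigned integer.
        return major_type, info, info, offset
    if info <= 27:
        size = 1 << (info - 24)      # 24..27 -> 1, 2, 4 or 8 bytes
        raw = data[offset:offset + size]
        if len(raw) != size:
            raise ValueError("incomplete data item")
        # Multi-byte values are in network byte order (big-endian).
        return major_type, info, int.from_bytes(raw, "big"), offset + size
    if info == 31:
        # Indefinite length item (or the "break" stop code in major type 7).
        return major_type, info, None, offset
    raise ValueError("additional information values 28 to 30 are reserved")
```

   With this sketch, the byte sequences 0x07, 0x1807 and 0x190007 all
   yield major type 0 with the integer 7; only the number of bytes
   consumed differs.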

291	   A CBOR decoder implementation can be based on a jump table with all
292	   256 defined values for the initial byte (Table 4).  A decoder in a
293	   constrained implementation can instead use the structure of the
294	   initial byte and following bytes for more compact code (see
295	   Appendix C for a rough impression of what this could look like).

297	2.1.  Major Types

299	   The following lists the major types and the additional information
300	   and other bytes associated with the type.

302	   Major type 0:  an unsigned integer.  The 5-bit additional information
303	      is either the integer itself (for additional information values 0
304	      through 23), or the length of additional data.  Additional
305	      information 24 means the value is represented in an additional
306	      uint8_t, 25 means a uint16_t, 26 means a uint32_t, and 27 means a
307	      uint64_t.  For example, the integer 10 is denoted as the one byte
308	      0b000_01010 (major type 0, additional information 10).  The
309	      integer 500 would be 0b000_11001 (major type 0, additional
310	      information 25) followed by the two bytes 0x01f4, which is 500 in
311	      decimal.

313	   Major type 1:  a negative integer.  The encoding follows the rules
314	      for unsigned integers (major type 0), except that the value is
315	      then -1 minus the encoded unsigned integer.  For example, the
316	      integer -500 would be 0b001_11001 (major type 1, additional
317	      information 25) followed by the two bytes 0x01f3, which is 499 in
318	      decimal.

320	   Major type 2:  a byte string.  The string's length in bytes is
321	      represented following the rules for unsigned integers (major type
322	      0).  For example, a byte string whose length is 5 would have an
323	      initial byte of 0b010_00101 (major type 2, additional information
324	      5 for the length), followed by 5 bytes of binary content.  A byte
325	      string whose length is 500 would have 3 initial bytes:
326	      0b010_11001 (major type 2, additional information 25 to indicate a
327	      two-byte length) and the two bytes 0x01f4 for a length of 500.
328	      These are followed by 500 bytes of binary content.

330	   Major type 3:  a text string, specifically a string of Unicode
331	      characters that is encoded as UTF-8 [RFC3629].  The format of this
332	      type is identical to that of byte strings (major type 2), that is,
333	      as with major type 2, the length gives the number of bytes.  This
334	      type is provided for systems that need to interpret or display
335	      human-readable text, and allows the differentiation between
336	      unstructured bytes and text that has a specified repertoire and
337	      encoding.  In contrast to formats such as JSON, the Unicode
338	      characters in this type are never escaped.  Thus, a newline
339	      character (U+000A) is always represented in a string as the byte
340	      0x0a, and never as the bytes 0x5c6e (the characters "\" and "n")
341	      or as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and
342	      "a").

344	   Major type 4:  an array of data items.  Arrays are also called lists,
345	      sequences, or tuples.  The array's length follows the rules for
346	      byte strings (major type 2), except that the length denotes the
347	      number of data items, not the length in bytes that the array takes
348	      up.  Items in an array do not need to all be of the same type.
349	      For example, an array that contains 10 items of any type would
350	      have an initial byte of 0b100_01010 (major type of 4, additional
351	      information of 10 for the length) followed by the 10 remaining
352	      items.

354	   Major type 5:  a map of pairs of data items.  Maps are also called
355	      tables, dictionaries, hashes, or objects (in JSON).  A map is
356	      made up of pairs of data items, each pair consisting of a key
357	      that is immediately followed by a value.  The map's length
358	      follows the rules for byte strings (major type 2), except that the
359	      length denotes the number of pairs, not the length in bytes that
360	      the map takes up.  For example, a map that contains 9 pairs would
361	      have an initial byte of 0b101_01001 (major type of 5, additional
362	      information of 9 for the number of pairs) followed by the 18
363	      remaining items.  The first item is the first key, the second item
364	      is the first value, the third item is the second key, and so on.
365	      A map that has duplicate keys may be well-formed but it is not
366	      valid, and thus it causes indeterminate decoding; see also
367	      Section 3.7.

369	   Major type 6:  optional semantic tagging of other major types.  See
370	      Section 2.4.

372	   Major type 7:  floating point numbers and simple data types that need
373	      no content, as well as the "break" stop code.  See Section 2.3.
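
   The worked examples in the list above can be reproduced with a small
   encoder.  The following Python sketch is illustrative only; the
   names encode_head, encode_int, encode_bytes and encode_text are
   assumptions introduced here, and choosing the shortest argument that
   fits is a choice of this sketch rather than a requirement stated in
   this section.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode the initial byte plus the shortest argument that fits."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be 0..7")
    if argument < 0:
        raise ValueError("argument must be unsigned")
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(value: int) -> bytes:
    """Major type 0 for value >= 0, major type 1 (-1 minus n) otherwise."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)


def encode_bytes(value: bytes) -> bytes:
    """Major type 2: length in bytes, then the content."""
    return encode_head(2, len(value)) + value


def encode_text(value: str) -> bytes:
    """Major type 3: length of the UTF-8 form in bytes, never escaped."""
    utf8 = value.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8
```

   For instance, encode_int(500) gives 0x1901f4, encode_int(-500) gives
   0x3901f3, and encode_text("\n") gives 0x610a, with the newline
   carried as the single byte 0x0a.  The head of a 10-item array is
   encode_head(4, 10), and the head of a 9-pair map is
   encode_head(5, 9).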

375	   These eight major types lead to a simple table showing which of the
376	   256 possible values for the initial byte of a data item are used
377	   (Table 4).

379	   In major types 6 and 7, many of the possible values are reserved for
380	   future specification.  See Section 7 for more information on these
381	   values.

383	2.2.  Indefinite Lengths for Some Major Types

385	   Four CBOR items (arrays, maps, byte strings, and text strings) can be
386	   encoded with an indefinite length using additional information value
387	   31.  This is useful if the encoding of the item needs to begin before
388	   the number of items inside the array or map, or the total length of
389	   the string, is known.  (The application of this is often referred to
390	   as "streaming" within a data item.)

392	   Indefinite length arrays and maps are dealt with differently than
393	   indefinite length byte strings and text strings.

395	2.2.1.  Indefinite Length Arrays and Maps

397	   Indefinite length arrays and maps are simply opened without
398	   indicating the number of data items that will be included in the
399	   array or map, using the additional information value of 31.  The
400	   initial major type and additional information byte is followed by the
401	   elements of the array or map, just as they would be in other arrays
402	   or maps.  The end of the array or map is indicated by encoding a
403	   "break" stop code in a place where the next data item would normally
404	   have been included.  "Break" is encoded with major type 7 and
405	   additional information value 31 (0b111_11111), but is not itself a
406	   data item: it is just a syntactic feature to close the array or map.
407	   That is, the "break" stop code comes after the last item in the array
408	   or map, and cannot occur anywhere else in place of a data item.  In
409	   this way, indefinite length arrays and maps look identical to other
410	   arrays and maps except for beginning with the additional information
411	   value 31 and ending with the "break" stop code.

413	   Arrays and maps with indefinite lengths allow any number of items
414	   (for arrays) and key/value pairs (for maps) to be given before the
415	   "break" stop code.  There is no restriction against nesting
416	   indefinite length array or map items.  A "break" only terminates a
417	   single item, so nested indefinite length items need exactly as many
418	   "break" stop codes as there are type bytes starting an indefinite
419	   length item.

421	   For example, assume an encoder wants to represent the abstract array
422	   [1, [2, 3], [4, 5]].  The definite-length encoding would be
423	   0x8301820203820405:

425	   83        -- Array of length 3
426	      01     -- 1
427	      82     -- Array of length 2
428	         02  -- 2
429	         03  -- 3
430	      82     -- Array of length 2
431	         04  -- 4
432	         05  -- 5

434	   Indefinite length encoding could be applied independently to each of
435	   the three arrays encoded in this data item, as required, leading to
436	   representations such as:

438	   0x9f018202039f0405ffff
439	   9F        -- Start indefinite length array
440	      01     -- 1
441	      82     -- Array of length 2
442	         02  -- 2
443	         03  -- 3
444	      9F     -- Start indefinite length array
445	         04  -- 4
446	         05  -- 5
447	         FF  -- "break" (inner array)
448	      FF     -- "break" (outer array)

450	   0x9f01820203820405ff
451	   9F        -- Start indefinite length array
452	      01     -- 1
453	      82     -- Array of length 2
454	         02  -- 2
455	         03  -- 3
456	      82     -- Array of length 2
457	         04  -- 4
458	         05  -- 5
459	      FF     -- "break"

461	   0x83018202039f0405ff
462	   83        -- Array of length 3
463	      01     -- 1
464	      82     -- Array of length 2
465	         02  -- 2
466	         03  -- 3
467	      9F     -- Start indefinite length array
468	         04  -- 4
469	         05  -- 5
470	         FF  -- "break"

472	   0x83019f0203ff820405
473	   83        -- Array of length 3
474	      01     -- 1
475	      9F     -- Start indefinite length array
476	         02  -- 2
477	         03  -- 3
478	         FF  -- "break"
479	      82     -- Array of length 2
480	         04  -- 4
481	         05  -- 5

483	   An example of an indefinite length map (that happens to have two key/
484	   value pairs) might be:

486	   0xbf6346756ef563416d7421ff
487	   BF           -- Start indefinite length map
488	      63        -- First key, UTF-8 string length 3
489	         46756e --   "Fun"
490	      F5        -- First value, true
491	      63        -- Second key, UTF-8 string length 3
492	         416d74 --   "Amt"
493	      21        -- -2
494	      FF        -- "break"
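
   To illustrate how the "break" stop code interacts with nesting, the
   following Python sketch decodes just enough of CBOR to handle the
   examples in this section: integers, definite length text strings,
   the simple values false and true, and arrays and maps of either
   kind.  The names decode_item and BREAK are assumptions made for this
   sketch, and everything it does not handle is rejected.

```python
BREAK = object()  # sentinel for the "break" stop code; not a data item


def decode_item(data: bytes, offset: int = 0, allow_break: bool = False):
    """Decode one data item; returns (value, next_offset)."""
    initial = data[offset]
    major_type, info = initial >> 5, initial & 0x1F
    offset += 1
    if initial == 0xFF:
        if not allow_break:
            raise ValueError('"break" where a data item is required')
        return BREAK, offset
    if info < 24:
        argument = info
    elif info <= 27:
        size = 1 << (info - 24)
        argument = int.from_bytes(data[offset:offset + size], "big")
        offset += size
    elif info == 31 and major_type in (4, 5):
        argument = None              # indefinite length array or map
    else:
        raise ValueError("additional information not handled in this sketch")

    if major_type == 0:
        return argument, offset
    if major_type == 1:
        return -1 - argument, offset
    if major_type == 3:
        end = offset + argument
        return data[offset:end].decode("utf-8"), end
    if major_type == 4:
        items = []
        while argument is None or len(items) < argument:
            value, offset = decode_item(data, offset, allow_break=argument is None)
            if value is BREAK:
                break                # closes only this array
            items.append(value)
        return items, offset
    if major_type == 5:
        result, pairs = {}, 0
        while argument is None or pairs < argument:
            key, offset = decode_item(data, offset, allow_break=argument is None)
            if key is BREAK:
                break                # only allowed in place of a key
            value, offset = decode_item(data, offset)
            result[key] = value
            pairs += 1
        return result, offset
    if major_type == 7 and info in (20, 21):
        return info == 21, offset
    raise ValueError("major type not handled in this sketch")
```

   Under these assumptions, all five array encodings shown above decode
   to the same nested list [1, [2, 3], [4, 5]], and the map example
   decodes to a two-entry dictionary with the keys "Fun" and "Amt".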

496	2.2.2.  Indefinite Length Byte Strings and Text Strings

498	   Indefinite length byte strings and text strings are actually a
499	   concatenation of zero or more definite length byte or text strings
500	   ("chunks") that are together treated as one contiguous string.
501	   Indefinite length strings are opened with the major type and
502	   additional information value of 31, but what follows is a series of
503	   byte or text strings that have definite lengths (the chunks).  The
504	   end of the series of chunks is indicated by encoding the "break" stop
505	   code (0b111_11111) in a place where the next chunk in the series
506	   would occur.  The contents of the chunks are concatenated together,
507	   and the overall length of the indefinite length string will be the
508	   sum of the lengths of all of the chunks.  In summary, an indefinite
509	   length string is encoded similarly to how an indefinite length array
510	   of its chunks would be encoded, except that the major type of the
511	   indefinite length string is that of a (text or byte) string and
512	   matches the major types of its chunks.

514	   For indefinite length byte strings, every data item (chunk) between
515	   the indefinite length indicator and the "break" MUST be a definite
516	   length byte string item; if the parser sees any item type other than
517	   a byte string before it sees the "break", it is an error.

519	   For example, assume the sequence:

521	   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

523	   5F              -- Start indefinite length byte string
524	      44           -- Byte string of length 4
525	         aabbccdd  -- Bytes content
526	      43           -- Byte string of length 3
527	         eeff99    -- Bytes content
528	      FF           -- "break"

530	   After decoding, this results in a single byte string with seven
531	   bytes: 0xaabbccddeeff99.
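
   The chunk handling can be sketched as follows.  This Python fragment
   is an illustration only; the function name decode_indefinite_bytes
   is an assumption, and the sketch ignores any data that follows the
   "break" stop code.

```python
def decode_indefinite_bytes(data: bytes) -> bytes:
    """Concatenate the chunks of an indefinite length byte string."""
    if not data or data[0] != 0b010_11111:
        raise ValueError("not an indefinite length byte string")
    offset = 1
    chunks = []
    while True:
        if offset >= len(data):
            raise ValueError('missing "break" stop code')
        initial = data[offset]
        offset += 1
        if initial == 0b111_11111:       # "break" ends the series of chunks
            return b"".join(chunks)
        major_type, info = initial >> 5, initial & 0x1F
        if major_type != 2 or info == 31:
            raise ValueError("chunk is not a definite length byte string")
        if info < 24:
            length = info
        elif info <= 27:
            size = 1 << (info - 24)
            length = int.from_bytes(data[offset:offset + size], "big")
            offset += size
        else:
            raise ValueError("additional information values 28 to 30 are reserved")
        chunk = data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("incomplete chunk")
        chunks.append(chunk)
        offset += length
```

   Applied to the sequence above (0x5f44aabbccdd43eeff99ff), the sketch
   returns the seven bytes 0xaabbccddeeff99.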

533	   Text strings with indefinite lengths act the same as byte strings
534	   with indefinite lengths, except that all their chunks MUST be
535	   definite length text strings.  Note that this implies that the bytes
536	   of a single UTF-8 character cannot be spread between chunks: a new
537	   chunk can only be started at a character boundary.
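
   One way for an encoder to respect the character-boundary rule is to
   split the text by characters before converting each piece to UTF-8,
   rather than splitting the UTF-8 bytes.  The Python sketch below
   takes that approach; the name text_chunks and the max_chars
   parameter are assumptions, and for brevity it only handles chunks
   shorter than 24 bytes.

```python
def text_chunks(text: str, max_chars: int) -> bytes:
    """Encode text as an indefinite length text string, chunked by characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    out = bytearray([0b011_11111])           # start indefinite length text string
    for start in range(0, len(text), max_chars):
        # Slicing a str works on characters, so no UTF-8 sequence is split.
        chunk = text[start:start + max_chars].encode("utf-8")
        if len(chunk) >= 24:
            raise ValueError("sketch only handles chunks shorter than 24 bytes")
        out.append(0b011_00000 | len(chunk))  # definite length text chunk
        out += chunk
    out.append(0b111_11111)                   # "break" stop code
    return bytes(out)
```

   For the three-character string "a", U+00E9, "b" with max_chars set
   to 2, the first chunk holds the three bytes 0x61c3a9 and the second
   holds 0x62, so the two-byte UTF-8 form of U+00E9 stays within one
   chunk.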

539	2.3.  Floating Point Numbers and Values with No Content

541	   Major type 7 is for two types of data: floating point numbers and
542	   "simple values" that do not need any content.  Each value of the
543	   5-bit additional information in the initial byte has its own separate
544	   meaning, as defined in Table 1.  Like the major types for integers,
545	   items of this major type do not carry content data; all the
546	   information is in the initial bytes.

548	    +-------------+--------------------------------------------------+
549	    | 5-bit value | semantics                                        |
550	    +-------------+--------------------------------------------------+
551	    | 0..23       | Simple value (value 0..23)                       |
552	    |             |                                                  |
553	    | 24          | Simple value (value 32..255 in following byte)   |
554	    |             |                                                  |
555	    | 25          | IEEE 754 Half-Precision Float (16 bits follow)   |
556	    |             |                                                  |
557	    | 26          | IEEE 754 Single-Precision Float (32 bits follow) |
558	    |             |                                                  |
559	    | 27          | IEEE 754 Double-Precision Float (64 bits follow) |
560	    |             |                                                  |
561	    | 28-30       | (Unassigned)                                     |
562	    |             |                                                  |
563	    | 31          | "break" stop code for indefinite length items    |
564	    +-------------+--------------------------------------------------+

566	        Table 1: Values for Additional Information in Major Type 7

568	   As with all other major types, the 5-bit value 24 signifies a single-
569	   byte extension: it is followed by an additional byte to represent the
570	   simple value (to minimize confusion, only the values 32 to 255 are
571	   used).  This maintains the structure of the initial bytes: as for the
572	   other major types, the length of these always depends on the
573	   additional information in the first byte.  Table 2 lists the values
574	   assigned and available for simple types.

576	                       +---------+-----------------+
577	                       | value   | semantics       |
578	                       +---------+-----------------+
579	                       | 0..19   | (Unassigned)    |
580	                       |         |                 |
581	                       | 20      | False           |
582	                       |         |                 |
583	                       | 21      | True            |
584	                       |         |                 |
585	                       | 22      | Null            |
586	                       |         |                 |
587	                       | 23      | Undefined value |
588	                       |         |                 |
589	                       | 24..31  | (reserved)      |
590	                       |         |                 |
591	                       | 32..255 | (Unassigned)    |
592	                       +---------+-----------------+

594	                          Table 2: Simple Values

596	   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
597	   IEEE 754 binary floating point values.  These floating point values
598	   are encoded in the additional bytes of the appropriate size.  (See
599	   Appendix D for some information about 16-bit floating point.)
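
As a rough illustration of Table 1, the following Python sketch classifies an item of major type 7 from its additional information. The function name decode_major7 and the tuple-based return convention are assumptions made for this example; the sketch relies on the standard struct module, whose "e", "f", and "d" format codes correspond to the 16-bit, 32-bit, and 64-bit IEEE 754 binary formats.

```python
import struct

# struct format and payload size for additional information 25, 26, 27
_FLOAT_FORMATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}


def decode_major7(data: bytes):
    """Return ("simple", n), ("float", x), or ("break", None)."""
    if not data or data[0] >> 5 != 7:
        raise ValueError("not a major type 7 item")
    info = data[0] & 0x1F
    if info < 24:
        return ("simple", info)  # e.g. 20 False, 21 True, 22 Null
    if info == 24:
        if len(data) < 2:
            raise ValueError("truncated simple value")
        if data[1] < 32:
            raise ValueError("simple values below 32 use the short form")
        return ("simple", data[1])
    if info in _FLOAT_FORMATS:
        fmt, size = _FLOAT_FORMATS[info]
        if len(data) < 1 + size:
            raise ValueError("truncated floating point value")
        return ("float", struct.unpack(fmt, data[1:1 + size])[0])
    if info == 31:
        return ("break", None)
    raise ValueError("unassigned additional information")
```

With this sketch, the bytes f9 3c 00 decode to the half-precision value 1.0 and the single byte f5 decodes to simple value 21 (True).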

601	2.4.  Optional Tagging of Items

603	   In CBOR, a data item can optionally be preceded by a tag to give it
604	   additional semantics while retaining its structure.  The tag is major
605	   type 6, and represents an integer number as indicated by the tag's
606	   integer value; the (sole) data item is carried as content data.  If a
607	   tag requires structured data, this structure is encoded into the
608	   nested data item.  The definition of a tag usually restricts what
609	   kinds of nested data item or items can be carried by a tag.

611	   The initial bytes of the tag follow the rules for positive integers
612	   (major type 0).  The tag is followed by a single data item of any
613	   type.  For example, assume that a byte string of length 12 is marked
614	   with a tag to indicate it is a positive bignum.  This would be marked
615	   as 0b110_00010 (major type 6, additional information 2 for the tag)
616	   followed by 0b010_01100 (major type 2, additional information of 12
617	   for the length) followed by the 12 bytes of the bignum.

619	   Decoders do not need to understand tags, and thus tags may be of
620	   little value in applications where the implementation creating a
621	   particular CBOR data item and the implementation decoding that stream
622	   know the semantic meaning of each item in the data flow.  Their
623	   primary purpose in this specification is to define common data types
624	   such as dates.  A secondary purpose is to allow optional tagging when
625	   the decoder is a generic CBOR decoder that might be able to benefit
626	   from hints about the content of items.  Understanding the semantic
627	   tags is optional for a decoder; it can just jump over the initial
628	   bytes of the tag and interpret the tagged data item itself.
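
A decoder that does not interpret tags can step over them as described above. The following Python sketch shows one way to do this; the name skip_tags and its return convention are hypothetical, chosen only for illustration.

```python
def skip_tags(data: bytes, pos: int = 0):
    """Return (tags, pos): the tag values found and the offset of the
    tagged data item that follows them."""
    tags = []
    while pos < len(data) and data[pos] >> 5 == 6:
        info = data[pos] & 0x1F
        pos += 1
        if info < 24:
            tag = info  # tag value is in the initial byte
        elif info <= 27:
            size = 1 << (info - 24)  # 1, 2, 4, or 8 bytes follow
            if pos + size > len(data):
                raise ValueError("truncated tag")
            tag = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("additional information not usable for a tag")
        tags.append(tag)
    return tags, pos
```

For the bignum example above, skip_tags returns ([2], 1): one tag with value 2, with the byte string starting at offset 1. Because the loop continues while it sees major type 6, nested tags are collected outermost first.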

630	   A tag always applies to the data item that directly follows it.
631	   Thus, if tag A is followed by tag B which is followed by data item C,
632	   tag A applies to the result of applying tag B on data item C.  That
633	   is, a tagged item is a data item consisting of a tag and a value.
634	   The content of the tagged item is the data item (the value) that is
635	   being tagged.

637	   IANA maintains a registry of tag values as described in Section 7.2.
638	   Table 3 provides a list of initial values, with definitions in the
639	   rest of this section.

641	   +--------------+------------------+---------------------------------+
642	   | tag          | data item        | semantics                       |
643	   +--------------+------------------+---------------------------------+
644	   | 0            | UTF-8 string     | Standard date/time string; see  |
645	   |              |                  | Section 2.4.1                   |
646	   |              |                  |                                 |
647	   | 1            | multiple         | Epoch-based date/time; see      |
648	   |              |                  | Section 2.4.1                   |
649	   |              |                  |                                 |
650	   | 2            | byte string      | Positive bignum; see Section    |
651	   |              |                  | 2.4.2                           |
652	   |              |                  |                                 |
653	   | 3            | byte string      | Negative bignum; see Section    |
654	   |              |                  | 2.4.2                           |
655	   |              |                  |                                 |
656	   | 4            | array            | Decimal fraction; see Section   |
657	   |              |                  | 2.4.3                           |
658	   |              |                  |                                 |
659	   | 5            | array            | Bigfloat; see Section 2.4.3     |
660	   |              |                  |                                 |
661	   | 6..20        | (Unassigned)     | (Unassigned)                    |
662	   |              |                  |                                 |
663	   | 21           | multiple         | Expected conversion to          |
664	   |              |                  | base64url encoding; see Section |
665	   |              |                  | 2.4.4.2                         |
666	   |              |                  |                                 |
667	   | 22           | multiple         | Expected conversion to base64   |
668	   |              |                  | encoding; see Section 2.4.4.2   |
669	   |              |                  |                                 |
670	   | 23           | multiple         | Expected conversion to base16   |
671	   |              |                  | encoding; see Section 2.4.4.2   |
672	   |              |                  |                                 |
673	   | 24           | byte string      | Encoded CBOR data item; see     |
674	   |              |                  | Section 2.4.4.1                 |
675	   |              |                  |                                 |
676	   | 25..31       | (Unassigned)     | (Unassigned)                    |
677	   |              |                  |                                 |
678	   | 32           | UTF-8 string     | URI; see Section 2.4.4.3        |
679	   |              |                  |                                 |
680	   | 33           | UTF-8 string     | Base64url; see Section 2.4.4.3  |
681	   |              |                  |                                 |
682	   | 34           | UTF-8 string     | Base64; see Section 2.4.4.3     |
683	   |              |                  |                                 |
684	   | 35           | UTF-8 string     | Regular expression; see Section |
685	   |              |                  | 2.4.4.3                         |
686	   |              |                  |                                 |
687	   | 36           | UTF-8 string     | MIME message; see Section       |
688	   |              |                  | 2.4.4.3                         |
689	   |              |                  |                                 |
690	   | 37..55798    | (Unassigned)     | (Unassigned)                    |
691	   |              |                  |                                 |
692	   | 55799        | multiple         | Self-describe CBOR; see Section |
693	   |              |                  | 2.4.5                           |
694	   |              |                  |                                 |
695	   | 55800+       | (Unassigned)     | (Unassigned)                    |
696	   +--------------+------------------+---------------------------------+

698	                         Table 3: Values for tags

700	2.4.1.  Date and Time

702	   Tag value 0 is for date/time strings that follow the standard format
703	   described in [RFC3339], as refined by Section 3.3 of [RFC4287].

705	   Tag value 1 is for numerical representation of seconds relative to
706	   1970-01-01T00:00Z in UTC time.  (For the non-negative values that
707	   POSIX defines, the number of seconds is counted in the same way as
708	   for POSIX "seconds since the epoch" [TIME_T].)  The tagged item can
709	   be a positive or negative integer (major types 0 and 1), or a
710	   floating point number (major type 7 with additional information 25,
711	   26 or 27).  Note that the number can be negative (time before
712	   1970-01-01T00:00Z) and, if a floating point number, indicate
713	   fractional seconds.
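
As an illustration of tag value 1, the following Python sketch converts the tagged number to a UTC timestamp. The function name epoch_to_datetime is an assumption for this example. Adding a timedelta to a fixed epoch is used here because it handles negative and fractional values uniformly.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_datetime(seconds):
    """Convert the content of a tag 1 item (int or float) to UTC."""
    return EPOCH + timedelta(seconds=seconds)
```

For instance, epoch_to_datetime(1363896240) corresponds to 2013-03-21T20:04:00Z, and epoch_to_datetime(-1) is one second before the epoch.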

715	2.4.2.  Bignums

717	   Bignums are integers that do not fit into the basic integer
718	   representations provided by major types 0 and 1.  They are encoded as
719	   a byte string data item, which is interpreted as an unsigned integer
720	   n in network byte order.  For tag value 2, the value of the bignum is
721	   n.  For tag value 3, the value of the bignum is -1 - n.  Decoders
722	   that understand these tags MUST be able to decode bignums that have
723	   leading zeroes.

725	   For example, the number 18446744073709551616 (2**64) is represented
726	   as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major
727	   type 2, length 9), followed by 0x010000000000000000 (one byte 0x01
728	   and eight bytes 0x00).  In hexadecimal:

730	   C2                        -- Tag 2
731	      49                     -- Byte string of length 9
732	         010000000000000000  -- Bytes content
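
The bignum rules can be sketched in Python as follows. The helper names encode_bignum and decode_bignum are hypothetical, and the sketch only handles byte strings shorter than 24 bytes so that the length fits in the initial byte (0b010_01001, that is 0x49, for the nine-byte example).

```python
def encode_bignum(value: int) -> bytes:
    """Encode an integer as a tag 2 or tag 3 bignum (short form only)."""
    tag = 0xC2 if value >= 0 else 0xC3
    n = value if value >= 0 else -1 - value  # tag 3 carries -1 - value
    payload = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    if len(payload) >= 24:
        raise ValueError("this sketch handles byte strings under 24 bytes")
    return bytes([tag, 0x40 | len(payload)]) + payload


def decode_bignum(data: bytes) -> int:
    """Decode a short-form bignum; leading zero bytes are accepted."""
    if len(data) < 2 or data[0] not in (0xC2, 0xC3):
        raise ValueError("not a bignum tag")
    head = data[1]
    if head >> 5 != 2 or (head & 0x1F) >= 24:
        raise ValueError("expected a short definite length byte string")
    length = head & 0x1F
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated bignum")
    n = int.from_bytes(payload, "big")  # network byte order
    return n if data[0] == 0xC2 else -1 - n
```

Here encode_bignum(2**64) produces c2 49 01 00 00 00 00 00 00 00 00, and decode_bignum accepts the same value with an extra leading zero byte.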

734	2.4.3.  Decimal Fractions and Bigfloats

736	   Decimal fractions combine an integer mantissa with a base-10 scaling
737	   factor.  They are most useful if an application needs the exact
738	   representation of a decimal fraction such as 1.1 because there is no
739	   exact representation for many decimal fractions in binary floating
740	   point.

742	   Bigfloats combine an integer mantissa with a base-2 scaling factor.
743	   They are binary floating point values that can exceed the range or
744	   the precision of the three IEEE 754 formats supported by CBOR
745	   (Section 2.3).  Bigfloats may also be used by constrained
746	   applications that need some basic binary floating point capability
747	   without the need for supporting IEEE 754.

749	   A decimal fraction or a bigfloat is represented as a tagged array
750	   that contains exactly two integer numbers: an exponent e and a
751	   mantissa m.  Decimal fractions (tag 4) use base-10 exponents; the
752	   value of a decimal fraction data item is m*(10**e).  Bigfloats (tag
753	   5) use base-2 exponents; the value of a bigfloat data item is
754	   m*(2**e).  The exponent e MUST be represented in an integer of major
755	   type 0 or 1, while the mantissa also can be a bignum (Section 2.4.2).

757	   An example of a decimal fraction is that the number 273.15 could be
758	   represented as 0b110_00100 (major type of 6 for the tag, additional
759	   information of 4 for the type of tag), followed by 0b100_00010 (major
760	   type of 4 for the array, additional information of 2 for the length
761	   of the array), followed by 0b001_00001 (major type of 1 for the first
762	   integer, additional information of 1 for the value of -2), followed
763	   by 0b000_11001 (major type of 0 for the second integer, additional
764	   information of 25 for a two-byte value), followed by
765	   0b0110101010110011 (27315 in two bytes).  In hexadecimal:

767	   C4             -- Tag 4
768	      82          -- Array of length 2
769	         21       -- -2
770	         19 6ab3  -- 27315

772	   An example of a bigfloat is that the number 1.5 could be represented
773	   as 0b110_00101 (major type of 6 for the tag, additional information
774	   of 5 for the type of tag), followed by 0b100_00010 (major type of 4
775	   for the array, additional information of 2 for the length of the
776	   array), followed by 0b001_00000 (major type of 1 for the first
777	   integer, additional information of 0 for the value of -1), followed
778	   by 0b000_00011 (major type of 0 for the second integer, additional
779	   information of 3 for the value of 3).  In hexadecimal:

781	   C5             -- Tag 5
782	      82          -- Array of length 2
783	         20       -- -1
784	         03       -- 3
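
The two examples above can be checked with a short Python sketch that computes m*(10**e) or m*(2**e) exactly using the standard fractions module. The names read_int and decode_fraction are hypothetical, and the sketch assumes that both array elements are plain integers of major type 0 or 1 (a bignum mantissa is not handled).

```python
from fractions import Fraction


def read_int(data: bytes, pos: int):
    """Read a major type 0 or 1 integer; return (value, next offset)."""
    if pos >= len(data):
        raise ValueError("truncated item")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    if major not in (0, 1):
        raise ValueError("expected an integer")
    pos += 1
    if info < 24:
        n = info
    elif info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("truncated integer")
        n = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("additional information not usable for an integer")
    return (n if major == 0 else -1 - n), pos


def decode_fraction(data: bytes) -> Fraction:
    """Decode a tag 4 or tag 5 item holding [exponent, mantissa]."""
    if len(data) < 2 or data[0] not in (0xC4, 0xC5) or data[1] != 0x82:
        raise ValueError("expected tag 4 or 5 on an array of length 2")
    base = 10 if data[0] == 0xC4 else 2
    exponent, pos = read_int(data, 2)
    mantissa, pos = read_int(data, pos)
    return mantissa * Fraction(base) ** exponent
```

Decoding c4 82 21 19 6a b3 gives the exact value 27315/100, and decoding c5 82 20 03 gives 3/2.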

786	   Decimal fractions and bigfloats provide no representation of
787	   Infinity, -Infinity, or NaN; if these are needed in place of a
788	   decimal fraction or bigfloat, the IEEE 754 half precision
789	   representations from Section 2.3 can be used.  For constrained
790	   applications, where there is a choice between representing a specific
791	   number as an integer and as a decimal fraction or bigfloat (such as
792	   when the exponent is small and non-negative), there is a quality of
793	   implementation expectation that the integer representation is used
794	   directly.

796	2.4.4.  Content Hints

798	   The tags in this section are for content hints that might be used by
799	   generic CBOR processors.

801	2.4.4.1.  Encoded CBOR data item

803	   Sometimes it is beneficial to carry an embedded CBOR data item that
804	   is not meant to be decoded immediately at the time the enclosing data
805	   item is being parsed.  Tag 24 (CBOR data item) can be used to tag the
806	   embedded byte string as a data item encoded in CBOR format.

808	2.4.4.2.  Expected Later Encoding for CBOR to JSON Converters
809	   Tags 21 to 23 indicate that a byte string might require a specific
810	   encoding when interoperating with a text-based representation.  These
811	   tags are useful when an encoder knows that the byte string data it is
812	   writing is likely to be later converted to a particular JSON-based
813	   usage.  That usage specifies that some strings are encoded as Base64,
814	   Base64url, and so on.  The encoder uses byte strings instead of doing
815	   the encoding itself to reduce the message size, to reduce the code
816	   size of the encoder, or both.  The encoder does not know whether or
817	   not the converter will be generic, and therefore wants to say what it
818	   believes is the proper way to convert binary strings to JSON.

820	   The data item tagged can be a byte string, or any other data item.
821	   In the latter case, the tag applies to all of the byte string data
822	   items contained in the data item, except for those contained in a
823	   nested expected conversion tagged item.

825	   These three tag types suggest conversions to three of the base data
826	   encodings defined in [RFC4648].  For base64url encoding, padding is
827	   not used (see Section 3.2 of RFC 4648); that is, all trailing equals
828	   signs ("=") are removed from the base64url encoded string.  Later
829	   tags might be defined for other data encodings of RFC 4648, or of
830	   other ways to encode binary data in strings.
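
A converter honoring tags 21 to 23 might apply the three encodings as in the following Python sketch, which uses the standard base64 module. The function names are assumptions for illustration; the only behavior taken from the text above is that base64url output carries no padding.

```python
import base64


def to_base64url(raw: bytes) -> str:
    """Tag 21: base64url with all trailing "=" removed."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def to_base64(raw: bytes) -> str:
    """Tag 22: classical base64 (padding kept by this sketch)."""
    return base64.b64encode(raw).decode("ascii")


def to_base16(raw: bytes) -> str:
    """Tag 23: base16 (upper-case hexadecimal)."""
    return base64.b16encode(raw).decode("ascii")
```

For the four bytes 01 02 03 04, these produce "AQIDBA", "AQIDBA==", and "01020304", respectively.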

832	2.4.4.3.  Encoded Text

834	   Some text strings hold data that have formats widely used on the
835	   Internet, and sometimes those formats can be validated and presented
836	   to the application in appropriate form by the decoder.  There are
837	   tags for some of these formats.

839	   o  Tag 32 is for URIs, as defined in [RFC3986];

841	   o  Tags 33 and 34 are for base64url and base64 encoded text strings,
842	      as defined in [RFC4648];

844	   o  Tag 35 is for regular expressions in PCRE/JavaScript syntax
845	      [ECMA262];

847	   o  Tag 36 is for MIME messages (including all headers), as defined in
848	      [RFC2045].

850	   Note that tags 33 and 34 differ from tags 21 and 22 in that the data is
851	   transported in base-encoded form for the former and in raw byte
852	   string form in the latter case.

854	2.4.5.  Self-describe CBOR
855	   In many applications, it will be clear from the context that CBOR is
856	   being employed for encoding a data item.  For instance, a specific
857	   protocol might specify the use of CBOR, or a media type is indicated
858	   that specifies its use.  However, there may be applications where
859	   such context information is not available, such as when CBOR data is
860	   stored in a file and disambiguating metadata is not in use.  Here, it
861	   may help to have some distinguishing characteristics for the data
862	   itself.

864	   Tag 55799 is defined for this purpose.  It does not impart any
865	   special semantics on the data item that follows, that is, the
866	   semantics of a data item tagged with tag 55799 is exactly identical
867	   to the semantics of the data item itself.

869	   The serialization of this tag is 0xd9d9f7, which appears not to be in
870	   use as a distinguishing mark for frequently used file types.  In
871	   particular, it is not a valid start of a Unicode text in any Unicode
872	   encoding if followed by a valid CBOR data item.

874	   For instance, a decoder might be able to parse both CBOR and JSON.
875	   Such a decoder would need to mechanically distinguish the two
876	   formats.  An easy way for an encoder to help the decoder would be to
877	   tag the entire CBOR item with Tag 55799, the serialization of which
878	   will never be found at the beginning of a JSON text.
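
A minimal Python sketch of this use of tag 55799 follows. The helper names are hypothetical; the only value taken from the text is the three-byte serialization 0xd9d9f7.

```python
SELF_DESCRIBE = bytes.fromhex("d9d9f7")  # serialization of tag 55799


def add_self_describe(item: bytes) -> bytes:
    """Prefix an encoded CBOR data item with tag 55799."""
    return SELF_DESCRIBE + item


def starts_with_self_describe(data: bytes) -> bool:
    """Report whether data begins with the tag 55799 serialization."""
    return data.startswith(SELF_DESCRIBE)
```

A decoder that accepts both formats could call starts_with_self_describe first and fall back to JSON parsing when it returns False. The absence of the tag does not prove that the data is not CBOR, since the tag is optional.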

880	3.  Creating CBOR-Based Protocols

882	   Data formats such as CBOR are often used in environments where there
883	   is no format negotiation.  A specific design goal of CBOR is to not
884	   need any included or assumed schema: a decoder can take a CBOR item
885	   and decode it with no other knowledge.

887	   Of course, in real-world implementations, the encoder and the decoder
888	   will have a shared view of what should be in a CBOR data item.  For
889	   example, an agreed-to format might be "the item is an array whose
890	   first value is a UTF-8 string, the second value is an integer,
891	   followed by zero or more floating point numbers" or "a map whose keys
892	   are byte strings that has to contain at least one pair whose key is
893	   0xab01".

895	   This specification puts no restrictions on CBOR-based protocols.  An
896	   encoder can be capable of encoding as many or as few types of values
897	   as is required by the protocol in which it is used; a decoder can be
898	   capable of understanding as many or as few types of values as is
899	   required by the protocols in which it is used.  This lack of
900	   restrictions allows CBOR to be used in extremely constrained
901	   environments.

903	   This section discusses some considerations in creating CBOR-based
904	   protocols.  It is advisory only, and explicitly excludes any language
905	   from RFC 2119 other than words that could be interpreted as "MAY" in
906	   the RFC 2119 sense.

908	3.1.  CBOR in Streaming Applications

910	   In a streaming application, a data stream may be composed of a
911	   sequence of CBOR data items concatenated back-to-back.  In such an
912	   environment, the decoder immediately begins decoding a new data item
913	   if data is found after the end of a previous data item.
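
To split such a stream, a decoder needs to find where each data item ends. The following Python sketch shows one possible approach; the names item_end and split_sequence are assumptions for this example, and the recursion is not protected against deeply nested input. The sketch raises EOFError when an item is incomplete and ValueError when the input is not well-formed, so a caller can tell "wait for more data" apart from a syntax error.

```python
def _head(data, pos):
    """Parse initial bytes; return (major, info, value, next offset)."""
    if pos >= len(data):
        raise EOFError("incomplete data item")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise EOFError("incomplete data item")
        return major, info, int.from_bytes(data[pos:pos + size], "big"), pos + size
    if info == 31 and major in (2, 3, 4, 5, 7):
        return major, info, None, pos
    raise ValueError("not well-formed")


def item_end(data: bytes, pos: int = 0) -> int:
    """Return the offset just past the data item that starts at pos."""
    major, info, value, pos = _head(data, pos)
    if major in (0, 1, 7):
        if major == 7 and info == 31:
            raise ValueError("unexpected break stop code")
        return pos
    if major == 6:  # a tag is followed by exactly one data item
        return item_end(data, pos)
    if major in (2, 3):
        if info != 31:
            if pos + value > len(data):
                raise EOFError("incomplete string")
            return pos + value
        while True:  # indefinite length: chunks until "break"
            if pos >= len(data):
                raise EOFError("missing break stop code")
            if data[pos] == 0xFF:
                return pos + 1
            cmajor, cinfo, clen, pos = _head(data, pos)
            if cmajor != major or cinfo == 31:
                raise ValueError("chunk of the wrong type")
            if pos + clen > len(data):
                raise EOFError("incomplete chunk")
            pos += clen
    per_entry = 1 if major == 4 else 2  # maps hold key/value pairs
    if info != 31:
        for _ in range(value * per_entry):
            pos = item_end(data, pos)
        return pos
    while True:
        if pos >= len(data):
            raise EOFError("missing break stop code")
        if data[pos] == 0xFF:
            return pos + 1
        for _ in range(per_entry):
            pos = item_end(data, pos)


def split_sequence(data: bytes):
    """Split back-to-back CBOR data items into their encodings."""
    items, pos = [], 0
    while pos < len(data):
        end = item_end(data, pos)
        items.append(data[pos:end])
        pos = end
    return items
```

A buffering decoder could catch EOFError, keep the unread bytes, and retry once more data has arrived.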

915	   Not all of the bytes making up a data item may be immediately
916	   available to the decoder; some decoders will buffer additional data
917	   until a complete data item can be presented to the application.
918	   Other decoders can present partial information about a top-level data
919	   item to an application, such as the nested data items that could
920	   already be decoded, or even parts of a byte string that hasn't
921	   completely arrived yet.

923	   Note that some applications and protocols will not want to use
924	   indefinite length encoding.  Using indefinite length encoding allows
925	   an encoder to not need to marshal all the data for counting, but it
926	   requires a decoder to allocate increasing amounts of memory while
927	   waiting for the end of the item.  This might be fine for some
928	   applications but not others.

930	3.2.  Generic Encoders and Decoders

932	   A generic CBOR decoder can decode all well-formed CBOR data and
933	   present them to an application.  CBOR data are well-formed if the
934	   structure of the initial bytes and the byte strings/data items
935	   implied by their values is followed and no extraneous data follows
936	   (Appendix C).

938	   Even though CBOR attempts to minimize these cases, not all well-
939	   formed CBOR data are valid: for example, the format excludes simple
940	   values below 32 that are encoded with an extension byte.  Also,
941	   specific tags may impose semantic constraints that may be violated,
942	   such as by placing a tagged item inside a bignum tag or a byte
943	   string inside a date tag.  Finally, the data may be invalid, such as
944	   invalid UTF-8 strings or date strings that do not conform to
945	   [RFC3339].  There is no requirement that generic encoders and
946	   decoders make unnatural choices for their application interface to
947	   enable the processing of invalid data.  Generic encoders and decoders
948	   are expected to forward simple values and tags even if their specific
949	   code points had not been registered at the time the encoder/decoder
950	   has been written (Section 3.5).

952	   Generic decoders provide ways to present well-formed CBOR values,
953	   both valid and invalid, to an application.  The diagnostic notation
954	   (Section 6) may be used to present well-formed CBOR values to humans.

956	   Generic encoders provide an application interface that allows the
957	   application to specify any well-formed value, including simple values
958	   and tags unknown to the encoder.

960	3.3.  Syntax Errors

962	   A decoder encountering a CBOR data item that is not well-formed
963	   generally can choose to completely fail the decoding (issue an error
964	   and/or stop processing altogether), substitute the problematic data
965	   and data items using a decoder-specific convention that clearly
966	   indicates there has been a problem, or it might take some other
967	   action.

969	3.3.1.  Incomplete CBOR data items

971	   The representation of a CBOR data item has a specific length,
972	   determined by its initial bytes and by the structure of any data
973	   items enclosed in the data items.  If less data is available, this
974	   can be treated as a syntax error.  A decoder may also implement
975	   incremental parsing, that is, decode the data item as far as it is
976	   available and present the data found so far (such as in an event-
977	   based interface), with the option of continuing the decoding once
978	   further data are available.

980	   Examples of incomplete data items include:

982	   o  a decoder expecting a certain number of array or map entries
983	      instead encounters the end of the data

985	   o  a decoder processing what it expects to be the last pair in a map
986	      comes to the end of the data

988	   o  a decoder has just seen a tag and then encounters the end of the
989	      data

991	   o  a decoder has seen the beginning of an indefinite length item but
992	      encounters the end of the data before it sees the "break" stop
993	      code

995	3.3.2.  Malformed Indefinite Length Items

997	   Examples of malformed indefinite length data items include:

999	   o  Within an indefinite length byte string or text, a decoder finds
1000	      an item that is not of the appropriate major type before it finds
1001	      the "break" stop code

1003	   o  Within an indefinite length map, a decoder encounters the "break"
1004	      stop code immediately after reading a key (the value is missing)

1006	   Another error is a "break" stop code that is found when there is no
1007	   immediately enclosing indefinite length item needing to be closed.

1009	3.3.3.  Unknown Additional Information Values

1011	   At the time this document is written, some additional information
1012	   values are unassigned and reserved for future versions of this
1013	   document (see Section 5.2).  Since the overall syntax for these
1014	   additional information values is not yet defined, a decoder that sees
1015	   an additional information value that it does not understand cannot
1016	   continue parsing.

1018	3.4.  Other Decoding Errors

1020	   A CBOR data item may be syntactically well-formed, but present a
1021	   problem with interpreting the data encoded in it in the CBOR data
1022	   model.  Generally speaking, a decoder that finds a data item with
1023	   such a problem might issue a warning, might stop processing
1024	   altogether, might handle the error and make the problematic value
1025	   available to the application as such, or take some other type of
1026	   action.

1028	   Such problems might include:

1030	   Duplicate keys in a map:  Generic decoders (Section 3.2) make data
1031	      available to applications using the native CBOR data model.  That
1032	      data model includes maps (key-value mappings with unique keys),
1033	      not multimaps (key-value mappings where multiple entries can have
1034	      the same key).  Thus, a generic decoder that gets a CBOR map item
1035	      that has duplicate keys will decode to a map with only one
1036	      instance of that key, or it might stop processing altogether.  On
1037	      the other hand, a "streaming decoder" may not even be able to
1038	      notice (Section 3.7).

1040	   Inadmissible type on the value following a tag:  Tags (Section 2.4)
1041	      specify what type of data item is supposed to follow the tag; for
1042	      example, the tags for positive or negative bignums are supposed to
1043	      be put on byte strings.  A decoder that decodes the tagged data
1044	      item into a native representation (a native big integer in this
1045	      example) is expected to check the type of the data item being
1046	      tagged.  Even decoders that don't have such native representations
1047	      available in their environment may perform the check on those tags
1048	      known to them and react appropriately.

1050	   Invalid UTF-8 string:  A decoder might or might not want to verify
1051	      that the sequence of bytes in a UTF-8 string (major type 3) is
1052	      actually valid UTF-8 and react appropriately.

1054	3.5.  Handling Unknown Simple Values and Tags

1056	   A decoder that comes across a simple value (Section 2.3) that it does
1057	   not recognize, such as a value that was added to the IANA registry
1058	   after the decoder was deployed or a value that the decoder chose not
1059	   to implement, might issue a warning, might stop processing
1060	   altogether, might handle the error by making the unknown value
1061	   available to the application as such (as is expected of generic
1062	   decoders), or take some other type of action.

1064	   A decoder that comes across a tag (Section 2.4) that it does not
1065	   recognize, such as a tag that was added to the IANA registry after
1066	   the decoder was deployed or a tag that the decoder chose not to
1067	   implement, might issue a warning, might stop processing altogether,
1068	   might handle the error and present the unknown tag value together
1069	   with the contained data item to the application (as is expected of
1070	   generic decoders), might ignore the tag and simply present the
1071	   contained data item only to the application, or take some other type
1072	   of action.

1074	3.6.  Numbers

1076	   For the purposes of this specification, all number representations
1077	   for the same numeric value are equivalent.  This means that an
1078	   encoder can encode a floating point value of 0.0 as the integer 0.
1079	   However, it also means that an application that expects to find
1080	   integer values only might find floating point values if the encoder
1081	   decides these are desirable, such as when the floating point value is
1082	   more compact than a 64-bit integer.

1084	   An application or protocol that uses CBOR might restrict the
1085	   representations of numbers.  For instance, a protocol that only deals
1086	   with integers might say that floating point numbers may not be used
1087	   and that decoders of that protocol do not need to be able to handle
1088	   floating point numbers.  Similarly, a protocol or application that
1089	   uses CBOR might say that decoders need to be able to handle either
1090	   type of number.

1092	   CBOR-based protocols should take into account that different language
1093	   environments pose different restrictions on the range and precision
1094	   of numbers that are representable.  For example, the JavaScript
1095	   number system treats all numbers as floating-point, which may result
1096	   in silent loss of precision in decoding integers with more than 53
1097	   significant bits.  A protocol that uses numbers should define its
1098	   expectations on the handling of non-trivial numbers in decoders and
1099	   receiving applications.
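
The 53-bit limit can be demonstrated with a small Python sketch. Python floats are IEEE 754 double-precision values, the same representation the text above attributes to JavaScript numbers; the function name survives_double is hypothetical.

```python
def survives_double(n: int) -> bool:
    """Report whether n is unchanged by a round trip through a
    double-precision floating point value."""
    return int(float(n)) == n
```

For example, survives_double(2**53) is True, while survives_double(2**53 + 1) is False because the nearest representable double is 2**53. A protocol could use a check of this kind to decide which integers are safe to send to such receivers.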

1101	   A CBOR-based protocol that includes floating point numbers can
1102	   restrict which of the three formats (half-precision, single-
1103	   precision, and double-precision) are to be supported.  For an
1104	   integer-only application, a protocol may want to completely exclude
1105	   the use of floating point values.

1107	   A CBOR-based protocol designed for compactness may want to exclude
1108	   specific integer encodings that are longer than necessary for the
1109	   application, such as to save the need to implement 64-bit integers.
1110	   There is an expectation that encoders will use the most compact
1111	   integer representation that can represent a given value.  However, a
1112	   compact application should accept values that use a longer-than-
1113	   needed encoding (such as encoding "0" as 0b000_11001 followed by two
1114	   bytes of 0x00) as long as the application can decode an integer of
1115	   the given size.
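
The following Python sketch illustrates both halves of this expectation for unsigned integers (major type 0): an encoder that picks the most compact form and a decoder that accepts any of the defined lengths. The names encode_uint and decode_uint are assumptions for this example. Note that a two-byte argument is signaled by additional information 25, so the initial byte for that form is 0x19 (0b000_11001).

```python
def encode_uint(n: int) -> bytes:
    """Encode n in the shortest major type 0 representation."""
    if not 0 <= n < 2**64:
        raise ValueError("out of range for major type 0")
    if n < 24:
        return bytes([n])  # value fits in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([info]) + n.to_bytes(size, "big")


def decode_uint(data: bytes) -> int:
    """Decode a major type 0 integer, whatever its encoded length."""
    if not data or data[0] >> 5 != 0:
        raise ValueError("not a major type 0 item")
    info = data[0] & 0x1F
    if info < 24:
        return info
    if info > 27:
        raise ValueError("additional information not usable for an integer")
    size = 1 << (info - 24)
    if len(data) < 1 + size:
        raise ValueError("truncated integer")
    return int.from_bytes(data[1:1 + size], "big")
```

Here encode_uint(0) is the single byte 00, while decode_uint also accepts the three-byte form 19 00 00 and returns 0.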

1117	3.7.  Specifying Keys for Maps

1119	   The encoding and decoding applications need to agree on what types of
1120	   keys are going to be used in maps.  In applications that need to
1121	   interwork with JSON-based applications, keys probably should be
1122	   limited to UTF-8 strings only; otherwise, there has to be a specified
1123	   mapping from the other CBOR types to Unicode characters, and this
1124	   often leads to implementation errors.  In applications where keys are
1125	   numeric in nature and numeric ordering of keys is important to the
1126	   application, directly using the numbers for the keys is useful.

1128	   If multiple types of keys are to be used, consideration should be given
1129	   to how these types would be represented in the specific programming
1130	   environments that are to be used.  For example, in JavaScript
1131	   objects, a key of integer 1 cannot be distinguished from a key of
1132	   string "1".  This means that, if integer keys are used, the
1133	   simultaneous use of string keys that look like numbers needs to be
1134	   avoided.  Again, this leads to the conclusion that keys should be of
1135	   a single CBOR type.

1137	   Decoders that deliver data items nested within a CBOR data item
1138	   immediately on decoding them ("streaming decoders") often do not keep
1139	   the state that is necessary to ascertain uniqueness of a key in a
1140	   map.  Similarly, an encoder that can start encoding data items before
1141	   the enclosing data item is completely available ("streaming encoder")
1142	   may want to reduce its overhead significantly by relying on its data
1143	   source to maintain uniqueness.

1145	   A CBOR-based protocol should make an intentional decision about what
1146	   to do when a receiving application does see multiple identical keys
1147	   in a map.  The resulting rule in the protocol should respect the CBOR
1148	   data model: it cannot prescribe a specific handling of the entries
1149	   with the identical keys, except that it might have a rule that having
1150	   identical keys in a map indicates a malformed map and that the
1151	   decoder has to stop with an error.  Duplicate keys are also
1152	   prohibited by CBOR decoders that are using strict mode (Section 3.10).

1154	   The CBOR data model for maps does not allow ascribing semantics to
1155	   the order of the key/value pairs in the map representation.
1156	   Thus, it would be a very bad practice to define a CBOR-based protocol
1157	   in such a way that changing the key/value pair order in a map would
1158	   change the semantics, apart from trivial aspects (cache usage etc.).
1159	   (A CBOR-based protocol can prescribe a specific order of
1160	   serialization, such as for canonicalization.)

1162	   Applications for constrained devices that have maps with 24 or fewer
1163	   frequently used keys should consider using small integers (and those
1164	   with up to 48 frequently used keys should consider also using small
1165	   negative integers) because the keys can then be encoded in a single
1166	   byte.
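
   The single-byte property can be checked with a short Python sketch.
   The helper name encode_small_int is hypothetical; the sketch assumes
   that major type 0 carries the values 0 to 23 and major type 1 the
   values -1 to -24 directly in the initial byte.

```python
def encode_small_int(value):
    """Encode an integer in -24..23 as a single CBOR byte."""
    if 0 <= value <= 23:
        return bytes([value])                 # major type 0
    if -24 <= value <= -1:
        return bytes([0x20 | (-1 - value)])   # major type 1
    raise ValueError("value needs more than one byte")


# Count how many distinct keys fit in one byte each.
one_byte_keys = [k for k in range(-24, 24) if len(encode_small_int(k)) == 1]
print(len(one_byte_keys))
```

   By comparison, a text string key needs at least two bytes even when
   it holds only one ASCII character.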

1168	3.8.  Undefined Values

1170	   In some CBOR-based protocols, the simple value (Section 2.3) of
1171	   Undefined might be used by an encoder as a substitute for a data item
1172	   with an encoding problem, in order to allow the rest of the enclosing
1173	   data items to be encoded without harm.

1175	3.9.  Canonical CBOR

1177	   Some protocols may want encoders to only emit CBOR in a particular
1178	   canonical format; those protocols might also have the decoders check
1179	   that their input is canonical.  Those protocols are free to define
1180	   what they mean by a canonical format and what encoders and decoders
1181	   are expected to do.  This section lists some suggestions for such
1182	   protocols.

1184	   If a protocol considers "canonical" to mean that two encoder
1185	   implementations starting with the same input data will produce the
1186	   same CBOR output, the following four rules would suffice:

1188	   o  Integers must be as small as possible.

1190	      *  0 to 23 and -1 to -24 must be expressed in the same byte as the
1191	         major type;

1193	      *  24 to 255 and -25 to -256 must be expressed only with an
1194	         additional uint8_t;

1196	      *  256 to 65535 and -257 to -65536 must be expressed only with an
1197	         additional uint16_t;

1199	      *  65536 to 4294967295 and -65537 to -4294967296 must be expressed
1200	         only with an additional uint32_t.

1202	   o  The expression of lengths in major types 2 through 5 must be as
1203	      short as possible.  The rules for these lengths follow the above
1204	      rule for integers.

1206	   o  The keys in every map must be sorted lowest value to highest.
1207	      Sorting is performed on the bytes of the representation of the key
1208	      data items without paying attention to the 3/5 bit splitting for
1209	      major types.  (Note that this rule allows maps that have keys of
1210	      different types, even though that is probably a bad practice that
1211	      could lead to errors in some canonicalization implementations.)
1212	      The sorting rules are:

1214	      *  If two keys have different lengths, the shorter one sorts
1215	         earlier;

1217	      *  If two keys have the same length, the one with the lower value
1218	         in (byte-wise) lexical order sorts earlier.

1220	   o  Indefinite length items must be made into definite length items.
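
   The first three rules can be sketched in Python as follows.  This is
   a minimal illustration under stated assumptions, not a complete
   canonical encoder: the names encode_head, encode_int, encode_text,
   and canonical_key_order are hypothetical, and the sketch covers only
   integers and text strings as keys.

```python
def encode_head(major, argument):
    """Return the shortest head for a major type and its unsigned argument."""
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument does not fit in 64 bits")
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")


def encode_int(value):
    """Encode an integer with major type 0 or 1, as small as possible."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)


def encode_text(text):
    """Encode a UTF-8 string with the shortest possible length field."""
    raw = text.encode("utf-8")
    return encode_head(3, len(raw)) + raw


def canonical_key_order(encoded_keys):
    """Sort encoded keys: shorter first, then byte-wise lexical order."""
    return sorted(encoded_keys, key=lambda key: (len(key), key))


keys = [encode_text("aa"), encode_int(100), encode_text("z"),
        encode_int(-1), encode_int(10)]
for key in canonical_key_order(keys):
    print(key.hex())
```

   With these inputs the keys sort as 10, -1, 100, "z", "aa", because
   the one-byte encodings precede the two-byte ones, which in turn
   precede the three-byte one.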

1222	   If a protocol allows for IEEE floats, then additional
1223	   canonicalization rules might need to be added.  One example rule
1224	   might be to have all floats start as a 64-bit float, then do a test
1225	   conversion to a 32-bit float; if the result is the same numeric
1226	   value, use the shorter value and repeat the process with a test
1227	   conversion to a 16-bit float.  (This rule selects 16-bit float for
1228	   positive and negative infinity as well.)  Also, there are many
1229	   representations for NaN.  If NaN is an allowed value, it must always
1230	   be represented as 0xf97e00.
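
   The example rule for floats might be sketched in Python as shown
   below.  The function name shortest_float is hypothetical.  The sketch
   relies on the standard struct module, whose "e", "f", and "d" formats
   correspond to the 16-bit, 32-bit, and 64-bit IEEE 754 binary formats,
   and it treats an OverflowError from struct.pack as "does not fit".

```python
import math
import struct


def shortest_float(value):
    """Encode a float in the shortest format that preserves its value."""
    if math.isnan(value):
        # A single fixed representation is used for every NaN.
        return bytes.fromhex("f97e00")
    encoded = b"\xfb" + struct.pack(">d", value)
    # Try 32 bits first, then 16 bits, stopping at the first failure.
    for initial, fmt in ((0xFA, ">f"), (0xF9, ">e")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            break
        if struct.unpack(fmt, packed)[0] != value:
            break
        encoded = bytes([initial]) + packed
    return encoded


for sample in (1.5, 100000.0, 1.1, float("inf"), float("nan")):
    print(sample, shortest_float(sample).hex())
```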

1232	   CBOR tags present additional considerations for the canonicalization.
1233	   The absence or presence of tags in a canonical format is determined
1234	   by the optionality of the tags in the protocol.  In a CBOR-based
1235	   protocol that allows optional tagging anywhere, the canonical format
1236	   must not allow them.  In a protocol that requires tags in certain
1237	   places, the tag needs to appear in the canonical format.  A CBOR-
1238	   based protocol that uses canonicalization might instead say that all
1239	   tags that appear in a message must be retained regardless of whether
1240	   they are optional.

1242	3.10.  Strict Mode

1244	   Some areas of application of CBOR do not require canonicalization
1245	   (Section 3.9), but may require that different decoders reach the same
1246	   (semantically equivalent) results, even in the presence of
1247	   potentially malicious data.  This can be required if one application
1248	   (such as a firewall or other protecting entity) makes a decision
1249	   based on the data that another application, which independently
1250	   decodes the data, relies on.

1252	   Normally, it is the responsibility of the sender to avoid ambiguously
1253	   decodable data.  However, the sender might be an attacker specially
1254	   making up CBOR data such that it will be interpreted differently by
1255	   different decoders in an attempt to exploit that as a vulnerability.
1256	   Generic decoders used in applications where this might be a problem
1257	   need to support a strict mode in which it is also the responsibility
1258	   of the receiver to reject ambiguously decodable data.  It is expected
1259	   that firewalls and other security systems that decode CBOR will only
1260	   decode in strict mode.

1262	   A decoder in strict mode will reliably reject any data that could be
1263	   interpreted by other decoders in different ways.  It will reliably
1264	   reject data items with syntax errors (Section 3.3).  It will also
1265	   expend the effort to reliably detect other decoding errors
1266	   (Section 3.4).  In particular, a strict decoder needs to have an API
1267	   that reports an error (and does not return data) for a CBOR data item
1268	   that contains any of the following:

1270	   o  A map (major type 5) that has more than one entry with the same
1271	      key

1273	   o  A tag that is used on a data item of the incorrect type

1275	   o  A data item that is incorrectly formatted for the type given to
1276	      it, such as invalid UTF-8 or data that cannot be interpreted with
1277	      the specific tag that it has been tagged with
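
   Two of these checks might look as follows in Python.  This is only a
   sketch: the names StrictDecodeError, reject_duplicate_keys, and
   strict_text are hypothetical, and the sketch assumes that map keys
   have already been decoded into hashable Python values.  Which keys
   count as "the same" is for the protocol to decide; here keys are
   compared by type and value, so that integer 1 and string "1" differ.

```python
class StrictDecodeError(ValueError):
    """Raised instead of returning data when strict checks fail."""


def reject_duplicate_keys(pairs):
    """Raise if two entries of a decoded map carry the same key."""
    seen = set()
    for key, _value in pairs:
        marker = (type(key), key)
        if marker in seen:
            raise StrictDecodeError("duplicate map key")
        seen.add(marker)


def strict_text(raw):
    """Return the text of a major type 3 payload, rejecting invalid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise StrictDecodeError("invalid UTF-8 in text string") from exc


reject_duplicate_keys([(1, "a"), ("1", "b")])   # distinct keys, accepted
print(strict_text(b"\xc3\xbc"))
```

   The set of keys seen so far is the state that a streaming decoder
   often does not keep, which is one reason this check has a cost.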

1279	   A decoder in strict mode can do one of two things when it encounters
1280	   a tag or simple value that it does not recognize:

1282	   o  It can report an error (and not return data).

1284	   o  It can emit the unknown item (type, value, and, for tags, the
1285	      decoded tagged data item) to the application calling the decoder
1286	      with an indication that the decoder did not recognize that tag or
1287	      simple value.

1289	   The latter approach, which is also appropriate for non-strict
1290	   decoders, supports forward compatibility with newly registered tags
1291	   and simple values without the requirement to update the decoder at
1292	   the same time as the calling application.  (For this, the API for the
1293	   decoder needs to have a way to mark unknown items so that the calling
1294	   application can handle them in a manner appropriate for the program.)

1296	   Since some of this processing may have an appreciable cost (in
1297	   particular with duplicate detection for maps), support of strict mode
1298	   is not a requirement placed on all CBOR decoders.

1300	   Some encoders will rely on their applications to provide input data
1301	   in such a way that unambiguously decodable CBOR results.  A generic
1302	   encoder also may want to provide a strict mode where it reliably
1303	   limits its output to unambiguously decodable CBOR, independent of
1304	   whether its application is providing API-conformant data or not.

1306	4.  Converting Data Between CBOR and JSON

1308	   This section gives non-normative advice about converting between CBOR
1309	   and JSON.  Implementations of converters are free to use whichever
1310	   advice here they want.

1312	   It is worth noting that a JSON text is a sequence of characters, not
1313	   an encoded sequence of bytes, while a CBOR data item consists of
1314	   bytes, not characters.

1316	4.1.  Converting From CBOR to JSON

1318	   Most of the types in CBOR have direct analogs in JSON.  However, some
1319	   do not, and someone implementing a CBOR-to-JSON converter has to
1320	   consider what to do in those cases.  The following non-normative
1321	   suggestion deals with these by converting them to a single substitute
1322	   value, such as a JSON null.

1324	   o  An integer (major type 0 or 1) becomes a JSON number.

1326	   o  A byte string (major type 2) that is not embedded in a tag that
1327	      specifies a proposed encoding is encoded in Base64url without
1328	      padding and becomes a JSON string.

1330	   o  A UTF-8 string (major type 3) becomes a JSON string.  Note that
1331	      JSON requires escaping certain characters (RFC 4627, section 2.5):
1332	      quotation mark (U+0022), reverse solidus (U+005C), and the "C0
1333	      control characters" (U+0000 through U+001F).  All other characters
1334	      are copied unchanged into the JSON UTF-8 string.

1336	   o  An array (major type 4) becomes a JSON array.

1338	   o  A map (major type 5) becomes a JSON object.  This is possible
1339	      directly only if all keys are UTF-8 strings.  A converter might
1340	      also convert other keys into UTF-8 strings (such as by converting
1341	      integers into strings containing their decimal representation);
1342	      however, doing so introduces a danger of key collision.

1344	   o  False (major type 7, additional information 20) becomes a JSON
1345	      false.

1347	   o  True (major type 7, additional information 21) becomes a JSON
1348	      true.

1350	   o  Null (major type 7, additional information 22) becomes a JSON
1351	      null.

1353	   o  A floating point value (major type 7, additional information 25
1354	      through 27) becomes a JSON number if it is finite (that is, it can
1355	      be represented in a JSON number); if the value is non-finite (NaN,
1356	      or positive or negative Infinity), it is represented by the
1357	      substitute value.

1359	   o  Any other simple value (major type 7, any additional information
1360	      value not yet discussed) is represented by the substitute value.

1362	   o  A bignum (major type 6, tag value 2 or 3) is represented by
1363	      encoding its byte string in Base64url without padding and becomes
1364	      a JSON string.  For tag value 3 (negative bignum), a "~" (ASCII
1365	      tilde) is inserted before the base-encoded value.  (The conversion
1366	      to a binary blob instead of a number is to prevent a likely
1367	      numeric overflow for the JSON decoder.)

1369	   o  A byte string with an encoding hint (major type 6, tag value 21
1370	      through 23) is encoded as described and becomes a JSON string.

1372	   o  For all other tags (major type 6, any other tag value), the
1373	      embedded CBOR item is represented as a JSON value; the tag value
1374	      is ignored.

1376	   o  Indefinite length items are made definite before conversion.
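
   A few of these suggestions can be sketched in Python.  The helper
   names b64url_nopad, bignum_to_json, and float_to_json are
   hypothetical, and the substitute value is assumed to be a JSON null
   (None in Python).

```python
import base64
import json
import math


def b64url_nopad(raw):
    """Base64url-encode a byte string and drop the trailing padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def bignum_to_json(tag, raw):
    """Represent a bignum (tag 2 or 3) as a JSON string."""
    text = b64url_nopad(raw)
    return "~" + text if tag == 3 else text


def float_to_json(value, substitute=None):
    """Keep finite floats; replace NaN and infinities by the substitute."""
    return value if math.isfinite(value) else substitute


converted = [
    b64url_nopad(bytes.fromhex("12345678")),
    bignum_to_json(3, bytes.fromhex("010000000000000000")),
    float_to_json(1.5),
    float_to_json(float("inf")),
]
print(json.dumps(converted))
```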

1378	4.2.  Converting From JSON to CBOR

1380	   All JSON values, once decoded, directly map into one or more CBOR
1381	   values.  As with any kind of CBOR generation, decisions have to be
1382	   made with respect to number representation.  In a suggested
1383	   conversion:

1385	   o  JSON numbers without fractional parts (integer numbers) are
1386	      represented as integers (major types 0 and 1, possibly major type
1387	      6 tag value 2 and 3), choosing the shortest form; integers longer
1388	      than an implementation-defined threshold (which is usually either
1389	      32 or 64 bits) may instead be represented as floating point
1390	      values.  (If the JSON was generated from a JavaScript
1391	      implementation, its precision is already limited to 53 bits
1392	      maximum.)

1394	   o  Numbers with fractional parts are represented as floating point
1395	      values.  Preferably, the shortest exact floating point
1396	      representation is used; for instance, 1.5 is represented in a
1397	      16-bit floating point value (not all implementations will be
1398	      efficiently capable of finding the minimum form, though).  There
1399	      may be an implementation-defined limit to the precision that will
1400	      affect the precision of the represented values.  Decimal
1401	      representation should only be used if that is specified in a
1402	      protocol.

1404	   CBOR has been designed to generally provide a more compact encoding
1405	   than JSON.  One implementation strategy that might come to mind is to
1406	   perform a JSON to CBOR encoding in place in a single buffer.  This
1407	   strategy would need to carefully consider a number of pathological
1408	   cases, such as that some strings represented with no or very few
1409	   escapes and longer (or much longer) than 255 bytes may expand when encoded
1410	   as UTF-8 strings in CBOR.  Similarly, a few of the binary floating
1411	   point representations might cause expansion from some short decimal
1412	   representations (1.1, 1e9) in JSON.  This may be hard to get right
1413	   and any ensuing vulnerabilities may be exploited by an attacker.
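
   The expansion cases can be made concrete with a small Python sketch
   that only computes sizes.  The helper names fits, cbor_float_size,
   and text_head_size are hypothetical.  The sketch assumes an ASCII
   JSON string without escapes (so it occupies its length plus two
   quotation marks) and floats encoded in the shortest exact format.

```python
import struct


def fits(fmt, value):
    """True if value survives a round trip through the given format."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, value))[0] == value
    except OverflowError:
        return False


def cbor_float_size(value):
    """Encoded size in bytes, initial byte included, of a finite float."""
    if fits(">e", value):
        return 3
    if fits(">f", value):
        return 5
    return 9


def text_head_size(length):
    """Size of the initial byte plus the length field of a text string."""
    for limit, size in ((24, 1), (1 << 8, 2), (1 << 16, 3), (1 << 32, 5)):
        if length < limit:
            return size
    return 9


for literal in ("1.1", "1e9"):
    print(literal, len(literal), cbor_float_size(float(literal)))
for length in (255, 256):
    print(length, length + 2, text_head_size(length) + length)
```

   In this sketch the three-character literals grow to nine and five
   bytes, and a 256-byte string grows from 258 bytes in JSON to 259
   bytes in CBOR, so an in-place conversion cannot assume that the
   output never overtakes the input.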

1415	5.  Future Evolution of CBOR

1417	   Successful protocols evolve over time.  New ideas appear,
1418	   implementation platforms improve, related protocols are developed and
1419	   evolve, and new requirements from applications and protocols are
1420	   added.  Facilitating protocol evolution is therefore an important
1421	   design consideration for any protocol development.

1423	   For protocols that will use CBOR, CBOR provides some useful
1424	   mechanisms to facilitate their evolution.  Best practices for this
1425	   are well known, particularly from the development of JSON-based
1426	   protocols.  Therefore, such best practices are outside the
1427	   scope of this specification.

1429	   However, facilitating the evolution of CBOR itself is very well
1430	   within its scope.  CBOR is designed to both provide a stable basis
1431	   for development of CBOR-based protocols and to be able to evolve.
1432	   Since a successful protocol may live for decades, CBOR needs to be
1433	   designed for decades of use and evolution.  This section provides
1434	   some guidance for the evolution of CBOR.  It is necessarily more
1435	   subjective than other parts of this document.  It is also necessarily
1436	   incomplete, lest it turn into a textbook on protocol development.

1438	5.1.  Extension Points

1440	   In a protocol design, opportunities for evolution are often included
1441	   in the form of extension points.  For example, there may be a code
1442	   point space that is not fully allocated from the outset, and the
1443	   protocol is designed to tolerate and embrace implementations that
1444	   start using more code points than initially allocated.

1446	   Sizing the code point space may be difficult because the range
1447	   required may be hard to predict.  An attempt should be made to make
1448	   the code point space large enough so that it can slowly be filled over
1449	   the intended lifetime of the protocol.

1451	   CBOR has three major extension points:

1453	   o  the "simple" space (values in major type 7).  Of the 24 efficient
1454	      (and 224 slightly less efficient) values, only a small number have
1455	      been allocated.  Implementations receiving an unknown simple data
1456	      item may be able to process it as such, given that the structure
1457	      of the value is indeed simple.  An IANA registry is appropriate
1458	      here.

1460	   o  the "tag" space (values in major type 6).  Again, only a small
1461	      part of the code point space has been allocated, and the space is
1462	      abundant (although the early numbers are more efficient than the
1463	      later ones).  Implementations receiving an unknown tag can choose
1464	      to simply ignore it, or to process it as an unknown tag wrapping
1465	      the following data item.  An IANA registry is appropriate here.

1467	   o  the "additional information" space.  An implementation receiving
1468	      an unknown additional information has no way to continue parsing,
1469	      so allocating code points to this space is a major step.  There are
1470	      also very few code points left.

1472	5.2.  Curating the Additional Information Space

1474	   The human mind is sometimes drawn to filling in little perceived gaps
1475	   to make something neat.  We expect the remaining gaps in the code
1476	   point space for the additional information values to be an attractor
1477	   for new ideas, just because they are there.

1479	   The present specification does not manage the additional information
1480	   code point space by an IANA registry.  Instead, allocations out of
1481	   this space can only be done by updating this specification.

1483	   For an additional information value of n >= 24, the size of the
1484	   additional data typically is 2**(n-24) bytes.  Therefore, additional
1485	   information values 28 and 29 should be viewed as candidates for
1486	   128-bit and 256-bit quantities, in case a need arises to add them to
1487	   the protocol.  Additional information value 30 is then the only
1488	   additional information value available for general allocation, and
1489	   there should be a very good reason for allocating it before assigning
1490	   it through an update of this protocol.
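
   The size rule can be written out as a short Python sketch.  The
   function name argument_size is hypothetical.  The entries for values
   28 and 29 are only a projection of the same pattern; they are not
   allocated by this specification.

```python
def argument_size(info):
    """Bytes following the initial byte for additional information 24..27."""
    if not 24 <= info <= 27:
        raise ValueError("no argument size is defined for this value")
    return 2 ** (info - 24)


for info in range(24, 28):
    print(info, argument_size(info), "bytes")

# Hypothetical continuation of the pattern, in bits, for values 28 and 29.
projected_bits = {n: 8 * 2 ** (n - 24) for n in (28, 29)}
print(projected_bits)
```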

1492	6.  Diagnostic Notation

1494	   CBOR is a binary interchange format.  To facilitate documentation and
1495	   debugging, and in particular to facilitate communication between
1496	   entities cooperating in debugging, this section defines a simple
1497	   human-readable diagnostic notation.  All actual interchange always
1498	   happens in the binary format.

1500	   Note that this truly is a diagnostic format; it is not meant to be
1501	   parsed.  Therefore, no formal definition (as in ABNF) is given in
1502	   this document.  (Implementers looking for a text-based format for
1503	   representing CBOR data items in configuration files may also want to
1504	   consider YAML [YAML].)

1506	   The diagnostic notation is loosely based on JSON as it is defined in
1507	   RFC 4627, extending it where needed.

1509	   The notation borrows the JSON syntax for numbers (integer and
1510	   floating point), True (>true<), False (>false<), Null (>null<), UTF-8
1511	   strings, arrays and maps (maps are called objects in JSON; the
1512	   diagnostic notation extends JSON here by allowing any data item in
1513	   the key position).  Undefined is written >undefined< as in
1514	   JavaScript.  The non-finite floating point numbers Infinity,
1515	   -Infinity, and NaN are written exactly as in this sentence (this is
1516	   also a way they can be written in JavaScript, although JSON does not
1517	   allow them).  A tagged item is written as an integer number for the
1518	   tag followed by the item in parentheses; for instance, an RFC 3339
1519	   (ISO 8601) date could be notated as:

1521	      0("2013-03-21T20:04:00Z")

1523	   or the equivalent relative time as

1525	      1(1363896240)
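
   That the two notations denote the same instant can be checked with a
   few lines of Python.  The variable names are chosen for illustration
   only; the sketch assumes the timestamp is in UTC, as indicated by the
   trailing "Z".

```python
from datetime import datetime, timezone

text = "2013-03-21T20:04:00Z"
moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
moment = moment.replace(tzinfo=timezone.utc)

# Seconds since 1970-01-01T00:00:00Z.
epoch_seconds = int(moment.timestamp())

print(f'0("{text}")')
print(f"1({epoch_seconds})")
```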

1527	   Byte strings are notated in one of the base encodings, without
1528	   padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
1529	   for base32, >h32< for base32hex, >b64< for base64 or base64url (the
1530	   actual encodings do not overlap, so the string remains unambiguous).
1531	   For example, the byte string 0x12345678 could be written h'12345678',
1532	   b32'CI2FM6A', or b64'EjRWeA'.
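
   The example strings can be reproduced with the Python standard
   library.  The function name notate is hypothetical, and the sketch
   covers only the >h<, >b32<, and >b64< prefixes (using base64url for
   the latter).

```python
import base64


def notate(raw, prefix):
    """Write a byte string in diagnostic notation without padding."""
    if prefix == "h":
        body = raw.hex()
    elif prefix == "b32":
        body = base64.b32encode(raw).rstrip(b"=").decode("ascii")
    elif prefix == "b64":
        body = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    else:
        raise ValueError("unsupported prefix")
    return f"{prefix}'{body}'"


raw = bytes.fromhex("12345678")
for prefix in ("h", "b32", "b64"):
    print(notate(raw, prefix))
```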

1534	   Unassigned simple values are given as "simple()" with the appropriate
1535	   integer in the parentheses.  For example, "simple(42)" indicates
1536	   major type 7, value 42.

1538	6.1.  Encoding indicators

1540	   Sometimes it is useful to indicate in the diagnostic notation which
1541	   of several alternative representations was actually used; for
1542	   example, a data item written >1.5< by a diagnostic decoder might have
1543	   been encoded as a half-, single-, or double-precision float.

1545	   The convention for encoding indicators is that anything starting with
1546	   an underscore and all following characters that are alphanumeric or
1547	   underscore, is an encoding indicator, and can be ignored by anyone
1548	   not interested in this information.  Encoding indicators are always
1549	   optional.

1551	   A single underscore can be written after the opening brace of a map
1552	   or the opening bracket of an array to indicate that the data item was
1553	   represented in indefinite length format.  For example, [_ 1, 2]
1554	   contains an indicator that an indefinite length representation was
1555	   used to represent the data item [1, 2].

1557	   An underscore followed by a decimal digit n indicates that the
1558	   preceding item (or, for arrays and maps, the item starting with the
1559	   preceding bracket or brace) was encoded with an additional
1560	   information value of 24+n.  For example, 1.5_1 is a half precision
1561	   floating point number, while 1.5_3 is encoded as double precision.
1562	   (This encoding indicator is not shown in Appendix A.)  (Note that the
1563	   encoding indicator "_" is thus an abbreviation of the full form "_7",
1564	   which is not used.)

1566	   As a special case, byte and text strings of indefinite length can be
1567	   notated in the form (_ h'0123', h'4567') and (_ "foo", "bar").

1569	7.  IANA Considerations

1571	   IANA will create two registries for new CBOR values.  The registries
1572	   will be separate, not under an umbrella registry.  The registries
1573	   will follow the rules in [RFC5226].  IANA will also assign a new MIME
1574	   media type and an associated CoAP Content-Format entry.

1576	7.1.  Simple Values Registry

1578	   A registry called "CBOR Simple Values" will be created.  The initial
1579	   values are shown in Table 2.

1581	   New entries in the range 0 to 19 will be assigned by Standards
1582	   Action.  It is suggested that these Standards Actions allocate values
1583	   starting with the number 16 in order to reserve the lower numbers for
1584	   any contiguous block.

1586	   New entries in the range 32 to 255 will be assigned by Specification
1587	   Required.

1589	7.2.  Tags Registry

1591	   A registry called "CBOR Tags" will be created.  The initial values
1592	   are shown in Table 3.

1594	   New entries in the range 0 to 23 will be assigned by Standards
1595	   Action.  New entries in the range 24 to 255 will be assigned by
1596	   Specification Required.  New entries in the range 256 to
1597	   18446744073709551615 will be assigned by First Come First Served.
1598	   The template for First Come First Served will include point of
1599	   contact and an optional field for URL to a description of the
1600	   semantics of the tag; the latter can be something like an Internet-
1601	   Draft or a web page.

1603	7.3.  Media Type ("MIME Type")
1604	   The Internet media type [RFC6838] for CBOR data is application/cbor.

1606	   Type name: application

1608	   Subtype name: cbor

1610	   Required parameters: n/a

1612	   Optional parameters: n/a

1614	   Encoding considerations:  binary

1616	   Security considerations:  See Section 8 of this document

1618	   Interoperability considerations: n/a

1620	   Published specification: This document

1622	   Applications that use this media type:  None yet, but it is expected
1623	      that this format will be deployed in protocols and applications.

1625	   Additional information:
1626	     Magic number(s): n/a
1627	     File extension(s): .cbor
1628	     Macintosh file type code(s): n/a

1630	   Person & email address to contact for further information:
1631	     Carsten Bormann
1632	     cabo@tzi.org

1634	   Intended usage: COMMON

1636	   Restrictions on usage: none

1638	   Author:
1639	     Carsten Bormann 

1641	   Change controller:
1642	     The IESG 

1644	7.4.  CoAP Content-Format

1646	   Media Type: application/cbor

1648	   Encoding: -

1650	   Id: 60
1651	   Reference: [RFC-THIS-SPEC]

1653	7.5.  The +cbor Structured Syntax Suffix Registration

1655	   Name: Concise Binary Object Representation (CBOR)

1657	   +suffix: +cbor

1659	   References: [RFC-THIS-SPEC]

1661	   Encoding considerations: CBOR is a binary format.

1663	   Fragment identifier considerations:
1664	     The syntax and semantics of fragment identifiers specified for
1665	     +cbor SHOULD be as specified for "application/cbor".  (At
1666	     publication of this document, there is no fragment identification
1667	     syntax defined for "application/cbor".)

1669	     The syntax and semantics for fragment identifiers for a specific
1670	     "xxx/yyy+cbor" SHOULD be processed as follows:

1672	     For cases defined in +cbor, where the fragment identifier resolves
1673	     per the +cbor rules, then process as specified in +cbor.

1675	     For cases defined in +cbor, where the fragment identifier does
1676	     not resolve per the +cbor rules, then process as specified in
1677	     "xxx/yyy+cbor".

1679	     For cases not defined in +cbor, then process as specified in
1680	     "xxx/yyy+cbor".

1682	   Interoperability considerations: n/a

1684	   Security considerations:  See Section 8 of this document

1686	   Contact:
1687	     Apps Area Working Group (apps-discuss at ietf.org)

1689	   Author/Change controller:
1690	     The Apps Area Working Group.
1691	     The IESG has change control over this registration.

1693	8.  Security Considerations

1695	   A network-facing application can exhibit vulnerabilities in its
1696	   processing logic for incoming data.  Complex parsers are well known
1697	   as a likely source of such vulnerabilities, such as the ability to
1698	   remotely crash a node, or even remotely execute arbitrary code on it.
1699	   CBOR attempts to narrow the opportunities for introducing such
1700	   vulnerabilities by reducing parser complexity, by giving the entire
1701	   range of encodable values a meaning where possible.

1703	   Resource exhaustion attacks might attempt to lure a decoder into
1704	   allocating very big data items (strings, arrays, maps) or exhaust the
1705	   stack depth by setting up deeply nested items.  Decoders need to have
1706	   appropriate resource management to mitigate these attacks.  (Items
1707	   for which very large sizes are given can also attempt to exploit
1708	   integer overflow vulnerabilities.)

1710	   Applications where CBOR data items are examined by a gatekeeper
1711	   function and later used by a different application may exhibit
1712	   vulnerabilities when multiple interpretations of the data item are
1713	   possible.  For example, an attacker could make use of duplicate keys
1714	   in maps and precision issues in numbers to make the gatekeeper base
1715	   its decisions on a different interpretation than the one that will be
1716	   used by the second application.  Protocols that are used in a
1717	   security context should be defined in such a way that these multiple
1718	   interpretations are reliably reduced to a single one.  To facilitate
1719	   this, encoder and decoder implementations used in such contexts
1720	   should provide at least one strict mode of operation (Section 3.10).

1722	9.  Acknowledgements

1724	   CBOR was inspired by MessagePack.  MessagePack was developed and
1725	   promoted by Sadayuki Furuhashi ("frsyuki").  This reference to
1726	   MessagePack is solely for attribution; CBOR is not intended as a
1727	   version of or replacement for MessagePack, as it has different design
1728	   goals and requirements.

1730	   The need for functionality beyond the original MessagePack
1731	   Specification became obvious to many people at about the same time
1732	   around the year 2012.  BinaryPack is a minor derivation of
1733	   MessagePack that was developed by Eric Zhang for the binaryjs
1734	   project.  A similar, but different extension was made by Tim Caswell
1735	   for his msgpack-js and msgpack-js-browser projects.  Many people have
1736	   contributed to the recent discussion about extending MessagePack to
1737	   separate text string representation from byte string representation.

1739	   The encoding of the additional information in CBOR was inspired by
1740	   the encoding of length information designed by Klaus Hartke for CoAP.

1742	   This document also incorporates suggestions made by many people,
1743	   notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, Matthew
1744	   Lepinski, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray,
1745	   Tony Finch, Tony Hansen, and Yaron Sheffer.

1747	10.  References

1749	10.1.  Normative References

1751	   [ECMA262]  European Computer Manufacturers Association, "ECMAScript
1752	              Language Specification 5.1 Edition", ECMA Standard
1753	              ECMA-262, June 2011.

1756	   [RFC2045]  Freed, N. and N.S. Borenstein, "Multipurpose Internet Mail
1757	              Extensions (MIME) Part One: Format of Internet Message
1758	              Bodies", RFC 2045, November 1996.

1760	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1761	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1763	   [RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the
1764	              Internet: Timestamps", RFC 3339, July 2002.

1766	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
1767	              10646", STD 63, RFC 3629, November 2003.

1769	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1770	              Resource Identifier (URI): Generic Syntax", STD 66, RFC
1771	              3986, January 2005.

1773	   [RFC4287]  Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
1774	              Syndication Format", RFC 4287, December 2005.

1776	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
1777	              Encodings", RFC 4648, October 2006.

1779	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
1780	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
1781	              May 2008.

1783	   [TIME_T]   The Open Group Base Specifications Issue 7, Volume 1,
1784	              "Base Definitions, section 4.15 'Seconds Since the
1785	              Epoch'", IEEE Std 1003.1, 2013 Edition, 2013.

1789	10.2.  Informative References

1791	   [ASN.1]    International Telecommunications Union, "Information
1792	              Technology -- ASN.1 encoding rules: Specification of Basic
1793	              Encoding Rules (BER), Canonical Encoding Rules (CER) and
1794	              Distinguished Encoding Rules (DER)", ITU-T Recommendation
1795	              X.690, 1994.

1797	   [BSON]     Various, "BSON", 2013.

1799	   [I-D.ietf-lwig-terminology]
1800	              Bormann, C., Ersue, M., and A. Keranen, "Terminology for
1801	              Constrained Node Networks", draft-ietf-lwig-terminology-05
1802	              (work in progress), July 2013.

1804	   [MessagePack]
1805	              FURUHASHI Sadayuki, "MessagePack", 2013.

1808	   [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
1809	              Protocol", RFC 713, April 1976.

1811	   [RFC4627]  Crockford, D., "The application/json Media Type for
1812	              JavaScript Object Notation (JSON)", RFC 4627, July 2006.

1814	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
1815	              Specifications and Registration Procedures", BCP 13, RFC
1816	              6838, January 2013.

1818	   [UBJSON]   The Buzz Media, "Universal Binary JSON Specification",
1819	              2013.

1821	   [YAML]     Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
1822	              Language (YAML[TM]) Version 1.2, 3rd Edition", October
1823	              2009.

1825	Appendix A.  Examples

1827	   The following table provides some CBOR encoded values in hexadecimal
1828	   (right column), together with diagnostic notation for these values
1829	   (left column).  Note that the string "\u00fc" is one form of
1830	   diagnostic notation for a UTF-8 string containing the single Unicode
1831	   character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut).
1832	   Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
1833	   single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often
1834	   representing "water"), and "\ud800\udd51" is a UTF-8 string in
1835	   diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
1836	   ATTIC FIFTY STATERS).  (Note that all these single-character strings
1837	   could also be represented in native UTF-8 in diagnostic notation,
1838	   just not in an ASCII-only specification like the present one.)  In
1839	   the diagnostic notation provided for bignums, their intended numeric
1840	   value is shown as a decimal number (such as 18446744073709551616)
1841	   instead of showing a tagged byte string (such as
1842	   2(h'010000000000000000')).
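
The string and bignum rows described above can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the specification; the helper names encode_head, encode_text, and encode_positive_bignum are hypothetical. Note that Python spells U+10151 as "\U00010151", whereas the diagnostic notation writes the same single character as the surrogate pair "\ud800\udd51".

```python
def encode_head(major_type, argument):
    """Initial byte plus the shortest big-endian argument (hypothetical helper)."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            head = bytes([(major_type << 5) | additional_info])
            return head + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_text(text):
    """Major type 3: the length counts UTF-8 bytes, not characters."""
    utf8 = text.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8


def encode_positive_bignum(n):
    """Tag 2 wrapping a byte string that holds the magnitude of n."""
    magnitude = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return encode_head(6, 2) + encode_head(2, len(magnitude)) + magnitude


for item in (encode_text("\u00fc"), encode_text("\u6c34"),
             encode_text("\U00010151"),
             encode_positive_bignum(18446744073709551616)):
    print("0x" + item.hex())
```

The printed values can be compared with the corresponding rows of the table that follows.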

1844	   +----------------------+--------------------------------------------+
1845	   | Diagnostic           | Encoded                                    |
1846	   +----------------------+--------------------------------------------+
1847	   | 0                    | 0x00                                       |
1848	   |                      |                                            |
1849	   | 1                    | 0x01                                       |
1850	   |                      |                                            |
1851	   | 10                   | 0x0a                                       |
1852	   |                      |                                            |
1853	   | 23                   | 0x17                                       |
1854	   |                      |                                            |
1855	   | 24                   | 0x1818                                     |
1856	   |                      |                                            |
1857	   | 25                   | 0x1819                                     |
1858	   |                      |                                            |
1859	   | 100                  | 0x1864                                     |
1860	   |                      |                                            |
1861	   | 1000                 | 0x1903e8                                   |
1862	   |                      |                                            |
1863	   | 1000000              | 0x1a000f4240                               |
1864	   |                      |                                            |
1865	   | 1000000000000        | 0x1b000000e8d4a51000                       |
1866	   |                      |                                            |
1867	   | 18446744073709551615 | 0x1bffffffffffffffff                       |
1868	   |                      |                                            |
1869	   | 18446744073709551616 | 0xc249010000000000000000                   |
1870	   |                      |                                            |
1871	   | -1844674407370955161 | 0x3bffffffffffffffff                       |
1872	   | 6                    |                                            |
1873	   |                      |                                            |
1874	   | -1844674407370955161 | 0xc349010000000000000000                   |
1875	   | 7                    |                                            |
1876	   |                      |                                            |
1877	   | -1                   | 0x20                                       |
1878	   |                      |                                            |
1879	   | -10                  | 0x29                                       |
1880	   |                      |                                            |
1881	   | -100                 | 0x3863                                     |
1882	   |                      |                                            |
1883	   | -1000                | 0x3903e7                                   |
1884	   |                      |                                            |
1885	   | 0.0                  | 0xf90000                                   |
1886	   |                      |                                            |
1887	   | -0.0                 | 0xf98000                                   |
1888	   |                      |                                            |
1889	   | 1.0                  | 0xf93c00                                   |
1890	   |                      |                                            |
1891	   | 1.1                  | 0xfb3ff199999999999a                       |
1892	   |                      |                                            |
1893	   | 1.5                  | 0xf93e00                                   |
1894	   |                      |                                            |
1895	   | 65504.0              | 0xf97bff                                   |
1896	   |                      |                                            |
1897	   | 100000.0             | 0xfa47c35000                               |
1898	   |                      |                                            |
1899	   | 3.4028234663852886e+ | 0xfa7f7fffff                               |
1900	   | 38                   |                                            |
1901	   |                      |                                            |
1902	   | 1.0e+300             | 0xfb7e37e43c8800759c                       |
1903	   |                      |                                            |
1904	   | 5.960464477539063e-8 | 0xf90001                                   |
1905	   |                      |                                            |
1906	   | 0.00006103515625     | 0xf90400                                   |
1907	   |                      |                                            |
1908	   | -4.0                 | 0xf9c400                                   |
1909	   |                      |                                            |
1910	   | -4.1                 | 0xfbc010666666666666                       |
1911	   |                      |                                            |
1912	   | Infinity             | 0xf97c00                                   |
1913	   |                      |                                            |
1914	   | NaN                  | 0xf97e00                                   |
1915	   |                      |                                            |
1916	   | -Infinity            | 0xf9fc00                                   |
1917	   |                      |                                            |
1918	   | Infinity             | 0xfa7f800000                               |
1919	   |                      |                                            |
1920	   | NaN                  | 0xfa7fc00000                               |
1921	   |                      |                                            |
1922	   | -Infinity            | 0xfaff800000                               |
1923	   |                      |                                            |
1924	   | Infinity             | 0xfb7ff0000000000000                       |
1925	   |                      |                                            |
1926	   | NaN                  | 0xfb7ff8000000000000                       |
1927	   |                      |                                            |
1928	   | -Infinity            | 0xfbfff0000000000000                       |
1929	   |                      |                                            |
1930	   | false                | 0xf4                                       |
1931	   |                      |                                            |
1932	   | true                 | 0xf5                                       |
1933	   |                      |                                            |
1934	   | null                 | 0xf6                                       |
1935	   |                      |                                            |
1936	   | undefined            | 0xf7                                       |
1937	   |                      |                                            |
1938	   | simple(16)           | 0xf0                                       |
1939	   |                      |                                            |
1940	   | simple(24)           | 0xf818                                     |
1941	   |                      |                                            |
1942	   | simple(255)          | 0xf8ff                                     |
1943	   |                      |                                            |
1944	   | 0("2013-03-21T20:04: | 0xc074323031332d30332d32315432303a30343a30 |
1945	   | 00Z")                | 305a                                       |
1946	   |                      |                                            |
1947	   | 1(1363896240)        | 0xc11a514b67b0                             |
1948	   |                      |                                            |
1949	   | 1(1363896240.5)      | 0xc1fb41d452d9ec200000                     |
1950	   |                      |                                            |
1951	   | 23(h'01020304')      | 0xd74401020304                             |
1952	   |                      |                                            |
1953	   | 24(h'6449455446')    | 0xd818456449455446                         |
1954	   |                      |                                            |
1955	   | 32("http://www.examp | 0xd82076687474703a2f2f7777772e6578616d706c |
1956	   | le.com")             | 652e636f6d                                 |
1957	   |                      |                                            |
1958	   | h''                  | 0x40                                       |
1959	   |                      |                                            |
1960	   | h'01020304'          | 0x4401020304                               |
1961	   |                      |                                            |
1962	   | ""                   | 0x60                                       |
1963	   |                      |                                            |
1964	   | "a"                  | 0x6161                                     |
1965	   |                      |                                            |
1966	   | "IETF"               | 0x6449455446                               |
1967	   |                      |                                            |
1968	   | "\"\\"               | 0x62225c                                   |
1969	   |                      |                                            |
1970	   | "\u00fc"             | 0x62c3bc                                   |
1971	   |                      |                                            |
1972	   | "\u6c34"             | 0x63e6b0b4                                 |
1973	   |                      |                                            |
1974	   | "\ud800\udd51"       | 0x64f0908591                               |
1975	   |                      |                                            |
1976	   | []                   | 0x80                                       |
1977	   |                      |                                            |
1978	   | [1, 2, 3]            | 0x83010203                                 |
1979	   |                      |                                            |
1980	   | [1, [2, 3], [4, 5]]  | 0x8301820203820405                         |
1981	   |                      |                                            |
1982	   | [1, 2, 3, 4, 5, 6,   | 0x98190102030405060708090a0b0c0d0e0f101112 |
1983	   | 7, 8, 9, 10, 11, 12, | 131415161718181819                         |
1984	   | 13, 14, 15, 16, 17,  |                                            |
1985	   | 18, 19, 20, 21, 22,  |                                            |
1986	   | 23, 24, 25]          |                                            |
1987	   |                      |                                            |
1988	   | {}                   | 0xa0                                       |
1989	   |                      |                                            |
1990	   | {1: 2, 3: 4}         | 0xa201020304                               |
1991	   |                      |                                            |
1992	   | {"a": 1, "b": [2,    | 0xa26161016162820203                       |
1993	   | 3]}                  |                                            |
1994	   |                      |                                            |
1995	   | ["a", {"b": "c"}]    | 0x826161a161626163                         |
1996	   |                      |                                            |
1997	   | {"a": "A", "b": "B", | 0xa561616141616261426163614361646144616561 |
1998	   | "c": "C", "d": "D",  | 45                                         |
1999	   | "e": "E"}            |                                            |
2000	   |                      |                                            |
2001	   | (_ h'0102',          | 0x5f42010243030405ff                       |
2002	   | h'030405')           |                                            |
2003	   |                      |                                            |
2004	   | (_ "strea", "ming")  | 0x7f657374726561646d696e67ff               |
2005	   |                      |                                            |
2006	   | [_ ]                 | 0x9fff                                     |
2007	   |                      |                                            |
2008	   | [_ 1, [2, 3], [_ 4,  | 0x9f018202039f0405ffff                     |
2009	   | 5]]                  |                                            |
2010	   |                      |                                            |
2011	   | [_ 1, [2, 3], [4,    | 0x9f01820203820405ff                       |
2012	   | 5]]                  |                                            |
2013	   |                      |                                            |
2014	   | [1, [2, 3], [_ 4,    | 0x83018202039f0405ff                       |
2015	   | 5]]                  |                                            |
2016	   |                      |                                            |
2017	   | [1, [_ 2, 3], [4,    | 0x83019f0203ff820405                       |
2018	   | 5]]                  |                                            |
2019	   |                      |                                            |
2020	   | [_ 1, 2, 3, 4, 5, 6, | 0x9f0102030405060708090a0b0c0d0e0f10111213 |
2021	   | 7, 8, 9, 10, 11, 12, | 1415161718181819ff                         |
2022	   | 13, 14, 15, 16, 17,  |                                            |
2023	   | 18, 19, 20, 21, 22,  |                                            |
2024	   | 23, 24, 25]          |                                            |
2025	   |                      |                                            |
2026	   | {_ "a": 1, "b": [_   | 0xbf61610161629f0203ffff                   |
2027	   | 2, 3]}               |                                            |
2028	   |                      |                                            |
2029	   | ["a", {_ "b": "c"}]  | 0x826161bf61626163ff                       |
2030	   |                      |                                            |
2031	   | {_ "Fun": true,      | 0xbf6346756ef563416d7421ff                 |
2032	   | "Amt": -2}           |                                            |
2033	   +----------------------+--------------------------------------------+
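
The floating-point rows in the table above are consistent with choosing the shortest IEEE 754 width that preserves the value exactly. The following Python sketch assumes that rule (the table itself does not state it) and uses a hypothetical helper name, encode_float. It relies on the "e" (half-precision) format of Python's struct module.

```python
import math
import struct


def encode_float(value):
    """Encode a float using the shortest width that round-trips exactly."""
    if math.isnan(value):
        # NaN never compares equal to itself, so handle it up front.
        return bytes.fromhex("f97e00")
    for initial_byte, fmt in ((0xf9, ">e"), (0xfa, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)


for number in (1.0, 1.1, 65504.0, 100000.0, 1.0e+300):
    print(number, "0x" + encode_float(number).hex())
```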

2035	Appendix B.  Jump Table

2037	   For brevity, this jump table does not show initial bytes that are
2038	   reserved for future extension.  It also only shows a selection of the
2039	   initial bytes that can be used for optional features.  (All unsigned
2040	   integers are in network byte order.)

2042	   +-----------------+-------------------------------------------------+
2043	   | Byte            | Structure/Semantics                             |
2044	   +-----------------+-------------------------------------------------+
2045	   | 0x00..0x17      | Integer 0x00..0x17 (0..23)                      |
2046	   |                 |                                                 |
2047	   | 0x18            | Unsigned integer (one-byte uint8_t follows)     |
2048	   |                 |                                                 |
2049	   | 0x19            | Unsigned integer (two-byte uint16_t follows)    |
2050	   |                 |                                                 |
2051	   | 0x1a            | Unsigned integer (four-byte uint32_t follows)   |
2052	   |                 |                                                 |
2053	   | 0x1b            | Unsigned integer (eight-byte uint64_t follows)  |
2054	   |                 |                                                 |
2055	   | 0x20..0x37      | Negative integer -1-0x00..-1-0x17 (-1..-24)     |
2056	   |                 |                                                 |
2057	   | 0x38            | Negative integer -1-n (one-byte uint8_t for n   |
2058	   |                 | follows)                                        |
2059	   |                 |                                                 |
2060	   | 0x39            | Negative integer -1-n (two-byte uint16_t for n  |
2061	   |                 | follows)                                        |
2062	   |                 |                                                 |
2063	   | 0x3a            | Negative integer -1-n (four-byte uint32_t for n |
2064	   |                 | follows)                                        |
2065	   |                 |                                                 |
2066	   | 0x3b            | Negative integer -1-n (eight-byte uint64_t for  |
2067	   |                 | n follows)                                      |
2068	   |                 |                                                 |
2069	   | 0x40..0x57      | byte string (0x00..0x17 bytes follow)           |
2070	   |                 |                                                 |
2071	   | 0x58            | byte string (one-byte uint8_t for n, and then n |
2072	   |                 | bytes follow)                                   |
2073	   |                 |                                                 |
2074	   | 0x59            | byte string (two-byte uint16_t for n, and then  |
2075	   |                 | n bytes follow)                                 |
2076	   |                 |                                                 |
2077	   | 0x5a            | byte string (four-byte uint32_t for n, and then |
2078	   |                 | n bytes follow)                                 |
2079	   |                 |                                                 |
2080	   | 0x5b            | byte string (eight-byte uint64_t for n, and     |
2081	   |                 | then n bytes follow)                            |
2082	   |                 |                                                 |
2083	   | 0x5f            | byte string, byte strings follow, terminated by |
2084	   |                 | "break"                                         |
2085	   |                 |                                                 |
2086	   | 0x60..0x77      | UTF-8 string (0x00..0x17 bytes follow)          |
2087	   |                 |                                                 |
2088	   | 0x78            | UTF-8 string (one-byte uint8_t for n, and then  |
2089	   |                 | n bytes follow)                                 |
2090	   |                 |                                                 |
2091	   | 0x79            | UTF-8 string (two-byte uint16_t for n, and then |
2092	   |                 | n bytes follow)                                 |
2093	   |                 |                                                 |
2094	   | 0x7a            | UTF-8 string (four-byte uint32_t for n, and     |
2095	   |                 | then n bytes follow)                            |
2096	   |                 |                                                 |
2097	   | 0x7b            | UTF-8 string (eight-byte uint64_t for n, and    |
2098	   |                 | then n bytes follow)                            |
2099	   |                 |                                                 |
2100	   | 0x7f            | UTF-8 string, UTF-8 strings follow, terminated  |
2101	   |                 | by "break"                                      |
2102	   |                 |                                                 |
2103	   | 0x80..0x97      | array (0x00..0x17 data items follow)            |
2104	   |                 |                                                 |
2105	   | 0x98            | array (one-byte uint8_t for n, and then n data  |
2106	   |                 | items follow)                                   |
2107	   |                 |                                                 |
2108	   | 0x99            | array (two-byte uint16_t for n, and then n data |
2109	   |                 | items follow)                                   |
2110	   |                 |                                                 |
2111	   | 0x9a            | array (four-byte uint32_t for n, and then n     |
2112	   |                 | data items follow)                              |
2113	   |                 |                                                 |
2114	   | 0x9b            | array (eight-byte uint64_t for n, and then n    |
2115	   |                 | data items follow)                              |
2116	   |                 |                                                 |
2117	   | 0x9f            | array, data items follow, terminated by "break" |
2118	   |                 |                                                 |
2119	   | 0xa0..0xb7      | map (0x00..0x17 pairs of data items follow)     |
2120	   |                 |                                                 |
2121	   | 0xb8            | map (one-byte uint8_t for n, and then n pairs   |
2122	   |                 | of data items follow)                           |
2123	   |                 |                                                 |
2124	   | 0xb9            | map (two-byte uint16_t for n, and then n pairs  |
2125	   |                 | of data items follow)                           |
2126	   |                 |                                                 |
2127	   | 0xba            | map (four-byte uint32_t for n, and then n pairs |
2128	   |                 | of data items follow)                           |
2129	   |                 |                                                 |
2130	   | 0xbb            | map (eight-byte uint64_t for n, and then n      |
2131	   |                 | pairs of data items follow)                     |
2132	   |                 |                                                 |
2133	   | 0xbf            | map, pairs of data items follow, terminated by  |
2134	   |                 | "break"                                         |
2135	   |                 |                                                 |
2136	   | 0xc0            | Text-based date/time (data item follows, see    |
2137	   |                 | Section 2.4.1)                                  |
2138	   |                 |                                                 |
2139	   | 0xc1            | Epoch-based date/time (data item follows, see   |
2140	   |                 | Section 2.4.1)                                  |
2141	   |                 |                                                 |
2142	   | 0xc2            | Positive bignum (data item "byte string"        |
2143	   |                 | follows)                                        |
2144	   |                 |                                                 |
2145	   | 0xc3            | Negative bignum (data item "byte string"        |
2146	   |                 | follows)                                        |
2147	   |                 |                                                 |
2148	   | 0xc4            | Decimal Fraction (data item "array" follows,    |
2149	   |                 | see Section 2.4.3)                              |
2150	   |                 |                                                 |
2151	   | 0xc5            | Bigfloat (data item "array" follows, see        |
2152	   |                 | Section 2.4.3)                                  |
2153	   |                 |                                                 |
2154	   | 0xc6..0xd4      | (tagged item, tag to be assigned by IANA)       |
2155	   |                 |                                                 |
2156	   | 0xd5..0xd7      | Expected Conversion (data item follows, see     |
2157	   |                 | Section 2.4.4.2)                                |
2158	   |                 |                                                 |
2159	   | 0xd8..0xdb      | (more tagged items, 1/2/4/8 bytes and then a    |
2160	   |                 | data item follow)                               |
2161	   |                 |                                                 |
2162	   | 0xe0..0xf3      | (simple value to be assigned by IANA)           |
2163	   |                 |                                                 |
2164	   | 0xf4            | False                                           |
2165	   |                 |                                                 |
2166	   | 0xf5            | True                                            |
2167	   |                 |                                                 |
2168	   | 0xf6            | Null                                            |
2169	   |                 |                                                 |
2170	   | 0xf7            | Undefined                                       |
2171	   |                 |                                                 |
2172	   | 0xf8            | (simple value to be assigned by IANA, one byte  |
2173	   |                 | follows)                                        |
2174	   |                 |                                                 |
2175	   | 0xf9            | Half-Precision Float (two-byte IEEE 754)        |
2176	   |                 |                                                 |
2177	   | 0xfa            | Single-Precision Float (four-byte IEEE 754)     |
2178	   |                 |                                                 |
2179	   | 0xfb            | Double-Precision Float (eight-byte IEEE 754)    |
2180	   |                 |                                                 |
2181	   | 0xff            | "break" stop code                               |
2182	   +-----------------+-------------------------------------------------+

2184	                   Table 4: Jump Table for Initial Byte
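
The regularity behind Table 4 is that the top three bits of the initial byte select the major type and the low five bits carry the additional information. The Python sketch below illustrates that split; the function name split_initial_byte and the wording of the returned descriptions are illustrative assumptions, not terms defined by this document.

```python
MAJOR_TYPES = (
    "unsigned integer", "negative integer", "byte string", "UTF-8 string",
    "array", "map", "tag", "simple value or float",
)


def split_initial_byte(ib):
    """Return (major type name, meaning of the additional information)."""
    mt, ai = ib >> 5, ib & 0x1f
    if ai < 24:
        detail = "value %d carried in the initial byte" % ai
    elif ai <= 27:
        detail = "%d-byte argument follows" % (1 << (ai - 24))
    elif ai <= 30:
        detail = "reserved"
    elif mt == 7:
        detail = '"break" stop code'
    elif mt in (2, 3, 4, 5):
        detail = "indefinite length"
    else:
        detail = "not well-formed"
    return MAJOR_TYPES[mt], detail


for byte in (0x17, 0x39, 0x9f, 0xc2, 0xfb, 0xff):
    print(hex(byte), split_initial_byte(byte))
```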

2186	Appendix C.  Pseudocode

2188	   The well-formedness of a CBOR item can be checked by the pseudo-code
2189	   in Figure 1.  The data is well-formed if and only if:

2191	   o  the pseudo-code does not "fail";

2193	   o  after execution of the pseudo-code, no bytes are left in the input
2194	      (except in streaming applications).

2196	   The pseudo-code has the following prerequisites:

2198	   o  take(n) reads n bytes from the input data and returns them as a
2199	      byte string.  If n bytes are no longer available, take(n) fails.

2201	   o  uint() converts a byte string into an unsigned integer by
2202	      interpreting the byte string in network byte order.

2204	   o  Arithmetic works as in C.

2206	   o  All variables are unsigned integers of sufficient range.

2208	   well_formed (breakable = false) {
2209	     // process initial bytes
2210	     ib = uint(take(1));
2211	     mt = ib >> 5;
2212	     val = ai = ib & 0x1f;
2213	     switch (ai) {
2214	       case 24: val = uint(take(1)); break;
2215	       case 25: val = uint(take(2)); break;
2216	       case 26: val = uint(take(4)); break;
2217	       case 27: val = uint(take(8)); break;
2218	       case 28: case 29: case 30: fail();
2219	       case 31:
2221	         return well_formed_indefinite(mt, breakable);
2222	     }
2223	     // process content
2224	     switch (mt) {
2225	       // case 0, 1, 7 do not have content; just use val
2226	       case 2: case 3: take(val); break; // bytes/UTF-8
2227	       case 4: for (i = 0; i < val; i++) well_formed(); break;
2228	       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
2229	       case 6: well_formed(); break;     // 1 embedded data item
2230	     }
2231	     return mt;                    // finite data item
2232	   }

2234	   well_formed_indefinite(mt, breakable) {
2235	     switch (mt) {
2236	       case 2: case 3:
2237	         while ((it = well_formed(true)) != -1)
2238	           if (it != mt)           // need finite embedded
2239	             fail();               //    of same type
2240	         break;
2241	       case 4: while (well_formed(true) != -1); break;
2242	       case 5: while (well_formed(true) != -1) well_formed(); break;
2243	       case 7:
2244	         if (breakable)
2245	           return -1;              // signal break out
2246	         else fail();              // no enclosing indefinite
2247	       default: fail();            // wrong mt
2248	     }
2249	     return 0;                     // no break out
2250	   }

2252	              Figure 1: Pseudo-Code for well-formedness check
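
For experimentation, the pseudo-code of Figure 1 can be transcribed into runnable Python. The sketch below is one possible transcription, under these assumptions: fail() becomes a raised exception, the value -1 is kept as the "break" signal, and the names Checker, NotWellFormed, and is_well_formed are hypothetical. It applies the non-streaming rule that no bytes may be left over.

```python
class NotWellFormed(Exception):
    """Raised wherever the pseudo-code calls fail()."""


BREAK = -1  # returned when a "break" stop code has been consumed


class Checker:
    def __init__(self, data):
        self.data, self.pos = data, 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise NotWellFormed("input ends early")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable=False):
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1f
        val = ai
        if 24 <= ai <= 27:      # 1, 2, 4, or 8 argument bytes follow
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        if mt in (2, 3):        # byte string / UTF-8 string content
            self.take(val)
        elif mt == 4:           # array of val items
            for _ in range(val):
                self.well_formed()
        elif mt == 5:           # map of val key/value pairs
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:           # tag: one embedded data item
            self.well_formed()
        return mt

    def well_formed_indefinite(self, mt, breakable):
        if mt in (2, 3):        # chunks must be finite and of the same type
            while True:
                it = self.well_formed(True)
                if it == BREAK:
                    break
                if it != mt:
                    raise NotWellFormed("chunk of wrong type")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()
        elif mt == 7:
            if breakable:
                return BREAK
            raise NotWellFormed("break outside an indefinite-length item")
        else:
            raise NotWellFormed("wrong major type for indefinite length")
        return 0


def is_well_formed(data):
    checker = Checker(data)
    try:
        checker.well_formed()
    except NotWellFormed:
        return False
    return checker.pos == len(data)  # no trailing bytes (non-streaming)
```

For example, is_well_formed(bytes.fromhex("9f018202039f0405ffff")) accepts one of the indefinite-length arrays from Appendix A, while a lone 0xff or a truncated 0x1903 is rejected.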

2254	   Note that the remaining complexity of a complete CBOR decoder is
2255	   about presenting data that has been parsed to the application in an
2256	   appropriate form.

2258	   Major types 0 and 1 are designed in such a way that they can be
2259	   encoded in C from a signed integer without actually doing an if-then-
2260	   else for positive/negative (Figure 2).  This uses the fact that
2261	   (-1-n), the transformation for major type 1, is the same as ~n
2262	   (bitwise complement) in C unsigned arithmetic; ~n can then be
2263	   expressed as (-1)^n for the negative case, while 0^n leaves n
2264	   unchanged for non-negative.  The sign of a number can be converted to
2265	   -1 for negative and 0 for non-negative (0 or positive) by arithmetic-
2266	   shifting the number by one bit less than the bit length of the number
2267	   (for example, by 63 for 64-bit numbers).

2269	   void encode_sint(int64_t n) {
2270	     uint64_t ui = n >> 63;   // extend sign to whole length
2271	     mt = ui & 0x20;          // extract major type
2272	     ui ^= n;                 // complement negatives
2273	     if (ui < 24)
2274	       *p++ = mt + ui;
2275	     else if (ui < 256) {
2276	       *p++ = mt + 24;
2277	       *p++ = ui;
2278	     } else
2279	          ...

2281	            Figure 2: Pseudo-code for encoding a signed integer
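
   The branch-free sign handling of Figure 2 can also be tried out in
   Python, where >> on integers is an arithmetic shift.  The sketch
   below makes several assumptions of its own: the function returns
   the encoded bytes instead of writing through a pointer, the
   restriction to the int64_t range is enforced explicitly, and the
   multi-byte cases that Figure 2 elides are filled in from the general
   rules for additional information values 24 to 27, not from the
   figure.

```python
def encode_sint(n):
    """Encode an integer in the int64_t range as major type 0 or 1."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("outside the int64_t range assumed by Figure 2")
    ui = n >> 63          # -1 for negative n, 0 otherwise
    mt = ui & 0x20        # 0x20 selects major type 1, 0x00 major type 0
    ui ^= n               # -1 ^ n == ~n == -1 - n; 0 ^ n == n
    if ui < 24:
        return bytes([mt + ui])
    # additional information 24..27: 1, 2, 4, or 8 bytes follow
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if ui < 1 << (8 * size):
            return bytes([mt + ai]) + ui.to_bytes(size, "big")
    raise AssertionError("unreachable for int64_t input")
```

   For example, encode_sint(-1000) complements the input to 999 and
   emits it as a two-byte argument under major type 1.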

2283	Appendix D.  Half-precision

2285	   As half-precision floating point numbers were only added to IEEE 754
2286	   in 2008, today's programming platforms often still only have limited
2287	   support for them.  It is easy to add at least decoding support even
2288	   where the platform provides none.  An example of a small
2289	   decoder for half-precision floating point numbers in the C language
2290	   is shown in Figure 3.  A similar program for Python is in Figure 4;
2291	   this code assumes that the 2-byte value has already been decoded as
2292	   an (unsigned short) integer in network byte order (as would be done
2293	   by the pseudocode in Appendix C).

2295	   #include <math.h>

2297	   double decode_half(unsigned char *halfp) {
2298	     int half = (halfp[0] << 8) + halfp[1];
2299	     int exp = (half >> 10) & 0x1f;
2300	     int mant = half & 0x3ff;
2301	     double val;
2302	     if (exp == 0) val = ldexp(mant, -24);
2303	     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
2304	     else val = mant == 0 ? INFINITY : NAN;
2305	     return half & 0x8000 ? -val : val;
2306	   }

2308	               Figure 3: C code for a half-precision decoder

2310	   import struct
2311	   from math import ldexp

2313	   def decode_single(single):
2314	       return struct.unpack("!f", struct.pack("!I", single))[0]

2316	   def decode_half(half):

2318	       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
2319	       if ((half & 0x7c00) != 0x7c00):
2320	           return ldexp(decode_single(valu), 112)
2321	       return decode_single(valu | 0x7f800000)

2323	            Figure 4: Python code for a half-precision decoder
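
   As a cross-check of the decoding rules, the ldexp-based approach of
   Figure 3 can be ported to Python and compared against the "e"
   (binary16) format of the struct module, which is available in
   Python 3.6 and later.  The names decode_half_bytes() and
   agrees_with_struct() are hypothetical and exist only for this
   sketch.

```python
import math
import struct


def decode_half_bytes(halfp):
    """Decode two bytes in network byte order, following Figure 3."""
    half = (halfp[0] << 8) + halfp[1]
    exp = (half >> 10) & 0x1F
    mant = half & 0x3FF
    if exp == 0:                       # zero or subnormal
        val = math.ldexp(mant, -24)
    elif exp != 31:                    # normal: restore the implicit bit
        val = math.ldexp(mant + 1024, exp - 25)
    else:                              # infinity or NaN
        val = math.inf if mant == 0 else math.nan
    return -val if half & 0x8000 else val


def agrees_with_struct():
    """Compare all 65536 bit patterns with struct's binary16 decoder."""
    for half in range(0x10000):
        raw = bytes([half >> 8, half & 0xFF])
        ours = decode_half_bytes(raw)
        ref = struct.unpack(">e", raw)[0]
        if math.isnan(ref):
            if not math.isnan(ours):
                return False
        elif ours != ref or math.copysign(1, ours) != math.copysign(1, ref):
            return False
    return True
```

   The comparison treats every NaN payload as equivalent and checks the
   sign of zero separately, since -0.0 == 0.0 in Python.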

2325	Appendix E.  Comparison of Other Binary Formats to CBOR's Design
2326	             Objectives

2328	   The proposal for CBOR follows a history of binary formats that is as
2329	   long as the history of computers themselves.  Different formats have
2330	   had different objectives.  In most cases, the objectives of the
2331	   format were never stated, although they can sometimes be implied by
2332	   the context where the format was first used.  Some formats were meant
2333	   to be universally usable, although history has proven that no binary
2334	   format meets the needs of all protocols and applications.

2336	   CBOR differs from many of these formats in that it starts with a set
2337	   of objectives and attempts to meet just those.  This section
2338	   compares a few of the dozens of formats with CBOR's objectives in
2339	   order to help the reader decide if they want to use CBOR or a
2340	   different format for a particular protocol or application.

2342	   Note that the discussion here is not meant to be a criticism of any
2343	   format: to the best of our knowledge, no format before CBOR was meant
2344	   to cover CBOR's objectives in the priority we have assigned them.  A
2345	   brief recap of the objectives from Section 1.1 is:

2347	   1.  unambiguously encode common data formats from Internet standards

2349	   2.  code compactness for encoder or decoder

2351	   3.  no schema description needed

2353	   4.  reasonably compact serialization

2355	   5.  applicable to constrained and unconstrained applications

2357	   6.  good JSON conversion

2359	   7.  extensibility

2361	E.1.  ASN.1 DER, BER, and PER

2363	   [ASN.1] has many serializations.  In the IETF, DER and BER are the
2364	   most common.  The serialized output is not particularly compact for
2365	   many items, and the code needed to decode numeric items can be
2366	   complex on a constrained device.

2368	   Few (if any) IETF protocols have adopted one of the several variants
2369	   of PER.  There could be many reasons for this, but one that is
2370	   commonly stated is that PER requires making use of the schema for
2371	   even parsing the surface structure of the data stream, requiring
2372	   significant tool support.  There are different versions of the ASN.1
2373	   schema language in use, which has also hampered adoption.

2375	E.2.  MessagePack

2377	   [MessagePack] is a concise, widely-implemented counted binary
2378	   serialization format, similar in many properties to CBOR, although
2379	   somewhat less regular.  While the data model can be used to represent
2380	   JSON data, MessagePack has also been used in many RPC applications
2381	   and for long-term storage of data.

2383	   MessagePack has been essentially stable since it was first published
2384	   around 2011; it has not yet had a transition.  The evolution of
2385	   MessagePack is impeded by an imperative to maintain complete
2386	   backwards compatibility with existing stored data, while only a few
2387	   bytecodes are still available for extension.  Repeated requests over
2388	   the years from the MessagePack user community to separate out binary
2389	   and text strings in the encoding have recently led to an extension
2390	   proposal that would leave MessagePack's "raw" data ambiguous between
2391	   its usages for binary and text data.  The extension mechanism for
2392	   MessagePack remains unclear.

2394	E.3.  BSON

2396	   [BSON] is a data format that was developed for the storage of JSON-
2397	   like maps (JSON objects) in the MongoDB database.  Its major
2398	   distinguishing feature is the capability for in-place update,
2399	   forgoing a compact representation.  BSON uses a counted
2400	   representation except for map keys, which are null-byte terminated.
2401	   While BSON can be used for the representation of JSON-like objects on
2402	   the wire, its specification is dominated by the requirements of the
2403	   database application and has become somewhat baroque.  How BSON
2404	   extensions will be implemented remains unclear.

2406	E.4.  UBJSON

2408	   [UBJSON] has a design goal to make JSON faster and somewhat smaller,
2409	   using a binary format that is limited to exactly the data model JSON
2410	   uses.  Thus, there is expressly no intention to support, for example,
2411	   binary data; however, there is a "high-precision number", expressed
2412	   as a character string in JSON syntax.  UBJSON is not optimized for
2413	   code compactness, and its type byte coding is optimized for human
2414	   recognition and not for compact representation of native types such
2415	   as small integers.  Although UBJSON is mostly counted, it provides a
2416	   reserved "unknown-length" value to support streaming of arrays and
2417	   maps (JSON objects).  Within these containers, UBJSON also has a
2418	   "Noop" type for padding.

2420	E.5.  MSDTP: RFC 713

2422	   A very early example of a compact message format is described in
2423	   [RFC0713], defined in 1976.  It is included here for its historical
2424	   value, not because it was ever widely used.

2426	E.6.  Conciseness On The Wire

2428	   While CBOR's design objective of code compactness for encoders and
2429	   decoders ranks higher than its objective of conciseness on the wire,
2430	   many people focus on the wire size.  Table 5 shows some encoding
2431	   examples for the simple nested array [1, [2, 3]]; where some form of
2432	   indefinite length encoding is supported by the encoding, [_ 1, [2,
2433	   3]] (indefinite length on the outer array) is also shown.

2435	   (Entries marked with an asterisk have not been checked against an
2436	   implementation and might be applying some liberty in translating the
2437	   CBOR data model to that format.  Corrections are appreciated.)

2439	   +---------------+-------------------------+-------------------------+
2440	   | Format        | [1, [2, 3]]             | [_ 1, [2, 3]]           |
2441	   +---------------+-------------------------+-------------------------+
2442	   | RFC 713*      | c2 05 81 c2 02 82 83    |                         |
2443	   |               |                         |                         |
2444	   | ASN.1 BER*    | 30 0b 02 01 01 30 06 02 | 30 80 02 01 01 30 06 02 |
2445	   |               | 01 02 02 01 03          | 01 02 02 01 03 00 00    |
2446	   |               |                         |                         |
2447	   | MessagePack   | 92 01 92 02 03          |                         |
2448	   |               |                         |                         |
2449	   | BSON          | 22 00 00 00 10 30 00 01 |                         |
2450	   |               | 00 00 00 04 31 00 13 00 |                         |
2451	   |               | 00 00 10 30 00 02 00 00 |                         |
2452	   |               | 00 10 31 00 03 00 00 00 |                         |
2453	   |               | 00 00                   |                         |
2454	   |               |                         |                         |
2455	   | UBJSON        | 61 02 42 01 61 02 42 02 | 61 ff 42 01 61 02 42 02 |
2456	   |               | 42 03                   | 42 03 45*               |
2457	   |               |                         |                         |
2458	   | CBOR          | 82 01 82 02 03          | 9f 01 82 02 03 ff       |
2459	   +---------------+-------------------------+-------------------------+

2461	           Table 5: Examples for different levels of conciseness
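
   The CBOR row of Table 5 can be reproduced with a few lines of
   Python.  The following is a deliberately minimal sketch, not a
   general encoder: encode_small() is a hypothetical helper that
   handles only unsigned integers below 24 and arrays, which is all
   the example [1, [2, 3]] needs.

```python
def encode_small(item, indefinite=False):
    """Encode small unsigned integers and (nested) arrays only.

    With indefinite=True, only the outermost array uses the
    indefinite-length form; nested arrays stay definite-length.
    """
    if isinstance(item, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(item, int):
        if not 0 <= item < 24:
            raise ValueError("only integers 0..23 are supported here")
        return bytes([item])                  # major type 0, value in ib
    if isinstance(item, list):
        body = b"".join(encode_small(x) for x in item)
        if indefinite:
            return b"\x9f" + body + b"\xff"   # open array, items, break
        if len(item) >= 24:
            raise ValueError("only arrays of 0..23 items are supported")
        return bytes([0x80 + len(item)]) + body   # major type 4
    raise TypeError("unsupported type in this sketch")
```

   Calling encode_small([1, [2, 3]]) and
   encode_small([1, [2, 3]], indefinite=True) produces the two CBOR
   byte sequences listed in the table.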

2463	Authors' Addresses

2465	   Carsten Bormann
2466	   Universitaet Bremen TZI
2467	   Postfach 330440
2468	   D-28359 Bremen
2469	   Germany

2471	   Phone: +49-421-218-63921
2472	   Email: cabo@tzi.org

2474	   Paul Hoffman
2475	   VPN Consortium

2477	   Email: paul.hoffman@vpnc.org