Network Working Group                                         C. Bormann
Internet-Draft                                   Universitaet Bremen TZI
Obsoletes: 7049 (if approved)                                 P. Hoffman
Intended status: Standards Track                                   ICANN
Expires: 9 September 2020                                   8 March 2020

              Concise Binary Object Representation (CBOR)
                       draft-ietf-cbor-7049bis-13

Abstract

   The Concise Binary Object Representation (CBOR) is a data format
   whose design goals include the possibility of extremely small code
   size, fairly small message size, and extensibility without the need
   for version negotiation. These design goals make it different from
   earlier binary serializations such as ASN.1 and MessagePack.

   This document is a revised edition of RFC 7049, with editorial
   improvements, added detail, and fixed errata. This revision formally
   obsoletes RFC 7049, while keeping full compatibility of the
   interchange format from RFC 7049. It does not create a new version
   of the format.

Contributing

   This document is being worked on in the CBOR Working Group. Please
   contribute on the mailing list there, or in the GitHub repository for
   this draft: https://github.com/cbor-wg/CBORbis

   The charter for the CBOR Working Group says that the WG will update
   RFC 7049 to fix verified errata. Security issues and clarifications
   may be addressed, but changes to this document will ensure backward
   compatibility for popular deployed codebases. This document will be
   targeted at becoming an Internet Standard.

   [RFC editor: please remove this note.]

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 9 September 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Objectives
     1.2.  Terminology
   2.  CBOR Data Models
     2.1.  Extended Generic Data Models
     2.2.  Specific Data Models
   3.  Specification of the CBOR Encoding
     3.1.  Major Types
     3.2.  Indefinite Lengths for Some Major Types
       3.2.1.  The "break" Stop Code
       3.2.2.  Indefinite-Length Arrays and Maps
       3.2.3.  Indefinite-Length Byte Strings and Text Strings
       3.2.4.  Summary of indefinite-length use of major types
     3.3.  Floating-Point Numbers and Values with No Content
     3.4.  Tagging of Items
       3.4.1.  Standard Date/Time String
       3.4.2.  Epoch-based Date/Time
       3.4.3.  Bignums
       3.4.4.  Decimal Fractions and Bigfloats
       3.4.5.  Content Hints
         3.4.5.1.  Encoded CBOR Data Item
         3.4.5.2.  Expected Later Encoding for CBOR-to-JSON Converters
         3.4.5.3.  Encoded Text
       3.4.6.  Self-Described CBOR
   4.  Serialization Considerations
     4.1.  Preferred Serialization
     4.2.  Deterministically Encoded CBOR
       4.2.1.  Core Deterministic Encoding Requirements
       4.2.2.  Additional Deterministic Encoding Considerations
       4.2.3.  Length-first Map Key Ordering
   5.  Creating CBOR-Based Protocols
     5.1.  CBOR in Streaming Applications
     5.2.  Generic Encoders and Decoders
     5.3.  Validity of Items
       5.3.1.  Basic validity
       5.3.2.  Tag validity
     5.4.  Validity and Evolution
     5.5.  Numbers
     5.6.  Specifying Keys for Maps
       5.6.1.  Equivalence of Keys
     5.7.  Undefined Values
   6.  Converting Data between CBOR and JSON
     6.1.  Converting from CBOR to JSON
     6.2.  Converting from JSON to CBOR
   7.  Future Evolution of CBOR
     7.1.  Extension Points
     7.2.  Curating the Additional Information Space
   8.  Diagnostic Notation
     8.1.  Encoding Indicators
   9.  IANA Considerations
     9.1.  Simple Values Registry
     9.2.  Tags Registry
     9.3.  Media Type ("MIME Type")
     9.4.  CoAP Content-Format
     9.5.  The +cbor Structured Syntax Suffix Registration
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Examples
   Appendix B.  Jump Table
   Appendix C.  Pseudocode
   Appendix D.  Half-Precision
   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
           Objectives
     E.1.  ASN.1 DER, BER, and PER
     E.2.  MessagePack
     E.3.  BSON
     E.4.  MSDTP: RFC 713
     E.5.  Conciseness on the Wire
   Appendix F.  Changes from RFC 7049
   Appendix G.  Well-formedness errors and examples
     G.1.  Examples for CBOR data items that are not well-formed
   Acknowledgements
   Authors' Addresses

1.  Introduction

   There are hundreds of standardized formats for binary representation
   of structured data (also known as binary serialization formats). Of
   those, some are for specific domains of information, while others are
   generalized for arbitrary data. In the IETF, probably the best-known
   formats in the latter category are ASN.1's BER and DER [ASN.1].

   The format defined here follows some specific design goals that are
   not well met by current formats. The underlying data model is an
   extended version of the JSON data model [RFC8259]. It is important
   to note that this is not a proposal that the grammar in RFC 8259 be
   extended in general, since doing so would cause a significant
   backwards incompatibility with already deployed JSON documents.
   Instead, this document simply defines its own data model that starts
   from JSON.

   Appendix E lists some existing binary formats and discusses how well
   they do or do not fit the design objectives of the Concise Binary
   Object Representation (CBOR).

   This document is a revised edition of [RFC7049], with editorial
   improvements, added detail, and fixed errata. This revision formally
   obsoletes RFC 7049, while keeping full compatibility of the
   interchange format from RFC 7049. It does not create a new version
   of the format.

1.1.  Objectives

   The objectives of CBOR, roughly in decreasing order of importance,
   are:

   1.  The representation must be able to unambiguously encode most
       common data formats used in Internet standards.

       *  It must represent a reasonable set of basic data types and
          structures using binary encoding. "Reasonable" here is
          largely influenced by the capabilities of JSON, with the major
          addition of binary byte strings.
          The structures supported are
          limited to arrays and trees; loops and lattice-style graphs
          are not supported.

       *  There is no requirement that all data formats be uniquely
          encoded; that is, it is acceptable that the number "7" might
          be encoded in multiple different ways.

   2.  The code for an encoder or decoder must be able to be compact in
       order to support systems with very limited memory, processor
       power, and instruction sets.

       *  An encoder and a decoder need to be implementable in a very
          small amount of code (for example, in class 1 constrained
          nodes as defined in [RFC7228]).

       *  The format should use contemporary machine representations of
          data (for example, not requiring binary-to-decimal
          conversion).

   3.  Data must be able to be decoded without a schema description.

       *  Similar to JSON, encoded data should be self-describing so
          that a generic decoder can be written.

   4.  The serialization must be reasonably compact, but data
       compactness is secondary to code compactness for the encoder and
       decoder.

       *  "Reasonable" here is bounded by JSON as an upper bound in
          size, and by the implementation complexity limiting how much
          effort can go into achieving that compactness. Using either
          general compression schemes or extensive bit-fiddling violates
          the complexity goals.

   5.  The format must be applicable to both constrained nodes and
       high-volume applications.

       *  This means it must be reasonably frugal in CPU usage for both
          encoding and decoding. This is relevant both for constrained
          nodes and for potential usage in applications with a very high
          volume of data.

   6.  The format must support all JSON data types for conversion to and
       from JSON.

       *  It must support a reasonable level of conversion as long as
          the data represented is within the capabilities of JSON. It
          must be possible to define a unidirectional mapping towards
          JSON for all types of data.

   7.  The format must be extensible, and the extended data must be
       decodable by earlier decoders.

       *  The format is designed for decades of use.

       *  The format must support a form of extensibility that allows
          fallback so that a decoder that does not understand an
          extension can still decode the message.

       *  The format must be able to be extended in the future by later
          IETF standards.

1.2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The term "byte" is used in its now-customary sense as a synonym for
   "octet". All multi-byte values are encoded in network byte order
   (that is, most significant byte first, also known as "big-endian").

   This specification makes use of the following terminology:

   Data item: A single piece of CBOR data. The structure of a data
      item may contain zero, one, or more nested data items. The term
      is used both for the data item in representation format and for
      the abstract idea that can be derived from that by a decoder; the
      former can be addressed specifically by using "encoded data item".

   Decoder: A process that decodes a well-formed encoded CBOR data item
      and makes it available to an application. Formally speaking, a
      decoder contains a parser to break up the input using the syntax
      rules of CBOR, as well as a semantic processor to prepare the data
      in a form suitable to the application.

   Encoder: A process that generates the (well-formed) representation
      format of a CBOR data item from application information.

   Data Stream: A sequence of zero or more data items, not further
      assembled into a larger containing data item. The independent
      data items that make up a data stream are sometimes also referred
      to as "top-level data items".

   Well-formed: A data item that follows the syntactic structure of
      CBOR. A well-formed data item uses the initial bytes and the byte
      strings and/or data items that are implied by their values as
      defined in CBOR and does not include following extraneous data.

      CBOR decoders by definition only return contents from well-formed
      data items.

   Valid: A data item that is well-formed and also follows the semantic
      restrictions that apply to CBOR data items (Section 5.3).

   Expected: Besides its normal English meaning, the term "expected" is
      used to describe requirements beyond CBOR validity that an
      application has on its input data. Well-formed (processable at
      all), valid (checked by a validity-checking generic decoder), and
      expected (checked by the application) form a hierarchy of layers
      of acceptability.

   Stream decoder: A process that decodes a data stream and makes each
      of the data items in the sequence available to an application as
      they are received.

   Terms and concepts for floating-point values such as Infinity, NaN
   (not a number), negative zero, and subnormal are defined in
   [IEEE754].

   Where bit arithmetic or data types are explained, this document uses
   the notation familiar from the programming language C, except that
   "**" denotes exponentiation. Similar to the "0x" notation for
   hexadecimal numbers, numbers in binary notation are prefixed with
   "0b". Underscores can be added to a number solely for readability,
   so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the
   desired interpretation of the bits in the byte; in this case, it is
   split into three bits and five bits. Encoded CBOR data items are
   sometimes given in the "0x" or "0b" notation; these values are first
   interpreted as numbers as in C and are then interpreted as byte
   strings in network byte order, including any leading zero bytes
   expressed in the notation.

   Words may be _italicized_ for emphasis; in the plain text form of
   this specification this is indicated by surrounding words with
   underscore characters. Verbatim text (e.g., names from a programming
   language) may be set in "monospace" type; in plain text this is
   approximated somewhat ambiguously by surrounding the text in double
   quotes (which also retain their usual meaning).

2.  CBOR Data Models

   CBOR is explicit about its generic data model, which defines the set
   of all data items that can be represented in CBOR. Its basic generic
   data model is extensible by the registration of simple type values
   and tags. Applications can then subset the resulting extended
   generic data model to build their specific data models.

   Within environments that can represent the data items in the generic
   data model, generic CBOR encoders and decoders can be implemented
   (which usually involves defining additional implementation data types
   for those data items that do not already have a natural
   representation in the environment). The ability to provide generic
   encoders and decoders is an explicit design goal of CBOR; however,
   many applications will provide their own application-specific
   encoders and/or decoders.

   In the basic (un-extended) generic data model, a data item is one of:

   *  an integer in the range -2**64..2**64-1 inclusive

   *  a simple value, identified by a number between 0 and 255, but
      distinct from that number itself

   *  a floating-point value, distinct from an integer, out of the set
      representable by IEEE 754 binary64 (including non-finites)
      [IEEE754]

   *  a sequence of zero or more bytes ("byte string")

   *  a sequence of zero or more Unicode code points ("text string")

   *  a sequence of zero or more data items ("array")

   *  a mapping (mathematical function) from zero or more data items
      ("keys") each to a data item ("values"), ("map")

   *  a tagged data item ("tag"), comprising a tag number (an integer in
      the range 0..2**64-1) and the tag content (a data item)

   Note that integer and floating-point values are distinct in this
   model, even if they have the same numeric value.

   Also note that serialization variants, such as the number of bytes of
   the encoded floating-point value, or the choice of one of the ways in
   which an integer, the length of a text or byte string, the number of
   elements in an array or pairs in a map, or a tag number,
   (collectively "the argument", see Section 3) can be encoded, are not
   visible at the generic data model level.

2.1.  Extended Generic Data Models

   This basic generic data model comes pre-extended by the registration
   of a number of simple values and tag numbers right in this document,
   such as:

   *  "false", "true", "null", and "undefined" (simple values identified
      by 20..23)

   *  integer and floating-point values with a larger range and
      precision than the above (tag numbers 2 to 5)

   *  application data types such as a point in time or an RFC 3339
      date/time string (tag numbers 1, 0)

   Further elements of the extended generic data model can be (and have
   been) defined via the IANA registries created for CBOR. Even if such
   an extension is unknown to a generic encoder or decoder, data items
   using that extension can be passed to or from the application by
   representing them at the interface to the application within the
   basic generic data model, i.e., as generic values of a simple type or
   generic tags.

   In other words, the basic generic data model is stable as defined in
   this document, while the extended generic data model expands by the
   registration of new simple values or tag numbers, but never shrinks.

   While there is a strong expectation that generic encoders and
   decoders can represent "false", "true", and "null" ("undefined" is
   intentionally omitted) in the form appropriate for their programming
   environment, implementation of the data model extensions created by
   tags is truly optional and a matter of implementation quality.

2.2.  Specific Data Models

   The specific data model for a CBOR-based protocol usually subsets the
   extended generic data model and assigns application semantics to the
   data items within this subset and its components. When documenting
   such specific data models, where it is desired to specify the types
   of data items, it is preferred to identify the types by the names
   they have in the generic data model ("negative integer", "array")
   instead of by referring to aspects of their CBOR representation
   ("major type 1", "major type 4").

   Specific data models can also specify what values (including values
   of different types) are equivalent for the purposes of map keys and
   encoder freedom. For example, in the generic data model, a valid map
   MAY have both "0" and "0.0" as keys, and an encoder MUST NOT encode
   "0.0" as an integer (major type 0, Section 3.1).
   However, if a specific data model declares that floating-point and
   integer representations of integral values are equivalent, using both
   map keys "0" and "0.0" in a single map would be considered
   duplicates, even while encoded as different major types, and so
   invalid; and an encoder could encode integral-valued floats as
   integers or vice versa, perhaps to save encoded bytes.

3.  Specification of the CBOR Encoding

   A CBOR data item (Section 2) is encoded to or decoded from a byte
   string carrying a well-formed encoded data item as described in this
   section. The encoding is summarized in Table 7, indexed by the
   initial byte. An encoder MUST produce only well-formed encoded data
   items. A decoder MUST NOT return a decoded data item when it
   encounters input that is not a well-formed encoded CBOR data item
   (this does not detract from the usefulness of diagnostic and recovery
   tools that might make available some information from a damaged
   encoded CBOR data item).

   The initial byte of each encoded data item contains both information
   about the major type (the high-order 3 bits, described in
   Section 3.1) and additional information (the low-order 5 bits). With
   a few exceptions, the additional information's value describes how to
   load an unsigned integer "argument":

   Less than 24: The argument's value is the value of the additional
      information.

   24, 25, 26, or 27: The argument's value is held in the following 1,
      2, 4, or 8 bytes, respectively, in network byte order. For major
      type 7 and additional information value 25, 26, 27, these bytes
      are not used as an integer argument, but as a floating-point value
      (see Section 3.3).

   28, 29, 30: These values are reserved for future additions to the
      CBOR format. In the present version of CBOR, the encoded item is
      not well-formed.

   31: No argument value is derived. If the major type is 0, 1, or 6,
      the encoded item is not well-formed. For major types 2 to 5, the
      item's length is indefinite, and for major type 7, the byte does
      not constitute a data item at all but terminates an
      indefinite-length item; both are described in Section 3.2.

   The initial byte and any additional bytes consumed to construct the
   argument are collectively referred to as the "head" of the data item.

   The meaning of this argument depends on the major type. For example,
   in major type 0, the argument is the value of the data item itself
   (and in major type 1 the value of the data item is computed from the
   argument); in major types 2 and 3 it gives the length of the string
   data in bytes that follows; and in major types 4 and 5 it is used to
   determine the number of data items enclosed.

   If the encoded sequence of bytes ends before the end of a data item,
   that item is not well-formed. If the encoded sequence of bytes still
   has bytes remaining after the outermost encoded item is decoded, that
   encoding is not a single well-formed CBOR item; depending on the
   application, the decoder may either treat the encoding as not
   well-formed or just identify the start of the remaining bytes to the
   application.

   A CBOR decoder implementation can be based on a jump table with all
   256 defined values for the initial byte (Table 7). A decoder in a
   constrained implementation can instead use the structure of the
   initial byte and following bytes for more compact code (see
   Appendix C for a rough impression of how this could look).

3.1.  Major Types

   The following lists the major types and the additional information
   and other bytes associated with the type.

   Major type 0: an integer in the range 0..2**64-1 inclusive. The
      value of the encoded item is the argument itself.
      For example, the integer 10 is denoted as the one byte
      0b000_01010 (major type 0, additional information 10). The
      integer 500 would be 0b000_11001 (major type 0, additional
      information 25) followed by the two bytes 0x01f4, which is 500 in
      decimal.

   Major type 1: a negative integer in the range -2**64..-1 inclusive.
      The value of the item is -1 minus the argument. For example, the
      integer -500 would be 0b001_11001 (major type 1, additional
      information 25) followed by the two bytes 0x01f3, which is 499 in
      decimal.

   Major type 2: a byte string. The number of bytes in the string is
      equal to the argument. For example, a byte string whose length is
      5 would have an initial byte of 0b010_00101 (major type 2,
      additional information 5 for the length), followed by 5 bytes of
      binary content. A byte string whose length is 500 would have an
      initial byte of 0b010_11001 (major type 2, additional information
      25 to indicate a two-byte length) followed by the two bytes 0x01f4
      for a length of 500, followed by 500 bytes of binary content.

   Major type 3: a text string (Section 2), encoded as UTF-8
      ([RFC3629]). The number of bytes in the string is equal to the
      argument. A string containing an invalid UTF-8 sequence is
      well-formed but invalid. This type is provided for systems that
      need to interpret or display human-readable text, and allows the
      differentiation between unstructured bytes and text that has a
      specified repertoire and encoding. In contrast to formats such as
      JSON, the Unicode characters in this type are never escaped.
      Thus, a newline character (U+000A) is always represented in a
      string as the byte 0x0a, and never as the bytes 0x5c6e (the
      characters "\" and "n") or as 0x5c7530303061 (the characters "\",
      "u", "0", "0", "0", and "a").

   Major type 4: an array of data items. In other formats, arrays are
      also called lists, sequences, or tuples (a "CBOR sequence" is
      something slightly different, though [RFC8742]). The argument is
      the number of data items in the array. Items in an array do not
      need to all be of the same type. For example, an array that
      contains 10 items of any type would have an initial byte of
      0b100_01010 (major type of 4, additional information of 10 for the
      length) followed by the 10 remaining items.

   Major type 5: a map of pairs of data items. Maps are also called
      tables, dictionaries, hashes, or objects (in JSON). A map
      consists of pairs of data items, each pair consisting of a key
      that is immediately followed by a value. The argument is the
      number of _pairs_ of data items in the map. For example, a map
      that contains 9 pairs would have an initial byte of 0b101_01001
      (major type of 5, additional information of 9 for the number of
      pairs) followed by the 18 remaining items. The first item is the
      first key, the second item is the first value, the third item is
      the second key, and so on. Because items in a map come in pairs,
      their total number is always even: A map that contains an odd
      number of items (no value data present after the last key data
      item) is not well-formed. A map that has duplicate keys may be
      well-formed, but it is not valid, and thus it causes indeterminate
      decoding; see also Section 5.6.

   Major type 6: a tagged data item ("tag") whose tag number, an
      integer in the range 0..2**64-1 inclusive, is the argument and
      whose enclosed data item ("tag content") is the single encoded
      data item that follows the head. See Section 3.4.

   Major type 7: floating-point numbers and simple values, as well as
      the "break" stop code. See Section 3.3.

   These eight major types lead to a simple table showing which of the
   256 possible values for the initial byte of a data item are used
   (Table 7).
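
   The head layout described in Section 3 and the worked byte values in
   the list above can be reproduced with a short sketch. The Python
   below is illustrative only and not part of this specification. The
   function names "encode_head", "encode_int", and "decode_head" are
   assumptions made for this example. Major type 7 (floating-point and
   simple values) and additional information values 28 to 31 are
   deliberately left out.

```python
import struct


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a head (initial byte plus argument bytes) in the shortest form.

    Hypothetical helper for illustration; covers major types 0..6 only.
    """
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch handles major types 0..6 only")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in the range 0..2**64-1")
    initial = major_type << 5  # major type sits in the high-order 3 bits
    if argument < 24:
        # The argument fits directly into the low-order 5 bits.
        return bytes([initial | argument])
    # Additional information 24..27 selects a 1, 2, 4, or 8 byte argument,
    # stored in network byte order (big-endian).
    for additional_info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if argument < 2 ** (8 * struct.calcsize(fmt)):
            return bytes([initial | additional_info]) + struct.pack(fmt, argument)
    raise AssertionError("unreachable: argument was range-checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 (N) or major type 1 (-1 - N)."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)


def decode_head(data: bytes):
    """Return (major_type, argument, head_length) for major types 0..6."""
    if not data:
        raise ValueError("empty input is not a well-formed data item")
    major_type, additional_info = data[0] >> 5, data[0] & 0x1F
    if major_type == 7:
        raise ValueError("major type 7 is outside the scope of this sketch")
    if additional_info < 24:
        return major_type, additional_info, 1
    if additional_info <= 27:
        size = 1 << (additional_info - 24)  # 1, 2, 4, or 8 bytes
        if len(data) < 1 + size:
            raise ValueError("input ends inside the head: not well-formed")
        return major_type, int.from_bytes(data[1:1 + size], "big"), 1 + size
    raise ValueError("additional information 28..31 is not handled here")


if __name__ == "__main__":
    print(encode_int(500).hex())   # integer 500
    print(encode_int(-500).hex())  # integer -500
    print(encode_head(2, 500).hex())  # head of a 500-byte byte string
```

   Note that a generic decoder accepts any of the argument sizes for a
   given value, while this sketch of an encoder always picks the
   shortest one.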

   In major types 6 and 7, many of the possible values are reserved for
   future specification. See Section 9 for more information on these
   values.

   Table 1 summarizes the major types defined by CBOR, ignoring the next
   section for now. The number N in this table stands for the argument,
   mt for the major type.

   +----+-----------------------+---------------------------------+
   | mt | Meaning               | Content                         |
   +====+=======================+=================================+
   | 0  | unsigned integer N    | -                               |
   +----+-----------------------+---------------------------------+
   | 1  | negative integer -1-N | -                               |
   +----+-----------------------+---------------------------------+
   | 2  | byte string           | N bytes                         |
   +----+-----------------------+---------------------------------+
   | 3  | text string           | N bytes (UTF-8 text)            |
   +----+-----------------------+---------------------------------+
   | 4  | array                 | N data items (elements)         |
   +----+-----------------------+---------------------------------+
   | 5  | map                   | 2N data items (key/value pairs) |
   +----+-----------------------+---------------------------------+
   | 6  | tag of number N       | 1 data item                     |
   +----+-----------------------+---------------------------------+
   | 7  | simple/float          | -                               |
   +----+-----------------------+---------------------------------+

       Table 1: Overview of the definite-length use of CBOR major
                 types (mt = major type, N = argument)

3.2.  Indefinite Lengths for Some Major Types

   Four CBOR items (arrays, maps, byte strings, and text strings) can be
   encoded with an indefinite length using additional information value
   31. This is useful if the encoding of the item needs to begin before
   the number of items inside the array or map, or the total length of
   the string, is known. (The ability to start sending a data item
   before all of it is known is often referred to as "streaming" within
   that data item.)
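
   As a rough illustration of why this matters, the following Python
   sketch emits an array whose element count is not known in advance.
   It is an assumption-laden example, not part of this specification:
   the generator name "stream_small_uint_array" is hypothetical, and
   elements are restricted to integers 0..23 so that each one encodes as
   a single byte. The closing byte 0xff is the "break" stop code
   described in Section 3.2.1.

```python
from typing import Iterable, Iterator


def stream_small_uint_array(items: Iterable[int]) -> Iterator[bytes]:
    """Yield the encoding of an indefinite-length array piece by piece.

    Hypothetical helper for illustration: elements must be integers in
    0..23, which encode as one byte each (major type 0).
    """
    # Major type 4 (array) with additional information 31: 0b100_11111.
    yield bytes([0b100_11111])
    for value in items:
        if not 0 <= value < 24:
            raise ValueError("this sketch only handles integers 0..23")
        # Major type 0 with the value in the additional information.
        yield bytes([value])
    # "break" stop code: major type 7, additional information 31.
    yield bytes([0b111_11111])


if __name__ == "__main__":
    # The source may be any iterator; its length is never consulted.
    encoded = b"".join(stream_small_uint_array(iter([1, 2, 3])))
    print(encoded.hex())
```

   Because the head is written before the source is consumed, the
   encoder never needs to buffer the elements or count them, which is
   the property the paragraph above calls "streaming".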
   Indefinite-length arrays and maps are dealt with differently than
   indefinite-length byte strings and text strings.

3.2.1. The "break" Stop Code

   The "break" stop code is encoded with major type 7 and additional
   information value 31 (0b111_11111). It is not itself a data item: it
   is just a syntactic feature to close an indefinite-length item.

   If the "break" stop code appears anywhere where a data item is
   expected, other than directly inside an indefinite-length string,
   array, or map -- for example directly inside a definite-length array
   or map -- the enclosing item is not well-formed.

3.2.2. Indefinite-Length Arrays and Maps

   Indefinite-length arrays and maps are represented using their major
   type with the additional information value of 31, followed by an
   arbitrary-length sequence of zero or more items for an array or
   key/value pairs for a map, followed by the "break" stop code
   (Section 3.2.1). In other words, indefinite-length arrays and maps
   look identical to other arrays and maps except for beginning with the
   additional information value of 31 and ending with the "break" stop
   code.

   If the "break" stop code appears after a key in a map, in place of
   that key's value, the map is not well-formed.

   There is no restriction against nesting indefinite-length array or
   map items. A "break" only terminates a single item, so nested
   indefinite-length items need exactly as many "break" stop codes as
   there are type bytes starting an indefinite-length item.

   For example, assume an encoder wants to represent the abstract array
   [1, [2, 3], [4, 5]].
   The definite-length encoding would be
   0x8301820203820405:

   83        -- Array of length 3
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      82     -- Array of length 2
         04  -- 4
         05  -- 5

   Indefinite-length encoding could be applied independently to each of
   the three arrays encoded in this data item, as required, leading to
   representations such as:

   0x9f018202039f0405ffff
   9F        -- Start indefinite-length array
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      9F     -- Start indefinite-length array
         04  -- 4
         05  -- 5
         FF  -- "break" (inner array)
      FF     -- "break" (outer array)

   0x9f01820203820405ff
   9F        -- Start indefinite-length array
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      82     -- Array of length 2
         04  -- 4
         05  -- 5
      FF     -- "break"

   0x83018202039f0405ff
   83        -- Array of length 3
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      9F     -- Start indefinite-length array
         04  -- 4
         05  -- 5
         FF  -- "break"

   0x83019f0203ff820405
   83        -- Array of length 3
      01     -- 1
      9F     -- Start indefinite-length array
         02  -- 2
         03  -- 3
         FF  -- "break"
      82     -- Array of length 2
         04  -- 4
         05  -- 5

   An example of an indefinite-length map (that happens to have two
   key/value pairs) might be:

   0xbf6346756ef563416d7421ff
   BF           -- Start indefinite-length map
      63        -- First key, UTF-8 string length 3
         46756e -- "Fun"
      F5        -- First value, true
      63        -- Second key, UTF-8 string length 3
         416d74 -- "Amt"
      21        -- Second value, -2
      FF        -- "break"

3.2.3. Indefinite-Length Byte Strings and Text Strings

   Indefinite-length strings are represented by a byte containing the
   major type and additional information value of 31, followed by a
   series of zero or more byte or text strings ("chunks") that have
   definite lengths, followed by the "break" stop code (Section 3.2.1).
   The data item represented by the indefinite-length string is the
   concatenation of the chunks (i.e., the empty byte or text string,
   respectively, if no chunk is present). (Note that zero-length
   chunks, while not particularly useful, are permitted.)

   If any item between the indefinite-length string indicator
   (0b010_11111 or 0b011_11111) and the "break" stop code is not a
   definite-length string item of the same major type, the string is not
   well-formed.

   If any definite-length text string inside an indefinite-length text
   string is invalid, the indefinite-length text string is invalid.
   Note that this implies that the bytes of a single UTF-8 character
   cannot be split up between chunks: a new chunk of a text string can
   only be started at a character boundary.

   For example, assume an encoded data item consisting of the bytes:

   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

   5F              -- Start indefinite-length byte string
      44           -- Byte string of length 4
         aabbccdd  -- Bytes content
      43           -- Byte string of length 3
         eeff99    -- Bytes content
      FF           -- "break"

   After decoding, this results in a single byte string with seven
   bytes: 0xaabbccddeeff99.

3.2.4. Summary of Indefinite-Length Use of Major Types

   Table 2 summarizes the major types defined by CBOR as used for
   indefinite length encoding (with additional information set to 31).
   mt stands for the major type.
   +----+-------------------+----------------------------------+
   | mt | Meaning           | enclosed up to "break" stop code |
   +====+===================+==================================+
   | 0  | (not well-formed) | -                                |
   +----+-------------------+----------------------------------+
   | 1  | (not well-formed) | -                                |
   +----+-------------------+----------------------------------+
   | 2  | byte string       | definite-length byte strings     |
   +----+-------------------+----------------------------------+
   | 3  | text string       | definite-length text strings     |
   +----+-------------------+----------------------------------+
   | 4  | array             | data items (elements)            |
   +----+-------------------+----------------------------------+
   | 5  | map               | data items (key/value pairs)     |
   +----+-------------------+----------------------------------+
   | 6  | (not well-formed) | -                                |
   +----+-------------------+----------------------------------+
   | 7  | "break" stop code | -                                |
   +----+-------------------+----------------------------------+

       Table 2: Overview over the indefinite-length use of CBOR
        major types (mt = major type, additional information =
                                  31)

3.3. Floating-Point Numbers and Values with No Content

   Major type 7 is for two types of data: floating-point numbers and
   "simple values" that do not need any content. Each value of the
   5-bit additional information in the initial byte has its own separate
   meaning, as defined in Table 3. Like the major types for integers,
   items of this major type do not carry content data; all the
   information is in the initial bytes.
   +-------------+---------------------------------------------------+
   | 5-Bit Value | Semantics                                         |
   +=============+===================================================+
   | 0..23       | Simple value (value 0..23)                        |
   +-------------+---------------------------------------------------+
   | 24          | Simple value (value 32..255 in following byte)    |
   +-------------+---------------------------------------------------+
   | 25          | IEEE 754 Half-Precision Float (16 bits follow)    |
   +-------------+---------------------------------------------------+
   | 26          | IEEE 754 Single-Precision Float (32 bits follow)  |
   +-------------+---------------------------------------------------+
   | 27          | IEEE 754 Double-Precision Float (64 bits follow)  |
   +-------------+---------------------------------------------------+
   | 28-30       | Reserved, not well-formed in the present document |
   +-------------+---------------------------------------------------+
   | 31          | "break" stop code for indefinite-length items     |
   |             | (Section 3.2.1)                                   |
   +-------------+---------------------------------------------------+

        Table 3: Values for Additional Information in Major Type 7

   As with all other major types, the 5-bit value 24 signifies a
   single-byte extension: it is followed by an additional byte to
   represent the simple value. (To minimize confusion, only the values
   32 to 255 are used.) This maintains the structure of the initial
   bytes: as for the other major types, the length of these always
   depends on the additional information in the first byte. Table 4
   lists the values assigned and available for simple types.
   +---------+-----------------+
   | Value   | Semantics       |
   +=========+=================+
   | 0..19   | (Unassigned)    |
   +---------+-----------------+
   | 20      | False           |
   +---------+-----------------+
   | 21      | True            |
   +---------+-----------------+
   | 22      | Null            |
   +---------+-----------------+
   | 23      | Undefined value |
   +---------+-----------------+
   | 24..31  | (Reserved)      |
   +---------+-----------------+
   | 32..255 | (Unassigned)    |
   +---------+-----------------+

          Table 4: Simple Values

   An encoder MUST NOT issue two-byte sequences that start with 0xf8
   (major type = 7, additional information = 24) and continue with a
   byte less than 0x20 (32 decimal). Such sequences are not
   well-formed. (This implies that an encoder cannot encode false,
   true, null, or undefined in two-byte sequences; only the one-byte
   variants of these are well-formed.)

   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
   IEEE 754 binary floating-point values [IEEE754]. These
   floating-point values are encoded in the additional bytes of the
   appropriate size. (See Appendix D for some information about 16-bit
   floating-point numbers.)

3.4. Tagging of Items

   In CBOR, a data item can be enclosed by a tag to give it some
   additional semantics, as uniquely identified by a "tag number". The
   tag is major type 6, its argument (Section 3) indicates the tag
   number, and it contains a single enclosed data item, the "tag
   content". (If a tag requires further structure to its content, this
   structure is provided by the enclosed data item.) We use the term
   "tag" for the entire data item consisting of both a tag number and
   the tag content: the tag content is the data item that is being
   tagged.

   For example, assume that a byte string of length 12 is marked with a
   tag of number 2 to indicate it is a positive "bignum"
   (Section 3.4.3).
   The encoded data item would start with a byte
   0b110_00010 (major type 6, additional information 2 for the tag
   number) followed by the encoded tag content: 0b010_01100 (major type
   2, additional information of 12 for the length) followed by the 12
   bytes of the bignum.

   The definition of a tag number describes the additional semantics
   conveyed for tags with this tag number in the extended generic data
   model. These semantics may include equivalence of some tagged data
   items with other data items, including some that can already be
   represented in the basic generic data model. For instance, 0xc24101,
   a bignum the tag content of which is the byte string with the single
   byte 0x01, is equivalent to an integer 1, which could also be encoded
   for instance as 0x01, 0x1801, or 0x190001. The tag definition may
   include the definition of a preferred serialization (Section 4.1)
   that is recommended for generic encoders; this may prefer basic
   generic data model representations over ones that employ a tag.

   The tag definition usually restricts what kinds of nested data item
   or items are valid for such tags. Tag definitions may restrict their
   content to a very specific syntactic structure, as the tags defined
   in this document do, or they may aim at a more semantically defined
   definition of their content, as for instance tags 40 and 1040 do
   [RFC8746]: These accept a number of different ways of representing
   arrays.

   As a matter of convention, many tags do not accept null or undefined
   values as tag content; instead, the expectation is that a null or
   undefined value can be used in place of the entire tag; Section 3.4.2
   provides some further considerations for one specific tag about the
   handling of this convention in application protocols and in mapping
   to platform types.
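   The equivalence noted above between the bignum 0xc24101 and the
   basic integer encodings 0x01, 0x1801, and 0x190001 can be checked
   mechanically. The sketch below is a deliberately small illustration,
   not a generic decoder: the names read_head and decode_integer are
   hypothetical, and only major types 0 and 1 and definite-length
   tag 2/3 bignums (Section 3.4.3) are handled.

```python
from typing import Tuple


def read_head(data: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (major type, argument, next position) for the head at pos."""
    if pos >= len(data):
        raise ValueError("truncated input")
    initial = data[pos]
    major_type, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major_type, info, pos
    if info > 27:
        raise ValueError("additional information 28..31 is not handled here")
    size = 1 << (info - 24)  # 1, 2, 4, or 8 argument bytes
    if pos + size > len(data):
        raise ValueError("truncated input")
    return major_type, int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode_integer(data: bytes) -> int:
    """Decode a basic integer or a tag 2/3 bignum into a Python int."""
    major_type, argument, pos = read_head(data, 0)
    if major_type == 0:
        value = argument
    elif major_type == 1:
        value = -1 - argument
    elif major_type == 6 and argument in (2, 3):
        content_type, length, pos = read_head(data, pos)
        if content_type != 2:
            raise ValueError("bignum tag content must be a byte string")
        if pos + length > len(data):
            raise ValueError("truncated input")
        n = int.from_bytes(data[pos:pos + length], "big")
        pos += length
        value = n if argument == 2 else -1 - n
    else:
        raise ValueError("not an integer or bignum")
    if pos != len(data):
        raise ValueError("trailing bytes after the data item")
    return value


if __name__ == "__main__":
    for encoded in ("c24101", "01", "1801", "190001"):
        print(encoded, "->", decode_integer(bytes.fromhex(encoded)))
```

   All four inputs decode to the same numeric value, which is the
   sense in which the extended generic data model treats them as
   equivalent.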
   Decoders do not need to understand tags of every tag number, and tags
   may be of little value in applications where the implementation
   creating a particular CBOR data item and the implementation decoding
   that stream know the semantic meaning of each item in the data flow.
   Their primary purpose in this specification is to define common data
   types such as dates. A secondary purpose is to provide conversion
   hints when it is foreseen that the CBOR data item needs to be
   translated into a different format, requiring hints about the content
   of items. Understanding the semantics of tags is optional for a
   decoder; it can simply present both the tag number and the tag
   content to the application, without interpreting the additional
   semantics of the tag.

   A tag applies semantics to the data item it encloses. Tags can nest:
   If tag A encloses tag B, which encloses data item C, tag A applies to
   the result of applying tag B on data item C.

   IANA maintains a registry of tag numbers as described in Section 9.2.
   Table 5 provides a list of tag numbers that were defined in
   [RFC7049], with definitions in the rest of this section. Note that
   many other tag numbers have been defined since the publication of
   [RFC7049]; see the registry described at Section 9.2 for the complete
   list.
   +------------+-------------+----------------------------------+
   | Tag Number | Data Item   | Semantics                        |
   +============+=============+==================================+
   | 0          | text string | Standard date/time string; see   |
   |            |             | Section 3.4.1                    |
   +------------+-------------+----------------------------------+
   | 1          | integer or  | Epoch-based date/time; see       |
   |            | float       | Section 3.4.2                    |
   +------------+-------------+----------------------------------+
   | 2          | byte string | Positive bignum; see             |
   |            |             | Section 3.4.3                    |
   +------------+-------------+----------------------------------+
   | 3          | byte string | Negative bignum; see             |
   |            |             | Section 3.4.3                    |
   +------------+-------------+----------------------------------+
   | 4          | array       | Decimal fraction; see            |
   |            |             | Section 3.4.4                    |
   +------------+-------------+----------------------------------+
   | 5          | array       | Bigfloat; see Section 3.4.4      |
   +------------+-------------+----------------------------------+
   | 21         | (any)       | Expected conversion to base64url |
   |            |             | encoding; see Section 3.4.5.2    |
   +------------+-------------+----------------------------------+
   | 22         | (any)       | Expected conversion to base64    |
   |            |             | encoding; see Section 3.4.5.2    |
   +------------+-------------+----------------------------------+
   | 23         | (any)       | Expected conversion to base16    |
   |            |             | encoding; see Section 3.4.5.2    |
   +------------+-------------+----------------------------------+
   | 24         | byte string | Encoded CBOR data item; see      |
   |            |             | Section 3.4.5.1                  |
   +------------+-------------+----------------------------------+
   | 32         | text string | URI; see Section 3.4.5.3         |
   +------------+-------------+----------------------------------+
   | 33         | text string | base64url; see Section 3.4.5.3   |
   +------------+-------------+----------------------------------+
   | 34         | text string | base64; see Section 3.4.5.3      |
   +------------+-------------+----------------------------------+
   | 35         | text string | Regular expression; see          |
   |            |             | Section 3.4.5.3                  |
   +------------+-------------+----------------------------------+
   | 36         | text string | MIME message; see                |
   |            |             | Section 3.4.5.3                  |
   +------------+-------------+----------------------------------+
   | 55799      | (any)       | Self-described CBOR; see         |
   |            |             | Section 3.4.6                    |
   +------------+-------------+----------------------------------+

              Table 5: Tag numbers defined in RFC 7049

   Conceptually, tags are interpreted in the generic data model, not at
   (de-)serialization time. A small number of tags (specifically, tag
   number 25 and tag number 29) have been registered with semantics that
   may require processing at (de-)serialization time: The decoder needs
   to be aware and the encoder needs to be in control of the exact
   sequence in which data items are encoded into the CBOR data stream.
   This means these tags cannot be implemented on top of every generic
   CBOR encoder/decoder (which might not reflect the serialization order
   for entries in a map at the data model level and vice versa); their
   implementation therefore typically needs to be integrated into the
   generic encoder/decoder. The definition of new tags with this
   property is NOT RECOMMENDED.

   Protocols using tag numbers 0 and 1 extend the generic data model
   (Section 2) with data items representing points in time; tag numbers
   2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5,
   with floating-point values of arbitrary size and precision.

3.4.1. Standard Date/Time String

   Tag number 0 contains a text string in the standard format described
   by the "date-time" production in [RFC3339], as refined by Section 3.3
   of [RFC4287], representing the point in time described there. A
   nested item of another type or that doesn't match the [RFC4287]
   format is invalid.

3.4.2. Epoch-based Date/Time

   Tag number 1 contains a numerical value counting the number of
   seconds from 1970-01-01T00:00Z in UTC time to the represented point
   in civil time.

   The tag content MUST be an unsigned or negative integer (major types
   0 and 1), or a floating-point number (major type 7 with additional
   information 25, 26, or 27). Other contained types are invalid.

   Non-negative values (major type 0 and non-negative floating-point
   numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
   are interpreted according to POSIX [TIME_T]. (POSIX time is also
   known as UNIX Epoch time. Note that leap seconds are handled
   specially by POSIX time and this results in a 1 second discontinuity
   several times per decade.) Note that applications that require the
   expression of times beyond early 2106 cannot leave out support of
   64-bit integers for the tag content.

   Negative values (major type 1 and negative floating-point numbers)
   are interpreted as determined by the application requirements as
   there is no universal standard for UTC count-of-seconds time before
   1970-01-01T00:00Z (this is particularly true for points in time that
   precede discontinuities in national calendars). The same applies to
   non-finite values.

   To indicate fractional seconds, floating-point values can be used
   within tag number 1 instead of integer values. Note that this
   generally requires binary64 support, as binary16 and binary32 provide
   non-zero fractions of seconds only for a short period of time around
   early 1970. An application that requires tag number 1 support may
   restrict the tag content to be an integer (or a floating-point value)
   only.

   Note that platform types for date/time may include null or undefined
   values, which may also be desirable at an application protocol level.
   While emitting tag number 1 values with non-finite tag content values
   (e.g., with NaN for undefined date/time values or with Infinity for
   an expiry date that is not set) may seem an obvious way to handle
   this, using untagged null or undefined is often a better solution.
   Application protocol designers are encouraged to consider these cases
   and include clear guidelines for handling them.

3.4.3. Bignums

   Protocols using tag numbers 2 and 3 extend the generic data model
   (Section 2) with "bignums" representing arbitrarily sized integers.
   In the basic generic data model, bignum values are not equal to
   integers from the same model, but the extended generic data model
   created by this tag definition defines equivalence based on numeric
   value, and preferred serialization (Section 4.1) never makes use of
   bignums that also can be expressed as basic integers (see below).

   Bignums are encoded as a byte string data item, which is interpreted
   as an unsigned integer n in network byte order. Contained items of
   other types are invalid. For tag number 2, the value of the bignum
   is n. For tag number 3, the value of the bignum is -1 - n. The
   preferred serialization of the byte string is to leave out any
   leading zeroes (note that this means the preferred serialization for
   n = 0 is the empty byte string, but see below). Decoders that
   understand these tags MUST be able to decode bignums that do have
   leading zeroes. The preferred serialization of an integer that can
   be represented using major type 0 or 1 is to encode it this way
   instead of as a bignum (which means that the empty string never
   occurs in a bignum when using preferred serialization).
   Note that
   this means the non-preferred choice of a bignum representation
   instead of a basic integer for encoding a number is not intended to
   have application semantics (just as the choice of a longer basic
   integer representation than needed, such as 0x1800 for 0x00 does
   not).

   For example, the number 18446744073709551616 (2**64) is represented
   as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001
   (major type 2, length 9), followed by 0x010000000000000000 (one byte
   0x01 and eight bytes 0x00). In hexadecimal:

   C2                        -- Tag 2
      49                     -- Byte string of length 9
         010000000000000000  -- Bytes content

3.4.4. Decimal Fractions and Bigfloats

   Protocols using tag number 4 extend the generic data model with data
   items representing arbitrary-length decimal fractions of the form
   m*(10**e). Protocols using tag number 5 extend the generic data
   model with data items representing arbitrary-length binary fractions
   of the form m*(2**e). As with bignums, values of different types are
   not equal in the generic data model.

   Decimal fractions combine an integer mantissa with a base-10 scaling
   factor. They are most useful if an application needs the exact
   representation of a decimal fraction such as 1.1 because there is no
   exact representation for many decimal fractions in binary
   floating-point representations.

   "Bigfloats" combine an integer mantissa with a base-2 scaling factor.
   They are binary floating-point values that can exceed the range or
   the precision of the three IEEE 754 formats supported by CBOR
   (Section 3.3). Bigfloats may also be used by constrained
   applications that need some basic binary floating-point capability
   without the need for supporting IEEE 754.

   A decimal fraction or a bigfloat is represented as a tagged array
   that contains exactly two integer numbers: an exponent e and a
   mantissa m.
   Decimal fractions (tag number 4) use base-10 exponents;
   the value of a decimal fraction data item is m*(10**e). Bigfloats
   (tag number 5) use base-2 exponents; the value of a bigfloat data
   item is m*(2**e). The exponent e MUST be represented in an integer
   of major type 0 or 1, while the mantissa can also be a bignum
   (Section 3.4.3). Contained items with other structures are invalid.

   An example of a decimal fraction is that the number 273.15 could be
   represented as 0b110_00100 (major type of 6 for the tag, additional
   information of 4 for the number of tag), followed by 0b100_00010
   (major type of 4 for the array, additional information of 2 for the
   length of the array), followed by 0b001_00001 (major type of 1 for
   the first integer, additional information of 1 for the value of -2),
   followed by 0b000_11001 (major type of 0 for the second integer,
   additional information of 25 for a two-byte value), followed by
   0b0110101010110011 (27315 in two bytes). In hexadecimal:

   C4             -- Tag 4
      82          -- Array of length 2
         21       -- -2
         19 6ab3  -- 27315

   An example of a bigfloat is that the number 1.5 could be represented
   as 0b110_00101 (major type of 6 for the tag, additional information
   of 5 for the number of tag), followed by 0b100_00010 (major type of 4
   for the array, additional information of 2 for the length of the
   array), followed by 0b001_00000 (major type of 1 for the first
   integer, additional information of 0 for the value of -1), followed
   by 0b000_00011 (major type of 0 for the second integer, additional
   information of 3 for the value of 3).
   In hexadecimal:

   C5             -- Tag 5
      82          -- Array of length 2
         20       -- -1
         03       -- 3

   Decimal fractions and bigfloats provide no representation of
   Infinity, -Infinity, or NaN; if these are needed in place of a
   decimal fraction or bigfloat, the IEEE 754 half-precision
   representations from Section 3.3 can be used.

3.4.5. Content Hints

   The tags in this section are for content hints that might be used by
   generic CBOR processors. These content hints do not extend the
   generic data model.

3.4.5.1. Encoded CBOR Data Item

   Sometimes it is beneficial to carry an embedded CBOR data item that
   is not meant to be decoded immediately at the time the enclosing data
   item is being decoded. Tag number 24 (CBOR data item) can be used to
   tag the embedded byte string as a data item encoded in CBOR format.
   Contained items that aren't byte strings are invalid. A contained
   byte string is valid if it encodes a well-formed CBOR item; validity
   checking of the decoded CBOR item is not required for tag validity
   (but could be offered by a generic decoder as a special option).

3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters

   Tag numbers 21 to 23 indicate that a byte string might require a
   specific encoding when interoperating with a text-based
   representation. These tags are useful when an encoder knows that the
   byte string data it is writing is likely to be later converted to a
   particular JSON-based usage. That usage specifies that some strings
   are encoded as base64, base64url, and so on. The encoder uses byte
   strings instead of doing the encoding itself to reduce the message
   size, to reduce the code size of the encoder, or both. The encoder
   does not know whether or not the converter will be generic, and
   therefore wants to say what it believes is the proper way to convert
   binary strings to JSON.
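   As a rough illustration of what a CBOR-to-JSON converter might do
   with these hints, the sketch below maps a tag number and a byte
   string to the text a converter could emit, following the RFC 4648
   conventions this section describes (base64url without padding for
   tag number 21, padded base64 for 22, uppercase base16 for 23). The
   function name convert_hinted_bytes is hypothetical, and the sketch
   assumes the tag content is a single byte string rather than a
   nested structure.

```python
import base64


def convert_hinted_bytes(tag_number: int, content: bytes) -> str:
    """Return the text form suggested by tag number 21, 22, or 23."""
    if tag_number == 21:
        # base64url; trailing "=" padding characters are removed
        return base64.urlsafe_b64encode(content).rstrip(b"=").decode("ascii")
    if tag_number == 22:
        # classical base64, with padding
        return base64.b64encode(content).decode("ascii")
    if tag_number == 23:
        # base16 (hex) with uppercase alphabetics
        return base64.b16encode(content).decode("ascii")
    raise ValueError("tag number is not an expected-conversion hint")


if __name__ == "__main__":
    sample = bytes.fromhex("aabbccddeeff99")
    for tag in (21, 22, 23):
        print(tag, convert_hinted_bytes(tag, sample))
```

   Carrying the raw bytes and leaving this step to the converter is
   what keeps the CBOR message smaller than its text-encoded
   equivalent.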
   The data item tagged can be a byte string or any other data item. In
   the latter case, the tag applies to all of the byte string data items
   contained in the data item, except for those contained in a nested
   data item tagged with an expected conversion.

   These three tag numbers suggest conversions to three of the base data
   encodings defined in [RFC4648]. Tag number 21 suggests conversion to
   base64url encoding (Section 5 of RFC 4648), where padding is not used
   (see Section 3.2 of RFC 4648); that is, all trailing equals signs
   ("=") are removed from the encoded string. Tag number 22 suggests
   conversion to classical base64 encoding (Section 4 of RFC 4648), with
   padding as defined in RFC 4648. For both base64url and base64,
   padding bits are set to zero (see Section 3.5 of RFC 4648), and
   encoding is performed without the inclusion of any line breaks,
   whitespace, or other additional characters. Tag number 23 suggests
   conversion to base16 (hex) encoding, with uppercase alphabetics (see
   Section 8 of RFC 4648). Note that, for all three tag numbers, the
   encoding of the empty byte string is the empty text string.

3.4.5.3. Encoded Text

   Some text strings hold data that have formats widely used on the
   Internet, and sometimes those formats can be validated and presented
   to the application in appropriate form by the decoder. There are
   tags for some of these formats. As with tag numbers 21 to 23, if
   these tags are applied to an item other than a text string, they
   apply to all text string data items it contains.

   * Tag number 32 is for URIs, as defined in [RFC3986]. If the text
     string doesn't match the "URI-reference" production, the string is
     invalid.

   * Tag numbers 33 and 34 are for base64url- and base64-encoded text
     strings, respectively, as defined in [RFC4648].
     If any of:

     - the encoded text string contains non-alphabet characters or
       only 1 character in the last block of 4, or

     - the padding bits in a 2- or 3-character block are not 0, or

     - the base64 encoding has the wrong number of padding characters,
       or

     - the base64url encoding has padding characters,

     the string is invalid.

   * Tag number 35 is for regular expressions that are roughly in Perl
     Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
     version of the JavaScript regular expression syntax [ECMA262].
     (Note that more specific identification may be necessary if the
     actual version of the specification underlying the regular
     expression, or more than just the text of the regular expression
     itself, need to be conveyed.) Any contained string value is
     valid.

   * Tag number 36 is for MIME messages (including all headers), as
     defined in [RFC2045]. A text string that isn't a valid MIME
     message is invalid. (For this tag, validity checking may be
     particularly onerous for a generic decoder and might therefore not
     be offered. Note that many MIME messages are general binary data
     and can therefore not be represented in a text string;
     [IANA.cbor-tags] lists a registration for tag number 257 that is
     similar to tag number 36 but uses a byte string as its tag
     content.)

   Note that tag numbers 33 and 34 differ from 21 and 22 in that the
   data is transported in base-encoded form for the former and in raw
   byte string form for the latter.

3.4.6. Self-Described CBOR

   In many applications, it will be clear from the context that CBOR is
   being employed for encoding a data item. For instance, a specific
   protocol might specify the use of CBOR, or a media type is indicated
   that specifies its use.
   However, there may be applications where
   such context information is not available, such as when CBOR data is
   stored in a file that does not have disambiguating metadata. Here,
   it may help to have some distinguishing characteristics for the data
   itself.

   Tag number 55799 is defined for this purpose. It does not impart any
   special semantics on the data item that it encloses; that is, the
   semantics of the tag content enclosed in tag number 55799 is exactly
   identical to the semantics of the tag content itself.

   The serialization of this tag's head is 0xd9d9f7, which does not
   appear to be in use as a distinguishing mark for any frequently used
   file types. In particular, 0xd9d9f7 is not a valid start of a
   Unicode text in any Unicode encoding if it is followed by a valid
   CBOR data item.

   For instance, a decoder might be able to decode both CBOR and JSON.
   Such a decoder would need to mechanically distinguish the two
   formats. An easy way for an encoder to help the decoder would be to
   tag the entire CBOR item with tag number 55799, the serialization of
   which will never be found at the beginning of a JSON text.

4. Serialization Considerations

4.1. Preferred Serialization

   For some values at the data model level, CBOR provides multiple
   serializations. For many applications, it is desirable that an
   encoder always chooses a preferred serialization (preferred
   encoding); however, the present specification does not put the burden
   of enforcing this preference on either encoder or decoder.

   Some constrained decoders may be limited in their ability to decode
   non-preferred serializations: For example, if only integers below
   1_000_000_000 (one billion) are expected in an application, the
   decoder may leave out the code that would be needed to decode 64-bit
   arguments in integers.
An encoder that always uses preferred 1297 serialization ("preferred encoder") interoperates with this decoder 1298 for the numbers that can occur in this application. More generally 1299 speaking, it therefore can be said that a preferred encoder is more 1300 universally interoperable (and also less wasteful) than one that, 1301 say, always uses 64-bit integers. 1303 Similarly, a constrained encoder may be limited in the variety of 1304 representation variants it supports in such a way that it does not 1305 emit preferred serializations ("variant encoder"): Say, it could be 1306 designed to always use the 32-bit variant for an integer that it 1307 encodes even if a short representation is available (again, assuming 1308 that there is no application need for integers that can only be 1309 represented with the 64-bit variant). A decoder that does not rely 1310 on only ever receiving preferred serializations ("variation-tolerant 1311 decoder") can therefore be said to be more universally interoperable (it 1312 might very well optimize for the case of receiving preferred 1313 serializations, though). Full implementations of CBOR decoders are 1314 by definition variation-tolerant; the distinction is only relevant if 1315 a constrained implementation of a CBOR decoder meets a variant 1316 encoder. 1318 The preferred serialization always uses the shortest form of 1319 representing the argument (Section 3); it also uses the shortest 1320 floating-point encoding that preserves the value being encoded. 1322 The preferred serialization for a floating-point value is the 1323 shortest floating-point encoding that preserves its value, e.g., 1324 0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5. 1325 For NaN values, a shorter encoding is preferred if zero-padding the 1326 shorter significand towards the right reconstitutes the original NaN 1327 value (for many applications, the single NaN encoding 0xf97e00 will 1328 suffice).
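As an illustration only, the selection of the shortest value-preserving floating-point encoding might be sketched as follows. The function name encode_float_preferred is hypothetical and not part of this specification, and the sketch assumes an application that does not need NaN payloads, so every NaN is emitted as 0xf97e00.

```python
import math
import struct


def encode_float_preferred(value: float) -> bytes:
    """Return the shortest CBOR floating-point encoding that preserves value.

    Hypothetical helper for illustration; assumes NaN payloads are not needed.
    """
    if math.isnan(value):
        # Assumption: a single NaN representation suffices for the application.
        return bytes.fromhex("f97e00")
    # Try half precision (0xf9), then single precision (0xfa).
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # finite value is too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    # Fall back to double precision (0xfb), which always preserves the value.
    return b"\xfb" + struct.pack(">d", value)


print(encode_float_preferred(5.5).hex())     # f94580
print(encode_float_preferred(5555.5).hex())  # fa45ad9c00
```

The sketch tests the narrowest format first rather than narrowing from 64 bits downwards; both approaches should select the same width, since a value that survives a round trip through a narrower format is preserved by it.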
1330 Definite length encoding is preferred whenever the length is known at 1331 the time the serialization of the item starts. 1333 4.2. Deterministically Encoded CBOR 1335 Some protocols may want encoders to only emit CBOR in a particular 1336 deterministic format; those protocols might also have the decoders 1337 check that their input is in that deterministic format. Those 1338 protocols are free to define what they mean by a "deterministic 1339 format" and what encoders and decoders are expected to do. This 1340 section defines a set of restrictions that can serve as the base of 1341 such a deterministic format. 1343 4.2.1. Core Deterministic Encoding Requirements 1345 A CBOR encoding satisfies the "core deterministic encoding 1346 requirements" if it satisfies the following restrictions: 1348 * Preferred serialization MUST be used. In particular, this means 1349 that arguments (see Section 3) for integers, lengths in major 1350 types 2 through 5, and tags MUST be as short as possible, for 1351 instance: 1353 - 0 to 23 and -1 to -24 MUST be expressed in the same byte as the 1354 major type; 1356 - 24 to 255 and -25 to -256 MUST be expressed only with an 1357 additional uint8_t; 1359 - 256 to 65535 and -257 to -65536 MUST be expressed only with an 1360 additional uint16_t; 1362 - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed 1363 only with an additional uint32_t. 1365 Floating-point values also MUST use the shortest form that 1366 preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5 1367 as 0xfa49742408. (One implementation of this is to have all 1368 floats start as a 64-bit float, then do a test conversion to a 1369 32-bit float; if the result is the same numeric value, use the 1370 shorter form and repeat the process with a test conversion to a 1371 16-bit float. This also works to select 16-bit float for positive 1372 and negative Infinity as well.) 1374 * Indefinite-length items MUST NOT appear. 
They can be encoded as 1375 definite-length items instead. 1377 * The keys in every map MUST be sorted in the bytewise lexicographic 1378 order of their deterministic encodings. For example, the 1379 following keys are sorted correctly: 1381 1. 10, encoded as 0x0a. 1383 2. 100, encoded as 0x1864. 1385 3. -1, encoded as 0x20. 1387 4. "z", encoded as 0x617a. 1389 5. "aa", encoded as 0x626161. 1391 6. [100], encoded as 0x811864. 1393 7. [-1], encoded as 0x8120. 1395 8. false, encoded as 0xf4. 1397 4.2.2. Additional Deterministic Encoding Considerations 1399 CBOR tags present additional considerations for deterministic 1400 encoding. If a CBOR-based protocol were to provide the same 1401 semantics for the presence and absence of a specific tag (e.g., by 1402 allowing both tag 1 data items and raw numbers in a date/time 1403 position, treating the latter as if they were tagged), the 1404 deterministic format would not allow the presence of the tag, based 1405 on the "shortest form" principle. For example, a protocol might give 1406 encoders the choice of representing a URL as either a text string or, 1407 using Section 3.4.5.3, tag number 32 containing a text string. This 1408 protocol's deterministic encoding needs to either require that the 1409 tag is present or require that it is absent, not allow either one. 1411 In a protocol that does require tags in certain places to obtain 1412 specific semantics, the tag needs to appear in the deterministic 1413 format as well. Deterministic encoding considerations also apply to 1414 the content of tags. 1416 If a protocol includes a field that can express integers with an 1417 absolute value of 2^64 or larger using tag numbers 2 or 3 1418 (Section 3.4.3), the protocol's deterministic encoding needs to 1419 specify whether smaller integers are also expressed using these tags 1420 or using major types 0 and 1. Preferred serialization uses the 1421 latter choice, which is therefore recommended. 
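A minimal sketch of how an encoder might follow these integer rules is given below: the shortest argument is chosen for values that fit major types 0 and 1, and tag numbers 2 and 3 are used only for values outside that range. The helper names encode_head and encode_integer are hypothetical and serve illustration only.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head using the shortest possible argument form."""
    if argument < 24:
        # The argument fits in the same byte as the major type.
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            head = bytes([(major_type << 5) | additional_info])
            return head + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_integer(value: int) -> bytes:
    """Encode an integer, preferring major types 0 and 1 over bignum tags."""
    # A negative integer is carried as the argument n = -1 - value.
    major_type, n = (0, value) if value >= 0 else (1, -1 - value)
    if n < 1 << 64:
        return encode_head(major_type, n)
    # Outside the 64-bit range: tag 2 or 3 on a byte string with no leading zeroes.
    tag_number = 2 if value >= 0 else 3
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return encode_head(6, tag_number) + encode_head(2, len(content)) + content


print(encode_integer(100).hex())      # 1864
print(encode_integer(2 ** 64).hex())  # c249010000000000000000
```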
1423 Protocols that include floating-point values, whether represented 1424 using basic floating-point values (Section 3.3) or using tags (or 1425 both), may need to define extra requirements on their deterministic 1426 encodings, such as: 1428 * Although IEEE floating-point values can represent both positive 1429 and negative zero as distinct values, the application might not 1430 distinguish these and might decide to represent all zero values 1431 with a positive sign, disallowing negative zero. (The application 1432 may also want to restrict the precision of floating point values 1433 in such a way that there is never a need to represent 64-bit -- or 1434 even 32-bit -- floating-point values.) 1436 * If a protocol includes a field that can express floating-point 1437 values, with a specific data model that declares integer and 1438 floating-point values to be interchangeable, the protocol's 1439 deterministic encoding needs to specify whether the integer 1.0 is 1440 encoded as 0x01, 0xf93c00, 0xfa3f800000, or 0xfb3ff0000000000000. 1441 Example rules for this are: 1443 1. Encode integral values that fit in 64 bits as values from 1444 major types 0 and 1, and other values as the preferred 1445 (smallest of 16-, 32-, or 64-bit) floating-point 1446 representation that accurately represents the value, 1448 2. Encode all values as the preferred floating-point 1449 representation that accurately represents the value, even for 1450 integral values, or 1452 3. Encode all values as 64-bit floating-point representations. 1454 Rule 1 straddles the boundaries between integers and floating- 1455 point values, and Rule 3 does not use preferred serialization, so 1456 Rule 2 may be a good choice in many cases. 1458 * If NaN is an allowed value and there is no intent to support NaN 1459 payloads or signaling NaNs, the protocol needs to pick a single 1460 representation, typically 0xf97e00. 
If that simple choice is not 1461 possible, specific attention will be needed for NaN handling. 1463 * Subnormal numbers (nonzero numbers with the lowest possible 1464 exponent of a given IEEE 754 number format) may be flushed to zero 1465 outputs or be treated as zero inputs in some floating-point 1466 implementations. A protocol's deterministic encoding may want to 1467 specifically accommodate such implementations while creating an 1468 onus on other implementations, by excluding subnormal numbers from 1469 interchange, interchanging zero instead. 1471 * The same number can be represented by different decimal fractions, 1472 by different bigfloats, and by different forms under other tags 1473 that may be defined to express numeric values. Depending on the 1474 implementation, it may not always be practical to determine 1475 whether any of these forms (or forms in the basic generic data 1476 model) are equivalent. An application protocol that presents 1477 choices of this kind for the representation format of numbers 1478 needs to be explicit in how the formats are to be chosen for 1479 deterministic encoding. 1481 4.2.3. Length-first Map Key Ordering 1483 The core deterministic encoding requirements (Section 4.2.1) sort map 1484 keys in a different order from the one suggested by Section 3.9 of 1485 [RFC7049] (called "Canonical CBOR" there). Protocols that need to be 1486 compatible with [RFC7049]'s order can instead be specified in terms 1487 of this specification's "length-first core deterministic encoding 1488 requirements": 1490 A CBOR encoding satisfies the "length-first core deterministic 1491 encoding requirements" if it satisfies the core deterministic 1492 encoding requirements except that the keys in every map MUST be 1493 sorted such that: 1495 1. If two keys have different lengths, the shorter one sorts 1496 earlier; 1498 2. If two keys have the same length, the one with the lower value in 1499 (byte-wise) lexical order sorts earlier. 
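The two key orderings can be compared with a short sketch. It operates on already-encoded keys; the function names bytewise_order and length_first_order are hypothetical, and the sample keys are the ones used as examples in this section.

```python
def bytewise_order(encoded_keys):
    """Sort per the core requirements: bytewise lexicographic order."""
    return sorted(encoded_keys)


def length_first_order(encoded_keys):
    """Sort per the length-first rule: shorter first, then bytewise."""
    return sorted(encoded_keys, key=lambda key: (len(key), key))


# Encodings of 10, 100, -1, "z", "aa", [100], [-1], and false.
sample_keys = [
    bytes.fromhex(h)
    for h in ("0a", "1864", "20", "617a", "626161", "811864", "8120", "f4")
]

print([k.hex() for k in bytewise_order(sample_keys)])
print([k.hex() for k in length_first_order(sample_keys)])
```

Both functions compare the encoded bytes, not the decoded values, which is why a decoder that checks the ordering needs access to the encoded form of each key.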
1501 For example, under the length-first core deterministic encoding 1502 requirements, the following keys are sorted correctly: 1504 1. 10, encoded as 0x0a. 1506 2. -1, encoded as 0x20. 1508 3. false, encoded as 0xf4. 1510 4. 100, encoded as 0x1864. 1512 5. "z", encoded as 0x617a. 1514 6. [-1], encoded as 0x8120. 1516 7. "aa", encoded as 0x626161. 1518 8. [100], encoded as 0x811864. 1520 (Although [RFC7049] used the term "Canonical CBOR" for its form of 1521 requirements on deterministic encoding, this document avoids this 1522 term because "canonicalization" is often associated with specific 1523 uses of deterministic encoding only. The terms are essentially 1524 interchangeable, however, and the set of core requirements in this 1525 document could also be called "Canonical CBOR", while the length- 1526 first-ordered version of that could be called "Old Canonical CBOR".) 1528 5. Creating CBOR-Based Protocols 1530 Data formats such as CBOR are often used in environments where there 1531 is no format negotiation. A specific design goal of CBOR is to not 1532 need any included or assumed schema: a decoder can take a CBOR item 1533 and decode it with no other knowledge. 1535 Of course, in real-world implementations, the encoder and the decoder 1536 will have a shared view of what should be in a CBOR data item. For 1537 example, an agreed-to format might be "the item is an array whose 1538 first value is a UTF-8 string, second value is an integer, and 1539 subsequent values are zero or more floating-point numbers" or "the 1540 item is a map that has byte strings for keys and contains at least 1541 one pair whose key is 0xab01". 1543 CBOR-based protocols MUST specify how their decoders handle invalid 1544 and other unexpected data. CBOR-based protocols MAY specify that 1545 they treat arbitrary valid data as unexpected. Encoders for CBOR- 1546 based protocols MUST produce only valid items, that is, the protocol 1547 cannot be designed to make use of invalid items. 
An encoder can be 1548 capable of encoding as many or as few types of values as is required 1549 by the protocol in which it is used; a decoder can be capable of 1550 understanding as many or as few types of values as is required by the 1551 protocols in which it is used. This lack of restrictions allows CBOR 1552 to be used in extremely constrained environments. 1554 The rest of this section discusses some considerations in creating 1555 CBOR-based protocols. With few exceptions, it is advisory only and 1556 explicitly excludes any language from BCP 14 other than words that 1557 could be interpreted as "MAY" in the sense of BCP 14. The exceptions 1558 aim at facilitating interoperability of CBOR-based protocols while 1559 making use of a wide variety of both generic and application-specific 1560 encoders and decoders. 1562 5.1. CBOR in Streaming Applications 1564 In a streaming application, a data stream may be composed of a 1565 sequence of CBOR data items concatenated back-to-back. In such an 1566 environment, the decoder immediately begins decoding a new data item 1567 if data is found after the end of a previous data item. 1569 Not all of the bytes making up a data item may be immediately 1570 available to the decoder; some decoders will buffer additional data 1571 until a complete data item can be presented to the application. 1572 Other decoders can present partial information about a top-level data 1573 item to an application, such as the nested data items that could 1574 already be decoded, or even parts of a byte string that hasn't 1575 completely arrived yet. 1577 Note that some applications and protocols will not want to use 1578 indefinite-length encoding. Using indefinite-length encoding allows 1579 an encoder to not need to marshal all the data for counting, but it 1580 requires a decoder to allocate increasing amounts of memory while 1581 waiting for the end of the item. This might be fine for some 1582 applications but not others. 1584 5.2. 
Generic Encoders and Decoders 1586 A generic CBOR decoder can decode all well-formed CBOR data and 1587 present them to an application. See Appendix C. 1589 Even though CBOR attempts to minimize these cases, not all well- 1590 formed CBOR data is valid: for example, the encoded text string 1591 "0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR 1592 item. Also, specific tags may make semantic constraints that may be 1593 violated, such as a bignum tag enclosing another tag, or an instance 1594 of tag number 0 containing a byte string, or containing a text string 1595 with contents that do not match [RFC3339]'s "date-time" production. 1596 There is no requirement that generic encoders and decoders make 1597 unnatural choices for their application interface to enable the 1598 processing of invalid data. Generic encoders and decoders are 1599 expected to forward simple values and tags even if their specific 1600 codepoints are not registered at the time the encoder/decoder is 1601 written (Section 5.4). 1603 Generic decoders provide ways to present well-formed CBOR values, 1604 both valid and invalid, to an application. The diagnostic notation 1605 (Section 8) may be used to present well-formed CBOR values to humans. 1607 Generic encoders provide an application interface that allows the 1608 application to specify any well-formed value, including simple values 1609 and tags unknown to the encoder. 1611 5.3. Validity of Items 1613 A well-formed but invalid CBOR data item presents a problem with 1614 interpreting the data encoded in it in the CBOR data model. A CBOR- 1615 based protocol could be specified in several layers, in which the 1616 lower layers don't process the semantics of some of the CBOR data 1617 they forward. These layers can't notice any validity errors in data 1618 they don't process and MUST forward that data as-is. The first layer 1619 that does process the semantics of an invalid CBOR item MUST take one 1620 of two choices: 1622 1. 
Replace the problematic item with an error marker and continue 1623 with the next item, or 1625 2. Issue an error and stop processing altogether. 1627 A CBOR-based protocol MUST specify which of these options its 1628 decoders take, for each kind of invalid item they might encounter. 1630 Such problems might occur at the basic validity level of CBOR or in 1631 the context of tags (tag validity). 1633 5.3.1. Basic validity 1635 Two kinds of validity errors can occur in the basic generic data 1636 model: 1638 Duplicate keys in a map: Generic decoders (Section 5.2) make data 1639 available to applications using the native CBOR data model. That 1640 data model includes maps (key-value mappings with unique keys), 1641 not multimaps (key-value mappings where multiple entries can have 1642 the same key). Thus, a generic decoder that gets a CBOR map item 1643 that has duplicate keys will decode to a map with only one 1644 instance of that key, or it might stop processing altogether. On 1645 the other hand, a "streaming decoder" may not even be able to 1646 notice (Section 5.6). 1648 Invalid UTF-8 string: A decoder might or might not want to verify 1649 that the sequence of bytes in a UTF-8 string (major type 3) is 1650 actually valid UTF-8 and react appropriately. 1652 5.3.2. Tag validity 1654 Two additional kinds of validity errors are introduced by adding tags 1655 to the basic generic data model: 1657 Inadmissible type for tag content: Tag numbers (Section 3.4) specify 1658 what type of data item is supposed to be used as their tag 1659 content; for example, the tag numbers for positive or negative 1660 bignums are supposed to be put on byte strings. A decoder that 1661 decodes the tagged data item into a native representation (a 1662 native big integer in this example) is expected to check the type 1663 of the data item being tagged. 
Even decoders that don't have such 1664 native representations available in their environment may perform 1665 the check on those tags known to them and react appropriately. 1667 Inadmissible value for tag content: The type of data item may be 1668 admissible for a tag's content, but the specific value may not be; 1669 e.g., a value of "yesterday" is not acceptable for the content of 1670 tag 0, even though it properly is a text string. A decoder that 1671 normally ingests such tags into equivalent platform types might 1672 present this tag to the application in a similar way to how it 1673 would present a tag with an unknown tag number (Section 5.4). 1675 5.4. Validity and Evolution 1677 A decoder with validity checking will expend the effort to reliably 1678 detect data items with validity errors. For example, such a decoder 1679 needs to have an API that reports an error (and does not return data) 1680 for a CBOR data item that contains any of the validity errors listed 1681 in the previous subsection. 1683 The set of tags defined in the tag registry (Section 9.2), as well as 1684 the set of simple values defined in the simple values registry 1685 (Section 9.1), can grow at any time beyond the set understood by a 1686 generic decoder. A validity-checking decoder can do one of two 1687 things when it encounters such a case that it does not recognize: 1689 * It can report an error (and not return data). Note that this 1690 error is not a validity error per se. This kind of error is more 1691 likely to be raised by a decoder that would be performing validity 1692 checking if this were a known case. 1694 * It can emit the unknown item (type, value, and, for tags, the 1695 decoded tagged data item) to the application calling the decoder, 1696 with an indication that the decoder did not recognize that tag 1697 number or simple value. 
1699 The latter approach, which is also appropriate for decoders that do 1700 not support validity checking, provides forward compatibility with 1701 newly registered tags and simple values without the requirement to 1702 update the encoder at the same time as the calling application. (For 1703 this, the API for the decoder needs to have a way to mark unknown 1704 items so that the calling application can handle them in a manner 1705 appropriate for the program.) 1706 Since some of the processing needed for validity checking may have an 1707 appreciable cost (in particular with duplicate detection for maps), 1708 support of validity checking is not a requirement placed on all CBOR 1709 decoders. 1711 Some encoders will rely on their applications to provide input data 1712 in such a way that valid CBOR results from the encoder. A generic 1713 encoder may also want to provide a validity-checking mode where it 1714 reliably limits its output to valid CBOR, independent of whether or 1715 not its application is indeed providing API-conformant data. 1717 5.5. Numbers 1719 CBOR-based protocols should take into account that different language 1720 environments pose different restrictions on the range and precision 1721 of numbers that are representable. For example, the basic JavaScript 1722 number system treats all numbers as floating-point values, which may 1723 result in silent loss of precision in decoding integers with more 1724 than 53 significant bits. A protocol that uses numbers should define 1725 its expectations on the handling of non-trivial numbers in decoders 1726 and receiving applications. 1728 A CBOR-based protocol that includes floating-point numbers can 1729 restrict which of the three formats (half-precision, single- 1730 precision, and double-precision) are to be supported. For an 1731 integer-only application, a protocol may want to completely exclude 1732 the use of floating-point values. 
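The precision concern for environments that treat all numbers as binary64 values can be illustrated with a small check. This is a sketch only; the function name survives_binary64 is hypothetical, and Python's float type is used here as a stand-in for such a number system.

```python
def survives_binary64(n: int) -> bool:
    """Return True if integer n converts to binary64 and back unchanged."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # n is beyond the finite range of binary64 altogether.
        return False


print(survives_binary64(2 ** 53))      # True
print(survives_binary64(2 ** 53 + 1))  # False
```

A protocol could use a check of this kind to decide which integers need special handling when a receiver is expected to decode into floating-point values.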
1734 A CBOR-based protocol designed for compactness may want to exclude 1735 specific integer encodings that are longer than necessary for the 1736 application, such as to save the need to implement 64-bit integers. 1737 There is an expectation that encoders will use the most compact 1738 integer representation that can represent a given value. However, a 1739 compact application that does not require deterministic encoding 1740 should accept values that use a longer-than-needed encoding (such as 1741 encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as 1742 the application can decode an integer of the given size. Similar 1743 considerations apply to floating-point values; decoding both 1744 preferred serializations and longer-than-needed ones is recommended. 1746 CBOR-based protocols for constrained applications that provide a 1747 choice between representing a specific number as an integer and as a 1748 decimal fraction or bigfloat (such as when the exponent is small and 1749 non-negative), might express a quality-of-implementation expectation 1750 that the integer representation is used directly. 1752 5.6. Specifying Keys for Maps 1754 The encoding and decoding applications need to agree on what types of 1755 keys are going to be used in maps. In applications that need to 1756 interwork with JSON-based applications, conversion is simplified by 1757 limiting keys to text strings only; otherwise, there has to be a 1758 specified mapping from the other CBOR types to text strings, and this 1759 often leads to implementation errors. In applications where keys are 1760 numeric in nature and numeric ordering of keys is important to the 1761 application, directly using the numbers for the keys is useful. 1763 If multiple types of keys are to be used, consideration should be 1764 given to how these types would be represented in the specific 1765 programming environments that are to be used. 
For example, in 1766 JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished 1767 from a key of floating-point 1.0. This means that, if integer keys 1768 are used, the protocol needs to avoid use of floating-point keys the 1769 values of which happen to be integer numbers in the same map. 1771 Decoders that deliver data items nested within a CBOR data item 1772 immediately on decoding them ("streaming decoders") often do not keep 1773 the state that is necessary to ascertain uniqueness of a key in a 1774 map. Similarly, an encoder that can start encoding data items before 1775 the enclosing data item is completely available ("streaming encoder") 1776 may want to reduce its overhead significantly by relying on its data 1777 source to maintain uniqueness. 1779 A CBOR-based protocol MUST define what to do when a receiving 1780 application does see multiple identical keys in a map. The resulting 1781 rule in the protocol MUST respect the CBOR data model: it cannot 1782 prescribe a specific handling of the entries with the identical keys, 1783 except that it might have a rule that having identical keys in a map 1784 indicates a malformed map and that the decoder has to stop with an 1785 error. When processing maps that exhibit entries with duplicate 1786 keys, a generic decoder might do one of the following: 1788 * Not accept maps with duplicate keys (that is, enforce validity for 1789 maps, see also Section 5.4). These generic decoders are 1790 universally useful. An application may still need to perform 1791 its own duplicate checking based on application rules (for 1792 instance if the application equates integers and floating-point 1793 values in map key positions for specific maps). 1795 * Pass all map entries to the application, including ones with 1796 duplicate keys. This requires the application to handle (check 1797 against) duplicate keys, even if the application rules are 1798 identical to the generic data model rules.
1800 * Lose some entries with duplicate keys, e.g., by only delivering the 1801 final (or first) entry out of the entries with the same key. With 1802 such a generic decoder, applications may get different results for 1803 a specific key on different runs and with different generic 1804 decoders, since which value is returned is based on generic decoder 1805 implementation and the actual order of keys in the map. In 1806 particular, applications cannot validate key uniqueness on their 1807 own as they do not necessarily see all entries; they may not be 1808 able to use such a generic decoder if they do need to validate key 1809 uniqueness. These generic decoders can only be used in situations 1810 where the data source and transfer can be relied upon to always 1811 provide valid maps; this is not possible if the data source and 1812 transfer can be attacked. 1814 Generic decoders need to document which of these three approaches 1815 they implement. 1817 The CBOR data model for maps does not allow ascribing semantics to 1818 the order of the key/value pairs in the map representation. Thus, a 1819 CBOR-based protocol MUST NOT specify that changing the key/value pair 1820 order in a map would change the semantics, except to specify that 1821 some orders are disallowed, for example where they would not meet the 1822 requirements of a deterministic encoding (Section 4.2). (Any 1823 secondary effects of map ordering such as on timing, cache usage, and 1824 other potential side channels are not considered part of the 1825 semantics but may be enough reason on their own for a protocol to 1826 require a deterministic encoding format.) 1828 Applications for constrained devices that have maps where a small 1829 number of frequently used keys can be identified should consider 1830 using small integers as keys; for instance, a set of 24 or fewer 1831 frequent keys can be encoded in a single byte as unsigned integers, 1832 up to 48 if negative integers are also used.
Less frequently 1833 occurring keys can then use integers with longer encodings. 1835 5.6.1. Equivalence of Keys 1837 The specific data model applying to a CBOR data item is used to 1838 determine whether keys occurring in maps are duplicates or distinct. 1840 At the generic data model level, numerically equivalent integer and 1841 floating-point values are distinct from each other, as they are from 1842 the various big numbers (Tags 2 to 5). Similarly, text strings are 1843 distinct from byte strings, even if composed of the same bytes. A 1844 tagged value is distinct from an untagged value or from a value 1845 tagged with a different tag number. 1847 Within each of these groups, numeric values are distinct unless they 1848 are numerically equal (specifically, -0.0 is equal to 0.0); for the 1849 purpose of map key equivalence, NaN (not a number) values are 1850 equivalent if they have the same significand after zero-extending 1851 both significands at the right to 64 bits. 1853 (Byte and text) strings are compared byte by byte, arrays element by 1854 element, and are equal if they have the same number of bytes/elements 1855 and the same values at the same positions. Two maps are equal if 1856 they have the same set of pairs regardless of their order; pairs are 1857 equal if both the key and value are equal. 1859 Tagged values are equal if both the tag number and the tag content 1860 are equal. (Note that a generic decoder that provides processing for 1861 a specific tag may not be able to distinguish some semantically 1862 equivalent values, e.g., if leading zeroes occur in the content of tag 1863 2/3 (Section 3.4.3).) Simple values are equal if they simply have 1864 the same value. Nothing else is equal in the generic data model; a 1865 simple value 2 is not equivalent to an integer 2, and an array is 1866 never equivalent to a map.
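The generic data model equivalence rules above can be approximated in a sketch that maps each decoded key to a comparison token. All names here (Tag, generic_key, duplicate_keys) are hypothetical. The sketch assumes that keys arrive as Python values, that floating-point values are held as binary64, and that NaN values are compared on their 52 significand bits only. Map-valued keys are left out for brevity.

```python
import math
import struct
from typing import NamedTuple


class Tag(NamedTuple):
    """Hypothetical container for a tagged data item."""
    number: int
    content: object


def generic_key(item):
    """Return a hashable token; equal tokens mean equivalent map keys."""
    if isinstance(item, Tag):
        return ("tag", item.number, generic_key(item.content))
    if item is None or isinstance(item, bool):
        return ("simple", item)          # checked before int: bool is an int
    if isinstance(item, int):
        return ("int", item)             # distinct from a float of equal value
    if isinstance(item, float):
        if math.isnan(item):
            bits = struct.unpack(">Q", struct.pack(">d", item))[0]
            return ("nan", bits & ((1 << 52) - 1))  # assumption: significand only
        return ("float", item)           # -0.0 and 0.0 compare equal here
    if isinstance(item, str):
        return ("text", item)
    if isinstance(item, bytes):
        return ("bytes", item)
    if isinstance(item, (list, tuple)):
        return ("array", tuple(generic_key(element) for element in item))
    raise TypeError(f"unsupported key type: {type(item).__name__}")


def duplicate_keys(pairs):
    """Return the keys in a list of (key, value) pairs that repeat an earlier key."""
    seen, duplicates = set(), []
    for key, _value in pairs:
        token = generic_key(key)
        if token in seen:
            duplicates.append(key)
        seen.add(token)
    return duplicates


print(duplicate_keys([(1, "a"), (1.0, "b"), (1, "c")]))  # [1]
```

The map is represented as a list of pairs rather than a Python dict because a dict would already merge the keys 1 and 1.0, which the generic data model keeps distinct.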
1868 As discussed in Section 2.2, specific data models can make values 1869 equivalent for the purpose of comparing map keys that are distinct in 1870 the generic data model. Note that this implies that a generic 1871 decoder may deliver a decoded map to an application that needs to be 1872 checked for duplicate map keys by that application (alternatively, 1873 the decoder may provide a programming interface to perform this 1874 service for the application). Specific data models cannot 1875 distinguish values for map keys that are equal for this purpose at 1876 the generic data model level. 1878 5.7. Undefined Values 1880 In some CBOR-based protocols, the simple value (Section 3.3) of 1881 Undefined might be used by an encoder as a substitute for a data item 1882 with an encoding problem, in order to allow the rest of the enclosing 1883 data items to be encoded without harm. 1885 6. Converting Data between CBOR and JSON 1887 This section gives non-normative advice about converting between CBOR 1888 and JSON. Implementations of converters are free to use whichever 1889 advice here they want. 1891 It is worth noting that a JSON text is a sequence of characters, not 1892 an encoded sequence of bytes, while a CBOR data item consists of 1893 bytes, not characters. 1895 6.1. Converting from CBOR to JSON 1897 Most of the types in CBOR have direct analogs in JSON. However, some 1898 do not, and someone implementing a CBOR-to-JSON converter has to 1899 consider what to do in those cases. The following non-normative 1900 advice deals with these by converting them to a single substitute 1901 value, such as a JSON null. 1903 * An integer (major type 0 or 1) becomes a JSON number. 1905 * A byte string (major type 2) that is not embedded in a tag that 1906 specifies a proposed encoding is encoded in base64url without 1907 padding and becomes a JSON string. 1909 * A UTF-8 string (major type 3) becomes a JSON string. 
Note that 1910 JSON requires escaping certain characters ([RFC8259], Section 7): 1911 quotation mark (U+0022), reverse solidus (U+005C), and the "C0 1912 control characters" (U+0000 through U+001F). All other characters 1913 are copied unchanged into the JSON UTF-8 string. 1915 * An array (major type 4) becomes a JSON array. 1917 * A map (major type 5) becomes a JSON object. This is possible 1918 directly only if all keys are UTF-8 strings. A converter might 1919 also convert other keys into UTF-8 strings (such as by converting 1920 integers into strings containing their decimal representation); 1921 however, doing so introduces a danger of key collision. Note also 1922 that, if tags on UTF-8 strings are ignored as proposed below, this 1923 will cause a key collision if the tags are different but the 1924 strings are the same. 1926 * False (major type 7, additional information 20) becomes a JSON 1927 false. 1929 * True (major type 7, additional information 21) becomes a JSON 1930 true. 1932 * Null (major type 7, additional information 22) becomes a JSON 1933 null. 1935 * A floating-point value (major type 7, additional information 25 1936 through 27) becomes a JSON number if it is finite (that is, it can 1937 be represented in a JSON number); if the value is non-finite (NaN, 1938 or positive or negative Infinity), it is represented by the 1939 substitute value. 1941 * Any other simple value (major type 7, any additional information 1942 value not yet discussed) is represented by the substitute value. 1944 * A bignum (major type 6, tag number 2 or 3) is represented by 1945 encoding its byte string in base64url without padding and becomes 1946 a JSON string. For tag number 3 (negative bignum), a "~" (ASCII 1947 tilde) is inserted before the base-encoded value. (The conversion 1948 to a binary blob instead of a number is to prevent a likely 1949 numeric overflow for the JSON decoder.) 
1951 * A byte string with an encoding hint (major type 6, tag number 21 1952 through 23) is encoded as described and becomes a JSON string. 1954 * For all other tags (major type 6, any other tag number), the tag 1955 content is represented as a JSON value; the tag number is ignored. 1957 * Indefinite-length items are made definite before conversion. 1959 6.2. Converting from JSON to CBOR 1961 All JSON values, once decoded, directly map into one or more CBOR 1962 values. As with any kind of CBOR generation, decisions have to be 1963 made with respect to number representation. In a suggested 1964 conversion: 1966 * JSON numbers without fractional parts (integer numbers) are 1967 represented as integers (major types 0 and 1, possibly major type 1968 6 tag number 2 and 3), choosing the shortest form; integers longer 1969 than an implementation-defined threshold may instead be 1970 represented as floating-point values. The default range that is 1971 represented as integer is -2**53+1..2**53-1 (fully exploiting the 1972 range for exact integers in the binary64 representation often used 1973 for decoding JSON [RFC7493]). A CBOR-based protocol, or a generic 1974 converter implementation, may choose -2**32..2**32-1 or 1975 -2**64..2**64-1 (fully using the integer ranges available in CBOR 1976 with uint32_t or uint64_t, respectively) or even -2**31..2**31-1 1977 or -2**63..2**63-1 (using popular ranges for two's complement 1978 signed integers). (If the JSON was generated from a JavaScript 1979 implementation, its precision is already limited to 53 bits 1980 maximum.) 1982 * Numbers with fractional parts are represented as floating-point 1983 values, performing the decimal-to-binary conversion based on the 1984 precision provided by IEEE 754 binary64. 
Then, when encoding in 1985 CBOR, the preferred serialization uses the shortest floating-point 1986 representation exactly representing this conversion result; for 1987 instance, 1.5 is represented in a 16-bit floating-point value (not 1988 all implementations will be capable of efficiently finding the 1989 minimum form, though). Instead of using the default binary64 1990 precision, there may be an implementation-defined limit to the 1991 precision of the conversion that will affect the precision of the 1992 represented values. Decimal representation should only be used on 1993 the CBOR side if that is specified in a protocol. 1995 CBOR has been designed to generally provide a more compact encoding 1996 than JSON. One implementation strategy that might come to mind is to 1997 perform a JSON-to-CBOR encoding in place in a single buffer. This 1998 strategy would need to carefully consider a number of pathological 1999 cases, such as that some strings represented with no or very few 2000 escapes and longer (or much longer) than 255 bytes may expand when 2001 encoded as UTF-8 strings in CBOR. Similarly, a few of the binary 2002 floating-point representations might cause expansion from some short 2003 decimal representations (1.1, 1e9) in JSON. This may be hard to get 2004 right, and any ensuing vulnerabilities may be exploited by an 2005 attacker. 2007 7. Future Evolution of CBOR 2009 Successful protocols evolve over time. New ideas appear, 2010 implementation platforms improve, related protocols are developed and 2011 evolve, and new requirements from applications and protocols are 2012 added. Facilitating protocol evolution is therefore an important 2013 design consideration for any protocol development. 2015 For protocols that will use CBOR, CBOR provides some useful 2016 mechanisms to facilitate their evolution. Best practices for this 2017 are well known, particularly from JSON format development of JSON- 2018 based protocols. 
Therefore, such best practices are outside the 2019 scope of this specification. 2021 However, facilitating the evolution of CBOR itself is very well 2022 within its scope. CBOR is designed to both provide a stable basis 2023 for development of CBOR-based protocols and to be able to evolve. 2024 Since a successful protocol may live for decades, CBOR needs to be 2025 designed for decades of use and evolution. This section provides 2026 some guidance for the evolution of CBOR. It is necessarily more 2027 subjective than other parts of this document. It is also necessarily 2028 incomplete, lest it turn into a textbook on protocol development. 2030 7.1. Extension Points 2032 In a protocol design, opportunities for evolution are often included 2033 in the form of extension points. For example, there may be a 2034 codepoint space that is not fully allocated from the outset, and the 2035 protocol is designed to tolerate and embrace implementations that 2036 start using more codepoints than initially allocated. 2038 Sizing the codepoint space may be difficult because the range 2039 required may be hard to predict. An attempt should be made to make 2040 the codepoint space large enough so that it can slowly be filled over 2041 the intended lifetime of the protocol. 2043 CBOR has three major extension points: 2045 * the "simple" space (values in major type 7). Of the 24 efficient 2046 (and 224 slightly less efficient) values, only a small number have 2047 been allocated. Implementations receiving an unknown simple data 2048 item may be able to process it as such, given that the structure 2049 of the value is indeed simple. The IANA registry in Section 9.1 2050 is the appropriate way to address the extensibility of this 2051 codepoint space. 2053 * the "tag" space (values in major type 6). Again, only a small 2054 part of the codepoint space has been allocated, and the space is 2055 abundant (although the early numbers are more efficient than the 2056 later ones). 
Implementations receiving an unknown tag number can 2057 choose to simply ignore it (process just the enclosed tag content) 2058 or to process it as an unknown tag number wrapping the tag 2059 content. The IANA registry in Section 9.2 is the appropriate way 2060 to address the extensibility of this codepoint space. 2062 * the "additional information" space. An implementation receiving 2063 an unknown additional information value has no way to continue 2064 decoding, so allocating codepoints to this space is a major step. 2065 There are also very few codepoints left. See also Section 7.2. 2067 7.2. Curating the Additional Information Space 2069 The human mind is sometimes drawn to filling in little perceived gaps 2070 to make something neat. We expect the remaining gaps in the 2071 codepoint space for the additional information values to be an 2072 attractor for new ideas, just because they are there. 2074 The present specification does not manage the additional information 2075 codepoint space by an IANA registry. Instead, allocations out of 2076 this space can only be done by updating this specification. 2078 For an additional information value of n >= 24, the size of the 2079 additional data typically is 2**(n-24) bytes. Therefore, additional 2080 information values 28 and 29 should be viewed as candidates for 2081 128-bit and 256-bit quantities, in case a need arises to add them to 2082 the protocol. Additional information value 30 is then the only 2083 additional information value available for general allocation, and 2084 there should be a very good reason for allocating it before assigning 2085 it through an update of the present specification. 2087 8. Diagnostic Notation 2089 CBOR is a binary interchange format. To facilitate documentation and 2090 debugging, and in particular to facilitate communication between 2091 entities cooperating in debugging, this section defines a simple 2092 human-readable diagnostic notation. 
All actual interchange always 2093 happens in the binary format. 2095 Note that this truly is a diagnostic format; it is not meant to be 2096 parsed. Therefore, no formal definition (as in ABNF) is given in 2097 this document. (Implementers looking for a text-based format for 2098 representing CBOR data items in configuration files may also want to 2099 consider YAML [YAML].) 2101 The diagnostic notation is loosely based on JSON as it is defined in 2102 RFC 8259, extending it where needed. 2104 The notation borrows the JSON syntax for numbers (integer and 2105 floating-point), True (>true<), False (>false<), Null (>null<), UTF-8 2106 strings, arrays, and maps (maps are called objects in JSON; the 2107 diagnostic notation extends JSON here by allowing any data item in 2108 the key position). Undefined is written >undefined< as in 2109 JavaScript. The non-finite floating-point numbers Infinity, 2110 -Infinity, and NaN are written exactly as in this sentence (this is 2111 also a way they can be written in JavaScript, although JSON does not 2112 allow them). A tag is written as an integer number for the tag 2113 number, followed by the tag content in parentheses; for instance, an 2114 RFC 3339 (ISO 8601) date could be notated as: 2116 0("2013-03-21T20:04:00Z") 2118 or the equivalent relative time as 2120 1(1363896240) 2122 Byte strings are notated in one of the base encodings, without 2123 padding, enclosed in single quotes, prefixed by >h< for base16, >b32< 2124 for base32, >h32< for base32hex, >b64< for base64 or base64url (the 2125 actual encodings do not overlap, so the string remains unambiguous). 2126 For example, the byte string 0x12345678 could be written h'12345678', 2127 b32'CI2FM6A', or b64'EjRWeA'. 2129 Unassigned simple values are given as "simple()" with the appropriate 2130 integer in the parentheses. For example, "simple(42)" indicates 2131 major type 7, value 42. 
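As a rough illustration of the byte string notations described in this section, the following Python sketch renders a byte string in the base16, base32, base32hex, and base64url forms with padding removed. The helper name diag_byte_string is hypothetical and not part of this specification; the sketch relies only on Python's standard base64 module, and it derives base32hex by translating the base32 alphabet rather than assuming a dedicated library call.

```python
import base64

# Alphabets from RFC 4648: standard base32 and the "extended hex" variant.
_B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
_B32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
_TO_HEX_ALPHABET = str.maketrans(_B32, _B32HEX)


def diag_byte_string(data: bytes, prefix: str = "h") -> str:
    """Render bytes in diagnostic notation (hypothetical helper name)."""
    if prefix == "h":
        text = data.hex()
    elif prefix == "b32":
        text = base64.b32encode(data).decode("ascii")
    elif prefix == "h32":
        # Encode as base32, then map each symbol to the base32hex alphabet.
        text = base64.b32encode(data).decode("ascii").translate(_TO_HEX_ALPHABET)
    elif prefix == "b64":
        text = base64.urlsafe_b64encode(data).decode("ascii")
    else:
        raise ValueError("unknown prefix: " + prefix)
    # The notation is written without padding.
    return prefix + "'" + text.rstrip("=") + "'"


example = bytes.fromhex("12345678")
for p in ("h", "b32", "b64"):
    print(diag_byte_string(example, p))
```

For the byte string 0x12345678, the three printed forms should match the examples given in the text above: h'12345678', b32'CI2FM6A', and b64'EjRWeA'.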
2133 A number of useful extensions to the diagnostic notation defined here 2134 are provided in Appendix G of [RFC8610], "Extended Diagnostic 2135 Notation" (EDN). 2137 8.1. Encoding Indicators 2139 Sometimes it is useful to indicate in the diagnostic notation which 2140 of several alternative representations were actually used; for 2141 example, a data item written >1.5< by a diagnostic decoder might have 2142 been encoded as a half-, single-, or double-precision float. 2144 The convention for encoding indicators is that anything starting with 2145 an underscore and all following characters that are alphanumeric or 2146 underscore, is an encoding indicator, and can be ignored by anyone 2147 not interested in this information. For example, "_" or "_3". 2148 Encoding indicators are always optional. 2150 A single underscore can be written after the opening brace of a map 2151 or the opening bracket of an array to indicate that the data item was 2152 represented in indefinite-length format. For example, [_ 1, 2] 2153 contains an indicator that an indefinite-length representation was 2154 used to represent the data item [1, 2]. 2156 An underscore followed by a decimal digit n indicates that the 2157 preceding item (or, for arrays and maps, the item starting with the 2158 preceding bracket or brace) was encoded with an additional 2159 information value of 24+n. For example, 1.5_1 is a half-precision 2160 floating-point number, while 1.5_3 is encoded as double precision. 2161 This encoding indicator is not shown in Appendix A. (Note that the 2162 encoding indicator "_" is thus an abbreviation of the full form "_7", 2163 which is not used.) 2165 Byte and text strings of indefinite length can be notated in the form 2166 (_ h'0123', h'4567') and (_ "foo", "bar"). 2168 9. IANA Considerations 2170 IANA has created two registries for new CBOR values. The registries 2171 are separate, that is, not under an umbrella registry, and follow the 2172 rules in [RFC8126]. 
IANA has also assigned a new MIME media type and 2173 an associated Constrained Application Protocol (CoAP) Content-Format 2174 entry. 2176 [To be removed by RFC editor:] IANA is requested to update these 2177 registries to point to the present document instead of RFC 7049. 2179 9.1. Simple Values Registry 2181 IANA has created the "Concise Binary Object Representation (CBOR) 2182 Simple Values" registry at [IANA.cbor-simple-values]. The initial 2183 values are shown in Table 4. 2185 New entries in the range 0 to 19 are assigned by Standards Action. 2186 It is suggested that these Standards Actions allocate values starting 2187 with the number 16 in order to reserve the lower numbers for 2188 contiguous blocks (if any). 2190 New entries in the range 32 to 255 are assigned by Specification 2191 Required. 2193 9.2. Tags Registry 2195 IANA has created the "Concise Binary Object Representation (CBOR) 2196 Tags" registry at [IANA.cbor-tags]. The tags that were defined in 2197 [RFC7049] are described in detail in Section 3.4, and other tags have 2198 already been defined. 2200 New entries in the range 0 to 23 are assigned by Standards Action. 2201 New entries in the range 24 to 255 are assigned by Specification 2202 Required. New entries in the range 256 to 18446744073709551615 are 2203 assigned by First Come First Served. The template for registration 2204 requests is: 2206 * Data item 2208 * Semantics (short form) 2210 In addition, First Come First Served requests should include: 2212 * Point of contact 2214 * Description of semantics (URL) - This description is optional; the 2215 URL can point to something like an Internet-Draft or a web page. 2217 9.3. 
Media Type ("MIME Type") 2219 The Internet media type [RFC6838] for a single encoded CBOR data item 2220 is application/cbor, as defined in [IANA.media-types]: 2222 Type name: application 2224 Subtype name: cbor 2226 Required parameters: n/a 2227 Optional parameters: n/a 2229 Encoding considerations: binary 2231 Security considerations: See Section 10 of this document 2233 Interoperability considerations: n/a 2235 Published specification: This document 2237 Applications that use this media type: None yet, but it is expected 2238 that this format will be deployed in protocols and applications. 2240 Additional information: * Magic number(s): n/a 2242 * File extension(s): .cbor 2244 * Macintosh file type code(s): n/a 2246 Person & email address to contact for further information: IETF CBOR 2247 Working Group cbor@ietf.org (mailto:cbor@ietf.org) or IETF 2248 Applications and Real-Time Area art@ietf.org (mailto:art@ietf.org) 2250 Intended usage: COMMON 2252 Restrictions on usage: none 2254 Author: IETF CBOR Working Group cbor@ietf.org (mailto:cbor@ietf.org) 2256 Change controller: The IESG iesg@ietf.org (mailto:iesg@ietf.org) 2258 9.4. CoAP Content-Format 2260 The CoAP Content-Format for CBOR is defined in 2261 [IANA.core-parameters]: 2263 Media Type: application/cbor 2265 Encoding: - 2267 Id: 60 2269 Reference: [RFCthis] 2271 9.5. The +cbor Structured Syntax Suffix Registration 2273 The Structured Syntax Suffix [RFC6838] for media types based on a 2274 single encoded CBOR data item is +cbor, as defined in 2275 [IANA.media-type-structured-suffix]: 2277 Name: Concise Binary Object Representation (CBOR) 2279 +suffix: +cbor 2281 References: [RFCthis] 2283 Encoding Considerations: CBOR is a binary format. 2285 Interoperability Considerations: n/a 2287 Fragment Identifier Considerations: The syntax and semantics of 2288 fragment identifiers specified for +cbor SHOULD be as specified 2289 for "application/cbor". 
(At publication of this document, there 2290 is no fragment identification syntax defined for "application/ 2291 cbor".) 2293 The syntax and semantics for fragment identifiers for a specific 2294 "xxx/yyy+cbor" SHOULD be processed as follows: 2296 * For cases defined in +cbor, where the fragment identifier 2297 resolves per the +cbor rules, then process as specified in 2298 +cbor. 2300 * For cases defined in +cbor, where the fragment identifier does 2301 not resolve per the +cbor rules, then process as specified in 2302 "xxx/yyy+cbor". 2304 * For cases not defined in +cbor, then process as specified in 2305 "xxx/yyy+cbor". 2307 Security Considerations: See Section 10 of this document 2309 Contact: IETF CBOR Working Group cbor@ietf.org 2310 (mailto:cbor@ietf.org) or IETF Applications and Real-Time Area 2311 art@ietf.org (mailto:art@ietf.org) 2313 Author/Change Controller: The IESG iesg@ietf.org 2314 (mailto:iesg@ietf.org) 2315 // Editors' note: RFC 6838 has a template 2316 field Author/Change 2317 // controller, the descriptive text of 2318 which makes clear that this is 2319 // the change controller, not the author. 2320 Go figure. There is no 2321 // separate author entry as in the media 2322 types registry. (RFC 2323 // editor: Please remove this note before 2324 publication.) 2326 10. Security Considerations 2328 A network-facing application can exhibit vulnerabilities in its 2329 processing logic for incoming data. Complex parsers are well known 2330 as a likely source of such vulnerabilities, such as the ability to 2331 remotely crash a node, or even remotely execute arbitrary code on it. 2332 CBOR attempts to narrow the opportunities for introducing such 2333 vulnerabilities by reducing parser complexity, by giving the entire 2334 range of encodable values a meaning where possible. 
2336 Because CBOR decoders are often used as a first step in processing 2337 unvalidated input, they need to be fully prepared for all types of 2338 hostile input that may be designed to corrupt, overrun, or achieve 2339 control of the system decoding the CBOR data item. A CBOR decoder 2340 needs to assume that all input may be hostile even if it has been 2341 checked by a firewall, has come over a secure channel such as TLS, is 2342 encrypted or signed, or has come from some other source that is 2343 presumed trusted. 2345 Hostile input may be constructed to overrun buffers, overflow or 2346 underflow integer arithmetic, or cause other decoding disruption. 2347 CBOR data items might have lengths or sizes that are intentionally 2348 extremely large or too short. Resource exhaustion attacks might 2349 attempt to lure a decoder into allocating very big data items 2350 (strings, arrays, maps, or even arbitrary precision numbers) or 2351 exhaust the stack depth by setting up deeply nested items. Decoders 2352 need to have appropriate resource management to mitigate these 2353 attacks. (Items for which very large sizes are given can also 2354 attempt to exploit integer overflow vulnerabilities.) 2356 A CBOR decoder, by definition, only accepts well-formed CBOR; this is 2357 the first step to its robustness. Input that is not well-formed CBOR 2358 causes no further processing from the point where the lack of well- 2359 formedness was detected. If possible, any data decoded up to this 2360 point should have no impact on the application using the CBOR 2361 decoder. 2363 In addition to ascertaining well-formedness, a CBOR decoder might 2364 also perform validity checks on the CBOR data. Alternatively, it can 2365 leave those checks to the application using the decoder. This choice 2366 needs to be clearly documented in the decoder. 
Beyond the validity 2367 at the CBOR level, an application also needs to ascertain that the 2368 input is in alignment with the application protocol that is 2369 serialized in CBOR. 2371 The input check itself may consume resources. This is usually linear 2372 in the size of the input, which means that an attacker has to spend 2373 resources that are commensurate to the resources spent by the 2374 defender on input validation. Processing for arbitrary-precision 2375 numbers may exceed linear effort. Also, some hash-table 2376 implementations that are used by decoders to build in-memory 2377 representations of maps can be attacked to spend quadratic effort, 2378 unless a secret key is employed (see Section 7 of [SIPHASH]). Such 2379 superlinear efforts can be employed by an attacker to exhaust 2380 resources at or before the input validator; they therefore need to be 2381 avoided in a CBOR decoder implementation. Note that tag number 2382 definitions and their implementations can add security considerations 2383 of this kind; this should then be discussed in the security 2384 considerations of the tag number definition. 2386 CBOR encoders do not receive input directly from the network and are 2387 thus not directly attackable in the same way as CBOR decoders. 2388 However, CBOR encoders often have an API that takes input from 2389 another level in the implementation and can be attacked through that 2390 API. The design and implementation of that API should assume the 2391 behavior of its caller may be based on hostile input or on coding 2392 mistakes. It should check inputs for buffer overruns, overflow and 2393 underflow of integer arithmetic, and other such errors that are aimed 2394 to disrupt the encoder. 2396 Protocols should be defined in such a way that potential multiple 2397 interpretations are reliably reduced to a single interpretation. 
For 2398 example, an attacker could make use of invalid input such as 2399 duplicate keys in maps, or exploit different precision in processing 2400 numbers to make one application base its decisions on a different 2401 interpretation than the one that will be used by a second 2402 application. To facilitate consistent interpretation, encoder and 2403 decoder implementations should provide a validity checking mode of 2404 operation (Section 5.4). Note, however, that a generic decoder 2405 cannot know about all requirements that an application imposes on its 2406 input data; it therefore does not relieve the application from 2407 performing its own input checking. Also, since the set of defined 2408 tag numbers evolves, the application may employ a tag number that is 2409 not yet supported for validity checking by the generic decoder it 2410 uses. Generic decoders therefore need to document which 2411 tag numbers they support and what validity checking they can provide 2412 for each of them as well as for basic CBOR validity (UTF-8 checking, 2413 duplicate map key checking). 2415 11. References 2417 11.1. Normative References 2419 [ECMA262] Ecma International, "ECMAScript 2018 Language 2420 Specification", ECMA Standard ECMA-262, 9th Edition, June 2421 2018, . 2425 [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE 2426 Std 754-2008. 2428 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 2429 Extensions (MIME) Part One: Format of Internet Message 2430 Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, 2431 . 2433 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2434 Requirement Levels", BCP 14, RFC 2119, 2435 DOI 10.17487/RFC2119, March 1997, 2436 . 2438 [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: 2439 Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002, 2440 .
2442 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 2443 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2444 2003, . 2446 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 2447 Resource Identifier (URI): Generic Syntax", STD 66, 2448 RFC 3986, DOI 10.17487/RFC3986, January 2005, 2449 . 2451 [RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom 2452 Syndication Format", RFC 4287, DOI 10.17487/RFC4287, 2453 December 2005, . 2455 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 2456 Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, 2457 . 2459 [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for 2460 Writing an IANA Considerations Section in RFCs", BCP 26, 2461 RFC 8126, DOI 10.17487/RFC8126, June 2017, 2462 . 2464 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2465 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2466 May 2017, . 2468 [TIME_T] The Open Group Base Specifications, "Vol. 1: Base 2469 Definitions, Issue 7", 2013 Edition, IEEE Std 1003.1, 2470 Section 4.15 'Seconds Since the Epoch', 2013, 2471 . 2474 11.2. Informative References 2476 [ASN.1] International Telecommunication Union, "Information 2477 Technology -- ASN.1 encoding rules: Specification of Basic 2478 Encoding Rules (BER), Canonical Encoding Rules (CER) and 2479 Distinguished Encoding Rules (DER)", ITU-T Recommendation 2480 X.690, 1994. 2482 [BSON] Various, "BSON - Binary JSON", 2013, 2483 . 2485 [IANA.cbor-simple-values] 2486 IANA, "Concise Binary Object Representation (CBOR) Simple 2487 Values", 2488 . 2490 [IANA.cbor-tags] 2491 IANA, "Concise Binary Object Representation (CBOR) Tags", 2492 . 2494 [IANA.core-parameters] 2495 IANA, "Constrained RESTful Environments (CoRE) 2496 Parameters", 2497 . 2499 [IANA.media-type-structured-suffix] 2500 IANA, "Structured Syntax Suffix Registry", 2501 . 2504 [IANA.media-types] 2505 IANA, "Media Types", 2506 . 
2508 [MessagePack] 2509 Furuhashi, S., "MessagePack", 2013, . 2511 [PCRE] Ho, A., "PCRE - Perl Compatible Regular Expressions", 2512 2018, . 2514 [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission 2515 Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, 2516 . 2518 [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type 2519 Specifications and Registration Procedures", BCP 13, 2520 RFC 6838, DOI 10.17487/RFC6838, January 2013, 2521 . 2523 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 2524 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 2525 October 2013, . 2527 [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for 2528 Constrained-Node Networks", RFC 7228, 2529 DOI 10.17487/RFC7228, May 2014, 2530 . 2532 [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, 2533 DOI 10.17487/RFC7493, March 2015, 2534 . 2536 [RFC7991] Hoffman, P., "The "xml2rfc" Version 3 Vocabulary", 2537 RFC 7991, DOI 10.17487/RFC7991, December 2016, 2538 . 2540 [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 2541 Interchange Format", STD 90, RFC 8259, 2542 DOI 10.17487/RFC8259, December 2017, 2543 . 2545 [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data 2546 Definition Language (CDDL): A Notational Convention to 2547 Express Concise Binary Object Representation (CBOR) and 2548 JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 2549 June 2019, . 2551 [RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T., 2552 and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS 2553 Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September 2554 2019, . 2556 [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) 2557 Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, 2558 . 2560 [RFC8746] Bormann, C., Ed., "Concise Binary Object Representation 2561 (CBOR) Tags for Typed Arrays", RFC 8746, 2562 DOI 10.17487/RFC8746, February 2020, 2563 . 
2565 [rfc8746] Bormann, C., Ed., "Concise Binary Object Representation 2566 (CBOR) Tags for Typed Arrays", RFC 8746, 2567 DOI 10.17487/RFC8746, February 2020, 2568 . 2570 [SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- 2571 Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture 2572 Notes in Computer Science pp. 489-508, 2012, 2573 . 2575 [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup 2576 Language (YAML[TM]) Version 1.2", 3rd Edition, October 2577 2009, . 2579 Appendix A. Examples 2581 The following table provides some CBOR-encoded values in hexadecimal 2582 (right column), together with diagnostic notation for these values 2583 (left column). Note that the string "\u00fc" is one form of 2584 diagnostic notation for a UTF-8 string containing the single Unicode 2585 character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut). 2586 Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a 2587 single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often 2588 representing "water"), and "\ud800\udd51" is a UTF-8 string in 2589 diagnostic notation with a single character U+10151 (GREEK ACROPHONIC 2590 ATTIC FIFTY STATERS). (Note that all these single-character strings 2591 could also be represented in native UTF-8 in diagnostic notation, 2592 just not in an ASCII-only specification like the present one.) In 2593 the diagnostic notation provided for bignums, their intended numeric 2594 value is shown as a decimal number (such as 18446744073709551616) 2595 instead of showing a tagged byte string (such as 2596 2(h'010000000000000000')). 
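The relationship between a bignum's decimal value and its tagged byte string, as mentioned in the paragraph above, can be sketched in Python as follows. This is a minimal illustration, not a complete encoder: the function name encode_bignum is hypothetical, and the sketch assumes the byte string content is at most 23 bytes so that its length fits in the initial byte. For tag number 3, the byte string carries the value -1 minus the number being encoded.

```python
def encode_bignum(value: int) -> bytes:
    """Encode an integer as a tag 2 or tag 3 bignum (hypothetical helper)."""
    if value >= 0:
        tag_byte, magnitude = 0xC2, value        # tag number 2: unsigned bignum
    else:
        tag_byte, magnitude = 0xC3, -1 - value   # tag number 3: negative bignum
    # Big-endian byte string using the fewest bytes needed for the magnitude.
    content = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    if len(content) > 23:
        raise ValueError("this sketch only handles content of up to 23 bytes")
    # 0x40 is major type 2 (byte string); the low 5 bits carry the length.
    return bytes([tag_byte, 0x40 | len(content)]) + content


print(encode_bignum(18446744073709551616).hex())
print(encode_bignum(-18446744073709551617).hex())
```

The two values used here are the bignum examples that appear in the table, so the printed hexadecimal strings can be compared against the corresponding rows.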
2598 +------------------------------+------------------------------------+ 2599 | Diagnostic | Encoded | 2600 +==============================+====================================+ 2601 | 0 | 0x00 | 2602 +------------------------------+------------------------------------+ 2603 | 1 | 0x01 | 2604 +------------------------------+------------------------------------+ 2605 | 10 | 0x0a | 2606 +------------------------------+------------------------------------+ 2607 | 23 | 0x17 | 2608 +------------------------------+------------------------------------+ 2609 | 24 | 0x1818 | 2610 +------------------------------+------------------------------------+ 2611 | 25 | 0x1819 | 2612 +------------------------------+------------------------------------+ 2613 | 100 | 0x1864 | 2614 +------------------------------+------------------------------------+ 2615 | 1000 | 0x1903e8 | 2616 +------------------------------+------------------------------------+ 2617 | 1000000 | 0x1a000f4240 | 2618 +------------------------------+------------------------------------+ 2619 | 1000000000000 | 0x1b000000e8d4a51000 | 2620 +------------------------------+------------------------------------+ 2621 | 18446744073709551615 | 0x1bffffffffffffffff | 2622 +------------------------------+------------------------------------+ 2623 | 18446744073709551616 | 0xc249010000000000000000 | 2624 +------------------------------+------------------------------------+ 2625 | -18446744073709551616 | 0x3bffffffffffffffff | 2626 +------------------------------+------------------------------------+ 2627 | -18446744073709551617 | 0xc349010000000000000000 | 2628 +------------------------------+------------------------------------+ 2629 | -1 | 0x20 | 2630 +------------------------------+------------------------------------+ 2631 | -10 | 0x29 | 2632 +------------------------------+------------------------------------+ 2633 | -100 | 0x3863 | 2634 +------------------------------+------------------------------------+ 2635 | -1000 | 
0x3903e7 | 2636 +------------------------------+------------------------------------+ 2637 | 0.0 | 0xf90000 | 2638 +------------------------------+------------------------------------+ 2639 | -0.0 | 0xf98000 | 2640 +------------------------------+------------------------------------+ 2641 | 1.0 | 0xf93c00 | 2642 +------------------------------+------------------------------------+ 2643 | 1.1 | 0xfb3ff199999999999a | 2644 +------------------------------+------------------------------------+ 2645 | 1.5 | 0xf93e00 | 2646 +------------------------------+------------------------------------+ 2647 | 65504.0 | 0xf97bff | 2648 +------------------------------+------------------------------------+ 2649 | 100000.0 | 0xfa47c35000 | 2650 +------------------------------+------------------------------------+ 2651 | 3.4028234663852886e+38 | 0xfa7f7fffff | 2652 +------------------------------+------------------------------------+ 2653 | 1.0e+300 | 0xfb7e37e43c8800759c | 2654 +------------------------------+------------------------------------+ 2655 | 5.960464477539063e-8 | 0xf90001 | 2656 +------------------------------+------------------------------------+ 2657 | 0.00006103515625 | 0xf90400 | 2658 +------------------------------+------------------------------------+ 2659 | -4.0 | 0xf9c400 | 2660 +------------------------------+------------------------------------+ 2661 | -4.1 | 0xfbc010666666666666 | 2662 +------------------------------+------------------------------------+ 2663 | Infinity | 0xf97c00 | 2664 +------------------------------+------------------------------------+ 2665 | NaN | 0xf97e00 | 2666 +------------------------------+------------------------------------+ 2667 | -Infinity | 0xf9fc00 | 2668 +------------------------------+------------------------------------+ 2669 | Infinity | 0xfa7f800000 | 2670 +------------------------------+------------------------------------+ 2671 | NaN | 0xfa7fc00000 | 2672 
+------------------------------+------------------------------------+ 2673 | -Infinity | 0xfaff800000 | 2674 +------------------------------+------------------------------------+ 2675 | Infinity | 0xfb7ff0000000000000 | 2676 +------------------------------+------------------------------------+ 2677 | NaN | 0xfb7ff8000000000000 | 2678 +------------------------------+------------------------------------+ 2679 | -Infinity | 0xfbfff0000000000000 | 2680 +------------------------------+------------------------------------+ 2681 | false | 0xf4 | 2682 +------------------------------+------------------------------------+ 2683 | true | 0xf5 | 2684 +------------------------------+------------------------------------+ 2685 | null | 0xf6 | 2686 +------------------------------+------------------------------------+ 2687 | undefined | 0xf7 | 2688 +------------------------------+------------------------------------+ 2689 | simple(16) | 0xf0 | 2690 +------------------------------+------------------------------------+ 2691 | simple(255) | 0xf8ff | 2692 +------------------------------+------------------------------------+ 2693 | 0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | 2694 | | 30343a30305a | 2695 +------------------------------+------------------------------------+ 2696 | 1(1363896240) | 0xc11a514b67b0 | 2697 +------------------------------+------------------------------------+ 2698 | 1(1363896240.5) | 0xc1fb41d452d9ec200000 | 2699 +------------------------------+------------------------------------+ 2700 | 23(h'01020304') | 0xd74401020304 | 2701 +------------------------------+------------------------------------+ 2702 | 24(h'6449455446') | 0xd818456449455446 | 2703 +------------------------------+------------------------------------+ 2704 | 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 | 2705 | | 616d706c652e636f6d | 2706 +------------------------------+------------------------------------+ 2707 | h'' | 0x40 | 2708 
+------------------------------+------------------------------------+ 2709 | h'01020304' | 0x4401020304 | 2710 +------------------------------+------------------------------------+ 2711 | "" | 0x60 | 2712 +------------------------------+------------------------------------+ 2713 | "a" | 0x6161 | 2714 +------------------------------+------------------------------------+ 2715 | "IETF" | 0x6449455446 | 2716 +------------------------------+------------------------------------+ 2717 | "\"\\" | 0x62225c | 2718 +------------------------------+------------------------------------+ 2719 | "\u00fc" | 0x62c3bc | 2720 +------------------------------+------------------------------------+ 2721 | "\u6c34" | 0x63e6b0b4 | 2722 +------------------------------+------------------------------------+ 2723 | "\ud800\udd51" | 0x64f0908591 | 2724 +------------------------------+------------------------------------+ 2725 | [] | 0x80 | 2726 +------------------------------+------------------------------------+ 2727 | [1, 2, 3] | 0x83010203 | 2728 +------------------------------+------------------------------------+ 2729 | [1, [2, 3], [4, 5]] | 0x8301820203820405 | 2730 +------------------------------+------------------------------------+ 2731 | [1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e | 2732 | 10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 | 2733 | 17, 18, 19, 20, 21, 22, 23, | | 2734 | 24, 25] | | 2735 +------------------------------+------------------------------------+ 2736 | {} | 0xa0 | 2737 +------------------------------+------------------------------------+ 2738 | {1: 2, 3: 4} | 0xa201020304 | 2739 +------------------------------+------------------------------------+ 2740 | {"a": 1, "b": [2, 3]} | 0xa26161016162820203 | 2741 +------------------------------+------------------------------------+ 2742 | ["a", {"b": "c"}] | 0x826161a161626163 | 2743 +------------------------------+------------------------------------+ 2744 |{"a": "A", "b": "B", "c": "C",| 
0xa5616161416162614261636143616461 | 2745 | "d": "D", "e": "E"} | 4461656145 | 2746 +------------------------------+------------------------------------+ 2747 | (_ h'0102', h'030405') | 0x5f42010243030405ff | 2748 +------------------------------+------------------------------------+ 2749 | (_ "strea", "ming") | 0x7f657374726561646d696e67ff | 2750 +------------------------------+------------------------------------+ 2751 | [_ ] | 0x9fff | 2752 +------------------------------+------------------------------------+ 2753 | [_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff | 2754 +------------------------------+------------------------------------+ 2755 | [_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff | 2756 +------------------------------+------------------------------------+ 2757 | [1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff | 2758 +------------------------------+------------------------------------+ 2759 | [1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 | 2760 +------------------------------+------------------------------------+ 2761 |[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f | 2762 | 10, 11, 12, 13, 14, 15, 16, | 101112131415161718181819ff | 2763 | 17, 18, 19, 20, 21, 22, 23, | | 2764 | 24, 25] | | 2765 +------------------------------+------------------------------------+ 2766 | {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | 2767 +------------------------------+------------------------------------+ 2768 | ["a", {_ "b": "c"}] | 0x826161bf61626163ff | 2769 +------------------------------+------------------------------------+ 2770 | {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | 2771 +------------------------------+------------------------------------+ 2773 Table 6: Examples of Encoded CBOR Data Items 2775 Appendix B. Jump Table 2777 For brevity, this jump table does not show initial bytes that are 2778 reserved for future extension. It also only shows a selection of the 2779 initial bytes that can be used for optional features. 
(All unsigned 2780 integers are in network byte order.) 2782 +------------+------------------------------------------------+ 2783 | Byte | Structure/Semantics | 2784 +============+================================================+ 2785 | 0x00..0x17 | Unsigned integer 0x00..0x17 (0..23) | 2786 +------------+------------------------------------------------+ 2787 | 0x18 | Unsigned integer (one-byte uint8_t follows) | 2788 +------------+------------------------------------------------+ 2789 | 0x19 | Unsigned integer (two-byte uint16_t follows) | 2790 +------------+------------------------------------------------+ 2791 | 0x1a | Unsigned integer (four-byte uint32_t follows) | 2792 +------------+------------------------------------------------+ 2793 | 0x1b | Unsigned integer (eight-byte uint64_t follows) | 2794 +------------+------------------------------------------------+ 2795 | 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24) | 2796 +------------+------------------------------------------------+ 2797 | 0x38 | Negative integer -1-n (one-byte uint8_t for n | 2798 | | follows) | 2799 +------------+------------------------------------------------+ 2800 | 0x39 | Negative integer -1-n (two-byte uint16_t for n | 2801 | | follows) | 2802 +------------+------------------------------------------------+ 2803 | 0x3a | Negative integer -1-n (four-byte uint32_t for | 2804 | | n follows) | 2805 +------------+------------------------------------------------+ 2806 | 0x3b | Negative integer -1-n (eight-byte uint64_t for | 2807 | | n follows) | 2808 +------------+------------------------------------------------+ 2809 | 0x40..0x57 | byte string (0x00..0x17 bytes follow) | 2810 +------------+------------------------------------------------+ 2811 | 0x58 | byte string (one-byte uint8_t for n, and then | 2812 | | n bytes follow) | 2813 +------------+------------------------------------------------+ 2814 | 0x59 | byte string (two-byte uint16_t for n, and then | 2815 | | n bytes follow) 
| 2816 +------------+------------------------------------------------+ 2817 | 0x5a | byte string (four-byte uint32_t for n, and | 2818 | | then n bytes follow) | 2819 +------------+------------------------------------------------+ 2820 | 0x5b | byte string (eight-byte uint64_t for n, and | 2821 | | then n bytes follow) | 2822 +------------+------------------------------------------------+ 2823 | 0x5f | byte string, byte strings follow, terminated | 2824 | | by "break" | 2825 +------------+------------------------------------------------+ 2826 | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow) | 2827 +------------+------------------------------------------------+ 2828 | 0x78 | UTF-8 string (one-byte uint8_t for n, and then | 2829 | | n bytes follow) | 2830 +------------+------------------------------------------------+ 2831 | 0x79 | UTF-8 string (two-byte uint16_t for n, and | 2832 | | then n bytes follow) | 2833 +------------+------------------------------------------------+ 2834 | 0x7a | UTF-8 string (four-byte uint32_t for n, and | 2835 | | then n bytes follow) | 2836 +------------+------------------------------------------------+ 2837 | 0x7b | UTF-8 string (eight-byte uint64_t for n, and | 2838 | | then n bytes follow) | 2839 +------------+------------------------------------------------+ 2840 | 0x7f | UTF-8 string, UTF-8 strings follow, terminated | 2841 | | by "break" | 2842 +------------+------------------------------------------------+ 2843 | 0x80..0x97 | array (0x00..0x17 data items follow) | 2844 +------------+------------------------------------------------+ 2845 | 0x98 | array (one-byte uint8_t for n, and then n data | 2846 | | items follow) | 2847 +------------+------------------------------------------------+ 2848 | 0x99 | array (two-byte uint16_t for n, and then n | 2849 | | data items follow) | 2850 +------------+------------------------------------------------+ 2851 | 0x9a | array (four-byte uint32_t for n, and then n | 2852 | | data items 
follow) | 2853 +------------+------------------------------------------------+ 2854 | 0x9b | array (eight-byte uint64_t for n, and then n | 2855 | | data items follow) | 2856 +------------+------------------------------------------------+ 2857 | 0x9f | array, data items follow, terminated by | 2858 | | "break" | 2859 +------------+------------------------------------------------+ 2860 | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow) | 2861 +------------+------------------------------------------------+ 2862 | 0xb8 | map (one-byte uint8_t for n, and then n pairs | 2863 | | of data items follow) | 2864 +------------+------------------------------------------------+ 2865 | 0xb9 | map (two-byte uint16_t for n, and then n pairs | 2866 | | of data items follow) | 2867 +------------+------------------------------------------------+ 2868 | 0xba | map (four-byte uint32_t for n, and then n | 2869 | | pairs of data items follow) | 2870 +------------+------------------------------------------------+ 2871 | 0xbb | map (eight-byte uint64_t for n, and then n | 2872 | | pairs of data items follow) | 2873 +------------+------------------------------------------------+ 2874 | 0xbf | map, pairs of data items follow, terminated by | 2875 | | "break" | 2876 +------------+------------------------------------------------+ 2877 | 0xc0 | Text-based date/time (data item follows; see | 2878 | | Section 3.4.1) | 2879 +------------+------------------------------------------------+ 2880 | 0xc1 | Epoch-based date/time (data item follows; see | 2881 | | Section 3.4.2) | 2882 +------------+------------------------------------------------+ 2883 | 0xc2 | Positive bignum (data item "byte string" | 2884 | | follows) | 2885 +------------+------------------------------------------------+ 2886 | 0xc3 | Negative bignum (data item "byte string" | 2887 | | follows) | 2888 +------------+------------------------------------------------+ 2889 | 0xc4 | Decimal Fraction (data item "array" follows; | 
2890 | | see Section 3.4.4) | 2891 +------------+------------------------------------------------+ 2892 | 0xc5 | Bigfloat (data item "array" follows; see | 2893 | | Section 3.4.4) | 2894 +------------+------------------------------------------------+ 2895 | 0xc6..0xd4 | (tag) | 2896 +------------+------------------------------------------------+ 2897 | 0xd5..0xd7 | Expected Conversion (data item follows; see | 2898 | | Section 3.4.5.2) | 2899 +------------+------------------------------------------------+ 2900 | 0xd8..0xdb | (more tags, 1/2/4/8 bytes and then a data item | 2901 | | follow) | 2902 +------------+------------------------------------------------+ 2903 | 0xe0..0xf3 | (simple value) | 2904 +------------+------------------------------------------------+ 2905 | 0xf4 | False | 2906 +------------+------------------------------------------------+ 2907 | 0xf5 | True | 2908 +------------+------------------------------------------------+ 2909 | 0xf6 | Null | 2910 +------------+------------------------------------------------+ 2911 | 0xf7 | Undefined | 2912 +------------+------------------------------------------------+ 2913 | 0xf8 | (simple value, one byte follows) | 2914 +------------+------------------------------------------------+ 2915 | 0xf9 | Half-Precision Float (two-byte IEEE 754) | 2916 +------------+------------------------------------------------+ 2917 | 0xfa | Single-Precision Float (four-byte IEEE 754) | 2918 +------------+------------------------------------------------+ 2919 | 0xfb | Double-Precision Float (eight-byte IEEE 754) | 2920 +------------+------------------------------------------------+ 2921 | 0xff | "break" stop code | 2922 +------------+------------------------------------------------+ 2924 Table 7: Jump Table for Initial Byte 2926 Appendix C. Pseudocode 2928 The well-formedness of a CBOR item can be checked by the pseudocode 2929 in Figure 1. 
The data is well-formed if and only if:

2931     *  the pseudocode does not "fail";

2933     *  after execution of the pseudocode, no bytes are left in the input
2934        (except in streaming applications).

2936     The pseudocode has the following prerequisites:

2938     *  take(n) reads n bytes from the input data and returns them as a
2939        byte string. If n bytes are no longer available, take(n) fails.

2941     *  uint() converts a byte string into an unsigned integer by
2942        interpreting the byte string in network byte order.

2944     *  Arithmetic works as in C.

2946     *  All variables are unsigned integers of sufficient range.

2948     Note that "well_formed" returns the major type for well-formed
2949     definite length items, but 0 for an indefinite length item (or -1 for
2950     a break stop code, only if "breakable" is set). This is used in
2951     "well_formed_indefinite" to ascertain that indefinite length strings
2952     only contain definite length strings as chunks.

2954     well_formed (breakable = false) {
2955       // process initial bytes
2956       ib = uint(take(1));
2957       mt = ib >> 5;
2958       val = ai = ib & 0x1f;
2959       switch (ai) {
2960         case 24: val = uint(take(1)); break;
2961         case 25: val = uint(take(2)); break;
2962         case 26: val = uint(take(4)); break;
2963         case 27: val = uint(take(8)); break;
2964         case 28: case 29: case 30: fail();
2965         case 31:
2966           return well_formed_indefinite(mt, breakable);
2967       }
2968       // process content
2969       switch (mt) {
2970         // case 0, 1, 7 do not have content; just use val
2971         case 2: case 3: take(val); break; // bytes/UTF-8
2972         case 4: for (i = 0; i < val; i++) well_formed(); break;
2973         case 5: for (i = 0; i < val*2; i++) well_formed(); break;
2974         case 6: well_formed(); break;     // 1 embedded data item
2975         case 7: if (ai == 24 && val < 32) fail(); // bad simple
2976       }
2977       return mt;                    // finite data item
2978     }

2980     well_formed_indefinite(mt, breakable) {
2981       switch (mt) {
2982         case 2: case 3:
2983           while ((it = well_formed(true)) != -1)
2984             if (it != mt)           // need finite-length chunk
2985               fail();               //    of same type
2986           break;
2987         case 4: while (well_formed(true) != -1); break;
2988         case 5: while (well_formed(true) != -1) well_formed(); break;
2989         case 7:
2990           if (breakable)
2991             return -1;              // signal break out
2992           else fail();              // no enclosing indefinite
2993         default: fail();            // wrong mt
2994       }
2995       return 0;                     // no break out
2996     }

2998                Figure 1: Pseudocode for Well-Formedness Check

3000     Note that the remaining complexity of a complete CBOR decoder is
3001     about presenting data that has been decoded to the application in an
3002     appropriate form.

3004     Major types 0 and 1 are designed in such a way that they can be
3005     encoded in C from a signed integer without actually doing an if-then-
3006     else for positive/negative (Figure 2). This uses the fact that
3007     (-1-n), the transformation for major type 1, is the same as ~n
3008     (bitwise complement) in C unsigned arithmetic; ~n can then be
3009     expressed as (-1)^n for the negative case, while 0^n leaves n
3010     unchanged for the non-negative case. The sign of a number can be
3011     converted to -1 for negative and 0 for non-negative (0 or positive)
3012     by arithmetic-shifting the number by one bit less than the bit length
3013     of the number (for example, by 63 for 64-bit numbers).

3015     void encode_sint(int64_t n) {
3016       uint64_t ui = n >> 63;    // extend sign to whole length
3017       unsigned mt = ui & 0x20;  // extract (shifted) major type
3018       ui ^= n;                  // complement negatives
3019       if (ui < 24)
3020         *p++ = mt + ui;
3021       else if (ui < 256) {
3022         *p++ = mt + 24;
3023         *p++ = ui;
3024       } else
3025            ...

3027              Figure 2: Pseudocode for Encoding a Signed Integer

3029  Appendix D. Half-Precision

3031     As half-precision floating-point numbers were only added to IEEE 754
3032     in 2008 [IEEE754], today's programming platforms often still only
3033     have limited support for them. It is very easy to include at least
3034     decoding support for them even without such support.
An example of a
3035     small decoder for half-precision floating-point numbers in the C
3036     language is shown in Figure 3. A similar program for Python is in
3037     Figure 4; this code assumes that the 2-byte value has already been
3038     decoded as an (unsigned short) integer in network byte order (as
3039     would be done by the pseudocode in Appendix C).

3041     #include <math.h>

3043     double decode_half(unsigned char *halfp) {
3044       int half = (halfp[0] << 8) + halfp[1];
3045       int exp = (half >> 10) & 0x1f;            // 5-bit exponent
3046       int mant = half & 0x3ff;                  // 10-bit mantissa
3047       double val;
3048       if (exp == 0) val = ldexp(mant, -24);     // zero or subnormal
3049       else if (exp != 31) val = ldexp(mant + 1024, exp - 25); // normal
3050       else val = mant == 0 ? INFINITY : NAN;    // infinity or NaN
3051       return half & 0x8000 ? -val : val;        // apply sign bit
3052     }

3054                Figure 3: C Code for a Half-Precision Decoder

3056     import struct
3057     from math import ldexp

3059     def decode_single(single):
3060         return struct.unpack("!f", struct.pack("!I", single))[0]

3062     def decode_half(half):
3063         valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
3064         if ((half & 0x7c00) != 0x7c00):
3065             return ldexp(decode_single(valu), 112)
3066         return decode_single(valu | 0x7f800000)

3068              Figure 4: Python Code for a Half-Precision Decoder

3070  Appendix E. Comparison of Other Binary Formats to CBOR's Design
3071              Objectives

3073     The proposal for CBOR follows a history of binary formats that is as
3074     long as the history of computers themselves. Different formats have
3075     had different objectives. In most cases, the objectives of the
3076     format were never stated, although they can sometimes be inferred
3077     from the context where the format was first used. Some formats were
3078     meant to be universally usable, although history has proven that no
3079     binary format meets the needs of all protocols and applications.

3081     CBOR differs from many of these formats in that it started with a set
3082     of objectives and attempts to meet just those.
This section 3083 compares a few of the dozens of formats with CBOR's objectives in 3084 order to help the reader decide if they want to use CBOR or a 3085 different format for a particular protocol or application. 3087 Note that the discussion here is not meant to be a criticism of any 3088 format: to the best of our knowledge, no format before CBOR was meant 3089 to cover CBOR's objectives in the priority we have assigned them. A 3090 brief recap of the objectives from Section 1.1 is: 3092 1. unambiguous encoding of most common data formats from Internet 3093 standards 3095 2. code compactness for encoder or decoder 3097 3. no schema description needed 3099 4. reasonably compact serialization 3101 5. applicability to constrained and unconstrained applications 3103 6. good JSON conversion 3105 7. extensibility 3107 A discussion of CBOR and other formats with respect to a different 3108 set of design objectives is provided in Section 5 and Appendix C of 3109 [RFC8618]. 3111 E.1. ASN.1 DER, BER, and PER 3113 [ASN.1] has many serializations. In the IETF, DER and BER are the 3114 most common. The serialized output is not particularly compact for 3115 many items, and the code needed to decode numeric items can be 3116 complex on a constrained device. 3118 Few (if any) IETF protocols have adopted one of the several variants 3119 of Packed Encoding Rules (PER). There could be many reasons for 3120 this, but one that is commonly stated is that PER makes use of the 3121 schema even for parsing the surface structure of the data stream, 3122 requiring significant tool support. There are different versions of 3123 the ASN.1 schema language in use, which has also hampered adoption. 3125 E.2. MessagePack 3127 [MessagePack] is a concise, widely implemented counted binary 3128 serialization format, similar in many properties to CBOR, although 3129 somewhat less regular. 
While the data model can be used to represent 3130 JSON data, MessagePack has also been used in many remote procedure 3131 call (RPC) applications and for long-term storage of data. 3133 MessagePack has been essentially stable since it was first published 3134 around 2011; it has not yet had a transition. The evolution of 3135 MessagePack is impeded by an imperative to maintain complete 3136 backwards compatibility with existing stored data, while only few 3137 bytecodes are still available for extension. Repeated requests over 3138 the years from the MessagePack user community to separate out binary 3139 and text strings in the encoding recently have led to an extension 3140 proposal that would leave MessagePack's "raw" data ambiguous between 3141 its usages for binary and text data. The extension mechanism for 3142 MessagePack remains unclear. 3144 E.3. BSON 3146 [BSON] is a data format that was developed for the storage of JSON- 3147 like maps (JSON objects) in the MongoDB database. Its major 3148 distinguishing feature is the capability for in-place update, which 3149 prevents a compact representation. BSON uses a counted 3150 representation except for map keys, which are null-byte terminated. 3151 While BSON can be used for the representation of JSON-like objects on 3152 the wire, its specification is dominated by the requirements of the 3153 database application and has become somewhat baroque. The status of 3154 how BSON extensions will be implemented remains unclear. 3156 E.4. MSDTP: RFC 713 3158 Message Services Data Transmission (MSDTP) is a very early example of 3159 a compact message format; it is described in [RFC0713], written in 3160 1976. It is included here for its historical value, not because it 3161 was ever widely used. 3163 E.5. Conciseness on the Wire 3165 While CBOR's design objective of code compactness for encoders and 3166 decoders is a higher priority than its objective of conciseness on 3167 the wire, many people focus on the wire size. 
Table 8 shows some
3168     encoding examples for the simple nested array [1, [2, 3]]; where some
3169     form of indefinite-length encoding is supported by the encoding,
3170     [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.

3172     +-------------+----------------------------+----------------+
3173     | Format      | [1, [2, 3]]                | [_ 1, [2, 3]]  |
3174     +=============+============================+================+
3175     | RFC 713     | c2 05 81 c2 02 82 83       |                |
3176     +-------------+----------------------------+----------------+
3177     | ASN.1 BER   | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 |
3178     |             | 02 02 01 03                | 30 06 02 01 02 |
3179     |             |                            | 02 01 03 00 00 |
3180     +-------------+----------------------------+----------------+
3181     | MessagePack | 92 01 92 02 03             |                |
3182     +-------------+----------------------------+----------------+
3183     | BSON        | 22 00 00 00 10 30 00 01 00 |                |
3184     |             | 00 00 04 31 00 13 00 00 00 |                |
3185     |             | 10 30 00 02 00 00 00 10 31 |                |
3186     |             | 00 03 00 00 00 00 00       |                |
3187     +-------------+----------------------------+----------------+
3188     | CBOR        | 82 01 82 02 03             | 9f 01 82 02 03 |
3189     |             |                            | ff             |
3190     +-------------+----------------------------+----------------+

3192          Table 8: Examples for Different Levels of Conciseness

3194  Appendix F. Changes from RFC 7049

3196     The following is a list of known changes from RFC 7049. This list is
3197     non-authoritative. It is meant to help reviewers see the significant
3198     differences.

3200     *  Made some use of new RFCXML functionality [RFC7991]

3202     *  Updated references, e.g.
for [RFC4627] to [RFC8259] in many 3203 places, for [CNN-TERMS] to [RFC7228]; added missing reference to 3204 [IEEE754] and updated to [ECMA262] 3206 * Fixed errata: in the example in Section 2.4.2 ("29" -> "49"), and 3207 in the last paragraph of Section 3.6 ("0b000_11101" -> 3208 "0b000_11001") 3210 * Added a comment to the last example in Section 3.2.2 (added 3211 "Second value") 3213 * Applied numerous small editorial changes 3215 * Added a few tables for illustration 3217 * More stringently used terminology for well-formed and valid data, 3218 avoiding less well-defined alternative terms such as "syntax 3219 error", "decoding error" and "strict mode" outside examples 3221 * Streamlined terminology to talk about tags, tag numbers, and tag 3222 content 3224 * Clarified the restrictions on tag content, in general and 3225 specifically for tag 1 3227 * Added text about the CBOR data model and its small variations 3228 (basic generic, extended generic, specific) 3230 * More clearly separated integers from floating-point values; 3231 provided a suggestion (based on I-JSON [RFC7493]) for handling 3232 these types when converting JSON to CBOR 3234 * Added term "preferred serialization" and defined it for various 3235 kinds of data items 3237 * Added comment about tags with semantics that depend on 3238 serialization order 3240 * Defined "deterministic encoding", making use of "preferred 3241 serialization", and simplified the suggested map ordering for the 3242 "Core Deterministic Encoding Requirements", easing implementation, 3243 while keeping RFC 7049 map ordering as an alternative "length- 3244 first map key ordering"; now avoiding the terms "canonical" and 3245 "canonicalization" 3247 * Clarified map validity (handling of duplicate keys) and explained 3248 the domain of applicability of certain implementation choices 3250 * Updated IANA considerations 3252 * Added security considerations 3254 * Clarified handling of non-well-formed simple values in text and 3255 
pseudocode 3257 * Added Appendix G, well-formedness errors and examples 3259 * Removed UBJSON from Appendix E, as that format has completely 3260 changed since RFC 7049; added reference to [RFC8618] 3262 Appendix G. Well-formedness errors and examples 3264 There are three basic kinds of well-formedness errors that can occur 3265 in decoding a CBOR data item: 3267 * Too much data: There are input bytes left that were not consumed. 3268 This is only an error if the application assumed that the input 3269 bytes would span exactly one data item. Where the application 3270 uses the self-delimiting nature of CBOR encoding to permit 3271 additional data after the data item, as is for example done in 3272 CBOR sequences [RFC8742], the CBOR decoder can simply indicate 3273 what part of the input has not been consumed. 3275 * Too little data: The input data available would need additional 3276 bytes added at their end for a complete CBOR data item. This may 3277 indicate the input is truncated; it is also a common error when 3278 trying to decode random data as CBOR. For some applications 3279 however, this may not actually be an error, as the application may 3280 not be certain it has all the data yet and can obtain or wait for 3281 additional input bytes. Some of these applications may have an 3282 upper limit for how much additional data can show up; here the 3283 decoder may be able to indicate that the encoded CBOR data item 3284 cannot be completed within this limit. 3286 * Syntax error: The input data are not consistent with the 3287 requirements of the CBOR encoding, and this cannot be remedied by 3288 adding (or removing) data at the end. 3290 In Appendix C, errors of the first kind are addressed in the first 3291 paragraph/bullet list (requiring "no bytes are left"), and errors of 3292 the second kind are addressed in the second paragraph/bullet list 3293 (failing "if n bytes are no longer available"). 
Errors of the third
3294     kind are identified in the pseudocode by specific instances of
3295     calling fail(), in order:

3297     *  a reserved value is used for additional information (28, 29, 30)

3299     *  major type 7, additional information 24, value < 32 (incorrect or
3300        incorrectly encoded simple type)

3302     *  incorrect substructure of indefinite length byte/text string (may
3303        only contain definite length strings of the same major type)

3305     *  break stop code (mt=7, ai=31) occurs in a value position of a map,
3306        or anywhere other than directly inside an indefinite length item
3307        at a position where another enclosed data item could also occur

3309     *  additional information 31 used with major type 0, 1, or 6

3311  G.1. Examples for CBOR data items that are not well-formed

3313     This subsection shows a few examples for CBOR data items that are not
3314     well-formed. Each example is a sequence of bytes, each shown in
3315     hexadecimal; multiple examples in a list are separated by commas.

3317     Examples for well-formedness error kind 1 (too much data) can easily
3318     be formed by adding data to a well-formed encoded CBOR data item.

3320     Similarly, examples for well-formedness error kind 2 (too little
3321     data) can be formed by truncating a well-formed encoded CBOR data
3322     item. In test suites, it may be beneficial to specifically test with
3323     incomplete data items that would require large amounts of additional
3324     data to be completed (for instance by starting the encoding of a
3325     string of a very large size).

3327     A premature end of the input can occur in a head or within the
3328     enclosed data, which may be bare strings or enclosed data items that
3329     are either counted or should have been ended by a break stop code.
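
   The three kinds of well-formedness error can be told apart
   mechanically. The following is a minimal sketch, not part of the
   specification, that transcribes the Figure 1 pseudocode into Python
   and reports which kind of error (if any) a byte sequence exhibits.
   The names classify, TooLittleData, NotWellFormed, and BREAK are
   illustrative assumptions; only take, well_formed, and
   well_formed_indefinite follow the pseudocode in Appendix C.

```python
# Sketch only: a Python transcription of the Figure 1 pseudocode that also
# reports the kind of well-formedness error. The names classify, BREAK,
# TooLittleData, and NotWellFormed are assumptions made for illustration.

BREAK = -1  # signals a "break" stop code, like the -1 in Figure 1


class TooLittleData(Exception):
    """Error kind 2: the input ended inside a data item."""


class NotWellFormed(Exception):
    """Error kind 3: a syntax error that more input cannot repair."""


def classify(data: bytes) -> str:
    pos = 0  # read position within data

    def take(n):
        # Consume n bytes; fail if fewer than n bytes remain.
        nonlocal pos
        if n > len(data) - pos:
            raise TooLittleData()
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    def well_formed(breakable=False):
        ib = take(1)[0]                      # initial byte
        mt, ai = ib >> 5, ib & 0x1F          # major type, additional info
        val = ai
        if 24 <= ai <= 27:                   # 1-, 2-, 4-, or 8-byte argument
            val = int.from_bytes(take(1 << (ai - 24)), "big")
        elif ai in (28, 29, 30):
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return well_formed_indefinite(mt, breakable)
        if mt in (2, 3):                     # byte string / text string
            take(val)
        elif mt == 4:                        # array of val items
            for _ in range(val):
                well_formed()
        elif mt == 5:                        # map of val key/value pairs
            for _ in range(val * 2):
                well_formed()
        elif mt == 6:                        # tag: one enclosed data item
            well_formed()
        elif mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("bad two-byte simple value")
        return mt

    def well_formed_indefinite(mt, breakable):
        if mt in (2, 3):
            while True:
                it = well_formed(True)
                if it == BREAK:
                    break
                if it != mt:                 # chunk must be a definite
                    raise NotWellFormed()    # length string of same type
        elif mt == 4:
            while well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while well_formed(True) != BREAK:
                well_formed()                # value may not be a break
        elif mt == 7:
            if breakable:
                return BREAK
            raise NotWellFormed("break outside indefinite length item")
        else:
            raise NotWellFormed("ai 31 with major type 0, 1, or 6")
        return 0

    try:
        well_formed()
    except TooLittleData:
        return "too little data"
    except NotWellFormed:
        return "syntax error"
    return "too much data" if pos < len(data) else "well-formed"
```

   Under these assumptions, an example from the lists in this
   subsection can be checked with a call such as
   classify(bytes.fromhex("5f4100")). Whether leftover input counts as
   an error depends on the application, as discussed for error kind 1;
   the sketch simply reports it.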
3331     *  End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02
3332        03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa
3333        00 00, fb 00 00 00

3335     *  Definite length strings with short data: 41, 61, 5a ff ff ff ff
3336        00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
3337        ff ff ff ff ff ff ff 01 02 03

3339     *  Definite length maps and arrays not closed with enough items: 81,
3340        81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
3341        00

3343     *  Tag number not followed by tag content: c0

3345     *  Indefinite length strings not closed by a break stop code: 5f 41
3346        00, 7f 61 00

3348     *  Indefinite length maps and arrays not closed by a break stop code:
3349        9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f
3350        ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff

3352     A few examples for the five subkinds of well-formedness error kind 3
3353     (syntax error) are shown below.

3355     Subkind 1:

3357     *  Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
3358        5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
3359        fd, fe

3361     Subkind 2:

3363     *  Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18,
3364        f8 1f

3366     Subkind 3:

3368     *  Indefinite length string chunks not of the correct type: 5f 00 ff,
3369        5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff,
3370        7f 41 00 ff

3372     *  Indefinite length string chunks not definite length: 5f 5f 41 00
3373        ff ff, 7f 7f 61 00 ff ff

3375     Subkind 4:

3377     *  Break occurring on its own outside of an indefinite length item:
3378        ff

3380     *  Break occurring in a definite length array or map or a tag: 81 ff,
3381        82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82
3382        9f 81 9f 9f ff ff ff ff

3384     *  Break in indefinite length map would lead to odd number of items
3385        (break in a value position): bf 00 ff, bf 00 00 00 ff

3387     Subkind 5:

3389     *  Major type 0, 1, 6 with additional information 31: 1f, 3f, df

3391  Acknowledgements

3393
CBOR was inspired by MessagePack. MessagePack was developed and 3394 promoted by Sadayuki Furuhashi ("frsyuki"). This reference to 3395 MessagePack is solely for attribution; CBOR is not intended as a 3396 version of or replacement for MessagePack, as it has different design 3397 goals and requirements. 3399 The need for functionality beyond the original MessagePack 3400 Specification became obvious to many people at about the same time 3401 around the year 2012. BinaryPack is a minor derivation of 3402 MessagePack that was developed by Eric Zhang for the binaryjs 3403 project. A similar, but different, extension was made by Tim Caswell 3404 for his msgpack-js and msgpack-js-browser projects. Many people have 3405 contributed to the discussion about extending MessagePack to separate 3406 text string representation from byte string representation. 3408 The encoding of the additional information in CBOR was inspired by 3409 the encoding of length information designed by Klaus Hartke for CoAP. 3411 This document also incorporates suggestions made by many people, 3412 notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand, 3413 Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael 3414 Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray 3415 Polk, Tim Bray, Tony Finch, Tony Hansen, and Yaron Sheffer. 3417 Authors' Addresses 3419 Carsten Bormann 3420 Universitaet Bremen TZI 3421 Postfach 330440 3422 D-28359 Bremen 3423 Germany 3425 Phone: +49-421-218-63921 3426 Email: cabo@tzi.org 3428 Paul Hoffman 3429 ICANN 3431 Email: paul.hoffman@icann.org