Network Working Group                                    P. Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended status: Informational                             March 8, 2016
Expires: September 9, 2016

                                 Title
                      draft-hallambaker-jsonbcd-05

Abstract

   Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C,
   JSON-D

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts. The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Abstract

   Three binary encodings for JavaScript Object Notation (JSON) are
   presented. JSON-B (Binary) is a strict superset of the JSON encoding
   that permits efficient binary encoding of intrinsic JavaScript data
   types. JSON-C (Compact) is a strict superset of JSON-B that supports
   compact representation of repeated data strings with short numeric
   codes. JSON-D (Data) supports additional binary data types for
   integer and floating point representations for use in scientific
   applications where conversion between binary and decimal
   representations would cause a loss of precision.

2. Definitions

2.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3. Introduction

   JavaScript Object Notation (JSON) is a simple text encoding for the
   JavaScript data model that has found wide application beyond its
   original field of use. In particular, JSON has rapidly become a
   preferred encoding for Web Services.

   JSON encoding supports just four fundamental data types (integer,
   floating point, string and boolean), together with arrays and
   objects, where an object consists of a list of tag-value pairs.

   Although the JSON encoding is sufficient for many purposes, it is not
   always efficient. In particular, there is no efficient representation
   for blocks of binary data. Use of base64 encoding increases data
   volume by 33%. This overhead compounds with each level of nesting,
   which makes JSON encoding unsatisfactory in cryptographic
   applications, where nested binary structures are frequently required.

   Another source of inefficiency in JSON encoding is the repeated
   occurrence of object tags. A JSON encoding containing an array of a
   hundred objects such as {"first":1,"second":2} will contain a hundred
   occurrences of the string "first" (seven bytes) and a hundred
   occurrences of the string "second" (eight bytes). Using two byte
   code sequences in place of strings allows a saving of 11 bytes per
   object without loss of information, a saving of 50%.

   A third objection to the use of JSON encoding is that floating point
   numbers can only be represented in decimal form, and converting
   between binary and decimal representations can involve a loss of
   precision. While such issues are rarely important in network
   applications, they can be critical in scientific applications.
   It is not acceptable for saving and restoring a data set to change
   the result of a calculation.

3.1. Objectives

   The following were identified as core objectives for a binary JSON
   encoding:

   o  Low overhead encoding and decoding

   o  Easy to convert existing encoders and decoders to add binary
      support

   o  Efficient encoding of binary data

   o  Ability to convert from JSON to binary encoding in a streaming
      mode (i.e. without reading the entire binary data block before
      beginning encoding)

   o  Lossless encoding of JavaScript data types

   o  Support for JSON tag compression and extended data types is
      considered desirable but not essential for typical network
      applications.

   Three binary encodings are defined:

   JSON-B (Binary)

      Simply encodes JSON data in binary. Only the JavaScript data model
      is supported (i.e. atomic types are integers, double or string).
      Integers may be 8, 16, 32 or 64 bits, either signed or unsigned.
      Floating point values use the IEEE 754 binary64 format [IEEE-754].
      Supports chunked encoding for binary and UTF-8 string types.

   JSON-C (Compact)

      As JSON-B but with support for representing JSON tags in numeric
      code form (16 bit code space). This is done both for compact
      encoding and to allow simplification of encoders/decoders in
      constrained environments. Codes may be defined inline or by
      reference to a known dictionary of codes referenced via a digest
      value.

   JSON-D (Data)

      As JSON-C but with support for representing additional data types
      without loss of precision. In particular, it adds other IEEE 754
      floating point formats, both binary and decimal, and Intel's 80
      bit floating point format, plus 128 bit integers and bignum
      integers.

4. Extended JSON Grammar

   The JSON-B, JSON-C and JSON-D encodings are all based on the JSON
   grammar [RFC4627] using the same syntactic structure but different
   lexical encodings.

   JSON-B0 and JSON-C0 replace the JSON lexical encodings for strings
   and numbers with binary encodings. JSON-B1 and JSON-C1 allow either
   lexical encoding to be used. Thus any valid JSON encoding is a valid
   JSON-B1 or JSON-C1 encoding.

   The grammar of JSON-B, JSON-C and JSON-D is a superset of the JSON
   grammar. The following productions are added to the grammar:

   x-value

      Binary encodings for data values. As the binary value encodings
      are all self-delimiting, no value-separator is required after an
      x-value.

   x-member

      An object member where the value is specified as an x-value and
      thus does not require a value-separator.

   b-value

      Binary data encodings defined in JSON-B.

   b-string

      Defined length string encoding defined in JSON-B.

   c-def

      Tag code definition defined in JSON-C. These may only appear
      before the beginning of an Object or Array and before any
      preceding white space.

   c-tag

      Tag code value defined in JSON-C.

   d-value

      Additional binary data encodings defined in JSON-D for use in
      scientific data applications.

   The JSON grammar is modified to permit the use of x-value productions
   in place of ( value value-separator ):

   JSON-text = (object / array)

   object = *c-def begin-object [
            *( member value-separator / x-member )
            (member / x-member) ] end-object

   member = tag value
   x-member = tag x-value

   tag = string name-separator / b-string / c-tag

   array = *c-def begin-array [ *( value value-separator / x-value )
           (value / x-value) ] end-array

   x-value = b-value / d-value

   value = false / null / true / object / array / number / string

   name-separator = ws %x3A ws ; : colon
   value-separator = ws %x2C ws ; , comma

   The following lexical values are unchanged:

   begin-array = ws %x5B ws ; [ left square bracket
   begin-object = ws %x7B ws ; { left curly bracket
   end-array = ws %x5D ws ; ] right square bracket
   end-object = ws %x7D ws ; } right curly bracket

   ws = *( %x20 / %x09 / %x0A / %x0D )

   false = %x66.61.6c.73.65 ; false
   null = %x6e.75.6c.6c ; null
   true = %x74.72.75.65 ; true

   The productions number and string are defined as before:

   number = [ minus ] int [ frac ] [ exp ]
   decimal-point = %x2E ; .
   digit1-9 = %x31-39 ; 1-9
   e = %x65 / %x45 ; e E
   exp = e [ minus / plus ] 1*DIGIT
   frac = decimal-point 1*DIGIT
   int = zero / ( digit1-9 *DIGIT )
   minus = %x2D ; -
   plus = %x2B ; +
   zero = %x30 ; 0

   string = quotation-mark *char quotation-mark
   char = unescaped /
          escape ( %x22 / %x5C / %x2F / %x62 / %x66 /
                   %x6E / %x72 / %x74 / %x75 4HEXDIG )

   escape = %x5C ; \
   quotation-mark = %x22 ; "
   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

5. JSON-B

   The JSON-B encoding defines the b-value and b-string productions:

   b-value = b-atom / b-string / b-data / b-integer / b-float

   b-string = *( string-chunk ) string-term
   b-data = *( data-chunk ) data-term

   b-integer = p-int8 / p-int16 / p-int32 / p-int64 / p-bignum16 /
               n-int8 / n-int16 / n-int32 / n-int64 / n-bignum16

   b-float = binary64

   The lexical encodings of the productions are defined in the following
   table, where the column 'Tag' specifies the byte code that begins the
   production, 'Fixed' specifies the number of data bytes that follow
   and 'Length' specifies the number of bytes used to define the length
   of a variable length field following the data bytes:

   +--------------+-----+-------+--------+-----------------------------+
   | Production   | Tag | Fixed | Length | Data Description            |
   +--------------+-----+-------+--------+-----------------------------+
   | string-term  | x80 | -     | 1      | Terminal String 8 bit       |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-term  | x81 | -     | 2      | Terminal String 16 bit      |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-term  | x82 | -     | 4      | Terminal String 32 bit      |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-term  | x83 | -     | 8      | Terminal String 64 bit      |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-chunk | x84 | -     | 1      | Non-Terminal String 8 bit   |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-chunk | x85 | -     | 2      | Non-Terminal String 16 bit  |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-chunk | x86 | -     | 4      | Non-Terminal String 32 bit  |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | string-chunk | x87 | -     | 8      | Non-Terminal String 64 bit  |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | data-term    | x88 | -     | 1      | Terminal Data 8 bit length  |
   |              |     |       |        |                             |
   | data-term    | x89 | -     | 2      | Terminal Data 16 bit length |
   |              |     |       |        |                             |
   | data-term    | x8A | -     | 4      | Terminal Data 32 bit length |
   |              |     |       |        |                             |
   | data-term    | x8B | -     | 8      | Terminal Data 64 bit length |
   |              |     |       |        |                             |
   | data-chunk   | x8C | -     | 1      | Non-Terminal Data 8 bit     |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | data-chunk   | x8D | -     | 2      | Non-Terminal Data 16 bit    |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | data-chunk   | x8E | -     | 4      | Non-Terminal Data 32 bit    |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | data-chunk   | x8F | -     | 8      | Non-Terminal Data 64 bit    |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | p-int8       | xA0 | 1     | -      | Positive 8 bit Integer      |
   |              |     |       |        |                             |
   | p-int16      | xA1 | 2     | -      | Positive 16 bit Integer     |
   |              |     |       |        |                             |
   | p-int32      | xA2 | 4     | -      | Positive 32 bit Integer     |
   |              |     |       |        |                             |
   | p-int64      | xA3 | 8     | -      | Positive 64 bit Integer     |
   |              |     |       |        |                             |
   | p-bignum16   | xA5 | -     | 2      | Positive Bignum 16 bit      |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | n-int8       | xA8 | 1     | -      | Negative 8 bit Integer      |
   |              |     |       |        |                             |
   | n-int16      | xA9 | 2     | -      | Negative 16 bit Integer     |
   |              |     |       |        |                             |
   | n-int32      | xAA | 4     | -      | Negative 32 bit Integer     |
   |              |     |       |        |                             |
   | n-int64      | xAB | 8     | -      | Negative 64 bit Integer     |
   |              |     |       |        |                             |
   | n-bignum16   | xAD | -     | 2      | Negative Bignum 16 bit      |
   |              |     |       |        | length                      |
   |              |     |       |        |                             |
   | binary64     | x92 | 8     | -      | IEEE 754 Floating Point     |
   |              |     |       |        | binary64                    |
   |              |     |       |        |                             |
   | b-value      | xB0 | -     | -      | True                        |
   |              |     |       |        |                             |
   | b-value      | xB1 | -     | -      | False                       |
   |              |     |       |        |                             |
   | b-value      | xB2 | -     | -      | Null                        |
   +--------------+-----+-------+--------+-----------------------------+

   A data type commonly used in networking that is not defined in this
   scheme is a datetime representation. To represent such a value, a
   string containing a date-time value in Internet type format is
   typically used.

5.1. JSON-B Examples

   The following examples illustrate the JSON-B encoding:

   A0 2A                          42 (as 8 bit integer)
   A1 00 2A                       42 (as 16 bit integer)
   A2 00 00 00 2A                 42 (as 32 bit integer)
   A3 00 00 00 00 00 00 00 2A     42 (as 64 bit integer)
   A5 00 01 2A                    42 (as Bignum)

   80 05 48 65 6c 6c 6f           "Hello" (single chunk)
   81 00 05 48 65 6c 6c 6f        "Hello" (single chunk)
   84 05 48 65 6c 6c 6f 80 00     "Hello" (as two chunks)

   92 3f f0 00 00 00 00 00 00     1.0
   92 40 24 00 00 00 00 00 00     10.0
   92 40 09 21 fb 54 44 2e ea     3.14159265359
   92 bf f0 00 00 00 00 00 00     -1.0

   B0                             true
   B1                             false
   B2                             null

6. JSON-C

   JSON-C (Compact) permits numeric code values to be substituted for
   strings and binary data. Tag codes MAY be 8, 16 or 32 bits long
   encoded in network byte order.

   Tag codes MUST be defined before they are referenced. A Tag code MAY
   be defined before the corresponding data or string value is used or
   at the same time that it is used.

   A dictionary is a list of tag code definitions. An encoding MAY
   incorporate definitions from a dictionary using the dict-hash
   production. The dict-hash production specifies a (positive) offset
   value to be added to the entries in the dictionary followed by the
   UDF fingerprint [draft-hallambaker-udf] of the dictionary to be used.
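
   The define-before-use rule for tag codes can be illustrated with a
   minimal Python sketch. It is not a reference implementation. It
   assumes that a tag definition is laid out as the tag byte, the code
   in network byte order and then a JSON-B terminal string, which is
   the layout suggested by the examples in Section 6.1. The names
   TagEncoder, encode_b_string and first_code are hypothetical, and the
   choice to hand out codes sequentially from 0x20 is an assumption
   made only so that the output can be compared with those examples.

```python
import struct

# Tag bytes taken from the JSON-B and JSON-C tables in this document.
STRING_TERM = 0x80  # terminal string chunk, 8 bit length (x81..x83 widen it)
C_TAG = 0xC0        # reference to an already defined tag code
C_TAG_DEF = 0xC8    # tag code used and defined at the same time

# (tag offset, struct format, exclusive upper bound); ">" = network byte order.
WIDTHS = (
    (0, ">B", 1 << 8),
    (1, ">H", 1 << 16),
    (2, ">I", 1 << 32),
    (3, ">Q", 1 << 64),
)


def _prefixed(base_tag: int, value: int, widths) -> bytes:
    """Emit base_tag adjusted for the narrowest width that holds value."""
    for offset, fmt, limit in widths:
        if 0 <= value < limit:
            return bytes([base_tag + offset]) + struct.pack(fmt, value)
    raise ValueError(f"value out of range: {value}")


def encode_b_string(text: str) -> bytes:
    """Encode text as a single terminal string chunk (no chunking)."""
    data = text.encode("utf-8")
    return _prefixed(STRING_TERM, len(data), WIDTHS) + data


class TagEncoder:
    """Hypothetical helper that assigns tag codes in order of first use."""

    def __init__(self, first_code: int = 0x20) -> None:
        self.codes = {}              # tag string -> assigned code
        self.next_code = first_code  # assumption: the start value is arbitrary

    def tag(self, name: str) -> bytes:
        code = self.codes.get(name)
        if code is not None:
            # Defined earlier in the stream: emit only the short reference.
            return _prefixed(C_TAG, code, WIDTHS[:3])
        code = self.next_code
        self.next_code += 1
        self.codes[name] = code
        # First use: define the code and use it in a single step.
        return _prefixed(C_TAG_DEF, code, WIDTHS[:3]) + encode_b_string(name)


enc = TagEncoder()
first = enc.tag("Hello")
again = enc.tag("Hello")
print(first.hex(" "))  # c8 20 80 05 48 65 6c 6c 6f
print(again.hex(" "))  # c0 20
```

   Because the encoder records every code at the moment it emits the
   definition, a reference can never precede its definition in the
   output. Tag codes are limited to the first three widths because the
   table defines only 8, 16 and 32 bit tag codes.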

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | c-tag      | xC0 | 1     | -      | 8 bit tag code                |
   |            |     |       |        |                               |
   | c-tag      | xC1 | 2     | -      | 16 bit tag code               |
   |            |     |       |        |                               |
   | c-tag      | xC2 | 4     | -      | 32 bit tag code               |
   |            |     |       |        |                               |
   | c-def      | xC4 | 1     | -      | 8 bit tag definition          |
   |            |     |       |        |                               |
   | c-def      | xC5 | 2     | -      | 16 bit tag definition         |
   |            |     |       |        |                               |
   | c-def      | xC6 | 4     | -      | 32 bit tag definition         |
   |            |     |       |        |                               |
   | c-tag      | xC8 | 1     | -      | 8 bit tag code & definition   |
   |            |     |       |        |                               |
   | c-tag      | xC9 | 2     | -      | 16 bit tag code & definition  |
   |            |     |       |        |                               |
   | c-tag      | xCA | 4     | -      | 32 bit tag code & definition  |
   |            |     |       |        |                               |
   | c-def      | xCC | 1     | -      | 8 bit tag dictionary          |
   |            |     |       |        | definition                    |
   |            |     |       |        |                               |
   | c-def      | xCD | 2     | -      | 16 bit tag dictionary         |
   |            |     |       |        | definition                    |
   |            |     |       |        |                               |
   | c-def      | xCE | 4     | -      | 32 bit tag dictionary         |
   |            |     |       |        | definition                    |
   |            |     |       |        |                               |
   | dict-hash  | xD0 | 4     | 1      | UDF fingerprint of dictionary |
   +------------+-----+-------+--------+-------------------------------+

   All integer values are encoded in Network Byte Order (most
   significant byte first).

6.1. JSON-C Examples

   The following examples illustrate the JSON-C encoding:

   C8 20 80 05 48 65 6c 6c 6f     "Hello" 20 = "Hello"
   C4 21 80 05 48 65 6c 6c 6f     21 = "Hello"
   C0 20                          "Hello"
   C1 00 20                       "Hello"

   D0 00 00 01 00 20              Insert dictionary at code 256
   e3 b0 c4 42 98 fc 1c 14
   9a fb f4 c8 99 6f b9 24
   27 ae 41 e4 64 9b 93 4c
   a4 95 99 1b 78 52 b8 55        UDF (C4 21 80 05 48 65 6c 6c 6f)

7. JSON-D (Data)

   JSON-B and JSON-C only support the two numeric types defined in the
   JavaScript data model: Integers and 64 bit floating point values.
   JSON-D (Data) defines binary encodings for additional data types that
   are commonly used in scientific applications. These comprise
   positive and negative 128 bit integers, six additional floating point
   representations defined by IEEE 754 [IEEE-754] and the Intel extended
   precision 80 bit floating point representation.

   Should the need arise, even bigger bignums could be defined with the
   length specified as a 32 bit value, permitting bignums of up to 2^35
   bits to be represented.

   d-value = d-integer / d-float

   d-integer = p-int128 / n-int128

   d-float = binary16 / binary32 / binary128 / intel80 /
             decimal32 / decimal64 / decimal128

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | p-int128   | xA4 | 16    | -      | Positive 128 bit Integer      |
   |            |     |       |        |                               |
   | n-int128   | xAC | 16    | -      | Negative 128 bit Integer      |
   |            |     |       |        |                               |
   | binary16   | x90 | 2     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | binary16                      |
   |            |     |       |        |                               |
   | binary32   | x91 | 4     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | binary32                      |
   |            |     |       |        |                               |
   | binary128  | x94 | 16    | -      | IEEE 754 Floating Point       |
   |            |     |       |        | binary128                     |
   |            |     |       |        |                               |
   | intel80    | x95 | 10    | -      | Intel 80 bit extended binary  |
   |            |     |       |        | Floating Point                |
   |            |     |       |        |                               |
   | decimal32  | x96 | 4     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | decimal32                     |
   |            |     |       |        |                               |
   | decimal64  | x97 | 8     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | decimal64                     |
   |            |     |       |        |                               |
   | decimal128 | x98 | 16    | -      | IEEE 754 Floating Point       |
   |            |     |       |        | decimal128                    |
   +------------+-----+-------+--------+-------------------------------+

8. Acknowledgements

   This work was assisted by conversations with Nico Williams and other
   participants on the applications area mailing list.

9. Security Considerations

   A correctly implemented data encoding mechanism should not introduce
   new security vulnerabilities. However, experience demonstrates that
   some data encoding approaches are more prone to introduce
   vulnerabilities when incorrectly implemented than others.

   In particular, whenever variable length data formats are used, the
   possibility of a buffer overrun vulnerability is introduced. While
   best practice suggests that a coding language with native mechanisms
   for bounds checking is the best protection against such errors, such
   approaches are not always followed. While such vulnerabilities are
   most commonly seen in the design of decoders, it is possible for the
   same vulnerabilities to be exploited in encoders.

   A common source of such errors is the case where nested length
   encodings are used. For example, a decoder relies on an outermost
   length encoding that specifies a length of 50 bytes to allocate
   memory for the entire result and then attempts to copy a string with
   a declared length of 1000 bytes within the sequence.

   The extensions to the JSON encoding described in this document are
   designed to avoid such errors. Length encodings are only used to
   define the length of x-value constructions, which are always terminal
   and cannot have nested data entries.

10. IANA Considerations

   [TBS list out all the code points that require an IANA registration]

11. Normative References

   [IEEE-754]
              "[Reference Not Found!]".

   [draft-hallambaker-udf]
              "[Reference Not Found!]".

Author's Address

   Phillip Hallam-Baker
   Comodo Group Inc.

   Email: philliph@comodo.com