Network Working Group                                    P. Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended status: Informational                             March 8, 2016
Expires: September 9, 2016


   Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C,
                                 JSON-D
                      draft-hallambaker-jsonbcd-04

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Abstract

   Three binary encodings for JavaScript Object Notation (JSON) are
   presented.  JSON-B (Binary) is a strict superset of the JSON encoding
   that permits efficient binary encoding of intrinsic JavaScript data
   types.  JSON-C (Compact) is a strict superset of JSON-B that supports
   compact representation of repeated data strings with short numeric
   codes.  JSON-D (Data) supports additional binary data types for
   integer and floating point representations for use in scientific
   applications where conversion between binary and decimal
   representations would cause a loss of precision.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Introduction

   JavaScript Object Notation (JSON) is a simple text encoding for the
   JavaScript data model that has found wide application beyond its
   original field of use.  In particular, JSON has rapidly become a
   preferred encoding for Web Services.

   JSON encoding supports just four fundamental data types (integer,
   floating point, string and boolean), plus arrays and objects, which
   consist of a list of tag-value pairs.

   Although the JSON encoding is sufficient for many purposes, it is not
   always efficient.  In particular, there is no efficient
   representation for blocks of binary data.  Use of base64 encoding
   increases data volume by 33%.  This overhead grows exponentially in
   applications where nested binary encodings are required, since each
   level of nesting multiplies the size by a further 4/3.  This makes
   JSON encoding unsatisfactory in cryptographic applications, where
   nested binary structures are frequently required.

   Another source of inefficiency in JSON encoding is the repeated
   occurrence of object tags.  A JSON encoding containing an array of a
   hundred objects such as {"first":1,"second":2} will contain a hundred
   occurrences of the string "first" (seven bytes) and a hundred
   occurrences of the string "second" (eight bytes).  Using two byte
   code sequences in place of these strings saves 11 bytes per object
   without loss of information, halving the 22 byte encoding of each
   object.

   A third objection to the use of JSON encoding is that floating point
   numbers can only be represented in decimal form, and this necessarily
   involves a loss of precision when converting between binary and
   decimal representations.  While such issues are rarely important in
   network applications, they can be critical in scientific applications.
   It is not acceptable for saving and restoring a data set to change
   the result of a calculation.

3.1.  Objectives

   The following were identified as core objectives for a binary JSON
   encoding:

   o  Low overhead encoding and decoding

   o  Easy to convert existing encoders and decoders to add binary
      support

   o  Efficient encoding of binary data

   o  Ability to convert from JSON to binary encoding in a streaming
      mode (i.e. without reading the entire binary data block before
      beginning encoding)

   o  Lossless encoding of JavaScript data types

   The ability to support JSON tag compression and extended data types
   is considered desirable but not essential for typical network
   applications.

   Three binary encodings are defined:

   JSON-B (Binary)

      Simply encodes JSON data in binary.  Only the JavaScript data
      model is supported (i.e. atomic types are integers, doubles or
      strings).  Integers may be 8, 16, 32 or 64 bits, either signed or
      unsigned.  Floating point values use the IEEE 754 binary64 format
      [IEEE-754].  Chunked encoding is supported for binary data and
      UTF-8 strings.

   JSON-C (Compact)

      As JSON-B, but with support for representing JSON tags in numeric
      code form (16 bit code space).  This is done both for compact
      encoding and to allow simplification of encoders and decoders in
      constrained environments.  Codes may be defined inline or by
      reference to a known dictionary of codes referenced via a digest
      value.

   JSON-D (Data)

      As JSON-C, but with support for representing additional data
      types without loss of precision: in particular, the other IEEE 754
      floating point formats, both binary and decimal, Intel's 80 bit
      floating point format, plus 128 bit integers and bignum integers.

4.  Extended JSON Grammar

   The JSON-B, JSON-C and JSON-D encodings are all based on the JSON
   grammar [RFC4627], using the same syntactic structure but different
   lexical encodings.
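
   To make the "same syntactic structure, different lexical encodings"
   point concrete, the following Python sketch (not part of this
   specification) encodes a value tree using the JSON-B tags from the
   table in Section 5.  Objects and arrays keep their JSON brackets;
   member tags become b-strings, which need no name-separator; scalars
   become self-delimiting x-values, so only nested objects and arrays
   still need a value-separator.  Assumptions of the sketch: every
   string and binary block is written as a single terminal chunk, the
   smallest integer width is chosen, and negative integers are rejected
   because the sketch does not assume a data layout for the n-int forms.
   The function names are hypothetical.

```python
import struct


def _length_prefixed(base_tag: int, data: bytes) -> bytes:
    """Single terminal chunk: base_tag + 0..3 selects a 1, 2, 4 or 8 byte length."""
    for offset, width in enumerate((1, 2, 4, 8)):
        if len(data) < 1 << (8 * width):
            return bytes([base_tag + offset]) + len(data).to_bytes(width, "big") + data
    raise ValueError("data too long for a single chunk")


def _b_string(text: str) -> bytes:
    return _length_prefixed(0x80, text.encode("utf-8"))  # string-term x80..x83


def _b_integer(value: int) -> bytes:
    if value < 0:
        # The n-int forms exist, but this sketch does not assume their layout.
        raise NotImplementedError("negative integers are not handled in this sketch")
    for tag, width in ((0xA0, 1), (0xA1, 2), (0xA2, 4), (0xA3, 8)):
        if value < 1 << (8 * width):
            return bytes([tag]) + value.to_bytes(width, "big")
    magnitude = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return b"\xa5" + len(magnitude).to_bytes(2, "big") + magnitude  # p-bignum16


def _is_container(value) -> bool:
    return isinstance(value, (list, dict))


def _join(items: list) -> bytes:
    """items are (encoded, is_container) pairs. Containers are plain 'value'
    productions and need a value-separator unless last; x-values need none."""
    out = bytearray()
    for i, (encoded, is_container) in enumerate(items):
        out += encoded
        if is_container and i < len(items) - 1:
            out += b","
    return bytes(out)


def encode(value) -> bytes:
    """Encode a Python value tree as JSON-B (hypothetical helper)."""
    if value is True:
        return b"\xb0"
    if value is False:
        return b"\xb1"
    if value is None:
        return b"\xb2"
    if isinstance(value, int):
        return _b_integer(value)
    if isinstance(value, float):
        return b"\x92" + struct.pack(">d", value)  # binary64, network byte order
    if isinstance(value, str):
        return _b_string(value)
    if isinstance(value, bytes):
        return _length_prefixed(0x88, value)  # data-term x88..x8B
    if isinstance(value, list):
        return b"[" + _join([(encode(v), _is_container(v)) for v in value]) + b"]"
    if isinstance(value, dict):
        # A b-string tag needs no name-separator (tag = ... / b-string / ...).
        members = [(_b_string(k) + encode(v), _is_container(v)) for k, v in value.items()]
        return b"{" + _join(members) + b"}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

   For the object {"first":1,"second":2} used in the Introduction, this
   sketch produces 21 bytes against 22 bytes of JSON text; the larger
   saving described there comes from replacing the tags with JSON-C
   tag codes.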

   JSON-B0 and JSON-C0 replace the JSON lexical encodings for strings
   and numbers with binary encodings.  JSON-B1 and JSON-C1 allow either
   lexical encoding to be used.  Thus any valid JSON encoding is a valid
   JSON-B1 or JSON-C1 encoding.

   The grammar of JSON-B, JSON-C and JSON-D is a superset of the JSON
   grammar.  The following productions are added to the grammar:

   x-value

      Binary encodings for data values.  As the binary value encodings
      are all self-delimiting, they do not require a value-separator.

   x-member

      An object member where the value is specified as an x-value and
      thus does not require a value-separator.

   b-value

      Binary data encodings defined in JSON-B.

   b-string

      Defined length string encoding defined in JSON-B.

   c-def

      Tag code definition defined in JSON-C.  These may only appear
      before the beginning of an Object or Array and before any
      preceding white space.

   c-tag

      Tag code value defined in JSON-C.

   d-value

      Additional binary data encodings defined in JSON-D for use in
      scientific data applications.
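
   The self-delimiting property of the x-value productions can be
   illustrated with a short sketch.  Each binary item begins with a tag
   byte that fixes either its size or the width of its length field, so
   a decoder can find the end of the item without examining its
   contents.  Every JSON-B tag is 0x80 or above, whereas JSON structural
   characters, literals and numbers are ASCII, so the lead byte at a
   token boundary also tells the decoder which lexical encoding
   follows.  The function name item_length and the choice to reject
   tags outside the JSON-B table are assumptions of this sketch; JSON-C
   and JSON-D tags would be added in the same way.

```python
# Fixed-size JSON-B items: tag -> number of data bytes after the tag.
FIXED_SIZE = {
    0xA0: 1, 0xA1: 2, 0xA2: 4, 0xA3: 8,  # p-int8 .. p-int64
    0xA8: 1, 0xA9: 2, 0xAA: 4, 0xAB: 8,  # n-int8 .. n-int64
    0x92: 8,                             # binary64
    0xB0: 0, 0xB1: 0, 0xB2: 0,           # true, false, null
}


def item_length(buf: bytes, pos: int) -> int:
    """Number of bytes, tag included, of the binary item starting at buf[pos]."""
    tag = buf[pos]
    if tag < 0x80:
        raise ValueError("JSON text byte, not a binary item")
    if tag <= 0x8F:
        # x80..x8F: string/data, terminal/chunk; the low two bits select
        # a 1, 2, 4 or 8 byte length field.
        width = (1, 2, 4, 8)[tag & 0x03]
        total = 1 + width + int.from_bytes(buf[pos + 1:pos + 1 + width], "big")
    elif tag in (0xA5, 0xAD):
        # p-bignum16 / n-bignum16: 2 byte length field.
        total = 3 + int.from_bytes(buf[pos + 1:pos + 3], "big")
    elif tag in FIXED_SIZE:
        total = 1 + FIXED_SIZE[tag]
    else:
        raise ValueError(f"tag 0x{tag:02X} is not in the JSON-B table")
    # Bounds check before any caller copies the item (see Security Considerations).
    if pos + total > len(buf):
        raise ValueError("declared length runs past the end of the buffer")
    return total
```

   Applied to the two-chunk "Hello" example in Section 5.1, the sketch
   reports 7 bytes for the first chunk and 2 bytes for the empty
   terminal chunk that follows it.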

   The JSON grammar is modified to permit the use of x-value productions
   in place of ( value value-separator ):

     JSON-text = (object / array)

     object = *c-def begin-object [
              *( member value-separator / x-member )
              (member / x-member) ] end-object

     member = tag value
     x-member = tag x-value

     tag = string name-separator / b-string / c-tag

     array = *c-def begin-array [ *( value value-separator / x-value )
             (value / x-value) ] end-array

     x-value = b-value / d-value

     value = false / null / true / object / array / number / string

     name-separator  = ws %x3A ws  ; : colon
     value-separator = ws %x2C ws  ; , comma

   The following lexical values are unchanged:

     begin-array     = ws %x5B ws  ; [ left square bracket
     begin-object    = ws %x7B ws  ; { left curly bracket
     end-array       = ws %x5D ws  ; ] right square bracket
     end-object      = ws %x7D ws  ; } right curly bracket

     ws = *( %x20 / %x09 / %x0A / %x0D )

     false = %x66.61.6c.73.65   ; false
     null  = %x6e.75.6c.6c      ; null
     true  = %x74.72.75.65      ; true

   The productions number and string are defined as before:

     number = [ minus ] int [ frac ] [ exp ]
     decimal-point = %x2E       ; .
     digit1-9 = %x31-39         ; 1-9
     e = %x65 / %x45            ; e E
     exp = e [ minus / plus ] 1*DIGIT
     frac = decimal-point 1*DIGIT
     int = zero / ( digit1-9 *DIGIT )
     minus = %x2D               ; -
     plus = %x2B                ; +
     zero = %x30                ; 0

     string = quotation-mark *char quotation-mark
     char = unescaped /
            escape ( %x22 / %x5C / %x2F / %x62 / %x66 /
                     %x6E / %x72 / %x74 / %x75 4HEXDIG )

     escape = %x5C              ; \
     quotation-mark = %x22      ; "
     unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

5.  JSON-B

   The JSON-B encoding defines the b-value and b-string productions:

     b-value = b-atom / b-string / b-data / b-integer /
               b-float

     b-string = *( string-chunk ) string-term
     b-data = *( data-chunk ) data-term

     b-integer = p-int8 / p-int16 / p-int32 / p-int64 / p-bignum16 /
                 n-int8 / n-int16 / n-int32 / n-int64 / n-bignum16

     b-float = binary64

   The lexical encodings of the productions are defined in the following
   table, where the column 'Tag' specifies the byte code that begins the
   production, 'Fixed' specifies the number of data bytes that follow,
   and 'Length' specifies the number of bytes used to define the length
   of a variable length field following the data bytes:

   +--------------+-----+-------+--------+-----------------------------+
   | Production   | Tag | Fixed | Length | Data Description            |
   +--------------+-----+-------+--------+-----------------------------+
   | string-term  | x80 | -     | 1      | Terminal String 8 bit       |
   |              |     |       |        | length                      |
   | string-term  | x81 | -     | 2      | Terminal String 16 bit      |
   |              |     |       |        | length                      |
   | string-term  | x82 | -     | 4      | Terminal String 32 bit      |
   |              |     |       |        | length                      |
   | string-term  | x83 | -     | 8      | Terminal String 64 bit      |
   |              |     |       |        | length                      |
   | string-chunk | x84 | -     | 1      | Non-Terminal String 8 bit   |
   |              |     |       |        | length                      |
   | string-chunk | x85 | -     | 2      | Non-Terminal String 16 bit  |
   |              |     |       |        | length                      |
   | string-chunk | x86 | -     | 4      | Non-Terminal String 32 bit  |
   |              |     |       |        | length                      |
   | string-chunk | x87 | -     | 8      | Non-Terminal String 64 bit  |
   |              |     |       |        | length                      |
   | data-term    | x88 | -     | 1      | Terminal Data 8 bit length  |
   | data-term    | x89 | -     | 2      | Terminal Data 16 bit length |
   | data-term    | x8A | -     | 4      | Terminal Data 32 bit length |
   | data-term    | x8B | -     | 8      | Terminal Data 64 bit length |
   | data-chunk   | x8C | -     | 1      | Non-Terminal Data 8 bit     |
   |              |     |       |        | length                      |
   | data-chunk   | x8D | -     | 2      | Non-Terminal Data 16 bit    |
   |              |     |       |        | length                      |
   | data-chunk   | x8E | -     | 4      | Non-Terminal Data 32 bit    |
   |              |     |       |        | length                      |
   | data-chunk   | x8F | -     | 8      | Non-Terminal Data 64 bit    |
   |              |     |       |        | length                      |
   | p-int8       | xA0 | 1     | -      | Positive 8 bit Integer      |
   | p-int16      | xA1 | 2     | -      | Positive 16 bit Integer     |
   | p-int32      | xA2 | 4     | -      | Positive 32 bit Integer     |
   | p-int64      | xA3 | 8     | -      | Positive 64 bit Integer     |
   | p-bignum16   | xA5 | -     | 2      | Positive Bignum 16 bit      |
   |              |     |       |        | length                      |
   | n-int8       | xA8 | 1     | -      | Negative 8 bit Integer      |
   | n-int16      | xA9 | 2     | -      | Negative 16 bit Integer     |
   | n-int32      | xAA | 4     | -      | Negative 32 bit Integer     |
   | n-int64      | xAB | 8     | -      | Negative 64 bit Integer     |
   | n-bignum16   | xAD | -     | 2      | Negative Bignum 16 bit      |
   |              |     |       |        | length                      |
   | binary64     | x92 | 8     | -      | IEEE 754 Floating Point     |
   |              |     |       |        | binary64                    |
   | b-atom       | xB0 | -     | -      | True                        |
   | b-atom       | xB1 | -     | -      | False                       |
   | b-atom       | xB2 | -     | -      | Null                        |
   +--------------+-----+-------+--------+-----------------------------+

   One data type commonly used in networking that is not defined in
   this scheme is a date-time representation.  Such values are
   typically represented as a string containing the date-time value in
   Internet date/time format.

5.1.  JSON-B Examples

   The following examples show the use of JSON-B encoding:

     A0 2A                           42 (as 8 bit integer)
     A1 00 2A                        42 (as 16 bit integer)
     A2 00 00 00 2A                  42 (as 32 bit integer)
     A3 00 00 00 00 00 00 00 2A      42 (as 64 bit integer)
     A5 00 01 2A                     42 (as Bignum)

     80 05 48 65 6c 6c 6f            "Hello" (single chunk)
     81 00 05 48 65 6c 6c 6f         "Hello" (single chunk)
     84 05 48 65 6c 6c 6f 80 00      "Hello" (as two chunks)

     92 3f f0 00 00 00 00 00 00      1.0
     92 40 24 00 00 00 00 00 00      10.0
     92 40 09 21 fb 54 44 2e ea      3.14159265359
     92 bf f0 00 00 00 00 00 00      -1.0

     B0                              true
     B1                              false
     B2                              null

6.  JSON-C

   JSON-C (Compact) permits numeric code values to be substituted for
   strings and binary data.  Tag codes MAY be 8, 16 or 32 bits long,
   encoded in network byte order.

   Tag codes MUST be defined before they are referenced.  A tag code MAY
   be defined before the corresponding data or string value is used, or
   at the same time that it is used.

   A dictionary is a list of tag code definitions.  An encoding MAY
   incorporate definitions from a dictionary using the dict-hash
   production.  The dict-hash production specifies a (positive) offset
   value to be added to the entries in the dictionary and a hash code
   identifier consisting of the ASN.1 OID value sequence for the
   cryptographic digest used to compute the hash value, followed by the
   hash value in network byte order.
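
   The define-before-use rule can be illustrated with a small encoder
   and decoder.  In this sketch, which is not taken from this
   specification, the encoder allocates codes sequentially from a
   hypothetical starting value (0x20, chosen only so that the output
   matches the examples in Section 6.1).  It emits the "tag code &
   definition" form (xC8..xCA) on first use and the bare tag code
   (xC0..xC2) afterwards, always picking the shortest code width.
   TagEncoder, decode_tag and the first_code parameter are hypothetical
   names; dictionaries and dict-hash are not modelled, and only short
   single-chunk strings are handled.

```python
def _code_bytes(base: int, code: int) -> bytes:
    """base, base+1, base+2 carry an 8, 16 or 32 bit code in network byte order."""
    for offset, width in enumerate((1, 2, 4)):
        if code < 1 << (8 * width):
            return bytes([base + offset]) + code.to_bytes(width, "big")
    raise ValueError("tag code does not fit in 32 bits")


def _short_b_string(text: str) -> bytes:
    data = text.encode("utf-8")
    if len(data) > 0xFF:
        raise ValueError("this sketch only emits 8 bit length strings")
    return bytes([0x80, len(data)]) + data


class TagEncoder:
    """Hypothetical JSON-C tag encoder: the first use of a tag defines its code
    (xC8..xCA, code & definition); later uses reference it (xC0..xC2)."""

    def __init__(self, first_code: int = 0x20) -> None:
        self.next_code = first_code
        self.codes: dict = {}

    def encode(self, tag: str) -> bytes:
        if tag in self.codes:
            return _code_bytes(0xC0, self.codes[tag])
        code = self.next_code
        self.next_code += 1
        self.codes[tag] = code
        return _code_bytes(0xC8, code) + _short_b_string(tag)


def decode_tag(buf: bytes, pos: int, table: dict) -> tuple:
    """Decode one c-tag at buf[pos]; returns (tag string, next position)."""
    lead = buf[pos]
    if lead not in (0xC0, 0xC1, 0xC2, 0xC8, 0xC9, 0xCA):
        raise ValueError(f"0x{lead:02X} is not a c-tag")
    width = (1, 2, 4)[lead & 0x03]
    code = int.from_bytes(buf[pos + 1:pos + 1 + width], "big")
    pos += 1 + width
    if lead >= 0xC8:
        # Code & definition: a single-chunk, 8 bit length b-string follows.
        if buf[pos] != 0x80:
            raise ValueError("this sketch only decodes x80 string definitions")
        length = buf[pos + 1]
        table[code] = buf[pos + 2:pos + 2 + length].decode("utf-8")
        pos += 2 + length
    elif code not in table:
        raise ValueError(f"tag code {code} referenced before it was defined")
    return table[code], pos
```

   Encoding "Hello" twice with this sketch yields C8 20 80 05 48 65 6c
   6c 6f followed by C0 20, and decoding C0 21 against an empty table
   fails because code 0x21 was never defined.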

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | c-tag      | xC0 | 1     | -      | 8 bit tag code                |
   | c-tag      | xC1 | 2     | -      | 16 bit tag code               |
   | c-tag      | xC2 | 4     | -      | 32 bit tag code               |
   | c-def      | xC4 | 1     | -      | 8 bit tag definition          |
   | c-def      | xC5 | 2     | -      | 16 bit tag definition         |
   | c-def      | xC6 | 4     | -      | 32 bit tag definition         |
   | c-tag      | xC8 | 1     | -      | 8 bit tag code & definition   |
   | c-tag      | xC9 | 2     | -      | 16 bit tag code & definition  |
   | c-tag      | xCA | 4     | -      | 32 bit tag code & definition  |
   | c-def      | xCC | 1     | -      | 8 bit tag dictionary          |
   |            |     |       |        | definition                    |
   | c-def      | xCD | 2     | -      | 16 bit tag dictionary         |
   |            |     |       |        | definition                    |
   | c-def      | xCE | 4     | -      | 32 bit tag dictionary         |
   |            |     |       |        | definition                    |
   | dict-hash  | xD0 | 4     | 1      | Hash of dictionary            |
   +------------+-----+-------+--------+-------------------------------+

   All integer values are encoded in Network Byte Order (most
   significant byte first).

6.1.  JSON-C Examples

   The following examples show the use of JSON-C encoding:

     C8 20 80 05 48 65 6c 6c 6f      20 = "Hello"
     C4 21 80 05 48 65 6c 6c 6f      21 = "Hello"
     C0 20                           "Hello"
     C1 00 20                        "Hello"

     D0 00 00 01 00 1B               277 = "Hello"
     06 09 60 86 48 01 65 03
     04 02 01                        OID for SHA-2-256
                                     [2.16.840.1.101.3.4.2.1]
     e3 b0 c4 42 98 fc 1c 14
     9a fb f4 c8 99 6f b9 24
     27 ae 41 e4 64 9b 93 4c
     a4 95 99 1b 78 52 b8 55         SHA-256(C4 21 80 05 48 65 6c 6c 6f)

7.  JSON-D (Data)

   JSON-B and JSON-C only support the two numeric types defined in the
   JavaScript data model: integers and 64 bit floating point values.
   JSON-D (Data) defines binary encodings for additional data types that
   are commonly used in scientific applications.  These comprise
   positive and negative 128 bit integers, six additional floating point
   representations defined by IEEE 754 [IEEE-754] and the Intel extended
   precision 80 bit floating point representation.

   Should the need arise, even bigger bignums could be defined with the
   length specified as a 32 bit value, permitting bignums of up to 2^35
   bits to be represented.

     d-value = d-integer / d-float

     d-integer = p-int128 / n-int128

     d-float = binary16 / binary32 / binary128 / intel80 /
               decimal32 / decimal64 / decimal128

8.

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | p-int128   | xA4 | 16    | -      | Positive 128 bit Integer      |
   | n-int128   | xAC | 16    | -      | Negative 128 bit Integer      |
   | binary16   | x90 | 2     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | binary16                      |
   | binary32   | x91 | 4     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | binary32                      |
   | binary128  | x94 | 16    | -      | IEEE 754 Floating Point       |
   |            |     |       |        | binary128                     |
   | intel80    | x95 | 10    | -      | Intel 80 bit extended binary  |
   |            |     |       |        | Floating Point                |
   | decimal32  | x96 | 4     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | decimal32                     |
   | decimal64  | x97 | 8     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | decimal64                     |
   | decimal128 | x98 | 16    | -      | IEEE 754 Floating Point       |
   |            |     |       |        | decimal128                    |
   +------------+-----+-------+--------+-------------------------------+

9.

10.  Acknowledgements

   This work was assisted by conversations with Nico Williams and other
   participants on the applications area mailing list.

11.  Security Considerations

   A correctly implemented data encoding mechanism should not introduce
   new security vulnerabilities.  However, experience demonstrates that
   some data encoding approaches are more prone than others to introduce
   vulnerabilities when incorrectly implemented.

   In particular, whenever variable length data formats are used, the
   possibility of a buffer overrun vulnerability is introduced.  While
   best practice suggests that a coding language with native mechanisms
   for bounds checking is the best protection against such errors, such
   approaches are not always followed.  While such vulnerabilities are
   most commonly seen in the design of decoders, it is possible for the
   same vulnerabilities to be exploited in encoders.

   A common source of such errors is the case where nested length
   encodings are used.  For example, a decoder might rely on an
   outermost length encoding that specifies a length of 50 bytes to
   allocate memory for the entire result and then attempt to copy a
   string with a declared length of 1000 bytes within the sequence.

   The extensions to the JSON encoding described in this document are
   designed to avoid such errors.  Length encodings are only used to
   define the length of x-value constructions, which are always terminal
   and cannot have nested data entries.

12.  IANA Considerations

   [TBS list out all the code points that require an IANA registration]

13.  Normative References

   [IEEE-754] IEEE, "IEEE Standard for Floating-Point Arithmetic",
              IEEE Std 754-2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627, July 2006.

Author's Address

   Phillip Hallam-Baker
   Comodo Group Inc.

   Email: philliph@comodo.com