Network Working Group                                    P. Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended status: Informational                           August 14, 2017
Expires: February 15, 2018

 Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C, JSON-D
                      draft-hallambaker-jsonbcd-07

Abstract

   Three binary encodings for JavaScript Object Notation (JSON) are
   presented.  JSON-B (Binary) is a strict superset of the JSON encoding
   that permits efficient binary encoding of intrinsic JavaScript data
   types.  JSON-C (Compact) is a strict superset of JSON-B that supports
   compact representation of repeated data strings with short numeric
   codes.
   JSON-D (Data) supports additional binary data types for
   integer and floating-point representations for use in scientific
   applications where conversion between binary and decimal
   representations would cause a loss of precision.

   This document is also available online at
   http://prismproof.org/Documents/draft-hallambaker-json-bcd.html .

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 15, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Objectives
   2.  Definitions
     2.1.  Requirements Language
   3.  Extended JSON Grammar
   4.  JSON-B
     4.1.  JSON-B Examples
   5.  JSON-C
     5.1.  JSON-C Examples
   6.  JSON-D (Data)
   7.  Acknowledgements
   8.  Security Considerations
   9.  IANA Considerations
   10. Normative References
   Author's Address

1.  Introduction

   JavaScript Object Notation (JSON) is a simple text encoding for the
   JavaScript data model that has found wide application beyond its
   original field of use.  In particular, JSON has rapidly become a
   preferred encoding for Web Services.

   JSON encoding supports just four fundamental data types (integer,
   floating point, string and boolean), together with arrays and
   objects, where an object consists of a list of tag-value pairs.

   Although the JSON encoding is sufficient for many purposes, it is not
   always efficient.  In particular, there is no efficient
   representation for blocks of binary data.  Use of base64 encoding
   increases data volume by 33%.  This overhead compounds with each
   level of nesting (a factor of 4/3 per level), which makes JSON
   encoding unsatisfactory in cryptographic applications, where nested
   binary structures are frequently required.

   Another source of inefficiency in JSON encoding is the repeated
   occurrence of object tags.
   A JSON encoding containing an array of a
   hundred objects such as {"first":1,"second":2} will contain a hundred
   occurrences of the string "first" (seven bytes including the
   quotation marks) and a hundred occurrences of the string "second"
   (eight bytes).  Using two byte code sequences in place of the strings
   saves 11 bytes per object without loss of information, a saving of
   50% on the 22 byte object.

   A third objection to the use of JSON encoding is that floating point
   numbers can only be represented in decimal form and this necessarily
   involves a loss of precision when converting between binary and
   decimal representations.  While such issues are rarely important in
   network applications they can be critical in scientific applications.
   It is not acceptable for saving and restoring a data set to change
   the result of a calculation.

1.1.  Objectives

   The following were identified as core objectives for a binary JSON
   encoding:

   o  Low overhead encoding and decoding

   o  Easy to convert existing encoders and decoders to add binary
      support

   o  Efficient encoding of binary data

   o  Ability to convert from JSON to binary encoding in a streaming
      mode (i.e. without reading the entire binary data block before
      beginning encoding).

   o  Lossless encoding of JavaScript data types

   o  The ability to support JSON tag compression and extended data
      types is considered desirable but not essential for typical
      network applications.

   Three binary encodings are defined:

   JSON-B (Binary)
      Encodes JSON data in binary.  Only the JavaScript data model is
      supported (i.e. atomic types are integers, double or string).
      Integers may be 8, 16, 32 or 64 bits either signed or unsigned.
      Floating points are IEEE 754 binary64 format [IEEE754].
      Supports chunked encoding for binary and UTF-8 string types.

   JSON-C (Compact)
      As JSON-B but with support for representing JSON tags in numeric
      code form (16 bit code space).  This is done both for compact
      encoding and to allow simplification of encoders/decoders in
      constrained environments.  Codes may be defined inline or by
      reference to a known dictionary of codes referenced via a digest
      value.

   JSON-D (Data)
      As JSON-C but with support for representing additional data types
      without loss of precision.  In particular, it adds other IEEE 754
      floating point formats, both binary and decimal, and Intel's 80
      bit floating point, plus 128 bit integers and bignum integers.

   Each encoding is a proper superset of JSON; JSON-C is a proper
   superset of JSON-B and JSON-D is a proper superset of JSON-C.  Thus a
   single decoder MAY be used for all three new encodings and for JSON.
   Figure 1 shows these relationships graphically:

   [[This figure is not viewable in this format.  The figure is
   available at
   http://prismproof.org/Documents/draft-hallambaker-json-bcd.html.]]

                         Encoding Relationships.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Extended JSON Grammar

   The JSON-B, JSON-C and JSON-D encodings are all based on the JSON
   grammar [RFC7159] using the same syntactic structure but different
   lexical encodings.

   JSON-B0 and JSON-C0 replace the JSON lexical encodings for strings
   and numbers with binary encodings.  JSON-B1 and JSON-C1 allow either
   lexical encoding to be used.  Thus any valid JSON encoding is a valid
   JSON-B1 or JSON-C1 encoding.

   The grammar of JSON-B, JSON-C and JSON-D is a superset of the JSON
   grammar.  The following productions are added to the grammar:

   x-value
      Binary encodings for data values.  As the binary value encodings
      are all self delimiting, an x-value is not followed by a
      value-separator.

   x-member
      An object member where the value is specified as an x-value and
      thus does not require a value-separator.

   b-value
      Binary data encodings defined in JSON-B.

   b-string
      Defined length string encoding defined in JSON-B.

   c-def
      Tag code definition defined in JSON-C.  These may only appear
      before the beginning of an Object or Array and before any
      preceding white space.

   c-tag
      Tag code value defined in JSON-C.

   d-value
      Additional binary data encodings defined in JSON-D for use in
      scientific data applications.

   The JSON grammar is modified to permit the use of x-value productions
   in place of ( value value-separator ) :

   JSON-text = (object / array)

   object = *cdef begin-object [
            *( member value-separator | x-member )
            (member | x-member) ] end-object

   member = tag value
   x-member = tag x-value

   tag = string name-separator | b-string | c-tag

   array = *cdef begin-array [ *( value value-separator | x-value )
           (value | x-value) ] end-array

   x-value = b-value / d-value

   value = false / null / true / object / array / number / string

   name-separator  = ws %x3A ws  ; : colon
   value-separator = ws %x2C ws  ; , comma

                                 Figure 1

   The following lexical values are unchanged:

   begin-array     = ws %x5B ws  ; [ left square bracket
   begin-object    = ws %x7B ws  ; { left curly bracket
   end-array       = ws %x5D ws  ; ] right square bracket
   end-object      = ws %x7D ws  ; } right curly bracket

   ws = *( %x20 / %x09 / %x0A / %x0D )

   false = %x66.61.6c.73.65   ; false
   null  = %x6e.75.6c.6c      ; null
   true  = %x74.72.75.65      ; true

                                 Figure 2

   The productions number and string are defined as before:

   number = [ minus ] int [ frac ] [ exp ]
   decimal-point = %x2E       ; .
   digit1-9 = %x31-39         ; 1-9
   e = %x65 / %x45            ; e E
   exp = e [ minus / plus ] 1*DIGIT
   frac = decimal-point 1*DIGIT
   int = zero / ( digit1-9 *DIGIT )
   minus = %x2D               ; -
   plus = %x2B                ; +
   zero = %x30                ; 0

   string = quotation-mark *char quotation-mark
   char = unescaped /
          escape ( %x22 / %x5C / %x2F / %x62 / %x66 /
          %x6E / %x72 / %x74 / %x75 4HEXDIG )

   escape = %x5C              ; \
   quotation-mark = %x22      ; "
   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

                                 Figure 3

4.  JSON-B

   The JSON-B encoding defines the b-value and b-string productions:

   b-value = b-atom | b-string | b-data | b-integer |
             b-float

   b-string = *( string-chunk ) string-term
   b-data = *( data-chunk ) data-term

   b-integer = p-int8 | p-int16 | p-int32 | p-int64 | p-bignum16 |
               n-int8 | n-int16 | n-int32 | n-int64 | n-bignum16

   b-float = binary64

                                 Figure 4

   The lexical encodings of the productions are defined in the following
   tables, where the column 'Tag' specifies the byte code that begins
   the production, 'Fixed' specifies the number of data bytes that
   follow and 'Length' specifies the number of bytes used to define the
   length of a variable length field following the data bytes:

   +--------------+-----+-------+--------+-----------------------------+
   | Production   | Tag | Fixed | Length | Data Description            |
   +--------------+-----+-------+--------+-----------------------------+
   | string-term  | x80 | -     | 1      | Terminal String 8 bit       |
   |              |     |       |        | length                      |
   | string-term  | x81 | -     | 2      | Terminal String 16 bit      |
   |              |     |       |        | length                      |
   | string-term  | x82 | -     | 4      | Terminal String 32 bit      |
   |              |     |       |        | length                      |
   | string-term  | x83 | -     | 8      | Terminal String 64 bit      |
   |              |     |       |        | length                      |
   | string-chunk | x84 | -     | 1      | String Chunk 8 bit          |
   |              |     |       |        | length                      |
   | string-chunk | x85 | -     | 2      | String Chunk 16 bit         |
   |              |     |       |        | length                      |
   | string-chunk | x86 | -     | 4      | String Chunk 32 bit         |
   |              |     |       |        | length                      |
   | string-chunk | x87 | -     | 8      | String Chunk 64 bit         |
   |              |     |       |        | length                      |
   | data-term    | x88 | -     | 1      | Terminal Data 8 bit         |
   |              |     |       |        | length                      |
   | data-term    | x89 | -     | 2      | Terminal Data 16 bit        |
   |              |     |       |        | length                      |
   | data-term    | x8A | -     | 4      | Terminal Data 32 bit        |
   |              |     |       |        | length                      |
   | data-term    | x8B | -     | 8      | Terminal Data 64 bit        |
   |              |     |       |        | length                      |
   | data-chunk   | x8C | -     | 1      | Data Chunk 8 bit            |
   |              |     |       |        | length                      |
   | data-chunk   | x8D | -     | 2      | Data Chunk 16 bit           |
   |              |     |       |        | length                      |
   | data-chunk   | x8E | -     | 4      | Data Chunk 32 bit           |
   |              |     |       |        | length                      |
   | data-chunk   | x8F | -     | 8      | Data Chunk 64 bit           |
   |              |     |       |        | length                      |
   +--------------+-----+-------+--------+-----------------------------+

                Table 1: Codes for String and Data Items

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | p-int8     | xA0 | 1     | -      | Positive 8 bit Integer        |
   | p-int16    | xA1 | 2     | -      | Positive 16 bit Integer       |
   | p-int32    | xA2 | 4     | -      | Positive 32 bit Integer       |
   | p-int64    | xA3 | 8     | -      | Positive 64 bit Integer       |
   | p-bignum16 | xA5 | -     | 2      | Positive Bignum               |
   | n-int8     | xA8 | 1     | -      | Negative 8 bit Integer        |
   | n-int16    | xA9 | 2     | -      | Negative 16 bit Integer       |
   | n-int32    | xAA | 4     | -      | Negative 32 bit Integer       |
   | n-int64    | xAB | 8     | -      | Negative 64 bit Integer       |
   | n-bignum16 | xAD | -     | 2      | Negative Bignum               |
   | binary64   | x92 | 8     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Binary 64 bit                 |
   | b-atom     | xB0 | -     | -      | True                          |
   | b-atom     | xB1 | -     | -      | False                         |
   | b-atom     | xB2 | -     | -      | Null                          |
   +------------+-----+-------+--------+-------------------------------+

     Table 2: Codes for Integers, 64 Bit Floating Point, Boolean and
                               Null Items
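
   As a rough illustration of Table 2, the following Python sketch
   encodes integers, floating point values and the three atoms.  It is
   not part of the specification.  The function names are hypothetical,
   big-endian byte order is taken from the Network Byte Order statement
   in the JSON-C section, and storing a negative integer as its
   magnitude behind the negative tag is an assumption, since the text
   does not spell this out.  Bignums are left out.

```python
import struct

# (positive tag, negative tag) per fixed width in bytes, from Table 2.
INT_TAGS = {1: (0xA0, 0xA8), 2: (0xA1, 0xA9), 4: (0xA2, 0xAA), 8: (0xA3, 0xAB)}


def encode_int(value: int) -> bytes:
    """Encode an integer using the smallest fixed-width production."""
    magnitude = abs(value)  # assumption: the sign is carried by the tag only
    for width, (pos_tag, neg_tag) in INT_TAGS.items():
        if magnitude < 1 << (8 * width):
            tag = neg_tag if value < 0 else pos_tag
            return bytes([tag]) + magnitude.to_bytes(width, "big")
    raise OverflowError("needs p-bignum16 / n-bignum16, not sketched here")


def encode_float(value: float) -> bytes:
    """Encode a float as binary64: tag x92 plus 8 big-endian bytes."""
    return b"\x92" + struct.pack(">d", value)


def encode_atom(value) -> bytes:
    """Encode True, False or None as a single tag byte."""
    if value is True:
        return b"\xB0"
    if value is False:
        return b"\xB1"
    if value is None:
        return b"\xB2"
    raise TypeError("not an atom")


print(encode_int(42).hex(" "))      # a0 2a
print(encode_float(1.0).hex(" "))   # 92 3f f0 00 00 00 00 00 00
```

   Picking the narrowest width is a choice made by this sketch; the
   tables permit any width that is large enough to hold the value.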

   A data type commonly used in networking that is not defined in this
   scheme is a datetime representation.  Applications that need one
   typically use a string containing a date-time value in Internet
   date-time format.

4.1.  JSON-B Examples

   The following examples show the JSON-B encoding of some typical
   values:

   A0 2A                         42 (as 8 bit integer)
   A1 00 2A                      42 (as 16 bit integer)
   A2 00 00 00 2A                42 (as 32 bit integer)
   A3 00 00 00 00 00 00 00 2A    42 (as 64 bit integer)
   A5 00 01 2A                   42 (as Bignum)

   80 05 48 65 6c 6c 6f          "Hello" (single chunk)
   81 00 05 48 65 6c 6c 6f       "Hello" (single chunk)
   84 05 48 65 6c 6c 6f 80 00    "Hello" (as two chunks)

   92 3f f0 00 00 00 00 00 00    1.0
   92 40 24 00 00 00 00 00 00    10.0
   92 40 09 21 fb 54 44 2e ea    3.14159265359
   92 bf f0 00 00 00 00 00 00    -1.0

   B0                            true
   B1                            false
   B2                            null

                                 Figure 5

5.  JSON-C

   JSON-C (Compact) permits numeric code values to be substituted for
   strings and binary data.  Tag codes MAY be 8, 16 or 32 bits long
   encoded in network byte order.

   Tag codes MUST be defined before they are referenced.  A Tag code MAY
   be defined before the corresponding data or string value is used or
   at the same time that it is used.

   A dictionary is a list of tag code definitions.  An encoding MAY
   incorporate definitions from a dictionary using the dict-hash
   production.  The dict-hash production specifies a (positive) offset
   value to be added to the entries in the dictionary followed by the
   UDF fingerprint [draft-hallambaker-udf] of the dictionary to be used.
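
   The define-before-use rule can be illustrated with a small Python
   sketch.  It uses only the 8 bit codes listed in Table 3 (xC8, tag
   code and definition, and xC0, tag code) and the 8 bit string-term
   (x80) from JSON-B.  The class name, the starting code 0x20 (borrowed
   from the examples in Section 5.1) and the policy of defining a tag at
   its first use are assumptions made for illustration, not requirements
   of the encoding.

```python
class TagTable:
    """Hypothetical helper: assigns 8 bit tag codes to strings on first use."""

    def __init__(self, first_code: int = 0x20):
        self.codes = {}              # string -> assigned tag code
        self.next_code = first_code  # assumption: codes are handed out in order

    def emit(self, name: str) -> bytes:
        """Return the JSON-C bytes for `name`, defining its code if new."""
        if name in self.codes:
            # Already defined: reference it with an 8 bit c-tag (xC0).
            return bytes([0xC0, self.codes[name]])
        data = name.encode("utf-8")
        if self.next_code > 0xFF or len(data) > 0xFF:
            raise ValueError("sketch only covers 8 bit codes and lengths")
        code = self.next_code
        self.codes[name] = code
        self.next_code += 1
        # First use: tag code and definition (xC8), followed by the string
        # as a terminal string with an 8 bit length (x80).
        return bytes([0xC8, code, 0x80, len(data)]) + data


table = TagTable()
print(table.emit("Hello").hex(" "))  # c8 20 80 05 48 65 6c 6c 6f
print(table.emit("Hello").hex(" "))  # c0 20
```

   A decoder would keep the inverse mapping, from code to string, and
   reject any xC0 reference to a code it has not yet seen defined.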

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | c-tag      | xC0 | 1     | -      | 8 bit tag code                |
   | c-tag      | xC1 | 2     | -      | 16 bit tag code               |
   | c-tag      | xC2 | 4     | -      | 32 bit tag code               |
   | c-def      | xC4 | 1     | -      | 8 bit tag definition          |
   | c-def      | xC5 | 2     | -      | 16 bit tag definition         |
   | c-def      | xC6 | 4     | -      | 32 bit tag definition         |
   | c-tag      | xC8 | 1     | -      | 8 bit tag code and definition |
   | c-tag      | xC9 | 2     | -      | 16 bit tag code and           |
   |            |     |       |        | definition                    |
   | c-tag      | xCA | 4     | -      | 32 bit tag code and           |
   |            |     |       |        | definition                    |
   | c-def      | xCC | 1     | -      | 8 bit tag dictionary          |
   |            |     |       |        | definition                    |
   | c-def      | xCD | 2     | -      | 16 bit tag dictionary         |
   |            |     |       |        | definition                    |
   | c-def      | xCE | 4     | -      | 32 bit tag dictionary         |
   |            |     |       |        | definition                    |
   | dict-hash  | xD0 | 4     | 1      | UDF fingerprint of dictionary |
   +------------+-----+-------+--------+-------------------------------+

                   Table 3: Codes Used for Compression

   All integer values are encoded in Network Byte Order (most
   significant byte first).

5.1.  JSON-C Examples

   The following examples show the JSON-C encoding of a tagged string:

   C8 20 80 05 48 65 6c 6c 6f    "Hello"    20 = "Hello"
   C4 21 80 05 48 65 6c 6c 6f               21 = "Hello"
   C0 20                         "Hello"
   C1 00 20                      "Hello"

   D0 00 00 01 00 20             Insert dictionary at code 256
   e3 b0 c4 42 98 fc 1c 14
   9a fb f4 c8 99 6f b9 24
   27 ae 41 e4 64 9b 93 4c
   a4 95 99 1b 78 52 b8 55       UDF (C4 21 80 05 48 65 6c 6c 6f)

                                 Figure 6

6.  JSON-D (Data)

   JSON-B and JSON-C only support the two numeric types defined in the
   JavaScript data model: Integers and 64 bit floating point values.
   JSON-D (Data) defines binary encodings for additional data types that
   are commonly used in scientific applications.  These comprise
   positive and negative 128 bit integers, six additional floating point
   representations defined by IEEE 754 [IEEE754] and the Intel extended
   precision 80 bit floating point representation [INTEL].

   Should the need arise, even bigger bignums could be defined with the
   length specified as a 32 bit value permitting bignums of up to 2^35
   bits to be represented.

   d-value = d-integer | d-float

   d-float = binary16 | binary32 | binary128 | binary80 |
             decimal32 | decimal64 | decimal128

                                 Figure 7

   The codes for these values are as follows:

   +------------+-----+-------+--------+-------------------------------+
   | Production | Tag | Fixed | Length | Data Description              |
   +------------+-----+-------+--------+-------------------------------+
   | p-int128   | xA4 | 16    | -      | Positive 128 bit Integer      |
   | n-int128   | xAC | 16    | -      | Negative 128 bit Integer      |
   | binary16   | x90 | 2     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Binary 16 bit                 |
   | binary32   | x91 | 4     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Binary 32 bit                 |
   | binary128  | x94 | 16    | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Binary 128 bit                |
   | binary80   | x95 | 10    | -      | Intel extended Floating Point |
   |            |     |       |        | 80 bit                        |
   | decimal32  | x96 | 4     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Decimal 32                    |
   | decimal64  | x97 | 8     | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Decimal 64                    |
   | decimal128 | x98 | 16    | -      | IEEE 754 Floating Point       |
   |            |     |       |        | Decimal 128                   |
   +------------+-----+-------+--------+-------------------------------+

              Table 4: Additional Codes for Scientific Data

7.  Acknowledgements

   This work was assisted by conversations with Nico Williams and other
   participants on the applications area mailing list.

8.  Security Considerations

   A correctly implemented data encoding mechanism should not introduce
   new security vulnerabilities.
   However, experience demonstrates that
   some data encoding approaches are more prone to introduce
   vulnerabilities when incorrectly implemented than others.

   In particular, whenever variable length data formats are used, the
   possibility of a buffer overrun vulnerability is introduced.  While
   best practice suggests that a coding language with native mechanisms
   for bounds checking is the best protection against such errors, such
   approaches are not always followed.  While such vulnerabilities are
   most commonly seen in the design of decoders, it is possible for the
   same vulnerabilities to be exploited in encoders.

   A common source of such errors is the case where nested length
   encodings are used.  For example, a decoder relies on an outermost
   length encoding that specifies a length of 50 bytes to allocate
   memory for the entire result and then attempts to copy a string with
   a declared length of 1000 bytes within the sequence.

   The extensions to the JSON encoding described in this document are
   designed to avoid such errors.  Length encodings are only used to
   define the length of x-value constructions, which are always terminal
   and cannot have nested data entries.

9.  IANA Considerations

   [TBS list out all the code points that require an IANA registration]

10.  Normative References

   [draft-hallambaker-udf]
              Hallam-Baker, P., "Uniform Data Fingerprint (UDF)",
              draft-hallambaker-udf-06 (work in progress), August 2017.

   [IEEE754]  IEEE Computer Society, "IEEE Standard for Floating-Point
              Arithmetic", IEEE 754-2008,
              DOI 10.1109/IEEESTD.2008.4610935, August 2008.

   [INTEL]    Intel Corp., "Unknown".

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, DOI 10.17487/RFC7159,
              March 2014.

Author's Address

   Phillip Hallam-Baker
   Comodo Group Inc.

   Email: philliph@comodo.com