Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C, JSON-D
Comodo Group Inc.
philliph@comodo.com
General
Transparency, PKI, PKIX

Three binary encodings for JavaScript Object Notation (JSON) are presented. JSON-B (Binary) is a strict superset of the JSON encoding that permits efficient binary encoding of intrinsic JavaScript data types. JSON-C (Compact) is a strict superset of JSON-B that supports compact representation of repeated data strings with short numeric codes. JSON-D (Data) supports additional binary data types for integer and floating point representations for use in scientific applications where conversion between binary and decimal representations would cause a loss of precision.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

JavaScript Object Notation (JSON) is a simple text encoding for the JavaScript data model that has found wide application beyond its original field of use. In particular, JSON has rapidly become a preferred encoding for Web Services.

JSON encoding supports just four fundamental data types (integer, floating point, string and boolean), as well as arrays, and objects consisting of a list of tag-value pairs. Although the JSON encoding is sufficient for many purposes, it is not always efficient.

In particular, there is no efficient representation for blocks of binary data. Use of base64 encoding increases data volume by 33%. In applications that require nested binary encodings, this overhead compounds at each level of nesting and so grows exponentially with nesting depth. This makes JSON encoding unsatisfactory for cryptographic applications, where nested binary structures are frequently required.

Another source of inefficiency in JSON encoding is the repeated occurrence of object tags. A JSON encoding containing an array of a hundred objects such as {"first":1,"second":2} will contain a hundred occurrences of the string "first" (seven bytes, including the quotes) and a hundred occurrences of the string "second" (eight bytes).
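The byte counts above, and the way base64 overhead compounds with nesting, can be checked with a short Python sketch that uses only the standard library. It measures the compact JSON text of the example object and the size after each layer of base64 wrapping; the 300-byte payload and the function names are arbitrary choices for illustration.

```python
import base64
import json


def nested_base64_sizes(payload: bytes, depth: int) -> list:
    """Size in bytes after each successive layer of base64 wrapping."""
    sizes = []
    for _ in range(depth):
        payload = base64.b64encode(payload)  # 4 * ceil(n / 3) bytes out
        sizes.append(len(payload))
    return sizes


def tag_bytes(obj: dict) -> int:
    """Bytes spent on quoted member names in the JSON text of obj."""
    return sum(len(json.dumps(key)) for key in obj)


obj = {"first": 1, "second": 2}
text = json.dumps(obj, separators=(",", ":"))  # '{"first":1,"second":2}'
print(len(text), tag_bytes(obj))  # 22 15
print(nested_base64_sizes(bytes(300), 3))  # [400, 536, 716]
```

Each layer multiplies the size by roughly 4/3, so three layers already more than double the original 300 bytes, and the two tags account for 15 of the 22 bytes of each object.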
Using two-byte code sequences in place of the tag strings saves 11 bytes per object without loss of information, a saving of 50%.

A third objection to the use of JSON encoding is that floating point numbers can only be represented in decimal form, and converting between binary and decimal representations can involve a loss of precision. While such issues are rarely important in network applications, they can be critical in scientific applications. It is not acceptable for saving and restoring a data set to change the result of a calculation.

The following were identified as core objectives for a binary JSON encoding:

   o  Low overhead encoding and decoding
   o  Easy to convert existing encoders and decoders to add binary support
   o  Efficient encoding of binary data
   o  Ability to convert from JSON to binary encoding in a streaming mode (i.e. without reading the entire binary data block before beginning encoding)
   o  Lossless encoding of JavaScript data types

The ability to support JSON tag compression and extended data types is considered desirable but not essential for typical network applications.

Three binary encodings are defined:

JSON-B (Binary)
   Simply encodes JSON data in binary. Only the JavaScript data model is supported (i.e. atomic types are integers, double or string). Integers may be 8, 16, 32 or 64 bits, either signed or unsigned. Floating points are IEEE 754 binary64 format [IEEE-754]. Supports chunked encoding for binary and UTF-8 string types.

JSON-C (Compact)
   As JSON-B but with support for representing JSON tags in numeric code form (16 bit code space). This is done both for compact encoding and to allow simplification of encoders/decoders in constrained environments. Codes may be defined inline or by reference to a known dictionary of codes referenced via a digest value.

JSON-D (Data)
   As JSON-C but with support for representing additional data types without loss of precision.
   In particular, other IEEE 754 floating point formats (both binary and decimal), Intel's 80 bit floating point format, 128 bit integers and bignum integers.

The JSON-B, JSON-C and JSON-D encodings are all based on the JSON grammar [RFC4627], using the same syntactic structure but different lexical encodings.

JSON-B0 and JSON-C0 replace the JSON lexical encodings for strings and numbers with binary encodings. JSON-B1 and JSON-C1 allow either lexical encoding to be used. Thus any valid JSON encoding is a valid JSON-B1 or JSON-C1 encoding.

The grammar of JSON-B, JSON-C and JSON-D is a superset of the JSON grammar. The following productions are added to the grammar:

x-value
   Binary encodings for data values. As the binary value encodings are all self-delimiting, they do not require a value-separator.

x-member
   An object member where the value is specified as an x-value and thus does not require a value-separator.

b-value
   Binary data encodings defined in JSON-B.

b-string
   Defined length string encoding defined in JSON-B.

c-def
   Tag code definition defined in JSON-C. These may only appear before the beginning of an Object or Array and before any preceding white space.

c-tag
   Tag code value defined in JSON-C.

d-value
   Additional binary data encodings defined in JSON-D for use in scientific data applications.
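Because JSON-B1 and JSON-C1 let textual and binary encodings be mixed, a reader has to tell them apart at each token. The sketch below rests on an observation about the tag assignments in the tables of this document rather than on any rule the draft states: every assigned tag byte lies between x80 and xD0, while any JSON token outside a string begins with an ASCII byte, so the lead byte alone is enough to choose a decoder. The family labels and the function name are informal, hypothetical choices for this sketch.

```python
# Tag bytes assigned in the JSON-B, JSON-C and JSON-D tables, grouped by
# production family. The family labels are informal names for this sketch.
TAGS = {}
TAGS.update({t: "string" for t in range(0x80, 0x88)})  # string-term, string-chunk
TAGS.update({t: "data" for t in range(0x88, 0x90)})  # data-term, data-chunk
TAGS.update({t: "float" for t in (0x90, 0x91, 0x92, 0x94, 0x95, 0x96, 0x97, 0x98)})
TAGS.update({t: "integer" for t in (0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5,
                                    0xA8, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD)})
TAGS.update({0xB0: "true", 0xB1: "false", 0xB2: "null"})
TAGS.update({t: "c-tag" for t in (0xC0, 0xC1, 0xC2, 0xC8, 0xC9, 0xCA)})
TAGS.update({t: "c-def" for t in (0xC4, 0xC5, 0xC6, 0xCC, 0xCD, 0xCE)})
TAGS[0xD0] = "dict-hash"


def classify_lead_byte(b: int) -> str:
    """Classify the byte that starts a token in a JSON-B1/JSON-C1 stream.

    Only meaningful at token boundaries: inside a JSON text string, UTF-8
    multi-byte sequences also contain bytes >= 0x80.
    """
    if b < 0x80:
        # Structural characters, whitespace, digits, '-', '"' and the first
        # letters of true/false/null are all ASCII.
        return "json-text"
    return TAGS.get(b, "unassigned")


print(classify_lead_byte(ord("{")), classify_lead_byte(0x92))
```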
The JSON grammar is modified to permit the use of x-value productions in place of ( value value-separator ):

The following lexical values are unchanged:

The productions number and string are defined as before:

The JSON-B encoding defines the b-value and b-string productions:

The lexical encodings of the productions are defined in the following table, where the column 'Tag' specifies the byte code that begins the production, 'Fixed' specifies the number of data bytes that follow, and 'Length' specifies the number of bytes used to define the length of a variable length field following the data bytes:

+--------------+-----+-------+--------+-----------------------------------+
| Production   | Tag | Fixed | Length | Data Description                  |
+--------------+-----+-------+--------+-----------------------------------+
| string-term  | x80 | -     | 1      | Terminal String 8 bit length      |
| string-term  | x81 | -     | 2      | Terminal String 16 bit length     |
| string-term  | x82 | -     | 4      | Terminal String 32 bit length     |
| string-term  | x83 | -     | 8      | Terminal String 64 bit length     |
| string-chunk | x84 | -     | 1      | Non-Terminal String 8 bit length  |
| string-chunk | x85 | -     | 2      | Non-Terminal String 16 bit length |
| string-chunk | x86 | -     | 4      | Non-Terminal String 32 bit length |
| string-chunk | x87 | -     | 8      | Non-Terminal String 64 bit length |
| data-term    | x88 | -     | 1      | Terminal Data 8 bit length        |
| data-term    | x89 | -     | 2      | Terminal Data 16 bit length       |
| data-term    | x8A | -     | 4      | Terminal Data 32 bit length       |
| data-term    | x8B | -     | 8      | Terminal Data 64 bit length       |
| data-chunk   | x8C | -     | 1      | Non-Terminal Data 8 bit length    |
| data-chunk   | x8D | -     | 2      | Non-Terminal Data 16 bit length   |
| data-chunk   | x8E | -     | 4      | Non-Terminal Data 32 bit length   |
| data-chunk   | x8F | -     | 8      | Non-Terminal Data 64 bit length   |
| p-int8       | xA0 | 1     | -      | Positive 8 bit Integer            |
| p-int16      | xA1 | 2     | -      | Positive 16 bit Integer           |
| p-int32      | xA2 | 4     | -      | Positive 32 bit Integer           |
| p-int64      | xA3 | 8     | -      | Positive 64 bit Integer           |
| p-bignum16   | xA5 | -     | 2      | Positive Bignum 16 bit length     |
| n-int8       | xA8 | 1     | -      | Negative 8 bit Integer            |
| n-int16      | xA9 | 2     | -      | Negative 16 bit Integer           |
| n-int32      | xAA | 4     | -      | Negative 32 bit Integer           |
| n-int64      | xAB | 8     | -      | Negative 64 bit Integer           |
| n-bignum16   | xAD | -     | 2      | Negative Bignum 16 bit length     |
| binary64     | x92 | 8     | -      | IEEE 754 Floating Point binary64  |
| b-value      | xB0 | -     | -      | True                              |
| b-value      | xB1 | -     | -      | False                             |
| b-value      | xB2 | -     | -      | Null                              |
+--------------+-----+-------+--------+-----------------------------------+

A data type commonly used in networking that is not defined in this scheme is a datetime representation.
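The JSON-B table can be read directly as an encoder and decoder specification for scalar values. The following is a minimal Python sketch built from the tag bytes above, under assumptions the table does not state: the encoder picks the narrowest integer width or length field that fits; negative integers are left out because the table gives the n-int tags but not how the magnitude is stored; and non-terminal string chunks (x84 to x87) are not decoded. All function names are hypothetical.

```python
import struct

# Tag bytes from the JSON-B table, keyed by field width in bytes.
P_INT = {1: 0xA0, 2: 0xA1, 4: 0xA2, 8: 0xA3}
STRING_TERM = {1: 0x80, 2: 0x81, 4: 0x82, 8: 0x83}
DATA_TERM = {1: 0x88, 2: 0x89, 4: 0x8A, 8: 0x8B}
DATA_CHUNK = {1: 0x8C, 2: 0x8D, 4: 0x8E, 8: 0x8F}
BINARY64, TRUE, FALSE, NULL = 0x92, 0xB0, 0xB1, 0xB2

# Reverse maps for decoding: tag -> width in bytes.
_INT_W = {t: w for w, t in P_INT.items()}
_STR_W = {t: w for w, t in STRING_TERM.items()}
_DATA_W = {t: w for w, t in DATA_TERM.items()}
_CHUNK_W = {t: w for w, t in DATA_CHUNK.items()}


def _width(n: int) -> int:
    """Smallest of 1, 2, 4 or 8 bytes that holds the unsigned value n."""
    for w in (1, 2, 4, 8):
        if n < 1 << (8 * w):
            return w
    raise OverflowError("value needs more than 64 bits")


def _length_prefixed(tags: dict, payload: bytes) -> bytes:
    """Tag byte, big-endian length field, then the payload."""
    w = _width(len(payload))
    return bytes([tags[w]]) + len(payload).to_bytes(w, "big") + payload


def encode_scalar(value) -> bytes:
    if value is True:  # test bools before int: bool is a subclass of int
        return bytes([TRUE])
    if value is False:
        return bytes([FALSE])
    if value is None:
        return bytes([NULL])
    if isinstance(value, int):
        if value < 0:  # n-int magnitude layout is not given in the table
            raise NotImplementedError("negative integers not covered here")
        w = _width(value)
        return bytes([P_INT[w]]) + value.to_bytes(w, "big")
    if isinstance(value, float):
        return bytes([BINARY64]) + struct.pack(">d", value)
    if isinstance(value, str):
        return _length_prefixed(STRING_TERM, value.encode("utf-8"))
    if isinstance(value, (bytes, bytearray)):
        return _length_prefixed(DATA_TERM, bytes(value))
    raise TypeError(f"unsupported type {type(value).__name__}")


def encode_data_chunks(chunks):
    """Streaming: emit a chunk as data-chunk once another chunk follows;
    the last chunk is emitted as data-term."""
    pending = None
    for chunk in chunks:
        if pending is not None:
            yield _length_prefixed(DATA_CHUNK, pending)
        pending = bytes(chunk)
    yield _length_prefixed(DATA_TERM, pending if pending is not None else b"")


def _read(buf: bytes, pos: int, w: int):
    """Read a w-byte length field and the payload it announces."""
    n = int.from_bytes(buf[pos:pos + w], "big")
    pos += w
    return bytes(buf[pos:pos + n]), pos + n


def decode_scalar(buf: bytes, pos: int = 0):
    """Decode one value starting at buf[pos]; return (value, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag in (TRUE, FALSE, NULL):
        return {TRUE: True, FALSE: False, NULL: None}[tag], pos
    if tag == BINARY64:
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if tag in _INT_W:
        w = _INT_W[tag]
        return int.from_bytes(buf[pos:pos + w], "big"), pos + w
    if tag in _STR_W:
        payload, pos = _read(buf, pos, _STR_W[tag])
        return payload.decode("utf-8"), pos
    parts = []
    while tag in _CHUNK_W:  # non-terminal chunks, then one terminal chunk
        payload, pos = _read(buf, pos, _CHUNK_W[tag])
        parts.append(payload)
        tag = buf[pos]
        pos += 1
    if tag in _DATA_W:
        payload, pos = _read(buf, pos, _DATA_W[tag])
        return b"".join(parts) + payload, pos
    raise ValueError(f"unsupported tag 0x{tag:02X}")
```

For example, `encode_scalar(300)` produces the p-int16 form `A1 01 2C`. The chunked encoder holds back only one chunk at a time, so a large binary block can be written without first buffering all of it, in line with the streaming objective.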
The following examples show the use of JSON-B encoding:

JSON-C (Compact) permits numeric code values to be substituted for strings and binary data. Tag codes MAY be 8, 16 or 32 bits long, encoded in network byte order.

Tag codes MUST be defined before they are referenced. A tag code MAY be defined before the corresponding data or string value is used, or at the same time that it is used.

A dictionary is a list of tag code definitions. An encoding MAY incorporate definitions from a dictionary using the dict-hash production. The dict-hash production specifies a (positive) offset value to be added to the entries in the dictionary and a hash code identifier, consisting of the ASN.1 OID value sequence for the cryptographic digest used to compute the hash value, followed by the hash value in network byte order.

+------------+-----+-------+--------+----------------------------------+
| Production | Tag | Fixed | Length | Data Description                 |
+------------+-----+-------+--------+----------------------------------+
| c-tag      | xC0 | 1     | -      | 8 bit tag code                   |
| c-tag      | xC1 | 2     | -      | 16 bit tag code                  |
| c-tag      | xC2 | 4     | -      | 32 bit tag code                  |
| c-def      | xC4 | 1     | -      | 8 bit tag definition             |
| c-def      | xC5 | 2     | -      | 16 bit tag definition            |
| c-def      | xC6 | 4     | -      | 32 bit tag definition            |
| c-tag      | xC8 | 1     | -      | 8 bit tag code & definition      |
| c-tag      | xC9 | 2     | -      | 16 bit tag code & definition     |
| c-tag      | xCA | 4     | -      | 32 bit tag code & definition     |
| c-def      | xCC | 1     | -      | 8 bit tag dictionary definition  |
| c-def      | xCD | 2     | -      | 16 bit tag dictionary definition |
| c-def      | xCE | 4     | -      | 32 bit tag dictionary definition |
| dict-hash  | xD0 | 4     | 1      | Hash of dictionary               |
+------------+-----+-------+--------+----------------------------------+

All integer values are encoded in Network Byte Order (most significant byte first).

The following examples show the use of JSON-C encoding:

2.16.840.1.101.3.4.2.1

JSON-B and JSON-C only support the two numeric types defined in the JavaScript data model: integers and 64 bit floating point values. JSON-D (Data) defines binary encodings for additional data types that are commonly used in scientific applications. These comprise positive and negative 128 bit integers, six additional floating point representations defined by IEEE 754 [IEEE-754] and the Intel extended precision 80 bit floating point representation.
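The JSON-D floating point tags let an encoder keep a value in a binary width of its own choosing instead of converting it to decimal text. The sketch below uses the tag bytes x90 (binary16) and x91 (binary32) from the JSON-D table and x92 (binary64) from the JSON-B table. Choosing the narrowest format that reproduces the value exactly is a hypothetical encoder policy, not something the draft specifies; binary128, intel80 and the decimal formats are omitted because Python's `struct` module has no format codes for them.

```python
import math
import struct

# (tag byte, big-endian struct format), narrowest first.
# x90 binary16 and x91 binary32 are JSON-D tags; x92 binary64 is JSON-B.
FLOAT_FORMATS = [(0x90, ">e"), (0x91, ">f"), (0x92, ">d")]


def encode_float_exact(x: float) -> bytes:
    """Hypothetical policy: narrowest binary format that round-trips x exactly."""
    if math.isnan(x):
        # NaN never compares equal to itself, so keep it in binary64.
        return bytes([0x92]) + struct.pack(">d", x)
    for tag, fmt in FLOAT_FORMATS:
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # magnitude too large for this format
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([tag]) + packed
    raise ValueError("unreachable for a Python float: binary64 is exact")


def decode_float(buf: bytes) -> float:
    """Decode one tagged binary16/32/64 value at the start of buf."""
    for tag, fmt in FLOAT_FORMATS:
        if buf[0] == tag:
            return struct.unpack(fmt, buf[1:1 + struct.calcsize(fmt)])[0]
    raise ValueError(f"not a binary16/32/64 tag: 0x{buf[0]:02X}")


for v in (1.5, 1e10, 0.1):
    enc = encode_float_exact(v)
    print(v, hex(enc[0]), len(enc))  # 1.5 -> 0x90, 1e10 -> 0x91, 0.1 -> 0x92
```

Because every candidate is checked by decoding it again, a value is never narrowed to a format that would change it, which is the property the JSON-D types are intended to preserve.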
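Should the need arise, even bigger bignums could be defined with the length specified as a 32 bit value, permitting bignums of up to 2^35 bits to be represented.

+------------+-----+-------+--------+---------------------------------------------+
| Production | Tag | Fixed | Length | Data Description                            |
+------------+-----+-------+--------+---------------------------------------------+
| p-int128   | xA4 | 16    | -      | Positive 128 bit Integer                    |
| n-int128   | xAC | 16    | -      | Negative 128 bit Integer                    |
| binary16   | x90 | 2     | -      | IEEE 754 Floating Point binary16            |
| binary32   | x91 | 4     | -      | IEEE 754 Floating Point binary32            |
| binary128  | x94 | 16    | -      | IEEE 754 Floating Point binary128           |
| intel80    | x95 | 10    | -      | Intel 80 bit extended binary Floating Point |
| decimal32  | x96 | 4     | -      | IEEE 754 Floating Point decimal32           |
| decimal64  | x97 | 8     | -      | IEEE 754 Floating Point decimal64           |
| decimal128 | x98 | 16    | -      | IEEE 754 Floating Point decimal128          |
+------------+-----+-------+--------+---------------------------------------------+

Nico Williams, etc.

TBS [TBS list out all the code points that require an IANA registration]

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006.

[IEEE-754] [Reference Not Found!]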