JSON Canonicalization Scheme (JCS)
Anders Rundgren, Independent, Montpellier, France
anders.rundgren.net@gmail.com
https://www.linkedin.com/in/andersrundgren/
Security
JSON, ECMAScript, Signatures, Cryptography, Canonicalization
Cryptographic operations like hashing and signing depend on the
target data not changing during serialization, transport, or parsing.
By applying the rules defined by JCS (JSON Canonicalization Scheme),
data provided in the JSON format can be exchanged "as is",
while still being subject to secure cryptographic operations.
JCS achieves this by combining the strict serialization of JSON primitives
defined in ECMAScript with a platform independent
sorting scheme.
The intended audience of this document comprises JSON tool vendors, as
well as designers of JSON-based cryptographic solutions.
Cryptographic operations like hashing and signing depend on the
target data not changing during serialization, transport, or parsing.
A straightforward way of accomplishing this is to convert the data into
a format which has a simple and fixed representation like Base64Url,
which for example has been used in JWS .
Another solution is creating a canonicalized version of the target data,
with XML Signature as a prime example.
Since the objective was keeping the data "as is", the canonicalization
method was selected. To avoid "reinventing the wheel",
JCS relies on serialization of JSON primitives compatible with
ECMAScript (aka JavaScript) beginning with version 6 ,
from now on simply referred to as "ES6".
Seasoned XML developers recalling difficulties getting signatures
to validate (usually due to different interpretations of the quite intricate
XML canonicalization rules as well as of the equally extensive
Web Services security standards), may rightfully wonder why this
particular effort would succeed. The reasons are twofold:
JSON is a considerably simpler format than XML, as well as lacking
support for the powerful (but complex) namespace concept.
ES6 compatible JSON serialization is already supported by most
Web browsers, Node.js ,
as well as by third party libraries like Open Keystore ,
giving the proposed canonicalization scheme a head start.
Also see .
The JCS specification describes how JSON serialization rules compliant
with ES6, combined with an elementary sorting scheme, can be used to
support "Crypto Safe" JSON.
JCS is compatible with some existing systems relying on JSON canonicalization
such as JWK Thumbprint and Keybase .
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
This section describes the different issues related to JSON canonicalization,
and how they are addressed by JCS.
In order to canonicalize JSON data, an internal representation
of the JSON data is needed. This can be achieved by:
Parsing externally supplied JSON data.
Programmatic creation of JSON data.
Irrespective of method used, the JSON data MUST be compatible
both with ES6 and I-JSON , which implies the following:
There MUST NOT be any duplicate property names within an Object.
Data of the type String MUST be expressible
as Unicode strings.
Also see .
Data of the type Number MUST be expressible
as IEEE-754 double precision values.
Also see .
The following subsections describe the steps required for creating a canonicalized
version of the internal JSON data elaborated on in the previous section.
shows sample code for an ES6 based canonicalizer,
matching the JCS specification.
Possible whitespace between JSON elements MUST be ignored (not emitted).
Assume that you parse a JSON object like the following:
If you subsequently serialize the object created by the operation above
using a serializer compliant with ES6's JSON.stringify(),
the result would (with a line wrap added for display purposes only)
be rather divergent with respect to the representation of data:
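As an illustration (using hypothetical sample values, since the original example is not reproduced here), parsing tolerant input and re-serializing it with ES6's JSON.stringify() yields a fixed representation:

```javascript
// Parse JSON input that uses "tolerant" notation...
const parsed = JSON.parse('{"literals": [null, true, false], "numbers": [4.50, 1E30, 2e-3]}');

// ...and re-serialize it with the fixed ES6 representation.
const serialized = JSON.stringify(parsed);
console.log(serialized);
// {"literals":[null,true,false],"numbers":[4.5,1e+30,0.002]}
```

Note how 4.50 becomes 4.5, 1E30 becomes 1e+30, and 2e-3 becomes 0.002.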
Note: \u20ac denotes the Euro character which, not
being ASCII, is currently not displayable in RFCs.
The difference between the parsed data and its
serialized counterpart is due to the wide tolerance on input data (as defined
by JSON ), while output data (as defined by ES6)
has a fixed representation. As can be seen in the example,
numbers are subject to rounding as well.
The following subsections describe the serialization of primitive JSON data types
according to JCS. This part is identical to that of ES6.
The JSON literals null, true,
and false present no challenge since they already have a
fixed definition in JSON .
For JSON data of the type String (which includes
Object property names as well), each character MUST be serialized as
described in Section 24.3.2.2 of ES6.
If the Unicode value falls within the traditional ASCII control
character range (U+0000 through U+001F), it MUST
be serialized using lowercase hexadecimal Unicode notation (\uhhhh) unless it is in the
set of predefined JSON control characters U+0008, U+0009, U+000A, U+000C or U+000D
which MUST be serialized as \b, \t, \n, \f and \r respectively.
If the Unicode value is outside of the ASCII control character range, it MUST
be serialized "as is" unless it is equivalent to
U+005C (\) or U+0022 (") which MUST be serialized as \\ and \" respectively.
Finally, the serialized string value MUST be enclosed in double quotes (").
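The escaping rules above can be verified directly in any ES6-compliant engine, since JSON.stringify() implements exactly this behavior:

```javascript
// Predefined JSON control characters use their short escape forms:
console.log(JSON.stringify("\u0008\u0009\u000a\u000c\u000d")); // "\b\t\n\f\r"

// Other ASCII control characters use lowercase \uhhhh notation:
console.log(JSON.stringify("\u001f"));                         // "\u001f"

// Backslash and double quote are escaped; other characters pass "as is":
console.log(JSON.stringify("\\\""));                           // "\\\""
console.log(JSON.stringify("\u20ac"));                         // "€"
```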
Note that many JSON systems permit the use of invalid Unicode data
like "lone surrogates" (e.g. U+DEAD), which is dealt with in a platform-specific way.
Since this leads to interoperability issues, including broken signatures,
such usage MUST be avoided.
Note that although the Unicode standard offers the possibility of combining
certain characters into one, referred to as "Unicode Normalization"
(https://www.unicode.org/reports/tr15/),
such functionality MUST be delegated to the application layer,
which already is the case for most other uses of JSON.
JSON data of the type Number MUST be serialized according to
Section 7.1.12.1 of ES6; for
maximum interoperability preferably including the "Note 2" enhancement as well.
The latter is implemented by for example Google's V8 .
Due to the relative complexity of this part, it is not included in this specification.
Note that ES6 builds on the IEEE-754 double precision
standard for storing Number data.
holds a set of IEEE-754 sample values and their
corresponding JSON serialization.
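A few such samples can be reproduced with JSON.stringify() in a V8-class engine (the exact values below are illustrative, not the specification's table):

```javascript
// Values are rounded to the shortest representation that round-trips:
console.log(JSON.stringify(333333333.33333329)); // 333333333.3333333

// Large and small magnitudes switch to exponent notation at fixed thresholds:
console.log(JSON.stringify(1e+21));              // 1e+21
console.log(JSON.stringify(0.000001));           // 0.000001
console.log(JSON.stringify(1e-7));               // 1e-7

// Integers within the IEEE-754 exact range serialize without decimals:
console.log(JSON.stringify(9007199254740992));   // 9007199254740992
```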
Occasionally applications need higher precision or longer integers than
offered by the current implementation of JSON Number
in ES6.
outlines how this can be achieved
in a portable and extensible way.
Although the previous step indeed normalized the representation of primitive
JSON data types, the result would not qualify as "canonicalized" since
Object properties are not in lexicographic (alphabetical) order.
Applied to the sample in ,
a properly canonicalized version should (with a
line wrap added for display purposes only) read as:
Note: \u20ac denotes the Euro character which, not
being ASCII, is currently not displayable in RFCs.
The rules for lexicographic sorting of JSON properties according to JCS are as follows:
Object properties are sorted in a recursive manner, which means that any JSON
child Object encountered MUST be subject to sorting as well.
JSON Array data MUST also be checked for
the presence of sortable JSON Object elements,
but array element order MUST NOT be changed.
When a JSON Object is about to have its properties
sorted, the following measures MUST be adhered to:
The sorting process is applied to the internal representation of
property strings. That is, their state before serialization.
Sorting of property strings depends on strings being internally represented
as arrays of 16‑bit unsigned integers, where each integer holds a
single UCS2/UTF-16 code unit. The sorting is
based on pure value comparisons, independent of locale settings.
Property strings either have different values at some index that is
a valid index for both strings, or their lengths are different, or both.
If they have different values at one or more index
positions, let k be the smallest such index; then the string whose
value at position k has the smaller value, as determined by using
the < operator, lexicographically precedes the other string.
If there is no index position at which they differ,
then the shorter string lexicographically precedes the longer string.
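Since ES6 string comparison with the < operator already operates on UTF-16 code units, the comparison above is what Array.prototype.sort() performs by default; a brief sketch:

```javascript
// Property name samples, including a non-BMP character (U+1010B) which is
// represented by the surrogate pair D800 DD0B and therefore sorts AFTER
// the Euro character (20AC) when comparing code units:
const keys = ["\u20ac", "dynamic", "\ud800\udd0b", "10", "1"];

// Default sort() compares UTF-16 code unit sequences with "<":
keys.sort();
console.log(keys); // [ '1', '10', 'dynamic', '€', '𐄋' ]
```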
The rationale for basing the sort algorithm on UCS2/UTF-16 code units is that
it maps directly to the string type in ECMAScript, Java and .NET.
Systems using another representation of string data will need to convert
JSON property strings into arrays of UCS2/UTF-16 code units before sorting.
Note: for the purpose of obtaining a deterministic property order, sorting on
UTF-8 or UTF-32 encoded data would also work, but the result would differ
(and thus be incompatible with this specification).
Finally, in order to create a platform independent representation,
the resulting JSON string data MUST be encoded in UTF-8.
Applied to the sample in , this
should yield the following bytes, here shown in hexadecimal notation:
This data is intended to be usable as input to cryptographic functions.
For other uses see .
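A sketch of this final step (using a hypothetical one-property sample; Node.js Buffer is shown, but TextEncoder works equally well in browsers):

```javascript
// Canonical JSON text containing a non-ASCII character:
const canonical = '{"tune":"\u20ac"}';

// Encode as UTF-8 and display the bytes in hexadecimal:
const utf8 = Buffer.from(canonical, "utf8");
console.log(utf8.toString("hex"));
// 7b2274756e65223a22e282ac227d  (the Euro character becomes e2 82 ac)
```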
This document has no IANA actions.
JSON parsers MUST check that input data conforms to the JSON
specification.
Building on ES6 Number normalization was
originally proposed by James Manger. This ultimately led to the
adoption of the entire ES6 serialization scheme for JSON primitives.
Other people who have contributed with valuable input to this specification include
Mike Jones, Mike Miller, Mike Samuel, Michal Wadas, Richard Gibson and Scott Ananian.
ECMAScript 2015 Language Specification, Ecma International
IEEE Standard for Floating-Point Arithmetic, IEEE
The Unicode Standard, Version 10.0.0, The Unicode Consortium
Chrome V8 Open Source JavaScript Engine, Google LLC
Node.js
Open Keystore
Keybase
The OpenAPI Initiative
XML Signature Syntax and Processing Version 1.1, W3C
Below is a functionally complete example of a JCS compliant canonicalizer
for usage with ES6 based systems.
Note: The primary purpose of this code is highlighting the canonicalization algorithm.
Using the full power of ES6 would reduce the code size considerably
but would also be more difficult to follow by non-experts.
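A minimal sketch of such a canonicalizer (not the specification's verbatim code; it leans on ES6 features like Object.keys() and omits error handling for non-serializable values):

```javascript
function canonicalize(object) {
  if (object === null || typeof object !== "object") {
    // Primitives and null: ES6 JSON.stringify() already emits
    // the fixed JCS representation.
    return JSON.stringify(object);
  }
  if (Array.isArray(object)) {
    // Arrays: canonicalize elements but keep their order.
    return "[" + object.map(canonicalize).join(",") + "]";
  }
  // Objects: sort property names on UTF-16 code units, then recurse.
  return "{" + Object.keys(object).sort()
    .map(key => JSON.stringify(key) + ":" + canonicalize(object[key]))
    .join(",") + "}";
}

console.log(canonicalize({b: [2, 1], a: "\u20ac"}));
// {"a":"€","b":[2,1]}
```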
The following table holds a set of ES6 Number serialization samples,
including some edge cases. The column "ES6 Internal" refers to the internal
ES6 representation of the Number data type which is based on the
IEEE-754 standard using 64-bit (double precision) values,
here expressed in hexadecimal.
Note: for maximum compliance with ECMAScript's JSON object, values that are
to be interpreted as true integers,
SHOULD be in the range -9007199254740991 to 9007199254740991.
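The range cited above matches ES6's safe integer bounds, outside of which distinct integers can collapse to the same IEEE-754 value:

```javascript
// ES6 exposes the "true integer" bounds directly:
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991
console.log(Number.MIN_SAFE_INTEGER);  // -9007199254740991

// Beyond this range, precision is lost: 2^53 and 2^53 + 1 compare equal.
console.log(9007199254740992 === 9007199254740993); // true
```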
Since the result from the canonicalization process (see )
is fully valid JSON, it can also be used as a "Wire Format".
However, this is just an option, since cryptographic schemes
based on JCS would in most cases not depend on
externally supplied JSON data already being canonicalized.
In fact, the ES6 standard way of serializing objects using
JSON.stringify() produces a
more "logical" format, where properties are
kept in the order they were created or received. The
example below shows an address record which could benefit from
ES6 standard serialization:
Using canonicalization, the properties above would be output in the order
"address", "city", "name", "state" and "zip", which adds fuzziness
to the data from a human (developer or technical support) perspective.
That is, for many applications, canonicalization would only be used internally
for creating a "hashable" representation of the data needed for cryptographic
operations.
Note that if message size is not a concern, you may even send "Pretty Printed"
JSON data on the wire (since whitespace is always ignored by the canonicalization process).
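This can be shown with a hypothetical address record: pretty-printed and compact wire data parse to the same object and therefore canonicalize identically.

```javascript
// Pretty-printed wire format (two-space indentation):
const pretty = JSON.stringify(
  {name: "John Doe", address: "2000 Sunset Boulevard"}, null, 2);

// Equivalent compact form:
const compact = '{"name":"John Doe","address":"2000 Sunset Boulevard"}';

// Whitespace disappears during parsing, before canonicalization runs:
console.log(JSON.stringify(JSON.parse(pretty)) === compact); // true
```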
There are two major issues associated with the
JSON Number type, here illustrated by the following
sample object:
Although the sample above conforms to JSON (according to ),
there are some practical hurdles to consider:
Standard JSON parsers rarely process "giantNumber" in a meaningful way.
64-bit integers like "int64Max" normally pass through parsers, but in
systems like ES6, at the expense of lost precision.
Another issue is that parsers typically would use different schemes for handling
"giantNumber" and "int64Max". In addition, monetary data like "payMeThis" would
presumably not rely on a floating point system due to rounding issues with respect
to decimal arithmetic.
The only known (to the author) way of handling this kind of "overloading" of the
Number type (at least in an extensible manner) is through
mapping mechanisms, instructing parsers what to do with different properties
based on their name. However, this greatly limits the value of using the
Number type outside of its original,
somewhat constrained, JavaScript context.
For usage with JCS (and in fact for any usage of JSON by multiple
parties potentially using independently developed software), numbers that do
not have a natural place in the current JSON ecosystem MUST be wrapped
using the JSON String type. This is close to
a de-facto standard for open systems.
Aided by a mapping system; be it programmatic like
or declarative schemes like OpenAPI ,
there are no real limits, not even when using ES6.
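A sketch of such an application-level mapping, using a hypothetical "int64Max" property and ES6's BigInt as the target type:

```javascript
// The 64-bit value travels as a JSON String, so no precision is lost
// in transit through ordinary JSON parsers:
const message = JSON.parse('{"int64Max": "9223372036854775807"}');

// The application layer (guided by the property name) maps it to BigInt:
const int64Max = BigInt(message.int64Max);
console.log(int64Max + 1n); // 9223372036854775808n
```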
The optimal solution is integrating support for JCS directly
in JSON parsers and serializers. However, this is not always realistic.
Fortunately, JCS support can be provided through externally supplied
canonicalizer software, enabling signature creation schemes like the following:
Create the data to be signed.
Serialize the data using existing JSON tools.
Let the external canonicalizer process the serialized data and return canonicalized result data.
Sign the canonicalized data.
Add the resulting signature value to the original JSON data through a designated signature property.
Serialize the completed (now signed) JSON object using existing JSON tools.
A compatible signature verification scheme would then be as follows:
Parse the signed JSON data using existing JSON tools.
Read and save the signature value from the designated signature property.
Remove the signature property from the parsed JSON object.
Serialize the remaining JSON data using existing JSON tools.
Let the external canonicalizer process the serialized data and return canonicalized result data.
Verify that the canonicalized data matches the saved signature value
using the algorithm and key used for creating the signature.
A canonicalizer like the one above is effectively only a "filter", potentially usable with
a multitude of quite different cryptographic schemes.
Using an integrated canonicalizer, you would eliminate the serialization and
parsing steps before canonicalization, for both processes. That is,
canonicalization would typically be an additional "mode"
for a JSON serializer.
There are (and have been) other efforts creating "Canonical JSON".
Below is a list of URLs to some of them:
https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00
https://gibson042.github.io/canonicaljson-spec/
https://www.npmjs.com/package/canonicalize
http://wiki.laptop.org/go/Canonical_JSON
The JCS specification is currently developed at
https://github.com/cyberphone/json-canonicalization.
The portal also provides software for testing.