Network Working Group                                        A. Rundgren
Internet-Draft                                               Independent
Intended status: Informational                                 B. Jordan
Expires: 18 April 2020                              Symantec Corporation
                                                              S. Erdtman
                                                              Spotify AB
                                                         16 October 2019

                   JSON Canonicalization Scheme (JCS)
             draft-rundgren-json-canonicalization-scheme-14

Abstract

   Cryptographic operations like hashing and signing need the data to be
   expressed in an invariant format so that the operations are reliably
   repeatable. One way to address this is to create a canonical
   representation of the data. Canonicalization also permits data to be
   exchanged in its original form on the "wire" while cryptographic
   operations performed on the canonicalized counterpart of the data in
   the producer and consumer end points generate consistent results.
   This document describes the JSON Canonicalization Scheme (JCS). The
   JCS specification defines how to create a canonical representation of
   JSON data by building on the strict serialization methods for JSON
   primitives defined by ECMAScript, constraining JSON data to the
   I-JSON subset, and by using deterministic property sorting.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 18 April 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Detailed Operation
     3.1.  Creation of Input Data
     3.2.  Generation of Canonical JSON Data
       3.2.1.  Whitespace
       3.2.2.  Serialization of Primitive Data Types
         3.2.2.1.  Serialization of Literals
         3.2.2.2.  Serialization of Strings
         3.2.2.3.  Serialization of Numbers
       3.2.3.  Sorting of Object Properties
       3.2.4.  UTF-8 Generation
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  ES6 Sample Canonicalizer
   Appendix B.  Number Serialization Samples
   Appendix C.  Canonicalized JSON as "Wire Format"
   Appendix D.  Dealing with Big Numbers
   Appendix E.  String Subtype Handling
     E.1.  Subtypes in Arrays
   Appendix F.  Implementation Guidelines
   Appendix G.  Open Source Implementations
   Appendix H.  Other JSON Canonicalization Efforts
   Appendix I.  Development Portal
   Appendix J.  Document History
   Authors' Addresses

1.  Introduction

   This document describes the JSON Canonicalization Scheme (JCS).
   The JCS specification defines how to create a canonical
   representation of JSON [RFC8259] data by building on the strict
   serialization methods for JSON primitives defined by ECMAScript
   [ES6], constraining JSON data to the I-JSON [RFC7493] subset, and by
   using deterministic property sorting. The output from JCS is a
   "Hashable" representation of JSON data that can be used by
   cryptographic methods. The subsequent paragraphs outline the primary
   design considerations.

   Cryptographic operations like hashing and signing need the data to be
   expressed in an invariant format so that the operations are reliably
   repeatable. One way to accomplish this is to convert the data into a
   format that has a simple and fixed representation, like Base64Url
   [RFC4648]. This is how JWS [RFC7515] addressed this issue.

   Another solution is to create a canonical version of the data,
   similar to what was done for the XML Signature [XMLDSIG] standard.
   The primary advantage of a canonicalizing scheme is that data can
   be kept in its original form. This is the core rationale behind JCS.
   Put another way, using canonicalization enables a JSON Object to
   remain a JSON Object even after being signed. This can simplify
   system design, documentation, and logging.

   To avoid "reinventing the wheel", JCS relies on the serialization of
   JSON primitives (strings, numbers and literals), as defined by
   ECMAScript (aka JavaScript) beginning with version 6 [ES6], hereafter
   referred to as "ES6".

   Seasoned XML developers may recall difficulties getting XML
   signatures to validate. This was usually due to different
   interpretations of the quite intricate XML canonicalization rules as
   well as of the equally complex Web Services security standards. The
   reasons why JCS should not suffer from similar issues are:

   o  The absence of a namespace concept and default values.

   o  Constraining data to the I-JSON [RFC7493] subset. This eliminates
      the need for specific parsers for dealing with canonicalization.

   o  JCS compatible serialization of JSON primitives is currently
      supported by most Web browsers as well as by Node.js [NODEJS].

   o  The full JCS specification is currently supported by multiple Open
      Source implementations (see Appendix G). See also Appendix F.

   JCS is compatible with some existing systems relying on JSON
   canonicalization such as JWK Thumbprint [RFC7638] and Keybase
   [KEYBASE].

   For potential uses outside of cryptography see [JSONCOMP].

   The intended audiences of this document are JSON tool vendors, as
   well as designers of JSON based cryptographic solutions. The reader
   is assumed to be knowledgeable in ECMAScript including the "JSON"
   object.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Detailed Operation

   This section describes different issues related to creating a
   canonical JSON representation, and how they are addressed by JCS.

   Appendix F describes the RECOMMENDED way of adding JCS support to
   existing JSON tools.

3.1.  Creation of Input Data

   Data to be serialized is usually created by:

   o  Parsing previously generated JSON data.

   o  Programmatically creating data.

   Irrespective of the method used, the data to be serialized MUST be
   adapted for I-JSON [RFC7493] formatting, which implies the following:

   o  JSON Objects MUST NOT exhibit duplicate property names.

   o  JSON String data MUST be expressible as Unicode [UNICODE].

   o  JSON Number data MUST be expressible as IEEE-754 [IEEE754] double
      precision values.
      For applications needing higher precision or longer integers than
      offered by IEEE-754 double precision, Appendix D outlines how such
      requirements can be supported in an interoperable and extensible
      way.

   An additional constraint is that parsed JSON String data MUST NOT be
   altered during subsequent serializations. For more information see
   Appendix E.

   Note: although the Unicode standard offers the possibility of
   rearranging certain character sequences, referred to as "Unicode
   Normalization" (https://www.unicode.org/reports/tr15/), JCS' string
   processing does not take this into consideration. That is, all
   components involved in a scheme depending on JCS MUST preserve
   Unicode string data "as is".

3.2.  Generation of Canonical JSON Data

   The following subsections describe the steps required to create a
   canonical JSON representation of the data elaborated on in the
   previous section.

   Appendix A shows sample code for an ES6 based canonicalizer, matching
   the JCS specification.

3.2.1.  Whitespace

   Whitespace between JSON tokens MUST NOT be emitted.

3.2.2.  Serialization of Primitive Data Types

   Assume a JSON object as follows is parsed:

   {
     "numbers": [333333333.33333329, 1E30, 4.50,
                 2e-3, 0.000000000000000000000000001],
     "string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
     "literals": [null, true, false]
   }

   If the parsed data is subsequently serialized using a serializer
   compliant with ES6's "JSON.stringify()", the result would (with a
   line wrap added for display purposes only) be rather divergent with
   respect to the original data:

   {"numbers":[333333333.3333333,1e+30,4.5,0.002,1e-27],"string":
   "€$\u000f\nA'B\"\\\\\"/","literals":[null,true,false]}

   The difference between the parsed data and its serialized
   counterpart is due to a wide tolerance on input data (as defined by
   JSON [RFC8259]), while output data (as defined by ES6) has a fixed
   representation. As can be seen in the example, numbers are subject
   to rounding as well.

   The following subsections describe the serialization of primitive
   JSON data types according to JCS. This part is identical to that of
   ES6. In the (unlikely) event that a future version of ECMAScript
   would invalidate any of the following serialization methods, it will
   be up to the developer community to either stick to this
   specification or create a new specification.

3.2.2.1.  Serialization of Literals

   In accordance with JSON [RFC8259], the literals "null", "true", and
   "false" MUST be serialized as null, true, and false respectively.

3.2.2.2.  Serialization of Strings

   For JSON String data (which includes JSON Object property names as
   well), each Unicode code point MUST be serialized as described below
   (see section 24.3.2.2 of [ES6]):

   o  If the Unicode value falls within the traditional ASCII control
      character range (U+0000 through U+001F), it MUST be serialized
      using lowercase hexadecimal Unicode notation (\uhhhh) unless it is
      in the set of predefined JSON control characters U+0008, U+0009,
      U+000A, U+000C or U+000D which MUST be serialized as \b, \t, \n,
      \f and \r respectively.

   o  If the Unicode value is outside of the ASCII control character
      range, it MUST be serialized "as is" unless it is equivalent to
      U+005C (\) or U+0022 (") which MUST be serialized as \\ and \"
      respectively.

   Finally, the resulting sequence of Unicode code points MUST be
   enclosed in double quotes (").

   Note: since invalid Unicode data like "lone surrogates" (e.g.
   U+DEAD) may lead to interoperability issues including broken
   signatures, occurrences of such data MUST cause a compliant JCS
   implementation to terminate with an appropriate error.

3.2.2.3.  Serialization of Numbers

   ES6 builds on the IEEE-754 [IEEE754] double precision standard for
   representing JSON Number data. Such data MUST be serialized
   according to section 7.1.12.1 of [ES6] including the "Note 2"
   enhancement.

   Due to the relative complexity of this part, the algorithm itself is
   not included in this document. For implementers of JCS compliant
   number serialization, Google's V8 [V8] may serve as a reference.

   Another compatible number serialization reference implementation is
   Ryu [RYU], which is used by the JCS open source Java implementation
   mentioned in Appendix G. Appendix B holds a set of IEEE-754 sample
   values and their corresponding JSON serialization.
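
   The sample values in Appendix B can be spot-checked on an
   ES6-compatible engine. The following is a minimal sketch and not part
   of the JCS specification: the helper name "serializeIeee754Hex" is
   hypothetical, and the code assumes an engine that supports BigInt and
   DataView.setBigUint64(), which are newer than ES6 itself.

```javascript
// Hypothetical helper (not defined by JCS): decodes a 64-bit IEEE-754
// value given as 16 hexadecimal digits and serializes it the way
// ES6's JSON.stringify() serializes a Number.
function serializeIeee754Hex(hex) {
    if (!/^[0-9a-fA-F]{16}$/.test(hex)) {
        throw new TypeError('Expected exactly 16 hexadecimal digits');
    }
    // DataView reads and writes big-endian by default, which matches
    // the left-to-right hexadecimal notation used in Appendix B.
    const view = new DataView(new ArrayBuffer(8));
    view.setBigUint64(0, BigInt('0x' + hex));
    const value = view.getFloat64(0);
    // NaN and Infinity are not permitted in JSON, so they are rejected
    // instead of being silently serialized as "null".
    if (!Number.isFinite(value)) {
        throw new RangeError('NaN and Infinity are not permitted in JSON');
    }
    return JSON.stringify(value);
}

// Usage, with expected results taken from the Appendix B samples:
console.log(serializeIeee754Hex('4340000000000000')); // 9007199254740992
console.log(serializeIeee754Hex('8000000000000000')); // 0 (minus zero)
```

   Rejecting non-finite values explicitly is a design choice of this
   sketch. A plain JSON.stringify() call would emit "null" for them,
   which would hide the error condition.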

   Note: since "NaN" (Not a Number) and "Infinity" are not permitted in
   JSON, occurrences of "NaN" or "Infinity" MUST cause a compliant JCS
   implementation to terminate with an appropriate error.

3.2.3.  Sorting of Object Properties

   Although the previous step normalized the representation of primitive
   JSON data types, the result would not yet qualify as "canonical"
   since JSON Object properties are not in lexicographic (alphabetical)
   order.

   Applied to the sample in Section 3.2.2, a properly canonicalized
   version should (with a line wrap added for display purposes only)
   read as:

   {"literals":[null,true,false],"numbers":[333333333.3333333,
   1e+30,4.5,0.002,1e-27],"string":"€$\u000f\nA'B\"\\\\\"/"}

   The rules for lexicographic sorting of JSON Object properties
   according to JCS are as follows:

   o  JSON Object properties MUST be sorted recursively, which means
      that JSON child Objects MUST have their properties sorted as well.

   o  JSON Array data MUST also be scanned for the presence of JSON
      Objects (if an object is found then its properties MUST be
      sorted), but array element order MUST NOT be changed.

   When a JSON Object is about to have its properties sorted, the
   following measures MUST be adhered to:

   o  The sorting process is applied to property name strings in their
      "raw" (unescaped) form. That is, a newline character is treated
      as U+000A.

   o  Property name strings to be sorted are formatted as arrays of
      UTF-16 [UNICODE] code units. The sorting is based on pure value
      comparisons, where code units are treated as unsigned integers,
      independent of locale settings.

   o  Property name strings either have different values at some index
      that is a valid index for both strings, or their lengths are
      different, or both. If they have different values at one or more
      index positions, let k be the smallest such index; then the string
      whose value at position k has the smaller value, as determined by
      using the < operator, lexicographically precedes the other string.
      If there is no index position at which they differ, then the
      shorter string lexicographically precedes the longer string.

   In plain English this means that property names are sorted in
   ascending order like the following:

   ""
   "a"
   "aa"
   "ab"

   The rationale for basing the sorting algorithm on UTF-16 code units
   is that it maps directly to the string type in ECMAScript (featured
   in Web browsers and Node.js), Java and .NET. In addition, JSON only
   supports escape sequences expressed as UTF-16 code units, making
   knowledge and handling of such data a necessity anyway. Systems
   using another internal representation of string data will need to
   convert JSON property name strings into arrays of UTF-16 code units
   before sorting. The conversion from UTF-8 or UTF-32 to UTF-16 is
   defined by the Unicode [UNICODE] standard.

   The following test data can be used for verifying the correctness of
   the sorting scheme in a JCS implementation. JSON test data:

   {
     "\u20ac": "Euro Sign",
     "\r": "Carriage Return",
     "\ufb33": "Hebrew Letter Dalet With Dagesh",
     "1": "One",
     "\ud83d\ude00": "Emoji: Grinning Face",
     "\u0080": "Control",
     "\u00f6": "Latin Small Letter O With Diaeresis"
   }

   Expected argument order after sorting property strings:

   "Carriage Return"
   "One"
   "Control"
   "Latin Small Letter O With Diaeresis"
   "Euro Sign"
   "Emoji: Grinning Face"
   "Hebrew Letter Dalet With Dagesh"

   Note: for the purpose of obtaining a deterministic property order,
   sorting on UTF-8 or UTF-32 encoded data would also work, but the
   outcome for JSON data like above would differ and thus be
   incompatible with this specification. However, in practice, property
   names are rarely defined outside of 7-bit ASCII, making it possible
   to sort on string data in UTF-8 or UTF-32 format without conversions
   to UTF-16 and still be compatible with JCS. Whether this is a viable
   option depends on the environment in which JCS is used.

3.2.4.  UTF-8 Generation

   Finally, in order to create a platform independent representation,
   the result of the preceding step MUST be encoded in UTF-8.

   Applied to the sample in Section 3.2.3 this should yield the
   following bytes, here shown in hexadecimal notation:

   7b 22 6c 69 74 65 72 61 6c 73 22 3a 5b 6e 75 6c 6c 2c 74 72
   75 65 2c 66 61 6c 73 65 5d 2c 22 6e 75 6d 62 65 72 73 22 3a
   5b 33 33 33 33 33 33 33 33 33 2e 33 33 33 33 33 33 33 2c 31
   65 2b 33 30 2c 34 2e 35 2c 30 2e 30 30 32 2c 31 65 2d 32 37
   5d 2c 22 73 74 72 69 6e 67 22 3a 22 e2 82 ac 24 5c 75 30 30
   30 66 5c 6e 41 27 42 5c 22 5c 5c 5c 5c 5c 22 2f 22 7d

   This data is intended to be usable as input to cryptographic methods.

4.  IANA Considerations

   This document has no IANA actions.

5.  Security Considerations

   It is vital to perform sanity checks on input data to avoid
   overflowing buffers and similar things that could affect the
   integrity of the system.

   When JCS is applied to signature schemes like the one described in
   Appendix F, applications MUST perform the following operations before
   acting upon received data:

   1.  Parse the JSON data and verify that it adheres to I-JSON.

   2.  Verify the data for correctness according to the conventions
       defined by the ecosystem where it is to be used. This also
       includes locating the property holding the signature data.

   3.  Verify the signature.

   If any of these steps fail, the operation in progress MUST be
   aborted.

6.  Acknowledgements

   Building on ES6 Number serialization was originally proposed by
   James Manger.
   This ultimately led to the adoption of the entire ES6
   serialization scheme for JSON primitives.

   Other people who have contributed valuable input to this
   specification include Scott Ananian, Tim Bray, Ben Campbell, Adrian
   Farell, Richard Gibson, Bron Gondwana, John-Mark Gurney, John Levine,
   Mark Miller, Matthew Miller, Mike Jones, Mark Nottingham,
   Mike Samuel, Jim Schaad, Robert Tupelo-Schneck and Michal Wadas.

   For carrying out real world concept verification, the software and
   support for number serialization provided by Ulf Adams,
   Tanner Gooding and Remy Oudompheng were very helpful.

7.  References

7.1.  Normative References

   [ES6]      Ecma International, "ECMAScript 2015 Language
              Specification", June 2015.

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic",
              August 2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7493]  Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
              DOI 10.17487/RFC7493, March 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              12.1.0", May 2019.

7.2.  Informative References

   [JSONCOMP] A. Rundgren, ""Comparable" JSON - Work in progress".

   [KEYBASE]  "Keybase".

   [NODEJS]   "Node.js".

   [OPENAPI]  "The OpenAPI Initiative".

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC7515]  Jones, M., Bradley, J., and N. Sakimura, "JSON Web
              Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May
              2015.

   [RFC7638]  Jones, M. and N. Sakimura, "JSON Web Key (JWK)
              Thumbprint", RFC 7638, DOI 10.17487/RFC7638, September
              2015.

   [RYU]      Ulf Adams, "Ryu floating point number serializing
              algorithm".

   [V8]       Google LLC, "Chrome V8 Open Source JavaScript Engine".

   [XMLDSIG]  W3C, "XML Signature Syntax and Processing Version 1.1".

Appendix A.  ES6 Sample Canonicalizer

   Below is an example of a JCS canonicalizer for usage with ES6 based
   systems:

   ////////////////////////////////////////////////////////////
   // Since the primary purpose of this code is highlighting //
   // the core of the JCS algorithm, error handling and      //
   // UTF-8 generation were not implemented                  //
   ////////////////////////////////////////////////////////////
   var canonicalize = function(object) {

       var buffer = '';
       serialize(object);
       return buffer;

       function serialize(object) {
           if (object === null || typeof object !== 'object' ||
               object.toJSON != null) {
               /////////////////////////////////////////////////
               // Primitive type or toJSON - Use ES6/JSON     //
               /////////////////////////////////////////////////
               buffer += JSON.stringify(object);

           } else if (Array.isArray(object)) {
               /////////////////////////////////////////////////
               // Array - Maintain element order              //
               /////////////////////////////////////////////////
               buffer += '[';
               let next = false;
               object.forEach((element) => {
                   if (next) {
                       buffer += ',';
                   }
                   next = true;
                   /////////////////////////////////////////
                   // Array element - Recursive expansion //
                   /////////////////////////////////////////
                   serialize(element);
               });
               buffer += ']';

           } else {
               /////////////////////////////////////////////////
               // Object - Sort properties before serializing //
               /////////////////////////////////////////////////
               buffer += '{';
562 let next = false; 563 Object.keys(object).sort().forEach((property) => { 564 if (next) { 565 buffer += ','; 566 } 567 next = true; 568 /////////////////////////////////////////////// 569 // Property names are strings - Use ES6/JSON // 570 /////////////////////////////////////////////// 571 buffer += JSON.stringify(property); 572 buffer += ':'; 573 ////////////////////////////////////////// 574 // Property value - Recursive expansion // 575 ////////////////////////////////////////// 576 serialize(object[property]); 577 }); 578 buffer += '}'; 579 } 580 } 581 }; 583 Appendix B. Number Serialization Samples 585 The following table holds a set of ES6 compatible Number 586 serialization samples, including some edge cases. The column 587 "IEEE-754" refers to the internal ES6 representation of the Number 588 data type which is based on the IEEE-754 [IEEE754] standard using 589 64-bit (double precision) values, here expressed in hexadecimal. 591 ╒══════════════════╤═══════════════════════════╤═════════════════════╕ 592 │ IEEE-754 │ JSON Representation │ Comment │ 593 ╞══════════════════╪═══════════════════════════╪═════════════════════╡ 594 │ 0000000000000000 │ 0 │ Zero │ 595 ├──────────────────┼───────────────────────────┼─────────────────────┤ 596 │ 8000000000000000 │ 0 │ Minus zero │ 597 ├──────────────────┼───────────────────────────┼─────────────────────┤ 598 │ 0000000000000001 │ 5e-324 │ Min pos number │ 599 ├──────────────────┼───────────────────────────┼─────────────────────┤ 600 │ 8000000000000001 │ -5e-324 │ Min neg number │ 601 ├──────────────────┼───────────────────────────┼─────────────────────┤ 602 │ 7fefffffffffffff │ 1.7976931348623157e+308 │ Max pos number │ 603 ├──────────────────┼───────────────────────────┼─────────────────────┤ 604 │ ffefffffffffffff │ -1.7976931348623157e+308 │ Max neg number │ 605 ├──────────────────┼───────────────────────────┼─────────────────────┤ 606 │ 4340000000000000 │ 9007199254740992 │ Max pos integer (1) │ 607 
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ c340000000000000 │ -9007199254740992         │ Max neg integer (1) │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 4430000000000000 │ 295147905179352830000     │ ~2**68 (2)          │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 7fffffffffffffff │                           │ NaN (3)             │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 7ff0000000000000 │                           │ Infinity (3)        │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 44b52d02c7e14af5 │ 9.999999999999997e+22     │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 44b52d02c7e14af6 │ 1e+23                     │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 44b52d02c7e14af7 │ 1.0000000000000001e+23    │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 444b1ae4d6e2ef4e │ 999999999999999700000     │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 444b1ae4d6e2ef4f │ 999999999999999900000     │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 444b1ae4d6e2ef50 │ 1e+21                     │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 3eb0c6f7a0b5ed8c │ 9.999999999999997e-7      │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 3eb0c6f7a0b5ed8d │ 0.000001                  │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 41b3de4355555553 │ 333333333.3333332         │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 41b3de4355555554 │ 333333333.33333325        │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 41b3de4355555555 │ 333333333.3333333         │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 41b3de4355555556 │ 333333333.3333334         │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 41b3de4355555557 │ 333333333.33333343        │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ becbf647612f3696 │ -0.0000033333333333333333 │                     │
   ├──────────────────┼───────────────────────────┼─────────────────────┤
   │ 43143ff3c1cb0959 │ 1424953923781206.2        │ Round to even (4)   │
   └──────────────────┴───────────────────────────┴─────────────────────┘

   Notes:

   (1)  For maximum compliance with the ES6 "JSON" object, values that
        are to be interpreted as true integers SHOULD be in the range
        -9007199254740991 to 9007199254740991. However, how numbers are
        used in applications does not affect the JCS algorithm.

   (2)  Although a set of specific integers like 2**68 could be regarded
        as having extended precision, the JCS/ES6 number serialization
        algorithm does not take this into consideration.

   (3)  Invalid. See Section 3.2.2.3.

   (4)  This number is exactly 1424953923781206.25 but, per the
        "Note 2" rule mentioned in Section 3.2.2.3, is truncated and
        rounded to the closest even value.

Appendix C. Canonicalized JSON as "Wire Format"

   Since the result of the canonicalization process (see
   Section 3.2.4) is fully valid JSON, it can also be used as
   "Wire Format". However, this is just an option, since cryptographic
   schemes based on JCS would in most cases not depend on externally
   supplied JSON data already being canonicalized.

   In fact, the ES6 standard way of serializing objects using
   "JSON.stringify()" produces a more "logical" format, where properties
   are kept in the order they were created or received.
   The example below shows an address record which could benefit from
   ES6 standard serialization:

   {
     "name": "John Doe",
     "address": "2000 Sunset Boulevard",
     "city": "Los Angeles",
     "zip": "90001",
     "state": "CA"
   }

   Using canonicalization, the properties above would be output in the
   order "address", "city", "name", "state", and "zip", which adds
   fuzziness to the data from a human (developer or technical support)
   perspective. Canonicalization also converts JSON data into a single
   line of text, which may be less than ideal for debugging and logging.

Appendix D. Dealing with Big Numbers

   There are several issues associated with the JSON Number type, here
   illustrated by the following sample object:

   {
     "giantNumber": 1.4e+9999,
     "payMeThis": 26000.33,
     "int64Max": 9223372036854775807
   }

   Although the sample above conforms to JSON [RFC8259], applications
   would normally use different native data types for storing
   "giantNumber" and "int64Max". In addition, monetary data like
   "payMeThis" would presumably not rely on floating point data types
   due to rounding issues with respect to decimal arithmetic.

   The established way of handling this kind of "overloading" of the
   JSON Number type (at least in an extensible manner) is through
   mapping mechanisms, instructing parsers what to do with different
   properties based on their name. However, this greatly limits the
   value of using the JSON Number type outside of its original, somewhat
   constrained JavaScript context. The ES6 "JSON" object does not
   support mappings to JSON Number either.

   Due to the above, numbers that do not have a natural place in the
   current JSON ecosystem MUST be wrapped using the JSON String type.
   This is close to a de-facto standard for open systems.
   This is also applicable for other data types that do not have direct
   support in JSON, like "DateTime" objects as described in Appendix E.

   Aided by a system using the JSON String type, be it programmatic like

   var obj = JSON.parse('{"giantNumber": "1.4e+9999"}');
   var biggie = new BigNumber(obj.giantNumber);

   or declarative schemes like OpenAPI [OPENAPI], JCS imposes no limits
   on applications, including when using ES6.

Appendix E. String Subtype Handling

   Due to the limited set of data types featured in JSON, the JSON
   String type is commonly used for holding subtypes. This can,
   depending on the JSON parsing method, lead to interoperability
   problems that MUST be dealt with by JCS compliant applications
   targeting a wider audience.

   Assume you want to parse a JSON object where the schema designer
   assigned the property "big" for holding a "BigInt" subtype and "time"
   for holding a "DateTime" subtype, while "val" is supposed to be a
   JSON Number compliant with JCS. The following example shows such an
   object:

   {
     "time": "2019-01-28T07:45:10Z",
     "big": "055",
     "val": 3.5
   }

   Parsing of this object can be accomplished by the following ES6
   statement:

   var object = JSON.parse(JSON_object_featured_as_a_string);

   After parsing, the actual data can be extracted, which for subtypes
   also involves a conversion step using the result of the parsing
   process (an ECMAScript object) as input:

   ... = new Date(object.time); // Date object
   ... = BigInt(object.big);    // Big integer
   ... = object.val;            // JSON/JS number

   Note that the "BigInt" data type is currently only natively supported
   by V8 [V8].
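
   The parse-then-convert steps above can be sketched as a runnable
   snippet. This is a minimal illustration only, assuming a JavaScript
   engine with "BigInt" support; the input string is the sample object
   from this appendix, and the local names "time", "big", and "val" are
   chosen here for illustration and are not part of JCS.

```javascript
// Sample input from this appendix, held in the variable named in the text.
const JSON_object_featured_as_a_string =
  '{"time":"2019-01-28T07:45:10Z","big":"055","val":3.5}';

// Step 1: plain parsing. Subtypes remain ordinary JSON strings.
const object = JSON.parse(JSON_object_featured_as_a_string);

// Step 2: conversion to native types, leaving "object" itself untouched.
const time = new Date(object.time); // Date object
const big = BigInt(object.big);     // Big integer (assumes BigInt support)
const val = object.val;             // JSON/JS number

console.log(time.toISOString()); // 2019-01-28T07:45:10.000Z
console.log(big.toString());     // 55
console.log(object.big);         // 055 (the original string is preserved)
```

   Because the conversions produce new values, the parsed object still
   holds the strings exactly as they were received.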

   Canonicalization of "object" using the sample code in Appendix A
   would return the following string:

   {"big":"055","time":"2019-01-28T07:45:10Z","val":3.5}

   Although this is (with respect to JCS) technically correct, there is
   another way of parsing JSON data, which can also be used with
   ECMAScript, as shown below:

   // "BigInt" requires the following code to become JSON serializable
   BigInt.prototype.toJSON = function() {
       return this.toString();
   };

   // JSON parsing using a "stream" based method
   var object = JSON.parse(JSON_object_featured_as_a_string,
       (k,v) => k == 'time' ? new Date(v) : k == 'big' ? BigInt(v) : v
   );

   If you now apply the canonicalizer in Appendix A to "object", the
   following string would be generated:

   {"big":"55","time":"2019-01-28T07:45:10.000Z","val":3.5}

   In this case the string arguments for "big" and "time" have changed
   with respect to the original, presumably making an application
   depending on JCS fail.

   The reason for the deviation is that in stream and schema based JSON
   parsers, the original "string" argument is typically replaced on the
   fly by the native subtype, which when serialized may exhibit a
   different and platform dependent pattern.

   That is, stream and schema based parsing MUST treat subtypes as
   "pure" (immutable) JSON String types, and perform the actual
   conversion to the designated native type in a subsequent step. In
   modern programming platforms like Go, Java and C# this can be
   achieved with moderate effort by combining annotations, getters and
   setters.
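
   The "pure" string rule can also be sketched in ECMAScript. The
   wrapper class "ParsedRecord" and its "raw" property are hypothetical
   names introduced only for this illustration; the sketch assumes
   "BigInt" support and is one possible arrangement, not a prescribed
   one.

```javascript
// Hypothetical wrapper: keeps subtype properties as the original JSON
// strings and converts them only when the application reads them.
class ParsedRecord {
  constructor(jsonText) {
    // No reviver is used, so "time" and "big" stay untouched strings.
    this.raw = JSON.parse(jsonText);
  }
  get time() { return new Date(this.raw.time); } // converted on demand
  get big() { return BigInt(this.raw.big); }     // converted on demand
  get val() { return this.raw.val; }             // plain JSON Number
}

const record = new ParsedRecord(
  '{"time":"2019-01-28T07:45:10Z","big":"055","val":3.5}');

// The application works with native types...
console.log(String(record.big * BigInt(2))); // 110

// ...while serialization sees the strings exactly as received.
console.log(JSON.stringify(record.raw));
// {"time":"2019-01-28T07:45:10Z","big":"055","val":3.5}
```

   In this arrangement it is the untouched "record.raw" object that
   would be handed to a canonicalizer, so "055" is never rewritten as
   "55".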

   Below is an example in C#/Json.NET showing a part of a class that is
   serializable as a JSON Object:

   // The "pure" string solution uses a local
   // string variable for JSON serialization while
   // exposing another type to the application
   [JsonProperty("amount")]
   private string _amount;

   [JsonIgnore]
   public decimal Amount {
       get { return decimal.Parse(_amount); }
       set { _amount = value.ToString(); }
   }

   In an application, "Amount" can be accessed as any other property
   while it is actually represented by a quoted string in JSON contexts.

   Note: the example above also addresses the constraints on numeric
   data implied by I-JSON (the C# "decimal" data type has quite
   different characteristics compared to IEEE-754 double precision).

E.1. Subtypes in Arrays

   Since the JSON Array construct permits mixing arbitrary JSON data
   types, custom parsing and serialization code may be required to cope
   with subtypes anyway.

Appendix F. Implementation Guidelines

   The optimal solution is integrating support for JCS directly in JSON
   serializers (parsers need no changes). That is, canonicalization
   would just be an additional "mode" for a JSON serializer. However,
   this is currently not the case. Fortunately, JCS support can be
   introduced through externally supplied canonicalizer software acting
   as a post processor to existing JSON serializers. This arrangement
   also relieves the JCS implementer from having to deal with how
   underlying data is to be represented in JSON.

   The post processor concept enables signature creation schemes like
   the following:

   1. Create the data to be signed.

   2. Serialize the data using existing JSON tools.

   3. Let the external canonicalizer process the serialized data and
      return canonicalized result data.

   4. Sign the canonicalized data.

   5. Add the resulting signature value to the original JSON data
      through a designated signature property.

   6. Serialize the completed (now signed) JSON object using existing
      JSON tools.

   A compatible signature verification scheme would then be as follows:

   1. Parse the signed JSON data using existing JSON tools.

   2. Read and save the signature value from the designated signature
      property.

   3. Remove the signature property from the parsed JSON object.

   4. Serialize the remaining JSON data using existing JSON tools.

   5. Let the external canonicalizer process the serialized data and
      return canonicalized result data.

   6. Verify that the canonicalized data matches the saved signature
      value using the algorithm and key used for creating the
      signature.

   A canonicalizer like the one above is effectively only a "filter",
   potentially usable with a multitude of quite different cryptographic
   schemes.

   Using a JSON serializer with integrated JCS support, the
   serialization performed before the canonicalization step could be
   eliminated for both processes.

Appendix G. Open Source Implementations

   The following Open Source implementations have been verified to be
   compatible with JCS:

   * JavaScript: https://www.npmjs.com/package/canonicalize

   * Java: https://github.com/erdtman/java-json-canonicalization

   * Go: https://github.com/cyberphone/json-canonicalization/tree/master/go

   * .NET/C#: https://github.com/cyberphone/json-canonicalization/tree/master/dotnet

   * Python: https://github.com/cyberphone/json-canonicalization/tree/master/python3

Appendix H. Other JSON Canonicalization Efforts

   There are (and have been) other efforts creating "Canonical JSON".
   Below is a list of URLs to some of them:

   * https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

   * https://gibson042.github.io/canonicaljson-spec/

   * http://wiki.laptop.org/go/Canonical_JSON

   The listed efforts all build on text level JSON to JSON
   transformations. The primary feature of text level canonicalization
   is that it can be made neutral to the flavor of JSON used. However,
   such schemes also imply major changes to the JSON parsing process,
   which is a likely hurdle for adoption. Albeit at the expense of
   certain JSON and application constraints, JCS was designed to be
   compatible with existing JSON tools.

Appendix I. Development Portal

   The JCS specification is currently developed at:
   https://github.com/cyberphone/ietf-json-canon.

   The most recent "editors' copy" can be found at:
   https://cyberphone.github.io/ietf-json-canon.

   JCS source code and extensive test data are available at:
   https://github.com/cyberphone/json-canonicalization

Appendix J. Document History

   [[ to be removed by the RFC Editor before publication as an RFC ]]

   Version 00-06:

   * See IETF diff listings.

   Version 07:

   * Initial conversion to XML RFC version 3.

   * Changed intended status to "Informational".

   * Added UTF-16 test data and explanations.

   Version 08:

   * Updated Abstract.

   * Added a "Note 2" number serialization sample.

   * Updated Security Considerations.

   * Tried to clear up the JSON input data section.

   * Added a line about Unicode normalization.

   * Added a line about serialization of structured data.

   * Added a missing fact about "BigInt" (V8 not ES6).

   Version 09:

   * Updated initial line of Abstract and Introduction.

   * Added note about breaking ECMAScript changes.

   * Minor language nit fixes.

   Version 10-12:

   * Language tweaks.

   Version 13:

   * Reorganized Section 3.2.2.3.

   Version 14:

   * Improved introduction + some minor changes in security
     considerations, acknowledgements, and Unicode normalization.

   * Generalized data representation issues by updating Appendix F.

Authors' Addresses

   Anders Rundgren
   Independent
   Montpellier
   France
   Email: anders.rundgren.net@gmail.com
   URI:   https://www.linkedin.com/in/andersrundgren/

   Bret Jordan
   Symantec Corporation
   350 Ellis Street
   Mountain View, CA 94043
   United States of America

   Email: bret_jordan@symantec.com

   Samuel Erdtman
   Spotify AB
   Birger Jarlsgatan 61, 4tr
   SE-113 56 Stockholm
   Sweden

   Email: erdtman@spotify.com