idnits 2.17.1

draft-rundgren-json-canonicalization-scheme-09.txt:
  -(583) through -(637): Line appears to be too long, but this could be
     caused by non-ascii characters in UTF-8 encoding (the same warning is
     reported once for each of these 55 lines)

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == There are 57 instances of lines with non-ascii characters in the document.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 14, 2019) is 1685 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC4648' is mentioned on line 487, but not defined
  == Missing Reference: 'RFC7515' is mentioned on line 491, but not defined
  == Missing Reference: 'XMLDSIG' is mentioned on line 505, but not defined
  == Missing Reference: 'NODEJS' is mentioned on line 483, but not defined
  == Missing Reference: 'RFC7638' is mentioned on line 495, but not defined
  == Missing Reference: 'KEYBASE' is mentioned on line 480, but not defined
  == Missing Reference: 'JSONCOMP' is mentioned on line 476, but not defined
  == Missing Reference: 'V8' is mentioned on line 502, but not defined
  == Missing Reference: 'RYU' is mentioned on line 499, but not defined
  == Missing Reference: 'OPENAPI' is mentioned on line 720, but not defined
  == Missing Reference: 'JsonIgnore' is mentioned on line 804, but not defined

     Summary: 0 errors (**), 0 flaws (~~), 13 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  Network Working Group                                        A. Rundgren
3  Internet-Draft                                               Independent
4  Intended status: Informational                                 B. Jordan
5  Expires: March 17, 2020                             Symantec Corporation
6                                                                S. Erdtman
7                                                                Spotify AB
8                                                        September 14, 2019

10                    JSON Canonicalization Scheme (JCS)
11              draft-rundgren-json-canonicalization-scheme-09

13  Abstract

15     Cryptographic operations like hashing and signing require that the
16     data be expressed in an invariant format.  One way of addressing this
17     issue is to create a canonical form of the data.  Canonicalization
18     also permits data to be exchanged in its original form on the "wire"
19     while secure cryptographic operations are performed on its
20     canonicalized counterpart in the producer and consumer end points.
21     The JSON Canonicalization Scheme (JCS) provides canonicalization
22     support for data in the JSON format by building on the strict
23     serialization methods for JSON primitives defined by ECMAScript,
24     constraining JSON data to the I-JSON subset, and through a
25     deterministic property sorting scheme.

27  Status of This Memo

29     This Internet-Draft is submitted in full conformance with the
30     provisions of BCP 78 and BCP 79.

32     Internet-Drafts are working documents of the Internet Engineering
33     Task Force (IETF).  Note that other groups may also distribute
34     working documents as Internet-Drafts.  The list of current Internet-
35     Drafts is at https://datatracker.ietf.org/drafts/current/.

37     Internet-Drafts are draft documents valid for a maximum of six months
38     and may be updated, replaced, or obsoleted by other documents at any
39     time.  It is inappropriate to use Internet-Drafts as reference
40     material or to cite them other than as "work in progress."

42     This Internet-Draft will expire on 17 March 2020.

44  Copyright Notice

46     Copyright (c) 2019 IETF Trust and the persons identified as the
47     document authors.  All rights reserved.

49     This document is subject to BCP 78 and the IETF Trust's Legal
50     Provisions Relating to IETF Documents (https://trustee.ietf.org/
51     license-info) in effect on the date of publication of this document.
52     Please review these documents carefully, as they describe your rights
53     and restrictions with respect to this document.  Code Components
54     extracted from this document must include Simplified BSD License text
55     as described in Section 4.e of the Trust Legal Provisions and are
56     provided without warranty as described in the Simplified BSD License.

58  Table of Contents

60     1.  Introduction
61     2.  Terminology
62     3.  Detailed Operation
63       3.1.  Creation of Input Data
64       3.2.  Generation of Canonical JSON Data
65         3.2.1.  Whitespace
66         3.2.2.  Serialization of Primitive Data Types
67           3.2.2.1.  Serialization of Literals
68           3.2.2.2.  Serialization of Strings
69           3.2.2.3.  Serialization of Numbers
70         3.2.3.  Sorting of Object Properties
71         3.2.4.  UTF-8 Generation
72     4.  IANA Considerations
73     5.  Security Considerations
74     6.  Acknowledgements
75     7.  References
76       7.1.  Normative References
77       7.2.  Informal References
78     Appendix A.  ES6 Sample Canonicalizer
79     Appendix B.  Number Serialization Samples
80     Appendix C.  Canonicalized JSON as "Wire Format"
81     Appendix D.  Dealing with Big Numbers
82     Appendix E.  String Subtype Handling
83       E.1.  Subtypes in Arrays
84     Appendix F.  Implementation Guidelines
85     Appendix G.  Open Source Implementations
86     Appendix H.  Other JSON Canonicalization Efforts
87     Appendix I.  Development Portal
88     Appendix J.  Document History
89     Authors' Addresses

91  1.  Introduction

93     Cryptographic operations like hashing and signing require that the
94     data be expressed in an invariant format.  One way of accomplishing
95     this is to convert the data into a format that has a simple and fixed
96     representation like Base64Url [RFC4648], which is how JWS [RFC7515]
97     addressed this issue.

99     Another solution is to create a canonical version of the data,
100    similar to what was done for the XML Signature [XMLDSIG] standard.
101    The primary advantage of a canonicalizing scheme is that data can
102    be kept in its original form.  This is the core rationale behind JCS.
103    Put another way: by using canonicalization a JSON Object may remain a
104    JSON Object even after being signed, which can simplify system design,
105    documentation and logging.

107    To avoid "reinventing the wheel", JCS relies on serialization of JSON
108    primitives (strings, numbers and literals), compatible with
109    ECMAScript (aka JavaScript) beginning with version 6 [ES6], hereafter
110    referred to as "ES6".

112    Seasoned XML developers recalling difficulties getting signatures to
113    validate (usually due to different interpretations of the quite
114    intricate XML canonicalization rules as well as of the equally
115    extensive Web Services security standards), may rightfully wonder why
116    JCS would not suffer from similar issues.  The reasons are twofold:

118    o  The absence of a namespace concept and default values, as well as
119       constraining data to the I-JSON subset eliminate the need for
120       specific parsers for dealing with canonicalization.

122    o  JCS compatible serialization of JSON primitives is supported by
123       most current Web browsers as well as by Node.js [NODEJS],
124       while the full JCS specification is supported by multiple Open
125       Source implementations (see Appendix G).  See also Appendix F.

127    In summary, the JCS specification describes how serialization of JSON
128    primitives compliant with ES6 combined with a deterministic property
129    sorting scheme can be used for creating "Hashable" representations of
130    JSON data intended for consumption by cryptographic methods.

132    JCS is compatible with some existing systems relying on JSON
133    canonicalization such as JWK Thumbprint [RFC7638] and Keybase
134    [KEYBASE].

136    For potential uses outside of cryptography see [JSONCOMP].

138    The intended audiences of this document are JSON tool vendors, as
139    well as designers of JSON based cryptographic solutions.  The reader
140    is assumed to have a basic knowledge of ECMAScript including the
141    "JSON" object.

143  2.  Terminology

145    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
146    "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
147    "OPTIONAL" in this document are to be interpreted as described in BCP
148    14 [RFC2119] [RFC8174] when, and only when, they appear in all
149    capitals, as shown here.

151  3.  Detailed Operation

153    This section describes the different issues related to creating a
154    canonical JSON representation, and how they are addressed by JCS.

156  3.1.  Creation of Input Data

158    Data to be serialized is usually created by:

160    o  Parsing previously generated JSON data.

162    o  Programmatically creating data.

164    Irrespective of the method used, the data to be serialized MUST be
165    adapted for I-JSON [RFC7493] formatting, which implies the following:

167    o  JSON Objects MUST NOT exhibit duplicate property names.

169    o  JSON String data MUST be expressible as Unicode [UNICODE].

171    o  JSON Number data MUST be expressible as IEEE-754 [IEEE754] double
172       precision values.  For applications needing higher precision or
173       longer integers than offered by IEEE-754 double precision,
174       Appendix D outlines how such requirements can be supported in an
175       interoperable and extensible way.

177    An additional constraint is that parsed JSON String data MUST NOT be
178    altered during subsequent serializations.  For more information see
179    Appendix E.

181    Note: although the Unicode standard offers the possibility of combining
182    certain characters into one, referred to as "Unicode Normalization"
183    (https://www.unicode.org/reports/tr15/), JCS' string processing does
184    not take this into consideration.  That is, all components involved in
185    a scheme depending on JCS, MUST preserve Unicode string data "as is".

187    Note: how structured objects like sets are represented in JSON is out
188    of scope for JCS.  See also Appendix F.

190  3.2.  Generation of Canonical JSON Data

192    The following subsections describe the steps required for creating a
193    canonical JSON representation of the data elaborated on in the
194    previous section.

196    Appendix A shows sample code for an ES6 based canonicalizer, matching
197    the JCS specification.

199  3.2.1.  Whitespace

201    Whitespace between JSON tokens MUST NOT be emitted.

203  3.2.2.  Serialization of Primitive Data Types

205    Assume that you parse a JSON object like the following:

207      {
208        "numbers": [333333333.33333329, 1E30, 4.50,
209                    2e-3, 0.000000000000000000000000001],
210        "string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
211        "literals": [null, true, false]
212      }

214    If you subsequently serialize the parsed data using a serializer
215    compliant with ES6's "JSON.stringify()", the result would (with a
216    line wrap added for display purposes only) be rather divergent with
217    respect to representation of data:

219      {"numbers":[333333333.3333333,1e+30,4.5,0.002,1e-27],"string":
220      "€$\u000f\nA'B\"\\\\\"/","literals":[null,true,false]}

222    The difference between the parsed data and its serialized
223    counterpart is due to a wide tolerance on input data (as
224    defined by JSON [RFC8259]), while output data (as defined by ES6),
225    has a fixed representation.  As can be seen by the example, numbers
226    are subject to rounding as well.

228    The following subsections describe serialization of primitive JSON
229    data types according to JCS.  This part is identical to that of ES6.
230    In the (unlikely) event that a future version of ECMAScript would
231    invalidate any of the following serialization methods, it will be up
232    to the developer community to either stick to this specification or
233    create a new specification.

235  3.2.2.1.  Serialization of Literals

237    The JSON literals "null", "true", and "false" present no challenge
238    since they already have a fixed definition in JSON [RFC8259].

240  3.2.2.2.  Serialization of Strings

242    For JSON String data (which includes JSON Object property names as
243    well), each Unicode code point MUST be serialized as described below
244    (also matching Section 24.3.2.2 of [ES6]):

246    o  If the Unicode value falls within the traditional ASCII control
247       character range (U+0000 through U+001F), it MUST be serialized
248       using lowercase hexadecimal Unicode notation (\uhhhh) unless it is
249       in the set of predefined JSON control characters U+0008, U+0009,
250       U+000A, U+000C or U+000D which MUST be serialized as \b, \t, \n,
251       \f and \r respectively.

253    o  If the Unicode value is outside of the ASCII control character
254       range, it MUST be serialized "as is" unless it is equivalent to
255       U+005C (\) or U+0022 (") which MUST be serialized as \\ and \"
256       respectively.

258    Finally, the resulting sequence of Unicode code points MUST be
259    enclosed in double quotes (").

261    Note: some JSON systems permit the use of invalid Unicode data
262    including "lone surrogates" (e.g.  U+DEAD).  Since this leads to
263    interoperability issues including broken signatures, occurrences of
264    such data MUST cause the JCS algorithm to terminate with an error
265    indication.

267  3.2.2.3.  Serialization of Numbers

269    JSON Number data MUST be serialized according to Section 7.1.12.1 of
270    [ES6] including the "Note 2" enhancement.

272    Due to the relative complexity of this part, the algorithm itself is
273    not included in this document.  However, the specification is fully
274    implemented by, for example, Google's V8 [V8].  The open source Java
275    implementation mentioned in Appendix G uses a recently developed
276    number serialization algorithm called Ryu [RYU].

278    ES6 builds on the IEEE-754 [IEEE754] double precision standard for
279    representing JSON Number data.  Appendix B holds a set of IEEE-754
280    sample values and their corresponding JSON serialization.
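
   As an illustration only, the number serialization described above can
   be tried out with the following ES6 sketch.  It decodes a few IEEE-754
   bit patterns taken from Appendix B and serializes them with
   "JSON.stringify()".  The helper names "hexToDouble" and
   "serializeNumber" are assumptions made for this example; they are not
   defined by JCS or by ES6.

```javascript
// Hypothetical helper (not part of JCS): turn 16 hexadecimal digits
// holding an IEEE-754 double precision value into an ES6 Number.
function hexToDouble(hex) {
  const view = new DataView(new ArrayBuffer(8));
  view.setUint32(0, parseInt(hex.slice(0, 8), 16)); // high 32 bits
  view.setUint32(4, parseInt(hex.slice(8), 16));    // low 32 bits
  return view.getFloat64(0);                        // big-endian read
}

// Hypothetical helper: serialize a Number the ES6 way, but reject
// NaN and Infinity, which JSON.stringify() would otherwise silently
// turn into the literal null.
function serializeNumber(value) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new RangeError('Not a finite JSON Number: ' + String(value));
  }
  return JSON.stringify(value);
}

// Bit patterns and expected results are copied from Appendix B.
const samples = [
  ['0000000000000000', '0'],
  ['8000000000000000', '0'],
  ['44b52d02c7e14af6', '1e+23'],
  ['3eb0c6f7a0b5ed8d', '0.000001'],
  ['41b3de4355555555', '333333333.3333333']
];
for (const [hex, expected] of samples) {
  // Prints the bit pattern, the serialized value and the expected value.
  console.log(hex, serializeNumber(hexToDouble(hex)), expected);
}
```

   The explicit finiteness check is a design choice of this sketch: a
   plain "JSON.stringify()" call does not raise an error for such values,
   so a canonicalizer has to detect them itself.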

282    Note: since NaN (Not a Number) and Infinity are not permitted in
283    JSON, occurrences of such values MUST cause the JCS algorithm to
284    terminate with an error indication.

286  3.2.3.  Sorting of Object Properties

288    Although the previous step indeed normalized the representation of
289    primitive JSON data types, the result would not qualify as
290    "canonical" since JSON Object properties are not in lexicographic
291    (alphabetical) order.

293    Applied to the sample in Section 3.2.2, a properly canonicalized
294    version should (with a line wrap added for display purposes only)
295    read as:

297      {"literals":[null,true,false],"numbers":[333333333.3333333,
298      1e+30,4.5,0.002,1e-27],"string":"€$\u000f\nA'B\"\\\\\"/"}

300    The rules for lexicographic sorting of JSON Object properties
301    according to JCS are as follows:

303    o  JSON Object properties MUST be sorted in a recursive manner which
304       means that possible JSON child Objects MUST have their properties
305       sorted as well.

307    o  JSON Array data MUST also be scanned for presence of JSON Objects
308       (and applying associated property sorting), but array element
309       order MUST NOT be changed.

311    When a JSON Object is about to have its properties sorted, the
312    following measures MUST be adhered to:

314    o  The sorting process is applied to property name strings in their
315       "raw" (unescaped) form.  That is, a newline character is treated
316       as U+000A.

318    o  Property name strings to be sorted are formatted as arrays of
319       UTF-16 [UNICODE] code units.  The sorting is based on pure value
320       comparisons, where code units are treated as unsigned integers,
321       independent of locale settings.

323    o  Property name strings either have different values at some index
324       that is a valid index for both strings, or their lengths are
325       different, or both.  If they have different values at one or more
326       index positions, let k be the smallest such index; then the string
327       whose value at position k has the smaller value, as determined by
328       using the < operator, lexicographically precedes the other string.
329       If there is no index position at which they differ, then the
330       shorter string lexicographically precedes the longer string.

332    In plain English this means that property names are sorted in
333    ascending order like the following:

335      ""
336      "a"
337      "aa"
338      "ab"

340    The rationale for basing the sorting algorithm on UTF-16 code units
341    is that it maps directly to the string type in ECMAScript (featured
342    in Web browsers and Node.js), Java and .NET.  In addition, JSON only
343    supports escape sequences expressed as UTF-16 code units making
344    knowledge and handling of such data a necessity anyway.  Systems
345    using another internal representation of string data will need to
346    convert JSON property name strings into arrays of UTF-16 code units
347    before sorting.  The conversion from UTF-8 or UTF-32 to UTF-16 is
348    defined by the Unicode [UNICODE] standard.

350    The following test data can be used for verifying the correctness of
351    the sorting scheme in a JCS implementation.  Input JSON data:

353      {
354        "\u20ac": "Euro Sign",
355        "\r": "Carriage Return",
356        "\ufb33": "Hebrew Letter Dalet With Dagesh",
357        "1": "One",
358        "\ud83d\ude00": "Emoji: Grinning Face",
359        "\u0080": "Control",
360        "\u00f6": "Latin Small Letter O With Diaeresis"
361      }

363    Expected argument order after sorting property strings:

365      "Carriage Return"
366      "One"
367      "Control"
368      "Latin Small Letter O With Diaeresis"
369      "Euro Sign"
370      "Emoji: Grinning Face"
371      "Hebrew Letter Dalet With Dagesh"

373    Note: for the purpose of obtaining a deterministic property order,
374    sorting on UTF-8 or UTF-32 encoded data would also work but the
375    outcome for JSON data like above would differ and thus be
376    incompatible with this specification.  However, in practice property
377    names are rarely defined outside of 7-bit ASCII, making it possible
378    to sort string data in UTF-8 or UTF-32 without conversions and
379    still be compatible with JCS.  Whether this is a viable option
380    depends on the environment JCS is supposed to be used in.

382  3.2.4.  UTF-8 Generation

384    Finally, in order to create a platform independent representation,
385    the result of the preceding step MUST be encoded in UTF-8.

387    Applied to the sample in Section 3.2.3 this should yield the
388    following bytes, here shown in hexadecimal notation:

390      7b 22 6c 69 74 65 72 61 6c 73 22 3a 5b 6e 75 6c 6c 2c 74 72
391      75 65 2c 66 61 6c 73 65 5d 2c 22 6e 75 6d 62 65 72 73 22 3a
392      5b 33 33 33 33 33 33 33 33 33 2e 33 33 33 33 33 33 33 2c 31
393      65 2b 33 30 2c 34 2e 35 2c 30 2e 30 30 32 2c 31 65 2d 32 37
394      5d 2c 22 73 74 72 69 6e 67 22 3a 22 e2 82 ac 24 5c 75 30 30
395      30 66 5c 6e 41 27 42 5c 22 5c 5c 5c 5c 5c 22 2f 22 7d

397    This data is intended to be usable as input to cryptographic methods.

399  4.  IANA Considerations

401    This document has no IANA actions.

403  5.  Security Considerations

405    It is vital to perform "sanity" checks on input data to avoid
406    overflowing buffers and similar things that could affect the
407    integrity of the system.

409    When JCS is applied to signature schemes like the one described in
410    Appendix F, applications MUST perform the following operations before
411    acting upon received data:

413    1.  Parse the JSON data and verify that it adheres to I-JSON.

415    2.  Verify the data for correctness according to the conventions
416        defined by the ecosystem where it is to be used.  This also
417        includes locating the property holding the signature data.

419    3.  Verify the signature.

421    If any of these steps fail, the operation in progress MUST be
422    aborted.

424  6.  Acknowledgements

426    Building on ES6 Number serialization was originally proposed by
427    James Manger.  This ultimately led to the adoption of the entire ES6
428    serialization scheme for JSON primitives.

430    Other people who have contributed with valuable input to this
431    specification include Scott Ananian, Tim Bray, Ben Campbell, Adrian
432    Farell, Richard Gibson, Bron Gondwana, John-Mark Gurney, John Levine,
433    Mark Miller, Matt Miller, Mike Jones, Mark Nottingham, Mike Samuel,
434    Jim Schaad, Robert Tupelo-Schneck and Michal Wadas.

436    For carrying out real world concept verification, the software and
437    support for number serialization provided by Ulf Adams,
438    Tanner Gooding and Remy Oudompheng were very helpful.

440  7.  References

442  7.1.  Normative References

444    [ES6]      Ecma International, "ECMAScript 2015 Language
445               Specification", June 2015,
446               .

449    [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic",
450               August 2008, .

452    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
453               Requirement Levels", BCP 14, RFC 2119,
454               DOI 10.17487/RFC2119, March 1997,
455               .

457    [RFC7493]  Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
458               DOI 10.17487/RFC7493, March 2015,
459               .

461    [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
462               2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
463               May 2017, .

465    [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
466               Interchange Format", STD 90, RFC 8259,
467               DOI 10.17487/RFC8259, December 2017,
468               .

470    [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
471               12.1.0", May 2019,
472               .

474  7.2.  Informal References

476    [JSONCOMP] A. Rundgren, ""Comparable" JSON - Work in progress",
477               .

480    [KEYBASE]  "Keybase",
481               .

483    [NODEJS]   "Node.js", .

485    [OPENAPI]  "The OpenAPI Initiative", .

487    [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
488               Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
489               .

491    [RFC7515]  Jones, M., Bradley, J., and N. Sakimura, "JSON Web
492               Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May
493               2015, .

495    [RFC7638]  Jones, M. and N. Sakimura, "JSON Web Key (JWK)
496               Thumbprint", RFC 7638, DOI 10.17487/RFC7638, September
497               2015, .

499    [RYU]      Ulf Adams, "Ryu floating point number serializing
500               algorithm", .

502    [V8]       Google LLC, "Chrome V8 Open Source JavaScript Engine",
503               .

505    [XMLDSIG]  W3C, "XML Signature Syntax and Processing Version 1.1",
506               .

508  Appendix A.  ES6 Sample Canonicalizer

510    Below is an example of a JCS canonicalizer for usage with ES6 based
511    systems:

   ////////////////////////////////////////////////////////////
   // Since the primary purpose of this code is highlighting //
   // the core of the JCS algorithm, error handling and      //
   // UTF-8 generation were not implemented                  //
   ////////////////////////////////////////////////////////////
   var canonicalize = function(object) {

       var buffer = '';
       serialize(object);
       return buffer;

       function serialize(object) {
           if (object === null || typeof object !== 'object' ||
               object.toJSON != null) {
               /////////////////////////////////////////////////
               // Primitive type or toJSON - Use ES6/JSON     //
               /////////////////////////////////////////////////
               buffer += JSON.stringify(object);

           } else if (Array.isArray(object)) {
               /////////////////////////////////////////////////
               // Array - Maintain element order              //
               /////////////////////////////////////////////////
               buffer += '[';
               let next = false;
               object.forEach((element) => {
                   if (next) {
                       buffer += ',';
                   }
                   next = true;
                   /////////////////////////////////////////
                   // Array element - Recursive expansion //
                   /////////////////////////////////////////
                   serialize(element);
               });
               buffer += ']';

           } else {
               /////////////////////////////////////////////////
               // Object - Sort properties before serializing //
               /////////////////////////////////////////////////
               buffer += '{';
               let next = false;
               Object.keys(object).sort().forEach((property) => {
                   if (next) {
                       buffer += ',';
                   }
                   next = true;
                   ///////////////////////////////////////////////
                   // Property names are strings - Use ES6/JSON //
                   ///////////////////////////////////////////////
                   buffer += JSON.stringify(property);
                   buffer += ':';
                   //////////////////////////////////////////
                   // Property value - Recursive expansion //
                   //////////////////////////////////////////
                   serialize(object[property]);
               });
               buffer += '}';
           }
       }
   };

575  Appendix B.  Number Serialization Samples

577    The following table holds a set of ES6 compatible Number
578    serialization samples, including some edge cases.  The column
579    "IEEE-754" refers to the internal ES6 representation of the Number
580    data type which is based on the IEEE-754 [IEEE754] standard using
581    64-bit (double precision) values, here expressed in hexadecimal.

583    ╒══════════════════╤═══════════════════════════╤═════════════════════╕
584    │     IEEE-754     │    JSON Representation    │       Comment       │
585    ╞══════════════════╪═══════════════════════════╪═════════════════════╡
586    │ 0000000000000000 │ 0                         │ Zero                │
587    ├──────────────────┼───────────────────────────┼─────────────────────┤
588    │ 8000000000000000 │ 0                         │ Minus zero          │
589    ├──────────────────┼───────────────────────────┼─────────────────────┤
590    │ 0000000000000001 │ 5e-324                    │ Min pos number      │
591    ├──────────────────┼───────────────────────────┼─────────────────────┤
592    │ 8000000000000001 │ -5e-324                   │ Min neg number      │
593    ├──────────────────┼───────────────────────────┼─────────────────────┤
594    │ 7fefffffffffffff │ 1.7976931348623157e+308   │ Max pos number      │
595    ├──────────────────┼───────────────────────────┼─────────────────────┤
596    │ ffefffffffffffff │ -1.7976931348623157e+308  │ Max neg number      │
597    ├──────────────────┼───────────────────────────┼─────────────────────┤
598    │ 4340000000000000 │ 9007199254740992          │ Max pos integer (1) │
599    ├──────────────────┼───────────────────────────┼─────────────────────┤
600    │ c340000000000000 │ -9007199254740992         │ Max neg integer (1) │
601    ├──────────────────┼───────────────────────────┼─────────────────────┤
602    │ 4430000000000000 │ 295147905179352830000     │ ~2**68 (2)          │
603    ├──────────────────┼───────────────────────────┼─────────────────────┤
604    │ 7fffffffffffffff │                           │ NaN (3)             │
605    ├──────────────────┼───────────────────────────┼─────────────────────┤
606    │ 7ff0000000000000 │                           │ Infinity (3)        │
607    ├──────────────────┼───────────────────────────┼─────────────────────┤
608    │ 44b52d02c7e14af5 │ 9.999999999999997e+22     │                     │
609    ├──────────────────┼───────────────────────────┼─────────────────────┤
610    │ 44b52d02c7e14af6 │ 1e+23                     │                     │
611    ├──────────────────┼───────────────────────────┼─────────────────────┤
612    │ 44b52d02c7e14af7 │ 1.0000000000000001e+23    │                     │
613    ├──────────────────┼───────────────────────────┼─────────────────────┤
614    │ 444b1ae4d6e2ef4e │ 999999999999999700000     │                     │
615    ├──────────────────┼───────────────────────────┼─────────────────────┤
616    │ 444b1ae4d6e2ef4f │ 999999999999999900000     │                     │
617    ├──────────────────┼───────────────────────────┼─────────────────────┤
618    │ 444b1ae4d6e2ef50 │ 1e+21                     │                     │
619    ├──────────────────┼───────────────────────────┼─────────────────────┤
620    │ 3eb0c6f7a0b5ed8c │ 9.999999999999997e-7      │                     │
621    ├──────────────────┼───────────────────────────┼─────────────────────┤
622    │ 3eb0c6f7a0b5ed8d │ 0.000001                  │                     │
623    ├──────────────────┼───────────────────────────┼─────────────────────┤
624    │ 41b3de4355555553 │ 333333333.3333332         │                     │
625    ├──────────────────┼───────────────────────────┼─────────────────────┤
626    │ 41b3de4355555554 │ 333333333.33333325        │                     │
627    ├──────────────────┼───────────────────────────┼─────────────────────┤
628    │ 41b3de4355555555 │ 333333333.3333333         │                     │
629    ├──────────────────┼───────────────────────────┼─────────────────────┤
630    │ 41b3de4355555556 │ 333333333.3333334         │                     │
631    ├──────────────────┼───────────────────────────┼─────────────────────┤
632    │ 41b3de4355555557 │ 333333333.33333343        │                     │
633    ├──────────────────┼───────────────────────────┼─────────────────────┤
634    │ becbf647612f3696 │ -0.0000033333333333333333 │                     │
635    ├──────────────────┼───────────────────────────┼─────────────────────┤
636    │ 43143ff3c1cb0959 │ 1424953923781206.2        │ Round to even (4)   │
637    └──────────────────┴───────────────────────────┴─────────────────────┘

639    Notes:

641    (1)  For maximum compliance with the ES6 "JSON" object, values that
642         are to be interpreted as true integers SHOULD be in the range
643         -9007199254740991 to 9007199254740991.  However, how numbers are
644         used in applications does not affect the JCS algorithm.

646    (2)  Although a set of specific integers like 2**68 could be regarded
647         as having extended precision, the JCS/ES6 number serialization
648         algorithm does not take this into consideration.

650    (3)  Invalid.  See Section 3.2.2.3.

652    (4)  This number is exactly 1424953923781206.25 but will after the
653         "Note 2" rule mentioned in Section 3.2.2.3 be truncated and
654         rounded to the closest even value.

656  Appendix C.  Canonicalized JSON as "Wire Format"

658    Since the result of the canonicalization process (see
659    Section 3.2.4) is fully valid JSON, it can also be used as
660    "Wire Format".  However, this is just an option since cryptographic
661    schemes based on JCS would, in most cases, not depend on
662    externally supplied JSON data already being canonicalized.

664    In fact, the ES6 standard way of serializing objects using
665    "JSON.stringify()" produces a more "logical" format, where properties
666    are kept in the order they were created or received.
   The example below shows an address record which could benefit from
   ES6 standard serialization:

   {
     "name": "John Doe",
     "address": "2000 Sunset Boulevard",
     "city": "Los Angeles",
     "zip": "90001",
     "state": "CA"
   }

   Using canonicalization, the properties above would be output in the
   order "address", "city", "name", "state" and "zip", which adds
   fuzziness to the data from a human (developer or technical support)
   perspective.  Canonicalization also converts JSON data into a single
   line of text, which may be less than ideal for debugging and logging.

Appendix D.  Dealing with Big Numbers

   There are several issues associated with the JSON Number type, here
   illustrated by the following sample object:

   {
     "giantNumber": 1.4e+9999,
     "payMeThis": 26000.33,
     "int64Max": 9223372036854775807
   }

   Although the sample above conforms to JSON [RFC8259], applications
   would normally use different native data types for storing
   "giantNumber" and "int64Max".  In addition, monetary data like
   "payMeThis" would presumably not rely on floating point data types
   due to rounding issues with respect to decimal arithmetic.

   The established way of handling this kind of "overloading" of the
   JSON Number type (at least in an extensible manner) is through
   mapping mechanisms, instructing parsers what to do with different
   properties based on their name.  However, this greatly limits the
   value of using the JSON Number type outside of its original, somewhat
   constrained JavaScript context.  The ES6 "JSON" object does not
   support mappings to JSON Number either.

   Due to the above, numbers that do not have a natural place in the
   current JSON ecosystem MUST be wrapped using the JSON String type.
   This is close to a de-facto standard for open systems.
   This is also applicable for other data types that do not have direct
   support in JSON, like "DateTime" objects as described in Appendix E.

   Aided by a system using the JSON String type, be it programmatic like

       var obj = JSON.parse('{"giantNumber": "1.4e+9999"}');
       var biggie = new BigNumber(obj.giantNumber);

   or declarative schemes like OpenAPI [OPENAPI], JCS imposes no limits
   on applications, including when using ES6.

Appendix E.  String Subtype Handling

   Due to the limited set of data types featured in JSON, the JSON
   String type is commonly used for holding subtypes.  Depending on the
   JSON parsing method, this can lead to interoperability problems
   which MUST be dealt with by JCS compliant applications targeting a
   wider audience.

   Assume you want to parse a JSON object where the schema designer
   assigned the property "big" for holding a "BigInteger" subtype and
   "time" for holding a "DateTime" subtype, while "val" is supposed to
   be a JSON Number compliant with JCS.  The following example shows
   such an object:

   {
     "time": "2019-01-28T07:45:10Z",
     "big": "055",
     "val": 3.5
   }

   Parsing of this object can be accomplished by the following ES6
   statement:

       var object = JSON.parse(JSON_object_featured_as_a_string);

   After parsing, the actual data can be extracted, which for subtypes
   also involves a conversion step using the result of the parsing
   process (an ECMAScript object) as input:

       ... = new Date(object.time); // Date object
       ... = BigInt(object.big);    // Big integer
       ... = object.val;            // JSON/JS number

   Canonicalization of "object" using the sample code in Appendix A
   would return the following string:

       {"big":"055","time":"2019-01-28T07:45:10Z","val":3.5}

   Although this is (with respect to JCS) technically correct, there is
   another way of parsing JSON data which can also be used with
   ECMAScript, as shown below:

       // Note: "BigInt" is implemented by Google's V8 ECMAScript engine.
       // It requires the following code to become JSON serializable.
       BigInt.prototype.toJSON = function() {
           return this.toString();
       };

       // JSON parsing using a "stream" based method
       var object = JSON.parse(JSON_object_featured_as_a_string,
           (k,v) => k == 'time' ? new Date(v) : k == 'big' ? BigInt(v) : v
       );

   If you now apply the canonicalizer in Appendix A to "object", the
   following string would be generated:

       {"big":"55","time":"2019-01-28T07:45:10.000Z","val":3.5}

   In this case the string arguments for "big" and "time" have changed
   with respect to the original, presumably making an application
   depending on JCS fail.

   The reason for the deviation is that in stream and schema based JSON
   parsers, the original "string" argument is typically replaced
   on-the-fly by the native subtype which, when serialized, may exhibit
   a different and platform dependent pattern.

   That is, stream and schema based parsing MUST treat subtypes as
   "pure" (immutable) JSON String types, and perform the actual
   conversion to the designated native type in a subsequent step.  In
   modern programming platforms like Go, Java and C# this can be
   achieved with moderate effort by combining annotations, getters and
   setters.
   Below is an example in C#/Json.NET showing a part of a class that is
   serializable as a JSON Object:

       // The "pure" string solution uses a local
       // string variable for JSON serialization while
       // exposing another type to the application
       [JsonProperty("amount")]
       private string _amount;

       [JsonIgnore]
       public decimal Amount {
           get { return decimal.Parse(_amount); }
           set { _amount = value.ToString(); }
       }

   In an application, "Amount" can be accessed as any other property
   while it is actually represented by a quoted string in JSON contexts.

   Note: the example above also addresses the constraints on numeric
   data implied by I-JSON (the C# "decimal" data type has quite
   different characteristics compared to IEEE-754 double precision).

E.1.  Subtypes in Arrays

   Since the JSON Array construct permits mixing arbitrary JSON data
   types, custom parsing and serialization code may be required to cope
   with subtypes anyway.

Appendix F.  Implementation Guidelines

   The optimal solution is integrating support for JCS directly in JSON
   serializers (parsers need no changes).  That is, canonicalization
   would just be an additional "mode" for a JSON serializer.  However,
   this is currently not the case.  Fortunately, JCS support can be
   provided through externally supplied canonicalizer software,
   enabling signature creation schemes like the following:

   1.  Create the data to be signed.

   2.  Serialize the data using existing JSON tools.

   3.  Let the external canonicalizer process the serialized data and
       return canonicalized result data.

   4.  Sign the canonicalized data.

   5.  Add the resulting signature value to the original JSON data
       through a designated signature property.

   6.  Serialize the completed (now signed) JSON object using existing
       JSON tools.

   A compatible signature verification scheme would then be as follows:

   1.  Parse the signed JSON data using existing JSON tools.

   2.  Read and save the signature value from the designated signature
       property.

   3.  Remove the signature property from the parsed JSON object.

   4.  Serialize the remaining JSON data using existing JSON tools.

   5.  Let the external canonicalizer process the serialized data and
       return canonicalized result data.

   6.  Verify that the canonicalized data matches the saved signature
       value using the algorithm and key used for creating the
       signature.

   A canonicalizer like the one above is effectively only a "filter",
   potentially usable with a multitude of quite different cryptographic
   schemes.

   Using a JSON serializer with integrated JCS support, the
   serialization performed before the canonicalization step could be
   eliminated for both processes.

Appendix G.  Open Source Implementations

   The following Open Source implementations have been verified to be
   compatible with JCS:

   *  JavaScript: https://www.npmjs.com/package/canonicalize

   *  Java: https://github.com/erdtman/java-json-canonicalization

   *  Go: https://github.com/cyberphone/json-canonicalization/tree/
      master/go

   *  .NET/C#: https://github.com/cyberphone/json-canonicalization/tree/
      master/dotnet

   *  Python: https://github.com/cyberphone/json-canonicalization/tree/
      master/python3

Appendix H.  Other JSON Canonicalization Efforts

   There are (and have been) other efforts creating "Canonical JSON".
   Below is a list of URLs to some of them:

   *  https://tools.ietf.org/html/
      draft-staykov-hu-json-canonical-form-00

   *  https://gibson042.github.io/canonicaljson-spec/

   *  http://wiki.laptop.org/go/Canonical_JSON

   The listed efforts all build on text level JSON to JSON
   transformations.  The primary feature of text level canonicalization
   is that it can be made neutral to the flavor of JSON used.
   However, such schemes also imply major changes to the JSON parsing
   process, which is a likely hurdle for adoption.  Albeit at the
   expense of certain JSON and application constraints, JCS was designed
   to be compatible with existing JSON tools.

Appendix I.  Development Portal

   The JCS specification is currently developed at:
   https://github.com/cyberphone/ietf-json-canon.

   The most recent "editors' copy" can be found at:
   https://cyberphone.github.io/ietf-json-canon.

   JCS source code and extensive test data are available at:
   https://github.com/cyberphone/json-canonicalization

Appendix J.  Document History

   [[ to be removed by the RFC Editor before publication as an RFC ]]

   Version 00-06:

   *  See IETF diff listings.

   Version 07:

   *  Initial conversion to XML RFC version 3.

   *  Changed intended status to "Informational".

   *  Added UTF-16 test data and explanations.

   Version 08:

   *  Updated Abstract.

   *  Added a "Note 2" number serialization sample.

   *  Updated Security Considerations.

   *  Tried to clear up the JSON input data section.

   *  Added a line about Unicode normalization.

   *  Added a line about serialization of structured data.

   *  Added a missing fact about "BigInt" (V8 not ES6).

   Version 09:

   *  Updated initial line of Abstract and Introduction.

   *  Added note about breaking ECMAScript changes.

   *  Minor nit fixes.

Authors' Addresses

   Anders Rundgren
   Independent
   Montpellier
   France

   Email: anders.rundgren.net@gmail.com
   URI:   https://www.linkedin.com/in/andersrundgren/


   Bret Jordan
   Symantec Corporation
   350 Ellis Street
   Mountain View, CA 94043
   United States of America

   Email: bret_jordan@symantec.com


   Samuel Erdtman
   Spotify AB
   Birger Jarlsgatan 61, 4tr
   SE-113 56 Stockholm
   Sweden

   Email: erdtman@spotify.com