idnits 2.17.1

draft-staykov-hu-json-canonical-form-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 218 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     The double data type is represented as specified in the XML schema
     standard [XML] * The canonical representation of the double data type
     consists of mantissa followed by "E", followed by exponent. * Mantissa
     * MUST be represented as a decimal. The decimal point is mandatory
     * There MUST be a single non zero digit on the left of the decimal point
     (unless a zero is represented). * There MUST be at least single digit on
     the right of the decimal point. * Exponent * Zero exponent is
     represented by "E0". * "+" sign is prohibited in both the mantissa and
     the exponent. * Leading zeroes are prohibited from the left side of the
     decimal point in the mantissa and from the exponent. * Special values
     (NaN, INF) MUST not be used.

  -- The document date (November 07, 2012) is 4189 days in the past. Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'JSON'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'XML'


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet-Draft                                                G. Staykov
Intended status: Standards Track                                  VMware
Expires: May 07, 2013                                              J. Hu
                                                                  VMware
                                                       November 07, 2012

                          JSON Canonical Form
                draft-staykov-hu-json-canonical-form-00

Abstract

   A single JSON document can have multiple logically equivalent
   physical representations. While convenient for human interaction,
   this flexibility is inconvenient when a machine must assess the
   logical equivalence of documents. In cases where logical
   equivalence is useful, an encoder should produce a canonical form of
   a JSON document. For example, since digital signatures demand the
   same physical representation for logically equivalent documents, a
   canonical physical representation would allow the signature to apply
   to the logical document. This Internet-Draft aims to define a
   canonical form for JSON documents. Two logically equivalent
   documents should have the same canonical form.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Introduction

   JSON [JSON] is a lightweight data-interchange text format that is
   suitable for both humans and machines. It allows multiple physical
   representations that are logically equivalent. For example, a
   formatting change that adds whitespace and line endings to make a
   document more human readable results in a different representation
   under a byte-for-byte comparison. There are cases, however, where
   it is essential to have a single physical representation of a data
   document. For example, when a cryptographic hash is applied over a
   JSON document, a single physical representation allows the hash to
   represent the logical content of the document by removing variation
   in how that content is encoded in JSON.
   Thus a common physical representation of logically equivalent JSON
   documents should be defined. It is called the canonical form.

2. JSON canonical form

   The canonical form is defined by the following rules:
   * The document MUST be encoded in UTF-8 [UTF-8]
   * Non-significant(1) whitespace characters MUST NOT be used
   * Non-significant(1) line endings MUST NOT be used
   * Entries (set of name/value pairs) in JSON objects MUST be sorted
     lexicographically(2) by their names
   * Arrays MUST preserve their initial ordering

   (1) As defined in the JSON data-interchange format [JSON], JSON
       objects consist of multiple "name"/"value" pairs and JSON arrays
       consist of multiple "value" fields. Non-significant means not
       part of a "name" or "value".

   (2) Lexicographic comparison orders strings from least to greatest
       based on their UCS (Unicode Character Set) codepoint values.

2.1 Canonical representation of data types

2.1.1 Double

   The double data type is represented as specified in the XML schema
   standard [XML]
   * The canonical representation of the double data type consists of
     mantissa followed by "E", followed by exponent.
   * Mantissa
      * MUST be represented as a decimal. The decimal point is mandatory
      * There MUST be a single non zero digit on the left of the decimal
        point (unless a zero is represented).
      * There MUST be at least single digit on the right of the decimal
        point.
   * Exponent
      * Zero exponent is represented by "E0".
   * "+" sign is prohibited in both the mantissa and the exponent.
   * Leading zeroes are prohibited from the left side of the decimal
     point in the mantissa and from the exponent.
   * Special values (NaN, INF) MUST not be used.

3. Applications

   The JSON canonical form can be used when digitally signing JSON
   documents generated from a serialization library.
   Because serialization and deserialization libraries might tolerate
   variation in physical representation, different physical
   representations may result after several serialization /
   deserialization cycles. This could result in false signature
   verification failures, because the hash digest of the same document
   then differs from the hash digest used when signing. A way to avoid
   this problem is to use the canonical form when signing and when
   verifying hash digests.

4. Examples

4.1. Example 1

   Input:
   {
       "foo"  :  "foo bar"
   }

   Canonical form:
   {"foo":"foo bar"}

   Demonstrates:
   * Non-significant whitespace characters and line endings are removed.
   * Whitespace inside names and values is preserved.

4.2. Example 2

   Input:
   {
       "foo":"bar",
       "abc":"def",
       "zoo" :
       [
           "def",
           "abc"
       ]
   }

   Canonical Form:
   {"abc":"def","foo":"bar","zoo":["def","abc"]}

   Demonstrates:
   * Non-significant whitespace and line endings are removed.
   * Name/value pairs in JSON objects are lexicographically sorted by
     "name" key.
   * Array order is preserved.

4.3. Example 3

   Input:
   {
       "d1":-12.34e4,
       "d2":1E-130,
       "d3":0.0E-0,
       "d4":1.2
   }

   Canonical Form:
   {"d1":-1.234E5,"d2":1.0E-130,"d3":0.0E0,"d4":1.2E0}

   Demonstrates:
   * Various canonical representations of double data types.

5. Security Considerations

   This document provides the groundwork needed for providing data
   integrity by using digital signatures over JSON messages.

6. IANA Considerations

   This document has no actions for IANA.

7. References

7.1. Normative References

   [JSON]     http://www.json.org/

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [UTF-8]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.
              http://www.ietf.org/rfc/rfc3629.txt

   [XML]      http://www.w3.org/TR/xmlschema-2

Authors' Addresses

   Georgi Staykov
   VMware
   Email: gstaykov@vmware.com

   Jeff Hu
   VMware
   Email: jhu@vmware.com
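
Illustrative sketch (non-normative)

   The rules of Section 2 and the double format of Section 2.1.1 can be
   sketched in Python as follows. This is a minimal sketch and not part
   of the draft. The function names canonical_double, canonicalize and
   canonical_bytes are hypothetical. Several choices below are
   assumptions, since the draft does not cover them: integers are
   written as plain decimals, string escaping is delegated to
   json.dumps, negative zero is written as 0.0E0, and the mantissa
   digits are the shortest ones that round-trip the parsed value.

```python
import json
import math
from decimal import Decimal


def canonical_double(value: float) -> str:
    """Format a float as mantissa, 'E', exponent (Section 2.1.1)."""
    if not math.isfinite(value):
        raise ValueError("NaN and INF have no canonical form")
    if value == 0:
        return "0.0E0"  # assumption: negative zero is also written as 0.0E0
    # Assumption: use the shortest round-trip digits that repr() produces.
    sign, digits, exponent = Decimal(repr(value)).as_tuple()
    sci_exponent = len(digits) - 1 + exponent  # exponent for d.ddd form
    text = "".join(map(str, digits)).rstrip("0")
    mantissa = text[0] + "." + (text[1:] or "0")
    return ("-" if sign else "") + mantissa + "E" + str(sci_exponent)


def canonicalize(value) -> str:
    """Serialize a parsed JSON value without non-significant whitespace."""
    if isinstance(value, dict):
        # Python compares str by code point, matching footnote (2).
        entries = sorted(value.items(), key=lambda entry: entry[0])
        body = ",".join(
            json.dumps(name, ensure_ascii=False) + ":" + canonicalize(item)
            for name, item in entries
        )
        return "{" + body + "}"
    if isinstance(value, list):
        # Arrays keep their initial ordering.
        return "[" + ",".join(canonicalize(item) for item in value) + "]"
    if value is None or isinstance(value, (bool, str)):
        # Assumption: string escaping is left to json.dumps.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, int):
        return str(value)  # assumption: integers are plain decimals
    if isinstance(value, float):
        return canonical_double(value)
    raise TypeError("not a JSON value: " + type(value).__name__)


def canonical_bytes(document: str) -> bytes:
    """Parse a JSON text and return its canonical form encoded as UTF-8."""
    return canonicalize(json.loads(document)).encode("utf-8")
```

   Feeding the inputs of Section 4 to canonical_bytes should reproduce
   the canonical forms listed there. A signer and a verifier would both
   hash the bytes returned by canonical_bytes rather than the bytes
   they received.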