json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                       December 23, 2014
Expires: June 26, 2015


            JavaScript Object Notation (JSON) Text Sequences
                    draft-ietf-json-text-sequence-13

Abstract

   This document describes the JSON text sequence format and associated
   media type, "application/json-seq". A JSON text sequence consists of
   any number of JSON texts, all encoded in UTF-8, each prefixed by an
   ASCII Record Separator (0x1E), and each ending with an ASCII Line
   Feed character (0x0A).

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 26, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction and Motivation
     1.1. Conventions used in this document
   2. JSON Text Sequence Format
     2.1. JSON text sequence parsing
     2.2. JSON text sequence encoding
     2.3. Incomplete/invalid JSON texts need not be fatal
     2.4. Top-level numeric, 'true', 'false', and 'null' values
   3. Security Considerations
   4. IANA Considerations
   5. Acknowledgements
   6. Normative References
   Author's Address

1. Introduction and Motivation

   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
   serialization format. However, when serializing a large sequence of
   values as an array, or a possibly indeterminate-length or never-
   ending sequence of values, JSON becomes difficult to work with.

   Consider a sequence of one million values, each possibly 1 kilobyte
   when encoded -- roughly one gigabyte. It is often desirable to
   process such a dataset incrementally, that is, without first having
   to read all of it before beginning to produce results.
   Traditionally the way to do this with JSON is to use a "streaming"
   parser, but these are neither widely available, widely used, nor easy
   to use.

   This document describes the concept and format of "JSON text
   sequences", which are specifically not JSON texts themselves but are
   composed of (possible) JSON texts. JSON text sequences can be parsed
   (and produced) incrementally without requiring a streaming parser
   (or a streaming encoder).

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

2. JSON Text Sequence Format

   Two different sets of ABNF rules are provided for the definition of
   JSON text sequences: one for parsers, and one for encoders. Having
   two different sets of rules permits recovery by parsers from
   sequences where some of the elements are truncated for whatever
   reason.
   The syntax for parsers is specified in terms of octet strings which
   are then interpreted as JSON texts if possible. The syntax for
   encoders, on the other hand, assumes that sequence elements are not
   truncated.

   JSON text sequences MUST use UTF-8 encoding; other encodings of JSON
   (i.e., UTF-16 and UTF-32) MUST NOT be used.

2.1. JSON text sequence parsing

   The ABNF [RFC5234] for the JSON text sequence parser is given in
   Figure 1.

   JSON-sequence = *(1*RS possible-JSON)
   RS = %x1E; "record separator" (RS), see RFC20
        ; Also known as: Unicode Character 'INFORMATION SEPARATOR
        ;                TWO' (U+001E)
   possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
                             ; JSON text (see RFC7159)
   not-RS = %x00-1d / %x1f-ff; any octets other than RS

                  Figure 1: JSON text sequence ABNF

   In prose: a series of octet strings, each containing any octet other
   than a record separator (RS) (0x1E) [RFC0020], all octet strings
   separated from each other by RS octets. Each octet string in the
   sequence is to be parsed as a JSON text in the UTF-8 encoding
   [RFC3629].

   If parsing of such an octet string as a UTF-8-encoded JSON text
   fails, the parser SHOULD nonetheless continue parsing the remainder
   of the sequence. The parser can report such failures to applications
   (which might then choose to terminate parsing of a sequence).
   Multiple consecutive RS octets do not denote empty sequence elements
   between them, and can be ignored.

   This document does not define a mechanism for reliably identifying
   text sequences by position (for example, when sending individual
   elements of an array as unique text sequences). For applications
   where truncation is a possibility, this means that intended sequence
   elements can be truncated, and can even be missing entirely;
   therefore, a reference to an nth element would be unreliable.

   There is no end-of-sequence indicator.

2.2. JSON text sequence encoding

   The ABNF for the JSON text sequence encoder is given in Figure 2.

   JSON-sequence = *(RS JSON-text LF)
   RS = %x1E; see RFC20
        ; Also known as: Unicode Character 'INFORMATION SEPARATOR
        ;                TWO' (U+001E)
   LF = %x0A; "line feed" (LF), see RFC20
   JSON-text =

                  Figure 2: JSON text sequence ABNF

   In prose: any number of JSON texts, each encoded in UTF-8 [RFC3629],
   each preceded by one ASCII RS character, and each followed by a line
   feed (LF). Since RS is an ASCII control character, it may only
   appear in JSON strings in escaped form (see [RFC7159]), and since RS
   may not appear in JSON texts in any other form, RS unambiguously
   delimits the start of any element in the sequence. RS is sufficient
   to unambiguously delimit all top-level JSON value types other than
   numbers. Following each JSON text in the sequence with an LF allows
   detection of truncated JSON texts consisting of a number at the
   top level; see Section 2.4.

   JSON text sequence encoders are expected to ensure that the sequence
   elements are properly formed. When the JSON text sequence encoder
   does the JSON text encoding, the sequence elements will naturally be
   properly formed. When the JSON text sequence encoder accepts
   already-encoded JSON texts, the JSON text sequence encoder ought to
   parse them before adding them to a sequence.

   Note that on some systems it's possible to input RS by typing
   'ctrl-^'; on some systems or applications the correct sequence may be
   'ctrl-v ctrl-^'. This is helpful when constructing a sequence
   manually with a text editor.

2.3. Incomplete/invalid JSON texts need not be fatal

   Per Section 2.1, JSON text sequence parsers should not abort when an
   octet string contains a malformed JSON text; instead, the JSON text
   sequence parser should skip to the next RS.
   Such a situation may arise in contexts where, for example,
   append-writes to log files are truncated by the filesystem (e.g.,
   due to a crash, or administrative process termination).

   Incremental JSON text parsers may be used, though of course failure
   to parse a given text may result after first producing some
   incremental parse results.

   Sequence parsers should have an option to warn about truncated JSON
   texts.

2.4. Top-level numeric, 'true', 'false', and 'null' values

   While objects, arrays, and strings are self-delimited in JSON texts,
   numbers and the values 'true', 'false', and 'null' are not. Only
   whitespace can delimit the latter four kinds of values.

   JSON text sequences use 0x0A as a "canary" octet to detect
   truncation.

   Parsers MUST check that any JSON texts that are a top-level number,
   or which might be 'true', 'false', or 'null', include JSON whitespace
   (at least one byte matching the "ws" ABNF rule from [RFC7159]) after
   that value; otherwise, the JSON-text may have been truncated. Note
   that the LF following each JSON text matches the "ws" ABNF rule.

   Parsers MUST drop JSON-text sequence elements consisting of
   non-self-delimited top-level values that may have been truncated
   (that are not delimited by whitespace). Parsers can report such
   texts as warnings (including, optionally, the parsed text and/or the
   original octet string).

      For example, a sequence element '123' with no trailing whitespace
      might have been intended to carry the top-level number 1234
      before being truncated. Similarly, 'true' might have been
      intended to carry the invalid text 'trueish'. 'truefalse' is not
      two top-level values, 'true' and 'false'; it is simply not a
      valid JSON text.
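
   The parsing rules of Section 2.1, together with the whitespace check
   described in this section, can be sketched in Python as follows.
   This is a minimal illustration rather than a reference
   implementation: the names encode_sequence, parse_sequence, and
   _reject_constant are hypothetical, the whole sequence is assumed to
   fit in memory as a byte string, and treating octets that precede the
   first RS as a rejected element is a choice made by this sketch, not
   something this document specifies.

```python
import json

RS = b"\x1e"          # record separator, prefixes every element
LF = b"\n"            # line feed (0x0A), the truncation "canary"
JSON_WS = b" \t\n\r"  # octets matching the "ws" rule of RFC 7159


def _reject_constant(name):
    # Python's json module accepts NaN and Infinity by default;
    # they are not valid JSON, so treat them as parse failures.
    raise ValueError("not valid JSON: " + name)


def encode_sequence(values):
    """Encode values as RS-prefixed, LF-terminated UTF-8 JSON texts."""
    return b"".join(RS + json.dumps(v).encode("utf-8") + LF for v in values)


def parse_sequence(data):
    """Return (values, rejected) for the octets of a JSON text sequence.

    `rejected` lists the octet strings that were dropped, so that the
    caller can warn about them.
    """
    values, rejected = [], []
    head, *chunks = data.split(RS)
    if head:
        # Assumption of this sketch: octets before the first RS are
        # treated as a rejected (possibly truncated) element.
        rejected.append(head)
    for chunk in chunks:
        if not chunk:
            # Consecutive RS octets do not denote an empty element.
            continue
        try:
            text = chunk.decode("utf-8")
            value = json.loads(text, parse_constant=_reject_constant)
        except ValueError:
            # Invalid UTF-8 or malformed JSON: skip to the next RS.
            rejected.append(chunk)
            continue
        self_delimited = isinstance(value, (dict, list, str))
        if not self_delimited and chunk[-1] not in JSON_WS:
            # Number, true, false, or null without trailing whitespace:
            # the text may have been truncated, so drop it.
            rejected.append(chunk)
            continue
        values.append(value)
    return values, rejected
```

   A caller would typically log each entry of the rejected list as a
   warning, in line with the reporting option suggested in Section 2.3.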

   Implementations may produce a value when parsing '"foo"'
   because their JSON text parser might be able to consume bytes
   incrementally, and since the JSON text in this case is a
   self-delimiting top-level value, the parser can produce the result
   without consuming an additional byte. Such implementations ought to
   skip to the next RS byte, possibly reporting any intervening
   non-whitespace bytes.

   Detection of truncation of non-self-delimited sequence elements
   (numbers, true, false, and null) is only possible when the sequence
   encoder produces or receives complete JSON texts. Implementations
   where the sequence encoder is not also in charge of encoding the
   individual JSON texts should ensure that those JSON texts are
   complete.

3. Security Considerations

   All the security considerations of JSON [RFC7159] apply. This format
   provides no cryptographic integrity protection of any kind.

   As usual, parsers must operate on as-good-as untrusted input. This
   means that parsers must fail gracefully in the face of malicious
   inputs.

   Note that incremental JSON text parsers can produce partial results
   and later indicate failure to parse the remainder of a text. A
   sequence parser that uses an incremental JSON text parser might treat
   a sequence like '"foo"456' as a sequence of one
   element ("foo"), while a sequence parser that uses a non-incremental
   JSON text parser might treat the same sequence as being empty. This
   effect, and texts that fail to parse and are ignored, can be used to
   smuggle data past sequence parsers that don't warn about JSON text
   failures.

   Repeated parsing and re-encoding of a JSON text sequence can result
   in the addition of trailing LF bytes to (or the stripping of them
   from) individual sequence element JSON texts. This can break
   signature validation. JSON has no canonical form for JSON texts;
   therefore, neither does the JSON text sequence format.

4. IANA Considerations

   The MIME media type for JSON text sequences is application/json-seq.

   Type name: application

   Subtype name: json-seq

   Required parameters: N/A

   Optional parameters: N/A

   Encoding considerations: binary

   Security considerations: See , Section 3.

   Interoperability considerations: Described herein.

   Published specification: .

   Applications that use this media type: is likely to support this
   format>.

   Fragment identifier considerations: N/A.

   Additional information:

   o  Deprecated alias names for this type: N/A.

   o  Magic number(s): N/A

   o  File extension(s): N/A.

   o  Macintosh file type code(s): N/A.

   o  Person & email address to contact for further information:

      *  json@ietf.org

   o  Intended usage: COMMON

   o  Author: See the "Author's Address" section of this document.

   o  Change controller: IETF

5. Acknowledgements

   Phillip Hallam-Baker proposed the use of JSON text sequences for
   logfiles and pointed out the need for resynchronization. Stephen
   Dolan created , which uses something
   like JSON text sequences (with LF as the separator between texts on
   output, and requiring only such whitespace as needed to disambiguate
   on input). Carsten Bormann suggested the use of ASCII RS, and Joe
   Hildebrand suggested the use of LF in addition to RS for
   disambiguating top-level number values. Paul Hoffman shepherded the
   Internet-Draft. Many others contributed reviews and comments on the
   JSON Working Group mailing list.

6. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC0020]  Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, March 2014.

Author's Address

   Nicolas Williams
   Cryptonector, LLC

   Email: nico@cryptonector.com