idnits 2.17.1

draft-ietf-json-text-sequence-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 23, 2014) is 3626 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 7159 (Obsoleted by RFC 8259)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                            May 23, 2014
Expires: November 24, 2014

            JavaScript Object Notation (JSON) Text Sequences
                    draft-ietf-json-text-sequence-04

Abstract

   This document describes the JSON text sequence format and associated
   media type.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 24, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.      Introduction and Motivation
   1.1.    JSON Parser Types
   1.2.    Conventions used in this document
   2.      JSON Text Sequence Format
   2.1.    Ambiguities
   2.1.1.  Ambiguities Resulting from Partial Texts
   2.2.    Rationale for Choice of LF as the Text Separator
   3.      Use for Logfiles, or How to Resynchronize Following
           Truncated entries
   4.      Security Considerations
   5.      IANA Considerations
   6.      Acknowledgements
   7.      Normative References
           Author's Address

1. Introduction and Motivation

   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
   serialization format. However, when serializing a large sequence of
   values as an array, or a possibly indeterminate-length or
   never-ending sequence of values, JSON becomes difficult to work with.

   Consider a sequence of one million values, each possibly 1 kilobyte
   when encoded, which would be roughly one gigabyte. It is often
   desirable to process such a dataset in an incremental manner: without
   having to first read all of it before beginning to produce results.
   Traditionally the way to do this with JSON is to use a "streaming"
   parser (see Section 1.1), but these are neither widely available,
   widely used, nor easy to use.

   This document describes the concept and format of "JSON text
   sequences", which are specifically not JSON texts themselves but are
   composed of JSON texts. JSON text sequences can be parsed (and
   produced) incrementally without requiring a streaming parser (or
   encoder).

1.1. JSON Parser Types

   For the purposes of this document we shall classify JSON parsers as
   follows:

   Streaming  Consumes a text incrementally, outputs values
              incrementally (e.g., as (path, leaf value) pairs).

   Online     Consumes a text incrementally.

   Off-line   Consumes only complete texts.

1.2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. JSON Text Sequence Format

   The ABNF [RFC5234] for the JSON text sequence format is as given in
   Figure 1. Note that this ABNF does not work if we assume greedy
   matching. Therefore, in prose, a JSON text sequence is a sequence of
   zero or more JSON texts, each surrounded by any number of JSON
   whitespace characters and always followed by a newline.

      JSON-sequence = ws *(JSON-text ws LF ws)
      LF = <given by RFC5234>
      ws = <given by RFC7159>
      JSON-text = <given by RFC7159>

                   Figure 1: JSON text sequence ABNF

   As long as a JSON text sequence consists of complete JSON texts, the
   only requirement is that whitespace separate any top-level value
   that is not an object, array, or string from neighboring texts. The
   simplest way to ensure this is to require such whitespace, and
   furthermore it is convenient to use a newline, as we'll see in
   Section 2.1. Therefore we impose one requirement:

   o  JSON text sequence encoders MUST emit a newline after any JSON
      text.

2.1. Ambiguities

   Without such a separator, an input of 'truefalse' is not a valid
   sequence of two JSON values, true and false! Neither is 'true0' a
   valid sequence of true and zero. Some existing JSON parsers that
   might be used to construct sequence parsers might in fact accept
   such sequences, resulting in erroneous parsing of sequences of two or
   more numbers. E.g., a sequence of two numbers, 4 and 2, encoded
   without the required whitespace between them would parse incorrectly
   as the number 42.

   Such ambiguities are resolved by requiring that encoders emit a
   whitespace separator (specifically: a newline) after each text.

2.1.1. Ambiguities Resulting from Partial Texts

   Another kind of ambiguity arises when a JSON text sequence contains
   partial texts. Such a sequence can result when using "append writes"
   to write to a file.
   For example, many systems might commit partial writes to stable
   storage then fail to complete the remainder of a write as a result
   of, e.g., power failures; upon recovery the file may then end with a
   partial JSON text.

   [[anchor1: Perhaps we should add a note about what POSIX requires
   w.r.t. O_APPEND, and how POSIX is agnostic as to power failures and
   so on. The point being that even where a standard imposes strong
   atomicity requirements as to append writes, there are good reasons
   why that might be difficult to obtain under exceptional
   circumstances.]]

   Consider a portion of a JSON text sequence such as:

      { "foo":
      { "bar": 42 }
      }

   How can we tell that the first line isn't part of an incomplete JSON
   text? We can't, especially if the third line were missing.

   In the common case JSON text sequence parsers assume every text is
   complete, and abort processing if any one text fails to parse.
   However, for logfiles, there is value in being able to recover from
   such situations. Recovery is described in Section 3.

2.2. Rationale for Choice of LF as the Text Separator

   A variety of characters or character sequences (even non-whitespace
   characters) could have been used as the JSON text separator in JSON
   text sequences. The rationale for using newline (LF) as the
   separator is as follows:

   o  it matches the 'ws' ABNF rule in [RFC7159] (as do CR, HTAB, and
      SP);

   o  it is always escaped in encoded JSON strings, therefore it is safe
      to remove LFs (or replace them with other JSON whitespace
      characters) from any JSON text (this is also true of CR and HTAB,
      but not SP);

   o  it is generally understood as the end-of-line marker by
      line-oriented tools;

   o  at least one JSON text sequence implementation exists and has
      existed for some time [XXX add external informative reference to
      https://stedolan.github.com/jq], and it uses LF as the JSON text
      separator.
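
   As an illustration only, the encoder requirement of Section 2 and
   the escaping property noted in the second bullet above can be
   sketched in Python with the standard json module. The function
   names encode_sequence and decode_sequence are hypothetical and not
   defined by this document. The reader shown assumes that the writer
   never emits a raw LF inside a JSON text, which holds for the
   encoder shown but is not guaranteed by the format in general.

```python
import json


def encode_sequence(values):
    """Serialize each value as one JSON text followed by LF."""
    # json.dumps without `indent` emits no raw newline: an LF inside a
    # string is escaped as \n, so each text occupies exactly one line.
    return "".join(json.dumps(v) + "\n" for v in values)


def decode_sequence(text):
    """Parse a sequence in which every JSON text sits on its own line.

    Assumption: no raw LF occurs inside any text (as guaranteed by
    encode_sequence above). Blank lines are treated as whitespace.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


values = [4, 2, True, False, {"msg": "line one\nline two"}]
encoded = encode_sequence(values)
print(encoded, end="")
print(decode_sequence(encoded) == values)
```

   Because the numbers 4 and 2 are written on separate lines, they
   cannot be misread as 42, and the LF inside the string value does not
   split its text across lines.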

   Note that JSON text sequence writers may (and should) use CR LF as
   the text separator where the end-of-line marker is expected to be CR
   LF.

3. Use for Logfiles, or How to Resynchronize Following Truncated
   entries

   The JSON Text Sequence format is useful for logfiles, as those are
   generally (and atomically) appended to on an ongoing basis. I.e.,
   logfiles are of indeterminate length, at least right up until they
   are closed.

   The partial-write ambiguities described in Section 2.1.1 come up in
   the case of logfiles.

   As long as all texts in the logfile sequence are followed by a
   newline, it is possible to detect a subsequent JSON text written
   after an entry that fails to parse: either the first or the second
   subsequent complete JSON text. Figure 2 shows an ABNF rule for
   detecting the boundary between a non-truncated [and some truncated]
   JSON text and the next JSON text in a sequence. This rule assumes
   that only valid JSON texts are written to a sequence.

      boundary = endchar *text-sep *ws startchar
      text-sep = *(SP / HTAB / CR) LF ; these are from RFC5234
      endchar = ( "}" / "]" / DQUOTE / "e" / "l" / DIGIT )
      startchar = ( "{" / "[" / DQUOTE / "t" / "f" / "n" / "-" / DIGIT )
      ws = <given by RFC7159>

                 Figure 2: ABNF for resynchronization

   To resynchronize after failing to parse a JSON text, simply search
   for a boundary as described in Figure 2. A boundary found this way
   might be the boundary between the truncated entry and the subsequent
   entry, or it might be a subsequent boundary.

   This method does not support scanning backwards for boundaries.

   To make resynchronization reliable, and work both forwards and
   backwards, the writer MUST first ensure that the JSON text being
   written is valid, and SHOULD apply either (or both) of the following:

   1.  Remove internal newlines (not including escaped newlines in
       strings) from any JSON text being written.

   2.  Prefix any JSON text with a null value and a newline. The append
       write must still be atomic (one write), and contain both texts.

   Method #1 permits scanning for newlines (in either direction) as the
   resynchronization method.

   Method #2 permits scanning for "null" LF (in either direction) as the
   resynchronization method.

   Consider a JSON text sequence such as:

      null
      { "foo":"hello world" }
      "a broken writenull
      "a complete write"

   Resynchronization methods #1 and #2 will correctly detect that the
   third line is an incomplete JSON text, and that the next complete
   text starts at the fourth line. We can't tell which of method #1 or
   #2 the writer was using, but either method works for the parser. The
   parser SHOULD know which method the writer was using, so as to know
   whether to discard the nulls, and whether to attempt
   resynchronization at all.

   Method #1 is RECOMMENDED for JSON text sequence logfile writers.

4. Security Considerations

   All the security considerations of JSON [RFC7159] apply.

   There is no end of sequence indicator. This means that "end of
   file", "end of transmission", and so on, can be indistinguishable
   from a logical end of sequence. Applications where this matters
   should denote end of sequence by convention (e.g., Content-Length in
   HTTP).

   The resynchronization ABNF heuristic is imperfect and might skip a
   valid entry following a truncated one. Purposefully appending a
   truncated (or invalid) JSON text to a JSON text sequence logfile can
   cause the subsequent entry to be invisible.

   JSON text sequence writers MUST validate (parse) any JSON text inputs
   from untrusted third parties.

   JSON text sequence logfile writers SHOULD apply one of the
   resynchronization methods described in Section 3, preferably method
   #1.

5. IANA Considerations

   The MIME media type for JSON text sequences is application/json-seq.

   Type name: application

   Subtype name: json-seq

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: binary

   Security considerations: See this document, Section 4.

   Interoperability considerations: Described herein.

   Published specification: this document.

   Applications that use this media type: JSON text sequences have been
   used in applications written with the jq programming language.

6. Acknowledgements

   Phillip Hallam-Baker proposed the use of JSON text sequences for
   logfiles and pointed out the need for resynchronization. James
   Manger contributed the ABNF for resynchronization.

7. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, March 2014.

Author's Address

   Nicolas Williams
   Cryptonector, LLC

   Email: nico@cryptonector.com
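
Illustrative Sketch (non-normative)

   The recovery procedure of Section 3 can be sketched as follows in
   Python, assuming the writer applied method #1 (no raw newlines
   inside any JSON text), so that scanning for LF is a sufficient
   resynchronization step. The name read_log and its discard_nulls
   parameter are hypothetical and not defined by this document; the
   flag stands in for the parser's knowledge of whether the writer also
   prefixed entries with null (method #2).

```python
import json


def read_log(text, discard_nulls=False):
    """Return (entries, skipped_lines) from a one-text-per-line logfile.

    Assumption: the writer removed internal newlines (method #1), so a
    line that fails to parse is a truncated or invalid entry and the
    next LF marks the start of the next candidate text.
    """
    entries, skipped = [], []
    for line in text.split("\n"):
        if not line.strip():
            continue  # blank lines are only JSON whitespace
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            skipped.append(line)  # resynchronize at the next LF
            continue
        if discard_nulls and value is None:
            continue  # treat top-level null as a method #2 marker
        entries.append(value)
    return entries, skipped


log = (
    'null\n'
    '{ "foo":"hello world" }\n'
    '"a broken writenull\n'
    '"a complete write"\n'
)
entries, skipped = read_log(log, discard_nulls=True)
print(entries)
print(skipped)
```

   Applied to the four-line example in Section 3, the third line fails
   to parse and is skipped, and parsing resumes with the fourth line.
   Note that discarding nulls also drops any null that a writer logged
   as a genuine entry, which is one reason the parser needs to know
   which method the writer used.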