idnits 2.17.1

draft-ietf-json-text-sequence-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 22, 2014) is 3626 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 7159 (Obsoleted by RFC 8259)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                            May 22, 2014
Expires: November 23, 2014


            JavaScript Object Notation (JSON) Text Sequences
                    draft-ietf-json-text-sequence-03

Abstract

   This document describes the JSON text sequence format and associated
   media type.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 23, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.    Introduction and Motivation
   1.1.  Conventions used in this document
   2.    JSON Text Sequence Format
   2.1.  Requirements:
   3.    Use for Logfiles, or How to Resynchronize Following
         Truncated entries
   4.    Security Considerations
   5.    IANA Considerations
   6.    Acknowledgements
   7.    Normative References
         Author's Address

1.  Introduction and Motivation

   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
   serialization format.  However, when serializing a large sequence of
   values as an array, or a possibly indeterminate-length or never-
   ending sequence of values, JSON becomes difficult to work with.

   Consider a sequence of one million values, each possibly 1 kilobyte
   when encoded, which would be roughly one gigabyte.  If processing
   such a dataset requires first parsing it entirely, then the result is
   very inefficient and the processing will be limited by virtual
   memory.  "Online" (a.k.a., "streaming") parsers help, but they are
   neither widely available nor widely used, nor are they easy to use.

   Ideally such datasets could be parsed and processed one element at a
   time.  Even if each element must be parsed in a not-online manner due
   to local choice of parser, the result will usually be sufficiently
   online: limited by the size of the biggest element in the sequence
   rather than by the size of the sequence.

   This document describes the concept and format of "JSON text
   sequences", which are specifically not JSON texts themselves but are
   composed of JSON texts.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  JSON Text Sequence Format

   The ABNF [RFC5234] for the JSON text sequence format is as given in
   Figure 1.  Note that this ABNF does not work if we assume greedy
   matching.  Therefore, in prose, a JSON text sequence is a sequence of
   zero or more JSON texts, each surrounded by any number of JSON
   whitespace characters and always followed by a newline.

      JSON-sequence = ws *(JSON-text ws LF ws)
      LF            = <given by RFC 5234>
      ws            = <given by RFC 7159>
      JSON-text     = <given by RFC 7159>

                  Figure 1: JSON text sequence ABNF

2.1.  Requirements:

   o  JSON text sequence encoders MUST emit a newline after any JSON
      text.

   An input of 'truefalse' is not a valid sequence of two JSON values,
   true and false!  Neither is 'true0' a valid sequence of true and
   zero.  Some existing JSON parsers that might be used to construct
   sequence parsers might in fact accept such sequences, resulting in
   erroneous parsing of sequences of two or more numbers.  E.g., a
   sequence of two numbers, 4 and 2, encoded without the required
   whitespace between them would parse incorrectly as the number 42.
   This ambiguity is resolved by requiring that encoders emit a
   whitespace separator (specifically: a newline) after each text.

3.  Use for Logfiles, or How to Resynchronize Following Truncated
    entries

   The JSON Text Sequence format is useful for logfiles, as those are
   generally (and atomically) appended to on an ongoing basis.  I.e.,
   logfiles are of indeterminate length, at least right up until they
   are closed.

   A problem comes up with this use case: it is difficult to guarantee
   that append writes will complete.  Therefore it's possible (if
   unlikely) to end up with truncated log entries, which may fail to
   parse as JSON texts, followed by other entries.  The mechanics of
   such failures are not explained here (but consider power failures).

   Fortunately, as long as all texts in the logfile sequence are
   followed by a newline, it is possible to detect a subsequent entry
   written after an entry that fails to parse.  Figure 2 shows an ABNF
   rule for detecting the boundary between a non-truncated [and some
   truncated] JSON text and the next JSON text in a sequence.

      boundary  = endchar *text-sep *ws startchar
      text-sep  = *(SP / HTAB / CR) LF ; these are from RFC5234
      endchar   = ( "}" / "]" / DQUOTE / "e" / "l" / DIGIT )
      startchar = ( "{" / "[" / DQUOTE / "t" / "f" / "n" / "-" / DIGIT )
      ws        = <given by RFC 7159>

                 Figure 2: ABNF for resynchronization

   To resynchronize after failing to parse a JSON text, simply search
   for a boundary as described in Figure 2.  A boundary found this way
   might be the boundary between the truncated entry and the subsequent
   entry, or it might be a subsequent boundary.

   Scanning backwards for boundaries may not work reliably unless JSON
   texts written to logfiles are stripped of internal newlines.

4.  Security Considerations

   All the security considerations of JSON [RFC7159] apply.

   There is no end of sequence indicator.  This means that "end of
   file", "end of transmission", and so on, can be indistinguishable
   from a logical end of sequence.  Applications where this matters
   should denote end of sequence by convention (e.g., Content-Length in
   HTTP).

   JSON text sequence parsers based on non-incremental, non-online JSON
   text parsers will not be able to efficiently parse JSON texts in
   which newlines appear; attempting to parse such sequences with non-
   incremental, non-online JSON text parsers creates a compute resource
   exhaustion vulnerability.

   The resynchronization heuristic for logfiles is imperfect and might
   skip a valid entry following a truncated one.  Purposefully appending
   a truncated (or invalid) JSON text to a JSON text sequence logfile
   can cause the subsequent entry to be invisible.  Logfile writers
   SHOULD validate (parse) any untrusted JSON text inputs and SHOULD
   remove internal newlines from them, thus enabling reliable backwards
   scanning for sequence element boundaries.  Alternatively, logfile
   writers might write texts in sequences of two texts, the first being
   meaningless by convention.  Of course, logfile writers SHOULD also
   ensure that their writes are atomic, at least in so far as not
   interleaving with other writers' writes.

5.  IANA Considerations

   The MIME media type for JSON text sequences is application/json-seq.

   Type name:  application

   Subtype name:  json-seq

   Required parameters:  n/a

   Optional parameters:  n/a

   Encoding considerations:  binary

   Security considerations:  See this document, Section 4.

   Interoperability considerations:  Described herein.

   Published specification:  This document.

   Applications that use this media type:  JSON text sequences have been
      used in applications written with the jq programming language.

6.  Acknowledgements

   Phillip Hallam-Baker proposed the use of JSON text sequences for
   logfiles and pointed out the need for resynchronization.  James
   Manger contributed the ABNF for resynchronization.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, March 2014.

Author's Address

   Nicolas Williams
   Cryptonector, LLC

   Email: nico@cryptonector.com
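
As a non-normative illustration of Sections 2.1 and 3, the following Python sketch shows one way an encoder might emit a newline after every JSON text, and how a decoder might resynchronize after a truncated entry. The names encode_sequence, decode_sequence, BOUNDARY, and TERMINATOR are hypothetical and do not appear in this document. The sketch also makes one assumption beyond Figure 2: it only accepts a boundary that contains at least one newline, since the prose requires every text to be followed by one. It reads the whole input as a single string, so it is not an online parser.

```python
import json
import re

WS = " \t\r\n"

# Boundary in the spirit of Figure 2. Assumption of this sketch: at least one
# LF must sit between endchar and startchar (Figure 2 itself allows zero).
BOUNDARY = re.compile(r'[}\]"el0-9](?:[ \t\r]*\n)+[ \t\r\n]*(?=[{\["tfn\-0-9])')

# A complete JSON text must be followed by optional blanks and then LF.
TERMINATOR = re.compile(r'[ \t\r]*\n')


def encode_sequence(values):
    """Serialize each value as one JSON text followed by a newline."""
    # json.dumps without indent never emits a raw newline inside a text.
    return "".join(json.dumps(v) + "\n" for v in values)


def decode_sequence(text):
    """Yield parsed values; on failure, skip ahead to the next boundary."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip JSON whitespace between texts.
        while pos < len(text) and text[pos] in WS:
            pos += 1
        if pos == len(text):
            return
        try:
            value, end = decoder.raw_decode(text, pos)
            term = TERMINATOR.match(text, end)
            if term is None:
                # Rejects inputs such as 'truefalse'.
                raise ValueError("JSON text not followed by a newline")
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            found = BOUNDARY.search(text, pos)
            if found is None:
                return
            # Resume at the startchar of the text after the boundary.
            pos = found.end()
            continue
        yield value
        pos = term.end()


if __name__ == "__main__":
    # The second entry was truncated mid-write; the third was appended
    # directly after it and becomes invisible, as Section 4 warns.
    log = '{"a": 1}\n{"b": "trun{"c": 3}\n{"d": 4}\n'
    print(list(decode_sequence(log)))
```

In this sketch the entry written immediately after the truncated one is lost, because the first boundary the search can find is the one that follows it. That is the imperfection of the heuristic noted in Section 4, not a defect peculiar to this example.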