json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                             May 9, 2014
Expires: November 10, 2014


            JavaScript Object Notation (JSON) Text Sequences
                    draft-ietf-json-text-sequence-02

Abstract

   This document describes the JSON text sequence format and associated
   media type.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 10, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction and Motivation
     1.1.  Conventions used in this document
   2.  JSON Text Sequence Format
   3.  Use for Logfiles, or How to Resynchronize Following Truncated
       entries
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Normative References
   Author's Address

1.  Introduction and Motivation

   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
   serialization format.  However, when serializing a large sequence of
   values as an array, or a possibly indeterminate-length or never-ending
   sequence of values, JSON becomes difficult to work with.

   Consider a sequence of one million values, each possibly 1 kilobyte
   when encoded, which would be roughly one gigabyte.  If processing
   such a dataset requires first parsing it entirely, then the result is
   very inefficient and the processing will be limited by virtual
   memory.  "Online" (a.k.a. "streaming") parsers help, but they are
   neither widely available nor widely used, and they are not easy to
   use.

   Ideally such datasets could be parsed and processed one element at a
   time.  Even if each element must be parsed in a non-online manner due
   to the local choice of parser, the result will usually be sufficiently
   online: memory use is limited by the size of the biggest element in
   the sequence rather than by the size of the whole sequence.

   This document describes the concept and format of "JSON text
   sequences", which are specifically not JSON texts themselves but are
   composed of JSON texts.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  JSON Text Sequence Format

   The ABNF [RFC5234] for the JSON text sequence format is as follows:

      JSON-sequence = *ws1 *(JSON-text *ws2 %x0A *ws1)
      ws1 = %x20 / %x09 / %x0A / %x0D
      ws2 = %x20 / %x09 / %x0D
      JSON-text =

                    Figure 1: JSON text sequence ABNF

   A JSON text sequence is a sequence of zero or more JSON texts, each
   surrounded by any number of JSON whitespace characters and always
   followed by a newline.

   Requirements:

   o  JSON text sequence encoders MUST emit a newline after any JSON
      text.
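   As a non-normative illustration of Figure 1 and the requirement
   above, the following Python sketch encodes and decodes a JSON text
   sequence held in memory as a str.  The names encode_sequence and
   decode_sequence are hypothetical.  The decoder relies on
   json.JSONDecoder.raw_decode from the Python standard library, which
   parses one JSON value starting at a given offset and returns the
   value together with the offset just past it.  Because it follows the
   ABNF directly, the sketch accepts JSON texts that contain internal
   newlines, but it is not an online parser: it needs the whole sequence
   in memory.

```python
import json
from typing import Any, Iterable, List

# ws1 and ws2 from Figure 1: ws2 is ws1 without LF (%x0A).
WS1 = " \t\n\r"
WS2 = " \t\r"


def encode_sequence(values: Iterable[Any]) -> str:
    """Emit each value as a JSON text followed by a newline.

    json.dumps without indent never emits a raw LF (an LF inside a string
    is escaped as \\n), so each text also ends up on a line of its own.
    allow_nan=False rejects NaN/Infinity, which are not valid JSON.
    """
    return "".join(json.dumps(v, allow_nan=False) + "\n" for v in values)


def decode_sequence(text: str) -> List[Any]:
    """Parse JSON-sequence = *ws1 *(JSON-text *ws2 %x0A *ws1).

    Raises ValueError when a JSON text is not followed by a newline.
    Note: Python's json module accepts NaN/Infinity on input; this
    sketch does not reject them.
    """
    decoder = json.JSONDecoder()
    values: List[Any] = []
    pos, n = 0, len(text)
    while True:
        while pos < n and text[pos] in WS1:  # *ws1
            pos += 1
        if pos == n:
            return values
        value, pos = decoder.raw_decode(text, pos)  # JSON-text
        values.append(value)
        while pos < n and text[pos] in WS2:  # *ws2
            pos += 1
        if pos == n or text[pos] != "\n":  # %x0A is mandatory
            raise ValueError(f"JSON text not followed by newline at offset {pos}")
        pos += 1
```

   For example, encode_sequence([4, 2]) yields "4\n2\n", which decodes
   back to [4, 2], while "42\n" decodes to [42].  Inputs such as
   'truefalse' and 'true0' are rejected, because no newline follows the
   first JSON text.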
   An input of 'truefalse' is not a valid sequence of two JSON values,
   true and false!  Neither is 'true0' a valid sequence of true and
   zero.  Some existing JSON parsers that might be used to construct
   sequence parsers may in fact accept such sequences, resulting in
   erroneous parsing of sequences of two or more numbers.  E.g., a
   sequence of two numbers, 4 and 2, encoded without the required
   whitespace between them would parse incorrectly as the number 42.
   This ambiguity is resolved by requiring that encoders emit a
   whitespace separator (specifically: a newline) after each text.

3.  Use for Logfiles, or How to Resynchronize Following Truncated
    entries

   The JSON Text Sequence format is useful for logfiles, as those are
   generally (and atomically) appended to on an ongoing basis.  I.e.,
   logfiles are of indeterminate length, at least right up until they
   are closed.

   A problem comes up with this use case: it is difficult to guarantee
   that append writes will complete.  Therefore it is possible (if
   unlikely) to end up with truncated log entries, which may fail to
   parse as JSON texts, followed by other entries.  The mechanics of
   such failures are not explained here (but consider power failures).

   Fortunately, as long as all texts in the logfile sequence are
   followed by a newline, it is possible to detect a subsequent entry
   written after an entry that fails to parse.  Figure 2 shows an ABNF
   rule for detecting the boundary between a non-truncated (and some
   truncated) JSON text and the next JSON text in a sequence.

      boundary  = endchar *ws2 %x0A *ws1 startchar
      endchar   = ( "}" / "]" / %x22 / "e" / "l" / DIGIT )
      startchar = ( "{" / "[" / %x22 / "t" / "f" / "n" / "-" / DIGIT )

                  Figure 2: ABNF for resynchronization

   The endchar rule lists every character that can end a complete JSON
   text: a closing brace or bracket, a closing quotation mark, the final
   "e" of true or false, the final "l" of null, or the last digit of a
   number.  The startchar rule likewise lists every character that can
   begin a JSON text.

   To resynchronize after failing to parse a JSON text, simply search
   for a boundary as described in Figure 2.
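   A minimal Python sketch of this resynchronization, for a logfile held
   in memory as a str, follows.  It is an illustration under stated
   assumptions, not part of this specification: parse_log and BOUNDARY
   are hypothetical names; Figure 2 is transcribed into a regular
   expression that matches only the lowercase letters that can actually
   end or begin a JSON text (ABNF quoted strings are case-insensitive);
   and startchar is written as a lookahead so that parsing resumes on
   the first character of the next JSON text.  Every match consumes at
   least an endchar and a newline, so the loop always makes progress.

```python
import json
import re
from typing import Any, List, Tuple

# Hypothetical transcription of Figure 2:
#   endchar   = "}" / "]" / %x22 / "e" / "l" / DIGIT
#   *ws2 %x0A *ws1
#   startchar = "{" / "[" / %x22 / "t" / "f" / "n" / "-" / DIGIT
# startchar is a lookahead, so match.end() is where the next text starts.
BOUNDARY = re.compile(r'[}\]"el0-9][ \t\r]*\n[ \t\n\r]*(?=[{\["tfn0-9-])')

WS1 = " \t\n\r"
WS2 = " \t\r"


def parse_log(text: str) -> Tuple[List[Any], int]:
    """Parse a JSON text sequence logfile, resynchronizing after bad entries.

    Returns the successfully parsed values and the number of times the
    parser had to resynchronize.
    """
    decoder = json.JSONDecoder()
    values: List[Any] = []
    resyncs = 0
    pos, n = 0, len(text)
    while True:
        while pos < n and text[pos] in WS1:  # *ws1 between texts
            pos += 1
        if pos == n:
            return values, resyncs
        try:
            value, end = decoder.raw_decode(text, pos)
            while end < n and text[end] in WS2:  # *ws2
                end += 1
            if end == n or text[end] != "\n":  # %x0A is required
                raise ValueError("JSON text not followed by a newline")
            values.append(value)
            pos = end + 1
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            resyncs += 1
            match = BOUNDARY.search(text, pos)
            if match is None:
                return values, resyncs  # no further boundary: stop
            pos = match.end()


if __name__ == "__main__":
    # Entry 2 was truncated and entry 3 was appended right after it.
    log = '{"id": 1}\n{"id": 2, "msg": "trunc{"id": 3}\n{"id": 4}\n'
    print(parse_log(log))  # ([{'id': 1}, {'id': 4}], 1)
```

   In the example, entry 3 was appended directly after the truncated
   entry 2 with no newline in between, so it is absorbed into the failed
   parse: the sketch recovers entries 1 and 4 but loses entry 3, which
   is the kind of skipped entry described in Section 4.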
   A boundary found this way might be the boundary between the
   truncated entry and the subsequent entry, or it might be a later
   boundary.

   Scanning backwards for boundaries will not work reliably unless JSON
   texts written to logfiles are stripped of internal newlines!

4.  Security Considerations

   All the security considerations of JSON [RFC7159] apply.

   There is no end of sequence indicator.  This means that "end of
   file", "end of transmission", and so on, can be indistinguishable
   from a logical end of sequence.  Applications where this matters
   should denote end of sequence by convention (e.g., Content-Length in
   HTTP).

   JSON text sequence parsers based on non-incremental, non-online JSON
   text parsers will not be able to efficiently parse JSON texts in
   which newlines appear; attempting to parse such sequences with
   non-incremental, non-online JSON text parsers creates a compute
   resource exhaustion vulnerability.

   The first requirement given in Section 2 (otherwise-ambiguous JSON
   texts must be separated by whitespace) is critical and must be
   adhered to.  It is best to always emit a whitespace separator after
   every JSON text emitted.

   The resynchronization heuristic for logfiles is imperfect and might
   skip a valid entry following a truncated one.  Purposefully appending
   a truncated (or invalid) JSON text to a JSON text sequence logfile
   can cause the subsequent entry to be invisible.  Logfile writers
   SHOULD validate (parse) any untrusted JSON text inputs and SHOULD
   remove internal newlines from them, thus enabling reliable backwards
   scanning for sequence element boundaries.

5.  IANA Considerations

   The MIME media type for JSON text sequences is application/json-seq.

   Type name: application

   Subtype name: json-seq

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: binary

   Security considerations: See , Section 4.
   Interoperability considerations: Described herein.

   Published specification: .

   Applications that use this media type: JSON text sequences have been
   used in applications written with the jq programming language.

6.  Acknowledgements

   Phillip Hallam-Baker proposed the use of JSON text sequences for
   logfiles and pointed out the need for resynchronization.  James
   Manger contributed the ABNF for resynchronization.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, March 2014.

Author's Address

   Nicolas Williams
   Cryptonector, LLC

   Email: nico@cryptonector.com