idnits 2.17.1 

draft-ietf-json-text-sequence-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 23, 2014) is 3412 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 7159 (Obsoleted by RFC 8259)


     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	json                                                         N. Williams
3	Internet-Draft                                              Cryptonector
4	Intended status: Standards Track                       December 23, 2014
5	Expires: June 26, 2015

7	            JavaScript Object Notation (JSON) Text Sequences
8	                    draft-ietf-json-text-sequence-13

10	Abstract

12	   This document describes the JSON text sequence format and associated
13	   media type, "application/json-seq".  A JSON text sequence consists of
14	   any number of JSON texts, all encoded in UTF-8, each prefixed by an
15	   ASCII Record Separator (0x1E), and each ending with an ASCII Line
16	   Feed character (0x0A).

18	Status of this Memo

20	   This Internet-Draft is submitted in full conformance with the
21	   provisions of BCP 78 and BCP 79.

23	   Internet-Drafts are working documents of the Internet Engineering
24	   Task Force (IETF).  Note that other groups may also distribute
25	   working documents as Internet-Drafts.  The list of current Internet-
26	   Drafts is at http://datatracker.ietf.org/drafts/current/.

28	   Internet-Drafts are draft documents valid for a maximum of six months
29	   and may be updated, replaced, or obsoleted by other documents at any
30	   time.  It is inappropriate to use Internet-Drafts as reference
31	   material or to cite them other than as "work in progress."

33	   This Internet-Draft will expire on June 26, 2015.

35	Copyright Notice

37	   Copyright (c) 2014 IETF Trust and the persons identified as the
38	   document authors.  All rights reserved.

40	   This document is subject to BCP 78 and the IETF Trust's Legal
41	   Provisions Relating to IETF Documents
42	   (http://trustee.ietf.org/license-info) in effect on the date of
43	   publication of this document.  Please review these documents
44	   carefully, as they describe your rights and restrictions with respect
45	   to this document.  Code Components extracted from this document must
46	   include Simplified BSD License text as described in Section 4.e of
47	   the Trust Legal Provisions and are provided without warranty as
48	   described in the Simplified BSD License.

50	Table of Contents

52	   1.    Introduction and Motivation
53	   1.1.  Conventions used in this document
54	   2.    JSON Text Sequence Format
55	   2.1.  JSON text sequence parsing
56	   2.2.  JSON text sequence encoding
57	   2.3.  Incomplete/invalid JSON texts need not be fatal
58	   2.4.  Top-level numeric, 'true', 'false', and 'null' values
59	   3.    Security Considerations
60	   4.    IANA Considerations
61	   5.    Acknowledgements
62	   6.    Normative References
63	         Author's Address

65	1.  Introduction and Motivation

67	   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
68	   serialization format.  However, when serializing a large sequence of
69	   values as an array, or a possibly indeterminate-length or never-
70	   ending sequence of values, JSON becomes difficult to work with.

72	   Consider a sequence of one million values, each possibly 1 kilobyte
73	   when encoded -- roughly one gigabyte.  It is often desirable to
74	   process such a dataset in an incremental manner: without having to
75	   first read all of it before beginning to produce results.
76	   Traditionally the way to do this with JSON is to use a "streaming"
77	   parser, but these are neither widely available, widely used, nor easy
78	   to use.

80	   This document describes the concept and format of "JSON text
81	   sequences", which are specifically not JSON texts themselves but are
82	   composed of (possible) JSON texts.  JSON text sequences can be parsed
83	   (and produced) incrementally without requiring a streaming
84	   parser (or streaming encoder).

86	1.1.  Conventions used in this document

88	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
89	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
90	   "OPTIONAL" in this document are to be interpreted as described in
91	   [RFC2119].

93	2.  JSON Text Sequence Format

95	   Two different sets of ABNF rules are provided for the definition of
96	   JSON text sequences: one for parsers, and one for encoders.  Having
97	   two different sets of rules permits recovery by parsers from
98	   sequences where some of the elements are truncated for whatever reason.
99	   The syntax for parsers is specified in terms of octet strings which
100	   are then interpreted as JSON texts if possible.  The syntax for
101	   encoders, on the other hand, assumes that sequence elements are not
102	   truncated.

104	   JSON text sequences MUST use UTF-8 encoding; other encodings of JSON
105	   (i.e., UTF-16 and UTF-32) MUST NOT be used.

107	2.1.  JSON text sequence parsing

109	   The ABNF [RFC5234] for the JSON text sequence parser is as given in
110	   Figure 1.

112	     JSON-sequence = *(1*RS possible-JSON)
113	     RS = %x1E; "record separator" (RS), see RFC20
114	              ; Also known as: Unicode Character 'INFORMATION SEPARATOR
115	              ;                TWO' (U+001E)
116	     possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
117	                               ; JSON text (see RFC7159)
118	     not-RS = %x00-1d / %x1f-ff; any octets other than RS

120	                     Figure 1: JSON text sequence ABNF

122	   In prose: a series of octet strings, each containing any octet other
123	   than a record separator (RS) (0x1E) [RFC0020], all octet strings
124	   separated from each other by RS octets.  Each octet string in the
125	   sequence is to be parsed as a JSON text in the UTF-8 encoding
126	   [RFC3629].

128	   If parsing of such an octet string as a UTF-8-encoded JSON text
129	   fails, the parser SHOULD nonetheless continue parsing the remainder
130	   of the sequence.  The parser can report such failures to applications
131	   (which might then choose to terminate parsing of a sequence).
132	   Multiple consecutive RS octets do not denote empty sequence elements
133	   between them, and can be ignored.
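
   The parsing rules above can be sketched as follows.  This is a
   minimal, non-normative illustration in Python, assuming the whole
   sequence is already in memory as bytes; the names parse_json_seq and
   on_error are hypothetical and not part of this document.  It does not
   implement the truncation check of Section 2.4.

```python
import json

RS = b"\x1e"  # ASCII record separator


def parse_json_seq(data, on_error=None):
    """Yield decoded values from a JSON text sequence held in `data` (bytes).

    Octet strings that fail to parse as UTF-8-encoded JSON are skipped and,
    if `on_error` (a hypothetical callback) is given, reported to it.
    """
    head, *rest = data.split(RS)
    if head and on_error is not None:
        # Octets before the first RS do not belong to any element.
        on_error(head)
    for chunk in rest:
        if not chunk:
            # Consecutive RS octets do not denote empty elements.
            continue
        try:
            yield json.loads(chunk.decode("utf-8"))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON; keep going.
            if on_error is not None:
                on_error(chunk)
```

   Passing a callback such as a list's append method as on_error lets
   an application collect the failed octet strings and decide for
   itself whether to stop.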

135	   This document does not define a mechanism for reliably identifying
136	   text sequence by position (for example, when sending individual
137	   elements of an array as unique text sequences).  For applications
138	   where truncation is a possibility, this means that intended sequence
139	   elements can be truncated, and can even be missing entirely,
140	   therefore a reference to an nth element would be unreliable.

142	   There is no end of sequence indicator.

144	2.2.  JSON text sequence encoding

146	   The ABNF for the JSON text sequence encoder is given in Figure 2.

148	     JSON-sequence = *(RS JSON-text LF)
149	     RS = %x1E; see RFC20
150	              ; Also known as: Unicode Character 'INFORMATION SEPARATOR
151	              ;                TWO' (U+001E)
152	     LF = %x0A; "line feed" (LF), see RFC20
153	     JSON-text = <given by RFC7159, using UTF-8 encoding>

155	                     Figure 2: JSON text sequence ABNF

157	   In prose: any number of JSON texts, each encoded in UTF-8 [RFC3629],
158	   each preceded by one ASCII RS character, and each followed by a line
159	   feed (LF).  Since RS is an ASCII control character it may only appear
160	   in JSON strings in escaped form (see [RFC7159]), and since RS may not
161	   appear in JSON texts in any other form, RS unambiguously delimits the
162	   start of any element in the sequence.  RS is sufficient to
163	   unambiguously delimit all top-level JSON value types other than
164	   numbers.  Following each JSON text in the sequence with an LF allows
165	   detection of truncated JSON texts consisting of a number at the top-
166	   level; see Section 2.4.
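
   The encoder rules can be sketched in the same way.  This is a
   non-normative Python illustration; the function name encode_json_seq
   is hypothetical.  It relies on the standard json module escaping
   control characters inside strings, so a raw RS octet never appears
   within an encoded JSON text.

```python
import json

RS = b"\x1e"  # ASCII record separator
LF = b"\x0a"  # ASCII line feed


def encode_json_seq(values):
    """Return a JSON text sequence (bytes) for an iterable of values."""
    out = bytearray()
    for value in values:
        # json.dumps escapes control characters such as RS inside strings.
        text = json.dumps(value).encode("utf-8")
        out += RS + text + LF
    return bytes(out)
```

   Because each element is written as RS, text, LF, a writer can append
   elements to a file or stream one at a time without rewriting what
   came before.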

168	   JSON text sequence encoders are expected to ensure that the sequence
169	   elements are properly formed.  When the JSON text sequence encoder
170	   does the JSON text encoding, the sequence elements will naturally be
171	   properly formed.  When the JSON text sequence encoder accepts
172	   already-encoded JSON texts, the JSON text sequence encoder ought to
173	   parse them before adding them to a sequence.

175	   Note that on some systems it's possible to input RS by typing
176	   'ctrl-^'; on some systems or applications the correct sequence may be
177	   'ctrl-v ctrl-^'.  This is helpful when constructing a sequence
178	   manually with a text editor.

180	2.3.  Incomplete/invalid JSON texts need not be fatal

182	   Per Section 2.1, JSON text sequence parsers should not abort when an
183	   octet string contains a malformed JSON text; instead, the JSON text
184	   sequence parser should skip to the next RS.  Such a situation may
185	   arise in contexts where, for example, append-writes to log files are
186	   truncated by the filesystem (e.g., due to a crash, or administrative
187	   process termination).

189	   Incremental JSON text parsers may be used, though of course failure
190	   to parse a given text may result after first producing some
191	   incremental parse results.

193	   Sequence parsers should have an option to warn about truncated JSON
194	   texts.

196	2.4.  Top-level numeric, 'true', 'false', and 'null' values

198	   While objects, arrays, and strings are self-delimited in JSON texts,
199	   numbers, and the values 'true', 'false', and 'null' are not.  Only
200	   whitespace can delimit the latter four kinds of values.

202	   JSON text sequences use 0x0A as a "canary" octet to detect
203	   truncation.

205	   Parsers MUST check that any JSON texts that are a top-level number,
206	   or which might be 'true', 'false', or 'null' include JSON whitespace
207	   (at least one byte matching the "ws" ABNF rule from [RFC7159]) after
208	   that value, otherwise the JSON-text may have been truncated.  Note
209	   that the LF following each JSON text matches the "ws" ABNF rule.

211	   Parsers MUST drop JSON-text sequence elements consisting of non-self-
212	   delimited top-level values that may have been truncated (that are not
213	   delimited by whitespace).  Parsers can report such texts as warnings
214	   (including, optionally, the parsed text and/or the original octet
215	   string).

217	   For example, '<RS>123<RS>' might have been intended to carry the top-
218	   level number 1234, but must have been truncated.  Similarly,
219	   '<RS>true<RS>' might have been intended to carry the invalid text
220	   'trueish'. '<RS>truefalse<RS>' is not two top-level values, 'true',
221	   and 'false'; it is simply not a valid JSON text.
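
   The check described in this section can be sketched as follows.
   This is a non-normative Python illustration; decode_element is a
   hypothetical helper that receives the octets found between two RS
   octets and returns a (success, value) pair.

```python
import json

# Octets matching the "ws" rule of RFC 7159: space, tab, LF, CR.
WS = (b" ", b"\t", b"\n", b"\r")


def decode_element(chunk):
    """Decode one sequence element given as bytes.

    Returns (True, value) on success, or (False, None) when the element
    is malformed or is a possibly truncated top-level number, true,
    false, or null.
    """
    try:
        value = json.loads(chunk.decode("utf-8"))
    except ValueError:
        return (False, None)
    # Objects, arrays, and strings are self-delimited.
    self_delimited = isinstance(value, (dict, list, str))
    if not self_delimited and not chunk.endswith(WS):
        # No trailing whitespace "canary": drop the element.
        return (False, None)
    return (True, value)
```

   With this helper, the octets '123' and 'true' are rejected, '123'
   followed by LF is accepted as the number 123, and '"foo"' is
   accepted even without a trailing LF.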

223	   Implementations may produce a value when parsing '<RS>"foo"<RS>'
224	   because their JSON text parser might be able to consume bytes
225	   incrementally, and since the JSON text in this case is a self-
226	   delimiting top-level value, the parser can produce the result without
227	   consuming an additional byte.  Such implementations ought to skip to
228	   the next RS byte, possibly reporting any intervening non-whitespace
229	   bytes.

231	   Detection of truncation of non-self-delimited sequence elements
232	   (numbers, true, false, and null) is only possible when the sequence
233	   encoder produces or receives complete JSON texts.  Implementations
234	   where the sequence encoder is not also in charge of encoding the
235	   individual JSON texts should ensure that those JSON texts are
236	   complete.

238	3.  Security Considerations

240	   All the security considerations of JSON [RFC7159] apply.  This format
241	   provides no cryptographic integrity protection of any kind.

243	   As usual, parsers must operate on as-good-as untrusted input.  This
244	   means that parsers must fail gracefully in the face of malicious
245	   inputs.

247	   Note that incremental JSON text parsers can produce partial results
248	   and later indicate failure to parse the remainder of a text.  A
249	   sequence parser that uses an incremental JSON text parser might treat
250	   a sequence like '<RS>"foo"<LF>456<LF><RS>' as a sequence of one
251	   element ("foo"), while a sequence parser that uses a non-incremental
252	   JSON text parser might treat the same sequence as being empty.  This
253	   effect, and texts that fail to parse and are ignored, can be used to
254	   smuggle data past sequence parsers that don't warn about JSON text
255	   failures.

257	   Repeated parsing and re-encoding of a JSON text sequence can result
258	   in the addition (or stripping) of trailing LF bytes to (from)
259	   individual sequence element JSON texts.  This can break signature
260	   validation.  JSON has no canonical form for JSON texts, therefore
261	   neither does the JSON text sequence format.
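
   The effect on signatures can be illustrated with a short,
   non-normative Python sketch.  The helper names reencode_element and
   digest are hypothetical, and SHA-256 merely stands in for whatever
   digest a signature scheme might cover.  The compact separators are
   chosen so that the only byte-level difference in the example is the
   trailing LF.

```python
import hashlib
import json


def reencode_element(chunk):
    """Parse one element (bytes) and re-encode it with a single LF."""
    value = json.loads(chunk.decode("utf-8"))
    text = json.dumps(value, separators=(",", ":"))
    return text.encode("utf-8") + b"\n"


def digest(octets):
    """Hex SHA-256 digest of the given octets."""
    return hashlib.sha256(octets).hexdigest()


original = b'{"a":1}\n\n'  # element as received, with two trailing LFs
reencoded = reencode_element(original)

# Same JSON value, different octets, hence a different digest.
same_value = json.loads(original) == json.loads(reencoded)
same_digest = digest(original) == digest(reencoded)
```

   Here same_value is true while same_digest is false, so a signature
   computed over the original octets would not validate against the
   re-encoded element.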

263	4.  IANA Considerations

265	   The MIME media type for JSON text sequences is application/json-seq.

267	   Type name: application

269	   Subtype name: json-seq

271	   Required parameters: N/A

273	   Optional parameters: N/A

275	   Encoding considerations: binary

277	   Security considerations: See <this document, once published>,
278	   Section 3.

280	   Interoperability considerations: Described herein.

282	   Published specification: <this document, once published>.

284	   Applications that use this media type: <by publication time
285	   <https://stedolan.github.io/jq> is likely to support this format>.

287	   Fragment identifier considerations: N/A.

289	   Additional information:

291	   o  Deprecated alias names for this type: N/A.

293	   o  Magic number(s): N/A

295	   o  File extension(s): N/A.

297	   o  Macintosh file type code(s): N/A.

299	   o  Person & email address to contact for further information:

301	      *  json@ietf.org

303	   o  Intended usage: COMMON

305	   o  Author: See the "Author's Address" section of this document.

307	   o  Change controller: IETF

309	5.  Acknowledgements

311	   Phillip Hallam-Baker proposed the use of JSON text sequences for
312	   logfiles and pointed out the need for resynchronization.  Stephen
313	   Dolan created <https://github.com/stedolan/jq>, which uses something
314	   like JSON text sequences (with LF as the separator between texts on
315	   output, and requiring only such whitespace as needed to disambiguate
316	   on input).  Carsten Bormann suggested the use of ASCII RS, and Joe
317	   Hildebrand suggested the use of LF in addition to RS for
318	   disambiguating top-level number values.  Paul Hoffman shepherded the
319	   Internet-Draft.  Many others contributed reviews and comments on the
320	   JSON Working Group mailing list.

322	6.  Normative References

324	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
325	              Requirement Levels", BCP 14, RFC 2119, March 1997.

327	   [RFC0020]  Cerf, V., "ASCII format for network interchange", RFC 20,
328	              October 1969.

330	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
331	              10646", STD 63, RFC 3629, November 2003.

333	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
334	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

336	   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
337	              Interchange Format", RFC 7159, March 2014.

339	Author's Address

341	   Nicolas Williams
342	   Cryptonector, LLC

344	   Email: nico@cryptonector.com