idnits 2.17.1 

draft-phillips-record-jar-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 14.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 437.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 448.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 455.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 461.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 20, 2008) is 5909 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UAX31'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'

  -- Obsolete informational reference (is this intentional?): RFC 4646
     (Obsoleted by RFC 5646)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                   A. Phillips, Ed.
3	Internet-Draft                                               Yahoo! Inc.
4	Expires: August 23, 2008                               February 20, 2008

6	                         The record-jar Format
7	                      draft-phillips-record-jar-02

9	Status of this Memo

11	   By submitting this Internet-Draft, each author represents that any
12	   applicable patent or other IPR claims of which he or she is aware
13	   have been or will be disclosed, and any of which he or she becomes
14	   aware will be disclosed, in accordance with Section 6 of BCP 79.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt.

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	   This Internet-Draft will expire on August 23, 2008.

34	Copyright Notice

36	   Copyright (C) The IETF Trust (2008).

38	Abstract

40	   The record-jar format provides a method of storing multiple records
41	   with a variable repertoire of fields in a text format.  This document
42	   provides a description of the format.  Comments are solicited and
43	   should be addressed to the mailing list 'record-jar@yahoogroups.com'
44	   and/or the author.

46	Table of Contents

48	   1.  Introduction
49	   2.  Format and Grammar
50	     2.1.  Folding of Field Values
51	     2.2.  Comments
52	     2.3.  Characters, Encodings, and Escapes
53	   3.  Examples
54	   4.  References
55	     4.1.  Normative References
56	     4.2.  Informative References
57	   Appendix A.  Acknowledgements
58	   Author's Address
59	   Intellectual Property and Copyright Statements

61	1.  Introduction

63	   The record-jar format was originally described in The Art of Unix
64	   Programming [AOUP].  This format is useful for storing information in
65	   a human-readable text form, while making the data available for
66	   machine processing.  It is a flexible format, since it provides for
67	   an arbitrary range of fields in any given record and can be used to
68	   store data with variable length and content.

70	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
71	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
72	   document are to be interpreted as described in [RFC2119].

74	2.  Format and Grammar

76	   The record-jar format is described by the following ABNF ([RFC4234]):

78	   record-jar   = [encodingSig] [separator] *record
79	   record       = 1*field separator
80	   field        = ( field-name field-sep field-body CRLF )
81	   field-name   = 1*character
82	   field-sep    = *SP ":" *SP
83	   field-body   = *(continuation 1*character)
84	   continuation = ["\"] [[*SP CRLF] 1*SP]
85	   separator    = [blank-line] *("%%" [comment] CRLF)
86	   comment      = SP *69(character)
87	   character    = SP / ASCCHAR / UNICHAR / ESCAPE
88	   encodingSig  = "%%encoding" field-sep
89	                    *(ALPHA / DIGIT / "-" / "_") CRLF
90	   blank-line   = WSP CRLF

92	   ; ASCII characters except %x26 (&) and %x5C (\)
93	   ASCCHAR      = %x21-25 / %x27-5B / %x5D-7E
94	   ; Unicode characters
95	   UNICHAR      = %x80-10FFFF
96	   ESCAPE       = "\" ("\" / "&" / "r" / "n" / "t" )
97	                / "&#x" 2*6HEXDIG ";"

99	                         Figure 1: record-jar ABNF

101	   The record-jar format uses plain text to represent data values.  A
102	   record-jar document consists of a sequence of records, each of which
103	   contains one or more fields.  Each record is separated from other
104	   records by at least one line beginning with the sequence "%%"
105	   (%x25.25).  A record MAY contain as many or as few fields as are
106	   needed to convey its data.  Empty records and blank lines are
107	   ignored.
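   As an illustration only, the record-splitting rule above can be
   sketched in Python as follows.  The function name split_records is
   hypothetical; the sketch assumes the document has already been
   decoded to a string and leaves unfolding, comments, and escapes to
   later steps.

```python
from typing import List


def split_records(text: str) -> List[List[str]]:
    """Split a decoded record-jar document into records.

    Each record is returned as a list of its physical (still folded) lines.
    Any line beginning with "%%" ends the current record; blank lines are
    skipped, and records with no fields are dropped.
    """
    records: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith("%%"):
            # Separator (possibly carrying a comment): close the record.
            if current:
                records.append(current)
            current = []
        elif line.strip():
            current.append(line)
    if current:
        # The last record need not be followed by a separator.
        records.append(current)
    return records
```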

109	   A field is a single, logical line of characters from the Universal
110	   Character Set (Unicode) [Unicode].  Each field consists of three
111	   parts: the field-name, the field separator, and the field-body.

113	   The field-name is an identifier.  Field-names consist of a sequence of
114	   Unicode characters.  Whitespace characters and colon (":", %x3A) are
115	   not permitted in a field-name.

117	   An application can impose additional restrictions on field-names.
118	   For example, they might be restricted to the characters permitted in
119	   identifiers according to Unicode Standards Annex #31 (UAX#31)
120	   [UAX31].  Or they might be restricted to a sequence of letters and
121	   digits from the US-ASCII [ISO646] character repertoire.

123	   Field-names are case sensitive.  Upper and lowercase letters are
124	   often used to visually break up the name, for example using
125	   CamelCase.  It is a common convention that field names use an initial
126	   capital letter, although this is not enforced.

128	   The field separator (field-sep) is the colon character (":", %x3A).
129	   The separator MAY be surrounded on either side by any amount of
130	   horizontal whitespace (tab or space characters).  The normal
131	   convention is one space on each side.
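   A minimal sketch of splitting one logical field into its parts,
   assuming the field has already been unfolded.  The name parse_field
   is hypothetical.  The pattern follows the prose above in allowing
   tabs around the colon, although the field-sep rule in the ABNF lists
   only SP, and it applies none of the optional application-level
   restrictions (such as UAX#31 identifiers).

```python
import re
from typing import Tuple

# field-name: one or more characters that are neither whitespace nor ":".
# field-sep:  ":" with optional spaces or tabs on either side.
FIELD_RE = re.compile(r"^([^\s:]+)[ \t]*:[ \t]*(.*)$")


def parse_field(line: str) -> Tuple[str, str]:
    """Split one logical (already unfolded) field into (field-name, field-body)."""
    match = FIELD_RE.match(line)
    if match is None:
        raise ValueError(f"not a valid field: {line!r}")
    return match.group(1), match.group(2)
```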

133	   The field-body contains the data value.  Logically, the field-body
134	   consists of a single line of text using any combination of characters
135	   from the Universal Character Set followed by a CRLF (newline).  The
136	   carriage return, newline, and tab characters, when they occur in the
137	   data value stored in the field-body, are represented by their common
138	   backslash escapes ("\r", "\n", and "\t" respectively).  See
139	   Section 2.3 for more information on escape sequences.

141	2.1.  Folding of Field Values

143	   Some protocols limit total line length.  For example, many Internet
144	   plain-text protocols limit lines to 72 total bytes.  To accommodate
145	   such limits or for readability and presentational purposes, the
146	   field-body portion of a field can be split into a multiple-line
147	   representation; this is called "folding".

149	   Successive lines in the same field-body begin with one or more
150	   whitespace characters.  When processing the record-jar format, the
151	   linear whitespace (including the newline and any preceding spaces)
152	   is consumed by the processor, and the two parts of the field-body
153	   are joined to form a single, logical line.  For example:
154	   Eulers-Number : 2.718281828459045235360287471
155	     352662497757247093699959574966967627724076630353547
156	     5945713821785251664274274663919320030599218174135...

158	                       Figure 2: Example of Folding
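   The plain (no backslash) unfolding just described might be
   implemented roughly as follows.  This is a sketch with a
   hypothetical function name, unfold_simple; it removes the whitespace
   on both sides of each line break and joins the pieces with nothing
   in between.

```python
from typing import List


def unfold_simple(lines: List[str]) -> str:
    """Join the physical lines of one folded field into a single logical line.

    Each continuation line must begin with whitespace.  The whitespace on
    both sides of every line break is consumed and nothing is inserted in
    its place.  Backslash line-continuation is deliberately not handled.
    """
    parts = [lines[0]]
    for cont in lines[1:]:
        if not cont[:1].isspace():
            raise ValueError(f"continuation must start with whitespace: {cont!r}")
        parts[-1] = parts[-1].rstrip(" \t")  # whitespace before the break
        parts.append(cont.lstrip(" \t"))     # indentation of the next line
    return "".join(parts)
```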

160	   Note that imposing a line length limit effectively limits the length
161	   of the field-name, since the field separator MUST appear on the same
162	   line with the field-name and the field-name MUST NOT be folded.
163	   Also, when imposing a line length limit, note that some encodings
164	   (including the Unicode encodings) can use a variable number of bytes
165	   per character or commonly use more than one byte per character.
166	   Folding MUST NOT split the multi-byte sequence of a single character.
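   One way for a writer to honour these constraints is to count the
   encoded bytes of each character and break only between whole
   characters.  The hypothetical fold_field helper below assumes UTF-8
   output and a reader that removes folds without inserting a space.
   It also avoids breaking next to whitespace (which a reader would
   consume) or right after a backslash (which a reader could mistake
   for a line-continuation character).  It does not attempt to keep
   grapheme clusters together.

```python
from typing import List


def fold_field(name: str, body: str, limit: int = 72) -> List[str]:
    """Fold "name: body" into physical lines of at most `limit` UTF-8 bytes.

    Lines are broken only between whole characters, so no multi-byte UTF-8
    sequence is split.  A break is placed only between two non-whitespace
    characters and never right after a backslash, so a reader that removes
    the fold without inserting anything gets `body` back unchanged.  If no
    such break point exists, a line may exceed `limit` (best effort).
    """
    current = f"{name}: "
    size = len(current.encode("utf-8"))
    if size > limit:
        raise ValueError("field-name does not fit within the line limit")
    lines: List[str] = []
    for i, ch in enumerate(body):
        width = len(ch.encode("utf-8"))
        prev = body[i - 1] if i > 0 else ""
        safe = i > 0 and not prev.isspace() and not ch.isspace() and prev != "\\"
        if size + width > limit and safe:
            lines.append(current)
            current, size = " ", 1  # continuation lines start with one space
        current += ch
        size += width
    lines.append(current)
    return lines
```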

168	   It is RECOMMENDED that folding not occur between characters inside a
169	   Unicode grapheme cluster (since this will alter the display of
170	   characters in the file and might result in unintentional alteration
171	   of the file's semantics).  Information on grapheme clusters can be
172	   found in [UAX29].

174	   In some cases, the field-body contains spaces that are important to
175	   the data.  To accurately preserve whitespace in the document, an
176	   optional line-continuation character (backslash, %x5C) MAY be
177	   included to delimit and separate whitespace to be preserved from
178	   whitespace that will be removed by the processor.  The line-
179	   continuation character and any whitespace that follows it (including
180	   whitespace at the beginning of the continuing field-body on the next
181	   line) MUST be consumed by the processor when reading the file.
182	   Whitespace appearing before the line-continuation MUST NOT be
183	   consumed.  Use of the line continuation character makes the
184	   whitespace visible in the file.

186	   In other cases, the field-body might contain natural language text,
187	   and, while it is readily apparent that many languages use spaces to
188	   separate words, others, such as Japanese or Thai, do not.
189	   Implementations MAY, in the absence of line continuation characters,
190	   replace the continuation sequence (the line break and surrounding
191	   whitespace) in a folded line with a single ASCII space (%x20);
192	   however, implementations SHOULD simply remove the continuation
193	   sequence altogether, to avoid causing unnatural breaks in the text.

195	   Here are some examples:
196	   SomeField : This is some running text \
197	    that is continued on several lines \
198	    and which preserves spaces between \
199	    the words.
200	   %%
201	   AnotherExample: There are three spaces   \
202	    between 'spaces' and 'between' in this record.
203	   %%
204	   SwallowingExample: There are no spaces between \
205	          the numbers one and two in this example 1\
206	          2.
207	   %%

209	          Figure 3: Example of Folding with Preserved Whitespace
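   The rules for the line-continuation character and for plain folds
   can be combined in a single unfolding routine.  The Python sketch
   below is illustrative rather than normative: the name unfold and its
   joiner parameter are hypothetical, and it assumes that an odd number
   of trailing backslashes marks a continuation while an even number is
   a run of escaped backslashes.

```python
from typing import List


def unfold(lines: List[str], joiner: str = "") -> str:
    """Unfold one field, honouring the backslash line-continuation character.

    - If a line ends in an unescaped backslash (optionally followed by spaces
      or tabs), the backslash and what follows it are removed, but any
      whitespace before the backslash is preserved.
    - Otherwise the whitespace before the break is removed and `joiner` is
      inserted; implementations SHOULD insert nothing ("") and MAY insert a
      single space (" ").
    In both cases the indentation of the continuation line is removed.
    """
    result = lines[0]
    for cont in lines[1:]:
        if not cont[:1].isspace():
            raise ValueError(f"continuation must start with whitespace: {cont!r}")
        head = result.rstrip(" \t")
        # An odd run of trailing backslashes ends with a continuation
        # character; an even run is only escaped "\\" pairs.
        run = len(head) - len(head.rstrip("\\"))
        if run % 2 == 1:
            result = head[:-1] + cont.lstrip(" \t")
        else:
            result = head + joiner + cont.lstrip(" \t")
    return result
```

   Applied to SomeField and SwallowingExample in Figure 3, this yields
   "...preserves spaces between the words." and "...in this example
   12." respectively.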

211	   Note that entirely blank continuation lines are not permitted.  That
212	   is, this record is illegal, since the field-body of "SomeText" would
213	   be the empty string:

215	   %%
216	   SomeText:               \
217	                           \
218	                           \
219	   %%

221	                   Figure 4: Whitespace Folding Example

223	2.2.  Comments

225	   Comments MAY be included in the body of the record-jar document by
226	   placing them at the end of a separator line.  The comment MUST be
227	   separated by at least one space from the "%%" sequence that
228	   introduces the record separator.
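   A sketch of how a processor might extract the comment from a
   separator line, assuming Python and a hypothetical separator_comment
   function.  It enforces only the rule above that a space must follow
   "%%"; the encoding signature on the first line has to be recognized
   before this check is applied.

```python
from typing import Optional


def separator_comment(line: str) -> Optional[str]:
    """Return the comment carried by a record-separator line, or None.

    The comment must be separated from the leading "%%" by at least one
    space; any other text directly after "%%" is rejected.
    """
    if not line.startswith("%%"):
        raise ValueError(f"not a separator line: {line!r}")
    rest = line[2:]
    if rest == "":
        return None  # a bare separator
    if not rest.startswith(" "):
        raise ValueError(f"comment must follow '%%' after a space: {line!r}")
    return rest[1:]  # drop only the one required space
```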

230	   Multiple record separators (including comment lines) MAY appear
231	   between records.  Logically this appears to result in records that
232	   contain no fields: records containing no fields MUST be ignored by a
233	   processor.

235	   Folding of comments is not permitted; instead multiple comment lines
236	   MUST be used.  Comments cannot appear in the body of a record.  For
237	   example:
238	   %% this is a comment.
239	   Record: goes here
240	   %%
241	   %% here is another sequence of comments
242	   %% that appear on multiple lines
243	   Record: another record
244	   %% a final comment
245	   %%

247	                         Figure 5: Comment example

249	   Although comments are not associated with any particular record in
250	   the file, processors that preserve comments sometimes treat the
251	   comments as if they were associated with the record just following
252	   them.  Reserialization of a record-jar file would thus restore the
253	   comments to their logical position in the file.  In many cases,
254	   processing a record-jar file loses comment information associated
255	   with the file.
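   The convention of attaching comments to the following record can be
   sketched as below.  The function records_with_comments and the
   Record alias are hypothetical; comments that follow the last record
   are kept in a final entry with no fields so that they can survive
   reserialization.

```python
from typing import List, Tuple

Record = Tuple[List[str], List[str]]  # (comments, field lines)


def records_with_comments(text: str) -> List[Record]:
    """Group a record-jar document into (comments, field_lines) pairs.

    Comments are attached to the record that follows them, so a writer
    can put them back in place when reserializing.
    """
    out: List[Record] = []
    comments: List[str] = []
    fields: List[str] = []
    for line in text.splitlines():
        if line.startswith("%%"):
            if fields:  # this separator closes a record
                out.append((comments, fields))
                comments, fields = [], []
            if line.startswith("%% "):
                comments.append(line[3:])
        elif line.strip():
            fields.append(line)
    if fields or comments:
        out.append((comments, fields))
    return out
```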

257	2.3.  Characters, Encodings, and Escapes

259	   By default, a file containing a record-jar archive uses the UTF-8
260	   character encoding (see [RFC3629]).  If an application, protocol, or
261	   specification permits a character encoding other than UTF-8 to be
262	   used in the file, it SHOULD also support reading the character
263	   encoding from the encoding signature.

265	   The encoding signature, when present, MUST be the very first line of
266	   the file.  If the encoding signature is not present, an application
267	   or protocol MAY attempt to infer the character encoding using other
268	   means.  Record-jar files SHOULD always include an encoding signature,
269	   even if one is not required, whenever the application, protocol, or
270	   specification permits one.

272	   A file that uses the UTF-16 or UTF-32 encoding MAY also include a
273	   Byte Order Mark (U+FEFF) as the first sequence of two octets (in the
274	   case of UTF-16) or four octets (in the case of UTF-32) in the file,
275	   just preceding the encoding signature.

277	   Some applications, protocols, or specifications require that the
278	   record-jar file use some other, non-Unicode, legacy character
279	   encoding.  In particular, some applications, protocols, or
280	   specifications only support the US-ASCII character set ([ISO646]).

282	   Here is an example of the encoding signature for the UTF-8 encoding
283	   of Unicode:
284	   %%encoding:UTF-8

286	                Figure 6: Example of an Encoding Signature
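   The following Python sketch shows one possible way to find the
   encoding signature, including the optional Byte Order Mark for
   UTF-16 and UTF-32.  The function name detect_encoding is
   hypothetical.  Reading the first line as Latin-1 when no BOM is
   present is an assumption that only works for ASCII-compatible
   encodings, and the sketch requires a non-empty encoding name, which
   the ABNF does not.

```python
import codecs
import re

# Follows encodingSig in the ABNF, but requires a non-empty name.
SIGNATURE_RE = re.compile(r"%%encoding *: *([A-Za-z0-9_-]+)\r?$")

# Check the UTF-32 marks first: the UTF-32LE BOM begins with the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named by the signature line, else `default`."""
    reader, body = "latin-1", data  # latin-1 decodes any byte sequence
    for bom, name in BOMS:
        if data.startswith(bom):
            reader, body = name, data[len(bom):]
            break
    first_line = body.decode(reader, errors="replace").split("\n", 1)[0]
    match = SIGNATURE_RE.match(first_line)
    return match.group(1) if match else default
```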

288	   Printable ASCII characters other than backslash ("\") and ampersand
289	   ("&") are represented as themselves.

291	   Non-ASCII values MAY be included in a record-jar file in several
292	   ways.  For portability, the best mechanism is to use escape sequences
293	   in the field-body.  Exclusive use of escape sequences results in a
294	   pure ASCII text file.

296	   Non-ASCII characters MAY be represented using the character's Unicode
297	   value represented using the Numeric Character Reference format
298	   adapted from XML; the sequence "&#x" (%x26.23.78) is followed by the
299	   character's Unicode scalar value in hex followed directly by the
300	   semicolon character (";", %x3B).  Leading zeroes MAY be omitted down
301	   to two hex digits (see ESCAPE in the ABNF).  For example, the EURO
302	   SIGN is U+20AC and could be represented as "&#x20ac;".
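   A minimal sketch of converting between characters and these
   references, using the hypothetical helper names to_ncr and from_ncr.
   It covers only the numeric character references: literal "&" and
   "\" in the data need their own backslash escapes, described later in
   this section, and a real decoder should handle both kinds of escape
   in a single pass.

```python
import re

# "&#x" 2*6HEXDIG ";" as in the ESCAPE rule of the ABNF.
NCR_RE = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a "&#x...;" reference."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


def from_ncr(text: str) -> str:
    """Replace each "&#x...;" reference with the character it names.

    chr() raises ValueError for values above U+10FFFF.
    """
    return NCR_RE.sub(lambda m: chr(int(m.group(1), 16)), text)
```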

304	   Non-ASCII characters MAY also be represented as their associated
305	   octet sequence in the file's character encoding.  For example, the
306	   EURO SIGN would be represented as the octet sequence %xE2.82.AC,
307	   since those three bytes encode that character in UTF-8.

309	   The characters for carriage return, newline, and tab when considered
310	   as part of the data (and not the file format itself) are represented
311	   by the traditional escape sequences "\r" (%x5C.72), "\n" (%x5C.6E),
312	   and "\t" (%x5C.74) respectively.  The character backslash is
313	   represented by "\\" (%x5C.5C), while the ampersand character is
314	   represented by "\&" (%x5C.26).  A single backslash at the end of a
315	   line indicates continuation, as discussed in Section 2.1.  Otherwise
316	   a single backslash followed by some other character in the data is an
317	   error, although a record-jar processor MAY choose to interpret it as
318	   a backslash.
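   A decoder for all of the escapes in this section might look like the
   following sketch (the names unescape and lenient are hypothetical).
   It recognizes backslash escapes and numeric character references in
   one left-to-right pass, so an escaped ampersand cannot accidentally
   start a character reference; the lenient flag models the optional
   behaviour of treating a stray backslash as a literal.

```python
import re

# One pattern for both kinds of escape, so each is decoded exactly once:
# "\&#x41;" decodes to the literal text "&#x41;", not to "A".
ESCAPE_RE = re.compile(r"\\(.?)|&#x([0-9A-Fa-f]{2,6});", re.DOTALL)
SIMPLE_ESCAPES = {"\\": "\\", "&": "&", "r": "\r", "n": "\n", "t": "\t"}


def unescape(body: str, lenient: bool = False) -> str:
    """Decode the escapes in an (already unfolded) field-body.

    A backslash followed by any other character is an error unless
    `lenient` is set, in which case the backslash is kept as a literal.
    """
    def replace(match):
        if match.group(2) is not None:  # numeric character reference
            return chr(int(match.group(2), 16))
        ch = match.group(1)
        if ch in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[ch]
        if lenient:
            return "\\" + ch
        raise ValueError(f"invalid escape \\{ch} in {body!r}")

    return ESCAPE_RE.sub(replace, body)
```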

320	3.  Examples

322	   Here is the canonical example from [AOUP]:
323	   Planet: Mercury
324	   Orbital-Radius: 57,910,000 km
325	   Diameter: 4,880 km
326	   Mass: 3.30e23 kg
327	   %%
328	   Planet: Venus
329	   Orbital-Radius: 108,200,000 km
330	   Diameter: 12,103.6 km
331	   Mass: 4.869e24 kg
332	   %%
333	   Planet: Earth
334	   Orbital-Radius: 149,600,000 km
335	   Diameter: 12,756.3 km
336	   Mass: 5.972e24 kg
337	   Moons: Luna

339	   A more complete example, showing more of the format's features, can
340	   be found in [RFC4646].  The data shown here is taken from the
341	   Language Subtag Registry defined in that document:
342	   %%
343	   Type: language
344	   Subtag: ia
345	   Description: Interlingua (International Auxiliary Language \
346	     Association)
347	   Added: 2005-08-16
348	   %%
349	   Type: language
350	   Subtag: id
351	   Description: Indonesian
352	   Added: 2005-08-16
353	   Suppress-Script: Latn
354	   %%
355	   Type: language
356	   Subtag: nb
357	   Description: Norwegian Bokm&#xE5;l
358	   Added: 2005-08-16
359	   Suppress-Script: Latn
360	   %%

362	4.  References

364	4.1.  Normative References

366	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
367	              Requirement Levels", BCP 14, RFC 2119, March 1997.

369	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
370	              10646", STD 63, RFC 3629, November 2003.

372	   [RFC4234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
373	              Specifications: ABNF", RFC 4234, October 2005.

377	   [UAX31]    Davis, M., "Unicode Standard Annex #31: Identifier and
378	              Pattern Syntax", September 2006.

380	   [Unicode]  Unicode Consortium, "The Unicode Standard, Version 5.0",
381	              Addison-Wesley, Boston, MA, ISBN 0-321-49081-0, January
382	              2007.

384	4.2.  Informative References

386	   [AOUP]     Raymond, E., "The Art of Unix Programming", 2003,
387	              <urn:isbn:0-13-142901-9>.

389	   [ISO646]   International Organization for Standardization, "ISO/IEC
390	              646:1991, Information technology -- ISO 7-bit coded
391	              character set for information interchange.", 1991.

393	   [RFC4646]  Phillips, A., Ed. and M. Davis, Ed., "Tags for the
394	              Identification of Languages", BCP 47, RFC 4646, September
395	              2006, <http://www.ietf.org/rfc/rfc4646.txt>.

397	   [UAX29]    Davis, M., "Unicode Standard Annex #29: Text Boundaries",
398	              October 2006.

400	Appendix A.  Acknowledgements

402	   Thanks to Eric S. Raymond for his gracious permission to both
403	   reference and quote The Art of Unix Programming in this document.
404	   Without his work, this document would likely not exist.

406	   Contributors to this document include: Stephane Bortzmeyer, John
407	   Cowan, Frank Ellermann, Doug Ewell.

409	   The IETF LTRU working group adopted the record-jar format on John
410	   Cowan's suggestion.  That effort required record-jar to be
411	   documented, and many people in that group contributed to this work:
412	   the author thanks everyone who participated in that effort, even
413	   though their names cannot all be listed here.

415	Author's Address

417	   Addison Phillips (editor)
418	   Yahoo! Inc.

420	   Email: addison@inter-locale.com
421	   URI:   http://www.inter-locale.com

423	Full Copyright Statement

425	   Copyright (C) The IETF Trust (2008).

427	   This document is subject to the rights, licenses and restrictions
428	   contained in BCP 78, and except as set forth therein, the authors
429	   retain all their rights.

431	   This document and the information contained herein are provided on an
432	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
433	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
434	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
435	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
436	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
437	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

439	Intellectual Property

441	   The IETF takes no position regarding the validity or scope of any
442	   Intellectual Property Rights or other rights that might be claimed to
443	   pertain to the implementation or use of the technology described in
444	   this document or the extent to which any license under such rights
445	   might or might not be available; nor does it represent that it has
446	   made any independent effort to identify any such rights.  Information
447	   on the procedures with respect to rights in RFC documents can be
448	   found in BCP 78 and BCP 79.

450	   Copies of IPR disclosures made to the IETF Secretariat and any
451	   assurances of licenses to be made available, or the result of an
452	   attempt made to obtain a general license or permission for the use of
453	   such proprietary rights by implementers or users of this
454	   specification can be obtained from the IETF on-line IPR repository at
455	   http://www.ietf.org/ipr.

457	   The IETF invites any interested party to bring to its attention any
458	   copyrights, patents or patent applications, or other proprietary
459	   rights that may cover technology that may be required to implement
460	   this standard.  Please address the information to the IETF at
461	   ietf-ipr@ietf.org.

463	Acknowledgment

465	   Funding for the RFC Editor function is provided by the IETF
466	   Administrative Support Activity (IASA).