Network Working Group                                   A. Phillips, Ed.
Internet-Draft                                               Yahoo! Inc.
Expires: February 25, 2008                               August 24, 2007


                         The record-jar Format
                     draft-phillips-record-jar-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 25, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The record-jar format provides a method of storing multiple records
   with a variable repertoire of fields in a text format.  This document
   provides a description of the format.  Comments are solicited and
   should be addressed to the mailing list 'record-jar@yahoogroups.com'
   and/or the author.

Table of Contents

   1.  Introduction
   2.  Format and Grammar
     2.1.  Folding of Field Values
     2.2.  Comments
     2.3.  Characters, Encodings, and Escapes
   3.  Examples
   4.  References
     4.1.  Normative References
     4.2.  Informative References
   Appendix A.  Acknowledgements
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The record-jar format was originally described in The Art of Unix
   Programming [AOUP].  This format is useful for storing information in
   a human-readable text form while keeping the data available for
   machine processing.  It is a flexible format: it allows an arbitrary
   set of fields in any given record and can be used to store data of
   variable length and content.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Format and Grammar

   The record-jar format is described by the following ABNF ([RFC4234]):

     record-jar   = [encodingSig] [separator] *record
     record       = 1*field separator
     field        = ( field-name field-sep field-body CRLF )
     field-name   = *character
     field-sep    = *SP ":" *SP
     field-body   = *(continuation 1*character)
     continuation = ["\"] [[*SP CRLF] 1*SP]
     separator    = [blank-line] *("%%" [comment] CRLF)
     comment      = SP *69(character)
     character    = SP / ASCCHAR / UNICHAR / ESCAPE
     encodingSig  = "%%encoding" field-sep
                    *(ALPHA / DIGIT / "-" / "_") CRLF
     blank-line   = WSP CRLF

     ; ASCII characters except %x26 (&) and %x5C (\)
     ASCCHAR      = %x21-25 / %x27-5B / %x5D-7E
     UNICHAR      = %x80-10FFFF ; Unicode chars
     ESCAPE       = "\" ("\" / "&" / "r" / "n" / "t" )
                  / "&#x" 2*6HEXDIG ";"

                       Figure 1: record-jar ABNF

   The record-jar format consists of character data that forms a
   sequence of records.  Each record is separated from other records by
   at least one line beginning with the sequence "%%" (%x25.25).
   Records are made up of one or more fields, and a record MAY contain
   as many or as few fields as are needed to convey its data.  Empty
   records and blank lines are ignored.

   A field is a single logical line of characters from the Universal
   Character Set (Unicode) [Unicode], consisting of three parts: the
   field-name, the field separator, and the field-body.

   The field-name is an identifier.  Field-names SHOULD consist only of
   characters permitted in identifiers according to Unicode Standard
   Annex #31 (UAX#31) [UAX31] and SHOULD start only with a character
   that has the property ID_Start.  Often field-names are further
   restricted to a sequence of letters and digits from the US-ASCII
   character set [ISO646].
   A field-name SHOULD be treated as case sensitive and
   MUST NOT contain any spaces.  Upper and lowercase letters are often
   used to break up the name visually, for example using CamelCase.  It
   is a common convention that field-names begin with a capital letter,
   although this is not enforced.  The hyphen-minus character ("-",
   %x2D) MAY be used to separate parts of the name visually; however, it
   MUST NOT appear at the beginning or end of a field-name.

   The field separator (field-sep) is the colon character (":", %x3A).
   The separator MAY be surrounded on either side by any amount of
   horizontal whitespace (tab or space characters).  The normal
   convention is one space on each side.

   The field-body contains the data value.  Logically, the field-body
   consists of a single line of text using any combination of characters
   from the Universal Character Set, followed by a CRLF (newline).  The
   carriage return, newline, and tab characters, when they occur in the
   data value stored in the field-body, are represented by their common
   backslash escapes ("\r", "\n", and "\t" respectively).  See
   Section 2.3 for more information on escape sequences.

2.1.  Folding of Field Values

   Some protocols limit total line length.  For example, many Internet
   plain-text protocols limit lines to 72 total bytes.  To accommodate
   such limits, or for readability and presentation, the field-body
   portion of a field can be split into a multiple-line representation;
   this is called "folding".

   Successive lines in the same field-body begin with one or more
   whitespace characters.  When processing the record-jar format, the
   linear whitespace (including the newline and any preceding spaces)
   is consumed by the processor, and the two parts of the field-body are
   joined to form a single, logical line.
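
   The unfolding rule above can be sketched in Python as follows.  This
   is a hypothetical, non-normative sketch: the helper names "unfold"
   and "split_field" are illustrative, the input is assumed to be the
   physical lines of one record without line terminators, space and tab
   are both treated as horizontal whitespace (following the prose
   rather than the SP-only ABNF), and a field-name is assumed never to
   contain a colon.  It also applies the backslash line-continuation and
   the recommendation to join folded lines without inserting a space,
   both described later in this section.

```python
# Hypothetical helpers sketching record-jar unfolding and field splitting.
# Assumption: space and tab both count as horizontal whitespace.
WSP = " \t"


def unfold(lines):
    """Join the physical lines of one record into logical field lines.

    `lines` holds the record's physical lines without line terminators.
    """
    logical = []
    for line in lines:
        if not line.strip(WSP):
            continue  # blank lines are ignored
        if line[:1] in (" ", "\t") and logical:
            # A line starting with whitespace continues the previous one.
            # Consume trailing whitespace on the previous line, including
            # any whitespace that follows a continuation backslash.
            prev = logical[-1].rstrip(WSP)
            # An odd run of trailing backslashes ends in a line-continuation
            # mark; an even run is made of "\\" escapes and is kept as data.
            run = len(prev) - len(prev.rstrip("\\"))
            if run % 2 == 1:
                prev = prev[:-1]  # drop the mark, keep whitespace before it
            # Consume the continuation's leading whitespace and join the two
            # parts without inserting a space (the recommended behavior).
            logical[-1] = prev + line.lstrip(WSP)
        else:
            logical.append(line)
    return logical


def split_field(logical_line):
    """Split a logical line at its first colon into (field-name, field-body)."""
    name, sep, body = logical_line.partition(":")
    if not sep:
        raise ValueError(f"no field separator in {logical_line!r}")
    # The colon may be surrounded by any amount of horizontal whitespace.
    return name.rstrip(WSP), body.lstrip(WSP)


record = [
    "SwallowingExample: There are no spaces between \\",
    "    the numbers one and two in this example 1\\",
    "    2.",
]
for field_line in unfold(record):
    print(split_field(field_line))
```

   Running it prints the single logical field of the record, with the
   digits "1" and "2" joined and no space between them.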

   For example, the following field-body is folded across three lines:

     Eulers-Number : 2.718281828459045235360287471
        352662497757247093699959574966967627724076630353547
        5945713821785251664274274663919320030599218174135...

                     Figure 2: Example of Folding

   Note that imposing a line length limit effectively limits the length
   of the field-name, since the field separator MUST appear on the same
   line as the field-name and the field-name MUST NOT be folded.  Also,
   when imposing a line length limit, note that some encodings
   (including the Unicode encodings) can use a variable number of bytes
   per character or commonly use more than one byte per character.
   Folding MUST NOT split the byte sequence of a single character.
   Furthermore, folding SHOULD NOT be done just prior to a combining
   character, since this alters the display of characters in the file
   and might unintentionally change the file's semantics.

   In some cases, the field-body contains spaces that are important to
   the data.  To preserve such whitespace accurately, an optional
   line-continuation character (backslash, %x5C) MAY be included to
   separate whitespace that is to be preserved from whitespace that will
   be removed by the processor.  The line-continuation character and any
   whitespace that follows it (including whitespace at the beginning of
   the continuing field-body on the next line) MUST be consumed by the
   processor when reading the file.  Whitespace appearing before the
   line-continuation character MUST NOT be consumed.  Use of the
   line-continuation character also makes the preserved whitespace
   visible in the file.

   In other cases, the field-body might contain natural language text.
   While many languages use spaces to separate words, others, such as
   Japanese or Thai, do not.
   Implementations MAY, in the absence of line-continuation
   characters, replace the continuation sequence (the line break and
   surrounding whitespace) in a folded line with a single ASCII space
   (%x20); however, implementations SHOULD simply remove the
   continuation sequence altogether, to avoid introducing unnatural
   breaks in the text.

   Here are some examples:

     SomeField : This is some running text \
        that is continued on several lines \
        and which preserves spaces between \
        the words.
     %%
     AnotherExample: There are three spaces   \
        between 'spaces' and 'between' in this record.
     %%
     SwallowingExample: There are no spaces between \
        the numbers one and two in this example 1\
        2.
     %%

          Figure 3: Example of Folding with Preserved Whitespace

   Note that entirely blank continuation lines are not permitted.  For
   example, the following record is illegal, since the field-body of
   "SomeText" would be the empty string:

     %%
     SomeText: \
        \
        \
     %%

                   Figure 4: Whitespace Folding Example

2.2.  Comments

   Comments MAY be included in the body of a record-jar document by
   placing them at the end of a separator line.  The comment MUST be
   separated by at least one space from the "%%" sequence that
   introduces the separator.

   Multiple separators MAY appear between records.  Logically, this
   appears to produce records that contain no fields; records containing
   no fields MUST be ignored by a processor.

   Folding of comments is not permitted; instead, multiple comment lines
   MUST be used.  Comments cannot appear in the body of a record.  For
   example:

     %% this is a comment.
     Record: goes here
     %%
     %% here is another sequence of comments
     %% that appear on multiple lines
     Record: another record
     %% a final comment
     %%

                       Figure 5: Comment example

2.3.  Characters, Encodings, and Escapes

   By default, a file containing a record-jar archive uses the UTF-8
   character encoding (see [RFC3629]).  If an application, protocol, or
   specification permits an encoding other than UTF-8 to be used in the
   file, it SHOULD also support reading the encoding from the encoding
   signature.  The encoding signature, when present, MUST be the very
   first line of the file.  If the encoding signature is not present, an
   application or protocol MAY attempt to infer the encoding using other
   means.  Record-jar files SHOULD include an encoding signature, even
   if one is not required, whenever the application, protocol, or
   specification permits one.

   A file that uses the UTF-16 or UTF-32 encoding MAY also include a
   Byte Order Mark (U+FEFF) as the first sequence of two octets (in the
   case of UTF-16) or four octets (in the case of UTF-32) in the file,
   immediately preceding the encoding signature.

   Some applications, protocols, or specifications require that the
   record-jar file use some other, non-Unicode, legacy character set.
   In particular, some applications, protocols, or specifications only
   support the US-ASCII character set ([ISO646]).

   Here is an example of the encoding signature for the UTF-8 encoding
   of Unicode:

     %%encoding:UTF-8

                Figure 6: Example of an Encoding Signature

   Printable ASCII characters other than backslash ("\") and ampersand
   ("&") are represented as themselves.

   Non-ASCII values MAY be included in a record-jar file in several
   ways.  For portability, the best mechanism is to use escape sequences
   in the field-body.  Exclusive use of escape sequences results in a
   pure ASCII text file.
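
   A pure-ASCII field-body of this kind, and the reverse conversion, can
   be sketched in Python using the escape forms specified in the
   paragraphs that follow.  The function names "encode_body" and
   "decode_body" are hypothetical, and the sketch assumes field-bodies
   that have already been unfolded.  When writing, it escapes "\", "&",
   carriage return, newline, and tab with backslashes and turns every
   other character outside printable ASCII into a numeric character
   reference of at least two hex digits (as the ABNF requires); when
   reading, it accepts one to six hex digits, since leading zeroes MAY
   be omitted.  It rejects a backslash followed by any other character
   (a processor MAY instead keep it as a backslash) and leaves an
   ampersand that does not begin a well-formed reference unchanged.

```python
import re

# Backslash escapes for data characters, in both directions.
_ENCODE = {"\\": "\\\\", "&": "\\&", "\r": "\\r", "\n": "\\n", "\t": "\\t"}
_DECODE = {"\\": "\\", "&": "&", "r": "\r", "n": "\n", "t": "\t"}

# Either a backslash escape or a numeric character reference ("&#x20AC;").
_TOKEN = re.compile(r"\\(.)|&#x([0-9A-Fa-f]{1,6});", re.DOTALL)


def encode_body(text):
    """Escape a data value so that the resulting field-body is pure ASCII."""
    out = []
    for ch in text:
        if ch in _ENCODE:
            out.append(_ENCODE[ch])
        elif " " <= ch <= "~":
            out.append(ch)  # printable ASCII represents itself
        else:
            # Non-ASCII and remaining control characters become numeric
            # character references with at least two hex digits.
            out.append(f"&#x{ord(ch):02X};")
    return "".join(out)


def decode_body(body):
    """Replace the escape sequences in an unfolded field-body."""

    def replace(match):
        if match.group(1) is not None:
            ch = match.group(1)
            if ch not in _DECODE:
                raise ValueError(f"invalid escape sequence \\{ch}")
            return _DECODE[ch]
        code_point = int(match.group(2), 16)
        # Reject values that are not Unicode scalar values.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {code_point:X}")
        return chr(code_point)

    return _TOKEN.sub(replace, body)


print(encode_body("Norwegian Bokmål costs 5 €"))  # Norwegian Bokm&#xE5;l costs 5 &#x20AC;
```

   Passing the printed result to decode_body returns the original
   string.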

   Non-ASCII characters MAY be represented by the character's Unicode
   scalar value, using the Numeric Character Reference format adapted
   from XML: the sequence "&#x" (%x26.23.78) is followed by the
   character's Unicode scalar value in hexadecimal, followed directly by
   the semicolon character (";", %x3B).  Leading zeroes MAY be omitted.
   For example, the EURO SIGN is U+20AC and could be represented as
   "&#x20AC;".

   Non-ASCII characters MAY also be represented as their associated
   octet sequence in the file's character encoding.  For example, the
   EURO SIGN would be represented as the byte sequence %xE2.82.AC in
   UTF-8.

   The carriage return, newline, and tab characters, when they are part
   of the data (and not of the file format itself), are represented by
   the traditional escape sequences "\r" (%x5C.72), "\n" (%x5C.6E), and
   "\t" (%x5C.74) respectively.  The backslash character is represented
   by "\\" (%x5C.5C), and the ampersand character by "\&" (%x5C.26).  A
   single backslash at the end of a line indicates continuation, as
   discussed in Section 2.1.  Otherwise, a single backslash followed by
   any other character in the data is an error, although a record-jar
   processor MAY choose to interpret it as a backslash.

3.  Examples

   Here is the canonical example from [AOUP]:

     Planet: Mercury
     Orbital-Radius: 57,910,000 km
     Diameter: 4,880 km
     Mass: 3.30e23 kg
     %%
     Planet: Venus
     Orbital-Radius: 108,200,000 km
     Diameter: 12,103.6 km
     Mass: 4.869e24 kg
     %%
     Planet: Earth
     Orbital-Radius: 149,600,000 km
     Diameter: 12,756.3 km
     Mass: 5.972e24 kg
     Moons: Luna

   A more complete example, showing more of the format's features, is
   found in [RFC4646].  The data shown here is taken from
The data shown here is taken from 323 the Language Subtag Registry defined that document: 324 %% 325 Type: language 326 Subtag: ia 327 Description: Interlingua (International Auxiliary Language \ 328 Association) 329 Added: 2005-08-16 330 %% 331 Type: language 332 Subtag: id 333 Description: Indonesian 334 Added: 2005-08-16 335 Suppress-Script: Latn 336 %% 337 Type: language 338 Subtag: nb 339 Description: Norwegian Bokmål 340 Added: 2005-08-16 341 Suppress-Script: Latn 342 %% 344 4. References 346 4.1. Normative References 348 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 349 Requirement Levels", BCP 14, RFC 2119, March 1997. 351 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 352 10646", STD 63, RFC 3629, November 2003. 354 [RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 355 Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00 356 (work in progress), October 2005, 357 . 359 [UAX31] Davis, M., "Unicode Standard Annex #31: Identifier and 360 Pattern Syntax", 09 2006. 362 [Unicode] Unicode Consortium, "The Unicode Consortium. The Unicode 363 Standard, Version 5.0, (Boston, MA, Addison-Wesley, 2003. 364 ISBN 0-321-49081-0)", January 2007. 366 4.2. Informative References 368 [AOUP] Raymond, E., "The Art of Unix Programming", 2003, 369 . 371 [ISO646] International Organization for Standardization, "ISO/IEC 372 646:1991, Information technology -- ISO 7-bit coded 373 character set for information interchange.", 1991. 375 [RFC4646] Phillips, A., Ed. and M. Davis, Ed., "Tags for the 376 Identification of Languages", September 2006, 377 . 379 Appendix A. Acknowledgements 381 Thanks to Eris S. Raymond for his gracious permission to both 382 reference and quote The Art of Unix Programming in this document. 383 Without his work, this document would likely not exist. 385 Contributors to this document include: Stephane Bortzmeyer, John 386 Cowan, Frank Ellerman, Doug Ewell. 

   The IETF LTRU working group adopted the record-jar format at John
   Cowan's suggestion.  That effort required record-jar to be
   documented, and many people in that group contributed to this work:
   the author thanks everyone who participated in that effort, even
   though they cannot all be named here.

Author's Address

   Addison Phillips (editor)
   Yahoo! Inc.

   Email: addison@inter-locale.com
   URI:   http://www.inter-locale.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).