Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Standards Track                               M. Kerwin
Expires: April 30, 2015                                 October 27, 2014

            The Archive Primary Media Type for File Archives
                 draft-seantek-kerwin-arcmedia-type-00

Abstract

   This document defines a new primary content-type to be known as
   "archive", which defines a fundamental type of content with unique
   presentational, hardware, and processing aspects.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 30, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Overview
     1.2. Notational Conventions
   2. Definition of an archive
   3. Consultation Mechanisms
   4. Encoding and Transport
   5. Common Required and Optional Parameters
   6. Split Archives
   7. Fragment Identifier Syntax
   8. Piped-Composite Type Suffix Syntax
   9. Security Considerations
   10. Normative References
   Appendix A. Expected Subtypes

1. Introduction

   The purpose of this memo is to propose an update to [RFC2045] to
   include a new primary content-type to be known as "archive".
   [RFC2045] describes mechanisms for specifying and describing the
   format of Internet Message Bodies via content-type/subtype pairs.
   "archive" defines a fundamental type of content with unique
   presentational, hardware, and processing aspects. Various subtypes
   of this primary type are immediately anticipated, and will be covered
   under separate documents.

1.1. Overview

   This document will outline what an archive is, show examples of
   archives, and discuss the benefits of grouping archives together.

   This document is a discussion document for an agreed definition,
   intended eventually to form a standard accepted extension to
   [RFC2045].

1.2. Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

2. Definition of an archive

   An archive primary media type identifies data that represents one or
   more files [FILE] along with metadata. Archives are used to collect
   multiple data files together into a single file for easier
   portability and storage. Archive formats can provide many optional
   services, including:

   1. compression
   2. encryption
   3. authentication
   4. backup
   5. filesystem imaging
   6. software packaging and distribution
   7. volume-splitting (archive split into multiple contents)
   8. block storage

   Formats and techniques that perform one or more of these services
   already exist under separate registrations. For example, the
   Content-Encoding header can be used to compress Internet message
   content. The distinguishing feature of the archive primary type is
   that these services are integrated into the format itself, along with
   the inclusion of file-specific metadata. Virtually all formats
   contemplated under this primary type are designed to concatenate
   multiple files into a single data stream, along with filenames and
   other metadata. When an Internet-facing application handles content
   labeled with this type, it SHOULD provide handling consistent with
   the archive as a discrete data item. For example, an Internet mail
   user agent would display an archive-labeled type with an archive
   icon, possibly with a preview of the files contained therein (as
   opposed to automatically traversing its contents, as it would for
   multipart-labeled content).

   Common operations include creating an archive, identifying files in
   an archive, adding to an archive, backing up to an archive,
   extracting an archive, restoring from an archive, deleting from an
   archive, mounting and unmounting an archive, [[TODO: executing an
   archive?]], and installing and uninstalling an archive.

   * Creating: taking files from a filesystem and representing those
        files in an archive.

   * Identifying files: parsing an archive's format, extracting
        information about files represented in the archive.

   * Adding: parsing an archive's format, adding files or non-file data
        to the archive. In virtually all cases, at least some part of
        the archive's content will be modified (though perhaps only at
        the end). Unlike, for instance, text media types, concatenating
        two separate archive contents *never* yields a valid composite
        archive.

   * Backing up: taking some or all of a filesystem and representing the
        filesystem in an archive, with the express intention of
        recording the files as they exist in a source filesystem at the
        time of backing up. For example, the compression, encryption,
        and access control list (permissions) properties of the files
        would be preserved.

   * Extracting: parsing an archive's format, copying file data (or file
        metadata) out of the archive into one or more files on a
        destination filesystem. This operation implies that at least
        some file metadata will be preserved, while other file metadata
        may be adjusted or added to adapt to the local environment.

   * Restoring: parsing an archive's format, copying file data out of
        the archive into the destination filesystem, with the express
        intention of recreating the files as they existed in a source
        filesystem at the time of backing up. For example, the
        compression, encryption, and access control list (permissions)
        properties of the files would be preserved.

   * Deleting: parsing an archive's format, removing file data (or
        metadata) from the archive, requiring changes to the archive's
        contents. Some archive formats permit orphan data in the archive
        content; other formats require re-serializing some or all of the
        archive.

   * Mounting and unmounting: Mapping an archive's semantics directly to
        a filesystem, so that the files represented in the archive can
        be accessed using the filesystem's namespace with typical
        filesystem APIs. Rather than being backed by a physical block
        storage device, that part of the filesystem is backed by the
        archive.

   * Executing [[NB: this may be controversial; it is worth
        discussing]]: Identifying executable semantics of an archive,
        and causing code to execute.

   * Installing and uninstalling [[NB: this may be controversial; it is
        worth discussing]]: Treating the archive as a software package,
        extracting certain contents in the archive and executing other
        contents in the archive, according to some software packaging
        protocol.

3. Consultation Mechanisms

   Before proposing a subtype for the archive/* primary type, it is
   suggested that the subtype author examine the definition (above) of
   what an archive/* is and the listing (below) of what an archive/* is
   not. Additional consultation with the authors of the existing
   archive/* subtypes is also suggested.

4. Encoding and Transport

   Unrecognized subtypes of archive SHOULD at a minimum be treated as
   "archive/file". Like "application/octet-stream", the purpose of the
   "archive/file" is to provide default handling; it does not represent
   a particular archive format. Implementations SHOULD pass subtypes of
   archive that they do not specifically recognize to a robust
   general-purpose archive viewing application, if such an application
   is available.

   If default archive (archive/file) handling is not supported, it is
   appropriate to treat the archive like "application/octet-stream".

   Unless noted in the subtype registration, subtypes of archive SHALL
   be assumed to contain binary data, implying a content encoding of
   base64 for email and binary transfer for FTP and HTTP.
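
   The fallback order described in this section for unrecognized
   subtypes can be illustrated with a short, non-normative sketch. It
   assumes a receiving application that keeps a table of handlers keyed
   by media type. The function name resolve_handler, the table, and the
   subtype label "archive/zip" are hypothetical and chosen only for
   illustration; "archive" and "archive/file" are the names proposed in
   this document and are not registered media types.

```python
def resolve_handler(content_type, handlers):
    """Return the key of the handler to use for content_type, or None.

    Sketch of the fallback order in this section (names are illustrative):
      1. the exact archive subtype, if the application recognizes it;
      2. "archive/file" as the default archive handling;
      3. "application/octet-stream" if no archive handling is available.
    """
    # Drop any parameters (";name=value") and normalize case, since media
    # type names are matched case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    main_type = media_type.partition("/")[0]

    candidates = [media_type]
    if main_type == "archive":
        # Unrecognized archive subtypes fall back to default archive handling.
        candidates.append("archive/file")
    # Last resort: treat the content as opaque binary data.
    candidates.append("application/octet-stream")

    for candidate in candidates:
        if candidate in handlers:
            return candidate
    return None


# Hypothetical handler table; the values stand in for real viewer objects.
example_handlers = {
    "archive/zip": "zip viewer",
    "archive/file": "general-purpose archive viewer",
    "application/octet-stream": "save to disk",
}
```

   With this table, a body labeled with an archive subtype that the
   application does not recognize resolves to "archive/file"; if the
   table held only "application/octet-stream", the same label would
   resolve to that entry instead.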

   The formal syntax for the subtypes of the archive primary type SHOULD
   look like this:

   Type name:

      archive

   Subtype name:

      xxxxxxxx

   Required parameters:

      none

   Optional parameters:

      TBD

   Encoding considerations:

      base64 encoding is recommended when transmitting archive/*
      documents through MIME electronic mail.

   Security considerations:

      see Section 9 below

   Interoperability considerations:

      TBD

   Published specification:

      TBD

   Applications that use this media type:

      TBD

   Fragment identifier considerations:

      The considerations of this document, plus any extra syntaxes
      not inconsistent with this document.

   Additional information:

      Deprecated alias names for this type:
         (Include non-archive alias names,
         such as those in application.)
      Magic number(s): TBD
      File extension(s): TBD
      Macintosh file type code(s): TBD

      See Appendix A for references to some of the expected subtypes.

   Person and email address to contact for further information:

      TBD

   Intended usage: TBD (COMMON will be the most common)

   Restrictions on usage: TBD

   Author: TBD

   Change controller: TBD

   Provisional registration? (standards tree only): (Yes/No)

   (Any other information that the author deems interesting may be
   added below this line.)

   The optional parameters consist of starting conditions and variable
   values used as part of the subtypes.

5. Common Required and Optional Parameters

   Unlike the text primary media type (for instance), virtually all
   archive formats have been designed with almost all of the information
   required for interpretation contained within the format. Therefore,
   parameters are NOT RECOMMENDED; registrants are not expected to
   register additional parameters.

   Regrettably, not all archive formats are as "universal" or "complete"
   as one might assume at first glance. This is because some archive
   formats are very old or are based on older formats where backwards-
   compatibility was a design goal; thus they were not designed with
   transport across the Internet in mind. The ZIP file is an example:
   although the modern ZIP supports Unicode [CITE], the default encoding
   of ZIP filenames has always been Code Page 437. Since "archive"
   contents are literally archives of computing history, sometimes
   communicating the archive as-is, rather than updating the archive to
   a more universal format, is necessary.

   Implementations that are archive-type aware MUST support the
   following parameters for maximum compatibility. At the same time, new
   archives SHOULD NOT rely on these parameters for disambiguation; new
   archives SHOULD be created in such a way that "universal"
   interoperability is achieved with the archive's self-contained
   information. [[TODO: code page--it's like charset but only applies to
   certain strings in the archive, when the archive format is ambiguous;
   do NOT attempt to apply this parameter as one would apply charset to
   text/*. Endian-ness? Time/Y2K representation issues? Anything else?]]

6. Split Archives

   Several archive formats (notably RAR and ZIP) support split archives.
   A "split archive" is an archive that is stored across multiple files
   or, more generally, across multiple storage media.

   The ZIP format, for example, actually has two types of splits: "split
   archive" and "spanned archive". A "split archive" is a standard ZIP
   archive split over multiple files with the file extensions .z01,
   .z02, etc.; the .zip file is the last file. A "spanned archive" is
   the original format designed for use with swapping floppy disks. All
   archive files have the same filename; the format uses volume labels
   (presumably on floppy disks) to store disk numbers. Neither
   sub-format is merely a naive division of the octet stream: each ZIP
   file is parseable in its own right, and contains its own offset
   values.

   The TAR format (or family of formats, including cpio and ustar) was
   originally designed for streaming to and from tape devices, so
   splitting is accomplished differently.

   [[TODO: Consider how to label this content. archive/zip^01?
   archive/zip; split=01? Something else? How shall 01 be associated
   with 02, 03, etc., when the Content-Disposition: ; filename=""
   parameter is "presentation-information" and may be separated from the
   Content-Type header information?]]

7. Fragment Identifier Syntax

   Because all archives represent files, archives can serve as virtual
   filesystems. Respondents have noted that an archive's files can be
   addressed by a fragment syntax that resembles a filesystem path. At
   the same time, archives may record files in different ways (along
   with different types of metadata), suggesting that a common baseline
   with flexible extension points is more appropriate than a fixed
   universal syntax. [[TODO: This will be explored in future drafts.
   Note the similarities with this and the file: URI...]]

   [[TODO: consider how to provide a fragment for content in the
   archive. NB: most archives do NOT provide Content-Type/media type
   information! So /foo.html being an HTML file is just an *assumption*,
   and possibly a very wrong one at that. There is no IETF registry for
   file extensions.]]

8. Piped-Composite Type Suffix Syntax

   [[TODO: discuss tar piped through bzip2, gzip, etc. as a distinct
   file format, rather than an application of the Content-Encoding:
   header. Suggest common suffix like archive/tar|bzip2, where | is some
   useful character but not + since + is for structured syntaxes.]]

9. Security Considerations

   Archives represent files, file metadata, and filesystems; thus,
   security issues loom large because archives can contain just about
   anything. These concerns are magnified by the arbitrary transport of
   such data across the Internet. [[TODO: complete.]]

10. Normative References

   [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
             Extensions (MIME) Part One: Format of Internet Message
             Bodies", RFC 2045, November 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
             Specifications and Registration Procedures", BCP 13, RFC
             6838, January 2013.

Appendix A. Expected Subtypes

   The following archive formats will be explored for registration as
   subtypes along with this effort:

   Archiving Only

      TAR

   Multipurpose (archiving, compression, encryption)

      ZIP, ACE, RAR, 7-Zip, StuffIt, FreeArc

   Software Packaging

      MSI, RPM, JAR, XPI, CAB, CRX, APK

   Disk Imaging

      ISO, NRG, BIN/CUE, VMDK, WIM, PartImage, IMG/IMA/IMZ, DMG

Authors' Addresses

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/

   Matthew Kerwin

   EMail: matthew@kerwin.net.au
   URI:   http://matthew.kerwin.net.au/