idnits 2.17.1 

draft-seantek-kerwin-arcmedia-type-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (October 27, 2014) is 3469 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'FILE' is mentioned on line 93, but not defined

  == Missing Reference: 'CITE' is mentioned on line 299, but not defined

  == Unused Reference: 'RFC6838' is defined on line 382, but no explicit
     reference was found in the text


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         S. Leonard
3	Internet-Draft                                             Penango, Inc.
4	Intended Status: Standards Track                               M. Kerwin
5	Expires: April 30, 2015                                 October 27, 2014

7	            The Archive Primary Media Type for File Archives
8	                 draft-seantek-kerwin-arcmedia-type-00

10	Abstract

12	   This document defines a new primary content-type to be known as
13	   "archive", which defines a fundamental type of content with unique
14	   presentational, hardware, and processing aspects.

16	Status of This Memo

18	   This Internet-Draft is submitted in full conformance with the
19	   provisions of BCP 78 and BCP 79.

21	   Internet-Drafts are working documents of the Internet Engineering
22	   Task Force (IETF).  Note that other groups may also distribute
23	   working documents as Internet-Drafts.  The list of current Internet-
24	   Drafts is at http://datatracker.ietf.org/drafts/current/.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time.  It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   This Internet-Draft will expire on April 30, 2015.

33	Copyright Notice

35	   Copyright (c) 2014 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.  Code Components extracted from this document must
44	   include Simplified BSD License text as described in Section 4.e of
45	   the Trust Legal Provisions and are provided without warranty as
46	   described in the Simplified BSD License.

48	Table of Contents

50	   1.  Introduction
51	     1.1.  Overview
52	     1.2.  Notational Conventions
53	   2.  Definition of an archive
54	   3.  Consultation Mechanisms
55	   4.  Encoding and Transport
56	   5.  Common Required and Optional Parameters
57	   6.  Split Archives
58	   7.  Fragment Identifier Syntax
59	   8.  Piped-Composite Type Suffix Syntax
60	   9.  Security Considerations
61	   10. Normative References
62	   Appendix A.  Expected Subtypes

64	1.  Introduction

66	   The purpose of this memo is to propose an update to [RFC2045] to
67	   include a new primary content-type to be known as "archive".
68	   [RFC2045] describes mechanisms for specifying and describing the
69	   format of Internet Message Bodies via content-type/subtype pairs.
70	   "archive" defines a fundamental type of content with unique
71	   presentational, hardware, and processing aspects.  Various subtypes
72	   of this primary type are immediately anticipated, and will be covered
73	   under separate documents.

75	1.1.  Overview

77	   This document will outline what an archive is, show examples of
78	   archives, and discuss the benefits of grouping archives together.

80	   This document is a discussion document working toward an agreed
81	   definition, intended eventually to form an accepted standard
82	   extension to [RFC2045].

84	1.2.  Notational Conventions

86	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
87	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
88	   document are to be interpreted as described in [RFC2119].

90	2.  Definition of an archive

92	   An archive primary media type identifies data that represents one or
93	   more files [FILE] along with metadata.  Archives are used to collect
94	   multiple data files together into a single file for easier
95	   portability and storage. Archive formats can provide many optional
96	   services, including:

98	   1. compression
99	   2. encryption
100	   3. authentication
101	   4. backup
102	   5. filesystem imaging
103	   6. software packaging and distribution
104	   7. volume-splitting (archive split into multiple contents)
105	   8. block storage

107	   Formats and techniques that perform one or more of these services
108	   already exist under separate registrations. For example, the Content-
109	   Encoding header can be used to compress Internet message content. The
110	   distinguishing feature of the archive primary type is that these
111	   services are integrated into the format itself, along with the
112	   inclusion of file-specific metadata. Virtually all formats
113	   contemplated under this primary type are designed to concatenate
114	   multiple files into a single data stream, along with filenames and
115	   other metadata. When an Internet-facing application handles content
116	   labeled with this type, it SHOULD provide handling consistent with
117	   the archive as a discrete data item. For example, an Internet mail
118	   user agent would display an archive-labeled type with an archive
119	   icon, possibly with a preview of the files contained therein (as
120	   opposed to automatically traversing its contents, as it would for
121	   multipart-labeled content).

123	   Common operations include creating an archive, identifying files in
124	   an archive, adding to an archive, backing up to an archive,
125	   extracting an archive, restoring from an archive, deleting from an
126	   archive, mounting and unmounting an archive, [[TODO: executing an
127	   archive?]], and installing and uninstalling an archive.

129	   * Creating: taking files from a filesystem and representing those
130	        files in an archive.

132	   * Identifying files: parsing an archive's format, extracting
133	        information about files represented in the archive.

135	   * Adding: parsing an archive's format, adding files or non-file data
136	        to the archive. In virtually all cases, at least some part of
137	        the archive's content will be modified (though perhaps only at
138	        the end). Unlike, for instance, text media types, concatenating
139	        two separate archive contents generally does not yield a valid
140	        composite archive.

142	   * Backing up: taking some or all of a filesystem and representing the
143	        filesystem in an archive, with the express intention of
144	        recording the files as they exist in a source filesystem at the
145	        time of backing up. For example, the compression, encryption,
146	        and access control list (permissions) properties of the files
147	        would be preserved.

149	   * Extracting: parsing an archive's format, copying file data (or file
150	        metadata) out of the archive into one or more files on a
151	        destination filesystem. This operation implies that at least
152	        some file metadata will be preserved, while other file metadata
153	        may be adjusted or added to adapt to the local environment.

155	   * Restoring: parsing an archive's format, copying file data out of
156	        the archive into the destination filesystem, with the express
157	        intention of recreating the files as they existed in a source
158	        filesystem at the time of backing up. For example, the
159	        compression, encryption, and access control list (permissions)
160	        properties of the files would be preserved.

162	   * Deleting: parsing an archive's format, removing file data (or
163	        metadata) from the archive, requiring changes to the archive's
164	        contents. Some archive formats permit orphan data in the archive
165	        content; other formats require re-serializing some or all of the
166	        archive.

168	   * Mounting and unmounting: Mapping an archive's semantics directly to
169	        a filesystem, so that the files represented in the archive can
170	        be accessed using the filesystem's namespace with typical
171	        filesystem APIs. Rather than being backed by a physical block
172	        storage device, that part of the filesystem is backed by the
173	        archive.

175	   * Executing [[NB: this may be controversial; it is worth
176	        discussing]]: Identifying executable semantics of an archive,
177	        and causing code to execute.

179	   * Installing and uninstalling [[NB: this may be controversial; it is
180	        worth discussing]]: Treating the archive as a software package,
181	        extracting certain contents in the archive and executing other
182	        contents in the archive, according to some software packaging
183	        protocol.
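
   As a minimal sketch of several of the operations above (creating,
   identifying files, adding, extracting, and deleting), the following
   uses Python's standard zipfile module on in-memory data. The helper
   names are illustrative assumptions and do not appear in this
   document; ZIP is used only because it is one of the expected
   subtypes. Deleting is shown as a full re-serialization, which is one
   of the two behaviors described above.

```python
import io
import zipfile


def create_archive(files):
    """Creating: represent a mapping of name -> bytes as a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def list_files(archive):
    """Identifying files: parse the format, report names and sizes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def add_file(archive, name, data):
    """Adding: append a member; the directory at the end is rewritten."""
    buf = io.BytesIO(archive)
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def extract_file(archive, name):
    """Extracting: copy one member's file data out of the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)


def delete_file(archive, name):
    """Deleting: re-serialize the archive without the named member."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src:
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():
                if info.filename != name:
                    dst.writestr(info, src.read(info.filename))
    return out.getvalue()


archive = create_archive({"a.txt": b"alpha", "b.txt": b"beta"})
archive = add_file(archive, "c.txt", b"gamma")
print(list_files(archive))
print(extract_file(archive, "b.txt"))
print(list_files(delete_file(archive, "a.txt")))
```

   Backing up, restoring, mounting, executing, and installing are left
   out of the sketch because they depend on filesystem metadata and
   platform behavior that an in-memory example cannot show faithfully.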

185	3.  Consultation Mechanisms

187	   Before proposing a subtype for the archive/* primary type, it is
188	   suggested that the subtype author examine the definition (above) of
189	   what an archive/* is and the listing (below) of what an archive/* is
190	   not.  Additional consultation with the authors of the existing
191	   archive/* subtypes is also suggested.

193	4.  Encoding and Transport

195	   Unrecognized subtypes of archive SHOULD at a minimum be treated as
196	   "archive/file".  Like "application/octet-stream", the purpose of the
197	   "archive/file" is to provide default handling; it does not represent
198	   a particular archive format. Implementations SHOULD pass subtypes of
199	   archive that they do not specifically recognize to a robust
200	   general-purpose archive viewing application, if such an application
201	   is available.

203	   If default archive (archive/file) handling is not supported, it is
204	   appropriate to treat the archive like "application/octet-stream".
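
   The fallback order described above can be sketched as follows. The
   function name and the idea of a set of supported media types are
   assumptions made for illustration; the sketch only shows the order
   of preference (specific subtype, then "archive/file", then
   "application/octet-stream").

```python
def select_handler(media_type, supported):
    """Pick the media type whose handler should process the content.

    `supported` is a set of lowercase media type strings that a
    hypothetical application knows how to handle.
    """
    # Drop any parameters and normalize case before matching.
    essence = media_type.split(";", 1)[0].strip().lower()
    primary = essence.partition("/")[0]

    if essence in supported:
        return essence
    # Unrecognized archive subtypes fall back to default archive handling.
    if primary == "archive" and "archive/file" in supported:
        return "archive/file"
    # No archive handling at all: treat as opaque binary data.
    return "application/octet-stream"


print(select_handler("archive/zip", {"archive/zip", "archive/file"}))
print(select_handler("archive/x-unknown", {"archive/file"}))
print(select_handler("archive/x-unknown", set()))
```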

206	   Unless noted in the subtype registration, subtypes of archive SHALL
207	   be assumed to contain binary data, implying a content transfer
208	   encoding of base64 for email and binary transfer for FTP and HTTP.
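
   As an illustration of the email case, the sketch below attaches
   binary data labeled with a hypothetical "archive/zip" subtype using
   Python's standard email package, which applies base64 to binary
   attachments. The subtype is not registered and the filename is
   made up for the example.

```python
from email.message import EmailMessage

# A minimal, empty ZIP archive: a 22-octet end-of-central-directory record.
EMPTY_ZIP = b"PK\x05\x06" + b"\x00" * 18

msg = EmailMessage()
msg["Subject"] = "archive example"
msg.set_content("See attached.")

# maintype/subtype are passed through as given; "archive" is the
# primary type proposed in this document, not a registered one.
msg.add_attachment(EMPTY_ZIP, maintype="archive", subtype="zip",
                   filename="empty.zip")

part = next(msg.iter_attachments())
print(part.get_content_type())
print(part["Content-Transfer-Encoding"])

# get_payload(decode=True) undoes the transfer encoding.
assert part.get_payload(decode=True) == EMPTY_ZIP
```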

210	   The formal syntax for the subtypes of the archive primary type SHOULD
211	   look like this:

213	      Type name:

215	         archive

217	      Subtype name:

219	         xxxxxxxx

221	      Required parameters:

223	         none

225	      Optional parameters:

227	         TBD

229	      Encoding considerations:

231	         base64 encoding is recommended when transmitting archive/*
232	         documents through MIME electronic mail.

234	      Security considerations:

236	         see Section 9 below

238	      Interoperability considerations:

240	         TBD

242	      Published specification:

244	         TBD

246	      Applications that use this media type:

248	         TBD

250	      Fragment identifier considerations:

252	         The considerations of this document, plus any extra syntaxes
253	         not inconsistent with this document.

255	      Additional information:

257	         Deprecated alias names for this type:
258	            (Include non-archive alias names,
259	             such as those in application.)
260	         Magic number(s): TBD
261	         File extension(s): TBD
262	         Macintosh file type code(s): TBD

264	      See Appendix A for references to some of the expected subtypes.

266	      Person and email address to contact for further information:

268	         TBD

270	      Intended usage: TBD (COMMON will be the most common)

272	      Restrictions on usage: TBD

274	      Author: TBD

276	      Change controller: TBD

278	      Provisional registration? (standards tree only): (Yes/No)

280	      (Any other information that the author deems interesting may be
281	      added below this line.)

283	   The optional parameters consist of starting conditions and variable
284	   values used as part of the subtypes.

286	5.  Common Required and Optional Parameters

288	   Unlike the text primary media type (for instance), virtually all
289	   archive formats have been designed with almost all of the information
290	   required for interpretation contained within the format. Therefore,
291	   parameters are NOT RECOMMENDED; registrants are not expected to
292	   register additional parameters.

294	   Regrettably, not all archive formats are as "universal" or "complete"
295	   as one might assume at first glance. This is because some archive
296	   formats are very old or are based on older formats where backwards-
297	   compatibility was a design goal; thus they were not designed with
298	   transport across the Internet in mind. The ZIP file is an example:
299	   although the modern ZIP supports Unicode [CITE], the default encoding
300	   of ZIP filenames has always been Code Page 437. Since "archive"
301	   contents are literally archives of computing history, sometimes
302	   communicating the archive as-is, rather than updating the archive to
303	   a more universal format, is necessary.
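
   The ZIP filename ambiguity can be sketched as follows. In the ZIP
   format, bit 11 of the general purpose bit flag marks a filename as
   UTF-8; when it is clear, the name is conventionally read as Code
   Page 437. The function name below is an assumption for
   illustration.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11: filename is UTF-8


def decode_zip_filename(raw_name, flag_bits):
    """Decode a ZIP member name according to its general purpose flags."""
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    # Historical default when the UTF-8 flag is not set.
    return raw_name.decode("cp437")


legacy = decode_zip_filename(b"caf\x82.txt", 0)
modern = decode_zip_filename("caf\u00e9.txt".encode("utf-8"), UTF8_FLAG)
# UTF-8 octets read without the flag are misinterpreted as CP437.
garbled = decode_zip_filename("caf\u00e9.txt".encode("utf-8"), 0)
print(legacy, modern, garbled)
```

   The same octets yield different names depending on the flag, which
   is why an out-of-band hint can matter for archives that must be
   communicated as-is.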

305	   Implementations that are archive-type aware MUST support the
306	   following parameters for maximum compatibility. At the same time, new
307	   archives SHOULD NOT rely on these parameters for disambiguation; new
308	   archives SHOULD be created in such a way that "universal"
309	   interoperability is achieved with the archive's self-contained
310	   information. [[TODO: code page--it's like charset but only applies to
311	   certain strings in the archive, when the archive format is ambiguous;
312	   do NOT attempt to apply this parameter as one would apply charset to
313	   text/*. Endian-ness? Time/Y2K representation issues? Anything else?]]

315	6.  Split Archives

317	   Several archive formats (notably RAR and ZIP) support split archives.
318	   A "split archive" is an archive that is stored across multiple
319	   files or, more generally, across multiple storage media (for
320	   example, a set of floppy disks).

322	   The ZIP format, for example, actually has two types of splits: "split
323	   archive" and "spanned archive". A "split archive" is a standard ZIP
324	   archive split over multiple files with the file extensions .z01,
325	   .z02, etc.; the .zip file is the last file. A "spanned archive" is
326	   the original format designed for use with swapping floppy disks. All
327	   archive files have the same filename; the format uses volume labels
328	   (presumably on floppy disks) to store disk numbers. Neither sub-
329	   format is merely a naive division of the octet stream: each ZIP file
330	   is parseable in its own right, and contains its own offset values.

332	   The TAR format (or family of formats, including cpio and ustar) was
333	   originally designed for streaming to and from tape devices, so
334	   splitting is accomplished differently.

336	   [[TODO: Consider how to label this content. archive/zip^01?
337	   archive/zip; split=01? Something else? How shall 01 be associated
338	   with 02, 03, etc., when the Content-Disposition: ; filename=""
339	   parameter is "presentation-information" and may be separated from the
340	   Content-Type header information?]]

342	7.  Fragment Identifier Syntax

344	   Because all archives represent files, archives can serve as virtual
345	   filesystems. Respondents have noted that an archive's files can be
346	   addressed by a fragment syntax that resembles a filesystem path. At
347	   the same time, archives may record files in different ways (along
348	   with different types of metadata), suggesting that a common baseline
349	   with flexible extension points is more appropriate than a fixed
350	   universal syntax. [[TODO: This will be explored in future drafts.
351	   Note the similarities with this and the file: URI...]]

353	   [[TODO: consider how to provide a fragment for content in the
354	   archive. NB: most archives do NOT provide Content-Type/media type
355	   information! So /foo.html being an HTML file is just an *assumption*,
356	   and possibly a very wrong one at that. There is no IETF registry for
357	   file extensions.]]
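
   As a purely hypothetical sketch of the idea above, the following
   treats a fragment such as "#/docs/foo.html" as a path to a member
   of a ZIP archive. The fragment syntax is an assumption, since this
   document does not yet define one. The media type returned is only
   a guess from the file extension, which reflects the caveat that
   most archives carry no media type information.

```python
import io
import mimetypes
import zipfile


def resolve_fragment(archive, fragment):
    """Return (data, guessed_media_type) for a path-like fragment.

    The '#/path' form is hypothetical. A KeyError is raised if the
    archive has no such member.
    """
    member = fragment.lstrip("#").lstrip("/")
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        data = zf.read(member)
    # Extension-based guess only; the archive itself does not say.
    guessed_type, _encoding = mimetypes.guess_type(member)
    return data, guessed_type


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/foo.html", b"<p>hello</p>")
sample = buf.getvalue()

data, guessed = resolve_fragment(sample, "#/docs/foo.html")
print(data, guessed)
```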

359	8.  Piped-Composite Type Suffix Syntax

361	   [[TODO: discuss tar piped through bzip2, gzip, etc. as a distinct
362	   file format, rather than an application of the Content-Encoding:
363	   header. Suggest common suffix like archive/tar|bzip2, where | is some
364	   useful character but not + since + is for structured syntaxes.]]
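
   To make the layering concrete, the sketch below builds a tar
   stream piped through gzip and then peels the layers apart by
   inspecting well-known signatures: the gzip magic octets 1f 8b, and
   the "ustar" magic at offset 257 of a tar header. The function
   names are assumptions for illustration; the sketch does not
   propose any label syntax.

```python
import gzip
import io
import tarfile


def make_tar_gz(files):
    """Build a gzip-compressed tar stream from a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def peel_layers(blob):
    """Return (layers, innermost_bytes), listing layers outermost first."""
    layers = []
    if blob[:2] == b"\x1f\x8b":
        layers.append("gzip")
        blob = gzip.decompress(blob)
    if blob[257:262] == b"ustar":
        layers.append("tar")
    return layers, blob


blob = make_tar_gz({"hello.txt": b"hello"})
layers, inner = peel_layers(blob)
print(layers)
```

   The outer octets identify only the compression layer; the archive
   structure becomes visible after that layer is removed, which is
   the situation a composite label would need to describe.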

366	9.  Security Considerations

368	   Archives represent files, file metadata, and filesystems; thus,
369	   security issues loom large because archives can contain just about
370	   anything. These concerns are magnified by the arbitrary transport of
371	   such data across the Internet. [[TODO: complete.]]

373	10. Normative References

375	   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
376	              Extensions (MIME) Part One: Format of Internet Message
377	              Bodies", RFC 2045, November 1996.

379	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
380	              Requirement Levels", BCP 14, RFC 2119, March 1997.

382	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
383	              Specifications and Registration Procedures", BCP 13, RFC
384	              6838, January 2013.

386	Appendix A.  Expected Subtypes

388	   The following archive formats will be explored for registration as
389	   subtypes along with this effort:

391	   Archiving Only

393	      TAR

395	   Multipurpose (archiving, compression, encryption)

397	      ZIP, ACE, RAR, 7-Zip, StuffIt, FreeArc

399	   Software Packaging

401	      MSI, RPM, JAR, XPI, CAB, CRX, APK

403	   Disk Imaging

405	      ISO, NRG, BIN/CUE, VMDK, WIM, PartImage, IMG/IMA/IMZ, DMG

407	Authors' Addresses

409	   Sean Leonard
410	   Penango, Inc.
411	   5900 Wilshire Boulevard
412	   21st Floor
413	   Los Angeles, CA  90036
414	   USA

416	   Email: dev+ietf@seantek.com
417	   URI:   http://www.penango.com/

419	   Matthew Kerwin

421	   Email: matthew@kerwin.net.au
422	   URI:   http://matthew.kerwin.net.au/