[apps-discuss] A proposal for a new top-level media type: archive

Sean Leonard <dev+ietf@seantek.com> Wed, 24 September 2014 23:23 UTC

Message-ID: <54235269.2060002@seantek.com>
Date: Wed, 24 Sep 2014 16:23:21 -0700
From: Sean Leonard <dev+ietf@seantek.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1
MIME-Version: 1.0
To: media-types@iana.org, apps-discuss@ietf.org
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/ZRdVOZp-iZqdLb4X-VUTlxYEyvU
Subject: [apps-discuss] A proposal for a new top-level media type: archive

Colleagues on media-types and apps-discuss:

I would like to propose that the IETF create a new top-level media type: 
archive.

Basically, archive would be a top-level type covering all archive 
formats.
https://en.wikipedia.org/wiki/Archive_file
https://en.wikipedia.org/wiki/List_of_archive_formats

I think it's important to register archive formats as a distinct type 
from application, because there are common semantics that apply. In 
fact, these semantics are very similar to multipart and message 
top-level types.

The archive data types are all storage formats for *files*, as opposed 
to *content*. Each file has its own security implications, along with 
metadata that also has security implications (user and group 
permissions, access bits, executable bits, ACLs). At the highest level, 
an Internet-connected application ought to be able to identify that a 
particular piece of content is of this type (as opposed to the opaque 
application type), so it can make decisions about the content that are 
unique to archives, namely dealing with the security issues and 
presenting uniform user interfaces for handling such archives. Content 
bundling types like message (RFC 5322), multipart, and application/cms 
(CMS) are conceptually distinct. All those types can contain content 
that can get split off into files, but their purpose is not to replicate 
file system data.
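
To make the metadata concern concrete, here is a minimal sketch of the kind 
of check an application could run once it knows it is holding an archive. 
It uses Python's standard tarfile module on an in-memory tar; the function 
names (audit_tar, _demo_archive) and the set of checks are my own 
assumptions for illustration, not part of any specification, and the list 
of checks is not exhaustive.

```python
import io
import stat
import tarfile


def audit_tar(data):
    """Map each risky member name to the list of reasons it was flagged."""
    findings = {}
    # mode="r:" reads an uncompressed tar stream only.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for member in tar.getmembers():
            reasons = []
            # Absolute paths or ".." components can escape the target directory.
            if member.name.startswith("/") or ".." in member.name.split("/"):
                reasons.append("path escapes extraction directory")
            # setuid/setgid bits change whose privileges a program runs with.
            if member.mode & (stat.S_ISUID | stat.S_ISGID):
                reasons.append("setuid/setgid bit set")
            # Any execute bit on a regular file.
            if member.isfile() and member.mode & 0o111:
                reasons.append("executable bit set")
            if reasons:
                findings[member.name] = reasons
    return findings


def _demo_archive():
    """Build a small hypothetical tar in memory: one benign file, one risky."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, mode in [("notes.txt", 0o644), ("../run.sh", 0o4755)]:
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            info.mode = mode
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


demo_findings = audit_tar(_demo_archive())
for name, reasons in demo_findings.items():
    print(name, "->", ", ".join(reasons))
```

None of these checks apply to a multipart or message body, which carries 
content but no file-system permissions or paths to restore.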

Archives are ubiquitous on the Internet. Even if archives are used 
"infrequently" across the Internet architecture, they are obviously used 
at the endpoints. Improper transmission of archives has become a major 
source of labeling and security issues.

Remarkably, most archive formats have not been registered as media types 
(except for application/zip, which is an oldie). Therefore, it's pretty 
much a "clean field". Furthermore, widely available tools increasingly 
support multiple formats, so the probability is good 
that if you pass some archive/* labeled content to an archive 
application, it will be able to do something intelligent with it.
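
As a rough sketch of what that dispatch could look like, the snippet below 
routes content by its top-level type alone. Note the assumptions: archive/* 
is not a registered type, so labels such as "archive/tar" are hypothetical, 
and the handler names are placeholders I made up. Only the parsing via 
email.message.Message is standard library behavior.

```python
from email.message import Message


def top_level_type(content_type_header):
    """Return the lowercased top-level type of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_maintype()


def choose_handler(content_type_header):
    """Pick a (hypothetical) handler name based only on the top-level type."""
    maintype = top_level_type(content_type_header)
    if maintype == "archive":
        # Hypothetical top-level type: hand off to a multi-format archive tool.
        return "archive-viewer"
    if maintype in ("multipart", "message"):
        return "mime-parser"
    # Everything else, including application/*, stays opaque.
    return "opaque-download"


for label in ("archive/tar", "application/zip", 'multipart/mixed; boundary="x"'):
    print(label, "->", choose_handler(label))
```

With today's labels, application/zip falls through to the opaque branch 
even though it is an archive, which is the labeling gap described above.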

The following major sub-types of archives all belong in a common 
top-level media type [from Wikipedia]:
* archiving only (concatenate files): tar
* multi-function (concatenate, compress, encrypt, etc.): zip, rar, 7z, 
arc, arj, the list goes on and on...
* software packaging: cab, msi, pup, pet, apk, rpm...
* disk image: ISO-9660 (CD/DVD/Blu-ray), Apple Disk Image, virtual 
floppy disks, formerly-known-as-TrueCrypt, etc.
* backup: (a large quantity of proprietary formats)

I know that the top-level media type (TLMT) question has come up before, 
with fonts. 
<http://www6.ietf.org/mail-archive/web/apps-discuss/current/msg03447.html>

Where do we start? Maybe we should talk about it? I don't think it's as 
simple as drafting an Internet-Draft. Maybe there should be a BOF or 
working group. Experts with file system and archival experience should 
get involved.

Sean