Internet Engineering Task Force                              L. Masinter
Internet-Draft                                                     Adobe
Intended status: Informational                          January 11, 2011
Expires: July 15, 2011


                            MIME and the Web
                    draft-masinter-mime-web-info-02

Abstract

   This document describes some of the ways in which parts of the MIME
   system, originally designed for electronic mail, have been used in
   the Web, and some of the ways in which those uses have resulted in
   difficulties. Given this background and justification, this document
   then goes on to outline requirements for changes to MIME registries
   and practices for their use within W3C and IETF, in order to address
   those difficulties.

   Within IETF, it is expected that a companion Best Current Practice
   document will make specific changes to the Internet Media Types and
   Charset registries, among others.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 15, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Masinter                  Expires July 15, 2011                 [Page 1]

Internet-Draft              MIME and the Web                January 2011

   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  History
     2.1.  Origins of MIME
     2.2.  Introducing MIME into the Web
     2.3.  Distributed Extensibility
   3.  Problems with application to the Web
     3.1.  Lack of clarity
     3.2.  Differences between email and Web delivery
     3.3.  The Rules Weren't Quite Followed
     3.4.  Consequences
     3.5.  The Down Side of Extensibility
   4.  Additional considerations
     4.1.  There are related problems with charsets
     4.2.  Embedded, downloaded, launch independent application
     4.3.  Additional Use Cases: Polyglot and Multiview
     4.4.  Evolution, Versioning, Forking
     4.5.  Content Negotiation
     4.6.  Fragment identifiers
   5.  Recommendations
     5.1.  Internet Media Type registration
       5.1.1.  MIME registry magic numbers for sniffing
       5.1.2.  Scripting and scriptable content safety
       5.1.3.  Fragment identifiers
       5.1.4.  Application info
       5.1.5.  File extensions in registry
     5.2.  Sniffing
       5.2.1.  Sniffing uses Media Type magic number
       5.2.2.  Sniffing when there are multiple different definitions
       5.2.3.  Sniffing charsets
       5.2.4.  Sniffing security uses scriptability info
     5.3.  Changes to IANA processes for MIME registries
     5.4.  FTP specification
     5.5.  Update some URI definitions
     5.6.  Changes to W3C findings, processes
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Informative References
   Author's Address

1.  Introduction

   This document was prompted by discussions about Web architecture and
   the difficulties surrounding evolution of the Web, Internet Media
   Types, multiple specifications for a single media type, and related
   discussions.

   The document gives some of the history of MIME and its introduction
   and use in the Web (Section 2). It then describes some of the
   current difficulties with the use of MIME in the Web context
   (Section 3). This background and context is then followed by a
   description of changes which would reduce some of those
   difficulties; the changes involve specifications, practices, and
   registries within IETF and W3C (Section 5). In particular, changes
   to the registry and maintenance procedures for MIME-related
   registries maintained by IANA are described.

   Currently, discussion of this document is suggested on the mailing
   list www-tag@w3c.org (mailing list open for subscription to all),
   archives at http://lists.w3.org/Archives/Public/www-tag/.

2.  History

2.1.  Origins of MIME

   MIME ("Multipurpose Internet Mail Extensions") was invented
   originally for email, based on general principles of "messaging" (a
   foundational architecture framework). The role of MIME was to extend
   Internet email messaging from ASCII-only plain text to include other
   character sets, images, rich documents, etc. [RFC1521], [RFC1522].
   The basic architecture of complex content messaging is:

   o  Message sent from A to B.

   o  Message includes some data. Sender A includes standard 'headers'
      telling recipient B enough information that recipient B knows how
      sender A intends the message to be interpreted.

   o  Recipient B gets the message, interprets the headers for the data
      and uses them as information on how to interpret the data.

   MIME is a "tagging and bagging" specification:

   tagging:  How to label content so the intent of how the content
      should be interpreted is known.

   bagging:  How to wrap the content so the label is clear, or, if there
      are multiple parts to a single message, how to combine them.

   "MIME types" (renamed "Internet Media Types" in later specs
   [RFC2046]) are part of the "tagging" -- a way to describe the content
   of a message so that it could be used to initiate interpretation of a
   message.

   The "Internet Media Type registry" (MIME type registry) is where
   someone can tell the world what a particular label means, as far as
   the sender's intent of how recipients should process a message of
   that type, and the description of a recipient's capability and
   ability for senders.

2.2.  Introducing MIME into the Web

   The original World Wide Web (the 0.9 version of HTTP, see [RFC1945])
   didn't have "tagging and bagging" -- everything sent via HTTP was
   assumed to be HTML.
   However, at the time (early 1990's) other distributed information
   access systems, including Gopher (distributed menu system) and WAIS
   (remote access to document databases) were adding capabilities for
   accessing many things other than text and hypertext, and the WWW
   folks were considering type tagging.

   It was agreed that HTTP should use MIME as the vocabulary for talking
   about file types and character sets. The result was that HTTP 1.0
   added the "content-type" header, following (more or less) MIME.
   Later, for content negotiation, additional uses of this technology
   (in 'Accept' headers) were also added.

   The differences in the use of Internet Media Types between email and
   HTTP were minor:

   o  default charset: HTTP originally specified ISO-8859-1 as the
      default character set, not US-ASCII ((NEED REF TO HTTP ISSUE see
      http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20; the text
      that it refers to currently is here:
      http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-11.html#rfc.section.2.3.1
      ))

   o  requirement for CRLF in plain text: in practice, Web clients
      didn't restrict content to using CRLF in text/* MIME bodies.

   These minor differences have caused a lot of trouble.

2.3.  Distributed Extensibility

   The real advantage of using Internet Media Types to label content was
   that the Web was no longer restricted to a single format. This one
   addition meant expanding from Global Hypertext to Global Hypermedia
   (as suggested in a 1992 email [connolly92]):

   +-------------------------------------------------------------------+
   | The Internet currently serves as the backbone for a global        |
   | hypertext. FTP and email provided a good start, and the gopher,   |
   | WWW, or WAIS clients and servers make wide area information       |
   | browsing simple. These systems even interoperate, with email      |
   | servers talking to FTP servers, WWW clients talking to gopher     |
   | servers, on and on.                                               |
   |                                                                   |
   | This currently works quite well for text. But what should WWW     |
   | clients do as Gopher and WAIS servers begin to serve up pictures, |
   | sounds, movies, spreadsheet templates, postscript files, etc.?    |
   | It would be a shame for each to adopt its own multimedia typing   |
   | system.                                                           |
   |                                                                   |
   | If they all adopt the MIME typing system (and as many other       |
   | features from MIME as are appropriate), we can step from global   |
   | hypertext to global hypermedia that much easier.                  |
   +-------------------------------------------------------------------+

   The fact that HTTP could reliably transport images of different
   formats, for example, allowed NCSA to add inline images to HTML.
   MIME allowed other document formats (Word, PDF, Postscript) and other
   kinds of hypermedia, as well as other applications, to be part of the
   Web. MIME was arguably the most important extensibility mechanism in
   the Web.

3.  Problems with application to the Web

   Unfortunately, while the use of Internet Media Types for the Web
   added incredible power, a number of problems have arisen.

3.1.  Lack of clarity

   Many people are confused about the purpose of MIME in the Web, its
   uses, and the meaning of Internet Media Types. Many W3C
   specifications, TAG findings, and Internet Media Type registrations
   make incorrect assumptions about the meaning and purpose of an
   Internet Media Type registration.

3.2.  Differences between email and Web delivery

   Some of the differences between the application contexts of email and
   Web delivery determine different requirements:

   o  In the Web, the transfer of data is initiated differently than in
      email: the "messages" with labeled content are usually HTTP
      responses to a specific (GET) request (although the request is
      itself a message, GET has no content). In the most common case,
      then, the receiver knows more about the data before it has been
      sent.

   o  Clients would like to know more about the content before they
      retrieve it. The "tagging" is often not sufficient to know, for
      example, "can I interpret this if I retrieve it", because of
      versioning, capabilities, or dependencies on things like screen
      size or interaction capabilities of the recipient.

   o  Some content isn't delivered over HTTP (files on a local file
      system), or there is no opportunity for tagging (data delivered
      over FTP), and in those cases some other way is needed to
      determine the file type.

   Operating systems use (and continue to evolve) different systems to
   determine the 'type' of something, different from the MIME tagging
   and bagging:

   o  'magic numbers': in many contexts, file types can be guessed by
      looking for some unique string, number or pattern, which only
      appears in files of that type. In circumstances where this was a
      unique number, it was called a "magic number", although this
      concept has been extended to other textual patterns.

   o  Originally Mac OS had a 4 character 'file type' and another 4
      character 'creator code' for file types.

   o  Windows evolved to use the "file extension" -- 3 letters (and then
      more) at the end of the file name -- as the initial determination
      of the overall type of a file. This practice has now extended to
      other systems.

   Information about these other ways of determining type (rather than
   by the content-type label) was gathered for the Internet Media Type
   registry; those registering types are encouraged to also describe
   'magic numbers', Mac file type, and common file extensions. However,
   since there was no formal use of that information, the quality of
   that information in the registry is haphazard.

   Finally, although tagging and bagging might be OK for unilaterally
   initiated (one-way) messaging, a recipient might want to know whether
   it could handle the data before reading it in and interpreting it,
   and Internet Media Types alone weren't enough to tell.

3.3.  The Rules Weren't Quite Followed

   The behavior of the community hasn't matched the expectations held
   when the Internet Media Type registry was designed:

   o  Lots of file types aren't registered (no entry in IANA for file
      types).

   o  For many file types that are registered, the registration is
      incomplete or incorrect (people doing registration didn't
      understand 'magic number' or other fields).

   o  The actual content deployed or created by deployed software
      doesn't match the registration.

   These problems arise for various reasons, for example:

   o  The benefit of registration to the organization that designed the
      file type is unclear compared to the overhead of shepherding the
      registration through the process.

   o  Registration requires announcing product plans in advance of
      product release.

   o  Organizations are unaware of the registration process or
      misinformed.

   In particular, Web implementations of Internet Media Types diverged
   from expected behavior:

   o  Browser implementors would be liberal in what they accepted, and
      use what looked like a file extension in the URL and/or magic
      number or other "sniffing" techniques to decide file type, without
      assuming the content-type label was authoritative. This was
      necessary anyway for files that weren't delivered by HTTP.

   o  HTTP server implementors and administrators didn't supply ways of
      easily associating the 'intended' file type label with the file,
      resulting in files frequently being delivered with a label other
      than the one they would have chosen if they'd thought about it,
      and if browsers *had* assumed content-type was authoritative.
      Some popular servers had default configuration files that treated
      any unknown type as "text/plain" (plain text in ASCII). Since it
      didn't matter (the browsers worked anyway), it was hard to get
      this fixed.
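
   The divergence described above can be illustrated with a small
   sketch. The Python below is not taken from any browser or from any
   specification; it is a minimal illustration, under stated
   assumptions, of a receiver that is "liberal in what it accepts": it
   checks for a magic number first, then for something that looks like a
   file extension in the URL path, and only then falls back to the
   content-type label. The function name sniff_type, the two small
   lookup tables, and the order of the checks are hypothetical choices
   made for this example.

```python
import posixpath

# Hypothetical, deliberately tiny tables for illustration only.
# The byte signatures are the well-known leading bytes of each format.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]

EXTENSIONS = {
    ".png": "image/png",
    ".gif": "image/gif",
    ".pdf": "application/pdf",
}


def sniff_type(url_path: str, declared: str, body: bytes) -> str:
    """Guess a media type the way a liberal receiver might.

    The declared content-type label is treated as a last resort rather
    than as authoritative, which is the behavior the text describes.
    """
    # 1. Magic number: does the body start with a known signature?
    for prefix, media_type in MAGIC_NUMBERS:
        if body.startswith(prefix):
            return media_type

    # 2. Something that looks like a file extension in the URL path.
    extension = posixpath.splitext(url_path)[1].lower()
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]

    # 3. Only now trust the label supplied by the sender.
    return declared


if __name__ == "__main__":
    png_body = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
    # A mislabeled PNG is still recognized by its magic number.
    print(sniff_type("/images/logo", "text/plain", png_body))
    # No known signature, so the extension in the URL decides.
    print(sniff_type("/docs/report.pdf", "text/plain", b"..."))
    # Nothing matches, so the declared label is used.
    print(sniff_type("/notes/readme", "text/plain", b"hello"))
```

   With a receiver like this, a server that labels every unknown file
   as "text/plain" still appears to work, which is the situation that
   made mislabeling hard to get fixed.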

   Thus, in many situations, because of poor control over server
   administration or weak file-type detection in popular web server
   technology, receivers might find that 'magic number' scanning was
   more reliable than the actual labeled content-type.

   Incorrect senders coupled with liberal readers wind up feeding a
   self-reinforcing feedback loop based on the robustness principle
   ([WikiRobust], [RFC3117]).

   In addition, since the "magic number" technology is heuristic, it is
   possible to have different formats all with the same "magic number"
   or, more generally, more than one format that might reasonably be
   "sniffed". First, there are cases where the reuse of one file type's
   magic number for another file type is intentional: deliberate "puns",
   or attempts to usurp another vendor's, group's, or standards
   organization's control over a file format. Second, there are cases
   where a single file might match more than one 'magic number' or
   recognition pattern, and different receivers apply heuristics
   differently. Finally, there are simple cases where the labeled type
   (text/plain, application/octet-stream) is more general and could
   reasonably be used with content which might otherwise match other
   patterns.

   For example, consider the sniffing that some web browsers apply to
   content labeled text/plain. If you serve such a browser a perfectly
   valid text file with the content:

      Rufus Kitty

   the browser will not display it (there are intentionally mismatched
   tags on the 3rd line). Something like this might come up, for
   example, if you had a bug database, with links to the text of
   documents that caused problems. This buggy XML, served as
   text/plain, should render, but it does not in browsers that
   incorrectly guess "application/xml".

   The ".

9.  Informative References

   [RFC1521]  Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
              Mail Extensions) Part One: Mechanisms for Specifying and
              Describing the Format of Internet Message Bodies",
              RFC 1521.

   [RFC1522]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Two: Message Header Extensions for Non-ASCII Text",
              RFC 1522, September 1993.

   [RFC1945]  Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
              Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC3117]  Rose, M., "On the Design of Application Protocols",
              RFC 3117, November 2001.

   [Widgets]  Caceres, M., "Widget Packaging and Configuration".

   [WikiRobust]
              "Robustness principle", 2010.

   [connolly92]
              Connolly, D., "Global Hypermedia", Oct 1992.

   [mime-sniff]
              Barth, A. and I. Hickson, "Media Type Sniffing",
              December 2010.

Author's Address

   Larry Masinter
   Adobe
   345 Park Ave.
   San Jose, 95110
   USA

   Phone: +1 408 536 3024
   Email: masinter@adobe.com
   URI:   http://larry.masinter.net