Internet Engineering Task Force                              L. Masinter
Internet-Draft                                                     Adobe
Intended status: Informational                          January 11, 2011
Expires: July 15, 2011

                            MIME and the Web
                    draft-masinter-mime-web-info-02

Abstract

   This document describes some of the ways in which parts of the MIME
   system, originally designed for electronic mail, have been used in
   the Web, and some of the ways in which those uses have resulted in
   difficulties.
   Given this background and justification, this document then goes on
   to outline requirements for changes to MIME registries and practices
   for their use within W3C and IETF, in order to address those
   difficulties.  Within IETF, it is expected that a companion Best
   Current Practice document will make specific changes to the Internet
   Media Types and Charset registries, among others.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 15, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  History
     2.1.  Origins of MIME
     2.2.  Introducing MIME into the Web
     2.3.  Distributed Extensibility
   3.  Problems with application to the Web
     3.1.  Lack of clarity
     3.2.  Differences between email and Web delivery
     3.3.  The Rules Weren't Quite Followed
     3.4.  Consequences
     3.5.  The Down Side of Extensibility
   4.  Additional considerations
     4.1.  There are related problems with charsets
     4.2.  Embedded, downloaded, launch independent application
     4.3.  Additional Use Cases: Polyglot and Multiview
     4.4.  Evolution, Versioning, Forking
     4.5.  Content Negotiation
     4.6.  Fragment identifiers
   5.  Recommendations
     5.1.  Internet Media Type registration
       5.1.1.  MIME registry magic numbers for sniffing
       5.1.2.  Scripting and scriptable content safety
       5.1.3.  Fragment identifiers
       5.1.4.  Application info
       5.1.5.  File extensions in registry
     5.2.  Sniffing
       5.2.1.  Sniffing uses Media Type magic number
       5.2.2.  Sniffing when there are multiple different definitions
       5.2.3.  Sniffing charsets
       5.2.4.  Sniffing security uses scriptability info
     5.3.  Changes to IANA processes for MIME registries
     5.4.  FTP specification
     5.5.  Update some URI definitions
     5.6.  Changes to W3C findings, processes
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Informative References
   Author's Address

1.  Introduction

   This document was prompted by discussions about Web architecture and
   the difficulties surrounding evolution of the Web, Internet Media
   Types, multiple specifications for a single media type, and related
   discussions.

   The document gives some of the history of MIME and its introduction
   and use in the Web (Section 2).  It then describes some of the
   current difficulties with the use of MIME in the Web context
   (Section 3).  This background and context is then followed by a
   description of changes which would reduce some of those difficulties;
   the changes involve specifications, practices, and registries within
   IETF and W3C (Section 5).  In particular, changes to the registry and
   maintenance procedures for MIME-related registries maintained by IANA
   are described.

   Currently, discussion of this document is suggested on the mailing
   list www-tag@w3c.org (mailing list open for subscription to all),
   archives at http://lists.w3.org/Archives/Public/www-tag/.

2.  History

2.1.  Origins of MIME

   MIME ("Multipurpose Internet Mail Extensions") was invented
   originally for email, based on general principles of "messaging" (a
   foundational architecture framework).  The role of MIME was to extend
   Internet email messaging from ASCII-only plain text to include other
   character sets, images, rich documents, etc. ([RFC1521], [RFC1522]).
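
   As a minimal sketch of what this extension looks like in practice,
   the following Python fragment uses the standard library "email"
   package to build a message that carries non-ASCII text and an image.
   The addresses, subject, file name, and stub image bytes are
   assumptions made up for illustration; they do not come from any MIME
   specification.

```python
from email.message import EmailMessage

# Hypothetical sender, recipient, and subject, for illustration only.
msg = EmailMessage()
msg["From"] = "a@example.org"
msg["To"] = "b@example.org"
msg["Subject"] = "MIME sketch"

# A text part in a character set other than US-ASCII.
msg.set_content("Grüße aus dem Web", charset="utf-8")

# Stub bytes standing in for an image: only the PNG signature,
# not a complete, displayable PNG file.
png_stub = b"\x89PNG\r\n\x1a\n"
msg.add_attachment(png_stub, maintype="image", subtype="png",
                   filename="stub.png")

# Collect the Content-Type label of the container and of each part.
labels = [part.get_content_type() for part in msg.walk()]
print(labels)
```

   Each part carries its own Content-Type label, and the
   multipart/mixed container combines the parts into a single message.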

   The basic architecture of complex content messaging is:

   o  Message sent from A to B.

   o  Message includes some data.  Sender A includes standard 'headers'
      telling recipient B enough information that recipient B knows how
      sender A intends the message to be interpreted.

   o  Recipient B gets the message, reads the headers, and uses them to
      decide how to interpret the data.

   MIME is a "tagging and bagging" specification:

   tagging:  How to label content so the intent of how the content
      should be interpreted is known.

   bagging:  How to wrap the content so the label is clear, or, if there
      are multiple parts to a single message, how to combine them.

   "MIME types" (renamed "Internet Media Types" in later specs
   [RFC2046]) are part of the "tagging" -- a way to describe the content
   of a message so that it could be used to initiate interpretation of a
   message.  The "Internet Media Type registry" (MIME type registry) is
   where someone can tell the world what a particular label means, both
   as the sender's intent for how recipients should process a message of
   that type and as a way of describing a recipient's capabilities to
   senders.

2.2.  Introducing MIME into the Web

   The original World Wide Web (the 0.9 version of HTTP, see [RFC1945])
   didn't have "tagging and bagging" -- everything sent via HTTP was
   assumed to be HTML.  However, at the time (the early 1990s), other
   distributed information access systems, including Gopher (a
   distributed menu system) and WAIS (remote access to document
   databases), were adding capabilities for accessing many things other
   than text and hypertext, and the WWW folks were considering type
   tagging.  It was agreed that HTTP should use MIME as the vocabulary
   for talking about file types and character sets.  The result was
   that HTTP 1.0 added the "content-type" header, following (more or
   less) MIME.
   Later, for content negotiation, additional uses of this technology
   (in 'Accept' headers) were also added.

   The differences in the use of Internet Media Types between email and
   HTTP are minor:

   o  default charset: HTTP originally specified ISO-8859-1 as the
      default character set, not US-ASCII ((NEED REF TO HTTP ISSUE see
      http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20; the text
      that it refers to currently is here:
      http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-11.html#rfc.section.2.3.1 ))

   o  requirement for CRLF in plain text: in practice, Web clients
      didn't restrict content to use CRLF in text/* MIME bodies.

   These minor differences have caused a lot of trouble.

2.3.  Distributed Extensibility

   The real advantage of using Internet Media Types to label content
   was that the Web was no longer restricted to a single format.  This
   one addition meant expanding from Global Hypertext to Global
   Hypermedia (as suggested in a 1992 email [connolly92]):

   +-------------------------------------------------------------------+
   | The Internet currently serves as the backbone for a global        |
   | hypertext.  FTP and email provided a good start, and the gopher,  |
   | WWW, or WAIS clients and servers make wide area information       |
   | browsing simple.  These systems even interoperate, with email     |
   | servers talking to FTP servers, WWW clients talking to gopher     |
   | servers, on and on.                                               |
   | This currently works quite well for text.  But what should WWW    |
   | clients do as Gopher and WAIS servers begin to serve up pictures, |
   | sounds, movies, spreadsheet templates, postscript files, etc.?    |
   | It would be a shame for each to adopt its own multimedia typing   |
   | system.                                                           |
   | If they all adopt the MIME typing system (and as many other       |
   | features from MIME as are appropriate), we can step from global   |
   | hypertext to global hypermedia that much easier.                  |
   +-------------------------------------------------------------------+

   The fact that HTTP could reliably transport images of different
   formats, for example, allowed NCSA to add <img> to HTML.  MIME
   allowed other document formats (Word, PDF, PostScript) and other
   kinds of hypermedia, as well as other applications, to be part of the
   Web.  MIME was arguably the most important extensibility mechanism in
   the Web.

3.  Problems with application to the Web

   Unfortunately, while the use of Internet Media Types for the Web
   added incredible power, a number of problems have arisen.

3.1.  Lack of clarity

   Many people are confused about the purpose of MIME in the Web, its
   uses, and the meaning of Internet Media Types.  Many W3C
   specifications, TAG findings, and Internet Media Type registrations
   make incorrect assumptions about the meaning and purpose of an
   Internet Media Type registration.

3.2.  Differences between email and Web delivery

   Some of the differences between the application contexts of email and
   Web delivery lead to different requirements:

   o  In the Web, the transfer of data is initiated differently than in
      email: the "messages" with labeled content are usually HTTP
      responses to a specific (GET) request (although the request is
      itself a message, GET has no content).  In the most common case,
      then, the receiver knows more about the data before it has been
      sent.

   o  Clients would like to know more about the content before they
      retrieve it.  The "tagging" is often not sufficient to know, for
      example, "can I interpret this if I retrieve it", because of
      versioning, capabilities, or dependencies on things like screen
      size or interaction capabilities of the recipient.

   o  Some content isn't delivered over HTTP (files on a local file
      system), or there is no opportunity for tagging (data delivered
      over FTP); in those cases, some other way is needed to determine
      the file type.

   Operating systems use (and continue to evolve) different systems to
   determine the 'type' of something, different from the MIME tagging
   and bagging:

   o  'magic numbers': in many contexts, file types can be guessed by
      looking for some unique string, number or pattern, which only
      appears in files of that type.  In circumstances where this was a
      unique number, it was called a "magic number", although this
      concept has been extended to other textual patterns.

   o  Originally Mac OS had a 4 character 'file type' and another 4
      character 'creator code' for file types.

   o  Windows evolved to use the "file extension" -- 3 letters (and then
      more) at the end of the file name -- as the initial determination
      of the overall type of a file.  This practice has now extended to
      other systems.

   Information about these other ways of determining type (rather than
   by the content-type label) was gathered for the Internet Media Type
   registry; those registering types are encouraged to also describe
   'magic numbers', Mac file type, and common file extensions.  However,
   since there was no formal use of that information, the quality of
   that information in the registry is haphazard.

   Finally, while tagging and bagging might be OK for unilaterally
   initiated (one-way) messaging, a receiver might want to know whether
   it could handle the data before reading it in and interpreting it,
   and the Internet Media Types weren't enough to tell.

3.3.  The Rules Weren't Quite Followed

   The behavior of the community hasn't matched the expectations held
   when the Internet Media Type registry was designed:

   o  Lots of file types aren't registered (no entry in IANA for file
      types).

   o  For many file types that are registered, the registration is
      incomplete or incorrect (people doing registration didn't
      understand 'magic number' or other fields).

   o  The actual content deployed or created by deployed software
      doesn't match the registration.

   These problems arise for various reasons, for example:

   o  The benefit of registration to the organization that designed the
      file type is unclear compared to the overhead of shepherding the
      registration through the process.

   o  Registration requires announcing product plans in advance of
      product release.

   o  Organizations are unaware of the registration process or
      misinformed.

   In particular, Web implementations of Internet Media Types diverged
   from expected behavior:

   o  Browser implementors would be liberal in what they accepted, and
      use what looked like a file extension in the URL and/or magic
      number or other "sniffing" techniques to decide file type, without
      assuming the content-type label was authoritative.  This was
      necessary anyway for files that weren't delivered by HTTP.

   o  HTTP server implementors and administrators didn't supply ways of
      easily associating the 'intended' file type label with the file,
      resulting in files frequently being delivered with a label other
      than the one they would have chosen if they'd thought about it,
      and if browsers *had* assumed content-type was authoritative.
      Some popular servers had default configuration files that treated
      any unknown type as "text/plain" (plain text in ASCII).  Since it
      didn't matter (the browsers worked anyway), it was hard to get
      this fixed.
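
   The liberal receiver behavior described above can be sketched
   roughly as follows.  This is an illustration only, not the algorithm
   of any particular browser: the short signature table and the rule
   for when to distrust a label are assumptions chosen for the example,
   and the function names are hypothetical.

```python
# A few well-known file signatures ("magic numbers"). The table is
# deliberately tiny; real sniffers use far larger pattern sets.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

# Assumption for this sketch: only a missing label or a very general
# one is second-guessed by the receiver.
GENERIC_LABELS = {None, "text/plain", "application/octet-stream"}


def sniff(body):
    """Return a media type guessed from leading bytes, or None."""
    for signature, media_type in MAGIC_NUMBERS:
        if body.startswith(signature):
            return media_type
    return None


def effective_type(labeled_type, body):
    """Pick the type a liberal receiver would act on.

    A specific label is trusted; a generic or missing label is
    overridden when the content matches a known signature.
    """
    if labeled_type in GENERIC_LABELS:
        sniffed = sniff(body)
        if sniffed is not None:
            return sniffed
    return labeled_type or "application/octet-stream"


# A GIF served with the "text/plain" default label is treated as a GIF.
print(effective_type("text/plain", b"GIF89a" + b"\x00" * 10))
```

   Because the receiver silently corrects the label, the sender in this
   sketch never learns that "text/plain" was wrong, which is the
   situation the text describes.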

   Thus, in many situations, because of poor control over server
   administration or weak file-type detection in popular web server
   technology, receivers might find that 'magic number' scanning was
   more reliable than the actual labeled content-type.

   Incorrect senders coupled with liberal readers wind up feeding a
   self-reinforcing (and harmful) feedback loop based on the robustness
   principle ([WikiRobust], [RFC3117]).

   In addition, since the "magic number" technology is heuristic, it is
   possible to have different formats all with the same "magic number"
   or, more generally, more than one format that might reasonably be
   "sniffed".

   For example, there are cases where the reuse of one file type's magic
   number for another file type is intentional -- deliberate "puns", or
   attempts to usurp another vendor's, group's, or standards
   organization's control over a file format.

   Secondly, there are cases where a single file might match more than
   one 'magic number' or recognition pattern, and different receivers
   apply heuristics differently.

   Finally, there are simple cases where the labeled type (text/plain,
   application/octet-stream) is more general and could reasonably be
   used with content which might otherwise match other patterns.

   For example, consider the sniffing that's done by some web browsers
   on text/plain.  If you serve it the perfectly valid text file with
   the content:

      Rufus
      Kitty

   the browser will not display it (there are intentionally mismatched
   tags on the 3rd line).  Something like this might come up, for
   example, if you had a bug database, with links to the text of
   documents that caused problems.  This buggy XML, served as
   text/plain, should render, but it does not in browsers that
   incorrectly guess "application/xml".

   The ".

9.  Informative References

   [RFC1521]  Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
              Mail Extensions) Part One: Mechanisms for Specifying and
              Describing the Format of Internet Message Bodies",
              RFC 1521.

   [RFC1522]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Two: Message Header Extensions for Non-ASCII Text",
              RFC 1522, September 1993.

   [RFC1945]  Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
              Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC3117]  Rose, M., "On the Design of Application Protocols",
              RFC 3117, November 2001.

   [Widgets]  Caceres, M., "Widget Packaging and Configuration".

   [WikiRobust]
              "Robustness principle", 2010.

   [connolly92]
              Connolly, D., "Global Hypermedia", Oct 1992.

   [mime-sniff]
              Barth, A. and I. Hickson, "Media Type Sniffing",
              December 2010.

Author's Address

   Larry Masinter
   Adobe
   345 Park Ave.
   San Jose, 95110
   USA

   Phone: +1 408 536 3024
   Email: masinter@adobe.com
   URI:   http://larry.masinter.net