idnits 2.17.1 

draft-masinter-mime-web-info-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 25, 2010) is 4931 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 1522
     (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)


     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                              L. Masinter
3	Internet-Draft                                                     Adobe
4	Intended status: Informational                          October 25, 2010
5	Expires: April 28, 2011

7	                            MIME and the Web
8	                    draft-masinter-mime-web-info-01

10	Abstract

12	   This document describes some of the ways in which parts of the MIME
13	   system, originally designed for electronic mail, have been used in
14	   the Web, and some of the ways in which those uses have resulted in
15	   difficulties.  Given this background and justification, this document
16	   then goes on to outline requirements for changes to MIME registries
17	   and practices for their use within W3C and IETF, in order to address
18	   those difficulties.  Within IETF, a companion Best Current Practice
19	   document will be developed to specifically make some changes to the
20	   Internet Media Types and Charset registries.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on April 28, 2011.

39	Copyright Notice

41	   Copyright (c) 2010 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	   2.  History
58	     2.1.  Origins of MIME
59	     2.2.  Introducing MIME into the Web
60	     2.3.  Distributed Extensibility
61	   3.  Problems with application to the Web
62	     3.1.  Lack of clarity
63	     3.2.  Differences between email and Web delivery
64	     3.3.  The Rules Weren't Quite Followed
65	     3.4.  Consequences
66	     3.5.  The Down Side of Extensibility
67	   4.  Additional considerations
68	     4.1.  There are related problems with charsets
69	     4.2.  Embedded, downloaded, launch independent application
70	     4.3.  Additional Use Cases: Polyglot and Multiview
71	     4.4.  Evolution, Versioning, Forking
72	     4.5.  Content Negotiation
73	     4.6.  Fragment identifiers
74	   5.  Recommendations
75	     5.1.  Internet Media Type registration
76	       5.1.1.  MIME registry magic numbers for sniffing
77	       5.1.2.  Scripting and scriptable content safety
78	       5.1.3.  Fragment identifiers
79	       5.1.4.  Application info
80	       5.1.5.  File extensions in registry
81	     5.2.  Sniffing
82	       5.2.1.  Sniffing uses Media Type magic number
83	       5.2.2.  Sniffing when there are multiple different definitions
85	       5.2.3.  Sniffing charsets
86	       5.2.4.  Sniffing security uses scriptability info
87	     5.3.  Changes to IANA processes for MIME registries
88	     5.4.  FTP specification
89	     5.5.  Update some URI definitions
90	     5.6.  Changes to W3C findings, processes
91	   6.  Acknowledgements
92	   7.  IANA Considerations
93	   8.  Security Considerations
94	   9.  Informative References
95	   Author's Address

97	1.  Introduction

99	   This document was initially prompted by a set of discussions about
100	   Web architecture and the difficulties surrounding evolution of the
101	   Web, Internet Media types, multiple specifications for a single media
102	   type, and related discussions.

104	   The document gives some of the history of MIME and its introduction
105	   and use in the Web (Section 2).  It then describes some of the
106	   current difficulties with the use of MIME in the Web context
107	   (Section 3).  This background and context is then followed by a
108	   description of changes which would reduce some of those
109	   difficulties; the changes involve specifications, practices, and
110	   registries within IETF and W3C (Section 5).  In particular, changes
111	   to the registry and maintenance procedures for MIME-related
112	   registries maintained by IANA are described.

114	   Currently, discussion of this document is suggested on the mailing
115	   list www-tag@w3.org (mailing list open for subscription to all),
116	   archives at http://lists.w3.org/Archives/Public/www-tag/.

118	   NOTE: This document is still quite rough; some of the facts need to
119	   be checked, many sections still need expansion.  Any help with
120	   references and such appreciated.

122	2.  History

124	2.1.  Origins of MIME

126	   MIME ("Multipurpose Internet Mail Extensions") was invented
127	   originally for email, based on general principles of "messaging" (a
128	   foundational architecture framework).  The role of MIME was to extend
129	   Internet email messaging from ASCII-only plain text to include other
130	   character sets, images, rich documents, etc. [RFC1521], [RFC1522].
131	   The basic architecture of complex content messaging is:

133	   o  Message sent from A to B.

135	   o  Message includes some data.  Sender A includes standard 'headers'
136	      telling recipient B enough information that recipient B knows how
137	      sender A intends the message to be interpreted.

139	   o  Recipient B gets the message, interprets the headers for the data
140	      and uses it as information on how to interpret the data.

142	   MIME is a "tagging and bagging" specification:

144	   tagging:  How to label content so the intent of how the content
145	      should be interpreted is known.

147	   bagging:  How to wrap the content so the label is clear, or, if there
148	      are multiple parts to a single message, how to combine them.

150	   "MIME types" (renamed "Internet Media Types" in later specs
151	   [RFC2046]) are part of the "tagging" -- a way to describe the content
152	   of a message so that the description can be used to initiate
153	   interpretation of the message.  The "Internet Media Type registry"
154	   (MIME type registry) is where someone can tell the world what a
155	   particular label means: the sender's intent for how recipients
156	   should process a message of that type, and, for senders, a
157	   description of the capabilities recipients are expected to have.
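
   As a rough illustration of "tagging and bagging", the sketch below
   uses Python's standard email package to label two bodies with
   Internet Media Types and wrap them in one multipart message.  The
   addresses, subject, and the build_message name are placeholders
   invented for this example, not anything defined by MIME.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build one message carrying a plain-text body and an HTML body."""
    msg = EmailMessage()
    msg["From"] = "a@example.org"   # placeholder sender A
    msg["To"] = "b@example.org"     # placeholder recipient B
    msg["Subject"] = "Tagging and bagging"
    # Tagging: the body is labeled with a Content-Type of text/plain.
    msg.set_content("Hello, world.\n")
    # Bagging: adding a second form converts the message into a
    # multipart/alternative container holding both labeled parts.
    msg.add_alternative("<p>Hello, world.</p>\n", subtype="html")
    return msg


msg = build_message()
for part in msg.walk():
    # Prints the container's type first, then the type of each part.
    print(part.get_content_type())
```

   The recipient reads only the labels to decide how to interpret each
   part; nothing in the body bytes themselves is consulted.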

159	2.2.  Introducing MIME into the Web

161	   The original World Wide Web (the 0.9 version of HTTP, see [RFC1945])
162	   didn't have "tagging and bagging" -- everything sent via HTTP was
163	   assumed to be HTML.  However, at the time (early 1990s) other
164	   distributed information access systems, including Gopher (distributed
165	   menu system) and WAIS (remote access to document databases) were
166	   adding capabilities for accessing many things other than text and
167	   hypertext, and the WWW folks were considering type tagging.  It was
168	   agreed that HTTP should use MIME as the vocabulary for talking about
169	   file types and character sets.  The result was that HTTP 1.0 added
170	   the "content-type" header, following (more or less) MIME.  Later, for
171	   content negotiation, additional uses of this technology (in 'Accept'
172	   headers) were also added.

174	   The differences in the use of Internet Media Types between email and
175	   HTTP were minor:

177	   o  default charset: HTTP specified ISO-8859-1 as the default
178	      character set, not US-ASCII

180	   o  requirement for CRLF in plain text: in practice, web clients
181	      didn't restrict content to use CRLF in text/* MIME bodies.

183	   These minor differences have caused a lot of trouble.
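
   The default-charset difference can be sketched as follows.  This is
   a minimal illustration for text/* labels only; the DEFAULT_CHARSET
   table and the effective_charset function are names made up for this
   example.

```python
from email.message import Message

# Protocol defaults as described in the text above.  The table itself
# is an illustrative construct, not part of any specification.
DEFAULT_CHARSET = {"email": "us-ascii", "http": "iso-8859-1"}


def effective_charset(content_type: str, context: str) -> str:
    """Return the charset a recipient would assume for a text/* label."""
    msg = Message()
    msg["Content-Type"] = content_type
    # An explicit charset parameter wins; otherwise the default
    # depends on whether the label arrived by email or by HTTP.
    return msg.get_content_charset() or DEFAULT_CHARSET[context]


print(effective_charset("text/plain", "email"))
print(effective_charset("text/plain", "http"))
print(effective_charset("text/html; charset=UTF-8", "http"))
```

   The same label therefore yields a different interpretation of the
   same bytes depending on the delivery protocol.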

185	2.3.  Distributed Extensibility

187	   The real advantage of using Internet Media Types to label content
188	   was that the Web was no longer restricted to a single format.  This
189	   one addition meant expanding from Global Hypertext to Global
190	   Hypermedia (as suggested in a 1992 email [connolly92]):

192	   +-------------------------------------------------------------------+
193	   | The Internet currently serves as the backbone for a global        |
194	   | hypertext.  FTP and email provided a good start, and the gopher,  |
195	   | WWW, or WAIS clients and servers make wide area information       |
196	   | browsing simple.  These systems even interoperate, with email     |
197	   | servers talking to FTP servers, WWW clients talking to gopher     |
198	   | servers, on and on.                                               |
199	   | This currently works quite well for text.  But what should WWW    |
200	   | clients do as Gopher and WAIS servers begin to serve up pictures, |
201	   | sounds, movies, spreadsheet templates, postscript files, etc.?    |
202	   | It would be a shame for each to adopt its own multimedia typing   |
203	   | system.                                                           |
204	   | If they all adopt the MIME typing system (and as many other       |
205	   | features from MIME as are appropriate), we can step from global   |
206	   | hypertext to global hypermedia that much easier.                  |
207	   +-------------------------------------------------------------------+

209	   The fact that HTTP could reliably transport images of different
210	   formats, for example, allowed NCSA to add <img> to HTML.  MIME
211	   allowed other document formats (Word, PDF, PostScript) and other
212	   kinds of hypermedia, as well as other applications, to be part of the
213	   Web.  MIME was arguably the most important extensibility mechanism in
214	   the Web.

216	3.  Problems with application to the Web

218	   Unfortunately, while the use of Internet Media Types for the Web
219	   added incredible power, a number of problems have arisen.

221	3.1.  Lack of clarity

223	   Many people are confused about the purpose of MIME in the Web, its
224	   uses, and the meaning of Internet Media Types.  Many W3C
225	   specifications, TAG findings, and Internet Media Type registrations
226	   make incorrect assumptions about the meaning and purpose of an
227	   Internet Media Type registration.

229	3.2.  Differences between email and Web delivery

231	   Some of the differences between the application contexts of email and
232	   Web delivery determine different requirements:

234	   o  In the Web, the transfer of data is initiated differently than in
235	      email: the "messages" with labeled content are usually HTTP
236	      responses to a specific (GET) request (although the request is
237	      itself a message, GET has no content).  In the most common case,
238	      then, the receiver knows more about the data before it has been
239	      sent.

241	   o  Clients would like to know more about the content before they
242	      retrieve it.  The "tagging" is often not sufficient to know, for
243	      example, "can I interpret this if I retrieve it", because of
244	      versioning, capabilities, or dependencies on things like screen
245	      size or interaction capabilities of the recipient.

247	   o  Some content isn't delivered over HTTP (files on a local file
248	      system), or there is no opportunity for tagging (data delivered
249	      over FTP), and in those cases some other way is needed for
250	      determining file type.

252	   Operating systems use (and continue to evolve) different systems to
253	   determine the 'type' of something, different from the MIME tagging
254	   and bagging:

256	   o  'magic numbers': in many contexts, file types could be guessed
257	      pretty reliably by looking for headers.

259	   o  Originally, Mac OS had a 4-character 'file type' and another
260	      4-character 'creator code' for files.

262	   o  Windows evolved to use the "file extension" -- 3 letters (and then
263	      more) at the end of the file name -- as the initial determination
264	      of the overall type of a file.  This practice has now extended to
265	      other systems.

267	   Information about these other ways of determining type (rather than
268	   by the content-type label) was gathered for the Internet Media Type
269	   registry; those registering types are encouraged to also describe
270	   'magic numbers', Mac file type, and common file extensions.  However,
271	   since there was no formal use of that information, the quality of
272	   that information in the registry is haphazard.
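
   The two non-MIME mechanisms mentioned above, magic numbers and file
   extensions, can be sketched as follows.  The MAGIC_NUMBERS table is
   a small hand-picked sample of well-known file signatures, not
   registry data, and both function names are invented for this
   example.

```python
import mimetypes
from typing import Optional

# A few well-known leading byte sequences.  Illustrative only; a real
# table would be far larger and, ideally, drawn from the registry.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def type_from_magic(data: bytes) -> Optional[str]:
    """Guess a media type from the first bytes of the content."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return None


def type_from_extension(filename: str) -> Optional[str]:
    """Guess a media type from the file name alone."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


print(type_from_magic(b"%PDF-1.4\n"))
print(type_from_extension("photo.png"))
```

   Neither guess consults a content-type label, which is why the two
   can disagree with the label and with each other.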

274	   Finally, while tagging and bagging might be adequate for
275	   unilaterally initiated (one-way) messaging, a Web recipient might
276	   want to know whether it could handle the data before reading it in
277	   and interpreting it, and Internet Media Types weren't enough to tell.

279	3.3.  The Rules Weren't Quite Followed

281	   The behavior of the community hasn't matched the expectations held
282	   when the Internet Media Type registry was designed:

284	   o  Lots of file types aren't registered (no entry in IANA for file
285	      types).

287	   o  For those that are registered, the registration is often
288	      incomplete or incorrect (people doing registration didn't
289	      understand 'magic number' or other fields).

291	   o  The actual content deployed or created by deployed software
292	      doesn't match the registration.

294	   In particular, Web implementations of Internet Media Types diverged
295	   from expected behavior:

297	   o  Browser implementors would be liberal in what they accepted, and
298	      use what looked like a file extension in the URL and/or magic
299	      number or other 'sniffing' techniques to decide file type, without
300	      assuming content-label was authoritative.  This was necessary
301	      anyway for files that weren't delivered by HTTP.

303	   o  HTTP server implementors and administrators didn't supply ways of
304	      easily associating the 'intended' file type label with the file,
305	      resulting in files frequently being delivered with a label other
306	      than the one they would have chosen if they'd thought about it,
307	      and if browsers *had* assumed content-type was authoritative.
308	      Some popular servers had default configuration files that treated
309	      any unknown type as "text/plain" (plain text in ASCII).  Since it
310	      didn't matter (the browsers worked anyway), it was hard to get
311	      this fixed.

313	   Incorrect senders coupled with liberal readers wind up feeding a
314	   vicious cycle based on the robustness principle ([WikiRobust],
315	   [RFC3117]).
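
   The liberal-reader behavior described above can be caricatured in a
   few lines.  This is a deliberately simplified sketch, not the
   algorithm of any actual browser; the SUSPECT_LABELS set and the
   effective_type function are assumptions made for illustration.

```python
from typing import Optional

# Labels a liberal recipient might distrust: a missing label, and the
# "text/plain" default that some servers applied to unknown files.
SUSPECT_LABELS = {None, "text/plain"}


def effective_type(declared: Optional[str],
                   sniffed: Optional[str]) -> Optional[str]:
    """Pick the type a liberal recipient would act on."""
    if declared not in SUSPECT_LABELS:
        # A specific label is taken at face value.
        return declared
    # Otherwise the sniffed guess, if any, overrides the label.
    return sniffed or declared


print(effective_type("text/plain", "image/png"))
print(effective_type("application/pdf", "text/html"))
```

   Once recipients behave this way, a wrong "text/plain" label has no
   visible effect, so senders have little reason to correct it.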

317	3.4.  Consequences

319	   The result, alas, is increased unreliability, in that:

321	   o  servers sending responses to browsers don't have a good guarantee
322	      that the browser won't "sniff" the content and decide to do
323	      something other than treat it as it is labeled

325	   o  browsers receiving content don't have a good guarantee that the
326	      content isn't mis-labeled

328	   o  intermediaries (gateways, proxies, caches, and other pieces of the
329	      Web infrastructure) don't have a good way of telling what the
330	      conversation means.

332	   This ambiguity and 'sniffing' also applies to packaged content in
333	   webapps ('bagging' but using ZIP rather than MIME multipart).  (NOTE:
334	   NEEDS EXPANSION, REFERENCE TO WEBAPPS)

336	3.5.  The Down Side of Extensibility

338	   Extensibility adds great power, and allows the Web to evolve without
339	   committee approval of every extension.  For some (those who want to
340	   extend and their clients who want those extensions), this is power!
341	   For others (those who are building Web components or infrastructure),
342	   extensibility is a drawback -- it adds to the unreliability and
343	   variability of the Web experience.  When senders use extensions that
344	   recipients aren't aware of, or implement incorrectly or incompletely,
345	   communication often fails.  With messaging, this is a serious
346	   problem, although most 'rich text' documents are still delivered in
347	   multiple forms (using multipart/alternative).

349	   If your job is to support users of a popular browser, however, where
350	   each user has installed a different configuration of file handlers
351	   and extensibility mechanisms, MIME may appear to add unnecessary
352	   complexity and variable experience for users of all but the most
353	   popular types.

355	4.  Additional considerations

357	   This section notes some additional considerations.

359	4.1.  There are related problems with charsets

361	   MIME includes provisions not only for file 'types', but also,
362	   importantly, for the "character encoding" used by text types: for
363	   example, simple US-ASCII, Western European ISO-8859-1, Unicode UTF-8.
364	   A similar vicious cycle also happened with character set labels:
365	   mislabeled content happily processed correctly by liberal browsers
366	   encouraged more and more sites to proliferate text with mislabeled
367	   character sets, to the point where browsers feel they *have* to guess
368	   the charset rather than trust the label.  (NEEDS EXPANSION)
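
   A small example of what a wrong charset label does to text, assuming
   a recipient that trusts the label.  The decode_as_labeled function
   is a name made up for this sketch.

```python
def decode_as_labeled(body: bytes, labeled_charset: str) -> str:
    """Decode body bytes exactly as the charset label says."""
    return body.decode(labeled_charset, errors="replace")


original = "caf\u00e9"            # "cafe" with an acute accent on the e
body = original.encode("utf-8")   # what the sender actually produced

# Correct label: the text round-trips unchanged.
print(decode_as_labeled(body, "utf-8"))
# Wrong label: the two UTF-8 bytes of the accented letter are read as
# two separate ISO-8859-1 characters.
print(decode_as_labeled(body, "iso-8859-1"))
```

   A browser that guesses the charset instead would display this page
   correctly, which removes the sender's incentive to fix the label.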

370	   There are sites that intentionally label content as iso-2022-jp or
371	   euc-jp when it is in fact one of the Microsoft extension charsets
372	   (e.g., for access to circled digits).  This is an intentional misuse
373	   of the definitions of the charsets themselves -- definitions which
374	   originated at the national standards body level.

376	4.2.  Embedded, downloaded, launch independent application

378	   The type of a document might be determined not only for entire
379	   documents ("HTML" vs. "Word" vs. "PDF"), but also for embedded
380	   components of documents ("JPEG image" vs. "PNG image").  However, the
381	   use cases, requirements and likely operational impact of MIME
382	   handling are likely different for those cases.

384	4.3.  Additional Use Cases: Polyglot and Multiview

386	   There are some interesting additional use cases which add to the
387	   design requirements:

389	   o  "Polyglot" documents: A 'polyglot' document is data which can be
390	      treated as two different Internet Media Types, where the meaning
391	      of the data is the same under either type.  This is part of a
392	      transition strategy to allow content providers (senders) to
393	      manage, produce, store, and deliver the same data, but with two
394	      different labels, and have it work equivalently with two different
395	      kinds of receivers (one of which knows one Internet Media Type,
396	      and another which knows a second one).  This use case was part of
397	      the transition strategy from HTML to an XML-based XHTML, and also
398	      a way for a single service to offer both HTML-based and XML-based
399	      processing (e.g., the same content useful for news articles and
400	      Web pages).

402	   o  "Multiview" documents: This use case seems similar but it's quite
403	      different.  In this case, the same data has very different meaning
404	      when served as two different content-types, but that difference is
405	      intentional; for example, the same data served as text/html is a
406	      document, and served as an RDFa type is some specific data.

408	4.4.  Evolution, Versioning, Forking

410	   The subject of format/language/type evolution is complex; this
411	   section is a little terse.

413	   Formats and their specifications evolve over time.  There are several
414	   reasons for the evolution: innovation, compatibility with other
415	   implementations, attempts to gain control.

417	   Sometimes new evolutions are "compatible", although compatibility
418	   has several variations.  It is part of the responsibility of the
419	   designer of a new version of a file type to try to ensure both
420	   forward and backward compatibility: new documents work reasonably
421	   (with some fallback) with old viewers, and old documents work
422	   reasonably with new viewers.  In some cases this is accomplished,
423	   others not; in some cases, "works reasonably" is softened to "either
424	   works reasonably or gives clear warning about nature of problem
425	   (version mismatch)."

427	   In MIME, the 'tag', the Internet Media Type, corresponds to the
428	   versioned series.  Internet Media Types do not identify a particular
429	   version of a file format.  Rather, the general idea is that the
430	   Internet Media Type identifies the family, and also how you're
431	   supposed to otherwise find version information on a per-format basis.
432	   Many (most) file formats have an internal version indicator, with the
433	   idea that you only need a new Internet Media Type to designate a
434	   completely incompatible format.  The notion of an "Internet Media
435	   Type" is very coarse-grained.  The general approach to this has been
436	   that the actual Media Type includes provisions for version
437	   indicator(s) embedded in the content itself to determine more
438	   precisely the nature of how the data is to be interpreted.  That is,
439	   the message itself contains further information.
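
   PDF is one familiar instance of this pattern: the label
   application/pdf names the family, while the file's first line
   carries the version.  The sketch below reads that embedded
   indicator; the pdf_version function is a name invented here, and
   the code makes no attempt to validate the rest of the file.

```python
import re
from typing import Optional, Tuple

# A PDF file starts with a header such as "%PDF-1.4".
PDF_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the PDF header, or None if absent."""
    match = PDF_HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(pdf_version(b"%PDF-1.4\n%rest of file"))
print(pdf_version(b"<html></html>"))
```

   A recipient can thus accept the media type and still discover, only
   after reading the content, that the version is one it cannot handle.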

441	   Unfortunately, a lot has gone wrong in this scenario as well:
442	   processors that ignore version indicators encourage content creators
443	   to be careless about supplying correct version indicators, leading
444	   to lots of content with wrong version indicators.

446	   Those updating an existing Internet Media Type registration to
447	   account for new versions are admonished to not make previously
448	   conforming documents non-conforming.  This is harder to enforce than
449	   it would seem, because the previous specifications are not always
450	   accurate to what the Internet Media Type was used for in practice.

452	   (NOTE: MULTIPLE INCOMPATIBLE AUTHORITATIVE SPECS)

454	4.5.  Content Negotiation

456	   The general idea of content negotiation is that when party A
457	   communicates with party B, and the message can be delivered in more
458	   than one format (or version, or configuration), there can be some
459	   way of negotiating: some way for A to communicate to B the available
460	   options, and for B to accept or indicate preferences.

462	   Content negotiation happens all over.  When one fax machine twirps to
463	   another when initially connecting, they are negotiating resolution,
464	   compression methods and so forth.  In Internet mail, which is a one-
465	   way communication, the "negotiation" consists of the sender preparing
466	   and sending multiple versions of the message, one in text/plain, one
467	   in text/html, for example, in order of increasing sender preference.
468	   The recipient then chooses the last version it can understand.

470	   HTTP added "Accept" and "Accept-Language" to allow content
471	   negotiation in HTTP GET, based on Internet Media Types, and there are
472	   other methods explained in the HTTP spec.
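
   A simplified sketch of server-side selection driven by an "Accept"
   header.  It handles only media ranges and the q parameter and
   ignores other parameters, so it is an approximation of what the
   HTTP spec describes rather than a conforming implementation.  All
   function names are invented for this example.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def matches(media_range: str, media_type: str) -> bool:
    """True if media_type falls within media_range."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def specificity(media_range: str) -> int:
    """Rank ranges: */* is 0, type/* is 1, type/subtype is 2."""
    return (media_range != "*/*") + (not media_range.endswith("/*"))


def choose(available: List[str], accept: str) -> Optional[str]:
    """Return the available type the client prefers most, or None."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        candidates = [rq for rq in ranges if matches(rq[0], media_type)]
        if not candidates:
            continue
        # The most specific matching range decides the q value.
        _, q = max(candidates, key=lambda rq: specificity(rq[0]))
        if q > best_q:
            best, best_q = media_type, q
    return best


print(choose(["text/html", "text/plain"], "text/plain;q=0.5, text/html"))
```

   Note that the negotiation is expressed entirely in Internet Media
   Types, so it inherits their coarse granularity.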

474	4.6.  Fragment identifiers

476	   The Web added the notion of being able to address part of some
477	   content, rather than the whole, by adding a 'fragment identifier' to
478	   the URL that addressed the data.  Of course, this made sense for the
479	   original Web with just HTML, but how would it apply to other
480	   content?  The URL spec glibly noted that "the definition of the
481	   fragment identifier meaning depends on the Internet Media Type", but
482	   unfortunately, few of the Internet Media Type definitions included
483	   this information, and practices diverged greatly.

485	   If the interpretation of fragment identifiers depends on the MIME
486	   type, though, fragment identifiers become hard to use with content
487	   negotiation, since one URL may be served as several media types.
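
   The dependence of fragment meaning on media type can be sketched as
   a lookup keyed by type.  The FRAGMENT_RULES table, its wording, and
   the describe_fragment function are all hypothetical and exist only
   to illustrate the problem.

```python
from urllib.parse import urldefrag

# Hypothetical per-type rules.  A real recipient would take these from
# each media type's definition, where one exists.
FRAGMENT_RULES = {
    "text/html": "element with id '{fragment}'",
    "application/pdf": "PDF open parameter '{fragment}'",
}


def describe_fragment(url: str, media_type: str) -> str:
    """Say what the fragment of url means for the given media type."""
    base, fragment = urldefrag(url)
    if not fragment:
        return f"{base}: whole resource"
    rule = FRAGMENT_RULES.get(media_type)
    if rule is None:
        return f"{base}: '#{fragment}' undefined for {media_type}"
    return f"{base}: {rule.format(fragment=fragment)}"


url = "http://example.org/doc#intro"
print(describe_fragment(url, "text/html"))
print(describe_fragment(url, "image/png"))
```

   The same URL gets a different reading, or none at all, depending on
   which representation the server happens to return.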

489	5.  Recommendations

491	   This section outlines the kinds of changes needed to bring the MIME
492	   system in line with current practice and to address the problems
493	   outlined above.  The purpose of this text is not to specify the exact
494	   details of how changes can be accomplished, but rather to find broad
495	   agreement.

497	   We need a clear direction on how to make the Web more reliable, not
498	   less.  We need a realistic transition plan from the unreliable Web to
499	   the more reliable one.  Part of this is to encourage senders (Web
500	   servers) to mean what they say, and encourage recipients (browsers)
501	   to give preference to what the senders are sending.

503	   We should try to create specifications for protocols and best
504	   practices that will lead the Web to more reliable and secure
505	   communication.  To this end, we give an overall architectural
506	   approach to use of MIME, and then specific specifications for HTTP
507	   clients and servers, Web browsers in general, and proxies and
508	   intermediaries.  These encourage behavior which, on the one hand,
509	   continues to work with the already deployed infrastructure (of
510	   servers, browsers, and intermediaries), and which, on the other
511	   hand, improves the operability, reliability and security of the Web
512	   if followed.

514	   This section outlines requirements for standards and practices
515	   intended to address some of the difficulties.  This is an early
516	   version, which mainly contains "strawman" proposals for changes.  It
517	   is intended to stimulate discussion; the hope is that we can get
518	   agreement about the nature of the problems and current situation
519	   before focusing in detail on possible solutions.  Even so, having
520	   at least strawman proposals seems to be helpful.  For some problems,
521	   additional changes to IETF and W3C specifications are also
522	   advisable; the expectations are briefly outlined here.

524	5.1.  Internet Media Type registration

526	   Update the Internet Media Type registry and registration process.

528	5.1.1.  MIME registry magic numbers for sniffing

530	   Be clearer about the relationship of 'magic numbers' to sniffing;
531	   review Internet Media Types already registered and update them.

533	5.1.2.  Scripting and scriptable content safety

535	   Be clearer about requiring Security Considerations to address risks
536	   of sniffing.

538	5.1.3.  Fragment identifiers

540	   Problem: MIME type definitions don't talk about fragment identifiers.

542	   Require definition of fragment identifier applicability; add fragID
543	   semantics.

545	5.1.4.  Application info

547	   Problem: (hasn't been expanded)

549	   Could the 'applications that use this type' section be clearer
550	   about whether the file type is frequently used for embedding
551	   (plug-in), as a separate document with auto-launch (MIME handler),
552	   or should always be downloaded?  Is there a separate issue for
553	   'auto-play on download' vs. 'ask user for permission'?

555	5.1.5.  File extensions in registry

557	   Problem: Sniffing needs to use file extensions too; signify which
558	   file extensions are useful for sniffing.

560	   Be clearer about file extension use and the relationship of file
561	   extensions to MIME handlers.

563	5.2.  Sniffing

565	   Various new specifications discuss, promote or mandate the use of
566	   'sniffing' -- using the content of the data to supplement or even
567	   override the declared content-type or charset.  Update these
568	   specifications.

570	5.2.1.  Sniffing uses Media Type magic number

572	   Update the proposed Media Type sniffing document so that sniffing
573	   uses the MIME registry for 'magic numbers'.

575	5.2.2.  Sniffing when there are multiple different definitions

577	   Address issue of sniffing when there are multiple independent
578	   definitions of the same MIME type.

580	5.2.3.  Sniffing charsets

582	   Update sniffing of charsets to use charset reference info.

584	5.2.4.  Sniffing security uses scriptability info

586	   If the Internet Media Type registry is more explicit about which
587	   kinds of content contain what kind of scriptability access, then the
588	   specifications for sniffing can reference the Internet Media Type
589	   registry to determine what kinds of sniffing constitute a 'privilege
590	   upgrade'.

592	   Note that any sniffing can be a privilege upgrade if the recipient
593	   is buggy.  Bugs can be fixed, however; spec violations are a
594	   problem.
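
   One possible reading of 'privilege upgrade' can be sketched as
   follows.  The SCRIPTABILITY table, its numeric levels, and the
   is_privilege_upgrade function are all assumptions made for this
   example; the registry defines no such values today.

```python
# Hypothetical scriptability levels (0 = no script access assumed,
# higher = more). These numbers are illustrative, not registry data.
SCRIPTABILITY = {
    "text/plain": 0,
    "image/png": 0,
    "text/html": 1,
}

# Level assumed for a sniffed type the table does not know about.
MAX_LEVEL = max(SCRIPTABILITY.values())


def is_privilege_upgrade(declared, sniffed):
    """Return True if treating content as `sniffed` rather than
    `declared` would grant more script access.

    Conservative choices: an unknown declared type is assumed to grant
    nothing, and an unknown sniffed type is assumed to grant the most.
    """
    if declared == sniffed:
        return False
    declared_level = SCRIPTABILITY.get(declared, 0)
    sniffed_level = SCRIPTABILITY.get(sniffed, MAX_LEVEL)
    return sniffed_level > declared_level


print(is_privilege_upgrade("text/plain", "text/html"))
print(is_privilege_upgrade("text/html", "text/plain"))
```

   Under these assumptions, a sniffing specification could permit only
   those overrides for which the check returns False.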

596	5.3.  Changes to IANA processes for MIME registries

598	   Problem: Internet Media Type registries are hard to update, and there
599	   can be different definitions of the same MIME type.

601	   STRAWMAN: Allow commenting or easier update; not all Internet Media
602	   Type owners need or have all the information the Internet needs.
603	   Wiki for Internet Media Types as well as formal registry?  Ability to
604	   add comments about deployed senders, deployed content, deployed
605	   receivers.

607	5.4.  FTP specification

609	   Do FTP clients also change rules about guessing file types based on
610	   the OS of the FTP server?

612	5.5.  Update some URI definitions

614	   The ftp and file schemes need sniffing, and http sometimes does;
615	   data defaults to text/plain rather than sniffing.  Should this
616	   info be in the URI definitions?
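
   The 'data' case can be illustrated with a minimal sketch.  The
   function name data_uri_media_type is hypothetical, and the parsing
   is deliberately simplified (it does not handle quoted parameter
   values containing a comma).  The default shown is the one given in
   the 'data' URL scheme definition (RFC 2397).

```python
# Default media type for a data URI that omits one (per RFC 2397).
DEFAULT_MEDIA_TYPE = "text/plain;charset=US-ASCII"


def data_uri_media_type(uri):
    """Return the media type a data URI declares, without sniffing."""
    if not uri.lower().startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the optional media type
    # and the optional ';base64' marker.
    header, comma, _payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data URI has no comma separator")
    if header.lower().endswith(";base64"):
        header = header[: -len(";base64")]
    if not header:
        return DEFAULT_MEDIA_TYPE
    if header.startswith(";"):
        # Shorthand form: parameters only, 'text/plain' is implied.
        return "text/plain" + header
    return header


print(data_uri_media_type("data:,Hello"))
print(data_uri_media_type("data:image/png;base64,iVBORw0KGgo="))
```

   Note that the payload is never inspected: the label, or its default,
   decides the type.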

618	5.6.  Changes to W3C findings, processes

620	   Update TAG finding on authoritative metadata: is it possible to
621	   remove 'authority'?

623	   New: MIME and Internet Media Type section to WebArch, referencing
624	   this memo

626	   New: Add W3C Web architecture material on MIME in HTML to the W3C
627	   web site, referencing this memo

629	   Reconsider other extensibility mechanisms (namespaces, for example):
630	   should they use MIME or something like it?

632	6.  Acknowledgements

634	   This document is the result of discussions among many individuals in
635	   the IETF and W3C. Special thanks to Noah Mendelsohn.

637	7.  IANA Considerations

639	   This document includes no specific changes to IANA registries or
640	   processes.  However, it outlines several considerations for future
641	   explicit recommendations to IANA, to change Internet Media Type and
642	   Charset registries and the processes around their maintenance.  IANA
643	   evaluation of the feasibility of these changed processes is required.

645	8.  Security Considerations

647	   This document discusses some of the security issues resulting from
648	   use (and misuse) of MIME content types in the Web.

650	9.  Informative References

652	   [RFC1521]  Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
653	              Mail Extensions) Part One: Mechanisms for Specifying and
654	              Describing the Format of Internet Message Bodies",
655	              RFC 1521, <http://tools.ietf.org/html/rfc1521>.

657	   [RFC1522]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
658	              Part Two: Message Header Extensions for Non-ASCII Text",
659	              RFC 1522, September 1993,
660	              <http://tools.ietf.org/html/rfc1522>.

662	   [RFC1945]  Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
663	              Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996,
664	              <http://tools.ietf.org/html/rfc1945>.

666	   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
667	              Extensions (MIME) Part Two: Media Types", RFC 2046,
668	              November 1996, <http://tools.ietf.org/html/rfc2046>.

670	   [RFC3117]  Rose, M., "On the Design of Application Protocols",
671	              RFC 3117, November 2001,
672	              <http://tools.ietf.org/html/rfc3117>.

674	   [WikiRobust]
675	              "Robustness principle", 2010,
676	              <http://en.wikipedia.org/wiki/Robustness_principle>.

678	   [connolly92]
679	              Connolly, D., "Global Hypermedia", Oct 1992, <http://
680	              lists.w3.org/Archives/Public/www-talk/1992SepOct/
681	              0024.html>.

683	   [mime-sniff]
684	              Barth, A. and I. Hickson, "Media Type Sniffing", May 2010,
685	              <http://tools.ietf.org/html/draft-abarth-mime-sniff>.

687	Author's Address

689	   Larry Masinter
690	   Adobe
691	   345 Park Ave.
692	   San Jose, CA  95110
693	   USA

695	   Phone: +1 408 536 3024
696	   Email: masinter@adobe.com
697	   URI:   http://larry.masinter.net