Network Working Group                                          M. Murata
Internet-Draft                             IBM Tokyo Research Laboratory
Intended status: Standards Track                                 D. Kohn
Expires: March 28, 2010                                 skymoon ventures
                                                               C. Lilley
                                                                     W3C
                                                      September 24, 2009


                            XML Media Types
                  draft-murata-kohn-lilley-xml-03.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 28, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document standardizes three media types -- application/xml,
   application/xml-external-parsed-entity, and application/xml-dtd --
   for use in exchanging network entities that are related to the
   Extensible Markup Language (XML), while deprecating text/xml and
   text/xml-external-parsed-entity.  This document also standardizes a
   convention (using the suffix '+xml') for naming media types other
   than these five types when those media types represent XML MIME
   entities.  XML MIME entities are currently exchanged via the
   HyperText Transfer Protocol on the World Wide Web, are an integral
   part of the WebDAV protocol for remote web authoring, and are
   expected to have utility in many domains.

   The major differences from RFC 3023 are the deprecation of text/xml
   and text/xml-external-parsed-entity, the addition of XPointer and XML
   Base as fragment identifiers and base URIs, respectively, a mention
   of the XPointer Registry, and updates to many references.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  XML Media Types
     3.1.  Text/xml Registration (deprecated)
     3.2.  Application/xml Registration
     3.3.  Text/xml-external-parsed-entity Registration (deprecated)
     3.4.  Application/xml-external-parsed-entity Registration
     3.5.  Application/xml-dtd Registration
     3.6.  Summary
   4.  The Byte Order Mark (BOM) and Conversions to/from the UTF-16
       Charset
   5.  Fragment Identifiers
   6.  The Base URI
   7.  XML Versions
   8.  A Naming Convention for XML-Based Media Types
     8.1.  Referencing
   9.  Examples
     9.1.  Text/xml (deprecated) with UTF-8 Charset
     9.2.  Text/xml (deprecated) with UTF-16 Charset
     9.3.  Text/xml (deprecated) with UTF-16BE Charset
     9.4.  Text/xml (deprecated) with ISO-2022-KR Charset
     9.5.  Text/xml (deprecated) with Omitted Charset
     9.6.  Application/xml with UTF-16 Charset
     9.7.  Application/xml with UTF-16BE Charset
     9.8.  Application/xml with ISO-2022-KR Charset
     9.9.  Application/xml with Omitted Charset and UTF-16 XML MIME
           Entity
     9.10. Application/xml with Omitted Charset and UTF-8 Entity
     9.11. Application/xml with Omitted Charset and Internal Encoding
           Declaration
     9.12. Text/xml-external-parsed-entity (deprecated) with UTF-8
           Charset
     9.13. Application/xml-external-parsed-entity with UTF-16 Charset
     9.14. Application/xml-external-parsed-entity with UTF-16BE Charset
     9.15. Application/xml-dtd
     9.16. Application/mathml+xml
     9.17. Application/xslt+xml
     9.18. Application/rdf+xml
     9.19. Image/svg+xml
     9.20. model/x3d+xml
     9.21. INCONSISTENT EXAMPLE: Text/xml (deprecated) with UTF-8
           Charset
     9.22. application/xml
     9.23. Application/soap+xml
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1. Normative References
     12.2. Informative References
   Appendix A.  Why Use the '+xml' Suffix for XML-Based MIME Types?
     A.1.  Why not just use text/xml or application/xml and let the XML
           processor dispatch to the correct application based on the
           referenced DTD?
     A.2.  Why not create a new subtree (e.g., image/xml.svg) to
           represent XML MIME types?
     A.3.  Why not create a new top-level MIME type for XML-based media
           types?
     A.4.  Why not just have the MIME processor 'sniff' the content to
           determine whether it is XML?
     A.5.  Why not use a MIME parameter to specify that a media type
           uses XML syntax?
     A.6.  How about labeling with parameters in the other direction
           (e.g., application/xml; Content-Feature=iotp)?
     A.7.  How about a new superclass MIME parameter that is defined to
           apply to all MIME types (e.g., Content-Type: application/iotp;
           $superclass=xml)?
     A.8.  What about adding a new parameter to the Content-Disposition
           header or creating a new Content-Structure header to indicate
           XML syntax?
     A.9.  How about a new Alternative-Content-Type header?
     A.10. How about using a conneg tag instead (e.g., accept-features:
           (syntax=xml))?
     A.11. How about a third-level content-type, such as text/xml/rdf?
     A.12. Why use the plus ('+') character for the suffix '+xml'?
     A.13. What is the semantic difference between application/foo and
           application/foo+xml?
     A.14. What happens when an even better markup language (e.g., EBML)
           is defined, or a new category of data?
     A.15. Why must I use the '+xml' suffix for my new XML-based media
           type?
     A.16. Why not redefine text/xml instead of deprecating it?
   Appendix B.  Changes from RFC 3023
   Appendix C.  Acknowledgements
   Authors' Addresses

1.  Introduction

   The World Wide Web Consortium has issued the Extensible Markup
   Language (XML) 1.0 specification [XML].  To enable the exchange of
   XML network entities, this document standardizes three media types
   (application/xml, application/xml-external-parsed-entity, and
   application/xml-dtd), deprecates two media types (text/xml and
   text/xml-external-parsed-entity), and standardizes a naming
   convention for identifying XML-based MIME media types.

   XML entities are currently exchanged on the World Wide Web, and XML
   is also used for property values and parameter marshalling by the
   WebDAV [RFC4918] protocol for remote web authoring.  Thus, there is a
   need for a media type to properly label the exchange of XML network
   entities.
   Although XML is a subset of the Standard Generalized Markup Language
   (SGML) ISO 8879 [SGML], which has been assigned the media types
   text/sgml and application/sgml, there are several reasons why use of
   text/sgml or application/sgml to label XML is inappropriate.  First,
   many applications can process XML but cannot process SGML, due to
   SGML's larger feature set.  Second, SGML applications cannot always
   process XML entities, because XML uses features of recent technical
   corrigenda to SGML.  Third, the definition of text/sgml and
   application/sgml in [RFC1874] includes parameters for SGML bit
   combination transformation format (SGML-bctf) and SGML boot attribute
   (SGML-boot).  Since XML does not use these parameters, it would be
   ambiguous if such parameters were given for an XML MIME entity.  For
   these reasons, the best approach for labeling XML network entities
   has been to provide new media types for XML.

   Since XML is an integral part of the WebDAV Distributed Authoring
   Protocol, since World Wide Web Consortium Recommendations are
   assigned standards tree media types, and since similar media types
   (HTML, SGML) have been assigned standards tree media types, the XML
   media types were also placed in the standards tree [RFC3023].

   Similarly, XML has been used as a foundation for other media types,
   including types in every branch of the IETF media types tree.  To
   facilitate the processing of such types, media types based on XML,
   but that are not identified using application/xml (or text/xml),
   SHOULD be named using a suffix of '+xml' as described in Section 8.
   This will allow XML-based tools (browsers, editors, search engines,
   and other processors) to work with all XML-based media types.

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

   As defined in [RFC2781] (informative), the three charsets "utf-16",
   "utf-16le", and "utf-16be" are used to label UTF-16 text.  In this
   document, "the UTF-16 family" refers to those three charsets.  By
   contrast, the phrases "utf-16" or UTF-16 in this document refer
   specifically to the single charset "utf-16".

   As sometimes happens between two communities, both MIME and XML have
   defined the term entity, with different meanings.  Section 2.4 of
   [RFC2045] says:

      "The term 'entity' refers specifically to the MIME-defined header
      fields and contents of either a message or one of the parts in the
      body of a multipart entity."

   Section 4 of [XML] says:

      "An XML document may consist of one or many storage units.  These
      are called entities; they all have content and are all (except for
      the document entity and the external DTD subset) identified by
      entity name."

   In this document, "XML MIME entity" is defined as the latter (an XML
   entity) encapsulated in the former (a MIME entity).

3.  XML Media Types

   This document standardizes three media types related to XML MIME
   entities (application/xml, application/xml-external-parsed-entity,
   and application/xml-dtd) while deprecating text/xml and
   text/xml-external-parsed-entity.  Registration information for these
   media types is described in the sections below.

   Within the XML specification, XML MIME entities can be classified
   into four types.  In the XML terminology, they are called "document
   entities", "external DTD subsets", "external parsed entities", and
   "external parameter entities".
   The media type application/xml MAY be used for "document entities",
   while application/xml-external-parsed-entity SHOULD be used for
   "external parsed entities".  Note that [RFC3023] (which this document
   obsoletes) recommended the use of text/xml and
   text/xml-external-parsed-entity for document entities and external
   parsed entities, respectively.  Although these media types are still
   commonly used, this document deprecates them for future
   interoperability.  The media type application/xml-dtd SHOULD be used
   for "external DTD subsets" or "external parameter entities".
   application/xml MUST NOT be used for "external parameter entities" or
   "external DTD subsets", and MUST NOT be used for "external parsed
   entities" unless they are also well-formed "document entities" and
   are referenced as such.  Note that [RFC2376] (which is obsolete)
   allowed such usage, although in practice it is likely to have been
   rare.

   Neither external DTD subsets nor external parameter entities parse as
   XML documents, and while some XML document entities may be used as
   external parsed entities and vice versa, there are many cases where
   the two are not interchangeable.  XML also has unparsed entities,
   internal parsed entities, and internal parameter entities, but they
   are not XML MIME entities.

   Application/xml and application/xml-external-parsed-entity are
   recommended.  Unlike [RFC2376] or [RFC3023], this document deprecates
   text/xml and text/xml-external-parsed-entity, for the following
   reasons:

      Conflicting specifications regarding the character encoding have
      caused confusion.  On the one hand, [RFC2046] specifies "The
      default character set, which must be assumed in the absence of a
      charset parameter, is US-ASCII.", Section 3.7.1 of [RFC2616]
      defines that "media subtypes of the 'text' type are defined to
      have a default charset value of 'ISO-8859-1'", and [RFC2376] as
      well as [RFC3023] specify that the default charset is US-ASCII.
      On the other hand, implementors and users of XML parsers,
      following Appendix F of [XML], assume that the default is provided
      by the XML encoding declaration or BOM.  Note that this conflict
      does not exist for application/xml or
      application/xml-external-parsed-entity (see "Optional parameters"
      of the application/xml registration in Section 3.2).

      An XML document (that is, the unprocessed, source XML document) is
      generally not meaningful to casual users.  However, MIME user
      agents that do not have explicit support for text/xml will treat
      it as text/plain, for example, by displaying the XML MIME entity
      as plain text.

      Using application/xml and application/xml-external-parsed-entity
      instead of text/xml and text/xml-external-parsed-entity does not
      lose any functionality.

      The top-level media type "text" has some restrictions on MIME
      entities, which are described in [RFC2045] and [RFC2046].  In
      particular, the UTF-16 family, UCS-4, and UTF-32 are not allowed
      (except over HTTP [RFC2616], which uses a MIME-like mechanism).
      However, Section 4.3.3 of [XML] says:

         "Each external parsed entity in an XML document may use a
         different encoding for its characters.  All XML processors MUST
         be able to read entities in both the UTF-8 and UTF-16
         encodings."

      Thus, although all XML processors can read entities in at least
      UTF-16, an XML document or external parsed entity encoded in one
      of these character encoding schemes cannot be labeled as text/xml
      or text/xml-external-parsed-entity (except over HTTP).
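   The labeling rules of this section can be condensed into a small
   lookup table.  The following Python sketch is illustrative only: the
   names XmlEntityKind, media_type_for, and replace_deprecated are
   hypothetical and are not defined by this document or by any library.
   It assumes the caller has already determined which kind of XML
   entity it holds, and it handles only bare type/subtype strings, not
   parameters such as charset.

```python
from enum import Enum


class XmlEntityKind(Enum):
    """The four kinds of XML MIME entity named in the XML specification."""
    DOCUMENT_ENTITY = "document entity"
    EXTERNAL_PARSED_ENTITY = "external parsed entity"
    EXTERNAL_DTD_SUBSET = "external DTD subset"
    EXTERNAL_PARAMETER_ENTITY = "external parameter entity"


# Non-deprecated label for each kind of entity (Section 3).
_MEDIA_TYPE_FOR_KIND = {
    XmlEntityKind.DOCUMENT_ENTITY: "application/xml",
    XmlEntityKind.EXTERNAL_PARSED_ENTITY: "application/xml-external-parsed-entity",
    XmlEntityKind.EXTERNAL_DTD_SUBSET: "application/xml-dtd",
    XmlEntityKind.EXTERNAL_PARAMETER_ENTITY: "application/xml-dtd",
}

# Deprecated text/* types and the application/* types that replace them.
_REPLACEMENTS = {
    "text/xml": "application/xml",
    "text/xml-external-parsed-entity": "application/xml-external-parsed-entity",
}


def media_type_for(kind: XmlEntityKind) -> str:
    """Return the media type recommended for an entity of the given kind."""
    return _MEDIA_TYPE_FOR_KIND[kind]


def replace_deprecated(media_type: str) -> str:
    """Map a bare, deprecated type/subtype to its application/* replacement.

    Media type names are case-insensitive, so the input is lowercased.
    Parameters (e.g. charset) are not handled; pass "type/subtype" only.
    """
    key = media_type.strip().lower()
    return _REPLACEMENTS.get(key, key)
```

   For instance, media_type_for(XmlEntityKind.EXTERNAL_PARAMETER_ENTITY)
   returns "application/xml-dtd", and replace_deprecated("Text/XML")
   returns "application/xml".  Relabeling is not neutral when the
   charset parameter is absent, because text/xml and application/xml
   treat an omitted charset differently (Sections 3.1 and 3.2).  This is
   why the sketch leaves parameters to the caller.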
   XML provides a general framework for defining sequences of structured
   data.  In some cases, it may be desirable to define new media types
   that use XML but define a specific application of XML, perhaps due to
   domain-specific display, editing, security considerations, or runtime
   information.  Furthermore, such media types may allow only UTF-8 or
   UTF-16 and prohibit other charsets.  This document does not prohibit
   such media types and in fact expects them to proliferate.  However,
   developers of such media types are STRONGLY RECOMMENDED to use this
   document as a basis for their registration.  In particular, the
   charset parameter SHOULD be used in the same manner, as described in
   Section 8.1, in order to enhance interoperability.

   An XML document labeled as application/xml, with a +xml media type,
   or (deprecated) as text/xml might contain namespace declarations,
   stylesheet-linking processing instructions (PIs), schema information,
   or other declarations that might be used to suggest how the document
   is to be processed.  For example, a document might have the XHTML
   namespace and a reference to a CSS stylesheet.  Such a document might
   be handled by applications that would use this information to
   dispatch the document for appropriate processing.

3.1.  Text/xml Registration (deprecated)

   MIME media type name: text

   MIME subtype name: xml

   Mandatory parameters: none

   Optional parameters: charset

      Although listed as an optional parameter, the use of the charset
      parameter is REQUIRED unless the charset is us-ascii.  The charset
      parameter can also be used to provide protocol-specific
      operations, such as charset-based content negotiation in HTTP.
      "utf-8" [RFC3629] is the recommended value, representing the UTF-8
      charset.
      UTF-8 is supported by all conforming processors of [XML].

      If the XML MIME entity is transmitted via HTTP, which uses a
      MIME-like mechanism that is exempt from the restrictions on the
      text top-level type (see Section 19.4.1 of [RFC2616]), "utf-16"
      [RFC2781] is also recommended.  UTF-16 is supported by all
      conforming processors of [XML].  Since the handling of CR, LF, and
      NUL for text types in most MIME applications would cause undesired
      transformations of individual octets in UTF-16 multi-octet
      characters, gateways from HTTP to these MIME applications MUST
      transform the XML MIME entity from text/xml; charset="utf-16" to
      application/xml; charset="utf-16".

      Consistent with [RFC2046], if a text/xml entity is received with
      the charset parameter omitted, MIME processors and XML processors
      MUST use the default charset value of "us-ascii" [ASCII].  In
      cases where the XML MIME entity is transmitted via HTTP, the
      default charset value is still "us-ascii".  (Note: There is an
      inconsistency between this specification and HTTP/1.1, which uses
      ISO-8859-1 [ISO8859] as the default for historical reasons.  Since
      US-ASCII is the intersection of UTF-8 and ISO-8859-1 and is already
      used by MIME, it was chosen as the default charset for text/xml.)
      However, it is known that many servers and parsers ignore this
      default and rely on the XML encoding declaration or BOM.  Thus,
      application/xml is a more suitable choice.

      There are several reasons that the charset parameter was
      authoritative.  First, some MIME processing engines transcode MIME
      bodies of the top-level media type "text" without reference to any
      of the internal content.  Thus, it is possible that some agent
      might change text/xml; charset="iso-2022-jp" to text/xml;
      charset="utf-8" without modifying the encoding declaration of an
      XML document.
      Second, text/xml must be compatible with text/plain, since MIME
      agents that do not understand text/xml will fall back to handling
      it as text/plain.  If the charset parameter for text/xml were not
      authoritative, such fallback would cause data corruption.  Third,
      recent web servers have been improved so that server
      administrators can specify the charset parameter.  Fourth,
      [RFC2130] (informative) specifies that the recommended
      specification scheme is the "charset" parameter.

      Since the charset parameter was authoritative, the charset was
      sometimes not declared within an XML encoding declaration.  Thus,
      special care was needed when the recipient stripped the MIME
      header and provided persistent storage of the received XML MIME
      entity (e.g., in a file system).  Unless the charset is UTF-8 or
      UTF-16, the recipient SHOULD also persistently store information
      about the charset, perhaps by embedding a correct XML encoding
      declaration within the XML MIME entity.

   Encoding considerations: This media type MAY be encoded as
   appropriate for the charset and the capabilities of the underlying
   MIME transport.  For 7-bit transports, data in UTF-8 MUST be encoded
   in quoted-printable or base64.  For 8-bit clean transports (e.g.,
   8BITMIME [RFC1652] ESMTP or NNTP [RFC3977]), UTF-8 does not need to
   be encoded.  Over HTTP [RFC2616], no content-transfer-encoding is
   necessary and UTF-16 may also be used.

   Security considerations: See Section 11.

   Interoperability considerations: XML has proven to be interoperable
   across WebDAV clients and servers, and for import and export from
   multiple XML authoring tools.  For maximum interoperability,
   validating processors are recommended.  Although non-validating
   processors may be more efficient, they are not required to handle
   all features of XML.
   For further information, see Section 2.9,
   "Standalone Document Declaration", and Section 5, "Conformance", of
   [XML].

   Published specification: Extensible Markup Language (XML) 1.0 (Fifth
   Edition) [XML].

   Applications which use this media type: XML is device-, platform-,
   and vendor-neutral and is supported by a wide range of Web user
   agents, WebDAV [RFC4918] clients and servers, as well as XML
   authoring tools.

   Additional information:

      Magic number(s): None.

         Although no byte sequences can be counted on to always be
         present, XML MIME entities in ASCII-compatible charsets
         (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C
         ("<?xml") ...

      Daniel Kohn

      Chris Lilley

   Intended usage: COMMON

   Author/Change controller: The XML specification is a work product of
   the World Wide Web Consortium's XML Working Group, and was edited
   by:

      Tim Bray

      Jean Paoli

      C. M. Sperberg-McQueen

      Eve Maler

      Francois Yergeau

   The W3C and the W3C XML Core Working Group have change control over
   the XML specification.

3.2.  Application/xml Registration

   MIME media type name: application

   MIME subtype name: xml

   Mandatory parameters: none

   Optional parameters: charset

      Although listed as an optional parameter, the use of the charset
      parameter, when the charset is reliably known and agrees with the
      encoding declaration, is RECOMMENDED, since this information can
      be used by non-XML processors to determine authoritatively the
      charset of the XML MIME entity.  The charset parameter can also be
      used to provide protocol-specific operations, such as
      charset-based content negotiation in HTTP.

      "utf-8" [RFC3629] and "utf-16" [RFC2781] are the recommended
      values, representing the UTF-8 and UTF-16 charsets, respectively.
      These charsets are preferred since they are supported by all
      conforming processors of [XML].
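      When the charset parameter is absent, an XML processor falls back
      on the autodetection rules of Appendix F of [XML].  Those rules
      look for a BOM or for the "<?xml" byte pattern mentioned under
      "Magic number(s)" in Section 3.1.  The following Python sketch
      combines that autodetection with the charset precedence rules of
      Sections 3.1, 3.2, and 3.6.  It is a simplified, non-normative
      illustration: the functions sniff_xml_encoding and
      effective_charset are hypothetical names.  Only the BOMs and the
      "<?xml" patterns for UTF-8 and the UTF-16 family are recognized;
      UCS-4/UTF-32 and EBCDIC detection are omitted.

```python
import re
from typing import Optional

# EncName production of [XML]: [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENCODING_DECL = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the charset of an XML entity from its first bytes.

    Simplified from Appendix F of [XML]: UCS-4/UTF-32 and EBCDIC are not
    detected, so, e.g., a UTF-32LE BOM (FF FE 00 00) is reported as "utf-16".
    """
    # Byte Order Marks.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"  # the BOM itself tells the decoder the byte order
    # No BOM: "<?" encoded as UTF-16 big-endian or little-endian.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16le"
    # ASCII-compatible "<?xml": the encoding declaration, if any, names the charset.
    if data.startswith(b"<?xml"):
        match = _ENCODING_DECL.match(data)
        return match.group(1).decode("ascii").lower() if match else "utf-8"
    # Anything else is treated as UTF-8 without an encoding declaration.
    return "utf-8"


def effective_charset(media_type: str, charset: Optional[str], data: bytes) -> str:
    """Resolve the charset of an XML MIME entity (sketch of Sections 3.1, 3.2, 3.6)."""
    if charset:
        # When present, the charset parameter takes precedence.
        return charset.strip().strip('"').lower()
    if media_type.strip().lower().startswith("text/"):
        # Deprecated text/* XML types: the RFC 2046 default applies.
        return "us-ascii"
    # application/* and other types: no MIME default; use the XML rules.
    return sniff_xml_encoding(data)
```

      With these definitions, an entity labeled text/xml without a
      charset parameter resolves to "us-ascii" even if its encoding
      declaration names another charset.  This is the conflict described
      in Section 3 that led this document to deprecate text/xml.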
      If an application/xml entity is received where the charset
      parameter is omitted, the MIME Content-Type header provides no
      information about the charset.  Conforming XML processors MUST
      follow the requirements in Section 4.3.3 of [XML] that directly
      address this contingency.  MIME processors that are not XML
      processors SHOULD NOT assume a default charset if the charset
      parameter is omitted from an application/xml entity.

      There are several reasons that the charset parameter is
      authoritative.  First, recent web servers have been improved so
      that users can specify the charset parameter.  Second, [RFC2130]
      (informative) specifies that the recommended specification scheme
      is the "charset" parameter.

      On the other hand, it has been argued that the charset parameter
      should be omitted and the mechanism described in Appendix F of
      [XML] (which is non-normative) should be solely relied on.  This
      approach would allow users to avoid configuration of the charset
      parameter; an XML document stored in a file is likely to contain a
      correct encoding declaration or BOM (if necessary), since the
      operating system does not typically provide charset information
      for files.  If users would like to rely on the encoding
      declaration or BOM and to hide charset information from protocols,
      they SHOULD omit the parameter.

      Since the charset parameter is authoritative, the charset is not
      always declared within an XML encoding declaration.  However,
      since a receiving application can determine the encoding of an XML
      document with very high reliability by reading the beginning of
      the document (see Appendix F of [XML]), the XML encoding
      declaration SHOULD be provided and SHOULD agree with the charset
      parameter.  Special care is needed when the recipient strips the
      MIME header and provides persistent storage of the received XML
      MIME entity (e.g., in a file system).
      Unless the charset is UTF-8 or UTF-16, the recipient SHOULD also
      persistently store information about the charset, preferably by
      embedding a correct XML encoding declaration within the XML MIME
      entity.

   Encoding considerations: This media type MAY be encoded as
   appropriate for the charset and the capabilities of the underlying
   MIME transport.  For 7-bit transports, data in either UTF-8 or UTF-16
   MUST be encoded in quoted-printable or base64.  For 8-bit clean
   transports (e.g., 8BITMIME [RFC1652] ESMTP or NNTP [RFC3977]), UTF-8
   does not need to be encoded, but the UTF-16 family MUST be encoded
   in base64.  For binary clean transports (e.g., HTTP [RFC2616]), no
   content-transfer-encoding is necessary.

   Security considerations: See Section 11.

   Interoperability considerations: Same as Section 3.1.

   Published specification: Same as Section 3.1.

   Applications which use this media type: Same as Section 3.1.

   Additional information: Same as Section 3.1.

   Person and email address for further information: Same as
   Section 3.1.

   Intended usage: COMMON

   Author/Change controller: Same as Section 3.1.

3.3.  Text/xml-external-parsed-entity Registration (deprecated)

   MIME media type name: text

   MIME subtype name: xml-external-parsed-entity

   Mandatory parameters: none

   Optional parameters: charset

      The charset parameter of text/xml-external-parsed-entity is
      handled the same as that of text/xml as described in Section 3.1.

   Encoding considerations: Same as Section 3.1.

   Security considerations: See Section 11.

   Interoperability considerations: XML external parsed entities are as
   interoperable as XML documents, though they have a less tightly
   constrained structure and therefore need to be referenced by XML
   documents for proper handling by XML processors.
   Similarly, XML documents cannot be reliably used as external parsed entities, because external parsed entities are prohibited from having standalone document declarations or DTDs. Identifying XML external parsed entities with their own content type should enhance interoperability of both XML documents and XML external parsed entities.

   Published specification: Same as Section 3.1.

   Applications which use this media type: Same as Section 3.1.

   Additional information:

      Magic number(s): Same as Section 3.1.

      File extension(s): .xml or .ent

      Macintosh File Type Code(s): "TEXT"

   Person and email address for further information: Same as Section 3.1.

   Intended usage: COMMON

   Author/Change controller: Same as Section 3.1.

3.4.  Application/xml-external-parsed-entity Registration

   MIME media type name: application

   MIME subtype name: xml-external-parsed-entity

   Mandatory parameters: none

   Optional parameters: charset

   The charset parameter of application/xml-external-parsed-entity is handled the same as that of application/xml as described in Section 3.2.

   Encoding considerations: Same as Section 3.2.

   Security considerations: See Section 11.

   Interoperability considerations: Same as those for text/xml-external-parsed-entity as described in Section 3.3.

   Published specification: Same as text/xml as described in Section 3.1.

   Applications which use this media type: Same as Section 3.1.

   Additional information:

      Magic number(s): Same as Section 3.1.

      File extension(s): .xml or .ent

      Macintosh File Type Code(s): "TEXT"

   Person and email address for further information: Same as Section 3.1.

   Intended usage: COMMON

   Author/Change controller: Same as Section 3.1.

3.5.  Application/xml-dtd Registration

   MIME media type name: application

   MIME subtype name: xml-dtd

   Mandatory parameters: none

   Optional parameters: charset

   The charset parameter of application/xml-dtd is handled the same as that of application/xml as described in Section 3.2.

   Encoding considerations: Same as Section 3.2.

   Security considerations: See Section 11.

   Interoperability considerations: XML DTDs have proven to be interoperable across DTD authoring tools and XML browsers, among others.

   Published specification: Same as text/xml as described in Section 3.1.

   Applications which use this media type: DTD authoring tools handle external DTD subsets as well as external parameter entities. XML browsers may also access external DTD subsets and external parameter entities.

   Additional information:

      Magic number(s): Same as Section 3.1.

      File extension(s): .dtd or .mod

      Macintosh File Type Code(s): "TEXT"

   Person and email address for further information: Same as Section 3.1.

   Intended usage: COMMON

   Author/Change controller: Same as Section 3.1.

3.6.  Summary

   The following list applies to application/xml, application/xml-external-parsed-entity, application/xml-dtd, and XML-based media types under top-level types other than "text" that define the charset parameter according to this specification:

   o  The charset parameter is recommended, provided that it agrees with the XML encoding declaration. If present, it takes precedence.

   o  If the charset parameter is omitted, conforming XML processors MUST follow the requirements in section 4.3.3 of [XML] or [XML1.1], as appropriate.

   Although text/xml, text/xml-external-parsed-entity, and subtypes of "text" having the "+xml" suffix are deprecated, the following list applies to these media types:

   o  The charset parameter is strongly recommended.

   o  If the charset parameter is not specified, the default is "us-ascii". The default of "iso-8859-1" in HTTP is explicitly overridden.

   o  There are no error-handling provisions.

   o  An encoding declaration, if present, is irrelevant. However, when a received resource is saved as a file, the correct encoding declaration SHOULD be inserted.

4.  The Byte Order Mark (BOM) and Conversions to/from the UTF-16 Charset

   Section 4.3.3 of [XML] specifies that XML MIME entities in the charset "utf-16" MUST begin with a byte order mark (BOM). The BOM is the hexadecimal octet sequence 0xFE 0xFF (or 0xFF 0xFE, depending on endianness). The XML Recommendation further states that the BOM is an encoding signature and is not part of either the markup or the character data of the XML document.

   Because of the BOM, applications that convert XML from "utf-16" to a non-Unicode encoding MUST strip the BOM before conversion. Similarly, when converting from another encoding into "utf-16", the BOM MUST be added after conversion is complete.

   In addition to the charset "utf-16", [RFC2781] introduces "utf-16le" (little endian) and "utf-16be" (big endian). The BOM is prohibited for these charsets. When an XML MIME entity is encoded in "utf-16le" or "utf-16be", it MUST NOT begin with the BOM but SHOULD contain an encoding declaration. Conversion from "utf-16" to "utf-16be" or "utf-16le" MUST strip the BOM, and conversion in the other direction MUST add it.

5.  Fragment Identifiers

   Uniform Resource Identifiers (URIs) may contain fragment identifiers (see Section 3.5 of [RFC3986]). Likewise, Internationalized Resource Identifiers (IRIs) [RFC3987] may contain fragment identifiers.

   A family of specifications defines fragment identifiers for XML media types.
   The fragment identifier syntax for application/xml is defined by two W3C Recommendations in this family, namely [XPointerFramework] and [XPointerElement]. Schemes other than the element scheme MUST NOT be specified as part of fragment identifiers for these media types. In particular, the xpointer scheme MUST NOT be specified, since it is still at the W3C Working Draft stage.

   When an XML-based MIME media type follows the '+xml' naming convention, the fragment identifier syntax for this media type SHALL include the fragment identifier syntax for application/xml and application/xml-external-parsed-entity. It MAY further allow other registered schemes, such as the xmlns scheme.

   A registry of XPointer schemes [XPtrReg] is maintained at the W3C. Unregistered schemes SHOULD NOT be used.

   If [XPointerFramework] and [XPointerElement] are inappropriate for some XML-based media type, that media type SHOULD NOT follow the '+xml' naming convention.

   When a URI contains a fragment identifier, the fragment identifier is encoded using a limited subset of the repertoire of US-ASCII [ASCII] characters, as defined in [RFC3986]. When an IRI contains a fragment identifier, it is encoded using a much wider repertoire of characters. The conversion between IRI fragment identifiers and URI fragment identifiers is presented in Section 7 of [RFC3987].

   An XPointer fragment identifier does not have to be resolved even when an XML document is retrieved.

6.  The Base URI

   Section 5.1 of [RFC3986] specifies that the semantics of a relative URI reference embedded in a MIME entity depend on the base URI. The base URI is one of the following, where (1) has the highest precedence: (1) the base URI embedded in content, (2) the base URI from the encapsulating entity, (3) the base URI from the retrieval URI, or (4) the default base URI.
   [RFC3986] further specifies that the mechanism for embedding the base URI depends on the media type.

   The media-type-dependent mechanism for embedding the base URI in a MIME entity of type application/xml or application/xml-external-parsed-entity is to use the xml:base attribute described in detail in [XBase].

   Note that the base URI may be embedded in a different MIME entity, since the default value for the xml:base attribute may be specified in an external DTD subset or external parameter entity.

7.  XML Versions

   application/xml, application/xml-external-parsed-entity, application/xml-dtd, text/xml (deprecated), and text/xml-external-parsed-entity (deprecated) are to be used with [XML]. In all examples herein where version="1.0" is shown, it is understood that version="1.1" may also be used, provided that the content does indeed conform to [XML1.1].

   The normative requirement of this specification upon XML is to follow the requirements of [XML], section 4.3.3. Except for minor clarifications, that section is substantially identical in every edition of XML 1.0, from the first to the current (5th) edition, and in XML 1.1. Therefore, this specification may be used with any version or edition of XML 1.0 or 1.1.

   Specifications and recommendations based on or referring to this RFC SHOULD indicate any limitations on the particular versions of XML to be used. For example, a particular specification might indicate: "content MUST be represented using media-type application/xml, and the document must either (a) carry an xml declaration specifying version="1.0" or (b) omit the XML declaration, in which case per the XML recommendation the version defaults to 1.0".

8.  A Naming Convention for XML-Based Media Types

   This document recommends the use of a naming convention (a suffix of '+xml') for identifying XML-based MIME media types, whatever their particular content may represent. This allows generic XML processors and technologies to be used on a wide variety of XML document types at minimal cost, using existing frameworks for media type registration.

   The use of a suffix was not considered as part of the original MIME architecture. Even so, this choice is considered to provide the most functionality with the least potential for interoperability problems or lack of future extensibility. The alternatives to the '+xml' suffix and the reasons for its selection are described in Appendix A.

   As XML development continues, new XML document types are appearing rapidly. Many of these XML document types would benefit from the identification possibilities of a media type more specific than text/xml or application/xml. It is therefore likely that many new media types for XML-based document types will be registered in the near future and beyond.

   While the benefits of specific MIME types for particular types of XML documents are significant, all XML documents share common structures and syntax that make common processing possible.

   Some areas where 'generic' processing is useful include:

   o  Browsing - An XML browser can display any XML document with a provided [CSS] or [XSLT] style sheet, whatever the vocabulary of that document.

   o  Editing - Any XML editor can read, modify, and save any XML document.

   o  Fragment identification - XPointers (see Section 5) can work with any XML document, whatever vocabulary it uses.

   o  Hypertext linking - XLink hypertext linking (work in progress) is designed to connect any XML documents, regardless of vocabulary.

   o  Searching - XML-oriented search engines, web crawlers, agents, and query tools should be able to read XML documents and extract the names and content of elements and attributes, even if the tools are unaware of the particular vocabulary used for elements and attributes.

   o  Storage - XML-oriented storage systems, which keep XML documents internally in a parsed form, should similarly be able to process, store, and recreate any XML document.

   o  Well-formedness and validity checking - An XML processor can confirm that any XML document is well-formed and check whether it is valid (i.e., conforms to its declared DTD or Schema).

   When a new media type is introduced for an XML-based format, the name of the media type SHOULD end with '+xml'. This convention allows applications that can process XML generically to detect that the MIME entity is supposed to be an XML document. Such applications can then verify this assumption by invoking some XML processor and process the XML document accordingly. Applications may identify types that represent XML MIME entities by matching the media type against the pattern '*/*+xml'. (Of course, 4 of the 5 media types defined in this document -- text/xml, application/xml, text/xml-external-parsed-entity, and application/xml-external-parsed-entity -- also represent XML MIME entities while not conforming to the '*/*+xml' pattern.)

      NOTE: Section 14.1 of HTTP [RFC2616] does not support Accept headers of the form "Accept: */*+xml", so this header MUST NOT be used in this way. Instead, content negotiation [RFC2703] could potentially be used if an XML-based MIME type were needed.

   Media types following the '+xml' naming convention SHOULD introduce the charset parameter for consistency, since XML-generic processing applies the same program to any such media type. However, there are some cases in which the charset parameter need not be introduced.
   For example:

      When an XML-based media type is restricted to UTF-8, it is not necessary to introduce the charset parameter. "UTF-8 only" is a generic principle, and UTF-8 is the default encoding for XML.

      When an XML-based media type is restricted to UTF-8 or UTF-16, it might not be unreasonable to omit the charset parameter. Neither UTF-8 nor UTF-16 requires an XML encoding declaration.

      Note: Some argue that XML-based media types should not introduce the charset parameter, although others disagree.

   XML-generic processing is not always appropriate for XML-based media types. For example, authors of some such media types may wish the types to remain entirely opaque except to applications that are specifically designed to deal with that media type. By NOT following the '+xml' naming convention, such media types can avoid XML-generic processing. Since generic processing will be useful in many cases, however -- including in some situations that are difficult to predict ahead of time -- those registering media types SHOULD use the '+xml' convention unless they have a particularly compelling reason not to.

   The registration process for these media types is described in [RFC4288] and [RFC4289]. The registrar for the IETF tree will encourage new XML-based media type registrations in the IETF tree to follow this guideline. Registrars for other trees SHOULD follow this convention in order to ensure maximum interoperability of their XML-based documents. Similarly, media subtypes that do not represent XML MIME entities MUST NOT be allowed to register with a '+xml' suffix.

8.1.  Referencing

   Registrations for new XML-based media types under the top-level type "text" are discouraged for the same reasons that text/xml and text/xml-external-parsed-entity are deprecated.
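
   A small classifier can combine the '*/*+xml' matching rule in Section 8 with the charset defaults summarized in Section 3.6. The following Python sketch is illustrative only and is not part of this specification. The names PLAIN_XML_SUBTYPES, parse_media_type, and classify are hypothetical. Parameter parsing is deliberately simplified: quoted values containing ";" are not handled. application/xml-dtd is reported as not being an XML document entity, following the "4 of the 5 media types" count given in Section 8.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: subtypes defined in this document that denote XML
# MIME entities without the '+xml' suffix (application/xml-dtd excluded).
PLAIN_XML_SUBTYPES = {"xml", "xml-external-parsed-entity"}


def parse_media_type(value: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a Content-type value into (type, subtype, parameters).

    Simplified sketch: quoted parameter values containing ';' are not handled.
    """
    main, _, rest = value.partition(";")
    top, _, sub = main.strip().lower().partition("/")
    params: Dict[str, str] = {}
    for item in rest.split(";"):
        name, sep, val = item.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return top, sub, params


def classify(value: str) -> Tuple[bool, bool, Optional[str]]:
    """Return (is_xml_entity, is_deprecated_text_type, charset).

    charset is the parameter value when present; for deprecated "text"
    XML types without the parameter it is the "us-ascii" default; otherwise
    it is None, meaning the XML processor applies section 4.3.3 of [XML].
    """
    top, sub, params = parse_media_type(value)
    is_xml = (
        (top in ("text", "application") and sub in PLAIN_XML_SUBTYPES)
        or (sub.endswith("+xml") and len(sub) > len("+xml"))
    )
    deprecated = is_xml and top == "text"
    charset = params.get("charset")
    if charset is not None:
        charset = charset.lower()
    elif deprecated:
        charset = "us-ascii"
    return is_xml, deprecated, charset
```

   For example, classify("text/xml") returns (True, True, "us-ascii"). By contrast, classify("image/svg+xml") returns (True, False, None), which leaves charset detection to the XML processor.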

   Registrations for new XML-based media types under top-level types other than "text" SHOULD, in specifying the charset parameter and encoding considerations, define them as: "Same as [charset parameter / encoding considerations] of application/xml as specified in RFC XXXX."

   The use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to authoritatively determine the charset of the XML MIME entity. If there are reasons not to follow this advice, they SHOULD be included as part of the registration. As shown above, two such reasons are "UTF-8 only" and "UTF-8 or UTF-16 only".

   These registrations SHOULD specify that the XML-based media type being registered has all of the security considerations described in RFC XXXX, plus any additional considerations specific to that media type.

   These registrations SHOULD also make reference to RFC XXXX in specifying magic numbers, fragment identifiers, base URIs, and use of the BOM.

   These registrations MAY reference the application/xml registration in RFC XXXX in specifying interoperability considerations, if these considerations are not overridden by issues specific to that media type.

9.  Examples

   The examples below give the value of the MIME Content-type header and the XML declaration (which includes the encoding declaration) inside the XML MIME entity. For UTF-16 examples, the Byte Order Mark character is denoted as "{BOM}", and the XML declaration is assumed to come at the beginning of the XML MIME entity, immediately following the BOM. Other MIME headers may be present, and the XML MIME entity may contain other data in addition to the XML declaration. The examples focus on the Content-type header and the encoding declaration for clarity.

9.1.  Text/xml (deprecated) with UTF-8 Charset

      Content-type: text/xml; charset="utf-8"

   This is the recommended charset value for use with text/xml. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-8 encoded.

   If sent using a 7-bit transport (e.g., SMTP [RFC5321]), the XML MIME entity MUST use a content-transfer-encoding of either quoted-printable or base64. For an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP) or a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.

9.2.  Text/xml (deprecated) with UTF-16 Charset

      Content-type: text/xml; charset="utf-16"

      {BOM}

   or

      {BOM}

   This is possible only when the XML MIME entity is transmitted via HTTP. HTTP uses a MIME-like mechanism and is a binary-clean protocol, so it does not perform CR and LF transformations and allows NUL octets. As described in [RFC2781], the UTF-16 family MUST NOT be used with media types under the top-level type "text" except over HTTP (see section 19.4.1 of [RFC2616] for details).

   Since HTTP is binary clean, no content-transfer-encoding is necessary.

9.3.  Text/xml (deprecated) with UTF-16BE Charset

      Content-type: text/xml; charset="utf-16be"

   Observe that there is no BOM. This is again possible only when the XML MIME entity is transmitted via HTTP.

9.4.  Text/xml (deprecated) with ISO-2022-KR Charset

      Content-type: text/xml; charset="iso-2022-kr"

   This example shows text/xml with a Korean charset (e.g., Hangul) encoded following the specification in [RFC1557]. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as encoded per RFC 1557.

   Since ISO-2022-KR has been defined to use only 7 bits of data, no content-transfer-encoding is necessary with any transport.

9.5.  Text/xml (deprecated) with Omitted Charset

      Content-type: text/xml

      {BOM}

   or

      {BOM}

   This example shows text/xml with the charset parameter omitted. In this case, MIME and XML processors MUST assume the charset is "us-ascii", the default charset value for text media types specified in [RFC2046]. The default of "us-ascii" holds even if the text/xml entity is transported using HTTP.

   Omitting the charset parameter is NOT RECOMMENDED for text/xml. Even if the contents of the XML MIME entity are UTF-16 or UTF-8, or the XML MIME entity has an explicit encoding declaration, XML and MIME processors MUST assume the charset is "us-ascii".

9.6.  Application/xml with UTF-16 Charset

      Content-type: application/xml; charset="utf-16"

      {BOM}

   or

      {BOM}

   This is a recommended charset value for use with application/xml. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-16 encoded.

   If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP), the XML MIME entity MUST be encoded in quoted-printable or base64. For a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.

9.7.  Application/xml with UTF-16BE Charset

      Content-type: application/xml; charset="utf-16be"

   Observe that there is no BOM. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-16BE encoded.

9.8.  Application/xml with ISO-2022-KR Charset

      Content-type: application/xml; charset="iso-2022-kr"

   This example shows application/xml with a Korean charset (e.g., Hangul) encoded following the specification in [RFC1557].
   Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as encoded per RFC 1557. This holds regardless of whether the XML MIME entity has an internal encoding declaration (this example does show such a declaration, which agrees with the charset parameter).

   Since ISO-2022-KR has been defined to use only 7 bits of data, no content-transfer-encoding is necessary with any transport.

9.9.  Application/xml with Omitted Charset and UTF-16 XML MIME Entity

      Content-type: application/xml

      {BOM}

   or

      {BOM}

   For this example, the XML MIME entity begins with a BOM. Since the charset has been omitted, a conforming XML processor follows the requirements of [XML], section 4.3.3. Specifically, the XML processor reads the BOM and thus knows deterministically that the charset is UTF-16.

   An XML-unaware MIME processor SHOULD make no assumptions about the charset of the XML MIME entity.

9.10.  Application/xml with Omitted Charset and UTF-8 Entity

      Content-type: application/xml

   In this example, the charset parameter has been omitted, and there is no BOM. Since there is no BOM, the XML processor follows the requirements in section 4.3.3 of [XML]. It optionally applies the mechanism described in Appendix F (which is non-normative) of [XML] to determine that the charset encoding is UTF-8. The XML MIME entity does not contain an encoding declaration, but since the encoding is UTF-8, this is still a conforming XML MIME entity.

   An XML-unaware MIME processor SHOULD make no assumptions about the charset of the XML MIME entity.

9.11.  Application/xml with Omitted Charset and Internal Encoding Declaration

      Content-type: application/xml

   In this example, the charset parameter has been omitted, and there is no BOM.
   However, the XML MIME entity does contain an encoding declaration that specifies the entity's charset. Following the requirements in section 4.3.3, and optionally applying the mechanism described in Appendix F (non-normative) of [XML], the XML processor determines the charset encoding of the XML MIME entity (in this example, UCS-4).

   An XML-unaware MIME processor SHOULD make no assumptions about the charset of the XML MIME entity.

9.12.  Text/xml-external-parsed-entity (deprecated) with UTF-8 Charset

      Content-type: text/xml-external-parsed-entity; charset="utf-8"

   This is the recommended charset value for use with text/xml-external-parsed-entity. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-8 encoded.

   If sent using a 7-bit transport (e.g., SMTP), the XML MIME entity MUST use a content-transfer-encoding of either quoted-printable or base64. For an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP) or a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.

9.13.  Application/xml-external-parsed-entity with UTF-16 Charset

      Content-type: application/xml-external-parsed-entity; charset="utf-16"

      {BOM}

   or

      {BOM}

   This is a recommended charset value for use with application/xml-external-parsed-entity. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-16 encoded.

   If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP), the XML MIME entity MUST be encoded in quoted-printable or base64. For a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.

9.14.  Application/xml-external-parsed-entity with UTF-16BE Charset

      Content-type: application/xml-external-parsed-entity; charset="utf-16be"

   Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-16BE encoded.

9.15.  Application/xml-dtd

      Content-type: application/xml-dtd; charset="utf-8"

   Charset "utf-8" is a recommended charset value for use with application/xml-dtd. Since the charset parameter is provided, MIME and XML processors MUST treat the enclosed entity as UTF-8 encoded.

9.16.  Application/mathml+xml

      Content-type: application/mathml+xml

   MathML documents are XML documents whose content describes mathematical information, as defined by [MathML]. As a format based on XML, MathML documents SHOULD use the '+xml' suffix convention in their MIME content-type identifier. However, no content type has yet been registered for MathML, so this media type should not be used until such registration has been completed.

9.17.  Application/xslt+xml

      Content-type: application/xslt+xml

   XSL Transformations (XSLT) documents are XML documents whose content describes stylesheets for other XML documents, as defined by [XSLT]. As a format based on XML, XSLT documents SHOULD use the '+xml' suffix convention in their MIME content-type identifier. However, no content type has yet been registered for XSLT, so this media type should not be used until such registration has been completed.

9.18.  Application/rdf+xml

      Content-type: application/rdf+xml

   Resources identified using the application/rdf+xml media type are XML documents whose content describes RDF metadata. This media type has been registered at IANA and is fully defined in [RFC3870].

9.19.  Image/svg+xml

      Content-type: image/svg+xml

   Scalable Vector Graphics (SVG) documents are XML documents whose content describes graphical information, as defined by [SVG]. As a format based on XML, SVG documents SHOULD use the '+xml' suffix convention in their MIME content-type identifier. Content type registration for SVG [SVGMediaType] is in progress but depends on the present document.

9.20.  Model/x3d+xml

      Content-type: model/x3d+xml

   X3D is derived from VRML and is used for 3D models. Besides the XML representation, it may also be serialized in classic VRML syntax and using Fast Infoset. Separate but clearly related media types are used for these serializations (model/x3d+vrml and model/x3d+fastinfoset, respectively).

9.21.  INCONSISTENT EXAMPLE: Text/xml (deprecated) with UTF-8 Charset

      Content-type: text/xml; charset="utf-8"

   Since the charset parameter is provided in the Content-Type header, MIME and XML processors MUST treat the enclosed entity as UTF-8 encoded. That is, the "iso-8859-1" encoding given in the XML encoding declaration MUST be ignored.

   Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.

9.22.  Application/xml

      Content-type: application/xml

   Since the charset parameter is not provided in the Content-Type header, MIME and XML processors MUST treat the "iso-8859-1" encoding given in the XML encoding declaration as authoritative.

   Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.

9.23.  Application/soap+xml

      Content-type: application/soap+xml

   Resources identified using the application/soap+xml media type are SOAP 1.2 message envelopes that have been serialized with XML 1.0.
   This media type has been registered at IANA and is fully defined in [RFC3902].

10.  IANA Considerations

   As described in Section 8, this document updates the [RFC4288] and [RFC4289] registration process for XML-based MIME types.

11.  Security Considerations

   XML, as a subset of SGML, has all of the same security considerations as specified in [RFC1874], and likely more, due to its expected ubiquitous deployment.

   To paraphrase section 3 of RFC 1874, XML MIME entities contain information to be parsed and processed by the recipient's XML system. These entities may contain, and such systems may permit, explicit system-level commands to be executed while processing the data. To the extent that an XML system will execute arbitrary command strings, recipients of XML MIME entities may be at risk. In general, it may be possible to specify commands that perform unauthorized file operations or make changes to the display processor's environment that affect subsequent operations.

   In general, any information stored outside of the direct control of the user -- including CSS style sheets, XSL transformations, entity declarations, and DTDs -- can be a source of insecurity, by either obvious or subtle means. For example, a tiny "whiteout attack" modification made to a "master" style sheet could make words in critical locations disappear in user documents, without directly modifying the user document or the stylesheet it references. Thus, the security of any XML document is vitally dependent on all of the documents recursively referenced by that document.

   The entity lists and DTDs for XHTML 1.0 [XHTML], for instance, are likely to be a commonly used set of information. Many developers will use and trust them, though few of them will know much about the level of security on the W3C's servers or on any similarly trusted repository.
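
   As the preceding paragraphs note, DTDs and entity declarations held outside the user's control can be a source of insecurity. One common mitigation is to configure the XML processor not to fetch external entities at all. This document does not mandate that mitigation. The following Python sketch illustrates it with the standard library's SAX interface. It switches off the feature_external_ges feature flag, so with the default expat-based reader a reference to an externally declared general entity contributes no text. TextCollector and text_without_external_entities are hypothetical helper names. The sketch does not address other concerns raised in this section, such as entity-expansion limits.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def characters(self, content: str) -> None:
        self.parts.append(content)


def text_without_external_entities(data: bytes) -> str:
    """Return the character content of an XML entity without fetching
    any external general entity it declares (hypothetical helper)."""
    parser = xml.sax.make_parser()
    # Never resolve external general entities such as
    # <!ENTITY ext SYSTEM "...">; with the default expat-based reader
    # the reference then contributes no text.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.parts)
```

   With this configuration, entities declared with literal values in the document's own internal subset are still expanded. Only content that would have been retrieved from an external resource is omitted.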

   The simplest attack involves adding declarations that break validation. Adding extraneous declarations to a list of character entities can effectively "break the contract" used by documents. A tiny change that produces a fatal error in a DTD could halt XML processing on a large scale. Extraneous declarations are fairly obvious, but more sophisticated tricks, like changing attributes from optional to required, can be difficult to track down. Perhaps the most dangerous option available to crackers is redefining default values for attributes. For example, if developers have relied on defaulted attributes for security, a relatively small change might expose enormous quantities of information.

   Apart from the structural possibilities, another option, "entity spoofing," can be used to insert text into documents, vandalizing them and perhaps conveying an unintended message. XML 1.0 permits multiple entity declarations, and the first declaration takes precedence. It is therefore possible to insert malicious content wherever an entity is used, such as by inserting the full text of Winnie the Pooh in every occurrence of &mdash;.

   The digital signature work currently underway in the xmldsig working group may eventually ameliorate the dangers of referencing external documents not under one's own control.

   Use of XML is expected to be varied and widespread. XML is under scrutiny by a wide range of communities for use as a common syntax for community-specific metadata. For example, the Dublin Core [RFC5013] group is using XML for document metadata, and a new effort has begun that is considering use of XML for medical information. Other groups view XML as a mechanism for marshalling parameters for remote procedure calls. More uses of XML will undoubtedly arise.

   Security considerations will vary by domain of use.
For example, XML medical records will have much more stringent privacy and security considerations than XML library metadata. Similarly, use of XML as a parameter marshalling syntax necessitates a case-by-case security review.

XML may also have some of the same security concerns as plain text. Like plain text, XML can contain escape sequences that, when displayed, have the potential to change the display processor environment in ways that adversely affect subsequent operations. Possible effects include, but are not limited to, locking the keyboard, changing display parameters so subsequently displayed text is unreadable, or even changing display parameters to deliberately obscure or distort subsequently displayed material so that its meaning is lost or altered. Display processors SHOULD either filter such material from displayed text or else make sure to reset all important settings after a given display operation is complete.

Some terminal devices have keys whose output, when pressed, can be changed by sending the display processor a character sequence. If this is possible, the display of a text object containing such character sequences could reprogram keys to perform some illicit or dangerous action when the key is subsequently pressed by the user. In some cases not only can keys be programmed, they can be triggered remotely, making it possible for a text display operation to directly perform some unwanted action. As such, the ability to program keys SHOULD be blocked either by filtering or by disabling the ability to program keys entirely.

Note that it is also possible to construct XML documents that use what XML terms "entity references" (using the XML meaning of the term "entity" as described in Section 2) to produce repeated expansions of text.
Recursive expansions are prohibited by [XML], and XML processors are required to detect them. However, even non-recursive expansions may exhaust the finite computing resources of a recipient if they are performed many times. (For example, entity A may consist of 100 copies of entity B, which in turn consists of 100 copies of entity C, and so on; each additional level multiplies the size of the expanded text by 100.)

12. References

12.1. Normative References

[ASCII] "US-ASCII. Coded Character Set -- 7-Bit American Standard Code for Information Interchange", ANSI X3.4-1986, 1986.

[CSS] Bos, B., Lie, H., Lilley, C., and I. Jacobs, "Cascading Style Sheets, level 2 (CSS2) Specification", World Wide Web Consortium Recommendation REC-CSS2, May 1998, .

[ISO8859] "ISO-8859. International Standard -- Information Processing -- 8-bit Single-Byte Coded Graphic Character Sets -- Part 1: Latin alphabet No. 1, ISO-8859-1:1987", 1987.

[MathML] Carlisle, D., Ion, P., Miner, R., and N. Poppelier, "Mathematical Markup Language (MathML) Version 2.0 (Second Edition)", World Wide Web Consortium Recommendation REC-MathML2, October 2003, .

[PNG] Boutell, T., "PNG (Portable Network Graphics) Specification", World Wide Web Consortium Recommendation REC-png, October 1996, .

[RFC1652] Klensin, J., Freed, N., Rose, M., Stefferud, E., and D. Crocker, "SMTP Service Extension for 8bit-MIMEtransport", RFC 1652, July 1994.

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

[RFC2077] Nelson, S., Parks, C., and Mitra, "The Model Primary Content Type for Multipurpose Internet Mail Extensions", RFC 2077, January 1997.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2445] Dawson, F. and D. Stenerson, "Internet Calendaring and Scheduling Core Object Specification (iCalendar)", RFC 2445, November 1998.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC3023] Murata, M., St.Laurent, S., and D. Kohn, "XML Media Types", RFC 3023, January 2001.

[RFC3501] Crispin, M., "Internet Message Access Protocol - Version 4rev1", RFC 3501, March 2003.

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 3629, November 2003.

[RFC3977] Feather, C., "Network News Transfer Protocol (NNTP)", RFC 3977, October 2006.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", RFC 3986, January 2005.

[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, July 2005.

[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", RFC 4288, December 2005.

[RFC4289] Freed, N. and J. Klensin, "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures", RFC 4289, December 2005.

[RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV)", RFC 4918, June 2007.

[RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008.

[SGML] International Organization for Standardization, "Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML)", ISO 8879, October 1986.

[SVG] Ferraiolo, J., Fujisawa, J., and D. Jackson, "Scalable Vector Graphics (SVG) 1.1 Specification", World Wide Web Consortium Recommendation SVG, January 2004, .
[SVGMediaType] Anderson, O., "Media Type Registration for image/svg+xml", December 2008, .

[TAGMIME] Bray, T., Ed., "Internet Media Type registration, consistency of use", April 2004, .

[UML] Object Management Group, "OMG Unified Modeling Language Specification, Version 1.3", OMG Specification ad/99-06-08, June 1999, .

[XBase] Marsh, J., "XML Base", World Wide Web Consortium Recommendation xmlbase, June 2001, .

[XHTML] Pemberton, S., et al., "XHTML 1.0: The Extensible HyperText Markup Language", World Wide Web Consortium Recommendation xhtml1, December 1999, .

[XML] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth Edition)", World Wide Web Consortium Recommendation REC-xml, November 2008, .

[XML1.1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., Yergeau, F., and J. Cowan, "Extensible Markup Language (XML) 1.1", World Wide Web Consortium Recommendation REC-xml, April 2004, .

[XPointerElement] Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer element() Scheme", World Wide Web Consortium Recommendation REC-XPointer-Element, March 2003, .

[XPointerFramework] Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer Framework", World Wide Web Consortium Recommendation REC-XPointer-Framework, March 2003, .

[XPointerXmlns] DeRose, S., Daniel, R., Maler, E., and J. Marsh, "XPointer xmlns() Scheme", World Wide Web Consortium Recommendation REC-XPointer-Xmlns, March 2003, .

[XPtrReg] Hazael-Massieux, D., "XPointer Registry", 2005, .

[XSLT] Clark, J., "XSL Transformations (XSLT) Version 1.0", World Wide Web Consortium Recommendation xslt, November 1999, .

12.2. Informative References

[RFC1557] Choi, U., Chon, K., and H.
Park, "Korean Character Encoding for Internet Messages", RFC 1557, December 1993.

[RFC1874] Levinson, E., "SGML Media Types", RFC 1874, December 1995.

[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson, R., Crispin, M., and P. Svanberg, "The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996", RFC 2130, April 1997.

[RFC2376] Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, July 1998.

[RFC2703] Klyne, G., "Protocol-independent Content Negotiation Framework", RFC 2703, September 1999.

[RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000.

[RFC2801] Burdett, D., "Internet Open Trading Protocol - IOTP Version 1.0", RFC 2801, April 2000.

[RFC3870] Swartz, A., "application/rdf+xml Media Type Registration", RFC 3870, September 2004.

[RFC3902] Baker, M. and M. Nottingham, "The "application/soap+xml" media type", RFC 3902, September 2004.

[RFC5013] Kunze, J. and T. Baker, "Dublin Core Metadata for Resource Discovery", RFC 5013, August 2007.

Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types?

Although the use of a suffix was not considered as part of the original MIME architecture, this choice is considered to provide the most functionality with the least potential for interoperability problems or lack of future extensibility. The alternatives to the '+xml' suffix and the reasons for its selection are described below.

A.1. Why not just use text/xml or application/xml and let the XML processor dispatch to the correct application based on the referenced DTD?

text/xml and application/xml remain useful in many situations, especially for document-oriented applications that involve combining XML with a stylesheet in order to present the data.
However, XML is also used to define entirely new data types, and an XML-based format such as image/svg+xml fits the definition of a MIME media type exactly as well as image/png [PNG] does. (Note that image/svg+xml is not yet registered.) Although extra functionality is available for MIME processors that are also XML processors, XML-based media types -- even when treated as opaque, non-XML media types -- are just as useful as any other media type and should be treated as such.

Since MIME dispatchers work off the MIME type, use of text/xml or application/xml to label discrete media types will hinder correct dispatching and general interoperability. Finally, dispatching on the referenced DTD cannot work for the many XML documents that use neither DTDs nor namespaces, yet are perfectly legal XML.

A.2. Why not create a new subtree (e.g., image/xml.svg) to represent XML MIME types?

The subtree under which a media type is registered -- IETF, vendor (*/vnd.*), or personal (*/prs.*); see [RFC4288] and [RFC4289] for details -- is completely orthogonal to whether the media type uses XML syntax or not. The suffix approach allows XML document types to be identified within any subtree. The vendor subtree, for example, is likely to include a large number of XML-based document types. By using a suffix, rather than setting up a separate subtree, those types may remain in the same location in the tree of MIME types that they would have occupied had they not been based on XML.

A.3. Why not create a new top-level MIME type for XML-based media types?

The top-level MIME type (e.g., model/* [RFC2077]) determines what kind of content the type is, not what syntax it uses. For example, agents using image/* to signal acceptance of any image format should certainly be given access to media type image/svg+xml, which is in all respects a standard image subtype.
It just happens to use XML to describe its syntax. The two aspects of the media type are completely orthogonal.

XML-based data types will most likely be registered in ALL top-level categories. Potential, though currently unregistered, examples could include application/mathml+xml [MathML], model/uml+xml [UML], and image/svg+xml [SVG].

A.4. Why not just have the MIME processor 'sniff' the content to determine whether it is XML?

Rather than explicitly labeling XML-based media types, the processor could look inside each entity and see whether or not it is XML. The processor could also cache a list of XML-based media types.

Although this method might work acceptably for some mail applications, it would fail completely in many other uses of MIME. For instance, an XML-based web crawler would have no way of determining whether a file is XML except to fetch it and check. The same issue applies in some IMAP4 [RFC3501] mail applications, where the client first fetches the MIME type as part of the message structure and then decides whether to fetch the MIME entity. Requiring these fetches just to determine whether the MIME type is XML could have significant bandwidth and latency disadvantages in many situations.

Sniffing XML also isn't as simple as it might seem. DOCTYPE declarations aren't required, and they can appear fairly deep into a document under certain unpreventable circumstances. (E.g., the XML declaration, comments, and processing instructions can occupy space before the DOCTYPE declaration.) Even sniffing the DOCTYPE isn't completely reliable, thanks to a variety of issues involving default values for namespaces within external DTDs and overrides inside the internal DTD subset. Finally, the variety in potential character encodings (something XML provides tools to deal with) also makes reliable sniffing less likely.
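By contrast with sniffing, the '+xml' convention lets an agent decide from the Content-Type label alone, before fetching the entity. The following is a minimal sketch in Python, assuming that application/xml, text/xml, and any subtype ending in '+xml' are the labels a given agent chooses to treat as XML; the function name is_xml_media_type is hypothetical.

```python
def is_xml_media_type(content_type: str) -> bool:
    """Decide from a Content-Type value alone whether the entity is XML-based.

    Parameters such as "; charset=utf-8" are ignored, and the comparison is
    case-insensitive because media type names are case-insensitive.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    top, slash, subtype = media_type.partition("/")
    if not top or not slash or not subtype:
        return False  # not a well-formed type/subtype pair
    if media_type in ("application/xml", "text/xml"):
        return True
    # Any subtype with a non-empty name followed by the "+xml" suffix.
    return subtype.endswith("+xml") and len(subtype) > len("+xml")


for label in ("image/svg+xml", "Application/XML; charset=utf-8", "image/png"):
    print(label, "->", is_xml_media_type(label))
```

A crawler or IMAP client can apply such a check to a label it already holds, avoiding the extra fetch described above; no inspection of DOCTYPE declarations or character encodings is needed.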
A.5. Why not use a MIME parameter to specify that a media type uses XML syntax?

For example, one could use "Content-Type: application/iotp; alternate-type=text/xml" or "Content-Type: application/iotp; syntax=xml".

Section 5 of [RFC2045] says that "Parameters are modifiers of the media subtype, and as such do not fundamentally affect the nature of the content". However, all XML-based media types are by their nature always XML. Parameters, as they have been defined in the MIME architecture, are never invariant across all instantiations of a media type.

More practically, very few if any MIME dispatchers and other MIME agents support dispatching off a parameter. While MIME agents on the receiving side will need to be updated in either case to support (or fall back to) generic XML processing, it has been suggested that it is easier to implement this functionality when acting off the media type rather than a parameter. More importantly, sending agents require no update to properly tag an image as "image/svg+xml", but few if any sending agents currently support always tagging certain content types with a parameter.

A.6. How about labeling with parameters in the other direction (e.g., application/xml; Content-Feature=iotp)?

This proposal fails in the simplest case: a user with neither knowledge of XML nor an XML-capable MIME dispatcher. In that case, the user's MIME dispatcher is likely to dispatch the content to an XML processing application, when the correct default behavior would be to dispatch the content to the application responsible for the content type (e.g., an ecommerce engine for application/iotp+xml [RFC2801], once this media type is registered).
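The failure described in A.5 and A.6 can be made concrete with a dispatcher that, like most of those mentioned above, keys only on type/subtype and never consults parameters. This is a minimal sketch in Python; the registry contents, the handler names, and the generic XML fallback policy are all hypothetical.

```python
from typing import Dict


def choose_handler(content_type: str, registry: Dict[str, str]) -> str:
    """Pick a handler the way a typical MIME dispatcher does: by type/subtype only.

    registry maps lower-case media types (without parameters) to handler names.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()  # parameters dropped
    if media_type in registry:
        return registry[media_type]
    if media_type.endswith("+xml") and "application/xml" in registry:
        # Unregistered XML-based type: fall back to generic XML processing.
        return registry["application/xml"]
    return "save-to-disk"  # hypothetical default for unknown types


registry = {
    "application/xml": "generic-xml-viewer",
    "application/iotp+xml": "ecommerce-engine",
}
print(choose_handler("application/iotp+xml", registry))
print(choose_handler("application/xml; Content-Feature=iotp", registry))
print(choose_handler("image/svg+xml", registry))
```

The second call shows the A.6 problem: the iotp hint lives in a parameter the dispatcher never looks at, so the content goes to the generic XML handler instead of the ecommerce engine. The third call shows what the suffix enables: an XML-based type with no registered handler can still reach generic XML processing.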
Note that even if the user had already installed the appropriate application (e.g., the ecommerce engine), and that installation had updated the MIME registry, many operating-system-level MIME registries such as .mailcap in Unix and HKEY_CLASSES_ROOT in Windows do not currently support dispatching off a parameter, and cannot easily be upgraded to do so. And, even if the operating system were upgraded to support this, each MIME dispatcher would also separately need to be upgraded.

A.7. How about a new superclass MIME parameter that is defined to apply to all MIME types (e.g., Content-Type: application/iotp; $superclass=xml)?

This combines the problems of Appendix A.5 and Appendix A.6.

If the sender attaches an image/svg+xml file to a message and includes the instructions "Please copy the French text on the road sign", someone with an XML-aware MIME client and an XML browser but no support for SVG can still probably open the file and copy the text. By contrast, with superclasses, the sender must add superclass support to her existing mailer AND the receiver must add superclass support to his before this transaction can work correctly.

If the receiver comes to rely on the superclass tag being present and applications are deployed relying on that tag (as always seems to happen), then only upgraded senders will be able to interoperate with those receiving applications.

A.8. What about adding a new parameter to the Content-Disposition header or creating a new Content-Structure header to indicate XML syntax?

This has nearly identical problems to Appendix A.7, in that it requires both senders and receivers to be upgraded, and few if any operating systems and MIME dispatchers support working off anything other than the MIME type.

A.9. How about a new Alternative-Content-Type header?
This is better than Appendix A.8, in that no extra functionality needs to be added to a MIME registry to support dispatching of information other than standard content types. However, it still requires both sender and receiver to be upgraded, and it will also fail in many cases (e.g., web hosting on an outsourced server) where the user can set MIME types (often through implicit mapping to file extensions) but has no way of adding arbitrary HTTP headers.

A.10. How about using a conneg tag instead (e.g., accept-features: (syntax=xml))?

When the conneg protocol is fully defined, this may be a reasonable thing to do. But given the limited current state of conneg [RFC2703] development, it is not a credible replacement for a MIME-based solution.

Also, note that adding a content-type parameter doesn't work with conneg either, since conneg only deals with media types, not their parameters. This is another illustration of the limits of parameters for MIME dispatchers.

A.11. How about a third-level content-type, such as text/xml/rdf?

MIME explicitly defines two levels of content type: the top level for the kind of content and the second level for the specific media type. [RFC4288] and [RFC4289] extend this in an interoperable way by using prefixes to specify separate trees for IETF, vendor, and personal registrations. This specification also extends the two-level type by using the '+xml' suffix. In both cases, processors that are unaware of these later specifications treat them as opaque and continue to interoperate. By contrast, adding a third-level type would break the current MIME architecture and cause numerous interoperability failures.

A.12. Why use the plus ('+') character for the suffix '+xml'?
As specified in Section 5.1 of [RFC2045], a tspecial can't be used:

   tspecials :=  "(" / ")" / "<" / ">" / "@" /
                 "," / ";" / ":" / "\" / <">
                 "/" / "[" / "]" / "?" / "="

It was thought that "." would not be a good choice since it is already used as an additional hierarchy delimiter. Also, "*" has a common wildcard meaning, and "-" and "_" are common word separators and easily confused. The characters %'`#& are frequently used for quoting or comments and so are not ideal.

That leaves: ~!$^+{}|

Note that "-" is used heavily in the current registry. "$" and "_" are used once each. The others are currently unused.

It was thought that '+' expressed the semantics that a MIME type can be treated (for example) as both scalable vector graphics AND ALSO as XML; it is both simultaneously.

A.13. What is the semantic difference between application/foo and application/foo+xml?

MIME processors that are unaware of XML will treat the '+xml' suffix as completely opaque, so it is essential that no extra semantics be assigned to its presence. Therefore, application/foo and application/foo+xml SHOULD be treated as completely independent media types. For example, although text/calendar+xml could be an XML version of text/calendar [RFC2445], it is possible that this (hypothetical) new media type would include new semantics as well as new syntax, and in any case, there would be many applications that support text/calendar but have not yet been upgraded to support text/calendar+xml.

A.14. What happens when an even better markup language (e.g., EBML) is defined, or a new category of data?

In the ten years that MIME has existed, XML is the first generic data format that has seemed to justify special treatment, so it is hoped that no further suffixes will be necessary.
However, if some are later defined, and these documents were also XML, they would need to specify that the '+xml' suffix is always the outermost suffix (e.g., application/foo+ebml+xml, not application/foo+xml+ebml). If they were not XML, then they would use a regular suffix (e.g., application/foo+ebml).

A.15. Why must I use the '+xml' suffix for my new XML-based media type?

You don't have to, but unless you have a good reason to explicitly disallow generic XML processing, you should use the suffix so as not to curtail the options of future users and developers.

Whether the inventors of a media type, today, design it for dispatch to generic XML processing machinery (and most won't) is not the critical issue. The core notion is that the knowledge that some media type happens to use XML syntax opens the door to unanticipated kinds of processing beyond those envisioned by its inventors, and on this basis identifying such encoding is a good and useful thing.

Developers of new media types are often tightly focused on a particular type of processing that meets current needs. But there is no need to rule out generic processing as well, which could make your media type more valuable over time. It is believed that registering with the '+xml' suffix will cause no interoperability problems whatsoever, while it may enable significant new functionality and interoperability now and in the future. So, the conservative approach is to include the '+xml' suffix.

A.16. Why not redefine text/xml instead of deprecating it?

Since many XML processors do not follow RFC 3023 (they treat the encoding declaration in the XML declaration as authoritative), it has been suggested that text/xml be redefined to follow the same behavior as application/xml in this specification.
However, this pragmatic solution would not be compatible with the definition of the text/* type for non-HTTP transports.

Appendix B. Changes from RFC 3023

There are numerous and significant differences between this specification and [RFC3023], which it obsoletes. This appendix summarizes the major differences only.

First, text/xml and text/xml-external-parsed-entity are deprecated. Second, XPointer ([XPointerFramework], [XPointerElement], and [XPointerXmlns]) has been added as fragment identifier syntax for "application/xml", and the XPointer Registry ([XPtrReg]) is mentioned. Third, [XBase] has been added as a mechanism for specifying base URIs. Fourth, the language regarding charsets was updated to correspond to the W3C TAG finding "Internet Media Type registration, consistency of use" [TAGMIME]. Fifth, many references have been updated.

Appendix C. Acknowledgements

This document reflects the input of numerous participants to the ietf-xml-mime@imc.org mailing list, though any errors are the responsibility of the authors. Special thanks to:

Mark Baker, James Clark, Dan Connolly, Martin Duerst, Ned Freed, Yaron Goland, Rick Jelliffe, Larry Masinter, David Megginson, Keith Moore, Chris Newman, Gavin Nicol, Marshall Rose, Jim Whitehead, and participants of the XML activity at the W3C.

Jim Whitehead and Simon St.Laurent were the editors of [RFC2376] and [RFC3023], respectively.
Authors' Addresses

MURATA Makoto (FAMILY Given)
IBM Tokyo Research Laboratory
1623-14, Shimotsuruma
Yamato-shi, Kanagawa-ken 242-8502
Japan

Phone: +81-46-215-4678
Email: eb2m-mrt@asahi-net.or.jp

Dan Kohn
skymoon ventures
3045 Park Boulevard
Palo Alto, California 94306
USA

Phone: +1-650-327-2600
Email: dan@dankohn.com
URI: http://www.dankohn.com/

Chris Lilley
World Wide Web Consortium
2004, Route des Lucioles - B.P. 93 06902
Sophia Antipolis Cedex
France

Email: chris@w3.org
URI: http://www.w3.org/People/chris/