idnits 2.17.1 draft-ietf-appsawg-xml-mediatypes-10.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document obsoletes RFC3023, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC6839, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 07, 2014) is 3671 days in the past. 
Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '2' on line 1246 -- Looks like a reference, but probably isn't: '3' on line 1252 -- Looks like a reference, but probably isn't: '1' on line 1259 -- Looks like a reference, but probably isn't: '4' on line 1267 -- Looks like a reference, but probably isn't: '5' on line 1275 -- Looks like a reference, but probably isn't: '8' on line 1296 -- Looks like a reference, but probably isn't: '9' on line 1303 -- Looks like a reference, but probably isn't: '11' on line 1310 -- Looks like a reference, but probably isn't: '6' on line 1318 -- Looks like a reference, but probably isn't: '7' on line 1359 -- Looks like a reference, but probably isn't: '13' on line 1367 -- Looks like a reference, but probably isn't: '12' on line 1374 -- Looks like a reference, but probably isn't: '14' on line 1381 -- Looks like a reference, but probably isn't: '15' on line 1388 -- Looks like a reference, but probably isn't: '10' on line 1394 == Unused Reference: 'ASCII' is defined on line 1287, but no explicit reference was found in the text == Outdated reference: A later version (-26) exists of draft-ietf-httpbis-p1-messaging-25 -- Possible downref: Non-RFC (?) normative reference: ref. 'IANA-charsets' ** Downref: Normative reference to an Informational RFC: RFC 2781 ** Downref: Normative reference to an Informational RFC: RFC 6839 -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' -- Possible downref: Non-RFC (?) normative reference: ref. 'XMLBase' -- Possible downref: Non-RFC (?) normative reference: ref. 
'XML' -- Possible downref: Non-RFC (?) normative reference: ref. 'XPointerElement' -- Possible downref: Non-RFC (?) normative reference: ref. 'XPointerFramework' -- Possible downref: Non-RFC (?) normative reference: ref. 'XPtrRegPolicy' -- Possible downref: Non-RFC (?) normative reference: ref. 'XPtrReg' -- Obsolete informational reference (is this intentional?): RFC 2376 (Obsoleted by RFC 3023) -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 3023 (Obsoleted by RFC 7303) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 30 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group H. Thompson 3 Internet-Draft University of Edinburgh 4 Obsoletes: 3023 (if approved) C. Lilley 5 Updates: 6839 (if approved) W3C 6 Intended status: Standards Track April 07, 2014 7 Expires: October 9, 2014 9 XML Media Types 10 draft-ietf-appsawg-xml-mediatypes-10 12 Abstract 14 This specification standardizes three media types -- application/xml, 15 application/xml-external-parsed-entity, and application/xml-dtd -- 16 for use in exchanging network entities that are related to the 17 Extensible Markup Language (XML) while defining text/xml and text/ 18 xml-external-parsed-entity as aliases for the respective application/ 19 types. This specification also standardizes the '+xml' suffix for 20 naming media types outside of these five types when those media types 21 represent XML MIME entities. 23 Status of This Memo 25 This Internet-Draft is submitted in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). 
Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet- 31 Drafts is at http://datatracker.ietf.org/drafts/current/. 33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on October 9, 2014. 40 Copyright Notice 42 Copyright (c) 2014 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 This document may contain material from IETF Documents or IETF 56 Contributions published or made publicly available before November 57 10, 2008. The person(s) controlling the copyright in some of this 58 material may not have granted the IETF Trust the right to allow 59 modifications of such material outside the IETF Standards Process. 60 Without obtaining an adequate license from the person(s) controlling 61 the copyright in such materials, this document may not be modified 62 outside the IETF Standards Process, and derivative works of it may 63 not be created outside the IETF Standards Process, except to format 64 it for publication as an RFC or to translate it into languages other 65 than English. 67 Table of Contents 69 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 
3 70 2. Notational Conventions . . . . . . . . . . . . . . . . . . . 4 71 2.1. Conformance Keywords . . . . . . . . . . . . . . . . . . 4 72 2.2. Characters, Encodings, Charsets . . . . . . . . . . . . . 4 73 2.3. MIME Entities, XML Entities . . . . . . . . . . . . . . . 4 74 3. Encoding Considerations . . . . . . . . . . . . . . . . . . . 5 75 3.1. XML MIME producers . . . . . . . . . . . . . . . . . . . 6 76 3.2. XML MIME consumers . . . . . . . . . . . . . . . . . . . 6 77 3.3. The Byte Order Mark (BOM) and Encoding Conversions . . . 7 78 4. XML Media Types . . . . . . . . . . . . . . . . . . . . . . . 8 79 4.1. XML MIME Entities . . . . . . . . . . . . . . . . . . . . 9 80 4.2. Using '+xml' when Registering XML-based Media Types . . . 10 81 4.3. Registration Guidelines for XML-based Media Types Not 82 Using '+xml' . . . . . . . . . . . . . . . . . . . . . . 12 83 5. Fragment Identifiers . . . . . . . . . . . . . . . . . . . . 12 84 6. The Base URI . . . . . . . . . . . . . . . . . . . . . . . . 13 85 7. XML Versions . . . . . . . . . . . . . . . . . . . . . . . . 13 86 8. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 14 87 8.1. UTF-8 Charset . . . . . . . . . . . . . . . . . . . . . . 14 88 8.2. UTF-16 Charset . . . . . . . . . . . . . . . . . . . . . 15 89 8.3. Omitted Charset and 8-bit MIME Entity . . . . . . . . . . 15 90 8.4. Omitted Charset and 16-bit MIME Entity . . . . . . . . . 16 91 8.5. Omitted Charset, no Internal Encoding Declaration . . . . 16 92 8.6. UTF-16BE Charset . . . . . . . . . . . . . . . . . . . . 17 93 8.7. Non-UTF Charset . . . . . . . . . . . . . . . . . . . . . 17 94 8.8. INCONSISTENT EXAMPLE: Conflicting Charset and Internal 95 Encoding Declaration . . . . . . . . . . . . . . . . . . 17 96 8.9. INCONSISTENT EXAMPLE: Conflicting Charset and BOM . . . . 18 98 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 99 9.1. Application/xml Registration . . . . . . . . . . . . . . 18 100 9.2. 
Text/xml Registration . . . . . . . . . . . . . . . . . . 20 101 9.3. Application/xml-external-parsed-entity Registration . . . 20 102 9.4. Text/xml-external-parsed-entity Registration . . . . . . 21 103 9.5. Application/xml-dtd Registration . . . . . . . . . . . . 21 104 9.6. The '+xml' Naming Convention for XML-Based Media Types . 22 105 9.6.1. +xml Structured Syntax Suffix Registration . . . . . 22 106 10. Security Considerations . . . . . . . . . . . . . . . . . . . 24 107 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 25 108 11.1. Normative References . . . . . . . . . . . . . . . . . . 26 109 11.2. Informative References . . . . . . . . . . . . . . . . . 28 110 Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types? 30 111 Appendix B. Core XML specifications . . . . . . . . . . . . . . 30 112 Appendix C. Operational considerations . . . . . . . . . . . . . 31 113 C.1. General considerations . . . . . . . . . . . . . . . . . 31 114 C.2. Considerations for producers . . . . . . . . . . . . . . 31 115 C.3. Considerations for consumers . . . . . . . . . . . . . . 32 116 Appendix D. Changes from RFC 3023 . . . . . . . . . . . . . . . 32 117 Appendix E. Acknowledgements . . . . . . . . . . . . . . . . . . 33 118 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 120 1. Introduction 122 The World Wide Web Consortium has issued the Extensible Markup 123 Language (XML) 1.0 [XML] and Extensible Markup Language (XML) 1.1 124 [XML1.1] specifications. To enable the exchange of XML network 125 entities, this specification standardizes three media types -- 126 application/xml, application/xml-external-parsed-entity, and 127 application/xml-dtd and two aliases -- text/xml and text/xml- 128 external-parsed-entity, as well as a naming convention for 129 identifying XML-based MIME media types (using '+xml'). 131 XML has been used as a foundation for other media types, including 132 types in every branch of the IETF media types tree. 
To facilitate 133 the processing of such types, and in line with the recognition in 134 [RFC6838] of structured syntax name suffixes, a suffix of '+xml' is 135 registered in Section 9.6. This will allow generic XML-based tools 136 -- browsers, editors, search engines, and other processors -- to work 137 with all XML-based media types. 139 This specification replaces [RFC3023]. Major differences are in the 140 areas of alignment of text/xml and text/xml-external-parsed-entity 141 with application/xml and application/xml-external-parsed-entity 142 respectively, the addition of XPointer and XML Base as fragment 143 identifiers and base URIs, respectively, integration of the XPointer 144 Registry and updating of many references. 146 2. Notational Conventions 148 2.1. Conformance Keywords 150 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 151 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 152 "OPTIONAL" in this specification are to be interpreted as described 153 in [RFC2119]. 155 2.2. Characters, Encodings, Charsets 157 Both XML (in an XML or Text declaration using the encoding pseudo- 158 attribute) and MIME (in a Content-Type header field using the charset 159 parameter) use a common set of labels [IANA-charsets] to identify the 160 MIME charset (mapping from byte stream to character sequence 161 [RFC2978]). 163 In this specification we will use the phrases "charset parameter" and 164 "encoding declaration" to refer to whatever MIME charset is specified 165 by a MIME charset parameter or XML encoding declaration respectively. 166 We reserve the phrase "character encoding" (or, when the context 167 makes the intention clear, simply "encoding") for the MIME charset 168 actually used in a particular XML MIME entity. 170 [UNICODE] defines three "encoding forms", namely UTF-8, UTF-16, and 171 UTF-32. 
As UTF-8 can only be serialized in one way, the only 172 possible label for UTF-8-encoded documents when serialised into MIME 173 entities is "utf-8". UTF-16 XML documents, however, can be 174 serialised into MIME entities in one of two ways: either big-endian, 175 labelled (optionally) "utf-16" or "utf-16be", or little-endian, 176 labelled (optionally) "utf-16" or "utf-16le". See Section 3.3 below 177 for how a Byte Order Mark (BOM) is required when the "utf-16" 178 serialization is used. 180 UTF-32 has four potential serializations, of which only two (UTF-32BE 181 and UTF-32LE) are given names in [UNICODE]. Support for the various 182 serializations varies widely, and security concerns about their use 183 have been raised (see for example [Sivonen]). The use of UTF-32 is 184 NOT RECOMMENDED for XML MIME entities. 186 2.3. MIME Entities, XML Entities 188 As sometimes happens between two communities, both MIME and XML have 189 defined the term entity, with different meanings. Section 2.4 of 190 [RFC2045] says: 192 "The term 'entity' refers specifically to the MIME-defined header 193 fields and contents of either a message or one of the parts in the 194 body of a multipart entity." 196 Section 4 of [XML] says: 198 "An XML document may consist of one or many storage units. These 199 are called entities; they all have content and are all (except for 200 the document entity and the external DTD subset) identified by 201 entity name". 203 In this specification, "XML MIME entity" is defined as the latter (an 204 XML entity) encapsulated in the former (a MIME entity). 206 Furthermore, XML provides for the naming and referencing of entities 207 for purposes of inclusion and/or substitution. In this specification 208 "XML-entity declaration/reference/..." is used to avoid confusion 209 when referring to such cases. 211 3. Encoding Considerations 213 The registrations below all address issues around character encoding 214 in the same way, by referencing this section. 
216 As many as three distinct sources of information about character 217 encoding may be present for an XML MIME entity: a charset parameter, 218 a Byte Order Mark (BOM -- see Section 3.3 below) and an XML encoding 219 declaration (see Section 4.3.3 of [XML]). Ensuring consistency among 220 these sources requires coordination between entity authors and MIME 221 agents (that is, processes which package, transfer, deliver and/or 222 receive MIME entities). 224 The use of UTF-8, without a BOM, is RECOMMENDED for all XML MIME 225 entities. 227 Some MIME agents will be what we will call "XML-aware", that is, 228 capable of processing XML MIME entities as XML and detecting the XML 229 encoding declaration (or its absence). All three sources of 230 information about encoding are available to them, and they can be 231 expected to be aware of this spec. 233 Other MIME agents will not be XML-aware, and thus cannot know 234 anything about the XML encoding declaration. Not only do they lack 235 one of the three sources of information about encoding, they are also 236 less likely to be aware of or responsive to this spec. 238 Some MIME agents, such as proxies and transcoders, both consume and 239 produce MIME entities. 241 This mixture of two kinds of agents handling XML MIME entities 242 increases the complexity of the coordination task. The 243 recommendations given below are intended to maximise interoperability 244 in the face of this, by on the one hand mandating consistent 245 production and encouraging maximally robust forms of production, and 246 on the other specifying recovery strategies to maximize the 247 interoperability of consumers when the production rules are broken. 249 3.1. XML MIME producers 251 XML-aware MIME producers SHOULD supply a charset parameter and/or an 252 appropriate BOM with non-UTF-8-encoded XML MIME entities which lack 253 an encoding declaration. 
Such producers SHOULD remove or correct an 254 encoding declaration which is known to be incorrect (for example, as 255 a result of transcoding). 257 XML-aware MIME producers MUST supply an XML text declaration at the 258 beginning of non-UNICODE XML external parsed entities which would 259 otherwise begin with the hexadecimal octet sequences 0xFE 0xFF, 0xFF 260 0xFE or 0xEF 0xBB 0xBF, in order to avoid the mistaken detection of a 261 BOM. 263 XML-unaware MIME producers MUST NOT supply a charset parameter with 264 an XML MIME entity unless the entity's character encoding is reliably 265 known. Note that this is particularly relevant for central 266 configuration of web servers, where configuring a default for the 267 charset parameter will almost certainly violate this requirement. 269 XML MIME producers are RECOMMENDED to provide means for users to 270 control what value, if any, is given to charset parameters for XML 271 MIME entities, for example by giving users control of the 272 configuration of Web server filename-to-Content-Type-header mappings 273 on a file-by-file or suffix basis. 275 3.2. XML MIME consumers 277 For XML MIME consumers, the question of priority arises in cases when 278 the available character encoding information is not consistent. 279 Again, we must distinguish between XML-aware and XML-unaware agents. 281 When a charset parameter is specified for an XML MIME entity, the 282 normative component of the [XML] specification leaves the question 283 open as to how to determine the encoding with which to attempt to 284 process the entity. This is true independently of whether or not the 285 entity contains in-band encoding information, that is, either a BOM 286 (Section 3.3) or an XML encoding declaration, or both, or neither. 287 In particular, in the case where there is in-band information and it 288 conflicts with the charset parameter, the [XML] specification does 289 not specify which is authoritative. 
In its (non-normative) 290 Appendix F it defers to this specification: 292 [T]he preferred method of handling conflict should be specified as 293 part of the higher-level protocol used to deliver XML. In 294 particular, please refer to [IETF RFC 3023] or its successor... 296 Accordingly, to conform with deployed processors and content and to 297 avoid conflicting with this or other normative specifications, this 298 specification sets the priority as follows: 300 A BOM (Section 3.3) is authoritative if it is present in an XML 301 MIME entity; 303 In the absence of a BOM (Section 3.3), the charset parameter is 304 authoritative if it is present. 306 Whenever the above determines a source of encoding information as 307 authoritative, consumers SHOULD process XML MIME entities based on 308 that information. 310 When MIME producers conform to the requirements stated above 311 (Section 3.1, Section 3) inconsistencies will not arise---the above 312 statement of priorities only has practical impact in the case of non- 313 conforming XML MIME entities. In the face of inconsistencies, no 314 uniform strategy can deliver the 'right' answer every time: the 315 purpose of specifying one here is to encourage convergence over time, 316 first on the part of consumers, then on the part of producers. 318 For XML-aware consumers, note that Section 4.3.3 of [XML] does _not_ 319 make it an error for the charset parameter and the XML encoding 320 declaration (or the UTF-8 default in the absence of encoding 321 declaration and BOM) to be inconsistent, although such consumers 322 might choose to issue a warning in this case. 324 If an XML MIME entity is received where the charset parameter is 325 omitted, no information is being provided about the character 326 encoding by the MIME Content-Type header. XML-aware consumers MUST 327 follow the requirements in section 4.3.3 of [XML] that directly 328 address this case. 
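The priority order set out in Section 3.2 (a BOM first, then the charset parameter, then the rules of [XML]) can be illustrated with a minimal Python sketch. It is not part of this specification. The function name choose_encoding and the returned "source" tags are hypothetical, and the fallback step is a deliberate simplification: it only recognises an encoding declaration in ASCII-compatible encodings, whereas a real XML processor would apply the full rules of Section 4.3.3 of [XML].

```python
import codecs
import re
from typing import Optional, Tuple

# BOM signatures (Section 3.3). The UTF-32 entries come first so that the
# UTF-32LE signature FF FE 00 00 is not mistaken for the UTF-16LE one (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]

# Simplified matcher for an encoding declaration at the very start of the
# entity. Assumption: the entity is in an ASCII-compatible encoding.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def choose_encoding(body: bytes, charset: Optional[str] = None) -> Tuple[str, str]:
    """Return (encoding, source) for an XML MIME entity.

    `charset` is the value of the Content-Type charset parameter, if any.
    For a BOM, the returned name is the detected serialization; the BOM
    itself is a signature and still has to be skipped when decoding.
    """
    # 1. A BOM is authoritative if present.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "bom"
    # 2. Otherwise the charset parameter is authoritative if present.
    if charset:
        return charset.lower(), "charset"
    # 3. Otherwise fall back to in-band information (simplified).
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower(), "declaration"
    return "utf-8", "default"
```

For instance, an entity that begins with the octets 0xEF 0xBB 0xBF but is served with charset="iso-8859-1" would be processed as UTF-8 under this sketch, which corresponds to the inconsistent case covered by the priority rules above.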
XML-unaware MIME consumers SHOULD NOT assume a 329 default encoding in this case. 331 3.3. The Byte Order Mark (BOM) and Encoding Conversions 333 Section 4.3.3 of [XML] specifies that UTF-16 XML MIME entities not 334 labelled as "utf-16le" or "utf-16be" MUST begin with a byte order 335 mark (BOM), U+FEFF, which appears as the hexadecimal octet sequence 336 0xFE 0xFF (big-endian) or 0xFF 0xFE (little-endian). [XML] further 337 states that the BOM is an encoding signature, and is not part of 338 either the markup or the character data of the XML document. 340 Due to the presence of the BOM, applications that convert XML from 341 UTF-16 to an encoding other than UTF-8 MUST strip the BOM before 342 conversion. Similarly, when converting from another encoding into 343 UTF-16, either without a charset parameter, or labelled "utf-16", the 344 BOM MUST be added unless the original encoding was UTF-8 and a BOM 345 was already present, in which case it MUST be transcoded into the 346 appropriate UTF-16 BOM. 348 Section 4.3.3 of [XML] also allows for UTF-8 XML MIME entities to 349 begin with a BOM, which appears as the hexadecimal octet sequence 350 0xEF 0xBB 0xBF. This is likewise defined to be an encoding 351 signature, and not part of either the markup or the character data of 352 the XML document. 354 Applications that convert XML from UTF-8 to an encoding other than 355 UTF-16 MUST strip the BOM, if present, before conversion. 356 Applications which convert XML into UTF-8 MAY add a BOM. 358 In addition to the MIME charset "utf-16", [RFC2781] introduces "utf- 359 16le" (little endian) and "utf-16be" (big endian). When an XML MIME 360 entity is encoded in "utf-16le" or "utf-16be", it MUST NOT begin with 361 the BOM but SHOULD contain an in-band XML encoding declaration. 362 Conversion from UTF-8 or UTF-16 (unlabelled, or labelled with 363 "utf-16") to "utf-16be" or "utf-16le" MUST strip a BOM if present. 
364 Conversion from UTF-16 labelled "utf-16le" or "utf-16be" to UTF-16 365 without a label or labelled "utf-16" MUST add the appropriate BOM. 366 Conversion from UTF-16 labelled "utf-16le" or "utf-16be" to UTF-8 MAY 367 add a UTF-8 BOM, but this is NOT RECOMMENDED. 369 Appendix F of [XML] also implies that a UTF-32 BOM may be used in 370 conjunction with UTF-32-encoded documents. As noted above, this 371 specification recommends against the use of UTF-32, but if it is 372 used, the same considerations apply with respect to its being a 373 signature, not part of the document, with respect to transcoding into 374 or out of it and with respect to the MIME charsets "utf-32le" and 375 "utf-32be", as for UTF-16. Consumers which do not support UTF-32 376 SHOULD none-the-less recognise UTF-32 signatures in order to give 377 helpful error messages (instead of treating them as invalid UTF-16). 379 4. XML Media Types 380 4.1. XML MIME Entities 382 Within the XML specification, XML MIME entities can be classified 383 into four types. In the XML terminology, they are called "document 384 entities", "external DTD subsets", "external parsed entities", and 385 "external parameter entities". Appropriate usage for the types 386 registered below is as follows: 388 document entities: The media types application/xml or text/xml, or a 389 more specific media type (see Section 9.6), SHOULD be used. 391 external DTD subsets: The media type application/xml-dtd SHOULD be 392 used. The media types application/xml and text/xml MUST NOT be 393 used. 395 external parsed entities: The media types application/xml-external- 396 parsed-entity or text/xml-external-parsed-entity SHOULD be used. 397 The media types application/xml and text/xml MUST NOT be used 398 unless the parsed entities are also well-formed "document 399 entities". 401 external parameter entities: The media type application/xml-dtd 402 SHOULD be used. The media types application/xml and text/xml MUST 403 NOT be used. 
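The usage rules in the list above can be restated as a small lookup, shown here as a non-normative Python sketch. The names RECOMMENDED, GENERIC and check_label, and the three result strings, are hypothetical. Treating application/xml on an external parsed entity that is also a well-formed document entity as "discouraged" is one reading of the SHOULD in that list item, not something the text states outright.

```python
# Media types that SHOULD be used for each kind of XML MIME entity.
RECOMMENDED = {
    "document entity": ("application/xml", "text/xml"),
    "external DTD subset": ("application/xml-dtd",),
    "external parsed entity": (
        "application/xml-external-parsed-entity",
        "text/xml-external-parsed-entity",
    ),
    "external parameter entity": ("application/xml-dtd",),
}

# The generic types that MUST NOT label anything but document entities.
GENERIC = {"application/xml", "text/xml"}


def check_label(kind: str, media_type: str, also_document_entity: bool = False) -> str:
    """Classify a label as 'ok', 'discouraged' or 'forbidden' for `kind`."""
    media_type = media_type.lower()
    if media_type in RECOMMENDED[kind]:
        return "ok"
    if kind == "document entity":
        # "A more specific media type" is approximated here by the '+xml'
        # suffix; whether a given type is really appropriate is not checked.
        return "ok" if media_type.endswith("+xml") else "discouraged"
    if media_type in GENERIC:
        # The only exception to the MUST NOT: external parsed entities
        # that are also well-formed document entities.
        if kind == "external parsed entity" and also_document_entity:
            return "discouraged"
        return "forbidden"
    return "discouraged"
```

A producer-side check of this kind could be used, for example, to warn when a DTD file is about to be served as text/xml.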
405 Note that [RFC3023] (which this specification obsoletes) recommended 406 the use of text/xml and text/xml-external-parsed-entity for document 407 entities and external parsed entities, respectively, but described 408 handling of character encoding which differed from common 409 implementation practice. These media types are still commonly used, 410 and this specification aligns the handling of character encoding with 411 industry practice. 413 Note that [RFC2376] (which is obsolete) allowed application/xml and 414 text/xml to be used for any of the four types, although in practice 415 it is likely to have been rare. 417 Neither external DTD subsets nor external parameter entities parse as 418 XML documents, and while some XML document entities may be used as 419 external parsed entities and vice versa, there are many cases where 420 the two are not interchangeable. XML also has unparsed entities, 421 internal parsed entities, and internal parameter entities, but they 422 are not XML MIME entities. 424 Compared to [RFC2376] or [RFC3023], this specification alters the 425 handling of character encoding of text/xml and text/xml-external- 426 parsed-entity, treating them no differently from the respective 427 application/ types. However application/xml and application/xml- 428 external-parsed-entity are still RECOMMENDED, to avoid possible 429 confusion based on the earlier distinction. The former confusion 430 around the question of default character sets for the two text/ types 431 no longer arises because 433 [HTTPbis] changes [RFC2616] by removing the ISO-8859-1 default and 434 not defining any default at all; 436 [RFC6657] updates [RFC2046] to remove the US-ASCII default. 438 See Section 3 for the now-unified approach to the charset parameter 439 which results. 441 XML provides a general framework for defining sequences of structured 442 data. 
It is often appropriate to define new media types that use XML 443 but define a specific application of XML, due to domain-specific 444 display, editing, security considerations or runtime information. 445 Furthermore, such media types may allow only UTF-8 and/or UTF-16 and 446 prohibit other character sets. This specification does not prohibit 447 such media types and in fact expects them to proliferate. However, 448 developers of such media types are RECOMMENDED to use this 449 specification as a basis for their registration. See Section 4.2 for 450 more detailed recommendations on using the '+xml' suffix for 451 registration of such media types. 453 An XML document labeled as application/xml or text/xml, or with a 454 '+xml' media type, might contain namespace declarations, stylesheet- 455 linking processing instructions (PIs), schema information, or other 456 declarations that might be used to suggest how the document is to be 457 processed. For example, a document might have the XHTML namespace 458 and a reference to a CSS stylesheet. Such a document might be 459 handled by applications that would use this information to dispatch 460 the document for appropriate processing. Appendix B lists the core 461 XML specifications which, taken together with [XML] itself, show how 462 to determine an XML document's language-level semantics and suggest 463 how information about its application-level semantics may be 464 locatable. 466 4.2. Using '+xml' when Registering XML-based Media Types 468 In Section 9.6, this specification updates the [RFC6839] registration 469 for XML-based MIME types (the '+xml' types). 471 When a new media type is introduced for an XML-based format, the name 472 of the media type SHOULD end with '+xml' unless generic XML 473 processing is in some way inappropriate for documents of the new 474 type. 
This convention will allow applications that can process XML 475 generically to detect that the MIME entity is supposed to be an XML 476 document, verify this assumption by invoking some XML processor, and 477 then process the XML document accordingly. Applications may check 478 for types that represent XML MIME entities by comparing the last four 479 characters of the subtype to the string '+xml'. (However note that 4 480 of the 5 media types defined in this specification -- text/xml, 481 application/xml, text/xml-external-parsed-entity, and application/ 482 xml-external-parsed-entity -- also represent XML MIME entities while 483 not ending with '+xml'.) 485 NOTE: Section 5.3.2 of HTTPbis [HTTPbis] does not support any form 486 of Accept header which will match only '+xml' types. In 487 particular, Accept headers of the form "Accept: */*+xml" are not 488 allowed, and will not work for this purpose. 490 Media types following the naming convention '+xml' SHOULD define the 491 charset parameter for consistency, since XML-generic processing by 492 definition treats all XML MIME entities uniformly as regards 493 character encoding information. However, there are some cases in which 494 the charset parameter need not be defined. For example: 496 When an XML-based media type is restricted to UTF-8, it is not 497 necessary to define the charset parameter. UTF-8 is the default 498 for XML. 500 When an XML-based media type is restricted to UTF-8 and UTF-16, it 501 might not be unreasonable to omit the charset parameter. Neither 502 UTF-8 nor UTF-16 requires XML encoding declarations. 504 XML generic processing is not always appropriate for XML-based media 505 types. For example, authors of some such media types may wish that 506 the types remain entirely opaque except to applications that are 507 specifically designed to deal with that media type. By NOT following 508 the naming convention '+xml', such media types can avoid XML-generic 509 processing. 
Since generic processing will be useful in many cases, 510 however -- including in some situations that are difficult to predict 511 ahead of time -- the '+xml' convention is to be preferred unless 512 there is some particularly compelling reason not to. 514 The registration process for specific '+xml' media types is described 515 in [RFC6838]. New XML-based media type registrations in the IETF 516 must follow these guidelines. When other organisations register XML- 517 based media types via the "Specification Required" IANA registration 518 policy, the relevant Media Reviewer should ensure that they use the 519 '+xml' convention, in order to ensure maximum interoperability of 520 their XML-based documents. Only media subtypes that represent XML 521 MIME entities are allowed to register with a '+xml' suffix. 523 In addition to the changes described above, the change controller has 524 been changed to be the World Wide Web Consortium (W3C). 526 4.3. Registration Guidelines for XML-based Media Types Not Using '+xml' 528 Registrations for new XML-based media types which do _not_ use the 529 '+xml' suffix SHOULD, in specifying the charset parameter and 530 encoding considerations, define them as: "Same as [charset parameter 531 / encoding considerations] of application/xml as specified in RFC 532 XXXX." 534 Defining the charset parameter is RECOMMENDED, since this information 535 can be used by XML processors to determine authoritatively the 536 character encoding of the XML MIME entity in the absence of a BOM. 537 If there are some reasons not to follow this advice, they SHOULD be 538 included as part of the registration. As shown above, two such 539 reasons are "UTF-8 only" or "UTF-8 or UTF-16 only". 541 These registrations SHOULD specify that the XML-based media type 542 being registered has all of the security considerations described in 543 RFC XXXX plus any additional considerations specific to that media 544 type. 
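The media type test described in Section 4.2 (comparing the last four characters of the subtype to '+xml', plus the four types of this specification that represent XML MIME entities without that suffix) can be sketched in Python as follows. The sketch is non-normative and the names GENERIC_XML_TYPES and is_xml_media_type are hypothetical. Note that a media type registered under the guidelines of this section, that is, without the '+xml' suffix, is by design not recognised by such a test.

```python
# The four types defined in this specification that represent XML MIME
# entities without ending in '+xml' (application/xml-dtd is not among them).
GENERIC_XML_TYPES = frozenset({
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
})


def is_xml_media_type(content_type: str) -> bool:
    """Return True if a Content-Type value names an XML MIME entity type."""
    # Drop any parameters (e.g. "; charset=utf-8"); media type names are
    # compared case-insensitively.
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence in GENERIC_XML_TYPES:
        return True
    type_, sep, subtype = essence.partition("/")
    # Require a non-empty name before the suffix, so that a bare "+xml"
    # subtype does not count.
    return bool(type_) and bool(sep) and len(subtype) > 4 and subtype.endswith("+xml")
```

A generic processor might call is_xml_media_type("image/svg+xml; charset=utf-8") before handing the entity body to an XML parser.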
546 These registrations SHOULD also make reference to RFC XXXX in 547 specifying magic numbers, base URIs, and use of the BOM. 549 These registrations MAY reference the application/xml registration in 550 RFC XXXX in specifying interoperability and fragment identifier 551 considerations, if these considerations are not overridden by issues 552 specific to that media type. 554 5. Fragment Identifiers 556 Uniform Resource Identifiers (URIs) can contain fragment identifiers 557 (see Section 3.5 of [RFC3986]). Specifying the syntax and semantics 558 of fragment identifiers is devolved by [RFC3986] to the appropriate 559 media type registration. 561 The syntax and semantics of fragment identifiers for the XML media 562 types defined in this specification are based on the 563 [XPointerFramework] W3C Recommendation. It allows simple names, and 564 more complex constructions based on named schemes. When the syntax 565 of a fragment identifier part of any URI or IRI ([RFC3987]) with a 566 retrieved media type governed by this specification conforms to the 567 syntax specified in [XPointerFramework], conforming applications MUST 568 interpret such fragment identifiers as designating whatever is 569 specified by the [XPointerFramework] together with any other 570 specifications governing the XPointer schemes used in those 571 identifiers which the applications support. Conforming applications 572 MUST support the 'element' scheme as defined in [XPointerElement], 573 but need not support other schemes. 575 If an XPointer error is reported in the attempt to process the part, 576 this specification does not define an interpretation for the part. 578 A registry of XPointer schemes [XPtrReg] is maintained at the W3C. 579 Generic processors of XML MIME entities SHOULD NOT implement 580 unregistered XPointer schemes ([XPtrRegPolicy] describes requirements 581 and procedures for registering schemes). 
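As an informal illustration of the two fragment identifier forms that conforming applications are required to handle (shorthand "simple name" pointers and 'element' scheme pointers), the following Python sketch classifies a fragment identifier string. It is a non-normative sketch under stated assumptions: the function name and the returned labels are hypothetical, NCName is approximated with ASCII characters only (the real production admits many more characters), and pointers with several scheme parts are simply reported as "other".

```python
import re

# ASCII-only approximation of NCName (an assumption of this sketch).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
# Child sequence of the element() scheme: one or more "/n" steps, n >= 1.
CHILD_SEQUENCE = r"(?:/[1-9][0-9]*)+"

SHORTHAND_RE = re.compile(rf"{NCNAME}\Z")
ELEMENT_RE = re.compile(
    rf"element\((?:{NCNAME}(?:{CHILD_SEQUENCE})?|{CHILD_SEQUENCE})\)\Z"
)


def classify_fragment(fragment):
    """Classify a fragment identifier (the text after '#')."""
    if SHORTHAND_RE.match(fragment):
        return "shorthand"   # a simple name such as "intro"
    if ELEMENT_RE.match(fragment):
        return "element"     # e.g. "element(intro/1/2)" or "element(/1/3)"
    return "other"           # another scheme, several parts, or a syntax error
```

A generic processor built along these lines would resolve "shorthand" and "element" fragments itself and would consult the XPointer registry before attempting anything classified as "other".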
583 See Section 4.2 for additional requirements which apply when an XML- 584 based media type follows the naming convention '+xml'. 586 If [XPointerFramework] and [XPointerElement] are inappropriate for 587 some XML-based media type, it SHOULD NOT follow the naming convention 588 '+xml'. 590 When a URI has a fragment identifier, it is encoded by a limited 591 subset of the repertoire of US-ASCII characters; see 592 [XPointerFramework] for details. 594 6. The Base URI 596 An XML MIME entity of type application/xml, text/xml, application/ 597 xml-external-parsed-entity or text/xml-external-parsed-entity MAY use 598 the xml:base attribute, as described in [XMLBase], to embed a base 599 URI in that entity for use in resolving relative URI references (see 600 Section 5.1 of [RFC3986]). 602 Note that the base URI itself might be embedded in a different MIME 603 entity, since the default value for the xml:base attribute can be 604 specified in an external DTD subset or external parameter entity. 605 Since conforming XML processors need not always read and process 606 external entities, the effect of such an external default is 607 uncertain and therefore its use is NOT RECOMMENDED. 609 7. XML Versions 611 application/xml, application/xml-external-parsed-entity, 612 application/xml-dtd, text/xml, and text/xml-external-parsed-entity are 613 to be used with [XML]. In all examples herein where version="1.0" is 614 shown, it is understood that version="1.1" might also appear, 615 provided the content does indeed conform to [XML1.1]. 617 The normative requirement of this specification upon XML documents 618 and processors is to follow the requirements of [XML], section 4.3.3. 620 Except for minor clarifications, that section is substantially 621 identical from the first edition to the current (5th) edition of XML 622 1.0, and for XML 1.1 1st or 2nd edition [XML1.1].
Therefore, 623 references herein to [XML] may be interpreted as referencing any 624 existing version or edition of XML, or any subsequent edition or 625 version which makes no incompatible changes to that section. 627 Specifications and recommendations based on or referring to this RFC 628 SHOULD indicate any limitations on the particular versions or 629 editions of XML to be used. 631 8. Examples 633 This section is non-normative. In particular, note that all 634 [RFC2119] language herein reproduces or summarizes the consequences 635 of normative statements already made above, and has no independent 636 normative force, and accordingly does not appear in uppercase. 638 The examples below give the MIME Content-type header, including the 639 charset parameter, if present and the XML declaration or Text 640 declaration (which includes the encoding declaration) inside the XML 641 MIME entity. For UTF-16 examples, the Byte Order Mark character 642 appropriately UTF-16-encoded is denoted as "{BOM}", and the XML or 643 Text declaration is assumed to come at the beginning of the XML MIME 644 entity, immediately following the encoded BOM. Note that other MIME 645 headers may be present, and the XML MIME entity will normally contain 646 other data in addition to the XML declaration; the examples focus on 647 the Content-type header and the encoding declaration for clarity. 649 Although they show a content type of 'application/xml', all the 650 examples below apply to all five media types declared below in 651 Section 9, as well as to any media types declared using the '+xml' 652 convention (with the exception of the examples involving the charset 653 parameter for any such media types which do not enable its use). See 654 the XML MIME entities table (Section 4.1, Paragraph 1) for discussion 655 of which types are appropriate for which varieties of XML MIME 656 entity. 658 8.1. 
UTF-8 Charset 660 Content-Type: application/xml; charset=utf-8 662 664 or 666 667 UTF-8 is the recommended encoding for use with all the media types 668 defined in this specification. Since the charset parameter is 669 provided and there is no overriding BOM, conformant MIME and XML 670 processors must treat the enclosed entity as UTF-8 encoded. 672 If sent using a 7-bit transport (e.g. SMTP [RFC5321]), in general, a 673 UTF-8 XML MIME entity must use a content-transfer-encoding of either 674 quoted-printable or base64. For an 8-bit clean transport (e.g. 675 8BITMIME ESMTP or NNTP), or a binary clean transport (e.g. BINARY 676 ESMTP or HTTP), no content-transfer-encoding is necessary (or even 677 possible, in the case of HTTP). 679 8.2. UTF-16 Charset 681 Content-Type: application/xml; charset=utf-16 683 {BOM} 685 or 687 {BOM} 689 For the three application/ media types defined above, if sent using a 690 7-bit transport (e.g. SMTP) or an 8-bit clean transport (e.g. 691 8BITMIME ESMTP or NNTP), the XML MIME entity must be encoded in 692 quoted-printable or base64; for a binary clean transport (e.g. BINARY 693 ESMTP or HTTP), no content-transfer-encoding is necessary (or even 694 possible, in the case of HTTP). 696 As described in [RFC2781], the UTF-16 family must not be used with 697 media types under the top-level type "text" except over HTTP or HTTPS 698 (see section A.2 of HTTP [HTTPbis] for details). Hence, one of the 699 two text/ media types defined above can be used with this example only 700 when the XML MIME entity is transmitted via HTTP or HTTPS, which use 701 a MIME-like mechanism and are binary-clean protocols, and hence do not 702 perform CR and LF transformations and allow NUL octets. Since HTTP 703 is binary clean, no content-transfer-encoding is necessary (or even 704 possible). 706 8.3.
Omitted Charset and 8-bit MIME Entity 708 Content-Type: application/xml 710 712 Since the charset parameter is not provided in the Content-Type 713 header and there is no overriding BOM, conformant XML processors must 714 treat the "iso-8859-1" encoding as authoritative. Conformant XML- 715 unaware MIME processors should make no assumptions about the 716 character encoding of the XML MIME entity. 718 8.4. Omitted Charset and 16-bit MIME Entity 720 Content-Type: application/xml 722 {BOM} 724 or 726 {BOM} 728 This example shows a 16-bit MIME entity with no charset parameter. 729 However, since there is a BOM, conformant processors must treat the 730 entity as UTF-16-encoded. 732 Omitting the charset parameter is not recommended in conjunction with 733 media types under the top-level type "application" when used with 734 transports other than HTTP or HTTPS. Media types under the top-level 735 type "text" should not be used for 16-bit MIME with transports other 736 than HTTP or HTTPS (see discussion above (Section 8.2, Paragraph 7)). 738 8.5. Omitted Charset, no Internal Encoding Declaration 740 Content-Type: application/xml 742 744 In this example, the charset parameter has been omitted, there is no 745 internal encoding declaration, and there is no BOM. Since there is 746 no BOM or charset parameter, the XML processor follows the 747 requirements in section 4.3.3, and optionally applies the mechanism 748 described in Appendix F (which is non-normative) of [XML] to 749 determine an encoding of UTF-8. Although the XML MIME entity does 750 not contain an encoding declaration, provided the encoding actually 751 _is_ UTF-8, this is a conforming XML MIME entity. 753 A conformant XML-unaware MIME processor should make no assumptions 754 about the character encoding of the XML MIME entity. 756 See Section 8.1 for transport-related issues for UTF-8 XML MIME 757 entities. 759 8.6.
UTF-16BE Charset 761 Content-Type: application/xml; charset=utf-16be 763 765 Observe that, as required for this encoding, there is no BOM. Since 766 the charset parameter is provided and there is no overriding BOM, 767 conformant MIME and XML processors must treat the enclosed entity as 768 UTF-16BE encoded. 770 See also the additional considerations in the UTF-16 example 771 (Section 8.2) above. 773 8.7. Non-UTF Charset 775 Content-Type: application/xml; charset=iso-2022-kr 777 779 This example shows the use of a non-UTF character encoding (in this 780 case Hangul, but this example is intended to cover all non-UTF-family 781 character encodings). Since the charset parameter is provided and 782 there is no overriding BOM, conformant processors must treat the 783 enclosed entity as encoded per RFC 1557. 785 Since ISO-2022-KR [RFC1557] has been defined to use only 7 bits of 786 data, no content-transfer-encoding is necessary with any transport: 787 for character sets needing 8 or more bits, considerations such as 788 those discussed above (Section 8.1, Section 8.2) would apply. 790 8.8. INCONSISTENT EXAMPLE: Conflicting Charset and Internal Encoding 791 Declaration 793 Content-Type: application/xml; charset=iso-8859-1 795 797 Although the charset parameter is provided in the Content-Type header 798 and there is no BOM and the charset parameter differs from the XML 799 encoding declaration, conformant MIME and XML processors will 800 interoperate. Since the charset parameter is authoritative in the 801 absence of a BOM, conformant processors will treat the enclosed 802 entity as iso-8859-1 encoded. That is, the "UTF-8" encoding 803 declaration will be ignored. 805 Conformant processors generating XML MIME entities must not label 806 conflicting character encoding information between the MIME Content- 807 Type and the XML declaration unless they have definitive information 808 about the actual encoding, for example as a result of systematic 809 transcoding. 
In particular, the addition by servers of an explicit, 810 site-wide charset parameter default has frequently led to 811 interoperability problems for XML documents. 813 8.9. INCONSISTENT EXAMPLE: Conflicting Charset and BOM 815 Content-Type: application/xml; charset=iso-8859-1 817 {BOM} 819 Although the charset parameter is provided in the Content-Type 820 header, there is a BOM, so MIME and XML processors may not 821 interoperate. Since the BOM is authoritative for 822 conformant XML processors, they will treat the enclosed entity as 823 UTF-16-encoded. That is, the "iso-8859-1" charset parameter will be 824 ignored. XML-unaware MIME processors, on the other hand, may be 825 unaware of the BOM and so treat the entity as encoded in iso-8859-1. 827 Conformant processors generating XML MIME entities must not label 828 conflicting character encoding information between the MIME Content- 829 Type and an entity-initial BOM. 831 9. IANA Considerations 833 9.1. Application/xml Registration 835 Type name: application 837 Subtype name: xml 839 Required parameters: none 841 Optional parameters: charset 843 See Section 3. 845 Encoding considerations: Depending on the character encoding used, 846 XML MIME entities can consist of 7bit, 8bit or binary data 847 [RFC6838]. For 7-bit transports, 7bit data, for example US-ASCII- 848 encoded data, does not require content-transfer-encoding, but 8bit 849 or binary data, for example UTF-8 or UTF-16 data, MUST be content- 850 transfer-encoded in quoted-printable or base64. For 8-bit clean 851 transport (e.g. 8BITMIME ESMTP [RFC6152] or NNTP [RFC3977]), 7bit 852 or 8bit data, for example US-ASCII or UTF-8 data, does not require 853 content-transfer-encoding, but binary data, for example data with 854 a UTF-16 encoding, MUST be content-transfer-encoded in base64. 856 For binary clean transports (e.g.
BINARY ESMTP [RFC3030] or HTTP 857 [HTTPbis]), no content-transfer-encoding is necessary (or even 858 possible, in the case of HTTP) for 7bit, 8bit or binary data. 860 Security considerations: See Section 10. 862 Interoperability considerations: XML has proven to be interoperable 863 across both generic and task-specific applications and for import 864 and export from multiple XML authoring and editing tools. 865 Validating processors provide maximum interoperability, because 866 they have to handle all aspects of XML. Although a non-validating 867 processor may be more efficient, it might not handle all aspects. 868 For further information, see sub-section 2.9 "Standalone Document 869 Declaration" and section 5 "Conformance" of [XML] . 871 In practice, character set issues have proved to be the biggest 872 source of interoperability problems. The use of UTF-8, and 873 careful attention to the guidelines set out in Section 3, are the 874 best ways to avoid such problems. 876 Published specification: Extensible Markup Language (XML) 1.0 (Fifth 877 Edition) [XML] or subsequent editions or versions thereof. 879 Applications that use this media type: XML is device-, platform-, 880 and vendor-neutral and is supported by generic and task-specific 881 applications and a wide range of generic XML tools (editors, 882 parsers, Web agents, ...). 884 Additional information: 886 Magic number(s): None. 888 Although no byte sequences can be counted on to always be 889 present, XML MIME entities in ASCII-compatible character sets 890 (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C 891 (". 1196 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 1197 Extensions (MIME) Part One: Format of Internet Message 1198 Bodies", RFC 2045, November 1996. 1200 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 1201 Extensions (MIME) Part Two: Media Types", RFC 2046, 1202 November 1996. 
1204 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1205 Requirement Levels", BCP 14, RFC 2119, March 1997. 1207 [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 1208 10646", RFC 2781, February 2000. 1210 [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration 1211 Procedures", RFC 2978, October 2000. 1213 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 1214 Resource Identifiers (URI): Generic Syntax.", RFC 3986, 1215 January 2005. 1217 [RFC3987] Dueerst, M. and M. Suignard, "Internationalized Resource 1218 Identifiers (IRIs)", RFC 3987, July 2005. 1220 [RFC6657] Melnikov, A. and J. Reschke, "Update to MIME regarding 1221 "charset" Parameter Handling in Textual Media Types", RFC 1222 6657, July 2012, 1223 . 1225 [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type 1226 Specifications and Registration Procedures", BCP 13, RFC 1227 6838, January 2013. 1229 [RFC6839] Hansen, T. and A. Melnikov, "Additional Media Type 1230 Structured Syntax Suffixes", RFC 6839, January 2013. 1232 [UNICODE] The Unicode Consortium, "The Unicode Standard, Version 1233 6.3.0", 2013, 1234 . 1236 Defined by: The Unicode Standard, Version 6.3 (Mountain 1237 View, CA: The Unicode Consortium, 2013. ISBN 1238 978-1-936213-08-5) 1240 [XML1.1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., 1241 Yergeau, F., and J. Cowan, "Extensible Markup Language 1242 (XML) 1.1 (Second Edition)", W3C Recommendation REC-xml, 1243 September 2006, 1244 . 1246 Latest version available at [2]. 1248 [XMLBase] Marsh, J. and R. Tobin, "XML Base (Second Edition)", W3C 1249 Recommendation REC-xmlbase-20090128, January 2009, 1250 . 1252 Latest version available at [3]. 1254 [XML] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and 1255 F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth 1256 Edition)", W3C Recommendation REC-xml, November 2008, 1257 . 1259 Latest version available at [1]. 
1261 [XPointerElement] 1262 Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer 1263 element() Scheme", W3C Recommendation REC-XPointer- 1264 Element, March 2003, 1265 . 1267 Latest version available at [4]. 1269 [XPointerFramework] 1270 Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer 1271 Framework", W3C Recommendation REC-XPointer-Framework, 1272 March 2003, 1273 . 1275 Latest version available at [5]. 1277 [XPtrRegPolicy] 1278 Hazael-Massieux, D., "XPointer Scheme Name Registry 1279 Policy", 2005, 1280 . 1282 [XPtrReg] Hazael-Massieux, D., "XPointer Registry", 2005, 1283 . 1285 11.2. Informative References 1287 [ASCII] American National Standards Institute, "Coded Character 1288 Set -- 7-bit American Standard Code for Information 1289 Interchange", ANSI X3.4, 1986. 1291 [AWWW] Jacobs, I. and N. Walsh, "Architecture of the World Wide 1292 Web, Volume One", W3C Recommendation REC-webarch-20041215, 1293 December 2004, 1294 . 1296 Latest version available at [8]. 1298 [FYN] Mendelsohn, N., "The Self-Describing Web", W3C TAG Finding 1299 selfDescribingDocuments-2009-02-07, February 2009, 1300 . 1303 Latest version available at [9]. 1305 [Infoset] Cowan, J. and R. Tobin, "XML Information Set (Second 1306 Edition)", W3C Recommendation REC-xml-infoset-20040204, 1307 February 2004, 1308 . 1310 Latest version available at [11]. 1312 [MediaFrags] 1313 Troncy, R., Mannens, E., Pfeiffer, S., and D. Van Deursen, 1314 "Media Fragments URI 1.0 (basic)", W3C Recommendation 1315 media-frags, September 2012, 1316 . 1318 Latest version available at [6]. 1320 [RFC1557] Choi, U., Chon, K., and H. Park, "Korean Character 1321 Encoding for Internet Messages", RFC 1557, December 1993. 1323 [RFC2376] Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, 1324 July 1998. 1326 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., 1327 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1328 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
1330 [RFC3023] Murata, M., St.Laurent, S., and D. Kohn, "XML Media 1331 Types", RFC 3023, January 2001. 1333 [RFC3030] Vaudreuil, G., "SMTP Service Extensions for Transmission 1334 of Large and Binary MIME Messages", RFC 3030, 2000. 1336 [RFC3977] Feather, B., "Network News Transfer Protocol", RFC 3977, 1337 October 2006. 1339 [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, 1340 October 2008. 1342 [RFC6152] Klensin, J., Freed, N., Rose, M., and D. Crocker, "SMTP 1343 Service Extension for 8-bit MIME Transport", RFC 6152, 1344 March 2011. 1346 [Sivonen] Sivonen, H. and others, "Mozilla bug: Remove support for 1347 UTF-32 per HTML5 spec", October 2011, . 1350 [TAGMIME] Bray, T., Ed., "Internet Media Type registration, 1351 consistency of use", April 2004, 1352 . 1354 [XHTML] Pemberton, S. and et al, "XHTML 1.0: The Extensible 1355 HyperText Markup Language", W3C Recommendation xhtml1, 1356 December 1999, 1357 . 1359 Latest version available at [7]. 1361 [XMLModel] 1362 Grosso, P. and J. Kosek, "Associating Schemas with XML 1363 documents 1.0 (Third Edition)", W3C Group Note NOTE-xml- 1364 model-20121009, October 2012, 1365 . 1367 Latest version available at [13]. 1369 [XMLNS10] Bray, T., Hollander, D., Layman, A., Tobin, R., and H. 1370 Thompson, "Namespaces in XML 1.0 (Third Edition)", W3C 1371 Recommendation REC-xml-names-20091208, December 2009, 1372 . 1374 Latest version available at [12]. 1376 [XMLNS11] Bray, T., Hollander, D., Layman, A., and R. Tobin, 1377 "Namespaces in XML 1.1 (Second Edition)", W3C 1378 Recommendation REC-xml-names11-20060816, August 2006, 1379 . 1381 Latest version available at [14]. 1383 [XMLSS] Clark, J., Pieters, S., and H. Thompson, "Associating 1384 Style Sheets with XML documents 1.0 (Second Edition)", W3C 1385 Recommendation REC-xml-stylesheet-20101028, October 2010, 1386 . 1388 Latest version available at [15]. 1390 [XMLid] Marsh, J., Veillard, D., and N. 
Walsh, "xml:id Version 1391 1.0", W3C Recommendation REC-xml-id-20050909, September 1392 2005, . 1394 Latest version available at [10]. 1396 Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types? 1398 [RFC3023] contains a detailed discussion of the (at the time) novel 1399 use of a suffix, a practice which has since become widespread. Those 1400 interested in a historical perspective on this topic are referred to 1401 [RFC3023], Appendix A. 1403 The registration process for new '+xml' media types is described in 1404 [RFC6838]. 1406 Appendix B. Core XML specifications 1408 The following specifications each articulate key aspects of XML 1409 document semantics: 1411 Namespaces in XML 1.0 [XMLNS10]/Namespaces in XML 1.1 [XMLNS11] 1413 XML Information Set [Infoset] 1415 xml:id [XMLid] 1417 XML Base [XMLBase] 1419 Associating Style Sheets with XML documents [XMLSS] 1421 Associating Schemas with XML documents [XMLModel] 1423 The W3C Technical Architecture Group has produced two documents which 1424 are also relevant: 1426 The Self-Describing Web [FYN] discusses the overall principles of 1427 how document semantics are determined on the Web. 1429 Architecture of the World Wide Web, Volume One [AWWW], section 1430 4.5.4, discusses the specific role of XML Namespace documents in 1431 this process. 1433 Appendix C. Operational considerations 1435 This section provides an informal summary of the major operational 1436 considerations which arise when exchanging XML MIME entities over a 1437 network. 1439 C.1. General considerations 1441 The existence of both XML-aware and XML-unaware agents handling XML 1442 MIME entities can compromise interoperability. Generic transcoding 1443 proxies pose a particular risk in this regard. Detailed advice about 1444 the handling of BOMs when transcoding can be found in Section 3.3. 1446 This specification requires XML consumers to treat BOMs as 1447 authoritative: this is in principle a backwards-incompatibility.
In 1448 practice serious interoperability issues already exist when BOMs are 1449 used. Making BOMs authoritative, in conjunction with the deprecation 1450 of the UTF-32 encoding form and the requirement to include an XML 1451 encoding declaration in certain cases (Section 3.1), is intended to 1452 improve in-practice interoperability as much as possible over time. 1454 This specification establishes Section 5 as the basis for 1455 interpreting URIs for XML MIME entities which include fragment 1456 identifiers, mandates support only for shorthand ("simple name") and 1457 'element'-scheme fragments and deprecates support for unregistered 1458 XPointer schemes by XML MIME entity processors. Accordingly, URIs 1459 will interoperate best if they use only simple names and 1460 'element'-scheme fragment identifiers, with registered schemes 1461 varying widely in the degree of support to be found in generic tools. 1462 XPointer scheme authors can only expect generic tool support if they 1463 register their schemes. 1465 C.2. Considerations for producers 1467 Interoperability for all XML MIME entities is maximized by the use of 1468 UTF-8, without a BOM. When UTF-8 is _not_ used, a charset parameter 1469 and/or a BOM improve interoperability, particularly when XML-unaware 1470 consumers may be involved. 1472 In the very rare case where the substantive content of a non-UNICODE 1473 XML external parsed entity begins with the hexadecimal octet 1474 sequences 0xFE 0xFF, 0xFF 0xFE or 0xEF 0xBB 0xBF, including an XML 1475 text declaration will forestall the mistaken detection of a BOM. 1477 The use of UTF-32 for XML MIME entities puts interoperability at very 1478 high risk. 1480 Web-server configurations which supply default charset parameters 1481 risk misrepresenting XML MIME entities. Allowing users to control 1482 the value of charset parameters improves interoperability. 1484 Supplying a mistaken charset parameter is worse than supplying none 1485 at all. 
In particular, generic processors such as transcoders, when 1486 processing based on a mistaken charset parameter, if they do not fail 1487 altogether are likely to produce arbitrarily bogus results from which 1488 the original is not recoverable. 1490 C.3. Considerations for consumers 1492 Consumers of XML MIME entities can maximize interoperability by 1494 1. Taking a BOM as authoritative if it is present in an XML MIME 1495 entity; 1497 2. In the absence of a BOM, taking a charset parameter as 1498 authoritative if it is present. 1500 Assuming a default character encoding in the absence of a charset 1501 parameter harms interoperability. 1503 Although support for UTF-32 is not required by [XML] itself, and this 1504 specification deprecates its use, consumers which check for UTF-32 1505 BOMs can thereby avoid mistakenly processing UTF-32 entities as 1506 (invalid) UTF-16 entities. 1508 Appendix D. Changes from RFC 3023 1510 There are numerous and significant differences between this 1511 specification and [RFC3023], which it obsoletes. This appendix 1512 summarizes the major differences only. 1514 XPointer ([XPointerFramework] and [XPointerElement]) has been 1515 added as fragment identifier syntax for all the XML media types, 1516 and the XPointer Registry ([XPtrReg]) mentioned 1518 [XMLBase] has been added as a mechanism for specifying base URIs 1519 The language regarding character sets was updated to correspond to 1520 the W3C TAG finding Internet Media Type registration, consistency 1521 of use [TAGMIME] 1523 Priority is now given to a Byte Order Mark (BOM) if present 1525 Many references are updated, and the existence of XML 1.1 and 1526 relevance of this specification to it acknowledged 1528 A number of justifications and contextualizations which were 1529 appropriate when XML was new have been removed, including the 1530 whole of the original Appendix A 1532 Appendix E. 
Acknowledgements 1534 MURATA Makoto (FAMILY Given) and Alexey Melnikov made early and 1535 important contributions to the effort to revise [RFC3023]. 1537 This specification reflects the input of numerous participants to the 1538 ietf-xml-mime@imc.org, xml-mime@ietf.org and apps-discuss@ietf.org 1539 mailing lists, though any errors are the responsibility of the 1540 authors. Special thanks to: 1542 Mark Baker, James Clark, Dan Connolly, Martin Duerst, Ned Freed, 1543 Yaron Goland, Bjoern Hoehrmann, Rick Jelliffe, Murray S. Kucherawy, 1544 Larry Masinter, David Megginson, S. Moonesamy, Keith Moore, Chris 1545 Newman, Gavin Nicol, Julian Reschke, Marshall Rose, Jim Whitehead, 1546 Erik Wilde and participants of the XML activity and the TAG at the 1547 W3C. 1549 Jim Whitehead and Simon St. Laurent were editors of [RFC2376] and 1550 [RFC3023], respectively. 1552 Authors' Addresses 1554 Henry S. Thompson 1555 University of Edinburgh 1557 Email: ht@inf.ed.ac.uk 1558 URI: http://www.ltg.ed.ac.uk/~ht/ 1559 Chris Lilley 1560 World Wide Web Consortium 1561 2004, Route des Lucioles - B.P. 93 06902 1562 Sophia Antipolis Cedex 1563 France 1565 Email: chris@w3.org 1566 URI: http://www.w3.org/People/chris/