| draft-whitehead-mime-xml-03.txt | draft-whitehead-mime-xml-04.txt | |||
|---|---|---|---|---|
| INTERNET DRAFT E. J. Whitehead, Jr., UC Irvine | INTERNET DRAFT E. J. Whitehead, Jr., UC Irvine | |||
| <draft-whitehead-mime-xml-03> M. Murata, Fuji Xerox Info. Systems | <draft-whitehead-mime-xml-04> M. Murata, Fuji Xerox Info. Systems | |||
| Expires September, 1998 May 15, 1998 | Expires November, 1998 May 31, 1998 | |||
| XML Media Types | XML Media Types | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
| skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
| Distribution of this document is unlimited. | Distribution of this document is unlimited. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (1998). All Rights Reserved. | Copyright (C) The Internet Society (1998). All Rights Reserved. | |||
| Abstract | Abstract | |||
| This document proposes two new media subtypes, text/xml and | This document proposes two new media subtypes, text/xml and | |||
| application/xml, for use in exchanging network entities which are | application/xml, for use in exchanging network entities which are | |||
| conformant Extensible Markup Language (XML). XML entities are | conforming Extensible Markup Language (XML). XML entities are | |||
| currently exchanged via the HyperText Transfer Protocol on the World | currently exchanged via the HyperText Transfer Protocol on the World | |||
| Wide Web, and are an integral part of the WebDAV protocol for remote | Wide Web, are an integral part of the WebDAV protocol for remote web | |||
| web authoring, and are expected to have utility in many domains. | authoring, and are expected to have utility in many domains. | |||
| Contents | Contents | |||
| STATUS OF THIS MEMO...................................................1 | STATUS OF THIS MEMO...................................................1 | |||
| COPYRIGHT NOTICE......................................................1 | COPYRIGHT NOTICE......................................................1 | |||
| ABSTRACT..............................................................1 | ABSTRACT..............................................................1 | |||
| CONTENTS..............................................................2 | CONTENTS..............................................................2 | |||
| 1 INTRODUCTION .......................................................3 | 1 INTRODUCTION .......................................................3 | |||
| 2 XML MEDIA TYPES ....................................................3 | 2 NOTATIONAL CONVENTIONS .............................................3 | |||
| 2.1 Text/xml Registration ...........................................4 | 3 XML MEDIA TYPES ....................................................4 | |||
| 2.2 Application/xml Registration ....................................6 | 3.1 Text/xml Registration ...........................................4 | |||
| 3 SECURITY CONSIDERATIONS ............................................9 | 3.2 Application/xml Registration ....................................7 | |||
| 4 REFERENCES ........................................................11 | 4 SECURITY CONSIDERATIONS ............................................9 | |||
| 5 ACKNOWLEDGEMENTS ..................................................11 | 5 THE BYTE ORDER MARK (BOM) AND CONVERSIONS TO/FROM UTF-16 ..........10 | |||
| 6 AUTHOR'S ADDRESS ..................................................12 | 6 EXAMPLES ..........................................................10 | |||
| | 6.1 text/xml with UTF-8 Charset ....................................10 | |||
| | 6.2 text/xml with UTF-16 Charset ...................................11 | |||
| | 6.3 text/xml with ISO-2022-KR Charset ..............................11 | |||
| | 6.4 text/xml with Omitted Charset ..................................11 | |||
| | 6.5 application/xml with UTF-16 Charset ............................12 | |||
| | 6.6 application/xml with ISO-2022-KR Charset .......................12 | |||
| | 6.7 application/xml with Omitted Charset and UTF-16 XML Entity .....12 | |||
| | 6.8 application/xml with Omitted Charset and UTF-8 Entity ..........13 | |||
| | 6.9 application/xml with Omitted Charset and Internal Encoding | |||
| | Declaration..........................................................13 | |||
| | 7 REFERENCES ........................................................14 | |||
| | 8 ACKNOWLEDGEMENTS ..................................................15 | |||
| | 9 ADDRESSES OF AUTHORS ..............................................15 | |||
| 1 Introduction | 1 Introduction | |||
| The World Wide Web Consortium (W3C) has issued a Recommendation | The World Wide Web Consortium (W3C) has issued a Recommendation | |||
| [REC-XML] which defines the Extensible Markup Language (XML), | [REC-XML] which defines the Extensible Markup Language (XML), | |||
| version 1. To enable the exchange of XML network entities, this | version 1. To enable the exchange of XML network entities, this | |||
| document proposes two new media types, text/xml and application/xml. | document proposes two new media types, text/xml and application/xml. | |||
| XML entities are currently exchanged on the World Wide Web, and XML | XML entities are currently exchanged on the World Wide Web, and XML | |||
| is also used for property values and parameter marshalling by the | is also used for property values and parameter marshalling by the | |||
| WebDAV protocol for remote web authoring. Thus, there is a need for | WebDAV protocol for remote web authoring. Thus, there is a need for | |||
| skipping to change at page 3, line 31 | skipping to change at page 3, line 31 | |||
| text/sgml or application/sgml to label XML is inappropriate. First, | text/sgml or application/sgml to label XML is inappropriate. First, | |||
| there exist many applications which can process XML, but which | there exist many applications which can process XML, but which | |||
| cannot process SGML, due to SGML's larger feature set. Second, SGML | cannot process SGML, due to SGML's larger feature set. Second, SGML | |||
| applications cannot always process XML entities, because XML uses | applications cannot always process XML entities, because XML uses | |||
| features of recent technical corrigenda to SGML. Third, the | features of recent technical corrigenda to SGML. Third, the | |||
| definition of text/sgml and application/sgml [RFC-1874] includes | definition of text/sgml and application/sgml [RFC-1874] includes | |||
| parameters for SGML bit combination transformation format (SGML- | parameters for SGML bit combination transformation format (SGML- | |||
| bctf), and SGML boot attribute (SGML-boot). Since XML does not use | bctf), and SGML boot attribute (SGML-boot). Since XML does not use | |||
| these parameters, it would be ambiguous if such parameters were | these parameters, it would be ambiguous if such parameters were | |||
| given for an XML entity. For these reasons, the best approach for | given for an XML entity. For these reasons, the best approach for | |||
| labeling XML network entities is to provide a new media type for | labeling XML network entities is to provide new media types for XML. | |||
| XML. | ||||
| Since XML is an integral part of the WebDAV Distributed Authoring | Since XML is an integral part of the WebDAV Distributed Authoring | |||
| Protocol, and since World Wide Web Consortium Recommendations have | Protocol, and since World Wide Web Consortium Recommendations have | |||
| conventionally been assigned IETF tree media types, and since | conventionally been assigned IETF tree media types, and since | |||
| similar media types (HTML, SGML) have been assigned IETF tree media | similar media types (HTML, SGML) have been assigned IETF tree media | |||
| types, the XML media types also belong in the IETF tree. | types, the XML media types also belong in the IETF media types tree. | |||
| 2 XML Media Types | 2 Notational Conventions | |||
| | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| | document are to be interpreted as described in [RFC-2119]. | |||
| | 3 XML Media Types | |||
| This document introduces two new media types for XML entities, | This document introduces two new media types for XML entities, | |||
| text/xml and application/xml. Registration information for these | text/xml and application/xml. Registration information for these | |||
| media types are described in the sections below. | media types are described in the sections below. | |||
| An XML network entity should be labeled as text/xml under the | Every XML entity is suitable for use with the application/xml media | |||
| following circumstances: | type without modification. But this does not exploit the fact that | |||
| XML can be treated as plain text in many cases. MIME user agents | ||||
| - it is represented by a charset (e.g., UTF-8) which is | (and web user agents) that do not have explicit support for | |||
| compatible with the requirements for text media types as | application/xml will treat it as application/octet-stream, for | |||
| described in [RFC-2045] and [RFC-2046], or | example, by offering to save it to a file. | |||
| - it is transmitted by HTTP, which uses a MIME-like mechanism | ||||
| that is exempt from the restrictions on the text top-level type | ||||
| (see section 19.4.1 of HTTP 1.1 [RFC-2068]). | ||||
| If an XML network entity fails any of these criteria, then it should | To indicate that an XML entity should be treated as plain text by | |||
| be labeled as application/xml. Specifically, if the XML entity is | default, use the text/xml media type. This restricts the encoding | |||
| represented by UTF-16 and the protocol is not HTTP, the XML entity | used in the XML entity to those that are compatible with the | |||
| should be labeled as application/xml. | requirements for text media types as described in [RFC-2045] and | |||
| [RFC-2046], e.g., UTF-8, but not UTF-16 (except for HTTP). | ||||
| Some applications of XML will require security or runtime | XML provides a general framework for defining sequences of | |||
| information specific to these applications. This document does not | structured data. In some cases, it may be desirable to define new | |||
| prohibit future media types dedicated to such XML applications. | media types which use XML but define a specific application of XML, | |||
| However, developers of such media types are recommended to use this | perhaps due to domain-specific security considerations or runtime | |||
| document as a basis. In particular, the charset parameter should be | information. This document does not prohibit future media types | |||
| used in the same manner. | dedicated to such XML applications. However, developers of such | |||
| | media types are recommended to use this document as a basis. In | |||
| | particular, the charset parameter should be used in the same manner. | |||
| Within the XML specification, XML entities can be classified into | Within the XML specification, XML entities can be classified into | |||
| four types. In the XML terminology, they are called "document | four types. In the XML terminology, they are called "document | |||
| entities", "external DTD subsets", "external parsed entities", and | entities", "external DTD subsets", "external parsed entities", and | |||
| "external parameter entities". The media types text/xml and | "external parameter entities". The media types text/xml and | |||
| application/xml can be used for any of these four types. | application/xml can be used for any of these four types. | |||
| 2.1 Text/xml Registration | 3.1 Text/xml Registration | |||
| MIME media type name: text | MIME media type name: text | |||
| MIME subtype name: xml | MIME subtype name: xml | |||
| Mandatory parameters: none | Mandatory parameters: none | |||
| Optional parameters: charset | Optional parameters: charset | |||
| Although listed as an optional parameter, the use of the charset | Although listed as an optional parameter, the use of the charset | |||
| parameter is STRONGLY RECOMMENDED, since this information can be | parameter is STRONGLY RECOMMENDED, since this information can be | |||
| used by XML processors to determine authoritatively the charset | used by XML processors to determine authoritatively the | |||
| of the XML entity. The charset parameter can also be used to | character encoding of the XML entity. The charset parameter can | |||
| provide protocol-specific operations, such as charset-based | also be used to provide protocol-specific operations, such as | |||
| content negotiation in HTTP. | charset-based content negotiation in HTTP. | |||
| "UTF-8" [RFC-2279] is the recommended value, representing the | "UTF-8" [RFC-2279] is the recommended value, representing the | |||
| UTF-8 charset. UTF-8 is supported by all conformant XML | UTF-8 charset. UTF-8 is supported by all conforming XML | |||
| processors [REC-XML]. | processors [REC-XML]. | |||
| If the XML entity is transmitted via HTTP, which uses a MIME- | If the XML entity is transmitted via HTTP, which uses a MIME- | |||
| like mechanism that is exempt from the restrictions on the text | like mechanism that is exempt from the restrictions on the text | |||
| top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), | top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), | |||
| "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO- | "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO- | |||
| 10646]) is also recommended. UTF-16 is supported by all | 10646]) is also recommended. UTF-16 is supported by all | |||
| conformant XML processors [REC-XML]. UTF-16 should be sent in | conforming XML processors [REC-XML]. Since the handling of CR, | |||
| network byte order (big-endian), but recipients should be able | LF and NUL for text types in most MIME applications would cause | |||
| to handle both big-endian and little-endian. | undesired transformations of individual octets in UTF-16 multi- | |||
| | octet characters, gateways from HTTP to these MIME applications | |||
| If a text/xml entity is received where the charset parameter is | MUST transform the XML entity from a text/xml; charset="utf-16" | |||
| omitted, XML processors are encouraged to apply the heuristics | to application/xml; charset="utf-16". | |||
| described in Appendix F of [REC-XML] ("Autodetection of | ||||
| Character Encodings") to deterministically detect the charset of | ||||
| the entity before reverting to the text/xml default charset. | ||||
| Note that if a charset other than UTF-8 (or UTF-16) is used, and | Conformant with [RFC-2046], if a text/xml entity is received | |||
| the charset is not declared within an XML entity by an XML | with the charset parameter omitted, MIME processors and XML | |||
| "encoding declaration" (a non-conformant situation according to | processors MUST use the default charset value of "us-ascii". If | |||
| the XML specification), an XML processor will be unable to | the XML entity is transmitted via HTTP, the default charset | |||
| determine the charset of the XML entity if the charset parameter | value is "ISO-8859-1" (see section 3.7.1 of HTTP 1.1 [RFC- | |||
| is not given. The definition of XML encoding declarations is | 2068]). | |||
| given in 4.3.3 of [REC-XML]. | ||||
| Since the charset parameter is authoritative, the charset is not | Since the charset parameter is authoritative, the charset is not | |||
| always declared within an XML encoding declaration. Thus, | always declared within an XML encoding declaration. Thus, | |||
| special care is needed when the recipient strips the MIME header | special care is needed when the recipient strips the MIME header | |||
| and provides persistent storage of the received XML entity | and provides persistent storage of the received XML entity | |||
| (e.g., in a file system). Unless the charset is UTF-8 or UTF- | (e.g., in a file system). Unless the charset is UTF-8 or UTF-16, | |||
| 16, the recipient should also persistently store information | the recipient SHOULD also persistently store information about | |||
| about the charset, perhaps by embedding a correct XML encoding | the charset, perhaps by embedding a correct XML encoding | |||
| declaration within the XML entity. | declaration within the XML entity. | |||
| Encoding considerations: | Encoding considerations: | |||
| May be encoded. In particular, XML entities in UTF-8 must be | This media type MAY be encoded as appropriate for the charset | |||
| encoded in quoted-printable or base64 unless the underlying MIME | and the capabilities of the underlying MIME transport. For 7-bit | |||
| transport is 8-bit clean. Since HTTP is 8-bit clean, XML | transports, data in both UTF-8 and UTF-16 is encoded in quoted- | |||
| entities in UTF-16 do not require encoding. | printable or base64. For 8-bit clean transport (e.g., ESMTP, | |||
| | 8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64 | |||
| | encoded. For binary clean transports (e.g., HTTP), no content- | |||
| | transfer-encoding is necessary. | |||
| Security considerations: | Security considerations: | |||
| See section 3 below. | See section 4 below. | |||
| Interoperability considerations: | Interoperability considerations: | |||
| XML has proven to be interoperable across WebDAV clients and | XML has proven to be interoperable across WebDAV clients and | |||
| servers, and for import and export from multiple XML authoring | servers, and for import and export from multiple XML authoring | |||
| tools. | tools. | |||
| Published specification: see [REC-XML] | Published specification: see [REC-XML] | |||
| Applications which use this media type: | Applications which use this media type: | |||
| skipping to change at page 6, line 12 | skipping to change at page 6, line 27 | |||
| Although no byte sequences can be counted on to always be | Although no byte sequences can be counted on to always be | |||
| present, XML entities in ASCII-compatible charsets (including | present, XML entities in ASCII-compatible charsets (including | |||
| UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"). | UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"). | |||
| For more information, see Appendix F of [REC-XML]. | For more information, see Appendix F of [REC-XML]. | |||
| File extension(s): .xml, .dtd | File extension(s): .xml, .dtd | |||
| Macintosh File Type Code(s): "TEXT" | Macintosh File Type Code(s): "TEXT" | |||
| Person & email address for further information: | Person & email address for further information: | |||
| | Dan Connolly <connolly@w3.org> | |||
| Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp> | Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp> | |||
| | Jim Whitehead <ejw@ics.uci.edu> | |||
| | Kurt Conrad <conrad@SagebrushGroup.com> | |||
| Intended usage: COMMON | Intended usage: COMMON | |||
| Author/Change controller: | Author/Change controller: | |||
| The XML specification is a work product of the World Wide Web | The XML specification is a work product of the World Wide Web | |||
| Consortium's XML Working Group, and was edited by: | Consortium's XML Working Group, and was edited by: | |||
| Tim Bray <tbray@textuality.com> | Tim Bray <tbray@textuality.com> | |||
| Jean Paoli <jeanpa@microsoft.com> | Jean Paoli <jeanpa@microsoft.com> | |||
| C. M. Sperberg-McQueen <cmsmcq@uic.edu> | C. M. Sperberg-McQueen <cmsmcq@uic.edu> | |||
| The W3C, and the W3C XML working group, has change control over | The W3C, and the W3C XML working group, has change control over | |||
| the XML specification. | the XML specification. | |||
| 2.2 Application/xml Registration | 3.2 Application/xml Registration | |||
| MIME media type name: application | MIME media type name: application | |||
| MIME subtype name: xml | MIME subtype name: xml | |||
| Mandatory parameters: none | Mandatory parameters: none | |||
| Optional parameters: charset | Optional parameters: charset | |||
| Although listed as an optional parameter, the use of the charset | Although listed as an optional parameter, the use of the charset | |||
| parameter is STRONGLY RECOMMENDED, since this information can be | parameter is STRONGLY RECOMMENDED, since this information can be | |||
| used by XML processors to determine authoritatively the charset | used by XML processors to determine authoritatively the charset | |||
| of the XML entity. The charset parameter can also be used to | of the XML entity. The charset parameter can also be used to | |||
| provide protocol-specific operations, such as charset-based | provide protocol-specific operations, such as charset-based | |||
| content negotiation in HTTP. | content negotiation in HTTP. | |||
| "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO- | "UTF-8" [RFC-2279] and "UTF-16" (Appendix C.3 of [UNICODE] and | |||
| 10646]) is the recommended values, representing the UTF-16 | Amendment 1 of [ISO-10646]) are the recommended values, | |||
| charset. This charset is preferred since it is supported by all | representing the UTF-8 and UTF-16 charsets, respectively. These | |||
| conformant XML processors [REC-XML]. UTF-16 should be sent in | charsets are preferred since they are supported by all | |||
| network byte order (big-endian), but recipients should be able | conforming XML processors [REC-XML]. | |||
| to handle both big-endian and little-endian. | ||||
| If an application/xml entity is received where the charset | If an application/xml entity is received where the charset | |||
| parameter is omitted, XML processors are encouraged to apply the | parameter is omitted, no information is being provided about the | |||
| heuristics described in Appendix F of [REC-XML] ("Autodetection | charset by the MIME Content-Type header. Conforming XML | |||
| of Character Encodings") to deterministically detect the charset | processors MUST follow the requirements in section 4.3.3 of | |||
| of the entity. | [REC-XML] which directly address this contingency. However, MIME | |||
| | processors which are not XML processors should not assume a | |||
| Note that if a charset other than UTF-8 or UTF-16 is used, and | default charset if the charset parameter is omitted from an | |||
| the charset is not declared within an XML entity by an XML | application/xml entity. | |||
| "encoding declaration" (a non-conformant situation according to | ||||
| the XML specification), an XML processor will be unable to | ||||
| determine the charset of the XML entity if the charset parameter | ||||
| is not given. The definition of XML encoding declarations is | ||||
| given in 4.3.3 of [REC-XML]. | ||||
| Since the charset parameter is authoritative, the charset is not | Since the charset parameter is authoritative, the charset is not | |||
| always declared within an XML encoding declaration. Thus, | always declared within an XML encoding declaration. Thus, | |||
| special care is needed when the recipient strips the MIME header | special care is needed when the recipient strips the MIME header | |||
| and provides persistent storage of the received XML entity | and provides persistent storage of the received XML entity | |||
| (e.g., in a file system). Unless the charset is UTF-8 or UTF- | (e.g., in a file system). Unless the charset is UTF-8 or | |||
| 16, the recipient should also persistently store information | UTF-16, the recipient SHOULD also persistently store information | |||
| about the charset, perhaps by embedding a correct XML encoding | about the charset, perhaps by embedding a correct XML encoding | |||
| declaration within the XML entity. | declaration within the XML entity. | |||
| Encoding considerations: | Encoding considerations: | |||
| May be encoded. In particular, XML entities in UTF-16 must be | This media type MAY be encoded as appropriate for the charset | |||
| encoded in base64 unless the underlying MIME transport is binary | and the capabilities of the underlying MIME transport. For 7-bit | |||
| safe; XML entities in UTF-8 must be encoded in quoted-printable | transports, data in both UTF-8 and UTF-16 is encoded in quoted- | |||
| or base64 unless the underlying MIME transport is 8-bit clean. | printable or base64. For 8-bit clean transport (e.g., ESMTP, | |||
| | 8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64 | |||
| | encoded. For binary clean transport (e.g., HTTP), no content- | |||
| | transfer-encoding is necessary. | |||
| Security considerations: | Security considerations: | |||
| See section 3 below. | See section 4 below. | |||
| Interoperability considerations: | Interoperability considerations: | |||
| XML has proven to be interoperable for import and export from | XML has proven to be interoperable for import and export from | |||
| multiple XML authoring tools. | multiple XML authoring tools. | |||
| Published specification: see [REC-XML] | Published specification: see [REC-XML] | |||
| Applications which use this media type: | Applications which use this media type: | |||
| skipping to change at page 8, line 15 | skipping to change at page 8, line 38 | |||
| and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00 | and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00 | |||
| 3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order | 3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order | |||
| Mark (BOM) followed by "<?xml"). For more information, see | Mark (BOM) followed by "<?xml"). For more information, see | |||
| Annex F of [REC-XML]. | Annex F of [REC-XML]. | |||
| File extension(s): .xml, .dtd | File extension(s): .xml, .dtd | |||
| Macintosh File Type Code(s): "TEXT" | Macintosh File Type Code(s): "TEXT" | |||
| Person & email address for further information: | Person & email address for further information: | |||
| | Dan Connolly <connolly@w3.org> | |||
| Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp> | Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp> | |||
| | Jim Whitehead <ejw@ics.uci.edu> | |||
| | Kurt Conrad <conrad@SagebrushGroup.com> | |||
| Intended usage: COMMON | Intended usage: COMMON | |||
| Author/Change controller: | Author/Change controller: | |||
| The XML specification is a work product of the World Wide Web | The XML specification is a work product of the World Wide Web | |||
| Consortium's XML Working Group, and was edited by: | Consortium's XML Working Group, and was edited by: | |||
| Tim Bray <tbray@textuality.com> | Tim Bray <tbray@textuality.com> | |||
| Jean Paoli <jeanpa@microsoft.com> | Jean Paoli <jeanpa@microsoft.com> | |||
| C. M. Sperberg-McQueen <cmsmcq@uic.edu> | C. M. Sperberg-McQueen <cmsmcq@uic.edu> | |||
| The W3C, and the W3C XML working group, has change control over | The W3C, and the W3C XML working group, has change control over | |||
| the XML specification. | the XML specification. | |||
| 3 Security Considerations | 4 Security Considerations | |||
| XML, as a subset of SGML, has the same security considerations as | XML, as a subset of SGML, has the same security considerations as | |||
| specified in [RFC-1874]. | specified in [RFC-1874]. | |||
| To paraphrase section 3 of [RFC-1874], XML entities contain | To paraphrase section 3 of [RFC-1874], XML entities contain | |||
| information to be parsed and processed by the recipient's XML | information to be parsed and processed by the recipient's XML | |||
| system. These entities may contain and such systems may permit | system. These entities may contain and such systems may permit | |||
| explicit system level commands to be executed while processing the | explicit system level commands to be executed while processing the | |||
| data. To the extent that an XML system will execute arbitrary | data. To the extent that an XML system will execute arbitrary | |||
| command strings, recipients of XML entities may be at risk. In | command strings, recipients of XML entities may be at risk. In | |||
| skipping to change at page 11, line 5 | skipping to change at page 10, line 14 | |||
| Note that it is also possible to construct XML documents which make | Note that it is also possible to construct XML documents which make | |||
| use of what XML terms "entity references" (using the XML meaning of | use of what XML terms "entity references" (using the XML meaning of | |||
| the term "entity", which differs from the MIME definition of this | the term "entity", which differs from the MIME definition of this | |||
| term), to construct repeated expansions of text. Recursive | term), to construct repeated expansions of text. Recursive | |||
| expansions are prohibited [REC-XML] and XML processors are required | expansions are prohibited [REC-XML] and XML processors are required | |||
| to detect them. However, even non-recursive expansions may cause | to detect them. However, even non-recursive expansions may cause | |||
| problems with the finite computing resources of computers, if they | problems with the finite computing resources of computers, if they | |||
| are performed many times. | are performed many times. | |||
| 4 References | 5 The Byte Order Mark (BOM) and Conversions to/from UTF-16 | |||
| [ISO-10646] ISO/IEC, Information Technology - Universal Multiple- | The XML Recommendation, in section 4.3.3, specifies that UTF-16 XML | |||
| Octet Coded Character Set (UCS) - Part 1: Architecture and Basic | entities must begin with a byte order mark (BOM), which is the ZERO | |||
| | WIDTH NO-BREAK SPACE character, hexadecimal sequence 0xFEFF (or | |||
| | 0xFFFE, depending on endian). The XML Recommendation further states | |||
| | that the BOM is an encoding signature, and is not part of either the | |||
| | markup or the character data of the XML document. | |||
| | Due to the BOM, applications which convert XML from the UTF-16 | |||
| | encoding to another encoding SHOULD strip the BOM before conversion. | |||
| | Similarly, when converting from another encoding into UTF-16, the | |||
| | BOM SHOULD be added after conversion is complete. | |||
| | 6 Examples | |||
| | The examples below give the value of the Content-type MIME header | |||
| | and the XML declaration (which includes the encoding declaration) | |||
| | inside the XML entity. For UTF-16 examples, the Byte Order Mark | |||
| | character is denoted as "{BOM}", and the XML declaration is assumed | |||
| | to come at the beginning of the XML entity, immediately following | |||
| | the BOM. Note that other MIME headers may be present, and the XML | |||
| | entity may contain other data in addition to the XML declaration; | |||
| | the examples focus on the Content-type header and the encoding | |||
| | declaration for clarity. | |||
| | 6.1 text/xml with UTF-8 Charset | |||
| | Content-type: text/xml; charset="utf-8" | |||
| | <?xml version="1.0" encoding="utf-8"?> | |||
| | This is the recommended charset value for use with text/xml. Since | |||
| | the charset parameter is provided, MIME and XML processors must | |||
| | treat the enclosed entity as UTF-8 encoded. | |||
| | If sent using a 7-bit transport (e.g. SMTP), the XML entity must use | |||
| | a content-transfer-encoding of either quoted-printable or base64. | |||
| | For an 8-bit clean transport (e.g., ESMTP, 8BITMIME, or NNTP), or a | |||
| | binary clean transport (e.g., HTTP) no content-transfer-encoding is | |||
| | necessary. | |||
| 6.2 text/xml with UTF-16 Charset | ||||
| Content-type: text/xml; charset="utf-16" | ||||
| {BOM}<?xml version='1.0' encoding='utf-16'?> | ||||
| This is possible only when the XML entity is transmitted via HTTP, | ||||
| which uses a MIME-like mechanism and is a binary-clean protocol, | ||||
| hence does not perform CR and LF transformations and allows NUL | ||||
| octets. This differs from typical text MIME type processing (see | ||||
| section 19.4.1 of HTTP 1.1 [RFC-2068] for details). | ||||
| Since HTTP is binary clean, no content-transfer-encoding is | ||||
| necessary. | ||||
| 6.3 text/xml with ISO-2022-KR Charset | ||||
| Content-type: text/xml; charset="iso-2022-kr" | ||||
| <?xml version="1.0" encoding='iso-2022-kr'?> | ||||
| This example shows text/xml with a Korean charset (e.g., Hangul) | ||||
| encoded following the specification in [RFC-1557]. Since the | ||||
| charset parameter is provided, MIME and XML processors must treat | ||||
| the enclosed entity as encoded per [RFC-1557]. | ||||
| Since ISO-2022-KR has been defined to use only 7 bits of data, no | ||||
| content-transfer-encoding is necessary with any transport. | ||||
| 6.4 text/xml with Omitted Charset | ||||
| Content-type: text/xml | ||||
| {BOM}<?xml version="1.0" encoding="utf-16"?> | ||||
| This example shows text/xml with the charset parameter omitted. In | ||||
| this case, MIME and XML processors must assume the charset is | ||||
| "us-ascii", the default charset value for text media types specified | ||||
| in [RFC-2046], except when the underlying transport defines a | ||||
| different default charset, e.g., if the XML entity is transmitted | ||||
| via HTTP, the default charset value is "ISO-8859-1" (see section | ||||
| 3.7.1 of HTTP 1.1 [RFC-2068]). | ||||
| Omitting the charset parameter is NOT RECOMMENDED for text/xml. For | ||||
| example, even if the contents of the XML entity are UTF-16 or UTF-8, | ||||
| or the XML entity has an explicit encoding declaration, XML and MIME | ||||
| processors must assume the charset is "us-ascii". | ||||
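The text/xml rule above (the charset parameter wins; otherwise the transport default applies) might be expressed as in the sketch below. The function name and the via_http flag are assumptions made for illustration; header parsing relies on Python's standard email.message module.

```python
from email.message import Message


def text_xml_charset(content_type: str, via_http: bool = False) -> str:
    """Return the charset a processor must assume for a text/xml entity."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/xml":
        raise ValueError("not a text/xml entity")
    charset = msg.get_param("charset")  # None when the parameter is omitted
    if charset is not None:
        return str(charset).lower()
    # Omitted charset: us-ascii is the default for text media types,
    # except that HTTP defines ISO-8859-1 as its default.
    return "iso-8859-1" if via_http else "us-ascii"
```

Note that the function never inspects the entity body: for text/xml, a BOM or an internal encoding declaration does not override the default.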
| 6.5 application/xml with UTF-16 Charset | ||||
| Content-type: application/xml; charset="utf-16" | ||||
| {BOM}<?xml version="1.0"?> | ||||
| This is a recommended charset value for use with application/xml. | ||||
| Since the charset parameter is provided, MIME and XML processors | ||||
| must treat the enclosed entity as UTF-16 encoded. | ||||
| If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean | ||||
| transport (e.g., ESMTP, 8BITMIME, or NNTP), the XML entity must be | ||||
| encoded in quoted-printable or base64. For a binary clean transport | ||||
| (e.g., HTTP), no content-transfer-encoding is necessary. | ||||
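The content-transfer-encoding guidance given in the examples so far can be collected into one decision function. This is only a sketch under stated assumptions: the function name and the transport labels "7bit", "8bit", and "binary" are illustrative, and only the three charsets used in the examples are covered.

```python
def needs_transfer_encoding(charset: str, transport: str) -> bool:
    """Return True if quoted-printable or base64 is required."""
    charset = charset.lower()
    if transport not in ("7bit", "8bit", "binary"):
        raise ValueError("unknown transport class: " + transport)
    if transport == "binary":
        # Binary clean transports (e.g., HTTP) need no encoding.
        return False
    if charset == "iso-2022-kr":
        # Uses only 7 bits of data, so it is safe on any transport.
        return False
    if charset == "utf-8":
        # Octets above 127 survive an 8-bit clean transport, not a 7-bit one.
        return transport == "7bit"
    if charset == "utf-16":
        # NUL octets and bare CR/LF octets rule out 7-bit and 8-bit transports.
        return True
    raise ValueError("charset not covered by this sketch: " + charset)
```

For instance, needs_transfer_encoding("utf-16", "8bit") is True, while needs_transfer_encoding("utf-8", "8bit") is False.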
| 6.6 application/xml with ISO-2022-KR Charset | ||||
| Content-type: application/xml; charset="iso-2022-kr" | ||||
| <?xml version="1.0" encoding="iso-2022-kr"?> | ||||
| This example shows application/xml with a Korean charset (e.g., | ||||
| Hangul) encoded following the specification in [RFC-1557]. Since | ||||
| the charset parameter is provided, MIME and XML processors must | ||||
| treat the enclosed entity as encoded per [RFC-1557], independent of | ||||
| whether the XML entity has an internal encoding declaration (this | ||||
| example does show such a declaration, which agrees with the charset | ||||
| parameter). | ||||
| Since ISO-2022-KR has been defined to use only 7 bits of data, no | ||||
| content-transfer-encoding is necessary with any transport. | ||||
| 6.7 application/xml with Omitted Charset and UTF-16 XML Entity | ||||
| Content-type: application/xml | ||||
| {BOM}<?xml version='1.0'?> | ||||
| For this example, the XML entity begins with a BOM. Since the | ||||
| charset has been omitted, a conforming XML processor follows the | ||||
| requirements of [REC-XML], section 4.3.3. Specifically, the XML | ||||
| processor reads the BOM, and thus knows deterministically that the | ||||
| charset encoding is UTF-16. | ||||
| An XML-unaware MIME processor should make no assumptions about the | ||||
| charset of the XML entity. | ||||
| 6.8 application/xml with Omitted Charset and UTF-8 Entity | ||||
| Content-type: application/xml | ||||
| <?xml version='1.0'?> | ||||
| In this example, the charset parameter has been omitted, and there | ||||
| is no BOM. Since there is no BOM, the XML processor follows the | ||||
| requirements in section 4.3.3, and optionally applies the mechanism | ||||
| described in appendix F (which is non-normative) of [REC-XML] to | ||||
| determine the charset encoding of UTF-8. The XML entity does not | ||||
| contain an encoding declaration, but since the encoding is UTF-8, | ||||
| this is still a conforming XML entity. | ||||
| An XML-unaware MIME processor should make no assumptions about the | ||||
| charset of the XML entity. | ||||
| 6.9 application/xml with Omitted Charset and Internal Encoding | ||||
| Declaration | ||||
| Content-type: application/xml | ||||
| <?xml version='1.0' encoding="ISO-10646-UCS-4"?> | ||||
| In this example, the charset parameter has been omitted, and there | ||||
| is no BOM. However, the XML entity does have an encoding | ||||
| declaration inside the XML entity which specifies the entity's | ||||
| charset. Following the requirements in section 4.3.3, and optionally | ||||
| applying the mechanism described in appendix F (non-normative) of | ||||
| [REC-XML], the XML processor determines the charset encoding of the | ||||
| XML entity (in this example, UCS-4). | ||||
| An XML-unaware MIME processor should make no assumptions about the | ||||
| charset of the XML entity. | ||||
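The three omitted-charset cases for application/xml can be illustrated with a simplified detection routine. It is a rough sketch, not the full appendix F mechanism of [REC-XML]: the function name is hypothetical, only a UTF-16 BOM, BOM-less UCS-4, and ASCII-compatible encodings are recognized, and the declared label is returned as written (lower-cased) without being validated.

```python
import codecs
import re

# Matches the encoding declaration inside an XML declaration.
_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data: bytes) -> str:
    """Guess the charset of an application/xml entity with no charset parameter."""
    # A BOM identifies UTF-16 (section 6.7).
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # No BOM: decode just enough to read the XML declaration.
    if data.startswith(b"\x00\x00\x00<"):       # UCS-4, big-endian
        head = data[:400].decode("utf-32-be", errors="ignore")
    elif data.startswith(b"<\x00\x00\x00"):     # UCS-4, little-endian
        head = data[:400].decode("utf-32-le", errors="ignore")
    else:                                       # ASCII-compatible family
        head = data[:100].decode("ascii", errors="ignore")
    match = _DECL.search(head)
    # Without BOM or declaration the entity is taken as UTF-8 (section 6.8).
    return match.group(1).lower() if match else "utf-8"
```

A caller would invoke this only after confirming that the Content-type header carries no charset parameter; when the parameter is present it takes precedence, as in sections 6.5 and 6.6.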
| 7 References | ||||
| [ISO-10646] ISO/IEC, Information Technology -- Universal Multiple- | ||||
| Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic | ||||
| Multilingual Plane, May 1993. | Multilingual Plane, May 1993. | |||
| [ISO-8897] ISO (International Organization for Standardization) ISO | [ISO-8897] ISO (International Organization for Standardization) ISO | |||
| 8879:1986(E) Information Processing _ Text and Office Systems _ | 8879:1986(E) Information Processing -- Text and Office Systems -- | |||
| Standard Generalized Markup Language (SGML). First edition _ 1986- | Standard Generalized Markup Language (SGML). First edition -- 1986- | |||
| 10-15. | 10-15. | |||
| [REC-XML] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible | [REC-XML] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible | |||
| Markup Language (XML)." World Wide Web Consortium Recommendation | Markup Language (XML)" World Wide Web Consortium Recommendation REC- | |||
| REC-xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210. | xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210. | |||
| [RFC-1874] E. Levinson. "SGML Media Types_ Accurate Information | [RFC-1557] U. Choi, K. Chon, H. Park. "Korean Character Encoding for | |||
| Internet Messages" KAIST, Solvit Chosun Media. RFC 1557. December, | ||||
| 1993. | ||||
| [RFC-1874] E. Levinson. "SGML Media Types" Accurate Information | ||||
| Systems. RFC 1874. December, 1995. | Systems. RFC 1874. December, 1995. | |||
| [RFC-2119] S. Bradner. "Key words for use in RFCs to Indicate | ||||
| Requirement Levels." RFC 2119, BCP 14. Harvard University. March, | ||||
| 1997. | ||||
| [RFC-2045] N. Freed, N. Borenstein. "Multipurpose Internet Mail | [RFC-2045] N. Freed, N. Borenstein. "Multipurpose Internet Mail | |||
| Extensions (MIME) Part One: Format of Internet Message Bodies" | Extensions (MIME) Part One: Format of Internet Message Bodies" | |||
| Innosoft, First Virtual. RFC 2045. November, 1996. | Innosoft, First Virtual. RFC 2045. November, 1996. | |||
| [RFC-2046] N. Freed, N. Borenstein. "Multipurpose Internet Mail | [RFC-2046] N. Freed, N. Borenstein. "Multipurpose Internet Mail | |||
| Extensions (MIME) Part Two: Media Types_ Innosoft, First Virtual. | Extensions (MIME) Part Two: Media Types" Innosoft, First Virtual. | |||
| RFC 2046. November, 1996. | RFC 2046. November, 1996. | |||
| [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners- | [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners- | |||
| Lee. "Hypertext Transfer Protocol -- HTTP/1.1" UC Irvine, DEC, | Lee. "Hypertext Transfer Protocol -- HTTP/1.1" UC Irvine, DEC, | |||
| MIT/LCS. RFC 2068. January, 1997. | MIT/LCS. RFC 2068. January, 1997. | |||
| [RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO | [RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646" | |||
| 10646", January 1998. | RFC 2279. January 1998. | |||
| [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version | [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version | |||
| 2.0", Addison-Wesley, 1996. | 2.0", Addison-Wesley, 1996. | |||
| 5 Acknowledgements | 8 Acknowledgements | |||
| Chris Newman and Yaron Y. Goland both contributed content to the | Chris Newman and Yaron Y. Goland both contributed content to the | |||
| security considerations section of this document. In particular, | security considerations section of this document. In particular, | |||
| some text in the security considerations section is copied verbatim | some text in the security considerations section is copied verbatim | |||
| from draft-newman-mime-textpara-00, by permission of the author. | from draft-newman-mime-textpara-00, by permission of the author. | |||
| Discussions with Ned Freed and Dan Connolly helped refine the | Chris Newman additionally contributed content to the encoding | |||
| author's understanding of the text media type. | considerations sections. Dan Connolly contributed content discussing | |||
| when to use text/xml. Discussions with Ned Freed and Dan Connolly | ||||
| helped refine the author's understanding of the text media type; | ||||
| feedback from Larry Masinter was also very helpful in understanding | ||||
| media type registration issues. | ||||
| Members of the W3C XML Working Group and XML Special Interest group | Members of the W3C XML Working Group and XML Special Interest group | |||
| have made significant contributions to this document. | have made significant contributions to this document, and the | |||
| authors would like to specially recognize James Clark, Martin | ||||
| Duerst, Rick Jelliffe, and Gavin Nicol for their many thoughtful | ||||
| comments. | ||||
| 6 Author's Address | 9 Addresses of Authors | |||
| E. James Whitehead, Jr. | E. James Whitehead, Jr. | |||
| Dept. of Information and Computer Science | Dept. of Information and Computer Science | |||
| University of California, Irvine | University of California, Irvine | |||
| Irvine, CA 92697-3425 | Irvine, CA 92697-3425 | |||
| Email: ejw@ics.uci.edu | Email: ejw@ics.uci.edu | |||
| Murata Makoto (Family Given) | Murata Makoto (Family Given) | |||
| Fuji Xerox Information Systems, | Fuji Xerox Information Systems, | |||
| KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku, | KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku, | |||
| End of changes. 42 change blocks. | ||||
| 114 lines changed or deleted | 312 lines changed or added | |||