< draft-whitehead-mime-xml-03.txt   draft-whitehead-mime-xml-04.txt >
INTERNET DRAFT E. J. Whitehead, Jr., UC Irvine INTERNET DRAFT E. J. Whitehead, Jr., UC Irvine
<draft-whitehead-mime-xml-03> M. Murata, Fuji Xerox Info. Systems <draft-whitehead-mime-xml-04> M. Murata, Fuji Xerox Info. Systems
Expires September, 1998 May 15, 1998 Expires November, 1998 May 31, 1998
XML Media Types XML Media Types
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Distribution of this document is unlimited. Distribution of this document is unlimited.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (1998). All Rights Reserved. Copyright (C) The Internet Society (1998). All Rights Reserved.
Abstract Abstract
This document proposes two new media subtypes, text/xml and This document proposes two new media subtypes, text/xml and
application/xml, for use in exchanging network entities which are application/xml, for use in exchanging network entities which are
conformant Extensible Markup Language (XML). XML entities are conforming Extensible Markup Language (XML). XML entities are
currently exchanged via the HyperText Transfer Protocol on the World currently exchanged via the HyperText Transfer Protocol on the World
Wide Web, and are an integral part of the WebDAV protocol for remote Wide Web, are an integral part of the WebDAV protocol for remote web
web authoring, and are expected to have utility in many domains. authoring, and are expected to have utility in many domains.
Contents Contents
STATUS OF THIS MEMO...................................................1 STATUS OF THIS MEMO...................................................1
COPYRIGHT NOTICE......................................................1 COPYRIGHT NOTICE......................................................1
ABSTRACT..............................................................1 ABSTRACT..............................................................1
CONTENTS..............................................................2 CONTENTS..............................................................2
1 INTRODUCTION .......................................................3 1 INTRODUCTION .......................................................3
2 XML MEDIA TYPES ....................................................3 2 NOTATIONAL CONVENTIONS .............................................3
2.1 Text/xml Registration ...........................................4 3 XML MEDIA TYPES ....................................................4
2.2 Application/xml Registration ....................................6 3.1 Text/xml Registration ...........................................4
3 SECURITY CONSIDERATIONS ............................................9 3.2 Application/xml Registration ....................................7
4 REFERENCES ........................................................11 4 SECURITY CONSIDERATIONS ............................................9
5 ACKNOWLEDGEMENTS ..................................................11 5 THE BYTE ORDER MARK (BOM) AND CONVERSIONS TO/FROM UTF-16 ..........10
6 AUTHOR'S ADDRESS ..................................................12 6 EXAMPLES ..........................................................10
6.1 text/xml with UTF-8 Charset ....................................10
6.2 text/xml with UTF-16 Charset ...................................11
6.3 text/xml with ISO-2022-KR Charset ..............................11
6.4 text/xml with Omitted Charset ..................................11
6.5 application/xml with UTF-16 Charset ............................12
6.6 application/xml with ISO-2022-KR Charset .......................12
6.7 application/xml with Omitted Charset and UTF-16 XML Entity .....12
6.8 application/xml with Omitted Charset and UTF-8 Entity ..........13
6.9 application/xml with Omitted Charset and Internal Encoding
Declaration..........................................................13
7 REFERENCES ........................................................14
8 ACKNOWLEDGEMENTS ..................................................15
9 ADDRESSES OF AUTHORS ..............................................15
1 Introduction 1 Introduction
The World Wide Web Consortium (W3C) has issued a Recommendation The World Wide Web Consortium (W3C) has issued a Recommendation
[REC-XML] which defines the Extensible Markup Language (XML), [REC-XML] which defines the Extensible Markup Language (XML),
version 1. To enable the exchange of XML network entities, this version 1. To enable the exchange of XML network entities, this
document proposes two new media types, text/xml and application/xml. document proposes two new media types, text/xml and application/xml.
XML entities are currently exchanged on the World Wide Web, and XML XML entities are currently exchanged on the World Wide Web, and XML
is also used for property values and parameter marshalling by the is also used for property values and parameter marshalling by the
WebDAV protocol for remote web authoring. Thus, there is a need for WebDAV protocol for remote web authoring. Thus, there is a need for
skipping to change at page 3, line 31 skipping to change at page 3, line 31
text/sgml or application/sgml to label XML is inappropriate. First, text/sgml or application/sgml to label XML is inappropriate. First,
there exist many applications which can process XML, but which there exist many applications which can process XML, but which
cannot process SGML, due to SGML's larger feature set. Second, SGML cannot process SGML, due to SGML's larger feature set. Second, SGML
applications cannot always process XML entities, because XML uses applications cannot always process XML entities, because XML uses
features of recent technical corrigenda to SGML. Third, the features of recent technical corrigenda to SGML. Third, the
definition of text/sgml and application/sgml [RFC-1874] includes definition of text/sgml and application/sgml [RFC-1874] includes
parameters for SGML bit combination transformation format (SGML- parameters for SGML bit combination transformation format (SGML-
bctf), and SGML boot attribute (SGML-boot). Since XML does not use bctf), and SGML boot attribute (SGML-boot). Since XML does not use
these parameters, it would be ambiguous if such parameters were these parameters, it would be ambiguous if such parameters were
given for an XML entity. For these reasons, the best approach for given for an XML entity. For these reasons, the best approach for
labeling XML network entities is to provide a new media type for labeling XML network entities is to provide new media types for XML.
XML.
Since XML is an integral part of the WebDAV Distributed Authoring Since XML is an integral part of the WebDAV Distributed Authoring
Protocol, and since World Wide Web Consortium Recommendations have Protocol, and since World Wide Web Consortium Recommendations have
conventionally been assigned IETF tree media types, and since conventionally been assigned IETF tree media types, and since
similar media types (HTML, SGML) have been assigned IETF tree media similar media types (HTML, SGML) have been assigned IETF tree media
types, the XML media types also belong in the IETF tree. types, the XML media types also belong in the IETF media types tree.
2 XML Media Types 2 Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC-2119].
3 XML Media Types
This document introduces two new media types for XML entities, This document introduces two new media types for XML entities,
text/xml and application/xml. Registration information for these text/xml and application/xml. Registration information for these
media types is described in the sections below. media types is described in the sections below.
An XML network entity should be labeled as text/xml under the Every XML entity is suitable for use with the application/xml media
following circumstances: type without modification. But this does not exploit the fact that
XML can be treated as plain text in many cases. MIME user agents
- it is represented by a charset (e.g., UTF-8) which is (and web user agents) that do not have explicit support for
compatible with the requirements for text media types as application/xml will treat it as application/octet-stream, for
described in [RFC-2045] and [RFC-2046], or example, by offering to save it to a file.
- it is transmitted by HTTP, which uses a MIME-like mechanism
that is exempt from the restrictions on the text top-level type
(see section 19.4.1 of HTTP 1.1 [RFC-2068]).
If an XML network entity fails any of these criteria, then it should To indicate that an XML entity should be treated as plain text by
be labeled as application/xml. Specifically, if the XML entity is default, use the text/xml media type. This restricts the encoding
represented by UTF-16 and the protocol is not HTTP, the XML entity used in the XML entity to those that are compatible with the
should be labeled as application/xml. requirements for text media types as described in [RFC-2045] and
[RFC-2046], e.g., UTF-8, but not UTF-16 (except for HTTP).
Some applications of XML will require security or runtime XML provides a general framework for defining sequences of
information specific to these applications. This document does not structured data. In some cases, it may be desirable to define new
prohibit future media types dedicated to such XML applications. media types which use XML but define a specific application of XML,
However, developers of such media types are recommended to use this perhaps due to domain-specific security considerations or runtime
document as a basis. In particular, the charset parameter should be information. This document does not prohibit future media types
used in the same manner. dedicated to such XML applications. However, developers of such
media types are recommended to use this document as a basis. In
particular, the charset parameter should be used in the same manner.
Within the XML specification, XML entities can be classified into Within the XML specification, XML entities can be classified into
four types. In the XML terminology, they are called "document four types. In the XML terminology, they are called "document
entities", "external DTD subsets", "external parsed entities", and entities", "external DTD subsets", "external parsed entities", and
"external parameter entities". The media types text/xml and "external parameter entities". The media types text/xml and
application/xml can be used for any of these four types. application/xml can be used for any of these four types.
2.1 Text/xml Registration 3.1 Text/xml Registration
MIME media type name: text MIME media type name: text
MIME subtype name: xml MIME subtype name: xml
Mandatory parameters: none Mandatory parameters: none
Optional parameters: charset Optional parameters: charset
Although listed as an optional parameter, the use of the charset Although listed as an optional parameter, the use of the charset
parameter is STRONGLY RECOMMENDED, since this information can be parameter is STRONGLY RECOMMENDED, since this information can be
used by XML processors to determine authoritatively the charset used by XML processors to determine authoritatively the
of the XML entity. The charset parameter can also be used to character encoding of the XML entity. The charset parameter can
provide protocol-specific operations, such as charset-based also be used to provide protocol-specific operations, such as
content negotiation in HTTP. charset-based content negotiation in HTTP.
"UTF-8" [RFC-2279] is the recommended value, representing the "UTF-8" [RFC-2279] is the recommended value, representing the
UTF-8 charset. UTF-8 is supported by all conformant XML UTF-8 charset. UTF-8 is supported by all conforming XML
processors [REC-XML]. processors [REC-XML].
If the XML entity is transmitted via HTTP, which uses a MIME- If the XML entity is transmitted via HTTP, which uses a MIME-
like mechanism that is exempt from the restrictions on the text like mechanism that is exempt from the restrictions on the text
top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]),
"UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO- "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-
10646]) is also recommended. UTF-16 is supported by all 10646]) is also recommended. UTF-16 is supported by all
conformant XML processors [REC-XML]. UTF-16 should be sent in conforming XML processors [REC-XML]. Since the handling of CR,
network byte order (big-endian), but recipients should be able LF and NUL for text types in most MIME applications would cause
to handle both big-endian and little-endian. undesired transformations of individual octets in UTF-16 multi-
octet characters, gateways from HTTP to these MIME applications
If a text/xml entity is received where the charset parameter is MUST transform the XML entity from a text/xml; charset="utf-16"
omitted, XML processors are encouraged to apply the heuristics to application/xml; charset="utf-16".
described in Appendix F of [REC-XML] ("Autodetection of
Character Encodings") to deterministically detect the charset of
the entity before reverting to the text/xml default charset.
Note that if a charset other than UTF-8 (or UTF-16) is used, and Conformant with [RFC-2046], if a text/xml entity is received
the charset is not declared within an XML entity by an XML with the charset parameter omitted, MIME processors and XML
"encoding declaration" (a non-conformant situation according to processors MUST use the default charset value of "us-ascii". If
the XML specification), an XML processor will be unable to the XML entity is transmitted via HTTP, the default charset
determine the charset of the XML entity if the charset parameter value is "ISO-8859-1" (see section 3.7.1 of HTTP 1.1 [RFC-
is not given. The definition of XML encoding declarations is 2068]).
given in 4.3.3 of [REC-XML].
Since the charset parameter is authoritative, the charset is not Since the charset parameter is authoritative, the charset is not
always declared within an XML encoding declaration. Thus, always declared within an XML encoding declaration. Thus,
special care is needed when the recipient strips the MIME header special care is needed when the recipient strips the MIME header
and provides persistent storage of the received XML entity and provides persistent storage of the received XML entity
(e.g., in a file system). Unless the charset is UTF-8 or UTF- (e.g., in a file system). Unless the charset is UTF-8 or UTF-16,
16, the recipient should also persistently store information the recipient SHOULD also persistently store information about
about the charset, perhaps by embedding a correct XML encoding the charset, perhaps by embedding a correct XML encoding
declaration within the XML entity. declaration within the XML entity.
Encoding considerations: Encoding considerations:
May be encoded. In particular, XML entities in UTF-8 must be This media type MAY be encoded as appropriate for the charset
encoded in quoted-printable or base64 unless the underlying MIME and the capabilities of the underlying MIME transport. For 7-bit
transport is 8-bit clean. Since HTTP is 8-bit clean, XML transports, data in both UTF-8 and UTF-16 is encoded in quoted-
entities in UTF-16 do not require encoding. printable or base64. For 8-bit clean transport (e.g., ESMTP,
8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64
encoded. For binary clean transports (e.g., HTTP), no content-
transfer-encoding is necessary.
Security considerations: Security considerations:
See section 3 below. See section 4 below.
Interoperability considerations: Interoperability considerations:
XML has proven to be interoperable across WebDAV clients and XML has proven to be interoperable across WebDAV clients and
servers, and for import and export from multiple XML authoring servers, and for import and export from multiple XML authoring
tools. tools.
Published specification: see [REC-XML] Published specification: see [REC-XML]
Applications which use this media type: Applications which use this media type:
skipping to change at page 6, line 12 skipping to change at page 6, line 27
Although no byte sequences can be counted on to always be Although no byte sequences can be counted on to always be
present, XML entities in ASCII-compatible charsets (including present, XML entities in ASCII-compatible charsets (including
UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"). UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml").
For more information, see Appendix F of [REC-XML]. For more information, see Appendix F of [REC-XML].
File extension(s): .xml, .dtd File extension(s): .xml, .dtd
Macintosh File Type Code(s): "TEXT" Macintosh File Type Code(s): "TEXT"
Person & email address for further information: Person & email address for further information:
Dan Connolly <connolly@w3.org>
Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp> Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
Jim Whitehead <ejw@ics.uci.edu>
Kurt Conrad <conrad@SagebrushGroup.com>
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
The XML specification is a work product of the World Wide Web The XML specification is a work product of the World Wide Web
Consortium's XML Working Group, and was edited by: Consortium's XML Working Group, and was edited by:
Tim Bray <tbray@textuality.com> Tim Bray <tbray@textuality.com>
Jean Paoli <jeanpa@microsoft.com> Jean Paoli <jeanpa@microsoft.com>
C. M. Sperberg-McQueen <cmsmcq@uic.edu> C. M. Sperberg-McQueen <cmsmcq@uic.edu>
The W3C, and the W3C XML working group, have change control over The W3C, and the W3C XML working group, have change control over
the XML specification. the XML specification.
2.2 Application/xml Registration 3.2 Application/xml Registration
MIME media type name: application MIME media type name: application
MIME subtype name: xml MIME subtype name: xml
Mandatory parameters: none Mandatory parameters: none
Optional parameters: charset Optional parameters: charset
Although listed as an optional parameter, the use of the charset Although listed as an optional parameter, the use of the charset
parameter is STRONGLY RECOMMENDED, since this information can be parameter is STRONGLY RECOMMENDED, since this information can be
used by XML processors to determine authoritatively the charset used by XML processors to determine authoritatively the charset
of the XML entity. The charset parameter can also be used to of the XML entity. The charset parameter can also be used to
provide protocol-specific operations, such as charset-based provide protocol-specific operations, such as charset-based
content negotiation in HTTP. content negotiation in HTTP.
"UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO- "UTF-8" [RFC-2279] and "UTF-16" (Appendix C.3 of [UNICODE] and
10646]) is the recommended values, representing the UTF-16 Amendment 1 of [ISO-10646]) are the recommended values,
charset. This charset is preferred since it is supported by all representing the UTF-8 and UTF-16 charsets, respectively. These
conformant XML processors [REC-XML]. UTF-16 should be sent in charsets are preferred since they are supported by all
network byte order (big-endian), but recipients should be able conforming XML processors [REC-XML].
to handle both big-endian and little-endian.
If an application/xml entity is received where the charset If an application/xml entity is received where the charset
parameter is omitted, XML processors are encouraged to apply the parameter is omitted, no information is being provided about the
heuristics described in Appendix F of [REC-XML] ("Autodetection charset by the MIME Content-Type header. Conforming XML
of Character Encodings") to deterministically detect the charset processors MUST follow the requirements in section 4.3.3 of
of the entity. [REC-XML] which directly address this contingency. However, MIME
processors which are not XML processors should not assume a
Note that if a charset other than UTF-8 or UTF-16 is used, and default charset if the charset parameter is omitted from an
the charset is not declared within an XML entity by an XML application/xml entity.
"encoding declaration" (a non-conformant situation according to
the XML specification), an XML processor will be unable to
determine the charset of the XML entity if the charset parameter
is not given. The definition of XML encoding declarations is
given in 4.3.3 of [REC-XML].
Since the charset parameter is authoritative, the charset is not Since the charset parameter is authoritative, the charset is not
always declared within an XML encoding declaration. Thus, always declared within an XML encoding declaration. Thus,
special care is needed when the recipient strips the MIME header special care is needed when the recipient strips the MIME header
and provides persistent storage of the received XML entity and provides persistent storage of the received XML entity
(e.g., in a file system). Unless the charset is UTF-8 or UTF- (e.g., in a file system). Unless the charset is UTF-8 or
16, the recipient should also persistently store information UTF-16, the recipient SHOULD also persistently store information
about the charset, perhaps by embedding a correct XML encoding about the charset, perhaps by embedding a correct XML encoding
declaration within the XML entity. declaration within the XML entity.
Encoding considerations: Encoding considerations:
May be encoded. In particular, XML entities in UTF-16 must be This media type MAY be encoded as appropriate for the charset
encoded in base64 unless the underlying MIME transport is binary and the capabilities of the underlying MIME transport. For 7-bit
safe; XML entities in UTF-8 must be encoded in quoted-printable transports, data in both UTF-8 and UTF-16 is encoded in quoted-
or base64 unless the underlying MIME transport is 8-bit clean. printable or base64. For 8-bit clean transport (e.g., ESMTP,
8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64
encoded. For binary clean transport (e.g., HTTP), no content-
transfer-encoding is necessary.
Security considerations: Security considerations:
See section 3 below. See section 4 below.
Interoperability considerations: Interoperability considerations:
XML has proven to be interoperable for import and export from XML has proven to be interoperable for import and export from
multiple XML authoring tools. multiple XML authoring tools.
Published specification: see [REC-XML] Published specification: see [REC-XML]
Applications which use this media type: Applications which use this media type:
skipping to change at page 8, line 15 skipping to change at page 8, line 38
and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00 and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00
3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order 3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order
Mark (BOM) followed by "<?xml"). For more information, see Mark (BOM) followed by "<?xml"). For more information, see
Annex F of [REC-XML]. Annex F of [REC-XML].
File extension(s): .xml, .dtd File extension(s): .xml, .dtd
Macintosh File Type Code(s): "TEXT" Macintosh File Type Code(s): "TEXT"
Person & email address for further information: Person & email address for further information:
Dan Connolly <connolly@w3.org>
Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp> Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
Jim Whitehead <ejw@ics.uci.edu>
Kurt Conrad <conrad@SagebrushGroup.com>
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
The XML specification is a work product of the World Wide Web The XML specification is a work product of the World Wide Web
Consortium's XML Working Group, and was edited by: Consortium's XML Working Group, and was edited by:
Tim Bray <tbray@textuality.com> Tim Bray <tbray@textuality.com>
Jean Paoli <jeanpa@microsoft.com> Jean Paoli <jeanpa@microsoft.com>
C. M. Sperberg-McQueen <cmsmcq@uic.edu> C. M. Sperberg-McQueen <cmsmcq@uic.edu>
The W3C, and the W3C XML working group, have change control over The W3C, and the W3C XML working group, have change control over
the XML specification. the XML specification.
3 Security Considerations 4 Security Considerations
XML, as a subset of SGML, has the same security considerations as XML, as a subset of SGML, has the same security considerations as
specified in [RFC-1874]. specified in [RFC-1874].
To paraphrase section 3 of [RFC-1874], XML entities contain To paraphrase section 3 of [RFC-1874], XML entities contain
information to be parsed and processed by the recipient's XML information to be parsed and processed by the recipient's XML
system. These entities may contain and such systems may permit system. These entities may contain and such systems may permit
explicit system level commands to be executed while processing the explicit system level commands to be executed while processing the
data. To the extent that an XML system will execute arbitrary data. To the extent that an XML system will execute arbitrary
command strings, recipients of XML entities may be at risk. In command strings, recipients of XML entities may be at risk. In
skipping to change at page 11, line 5 skipping to change at page 10, line 14
Note that it is also possible to construct XML documents which make Note that it is also possible to construct XML documents which make
use of what XML terms "entity references" (using the XML meaning of use of what XML terms "entity references" (using the XML meaning of
the term "entity", which differs from the MIME definition of this the term "entity", which differs from the MIME definition of this
term), to construct repeated expansions of text. Recursive term), to construct repeated expansions of text. Recursive
expansions are prohibited [REC-XML] and XML processors are required expansions are prohibited [REC-XML] and XML processors are required
to detect them. However, even non-recursive expansions may cause to detect them. However, even non-recursive expansions may cause
problems with the finite computing resources of computers, if they problems with the finite computing resources of computers, if they
are performed many times. are performed many times.
4 References 5 The Byte Order Mark (BOM) and Conversions to/from UTF-16
[ISO-10646] ISO/IEC, Information Technology - Universal Multiple- The XML Recommendation, in section 4.3.3, specifies that UTF-16 XML
Octet Coded Character Set (UCS) - Part 1: Architecture and Basic entities must begin with a byte order mark (BOM), which is the ZERO
WIDTH NO-BREAK SPACE character, hexadecimal sequence 0xFEFF (or
0xFFFE, depending on byte order). The XML Recommendation further states
that the BOM is an encoding signature, and is not part of either the
markup or the character data of the XML document.
Because the BOM is an encoding signature rather than document
content, applications which convert XML from the UTF-16 encoding to
another encoding SHOULD strip the BOM before conversion. Similarly,
when converting from another encoding into UTF-16, the BOM SHOULD be
added after conversion is complete.
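The BOM handling described above can be sketched in Python as
follows. This is an illustration only, not part of the draft: the
function names are hypothetical, the big-endian output order is an
arbitrary choice, and the sketch does not rewrite any encoding
declaration inside the entity, which a real converter would also
need to keep consistent.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert a UTF-16 XML entity to UTF-8, stripping the BOM first."""
    if data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    else:
        # Section 4.3.3 of the XML Recommendation requires the BOM.
        raise ValueError("UTF-16 XML entity does not begin with a BOM")
    return text.encode("utf-8")


def utf8_to_utf16(data: bytes) -> bytes:
    """Convert a UTF-8 XML entity to UTF-16, adding the BOM afterwards."""
    text = data.decode("utf-8")
    # Assumption: emit big-endian; little-endian with FF FE is equally valid.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

The explicit "utf-16-be" and "utf-16-le" codecs are used so that the
BOM is handled by the sketch itself rather than implicitly by the
codec.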
6 Examples
The examples below give the value of the Content-type MIME header
and the XML declaration (which includes the encoding declaration)
inside the XML entity. For UTF-16 examples, the Byte Order Mark
character is denoted as "{BOM}", and the XML declaration is assumed
to come at the beginning of the XML entity, immediately following
the BOM. Note that other MIME headers may be present, and the XML
entity may contain other data in addition to the XML declaration;
the examples focus on the Content-type header and the encoding
declaration for clarity.
6.1 text/xml with UTF-8 Charset
Content-type: text/xml; charset="utf-8"
<?xml version="1.0" encoding="utf-8"?>
This is the recommended charset value for use with text/xml. Since
the charset parameter is provided, MIME and XML processors must
treat the enclosed entity as UTF-8 encoded.
If sent using a 7-bit transport (e.g., SMTP), the XML entity must use
a content-transfer-encoding of either quoted-printable or base64.
For an 8-bit clean transport (e.g., ESMTP, 8BITMIME, or NNTP), or a
binary clean transport (e.g., HTTP) no content-transfer-encoding is
necessary.
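The transport rules above, together with the UTF-16 rules given in
the encoding considerations of the two registrations, can be
summarized in a small Python sketch. The function name and the
transport labels "7bit", "8bit", and "binary" are hypothetical names
chosen for illustration; only UTF-8 and UTF-16 are covered.

```python
from typing import List


def allowed_transfer_encodings(charset: str, transport: str) -> List[str]:
    """Return the content-transfer-encodings to choose from.

    An empty list means no content-transfer-encoding is necessary.
    """
    charset = charset.lower()
    if charset not in ("utf-8", "utf-16"):
        raise ValueError("sketch only covers utf-8 and utf-16")
    if transport == "binary":      # e.g., HTTP
        return []
    if transport == "8bit":        # e.g., ESMTP, 8BITMIME, or NNTP
        # UTF-8 passes unencoded; UTF-16 is base64 encoded.
        return [] if charset == "utf-8" else ["base64"]
    if transport == "7bit":        # e.g., SMTP
        return ["quoted-printable", "base64"]
    raise ValueError("unknown transport class: " + transport)
```

For instance, allowed_transfer_encodings("utf-8", "8bit") yields an
empty list, matching the statement that no content-transfer-encoding
is necessary for UTF-8 over an 8-bit clean transport.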
6.2 text/xml with UTF-16 Charset
Content-type: text/xml; charset="utf-16"
{BOM}<?xml version='1.0' encoding='utf-16'?>
This is possible only when the XML entity is transmitted via HTTP,
which uses a MIME-like mechanism and is a binary-clean protocol,
hence does not perform CR and LF transformations and allows NUL
octets. This differs from typical text MIME type processing (see
section 19.4.1 of HTTP 1.1 [RFC-2068] for details).
Since HTTP is binary clean, no content-transfer-encoding is
necessary.
6.3 text/xml with ISO-2022-KR Charset
Content-type: text/xml; charset="iso-2022-kr"
<?xml version="1.0" encoding='iso-2022-kr'?>
This example shows text/xml with a Korean charset (e.g., Hangul)
encoded following the specification in [RFC-1557]. Since the
charset parameter is provided, MIME and XML processors must treat
the enclosed entity as encoded per [RFC-1557].
Since ISO-2022-KR has been defined to use only 7 bits of data, no
content-transfer-encoding is necessary with any transport.
6.4 text/xml with Omitted Charset
Content-type: text/xml
{BOM}<?xml version="1.0" encoding="utf-16"?>
This example shows text/xml with the charset parameter omitted. In
this case, MIME and XML processors must assume the charset is
"us-ascii", the default charset value for text media types specified
in [RFC-2046], unless the underlying transport defines a different
default charset. For example, if the XML entity is transmitted via
HTTP, the default charset value is "ISO-8859-1" (see section 3.7.1
of HTTP 1.1 [RFC-2068]).
Omitting the charset parameter is NOT RECOMMENDED for text/xml. For
example, even if the contents of the XML entity are UTF-16 or UTF-8,
or the XML entity has an explicit encoding declaration, XML and MIME
processors must assume the charset is "us-ascii".
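The default rule for text/xml can be illustrated with a short Python
sketch that uses the standard library's email package to parse the
Content-type value. The function name and the via_http flag are
hypothetical; the sketch deliberately ignores the contents of the XML
entity, since for text/xml neither a BOM nor an encoding declaration
overrides the default.

```python
from email.message import Message


def text_xml_charset(content_type: str, via_http: bool = False) -> str:
    """Return the charset a processor must assume for a text/xml entity."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/xml":
        raise ValueError("not a text/xml entity")
    charset = msg.get_param("charset")
    if charset is not None:
        # The charset parameter is authoritative when present.
        return str(charset).lower()
    # Parameter omitted: fall back to the transport's default.
    return "iso-8859-1" if via_http else "us-ascii"
```

Applied to the example above, text_xml_charset("text/xml") gives
"us-ascii" even though the entity itself begins with a BOM and
declares UTF-16.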
6.5 application/xml with UTF-16 Charset
Content-type: application/xml; charset="utf-16"
{BOM}<?xml version="1.0"?>
This is a recommended charset value for use with application/xml.
Since the charset parameter is provided, MIME and XML processors
must treat the enclosed entity as UTF-16 encoded.
If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean
transport (e.g., ESMTP, 8BITMIME, or NNTP), the XML entity must be
encoded in quoted-printable or base64. For a binary clean transport
(e.g., HTTP), no content-transfer-encoding is necessary.
6.6 application/xml with ISO-2022-KR Charset
Content-type: application/xml; charset="iso-2022-kr"
<?xml version="1.0" encoding="iso-2022-kr"?>
This example shows application/xml with a Korean charset (e.g.,
Hangul) encoded following the specification in [RFC-1557]. Since
the charset parameter is provided, MIME and XML processors must
treat the enclosed entity as encoded per [RFC-1557], independent of
whether the XML entity has an internal encoding declaration (this
example does show such a declaration, which agrees with the charset
parameter).
Since ISO-2022-KR has been defined to use only 7 bits of data, no
content-transfer-encoding is necessary with any transport.
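The 7-bit property can be checked informally with Python's built-in
"iso2022_kr" codec. This is only an illustrative sketch; the sample
element name and text are invented for the example.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every octet has its high bit clear."""
    return all(octet < 0x80 for octet in data)


# Hypothetical sample entity containing two Hangul syllables
# (U+D55C, U+AE00).
sample_text = (
    '<?xml version="1.0" encoding="iso-2022-kr"?>'
    "<greeting>\ud55c\uae00</greeting>"
)
sample_entity = sample_text.encode("iso2022_kr")

# ISO-2022-KR switches character sets with escape and shift sequences,
# so the Hangul text is carried entirely in 7-bit octets.
seven_bit_ok = is_seven_bit(sample_entity)
```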
6.7 application/xml with Omitted Charset and UTF-16 XML Entity
Content-type: application/xml
{BOM}<?xml version='1.0'?>
For this example, the XML entity begins with a BOM. Since the
charset has been omitted, a conforming XML processor follows the
requirements of [REC-XML], section 4.3.3. Specifically, the XML
processor reads the BOM, and thus knows deterministically that the
charset encoding is UTF-16.
An XML-unaware MIME processor should make no assumptions about the
charset of the XML entity.
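The BOM check an XML processor performs in this case might look like
the sketch below. The function name is hypothetical, and only the
two-octet UTF-16 BOM discussed in this example is examined.

```python
import codecs


def sniff_utf16_bom(entity: bytes):
    """Return the UTF-16 byte order indicated by a leading BOM, or None."""
    if entity.startswith(codecs.BOM_UTF16_BE):  # octets FE FF
        return "utf-16-be"
    if entity.startswith(codecs.BOM_UTF16_LE):  # octets FF FE
        return "utf-16-le"
    return None
```

When None is returned there is no BOM, and the processor has to fall
back on the other rules of [REC-XML] section 4.3.3.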
6.8 application/xml with Omitted Charset and UTF-8 Entity
Content-type: application/xml
<?xml version='1.0'?>
In this example, the charset parameter has been omitted, and there
is no BOM. Since there is no BOM, the XML processor follows the
requirements in section 4.3.3, and optionally applies the mechanism
described in appendix F (which is non-normative) of [REC-XML] to
determine that the charset encoding is UTF-8. The XML entity does not
contain an encoding declaration, but since the encoding is UTF-8,
this is still a conforming XML entity.
An XML-unaware MIME processor should make no assumptions about the
charset of the XML entity.
6.9 application/xml with Omitted Charset and Internal Encoding
Declaration
Content-type: application/xml
<?xml version='1.0' encoding="ISO-10646-UCS-4"?>
In this example, the charset parameter has been omitted, and there
is no BOM. However, the XML entity does contain an encoding
declaration, which specifies the entity's
charset. Following the requirements in section 4.3.3, and optionally
applying the mechanism described in appendix F (non-normative) of
[REC-XML], the XML processor determines the charset encoding of the
XML entity (in this example, UCS-4).
An XML-unaware MIME processor should make no assumptions about the
charset of the XML entity.
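Examples 6.7 through 6.9 can be drawn together in one simplified
sketch of the detection an XML processor might perform when the
charset parameter is omitted. It is loosely modelled on the
non-normative appendix F of [REC-XML] and is deliberately incomplete
(for instance, UTF-16 without a BOM and EBCDIC are not handled). The
function name and the returned labels are assumptions.

```python
import re

# Leading octet patterns checked before looking for a declaration.
_PREFIXES = [
    (b"\x00\x00\x00\x3c", "ucs-4 (big-endian)"),
    (b"\x3c\x00\x00\x00", "ucs-4 (little-endian)"),
    (b"\xfe\xff", "utf-16 (big-endian)"),
    (b"\xff\xfe", "utf-16 (little-endian)"),
]

# Encoding declaration inside an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(entity: bytes) -> str:
    """Guess the encoding family of an application/xml entity."""
    for prefix, family in _PREFIXES:
        if entity.startswith(prefix):
            return family
    match = _DECL.match(entity)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: the entity must be UTF-8.
    return "utf-8"
```

An XML-unaware MIME processor would perform none of these steps, which
is why it should make no assumptions about the charset.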
7 References
[ISO-10646] ISO/IEC, Information Technology -- Universal Multiple-
Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic
Multilingual Plane, May 1993. Multilingual Plane, May 1993.
[ISO-8897] ISO (International Organization for Standardization) ISO [ISO-8897] ISO (International Organization for Standardization) ISO
8879:1986(E) Information Processing _ Text and Office Systems _ 8879:1986(E) Information Processing -- Text and Office Systems --
Standard Generalized Markup Language (SGML). First edition _ 1986- Standard Generalized Markup Language (SGML). First edition -- 1986-
10-15. 10-15.
[REC-XML] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible [REC-XML] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible
Markup Language (XML)." World Wide Web Consortium Recommendation Markup Language (XML)" World Wide Web Consortium Recommendation REC-
REC-xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210. xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210.
[RFC-1874] E. Levinson. "SGML Media Types_ Accurate Information [RFC-1557] U. Choi, K. Chon, H. Park. "Korean Character Encoding for
Internet Messages" KAIST, Solvit Chosun Media. RFC 1557. December,
1993.
[RFC-1874] E. Levinson. "SGML Media Types" Accurate Information
Systems. RFC 1874. December, 1995. Systems. RFC 1874. December, 1995.
[RFC-2119] S. Bradner. "Key words for use in RFCs to Indicate
Requirement Levels." RFC 2119, BCP 14. Harvard University. March,
1997.
[RFC-2045] N. Freed, N. Borenstein. "Multipurpose Internet Mail [RFC-2045] N. Freed, N. Borenstein. "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies" Extensions (MIME) Part One: Format of Internet Message Bodies"
Innosoft, First Virtual. RFC 2045. November, 1996. Innosoft, First Virtual. RFC 2045. November, 1996.
[RFC-2046] N. Freed, N. Borenstein. "Multipurpose Internet Mail [RFC-2046] N. Freed, N. Borenstein. "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types_ Innosoft, First Virtual. Extensions (MIME) Part Two: Media Types" Innosoft, First Virtual.
RFC 2046. November, 1996. RFC 2046. November, 1996.
[RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners- [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-
Lee. "Hypertext Transfer Protocol -- HTTP/1.1" UC Irvine, DEC, Lee. "Hypertext Transfer Protocol -- HTTP/1.1" UC Irvine, DEC,
MIT/LCS. RFC 2068. January, 1997. MIT/LCS. RFC 2068. January, 1997.
[RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO [RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646"
10646", January 1998. RFC 2279. January 1998.
[UNICODE] The Unicode Consortium, "The Unicode Standard -- Version [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version
2.0", Addison-Wesley, 1996. 2.0", Addison-Wesley, 1996.
5 Acknowledgements 8 Acknowledgements
Chris Newman and Yaron Y. Goland both contributed content to the Chris Newman and Yaron Y. Goland both contributed content to the
security considerations section of this document. In particular, security considerations section of this document. In particular,
some text in the security considerations section is copied verbatim some text in the security considerations section is copied verbatim
from draft-newman-mime-textpara-00, by permission of the author. from draft-newman-mime-textpara-00, by permission of the author.
Discussions with Ned Freed and Dan Connolly helped refine the Chris Newman additionally contributed content to the encoding
author's understanding of the text media type. considerations sections. Dan Connolly contributed content discussing
when to use text/xml. Discussions with Ned Freed and Dan Connolly
helped refine the author's understanding of the text media type;
feedback from Larry Masinter was also very helpful in understanding
media type registration issues.
Members of the W3C XML Working Group and XML Special Interest group Members of the W3C XML Working Group and XML Special Interest group
have made significant contributions to this document. have made significant contributions to this document, and the
authors would like to specially recognize James Clark, Martin
Duerst, Rick Jelliffe, and Gavin Nicol for their many thoughtful
comments.
6 Author's Address 9 Addresses of Authors
E. James Whitehead, Jr. E. James Whitehead, Jr.
Dept. of Information and Computer Science Dept. of Information and Computer Science
University of California, Irvine University of California, Irvine
Irvine, CA 92697-3425 Irvine, CA 92697-3425
Email: ejw@ics.uci.edu Email: ejw@ics.uci.edu
Murata Makoto (Family Given) Murata Makoto (Family Given)
Fuji Xerox Information Systems, Fuji Xerox Information Systems,
KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku, KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku,
 End of changes. 42 change blocks. 
114 lines changed or deleted 312 lines changed or added
