Network Working Group S. Hollenbeck
Internet-Draft VeriSign, Inc.
Expires: October 28, 2002 M. Rose
Dover Beach Consulting, Inc.
L. Masinter
Adobe Systems Incorporated
April 29, 2002
Guidelines for the Use of XML within IETF Protocols
draft-hollenbeck-ietf-xml-guidelines-02.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 28, 2002.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
The Extensible Markup Language (XML) is a framework for structuring
data. While it evolved from SGML -- a markup language primarily
focused on structuring documents -- XML has evolved to be a widely-
used mechanism for representing structured data.
A wide variety of Internet protocols are being developed; many
need a representation for structured data relevant to their
application. There has been much interest in the use of XML as a
Hollenbeck, et al. Expires October 28, 2002 [Page 1]
Internet-Draft XML Within IETF Protocols April 2002
representation method. This document describes basic XML concepts,
analyzes various alternatives in the use of XML, and provides
guidelines for the use of XML within IETF standards-track protocols.
Intended Publication Status
It is the goal of the authors that this draft (when completed and
then approved by the IESG) be published as a Best Current Practice
(BCP).
Conventions Used In This Document
This document recommends, as policy, what specifications for Internet
protocols -- and, in particular, IETF standards track protocol
documents -- should include as normative language within them. The
capitalized keywords "SHOULD", "MUST", "REQUIRED", etc. are used in
the sense of how they would be used within other documents with the
meanings as specified in RFC 2119 [1].
Discussion Venue
The authors welcome discussion and comments relating to the topics
presented in this document. Though direct comments to the authors
are welcome, public discussion is taking place on the
"ietf-xml-use@imc.org" mailing list. To join the list, send a message to
"ietf-xml-use-request@imc.org" with the word "subscribe" in the body
of the message. There is a web site for the archives of the list at
http://www.imc.org/ietf-xml-use/.
Table of Contents
1. Introduction and Overview
1.1 Intended Audience
1.2 Scope
1.3 XML Evolution
1.4 XML Users, Support Groups, and Additional Information
2. XML Selection Considerations
3. XML Alternatives
4. XML Use Considerations and Recommendations
4.1 XML Declarations
4.2 XML Processing Instructions
4.3 Well-Formedness
4.4 Validity and Extensibility
4.5 Namespaces
4.5.1 Namespaces and Attributes
4.6 Element and Attribute Design Considerations
4.7 Binary Data
4.8 Incremental Processing
5. Internationalization Considerations
5.1 Character Sets and Encodings: UTF-8 and UTF-16
5.2 Language Declaration
5.3 Other Considerations
6. IANA Considerations
7. Security Considerations
8. Acknowledgements
Normative References
Informative References
Authors' Addresses
A. Appendix A: Change History
Full Copyright Statement
1. Introduction and Overview
The Extensible Markup Language (XML) is a framework for structuring
data. While it evolved from the Standard Generalized Markup Language
(SGML) [31] -- a markup language primarily focused on structuring
documents -- XML has evolved to be a widely-used mechanism for
representing structured data in protocol exchanges. See [39] for an
introduction to XML.
1.1 Intended Audience
Many Internet protocol designers are considering using XML and XML
fragments within the context of existing and new Internet protocols.
This document is intended as a guide to XML usage and as IETF policy
for standards track documents. Experienced XML practitioners will
likely already be familiar with the background material here, but the
guidelines are intended to be appropriate for those readers as well.
1.2 Scope
This document is intended to give guidelines for the use of XML
content within a larger protocol. The goal is not to suggest that
XML is the "best" or "preferred" way to represent data; rather, the
goal is to lay out the context for the use of XML within a protocol once
other factors point to XML as a possible data representation
solution.
There are a number of protocol frameworks already in use or under
development which focus entirely on "XML protocol": the exclusive use
of XML as the data representation in the protocol. For example, the
World Wide Web Consortium (W3C) is developing an XML Protocol
framework [41] based on the Simple Object Access Protocol (SOAP)
[42]. The applicability of such protocols is not part of the scope
of this document.
In addition, there are higher-level representation frameworks, based
on XML, that have been designed as carriers of certain classes of
information; for example, the Resource Description Framework (RDF)
[36] is an XML-based representation for logical assertions. This
document does not provide guidelines for the use of such frameworks.
1.3 XML Evolution
Originally published in February 1998 [35], XML's popularity has led
to several additions to the base specification. Although these
additions are designed to be consistent with version 1.0 of XML, they
have varying levels of stability, consensus, and implementation.
Accordingly, this document identifies the major evolutionary features
of XML and makes suggestions as to the circumstances in which each
feature should be used.
1.4 XML Users, Support Groups, and Additional Information
There are many XML support groups, some devoted to the entire XML
industry (e.g., http://xml.org/), some devoted to developers
(http://xmlhack.com/), some devoted to the business applications of XML
(e.g., http://oasis-open.org/), and many, many groups devoted to the
use of XML in a particular context.
It is beyond the scope of this document to provide a comprehensive
list of referrals. Interested readers are directed to the three
links above as starting points, as well as their favorite Internet
search engine.
2. XML Selection Considerations
XML is a tool that provides a means towards an end. Choosing the
right tool for a given task is an essential part of ensuring that the
task can be completed in a satisfactory manner. This section
describes factors to be aware of when considering XML as a tool for
use in IETF protocols:
o XML is a meta-markup language that can be used to define markup
languages for specific domains and problem spaces.
o XML provides both logical structure and physical structure to
describe data. Data framing is built-in.
o XML includes features to support internationalization and
localization.
o XML is extensible. New tags (and thus new protocol elements) can
be defined without requiring changes to XML itself.
o XML is still evolving. The formal specifications are still being
influenced and updated as use experience is gained and applied.
o XML is text-based, so XML fragments are easily created, edited,
and managed using common utilities. Further, being text-based
means it more readily supports incremental development, debugging,
and logging. A simple "canned" XML fragment can be embedded
within a program as a string constant, rather than constructed.
o Binary data has to be encoded into a text-based form to be
represented in XML.
o XML is verbose when compared with many other structured data
representation languages. A representation with element
extensibility and human readability typically requires more bits
when compared to one optimized for efficient machine processing.
o XML implementations are still relatively new. As designers and
implementers gain experience, it is not uncommon to find defects
in early and current products.
o XML support is available in a large number of software development
utilities, available in both open source and proprietary products.
o XML processing speed can be an issue in some environments. XML
processing can be slower because XML data streams may be larger
than other representations, and the use of general purpose XML
parsers will add a software layer with its own performance costs
(though these costs can be reduced through consistent use of an
optimized parser). Further, processing XML requires scanning the
entire XML data stream; in some situations, this is the primary
overhead.
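The verbosity trade-off noted in the list above can be illustrated with a minimal sketch. The record layout, field names, and values here are hypothetical and chosen only for illustration; the sketch packs the same two integers as a fixed binary block and as an XML fragment, then compares the sizes.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: a 16-bit port number and a 32-bit sequence counter.
port, sequence = 8080, 123456

# Fixed binary layout: network byte order, unsigned short + unsigned int.
binary_form = struct.pack("!HI", port, sequence)

# The same data as an XML fragment with self-describing element names.
record = ET.Element("record")
ET.SubElement(record, "port").text = str(port)
ET.SubElement(record, "sequence").text = str(sequence)
xml_form = ET.tostring(record)

# The XML form is several times larger, but it can be read, logged,
# and extended with new elements without changing the framing.
print(len(binary_form), len(xml_form))
```

The binary form is compact but needs an external description of its layout; the XML form carries its own structure at the cost of extra octets.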
3. XML Alternatives
This document focuses on guidelines for the use of XML. It is useful
to consider why one might use XML as opposed to some other mechanism.
This section considers some other commonly used representation
mechanisms and compares XML to those alternatives.
For many fundamental protocols, the extensibility requirements are
modest, and the performance requirements are high enough that fixed
binary data blocks are the appropriate representation; mechanisms
such as XML merely add bloat [25].
In addition, there are other representation and extensibility
frameworks that have been used successfully within communication
protocols. For example, Abstract Syntax Notation 1 (ASN.1) [29]
along with the corresponding Basic Encoding Rules (BER) [30] are part
of the OSI communication protocol suite, and have been used in many
subsequent communications standards (e.g., the ANSI Information
Retrieval protocol [28] and the Simple Network Management Protocol
(SNMP) [15]). The External Data Representation (XDR) [16] and
variations of it have been used in many other distributed network
applications (e.g., the Network File System protocol [24]). With
ASN.1, data types are explicit in the representation, while with XDR,
the data types of components are described externally as part of an
interface specification.
Many other protocols use data structures directly (without data
encapsulation) by describing the data structure with Backus Normal
Form (BNF) [26]; many IETF protocols use an Augmented Backus-Naur
Form (ABNF) [18]. The Simple Mail Transfer Protocol [23] is an
example of a protocol specified using ABNF.
Representation methods differ from XML in several important ways:
Specification encoding: XML schemas are themselves represented in XML,
and the specification itself can be written using arbitrary
characters from the language. The specification of representations
in other systems (ASN.1, XDR, ABNF) is generally in ASCII [27] text.
Text Encoding and character sets: the character encoding used to
represent a formal specification. XML defines a consistent character
model based on ISO 10646 [32], with a base that supports at least
UTF-8 [4] and UTF-16 [22], and allows for other encodings. While
ASN.1 and XDR may carry strings in any encoding, there is no common
mechanism for defining character encodings within them. Typically,
ABNF definitions tend to be defined in terms of octets or characters
in ASCII.
Data Encoding: XML is based on a character model. XML Schema [11]
includes mechanisms for representing some datatypes (integer, date,
array, etc.) but other binary datatypes are encoded in Base64 [17].
ASN.1 and XDR have rich mechanisms for encoding a wide variety of
datatypes.
Extensibility: XML has a rich extensibility model: XML
representations can frequently be versioned independently. Many XML
representations can be extended by adding new element names and
attributes (if done compatibly); other extensions can be added by
defining new XML namespaces [9], though there is no standard
mechanism in XML for indicating whether or not new extensions are
mandatory to recognize. ASN.1 is similarly extensible through the
use of Object Identifiers (OIDs). XDR representations tend to not be
independently extensible by different parties because the framing and
datatypes are implicit and not self-describing. The extensibility of
BNF-based protocol elements needs to be explicitly planned.
Legibility of protocol elements: As noted above, XML is text-based,
and thus carries the advantages (and disadvantages) of text-based
protocol elements. Typically this is shared with (A)BNF-defined
protocol elements. ASN.1 and XDR use binary encodings, which are not
directly human-readable.
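As a minimal sketch of the "Data Encoding" point above, the following shows arbitrary octets carried in an XML element as Base64 text and recovered on the receiving side. The element name "blob" and the payload are assumptions made for illustration only.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary octets that are not valid character data on their own.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Sender: encode the octets as Base64 text inside a hypothetical element.
elem = ET.Element("blob")
elem.text = base64.b64encode(payload).decode("ascii")
wire = ET.tostring(elem)

# Receiver: parse the fragment and decode the text back into octets.
decoded = base64.b64decode(ET.fromstring(wire).text)
```

Base64 expands the data by roughly one third, which is part of the size cost of carrying binary content in a character-based format.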
ASN.1, XDR, and BNF are described here as examples of alternatives to
XML for use in IETF protocols. There are other alternatives, but a
complete enumeration of all possible alternatives is beyond the scope
of this document.
4. XML Use Considerations and Recommendations
This section notes several aspects of XML and makes recommendations
for use. Since the 1998 publication of XML version 1 [35], an
editorial second edition [8] was published in 2000; this section
refers to the second edition.
4.1 XML Declarations
An XML declaration (defined in section 2.8 of [8]) is a small header
at the beginning of an XML data stream that indicates the XML version
and the character encoding used. For example,
   <?xml version="1.0" encoding="UTF-8"?>
specifies the use of XML version 1.0 and UTF-8 character encoding.
Protocol specifications must be clear about use of XML declarations.
In some cases, the XML used is a small fragment in a larger context,
where the XML version is fixed at "1.0" and the character encoding is
known to be "UTF-8". In those cases, the XML declaration might add
extra overhead. In other cases, the XML is a larger component which
may find its way alone as an external entity body, transported as a
MIME message. In those cases, the XML declaration is an important
marker and useful for reliability and extensibility. The XML
declaration is also an important marker for character set/encoding
(see Section 5.1), if any encoding other than UTF-8 is allowed. In
general, an XML protocol element should either disallow XML
declarations ("MUST NOT be used") or require one ("MUST have"). A
design which allows but does not require an XML declaration leads to
unreliable implementations. When in doubt, require an XML
declaration.
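A receiver can enforce either policy ("MUST NOT be used" or "MUST have") with a simple check before parsing. The following is a minimal sketch, not a complete implementation; the function name is hypothetical, and it assumes an ASCII-compatible encoding such as UTF-8 with no leading byte order mark.

```python
def check_declaration(data: bytes, required: bool) -> bool:
    """Return True if the presence of an XML declaration matches policy.

    Assumes an ASCII-compatible encoding (e.g., UTF-8) and no byte order
    mark; a UTF-16 stream would need to be decoded before this check.
    """
    # An XML declaration begins with "<?xml" followed by white space.
    # Requiring the white space avoids matching processing instructions
    # whose target merely begins with "xml" (e.g., "xml-stylesheet").
    has_declaration = (
        data[:5] == b"<?xml" and data[5:6] in (b" ", b"\t", b"\r", b"\n")
    )
    return has_declaration == required
```

With required set to True the function accepts only streams that carry a declaration; with False it accepts only streams that omit one, so the optional case never arises.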
4.2 XML Processing Instructions
An XML processing instruction (defined in section 2.6 of [8]) is a
component of an XML document that signals extra "out of band"
information to the receiver; a common use of XML processing
instructions is for document applications. For example, the XML2RFC
application used to generate this document and described in [21]
supports a "table of contents" processing instruction:
   <?rfc toc="yes"?>
Again, protocol specifications must be clear about whether -- and if
so, what kind of -- XML processing instructions are allowed.
However, XML processing instructions appear to have rare
applicability to XML fragments embedded in Internet protocols, and it
is recommended that their use be explicitly disallowed ("MUST NOT
use"). In cases where XML processing instructions are allowed, the
nature of the allowable processing instructions should be specified
explicitly.
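Where a protocol disallows processing instructions, a receiver can reject them during parsing. The following minimal sketch uses the Expat bindings in the Python standard library; the exception class and function name are hypothetical. Note that the XML declaration is not reported as a processing instruction by this parser.

```python
import xml.parsers.expat


class ProcessingInstructionNotAllowed(Exception):
    """Raised when a protocol element contains a processing instruction."""


def parse_without_pis(data: bytes) -> None:
    """Parse data, failing on the first processing instruction seen."""
    parser = xml.parsers.expat.ParserCreate()

    def reject(target, pi_data):
        # Called once per processing instruction; abort the parse.
        raise ProcessingInstructionNotAllowed(target)

    parser.ProcessingInstructionHandler = reject
    # The second argument marks this as the final chunk of input.
    parser.Parse(data, True)
```

A well-formedness error still surfaces as xml.parsers.expat.ExpatError, so callers can distinguish malformed input from a policy violation.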
4.3 Well-Formedness
A well-formed XML instance is one in which all character and markup
data conforms to a specific set of structural rules defined in
section 2.1 of [8].
Character and markup data that is not well-formed is not XML; well-
formedness is the basis for syntactic compatibility with XML.
Without well-formedness, all of the advantages of using XML
disappear. For this reason, it is recommended that protocol
specifications explicitly require XML well-formedness ("MUST be well-
formed").
The IETF has a long-standing tradition of "be liberal in what you
accept" that might seem to be at odds with this recommendation.
Given that XML requires well-formedness, XML parsers are typically
intolerant of well-formedness errors. Protocol designers need to
recognize this limitation and provide specific guidelines for
recovery when malformed data is encountered.
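One possible recovery approach is to treat a well-formedness failure as a protocol-level error that is reported to the peer rather than repaired. The following minimal sketch assumes a hypothetical function name and error string format; it is not a prescribed recovery procedure.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(data: bytes):
    """Return (root, None) on success or (None, reason) if malformed."""
    try:
        return ET.fromstring(data), None
    except ET.ParseError as err:
        # The parser will not guess at the sender's intent, so report
        # the failure instead of attempting to repair the input.
        return None, f"malformed XML: {err}"
```

The reason string can then be mapped onto whatever error response the enclosing protocol defines.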
4.4 Validity and Extensibility
There are formal mechanisms for XML for defining structural and data
content constraints that constrain the identity of elements or
attributes or the values contained within them:
A "Document Type Definition" (DTD) is defined in section 2.8 of [8];
the concept came from a similar mechanism for SGML.
XML Schema (defined in [10] and [11]) provides additional features to
allow a tighter and more precise specification of allowable protocol
syntax and data type specifications.
There are also a number of other mechanisms for describing XML
instance validity; these include, for example, Schematron [44], RELAX
NG [45], and the Document Schema Definition Language [33].
There is ongoing discussion within the XML community on the use and
applicability of various constraint mechanisms. The choice of tool
depends on the needs for extensibility or for a formal language and
mechanism for constraining permissible values and validating
adherence to the constraints. An Internet protocol that uses XML
must choose whether or not to describe "valid" XML protocol elements
using an appropriate validity mechanism, and whether and how to
require validity. Many protocols have successfully used the DTD
mechanism for describing validity, whether or not they insist that
all XML elements are valid. However, the features in XML Schema for
data typing and constraining values seem very appropriate for many of
the uses of XML.
This document recommends that, in the absence of reasons to choose
some other mechanism, protocol designs use W3C XML Schema as the
language for describing validity. Note, though, that there is still
some controversy within the XML community relating to validity and
XML Schema; the other mechanisms described above have largely been
developed as a result of the ongoing debate.
Whether protocol definitions also require the corresponding protocol
elements be valid according to the schema depends to some degree on
the extensibility design; for example, if the protocol has its own
versioning mechanism, way of updating the schema, or pointing to a
new one. The use of XML namespaces (Section 4.5) allows other kinds
of extensibility without compromising schema validity.
Whatever formalism is chosen, there are often additional constraints
that cannot be expressed in that formalism. These additional
requirements should be clearly called out in the specification.
Ideally, a process model might first check for well-formedness; if
OK, apply the primary formalism and, if the instance "passes", apply
the other constraints so that the entire set (or as much as is machine
processable) can be checked at the same time.
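The process model described above can be sketched as a staged check. The Python standard library has no XML Schema validator, so schema_check below is a hand-written stand-in for the primary formalism; the "range" vocabulary and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_instance(data, schema_check, extra_checks):
    """Return a list of problems; an empty list means the instance passed."""
    # Stage 1: well-formedness.
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # Stage 2: the primary formalism (stand-in for a schema validator).
    problems = list(schema_check(root))
    if problems:
        return problems
    # Stage 3: constraints the formalism cannot express, all reported together.
    for check in extra_checks:
        problems.extend(check(root))
    return problems


def schema_check(root):
    """Stand-in structural check: <range> with integer <low> and <high>."""
    if root.tag != "range":
        yield "root element must be <range>"
    for name in ("low", "high"):
        child = root.find(name)
        if child is None:
            yield f"missing <{name}>"
        elif not (child.text or "").strip().isdigit():
            yield f"<{name}> must be a non-negative integer"


def low_not_above_high(root):
    """Cross-field constraint of the kind a schema often cannot express."""
    if int(root.findtext("low")) > int(root.findtext("high")):
        yield "low exceeds high"
```

Running the later stages only after the earlier ones pass lets each additional check assume the structure it depends on is present.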
4.5 Namespaces
XML namespaces, defined in [9], provide a means of assigning markup
to a specific vocabulary. If two elements or attributes from
different vocabularies have the same name, they can be distinguished
unambiguously if they belong to different namespaces. Additionally,
namespaces provide significant support for protocol extensibility as
they can be defined, reused, and processed dynamically.
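As a minimal sketch of this disambiguation, the following parses a fragment in which two vocabularies both define a "title" element. The namespace URIs, prefixes, and element names are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a "title" element.
doc = b"""<order xmlns:book="http://example.com/ns/book"
       xmlns:person="http://example.com/ns/person">
  <book:title>XML Basics</book:title>
  <person:title>Dr.</person:title>
</order>"""

root = ET.fromstring(doc)

# ElementTree identifies each element by namespace name plus local name,
# written as "{namespace-name}local-name", so the two cannot collide.
book_title = root.findtext("{http://example.com/ns/book}title")
person_title = root.findtext("{http://example.com/ns/person}title")
```

The prefixes "book" and "person" are only local shorthand; it is the namespace name (the URI) that identifies the vocabulary.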
Markup vocabulary collisions are very possible when namespaces are
not used to separate and uniquely identify vocabularies. Protocol
definitions should use existing XML namespaces where appropriate.
When a new namespace is needed, the "namespace name" is a URI that is
used to identify the namespace; it's also useful for that URI to
point to a description of the namespace. Typical practice (and
recommended practice in the W3C) is to assign namespace names using
persistent http URIs.
In the case of namespaces in IETF standards-track documents, it would
be useful if there were some permanent part of the IETF's own web
space that could be used for this purpose. In lieu of such, other
permanent URIs can be used, e.g., URNs in the IETF URN namespace (see
[13] and [14]).
4.5.1 Namespaces and Attributes
There is a frequently misunderstood aspect of the relationship
between unprefixed attributes and the default XML namespace: the
natural assumption is that an unprefixed attribute is qualified by
the default namespace, but this is not true. Rather, the unprefixed
attribute belongs to a set of attributes that are defined
specifically for the element to which it is applied. Thus, in the
following: