< draft-ietf-trade-hiroshi-dom-hash-02.txt   draft-ietf-trade-hiroshi-dom-hash-03.txt >
Network Working Group Hirosi Maruyama Network Working Group Hirosi Maruyama
INTERNET-DRAFT Kent Tamura INTERNET-DRAFT Kent Tamura
February 1999 Naohiko Uramoto September 1999 Naohiko Uramoto
Expires: March 2000 IBM Expires: March 2000 IBM
draft-ietf-trade-hiroshi-dom-hash-02.txt draft-ietf-trade-hiroshi-dom-hash-03.txt
Digest Values for DOM (DOMHASH) Digest Values for DOM (DOMHASH)
------ ------ --- --- --------- ------ ------ --- --- ---------
Hiroshi Maruyama Hiroshi Maruyama
Kent Tamura Kent Tamura
Naohiko Uramoto Naohiko Uramoto
Status of This Document Status of This Document
This draft, file name draft-ietf-trade-hiroshi-dom-hash-02.txt, is intended to This draft, file name draft-ietf-trade-hiroshi-dom-hash-03.txt, is
be become an Informational RFC. Distribution of this document is intended to be become an Informational RFC. Distribution of this
unlimited. Comments should be sent to the authors. document is unlimited. Comments should be sent to the authors or the
IETF TRADE working group mailing list <ietf-trade@elistx.com>.
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet-Drafts are working all provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet- other documents at any time. It is not appropriate to use Internet-
skipping to change at page 2, line 15 skipping to change at page 2, line 8
Abstract Abstract
This internet draft defines a clear and unambiguous definition of This internet draft defines a clear and unambiguous definition of
digest (hash) values of the XML objects regardless of the surface digest (hash) values of the XML objects regardless of the surface
string variation of XML. This definition can be used for XML digital string variation of XML. This definition can be used for XML digital
signature as well efficient replication of XML objects. signature as well efficient replication of XML objects.
Table of Contents Table of Contents
Status of This Document....................................1 Status of This Document....................................1
Abstract...................................................1
Abstract...................................................2
Table of Contents..........................................2 Table of Contents..........................................2
1. Introduction............................................3 1. Introduction............................................3
2. Digest Calculation......................................4 2. Digest Calculation......................................4
2.1. Overview..............................................4 2.1. Overview..............................................4
2.2. Namespace Considerations..............................5 2.2. Namespace Considerations..............................5
2.3. Definition with Code Fragments........................6 2.3. Definition with Code Fragments........................6
2.3.1. Text Nodes..........................................6 2.3.1. Text Nodes..........................................6
2.3.2. ProcessingInstruction Nodes.........................7 2.3.2. ProcessingInstruction Nodes.........................7
2.3.3. Attr Nodes..........................................7 2.3.3. Attr Nodes..........................................8
2.3.4. Element Nodes.......................................8 2.3.4. Element Nodes.......................................8
3. Discussion..............................................9 3. Discussion.............................................10
4. Security Considerations.................................9 4. Security Considerations................................10
References................................................11 References................................................11
Author's Address..........................................11 Author's Address..........................................11
Expiration and File Name..................................11 Expiration and File Name..................................11
1. Introduction 1. Introduction
The purpose of this document is to give a clear and unambiguous The purpose of this document is to give a clear and unambiguous
definition of digest (hash) values of the XML objects [XML]. Two definition of digest (hash) values of the XML objects [XML]. Two
subtrees are considered identical if their hash values are the same, subtrees are considered identical if their hash values are the same,
skipping to change at page 4, line 43 skipping to change at page 4, line 43
2. Attr 2. Attr
3. ProcessingInstruction 3. ProcessingInstruction
4. Text 4. Text
Comment nodes and Document Type Definitions (DTDs) do not participate Comment nodes and Document Type Definitions (DTDs) do not participate
in the digest value calculation. This is because DOM does not in the digest value calculation. This is because DOM does not
require a conformant processor to create data structures for these. require a conformant processor to create data structures for these.
DOMHash is designed so that it can be computed with any XML processor DOMHash is designed so that it can be computed with any XML processor
conformant to the DOM or SAX [SAX] specification. conformant to the DOM or SAX [SAX] specification.
Nodes with the node type EntityReference must be expanded prior to
digest calculation.
The digest values are defined recursively on each level of the DOM The digest values are defined recursively on each level of the DOM
tree so that only a relavant part needs to be recalculated when a tree so that only a relevant part needs to be recalculated when a
small portion of the tree is changed. small portion of the tree is changed.
Below, we give the precise definitions of digest for these types. We Below, we give the precise definitions of digest for these types. We
describe the format of the data to be supplied to a hash algorithm describe the format of the data to be supplied to a hash algorithm
using a figure and a simple description, followed by a Java code using a figure and a simple description, followed by a Java code
fragment using the DOM API and the JDK 1.1 Platform Core API only. fragment using the DOM API and the JDK 1.1 Platform Core API only.
Therefore, the semantics should be unambiguous. Therefore, the semantics should be unambiguous.
As the rule of thumb, all strings are to be in UTF-16 in the network As the rule of thumb, all strings are to be in UTF-16 in the network
byte order (Big Endian) with no byte order mark. If there is a byte order (Big Endian) with no byte order mark. If there is a
sequence of text nodes without any element nodes inbetween, these sequence of text nodes without any element nodes in between, these
text nodes are merged into one by concatenating them. A zero-length text nodes are merged into one by concatenating them. A zero-length
text node is always ignored. text node is always ignored.
Note that validating and non-validating XML processors may generate
different DOM trees from the same XML document, due to attribute
normalization and default attributes. If DOMHash is to be used for
testing logical equivalence between two XML documents (as opposed to
DOM trees), it may be necessary to normalize attributes and supply
default attributes prior to DOMHash calculation.
Some legacy character encodings (such as ISO-2022-JP) have certain
ambiguity in translating into Unicode. This is again dependent on
XML processors. Treatment of such processor dependencies is out of
scope of this document.
2.2. Namespace Considerations 2.2. Namespace Considerations
To avoid the dependence on the namespace prefix, we use "expanded To avoid the dependence on the namespace prefix, we use "expanded
names" to do digest calculation. If an element name or an attribute names" to do digest calculation. If an element name or an attribute
name is qualified either by a explicit namespace prefix or by a name is qualified either by a explicit namespace prefix or by a
default namespace, the name's LocalPart is prepended by the URI of default namespace, the name's LocalPart is prepended by the URI of
the namespace (the namespace name as defined in the NameSpace the namespace (the namespace name as defined in the NameSpace
specification [NAM]) and a colon before digest calculation. In the specification [NAM]) and a colon before digest calculation. In the
following example, the default qualified name "order" is expanded following example, the default qualified name "order" is expanded
into "http://ecommerce.org/schema:order" while the explicit qualified into "http://ecommerce.org/schema:order" while the explicit qualified
name "book:title" is exapanded into "urn:loc.gov:books:title" before name "book:title" is expanded into "urn:loc.gov:books:title" before
digest calculation. digest calculation.
<?xml version="1.0"?> <?xml version="1.0"?>
<root xmlns='http://ecommerce.org/schema' <root xmlns='http://ecommerce.org/schema'
xmlns:book='urn:loc.gov:books'> xmlns:book='urn:loc.gov:books'>
<order> <order>
<book:title> ... </book:title> <book:title> ... </book:title>
: :
</name> </name>
</root> </root>
We define an expanded name (either for element or attirbute) as We define an expanded name (either for element or attribute) as
follows: follows:
If a name is not qualified, the exapanded name is the name If a name is not qualified, the expanded name is the name
itself. itself.
If a name is qualified with the prefix "xmlns", the expanded If a name is qualified with the prefix "xmlns", the expanded
name is undefined. name is undefined.
If a name is qualified either by default or by an explicit If a name is qualified either by default or by an explicit
namespace prefix, the expanded name is URI bound to the namespace prefix, the expanded name is URI bound to the
namespace + ":" + LocalPart namespace + ":" + LocalPart
In the following example code, for succinctness we assume that the In the following example code, we assume that the getExpandedName()
getExpandedName() method (which returns the expanded name as defined method (which returns the expanded name as defined above) is defined
above) is defined in both Element and Attr interfaces of DOM. in both Element and Attr interfaces of DOM.
Note that the digest values are not defined on namespace Note that the digest values are not defined on namespace
declarations. In other words, the digest value is not defined for an declarations. In other words, the digest value is not defined for an
attribute when attribute when
- the attribute name is "xmlns", or - the attribute name is "xmlns", or
- the namespace prefix is "xmlns". - the namespace prefix is "xmlns".
In the above example, the two attributes which are namespace In the above example, the two attributes which are namespace
declarations do not have digest values and therefore will not declarations do not have digest values and therefore will not
participate in the calculation of the digest value of the "root" participate in the calculation of the digest value of the "root"
element. element.
2.3. Definition with Code Fragments 2.3. Definition with Code Fragments
The code fragments in the definitions below assume that they are in The code fragments in the definitions below assume that they are in
implementation classes of Node. Therefore, a methods call without an implementation classes of Node. Therefore, a methods call without an
explicit object reference is for the Node itself. For example, explicit object reference is for the Node itself. For example,
getData() returns the text data of the current node if it is a Text getData() returns the text data of the current node if it is a Text
node. The parameter digestAlgorithm is to be replaced by an node. The parameter digestAlgorithm is to be replaced by an
identifier of the digest algorithm, such as "MD5" [MD5] and "SHA-1". identifier of the digest algorithm, such as "MD5" [MD5] and "SHA-1"
[SHA].
The computation should begin with a four byte integer that represents The computation should begin with a four byte integer that represents
the type of the node, such as Node.TEXT_NODE or Node.ELEMENT_NODE. the type of the node, such as TEXT_NODE or ELEMENT_NODE.
2.3.1. Text Nodes 2.3.1. Text Nodes
The hash value of a Text node is computed on the four byte header The hash value of a Text node is computed on the four byte header
followed by the UTF-16 encoded text string. followed by the UTF-16 encoded text string.
- TEXT_NODE (3) in 32 bit network-byte-ordered integer - TEXT_NODE (3) in 32 bit network-byte-ordered integer
- Text data in UTF-16 stream (variable length) - Text data in UTF-16 stream (variable length)
public byte[] getDigest(String digestAlgorithm) { public byte[] getDigest(String digestAlgorithm) {
skipping to change at page 7, line 7 skipping to change at page 7, line 23
md.update((byte)3); md.update((byte)3);
md.update(getData().getBytes("UnicodeBigUnmarked")); md.update(getData().getBytes("UnicodeBigUnmarked"));
return md.digest(); return md.digest();
} }
Here, MessageDigest is in the package java.security.*, one of the Here, MessageDigest is in the package java.security.*, one of the
built-in packages of JDK 1.1. built-in packages of JDK 1.1.
2.3.2. ProcessingInstruction Nodes 2.3.2. ProcessingInstruction Nodes
A ProcessingIinstruction (PI) node has two components: the target and A ProcessingInstruction (PI) node has two components: the target and
the data. Accordingly, the hash is computed on the concatenation of the data. Accordingly, the hash is computed on the concatenation of
both, separated by 'x0000'. PI data is from the first non white space both, separated by 'x0000'. PI data is from the first non white space
character after the target to the character immediately preceding the character after the target to the character immediately preceding the
"?>". "?>".
- PROCESSING_INSTRUCTION_NODE (7) in 32 bit network-byte-ordered - PROCESSING_INSTRUCTION_NODE (7) in 32 bit network-byte-ordered
integer integer
- PI target in UTF-16 stream (variable length) - PI target in UTF-16 stream (variable length)
- 0x00 0x00 - 0x00 0x00
skipping to change at page 8, line 34 skipping to change at page 8, line 52
calculate the node's digest so that we can save computation when the calculate the node's digest so that we can save computation when the
structure is partially changed. structure is partially changed.
First, all the attributes except for namespace declarations must be First, all the attributes except for namespace declarations must be
collected. This list is sorted by the expanded attribute names. The collected. This list is sorted by the expanded attribute names. The
sorting is done in ascending order in terms of the UTF-16 encoded sorting is done in ascending order in terms of the UTF-16 encoded
expanded attribute names, using the string comparison operator expanded attribute names, using the string comparison operator
defined as String#compareTo() in Java. The semantics of this sorting defined as String#compareTo() in Java. The semantics of this sorting
operation should be clear. operation should be clear.
- nELEMENT_NODE (1) in 32 bit network-byte-ordered integer - ELEMENT_NODE (1) in 32 bit network-byte-ordered integer
- Expanded element name in UTF-16 stream (variable length) - Expanded element name in UTF-16 stream (variable length)
- 0x00 0x00 - 0x00 0x00
- A number of non-namespace-declaration attributes in 32 bit - A number of non-namespace-declaration attributes in 32 bit
network-byte-ordered unsigned integer network-byte-ordered unsigned integer
- Sequence of digest values of non-namespace-declaration attributes, - Sequence of digest values of non-namespace-declaration attributes,
sorted by String#compareTo() for attribute names sorted by String#compareTo() for attribute names
- A number of child nodes (except for Comment nodes) in 32bit - A number of child nodes (except for Comment nodes) in 32bit
network-byte-ordered unsigned integer network-byte-ordered unsigned integer
- Sequence of digest values of each child node except for Comment - Sequence of digest values of each child node except for Comment
nodes (variable length) (A sequence of child texts is merged to one nodes (variable length) (A sequence of child texts is merged to one
skipping to change at page 9, line 4 skipping to change at page 9, line 15
- 0x00 0x00 - 0x00 0x00
- A number of non-namespace-declaration attributes in 32 bit - A number of non-namespace-declaration attributes in 32 bit
network-byte-ordered unsigned integer network-byte-ordered unsigned integer
- Sequence of digest values of non-namespace-declaration attributes, - Sequence of digest values of non-namespace-declaration attributes,
sorted by String#compareTo() for attribute names sorted by String#compareTo() for attribute names
- A number of child nodes (except for Comment nodes) in 32bit - A number of child nodes (except for Comment nodes) in 32bit
network-byte-ordered unsigned integer network-byte-ordered unsigned integer
- Sequence of digest values of each child node except for Comment - Sequence of digest values of each child node except for Comment
nodes (variable length) (A sequence of child texts is merged to one nodes (variable length) (A sequence of child texts is merged to one
text. A zero-length text and Comment nodes are not counted as child) text. A zero-length text and Comment nodes are not counted as child)
public byte[] getDigest(String digestAlgorithm) { public byte[] getDigest(String digestAlgorithm) {
MessageDigest md = MessageDigest.getInstance(digestAlgorithm); MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
ByteArrayOutputStream baos = new ByteArrayOutputStream(); ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos); DataOutputStream dos = new DataOutputStream(baos);
dos.writeInt((int)1);//This is stored in network byte order dos.writeInt(ELEMENT_NODE);//This is stored in network byte order
dos.write(getExpandedName().getBytes("UnicodeBigUnmarked")); dos.write(getExpandedName().getBytes("UnicodeBigUnmarked"));
dos.write((byte)0); dos.write((byte)0);
dos.write((byte)0); dos.write((byte)0);
// Collect all attributes except for namespace declarations // Collect all attributes except for namespace declarations
NamedNodeMap nnm = this.getAttributes(); NamedNodeMap nnm = this.getAttributes();
int len = nnm.getLength() int len = nnm.getLength()
// Find "xmlns" or "xmlns:foo" in nnm and omit it. // Find "xmlns" or "xmlns:foo" in nnm and omit it.
... ...
dos.writeInt(len); // This is sorted in the network byte order dos.writeInt(len); // This is sorted in the network byte order
// Sort attributes by String#compareTo() on expanded attribute names. // Sort attributes by String#compareTo() on expanded attribute names.
skipping to change at page 11, line 19 skipping to change at page 11, line 19
[MD5] - RFC 1321 - R. Rivest, "The MD5 Message-Digest Algorithm", [MD5] - RFC 1321 - R. Rivest, "The MD5 Message-Digest Algorithm",
April 1992. April 1992.
[NAM] - Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in XML", [NAM] - Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in XML",
http://www.w3.org/TR/1999/REC-xml-names-19990114. http://www.w3.org/TR/1999/REC-xml-names-19990114.
[SAX] - David Megginson, "SAX 1.0: The Simple API for XML", [SAX] - David Megginson, "SAX 1.0: The Simple API for XML",
http://www.megginson.com/SAX/, May 1998. http://www.megginson.com/SAX/, May 1998.
[SHA] - (US) National Institute of Standards and Technology, "Federal
Information Processing Standards Publication 180-1: Secure Hash
Standard", 17 April 1995.
[URI] - RFC 2396 - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform [URI] - RFC 2396 - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", August 1998. Resource Identifiers (URI): Generic Syntax", August 1998.
[XML] - Tim Bray, Jean Paoli, C. M. Sperber-McQueen, "Extensible [XML] - Tim Bray, Jean Paoli, C. M. Sperber-McQueen, "Extensible
Markup Language (XML) 1.0", http://www.w3.org/TR/1998/REC-xml- Markup Language (XML) 1.0", http://www.w3.org/TR/1998/REC-xml-
19980210 19980210
Author's Address Author's Address
Hiroshi Maruyama, Hiroshi Maruyama,
skipping to change at page 11, line 44 skipping to change at page 11, line 48
email: kent @ trl.ibm.co.jp email: kent @ trl.ibm.co.jp
Naohiko Uramoto, Naohiko Uramoto,
IBM Research, Tokyo Research Laboratory IBM Research, Tokyo Research Laboratory
email: uramoto @ jp.ibm.com email: uramoto @ jp.ibm.com
Expiration and File Name Expiration and File Name
This draft expires March 2000. This draft expires March 2000.
Its file name is draft-hiroshi-dom-hash-02.txt. Its file name is draft-ietf-trade-hiroshi-dom-hash-03.txt.
 End of changes. 23 change blocks. 
22 lines changed or deleted 44 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/