| < draft-ietf-trade-hiroshi-dom-hash-02.txt | draft-ietf-trade-hiroshi-dom-hash-03.txt > | |||
|---|---|---|---|---|
| Network Working Group Hirosi Maruyama | Network Working Group Hirosi Maruyama | |||
| INTERNET-DRAFT Kent Tamura | INTERNET-DRAFT Kent Tamura | |||
| February 1999 Naohiko Uramoto | September 1999 Naohiko Uramoto | |||
| Expires: March 2000 IBM | Expires: March 2000 IBM | |||
| draft-ietf-trade-hiroshi-dom-hash-02.txt | draft-ietf-trade-hiroshi-dom-hash-03.txt | |||
| Digest Values for DOM (DOMHASH) | Digest Values for DOM (DOMHASH) | |||
| ------ ------ --- --- --------- | ------ ------ --- --- --------- | |||
| Hiroshi Maruyama | Hiroshi Maruyama | |||
| Kent Tamura | Kent Tamura | |||
| Naohiko Uramoto | Naohiko Uramoto | |||
| Status of This Document | Status of This Document | |||
| This draft, file name draft-ietf-trade-hiroshi-dom-hash-02.txt, is intended to | This draft, file name draft-ietf-trade-hiroshi-dom-hash-03.txt, is | |||
| be become an Informational RFC. Distribution of this document is | intended to be become an Informational RFC. Distribution of this | |||
| unlimited. Comments should be sent to the authors. | document is unlimited. Comments should be sent to the authors or the | |||
| IETF TRADE working group mailing list <ietf-trade@elistx.com>. | ||||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. Internet-Drafts are working | all provisions of Section 10 of RFC2026. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
| months. Internet-Drafts may be updated, replaced, or obsoleted by | months. Internet-Drafts may be updated, replaced, or obsoleted by | |||
| other documents at any time. It is not appropriate to use Internet- | other documents at any time. It is not appropriate to use Internet- | |||
| skipping to change at page 2, line 15 ¶ | skipping to change at page 2, line 8 ¶ | |||
| Abstract | Abstract | |||
| This internet draft defines a clear and unambiguous definition of | This internet draft defines a clear and unambiguous definition of | |||
| digest (hash) values of the XML objects regardless of the surface | digest (hash) values of the XML objects regardless of the surface | |||
| string variation of XML. This definition can be used for XML digital | string variation of XML. This definition can be used for XML digital | |||
| signature as well efficient replication of XML objects. | signature as well efficient replication of XML objects. | |||
| Table of Contents | Table of Contents | |||
| Status of This Document....................................1 | Status of This Document....................................1 | |||
| Abstract...................................................1 | ||||
| Abstract...................................................2 | ||||
| Table of Contents..........................................2 | Table of Contents..........................................2 | |||
| 1. Introduction............................................3 | 1. Introduction............................................3 | |||
| 2. Digest Calculation......................................4 | 2. Digest Calculation......................................4 | |||
| 2.1. Overview..............................................4 | 2.1. Overview..............................................4 | |||
| 2.2. Namespace Considerations..............................5 | 2.2. Namespace Considerations..............................5 | |||
| 2.3. Definition with Code Fragments........................6 | 2.3. Definition with Code Fragments........................6 | |||
| 2.3.1. Text Nodes..........................................6 | 2.3.1. Text Nodes..........................................6 | |||
| 2.3.2. ProcessingInstruction Nodes.........................7 | 2.3.2. ProcessingInstruction Nodes.........................7 | |||
| 2.3.3. Attr Nodes..........................................7 | 2.3.3. Attr Nodes..........................................8 | |||
| 2.3.4. Element Nodes.......................................8 | 2.3.4. Element Nodes.......................................8 | |||
| 3. Discussion..............................................9 | 3. Discussion.............................................10 | |||
| 4. Security Considerations.................................9 | 4. Security Considerations................................10 | |||
| References................................................11 | References................................................11 | |||
| Author's Address..........................................11 | Author's Address..........................................11 | |||
| Expiration and File Name..................................11 | Expiration and File Name..................................11 | |||
| 1. Introduction | 1. Introduction | |||
| The purpose of this document is to give a clear and unambiguous | The purpose of this document is to give a clear and unambiguous | |||
| definition of digest (hash) values of the XML objects [XML]. Two | definition of digest (hash) values of the XML objects [XML]. Two | |||
| subtrees are considered identical if their hash values are the same, | subtrees are considered identical if their hash values are the same, | |||
| skipping to change at page 4, line 43 ¶ | skipping to change at page 4, line 43 ¶ | |||
| 2. Attr | 2. Attr | |||
| 3. ProcessingInstruction | 3. ProcessingInstruction | |||
| 4. Text | 4. Text | |||
| Comment nodes and Document Type Definitions (DTDs) do not participate | Comment nodes and Document Type Definitions (DTDs) do not participate | |||
| in the digest value calculation. This is because DOM does not | in the digest value calculation. This is because DOM does not | |||
| require a conformant processor to create data structures for these. | require a conformant processor to create data structures for these. | |||
| DOMHash is designed so that it can be computed with any XML processor | DOMHash is designed so that it can be computed with any XML processor | |||
| conformant to the DOM or SAX [SAX] specification. | conformant to the DOM or SAX [SAX] specification. | |||
| Nodes with the node type EntityReference must be expanded prior to | ||||
| digest calculation. | ||||
| The digest values are defined recursively on each level of the DOM | The digest values are defined recursively on each level of the DOM | |||
| tree so that only a relavant part needs to be recalculated when a | tree so that only a relevant part needs to be recalculated when a | |||
| small portion of the tree is changed. | small portion of the tree is changed. | |||
| Below, we give the precise definitions of digest for these types. We | Below, we give the precise definitions of digest for these types. We | |||
| describe the format of the data to be supplied to a hash algorithm | describe the format of the data to be supplied to a hash algorithm | |||
| using a figure and a simple description, followed by a Java code | using a figure and a simple description, followed by a Java code | |||
| fragment using the DOM API and the JDK 1.1 Platform Core API only. | fragment using the DOM API and the JDK 1.1 Platform Core API only. | |||
| Therefore, the semantics should be unambiguous. | Therefore, the semantics should be unambiguous. | |||
| As the rule of thumb, all strings are to be in UTF-16 in the network | As the rule of thumb, all strings are to be in UTF-16 in the network | |||
| byte order (Big Endian) with no byte order mark. If there is a | byte order (Big Endian) with no byte order mark. If there is a | |||
| sequence of text nodes without any element nodes inbetween, these | sequence of text nodes without any element nodes in between, these | |||
| text nodes are merged into one by concatenating them. A zero-length | text nodes are merged into one by concatenating them. A zero-length | |||
| text node is always ignored. | text node is always ignored. | |||
| Note that validating and non-validating XML processors may generate | ||||
| different DOM trees from the same XML document, due to attribute | ||||
| normalization and default attributes. If DOMHash is to be used for | ||||
| testing logical equivalence between two XML documents (as opposed to | ||||
| DOM trees), it may be necessary to normalize attributes and supply | ||||
| default attributes prior to DOMHash calculation. | ||||
| Some legacy character encodings (such as ISO-2022-JP) have certain | ||||
| ambiguity in translating into Unicode. This is again dependent on | ||||
| XML processors. Treatment of such processor dependencies is out of | ||||
| scope of this document. | ||||
| 2.2. Namespace Considerations | 2.2. Namespace Considerations | |||
| To avoid the dependence on the namespace prefix, we use "expanded | To avoid the dependence on the namespace prefix, we use "expanded | |||
| names" to do digest calculation. If an element name or an attribute | names" to do digest calculation. If an element name or an attribute | |||
| name is qualified either by a explicit namespace prefix or by a | name is qualified either by a explicit namespace prefix or by a | |||
| default namespace, the name's LocalPart is prepended by the URI of | default namespace, the name's LocalPart is prepended by the URI of | |||
| the namespace (the namespace name as defined in the NameSpace | the namespace (the namespace name as defined in the NameSpace | |||
| specification [NAM]) and a colon before digest calculation. In the | specification [NAM]) and a colon before digest calculation. In the | |||
| following example, the default qualified name "order" is expanded | following example, the default qualified name "order" is expanded | |||
| into "http://ecommerce.org/schema:order" while the explicit qualified | into "http://ecommerce.org/schema:order" while the explicit qualified | |||
| name "book:title" is exapanded into "urn:loc.gov:books:title" before | name "book:title" is expanded into "urn:loc.gov:books:title" before | |||
| digest calculation. | digest calculation. | |||
| <?xml version="1.0"?> | <?xml version="1.0"?> | |||
| <root xmlns='http://ecommerce.org/schema' | <root xmlns='http://ecommerce.org/schema' | |||
| xmlns:book='urn:loc.gov:books'> | xmlns:book='urn:loc.gov:books'> | |||
| <order> | <order> | |||
| <book:title> ... </book:title> | <book:title> ... </book:title> | |||
| : | : | |||
| </name> | </name> | |||
| </root> | </root> | |||
| We define an expanded name (either for element or attirbute) as | We define an expanded name (either for element or attribute) as | |||
| follows: | follows: | |||
| If a name is not qualified, the exapanded name is the name | If a name is not qualified, the expanded name is the name | |||
| itself. | itself. | |||
| If a name is qualified with the prefix "xmlns", the expanded | If a name is qualified with the prefix "xmlns", the expanded | |||
| name is undefined. | name is undefined. | |||
| If a name is qualified either by default or by an explicit | If a name is qualified either by default or by an explicit | |||
| namespace prefix, the expanded name is URI bound to the | namespace prefix, the expanded name is URI bound to the | |||
| namespace + ":" + LocalPart | namespace + ":" + LocalPart | |||
| In the following example code, for succinctness we assume that the | In the following example code, we assume that the getExpandedName() | |||
| getExpandedName() method (which returns the expanded name as defined | method (which returns the expanded name as defined above) is defined | |||
| above) is defined in both Element and Attr interfaces of DOM. | in both Element and Attr interfaces of DOM. | |||
| Note that the digest values are not defined on namespace | Note that the digest values are not defined on namespace | |||
| declarations. In other words, the digest value is not defined for an | declarations. In other words, the digest value is not defined for an | |||
| attribute when | attribute when | |||
| - the attribute name is "xmlns", or | - the attribute name is "xmlns", or | |||
| - the namespace prefix is "xmlns". | - the namespace prefix is "xmlns". | |||
| In the above example, the two attributes which are namespace | In the above example, the two attributes which are namespace | |||
| declarations do not have digest values and therefore will not | declarations do not have digest values and therefore will not | |||
| participate in the calculation of the digest value of the "root" | participate in the calculation of the digest value of the "root" | |||
| element. | element. | |||
| 2.3. Definition with Code Fragments | 2.3. Definition with Code Fragments | |||
| The code fragments in the definitions below assume that they are in | The code fragments in the definitions below assume that they are in | |||
| implementation classes of Node. Therefore, a methods call without an | implementation classes of Node. Therefore, a methods call without an | |||
| explicit object reference is for the Node itself. For example, | explicit object reference is for the Node itself. For example, | |||
| getData() returns the text data of the current node if it is a Text | getData() returns the text data of the current node if it is a Text | |||
| node. The parameter digestAlgorithm is to be replaced by an | node. The parameter digestAlgorithm is to be replaced by an | |||
| identifier of the digest algorithm, such as "MD5" [MD5] and "SHA-1". | identifier of the digest algorithm, such as "MD5" [MD5] and "SHA-1" | |||
| [SHA]. | ||||
| The computation should begin with a four byte integer that represents | The computation should begin with a four byte integer that represents | |||
| the type of the node, such as Node.TEXT_NODE or Node.ELEMENT_NODE. | the type of the node, such as TEXT_NODE or ELEMENT_NODE. | |||
| 2.3.1. Text Nodes | 2.3.1. Text Nodes | |||
| The hash value of a Text node is computed on the four byte header | The hash value of a Text node is computed on the four byte header | |||
| followed by the UTF-16 encoded text string. | followed by the UTF-16 encoded text string. | |||
| - TEXT_NODE (3) in 32 bit network-byte-ordered integer | - TEXT_NODE (3) in 32 bit network-byte-ordered integer | |||
| - Text data in UTF-16 stream (variable length) | - Text data in UTF-16 stream (variable length) | |||
| public byte[] getDigest(String digestAlgorithm) { | public byte[] getDigest(String digestAlgorithm) { | |||
| skipping to change at page 7, line 7 ¶ | skipping to change at page 7, line 23 ¶ | |||
| md.update((byte)3); | md.update((byte)3); | |||
| md.update(getData().getBytes("UnicodeBigUnmarked")); | md.update(getData().getBytes("UnicodeBigUnmarked")); | |||
| return md.digest(); | return md.digest(); | |||
| } | } | |||
| Here, MessageDigest is in the package java.security.*, one of the | Here, MessageDigest is in the package java.security.*, one of the | |||
| built-in packages of JDK 1.1. | built-in packages of JDK 1.1. | |||
| 2.3.2. ProcessingInstruction Nodes | 2.3.2. ProcessingInstruction Nodes | |||
| A ProcessingIinstruction (PI) node has two components: the target and | A ProcessingInstruction (PI) node has two components: the target and | |||
| the data. Accordingly, the hash is computed on the concatenation of | the data. Accordingly, the hash is computed on the concatenation of | |||
| both, separated by 'x0000'. PI data is from the first non white space | both, separated by 'x0000'. PI data is from the first non white space | |||
| character after the target to the character immediately preceding the | character after the target to the character immediately preceding the | |||
| "?>". | "?>". | |||
| - PROCESSING_INSTRUCTION_NODE (7) in 32 bit network-byte-ordered | - PROCESSING_INSTRUCTION_NODE (7) in 32 bit network-byte-ordered | |||
| integer | integer | |||
| - PI target in UTF-16 stream (variable length) | - PI target in UTF-16 stream (variable length) | |||
| - 0x00 0x00 | - 0x00 0x00 | |||
| skipping to change at page 8, line 34 ¶ | skipping to change at page 8, line 52 ¶ | |||
| calculate the node's digest so that we can save computation when the | calculate the node's digest so that we can save computation when the | |||
| structure is partially changed. | structure is partially changed. | |||
| First, all the attributes except for namespace declarations must be | First, all the attributes except for namespace declarations must be | |||
| collected. This list is sorted by the expanded attribute names. The | collected. This list is sorted by the expanded attribute names. The | |||
| sorting is done in ascending order in terms of the UTF-16 encoded | sorting is done in ascending order in terms of the UTF-16 encoded | |||
| expanded attribute names, using the string comparison operator | expanded attribute names, using the string comparison operator | |||
| defined as String#compareTo() in Java. The semantics of this sorting | defined as String#compareTo() in Java. The semantics of this sorting | |||
| operation should be clear. | operation should be clear. | |||
| - nELEMENT_NODE (1) in 32 bit network-byte-ordered integer | - ELEMENT_NODE (1) in 32 bit network-byte-ordered integer | |||
| - Expanded element name in UTF-16 stream (variable length) | - Expanded element name in UTF-16 stream (variable length) | |||
| - 0x00 0x00 | - 0x00 0x00 | |||
| - A number of non-namespace-declaration attributes in 32 bit | - A number of non-namespace-declaration attributes in 32 bit | |||
| network-byte-ordered unsigned integer | network-byte-ordered unsigned integer | |||
| - Sequence of digest values of non-namespace-declaration attributes, | - Sequence of digest values of non-namespace-declaration attributes, | |||
| sorted by String#compareTo() for attribute names | sorted by String#compareTo() for attribute names | |||
| - A number of child nodes (except for Comment nodes) in 32bit | - A number of child nodes (except for Comment nodes) in 32bit | |||
| network-byte-ordered unsigned integer | network-byte-ordered unsigned integer | |||
| - Sequence of digest values of each child node except for Comment | - Sequence of digest values of each child node except for Comment | |||
| nodes (variable length) (A sequence of child texts is merged to one | nodes (variable length) (A sequence of child texts is merged to one | |||
| skipping to change at page 9, line 4 ¶ | skipping to change at page 9, line 15 ¶ | |||
| - 0x00 0x00 | - 0x00 0x00 | |||
| - A number of non-namespace-declaration attributes in 32 bit | - A number of non-namespace-declaration attributes in 32 bit | |||
| network-byte-ordered unsigned integer | network-byte-ordered unsigned integer | |||
| - Sequence of digest values of non-namespace-declaration attributes, | - Sequence of digest values of non-namespace-declaration attributes, | |||
| sorted by String#compareTo() for attribute names | sorted by String#compareTo() for attribute names | |||
| - A number of child nodes (except for Comment nodes) in 32bit | - A number of child nodes (except for Comment nodes) in 32bit | |||
| network-byte-ordered unsigned integer | network-byte-ordered unsigned integer | |||
| - Sequence of digest values of each child node except for Comment | - Sequence of digest values of each child node except for Comment | |||
| nodes (variable length) (A sequence of child texts is merged to one | nodes (variable length) (A sequence of child texts is merged to one | |||
| text. A zero-length text and Comment nodes are not counted as child) | text. A zero-length text and Comment nodes are not counted as child) | |||
| public byte[] getDigest(String digestAlgorithm) { | public byte[] getDigest(String digestAlgorithm) { | |||
| MessageDigest md = MessageDigest.getInstance(digestAlgorithm); | MessageDigest md = MessageDigest.getInstance(digestAlgorithm); | |||
| ByteArrayOutputStream baos = new ByteArrayOutputStream(); | ByteArrayOutputStream baos = new ByteArrayOutputStream(); | |||
| DataOutputStream dos = new DataOutputStream(baos); | DataOutputStream dos = new DataOutputStream(baos); | |||
| dos.writeInt((int)1);//This is stored in network byte order | dos.writeInt(ELEMENT_NODE);//This is stored in network byte order | |||
| dos.write(getExpandedName().getBytes("UnicodeBigUnmarked")); | dos.write(getExpandedName().getBytes("UnicodeBigUnmarked")); | |||
| dos.write((byte)0); | dos.write((byte)0); | |||
| dos.write((byte)0); | dos.write((byte)0); | |||
| // Collect all attributes except for namespace declarations | // Collect all attributes except for namespace declarations | |||
| NamedNodeMap nnm = this.getAttributes(); | NamedNodeMap nnm = this.getAttributes(); | |||
| int len = nnm.getLength() | int len = nnm.getLength() | |||
| // Find "xmlns" or "xmlns:foo" in nnm and omit it. | // Find "xmlns" or "xmlns:foo" in nnm and omit it. | |||
| ... | ... | |||
| dos.writeInt(len); // This is sorted in the network byte order | dos.writeInt(len); // This is sorted in the network byte order | |||
| // Sort attributes by String#compareTo() on expanded attribute names. | // Sort attributes by String#compareTo() on expanded attribute names. | |||
| skipping to change at page 11, line 19 ¶ | skipping to change at page 11, line 19 ¶ | |||
| [MD5] - RFC 1321 - R. Rivest, "The MD5 Message-Digest Algorithm", | [MD5] - RFC 1321 - R. Rivest, "The MD5 Message-Digest Algorithm", | |||
| April 1992. | April 1992. | |||
| [NAM] - Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in XML", | [NAM] - Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in XML", | |||
| http://www.w3.org/TR/1999/REC-xml-names-19990114. | http://www.w3.org/TR/1999/REC-xml-names-19990114. | |||
| [SAX] - David Megginson, "SAX 1.0: The Simple API for XML", | [SAX] - David Megginson, "SAX 1.0: The Simple API for XML", | |||
| http://www.megginson.com/SAX/, May 1998. | http://www.megginson.com/SAX/, May 1998. | |||
| [SHA] - (US) National Institute of Standards and Technology, "Federal | ||||
| Information Processing Standards Publication 180-1: Secure Hash | ||||
| Standard", 17 April 1995. | ||||
| [URI] - RFC 2396 - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform | [URI] - RFC 2396 - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform | |||
| Resource Identifiers (URI): Generic Syntax", August 1998. | Resource Identifiers (URI): Generic Syntax", August 1998. | |||
| [XML] - Tim Bray, Jean Paoli, C. M. Sperber-McQueen, "Extensible | [XML] - Tim Bray, Jean Paoli, C. M. Sperber-McQueen, "Extensible | |||
| Markup Language (XML) 1.0", http://www.w3.org/TR/1998/REC-xml- | Markup Language (XML) 1.0", http://www.w3.org/TR/1998/REC-xml- | |||
| 19980210 | 19980210 | |||
| Author's Address | Author's Address | |||
| Hiroshi Maruyama, | Hiroshi Maruyama, | |||
| skipping to change at page 11, line 44 ¶ | skipping to change at page 11, line 48 ¶ | |||
| email: kent @ trl.ibm.co.jp | email: kent @ trl.ibm.co.jp | |||
| Naohiko Uramoto, | Naohiko Uramoto, | |||
| IBM Research, Tokyo Research Laboratory | IBM Research, Tokyo Research Laboratory | |||
| email: uramoto @ jp.ibm.com | email: uramoto @ jp.ibm.com | |||
| Expiration and File Name | Expiration and File Name | |||
| This draft expires March 2000. | This draft expires March 2000. | |||
| Its file name is draft-hiroshi-dom-hash-02.txt. | Its file name is draft-ietf-trade-hiroshi-dom-hash-03.txt. | |||
| End of changes. 23 change blocks. | ||||
| 22 lines changed or deleted | 44 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||