| < draft-fielding-uri-rfc2396bis-02.txt | draft-fielding-uri-rfc2396bis-03.txt > | |||
|---|---|---|---|---|
| Network Working Group T. Berners-Lee | Network Working Group T. Berners-Lee | |||
| Internet-Draft MIT/LCS | Internet-Draft MIT/LCS | |||
| Updates: 1738 (if approved) R. Fielding | Updates: 1738 (if approved) R. Fielding | |||
| Obsoletes: 2732, 2396, 1808 (if approved) Day Software | Obsoletes: 2732, 2396, 1808 (if approved) Day Software | |||
| L. Masinter | L. Masinter | |||
| Expires: November 21, 2003 Adobe | Expires: December 5, 2003 Adobe | |||
| May 23, 2003 | June 6, 2003 | |||
| Uniform Resource Identifier (URI): Generic Syntax | Uniform Resource Identifier (URI): Generic Syntax | |||
| draft-fielding-uri-rfc2396bis-02 | draft-fielding-uri-rfc2396bis-03 | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that other | Task Force (IETF), its areas, and its working groups. Note that other | |||
| groups may also distribute working documents as Internet-Drafts. | groups may also distribute working documents as Internet-Drafts. | |||
| skipping to change at page 1, line 41 | skipping to change at page 1, line 41 | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| <http://www.ietf.org/shadow.html>. | <http://www.ietf.org/shadow.html>. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2003). All Rights Reserved. | Copyright (C) The Internet Society (2003). All Rights Reserved. | |||
| Abstract | Abstract | |||
| A Uniform Resource Identifier (URI) is a compact string of characters | A Uniform Resource Identifier (URI) is a compact string of characters | |||
| for identifying an abstract or physical resource. This document | for identifying an abstract or physical resource. This specification | |||
| defines the generic syntax of a URI, including both absolute and | defines the generic URI syntax and a process for resolving URI | |||
| relative forms, and guidelines for their use. | references that might be in relative form, along with guidelines and | |||
| | security considerations for the use of URIs on the Internet. | |||
| This document defines a grammar that is a superset of all valid URIs, | The URI syntax defines a grammar that is a superset of all valid | |||
| such that an implementation can parse the common components of a URI | URIs, such that an implementation can parse the common components of | |||
| reference without knowing the scheme-specific requirements of every | a URI reference without knowing the scheme-specific requirements of | |||
| possible identifier type. This document does not define a generative | every possible identifier. This specification does not define a | |||
| grammar for all URIs; that task will be performed by the individual | generative grammar for URIs; that task is performed by the individual | |||
| specifications of each URI scheme. | specifications of each URI scheme. | |||
| Editorial Note | Editorial Note | |||
| Discussion of this draft and comments to the editors should be sent | Discussion of this draft and comments to the editors should be sent | |||
| to the uri@w3.org mailing list. An issues list and version history | to the uri@w3.org mailing list. An issues list and version history | |||
| is available at <http://www.apache.org/~fielding/uri/rev-2002/ | is available at <http://www.apache.org/~fielding/uri/rev-2002/ | |||
| issues.html>. | issues.html>. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . . . . 6 | 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.2 Design Considerations . . . . . . . . . . . . . . . . . . . 6 | 1.2 Design Considerations . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.2.2 Separating Identification from Interaction . . . . . . . . . 7 | 1.2.2 Separating Identification from Interaction . . . . . . . . . 7 | |||
| 1.2.3 Hierarchical Identifiers . . . . . . . . . . . . . . . . . . 9 | 1.2.3 Hierarchical Identifiers . . . . . . . . . . . . . . . . . . 8 | |||
| 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . . 9 | 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.1 Encoding of Characters . . . . . . . . . . . . . . . . . . . 10 | 2.1 Encoding of Characters . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . . 10 | 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . . 11 | 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.4 Escaped Characters . . . . . . . . . . . . . . . . . . . . . 12 | 2.4 Escaped Characters . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 2.4.1 Escaped Encoding . . . . . . . . . . . . . . . . . . . . . . 12 | 2.4.1 Escaped Encoding . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 2.4.2 When to Escape and Unescape . . . . . . . . . . . . . . . . 12 | 2.4.2 When to Escape and Unescape . . . . . . . . . . . . . . . . 13 | |||
| 2.5 Excluded Characters . . . . . . . . . . . . . . . . . . . . 13 | 2.5 Excluded Characters . . . . . . . . . . . . . . . . . . . . 14 | |||
| 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . 15 | 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 3.2.1 User Information . . . . . . . . . . . . . . . . . . . . . . 16 | 3.2.1 User Information . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . . 22 | 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.2 Relative URI . . . . . . . . . . . . . . . . . . . . . . . . 22 | 4.2 Relative URI . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . . 23 | 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.4 Same-document Reference . . . . . . . . . . . . . . . . . . 23 | 4.4 Same-document Reference . . . . . . . . . . . . . . . . . . 25 | |||
| 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . . 23 | 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 5. Relative Resolution . . . . . . . . . . . . . . . . . . . . 25 | 5. Reference Resolution . . . . . . . . . . . . . . . . . . . . 27 | |||
| 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . . 25 | 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . . 27 | |||
| 5.1.1 Base URI within Document Content . . . . . . . . . . . . . . 26 | 5.1.1 Base URI within Document Content . . . . . . . . . . . . . . 27 | |||
| 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . . . . 26 | 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . . . . 28 | |||
| 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . . . . 27 | 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . . . . 28 | |||
| 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . . . . 27 | 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 5.2 Obtaining the Referenced URI . . . . . . . . . . . . . . . . 27 | 5.2 Obtaining the Referenced URI . . . . . . . . . . . . . . . . 28 | |||
| 5.3 Recomposition of a Parsed URI . . . . . . . . . . . . . . . 29 | 5.3 Recomposition of a Parsed URI . . . . . . . . . . . . . . . 31 | |||
| 5.4 Examples of Relative Resolution . . . . . . . . . . . . . . 30 | 5.4 Reference Resolution Examples . . . . . . . . . . . . . . . 32 | |||
| 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . . . . 30 | 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . . . . 32 | |||
| 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . . . . 31 | 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . . . . 32 | |||
| 6. Normalization and Comparison . . . . . . . . . . . . . . . . 33 | 6. Normalization and Comparison . . . . . . . . . . . . . . . . 35 | |||
| 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . . 33 | 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . . 33 | 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . . . . 34 | 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . . . . 36 | |||
| 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . . . . 35 | 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . . . . 37 | |||
| 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . . . . 36 | 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . . . . 38 | |||
| 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . . . . 36 | 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . . . . 38 | |||
| 6.3 Canonical Form . . . . . . . . . . . . . . . . . . . . . . . 36 | 6.3 Canonical Form . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . 38 | 7. Security Considerations . . . . . . . . . . . . . . . . . . 40 | |||
| 7.1 Reliability and Consistency . . . . . . . . . . . . . . . . 38 | 7.1 Reliability and Consistency . . . . . . . . . . . . . . . . 40 | |||
| 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . . 38 | 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . . 40 | |||
| 7.3 Rare IP Address Formats . . . . . . . . . . . . . . . . . . 39 | 7.3 Rare IP Address Formats . . . . . . . . . . . . . . . . . . 41 | |||
| 7.4 Sensitive Information . . . . . . . . . . . . . . . . . . . 39 | 7.4 Sensitive Information . . . . . . . . . . . . . . . . . . . 41 | |||
| 7.5 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . . 39 | 7.5 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . . 41 | |||
| 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . 41 | 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| Normative References . . . . . . . . . . . . . . . . . . . . 42 | Normative References . . . . . . . . . . . . . . . . . . . . 44 | |||
| Informative References . . . . . . . . . . . . . . . . . . . 43 | Informative References . . . . . . . . . . . . . . . . . . . 45 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . 45 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . 47 | |||
| A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . 46 | A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . 48 | |||
| B. Parsing a URI Reference with a Regular Expression . . . . . 47 | B. Parsing a URI Reference with a Regular Expression . . . . . 50 | |||
| C. Embedding the Base URI in HTML documents . . . . . . . . . . 48 | C. Delimiting a URI in Context . . . . . . . . . . . . . . . . 51 | |||
| D. Delimiting a URI in Context . . . . . . . . . . . . . . . . 49 | D. Summary of Non-editorial Changes . . . . . . . . . . . . . . 53 | |||
| E. Summary of Non-editorial Changes . . . . . . . . . . . . . . 51 | D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . . 53 | |||
| E.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . . 51 | D.2 Modifications from RFC 2396 . . . . . . . . . . . . . . . . 53 | |||
| E.2 Modifications from RFC 2396 . . . . . . . . . . . . . . . . 51 | Index . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 | |||
| Index . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 | Intellectual Property and Copyright Statements . . . . . . . 60 | |||
| Intellectual Property and Copyright Statements . . . . . . . 57 | ||||
| 1. Introduction | 1. Introduction | |||
| A Uniform Resource Identifier (URI) provides a simple and extensible | A Uniform Resource Identifier (URI) provides a simple and extensible | |||
| means for identifying a resource. This specification of URI syntax | means for identifying a resource. This specification of URI syntax | |||
| and semantics is derived from concepts introduced by the World Wide | and semantics is derived from concepts introduced by the World Wide | |||
| Web global information initiative, whose use of such identifiers | Web global information initiative, whose use of such identifiers | |||
| dates from 1990 and is described in "Universal Resource Identifiers | dates from 1990 and is described in "Universal Resource Identifiers | |||
| in WWW" [RFC1630], and is designed to meet the recommendations laid | in WWW" [RFC1630], and is designed to meet the recommendations laid | |||
| out in "Functional Recommendations for Internet Resource Locators" | out in "Functional Recommendations for Internet Resource Locators" | |||
| skipping to change at page 4, line 25 | skipping to change at page 4, line 25 | |||
| [RFC1737]. | [RFC1737]. | |||
| This document obsoletes [RFC2396], which merged "Uniform Resource | This document obsoletes [RFC2396], which merged "Uniform Resource | |||
| Locators" [RFC1738] and "Relative Uniform Resource Locators" | Locators" [RFC1738] and "Relative Uniform Resource Locators" | |||
| [RFC1808] in order to define a single, generic syntax for all URIs. | [RFC1808] in order to define a single, generic syntax for all URIs. | |||
| It excludes those portions of RFC 1738 that defined the specific | It excludes those portions of RFC 1738 that defined the specific | |||
| syntax of individual URI schemes; those portions will be updated as | syntax of individual URI schemes; those portions will be updated as | |||
| separate documents. The process for registration of new URI schemes | separate documents. The process for registration of new URI schemes | |||
| is defined separately by [RFC2717]. | is defined separately by [RFC2717]. | |||
| All significant changes from RFC 2396 are noted in Appendix G. | All significant changes from RFC 2396 are noted in Appendix D. | |||
| 1.1 Overview of URIs | 1.1 Overview of URIs | |||
| URIs are characterized as follows: | URIs are characterized as follows: | |||
| Uniform | Uniform | |||
| Uniformity provides several benefits: it allows different types of | Uniformity provides several benefits: it allows different types of | |||
| resource identifiers to be used in the same context, even when the | resource identifiers to be used in the same context, even when the | |||
| mechanisms used to access those resources may differ; it allows | mechanisms used to access those resources may differ; it allows | |||
| skipping to change at page 5, line 14 | skipping to change at page 5, line 14 | |||
| mathematical equation or the types of a relationship (e.g., | mathematical equation or the types of a relationship (e.g., | |||
| "parent" or "employee"). | "parent" or "employee"). | |||
| Identifier | Identifier | |||
| An identifier embodies the information required to distinguish | An identifier embodies the information required to distinguish | |||
| what is being identified from all other things within its scope of | what is being identified from all other things within its scope of | |||
| identification. | identification. | |||
| A URI is an identifier that consists of a sequence of characters | A URI is an identifier that consists of a sequence of characters | |||
| matching the restricted syntax defined by this specification. A URI | matching the syntax defined by the grammar rule named "URI" in | |||
| can be used to refer to a resource. This specification does not | Section 3. A URI can be used to refer to a resource. This | |||
| place any limits on the nature of a resource or the reasons why an | specification does not place any limits on the nature of a resource | |||
| application might wish to refer to a resource. URIs have a global | or the reasons why an application might wish to refer to a resource. | |||
| scope and should be interpreted consistently regardless of context, | URIs have a global scope and should be interpreted consistently | |||
| but that interpretation may be defined in relation to the user's | regardless of context, but that interpretation may be defined in | |||
| context (e.g., "http://localhost/" refers to a resource that is | relation to the user's context (e.g., "http://localhost/" refers to a | |||
| relative to the user's network interface and yet not specific to any | resource that is relative to the user's network interface and yet not | |||
| one user). | specific to any one user). | |||
| 1.1.1 Generic Syntax | 1.1.1 Generic Syntax | |||
| Each URI begins with a scheme name, as defined in Section 3.1, that | Each URI begins with a scheme name, as defined in Section 3.1, that | |||
| refers to a specification for assigning identifiers within that | refers to a specification for assigning identifiers within that | |||
| scheme. As such, the URI syntax is a federated and extensible naming | scheme. As such, the URI syntax is a federated and extensible naming | |||
| system wherein each scheme's specification may further restrict the | system wherein each scheme's specification may further restrict the | |||
| syntax and semantics of identifiers using that scheme. | syntax and semantics of identifiers using that scheme. | |||
| This specification defines those elements of the URI syntax that are | This specification defines those elements of the URI syntax that are | |||
| skipping to change at page 6, line 31 | skipping to change at page 6, line 31 | |||
| news:comp.infosystems.www.servers.unix | news:comp.infosystems.www.servers.unix | |||
| -- news scheme for USENET news groups and articles | -- news scheme for USENET news groups and articles | |||
| telnet://melvyl.ucop.edu/ | telnet://melvyl.ucop.edu/ | |||
| -- telnet scheme for interactive TELNET services | -- telnet scheme for interactive TELNET services | |||
| 1.1.3 URI, URL, and URN | 1.1.3 URI, URL, and URN | |||
| A URI can be further classified as a locator, a name, or both. The | A URI can be further classified as a locator, a name, or both. The | |||
| term "Uniform Resource Locator" (URL) refers to the subset of URIs | term "Uniform Resource Locator" (URL) refers to the subset of URIs | |||
| that, in addition to identifying the resource, provide a means of | that, in addition to identifying a resource, provide a means of | |||
| locating the resource by describing its primary access mechanism | locating the resource by describing its primary access mechanism | |||
| (e.g., its network "location"). The term "Uniform Resource Name" | (e.g., its network "location"). The term "Uniform Resource Name" | |||
| (URN) refers to the subset of URIs that are required to remain | (URN) refers to URIs under the "urn" scheme [RFC2141], which are | |||
| globally unique and persistent even when the resource ceases to exist | required to remain globally unique and persistent even when the | |||
| or becomes unavailable. | resource ceases to exist or becomes unavailable. | |||
| An individual scheme does not need to be classified as being just one | An individual scheme does not need to be classified as being just one | |||
| of "name" or "locator". Instances of URIs from any given scheme may | of "name" or "locator". Instances of URIs from any given scheme may | |||
| have the characteristics of names or locators or both, often | have the characteristics of names or locators or both, often | |||
| depending on the persistence and care in the assignment of | depending on the persistence and care in the assignment of | |||
| identifiers by the naming authority, rather than any quality of the | identifiers by the naming authority, rather than any quality of the | |||
| scheme. This specification deprecates use of the term "URN" for | scheme. | |||
| anything but URIs in the "urn" scheme [RFC2141]. This specification | ||||
| also deprecates the term "URL". | ||||
| 1.2 Design Considerations | 1.2 Design Considerations | |||
| 1.2.1 Transcription | 1.2.1 Transcription | |||
| The URI syntax has been designed with global transcription as one of | The URI syntax has been designed with global transcription as one of | |||
| its main considerations. A URI is a sequence of characters from a | its main considerations. A URI is a sequence of characters from a | |||
| very limited set: the letters of the basic Latin alphabet, digits, | very limited set: the letters of the basic Latin alphabet, digits, | |||
| and a few special characters. A URI may be represented in a variety | and a few special characters. A URI may be represented in a variety | |||
| of ways: e.g., ink on paper, pixels on a screen, or a sequence of | of ways: e.g., ink on paper, pixels on a screen, or a sequence of | |||
| skipping to change at page 7, line 43 | skipping to change at page 7, line 41 | |||
| familiar components. | familiar components. | |||
| These design considerations are not always in alignment. For | These design considerations are not always in alignment. For | |||
| example, it is often the case that the most meaningful name for a URI | example, it is often the case that the most meaningful name for a URI | |||
| component would require characters that cannot be typed into some | component would require characters that cannot be typed into some | |||
| systems. The ability to transcribe a resource identifier from one | systems. The ability to transcribe a resource identifier from one | |||
| medium to another has been considered more important than having a | medium to another has been considered more important than having a | |||
| URI consist of the most meaningful of components. In local or | URI consist of the most meaningful of components. In local or | |||
| regional contexts and with improving technology, users might benefit | regional contexts and with improving technology, users might benefit | |||
| from being able to use a wider range of characters; such use is not | from being able to use a wider range of characters; such use is not | |||
| defined in this document. | defined in this specification. | |||
| 1.2.2 Separating Identification from Interaction | 1.2.2 Separating Identification from Interaction | |||
| A common misunderstanding of URIs is that they are only used to refer | A common misunderstanding of URIs is that they are only used to refer | |||
| to accessible resources. In fact, the URI alone only provides | to accessible resources. In fact, the URI alone only provides | |||
| identification; access to the resource is neither guaranteed nor | identification; access to the resource is neither guaranteed nor | |||
| implied by the presence of a URI. Instead, an operation (if any) | implied by the presence of a URI. Instead, an operation (if any) | |||
| associated with a URI reference is defined by the protocol element, | associated with a URI reference is defined by the protocol element, | |||
| data format attribute, or natural language text in which it appears. | data format attribute, or natural language text in which it appears. | |||
| Given a URI, a system may attempt to perform a variety of operations | Given a URI, a system may attempt to perform a variety of operations | |||
| on the resource, as might be characterized by such words as "denote", | on the resource, as might be characterized by such words as "denote", | |||
| "access", "update", "replace", or "find attributes". Such operations | "access", "update", "replace", or "find attributes". Such operations | |||
| are defined by the protocols that make use of URIs, not by this | are defined by the protocols that make use of URIs, not by this | |||
| specification. However, we do use a few general terms for describing | specification. However, we do use a few general terms for describing | |||
| common operations on URIs. URI "resolution" is the process of | common operations on URIs. URI "resolution" is the process of | |||
| determining an access mechanism and the appropriate parameters | determining an access mechanism and the appropriate parameters | |||
| necessary to dereference a URI; such resolution may require several | necessary to dereference a URI; such resolution may require several | |||
| iterations. Using that access mechanism to perform some action on | iterations. Use of that access mechanism to perform an action on the | |||
| the URI's resource is termed a "dereference" of the URI. | URI's resource is termed a "dereference" of the URI. | |||
| When URIs are used within information systems to identify sources of | When URIs are used within information systems to identify sources of | |||
| information, the most common form of URI dereference is "retrieval": | information, the most common form of URI dereference is "retrieval": | |||
| making use of a URI in order to retrieve a representation of its | making use of a URI in order to retrieve a representation of its | |||
| associated resource. A "representation" is a sequence of octets, | associated resource. A "representation" is a sequence of octets, | |||
| along with metadata describing those octets, that constitutes a | along with metadata describing those octets, that constitutes a | |||
| record of the state of the resource at the time that the | record of the state of the resource at the time that the | |||
| representation is generated. Retrieval is achieved by a process that | representation is generated. Retrieval is achieved by a process that | |||
| might include using the URI as a cache key to check for a locally | might include using the URI as a cache key to check for a locally | |||
| cached representation, resolution of the URI to determine an | cached representation, resolution of the URI to determine an | |||
| skipping to change at page 9, line 6 | skipping to change at page 9, line 4 | |||
| via the named protocol. URIs are often used simply for the sake of | via the named protocol. URIs are often used simply for the sake of | |||
| identification. Even when a URI is used to retrieve a representation | identification. Even when a URI is used to retrieve a representation | |||
| of a resource, that access might be through gateways, proxies, | of a resource, that access might be through gateways, proxies, | |||
| caches, and name resolution services that are independent of the | caches, and name resolution services that are independent of the | |||
| protocol associated with the scheme name, and the resolution of some | protocol associated with the scheme name, and the resolution of some | |||
| URIs may require the use of more than one protocol (e.g., both DNS | URIs may require the use of more than one protocol (e.g., both DNS | |||
| and HTTP are typically used to access an "http" URI's origin server | and HTTP are typically used to access an "http" URI's origin server | |||
| when a representation isn't found in a local cache). | when a representation isn't found in a local cache). | |||
| 1.2.3 Hierarchical Identifiers | 1.2.3 Hierarchical Identifiers | |||
| The URI syntax is organized hierarchically, with components listed in | The URI syntax is organized hierarchically, with components listed in | |||
| decreasing order from left to right. For some URI schemes, the | decreasing order from left to right. For some URI schemes, the | |||
| visible hierarchy is limited to the scheme itself: everything after | visible hierarchy is limited to the scheme itself: everything after | |||
| the scheme component delimiter is considered opaque to URI | the scheme component delimiter is considered opaque to URI | |||
| processing. Other URI schemes make the hierarchy explicit and visible | processing. Other URI schemes make the hierarchy explicit and visible | |||
| to generic parsing algorithms. | to generic parsing algorithms. | |||
| The URI syntax reserves the slash ("/"), question-mark ("?"), and | The URI syntax reserves the slash ("/"), question-mark ("?"), and | |||
| crosshatch ("#") characters for the purpose of delimiting components | number-sign ("#") characters for the purpose of delimiting components | |||
| that are significant to the generic parser's hierarchical | that are significant to the generic parser's hierarchical | |||
| interpretation of an identifier. In addition to aiding the | interpretation of an identifier. In addition to aiding the | |||
| readability of such identifiers through the consistent use of | readability of such identifiers through the consistent use of | |||
| familiar syntax, this uniform representation of hierarchy across | familiar syntax, this uniform representation of hierarchy across | |||
| naming schemes allows scheme-independent references to be made | naming schemes allows scheme-independent references to be made | |||
| relative to that hierarchy. | relative to that hierarchy. | |||
| An "absolute" URI refers to a resource independent of the naming | It is often the case that a group or "tree" of documents has been | |||
| hierarchy in which the identifier is used. In contrast, a "relative" | constructed to serve a common purpose; the vast majority of URIs in | |||
| URI refers to a resource by describing the difference within a | these documents point to resources within the tree rather than | |||
| hierarchical name space between the current context and an absolute | outside of it. Similarly, documents located at a particular site are | |||
| URI of the resource. Section 4.2 defines a scheme-independent form | much more likely to refer to other resources at that site than to | |||
| of relative URI reference that can be used in conjunction with a base | resources at remote sites. | |||
| URI of a hierarchical scheme to produce the absolute URI form of that | ||||
| reference. | Relative referencing of URIs allows document trees to be partially | |||
| | independent of their location and access scheme. For instance, it is | |||
| | possible for a single set of hypertext documents to be simultaneously | |||
| | accessible and traversable via each of the "file", "http", and "ftp" | |||
| | schemes if the documents refer to each other using relative | |||
| | references. Furthermore, such document trees can be moved, as a | |||
| | whole, without changing any of the relative references. | |||
| | A relative URI reference (Section 4.2) refers to a resource by | |||
| | describing the difference within a hierarchical name space between | |||
| | the current context and the target URI. The reference resolution | |||
| | algorithm, presented in Section 5, defines how such references are | |||
| | resolved. | |||
| 1.3 Syntax Notation | 1.3 Syntax Notation | |||
| This document uses the Augmented Backus-Naur Form (ABNF) notation of | This specification uses the Augmented Backus-Naur Form (ABNF) | |||
| [RFC2234] to define the URI syntax. Although the ABNF defines syntax | notation of [RFC2234] to define the URI syntax. Although the ABNF | |||
| in terms of the US-ASCII character encoding [ASCII], the URI syntax | defines syntax in terms of the US-ASCII character encoding [ASCII], | |||
| should be interpreted in terms of the character that the | the URI syntax should be interpreted in terms of the character that | |||
| ASCII-encoded octet represents, rather than the octet encoding | the ASCII-encoded octet represents, rather than the octet encoding | |||
| itself. How a URI is represented in terms of bits and bytes on the | itself. How a URI is represented in terms of bits and bytes on the | |||
| wire is dependent upon the character encoding of the protocol used to | wire is dependent upon the character encoding of the protocol used to | |||
| transport it, or the charset of the document that contains it. | transport it, or the charset of the document that contains it. | |||
| The following core ABNF productions are used by this specification as | The following core ABNF productions are used by this specification as | |||
| defined by Section 6.1 of [RFC2234]: ALPHA, CR, CTL, DIGIT, DQUOTE, | defined by Section 6.1 of [RFC2234]: ALPHA, CR, CTL, DIGIT, DQUOTE, | |||
| HEXDIG, LF, OCTET, and SP. The complete URI syntax is collected in | HEXDIG, LF, OCTET, and SP. The complete URI syntax is collected in | |||
| Appendix A. | Appendix A. | |||
| 2. Characters | 2. Characters | |||
| skipping to change at page 10, line 45 | skipping to change at page 11, line 45 | |||
| non-ASCII data, numeric coordinates on a map, etc. Some URI schemes | non-ASCII data, numeric coordinates on a map, etc. Some URI schemes | |||
| define a specific encoding of raw data to US-ASCII characters as part | define a specific encoding of raw data to US-ASCII characters as part | |||
| of their scheme-specific requirements. Most URI schemes represent | of their scheme-specific requirements. Most URI schemes represent | |||
| data octets by the US-ASCII character corresponding to that octet, | data octets by the US-ASCII character corresponding to that octet, | |||
| either directly in the form of the character's glyph or by use of an | either directly in the form of the character's glyph or by use of an | |||
| escape triplet (Section 2.4). | escape triplet (Section 2.4). | |||
| When a URI scheme defines a component that represents textual data | When a URI scheme defines a component that represents textual data | |||
| consisting of characters from the Unicode (ISO 10646) character set, | consisting of characters from the Unicode (ISO 10646) character set, | |||
| we recommend that the data be encoded first as octets according to | we recommend that the data be encoded first as octets according to | |||
| the UTF-8 [UTF-8] character encoding, and then escaping any octets | the UTF-8 [UTF-8] character encoding, and then escaping only those | |||
| that are not in the unreserved character set. | octets that are not in the unreserved character set. | |||
| 2.2 Reserved Characters | 2.2 Reserved Characters | |||
| URIs include components and sub-components that are delimited by | URIs include components and sub-components that are delimited by | |||
| certain special characters. These characters are called "reserved", | certain special characters. These characters are called "reserved", | |||
| since their usage within a URI component is limited to their reserved | since their usage within a URI component is limited to their reserved | |||
| purpose within that component. If data for a URI component would | purpose within that component. If data for a URI component would | |||
| conflict with the reserved purpose, then the conflicting data must be | conflict with the reserved purpose, then the conflicting data must be | |||
| escaped (Section 2.4) before forming the URI. | escaped (Section 2.4) before forming the URI. | |||
| skipping to change at page 11, line 27 | skipping to change at page 12, line 27 | |||
| delimiter role by this specification should be considered reserved | delimiter role by this specification should be considered reserved | |||
| for special use by whatever software generates the URI (i.e., they | for special use by whatever software generates the URI (i.e., they | |||
| may be used to delimit or indicate information that is significant to | may be used to delimit or indicate information that is significant to | |||
| interpretation of the identifier, but that significance is outside | interpretation of the identifier, but that significance is outside | |||
| the scope of this specification). Outside of the URI's origin, a | the scope of this specification). Outside of the URI's origin, a | |||
| reserved character cannot be escaped without fear of changing how it | reserved character cannot be escaped without fear of changing how it | |||
| will be interpreted; likewise, an escaped octet that corresponds to a | will be interpreted; likewise, an escaped octet that corresponds to a | |||
| reserved character cannot be unescaped outside the software that is | reserved character cannot be unescaped outside the software that is | |||
| responsible for interpreting it during URI resolution. | responsible for interpreting it during URI resolution. | |||
| The slash ("/"), question-mark ("?"), and crosshatch ("#") characters | The slash ("/"), question-mark ("?"), and number-sign ("#") | |||
| are reserved in all URI for the purpose of delimiting components that | characters are reserved in all URIs for the purpose of delimiting | |||
| are significant to the generic parser's hierarchical interpretation | components that are significant to the generic parser's hierarchical | |||
| of an identifier. The hierarchical prefix of a URI, wherein the | interpretation of an identifier. The hierarchical prefix of a URI, | |||
| slash ("/") character signifies a hierarchy delimiter, extends from | wherein the slash ("/") character signifies a hierarchy delimiter, | |||
| the scheme (Section 3.1) through to the first question-mark ("?"), | extends from the scheme (Section 3.1) through to the first | |||
| crosshatch ("#"), or the end of the URI string. In other words, the | question-mark ("?"), number-sign ("#"), or the end of the URI string. | |||
| slash ("/") character is not treated as a hierarchical separator | In other words, the slash ("/") character is not treated as a | |||
| within the query (Section 3.4) and fragment (Section 3.5) components | hierarchical separator within the query (Section 3.4) and fragment | |||
| of a URI, but is still considered reserved within those components | (Section 3.5) components of a URI, but is still considered reserved | |||
| for purposes outside the scope of this specification. | within those components for purposes outside the scope of this | |||
| | specification. | |||
| 2.3 Unreserved Characters | 2.3 Unreserved Characters | |||
| Data characters that are allowed in a URI but do not have a reserved | Characters that are allowed in a URI but do not have a reserved | |||
| purpose are called unreserved. These include uppercase and lowercase | purpose are called unreserved. These include uppercase and lowercase | |||
| letters, decimal digits, and a limited set of punctuation marks and | letters, decimal digits, and a limited set of punctuation marks and | |||
| symbols. | symbols. | |||
| unreserved = ALPHA / DIGIT / mark | unreserved = ALPHA / DIGIT / mark | |||
| mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")" | mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")" | |||
| Unreserved characters can be escaped without changing the semantics | Escaping unreserved characters in a URI does not change what resource | |||
| of a URI, but this should not be done unless the URI is being used in | is identified by that URI. However, it may change the result of a | |||
| a context that does not allow the unescaped character to appear. URI | URI comparison (Section 6), potentially leading to less efficient | |||
| normalization processes may unescape sequences in the ranges of ALPHA | actions by an application. Therefore, unreserved characters should | |||
| (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore | not be escaped unless the URI is being used in a context that does | |||
| (%5F), or tilde (%7E) without fear of creating a conflict, but | not allow the unescaped character to appear. URI normalization | |||
| unescaping the other mark characters is usually counterproductive. | processes may unescape sequences in the ranges of ALPHA (%41-%5A and | |||
| | %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore (%5F), or tilde | |||
| | (%7E) without fear of creating a conflict, but unescaping the other | |||
| | mark characters is usually counterproductive. | |||
| 2.4 Escaped Characters | 2.4 Escaped Characters | |||
| Data must be escaped if it does not have a representation using an | Data must be escaped if it does not have a representation using an | |||
| unreserved character; this includes data that does not correspond to | unreserved character; this includes data that does not correspond to | |||
| a printable character of the US-ASCII coded character set or | a printable character of the US-ASCII coded character set or | |||
| corresponds to a US-ASCII character that delimits the component from | corresponds to a US-ASCII character that delimits the component from | |||
| others, is reserved in that component for delimiting sub-components, | others, is reserved in that component for delimiting sub-components, | |||
| or is excluded from any use within a URI (Section 2.5). | or is excluded from any use within a URI (Section 2.5). | |||
| 2.4.1 Escaped Encoding | 2.4.1 Escaped Encoding | |||
| An escaped octet is encoded as a character triplet, consisting of | An escaped octet is encoded as a character triplet, consisting of | |||
| the percent character "%" followed by the two hexadecimal digits | the percent character "%" followed by the two hexadecimal digits | |||
| representing that octet's numeric value. For example, "%20" is the | representing that octet's numeric value. For example, "%20" is the | |||
| escaped encoding for the US-ASCII space character (SP). This is | escaped encoding for the binary octet "00100000" (ABNF: %x20), which | |||
| sometimes referred to as "percent-encoding" the octet. | corresponds to the US-ASCII space character (SP). This is sometimes | |||
| | referred to as "percent-encoding" the octet. | |||
| escaped = "%" HEXDIG HEXDIG | escaped = "%" HEXDIG HEXDIG | |||
| The uppercase hexadecimal digits 'A' through 'F' are equivalent to | The uppercase hexadecimal digits 'A' through 'F' are equivalent to | |||
| the lowercase digits 'a' through 'f', respectively. Two URIs that | the lowercase digits 'a' through 'f', respectively. Two URIs that | |||
| differ only in the case of hexadecimal digits used in escaped octets | differ only in the case of hexadecimal digits used in escaped octets | |||
| are equivalent. For consistency, we recommend that uppercase digits | are equivalent. For consistency, we recommend that uppercase digits | |||
| be used by URI generators and normalizers. | be used by URI generators and normalizers. | |||
| 2.4.2 When to Escape and Unescape | 2.4.2 When to Escape and Unescape | |||
| skipping to change at page 13, line 13 | skipping to change at page 14, line 18 | |||
| escaped characters within those components can be safely unescaped. | escaped characters within those components can be safely unescaped. | |||
| In some cases, data that could be represented by an unreserved | In some cases, data that could be represented by an unreserved | |||
| character may appear escaped; for example, some of the unreserved | character may appear escaped; for example, some of the unreserved | |||
| "mark" characters are automatically escaped by some systems. A URI | "mark" characters are automatically escaped by some systems. A URI | |||
| normalizer may unescape escaped octets that are represented by | normalizer may unescape escaped octets that are represented by | |||
| characters in the unreserved set. For example, "%7E" is sometimes | characters in the unreserved set. For example, "%7E" is sometimes | |||
| used instead of tilde ("~") in an "http" URI path and can be | used instead of tilde ("~") in an "http" URI path and can be | |||
| converted to "~" without changing the interpretation of the URI. | converted to "~" without changing the interpretation of the URI. | |||
| | In all cases, a URI character is equivalent to its corresponding | |||
| | ASCII-encoded octet, even when that octet is represented as a | |||
| | percent-escape. URI characters are provided as an external ASCII | |||
| | interface for identification between systems. A system that | |||
| | internally provides identifiers in the form of a different character | |||
| | encoding, such as EBCDIC, will generally perform character | |||
| | translation of textual identifiers to UTF-8 at some internal | |||
| | interface, thus providing meaningful identifiers in ASCII even though | |||
| | the back-end identifiers are in a different encoding. Escaped octets | |||
| | must be unescaped before such a transcoding is applied. Although | |||
| | this specification does not define the character encoding of escaped | |||
| | octets outside the ASCII range, the general principle of unescaping | |||
| | before transcoding should be applied for all character encodings. | |||
| Because the percent ("%") character serves as the escape indicator, | Because the percent ("%") character serves as the escape indicator, | |||
| it must be escaped as "%25" in order for that octet to be used as | it must be escaped as "%25" in order for that octet to be used as | |||
| data within a URI. Implementers should be careful not to escape or | data within a URI. Implementers should be careful not to escape or | |||
| unescape the same string more than once, since unescaping an already | unescape the same string more than once, since unescaping an already | |||
| unescaped string might lead to misinterpreting a percent data | unescaped string might lead to misinterpreting a percent data | |||
| character as another escaped character, or vice versa in the case of | character as another escaped character, or vice versa in the case of | |||
| escaping an already escaped string. | escaping an already escaped string. | |||
| 2.5 Excluded Characters | 2.5 Excluded Characters | |||
| skipping to change at page 15, line 38 | skipping to change at page 16, line 38 | |||
| a generic URI parser. | a generic URI parser. | |||
| The authority component is only present when a string matches the | The authority component is only present when a string matches the | |||
| net-path production. Since the presence of an authority component | net-path production. Since the presence of an authority component | |||
| restricts the remaining syntax for path, we have not included a | restricts the remaining syntax for path, we have not included a | |||
| specific "path" rule in the syntax. Instead, what we refer to as the | specific "path" rule in the syntax. Instead, what we refer to as the | |||
| URI path is that part of the parsed URI string matching the abs-path | URI path is that part of the parsed URI string matching the abs-path | |||
| or rel-path production in the syntax above, since they are mutually | or rel-path production in the syntax above, since they are mutually | |||
| exclusive for any given URI and can be parsed as a single component. | exclusive for any given URI and can be parsed as a single component. | |||
| | The following are two example URIs and their component parts: | |||
| |   foo://example.com:8042/over/there?name=ferret#nose | |||
| |   \_/   \______________/\_________/ \_________/ \__/ | |||
| |    |           |            |            |        | | |||
| | scheme     authority       path        query   fragment | |||
| |    |   _____________________|__ | |||
| |   / \ /                        \ | |||
| |   urn:example:animal:ferret:nose | |||
| 3.1 Scheme | 3.1 Scheme | |||
| Each URI begins with a scheme name that refers to a specification for | Each URI begins with a scheme name that refers to a specification for | |||
| assigning identifiers within that scheme. As such, the URI syntax is | assigning identifiers within that scheme. As such, the URI syntax is | |||
| a federated and extensible naming system wherein each scheme's | a federated and extensible naming system wherein each scheme's | |||
| specification may further restrict the syntax and semantics of | specification may further restrict the syntax and semantics of | |||
| identifiers using that scheme. | identifiers using that scheme. | |||
| Scheme names consist of a sequence of characters beginning with a | Scheme names consist of a sequence of characters beginning with a | |||
| letter and followed by any combination of letters, digits, plus | letter and followed by any combination of letters, digits, plus | |||
| skipping to change at page 16, line 26 | skipping to change at page 17, line 37 | |||
| Many URI schemes include a hierarchical element for a naming | Many URI schemes include a hierarchical element for a naming | |||
| authority, such that governance of the name space defined by the | authority, such that governance of the name space defined by the | |||
| remainder of the URI is delegated to that authority (which may, in | remainder of the URI is delegated to that authority (which may, in | |||
| turn, delegate it further). The generic syntax provides a common | turn, delegate it further). The generic syntax provides a common | |||
| means for distinguishing an authority based on a registered domain | means for distinguishing an authority based on a registered domain | |||
| name or server address, along with optional port and user | name or server address, along with optional port and user | |||
| information. | information. | |||
| The authority component is preceded by a double slash ("//") and is | The authority component is preceded by a double slash ("//") and is | |||
| terminated by the next slash ("/"), question-mark ("?"), or | terminated by the next slash ("/"), question-mark ("?"), or | |||
| crosshatch ("#") character, or by the end of the URI. | number-sign ("#") character, or by the end of the URI. | |||
| authority = [ userinfo "@" ] host [ ":" port ] | authority = [ userinfo "@" ] host [ ":" port ] | |||
| The parts "<userinfo>@" and ":<port>" may be omitted. | The parts "<userinfo>@" and ":<port>" may be omitted. | |||
| Some schemes do not allow the userinfo and/or port sub-components. | Some schemes do not allow the userinfo and/or port sub-components. | |||
| When presented with a URI that violates one or more scheme-specific | When presented with a URI that violates one or more scheme-specific | |||
| restrictions, the scheme-specific URI resolution process should flag | restrictions, the scheme-specific URI resolution process should flag | |||
| the reference as an error rather than ignore the unused parts; doing | the reference as an error rather than ignore the unused parts; doing | |||
| so reduces the number of equivalent URIs and helps detect abuses of | so reduces the number of equivalent URIs and helps detect abuses of | |||
| skipping to change at page 19, line 16 | skipping to change at page 20, line 29 | |||
| 3.3 Path | 3.3 Path | |||
| The path component contains hierarchical data that, along with data | The path component contains hierarchical data that, along with data | |||
| in the optional query (Section 3.4) component, serves to identify a | in the optional query (Section 3.4) component, serves to identify a | |||
| resource within the scope of that URI's scheme and naming authority | resource within the scope of that URI's scheme and naming authority | |||
| (if any). There is no specific "path" syntax production in the | (if any). There is no specific "path" syntax production in the | |||
| generic URI syntax. Instead, what we refer to as the URI path is | generic URI syntax. Instead, what we refer to as the URI path is | |||
| that part of the parsed URI string matching either the abs-path or | that part of the parsed URI string matching either the abs-path or | |||
| the rel-path production, since they are mutually exclusive for any | the rel-path production, since they are mutually exclusive for any | |||
| given URI and can be parsed as a single component. The path is | given URI and can be parsed as a single component. The path is | |||
| terminated by the first question-mark ("?") or crosshatch ("#") | terminated by the first question-mark ("?") or number-sign ("#") | |||
| character, or by the end of the URI. | character, or by the end of the URI. | |||
| path-segments = segment *( "/" segment ) | path-segments = segment *( "/" segment ) | |||
| segment = *pchar | segment = *pchar | |||
| pchar = unreserved / escaped / ";" / | pchar = unreserved / escaped / ";" / | |||
| ":" / "@" / "&" / "=" / "+" / "$" / "," | ":" / "@" / "&" / "=" / "+" / "$" / "," | |||
| The path consists of a sequence of path segments separated by a slash | The path consists of a sequence of path segments separated by a slash | |||
| ("/") character. A path is always defined for a URI, though the | ("/") character. A path is always defined for a URI, though the | |||
| defined path may be empty (zero length) or opaque (not containing any | defined path may be empty (zero length) or opaque (not containing any | |||
| "/" delimiters). For example, the URI <mailto:fred@example.com> has | "/" delimiters). For example, the URI <mailto:fred@example.com> has | |||
| a path of "fred@example.com". | a path of "fred@example.com". | |||
| Within a path segment, the semicolon (";") and equals ("=") reserved | ||||
| characters are often used for delimiting parameters and parameter | ||||
| values applicable to that segment. The comma (",") reserved | ||||
| character is often used for similar purposes. For example, one URI | ||||
| generator might use a segment like "name;v=1.1" to indicate a | ||||
| reference to version 1.1 of "name", whereas another might use a | ||||
| segment like "name,1.1" to indicate the same. Parameter types may be | ||||
| defined by scheme-specific semantics, but in most cases the meaning | ||||
| of a parameter is specific to the URI originator. Parameters are not | ||||
| significant to the parsing of relative references. | ||||
| The path segments "." and ".." are defined for relative reference | The path segments "." and ".." are defined for relative reference | |||
| within the path name hierarchy. They are intended for use at the | within the path name hierarchy. They are intended for use at the | |||
| beginning of a relative path reference (Section 4.2) for indicating | beginning of a relative path reference (Section 4.2) for indicating | |||
| relative position within the hierarchical tree of names, with a | relative position within the hierarchical tree of names, with a | |||
| similar effect to how they are used within some operating systems' | similar effect to how they are used within some operating systems' | |||
| file directory structure to indicate the current directory and parent | file directory structure to indicate the current directory and parent | |||
| directory, respectively. Unlike a file system, however, these | directory, respectively. Unlike a file system, however, these | |||
| dot-segments are only interpreted within the URI path hierarchy and | dot-segments are only interpreted within the URI path hierarchy and | |||
| must be removed as part of the URI normalization or resolution | are removed as part of the URI normalization or resolution process, | |||
| process, in accordance with the process described in Section 5.2. | as described in Section 5.2. | |||
| | Aside from dot-segments in hierarchical paths, a path segment is | |||
| | considered opaque by the generic syntax. URI generating applications | |||
| | often use the reserved characters allowed in segment for the purpose | |||
| | of delimiting scheme-specific or generator-specific sub-components. | |||
| | For example, the semicolon (";") and equals ("=") reserved characters | |||
| | are often used for delimiting parameters and parameter values | |||
| | applicable to that segment. The comma (",") reserved character is | |||
| | often used for similar purposes. For example, one URI generator | |||
| | might use a segment like "name;v=1.1" to indicate a reference to | |||
| | version 1.1 of "name", whereas another might use a segment like | |||
| | "name,1.1" to indicate the same. Parameter types may be defined by | |||
| | scheme-specific semantics, but in most cases the meaning of a | |||
| | parameter is specific to the URI originator. | |||
| 3.4 Query | 3.4 Query | |||
| The query component contains non-hierarchical data that, along with | The query component contains non-hierarchical data that, along with | |||
| data in the path (Section 3.3) component, serves to identify a | data in the path (Section 3.3) component, serves to identify a | |||
| resource within the scope of that URI's scheme and naming authority | resource within the scope of that URI's scheme and naming authority | |||
| (if any). The query component is indicated by the first question-mark | (if any). The query component is indicated by the first question-mark | |||
| ("?") character and terminated by a crosshatch ("#") character or by | ("?") character and terminated by a number-sign ("#") character or by | |||
| the end of the URI. | the end of the URI. | |||
| query = *( pchar / "/" / "?" ) | query = *( pchar / "/" / "?" ) | |||
| The characters slash ("/") and question-mark ("?") are allowed to | The characters slash ("/") and question-mark ("?") are allowed to | |||
| represent data within the query component, but such use is | represent data within the query component, but such use is | |||
| discouraged; incorrect implementations of relative URI resolution | discouraged; incorrect implementations of reference resolution often | |||
| often fail to distinguish them from hierarchical separators, thus | fail to distinguish them from hierarchical separators, thus resulting | |||
| resulting in non-interoperable results while parsing relative | in non-interoperable results while parsing relative references. | |||
| references. However, since query components are often used to carry | However, since query components are often used to carry identifying | |||
| identifying information in the form of "key=value" pairs, and one | information in the form of "key=value" pairs, and one frequently used | |||
| frequently used value is a reference to another URI, it is sometimes | value is a reference to another URI, it is sometimes better for | |||
| better for usability to include those characters unescaped. | usability to include those characters unescaped. | |||
| Note: Some client applications will fail to separate a reference's | ||||
| query component from its path component before merging the base | ||||
| and reference paths (Section 5.2). This may result in loss of | ||||
| information if the query component contains the strings "/../" or | ||||
| "/./". | ||||
| 3.5 Fragment | 3.5 Fragment | |||
| The fragment identifier component allows indirect identification of | The fragment identifier component allows indirect identification of a | |||
| a secondary resource by reference to a primary resource and | secondary resource by reference to a primary resource and additional | |||
| additional identifying information that is selective within that | identifying information that is selective within that resource. The | |||
| resource. The identified secondary resource may be some portion or | identified secondary resource may be some portion or subset of the | |||
| subset of the primary resource, some view on representations of the | primary resource, some view on representations of the primary | |||
| primary resource, or some other resource that is merely named within | resource, or some other resource that is merely named within the | |||
| the primary resource. A fragment identifier component is indicated | primary resource. A fragment identifier component is indicated by | |||
| by the presence of a crosshatch ("#") character and terminated by the | the presence of a number-sign ("#") character and terminated by the | |||
| end of the URI string. | end of the URI string. | |||
| fragment = *( pchar / "/" / "?" ) | fragment = *( pchar / "/" / "?" ) | |||
| The semantics of a fragment identifier are defined by the set of | The semantics of a fragment identifier are defined by the set of | |||
| representations that might result from a retrieval action on the | representations that might result from a retrieval action on the | |||
| primary resource. Therefore, the format and interpretation of a | primary resource. The fragment's format and resolution is therefore | |||
| fragment identifier component is dependent on the media type | dependent on the media type [RFC2046] of the retrieved | |||
| [RFC2046] of a potential retrieval result. Individual media types | representation, even though such a retrieval is only performed if the | |||
| may define their own restrictions on, or structure within, the | URI is dereferenced. Individual media types may define their own | |||
| fragment identifier syntax for specifying different types of subsets, | restrictions on, or structure within, the fragment identifier syntax | |||
| views, or external references that are identifiable as fragments by | for specifying different types of subsets, views, or external | |||
| that media type. If the primary resource is represented by multiple | references that are identifiable as secondary resources by that media | |||
| media types, as is often the case for resources whose representation | type. If the primary resource is represented by multiple media | |||
| is selected based on attributes of the retrieval request, then | types, as is often the case for resources whose representation is | |||
| interpretation of the given fragment identifier must be consistent | selected based on attributes of the retrieval request, then | |||
| across all of those media types in order for it to be viable as an | interpretation of the fragment identifier must be consistent across | |||
| | all of those media types in order for it to be viable as an | |||
| identifier. | identifier. | |||
| As with any URI, use of a fragment identifier component does not | As with any URI, use of a fragment identifier component does not | |||
| imply that a retrieval action will take place. A URI with a fragment | imply that a retrieval action will take place. A URI with a fragment | |||
| identifier may be used to refer to the secondary resource without any | identifier may be used to refer to the secondary resource without any | |||
| implication that the primary resource is accessible. However, if | implication that the primary resource is accessible. However, if | |||
| that URI is used in a context that does call for retrieval and is not | that URI is used in a context that does call for retrieval and is not | |||
| a same-document reference (Section 4.4), the fragment identifier is | a same-document reference (Section 4.4), the fragment identifier is | |||
| only valid as a reference if a retrieval action on the primary | only valid as a reference if a retrieval action on the primary | |||
| resource succeeds and results in a representation that defines the | resource succeeds and results in a representation for which the | |||
| fragment. | fragment identifier is meaningful. | |||
| Fragment identifiers have a special role in information systems as | Fragment identifiers have a special role in information systems as | |||
| the primary form of client-side indirect referencing, allowing an | the primary form of client-side indirect referencing, allowing an | |||
| author to specifically identify those aspects of an existing resource | author to specifically identify those aspects of an existing resource | |||
| that are only indirectly provided by the resource owner. As such, | that are only indirectly provided by the resource owner. As such, | |||
| interpretation of the fragment identifier during a retrieval action | interpretation of the fragment identifier during a retrieval action | |||
| is performed solely by the user agent; the fragment identifier is not | is performed solely by the user agent; the fragment identifier is not | |||
| passed to other systems during the process of retrieval. Although | passed to other systems during the process of retrieval. Although | |||
| this is often perceived to be a loss of information, particularly in | this is often perceived to be a loss of information, particularly in | |||
| regards to accurate redirection of references as content moves over | regards to accurate redirection of references as content moves over | |||
| skipping to change at page 22, line 24 ¶ | skipping to change at page 24, line 24 ¶ | |||
| design of the generic syntax, requiring a uniform parsing algorithm | design of the generic syntax, requiring a uniform parsing algorithm | |||
| in order to be interpreted consistently. | in order to be interpreted consistently. | |||
| 4.1 URI Reference | 4.1 URI Reference | |||
| The ABNF rule URI-reference is used to denote the most common usage | The ABNF rule URI-reference is used to denote the most common usage | |||
| of a resource identifier. | of a resource identifier. | |||
| URI-reference = URI / relative-URI | URI-reference = URI / relative-URI | |||
| A URI-reference may be absolute or relative: if the reference | A URI-reference may be relative: if the reference string's prefix | |||
| string's prefix matches the syntax of a scheme followed by its colon | matches the syntax of a scheme followed by its colon separator, then | |||
| separator, then the reference is a URI rather than a relative-URI. | the reference is a URI rather than a relative-URI. | |||
| A URI-reference is typically parsed first into the five URI | A URI-reference is typically parsed first into the five URI | |||
| components, in order to determine what components are present and | components, in order to determine what components are present and | |||
| whether the reference is relative or absolute, and then each | whether or not the reference is relative, and then each component is | |||
| component is parsed for its subparts and their validation. The ABNF | parsed for its subparts and their validation. The ABNF of | |||
| of URI-reference, along with the "first-match-wins" disambiguation | URI-reference, along with the "first-match-wins" disambiguation rule, | |||
| rule, is sufficient to define a validating parser for the generic | is sufficient to define a validating parser for the generic syntax. | |||
| syntax. Readers familiar with regular expressions should see | Readers familiar with regular expressions should see Appendix B for | |||
| Appendix B for an example of a non-validating URI-reference parser | an example of a non-validating URI-reference parser that will take | |||
| that will take any given string and extract the URI components. | any given string and extract the URI components. | |||
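
As a rough illustration of the non-validating parse described above, the following Python sketch splits any string into the five components. It uses the regular expression published in Appendix B of RFC 2396; that this revision's Appendix B is character-for-character identical is an assumption. The names `UriParts` and `split_uri_reference` are hypothetical and do not come from the draft. An undefined component is reported as `None`, and an empty one as `""`.

```python
import re
from typing import NamedTuple, Optional


class UriParts(NamedTuple):
    """The five generic components; None means the component is undefined."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


# Regular expression from RFC 2396, Appendix B (assumed unchanged here).
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri_reference(reference: str) -> UriParts:
    """Non-validating split: every sub-pattern is optional, so any string matches."""
    match = _URI_RE.match(reference)
    # Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment.
    return UriParts(*match.group(2, 4, 5, 7, 9))
```

A reference is relative exactly when the returned `scheme` is `None`. For instance, `split_uri_reference("this:that")` yields a scheme of `"this"`, while `split_uri_reference("./this:that")` yields no scheme and a path of `"./this:that"`.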
| 4.2 Relative URI | 4.2 Relative URI | |||
| A relative URI reference takes advantage of the hier-part syntax | A relative URI reference takes advantage of the hier-part syntax | |||
| (Section 3) in order to express a reference that is relative to the | (Section 3) in order to express a reference that is relative to the | |||
| name space of another hierarchical URI. | name space of another hierarchical URI. | |||
| relative-URI = hier-part [ "?" query ] [ "#" fragment ] | relative-URI = hier-part [ "?" query ] [ "#" fragment ] | |||
| The URI referred to by a relative URI reference is obtained by | The URI referred to by a relative reference is obtained by applying | |||
| applying the relative resolution algorithm of Section 5. | the reference resolution algorithm of Section 5. | |||
| A relative reference that begins with two slash characters is termed | A relative reference that begins with two slash characters is termed | |||
| a network-path reference; such references are rarely used. A relative | a network-path reference; such references are rarely used. A relative | |||
| reference that begins with a single slash character is termed an | reference that begins with a single slash character is termed an | |||
| absolute-path reference. A relative reference that does not begin | absolute-path reference. A relative reference that does not begin | |||
| with a slash character is termed a relative-path reference. | with a slash character is termed a relative-path reference. | |||
| A path segment that contains a colon character (e.g., "this:that") | A path segment that contains a colon character (e.g., "this:that") | |||
| cannot be used as the first segment of a relative-path reference | cannot be used as the first segment of a relative-path reference | |||
| because it might be mistaken for a scheme name. Such a segment must | because it would be mistaken for a scheme name. Such a segment must | |||
| be preceded by a dot-segment (e.g., "./this:that") to make a | be preceded by a dot-segment (e.g., "./this:that") to make a | |||
| relative-path reference. | relative-path reference. | |||
| 4.3 Absolute URI | 4.3 Absolute URI | |||
| Some protocol elements allow only the absolute form of a URI without | Some protocol elements allow only the absolute form of a URI without | |||
| a fragment identifier. For example, defining the base URI for later | a fragment identifier. For example, defining the base URI for later | |||
| use by relative references calls for an absolute-URI production that | use by relative references calls for an absolute-URI production that | |||
| does not allow a fragment. | does not allow a fragment. | |||
| absolute-URI = scheme ":" hier-part [ "?" query ] | absolute-URI = scheme ":" hier-part [ "?" query ] | |||
| 4.4 Same-document Reference | 4.4 Same-document Reference | |||
| When a URI reference occurring within a document or message refers to | When a URI reference occurring within a document or message refers to | |||
| a URI that is, aside from its fragment component (if any), identical | a URI that is, aside from its fragment component (if any), identical | |||
| to the base URI (Section 5), that reference is called a | to the base URI (Section 5.1), that reference is called a | |||
| "same-document" reference. The most frequent examples of | "same-document" reference. The most frequent examples of | |||
| same-document references are relative references that are empty or | same-document references are relative references that are empty or | |||
| include only the crosshatch ("#") separator followed by a fragment | include only the number-sign ("#") separator followed by a fragment | |||
| identifier. | identifier. | |||
| When a same-document reference is dereferenced for the purpose of a | When a same-document reference is dereferenced for the purpose of a | |||
| retrieval action, the target of that reference is defined to be | retrieval action, the target of that reference is defined to be | |||
| within that current document or message; the dereference should not | within that current document or message; the dereference should not | |||
| result in a new retrieval. | result in a new retrieval. | |||
| 4.5 Suffix Reference | 4.5 Suffix Reference | |||
| The URI syntax is designed for unambiguous reference to resources and | The URI syntax is designed for unambiguous reference to resources and | |||
| skipping to change at page 24, line 11 ¶ | skipping to change at page 26, line 11 ¶ | |||
| intended for human interpretation rather than machine, with the | intended for human interpretation rather than machine, with the | |||
| assumption that context-based heuristics are sufficient to complete | assumption that context-based heuristics are sufficient to complete | |||
| the URI (e.g., most hostnames beginning with "www" are likely to have | the URI (e.g., most hostnames beginning with "www" are likely to have | |||
| a URI prefix of "http://"). Although there is no standard set of | a URI prefix of "http://"). Although there is no standard set of | |||
| heuristics for disambiguating a URI suffix, many client | heuristics for disambiguating a URI suffix, many client | |||
| implementations allow them to be entered by the user and | implementations allow them to be entered by the user and | |||
| heuristically resolved. It should be noted that such heuristics may | heuristically resolved. It should be noted that such heuristics may | |||
| change over time, particularly when new URI schemes are introduced. | change over time, particularly when new URI schemes are introduced. | |||
| Since a URI suffix has the same syntax as a relative path reference, | Since a URI suffix has the same syntax as a relative path reference, | |||
| a suffix reference cannot be used in contexts where relative URIs are | a suffix reference cannot be used in contexts where a relative | |||
| expected. This limits use of suffix references to those places where | reference is expected. As a result, suffix references are limited to | |||
| there is no defined base URI, such as dialog boxes and off-line | those places where there is no defined base URI, such as dialog boxes | |||
| advertisements. | and off-line advertisements. | |||
| 5. Relative Resolution | ||||
| It is often the case that a group or "tree" of documents has been | 5. Reference Resolution | |||
| constructed to serve a common purpose; the vast majority of URIs in | ||||
| these documents point to resources within the tree rather than | ||||
| outside of it. Similarly, documents located at a particular site are | ||||
| much more likely to refer to other resources at that site than to | ||||
| resources at remote sites. | ||||
| Relative referencing of URIs allows document trees to be partially | This section defines the process of resolving a URI reference within | |||
| independent of their location and access scheme. For instance, it is | a context that allows relative references, such that the result is a | |||
| possible for a single set of hypertext documents to be simultaneously | string matching the "URI" syntax production of Section 3. | |||
| accessible and traversable via each of the "file", "http", and "ftp" | ||||
| schemes if the documents refer to each other using relative URIs. | ||||
| Furthermore, such document trees can be moved, as a whole, without | ||||
| changing any of the relative references. Experience within the WWW | ||||
| has demonstrated that the ability to perform relative referencing is | ||||
| necessary for the long-term usability of embedded URIs. | ||||
| 5.1 Establishing a Base URI | 5.1 Establishing a Base URI | |||
| The term "relative URI" implies that there exists some absolute "base | The term "relative" implies that there exists some "base URI" against | |||
| URI" against which the relative reference is applied. Indeed, the | which the relative reference is applied. Aside from same-document | |||
| base URI is necessary to define the semantics of any relative URI | references (Section 4.4), relative references are only usable if the | |||
| reference; without it, a relative reference is meaningless. In order | base URI is known. The base URI must be established by the parser | |||
| for relative URI to be usable within a document, the base URI of that | prior to parsing URI references that might be relative. | |||
| document must be known to the parser. | ||||
| A document that contains relative references must have a base URI | ||||
| that contains a hierarchical path component. In other words, a | ||||
| relative-URI cannot be used within a document that has an unsuitable | ||||
| base URI. Some URI schemes do not allow a hierarchical path component | ||||
| and are thus restricted to full URI references. | ||||
| An authority component is not required for a URI scheme to make use | ||||
| of relative references. A base URI without an authority component | ||||
| implies that any relative reference will also be without an authority | ||||
| component. | ||||
| The base URI of a document can be established in one of four ways, | The base URI of a document can be established in one of four ways, | |||
| listed below in order of precedence. The order of precedence can be | listed below in order of precedence. The order of precedence can be | |||
| thought of in terms of layers, where the innermost defined base URI | thought of in terms of layers, where the innermost defined base URI | |||
| has the highest precedence. This can be visualized graphically as: | has the highest precedence. This can be visualized graphically as: | |||
| .----------------------------------------------------------. | .----------------------------------------------------------. | |||
| | .----------------------------------------------------. | | | .----------------------------------------------------. | | |||
| | | .----------------------------------------------. | | | | | .----------------------------------------------. | | | |||
| | | | .----------------------------------------. | | | | | | | .----------------------------------------. | | | | |||
| skipping to change at page 26, line 30 ¶ | skipping to change at page 28, line 6 ¶ | |||
| Within certain document media types, the base URI of the document can | Within certain document media types, the base URI of the document can | |||
| be embedded within the content itself such that it can be readily | be embedded within the content itself such that it can be readily | |||
| obtained by a parser. This can be useful for descriptive documents, | obtained by a parser. This can be useful for descriptive documents, | |||
| such as tables of content, which may be transmitted to others through | such as tables of content, which may be transmitted to others through | |||
| protocols other than their usual retrieval context (e.g., E-Mail or | protocols other than their usual retrieval context (e.g., E-Mail or | |||
| USENET news). | USENET news). | |||
| It is beyond the scope of this document to specify how, for each | It is beyond the scope of this document to specify how, for each | |||
| media type, the base URI can be embedded. It is assumed that user | media type, the base URI can be embedded. It is assumed that user | |||
| agents manipulating such media types will be able to obtain the | agents manipulating such media types will be able to obtain the | |||
| appropriate syntax from that media type's specification. An example | appropriate syntax from that media type's specification. | |||
| of how the base URI can be embedded in the Hypertext Markup Language | ||||
| (HTML) [HTML] is provided in Appendix D. | ||||
| A mechanism for embedding the base URI within MIME container types | A mechanism for embedding the base URI within MIME container types | |||
| (e.g., the message and multipart types) is defined by MHTML | (e.g., the message and multipart types) is defined by MHTML | |||
| [RFC2110]. Protocols that do not use the MIME message header syntax, | [RFC2110]. Protocols that do not use the MIME message header syntax, | |||
| but do allow some form of tagged metadata to be included within | but do allow some form of tagged metadata to be included within | |||
| messages, may define their own syntax for defining the base URI as | messages, may define their own syntax for defining the base URI as | |||
| part of a message. | part of a message. | |||
| 5.1.2 Base URI from the Encapsulating Entity | 5.1.2 Base URI from the Encapsulating Entity | |||
| skipping to change at page 27, line 23 ¶ | skipping to change at page 28, line 42 ¶ | |||
| 5.1.4 Default Base URI | 5.1.4 Default Base URI | |||
| If none of the conditions described above apply, then the base URI | If none of the conditions described above apply, then the base URI | |||
| is defined by the context of the application. Since this definition | is defined by the context of the application. Since this definition | |||
| is necessarily application-dependent, failing to define the base URI | is necessarily application-dependent, failing to define the base URI | |||
| using one of the other methods may result in the same content being | using one of the other methods may result in the same content being | |||
| interpreted differently by different types of application. | interpreted differently by different types of application. | |||
| It is the responsibility of the distributor(s) of a document | It is the responsibility of the distributor(s) of a document | |||
| containing a relative URI to ensure that the base URI for that | containing a relative reference to ensure that the base URI for that | |||
| document can be established. It must be emphasized that a relative | document can be established. It must be emphasized that a relative | |||
| URI cannot be used reliably in situations where the document's base | reference, aside from a same-document reference, cannot be used | |||
| URI is not well-defined. | reliably in situations where the document's base URI is not | |||
| well-defined. | ||||
| 5.2 Obtaining the Referenced URI | 5.2 Obtaining the Referenced URI | |||
| This section describes an example algorithm for resolving URI | This section describes an example algorithm for resolving URI | |||
| references that might be relative to a given base URI. The algorithm | references that might be relative to a given base URI. The algorithm | |||
| is intended to provide a definitive result that can be used to test | is intended to provide a definitive result that can be used to test | |||
| the output of other implementations. Implementation of the algorithm | the output of other implementations. Implementation of the algorithm | |||
| itself is not required, but the result given by an implementation | itself is not required, but the result given by an implementation | |||
| must match the result that would be given by this algorithm. | must match the result that would be given by this algorithm. | |||
| The base URI (Base) is established according to the rules of Section | The base URI (Base) is established according to the rules of Section | |||
| 5.1 and parsed into the five main components described in Section 3. | 5.1 and parsed into the five main components described in Section 3. | |||
| Note that only the scheme component is required to be present in the | Note that only the scheme component is required to be present in the | |||
| base URI; the other components may be empty or undefined. A | base URI; the other components may be empty or undefined. A | |||
| component is undefined if its preceding separator does not appear in | component is undefined if its preceding separator does not appear in | |||
| the URI reference; the path component is never undefined, though it | the URI reference; the path component is never undefined, though it | |||
| may be empty. | may be empty. The algorithm assumes that the base URI is well-formed | |||
| and does not contain dot-segments in its path. | ||||
| For each URI reference (R), the following pseudocode describes an | For each URI reference (R), the following pseudocode describes an | |||
| algorithm for transforming R into its target URI (T): | algorithm for transforming R into its target URI (T): | |||
| -- The URI reference is parsed into the five URI components | ||||
| -- | ||||
| (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R); | (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R); | |||
| -- The URI reference is parsed into the five URI components | ||||
| if ((not validating) and (R.scheme == Base.scheme)) then | -- A non-strict parser may ignore a scheme in the reference | |||
| -- A non-validating parser may ignore a scheme in the | -- if it is identical to the base URI's scheme. | |||
| -- reference if it is identical to the base URI's scheme. | -- | |||
| if ((not strict) and (R.scheme == Base.scheme)) then | ||||
| undefine(R.scheme); | undefine(R.scheme); | |||
| endif; | endif; | |||
| if defined(R.scheme) then | if defined(R.scheme) then | |||
| T.scheme = R.scheme; | T.scheme = R.scheme; | |||
| T.authority = R.authority; | T.authority = R.authority; | |||
| T.path = R.path; | T.path = remove_dot_segments(R.path); | |||
| T.query = R.query; | T.query = R.query; | |||
| else | else | |||
| if defined(R.authority) then | if defined(R.authority) then | |||
| T.authority = R.authority; | T.authority = R.authority; | |||
| T.path = R.path; | T.path = remove_dot_segments(R.path); | |||
| T.query = R.query; | T.query = R.query; | |||
| else | else | |||
| if (R.path == "") then | if (R.path == "") then | |||
| T.path = Base.path; | T.path = Base.path; | |||
| if defined(R.query) then | if defined(R.query) then | |||
| T.query = R.query; | T.query = R.query; | |||
| else | else | |||
| T.query = Base.query; | T.query = Base.query; | |||
| endif; | endif; | |||
| else | else | |||
| if (R.path starts-with "/") then | if (R.path starts-with "/") then | |||
| T.path = R.path; | T.path = remove_dot_segments(R.path); | |||
| else | else | |||
| T.path = merge(Base.path, R.path); | T.path = merge(Base.path, R.path); | |||
| T.path = remove_dot_segments(T.path); | ||||
| endif; | endif; | |||
| T.query = R.query; | T.query = R.query; | |||
| endif; | endif; | |||
| T.authority = Base.authority; | T.authority = Base.authority; | |||
| endif; | endif; | |||
| T.scheme = Base.scheme; | T.scheme = Base.scheme; | |||
| endif; | endif; | |||
| T.fragment = R.fragment; | T.fragment = R.fragment; | |||
| The pseudocode above refers to a merge routine for merging a | The pseudocode above refers to a merge routine for merging a | |||
| relative-path reference with the path of the base URI to obtain the | relative-path reference with the path of the base URI. This is | |||
| target path. Although there are many ways to do this, we will | accomplished as follows: | |||
| describe a simple method using a separate string buffer: | ||||
| 1. All but the last segment of the base URI's path component is | o If the base URI's path is empty, then return a string consisting | |||
| copied to the buffer. In other words, any characters after the | of "/" concatenated with the reference's path component; | |||
| last (right-most) slash character, if any, are excluded. If the | otherwise, | |||
| base URI's path component is the empty string, then a single | ||||
| slash character ("/") is copied to the buffer. | ||||
| 2. The reference's path component is appended to the buffer string. | o If the base URI's path is non-hierarchical, as indicated by not | |||
| beginning with a slash, then return a string consisting of the | ||||
| reference's path component; otherwise, | ||||
| 3. All occurrences of "./", where "." is a complete path segment, | o Return a string consisting of the reference's path component | |||
| are removed from the buffer string. | appended to all but the last segment of the base URI's path (i.e., | |||
| any characters after the right-most "/" in the base URI path are | ||||
| excluded). | ||||
| 4. If the buffer string ends with "." as a complete path segment, | The pseudocode also refers to a remove_dot_segments routine for | |||
| that "." is removed. | interpreting and removing the special "." and ".." complete path | |||
| segments from a referenced path. This is done after the path is | ||||
| extracted from a reference, whether or not the path was relative, in | ||||
| order to remove any invalid or extraneous dot-segments prior to | ||||
| forming the target URI. Although there are many ways to accomplish | ||||
| this removal process, we describe a simple method using a separate | ||||
| string buffer: | ||||
| 5. All occurrences of "<segment>/../", where <segment> is a complete | 1. The buffer is initialized with the unprocessed path component. | |||
| path segment not equal to "..", are removed from the buffer | ||||
| string. Removal of these path segments is performed iteratively, | ||||
| removing the leftmost matching pattern on each iteration, until | ||||
| no matching pattern remains. | ||||
| 6. If the buffer string ends with "<segment>/..", where <segment> is | 2. If the buffer begins with "./" or "../", the "." or ".." segment | |||
| a complete path segment not equal to "..", that "<segment>/.." is | is removed. | |||
| removed. | ||||
| 7. If the resulting buffer string still begins with one or more | 3. All occurrences of "/./" in the buffer are replaced with "/". | |||
| complete path segments of "..", then the reference is considered | ||||
| to be in error. Implementations may handle this error by | ||||
| removing them from the resolved path (i.e., discarding relative | ||||
| levels above the root) or by avoiding traversal of the reference. | ||||
| 8. The remaining buffer string is the target URI's path component. | 4. If the buffer ends with "/.", the "." is removed. | |||
| Some systems may find it more efficient to implement the merge | 5. All occurrences of "/<segment>/../" in the buffer, where ".." and | |||
| algorithm as a pair of path segment stacks being merged, rather than | <segment> are complete path segments, are iteratively replaced | |||
| as a series of string pattern replacements. | with "/" in order from left to right until no matching pattern | |||
| remains. If the buffer ends with "/<segment>/..", that is also | ||||
| replaced with "/". Note that <segment> may be empty. | ||||
| Note: Some WWW client applications will fail to separate the | 6. All prefixes of "<segment>/../" in the buffer, where ".." and | |||
| reference's query component from its path component before merging | <segment> are complete path segments, are iteratively replaced | |||
| the base and reference paths. This may result in a loss of | with "/" in order from left to right until no matching pattern | |||
| information if the query component contains the strings "/../" or | remains. If the buffer ends with "<segment>/..", that is also | |||
| "/./". | replaced with "/". Note that <segment> may be empty. | |||
| 7. The remaining buffer is returned as the result of | ||||
| remove_dot_segments. | ||||
| Some systems may find it more efficient to implement the | ||||
| remove_dot_segments algorithm as a stack of path segments being | ||||
| compressed, rather than as a series of string pattern replacements. | ||||
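
The resolution pseudocode, the merge routine, and the stack-based alternative mentioned above can be sketched together in Python as follows. This is an illustration under stated assumptions rather than a transcription of the buffer steps: `remove_dot_segments` here compresses a stack of segments, it silently discards ".." segments that would climb above the root, and its handling of paths that do not begin with "/" is a choice of this sketch. The `Parts` tuple and all function signatures are hypothetical. `None` stands for an undefined component.

```python
from typing import NamedTuple, Optional


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def merge(base_path: str, ref_path: str) -> str:
    """Merge a relative-path reference with the base path."""
    if base_path == "":
        return "/" + ref_path
    if not base_path.startswith("/"):
        return ref_path  # non-hierarchical base path
    # Keep everything up to and including the right-most "/".
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """Compress "." and ".." segments using a stack of path segments."""
    absolute = path.startswith("/")
    segments = path.split("/")
    if absolute:
        segments = segments[1:]  # drop the empty string before the leading "/"
    stack = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                stack.append("")  # keep the trailing slash
        elif segment == "..":
            if stack:
                stack.pop()  # excess ".." is discarded (assumption)
            if is_last:
                stack.append("")
        else:
            stack.append(segment)
    return ("/" if absolute else "") + "/".join(stack)


def resolve(base: Parts, ref: Parts, strict: bool = True) -> Parts:
    """Transform reference components R into target components T."""
    scheme = ref.scheme
    if not strict and scheme == base.scheme:
        scheme = None  # non-strict parsers may ignore a matching scheme
    if scheme is not None:
        return Parts(scheme, ref.authority, remove_dot_segments(ref.path),
                     ref.query, ref.fragment)
    if ref.authority is not None:
        return Parts(base.scheme, ref.authority,
                     remove_dot_segments(ref.path), ref.query, ref.fragment)
    if ref.path == "":
        path = base.path
        query = ref.query if ref.query is not None else base.query
    else:
        if ref.path.startswith("/"):
            path = remove_dot_segments(ref.path)
        else:
            path = remove_dot_segments(merge(base.path, ref.path))
        query = ref.query
    return Parts(base.scheme, base.authority, path, query, ref.fragment)
```

With the base components `Parts("http", "a", "/b/c/d;p", "q", None)`, a reference path of `"./../g"` resolves to the target path `"/b/g"`, and `"../../../../g"` resolves to `"/g"`.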
| 5.3 Recomposition of a Parsed URI | 5.3 Recomposition of a Parsed URI | |||
| Parsed URI components can be recombined to obtain the referenced URI. | Parsed URI components can be recomposed to obtain the corresponding | |||
| Using pseudocode, this would be: | URI reference string. Using pseudocode, this would be: | |||
| result = "" | result = "" | |||
| if defined(T.scheme) then | if defined(scheme) then | |||
| append T.scheme to result; | append scheme to result; | |||
| append ":" to result; | append ":" to result; | |||
| endif; | endif; | |||
| if defined(T.authority) then | ||||
| if defined(authority) then | ||||
| append "//" to result; | append "//" to result; | |||
| append T.authority to result; | append authority to result; | |||
| endif; | endif; | |||
| append T.path to result; | append path to result; | |||
| if defined(T.query) then | if defined(query) then | |||
| append "?" to result; | append "?" to result; | |||
| append T.query to result; | append query to result; | |||
| endif; | endif; | |||
| if defined(fragment) then | if defined(fragment) then | |||
| append "#" to result; | append "#" to result; | |||
| append fragment to result; | append fragment to result; | |||
| endif; | endif; | |||
| return result; | return result; | |||
| Note that we are careful to preserve the distinction between a | Note that we are careful to preserve the distinction between a | |||
| component that is undefined, meaning that its separator was not | component that is undefined, meaning that its separator was not | |||
| present in the reference, and a component that is empty, meaning that | present in the reference, and a component that is empty, meaning that | |||
| the separator was present and was immediately followed by the next | the separator was present and was immediately followed by the next | |||
| component separator or the end of the reference. | component separator or the end of the reference. | |||
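
The recomposition pseudocode above translates almost line for line into Python. In this sketch the function name `recompose` and the use of `None` to represent an undefined component are assumptions made for illustration; an empty string represents a component that is present but empty.

```python
from typing import Optional


def recompose(scheme: Optional[str], authority: Optional[str], path: str,
              query: Optional[str], fragment: Optional[str]) -> str:
    """Rebuild a URI reference string from its five parsed components."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path  # the path is never undefined, though it may be empty
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

Because undefined and empty are kept distinct, `recompose(None, None, "g", "", None)` gives `"g?"`, whereas `recompose(None, None, "g", None, None)` gives `"g"`.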
| 5.4 Examples of Relative Resolution | 5.4 Reference Resolution Examples | |||
| Within an object with a well-defined base URI of | Within an object with a well-defined base URI of | |||
| http://a/b/c/d;p?q | http://a/b/c/d;p?q | |||
| a relative URI reference would be resolved as follows: | a relative URI reference would be resolved as follows: | |||
| 5.4.1 Normal Examples | 5.4.1 Normal Examples | |||
| "g:h" = "g:h" | "g:h" = "g:h" | |||
| skipping to change at page 31, line 32 ¶ | skipping to change at page 33, line 17 ¶ | |||
| "" = "http://a/b/c/d;p?q" | "" = "http://a/b/c/d;p?q" | |||
| Parsers must be careful in handling the case where there are more | Parsers must be careful in handling the case where there are more | |||
| relative path ".." segments than there are hierarchical levels in the | relative path ".." segments than there are hierarchical levels in the | |||
| base URI's path. Note that the ".." syntax cannot be used to change | base URI's path. Note that the ".." syntax cannot be used to change | |||
| the authority component of a URI. | the authority component of a URI. | |||
| "../../../g" = "http://a/g" | "../../../g" = "http://a/g" | |||
| "../../../../g" = "http://a/g" | "../../../../g" = "http://a/g" | |||
| Similarly, parsers should remove the dot-segments "." and ".." when | Similarly, parsers must remove the dot-segments "." and ".." when | |||
| they are complete components of a path, but not when they are only | they are complete components of a path, but not when they are only | |||
| part of a segment. | part of a segment. | |||
| "/./g" = "http://a/g" | "/./g" = "http://a/g" | |||
| "/../g" = "http://a/g" | "/../g" = "http://a/g" | |||
| "g." = "http://a/b/c/g." | "g." = "http://a/b/c/g." | |||
| ".g" = "http://a/b/c/.g" | ".g" = "http://a/b/c/.g" | |||
| "g.." = "http://a/b/c/g.." | "g.." = "http://a/b/c/g.." | |||
| "..g" = "http://a/b/c/..g" | "..g" = "http://a/b/c/..g" | |||
| skipping to change at page 32, line 8 ¶ | skipping to change at page 33, line 40 ¶ | |||
| "./../g" = "http://a/b/g" | "./../g" = "http://a/b/g" | |||
| "./g/." = "http://a/b/c/g/" | "./g/." = "http://a/b/c/g/" | |||
| "g/./h" = "http://a/b/c/g/h" | "g/./h" = "http://a/b/c/g/h" | |||
| "g/../h" = "http://a/b/c/h" | "g/../h" = "http://a/b/c/h" | |||
| "g;x=1/./y" = "http://a/b/c/g;x=1/y" | "g;x=1/./y" = "http://a/b/c/g;x=1/y" | |||
| "g;x=1/../y" = "http://a/b/c/y" | "g;x=1/../y" = "http://a/b/c/y" | |||
| Some applications fail to separate the reference's query and/or | Some applications fail to separate the reference's query and/or | |||
| fragment components from a relative path before merging it with the | fragment components from a relative path before merging it with the | |||
| base path. This error is rarely noticed, since typical usage of a | base path and removing dot-segments. This error is rarely noticed, | |||
| fragment never includes the hierarchy ("/") character, and the query | since typical usage of a fragment never includes the hierarchy ("/") | |||
| component is not normally used within relative references. | character, and the query component is not normally used within | |||
| relative references. | ||||
| "g?y/./x" = "http://a/b/c/g?y/./x" | "g?y/./x" = "http://a/b/c/g?y/./x" | |||
| "g?y/../x" = "http://a/b/c/g?y/../x" | "g?y/../x" = "http://a/b/c/g?y/../x" | |||
| "g#s/./x" = "http://a/b/c/g#s/./x" | "g#s/./x" = "http://a/b/c/g#s/./x" | |||
| "g#s/../x" = "http://a/b/c/g#s/../x" | "g#s/../x" = "http://a/b/c/g#s/../x" | |||
| Some parsers allow the scheme name to be present in a relative URI if | Some parsers allow the scheme name to be present in a relative URI if | |||
| it is the same as the base URI scheme. This is considered to be a | it is the same as the base URI scheme. This is considered to be a | |||
| loophole in prior specifications of partial URI [RFC1630]. Its use | loophole in prior specifications of partial URI [RFC1630]. Its use | |||
| should be avoided, but is allowed for backward compatibility. | should be avoided, but is allowed for backward compatibility. | |||
| "http:g" = "http:g" ; for validating parsers | "http:g" = "http:g" ; for strict parsers | |||
| / "http://a/b/c/g" ; for backward compatibility | / "http://a/b/c/g" ; for backward compatibility | |||
| 6. Normalization and Comparison | 6. Normalization and Comparison | |||
| One of the most common operations on URIs is simple comparison: | One of the most common operations on URIs is simple comparison: | |||
| determining if two URIs are equivalent without using the URIs to | determining if two URIs are equivalent without using the URIs to | |||
| access their respective resource(s). A comparison is performed every | access their respective resource(s). A comparison is performed every | |||
| time a response cache is accessed, a browser checks its history to | time a response cache is accessed, a browser checks its history to | |||
| color a link, or an XML parser processes tags within a namespace. | color a link, or an XML parser processes tags within a namespace. | |||
| Extensive normalization prior to comparison of URIs is often used by | Extensive normalization prior to comparison of URIs is often used by | |||
| skipping to change at page 35, line 19 ¶ | skipping to change at page 37, line 19 ¶ | |||
| processing is moderately higher in cost than character-for-character | processing is moderately higher in cost than character-for-character | |||
| string comparison. For example, an application using this approach | string comparison. For example, an application using this approach | |||
| could reasonably consider the following two URIs equivalent: | could reasonably consider the following two URIs equivalent: | |||
| example://a/b/c/%7A | example://a/b/c/%7A | |||
| eXAMPLE://a/./b/../b/c/%7a | eXAMPLE://a/./b/../b/c/%7a | |||
| Web user agents, such as browsers, typically apply this type of URI | Web user agents, such as browsers, typically apply this type of URI | |||
| normalization when determining whether a cached response is | normalization when determining whether a cached response is | |||
| available. Syntax-based normalization includes such techniques as | available. Syntax-based normalization includes such techniques as | |||
| case normalization, escape normalization, and removal of leftover | case normalization, escape normalization, and removal of | |||
| relative path segments. | dot-segments. | |||
| 6.2.2.1 Case Normalization | 6.2.2.1 Case Normalization | |||
| When a URI scheme uses components of the generic syntax, it will also | When a URI scheme uses components of the generic syntax, it will also | |||
| use the common syntax equivalence rules, namely that the scheme and | use the common syntax equivalence rules, namely that the scheme and | |||
| hostname are case insensitive and therefore can be normalized to | hostname are case insensitive and therefore can be normalized to | |||
| lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is | lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is | |||
| equivalent to <http://www.example.com/>. | equivalent to <http://www.example.com/>. | |||
| 6.2.2.2 Escape Normalization | 6.2.2.2 Escape Normalization | |||
| skipping to change at page 35, line 52 ¶ | skipping to change at page 37, line 52 ¶ | |||
| generators go beyond that and escape characters that do not require | generators go beyond that and escape characters that do not require | |||
| escaping, resulting in URIs that are equivalent to their unescaped | escaping, resulting in URIs that are equivalent to their unescaped | |||
| counterparts. Such URIs can be normalized by unescaping sequences | counterparts. Such URIs can be normalized by unescaping sequences | |||
| that represent the unreserved characters, as described in Section | that represent the unreserved characters, as described in Section | |||
| 2.3. | 2.3. | |||
| 6.2.2.3 Path Segment Normalization | 6.2.2.3 Path Segment Normalization | |||
| The complete path segments "." and ".." have a special meaning within | The complete path segments "." and ".." have a special meaning within | |||
| hierarchical URI schemes. As such, they should not appear in | hierarchical URI schemes. As such, they should not appear in | |||
| absolute URI paths; if they are found, they can be removed by | absolute paths; if they are found, they can be removed by applying | |||
| splitting the URI just after the "/" that starts the path, using the | the remove_dot_segments algorithm to the path, as described in | |||
| left half as the base URI and the right as a relative reference, and | Section 5.2. | |||
| normalizing the URI by merging the two in accordance with the | ||||
| relative URI processing algorithm (Section 5). | ||||
| 6.2.3 Scheme-based Normalization | 6.2.3 Scheme-based Normalization | |||
| The syntax and semantics of URIs vary from scheme to scheme, as | The syntax and semantics of URIs vary from scheme to scheme, as | |||
| described by the defining specification for each scheme. Software | described by the defining specification for each scheme. Software | |||
| may use scheme-specific rules, at further processing cost, to reduce | may use scheme-specific rules, at further processing cost, to reduce | |||
| the probability of false negatives. For example, Web spiders that | the probability of false negatives. For example, Web spiders that | |||
| populate most large search engines would consider the following two | populate most large search engines would consider the following two | |||
| URIs to be equivalent: | URIs to be equivalent: | |||
| skipping to change at page 36, line 38 ¶ | skipping to change at page 38, line 36 ¶ | |||
| 6.2.4 Protocol-based Normalization | 6.2.4 Protocol-based Normalization | |||
| Web spiders, for which substantial effort to reduce the incidence of | Web spiders, for which substantial effort to reduce the incidence of | |||
| false negatives is often cost-effective, are observed to implement | false negatives is often cost-effective, are observed to implement | |||
| even more aggressive techniques in URI comparison. For example, if | even more aggressive techniques in URI comparison. For example, if | |||
| they observe that a URI such as | they observe that a URI such as | |||
| http://example.com/data | http://example.com/data | |||
| redirects to | redirects to a URI differing only in the trailing slash | |||
| http://example.com/data/ | http://example.com/data/ | |||
| they will likely regard the two as equivalent in the future. | they will likely regard the two as equivalent in the future. | |||
| Obviously, this kind of technique is only appropriate in special | Obviously, this kind of technique is only appropriate in special | |||
| situations. | situations. | |||
| 6.3 Canonical Form | 6.3 Canonical Form | |||
| It is in the best interests of everyone to avoid false negatives in | It is in the best interests of everyone to avoid false negatives in | |||
| comparing URIs and to minimize the amount of software processing for | comparing URIs and to minimize the amount of software processing for | |||
| such comparisons. Those who generate and make reference to URIs can | such comparisons. Those who generate and make reference to URIs can | |||
| reduce the cost of processing and the risk of false negatives by | reduce the cost of processing and the risk of false negatives by | |||
| consistently providing them in a form that is reasonably canonical | consistently providing them in a form that is reasonably canonical | |||
| with respect to their scheme. Specifically: | with respect to their scheme. Specifically: | |||
| Always provide the URI scheme in lowercase characters. | o Always provide the URI scheme in lowercase characters. | |||
| Always provide the hostname, if any, in lowercase characters. | o Always provide the hostname, if any, in lowercase characters. | |||
| Only perform percent-escaping where it is essential. | o Only perform percent-escaping where it is essential. | |||
| Always use uppercase A-through-F characters when percent-escaping. | o Always use uppercase A-through-F characters when percent-escaping. | |||
| Prevent /./ and /../ from appearing in non-relative URI paths. | o Prevent /./ and /../ from appearing in non-relative URI paths. | |||
| The good practices listed above are motivated by observations that a | The good practices listed above are motivated by deployed software | |||
| high proportion of deployed software uses these techniques for the | that frequently uses these techniques for the purposes of | |||
| purposes of normalization. | normalization. | |||
| 7. Security Considerations | 7. Security Considerations | |||
| A URI does not in itself pose a security threat. However, since URIs | A URI does not in itself pose a security threat. However, since URIs | |||
| are often used to provide a compact set of instructions for access to | are often used to provide a compact set of instructions for access to | |||
| network resources, care must be taken to properly interpret the data | network resources, care must be taken to properly interpret the data | |||
| within a URI, to prevent that data from causing unintended access, | within a URI, to prevent that data from causing unintended access, | |||
| and to avoid including data that should not be revealed in plain | and to avoid including data that should not be revealed in plain | |||
| text. | text. | |||
| 7.1 Reliability and Consistency | 7.1 Reliability and Consistency | |||
| There is no guarantee that, having once used a given URI to retrieve | There is no guarantee that, having once used a given URI to retrieve | |||
| some information, that the same information will be retrievable by | some information, the same information will be retrievable by that | |||
| that URI in the future. Nor is there any guarantee that the | URI in the future. Nor is there any guarantee that the information | |||
| information retrievable via that URI in the future will be observably | retrievable via that URI in the future will be observably similar to | |||
| similar to that retrieved in the past. The URI syntax does not | that retrieved in the past. The URI syntax does not constrain how a | |||
| constrain how a given scheme or authority apportions its name space | given scheme or authority apportions its name space or maintains it | |||
| or maintains it over time. Such a guarantee can only be obtained | over time. Such a guarantee can only be obtained from the person(s) | |||
| from the person(s) controlling that name space and the resource in | controlling that name space and the resource in question. A specific | |||
| question. A specific URI scheme may define additional semantics, | URI scheme may define additional semantics, such as name persistence, | |||
| such as name persistence, if those semantics are required of all | if those semantics are required of all naming authorities for that | |||
| naming authorities for that scheme. | scheme. | |||
| 7.2 Malicious Construction | 7.2 Malicious Construction | |||
| It is sometimes possible to construct a URI such that an attempt to | It is sometimes possible to construct a URI such that an attempt to | |||
| perform a seemingly harmless, idempotent operation, such as the | perform a seemingly harmless, idempotent operation, such as the | |||
| retrieval of a representation, will in fact cause a possibly damaging | retrieval of a representation, will in fact cause a possibly damaging | |||
| remote operation to occur. The unsafe URI is typically constructed | remote operation to occur. The unsafe URI is typically constructed | |||
| by specifying a port number other than that reserved for the network | by specifying a port number other than that reserved for the network | |||
| protocol in question. The client unwittingly contacts a site that is | protocol in question. The client unwittingly contacts a site that is | |||
| running a different protocol service. The content of the URI | running a different protocol service. The content of the URI | |||
| skipping to change at page 41, line 7 ¶ | skipping to change at page 43, line 7 ¶ | |||
| preconceived notions about the meaning of a URI, rather than an | preconceived notions about the meaning of a URI, rather than an | |||
| attack on the software itself. User agents may be able to reduce the | attack on the software itself. User agents may be able to reduce the | |||
| impact of such attacks by visually distinguishing the various | impact of such attacks by visually distinguishing the various | |||
| components of the URI when rendered, such as by using a different | components of the URI when rendered, such as by using a different | |||
| color or tone to render userinfo if any is present, though there is | color or tone to render userinfo if any is present, though there is | |||
| no general panacea. More information on URI-based semantic attacks | no general panacea. More information on URI-based semantic attacks | |||
| can be found in [Siedzik]. | can be found in [Siedzik]. | |||
| 8. Acknowledgments | 8. Acknowledgments | |||
| This document is derived from RFC 2396 [RFC2396], RFC 1808 [RFC1808], | This specification is derived from RFC 2396 [RFC2396], RFC 1808 | |||
| and RFC 1738 [RFC1738]; the acknowledgments in those specifications | [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those | |||
| still apply. It also incorporates the update (with corrections) for | documents still apply. It also incorporates the update (with | |||
| IPv6 literals in the host syntax, as defined by Robert M. Hinden, | corrections) for IPv6 literals in the host syntax, as defined by | |||
| Brian E. Carpenter, and Larry Masinter in [RFC2732]. In addition, | Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in | |||
| contributions by Reese Anschultz, Tim Bray, Rob Cameron, Dan | [RFC2732]. In addition, contributions by Reese Anschultz, Tim Bray, | |||
| Connolly, Adam M. Costello, Jason Diamond, Martin Duerst, Stefan | Rob Cameron, Dan Connolly, Adam M. Costello, John Cowan, Jason | |||
| Eissing, Clive D.W. Feather, Pat Hayes, Henry Holtzman, Graham Klyne, | Diamond, Martin Duerst, Stefan Eissing, Clive D.W. Feather, Pat | |||
| Dan Kohn, Bruce Lilly, Andrew Main, Michael Mealling, Julian Reschke, | Hayes, Henry Holtzman, Graham Klyne, Dan Kohn, Bruce Lilly, Andrew | |||
| Tomas Rokicki, Miles Sabin, Ronald Tschalaer, Marc Warne, Stuart | Main, Michael Mealling, Julian Reschke, Tomas Rokicki, Miles Sabin, | |||
| Williams, and Henry Zongaro are gratefully acknowledged. | Ronald Tschalaer, Marc Warne, Stuart Williams, and Henry Zongaro are | |||
| gratefully acknowledged. | ||||
| Normative References | Normative References | |||
| [ASCII] American National Standards Institute, "Coded Character | [ASCII] American National Standards Institute, "Coded Character | |||
| Set -- 7-bit American Standard Code for Information | Set -- 7-bit American Standard Code for Information | |||
| Interchange", ANSI X3.4, 1986. | Interchange", ANSI X3.4, 1986. | |||
| [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
| Specifications: ABNF", RFC 2234, November 1997. | Specifications: ABNF", RFC 2234, November 1997. | |||
| skipping to change at page 44, line 15 ¶ | skipping to change at page 46, line 15 ¶ | |||
| [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
| STD 13, RFC 1034, November 1987. | STD 13, RFC 1034, November 1987. | |||
| [RFC2110] Palme, J. and A. Hopmann, "MIME E-mail Encapsulation of | [RFC2110] Palme, J. and A. Hopmann, "MIME E-mail Encapsulation of | |||
| Aggregate Documents, such as HTML (MHTML)", RFC 2110, | Aggregate Documents, such as HTML (MHTML)", RFC 2110, | |||
| March 1997. | March 1997. | |||
| [RFC2717] Petke, R. and I. King, "Registration Procedures for URL | [RFC2717] Petke, R. and I. King, "Registration Procedures for URL | |||
| Scheme Names", BCP 35, RFC 2717, November 1999. | Scheme Names", BCP 35, RFC 2717, November 1999. | |||
| [HTML] Raggett, D., Le Hors, A. and I. Jacobs, "Hypertext Markup | ||||
| Language (HTML 4.01) Specification", December 1999. | ||||
| [Siedzik] Siedzik, R., "Semantic Attacks: What's in a URL?", April | [Siedzik] Siedzik, R., "Semantic Attacks: What's in a URL?", April | |||
| 2001. | 2001. | |||
| [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO | [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO | |||
| 10646", RFC 2279, January 1998. | 10646", RFC 2279, January 1998. | |||
| Authors' Addresses | Authors' Addresses | |||
| Tim Berners-Lee | Tim Berners-Lee | |||
| World Wide Web Consortium | World Wide Web Consortium | |||
| skipping to change at page 46, line 7 ¶ | skipping to change at page 48, line 7 ¶ | |||
| 345 Park Ave | 345 Park Ave | |||
| San Jose, CA 95110 | San Jose, CA 95110 | |||
| USA | USA | |||
| Phone: +1-408-536-3024 | Phone: +1-408-536-3024 | |||
| EMail: LMM@acm.org | EMail: LMM@acm.org | |||
| URI: http://larry.masinter.net/ | URI: http://larry.masinter.net/ | |||
| Appendix A. Collected ABNF for URI | Appendix A. Collected ABNF for URI | |||
| To be filled-in later. | abs-path = "/" path-segments | |||
| absolute-URI = scheme ":" hier-part [ "?" query ] | ||||
| alphanum = ALPHA / DIGIT | ||||
| authority = [ userinfo "@" ] host [ ":" port ] | ||||
| dec-octet = DIGIT ; 0-9 | ||||
| / %x31-39 DIGIT ; 10-99 | ||||
| / "1" 2DIGIT ; 100-199 | ||||
| / "2" %x30-34 DIGIT ; 200-249 | ||||
| / "25" %x30-35 ; 250-255 | ||||
| domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ] | ||||
| escaped = "%" HEXDIG HEXDIG | ||||
| fragment = *( pchar / "/" / "?" ) | ||||
| h4 = 1*4HEXDIG | ||||
| hier-part = net-path / abs-path / rel-path | ||||
| host = [ IPv6reference / IPv4address / hostname ] | ||||
| hostname = domainlabel qualified | ||||
| IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet | ||||
| IPv6address = 6( h4 ":" ) ls32 | ||||
| / "::" 5( h4 ":" ) ls32 | ||||
| / [ h4 ] "::" 4( h4 ":" ) ls32 | ||||
| / [ *1( h4 ":" ) h4 ] "::" 3( h4 ":" ) ls32 | ||||
| / [ *2( h4 ":" ) h4 ] "::" 2( h4 ":" ) ls32 | ||||
| / [ *3( h4 ":" ) h4 ] "::" h4 ":" ls32 | ||||
| / [ *4( h4 ":" ) h4 ] "::" ls32 | ||||
| / [ *5( h4 ":" ) h4 ] "::" h4 | ||||
| / [ *6( h4 ":" ) h4 ] "::" | ||||
| IPv6reference = "[" IPv6address "]" | ||||
| ls32 = ( h4 ":" h4 ) / IPv4address | ||||
| mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")" | ||||
| net-path = "//" authority [ abs-path ] | ||||
| path-segments = segment *( "/" segment ) | ||||
| pchar = unreserved / escaped / ";" / | ||||
| ":" / "@" / "&" / "=" / "+" / "$" / "," | ||||
| port = *DIGIT | ||||
| qualified = *( "." domainlabel ) [ "." ] | ||||
| query = *( pchar / "/" / "?" ) | ||||
| rel-path = path-segments | ||||
| relative-URI = hier-part [ "?" query ] [ "#" fragment ] | ||||
| reserved = "/" / "?" / "#" / "[" / "]" / ";" / | ||||
| ":" / "@" / "&" / "=" / "+" / "$" / "," | ||||
| scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | ||||
| segment = *pchar | ||||
| unreserved = ALPHA / DIGIT / mark | ||||
| URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | ||||
| URI-reference = URI / relative-URI | ||||
| uric = reserved / unreserved / escaped | ||||
| userinfo = *( unreserved / escaped / ";" / | ||||
| ":" / "&" / "=" / "+" / "$" / "," ) | ||||
| Appendix B. Parsing a URI Reference with a Regular Expression | Appendix B. Parsing a URI Reference with a Regular Expression | |||
| Since the "first-match-wins" algorithm is identical to the "greedy" | Since the "first-match-wins" algorithm is identical to the "greedy" | |||
| disambiguation method used by POSIX regular expressions, it is | disambiguation method used by POSIX regular expressions, it is | |||
| natural and commonplace to use a regular expression for parsing the | natural and commonplace to use a regular expression for parsing the | |||
| potential five components of a URI reference. | potential five components of a URI reference. | |||
| The following line is the regular expression for breaking down a | The following line is the regular expression for breaking down a | |||
| well-formed URI reference into its components. | well-formed URI reference into its components. | |||
| skipping to change at page 48, line 5 ¶ | skipping to change at page 51, line 5 ¶ | |||
| scheme = $2 | scheme = $2 | |||
| authority = $4 | authority = $4 | |||
| path = $5 | path = $5 | |||
| query = $7 | query = $7 | |||
| fragment = $9 | fragment = $9 | |||
| and, going in the opposite direction, we can recreate a URI reference | and, going in the opposite direction, we can recreate a URI reference | |||
| from its components using the algorithm of Section 5.3. | from its components using the algorithm of Section 5.3. | |||
| Appendix C. Embedding the Base URI in HTML documents | Appendix C. Delimiting a URI in Context | |||
| It is useful to consider an example of how the base URI of a document | ||||
| can be embedded within the document's content. In this appendix, we | ||||
| describe how documents written in the Hypertext Markup Language | ||||
| (HTML) [HTML] can include an embedded base URI. This appendix does | ||||
| not form a part of the URI specification and should not be considered | ||||
| as anything more than a descriptive example. | ||||
| HTML defines a special element "BASE" which, when present in the | ||||
| "HEAD" portion of a document, signals that the parser should use the | ||||
| BASE element's "HREF" attribute as the base URI for resolving any | ||||
| relative URI. The "HREF" attribute must be an absolute URI. Note | ||||
| that, in HTML, element and attribute names are case-insensitive. For | ||||
| example: | ||||
| <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"> | ||||
| <HTML><HEAD> | ||||
| <TITLE>An example HTML document</TITLE> | ||||
| <BASE href="http://www.example.com/Test/a/b/c"> | ||||
| </HEAD><BODY> | ||||
| ... <A href="../x">a hypertext anchor</A> ... | ||||
| </BODY></HTML> | ||||
| A parser reading the example document should interpret the given | ||||
| relative URI "../x" as representing the absolute URI | ||||
| <http://www.example.com/Test/a/x> | ||||
| regardless of the context in which the example document was obtained. | ||||
| Appendix D. Delimiting a URI in Context | ||||
| URIs are often transmitted through formats that do not provide a | URIs are often transmitted through formats that do not provide a | |||
| clear context for their interpretation. For example, there are many | clear context for their interpretation. For example, there are many | |||
| occasions when a URI is included in plain text; examples include text | occasions when a URI is included in plain text; examples include text | |||
| sent in electronic mail, USENET news messages, and, most importantly, | sent in electronic mail, USENET news messages, and, most importantly, | |||
| printed on paper. In such cases, it is important to be able to | printed on paper. In such cases, it is important to be able to | |||
| delimit the URI from the rest of the text, and in particular from | delimit the URI from the rest of the text, and in particular from | |||
| punctuation marks that might be mistaken for part of the URI. | punctuation marks that might be mistaken for part of the URI. | |||
| In practice, URIs are delimited in a variety of ways, but usually | In practice, URIs are delimited in a variety of ways, but usually | |||
| skipping to change at page 50, line 4 ¶ | skipping to change at page 52, line 9 ¶ | |||
| designators, though it is not commonly used in practice and is no | designators, though it is not commonly used in practice and is no | |||
| longer recommended. | longer recommended. | |||
| For robustness, software that accepts user-typed URIs should attempt | For robustness, software that accepts user-typed URIs should attempt | |||
| to recognize and strip both delimiters and embedded whitespace. | to recognize and strip both delimiters and embedded whitespace. | |||
| For example, the text: | For example, the text: | |||
| Yes, Jim, I found it under "http://www.w3.org/Addressing/", | Yes, Jim, I found it under "http://www.w3.org/Addressing/", | |||
| but you can probably pick it up from <ftp://ds.internic. | but you can probably pick it up from <ftp://ds.internic. | |||
| net/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/ | net/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/ | |||
| ietf/uri/historical.html#WARNING>. | ietf/uri/historical.html#WARNING>. | |||
| contains the URI references | contains the URI references | |||
| http://www.w3.org/Addressing/ | http://www.w3.org/Addressing/ | |||
| ftp://ds.internic.net/rfc/ | ftp://ds.internic.net/rfc/ | |||
| http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING | http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING | |||
| Appendix E. Summary of Non-editorial Changes | Appendix D. Summary of Non-editorial Changes | |||
| E.1 Additions | D.1 Additions | |||
| IPv6 literals have been added to the list of possible identifiers for | IPv6 literals have been added to the list of possible identifiers for | |||
| the host portion of an authority component, as described by [RFC2732], | the host portion of an authority component, as described by [RFC2732], | |||
| with the addition of "[" and "]" to the reserved and uric sets. | with the addition of "[" and "]" to the reserved and uric sets. | |||
| Square brackets are now specified as reserved within the authority | Square brackets are now specified as reserved within the authority | |||
| component and not allowed outside their use as delimiters for an | component and not allowed outside their use as delimiters for an | |||
| IPv6reference within host. In order to make this change without | IPv6reference within host. In order to make this change without | |||
| changing the technical definition of the path, query, and fragment | changing the technical definition of the path, query, and fragment | |||
| components, those rules were redefined to directly specify the | components, those rules were redefined to directly specify the | |||
| characters allowed rather than be defined in terms of uric. | characters allowed rather than be defined in terms of uric. | |||
| skipping to change at page 51, line 36 ¶ | skipping to change at page 53, line 36 ¶ | |||
| partially-qualified domain names. | partially-qualified domain names. | |||
| Section 6 on URI normalization and comparison has been | Section 6 on URI normalization and comparison has been | |||
| completely rewritten and extended using input from Tim Bray and | completely rewritten and extended using input from Tim Bray and | |||
| discussion within the W3C Technical Architecture Group. Likewise, | discussion within the W3C Technical Architecture Group. Likewise, | |||
| Section 2.1 on the encoding of characters has been replaced. | Section 2.1 on the encoding of characters has been replaced. | |||
| An ABNF production for URI has been introduced to correspond to the | An ABNF production for URI has been introduced to correspond to the | |||
| common usage of the term: an absolute URI with optional fragment. | common usage of the term: an absolute URI with optional fragment. | |||
| E.2 Modifications from RFC 2396 | D.2 Modifications from RFC 2396 | |||
| The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234]. | The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234]. | |||
| This change required all rule names that formerly included underscore | This change required all rule names that formerly included underscore | |||
| characters to be renamed with a dash instead. | characters to be renamed with a dash instead. | |||
| Section 2.2 on reserved characters has been rewritten to clearly | Section 2.2 on reserved characters has been rewritten to clearly | |||
| explain what characters are reserved, when they are reserved, and why | explain what characters are reserved, when they are reserved, and why | |||
| they are reserved even when not used as delimiters by the generic | they are reserved even when not used as delimiters by the generic | |||
| syntax. Likewise, the section on escaped characters has been | syntax. Likewise, the section on escaped characters has been | |||
| rewritten, and URI normalizers are now given license to unescape any | rewritten, and URI normalizers are now given license to unescape any | |||
| octets corresponding to unreserved characters. The crosshatch ("#") | octets corresponding to unreserved characters. The number-sign ("#") | |||
| character has been moved back from the excluded delims to the | character has been moved back from the excluded delims to the | |||
| reserved set. | reserved set. | |||
| The ABNF for URI and URI-reference has been redesigned to make them | The ABNF for URI and URI-reference has been redesigned to make them | |||
| more friendly to LALR parsers and significantly reduce complexity. As | more friendly to LALR parsers and significantly reduce complexity. As | |||
| a result, the layout form of syntax description has been removed, | a result, the layout form of syntax description has been removed, | |||
| along with the uric-no-slash, opaque-part, and rel-segment | along with the uric-no-slash, opaque-part, and rel-segment | |||
| productions. All references to "opaque" URIs have been replaced with | productions. All references to "opaque" URIs have been replaced with | |||
| a better description of how the path component may be opaque to | a better description of how the path component may be opaque to | |||
| hierarchy. The fragment identifier has been moved back into the | hierarchy. The fragment identifier has been moved back into the | |||
| skipping to change at page 52, line 23 ¶ | skipping to change at page 54, line 23 ¶ | |||
| explained and disambiguated in the section defining relative-URI. | explained and disambiguated in the section defining relative-URI. | |||
| The ABNF of hier-part and relative-URI has been corrected to allow a | The ABNF of hier-part and relative-URI has been corrected to allow a | |||
| relative URI path to be empty. This also allows an absolute-URI to | relative URI path to be empty. This also allows an absolute-URI to | |||
| consist of nothing after the "scheme:", as is present in practice | consist of nothing after the "scheme:", as is present in practice | |||
| with the "DAV:" namespace [RFC2518] and the "about:" URI used by many | with the "DAV:" namespace [RFC2518] and the "about:" URI used by many | |||
| browser implementations. The ambiguity regarding the parsing of | browser implementations. The ambiguity regarding the parsing of | |||
| net-path, abs-path, and rel-path is now explained and disambiguated | net-path, abs-path, and rel-path is now explained and disambiguated | |||
| in the same section. | in the same section. | |||
| Registry-based naming authorities that use the hierarchical authority | Registry-based naming authorities that use the generic syntax | |||
| syntax component are now limited to DNS hostnames, since those have | authority component are now limited to DNS hostnames, since those | |||
| been the only such URIs in deployment. This change was necessary to | have been the only such URIs in deployment. This change was | |||
| enable internationalized domain names to be processed in their native | necessary to enable internationalized domain names to be processed in | |||
| character encodings at the application layers above URI processing. | their native character encodings at the application layers above URI | |||
| The reg_name, server, and hostport productions have been removed to | processing. The reg_name, server, and hostport productions have been | |||
| simplify parsing of the URI syntax. | removed to simplify parsing of the URI syntax. | |||
| The ABNF of qualified has been simplified to remove a parsing | The ABNF of qualified has been simplified to remove a parsing | |||
| ambiguity without changing the allowed syntax. The toplabel | ambiguity without changing the allowed syntax. The toplabel | |||
| production has been removed because it served no useful purpose. The | production has been removed because it served no useful purpose. The | |||
| ambiguity regarding the parsing of host as IPv4address or hostname is | ambiguity regarding the parsing of host as IPv4address or hostname is | |||
| now explained and disambiguated in the same section. | now explained and disambiguated in the same section. | |||
| The resolving relative references algorithm of [RFC2396] has been | The resolving relative references algorithm of [RFC2396] has been | |||
| rewritten using pseudocode for this revision to improve clarity and | rewritten using pseudocode for this revision to improve clarity and | |||
| fix the following issues: | fix the following issues: | |||
| skipping to change at page 52, line 51 ¶ | skipping to change at page 54, line 51 ¶ | |||
| o [RFC2396] section 5.2, step 6a, failed to account for a base URI | o [RFC2396] section 5.2, step 6a, failed to account for a base URI | |||
| with no path. | with no path. | |||
| o Restored the behavior of [RFC1808] where, if the reference | o Restored the behavior of [RFC1808] where, if the reference | |||
| contains an empty path and a defined query component, then the | contains an empty path and a defined query component, then the | |||
| target URI inherits the base URI's path component. | target URI inherits the base URI's path component. | |||
| o Removed the special-case treatment of same-document references in | o Removed the special-case treatment of same-document references in | |||
| favor of a section that explains that a new retrieval action | favor of a section that explains that a new retrieval action | |||
| should not be made if the target URI and base URI, excluding | should not be made if the target URI and base URI, excluding | |||
| fragments, match. | fragments, match. This change has no impact on user agent | |||
| behavior aside from how the resolved reference might be described | ||||
| to the user. | ||||
| o Separated the path merge routine into two routines: merge, for | ||||
| describing combination of the base URI path with a relative-path | ||||
| reference, and remove_dot_segments, for describing how to remove | ||||
| the special "." and ".." segments from a composed path. The | ||||
| remove_dot_segments algorithm is now applied to all URI reference | ||||
| paths in order to match common implementations and improve the | ||||
| normalization of URIs in practice. This change only impacts the | ||||
| parsing of abnormal references and same-scheme references wherein | ||||
| the base URI has a non-hierarchical path. | ||||
| Index | Index | |||
| A | A | |||
| ABNF 9 | ABNF 9 | |||
| abs-path 15 | abs-path 16 | |||
| absolute 9 | absolute 25 | |||
| absolute-path 22 | absolute-path 24 | |||
| absolute-URI 23 | absolute-URI 25 | |||
| access 7 | access 7 | |||
| alphanum 17 | alphanum 18 | |||
| authority 15, 16 | authority 16, 17 | |||
| B | ||||
| base URI 27 | ||||
| D | D | |||
| dec-octet 17 | dec-octet 19 | |||
| delims 13 | delims 15 | |||
| dereference 8 | dereference 7 | |||
| domainlabel 17 | domainlabel 18 | |||
| dot-segments 19 | dot-segments 20 | |||
| E | E | |||
| escaped 12 | escaped 13 | |||
| excluded 13 | excluded 14 | |||
| F | F | |||
| fragment 20 | fragment 22 | |||
| G | G | |||
| generic syntax 5 | generic syntax 5 | |||
| H | H | |||
| h4 18 | h4 19 | |||
| hier-part 15 | hier-part 16 | |||
| hierarchical 9 | hierarchical 8 | |||
| host 17 | host 18 | |||
| hostname 17 | hostname 18 | |||
| I | I | |||
| identifier 5 | identifier 5 | |||
| invisible 13 | invisible 14 | |||
| IPv4 17 | IPv4 19 | |||
| IPv4address 17 | IPv4address 19 | |||
| IPv6 18 | IPv6 19 | |||
| IPv6address 18 | IPv6address 19 | |||
| IPv6reference 18 | IPv6reference 19 | |||
| L | L | |||
| locator 6 | locator 6 | |||
| ls32 18 | ls32 19 | |||
| M | M | |||
| mark 11 | mark 12 | |||
| merge 30 | ||||
| N | N | |||
| name 6 | name 6 | |||
| net-path 15 | net-path 16 | |||
| network-path 22 | network-path 24 | |||
| P | P | |||
| path 15, 19 | path 16, 20 | |||
| path-segments 19 | path-segments 20 | |||
| pchar 19 | pchar 20 | |||
| port 18 | port 20 | |||
| Q | Q | |||
| qualified 17 | qualified 18 | |||
| query 20 | query 21 | |||
| R | R | |||
| rel-path 15 | rel-path 16 | |||
| relative 9 | relative 9, 27 | |||
| relative-path 22 | relative-path 24 | |||
| relative-URI 22 | relative-URI 24 | |||
| remove_dot_segments 30 | ||||
| representation 8 | representation 8 | |||
| reserved 10 | reserved 11 | |||
| resolution 8 | resolution 7, 27 | |||
| resource 4 | resource 4 | |||
| retrieval 8 | retrieval 8 | |||
| S | S | |||
| same-document 23 | same-document 25 | |||
| sameness 8 | sameness 8 | |||
| scheme 15 | scheme 16 | |||
| segment 19 | segment 20 | |||
| suffix 23 | suffix 25 | |||
| T | T | |||
| transcription 6 | transcription 6 | |||
| U | U | |||
| uniform 4 | uniform 4 | |||
| unreserved 11 | unreserved 12 | |||
| unwise 13 | unwise 15 | |||
| URI grammar | URI grammar | |||
| abs-path 15 | abs-path 16 | |||
| absolute-URI 23 | absolute-URI 25 | |||
| ALPHA 9 | ALPHA 9 | |||
| alphanum 17 | alphanum 18 | |||
| authority 15, 16 | authority 16, 17 | |||
| CR 9 | CR 9 | |||
| CTL 9 | CTL 9 | |||
| dec-octet 17 | dec-octet 19 | |||
| DIGIT 9 | DIGIT 9 | |||
| domainlabel 17 | domainlabel 18 | |||
| DQUOTE 9 | DQUOTE 9 | |||
| escaped 12 | escaped 13 | |||
| fragment 15, 20, 22 | fragment 16, 22, 24 | |||
| h4 18 | h4 19 | |||
| HEXDIG 9 | HEXDIG 9 | |||
| hier-part 15, 22, 23 | hier-part 16, 24, 25 | |||
| host 16, 17 | host 17, 18 | |||
| hostname 17 | hostname 18 | |||
| IPv4address 17 | IPv4address 19 | |||
| IPv6address 18 | IPv6address 19 | |||
| IPv6reference 18 | IPv6reference 19 | |||
| LF 9 | LF 9 | |||
| ls32 18 | ls32 19 | |||
| mark 11 | mark 12 | |||
| net-path 15 | net-path 16 | |||
| OCTET 9 | OCTET 9 | |||
| path-segments 15, 19 | path-segments 16, 20 | |||
| pchar 19, 20, 20 | pchar 20, 21, 22 | |||
| port 16, 18 | port 17, 20 | |||
| qualified 17 | qualified 18 | |||
| query 15, 20, 22, 23 | query 16, 21, 24, 25 | |||
| rel-path 15 | rel-path 16 | |||
| relative-URI 22, 22 | relative-URI 24, 24 | |||
| reserved 11 | reserved 12 | |||
| scheme 15, 16, 23 | scheme 16, 17, 25 | |||
| segment 19 | segment 20 | |||
| SP 9 | SP 9 | |||
| unreserved 11 | unreserved 12 | |||
| URI 15, 22 | URI 16, 24 | |||
| URI-reference 22 | URI-reference 24 | |||
| uric 10 | uric 11 | |||
| userinfo 16, 16 | userinfo 17, 18 | |||
| URI 15 | URI 16 | |||
| URI-reference 22 | URI-reference 24 | |||
| uric 10 | uric 11 | |||
| URL 6 | URL 6 | |||
| URN 6 | URN 6 | |||
| userinfo 16 | userinfo 18 | |||
| Intellectual Property Statement | Intellectual Property Statement | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| intellectual property or other rights that might be claimed to | intellectual property or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; neither does it represent that it | might or might not be available; neither does it represent that it | |||
| has made any effort to identify any such rights. Information on the | has made any effort to identify any such rights. Information on the | |||
| IETF's procedures with respect to rights in standards-track and | IETF's procedures with respect to rights in standards-track and | |||
| End of changes. 125 change blocks. | ||||
| 463 lines changed or deleted | 556 lines changed or added | |||
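
The change log above splits path resolution into two routines: merge, which combines the base URI path with a relative-path reference, and remove_dot_segments, which strips "." and ".." segments from the composed path. The following is a minimal Python sketch of that split, not the draft's own pseudocode (which this diff does not reproduce). It assumes the form these routines take in the later RFC 3986, Sections 5.2.3 and 5.2.4. The base_has_authority parameter is a hypothetical name introduced here for illustration.

```python
def merge(base_path: str, ref_path: str, base_has_authority: bool = True) -> str:
    """Combine a base URI path with a relative-path reference."""
    # Assumption: a base URI with an authority and an empty path
    # behaves as if its path were "/".
    if base_has_authority and base_path == "":
        return "/" + ref_path
    # Keep everything up to and including the last "/" of the base path;
    # if there is no "/", rfind returns -1 and the slice is empty.
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """Remove the special "." and ".." segments from a composed path."""
    output = []  # completed segments, each with its leading "/" if any
    inp = path
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]          # "/./x" becomes "/x"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()       # drop the previous segment
        elif inp == "/..":
            inp = "/"
            if output:
                output.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if present)
            # to the output, up to but not including the next "/".
            idx = inp.find("/", 1)
            if idx == -1:
                output.append(inp)
                inp = ""
            else:
                output.append(inp[:idx])
                inp = inp[idx:]
    return "".join(output)


if __name__ == "__main__":
    composed = merge("/b/c/d;p", "../g")
    print(composed)                       # /b/c/../g
    print(remove_dot_segments(composed))  # /b/g
```

Keeping the two routines separate lets remove_dot_segments be applied to any reference path, including absolute ones that never pass through merge, which is the behavior the change log describes.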