| draft-fielding-uri-rfc2396bis-06.txt | draft-fielding-uri-rfc2396bis-07.txt | |||
|---|---|---|---|---|
| Network Working Group T. Berners-Lee | Network Working Group T. Berners-Lee | |||
| Internet-Draft W3C/MIT | Internet-Draft W3C/MIT | |||
| Updates: 1738 (if approved) R. Fielding | Updates: 1738 (if approved) R. Fielding | |||
| Obsoletes: 2732, 2396, 1808 (if approved) Day Software | Obsoletes: 2732, 2396, 1808 (if approved) Day Software | |||
| L. Masinter | L. Masinter | |||
| Expires: January 15, 2005 Adobe | Expires: March 26, 2005 Adobe | |||
| July 17, 2004 | September 25, 2004 | |||
| Uniform Resource Identifier (URI): Generic Syntax | Uniform Resource Identifier (URI): Generic Syntax | |||
| draft-fielding-uri-rfc2396bis-06 | draft-fielding-uri-rfc2396bis-07 | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is subject to all provisions | This document is an Internet-Draft and is subject to all provisions | |||
| of section 3 of RFC 3667. By submitting this Internet-Draft, each | of section 3 of RFC 3667. By submitting this Internet-Draft, each | |||
| author represents that any applicable patent or other IPR claims of | author represents that any applicable patent or other IPR claims of | |||
| which he or she is aware have been or will be disclosed, and any of | which he or she is aware have been or will be disclosed, and any of | |||
| which he or she become aware will be disclosed, in accordance with | which he or she become aware will be disclosed, in accordance with | |||
| RFC 3668. | RFC 3668. | |||
| skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
| other groups may also distribute working documents as | other groups may also distribute working documents as | |||
| Internet-Drafts. | Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| <http://www.ietf.org/ietf/1id-abstracts.txt>. | <http://www.ietf.org/ietf/1id-abstracts.txt>. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| <http://www.ietf.org/shadow.html>. | <http://www.ietf.org/shadow.html>. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2004). All Rights Reserved. | Copyright (C) The Internet Society (2004). | |||
| Abstract | Abstract | |||
| A Uniform Resource Identifier (URI) is a compact sequence of | A Uniform Resource Identifier (URI) is a compact sequence of | |||
| characters for identifying an abstract or physical resource. This | characters for identifying an abstract or physical resource. This | |||
| specification defines the generic URI syntax and a process for | specification defines the generic URI syntax and a process for | |||
| resolving URI references that might be in relative form, along with | resolving URI references that might be in relative form, along with | |||
| guidelines and security considerations for the use of URIs on the | guidelines and security considerations for the use of URIs on the | |||
| Internet. The URI syntax defines a grammar that is a superset of all | Internet. The URI syntax defines a grammar that is a superset of all | |||
| valid URIs, such that an implementation can parse the common | valid URIs, such that an implementation can parse the common | |||
| components of a URI reference without knowing the scheme-specific | components of a URI reference without knowing the scheme-specific | |||
| requirements of every possible identifier. This specification does | requirements of every possible identifier. This specification does | |||
| not define a generative grammar for URIs; that task is performed by | not define a generative grammar for URIs; that task is performed by | |||
| the individual specifications of each URI scheme. | the individual specifications of each URI scheme. | |||
| Editorial Note | Editorial Note | |||
| Discussion of this draft and comments to the editors should be sent | Discussion of this draft and comments to the editors should be sent | |||
| to the uri@w3.org mailing list. An issues list and version history | to the uri@w3.org mailing list. An issues list and version history | |||
| is available at <http://gbiv.com/protocols/uri/rev-2002/ | is available at <http://gbiv.com/protocols/uri/rev-2002/issues.html>. | |||
| issues.html> [1]. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . 4 | 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . 6 | 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . 7 | 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . 7 | |||
| 1.2 Design Considerations . . . . . . . . . . . . . . . . . . 7 | 1.2 Design Considerations . . . . . . . . . . . . . . . . . . 7 | |||
| 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . 7 | 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . 7 | |||
| skipping to change at page 2, line 33 | skipping to change at page 2, line 32 | |||
| 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . 11 | 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.1 Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 12 | 2.1 Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . 12 | 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . 13 | 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . 13 | |||
| 2.4 When to Encode or Decode . . . . . . . . . . . . . . . . . 13 | 2.4 When to Encode or Decode . . . . . . . . . . . . . . . . . 13 | |||
| 2.5 Identifying Data . . . . . . . . . . . . . . . . . . . . . 14 | 2.5 Identifying Data . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . . 16 | 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . 17 | 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 3.2.1 User Information . . . . . . . . . . . . . . . . . . . 17 | 3.2.1 User Information . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 23 | 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . 25 | 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.2 Relative URI . . . . . . . . . . . . . . . . . . . . . . . 26 | 4.2 Relative Reference . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 26 | 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.4 Same-document Reference . . . . . . . . . . . . . . . . . 26 | 4.4 Same-document Reference . . . . . . . . . . . . . . . . . 27 | |||
| 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . 27 | 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 5. Reference Resolution . . . . . . . . . . . . . . . . . . . . . 28 | 5. Reference Resolution . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . 28 | 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . 28 | |||
| 5.1.1 Base URI Embedded in Content . . . . . . . . . . . . . 28 | 5.1.1 Base URI Embedded in Content . . . . . . . . . . . . . 29 | |||
| 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . 29 | 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . 29 | |||
| 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . 29 | 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . 30 | |||
| 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . 29 | 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . 30 | |||
| 5.2 Relative Resolution . . . . . . . . . . . . . . . . . . . 30 | 5.2 Relative Resolution . . . . . . . . . . . . . . . . . . . 30 | |||
| 5.2.1 Pre-parse the Base URI . . . . . . . . . . . . . . . . 30 | 5.2.1 Pre-parse the Base URI . . . . . . . . . . . . . . . . 30 | |||
| 5.2.2 Transform References . . . . . . . . . . . . . . . . . 30 | 5.2.2 Transform References . . . . . . . . . . . . . . . . . 31 | |||
| 5.2.3 Merge Paths . . . . . . . . . . . . . . . . . . . . . 31 | 5.2.3 Merge Paths . . . . . . . . . . . . . . . . . . . . . 32 | |||
| 5.2.4 Remove Dot Segments . . . . . . . . . . . . . . . . . 32 | 5.2.4 Remove Dot Segments . . . . . . . . . . . . . . . . . 32 | |||
| 5.3 Component Recomposition . . . . . . . . . . . . . . . . . 34 | 5.3 Component Recomposition . . . . . . . . . . . . . . . . . 34 | |||
| 5.4 Reference Resolution Examples . . . . . . . . . . . . . . 34 | 5.4 Reference Resolution Examples . . . . . . . . . . . . . . 34 | |||
| 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . 35 | 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . 35 | |||
| 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . 35 | 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . 35 | |||
| 6. Normalization and Comparison . . . . . . . . . . . . . . . . . 36 | 6. Normalization and Comparison . . . . . . . . . . . . . . . . . 36 | |||
| 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 37 | 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 37 | 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . 38 | 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . 38 | |||
| 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . 38 | 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . 39 | |||
| 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . 39 | 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . 40 | |||
| 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . 40 | 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . 41 | |||
| 6.3 Canonical Form . . . . . . . . . . . . . . . . . . . . . . 40 | ||||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 41 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 41 | |||
| 7.1 Reliability and Consistency . . . . . . . . . . . . . . . 41 | 7.1 Reliability and Consistency . . . . . . . . . . . . . . . 41 | |||
| 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . 41 | 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . 42 | |||
| 7.3 Back-end Transcoding . . . . . . . . . . . . . . . . . . . 42 | 7.3 Back-end Transcoding . . . . . . . . . . . . . . . . . . . 42 | |||
| 7.4 Rare IP Address Formats . . . . . . . . . . . . . . . . . 43 | 7.4 Rare IP Address Formats . . . . . . . . . . . . . . . . . 43 | |||
| 7.5 Sensitive Information . . . . . . . . . . . . . . . . . . 44 | 7.5 Sensitive Information . . . . . . . . . . . . . . . . . . 44 | |||
| 7.6 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 44 | 7.6 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 44 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 | |||
| 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 44 | 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 46 | |||
| 10.1 Normative References . . . . . . . . . . . . . . . . . . . . 45 | 10.1 Normative References . . . . . . . . . . . . . . . . . . . . 46 | |||
| 10.2 Informative References . . . . . . . . . . . . . . . . . . . 45 | 10.2 Informative References . . . . . . . . . . . . . . . . . . . 46 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 47 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 48 | |||
| A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 48 | A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 49 | |||
| B. Parsing a URI Reference with a Regular Expression . . . . . . 50 | B. Parsing a URI Reference with a Regular Expression . . . . . . 51 | |||
| C. Delimiting a URI in Context . . . . . . . . . . . . . . . . . 50 | C. Delimiting a URI in Context . . . . . . . . . . . . . . . . . 52 | |||
| D. Summary of Non-editorial Changes . . . . . . . . . . . . . . . 52 | D. Changes from RFC 2396 . . . . . . . . . . . . . . . . . . . . 53 | |||
| D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . 52 | D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . 53 | |||
| D.2 Modifications from RFC 2396 . . . . . . . . . . . . . . . 52 | D.2 Modifications . . . . . . . . . . . . . . . . . . . . . . 54 | |||
| E. Instructions to RFC Editor . . . . . . . . . . . . . . . . . . 54 | E. Instructions to RFC Editor . . . . . . . . . . . . . . . . . . 56 | |||
| Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 | Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 59 | Intellectual Property and Copyright Statements . . . . . . . . 61 | |||
| 1. Introduction | 1. Introduction | |||
| A Uniform Resource Identifier (URI) provides a simple and extensible | A Uniform Resource Identifier (URI) provides a simple and extensible | |||
| means for identifying a resource. This specification of URI syntax | means for identifying a resource. This specification of URI syntax | |||
| and semantics is derived from concepts introduced by the World Wide | and semantics is derived from concepts introduced by the World Wide | |||
| Web global information initiative, whose use of such identifiers | Web global information initiative, whose use of such identifiers | |||
| dates from 1990 and is described in "Universal Resource Identifiers | dates from 1990 and is described in "Universal Resource Identifiers | |||
| in WWW" [RFC1630], and is designed to meet the recommendations laid | in WWW" [RFC1630], and is designed to meet the recommendations laid | |||
| out in "Functional Recommendations for Internet Resource Locators" | out in "Functional Recommendations for Internet Resource Locators" | |||
| skipping to change at page 10, line 33 | skipping to change at page 10, line 33 | |||
| number sign ("#") characters for the purpose of delimiting components | number sign ("#") characters for the purpose of delimiting components | |||
| that are significant to the generic parser's hierarchical | that are significant to the generic parser's hierarchical | |||
| interpretation of an identifier. In addition to aiding the | interpretation of an identifier. In addition to aiding the | |||
| readability of such identifiers through the consistent use of | readability of such identifiers through the consistent use of | |||
| familiar syntax, this uniform representation of hierarchy across | familiar syntax, this uniform representation of hierarchy across | |||
| naming schemes allows scheme-independent references to be made | naming schemes allows scheme-independent references to be made | |||
| relative to that hierarchy. | relative to that hierarchy. | |||
| It is often the case that a group or "tree" of documents has been | It is often the case that a group or "tree" of documents has been | |||
| constructed to serve a common purpose, wherein the vast majority of | constructed to serve a common purpose, wherein the vast majority of | |||
| URIs in these documents point to resources within the tree rather | URI references in these documents point to resources within the tree | |||
| than outside of it. Similarly, documents located at a particular | rather than outside of it. Similarly, documents located at a | |||
| site are much more likely to refer to other resources at that site | particular site are much more likely to refer to other resources at | |||
| than to resources at remote sites. Relative referencing of URIs | that site than to resources at remote sites. Relative referencing of | |||
| allows document trees to be partially independent of their location | URIs allows document trees to be partially independent of their | |||
| and access scheme. For instance, it is possible for a single set of | location and access scheme. For instance, it is possible for a | |||
| hypertext documents to be simultaneously accessible and traversable | single set of hypertext documents to be simultaneously accessible and | |||
| via each of the "file", "http", and "ftp" schemes if the documents | traversable via each of the "file", "http", and "ftp" schemes if the | |||
| refer to each other using relative references. Furthermore, such | documents refer to each other using relative references. | |||
| document trees can be moved, as a whole, without changing any of the | Furthermore, such document trees can be moved, as a whole, without | |||
| relative references. | changing any of the relative references. | |||
| A relative URI reference (Section 4.2) refers to a resource by | A relative reference (Section 4.2) refers to a resource by describing | |||
| describing the difference within a hierarchical name space between | the difference within a hierarchical name space between the reference | |||
| the reference context and the target URI. The reference resolution | context and the target URI. The reference resolution algorithm, | |||
| algorithm, presented in Section 5, defines how such a reference is | presented in Section 5, defines how such a reference is transformed | |||
| transformed to the target URI. Since relative references can only be | to the target URI. Since relative references can only be used within | |||
| used within the context of a hierarchical URI, designers of new URI | the context of a hierarchical URI, designers of new URI schemes | |||
| schemes should use a syntax consistent with the generic syntax's | should use a syntax consistent with the generic syntax's hierarchical | |||
| hierarchical components unless there are compelling reasons to forbid | components unless there are compelling reasons to forbid relative | |||
| relative referencing within that scheme. | referencing within that scheme. | |||
| All URIs are parsed by generic syntax parsers when used. A URI | NOTE: Previous specifications used the terms "partial URI" and | |||
| scheme that wishes to remain opaque to hierarchical processing must | "relative URI" to denote a relative reference to a URI. Since | |||
| disallow the use of slash and question mark characters. However, | some readers misunderstood those terms to mean that relative URIs | |||
| since a URI reference is only modified by the generic parser if it | are a subset of URIs, rather than a method of referencing URIs, | |||
| contains a dot-segment (a complete path segment of "." or "..", as | this specification simply refers to them as relative references. | |||
| described in Section 3.3), URI schemes may safely use "/" for other | ||||
| purposes if they do not allow dot-segments. | All URI references are parsed by generic syntax parsers when used. | |||
| However, since hierarchical processing has no effect on an absolute | ||||
| URI used in a reference unless it contains one or more dot-segments | ||||
| (complete path segments of "." or "..", as described in Section 3.3), | ||||
| URI scheme specifications can define opaque identifiers by | ||||
| disallowing use of slash characters, question mark characters, and | ||||
| the URIs "scheme:." and "scheme:..". | ||||
| 1.3 Syntax Notation | 1.3 Syntax Notation | |||
| This specification uses the Augmented Backus-Naur Form (ABNF) | This specification uses the Augmented Backus-Naur Form (ABNF) | |||
| notation of [RFC2234], including the following core ABNF syntax rules | notation of [RFC2234], including the following core ABNF syntax rules | |||
| defined by that specification: ALPHA (letters), CR (carriage return), | defined by that specification: ALPHA (letters), CR (carriage return), | |||
| DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal | DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal | |||
| digits), LF (line feed), and SP (space). The complete URI syntax is | digits), LF (line feed), and SP (space). The complete URI syntax is | |||
| collected in Appendix A. | collected in Appendix A. | |||
| skipping to change at page 14, line 13 | skipping to change at page 14, line 18 | |||
| Once produced, a URI is always in its percent-encoded form. | Once produced, a URI is always in its percent-encoded form. | |||
| When a URI is dereferenced, the components and subcomponents | When a URI is dereferenced, the components and subcomponents | |||
| significant to the scheme-specific dereferencing process (if any) | significant to the scheme-specific dereferencing process (if any) | |||
| must be parsed and separated before the percent-encoded octets within | must be parsed and separated before the percent-encoded octets within | |||
| those components can be safely decoded, since otherwise the data may | those components can be safely decoded, since otherwise the data may | |||
| be mistaken for component delimiters. The only exception is for | be mistaken for component delimiters. The only exception is for | |||
| percent-encoded octets corresponding to characters in the unreserved | percent-encoded octets corresponding to characters in the unreserved | |||
| set, which can be decoded at any time. For example, the octet | set, which can be decoded at any time. For example, the octet | |||
| corresponding to the tilde ("~") character is often encoded as "%7E" | corresponding to the tilde ("~") character is often encoded as "%7E" | |||
| by older URI processing software; the "%7E" can be replaced by "~" | by older URI processing implementations; the "%7E" can be replaced by | |||
| without changing its interpretation. | "~" without changing its interpretation. | |||
| Because the percent ("%") character serves as the indicator for | Because the percent ("%") character serves as the indicator for | |||
| percent-encoded octets, it must be percent-encoded as "%25" in order | percent-encoded octets, it must be percent-encoded as "%25" in order | |||
| for that octet to be used as data within a URI. Implementations must | for that octet to be used as data within a URI. Implementations must | |||
| not percent-encode or decode the same string more than once, since | not percent-encode or decode the same string more than once, since | |||
| decoding an already decoded string might lead to misinterpreting a | decoding an already decoded string might lead to misinterpreting a | |||
| percent data octet as the beginning of a percent-encoding, or vice | percent data octet as the beginning of a percent-encoding, or vice | |||
| versa in the case of percent-encoding an already percent-encoded | versa in the case of percent-encoding an already percent-encoded | |||
| string. | string. | |||
| skipping to change at page 17, line 9 | skipping to change at page 17, line 11 | |||
| lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for | lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for | |||
| the sake of robustness, but should only produce lowercase scheme | the sake of robustness, but should only produce lowercase scheme | |||
| names, for consistency. | names, for consistency. | |||
| scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | |||
| Individual schemes are not specified by this document. The process | Individual schemes are not specified by this document. The process | |||
| for registration of new URI schemes is defined separately by [BCP35]. | for registration of new URI schemes is defined separately by [BCP35]. | |||
| The scheme registry maintains the mapping between scheme names and | The scheme registry maintains the mapping between scheme names and | |||
| their specifications. Advice for designers of new URI schemes can be | their specifications. Advice for designers of new URI schemes can be | |||
| found in [RFC2718]. | found in [RFC2718]. URI scheme specifications must define their own | |||
| syntax such that all strings matching their scheme-specific syntax | ||||
| will also match the <absolute-URI> grammar, as described in | ||||
| Section 4.3. | ||||
| When presented with a URI that violates one or more scheme-specific | When presented with a URI that violates one or more scheme-specific | |||
| restrictions, the scheme-specific resolution process should flag the | restrictions, the scheme-specific resolution process should flag the | |||
| reference as an error rather than ignore the unused parts; doing so | reference as an error rather than ignore the unused parts; doing so | |||
| reduces the number of equivalent URIs and helps detect abuses of the | reduces the number of equivalent URIs and helps detect abuses of the | |||
| generic syntax that might indicate the URI has been constructed to | generic syntax that might indicate the URI has been constructed to | |||
| mislead the user (Section 7.6). | mislead the user (Section 7.6). | |||
| 3.2 Authority | 3.2 Authority | |||
| skipping to change at page 22, line 10 | skipping to change at page 22, line 21 | |||
| form, that, along with data in the non-hierarchical query component | form, that, along with data in the non-hierarchical query component | |||
| (Section 3.4), serves to identify a resource within the scope of the | (Section 3.4), serves to identify a resource within the scope of the | |||
| URI's scheme and naming authority (if any). The path is terminated | URI's scheme and naming authority (if any). The path is terminated | |||
| by the first question mark ("?") or number sign ("#") character, or | by the first question mark ("?") or number sign ("#") character, or | |||
| by the end of the URI. | by the end of the URI. | |||
| If a URI contains an authority component, then the path component | If a URI contains an authority component, then the path component | |||
| must either be empty or begin with a slash ("/") character. If a URI | must either be empty or begin with a slash ("/") character. If a URI | |||
| does not contain an authority component, then the path cannot begin | does not contain an authority component, then the path cannot begin | |||
| with two slash characters ("//"). In addition, a URI reference | with two slash characters ("//"). In addition, a URI reference | |||
| (Section 4.1) may begin with a relative path, in which case the first | (Section 4.1) may be a relative-path reference, in which case the | |||
| path segment cannot contain a colon (":") character. The ABNF | first path segment cannot contain a colon (":") character. The ABNF | |||
| requires five separate rules to disambiguate these cases, only one of | requires five separate rules to disambiguate these cases, only one of | |||
| which will match the path substring within a given URI reference. We | which will match the path substring within a given URI reference. We | |||
| use the generic term "path component" to describe the URI substring | use the generic term "path component" to describe the URI substring | |||
| matched by the parser to one of these rules. | matched by the parser to one of these rules. | |||
| path = path-abempty ; begins with "/" or is empty | path = path-abempty ; begins with "/" or is empty | |||
| / path-absolute ; begins with "/" but not "//" | / path-absolute ; begins with "/" but not "//" | |||
| / path-noscheme ; begins with a non-colon segment | / path-noscheme ; begins with a non-colon segment | |||
| / path-rootless ; begins with a segment | / path-rootless ; begins with a segment | |||
| / path-empty ; zero characters | / path-empty ; zero characters | |||
| skipping to change at page 22, line 46 | skipping to change at page 23, line 8 | |||
| A path consists of a sequence of path segments separated by a slash | A path consists of a sequence of path segments separated by a slash | |||
| ("/") character. A path is always defined for a URI, though the | ("/") character. A path is always defined for a URI, though the | |||
| defined path may be empty (zero length). Use of the slash character | defined path may be empty (zero length). Use of the slash character | |||
| to indicate hierarchy is only required when a URI will be used as the | to indicate hierarchy is only required when a URI will be used as the | |||
| context for relative references. For example, the URI | context for relative references. For example, the URI | |||
| <mailto:fred@example.com> has a path of "fred@example.com", whereas | <mailto:fred@example.com> has a path of "fred@example.com", whereas | |||
| the URI <foo://info.example.com?fred> has an empty path. | the URI <foo://info.example.com?fred> has an empty path. | |||
| The path segments "." and "..", also known as dot-segments, are | The path segments "." and "..", also known as dot-segments, are | |||
| defined for relative reference within the path name hierarchy. They | defined for relative reference within the path name hierarchy. They | |||
| are intended for use at the beginning of a relative path reference | are intended for use at the beginning of a relative-path reference | |||
| (Section 4.2) for indicating relative position within the | (Section 4.2) for indicating relative position within the | |||
| hierarchical tree of names. This is similar to their role within | hierarchical tree of names. This is similar to their role within | |||
| some operating systems' file directory structure to indicate the | some operating systems' file directory structure to indicate the | |||
| current directory and parent directory, respectively. However, | current directory and parent directory, respectively. However, | |||
| unlike a file system, these dot-segments are only interpreted within | unlike a file system, these dot-segments are only interpreted within | |||
| the URI path hierarchy and are removed as part of the resolution | the URI path hierarchy and are removed as part of the resolution | |||
| process (Section 5.2). | process (Section 5.2). | |||
| Aside from dot-segments in hierarchical paths, a path segment is | Aside from dot-segments in hierarchical paths, a path segment is | |||
| considered opaque by the generic syntax. URI-producing applications | considered opaque by the generic syntax. URI-producing applications | |||
| skipping to change at page 23, line 35 ¶ | skipping to change at page 23, line 45 ¶ | |||
| data in the path component (Section 3.3), serves to identify a | data in the path component (Section 3.3), serves to identify a | |||
| resource within the scope of the URI's scheme and naming authority | resource within the scope of the URI's scheme and naming authority | |||
| (if any). The query component is indicated by the first question | (if any). The query component is indicated by the first question | |||
| mark ("?") character and terminated by a number sign ("#") character | mark ("?") character and terminated by a number sign ("#") character | |||
| or by the end of the URI. | or by the end of the URI. | |||
| query = *( pchar / "/" / "?" ) | query = *( pchar / "/" / "?" ) | |||
| The characters slash ("/") and question mark ("?") may represent data | The characters slash ("/") and question mark ("?") may represent data | |||
| within the query component. Beware that some older, erroneous | within the query component. Beware that some older, erroneous | |||
| implementations do not handle such URIs correctly when they are used | implementations may not handle such data correctly when used as the | |||
| as the base for relative references (Section 5.1), apparently because | base URI for relative references (Section 5.1), apparently because | |||
| they fail to distinguish query data from path data when looking | they fail to distinguish query data from path data when looking | |||
| for hierarchical separators. However, since query components are | for hierarchical separators. However, since query components are | |||
| often used to carry identifying information in the form of | often used to carry identifying information in the form of | |||
| "key=value" pairs, and one frequently used value is a reference to | "key=value" pairs, and one frequently used value is a reference to | |||
| another URI, it is sometimes better for usability to avoid | another URI, it is sometimes better for usability to avoid | |||
| percent-encoding those characters. | percent-encoding those characters. | |||
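As a rough illustration of the query rules above, the following sketch uses `urlsplit` from Python's standard library to show that only the first "?" and the first "#" act as delimiters. The example URI and its host names are hypothetical and do not come from the draft.

```python
from urllib.parse import urlsplit

# Hypothetical URI whose query value is itself a URI reference.
uri = "http://example.com/search?next=http://other.example/a/b?c=d#top"

parts = urlsplit(uri)

# The query begins after the first "?" and ends at the first "#".
# Any later "/" or "?" characters are ordinary query data.
print(parts.path)
print(parts.query)
print(parts.fragment)
```

Leaving the embedded "/" and "?" unencoded keeps the nested reference readable, which is the usability point made in the text.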
| 3.5 Fragment | 3.5 Fragment | |||
| The fragment identifier component of a URI allows indirect | The fragment identifier component of a URI allows indirect | |||
| skipping to change at page 25, line 10 ¶ | skipping to change at page 25, line 17 ¶ | |||
| loss of information, particularly in regards to accurate redirection | loss of information, particularly in regards to accurate redirection | |||
| of references as resources move over time, it also serves to prevent | of references as resources move over time, it also serves to prevent | |||
| information providers from denying reference authors the right to | information providers from denying reference authors the right to | |||
| selectively refer to information within a resource. Indirect | selectively refer to information within a resource. Indirect | |||
| referencing also provides additional flexibility and extensibility to | referencing also provides additional flexibility and extensibility to | |||
| systems that use URIs, since new media types are easier to define and | systems that use URIs, since new media types are easier to define and | |||
| deploy than new schemes of identification. | deploy than new schemes of identification. | |||
| The characters slash ("/") and question mark ("?") are allowed to | The characters slash ("/") and question mark ("?") are allowed to | |||
| represent data within the fragment identifier. Beware that some | represent data within the fragment identifier. Beware that some | |||
| older, erroneous implementations do not handle such URIs correctly | older, erroneous implementations may not handle such data correctly | |||
| when they are used as the base for relative references (Section 5.1). | when used as the base URI for relative references (Section 5.1). | |||
| 4. Usage | 4. Usage | |||
| When applications make reference to a URI, they do not always use the | When applications make reference to a URI, they do not always use the | |||
| full form of reference defined by the "URI" syntax rule. In order to | full form of reference defined by the "URI" syntax rule. In order to | |||
| save space and take advantage of hierarchical locality, many Internet | save space and take advantage of hierarchical locality, many Internet | |||
| protocol elements and media type formats allow an abbreviation of a | protocol elements and media type formats allow an abbreviation of a | |||
| URI, while others restrict the syntax to a particular form of URI. | URI, while others restrict the syntax to a particular form of URI. | |||
| We define the most common forms of reference syntax in this | We define the most common forms of reference syntax in this | |||
| specification because they impact and depend upon the design of the | specification because they impact and depend upon the design of the | |||
| generic syntax, requiring a uniform parsing algorithm in order to be | generic syntax, requiring a uniform parsing algorithm in order to be | |||
| interpreted consistently. | interpreted consistently. | |||
| 4.1 URI Reference | 4.1 URI Reference | |||
| URI-reference is used to denote the most common usage of a resource | URI-reference is used to denote the most common usage of a resource | |||
| identifier. | identifier. | |||
| URI-reference = URI / relative-URI | URI-reference = URI / relative-ref | |||
| A URI-reference may be relative: if the reference's prefix matches | A URI-reference is either a URI or a relative reference. If the | |||
| the syntax of a scheme followed by its colon separator, then the | URI-reference's prefix does not match the syntax of a scheme followed | |||
| reference is a URI rather than a relative-URI. | by its colon separator, then the URI-reference is a relative | |||
| reference. | ||||
| A URI-reference is typically parsed first into the five URI | A URI-reference is typically parsed first into the five URI | |||
| components, in order to determine what components are present and | components, in order to determine what components are present and | |||
| whether or not the reference is relative, after which each component | whether or not the reference is relative, after which each component | |||
| is parsed for its subparts and their validation. The ABNF of | is parsed for its subparts and their validation. The ABNF of | |||
| URI-reference, along with the "first-match-wins" disambiguation rule, | URI-reference, along with the "first-match-wins" disambiguation rule, | |||
| is sufficient to define a validating parser for the generic syntax. | is sufficient to define a validating parser for the generic syntax. | |||
| Readers familiar with regular expressions should see Appendix B for | Readers familiar with regular expressions should see Appendix B for | |||
| an example of a non-validating URI-reference parser that will take | an example of a non-validating URI-reference parser that will take | |||
| any given string and extract the URI components. | any given string and extract the URI components. | |||
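The non-validating approach mentioned above can be sketched in Python. Appendix B is not reproduced in this excerpt, so the regular expression below is the one published in Appendix B of RFC 3986 and is only assumed to match this draft. The helper name `split_uri_reference` is hypothetical.

```python
import re

# Assumption: pattern taken from RFC 3986 Appendix B, not from this excerpt.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(reference):
    """Split a reference into the five generic components.

    A value of None means the component is absent; an empty string
    means its delimiter was present but nothing followed it.
    """
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_uri_reference("mailto:fred@example.com"))
print(split_uri_reference("foo://info.example.com?fred"))
```

A reference whose "scheme" comes back as None is a relative reference. The path is never None, which reflects the rule that a path is always defined even when it is empty.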
| 4.2 Relative URI | 4.2 Relative Reference | |||
| A relative URI reference takes advantage of the hierarchical syntax | A relative reference takes advantage of the hierarchical syntax | |||
| (Section 1.2.3) in order to express a reference that is relative to | (Section 1.2.3) in order to express a URI reference relative to the | |||
| the name space of another hierarchical URI. | name space of another hierarchical URI. | |||
| relative-URI = relative-part [ "?" query ] [ "#" fragment ] | relative-ref = relative-part [ "?" query ] [ "#" fragment ] | |||
| relative-part = "//" authority path-abempty | relative-part = "//" authority path-abempty | |||
| / path-absolute | / path-absolute | |||
| / path-noscheme | / path-noscheme | |||
| / path-empty | / path-empty | |||
| The URI referred to by a relative reference, also known as the target | The URI referred to by a relative reference, also known as the target | |||
| URI, is obtained by applying the reference resolution algorithm of | URI, is obtained by applying the reference resolution algorithm of | |||
| Section 5. | Section 5. | |||
| skipping to change at page 26, line 43 ¶ | skipping to change at page 26, line 43 ¶ | |||
| 4.3 Absolute URI | 4.3 Absolute URI | |||
| Some protocol elements allow only the absolute form of a URI without | Some protocol elements allow only the absolute form of a URI without | |||
| a fragment identifier. For example, defining a base URI for later | a fragment identifier. For example, defining a base URI for later | |||
| use by relative references calls for an absolute-URI syntax rule that | use by relative references calls for an absolute-URI syntax rule that | |||
| does not allow a fragment. | does not allow a fragment. | |||
| absolute-URI = scheme ":" hier-part [ "?" query ] | absolute-URI = scheme ":" hier-part [ "?" query ] | |||
| URI scheme specifications must define their own syntax such that all | ||||
| strings matching their scheme-specific syntax will also match the | ||||
| <absolute-URI> grammar. Scheme specifications are not responsible | ||||
| for defining fragment identifier syntax or usage, regardless of its | ||||
| applicability to resources identifiable via that scheme, since | ||||
| fragment identification is orthogonal to scheme definition. However, | ||||
| scheme specifications are encouraged to include a wide range of | ||||
| examples, including examples that show use of the scheme's URIs with | ||||
| fragment identifiers when such usage is appropriate. | ||||
| 4.4 Same-document Reference | 4.4 Same-document Reference | |||
| When a URI reference refers to a URI that is, aside from its fragment | When a URI reference refers to a URI that is, aside from its fragment | |||
| component (if any), identical to the base URI (Section 5.1), that | component (if any), identical to the base URI (Section 5.1), that | |||
| reference is called a "same-document" reference. The most frequent | reference is called a "same-document" reference. The most frequent | |||
| examples of same-document references are relative references that are | examples of same-document references are relative references that are | |||
| empty or include only the number sign ("#") separator followed by a | empty or include only the number sign ("#") separator followed by a | |||
| fragment identifier. | fragment identifier. | |||
| When a same-document reference is dereferenced for the purpose of a | When a same-document reference is dereferenced for the purpose of a | |||
| skipping to change at page 27, line 45 ¶ | skipping to change at page 28, line 8 ¶ | |||
| user and heuristically resolved. | user and heuristically resolved. | |||
| While this practice of using suffix references is common, it should | While this practice of using suffix references is common, it should | |||
| be avoided whenever possible and never used in situations where | be avoided whenever possible and never used in situations where | |||
| long-term references are expected. The heuristics noted above will | long-term references are expected. The heuristics noted above will | |||
| change over time, particularly when a new URI scheme becomes popular, | change over time, particularly when a new URI scheme becomes popular, | |||
| and are often incorrect when used out of context. Furthermore, they | and are often incorrect when used out of context. Furthermore, they | |||
| can lead to security issues along the lines of those described in | can lead to security issues along the lines of those described in | |||
| [RFC1535]. | [RFC1535]. | |||
| Since a URI suffix has the same syntax as a relative path reference, | Since a URI suffix has the same syntax as a relative-path reference, | |||
| a suffix reference cannot be used in contexts where a relative | a suffix reference cannot be used in contexts where a relative | |||
| reference is expected. As a result, suffix references are limited to | reference is expected. As a result, suffix references are limited to | |||
| those places where there is no defined base URI, such as dialog boxes | those places where there is no defined base URI, such as dialog boxes | |||
| and off-line advertisements. | and off-line advertisements. | |||
| 5. Reference Resolution | 5. Reference Resolution | |||
| This section defines the process of resolving a URI reference within | This section defines the process of resolving a URI reference within | |||
| a context that allows relative references, such that the result is a | a context that allows relative references, such that the result is a | |||
| string matching the "URI" syntax rule of Section 3. | string matching the <URI> syntax rule of Section 3. | |||
| 5.1 Establishing a Base URI | 5.1 Establishing a Base URI | |||
| The term "relative" implies that there exists a "base URI" against | The term "relative" implies that there exists a "base URI" against | |||
| which the relative reference is applied. Aside from fragment-only | which the relative reference is applied. Aside from fragment-only | |||
| references (Section 4.4), relative references are only usable when a | references (Section 4.4), relative references are only usable when a | |||
| base URI is known. A base URI must be established by the parser | base URI is known. A base URI must be established by the parser | |||
| prior to parsing URI references that might be relative. A base URI | prior to parsing URI references that might be relative. A base URI | |||
| must conform to the <absolute-URI> syntax rule (Section 4.3): if the | must conform to the <absolute-URI> syntax rule (Section 4.3): if the | |||
| base URI is obtained from a URI reference, then that reference must | base URI is obtained from a URI reference, then that reference must | |||
| skipping to change at page 34, line 48 ¶ | skipping to change at page 34, line 48 ¶ | |||
| present in the reference, and a component that is empty, meaning that | present in the reference, and a component that is empty, meaning that | |||
| the separator was present and was immediately followed by the next | the separator was present and was immediately followed by the next | |||
| component separator or the end of the reference. | component separator or the end of the reference. | |||
| 5.4 Reference Resolution Examples | 5.4 Reference Resolution Examples | |||
| Within a representation with a well-defined base URI of | Within a representation with a well-defined base URI of | |||
| http://a/b/c/d;p?q | http://a/b/c/d;p?q | |||
| a relative URI reference is transformed to its target URI as follows. | a relative reference is transformed to its target URI as follows. | |||
| 5.4.1 Normal Examples | 5.4.1 Normal Examples | |||
| "g:h" = "g:h" | "g:h" = "g:h" | |||
| "g" = "http://a/b/c/g" | "g" = "http://a/b/c/g" | |||
| "./g" = "http://a/b/c/g" | "./g" = "http://a/b/c/g" | |||
| "g/" = "http://a/b/c/g/" | "g/" = "http://a/b/c/g/" | |||
| "/g" = "http://a/g" | "/g" = "http://a/g" | |||
| "//g" = "http://g" | "//g" = "http://g" | |||
| "?y" = "http://a/b/c/d;p?y" | "?y" = "http://a/b/c/d;p?y" | |||
| skipping to change at page 35, line 37 ¶ | skipping to change at page 35, line 37 ¶ | |||
| "../.." = "http://a/" | "../.." = "http://a/" | |||
| "../../" = "http://a/" | "../../" = "http://a/" | |||
| "../../g" = "http://a/g" | "../../g" = "http://a/g" | |||
| 5.4.2 Abnormal Examples | 5.4.2 Abnormal Examples | |||
| Although the following abnormal examples are unlikely to occur in | Although the following abnormal examples are unlikely to occur in | |||
| normal practice, all URI parsers should be capable of resolving them | normal practice, all URI parsers should be capable of resolving them | |||
| consistently. Each example uses the same base as above. | consistently. Each example uses the same base as above. | |||
| Parsers must be careful in handling cases where there are more | Parsers must be careful in handling cases where there are more ".." | |||
| relative path ".." segments than there are hierarchical levels in the | segments in a relative-path reference than there are hierarchical | |||
| base URI's path. Note that the ".." syntax cannot be used to change | levels in the base URI's path. Note that the ".." syntax cannot be | |||
| the authority component of a URI. | used to change the authority component of a URI. | |||
| "../../../g" = "http://a/g" | "../../../g" = "http://a/g" | |||
| "../../../../g" = "http://a/g" | "../../../../g" = "http://a/g" | |||
| Similarly, parsers must remove the dot-segments "." and ".." when | Similarly, parsers must remove the dot-segments "." and ".." when | |||
| they are complete components of a path, but not when they are only | they are complete components of a path, but not when they are only | |||
| part of a segment. | part of a segment. | |||
| "/./g" = "http://a/g" | "/./g" = "http://a/g" | |||
| "/../g" = "http://a/g" | "/../g" = "http://a/g" | |||
| "g." = "http://a/b/c/g." | "g." = "http://a/b/c/g." | |||
| ".g" = "http://a/b/c/.g" | ".g" = "http://a/b/c/.g" | |||
| "g.." = "http://a/b/c/g.." | "g.." = "http://a/b/c/g.." | |||
| "..g" = "http://a/b/c/..g" | "..g" = "http://a/b/c/..g" | |||
| Less likely are cases where the relative URI reference uses | Less likely are cases where the relative reference uses unnecessary | |||
| unnecessary or nonsensical forms of the "." and ".." complete path | or nonsensical forms of the "." and ".." complete path segments. | |||
| segments. | ||||
| "./../g" = "http://a/b/g" | "./../g" = "http://a/b/g" | |||
| "./g/." = "http://a/b/c/g/" | "./g/." = "http://a/b/c/g/" | |||
| "g/./h" = "http://a/b/c/g/h" | "g/./h" = "http://a/b/c/g/h" | |||
| "g/../h" = "http://a/b/c/h" | "g/../h" = "http://a/b/c/h" | |||
| "g;x=1/./y" = "http://a/b/c/g;x=1/y" | "g;x=1/./y" = "http://a/b/c/g;x=1/y" | |||
| "g;x=1/../y" = "http://a/b/c/y" | "g;x=1/../y" = "http://a/b/c/y" | |||
| Some applications fail to separate the reference's query and/or | Some applications fail to separate the reference's query and/or | |||
| fragment components from a relative path before merging it with the | fragment components from the path component before merging it with | |||
| base path and removing dot-segments. This error is rarely noticed, | the base path and removing dot-segments. This error is rarely | |||
| since typical usage of a fragment never includes the hierarchy ("/") | noticed, since typical usage of a fragment never includes the | |||
| character, and the query component is not normally used within | hierarchy ("/") character, and the query component is not normally | |||
| relative references. | used within relative references. | |||
| "g?y/./x" = "http://a/b/c/g?y/./x" | "g?y/./x" = "http://a/b/c/g?y/./x" | |||
| "g?y/../x" = "http://a/b/c/g?y/../x" | "g?y/../x" = "http://a/b/c/g?y/../x" | |||
| "g#s/./x" = "http://a/b/c/g#s/./x" | "g#s/./x" = "http://a/b/c/g#s/./x" | |||
| "g#s/../x" = "http://a/b/c/g#s/../x" | "g#s/../x" = "http://a/b/c/g#s/../x" | |||
| Some parsers allow the scheme name to be present in a relative URI | Some parsers allow the scheme name to be present in a relative | |||
| reference if it is the same as the base URI scheme. This is | reference if it is the same as the base URI scheme. This is | |||
| considered to be a loophole in prior specifications of partial URI | considered to be a loophole in prior specifications of partial URI | |||
| [RFC1630]. Its use should be avoided, but is allowed for backward | [RFC1630]. Its use should be avoided, but is allowed for backward | |||
| compatibility. | compatibility. | |||
| "http:g" = "http:g" ; for strict parsers | "http:g" = "http:g" ; for strict parsers | |||
| / "http://a/b/c/g" ; for backward compatibility | / "http://a/b/c/g" ; for backward compatibility | |||
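A subset of the examples above can be checked against an existing resolver. The sketch below uses `urljoin` from Python's standard library, which is documented as following RFC 3986 semantics from Python 3.5 onward. It is one implementation among many, not a reference for this draft, and the helper name `failed_examples` is hypothetical.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# Reference -> expected target URI, copied from the examples above.
EXAMPLES = {
    "g:h": "g:h",
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "//g": "http://g",
    "?y": "http://a/b/c/d;p?y",
    "../..": "http://a/",
    "../../g": "http://a/g",
    "../../../g": "http://a/g",
    "/../g": "http://a/g",
    "..g": "http://a/b/c/..g",
    "./g/.": "http://a/b/c/g/",
    "g;x=1/../y": "http://a/b/c/y",
    "g?y/../x": "http://a/b/c/g?y/../x",
    "g#s/../x": "http://a/b/c/g#s/../x",
}


def failed_examples(base, examples):
    """Return {reference: actual} for every example the resolver gets wrong."""
    failures = {}
    for reference, expected in examples.items():
        actual = urljoin(base, reference)
        if actual != expected:
            failures[reference] = actual
    return failures


print(failed_examples(BASE, EXAMPLES))
```

For the "http:g" case, `urljoin` appears to take the backward-compatible reading and returns "http://a/b/c/g" rather than the strict "http:g".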
| 6. Normalization and Comparison | 6. Normalization and Comparison | |||
| One of the most common operations on URIs is simple comparison: | One of the most common operations on URIs is simple comparison: | |||
| determining if two URIs are equivalent without using the URIs to | determining if two URIs are equivalent without using the URIs to | |||
| access their respective resource(s). A comparison is performed every | access their respective resource(s). A comparison is performed every | |||
| time a response cache is accessed, a browser checks its history to | time a response cache is accessed, a browser checks its history to | |||
| color a link, or an XML parser processes tags within a namespace. | color a link, or an XML parser processes tags within a namespace. | |||
| Extensive normalization prior to comparison of URIs is often used by | Extensive normalization prior to comparison of URIs is often used by | |||
| spiders and indexing engines to prune a search space or reduce | spiders and indexing engines to prune a search space or reduce | |||
| duplication of request actions and response storage. | duplication of request actions and response storage. | |||
| URI comparison is performed with respect to some particular purpose, | URI comparison is performed with respect to some particular purpose, | |||
| and software with differing purposes will often be subject to | and implementations with differing purposes will often be subject to | |||
| differing design trade-offs with regard to how much effort should be | differing design trade-offs with regard to how much effort should be | |||
| spent in reducing duplicate identifiers. This section describes a | spent in reducing aliased identifiers. This section describes a | |||
| variety of methods that may be used to compare URIs, the trade-offs | variety of methods that may be used to compare URIs, the trade-offs | |||
| between them, and the types of applications that might use them. A | between them, and the types of applications that might use them. | |||
| canonical form for URI references is defined to reduce the occurrence | ||||
| of false negative comparisons. | ||||
| 6.1 Equivalence | 6.1 Equivalence | |||
| Since URIs exist to identify resources, presumably they should be | Since URIs exist to identify resources, presumably they should be | |||
| considered equivalent when they identify the same resource. However, | considered equivalent when they identify the same resource. However, | |||
| such a definition of equivalence is not of much practical use, since | such a definition of equivalence is not of much practical use, since | |||
| there is no way for software to compare two resources without | there is no way for an implementation to compare two resources that | |||
| knowledge of the implementation-specific syntax of each URI's | are not under its own control. For this reason, determination of | |||
| dereferencing algorithm. For this reason, determination of | ||||
| equivalence or difference of URIs is based on string comparison, | equivalence or difference of URIs is based on string comparison, | |||
| perhaps augmented by reference to additional rules provided by URI | perhaps augmented by reference to additional rules provided by URI | |||
| scheme definitions. We use the terms "different" and "equivalent" to | scheme definitions. We use the terms "different" and "equivalent" to | |||
| describe the possible outcomes of such comparisons, but there are | describe the possible outcomes of such comparisons, but there are | |||
| many application-dependent versions of equivalence. | many application-dependent versions of equivalence. | |||
| Even though it is possible to determine that two URIs are equivalent, | Even though it is possible to determine that two URIs are equivalent, | |||
| it is never possible to be sure that two URIs identify different | URI comparison is not sufficient to determine if two URIs identify | |||
| resources. For example, an owner of two different domain names could | different resources. For example, an owner of two different domain | |||
| decide to serve the same resource from both, resulting in two | names could decide to serve the same resource from both, resulting in | |||
| different URIs. Therefore, comparison methods are designed to | two different URIs. Therefore, comparison methods are designed to | |||
| minimize false negatives while strictly avoiding false positives. | minimize false negatives while strictly avoiding false positives. | |||
| In testing for equivalence, applications should not directly compare | In testing for equivalence, applications should not directly compare | |||
| relative URI references; the references should be converted to their | relative references; the references should be converted to their | |||
| target URI forms before comparison. When URIs are being compared for | respective target URIs before comparison. When URIs are being | |||
| the purpose of selecting (or avoiding) a network action, such as | compared for the purpose of selecting (or avoiding) a network action, | |||
| retrieval of a representation, the fragment components (if any) | such as retrieval of a representation, fragment components (if any) | |||
| should be excluded from the comparison. | should be excluded from the comparison. | |||
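The two recommendations above (resolve before comparing, and drop the fragment when the comparison selects a network action) might be combined as in this minimal sketch. It relies on `urljoin` and `urldefrag` from Python's standard library. The function name `same_for_retrieval` is hypothetical, and plain string equality stands in for whatever comparison method an application actually chooses.

```python
from urllib.parse import urldefrag, urljoin


def same_for_retrieval(base, first, second):
    """Compare two references for retrieval purposes.

    Each reference is resolved to its target URI against the base,
    then its fragment (if any) is removed before the comparison.
    """
    first_target = urldefrag(urljoin(base, first)).url
    second_target = urldefrag(urljoin(base, second)).url
    return first_target == second_target


base = "http://a/b/c/d;p?q"
print(same_for_retrieval(base, "g#s", "./g#t"))
print(same_for_retrieval(base, "g", "h"))
```

Comparing the raw strings "g#s" and "./g#t" directly would report a difference even though both lead to the same retrieval.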
| 6.2 Comparison Ladder | 6.2 Comparison Ladder | |||
| A variety of methods are used in practice to test URI equivalence. | A variety of methods are used in practice to test URI equivalence. | |||
| These methods fall into a range, distinguished by the amount of | These methods fall into a range, distinguished by the amount of | |||
| processing required and the degree to which the probability of false | processing required and the degree to which the probability of false | |||
| negatives is reduced. As noted above, false negatives cannot in | negatives is reduced. As noted above, false negatives cannot be | |||
| principle be eliminated. In practice, their probability can be | eliminated. In practice, their probability can be reduced, but this | |||
| reduced, but this reduction requires more processing and is not | reduction requires more processing and is not cost-effective for all | |||
| cost-effective for all applications. | applications. | |||
| If this range of comparison practices is considered as a ladder, the | If this range of comparison practices is considered as a ladder, the | |||
| following discussion will climb the ladder, starting with those | following discussion will climb the ladder, starting with those | |||
| practices that are cheap but have a relatively higher chance of | practices that are cheap but have a relatively higher chance of | |||
| producing false negatives, and proceeding to those that have higher | producing false negatives, and proceeding to those that have higher | |||
| computational cost and lower risk of false negatives. | computational cost and lower risk of false negatives. | |||
| 6.2.1 Simple String Comparison | 6.2.1 Simple String Comparison | |||
| If two URIs, considered as character strings, are identical, then it | If two URIs, considered as character strings, are identical, then it | |||
| skipping to change at page 38, line 38 ¶ | skipping to change at page 38, line 31 ¶ | |||
| Such character comparisons require that each pair of characters be | Such character comparisons require that each pair of characters be | |||
| put in comparable form. For example, should one URI be stored in a | put in comparable form. For example, should one URI be stored in a | |||
| byte array in EBCDIC encoding, and the second be in a Java String | byte array in EBCDIC encoding, and the second be in a Java String | |||
| object (UTF-16), bit-for-bit comparisons applied naively will produce | object (UTF-16), bit-for-bit comparisons applied naively will produce | |||
| errors. It is better to speak of equality on a | errors. It is better to speak of equality on a | |||
| character-for-character rather than byte-for-byte or bit-for-bit | character-for-character rather than byte-for-byte or bit-for-bit | |||
| basis. In practical terms, character-by-character comparisons should | basis. In practical terms, character-by-character comparisons should | |||
| be done codepoint-by-codepoint after conversion to a common character | be done codepoint-by-codepoint after conversion to a common character | |||
| encoding. | encoding. | |||
| False negatives are caused by the production and use of URI aliases. | ||||
| Unnecessary aliases can be reduced, regardless of the comparison | ||||
| method, by consistently providing URI references in an | ||||
| already-normalized form (i.e., a form identical to what would be | ||||
| produced after normalization is applied, as described below). | ||||
| Protocols and data formats often choose to limit some URI comparisons | ||||
| to simple string comparison, based on the theory that people and | ||||
| implementations will, in their own best interest, be consistent in | ||||
| providing URI references, or at least consistent enough to negate any | ||||
| efficiency that might be obtained from further normalization. | ||||
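The encoding pitfall described above can be demonstrated with a small sketch. It assumes that Python's "cp037" codec is an acceptable stand-in for "EBCDIC" and that a Python `str` plays the role of the UTF-16 string object. The name `same_characters` and the sample URI are hypothetical.

```python
def same_characters(stored_bytes, text, codec="cp037"):
    """Compare character by character after decoding to a common form."""
    return stored_bytes.decode(codec) == text


uri = "http://www.example.com/"
stored = uri.encode("cp037")  # one URI kept as an EBCDIC byte array

# A naive byte-for-byte check against a UTF-16 encoding fails,
# even though both hold the same characters.
print(stored == uri.encode("utf-16-le"))
print(same_characters(stored, uri))
```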
| 6.2.2 Syntax-based Normalization | 6.2.2 Syntax-based Normalization | |||
| Software may use logic based on the definitions provided by this | Implementations may use logic based on the definitions provided by | |||
| specification to reduce the probability of false negatives. Such | this specification to reduce the probability of false negatives. | |||
| processing is moderately higher in cost than character-for-character | Such processing is moderately higher in cost than | |||
| string comparison. For example, an application using this approach | character-for-character string comparison. For example, an | |||
| could reasonably consider the following two URIs equivalent: | application using this approach could reasonably consider the | |||
| following two URIs equivalent: | ||||
| example://a/b/c/%7Bfoo%7D | example://a/b/c/%7Bfoo%7D | |||
| eXAMPLE://a/./b/../b/%63/%7bfoo%7d | eXAMPLE://a/./b/../b/%63/%7bfoo%7d | |||
| Web user agents, such as browsers, typically apply this type of URI | Web user agents, such as browsers, typically apply this type of URI | |||
| normalization when determining whether a cached response is | normalization when determining whether a cached response is | |||
| available. Syntax-based normalization includes such techniques as | available. Syntax-based normalization includes such techniques as | |||
| case normalization, percent-encoding normalization, and removal of | case normalization, percent-encoding normalization, and removal of | |||
| dot-segments. | dot-segments. | |||
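The three techniques named above can be sketched together in Python. This is a simplified illustration rather than a complete normalizer. It assumes the path is absolute or empty, and it assumes the unreserved set is ALPHA, DIGIT, "-", ".", "_" and "~" as in RFC 3986, since that definition is not part of this excerpt. All function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: unreserved characters as defined in RFC 3986.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _fix_triplet(match):
    """Decode an unreserved triplet; otherwise uppercase its hex digits."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else "%" + match.group(1).upper()


def normalize_percent(text):
    return re.sub(r"%([0-9A-Fa-f]{2})", _fix_triplet, text)


def remove_dot_segments(path):
    """Remove "." and ".." segments from an absolute (or empty) path."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == "..":
            if len(output) > 1:  # never pop the leading empty segment
                output.pop()
        elif segment != ".":
            output.append(segment)
    if segments[-1] in (".", ".."):
        output.append("")  # keep the trailing slash
    return "/".join(output)


def normalize(uri):
    parts = urlsplit(uri)
    # Only the host is case-insensitive; leave any userinfo untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    path = normalize_percent(remove_dot_segments(parts.path))
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        path,
        normalize_percent(parts.query),
        normalize_percent(parts.fragment),
    ))


print(normalize("eXAMPLE://a/./b/../b/%63/%7bfoo%7d"))
print(normalize("HTTP://www.EXAMPLE.com/"))
```

Under these assumptions, the two example URIs from the text normalize to the same string, so a simple string comparison of the normalized forms treats them as equivalent.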
| 6.2.2.1 Case Normalization | 6.2.2.1 Case Normalization | |||
| When a URI scheme uses components of the generic syntax, it will also | For all URIs, the hexadecimal digits within a percent-encoding | |||
| use the common syntax equivalence rules, namely that the scheme and | triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore | |||
| should be normalized to use uppercase letters for the digits A-F. | ||||
| When a URI uses components of the generic syntax, the component | ||||
| syntax equivalence rules always apply; namely, that the scheme and | ||||
| host are case-insensitive and therefore should be normalized to | host are case-insensitive and therefore should be normalized to | |||
| lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is | lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is | |||
| equivalent to <http://www.example.com/>. Applications should not | equivalent to <http://www.example.com/>. The other generic syntax | |||
| assume anything about the case sensitivity of other URI components, | components are assumed to be case-sensitive unless specifically | |||
| since that is dependent on the implementation used to handle a | defined otherwise by the scheme (see Section 6.2.3). | |||
| dereference. | ||||
| The hexadecimal digits within a percent-encoding triplet (e.g., "%3a" | ||||
| versus "%3A") are case-insensitive and therefore should be normalized | ||||
| to use uppercase letters for the digits A-F. | ||||
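
The case rules in 6.2.2.1 can be sketched in a few lines of Python. This is an illustrative sketch, not text from either draft: the function name `case_normalize` is hypothetical, and the component-splitting regular expression is assumed to be the one given in Appendix B of the draft. Only the scheme and host are lowercased; userinfo, path, query, and fragment are left alone apart from uppercasing the hex digits of percent-encoded triplets.

```python
import re

# Component-splitting expression, assumed to match Appendix B of the draft.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def case_normalize(uri):
    """Lowercase scheme and host; uppercase hex digits in percent-encodings."""
    m = URI_RE.match(uri)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    out = ""
    if scheme is not None:
        out += scheme.lower() + ":"
    if authority is not None:
        # userinfo (before the last "@") is kept as written;
        # host and port (digits only) are lowercased together.
        userinfo, at, hostport = authority.rpartition("@")
        out += "//" + userinfo + at + hostport.lower()
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    # Applied last so that triplets inside the host end up uppercase too.
    return PCT_RE.sub(lambda t: t.group(0).upper(), out)
```

For instance, `case_normalize("HTTP://www.EXAMPLE.com/")` yields `http://www.example.com/`, matching the example in the text.
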
| 6.2.2.2 Percent-Encoding Normalization | 6.2.2.2 Percent-Encoding Normalization | |||
| The percent-encoding mechanism (Section 2.1) is a frequent source of | The percent-encoding mechanism (Section 2.1) is a frequent source of | |||
| variance among otherwise identical URIs. In addition to the | variance among otherwise identical URIs. In addition to the case | |||
| case-insensitivity issue noted above, some URI producers | normalization issue noted above, some URI producers percent-encode | |||
| percent-encode octets that do not require percent-encoding, resulting | octets that do not require percent-encoding, resulting in URIs that | |||
| in URIs that are equivalent to their non-encoded counterparts. Such | are equivalent to their non-encoded counterparts. Such URIs should | |||
| URIs should be normalized by decoding any percent-encoded octet that | be normalized by decoding any percent-encoded octet that corresponds | |||
| corresponds to an unreserved character, as described in Section 2.3. | to an unreserved character, as described in Section 2.3. | |||
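
A minimal sketch of the percent-encoding normalization described in 6.2.2.2 follows. The name `decode_unreserved` is hypothetical, and the unreserved set (ALPHA, DIGIT, "-", ".", "_", "~") is assumed from Section 2.3 of the draft. Triplets that decode to an unreserved character are replaced by that character; every other triplet is kept and only has its hex digits uppercased.

```python
import re

# Unreserved characters, assumed from Section 2.3 of the draft.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text):
    """Decode percent-encoded unreserved characters; leave the rest encoded."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Still needs encoding: keep the triplet, with uppercase hex digits.
        return match.group(0).upper()

    return PCT_RE.sub(replace, text)
```

Applied to the path of the earlier example, `decode_unreserved("/b/%63/%7bfoo%7d")` gives `/b/c/%7Bfoo%7D`: the "c" is decoded, while the braces stay encoded because they are not unreserved.
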
| 6.2.2.3 Path Segment Normalization | 6.2.2.3 Path Segment Normalization | |||
| The complete path segments "." and ".." have a special meaning within | The complete path segments "." and ".." are intended only for use | |||
| hierarchical URI schemes. As such, they should not appear in | within relative references (Section 4.1) and are removed as part of | |||
| absolute paths; if they are found, they can be removed by applying | the reference resolution process (Section 5.2). However, some | |||
| the remove_dot_segments algorithm to the path, as described in | deployed implementations incorrectly assume that reference resolution | |||
| Section 5.2. | is not necessary when the reference is already a URI, and thus fail | |||
| to remove dot-segments when they occur in non-relative paths. URI | ||||
| normalizers should remove dot-segments by applying the | ||||
| remove_dot_segments algorithm to the path, as described in | ||||
| Section 5.2.4. | ||||
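
The dot-segment removal referred to above can be sketched as follows. This is an illustration written for this comparison, not the normative text of Section 5.2.4: it assumes the usual input-buffer/output-buffer formulation, in which "." segments are dropped and ".." segments discard the most recently emitted segment.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments from a path (sketch of Section 5.2.4)."""
    output = []  # segments emitted so far, each with its leading "/" if any
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()         # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if present)
            # from the input to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)
```

With the path from the 6.2.2 example, `remove_dot_segments("/./b/../b/%63/%7bfoo%7d")` returns `/b/%63/%7bfoo%7d`.
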
| 6.2.3 Scheme-based Normalization | 6.2.3 Scheme-based Normalization | |||
| The syntax and semantics of URIs vary from scheme to scheme, as | The syntax and semantics of URIs vary from scheme to scheme, as | |||
| described by the defining specification for each scheme. Software | described by the defining specification for each scheme. | |||
| may use scheme-specific rules, at further processing cost, to reduce | Implementations may use scheme-specific rules, at further processing | |||
| the probability of false negatives. For example, since the "http" | cost, to reduce the probability of false negatives. For example, | |||
| scheme makes use of an authority component, has a default port of | since the "http" scheme makes use of an authority component, has a | |||
| "80", and defines an empty path to be equivalent to "/", the | default port of "80", and defines an empty path to be equivalent to | |||
| following four URIs are equivalent: | "/", the following four URIs are equivalent: | |||
| http://example.com | http://example.com | |||
| http://example.com/ | http://example.com/ | |||
| http://example.com:/ | http://example.com:/ | |||
| http://example.com:80/ | http://example.com:80/ | |||
| In general, a URI that uses the generic syntax for authority with an | In general, a URI that uses the generic syntax for authority with an | |||
| empty path should be normalized to a path of "/"; likewise, an | empty path should be normalized to a path of "/"; likewise, an | |||
| explicit ":port", where the port is empty or the default for the | explicit ":port", where the port is empty or the default for the | |||
| scheme, is equivalent to one where the port and its ":" delimiter are | scheme, is equivalent to one where the port and its ":" delimiter are | |||
| elided. In other words, the second of the above URI examples is the | elided, and thus should be removed by scheme-based normalization. | |||
| normal form for the "http" scheme. | For example, the second URI above is the normal form for the "http" | |||
| scheme. | ||||
| Another case where normalization varies by scheme is in the handling | Another case where normalization varies by scheme is in the handling | |||
| of an empty authority component or empty host subcomponent. For many | of an empty authority component or empty host subcomponent. For many | |||
| scheme specifications, an empty authority or host is considered an | scheme specifications, an empty authority or host is considered an | |||
| error; for others, it is considered equivalent to "localhost" or the | error; for others, it is considered equivalent to "localhost" or the | |||
| end-user's host. When a scheme defines a default for authority and a | end-user's host. When a scheme defines a default for authority and a | |||
| URI reference to that default is desired, the reference should have | URI reference to that default is desired, the reference should be | |||
| an empty authority for the sake of uniformity, brevity, and | normalized to an empty authority for the sake of uniformity, brevity, | |||
| internationalization. If, however, either the userinfo or port | and internationalization. If, however, either the userinfo or port | |||
| subcomponent is non-empty, then the host should be given explicitly | subcomponent is non-empty, then the host should be given explicitly | |||
| even if it matches the default. | even if it matches the default. | |||
| Normalization should not remove delimiters when their associated | ||||
| component is empty unless licensed to do so by the scheme | ||||
| specification. For example, the URI "http://example.com/?" cannot be | ||||
| assumed to be equivalent to any of the examples above. Likewise, the | ||||
| presence or absence of delimiters within a userinfo subcomponent is | ||||
| usually significant to its interpretation. The fragment component is | ||||
| not subject to any scheme-based normalization; thus, two URIs that | ||||
| differ only by the suffix "#" are considered different regardless of | ||||
| the scheme. | ||||
| Some schemes define additional subcomponents that consist of | ||||
| case-insensitive data, giving an implicit license to normalizers to | ||||
| convert such data to a common case (e.g., all lowercase). For | ||||
| example, URI schemes that define a subcomponent of path to contain an | ||||
| Internet hostname, such as the "mailto" URI scheme, cause that | ||||
| subcomponent to be case-insensitive and thus subject to case | ||||
| normalization (e.g., "mailto:Joe@Example.COM" is equivalent to | ||||
| "mailto:Joe@example.com" even though the generic syntax considers the | ||||
| path component to be case-sensitive). | ||||
| Other scheme-specific normalizations are possible. | ||||
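
The "http" rules above (empty path becomes "/", empty or default port is dropped) can be sketched like this. The function name `scheme_normalize` and the `DEFAULT_PORTS` table are hypothetical; only the "http" default of "80" comes from the text, and the splitting expression is assumed to be the one from Appendix B. Delimiters of other empty components, such as a trailing "?", are deliberately preserved.

```python
import re

# Component-splitting expression, assumed to match Appendix B of the draft.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Only the "http" entry is taken from the text; extend as needed.
DEFAULT_PORTS = {"http": "80"}


def scheme_normalize(uri):
    """Drop an empty or default port and replace an empty path with "/"."""
    m = URI_RE.match(uri)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    if scheme is None or authority is None:
        return uri  # rules here only cover URIs with an authority component
    userinfo, at, hostport = authority.rpartition("@")
    host, colon, port = hostport.rpartition(":")
    # A "]" after the last ":" means the colon belongs to an IP literal.
    if colon and "]" not in port:
        if port == "" or port == DEFAULT_PORTS.get(scheme.lower()):
            hostport = host
    if path == "":
        path = "/"
    out = scheme + "://" + userinfo + at + hostport + path
    if query is not None:
        out += "?" + query  # an empty query keeps its "?" delimiter
    if fragment is not None:
        out += "#" + fragment
    return out
```

All four example URIs above map to `http://example.com/`, while `http://example.com/?` is returned unchanged, so it does not compare equal to them.
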
| 6.2.4 Protocol-based Normalization | 6.2.4 Protocol-based Normalization | |||
| Web spiders, for which substantial effort to reduce the incidence of | Web spiders, for which substantial effort to reduce the incidence of | |||
| false negatives is often cost-effective, are observed to implement | false negatives is often cost-effective, are observed to implement | |||
| even more aggressive techniques in URI comparison. For example, if | even more aggressive techniques in URI comparison. For example, if | |||
| they observe that a URI such as | they observe that a URI such as | |||
| http://example.com/data | http://example.com/data | |||
| redirects to a URI differing only in the trailing slash | redirects to a URI differing only in the trailing slash | |||
| http://example.com/data/ | http://example.com/data/ | |||
| they will likely regard the two as equivalent in the future. This | they will likely regard the two as equivalent in the future. This | |||
| kind of technique is only appropriate when equivalence is clearly | kind of technique is only appropriate when equivalence is clearly | |||
| indicated by both the result of accessing the resources and the | indicated by both the result of accessing the resources and the | |||
| common conventions of their scheme's dereference algorithm (in this | common conventions of their scheme's dereference algorithm (in this | |||
| case, use of redirection by HTTP origin servers to avoid problems | case, use of redirection by HTTP origin servers to avoid problems | |||
| with relative references). | with relative references). | |||
| 6.3 Canonical Form | ||||
| It is in the best interests of everyone concerned to avoid | ||||
| false-negatives in comparing URIs and to minimize the amount of | ||||
| software processing for such comparisons. Those who produce and make | ||||
| reference to URIs can reduce the cost of processing and the risk of | ||||
| false negatives by consistently providing them in a form that is | ||||
| reasonably canonical with respect to their scheme. Specifically: | ||||
| o Always provide the URI scheme in lowercase characters. | ||||
| o Always provide the host, if any, in lowercase characters. | ||||
| o Only perform percent-encoding where it is essential. | ||||
| o Always use uppercase A-through-F characters when percent-encoding. | ||||
| o Prevent dot-segments appearing in non-relative URI paths. | ||||
| o For schemes that define a default authority, use an empty | ||||
| authority if the default is desired. | ||||
| o For schemes that define an empty path to be equivalent to a path | ||||
| of "/", use "/". | ||||
| 7. Security Considerations | 7. Security Considerations | |||
| A URI does not in itself pose a security threat. However, since URIs | A URI does not in itself pose a security threat. However, since URIs | |||
| are often used to provide a compact set of instructions for access to | are often used to provide a compact set of instructions for access to | |||
| network resources, care must be taken to properly interpret the data | network resources, care must be taken to properly interpret the data | |||
| within a URI, to prevent that data from causing unintended access, | within a URI, to prevent that data from causing unintended access, | |||
| and to avoid including data that should not be revealed in plain | and to avoid including data that should not be revealed in plain | |||
| text. | text. | |||
| 7.1 Reliability and Consistency | 7.1 Reliability and Consistency | |||
| skipping to change at page 44, line 42 ¶ | skipping to change at page 45, line 9 ¶ | |||
| impact of such attacks by distinguishing the various components of | impact of such attacks by distinguishing the various components of | |||
| the URI when rendered, such as by using a different color or tone to | the URI when rendered, such as by using a different color or tone to | |||
| render userinfo if any is present, though there is no general | render userinfo if any is present, though there is no general | |||
| panacea. More information on URI-based semantic attacks can be found | panacea. More information on URI-based semantic attacks can be found | |||
| in [Siedzik]. | in [Siedzik]. | |||
| 8. IANA Considerations | 8. IANA Considerations | |||
| URI scheme names, as defined by <scheme> in Section 3.1, form a | URI scheme names, as defined by <scheme> in Section 3.1, form a | |||
| registered name space that is managed by IANA according to the | registered name space that is managed by IANA according to the | |||
| procedures defined in [BCP35]. | procedures defined in [BCP35]. No IANA actions are required by this | |||
| document. | ||||
| 9. Acknowledgments | 9. Acknowledgments | |||
| This specification is derived from RFC 2396 [RFC2396], RFC 1808 | This specification is derived from RFC 2396 [RFC2396], RFC 1808 | |||
| [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those | [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those | |||
| documents still apply. It also incorporates the update (with | documents still apply. It also incorporates the update (with | |||
| corrections) for IPv6 literals in the host syntax, as defined by | corrections) for IPv6 literals in the host syntax, as defined by | |||
| Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in | Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in | |||
| [RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz, | [RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz, | |||
| Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll, | Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll, | |||
| skipping to change at page 47, line 12 ¶ | skipping to change at page 47, line 48 ¶ | |||
| (URNs): Clarifications and Recommendations", RFC 3305, | (URNs): Clarifications and Recommendations", RFC 3305, | |||
| August 2002. | August 2002. | |||
| [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| RFC 3490, March 2003. | RFC 3490, March 2003. | |||
| [RFC3513] Hinden, R. and S. Deering, "Internet Protocol Version 6 | [RFC3513] Hinden, R. and S. Deering, "Internet Protocol Version 6 | |||
| (IPv6) Addressing Architecture", RFC 3513, April 2003. | (IPv6) Addressing Architecture", RFC 3513, April 2003. | |||
| [Siedzik] Siedzik, R., "Semantic Attacks: What's in a URL?", April | [Siedzik] Siedzik, R., "Semantic Attacks: What's in a URL?", | |||
| 2001, <http://www.giac.org/practical/gsec/ | April 2001, <http://www.giac.org/practical/gsec/ | |||
| Richard_Siedzik_GSEC.pdf>. | Richard_Siedzik_GSEC.pdf>. | |||
| Authors' Addresses | Authors' Addresses | |||
| Tim Berners-Lee | Tim Berners-Lee | |||
| World Wide Web Consortium | World Wide Web Consortium | |||
| Massachusetts Institute of Technology | Massachusetts Institute of Technology | |||
| 77 Massachusetts Avenue | 77 Massachusetts Avenue | |||
| Cambridge, MA 02139 | Cambridge, MA 02139 | |||
| USA | USA | |||
| skipping to change at page 48, line 14 ¶ | skipping to change at page 49, line 14 ¶ | |||
| Appendix A. Collected ABNF for URI | Appendix A. Collected ABNF for URI | |||
| URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | |||
| hier-part = "//" authority path-abempty | hier-part = "//" authority path-abempty | |||
| / path-absolute | / path-absolute | |||
| / path-rootless | / path-rootless | |||
| / path-empty | / path-empty | |||
| URI-reference = URI / relative-URI | URI-reference = URI / relative-ref | |||
| absolute-URI = scheme ":" hier-part [ "?" query ] | absolute-URI = scheme ":" hier-part [ "?" query ] | |||
| relative-URI = relative-part [ "?" query ] [ "#" fragment ] | relative-ref = relative-part [ "?" query ] [ "#" fragment ] | |||
| relative-part = "//" authority path-abempty | relative-part = "//" authority path-abempty | |||
| / path-absolute | / path-absolute | |||
| / path-noscheme | / path-noscheme | |||
| / path-empty | / path-empty | |||
| scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | |||
| authority = [ userinfo "@" ] host [ ":" port ] | authority = [ userinfo "@" ] host [ ":" port ] | |||
| userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) | userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) | |||
| skipping to change at page 51, line 11 ¶ | skipping to change at page 52, line 16 ¶ | |||
| URIs are often transmitted through formats that do not provide a | URIs are often transmitted through formats that do not provide a | |||
| clear context for their interpretation. For example, there are many | clear context for their interpretation. For example, there are many | |||
| occasions when a URI is included in plain text; examples include text | occasions when a URI is included in plain text; examples include text | |||
| sent in electronic mail, USENET news messages, and, most importantly, | sent in electronic mail, USENET news messages, and, most importantly, | |||
| printed on paper. In such cases, it is important to be able to | printed on paper. In such cases, it is important to be able to | |||
| delimit the URI from the rest of the text, and in particular from | delimit the URI from the rest of the text, and in particular from | |||
| punctuation marks that might be mistaken for part of the URI. | punctuation marks that might be mistaken for part of the URI. | |||
| In practice, URIs are delimited in a variety of ways, but usually | In practice, URIs are delimited in a variety of ways, but usually | |||
| within double-quotes "http://example.com/", angle brackets <http:// | within double-quotes "http://example.com/", angle brackets | |||
| example.com/>, or just using whitespace | <http://example.com/>, or just using whitespace | |||
| http://example.com/ | http://example.com/ | |||
| These wrappers do not form part of the URI. | These wrappers do not form part of the URI. | |||
| In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may | In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may | |||
| need to be added to break a long URI across lines. The whitespace | need to be added to break a long URI across lines. The whitespace | |||
| should be ignored when extracting the URI. | should be ignored when extracting the URI. | |||
| No whitespace should be introduced after a hyphen ("-") character. | No whitespace should be introduced after a hyphen ("-") character. | |||
| skipping to change at page 52, line 5 ¶ | skipping to change at page 53, line 18 ¶ | |||
| but you can probably pick it up from <ftp://foo.example. | but you can probably pick it up from <ftp://foo.example. | |||
| com/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/ | com/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/ | |||
| ietf/uri/historical.html#WARNING>. | ietf/uri/historical.html#WARNING>. | |||
| contains the URI references | contains the URI references | |||
| http://www.w3.org/Addressing/ | http://www.w3.org/Addressing/ | |||
| ftp://foo.example.com/rfc/ | ftp://foo.example.com/rfc/ | |||
| http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING | http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING | |||
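
The extraction described in this appendix can be sketched for the angle-bracket case as follows. The helper name `extract_bracketed_uris` is hypothetical, and the sketch handles only `<...>` wrappers, not double-quotes or bare whitespace delimiting. Any whitespace found inside the brackets is treated as a line-break artifact and removed.

```python
import re


def extract_bracketed_uris(text):
    """Return URI references wrapped in <...>, with inner whitespace removed."""
    candidates = re.findall(r"<([^<>]*)>", text)
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and line breaks.
    return ["".join(candidate.split()) for candidate in candidates]


sample = (
    "but you can probably pick it up from <ftp://foo.example.\n"
    "com/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/\n"
    "ietf/uri/historical.html#WARNING>."
)
uris = extract_bracketed_uris(sample)
```

Here `uris` holds the last two references listed above, each rejoined across its line break.
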
| Appendix D. Summary of Non-editorial Changes | Appendix D. Changes from RFC 2396 | |||
| D.1 Additions | D.1 Additions | |||
| An ABNF rule for URI has been introduced to correspond to one common | ||||
| usage of the term: an absolute URI with optional fragment. | ||||
| IPv6 (and later) literals have been added to the list of possible | IPv6 (and later) literals have been added to the list of possible | |||
| identifiers for the host portion of a authority component, as | identifiers for the host portion of an authority component, as | |||
| described by [RFC2732], with the addition of "[" and "]" to the | described by [RFC2732], with the addition of "[" and "]" to the | |||
| reserved set and a version flag to anticipate future versions of IP | reserved set and a version flag to anticipate future versions of IP | |||
| literals. Square brackets are now specified as reserved within the | literals. Square brackets are now specified as reserved within the | |||
| authority component and not allowed outside their use as delimiters | authority component and not allowed outside their use as delimiters | |||
| for an IP literal within host. In order to make this change without | for an IP literal within host. In order to make this change without | |||
| changing the technical definition of the path, query, and fragment | changing the technical definition of the path, query, and fragment | |||
| components, those rules were redefined to directly specify the | components, those rules were redefined to directly specify the | |||
| characters allowed rather than be defined in terms of uric. | characters allowed. | |||
| Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal | Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal | |||
| address, which unfortunately lacks an ABNF description of | address, which unfortunately lacks an ABNF description of | |||
| IPv6address, we created a new ABNF rule for IPv6address that matches | IPv6address, we created a new ABNF rule for IPv6address that matches | |||
| the text representations defined by Section 2.2 of [RFC3513]. | the text representations defined by Section 2.2 of [RFC3513]. | |||
| Likewise, the definition of IPv4address has been improved in order to | Likewise, the definition of IPv4address has been improved in order to | |||
| limit each decimal octet to the range 0-255. | limit each decimal octet to the range 0-255. | |||
| Section 6 on URI normalization and comparison has been | Section 6 on URI normalization and comparison has been | |||
| completely rewritten and extended using input from Tim Bray and | completely rewritten and extended using input from Tim Bray and | |||
| discussion within the W3C Technical Architecture Group. | discussion within the W3C Technical Architecture Group. | |||
| An ABNF rule for URI has been introduced to correspond to the common | D.2 Modifications | |||
| usage of the term: an absolute URI with optional fragment. | ||||
| D.2 Modifications from RFC 2396 | The ad-hoc BNF syntax of RFC 2396 has been replaced with the ABNF of | |||
| [RFC2234]. This change required all rule names that formerly | ||||
| included underscore characters to be renamed with a dash instead. In | ||||
| addition, a number of syntax rules have been eliminated or simplified | ||||
| to make the overall grammar more comprehensible. Specifications that | ||||
| refer to the obsolete grammar rules may be understood by replacing | ||||
| those rules according to the following table: | ||||
| The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234]. | +----------------+--------------------------------------------------+ | |||
| This change required all rule names that formerly included underscore | | obsolete rule | translation | | |||
| characters to be renamed with a dash instead. | +----------------+--------------------------------------------------+ | |||
| | absoluteURI | absolute-URI | | ||||
| | relativeURI | relative-part [ "?" query ] | | ||||
| | hier_part | ( "//" authority path-abempty / | | ||||
| | | path-absolute ) [ "?" query ] | | ||||
| | | | | ||||
| | opaque_part | path-rootless [ "?" query ] | | ||||
| | net_path | "//" authority path-abempty | | ||||
| | abs_path | path-absolute | | ||||
| | rel_path | path-rootless | | ||||
| | rel_segment | segment-nz-nc | | ||||
| | reg_name | reg-name | | ||||
| | server | authority | | ||||
| | hostport | host [ ":" port ] | | ||||
| | hostname | reg-name | | ||||
| | path_segments | path-abempty | | ||||
| | param | *<pchar excluding ";"> | | ||||
| | | | | ||||
| | uric | unreserved / pct-encoded / ";" / "?" / ":" | | ||||
| | | / "@" / "&" / "=" / "+" / "$" / "," / "/" | | ||||
| | | | | ||||
| | uric_no_slash | unreserved / pct-encoded / ";" / "?" / ":" | | ||||
| | | / "@" / "&" / "=" / "+" / "$" / "," | | ||||
| | | | | ||||
| | mark | "-" / "_" / "." / "!" / "~" / "*" / "'" | | ||||
| | | / "(" / ")" | | ||||
| | | | | ||||
| | escaped | pct-encoded | | ||||
| | hex | HEXDIG | | ||||
| | alphanum | ALPHA / DIGIT | | ||||
| +----------------+--------------------------------------------------+ | ||||
| Use of the above obsolete rules for the definition of scheme-specific | ||||
| syntax is deprecated. | ||||
| Section 2 on characters has been rewritten to explain what characters | Section 2 on characters has been rewritten to explain what characters | |||
| are reserved, when they are reserved, and why they are reserved even | are reserved, when they are reserved, and why they are reserved even | |||
| when not used as delimiters by the generic syntax. The mark | when not used as delimiters by the generic syntax. The mark | |||
| characters that are typically unsafe to decode, including the | characters that are typically unsafe to decode, including the | |||
| exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open | exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open | |||
| and close parentheses ("(" and ")"), have been moved to the reserved | and close parentheses ("(" and ")"), have been moved to the reserved | |||
| set in order to clarify the distinction between reserved and | set in order to clarify the distinction between reserved and | |||
| unreserved and hopefully answer the most common question of scheme | unreserved and hopefully answer the most common question of scheme | |||
| designers. Likewise, the section on percent-encoded characters has | designers. Likewise, the section on percent-encoded characters has | |||
| skipping to change at page 53, line 11 ¶ | skipping to change at page 55, line 26 ¶ | |||
| In general, the terms "escaped" and "unescaped" have been replaced | In general, the terms "escaped" and "unescaped" have been replaced | |||
| with "percent-encoded" and "decoded", respectively, to reduce | with "percent-encoded" and "decoded", respectively, to reduce | |||
| confusion with other forms of escape mechanisms. | confusion with other forms of escape mechanisms. | |||
| The ABNF for URI and URI-reference has been redesigned to make them | The ABNF for URI and URI-reference has been redesigned to make them | |||
| more friendly to LALR parsers and reduce complexity. As a result, | more friendly to LALR parsers and reduce complexity. As a result, | |||
| the layout form of syntax description has been removed, along with | the layout form of syntax description has been removed, along with | |||
| the uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path, | the uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path, | |||
| path_segments, rel_segment, and mark rules. All references to | path_segments, rel_segment, and mark rules. All references to | |||
| "opaque" URIs have been replaced with a better description of how the | "opaque" URIs have been replaced with a better description of how the | |||
| path component may be opaque to hierarchy. The ambiguity regarding | path component may be opaque to hierarchy. The relativeURI rule has | |||
| the parsing of URI-reference as a URI or a relative-URI with a colon | been replaced with relative-ref to avoid unnecessary confusion over | |||
| in the first segment has been eliminated through the use of five | whether or not they are a subset of URI. The ambiguity regarding the | |||
| parsing of URI-reference as a URI or a relative-ref with a colon in | ||||
| the first segment has been eliminated through the use of five | ||||
| separate path matching rules. | separate path matching rules. | |||
| The fragment identifier has been moved back into the section on | The fragment identifier has been moved back into the section on | |||
| generic syntax components and within the URI and relative-URI rules, | generic syntax components and within the URI and relative-ref rules, | |||
| though it remains excluded from absolute-URI. The number sign ("#") | though it remains excluded from absolute-URI. The number sign ("#") | |||
| character has been moved back to the reserved set as a result of | character has been moved back to the reserved set as a result of | |||
| reintegrating the fragment syntax. | reintegrating the fragment syntax. | |||
| The ABNF has been corrected to allow a relative path to be empty. | The ABNF has been corrected to allow the path component to be empty. | |||
| This also allows an absolute-URI to consist of nothing after the | This also allows an absolute-URI to consist of nothing after the | |||
| "scheme:", as is present in practice with the "dav:" namespace | "scheme:", as is present in practice with the "dav:" namespace | |||
| [RFC2518] and the "about:" scheme used internally by many WWW browser | [RFC2518] and the "about:" scheme used internally by many WWW browser | |||
| implementations. The ambiguity regarding the boundary between | implementations. The ambiguity regarding the boundary between | |||
| authority and path has been eliminated through the use of five | authority and path has been eliminated through the use of five | |||
| separate path matching rules. | separate path matching rules. | |||
| Registry-based naming authorities that use the generic syntax are now | Registry-based naming authorities that use the generic syntax are now | |||
| defined within the host rule. This change allows current | defined within the host rule. This change allows current | |||
| implementations, where whatever name provided is simply fed to the | implementations, where whatever name provided is simply fed to the | |||
| skipping to change at page 55, line 10 ¶ | skipping to change at page 57, line 10 ¶ | |||
| of the normative references are updated prior to publication, the | of the normative references are updated prior to publication, the | |||
| associated reference in this document can be safely updated as well. | associated reference in this document can be safely updated as well. | |||
| This document has been produced using the xml2rfc tool set; the XML | This document has been produced using the xml2rfc tool set; the XML | |||
| version can be obtained via the URI listed in the editorial note. | version can be obtained via the URI listed in the editorial note. | |||
| Index | Index | |||
| A | A | |||
| ABNF 11 | ABNF 11 | |||
| absolute 26 | absolute 26 | |||
| absolute-path 25 | absolute-path 26 | |||
| absolute-URI 26 | absolute-URI 26 | |||
| access 9 | access 9 | |||
| authority 15, 17 | authority 16, 17 | |||
| B | B | |||
| base URI 28 | base URI 28 | |||
| C | C | |||
| character encoding 4 | character encoding 4 | |||
| character 4 | character 4 | |||
| characters 11 | characters 11 | |||
| coded character set 4 | coded character set 4 | |||
| D | D | |||
| dec-octet 20 | dec-octet 20 | |||
| dereference 9 | dereference 9 | |||
| dot-segments 22 | dot-segments 22 | |||
| F | F | |||
| fragment 15, 23 | fragment 16, 24 | |||
| G | G | |||
| gen-delims 12 | gen-delims 12 | |||
| generic syntax 6 | generic syntax 6 | |||
| H | H | |||
| h16 19 | h16 19 | |||
| hier-part 15 | hier-part 16 | |||
| hierarchical 10 | hierarchical 10 | |||
| host 18 | host 18 | |||
| I | I | |||
| identifier 5 | identifier 5 | |||
| IP-literal 19 | IP-literal 19 | |||
| IPv4 20 | IPv4 20 | |||
| IPv4address 20 | IPv4address 20 | |||
| IPv6 19 | IPv6 19 | |||
| IPv6address 19 | IPv6address 19, 20 | |||
| IPvFuture 19 | IPvFuture 19 | |||
| L | L | |||
| locator 7 | locator 7 | |||
| ls32 19 | ls32 19 | |||
| M | M | |||
| merge 31 | merge 32 | |||
| N | N | |||
| name 7 | name 7 | |||
| network-path 25 | network-path 26 | |||
| P | P | |||
| path 15, 21 | path 16, 22 | |||
| path-abempty 21 | path-abempty 22 | |||
| path-absolute 21 | path-absolute 22 | |||
| path-empty 21 | path-empty 22 | |||
| path-noscheme 21 | path-noscheme 22 | |||
| path-rootless 21 | path-rootless 22 | |||
| path-abempty 15 | path-abempty 16 | |||
| path-absolute 15 | path-absolute 16 | |||
| path-empty 15 | path-empty 16 | |||
| path-rootless 15 | path-rootless 16 | |||
| pchar 21 | pchar 22 | |||
| pct-encoded 12 | pct-encoded 12 | |||
| percent-encoding 12 | percent-encoding 12 | |||
| port 21 | port 21 | |||
| Q | Q | |||
| query 15, 23 | query 16, 23 | |||
| R | R | |||
| reg-name 20 | reg-name 20 | |||
| registered name 20 | registered name 20 | |||
| relative 10, 28 | relative 10, 28 | |||
| relative-path 25 | relative-path 26 | |||
| relative-URI 25 | relative-ref 26 | |||
| remove_dot_segments 31 | remove_dot_segments 32 | |||
| representation 9 | representation 9 | |||
| reserved 12 | reserved 12 | |||
| resolution 9, 28 | resolution 9, 28 | |||
| resource 5 | resource 5 | |||
| retrieval 9 | retrieval 9 | |||
| S | S | |||
| same-document 26 | same-document 27 | |||
| sameness 9 | sameness 9 | |||
| scheme 15, 16 | scheme 16, 16 | |||
| segment 21 | segment 22 | |||
| segment-nz 21 | segment-nz 22 | |||
| segment-nz-nc 21 | segment-nz-nc 22 | |||
| sub-delims 12 | sub-delims 12 | |||
| suffix 27 | suffix 27 | |||
| T | T | |||
| transcription 7 | transcription 7 | |||
| U | U | |||
| uniform 4 | uniform 4 | |||
| unreserved 13 | unreserved 13 | |||
| URI grammar | URI grammar | |||
| skipping to change at page 57, line 27 ¶ | skipping to change at page 59, line 27 ¶ | |||
| DIGIT 11 | DIGIT 11 | |||
| DQUOTE 11 | DQUOTE 11 | |||
| fragment 16, 24, 26 | fragment 16, 24, 26 | |||
| gen-delims 12 | gen-delims 12 | |||
| h16 19 | h16 19 | |||
| HEXDIG 11 | HEXDIG 11 | |||
| hier-part 16 | hier-part 16 | |||
| host 17, 18 | host 17, 18 | |||
| IP-literal 19 | IP-literal 19 | |||
| IPv4address 20 | IPv4address 20 | |||
| IPv6address 19, 19 | IPv6address 19, 20 | |||
| IPvFuture 19 | IPvFuture 19 | |||
| LF 11 | LF 11 | |||
| ls32 19 | ls32 19 | |||
| mark 13 | mark 13 | |||
| OCTET 11 | OCTET 11 | |||
| path 22 | path 22 | |||
| path-abempty 16, 22 | path-abempty 16, 22 | |||
| path-absolute 16, 22 | path-absolute 16, 22 | |||
| path-empty 16, 22 | path-empty 16, 22 | |||
| path-noscheme 22 | path-noscheme 22 | |||
| path-rootless 16, 22 | path-rootless 16, 22 | |||
| pchar 22, 23, 24 | pchar 22, 23, 24 | |||
| pct-encoded 12 | pct-encoded 12 | |||
| port 17, 21 | port 17, 21 | |||
| query 16, 23, 26, 26 | query 16, 23, 26, 26 | |||
| reg-name 20 | reg-name 20 | |||
| relative-URI 25, 26 | relative-ref 25, 26 | |||
| reserved 12 | reserved 12 | |||
| scheme 16, 16, 26 | scheme 16, 16, 26 | |||
| segment 22 | segment 22 | |||
| segment-nz 22 | segment-nz 22 | |||
| segment-nz-nc 22 | segment-nz-nc 22 | |||
| SP 11 | SP 11 | |||
| sub-delims 12 | sub-delims 12 | |||
| unreserved 13 | unreserved 13 | |||
| URI 16, 25 | URI 16, 25 | |||
| URI-reference 25 | URI-reference 25 | |||
| userinfo 17, 17 | userinfo 17, 18 | |||
| URI 15 | URI 16 | |||
| URI-reference 25 | URI-reference 25 | |||
| URL 7 | URL 7 | |||
| URN 7 | URN 7 | |||
| userinfo 17 | userinfo 17, 18 | |||
| Intellectual Property Statement | Intellectual Property Statement | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
| End of changes. 87 change blocks. | ||||
| 233 lines changed or deleted | 305 lines changed or added | |||
| This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||