| draft-fielding-uri-rfc2396bis-05.txt | draft-fielding-uri-rfc2396bis-06.txt | |||
|---|---|---|---|---|
| Network Working Group T. Berners-Lee | Network Working Group T. Berners-Lee | |||
| Internet-Draft W3C/MIT | Internet-Draft W3C/MIT | |||
| Updates: 1738 (if approved) R. Fielding | Updates: 1738 (if approved) R. Fielding | |||
| Obsoletes: 2732, 2396, 1808 (if approved) Day Software | Obsoletes: 2732, 2396, 1808 (if approved) Day Software | |||
| L. Masinter | L. Masinter | |||
| Expires: October 15, 2004 Adobe | Expires: January 15, 2005 Adobe | |||
| April 16, 2004 | July 17, 2004 | |||
| Uniform Resource Identifier (URI): Generic Syntax | Uniform Resource Identifier (URI): Generic Syntax | |||
| draft-fielding-uri-rfc2396bis-05 | draft-fielding-uri-rfc2396bis-06 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, I certify that any applicable | This document is an Internet-Draft and is subject to all provisions | |||
| patent or other IPR claims of which I am aware have been disclosed, | of section 3 of RFC 3667. By submitting this Internet-Draft, each | |||
| and any of which I become aware will be disclosed, in accordance with | author represents that any applicable patent or other IPR claims of | |||
| | which he or she is aware have been or will be disclosed, and any of | |||
| | which he or she becomes aware will be disclosed, in accordance with | |||
| RFC 3668. | RFC 3668. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that other | Task Force (IETF), its areas, and its working groups. Note that | |||
| groups may also distribute working documents as Internet-Drafts. | other groups may also distribute working documents as | |||
| | Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| <http://www.ietf.org/ietf/1id-abstracts.txt>. | <http://www.ietf.org/ietf/1id-abstracts.txt>. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| <http://www.ietf.org/shadow.html>. | <http://www.ietf.org/shadow.html>. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2004). All Rights Reserved. | Copyright (C) The Internet Society (2004). All Rights Reserved. | |||
| Abstract | Abstract | |||
| A Uniform Resource Identifier (URI) is a compact sequence of | A Uniform Resource Identifier (URI) is a compact sequence of | |||
| characters for identifying an abstract or physical resource. This | characters for identifying an abstract or physical resource. This | |||
| specification defines the generic URI syntax and a process for | specification defines the generic URI syntax and a process for | |||
| resolving URI references that might be in relative form, along with | resolving URI references that might be in relative form, along with | |||
| guidelines and security considerations for the use of URIs on the | guidelines and security considerations for the use of URIs on the | |||
| Internet. | Internet. The URI syntax defines a grammar that is a superset of all | |||
| | valid URIs, such that an implementation can parse the common | |||
| The URI syntax defines a grammar that is a superset of all valid | components of a URI reference without knowing the scheme-specific | |||
| URIs, such that an implementation can parse the common components of | requirements of every possible identifier. This specification does | |||
| a URI reference without knowing the scheme-specific requirements of | not define a generative grammar for URIs; that task is performed by | |||
| every possible identifier. This specification does not define a | the individual specifications of each URI scheme. | |||
| generative grammar for URIs; that task is performed by the individual | ||||
| specifications of each URI scheme. | ||||
| Editorial Note | Editorial Note | |||
| Discussion of this draft and comments to the editors should be sent | Discussion of this draft and comments to the editors should be sent | |||
| to the uri@w3.org mailing list. An issues list and version history | to the uri@w3.org mailing list. An issues list and version history | |||
| is available at <http://gbiv.com/protocols/uri/rev-2002/issues.html>. | is available at <http://gbiv.com/protocols/uri/rev-2002/ | |||
| | issues.html> [1]. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . 4 | 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . 6 | 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . 6 | 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . 7 | |||
| 1.2 Design Considerations . . . . . . . . . . . . . . . . . . 7 | 1.2 Design Considerations . . . . . . . . . . . . . . . . . . 7 | |||
| 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . 7 | 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.2.2 Separating Identification from Interaction . . . . . . 8 | 1.2.2 Separating Identification from Interaction . . . . . . 9 | |||
| 1.2.3 Hierarchical Identifiers . . . . . . . . . . . . . . . 9 | 1.2.3 Hierarchical Identifiers . . . . . . . . . . . . . . . 10 | |||
| 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . 10 | 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.1 Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 11 | 2.1 Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . 11 | 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . 12 | 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . 13 | |||
| 2.4 When to Encode or Decode . . . . . . . . . . . . . . . . . 13 | 2.4 When to Encode or Decode . . . . . . . . . . . . . . . . . 13 | |||
| 2.5 Identifying Data . . . . . . . . . . . . . . . . . . . . . 13 | 2.5 Identifying Data . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . . 15 | 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 3.2.1 User Information . . . . . . . . . . . . . . . . . . . 17 | 3.2.1 User Information . . . . . . . . . . . . . . . . . . . 17 | |||
| 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . 17 | 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 23 | 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . 24 | 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.2 Relative URI . . . . . . . . . . . . . . . . . . . . . . . 25 | 4.2 Relative URI . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 25 | 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.4 Same-document Reference . . . . . . . . . . . . . . . . . 25 | 4.4 Same-document Reference . . . . . . . . . . . . . . . . . 26 | |||
| 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . 26 | 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 5. Reference Resolution . . . . . . . . . . . . . . . . . . . . . 27 | 5. Reference Resolution . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . 27 | 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . 28 | |||
| 5.1.1 Base URI Embedded in Content . . . . . . . . . . . . . 27 | 5.1.1 Base URI Embedded in Content . . . . . . . . . . . . . 28 | |||
| 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . 28 | 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . 29 | |||
| 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . 28 | 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . 29 | |||
| 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . 28 | 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . 29 | |||
| 5.2 Relative Resolution . . . . . . . . . . . . . . . . . . . 29 | 5.2 Relative Resolution . . . . . . . . . . . . . . . . . . . 30 | |||
| 5.2.1 Pre-parse the Base URI . . . . . . . . . . . . . . . . 29 | 5.2.1 Pre-parse the Base URI . . . . . . . . . . . . . . . . 30 | |||
| 5.2.2 Transform References . . . . . . . . . . . . . . . . . 29 | 5.2.2 Transform References . . . . . . . . . . . . . . . . . 30 | |||
| 5.2.3 Merge Paths . . . . . . . . . . . . . . . . . . . . . 30 | 5.2.3 Merge Paths . . . . . . . . . . . . . . . . . . . . . 31 | |||
| 5.2.4 Remove Dot Segments . . . . . . . . . . . . . . . . . 31 | 5.2.4 Remove Dot Segments . . . . . . . . . . . . . . . . . 32 | |||
| 5.3 Component Recomposition . . . . . . . . . . . . . . . . . 33 | 5.3 Component Recomposition . . . . . . . . . . . . . . . . . 34 | |||
| 5.4 Reference Resolution Examples . . . . . . . . . . . . . . 34 | 5.4 Reference Resolution Examples . . . . . . . . . . . . . . 34 | |||
| 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . 34 | 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . 35 | |||
| 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . 34 | 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . 35 | |||
| 6. Normalization and Comparison . . . . . . . . . . . . . . . . . 36 | 6. Normalization and Comparison . . . . . . . . . . . . . . . . . 36 | |||
| 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 36 | 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 37 | 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . 37 | 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . 38 | |||
| 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . 37 | 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . 38 | |||
| 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . 38 | 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . 39 | |||
| 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . 39 | 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . 40 | |||
| 6.3 Canonical Form . . . . . . . . . . . . . . . . . . . . . . 40 | 6.3 Canonical Form . . . . . . . . . . . . . . . . . . . . . . 40 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 40 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 41 | |||
| 7.1 Reliability and Consistency . . . . . . . . . . . . . . . 40 | 7.1 Reliability and Consistency . . . . . . . . . . . . . . . 41 | |||
| 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . 41 | 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . 41 | |||
| 7.3 Back-end Transcoding . . . . . . . . . . . . . . . . . . . 41 | 7.3 Back-end Transcoding . . . . . . . . . . . . . . . . . . . 42 | |||
| 7.4 Rare IP Address Formats . . . . . . . . . . . . . . . . . 42 | 7.4 Rare IP Address Formats . . . . . . . . . . . . . . . . . 43 | |||
| 7.5 Sensitive Information . . . . . . . . . . . . . . . . . . 43 | 7.5 Sensitive Information . . . . . . . . . . . . . . . . . . 44 | |||
| 7.6 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 43 | 7.6 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 44 | |||
| 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 44 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 45 | 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 44 | |||
| 9.1 Normative References . . . . . . . . . . . . . . . . . . . . 45 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 | |||
| 9.2 Informative References . . . . . . . . . . . . . . . . . . . 45 | 10.1 Normative References . . . . . . . . . . . . . . . . . . . . 45 | |||
| | 10.2 Informative References . . . . . . . . . . . . . . . . . . . 45 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 47 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 48 | A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 48 | |||
| B. Parsing a URI Reference with a Regular Expression . . . . . . 50 | B. Parsing a URI Reference with a Regular Expression . . . . . . 50 | |||
| C. Delimiting a URI in Context . . . . . . . . . . . . . . . . . 51 | C. Delimiting a URI in Context . . . . . . . . . . . . . . . . . 50 | |||
| D. Summary of Non-editorial Changes . . . . . . . . . . . . . . . 52 | D. Summary of Non-editorial Changes . . . . . . . . . . . . . . . 52 | |||
| D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . 52 | D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . 52 | |||
| D.2 Modifications from RFC 2396 . . . . . . . . . . . . . . . 53 | D.2 Modifications from RFC 2396 . . . . . . . . . . . . . . . 52 | |||
| Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 | E. Instructions to RFC Editor . . . . . . . . . . . . . . . . . . 54 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 58 | Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 | |||
| | Intellectual Property and Copyright Statements . . . . . . . . 59 | |||
| 1. Introduction | 1. Introduction | |||
| A Uniform Resource Identifier (URI) provides a simple and extensible | A Uniform Resource Identifier (URI) provides a simple and extensible | |||
| means for identifying a resource. This specification of URI syntax | means for identifying a resource. This specification of URI syntax | |||
| and semantics is derived from concepts introduced by the World Wide | and semantics is derived from concepts introduced by the World Wide | |||
| Web global information initiative, whose use of such identifiers | Web global information initiative, whose use of such identifiers | |||
| dates from 1990 and is described in "Universal Resource Identifiers | dates from 1990 and is described in "Universal Resource Identifiers | |||
| in WWW" [RFC1630], and is designed to meet the recommendations laid | in WWW" [RFC1630], and is designed to meet the recommendations laid | |||
| out in "Functional Recommendations for Internet Resource Locators" | out in "Functional Recommendations for Internet Resource Locators" | |||
| [RFC1736] and "Functional Requirements for Uniform Resource Names" | [RFC1736] and "Functional Requirements for Uniform Resource Names" | |||
| [RFC1737]. | [RFC1737]. | |||
| This document obsoletes [RFC2396], which merged "Uniform Resource | This document obsoletes [RFC2396], which merged "Uniform Resource | |||
| Locators" [RFC1738] and "Relative Uniform Resource Locators" | Locators" [RFC1738] and "Relative Uniform Resource Locators" | |||
| [RFC1808] in order to define a single, generic syntax for all URIs. | [RFC1808] in order to define a single, generic syntax for all URIs. | |||
| It excludes those portions of RFC 1738 that defined the specific | It contains the updates from, and obsoletes, [RFC2732], which | |||
| syntax of individual URI schemes; those portions will be updated as | introduced syntax for IPv6 addresses. It excludes those portions of | |||
| separate documents. The process for registration of new URI schemes | RFC 1738 that defined the specific syntax of individual URI schemes; | |||
| is defined separately by [RFC2717]. Advice for designers of new URI | those portions will be updated as separate documents. The process | |||
| schemes can be found in [RFC2718]. | for registration of new URI schemes is defined separately by [BCP35]. | |||
| | Advice for designers of new URI schemes can be found in [RFC2718]. | |||
| All significant changes from RFC 2396 are noted in Appendix D. | All significant changes from RFC 2396 are noted in Appendix D. | |||
| This specification uses the terms "character" and "coded character | This specification uses the terms "character" and "coded character | |||
| set" in accordance with the definitions provided in [RFC2978], and | set" in accordance with the definitions provided in [BCP19], and | |||
| "character encoding" in place of what [RFC2978] refers to as a | "character encoding" in place of what [BCP19] refers to as a | |||
| "charset". | "charset". | |||
| 1.1 Overview of URIs | 1.1 Overview of URIs | |||
| URIs are characterized as follows: | URIs are characterized as follows: | |||
| Uniform | Uniform | |||
| Uniformity provides several benefits: it allows different types of | Uniformity provides several benefits: it allows different types of | |||
| resource identifiers to be used in the same context, even when the | resource identifiers to be used in the same context, even when the | |||
| mechanisms used to access those resources may differ; it allows | mechanisms used to access those resources may differ; it allows | |||
| uniform semantic interpretation of common syntactic conventions | uniform semantic interpretation of common syntactic conventions | |||
| across different types of resource identifiers; it allows | across different types of resource identifiers; it allows | |||
| introduction of new types of resource identifiers without | introduction of new types of resource identifiers without | |||
| interfering with the way that existing identifiers are used; and, | interfering with the way that existing identifiers are used; and, | |||
| it allows the identifiers to be reused in many different contexts, | it allows the identifiers to be reused in many different contexts, | |||
| thus permitting new applications or protocols to leverage a | thus permitting new applications or protocols to leverage a | |||
| pre-existing, large, and widely-used set of resource identifiers. | pre-existing, large, and widely-used set of resource identifiers. | |||
| Resource | Resource | |||
| Anything that has been named or described can be a resource. | ||||
| Familiar examples include an electronic document, an image, a | This specification does not limit the scope of what might be a | |||
| service (e.g., "today's weather report for Los Angeles"), and a | resource; rather, the term "resource" is used in a general sense | |||
| collection of other resources. A resource is not necessarily | for whatever might be identified by a URI. Familiar examples | |||
| | include an electronic document, an image, a source of information | |||
| | with consistent purpose (e.g., "today's weather report for Los | |||
| | Angeles"), a service (e.g., an HTTP to SMS gateway), a collection | |||
| | of other resources, and so on. A resource is not necessarily | |||
| accessible via the Internet; e.g., human beings, corporations, and | accessible via the Internet; e.g., human beings, corporations, and | |||
| bound books in a library can also be resources. Likewise, abstract | bound books in a library can also be resources. Likewise, | |||
| concepts can be resources, such as the operators and operands of a | abstract concepts can be resources, such as the operators and | |||
| mathematical equation, the types of a relationship (e.g., "parent" | operands of a mathematical equation, the types of a relationship | |||
| or "employee"), or numeric values (e.g., zero, one, and infinity). | (e.g., "parent" or "employee"), or numeric values (e.g., zero, | |||
| These things are called resources because they each can be | one, and infinity). | |||
| considered a source of supply or support, or an available means, | ||||
| for some system, where such systems may be as diverse as the World | ||||
| Wide Web, a filesystem, an ontological graph, a theorem prover, or | ||||
| some other form of system for the direct or indirect observation | ||||
| and/or manipulation of resources. Note that "supply" is not | ||||
| necessary for a thing to be considered a resource: the ability to | ||||
| simply refer to that thing is often sufficient to support the | ||||
| operation of a given system. | ||||
| Identifier | Identifier | |||
| An identifier embodies the information required to distinguish | An identifier embodies the information required to distinguish | |||
| what is being identified from all other things within its scope of | what is being identified from all other things within its scope of | |||
| identification. Our use of the terms "identify" and "identifying" | identification. Our use of the terms "identify" and "identifying" | |||
| refers to this process of distinguishing from many to one; they | refers to this purpose of distinguishing one resource from all | |||
| should not be mistaken as an assumption that the identifier | other resources, regardless of how that purpose is accomplished | |||
| defines the identity of what is referenced, though that may be the | (e.g., by name, address, context, etc.). These terms should not | |||
| case for some identifiers. | be mistaken as an assumption that an identifier defines or | |||
| | embodies the identity of what is referenced, though that may be | |||
| | the case for some identifiers. Nor should it be assumed that a | |||
| | system using URIs will access the resource identified: in many | |||
| | cases, URIs are used to denote resources without any intention | |||
| | that they be accessed. Likewise, the "one" resource identified | |||
| | might not be singular in nature (e.g., a resource might be a named | |||
| | set or a mapping that varies over time). | |||
| A URI is an identifier that consists of a sequence of characters | A URI is an identifier, consisting of a sequence of characters | |||
| matching the syntax rule named <URI> in Section 3. A URI can be used | matching the syntax rule named <URI> in Section 3, that enables | |||
| to refer to a resource. This specification does not place any limits | uniform identification of resources via a separately defined, | |||
| on the nature of a resource, the reasons why an application might | extensible set of naming schemes (Section 3.1). How that | |||
| wish to refer to a resource, or the kinds of system that might use | identification is accomplished, assigned, or enabled is delegated to | |||
| URIs for the sake of identifying resources. | each scheme specification. | |||
| URIs have a global scope and must be interpreted consistently | This specification does not place any limits on the nature of a | |||
| regardless of context, though the result of that interpretation may | resource, the reasons why an application might wish to refer to a | |||
| be in relation to the end-user's context. For example, "http:// | resource, or the kinds of system that might use URIs for the sake of | |||
| localhost/" has the same interpretation for every user of that | identifying resources. This specification does not require that a | |||
| reference, even though the network interface corresponding to | URI persists in identifying the same resource over all time, though | |||
| "localhost" may be different for each end-user: interpretation is | that is a common goal of all URI schemes. Nevertheless, nothing in | |||
| independent of access. However, an action made on the basis of that | this specification prevents an application from limiting itself to | |||
| reference will take place in relation to the end-user's context, | particular types of resources, or to a subset of URIs that maintains | |||
| which implies that an action intended to refer to a single, globally | characteristics desired by that application. | |||
| unique thing must use a URI that distinguishes that resource from all | ||||
| other things. URIs that identify in relation to the end-user's local | URIs have a global scope and are interpreted consistently regardless | |||
| context should only be used when the context itself is a defining | of context, though the result of that interpretation may be in | |||
| aspect of the resource, such as when an on-line Linux manual refers | relation to the end-user's context. For example, "http://localhost/" | |||
| to a file on the end-user's filesystem (e.g., "file:///etc/hosts"). | has the same interpretation for every user of that reference, even | |||
| | though the network interface corresponding to "localhost" may be | |||
| | different for each end-user: interpretation is independent of access. | |||
| | However, an action made on the basis of that reference will take | |||
| | place in relation to the end-user's context, which implies that an | |||
| | action intended to refer to a single, globally unique thing must use | |||
| | a URI that distinguishes that resource from all other things. URIs | |||
| | that identify in relation to the end-user's local context should only | |||
| | be used when the context itself is a defining aspect of the resource, | |||
| | such as when an on-line help manual refers to a file on the | |||
| | end-user's filesystem (e.g., "file:///etc/hosts"). | |||
| 1.1.1 Generic Syntax | 1.1.1 Generic Syntax | |||
| Each URI begins with a scheme name, as defined in Section 3.1, that | Each URI begins with a scheme name, as defined in Section 3.1, that | |||
| refers to a specification for assigning identifiers within that | refers to a specification for assigning identifiers within that | |||
| scheme. As such, the URI syntax is a federated and extensible naming | scheme. As such, the URI syntax is a federated and extensible naming | |||
| system wherein each scheme's specification may further restrict the | system wherein each scheme's specification may further restrict the | |||
| syntax and semantics of identifiers using that scheme. | syntax and semantics of identifiers using that scheme. | |||
| This specification defines those elements of the URI syntax that are | This specification defines those elements of the URI syntax that are | |||
| required of all URI schemes or are common to many URI schemes. It | required of all URI schemes or are common to many URI schemes. It | |||
| thus defines the syntax and semantics that are needed to implement a | thus defines the syntax and semantics that are needed to implement a | |||
| scheme-independent parsing mechanism for URI references, such that | scheme-independent parsing mechanism for URI references, such that | |||
| the scheme-dependent handling of a URI can be postponed until the | the scheme-dependent handling of a URI can be postponed until the | |||
| scheme-dependent semantics are needed. Likewise, protocols and data | scheme-dependent semantics are needed. Likewise, protocols and data | |||
| formats that make use of URI references can refer to this | formats that make use of URI references can refer to this | |||
| specification as defining the range of syntax allowed for all URIs, | specification as defining the range of syntax allowed for all URIs, | |||
| including those schemes that have yet to be defined. | including those schemes that have yet to be defined, thus decoupling | |||
| | the evolution of identification schemes from the evolution of | |||
| | protocols, data formats, and implementations that make use of URIs. | |||
| A parser of the generic URI syntax is capable of parsing any URI | A parser of the generic URI syntax is capable of parsing any URI | |||
| reference into its major components; once the scheme is determined, | reference into its major components; once the scheme is determined, | |||
| further scheme-specific parsing can be performed on the components. | further scheme-specific parsing can be performed on the components. | |||
| In other words, the URI generic syntax is a superset of the syntax of | In other words, the URI generic syntax is a superset of the syntax of | |||
| all URI schemes. | all URI schemes. | |||
| 1.1.2 Examples | 1.1.2 Examples | |||
| The following examples illustrate URIs that are in common use. | The following example URIs illustrate several URI schemes and | |||
| | variations in their common syntax components: | |||
| ftp://ftp.is.co.za/rfc/rfc1808.txt | ftp://ftp.is.co.za/rfc/rfc1808.txt | |||
| http://www.ietf.org/rfc/rfc2396.txt | http://www.ietf.org/rfc/rfc2396.txt | |||
| | ldap://[2001:db8::7]/c=GB?objectClass?one | |||
| mailto:John.Doe@example.com | mailto:John.Doe@example.com | |||
| news:comp.infosystems.www.servers.unix | news:comp.infosystems.www.servers.unix | |||
| telnet://melvyl.ucop.edu/ | tel:+1-816-555-1212 | |||
| | telnet://192.0.2.16:80/ | |||
| | urn:oasis:names:specification:docbook:dtd:xml:4.1.2 | |||
| 1.1.3 URI, URL, and URN | 1.1.3 URI, URL, and URN | |||
| A URI can be further classified as a locator, a name, or both. The | A URI can be further classified as a locator, a name, or both. The | |||
| term "Uniform Resource Locator" (URL) refers to the subset of URIs | term "Uniform Resource Locator" (URL) refers to the subset of URIs | |||
| that, in addition to identifying a resource, provide a means of | that, in addition to identifying a resource, provide a means of | |||
| locating the resource by describing its primary access mechanism | locating the resource by describing its primary access mechanism | |||
| (e.g., its network "location"). The term "Uniform Resource Name" | (e.g., its network "location"). The term "Uniform Resource Name" | |||
| (URN) has been used historically to refer to both URIs under the | (URN) has been used historically to refer to both URIs under the | |||
| "urn" scheme [RFC2141], which are required to remain globally unique | "urn" scheme [RFC2141], which are required to remain globally unique | |||
| skipping to change at page 8, line 38 | skipping to change at page 9, line 25 | |||
| on the resource, as might be characterized by such words as "access", | on the resource, as might be characterized by such words as "access", | |||
| "update", "replace", or "find attributes". Such operations are | "update", "replace", or "find attributes". Such operations are | |||
| defined by the protocols that make use of URIs, not by this | defined by the protocols that make use of URIs, not by this | |||
| specification. However, we do use a few general terms for describing | specification. However, we do use a few general terms for describing | |||
| common operations on URIs. URI "resolution" is the process of | common operations on URIs. URI "resolution" is the process of | |||
| determining an access mechanism and the appropriate parameters | determining an access mechanism and the appropriate parameters | |||
| necessary to dereference a URI; such resolution may require several | necessary to dereference a URI; such resolution may require several | |||
| iterations. To use that access mechanism to perform an action on the | iterations. To use that access mechanism to perform an action on the | |||
| URI's resource is to "dereference" the URI. | URI's resource is to "dereference" the URI. | |||
| When URIs are used within information systems to identify sources of | When URIs are used within information retrieval systems to identify | |||
| information, the most common form of URI dereference is "retrieval": | sources of information, the most common form of URI dereference is | |||
| making use of a URI in order to retrieve a representation of its | "retrieval": making use of a URI in order to retrieve a | |||
| associated resource. A "representation" is a sequence of octets, | representation of its associated resource. A "representation" is a | |||
| along with representation metadata describing those octets, that | sequence of octets, along with representation metadata describing | |||
| constitutes a record of the state of the resource at the time that | those octets, that constitutes a record of the state of the resource | |||
| the representation is generated. Retrieval is achieved by a process | at the time that the representation is generated. Retrieval is | |||
| that might include using the URI as a cache key to check for a | achieved by a process that might include using the URI as a cache key | |||
| locally cached representation, resolution of the URI to determine an | to check for a locally cached representation, resolution of the URI | |||
| appropriate access mechanism (if any), and dereference of the URI for | to determine an appropriate access mechanism (if any), and | |||
| the sake of applying a retrieval operation. Depending on the | dereference of the URI for the sake of applying a retrieval | |||
| protocols used to perform the retrieval, additional information might | operation. Depending on the protocols used to perform the retrieval, | |||
| be supplied about the resource (resource metadata) and its relation | additional information might be supplied about the resource (resource | |||
| to other resources. | metadata) and its relation to other resources. | |||
| URI references in information systems are designed to be | URI references in information retrieval systems are designed to be | |||
| late-binding: the result of an access is generally determined at the | late-binding: the result of an access is generally determined at the | |||
| time it is accessed and may vary over time or due to other aspects of | time it is accessed and may vary over time or due to other aspects of | |||
| the interaction. Such references are created in order to be be used | the interaction. Such references are created in order to be used in | |||
| in the future: what is being identified is not some specific result | the future: what is being identified is not some specific result that | |||
| that was obtained in the past, but rather some characteristic that is | was obtained in the past, but rather some characteristic that is | |||
| expected to be true for future results. In such cases, the resource | expected to be true for future results. In such cases, the resource | |||
| referred to by the URI is actually a sameness of characteristics as | referred to by the URI is actually a sameness of characteristics as | |||
| observed over time, perhaps elucidated by additional comments or | observed over time, perhaps elucidated by additional comments or | |||
| assertions made by the resource provider. | assertions made by the resource provider. | |||
| Although many URI schemes are named after protocols, this does not | Although many URI schemes are named after protocols, this does not | |||
| imply that use of such a URI will result in access to the resource | imply that use of such a URI will result in access to the resource | |||
| via the named protocol. URIs are often used simply for the sake of | via the named protocol. URIs are often used simply for the sake of | |||
| identification. Even when a URI is used to retrieve a representation | identification. Even when a URI is used to retrieve a representation | |||
| of a resource, that access might be through gateways, proxies, | of a resource, that access might be through gateways, proxies, | |||
| skipping to change at page 9, line 33 | skipping to change at page 10, line 19 | |||
| URIs may require the use of more than one protocol (e.g., both DNS | URIs may require the use of more than one protocol (e.g., both DNS | |||
| and HTTP are typically used to access an "http" URI's origin server | and HTTP are typically used to access an "http" URI's origin server | |||
| when a representation isn't found in a local cache). | when a representation isn't found in a local cache). | |||
| 1.2.3 Hierarchical Identifiers | 1.2.3 Hierarchical Identifiers | |||
| The URI syntax is organized hierarchically, with components listed in | The URI syntax is organized hierarchically, with components listed in | |||
| order of decreasing significance from left to right. For some URI | order of decreasing significance from left to right. For some URI | |||
| schemes, the visible hierarchy is limited to the scheme itself: | schemes, the visible hierarchy is limited to the scheme itself: | |||
| everything after the scheme component delimiter (":") is considered | everything after the scheme component delimiter (":") is considered | |||
| opaque to URI processing. Other URI schemes make the hierarchy | opaque to URI processing. Other URI schemes make the hierarchy | |||
| explicit and visible to generic parsing algorithms. | explicit and visible to generic parsing algorithms. | |||
| The generic syntax uses the slash ("/"), question mark ("?"), and | The generic syntax uses the slash ("/"), question mark ("?"), and | |||
| number sign ("#") characters for the purpose of delimiting components | number sign ("#") characters for the purpose of delimiting components | |||
| that are significant to the generic parser's hierarchical | that are significant to the generic parser's hierarchical | |||
| interpretation of an identifier. In addition to aiding the | interpretation of an identifier. In addition to aiding the | |||
| readability of such identifiers through the consistent use of | readability of such identifiers through the consistent use of | |||
| familiar syntax, this uniform representation of hierarchy across | familiar syntax, this uniform representation of hierarchy across | |||
| naming schemes allows scheme-independent references to be made | naming schemes allows scheme-independent references to be made | |||
| relative to that hierarchy. | relative to that hierarchy. | |||
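Because "/", "?", and "#" (together with the ":" that ends the scheme) are the only characters a generic parser needs to find the component boundaries, the split can be sketched with a single regular expression. The sketch below assumes the expression later published in RFC 3986, Appendix B; `split_uri` is a hypothetical helper, and it separates components without validating them.

```python
import re

# Splits a URI reference at the generic delimiters ":", "/", "?", and "#".
# Mirrors the expression published in RFC 3986, Appendix B. It only
# separates components; it does not check that each component is valid.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref: str) -> dict:
    """Return the five generic components; absent ones are None."""
    m = URI_SPLIT.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),  # may be empty, but is never absent
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for ref in ("http://www.ietf.org/rfc/rfc2396.txt",
                "ldap://[2001:db8::7]/c=GB?objectClass?one",
                "mailto:John.Doe@example.com"):
        print(split_uri(ref))
```

Note that the query of the "ldap" example is "objectClass?one": once the first "?" has been seen, a later "?" is just data to the generic parser.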
| It is often the case that a group or "tree" of documents has been | It is often the case that a group or "tree" of documents has been | |||
| constructed to serve a common purpose, wherein the vast majority of | constructed to serve a common purpose, wherein the vast majority of | |||
| URIs in these documents point to resources within the tree rather | URIs in these documents point to resources within the tree rather | |||
| than outside of it. Similarly, documents located at a particular | than outside of it. Similarly, documents located at a particular | |||
| site are much more likely to refer to other resources at that site | site are much more likely to refer to other resources at that site | |||
| than to resources at remote sites. Relative referencing of URIs | than to resources at remote sites. Relative referencing of URIs | |||
| allows document trees to be partially independent of their location | allows document trees to be partially independent of their location | |||
| and access scheme. For instance, it is possible for a single set of | and access scheme. For instance, it is possible for a single set of | |||
| hypertext documents to be simultaneously accessible and traversable | hypertext documents to be simultaneously accessible and traversable | |||
| via each of the "file", "http", and "ftp" schemes if the documents | via each of the "file", "http", and "ftp" schemes if the documents | |||
| refer to each other using relative references. Furthermore, such | refer to each other using relative references. Furthermore, such | |||
| document trees can be moved, as a whole, without changing any of the | document trees can be moved, as a whole, without changing any of the | |||
| relative references. | relative references. | |||
| A relative URI reference (Section 4.2) refers to a resource by | A relative URI reference (Section 4.2) refers to a resource by | |||
| describing the difference within a hierarchical name space between | describing the difference within a hierarchical name space between | |||
| the reference context and the target URI. The reference resolution | the reference context and the target URI. The reference resolution | |||
| algorithm, presented in Section 5, defines how such a reference is | algorithm, presented in Section 5, defines how such a reference is | |||
| transformed to the target URI. Since relative references can only be | transformed to the target URI. Since relative references can only be | |||
| used within the context of a hierarchical URI, designers of new URI | used within the context of a hierarchical URI, designers of new URI | |||
| schemes should use a syntax consistent with the generic syntax's | schemes should use a syntax consistent with the generic syntax's | |||
| hierarchical components unless there are compelling reasons to forbid | hierarchical components unless there are compelling reasons to forbid | |||
| relative referencing within that scheme. | relative referencing within that scheme. | |||
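As an illustration of relative references (not of the Section 5 algorithm itself), Python's standard `urllib.parse.urljoin` can resolve a few references against a base URI. The base below is borrowed from the examples in RFC 3986, Section 5.4, and `resolve_all` is a hypothetical helper; `urljoin` is Python's own implementation and its behavior on unusual edge cases should not be taken as normative.

```python
from urllib.parse import urljoin

# Base URI borrowed from the reference-resolution examples of RFC 3986,
# Section 5.4; chosen here only for illustration.
BASE = "http://a/b/c/d;p?q"


def resolve_all(base: str, refs: list) -> dict:
    """Resolve each relative reference against base with urllib.parse.urljoin."""
    return {ref: urljoin(base, ref) for ref in refs}


if __name__ == "__main__":
    for ref, target in resolve_all(BASE, ["g", "./g", "../g", "g?y", "#s"]).items():
        print(f"{ref:5} -> {target}")

    # The same relative reference follows the document tree under
    # different access schemes, as described for "http" and "ftp" above.
    for base in ("http://www.example.org/docs/a.html",
                 "ftp://ftp.example.org/docs/a.html"):
        print(urljoin(base, "b.html"))
```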
| All URIs are parsed by generic syntax parsers when used. A URI scheme | All URIs are parsed by generic syntax parsers when used. A URI | |||
| that wishes to remain opaque to hierarchical processing must disallow | scheme that wishes to remain opaque to hierarchical processing must | |||
| the use of slash and question mark characters. However, since a URI | disallow the use of slash and question mark characters. However, | |||
| reference is only modified by the generic parser if it contains a | since a URI reference is only modified by the generic parser if it | |||
| dot-segment (a complete path segment of "." or "..", as described in | contains a dot-segment (a complete path segment of "." or "..", as | |||
| Section 3.3), URI schemes may safely use "/" for other purposes if | described in Section 3.3), URI schemes may safely use "/" for other | |||
| they do not allow dot-segments. | purposes if they do not allow dot-segments. | |||
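The dot-segment condition is about complete path segments, not about any occurrence of a period. A minimal sketch, assuming the input is only the path component (query and fragment already removed); `has_dot_segment` is a hypothetical helper name:

```python
def has_dot_segment(path: str) -> bool:
    """Return True if any complete "/"-delimited segment is "." or ".."."""
    return any(segment in (".", "..") for segment in path.split("/"))


if __name__ == "__main__":
    # "a/.b" and "v1.2/..." contain periods but no dot-segment.
    for path in ("a/./b", "a/../b", "..", "a/.b", "v1.2/...", "x/y"):
        print(f"{path!r}: {has_dot_segment(path)}")
```

A scheme that forbids paths for which this check is true can use "/" for its own purposes without the generic parser ever rewriting its references.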
| 1.3 Syntax Notation | 1.3 Syntax Notation | |||
| This specification uses the Augmented Backus-Naur Form (ABNF) | This specification uses the Augmented Backus-Naur Form (ABNF) | |||
| notation of [RFC2234], including the following core ABNF syntax rules | notation of [RFC2234], including the following core ABNF syntax rules | |||
| defined by that specification: ALPHA (letters), CR (carriage return), | defined by that specification: ALPHA (letters), CR (carriage return), | |||
| DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal | DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal | |||
| digits), LF (line feed), and SP (space). The complete URI syntax is | digits), LF (line feed), and SP (space). The complete URI syntax is | |||
| collected in Appendix A. | collected in Appendix A. | |||
| 2. Characters | 2. Characters | |||
| The URI syntax provides a method of encoding data, presumably for the | The URI syntax provides a method of encoding data, presumably for the | |||
| sake of identifying a resource, as a sequence of characters. The URI | sake of identifying a resource, as a sequence of characters. The URI | |||
| characters are, in turn, frequently encoded as octets for transport | characters are, in turn, frequently encoded as octets for transport | |||
| or presentation. This specification does not mandate any particular | or presentation. This specification does not mandate any particular | |||
| character encoding for mapping between URI characters and the octets | character encoding for mapping between URI characters and the octets | |||
| used to store or transmit those characters. When a URI appears in a | used to store or transmit those characters. When a URI appears in a | |||
| protocol element, the character encoding is defined by that protocol; | protocol element, the character encoding is defined by that protocol; | |||
| absent such a definition, a URI is assumed to be in the same | absent such a definition, a URI is assumed to be in the same | |||
| character encoding as the surrounding text. | character encoding as the surrounding text. | |||
| The ABNF notation defines its terminal values to be non-negative | The ABNF notation defines its terminal values to be non-negative | |||
| integers (codepoints) based on the US-ASCII coded character set | integers (codepoints) based on the US-ASCII coded character set | |||
| [ASCII]. Since a URI is a sequence of characters, we must invert | [ASCII]. Since a URI is a sequence of characters, we must invert | |||
| that relation in order to understand the URI syntax. Therefore, the | that relation in order to understand the URI syntax. Therefore, the | |||
| integer values used by the ABNF must be mapped back to their | integer values used by the ABNF must be mapped back to their | |||
| corresponding characters via US-ASCII in order to complete the syntax | corresponding characters via US-ASCII in order to complete the syntax | |||
| rules. | rules. | |||
| A URI is composed from a limited set of characters consisting of | A URI is composed from a limited set of characters consisting of | |||
| digits, letters, and a few graphic symbols. A reserved subset of | digits, letters, and a few graphic symbols. A reserved subset of | |||
| those characters may be used to delimit syntax components within a | those characters may be used to delimit syntax components within a | |||
| URI, while the remaining characters, including both the unreserved | URI, while the remaining characters, including both the unreserved | |||
| set and those reserved characters not acting as delimiters, define | set and those reserved characters not acting as delimiters, define | |||
| each component's identifying data. | each component's identifying data. | |||
| 2.1 Percent-Encoding | 2.1 Percent-Encoding | |||
| A percent-encoding mechanism is used to represent a data octet in a | A percent-encoding mechanism is used to represent a data octet in a | |||
| component when that octet's corresponding character is outside the | component when that octet's corresponding character is outside the | |||
| allowed set or is being used as a delimiter of, or within, the | allowed set or is being used as a delimiter of, or within, the | |||
| component. A percent-encoded octet is encoded as a character triplet, | component. A percent-encoded octet is encoded as a character | |||
| consisting of the percent character "%" followed by the two | triplet, consisting of the percent character "%" followed by the two | |||
| hexadecimal digits representing that octet's numeric value. For | hexadecimal digits representing that octet's numeric value. For | |||
| example, "%20" is the percent-encoding for the binary octet | example, "%20" is the percent-encoding for the binary octet | |||
| "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space | "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space | |||
| character (SP). Section 2.4 describes when percent-encoding and | character (SP). Section 2.4 describes when percent-encoding and | |||
| decoding is applied. | decoding is applied. | |||
| pct-encoded = "%" HEXDIG HEXDIG | pct-encoded = "%" HEXDIG HEXDIG | |||
| The uppercase hexadecimal digits 'A' through 'F' are equivalent to | The uppercase hexadecimal digits 'A' through 'F' are equivalent to | |||
| the lowercase digits 'a' through 'f', respectively. Two URIs that | the lowercase digits 'a' through 'f', respectively. Two URIs that | |||
| differ only in the case of hexadecimal digits used in percent-encoded | differ only in the case of hexadecimal digits used in percent-encoded | |||
| octets are equivalent. For consistency, URI producers and | octets are equivalent. For consistency, URI producers and | |||
| normalizers should use uppercase hexadecimal digits for all | normalizers should use uppercase hexadecimal digits for all | |||
| percent-encodings. | percent-encodings. | |||
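The pct-encoded rule and the case-equivalence of its hexadecimal digits can be sketched as follows. The helper names are hypothetical; `uppercase_pct_hex` assumes its input is a well-formed URI, in which every "%" begins a triplet.

```python
import re

# pct-encoded = "%" HEXDIG HEXDIG  (HEXDIG accepted in either case)
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def encode_octet(octet: int) -> str:
    """Return the pct-encoded triplet for one octet, using uppercase hex digits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    return f"%{octet:02X}"


def decode_triplet(triplet: str) -> int:
    """Return the octet value of a "%" HEXDIG HEXDIG triplet (either case)."""
    m = PCT_TRIPLET.fullmatch(triplet)
    if m is None:
        raise ValueError(f"not a pct-encoded triplet: {triplet!r}")
    return int(m.group(1), 16)


def uppercase_pct_hex(uri: str) -> str:
    """Rewrite every triplet with uppercase hex digits; other characters are unchanged."""
    return PCT_TRIPLET.sub(lambda m: "%" + m.group(1).upper(), uri)


if __name__ == "__main__":
    print(encode_octet(0x20))                                    # %20 (SP)
    print(uppercase_pct_hex("http://example.com/a%2fb%c3%80"))   # %2F and %C3%80
```

Two URIs that differ only in hexadecimal case compare equal after `uppercase_pct_hex` is applied to both.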
| skipping to change at page 12, line 4 | skipping to change at page 12, line 39 | |||
| URIs include components and subcomponents that are delimited by | URIs include components and subcomponents that are delimited by | |||
| characters in the "reserved" set. These characters are called | characters in the "reserved" set. These characters are called | |||
| "reserved" because they may (or may not) be defined as delimiters by | "reserved" because they may (or may not) be defined as delimiters by | |||
| the generic syntax, by each scheme-specific syntax, or by the | the generic syntax, by each scheme-specific syntax, or by the | |||
| implementation-specific syntax of a URI's dereferencing algorithm. | implementation-specific syntax of a URI's dereferencing algorithm. | |||
| If data for a URI component would conflict with a reserved | If data for a URI component would conflict with a reserved | |||
| character's purpose as a delimiter, then the conflicting data must be | character's purpose as a delimiter, then the conflicting data must be | |||
| percent-encoded before forming the URI. | percent-encoded before forming the URI. | |||
| reserved = gen-delims / sub-delims | reserved = gen-delims / sub-delims | |||
| gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" | gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" | |||
| sub-delims = "!" / "$" / "&" / "'" / "(" / ")" | sub-delims = "!" / "$" / "&" / "'" / "(" / ")" | |||
| / "*" / "+" / "," / ";" / "=" | / "*" / "+" / "," / ";" / "=" | |||
| The purpose of reserved characters is to provide a set of delimiting | The purpose of reserved characters is to provide a set of delimiting | |||
| characters that are distinguishable from other data within a URI. | characters that are distinguishable from other data within a URI. | |||
| URIs that differ in the replacement of a reserved character with its | URIs that differ in the replacement of a reserved character with its | |||
| corresponding percent-encoded octet are not equivalent. | corresponding percent-encoded octet are not equivalent. | |||
| Percent-encoding a reserved character, or decoding a percent-encoded | Percent-encoding a reserved character, or decoding a percent-encoded | |||
| octet that corresponds to a reserved character, will change how the | octet that corresponds to a reserved character, will change how the | |||
| URI is interpreted by most applications. Thus, characters in the | URI is interpreted by most applications. Thus, characters in the | |||
| reserved set are protected from normalization and are therefore safe | reserved set are protected from normalization and are therefore safe | |||
| to be used by scheme-specific and producer-specific algorithms for | to be used by scheme-specific and producer-specific algorithms for | |||
| delimiting data subcomponents within a URI. | delimiting data subcomponents within a URI. | |||
| A subset of the reserved characters (gen-delims) are used as | A subset of the reserved characters (gen-delims) are used as | |||
| delimiters of the generic URI components described in Section 3. A | delimiters of the generic URI components described in Section 3. A | |||
| component's ABNF syntax rule will not use the reserved or gen-delims | component's ABNF syntax rule will not use the reserved or gen-delims | |||
| rule names directly; instead, each syntax rule lists the characters | rule names directly; instead, each syntax rule lists the characters | |||
| allowed within that component (i.e., not delimiting it) and any of | allowed within that component (i.e., not delimiting it) and any of | |||
| those characters that are also in the reserved set are "reserved" for | those characters that are also in the reserved set are "reserved" for | |||
| use as subcomponent delimiters within the component. Only the most | use as subcomponent delimiters within the component. Only the most | |||
| common subcomponents are defined by this specification; other | common subcomponents are defined by this specification; other | |||
| subcomponents may be defined by a URI scheme's specification, or by | subcomponents may be defined by a URI scheme's specification, or by | |||
| the implementation-specific syntax of a URI's dereferencing | the implementation-specific syntax of a URI's dereferencing | |||
| algorithm, provided that such subcomponents are delimited by | algorithm, provided that such subcomponents are delimited by | |||
| characters in the reserved set allowed within that component. | characters in the reserved set allowed within that component. | |||
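For example, an application might delimit query subcomponents as name=value pairs joined by "&". That convention is an assumption here (it is common in HTML form submission, but the generic syntax does not define it). Any "&" or "=" that is part of the data must then be percent-encoded so it cannot be mistaken for a delimiter. The sketch uses Python's `urllib.parse.quote` with `safe=""`, which encodes every character outside the unreserved set (UTF-8 first for non-ASCII text); `build_query` is hypothetical.

```python
from urllib.parse import quote


def build_query(pairs) -> str:
    """Join name=value pairs with "&" and "=", percent-encoding any data
    character (including "&" and "=") that could be read as a delimiter."""
    def enc(text: str) -> str:
        # safe="" leaves only unreserved characters unencoded.
        return quote(text, safe="")
    return "&".join(f"{enc(name)}={enc(value)}" for name, value in pairs)


if __name__ == "__main__":
    print(build_query([("q", "fish & chips"), ("sort", "a=z")]))
```

Encoding every reserved character is stricter than required, but it keeps the data unambiguous. Decoding "%26" back to "&" afterward would change the meaning of the URI, which is why such octets are not equivalent to the reserved characters they encode.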
| skipping to change at page 12, line 48 | skipping to change at page 13, line 35 | |||
| 2.3 Unreserved Characters | 2.3 Unreserved Characters | |||
| Characters that are allowed in a URI but do not have a reserved | Characters that are allowed in a URI but do not have a reserved | |||
| purpose are called unreserved. These include uppercase and lowercase | purpose are called unreserved. These include uppercase and lowercase | |||
| letters, decimal digits, hyphen, period, underscore, and tilde. | letters, decimal digits, hyphen, period, underscore, and tilde. | |||
| unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | |||
| URIs that differ in the replacement of an unreserved character with | URIs that differ in the replacement of an unreserved character with | |||
| its corresponding percent-encoded octet are equivalent: they identify | its corresponding percent-encoded US-ASCII octet are equivalent: they | |||
| the same resource. However, percent-encoded unreserved characters | identify the same resource. However, URI comparison implementations | |||
| may change the result of some URI comparisons (Section 6), | do not always perform normalization prior to comparison (see Section 6). | |||
| potentially leading to incorrect or inefficient behavior. For | For consistency, percent-encoded octets in the ranges of ALPHA | |||
| consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A | (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), | |||
| and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore | underscore (%5F), or tilde (%7E) should not be created by URI | |||
| (%5F), or tilde (%7E) should not be created by URI producers and, | producers and, when found in a URI, should be decoded to their | |||
| when found in a URI, should be decoded to their corresponding | corresponding unreserved character by URI normalizers. | |||
| unreserved character by URI normalizers. | ||||
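The normalizer behavior recommended above can be sketched as a single substitution: triplets that encode an unreserved character are decoded, and all other triplets (including reserved characters, which are protected from normalization) stay encoded with uppercase hex digits. `normalize_pct` is a hypothetical helper and assumes a well-formed URI.

```python
import re
import string

PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")
# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"  (ASCII only)
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def normalize_pct(uri: str) -> str:
    """Decode triplets for unreserved characters; uppercase all others."""
    def fix(m):
        char = chr(int(m.group(1), 16))
        if char in UNRESERVED:
            return char                      # e.g. "%7E" -> "~"
        return "%" + m.group(1).upper()      # reserved and other octets stay encoded
    return PCT_TRIPLET.sub(fix, uri)


if __name__ == "__main__":
    print(normalize_pct("http://example.com/%7Euser/a%2fb%41"))
```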
| 2.4 When to Encode or Decode | 2.4 When to Encode or Decode | |||
| Under normal circumstances, the only time that octets within a URI | Under normal circumstances, the only time that octets within a URI | |||
| are percent-encoded is during the process of producing the URI from | are percent-encoded is during the process of producing the URI from | |||
| its component parts. It is during that process that an | its component parts. It is during that process that an | |||
| implementation determines which of the reserved characters are to be | implementation determines which of the reserved characters are to be | |||
| used as subcomponent delimiters and which can be safely used as data. | used as subcomponent delimiters and which can be safely used as data. | |||
| Once produced, a URI is always in its percent-encoded form. | Once produced, a URI is always in its percent-encoded form. | |||
| skipping to change at page 13, line 43 | skipping to change at page 14, line 29 | |||
| not percent-encode or decode the same string more than once, since | not percent-encode or decode the same string more than once, since | |||
| decoding an already decoded string might lead to misinterpreting a | decoding an already decoded string might lead to misinterpreting a | |||
| percent data octet as the beginning of a percent-encoding, or vice | percent data octet as the beginning of a percent-encoding, or vice | |||
| versa in the case of percent-encoding an already percent-encoded | versa in the case of percent-encoding an already percent-encoded | |||
| string. | string. | |||
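The hazard of encoding or decoding twice is easy to reproduce when the data itself contains a "%". The sketch below uses Python's `urllib.parse.quote` and `unquote`; the data string "50%41" is a made-up example.

```python
from urllib.parse import quote, unquote

data = "50%41"                          # literal data that happens to contain "%41"
encoded = quote(data, safe="")          # encode once:  "50%2541"
decoded = unquote(encoded)              # decode once:  "50%41"  (the original data)
over_decoded = unquote(decoded)         # decode twice: "50A"    (data corrupted)
over_encoded = quote(encoded, safe="")  # encode twice: "50%252541"

if __name__ == "__main__":
    print(encoded, decoded, over_decoded, over_encoded)
```

The second `unquote` misreads the data octets "%41" as a percent-encoding, and the second `quote` encodes the "%" of an existing triplet, exactly the two failure modes described above.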
| 2.5 Identifying Data | 2.5 Identifying Data | |||
| URI characters provide identifying data for each of the URI | URI characters provide identifying data for each of the URI | |||
| components, serving as an external interface for identification | components, serving as an external interface for identification | |||
| between systems. Although the presence and nature of the URI | between systems. Although the presence and nature of the URI | |||
| production interface is hidden from clients that use its URIs, and | production interface is hidden from clients that use its URIs, and | |||
| thus beyond the scope of the interoperability requirements defined by | thus beyond the scope of the interoperability requirements defined by | |||
| this specification, it is a frequent source of confusion and errors | this specification, it is a frequent source of confusion and errors | |||
| in the interpretation of URI character issues. Implementers need to | in the interpretation of URI character issues. Implementers need to | |||
| be aware that there are multiple character encodings involved in the | be aware that there are multiple character encodings involved in the | |||
| production and transmission of URIs: local name and data encoding, | production and transmission of URIs: local name and data encoding, | |||
| public interface encoding, URI character encoding, data format | public interface encoding, URI character encoding, data format | |||
| encoding, and protocol encoding. | encoding, and protocol encoding. | |||
| The first encoding of identifying data is the one in which the local | The first encoding of identifying data is the one in which the local | |||
| skipping to change at page 14, line 24 | skipping to change at page 15, line 9 | |||
| data formats are often subsequently encoded for transmission over | data formats are often subsequently encoded for transmission over | |||
| Internet protocols. | Internet protocols. | |||
| For most systems, an unreserved character appearing within a URI | For most systems, an unreserved character appearing within a URI | |||
| component is interpreted as representing the data octet corresponding | component is interpreted as representing the data octet corresponding | |||
| to that character's encoding in US-ASCII. Consumers of URIs assume | to that character's encoding in US-ASCII. Consumers of URIs assume | |||
| that the letter "X" corresponds to the octet "01011000", and there is | that the letter "X" corresponds to the octet "01011000", and there is | |||
| no harm in making that assumption even when it is incorrect. A | no harm in making that assumption even when it is incorrect. A | |||
| system that internally provides identifiers in the form of a | system that internally provides identifiers in the form of a | |||
| different character encoding, such as EBCDIC, will generally perform | different character encoding, such as EBCDIC, will generally perform | |||
| character translation of textual identifiers to UTF-8 [RFC3629] (or | character translation of textual identifiers to UTF-8 [STD63] (or | |||
| some other superset of the US-ASCII character encoding) at an | some other superset of the US-ASCII character encoding) at an | |||
| internal interface, thereby providing more meaningful identifiers | internal interface, thereby providing more meaningful identifiers | |||
| than simply percent-encoding the original octets. | than simply percent-encoding the original octets. | |||
| For example, consider an information service that provides data, | For example, consider an information service that provides data, | |||
| stored locally using an EBCDIC-based filesystem, to clients on the | stored locally using an EBCDIC-based filesystem, to clients on the | |||
| Internet through an HTTP server. When an author creates a file on | Internet through an HTTP server. When an author creates a file on | |||
| that filesystem with the name "Laguna Beach", their expectation is | that filesystem with the name "Laguna Beach", their expectation is | |||
| that the "http" URI corresponding to that resource would also contain | that the "http" URI corresponding to that resource would also contain | |||
| the meaningful string "Laguna%20Beach". If, however, that server | the meaningful string "Laguna%20Beach". If, however, that server | |||
| skipping to change at page 14, line 48 | skipping to change at page 15, line 33 | |||
| interface fixes that problem by transcoding the local name to a | interface fixes that problem by transcoding the local name to a | |||
| superset of US-ASCII prior to producing the URI. Naturally, proper | superset of US-ASCII prior to producing the URI. Naturally, proper | |||
| interpretation of an incoming URI on such an interface requires that | interpretation of an incoming URI on such an interface requires that | |||
| percent-encoded octets be decoded (e.g., "%20" to SP) before the | percent-encoded octets be decoded (e.g., "%20" to SP) before the | |||
| reverse transcoding is applied to obtain the local name. | reverse transcoding is applied to obtain the local name. | |||
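The "Laguna Beach" example can be sketched with Python's built-in `cp037` codec, which is one EBCDIC code page; the choice of code page is an assumption, and the three helper functions are hypothetical. Percent-encoding the raw EBCDIC octets yields an opaque string, while transcoding to UTF-8 first yields the meaningful "Laguna%20Beach". Incoming URIs are handled in the reverse order: percent-decode first, then transcode back.

```python
from urllib.parse import quote, unquote

# Octets as stored on the (assumed) EBCDIC filesystem, code page 037.
local_name = "Laguna Beach".encode("cp037")


def naive_uri_segment(octets: bytes) -> str:
    """Percent-encode the raw local octets without transcoding."""
    return quote(octets, safe="")


def transcoded_uri_segment(octets: bytes) -> str:
    """Transcode EBCDIC -> Unicode -> UTF-8, then percent-encode."""
    return quote(octets.decode("cp037"), safe="")


def local_name_from_segment(segment: str) -> bytes:
    """Reverse direction: decode "%XX" first, then transcode back to EBCDIC."""
    return unquote(segment, encoding="utf-8").encode("cp037")


if __name__ == "__main__":
    print(naive_uri_segment(local_name))       # every octet becomes a triplet
    print(transcoded_uri_segment(local_name))  # Laguna%20Beach
```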
| In some cases, the internal interface between a URI component and the | In some cases, the internal interface between a URI component and the | |||
| identifying data that it has been crafted to represent is much less | identifying data that it has been crafted to represent is much less | |||
| direct than a character encoding translation. For example, portions | direct than a character encoding translation. For example, portions | |||
| of a URI might reflect a query on non-ASCII data, numeric coordinates | of a URI might reflect a query on non-ASCII data, numeric coordinates | |||
| on a map, etc. Likewise, a URI scheme may define components with | on a map, etc. Likewise, a URI scheme may define components with | |||
| additional encoding requirements that are applied prior to forming | additional encoding requirements that are applied prior to forming | |||
| the component and producing the URI. | the component and producing the URI. | |||
| When a new URI scheme defines a component that represents textual | When a new URI scheme defines a component that represents textual | |||
| data consisting of characters from the Unicode (ISO/IEC 10646-1) | data consisting of characters from the Unicode character set [UCS], | |||
| character set, the data should be encoded first as octets according | the data should be encoded first as octets according to the UTF-8 | |||
| to the UTF-8 character encoding [RFC3629], and then only those octets | character encoding [STD63], and then only those octets that do not | |||
| that do not correspond to characters in the unreserved set should be | correspond to characters in the unreserved set should be | |||
| percent-encoded. For example, the character A would be represented | percent-encoded. For example, the character A would be represented | |||
| as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be | as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be | |||
| represented as "%C3%80", and the character KATAKANA LETTER A would be | represented as "%C3%80", and the character KATAKANA LETTER A would be | |||
| represented as "%E3%82%A2". | represented as "%E3%82%A2". | |||
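The procedure above has two steps: encode the text as UTF-8, then percent-encode only the octets that fall outside the unreserved set. The Python sketch below is illustrative, not normative. It assumes the unreserved set ALPHA / DIGIT / "-" / "." / "_" / "~" defined earlier in this specification, and the name encode_component is hypothetical.

```python
import string

# Assumption: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# (defined elsewhere in this specification, not in this excerpt).
UNRESERVED = frozenset(
    (string.ascii_letters + string.digits + "-._~").encode("ascii")
)

def encode_component(text: str) -> str:
    """Encode text as UTF-8, then percent-encode every non-unreserved octet."""
    out = []
    for octet in text.encode("utf-8"):  # iterating bytes yields ints
        if octet in UNRESERVED:
            out.append(chr(octet))
        else:
            # Uppercase hex digits, as recommended for percent-encodings.
            out.append("%{:02X}".format(octet))
    return "".join(out)

print(encode_component("A"))       # A
print(encode_component("\u00C0"))  # %C3%80     (LATIN CAPITAL LETTER A WITH GRAVE)
print(encode_component("\u30A2"))  # %E3%82%A2  (KATAKANA LETTER A)
```

On Python 3.7 and later, urllib.parse.quote(text, safe="") should give the same result, because that release switched quote() to the RFC 3986 unreserved set.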
| 3. Syntax Components | 3. Syntax Components | |||
| The generic URI syntax consists of a hierarchical sequence of | The generic URI syntax consists of a hierarchical sequence of | |||
| components referred to as the scheme, authority, path, query, and | components referred to as the scheme, authority, path, query, and | |||
| fragment. | fragment. | |||
| URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | |||
| hier-part = "//" authority path-abempty | hier-part = "//" authority path-abempty | |||
| / path-abs | / path-absolute | |||
| / path-rootless | / path-rootless | |||
| / path-empty | / path-empty | |||
| The scheme and path components are required, though path may be empty | The scheme and path components are required, though path may be empty | |||
| (no characters). When authority is present, the path must either be | (no characters). When authority is present, the path must either be | |||
| empty or begin with a slash ("/") character. When authority is not | empty or begin with a slash ("/") character. When authority is not | |||
| present, the path cannot begin with two slash characters ("//"). | present, the path cannot begin with two slash characters ("//"). | |||
| These restrictions result in five different ABNF rules for a path | These restrictions result in five different ABNF rules for a path | |||
| (Section 3.3), only one of which will match any given URI reference. | (Section 3.3), only one of which will match any given URI reference. | |||
| The following are two example URIs and their component parts: | The following are two example URIs and their component parts: | |||
| foo://example.com:8042/over/there?name=ferret#nose | foo://example.com:8042/over/there?name=ferret#nose | |||
| \_/ \______________/\_________/ \_________/ \__/ | \_/ \______________/\_________/ \_________/ \__/ | |||
| | | | | | | | | | | | | |||
| scheme authority path query fragment | scheme authority path query fragment | |||
| | _____________________|__ | | _____________________|__ | |||
| / \ / \ | / \ / \ | |||
| urn:example:animal:ferret:nose | urn:example:animal:ferret:nose | |||
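A non-validating split of a URI reference into these five components can be sketched with the regular expression published in Appendix B of RFC 2396 (and carried forward into RFC 3986). The split_uri helper is hypothetical. It only separates components and checks none of the ABNF rules above.

```python
import re

# Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri: str) -> dict:
    """Split a URI reference into components; absent components are None."""
    m = URI_RE.match(uri)  # always matches: every part of the pattern is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),  # always defined, possibly empty
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("urn:example:animal:ferret:nose"))
# {'scheme': 'urn', 'authority': None, 'path': 'example:animal:ferret:nose', 'query': None, 'fragment': None}
```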
| 3.1 Scheme | 3.1 Scheme | |||
| Each URI begins with a scheme name that refers to a specification for | Each URI begins with a scheme name that refers to a specification for | |||
| assigning identifiers within that scheme. As such, the URI syntax is | assigning identifiers within that scheme. As such, the URI syntax is | |||
| a federated and extensible naming system wherein each scheme's | a federated and extensible naming system wherein each scheme's | |||
| specification may further restrict the syntax and semantics of | specification may further restrict the syntax and semantics of | |||
| identifiers using that scheme. | identifiers using that scheme. | |||
| Scheme names consist of a sequence of characters beginning with a | Scheme names consist of a sequence of characters beginning with a | |||
| letter and followed by any combination of letters, digits, plus | letter and followed by any combination of letters, digits, plus | |||
| ("+"), period ("."), or hyphen ("-"). Although scheme is | ("+"), period ("."), or hyphen ("-"). Although scheme is | |||
| case-insensitive, the canonical form is lowercase and documents that | case-insensitive, the canonical form is lowercase and documents that | |||
| specify schemes must do so using lowercase letters. An | specify schemes must do so using lowercase letters. An | |||
| implementation should accept uppercase letters as equivalent to | implementation should accept uppercase letters as equivalent to | |||
| lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for | lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for | |||
| the sake of robustness, but should only produce lowercase scheme | the sake of robustness, but should only produce lowercase scheme | |||
| names, for consistency. | names, for consistency. | |||
| scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | |||
| Individual schemes are not specified by this document. The process | Individual schemes are not specified by this document. The process | |||
| for registration of new URI schemes is defined separately by | for registration of new URI schemes is defined separately by [BCP35]. | |||
| [RFC2717]. The scheme registry maintains the mapping between scheme | The scheme registry maintains the mapping between scheme names and | |||
| names and their specifications. Advice for designers of new URI | their specifications. Advice for designers of new URI schemes can be | |||
| schemes can be found in [RFC2718]. | found in [RFC2718]. | |||
| When presented with a URI that violates one or more scheme-specific | When presented with a URI that violates one or more scheme-specific | |||
| restrictions, the scheme-specific resolution process should flag the | restrictions, the scheme-specific resolution process should flag the | |||
| reference as an error rather than ignore the unused parts; doing so | reference as an error rather than ignore the unused parts; doing so | |||
| reduces the number of equivalent URIs and helps detect abuses of the | reduces the number of equivalent URIs and helps detect abuses of the | |||
| generic syntax that might indicate the URI has been constructed to | generic syntax that might indicate the URI has been constructed to | |||
| mislead the user (Section 7.6). | mislead the user (Section 7.6). | |||
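The scheme rule and the "accept either case, produce lowercase" advice can be combined into one small helper. This is a sketch: normalize_scheme is a hypothetical name, and raising ValueError is just one way of flagging an invalid name as an error.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

def normalize_scheme(scheme: str) -> str:
    """Accept uppercase for robustness, but return the lowercase form."""
    if not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"invalid scheme name: {scheme!r}")
    return scheme.lower()

print(normalize_scheme("HTTP"))  # http
```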
| 3.2 Authority | 3.2 Authority | |||
| skipping to change at page 16, line 46 ¶ | skipping to change at page 17, line 34 ¶ | |||
| means for distinguishing an authority based on a registered name or | means for distinguishing an authority based on a registered name or | |||
| server address, along with optional port and user information. | server address, along with optional port and user information. | |||
| The authority component is preceded by a double slash ("//") and is | The authority component is preceded by a double slash ("//") and is | |||
| terminated by the next slash ("/"), question mark ("?"), or number | terminated by the next slash ("/"), question mark ("?"), or number | |||
| sign ("#") character, or by the end of the URI. | sign ("#") character, or by the end of the URI. | |||
| authority = [ userinfo "@" ] host [ ":" port ] | authority = [ userinfo "@" ] host [ ":" port ] | |||
| URI producers and normalizers should omit the ":" delimiter that | URI producers and normalizers should omit the ":" delimiter that | |||
| separates host from port if the port component is empty. Some schemes | separates host from port if the port component is empty. Some | |||
| do not allow the userinfo and/or port subcomponents. | schemes do not allow the userinfo and/or port subcomponents. | |||
| If a URI contains an authority component, then the path component | If a URI contains an authority component, then the path component | |||
| must either be empty or begin with a slash ("/") character. | must either be empty or begin with a slash ("/") character. | |||
| Non-validating parsers (those that merely separate a URI reference | Non-validating parsers (those that merely separate a URI reference | |||
| into its major components) will often ignore the subcomponent | into its major components) will often ignore the subcomponent | |||
| structure of authority, treating it as an opaque string from the | structure of authority, treating it as an opaque string from the | |||
| double-slash to the first terminating delimiter, until such time as | double-slash to the first terminating delimiter, until such time as | |||
| the URI is dereferenced. | the URI is dereferenced. | |||
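Once a parser dereferences the authority, it has to break it into subcomponents. The sketch below does this using only the delimiters named above and the bracket rule for IP literals. split_authority is a hypothetical helper. It returns an empty string for an empty port, so that a producer can detect the case and omit the ":".

```python
from typing import Optional, Tuple

def split_authority(authority: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split authority = [ userinfo "@" ] host [ ":" port ] into its parts.

    Subcomponent syntax is not validated.
    """
    userinfo = None
    hostport = authority
    if "@" in authority:
        # A valid authority contains at most one "@"; splitting at the last
        # one is a lenient choice for malformed input.
        userinfo, _, hostport = authority.rpartition("@")

    if hostport.startswith("["):
        # IP-literal: a port delimiter may only follow the closing "]".
        end = hostport.find("]")
        if end == -1:
            raise ValueError("unterminated IP-literal")
        host, rest = hostport[: end + 1], hostport[end + 1 :]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected data after IP-literal")
        port = rest[1:] if rest else None
    else:
        # reg-name and IPv4address contain no ":", so a colon starts the port.
        host, sep, port_text = hostport.rpartition(":")
        if sep:
            port = port_text
        else:
            host, port = hostport, None
    return userinfo, host, port
```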
| 3.2.1 User Information | 3.2.1 User Information | |||
| The userinfo subcomponent may consist of a user name and, optionally, | The userinfo subcomponent may consist of a user name and, optionally, | |||
| scheme-specific information about how to gain authorization to access | scheme-specific information about how to gain authorization to access | |||
| the resource. The user information, if present, is followed by a | the resource. The user information, if present, is followed by a | |||
| commercial at-sign ("@") that delimits it from the host. | commercial at-sign ("@") that delimits it from the host. | |||
| userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) | userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) | |||
| Use of the format "user:password" in the userinfo field is | Use of the format "user:password" in the userinfo field is | |||
| deprecated. Applications should not render as clear text any data | deprecated. Applications should not render as clear text any data | |||
| after the first colon (":") character found within a userinfo | after the first colon (":") character found within a userinfo | |||
| subcomponent unless the data after the colon is the empty string | subcomponent unless the data after the colon is the empty string | |||
| (indicating no password). Applications may choose to ignore or reject | (indicating no password). Applications may choose to ignore or | |||
| such data when received as part of a reference, and should reject the | reject such data when received as part of a reference, and should | |||
| storage of such data in unencrypted form. The passing of | reject the storage of such data in unencrypted form. The passing of | |||
| authentication information in clear text has proven to be a security | authentication information in clear text has proven to be a security | |||
| risk in almost every case where it has been used. | risk in almost every case where it has been used. | |||
| Applications that render a URI for the sake of user feedback, such as | Applications that render a URI for the sake of user feedback, such as | |||
| in graphical hypertext browsing, should render userinfo in a way that | in graphical hypertext browsing, should render userinfo in a way that | |||
| is distinguished from the rest of a URI, when feasible. Such | is distinguished from the rest of a URI, when feasible. Such | |||
| rendering will assist the user in cases where the userinfo has been | rendering will assist the user in cases where the userinfo has been | |||
| misleadingly crafted to look like a trusted domain name (Section | misleadingly crafted to look like a trusted domain name | |||
| 7.6). | (Section 7.6). | |||
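An application that shows a URI to the user could follow the rendering rule for data after the first colon with a helper like the one below. It is a sketch with a hypothetical name and mask string. The text only says such data should not appear as clear text, so a fixed mask (which hides the password length as well) is one design choice among several. Dropping the data entirely would also satisfy the rule.

```python
def render_userinfo(userinfo: str, mask: str = "****") -> str:
    """Return userinfo for display, hiding any non-empty data after the first ':'."""
    user, sep, secret = userinfo.partition(":")
    if sep and secret:
        return user + ":" + mask
    return userinfo  # no colon, or an empty password after it

print(render_userinfo("fred:secret"))  # fred:****
```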
| 3.2.2 Host | 3.2.2 Host | |||
| The host subcomponent of authority is identified by an IP literal | The host subcomponent of authority is identified by an IP literal | |||
| encapsulated within square brackets, an IPv4 address in | encapsulated within square brackets, an IPv4 address in | |||
| dotted-decimal form, or a registered name. The host subcomponent is | dotted-decimal form, or a registered name. The host subcomponent is | |||
| case-insensitive. The presence of a host subcomponent within a URI | case-insensitive. The presence of a host subcomponent within a URI | |||
| does not imply that the scheme requires access to the given host on | does not imply that the scheme requires access to the given host on | |||
| the Internet. In many cases, the host syntax is used only for the | the Internet. In many cases, the host syntax is used only for the | |||
| sake of reusing the existing registration process created and | sake of reusing the existing registration process created and | |||
| deployed for DNS, thus obtaining a globally unique name without the | deployed for DNS, thus obtaining a globally unique name without the | |||
| cost of deploying another registry. However, such use comes with its | cost of deploying another registry. However, such use comes with its | |||
| own costs: domain name ownership may change over time for reasons not | own costs: domain name ownership may change over time for reasons not | |||
| anticipated by the URI producer. In other cases, the data within the | anticipated by the URI producer. In other cases, the data within the | |||
| host component identifies a registered name that has nothing to do | host component identifies a registered name that has nothing to do | |||
| with an Internet host. We use the name "host" for the ABNF rule | with an Internet host. We use the name "host" for the ABNF rule | |||
| because that is its most common purpose, not its only purpose, and | because that is its most common purpose, not its only purpose, and | |||
| thus should not be considered as semantically limiting the data | thus should not be considered as semantically limiting the data | |||
| within it. | within it. | |||
| host = IP-literal / IPv4address / reg-name | host = IP-literal / IPv4address / reg-name | |||
| The syntax rule for host is ambiguous because it does not completely | The syntax rule for host is ambiguous because it does not completely | |||
| distinguish between an IPv4address and a reg-name. In order to | distinguish between an IPv4address and a reg-name. In order to | |||
| disambiguate, the syntax, we apply the "first-match-wins" algorithm: | disambiguate the syntax, we apply the "first-match-wins" algorithm: | |||
| If host matches the rule for IPv4address, then it should be | If host matches the rule for IPv4address, then it should be | |||
| considered an IPv4 address literal and not a reg-name. Although host | considered an IPv4 address literal and not a reg-name. Although host | |||
| is case-insensitive, producers and normalizers should use lowercase | is case-insensitive, producers and normalizers should use lowercase | |||
| for registered names and hexadecimal addresses for the sake of | for registered names and hexadecimal addresses for the sake of | |||
| uniformity, while only using uppercase letters for percent-encodings. | uniformity, while only using uppercase letters for percent-encodings. | |||
| A host identified by an Internet Protocol literal address, version 6 | A host identified by an Internet Protocol literal address, version 6 | |||
| [RFC3513] or later, is distinguished by enclosing the IP literal | [RFC3513] or later, is distinguished by enclosing the IP literal | |||
| within square brackets ("[" and "]"). This is the only place where | within square brackets ("[" and "]"). This is the only place where | |||
| square bracket characters are allowed in the URI syntax. In | square bracket characters are allowed in the URI syntax. In | |||
| anticipation of future, as-yet-undefined IP literal address formats, | anticipation of future, as-yet-undefined IP literal address formats, | |||
| an optional version flag may be used to indicate such a format | an optional version flag may be used to indicate such a format | |||
| explicitly rather than relying on heuristic determination. | explicitly rather than relying on heuristic determination. | |||
| IP-literal = "[" ( IPv6address / IPvFuture ) "]" | IP-literal = "[" ( IPv6address / IPvFuture ) "]" | |||
| IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ) | IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ) | |||
| The version flag does not indicate the IP version; rather, it | The version flag does not indicate the IP version; rather, it | |||
| indicates future versions of the literal format. As such, | indicates future versions of the literal format. As such, | |||
| implementations must not provide the version flag for existing IPv4 | implementations must not provide the version flag for existing IPv4 | |||
| and IPv6 literal addresses. If a URI containing an IP-literal that | and IPv6 literal addresses. If a URI containing an IP-literal that | |||
| starts with "v" (case-insensitive), indicating that the version flag | starts with "v" (case-insensitive), indicating that the version flag | |||
| is present, is dereferenced by an application that does not know the | is present, is dereferenced by an application that does not know the | |||
| meaning of that version flag, then the application should return an | meaning of that version flag, then the application should return an | |||
| appropriate error for "address mechanism not supported". | appropriate error for "address mechanism not supported". | |||
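The sketch below shows how the version flag separates the two IP-literal forms. It assumes the unreserved and sub-delims sets from Section 2 of this specification. It also uses Python's ipaddress module as an approximate stand-in for the IPv6address ABNF. Recent Python versions (3.9+) accept a "%zone" suffix that the ABNF does not, so the sketch rejects "%" explicitly. classify_ip_literal is a hypothetical name.

```python
import ipaddress
import re

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPVFUTURE_RE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")

def classify_ip_literal(literal: str) -> str:
    """Classify an IP-literal (including its square brackets)."""
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("IP-literal must be enclosed in square brackets")
    inner = literal[1:-1]
    if inner[:1] in ("v", "V"):
        if IPVFUTURE_RE.fullmatch(inner):
            # An application that does not understand this version flag
            # should report "address mechanism not supported" on dereference.
            return "IPvFuture"
        raise ValueError("malformed IPvFuture literal")
    if "%" in inner:
        raise ValueError("zone identifiers are not part of IPv6address")
    ipaddress.IPv6Address(inner)  # raises a ValueError subclass if invalid
    return "IPv6address"
```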
| A host identified by an IPv6 literal address is represented inside | A host identified by an IPv6 literal address is represented inside | |||
| the square brackets without a preceding version flag. The ABNF | the square brackets without a preceding version flag. The ABNF | |||
| provided here is a translation of the text definition of an IPv6 | provided here is a translation of the text definition of an IPv6 | |||
| literal address provided in [RFC3513]. A 128-bit IPv6 address is | literal address provided in [RFC3513]. A 128-bit IPv6 address is | |||
| divided into eight 16-bit pieces. Each piece is represented | divided into eight 16-bit pieces. Each piece is represented | |||
| numerically in case-insensitive hexadecimal, using one to four | numerically in case-insensitive hexadecimal, using one to four | |||
| hexadecimal digits (leading zeroes are permitted). The eight encoded | hexadecimal digits (leading zeroes are permitted). The eight encoded | |||
| pieces are given most-significant first, separated by colon | pieces are given most-significant first, separated by colon | |||
| characters. Optionally, the least-significant two pieces may instead | characters. Optionally, the least-significant two pieces may instead | |||
| be represented in IPv4 address textual format. A sequence of one or | be represented in IPv4 address textual format. A sequence of one or | |||
| more consecutive zero-valued 16-bit pieces within the address may be | more consecutive zero-valued 16-bit pieces within the address may be | |||
| elided, omitting all their digits and leaving exactly two consecutive | elided, omitting all their digits and leaving exactly two consecutive | |||
| colons in their place to mark the elision. | colons in their place to mark the elision. | |||
| IPv6address = 6( h16 ":" ) ls32 | IPv6address = 6( h16 ":" ) ls32 | |||
| / "::" 5( h16 ":" ) ls32 | / "::" 5( h16 ":" ) ls32 | |||
| / [ h16 ] "::" 4( h16 ":" ) ls32 | / [ h16 ] "::" 4( h16 ":" ) ls32 | |||
| / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 | / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 | |||
| / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 | / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 | |||
| / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 | / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 | |||
| skipping to change at page 19, line 41 ¶ | skipping to change at page 20, line 28 ¶ | |||
| dec-octet = DIGIT ; 0-9 | dec-octet = DIGIT ; 0-9 | |||
| / %x31-39 DIGIT ; 10-99 | / %x31-39 DIGIT ; 10-99 | |||
| / "1" 2DIGIT ; 100-199 | / "1" 2DIGIT ; 100-199 | |||
| / "2" %x30-34 DIGIT ; 200-249 | / "2" %x30-34 DIGIT ; 200-249 | |||
| / "25" %x30-35 ; 250-255 | / "25" %x30-35 ; 250-255 | |||
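The "first-match-wins" rule from earlier in this section, combined with the dec-octet rule above, can be sketched as follows. host_kind is a hypothetical helper, and it does not check reg-name syntax. The examples show that out-of-range or zero-padded octets fail the IPv4address rule and so fall through to reg-name.

```python
import re

# dec-octet alternatives; their order is irrelevant because fullmatch
# backtracks until the whole host matches or every option fails.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4_RE = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

def host_kind(host: str) -> str:
    """Classify host = IP-literal / IPv4address / reg-name, first match wins."""
    if host.startswith("["):
        return "IP-literal"
    if IPV4_RE.fullmatch(host):
        return "IPv4address"
    return "reg-name"

print(host_kind("192.0.2.16"))   # IPv4address
print(host_kind("192.0.2.256"))  # reg-name
```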
| A host identified by a registered name is a sequence of characters | A host identified by a registered name is a sequence of characters | |||
| that is usually intended for lookup within a locally-defined host or | that is usually intended for lookup within a locally-defined host or | |||
| service name registry, though the URI's scheme-specific semantics may | service name registry, though the URI's scheme-specific semantics may | |||
| require that a specific registry (or fixed name table) be used | require that a specific registry (or fixed name table) be used | |||
| instead. The most common name registry mechanism is the Domain Name | instead. The most common name registry mechanism is the Domain Name | |||
| System (DNS). A registered name intended for lookup in the DNS uses | System (DNS). A registered name intended for lookup in the DNS uses | |||
| the syntax defined in Section 3.5 of [RFC1034] and Section 2.1 of | the syntax defined in Section 3.5 of [RFC1034] and Section 2.1 of | |||
| [RFC1123]. Such a name consists of a sequence of domain labels | [RFC1123]. Such a name consists of a sequence of domain labels | |||
| separated by ".", each domain label starting and ending with an | separated by ".", each domain label starting and ending with an | |||
| alphanumeric character and possibly also containing "-" characters. | alphanumeric character and possibly also containing "-" characters. | |||
| The rightmost domain label of a fully qualified domain name in DNS | The rightmost domain label of a fully qualified domain name in DNS | |||
| may be followed by a single "." and should be followed by one if it | may be followed by a single "." and should be followed by one if it | |||
| is necessary to distinguish between the complete domain name and some | is necessary to distinguish between the complete domain name and some | |||
| local domain. | local domain. | |||
| reg-name = 0*255( unreserved / pct-encoded / sub-delims ) | reg-name = *( unreserved / pct-encoded / sub-delims ) | |||
| If the URI scheme defines a default for host, then that default | If the URI scheme defines a default for host, then that default | |||
| applies when the host subcomponent is undefined or when the | applies when the host subcomponent is undefined or when the | |||
| registered name is empty (zero length). For example, the "file" URI | registered name is empty (zero length). For example, the "file" URI | |||
| scheme is defined such that no authority, an empty host, and | scheme is defined such that no authority, an empty host, and | |||
| "localhost" all mean the end-user's machine, whereas the "http" | "localhost" all mean the end-user's machine, whereas the "http" | |||
| scheme considers a missing authority or empty host to be invalid. | scheme considers a missing authority or empty host to be invalid. | |||
| This specification does not mandate a particular registered name | This specification does not mandate a particular registered name | |||
| lookup technology and therefore does not restrict the syntax of | lookup technology and therefore does not restrict the syntax of | |||
| reg-name beyond that necessary for interoperability. Instead, it | reg-name beyond that necessary for interoperability. Instead, it | |||
| delegates the issue of registered name syntax conformance to the | delegates the issue of registered name syntax conformance to the | |||
| operating system of each application performing URI resolution, and | operating system of each application performing URI resolution, and | |||
| that operating system decides what it will allow for the purpose of | that operating system decides what it will allow for the purpose of | |||
| host identification. A URI resolution implementation might use DNS, | host identification. A URI resolution implementation might use DNS, | |||
| host tables, yellow pages, NetInfo, WINS, or any other system for | host tables, yellow pages, NetInfo, WINS, or any other system for | |||
| lookup of registered names. However, a globally-scoped naming system, | lookup of registered names. However, a globally-scoped naming | |||
| such as DNS fully-qualified domain names, is necessary for URIs that | system, such as DNS fully-qualified domain names, is necessary for | |||
| are intended to have global scope. URI producers should use names | URIs that are intended to have global scope. URI producers should | |||
| that conform to the DNS syntax, even when use of DNS is not | use names that conform to the DNS syntax, even when use of DNS is not | |||
| immediately apparent. | immediately apparent, and should limit such names to no more than 255 | |||
| characters in length. | ||||
| The reg-name syntax allows percent-encoded octets in order to | The reg-name syntax allows percent-encoded octets in order to | |||
| represent non-ASCII registered names in a uniform way that is | represent non-ASCII registered names in a uniform way that is | |||
| independent of the underlying name resolution technology; such | independent of the underlying name resolution technology; such | |||
| non-ASCII characters must first be encoded according to UTF-8 | non-ASCII characters must first be encoded according to UTF-8 [STD63] | |||
| [RFC3629] and then each octet of the corresponding UTF-8 sequence | and then each octet of the corresponding UTF-8 sequence must be | |||
| must be percent-encoded to be represented as URI characters. URI | percent-encoded to be represented as URI characters. URI producing | |||
| producing applications must not use percent-encoding in host unless | applications must not use percent-encoding in host unless it is used | |||
| it is used to represent a UTF-8 character sequence. When a non-ASCII | to represent a UTF-8 character sequence. When a non-ASCII registered | |||
| registered name represents an internationalized domain name intended | name represents an internationalized domain name intended for | |||
| for resolution via the DNS, the name must be transformed to the IDNA | resolution via the DNS, the name must be transformed to the IDNA | |||
| encoding [RFC3490] prior to name lookup. URI producers should | encoding [RFC3490] prior to name lookup. URI producers should | |||
| provide such registered names in the IDNA encoding, rather than a | provide such registered names in the IDNA encoding, rather than a | |||
| percent-encoding, if they wish to maximize interoperability with | percent-encoding, if they wish to maximize interoperability with | |||
| legacy URI resolvers. | legacy URI resolvers. | |||
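This sketch contrasts the two ways of carrying a non-ASCII registered name. It relies on Python's built-in "idna" codec, which implements the RFC 3490 (IDNA 2003) conversion this text cites. The name "bücher.example" and the helper reg_name_for_dns are hypothetical examples.

```python
from urllib.parse import quote, unquote

name = "b\u00fccher.example"  # hypothetical non-ASCII registered name

# Generic form: UTF-8 octets, each non-ASCII octet percent-encoded.
print(quote(name, safe=""))                   # b%C3%BCcher.example
# IDNA form, preferred for interoperability with legacy resolvers.
print(name.encode("idna").decode("ascii"))    # xn--bcher-kva.example

def reg_name_for_dns(reg_name: str) -> str:
    """Decode percent-encoded UTF-8, then apply IDNA before DNS lookup."""
    text = unquote(reg_name, encoding="utf-8", errors="strict")
    return text.encode("idna").decode("ascii")
```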
| 3.2.3 Port | 3.2.3 Port | |||
| The port subcomponent of authority is designated by an optional port | The port subcomponent of authority is designated by an optional port | |||
| number in decimal following the host and delimited from it by a | number in decimal following the host and delimited from it by a | |||
| single colon (":") character. | single colon (":") character. | |||
| port = *DIGIT | port = *DIGIT | |||
| A scheme may define a default port. For example, the "http" scheme | A scheme may define a default port. For example, the "http" scheme | |||
| defines a default port of "80", corresponding to its reserved TCP | defines a default port of "80", corresponding to its reserved TCP | |||
| port number. The type of port designated by the port number (e.g., | port number. The type of port designated by the port number (e.g., | |||
| TCP, UDP, SCTP, etc.) is defined by the URI scheme. URI producers | TCP, UDP, SCTP, etc.) is defined by the URI scheme. URI producers | |||
| and normalizers should omit the port component and its ":" delimiter | and normalizers should omit the port component and its ":" delimiter | |||
| if port is empty or its value would be the same as the scheme's | if port is empty or its value would be the same as the scheme's | |||
| default. | default. | |||
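A producer or normalizer could apply the port-omission advice as shown below. This is a sketch. The DEFAULT_PORTS table is hypothetical, and only the "http" entry comes from the text. The sketch also assumes that "the same as the scheme's default" means numeric equality, so "080" counts as 80.

```python
from typing import Dict, Optional

# Hypothetical registry excerpt; only "http" -> "80" is stated in the text.
DEFAULT_PORTS: Dict[str, str] = {"http": "80"}

def format_hostport(scheme: str, host: str, port: Optional[str]) -> str:
    """Emit host[:port], omitting ":" and port when empty or equal to the default."""
    if not port:  # undefined or empty port
        return host
    default = DEFAULT_PORTS.get(scheme.lower())
    if default is not None and int(port) == int(default):
        return host
    return f"{host}:{port}"

print(format_hostport("http", "example.com", "80"))    # example.com
print(format_hostport("http", "example.com", "8080"))  # example.com:8080
```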
| 3.3 Path | 3.3 Path | |||
| The path component contains data, usually organized in hierarchical | The path component contains data, usually organized in hierarchical | |||
| form, that, along with data in the non-hierarchical query component | form, that, along with data in the non-hierarchical query component | |||
| (Section 3.4), serves to identify a resource within the scope of the | (Section 3.4), serves to identify a resource within the scope of the | |||
| URI's scheme and naming authority (if any). The path is terminated by | URI's scheme and naming authority (if any). The path is terminated | |||
| the first question mark ("?") or number sign ("#") character, or by | by the first question mark ("?") or number sign ("#") character, or | |||
| the end of the URI. | by the end of the URI. | |||
| If a URI contains an authority component, then the path component | If a URI contains an authority component, then the path component | |||
| must either be empty or begin with a slash ("/") character. If a URI | must either be empty or begin with a slash ("/") character. If a URI | |||
| does not contain an authority component, then the path cannot begin | does not contain an authority component, then the path cannot begin | |||
| with two slash characters ("//"). In addition, a URI reference | with two slash characters ("//"). In addition, a URI reference | |||
| (Section 4.1) may begin with a relative path, in which case the first | (Section 4.1) may begin with a relative path, in which case the first | |||
| path segment cannot contain a colon (":") character. The ABNF | path segment cannot contain a colon (":") character. The ABNF | |||
| requires five separate rules to disambiguate these cases, only one of | requires five separate rules to disambiguate these cases, only one of | |||
| which will match a given URI reference. We use the generic term | which will match the path substring within a given URI reference. We | |||
| "path component" to describe the URI substring that is matched by the | use the generic term "path component" to describe the URI substring | |||
| parser to one of these rules. | matched by the parser to one of these rules. | |||
| path = path-abempty ; begins with "/" or is empty | path = path-abempty ; begins with "/" or is empty | |||
| / path-abs ; begins with "/" but not "//" | / path-absolute ; begins with "/" but not "//" | |||
| / path-noscheme ; begins with a non-colon segment | / path-noscheme ; begins with a non-colon segment | |||
| / path-rootless ; begins with a segment | / path-rootless ; begins with a segment | |||
| / path-empty ; zero characters | / path-empty ; zero characters | |||
| path-abempty = *( "/" segment ) | path-abempty = *( "/" segment ) | |||
| path-abs = "/" [ segment-nz *( "/" segment ) ] | path-absolute = "/" [ segment-nz *( "/" segment ) ] | |||
| path-noscheme = segment-nzc *( "/" segment ) | path-noscheme = segment-nz-nc *( "/" segment ) | |||
| path-rootless = segment-nz *( "/" segment ) | path-rootless = segment-nz *( "/" segment ) | |||
| path-empty = 0<pchar> | path-empty = 0<pchar> | |||
| segment = *pchar | segment = *pchar | |||
| segment-nz = 1*pchar | segment-nz = 1*pchar | |||
| segment-nzc = 1*( unreserved / pct-encoded / sub-delims / "@" ) | segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" ) | |||
| ; non-zero-length segment without any colon ":" | ||||
| pchar = unreserved / pct-encoded / sub-delims / ":" / "@" | pchar = unreserved / pct-encoded / sub-delims / ":" / "@" | |||
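The five path rules can be transcribed into regular expressions, which makes it easier to see which rule applies in which context. This is a sketch. It assumes the unreserved, pct-encoded, and sub-delims definitions from Section 2. It also assumes that a relative reference allows path-absolute, path-noscheme, or path-empty when it has no authority, as the relative-path text above and Section 4.2 describe. path_rule is a hypothetical name.

```python
import re

# Assumed from Section 2: unreserved, pct-encoded, sub-delims.
UNRES_SUB = r"A-Za-z0-9\-._~!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"
PCHAR = rf"(?:[{UNRES_SUB}:@]|{PCT})"
PCHAR_NC = rf"(?:[{UNRES_SUB}@]|{PCT})"  # pchar without ":"
SEGMENT = rf"{PCHAR}*"

RULES = {
    "path-abempty": rf"(?:/{SEGMENT})*",
    "path-absolute": rf"/(?:{PCHAR}+(?:/{SEGMENT})*)?",
    "path-noscheme": rf"{PCHAR_NC}+(?:/{SEGMENT})*",
    "path-rootless": rf"{PCHAR}+(?:/{SEGMENT})*",
    "path-empty": r"",
}

def path_rule(path: str, has_authority: bool, relative_ref: bool = False):
    """Return the path rule that matches in this context, or None if none does."""
    if has_authority:
        candidates = ["path-abempty"]
    elif relative_ref:
        candidates = ["path-absolute", "path-noscheme", "path-empty"]
    else:
        candidates = ["path-absolute", "path-rootless", "path-empty"]
    for name in candidates:
        if re.fullmatch(RULES[name], path):
            return name
    return None

print(path_rule("fred@example.com", has_authority=False))  # path-rootless
```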
| A path consists of a sequence of path segments separated by a slash | A path consists of a sequence of path segments separated by a slash | |||
| ("/") character. A path is always defined for a URI, though the | ("/") character. A path is always defined for a URI, though the | |||
| defined path may be empty (zero length). Use of the slash character | defined path may be empty (zero length). Use of the slash character | |||
| to indicate hierarchy is only required when a URI will be used as the | to indicate hierarchy is only required when a URI will be used as the | |||
| context for relative references. For example, the URI | context for relative references. For example, the URI | |||
| <mailto:fred@example.com> has a path of "fred@example.com", whereas | <mailto:fred@example.com> has a path of "fred@example.com", whereas | |||
| the URI <foo://info.example.com?fred> has an empty path. | the URI <foo://info.example.com?fred> has an empty path. | |||
| The path segments "." and "..", also known as dot-segments, are | The path segments "." and "..", also known as dot-segments, are | |||
| defined for relative reference within the path name hierarchy. They | defined for relative reference within the path name hierarchy. They | |||
| are intended for use at the beginning of a relative path reference | are intended for use at the beginning of a relative path reference | |||
| (Section 4.2) for indicating relative position within the | (Section 4.2) for indicating relative position within the | |||
| hierarchical tree of names. This is similar to their role within | hierarchical tree of names. This is similar to their role within | |||
| some operating systems' file directory structure to indicate the | some operating systems' file directory structure to indicate the | |||
| current directory and parent directory, respectively. However, unlike | current directory and parent directory, respectively. However, | |||
| a file system, these dot-segments are only interpreted within the URI | unlike a file system, these dot-segments are only interpreted within | |||
| path hierarchy and are removed as part of the resolution process | the URI path hierarchy and are removed as part of the resolution | |||
| (Section 5.2). | process (Section 5.2). | |||
| Aside from dot-segments in hierarchical paths, a path segment is | Aside from dot-segments in hierarchical paths, a path segment is | |||
| considered opaque by the generic syntax. URI-producing applications | considered opaque by the generic syntax. URI-producing applications | |||
| often use the reserved characters allowed in a segment for the | often use the reserved characters allowed in a segment for the | |||
| purpose of delimiting scheme-specific or dereference-handler-specific | purpose of delimiting scheme-specific or dereference-handler-specific | |||
| subcomponents. For example, the semicolon (";") and equals ("=") | subcomponents. For example, the semicolon (";") and equals ("=") | |||
| reserved characters are often used for delimiting parameters and | reserved characters are often used for delimiting parameters and | |||
| parameter values applicable to that segment. The comma (",") | parameter values applicable to that segment. The comma (",") | |||
| reserved character is often used for similar purposes. For example, | reserved character is often used for similar purposes. For example, | |||
| one URI producer might use a segment like "name;v=1.1" to indicate a | one URI producer might use a segment like "name;v=1.1" to indicate a | |||
| reference to version 1.1 of "name", whereas another might use a | reference to version 1.1 of "name", whereas another might use a | |||
| segment like "name,1.1" to indicate the same. Parameter types may be | segment like "name,1.1" to indicate the same. Parameter types may be | |||
| defined by scheme-specific semantics, but in most cases the syntax of | defined by scheme-specific semantics, but in most cases the syntax of | |||
| a parameter is specific to the implementation of the URI's | a parameter is specific to the implementation of the URI's | |||
| dereferencing algorithm. | dereferencing algorithm. | |||
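
As an illustration, the path productions above can be transcribed into Python regular expressions. This is a minimal sketch. It assumes the unreserved, sub-delims, and pct-encoded character sets from the generic syntax. The helper name `matching_rules` is hypothetical.

```python
import re

# Character sets assumed from the generic syntax (unreserved, sub-delims,
# pct-encoded); pchar adds ":" and "@".
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"
PCHAR_NC = rf"(?:[{UNRESERVED}{SUB_DELIMS}@]|{PCT_ENCODED})"  # pchar minus ":"

SEGMENT = rf"{PCHAR}*"
SEGMENT_NZ = rf"{PCHAR}+"
SEGMENT_NZ_NC = rf"{PCHAR_NC}+"

# One pattern per path production, in the order the ABNF lists them.
PATH_RULES = [
    ("path-abempty", rf"(?:/{SEGMENT})*"),
    ("path-absolute", rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?"),
    ("path-noscheme", rf"{SEGMENT_NZ_NC}(?:/{SEGMENT})*"),
    ("path-rootless", rf"{SEGMENT_NZ}(?:/{SEGMENT})*"),
    ("path-empty", r""),
]


def matching_rules(path: str) -> list[str]:
    """Return every path production that matches the whole of `path`."""
    return [name for name, pattern in PATH_RULES if re.fullmatch(pattern, path)]


print(matching_rules("a/b"))  # ['path-noscheme', 'path-rootless']
```

Several productions overlap. For example, "a/b" matches both path-noscheme and path-rootless, and "" matches both path-abempty and path-empty. The grammar alone therefore does not select one. The enclosing rule decides which applies: whether an authority is present, and whether the string appears in a relative-part.

| < draft-fielding-uri-rfc2396bis-05.txt | draft-fielding-uri-rfc2396bis-06.txt > | |||
|---|---|---|---|---|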
| 3.4 Query | 3.4 Query | |||
| The query component contains non-hierarchical data that, along with | The query component contains non-hierarchical data that, along with | |||
| data in the path component (Section 3.3), serves to identify a | data in the path component (Section 3.3), serves to identify a | |||
| resource within the scope of the URI's scheme and naming authority | resource within the scope of the URI's scheme and naming authority | |||
| (if any). The query component is indicated by the first question mark | (if any). The query component is indicated by the first question | |||
| ("?") character and terminated by a number sign ("#") character or by | mark ("?") character and terminated by a number sign ("#") character | |||
| the end of the URI. | or by the end of the URI. | |||
| query = *( pchar / "/" / "?" ) | query = *( pchar / "/" / "?" ) | |||
| The characters slash ("/") and question mark ("?") may represent data | The characters slash ("/") and question mark ("?") may represent data | |||
| within the query component. Beware that some older, erroneous | within the query component. Beware that some older, erroneous | |||
| implementations do not handle such URIs correctly when they are used | implementations do not handle such URIs correctly when they are used | |||
| as the base for relative references (Section 5.1), apparently because | as the base for relative references (Section 5.1), apparently because | |||
| they fail to distinguish query data from path data when looking | they fail to distinguish query data from path data when looking | |||
| for hierarchical separators. However, since query components are | for hierarchical separators. However, since query components are | |||
| often used to carry identifying information in the form of | often used to carry identifying information in the form of | |||
| "key=value" pairs, and one frequently used value is a reference to | "key=value" pairs, and one frequently used value is a reference to | |||
| another URI, it is sometimes better for usability to avoid | another URI, it is sometimes better for usability to avoid | |||
| percent-encoding those characters. | percent-encoding those characters. | |||
| 3.5 Fragment | 3.5 Fragment | |||
| The fragment identifier component of a URI allows indirect | The fragment identifier component of a URI allows indirect | |||
| identification of a secondary resource by reference to a primary | identification of a secondary resource by reference to a primary | |||
| resource and additional identifying information. The identified | resource and additional identifying information. The identified | |||
| secondary resource may be some portion or subset of the primary | secondary resource may be some portion or subset of the primary | |||
| resource, some view on representations of the primary resource, or | resource, some view on representations of the primary resource, or | |||
| some other resource defined or described by those representations. A | some other resource defined or described by those representations. A | |||
| fragment identifier component is indicated by the presence of a | fragment identifier component is indicated by the presence of a | |||
| number sign ("#") character and terminated by the end of the URI. | number sign ("#") character and terminated by the end of the URI. | |||
| fragment = *( pchar / "/" / "?" ) | fragment = *( pchar / "/" / "?" ) | |||
| The semantics of a fragment identifier are defined by the set of | The semantics of a fragment identifier are defined by the set of | |||
| representations that might result from a retrieval action on the | representations that might result from a retrieval action on the | |||
| primary resource. The fragment's format and resolution is therefore | primary resource. The fragment's format and resolution is therefore | |||
| dependent on the media type [RFC2046] of a potentially retrieved | dependent on the media type [RFC2046] of a potentially retrieved | |||
| representation, even though such a retrieval is only performed if the | representation, even though such a retrieval is only performed if the | |||
| URI is dereferenced. If no such representation exists, then the | URI is dereferenced. If no such representation exists, then the | |||
| semantics of the fragment are considered unknown and, effectively, | semantics of the fragment are considered unknown and, effectively, | |||
| unconstrained. Fragment identifier semantics are independent of the | unconstrained. Fragment identifier semantics are independent of the | |||
| URI scheme and thus cannot be redefined by scheme specifications. | URI scheme and thus cannot be redefined by scheme specifications. | |||
| Individual media types may define their own restrictions on, or | Individual media types may define their own restrictions on, or | |||
| structure within, the fragment identifier syntax for specifying | structure within, the fragment identifier syntax for specifying | |||
| different types of subsets, views, or external references that are | different types of subsets, views, or external references that are | |||
| identifiable as secondary resources by that media type. If the | identifiable as secondary resources by that media type. If the | |||
| primary resource has multiple representations, as is often the case | primary resource has multiple representations, as is often the case | |||
| for resources whose representation is selected based on attributes of | for resources whose representation is selected based on attributes of | |||
| the retrieval request (a.k.a., content negotiation), then whatever is | the retrieval request (a.k.a., content negotiation), then whatever is | |||
| identified by the fragment should be consistent across all of those | identified by the fragment should be consistent across all of those | |||
| skipping to change at page 23, line 47 ¶ | skipping to change at page 24, line 38 ¶ | |||
| fragment such that it corresponds to the same secondary resource, | fragment such that it corresponds to the same secondary resource, | |||
| regardless of how it is represented, or the fragment should be left | regardless of how it is represented, or the fragment should be left | |||
| undefined by the representation (i.e., not found). | undefined by the representation (i.e., not found). | |||
| As with any URI, use of a fragment identifier component does not | As with any URI, use of a fragment identifier component does not | |||
| imply that a retrieval action will take place. A URI with a fragment | imply that a retrieval action will take place. A URI with a fragment | |||
| identifier may be used to refer to the secondary resource without any | identifier may be used to refer to the secondary resource without any | |||
| implication that the primary resource is accessible or will ever be | implication that the primary resource is accessible or will ever be | |||
| accessed. | accessed. | |||
| Fragment identifiers have a special role in information systems as | Fragment identifiers have a special role in information retrieval | |||
| the primary form of client-side indirect referencing, allowing an | systems as the primary form of client-side indirect referencing, | |||
| author to specifically identify those aspects of an existing resource | allowing an author to specifically identify those aspects of an | |||
| that are only indirectly provided by the resource owner. As such, the | existing resource that are only indirectly provided by the resource | |||
| fragment identifier is not used in the scheme-specific processing of | owner. As such, the fragment identifier is not used in the | |||
| a URI; instead, the fragment identifier is separated from the rest of | scheme-specific processing of a URI; instead, the fragment identifier | |||
| the URI prior to a dereference, and thus the identifying information | is separated from the rest of the URI prior to a dereference, and | |||
| within the fragment itself is dereferenced solely by the user agent | thus the identifying information within the fragment itself is | |||
| and regardless of the URI scheme. Although this separate handling is | dereferenced solely by the user agent and regardless of the URI | |||
| often perceived to be a loss of information, particularly in regards | scheme. Although this separate handling is often perceived to be a | |||
| to accurate redirection of references as resources move over time, it | loss of information, particularly in regards to accurate redirection | |||
| also serves to prevent information providers from denying reference | of references as resources move over time, it also serves to prevent | |||
| authors the right to selectively refer to information within a | information providers from denying reference authors the right to | |||
| resource. Indirect referencing also provides additional flexibility | selectively refer to information within a resource. Indirect | |||
| and extensibility to systems that use URIs, since new media types are | referencing also provides additional flexibility and extensibility to | |||
| easier to define and deploy than new schemes of identification. | systems that use URIs, since new media types are easier to define and | |||
| deploy than new schemes of identification. | ||||
| The characters slash ("/") and question mark ("?") are allowed to | The characters slash ("/") and question mark ("?") are allowed to | |||
| represent data within the fragment identifier. Beware that some | represent data within the fragment identifier. Beware that some | |||
| older, erroneous implementations do not handle such URIs correctly | older, erroneous implementations do not handle such URIs correctly | |||
| when they are used as the base for relative references (Section 5.1). | when they are used as the base for relative references (Section 5.1). | |||
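
The delimiting rules for the query and fragment components can be checked with the component-splitting regular expression published in RFC 2396, Appendix B. Under those rules, the first "?" starts the query and the first "#" starts the fragment. Any "/" and "?" characters inside either component are plain data. The sketch below is illustrative only, and `split_components` is a hypothetical name.

```python
import re

# Component-splitting regular expression from RFC 2396, Appendix B.
# Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_components(ref: str) -> dict:
    """Split a URI reference into its five components (None if absent)."""
    m = URI_RE.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# "/" and "?" after the first "?" are query data; the first "#" ends the query.
print(split_components("http://a/b?x=/y?z#s/../t"))
```

For that input the query is "x=/y?z" and the fragment is "s/../t". An absent component comes back as None, while a present but empty one comes back as "". This keeps "http://a/b" distinguishable from "http://a/b?".

| < draft-fielding-uri-rfc2396bis-05.txt | draft-fielding-uri-rfc2396bis-06.txt > | |||
|---|---|---|---|---|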
| 4. Usage | 4. Usage | |||
| When applications make reference to a URI, they do not always use the | When applications make reference to a URI, they do not always use the | |||
| full form of reference defined by the "URI" syntax rule. In order to | full form of reference defined by the "URI" syntax rule. In order to | |||
| save space and take advantage of hierarchical locality, many Internet | save space and take advantage of hierarchical locality, many Internet | |||
| protocol elements and media type formats allow an abbreviation of a | protocol elements and media type formats allow an abbreviation of a | |||
| URI, while others restrict the syntax to a particular form of URI. | URI, while others restrict the syntax to a particular form of URI. | |||
| We define the most common forms of reference syntax in this | We define the most common forms of reference syntax in this | |||
| specification because they impact and depend upon the design of the | specification because they impact and depend upon the design of the | |||
| generic syntax, requiring a uniform parsing algorithm in order to be | generic syntax, requiring a uniform parsing algorithm in order to be | |||
| interpreted consistently. | interpreted consistently. | |||
| 4.1 URI Reference | 4.1 URI Reference | |||
| skipping to change at page 25, line 14 ¶ | skipping to change at page 26, line 14 ¶ | |||
| 4.2 Relative URI | 4.2 Relative URI | |||
| A relative URI reference takes advantage of the hierarchical syntax | A relative URI reference takes advantage of the hierarchical syntax | |||
| (Section 1.2.3) in order to express a reference that is relative to | (Section 1.2.3) in order to express a reference that is relative to | |||
| the name space of another hierarchical URI. | the name space of another hierarchical URI. | |||
| relative-URI = relative-part [ "?" query ] [ "#" fragment ] | relative-URI = relative-part [ "?" query ] [ "#" fragment ] | |||
| relative-part = "//" authority path-abempty | relative-part = "//" authority path-abempty | |||
| / path-abs | / path-absolute | |||
| / path-noscheme | / path-noscheme | |||
| / path-empty | / path-empty | |||
| The URI referred to by a relative reference, also known as the target | The URI referred to by a relative reference, also known as the target | |||
| URI, is obtained by applying the reference resolution algorithm of | URI, is obtained by applying the reference resolution algorithm of | |||
| Section 5. | Section 5. | |||
| A relative reference that begins with two slash characters is termed | A relative reference that begins with two slash characters is termed | |||
| a network-path reference; such references are rarely used. A relative | a network-path reference; such references are rarely used. A | |||
| reference that begins with a single slash character is termed an | relative reference that begins with a single slash character is | |||
| absolute-path reference. A relative reference that does not begin | termed an absolute-path reference. A relative reference that does | |||
| with a slash character is termed a relative-path reference. | not begin with a slash character is termed a relative-path reference. | |||
| A path segment that contains a colon character (e.g., "this:that") | A path segment that contains a colon character (e.g., "this:that") | |||
| cannot be used as the first segment of a relative-path reference | cannot be used as the first segment of a relative-path reference | |||
| because it would be mistaken for a scheme name. Such a segment must | because it would be mistaken for a scheme name. Such a segment must | |||
| be preceded by a dot-segment (e.g., "./this:that") to make a | be preceded by a dot-segment (e.g., "./this:that") to make a | |||
| relative-path reference. | relative-path reference. | |||
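
The "./" rule above can be sketched with a hypothetical helper, `safe_relative_path`. The helper inspects only the first segment. References that begin with "/" are unaffected because their first segment is empty.

```python
def safe_relative_path(path: str) -> str:
    """Prefix "./" when the first segment of a relative-path reference has a colon.

    Hypothetical helper: a leading segment such as "this:that" would otherwise
    be parsed as the scheme "this".
    """
    first_segment = path.split("/", 1)[0]
    if ":" in first_segment:
        return "./" + path
    return path


print(safe_relative_path("this:that"))  # ./this:that
print(safe_relative_path("a/b:c"))      # a/b:c (colon is not in the first segment)
```

| < draft-fielding-uri-rfc2396bis-05.txt | draft-fielding-uri-rfc2396bis-06.txt > | |||
|---|---|---|---|---|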
| 4.3 Absolute URI | 4.3 Absolute URI | |||
| Some protocol elements allow only the absolute form of a URI without | Some protocol elements allow only the absolute form of a URI without | |||
| skipping to change at page 26, line 15 ¶ | skipping to change at page 27, line 12 ¶ | |||
| When a same-document reference is dereferenced for the purpose of a | When a same-document reference is dereferenced for the purpose of a | |||
| retrieval action, the target of that reference is defined to be | retrieval action, the target of that reference is defined to be | |||
| within the same entity (representation, document, or message) as the | within the same entity (representation, document, or message) as the | |||
| reference; therefore, a dereference should not result in a new | reference; therefore, a dereference should not result in a new | |||
| retrieval action. | retrieval action. | |||
| Normalization of the base and target URIs prior to their comparison, | Normalization of the base and target URIs prior to their comparison, | |||
| as described in Section 6.2.2 and Section 6.2.3, is allowed but | as described in Section 6.2.2 and Section 6.2.3, is allowed but | |||
| rarely performed in practice. Normalization may increase the set of | rarely performed in practice. Normalization may increase the set of | |||
| same-document references, which may be of benefit to some caching | same-document references, which may be of benefit to some caching | |||
| applications. As such, reference authors should not assume that a | applications. As such, reference authors should not assume that a | |||
| slightly different, though equivalent, reference URI will (or will | slightly different, though equivalent, reference URI will (or will | |||
| not) be interpreted as a same-document reference by any given | not) be interpreted as a same-document reference by any given | |||
| application. | application. | |||
| 4.5 Suffix Reference | 4.5 Suffix Reference | |||
| The URI syntax is designed for unambiguous reference to resources and | The URI syntax is designed for unambiguous reference to resources and | |||
| extensibility via the URI scheme. However, as URI identification and | extensibility via the URI scheme. However, as URI identification and | |||
| usage have become commonplace, traditional media (television, radio, | usage have become commonplace, traditional media (television, radio, | |||
| newspapers, billboards, etc.) have increasingly used a suffix of the | newspapers, billboards, etc.) have increasingly used a suffix of the | |||
| skipping to change at page 27, line 17 ¶ | skipping to change at page 28, line 17 ¶ | |||
| This section defines the process of resolving a URI reference within | This section defines the process of resolving a URI reference within | |||
| a context that allows relative references, such that the result is a | a context that allows relative references, such that the result is a | |||
| string matching the "URI" syntax rule of Section 3. | string matching the "URI" syntax rule of Section 3. | |||
| 5.1 Establishing a Base URI | 5.1 Establishing a Base URI | |||
| The term "relative" implies that there exists a "base URI" against | The term "relative" implies that there exists a "base URI" against | |||
| which the relative reference is applied. Aside from fragment-only | which the relative reference is applied. Aside from fragment-only | |||
| references (Section 4.4), relative references are only usable when a | references (Section 4.4), relative references are only usable when a | |||
| base URI is known. A base URI must be established by the parser | base URI is known. A base URI must be established by the parser | |||
| prior to parsing URI references that might be relative. | prior to parsing URI references that might be relative. A base URI | |||
| must conform to the <absolute-URI> syntax rule (Section 4.3): if the | ||||
| base URI is obtained from a URI reference, then that reference must | ||||
| be converted to absolute form and stripped of any fragment component | ||||
| prior to use as a base URI. | ||||
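
The -06 requirement that a base URI match the <absolute-URI> rule can be sketched as follows. The sketch strips any fragment and rejects strings that lack a valid scheme. The helper name `to_base_uri` is hypothetical. The scheme check follows the scheme rule: a letter followed by letters, digits, "+", "-", or ".". Converting a relative reference to absolute form first is not shown.

```python
import re

SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def to_base_uri(uri: str) -> str:
    """Return `uri` without its fragment, rejecting anything lacking a scheme.

    Hypothetical helper; it does not resolve relative references, it only
    refuses them.
    """
    without_fragment = uri.split("#", 1)[0]
    scheme, sep, _ = without_fragment.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"not an absolute URI: {uri!r}")
    return without_fragment


print(to_base_uri("http://a/b/c/d;p?q#frag"))  # http://a/b/c/d;p?q
```

| < draft-fielding-uri-rfc2396bis-05.txt | draft-fielding-uri-rfc2396bis-06.txt > | |||
|---|---|---|---|---|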
| The base URI of a reference can be established in one of four ways, | The base URI of a reference can be established in one of four ways, | |||
| discussed below in order of precedence. The order of precedence can | discussed below in order of precedence. The order of precedence can | |||
| be thought of in terms of layers, where the innermost defined base | be thought of in terms of layers, where the innermost defined base | |||
| URI has the highest precedence. This can be visualized graphically | URI has the highest precedence. This can be visualized graphically | |||
| as: | as: | |||
| .----------------------------------------------------------. | .----------------------------------------------------------. | |||
| | .----------------------------------------------------. | | | .----------------------------------------------------. | | |||
| | | .----------------------------------------------. | | | | | .----------------------------------------------. | | | |||
| skipping to change at page 28, line 30 ¶ | skipping to change at page 29, line 32 ¶ | |||
| (e.g., the message and multipart types) is defined by MHTML | (e.g., the message and multipart types) is defined by MHTML | |||
| [RFC2557]. Protocols that do not use the MIME message header syntax, | [RFC2557]. Protocols that do not use the MIME message header syntax, | |||
| but do allow some form of tagged metadata to be included within | but do allow some form of tagged metadata to be included within | |||
| messages, may define their own syntax for defining a base URI as part | messages, may define their own syntax for defining a base URI as part | |||
| of a message. | of a message. | |||
| 5.1.3 Base URI from the Retrieval URI | 5.1.3 Base URI from the Retrieval URI | |||
| If no base URI is embedded and the representation is not encapsulated | If no base URI is embedded and the representation is not encapsulated | |||
| within some other entity, then, if a URI was used to retrieve the | within some other entity, then, if a URI was used to retrieve the | |||
| representation, that URI shall be considered the base URI. Note that | representation, that URI shall be considered the base URI. Note that | |||
| if the retrieval was the result of a redirected request, the last URI | if the retrieval was the result of a redirected request, the last URI | |||
| used (i.e., the URI that resulted in the actual retrieval of the | used (i.e., the URI that resulted in the actual retrieval of the | |||
| representation) is the base URI. | representation) is the base URI. | |||
| 5.1.4 Default Base URI | 5.1.4 Default Base URI | |||
| If none of the conditions described above apply, then the base URI is | If none of the conditions described above apply, then the base URI is | |||
| defined by the context of the application. Since this definition is | defined by the context of the application. Since this definition is | |||
| necessarily application-dependent, failing to define a base URI using | necessarily application-dependent, failing to define a base URI using | |||
| one of the other methods may result in the same content being | one of the other methods may result in the same content being | |||
| interpreted differently by different types of application. | interpreted differently by different types of application. | |||
| A sender of a representation containing relative references is | A sender of a representation containing relative references is | |||
| responsible for ensuring that a base URI for those references can be | responsible for ensuring that a base URI for those references can be | |||
| established. Aside from fragment-only references, relative references | established. Aside from fragment-only references, relative | |||
| can only be used reliably in situations where the base URI is | references can only be used reliably in situations where the base URI | |||
| well-defined. | is well-defined. | |||
| 5.2 Relative Resolution | 5.2 Relative Resolution | |||
| This section describes an algorithm for converting a URI reference | This section describes an algorithm for converting a URI reference | |||
| that might be relative to a given base URI into the parsed components | that might be relative to a given base URI into the parsed components | |||
| of the reference's target. The components can then be recomposed, as | of the reference's target. The components can then be recomposed, as | |||
| described in Section 5.3, to form the target URI. This algorithm | described in Section 5.3, to form the target URI. This algorithm | |||
| provides definitive results that can be used to test the output of | provides definitive results that can be used to test the output of | |||
| other implementations. Applications may implement relative reference | other implementations. Applications may implement relative reference | |||
| resolution using some other algorithm, provided that the results | resolution using some other algorithm, provided that the results | |||
| match what would be given by this algorithm. | match what would be given by this algorithm. | |||
| 5.2.1 Pre-parse the Base URI | 5.2.1 Pre-parse the Base URI | |||
| The base URI (Base) is established according to the procedure of | The base URI (Base) is established according to the procedure of | |||
| Section 5.1 and parsed into the five main components described in | Section 5.1 and parsed into the five main components described in | |||
| Section 3. Note that only the scheme component is required to be | Section 3. Note that only the scheme component is required to be | |||
| skipping to change at page 31, line 22 ¶ | skipping to change at page 32, line 22 ¶ | |||
| forming the target URI. Although there are many ways to accomplish | forming the target URI. Although there are many ways to accomplish | |||
| this removal process, we describe a simple method using two string | this removal process, we describe a simple method using two string | |||
| buffers. | buffers. | |||
| 1. The input buffer is initialized with the now-appended path | 1. The input buffer is initialized with the now-appended path | |||
| components and the output buffer is initialized to the empty | components and the output buffer is initialized to the empty | |||
| string. | string. | |||
| 2. While the input buffer is not empty, loop: | 2. While the input buffer is not empty, loop: | |||
| a. If the input buffer begins with a prefix of "../" or "./", | A. If the input buffer begins with a prefix of "../" or "./", | |||
| then remove that prefix from the input buffer; otherwise, | then remove that prefix from the input buffer; otherwise, | |||
| b. If the input buffer begins with a prefix of "/./" or "/.", | B. If the input buffer begins with a prefix of "/./" or "/.", | |||
| where "." is a complete path segment, then replace that | where "." is a complete path segment, then replace that | |||
| prefix with "/" in the input buffer; otherwise, | prefix with "/" in the input buffer; otherwise, | |||
| c. If the input buffer begins with a prefix of "/../" or "/..", | C. If the input buffer begins with a prefix of "/../" or "/..", | |||
| where ".." is a complete path segment, then replace that | where ".." is a complete path segment, then replace that | |||
| prefix with "/" in the input buffer and remove the last | prefix with "/" in the input buffer and remove the last | |||
| segment and its preceding "/" (if any) from the output | segment and its preceding "/" (if any) from the output | |||
| buffer; otherwise, | buffer; otherwise, | |||
| d. If the input buffer consists only of "." or "..", then remove | D. If the input buffer consists only of "." or "..", then remove | |||
| that from the input buffer; otherwise, | that from the input buffer; otherwise, | |||
| e. Move the first path segment in the input buffer to the end of | E. Move the first path segment in the input buffer to the end of | |||
| the output buffer, including the initial "/" character (if | the output buffer, including the initial "/" character (if | |||
| any) and any subsequent characters up to, but not including, | any) and any subsequent characters up to, but not including, | |||
| the next "/" character or the end of the input buffer. | the next "/" character or the end of the input buffer. | |||
| 3. Finally, the output buffer is returned as the result of | 3. Finally, the output buffer is returned as the result of | |||
| remove_dot_segments. | remove_dot_segments. | |||
| Note that dot-segments are intended for use in URI references to | Note that dot-segments are intended for use in URI references to | |||
| express an identifier relative to the hierarchy of names in the base | express an identifier relative to the hierarchy of names in the base | |||
| URI. The remove_dot_segments algorithm respects that hierarchy by | URI. The remove_dot_segments algorithm respects that hierarchy by | |||
| removing extra dot-segments rather than treating them as an error or | removing extra dot-segments rather than treating them as an error or | |||
| leaving them to be misinterpreted by dereference implementations. | leaving them to be misinterpreted by dereference implementations. | |||
| The following illustrates how the above steps are applied for two | The following illustrates how the above steps are applied for two | |||
| example merged paths, showing the state of the two buffers after each | example merged paths, showing the state of the two buffers after each | |||
| step. | step. | |||
| STEP OUTPUT BUFFER INPUT BUFFER | STEP OUTPUT BUFFER INPUT BUFFER | |||
| 1 : /a/b/c/./../../g | 1 : /a/b/c/./../../g | |||
| 2e: /a /b/c/./../../g | 2E: /a /b/c/./../../g | |||
| 2e: /a/b /c/./../../g | 2E: /a/b /c/./../../g | |||
| 2e: /a/b/c /./../../g | 2E: /a/b/c /./../../g | |||
| 2b: /a/b/c /../../g | 2B: /a/b/c /../../g | |||
| 2c: /a/b /../g | 2C: /a/b /../g | |||
| 2c: /a /g | 2C: /a /g | |||
| 2e: /a/g | 2E: /a/g | |||
| STEP OUTPUT BUFFER INPUT BUFFER | STEP OUTPUT BUFFER INPUT BUFFER | |||
| 1 : mid/content=5/../6 | 1 : mid/content=5/../6 | |||
| 2e: mid /content=5/../6 | 2E: mid /content=5/../6 | |||
| 2e: mid/content=5 /../6 | 2E: mid/content=5 /../6 | |||
| 2c: mid /6 | 2C: mid /6 | |||
| 2e: mid/6 | 2E: mid/6 | |||
| Some applications may find it more efficient to implement the | Some applications may find it more efficient to implement the | |||
| remove_dot_segments algorithm using two segment stacks rather than | remove_dot_segments algorithm using two segment stacks rather than | |||
| strings. | strings. | |||
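One possible shape for the two-stack variant is sketched below. The input stack holds the segments still to be processed, and the output stack holds the segments kept so far. This sketch accepts only absolute paths (beginning with "/"). For a relative input such as "a/../b", the string algorithm yields "/b", and this version makes no attempt to reproduce that corner case.

```python
from typing import List


def remove_dot_segments_stack(path: str) -> str:
    """Dot-segment removal with an input and an output segment stack (sketch)."""
    if not path.startswith("/"):
        raise ValueError("this sketch handles absolute paths only")
    # Input stack: segments after the leading "/", reversed so that pop()
    # yields them left to right.
    pending: List[str] = path.split("/")[1:][::-1]
    kept: List[str] = []                   # output stack
    while pending:
        seg = pending.pop()
        if seg not in (".", ".."):
            kept.append(seg)
            continue
        if seg == ".." and kept:
            kept.pop()                     # ".." cancels the previous segment
        if not pending:
            kept.append("")                # a final "." or ".." leaves a trailing "/"
    return "/" + "/".join(kept)
```

Splitting once and working on lists avoids repeated string slicing, which is presumably where the efficiency gain mentioned above comes from.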
| Note: Beware that some older, erroneous implementations will fail | Note: Beware that some older, erroneous implementations will fail | |||
| to separate a reference's query component from its path component | to separate a reference's query component from its path component | |||
| prior to merging the base and reference paths, resulting in an | prior to merging the base and reference paths, resulting in an | |||
| interoperability failure if the query component contains the | interoperability failure if the query component contains the | |||
| strings "/../" or "/./". | strings "/../" or "/./". | |||
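The failure described in the note can be shown with a small, hypothetical resolver. It assumes a base path of "/b/c/d;p" and a reference "g?y/../x". The sketch handles only the path and query, assumes a base with an authority and a non-empty path, and ignores fragments. The helper names are invented for illustration.

```python
def remove_dots(path: str) -> str:
    """Compact dot-segment removal for absolute paths (sketch)."""
    segs, out = path.split("/")[1:], []
    for i, seg in enumerate(segs):
        if seg not in (".", ".."):
            out.append(seg)
            continue
        if seg == ".." and out:
            out.pop()
        if i == len(segs) - 1:
            out.append("")
    return "/" + "/".join(out)


def merge(base_path: str, ref_path: str) -> str:
    """Append the reference path to the base path minus its last segment."""
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve_path_and_query(base_path: str, ref: str) -> str:
    path, sep, query = ref.partition("?")   # separate the query first
    return remove_dots(merge(base_path, path)) + sep + query


def resolve_erroneously(base_path: str, ref: str) -> str:
    # Older behavior: the query is merged and dot-processed as if it were path.
    return remove_dots(merge(base_path, ref))


print(resolve_path_and_query("/b/c/d;p", "g?y/../x"))   # /b/c/g?y/../x
print(resolve_erroneously("/b/c/d;p", "g?y/../x"))      # /b/c/x
```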
| skipping to change at page 35, line 42 ¶ | skipping to change at page 36, line 38 ¶ | |||
| relative references. | relative references. | |||
| "g?y/./x" = "http://a/b/c/g?y/./x" | "g?y/./x" = "http://a/b/c/g?y/./x" | |||
| "g?y/../x" = "http://a/b/c/g?y/../x" | "g?y/../x" = "http://a/b/c/g?y/../x" | |||
| "g#s/./x" = "http://a/b/c/g#s/./x" | "g#s/./x" = "http://a/b/c/g#s/./x" | |||
| "g#s/../x" = "http://a/b/c/g#s/../x" | "g#s/../x" = "http://a/b/c/g#s/../x" | |||
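Python's `urllib.parse.urljoin` (as of Python 3.5) separates the query and fragment before merging paths, so it reproduces these four results. The base URI "http://a/b/c/d;p?q" is an assumption here, because the base used by this example set is not shown in this excerpt.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"  # assumed base URI for the example set

EXAMPLES = {
    "g?y/./x": "http://a/b/c/g?y/./x",
    "g?y/../x": "http://a/b/c/g?y/../x",
    "g#s/./x": "http://a/b/c/g#s/./x",
    "g#s/../x": "http://a/b/c/g#s/../x",
}

for ref, expected in EXAMPLES.items():
    resolved = urljoin(BASE, ref)
    # Dot-segments inside the query or fragment are left untouched.
    print(f"{ref!r:12} -> {resolved}")
    assert resolved == expected
```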
| Some parsers allow the scheme name to be present in a relative URI | Some parsers allow the scheme name to be present in a relative URI | |||
| reference if it is the same as the base URI scheme. This is | reference if it is the same as the base URI scheme. This is | |||
| considered to be a loophole in prior specifications of partial URI | considered to be a loophole in prior specifications of partial URI | |||
| [RFC1630]. Its use should be avoided, but is allowed for backward | [RFC1630]. Its use should be avoided, but is allowed for backward | |||
| compatibility. | compatibility. | |||
| "http:g" = "http:g" ; for strict parsers | "http:g" = "http:g" ; for strict parsers | |||
| / "http://a/b/c/g" ; for backward compatibility | / "http://a/b/c/g" ; for backward compatibility | |||
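The choice between the two readings amounts to a single flag. In the sketch below, the `strict` parameter and the `resolve` wrapper are hypothetical. In non-strict mode, a reference scheme that matches the base scheme is dropped before ordinary relative resolution. Note that `urljoin` on its own already takes the backward-compatible route for "http:g", which is why the strict branch returns before calling it.

```python
import re
from urllib.parse import urljoin, urlsplit

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def resolve(base: str, ref: str, strict: bool = True) -> str:
    m = SCHEME_RE.match(ref)
    if m:
        same_scheme = m.group(1).lower() == urlsplit(base).scheme
        if strict or not same_scheme:
            # A reference with its own scheme is taken as-is (this sketch
            # skips the dot-segment removal that full resolution applies).
            return ref
        ref = ref[m.end():]   # the loophole: drop the matching scheme
    return urljoin(base, ref)


print(resolve("http://a/b/c/d;p?q", "http:g"))                # http:g
print(resolve("http://a/b/c/d;p?q", "http:g", strict=False))  # http://a/b/c/g
```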
| 6. Normalization and Comparison | 6. Normalization and Comparison | |||
| One of the most common operations on URIs is simple comparison: | One of the most common operations on URIs is simple comparison: | |||
| determining if two URIs are equivalent without using the URIs to | determining if two URIs are equivalent without using the URIs to | |||
| access their respective resource(s). A comparison is performed every | access their respective resource(s). A comparison is performed every | |||
| skipping to change at page 36, line 32 ¶ | skipping to change at page 37, line 23 ¶ | |||
| canonical form for URI references is defined to reduce the occurrence | canonical form for URI references is defined to reduce the occurrence | |||
| of false negative comparisons. | of false negative comparisons. | |||
| 6.1 Equivalence | 6.1 Equivalence | |||
| Since URIs exist to identify resources, presumably they should be | Since URIs exist to identify resources, presumably they should be | |||
| considered equivalent when they identify the same resource. However, | considered equivalent when they identify the same resource. However, | |||
| such a definition of equivalence is not of much practical use, since | such a definition of equivalence is not of much practical use, since | |||
| there is no way for software to compare two resources without | there is no way for software to compare two resources without | |||
| knowledge of the implementation-specific syntax of each URI's | knowledge of the implementation-specific syntax of each URI's | |||
| dereferencing algorithm. For this reason, determination of | dereferencing algorithm. For this reason, determination of | |||
| equivalence or difference of URIs is based on string comparison, | equivalence or difference of URIs is based on string comparison, | |||
| perhaps augmented by reference to additional rules provided by URI | perhaps augmented by reference to additional rules provided by URI | |||
| scheme definitions. We use the terms "different" and "equivalent" to | scheme definitions. We use the terms "different" and "equivalent" to | |||
| describe the possible outcomes of such comparisons, but there are | describe the possible outcomes of such comparisons, but there are | |||
| many application-dependent versions of equivalence. | many application-dependent versions of equivalence. | |||
| Even though it is possible to determine that two URIs are equivalent, | Even though it is possible to determine that two URIs are equivalent, | |||
| it is never possible to be sure that two URIs identify different | it is never possible to be sure that two URIs identify different | |||
| resources. For example, an owner of two different domain names could | resources. For example, an owner of two different domain names could | |||
| decide to serve the same resource from both, resulting in two | decide to serve the same resource from both, resulting in two | |||
| different URIs. Therefore, comparison methods are designed to | different URIs. Therefore, comparison methods are designed to | |||
| minimize false negatives while strictly avoiding false positives. | minimize false negatives while strictly avoiding false positives. | |||
| In testing for equivalence, applications should not directly compare | In testing for equivalence, applications should not directly compare | |||
| relative URI references; the references should be converted to their | relative URI references; the references should be converted to their | |||
| target URI forms before comparison. When URIs are being compared for | target URI forms before comparison. When URIs are being compared for | |||
| the purpose of selecting (or avoiding) a network action, such as | the purpose of selecting (or avoiding) a network action, such as | |||
| retrieval of a representation, the fragment components (if any) | retrieval of a representation, the fragment components (if any) | |||
| should be excluded from the comparison. | should be excluded from the comparison. | |||
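A minimal sketch of this rule, using Python's `urljoin` and `urldefrag`. Both references are first resolved to target URIs, and then fragments are dropped before comparing. The base URI and references are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin


def same_retrieval_target(base: str, ref_a: str, ref_b: str) -> bool:
    """Compare two references for a network action such as retrieval."""
    a = urldefrag(urljoin(base, ref_a)).url   # resolve, then drop the fragment
    b = urldefrag(urljoin(base, ref_b)).url
    return a == b


base = "http://example.com/dir/page"
print(same_retrieval_target(base, "other#top", "/dir/other"))  # True
print(same_retrieval_target(base, "other", "../other"))        # False
```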
| skipping to change at page 37, line 28 ¶ | skipping to change at page 38, line 19 ¶ | |||
| producing false negatives, and proceeding to those that have higher | producing false negatives, and proceeding to those that have higher | |||
| computational cost and lower risk of false negatives. | computational cost and lower risk of false negatives. | |||
| 6.2.1 Simple String Comparison | 6.2.1 Simple String Comparison | |||
| If two URIs, considered as character strings, are identical, then it | If two URIs, considered as character strings, are identical, then it | |||
| is safe to conclude that they are equivalent. This type of | is safe to conclude that they are equivalent. This type of | |||
| equivalence test has very low computational cost and is in wide use | equivalence test has very low computational cost and is in wide use | |||
| in a variety of applications, particularly in the domain of parsing. | in a variety of applications, particularly in the domain of parsing. | |||
| Testing strings for equivalence requires some basic precautions. This | Testing strings for equivalence requires some basic precautions. | |||
| procedure is often referred to as "bit-for-bit" or "byte-for-byte" | This procedure is often referred to as "bit-for-bit" or | |||
| comparison, which is potentially misleading. Testing of strings for | "byte-for-byte" comparison, which is potentially misleading. Testing | |||
| equality is normally based on pairwise comparison of the characters | of strings for equality is normally based on pairwise comparison of | |||
| that make up the strings, starting from the first and proceeding | the characters that make up the strings, starting from the first and | |||
| until both strings are exhausted and all characters found to be | proceeding until both strings are exhausted and all characters found | |||
| equal, a pair of characters compares unequal, or one of the strings | to be equal, a pair of characters compares unequal, or one of the | |||
| is exhausted before the other. | strings is exhausted before the other. | |||
| Such character comparisons require that each pair of characters be | Such character comparisons require that each pair of characters be | |||
| put in comparable form. For example, should one URI be stored in a | put in comparable form. For example, should one URI be stored in a | |||
| byte array in EBCDIC encoding, and the second be in a Java String | byte array in EBCDIC encoding, and the second be in a Java String | |||
| object (UTF-16), bit-for-bit comparisons applied naively will produce | object (UTF-16), bit-for-bit comparisons applied naively will produce | |||
| errors. It is better to speak of equality on a | errors. It is better to speak of equality on a | |||
| character-for-character rather than byte-for-byte or bit-for-bit | character-for-character rather than byte-for-byte or bit-for-bit | |||
| basis. In practical terms, character-by-character comparisons should | basis. In practical terms, character-by-character comparisons should | |||
| be done codepoint-by-codepoint after conversion to a common character | be done codepoint-by-codepoint after conversion to a common character | |||
| encoding. | encoding. | |||
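The EBCDIC/UTF-16 example can be demonstrated in Python. Here, code page cp037 stands in for "EBCDIC" (it is one of several EBCDIC code pages Python ships, and which one a real system uses is an assumption), and big-endian UTF-16 bytes stand in for a Java String's code units.

```python
def same_characters(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Compare two stored URIs codepoint-by-codepoint after decoding both."""
    return a.decode(a_encoding) == b.decode(b_encoding)


uri = "http://example.com/"
ebcdic = uri.encode("cp037")        # bytes as an EBCDIC system might hold them
utf16 = uri.encode("utf-16-be")     # UTF-16 code units, as in a Java String

print(ebcdic == utf16)                                        # False
print(same_characters(ebcdic, "cp037", utf16, "utf-16-be"))   # True
```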
| 6.2.2 Syntax-based Normalization | 6.2.2 Syntax-based Normalization | |||
| Software may use logic based on the definitions provided by this | Software may use logic based on the definitions provided by this | |||
| specification to reduce the probability of false negatives. Such | specification to reduce the probability of false negatives. Such | |||
| processing is moderately higher in cost than character-for-character | processing is moderately higher in cost than character-for-character | |||
| string comparison. For example, an application using this approach | string comparison. For example, an application using this approach | |||
| could reasonably consider the following two URIs equivalent: | could reasonably consider the following two URIs equivalent: | |||
| example://a/b/c/%7Bfoo%7D | example://a/b/c/%7Bfoo%7D | |||
| eXAMPLE://a/./b/../b/%63/%7bfoo%7d | eXAMPLE://a/./b/../b/%63/%7bfoo%7d | |||
| Web user agents, such as browsers, typically apply this type of URI | Web user agents, such as browsers, typically apply this type of URI | |||
| normalization when determining whether a cached response is | normalization when determining whether a cached response is | |||
| available. Syntax-based normalization includes such techniques as | available. Syntax-based normalization includes such techniques as | |||
| case normalization, percent-encoding normalization, and removal of | case normalization, percent-encoding normalization, and removal of | |||
| dot-segments. | dot-segments. | |||
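To show how the three techniques combine, the following sketch normalizes both example URIs to the same string. It assumes the unreserved set is ALPHA, DIGIT, "-", ".", "_" and "~". It relies on `urllib.parse.urlsplit` to lowercase the scheme, lowercases the host, and applies dot-segment removal only to absolute paths. This is an illustration, not a complete normalizer.

```python
import re
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(s: str) -> str:
    """Decode unreserved octets; uppercase the hex digits of the rest."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return TRIPLET.sub(fix, s)


def remove_dots(path: str) -> str:
    """Compact dot-segment removal; non-absolute paths are left as-is here."""
    if not path.startswith("/"):
        return path
    segs, out = path.split("/")[1:], []
    for i, seg in enumerate(segs):
        if seg not in (".", ".."):
            out.append(seg)
            continue
        if seg == ".." and out:
            out.pop()
        if i == len(segs) - 1:
            out.append("")
    return "/" + "/".join(out)


def normalize(uri: str) -> str:
    parts = urlsplit(uri)                            # scheme comes back lowercased
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = normalize_pct(userinfo) + at + normalize_pct(hostport.lower())
    path = remove_dots(normalize_pct(parts.path))    # decode first, then dot-segments
    return urlunsplit((parts.scheme, netloc, path,
                       normalize_pct(parts.query), normalize_pct(parts.fragment)))


print(normalize("example://a/b/c/%7Bfoo%7D"))
print(normalize("eXAMPLE://a/./b/../b/%63/%7bfoo%7d"))
```

Both calls print "example://a/b/c/%7Bfoo%7D".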
| 6.2.2.1 Case Normalization | 6.2.2.1 Case Normalization | |||
| When a URI scheme uses components of the generic syntax, it will also | When a URI scheme uses components of the generic syntax, it will also | |||
| use the common syntax equivalence rules, namely that the scheme and | use the common syntax equivalence rules, namely that the scheme and | |||
| host are case-insensitive and therefore should be normalized to | host are case-insensitive and therefore should be normalized to | |||
| lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is | lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is | |||
| equivalent to <http://www.example.com/>. Applications should not | equivalent to <http://www.example.com/>. Applications should not | |||
| skipping to change at page 38, line 32 ¶ | skipping to change at page 39, line 25 ¶ | |||
| since that is dependent on the implementation used to handle a | since that is dependent on the implementation used to handle a | |||
| dereference. | dereference. | |||
| The hexadecimal digits within a percent-encoding triplet (e.g., "%3a" | The hexadecimal digits within a percent-encoding triplet (e.g., "%3a" | |||
| versus "%3A") are case-insensitive and therefore should be normalized | versus "%3A") are case-insensitive and therefore should be normalized | |||
| to use uppercase letters for the digits A-F. | to use uppercase letters for the digits A-F. | |||
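Case normalization alone might be sketched as follows. The scheme and host are lowercased, and percent-encoding hex digits are uppercased everywhere. This sketch treats the remaining components as case-sensitive and leaves their other characters untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit

HEX_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def upper_pct(s: str) -> str:
    """Normalize the hex digits of every percent-encoding triplet to uppercase."""
    return HEX_TRIPLET.sub(lambda m: m.group(0).upper(), s)


def case_normalize(uri: str) -> str:
    p = urlsplit(uri)
    userinfo, at, hostport = p.netloc.rpartition("@")
    netloc = upper_pct(userinfo) + at + upper_pct(hostport.lower())  # host: case-insensitive
    return urlunsplit((p.scheme.lower(), netloc, upper_pct(p.path),
                       upper_pct(p.query), upper_pct(p.fragment)))


print(case_normalize("HTTP://www.EXAMPLE.com/"))   # http://www.example.com/
```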
| 6.2.2.2 Percent-Encoding Normalization | 6.2.2.2 Percent-Encoding Normalization | |||
| The percent-encoding mechanism (Section 2.1) is a frequent source of | The percent-encoding mechanism (Section 2.1) is a frequent source of | |||
| variance among otherwise identical URIs. In addition to the | variance among otherwise identical URIs. In addition to the | |||
| case-insensitivity issue noted above, some URI producers | case-insensitivity issue noted above, some URI producers | |||
| percent-encode octets that do not require percent-encoding, resulting | percent-encode octets that do not require percent-encoding, resulting | |||
| in URIs that are equivalent to their non-encoded counterparts. Such | in URIs that are equivalent to their non-encoded counterparts. Such | |||
| URIs should be normalized by decoding any percent-encoded octet that | URIs should be normalized by decoding any percent-encoded octet that | |||
| corresponds to an unreserved character, as described in Section 2.3. | corresponds to an unreserved character, as described in Section 2.3. | |||
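A minimal sketch of this decoding step. It assumes the unreserved set of Section 2.3 is ALPHA, DIGIT, "-", ".", "_" and "~". Any other triplet keeps its percent-encoding, with its hex digits uppercased as described in the previous section.

```python
import re

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(s: str) -> str:
    def fix(m):
        ch = chr(int(m.group(1), 16))
        # Only octets that map to unreserved characters are decoded.
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return TRIPLET.sub(fix, s)


print(decode_unreserved("/%7Euser/%41%2f"))   # /~user/A%2F
```

Decoding "%2f" would turn data into a path delimiter, which is why reserved characters keep their encoding.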
| 6.2.2.3 Path Segment Normalization | 6.2.2.3 Path Segment Normalization | |||
| The complete path segments "." and ".." have a special meaning within | The complete path segments "." and ".." have a special meaning within | |||
| hierarchical URI schemes. As such, they should not appear in | hierarchical URI schemes. As such, they should not appear in | |||
| absolute paths; if they are found, they can be removed by applying | absolute paths; if they are found, they can be removed by applying | |||
| the remove_dot_segments algorithm to the path, as described in | the remove_dot_segments algorithm to the path, as described in | |||
| Section 5.2. | Section 5.2. | |||
| skipping to change at page 39, line 17 ¶ | skipping to change at page 40, line 14 ¶ | |||
| http://example.com | http://example.com | |||
| http://example.com/ | http://example.com/ | |||
| http://example.com:/ | http://example.com:/ | |||
| http://example.com:80/ | http://example.com:80/ | |||
| In general, a URI that uses the generic syntax for authority with an | In general, a URI that uses the generic syntax for authority with an | |||
| empty path should be normalized to a path of "/"; likewise, an | empty path should be normalized to a path of "/"; likewise, an | |||
| explicit ":port", where the port is empty or the default for the | explicit ":port", where the port is empty or the default for the | |||
| scheme, is equivalent to one where the port and its ":" delimiter are | scheme, is equivalent to one where the port and its ":" delimiter are | |||
| elided. In other words, the second of the above URI examples is the | elided. In other words, the second of the above URI examples is the | |||
| normal form for the "http" scheme. | normal form for the "http" scheme. | |||
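For "http", this normalization can be sketched as follows: an empty path becomes "/", and a ":" followed by an empty or default port is elided. The `DEFAULT_PORTS` table is a hypothetical stand-in for the per-scheme knowledge a real implementation would need.

```python
import re
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": "80"}                   # hypothetical, for illustration
HOST_PORT = re.compile(r"^(.*?)(?::(\d*))?$")    # host, then optional ":" digits


def scheme_normalize(uri: str) -> str:
    p = urlsplit(uri)
    if not p.netloc:
        return uri                               # no authority: nothing to do here
    userinfo, at, hostport = p.netloc.rpartition("@")
    host, port = HOST_PORT.match(hostport).groups()
    if port is not None and port in ("", DEFAULT_PORTS.get(p.scheme)):
        hostport = host                          # elide ":" and an empty or default port
    path = p.path or "/"                         # empty path with an authority -> "/"
    return urlunsplit((p.scheme, userinfo + at + hostport, path, p.query, p.fragment))


for u in ("http://example.com", "http://example.com/",
          "http://example.com:/", "http://example.com:80/"):
    print(scheme_normalize(u))                   # http://example.com/ each time
```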
| Another case where normalization varies by scheme is in the handling | Another case where normalization varies by scheme is in the handling | |||
| of an empty authority component or empty host subcomponent. For many | of an empty authority component or empty host subcomponent. For many | |||
| scheme specifications, an empty authority or host is considered an | scheme specifications, an empty authority or host is considered an | |||
| error; for others, it is considered equivalent to "localhost" or the | error; for others, it is considered equivalent to "localhost" or the | |||
| end-user's host. When a scheme defines a default for authority and a | end-user's host. When a scheme defines a default for authority and a | |||
| URI reference to that default is desired, the reference should have | URI reference to that default is desired, the reference should have | |||
| an empty authority for the sake of uniformity, brevity, and | an empty authority for the sake of uniformity, brevity, and | |||
| internationalization. If, however, either the userinfo or port | internationalization. If, however, either the userinfo or port | |||
| subcomponent is non-empty, then the host should be given explicitly | subcomponent is non-empty, then the host should be given explicitly | |||
| even if it matches the default. | even if it matches the default. | |||
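A small sketch of this preference. It assumes a hypothetical table in which the "file" scheme defines "localhost" as its default authority. The authority is emptied only when it is exactly the default host, so any userinfo or port keeps the host explicit.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: schemes whose specification defines a default authority.
DEFAULT_AUTHORITY = {"file": "localhost"}


def prefer_empty_authority(uri: str) -> str:
    parts = urlsplit(uri)
    default = DEFAULT_AUTHORITY.get(parts.scheme)
    # A userinfo or port makes the netloc differ from the bare default host,
    # so in those cases the host is left in place.
    if default is not None and parts.netloc.lower() == default:
        return urlunsplit(parts._replace(netloc=""))
    return uri


print(prefer_empty_authority("file://localhost/etc/hosts"))   # file:///etc/hosts
```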
| 6.2.4 Protocol-based Normalization | 6.2.4 Protocol-based Normalization | |||
| Web spiders, for which substantial effort to reduce the incidence of | Web spiders, for which substantial effort to reduce the incidence of | |||
| false negatives is often cost-effective, are observed to implement | false negatives is often cost-effective, are observed to implement | |||
| even more aggressive techniques in URI comparison. For example, if | even more aggressive techniques in URI comparison. For example, if | |||
| they observe that a URI such as | they observe that a URI such as | |||
| http://example.com/data | http://example.com/data | |||
| redirects to a URI differing only in the trailing slash | redirects to a URI differing only in the trailing slash | |||
| http://example.com/data/ | http://example.com/data/ | |||
| they will likely regard the two as equivalent in the future. This | they will likely regard the two as equivalent in the future. This | |||
| kind of technique is only appropriate when equivalence is clearly | kind of technique is only appropriate when equivalence is clearly | |||
| indicated by both the result of accessing the resources and the | indicated by both the result of accessing the resources and the | |||
| common conventions of their scheme's dereference algorithm (in this | common conventions of their scheme's dereference algorithm (in this | |||
| case, use of redirection by HTTP origin servers to avoid problems | case, use of redirection by HTTP origin servers to avoid problems | |||
| with relative references). | with relative references). | |||
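A spider might record such observations as sketched below. The class and method names are hypothetical, and no network access is shown. The sketch learns only the narrow case described above: a redirect whose target differs from the request solely by an added trailing slash.

```python
from typing import Dict


class RedirectEquivalences:
    """Records trailing-slash redirects observed while crawling (sketch)."""

    def __init__(self) -> None:
        self._canonical: Dict[str, str] = {}

    def observe_redirect(self, requested: str, location: str) -> None:
        # Learn only redirects that add a trailing slash and change nothing else.
        if location == requested + "/":
            self._canonical[requested] = location

    def key(self, uri: str) -> str:
        """Comparison key: the observed redirect target, if any."""
        return self._canonical.get(uri, uri)


eq = RedirectEquivalences()
eq.observe_redirect("http://example.com/data", "http://example.com/data/")
print(eq.key("http://example.com/data") == eq.key("http://example.com/data/"))  # True
```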
| 6.3 Canonical Form | 6.3 Canonical Form | |||
| It is in the best interests of everyone concerned to avoid | It is in the best interests of everyone concerned to avoid | |||
| false negatives in comparing URIs and to minimize the amount of | false negatives in comparing URIs and to minimize the amount of | |||
| skipping to change at page 40, line 43 ¶ | skipping to change at page 41, line 36 ¶ | |||
| are often used to provide a compact set of instructions for access to | are often used to provide a compact set of instructions for access to | |||
| network resources, care must be taken to properly interpret the data | network resources, care must be taken to properly interpret the data | |||
| within a URI, to prevent that data from causing unintended access, | within a URI, to prevent that data from causing unintended access, | |||
| and to avoid including data that should not be revealed in plain | and to avoid including data that should not be revealed in plain | |||
| text. | text. | |||
| 7.1 Reliability and Consistency | 7.1 Reliability and Consistency | |||
| There is no guarantee that, having once used a given URI to retrieve | There is no guarantee that, having once used a given URI to retrieve | |||
| some information, the same information will be retrievable by that | some information, the same information will be retrievable by that | |||
| URI in the future. Nor is there any guarantee that the information | URI in the future. Nor is there any guarantee that the information | |||
| retrievable via that URI in the future will be observably similar to | retrievable via that URI in the future will be observably similar to | |||
| that retrieved in the past. The URI syntax does not constrain how a | that retrieved in the past. The URI syntax does not constrain how a | |||
| given scheme or authority apportions its name space or maintains it | given scheme or authority apportions its name space or maintains it | |||
| over time. Such a guarantee can only be obtained from the person(s) | over time. Such a guarantee can only be obtained from the person(s) | |||
| controlling that name space and the resource in question. A specific | controlling that name space and the resource in question. A specific | |||
| URI scheme may define additional semantics, such as name persistence, | URI scheme may define additional semantics, such as name persistence, | |||
| if those semantics are required of all naming authorities for that | if those semantics are required of all naming authorities for that | |||
| scheme. | scheme. | |||
| 7.2 Malicious Construction | 7.2 Malicious Construction | |||
| skipping to change at page 41, line 23 ¶ | skipping to change at page 42, line 16 ¶ | |||
| running a different protocol service and data within the URI contains | running a different protocol service and data within the URI contains | |||
| instructions that, when interpreted according to this other protocol, | instructions that, when interpreted according to this other protocol, | |||
| cause an unexpected operation. A frequent example of such abuse has | cause an unexpected operation. A frequent example of such abuse has | |||
| been the use of a protocol-based scheme with a port component of | been the use of a protocol-based scheme with a port component of | |||
| "25", thereby fooling user agent software into sending an unintended | "25", thereby fooling user agent software into sending an unintended | |||
| or impersonating message via an SMTP server. | or impersonating message via an SMTP server. | |||
| Applications should prevent dereference of a URI that specifies a TCP | Applications should prevent dereference of a URI that specifies a TCP | |||
| port number within the "well-known port" range (0 - 1023) unless the | port number within the "well-known port" range (0 - 1023) unless the | |||
| protocol being used to dereference that URI is compatible with the | protocol being used to dereference that URI is compatible with the | |||
| protocol expected on that well-known port. Although IANA maintains a | protocol expected on that well-known port. Although IANA maintains a | |||
| registry of well-known ports, applications should make such | registry of well-known ports, applications should make such | |||
| restrictions user-configurable to avoid preventing the deployment of | restrictions user-configurable to avoid preventing the deployment of | |||
| new services. | new services. | |||
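A minimal check along these lines is sketched below. The `DEFAULT_ALLOWED` table mapping schemes to compatible well-known ports is a hypothetical, user-configurable example, not a registry extract. A URI without an explicit port is assumed to use its scheme's default and is allowed.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit

# Hypothetical default table; meant to be overridden by user configuration.
DEFAULT_ALLOWED: Dict[str, Set[int]] = {"http": {80}, "https": {443}, "ftp": {21}}


def may_dereference(uri: str, allowed: Optional[Dict[str, Set[int]]] = None) -> bool:
    table = DEFAULT_ALLOWED if allowed is None else allowed
    parts = urlsplit(uri)
    port = parts.port          # None if absent; raises ValueError if out of range
    if port is None or port > 1023:
        return True            # default port assumed compatible, or not well-known
    return port in table.get(parts.scheme, set())


print(may_dereference("http://example.com:25/"))   # False: SMTP's well-known port
```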
| When a URI contains percent-encoded octets that match the delimiters | When a URI contains percent-encoded octets that match the delimiters | |||
| for a given resolution or dereference protocol (for example, CR and | for a given resolution or dereference protocol (for example, CR and | |||
| LF characters for the TELNET protocol), such percent-encoded octets | LF characters for the TELNET protocol), such percent-encoded octets | |||
| must not be decoded before transmission across that protocol. | must not be decoded before transmission across that protocol. | |||
| Transfer of the percent-encoding, which might violate the protocol, | Transfer of the percent-encoding, which might violate the protocol, | |||
| is less harmful than allowing decoded octets to be interpreted as | is less harmful than allowing decoded octets to be interpreted as | |||
| skipping to change at page 41, line 46 ¶ | skipping to change at page 42, line 39 ¶ | |||
| 7.3 Back-end Transcoding | 7.3 Back-end Transcoding | |||
| When a URI is dereferenced, the data within it is often parsed by | When a URI is dereferenced, the data within it is often parsed by | |||
| both the user agent and one or more servers. In HTTP, for example, a | both the user agent and one or more servers. In HTTP, for example, a | |||
| typical user agent will parse a URI into its five major components, | typical user agent will parse a URI into its five major components, | |||
| access the authority's server, and send it the data within the | access the authority's server, and send it the data within the | |||
| authority, path, and query components. A typical server will take | authority, path, and query components. A typical server will take | |||
| that information, parse the path into segments and the query into | that information, parse the path into segments and the query into | |||
| key/value pairs, and then invoke implementation-specific handlers to | key/value pairs, and then invoke implementation-specific handlers to | |||
| respond to the request. As a result, a common security concern for | respond to the request. As a result, a common security concern for | |||
| server implementations that handle a URI, either as a whole or split | server implementations that handle a URI, either as a whole or split | |||
| into separate components, is proper interpretation of the octet data | into separate components, is proper interpretation of the octet data | |||
| represented by the characters and percent-encodings within that URI. | represented by the characters and percent-encodings within that URI. | |||
| Percent-encoded octets must be decoded at some point during the | Percent-encoded octets must be decoded at some point during the | |||
| dereference process. Applications must split the URI into its | dereference process. Applications must split the URI into its | |||
| components and subcomponents prior to decoding the octets, since | components and subcomponents prior to decoding the octets, since | |||
| otherwise the decoded octets might be mistaken for delimiters. | otherwise the decoded octets might be mistaken for delimiters. | |||
| Security checks of the data within a URI should be applied after | Security checks of the data within a URI should be applied after | |||
| decoding the octets. Note, however, that the "%00" percent-encoding | decoding the octets. Note, however, that the "%00" percent-encoding | |||
| (NUL) may require special handling and should be rejected if the | (NUL) may require special handling and should be rejected if the | |||
| application is not expecting to receive raw data within a component. | application is not expecting to receive raw data within a component. | |||
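The ordering requirement can be sketched for the path component. The raw path is split on "/" first, and then each segment is decoded with `urllib.parse.unquote`. A decoded NUL is rejected on the assumption that this hypothetical application never expects raw data in a path segment.

```python
from typing import List
from urllib.parse import unquote


def decode_path_segments(raw_path: str) -> List[str]:
    """Split a raw (still percent-encoded) path, then decode each segment."""
    segments = [unquote(seg) for seg in raw_path.split("/")]
    # Security checks run on the decoded data: "%00" becomes NUL here.
    if any("\x00" in seg for seg in segments):
        raise ValueError("NUL octet in path segment")
    return segments


print(decode_path_segments("/docs/a%2Fb/c"))   # ['', 'docs', 'a/b', 'c']
print(unquote("/docs/a%2Fb/c").split("/"))     # ['', 'docs', 'a', 'b', 'c']: wrong order
```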
| Special care should be taken when the URI path interpretation process | Special care should be taken when the URI path interpretation process | |||
| involves the use of a back-end filesystem or related system | involves the use of a back-end filesystem or related system | |||
| functions. Filesystems typically assign an operational meaning to | functions. Filesystems typically assign an operational meaning to | |||
| special characters, such as the "/", "\", ":", "[", and "]" | special characters, such as the "/", "\", ":", "[", and "]" | |||
| characters, and special device names like ".", "..", "...", "aux", | characters, and special device names like ".", "..", "...", "aux", | |||
| "lpt", etc. In some cases, merely testing for the existence of such a | "lpt", etc. In some cases, merely testing for the existence of such | |||
| name will cause the operating system to pause or invoke unrelated | a name will cause the operating system to pause or invoke unrelated | |||
| system calls, leading to significant security concerns regarding | system calls, leading to significant security concerns regarding | |||
| denial of service and unintended data transfer. It would be | denial of service and unintended data transfer. It would be | |||
| impossible for this specification to list all such significant | impossible for this specification to list all such significant | |||
| characters and device names; implementers should research the | characters and device names; implementers should research the | |||
| reserved names and characters for the types of storage devices that | reserved names and characters for the types of storage devices that | |||
| may be attached to their application and restrict the use of data | may be attached to their application and restrict the use of data | |||
| obtained from URI components accordingly. | obtained from URI components accordingly. | |||
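As a starting point only, a per-segment filter might reject the characters and device names mentioned above. The lists are illustrative and far from complete. Windows, for example, also treats a device name followed by an extension ("AUX.txt") like the bare name, which this sketch approximates by checking the part before the first ".".

```python
# Illustrative only: the characters and device names mentioned in the text.
SPECIAL_CHARS = set("/\\:[]")
SPECIAL_NAMES = {".", "..", "...", "aux", "lpt"}


def is_safe_segment(decoded_segment: str) -> bool:
    """Reject a decoded path segment that a back-end filesystem may treat specially."""
    if any(ch in SPECIAL_CHARS for ch in decoded_segment):
        return False
    lowered = decoded_segment.lower()
    stem = lowered.split(".", 1)[0]      # "aux.txt" -> "aux"
    return lowered not in SPECIAL_NAMES and stem not in SPECIAL_NAMES


print([is_safe_segment(s) for s in ("report.txt", "AUX", "aux.txt", "..")])
# [True, False, False, False]
```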
| 7.4 Rare IP Address Formats | 7.4 Rare IP Address Formats | |||
| skipping to change at page 42, line 42 ¶ | skipping to change at page 43, line 33 ¶ | |||
| dotted-decimal form of IPv4 address literal, many implementations | dotted-decimal form of IPv4 address literal, many implementations | |||
| that process URIs make use of platform-dependent system routines, | that process URIs make use of platform-dependent system routines, | |||
| such as gethostbyname() and inet_aton(), to translate the string | such as gethostbyname() and inet_aton(), to translate the string | |||
| literal to an actual IP address. Unfortunately, such system routines | literal to an actual IP address. Unfortunately, such system routines | |||
| often allow and process a much larger set of formats than those | often allow and process a much larger set of formats than those | |||
| described in Section 3.2.2. | described in Section 3.2.2. | |||
| For example, many implementations allow dotted forms of three | For example, many implementations allow dotted forms of three | |||
| numbers, wherein the last part is interpreted as a 16-bit quantity | numbers, wherein the last part is interpreted as a 16-bit quantity | |||
| and placed in the right-most two bytes of the network address (e.g., | and placed in the right-most two bytes of the network address (e.g., | |||
| a Class B network). Likewise, a dotted form of two numbers means the | a Class B network). Likewise, a dotted form of two numbers means the | |||
| last part is interpreted as a 24-bit quantity and placed in the | last part is interpreted as a 24-bit quantity and placed in the | |||
| right-most three bytes of the network address (Class A), and a single | right-most three bytes of the network address (Class A), and a single | |||
| number (without dots) is interpreted as a 32-bit quantity and stored | number (without dots) is interpreted as a 32-bit quantity and stored | |||
| directly in the network address. Adding further to the confusion, | directly in the network address. Adding further to the confusion, | |||
| some implementations allow each dotted part to be interpreted as | some implementations allow each dotted part to be interpreted as | |||
| decimal, octal, or hexadecimal, as specified in the C language (i.e., | decimal, octal, or hexadecimal, as specified in the C language (i.e., | |||
| a leading 0x or 0X implies hexadecimal; otherwise, a leading 0 | a leading 0x or 0X implies hexadecimal; otherwise, a leading 0 | |||
| implies octal; otherwise, the number is interpreted as decimal). | implies octal; otherwise, the number is interpreted as decimal). | |||
| These additional IP address formats are not allowed in the URI syntax | These additional IP address formats are not allowed in the URI syntax | |||
| skipping to change at page 43, line 18 ¶ | skipping to change at page 44, line 10 ¶ | |||
| access to resources based on the IP address in string literal format. | access to resources based on the IP address in string literal format. | |||
| If such filtering is performed, literals should be converted to | If such filtering is performed, literals should be converted to | |||
| numeric form and filtered based on the numeric value, rather than a | numeric form and filtered based on the numeric value, rather than a | |||
| prefix or suffix of the string form. | prefix or suffix of the string form. | |||
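The formats described above, and the advice to filter on numeric values, can be illustrated together. The following parser is a sketch written for this example, not a platform routine. It accepts one to four dotted parts in C-style hexadecimal, octal, or decimal, lets the last part fill the remaining bytes, and returns an `ipaddress.IPv4Address` that can be tested against a network.

```python
import ipaddress
import re
from typing import List

# One dotted part: hexadecimal (0x...), octal (leading 0), or decimal.
PART = re.compile(r"0[xX][0-9A-Fa-f]+|0[0-7]*|[1-9][0-9]*")


def _part_value(part: str) -> int:
    if not PART.fullmatch(part):
        raise ValueError(f"bad address part: {part!r}")
    if part[:2] in ("0x", "0X"):
        return int(part[2:], 16)
    if part.startswith("0"):
        return int(part, 8)
    return int(part, 10)


def parse_legacy_ipv4(text: str) -> ipaddress.IPv4Address:
    """Interpret 1 to 4 dotted numbers the way many system routines do."""
    values: List[int] = [_part_value(p) for p in text.split(".")]
    if not 1 <= len(values) <= 4:
        raise ValueError("expected 1 to 4 parts")
    *head, last = values
    tail_bits = 32 - 8 * len(head)          # the last part fills the remaining bytes
    if any(v > 255 for v in head) or last >= 1 << tail_bits:
        raise ValueError("part out of range")
    number = 0
    for v in head:
        number = (number << 8) | v
    return ipaddress.IPv4Address((number << tail_bits) | last)


LOOPBACK = ipaddress.ip_network("127.0.0.0/8")

for literal in ("127.0.0.1", "127.1", "0x7f.1", "0177.0.0.1", "2130706433"):
    addr = parse_legacy_ipv4(literal)
    # A prefix test such as literal.startswith("127.") misses several of these;
    # the numeric test catches all of them.
    print(f"{literal:>12} -> {addr}  blocked={addr in LOOPBACK}")
```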
| 7.5 Sensitive Information | 7.5 Sensitive Information | |||
| URI producers should not provide a URI that contains a username or | URI producers should not provide a URI that contains a username or | |||
| password which is intended to be secret: URIs are frequently | password which is intended to be secret: URIs are frequently | |||
| displayed by browsers, stored in clear text bookmarks, and logged by | displayed by browsers, stored in clear text bookmarks, and logged by | |||
| user agent history and intermediary applications (proxies). A | user agent history and intermediary applications (proxies). A | |||
| password appearing within the userinfo component is deprecated and | password appearing within the userinfo component is deprecated and | |||
| should be considered an error (or simply ignored) except in those | should be considered an error (or simply ignored) except in those | |||
| rare cases where the 'password' parameter is intended to be public. | rare cases where the 'password' parameter is intended to be public. | |||
| 7.6 Semantic Attacks | 7.6 Semantic Attacks | |||
| Because the userinfo subcomponent is rarely used and appears before | Because the userinfo subcomponent is rarely used and appears before | |||
| the host in the authority component, it can be used to construct a | the host in the authority component, it can be used to construct a | |||
| URI that is intended to mislead a human user by appearing to identify | URI that is intended to mislead a human user by appearing to identify | |||
| one (trusted) naming authority while actually identifying a different | one (trusted) naming authority while actually identifying a different | |||
| skipping to change at page 43, line 43 | skipping to change at page 44, line 35 | |||
| might lead a human user to assume that the host is 'cnn.example.com', | might lead a human user to assume that the host is 'cnn.example.com', | |||
| whereas it is actually '10.0.0.1'. Note that a misleading userinfo | whereas it is actually '10.0.0.1'. Note that a misleading userinfo | |||
| subcomponent could be much longer than the example above. | subcomponent could be much longer than the example above. | |||
| A misleading URI, such as the one above, is an attack on the user's | A misleading URI, such as the one above, is an attack on the user's | |||
| preconceived notions about the meaning of a URI, rather than an | preconceived notions about the meaning of a URI, rather than an | |||
| attack on the software itself. User agents may be able to reduce the | attack on the software itself. User agents may be able to reduce the | |||
| impact of such attacks by distinguishing the various components of | impact of such attacks by distinguishing the various components of | |||
| the URI when rendered, such as by using a different color or tone to | the URI when rendered, such as by using a different color or tone to | |||
| render userinfo if any is present, though there is no general | render userinfo if any is present, though there is no general | |||
| panacea. More information on URI-based semantic attacks can be found | panacea. More information on URI-based semantic attacks can be found | |||
| in [Siedzik]. | in [Siedzik]. | |||
| 8. Acknowledgments | 8. IANA Considerations | |||
| URI scheme names, as defined by <scheme> in Section 3.1, form a | ||||
| registered name space that is managed by IANA according to the | ||||
| procedures defined in [BCP35]. | ||||
| 9. Acknowledgments | ||||
| This specification is derived from RFC 2396 [RFC2396], RFC 1808 | This specification is derived from RFC 2396 [RFC2396], RFC 1808 | |||
| [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those | [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those | |||
| documents still apply. It also incorporates the update (with | documents still apply. It also incorporates the update (with | |||
| corrections) for IPv6 literals in the host syntax, as defined by | corrections) for IPv6 literals in the host syntax, as defined by | |||
| Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in | Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in | |||
| [RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz, | [RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz, | |||
| Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll, | Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll, | |||
| Dan Connolly, Adam M. Costello, John Cowan, Jason Diamond, Martin | Dan Connolly, Adam M. Costello, John Cowan, Jason Diamond, Martin | |||
| Duerst, Stefan Eissing, Clive D.W. Feather, Tony Hammond, Pat Hayes, | Duerst, Stefan Eissing, Clive D.W. Feather, Al Gilman, Tony Hammond, | |||
| Henry Holtzman, Ian B. Jacobs, Michael Kay, John C. Klensin, Graham | Elliotte Harold, Pat Hayes, Henry Holtzman, Ian B. Jacobs, Michael | |||
| Klyne, Dan Kohn, Bruce Lilly, Andrew Main, Ira McDonald, Michael | Kay, John C. Klensin, Graham Klyne, Dan Kohn, Bruce Lilly, Andrew | |||
| Mealling, Ray Merkert, Stephen Pollei, Julian Reschke, Tomas Rokicki, | Main, Dave McAlpin, Ira McDonald, Michael Mealling, Ray Merkert, | |||
| Miles Sabin, Kai Schaetzl, Mark Thomson, Ronald Tschalaer, Norm | Stephen Pollei, Julian Reschke, Tomas Rokicki, Miles Sabin, Kai | |||
| Walsh, Marc Warne, Stuart Williams, and Henry Zongaro are gratefully | Schaetzl, Mark Thomson, Ronald Tschalaer, Norm Walsh, Marc Warne, | |||
| acknowledged. | Stuart Williams, and Henry Zongaro are gratefully acknowledged. | |||
| 9. References | 10. References | |||
| 9.1 Normative References | 10.1 Normative References | |||
| [ASCII] American National Standards Institute, "Coded Character | [ASCII] American National Standards Institute, "Coded Character | |||
| Set -- 7-bit American Standard Code for Information | Set -- 7-bit American Standard Code for Information | |||
| Interchange", ANSI X3.4, 1986. | Interchange", ANSI X3.4, 1986. | |||
| [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
| Specifications: ABNF", RFC 2234, November 1997. | Specifications: ABNF", RFC 2234, November 1997. | |||
| [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | [STD63] Yergeau, F., "UTF-8, a transformation format of ISO | |||
| 10646", STD 63, RFC 3629, November 2003. | 10646", STD 63, RFC 3629, November 2003. | |||
| 9.2 Informative References | [UCS] International Organization for Standardization, | |||
| "Information Technology - Universal Multiple-Octet Coded | ||||
| Character Set (UCS)", ISO/IEC 10646:2003, December 2003. | ||||
| 10.2 Informative References | ||||
| [BCP19] Freed, N. and J. Postel, "IANA Charset Registration | ||||
| Procedures", BCP 19, RFC 2978, October 2000. | ||||
| [BCP35] Petke, R. and I. King, "Registration Procedures for URL | ||||
| Scheme Names", BCP 35, RFC 2717, November 1999. | ||||
| [RFC0952] Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet | [RFC0952] Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet | |||
| host table specification", RFC 952, October 1985. | host table specification", RFC 952, October 1985. | |||
| [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
| STD 13, RFC 1034, November 1987. | STD 13, RFC 1034, November 1987. | |||
| [RFC1123] Braden, R., "Requirements for Internet Hosts - Application | [RFC1123] Braden, R., "Requirements for Internet Hosts - Application | |||
| and Support", STD 3, RFC 1123, October 1989. | and Support", STD 3, RFC 1123, October 1989. | |||
| skipping to change at page 46, line 7 | skipping to change at page 46, line 28 | |||
| [RFC1808] Fielding, R., "Relative Uniform Resource Locators", RFC | [RFC1808] Fielding, R., "Relative Uniform Resource Locators", RFC | |||
| 1808, June 1995. | 1808, June 1995. | |||
| [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
| Extensions (MIME) Part Two: Media Types", RFC 2046, | Extensions (MIME) Part Two: Media Types", RFC 2046, | |||
| November 1996. | November 1996. | |||
| [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | |||
| [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | ||||
| Languages", BCP 18, RFC 2277, January 1998. | ||||
| [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | |||
| Resource Identifiers (URI): Generic Syntax", RFC 2396, | Resource Identifiers (URI): Generic Syntax", RFC 2396, | |||
| August 1998. | August 1998. | |||
| [RFC2518] Goland, Y., Whitehead, E., Faizi, A., Carter, S. and D. | [RFC2518] Goland, Y., Whitehead, E., Faizi, A., Carter, S. and D. | |||
| Jensen, "HTTP Extensions for Distributed Authoring -- | Jensen, "HTTP Extensions for Distributed Authoring -- | |||
| WEBDAV", RFC 2518, February 1999. | WEBDAV", RFC 2518, February 1999. | |||
| [RFC2557] Palme, F., Hopmann, A., Shelness, N. and E. Stefferud, | [RFC2557] Palme, F., Hopmann, A., Shelness, N. and E. Stefferud, | |||
| "MIME Encapsulation of Aggregate Documents, such as HTML | "MIME Encapsulation of Aggregate Documents, such as HTML | |||
| (MHTML)", RFC 2557, March 1999. | (MHTML)", RFC 2557, March 1999. | |||
| [RFC2717] Petke, R. and I. King, "Registration Procedures for URL | ||||
| Scheme Names", BCP 35, RFC 2717, November 1999. | ||||
| [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, | [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, | |||
| "Guidelines for new URL Schemes", RFC 2718, November 1999. | "Guidelines for new URL Schemes", RFC 2718, November 1999. | |||
| [RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for | [RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for | |||
| Literal IPv6 Addresses in URL's", RFC 2732, December 1999. | Literal IPv6 Addresses in URL's", RFC 2732, December 1999. | |||
| [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | ||||
| Procedures", BCP 19, RFC 2978, October 2000. | ||||
| [RFC3305] Mealling, M. and R. Denenberg, "Report from the Joint W3C/ | [RFC3305] Mealling, M. and R. Denenberg, "Report from the Joint W3C/ | |||
| IETF URI Planning Interest Group: Uniform Resource | IETF URI Planning Interest Group: Uniform Resource | |||
| Identifiers (URIs), URLs, and Uniform Resource Names | Identifiers (URIs), URLs, and Uniform Resource Names | |||
| (URNs): Clarifications and Recommendations", RFC 3305, | (URNs): Clarifications and Recommendations", RFC 3305, | |||
| August 2002. | August 2002. | |||
| [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| RFC 3490, March 2003. | RFC 3490, March 2003. | |||
| skipping to change at page 47, line 22 | skipping to change at page 47, line 33 | |||
| USA | USA | |||
| Phone: +1-617-253-5702 | Phone: +1-617-253-5702 | |||
| Fax: +1-617-258-5999 | Fax: +1-617-258-5999 | |||
| EMail: timbl@w3.org | EMail: timbl@w3.org | |||
| URI: http://www.w3.org/People/Berners-Lee/ | URI: http://www.w3.org/People/Berners-Lee/ | |||
| Roy T. Fielding | Roy T. Fielding | |||
| Day Software | Day Software | |||
| 5251 California Ave., Suite 110 | 5251 California Ave., Suite 110 | |||
| Irvine, CA 92612-3074 | Irvine, CA 92617 | |||
| USA | USA | |||
| Phone: +1-949-679-2960 | Phone: +1-949-679-2960 | |||
| Fax: +1-949-679-2972 | Fax: +1-949-679-2972 | |||
| EMail: fielding@gbiv.com | EMail: fielding@gbiv.com | |||
| URI: http://roy.gbiv.com/ | URI: http://roy.gbiv.com/ | |||
| Larry Masinter | Larry Masinter | |||
| Adobe Systems Incorporated | Adobe Systems Incorporated | |||
| 345 Park Ave | 345 Park Ave | |||
| skipping to change at page 48, line 10 | skipping to change at page 48, line 10 | |||
| Phone: +1-408-536-3024 | Phone: +1-408-536-3024 | |||
| EMail: LMM@acm.org | EMail: LMM@acm.org | |||
| URI: http://larry.masinter.net/ | URI: http://larry.masinter.net/ | |||
| Appendix A. Collected ABNF for URI | Appendix A. Collected ABNF for URI | |||
| URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] | |||
| hier-part = "//" authority path-abempty | hier-part = "//" authority path-abempty | |||
| / path-abs | / path-absolute | |||
| / path-rootless | / path-rootless | |||
| / path-empty | / path-empty | |||
| URI-reference = URI / relative-URI | URI-reference = URI / relative-URI | |||
| absolute-URI = scheme ":" hier-part [ "?" query ] | absolute-URI = scheme ":" hier-part [ "?" query ] | |||
| relative-URI = relative-part [ "?" query ] [ "#" fragment ] | relative-URI = relative-part [ "?" query ] [ "#" fragment ] | |||
| relative-part = "//" authority path-abempty | relative-part = "//" authority path-abempty | |||
| / path-abs | / path-absolute | |||
| / path-noscheme | / path-noscheme | |||
| / path-empty | / path-empty | |||
| scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | |||
| authority = [ userinfo "@" ] host [ ":" port ] | authority = [ userinfo "@" ] host [ ":" port ] | |||
| userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) | userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) | |||
| host = IP-literal / IPv4address / reg-name | host = IP-literal / IPv4address / reg-name | |||
| port = *DIGIT | port = *DIGIT | |||
| skipping to change at page 48, line 48 | skipping to change at page 49, line 4 | |||
| / [ h16 ] "::" 4( h16 ":" ) ls32 | / [ h16 ] "::" 4( h16 ":" ) ls32 | |||
| / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 | / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 | |||
| / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 | / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 | |||
| / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 | / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 | |||
| / [ *4( h16 ":" ) h16 ] "::" ls32 | / [ *4( h16 ":" ) h16 ] "::" ls32 | |||
| / [ *5( h16 ":" ) h16 ] "::" h16 | / [ *5( h16 ":" ) h16 ] "::" h16 | |||
| / [ *6( h16 ":" ) h16 ] "::" | / [ *6( h16 ":" ) h16 ] "::" | |||
| h16 = 1*4HEXDIG | h16 = 1*4HEXDIG | |||
| ls32 = ( h16 ":" h16 ) / IPv4address | ls32 = ( h16 ":" h16 ) / IPv4address | |||
| IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet | IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet | |||
| dec-octet = DIGIT ; 0-9 | dec-octet = DIGIT ; 0-9 | |||
| / %x31-39 DIGIT ; 10-99 | / %x31-39 DIGIT ; 10-99 | |||
| / "1" 2DIGIT ; 100-199 | / "1" 2DIGIT ; 100-199 | |||
| / "2" %x30-34 DIGIT ; 200-249 | / "2" %x30-34 DIGIT ; 200-249 | |||
| / "25" %x30-35 ; 250-255 | / "25" %x30-35 ; 250-255 | |||
| reg-name = 0*255( unreserved / pct-encoded / sub-delims ) | reg-name = *( unreserved / pct-encoded / sub-delims ) | |||
| path = path-abempty ; begins with "/" or is empty | path = path-abempty ; begins with "/" or is empty | |||
| / path-abs ; begins with "/" but not "//" | / path-absolute ; begins with "/" but not "//" | |||
| / path-noscheme ; begins with a non-colon segment | / path-noscheme ; begins with a non-colon segment | |||
| / path-rootless ; begins with a segment | / path-rootless ; begins with a segment | |||
| / path-empty ; zero characters | / path-empty ; zero characters | |||
| path-abempty = *( "/" segment ) | path-abempty = *( "/" segment ) | |||
| path-abs = "/" [ segment-nz *( "/" segment ) ] | path-absolute = "/" [ segment-nz *( "/" segment ) ] | |||
| path-noscheme = segment-nzc *( "/" segment ) | path-noscheme = segment-nz-nc *( "/" segment ) | |||
| path-rootless = segment-nz *( "/" segment ) | path-rootless = segment-nz *( "/" segment ) | |||
| path-empty = 0<pchar> | path-empty = 0<pchar> | |||
| segment = *pchar | segment = *pchar | |||
| segment-nz = 1*pchar | segment-nz = 1*pchar | |||
| segment-nzc = 1*( unreserved / pct-encoded / sub-delims / "@" ) | segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" ) | |||
| ; non-zero-length segment without any colon ":" | ||||
| pchar = unreserved / pct-encoded / sub-delims / ":" / "@" | pchar = unreserved / pct-encoded / sub-delims / ":" / "@" | |||
| query = *( pchar / "/" / "?" ) | query = *( pchar / "/" / "?" ) | |||
| fragment = *( pchar / "/" / "?" ) | fragment = *( pchar / "/" / "?" ) | |||
| pct-encoded = "%" HEXDIG HEXDIG | pct-encoded = "%" HEXDIG HEXDIG | |||
| unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | |||
| skipping to change at page 51, line 24 | skipping to change at page 51, line 19 | |||
| In practice, URIs are delimited in a variety of ways, but usually | In practice, URIs are delimited in a variety of ways, but usually | |||
| within double-quotes "http://example.com/", angle brackets <http:// | within double-quotes "http://example.com/", angle brackets <http:// | |||
| example.com/>, or just using whitespace | example.com/>, or just using whitespace | |||
| http://example.com/ | http://example.com/ | |||
| These wrappers do not form part of the URI. | These wrappers do not form part of the URI. | |||
| In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may | In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may | |||
| need to be added to break a long URI across lines. The whitespace | need to be added to break a long URI across lines. The whitespace | |||
| should be ignored when extracting the URI. | should be ignored when extracting the URI. | |||
| No whitespace should be introduced after a hyphen ("-") character. | No whitespace should be introduced after a hyphen ("-") character. | |||
| Because some typesetters and printers may (erroneously) introduce a | Because some typesetters and printers may (erroneously) introduce a | |||
| hyphen at the end of line when breaking a line, the interpreter of a | hyphen at the end of line when breaking a line, the interpreter of a | |||
| URI containing a line break immediately after a hyphen should ignore | URI containing a line break immediately after a hyphen should ignore | |||
| all whitespace around the line break, and should be aware that the | all whitespace around the line break, and should be aware that the | |||
| hyphen may or may not actually be part of the URI. | hyphen may or may not actually be part of the URI. | |||
| Using <> angle brackets around each URI is especially recommended as | Using <> angle brackets around each URI is especially recommended as | |||
| skipping to change at page 52, line 37 | skipping to change at page 52, line 24 | |||
| authority component and not allowed outside their use as delimiters | authority component and not allowed outside their use as delimiters | |||
| for an IP literal within host. In order to make this change without | for an IP literal within host. In order to make this change without | |||
| changing the technical definition of the path, query, and fragment | changing the technical definition of the path, query, and fragment | |||
| components, those rules were redefined to directly specify the | components, those rules were redefined to directly specify the | |||
| characters allowed rather than be defined in terms of uric. | characters allowed rather than be defined in terms of uric. | |||
| Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal | Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal | |||
| address, which unfortunately lacks an ABNF description of | address, which unfortunately lacks an ABNF description of | |||
| IPv6address, we created a new ABNF rule for IPv6address that matches | IPv6address, we created a new ABNF rule for IPv6address that matches | |||
| the text representations defined by Section 2.2 of [RFC3513]. | the text representations defined by Section 2.2 of [RFC3513]. | |||
| Likewise, the definition of IPv4address has been improved in order to | Likewise, the definition of IPv4address has been improved in order to | |||
| limit each decimal octet to the range 0-255. | limit each decimal octet to the range 0-255. | |||
| Section 6 on URI normalization and comparison has been | Section 6 on URI normalization and comparison has been | |||
| completely rewritten and extended using input from Tim Bray and | completely rewritten and extended using input from Tim Bray and | |||
| discussion within the W3C Technical Architecture Group. | discussion within the W3C Technical Architecture Group. | |||
| An ABNF rule for URI has been introduced to correspond to the common | An ABNF rule for URI has been introduced to correspond to the common | |||
| usage of the term: an absolute URI with optional fragment. | usage of the term: an absolute URI with optional fragment. | |||
| D.2 Modifications from RFC 2396 | D.2 Modifications from RFC 2396 | |||
| The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234]. | The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234]. | |||
| This change required all rule names that formerly included underscore | This change required all rule names that formerly included underscore | |||
| characters to be renamed with a dash instead. | characters to be renamed with a dash instead. | |||
| Section 2 on characters has been rewritten to explain what characters | Section 2 on characters has been rewritten to explain what characters | |||
| are reserved, when they are reserved, and why they are reserved even | are reserved, when they are reserved, and why they are reserved even | |||
| when not used as delimiters by the generic syntax. The mark | when not used as delimiters by the generic syntax. The mark | |||
| characters that are typically unsafe to decode, including the | characters that are typically unsafe to decode, including the | |||
| exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open | exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open | |||
| and close parentheses ("(" and ")"), have been moved to the reserved | and close parentheses ("(" and ")"), have been moved to the reserved | |||
| set in order to clarify the distinction between reserved and | set in order to clarify the distinction between reserved and | |||
| unreserved and hopefully answer the most common question of scheme | unreserved and hopefully answer the most common question of scheme | |||
| designers. Likewise, the section on percent-encoded characters has | designers. Likewise, the section on percent-encoded characters has | |||
| been rewritten, and URI normalizers are now given license to decode | been rewritten, and URI normalizers are now given license to decode | |||
| any percent-encoded octets corresponding to unreserved characters. | any percent-encoded octets corresponding to unreserved characters. | |||
| In general, the terms "escaped" and "unescaped" have been replaced | In general, the terms "escaped" and "unescaped" have been replaced | |||
| with "percent-encoded" and "decoded", respectively, to reduce | with "percent-encoded" and "decoded", respectively, to reduce | |||
| confusion with other forms of escape mechanisms. | confusion with other forms of escape mechanisms. | |||
| The ABNF for URI and URI-reference has been redesigned to make them | The ABNF for URI and URI-reference has been redesigned to make them | |||
| more friendly to LALR parsers and reduce complexity. As a result, the | more friendly to LALR parsers and reduce complexity. As a result, | |||
| layout form of syntax description has been removed, along with the | the layout form of syntax description has been removed, along with | |||
| uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path, | the uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path, | |||
| path_segments, rel_segment, and mark rules. All references to | path_segments, rel_segment, and mark rules. All references to | |||
| "opaque" URIs have been replaced with a better description of how the | "opaque" URIs have been replaced with a better description of how the | |||
| path component may be opaque to hierarchy. The ambiguity regarding | path component may be opaque to hierarchy. The ambiguity regarding | |||
| the parsing of URI-reference as a URI or a relative-URI with a colon | the parsing of URI-reference as a URI or a relative-URI with a colon | |||
| in the first segment has been eliminated through the use of five | in the first segment has been eliminated through the use of five | |||
| separate path matching rules. | separate path matching rules. | |||
| The fragment identifier has been moved back into the section on | The fragment identifier has been moved back into the section on | |||
| generic syntax components and within the URI and relative-URI rules, | generic syntax components and within the URI and relative-URI rules, | |||
| though it remains excluded from absolute-URI. The number sign ("#") | though it remains excluded from absolute-URI. The number sign ("#") | |||
| character has been moved back to the reserved set as a result of | character has been moved back to the reserved set as a result of | |||
| reintegrating the fragment syntax. | reintegrating the fragment syntax. | |||
| The ABNF has been corrected to allow a relative path to be empty. | The ABNF has been corrected to allow a relative path to be empty. | |||
| This also allows an absolute-URI to consist of nothing after the | This also allows an absolute-URI to consist of nothing after the | |||
| "scheme:", as is present in practice with the "dav:" namespace | "scheme:", as is present in practice with the "dav:" namespace | |||
| [RFC2518] and the "about:" scheme used internally by many WWW browser | [RFC2518] and the "about:" scheme used internally by many WWW browser | |||
| implementations. The ambiguity regarding the boundary between | implementations. The ambiguity regarding the boundary between | |||
| authority and path has been eliminated through the use of five | authority and path has been eliminated through the use of five | |||
| separate path matching rules. | separate path matching rules. | |||
| Registry-based naming authorities that use the generic syntax are now | Registry-based naming authorities that use the generic syntax are now | |||
| defined within the host rule and limited to 255 path characters. This | defined within the host rule. This change allows current | |||
| change allows current implementations, where whatever name provided | implementations, where whatever name provided is simply fed to the | |||
| is simply fed to the local name resolution mechanism, to be | local name resolution mechanism, to be consistent with the | |||
| consistent with the specification and removes the need to re-specify | specification and removes the need to re-specify DNS name formats | |||
| DNS name formats here. It also allows the host component to contain | here. It also allows the host component to contain percent-encoded | |||
| percent-encoded octets, which is necessary to enable | octets, which is necessary to enable internationalized domain names | |||
| internationalized domain names to be provided in URIs, processed in | to be provided in URIs, processed in their native character encodings | |||
| their native character encodings at the application layers above URI | at the application layers above URI processing, and passed to an IDNA | |||
| processing, and passed to an IDNA library as a registered name in the | library as a registered name in the UTF-8 character encoding. The | |||
| UTF-8 character encoding. The server, hostport, hostname, | server, hostport, hostname, domainlabel, toplabel, and alphanum rules | |||
| domainlabel, toplabel, and alphanum rules have been removed. | have been removed. | |||
| The resolving relative references algorithm of [RFC2396] has been | The resolving relative references algorithm of [RFC2396] has been | |||
| rewritten using pseudocode for this revision to improve clarity and | rewritten using pseudocode for this revision to improve clarity and | |||
| fix the following issues: | fix the following issues: | |||
| o [RFC2396] section 5.2, step 6a, failed to account for a base URI | o [RFC2396] section 5.2, step 6a, failed to account for a base URI | |||
| with no path. | with no path. | |||
| o Restored the behavior of [RFC1808] where, if the reference | o Restored the behavior of [RFC1808] where, if the reference | |||
| contains an empty path and a defined query component, then the | contains an empty path and a defined query component, then the | |||
| target URI inherits the base URI's path component. | target URI inherits the base URI's path component. | |||
| o Removed the special-case treatment of same-document references | o The determination of whether a URI reference is a same-document | |||
| within the URI parser in favor of a section that explains when a | reference has been decoupled from the URI parser, simplifying the | |||
| reference should be interpreted by a dereferencing engine as a | URI processing interface within applications in a way consistent | |||
| same-document reference: when the target URI and base URI, | with the internal architecture of deployed URI processing | |||
| excluding fragments, match. This change does not modify the | implementations. The determination is now based on comparison to | |||
| behavior of existing same-document references as defined by RFC | the base URI after transforming a reference to absolute form, | |||
| 2396 (fragment-only references); it merely adds the same-document | rather than on the format of the reference itself. This change | |||
| distinction to other references that refer to the base URI and | may result in more references being considered "same-document" | |||
| simplifies the interface between applications and their URI | under this specification than would be under the rules given in | |||
| parsers, as is consistent with the internal architecture of | RFC 2396, especially when normalization is used to reduce aliases. | |||
| deployed URI processing implementations. | However, it does not change the status of existing same-document | |||
| references. | ||||
| o Separated the path merge routine into two routines: merge, for | o Separated the path merge routine into two routines: merge, for | |||
| describing combination of the base URI path with a relative-path | describing combination of the base URI path with a relative-path | |||
| reference, and remove_dot_segments, for describing how to remove | reference, and remove_dot_segments, for describing how to remove | |||
| the special "." and ".." segments from a composed path. The | the special "." and ".." segments from a composed path. The | |||
| remove_dot_segments algorithm is now applied to all URI reference | remove_dot_segments algorithm is now applied to all URI reference | |||
| paths in order to match common implementations and improve the | paths in order to match common implementations and improve the | |||
| normalization of URIs in practice. This change only impacts the | normalization of URIs in practice. This change only impacts the | |||
| parsing of abnormal references and same-scheme references wherein | parsing of abnormal references and same-scheme references wherein | |||
| the base URI has a non-hierarchical path. | the base URI has a non-hierarchical path. | |||
| Appendix E. Instructions to RFC Editor | ||||
| Prior to publication as an RFC, please remove this section and the | ||||
| "Editorial Note" that appears after the Abstract. If [BCP35] or any | ||||
| of the normative references are updated prior to publication, the | ||||
| associated reference in this document can be safely updated as well. | ||||
| This document has been produced using the xml2rfc tool set; the XML | ||||
| version can be obtained via the URI listed in the editorial note. | ||||
| Index | Index | |||
| A | A | |||
| ABNF 10 | ABNF 11 | |||
| absolute 25 | absolute 26 | |||
| absolute-path 25 | absolute-path 25 | |||
| absolute-URI 25 | absolute-URI 26 | |||
| access 8 | access 9 | |||
| authority 15, 16 | authority 15, 17 | |||
| B | B | |||
| base URI 27 | base URI 28 | |||
| C | C | |||
| character encoding 4 | character encoding 4 | |||
| character 4 | character 4 | |||
| characters 10 | characters 11 | |||
| coded character set 4 | coded character set 4 | |||
| D | D | |||
| dec-octet 19 | dec-octet 20 | |||
| dereference 8 | dereference 9 | |||
| dot-segments 21 | dot-segments 22 | |||
| F | F | |||
| fragment 15, 23 | fragment 15, 23 | |||
| G | G | |||
| gen-delims 11 | gen-delims 12 | |||
| generic syntax 6 | generic syntax 6 | |||
| H | H | |||
| h16 18 | h16 19 | |||
| hier-part 15 | hier-part 15 | |||
| hierarchical 9 | hierarchical 10 | |||
| host 17 | host 18 | |||
| I | I | |||
| identifier 5 | identifier 5 | |||
| IP-literal 18 | IP-literal 19 | |||
| IPv4 19 | IPv4 20 | |||
| IPv4address 19 | IPv4address 20 | |||
| IPv6 18 | IPv6 19 | |||
| IPv6address 18 | IPv6address 19 | |||
| IPvFuture 18 | IPvFuture 19 | |||
| L | L | |||
| locator 6 | locator 7 | |||
| ls32 18 | ls32 19 | |||
| M | M | |||
| merge 30 | merge 31 | |||
| N | N | |||
| name 6 | name 7 | |||
| network-path 25 | network-path 25 | |||
| P | P | |||
| path 15, 21 | path 15, 21 | |||
| path-abempty 21 | path-abempty 21 | |||
| path-abs 21 | path-absolute 21 | |||
| path-empty 21 | path-empty 21 | |||
| path-noscheme 21 | path-noscheme 21 | |||
| path-rootless 21 | path-rootless 21 | |||
| path-abempty 15 | path-abempty 15 | |||
| path-abs 15 | path-absolute 15 | |||
| path-empty 15 | path-empty 15 | |||
| path-rootless 15 | path-rootless 15 | |||
| pchar 21 | pchar 21 | |||
| pct-encoded 11 | pct-encoded 12 | |||
| percent-encoding 11 | percent-encoding 12 | |||
| port 20 | port 21 | |||
| Q | Q | |||
| query 15, 22 | query 15, 23 | |||
| R | R | |||
| reg-name 19 | reg-name 20 | |||
| registered name 19 | registered name 20 | |||
| relative 9, 27 | relative 10, 28 | |||
| relative-path 25 | relative-path 25 | |||
| relative-URI 25 | relative-URI 25 | |||
| remove_dot_segments 30, 31 | remove_dot_segments 31 | |||
| representation 8 | representation 9 | |||
| reserved 11 | reserved 12 | |||
| resolution 8, 27 | resolution 9, 28 | |||
| resource 4 | resource 5 | |||
| retrieval 8 | retrieval 9 | |||
| S | S | |||
| same-document 25 | same-document 26 | |||
| sameness 8 | sameness 9 | |||
| scheme 15, 15 | scheme 15, 16 | |||
| segment 21 | segment 21 | |||
| segment-nz 21 | segment-nz 21 | |||
| segment-nzc 21 | segment-nz-nc 21 | |||
| sub-delims 11 | sub-delims 12 | |||
| suffix 26 | suffix 27 | |||
| T | T | |||
| transcription 7 | transcription 7 | |||
| U | U | |||
| uniform 4 | uniform 4 | |||
| unreserved 12 | unreserved 13 | |||
| URI grammar | URI grammar | |||
| absolute-URI 25 | absolute-URI 26 | |||
| ALPHA 10 | ALPHA 11 | |||
| authority 15, 16 | authority 16, 17 | |||
| CR 10 | CR 11 | |||
| dec-octet 19 | dec-octet 20 | |||
| DIGIT 10 | DIGIT 11 | |||
| DQUOTE 10 | DQUOTE 11 | |||
| fragment 15, 23, 25 | fragment 16, 24, 26 | |||
| gen-delims 11 | gen-delims 12 | |||
| h16 18 | h16 19 | |||
| HEXDIG 10 | HEXDIG 11 | |||
| hier-part 15 | hier-part 16 | |||
| host 16, 17 | host 17, 18 | |||
| IP-literal 18 | IP-literal 19 | |||
| IPv4address 19 | IPv4address 20 | |||
| IPv6address 18 | IPv6address 19, 19 | |||
| IPvFuture 18 | IPvFuture 19 | |||
| LF 10 | LF 11 | |||
| ls32 18 | ls32 19 | |||
| mark 12 | mark 13 | |||
| OCTET 10 | OCTET 11 | |||
| path 21 | path 22 | |||
| path-abempty 15, 21 | path-abempty 16, 22 | |||
| path-abs 15, 21 | path-absolute 16, 22 | |||
| path-empty 15, 21 | path-empty 16, 22 | |||
| path-noscheme 21 | path-noscheme 22 | |||
| path-rootless 15, 21 | path-rootless 16, 22 | |||
| pchar 21, 22, 23 | pchar 22, 23, 24 | |||
| pct-encoded 11 | pct-encoded 12 | |||
| port 16, 20 | port 17, 21 | |||
| query 15, 22, 25 | query 16, 23, 26, 26 | |||
| reg-name 19 | reg-name 20 | |||
| relative-URI 24, 25 | relative-URI 25, 26 | |||
| reserved 11 | reserved 12 | |||
| scheme 15, 16, 25 | scheme 16, 16, 26 | |||
| segment 21 | segment 22 | |||
| segment-nz 21 | segment-nz 22 | |||
| segment-nzc 21 | segment-nz-nc 22 | |||
| SP 10 | SP 11 | |||
| sub-delims 11 | sub-delims 12 | |||
| unreserved 12 | unreserved 13 | |||
| URI 15, 24 | URI 16, 25 | |||
| URI-reference 24 | URI-reference 25 | |||
| userinfo 16, 17 | userinfo 17, 17 | |||
| URI 15 | URI 15 | |||
| URI-reference 24 | URI-reference 25 | |||
| URL 6 | URL 7 | |||
| URN 6 | URN 7 | |||
| userinfo 17 | userinfo 17 | |||
| Intellectual Property Statement | Intellectual Property Statement | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| on the IETF's procedures with respect to rights in IETF Documents can | on the procedures with respect to rights in RFC documents can be | |||
| be found in BCP 78 and BCP 79. | found in BCP 78 and BCP 79. | |||
| Copies of IPR disclosures made to the IETF Secretariat and any | Copies of IPR disclosures made to the IETF Secretariat and any | |||
| assurances of licenses to be made available, or the result of an | assurances of licenses to be made available, or the result of an | |||
| attempt made to obtain a general license or permission for the use of | attempt made to obtain a general license or permission for the use of | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Disclaimer of Validity | Disclaimer of Validity | |||
| This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | |||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | |||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | |||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | |||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |||
| Copyright Statement | Copyright Statement | |||
| Copyright (C) The Internet Society (2004). This document is subject | Copyright (C) The Internet Society (2004). This document is subject | |||
| to the rights, licenses and restrictions contained in BCP 78, and | to the rights, licenses and restrictions contained in BCP 78, and | |||
| except as set forth therein, the authors retain all their rights. | except as set forth therein, the authors retain all their rights. | |||
| Acknowledgment | Acknowledgment | |||
| Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is currently provided by the | |||
| Internet Society. | Internet Society. | |||
| End of changes. 193 change blocks. | ||||
| 490 lines changed or deleted | 544 lines changed or added | |||

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/
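
The change entry that splits the path merge routine into merge and remove_dot_segments can be made concrete with a small sketch. The Python below follows these two routines as they appear in the final published RFC 3986 (Sections 5.2.3 and 5.2.4). The draft wording compared here may differ in detail. The `base_has_authority` flag and the list-based output buffer are illustrative assumptions, not anything this document defines.

```python
def merge(base_path: str, ref_path: str, base_has_authority: bool) -> str:
    """Combine a base URI path with a relative-path reference (sketch)."""
    # Base has an authority but an empty path: the merged path is rooted at "/".
    if base_has_authority and base_path == "":
        return "/" + ref_path
    # Otherwise keep the base path up to and including its last "/" (if any).
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a path (sketch)."""
    inp = path
    out = []  # output buffer, kept as segments that carry their leading "/"
    while inp:
        if inp.startswith("../"):        # A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):       # A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):      # B: "/./x" -> "/x"
            inp = inp[2:]
        elif inp == "/.":                # B: trailing "/." -> "/"
            inp = "/"
        elif inp.startswith("/../"):     # C: "/../x" -> "/x", drop last output segment
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":               # C: trailing "/.." -> "/", drop last output segment
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # D: a lone "." or ".." is discarded
            inp = ""
        else:                            # E: move the first segment to the output
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)
```

For example, `remove_dot_segments(merge("/b/c/d;p", "../g", True))` returns `"/b/g"`. This matches resolving the reference "../g" against the base "http://a/b/c/d;p?q". Because remove_dot_segments is a separate routine, it can be applied to every reference path, as the change entry describes, and not only to merged ones.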