< draft-fielding-uri-rfc2396bis-06.txt   draft-fielding-uri-rfc2396bis-07.txt >
Network Working Group T. Berners-Lee Network Working Group T. Berners-Lee
Internet-Draft W3C/MIT Internet-Draft W3C/MIT
Updates: 1738 (if approved) R. Fielding Updates: 1738 (if approved) R. Fielding
Obsoletes: 2732, 2396, 1808 (if approved) Day Software Obsoletes: 2732, 2396, 1808 (if approved) Day Software
L. Masinter L. Masinter
Expires: January 15, 2005 Adobe Expires: March 26, 2005 Adobe
July 17, 2004 September 25, 2004
Uniform Resource Identifier (URI): Generic Syntax Uniform Resource Identifier (URI): Generic Syntax
draft-fielding-uri-rfc2396bis-06 draft-fielding-uri-rfc2396bis-07
Status of this Memo Status of this Memo
This document is an Internet-Draft and is subject to all provisions This document is an Internet-Draft and is subject to all provisions
of section 3 of RFC 3667. By submitting this Internet-Draft, each of section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with which he or she become aware will be disclosed, in accordance with
RFC 3668. RFC 3668.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
other groups may also distribute working documents as other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
<http://www.ietf.org/ietf/1id-abstracts.txt>. <http://www.ietf.org/ietf/1id-abstracts.txt>.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
<http://www.ietf.org/shadow.html>. <http://www.ietf.org/shadow.html>.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2004).
Abstract Abstract
A Uniform Resource Identifier (URI) is a compact sequence of A Uniform Resource Identifier (URI) is a compact sequence of
characters for identifying an abstract or physical resource. This characters for identifying an abstract or physical resource. This
specification defines the generic URI syntax and a process for specification defines the generic URI syntax and a process for
resolving URI references that might be in relative form, along with resolving URI references that might be in relative form, along with
guidelines and security considerations for the use of URIs on the guidelines and security considerations for the use of URIs on the
Internet. The URI syntax defines a grammar that is a superset of all Internet. The URI syntax defines a grammar that is a superset of all
valid URIs, such that an implementation can parse the common valid URIs, such that an implementation can parse the common
components of a URI reference without knowing the scheme-specific components of a URI reference without knowing the scheme-specific
requirements of every possible identifier. This specification does requirements of every possible identifier. This specification does
not define a generative grammar for URIs; that task is performed by not define a generative grammar for URIs; that task is performed by
the individual specifications of each URI scheme. the individual specifications of each URI scheme.
Editorial Note Editorial Note
Discussion of this draft and comments to the editors should be sent Discussion of this draft and comments to the editors should be sent
to the uri@w3.org mailing list. An issues list and version history to the uri@w3.org mailing list. An issues list and version history
is available at <http://gbiv.com/protocols/uri/rev-2002/ is available at <http://gbiv.com/protocols/uri/rev-2002/issues.html>.
issues.html> [1].
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . 4 1.1 Overview of URIs . . . . . . . . . . . . . . . . . . . . . 4
1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . 6 1.1.1 Generic Syntax . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . 7 1.1.3 URI, URL, and URN . . . . . . . . . . . . . . . . . . 7
1.2 Design Considerations . . . . . . . . . . . . . . . . . . 7 1.2 Design Considerations . . . . . . . . . . . . . . . . . . 7
1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . 7 1.2.1 Transcription . . . . . . . . . . . . . . . . . . . . 7
skipping to change at page 2, line 33 skipping to change at page 2, line 32
1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . 11 1.3 Syntax Notation . . . . . . . . . . . . . . . . . . . . . 11
2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2. Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 12 2.1 Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 12
2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . 12 2.2 Reserved Characters . . . . . . . . . . . . . . . . . . . 12
2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . 13 2.3 Unreserved Characters . . . . . . . . . . . . . . . . . . 13
2.4 When to Encode or Decode . . . . . . . . . . . . . . . . . 13 2.4 When to Encode or Decode . . . . . . . . . . . . . . . . . 13
2.5 Identifying Data . . . . . . . . . . . . . . . . . . . . . 14 2.5 Identifying Data . . . . . . . . . . . . . . . . . . . . . 14
3. Syntax Components . . . . . . . . . . . . . . . . . . . . . . 16 3. Syntax Components . . . . . . . . . . . . . . . . . . . . . . 16
3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.1 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 Authority . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1 User Information . . . . . . . . . . . . . . . . . . . 17 3.2.1 User Information . . . . . . . . . . . . . . . . . . . 18
3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2.2 Host . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2.3 Port . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.5 Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 24
4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4. Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . 25 4.1 URI Reference . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Relative URI . . . . . . . . . . . . . . . . . . . . . . . 26 4.2 Relative Reference . . . . . . . . . . . . . . . . . . . . 26
4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 26 4.3 Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 Same-document Reference . . . . . . . . . . . . . . . . . 26 4.4 Same-document Reference . . . . . . . . . . . . . . . . . 27
4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . 27 4.5 Suffix Reference . . . . . . . . . . . . . . . . . . . . . 27
5. Reference Resolution . . . . . . . . . . . . . . . . . . . . . 28 5. Reference Resolution . . . . . . . . . . . . . . . . . . . . . 28
5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . 28 5.1 Establishing a Base URI . . . . . . . . . . . . . . . . . 28
5.1.1 Base URI Embedded in Content . . . . . . . . . . . . . 28 5.1.1 Base URI Embedded in Content . . . . . . . . . . . . . 29
5.1.2 Base URI from the Encapsulating Entity . . . . . . . . 29 5.1.2 Base URI from the Encapsulating Entity . . . . . . . . 29
5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . 29 5.1.3 Base URI from the Retrieval URI . . . . . . . . . . . 30
5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . 29 5.1.4 Default Base URI . . . . . . . . . . . . . . . . . . . 30
5.2 Relative Resolution . . . . . . . . . . . . . . . . . . . 30 5.2 Relative Resolution . . . . . . . . . . . . . . . . . . . 30
5.2.1 Pre-parse the Base URI . . . . . . . . . . . . . . . . 30 5.2.1 Pre-parse the Base URI . . . . . . . . . . . . . . . . 30
5.2.2 Transform References . . . . . . . . . . . . . . . . . 30 5.2.2 Transform References . . . . . . . . . . . . . . . . . 31
5.2.3 Merge Paths . . . . . . . . . . . . . . . . . . . . . 31 5.2.3 Merge Paths . . . . . . . . . . . . . . . . . . . . . 32
5.2.4 Remove Dot Segments . . . . . . . . . . . . . . . . . 32 5.2.4 Remove Dot Segments . . . . . . . . . . . . . . . . . 32
5.3 Component Recomposition . . . . . . . . . . . . . . . . . 34 5.3 Component Recomposition . . . . . . . . . . . . . . . . . 34
5.4 Reference Resolution Examples . . . . . . . . . . . . . . 34 5.4 Reference Resolution Examples . . . . . . . . . . . . . . 34
5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . 35 5.4.1 Normal Examples . . . . . . . . . . . . . . . . . . . 35
5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . 35 5.4.2 Abnormal Examples . . . . . . . . . . . . . . . . . . 35
6. Normalization and Comparison . . . . . . . . . . . . . . . . . 36 6. Normalization and Comparison . . . . . . . . . . . . . . . . . 36
6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 37 6.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 37
6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 37 6.2 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 37
6.2.1 Simple String Comparison . . . . . . . . . . . . . . . 38 6.2.1 Simple String Comparison . . . . . . . . . . . . . . . 38
6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . 38 6.2.2 Syntax-based Normalization . . . . . . . . . . . . . . 39
6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . 39 6.2.3 Scheme-based Normalization . . . . . . . . . . . . . . 40
6.2.4 Protocol-based Normalization . . . . . . . . . . . . . 40 6.2.4 Protocol-based Normalization . . . . . . . . . . . . . 41
6.3 Canonical Form . . . . . . . . . . . . . . . . . . . . . . 40
7. Security Considerations . . . . . . . . . . . . . . . . . . . 41 7. Security Considerations . . . . . . . . . . . . . . . . . . . 41
7.1 Reliability and Consistency . . . . . . . . . . . . . . . 41 7.1 Reliability and Consistency . . . . . . . . . . . . . . . 41
7.2 Malicious Construction . . . . . . . . . . . . . . . . . . 41 7.2 Malicious Construction . . . . . . . . . . . . . . . . . . 42
7.3 Back-end Transcoding . . . . . . . . . . . . . . . . . . . 42 7.3 Back-end Transcoding . . . . . . . . . . . . . . . . . . . 42
7.4 Rare IP Address Formats . . . . . . . . . . . . . . . . . 43 7.4 Rare IP Address Formats . . . . . . . . . . . . . . . . . 43
7.5 Sensitive Information . . . . . . . . . . . . . . . . . . 44 7.5 Sensitive Information . . . . . . . . . . . . . . . . . . 44
7.6 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 44 7.6 Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 44
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 44 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 45
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 46
10.1 Normative References . . . . . . . . . . . . . . . . . . . . 45 10.1 Normative References . . . . . . . . . . . . . . . . . . . . 46
10.2 Informative References . . . . . . . . . . . . . . . . . . . 45 10.2 Informative References . . . . . . . . . . . . . . . . . . . 46
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 47 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 48
A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 48 A. Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 49
B. Parsing a URI Reference with a Regular Expression . . . . . . 50 B. Parsing a URI Reference with a Regular Expression . . . . . . 51
C. Delimiting a URI in Context . . . . . . . . . . . . . . . . . 50 C. Delimiting a URI in Context . . . . . . . . . . . . . . . . . 52
D. Summary of Non-editorial Changes . . . . . . . . . . . . . . . 52 D. Changes from RFC 2396 . . . . . . . . . . . . . . . . . . . . 53
D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . 52 D.1 Additions . . . . . . . . . . . . . . . . . . . . . . . . 53
D.2 Modifications from RFC 2396 . . . . . . . . . . . . . . . 52 D.2 Modifications . . . . . . . . . . . . . . . . . . . . . . 54
E. Instructions to RFC Editor . . . . . . . . . . . . . . . . . . 54 E. Instructions to RFC Editor . . . . . . . . . . . . . . . . . . 56
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Intellectual Property and Copyright Statements . . . . . . . . 59 Intellectual Property and Copyright Statements . . . . . . . . 61
1. Introduction 1. Introduction
A Uniform Resource Identifier (URI) provides a simple and extensible A Uniform Resource Identifier (URI) provides a simple and extensible
means for identifying a resource. This specification of URI syntax means for identifying a resource. This specification of URI syntax
and semantics is derived from concepts introduced by the World Wide and semantics is derived from concepts introduced by the World Wide
Web global information initiative, whose use of such identifiers Web global information initiative, whose use of such identifiers
dates from 1990 and is described in "Universal Resource Identifiers dates from 1990 and is described in "Universal Resource Identifiers
in WWW" [RFC1630], and is designed to meet the recommendations laid in WWW" [RFC1630], and is designed to meet the recommendations laid
out in "Functional Recommendations for Internet Resource Locators" out in "Functional Recommendations for Internet Resource Locators"
skipping to change at page 10, line 33 skipping to change at page 10, line 33
number sign ("#") characters for the purpose of delimiting components number sign ("#") characters for the purpose of delimiting components
that are significant to the generic parser's hierarchical that are significant to the generic parser's hierarchical
interpretation of an identifier. In addition to aiding the interpretation of an identifier. In addition to aiding the
readability of such identifiers through the consistent use of readability of such identifiers through the consistent use of
familiar syntax, this uniform representation of hierarchy across familiar syntax, this uniform representation of hierarchy across
naming schemes allows scheme-independent references to be made naming schemes allows scheme-independent references to be made
relative to that hierarchy. relative to that hierarchy.
It is often the case that a group or "tree" of documents has been It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose, wherein the vast majority of constructed to serve a common purpose, wherein the vast majority of
URIs in these documents point to resources within the tree rather URI references in these documents point to resources within the tree
than outside of it. Similarly, documents located at a particular rather than outside of it. Similarly, documents located at a
site are much more likely to refer to other resources at that site particular site are much more likely to refer to other resources at
than to resources at remote sites. Relative referencing of URIs that site than to resources at remote sites. Relative referencing of
allows document trees to be partially independent of their location URIs allows document trees to be partially independent of their
and access scheme. For instance, it is possible for a single set of location and access scheme. For instance, it is possible for a
hypertext documents to be simultaneously accessible and traversable single set of hypertext documents to be simultaneously accessible and
via each of the "file", "http", and "ftp" schemes if the documents traversable via each of the "file", "http", and "ftp" schemes if the
refer to each other using relative references. Furthermore, such documents refer to each other using relative references.
document trees can be moved, as a whole, without changing any of the Furthermore, such document trees can be moved, as a whole, without
relative references. changing any of the relative references.
A relative URI reference (Section 4.2) refers to a resource by A relative reference (Section 4.2) refers to a resource by describing
describing the difference within a hierarchical name space between the difference within a hierarchical name space between the reference
the reference context and the target URI. The reference resolution context and the target URI. The reference resolution algorithm,
algorithm, presented in Section 5, defines how such a reference is presented in Section 5, defines how such a reference is transformed
transformed to the target URI. Since relative references can only be to the target URI. Since relative references can only be used within
used within the context of a hierarchical URI, designers of new URI the context of a hierarchical URI, designers of new URI schemes
schemes should use a syntax consistent with the generic syntax's should use a syntax consistent with the generic syntax's hierarchical
hierarchical components unless there are compelling reasons to forbid components unless there are compelling reasons to forbid relative
relative referencing within that scheme. referencing within that scheme.
All URIs are parsed by generic syntax parsers when used. A URI NOTE: Previous specifications used the terms "partial URI" and
scheme that wishes to remain opaque to hierarchical processing must "relative URI" to denote a relative reference to a URI. Since
disallow the use of slash and question mark characters. However, some readers misunderstood those terms to mean that relative URIs
since a URI reference is only modified by the generic parser if it are a subset of URIs, rather than a method of referencing URIs,
contains a dot-segment (a complete path segment of "." or "..", as this specification simply refers to them as relative references.
described in Section 3.3), URI schemes may safely use "/" for other
purposes if they do not allow dot-segments. All URI references are parsed by generic syntax parsers when used.
However, since hierarchical processing has no effect on an absolute
URI used in a reference unless it contains one or more dot-segments
(complete path segments of "." or "..", as described in Section 3.3),
URI scheme specifications can define opaque identifiers by
disallowing use of slash characters, question mark characters, and
the URIs "scheme:." and "scheme:..".
1.3 Syntax Notation 1.3 Syntax Notation
This specification uses the Augmented Backus-Naur Form (ABNF) This specification uses the Augmented Backus-Naur Form (ABNF)
notation of [RFC2234], including the following core ABNF syntax rules notation of [RFC2234], including the following core ABNF syntax rules
defined by that specification: ALPHA (letters), CR (carriage return), defined by that specification: ALPHA (letters), CR (carriage return),
DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal
digits), LF (line feed), and SP (space). The complete URI syntax is digits), LF (line feed), and SP (space). The complete URI syntax is
collected in Appendix A. collected in Appendix A.
skipping to change at page 14, line 13 skipping to change at page 14, line 18
Once produced, a URI is always in its percent-encoded form. Once produced, a URI is always in its percent-encoded form.
When a URI is dereferenced, the components and subcomponents When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any) significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, since otherwise the data may those components can be safely decoded, since otherwise the data may
be mistaken for component delimiters. The only exception is for be mistaken for component delimiters. The only exception is for
percent-encoded octets corresponding to characters in the unreserved percent-encoded octets corresponding to characters in the unreserved
set, which can be decoded at any time. For example, the octet set, which can be decoded at any time. For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E" corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing software; the "%7E" can be replaced by "~" by older URI processing implementations; the "%7E" can be replaced by
without changing its interpretation. "~" without changing its interpretation.
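The exception for unreserved characters can be illustrated with a short Python sketch. It is not part of either draft. The helper name decode_unreserved is hypothetical, and the unreserved set (ALPHA, DIGIT, "-", ".", "_", "~") is an assumption taken from Section 2.3, which this comparison does not show.

```python
import re

# Assumed unreserved set (Section 2.3 is not shown in this comparison):
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-encoded octet such as "%7E".
_PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode %XX only when the octet maps to an unreserved character.

    Every other percent-encoded octet is left untouched, because decoding
    it before the components are separated could create a delimiter.
    """
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_OCTET.sub(replace, uri)
```

In this sketch "%7Euser" becomes "~user", while "%2F" stays encoded because a decoded slash would be read as a path delimiter.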
Because the percent ("%") character serves as the indicator for Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" in order percent-encoded octets, it must be percent-encoded as "%25" in order
for that octet to be used as data within a URI. Implementations must for that octet to be used as data within a URI. Implementations must
not percent-encode or decode the same string more than once, since not percent-encode or decode the same string more than once, since
decoding an already decoded string might lead to misinterpreting a decoding an already decoded string might lead to misinterpreting a
percent data octet as the beginning of a percent-encoding, or vice percent data octet as the beginning of a percent-encoding, or vice
versa in the case of percent-encoding an already percent-encoded versa in the case of percent-encoding an already percent-encoded
string. string.
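The hazard described above can be reproduced with Python's standard urllib.parse module. This is only an illustrative sketch; the variable names are hypothetical, and passing safe="" to quote() is a choice made here so that every reserved character is encoded.

```python
from urllib.parse import quote, unquote

# Encoding twice: the "%" introduced by the first pass is encoded again.
data = "100%"
encoded_once = quote(data, safe="")           # "%" becomes "%25"
encoded_twice = quote(encoded_once, safe="")  # "%25" becomes "%2525"

# Decoding twice: the literal three-character data "%41" is carried in a
# URI as "%2541". One decode recovers the data; a second decode wrongly
# treats the data itself as a percent-encoding and turns it into "A".
carried = "%2541"
decoded_once = unquote(carried)
decoded_twice = unquote(decoded_once)
```

After one pass each, encoded_once and decoded_once hold the intended values; the second pass in either direction silently changes the data.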
skipping to change at page 17, line 9 skipping to change at page 17, line 11
lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for
the sake of robustness, but should only produce lowercase scheme the sake of robustness, but should only produce lowercase scheme
names, for consistency. names, for consistency.
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
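As a rough illustration, the scheme rule above can be transcribed into a regular expression. The function names is_valid_scheme and normalize_scheme are hypothetical; the sketch assumes the input is the scheme name alone, without the trailing ":".

```python
import re

# Direct transcription of: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if the whole string matches the scheme rule."""
    return _SCHEME.fullmatch(name) is not None


def normalize_scheme(name: str) -> str:
    """Accept either case on input, but produce lowercase on output."""
    return name.lower()
```

The check accepts uppercase letters for robustness, and lowercasing is applied only when a scheme name is produced.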
Individual schemes are not specified by this document. The process Individual schemes are not specified by this document. The process
for registration of new URI schemes is defined separately by [BCP35]. for registration of new URI schemes is defined separately by [BCP35].
The scheme registry maintains the mapping between scheme names and The scheme registry maintains the mapping between scheme names and
their specifications. Advice for designers of new URI schemes can be their specifications. Advice for designers of new URI schemes can be
found in [RFC2718]. found in [RFC2718]. URI scheme specifications must define their own
syntax such that all strings matching their scheme-specific syntax
will also match the <absolute-URI> grammar, as described in
Section 4.3.
When presented with a URI that violates one or more scheme-specific When presented with a URI that violates one or more scheme-specific
restrictions, the scheme-specific resolution process should flag the restrictions, the scheme-specific resolution process should flag the
reference as an error rather than ignore the unused parts; doing so reference as an error rather than ignore the unused parts; doing so
reduces the number of equivalent URIs and helps detect abuses of the reduces the number of equivalent URIs and helps detect abuses of the
generic syntax that might indicate the URI has been constructed to generic syntax that might indicate the URI has been constructed to
mislead the user (Section 7.6). mislead the user (Section 7.6).
3.2 Authority 3.2 Authority
skipping to change at page 22, line 10 skipping to change at page 22, line 21
form, that, along with data in the non-hierarchical query component form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the (Section 3.4), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or by the first question mark ("?") or number sign ("#") character, or
by the end of the URI. by the end of the URI.
If a URI contains an authority component, then the path component If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may begin with a relative path, in which case the first (Section 4.1) may be a relative-path reference, in which case the
path segment cannot contain a colon (":") character. The ABNF first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules. matched by the parser to one of these rules.
path = path-abempty ; begins with "/" or is empty path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//" / path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment / path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment / path-rootless ; begins with a segment
/ path-empty ; zero characters / path-empty ; zero characters
skipping to change at page 22, line 46 skipping to change at page 23, line 8
A path consists of a sequence of path segments separated by a slash A path consists of a sequence of path segments separated by a slash
("/") character. A path is always defined for a URI, though the ("/") character. A path is always defined for a URI, though the
defined path may be empty (zero length). Use of the slash character defined path may be empty (zero length). Use of the slash character
to indicate hierarchy is only required when a URI will be used as the to indicate hierarchy is only required when a URI will be used as the
context for relative references. For example, the URI context for relative references. For example, the URI
<mailto:fred@example.com> has a path of "fred@example.com", whereas <mailto:fred@example.com> has a path of "fred@example.com", whereas
the URI <foo://info.example.com?fred> has an empty path. the URI <foo://info.example.com?fred> has an empty path.
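The two examples above can be checked with Python's standard urllib.parse.urlsplit. This is a sketch only: the standard-library parser is not guaranteed to follow this draft in every case, but it splits these two URIs in the way the text describes.

```python
from urllib.parse import urlsplit

# No authority component: everything after "mailto:" is the path.
mail = urlsplit("mailto:fred@example.com")

# Authority present and followed directly by "?": the path is empty,
# and "fred" belongs to the query component.
foo = urlsplit("foo://info.example.com?fred")

print(repr(mail.path))                 # path of the mailto URI
print(repr(foo.netloc), repr(foo.path), repr(foo.query))
```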
The path segments "." and "..", also known as dot-segments, are The path segments "." and "..", also known as dot-segments, are
defined for relative reference within the path name hierarchy. They defined for relative reference within the path name hierarchy. They
are intended for use at the beginning of a relative path reference are intended for use at the beginning of a relative-path reference
(Section 4.2) for indicating relative position within the (Section 4.2) for indicating relative position within the
hierarchical tree of names. This is similar to their role within hierarchical tree of names. This is similar to their role within
some operating systems' file directory structure to indicate the some operating systems' file directory structure to indicate the
current directory and parent directory, respectively. However, current directory and parent directory, respectively. However,
unlike a file system, these dot-segments are only interpreted within unlike a file system, these dot-segments are only interpreted within
the URI path hierarchy and are removed as part of the resolution the URI path hierarchy and are removed as part of the resolution
process (Section 5.2). process (Section 5.2).
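The removal of dot-segments during resolution can be observed with Python's urllib.parse.urljoin. The base URI "http://a/b/c/d;p?q" is chosen here for illustration, and the resolver is Python's, not the algorithm text of Section 5.2, so treat the sketch as a demonstration rather than a reference implementation.

```python
from urllib.parse import urljoin

# Base URI chosen for illustration.
BASE = "http://a/b/c/d;p?q"

# "." refers to the current level of the hierarchy, ".." to its parent.
# Neither dot-segment survives in the resolved target URI.
same_level = urljoin(BASE, "./g")
one_up = urljoin(BASE, "../g")
two_up = urljoin(BASE, "../../g")
```

Each result is an absolute URI whose path contains only ordinary segments.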
Aside from dot-segments in hierarchical paths, a path segment is Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax. URI-producing applications considered opaque by the generic syntax. URI-producing applications
skipping to change at page 23, line 35 skipping to change at page 23, line 45
data in the path component (Section 3.3), serves to identify a data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question (if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI. or by the end of the URI.
query = *( pchar / "/" / "?" ) query = *( pchar / "/" / "?" )
The characters slash ("/") and question mark ("?") may represent data The characters slash ("/") and question mark ("?") may represent data
within the query component. Beware that some older, erroneous within the query component. Beware that some older, erroneous
implementations do not handle such URIs correctly when they are used implementations may not handle such data correctly when used as the
as the base for relative references (Section 5.1), apparently because base URI for relative references (Section 5.1), apparently because
they fail to distinguish query data from path data when looking they fail to distinguish query data from path data when looking
for hierarchical separators. However, since query components are for hierarchical separators. However, since query components are
often used to carry identifying information in the form of often used to carry identifying information in the form of
"key=value" pairs, and one frequently used value is a reference to "key=value" pairs, and one frequently used value is a reference to
another URI, it is sometimes better for usability to avoid another URI, it is sometimes better for usability to avoid
percent-encoding those characters. percent-encoding those characters.
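As a rough illustration of the query rule above, the following Python sketch uses the standard-library urllib.parse.urlsplit to show that "/" and "?" after the first question mark are treated as query data. The example.com and example.org URIs are invented for illustration and do not come from the specification.

```python
from urllib.parse import urlsplit

# Hypothetical URI whose query value is itself a URI, left unencoded.
uri = "http://example.com/redirect?target=http://example.org/a/b?x=1#top"
parts = urlsplit(uri)

# The query starts after the first "?" and ends at the first "#";
# any later "/" or "?" characters are plain query data.
path = parts.path          # "/redirect"
query = parts.query        # "target=http://example.org/a/b?x=1"
fragment = parts.fragment  # "top"
```

An implementation that searched the whole string for "/" when looking for hierarchical separators would wrongly treat "/a/b" as part of the path, which is the error the text warns about.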
3.5 Fragment 3.5 Fragment
The fragment identifier component of a URI allows indirect The fragment identifier component of a URI allows indirect
skipping to change at page 25, line 10 skipping to change at page 25, line 17
loss of information, particularly in regards to accurate redirection loss of information, particularly in regards to accurate redirection
of references as resources move over time, it also serves to prevent of references as resources move over time, it also serves to prevent
information providers from denying reference authors the right to information providers from denying reference authors the right to
selectively refer to information within a resource. Indirect selectively refer to information within a resource. Indirect
referencing also provides additional flexibility and extensibility to referencing also provides additional flexibility and extensibility to
systems that use URIs, since new media types are easier to define and systems that use URIs, since new media types are easier to define and
deploy than new schemes of identification. deploy than new schemes of identification.
The characters slash ("/") and question mark ("?") are allowed to The characters slash ("/") and question mark ("?") are allowed to
represent data within the fragment identifier. Beware that some represent data within the fragment identifier. Beware that some
older, erroneous implementations do not handle such URIs correctly older, erroneous implementations may not handle such data correctly
when they are used as the base for relative references (Section 5.1). when used as the base URI for relative references (Section 5.1).
4. Usage 4. Usage
When applications make reference to a URI, they do not always use the When applications make reference to a URI, they do not always use the
full form of reference defined by the "URI" syntax rule. In order to full form of reference defined by the "URI" syntax rule. In order to
save space and take advantage of hierarchical locality, many Internet save space and take advantage of hierarchical locality, many Internet
protocol elements and media type formats allow an abbreviation of a protocol elements and media type formats allow an abbreviation of a
URI, while others restrict the syntax to a particular form of URI. URI, while others restrict the syntax to a particular form of URI.
We define the most common forms of reference syntax in this We define the most common forms of reference syntax in this
specification because they impact and depend upon the design of the specification because they impact and depend upon the design of the
generic syntax, requiring a uniform parsing algorithm in order to be generic syntax, requiring a uniform parsing algorithm in order to be
interpreted consistently. interpreted consistently.
4.1 URI Reference 4.1 URI Reference
URI-reference is used to denote the most common usage of a resource URI-reference is used to denote the most common usage of a resource
identifier. identifier.
URI-reference = URI / relative-URI URI-reference = URI / relative-ref
A URI-reference may be relative: if the reference's prefix matches A URI-reference is either a URI or a relative reference. If the
the syntax of a scheme followed by its colon separator, then the URI-reference's prefix does not match the syntax of a scheme followed
reference is a URI rather than a relative-URI. by its colon separator, then the URI-reference is a relative
reference.
A URI-reference is typically parsed first into the five URI A URI-reference is typically parsed first into the five URI
components, in order to determine what components are present and components, in order to determine what components are present and
whether or not the reference is relative, after which each component whether or not the reference is relative, after which each component
is parsed for its subparts and their validation. The ABNF of is parsed for its subparts and their validation. The ABNF of
URI-reference, along with the "first-match-wins" disambiguation rule, URI-reference, along with the "first-match-wins" disambiguation rule,
is sufficient to define a validating parser for the generic syntax. is sufficient to define a validating parser for the generic syntax.
Readers familiar with regular expressions should see Appendix B for Readers familiar with regular expressions should see Appendix B for
an example of a non-validating URI-reference parser that will take an example of a non-validating URI-reference parser that will take
any given string and extract the URI components. any given string and extract the URI components.
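The component-splitting step can be sketched in Python with the regular expression published in Appendix B of RFC 2396, which is assumed here to be unchanged in this draft. The function names split_uri_reference and is_relative are illustrative, not part of any specification, and the parser is non-validating: it accepts any string.

```python
import re

# Non-validating expression from Appendix B (assumed unchanged from RFC 2396).
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri_reference(ref):
    """Split a reference into its five components; None means 'undefined'."""
    m = URI_REFERENCE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),      # always defined, possibly empty
        "query": m.group(7),
        "fragment": m.group(9),
    }

def is_relative(ref):
    """A reference with no scheme prefix is a relative reference."""
    return split_uri_reference(ref)["scheme"] is None
```

Note that the result distinguishes an undefined component (None) from an empty one (""), which matters later during reference resolution.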
4.2 Relative URI 4.2 Relative Reference
A relative URI reference takes advantage of the hierarchical syntax A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) in order to express a reference that is relative to (Section 1.2.3) in order to express a URI reference relative to the
the name space of another hierarchical URI. name space of another hierarchical URI.
relative-URI = relative-part [ "?" query ] [ "#" fragment ] relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty relative-part = "//" authority path-abempty
/ path-absolute / path-absolute
/ path-noscheme / path-noscheme
/ path-empty / path-empty
The URI referred to by a relative reference, also known as the target The URI referred to by a relative reference, also known as the target
URI, is obtained by applying the reference resolution algorithm of URI, is obtained by applying the reference resolution algorithm of
Section 5. Section 5.
skipping to change at page 26, line 43 skipping to change at page 26, line 43
4.3 Absolute URI 4.3 Absolute URI
Some protocol elements allow only the absolute form of a URI without Some protocol elements allow only the absolute form of a URI without
a fragment identifier. For example, defining a base URI for later a fragment identifier. For example, defining a base URI for later
use by relative references calls for an absolute-URI syntax rule that use by relative references calls for an absolute-URI syntax rule that
does not allow a fragment. does not allow a fragment.
absolute-URI = scheme ":" hier-part [ "?" query ] absolute-URI = scheme ":" hier-part [ "?" query ]
URI scheme specifications must define their own syntax such that all
strings matching their scheme-specific syntax will also match the
<absolute-URI> grammar. Scheme specifications are not responsible
for defining fragment identifier syntax or usage, regardless of its
applicability to resources identifiable via that scheme, since
fragment identification is orthogonal to scheme definition. However,
scheme specifications are encouraged to include a wide range of
examples, including examples that show use of the scheme's URIs with
fragment identifiers when such usage is appropriate.
4.4 Same-document Reference 4.4 Same-document Reference
When a URI reference refers to a URI that is, aside from its fragment When a URI reference refers to a URI that is, aside from its fragment
component (if any), identical to the base URI (Section 5.1), that component (if any), identical to the base URI (Section 5.1), that
reference is called a "same-document" reference. The most frequent reference is called a "same-document" reference. The most frequent
examples of same-document references are relative references that are examples of same-document references are relative references that are
empty or include only the number sign ("#") separator followed by a empty or include only the number sign ("#") separator followed by a
fragment identifier. fragment identifier.
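One way to test for a same-document reference is sketched below in Python, using urljoin and urldefrag from the standard-library urllib.parse module. The helper name is_same_document is hypothetical. The check is a plain string comparison of the resolved target against the base, with no further normalization, so it may report False for references that a normalizing implementation would accept.

```python
from urllib.parse import urldefrag, urljoin

def is_same_document(base, ref):
    """True if ref resolves to the base URI, ignoring any fragment."""
    target = urljoin(base, ref)
    return urldefrag(target).url == urldefrag(base).url

BASE = "http://a/b/c/d;p?q"
empty_is_same = is_same_document(BASE, "")        # empty reference
fragment_is_same = is_same_document(BASE, "#s")   # fragment-only reference
sibling_is_same = is_same_document(BASE, "g")     # resolves elsewhere
```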
When a same-document reference is dereferenced for the purpose of a When a same-document reference is dereferenced for the purpose of a
skipping to change at page 27, line 45 skipping to change at page 28, line 8
user and heuristically resolved. user and heuristically resolved.
While this practice of using suffix references is common, it should While this practice of using suffix references is common, it should
be avoided whenever possible and never used in situations where be avoided whenever possible and never used in situations where
long-term references are expected. The heuristics noted above will long-term references are expected. The heuristics noted above will
change over time, particularly when a new URI scheme becomes popular, change over time, particularly when a new URI scheme becomes popular,
and are often incorrect when used out of context. Furthermore, they and are often incorrect when used out of context. Furthermore, they
can lead to security issues along the lines of those described in can lead to security issues along the lines of those described in
[RFC1535]. [RFC1535].
Since a URI suffix has the same syntax as a relative path reference, Since a URI suffix has the same syntax as a relative-path reference,
a suffix reference cannot be used in contexts where a relative a suffix reference cannot be used in contexts where a relative
reference is expected. As a result, suffix references are limited to reference is expected. As a result, suffix references are limited to
those places where there is no defined base URI, such as dialog boxes those places where there is no defined base URI, such as dialog boxes
and off-line advertisements. and off-line advertisements.
5. Reference Resolution 5. Reference Resolution
This section defines the process of resolving a URI reference within This section defines the process of resolving a URI reference within
a context that allows relative references, such that the result is a a context that allows relative references, such that the result is a
string matching the "URI" syntax rule of Section 3. string matching the <URI> syntax rule of Section 3.
5.1 Establishing a Base URI 5.1 Establishing a Base URI
The term "relative" implies that there exists a "base URI" against The term "relative" implies that there exists a "base URI" against
which the relative reference is applied. Aside from fragment-only which the relative reference is applied. Aside from fragment-only
references (Section 4.4), relative references are only usable when a references (Section 4.4), relative references are only usable when a
base URI is known. A base URI must be established by the parser base URI is known. A base URI must be established by the parser
prior to parsing URI references that might be relative. A base URI prior to parsing URI references that might be relative. A base URI
must conform to the <absolute-URI> syntax rule (Section 4.3): if the must conform to the <absolute-URI> syntax rule (Section 4.3): if the
base URI is obtained from a URI reference, then that reference must base URI is obtained from a URI reference, then that reference must
skipping to change at page 34, line 48 skipping to change at page 34, line 48
present in the reference, and a component that is empty, meaning that present in the reference, and a component that is empty, meaning that
the separator was present and was immediately followed by the next the separator was present and was immediately followed by the next
component separator or the end of the reference. component separator or the end of the reference.
5.4 Reference Resolution Examples 5.4 Reference Resolution Examples
Within a representation with a well-defined base URI of Within a representation with a well-defined base URI of
http://a/b/c/d;p?q http://a/b/c/d;p?q
a relative URI reference is transformed to its target URI as follows. a relative reference is transformed to its target URI as follows.
5.4.1 Normal Examples 5.4.1 Normal Examples
"g:h" = "g:h" "g:h" = "g:h"
"g" = "http://a/b/c/g" "g" = "http://a/b/c/g"
"./g" = "http://a/b/c/g" "./g" = "http://a/b/c/g"
"g/" = "http://a/b/c/g/" "g/" = "http://a/b/c/g/"
"/g" = "http://a/g" "/g" = "http://a/g"
"//g" = "http://g" "//g" = "http://g"
"?y" = "http://a/b/c/d;p?y" "?y" = "http://a/b/c/d;p?y"
skipping to change at page 35, line 37 skipping to change at page 35, line 37
"../.." = "http://a/" "../.." = "http://a/"
"../../" = "http://a/" "../../" = "http://a/"
"../../g" = "http://a/g" "../../g" = "http://a/g"
5.4.2 Abnormal Examples 5.4.2 Abnormal Examples
Although the following abnormal examples are unlikely to occur in Although the following abnormal examples are unlikely to occur in
normal practice, all URI parsers should be capable of resolving them normal practice, all URI parsers should be capable of resolving them
consistently. Each example uses the same base as above. consistently. Each example uses the same base as above.
Parsers must be careful in handling cases where there are more Parsers must be careful in handling cases where there are more ".."
relative path ".." segments than there are hierarchical levels in the segments in a relative-path reference than there are hierarchical
base URI's path. Note that the ".." syntax cannot be used to change levels in the base URI's path. Note that the ".." syntax cannot be
the authority component of a URI. used to change the authority component of a URI.
"../../../g" = "http://a/g" "../../../g" = "http://a/g"
"../../../../g" = "http://a/g" "../../../../g" = "http://a/g"
Similarly, parsers must remove the dot-segments "." and ".." when Similarly, parsers must remove the dot-segments "." and ".." when
they are complete components of a path, but not when they are only they are complete components of a path, but not when they are only
part of a segment. part of a segment.
"/./g" = "http://a/g" "/./g" = "http://a/g"
"/../g" = "http://a/g" "/../g" = "http://a/g"
"g." = "http://a/b/c/g." "g." = "http://a/b/c/g."
".g" = "http://a/b/c/.g" ".g" = "http://a/b/c/.g"
"g.." = "http://a/b/c/g.." "g.." = "http://a/b/c/g.."
"..g" = "http://a/b/c/..g" "..g" = "http://a/b/c/..g"
Less likely are cases where the relative URI reference uses Less likely are cases where the relative reference uses unnecessary
unnecessary or nonsensical forms of the "." and ".." complete path or nonsensical forms of the "." and ".." complete path segments.
segments.
"./../g" = "http://a/b/g" "./../g" = "http://a/b/g"
"./g/." = "http://a/b/c/g/" "./g/." = "http://a/b/c/g/"
"g/./h" = "http://a/b/c/g/h" "g/./h" = "http://a/b/c/g/h"
"g/../h" = "http://a/b/c/h" "g/../h" = "http://a/b/c/h"
"g;x=1/./y" = "http://a/b/c/g;x=1/y" "g;x=1/./y" = "http://a/b/c/g;x=1/y"
"g;x=1/../y" = "http://a/b/c/y" "g;x=1/../y" = "http://a/b/c/y"
Some applications fail to separate the reference's query and/or Some applications fail to separate the reference's query and/or
fragment components from a relative path before merging it with the fragment components from the path component before merging it with
base path and removing dot-segments. This error is rarely noticed, the base path and removing dot-segments. This error is rarely
since typical usage of a fragment never includes the hierarchy ("/") noticed, since typical usage of a fragment never includes the
character, and the query component is not normally used within hierarchy ("/") character, and the query component is not normally
relative references. used within relative references.
"g?y/./x" = "http://a/b/c/g?y/./x" "g?y/./x" = "http://a/b/c/g?y/./x"
"g?y/../x" = "http://a/b/c/g?y/../x" "g?y/../x" = "http://a/b/c/g?y/../x"
"g#s/./x" = "http://a/b/c/g#s/./x" "g#s/./x" = "http://a/b/c/g#s/./x"
"g#s/../x" = "http://a/b/c/g#s/../x" "g#s/../x" = "http://a/b/c/g#s/../x"
Some parsers allow the scheme name to be present in a relative URI Some parsers allow the scheme name to be present in a relative
reference if it is the same as the base URI scheme. This is reference if it is the same as the base URI scheme. This is
considered to be a loophole in prior specifications of partial URI considered to be a loophole in prior specifications of partial URI
[RFC1630]. Its use should be avoided, but is allowed for backward [RFC1630]. Its use should be avoided, but is allowed for backward
compatibility. compatibility.
"http:g" = "http:g" ; for strict parsers "http:g" = "http:g" ; for strict parsers
/ "http://a/b/c/g" ; for backward compatibility / "http://a/b/c/g" ; for backward compatibility
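The examples above can be used as a small conformance check for a resolver. The Python sketch below runs a selection of them through urllib.parse.urljoin. It assumes a reasonably recent Python 3, since older versions of urljoin handled excess ".." segments differently. The names BASE, EXAMPLES, and failing_examples are illustrative only.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# Reference -> expected target URI, copied from the examples in this section.
EXAMPLES = {
    "g:h": "g:h",
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "//g": "http://g",
    "?y": "http://a/b/c/d;p?y",
    "../..": "http://a/",
    "../../g": "http://a/g",
    "../../../g": "http://a/g",
    "/./g": "http://a/g",
    "/../g": "http://a/g",
    "g.": "http://a/b/c/g.",
    "..g": "http://a/b/c/..g",
    "./g/.": "http://a/b/c/g/",
    "g;x=1/../y": "http://a/b/c/y",
    "g?y/../x": "http://a/b/c/g?y/../x",
    "g#s/../x": "http://a/b/c/g#s/../x",
}

def failing_examples(resolve=urljoin):
    """Return the references for which the resolver disagrees with the table."""
    return [ref for ref, want in EXAMPLES.items() if resolve(BASE, ref) != want]
```

The "http:g" case is left out of the table because both results are permitted; urljoin is generally observed to take the backward-compatible interpretation.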
6. Normalization and Comparison 6. Normalization and Comparison
One of the most common operations on URIs is simple comparison: One of the most common operations on URIs is simple comparison:
determining if two URIs are equivalent without using the URIs to determining if two URIs are equivalent without using the URIs to
access their respective resource(s). A comparison is performed every access their respective resource(s). A comparison is performed every
time a response cache is accessed, a browser checks its history to time a response cache is accessed, a browser checks its history to
color a link, or an XML parser processes tags within a namespace. color a link, or an XML parser processes tags within a namespace.
Extensive normalization prior to comparison of URIs is often used by Extensive normalization prior to comparison of URIs is often used by
spiders and indexing engines to prune a search space or reduce spiders and indexing engines to prune a search space or reduce
duplication of request actions and response storage. duplication of request actions and response storage.
URI comparison is performed in respect to some particular purpose, URI comparison is performed in respect to some particular purpose,
and software with differing purposes will often be subject to and implementations with differing purposes will often be subject to
differing design trade-offs in regards to how much effort should be differing design trade-offs in regards to how much effort should be
spent in reducing duplicate identifiers. This section describes a spent in reducing aliased identifiers. This section describes a
variety of methods that may be used to compare URIs, the trade-offs variety of methods that may be used to compare URIs, the trade-offs
between them, and the types of applications that might use them. A between them, and the types of applications that might use them.
canonical form for URI references is defined to reduce the occurrence
of false negative comparisons.
6.1 Equivalence 6.1 Equivalence
Since URIs exist to identify resources, presumably they should be Since URIs exist to identify resources, presumably they should be
considered equivalent when they identify the same resource. However, considered equivalent when they identify the same resource. However,
such a definition of equivalence is not of much practical use, since such a definition of equivalence is not of much practical use, since
there is no way for software to compare two resources without there is no way for an implementation to compare two resources that
knowledge of the implementation-specific syntax of each URI's are not under its own control. For this reason, determination of
dereferencing algorithm. For this reason, determination of
equivalence or difference of URIs is based on string comparison, equivalence or difference of URIs is based on string comparison,
perhaps augmented by reference to additional rules provided by URI perhaps augmented by reference to additional rules provided by URI
scheme definitions. We use the terms "different" and "equivalent" to scheme definitions. We use the terms "different" and "equivalent" to
describe the possible outcomes of such comparisons, but there are describe the possible outcomes of such comparisons, but there are
many application-dependent versions of equivalence. many application-dependent versions of equivalence.
Even though it is possible to determine that two URIs are equivalent, Even though it is possible to determine that two URIs are equivalent,
it is never possible to be sure that two URIs identify different URI comparison is not sufficient to determine if two URIs identify
resources. For example, an owner of two different domain names could different resources. For example, an owner of two different domain
decide to serve the same resource from both, resulting in two names could decide to serve the same resource from both, resulting in
different URIs. Therefore, comparison methods are designed to two different URIs. Therefore, comparison methods are designed to
minimize false negatives while strictly avoiding false positives. minimize false negatives while strictly avoiding false positives.
In testing for equivalence, applications should not directly compare In testing for equivalence, applications should not directly compare
relative URI references; the references should be converted to their relative references; the references should be converted to their
target URI forms before comparison. When URIs are being compared for respective target URIs before comparison. When URIs are being
the purpose of selecting (or avoiding) a network action, such as compared for the purpose of selecting (or avoiding) a network action,
retrieval of a representation, the fragment components (if any) such as retrieval of a representation, fragment components (if any)
should be excluded from the comparison. should be excluded from the comparison.
6.2 Comparison Ladder 6.2 Comparison Ladder
A variety of methods are used in practice to test URI equivalence. A variety of methods are used in practice to test URI equivalence.
These methods fall into a range, distinguished by the amount of These methods fall into a range, distinguished by the amount of
processing required and the degree to which the probability of false processing required and the degree to which the probability of false
negatives is reduced. As noted above, false negatives cannot in negatives is reduced. As noted above, false negatives cannot be
principle be eliminated. In practice, their probability can be eliminated. In practice, their probability can be reduced, but this
reduced, but this reduction requires more processing and is not reduction requires more processing and is not cost-effective for all
cost-effective for all applications. applications.
If this range of comparison practices is considered as a ladder, the If this range of comparison practices is considered as a ladder, the
following discussion will climb the ladder, starting with those following discussion will climb the ladder, starting with those
practices that are cheap but have a relatively higher chance of practices that are cheap but have a relatively higher chance of
producing false negatives, and proceeding to those that have higher producing false negatives, and proceeding to those that have higher
computational cost and lower risk of false negatives. computational cost and lower risk of false negatives.
6.2.1 Simple String Comparison 6.2.1 Simple String Comparison
If two URIs, considered as character strings, are identical, then it If two URIs, considered as character strings, are identical, then it
skipping to change at page 38, line 38 skipping to change at page 38, line 31
Such character comparisons require that each pair of characters be Such character comparisons require that each pair of characters be
put in comparable form. For example, should one URI be stored in a put in comparable form. For example, should one URI be stored in a
byte array in EBCDIC encoding, and the second be in a Java String byte array in EBCDIC encoding, and the second be in a Java String
object (UTF-16), bit-for-bit comparisons applied naively will produce object (UTF-16), bit-for-bit comparisons applied naively will produce
errors. It is better to speak of equality on a errors. It is better to speak of equality on a
character-for-character rather than byte-for-byte or bit-for-bit character-for-character rather than byte-for-byte or bit-for-bit
basis. In practical terms, character-by-character comparisons should basis. In practical terms, character-by-character comparisons should
be done codepoint-by-codepoint after conversion to a common character be done codepoint-by-codepoint after conversion to a common character
encoding. encoding.
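The point about comparable form can be illustrated with a short Python sketch. It assumes the "cp037" codec as a stand-in for EBCDIC and "utf-16-le" for the in-memory form of a Java String; the helper name same_characters and the example URI are hypothetical.

```python
uri = "http://example.com/a"

# The same characters stored in two different encodings.
ebcdic_bytes = uri.encode("cp037")     # one EBCDIC code page
utf16_bytes = uri.encode("utf-16-le")  # UTF-16 without a byte order mark

# A naive byte-for-byte comparison reports a difference.
bytes_equal = ebcdic_bytes == utf16_bytes

def same_characters(raw_a, enc_a, raw_b, enc_b):
    """Decode both inputs, then compare codepoint by codepoint."""
    return raw_a.decode(enc_a) == raw_b.decode(enc_b)

chars_equal = same_characters(ebcdic_bytes, "cp037", utf16_bytes, "utf-16-le")
```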
False negatives are caused by the production and use of URI aliases.
Unnecessary aliases can be reduced, regardless of the comparison
method, by consistently providing URI references in an
already-normalized form (i.e., a form identical to what would be
produced after normalization is applied, as described below).
Protocols and data formats often choose to limit some URI comparisons
to simple string comparison, based on the theory that people and
implementations will, in their own best interest, be consistent in
providing URI references, or at least consistent enough to negate any
efficiency that might be obtained from further normalization.
6.2.2 Syntax-based Normalization 6.2.2 Syntax-based Normalization
Software may use logic based on the definitions provided by this Implementations may use logic based on the definitions provided by
specification to reduce the probability of false negatives. Such this specification to reduce the probability of false negatives.
processing is moderately higher in cost than character-for-character Such processing is moderately higher in cost than
string comparison. For example, an application using this approach character-for-character string comparison. For example, an
could reasonably consider the following two URIs equivalent: application using this approach could reasonably consider the
following two URIs equivalent:
example://a/b/c/%7Bfoo%7D example://a/b/c/%7Bfoo%7D
eXAMPLE://a/./b/../b/%63/%7bfoo%7d eXAMPLE://a/./b/../b/%63/%7bfoo%7d
Web user agents, such as browsers, typically apply this type of URI Web user agents, such as browsers, typically apply this type of URI
normalization when determining whether a cached response is normalization when determining whether a cached response is
available. Syntax-based normalization includes such techniques as available. Syntax-based normalization includes such techniques as
case normalization, percent-encoding normalization, and removal of case normalization, percent-encoding normalization, and removal of
dot-segments. dot-segments.
6.2.2.1 Case Normalization 6.2.2.1 Case Normalization
When a URI scheme uses components of the generic syntax, it will also For all URIs, the hexadecimal digits within a percent-encoding
use the common syntax equivalence rules, namely that the scheme and triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore
should be normalized to use uppercase letters for the digits A-F.
When a URI uses components of the generic syntax, the component
syntax equivalence rules always apply; namely, that the scheme and
host are case-insensitive and therefore should be normalized to host are case-insensitive and therefore should be normalized to
lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is
equivalent to <http://www.example.com/>. Applications should not equivalent to <http://www.example.com/>. The other generic syntax
assume anything about the case sensitivity of other URI components, components are assumed to be case-sensitive unless specifically
since that is dependent on the implementation used to handle a defined otherwise by the scheme (see Section 6.2.3).
dereference.
The hexadecimal digits within a percent-encoding triplet (e.g., "%3a"
versus "%3A") are case-insensitive and therefore should be normalized
to use uppercase letters for the digits A-F.
6.2.2.2 Percent-Encoding Normalization 6.2.2.2 Percent-Encoding Normalization
The percent-encoding mechanism (Section 2.1) is a frequent source of The percent-encoding mechanism (Section 2.1) is a frequent source of
variance among otherwise identical URIs. In addition to the variance among otherwise identical URIs. In addition to the case
case-insensitivity issue noted above, some URI producers normalization issue noted above, some URI producers percent-encode
percent-encode octets that do not require percent-encoding, resulting octets that do not require percent-encoding, resulting in URIs that
in URIs that are equivalent to their non-encoded counterparts. Such are equivalent to their non-encoded counterparts. Such URIs should
URIs should be normalized by decoding any percent-encoded octet that be normalized by decoding any percent-encoded octet that corresponds
corresponds to an unreserved character, as described in Section 2.3. to an unreserved character, as described in Section 2.3.
6.2.2.3 Path Segment Normalization 6.2.2.3 Path Segment Normalization
The complete path segments "." and ".." have a special meaning within The complete path segments "." and ".." are intended only for use
hierarchical URI schemes. As such, they should not appear in within relative references (Section 4.1) and are removed as part of
absolute paths; if they are found, they can be removed by applying the reference resolution process (Section 5.2). However, some
the remove_dot_segments algorithm to the path, as described in deployed implementations incorrectly assume that reference resolution
Section 5.2. is not necessary when the reference is already a URI, and thus fail
to remove dot-segments when they occur in non-relative paths. URI
normalizers should remove dot-segments by applying the
remove_dot_segments algorithm to the path, as described in
Section 5.2.4.
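The three syntax-based steps can be combined into a single normalizer. The Python sketch below is a minimal illustration under stated assumptions, not a reference implementation. It splits the URI with the Appendix B regular expression (assumed unchanged from RFC 2396), follows the remove_dot_segments procedure as it appears in the published generic syntax, and removes dot-segments only when a scheme is present. All function names are illustrative.

```python
import re

URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def normalize_percent(text):
    """Decode triplets for unreserved characters; uppercase the hex of the rest."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)

def remove_dot_segments(path):
    """Remove complete "." and ".." segments from a path."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            cut = path.find("/", 1)
            cut = len(path) if cut == -1 else cut
            out.append(path[:cut])
            path = path[cut:]
    return "".join(out)

def normalize_authority(authority):
    """Lowercase the host and port; leave userinfo case alone."""
    userinfo, at, hostport = authority.rpartition("@")
    hostport = normalize_percent(hostport).lower()
    hostport = re.sub(r"%[0-9a-f]{2}", lambda m: m.group(0).upper(), hostport)
    return normalize_percent(userinfo) + at + hostport

def normalize(uri):
    m = URI_RE.match(uri)
    scheme, authority, path = m.group(2), m.group(4), normalize_percent(m.group(5))
    query, fragment = m.group(7), m.group(9)
    out = ""
    if scheme is not None:
        out += scheme.lower() + ":"
        path = remove_dot_segments(path)  # only for URIs, not relative references
    if authority is not None:
        out += "//" + normalize_authority(authority)
    out += path
    if query is not None:
        out += "?" + normalize_percent(query)
    if fragment is not None:
        out += "#" + normalize_percent(fragment)
    return out
```

With this sketch, the two "example" URIs shown in Section 6.2.2 normalize to the same string, so a simple string comparison of the results reports them as equivalent.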
6.2.3 Scheme-based Normalization 6.2.3 Scheme-based Normalization
The syntax and semantics of URIs vary from scheme to scheme, as The syntax and semantics of URIs vary from scheme to scheme, as
described by the defining specification for each scheme. Software described by the defining specification for each scheme.
may use scheme-specific rules, at further processing cost, to reduce Implementations may use scheme-specific rules, at further processing
the probability of false negatives. For example, since the "http" cost, to reduce the probability of false negatives. For example,
scheme makes use of an authority component, has a default port of since the "http" scheme makes use of an authority component, has a
"80", and defines an empty path to be equivalent to "/", the default port of "80", and defines an empty path to be equivalent to
following four URIs are equivalent: "/", the following four URIs are equivalent:
http://example.com http://example.com
http://example.com/ http://example.com/
http://example.com:/ http://example.com:/
http://example.com:80/ http://example.com:80/
In general, a URI that uses the generic syntax for authority with an In general, a URI that uses the generic syntax for authority with an
empty path should be normalized to a path of "/"; likewise, an empty path should be normalized to a path of "/"; likewise, an
explicit ":port", where the port is empty or the default for the explicit ":port", where the port is empty or the default for the
scheme, is equivalent to one where the port and its ":" delimiter are scheme, is equivalent to one where the port and its ":" delimiter are
elided. In other words, the second of the above URI examples is the elided, and thus should be removed by scheme-based normalization.
normal form for the "http" scheme. For example, the second URI above is the normal form for the "http"
scheme.
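A scheme-based normalizer for "http" might look like the following Python sketch. It covers only the two rules named above (empty path and empty or default port), assumes the Appendix B regular expression for splitting, and leaves every other URI untouched. The names normalize_http and HOSTPORT_RE are hypothetical.

```python
import re

URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
# Splits "[userinfo@]host" from an optional trailing ":port".
HOSTPORT_RE = re.compile(r"^(.*?)(?::(\d*))?$")

def normalize_http(uri):
    m = URI_RE.match(uri)
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    query, fragment = m.group(7), m.group(9)
    if scheme is None or scheme.lower() != "http" or authority is None:
        return uri  # not an "http" URI with an authority; leave unchanged
    rest, port = HOSTPORT_RE.match(authority).groups()
    if port and int(port) != 80:
        rest += ":" + port          # keep only a non-default port
    result = "http://" + rest + (path or "/")
    # Delimiters of present-but-empty components are preserved.
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result

FOUR = [
    "http://example.com",
    "http://example.com/",
    "http://example.com:/",
    "http://example.com:80/",
]
normalized = {normalize_http(u) for u in FOUR}
```

Because the sketch distinguishes an undefined query from an empty one, a URI ending in "?" keeps its delimiter and is not collapsed into the normal form.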
Another case where normalization varies by scheme is in the handling Another case where normalization varies by scheme is in the handling
of an empty authority component or empty host subcomponent. For many of an empty authority component or empty host subcomponent. For many
scheme specifications, an empty authority or host is considered an scheme specifications, an empty authority or host is considered an
error; for others, it is considered equivalent to "localhost" or the error; for others, it is considered equivalent to "localhost" or the
end-user's host. When a scheme defines a default for authority and a end-user's host. When a scheme defines a default for authority and a
URI reference to that default is desired, the reference should have URI reference to that default is desired, the reference should be
an empty authority for the sake of uniformity, brevity, and normalized to an empty authority for the sake of uniformity, brevity,
internationalization. If, however, either the userinfo or port and internationalization. If, however, either the userinfo or port
subcomponent is non-empty, then the host should be given explicitly subcomponent is non-empty, then the host should be given explicitly
even if it matches the default. even if it matches the default.
Normalization should not remove delimiters when their associated
component is empty unless licensed to do so by the scheme
specification. For example, the URI "http://example.com/?" cannot be
assumed to be equivalent to any of the examples above. Likewise, the
presence or absence of delimiters within a userinfo subcomponent is
usually significant to its interpretation. The fragment component is
not subject to any scheme-based normalization; thus, two URIs that
differ only by the suffix "#" are considered different regardless of
the scheme.
Some schemes define additional subcomponents that consist of
case-insensitive data, giving an implicit license to normalizers to
convert such data to a common case (e.g., all lowercase). For
example, URI schemes that define a subcomponent of path to contain an
Internet hostname, such as the "mailto" URI scheme, cause that
subcomponent to be case-insensitive and thus subject to case
normalization (e.g., "mailto:Joe@Example.COM" is equivalent to
"mailto:Joe@example.com" even though the generic syntax considers the
path component to be case-sensitive).
Other scheme-specific normalizations are possible.
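The "mailto" case normalization described above might look like the following Python sketch. The function name normalize_mailto is hypothetical, and the code assumes a single address in the path; it is not a full parser for the "mailto" scheme. Only the hostname portion after the last "@" is lowercased; the local part is left as given.

```python
def normalize_mailto(uri: str) -> str:
    """Lowercase the hostname subcomponent of a single-address mailto URI."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "mailto":
        return uri

    # Set the fragment and query aside so only the path is examined.
    rest, hash_, frag = rest.partition("#")
    path, qmark, query = rest.partition("?")

    # Assumption: one address; the hostname follows the last "@".
    local, at, domain = path.rpartition("@")
    addr = f"{local}@{domain.lower()}" if at else path

    return f"mailto:{addr}{qmark}{query}{hash_}{frag}"
```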
6.2.4 Protocol-based Normalization 6.2.4 Protocol-based Normalization
Web spiders, for which substantial effort to reduce the incidence of Web spiders, for which substantial effort to reduce the incidence of
false negatives is often cost-effective, are observed to implement false negatives is often cost-effective, are observed to implement
even more aggressive techniques in URI comparison. For example, if even more aggressive techniques in URI comparison. For example, if
they observe that a URI such as they observe that a URI such as
http://example.com/data http://example.com/data
redirects to a URI differing only in the trailing slash redirects to a URI differing only in the trailing slash
http://example.com/data/ http://example.com/data/
they will likely regard the two as equivalent in the future. This they will likely regard the two as equivalent in the future. This
kind of technique is only appropriate when equivalence is clearly kind of technique is only appropriate when equivalence is clearly
indicated by both the result of accessing the resources and the indicated by both the result of accessing the resources and the
common conventions of their scheme's dereference algorithm (in this common conventions of their scheme's dereference algorithm (in this
case, use of redirection by HTTP origin servers to avoid problems case, use of redirection by HTTP origin servers to avoid problems
with relative references). with relative references).
6.3 Canonical Form
It is in the best interests of everyone concerned to avoid
false-negatives in comparing URIs and to minimize the amount of
software processing for such comparisons. Those who produce and make
reference to URIs can reduce the cost of processing and the risk of
false negatives by consistently providing them in a form that is
reasonably canonical with respect to their scheme. Specifically:
o Always provide the URI scheme in lowercase characters.
o Always provide the host, if any, in lowercase characters.
o Only perform percent-encoding where it is essential.
o Always use uppercase A-through-F characters when percent-encoding.
o Prevent dot-segments appearing in non-relative URI paths.
o For schemes that define a default authority, use an empty
authority if the default is desired.
o For schemes that define an empty path to be equivalent to a path
of "/", use "/".
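The scheme-independent items in the list above can be checked mechanically. The Python sketch below reports which recommendations a given URI does not follow. The function name, the issue labels, and the UNRESERVED set (letters, digits, "-", ".", "_", "~") are assumptions made for illustration; the two scheme-dependent items (default authority and empty path) are deliberately left out because they need per-scheme knowledge.

```python
import re
import string

URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")

# Assumption: characters that never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def canonical_form_issues(uri: str) -> list:
    """Return labels for the canonical-form recommendations a URI violates."""
    issues = []

    def add(label):
        if label not in issues:
            issues.append(label)

    m = URI_RE.match(uri)
    scheme, authority, path = m.group(2), m.group(4), m.group(5)

    if scheme and scheme != scheme.lower():
        add("scheme-not-lowercase")

    if authority is not None:
        hostport = authority.rpartition("@")[2]
        has_port = ":" in hostport and not hostport.endswith("]")
        host = hostport.rpartition(":")[0] if has_port else hostport
        # Ignore percent-encoded triplets, whose hex digits should be uppercase.
        bare = PCT_RE.sub("", host)
        if bare != bare.lower():
            add("host-not-lowercase")

    for hexpair in PCT_RE.findall(uri):
        if hexpair != hexpair.upper():
            add("lowercase-hex")
        if chr(int(hexpair, 16)) in UNRESERVED:
            add("unneeded-percent-encoding")

    # Dot-segments are only flagged for URIs that carry a scheme.
    if scheme and any(seg in (".", "..") for seg in path.split("/")):
        add("dot-segment")

    return issues
```

An empty result means none of the checked recommendations were violated; it does not show that the URI is valid or fully canonical for its scheme.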
7. Security Considerations 7. Security Considerations
A URI does not in itself pose a security threat. However, since URIs A URI does not in itself pose a security threat. However, since URIs
are often used to provide a compact set of instructions for access to are often used to provide a compact set of instructions for access to
network resources, care must be taken to properly interpret the data network resources, care must be taken to properly interpret the data
within a URI, to prevent that data from causing unintended access, within a URI, to prevent that data from causing unintended access,
and to avoid including data that should not be revealed in plain and to avoid including data that should not be revealed in plain
text. text.
7.1 Reliability and Consistency 7.1 Reliability and Consistency
skipping to change at page 44, line 42 skipping to change at page 45, line 9
impact of such attacks by distinguishing the various components of impact of such attacks by distinguishing the various components of
the URI when rendered, such as by using a different color or tone to the URI when rendered, such as by using a different color or tone to
render userinfo if any is present, though there is no general render userinfo if any is present, though there is no general
panacea. More information on URI-based semantic attacks can be found panacea. More information on URI-based semantic attacks can be found
in [Siedzik]. in [Siedzik].
8. IANA Considerations 8. IANA Considerations
URI scheme names, as defined by <scheme> in Section 3.1, form a URI scheme names, as defined by <scheme> in Section 3.1, form a
registered name space that is managed by IANA according to the registered name space that is managed by IANA according to the
procedures defined in [BCP35]. procedures defined in [BCP35]. No IANA actions are required by this
document.
9. Acknowledgments 9. Acknowledgments
This specification is derived from RFC 2396 [RFC2396], RFC 1808 This specification is derived from RFC 2396 [RFC2396], RFC 1808
[RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those
documents still apply. It also incorporates the update (with documents still apply. It also incorporates the update (with
corrections) for IPv6 literals in the host syntax, as defined by corrections) for IPv6 literals in the host syntax, as defined by
Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in
[RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz, [RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz,
Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll, Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll,
skipping to change at page 47, line 12 skipping to change at page 47, line 48
(URNs): Clarifications and Recommendations", RFC 3305, (URNs): Clarifications and Recommendations", RFC 3305,
August 2002. August 2002.
[RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003. RFC 3490, March 2003.
[RFC3513] Hinden, R. and S. Deering, "Internet Protocol Version 6 [RFC3513] Hinden, R. and S. Deering, "Internet Protocol Version 6
(IPv6) Addressing Architecture", RFC 3513, April 2003. (IPv6) Addressing Architecture", RFC 3513, April 2003.
[Siedzik] Siedzik, R., "Semantic Attacks: What's in a URL?", April [Siedzik] Siedzik, R., "Semantic Attacks: What's in a URL?",
2001, <http://www.giac.org/practical/gsec/ April 2001, <http://www.giac.org/practical/gsec/
Richard_Siedzik_GSEC.pdf>. Richard_Siedzik_GSEC.pdf>.
Authors' Addresses Authors' Addresses
Tim Berners-Lee Tim Berners-Lee
World Wide Web Consortium World Wide Web Consortium
Massachusetts Institute of Technology Massachusetts Institute of Technology
77 Massachusetts Avenue 77 Massachusetts Avenue
Cambridge, MA 02139 Cambridge, MA 02139
USA USA
skipping to change at page 48, line 14 skipping to change at page 49, line 14
Appendix A. Collected ABNF for URI Appendix A. Collected ABNF for URI
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty hier-part = "//" authority path-abempty
/ path-absolute / path-absolute
/ path-rootless / path-rootless
/ path-empty / path-empty
URI-reference = URI / relative-URI URI-reference = URI / relative-ref
absolute-URI = scheme ":" hier-part [ "?" query ] absolute-URI = scheme ":" hier-part [ "?" query ]
relative-URI = relative-part [ "?" query ] [ "#" fragment ] relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty relative-part = "//" authority path-abempty
/ path-absolute / path-absolute
/ path-noscheme / path-noscheme
/ path-empty / path-empty
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
authority = [ userinfo "@" ] host [ ":" port ] authority = [ userinfo "@" ] host [ ":" port ]
userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
skipping to change at page 51, line 11 skipping to change at page 52, line 16
URIs are often transmitted through formats that do not provide a URIs are often transmitted through formats that do not provide a
clear context for their interpretation. For example, there are many clear context for their interpretation. For example, there are many
occasions when a URI is included in plain text; examples include text occasions when a URI is included in plain text; examples include text
sent in electronic mail, USENET news messages, and, most importantly, sent in electronic mail, USENET news messages, and, most importantly,
printed on paper. In such cases, it is important to be able to printed on paper. In such cases, it is important to be able to
delimit the URI from the rest of the text, and in particular from delimit the URI from the rest of the text, and in particular from
punctuation marks that might be mistaken for part of the URI. punctuation marks that might be mistaken for part of the URI.
In practice, URIs are delimited in a variety of ways, but usually In practice, URIs are delimited in a variety of ways, but usually
within double-quotes "http://example.com/", angle brackets <http:// within double-quotes "http://example.com/", angle brackets
example.com/>, or just using whitespace <http://example.com/>, or just using whitespace
http://example.com/ http://example.com/
These wrappers do not form part of the URI. These wrappers do not form part of the URI.
In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may
need to be added to break a long URI across lines. The whitespace need to be added to break a long URI across lines. The whitespace
should be ignored when extracting the URI. should be ignored when extracting the URI.
No whitespace should be introduced after a hyphen ("-") character. No whitespace should be introduced after a hyphen ("-") character.
skipping to change at page 52, line 5 skipping to change at page 53, line 18
but you can probably pick it up from <ftp://foo.example. but you can probably pick it up from <ftp://foo.example.
com/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/ com/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/
ietf/uri/historical.html#WARNING>. ietf/uri/historical.html#WARNING>.
contains the URI references contains the URI references
http://www.w3.org/Addressing/ http://www.w3.org/Addressing/
ftp://foo.example.com/rfc/ ftp://foo.example.com/rfc/
http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING
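As a rough illustration of the extraction described in this appendix, the Python sketch below pulls angle-bracket-delimited references out of plain text and discards any whitespace introduced by line breaks. The function name is hypothetical, and the sketch handles only the angle-bracket convention, not double-quotes or whitespace delimiting.

```python
import re


def extract_bracketed_uris(text: str) -> list:
    """Return the contents of <...> wrappers with embedded whitespace removed."""
    # The wrappers themselves are not part of the URI, so only the
    # bracketed contents are captured.
    candidates = re.findall(r"<([^<>]*)>", text)
    # Whitespace added to break a long URI across lines is ignored.
    return [re.sub(r"\s+", "", c) for c in candidates]
```

Applied to text that breaks "ftp://foo.example.com/rfc/" across two lines inside angle brackets, the function returns the reference as a single string.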
Appendix D. Summary of Non-editorial Changes Appendix D. Changes from RFC 2396
D.1 Additions D.1 Additions
An ABNF rule for URI has been introduced to correspond to one common
usage of the term: an absolute URI with optional fragment.
IPv6 (and later) literals have been added to the list of possible IPv6 (and later) literals have been added to the list of possible
identifiers for the host portion of a authority component, as identifiers for the host portion of an authority component, as
described by [RFC2732], with the addition of "[" and "]" to the described by [RFC2732], with the addition of "[" and "]" to the
reserved set and a version flag to anticipate future versions of IP reserved set and a version flag to anticipate future versions of IP
literals. Square brackets are now specified as reserved within the literals. Square brackets are now specified as reserved within the
authority component and not allowed outside their use as delimiters authority component and not allowed outside their use as delimiters
for an IP literal within host. In order to make this change without for an IP literal within host. In order to make this change without
changing the technical definition of the path, query, and fragment changing the technical definition of the path, query, and fragment
components, those rules were redefined to directly specify the components, those rules were redefined to directly specify the
characters allowed rather than be defined in terms of uric. characters allowed.
Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal
address, which unfortunately lacks an ABNF description of address, which unfortunately lacks an ABNF description of
IPv6address, we created a new ABNF rule for IPv6address that matches IPv6address, we created a new ABNF rule for IPv6address that matches
the text representations defined by Section 2.2 of [RFC3513]. the text representations defined by Section 2.2 of [RFC3513].
Likewise, the definition of IPv4address has been improved in order to Likewise, the definition of IPv4address has been improved in order to
limit each decimal octet to the range 0-255. limit each decimal octet to the range 0-255.
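The effect of limiting each decimal octet to 0-255 can be seen in a small Python sketch. The pattern below is one way to express that range as a regular expression; it is an illustration written for this note (the names DEC_OCTET and is_ipv4address are assumptions), not the ABNF of the specification. It also rejects leading zeros such as "01".

```python
import re

# One decimal octet in the range 0-255, without leading zeros.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# Four octets separated by dots.
IPV4_RE = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def is_ipv4address(text: str) -> bool:
    """Return True if text is a dotted-decimal address with octets 0-255."""
    return IPV4_RE.fullmatch(text) is not None
```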
Section 6 on URI normalization and comparison has been Section 6 on URI normalization and comparison has been
completely rewritten and extended using input from Tim Bray and completely rewritten and extended using input from Tim Bray and
discussion within the W3C Technical Architecture Group. discussion within the W3C Technical Architecture Group.
An ABNF rule for URI has been introduced to correspond to the common D.2 Modifications
usage of the term: an absolute URI with optional fragment.
D.2 Modifications from RFC 2396 The ad-hoc BNF syntax of RFC 2396 has been replaced with the ABNF of
[RFC2234]. This change required all rule names that formerly
included underscore characters to be renamed with a dash instead. In
addition, a number of syntax rules have been eliminated or simplified
to make the overall grammar more comprehensible. Specifications that
refer to the obsolete grammar rules may be understood by replacing
those rules according to the following table:
The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234]. +----------------+--------------------------------------------------+
This change required all rule names that formerly included underscore | obsolete rule | translation |
characters to be renamed with a dash instead. +----------------+--------------------------------------------------+
| absoluteURI | absolute-URI |
| relativeURI | relative-part [ "?" query ] |
| hier_part | ( "//" authority path-abempty / |
| | path-absolute ) [ "?" query ] |
| | |
| opaque_part | path-rootless [ "?" query ] |
| net_path | "//" authority path-abempty |
| abs_path | path-absolute |
| rel_path | path-rootless |
| rel_segment | segment-nz-nc |
| reg_name | reg-name |
| server | authority |
| hostport | host [ ":" port ] |
| hostname | reg-name |
| path_segments | path-abempty |
| param | *<pchar excluding ";"> |
| | |
| uric | unreserved / pct-encoded / ";" / "?" / ":" |
| | / "@" / "&" / "=" / "+" / "$" / "," / "/" |
| | |
| uric_no_slash | unreserved / pct-encoded / ";" / "?" / ":" |
| | / "@" / "&" / "=" / "+" / "$" / "," |
| | |
| mark | "-" / "_" / "." / "!" / "~" / "*" / "'" |
| | / "(" / ")" |
| | |
| escaped | pct-encoded |
| hex | HEXDIG |
| alphanum | ALPHA / DIGIT |
+----------------+--------------------------------------------------+
Use of the above obsolete rules for the definition of scheme-specific
syntax is deprecated.
Section 2 on characters has been rewritten to explain what characters Section 2 on characters has been rewritten to explain what characters
are reserved, when they are reserved, and why they are reserved even are reserved, when they are reserved, and why they are reserved even
when not used as delimiters by the generic syntax. The mark when not used as delimiters by the generic syntax. The mark
characters that are typically unsafe to decode, including the characters that are typically unsafe to decode, including the
exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open
and close parentheses ("(" and ")"), have been moved to the reserved and close parentheses ("(" and ")"), have been moved to the reserved
set in order to clarify the distinction between reserved and set in order to clarify the distinction between reserved and
unreserved and hopefully answer the most common question of scheme unreserved and hopefully answer the most common question of scheme
designers. Likewise, the section on percent-encoded characters has designers. Likewise, the section on percent-encoded characters has
skipping to change at page 53, line 11 skipping to change at page 55, line 26
In general, the terms "escaped" and "unescaped" have been replaced In general, the terms "escaped" and "unescaped" have been replaced
with "percent-encoded" and "decoded", respectively, to reduce with "percent-encoded" and "decoded", respectively, to reduce
confusion with other forms of escape mechanisms. confusion with other forms of escape mechanisms.
The ABNF for URI and URI-reference has been redesigned to make them The ABNF for URI and URI-reference has been redesigned to make them
more friendly to LALR parsers and reduce complexity. As a result, more friendly to LALR parsers and reduce complexity. As a result,
the layout form of syntax description has been removed, along with the layout form of syntax description has been removed, along with
the uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path, the uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path,
path_segments, rel_segment, and mark rules. All references to path_segments, rel_segment, and mark rules. All references to
"opaque" URIs have been replaced with a better description of how the "opaque" URIs have been replaced with a better description of how the
path component may be opaque to hierarchy. The ambiguity regarding path component may be opaque to hierarchy. The relativeURI rule has
the parsing of URI-reference as a URI or a relative-URI with a colon been replaced with relative-ref to avoid unnecessary confusion over
in the first segment has been eliminated through the use of five whether or not they are a subset of URI. The ambiguity regarding the
parsing of URI-reference as a URI or a relative-ref with a colon in
the first segment has been eliminated through the use of five
separate path matching rules. separate path matching rules.
The fragment identifier has been moved back into the section on The fragment identifier has been moved back into the section on
generic syntax components and within the URI and relative-URI rules, generic syntax components and within the URI and relative-ref rules,
though it remains excluded from absolute-URI. The number sign ("#") though it remains excluded from absolute-URI. The number sign ("#")
character has been moved back to the reserved set as a result of character has been moved back to the reserved set as a result of
reintegrating the fragment syntax. reintegrating the fragment syntax.
The ABNF has been corrected to allow a relative path to be empty. The ABNF has been corrected to allow the path component to be empty.
This also allows an absolute-URI to consist of nothing after the This also allows an absolute-URI to consist of nothing after the
"scheme:", as is present in practice with the "dav:" namespace "scheme:", as is present in practice with the "dav:" namespace
[RFC2518] and the "about:" scheme used internally by many WWW browser [RFC2518] and the "about:" scheme used internally by many WWW browser
implementations. The ambiguity regarding the boundary between implementations. The ambiguity regarding the boundary between
authority and path has been eliminated through the use of five authority and path has been eliminated through the use of five
separate path matching rules. separate path matching rules.
Registry-based naming authorities that use the generic syntax are now Registry-based naming authorities that use the generic syntax are now
defined within the host rule. This change allows current defined within the host rule. This change allows current
implementations, where whatever name provided is simply fed to the implementations, where whatever name provided is simply fed to the
skipping to change at page 55, line 10 skipping to change at page 57, line 10
of the normative references are updated prior to publication, the of the normative references are updated prior to publication, the
associated reference in this document can be safely updated as well. associated reference in this document can be safely updated as well.
This document has been produced using the xml2rfc tool set; the XML This document has been produced using the xml2rfc tool set; the XML
version can be obtained via the URI listed in the editorial note. version can be obtained via the URI listed in the editorial note.
Index Index
A A
ABNF 11 ABNF 11
absolute 26 absolute 26
absolute-path 25 absolute-path 26
absolute-URI 26 absolute-URI 26
access 9 access 9
authority 15, 17 authority 16, 17
B B
base URI 28 base URI 28
C C
character encoding 4 character encoding 4
character 4 character 4
characters 11 characters 11
coded character set 4 coded character set 4
D D
dec-octet 20 dec-octet 20
dereference 9 dereference 9
dot-segments 22 dot-segments 22
F F
fragment 15, 23 fragment 16, 24
G G
gen-delims 12 gen-delims 12
generic syntax 6 generic syntax 6
H H
h16 19 h16 19
hier-part 15 hier-part 16
hierarchical 10 hierarchical 10
host 18 host 18
I I
identifier 5 identifier 5
IP-literal 19 IP-literal 19
IPv4 20 IPv4 20
IPv4address 20 IPv4address 20
IPv6 19 IPv6 19
IPv6address 19 IPv6address 19, 20
IPvFuture 19 IPvFuture 19
L L
locator 7 locator 7
ls32 19 ls32 19
M M
merge 31 merge 32
N N
name 7 name 7
network-path 25 network-path 26
P P
path 15, 21 path 16, 22
path-abempty 21 path-abempty 22
path-absolute 21 path-absolute 22
path-empty 21 path-empty 22
path-noscheme 21 path-noscheme 22
path-rootless 21 path-rootless 22
path-abempty 15 path-abempty 16
path-absolute 15 path-absolute 16
path-empty 15 path-empty 16
path-rootless 15 path-rootless 16
pchar 21 pchar 22
pct-encoded 12 pct-encoded 12
percent-encoding 12 percent-encoding 12
port 21 port 21
Q Q
query 15, 23 query 16, 23
R R
reg-name 20 reg-name 20
registered name 20 registered name 20
relative 10, 28 relative 10, 28
relative-path 25 relative-path 26
relative-URI 25 relative-ref 26
remove_dot_segments 31 remove_dot_segments 32
representation 9 representation 9
reserved 12 reserved 12
resolution 9, 28 resolution 9, 28
resource 5 resource 5
retrieval 9 retrieval 9
S S
same-document 26 same-document 27
sameness 9 sameness 9
scheme 15, 16 scheme 16, 16
segment 21 segment 22
segment-nz 21 segment-nz 22
segment-nz-nc 21 segment-nz-nc 22
sub-delims 12 sub-delims 12
suffix 27 suffix 27
T T
transcription 7 transcription 7
U U
uniform 4 uniform 4
unreserved 13 unreserved 13
URI grammar URI grammar
skipping to change at page 57, line 27 skipping to change at page 59, line 27
DIGIT 11 DIGIT 11
DQUOTE 11 DQUOTE 11
fragment 16, 24, 26 fragment 16, 24, 26
gen-delims 12 gen-delims 12
h16 19 h16 19
HEXDIG 11 HEXDIG 11
hier-part 16 hier-part 16
host 17, 18 host 17, 18
IP-literal 19 IP-literal 19
IPv4address 20 IPv4address 20
IPv6address 19, 19 IPv6address 19, 20
IPvFuture 19 IPvFuture 19
LF 11 LF 11
ls32 19 ls32 19
mark 13 mark 13
OCTET 11 OCTET 11
path 22 path 22
path-abempty 16, 22 path-abempty 16, 22
path-absolute 16, 22 path-absolute 16, 22
path-empty 16, 22 path-empty 16, 22
path-noscheme 22 path-noscheme 22
path-rootless 16, 22 path-rootless 16, 22
pchar 22, 23, 24 pchar 22, 23, 24
pct-encoded 12 pct-encoded 12
port 17, 21 port 17, 21
query 16, 23, 26, 26 query 16, 23, 26, 26
reg-name 20 reg-name 20
relative-URI 25, 26 relative-ref 25, 26
reserved 12 reserved 12
scheme 16, 16, 26 scheme 16, 16, 26
segment 22 segment 22
segment-nz 22 segment-nz 22
segment-nz-nc 22 segment-nz-nc 22
SP 11 SP 11
sub-delims 12 sub-delims 12
unreserved 13 unreserved 13
URI 16, 25 URI 16, 25
URI-reference 25 URI-reference 25
userinfo 17, 17 userinfo 17, 18
URI 15 URI 16
URI-reference 25 URI-reference 25
URL 7 URL 7
URN 7 URN 7
userinfo 17 userinfo 17, 18
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
 End of changes. 87 change blocks. 
233 lines changed or deleted 305 lines changed or added
