idnits 2.17.1 

draft-fielding-uri-rfc2396bis-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3667, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2646.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2630.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2636.

  ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line
     2652), which is fine, but *also* found old RFC 2026, Section 10.4C,
     paragraph 1 text on line 35.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate.  After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document. 
     That text should be removed or replaced:

        By submitting this Internet-Draft, I certify that any applicable patent
        or other IPR claims of which I am aware have been disclosed, or
        will be disclosed, and any of which I become aware will be
        disclosed, in accordance with RFC 3668.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  -- The draft header indicates that this document obsoletes RFC2732, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document obsoletes RFC2396, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document obsoletes RFC1808, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC1738, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 700 has weird spacing: '...  query   frag...'

     (Using the creation date from RFC1738, updated by this document, for
     RFC5378 checks: 1994-12-01)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 16, 2004) is 7314 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2277' is defined on line 2076, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ASCII'

  ** Obsolete normative reference: RFC 2234 (Obsoleted by RFC 4234)

  -- Obsolete informational reference (is this intentional?): RFC 1738
     (Obsoleted by RFC 4248, RFC 4266)

  -- Obsolete informational reference (is this intentional?): RFC 1808
     (Obsoleted by RFC 3986)

  -- Obsolete informational reference (is this intentional?): RFC 2141
     (Obsoleted by RFC 8141)

  -- Obsolete informational reference (is this intentional?): RFC 2396
     (Obsoleted by RFC 3986)

  -- Obsolete informational reference (is this intentional?): RFC 2518
     (Obsoleted by RFC 4918)

  -- Obsolete informational reference (is this intentional?): RFC 2717
     (Obsoleted by RFC 4395)

  -- Obsolete informational reference (is this intentional?): RFC 2718
     (Obsoleted by RFC 4395)

  -- Obsolete informational reference (is this intentional?): RFC 2732
     (Obsoleted by RFC 3986)

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 3513
     (Obsoleted by RFC 4291)


     Summary: 11 errors (**), 0 flaws (~~), 5 warnings (==), 21 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                     T. Berners-Lee
2	Internet-Draft                                                   W3C/MIT
3	Updates: 1738 (if approved)                                  R. Fielding
4	Obsoletes: 2732, 2396, 1808 (if approved)                   Day Software
5	                                                             L. Masinter
6	Expires: October 15, 2004                                          Adobe
7	                                                          April 16, 2004

9	           Uniform Resource Identifier (URI): Generic Syntax
10	                    draft-fielding-uri-rfc2396bis-05

12	Status of this Memo

14	   By submitting this Internet-Draft, I certify that any applicable
15	   patent or other IPR claims of which I am aware have been disclosed,
16	   and any of which I become aware will be disclosed, in accordance with
17	   RFC 3668.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups. Note that other
21	   groups may also distribute working documents as Internet-Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time. It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   <http://www.ietf.org/ietf/1id-abstracts.txt>.
30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   <http://www.ietf.org/shadow.html>.

33	Copyright Notice

35	   Copyright (C) The Internet Society (2004). All Rights Reserved.

37	Abstract

39	   A Uniform Resource Identifier (URI) is a compact sequence of
40	   characters for identifying an abstract or physical resource.  This
41	   specification defines the generic URI syntax and a process for
42	   resolving URI references that might be in relative form, along with
43	   guidelines and security considerations for the use of URIs on the
44	   Internet.

46	   The URI syntax defines a grammar that is a superset of all valid
47	   URIs, such that an implementation can parse the common components of
48	   a URI reference without knowing the scheme-specific requirements of
49	   every possible identifier.  This specification does not define a
50	   generative grammar for URIs; that task is performed by the individual
51	   specifications of each URI scheme.

53	Editorial Note

55	   Discussion of this draft and comments to the editors should be sent
56	   to the uri@w3.org mailing list.  An issues list and version history
57	   are available at <http://gbiv.com/protocols/uri/rev-2002/issues.html>.

59	Table of Contents

61	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
62	     1.1   Overview of URIs . . . . . . . . . . . . . . . . . . . . .  4
63	       1.1.1   Generic Syntax . . . . . . . . . . . . . . . . . . . .  6
64	       1.1.2   Examples . . . . . . . . . . . . . . . . . . . . . . .  6
65	       1.1.3   URI, URL, and URN  . . . . . . . . . . . . . . . . . .  6
66	     1.2   Design Considerations  . . . . . . . . . . . . . . . . . .  7
67	       1.2.1   Transcription  . . . . . . . . . . . . . . . . . . . .  7
68	       1.2.2   Separating Identification from Interaction . . . . . .  8
69	       1.2.3   Hierarchical Identifiers . . . . . . . . . . . . . . .  9
70	     1.3   Syntax Notation  . . . . . . . . . . . . . . . . . . . . . 10
71	   2.  Characters . . . . . . . . . . . . . . . . . . . . . . . . . . 10
72	     2.1   Percent-Encoding . . . . . . . . . . . . . . . . . . . . . 11
73	     2.2   Reserved Characters  . . . . . . . . . . . . . . . . . . . 11
74	     2.3   Unreserved Characters  . . . . . . . . . . . . . . . . . . 12
75	     2.4   When to Encode or Decode . . . . . . . . . . . . . . . . . 13
76	     2.5   Identifying Data . . . . . . . . . . . . . . . . . . . . . 13
77	   3.  Syntax Components  . . . . . . . . . . . . . . . . . . . . . . 15
78	     3.1   Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 15
79	     3.2   Authority  . . . . . . . . . . . . . . . . . . . . . . . . 16
80	       3.2.1   User Information . . . . . . . . . . . . . . . . . . . 17
81	       3.2.2   Host . . . . . . . . . . . . . . . . . . . . . . . . . 17
82	       3.2.3   Port . . . . . . . . . . . . . . . . . . . . . . . . . 20
83	     3.3   Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
84	     3.4   Query  . . . . . . . . . . . . . . . . . . . . . . . . . . 22
85	     3.5   Fragment . . . . . . . . . . . . . . . . . . . . . . . . . 23
86	   4.  Usage  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
87	     4.1   URI Reference  . . . . . . . . . . . . . . . . . . . . . . 24
88	     4.2   Relative URI . . . . . . . . . . . . . . . . . . . . . . . 25
89	     4.3   Absolute URI . . . . . . . . . . . . . . . . . . . . . . . 25
90	     4.4   Same-document Reference  . . . . . . . . . . . . . . . . . 25
91	     4.5   Suffix Reference . . . . . . . . . . . . . . . . . . . . . 26

93	   5.  Reference Resolution . . . . . . . . . . . . . . . . . . . . . 27
94	     5.1   Establishing a Base URI  . . . . . . . . . . . . . . . . . 27
95	       5.1.1   Base URI Embedded in Content . . . . . . . . . . . . . 27
96	       5.1.2   Base URI from the Encapsulating Entity . . . . . . . . 28
97	       5.1.3   Base URI from the Retrieval URI  . . . . . . . . . . . 28
98	       5.1.4   Default Base URI . . . . . . . . . . . . . . . . . . . 28
99	     5.2   Relative Resolution  . . . . . . . . . . . . . . . . . . . 29
100	       5.2.1   Pre-parse the Base URI . . . . . . . . . . . . . . . . 29
101	       5.2.2   Transform References . . . . . . . . . . . . . . . . . 29
102	       5.2.3   Merge Paths  . . . . . . . . . . . . . . . . . . . . . 30
103	       5.2.4   Remove Dot Segments  . . . . . . . . . . . . . . . . . 31
104	     5.3   Component Recomposition  . . . . . . . . . . . . . . . . . 33
105	     5.4   Reference Resolution Examples  . . . . . . . . . . . . . . 34
106	       5.4.1   Normal Examples  . . . . . . . . . . . . . . . . . . . 34
107	       5.4.2   Abnormal Examples  . . . . . . . . . . . . . . . . . . 34
108	   6.  Normalization and Comparison . . . . . . . . . . . . . . . . . 36
109	     6.1   Equivalence  . . . . . . . . . . . . . . . . . . . . . . . 36
110	     6.2   Comparison Ladder  . . . . . . . . . . . . . . . . . . . . 37
111	       6.2.1   Simple String Comparison . . . . . . . . . . . . . . . 37
112	       6.2.2   Syntax-based Normalization . . . . . . . . . . . . . . 37
113	       6.2.3   Scheme-based Normalization . . . . . . . . . . . . . . 38
114	       6.2.4   Protocol-based Normalization . . . . . . . . . . . . . 39
115	     6.3   Canonical Form . . . . . . . . . . . . . . . . . . . . . . 40
116	   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 40
117	     7.1   Reliability and Consistency  . . . . . . . . . . . . . . . 40
118	     7.2   Malicious Construction . . . . . . . . . . . . . . . . . . 41
119	     7.3   Back-end Transcoding . . . . . . . . . . . . . . . . . . . 41
120	     7.4   Rare IP Address Formats  . . . . . . . . . . . . . . . . . 42
121	     7.5   Sensitive Information  . . . . . . . . . . . . . . . . . . 43
122	     7.6   Semantic Attacks . . . . . . . . . . . . . . . . . . . . . 43
123	   8.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 44
124	   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 45
125	     9.1   Normative References . . . . . . . . . . . . . . . . . . . 45
126	     9.2   Informative References . . . . . . . . . . . . . . . . . . 45
127	       Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 47
128	   A.  Collected ABNF for URI . . . . . . . . . . . . . . . . . . . . 48
129	   B.  Parsing a URI Reference with a Regular Expression  . . . . . . 50
130	   C.  Delimiting a URI in Context  . . . . . . . . . . . . . . . . . 51
131	   D.  Summary of Non-editorial Changes . . . . . . . . . . . . . . . 52
132	     D.1   Additions  . . . . . . . . . . . . . . . . . . . . . . . . 52
133	     D.2   Modifications from RFC 2396  . . . . . . . . . . . . . . . 53
134	       Index  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
135	       Intellectual Property and Copyright Statements . . . . . . . . 58

137	1.  Introduction

139	   A Uniform Resource Identifier (URI) provides a simple and extensible
140	   means for identifying a resource.  This specification of URI syntax
141	   and semantics is derived from concepts introduced by the World Wide
142	   Web global information initiative, whose use of such identifiers
143	   dates from 1990 and is described in "Universal Resource Identifiers
144	   in WWW" [RFC1630], and is designed to meet the recommendations laid
145	   out in "Functional Recommendations for Internet Resource Locators"
146	   [RFC1736] and "Functional Requirements for Uniform Resource Names"
147	   [RFC1737].

149	   This document obsoletes [RFC2396], which merged "Uniform Resource
150	   Locators" [RFC1738] and "Relative Uniform Resource Locators"
151	   [RFC1808] in order to define a single, generic syntax for all URIs.
152	   It excludes those portions of RFC 1738 that defined the specific
153	   syntax of individual URI schemes; those portions will be updated as
154	   separate documents. The process for registration of new URI schemes
155	   is defined separately by [RFC2717]. Advice for designers of new URI
156	   schemes can be found in [RFC2718].

158	   All significant changes from RFC 2396 are noted in Appendix D.

160	   This specification uses the terms "character" and "coded character
161	   set" in accordance with the definitions provided in [RFC2978], and
162	   "character encoding" in place of what [RFC2978] refers to as a
163	   "charset".

165	1.1  Overview of URIs

167	   URIs are characterized as follows:

169	   Uniform
170	      Uniformity provides several benefits: it allows different types of
171	      resource identifiers to be used in the same context, even when the
172	      mechanisms used to access those resources may differ; it allows
173	      uniform semantic interpretation of common syntactic conventions
174	      across different types of resource identifiers; it allows
175	      introduction of new types of resource identifiers without
176	      interfering with the way that existing identifiers are used; and
177	      it allows the identifiers to be reused in many different contexts,
178	      thus permitting new applications or protocols to leverage a
179	      pre-existing, large, and widely used set of resource identifiers.

181	   Resource
182	      Anything that has been named or described can be a resource.
183	      Familiar examples include an electronic document, an image, a
184	      service (e.g., "today's weather report for Los Angeles"), and a
185	      collection of other resources. A resource is not necessarily
186	      accessible via the Internet; e.g., human beings, corporations, and
187	      bound books in a library can also be resources. Likewise, abstract
188	      concepts can be resources, such as the operators and operands of a
189	      mathematical equation, the types of a relationship (e.g., "parent"
190	      or "employee"), or numeric values (e.g., zero, one, and infinity).
191	      These things are called resources because they each can be
192	      considered a source of supply or support, or an available means,
193	      for some system, where such systems may be as diverse as the World
194	      Wide Web, a filesystem, an ontological graph, a theorem prover, or
195	      some other form of system for the direct or indirect observation
196	      and/or manipulation of resources. Note that "supply" is not
197	      necessary for a thing to be considered a resource: the ability to
198	      simply refer to that thing is often sufficient to support the
199	      operation of a given system.

201	   Identifier
202	      An identifier embodies the information required to distinguish
203	      what is being identified from all other things within its scope of
204	      identification. Our use of the terms "identify" and "identifying"
205	      refers to this process of distinguishing from many to one; they
206	      should not be mistaken as an assumption that the identifier
207	      defines the identity of what is referenced, though that may be the
208	      case for some identifiers.

210	   A URI is an identifier that consists of a sequence of characters
211	   matching the syntax rule named <URI> in Section 3. A URI can be used
212	   to refer to a resource. This specification does not place any limits
213	   on the nature of a resource, the reasons why an application might
214	   wish to refer to a resource, or the kinds of systems that might use
215	   URIs for the sake of identifying resources.

217	   URIs have a global scope and must be interpreted consistently
218	   regardless of context, though the result of that interpretation may
219	   be in relation to the end-user's context.  For example,
220	   "http://localhost/" has the same interpretation for every user of that
221	   reference, even though the network interface corresponding to
222	   "localhost" may be different for each end-user: interpretation is
223	   independent of access.  However, an action made on the basis of that
224	   reference will take place in relation to the end-user's context,
225	   which implies that an action intended to refer to a single, globally
226	   unique thing must use a URI that distinguishes that resource from all
227	   other things.  URIs that identify in relation to the end-user's local
228	   context should only be used when the context itself is a defining
229	   aspect of the resource, such as when an on-line Linux manual refers
230	   to a file on the end-user's filesystem (e.g., "file:///etc/hosts").

232	1.1.1  Generic Syntax

234	   Each URI begins with a scheme name, as defined in Section 3.1, that
235	   refers to a specification for assigning identifiers within that
236	   scheme. As such, the URI syntax is a federated and extensible naming
237	   system wherein each scheme's specification may further restrict the
238	   syntax and semantics of identifiers using that scheme.

240	   This specification defines those elements of the URI syntax that are
241	   required of all URI schemes or are common to many URI schemes.  It
242	   thus defines the syntax and semantics that are needed to implement a
243	   scheme-independent parsing mechanism for URI references, such that
244	   the scheme-dependent handling of a URI can be postponed until the
245	   scheme-dependent semantics are needed.  Likewise, protocols and data
246	   formats that make use of URI references can refer to this
247	   specification as defining the range of syntax allowed for all URIs,
248	   including those schemes that have yet to be defined.

250	   A parser of the generic URI syntax is capable of parsing any URI
251	   reference into its major components; once the scheme is determined,
252	   further scheme-specific parsing can be performed on the components.
253	   In other words, the URI generic syntax is a superset of the syntax of
254	   all URI schemes.
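
   As an illustration only, scheme-independent parsing can be sketched
   with a single regular expression.  The pattern below is the one
   published in Appendix B of RFC 3986; the function name
   split_uri_reference and the dictionary keys are assumptions made for
   this sketch, and no scheme-specific validation is attempted.

```python
import re

# Pattern as published in RFC 3986, Appendix B.  It accepts any string and
# only splits it at the generic delimiters ":", "//", "?", and "#".
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(reference):
    """Split a URI reference into its five generic components.

    A value of None means the component (and its delimiter) was absent.
    The path is always present, though it may be the empty string.
    """
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    # Sample URIs taken from Section 1.1.2 of this document.
    for uri in (
        "http://www.ietf.org/rfc/rfc2396.txt",
        "mailto:John.Doe@example.com",
        "telnet://melvyl.ucop.edu/",
    ):
        print(uri, split_uri_reference(uri))
```

   Once the scheme component is known, an application can hand the
   remaining components to whatever scheme-specific logic it needs; the
   generic step above never has to know which schemes exist.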

256	1.1.2  Examples

258	   The following examples illustrate URIs that are in common use.

260	      ftp://ftp.is.co.za/rfc/rfc1808.txt

262	      http://www.ietf.org/rfc/rfc2396.txt

264	      mailto:John.Doe@example.com

266	      news:comp.infosystems.www.servers.unix

268	      telnet://melvyl.ucop.edu/

270	1.1.3  URI, URL, and URN

272	   A URI can be further classified as a locator, a name, or both.  The
273	   term "Uniform Resource Locator" (URL) refers to the subset of URIs
274	   that, in addition to identifying a resource, provide a means of
275	   locating the resource by describing its primary access mechanism
276	   (e.g., its network "location").  The term "Uniform Resource Name"
277	   (URN) has been used historically to refer to both URIs under the
278	   "urn" scheme [RFC2141], which are required to remain globally unique
279	   and persistent even when the resource ceases to exist or becomes
280	   unavailable, and to any other URI with the properties of a name.

282	   An individual scheme does not need to be classified as being just one
283	   of "name" or "locator".  Instances of URIs from any given scheme may
284	   have the characteristics of names or locators or both, often
285	   depending on the persistence and care in the assignment of
286	   identifiers by the naming authority, rather than any quality of the
287	   scheme.  Future specifications and related documentation should use
288	   the general term "URI", rather than the more restrictive terms URL
289	   and URN [RFC3305].

291	1.2  Design Considerations

293	1.2.1  Transcription

295	   The URI syntax has been designed with global transcription as one of
296	   its main considerations.  A URI is a sequence of characters from a
297	   very limited set: the letters of the basic Latin alphabet, digits,
298	   and a few special characters.  A URI may be represented in a variety
299	   of ways: e.g., ink on paper, pixels on a screen, or a sequence of
300	   character encoding octets.  The interpretation of a URI depends only
301	   on the characters used and not how those characters are represented
302	   in a network protocol.

304	   The goal of transcription can be described by a simple scenario.
305	   Imagine two colleagues, Sam and Kim, sitting in a pub at an
306	   international conference and exchanging research ideas.  Sam asks Kim
307	   for a location to get more information, so Kim writes the URI for the
308	   research site on a napkin.  Upon returning home, Sam takes out the
309	   napkin and types the URI into a computer, which then retrieves the
310	   information to which Kim referred.

312	   There are several design considerations revealed by the scenario:

314	   o  A URI is a sequence of characters that is not always represented
315	      as a sequence of octets.

317	   o  A URI might be transcribed from a non-network source, and thus
318	      should consist of characters that are most likely to be able to be
319	      entered into a computer, within the constraints imposed by
320	      keyboards (and related input devices) across languages and
321	      locales.

323	   o  A URI often needs to be remembered by people, and it is easier for
324	      people to remember a URI when it consists of meaningful or
325	      familiar components.

327	   These design considerations are not always in alignment.  For
328	   example, it is often the case that the most meaningful name for a URI
329	   component would require characters that cannot be typed into some
330	   systems.  The ability to transcribe a resource identifier from one
331	   medium to another has been considered more important than having a
332	   URI consist of the most meaningful of components.

334	   In local or regional contexts and with improving technology, users
335	   might benefit from being able to use a wider range of characters;
336	   such use is not defined by this specification.  Percent-encoded
337	   octets (Section 2.1) may be used within a URI to represent characters
338	   outside the range of the US-ASCII coded character set if such
339	   representation is allowed by the scheme or by the protocol element in
340	   which the URI is referenced; such a definition should specify the
341	   character encoding used to map those characters to octets prior to
342	   being percent-encoded for the URI.
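
   The need to name the character encoding can be seen in a small
   sketch.  It uses Python's standard urllib.parse module; the sample
   string and the choice of UTF-8 and ISO-8859-1 are assumptions made
   for illustration, not encodings mandated by this specification.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical sample text containing one character outside US-ASCII.
text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# The same character maps to different octets under different character
# encodings, so the percent-encoded forms differ as well.
utf8_form = quote(text, safe="", encoding="utf-8")
latin1_form = quote(text, safe="", encoding="iso-8859-1")


def decode(percent_encoded, encoding):
    """Reverse the mapping: percent-encoding -> octets -> characters."""
    return unquote_to_bytes(percent_encoded).decode(encoding)


if __name__ == "__main__":
    print(utf8_form)    # two octets for the accented letter
    print(latin1_form)  # one octet for the accented letter
    print(decode(utf8_form, "utf-8") == text)
```

   A recipient that guesses the wrong encoding either fails to decode
   the octets or recovers different characters, which is why the scheme
   or protocol element is expected to specify the encoding.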

344	1.2.2  Separating Identification from Interaction

346	   A common misunderstanding of URIs is that they are only used to refer
347	   to accessible resources.  In fact, the URI alone only provides
348	   identification; access to the resource is neither guaranteed nor
349	   implied by the presence of a URI.  Instead, an operation (if any)
350	   associated with a URI reference is defined by the protocol element,
351	   data format attribute, or natural language text in which it appears.

353	   Given a URI, a system may attempt to perform a variety of operations
354	   on the resource, as might be characterized by such words as "access",
355	   "update", "replace", or "find attributes".  Such operations are
356	   defined by the protocols that make use of URIs, not by this
357	   specification.  However, we do use a few general terms for describing
358	   common operations on URIs.  URI "resolution" is the process of
359	   determining an access mechanism and the appropriate parameters
360	   necessary to dereference a URI; such resolution may require several
361	   iterations.  To use that access mechanism to perform an action on the
362	   URI's resource is to "dereference" the URI.

364	   When URIs are used within information systems to identify sources of
365	   information, the most common form of URI dereference is "retrieval":
366	   making use of a URI in order to retrieve a representation of its
367	   associated resource.  A "representation" is a sequence of octets,
368	   along with representation metadata describing those octets, that
369	   constitutes a record of the state of the resource at the time that
370	   the representation is generated.  Retrieval is achieved by a process
371	   that might include using the URI as a cache key to check for a
372	   locally cached representation, resolution of the URI to determine an
373	   appropriate access mechanism (if any), and dereference of the URI for
374	   the sake of applying a retrieval operation. Depending on the
375	   protocols used to perform the retrieval, additional information might
376	   be supplied about the resource (resource metadata) and its relation
377	   to other resources.

379	   URI references in information systems are designed to be
380	   late-binding: the result of an access is generally determined at the
381	   time it is accessed and may vary over time or due to other aspects of
382	   the interaction. Such references are created in order to be used
383	   in the future: what is being identified is not some specific result
384	   that was obtained in the past, but rather some characteristic that is
385	   expected to be true for future results.  In such cases, the resource
386	   referred to by the URI is actually a sameness of characteristics as
387	   observed over time, perhaps elucidated by additional comments or
388	   assertions made by the resource provider.

390	   Although many URI schemes are named after protocols, this does not
391	   imply that use of such a URI will result in access to the resource
392	   via the named protocol.  URIs are often used simply for the sake of
393	   identification.  Even when a URI is used to retrieve a representation
394	   of a resource, that access might be through gateways, proxies,
395	   caches, and name resolution services that are independent of the
396	   protocol associated with the scheme name, and the resolution of some
397	   URIs may require the use of more than one protocol (e.g., both DNS
398	   and HTTP are typically used to access an "http" URI's origin server
399	   when a representation isn't found in a local cache).

401	1.2.3  Hierarchical Identifiers

403	   The URI syntax is organized hierarchically, with components listed in
404	   order of decreasing significance from left to right.  For some URI
405	   schemes, the visible hierarchy is limited to the scheme itself:
406	   everything after the scheme component delimiter (":") is considered
407	   opaque to URI processing. Other URI schemes make the hierarchy
408	   explicit and visible to generic parsing algorithms.

410	   The generic syntax uses the slash ("/"), question mark ("?"), and
411	   number sign ("#") characters for the purpose of delimiting components
412	   that are significant to the generic parser's hierarchical
413	   interpretation of an identifier.  In addition to aiding the
414	   readability of such identifiers through the consistent use of
415	   familiar syntax, this uniform representation of hierarchy across
416	   naming schemes allows scheme-independent references to be made
417	   relative to that hierarchy.

419	   It is often the case that a group or "tree" of documents has been
420	   constructed to serve a common purpose, wherein the vast majority of
421	   URIs in these documents point to resources within the tree rather
422	   than outside of it.  Similarly, documents located at a particular
423	   site are much more likely to refer to other resources at that site
424	   than to resources at remote sites. Relative referencing of URIs
425	   allows document trees to be partially independent of their location
426	   and access scheme.  For instance, it is possible for a single set of
427	   hypertext documents to be simultaneously accessible and traversable
428	   via each of the "file", "http", and "ftp" schemes if the documents
429	   refer to each other using relative references. Furthermore, such
430	   document trees can be moved, as a whole, without changing any of the
431	   relative references.
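
   The location independence described above can be sketched with
   Python's standard urllib.parse.urljoin, which resolves a reference
   against a base URI.  The base URIs and the relative reference below
   are hypothetical examples, and urljoin is used here as a convenient
   stand-in rather than as the algorithm defined in Section 5.

```python
from urllib.parse import urljoin

# One relative reference, as it might appear inside a document.
RELATIVE_REFERENCE = "../figures/fig1.png"

# The same (hypothetical) document tree published under two schemes.
BASES = [
    "http://example.com/docs/chapter1/index.html",
    "ftp://example.com/pub/docs/chapter1/index.html",
]


def resolve_all(bases, reference):
    """Resolve one reference against each base URI in turn."""
    return [urljoin(base, reference) for base in bases]


if __name__ == "__main__":
    for target in resolve_all(BASES, RELATIVE_REFERENCE):
        print(target)
```

   The reference text never changes; only the base URI supplied by the
   retrieval context differs, so the tree can be moved or mirrored
   without editing the documents.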

433	   A relative URI reference (Section 4.2) refers to a resource by
434	   describing the difference within a hierarchical name space between
435	   the reference context and the target URI.  The reference resolution
436	   algorithm, presented in Section 5, defines how such a reference is
437	   transformed to the target URI. Since relative references can only be
438	   used within the context of a hierarchical URI, designers of new URI
439	   schemes should use a syntax consistent with the generic syntax's
440	   hierarchical components unless there are compelling reasons to forbid
441	   relative referencing within that scheme.

443	   All URIs are parsed by generic syntax parsers when used. A URI scheme
444	   that wishes to remain opaque to hierarchical processing must disallow
445	   the use of slash and question mark characters.  However, since a URI
446	   reference is only modified by the generic parser if it contains a
447	   dot-segment (a complete path segment of "." or "..", as described in
448	   Section 3.3), URI schemes may safely use "/" for other purposes if
449	   they do not allow dot-segments.

451	1.3  Syntax Notation

453	   This specification uses the Augmented Backus-Naur Form (ABNF)
454	   notation of [RFC2234], including the following core ABNF syntax rules
455	   defined by that specification: ALPHA (letters), CR (carriage return),
456	   DIGIT (decimal digits), DQUOTE (double quote), HEXDIG (hexadecimal
457	   digits), LF (line feed), and SP (space). The complete URI syntax is
458	   collected in Appendix A.

460	2.  Characters

462	   The URI syntax provides a method of encoding data, presumably for the
463	   sake of identifying a resource, as a sequence of characters.  The URI
464	   characters are, in turn, frequently encoded as octets for transport
465	   or presentation.  This specification does not mandate any particular
466	   character encoding for mapping between URI characters and the octets
467	   used to store or transmit those characters.  When a URI appears in a
468	   protocol element, the character encoding is defined by that protocol;
469	   absent such a definition, a URI is assumed to be in the same
470	   character encoding as the surrounding text.

472	   The ABNF notation defines its terminal values to be non-negative
473	   integers (codepoints) based on the US-ASCII coded character set
474	   [ASCII].  Since a URI is a sequence of characters, we must invert
475	   that relation in order to understand the URI syntax. Therefore, the
476	   integer values used by the ABNF must be mapped back to their
477	   corresponding characters via US-ASCII in order to complete the syntax
478	   rules.

480	   A URI is composed from a limited set of characters consisting of
481	   digits, letters, and a few graphic symbols.  A reserved subset of
482	   those characters may be used to delimit syntax components within a
483	   URI, while the remaining characters, including both the unreserved
484	   set and those reserved characters not acting as delimiters, define
485	   each component's identifying data.

487	2.1  Percent-Encoding

489	   A percent-encoding mechanism is used to represent a data octet in a
490	   component when that octet's corresponding character is outside the
491	   allowed set or is being used as a delimiter of, or within, the
492	   component. A percent-encoded octet is encoded as a character triplet,
493	   consisting of the percent character "%" followed by the two
494	   hexadecimal digits representing that octet's numeric value.  For
495	   example, "%20" is the percent-encoding for the binary octet
496	   "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
497	   character (SP). Section 2.4 describes when percent-encoding and
498	   decoding are applied.

500	      pct-encoded = "%" HEXDIG HEXDIG

502	   The uppercase hexadecimal digits 'A' through 'F' are equivalent to
503	   the lowercase digits 'a' through 'f', respectively.  Two URIs that
504	   differ only in the case of hexadecimal digits used in percent-encoded
505	   octets are equivalent.  For consistency, URI producers and
506	   normalizers should use uppercase hexadecimal digits for all
507	   percent-encodings.
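
   The triplet form and the uppercase recommendation can be sketched
   as follows.  The function names are illustrative assumptions and do
   not come from this specification.

```python
import re

# Matches one pct-encoded triplet: "%" HEXDIG HEXDIG.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def pct_encode_octet(octet):
    """Return the percent-encoding of a single octet, using uppercase hex."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0..255")
    return "%{:02X}".format(octet)


def uppercase_pct_hex(uri):
    """Rewrite every percent-encoded triplet to use uppercase hex digits."""
    return _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), uri)


if __name__ == "__main__":
    print(pct_encode_octet(0x20))         # %20
    print(uppercase_pct_hex("a%2fb%3A"))  # a%2Fb%3A
```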

509	2.2  Reserved Characters

511	   URIs include components and subcomponents that are delimited by
512	   characters in the "reserved" set.  These characters are called
513	   "reserved" because they may (or may not) be defined as delimiters by
514	   the generic syntax, by each scheme-specific syntax, or by the
515	   implementation-specific syntax of a URI's dereferencing algorithm.
516	   If data for a URI component would conflict with a reserved
517	   character's purpose as a delimiter, then the conflicting data must be
518	   percent-encoded before forming the URI.

520	      reserved    = gen-delims / sub-delims
521	      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

523	      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
524	                  / "*" / "+" / "," / ";" / "="

526	   The purpose of reserved characters is to provide a set of delimiting
527	   characters that are distinguishable from other data within a URI.
528	   URIs that differ in the replacement of a reserved character with its
529	   corresponding percent-encoded octet are not equivalent.
530	   Percent-encoding a reserved character, or decoding a percent-encoded
531	   octet that corresponds to a reserved character, will change how the
532	   URI is interpreted by most applications.  Thus, characters in the
533	   reserved set are protected from normalization and are therefore safe
534	   to be used by scheme-specific and producer-specific algorithms for
535	   delimiting data subcomponents within a URI.

537	   A subset of the reserved characters (gen-delims) is used as
538	   delimiters of the generic URI components described in Section 3. A
539	   component's ABNF syntax rule will not use the reserved or gen-delims
540	   rule names directly; instead, each syntax rule lists the characters
541	   allowed within that component (i.e., not delimiting it) and any of
542	   those characters that are also in the reserved set are "reserved" for
543	   use as subcomponent delimiters within the component.  Only the most
544	   common subcomponents are defined by this specification; other
545	   subcomponents may be defined by a URI scheme's specification, or by
546	   the implementation-specific syntax of a URI's dereferencing
547	   algorithm, provided that such subcomponents are delimited by
548	   characters in the reserved set allowed within that component.

550	   URI producing applications should percent-encode data octets that
551	   correspond to characters in the reserved set.  However, if a reserved
552	   character is found in a URI component and no delimiting role is known
553	   for that character, then it should be interpreted as representing the
554	   data octet corresponding to that character's encoding in US-ASCII.
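
   A minimal sketch of a producer that percent-encodes reserved
   characters appearing in data, so that they cannot be mistaken for
   the "/" path delimiter.  The helper name build_path is hypothetical;
   urllib.parse.quote with an empty safe set leaves only unreserved
   characters unencoded.

```python
from urllib.parse import quote


def build_path(segments):
    """Join data segments with "/", encoding reserved characters in the data."""
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


if __name__ == "__main__":
    # The "/" inside the first segment is data, not a delimiter.
    print(build_path(["a/b", "c"]))  # /a%2Fb/c
```

   The result has two segments; an unencoded "/a/b/c" would have three
   and is therefore not an equivalent URI path.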

556	2.3  Unreserved Characters

558	   Characters that are allowed in a URI but do not have a reserved
559	   purpose are called unreserved.  These include uppercase and lowercase
560	   letters, decimal digits, hyphen, period, underscore, and tilde.

562	      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

564	   URIs that differ in the replacement of an unreserved character with
565	   its corresponding percent-encoded octet are equivalent: they identify
566	   the same resource.  However, percent-encoded unreserved characters
567	   may change the result of some URI comparisons (Section 6),
568	   potentially leading to incorrect or inefficient behavior. For
569	   consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A
570	   and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore
571	   (%5F), or tilde (%7E) should not be created by URI producers and,
572	   when found in a URI, should be decoded to their corresponding
573	   unreserved character by URI normalizers.
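
   The normalization recommended above can be sketched as a single
   pass that decodes only those triplets whose octet corresponds to an
   unreserved character.  The function name is an assumption made for
   this illustration.

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri):
    """Decode percent-encoded unreserved characters; leave all others as-is."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    # A single left-to-right pass, so nothing is decoded twice.
    return _PCT_TRIPLET.sub(replace, uri)


if __name__ == "__main__":
    print(decode_unreserved("/%7Euser/a%2Fb"))  # /~user/a%2Fb
```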

575	2.4  When to Encode or Decode

577	   Under normal circumstances, the only time that octets within a URI
578	   are percent-encoded is during the process of producing the URI from
579	   its component parts.  It is during that process that an
580	   implementation determines which of the reserved characters are to be
581	   used as subcomponent delimiters and which can be safely used as data.
582	   Once produced, a URI is always in its percent-encoded form.

584	   When a URI is dereferenced, the components and subcomponents
585	   significant to the scheme-specific dereferencing process (if any)
586	   must be parsed and separated before the percent-encoded octets within
587	   those components can be safely decoded, since otherwise the data may
588	   be mistaken for component delimiters.  The only exception is for
589	   percent-encoded octets corresponding to characters in the unreserved
590	   set, which can be decoded at any time.  For example, the octet
591	   corresponding to the tilde ("~") character is often encoded as "%7E"
592	   by older URI processing software; the "%7E" can be replaced by "~"
593	   without changing its interpretation.

595	   Because the percent ("%") character serves as the indicator for
596	   percent-encoded octets, it must be percent-encoded as "%25" in order
597	   for that octet to be used as data within a URI.  Implementations must
598	   not percent-encode or decode the same string more than once, since
599	   decoding an already decoded string might lead to misinterpreting a
600	   percent data octet as the beginning of a percent-encoding, or vice
601	   versa in the case of percent-encoding an already percent-encoded
602	   string.
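
   The hazard of repeated encoding or decoding can be demonstrated
   with data that itself contains a percent character.  The sample
   string below is hypothetical; urllib.parse.quote and unquote are
   used as generic percent-encoding helpers.

```python
from urllib.parse import quote, unquote

# Hypothetical data: the three literal characters "%", "4", "1".
DATA = "%41"

encoded_once = quote(DATA, safe="")           # "%" becomes "%25"
decoded_once = unquote(encoded_once)          # recovers the original data
decoded_twice = unquote(decoded_once)         # misreads the data as a triplet
encoded_twice = quote(encoded_once, safe="")  # encodes the encoding itself

if __name__ == "__main__":
    print(encoded_once)   # %2541
    print(decoded_once)   # %41
    print(decoded_twice)  # A
    print(encoded_twice)  # %252541
```

   Decoding twice silently turns the data "%41" into "A", and encoding
   twice yields a URI that no longer decodes to the original data in
   one step.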

604	2.5  Identifying Data

606	   URI characters provide identifying data for each of the URI
607	   components, serving as an external interface for identification
608	   between systems. Although the presence and nature of the URI
609	   production interface is hidden from clients that use its URIs, and
610	   thus beyond the scope of the interoperability requirements defined by
611	   this specification, it is a frequent source of confusion and errors
612	   in the interpretation of URI character issues.  Implementers need to
613	   be aware that there are multiple character encodings involved in the
614	   production and transmission of URIs: local name and data encoding,
615	   public interface encoding, URI character encoding, data format
616	   encoding, and protocol encoding.

618	   The first encoding of identifying data is the one in which the local
619	   names or data are stored.  URI producing applications (a.k.a. origin
620	   servers) will typically use the local encoding as the basis for
621	   producing meaningful names.  The URI producer will transform the
622	   local encoding to one that is suitable for a public interface, and
623	   then transform the public interface encoding into the restricted set
624	   of URI characters (reserved, unreserved, and percent-encodings).
625	   Those characters are, in turn, encoded as octets to be used as a
626	   reference within a data format (e.g., a document charset), and such
627	   data formats are often subsequently encoded for transmission over
628	   Internet protocols.

630	   For most systems, an unreserved character appearing within a URI
631	   component is interpreted as representing the data octet corresponding
632	   to that character's encoding in US-ASCII.  Consumers of URIs assume
633	   that the letter "X" corresponds to the octet "01011000", and there is
634	   no harm in making that assumption even when it is incorrect.  A
635	   system that internally provides identifiers in the form of a
636	   different character encoding, such as EBCDIC, will generally perform
637	   character translation of textual identifiers to UTF-8 [RFC3629] (or
638	   some other superset of the US-ASCII character encoding) at an
639	   internal interface, thereby providing more meaningful identifiers
640	   than simply percent-encoding the original octets.

642	   For example, consider an information service that provides data,
643	   stored locally using an EBCDIC-based filesystem, to clients on the
644	   Internet through an HTTP server.  When an author creates a file on
645	   that filesystem with the name "Laguna Beach", their expectation is
646	   that the "http" URI corresponding to that resource would also contain
647	   the meaningful string "Laguna%20Beach".  If, however, that server
648	   produces URIs using an overly-simplistic raw octet mapping, then the
649	   result would be a URI containing
650	   "%D3%81%87%A4%95%81@%C2%85%81%83%88".  An internal transcoding
651	   interface fixes that problem by transcoding the local name to a
652	   superset of US-ASCII prior to producing the URI.  Naturally, proper
653	   interpretation of an incoming URI on such an interface requires that
654	   percent-encoded octets be decoded (e.g., "%20" to SP) before the
655	   reverse transcoding is applied to obtain the local name.
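
   The example above can be reproduced with the following sketch.  It
   assumes that the local filesystem uses the EBCDIC code page known to
   Python as "cp037" and that UTF-8 is the chosen superset of US-ASCII;
   neither choice is mandated by the text, and the function names are
   illustrative.

```python
from urllib.parse import quote, unquote

# Assumption: the local names are stored in EBCDIC code page 037.
LOCAL_CODEC = "cp037"


def raw_octet_segment(local_octets):
    """Overly simplistic mapping: percent-encode the stored octets directly."""
    # "@" is kept because the octet 0x40 is an allowed path character.
    return quote(local_octets, safe="@")


def transcoded_segment(local_octets):
    """Transcode the local name to text first, then percent-encode as UTF-8."""
    return quote(local_octets.decode(LOCAL_CODEC), safe="")


def local_name_from_segment(segment):
    """Reverse direction: percent-decode first, then transcode to local octets."""
    return unquote(segment).encode(LOCAL_CODEC)


if __name__ == "__main__":
    stored = "Laguna Beach".encode(LOCAL_CODEC)
    print(raw_octet_segment(stored))   # %D3%81%87%A4%95%81@%C2%85%81%83%88
    print(transcoded_segment(stored))  # Laguna%20Beach
```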

657	   In some cases, the internal interface between a URI component and the
658	   identifying data that it has been crafted to represent is much less
659	   direct than a character encoding translation.  For example, portions
660	   of a URI might reflect a query on non-ASCII data, numeric coordinates
661	   on a map, etc. Likewise, a URI scheme may define components with
662	   additional encoding requirements that are applied prior to forming
663	   the component and producing the URI.

665	   When a new URI scheme defines a component that represents textual
666	   data consisting of characters from the Unicode (ISO/IEC 10646-1)
667	   character set, the data should be encoded first as octets according
668	   to the UTF-8 character encoding [RFC3629], and then only those octets
669	   that do not correspond to characters in the unreserved set should be
670	   percent-encoded.  For example, the character A would be represented
671	   as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be
672	   represented as "%C3%80", and the character KATAKANA LETTER A would be
673	   represented as "%E3%82%A2".
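
   The three representations given above can be checked with a short
   sketch.  The function name is an assumption; urllib.parse.quote with
   an empty safe set percent-encodes every UTF-8 octet that does not
   correspond to an unreserved character.

```python
from urllib.parse import quote


def utf8_pct_encode(text):
    """Encode text as UTF-8, then percent-encode all non-unreserved octets."""
    return quote(text, safe="", encoding="utf-8")


if __name__ == "__main__":
    print(utf8_pct_encode("A"))       # A
    print(utf8_pct_encode("\u00C0"))  # %C3%80     (LATIN CAPITAL LETTER A WITH GRAVE)
    print(utf8_pct_encode("\u30A2"))  # %E3%82%A2  (KATAKANA LETTER A)
```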

675	3.  Syntax Components

677	   The generic URI syntax consists of a hierarchical sequence of
678	   components referred to as the scheme, authority, path, query, and
679	   fragment.

681	      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

683	      hier-part   = "//" authority path-abempty
684	                  / path-abs
685	                  / path-rootless
686	                  / path-empty

688	   The scheme and path components are required, though path may be empty
689	   (no characters).  When authority is present, the path must either be
690	   empty or begin with a slash ("/") character. When authority is not
691	   present, the path cannot begin with two slash characters ("//").
692	   These restrictions result in five different ABNF rules for a path
693	   (Section 3.3), only one of which will match any given URI reference.

695	   The following are two example URIs and their component parts:

697	         foo://example.com:8042/over/there?name=ferret#nose
698	         \_/   \______________/\_________/ \_________/ \__/
699	          |           |            |            |        |
700	       scheme     authority       path        query   fragment
701	          |   _____________________|__
702	         / \ /                        \
703	         urn:example:animal:ferret:nose
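
   The decomposition shown above can be reproduced informally with
   Python's urllib.parse.urlsplit.  That function is a general-purpose
   splitter, not a validating parser for this grammar; in particular it
   does not distinguish an absent component from an empty one.  The
   helper name components is an assumption.

```python
from urllib.parse import urlsplit


def components(uri):
    """Split a URI into the five generic components named in the text."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(components("foo://example.com:8042/over/there?name=ferret#nose"))
    print(components("urn:example:animal:ferret:nose"))
```

   For the second URI, everything after "urn:" is reported as the path,
   since no authority, query, or fragment delimiter is present.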

705	3.1  Scheme

707	   Each URI begins with a scheme name that refers to a specification for
708	   assigning identifiers within that scheme. As such, the URI syntax is
709	   a federated and extensible naming system wherein each scheme's
710	   specification may further restrict the syntax and semantics of
711	   identifiers using that scheme.

713	   Scheme names consist of a sequence of characters beginning with a
714	   letter and followed by any combination of letters, digits, plus
715	   ("+"), period ("."), or hyphen ("-").  Although scheme is
716	   case-insensitive, the canonical form is lowercase and documents that
717	   specify schemes must do so using lowercase letters.  An
718	   implementation should accept uppercase letters as equivalent to
719	   lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for
720	   the sake of robustness, but should only produce lowercase scheme
721	   names, for consistency.

723	      scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
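
   A minimal sketch of the scheme rule and the lowercase canonical
   form, written as a regular expression transcribed from the ABNF
   above.  The function name is illustrative.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def normalize_scheme(name):
    """Accept any letter case, reject invalid names, return lowercase."""
    if not _SCHEME.fullmatch(name):
        raise ValueError("not a valid scheme name: {!r}".format(name))
    return name.lower()


if __name__ == "__main__":
    print(normalize_scheme("HTTP"))  # http
```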

725	   Individual schemes are not specified by this document. The process
726	   for registration of new URI schemes is defined separately by
727	   [RFC2717].  The scheme registry maintains the mapping between scheme
728	   names and their specifications. Advice for designers of new URI
729	   schemes can be found in [RFC2718].

731	   When presented with a URI that violates one or more scheme-specific
732	   restrictions, the scheme-specific resolution process should flag the
733	   reference as an error rather than ignore the unused parts; doing so
734	   reduces the number of equivalent URIs and helps detect abuses of the
735	   generic syntax that might indicate the URI has been constructed to
736	   mislead the user (Section 7.6).

738	3.2  Authority

740	   Many URI schemes include a hierarchical element for a naming
741	   authority, such that governance of the name space defined by the
742	   remainder of the URI is delegated to that authority (which may, in
743	   turn, delegate it further).  The generic syntax provides a common
744	   means for distinguishing an authority based on a registered name or
745	   server address, along with optional port and user information.

747	   The authority component is preceded by a double slash ("//") and is
748	   terminated by the next slash ("/"), question mark ("?"), or number
749	   sign ("#") character, or by the end of the URI.

751	      authority   = [ userinfo "@" ] host [ ":" port ]

753	   URI producers and normalizers should omit the ":" delimiter that
754	   separates host from port if the port component is empty. Some schemes
755	   do not allow the userinfo and/or port subcomponents.

757	   If a URI contains an authority component, then the path component
758	   must either be empty or begin with a slash ("/") character.
759	   Non-validating parsers (those that merely separate a URI reference
760	   into its major components) will often ignore the subcomponent
761	   structure of authority, treating it as an opaque string from the
762	   double-slash to the first terminating delimiter, until such time as
763	   the URI is dereferenced.
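
   When the subcomponent structure is needed, the authority rule can
   be split along its delimiters as sketched below.  This is a
   simplified illustration under the assumption that the input has
   already been isolated as the authority component; it does not
   validate userinfo or host characters, and the function name is
   hypothetical.

```python
def split_authority(authority):
    """Return (userinfo, host, port); absent subcomponents are None."""
    userinfo, at_sign, hostport = authority.rpartition("@")
    if not at_sign:
        userinfo = None

    if hostport.startswith("["):
        # IP-literal: colons inside the brackets are not port delimiters.
        end = hostport.find("]")
        if end == -1:
            raise ValueError("unterminated IP-literal")
        host, rest = hostport[: end + 1], hostport[end + 1:]
    else:
        host, colon, digits = hostport.partition(":")
        rest = colon + digits

    if rest == "":
        port = None
    elif rest.startswith(":"):
        port = rest[1:]  # may be empty, since port = *DIGIT
        if port and not (port.isascii() and port.isdigit()):
            raise ValueError("port must consist of decimal digits")
    else:
        raise ValueError("unexpected data after host")
    return userinfo, host, port


if __name__ == "__main__":
    print(split_authority("example.com:8042"))
    print(split_authority("user@[2001:db8::7]:80"))
```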

765	3.2.1  User Information

767	   The userinfo subcomponent may consist of a user name and, optionally,
768	   scheme-specific information about how to gain authorization to access
769	   the resource.  The user information, if present, is followed by a
770	   commercial at-sign ("@") that delimits it from the host.

772	      userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

774	   Use of the format "user:password" in the userinfo field is
775	   deprecated. Applications should not render as clear text any data
776	   after the first colon (":") character found within a userinfo
777	   subcomponent unless the data after the colon is the empty string
778	   (indicating no password). Applications may choose to ignore or reject
779	   such data when received as part of a reference, and should reject the
780	   storage of such data in unencrypted form.  The passing of
781	   authentication information in clear text has proven to be a security
782	   risk in almost every case where it has been used.

784	   Applications that render a URI for the sake of user feedback, such as
785	   in graphical hypertext browsing, should render userinfo in a way that
786	   is distinguished from the rest of a URI, when feasible.  Such
787	   rendering will assist the user in cases where the userinfo has been
788	   misleadingly crafted to look like a trusted domain name (Section
789	   7.6).

791	3.2.2  Host

793	   The host subcomponent of authority is identified by an IP literal
794	   encapsulated within square brackets, an IPv4 address in
795	   dotted-decimal form, or a registered name.  The host subcomponent is
796	   case-insensitive. The presence of a host subcomponent within a URI
797	   does not imply that the scheme requires access to the given host on
798	   the Internet. In many cases, the host syntax is used only for the
799	   sake of reusing the existing registration process created and
800	   deployed for DNS, thus obtaining a globally unique name without the
801	   cost of deploying another registry. However, such use comes with its
802	   own costs: domain name ownership may change over time for reasons not
803	   anticipated by the URI producer. In other cases, the data within the
804	   host component identifies a registered name that has nothing to do
805	   with an Internet host. We use the name "host" for the ABNF rule
806	   because that is its most common purpose, not its only purpose, and
807	   thus should not be considered as semantically limiting the data
808	   within it.

810	      host        = IP-literal / IPv4address / reg-name

812	   The syntax rule for host is ambiguous because it does not completely
813	   distinguish between an IPv4address and a reg-name.  In order to
814	   disambiguate the syntax, we apply the "first-match-wins" algorithm:
815	   If host matches the rule for IPv4address, then it should be
816	   considered an IPv4 address literal and not a reg-name.  Although host
817	   is case-insensitive, producers and normalizers should use lowercase
818	   for registered names and hexadecimal addresses for the sake of
819	   uniformity, while only using uppercase letters for percent-encodings.

821	   A host identified by an Internet Protocol literal address, version 6
822	   [RFC3513] or later, is distinguished by enclosing the IP literal
823	   within square brackets ("[" and "]").  This is the only place where
824	   square bracket characters are allowed in the URI syntax. In
825	   anticipation of future, as-yet-undefined IP literal address formats,
826	   an optional version flag may be used to indicate such a format
827	   explicitly rather than relying on heuristic determination.

829	      IP-literal = "[" ( IPv6address / IPvFuture  ) "]"

831	      IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

833	   The version flag does not indicate the IP version; rather, it
834	   indicates future versions of the literal format.  As such,
835	   implementations must not provide the version flag for existing IPv4
836	   and IPv6 literal addresses. If a URI containing an IP-literal that
837	   starts with "v" (case-insensitive), indicating that the version flag
838	   is present, is dereferenced by an application that does not know the
839	   meaning of that version flag, then the application should return an
840	   appropriate error for "address mechanism not supported".

842	   A host identified by an IPv6 literal address is represented inside
843	   the square brackets without a preceding version flag.  The ABNF
844	   provided here is a translation of the text definition of an IPv6
845	   literal address provided in [RFC3513]. A 128-bit IPv6 address is
846	   divided into eight 16-bit pieces. Each piece is represented
847	   numerically in case-insensitive hexadecimal, using one to four
848	   hexadecimal digits (leading zeroes are permitted). The eight encoded
849	   pieces are given most-significant first, separated by colon
850	   characters.  Optionally, the least-significant two pieces may instead
851	   be represented in IPv4 address textual format. A sequence of one or
852	   more consecutive zero-valued 16-bit pieces within the address may be
853	   elided, omitting all their digits and leaving exactly two consecutive
854	   colons in their place to mark the elision.

856	      IPv6address =                            6( h16 ":" ) ls32
857	                  /                       "::" 5( h16 ":" ) ls32
858	                  / [               h16 ] "::" 4( h16 ":" ) ls32
859	                  / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
860	                  / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
861	                  / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
862	                  / [ *4( h16 ":" ) h16 ] "::"              ls32
863	                  / [ *5( h16 ":" ) h16 ] "::"              h16
864	                  / [ *6( h16 ":" ) h16 ] "::"

866	      ls32        = ( h16 ":" h16 ) / IPv4address
867	                  ; least-significant 32 bits of address

869	      h16         = 1*4HEXDIG
870	                  ; 16 bits of address represented in hexadecimal
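
   The textual forms described above (elision with "::" and an IPv4
   suffix for the least-significant 32 bits) can be explored with
   Python's ipaddress module.  It is assumed here only to accept these
   common forms; it is not guaranteed to accept exactly the language
   defined by the ABNF above.  The function name is illustrative.

```python
import ipaddress


def ipv6_pieces(literal):
    """Return the eight 16-bit pieces of an IPv6 address, most significant first."""
    value = int(ipaddress.IPv6Address(literal))
    return [(value >> shift) & 0xFFFF for shift in range(112, -1, -16)]


if __name__ == "__main__":
    # "::" stands for the run of zero-valued pieces.
    print([format(piece, "x") for piece in ipv6_pieces("2001:db8::7")])
    # The last two pieces are written in IPv4 dotted-decimal form.
    print([format(piece, "x") for piece in ipv6_pieces("::ffff:192.0.2.1")])
```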

872	   A host identified by an IPv4 literal address is represented in
873	   dotted-decimal notation (a sequence of four decimal numbers in the
874	   range 0 to 255, separated by "."), as described in [RFC1123] by
875	   reference to [RFC0952].  Note that other forms of dotted notation may
876	   be interpreted on some platforms, as described in Section 7.4, but
877	   only the dotted-decimal form of four octets is allowed by this
878	   grammar.

880	      IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

882	      dec-octet   = DIGIT                 ; 0-9
883	                  / %x31-39 DIGIT         ; 10-99
884	                  / "1" 2DIGIT            ; 100-199
885	                  / "2" %x30-34 DIGIT     ; 200-249
886	                  / "25" %x30-35          ; 250-255
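
   The dec-octet rule and the "first-match-wins" disambiguation can be
   sketched with a regular expression transcribed from the ABNF above.
   The function name is an assumption, and the reg-name branch is a
   fallback that performs no validation of its own.

```python
import re

# dec-octet: 250-255 / 200-249 / 100-199 / 10-99 / 0-9 (no leading zeros)
_DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
_IPV4ADDRESS = re.compile(r"{o}\.{o}\.{o}\.{o}".format(o=_DEC_OCTET))


def classify_host(host):
    """Name the first host alternative that applies to the given string."""
    if host.startswith("["):
        return "IP-literal"
    if _IPV4ADDRESS.fullmatch(host):
        return "IPv4address"
    return "reg-name"


if __name__ == "__main__":
    print(classify_host("192.0.2.16"))   # IPv4address
    print(classify_host("192.0.2.256"))  # reg-name
```

   A string such as "192.0.2.256" does not match IPv4address, so under
   this grammar it can only be read as a registered name.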

888	   A host identified by a registered name is a sequence of characters
889	   that is usually intended for lookup within a locally-defined host or
890	   service name registry, though the URI's scheme-specific semantics may
891	   require that a specific registry (or fixed name table) be used
892	   instead. The most common name registry mechanism is the Domain Name
893	   System (DNS). A registered name intended for lookup in the DNS uses
894	   the syntax defined in Section 3.5 of [RFC1034] and Section 2.1 of
895	   [RFC1123].  Such a name consists of a sequence of domain labels
896	   separated by ".", each domain label starting and ending with an
897	   alphanumeric character and possibly also containing "-" characters.
898	   The rightmost domain label of a fully qualified domain name in DNS
899	   may be followed by a single "." and should be followed by one if it
900	   is necessary to distinguish between the complete domain name and some
901	   local domain.

903	      reg-name    = 0*255( unreserved / pct-encoded / sub-delims )

905	   If the URI scheme defines a default for host, then that default
906	   applies when the host subcomponent is undefined or when the
907	   registered name is empty (zero length).  For example, the "file" URI
908	   scheme is defined such that no authority, an empty host, and
909	   "localhost" all mean the end-user's machine, whereas the "http"
910	   scheme considers a missing authority or empty host to be invalid.

912	   This specification does not mandate a particular registered name
913	   lookup technology and therefore does not restrict the syntax of
914	   reg-name beyond that necessary for interoperability.  Instead, it
915	   delegates the issue of registered name syntax conformance to the
916	   operating system of each application performing URI resolution, and
917	   that operating system decides what it will allow for the purpose of
918	   host identification. A URI resolution implementation might use DNS,
919	   host tables, yellow pages, NetInfo, WINS, or any other system for
920	   lookup of registered names. However, a globally-scoped naming system,
921	   such as DNS fully-qualified domain names, is necessary for URIs that
922	   are intended to have global scope. URI producers should use names
923	   that conform to the DNS syntax, even when use of DNS is not
924	   immediately apparent.

926	   The reg-name syntax allows percent-encoded octets in order to
927	   represent non-ASCII registered names in a uniform way that is
928	   independent of the underlying name resolution technology; such
929	   non-ASCII characters must first be encoded according to UTF-8
930	   [RFC3629] and then each octet of the corresponding UTF-8 sequence
931	   must be percent-encoded to be represented as URI characters.  URI
932	   producing applications must not use percent-encoding in host unless
933	   it is used to represent a UTF-8 character sequence.  When a non-ASCII
934	   registered name represents an internationalized domain name intended
935	   for resolution via the DNS, the name must be transformed to the IDNA
936	   encoding [RFC3490] prior to name lookup.  URI producers should
937	   provide such registered names in the IDNA encoding, rather than a
938	   percent-encoding, if they wish to maximize interoperability with
939	   legacy URI resolvers.
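
   The two representations of a non-ASCII registered name can be
   compared with the sketch below.  The sample name is hypothetical.
   Python's built-in "idna" codec is used for the IDNA form on the
   understanding that it follows [RFC3490]; the function name is
   illustrative.

```python
from urllib.parse import quote


def host_forms(name):
    """Return the percent-encoded UTF-8 form and the IDNA form of a name."""
    return {
        "pct-encoded": quote(name, safe=""),
        "idna": name.encode("idna").decode("ascii"),
    }


if __name__ == "__main__":
    forms = host_forms("b\u00FCcher.example")
    print(forms["pct-encoded"])  # b%C3%BCcher.example
    print(forms["idna"])         # xn--bcher-kva.example
```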

941	3.2.3  Port

943	   The port subcomponent of authority is designated by an optional port
944	   number in decimal following the host and delimited from it by a
945	   single colon (":") character.

947	      port        = *DIGIT

949	   A scheme may define a default port.  For example, the "http" scheme
950	   defines a default port of "80", corresponding to its reserved TCP
951	   port number. The type of port designated by the port number (e.g.,
952	   TCP, UDP, SCTP, etc.) is defined by the URI scheme.  URI producers
953	   and normalizers should omit the port component and its ":" delimiter
954	   if port is empty or its value would be the same as the scheme's
955	   default.
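
   The port normalization described above can be sketched as follows.
   The table of defaults contains only the "http" value named in the
   text; a real implementation would need the defaults of every scheme
   it supports.  The names used are assumptions.

```python
# Only the default stated in the text; other schemes are not listed here.
DEFAULT_PORTS = {"http": "80"}


def normalize_port(scheme, port):
    """Return the port to emit, or None if port and ":" should be omitted."""
    if port is None or port == "":
        return None
    default = DEFAULT_PORTS.get(scheme.lower())
    # Compare by value, so "0080" is also recognized as the default.
    if default is not None and int(port) == int(default):
        return None
    return port


if __name__ == "__main__":
    print(normalize_port("http", "80"))    # None
    print(normalize_port("http", "8042"))  # 8042
```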

957	3.3  Path

959	   The path component contains data, usually organized in hierarchical
960	   form, that, along with data in the non-hierarchical query component
961	   (Section 3.4), serves to identify a resource within the scope of the
962	   URI's scheme and naming authority (if any). The path is terminated by
963	   the first question mark ("?") or number sign ("#") character, or by
964	   the end of the URI.

966	   If a URI contains an authority component, then the path component
967	   must either be empty or begin with a slash ("/") character. If a URI
968	   does not contain an authority component, then the path cannot begin
969	   with two slash characters ("//"). In addition, a URI reference
970	   (Section 4.1) may begin with a relative path, in which case the first
971	   path segment cannot contain a colon (":") character.  The ABNF
972	   requires five separate rules to disambiguate these cases, only one of
973	   which will match a given URI reference.  We use the generic term
974	   "path component" to describe the URI substring that is matched by the
975	   parser to one of these rules.

977	      path          = path-abempty    ; begins with "/" or is empty
978	                    / path-abs        ; begins with "/" but not "//"
979	                    / path-noscheme   ; begins with a non-colon segment
980	                    / path-rootless   ; begins with a segment
981	                    / path-empty      ; zero characters

983	      path-abempty  = *( "/" segment )
984	      path-abs      = "/" [ segment-nz *( "/" segment ) ]
985	      path-noscheme = segment-nzc *( "/" segment )
986	      path-rootless = segment-nz *( "/" segment )
987	      path-empty    = 0<pchar>

989	      segment       = *pchar
990	      segment-nz    = 1*pchar
991	      segment-nzc   = 1*( unreserved / pct-encoded / sub-delims / "@" )

993	      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
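
   The five path rules can be transcribed into regular expressions as
   sketched below.  Taken in isolation, a string may match more than
   one rule; it is the surrounding context (whether an authority is
   present, and whether the string is a relative reference) that
   selects the single applicable rule.  All names are illustrative.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# Same as pchar but without ":", as used by segment-nzc.
_PCHAR_NC = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})"

PATH_RULES = [
    ("path-abempty", rf"(?:/{_PCHAR}*)*"),
    ("path-abs", rf"/(?:{_PCHAR}+(?:/{_PCHAR}*)*)?"),
    ("path-noscheme", rf"{_PCHAR_NC}+(?:/{_PCHAR}*)*"),
    ("path-rootless", rf"{_PCHAR}+(?:/{_PCHAR}*)*"),
    ("path-empty", r""),
]


def matching_rules(path):
    """List every path rule whose pattern matches the whole string."""
    return [name for name, pattern in PATH_RULES if re.fullmatch(pattern, path)]


if __name__ == "__main__":
    for sample in ["/over/there", "", "//x", "a:b/c", "fred@example.com"]:
        print(repr(sample), matching_rules(sample))
```

   For instance, "//x" matches only path-abempty, which is why a path
   beginning with "//" is possible only when an authority is present.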

995	   A path consists of a sequence of path segments separated by a slash
996	   ("/") character.  A path is always defined for a URI, though the
997	   defined path may be empty (zero length).  Use of the slash character
998	   to indicate hierarchy is only required when a URI will be used as the
999	   context for relative references.  For example, the URI
1000	   <mailto:fred@example.com> has a path of "fred@example.com", whereas
1001	   the URI <foo://info.example.com?fred> has an empty path.

1003	   The path segments "." and "..", also known as dot-segments, are
1004	   defined for relative reference within the path name hierarchy. They
1005	   are intended for use at the beginning of a relative path reference
1006	   (Section 4.2) for indicating relative position within the
1007	   hierarchical tree of names.  This is similar to their role within
1008	   some operating systems' file directory structure to indicate the
1009	   current directory and parent directory, respectively. However, unlike
1010	   a file system, these dot-segments are only interpreted within the URI
1011	   path hierarchy and are removed as part of the resolution process
1012	   (Section 5.2).

1014	   Aside from dot-segments in hierarchical paths, a path segment is
1015	   considered opaque by the generic syntax.  URI-producing applications
1016	   often use the reserved characters allowed in a segment for the
1017	   purpose of delimiting scheme-specific or dereference-handler-specific
1018	   subcomponents. For example, the semicolon (";") and equals ("=")
1019	   reserved characters are often used for delimiting parameters and
1020	   parameter values applicable to that segment.  The comma (",")
1021	   reserved character is often used for similar purposes.  For example,
1022	   one URI producer might use a segment like "name;v=1.1" to indicate a
1023	   reference to version 1.1 of "name", whereas another might use a
1024	   segment like "name,1.1" to indicate the same. Parameter types may be
1025	   defined by scheme-specific semantics, but in most cases the syntax of
1026	   a parameter is specific to the implementation of the URI's
1027	   dereferencing algorithm.

1029	3.4  Query

1031	   The query component contains non-hierarchical data that, along with
1032	   data in the path component (Section 3.3), serves to identify a
1033	   resource within the scope of the URI's scheme and naming authority
1034	   (if any). The query component is indicated by the first question mark
1035	   ("?") character and terminated by a number sign ("#") character or by
1036	   the end of the URI.

1038	      query       = *( pchar / "/" / "?" )

1040	   The characters slash ("/") and question mark ("?") may represent data
1041	   within the query component.  Beware that some older, erroneous
1042	   implementations do not handle such URIs correctly when they are used
1043	   as the base for relative references (Section 5.1), apparently because
1044	   they fail to distinguish query data from path data when looking
1045	   for hierarchical separators. However, since query components are
1046	   often used to carry identifying information in the form of
1047	   "key=value" pairs, and one frequently used value is a reference to
1048	   another URI, it is sometimes better for usability to avoid
1049	   percent-encoding those characters.
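
As an illustration of slash and question mark characters carried as
query data, the sketch below uses urlsplit from Python's standard
urllib.parse module. The example URI and the "ref" key are
hypothetical; the point is only that the query extends from the first
"?" to the "#" (or the end of the URI), so an embedded URI need not be
percent-encoded to be recovered intact.

```python
from urllib.parse import urlsplit

# Hypothetical URI whose query value is itself a URI containing "/" and "?".
uri = "http://example.com/lookup?ref=http://other.example/a/b?x=1#top"

parts = urlsplit(uri)
path = parts.path          # "/lookup"
query = parts.query        # everything between the first "?" and the "#"
fragment = parts.fragment  # "top"
```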

1051	3.5  Fragment

1053	   The fragment identifier component of a URI allows indirect
1054	   identification of a secondary resource by reference to a primary
1055	   resource and additional identifying information.  The identified
1056	   secondary resource may be some portion or subset of the primary
1057	   resource, some view on representations of the primary resource, or
1058	   some other resource defined or described by those representations.  A
1059	   fragment identifier component is indicated by the presence of a
1060	   number sign ("#") character and terminated by the end of the URI.

1062	      fragment    = *( pchar / "/" / "?" )

1064	   The semantics of a fragment identifier are defined by the set of
1065	   representations that might result from a retrieval action on the
1066	   primary resource. The fragment's format and resolution are therefore
1067	   dependent on the media type [RFC2046] of a potentially retrieved
1068	   representation, even though such a retrieval is only performed if the
1069	   URI is dereferenced.  If no such representation exists, then the
1070	   semantics of the fragment are considered unknown and, effectively,
1071	   unconstrained. Fragment identifier semantics are independent of the
1072	   URI scheme and thus cannot be redefined by scheme specifications.

1074	   Individual media types may define their own restrictions on, or
1075	   structure within, the fragment identifier syntax for specifying
1076	   different types of subsets, views, or external references that are
1077	   identifiable as secondary resources by that media type.  If the
1078	   primary resource has multiple representations, as is often the case
1079	   for resources whose representation is selected based on attributes of
1080	   the retrieval request (a.k.a., content negotiation), then whatever is
1081	   identified by the fragment should be consistent across all of those
1082	   representations: each representation should either define the
1083	   fragment such that it corresponds to the same secondary resource,
1084	   regardless of how it is represented, or the fragment should be left
1085	   undefined by the representation (i.e., not found).

1087	   As with any URI, use of a fragment identifier component does not
1088	   imply that a retrieval action will take place.  A URI with a fragment
1089	   identifier may be used to refer to the secondary resource without any
1090	   implication that the primary resource is accessible or will ever be
1091	   accessed.

1093	   Fragment identifiers have a special role in information systems as
1094	   the primary form of client-side indirect referencing, allowing an
1095	   author to specifically identify those aspects of an existing resource
1096	   that are only indirectly provided by the resource owner. As such, the
1097	   fragment identifier is not used in the scheme-specific processing of
1098	   a URI; instead, the fragment identifier is separated from the rest of
1099	   the URI prior to a dereference, and thus the identifying information
1100	   within the fragment itself is dereferenced solely by the user agent
1101	   and regardless of the URI scheme. Although this separate handling is
1102	   often perceived to be a loss of information, particularly with regard
1103	   to accurate redirection of references as resources move over time, it
1104	   also serves to prevent information providers from denying reference
1105	   authors the right to selectively refer to information within a
1106	   resource.  Indirect referencing also provides additional flexibility
1107	   and extensibility to systems that use URIs, since new media types are
1108	   easier to define and deploy than new schemes of identification.

1110	   The characters slash ("/") and question mark ("?") are allowed to
1111	   represent data within the fragment identifier.  Beware that some
1112	   older, erroneous implementations do not handle such URIs correctly
1113	   when they are used as the base for relative references (Section 5.1).

1115	4.  Usage

1117	   When applications make reference to a URI, they do not always use the
1118	   full form of reference defined by the "URI" syntax rule. In order to
1119	   save space and take advantage of hierarchical locality, many Internet
1120	   protocol elements and media type formats allow an abbreviation of a
1121	   URI, while others restrict the syntax to a particular form of URI.
1122	   We define the most common forms of reference syntax in this
1123	   specification because they impact and depend upon the design of the
1124	   generic syntax, requiring a uniform parsing algorithm in order to be
1125	   interpreted consistently.

1127	4.1  URI Reference

1129	   URI-reference is used to denote the most common usage of a resource
1130	   identifier.

1132	      URI-reference = URI / relative-URI

1134	   A URI-reference is either a URI or a relative-URI: if the reference's
1135	   prefix matches the syntax of a scheme followed by its colon
1136	   separator, then the reference is a URI; otherwise, it is relative.

1138	   A URI-reference is typically parsed first into the five URI
1139	   components, in order to determine what components are present and
1140	   whether or not the reference is relative, after which each component
1141	   is parsed for its subparts and their validation.  The ABNF of
1142	   URI-reference, along with the "first-match-wins" disambiguation rule,
1143	   is sufficient to define a validating parser for the generic syntax.
1144	   Readers familiar with regular expressions should see Appendix B for
1145	   an example of a non-validating URI-reference parser that will take
1146	   any given string and extract the URI components.
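
A non-validating split into the five components might look like the
following sketch. The regular expression is the one given in Appendix
B of the published RFC 3986; Appendix B of this draft is not shown
here, so treating the two as identical is an assumption. The function
name split_uri_reference is hypothetical. A component whose delimiter
is absent is returned as None (undefined), which is distinct from an
empty string.

```python
import re

# Regular expression from RFC 3986, Appendix B (assumed to match this draft's).
_URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(ref):
    """Split a URI reference into (scheme, authority, path, query, fragment).

    No validation is performed; any string is accepted.
    """
    m = _URI_REFERENCE.match(ref)
    return (m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))
```

Applied to <foo://info.example.com?fred>, this yields an empty (but
defined) path and an undefined fragment.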

1148	4.2  Relative URI

1150	   A relative URI reference takes advantage of the hierarchical syntax
1151	   (Section 1.2.3) in order to express a reference that is relative to
1152	   the name space of another hierarchical URI.

1154	      relative-URI  = relative-part [ "?" query ] [ "#" fragment ]

1156	      relative-part = "//" authority path-abempty
1157	                    / path-abs
1158	                    / path-noscheme
1159	                    / path-empty

1161	   The URI referred to by a relative reference, also known as the target
1162	   URI, is obtained by applying the reference resolution algorithm of
1163	   Section 5.

1165	   A relative reference that begins with two slash characters is termed
1166	   a network-path reference; such references are rarely used. A relative
1167	   reference that begins with a single slash character is termed an
1168	   absolute-path reference.  A relative reference that does not begin
1169	   with a slash character is termed a relative-path reference.

1171	   A path segment that contains a colon character (e.g., "this:that")
1172	   cannot be used as the first segment of a relative-path reference
1173	   because it would be mistaken for a scheme name.  Such a segment must
1174	   be preceded by a dot-segment (e.g., "./this:that") to make a
1175	   relative-path reference.
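
A producer of relative-path references could apply this rule
mechanically. The helper below is a sketch only, and its name is
hypothetical: it prepends "./" when the first segment of a path
contains a colon and leaves every other path unchanged.

```python
def as_relative_path_reference(path):
    """Prefix "./" if the first path segment would look like a scheme."""
    first_segment = path.split("/", 1)[0]
    if ":" in first_segment:
        return "./" + path
    return path
```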

1177	4.3  Absolute URI

1179	   Some protocol elements allow only the absolute form of a URI without
1180	   a fragment identifier.  For example, defining a base URI for later
1181	   use by relative references calls for an absolute-URI syntax rule that
1182	   does not allow a fragment.

1184	      absolute-URI  = scheme ":" hier-part [ "?" query ]

1186	4.4  Same-document Reference

1188	   When a URI reference refers to a URI that is, aside from its fragment
1189	   component (if any), identical to the base URI (Section 5.1), that
1190	   reference is called a "same-document" reference.  The most frequent
1191	   examples of same-document references are relative references that are
1192	   empty or include only the number sign ("#") separator followed by a
1193	   fragment identifier.
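
The definition above can be sketched as a simple string test. This
example assumes that the reference has already been resolved to its
target URI and that no normalization is applied before comparison;
the function name is hypothetical.

```python
def is_same_document(target_uri, base_uri):
    """True if the two URIs are identical aside from any fragment."""
    return target_uri.partition("#")[0] == base_uri.partition("#")[0]
```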

1195	   When a same-document reference is dereferenced for the purpose of a
1196	   retrieval action, the target of that reference is defined to be
1197	   within the same entity (representation, document, or message) as the
1198	   reference; therefore, a dereference should not result in a new
1199	   retrieval action.

1201	   Normalization of the base and target URIs prior to their comparison,
1202	   as described in Section 6.2.2 and Section 6.2.3, is allowed but
1203	   rarely performed in practice.  Normalization may increase the set of
1204	   same-document references, which may be of benefit to some caching
1205	   applications. As such, reference authors should not assume that a
1206	   slightly different, though equivalent, reference URI will (or will
1207	   not) be interpreted as a same-document reference by any given
1208	   application.

1210	4.5  Suffix Reference

1212	   The URI syntax is designed for unambiguous reference to resources and
1213	   extensibility via the URI scheme.  However, as URI identification and
1214	   usage have become commonplace, traditional media (television, radio,
1215	   newspapers, billboards, etc.) have increasingly used a suffix of the
1216	   URI as a reference, consisting of only the authority and path
1217	   portions of the URI, such as

1219	      www.w3.org/Addressing/

1221	   or simply a DNS registered name on its own.  Such references are
1222	   primarily intended for human interpretation, rather than for
1223	   machines, with the assumption that context-based heuristics are
1224	   sufficient to complete the URI (e.g., most registered names beginning
1225	   with "www" are likely to have a URI prefix of "http://").  Although
1226	   there is no standard set of heuristics for disambiguating a URI
1227	   suffix, many client implementations allow them to be entered by the
1228	   user and heuristically resolved.

1230	   While this practice of using suffix references is common, it should
1231	   be avoided whenever possible and never used in situations where
1232	   long-term references are expected.  The heuristics noted above will
1233	   change over time, particularly when a new URI scheme becomes popular,
1234	   and are often incorrect when used out of context.  Furthermore, they
1235	   can lead to security issues along the lines of those described in
1236	   [RFC1535].

1238	   Since a URI suffix has the same syntax as a relative-path reference,
1239	   a suffix reference cannot be used in contexts where a relative
1240	   reference is expected.  As a result, suffix references are limited to
1241	   those places where there is no defined base URI, such as dialog boxes
1242	   and off-line advertisements.

1244	5.  Reference Resolution

1246	   This section defines the process of resolving a URI reference within
1247	   a context that allows relative references, such that the result is a
1248	   string matching the "URI" syntax rule of Section 3.

1250	5.1  Establishing a Base URI

1252	   The term "relative" implies that there exists a "base URI" against
1253	   which the relative reference is applied.  Aside from fragment-only
1254	   references (Section 4.4), relative references are only usable when a
1255	   base URI is known.  A base URI must be established by the parser
1256	   prior to parsing URI references that might be relative.

1258	   The base URI of a reference can be established in one of four ways,
1259	   discussed below in order of precedence.  The order of precedence can
1260	   be thought of in terms of layers, where the innermost defined base
1261	   URI has the highest precedence.  This can be visualized graphically
1262	   as:

1264	      .----------------------------------------------------------.
1265	      |  .----------------------------------------------------.  |
1266	      |  |  .----------------------------------------------.  |  |
1267	      |  |  |  .----------------------------------------.  |  |  |
1268	      |  |  |  |  .----------------------------------.  |  |  |  |
1269	      |  |  |  |  |       <relative-reference>       |  |  |  |  |
1270	      |  |  |  |  `----------------------------------'  |  |  |  |
1271	      |  |  |  | (5.1.1) Base URI embedded in content   |  |  |  |
1272	      |  |  |  `----------------------------------------'  |  |  |
1273	      |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
1274	      |  |  |         (message, representation, or none)   |  |  |
1275	      |  |  `----------------------------------------------'  |  |
1276	      |  | (5.1.3) URI used to retrieve the entity            |  |
1277	      |  `----------------------------------------------------'  |
1278	      | (5.1.4) Default Base URI (application-dependent)         |
1279	      `----------------------------------------------------------'
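
The layering in the diagram amounts to taking the first base URI
that is defined, starting from the innermost layer. The sketch below
assumes the caller has already extracted each candidate (or None when
a layer supplies nothing); the function and parameter names are
hypothetical.

```python
def establish_base_uri(embedded=None, encapsulating=None,
                       retrieval=None, default=None):
    """Pick the base URI by precedence: 5.1.1, 5.1.2, 5.1.3, then 5.1.4."""
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate is not None:
            return candidate
    return None  # no base URI: only fragment-only references are usable
```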

1281	5.1.1  Base URI Embedded in Content

1283	   Within certain media types, a base URI for relative references can be
1284	   embedded within the content itself such that it can be readily
1285	   obtained by a parser.  This can be useful for descriptive documents,
1286	   such as tables of contents, which may be transmitted to others through
1287	   protocols other than their usual retrieval context (e.g., E-Mail or
1288	   USENET news).

1290	   It is beyond the scope of this specification to specify how, for each
1291	   media type, a base URI can be embedded.  The appropriate syntax, when
1292	   available, is described by the data format specification associated
1293	   with each media type.

1295	5.1.2  Base URI from the Encapsulating Entity

1297	   If no base URI is embedded, the base URI is defined by the
1298	   representation's retrieval context.  For a document that is enclosed
1299	   within another entity, such as a message or archive, the retrieval
1300	   context is that entity; thus, the default base URI of a
1301	   representation is the base URI of the entity in which the
1302	   representation is encapsulated.

1304	   A mechanism for embedding a base URI within MIME container types
1305	   (e.g., the message and multipart types) is defined by MHTML
1306	   [RFC2557].  Protocols that do not use the MIME message header syntax,
1307	   but do allow some form of tagged metadata to be included within
1308	   messages, may define their own syntax for defining a base URI as part
1309	   of a message.

1311	5.1.3  Base URI from the Retrieval URI

1313	   If no base URI is embedded and the representation is not encapsulated
1314	   within some other entity, then, if a URI was used to retrieve the
1315	   representation, that URI shall be considered the base URI. Note that
1316	   if the retrieval was the result of a redirected request, the last URI
1317	   used (i.e., the URI that resulted in the actual retrieval of the
1318	   representation) is the base URI.

1320	5.1.4  Default Base URI

1322	   If none of the conditions described above apply, then the base URI is
1323	   defined by the context of the application. Since this definition is
1324	   necessarily application-dependent, failing to define a base URI using
1325	   one of the other methods may result in the same content being
1326	   interpreted differently by different types of application.

1328	   A sender of a representation containing relative references is
1329	   responsible for ensuring that a base URI for those references can be
1330	   established. Aside from fragment-only references, relative references
1331	   can only be used reliably in situations where the base URI is
1332	   well-defined.

1334	5.2  Relative Resolution

1336	   This section describes an algorithm for converting a URI reference
1337	   that might be relative to a given base URI into the parsed components
1338	   of the reference's target.  The components can then be recomposed, as
1339	   described in Section 5.3, to form the target URI. This algorithm
1340	   provides definitive results that can be used to test the output of
1341	   other implementations.  Applications may implement relative reference
1342	   resolution using some other algorithm, provided that the results
1343	   match what would be given by this algorithm.

1345	5.2.1  Pre-parse the Base URI

1347	   The base URI (Base) is established according to the procedure of
1348	   Section 5.1 and parsed into the five main components described in
1349	   Section 3.  Note that only the scheme component is required to be
1350	   present in a base URI; the other components may be empty or
1351	   undefined.  A component is undefined if its associated delimiter does
1352	   not appear in the URI reference; the path component is never
1353	   undefined, though it may be empty.

1355	   Normalization of the base URI, as described in Section 6.2.2 and
1356	   Section 6.2.3, is optional.  A URI reference must be transformed to
1357	   its target URI before it can be normalized.

1359	5.2.2  Transform References

1361	   For each URI reference (R), the following pseudocode describes an
1362	   algorithm for transforming R into its target URI (T):

1364	      -- The URI reference is parsed into the five URI components
1365	      --
1366	      (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);

1368	      -- A non-strict parser may ignore a scheme in the reference
1369	      -- if it is identical to the base URI's scheme.
1370	      --
1371	      if ((not strict) and (R.scheme == Base.scheme)) then
1372	         undefine(R.scheme);
1373	      endif;
1374	      if defined(R.scheme) then
1375	         T.scheme    = R.scheme;
1376	         T.authority = R.authority;
1377	         T.path      = remove_dot_segments(R.path);
1378	         T.query     = R.query;
1379	      else
1380	         if defined(R.authority) then
1381	            T.authority = R.authority;
1382	            T.path      = remove_dot_segments(R.path);
1383	            T.query     = R.query;
1384	         else
1385	            if (R.path == "") then
1386	               T.path = Base.path;
1387	               if defined(R.query) then
1388	                  T.query = R.query;
1389	               else
1390	                  T.query = Base.query;
1391	               endif;
1392	            else
1393	               if (R.path starts-with "/") then
1394	                  T.path = remove_dot_segments(R.path);
1395	               else
1396	                  T.path = merge(Base.path, R.path);
1397	                  T.path = remove_dot_segments(T.path);
1398	               endif;
1399	               T.query = R.query;
1400	            endif;
1401	            T.authority = Base.authority;
1402	         endif;
1403	         T.scheme = Base.scheme;
1404	      endif;

1406	      T.fragment = R.fragment;
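
The pseudocode above translates fairly directly into Python. The
following is a sketch under stated assumptions rather than a reference
implementation: the component splitter uses the non-validating regular
expression style of Appendix B, undefined components are represented
as None, and the helpers follow the descriptions in Sections 5.2.3 and
5.2.4. The names Parts, parse, and transform are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_SPLIT = re.compile(
    r"(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?", re.S
)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def parse(ref: str) -> Parts:
    """Non-validating split into the five components (None = undefined)."""
    return Parts(*_SPLIT.fullmatch(ref).groups())


def merge(base: Parts, ref_path: str) -> str:
    """Section 5.2.3: merge a relative path with the base path."""
    if base.authority is not None and base.path == "":
        return "/" + ref_path
    return base.path[: base.path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """Section 5.2.4: two-buffer removal of "." and ".." segments."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            out = out[: max(out.rfind("/"), 0)]  # drop last output segment
        elif inp in (".", ".."):
            inp = ""
        else:
            end = inp.find("/", 1)
            end = len(inp) if end == -1 else end
            out, inp = out + inp[:end], inp[end:]
    return out


def transform(base_uri: str, ref: str, strict: bool = True) -> Parts:
    """Section 5.2.2: compute the target components T for reference R."""
    base, r = parse(base_uri), parse(ref)
    scheme = r.scheme
    if not strict and scheme == base.scheme:
        scheme = None  # non-strict parsers ignore a scheme equal to the base's
    if scheme is not None:
        return Parts(scheme, r.authority, remove_dot_segments(r.path),
                     r.query, r.fragment)
    if r.authority is not None:
        return Parts(base.scheme, r.authority, remove_dot_segments(r.path),
                     r.query, r.fragment)
    if r.path == "":
        path = base.path
        query = r.query if r.query is not None else base.query
    else:
        if r.path.startswith("/"):
            path = remove_dot_segments(r.path)
        else:
            path = remove_dot_segments(merge(base, r.path))
        query = r.query
    return Parts(base.scheme, base.authority, path, query, r.fragment)
```

The result is a tuple of components, matching the pseudocode, which
leaves recomposition into a string to Section 5.3.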

1408	5.2.3  Merge Paths

1410	   The pseudocode above refers to a "merge" routine for merging a
1411	   relative-path reference with the path of the base URI.  This is
1412	   accomplished as follows:

1414	   o  If the base URI has a defined authority component and an empty
1415	      path, then return a string consisting of "/" concatenated with the
1416	      reference's path; otherwise,

1418	   o  Return a string consisting of the reference's path component
1419	      appended to all but the last segment of the base URI's path (i.e.,
1420	      excluding any characters after the right-most "/" in the base URI
1421	      path, or excluding the entire base URI path if it does not contain
1422	      any "/" characters).

1424	5.2.4  Remove Dot Segments

1426	   The pseudocode also refers to a "remove_dot_segments" routine for
1427	   interpreting and removing the special "." and ".." complete path
1428	   segments from a referenced path.  This is done after the path is
1429	   extracted from a reference, whether or not the path was relative, in
1430	   order to remove any invalid or extraneous dot-segments prior to
1431	   forming the target URI.  Although there are many ways to accomplish
1432	   this removal process, we describe a simple method using two string
1433	   buffers.

1435	   1.  The input buffer is initialized with the now-appended path
1436	       components and the output buffer is initialized to the empty
1437	       string.

1439	   2.  While the input buffer is not empty, loop:

1441	       a.  If the input buffer begins with a prefix of "../" or "./",
1442	           then remove that prefix from the input buffer; otherwise,

1444	       b.  If the input buffer begins with a prefix of "/./" or "/.",
1445	           where "." is a complete path segment, then replace that
1446	           prefix with "/" in the input buffer; otherwise,

1448	       c.  If the input buffer begins with a prefix of "/../" or "/..",
1449	           where ".." is a complete path segment, then replace that
1450	           prefix with "/" in the input buffer and remove the last
1451	           segment and its preceding "/" (if any) from the output
1452	           buffer; otherwise,

1454	       d.  If the input buffer consists only of "." or "..", then remove
1455	           that from the input buffer; otherwise,

1457	       e.  Move the first path segment in the input buffer to the end of
1458	           the output buffer, including the initial "/" character (if
1459	           any) and any subsequent characters up to, but not including,
1460	           the next "/" character or the end of the input buffer.

1462	   3.  Finally, the output buffer is returned as the result of
1463	       remove_dot_segments.

1465	   Note that dot-segments are intended for use in URI references to
1466	   express an identifier relative to the hierarchy of names in the base
1467	   URI.  The remove_dot_segments algorithm respects that hierarchy by
1468	   removing extra dot-segments rather than treating them as an error or
1469	   leaving them to be misinterpreted by dereference implementations.

1471	   The following illustrates how the above steps are applied for two
1472	   example merged paths, showing the state of the two buffers after each
1473	   step.

1475	      STEP   OUTPUT BUFFER         INPUT BUFFER

1477	       1 :                         /a/b/c/./../../g
1478	       2e:   /a                    /b/c/./../../g
1479	       2e:   /a/b                  /c/./../../g
1480	       2e:   /a/b/c                /./../../g
1481	       2b:   /a/b/c                /../../g
1482	       2c:   /a/b                  /../g
1483	       2c:   /a                    /g
1484	       2e:   /a/g

1486	      STEP   OUTPUT BUFFER         INPUT BUFFER

1488	       1 :                         mid/content=5/../6
1489	       2e:   mid                   /content=5/../6
1490	       2e:   mid/content=5         /../6
1491	       2c:   mid                   /6
1492	       2e:   mid/6
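
The two traces above can be reproduced with a small instrumented
version of the algorithm. This is a sketch for checking an
implementation against the tables, not part of the algorithm itself;
the function name and the shape of the returned trace are
hypothetical. Each recorded step is a tuple of (step label, output
buffer, input buffer).

```python
def remove_dot_segments_trace(path):
    """Run the two-buffer algorithm and record the buffers after each step."""
    inp, out = path, ""
    steps = [("1", out, inp)]
    while inp:
        if inp.startswith(("../", "./")):
            label = "2a"
            inp = inp[inp.index("/") + 1:]
        elif inp.startswith("/./") or inp == "/.":
            label = "2b"
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            label = "2c"
            inp = "/" + inp[4:]
            out = out[: max(out.rfind("/"), 0)]  # remove last output segment
        elif inp in (".", ".."):
            label, inp = "2d", ""
        else:
            label = "2e"
            end = inp.find("/", 1)
            end = len(inp) if end == -1 else end
            out, inp = out + inp[:end], inp[end:]
        steps.append((label, out, inp))
    return out, steps
```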

1494	   Some applications may find it more efficient to implement the
1495	   remove_dot_segments algorithm using two segment stacks rather than
1496	   strings.

1498	      Note: Beware that some older, erroneous implementations will fail
1499	      to separate a reference's query component from its path component
1500	      prior to merging the base and reference paths, resulting in an
1501	      interoperability failure if the query component contains the
1502	      strings "/../" or "/./".

1504	5.3  Component Recomposition

1506	   Parsed URI components can be recomposed to obtain the corresponding
1507	   URI reference string.  Using pseudocode, this would be:

1509	      result = ""

1511	      if defined(scheme) then
1512	         append scheme to result;
1513	         append ":" to result;
1514	      endif;

1516	      if defined(authority) then
1517	         append "//" to result;
1518	         append authority to result;
1519	      endif;

1521	      append path to result;

1523	      if defined(query) then
1524	         append "?" to result;
1525	         append query to result;
1526	      endif;

1528	      if defined(fragment) then
1529	         append "#" to result;
1530	         append fragment to result;
1531	      endif;

1533	      return result;

1535	   Note that we are careful to preserve the distinction between a
1536	   component that is undefined, meaning that its separator was not
1537	   present in the reference, and a component that is empty, meaning that
1538	   the separator was present and was immediately followed by the next
1539	   component separator or the end of the reference.
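
In a language with a distinguished null value, the undefined/empty
distinction can be carried directly. The sketch below assumes that an
undefined component is represented as None and an empty component as
the empty string; the function name is hypothetical.

```python
def recompose(scheme, authority, path, query, fragment):
    """Section 5.3: rebuild a URI reference string from its components."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path  # the path is never undefined, though it may be empty
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

An empty query therefore round-trips as a trailing "?", whereas an
undefined query produces no separator at all.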

1541	5.4  Reference Resolution Examples

1543	   Within a representation with a well-defined base URI of

1545	      http://a/b/c/d;p?q

1547	   a relative URI reference is transformed to its target URI as follows.

1549	5.4.1  Normal Examples

1551	      "g:h"           =  "g:h"
1552	      "g"             =  "http://a/b/c/g"
1553	      "./g"           =  "http://a/b/c/g"
1554	      "g/"            =  "http://a/b/c/g/"
1555	      "/g"            =  "http://a/g"
1556	      "//g"           =  "http://g"
1557	      "?y"            =  "http://a/b/c/d;p?y"
1558	      "g?y"           =  "http://a/b/c/g?y"
1559	      "#s"            =  "http://a/b/c/d;p?q#s"
1560	      "g#s"           =  "http://a/b/c/g#s"
1561	      "g?y#s"         =  "http://a/b/c/g?y#s"
1562	      ";x"            =  "http://a/b/c/;x"
1563	      "g;x"           =  "http://a/b/c/g;x"
1564	      "g;x?y#s"       =  "http://a/b/c/g;x?y#s"
1565	      ""              =  "http://a/b/c/d;p?q"
1566	      "."             =  "http://a/b/c/"
1567	      "./"            =  "http://a/b/c/"
1568	      ".."            =  "http://a/b/"
1569	      "../"           =  "http://a/b/"
1570	      "../g"          =  "http://a/b/g"
1571	      "../.."         =  "http://a/"
1572	      "../../"        =  "http://a/"
1573	      "../../g"       =  "http://a/g"

1575	5.4.2  Abnormal Examples

1577	   Although the following abnormal examples are unlikely to occur in
1578	   normal practice, all URI parsers should be capable of resolving them
1579	   consistently.  Each example uses the same base as above.

1581	   Parsers must be careful in handling cases where there are more
1582	   relative path ".." segments than there are hierarchical levels in the
1583	   base URI's path.  Note that the ".." syntax cannot be used to change
1584	   the authority component of a URI.

1586	      "../../../g"    =  "http://a/g"
1587	      "../../../../g" =  "http://a/g"

1589	   Similarly, parsers must remove the dot-segments "." and ".." when
1590	   they are complete components of a path, but not when they are only
1591	   part of a segment.

1593	      "/./g"          =  "http://a/g"
1594	      "/../g"         =  "http://a/g"
1595	      "g."            =  "http://a/b/c/g."
1596	      ".g"            =  "http://a/b/c/.g"
1597	      "g.."           =  "http://a/b/c/g.."
1598	      "..g"           =  "http://a/b/c/..g"

1600	   Less likely are cases where the relative URI reference uses
1601	   unnecessary or nonsensical forms of the "." and ".." complete path
1602	   segments.

1604	      "./../g"        =  "http://a/b/g"
1605	      "./g/."         =  "http://a/b/c/g/"
1606	      "g/./h"         =  "http://a/b/c/g/h"
1607	      "g/../h"        =  "http://a/b/c/h"
1608	      "g;x=1/./y"     =  "http://a/b/c/g;x=1/y"
1609	      "g;x=1/../y"    =  "http://a/b/c/y"

1611	   Some applications fail to separate the reference's query and/or
1612	   fragment components from a relative path before merging it with the
1613	   base path and removing dot-segments.  This error is rarely noticed,
1614	   since typical usage of a fragment never includes the hierarchy ("/")
1615	   character, and the query component is not normally used within
1616	   relative references.

1618	      "g?y/./x"       =  "http://a/b/c/g?y/./x"
1619	      "g?y/../x"      =  "http://a/b/c/g?y/../x"
1620	      "g#s/./x"       =  "http://a/b/c/g#s/./x"
1621	      "g#s/../x"      =  "http://a/b/c/g#s/../x"

1623	   Some parsers allow the scheme name to be present in a relative URI
1624	   reference if it is the same as the base URI scheme.  This is
1625	   considered to be a loophole in prior specifications of partial URI
1626	   [RFC1630]. Its use should be avoided, but is allowed for backward
1627	   compatibility.

1629	      "http:g"        =  "http:g"         ; for strict parsers
1630	                      /  "http://a/b/c/g" ; for backward compatibility

1632	6.  Normalization and Comparison

1634	   One of the most common operations on URIs is simple comparison:
1635	   determining if two URIs are equivalent without using the URIs to
1636	   access their respective resource(s).  A comparison is performed every
1637	   time a response cache is accessed, a browser checks its history to
1638	   color a link, or an XML parser processes tags within a namespace.
1639	   Extensive normalization prior to comparison of URIs is often used by
1640	   spiders and indexing engines to prune a search space or reduce
1641	   duplication of request actions and response storage.

1643	   URI comparison is performed with respect to some particular purpose,
1644	   and software with differing purposes will often be subject to
1645	   differing design trade-offs with regard to how much effort should be
1646	   spent in reducing duplicate identifiers.  This section describes a
1647	   variety of methods that may be used to compare URIs, the trade-offs
1648	   between them, and the types of applications that might use them.  A
1649	   canonical form for URI references is defined to reduce the occurrence
1650	   of false negative comparisons.

1652	6.1  Equivalence

1654	   Since URIs exist to identify resources, presumably they should be
1655	   considered equivalent when they identify the same resource.  However,
1656	   such a definition of equivalence is not of much practical use, since
1657	   there is no way for software to compare two resources without
1658	   knowledge of the implementation-specific syntax of each URI's
1659	   dereferencing algorithm. For this reason, determination of
1660	   equivalence or difference of URIs is based on string comparison,
1661	   perhaps augmented by reference to additional rules provided by URI
1662	   scheme definitions. We use the terms "different" and "equivalent" to
1663	   describe the possible outcomes of such comparisons, but there are
1664	   many application-dependent versions of equivalence.

1666	   Even though it is possible to determine that two URIs are equivalent,
1667	   it is never possible to be sure that two URIs identify different
1668	   resources. For example, an owner of two different domain names could
1669	   decide to serve the same resource from both, resulting in two
1670	   different URIs.  Therefore, comparison methods are designed to
1671	   minimize false negatives while strictly avoiding false positives.

1673	   In testing for equivalence, applications should not directly compare
1674	   relative URI references; the references should be converted to their
1675	   target URI forms before comparison.  When URIs are being compared for
1676	   the purpose of selecting (or avoiding) a network action, such as
1677	   retrieval of a representation, the fragment components (if any)
1678	   should be excluded from the comparison.
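
   The two precautions above can be sketched as follows, using
   Python's urllib.parse.  The function name retrieval_key() and the
   example.com URIs are illustrative assumptions, not part of this
   specification.

```python
from urllib.parse import urljoin, urldefrag


def retrieval_key(base, reference):
    """Return the string to compare when deciding on a network action.

    The reference is first resolved to its target URI, and the
    fragment (if any) is then dropped.
    """
    target = urljoin(base, reference)
    return urldefrag(target).url


if __name__ == "__main__":
    a = retrieval_key("http://example.com/a/b", "c#intro")
    b = retrieval_key("http://example.com/a/", "./c#summary")
    print(a, b, a == b)
```

   Comparing the raw references "c#intro" and "./c#summary" directly
   would report a difference even though both select the same
   retrieval.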

1680	6.2  Comparison Ladder

1682	   A variety of methods are used in practice to test URI equivalence.
1683	   These methods fall into a range, distinguished by the amount of
1684	   processing required and the degree to which the probability of false
1685	   negatives is reduced.  As noted above, false negatives cannot in
1686	   principle be eliminated.  In practice, their probability can be
1687	   reduced, but this reduction requires more processing and is not
1688	   cost-effective for all applications.

1690	   If this range of comparison practices is considered as a ladder, the
1691	   following discussion will climb the ladder, starting with those
1692	   practices that are cheap but have a relatively higher chance of
1693	   producing false negatives, and proceeding to those that have higher
1694	   computational cost and lower risk of false negatives.

1696	6.2.1  Simple String Comparison

1698	   If two URIs, considered as character strings, are identical, then it
1699	   is safe to conclude that they are equivalent.  This type of
1700	   equivalence test has very low computational cost and is in wide use
1701	   in a variety of applications, particularly in the domain of parsing.

1703	   Testing strings for equivalence requires some basic precautions. This
1704	   procedure is often referred to as "bit-for-bit" or "byte-for-byte"
1705	   comparison, which is potentially misleading.  Testing of strings for
1706	   equality is normally based on pairwise comparison of the characters
1707	   that make up the strings, starting from the first and proceeding
1708	   until both strings are exhausted and all characters found to be
1709	   equal, a pair of characters compares unequal, or one of the strings
1710	   is exhausted before the other.

1712	   Such character comparisons require that each pair of characters be
1713	   put in comparable form.  For example, should one URI be stored in a
1714	   byte array in EBCDIC encoding, and the second be in a Java String
1715	   object (UTF-16), bit-for-bit comparisons applied naively will produce
1716	   errors. It is better to speak of equality on a
1717	   character-for-character rather than byte-for-byte or bit-for-bit
1718	   basis. In practical terms, character-by-character comparisons should
1719	   be done codepoint-by-codepoint after conversion to a common character
1720	   encoding.
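
   A minimal sketch of the point above: the same URI stored in two
   encodings differs byte-for-byte but is equal once both are decoded
   to characters.  The codec "cp037" is used here as one
   representative EBCDIC code page, and same_characters() is a
   hypothetical helper name.

```python
def same_characters(a, a_encoding, b, b_encoding):
    """Compare two encoded strings character-for-character.

    Both byte sequences are decoded to Python str objects, whose
    equality test works on code points rather than on stored bytes.
    """
    return a.decode(a_encoding) == b.decode(b_encoding)


if __name__ == "__main__":
    uri = "http://example.com/"
    ebcdic = uri.encode("cp037")      # EBCDIC byte array
    utf16 = uri.encode("utf-16-be")   # UTF-16 code units as bytes
    print(ebcdic == utf16)            # naive byte comparison
    print(same_characters(ebcdic, "cp037", utf16, "utf-16-be"))
```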

1722	6.2.2  Syntax-based Normalization

1724	   Software may use logic based on the definitions provided by this
1725	   specification to reduce the probability of false negatives.  Such
1726	   processing is moderately higher in cost than character-for-character
1727	   string comparison.  For example, an application using this approach
1728	   could reasonably consider the following two URIs equivalent:

1730	      example://a/b/c/%7Bfoo%7D
1731	      eXAMPLE://a/./b/../b/%63/%7bfoo%7d

1733	   Web user agents, such as browsers, typically apply this type of URI
1734	   normalization when determining whether a cached response is
1735	   available. Syntax-based normalization includes such techniques as
1736	   case normalization, percent-encoding normalization, and removal of
1737	   dot-segments.

1739	6.2.2.1  Case Normalization

1741	   When a URI scheme uses components of the generic syntax, it will also
1742	   use the common syntax equivalence rules, namely that the scheme and
1743	   host are case-insensitive and therefore should be normalized to
1744	   lowercase.  For example, the URI <HTTP://www.EXAMPLE.com/> is
1745	   equivalent to <http://www.example.com/>.  Applications should not
1746	   assume anything about the case sensitivity of other URI components,
1747	   since that is dependent on the implementation used to handle a
1748	   dereference.

1750	   The hexadecimal digits within a percent-encoding triplet (e.g., "%3a"
1751	   versus "%3A") are case-insensitive and therefore should be normalized
1752	   to use uppercase letters for the digits A-F.
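
   The case rules above might be applied as in the following sketch.
   It relies on urllib.parse.urlsplit for component splitting, which
   is a convenience and not the parser defined by this specification
   (for instance, urlunsplit may drop an empty query or fragment
   delimiter); normalize_case() is a hypothetical name.

```python
import re
from urllib.parse import urlsplit, urlunsplit

_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_case(uri):
    """Lowercase the scheme and host; uppercase percent-encoding hex."""
    parts = urlsplit(uri)
    # Leave userinfo untouched: only the host is case-insensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    rebuilt = urlunsplit((parts.scheme.lower(), netloc, parts.path,
                          parts.query, parts.fragment))
    # Hex digits in every triplet are normalized last, so triplets in
    # the host are restored to uppercase as well.
    return _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), rebuilt)


if __name__ == "__main__":
    print(normalize_case("HTTP://www.EXAMPLE.com/"))
    print(normalize_case("http://example.com/Some/Path%3a"))
```

   The path "/Some/Path" keeps its case, since nothing may be assumed
   about the case sensitivity of components other than scheme and
   host.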

1754	6.2.2.2  Percent-Encoding Normalization

1756	   The percent-encoding mechanism (Section 2.1) is a frequent source of
1757	   variance among otherwise identical URIs. In addition to the
1758	   case-insensitivity issue noted above, some URI producers
1759	   percent-encode octets that do not require percent-encoding, resulting
1760	   in URIs that are equivalent to their non-encoded counterparts. Such
1761	   URIs should be normalized by decoding any percent-encoded octet that
1762	   corresponds to an unreserved character, as described in Section 2.3.
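
   As a sketch of this step, the function below decodes only those
   triplets whose octet is an unreserved character (ALPHA, DIGIT,
   "-", ".", "_", "~") and leaves every other triplet as found.  The
   name decode_unreserved() is an assumption for illustration.

```python
import re
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri):
    """Decode percent-encoded octets that map to unreserved characters."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        # Reserved and other octets stay encoded; decoding them could
        # change how the URI is parsed.
        return char if char in UNRESERVED else match.group(0)
    return _PCT_ENCODED.sub(replace, uri)


if __name__ == "__main__":
    print(decode_unreserved("example://a/b/%63/%7bfoo%7d"))
    print(decode_unreserved("http://example.com/%7Euser/a%2Fb"))
```

   Each triplet is examined once, so a sequence such as "%2541" is
   left as is rather than being decoded twice.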

1764	6.2.2.3  Path Segment Normalization

1766	   The complete path segments "." and ".." have a special meaning within
1767	   hierarchical URI schemes.  As such, they should not appear in
1768	   absolute paths; if they are found, they can be removed by applying
1769	   the remove_dot_segments algorithm to the path, as described in
1770	   Section 5.2.
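
   The following sketch shows one common formulation of dot-segment
   removal, working through the path with an input buffer and an
   output list of segments.  The authoritative steps are those of
   Section 5.2; this code is an illustration written for this note
   and may differ from them in detail.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments from a path."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../x" becomes "/x"
            if output:
                output.pop()             # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, with its leading "/" if any,
            # from the input to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


if __name__ == "__main__":
    print(remove_dot_segments("/a/./b/../b/c"))
```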

1772	6.2.3  Scheme-based Normalization

1774	   The syntax and semantics of URIs vary from scheme to scheme, as
1775	   described by the defining specification for each scheme.  Software
1776	   may use scheme-specific rules, at further processing cost, to reduce
1777	   the probability of false negatives.  For example, since the "http"
1778	   scheme makes use of an authority component, has a default port of
1779	   "80", and defines an empty path to be equivalent to "/", the
1780	   following four URIs are equivalent:

1782	      http://example.com
1783	      http://example.com/
1784	      http://example.com:/
1785	      http://example.com:80/

1787	   In general, a URI that uses the generic syntax for authority with an
1788	   empty path should be normalized to a path of "/"; likewise, an
1789	   explicit ":port", where the port is empty or the default for the
1790	   scheme, is equivalent to one where the port and its ":" delimiter are
1791	   elided. In other words, the second of the above URI examples is the
1792	   normal form for the "http" scheme.
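
   A sketch of these two rules for the "http" scheme follows.  The
   tables DEFAULT_PORTS and EMPTY_PATH_IS_ROOT and the function name
   are assumptions for illustration; a real application would fill
   them from the relevant scheme specifications.

```python
import re
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80}       # illustrative, not exhaustive
EMPTY_PATH_IS_ROOT = {"http"}      # schemes where "" equals "/"


def normalize_scheme_based(uri):
    """Elide an empty or default port and normalize an empty path."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.rpartition(":")
    # Only treat the suffix as a port if it is all digits (or empty);
    # this keeps a bare IPv6 literal such as "[::1]" intact.
    if colon and re.fullmatch(r"[0-9]*", port):
        if port == "" or int(port) == DEFAULT_PORTS.get(scheme):
            hostport = host
    path = parts.path
    if parts.netloc and path == "" and scheme in EMPTY_PATH_IS_ROOT:
        path = "/"
    return urlunsplit((scheme, userinfo + at + hostport, path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    for uri in ("http://example.com", "http://example.com/",
                "http://example.com:/", "http://example.com:80/"):
        print(normalize_scheme_based(uri))
```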

1794	   Another case where normalization varies by scheme is in the handling
1795	   of an empty authority component or empty host subcomponent.  For many
1796	   scheme specifications, an empty authority or host is considered an
1797	   error; for others, it is considered equivalent to "localhost" or the
1798	   end-user's host. When a scheme defines a default for authority and a
1799	   URI reference to that default is desired, the reference should have
1800	   an empty authority for the sake of uniformity, brevity, and
1801	   internationalization. If, however, either the userinfo or port
1802	   subcomponent is non-empty, then the host should be given explicitly
1803	   even if it matches the default.

1805	6.2.4  Protocol-based Normalization

1807	   Web spiders, for which substantial effort to reduce the incidence of
1808	   false negatives is often cost-effective, are observed to implement
1809	   even more aggressive techniques in URI comparison.  For example, if
1810	   they observe that a URI such as

1812	      http://example.com/data

1814	   redirects to a URI differing only in the trailing slash

1816	      http://example.com/data/

1818	   they will likely regard the two as equivalent in the future. This
1819	   kind of technique is only appropriate when equivalence is clearly
1820	   indicated by both the result of accessing the resources and the
1821	   common conventions of their scheme's dereference algorithm (in this
1822	   case, use of redirection by HTTP origin servers to avoid problems
1823	   with relative references).

1825	6.3  Canonical Form

1827	   It is in the best interests of everyone concerned to avoid
1828	   false negatives in comparing URIs and to minimize the amount of
1829	   software processing for such comparisons.  Those who produce and make
1830	   reference to URIs can reduce the cost of processing and the risk of
1831	   false negatives by consistently providing them in a form that is
1832	   reasonably canonical with respect to their scheme.  Specifically:

1834	   o  Always provide the URI scheme in lowercase characters.

1836	   o  Always provide the host, if any, in lowercase characters.

1838	   o  Only perform percent-encoding where it is essential.

1840	   o  Always use uppercase A-through-F characters when percent-encoding.

1842	   o  Prevent dot-segments from appearing in non-relative URI paths.

1844	   o  For schemes that define a default authority, use an empty
1845	      authority if the default is desired.

1847	   o  For schemes that define an empty path to be equivalent to a path
1848	      of "/", use "/".

1850	7.  Security Considerations

1852	   A URI does not in itself pose a security threat.  However, since URIs
1853	   are often used to provide a compact set of instructions for access to
1854	   network resources, care must be taken to properly interpret the data
1855	   within a URI, to prevent that data from causing unintended access,
1856	   and to avoid including data that should not be revealed in plain
1857	   text.

1859	7.1  Reliability and Consistency

1861	   There is no guarantee that, having once used a given URI to retrieve
1862	   some information, the same information will be retrievable by that
1863	   URI in the future. Nor is there any guarantee that the information
1864	   retrievable via that URI in the future will be observably similar to
1865	   that retrieved in the past.  The URI syntax does not constrain how a
1866	   given scheme or authority apportions its name space or maintains it
1867	   over time.  Such a guarantee can only be obtained from the person(s)
1868	   controlling that name space and the resource in question.  A specific
1869	   URI scheme may define additional semantics, such as name persistence,
1870	   if those semantics are required of all naming authorities for that
1871	   scheme.

1873	7.2  Malicious Construction

1875	   It is sometimes possible to construct a URI such that an attempt to
1876	   perform a seemingly harmless, idempotent operation, such as the
1877	   retrieval of a representation, will in fact cause a possibly damaging
1878	   remote operation to occur.  The unsafe URI is typically constructed
1879	   by specifying a port number other than that reserved for the network
1880	   protocol in question.  The client unwittingly contacts a site that is
1881	   running a different protocol service and data within the URI contains
1882	   instructions that, when interpreted according to this other protocol,
1883	   cause an unexpected operation.  A frequent example of such abuse has
1884	   been the use of a protocol-based scheme with a port component of
1885	   "25", thereby fooling user agent software into sending an unintended
1886	   or impersonating message via an SMTP server.

1888	   Applications should prevent dereference of a URI that specifies a TCP
1889	   port number within the "well-known port" range (0 - 1023) unless the
1890	   protocol being used to dereference that URI is compatible with the
1891	   protocol expected on that well-known port. Although IANA maintains a
1892	   registry of well-known ports, applications should make such
1893	   restrictions user-configurable to avoid preventing the deployment of
1894	   new services.
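
   One way such a check could look is sketched below.  The table
   COMPATIBLE_PORTS stands in for a user-configurable policy and its
   contents are purely illustrative; may_dereference() is a
   hypothetical name.

```python
from urllib.parse import urlsplit

# Hypothetical policy: well-known ports each scheme may contact.
COMPATIBLE_PORTS = {"http": {80}, "ftp": {21}}


def may_dereference(uri, compatible=COMPATIBLE_PORTS):
    """Refuse well-known ports that do not match the URI's protocol."""
    parts = urlsplit(uri)
    port = parts.port                # None if absent or empty
    if port is None or port > 1023:
        return True                  # outside the well-known range
    return port in compatible.get(parts.scheme, set())


if __name__ == "__main__":
    print(may_dereference("http://example.com:25/"))    # SMTP port
    print(may_dereference("http://example.com:8080/"))
```

   Passing the policy as a parameter keeps the restriction
   configurable, so that new services can be admitted without a code
   change.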

1896	   When a URI contains percent-encoded octets that match the delimiters
1897	   for a given resolution or dereference protocol (for example, CR and
1898	   LF characters for the TELNET protocol), such percent-encoded octets
1899	   must not be decoded before transmission across that protocol.
1900	   Transfer of the percent-encoding, which might violate the protocol,
1901	   is less harmful than allowing decoded octets to be interpreted as
1902	   additional operations or parameters, perhaps triggering an unexpected
1903	   and possibly harmful remote operation.

1905	7.3  Back-end Transcoding

1907	   When a URI is dereferenced, the data within it is often parsed by
1908	   both the user agent and one or more servers.  In HTTP, for example, a
1909	   typical user agent will parse a URI into its five major components,
1910	   access the authority's server, and send it the data within the
1911	   authority, path, and query components.  A typical server will take
1912	   that information, parse the path into segments and the query into
1913	   key/value pairs, and then invoke implementation-specific handlers to
1914	   respond to the request. As a result, a common security concern for
1915	   server implementations that handle a URI, either as a whole or split
1916	   into separate components, is proper interpretation of the octet data
1917	   represented by the characters and percent-encodings within that URI.

1919	   Percent-encoded octets must be decoded at some point during the
1920	   dereference process.  Applications must split the URI into its
1921	   components and subcomponents prior to decoding the octets, since
1922	   otherwise the decoded octets might be mistaken for delimiters.
1923	   Security checks of the data within a URI should be applied after
1924	   decoding the octets.  Note, however, that the "%00" percent-encoding
1925	   (NUL) may require special handling and should be rejected if the
1926	   application is not expecting to receive raw data within a component.
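
   The ordering requirement can be illustrated with a path that
   contains an encoded "/".  In the sketch below, path_segments() is
   a hypothetical helper that splits first and decodes second, and
   that rejects "%00" on the assumption that the application does not
   expect raw data.

```python
from urllib.parse import urlsplit, unquote


def path_segments(uri):
    """Split the path into segments, then decode each segment."""
    raw_segments = urlsplit(uri).path.split("/")[1:]
    decoded = []
    for raw in raw_segments:
        if "%00" in raw:
            raise ValueError("NUL octet not accepted in path")
        decoded.append(unquote(raw))
    return decoded


if __name__ == "__main__":
    uri = "http://example.com/a%2Fb/c"
    print(path_segments(uri))                        # split, then decode
    print(unquote(urlsplit(uri).path).split("/")[1:])  # wrong order
```

   Decoding before splitting turns the encoded "/" into a delimiter
   and yields three segments instead of two.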

1928	   Special care should be taken when the URI path interpretation process
1929	   involves the use of a back-end filesystem or related system
1930	   functions. Filesystems typically assign an operational meaning to
1931	   special characters, such as the "/", "\", ":", "[", and "]"
1932	   characters, and special device names like ".", "..", "...", "aux",
1933	   "lpt", etc. In some cases, merely testing for the existence of such a
1934	   name will cause the operating system to pause or invoke unrelated
1935	   system calls, leading to significant security concerns regarding
1936	   denial of service and unintended data transfer.  It would be
1937	   impossible for this specification to list all such significant
1938	   characters and device names; implementers should research the
1939	   reserved names and characters for the types of storage device that
1940	   may be attached to their application and restrict the use of data
1941	   obtained from URI components accordingly.

1943	7.4  Rare IP Address Formats

1945	   Although the URI syntax for IPv4address only allows the common,
1946	   dotted-decimal form of IPv4 address literal, many implementations
1947	   that process URIs make use of platform-dependent system routines,
1948	   such as gethostbyname() and inet_aton(), to translate the string
1949	   literal to an actual IP address.  Unfortunately, such system routines
1950	   often allow and process a much larger set of formats than those
1951	   described in Section 3.2.2.

1953	   For example, many implementations allow dotted forms of three
1954	   numbers, wherein the last part is interpreted as a 16-bit quantity
1955	   and placed in the right-most two bytes of the network address (e.g.,
1956	   a Class B network). Likewise, a dotted form of two numbers means the
1957	   last part is interpreted as a 24-bit quantity and placed in the
1958	   right-most three bytes of the network address (Class A), and a single
1959	   number (without dots) is interpreted as a 32-bit quantity and stored
1960	   directly in the network address.  Adding further to the confusion,
1961	   some implementations allow each dotted part to be interpreted as
1962	   decimal, octal, or hexadecimal, as specified in the C language (i.e.,
1963	   a leading 0x or 0X implies hexadecimal; otherwise, a leading 0
1964	   implies octal; otherwise, the number is interpreted as decimal).

1966	   These additional IP address formats are not allowed in the URI syntax
1967	   due to differences between platform implementations.  However, they
1968	   can become a security concern if an application attempts to filter
1969	   access to resources based on the IP address in string literal format.
1970	   If such filtering is performed, literals should be converted to
1971	   numeric form and filtered based on the numeric value, rather than a
1972	   prefix or suffix of the string form.
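
   To make the risk concrete, the sketch below implements the lenient
   rules described above in pure Python (so that it does not depend
   on a platform's inet_aton) and then filters on the numeric value.
   The function names are assumptions, and actual system routines may
   accept a somewhat different set of forms.

```python
import ipaddress


def _c_style_int(text):
    """Parse a number as C does: 0x is hex, leading 0 is octal."""
    if not (text.isascii() and text.isalnum()):
        raise ValueError(f"bad address part: {text!r}")
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    if text.startswith("0") and len(text) > 1:
        return int(text, 8)
    return int(text, 10)


def parse_lenient_ipv4(literal):
    """Convert a one- to four-part literal to an IPv4Address."""
    parts = [_c_style_int(p) for p in literal.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("too many parts")
    *head, last = parts
    # Leading parts are single bytes; the last part fills the rest.
    if any(n > 0xFF for n in head) or last >= 256 ** (4 - len(head)):
        raise ValueError("part out of range")
    value = last
    for index, n in enumerate(head):
        value |= n << (8 * (3 - index))
    return ipaddress.IPv4Address(value)


if __name__ == "__main__":
    for literal in ("127.0.0.1", "127.1", "0x7f.1", "2130706433"):
        address = parse_lenient_ipv4(literal)
        print(literal, literal.startswith("127."), address.is_loopback)
```

   A filter based on the string prefix "127." misses three of the
   four spellings, whereas the numeric test treats them alike.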

1974	7.5  Sensitive Information

1976	   URI producers should not provide a URI that contains a username or
1977	   password that is intended to be secret: URIs are frequently
1978	   displayed by browsers, stored in clear text bookmarks, and logged by
1979	   user agent history and intermediary applications (proxies). A
1980	   password appearing within the userinfo component is deprecated and
1981	   should be considered an error (or simply ignored) except in those
1982	   rare cases where the 'password' parameter is intended to be public.

1984	7.6  Semantic Attacks

1986	   Because the userinfo subcomponent is rarely used and appears before
1987	   the host in the authority component, it can be used to construct a
1988	   URI that is intended to mislead a human user by appearing to identify
1989	   one (trusted) naming authority while actually identifying a different
1990	   authority hidden behind the noise.  For example

1992	      ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm

1994	   might lead a human user to assume that the host is 'cnn.example.com',
1995	   whereas it is actually '10.0.0.1'.  Note that a misleading userinfo
1996	   subcomponent could be much longer than the example above.
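
   Parsing the example shows what software sees, as opposed to what
   a hurried reader might see.  This sketch uses Python's
   urllib.parse.urlsplit; the variable names are illustrative.

```python
from urllib.parse import urlsplit

MISLEADING = ("ftp://cnn.example.com&story=breaking_news"
              "@10.0.0.1/top_story.htm")

parts = urlsplit(MISLEADING)
userinfo = parts.username   # everything before "@" (no ":" present)
host = parts.hostname       # the authority actually contacted

if __name__ == "__main__":
    print("userinfo:", userinfo)
    print("host:    ", host)
```

   A user agent could use such a split to render the userinfo
   distinctly from the host.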

1998	   A misleading URI, such as the one above, is an attack on the user's
1999	   preconceived notions about the meaning of a URI, rather than an
2000	   attack on the software itself.  User agents may be able to reduce the
2001	   impact of such attacks by distinguishing the various components of
2002	   the URI when rendered, such as by using a different color or tone to
2003	   render userinfo if any is present, though there is no general
2004	   panacea. More information on URI-based semantic attacks can be found
2005	   in [Siedzik].

2007	8.  Acknowledgments

2009	   This specification is derived from RFC 2396 [RFC2396], RFC 1808
2010	   [RFC1808], and RFC 1738 [RFC1738]; the acknowledgments in those
2011	   documents still apply. It also incorporates the update (with
2012	   corrections) for IPv6 literals in the host syntax, as defined by
2013	   Robert M. Hinden, Brian E. Carpenter, and Larry Masinter in
2014	   [RFC2732]. In addition, contributions by Gisle Aas, Reese Anschultz,
2015	   Daniel Barclay, Tim Bray, Mike Brown, Rob Cameron, Jeremy Carroll,
2016	   Dan Connolly, Adam M. Costello, John Cowan, Jason Diamond, Martin
2017	   Duerst, Stefan Eissing, Clive D.W. Feather, Tony Hammond, Pat Hayes,
2018	   Henry Holtzman, Ian B. Jacobs, Michael Kay, John C. Klensin, Graham
2019	   Klyne, Dan Kohn, Bruce Lilly, Andrew Main, Ira McDonald, Michael
2020	   Mealling, Ray Merkert, Stephen Pollei, Julian Reschke, Tomas Rokicki,
2021	   Miles Sabin, Kai Schaetzl, Mark Thomson, Ronald Tschalaer, Norm
2022	   Walsh, Marc Warne, Stuart Williams, and Henry Zongaro are gratefully
2023	   acknowledged.

2025	9.  References

2027	9.1  Normative References

2029	   [ASCII]    American National Standards Institute, "Coded Character
2030	              Set -- 7-bit American Standard Code for Information
2031	              Interchange", ANSI X3.4, 1986.
2032	   [RFC2234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
2033	              Specifications: ABNF", RFC 2234, November 1997.

2035	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
2036	              10646", STD 63, RFC 3629, November 2003.

2038	9.2  Informative References

2040	   [RFC0952]  Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet
2041	              host table specification", RFC 952, October 1985.

2043	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
2044	              STD 13, RFC 1034, November 1987.

2046	   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
2047	              and Support", STD 3, RFC 1123, October 1989.

2049	   [RFC1535]  Gavron, E., "A Security Problem and Proposed Correction
2050	              With Widely Deployed DNS Software", RFC 1535, October
2051	              1993.

2053	   [RFC1630]  Berners-Lee, T., "Universal Resource Identifiers in WWW: A
2054	              Unifying Syntax for the Expression of Names and Addresses
2055	              of Objects on the Network as used in the World-Wide Web",
2056	              RFC 1630, June 1994.

2058	   [RFC1736]  Kunze, J., "Functional Recommendations for Internet
2059	              Resource Locators", RFC 1736, February 1995.

2061	   [RFC1737]  Masinter, L. and K. Sollins, "Functional Requirements for
2062	              Uniform Resource Names", RFC 1737, December 1994.

2064	   [RFC1738]  Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform
2065	              Resource Locators (URL)", RFC 1738, December 1994.

2067	   [RFC1808]  Fielding, R., "Relative Uniform Resource Locators", RFC
2068	              1808, June 1995.

2070	   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
2071	              Extensions (MIME) Part Two: Media Types", RFC 2046,
2072	              November 1996.

2074	   [RFC2141]  Moats, R., "URN Syntax", RFC 2141, May 1997.

2076	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
2077	              Languages", BCP 18, RFC 2277, January 1998.

2079	   [RFC2396]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
2080	              Resource Identifiers (URI): Generic Syntax", RFC 2396,
2081	              August 1998.

2083	   [RFC2518]  Goland, Y., Whitehead, E., Faizi, A., Carter, S. and D.
2084	              Jensen, "HTTP Extensions for Distributed Authoring --
2085	              WEBDAV", RFC 2518, February 1999.

2087	   [RFC2557]  Palme, F., Hopmann, A., Shelness, N. and E. Stefferud,
2088	              "MIME Encapsulation of Aggregate Documents, such as HTML
2089	              (MHTML)", RFC 2557, March 1999.

2091	   [RFC2717]  Petke, R. and I. King, "Registration Procedures for URL
2092	              Scheme Names", BCP 35, RFC 2717, November 1999.

2094	   [RFC2718]  Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
2095	              "Guidelines for new URL Schemes", RFC 2718, November 1999.

2097	   [RFC2732]  Hinden, R., Carpenter, B. and L. Masinter, "Format for
2098	              Literal IPv6 Addresses in URL's", RFC 2732, December 1999.

2100	   [RFC2978]  Freed, N. and J. Postel, "IANA Charset Registration
2101	              Procedures", BCP 19, RFC 2978, October 2000.

2103	   [RFC3305]  Mealling, M. and R. Denenberg, "Report from the Joint W3C/
2104	              IETF URI Planning Interest Group: Uniform Resource
2105	              Identifiers (URIs), URLs, and Uniform Resource Names
2106	              (URNs): Clarifications and Recommendations", RFC 3305,
2107	              August 2002.

2109	   [RFC3490]  Faltstrom, P., Hoffman, P. and A. Costello,
2110	              "Internationalizing Domain Names in Applications (IDNA)",
2111	              RFC 3490, March 2003.

2113	   [RFC3513]  Hinden, R. and S. Deering, "Internet Protocol Version 6
2114	              (IPv6) Addressing Architecture", RFC 3513, April 2003.

2116	   [Siedzik]  Siedzik, R., "Semantic Attacks: What's in a URL?", April
2117	              2001, <http://www.giac.org/practical/gsec/
2118	              Richard_Siedzik_GSEC.pdf>.

2120	Authors' Addresses

2122	   Tim Berners-Lee
2123	   World Wide Web Consortium
2124	   Massachusetts Institute of Technology
2125	   77 Massachusetts Avenue
2126	   Cambridge, MA  02139
2127	   USA

2129	   Phone: +1-617-253-5702
2130	   Fax:   +1-617-258-5999
2131	   EMail: timbl@w3.org
2132	   URI:   http://www.w3.org/People/Berners-Lee/

2134	   Roy T. Fielding
2135	   Day Software
2136	   5251 California Ave., Suite 110
2137	   Irvine, CA  92612-3074
2138	   USA

2140	   Phone: +1-949-679-2960
2141	   Fax:   +1-949-679-2972
2142	   EMail: fielding@gbiv.com
2143	   URI:   http://roy.gbiv.com/

2145	   Larry Masinter
2146	   Adobe Systems Incorporated
2147	   345 Park Ave
2148	   San Jose, CA  95110
2149	   USA

2151	   Phone: +1-408-536-3024
2152	   EMail: LMM@acm.org
2153	   URI:   http://larry.masinter.net/

2155	Appendix A.  Collected ABNF for URI

2157	    URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

2159	    hier-part     = "//" authority path-abempty
2160	                  / path-abs
2161	                  / path-rootless
2162	                  / path-empty

2164	    URI-reference = URI / relative-URI

2166	    absolute-URI  = scheme ":" hier-part [ "?" query ]

2168	    relative-URI  = relative-part [ "?" query ] [ "#" fragment ]

2170	    relative-part = "//" authority path-abempty
2171	                  / path-abs
2172	                  / path-noscheme
2173	                  / path-empty

2175	    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

2177	    authority     = [ userinfo "@" ] host [ ":" port ]
2178	    userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
2179	    host          = IP-literal / IPv4address / reg-name
2180	    port          = *DIGIT

2182	    IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"

2184	    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

2186	    IPv6address   =                            6( h16 ":" ) ls32
2187	                  /                       "::" 5( h16 ":" ) ls32
2188	                  / [               h16 ] "::" 4( h16 ":" ) ls32
2189	                  / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
2190	                  / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
2191	                  / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
2192	                  / [ *4( h16 ":" ) h16 ] "::"              ls32
2193	                  / [ *5( h16 ":" ) h16 ] "::"              h16
2194	                  / [ *6( h16 ":" ) h16 ] "::"

2196	    h16           = 1*4HEXDIG
2197	    ls32          = ( h16 ":" h16 ) / IPv4address

2199	    IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet
2200	    dec-octet     = DIGIT                 ; 0-9
2201	                  / %x31-39 DIGIT         ; 10-99
2202	                  / "1" 2DIGIT            ; 100-199
2203	                  / "2" %x30-34 DIGIT     ; 200-249
2204	                  / "25" %x30-35          ; 250-255

2206	    reg-name      = 0*255( unreserved / pct-encoded / sub-delims )

2208	    path          = path-abempty    ; begins with "/" or is empty
2209	                  / path-abs        ; begins with "/" but not "//"
2210	                  / path-noscheme   ; begins with a non-colon segment
2211	                  / path-rootless   ; begins with a segment
2212	                  / path-empty      ; zero characters

2214	    path-abempty  = *( "/" segment )
2215	    path-abs      = "/" [ segment-nz *( "/" segment ) ]
2216	    path-noscheme = segment-nzc *( "/" segment )
2217	    path-rootless = segment-nz *( "/" segment )
2218	    path-empty    = 0<pchar>

2220	    segment       = *pchar
2221	    segment-nz    = 1*pchar
2222	    segment-nzc   = 1*( unreserved / pct-encoded / sub-delims / "@" )

2224	    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

2226	    query         = *( pchar / "/" / "?" )

2228	    fragment      = *( pchar / "/" / "?" )

2230	    pct-encoded   = "%" HEXDIG HEXDIG

2232	    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
2233	    reserved      = gen-delims / sub-delims
2234	    gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
2235	    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
2236	                  / "*" / "+" / "," / ";" / "="

2238	Appendix B.  Parsing a URI Reference with a Regular Expression

2240	   Since the "first-match-wins" algorithm is identical to the "greedy"
2241	   disambiguation method used by POSIX regular expressions, it is
2242	   natural and commonplace to use a regular expression for parsing the
2243	   potential five components of a URI reference.

2245	   The following line is the regular expression for breaking down a
2246	   well-formed URI reference into its components.

2248	      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
2249	       12            3  4          5       6  7        8 9

2251	   The numbers in the second line above are only to assist readability;
2252	   they indicate the reference points for each subexpression (i.e., each
2253	   paired parenthesis).  We refer to the value matched for subexpression
2254	   <n> as $<n>.  For example, matching the above expression to

2256	      http://www.ics.uci.edu/pub/ietf/uri/#Related

2258	   results in the following subexpression matches:

2260	      $1 = http:
2261	      $2 = http
2262	      $3 = //www.ics.uci.edu
2263	      $4 = www.ics.uci.edu
2264	      $5 = /pub/ietf/uri/
2265	      $6 = <undefined>
2266	      $7 = <undefined>
2267	      $8 = #Related
2268	      $9 = Related

2270	   where <undefined> indicates that the component is not present, as is
2271	   the case for the query component in the above example.  Therefore, we
2272	   can determine the value of the five components as

2274	      scheme    = $2
2275	      authority = $4
2276	      path      = $5
2277	      query     = $7
2278	      fragment  = $9

2280	   and, going in the opposite direction, we can recreate a URI reference
2281	   from its components using the algorithm of Section 5.3.
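
   The expression can be tried directly in most regular expression
   engines.  The sketch below uses Python's re module, in which an
   unmatched subexpression is reported as None rather than
   <undefined>; split_uri_reference() is a hypothetical name.

```python
import re

URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(reference):
    """Return the five components of a URI reference as a dict."""
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    result = split_uri_reference(
        "http://www.ics.uci.edu/pub/ietf/uri/#Related")
    for name, value in result.items():
        print(f"{name:9} = {value}")
```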

2283	Appendix C.  Delimiting a URI in Context

2285	   URIs are often transmitted through formats that do not provide a
2286	   clear context for their interpretation.  For example, there are many
2287	   occasions when a URI is included in plain text; examples include text
2288	   sent in electronic mail, USENET news messages, and, most importantly,
2289	   text printed on paper.  In such cases, it is important to be able to
2290	   delimit the URI from the rest of the text, and in particular from
2291	   punctuation marks that might be mistaken for part of the URI.

2293	   In practice, URIs are delimited in a variety of ways, but usually
2294	   within double-quotes "http://example.com/", angle brackets
2295	   <http://example.com/>, or just using whitespace

2297	      http://example.com/

2299	   These wrappers do not form part of the URI.

2301	   In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may
2302	   need to be added to break a long URI across lines. The whitespace
2303	   should be ignored when extracting the URI.

2305	   No whitespace should be introduced after a hyphen ("-") character.
2306	   Because some typesetters and printers may (erroneously) introduce a
2307	   hyphen at the end of line when breaking a line, the interpreter of a
2308	   URI containing a line break immediately after a hyphen should ignore
2309	   all whitespace around the line break, and should be aware that the
2310	   hyphen may or may not actually be part of the URI.

2312	   Using <> angle brackets around each URI is especially recommended as
2313	   a delimiting style for a reference that contains embedded whitespace.

2315	   The prefix "URL:" (with or without a trailing space) was formerly
2316	   recommended as a way to help distinguish a URI from other bracketed
2317	   designators, though it is not commonly used in practice and is no
2318	   longer recommended.

2320	   For robustness, software that accepts user-typed URIs should attempt
2321	   to recognize and strip both delimiters and embedded whitespace.

2323	   For example, the text:

2325	      Yes, Jim, I found it under "http://www.w3.org/Addressing/",
2326	      but you can probably pick it up from <ftp://foo.example.
2327	      com/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
2328	      ietf/uri/historical.html#WARNING>.

2330	   contains the URI references

2332	      http://www.w3.org/Addressing/
2333	      ftp://foo.example.com/rfc/
2334	      http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING
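
   A minimal sketch of such extraction follows, written in Python.  It
   is only one possible approach: the function name
   extract_delimited_uris is hypothetical, only the double-quote and
   angle-bracket styles are handled, and no attempt is made to resolve
   the hyphen ambiguity described above.

```python
import re

# Text inside <...> or "..." is treated as a candidate URI reference.
_DELIMITED = re.compile(r'<([^<>]*)>|"([^"]*)"')
_WHITESPACE = re.compile(r"\s+")


def extract_delimited_uris(text):
    """Return the delimited URI references found in plain text.

    Whitespace inside the delimiters (for example, a line break added
    to wrap a long URI) is removed.
    """
    uris = []
    for m in _DELIMITED.finditer(text):
        candidate = m.group(1) if m.group(1) is not None else m.group(2)
        uris.append(_WHITESPACE.sub("", candidate))
    return uris


sample = (
    'Yes, Jim, I found it under "http://www.w3.org/Addressing/",\n'
    "but you can probably pick it up from <ftp://foo.example.\n"
    "com/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/\n"
    "ietf/uri/historical.html#WARNING>.\n"
)

for uri in extract_delimited_uris(sample):
    print(uri)
```

   Applied to the example text, the sketch yields the three URI
   references listed above.  A real implementation would also need a
   policy for quoted text that is not a URI at all.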

2336	Appendix D.  Summary of Non-editorial Changes

2338	D.1  Additions

2340	   IPv6 (and later) literals have been added to the list of possible
2341	   identifiers for the host portion of an authority component, as
2342	   described by [RFC2732], with the addition of "[" and "]" to the
2343	   reserved set and a version flag to anticipate future versions of IP
2344	   literals.  Square brackets are now specified as reserved within the
2345	   authority component and not allowed outside their use as delimiters
2346	   for an IP literal within host.  In order to make this change without
2347	   changing the technical definition of the path, query, and fragment
2348	   components, those rules were redefined to directly specify the
2349	   characters allowed rather than be defined in terms of uric.

2351	   Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal
2352	   address, which unfortunately lacks an ABNF description of
2353	   IPv6address, we created a new ABNF rule for IPv6address that matches
2354	   the text representations defined by Section 2.2 of [RFC3513].

2356	   Likewise, the definition of IPv4address has been improved in order to
2357	   limit each decimal octet to the range 0-255.
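
   The effect of the 0-255 limit can be sketched with a regular
   expression, shown here in Python.  The pattern is an assumption made
   for illustration, not a transcription of the ABNF: in particular, it
   assumes that an octet is written without leading zeros.  The names
   DEC_OCTET and is_ipv4address are hypothetical.

```python
import re

# One decimal octet in the range 0-255, without leading zeros (assumed).
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# Four octets separated by dots.
IPV4_ADDRESS = re.compile(r"(?:%s\.){3}%s" % (DEC_OCTET, DEC_OCTET))


def is_ipv4address(host):
    """Return True if host is a dotted-decimal IPv4 address."""
    return IPV4_ADDRESS.fullmatch(host) is not None


for candidate in ("192.0.2.1", "255.255.255.255", "256.1.1.1", "1.2.3"):
    print(candidate, is_ipv4address(candidate))
```

   A host such as "256.1.1.1" fails this test and would therefore have
   to be treated as some other kind of name rather than as an IPv4
   address.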

2359	   Section 6 on URI normalization and comparison has been
2360	   completely rewritten and extended using input from Tim Bray and
2361	   discussion within the W3C Technical Architecture Group.

2363	   An ABNF rule for URI has been introduced to correspond to the common
2364	   usage of the term: an absolute URI with optional fragment.

2366	D.2  Modifications from RFC 2396

2368	   The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234].
2369	   This change required all rule names that formerly included underscore
2370	   characters to be renamed with a dash instead.

2372	   Section 2 on characters has been rewritten to explain what characters
2373	   are reserved, when they are reserved, and why they are reserved even
2374	   when not used as delimiters by the generic syntax. The mark
2375	   characters that are typically unsafe to decode, including the
2376	   exclamation mark ("!"), asterisk ("*"), single-quote ("'"), and open
2377	   and close parentheses ("(" and ")"), have been moved to the reserved
2378	   set in order to clarify the distinction between reserved and
2379	   unreserved and hopefully answer the most common question of scheme
2380	   designers. Likewise, the section on percent-encoded characters has
2381	   been rewritten, and URI normalizers are now given license to decode
2382	   any percent-encoded octets corresponding to unreserved characters.
2383	   In general, the terms "escaped" and "unescaped" have been replaced
2384	   with "percent-encoded" and "decoded", respectively, to reduce
2385	   confusion with other forms of escape mechanisms.
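
   As a sketch of what that license permits, the following Python
   fragment decodes only those percent-encoded octets that correspond
   to unreserved characters.  It assumes that the unreserved set
   consists of the ASCII letters and digits plus hyphen, period,
   underscore, and tilde (the mark characters that were not moved to
   the reserved set); the name decode_unreserved is hypothetical.

```python
import re
import string

# Assumed unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri):
    """Decode percent-encoded octets that represent unreserved characters.

    All other percent-encoded octets, including reserved characters and
    non-ASCII octets, are left exactly as they were.
    """
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_ENCODED.sub(replace, uri)


print(decode_unreserved("http://example.com/%7Euser/%41%2Fb%21"))
```

   In this example "%7E" and "%41" are decoded to "~" and "A", while
   "%2F" and "%21" stay encoded because "/" and "!" are reserved.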

2387	   The ABNF for URI and URI-reference has been redesigned to make them
2388	   more friendly to LALR parsers and reduce complexity. As a result, the
2389	   layout form of syntax description has been removed, along with the
2390	   uric, uric_no_slash, opaque_part, net_path, abs_path, rel_path,
2391	   path_segments, rel_segment, and mark rules. All references to
2392	   "opaque" URIs have been replaced with a better description of how the
2393	   path component may be opaque to hierarchy. The ambiguity regarding
2394	   the parsing of URI-reference as a URI or a relative-URI with a colon
2395	   in the first segment has been eliminated through the use of five
2396	   separate path matching rules.

2398	   The fragment identifier has been moved back into the section on
2399	   generic syntax components and within the URI and relative-URI rules,
2400	   though it remains excluded from absolute-URI. The number sign ("#")
2401	   character has been moved back to the reserved set as a result of
2402	   reintegrating the fragment syntax.

2404	   The ABNF has been corrected to allow a relative path to be empty.
2405	   This also allows an absolute-URI to consist of nothing after the
2406	   "scheme:", as is present in practice with the "dav:" namespace
2407	   [RFC2518] and the "about:" scheme used internally by many WWW browser
2408	   implementations. The ambiguity regarding the boundary between
2409	   authority and path has been eliminated through the use of five
2410	   separate path matching rules.

2412	   Registry-based naming authorities that use the generic syntax are now
2413	   defined within the host rule and limited to 255 path characters. This
2414	   change allows current implementations, where whatever name is
2415	   provided is simply fed to the local name resolution mechanism, to be
2416	   consistent with the specification and removes the need to re-specify
2417	   DNS name formats here.  It also allows the host component to contain
2418	   percent-encoded octets, which is necessary to enable
2419	   internationalized domain names to be provided in URIs, processed in
2420	   their native character encodings at the application layers above URI
2421	   processing, and passed to an IDNA library as a registered name in the
2422	   UTF-8 character encoding. The server, hostport, hostname,
2423	   domainlabel, toplabel, and alphanum rules have been removed.
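
   The processing chain described above might look as follows in
   Python.  This is a sketch under stated assumptions: the function
   name reg_name_to_ascii is hypothetical, the percent-encoded octets
   are assumed to be UTF-8, and Python's built-in "idna" codec (which
   implements the original IDNA specification) stands in for whatever
   IDNA library an application actually uses.

```python
from urllib.parse import unquote


def reg_name_to_ascii(reg_name):
    """Convert a percent-encoded registered name to its ASCII form.

    Step 1: decode the percent-encoded octets as UTF-8.
    Step 2: hand the resulting name to an IDNA implementation.
    """
    native = unquote(reg_name, encoding="utf-8", errors="strict")
    return native.encode("idna").decode("ascii")


print(reg_name_to_ascii("b%C3%BCcher.example"))
print(reg_name_to_ascii("www.example.com"))
```

   The URI layer itself never needs to understand the name; it only
   carries the octets, and the conversion happens in the application
   layer just before name resolution.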

2425	   The algorithm for resolving relative references in [RFC2396] has been
2426	   rewritten using pseudocode for this revision to improve clarity and
2427	   fix the following issues:

2429	   o  [RFC2396] section 5.2, step 6a, failed to account for a base URI
2430	      with no path.

2432	   o  Restored the behavior of [RFC1808] where, if the reference
2433	      contains an empty path and a defined query component, then the
2434	      target URI inherits the base URI's path component.

2436	   o  Removed the special-case treatment of same-document references
2437	      within the URI parser in favor of a section that explains when a
2438	      reference should be interpreted by a dereferencing engine as a
2439	      same-document reference: when the target URI and base URI,
2440	      excluding fragments, match.  This change does not modify the
2441	      behavior of existing same-document references as defined by RFC
2442	      2396 (fragment-only references); it merely adds the same-document
2443	      distinction to other references that refer to the base URI and
2444	      simplifies the interface between applications and their URI
2445	      parsers, as is consistent with the internal architecture of
2446	      deployed URI processing implementations.

2448	   o  Separated the path merge routine into two routines: merge, for
2449	      describing combination of the base URI path with a relative-path
2450	      reference, and remove_dot_segments, for describing how to remove
2451	      the special "." and ".." segments from a composed path.  The
2452	      remove_dot_segments algorithm is now applied to all URI reference
2453	      paths in order to match common implementations and improve the
2454	      normalization of URIs in practice.  This change only impacts the
2455	      parsing of abnormal references and same-scheme references wherein
2456	      the base URI has a non-hierarchical path.
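
   The two routines named in the last item can be sketched in Python
   as shown below.  This is an illustration of the general technique,
   not a copy of the normative pseudocode: the parameter
   base_has_authority and the exact handling of edge cases are
   assumptions.

```python
def merge(base_path, reference_path, base_has_authority):
    """Combine a base URI path with a relative-path reference."""
    if base_has_authority and base_path == "":
        # A base URI with an authority but no path (the case that
        # step 6a of RFC 2396 did not account for).
        return "/" + reference_path
    # Keep everything up to and including the last "/" of the base path.
    return base_path[: base_path.rfind("/") + 1] + reference_path


def remove_dot_segments(path):
    """Remove the special "." and ".." segments from a path."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()         # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # from the input to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


merged = merge("/b/c/d;p", "../g", True)
print(merged)
print(remove_dot_segments(merged))
```

   Keeping the two steps separate means that remove_dot_segments can
   also be applied to paths that were never merged, which is what
   allows it to be used on all URI reference paths.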

2458	Index

2460	A
2461	   ABNF  10
2462	   absolute  25
2463	   absolute-path  25
2464	   absolute-URI  25
2465	   access  8
2466	   authority  15, 16

2468	B
2469	   base URI  27

2471	C
2472	   character encoding  4
2473	   character  4
2474	   characters  10
2475	   coded character set  4

2477	D
2478	   dec-octet  19
2479	   dereference  8
2480	   dot-segments  21

2482	F
2483	   fragment  15, 23

2485	G
2486	   gen-delims  11
2487	   generic syntax  6

2489	H
2490	   h16  18
2491	   hier-part  15
2492	   hierarchical  9
2493	   host  17

2495	I
2496	   identifier  5
2497	   IP-literal  18
2498	   IPv4  19
2499	   IPv4address  19
2500	   IPv6  18
2501	   IPv6address  18
2502	   IPvFuture  18

2504	L
2505	   locator  6
2506	   ls32  18

2508	M
2509	   merge  30

2511	N
2512	   name  6
2513	   network-path  25

2515	P
2516	   path  15, 21
2517	      path-abempty  21
2518	      path-abs  21
2519	      path-empty  21
2520	      path-noscheme  21
2521	      path-rootless  21
2522	   path-abempty  15
2523	   path-abs  15
2524	   path-empty  15
2525	   path-rootless  15
2526	   pchar  21
2527	   pct-encoded  11
2528	   percent-encoding  11
2529	   port  20

2531	Q
2532	   query  15, 22

2534	R
2535	   reg-name  19
2536	   registered name  19
2537	   relative  9, 27
2538	   relative-path  25
2539	   relative-URI  25
2540	   remove_dot_segments  30, 31
2541	   representation  8
2542	   reserved  11
2543	   resolution  8, 27
2544	   resource  4
2545	   retrieval  8

2547	S
2548	   same-document  25
2549	   sameness  8
2550	   scheme  15
2551	   segment  21
2552	      segment-nz  21
2553	      segment-nzc  21
2554	   sub-delims  11
2555	   suffix  26

2557	T
2558	   transcription  7

2560	U
2561	   uniform  4
2562	   unreserved  12
2563	   URI grammar
2564	      absolute-URI  25
2565	      ALPHA  10
2566	      authority  15, 16
2567	      CR  10
2568	      dec-octet  19
2569	      DIGIT  10
2570	      DQUOTE  10
2571	      fragment  15, 23, 25
2572	      gen-delims  11
2573	      h16  18
2574	      HEXDIG  10
2575	      hier-part  15
2576	      host  16, 17
2577	      IP-literal  18
2578	      IPv4address  19
2579	      IPv6address  18
2580	      IPvFuture  18
2581	      LF  10
2582	      ls32  18
2583	      mark  12
2584	      OCTET  10
2585	      path  21
2586	      path-abempty  15, 21
2587	      path-abs  15, 21
2588	      path-empty  15, 21
2589	      path-noscheme  21
2590	      path-rootless  15, 21
2591	      pchar  21, 22, 23
2592	      pct-encoded  11
2593	      port  16, 20
2594	      query  15, 22, 25
2595	      reg-name  19
2596	      relative-URI  24, 25
2597	      reserved  11
2598	      scheme  15, 16, 25
2599	      segment  21
2600	      segment-nz  21
2601	      segment-nzc  21
2602	      SP  10
2603	      sub-delims  11
2604	      unreserved  12
2605	      URI  15, 24
2606	      URI-reference  24
2607	      userinfo  16, 17
2608	   URI  15
2609	   URI-reference  24
2610	   URL  6
2611	   URN  6
2612	   userinfo  17

2614	Intellectual Property Statement

2616	   The IETF takes no position regarding the validity or scope of any
2617	   Intellectual Property Rights or other rights that might be claimed to
2618	   pertain to the implementation or use of the technology described in
2619	   this document or the extent to which any license under such rights
2620	   might or might not be available; nor does it represent that it has
2621	   made any independent effort to identify any such rights. Information
2622	   on the IETF's procedures with respect to rights in IETF Documents can
2623	   be found in BCP 78 and BCP 79.

2625	   Copies of IPR disclosures made to the IETF Secretariat and any
2626	   assurances of licenses to be made available, or the result of an
2627	   attempt made to obtain a general license or permission for the use of
2628	   such proprietary rights by implementers or users of this
2629	   specification can be obtained from the IETF on-line IPR repository at
2630	   http://www.ietf.org/ipr.

2632	   The IETF invites any interested party to bring to its attention any
2633	   copyrights, patents or patent applications, or other proprietary
2634	   rights that may cover technology that may be required to implement
2635	   this standard. Please address the information to the IETF at
2636	   ietf-ipr@ietf.org.

2638	Disclaimer of Validity

2640	   This document and the information contained herein are provided on an
2641	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2642	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2643	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2644	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2645	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2646	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2648	Copyright Statement

2650	   Copyright (C) The Internet Society (2004). This document is subject
2651	   to the rights, licenses and restrictions contained in BCP 78, and
2652	   except as set forth therein, the authors retain all their rights.

2654	Acknowledgment

2656	   Funding for the RFC Editor function is currently provided by the
2657	   Internet Society.