< draft-ietf-uri-relative-url-04.txt   draft-ietf-uri-relative-url-05.txt >
Uniform Resource Identifiers Working Group R. T. Fielding Uniform Resource Identifiers Working Group R. T. Fielding
INTERNET-DRAFT UC Irvine INTERNET-DRAFT UC Irvine
Expires July 18, 1995 January 18, 1995 Expires July 30, 1995 January 30, 1995
Relative Uniform Resource Locators Relative Uniform Resource Locators
<draft-ietf-uri-relative-url-04.txt> <draft-ietf-uri-relative-url-05.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other months and may be updated, replaced, or obsoleted by other
skipping to change at line 100 skipping to change at line 100
The syntax for relative URLs is a shortened form of that for absolute The syntax for relative URLs is a shortened form of that for absolute
URLs [2], where some prefix of the URL is missing and certain path URLs [2], where some prefix of the URL is missing and certain path
components ("." and "..") have a special meaning when interpreting a components ("." and "..") have a special meaning when interpreting a
relative path. Because a relative URL may appear in any context that relative path. Because a relative URL may appear in any context that
could hold an absolute URL, systems that support relative URLs must could hold an absolute URL, systems that support relative URLs must
be able to recognize them as part of the URL parsing process. be able to recognize them as part of the URL parsing process.
Although this document does not seek to define the overall URL Although this document does not seek to define the overall URL
syntax, some discussion of it is necessary in order to describe the syntax, some discussion of it is necessary in order to describe the
parsing of relative URLs. In particular, base documents can only parsing of relative URLs. In particular, base documents can only
make use of relative URLs when their base URL fits within the generic make use of relative URLs when their base URL fits within the
syntax described below. Although some URL schemes do not require generic-RL syntax described below. Although some URL schemes do not
this generic syntax, it is assumed that any document which contains require this generic-RL syntax, it is assumed that any document which
a relative reference does have a base URL that obeys the syntax. contains a relative reference does have a base URL that obeys the
In other words, relative URLs cannot be used within documents that syntax. In other words, relative URLs cannot be used within
have unsuitable base URLs. documents that have unsuitable base URLs.
2.1. URL Syntactic Components 2.1. URL Syntactic Components
The URL syntax is dependent upon the scheme. Some schemes use The URL syntax is dependent upon the scheme. Some schemes use
reserved characters like "?" and ";" to indicate special components, reserved characters like "?" and ";" to indicate special components,
while others just consider them to be part of the path. However, while others just consider them to be part of the path. However,
there is enough uniformity in the use of URLs to allow a parser there is enough uniformity in the use of URLs to allow a parser
to resolve relative URLs based upon a single, generic syntax. to resolve relative URLs based upon a single, generic-RL syntax.
This generic syntax consists of six components: This generic-RL syntax consists of six components:
<scheme>://<net_loc>/<path>;<params>?<query>#<fragment> <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
each of which, except <scheme>, may be absent from a particular URL. each of which, except <scheme>, may be absent from a particular URL.
These components are defined as follows (a complete BNF is provided These components are defined as follows (a complete BNF is provided
in Section 2.2): in Section 2.2):
scheme ":" ::= scheme name, as per Section 2.1 of [2]. scheme ":" ::= scheme name, as per Section 2.1 of RFC 1738 [2].
"//" net_loc ::= network location and login information, as per "//" net_loc ::= network location and login information, as per
Section 3.1 of [2]. Section 3.1 of RFC 1738 [2].
"/" path ::= URL path, as per Section 3.1 of [2]. "/" path ::= URL path, as per Section 3.1 of RFC 1738 [2].
";" params ::= object parameters (e.g. ";type=a" as in ";" params ::= object parameters (e.g. ";type=a" as in
Section 3.2.2 of [2]). Section 3.2.2 of RFC 1738 [2]).
"?" query ::= query information, as per Section 3.3 of [2]. "?" query ::= query information, as per Section 3.3 of
RFC 1738 [2].
"#" fragment ::= fragment identifier. "#" fragment ::= fragment identifier.
Note that the fragment identifier (and the "#" that precedes it) is Note that the fragment identifier (and the "#" that precedes it) is
not considered part of the URL. However, since it is commonly used not considered part of the URL. However, since it is commonly used
within the same string context as a URL, a parser must be able to within the same string context as a URL, a parser must be able to
recognize the fragment when it is present and set it aside as part recognize the fragment when it is present and set it aside as part
of the parsing process. of the parsing process.
The order of the components is important. If both <params> and The order of the components is important. If both <params> and
skipping to change at line 158 skipping to change at line 159
This is a BNF-like description of the Relative Uniform Resource This is a BNF-like description of the Relative Uniform Resource
Locator syntax, using the conventions of RFC 822 [5], except that Locator syntax, using the conventions of RFC 822 [5], except that
"|" is used to designate alternatives. Briefly, literals are quoted "|" is used to designate alternatives. Briefly, literals are quoted
with "", parentheses "(" and ")" are used to group elements, optional with "", parentheses "(" and ")" are used to group elements, optional
elements are enclosed in [brackets], and elements may be preceded elements are enclosed in [brackets], and elements may be preceded
with <n>* to designate n or more repetitions of the following with <n>* to designate n or more repetitions of the following
element; n defaults to 0. element; n defaults to 0.
URL = ( absoluteURL | relativeURL ) [ "#" fragment ] URL = ( absoluteURL | relativeURL ) [ "#" fragment ]
absoluteURL = scheme ":" *( uchar | reserved ) absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ) )
generic-RL = scheme ":" relativeURL
relativeURL = net_path | abs_path | rel_path relativeURL = net_path | abs_path | rel_path
net_path = "//" net_loc [ abs_path ] net_path = "//" net_loc [ abs_path ]
abs_path = "/" rel_path abs_path = "/" rel_path
rel_path = [ path ] [ ";" params ] [ "?" query ] rel_path = [ path ] [ ";" params ] [ "?" query ]
path = fsegment *( "/" segment ) path = fsegment *( "/" segment )
fsegment = 1*pchar fsegment = 1*pchar
segment = *pchar segment = *pchar
skipping to change at line 206 skipping to change at line 209
safe = "$" | "-" | "_" | "." | "+" safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | "," extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`" national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
punctuation = "<" | ">" | "#" | "%" | <"> punctuation = "<" | ">" | "#" | "%" | <">
2.3. Specific Schemes and their Syntactic Categories 2.3. Specific Schemes and their Syntactic Categories
Each URL scheme has its own rules regarding the presence or absence Each URL scheme has its own rules regarding the presence or absence
of the syntactic components described in Section 2.1 and 2.2. of the syntactic components described in Sections 2.1 and 2.2.
In addition, some schemes are never appropriate for use with relative In addition, some schemes are never appropriate for use with relative
URLs. However, since relative URLs will only be used within contexts URLs. However, since relative URLs will only be used within contexts
in which they are useful, these scheme-specific differences can be in which they are useful, these scheme-specific differences can be
ignored by the resolution process. ignored by the resolution process.
Within this section, we include as examples only those schemes that Within this section, we include as examples only those schemes that
have a defined URL syntax in [2]. The following schemes are never have a defined URL syntax in RFC 1738 [2]. The following schemes are
used with relative URLs: never used with relative URLs:
mailto Electronic Mail mailto Electronic Mail
news USENET news
telnet TELNET Protocol for Interactive Sessions telnet TELNET Protocol for Interactive Sessions
Some URL schemes allow the use of reserved characters for purposes Some URL schemes allow the use of reserved characters for purposes
outside the generic grammar given above. However, such use is rare. outside the generic-RL syntax given above. However, such use is
Relative URLs can be used with these schemes whenever the applicable rare. Relative URLs can be used with these schemes whenever the
base URL follows the generic syntax. applicable base URL follows the generic-RL syntax.
gopher Gopher and Gopher+ Protocols gopher Gopher and Gopher+ Protocols
news USENET news
nntp USENET news using NNTP access
prospero Prospero Directory Service prospero Prospero Directory Service
wais Wide Area Information Servers Protocol wais Wide Area Information Servers Protocol
Finally, the following schemes can always be parsed using the generic Users of gopher URLs should note that gopher-type information is
syntax. often included at the beginning of what would be the generic-RL path.
If present, this type information prevents relative-path references
to documents with differing gopher-types.
Finally, the following schemes can always be parsed using the
generic-RL syntax.
file Host-specific Files file Host-specific Files
ftp File Transfer Protocol ftp File Transfer Protocol
http Hypertext Transfer Protocol http Hypertext Transfer Protocol
nntp USENET news using NNTP access
It is recommended that new schemes be designed to be parsable via It is recommended that new schemes be designed to be parsable via
the generic syntax if they are intended to be used with relative the generic-RL syntax if they are intended to be used with relative
URLs. A description of the allowed relative forms should be included URLs. A description of the allowed relative forms should be included
when a new scheme is registered, as per Section 4 of [2]. when a new scheme is registered, as per Section 4 of RFC 1738 [2].
2.4. Parsing a URL 2.4. Parsing a URL
An accepted method for parsing URLs is necessary to disambiguate the An accepted method for parsing URLs is useful to clarify the
generic URL syntax of Section 2.2 and to describe the algorithm for generic-RL syntax of Section 2.2 and to describe the algorithm for
resolving relative URLs presented in Section 4. This section resolving relative URLs presented in Section 4. This section
describes the parsing rules for breaking down a URL (relative or describes the parsing rules for breaking down a URL (relative or
absolute) into the component parts described in Section 2.1. The absolute) into the component parts described in Section 2.1. The
rules assume that the URL has already been separated from any rules assume that the URL has already been separated from any
surrounding text and copied to a "parse string". The rules are surrounding text and copied to a "parse string". The rules are
listed in the order in which they would be applied by the parser. listed in the order in which they would be applied by the parser.
2.4.1. Parsing the Fragment Identifier 2.4.1. Parsing the Fragment Identifier
If the parse string contains a crosshatch "#" character, then the If the parse string contains a crosshatch "#" character, then the
skipping to change at line 317 skipping to change at line 325
After the above steps, all that is left of the parse string is After the above steps, all that is left of the parse string is
the URL <path> and the slash "/" that may precede it. Even though the URL <path> and the slash "/" that may precede it. Even though
the initial slash is not part of the URL path, the parser must the initial slash is not part of the URL path, the parser must
remember whether or not it was present so that later processes remember whether or not it was present so that later processes
can differentiate between relative and absolute paths. Often this can differentiate between relative and absolute paths. Often this
is done by simply storing the preceding slash along with the path. is done by simply storing the preceding slash along with the path.
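The parsing order described in Section 2.4 can be illustrated with a short sketch. This is not part of either draft revision. The intermediate rules are elided from this diff, so the sketch assumes they peel components off in the order fragment, scheme, net_loc, query, params, leaving the path (with its leading slash, if any) as the remainder. The function name parse_url and the dictionary layout are hypothetical.

```python
import re

# Assumption: a scheme name is a run of letters, digits, "+", "." and "-"
# that ends at the first ":" in the parse string.
SCHEME_RE = re.compile(r"^([A-Za-z0-9+.\-]+):")


def parse_url(url):
    """Split a URL (relative or absolute) into the six generic-RL components.

    Components that are absent are reported as None; the path keeps its
    leading slash so that later steps can tell absolute from relative paths.
    """
    parts = {"scheme": None, "net_loc": None, "path": "",
             "params": None, "query": None, "fragment": None}
    rest = url

    # Set the fragment aside first (everything after the first "#").
    if "#" in rest:
        rest, parts["fragment"] = rest.split("#", 1)

    # Scheme: text before the first ":" if it looks like a scheme name.
    match = SCHEME_RE.match(rest)
    if match:
        parts["scheme"] = match.group(1)
        rest = rest[match.end():]

    # Network location: between a leading "//" and the next "/".
    if rest.startswith("//"):
        end = rest.find("/", 2)
        if end == -1:
            end = len(rest)
        parts["net_loc"] = rest[2:end]
        rest = rest[end:]

    # Query, then params, each taken from the first delimiter to the end.
    if "?" in rest:
        rest, parts["query"] = rest.split("?", 1)
    if ";" in rest:
        rest, parts["params"] = rest.split(";", 1)

    # Whatever remains is the path, stored with any preceding slash.
    parts["path"] = rest
    return parts
```

Storing the leading slash with the path follows the common practice noted above; a caller can test parts["path"].startswith("/") to tell an absolute path from a relative one.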
3. Establishing a Base URL 3. Establishing a Base URL
In order for relative URLs to be usable within a base document, The term "relative URL" implies that there exists some absolute
the absolute "base URL" of that document must be known to the "base URL" against which the relative reference is applied. Indeed,
parser. There are three methods for obtaining the base URL of the base URL is necessary to define the semantics of any embedded
a document, listed here in order of precedence. relative URLs; without it, a relative reference is meaningless.
In order for relative URLs to be usable within a document, the base
URL of that document must be known to the parser.
The base URL of a document can be established in one of four ways,
listed below in order of precedence. The order of precedence can be
thought of in terms of layers, where the innermost defined base URL
has the highest precedence. This can be visualized graphically as:
.---------------------------------------------------------.
| .---------------------------------------------------. |
| | .---------------------------------------------. | |
| | | .---------------------------------------. | | |
| | | | (3.1) Base URL embedded in the | | | |
| | | | document's content | | | |
| | | `---------------------------------------' | | |
| | | (3.2) URL defined by a "Base" message | | |
| | | header (or equivalent) | | |
| | `---------------------------------------------' | |
| | (3.3) URL of the document's retrieval context | |
| `---------------------------------------------------' |
| (3.4) Base URL = "" (undefined) |
`---------------------------------------------------------'
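The layered precedence shown in the diagram can be sketched as a simple fall-through. This is an illustration only; the function name establish_base_url and its three parameters are hypothetical, and each argument is assumed to be either an absolute URL string or None when that source is unavailable.

```python
def establish_base_url(embedded_base=None, header_base=None,
                       retrieval_url=None):
    """Return the base URL using the precedence of Sections 3.1 to 3.4.

    The innermost defined base wins: document content (3.1), then the
    "Base" message header (3.2), then the retrieval context (3.3).
    If none is defined, the base URL is the empty string (3.4).
    """
    for candidate in (embedded_base, header_base, retrieval_url):
        if candidate:
            return candidate
    return ""
```

An empty-string result corresponds to case (3.4), where embedded URLs are assumed to be absolute.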
3.1. Base URL within Document Content 3.1. Base URL within Document Content
Within certain document media types, the base URL of the document Within certain document media types, the base URL of the document
can be embedded within the content itself such that it can be can be embedded within the content itself such that it can be
readily obtained by a parser. This can be useful for descriptive readily obtained by a parser. This can be useful for descriptive
documents, such as tables of content, which may be transmitted to documents, such as tables of content, which may be transmitted to
others through protocols other than their usual retrieval context others through protocols other than their usual retrieval context
(e.g. E-Mail or USENET news). (e.g. E-Mail or USENET news).
It is beyond the scope of this document to specify how, for each It is beyond the scope of this document to specify how, for each
media type, the base URL can be embedded. However, an example of media type, the base URL can be embedded. However, an example of
how this is done for the Hypertext Markup Language (HTML) [3] is how this is done for the Hypertext Markup Language (HTML) [3] is
provided in an Appendix (Section 10). provided in an Appendix (Section 10).
3.2. Base URL within Message Headers 3.2. Base URL within Message Headers
For protocols that make use of message headers like those described A second method for identifying the base URL of a document is to
in RFC 822 [5], a second method for identifying the base URL of a specify it within the message headers (or equivalent tagged
document is to include that URL in the message headers. It is metainformation) of the message enclosing the document. For
recommended that the format of this header be: protocols that make use of message headers like those described in
RFC 822 [5], it is recommended that the format of this header be:
base = "Base" ":" "<URL:" absoluteURL ">" base-header = "Base" ":" "<URL:" absoluteURL ">"
where "Base" is case-insensitive. For example, where "Base" is case-insensitive. For example, the header
Base: <URL:http://www.ics.uci.edu/Test/a/b/c> Base: <URL:http://www.ics.uci.edu/Test/a/b/c>
would indicate that any relative URLs found within the document would indicate that any relative URLs found within the document
should be parsed relative to <URL:http://www.ics.uci.edu/Test/a/b/c>. should be parsed relative to <URL:http://www.ics.uci.edu/Test/a/b/c>.
Any whitespace (including that used for line folding) inside the Any whitespace (including that used for line folding) inside the
angle brackets should be ignored. angle brackets should be ignored.
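As an illustration of the recommended header format, the following sketch extracts the base URL from a single header line. It is not part of the draft; the function name parse_base_header is hypothetical, and tolerating whitespace around the colon is an assumption borrowed from general RFC 822 practice. The field name is matched case-insensitively and all whitespace inside the angle brackets is discarded, as described above.

```python
import re

# "Base" is case-insensitive; the bracketed part may span folded lines.
BASE_HEADER_RE = re.compile(r"^(?i:base)\s*:\s*<(.*)>\s*$", re.DOTALL)


def parse_base_header(header):
    """Return the absolute URL carried by a "Base" header, or None."""
    match = BASE_HEADER_RE.match(header)
    if match is None:
        return None
    # Ignore any whitespace (including line folding) inside the brackets.
    inner = re.sub(r"\s+", "", match.group(1))
    if not inner.startswith("URL:"):
        return None
    return inner[len("URL:"):]
```

For the example header above, this sketch yields http://www.ics.uci.edu/Test/a/b/c, whether or not the header value was folded across lines.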
Protocols which do not use the RFC 822 message header syntax, but
which do allow some form of tagged metainformation to be included
within messages, may define their own syntax for passing the base URL
as part of a message. Describing the syntax for all possible
protocols is beyond the scope of this document. It is assumed that
user agents using such a protocol will be able to obtain the
appropriate syntax from that protocol's specification.
In situations where both an embedded base URL (as described in In situations where both an embedded base URL (as described in
Section 3.1) and a "Base" message header are present, the embedded Section 3.1) and a base-header are present, the embedded base URL
base URL takes precedence. takes precedence.
3.3. Base URL from the Retrieval Context 3.3. Base URL from the Retrieval Context
If neither an embedded base URL nor a "Base" message header If neither an embedded base URL nor a base-header is present, then,
is present, then, if a URL was used to retrieve the base document, if a URL was used to retrieve the base document, that URL shall be
that URL shall be considered the base URL. Note that if the considered the base URL. Note that if the retrieval was the result
retrieval was the result of a redirected request, the last URL used of a redirected request, the last URL used (i.e., that which resulted
(i.e., that which resulted in the actual retrieval of the document) in the actual retrieval of the document) is the base URL.
is the base URL.
Composite media types, such as the "multipart/*" and "message/*"
media types defined by MIME (RFC 1521, [4]), require special
processing in order to determine the retrieval context of an enclosed
document. For these types, the base URL of the composite entity
must be determined first; this base is then considered the retrieval
context for its component parts, and thus the base URL for any part
that does not define its own base via one of the methods described
in Sections 3.1 and 3.2. This logic is applied recursively for
component parts that are themselves composite entities.
In other words, the retrieval context (Section 3.3) of a component
part is the base URL of the composite entity of which it is a part.
Thus, a composite entity can redefine the retrieval context of its
component parts via inclusion of a base-header, and this redefinition
applies recursively for a hierarchy of composite parts. Note that
this is not necessarily the same as defining the base URL of the
components, since each component may include an embedded base URL
or base-header that takes precedence over the retrieval context.
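The recursive rule for composite entities can be sketched as follows. This is a simplified model, not a MIME parser: the Entity class, its field names, and the function collect_base_urls are all hypothetical, and each entity is assumed to have already had its embedded base URL and base-header (if any) extracted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """A document or composite entity with optional base URL sources."""
    name: str
    embedded_base: Optional[str] = None   # Section 3.1
    header_base: Optional[str] = None     # Section 3.2
    parts: List["Entity"] = field(default_factory=list)


def collect_base_urls(entity: Entity,
                      retrieval_context: str = "") -> Dict[str, str]:
    """Map each entity name to its base URL, walking composites recursively.

    The base URL of an entity becomes the retrieval context (Section 3.3)
    of its component parts; a part may still override it with its own
    embedded base URL or base-header.
    """
    base = entity.embedded_base or entity.header_base or retrieval_context
    result = {entity.name: base}
    for part in entity.parts:
        result.update(collect_base_urls(part, base))
    return result
```

A part with no base of its own inherits the enclosing entity's base, while a nested composite that carries a base-header redefines the retrieval context for everything beneath it.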
3.4. Default Base URL 3.4. Default Base URL
If none of the conditions described in Sections 3.1 -- 3.3 apply, If none of the conditions described in Sections 3.1 -- 3.3 apply,
then the base URL is considered to be the empty string and all then the base URL is considered to be the empty string and all
embedded URLs within that document shall be interpreted as absolute. embedded URLs within that document are assumed to be absolute URLs.
It is the responsibility of the distributor(s) of a document It is the responsibility of the distributor(s) of a document
containing relative URLs to ensure that the base URL for that containing relative URLs to ensure that the base URL for that
document can be established. It must be emphasized that relative document can be established. It must be emphasized that relative
URLs cannot be used reliably in situations where the object's base URLs cannot be used reliably in situations where the object's base
URL is not well-defined. URL is not well-defined.
3.5. Base URL for Composite Media Types
Composite media types, such as the "multipart/*" and "message/*"
media types defined by MIME (RFC 1521, [4]), require special
processing in order to determine the base URL of a component part.
For these types, the base URL of the composite entity should be
determined first; this base is then considered the default for any
component part that does not define its own base via one of the
methods described in Sections 3.1 and 3.2.
4. Resolving Relative URLs 4. Resolving Relative URLs
This section describes an example algorithm for resolving URLs This section describes an example algorithm for resolving URLs
within a context in which the URLs may be relative, such that the within a context in which the URLs may be relative, such that the
result is always a URL in absolute form. Although this algorithm result is always a URL in absolute form. Although this algorithm
cannot guarantee that the resulting URL will equal that intended cannot guarantee that the resulting URL will equal that intended
by the original author, it does guarantee that any valid URL by the original author, it does guarantee that any valid URL
(relative or absolute) can be consistently transformed to an (relative or absolute) can be consistently transformed to an
absolute form given a valid base URL. absolute form given a valid base URL.
skipping to change at line 617 skipping to change at line 664
[4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail [4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format Extensions): Mechanisms for Specifying and Describing the Format
of Internet Message Bodies", RFC 1521, Bellcore, Innosoft, of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993. <URL:ftp://ds.internic.net/rfc/rfc1521.txt> September 1993. <URL:ftp://ds.internic.net/rfc/rfc1521.txt>
[5] D. H. Crocker, "Standard for the Format of ARPA Internet [5] D. H. Crocker, "Standard for the Format of ARPA Internet
Text Messages", STD 11, RFC 822, UDEL, August 1982. Text Messages", STD 11, RFC 822, UDEL, August 1982.
<URL:ftp://ds.internic.net/rfc/rfc822.txt> <URL:ftp://ds.internic.net/rfc/rfc822.txt>
[6] J. Kunze, "Functional Requirements for Internet Resource [6] J. Kunze, "Functional Requirements for Internet Resource
Locators", Work in Progress, IS&T, UC Berkeley, November 1994. Locators", Work in Progress, IS&T, UC Berkeley, January 1995.
<URL:ftp://ds.internic.net/internet-drafts/ <URL:ftp://ds.internic.net/internet-drafts/
draft-ietf-uri-irl-fun-req-02.txt> draft-ietf-uri-irl-fun-req-03.txt>
9. Author's Address 9. Author's Address
Roy T. Fielding Roy T. Fielding
Department of Information and Computer Science Department of Information and Computer Science
University of California University of California
Irvine, CA 92717-3425 Irvine, CA 92717-3425
U.S.A. U.S.A.
Tel: +1 (714) 824-4049 Tel: +1 (714) 824-4049
Fax: +1 (714) 824-4056 Fax: +1 (714) 824-4056
Email: fielding@ics.uci.edu Email: fielding@ics.uci.edu
This Internet-Draft expires July 18, 1995. This Internet-Draft expires July 30, 1995.
10. Appendix - Embedding the Base URL in HTML documents. 10. Appendix - Embedding the Base URL in HTML documents.
It is useful to consider an example of how the base URL of a It is useful to consider an example of how the base URL of a
document can be embedded within the document's content. In this document can be embedded within the document's content. In this
appendix, we describe how documents written in the Hypertext Markup appendix, we describe how documents written in the Hypertext Markup
Language (HTML) [3] can include an embedded base URL. This appendix Language (HTML) [3] can include an embedded base URL. This appendix
does not form a part of the relative URL specification and should not does not form a part of the relative URL specification and should not
be considered as anything more than a descriptive example. be considered as anything more than a descriptive example.
 End of changes. 32 change blocks. 
62 lines changed or deleted 109 lines changed or added
