Uniform Resource Identifiers Working Group                R. T. Fielding
INTERNET-DRAFT                                                 UC Irvine
Expires July 30, 1995                                   January 30, 1995

                   Relative Uniform Resource Locators
                  <draft-ietf-uri-relative-url-05.txt>

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts as reference material
   or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments
   to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
   URI working group (URI-WG) of the Internet Engineering Task Force
   (IETF) at . Discussions of the group are archived at
   .

Abstract

   A Uniform Resource Locator (URL) is a compact representation of the
   location and access method for a resource available via the Internet.
   When embedded within a base document, a URL in its absolute form may
   contain a great deal of information which is already known from the
   context of that base document's retrieval, including the scheme,
   network location, and parts of the URL path. In situations where the
   base URL is well-defined and known to the parser (human or machine),
   it is useful to be able to embed URL references which inherit that
   context rather than re-specifying it in every instance. This
   document defines the syntax and semantics for such Relative Uniform
   Resource Locators.

1. Introduction

   This document describes the syntax and semantics for "relative"
   Uniform Resource Locators (relative URLs): a compact representation
   of the location of a resource relative to an absolute base URL.
   It is a companion to RFC 1738, "Uniform Resource Locators (URL)" [2],
   which specifies the syntax and semantics of absolute URLs.

   A common use for Uniform Resource Locators is to embed them within
   a document (referred to as the "base" document) for the purpose of
   identifying other Internet-accessible resources.
   For example, in hypertext documents, URLs can be used as the
   identifiers for hypertext link destinations.

   Absolute URLs contain a great deal of information which may already
   be known from the context of the base document's retrieval,
   including the scheme, network location, and parts of the URL path.
   In situations where the base URL is well-defined and known, it is
   useful to be able to embed a URL reference which inherits that
   context rather than re-specifying it within each instance.
   Similarly, relative URLs can be used within data-entry dialogs to
   decrease the number of characters necessary to describe a location.

   It is often the case that a group or "tree" of documents has been
   constructed to serve a common purpose; the vast majority of URLs
   within these documents point to locations within the tree rather
   than outside of it. Similarly, documents located at a particular
   Internet site are much more likely to refer to other resources at
   that site than to resources at remote sites.

   Relative addressing of URLs allows document trees to be partially
   independent of their location and access scheme. For instance,
   if they refer to each other using relative URLs, it is possible for
   a single set of documents to be simultaneously accessible and, if
   hypertext, traversable via each of the "file", "http", and "ftp"
   schemes. Furthermore, document trees can be moved, as a whole,
   without changing any of the embedded URLs. Experience within the
   World-Wide Web has demonstrated that the ability to perform relative
   referencing is necessary for the long-term usability of embedded
   URLs.

2. Relative URL Syntax

   The syntax for relative URLs is a shortened form of that for absolute
   URLs [2], where some prefix of the URL is missing and certain path
   components ("." and "..") have a special meaning when interpreting a
   relative path.
   Because a relative URL may appear in any context that could hold an
   absolute URL, systems that support relative URLs must be able to
   recognize them as part of the URL parsing process.

   Although this document does not seek to define the overall URL
   syntax, some discussion of it is necessary in order to describe the
   parsing of relative URLs. In particular, base documents can only
   make use of relative URLs when their base URL fits within the
   generic-RL syntax described below. Although some URL schemes do not
   require this generic-RL syntax, it is assumed that any document which
   contains a relative reference does have a base URL that obeys the
   syntax. In other words, relative URLs cannot be used within
   documents that have unsuitable base URLs.

2.1. URL Syntactic Components

   The URL syntax is dependent upon the scheme. Some schemes use
   reserved characters like "?" and ";" to indicate special components,
   while others just consider them to be part of the path. However,
   there is enough uniformity in the use of URLs to allow a parser
   to resolve relative URLs based upon a single, generic-RL syntax.
   This generic-RL syntax consists of six components:

      <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

   each of which, except <scheme>, may be absent from a particular URL.
   These components are defined as follows (a complete BNF is provided
   in Section 2.2):

      scheme ":"   ::= scheme name, as per Section 2.1 of RFC 1738 [2].

      "//" net_loc ::= network location and login information, as per
                       Section 3.1 of RFC 1738 [2].

      "/" path     ::= URL path, as per Section 3.1 of RFC 1738 [2].

      ";" params   ::= object parameters (e.g. ";type=a" as in
                       Section 3.2.2 of RFC 1738 [2]).

      "?" query    ::= query information, as per Section 3.3 of
                       RFC 1738 [2].

      "#" fragment ::= fragment identifier.

   Note that the fragment identifier (and the "#" that precedes it) is
   not considered part of the URL.
   However, since it is commonly used within the same string context
   as a URL, a parser must be able to recognize the fragment when it is
   present and set it aside as part of the parsing process.

   The order of the components is important. If both <params> and
   <query> are present, the <query> information must occur after the
   <params>.

2.2. BNF for Relative URLs

   This is a BNF-like description of the Relative Uniform Resource
   Locator syntax, using the conventions of RFC 822 [5], except that
   "|" is used to designate alternatives. Briefly, literals are quoted
   with "", parentheses "(" and ")" are used to group elements, optional
   elements are enclosed in [brackets], and elements may be preceded
   with <n>* to designate n or more repetitions of the following
   element; n defaults to 0.

   URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]

   absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ) )

   generic-RL  = scheme ":" relativeURL

   relativeURL = net_path | abs_path | rel_path

   net_path    = "//" net_loc [ abs_path ]
   abs_path    = "/" rel_path
   rel_path    = [ path ] [ ";" params ] [ "?" query ]

   path        = fsegment *( "/" segment )
   fsegment    = 1*pchar
   segment     = *pchar

   params      = param *( ";" param )
   param       = *( pchar | "/" )

   scheme      = 1*( alpha | digit | "+" | "-" | "." )
   net_loc     = *( pchar | ";" | "?" )
   query       = *( uchar | reserved )
   fragment    = *( uchar | reserved )

   pchar       = uchar | ":" | "@" | "&" | "="
   uchar       = unreserved | escape
   unreserved  = alpha | digit | safe | extra | national

   escape      = "%" hex hex
   hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                         "a" | "b" | "c" | "d" | "e" | "f"

   alpha       = lowalpha | hialpha
   lowalpha    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   hialpha     = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

   digit       = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

   safe        = "$" | "-" | "_" | "." | "+"
   extra       = "!" | "*" | "'" | "(" | ")" | ","
   national    = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
   reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "="
   punctuation = "<" | ">" | "#" | "%" | <">

2.3. Specific Schemes and their Syntactic Categories

   Each URL scheme has its own rules regarding the presence or absence
   of the syntactic components described in Sections 2.1 and 2.2.
   In addition, some schemes are never appropriate for use with relative
   URLs. However, since relative URLs will only be used within contexts
   in which they are useful, these scheme-specific differences can be
   ignored by the resolution process.

   Within this section, we include as examples only those schemes that
   have a defined URL syntax in RFC 1738 [2]. The following schemes are
   never used with relative URLs:

      mailto     Electronic Mail
      news       USENET news
      telnet     TELNET Protocol for Interactive Sessions

   Some URL schemes allow the use of reserved characters for purposes
   outside the generic-RL syntax given above. However, such use is
   rare.
   Relative URLs can be used with these schemes whenever the
   applicable base URL follows the generic-RL syntax.

      gopher     Gopher and Gopher+ Protocols
      prospero   Prospero Directory Service
      wais       Wide Area Information Servers Protocol

   Users of gopher URLs should note that gopher-type information is
   often included at the beginning of what would be the generic-RL path.
   If present, this type information prevents relative-path references
   to documents with differing gopher-types.

   Finally, the following schemes can always be parsed using the
   generic-RL syntax.

      file       Host-specific Files
      ftp        File Transfer Protocol
      http       Hypertext Transfer Protocol
      nntp       USENET news using NNTP access

   It is recommended that new schemes be designed to be parsable via
   the generic-RL syntax if they are intended to be used with relative
   URLs. A description of the allowed relative forms should be included
   when a new scheme is registered, as per Section 4 of RFC 1738 [2].

2.4. Parsing a URL

   An accepted method for parsing URLs is useful to clarify the
   generic-RL syntax of Section 2.2 and to describe the algorithm for
   resolving relative URLs presented in Section 4. This section
   describes the parsing rules for breaking down a URL (relative or
   absolute) into the component parts described in Section 2.1. The
   rules assume that the URL has already been separated from any
   surrounding text and copied to a "parse string". The rules are
   listed in the order in which they would be applied by the parser.

2.4.1. Parsing the Fragment Identifier

   If the parse string contains a crosshatch "#" character, then the
   substring after the first (left-most) crosshatch "#" and up to the
   end of the parse string is the <fragment> identifier. If the
   crosshatch is the last character, or no crosshatch is present, then
   the fragment identifier is empty. The matched substring, including
   the crosshatch character, is removed from the parse string before
   continuing.

   Note that the fragment identifier is not considered part of the URL.
   However, since it is often attached to the URL, parsers must be able
   to recognize and set aside fragment identifiers as part of the
   process.

2.4.2. Parsing the Scheme

   If the parse string contains a colon ":" after the first character
   and before any characters not allowed as part of a scheme name
   (i.e., any character that is not an alphanumeric, plus "+",
   period ".", or hyphen "-"), the <scheme> of the URL is the substring
   of characters up to but not including the first colon. These
   characters and the colon are then removed from the parse string
   before continuing.

2.4.3. Parsing the Network Location/Login

   If the parse string begins with a double-slash "//", then the
   substring of characters after the double-slash and up to, but not
   including, the next slash "/" character is the network location/login
   (<net_loc>) of the URL. If no trailing slash "/" is present, the
   entire remaining parse string is assigned to <net_loc>. The
   double-slash and <net_loc> are removed from the parse string before
   continuing.

2.4.4. Parsing the Query Information

   If the parse string contains a question mark "?" character, then the
   substring after the first (left-most) question mark "?" and up to the
   end of the parse string is the <query> information. If the question
   mark is the last character, or no question mark is present, then the
   query information is empty. The matched substring, including the
   question mark character, is removed from the parse string before
   continuing.

2.4.5. Parsing the Parameters

   If the parse string contains a semicolon ";" character, then the
   substring after the first (left-most) semicolon ";" and up to the
   end of the parse string is the parameters (<params>).
   If the semicolon is the last character, or no semicolon is present,
   then <params> is empty. The matched substring, including the
   semicolon character, is removed from the parse string before
   continuing.

2.4.6. Parsing the Path

   After the above steps, all that is left of the parse string is
   the URL <path> and the slash "/" that may precede it. Even though
   the initial slash is not part of the URL path, the parser must
   remember whether or not it was present so that later processes
   can differentiate between relative and absolute paths. Often this
   is done by simply storing the preceding slash along with the path.

3. Establishing a Base URL

   The term "relative URL" implies that there exists some absolute
   "base URL" against which the relative reference is applied. Indeed,
   the base URL is necessary to define the semantics of any embedded
   relative URLs; without it, a relative reference is meaningless.
   In order for relative URLs to be usable within a document, the base
   URL of that document must be known to the parser.

   The base URL of a document can be established in one of four ways,
   listed below in order of precedence. The order of precedence can be
   thought of in terms of layers, where the innermost defined base URL
   has the highest precedence. This can be visualized graphically as:

      .----------------------------------------------------------.
      |  .----------------------------------------------------.  |
      |  |  .----------------------------------------------.  |  |
      |  |  |  .----------------------------------------.  |  |  |
      |  |  |  |   (3.1) Base URL embedded in the       |  |  |  |
      |  |  |  |         document's content             |  |  |  |
      |  |  |  `----------------------------------------'  |  |  |
      |  |  |   (3.2) URL defined by a "Base" message      |  |  |
      |  |  |         header (or equivalent)               |  |  |
      |  |  `----------------------------------------------'  |  |
      |  |   (3.3) URL of the document's retrieval context    |  |
      |  `----------------------------------------------------'  |
      |   (3.4) Base URL = "" (undefined)                        |
      `----------------------------------------------------------'

3.1. Base URL within Document Content

   Within certain document media types, the base URL of the document
   can be embedded within the content itself such that it can be
   readily obtained by a parser. This can be useful for descriptive
   documents, such as tables of contents, which may be transmitted to
   others through protocols other than their usual retrieval context
   (e.g. E-Mail or USENET news).

   It is beyond the scope of this document to specify how, for each
   media type, the base URL can be embedded. However, an example of
   how this is done for the Hypertext Markup Language (HTML) [3] is
   provided in an Appendix (Section 10).

3.2. Base URL within Message Headers

   A second method for identifying the base URL of a document is to
   specify it within the message headers (or equivalent tagged
   metainformation) of the message enclosing the document. For
   protocols that make use of message headers like those described in
   RFC 822 [5], it is recommended that the format of this header be:

      base-header = "Base" ":" ""

   where "Base" is case-insensitive. For example, the header

      Base:

   would indicate that any relative URLs found within the document
   should be parsed relative to .
   Any whitespace (including that used for line folding) inside the
   angle brackets should be ignored.
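
   The handling described above can be illustrated with a minimal
   sketch. It is not part of this specification. It assumes that the
   URL is delimited by angle brackets, as the whitespace rule implies,
   and it tolerates an optional "URL:" prefix inside the brackets, in
   the style of the wrapper recommended in the appendix of RFC 1738 [2].
   The function name parse_base_header and the example.org URLs are
   hypothetical.

```python
import re
from typing import Optional


def parse_base_header(header_line: str) -> Optional[str]:
    """Return the base URL carried by a "Base" header line, or None.

    Assumptions of this sketch (not fixed by the surrounding text):
    the URL sits between "<" and ">", optionally prefixed by "URL:".
    """
    name, sep, value = header_line.partition(":")
    # The header name "Base" is matched case-insensitively.
    if not sep or name.strip().lower() != "base":
        return None
    match = re.search(r"<(.*?)>", value, re.DOTALL)
    if match is None:
        return None
    # Whitespace inside the angle brackets, including line folding,
    # is ignored.
    url = re.sub(r"\s+", "", match.group(1))
    if url[:4].lower() == "url:":
        url = url[4:]
    return url or None


if __name__ == "__main__":
    folded = "BASE: <http://example.org/Test/\r\n a/b/c>"
    print(parse_base_header(folded))
```

   A header with any other name, or a Base header with nothing between
   the brackets, yields None, so that a caller can fall back to the
   retrieval context described in Section 3.3.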

   Protocols which do not use the RFC 822 message header syntax, but
   which do allow some form of tagged metainformation to be included
   within messages, may define their own syntax for passing the base URL
   as part of a message. Describing the syntax for all possible
   protocols is beyond the scope of this document. It is assumed that
   user agents using such a protocol will be able to obtain the
   appropriate syntax from that protocol's specification.

   In situations where both an embedded base URL (as described in
   Section 3.1) and a base-header are present, the embedded base URL
   takes precedence.

3.3. Base URL from the Retrieval Context

   If neither an embedded base URL nor a base-header is present, then,
   if a URL was used to retrieve the base document, that URL shall be
   considered the base URL. Note that if the retrieval was the result
   of a redirected request, the last URL used (i.e., that which resulted
   in the actual retrieval of the document) is the base URL.

   Composite media types, such as the "multipart/*" and "message/*"
   media types defined by MIME (RFC 1521, [4]), require special
   processing in order to determine the retrieval context of an enclosed
   document. For these types, the base URL of the composite entity
   must be determined first; this base is then considered the retrieval
   context for its component parts, and thus the base URL for any part
   that does not define its own base via one of the methods described
   in Sections 3.1 and 3.2. This logic is applied recursively for
   component parts that are themselves composite entities.

   In other words, the retrieval context (Section 3.3) of a component
   part is the base URL of the composite entity of which it is a part.
   Thus, a composite entity can redefine the retrieval context of its
   component parts via inclusion of a base-header, and this redefinition
   applies recursively for a hierarchy of composite parts.
   Note that this is not necessarily the same as defining the base URL
   of the components, since each component may include an embedded base
   URL or base-header that takes precedence over the retrieval context.

3.4. Default Base URL

   If none of the conditions described in Sections 3.1 -- 3.3 apply,
   then the base URL is considered to be the empty string and all
   embedded URLs within that document are assumed to be absolute URLs.
   It is the responsibility of the distributor(s) of a document
   containing relative URLs to ensure that the base URL for that
   document can be established. It must be emphasized that relative
   URLs cannot be used reliably in situations where the object's base
   URL is not well-defined.

4. Resolving Relative URLs

   This section describes an example algorithm for resolving URLs
   within a context in which the URLs may be relative, such that the
   result is always a URL in absolute form. Although this algorithm
   cannot guarantee that the resulting URL will equal that intended
   by the original author, it does guarantee that any valid URL
   (relative or absolute) can be consistently transformed to an
   absolute form given a valid base URL.

   The following steps are performed in order:

   Step 1: The base URL is established according to the rules of
           Section 3. If the base URL is the empty string (unknown),
           the embedded URL is interpreted as an absolute URL and
           we are done.

   Step 2: Both the base and embedded URLs are parsed into their
           component parts as described in Section 2.4.

           a) If the embedded URL is entirely empty, it inherits the
              entire base URL (i.e. is set equal to the base URL)
              and we are done.

           b) If the embedded URL starts with a scheme name, it is
              interpreted as an absolute URL and we are done.

           c) Otherwise, the embedded URL inherits the scheme of
              the base URL.

   Step 3: If the embedded URL's <net_loc> is non-empty, we skip to
           Step 7. Otherwise, the embedded URL inherits the <net_loc>
           (if any) of the base URL.

   Step 4: If the embedded URL path is preceded by a slash "/", the
           path is not relative and we skip to Step 7.

   Step 5: If the embedded URL path is empty (and not preceded by a
           slash), then the embedded URL inherits the base URL path,
           and

           a) if the embedded URL's <params> is non-empty, we skip to
              Step 7; otherwise, it inherits the <params> of the base
              URL (if any) and

           b) if the embedded URL's <query> is non-empty, we skip to
              Step 7; otherwise, it inherits the <query> of the base
              URL (if any) and we skip to Step 7.

   Step 6: The last segment of the base URL's path (anything
           following the rightmost slash "/", or the entire path if no
           slash is present) is removed and the embedded URL's path is
           appended in its place. The following operations are
           then applied, in order, to the new path:

           a) All occurrences of "./", where "." is a complete path
              segment, are removed.

           b) If the path ends with "." as a complete path segment,
              that "." is removed.

           c) All occurrences of "<segment>/../", where <segment> and
              ".." are complete path segments, are removed. Removal of
              these path segments is performed iteratively, removing the
              leftmost matching pattern on each iteration, until no
              matching pattern remains.

           d) If the path ends with "<segment>/..", that "<segment>/.."
              is removed.

   Step 7: The resulting URL components, including any inherited from
           the base URL, are recombined to give the absolute form of
           the embedded URL.

   Parameters, regardless of their purpose, do not form a part of the
   URL path and thus have no effect on the resolution of relative paths.
   In particular, the presence or absence of the ";type=d" parameter
   on an ftp URL has no effect on the interpretation of paths relative
   to that URL. Fragment identifiers are only inherited from the base
   URL when the entire embedded URL is empty.

5. Examples and Recommended Practice

   Within an object with a well-defined base URL of

      Base:

   the relative URLs would be resolved as follows:

5.1. Normal Examples

      g:h        =
      g          =
      ./g        =
      g/         =
      /g         =
      //g        =
      ?y         =
      g?y        =
      g?y/./x    =
      #s         =
      g#s        =
      g#s/./x    =
      g?y#s      =
      ;x         =
      g;x        =
      g;x?y#s    =
      .          =
      ./         =
      ..         =
      ../        =
      ../g       =
      ../..      =
      ../../     =
      ../../g    =

5.2. Abnormal Examples

   Although the following abnormal examples are unlikely to occur
   in normal practice, all URL parsers should be capable of resolving
   them consistently. Each example uses the same base as above.

   An empty reference resolves to the complete base URL:

      <>         =

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in
   the base URL's path. Note that the ".." syntax cannot be used to
   change the <net_loc> of a URL.

      ../../../g =

   Similarly, parsers must avoid treating "." and ".." as special when
   they are not complete components of a relative path.

      /./g       =
      /../g      =
      g.         =
      .g         =
      g..        =
      ..g        =

   Less likely are cases where the relative URL uses unnecessary or
   nonsensical forms of the "." and ".." complete path segments.

      ./../g     =
      ./g/.      =
      g/./h      =
      g/../h     =

   Finally, some older parsers allow the scheme name to be present in
   a relative URL if it is the same as the base URL scheme. This is
   considered to be a loophole in prior specifications of partial
   URLs [1] and should be avoided by future parsers.

      http:g     =
      http:      =

5.3. Recommended Practice

   Authors should be aware that path names which contain a colon
   ":" character cannot be used as the first component of a relative
   URL path (e.g. "this:that") because they will likely be mistaken for
   a scheme name.
   It is therefore necessary to precede such cases with other
   components (e.g., "./this:that"), or to escape the colon
   character (e.g., "this%3Athat"), in order for them to be correctly
   parsed. The former solution is preferred because it has no effect
   on the absolute form of the URL.

   There is an ambiguity in the semantics for the ftp URL scheme
   regarding the use of a trailing slash ("/") character and/or a
   parameter ";type=d" to indicate a resource that is an ftp directory.
   If the result of retrieving that directory includes embedded
   relative URLs, it is necessary that the base URL path for that result
   include a trailing slash. For this reason, it is recommended that
   the ";type=d" parameter value not be used within contexts that allow
   relative URLs.

6. Security Considerations

   There are no security considerations in the use or parsing of relative
   URLs. However, once a relative URL has been resolved to its absolute
   form, the same security considerations apply as those described in
   RFC 1738 [2].

7. Acknowledgements

   This work is derived from concepts introduced by Tim Berners-Lee and
   the World-Wide Web global information initiative. Relative URLs are
   described as "Partial URLs" in RFC 1630 [1]. That description was
   expanded for inclusion as an appendix for an early draft of RFC 1738,
   "Uniform Resource Locators (URL)" [2]. However, after further
   discussion, the URI-WG decided to specify Relative URLs separately
   from the primary URL draft.

   This document is intended to fulfill the requirements for Internet
   Resource Locators as stated in [6]. It has benefited greatly from
   the comments of all those participating in the URI-WG. Particular
   thanks go to Larry Masinter, Michael A. Dolan, Guido van Rossum, and
   Dave Kristol for identifying problems/deficiencies in earlier drafts.

8. References

   [1] T. Berners-Lee, "Universal Resource Identifiers in WWW:
       A Unifying Syntax for the Expression of Names and Addresses of
       Objects on the Network as used in the World-Wide Web", RFC 1630,
       CERN, June 1994.

   [2] T. Berners-Lee, L. Masinter, and M. McCahill, Editors,
       "Uniform Resource Locators (URL)", RFC 1738, CERN,
       Xerox Corporation, University of Minnesota, December 1994.

   [3] T. Berners-Lee and D. Connolly, "HyperText Markup Language
       Specification -- 2.0", Work in Progress, MIT, HaL Computer
       Systems, November 1994.

   [4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
       Extensions): Mechanisms for Specifying and Describing the Format
       of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
       September 1993.

   [5] D. H. Crocker, "Standard for the Format of ARPA Internet
       Text Messages", STD 11, RFC 822, UDEL, August 1982.

   [6] J. Kunze, "Functional Requirements for Internet Resource
       Locators", Work in Progress, IS&T, UC Berkeley, January 1995.

9. Author's Address

   Roy T. Fielding
   Department of Information and Computer Science
   University of California
   Irvine, CA 92717-3425
   U.S.A.

   Tel: +1 (714) 824-4049
   Fax: +1 (714) 824-4056
   Email: fielding@ics.uci.edu

   This Internet-Draft expires July 30, 1995.

10. Appendix - Embedding the Base URL in HTML documents.

   It is useful to consider an example of how the base URL of a
   document can be embedded within the document's content. In this
   appendix, we describe how documents written in the Hypertext Markup
   Language (HTML) [3] can include an embedded base URL. This appendix
   does not form a part of the relative URL specification and should not
   be considered as anything more than a descriptive example.

   HTML defines a special element "BASE" which, when present in the
   "HEAD" portion of a document, signals that the parser should use
   the BASE element's "HREF" attribute as the base URL for resolving
   any relative URLs. The "HREF" attribute must be an absolute URL.
   Note that, in HTML, element and attribute names are case-insensitive.
   For example:

      An example HTML document
      ... a hypertext anchor ...

   A parser reading the example document should interpret the given
   relative URL "../x" as representing the absolute URL

   regardless of the context in which the example document was obtained.
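
   How a parser might carry out such an interpretation can be sketched
   by following the parsing rules of Section 2.4 and the resolution
   steps of Section 4. The sketch below is illustrative only and is not
   part of this specification. The names Parts, parse, remove_dots,
   unparse, and resolve are hypothetical, as is the base URL
   http://example.org/Test/a/b/c. Two assumptions go beyond the steps
   as written: a ".." segment is never paired with a preceding ".."
   segment, so that surplus ".." segments are kept rather than dropped,
   and a "/" is inserted between a non-empty network location and a
   path that lacks a leading slash.

```python
import re
from typing import NamedTuple


class Parts(NamedTuple):
    scheme: str
    net_loc: str
    path: str       # kept with its leading "/" when one was present
    params: str
    query: str
    fragment: str


def parse(url: str) -> Parts:
    """Split a URL into components, in the order used by Section 2.4."""
    url, _, fragment = url.partition("#")                # 2.4.1
    scheme = ""
    match = re.match(r"([A-Za-z0-9+.-]+):", url)         # 2.4.2
    if match:
        scheme, url = match.group(1), url[match.end():]
    net_loc = ""
    if url.startswith("//"):                             # 2.4.3
        net_loc, slash, rest = url[2:].partition("/")
        url = slash + rest
    url, _, query = url.partition("?")                   # 2.4.4
    path, _, params = url.partition(";")                 # 2.4.5, 2.4.6
    return Parts(scheme, net_loc, path, params, query, fragment)


def remove_dots(path: str) -> str:
    """Apply Step 6 (a) through (d) to a merged path."""
    lead = "/" if path.startswith("/") else ""
    segs = path[len(lead):].split("/")
    # (a) drop "." segments followed by "/"; (b) empty a final "."
    segs = [s for s in segs[:-1] if s != "."] + \
           ["" if segs[-1] == "." else segs[-1]]
    i = 1
    while i < len(segs):
        # Assumption: ".." is not cancelled against another "..".
        if segs[i] == ".." and segs[i - 1] != "..":
            if i == len(segs) - 1:
                segs[i - 1:i + 1] = [""]    # (d) keeps the final "/"
            else:
                del segs[i - 1:i + 1]       # (c) leftmost pair first
            i = max(i - 1, 1)
        else:
            i += 1
    return lead + "/".join(segs)


def unparse(p: Parts) -> str:
    """Recombine the components into a URL string (Step 7)."""
    url = p.scheme + ":" if p.scheme else ""
    if p.net_loc:
        url += "//" + p.net_loc
        if p.path and not p.path.startswith("/"):
            url += "/"      # assumption: keep net_loc and path apart
    url += p.path
    if p.params:
        url += ";" + p.params
    if p.query:
        url += "?" + p.query
    if p.fragment:
        url += "#" + p.fragment
    return url


def resolve(base: str, embedded: str) -> str:
    """Resolve an embedded URL against a base URL (Section 4)."""
    if not base or not embedded:                         # Steps 1, 2a
        return embedded if not base else base
    b, e = parse(base), parse(embedded)                  # Step 2
    if e.scheme:                                         # Step 2b
        return embedded
    net_loc, path, params, query = e.net_loc, e.path, e.params, e.query
    if not e.net_loc:                                    # Step 3
        net_loc = b.net_loc
        if not path.startswith("/"):                     # Step 4
            if path == "":                               # Step 5
                path = b.path
                if not params:
                    params = b.params
                    if not query:
                        query = b.query
            else:                                        # Step 6
                stem = b.path[:b.path.rfind("/") + 1]
                path = remove_dots(stem + path)
    return unparse(Parts(b.scheme, net_loc, path, params, query,
                         e.fragment))


if __name__ == "__main__":
    # Hypothetical base URL, chosen only for illustration.
    print(resolve("http://example.org/Test/a/b/c", "../x"))
    # prints http://example.org/Test/a/x
```

   Because the sketch follows the steps of Section 4 rather than any
   later revision of URL resolution, its results for abnormal
   references may differ from those of general-purpose URL libraries.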