idnits 2.17.1

draft-ietf-appsawg-file-scheme-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC1738, but the
     abstract doesn't seem to directly say this.  It does mention RFC1738
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 28, 2015) is 3255 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4395 (ref. 'BCP115') (Obsoleted by RFC
     7595)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'

  -- Obsolete informational reference (is this intentional?): RFC 1738
     (Obsoleted by RFC 4248, RFC 4266)

  -- Obsolete informational reference (is this intentional?): RFC 3530
     (Obsoleted by RFC 7530)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Applications Area Working Group                                M.
Kerwin
Internet-Draft                                                       QUT
Obsoletes: 1738 (if approved)                               May 28, 2015
Intended status: Standards Track
Expires: November 29, 2015

                          The file URI Scheme
                   draft-ietf-appsawg-file-scheme-02

Abstract

   This document specifies the "file" Uniform Resource Identifier (URI)
   scheme, obsoleting the definition in RFC 1738.

   It attempts to define a common core which is intended to interoperate
   across the broad spectrum of existing implementations, while at the
   same time documenting other current practices.

Note to Readers (To be removed by the RFC Editor)

   This draft should be discussed on the IETF Applications Area Working
   Group discussion list.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 29, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  History
     1.2.  Similar Technologies
     1.3.  Notational Conventions
   2.  Syntax
   3.  Operations on file URIs
     3.1.  Translating Local File Path to file URI
     3.2.  Translating UNC String to file URI
     3.3.  Translating Non-local File Path to file URI
     3.4.  Incompatible File Paths
       3.4.1.  Win32 Namespaces
   4.  Encoding
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Example URIs
   Appendix B.  System-specific Operations
     B.1.  POSIX Systems
     B.2.  DOS- and Windows-Like Systems
     B.3.  Mac OS X Systems
     B.4.  OpenVMS Files-11 Systems
   Appendix C.  Nonstandard Syntax Variations
     C.1.
DOS and Windows Drive Letters
       C.1.1.  Relative Paths
       C.1.2.  Vertical Bar Character
     C.2.  UNC Paths
     C.3.  Backslash as Separator
   Appendix D.  Example of IRI vs Percent-Encoded URI
   Appendix E.  UNC Syntax
   Appendix F.  Collected Rules
   Author's Address

1.  Introduction

   A file URI identifies a file on a particular file system.  It can be
   used in discussions about the file, and if other conditions are met
   it can be dereferenced to directly access the file.

   The file URI scheme is not coupled with a specific protocol.  As
   such, there is no well-defined set of operations that can be
   performed on file URIs, nor a specific media type associated with
   them.

   This document defines a syntax that is compatible with most extant
   implementations, while attempting to push towards a stricter subset
   of "ideal" constructs.  In many cases it simultaneously acknowledges
   and deprecates some less common or outdated constructs.

1.1.  History

   The file URI scheme was first defined in [RFC1630], which, being an
   informational RFC, does not specify an Internet standard.  The
   definition was standardised in [RFC1738], and the scheme was
   registered with the Internet Assigned Numbers Authority (IANA);
   however, that definition omitted certain language included by the
   former that clarified aspects such as:

   o  the use of slashes to denote boundaries between directory levels
      of a hierarchical file system; and

   o  the requirement that client software convert the file URI into a
      file name in the local file name conventions.

   The Internet draft [I-D.hoffman-file-uri] was written in an effort to
   keep the file URI scheme on standards track when [RFC1738] was made
   obsolete, but that draft expired in 2005.  It enumerated concerns
   arising from the various, often conflicting implementations of the
   scheme.  It serves as the spiritual predecessor of this document.

   Additionally, the WHATWG defines a living URL standard [WHATWG-URL],
   which includes algorithms for interpreting file URIs (as URLs).

1.2.  Similar Technologies

   The Universal Naming Convention (UNC) [MS-DTYP] defines a string
   format that can perform a similar role to the file URI scheme in
   describing the location of files.  A UNC filespace selector string
   has three parts: host, share, and path; see Appendix E.  This
   document describes a means of translating between UNC filespace
   selector strings and file URIs.

1.3.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Throughout this document the term "local" is used to describe files
   that can be accessed directly through the local file system.  It is
   important to note that a local file may not be physically located on
   the local machine, for example if a networked file system is
   transparently mounted into the local file system.

2.  Syntax

   The file URI syntax is defined here in Augmented Backus-Naur Form
   (ABNF) [RFC5234], including the core ABNF syntax rule "ALPHA" defined
   by that specification, and importing the "userinfo", "host",
   "authority" and "path-absolute" rules from [RFC3986] (as updated by
   [RFC6874]).

   Please note that Appendix C lists some other commonly seen but
   nonstandard variations.

      file-URI       = file-scheme ":" file-hier-part

      file-scheme    = "file"

      file-hier-part = "//" auth-path
                     / local-path

      auth-path      = [ file-auth ] path-absolute

      local-path     = path-absolute

      file-auth      = [ userinfo "@" ] host

   The syntax definition above is different from those given in
   [RFC1630] and [RFC1738] as it is derived from the generic syntax of
   [RFC3986], which post-dates all previous specifications.

   Systems exhibit different levels of case-sensitivity.  Unless the
   file system is known to be case-insensitive, implementations MUST
   maintain the case of file and directory names when translating file
   URIs to and from the local system's representation of file paths, and
   any systems or devices that transport file URIs MUST NOT alter the
   case of file URIs they transport.

3.  Operations on file URIs

   Implementations SHOULD, at a minimum, provide a read-like operation
   to return the contents of a file located by a file URI.  Additional
   operations MAY be provided, such as writing, creating, and deleting
   files.  See the POSIX file and directory operations [POSIX] for
   examples of standardized operations that can be performed on files.

   File URIs can also be translated to and from local file paths or UNC
   strings.

   A file URI can only be translated to a local file path if it has a
   blank or no authority.  Note that this differs from the previous
   specification in [RFC1738], in that previously an authority of
   "localhost" was used to refer to the local file system, but in this
   specification it equates to a UNC string with the host "localhost".

   This specification neither defines nor forbids a mechanism for
   accessing non-local files.  See SMB [MS-SMB], NFS [RFC3530], NCP
   [NOVELL] for examples of protocols that can be used to access files
   over a network.

3.1.
Translating Local File Path to file URI

   Below is an algorithmic description of the process used to convert a
   file path to an Internationalized Resource Identifier (IRI)
   [RFC3987], which can then be translated to a URI as per Section 3.1
   of [RFC3987]; see: Section 4.

   1.  Resolve the file path to its fully qualified absolute form.

   2.  Initialise the URI with the "file:" scheme identifier.

   3.  If including an empty authority field, append the "//" sigil to
       the URI.

   4.  Append a slash character "/" to the URI, to signify the path
       root.

   5.  For each directory in the path after the root:

       1.  Transform the directory name to a path segment ([RFC3986],
           Section 3.3) as per Section 2 of [RFC3986].

       2.  Append the transformed segment and a delimiting slash
           character "/" to the URI.

   6.  If the path includes a file name:

       1.  Transform the file name to a path segment as above.

       2.  Append the transformed segment to the URI.

   Differences from RFC 1738

      In [RFC1738] a file URL always started with the token "file://",
      followed by an authority and a "/".  That "/" was not considered part
      of the path.  This implies that the correct encoding for a file path
      in a UNIX-like environment would have been:

           token     + authority + slash + path
         = "file://" + ""        + "/"   + "/path/to/file.txt"
         = "file:////path/to/file.txt"

      However that construct was never used in practice, and in fact would
      have collided with the eventual encoding of UNC strings in URIs
      described in Appendix C.2.

3.2.  Translating UNC String to file URI

   A UNC filespace selector string can be directly translated to an
   Internationalized Resource Identifier (IRI) [RFC3987], which can then
   be translated to a URI as per Section 3.1 of [RFC3987]; see:
   Section 4.

   1.  Initialise the URI with the "file:" scheme identifier.

   2.  Append the authority:

       1.  Append the "//" authority sigil to the URI.

       2.
           Append the hostname field of the UNC string to the URI.

   3.  Append the sharename:

       1.  Transform the sharename to a path segment ([RFC3986],
           Section 3.3) as per Section 2 of [RFC3986].

       2.  Append a delimiting slash character "/" and the transformed
           segment to the URI.

   4.  For each objectname:

       1.  Transform the objectname to a path segment ([RFC3986],
           Section 3.3) as per Section 2 of [RFC3986].

       2.  Append a delimiting slash character "/" and the transformed
           segment to the URI.

   Example:

      UNC String:   \\host.example.com\Share\path\to\file.txt
      URI:          file://host.example.com/Share/path/to/file.txt

   Differences from RFC 1738

      In [RFC1738] an authority of "localhost" in a file URL was used to
      refer to the local file system, but in this specification it
      equates to a UNC string with the host "localhost".

3.3.  Translating Non-local File Path to file URI

   Translating a non-local file path other than a UNC string to a file
   URI follows the same basic algorithm as for local files, above,
   except that the authority MUST refer to the network-accessible node
   that hosts the file.

   For example, in a clustered OpenVMS Files-11 system the authority
   would contain the node name.  Where the original node reference
   includes a username and password in an access control string, they
   MAY be transcribed into the userinfo field of the authority
   ([RFC3986], Section 3.2.1), security considerations (Section 5)
   notwithstanding.

3.4.  Incompatible File Paths

   Some conventional file path formats are known to be incompatible with
   the file URI scheme.

3.4.1.  Win32 Namespaces

   The Microsoft Windows API defines Win32 Namespaces [Win32-Namespaces]
   for interacting with files and devices using Windows API functions.
   These namespaced paths are prefixed by "\\?\" for Win32 File
   Namespaces and "\\.\" for Win32 Device Namespaces.
   There is also a
   special case for UNC file paths in Win32 File Namespaces, referred to
   as "Long UNC", using the prefix "\\?\UNC\".

   This specification does not define a mechanism for translating
   namespaced paths to or from file URIs.

4.  Encoding

   The encoding of a file URI depends on the file system.  If the file
   system uses a known non-Unicode character encoding, the path SHOULD
   be converted to a sequence of characters from the Universal Character
   Set [ISO10646] normalized according to Normalization Form C (NFC)
   [UTR15], before being translated to a file URI, and conversely a file
   URI SHOULD be converted back to the file system's native encoding
   when translating to a file path.

      Note that many modern file systems encode directory and file names
      as arbitrary sequences of octets.  In those cases, the
      representation as an encoded string often depends on the user's
      localization settings, or defaults to UTF-8 [STD63].

   When the file system's encoding is not known the file URI SHOULD be
   transported as an Internationalized Resource Identifier (IRI)
   [RFC3987] to avoid ambiguity.  See Appendix D for examples.

5.  Security Considerations

   There are many security considerations for URI schemes discussed in
   [RFC3986].

   File access and the granting of privileges for specific operations
   are complex topics, and the use of file URIs can complicate the
   security model in effect for file privileges.  Software using file
   URIs MUST NOT grant greater access than would be available for other
   file access methods.

   File systems typically assign an operational meaning to special
   characters, such as the "/", "\", ":", "[", and "]" characters, and
   to special device names like ".", "..", "...", "aux", "lpt", etc.
   In
   some cases, merely testing for the existence of such a name will
   cause the operating system to pause or invoke unrelated system calls,
   leading to significant security concerns regarding denial of service
   and unintended data transfer.  It would be impossible for this
   specification to list all such significant characters and device
   names.  Implementers MUST research the reserved names and characters
   for the types of storage device that may be attached to their
   application and restrict the use of data obtained from URI components
   accordingly.

   Additionally, as discussed in the HP OpenVMS Systems Documentation
   http://h71000.www7.hp.com/doc/84final/ba554_90015/ch03s09.html
   "access control strings include sufficient information to allow
   someone to break in to the remote account, [therefore] they create
   serious security exposure."  In a similar vein, the presence of a
   password in a "user:password" userinfo field is deprecated by
   [RFC3986].  As such, the userinfo field of a file URI, if present,
   MUST NOT contain a password.

6.  IANA Considerations

   This document defines the following URI scheme, so the "Permanent URI
   Schemes" registry has been updated accordingly.  This registration
   complies with [BCP115].

   Scheme name:
      file

   Status:
      permanent

   Applications/protocols that use this scheme name:
      Commonly used in hypertext documents to refer to files without
      depending on network access.  Supported by major browsers.

      Windows API (PathCreateFromUrl, UrlCreateFromPath).

      Perl LWP.

   Contact:
      Matthew Kerwin

   Change Controller:
      This scheme is registered under the IETF tree.  As such, the IETF
      maintains change control.

   [RFC Editor Note: Replace XXXX with this RFC's reference.]

7.
Acknowledgements

   This specification is derived from [RFC1738], [RFC3986], and
   [I-D.hoffman-file-uri] (expired); the acknowledgements in those
   documents still apply.

   Additional thanks to Dave Risney, author of the informative IE Blog
   article
   http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx ,
   and Dave Thaler for their comments and suggestions.

8.  References

8.1.  Normative References

   [BCP115]   Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
              Registration Procedures for New URI Schemes", BCP 35, RFC
              4395, February 2006.

   [ISO10646]
              International Organization for Standardization,
              "Information Technology - Universal Multiple-Octet Coded
              Character Set (UCS)", ISO/IEC 10646:2003, December 2003.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC6874]  Carpenter, B., Cheshire, S., and R. Hinden, "Representing
              IPv6 Zone Identifiers in Address Literals and Uniform
              Resource Identifiers", RFC 6874, February 2013.

   [UTR15]    Davis, M. and K. Whistler, "Unicode Normalization Forms",
              August 2012.

8.2.
Informative References

   [Bug107540]
              Bugzilla@Mozilla, "Bug 107540", October 2007.

   [I-D.hoffman-file-uri]
              Hoffman, P., "The file URI Scheme", draft-hoffman-file-
              uri-03 (work in progress), January 2005.

   [MS-DTYP]  Microsoft Open Specifications, "Windows Data Types, 2.2.56
              UNC", January 2013.

   [MS-NBTE]  Microsoft Open Specifications, "NetBIOS over TCP (NBT)
              Extensions", May 2014.

   [MS-SMB]   Microsoft Open Specifications, "Server Message Block (SMB)
              Protocol", January 2013.

   [NOVELL]   Novell, "NetWare Core Protocols", 2013.

   [POSIX]    IEEE, "IEEE Std 1003.1, 2013 Edition", 2013.

   [RFC1630]  Berners-Lee, T., "Universal Resource Identifiers in WWW: A
              Unifying Syntax for the Expression of Names and Addresses
              of Objects on the Network as used in the World-Wide Web",
              RFC 1630, June 1994.

   [RFC1738]  Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
              Resource Locators (URL)", RFC 1738, December 1994.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
              Beame, C., Eisler, M., and D. Noveck, "Network File System
              (NFS) version 4 Protocol", RFC 3530, April 2003.

   [STD63]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [WHATWG-URL]
              WHATWG, "URL Living Standard", May 2013.

   [Win32-Namespaces]
              Microsoft Developer Network, "Naming Files, Paths, and
              Namespaces", June 2013.

Appendix A.  Example URIs

   The syntax in Section 2 is intended to support file URIs that take
   the following forms:

   Local files:

   o  "file:///path/to/file"

         A traditional file URI for a local file, with an empty
         authority.  This is the most common format in use today.

   o  "file:/path/to/file"

         The minimal representation of a local file, with no authority
         field and an absolute path that begins with a slash "/".

   Non-local files:

   o  "file://host.example.com/path/to/file"

         The representation of a non-local file, with an explicit
         authority.  Note that, unlike in [RFC1738], the string
         "localhost" in the authority signifies a non-local file.

Appendix B.  System-specific Operations

   This section provides examples of some system-specific operations.

   This section is not normative.  [FIXME: it's also incomplete]

B.1.  POSIX Systems

   o  No special considerations (file URIs are based on UNIX paths)

B.2.  DOS- and Windows-Like Systems

   When mapping a DOS- or Windows-like file path to a URI, use the drive
   letter (e.g. "c:") as the first path segment.

   Some implementations leave the leading slash off before the drive
   letter.  See Appendix C.1.

   Some implementations replace ":" with "|", while others leave it off
   completely.  See Appendix C.1.2.

B.3.  Mac OS X Systems

   o  HFS+ uses non-standard UTF-8 [STD63] form (like NFD)

      *  take care transforming <-> NFC [UTR15]

B.4.  OpenVMS Files-11 Systems

   When mapping a VMS file path to a file URI, use the device name as
   the first path segment.  Note that the dollar sign "$" is a reserved
   character ([RFC3986], Section 2.2), so should be percent-encoded.

   If the VMS file path includes a node reference, use that as the
   authority.  Where the original node reference includes a username and
   password in an access control string, they can be transcribed into
   the userinfo field of the authority ([RFC3986], Section 3.2.1),
   security considerations (Section 5) notwithstanding.

Appendix C.  Nonstandard Syntax Variations

   These variations may be encountered for historical reasons, but are
   not supported by the normative syntax of Section 2.

   This section is not normative.

C.1.  DOS and Windows Drive Letters

   On Windows- or DOS-based file systems an absolute file path can begin
   with a drive letter.
   To facilitate this, the "local-path" rule in
   Section 2 can be replaced with the following:

      local-path     = [ drive-letter ] path-absolute

      drive-letter   = ALPHA ":"

   This is intended to support URIs of the form:

   o  "file:c:/path/to/file"

         The minimal representation of a local file in a DOS- or
         Windows-based environment, with no authority field and an
         absolute path that begins with a drive letter.

   Note that comparison of drive letters in DOS or Windows file paths is
   case-insensitive.  Some implementations therefore canonicalize drive
   letters in file URIs by converting them to uppercase.

C.1.1.  Relative Paths

   In DOS- or Windows-based file systems, relative paths beginning with
   a slash "/" should be resolved relative to the drive letter, and
   resolution of ".." dot segments (per Section 5.2.4 of [RFC3986])
   should not ever overwrite the drive letter.

   e.g.:

      base:       file:///c:/path/to/file.txt
      rel. URI:   /some/other/thing.bmp
      resolved:   file:///c:/some/other/thing.bmp

      base:       file:///c:/foo.txt
      rel. URI:   ../../bar.txt
      resolved:   file:///c:/bar.txt

   Relative paths with a drive letter followed by a character other than
   a slash (e.g. "c:bar/baz.txt" or "c:../foo.txt") should not be
   accepted as valid URIs in DOS or Windows systems.

C.1.2.  Vertical Bar Character

   Historically some implementations have used a vertical line character
   "|" instead of a colon ":" in the drive letter construct.  [RFC3986]
   forbids the use of the vertical line, however it may be necessary to
   interpret or update old URIs.
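
   As a non-normative illustration, updating such an old URI amounts to
   rewriting the vertical line in the drive letter construct to a colon.
   The sketch below is one possible way to do that in Python; the helper
   name "update_legacy_drive" and the regular expression are assumptions
   made for this example and are not defined by this document.  It only
   touches a drive letter that appears at the very start of the path.

```python
import re

# Hypothetical helper, for illustration only (not part of this document).
# Matches "file:", an optional "//authority", an optional leading "/",
# then a single drive letter followed by "|" and a "/".
_LEGACY_DRIVE = re.compile(r"^(file:(?://[^/]*)?/?)([A-Za-z])\|(?=/)")


def update_legacy_drive(uri: str) -> str:
    """Replace a legacy "c|" drive letter construct with "c:".

    URIs without the legacy construct are returned unchanged.
    """
    return _LEGACY_DRIVE.sub(r"\1\2:", uri, count=1)


if __name__ == "__main__":
    for old in ("file:///c|/path/to/file",
                "file:/c|/path/to/file",
                "file:c|/path/to/file"):
        print(old, "->", update_legacy_drive(old))
```

   The pattern is anchored at the start of the URI so that a vertical
   line appearing later in the path is left alone.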

   For interpreting such URIs, the "auth-path" and "local-path" rules in
   Section 2 and the "drive-letter" rule above are replaced with the
   following:

      auth-path      = [ file-auth ] path-absolute
                     / [ file-auth ] file-absolute

      local-path     = [ drive-letter ] path-absolute
                     / file-absolute

      file-absolute  = "/" drive-letter path-absolute

      drive-letter   = ALPHA ":"
                     / ALPHA "|"

   This is intended to support URIs of the form:

   o  "file:///c|/path/to/file"

   o  "file:/c|/path/to/file"

   o  "file:c|/path/to/file"

         Regular DOS or Windows file URIs, with vertical line characters
         in the drive letter construct.

   To update such an old URI, replace the vertical line "|" with a colon
   ":".

C.2.  UNC Paths

   It is common to encounter file URIs that encode entire UNC strings in
   the path, with all backslash "\" characters replaced with slashes
   "/".

   To interpret such URIs, the "auth-path" rule in Section 2 is replaced
   with the following:

      auth-path      = [ file-auth ] path-absolute
                     / unc-authority path-absolute

      unc-authority  = 2*3"/" [ userinfo "@" ] file-host

      file-host      = inline-IP / IPv4address / reg-name

      inline-IP      = "%5B" ( IPv6address / IPvFuture ) "%5D"

   This syntax uses the "userinfo", "IPv4address", "IPv6address",
   "IPvFuture", and "reg-name" rules from [RFC3986].

      Note that the "file-host" rule is the same as "host" but with
      percent-encoding applied to "[" and "]" characters.

   This extended syntax is intended to support URIs that take the
   following forms, in addition to those in Appendix A:

   Non-local files:

   o  "file:////host.example.com/path/to/file"

         The "traditional" representation of a non-local file, with an
         empty authority and a complete (transformed) UNC string in the
         path.

   o  "file://///host.example.com/path/to/file"

         As above, with an extra slash between the empty authority and
         the transformed UNC string, conformant with the definition from
         [RFC1738].  This representation is notably used by the Firefox
         web browser.  See Bugzilla#107540 [Bug107540].

   It also further limits the set of file URIs that can be translated to
   a local file path to those with a path that does not encode a UNC
   string.

C.3.  Backslash as Separator

   Historically some implementations have copied entire file paths into
   the path components of file URIs.  Where DOS or Windows file paths
   were copied thus, resulting URI strings contained unencoded backslash
   "\" characters, which are forbidden by both [RFC1738] and [RFC3986].

   It may be possible to translate or update such an invalid file URI by
   replacing all backslashes "\" with slashes "/", if it can be
   determined with reasonable certainty that the backslashes are
   intended as path separators.

Appendix D.  Example of IRI vs Percent-Encoded URI

   The following examples demonstrate the advantage of encoding file
   URIs as IRIs to avoid ambiguity (see Section 4).

   Example: file IRI:

   |  Bytes of file IRI in a UTF-8 document:
   |     66 69 6c 65 3a 43 3a 2f 72 65 c3 a7 75 2e 74 78 74
   |     f  i  l  e  :  C  :  /  r  e  ( c ) u  .  t  x  t
   |
   |  Interpretation:
   |     A file named "recu.txt" with a cedilla on the "c", in the
   |     directory "C:\" of a DOS or Windows file system.
   |
   |  Character value sequences of file paths, for various file system
   |  encodings:
   |
   |  o  UTF-16 (e.g. NTFS):
   |        0043 003a 005c 0072 0065 00e7 0075 002e 0074 0078 0074
   |
   |  o  Codepage 437 (e.g.
MS-DOS):
   |        43 3a 5c 72 65 87 75 2e 74 78 74

   Counter-example: ambiguous file URI:

   |  Percent-encoded file URI, in any ASCII-compatible document:
   |     "file:///%E3%81%A1"
   |
   |  Possible interpretations of the file name, depending on the
   |  (unknown) encoding of the file system:
   |
   |  o  UTF-8:
   |
   |
   |  o  Codepage 437:
   |        +
   |        +
   |
   |
   |  o  EBCDIC:
   |        "Ta~"
   |
   |  etc.

Appendix E.  UNC Syntax

   The UNC filespace selector string is a null-terminated sequence of
   characters from the Universal Character Set [ISO10646].

   The syntax of a UNC filespace selector string, as defined by
   [MS-DTYP], is given here in Augmented Backus-Naur Form (ABNF)
   [RFC5234] for convenience.  Note that this definition is informative
   only; the normative description is in [MS-DTYP].

      UNC = "\\" hostname "\" sharename *( "\" objectname )
      hostname   = netbios-name / fqdn / ip-address
      sharename  =
      objectname =

   o  "netbios-name" from [MS-NBTE], Section 2.2.1.

   o  "fqdn" from [RFC1035] or [RFC1123]

   o  "ip-address" from Section 2.1 of [RFC1123], or Section 2.2 of
      [RFC4291].

   The precise format of "sharename" depends on the protocol; see: SMB
   [MS-SMB], NFS [RFC3530], NCP [NOVELL].

Appendix F.  Collected Rules

   Here are the collected syntax rules for all optional appendices,
   presented for convenience.

      file-URI       = file-scheme ":" file-hier-part

      file-scheme    = "file"

      file-hier-part = "//" auth-path
                     / local-path

      auth-path      = [ file-auth ] path-absolute
                     / [ file-auth ] file-absolute
                     / unc-authority path-absolute

      local-path     = [ drive-letter ] path-absolute
                     / file-absolute

      file-auth      = [ userinfo "@" ] host

      unc-authority  = 2*3"/" [ userinfo "@" ] file-host

      file-host      = inline-IP / IPv4address / reg-name

      inline-IP      = "%5B" ( IPv6address / IPvFuture ) "%5D"

      file-absolute  = "/" drive-letter path-absolute

      drive-letter   = ALPHA ":"
                     / ALPHA "|"

   This collected syntax is intended to support file URIs that take the
   following forms:

   Local files:

   o  "file:///path/to/file"

         A traditional file URI for a local file, with an empty
         authority.

   o  "file:/path/to/file"

         The minimal representation of a local file, with no authority
         field and an absolute path that begins with a slash "/".

   o  "file:c:/path/to/file"

         The minimal representation of a local file in a DOS- or
         Windows-based environment, with no authority field and an
         absolute path that begins with a drive letter.

   o  "file:///c|/path/to/file"

   o  "file:/c|/path/to/file"

   o  "file:c|/path/to/file"

         Regular DOS or Windows file URIs, with vertical line characters
         in the drive letter construct.

   Non-local files:

   o  "file://host.example.com/path/to/file"

         The representation of a non-local file, with an explicit
         authority.

   o  "file:////host.example.com/path/to/file"

         The "traditional" representation of a non-local file, with an
         empty authority and a complete (transformed) UNC string in the
         path.

   o  "file://///host.example.com/path/to/file"

         As above, with an extra slash between the empty authority and
         the transformed UNC string.
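
   The forms listed above can be told apart largely by the number of
   slashes that follow the scheme and by the presence of a drive letter
   construct.  The following Python sketch is a non-normative
   illustration of that observation; the function name
   "classify_file_uri" and the labels it returns are assumptions made
   for this example.  It is not a validator: it does not check
   userinfo, host, or path characters against the ABNF, and it treats
   "file://c:/..." as having an authority of "c:".

```python
import re

# Hypothetical names for illustration; nothing here is defined by this
# document.  A drive letter construct is ALPHA followed by ":" or "|",
# and in the forms above it is always followed by a slash.
_DRIVE = re.compile(r"^[A-Za-z][:|]/")


def classify_file_uri(uri: str) -> str:
    """Label a file URI with the listed form it most resembles."""
    if not uri.startswith("file:"):
        raise ValueError("not a file URI")
    rest = uri[len("file:"):]
    body = rest.lstrip("/")
    slashes = len(rest) - len(body)  # slashes directly after "file:"

    if slashes in (4, 5):
        # Empty authority followed by a transformed UNC string.
        return "non-local (UNC string in path)"
    if slashes == 2 and body:
        # "//" followed directly by a non-empty authority.
        return "non-local (explicit authority)"
    if slashes in (0, 1, 3):
        if _DRIVE.match(body):
            return "local (drive letter)"
        if slashes:
            # "file:/path" or "file:///path".
            return "local"
    raise ValueError("unrecognised form")


if __name__ == "__main__":
    for example in ("file:///path/to/file",
                    "file:c|/path/to/file",
                    "file://host.example.com/path/to/file",
                    "file:////host.example.com/path/to/file"):
        print(example, "->", classify_file_uri(example))
```

   Counting slashes keeps the sketch short, at the cost of accepting
   some strings that the collected ABNF would reject.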

Author's Address

   Matthew Kerwin
   Queensland University of Technology
   Victoria Park Road
   Kelvin Grove, QLD  4059
   Australia

   Email: matthew.kerwin@qut.edu.au