idnits 2.17.1

draft-ietf-appsawg-file-scheme-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC1738, but the
     abstract doesn't seem to directly say this. It does mention RFC1738
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 12, 2016) is 2936 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'

  -- Obsolete informational reference (is this intentional?): RFC 1738
     (Obsoleted by RFC 4248, RFC 4266)

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Applications Area Working Group                                M. Kerwin
Internet-Draft                                                       QUT
Obsoletes: 1738 (if approved)                             April 12, 2016
Intended status: Standards Track
Expires: October 14, 2016

                          The file URI Scheme
                   draft-ietf-appsawg-file-scheme-06

Abstract

   This document specifies the "file" Uniform Resource Identifier (URI)
   scheme, obsoleting the definition in RFC 1738.

   It attempts to define a common core which is intended to interoperate
   across the broad spectrum of existing implementations, while at the
   same time documenting other current practices.

   *Note to Readers (To be removed by the RFC Editor)*

   This draft should be discussed on the IETF Applications Area Working
   Group discussion list.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 14, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. History
     1.2. Similar Technologies
     1.3. Notational Conventions
   2. Syntax
   3. Operations on file URIs
     3.1. Translating Local File Path to file URI
     3.2. Translating Non-local File Path to file URI
     3.3. Incompatible File Paths
       3.3.1. Win32 Namespaces
   4. Encoding
   5. Origins
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
     9.1. Normative References
     9.2. Informative References
   Appendix A. Example URIs
   Appendix B. System-specific Operations
     B.1. POSIX Systems
     B.2. DOS- and Windows-Like Systems
     B.3. Mac OS X Systems
     B.4. OpenVMS Files-11 Systems
   Appendix C. Nonstandard Syntax Variations
     C.1. DOS and Windows Drive Letters
       C.1.1. Relative Paths
       C.1.2. Vertical Bar Character
     C.2. UNC Strings
     C.3. UNC Paths
     C.4. Backslash as Separator
   Appendix D. Example of IRI vs Percent-Encoded URI
   Appendix E. UNC Syntax
   Appendix F. Collected Rules
   Author's Address

1. Introduction

   A file URI identifies a file on a particular file system. It can be
   used in discussions about the file, and if other conditions are met
   it can be dereferenced to directly access the file.

   The file URI scheme is not coupled with a specific protocol, nor with
   a specific media type. See Section 3 for a discussion of operations
   that can be performed on a file URI.

   This document defines a syntax that is compatible with most extant
   implementations, while attempting to push towards a stricter subset
   of "ideal" constructs. In many cases it simultaneously acknowledges
   and deprecates some less common or outdated constructs.

1.1. History

   The file URI scheme was first defined in [RFC1630], which, being an
   informational RFC, does not specify an Internet standard. The
   definition was standardised in [RFC1738], and the scheme was
   registered with the Internet Assigned Numbers Authority (IANA);
   however, that definition omitted certain language included by the
   former that clarified aspects such as:

   o  the use of slashes to denote boundaries between directory levels
      of a hierarchical file system; and

   o  the requirement that client software convert the file URI into a
      file name in the local file name conventions.
   The Internet draft [I-D.hoffman-file-uri] was written in an effort to
   keep the file URI scheme on standards track when [RFC1738] was made
   obsolete, but that draft expired in 2005. It enumerated concerns
   arising from the various, often conflicting, implementations of the
   scheme. It serves as the spiritual predecessor of this document.

   Additionally, the WHATWG defines a living URL standard [WHATWG-URL],
   which includes algorithms for interpreting file URIs (as URLs).

1.2. Similar Technologies

   The Universal Naming Convention (UNC) [MS-DTYP] defines a string
   format that can perform a similar role to the file URI scheme in
   describing the location of files. A UNC filespace selector string
   has three parts: host, share, and path; see Appendix E. This
   document describes, but does not specify, a means of translating
   between UNC filespace selector strings and file URIs in Appendix C.2.

1.3. Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Throughout this document the term "local" is used to describe files
   that can be accessed directly through the local file system. It is
   important to note that a local file might not be physically located
   on the local machine, for example when a networked file system is
   transparently mounted into the local file system.

2. Syntax

   The file URI syntax is defined here in Augmented Backus-Naur Form
   (ABNF) [RFC5234], including the core ABNF syntax rule "ALPHA" defined
   by that specification, and importing the "userinfo", "host" and
   "path-absolute" rules from [RFC3986] (as updated by [RFC6874]).

   The core syntax in [RFC3986] includes "path" and "authority"
   components, for each of which only a subset is used in the definition
   of the file URI scheme.
   The relevant subset of "path" is "path-absolute", and the subset of
   "authority" is "file-auth", given below.

   Note that Appendix C lists some other commonly seen but nonstandard
   variations.

      file-URI       = file-scheme ":" file-hier-part

      file-scheme    = "file"

      file-hier-part = "//" auth-path
                     / local-path

      auth-path      = [ file-auth ] path-absolute

      local-path     = path-absolute

      file-auth      = [ userinfo "@" ] host

   The syntax definition above is different from those given in
   [RFC1630] and [RFC1738] as it is derived from the generic syntax of
   [RFC3986], which post-dates all previous specifications.

   As a special case, the "file-auth" rule can match the string
   "localhost" or the empty string; either value is interpreted as "the
   machine from which the URI is being interpreted," exactly as if no
   authority were present. To maximise compatibility with previous
   specifications, implementations MAY choose to include an empty
   "file-auth".

   Systems exhibit different levels of case-sensitivity. Unless the
   file system is known to be case-insensitive, implementations MUST
   maintain the case of file and directory names when translating file
   URIs to and from the local system's representation of file paths, and
   any systems or devices that transport file URIs MUST NOT alter the
   case of file URIs they transport.

3. Operations on file URIs

   Implementations that provide dereferencing operations on file URIs
   SHOULD, at a minimum, provide a read-like operation to return the
   contents of a file located by a file URI. Additional operations MAY
   be provided, such as writing to, creating, and deleting files. See
   the POSIX file and directory operations [POSIX] for examples of
   standardized operations that can be performed on files.

   File URIs can also be translated to and from other, similar
   constructs, such as local file paths or UNC strings.
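   The following Python sketch shows both directions of such a translation on a POSIX-like system. It is a non-normative illustration that rests on several assumptions. The function names path_to_file_uri and file_uri_to_path and the PCHAR_SAFE constant are hypothetical. The input path is assumed to be already absolute and resolved, which is step 1 of the algorithm in Section 3.1. Names are NFC-normalized and UTF-8 percent-encoded, as Section 4 suggests. A URI is treated as local only when its authority is absent, empty, or "localhost". The sketch does not handle queries, fragments, or the nonstandard forms of Appendix C.

```python
import unicodedata
from urllib.parse import quote, unquote, urlsplit

# Characters permitted unencoded in an RFC 3986 path segment ("pchar")
# in addition to the unreserved set, which quote() never encodes.
PCHAR_SAFE = "!$&'()*+,;=:@"


def path_to_file_uri(path: str, empty_authority: bool = True) -> str:
    """Hypothetical helper: absolute POSIX path -> file URI (cf. Section 3.1)."""
    # Step 1 is assumed done by the caller: the path is absolute and resolved.
    if not path.startswith("/"):
        raise ValueError("expected an absolute, already-resolved path")
    names = path[1:].split("/")
    # An empty name before the last one means "//", which path-absolute forbids.
    if any(name == "" for name in names[:-1]):
        raise ValueError("collapse repeated slashes before translating")
    # Steps 5 and 6: NFC-normalize each name, then UTF-8 percent-encode it.
    segments = [quote(unicodedata.normalize("NFC", name), safe=PCHAR_SAFE)
                for name in names]
    # Steps 2 to 4: scheme, optional empty authority, then the root slash.
    prefix = "file://" if empty_authority else "file:"
    return prefix + "/" + "/".join(segments)


def file_uri_to_path(uri: str) -> str:
    """Hypothetical helper: local file URI -> POSIX path."""
    parts = urlsplit(uri)  # any query or fragment is ignored
    if parts.scheme != "file":
        raise ValueError("not a file URI")
    # Only a blank or absent authority, or "localhost", counts as local.
    if parts.netloc.lower() not in ("", "localhost"):
        raise ValueError("non-local file URI: " + parts.netloc)
    # Section 2 requires path-absolute; "//..." paths are the UNC forms of C.3.
    if not parts.path.startswith("/") or parts.path.startswith("//"):
        raise ValueError("path is not an absolute path")
    names = [unquote(seg, errors="strict") for seg in parts.path.split("/")]
    # A decoded "/" or NUL would change the path's structure, so refuse it.
    if any("/" in name or "\x00" in name for name in names):
        raise ValueError("encoded '/' or NUL inside a path segment")
    return "/".join(names)
```

   The sketch rejects a percent-encoded "/" or NUL instead of decoding it. This is a deliberately conservative choice: decoding either one would change the segment boundaries, which is the kind of special-character hazard described in Section 6.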
   A file URI can be dependably dereferenced or translated to a local
   file path only if it is local. A file URI is considered "local" if
   it has a blank or no authority, or the authority is the special
   string "localhost".

   This specification neither defines nor forbids a mechanism for
   accessing non-local files. See SMB [MS-SMB], NFS [RFC7530], and NCP
   [NOVELL] for examples of protocols that can be used to access files
   over a network. Also see Appendix C.2 for a non-normative discussion
   on translating non-local file URIs to and from UNC strings.

3.1. Translating Local File Path to file URI

   Below is an algorithmic description of the process used to convert a
   file path to a URI; see Section 4 for encoding considerations.

   1. Resolve the file path to its fully qualified absolute form.

   2. Initialise the URI with the "file:" scheme identifier.

   3. If including an empty authority field, append the "//" sigil to
      the URI.

   4. Append a slash character "/" to the URI, to signify the path
      root.

   5. For each directory in the path after the root:

      1. Transform the directory name to a path segment ([RFC3986],
         Section 3.3) as per Section 2 of [RFC3986].

      2. Append the transformed segment and a delimiting slash
         character "/" to the URI.

   6. If the path includes a file name:

      1. Transform the file name to a path segment as above.

      2. Append the transformed segment to the URI.

   This algorithm is intentionally general and may not cover some
   system-specific edge cases. See Appendix B for brief discussions on
   system-specific considerations for some systems.

   *Differences from RFC 1738*

   In [RFC1738] a file URL always started with the token "file://",
   followed by an (optionally blank) authority and a "/". That "/" was
   not considered part of the path. This implies that the correct
This implies that the correct 262 encoding for a file path in a UNIX-like environment would have been: 264 token + authority + slash + path 265 = "file://" + "" + "/" + "/path/to/file.txt" 266 = "file:////path/to/file.txt" 268 However that construct was never observed in practice, and in fact 269 would have collided with the eventual encoding of UNC strings in URIs 270 described non-normatively in Appendix C.3. 272 3.2. Translating Non-local File Path to file URI 274 Translating a non-local file path, including a UNC string, to a file 275 URI follows the same basic algorithm as for local files, above, 276 except that the authority MUST refer to the network-accesible node 277 that hosts the file. 279 For example, in a clustered OpenVMS Files-11 system the authority 280 would contain the node name. Where the original node reference 281 includes a username and password in an access control string, they 282 MAY be transcribed into the userinfo field of the authority 283 ([RFC3986], Section 3.2.1), security considerations (Section 6) 284 notwithstanding. 286 See Appendix C.2 for an explicit (but non-normative and strictly 287 optional) handling of UNC strings. 289 3.3. Incompatible File Paths 291 Some conventional file path formats are known to be incompatible with 292 the file URI scheme. 294 3.3.1. Win32 Namespaces 296 The Microsoft Windows API defines Win32 Namespaces [Win32-Namespaces] 297 for interacting with files and devices using Windows API functions. 298 These namespaced paths are prefixed by "\\?\" for Win32 File 299 Namespaces and "\\.\" for Win32 Device Namespaces. There is also a 300 special case for UNC file paths in Win32 File Namespaces, referred to 301 as "Long UNC", using the prefix "\\?\UNC\". 303 This specification does not define a mechanism for translating 304 namespaced paths to or from file URIs. 306 4. 
Encoding 308 To avoid ambiguity, a file URI SHOULD be transported as an 309 Internationalized Resource Identifier (IRI) [RFC3987], or as a URI 310 with non-ASCII characters encoded according to the UTF-8 character 311 encoding [STD63] and percent-encoded as needed ([RFC3986], 312 Section 2.5). 314 The encoding of a file URI depends on the file system that stores the 315 identified file. If the file system uses a known non-Unicode 316 character encoding, the path SHOULD be converted to a sequence of 317 characters from the Universal Character Set [ISO10646] normalized 318 according to Normalization Form C (NFC) [UTR15], before being 319 translated to a file URI, and conversely a file URI SHOULD be 320 converted back to the file system's native encoding when 321 dereferencing or translating to a file path. 323 Note that many modern file systems encode directory and file names 324 as arbitrary sequences of octets. In those cases, the 325 representation as an encoded string often depends on the user's 326 localization settings, or defaults to UTF-8 [STD63]. 328 When the file system's encoding is not known the file URI SHOULD be 329 transported as an Internationalized Resource Identifier (IRI) 330 [RFC3987] to avoid ambiguity. See Appendix D for examples. 332 5. Origins 334 As per [RFC6454], Section 4, when determining the origin of a file 335 URI implementations MAY return an implementation-defined value. 337 Historically, user agents have granted content from the file URI 338 scheme a tremendous amount of privilege. However, granting all local 339 files such wide privileges can lead to privilege escalation attacks. 340 Some user agents have had success granting local files directory- 341 based privileges, but this approach has not been widely adopted. 342 Other user agents use globally unique identifiers for each file URI, 343 which is the most secure option. 345 6. 
Security Considerations 347 There are many security considerations for URI schemes discussed in 348 [RFC3986]. 350 File access and the granting of privileges for specific operations 351 are complex topics, and the use of file URIs can complicate the 352 security model in effect for file privileges. Software using file 353 URIs MUST NOT grant greater access than would be available for other 354 file access methods. 356 File systems typically assign an operational meaning to special 357 characters, such as the "/", "\", ":", "[", and "]" characters, and 358 to special device names like ".", "..", "...", "aux", "lpt", etc. In 359 some cases, merely testing for the existence of such a name will 360 cause the operating system to pause or invoke unrelated system calls, 361 leading to significant security concerns regarding denial of service 362 and unintended data transfer. It would be impossible for this 363 specification to list all such significant characters and device 364 names. Implementers MUST research the reserved names and characters 365 for the types of storage device that may be attached to their 366 application and restrict the use of data obtained from URI components 367 accordingly. 369 Additionally, as discussed in the HP OpenVMS Systems Documentation 370 371 "access control strings include sufficient information to allow 372 someone to break in to the remote account, [therefore] they create 373 serious security exposure." In a similar vein, the presence of a 374 password in a "user:password" userinfo field is deprecated by 375 [RFC3986]. As such, the userinfo field of a file URI, if present, 376 MUST NOT contain a password. 378 7. IANA Considerations 380 This document defines the following URI scheme, so the "Permanent URI 381 Schemes" registry has been updated accordingly. This registration 382 complies with [BCP35]. 
   Scheme name:
      file

   Status:
      permanent

   Applications/protocols that use this scheme name:
      Commonly used in hypertext documents to refer to files without
      depending on network access. Supported by major browsers.

      Windows API (PathCreateFromUrl, UrlCreateFromPath).

      Perl LWP.

   Contact:
      Matthew Kerwin

   Change Controller:
      This scheme is registered under the IETF tree. As such, the IETF
      maintains change control.

8. Acknowledgements

   This specification is derived from [RFC1738], [RFC3986], and
   [I-D.hoffman-file-uri] (expired); the acknowledgements in those
   documents still apply.

   Additional thanks to Dave Risney, author of the informative IE Blog
   article, and Dave Thaler for their comments and suggestions.

9. References

9.1. Normative References

   [BCP35]    Thaler, D., Ed., Hansen, T., and T. Hardie, "Guidelines
              and Registration Procedures for URI Schemes", BCP 35,
              RFC 7595, DOI 10.17487/RFC7595, June 2015.

   [ISO10646]
              International Organization for Standardization,
              "Information Technology - Universal Multiple-Octet Coded
              Character Set (UCS)", ISO/IEC 10646:2003, December 2003.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987.

   [RFC1123]  Braden, R., Ed., "Requirements for Internet Hosts -
              Application and Support", STD 3, RFC 1123,
              DOI 10.17487/RFC1123, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
              January 2005.
   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC6874]  Carpenter, B., Cheshire, S., and R. Hinden, "Representing
              IPv6 Zone Identifiers in Address Literals and Uniform
              Resource Identifiers", RFC 6874, DOI 10.17487/RFC6874,
              February 2013.

   [UTR15]    Davis, M. and K. Whistler, "Unicode Normalization Forms",
              August 2012.

9.2. Informative References

   [Bug107540]
              Bugzilla@Mozilla, "Bug 107540", October 2007.

   [I-D.hoffman-file-uri]
              Hoffman, P., "The file URI Scheme",
              draft-hoffman-file-uri-03 (work in progress), January 2005.

   [MS-DTYP]  Microsoft Open Specifications, "Windows Data Types, 2.2.56
              UNC", January 2013.

   [MS-NBTE]  Microsoft Open Specifications, "NetBIOS over TCP (NBT)
              Extensions", May 2014.

   [MS-SMB]   Microsoft Open Specifications, "Server Message Block (SMB)
              Protocol", January 2013.

   [NOVELL]   Novell, "NetWare Core Protocols", 2013.

   [POSIX]    IEEE, "IEEE Std 1003.1, 2013 Edition", 2013.

   [RFC1630]  Berners-Lee, T., "Universal Resource Identifiers in WWW: A
              Unifying Syntax for the Expression of Names and Addresses
              of Objects on the Network as used in the World-Wide Web",
              RFC 1630, DOI 10.17487/RFC1630, June 1994.

   [RFC1738]  Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
              Resource Locators (URL)", RFC 1738, DOI 10.17487/RFC1738,
              December 1994.

   [RFC6454]  Barth, A., "The Web Origin Concept", RFC 6454,
              DOI 10.17487/RFC6454, December 2011.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015.
   [STD63]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [WHATWG-URL]
              WHATWG, "URL Living Standard", May 2013.

   [Win32-Namespaces]
              Microsoft Developer Network, "Naming Files, Paths, and
              Namespaces", June 2013.

Appendix A. Example URIs

   The syntax in Section 2 is intended to support file URIs that take
   the following forms:

   Local files:

   o  A traditional file URI for a local file, with an empty authority.
      This is the most common format in use today. E.g.:

      *  "file:///path/to/file"

   o  The minimal representation of a local file, with no authority
      field and an absolute path that begins with a slash "/". E.g.:

      *  "file:/path/to/file"

   Non-local files:

   o  A non-local file, with an explicit authority. E.g.:

      *  "file://host.example.com/path/to/file"

Appendix B. System-specific Operations

   This appendix is not normative; it highlights some observed
   behaviours and provides system-specific guidance for interacting with
   file URIs and paths.

B.1. POSIX Systems

   There is little to say about POSIX file systems; the file URI
   structure already closely resembles POSIX file paths.

B.2. DOS- and Windows-Like Systems

   When mapping a DOS- or Windows-like file path to a file URI,
   implementations typically map the drive letter (e.g. "c:") into the
   first path segment.

   See Appendix C for some explicit (but non-normative and strictly
   optional) rules for interacting with DOS- or Windows-like file paths
   and URIs.

B.3. Mac OS X Systems

   The HFS+ file system uses a non-standard normalization form, similar
   to Normalization Form D. Take care when transforming HFS+ file paths
   to and from URIs using Normalization Form C (Section 4).

B.4. OpenVMS Files-11 Systems

   When mapping a VMS file path to a file URI, map the device name into
   the first path segment.
   Note that the dollar sign "$" is a reserved
   character per the definition in [RFC3986], Section 2.2, so it should
   be percent-encoded if present in the device name.

   If the VMS file path includes a node reference, use that as the
   authority. Where the original node reference includes a username and
   password in an access control string, they can be transcribed into
   the userinfo field of the authority ([RFC3986], Section 3.2.1),
   security considerations (Section 6) notwithstanding.

Appendix C. Nonstandard Syntax Variations

   These variations may be encountered for historical reasons, but are
   not supported by the normative syntax of Section 2.

   This appendix is not normative.

C.1. DOS and Windows Drive Letters

   On Windows- or DOS-based file systems an absolute file path can begin
   with a drive letter. To facilitate this, the "local-path" rule in
   Section 2 can be replaced with the following:

      local-path     = [ drive-letter ] path-absolute

      drive-letter   = ALPHA ":"

   This is intended to support the minimal representation of a local
   file in a DOS- or Windows-based environment, with no authority field
   and an absolute path that begins with a drive letter. E.g.:

   o  "file:c:/path/to/file"

   URIs of the form "file:///c:/path/to/file" are already supported by
   the "path-absolute" rule.

   Note that comparison of drive letters in DOS or Windows file paths is
   case-insensitive. Some implementations therefore canonicalize drive
   letters in file URIs by converting them to uppercase.

C.1.1. Relative Paths

   In DOS- or Windows-based file systems, relative paths beginning with
   a slash "/" should be resolved relative to the drive letter, and
   resolution of ".." dot segments (per Section 5.2.4 of [RFC3986])
   should never overwrite the drive letter.

   E.g.:

      base:     file:///c:/path/to/file.txt
      rel. URI: /some/other/thing.bmp
      resolved: file:///c:/some/other/thing.bmp

      base:     file:///c:/foo.txt
      rel. URI: ../../bar.txt
      resolved: file:///c:/bar.txt

   Relative paths with a drive letter followed by a character other than
   a slash (e.g. "c:bar/baz.txt" or "c:../foo.txt") should not be
   accepted as dereferenceable URIs in DOS or Windows systems.

C.1.2. Vertical Bar Character

   Historically, some implementations have used a vertical line
   character "|" instead of a colon ":" in the drive letter construct.
   [RFC3986] forbids the use of the vertical line; however, it may be
   necessary to interpret or update old URIs.

   For interpreting such URIs, the "auth-path" and "local-path" rules in
   Section 2 and the "drive-letter" rule above are replaced with the
   following:

      auth-path      = [ file-auth ] path-absolute
                     / [ file-auth ] file-absolute

      local-path     = [ drive-letter ] path-absolute
                     / file-absolute

      file-absolute  = "/" drive-letter path-absolute

      drive-letter   = ALPHA ":"
                     / ALPHA "|"

   This is intended to support regular DOS or Windows file URIs with
   vertical line characters in the drive letter construct. E.g.:

   o  "file:///c|/path/to/file"

   o  "file:/c|/path/to/file"

   o  "file:c|/path/to/file"

   To update such an old URI, replace the vertical line "|" with a colon
   ":".

C.2. UNC Strings

   A UNC filespace selector string can be directly translated to a URI;
   see Section 4. The following is an algorithmic description of the
   process of translating a UNC string to a file URI:

   1. Initialise the URI with the "file:" scheme identifier.

   2. Append the authority:

      1. Append the "//" authority sigil to the URI.

      2. Append the hostname field of the UNC string to the URI.

   3. Append the sharename:

      1. Transform the sharename to a path segment ([RFC3986],
         Section 3.3) as per Section 2 of [RFC3986].

      2. Append a delimiting slash character "/" and the transformed
         segment to the URI.

   4. For each objectname:

      1. Transform the objectname to a path segment ([RFC3986],
         Section 3.3) as per Section 2 of [RFC3986].

      2. Append a delimiting slash character "/" and the transformed
         segment to the URI.

   Example:

      UNC String:   \\host.example.com\Share\path\to\file.txt
      URI:          file://host.example.com/Share/path/to/file.txt

C.3. UNC Paths

   It is common to encounter file URIs that encode entire UNC strings in
   the path, usually with all backslash "\" characters replaced with
   slashes "/".

   To interpret such URIs, the "auth-path" rule in Section 2 is replaced
   with the following:

      auth-path      = [ file-auth ] path-absolute
                     / unc-authority path-absolute

      unc-authority  = 2*3"/" [ userinfo "@" ] file-host

      file-host      = inline-IP / IPv4address / reg-name

      inline-IP      = "%5B" ( IPv6address / IPvFuture ) "%5D"

   This syntax uses the "userinfo", "IPv4address", "IPv6address",
   "IPvFuture", and "reg-name" rules from [RFC3986].

   Note that the "file-host" rule is the same as "host" but with
   percent-encoding applied to "[" and "]" characters.

   This extended syntax is intended to support URIs that take the
   following forms, in addition to those in Appendix A:

   Non-local files:

   o  The "traditional" representation of a non-local file, with an
      empty authority and a complete (transformed) UNC string in the
      path. E.g.:

      *  "file:////host.example.com/path/to/file"

   o  As above, with an extra slash between the empty authority and the
      transformed UNC string, conformant with the definition from
      [RFC1738]. E.g.:

      *  "file://///host.example.com/path/to/file"

      This representation is notably used by the Firefox web browser.
      See Bugzilla#107540 [Bug107540].
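   The UNC translation of Appendix C.2 and the interpretation of the UNC-path forms listed above can be sketched in Python as follows. This is a hypothetical, non-normative example with several simplifying assumptions. The names unc_to_file_uri and unc_path_uri_to_unc are invented for illustration. The hostname is copied verbatim when a URI is built and percent-decoded when a URI is read. Userinfo is rejected, and queries and fragments are ignored.

```python
from urllib.parse import quote, unquote, urlsplit

# Characters permitted unencoded in an RFC 3986 path segment in addition
# to the unreserved set, which quote() never encodes.
PCHAR_SAFE = "!$&'()*+,;=:@"


def unc_to_file_uri(unc: str) -> str:
    """Hypothetical helper: UNC string -> file URI (Appendix C.2 steps)."""
    if not unc.startswith("\\\\"):
        raise ValueError("a UNC string starts with two backslashes")
    host, _, rest = unc[2:].partition("\\")
    if not host or not rest:
        raise ValueError("a UNC string needs a hostname and a sharename")
    # Steps 3 and 4: the sharename and each objectname become one segment.
    segments = [quote(name, safe=PCHAR_SAFE) for name in rest.split("\\")]
    # Steps 1 and 2: "file:" scheme, "//" sigil, hostname copied verbatim.
    return "file://" + host + "/" + "/".join(segments)


def unc_path_uri_to_unc(uri: str) -> str:
    """Hypothetical helper: "file:////host/..." or "file://///host/..." -> UNC."""
    parts = urlsplit(uri)  # any query or fragment is ignored
    if parts.scheme != "file" or parts.netloc:
        raise ValueError("expected a file URI with an empty authority")
    # urlsplit consumed the "//" sigil; unc-authority adds 2 or 3 more slashes.
    stripped = parts.path.lstrip("/")
    if len(parts.path) - len(stripped) not in (2, 3):
        raise ValueError("path does not encode a UNC string")
    host, sep, rest = stripped.partition("/")
    if not host or not sep or "@" in host:
        raise ValueError("missing host or path, or userinfo (unsupported)")
    # Percent-decoding also turns an inline-IP "%5B...%5D" back into brackets.
    names = [unquote(name) for name in rest.split("/")]
    return "\\\\" + unquote(host) + "\\" + "\\".join(names)
```

   The two functions deal with different URI shapes. unc_to_file_uri produces the authority form of Appendix C.2 ("file://host/Share/..."), while unc_path_uri_to_unc reads only the empty-authority forms described in Appendix C.3.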
   This extended syntax also limits the set of file URIs that can be
   translated to a local file path to those whose path does not encode
   a UNC string.

C.4. Backslash as Separator

   Historically, some implementations have copied entire file paths into
   the path components of file URIs. Where DOS or Windows file paths
   were copied thus, the resulting URI strings contained unencoded
   backslash "\" characters, which are forbidden by both [RFC1738] and
   [RFC3986].

   It may be possible to translate or update such an invalid file URI by
   replacing all backslashes "\" with slashes "/", if it can be
   determined with reasonable certainty that the backslashes are
   intended as path separators.

Appendix D. Example of IRI vs Percent-Encoded URI

   The following examples demonstrate the advantage of encoding file
   URIs as IRIs to avoid ambiguity (see Section 4).

   Example: file IRI:

   | Bytes of file IRI in a UTF-8 document:
   |   66 69 6c 65 3a 43 3a 2f 72 65 c3 a7 75 2e 74 78 74
   |   f  i  l  e  :  C  :  /  r  e  ( c ) u  .  t  x  t
   |
   | Interpretation:
   |   A file named "recu.txt" with a cedilla on the "c", in the
   |   directory "C:\" of a DOS or Windows file system.
   |
   | Character value sequences of file paths, for various file system
   | encodings:
   |
   | o  UTF-16 (e.g. NTFS):
   |      0043 003a 005c 0072 0065 00e7 0075 002e 0074 0078 0074
   |
   | o  Codepage 437 (e.g. MS-DOS):
   |      43 3a 5c 72 65 87 75 2e 74 78 74

   Counter-example: ambiguous file URI:

   | Percent-encoded file URI, in any ASCII-compatible document:
   |   "file:///%E3%81%A1"
   |
   | Possible interpretations of the file name, depending on the
   | encoding of the file system:
   |
   | o  UTF-8:
   |      U+3061 HIRAGANA LETTER TI
   |
   | o  Codepage 437:
   |      U+03C0 GREEK SMALL LETTER PI +
   |      U+00FC LATIN SMALL LETTER U WITH DIAERESIS +
   |      U+00ED LATIN SMALL LETTER I WITH ACUTE
   |
   | o  EBCDIC:
   |      "Ta~"
   |
   |   etc.

Appendix E. UNC Syntax

   The UNC filespace selector string is a null-terminated sequence of
   characters from the Universal Character Set [ISO10646].

   The syntax of a UNC filespace selector string, as defined by
   [MS-DTYP], is given here in Augmented Backus-Naur Form (ABNF)
   [RFC5234] for convenience. Note that this definition is informative
   only; the normative description is in [MS-DTYP].

      UNC        = "\\" hostname "\" sharename *( "\" objectname )
      hostname   = netbios-name / fqdn / ip-address
      sharename  =
      objectname =

   o  "netbios-name" from [MS-NBTE], Section 2.2.1.

   o  "fqdn" from [RFC1035] or [RFC1123].

   o  "ip-address" from Section 2.1 of [RFC1123], or Section 2.2 of
      [RFC4291].

   The precise format of "sharename" depends on the protocol; see SMB
   [MS-SMB], NFS [RFC7530], and NCP [NOVELL].

Appendix F. Collected Rules

   Here are the collected syntax rules for all optional appendices,
   presented for convenience. This collected syntax is not normative.

      file-URI       = file-scheme ":" file-hier-part

      file-scheme    = "file"

      file-hier-part = "//" auth-path
                     / local-path

      auth-path      = [ file-auth ] path-absolute
                     / [ file-auth ] file-absolute
                     / unc-authority path-absolute

      local-path     = [ drive-letter ] path-absolute
                     / file-absolute

      file-auth      = [ userinfo "@" ] host

      unc-authority  = 2*3"/" [ userinfo "@" ] file-host

      file-host      = inline-IP / IPv4address / reg-name

      inline-IP      = "%5B" ( IPv6address / IPvFuture ) "%5D"

      file-absolute  = "/" drive-letter path-absolute

      drive-letter   = ALPHA ":"
                     / ALPHA "|"

   This collected syntax is intended to support file URIs that take the
   following forms:

   Local files:

   o  A traditional file URI for a local file, with an empty authority.
      E.g.:

      *  "file:///path/to/file"

   o  The minimal representation of a local file, with no authority
      field and an absolute path that begins with a slash "/". E.g.:

      *  "file:/path/to/file"

   o  The minimal representation of a local file in a DOS- or
      Windows-based environment, with no authority field and an absolute
      path that begins with a drive letter. E.g.:

      *  "file:c:/path/to/file"

   o  Regular DOS or Windows file URIs, with vertical line characters in
      the drive letter construct. E.g.:

      *  "file:///c|/path/to/file"

      *  "file:/c|/path/to/file"

      *  "file:c|/path/to/file"

   Non-local files:

   o  The representation of a non-local file, with an explicit
      authority. E.g.:

      *  "file://host.example.com/path/to/file"

   o  The "traditional" representation of a non-local file, with an
      empty authority and a complete (transformed) UNC string in the
      path. E.g.:

      *  "file:////host.example.com/path/to/file"

   o  As above, with an extra slash between the empty authority and the
      transformed UNC string. E.g.:

      *  "file://///host.example.com/path/to/file"

Author's Address

   Matthew Kerwin
   Queensland University of Technology
   Victoria Park Road
   Kelvin Grove, QLD 4059
   Australia

   Email: matthew.kerwin@qut.edu.au