The file URI Scheme

Queensland University of Technology
Victoria Park Road
Kelvin Grove, QLD 4059
Australia
matthew.kerwin@qut.edu.au

General
Applications Area Working Group
Internet-Draft

This document specifies the “file” Uniform Resource Identifier (URI)
scheme, obsoleting the definition in RFC 1738.

It attempts to define a common core which is intended to interoperate
across the broad spectrum of existing implementations, while at the
same time documenting other current practices.

Note to Readers (To be removed by the RFC Editor)

This draft should be discussed on the IETF Applications Area Working
Group discussion list <apps-discuss@ietf.org>.

A file URI identifies a file on a particular file system. It can be
used in discussions about the file, and if other conditions are met it
can be dereferenced to directly access the file.

The file URI scheme is not coupled with a specific protocol, nor with a
specific media type. See for a discussion of operations
that can be performed on a file URI.

This document defines a syntax that is compatible with most extant
implementations, while attempting to push towards a stricter subset of
“ideal” constructs. In many cases it simultaneously acknowledges and
deprecates some less common or outdated constructs.

The file URI scheme was first defined in , which, being an
informational RFC, does not specify an Internet standard. The
definition was standardised in , and the scheme was
registered with the Internet Assigned Numbers Authority (IANA);
however that definition omitted certain language included by former
that clarified aspects such as:

*  the use of slashes to denote boundaries between directory
   levels of a hierarchical file system; and

*  the requirement that client software convert the file URI
   into a file name in the local file name conventions.

The Internet draft was written in an
effort to keep the file URI scheme on standards track when
was made obsolete, but that draft expired in 2005. It enumerated
concerns arising from the various, often conflicting implementations
of the scheme. It serves as the spiritual predecessor of this document.

Additionally the WHATWG defines a living URL standard ,
which includes algorithms for interpreting file URIs (as URLs).

The Universal Naming Convention (UNC) defines a string
format that can perform a similar role to the file URI scheme in
describing the location of files. A UNC filespace selector string has
three parts: host, share, and path; see . This document
describes but does not specify a means of translating between UNC
filespace selector strings and file URIs in .

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
document are to be interpreted as described in .

Throughout this document the term “local” is used to describe files
that can be accessed directly through the local file system. It is
important to note that a local file may not be physically located on
the local machine, for example if a networked file system is
transparently mounted into the local file system.

The file URI syntax is defined here in Augmented Backus-Naur Form (ABNF)
, including the core ABNF syntax rule ALPHA defined by that
specification, and importing the userinfo, host, authority and
path-absolute rules from (as updated by ).

Please note that lists some other commonly seen
but nonstandard variations.

The syntax definition above is different from those given in
and as it is derived from the generic syntax
of , which post-dates all previous specifications.

As a special case, the “auth-path” rule can match the string
“localhost” or the empty string in the URI’s authority component; this
is interpreted as “the machine from which the URI is being
interpreted,” exactly as if no authority were present.

Systems exhibit different levels of case-sensitivity. Unless the file
system is known to be case-insensitive, implementations MUST maintain
the case of file and directory names when translating file URIs to and
from the local system’s representation of file paths, and any systems or
devices that transport file URIs MUST NOT alter the case of file URIs
they transport.

Implementations SHOULD, at a minimum, provide a read-like operation to
return the contents of a file located by a file URI. Additional
operations MAY be provided, such as writing to, creating, and deleting
files. See the POSIX file and directory operations for
examples of standardized operations that can be performed on files.

File URIs can also be translated to and from other, similar constructs,
such as local file paths or UNC strings.

A file URI can only be dereferenced or translated to a local file path
if it is local. A file URI is considered “local” if it has a blank or
no authority, or the authority is the special string “localhost”.

This specification neither defines nor forbids a mechanism for
accessing non-local files. See SMB , NFS , NCP
for examples of protocols that can be used to access files
over a network. Also see for a discussion on translating
non-local file URIs to and from UNC strings.

Below is an algorithmic description of the process used to convert a
file path to a URI; see .

1.  Resolve the file path to its fully qualified absolute form.

2.  Initialise the URI with the “file:” scheme identifier.

3.  If including an empty authority field, append the “//” sigil to
    the URI.

4.  Append a slash character “/” to the URI, to signify the path root.

5.  For each directory in the path after the root:

    1.  Transform the directory name to a path segment (,
        Section 3.3) as per Section 2 of .

    2.  Append the transformed segment and a delimiting slash character
        “/” to the URI.

6.  If the path includes a file name:

    1.  Transform the file name to a path segment as above.

    2.  Append the transformed segment to the URI.

Differences from RFC 1738

In RFC 1738 a file URL always started with the token “file://”,
followed by an (optionally blank) authority and a “/”. That “/” was not
considered part of the path. This implies that the correct encoding for
a file path in a UNIX-like environment would have been:

However that construct was never observed in practice, and in fact
would have collided with the eventual encoding of UNC strings in URIs
described in .

Translating a non-local file path, including a UNC string, to a file
URI follows the same basic algorithm as for local files, above, except
that the authority MUST refer to the network-accessible node that hosts
the file.

For example, in a clustered OpenVMS Files-11 system the authority
would contain the node name. Where the original node reference
includes a username and password in an access control string, they MAY
be transcribed into the userinfo field of the authority (,
Section 3.2.1), security considerations () notwithstanding.

See for an explicit handling of UNC strings.

Some conventional file path formats are known to be incompatible with
the file URI scheme.

The Microsoft Windows API defines Win32 Namespaces
for interacting with files and devices using Windows API functions.
These namespaced paths are prefixed by “\\?\” for Win32 File
Namespaces and “\\.\” for Win32 Device Namespaces. There is also a
special case for UNC file paths in Win32 File Namespaces, referred to as
“Long UNC”, using the prefix “\\?\UNC\”.

This specification does not define a mechanism for translating
namespaced paths to or from file URIs.

To avoid ambiguity, a file URI SHOULD be transported as an
Internationalized Resource Identifier (IRI) , or as a URI
with non-ASCII characters encoded according to the UTF-8 character
encoding and percent-encoded as needed (,
Section 2.5).

The encoding of a file URI depends on the file system that stores the
identified file. If the file system uses a known non-Unicode character
encoding, the path SHOULD be converted to a sequence of characters from
the Universal Character Set normalized according to
Normalization Form C (NFC) , before being translated to a
file URI, and conversely a file URI SHOULD be converted back to the
file system’s native encoding when dereferencing or translating to a
file path.

Note that many modern file systems encode directory and file names
as arbitrary sequences of octets. In those cases, the representation
as an encoded string often depends on the user’s localization
settings, or defaults to UTF-8 .

When the file system’s encoding is not known the file URI SHOULD be
transported as an Internationalized Resource Identifier (IRI)
to avoid ambiguity. See for examples.

There are many security considerations for URI schemes discussed in
.

File access and the granting of privileges for specific operations
are complex topics, and the use of file URIs can complicate the
security model in effect for file privileges. Software using file
URIs MUST NOT grant greater access than would be available for other
file access methods.

File systems typically assign an operational meaning to special
characters, such as the “/”, “\”, “:”, “[”, and “]” characters, and
to special device names like “.”, “..”, “…”, “aux”, “lpt”, etc.
In some cases, merely testing for the existence of such a name will
cause the operating system to pause or invoke unrelated system calls,
leading to significant security concerns regarding denial of service
and unintended data transfer. It would be impossible for this
specification to list all such significant characters and device names.
Implementers MUST research the reserved names and characters for the
types of storage device that may be attached to their application and
restrict the use of data obtained from URI components accordingly.

Additionally, as discussed in the HP OpenVMS Systems Documentation
<http://h71000.www7.hp.com/doc/84final/ba554_90015/ch03s09.html>
“access control strings include sufficient information to allow someone
to break in to the remote account, [therefore] they create serious
security exposure.” In a similar vein, the presence of a password in a
“user:password” userinfo field is deprecated by . As such,
the userinfo field of a file URI, if present, MUST NOT contain a
password.

This document defines the following URI scheme, so the “Permanent
URI Schemes” registry has been updated accordingly. This registration
complies with .
Scheme name: file

Status: permanent

Applications/protocols that use this scheme name:

*  Commonly used in hypertext documents to refer to files without
   depending on network access. Supported by major browsers.

*  Windows API (PathCreateFromUrl, UrlCreateFromPath).

*  Perl LWP.

Contact: Matthew Kerwin <matthew.kerwin@qut.edu.au>

Change controller: This scheme is registered under the IETF tree. As
such, the IETF maintains change control.

[RFC Editor Note: Replace XXXX with this RFC’s reference.]

This specification is derived from , , and
(expired); the acknowledgements in
those documents still apply.

Additional thanks to Dave Risney, author of the informative
IE Blog article <http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx>,
and Dave Thaler for their comments and suggestions.

Guidelines and Registration Procedures for URI Schemes

Domain names - implementation and specification
   This RFC is the revised specification of the protocol and format
   used in the implementation of the Domain Name System. It obsoletes
   RFC-883. This memo documents the details of the domain name
   client - server communication.

Requirements for Internet Hosts - Application and Support
   This RFC is an official specification for the Internet community.
   It incorporates by reference, amends, corrects, and supplements the
   primary protocol standards documents relating to hosts.
   [STANDARDS-TRACK]

Key words for use in RFCs to Indicate Requirement Levels
   In many standards track documents several words are used to signify
   the requirements in the specification. These words are often
   capitalized. This document defines these words as they should be
   interpreted in IETF documents. This document specifies an Internet
   Best Current Practices for the Internet Community, and requests
   discussion and suggestions for improvements.

Uniform Resource Identifier (URI): Generic Syntax
   A Uniform Resource Identifier (URI) is a compact sequence of
   characters that identifies an abstract or physical resource. This
   specification defines the generic URI syntax and a process for
   resolving URI references that might be in relative form, along with
   guidelines and security considerations for the use of URIs on the
   Internet. The URI syntax defines a grammar that is a superset of all
   valid URIs, allowing an implementation to parse the common
   components of a URI reference without knowing the scheme-specific
   requirements of every possible identifier. This specification does
   not define a generative grammar for URIs; that task is performed by
   the individual specifications of each URI scheme. [STANDARDS-TRACK]

Internationalized Resource Identifiers (IRIs)
   This document defines a new protocol element, the Internationalized
   Resource Identifier (IRI), as a complement of the Uniform Resource
   Identifier (URI). An IRI is a sequence of characters from the
   Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to
   URIs is defined, which means that IRIs can be used instead of URIs,
   where appropriate, to identify resources. The approach of defining a
   new protocol element was chosen instead of extending or changing the
   definition of URIs. This was done in order to allow a clear
   distinction and to avoid incompatibilities with existing software.
   Guidelines are provided for the use and deployment of IRIs in
   various protocols, formats, and software components that currently
   deal with URIs.

IP Version 6 Addressing Architecture
   This specification defines the addressing architecture of the IP
   Version 6 (IPv6) protocol. The document includes the IPv6 addressing
   model, text representations of IPv6 addresses, definition of IPv6
   unicast addresses, anycast addresses, and multicast addresses, and
   an IPv6 node's required addresses. This document obsoletes RFC 3513,
   "IP Version 6 Addressing Architecture". [STANDARDS-TRACK]

Augmented BNF for Syntax Specifications: ABNF
   Internet technical specifications often need to define a formal
   syntax. Over the years, a modified version of Backus-Naur Form
   (BNF), called Augmented BNF (ABNF), has been popular among many
   Internet specifications. The current specification documents ABNF.
   It balances compactness and simplicity with reasonable
   representational power. The differences between standard BNF and
   ABNF involve naming rules, repetition, alternatives,
   order-independence, and value ranges. This specification also
   supplies additional rule definitions and encoding for a core lexical
   analyzer of the type common to several Internet specifications.
   [STANDARDS-TRACK]

Representing IPv6 Zone Identifiers in Address Literals and Uniform
Resource Identifiers
   This document describes how the zone identifier of an IPv6 scoped
   address, defined as <zone_id> in the IPv6 Scoped Address
   Architecture (RFC 4007), can be represented in a literal IPv6
   address and in a Uniform Resource Identifier that includes such a
   literal address. It updates the URI Generic Syntax specification
   (RFC 3986) accordingly.

Information Technology - Universal Multiple-Octet Coded Character Set
(UCS)
   International Organization for Standardization

Unicode Normalization Forms

UTF-8, a transformation format of ISO 10646

Universal Resource Identifiers in WWW: A Unifying Syntax for the
Expression of Names and Addresses of Objects on the Network as used in
the World-Wide Web
   This document defines the syntax used by the World-Wide Web
   initiative to encode the names and addresses of objects on the
   Internet. This memo provides information for the Internet community.
   This memo does not specify an Internet standard of any kind.

Uniform Resource Locators (URL)
   This document specifies a Uniform Resource Locator (URL), the syntax
   and semantics of formalized information for location and access of
   resources via the Internet. [STANDARDS-TRACK]

Network File System (NFS) Version 4 Protocol
   The Network File System (NFS) version 4 protocol is a distributed
   file system protocol that builds on the heritage of NFS protocol
   version 2 (RFC 1094) and version 3 (RFC 1813). Unlike earlier
   versions, the NFS version 4 protocol supports traditional file
   access while integrating support for file locking and the MOUNT
   protocol. In addition, support for strong security (and its
   negotiation), COMPOUND operations, client caching, and
   internationalization has been added. Of course, attention has been
   applied to making NFS version 4 operate well in an Internet
   environment. This document, together with the companion External
   Data Representation (XDR) description document, RFC 7531, obsoletes
   RFC 3530 as the definition of the NFS version 4 protocol.

The file URI Scheme
   This document specifies the file: Uniform Resource Identifier (URI)
   scheme that was originally specified in RFC 1738. The purpose of
   this document is to allow RFC 1738 to be moved to historic while
   keeping the information about the scheme on standards track.

URL Living Standard
   WHATWG

Windows Data Types, 2.2.56 UNC
   Microsoft Open Specifications

NetBIOS over TCP (NBT) Extensions
   Microsoft Open Specifications

Server Message Block (SMB) Protocol
   Microsoft Open Specifications

NetWare Core Protocols
   Novell

IEEE Std 1003.1, 2013 Edition
   IEEE

Naming Files, Paths, and Namespaces
   Microsoft Developer Network

Bug 107540
   Bugzilla@Mozilla

The syntax in is intended to support file URIs that take the
following forms:

Local files:

*  file:///path/to/file

   A traditional file URI for a local file, with an empty
   authority. This is the most common format in use today.

*  file:/path/to/file

   The minimal representation of a local file, with no authority
   field and an absolute path that begins with a slash “/”.

Non-local files:

*  file://host.example.com/path/to/file

   The representation of a non-local file, with an explicit
   authority.

This appendix is not normative; it highlights some observed
behaviours and provides system-specific guidance for interacting
with file URIs and paths.

There is little to say about POSIX file systems; the file URI structure
already closely resembles POSIX file paths.

When mapping a DOS- or Windows-like file path to a file URI,
implementations typically map the drive letter (e.g. “c:”) into the
first path segment.

See for explicit (but non-normative and strictly
optional) rules for interacting with DOS- or Windows-like file paths
and URIs.

The HFS+ file system uses a non-standard normalization form, similar
to Normalization Form D. Take care when transforming HFS+ file paths
to and from URIs using Normalization Form C .

When mapping a VMS file path to a file URI, map the device name
into the first path segment. Note that the dollar sign “$” is
a reserved character per the definition in , Section 2.2,
so should be percent-encoded if present in the device name.

If the VMS file path includes a node reference, use that as the
authority. Where the original node reference includes a username and
password in an access control string, they can be transcribed into the
userinfo field of the authority (, Section 3.2.1), security
considerations () notwithstanding.

These variations may be encountered for historical reasons, but are
not supported by the normative syntax of .

This appendix is not normative.

On Windows- or DOS-based file systems an absolute file path can begin
with a drive letter. To facilitate this, the local-path rule in
can be replaced with the following:

This is intended to support URIs of the form:

*  file:c:/path/to/file

   The minimal representation of a local file in a DOS- or
   Windows-based environment, with no authority field and an
   absolute path that begins with a drive letter.

URIs of the form file:///c:/path/to/file are already supported by the
path-absolute rule.

Note that comparison of drive letters in DOS or Windows file paths
is case-insensitive. Some implementations therefore canonicalize drive
letters in file URIs by converting them to uppercase.

In DOS- or Windows-based file systems, relative paths beginning with
a slash “/” should be resolved relative to the drive letter, and
resolution of “..” dot segments (per Section 5.2.4 of )
should never overwrite the drive letter.

e.g.:

Relative paths with a drive letter followed by a character other than
a slash (e.g. “c:bar/baz.txt” or “c:../foo.txt”) should not be
accepted as dereferenceable URIs in DOS or Windows systems.

Historically some implementations have used a vertical line character
“|” instead of a colon “:” in the drive letter construct.
forbids the use of the vertical line; however, it may be necessary to
interpret or update old URIs.

For interpreting such URIs, the auth-path and local-path rules in
and the drive-letter rule above are replaced with the
following:

This is intended to support URIs of the form:

*  file:///c|/path/to/file
*  file:/c|/path/to/file
*  file:c|/path/to/file

   Regular DOS or Windows file URIs, with vertical line characters
   in the drive letter construct.

To update such an old URI, replace the vertical line “|” with a
colon “:”.

A UNC filespace selector string can be directly translated to a
URI; see . The following is an algorithmic description
of the process of translating a UNC string to a file URI:

1.  Initialise the URI with the “file:” scheme identifier.

2.  Append the authority:

    1.  Append the “//” authority sigil to the URI.

    2.  Append the hostname field of the UNC string to the URI.

3.  Append the sharename:

    1.  Transform the sharename to a path segment (,
        Section 3.3) as per Section 2 of .

    2.  Append a delimiting slash character “/” and the transformed
        segment to the URI.

4.  For each objectname:

    1.  Transform the objectname to a path segment (,
        Section 3.3) as per Section 2 of .

    2.  Append a delimiting slash character “/” and the transformed
        segment to the URI.

Example:

It is common to encounter file URIs that encode entire UNC strings in
the path, usually with all backslash “\” characters replaced with
slashes “/”.

To interpret such URIs, the auth-path rule in is replaced
with the following:

This syntax uses the userinfo, IPv4address, IPv6address,
IPvFuture, and reg-name rules from .

Note that the file-host rule is the same as host but with
percent-encoding applied to “[” and “]” characters.

This extended syntax is intended to support URIs that take the
following forms, in addition to those in :

Non-local files:

*  file:////host.example.com/path/to/file

   The “traditional” representation of a non-local file, with an
   empty authority and a complete (transformed) UNC string in the
   path.

*  file://///host.example.com/path/to/file

   As above, with an extra slash between the empty authority and the
   transformed UNC string, conformant with the definition from
   . This representation is notably used by the Firefox
   web browser. See Bugzilla#107540 .

It also further limits the set of file URIs that can be translated to
a local file path to those with a path that does not encode a UNC
string.

Historically some implementations have copied entire file paths into
the path components of file URIs. Where DOS or Windows file paths were
copied thus, resulting URI strings contained unencoded backslash “\”
characters, which are forbidden by both and .

It may be possible to translate or update such an invalid file URI by
replacing all backslashes “\” with slashes “/”, if it can be
determined with reasonable certainty that the backslashes are intended
as path separators.

The following examples demonstrate the advantage of encoding file
URIs as IRIs to avoid ambiguity (see ).

Example: file IRI:

Counter-example: ambiguous file URI:

The UNC filespace selector string is a null-terminated sequence of
characters from the Universal Character Set .

The syntax of a UNC filespace selector string, as defined by
, is given here in Augmented Backus-Naur Form (ABNF)
for convenience. Note that this definition is informative
only; the normative description is in .

*  netbios-name from , Section 2.2.1.
*  fqdn from or
*  ip-address from Section 2.1 of , or Section 2.2 of .

The precise format of sharename depends on the protocol;
see: SMB , NFS , NCP .

Here are the collected syntax rules for all optional appendices,
presented for convenience.

This collected syntax is intended to support file URIs that take the
following forms:

Local files:

*  file:///path/to/file

   A traditional file URI for a local file, with an empty
   authority.

*  file:/path/to/file

   The minimal representation of a local file, with no authority
   field and an absolute path that begins with a slash “/”.

*  file:c:/path/to/file

   The minimal representation of a local file in a DOS- or
   Windows-based environment, with no authority field and an
   absolute path that begins with a drive letter.

*  file:///c|/path/to/file
*  file:/c|/path/to/file
*  file:c|/path/to/file

   Regular DOS or Windows file URIs, with vertical line characters
   in the drive letter construct.

Non-local files:

*  file://host.example.com/path/to/file

   The representation of a non-local file, with an explicit
   authority.

*  file:////host.example.com/path/to/file

   The “traditional” representation of a non-local file, with an
   empty authority and a complete (transformed) UNC string in the
   path.

*  file://///host.example.com/path/to/file

   As above, with an extra slash between the empty authority and the
   transformed UNC string.
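
The forms listed above can largely be told apart by counting the slashes that follow the scheme. The Python sketch below is illustrative only and is not derived from the ABNF: the helper names classify_file_uri and update_legacy_drive, and the label strings they return, are assumptions introduced for this example. It treats a "localhost" authority as local, as described earlier in this document.

```python
import re

# A drive letter followed by ":" (or the legacy "|"), then "/" or end of string.
_DRIVE = re.compile(r"^[A-Za-z][:|](?=/|$)")

# A legacy "|" drive letter directly after "file:", "file:/" or "file:///".
_LEGACY_DRIVE = re.compile(r"^(file:(?:///|/)?)([A-Za-z])\|(?=/|$)", re.IGNORECASE)


def classify_file_uri(uri):
    """Return an informal label (hypothetical wording) for the form of a file URI."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "file":
        raise ValueError("not a file URI: %r" % (uri,))
    # Count the slashes that immediately follow "file:".
    slashes = len(rest) - len(rest.lstrip("/"))
    tail = rest[slashes:]
    if slashes == 0:
        if _DRIVE.match(tail):
            return "local, drive letter, no authority"
        return "unrecognised"
    if slashes == 1:
        return "local, no authority"
    if slashes == 2:
        # Two slashes introduce an authority; it ends at the next slash.
        host = tail.split("/", 1)[0]
        if not host:
            return "unrecognised"
        if host.lower() == "localhost":
            return "local, localhost authority"
        return "non-local, explicit authority"
    if slashes == 3:
        return "local, empty authority"
    if slashes in (4, 5):
        return "non-local, UNC string in path"
    return "unrecognised"


def update_legacy_drive(uri):
    """Replace a legacy "|" in the drive letter construct with ":"."""
    return _LEGACY_DRIVE.sub(r"\1\2:", uri, count=1)


if __name__ == "__main__":
    for example in (
        "file:///path/to/file",
        "file:c|/path/to/file",
        "file:////host.example.com/path/to/file",
    ):
        print(example, "->", classify_file_uri(update_legacy_drive(example)))
```

Counting slashes keeps the sketch short, but it accepts many strings that the grammar would reject; an implementation that needs to validate input would parse against the syntax rules instead.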