Standardising Structure in URIs
mnot@mnot.net
http://www.mnot.net/
General
URI structure
Sometimes, it is attractive to add features to protocols or applications by specifying a particular
structure for URIs (or parts thereof). This document cautions against this practice in standards
(sometimes called “URI Squatting”).
URIs very often include structure and application data. This might include artefacts
from filesystems (often occurring in the path component), and user information (often in the query
component). In some cases, there can even be application-specific data in the authority component
(e.g., some applications are spread across several hostnames to enable a form of partitioning or
dispatch).
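As a minimal sketch of where such data tends to sit, the following splits a URI into its generic components. The URI itself (hostname, path and query) is a hypothetical example, not one taken from any specification.

```python
from urllib.parse import urlsplit

# Hypothetical URI: the hostname hints at partitioning, the path at a
# filesystem layout, and the query at user information.
uri = "http://shard3.example.com/files/report.html?user=alice"

parts = urlsplit(uri)
print(parts.netloc)  # authority component
print(parts.path)    # path component
print(parts.query)   # query component
```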
Furthermore, constraints upon the structure of URIs can be imposed by an implementation; for
example, many Web servers use the filename extension of the last path segment to determine the
media type of the response. Likewise, pre-packaged applications often have highly structured URIs
that can only be changed in limited ways (often, just the hostname and port they are deployed upon).
Because the owner of the URI is choosing to use the server or the software, this can be seen as
reasonable delegation of authority. When such conventions are mandated by standards, however, this
can have several potentially detrimental effects:
Collisions - As more conventions for URI structure become standardised, it becomes more likely
that there will be collisions between such conventions (especially considering that servers,
applications and individual deployments will have their own conventions).
Dilution - Adorning URIs with extra information to support new standard features dilutes their
usefulness as identifiers when that information is ephemeral (as URIs ought to be stable; see
Section 3.5.1), or its inclusion causes several alternate forms of the URI to exist
(see Section 2.3.1).
Brittleness - A standard that specifies a static URI cannot change its form in future revisions.
Operational Difficulty - Supporting some URI conventions can be difficult in some
implementations. For example, specifying that a particular query parameter be used precludes the
use of Web servers that serve the response from a filesystem. Likewise, an application that fixes
a base path for its operation (e.g., “/v1”) makes it impossible to deploy other applications with
the same prefix on the same host.
Client Assumptions - When conventions are standardised, some clients will inevitably assume that
the standards are in use when those conventions are seen. This can lead to interoperability
problems; for example, if a specification documents that the “sig” URI query parameter indicates
that its payload is a cryptographic signature for the URI, it can lead to false positives.
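The "Client Assumptions" problem can be sketched as follows. This is an illustration only: the function name and the URIs are hypothetical, and the check stands in for whatever a client written against such a specification might do.

```python
from urllib.parse import urlsplit, parse_qs

def looks_signed(uri: str) -> bool:
    """Naive client check: treat any "sig" query parameter as a signature."""
    return "sig" in parse_qs(urlsplit(uri).query)

# Hypothetical deployment where "sig" means "special interest group".
unrelated = "http://example.org/groups?sig=networking"

# The client wrongly concludes that the URI carries a signature.
print(looks_signed(unrelated))
```

The URI owner never opted into the signature convention, yet the client treats the URI as if they had.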
While it is not ideal when a server or a deployed application constrains URI structure (indeed, this
is not recommended practice, but that discussion is out of scope for this document), recommending
standards that mandate URI structure is inappropriate because the structure of a URI needs to be
firmly under the control of its owner, and the IETF (as well as other organisations) should not
usurp this ownership; see Section 2.2.2.1.
This document explains best current practices for establishing URI structures, conventions and
formats in standards. It also offers strategies for specifications to avoid violating these
guidelines in .
These guidelines are IETF Best Current Practice, and are therefore binding upon IETF
standards-track documents, as well as submissions to the RFC Editor on the Independent and IRTF streams. See and for more information.
Other Open Standards organisations (in the sense of ) are encouraged to adopt them.
Questions as to their applicability ought to be handled through the liaison relationship, if
present.
Ad hoc efforts are also encouraged to adopt them, as this RFC reflects Best Current Practice.
This document’s requirements specifically target a few different types of specifications:
URI Scheme Definitions (“scheme definitions”) - specifications that define and register URI
schemes, as per .
Protocol Extensions (“extensions”) - specifications that offer new capabilities to potentially
any identifier, or a large subset; e.g., a new signature mechanism for ‘http’ URIs, or metadata
for any URI.
Applications Using URIs (“applications”) - specifications that use URIs to meet specific needs;
e.g., an HTTP interface to particular information on a host.
Requirements that target the generic class “Specifications” apply to all specifications, including
both those enumerated above and others.
Note that this specification ought not be interpreted as preventing the allocation of control of
URIs by parties that legitimately own them, or have delegated that ownership; for example, a
specification might legitimately specify the semantics of a URI on the IANA.ORG Web site as part of
the establishment of a registry.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”,
“RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in
.
The recommended practices differ for each component of a URI.
Applications and extensions MAY require use of specific URI scheme(s); for example, it is perfectly
acceptable to require that an application support ‘http’ and ‘https’ URIs. However, applications
SHOULD NOT preclude the use of other URI schemes in the future, to promote reuse, unless they are
clearly specific to the nominated schemes.
Specifications MUST NOT define substructure within URI schemes, unless they do so by modifying
, or they are the registration document for the URI scheme(s) in question.
Scheme definitions define the presence, format and semantics of an authority component in URIs; all
other specifications MUST NOT constrain, or define the structure or semantics of, authority components.
Scheme definitions define the presence, format and semantics of a path component in URIs; all
other specifications MUST NOT constrain, or define the structure or semantics of, any path component.
The only exception to this requirement is registered “well-known” URIs, as specified by .
See that document for a description of the applicability of that mechanism.
The presence, format and semantics of the query component of URIs are dependent upon many factors,
and MAY be constrained by a scheme definition. Often, they are determined by the implementation of
a resource itself.
Applications SHOULD NOT directly specify the syntax of queries, as this can cause operational
difficulties for deployments that do not support a particular form of a query.
Extensions MUST NOT specify the format or semantics of queries. In particular, extensions MUST NOT
assume that all HTTP(S) resources are capable of accepting queries in the format defined by
, Section 17.13.4.
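To illustrate why that assumption is unsafe, the sketch below runs a name=value query parser over two hypothetical URIs. The second uses an opaque query that is perfectly valid as a URI query but carries no name=value pairs, so an extension that expects such pairs finds nothing to work with.

```python
from urllib.parse import urlsplit, parse_qs

def form_pairs(uri: str) -> dict:
    """Parse the query as name=value pairs, as a form-style client would."""
    return parse_qs(urlsplit(uri).query)

# Both URIs are hypothetical examples.
form_style = "http://example.com/search?q=uri&lang=en"
opaque = "http://example.com/map?51.5,-0.12"

print(form_pairs(form_style))  # name=value pairs are recovered
print(form_pairs(opaque))      # no pairs are found in the opaque query
```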
Media type definitions (as per SHOULD specify the fragment identifier syntax(es) to be
used with them; other specifications MUST NOT define structure within the fragment identifier,
unless they are explicitly defining one for reuse by media type definitions.
Given the issues above, the most successful strategy for applications and extensions that wish to
use URIs is to use them in the fashion they were designed: as run-time artefacts that are exchanged
as part of the protocol, rather than statically specified syntax.
For example, if a specific URI needs to be known to interact with an application, its “shape” can
be determined by interacting with the application’s more general interface (in Web terms, its “home
page”) to learn about that URI.
describes a framework for identifying the semantics of a link in a “link relation type”
to aid this. provides a standard syntax for “link templates” that can be used to
dynamically insert application-specific variables into a URI to enable such applications while
avoiding impinging upon URI owners’ control of them.
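A minimal sketch of this approach follows. It assumes a hypothetical "home document" that the server returns at run time, mapping a link relation name to a link template; the document layout, the relation name "widget" and the variable name "widget_id" are all illustrative assumptions. The expander handles only the simplest "{name}" form of template expression and is not a full template implementation.

```python
import re
from urllib.parse import quote

# Hypothetical home document, as a client might receive it at run time.
# The server, as URI owner, chooses the shape of the template.
home_document = {
    "links": {
        "widget": "http://example.com/widgets/{widget_id}",
    }
}

def expand_simple(template: str, variables: dict) -> str:
    """Expand "{name}" expressions, percent-encoding each value.

    Undefined variables expand to the empty string.
    """
    def substitute(match):
        value = variables.get(match.group(1), "")
        return quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)

template = home_document["links"]["widget"]
widget_uri = expand_simple(template, {"widget_id": "blue widget"})
print(widget_uri)
```

Because the client learns the template from the server, the server can later change its URI layout without any change to the specification or the client.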
allows specific paths to be ‘reserved’ for standard use on URI schemes that opt into
that mechanism (‘http’ and ‘https’ by default). Note, however, that this is not a general “escape
valve” for applications that need structured URIs; see that specification for more information.
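As a sketch, a client might derive a well-known URI from any URI on the same authority as shown below. The "/.well-known/" path prefix comes from the well-known URI specification; the helper name and the suffix "example-suffix" are hypothetical, and a real suffix would have to be registered as that specification requires.

```python
from urllib.parse import urlsplit, urlunsplit

def well_known_uri(any_uri: str, suffix: str) -> str:
    """Build a well-known URI on the same scheme and authority as any_uri."""
    parts = urlsplit(any_uri)
    path = "/.well-known/" + suffix
    # Query and fragment of the starting URI are deliberately dropped.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

# "example-suffix" is a placeholder, not a registered name.
result = well_known_uri("https://example.com/some/page?x=1", "example-suffix")
print(result)
```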
Specifying more elaborate structures in an attempt to avoid collisions is not adequate to conform
to this document. For example, prefixing query parameters with “myapp_” does not help.
This document does not introduce new protocol artefacts with security considerations.
This document clarifies appropriate registry policy for new URI schemes, and potentially for the
creation of new URI-related registries, if they attempt to mandate structure within URIs. There are
no direct IANA actions specified in this document.
Key words for use in RFCs to Indicate Requirement Levels
Harvard University
1350 Mass. Ave.
Cambridge
MA 02138
+1 617 495 3864
sob@harvard.edu
General
keyword
In many standards track documents several words are used to signify
the requirements in the specification. These words are often
capitalized. This document defines these words as they should be
interpreted in IETF documents. Authors who follow these guidelines
should incorporate this phrase near the beginning of their document:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
Note that the force of these words is modified by the requirement
level of the document in which they are used.
Uniform Resource Identifier (URI): Generic Syntax
World Wide Web Consortium
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge
MA
02139
USA
+1-617-253-5702
+1-617-258-5999
timbl@w3.org
http://www.w3.org/People/Berners-Lee/
Day Software
5251 California Ave., Suite 110
Irvine
CA
92617
USA
+1-949-679-2960
+1-949-679-2972
fielding@gbiv.com
http://roy.gbiv.com/
Adobe Systems Incorporated
345 Park Ave
San Jose
CA
95110
USA
+1-408-536-3024
LMM@acm.org
http://larry.masinter.net/
Applications
uniform resource identifier
URI
URL
URN
WWW
resource
A Uniform Resource Identifier (URI) is a compact sequence of characters
that identifies an abstract or physical resource. This specification
defines the generic URI syntax and a process for resolving URI references
that might be in relative form, along with guidelines and security
considerations for the use of URIs on the Internet.
The URI syntax defines a grammar that is a superset of all valid URIs,
allowing an implementation to parse the common components of a URI
reference without knowing the scheme-specific requirements of every
possible identifier. This specification does not define a generative
grammar for URIs; that task is performed by the individual
specifications of each URI scheme.
Guidelines and Registration Procedures for New URI Schemes
This document provides guidelines and recommendations for the definition of Uniform Resource Identifier (URI) schemes. It also updates the process and IANA registry for URI schemes. It obsoletes both RFC 2717 and RFC 2718. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
Media Type Specifications and Registration Procedures
This document defines procedures for the specification and registration of media types for use in HTTP, MIME, and other Internet protocols. This memo documents an Internet Best Current Practice.
The Internet Standards Process -- Revision 3
Harvard University
1350 Mass. Ave.
Cambridge
MA
02138
US
+1 617 495 3864
sob@harvard.edu
This memo documents the process used by the Internet community for the standardization of protocols and procedures. It defines the stages in the standardization process, the requirements for moving a document between stages and the types of documents used during this process. It also addresses the intellectual property rights and copyright issues associated with the standards process.
The RFC Series and RFC Editor
Internet Architecture Board
This document describes the framework for an RFC Series and an RFC Editor function that incorporate the principles of organized community involvement and accountability that has become necessary as the Internet technical community has grown, thereby enabling the RFC Series to continue to fulfill its mandate. This memo provides information for the Internet community.
Defining Well-Known Uniform Resource Identifiers (URIs)
This memo defines a path prefix for "well-known locations", "/.well-known/", in selected Uniform Resource Identifier (URI) schemes. [STANDARDS-TRACK]
Web Linking
This document specifies relation types for Web links, and defines a registry for them. It also defines the use of such links in HTTP headers with the Link header field. [STANDARDS-TRACK]
URI Template
A URI Template is a compact sequence of characters for describing a range of Uniform Resource Identifiers through variable expansion. This specification defines the URI Template syntax and the process for expanding a URI Template into a URI reference, along with guidelines for the use of URI Templates on the Internet. [STANDARDS-TRACK]
Architecture of the World Wide Web, Volume One
HTML 4.01 Specification
W3C
W3C
Thanks to David Booth, Anne van Kesteren and Erik Wilde for their suggestions
and feedback.