Internet-Draft | Constrained Resource Identifiers | March 2020 |
Hartke | Expires 10 September 2020 |
The Constrained Resource Identifier (CRI) is a complement to the Uniform Resource Identifier (URI) that serializes the URI components in Concise Binary Object Representation (CBOR) instead of a sequence of characters. This simplifies parsing, comparison and reference resolution in environments with severe limitations on processing power, code size, and memory size.¶
This note is to be removed before publishing as an RFC.¶
The issues list for this Internet-Draft can be found at <https://github.com/core-wg/coral/labels/href>.¶
A reference implementation and a set of test vectors can be found at <https://github.com/core-wg/coral/tree/master/binary/python>.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 10 September 2020.¶
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
The Uniform Resource Identifier (URI) [RFC3986] and its most common usage, the URI reference, are the Internet standard for linking to resources in hypertext formats such as HTML [W3C.REC-html52-20171214] and the HTTP "Link" header field [RFC8288].¶
A URI reference is a sequence of characters chosen from the repertoire of US-ASCII characters. The individual components of a URI reference are delimited by a number of reserved characters, which necessitates the use of an escape mechanism ("percent-encoding") when these reserved characters are used in a non-delimiting function. The resolution of URI references involves parsing a character sequence into its components, combining those components with the components of a base URI, merging path components, removing dot-segments, and recomposing the result back into a character sequence.¶
Overall, the proper handling of URI references is relatively intricate. This can be a problem, especially in constrained environments [RFC7228] where nodes often have severe code size and memory size limitations. As a result, many implementations in such environments support only an ad-hoc, informally-specified, bug-ridden, non-interoperable subset of half of RFC 3986.¶
This document defines the Constrained Resource Identifier (CRI) by constraining URIs to a simplified subset and serializing their components in Concise Binary Object Representation (CBOR) [RFC7049bis] instead of a sequence of characters. This allows typical operations on URI references such as parsing, comparison and reference resolution to be implemented (including all corner cases) in a comparatively small amount of code.¶
As a result of simplification, however, CRIs are not capable of expressing all URIs permitted by the generic syntax of RFC 3986 (hence the "constrained" in "Constrained Resource Identifier"). The supported subset includes all URIs of the Constrained Application Protocol (CoAP) [RFC7252], most URIs of the Hypertext Transfer Protocol (HTTP) [RFC7230], and other URIs that are similar. The exact constraints are defined in Section 2.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Terms defined in this document appear in cursive where they are introduced (rendered in plain text as the new term surrounded by underscores).¶
A Constrained Resource Identifier consists of the same five components as a URI: scheme, authority, path, query, and fragment. The components are subject to the following constraints:¶
Resource identifiers are generally created on the initial creation of a resource with a certain resource identifier, or the initial exposition of a resource under a particular resource identifier.¶
A Constrained Resource Identifier SHOULD be created by the naming authority that governs the namespace of the resource identifier. For example, for the resources of an HTTP origin server, that server is responsible for creating the CRIs for those resources.¶
The creator MUST ensure that any CRI created satisfies the constraints defined in Section 2. The creation of a CRI fails if the CRI cannot be validated to satisfy all of the constraints.¶
If a creator creates a CRI from user input, it MAY apply the following (and only the following) normalizations to make the CRI more likely to validate: map the scheme name to lowercase (C1); map the registered name to NFC (C4); elide the port if it is the default port for the scheme (C6); elide a single zero-length path segment (C7); map path segments, query parameters and the fragment identifier to NFC (C8, C9, C10).¶
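The permitted normalizations can be sketched as follows. This is a non-normative illustration: the function name, the argument layout, and the DEFAULT_PORTS table are assumptions made for this example, and C7 is interpreted here as a path that consists of exactly one zero-length segment.

```python
import unicodedata

# Assumed default ports, for illustration only; a real creator would take
# these from the respective scheme definitions.
DEFAULT_PORTS = {"coap": 5683, "coaps": 5684, "http": 80, "https": 443}


def nfc(text):
    """Map a Unicode string to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def normalize_user_input(scheme, host_name, port, path, query, fragment):
    """Apply the permitted normalizations and nothing else (hypothetical helper)."""
    scheme = scheme.lower()                              # C1
    if host_name is not None:
        host_name = nfc(host_name)                       # C4
    if port is not None and port == DEFAULT_PORTS.get(scheme):
        port = None                                      # C6
    if list(path) == [""]:
        path = []                                        # C7 (assumed reading)
    path = [nfc(segment) for segment in path]            # C8
    query = [nfc(parameter) for parameter in query]      # C9
    if fragment is not None:
        fragment = nfc(fragment)                         # C10
    return scheme, host_name, port, path, query, fragment
```

The result still has to be validated against the constraints; normalization only makes validation more likely to succeed.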
Once a CRI has been created, it can be used and transferred without further normalization. All operations that operate on a CRI SHOULD rely on the assumption that the CRI is appropriately pre-normalized. (This does not contradict the requirement that when CRIs are transferred, recipients must operate on as-good-as untrusted input and fail gracefully in the face of malicious inputs.)¶
One of the most common operations on CRIs is comparison: determining whether two CRIs are equivalent, without using the CRIs to access their respective resource(s).¶
Determination of equivalence or difference of CRIs is based on simple component-wise comparison. If two CRIs are identical component-by-component (using code-point-by-code-point comparison for components that are Unicode strings) then it is safe to conclude that they are equivalent.¶
This comparison mechanism is designed to minimize false negatives while strictly avoiding false positives. The constraints defined in Section 2 imply the most common forms of syntax- and scheme-based normalizations in URIs, but do not comprise protocol-based normalizations that require accessing the resources or detailed knowledge of the scheme's dereference algorithm. False negatives can be caused by resource aliases and CRIs that do not fully satisfy the constraints.¶
When CRIs are compared to select (or avoid) a network action, such as retrieval of a representation, fragment components (if any) should be excluded from the comparison.¶
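A minimal sketch of this comparison, assuming a CRI is held in memory as a tuple of its components. The CRI class and the equivalent function are hypothetical names for this example and are not part of this document.

```python
from typing import NamedTuple, Optional, Tuple, Union


class CRI(NamedTuple):
    """Hypothetical in-memory form of a CRI, one field per component."""
    scheme: str
    host: Union[str, bytes, None]   # registered name (text) or IP address (bytes)
    port: Optional[int]
    path: Tuple[str, ...]
    query: Tuple[str, ...]
    fragment: Optional[str]


def equivalent(a, b, ignore_fragment=False):
    """Component-wise comparison; no normalization is applied here."""
    if ignore_fragment:
        # Used when selecting (or avoiding) a network action.
        a = a._replace(fragment=None)
        b = b._replace(fragment=None)
    # Python compares str values code point by code point.
    return a == b
```

Because no normalization takes place during comparison, two CRIs that differ only in Unicode normalization form compare as different, which is a false negative rather than a false positive.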
The most common usage of a Constrained Resource Identifier is to embed it in resource representations, e.g., to express a hyperlink between the represented resource and the resource identified by the CRI.¶
This section defines the serialization of CRIs in Concise Binary Object Representation (CBOR) [RFC7049bis]. To reduce representation size, CRIs are not serialized directly. Instead, CRIs are indirectly referenced through CRI references that take advantage of hierarchical locality. The CBOR serialization of CRI references is specified in Section 5.1.¶
The only operation defined on a CRI reference is reference resolution: the act of transforming a CRI reference into a CRI. An application MUST implement this operation by applying the algorithm specified in Section 5.2 or any algorithm that is functionally equivalent to it.¶
The method of transforming a CRI into a CRI reference is unspecified; implementations are free to use any algorithm as long as reference resolution of the resulting CRI reference yields the original CRI.¶
When testing for equivalence or difference, applications SHOULD NOT directly compare CRI references; the references should be resolved to their respective CRI before comparison.¶
A CRI reference is encoded as a CBOR array [RFC7049bis] that contains a sequence of zero or more options. Each option consists of an option number followed by an option value, holding one component or sub-component of the CRI reference. To reduce size, both option numbers and option values are immediate elements of the CBOR array and appear in alternating order.¶
Not all possible sequences of options denote a well-formed CRI reference. The structure can be described in the Concise Data Definition Language (CDDL) [RFC8610] as follows:¶
CRI-Reference = [
  (?scheme, ?((host.name // host.ip), ?port) // path.type),
  *path,
  *query,
  ?fragment
]

scheme    = (0, text .regexp "[a-z][a-z0-9+.-]*")
host.name = (1, text)
host.ip   = (2, bytes .size 4 / bytes .size 16)
port      = (3, 0..65535)
path.type = (4, 0..127)
path      = (5, text)
query     = (6, text)
fragment  = (7, text)¶
The options correspond to the (sub-)components of a CRI, as described in Section 2, with the addition of the path.type option. The path.type option can be used to express path prefixes like "/", "./", "../", "../../", etc. The exact semantics of the option values are defined by Section 5.2. A sequence of options that is empty or starts with a path option is equivalent to the same sequence prefixed by a path.type option with value 2.¶
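The structure given in CDDL above can be checked with a few lines of code. The following is a non-normative sketch that assumes the CBOR array has already been decoded into a Python list; the names value_ok and is_well_formed are made up for this example. It checks only the structure and value ranges, not the constraints of Section 2.

```python
import re

SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*")
# Permitted order of option numbers, mirroring the CDDL group:
# (?scheme, ?((host.name // host.ip), ?port) // path.type), *path, *query, ?fragment
ORDER_RE = re.compile(r"(4|0?([12]3?)?)5*6*7?")


def value_ok(number, value):
    """Check one option value against the type required for its number."""
    if type(number) is not int or isinstance(value, bool):
        return False
    if number == 0:
        return isinstance(value, str) and SCHEME_RE.fullmatch(value) is not None
    if number in (1, 5, 6, 7):
        return isinstance(value, str)
    if number == 2:
        return isinstance(value, bytes) and len(value) in (4, 16)
    if number == 3:
        return isinstance(value, int) and 0 <= value <= 65535
    if number == 4:
        return isinstance(value, int) and 0 <= value <= 127
    return False


def is_well_formed(items):
    """Return True if the decoded array is a well-formed CRI reference."""
    if len(items) % 2 != 0:
        return False
    # Option numbers and option values alternate in the array.
    options = list(zip(items[0::2], items[1::2]))
    if not all(value_ok(number, value) for number, value in options):
        return False
    numbers = "".join(str(number) for number, _ in options)
    return ORDER_RE.fullmatch(numbers) is not None
```

Since all option numbers are single digits, the permitted orderings can be written as a regular expression over the concatenated option numbers.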
Examples:¶
[0, "coap", 2, h'C6336401', 3, 61616, 5, ".well-known", 5, "core"]¶
[4, 0, 5, ".well-known", 5, "core", 6, "rt=temperature-c"]¶
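To illustrate the resulting size, the first example can be serialized with a deliberately tiny CBOR encoder. The encoder below is a sketch written for this example only (the names cbor_head and cbor_encode are assumptions); it supports just the data items that occur in CRI references and is not a general CBOR implementation.

```python
def cbor_head(major, argument):
    """Encode a CBOR head using the shortest form of the argument."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument too large")


def cbor_encode(item):
    """Encode unsigned integers, byte strings, text strings and arrays."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return cbor_head(0, item)
    if isinstance(item, bytes):
        return cbor_head(2, len(item)) + item
    if isinstance(item, str):
        data = item.encode("utf-8")
        return cbor_head(3, len(data)) + data
    if isinstance(item, list):
        return cbor_head(4, len(item)) + b"".join(cbor_encode(x) for x in item)
    raise TypeError("unsupported data item")


reference = [0, "coap", 2, bytes.fromhex("C6336401"), 3, 61616,
             5, ".well-known", 5, "core"]
encoded = cbor_encode(reference)
print(len(encoded), encoded.hex())
```

For this input the encoder produces 36 bytes, starting with 0x8a (an array of ten elements).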
A CRI reference is considered absolute if the sequence of options starts with a scheme option.¶
A CRI reference is considered relative if the sequence of options is empty or starts with an option other than a scheme option.¶
The term "relative" implies that a "base CRI" exists against which the relative reference is applied. Aside from fragment-only references, relative references are only usable when a base CRI is known.¶
The following steps define the process of resolving any CRI reference against a base CRI so that the result is a CRI in the form of an absolute CRI reference:¶
First Option Number | T | E |
---|---|---|
0 (scheme) | 0 | 0 |
1 (host.name) | 0 | 1 |
2 (host.ip) | 0 | 1 |
3 (port) | (invalid sequence of options) | |
4 (path.type) | option value - 1 | if T < 0 then 5 else 6 |
5 (path) | 1 | 6 |
6 (query) | 0 | 6 |
7 (fragment) | 0 | 7 |
none/empty sequence | 0 | 7 |
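The table above can be expressed as a small lookup. This non-normative sketch covers only the determination of T and E from the first option of the reference; the function name resolution_variables is an assumption, and the reference is taken to be an already decoded, well-formed list of alternating option numbers and values.

```python
# Fixed rows of the table: first option number -> (T, E)
FIXED_ROWS = {
    0: (0, 0),  # scheme
    1: (0, 1),  # host.name
    2: (0, 1),  # host.ip
    5: (1, 6),  # path
    6: (0, 6),  # query
    7: (0, 7),  # fragment
}


def resolution_variables(reference):
    """Return (T, E) for a decoded CRI reference."""
    if not reference:
        return 0, 7                      # none/empty sequence
    number, value = reference[0], reference[1]
    if number == 3:
        raise ValueError("invalid sequence of options")
    if number == 4:                      # path.type
        t = value - 1
        return t, (5 if t < 0 else 6)
    return FIXED_ROWS[number]
```

Note that a leading path option yields the same values as a leading path.type option with value 2.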
path options from the end of the buffer (up to the number of path options in the buffer).¶
path.type option.¶
path option and the value of that option is the zero-length string, remove that option from the buffer.¶
CRIs are meant to replace both Uniform Resource Identifiers (URIs) [RFC3986] and Internationalized Resource Identifiers (IRIs) [RFC3987] in constrained environments [RFC7228]. Applications in these environments may never need to use URIs and IRIs directly, especially when the resource identifier is used simply for identification purposes or when the CRI can be directly converted into a CoAP request.¶
However, it may be necessary in other environments to determine the associated URI or IRI of a CRI, and vice versa. Applications can perform these conversions as follows:¶
Everything in this section also applies to CRI references, URI references and IRI references.¶
Applications MUST convert a CRI reference to a URI reference by determining the components of the URI reference according to the following steps and then recomposing the components to a URI reference string as specified in Section 5.3 of [RFC3986].¶
If the CRI reference contains a scheme option, the scheme component of the URI reference consists of the value of that option. Otherwise, the scheme component is undefined.¶
If the CRI reference contains a host.name or host.ip option, the authority component consists of the host subcomponent, optionally followed by a colon (":") character and the port subcomponent. Otherwise, the authority component is undefined.¶
The host subcomponent consists of the value of the host.name or host.ip option.¶
Any character in the value of a host.name option that is not in the set of unreserved characters (Section 2.3 of [RFC3986]) or "sub-delims" (Section 2.2 of [RFC3986]) MUST be percent-encoded.¶
The value of a host.ip option MUST be represented as a string that matches the "IPv4address" or "IP-literal" rule (Section 3.2.2 of [RFC3986]).¶
If the CRI reference contains a port option, the port subcomponent consists of the value of that option in decimal notation. Otherwise, the colon (":") character and the port subcomponent are both omitted.¶
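The authority rules above can be sketched as follows. This is an illustration under assumptions, not the reference implementation: the function names and the keyword-argument layout are made up for this example, and the option values are assumed to be well-formed already.

```python
import ipaddress
from urllib.parse import quote

# "sub-delims" from Section 2.2 of RFC 3986. quote() already leaves the
# unreserved characters (letters, digits, "-", ".", "_", "~") untouched.
SUB_DELIMS = "!$&'()*+,;="


def host_to_uri(host_name=None, host_ip=None):
    """Build the host subcomponent from a host.name or host.ip value."""
    if host_name is not None:
        return quote(host_name, safe=SUB_DELIMS)
    if len(host_ip) == 4:
        return str(ipaddress.IPv4Address(host_ip))        # "IPv4address"
    return "[" + str(ipaddress.IPv6Address(host_ip)) + "]"  # "IP-literal"


def authority_to_uri(host_name=None, host_ip=None, port=None):
    """Return the authority component, or None if it is undefined."""
    if host_name is None and host_ip is None:
        return None
    host = host_to_uri(host_name, host_ip)
    return host if port is None else f"{host}:{port}"
```

For the host.ip and port values of the first example in Section 5.1, this yields "198.51.100.1:61616".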
If the CRI reference is an empty sequence of options or starts with a port option, a path option, or a path.type option where the value is not 0, the conversion fails.¶
If the CRI reference contains a host.name option, a host.ip option or a path.type option where the value is not 0, the path component of the URI reference is prefixed by a slash ("/") character. Otherwise, the path component is prefixed by the empty string.¶
If the CRI reference contains one or more path options, the prefix is followed by the value of each option, separated by a slash ("/") character.¶
Any character in the value of a path option that is not in the set of unreserved characters or "sub-delims" or a colon (":") or commercial at ("@") character MUST be percent-encoded.¶
If the authority component is defined and the path component does not match the "path-abempty" rule (Section 3.3 of [RFC3986]), the conversion fails.¶
If the authority component is undefined and the scheme component is defined and the path component does not match the "path-absolute", "path-rootless" or "path-empty" rule (Section 3.3 of [RFC3986]), the conversion fails.¶
If the authority component is undefined and the scheme component is undefined and the path component does not match the "path-absolute", "path-noscheme" or "path-empty" rule (Section 3.3 of [RFC3986]), the conversion fails.¶
If the CRI reference contains one or more query options, the query component of the URI reference consists of the value of each option, separated by an ampersand ("&") character. Otherwise, the query component is undefined.¶
Any character in the value of a query option that is not in the set of unreserved characters or "sub-delims" or a colon (":"), commercial at ("@"), slash ("/") or question mark ("?") character MUST be percent-encoded. Additionally, any ampersand character ("&") in the option value MUST be percent-encoded.¶
If the CRI reference contains a fragment option, the fragment component of the URI reference consists of the value of that option. Otherwise, the fragment component is undefined.¶
Any character in the value of a fragment option that is not in the set of unreserved characters or "sub-delims" or a colon (":"), commercial at ("@"), slash ("/") or question mark ("?") character MUST be percent-encoded.¶
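The percent-encoding rules for path, query and fragment option values differ only in the set of characters that may remain unencoded. The following non-normative sketch shows the three sets side by side; the function names are assumptions, and the path prefix and the checks against the path rules of RFC 3986 are deliberately left out.

```python
from urllib.parse import quote

# quote() never encodes the unreserved characters; "safe" lists the
# additional characters that may stay as they are.
SUB_DELIMS = "!$&'()*+,;="
PATH_SAFE = SUB_DELIMS + ":@"
QUERY_SAFE = (SUB_DELIMS + ":@/?").replace("&", "")   # "&" separates parameters
FRAGMENT_SAFE = SUB_DELIMS + ":@/?"


def path_segments_to_uri(segments):
    """Join encoded path option values with "/" (prefix not included)."""
    return "/".join(quote(segment, safe=PATH_SAFE) for segment in segments)


def query_to_uri(parameters):
    """Return the query component, or None if there are no query options."""
    if not parameters:
        return None
    return "&".join(quote(parameter, safe=QUERY_SAFE) for parameter in parameters)


def fragment_to_uri(fragment):
    """Return the fragment component, or None if there is no fragment option."""
    if fragment is None:
        return None
    return quote(fragment, safe=FRAGMENT_SAFE)
```

A slash inside a path option value is encoded as "%2F", so it cannot be mistaken for a segment delimiter, and an ampersand inside a query option value is encoded as "%26" for the same reason.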
Parsers of CRI references must operate on input that is assumed to be untrusted. This means that parsers MUST fail gracefully in the face of malicious inputs. Additionally, parsers MUST be prepared to deal with resource exhaustion (e.g., resulting from the allocation of big data items) or exhaustion of the call stack (stack overflow). See Section 10 of [RFC7049bis] for additional security considerations relating to CBOR.¶
The security considerations discussed in Section 7 of [RFC3986] and Section 8 of [RFC3987] for URIs and IRIs also apply to CRIs.¶
This document has no IANA actions.¶
This section is to be removed before publishing as an RFC.¶
Changes from -02 to -03:¶
path.type option (#33).¶
append-relation path type (#41).¶
Changes from -01 to -02:¶
Changes from -00 to -01:¶
Thanks to Christian Amsüss, Carsten Bormann, Ari Keranen, Jim Schaad and Dave Thaler for helpful comments and discussions that have shaped the document.¶