CoRE Working Group                                       C. Bormann, Ed.
Internet-Draft                                    Universität Bremen TZI
Intended status: Standards Track                             H. Birkholz
Expires: 20 July 2022                                     Fraunhofer SIT
                                                         16 January 2022


                    Constrained Resource Identifiers
                        draft-ietf-core-href-09

Abstract

   The Constrained Resource Identifier (CRI) is a complement to the
   Uniform Resource Identifier (URI) that serializes the URI components
   in Concise Binary Object Representation (CBOR) instead of a sequence
   of characters.  This simplifies parsing, comparison and reference
   resolution in environments with severe limitations on processing
   power, code size, and memory size.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-ietf-core-href/.

   Discussion of this document takes place on the Constrained RESTful
   Environments Working Group mailing list (mailto:core@ietf.org), which
   is archived at https://mailarchive.ietf.org/arch/browse/core/.

   Source for this draft and an issue tracker can be found at
   https://github.com/core-wg/href.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 20 July 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Constraints
     2.1.  Constraints by example
     2.2.  Constraints not expressed by the data model
   3.  Creation and Normalization
   4.  Comparison
   5.  CRI References
     5.1.  CBOR Serialization
     5.2.  Ingesting and encoding a CRI Reference
     5.3.  Reference Resolution
   6.  Relationship between CRIs, URIs and IRIs
     6.1.  Converting CRIs to URIs
   7.  Extended CRI: Accommodating Percent Encoding
   8.  Implementation Status
   9.  Security Considerations
   10. IANA Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  The Small Print
   Appendix B.  Change Log
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   The Uniform Resource Identifier (URI) [RFC3986] and its most common
   usage, the URI reference, are the Internet standard for linking to
   resources in hypertext formats such as HTML [W3C.REC-html52-20171214]
   or the HTTP "Link" header field [RFC8288].

   A URI reference is a sequence of characters chosen from the
   repertoire of US-ASCII characters.
   The individual components of a URI reference are delimited by a
   number of reserved characters, which necessitates the use of a
   character escape mechanism called "percent-encoding" when these
   reserved characters are used in a non-delimiting function.  The
   resolution of URI references involves parsing a character sequence
   into its components, combining those components with the components
   of a base URI, merging path components, removing dot-segments, and
   recomposing the result back into a character sequence.

   Overall, the proper handling of URI references is quite intricate.
   This can be a problem especially in constrained environments
   [RFC7228], where nodes often have severe code size and memory size
   limitations.  As a result, many implementations in such environments
   support only an ad-hoc, informally-specified, bug-ridden,
   non-interoperable subset of half of RFC 3986.

   This document defines the _Constrained Resource Identifier (CRI)_ by
   constraining URIs to a simplified subset and serializing their
   components in Concise Binary Object Representation (CBOR) [RFC8949]
   instead of a sequence of characters.  This allows typical operations
   on URI references such as parsing, comparison and reference
   resolution (including all corner cases) to be implemented in a
   comparatively small amount of code.

   As a result of simplification, however, CRIs are not capable of
   expressing all URIs permitted by the generic syntax of RFC 3986
   (hence the "constrained" in "Constrained Resource Identifier").  The
   supported subset includes all URIs of the Constrained Application
   Protocol (CoAP) [RFC7252], most URIs of the Hypertext Transfer
   Protocol (HTTP) [RFC7230], Uniform Resource Names (URNs) [RFC8141],
   and other similar URIs.  The exact constraints are defined in
   Section 2.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   In this specification, the term "byte" is used in its now customary
   sense as a synonym for "octet".

   Terms defined in this document appear in _cursive_ where they are
   introduced (rendered in plain text as the new term surrounded by
   underscores).

2.  Constraints

   A Constrained Resource Identifier consists of the same five
   components as a URI: scheme, authority, path, query, and fragment.
   The components are subject to the following constraints:

   C1.  The scheme name can be any Unicode string (see Definition D80
        in [Unicode]) that matches the syntax of a URI scheme (see
        Section 3.1 of [RFC3986], which constrains schemes to ASCII)
        and is lowercase (see Definition D139 in [Unicode]).  The
        scheme is always present.

   C2.  An authority is always a host identified by an IP address or
        registered name, along with optional port information.  User
        information is not supported (it is often considered to be a
        deprecated part of the URI syntax, but then see also
        https://www.rfc-editor.org/errata/eid5964).

        Alternatively, the authority can be absent; the two cases for
        this defined in Section 3.3 of [RFC3986] are modeled by two
        different values used in place of an absent authority:

        *  the path can begin with a root ("/", as when the authority
           is present), or

        *  the path can be rootless.

        (Note that in Figure 1, no-authority is marked as a feature, as
        not all CRI implementations will support authority-less URIs.)

   C3.  An IP address can be either an IPv4 address or an IPv6 address,
        optionally with a zone identifier [RFC6874].  Future versions
        of IP are not supported (it is likely that a binary mapping
        would be strongly desirable, and that cannot be designed ahead
        of time, so these versions need to be added as a future
        extension if needed).

   C4.  A registered name is a sequence of one or more _labels_, which,
        when joined with dots (".") in between them, result in a
        Unicode string that is lowercase and in Unicode Normalization
        Form C (NFC) (see Definition D120 in [Unicode]).  (The syntax
        may be further restricted by the scheme.)

   C5.  A port is always an integer in the range from 0 to 65535.
        Ports outside this range, empty ports (port subcomponents with
        no digits, see Section 3.2.3 of [RFC3986]), or ports with
        redundant leading zeros, are not supported.

   C6.  The port is omitted if and only if the port would be the same
        as the scheme's default port (provided the scheme is defining
        such a default port) or the scheme is not using ports.

   C7.  A path consists of zero or more path segments.  Note that a
        path of just a single zero-length path segment is allowed --
        this is considered equivalent to a path of zero path segments
        by HTTP and CoAP, but not for CRIs in general as they only
        perform normalization on the Syntax-Based Normalization level
        (Section 6.2.2 of [RFC3986]), not on the scheme-specific
        Scheme-Based Normalization level (Section 6.2.3 of [RFC3986]).

        (A CRI implementation may want to offer scheme-cognizant
        interfaces, performing this scheme-specific normalization for
        schemes it knows.  The interface could assert which schemes the
        implementation knows and provide pre-normalized CRIs.
        This can also relieve the application from removing a lone
        zero-length path segment before putting path segments into CoAP
        Options, i.e., from performing the check and jump in item 8 of
        Section 6.4 of [RFC7252].  See also SP1 in Appendix A.)

   C8.  A path segment can be any Unicode string that is in NFC, with
        the exception of the special "." and ".." complete path
        segments.  Note that this includes the zero-length string.

        If no authority is present in a CRI, the leading path segment
        can not be empty.  (See also SP1 in Appendix A.)

   C9.  A query always consists of one or more query parameters.  A
        query parameter can be any Unicode string that is in NFC.  It
        is often in the form of a "key=value" pair.  When converting a
        CRI to a URI, query parameters are separated by an ampersand
        ("&") character.  (This matches the structure and encoding of
        the target URI in CoAP requests.)  Queries are optional; there
        is a difference between an absent query and a single query
        parameter that is the empty string.

   C10. A fragment identifier can be any Unicode string that is in NFC.
        Fragment identifiers are optional; there is a difference
        between an absent fragment identifier and a fragment identifier
        that is the empty string.

   C11. The syntax of registered names, path segments, query
        parameters, and fragment identifiers may be further restricted
        and sub-structured by the scheme.  There is no support,
        however, for escaping sub-delimiters that are not intended to
        be used in a delimiting function.

   C12. When converting a CRI to a URI, any character that is outside
        the allowed character range or is a delimiter in the URI syntax
        is percent-encoded.  For CRIs, percent-encoding always uses the
        UTF-8 encoding form (see Definition D92 in [Unicode]) to
        convert the character to a sequence of bytes (that is then
        converted to a sequence of %HH triplets).

2.1.  Constraints by example

   While most URIs in everyday use can be converted to CRIs and back to
   URIs matching the input after syntax-based normalization of the URI,
   these URIs illustrate the constraints by example:

   *  https://host%ffname, https://example.com/x?data=%ff

      All URI components must, after percent decoding, be valid UTF-8
      encoded text.  Bytes that are not valid UTF-8 show up, for
      example, in BitTorrent web seeds.

   *  https://example.com/component%3bone;component%3btwo,
      http://example.com/component%3dequals

      While delimiters can be used in an escaped and unescaped form in
      URIs with generally distinct meanings, CRIs only support one
      escapable delimiter character per component, which is the
      delimiter by which the component is split up in the CRI.

      Note that the separators . (for authority parts), / (for paths), &
      (for query parameters) are special in that they are syntactic
      delimiters of their respective components in CRIs.  Thus, the
      following examples _are_ convertible to CRIs:

      https://interior%2edot/

      https://example.com/path%2fcomponent/second-component

      https://example.com/x?ampersand=%26&questionmark=?

   *  https://alice@example.com/

      The user information can not be expressed in CRIs.

2.2.  Constraints not expressed by the data model

   There are syntactically valid CRIs and CRI references that can not be
   converted into a URI or URI reference, respectively.

   For CRI references, this is acceptable -- they can still be resolved
   and result in a valid CRI that can be converted back.  (An example of
   this is [0, ["p"]], which appends a slash and the path segment "p" to
   its base.)

   (Full) CRIs that do not correspond to a valid URI are not valid on
   their own, and can not be used.  Normatively, they are characterized
   by the Section 6.1 process failing to produce a valid and
   syntax-normalized URI.
   For easier understanding, they are listed here:

   *  CRIs (and CRI references) containing a path component "." or "..".

      These would be removed by the remove_dot_segments algorithm of
      [RFC3986], and thus never produce a normalized URI after
      resolution.

      (In CRI references, the discard value is used to afford segment
      removal, and with "." being an unreserved character, expressing
      them as "%2e" and "%2e%2e" is not even viable, let alone
      practical).

   *  CRIs without authority whose path starts with two or more empty
      segments.

      When converted to URIs, these would violate the requirement that
      in absence of an authority, a URI's path can not begin with two
      slash characters, and they would be indistinguishable from a URI
      with a shorter path and a present but empty authority component.

3.  Creation and Normalization

   In general, resource identifiers are created on the initial creation
   of a resource with a certain resource identifier, or the initial
   exposition of a resource under a particular resource identifier.

   A Constrained Resource Identifier SHOULD be created by the naming
   authority that governs the namespace of the resource identifier (see
   also [RFC8820]).  For example, for the resources of an HTTP origin
   server, that server is responsible for creating the CRIs for those
   resources.

   The naming authority MUST ensure that any CRI created satisfies the
   constraints defined in Section 2.  The creation of a CRI fails if the
   CRI cannot be validated to satisfy all of the constraints.
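
   As an illustration of such a validation step, the following Python
   sketch checks a subset of the Section 2 constraints (C1, C4, C5, C8,
   C9, C10) on the components of a candidate CRI.  It is not part of
   this specification; the function name, its parameter layout (host
   given as a list of labels, or None when the host is not a registered
   name), and the use of str.lower() as an approximation of the Unicode
   lowercase definition are assumptions made for the example.

```python
import re
import unicodedata

# Pattern for scheme-name as given in the CDDL of Figure 1.
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*")


def is_nfc(text):
    """True if the string is in Unicode Normalization Form C."""
    return unicodedata.is_normalized("NFC", text)


def violated_constraints(scheme, host_labels, port, path,
                         query=None, fragment=None):
    """Return the labels of violated constraints (hypothetical helper).

    Only a subset of Section 2 is checked; an empty list means that
    none of the checked constraints was violated.
    """
    problems = []
    if not SCHEME_RE.fullmatch(scheme):
        problems.append("C1")
    if host_labels is not None:
        name = ".".join(host_labels)
        # Assumption: str.lower() only approximates Unicode "lowercase".
        if not host_labels or name != name.lower() or not is_nfc(name):
            problems.append("C4")
    if port is not None and not 0 <= port <= 65535:
        problems.append("C5")
    if any(seg in (".", "..") or not is_nfc(seg) for seg in path):
        problems.append("C8")
    if query is not None and (not query or not all(is_nfc(q) for q in query)):
        problems.append("C9")
    if fragment is not None and not is_nfc(fragment):
        problems.append("C10")
    return problems
```

   A naming authority following this sketch would treat any non-empty
   result as a failed creation.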

   If a naming authority creates a CRI from user input, it MAY apply the
   following (and only the following) normalizations to make the CRI
   more likely to validate:

   *  map the scheme name to lowercase (C1);

   *  map the registered name to NFC (C4) and split it on embedded dots;

   *  elide the port if it is the default port for the scheme (C6);

   *  map path segments, query parameters and the fragment identifier to
      NFC form (C8, C9, C10).

   Once a CRI has been created, it can be used and transferred without
   further normalization.  All operations that operate on a CRI SHOULD
   rely on the assumption that the CRI is appropriately pre-normalized.
   (This does not contradict the requirement that when CRIs are
   transferred, recipients must operate on as-good-as untrusted input
   and fail gracefully in the face of malicious inputs.)

4.  Comparison

   One of the most common operations on CRIs is comparison: determining
   whether two CRIs are equivalent, without dereferencing the CRIs
   (using them to access their respective resource(s)).

   Determination of equivalence or difference of CRIs is based on simple
   component-wise comparison.  If two CRIs are identical
   component-by-component (using code-point-by-code-point comparison for
   components that are Unicode strings) then it is safe to conclude that
   they are equivalent.

   This comparison mechanism is designed to minimize false negatives
   while strictly avoiding false positives.  The constraints defined in
   Section 2 imply the most common forms of syntax- and scheme-based
   normalizations in URIs, but do not comprise protocol-based
   normalizations that require accessing the resources or detailed
   knowledge of the scheme's dereference algorithm.  False negatives can
   be caused, for example, by CRIs that are not appropriately
   pre-normalized and by resource aliases.
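
   The component-wise comparison can be sketched in Python as follows.
   The CRI tuple type and the equivalent() helper are hypothetical names
   chosen for this example, not an interface defined by this document;
   the sketch relies on Python comparing strings code point by code
   point and performs no normalization of its own.

```python
import unicodedata
from typing import NamedTuple, Optional, Tuple


class CRI(NamedTuple):
    """Hypothetical in-memory form of a full CRI (illustration only)."""
    scheme: str
    authority: object            # e.g. (labels, port); layout is an assumption
    path: Tuple[str, ...]
    query: Optional[Tuple[str, ...]] = None
    fragment: Optional[str] = None


def equivalent(a: CRI, b: CRI) -> bool:
    """Component-by-component comparison without any normalization."""
    return a == b


nfc = CRI("coap", (("example", "com"), None), ("caf\u00e9",))
same = CRI("coap", (("example", "com"), None), ("caf\u00e9",))
# Same text, but the last path character is decomposed (not in NFC).
not_nfc = CRI("coap", (("example", "com"), None), ("cafe\u0301",))

print(equivalent(nfc, same))      # identical component by component
print(equivalent(nfc, not_nfc))   # a false negative: input was not pre-normalized
```

   The second comparison illustrates a false negative caused by a CRI
   that is not pre-normalized: both paths would become identical after
   mapping to NFC, but the comparison itself does not normalize.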

   When CRIs are compared to select (or avoid) a network action, such as
   retrieval of a representation, fragment components (if any) should be
   excluded from the comparison.

5.  CRI References

   The most common usage of a Constrained Resource Identifier is to
   embed it in resource representations, e.g., to express a hyperlink
   between the represented resource and the resource identified by the
   CRI.

   This section defines the serialization of CRIs in Concise Binary
   Object Representation (CBOR) [RFC8949].  To reduce representation
   size, CRIs are not serialized directly.  Instead, CRIs are indirectly
   referenced through _CRI references_.  These take advantage of
   hierarchical locality and provide a very compact encoding.  The CBOR
   serialization of CRI references is specified in Section 5.1.

   The only operation defined on a CRI reference is _reference
   resolution_: the act of transforming a CRI reference into a CRI.  An
   application MUST implement this operation by applying the algorithm
   specified in Section 5.3 (or any algorithm that is functionally
   equivalent to it).

   The reverse operation of transforming a CRI into a CRI reference is
   unspecified; implementations are free to use any algorithm as long as
   reference resolution of the resulting CRI reference yields the
   original CRI.  Notably, a CRI reference is not required to satisfy
   all of the constraints of a CRI; the only requirement on a CRI
   reference is that reference resolution MUST yield the original CRI.

   When testing for equivalence or difference, applications SHOULD NOT
   directly compare CRI references; the references should be resolved to
   their respective CRI before comparison.

5.1.  CBOR Serialization

   A CRI reference is encoded as a CBOR array [RFC8949], with the
   structure as described in the Concise Data Definition Language (CDDL)
   [RFC8610] as follows:

   ; not expressed in this CDDL spec: trailing nulls to be left off

   CRI-Reference = [
     ((scheme / null, authority / no-authority)
      // discard),                 ; relative reference
     path / null,
     query / null,
     fragment / null
   ]

   scheme      = scheme-name / scheme-id
   scheme-name = text .regexp "[a-z][a-z0-9+.-]*"
   scheme-id   = (COAP / COAPS / HTTP / HTTPS / URN / DID /
                  other-scheme)
                 .within nint
   COAP = -1 COAPS = -2 HTTP = -3 HTTPS = -4 URN = -5 DID = -6
   other-scheme = nint .feature "scheme-id-extension"

   no-authority = NOAUTH-NOSLASH / NOAUTH-LEADINGSLASH
   NOAUTH-LEADINGSLASH = null .feature "no-authority"
   NOAUTH-NOSLASH = true .feature "no-authority"

   authority   = [host, ?port]
   host        = (host-ip // host-name)
   host-name   = (*text) ; lowercase, NFC labels
   host-ip     = (bytes .size 4 //
                    (bytes .size 16, ?zone-id))
   zone-id     = text
   port        = 0..65535

   discard     = DISCARD-ALL / 0..127
   DISCARD-ALL = true
   path        = [*text]
   query       = [*text]
   fragment    = text

                Figure 1: CDDL for CRI CBOR serialization

   This CDDL specification is simplified for exposition and needs to be
   augmented by the following rule for interchange: Trailing null values
   MUST be removed, and two leading null values (scheme and authority
   both not given) are represented by using the discard alternative
   instead.

   The rules scheme, authority, path, query, fragment correspond to the
   (sub-)components of a CRI, as described in Section 2, with the
   addition of the discard section.  The discard section can be used
   when neither a scheme nor an authority is present.  It then expresses
   path prefixes such as "/", "./", "../", "../../", etc.  The exact
   semantics of the section values are defined by Section 5.3.

   Most URI references that Section 4.2 of [RFC3986] calls "relative
   references" (i.e., references that need to undergo a resolution
   process to obtain a URI) correspond to the CRI form that starts with
   discard.  The exceptions are relative references with an authority
   (called a "network-path reference" in Section 4.2 of [RFC3986]),
   which in CRI references never carry a discard section (the value of
   discard defaults to true).

      |  The structure of a CRI is visualized using the somewhat limited
      |  means of a railroad diagram below.
      |
      |  [Railroad diagram "cri-reference": either scheme followed by
      |  authority, or discard; then path, query, and fragment, with
      |  nested bypass arcs around the elements.]
      |
      |  This visualization does not go into the details of the
      |  elements.

   Examples:

   [-1,             / scheme -- equivalent to "coap" /
    [h'C6336401',   / host /
     61616],        / port /
    [".well-known", / path /
     "core"]
   ]

   [true,                  / discard /
    [".well-known",        / path /
     "core"],
    ["rt=temperature-c"]]  / query /

   [-6,                / scheme -- equivalent to "did" /
    true,              / authority = NOAUTH-NOSLASH /
    ["web:alice:bob"]  / path /
   ]

   A CRI reference is considered _well-formed_ if it matches the
   structure as expressed in Figure 1 in CDDL, with the additional
   requirement that trailing null values are removed from the array.

   A CRI reference is considered _absolute_ if it is well-formed and the
   sequence of sections starts with a non-null scheme.

   A CRI reference is considered _relative_ if it is well-formed and the
   sequence of sections is empty or starts with a section other than
   those that would constitute a scheme.
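
   The distinction between absolute and relative CRI references can be
   sketched in Python as follows.  The sketch assumes that the CBOR
   array has already been decoded into a Python list (null as None,
   true as True, byte strings as bytes) and that it is well-formed; the
   function names are hypothetical and not part of this specification.

```python
def is_absolute(ref):
    """True if a (well-formed) CRI reference starts with a non-null scheme."""
    if not ref:
        return False                  # the empty array is a relative reference
    first = ref[0]
    if isinstance(first, bool):
        return False                  # true is DISCARD-ALL, not a scheme
    # A scheme is either a scheme-name (text) or a scheme-id (negative integer).
    return isinstance(first, str) or (isinstance(first, int) and first < 0)


def is_relative(ref):
    """For well-formed input, every reference that is not absolute is relative."""
    return not is_absolute(ref)


# The three examples above, written as decoded Python values.
coap_ref = [-1, [bytes.fromhex("C6336401"), 61616], [".well-known", "core"]]
discard_ref = [True, [".well-known", "core"], ["rt=temperature-c"]]
did_ref = [-6, True, ["web:alice:bob"]]
```

   Note that Python treats True as an instance of int, which is why the
   sketch tests for bool before testing for a negative integer.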

5.2.  Ingesting and encoding a CRI Reference

   From an abstract point of view, a CRI Reference is a data structure
   with six sections:

   scheme, authority, discard, path, query, fragment

   Each of these sections can be unset ("null"), except for discard,
   which is always an unsigned number or true.  If scheme and/or
   authority are non-null, discard must be true.

   When ingesting a CRI Reference that is in the transfer form, those
   sections are filled in from the transfer form (unset sections are
   filled with null), and the following steps are performed:

   *  If the array is entirely empty, replace it with [0].

   *  If discard is present in the transfer form (i.e., the outer array
      starts with true or an unsigned number), set scheme and authority
      to null.

   *  If scheme and/or authority are present in the transfer form (i.e.,
      the outer array starts with null, a text string, or a negative
      integer), set discard to true.

   Upon encoding the abstract form into the transfer form, the inverse
   processing is performed: If scheme and/or authority are not null, the
   discard value is not transferred (it must be true in this case).  If
   they are both null, they are both left out and only discard is
   transferred.  Trailing null values are removed from the array.  As a
   special case, an empty array is sent in place for a remaining [0]
   (URI "").

5.3.  Reference Resolution

   The term "relative" implies that a "base CRI" exists against which
   the relative reference is applied.  Aside from fragment-only
   references, relative references are only usable when a base CRI is
   known.

   The following steps define the process of resolving any well-formed
   CRI reference against a base CRI so that the result is a CRI in the
   form of an absolute CRI reference:

   1.  Establish the base CRI of the CRI reference and express it in the
       form of an abstract absolute CRI reference.

   2.  Initialize a buffer with the sections from the base CRI.

   3.  If the value of discard is true in the CRI reference, replace the
       path in the buffer with the empty array, unset query and
       fragment, and set a true authority to null.  If the value of
       discard is an unsigned number, remove as many elements from the
       end of the path array; if it is non-zero, unset query and
       fragment.  Set discard to true in the buffer.

   4.  If the path section is set in the CRI reference, append all
       elements from the path array to the array in the path section in
       the buffer; unset query and fragment.

   5.  Apart from the path and discard, copy all non-null sections from
       the CRI reference to the buffer in sequence; unset fragment if
       query is non-null and thus copied.

   6.  Return the sections in the buffer as the resolved CRI.

6.  Relationship between CRIs, URIs and IRIs

   CRIs are meant to replace both Uniform Resource Identifiers (URIs)
   [RFC3986] and Internationalized Resource Identifiers (IRIs) [RFC3987]
   in constrained environments [RFC7228].  Applications in these
   environments may never need to use URIs and IRIs directly, especially
   when the resource identifier is used simply for identification
   purposes or when the CRI can be directly converted into a CoAP
   request.

   However, it may be necessary in other environments to determine the
   associated URI or IRI of a CRI, and vice versa.  Applications can
   perform these conversions as follows:

   CRI to URI
      A CRI is converted to a URI as specified in Section 6.1.

   URI to CRI
      The method of converting a URI to a CRI is unspecified;
      implementations are free to use any algorithm as long as
      converting the resulting CRI back to a URI yields an equivalent
      URI.

   CRI to IRI
      A CRI can be converted to an IRI by first converting it to a URI
      as specified in Section 6.1, and then converting the URI to an IRI
      as described in Section 3.2 of [RFC3987].

   IRI to CRI
      An IRI can be converted to a CRI by first converting it to a URI
      as described in Section 3.1 of [RFC3987], and then converting the
      URI to a CRI as described above.

   Everything in this section also applies to CRI references, URI
   references and IRI references.

6.1.  Converting CRIs to URIs

   Applications MUST convert a CRI reference to a URI reference by
   determining the components of the URI reference according to the
   following steps and then recomposing the components to a URI
   reference string as specified in Section 5.3 of [RFC3986].

   scheme
      If the CRI reference contains a scheme section, the scheme
      component of the URI reference consists of the value of that
      section.  Otherwise, the scheme component is unset.

   authority
      If the CRI reference contains a host-name or host-ip item, the
      authority component of the URI reference consists of a host
      subcomponent, optionally followed by a colon (":") character and a
      port subcomponent.  Otherwise, the authority component is unset.

      The host subcomponent consists of the value of the host-name or
      host-ip item.

      The host-name is turned into a single string by joining the
      elements separated by dots (".").  Any character in the value of a
      host-name item that is not in the set of unreserved characters
      (Section 2.3 of [RFC3986]) or "sub-delims" (Section 2.2 of
      [RFC3986]) MUST be percent-encoded.

      The value of a host-ip item MUST be represented as a string that
      matches the "IPv4address" or "IP-literal" rule (Section 3.2.2 of
      [RFC3986]).
      Any zone-id is appended to the string, separated by "%25" as
      defined in Section 2 of [RFC6874], or as specified in a successor
      zone-id specification document; this also leads to a modified
      "IP-literal" rule as specified in these documents.

      If the CRI reference contains a port item, the port subcomponent
      consists of the value of that item in decimal notation.
      Otherwise, the colon (":") character and the port subcomponent are
      both omitted.

   path
      If the CRI reference contains a discard item of value true, the
      path component is prefixed by a slash ("/") character.  If it
      contains a discard item of value 0 and the path item is present,
      the conversion fails.  If it contains a positive discard item, the
      path component is prefixed by as many "../" components as the
      discard value minus one indicates.

      If the discard item is not present and the CRI reference contains
      an authority that is true, the path component of the URI reference
      is prefixed by the zero-length string.  Otherwise, the path
      component is prefixed by a slash ("/") character.

      If the CRI reference contains one or more path items, the prefix
      is followed by the value of each item, separated by a slash ("/")
      character.

      Any character in the value of a path item that is not in the set
      of unreserved characters or "sub-delims" or a colon (":") or
      commercial at ("@") character MUST be percent-encoded.

      If the authority component is present (not null or true) and the
      path component does not match the "path-abempty" rule (Section 3.3
      of [RFC3986]), the conversion fails.

      If the authority component is not present, but the scheme
      component is, and the path component does not match the
      "path-absolute", "path-rootless" (authority == true) or
      "path-empty" rule (Section 3.3 of [RFC3986]), the conversion
      fails.
699         If neither the authority component nor the scheme component is
700         present, and the path component does not match the "path-
701         absolute", "path-noscheme" or "path-empty" rule (Section 3.3 of
702         [RFC3986]), the conversion fails.

704      query
705         If the CRI reference contains one or more query items, the query
706         component of the URI reference consists of the value of each item,
707         separated by an ampersand ("&") character. Otherwise, the query
708         component is unset.

710         Any character in the value of a query item that is not in the set
711         of unreserved characters or "sub-delims" or a colon (":"),
712         commercial at ("@"), slash ("/") or question mark ("?") character
713         MUST be percent-encoded. Additionally, any ampersand character
714         ("&") in the item value MUST be percent-encoded.

716      fragment
717         If the CRI reference contains a fragment item, the fragment
718         component of the URI reference consists of the value of that item.
719         Otherwise, the fragment component is unset.

721         Any character in the value of a fragment item that is not in the
722         set of unreserved characters or "sub-delims" or a colon (":"),
723         commercial at ("@"), slash ("/") or question mark ("?") character
724         MUST be percent-encoded.

726   7. Extended CRI: Accommodating Percent Encoding

728      CRIs have been designed to relieve implementations operating on CRIs
729      from string scanning, which helps both constrained implementations
730      and implementations that need to achieve high throughput.

732      Basic CRI does not support URI components that _require_ percent-
733      encoding (Section 2.1 of [RFC3986]) to represent them in the URI
734      syntax, except where that percent-encoding is used to escape the main
735      delimiter in use.
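
The percent-encoding rules that Section 6.1 gives for path, query and fragment items, including the escaping of the main delimiter in use, can be sketched in Python as follows. This is a minimal, non-normative sketch and not part of the specification: the names SUB_DELIMS, PATH_SAFE, QUERY_SAFE, FRAGMENT_SAFE, path_component, query_component and fragment_component are hypothetical, and the path prefix is passed in by the caller instead of being derived from the discard and authority items. Python's urllib.parse.quote emits upper-case hexadecimal digits, which [RFC3986] treats as equivalent to the lower-case form.

```python
from urllib.parse import quote

# RFC 3986 "sub-delims". The unreserved characters (ALPHA, DIGIT, "-", ".",
# "_", "~") are never encoded by quote() in Python 3.7 and later, so they
# do not need to be listed here.
SUB_DELIMS = "!$&'()*+,;="

# Hypothetical names: characters that stay unencoded in each kind of item.
PATH_SAFE = SUB_DELIMS + ":@"
FRAGMENT_SAFE = SUB_DELIMS + ":@/?"
# Query items additionally encode "&", the delimiter between query items.
QUERY_SAFE = FRAGMENT_SAFE.replace("&", "")


def path_component(items, prefix="/"):
    """Join path items with "/", percent-encoding each item first.

    The prefix is supplied by the caller (an assumption of this sketch).
    """
    return prefix + "/".join(quote(item, safe=PATH_SAFE) for item in items)


def query_component(items):
    """Join query items with "&", percent-encoding each item first."""
    return "&".join(quote(item, safe=QUERY_SAFE) for item in items)


def fragment_component(item):
    """Percent-encode a single fragment item."""
    return quote(item, safe=FRAGMENT_SAFE)


if __name__ == "__main__":
    # A "/" inside a path item is the main delimiter of the path and is
    # therefore escaped when the item is serialized.
    print("https://alice" + path_component(["3/4-inch"]))
    print(query_component(["a&b=c", "d"]))
    print(fragment_component("sec 1/2?"))
```

In this sketch, a path item containing a slash is serialized with the slash escaped as %2F, so the item boundary survives a round trip through the URI form.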
737      E.g., the URI
738      https://alice/3%2f4-inch

740      is represented by the basic CRI

742      [-4, ["alice"], ["3/4-inch"]]

744      However, percent-encoding that is used at the application level is
745      not supported by basic CRIs:

747      did:web:alice:7%3A1-balun

749      This section presents a method to represent percent-encoded segments
750      of hostnames, paths, and queries, as well as of fragments.

752      The four CDDL rules

754      host-name   = (*text)
755      path        = [*text]
756      query       = [*text]
757      fragment    = text

759      are replaced with

761      host-name   = (*text-or-pet)
762      path        = [*text-or-pet]
763      query       = [*text-or-pet]
764      fragment    = text-or-pet

766      text-or-pet = text /
767          text-pet-sequence .feature "extended-cri"

769      ; text1 and pet1 alternating, at least one pet1:
770      text-pet-sequence = [?text1, ((+(pet1, text1), ?pet1) // pet1)]
771      ; pet is percent-encoded bytes
772      pet1 = bytes .ne ''
773      text1 = text .ne ""

775      That is, for each of the host-name, path, and query segments, and for
776      the fragment component, an alternate representation is provided
777      besides a simple text string: a non-empty array of alternating non-
778      blank text and byte strings, the text strings of which stand for non-
779      percent-encoded text, while the byte strings retain the special
780      semantics of percent-encoded text without actually being percent-
781      encoded.

783      The above DID URI can now be represented as:

785      [-6, true, [["web:alice:7", ':', "1-balun"]]]

787   8. Implementation Status

789      With the exception of the authority=true fix, host-names split into
790      labels, and Section 7, CRIs are implemented in
791      https://gitlab.com/chrysn/micrurus.

793   9. Security Considerations

795      Parsers of CRI references must operate on input that is assumed to be
796      untrusted. This means that parsers MUST fail gracefully in the face
797      of malicious inputs. Additionally, parsers MUST be prepared to deal
798      with resource exhaustion (e.g., resulting from the allocation of big
799      data items) or exhaustion of the call stack (stack overflow).
         See
800      Section 10 of [RFC8949] for additional security considerations
801      relating to CBOR.

803      The security considerations discussed in Section 7 of [RFC3986] and
804      Section 8 of [RFC3987] for URIs and IRIs also apply to CRIs.

806   10. IANA Considerations

808      This document has no IANA actions.

810   11. References

812   11.1. Normative References

814      [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
815                 Requirement Levels", BCP 14, RFC 2119,
816                 DOI 10.17487/RFC2119, March 1997.

819      [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
820                 Resource Identifier (URI): Generic Syntax", STD 66,
821                 RFC 3986, DOI 10.17487/RFC3986, January 2005.

824      [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
825                 Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
826                 January 2005.

828      [RFC6874]  Carpenter, B., Cheshire, S., and R. Hinden, "Representing
829                 IPv6 Zone Identifiers in Address Literals and Uniform
830                 Resource Identifiers", RFC 6874, DOI 10.17487/RFC6874,
831                 February 2013.

833      [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
834                 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
835                 May 2017.

837      [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
838                 Definition Language (CDDL): A Notational Convention to
839                 Express Concise Binary Object Representation (CBOR) and
840                 JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
841                 June 2019.

843      [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
844                 Representation (CBOR)", STD 94, RFC 8949,
845                 DOI 10.17487/RFC8949, December 2020.

848      [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
849                 13.0.0", ISBN 978-1-936213-26-9, March 2020.

852   11.2. Informative References

854      [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
855                 Constrained-Node Networks", RFC 7228,
856                 DOI 10.17487/RFC7228, May 2014.

859      [RFC7230]  Fielding, R., Ed. and J.
                    Reschke, Ed., "Hypertext Transfer
860                 Protocol (HTTP/1.1): Message Syntax and Routing",
861                 RFC 7230, DOI 10.17487/RFC7230, June 2014.

864      [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained
865                 Application Protocol (CoAP)", RFC 7252,
866                 DOI 10.17487/RFC7252, June 2014.

869      [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
870                 (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017.

873      [RFC8288]  Nottingham, M., "Web Linking", RFC 8288,
874                 DOI 10.17487/RFC8288, October 2017.

877      [RFC8820]  Nottingham, M., "URI Design and Ownership", BCP 190,
878                 RFC 8820, DOI 10.17487/RFC8820, June 2020.

881      [W3C.REC-html52-20171214]
882                 Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., and
883                 S. Moon, "HTML 5.2", World Wide Web Consortium
884                 Recommendation REC-html52-20171214, 14 December 2017.

887   Appendix A. The Small Print

889      This appendix lists a few corner cases of URI semantics that
890      implementers of CRIs need to be aware of, but that are not
891      representative of the normal operation of CRIs.

893      SP1. Initial (Lone/Leading) Empty Path Segments:

895      *  _Lone empty path segments:_ As per [RFC3986], s://x is distinct
896         from s://x/ -- i.e., a URI with an empty path is different from
897         one with a lone empty path segment. However, in HTTP and CoAP, they
898         are implicitly aliased (for CoAP, in item 8 of Section 6.4 of
899         [RFC7252]). As per item 7 of Section 6.5 of [RFC7252],
900         recomposition of a URI without Uri-Path Options from the other
901         URI-related CoAP Options produces s://x/, not s://x -- CoAP
902         prefers the lone empty path segment form.
903         // TBD: add similar text for HTTP, if that can be
904         made. Section 6.2.3 of [RFC3986] even states:

906         |  In general, a URI that uses the generic syntax for authority with
907         |  an empty path should be normalized to a path of "/".
909      *  _Leading empty path segments without authority_: Somewhat related,
910         note also that URIs and URI references that do not carry an
911         authority cannot represent initial empty path segments (i.e., that
912         are followed by further path segments): s://x//foo works, but in a
913         s://foo URI or an (absolute-path) URI reference of the form //foo
914         the double slash would be mis-parsed as leading in to an
915         authority.

917      // (TBD: Add more small print/move that over from above.)

919   Appendix B. Change Log

921      This section is to be removed before publishing as an RFC.

923      Changes from -08 to -09

925      *  Identify more esoteric features with a CDDL ".feature".

927      *  Clarify that well-formedness requires removing trailing nulls.

929      *  Fragments can contain PET.

931      *  Percent-encoded text in PET is treated as byte strings.

933      *  URIs with an authority but a completely empty path (e.g.,
934         http://example.com): CRIs with an authority component no longer
935         always produce at least a slash in the path component.

937         For generic schemes, the conversion of scheme://example.com to a
938         CRI is now possible because CRI produces a URI with an authority
939         not followed by a slash following the updated rules of
940         Section 6.1. Schemes like http and coap do not distinguish
941         between the empty path and the path containing a single slash when
942         an authority is set (as recommended in [RFC3986]). For these
943         schemes, that equivalence allows implementations to convert the
944         just-a-slash URI to a CRI with a zero length path array (which,
945         however, when converted back, does not produce a slash after the
946         authority).

948         (Add an appendix "the small print" for more detailed discussion of
949         pesky corner cases like this.)

951      Changes from -07 to -08

953      *  Fix the encoding of NOAUTH-NOSLASH / NOAUTH-LEADINGSLASH

955      *  Add URN and DID schemes, add example.
957      *  Add PET

959      *  Remove hopeless attempt to encode "remove trailing nulls" rule in
960         CDDL (which is not a transformation language).

962      Changes from -06 to -07

964      *  More explicitly discuss constraints (Section 2), add examples
965         (Section 2.1).

967      *  Make CDDL more explicit about special simple values.

969      *  Lots of gratuitous changes from XML2RFC redefinition of
970         semantics.

972      Changes from -05 to -06

974      *  rework authority:

976         -  split reg-names at dots;
977         -  add optional zone identifiers [RFC6874] to IP addresses

979      Changes from -04 to -05

981      *  Simplify CBOR structure.

983      *  Add implementation status section.

985      Changes from -03 to -04:

987      *  Minor editorial improvements.

989      *  Renamed path.type/path-type to discard.

991      *  Renamed option to section, substructured into items.

993      *  Simplified the table "resolution-variables".

995      *  Use the CBOR structure inspired by Jim Schaad's proposals.

997      Changes from -02 to -03:

999      *  Expanded the set of supported schemes (#3).

1001     *  Specified creation, normalization and comparison (#9).

1003     *  Clarified the default value of the path.type option (#33).

1005     *  Removed the append-relation path.type option (#41).

1007     *  Renumbered the remaining path.types.

1009     *  Renumbered the option numbers.

1011     *  Restructured the document.

1013     *  Minor editorial improvements.

1015     Changes from -01 to -02:

1017     *  Changed the syntax of schemes to exclude upper case characters
1018        (#13).

1020     *  Minor editorial improvements (#34 #37).

1022     Changes from -00 to -01:

1024     *  None.

1026  Acknowledgements

1028     CRIs were developed by Klaus Hartke for use in the Constrained
1029     RESTful Application Language (CoRAL). The current author team is
1030     completing this work with a view to achieving good integration with the
1031     potential use cases, both inside and outside of CoRAL.
1033     Thanks to Christian Amsüss, Thomas Fossati, Ari Keränen, Jim Schaad,
1034     Dave Thaler and Marco Tiloca for helpful comments and discussions
1035     that have shaped the document.

1037  Contributors

1039     Klaus Hartke
1040     Ericsson
1041     Torshamnsgatan 23
1042     SE-16483 Stockholm
1043     Sweden

1045     Email: klaus.hartke@ericsson.com

1047  Authors' Addresses

1049     Carsten Bormann (editor)
1050     Universität Bremen TZI
1051     Postfach 330440
1052     D-28359 Bremen
1053     Germany

1055     Phone: +49-421-218-63921
1056     Email: cabo@tzi.org

1058     Henk Birkholz
1059     Fraunhofer SIT
1060     Rheinstrasse 75
1061     64295 Darmstadt
1062     Germany

1064     Email: henk.birkholz@sit.fraunhofer.de