idnits 2.17.1 draft-ietf-core-href-08.txt: -(3): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(468): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(469): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(470): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(471): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(472): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(473): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(474): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(475): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(476): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(478): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 14 instances of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (7 November 2021) is 900 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 542 -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode' -- Obsolete informational reference (is this intentional?): RFC 7230 (Obsoleted by RFC 9110, RFC 9112) Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 CoRE Working Group C. Bormann, Ed. 3 Internet-Draft Universität Bremen TZI 4 Intended status: Standards Track H. Birkholz 5 Expires: 11 May 2022 Fraunhofer SIT 6 7 November 2021 8 Constrained Resource Identifiers 9 draft-ietf-core-href-08 11 Abstract 13 The Constrained Resource Identifier (CRI) is a complement to the 14 Uniform Resource Identifier (URI) that serializes the URI components 15 in Concise Binary Object Representation (CBOR) instead of a sequence 16 of characters. This simplifies parsing, comparison and reference 17 resolution in environments with severe limitations on processing 18 power, code size, and memory size. 20 Discussion Venues 22 This note is to be removed before publishing as an RFC. 24 Discussion of this document takes place on the Constrained RESTful 25 Environments Working Group mailing list (core@ietf.org), which is 26 archived at https://mailarchive.ietf.org/arch/browse/core/. 
28 Source for this draft and an issue tracker can be found at 29 https://github.com/core-wg/href. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at https://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on 11 May 2022. 48 Copyright Notice 50 Copyright (c) 2021 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 55 license-info) in effect on the date of publication of this document. 56 Please review these documents carefully, as they describe your rights 57 and restrictions with respect to this document. Code Components 58 extracted from this document must include Simplified BSD License text 59 as described in Section 4.e of the Trust Legal Provisions and are 60 provided without warranty as described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 65 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 66 2. Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 4 67 2.1. Constraints by example . . . . . . . . . . . . . . . . . 5 68 2.2. Constraints not expressed by the data model . . . . . . . 6 69 3. Creation and Normalization . . . . . . . . . . . . . . . . . 7 70 4. Comparison . . . . . . . . . . . . . . . . . . . . . . . . . 
8 71 5. CRI References . . . . . . . . . . . . . . . . . . . . . . . 8 72 5.1. CBOR Serialization . . . . . . . . . . . . . . . . . . . 9 73 5.2. Ingesting and encoding a CRI Reference . . . . . . . . . 12 74 5.3. Reference Resolution . . . . . . . . . . . . . . . . . . 12 75 6. Relationship between CRIs, URIs and IRIs . . . . . . . . . . 13 76 6.1. Converting CRIs to URIs . . . . . . . . . . . . . . . . . 14 77 7. Extended CRI: Accommodating Percent Encoding . . . . . . . . 16 78 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 17 79 9. Security Considerations . . . . . . . . . . . . . . . . . . . 17 80 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 81 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 82 11.1. Normative References . . . . . . . . . . . . . . . . . . 18 83 11.2. Informative References . . . . . . . . . . . . . . . . . 18 84 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 19 85 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 20 86 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 21 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21 89 1. Introduction 91 The Uniform Resource Identifier (URI) [RFC3986] and its most common 92 usage, the URI reference, are the Internet standard for linking to 93 resources in hypertext formats such as HTML [W3C.REC-html52-20171214] 94 or the HTTP "Link" header field [RFC8288]. 96 A URI reference is a sequence of characters chosen from the 97 repertoire of US-ASCII characters. The individual components of a 98 URI reference are delimited by a number of reserved characters, which 99 necessitates the use of a character escape mechanism called "percent- 100 encoding" when these reserved characters are used in a non-delimiting 101 function. 
The resolution of URI references involves parsing a 102 character sequence into its components, combining those components 103 with the components of a base URI, merging path components, removing 104 dot-segments, and recomposing the result back into a character 105 sequence. 107 Overall, the proper handling of URI references is quite intricate. 108 This can be a problem especially in constrained environments 109 [RFC7228], where nodes often have severe code size and memory size 110 limitations. As a result, many implementations in such environments 111 support only an ad-hoc, informally-specified, bug-ridden, non- 112 interoperable subset of half of RFC 3986. 114 This document defines the _Constrained Resource Identifier (CRI)_ by 115 constraining URIs to a simplified subset and serializing their 116 components in Concise Binary Object Representation (CBOR) [RFC8949] 117 instead of a sequence of characters. This allows typical operations 118 on URI references such as parsing, comparison and reference 119 resolution (including all corner cases) to be implemented in a 120 comparatively small amount of code. 122 As a result of simplification, however, CRIs are not capable of 123 expressing all URIs permitted by the generic syntax of RFC 3986 124 (hence the "constrained" in "Constrained Resource Identifier"). The 125 supported subset includes all URIs of the Constrained Application 126 Protocol (CoAP) [RFC7252], most URIs of the Hypertext Transfer 127 Protocol (HTTP) [RFC7230], Uniform Resource Names (URNs) [RFC8141], 128 and other similar URIs. The exact constraints are defined in 129 Section 2. 131 1.1. Notational Conventions 133 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 134 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 135 "OPTIONAL" in this document are to be interpreted as described in 136 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 137 capitals, as shown here. 
139 In this specification, the term "byte" is used in its now customary 140 sense as a synonym for "octet". 142 Terms defined in this document appear in _cursive_ where they are 143 introduced (rendered in plain text as the new term surrounded by 144 underscores). 146 2. Constraints 148 A Constrained Resource Identifier consists of the same five 149 components as a URI: scheme, authority, path, query, and fragment. 150 The components are subject to the following constraints: 152 C1. The scheme name can be any Unicode string (see Definition D80 153 in [Unicode]) that matches the syntax of a URI scheme (see 154 Section 3.1 of [RFC3986], which constrains schemes to ASCII) 155 and is lowercase (see Definition D139 in [Unicode]). The 156 scheme is always present. 158 C2. An authority is always a host identified by an IP address or 159 registered name, along with optional port information. User 160 information is not supported. 162 Alternatively, the authority can be absent; the two cases for 163 this defined in Section 3.3 of [RFC3986] are modeled by two 164 different values used in place of an absent authority: 166 * the path can begin with a root ("/", as when the authority 167 is present), or 169 * the path can be rootless. 171 C3. An IP address can be either an IPv4 address or an IPv6 address, 172 optionally with a zone identifier [RFC6874]. Future versions 173 of IP are not supported. 175 C4. A registered name is a sequence of one or more _labels_, which, 176 when joined with dots (".") in between them, result in a 177 Unicode string that is lowercase and in Unicode Normalization 178 Form C (NFC) (see Definition D120 in [Unicode]). (The syntax 179 may be further restricted by the scheme.) 181 C5. A port is always an integer in the range from 0 to 65535. 182 Ports outside this range, empty ports (port subcomponents with 183 no digits, see Section 3.2.3 of [RFC3986]), or ports with 184 redundant leading zeros, are not supported. 186 C6. 
The port is omitted if and only if the port would be the same 187 as the scheme's default port (provided the scheme is defining 188 such a default port) or the scheme is not using ports. 190 C7. A path consists of zero or more path segments. A path must not 191 consist of a single zero-length path segment, which is 192 considered equivalent to a path of zero path segments. 194 C8. A path segment can be any Unicode string that is in NFC, with 195 the exception of the special "." and ".." complete path 196 segments. It can be the zero-length string. No special 197 constraints are placed on the first path segment. 199 C9. A query always consists of one or more query parameters. A 200 query parameter can be any Unicode string that is in NFC. It 201 is often in the form of a "key=value" pair. When converting a 202 CRI to a URI, query parameters are separated by an ampersand 203 ("&") character. (This matches the structure and encoding of 204 the target URI in CoAP requests.) Queries are optional; there 205 is a difference between an absent query and a single query 206 parameter that is the empty string. 208 C10. A fragment identifier can be any Unicode string that is in NFC. 209 Fragment identifiers are optional; there is a difference 210 between an absent fragment identifier and a fragment identifier 211 that is the empty string. 213 C11. The syntax of registered names, path segments, query 214 parameters, and fragment identifiers may be further restricted 215 and sub-structured by the scheme. There is no support, 216 however, for escaping sub-delimiters that are not intended to 217 be used in a delimiting function. 219 C12. When converting a CRI to a URI, any character that is outside 220 the allowed character range or is a delimiter in the URI syntax 221 is percent-encoded. 
For CRIs, percent-encoding always uses the
222         UTF-8 encoding form (see Definition D92 in [Unicode]) to
223         convert the character to a sequence of bytes (that is then
224         converted to a sequence of %HH triplets).

226 2.1.  Constraints by example

228    While most URIs in everyday use can be converted to CRIs and back to
229    URIs matching the input after syntax-based normalization of the URI,
230    the following URIs cannot; they illustrate the constraints by example:

232    *  https://host%ffname, https://example.com/x?data=%ff

234       All URI components must, after percent decoding, be valid UTF-8
235       encoded text.  Bytes that are not valid UTF-8 show up, for
236       example, in BitTorrent web seeds.

238    *  https://example.com/component%3bone;component%3btwo,
239       http://example.com/component%3dequals

241       While delimiters can be used in an escaped and unescaped form in
242       URIs with generally distinct meanings, CRIs only support one
243       escapable delimiter character per component, which is the
244       delimiter by which the component is split up in the CRI.

246       Note that the separators . (for authority parts), / (for paths), &
247       (for query parameters) are special in that they are syntactic
248       delimiters of their respective components in CRIs.  Thus, the
249       following examples _are_ convertible to CRIs:

251       https://interior%2edot/

253       https://example.com/path%2fcomponent/second-component

255       https://example.com/x?ampersand=%26&questionmark=?

257    *  https://alice@example.com/

259       The user information cannot be expressed in CRIs.

261    *  URIs with an authority but a completely empty path (e.g.,
262       http://example.com)

264       CRIs with an authority component always produce at least a slash
265       in the path component.

267       For generic schemes, the conversion of scheme://example.com to a
268       CRI is impossible because, under the rules of Section 6.1, no CRI
269       produces a URI with an authority not followed by a slash.  Most
270       schemes do not distinguish between the empty path and the path
271       containing a single slash when an authority is set (as recommended
272       in [RFC3986]).  For these schemes, that equivalence allows
273       converting even the slash-less URI to a CRI (which, when converted
274       back, produces a slash after the authority).

276 2.2.  Constraints not expressed by the data model

278    There are syntactically valid CRIs and CRI references that cannot be
279    converted into a URI or URI reference, respectively.

281    For CRI references, this is acceptable -- they can still be resolved
282    and result in a valid CRI that can be converted back.  (An example of
283    this is [0, ["p"]] which appends a slash and the path segment "p" to
284    its base).

286    (Full) CRIs that do not correspond to a valid URI are not valid on
287    their own, and cannot be used.  Normatively, they are characterized
288    by the Section 6.1 process not producing a valid and syntax-normalized
289    URI.  For easier understanding, they are listed here:

291    *  CRIs (and CRI references) containing a path component "." or "..".

293       These would be removed by the remove_dot_segments algorithm of
294       [RFC3986], and thus never produce a normalized URI after
295       resolution.

297       (In CRI references, the discard value is used to afford segment
298       removal, and with "." being an unreserved character, expressing
299       them as "%2e" and "%2e%2e" is not even viable, let alone
300       practical).

302    *  CRIs without authority whose path starts with two or more empty
303       segments.

305       When converted to URIs, these would violate the requirement that
306       in absence of an authority, a URI's path cannot begin with two
307       slash characters, and they would be indistinguishable from a URI
308       with a shorter path and a present but empty authority component.

310 3.  Creation and Normalization

312    In general, resource identifiers are created on the initial creation
313    of a resource with a certain resource identifier, or the initial
314    exposition of a resource under a particular resource identifier.

316    A Constrained Resource Identifier SHOULD be created by the naming
317    authority that governs the namespace of the resource identifier (see
318    also [RFC8820]).  For example, for the resources of an HTTP origin
319    server, that server is responsible for creating the CRIs for those
320    resources.

322    The naming authority MUST ensure that any CRI created satisfies the
323    constraints defined in Section 2.  The creation of a CRI fails if the
324    CRI cannot be validated to satisfy all of the constraints.

326    If a naming authority creates a CRI from user input, it MAY apply the
327    following (and only the following) normalizations to make the CRI more
328    likely to validate:

330    *  map the scheme name to lowercase (C1);

332    *  map the registered name to NFC (C4) and split it on embedded dots;

333    *  elide the port if it is the default port for the scheme (C6);

335    *  elide a single zero-length path segment (C7);

337    *  map path segments, query parameters and the fragment identifier to
338       NFC (C8, C9, C10).

340    Once a CRI has been created, it can be used and transferred without
341    further normalization.  All operations that operate on a CRI SHOULD
342    rely on the assumption that the CRI is appropriately pre-normalized.
343    (This does not contradict the requirement that when CRIs are
344    transferred, recipients must operate on as-good-as untrusted input
345    and fail gracefully in the face of malicious inputs.)

347 4.  Comparison

349    One of the most common operations on CRIs is comparison: determining
350    whether two CRIs are equivalent, without dereferencing the CRIs
351    (using them to access their respective resource(s)).

353    Determination of equivalence or difference of CRIs is based on simple
354    component-wise comparison.
If two CRIs are identical component-by- 355 component (using code-point-by-code-point comparison for components 356 that are Unicode strings) then it is safe to conclude that they are 357 equivalent. 359 This comparison mechanism is designed to minimize false negatives 360 while strictly avoiding false positives. The constraints defined in 361 Section 2 imply the most common forms of syntax- and scheme-based 362 normalizations in URIs, but do not comprise protocol-based 363 normalizations that require accessing the resources or detailed 364 knowledge of the scheme's dereference algorithm. False negatives can 365 be caused, for example, by CRIs that are not appropriately pre- 366 normalized and by resource aliases. 368 When CRIs are compared to select (or avoid) a network action, such as 369 retrieval of a representation, fragment components (if any) should be 370 excluded from the comparison. 372 5. CRI References 374 The most common usage of a Constrained Resource Identifier is to 375 embed it in resource representations, e.g., to express a hyperlink 376 between the represented resource and the resource identified by the 377 CRI. 379 This section defines the serialization of CRIs in Concise Binary 380 Object Representation (CBOR) [RFC8949]. To reduce representation 381 size, CRIs are not serialized directly. Instead, CRIs are indirectly 382 referenced through _CRI references_. These take advantage of 383 hierarchical locality and provide a very compact encoding. The CBOR 384 serialization of CRI references is specified in Section 5.1. 386 The only operation defined on a CRI reference is _reference 387 resolution_: the act of transforming a CRI reference into a CRI. An 388 application MUST implement this operation by applying the algorithm 389 specified in Section 5.3 (or any algorithm that is functionally 390 equivalent to it). 
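The resolution algorithm of Section 5.3 can be sketched in a few lines. The following Python is only an illustration under stated assumptions: the class name CriRef, the function name resolve, and the modeling of the abstract form as a dataclass (with None standing for null) are hypothetical and not part of this specification. It follows the six steps literally and performs no validation of its inputs.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class CriRef:
    """Abstract form of a CRI reference (hypothetical modeling; None means null)."""
    scheme: Optional[Union[str, int]] = None
    authority: object = None            # None, True, or e.g. a (host, port) tuple
    discard: Union[bool, int] = True
    path: Optional[list] = None
    query: Optional[list] = None
    fragment: Optional[str] = None


def resolve(base: CriRef, ref: CriRef) -> CriRef:
    """Resolve `ref` against the absolute `base`, following Section 5.3."""
    # Step 2: initialize a buffer with the sections from the base CRI.
    scheme, authority = base.scheme, base.authority
    path = list(base.path or [])
    query, fragment = base.query, base.fragment

    # Step 3: apply discard. `is True` is tested first because bool is an int in Python.
    if ref.discard is True:
        path, query, fragment = [], None, None
        if authority is True:
            authority = None
    else:
        path = path[:max(0, len(path) - ref.discard)]
        if ref.discard != 0:
            query, fragment = None, None

    # Step 4: append the path elements of the reference, if any.
    if ref.path is not None:
        path = path + list(ref.path)
        query, fragment = None, None

    # Step 5: copy the remaining non-null sections in sequence.
    if ref.scheme is not None:
        scheme = ref.scheme
    if ref.authority is not None:
        authority = ref.authority
    if ref.query is not None:
        query, fragment = list(ref.query), None
    if ref.fragment is not None:
        fragment = ref.fragment

    # Step 6: the buffer is the resolved CRI (discard is true in the buffer).
    return CriRef(scheme, authority, True, path, query, fragment)


# Usage: discard one trailing segment of the base path, then append "c".
base = CriRef("coap", (("example", "com"), None), True, ["a", "b"], ["x=1"])
resolved = resolve(base, CriRef(discard=1, path=["c"]))
```

In this usage example the resolved path is ["a", "c"] and the base query is dropped, which mirrors resolving the URI reference "c" against a base path of "/a/b".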
392    The reverse operation of transforming a CRI into a CRI reference is
393    unspecified; implementations are free to use any algorithm as long as
394    reference resolution of the resulting CRI reference yields the
395    original CRI.  Notably, a CRI reference is not required to satisfy
396    all of the constraints of a CRI; the only requirement on a CRI
397    reference is that reference resolution MUST yield the original CRI.

399    When testing for equivalence or difference, applications SHOULD NOT
400    directly compare CRI references; the references should be resolved to
401    their respective CRI before comparison.

403 5.1.  CBOR Serialization

405    A CRI reference is encoded as a CBOR array [RFC8949], with the
406    structure as described in the Concise Data Definition Language (CDDL)
407    [RFC8610] as follows:

409    ; not expressed in this CDDL spec: trailing nulls to be left off

411    CRI-Reference = [
412      ((scheme / null, authority / no-authority)
413       // discard),                 ; relative reference
414      path / null,
415      query / null,
416      fragment / null
417    ]

419    scheme      = scheme-name / scheme-id
420    scheme-name = text .regexp "[a-z][a-z0-9+.-]*"
421    scheme-id   = (COAP / COAPS / HTTP / HTTPS / other-scheme)
422                  .within nint
423    COAP = -1 COAPS = -2 HTTP = -3 HTTPS = -4 URN = -5 DID = -6
424    other-scheme = nint .feature "scheme-id-extension"

426    no-authority = NOAUTH-NOSLASH / NOAUTH-LEADINGSLASH
427    NOAUTH-LEADINGSLASH = null
428    NOAUTH-NOSLASH = true

430    authority   = [host, ?port]
431    host        = (host-name // host-ip)
432    host-name   = (*text) ; lowercase, NFC labels
433    host-ip     = (bytes .size 4 //
434                   (bytes .size 16, ?zone-id))
435    zone-id     = text
436    port        = 0..65535

438    discard     = DISCARD-ALL / 0..127
439    DISCARD-ALL = true
440    path        = [*text]
441    query       = [*text]
442    fragment    = text

444    This CDDL specification is simplified for exposition and needs to be
445    augmented by the following rule for interchange: Trailing null values
446    are removed, and two leading null values (scheme and authority both
447    not given) are represented by using the discard alternative instead.

449    The rules scheme, authority, path, query, fragment correspond to the
450    (sub-)components of a CRI, as described in Section 2, with the
451    addition of the discard section.  The discard section can be used
452    when neither a scheme nor an authority is present.  It then expresses
453    path prefixes such as "/", "./", "../", "../../", etc.  The exact
454    semantics of the section values are defined by Section 5.3.

456    Most URI references that Section 4.2 of [RFC3986] calls "relative
457    references" (i.e., references that need to undergo a resolution
458    process to obtain a URI) correspond to the CRI form that starts with
459    discard.  The exceptions are relative references with an authority
460    (each called a "network-path reference" in Section 4.2 of [RFC3986]),
461    which in CRI references never carry a discard section (the value of
462    discard defaults to true).

464    |  The structure of a CRI is visualized using the somewhat limited
465    |  means of a railroad diagram below.
466    |
467    |  cri-reference:
468    |      ╭──────────────────────────────────────>───────────────────────────────────────╮
469    |      │                                                                              │
470    |      │                               ╭─────────────────────>─────────────────────╮  │
471    |      │                               │                                           │  │
472    |      │                               │          ╭──────────────>──────────────╮  │  │
473    |      │                               │          │                             │  │  │
474    |      │                               │          │           ╭──────>───────╮  │  │  │
475    |      │                               │          │           │              │  │  │  │
476    |  │├──╯──╮── scheme ── authority ──╭──╯── path ──╯── query ──╯── fragment ──╰──╰──╰──╰──┤│
477    |         │                         │
478    |         ╰──────── discard ────────╯
479    |
480    |  This visualization does not go into the details of the
481    |  elements.
483    Examples:

485    [-1,             / scheme -- equivalent to "coap" /
486     [h'C6336401',   / host /
487      61616],        / port /
488     [".well-known", / path /
489      "core"]
490    ]

492    [true,                  / discard /
493     [".well-known",        / path /
494      "core"],
495     ["rt=temperature-c"]]  / query /

497    [-6,                / scheme -- equivalent to "did" /
498     true,              / authority = NOAUTH-NOSLASH /
499     ["web:alice:bob"]  / path /
500    ]

502    A CRI reference is considered _well-formed_ if it matches the CDDL
503    structure.

505    A CRI reference is considered _absolute_ if it is well-formed and the
506    sequence of sections starts with a non-null scheme.

508    A CRI reference is considered _relative_ if it is well-formed and the
509    sequence of sections is empty or starts with a section other than
510    those that would constitute a scheme.

512 5.2.  Ingesting and encoding a CRI Reference

514    From an abstract point of view, a CRI Reference is a data structure
515    with six sections:

517    scheme, authority, discard, path, query, fragment

519    Each of these sections can be unset ("null"), except for discard,
520    which is always an unsigned number or true.  If scheme and/or
521    authority are non-null, discard must be true.

523    When ingesting a CRI Reference that is in the transfer form, those
524    sections are filled in from the transfer form (unset sections are
525    filled with null), and the following steps are performed:

527    *  If the array is entirely empty, replace it with [0].

529    *  If discard is present in the transfer form (i.e., the outer array
530       starts with true or an unsigned number), set scheme and authority
531       to null.

533    *  If scheme and/or authority are present in the transfer form (i.e.,
534       the outer array starts with null, a text string, or a negative
535       integer), set discard to true.

537    Upon encoding the abstract form into the transfer form, the inverse
538    processing is performed: If scheme and/or authority are not null, the
539    discard value is not transferred (it must be true in this case).  If
540    they are both null, they are both left out and only discard is
541    transferred.  Trailing null values are removed from the array.  As a
542    special case, an empty array is sent in place of a remaining [0]
543    (URI "").

545 5.3.  Reference Resolution

547    The term "relative" implies that a "base CRI" exists against which
548    the relative reference is applied.  Aside from fragment-only
549    references, relative references are only usable when a base CRI is
550    known.

552    The following steps define the process of resolving any well-formed
553    CRI reference against a base CRI so that the result is a CRI in the
554    form of an absolute CRI reference:

556    1.  Establish the base CRI of the CRI reference and express it in the
557        form of an abstract absolute CRI reference.

559    2.  Initialize a buffer with the sections from the base CRI.

561    3.  If the value of discard is true in the CRI reference, replace the
562        path in the buffer with the empty array, unset query and
563        fragment, and set a true authority to null.  If the value of
564        discard is an unsigned number, remove that many elements from the
565        end of the path array; if it is non-zero, unset query and
566        fragment.  Set discard to true in the buffer.

568    4.  If the path section is set in the CRI reference, append all
569        elements from the path array to the array in the path section in
570        the buffer; unset query and fragment.

572    5.  Apart from the path and discard, copy all non-null sections from
573        the CRI reference to the buffer in sequence; unset fragment if
574        query is non-null and thus copied.

576    6.  Return the sections in the buffer as the resolved CRI.

578 6.  Relationship between CRIs, URIs and IRIs

580    CRIs are meant to replace both Uniform Resource Identifiers (URIs)
581    [RFC3986] and Internationalized Resource Identifiers (IRIs) [RFC3987]
582    in constrained environments [RFC7228].
Applications in these 583 environments may never need to use URIs and IRIs directly, especially 584 when the resource identifier is used simply for identification 585 purposes or when the CRI can be directly converted into a CoAP 586 request. 588 However, it may be necessary in other environments to determine the 589 associated URI or IRI of a CRI, and vice versa. Applications can 590 perform these conversions as follows: 592 CRI to URI 593 A CRI is converted to a URI as specified in Section 6.1. 595 URI to CRI 596 The method of converting a URI to a CRI is unspecified; 597 implementations are free to use any algorithm as long as 598 converting the resulting CRI back to a URI yields an equivalent 599 URI. 601 CRI to IRI 602 A CRI can be converted to an IRI by first converting it to a URI 603 as specified in Section 6.1, and then converting the URI to an IRI 604 as described in Section 3.2 of [RFC3987]. 606 IRI to CRI 607 An IRI can be converted to a CRI by first converting it to a URI 608 as described in Section 3.1 of [RFC3987], and then converting the 609 URI to a CRI as described above. 611 Everything in this section also applies to CRI references, URI 612 references and IRI references. 614 6.1. Converting CRIs to URIs 616 Applications MUST convert a CRI reference to a URI reference by 617 determining the components of the URI reference according to the 618 following steps and then recomposing the components to a URI 619 reference string as specified in Section 5.3 of [RFC3986]. 621 scheme 622 If the CRI reference contains a scheme section, the scheme 623 component of the URI reference consists of the value of that 624 section. Otherwise, the scheme component is unset. 626 authority 627 If the CRI reference contains a host-name or host-ip item, the 628 authority component of the URI reference consists of a host 629 subcomponent, optionally followed by a colon (":") character and a 630 port subcomponent. Otherwise, the authority component is unset. 
632 The host subcomponent consists of the value of the host-name or 633 host-ip item. 635 The host-name is turned into a single string by joining the 636 elements separated by dots ("."). Any character in the value of a 637 host-name item that is not in the set of unreserved characters 638 (Section 2.3 of [RFC3986]) or "sub-delims" (Section 2.2 of 639 [RFC3986]) MUST be percent-encoded. 641 The value of a host-ip item MUST be represented as a string that 642 matches the "IPv4address" or "IP-literal" rule (Section 3.2.2 of 643 [RFC3986]). Any zone-id is appended to the string, separated by 644 "%25" as defined in Section 2 of [RFC6874], or as specified in a 645 successor zone-id specification document; this also leads to a 646 modified "IP-literal" rule as specified in these documents. 648 If the CRI reference contains a port item, the port subcomponent 649 consists of the value of that item in decimal notation. 650 Otherwise, the colon (":") character and the port subcomponent are 651 both omitted. 653 path 654 If the CRI reference contains a discard item of value true, the 655 path component is prefixed by a slash ("/") character. If it 656 contains a discard item of value 0 and the path item is present, 657 the conversion fails. Otherwise, the path component is prefixed 658 by as many "../" components as the discard value minus one 659 indicates. 661 If the discard item is not present and the CRI reference contains 662 an authority that is true, the path component of the URI reference 663 is prefixed by the zero-length string. Otherwise, the path 664 component is prefixed by a slash ("/") character. 666 If the CRI reference contains one or more path items, the prefix 667 is followed by the value of each item, separated by a slash ("/") 668 character. 670 Any character in the value of a path item that is not in the set 671 of unreserved characters or "sub-delims" or a colon (":") or 672 commercial at ("@") character MUST be percent-encoded. 
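The percent-encoding rule for path items stated above can be illustrated with the Python standard library. This is a minimal sketch, not a normative procedure: the names PATH_SAFE and encode_path_segment are hypothetical, and the sketch covers only the character-level rule, not the path checks against the RFC 3986 rules that follow. It relies on urllib.parse.quote, which always leaves ASCII letters, digits, and "_.-~" unencoded (the unreserved set) and encodes everything else as UTF-8 %HH triplets unless listed in the safe argument.

```python
from urllib.parse import quote

# "sub-delims" from Section 2.2 of RFC 3986, plus colon and commercial at.
PATH_SAFE = "!$&'()*+,;=" + ":@"


def encode_path_segment(segment: str) -> str:
    """Percent-encode one CRI path item for use in a URI path (illustrative only)."""
    # A slash inside a segment is not in the allowed set, so it becomes %2F
    # and cannot be mistaken for the segment delimiter.
    return quote(segment, safe=PATH_SAFE, encoding="utf-8")


segments = ["3/4-inch", "caf\u00e9"]
uri_path = "/" + "/".join(encode_path_segment(s) for s in segments)
```

Each segment is encoded on its own and only then joined with slashes, so that a slash carried inside a segment stays distinguishable from the delimiter.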

      If the authority component is present (not null or true) and the
      path component does not match the "path-abempty" rule (Section 3.3
      of [RFC3986]), the conversion fails.

      If the authority component is not present, but the scheme
      component is, and the path component does not match the
      "path-absolute", "path-rootless" (authority == true) or
      "path-empty" rule (Section 3.3 of [RFC3986]), the conversion
      fails.

      If neither the authority component nor the scheme component is
      present, and the path component does not match the
      "path-absolute", "path-noscheme" or "path-empty" rule (Section 3.3
      of [RFC3986]), the conversion fails.

   query
      If the CRI reference contains one or more query items, the query
      component of the URI reference consists of the value of each item,
      separated by an ampersand ("&") character. Otherwise, the query
      component is unset.

      Any character in the value of a query item that is not in the set
      of unreserved characters or "sub-delims" or a colon (":"),
      commercial at ("@"), slash ("/") or question mark ("?") character
      MUST be percent-encoded. Additionally, any ampersand character
      ("&") in the item value MUST be percent-encoded.

   fragment
      If the CRI reference contains a fragment item, the fragment
      component of the URI reference consists of the value of that item.
      Otherwise, the fragment component is unset.

      Any character in the value of a fragment item that is not in the
      set of unreserved characters or "sub-delims" or a colon (":"),
      commercial at ("@"), slash ("/") or question mark ("?") character
      MUST be percent-encoded.

7. Extended CRI: Accommodating Percent Encoding

   CRIs have been designed to relieve implementations operating on CRIs
   from string scanning, which helps both constrained implementations
   and implementations that need to achieve high throughput.
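
   As an illustration of this design, the conversion steps of
   Section 6.1 can be carried out without scanning any string for
   delimiters: the items are already separated, so they only need to be
   escaped and joined. The following Python sketch is not part of this
   specification. The names pct_encode and cri_to_uri and their
   parameters are hypothetical. The sketch assumes that the scheme is
   already available as text, that the authority is a host-name (no
   host-ip, no discard item), that characters outside ASCII are encoded
   as UTF-8 before percent-encoding, and that uppercase hexadecimal
   digits are used.

```python
import string

# Character sets from RFC 3986, Sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# Characters that may stay unescaped in each component (Section 6.1).
HOST_OK = UNRESERVED | SUB_DELIMS
PATH_OK = HOST_OK | set(":@")
FRAGMENT_OK = PATH_OK | set("/?")
QUERY_OK = FRAGMENT_OK - {"&"}  # "&" separates query items


def pct_encode(text, allowed):
    """Percent-encode every character of text that is not in allowed.

    Assumption: the character is encoded as UTF-8 first.
    """
    out = []
    for ch in text:
        if ch in allowed:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def cri_to_uri(scheme, host_name, port=None, path=(), query=(),
               fragment=None):
    """Recompose a URI from already separated CRI items.

    Simplified: host-name authority only, no discard item.
    """
    uri = scheme + "://"
    uri += ".".join(pct_encode(label, HOST_OK) for label in host_name)
    if port is not None:
        uri += ":" + str(port)  # decimal notation
    # With a host-name authority the path prefix is a slash.
    uri += "/" + "/".join(pct_encode(seg, PATH_OK) for seg in path)
    if query:
        uri += "?" + "&".join(pct_encode(q, QUERY_OK) for q in query)
    if fragment is not None:
        uri += "#" + pct_encode(fragment, FRAGMENT_OK)
    return uri


print(cri_to_uri("https", ["alice"], path=["3/4-inch"]))
```

   In this sketch, a slash inside a path item is escaped because it is
   not in the allowed set for path items, so the single item "3/4-inch"
   stays a single path segment in the resulting URI.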

   Basic CRI does not support URI components that _require_
   percent-encoding (Section 2.1 of [RFC3986]) to represent them in the
   URI syntax, except where that percent-encoding is used to escape the
   main delimiter in use.

   E.g., the URI

      https://alice/3%2f4-inch

   is represented by the basic CRI

      [-4, ["alice"], ["3/4-inch"]]

   However, percent-encoding that is used at the application level is
   not supported by basic CRIs:

      did:web:alice:7%3A1-balun

   This section presents a method to represent percent-encoded segments
   of hostnames, paths, and queries.

   The three CDDL rules

      host-name = (*text)
      path = [*text]
      query = [*text]

   are replaced with

      host-name = (*text-or-pet)
      path = [*text-or-pet]
      query = [*text-or-pet]

      text-or-pet = text /
          ([*(text, pet), ?text]) .feature "extended-cri"

      ; pet is percent-encoded text
      pet = text

   That is, for each segment of a host-name, path, or query, an
   alternate representation is provided: an array of text strings, the
   even-numbered ones of which (counting from zero) are normal text
   strings, while the odd-numbered ones are text strings that retain the
   special semantics of percent-encoded text without actually being
   percent-encoded.

   The above DID URI can now be represented as:

      [-6, true, [["web:alice:7", ":", "1-balun"]]]

8. Implementation Status

   With the exception of the authority=true fix and host-names split
   into labels, CRIs are implemented in
   https://gitlab.com/chrysn/micrurus.

9. Security Considerations

   Parsers of CRI references must operate on input that is assumed to be
   untrusted. This means that parsers MUST fail gracefully in the face
   of malicious inputs. Additionally, parsers MUST be prepared to deal
   with resource exhaustion (e.g., resulting from the allocation of big
   data items) or exhaustion of the call stack (stack overflow).
   See Section 10 of [RFC8949] for additional security considerations
   relating to CBOR.

   The security considerations discussed in Section 7 of [RFC3986] and
   Section 8 of [RFC3987] for URIs and IRIs also apply to CRIs.

10. IANA Considerations

   This document has no IANA actions.

11. References

11.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
              January 2005.

   [RFC6874]  Carpenter, B., Cheshire, S., and R. Hinden, "Representing
              IPv6 Zone Identifiers in Address Literals and Uniform
              Resource Identifiers", RFC 6874, DOI 10.17487/RFC6874,
              February 2013.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019.

   [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020.

   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
              13.0.0", ISBN 978-1-936213-26-9, March 2020.

11.2. Informative References

   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
              Constrained-Node Networks", RFC 7228,
              DOI 10.17487/RFC7228, May 2014.

   [RFC7230]  Fielding, R., Ed. and J.
              Reschke, Ed., "Hypertext Transfer Protocol (HTTP/1.1):
              Message Syntax and Routing", RFC 7230,
              DOI 10.17487/RFC7230, June 2014.

   [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained
              Application Protocol (CoAP)", RFC 7252,
              DOI 10.17487/RFC7252, June 2014.

   [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
              (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017.

   [RFC8288]  Nottingham, M., "Web Linking", RFC 8288,
              DOI 10.17487/RFC8288, October 2017.

   [RFC8820]  Nottingham, M., "URI Design and Ownership", BCP 190,
              RFC 8820, DOI 10.17487/RFC8820, June 2020.

   [W3C.REC-html52-20171214]
              Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., and
              S. Moon, "HTML 5.2", World Wide Web Consortium
              Recommendation REC-html52-20171214, 14 December 2017.

Appendix A. Change Log

   This section is to be removed before publishing as an RFC.

   Changes from -05 to -06

   * rework authority:

     - split reg-names at dots;

     - add optional zone identifiers [RFC6874] to IP addresses

   Changes from -04 to -05

   * Simplify CBOR structure.

   * Add implementation status section.

   Changes from -03 to -04:

   * Minor editorial improvements.

   * Renamed path.type/path-type to discard.

   * Renamed option to section, substructured into items.

   * Simplified the table "resolution-variables".

   * Use the CBOR structure inspired by Jim Schaad's proposals.

   Changes from -02 to -03:

   * Expanded the set of supported schemes (#3).

   * Specified creation, normalization and comparison (#9).

   * Clarified the default value of the path.type option (#33).

   * Removed the append-relation path.type option (#41).

   * Renumbered the remaining path.types.

   * Renumbered the option numbers.

   * Restructured the document.

   * Minor editorial improvements.

   Changes from -01 to -02:

   * Changed the syntax of schemes to exclude upper case characters
     (#13).

   * Minor editorial improvements (#34 #37).

   Changes from -00 to -01:

   * None.

Acknowledgements

   CRIs were developed by Klaus Hartke for use in the Constrained
   RESTful Application Language (CoRAL). The current author team is
   completing this work with a view to achieving good integration with
   the potential use cases, both inside and outside of CoRAL.

   Thanks to Christian Amsüss, Ari Keränen, Jim Schaad and Dave Thaler
   for helpful comments and discussions that have shaped the document.

Contributors

   Klaus Hartke
   Ericsson
   Torshamnsgatan 23
   SE-16483 Stockholm
   Sweden

   Email: klaus.hartke@ericsson.com

Authors' Addresses

   Carsten Bormann (editor)
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org

   Henk Birkholz
   Fraunhofer SIT
   Rheinstrasse 75
   64295 Darmstadt
   Germany

   Email: henk.birkholz@sit.fraunhofer.de