CoRE Working Group                                       C. Bormann, Ed.
Internet-Draft                                    Universität Bremen TZI
Intended status: Standards Track                             H. Birkholz
Expires: 8 September 2022                                 Fraunhofer SIT
                                                            7 March 2022


                    Constrained Resource Identifiers
                        draft-ietf-core-href-10

Abstract

   The Constrained Resource Identifier (CRI) is a complement to the
   Uniform Resource Identifier (URI) that serializes the URI components
   in Concise Binary Object Representation (CBOR) instead of a sequence
   of characters.  This simplifies parsing, comparison and reference
   resolution in environments with severe limitations on processing
   power, code size, and memory size.

   The present revision -10 of this draft contains an experimental
   addition that allows representing user information
   (https://alice@chains.example) in the URI authority component.
   This feature lacks test vectors and implementation experience at the
   time of writing and requires discussion.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-ietf-core-href/.

   Discussion of this document takes place on the Constrained RESTful
   Environments Working Group mailing list (mailto:core@ietf.org), which
   is archived at https://mailarchive.ietf.org/arch/browse/core/.

   Source for this draft and an issue tracker can be found at
   https://github.com/core-wg/href.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 September 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Revised BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Constraints
     2.1.  Constraints not expressed by the data model
   3.  Creation and Normalization
   4.  Comparison
   5.  CRI References
     5.1.  CBOR Serialization
       5.1.1.  The discard Section
       5.1.2.  Visualization
       5.1.3.  Examples
       5.1.4.  Specific Terminology
     5.2.  Ingesting and encoding a CRI Reference
     5.3.  Reference Resolution
   6.  Relationship between CRIs, URIs and IRIs
     6.1.  Converting CRIs to URIs
   7.  Extended CRI: Accommodating Percent Encoding (PET)
   8.  Implementation Status
   9.  Security Considerations
   10. IANA Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  The Small Print
   Appendix B.  Change Log
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   The Uniform Resource Identifier (URI) [RFC3986] and its most common
   usage, the URI reference, are the Internet standard for linking to
   resources in hypertext formats such as HTML [W3C.REC-html52-20171214]
   or the HTTP "Link" header field [RFC8288].

   A URI reference is a sequence of characters chosen from the
   repertoire of US-ASCII characters.  The individual components of a
   URI reference are delimited by a number of reserved characters, which
   necessitates the use of a character escape mechanism called
   "percent-encoding" when these reserved characters are used in a
   non-delimiting function.  The resolution of URI references involves
   parsing a character sequence into its components, combining those
   components with the components of a base URI, merging path
   components, removing dot-segments, and recomposing the result back
   into a character sequence.

   Overall, the proper handling of URI references is quite intricate.
   This can be a problem especially in constrained environments
   [RFC7228], where nodes often have severe code size and memory size
   limitations.  As a result, many implementations in such environments
   support only an ad-hoc, informally-specified, bug-ridden,
   non-interoperable subset of half of RFC 3986.

   This document defines the _Constrained Resource Identifier (CRI)_ by
   constraining URIs to a simplified subset and serializing their
   components in Concise Binary Object Representation (CBOR) [RFC8949]
   instead of a sequence of characters.  This allows typical operations
   on URI references such as parsing, comparison and reference
   resolution (including all corner cases) to be implemented in a
   comparatively small amount of code.

   As a result of simplification, however, CRIs are not capable of
   expressing all URIs permitted by the generic syntax of RFC 3986
   (hence the "constrained" in "Constrained Resource Identifier").  The
   supported subset includes all URIs of the Constrained Application
   Protocol (CoAP) [RFC7252], most URIs of the Hypertext Transfer
   Protocol (HTTP) [RFC7230], Uniform Resource Names (URNs) [RFC8141],
   and other similar URIs.  The exact constraints are defined in
   Section 2.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   In this specification, the term "byte" is used in its now customary
   sense as a synonym for "octet".

   Terms defined in this document appear in _cursive_ where they are
   introduced (rendered in plain text as the new term surrounded by
   underscores).

2.  Constraints

   A Constrained Resource Identifier consists of the same five
   components as a URI: scheme, authority, path, query, and fragment.
   The components are subject to the following constraints:

   C1.   The scheme name can be any Unicode string (see Definition D80
         in [Unicode]) that matches the syntax of a URI scheme (see
         Section 3.1 of [RFC3986], which constrains schemes to ASCII)
         and is lowercase (see Definition D139 in [Unicode]).  The
         scheme is always present.

   C2.   An authority is always a host identified by an IP address or
         registered name, along with optional port information, and
         optionally preceded by user information.

         Alternatively, the authority can be absent; the two cases for
         this defined in Section 3.3 of [RFC3986] are modeled by two
         different values used in place of an absent authority:

         *  the path can begin with a root ("/", as when the authority
            is present), or

         *  the path can be rootless.

         (Note that in Figure 1, no-authority is marked as a feature, as
         not all CRI implementations will support authority-less URIs.)

   C3.   A userinfo is a text string built out of unreserved characters
         (Section 2.3 of [RFC3986]) or "sub-delims" (Section 2.2 of
         [RFC3986]); any other character needs to be percent-encoded
         (Section 7).  Note that this excludes the ":" character, which
         is commonly deprecated as a way to delimit a cleartext password
         in a userinfo.

   C4.   An IP address can be either an IPv4 address or an IPv6 address,
         optionally with a zone identifier [RFC6874].  Future versions
         of IP are not supported (it is likely that a binary mapping
         would be strongly desirable, and that cannot be designed ahead
         of time, so these versions need to be added as a future
         extension if needed).

   C5.   A registered name is a sequence of one or more _labels_, which,
         when joined with dots (".") in between them, result in a
         Unicode string that is lowercase and in Unicode Normalization
         Form C (NFC) (see Definition D120 in [Unicode]).  (The syntax
         may be further restricted by the scheme.  As per Section 3.2.2
         of [RFC3986], a registered name can be empty, for which case a
         scheme can define a default for the host.)

   C6.   A port is always an integer in the range from 0 to 65535.
         Ports outside this range, empty ports (port subcomponents with
         no digits, see Section 3.2.3 of [RFC3986]), or ports with
         redundant leading zeros, are not supported.

   C7.   The port is omitted if and only if the port would be the same
         as the scheme's default port (provided the scheme defines such
         a default port) or the scheme is not using ports.

   C8.   A path consists of zero or more path segments.  Note that a
         path of just a single zero-length path segment is allowed --
         this is considered equivalent to a path of zero path segments
         by HTTP and CoAP, but this equivalence does not hold for CRIs
         in general as they only perform normalization on the
         Syntax-Based Normalization level (Section 6.2.2 of [RFC3986]),
         not on the scheme-specific Scheme-Based Normalization level
         (Section 6.2.3 of [RFC3986]).

         (A CRI implementation may want to offer scheme-cognizant
         interfaces, performing this scheme-specific normalization for
         schemes it knows.  The interface could assert which schemes the
         implementation knows and provide pre-normalized CRIs.  This can
         also relieve the application from removing a lone zero-length
         path segment before putting path segments into CoAP Options,
         i.e., from performing the check and jump in item 8 of
         Section 6.4 of [RFC7252].  See also SP1 in Appendix A.)

   C9.   A path segment can be any Unicode string that is in NFC, with
         the exception of the special "." and ".." complete path
         segments.  Note that this includes the zero-length string.

         If no authority is present in a CRI, the leading path segment
         cannot be empty.  (See also SP1 in Appendix A.)

   C10.  A query always consists of one or more query parameters.  A
         query parameter can be any Unicode string that is in NFC.  It
         is often in the form of a "key=value" pair.  When converting a
         CRI to a URI, query parameters are separated by an ampersand
         ("&") character.  (This matches the structure and encoding of
         the target URI in CoAP requests.)  Queries are optional; there
         is a difference between an absent query and a single query
         parameter that is the empty string.

   C11.  A fragment identifier can be any Unicode string that is in NFC.
         Fragment identifiers are optional; there is a difference
         between an absent fragment identifier and a fragment identifier
         that is the empty string.

   C12.  The syntax of registered names, path segments, query
         parameters, and fragment identifiers may be further restricted
         and sub-structured by the scheme.  There is no support,
         however, for escaping sub-delimiters that are not intended to
         be used in a delimiting function.

   C13.  When converting a CRI to a URI, any character that is outside
         the allowed character range or is a delimiter in the URI syntax
         is percent-encoded.  For CRIs, percent-encoding always uses the
         UTF-8 encoding form (see Definition D92 in [Unicode]) to
         convert the character to a sequence of bytes (that is then
         converted to a sequence of %HH triplets).

   Examples for URIs at or beyond the boundaries of these constraints
   are in SP2 in Appendix A.

2.1.  Constraints not expressed by the data model

   There are syntactically valid CRIs and CRI references that cannot be
   converted into a URI or URI reference, respectively.

   For CRI references, this is acceptable -- they can still be resolved
   and result in a valid CRI that can be converted back.  (An example of
   this is [0, ["p"]] which appends a slash and the path segment "p" to
   its base).

   (Full) CRIs that do not correspond to a valid URI are not valid on
   their own, and cannot be used.  Normatively, a valid CRI is one for
   which the Section 6.1 process produces a valid and syntax-normalized
   URI.  For easier understanding, the invalid cases are listed here:

   *  CRIs (and CRI references) containing a path component "." or "..".

      These would be removed by the remove_dot_segments algorithm of
      [RFC3986], and thus never produce a normalized URI after
      resolution.

      (In CRI references, the discard value is used to afford segment
      removal, and with "." being an unreserved character, expressing
      them as "%2e" and "%2e%2e" is not even viable, let alone
      practical).

   *  CRIs without authority whose path starts with two or more empty
      segments.

      When converted to URIs, these would violate the requirement that
      in absence of an authority, a URI's path cannot begin with two
      slash characters, and they would be indistinguishable from a URI
      with a shorter path and a present but empty authority component.

3.  Creation and Normalization

   In general, resource identifiers are created on the initial creation
   of a resource with a certain resource identifier, or the initial
   exposition of a resource under a particular resource identifier.

   A Constrained Resource Identifier SHOULD be created by the naming
   authority that governs the namespace of the resource identifier (see
   also [RFC8820]).  For example, for the resources of an HTTP origin
   server, that server is responsible for creating the CRIs for those
   resources.

   The naming authority MUST ensure that any CRI created satisfies the
   constraints defined in Section 2.  The creation of a CRI fails if the
   CRI cannot be validated to satisfy all of the constraints.

   If a naming authority creates a CRI from user input, it MAY apply the
   following (and only the following) normalizations to make the CRI
   more likely to validate:

   *  map the scheme name to lowercase (C1);

   *  map the registered name to NFC (C5) and split it on embedded dots;

   *  elide the port if it is the default port for the scheme (C7);

   *  map path segments, query parameters and the fragment identifier to
      NFC form (C9, C10, C11).

   Once a CRI has been created, it can be used and transferred without
   further normalization.  All operations that operate on a CRI SHOULD
   rely on the assumption that the CRI is appropriately pre-normalized.
   (This does not contradict the requirement that when CRIs are
   transferred, recipients must operate on as-good-as untrusted input
   and fail gracefully in the face of malicious inputs.)

4.  Comparison

   One of the most common operations on CRIs is comparison: determining
   whether two CRIs are equivalent, without dereferencing the CRIs
   (using them to access their respective resource(s)).

   Determination of equivalence or difference of CRIs is based on simple
   component-wise comparison.  If two CRIs are identical
   component-by-component (using code-point-by-code-point comparison for
   components that are Unicode strings) then it is safe to conclude that
   they are equivalent.

   This comparison mechanism is designed to minimize false negatives
   while strictly avoiding false positives.  The constraints defined in
   Section 2 imply the most common forms of syntax- and scheme-based
   normalizations in URIs, but do not comprise protocol-based
   normalizations that require accessing the resources or detailed
   knowledge of the scheme's dereference algorithm.  False negatives can
   be caused, for example, by CRIs that are not appropriately
   pre-normalized and by resource aliases.

   When CRIs are compared to select (or avoid) a network action, such as
   retrieval of a representation, fragment components (if any) should be
   excluded from the comparison.

5.  CRI References

   The most common usage of a Constrained Resource Identifier is to
   embed it in resource representations, e.g., to express a hyperlink
   between the represented resource and the resource identified by the
   CRI.

   This section defines the serialization of CRIs in Concise Binary
   Object Representation (CBOR) [RFC8949].  To reduce representation
   size, CRIs are not serialized directly.  Instead, CRIs are indirectly
   referenced through _CRI references_.
   These take advantage of hierarchical locality and provide a very
   compact encoding.  The CBOR serialization of CRI references is
   specified in Section 5.1.

   The only operation defined on a CRI reference is _reference
   resolution_: the act of transforming a CRI reference into a CRI.  An
   application MUST implement this operation by applying the algorithm
   specified in Section 5.3 (or any algorithm that is functionally
   equivalent to it).

   The reverse operation of transforming a CRI into a CRI reference is
   unspecified; implementations are free to use any algorithm as long as
   reference resolution of the resulting CRI reference yields the
   original CRI.  Notably, a CRI reference is not required to satisfy
   all of the constraints of a CRI; the only requirement on a CRI
   reference is that reference resolution MUST yield the original CRI.

   When testing for equivalence or difference, applications SHOULD NOT
   directly compare CRI references; the references should be resolved to
   their respective CRI before comparison.

5.1.  CBOR Serialization

   A CRI or CRI reference is encoded as a CBOR array [RFC8949], with the
   structure as described in the Concise Data Definition Language (CDDL)
   [RFC8610] as follows:

   // RFC Ed.: throughout this section, please replace RFC-XXXX with the
   // RFC number of this specification and remove this note.

   ; not expressed in this CDDL spec: trailing nulls to be left off

   RFC-XXXX-Definitions = [CRI, CRI-Reference]

   CRI = [
     scheme,
     authority / no-authority,
     local-part
   ]

   CRI-Reference = [
     ((scheme / null, authority / no-authority)
      // discard),                 ; relative reference
     local-part
   ]

   local-part = (
     path / null,
     query / null,
     fragment / null
   )

   scheme      = scheme-name / scheme-id
   scheme-name = text .regexp "[a-z][a-z0-9+.-]*"
   scheme-id   = (COAP / COAPS / HTTP / HTTPS / URN / DID /
                  other-scheme)
                 .within nint
   COAP = -1 COAPS = -2 HTTP = -3 HTTPS = -4 URN = -5 DID = -6
   other-scheme = nint .feature "scheme-id-extension"

   no-authority = NOAUTH-NOSLASH / NOAUTH-LEADINGSLASH
   NOAUTH-LEADINGSLASH = null .feature "no-authority"
   NOAUTH-NOSLASH = true .feature "no-authority"

   authority   = [?userinfo, host, ?port]
   userinfo    = (false, text .feature "userinfo")
   host        = (host-ip // host-name)
   host-name   = (*text) ; lowercase, NFC labels
   host-ip     = (bytes .size 4 //
                  (bytes .size 16, ?zone-id))
   zone-id     = text
   port        = 0..65535

   discard     = DISCARD-ALL / 0..127
   DISCARD-ALL = true
   path        = [*text]
   query       = [*text]
   fragment    = text

                 Figure 1: CDDL for CRI CBOR serialization

   This CDDL specification is simplified for exposition and needs to be
   augmented by the following rule for interchange of CRIs and CRI
   references: Trailing null values MUST be removed, and two leading
   null values (scheme and authority both not given) are represented by
   using the discard alternative instead.

   The rules scheme, authority, path, query, fragment correspond to the
   (sub-)components of a CRI, as described in Section 2, with the
   addition of the discard section.

5.1.1.  The discard Section

   The discard section can be used in a CRI reference when neither a
   scheme nor an authority is present.
   It then expresses the operations performed on a base CRI by CRI
   references that are equivalent to URI references with relative paths
   and path prefixes such as "/", "./", "../", "../../", etc.  "." and
   ".." are not available in CRIs and are therefore expressed using
   discard after a normalization step, as is the presence or absence of
   a leading "/".

   E.g., a simple URI reference "foo" specifies to remove one trailing
   segment from the base URI's path, which is represented in the
   equivalent CRI reference discard section as the value 1; similarly
   "../foo" removes two trailing segments, represented as 2; and "/foo"
   removes all segments, represented in the discard section as the value
   true.  The exact semantics of the section values are defined by
   Section 5.3.

   Most URI references that Section 4.2 of [RFC3986] calls "relative
   references" (i.e., references that need to undergo a resolution
   process to obtain a URI) correspond to the CRI form that starts with
   discard.  The exceptions are relative references with an authority
   (called a "network-path reference" in Section 4.2 of [RFC3986]),
   which discard the entire path of the base CRI.  These CRI references
   never carry a discard section: the value of discard defaults to true.

5.1.2.  Visualization

   The structure of a CRI reference is visualized using the somewhat
   limited means of a railroad diagram:

   cri-reference:
       ╭──────────────────────────────────────>───────────────────────────────────────╮
       │                                                                              │
       │                               ╭─────────────────────>─────────────────────╮  │
       │                               │                                           │  │
       │                               │          ╭──────────────>──────────────╮  │  │
       │                               │          │                             │  │  │
       │                               │          │           ╭──────>───────╮  │  │  │
       │                               │          │           │              │  │  │  │
   │├──╯──╮── scheme ── authority ──╭──╯── path ──╯── query ──╯── fragment ──╰──╰──╰──╰──┤│
          │                         │
          ╰──────── discard ────────╯

   This visualization does not go into the details of the elements.

5.1.3.  Examples

   [-1,             / scheme -- equivalent to "coap" /
    [h'C6336401',   / host /
     61616],        / port /
    [".well-known", / path /
     "core"]
   ]

   [true,                  / discard /
    [".well-known",        / path /
     "core"],
    ["rt=temperature-c"]]  / query /

   [-6,                / scheme -- equivalent to "did" /
    true,              / authority = NOAUTH-NOSLASH /
    ["web:alice:bob"]  / path /
   ]

5.1.4.  Specific Terminology

   A CRI reference is considered _well-formed_ if it matches the
   structure as expressed in Figure 1 in CDDL, with the additional
   requirement that trailing null values are removed from the array.

   A CRI reference is considered _absolute_ if it is well-formed and the
   sequence of sections starts with a non-null scheme.

   A CRI reference is considered _relative_ if it is well-formed and the
   sequence of sections is empty or starts with a section other than
   those that would constitute a scheme.

5.2.  Ingesting and encoding a CRI Reference

   From an abstract point of view, a CRI Reference is a data structure
   with six sections:

   scheme, authority, discard, path, query, fragment

   Each of these sections can be unset ("null"), except for discard,
   which is always an unsigned number or true.  If scheme and/or
   authority are non-null, discard must be true.

   When ingesting a CRI Reference that is in the transfer form, those
   sections are filled in from the transfer form (unset sections are
   filled with null), and the following steps are performed:

   *  If the array is entirely empty, replace it with [0].

   *  If discard is present in the transfer form (i.e., the outer array
      starts with true or an unsigned number), set scheme and authority
      to null.

   *  If scheme and/or authority are present in the transfer form (i.e.,
      the outer array starts with null, a text string, or a negative
      integer), set discard to true.
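
   As an illustration only, the ingestion steps above could be sketched
   in Python roughly as follows.  The function name ingest and the
   dictionary layout are hypothetical and not part of this
   specification.  The sketch assumes that the CBOR array has already
   been decoded into native values (null as None, true as True, text
   strings as str, arrays as list) and does not validate the individual
   sections against Figure 1.

```python
def ingest(transfer):
    """Fill the six abstract sections from a decoded transfer-form array.

    Hypothetical helper: assumes CBOR null/true/text/array were decoded
    to None/True/str/list; sections are not validated any further.
    """
    items = list(transfer) or [0]  # an entirely empty array stands for [0]
    first = items[0]

    # bool is a subclass of int in Python, so compare exact types.
    is_int = type(first) is int

    if first is True or (is_int and first >= 0):
        # discard is present: scheme and authority are set to null
        scheme, authority, discard = None, None, first
        rest = items[1:]
    elif first is None or isinstance(first, str) or (is_int and first < 0):
        # scheme and/or authority are present: discard is set to true
        scheme = first
        authority = items[1] if len(items) > 1 else None
        discard = True
        rest = items[2:]
    else:
        raise ValueError("not a well-formed CRI reference")

    if len(rest) > 3:
        raise ValueError("too many sections")
    # Trailing nulls were left off in the transfer form; fill them back in.
    rest = rest + [None] * (3 - len(rest))
    path, query, fragment = rest

    return {
        "scheme": scheme,
        "authority": authority,
        "discard": discard,
        "path": path,
        "query": query,
        "fragment": fragment,
    }
```

   Applied to the second example of Section 5.1.3, this sketch leaves
   scheme and authority unset, sets discard to true, and fills path and
   query from the array; the fragment remains null.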

   Upon encoding the abstract form into the transfer form, the inverse
   processing is performed: If scheme and/or authority are not null, the
   discard value is not transferred (it must be true in this case).  If
   they are both null, they are both left out and only discard is
   transferred.  Trailing null values are removed from the array.  As a
   special case, an empty array is sent in place of a remaining [0]
   (URI "").

5.3.  Reference Resolution

   The term "relative" implies that a "base CRI" exists against which
   the relative reference is applied.  Aside from fragment-only
   references, relative references are only usable when a base CRI is
   known.

   The following steps define the process of resolving any well-formed
   CRI reference against a base CRI so that the result is a CRI in the
   form of an absolute CRI reference:

   1.  Establish the base CRI of the CRI reference and express it in the
       form of an abstract absolute CRI reference.

   2.  Initialize a buffer with the sections from the base CRI.

   3.  If the value of discard is true in the CRI reference (which is
       implicitly the case when scheme and/or authority are present in
       the reference), replace the path in the buffer with the empty
       array, unset query and fragment, and set a true authority to
       null.  If the value of discard is an unsigned number, remove that
       many elements from the end of the path array; if it is non-zero,
       unset query and fragment.

       Set discard to true in the buffer.

   4.  If the path section is set in the CRI reference, append all
       elements from the path array to the array in the path section in
       the buffer; unset query and fragment.

   5.  Apart from the path and discard, copy all non-null sections from
       the CRI reference to the buffer in sequence; unset fragment in
       the buffer if query is non-null in the CRI reference (and
       therefore has been copied to the buffer).

   6.  Return the sections in the buffer as the resolved CRI.

6.  Relationship between CRIs, URIs and IRIs

   CRIs are meant to replace both Uniform Resource Identifiers (URIs)
   [RFC3986] and Internationalized Resource Identifiers (IRIs) [RFC3987]
   in constrained environments [RFC7228].  Applications in these
   environments may never need to use URIs and IRIs directly, especially
   when the resource identifier is used simply for identification
   purposes or when the CRI can be directly converted into a CoAP
   request.

   However, it may be necessary in other environments to determine the
   associated URI or IRI of a CRI, and vice versa.  Applications can
   perform these conversions as follows:

   CRI to URI
      A CRI is converted to a URI as specified in Section 6.1.

   URI to CRI
      The method of converting a URI to a CRI is unspecified;
      implementations are free to use any algorithm as long as
      converting the resulting CRI back to a URI yields an equivalent
      URI.

   CRI to IRI
      A CRI can be converted to an IRI by first converting it to a URI
      as specified in Section 6.1, and then converting the URI to an IRI
      as described in Section 3.2 of [RFC3987].

   IRI to CRI
      An IRI can be converted to a CRI by first converting it to a URI
      as described in Section 3.1 of [RFC3987], and then converting the
      URI to a CRI as described above.

   Everything in this section also applies to CRI references, URI
   references and IRI references.

6.1.  Converting CRIs to URIs

   Applications MUST convert a CRI reference to a URI reference by
   determining the components of the URI reference according to the
   following steps and then recomposing the components to a URI
   reference string as specified in Section 5.3 of [RFC3986].

   scheme
      If the CRI reference contains a scheme section, the scheme
      component of the URI reference consists of the value of that
      section.  Otherwise, the scheme component is unset.
662 authority 663 If the CRI reference contains a host-name or host-ip item, the 664 authority component of the URI reference consists of a host 665 subcomponent, optionally followed by a colon (":") character and a 666 port subcomponent, optionally preceded by a userinfo subcomponent. 667 Otherwise, the authority component is unset. 669 The host subcomponent consists of the value of the host-name or 670 host-ip item. 672 The userinfo subcomponent, if present, is turned into a single 673 string by appending a "@". Otherwise, both the subcomponent and 674 the "@" sign are omitted. Any character in the value of the 675 userinfo elements that is not in the set of unreserved characters 676 (Section 2.3 of [RFC3986]) or "sub-delims" (Section 2.2 of 677 [RFC3986]) MUST be percent-encoded. 679 The host-name is turned into a single string by joining the 680 elements separated by dots ("."). Any character in the elements 681 of a host-name item that is a dot ("."), or not in the set of 682 unreserved characters (Section 2.3 of [RFC3986]) or "sub-delims" 683 (Section 2.2 of [RFC3986]) MUST be percent-encoded. 685 The value of a host-ip item MUST be represented as a string that 686 matches the "IPv4address" or "IP-literal" rule (Section 3.2.2 of 687 [RFC3986]). Any zone-id is appended to the string, separated by 688 "%25" as defined in Section 2 of [RFC6874], or as specified in a 689 superseding zone-id specification document 690 [I-D.carpenter-6man-rfc6874bis]; this also leads to a modified 691 "IP-literal" rule as specified in these documents. 693 If the CRI reference contains a port item, the port subcomponent 694 consists of the value of that item in decimal notation. 695 Otherwise, the colon (":") character and the port subcomponent are 696 both omitted. 698 path 699 If the CRI reference contains a discard item of value true, the 700 path component is considered _rooted_. If it contains a discard 701 item of value 0 and the path item is present, the conversion 702 fails. 
If it contains a positive discard item, the path component 703 is considered _unrooted_ and prefixed by as many "../" components 704 as the discard value minus one indicates. 706 If the discard item is not present and the CRI reference contains 707 an authority that is true, the path component of the URI reference 708 is considered unrooted. Otherwise, the path component is 709 considered rooted. 711 If the CRI reference contains one or more path items, the path 712 component is constructed by concatenating the sequence of 713 representations of these items. These representations generally 714 contain a leading slash ("/") character and the value of each 715 item, processed as discussed below. The leading slash character 716 is omitted for the first path item only if the path component is 717 considered "unrooted". 719 Any character in the value of a path item that is not in the set 720 of unreserved characters or "sub-delims" or a colon (":") or 721 commercial at ("@") character MUST be percent-encoded. 723 If the authority component is present (not null or true) and the 724 path component does not match the "path-abempty" rule (Section 3.3 725 of [RFC3986]), the conversion fails. 727 If the authority component is not present, but the scheme 728 component is, and the path component does not match the "path- 729 absolute", "path-rootless" (authority == true) or "path-empty" 730 rule (Section 3.3 of [RFC3986]), the conversion fails. 732 If neither the authority component nor the scheme component are 733 present, and the path component does not match the "path- 734 absolute", "path-noscheme" or "path-empty" rule (Section 3.3 of 735 [RFC3986]), the conversion fails. 737 query 738 If the CRI reference contains one or more query items, the query 739 component of the URI reference consists of the value of each item, 740 separated by an ampersand ("&") character. Otherwise, the query 741 component is unset. 
743 Any character in the value of a query item that is not in the set 744 of unreserved characters or "sub-delims" or a colon (":"), 745 commercial at ("@"), slash ("/") or question mark ("?") character 746 MUST be percent-encoded. Additionally, any ampersand character 747 ("&") in the item value MUST be percent-encoded. 749 fragment 750 If the CRI reference contains a fragment item, the fragment 751 component of the URI reference consists of the value of that item. 752 Otherwise, the fragment component is unset. 754 Any character in the value of a fragment item that is not in the 755 set of unreserved characters or "sub-delims" or a colon (":"), 756 commercial at ("@"), slash ("/") or question mark ("?") character 757 MUST be percent-encoded. 759 7. Extended CRI: Accommodating Percent Encoding (PET) 761 CRIs have been designed to relieve implementations operating on CRIs 762 from string scanning, which helps both constrained implementations 763 and implementations that need to achieve high throughput. 765 Basic CRIs do not support URI components that _require_ percent- 766 encoding (Section 2.1 of [RFC3986]) to represent them in the URI 767 syntax, except where that percent-encoding is used to escape the main 768 delimiter in use. 770 E.g., the URI 772 https://alice/3%2f4-inch 774 is represented by the basic CRI 776 [-4, ["alice"], ["3/4-inch"]] 778 However, percent-encoding that is used at the application level is 779 not supported by basic CRIs: 781 did:web:alice:7%3A1-balun 783 This section presents a method to represent percent-encoded segments 784 of userinfo, hostnames, paths, and queries, as well as fragments.
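As a non-normative illustration of the percent-encoding rules in Section 6.1 and of the basic CRI example above, the following Python sketch builds the path, query and fragment portion of a URI reference from CRI items. The function names and the "rooted" parameter are assumptions made for this illustration only and are not defined by this specification. Python's urllib.parse.quote emits uppercase hex digits (%2F), which [RFC3986] treats as equivalent to %2f.

```python
from urllib.parse import quote

# "sub-delims" from Section 2.2 of RFC 3986; the unreserved characters
# (ALPHA, DIGIT, "-", ".", "_", "~") are never escaped by quote().
SUB_DELIMS = "!$&'()*+,;="


def encode_path_item(item: str) -> str:
    # Path items: unreserved, sub-delims, ":" and "@" stay literal.
    return quote(item, safe=SUB_DELIMS + ":@")


def encode_query_item(item: str) -> str:
    # Query items: as for path items plus "/" and "?", but "&" is escaped
    # because it separates query items in the URI.
    return quote(item, safe=SUB_DELIMS.replace("&", "") + ":@/?")


def encode_fragment(item: str) -> str:
    # Fragment: unreserved, sub-delims, ":", "@", "/" and "?" stay literal.
    return quote(item, safe=SUB_DELIMS + ":@/?")


def path_query_fragment(path, query=None, fragment=None, rooted=True):
    """Hypothetical helper: recompose everything after the authority."""
    parts = []
    for index, item in enumerate(path):
        # The leading slash is omitted only for the first item of an
        # unrooted path.
        if index > 0 or rooted:
            parts.append("/")
        parts.append(encode_path_item(item))
    result = "".join(parts)
    if query:  # one or more query items
        result += "?" + "&".join(encode_query_item(q) for q in query)
    if fragment is not None:
        result += "#" + encode_fragment(fragment)
    return result


# The path item "3/4-inch" from the example above keeps its slash as data.
print("https://alice" + path_query_fragment(["3/4-inch"]))
```

The sketch leaves out the scheme, authority and discard handling as well as the path-rule checks that can make a conversion fail; it only shows which characters each component escapes.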
786 The five CDDL rules 787 userinfo = (false, text .feature "userinfo") 788 host-name = (*text) 789 path = [*text] 790 query = [*text] 791 fragment = text 793 are replaced with 795 userinfo = (false, text-or-pet .feature "userinfo") 796 host-name = (*text-or-pet) 797 path = [*text-or-pet] 798 query = [*text-or-pet] 799 fragment = text-or-pet 801 text-or-pet = text / 802 text-pet-sequence .feature "extended-cri" 804 ; text1 and pet1 alternating, at least one pet1: 805 text-pet-sequence = [?text1, ((+(pet1, text1), ?pet1) // pet1)] 806 ; pet is percent-encoded bytes 807 pet1 = bytes .ne '' 808 text1 = text .ne "" 810 That is, for each of the host-name, path, and query segments, and for 811 the userinfo and fragment components, an alternate representation is 812 provided besides a simple text string: a non-empty array of 813 alternating non-empty text and byte strings, the text strings of 814 which stand for non-percent-encoded text, while the byte strings 815 retain the special semantics of percent-encoded text without actually 816 being percent-encoded. 818 The above DID URI can now be represented as: 820 [-6, true, [["web:alice:7", ':', "1-balun"]]] 822 8. Implementation Status 824 With the exception of the authority=true fix, host-names split into 825 labels, and Section 7, CRIs are implemented in 826 https://gitlab.com/chrysn/micrurus. A golang implementation of 827 version -10 of this document is found at: https://github.com/thomas- 828 fossati/href 830 9. Security Considerations 832 Parsers of CRI references must operate on input that is assumed to be 833 untrusted. This means that parsers MUST fail gracefully in the face 834 of malicious inputs. Additionally, parsers MUST be prepared to deal 835 with resource exhaustion (e.g., resulting from the allocation of big 836 data items) or exhaustion of the call stack (stack overflow). See 837 Section 10 of [RFC8949] for additional security considerations 838 relating to CBOR.
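One possible way to approach the requirements above is to bound the size and nesting of a decoded CRI reference before processing it further. The following Python sketch is non-normative: the function name and the default limits are hypothetical, and a real parser would more likely enforce such limits while decoding the CBOR input rather than afterwards.

```python
def check_cri_limits(item, max_depth=4, max_items=64, max_bytes=1024):
    """Return True if a decoded CRI reference stays within the limits.

    The structure is walked with an explicit stack rather than recursion,
    so deeply nested input cannot exhaust the call stack.
    """
    total = 1                # data items discovered so far
    stack = [(item, 1)]      # pairs of (node, nesting depth)
    while stack:
        node, depth = stack.pop()
        if isinstance(node, list):
            total += len(node)
            # Reject before pushing the children onto the stack.
            if depth > max_depth or total > max_items:
                return False
            stack.extend((child, depth + 1) for child in node)
        elif isinstance(node, str):
            if len(node.encode("utf-8")) > max_bytes:
                return False
        elif isinstance(node, bytes):
            if len(node) > max_bytes:
                return False
        elif node is None or isinstance(node, (bool, int)):
            continue         # null, true/false, and integer items
        else:
            return False     # any other type is not expected here
    return True


# The extended CRI example from Section 7 nests three arrays deep.
print(check_cri_limits([-6, True, [["web:alice:7", b":", "1-balun"]]]))
```

Suitable limits depend on the application; the values above are placeholders.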
840 The security considerations discussed in Section 7 of [RFC3986] and 841 Section 8 of [RFC3987] for URIs and IRIs also apply to CRIs. 843 10. IANA Considerations 845 This document has no IANA actions. 847 11. References 849 11.1. Normative References 851 [I-D.carpenter-6man-rfc6874bis] 852 Carpenter, B., Cheshire, S., and R. M. Hinden, 853 "Representing IPv6 Zone Identifiers in Address Literals 854 and Uniform Resource Identifiers", Work in Progress, 855 Internet-Draft, draft-carpenter-6man-rfc6874bis-03, 8 856 February 2022, . 859 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 860 Requirement Levels", BCP 14, RFC 2119, 861 DOI 10.17487/RFC2119, March 1997, 862 . 864 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 865 Resource Identifier (URI): Generic Syntax", STD 66, 866 RFC 3986, DOI 10.17487/RFC3986, January 2005, 867 . 869 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 870 Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987, 871 January 2005, . 873 [RFC6874] Carpenter, B., Cheshire, S., and R. Hinden, "Representing 874 IPv6 Zone Identifiers in Address Literals and Uniform 875 Resource Identifiers", RFC 6874, DOI 10.17487/RFC6874, 876 February 2013, . 878 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 879 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 880 May 2017, . 882 [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data 883 Definition Language (CDDL): A Notational Convention to 884 Express Concise Binary Object Representation (CBOR) and 885 JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 886 June 2019, . 888 [RFC8949] Bormann, C. and P. Hoffman, "Concise Binary Object 889 Representation (CBOR)", STD 94, RFC 8949, 890 DOI 10.17487/RFC8949, December 2020, 891 . 893 [Unicode] The Unicode Consortium, "The Unicode Standard, Version 894 13.0.0", ISBN 978-1-936213-26-9, March 2020, 895 . 897 11.2. 
Informative References 899 [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for 900 Constrained-Node Networks", RFC 7228, 901 DOI 10.17487/RFC7228, May 2014, 902 . 904 [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 905 Protocol (HTTP/1.1): Message Syntax and Routing", 906 RFC 7230, DOI 10.17487/RFC7230, June 2014, 907 . 909 [RFC7252] Shelby, Z., Hartke, K., and C. Bormann, "The Constrained 910 Application Protocol (CoAP)", RFC 7252, 911 DOI 10.17487/RFC7252, June 2014, 912 . 914 [RFC8141] Saint-Andre, P. and J. Klensin, "Uniform Resource Names 915 (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017, 916 . 918 [RFC8288] Nottingham, M., "Web Linking", RFC 8288, 919 DOI 10.17487/RFC8288, October 2017, 920 . 922 [RFC8820] Nottingham, M., "URI Design and Ownership", BCP 190, 923 RFC 8820, DOI 10.17487/RFC8820, June 2020, 924 . 926 [W3C.REC-html52-20171214] 927 Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., and 928 S. Moon, "HTML 5.2", World Wide Web Consortium 929 Recommendation REC-html52-20171214, 14 December 2017, 930 . 932 Appendix A. The Small Print 934 This appendix lists a few corner cases of URI semantics that 935 implementers of CRIs need to be aware of, but that are not 936 representative of the normal operation of CRIs. 938 SP1. Initial (Lone/Leading) Empty Path Segments: 940 * _Lone empty path segments:_ As per [RFC3986], s://x is distinct 941 from s://x/ -- i.e., a URI with an empty path is different from 942 one with a lone empty path segment. However, in HTTP and CoAP, they 943 are implicitly aliased (for CoAP, in item 8 of Section 6.4 of 944 [RFC7252]). As per item 7 of Section 6.5 of [RFC7252], 945 recomposition of a URI without Uri-Path Options from the other 946 URI-related CoAP Options produces s://x/, not s://x -- CoAP 947 prefers the lone empty path segment form.
948 // TBD: add similar text for HTTP, if that can be 949 made. Section 6.2.3 of [RFC3986] even states: 951 | In general, a URI that uses the generic syntax for authority with 952 | an empty path should be normalized to a path of "/". 954 * _Leading empty path segments without authority_: Somewhat related, 955 note also that URIs and URI references that do not carry an 956 authority cannot represent initial empty path segments (i.e., that 957 are followed by further path segments): s://x//foo works, but in a 958 s://foo URI or an (absolute-path) URI reference of the form //foo 959 the double slash would be mis-parsed as leading into an 960 authority. 962 SP2. Constraints (Section 2) of CRIs/basic CRIs 964 While most URIs in everyday use can be converted to CRIs and 965 back to URIs matching the input after syntax-based 966 normalization of the URI, these URIs illustrate the constraints 967 by example: 969 * https://host%ffname, https://example.com/x?data=%ff 971 All URI components must, after percent decoding, be valid 972 UTF-8 encoded text. Bytes that are not valid UTF-8 show up, 973 for example, in BitTorrent web seeds. 975 * https://example.com/component%3bone;component%3btwo, 976 http://example.com/component%3dequals 978 While delimiters can be used in an escaped and unescaped 979 form in URIs with generally distinct meanings, basic CRIs 980 (i.e., without percent-encoded text, Section 7) only support 981 one escapable delimiter character per component, which is 982 the delimiter by which the component is split up in the CRI. 984 Note that the separators . (for authority parts), / (for 985 paths), & (for query parameters) are special in that they 986 are syntactic delimiters of their respective components in 987 CRIs. Thus, the following examples _are_ convertible to 988 basic CRIs: 990 https://interior%2edot/ 992 https://example.com/path%2fcomponent/second-component 994 https://example.com/x?ampersand=%26&questionmark=?
996 * https://alice@example.com/ 998 The user information can be expressed in CRIs if the 999 "userinfo" feature is present. The URI https://@example.com 1000 is represented as [-4, [false, "", "example", "com"]]; the 1001 false serves as a marker that the next element is the 1002 userinfo. 1004 The rules do not cater for unencoded ":" in userinfo, which 1005 is commonly considered a deprecated inclusion of a literal 1006 password. 1008 Appendix B. Change Log 1010 This section is to be removed before publishing as an RFC. 1012 Changes from -08 to -09 1014 * Identify more esoteric features with a CDDL ".feature". 1016 * Clarify that well-formedness requires removing trailing nulls. 1018 * Fragments can contain PET. 1020 * Percent-encoded text in PET is treated as byte strings. 1022 * URIs with an authority but a completely empty path (e.g., 1023 http://example.com): CRIs with an authority component no longer 1024 always produce at least a slash in the path component. 1026 For generic schemes, the conversion of scheme://example.com to a 1027 CRI is now possible because CRI produces a URI with an authority 1028 not followed by a slash following the updated rules of 1029 Section 6.1. Schemes like http and coap do not distinguish 1030 between the empty path and the path containing a single slash when 1031 an authority is set (as recommended in [RFC3986]). For these 1032 schemes, that equivalence allows implementations to convert the 1033 just-a-slash URI to a CRI with a zero length path array (which, 1034 however, when converted back, does not produce a slash after the 1035 authority). 1037 (Add an appendix "the small print" for more detailed discussion of 1038 pesky corner cases like this.) 1040 Changes from -07 to -08 1042 * Fix the encoding of NOAUTH-NOSLASH / NOAUTH-LEADINGSLASH 1044 * Add URN and DID schemes, add example. 1046 * Add PET 1048 * Remove hopeless attempt to encode "remote trailing nulls" rule in 1049 CDDL (which is not a transformation language). 
1051 Changes from -06 to -07 1053 * More explicitly discuss constraints (Section 2), add examples 1054 (Appendix A, Paragraph 6, Item 1). 1056 * Make CDDL more explicit about special simple values. 1058 * Lots of gratuitous changes from XML2RFC redefinition of 1059 semantics. 1061 Changes from -05 to -06 1063 * rework authority: 1065 - split reg-names at dots; 1067 - add optional zone identifiers [RFC6874] to IP addresses 1069 Changes from -04 to -05 1070 * Simplify CBOR structure. 1072 * Add implementation status section. 1074 Changes from -03 to -04: 1076 * Minor editorial improvements. 1078 * Renamed path.type/path-type to discard. 1080 * Renamed option to section, substructured into items. 1082 * Simplified the table "resolution-variables". 1084 * Use the CBOR structure inspired by Jim Schaad's proposals. 1086 Changes from -02 to -03: 1088 * Expanded the set of supported schemes (#3). 1090 * Specified creation, normalization and comparison (#9). 1092 * Clarified the default value of the path.type option (#33). 1094 * Removed the append-relation path.type option (#41). 1096 * Renumbered the remaining path.types. 1098 * Renumbered the option numbers. 1100 * Restructured the document. 1102 * Minor editorial improvements. 1104 Changes from -01 to -02: 1106 * Changed the syntax of schemes to exclude upper case characters 1107 (#13). 1109 * Minor editorial improvements (#34 #37). 1111 Changes from -00 to -01: 1113 * None. 1115 Acknowledgements 1117 CRIs were developed by Klaus Hartke for use in the Constrained 1118 RESTful Application Language (CoRAL). The current author team is 1119 completing this work with a view to achieve good integration with the 1120 potential use cases, both inside and outside of CoRAL. 1122 Thanks to Christian Amsüss, Thomas Fossati, Ari Keränen, Jim Schaad, 1123 Dave Thaler and Marco Tiloca for helpful comments and discussions 1124 that have shaped the document. 
1126 Contributors 1128 Klaus Hartke 1129 Ericsson 1130 Torshamnsgatan 23 1131 SE-16483 Stockholm 1132 Sweden 1133 Email: klaus.hartke@ericsson.com 1135 Authors' Addresses 1137 Carsten Bormann (editor) 1138 Universität Bremen TZI 1139 Postfach 330440 1140 D-28359 Bremen 1141 Germany 1142 Phone: +49-421-218-63921 1143 Email: cabo@tzi.org 1145 Henk Birkholz 1146 Fraunhofer SIT 1147 Rheinstrasse 75 1148 64295 Darmstadt 1149 Germany 1150 Email: henk.birkholz@sit.fraunhofer.de