idnits 2.17.1 draft-ietf-core-href-06.txt: -(3): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(371): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(372): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(373): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(374): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(375): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(376): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(377): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(378): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(379): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(381): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 14 instances of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (25 July 2021) is 1003 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode' -- Obsolete informational reference (is this intentional?): RFC 7230 (Obsoleted by RFC 9110, RFC 9112) Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 CoRE Working Group C. Bormann, Ed. 3 Internet-Draft Universität Bremen TZI 4 Intended status: Standards Track H. Birkholz 5 Expires: 26 January 2022 Fraunhofer SIT 6 25 July 2021 8 Constrained Resource Identifiers 9 draft-ietf-core-href-06 11 Abstract 13 The Constrained Resource Identifier (CRI) is a complement to the 14 Uniform Resource Identifier (URI) that serializes the URI components 15 in Concise Binary Object Representation (CBOR) instead of a sequence 16 of characters. This simplifies parsing, comparison and reference 17 resolution in environments with severe limitations on processing 18 power, code size, and memory size. 20 Discussion Venues 22 This note is to be removed before publishing as an RFC. 24 Discussion of this document takes place on the Constrained RESTful 25 Environments Working Group mailing list (core@ietf.org), which is 26 archived at https://mailarchive.ietf.org/arch/browse/core/ 27 (https://mailarchive.ietf.org/arch/browse/core/). 
Source for this 28 draft and an issue tracker can be found at https://github.com/core- 29 wg/href (https://github.com/core-wg/href) 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at https://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on 26 January 2022. 48 Copyright Notice 50 Copyright (c) 2021 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 55 license-info) in effect on the date of publication of this document. 56 Please review these documents carefully, as they describe your rights 57 and restrictions with respect to this document. Code Components 58 extracted from this document must include Simplified BSD License text 59 as described in Section 4.e of the Trust Legal Provisions and are 60 provided without warranty as described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 65 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 66 2. Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 4 67 3. Creation and Normalization . . . . . . . . . . . . . . . . . 5 68 4. Comparison . . . . . . . . . . . . . . . . . . . . . . . . . 6 69 5. CRI References . . . . . . . . . . . . . . . . . . . . . . . 6 70 5.1. 
CBOR Serialization . . . . . . . . . . . . . . . . . . . 7 71 5.2. Ingesting and encoding a CRI Reference . . . . . . . . . 10 72 5.3. Reference Resolution . . . . . . . . . . . . . . . . . . 10 73 6. Relationship between CRIs, URIs and IRIs . . . . . . . . . . 11 74 6.1. Converting CRIs to URIs . . . . . . . . . . . . . . . . . 12 75 7. Implementation Status . . . . . . . . . . . . . . . . . . . . 14 76 8. Security Considerations . . . . . . . . . . . . . . . . . . . 14 77 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 78 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 79 10.1. Normative References . . . . . . . . . . . . . . . . . . 14 80 10.2. Informative References . . . . . . . . . . . . . . . . . 15 81 Appendix A. CDDL specification . . . . . . . . . . . . . . . . . 16 82 Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 17 83 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 19 84 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 19 85 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 87 1. Introduction 89 The Uniform Resource Identifier (URI) [RFC3986] and its most common 90 usage, the URI reference, are the Internet standard for linking to 91 resources in hypertext formats such as HTML [W3C.REC-html52-20171214] 92 or the HTTP "Link" header field [RFC8288]. 94 A URI reference is a sequence of characters chosen from the 95 repertoire of US-ASCII characters. The individual components of a 96 URI reference are delimited by a number of reserved characters, which 97 necessitates the use of a character escape mechanism called "percent- 98 encoding" when these reserved characters are used in a non-delimiting 99 function. 
The resolution of URI references involves parsing a 100 character sequence into its components, combining those components 101 with the components of a base URI, merging path components, removing 102 dot-segments, and recomposing the result back into a character 103 sequence. 105 Overall, the proper handling of URI references is quite intricate. 106 This can be a problem especially in constrained environments 107 [RFC7228], where nodes often have severe code size and memory size 108 limitations. As a result, many implementations in such environments 109 support only an ad-hoc, informally-specified, bug-ridden, non- 110 interoperable subset of half of RFC 3986. 112 This document defines the _Constrained Resource Identifier (CRI)_ by 113 constraining URIs to a simplified subset and serializing their 114 components in Concise Binary Object Representation (CBOR) [RFC8949] 115 instead of a sequence of characters. This allows typical operations 116 on URI references such as parsing, comparison and reference 117 resolution (including all corner cases) to be implemented in a 118 comparatively small amount of code. 120 As a result of simplification, however, CRIs are not capable of 121 expressing all URIs permitted by the generic syntax of RFC 3986 122 (hence the "constrained" in "Constrained Resource Identifier"). The 123 supported subset includes all URIs of the Constrained Application 124 Protocol (CoAP) [RFC7252], most URIs of the Hypertext Transfer 125 Protocol (HTTP) [RFC7230], Uniform Resource Names (URNs) [RFC8141], 126 and other similar URIs. The exact constraints are defined in 127 Section 2. 129 1.1. Notational Conventions 131 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 132 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 133 "OPTIONAL" in this document are to be interpreted as described in 134 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 135 capitals, as shown here. 
137 In this specification, the term "byte" is used in its now customary 138 sense as a synonym for "octet". 140 Terms defined in this document appear in _cursive_ where they are 141 introduced (rendered in plain text as the new term surrounded by 142 underscores). 144 2. Constraints 146 A Constrained Resource Identifier consists of the same five 147 components as a URI: scheme, authority, path, query, and fragment. 148 The components are subject to the following constraints: 150 C1. The scheme name can be any Unicode string (see Definition D80 151 in [Unicode]) that matches the syntax of a URI scheme (see 152 Section 3.1 of [RFC3986], which constrains schemes to ASCII) 153 and is lowercase (see Definition D139 in [Unicode]). The 154 scheme is always present. 156 C2. An authority is always a host identified by an IP address or 157 registered name, along with optional port information. User 158 information is not supported. The authority can be absent; in 159 [RFC3986], in this case the path can be rootless or, as when the 160 authority is present, begin with a root ("/"); this is modelled 161 by two different values for an absent authority. 163 C3. An IP address can be either an IPv4 address or an IPv6 address, 164 optionally with a zone identifier [RFC6874]. Future versions 165 of IP are not supported. 167 C4. A registered name is a sequence of one or more _labels_, which, 168 when joined with dots (".") in between them, result in a 169 Unicode string that is lowercase and in Unicode Normalization 170 Form C (NFC) (see Definition D120 in [Unicode]). (The syntax 171 may be further restricted by the scheme.) 173 C5. A port is always an integer in the range from 0 to 65535. 174 Empty ports or ports outside this range are not supported. 176 C6. The port is omitted if and only if the port would be the same 177 as the scheme's default port (provided the scheme is defining 178 such a default port) or the scheme is not using ports. 180 C7. 
A path consists of zero or more path segments. A path must not 181 consist of a single zero-length path segment, which is 182 considered equivalent to a path of zero path segments. 184 C8. A path segment can be any Unicode string that is in NFC, with 185 the exception of the special "." and ".." complete path 186 segments. It can be the zero-length string. No special 187 constraints are placed on the first path segment. 189 C9. A query always consists of one or more query parameters. A 190 query parameter can be any Unicode string that is in NFC. It 191 is often in the form of a "key=value" pair. When converting a 192 CRI to a URI, query parameters are separated by an ampersand 193 ("&") character. (This matches the structure and encoding of 194 the target URI in CoAP requests.) Queries are optional; there 195 is a difference between an absent query and a single query 196 parameter that is the empty string. 198 C10. A fragment identifier can be any Unicode string that is in NFC. 199 Fragment identifiers are optional; there is a difference 200 between an absent fragment identifier and a fragment identifier 201 that is the empty string. 203 C11. The syntax of registered names, path segments, query 204 parameters, and fragment identifiers may be further restricted 205 and sub-structured by the scheme. There is no support, 206 however, for escaping sub-delimiters that are not intended to 207 be used in a delimiting function. 209 C12. When converting a CRI to a URI, any character that is outside 210 the allowed character range or is a delimiter in the URI syntax 211 is percent-encoded. For CRIs, percent-encoding always uses the 212 UTF-8 encoding form (see Definition D92 in [Unicode]) to 213 convert the character to a sequence of bytes (that is then 214 converted to a sequence of %HH triplets). 216 3. 
Creation and Normalization 218 In general, resource identifiers are created on the initial creation 219 of a resource with a certain resource identifier, or the initial 220 exposition of a resource under a particular resource identifier. 222 A Constrained Resource Identifier SHOULD be created by the naming 223 authority that governs the namespace of the resource identifier (see 224 also [RFC8820]). For example, for the resources of an HTTP origin 225 server, that server is responsible for creating the CRIs for those 226 resources. 228 The naming authority MUST ensure that any CRI created satisfies the 229 constraints defined in Section 2. The creation of a CRI fails if the 230 CRI cannot be validated to satisfy all of the constraints. 232 If a naming authority creates a CRI from user input, it MAY apply the 233 following (and only the following) normalizations to make the CRI more 234 likely to validate: 236 * map the scheme name to lowercase (C1); 237 * map the registered name to NFC (C4) and split it on embedded dots; 239 * elide the port if it is the default port for the scheme (C6); 241 * elide a single zero-length path segment (C7); 243 * map path segments, query parameters and the fragment identifier to 244 NFC (C8, C9, C10). 246 Once a CRI has been created, it can be used and transferred without 247 further normalization. All operations that operate on a CRI SHOULD 248 rely on the assumption that the CRI is appropriately pre-normalized. 249 (This does not contradict the requirement that when CRIs are 250 transferred, recipients must operate on as-good-as untrusted input 251 and fail gracefully in the face of malicious inputs.) 253 4. Comparison 255 One of the most common operations on CRIs is comparison: determining 256 whether two CRIs are equivalent, without dereferencing the CRIs 257 (using them to access their respective resource(s)). 259 Determination of equivalence or difference of CRIs is based on simple 260 component-wise comparison. 
If two CRIs are identical component-by- 261 component (using code-point-by-code-point comparison for components 262 that are Unicode strings) then it is safe to conclude that they are 263 equivalent. 265 This comparison mechanism is designed to minimize false negatives 266 while strictly avoiding false positives. The constraints defined in 267 Section 2 imply the most common forms of syntax- and scheme-based 268 normalizations in URIs, but do not comprise protocol-based 269 normalizations that require accessing the resources or detailed 270 knowledge of the scheme's dereference algorithm. False negatives can 271 be caused, for example, by CRIs that are not appropriately pre- 272 normalized and by resource aliases. 274 When CRIs are compared to select (or avoid) a network action, such as 275 retrieval of a representation, fragment components (if any) should be 276 excluded from the comparison. 278 5. CRI References 280 The most common usage of a Constrained Resource Identifier is to 281 embed it in resource representations, e.g., to express a hyperlink 282 between the represented resource and the resource identified by the 283 CRI. 285 This section defines the serialization of CRIs in Concise Binary 286 Object Representation (CBOR) [RFC8949]. To reduce representation 287 size, CRIs are not serialized directly. Instead, CRIs are indirectly 288 referenced through _CRI references_. These take advantage of 289 hierarchical locality and provide a very compact encoding. The CBOR 290 serialization of CRI references is specified in Section 5.1. 292 The only operation defined on a CRI reference is _reference 293 resolution_: the act of transforming a CRI reference into a CRI. An 294 application MUST implement this operation by applying the algorithm 295 specified in Section 5.3 (or any algorithm that is functionally 296 equivalent to it). 
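As a non-normative illustration of the comparison rule in Section 4, the following Python sketch compares two CRIs component by component. The "CRI" tuple type and the "cri_equivalent" helper are hypothetical names introduced here and are not part of this specification. The sketch assumes that both inputs are already normalized as described in Section 3, and it does not map a scheme-id to its scheme-name before comparing.

```python
from typing import NamedTuple, Optional, Tuple, Union


class CRI(NamedTuple):
    """Hypothetical in-memory form of a CRI (illustration only)."""
    scheme: Union[str, int]            # scheme-name or scheme-id
    authority: object                  # tuple of host items and port, None, or True
    path: Tuple[str, ...]
    query: Optional[Tuple[str, ...]]   # None means "absent"
    fragment: Optional[str]            # None means "absent"


def cri_equivalent(a: CRI, b: CRI, ignore_fragment: bool = False) -> bool:
    """Component-wise comparison; Python compares str values by code point.

    Set ignore_fragment=True when the comparison selects (or avoids) a
    network action, so that fragment components are excluded.
    """
    if ignore_fragment:
        a = a._replace(fragment=None)
        b = b._replace(fragment=None)
    return a == b
```

Because no normalization happens inside the comparison, two path segments that differ only in Unicode normalization form compare as different; this is one source of the false negatives mentioned in Section 4.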
298 The reverse operation of transforming a CRI into a CRI reference is 299 unspecified; implementations are free to use any algorithm as long as 300 reference resolution of the resulting CRI reference yields the 301 original CRI. Notably, a CRI reference is not required to satisfy 302 all of the constraints of a CRI; the only requirement on a CRI 303 reference is that reference resolution MUST yield the original CRI. 305 When testing for equivalence or difference, applications SHOULD NOT 306 directly compare CRI references; the references should be resolved to 307 their respective CRI before comparison. 309 5.1. CBOR Serialization 311 A CRI reference is encoded as a CBOR array [RFC8949], with the 312 structure as described in the Concise Data Definition Language (CDDL) 313 [RFC8610] as follows: 315 ; not expressed in this CDDL spec: trailing nulls to be left off 317 CRI-Reference = [ 318 ((scheme / null, authority / null / true) 319 // discard), ; relative reference 320 path / null, 321 query / null, 322 fragment / null 323 ] 325 scheme = scheme-name / scheme-id 326 scheme-name = text .regexp "[a-z][a-z0-9+.-]*" 327 scheme-id = (COAP / COAPS / HTTP / HTTPS / other-scheme) 328 .within nint 329 COAP = -1 COAPS = -2 HTTP = -3 HTTPS = -4 330 other-scheme = nint .feature "scheme-id-extension" 332 authority = [host, ?port] 333 host = (host-name // host-ip) 334 host-name = (*text) ; lowercase, NFC labels 335 host-ip = (bytes .size 4 // 336 (bytes .size 16, ?zone-id)) 337 zone-id = text 338 port = 0..65535 340 discard = true / 0..127 341 path = [*text] 342 query = [*text] 343 fragment = text 345 This CDDL specification is simplified for exposition and needs to be 346 augmented by the following rule for interchange: Trailing null values 347 are removed, and two leading null values (scheme and authority both 348 not given) are represented by using the "discard" alternative 349 instead. A complete CDDL specification is given in Appendix A. 
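The interchange rule above can be sketched in Python as follows. The function name "to_transfer_form" and the use of Python None, True and lists for CBOR null, true and arrays are assumptions made for illustration only. The empty-array special case is taken from Section 5.2, and keeping the authority slot even when it is null follows the CDDL in Appendix A. The sketch performs no validation.

```python
def to_transfer_form(scheme, authority, discard, path, query, fragment):
    """Build the array that is sent on the wire from the six sections."""
    if scheme is not None or authority is not None:
        # discard is implicitly true in this case and is not transferred.
        head = [scheme, authority]
    else:
        # Two leading nulls are replaced by the discard alternative.
        head = [discard]

    # Remove trailing nulls from the path/query/fragment part only;
    # Appendix A keeps the authority slot even when it is null.
    tail = [path, query, fragment]
    while tail and tail[-1] is None:
        tail.pop()

    # Section 5.2: a remaining [0] (URI "") is sent as the empty array.
    if head == [0] and not tail:
        return []
    return head + tail
```

For example, calling the function with a null scheme and authority, discard 0, and only a fragment set yields "[0, None, None, 'frag']": the nulls for path and query stay in place because they are not trailing.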
351 The rules "scheme", "authority", "path", "query", "fragment" 352 correspond to the (sub-)components of a CRI, as described in 353 Section 2, with the addition of the "discard" section. The "discard" 354 section can be used when neither a scheme nor an authority is 355 present. It then expresses path prefixes such as "/", "./", "../", 356 "../../", etc. The exact semantics of the section values are defined 357 by Section 5.3. 359 Most URI references that Section 4.2 of [RFC3986] calls "relative 360 references" (i.e., references that need to undergo a resolution 361 process to obtain a URI) correspond to the CRI form that starts with 362 "discard". The exceptions are relative references with an "authority" 363 (called a "network-path reference" in Section 4.2 of [RFC3986]), 364 which in CRI references never carry a "discard" section (the value of 365 "discard" defaults to "true"). 367 | The structure of a CRI is visualized using the somewhat limited 368 | means of a railroad diagram below. 369 | 370 | cri-reference: 371 | ╭──────────────────────────────────────>───────────────────────────────────────╮ 372 | │ │ 373 | │ ╭─────────────────────>─────────────────────╮ │ 374 | │ │ │ │ 375 | │ │ ╭──────────────>──────────────╮ │ │ 376 | │ │ │ │ │ │ 377 | │ │ │ ╭──────>───────╮ │ │ │ 378 | │ │ │ │ │ │ │ │ 379 | │├──╯──╮── scheme ── authority ──╭──╯── path ──╯── query ──╯── fragment ──╰──╰──╰──╰──┤│ 380 | │ │ 381 | ╰──────── discard ────────╯ 382 | 383 | This visualization does not go into the details of the 384 | elements. 386 Examples: 388 [-1, / scheme -- equivalent to "coap" / 389 [h'C6336401', / host / 390 61616], / port / 391 [".well-known", / path / 392 "core"] 393 ] 395 [true, / discard / 396 [".well-known", / path / 397 "core"], 398 ["rt=temperature-c"]] / query / 400 A CRI reference is considered _well-formed_ if it matches the CDDL 401 structure. 
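To make the first example in this section concrete, the following Python sketch renders an absolute CRI reference that has a host authority as a URI string, loosely following the conversion rules of Section 6.1. The function name "absolute_cri_to_uri" is hypothetical. Only the four scheme-ids listed in the CDDL are mapped. Relative references, absent authorities, and the path-rule checks of Section 6.1 are deliberately left out, so this is a partial illustration and not a complete converter.

```python
import ipaddress
from urllib.parse import quote

SCHEME_IDS = {-1: "coap", -2: "coaps", -3: "http", -4: "https"}
SUB_DELIMS = "!$&'()*+,;="   # "sub-delims" of RFC 3986, Section 2.2


def absolute_cri_to_uri(ref):
    """Convert [scheme, [host..., ?port], ?path, ?query, ?fragment] to a URI."""
    scheme, authority = ref[0], list(ref[1])
    path = ref[2] if len(ref) > 2 and ref[2] is not None else []
    query = ref[3] if len(ref) > 3 else None
    fragment = ref[4] if len(ref) > 4 else None

    if isinstance(scheme, int):
        scheme = SCHEME_IDS[scheme]

    # An optional port is the trailing integer of the authority array.
    port = authority.pop() if isinstance(authority[-1], int) else None

    if isinstance(authority[0], bytes):
        # host-ip: 4 bytes (IPv4) or 16 bytes (IPv6) plus optional zone-id
        addr = ipaddress.ip_address(authority[0])
        host = str(addr)
        if addr.version == 6:
            zone = "%25" + quote(authority[1], safe="") if len(authority) > 1 else ""
            host = "[" + host + zone + "]"
    else:
        # host-name: labels joined by dots, percent-encoded where needed
        host = ".".join(quote(label, safe=SUB_DELIMS) for label in authority)

    uri = scheme + "://" + host
    if port is not None:
        uri += ":" + str(port)
    for segment in path:
        uri += "/" + quote(segment, safe=SUB_DELIMS + ":@")
    if query is not None:
        # "&" separates query parameters, so it is encoded inside a parameter.
        safe = SUB_DELIMS.replace("&", "") + ":@/?"
        uri += "?" + "&".join(quote(item, safe=safe) for item in query)
    if fragment is not None:
        uri += "#" + quote(fragment, safe=SUB_DELIMS + ":@/?")
    return uri
```

Applied to the first example, with the host written as bytes.fromhex("C6336401"), the sketch returns "coap://198.51.100.1:61616/.well-known/core".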
403 A CRI reference is considered _absolute_ if it is well-formed and the 404 sequence of sections starts with a non-null "scheme". 406 A CRI reference is considered _relative_ if it is well-formed and the 407 sequence of sections is empty or starts with a section other than 408 those that would constitute a "scheme". 410 5.2. Ingesting and encoding a CRI Reference 412 From an abstract point of view, a CRI Reference is a data structure 413 with six sections: 415 scheme, authority, discard, path, query, fragment 417 Each of these sections can be unset ("null"), except for discard, 418 which is always an unsigned number or "true". If scheme and/or 419 authority are non-null, discard must be "true". 421 When ingesting a CRI Reference that is in the transfer form, those 422 sections are filled in from the transfer form (unset sections are 423 filled with null), and the following steps are performed: 425 * If the array is entirely empty, replace it with "[0]". 427 * If discard is present in the transfer form (i.e., the outer array 428 starts with true or an unsigned number), set scheme and authority 429 to null. 431 * If scheme and/or authority are present in the transfer form (i.e., 432 the outer array starts with null, a text string, or a negative 433 integer), set discard to "true". 435 Upon encoding the abstract form into the transfer form, the inverse 436 processing is performed: If scheme and/or authority are not null, the 437 discard value is not transferred (it must be true in this case). If 438 they are both null, they are both left out and only discard is 439 transferred. Trailing null values are removed from the array. As a 440 special case, an empty array is sent in place of a remaining "[0]" 441 (URI ""). 443 5.3. Reference Resolution 445 The term "relative" implies that a "base CRI" exists against which 446 the relative reference is applied. Aside from fragment-only 447 references, relative references are only usable when a base CRI is 448 known. 
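The ingestion steps of Section 5.2 and the six resolution steps that this section specifies can be sketched in Python as follows. The "CRIRef" type and the "ingest" and "resolve" functions are hypothetical names introduced for illustration. Python None, True and lists stand in for CBOR null, true and arrays. The sketch follows the steps literally and does not check well-formedness.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional


@dataclass
class CRIRef:
    """Abstract form of a CRI reference (hypothetical helper type)."""
    scheme: Any = None
    authority: Any = None
    discard: Any = True
    path: Optional[List[str]] = None
    query: Optional[List[str]] = None
    fragment: Optional[str] = None


def _is_discard(item):
    # bool is a subclass of int in Python, so True is tested explicitly.
    return item is True or (
        isinstance(item, int) and not isinstance(item, bool) and item >= 0)


def ingest(transfer):
    """Section 5.2: fill the six sections from the transfer form."""
    items = list(transfer) or [0]          # an empty array stands for [0]
    if _is_discard(items[0]):
        head, rest = [None, None, items[0]], items[1:]
    else:                                  # null, text, or negative integer
        authority = items[1] if len(items) > 1 else None
        head, rest = [items[0], authority, True], items[2:]
    rest = rest + [None] * (3 - len(rest))  # unset sections become null
    return CRIRef(*head, *rest[:3])


def resolve(base, ref):
    """Section 5.3: resolve ref against an absolute base, both abstract."""
    buf = replace(base, path=list(base.path or []))        # steps 1 and 2
    if ref.discard is True:                                # step 3
        buf.path, buf.query, buf.fragment = [], None, None
        if buf.authority is True:
            buf.authority = None
    else:
        keep = max(len(buf.path) - ref.discard, 0)
        buf.path = buf.path[:keep]
        if ref.discard != 0:
            buf.query = buf.fragment = None
    buf.discard = True
    if ref.path is not None:                               # step 4
        buf.path = buf.path + list(ref.path)
        buf.query = buf.fragment = None
    if ref.scheme is not None:                             # step 5
        buf.scheme = ref.scheme
    if ref.authority is not None:
        buf.authority = ref.authority
    if ref.query is not None:
        buf.query, buf.fragment = list(ref.query), None
    if ref.fragment is not None:
        buf.fragment = ref.fragment
    return buf                                             # step 6
```

The slice in step 3 is written as "buf.path[:keep]" rather than "buf.path[:-n]" because the latter would empty the path when the discard value is 0.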
450 The following steps define the process of resolving any well-formed 451 CRI reference against a base CRI so that the result is a CRI in the 452 form of an absolute CRI reference: 454 1. Establish the base CRI of the CRI reference and express it in the 455 form of an abstract absolute CRI reference. 457 2. Initialize a buffer with the sections from the base CRI. 459 3. If the value of discard is "true" in the CRI reference, replace 460 the path in the buffer with the empty array, unset query and 461 fragment, and set a "true" authority to "null". If the value of 462 discard is an unsigned number, remove that many elements from the 463 end of the path array; if it is non-zero, unset query and 464 fragment. Set discard to "true" in the buffer. 466 4. If the path section is set in the CRI reference, append all 467 elements from the path array to the array in the path section in 468 the buffer; unset query and fragment. 470 5. Apart from the path and discard, copy all non-null sections from 471 the CRI reference to the buffer in sequence; unset fragment if 472 query is non-null and thus copied. 474 6. Return the sections in the buffer as the resolved CRI. 476 6. Relationship between CRIs, URIs and IRIs 478 CRIs are meant to replace both Uniform Resource Identifiers (URIs) 479 [RFC3986] and Internationalized Resource Identifiers (IRIs) [RFC3987] 480 in constrained environments [RFC7228]. Applications in these 481 environments may never need to use URIs and IRIs directly, especially 482 when the resource identifier is used simply for identification 483 purposes or when the CRI can be directly converted into a CoAP 484 request. 486 However, it may be necessary in other environments to determine the 487 associated URI or IRI of a CRI, and vice versa. Applications can 488 perform these conversions as follows: 490 CRI to URI 491 A CRI is converted to a URI as specified in Section 6.1. 
493 URI to CRI 494 The method of converting a URI to a CRI is unspecified; 495 implementations are free to use any algorithm as long as 496 converting the resulting CRI back to a URI yields an equivalent 497 URI. 499 CRI to IRI 500 A CRI can be converted to an IRI by first converting it to a URI 501 as specified in Section 6.1, and then converting the URI to an IRI 502 as described in Section 3.2 of [RFC3987]. 504 IRI to CRI 505 An IRI can be converted to a CRI by first converting it to a URI 506 as described in Section 3.1 of [RFC3987], and then converting the 507 URI to a CRI as described above. 509 Everything in this section also applies to CRI references, URI 510 references and IRI references. 512 6.1. Converting CRIs to URIs 514 Applications MUST convert a CRI reference to a URI reference by 515 determining the components of the URI reference according to the 516 following steps and then recomposing the components to a URI 517 reference string as specified in Section 5.3 of [RFC3986]. 519 scheme 520 If the CRI reference contains a "scheme" section, the scheme 521 component of the URI reference consists of the value of that 522 section. Otherwise, the scheme component is unset. 524 authority 525 If the CRI reference contains a "host-name" or "host-ip" item, the 526 authority component of the URI reference consists of a host 527 subcomponent, optionally followed by a colon (":") character and a 528 port subcomponent. Otherwise, the authority component is unset. 530 The host subcomponent consists of the value of the "host-name" or 531 "host-ip" item. 533 The "host-name" is turned into a single string by joining the 534 elements separated by dots ("."). Any character in the value of a 535 "host-name" item that is not in the set of unreserved characters 536 (Section 2.3 of [RFC3986]) or "sub-delims" (Section 2.2 of 537 [RFC3986]) MUST be percent-encoded. 
539 The value of a "host-ip" item MUST be represented as a string that 540 matches the "IPv4address" or "IP-literal" rule (Section 3.2.2 of 541 [RFC3986]). Any zone-id is appended to the string, separated by 542 "%25" as defined in Section 2 of [RFC6874], or as specified in a 543 successor zone-id specification document; this also leads to a 544 modified "IP-literal" rule as specified in these documents. 546 If the CRI reference contains a "port" item, the port subcomponent 547 consists of the value of that item in decimal notation. 548 Otherwise, the colon (":") character and the port subcomponent are 549 both omitted. 551 path 552 If the CRI reference contains a "discard" item of value "true", 553 the path component is prefixed by a slash ("/") character. If it 554 contains a "discard" item of value "0" and the "path" item is 555 present, the conversion fails. Otherwise, the path component is 556 prefixed by as many "../" components as the "discard" value minus 557 one indicates. 559 If the discard item is not present and the CRI reference contains 560 an authority that is "true", the path component of the URI 561 reference is prefixed by the zero-length string. Otherwise, the 562 path component is prefixed by a slash ("/") character. 564 If the CRI reference contains one or more "path" items, the prefix 565 is followed by the value of each item, separated by a slash ("/") 566 character. 568 Any character in the value of a "path" item that is not in the set 569 of unreserved characters or "sub-delims" or a colon (":") or 570 commercial at ("@") character MUST be percent-encoded. 572 If the authority component is present (not "null" or "true") and 573 the path component does not match the "path-abempty" rule 574 (Section 3.3 of [RFC3986]), the conversion fails. 
576 If the authority component is not present, but the scheme 577 component is, and the path component does not match the "path- 578 absolute", "path-rootless" (authority == "true") or "path-empty" 579 rule (Section 3.3 of [RFC3986]), the conversion fails. 581 If neither the authority component nor the scheme component are 582 present, and the path component does not match the "path- 583 absolute", "path-noscheme" or "path-empty" rule (Section 3.3 of 584 [RFC3986]), the conversion fails. 586 query 587 If the CRI reference contains one or more "query" items, the query 588 component of the URI reference consists of the value of each item, 589 separated by an ampersand ("&") character. Otherwise, the query 590 component is unset. 592 Any character in the value of a "query" item that is not in the 593 set of unreserved characters or "sub-delims" or a colon (":"), 594 commercial at ("@"), slash ("/") or question mark ("?") character 595 MUST be percent-encoded. Additionally, any ampersand character 596 ("&") in the item value MUST be percent-encoded. 598 fragment 599 If the CRI reference contains a fragment item, the fragment 600 component of the URI reference consists of the value of that item. 601 Otherwise, the fragment component is unset. 603 Any character in the value of a "fragment" item that is not in the 604 set of unreserved characters or "sub-delims" or a colon (":"), 605 commercial at ("@"), slash ("/") or question mark ("?") character 606 MUST be percent-encoded. 608 7. Implementation Status 610 With the exception of the authority=true fix and host-names split 611 into labels, CRIs are implemented in "https://gitlab.com/chrysn/ 612 micrurus". 614 8. Security Considerations 616 Parsers of CRI references must operate on input that is assumed to be 617 untrusted. This means that parsers MUST fail gracefully in the face 618 of malicious inputs. 
Additionally, parsers MUST be prepared to deal 619 with resource exhaustion (e.g., resulting from the allocation of big 620 data items) or exhaustion of the call stack (stack overflow). See 621 Section 10 of [RFC8949] for additional security considerations 622 relating to CBOR. 624 The security considerations discussed in Section 7 of [RFC3986] and 625 Section 8 of [RFC3987] for URIs and IRIs also apply to CRIs. 627 9. IANA Considerations 629 This document has no IANA actions. 631 10. References 633 10.1. Normative References 635 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 636 Requirement Levels", BCP 14, RFC 2119, 637 DOI 10.17487/RFC2119, March 1997. 640 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 641 Resource Identifier (URI): Generic Syntax", STD 66, 642 RFC 3986, DOI 10.17487/RFC3986, January 2005. 645 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 646 Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987, 647 January 2005. 649 [RFC6874] Carpenter, B., Cheshire, S., and R. Hinden, "Representing 650 IPv6 Zone Identifiers in Address Literals and Uniform 651 Resource Identifiers", RFC 6874, DOI 10.17487/RFC6874, 652 February 2013. 654 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 655 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 656 May 2017. 658 [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data 659 Definition Language (CDDL): A Notational Convention to 660 Express Concise Binary Object Representation (CBOR) and 661 JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 662 June 2019. 664 [RFC8949] Bormann, C. and P. Hoffman, "Concise Binary Object 665 Representation (CBOR)", STD 94, RFC 8949, 666 DOI 10.17487/RFC8949, December 2020. 669 [Unicode] The Unicode Consortium, "The Unicode Standard, Version 670 13.0.0", ISBN 978-1-936213-26-9, March 2020. 673 10.2. Informative References 675 [RFC7228] Bormann, C., Ersue, M., and A. 
              Keranen, "Terminology for
              Constrained-Node Networks", RFC 7228,
              DOI 10.17487/RFC7228, May 2014.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained
              Application Protocol (CoAP)", RFC 7252,
              DOI 10.17487/RFC7252, June 2014.

   [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
              (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017.

   [RFC8288]  Nottingham, M., "Web Linking", RFC 8288,
              DOI 10.17487/RFC8288, October 2017.

   [RFC8820]  Nottingham, M., "URI Design and Ownership", BCP 190,
              RFC 8820, DOI 10.17487/RFC8820, June 2020.

   [W3C.REC-html52-20171214]
              Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., and
              S. Moon, "HTML 5.2", World Wide Web Consortium
              Recommendation REC-html52-20171214, 14 December 2017.

Appendix A. CDDL specification

   The full CDDL specification is somewhat redundant internally in order
   to express trailing null suppression.
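
   As a non-normative illustration of what trailing null suppression
   means for an encoder, the following minimal Python sketch assembles
   a CRI reference as a Python list and drops null entries from the end
   of the path/query/fragment tail. It is only a sketch under stated
   assumptions: None stands in for CBOR null, the function name
   suppress_trailing_nulls and the example host labels are hypothetical,
   and the leading part (scheme and authority, or discard) is passed in
   unchanged because the CRI-Reference rule keeps those positions.

```python
from typing import Any, List, Optional


def suppress_trailing_nulls(
    head: List[Any],
    path: Optional[List[str]] = None,
    query: Optional[List[str]] = None,
    fragment: Optional[str] = None,
) -> List[Any]:
    """Assemble a CRI reference as a list, omitting trailing nulls.

    `head` is assumed to be either [scheme, authority] or [discard].
    None stands in for CBOR null. Only the path/query/fragment tail
    is trimmed; nulls in the middle of the tail are kept as placeholders.
    """
    tail: List[Any] = [path, query, fragment]
    # Drop null entries from the end until a non-null entry is found.
    while tail and tail[-1] is None:
        tail.pop()
    return list(head) + tail


# Hypothetical example: scheme-id -1 (COAP) and host labels "example", "com".
head = [-1, ["example", "com"]]

only_path = suppress_trailing_nulls(head, path=["a", "b"])
only_fragment = suppress_trailing_nulls(head, fragment="frag")
no_tail = suppress_trailing_nulls(head)
```

   With only a path, the query and fragment positions are omitted
   entirely. With only a fragment, the path and query positions stay in
   the list as nulls, since they are needed to keep the fragment in its
   position.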
   ; expressing null suppression

   CRI-Reference = [
     ?( ((scheme, (authority / null / true)
                  // (null, authority))
         // discard),               ; relative reference
        ?( (path / null, query / null, fragment) //
           (path / null, query) //
           path)
      )
   ]

   scheme      = scheme-name / scheme-id
   scheme-name = text .regexp "[a-z][a-z0-9+.-]*"
   scheme-id   = (COAP / COAPS / HTTP / HTTPS / other-scheme)
                 .within nint
   COAP = -1    COAPS = -2    HTTP = -3    HTTPS = -4
   other-scheme = nint .feature "scheme-id-extension"

   authority   = [host, ?port]
   host        = (host-name // host-ip)
   host-name   = (*text) ; lowercase, NFC labels
   host-ip     = (bytes .size 4 //
                    (bytes .size 16, ?zone-id))
   zone-id     = text
   port        = 0..65535

   discard     = true / 0..127
   path        = [*text]
   query       = [*text]
   fragment    = text

Appendix B. Change Log

   This section is to be removed before publishing as an RFC.

   Changes from -05 to -06

   * rework authority:

     - split reg-names at dots;

     - add optional zone identifiers [RFC6874] to IP addresses

   Changes from -04 to -05

   * Simplify CBOR structure.

   * Add implementation status section.

   Changes from -03 to -04:

   * Minor editorial improvements.

   * Renamed path.type/path-type to discard.

   * Renamed option to section, substructured into items.

   * Simplified the table "resolution-variables".

   * Use the CBOR structure inspired by Jim Schaad's proposals.

   Changes from -02 to -03:

   * Expanded the set of supported schemes (#3).

   * Specified creation, normalization and comparison (#9).

   * Clarified the default value of the "path.type" option (#33).

   * Removed the "append-relation" path.type option (#41).

   * Renumbered the remaining path.types.

   * Renumbered the option numbers.

   * Restructured the document.

   * Minor editorial improvements.

   Changes from -01 to -02:

   * Changed the syntax of schemes to exclude upper case characters
     (#13).
   * Minor editorial improvements (#34 #37).

   Changes from -00 to -01:

   * None.

Acknowledgements

   CRIs were developed by Klaus Hartke for use in the Constrained
   RESTful Application Language (CoRAL). The current author team is
   completing this work with a view to achieving good integration with
   the potential use cases, both inside and outside of CoRAL.

   Thanks to Christian Amsüss, Ari Keränen, Jim Schaad and Dave Thaler
   for helpful comments and discussions that have shaped the document.

Contributors

   Klaus Hartke
   Ericsson
   Torshamnsgatan 23
   SE-16483 Stockholm
   Sweden

   Email: klaus.hartke@ericsson.com

Authors' Addresses

   Carsten Bormann (editor)
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org

   Henk Birkholz
   Fraunhofer SIT
   Rheinstrasse 75
   64295 Darmstadt
   Germany

   Email: henk.birkholz@sit.fraunhofer.de