idnits 2.17.1

draft-sporny-hashlink-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([2], [3], [4], [1]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document date (October 31, 2020) is 1272 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 389

  -- Looks like a reference, but probably isn't: '2' on line 391

  -- Looks like a reference, but probably isn't: '3' on line 393

  -- Looks like a reference, but probably isn't: '4' on line 395

  == Unused Reference: 'RFC2119' is defined on line 378, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949)


     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                          M. Sporny
Internet-Draft                                            Digital Bazaar
Intended status: Standards Track                            L. Rosenthol
Expires: May 4, 2021                                       Adobe Systems
                                                        October 31, 2020

                        Cryptographic Hyperlinks
                        draft-sporny-hashlink-06

Abstract

   When using a hyperlink to fetch a resource from the Internet, it is
   often useful to know if the resource has changed since the data was
   published. Cryptographic hashes, such as SHA-256, are often used to
   determine if published data has changed in unexpected ways. Due to
   the nature of most hyperlinks, the cryptographic hash is often
   published separately from the link itself. This specification
   describes a data model and serialization formats for expressing
   cryptographically protected hyperlinks. The mechanisms described in
   this document enable a system to publish a hyperlink in a way that
   empowers a consuming application to determine if the resource
   associated with the hyperlink has changed in unexpected ways.

Feedback

   This specification is a work product of the W3C Digital Verification
   Community Group [1] and the W3C Credentials Community Group [2].
   Feedback related to this specification should be logged in the issue
   tracker [3] or be sent to public-credentials@w3.org [4].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1. Introduction
     1.1. Multiple Encodings
   2. Hashlink Data Model
     2.1. The Resource Hash
     2.2. The Optional Metadata
       2.2.1. URLs
       2.2.2. Content Type
       2.2.3. Experimental Metadata
   3. Hashlink Serialization
     3.1. Hashlink URL
       3.1.1. Serializing the Resource Hash
       3.1.2. Serializing the Metadata
       3.1.3. Deserializing the Metadata
       3.1.4. A Simple Hashlink Example
     3.2. Hashlink as a Parameterized URL
       3.2.1. Hashlink as a Parameterized URL Example
   4. Hashlink Encoders and Decoders
   5. Security Considerations
     5.1. Insecure Hashing Functions
   6. References
     6.1. Normative References
     6.2. URIs
   Appendix A. Security Considerations
   Appendix B. Test Values
     B.1. Simple Hashlink URL
     B.2. Multi-sourced Hashlink URL
   Appendix C. Acknowledgements
   Authors' Addresses

1. Introduction

   Uniform Resource Locators (URLs) enable software developers to build
   distributed systems that are able to publish information using
   hyperlinks. When a client fetches a resource at the given hyperlink,
   the result is typically a stream of data that the client may further
   process. Due to the design of most hyperlinks, the data associated
   with a hyperlink may change over time. This design feature is often
   not an issue for systems that do not depend on static data.

   Some software systems expect data published at a specific URL to not
   change.
   For example, firmware files, operating system releases,
   security upgrades, and other high-risk files are often distributed
   with associated manifest files. These manifest files typically
   utilize a cryptographic hash per URL to ensure that an attack to
   modify the files themselves will be detected:

   b1a653e5...de5d3e8f3 https://example.com/operating-system.iso
   7b23bf52...557a0902c https://example.com/firmware-v4.35.bin

   An unfortunate downside of the manifest file approach is that a
   separate system from the URL itself must be utilized to add this
   level of content integrity protection. In addition, the
   cryptographic hash format for the files is often application
   specific and is not easily upgradeable once newer and more advanced
   cryptographic hash formats are standardized.

   New types of distributed file storage networks have been deployed
   over the past several decades. Examples include HTTP file mirrors
   for the Debian Operating System, peer-to-peer file networks such as
   BitTorrent, and content-addressed networks, such as the
   InterPlanetary File System (IPFS). While each of these systems has
   its own URL format, it is currently not possible to express a
   content-addressed URL that associates the content address to a file
   published on each one of these networks.

   This specification provides a simple data model and serialization
   formats for cryptographic hyperlinks that:

   o  Enable existing URLs to add content integrity protection.

   o  Provide a URL format for multi-sourced content integrity protected
      data.

   o  Enable URL metadata to be discarded without having to re-encode
      the URL.

   o  Enable algorithm agility for all data model components.

1.1. Multiple Encodings

   A hashlink can be encoded in two different ways. The RECOMMENDED way
   to express a hashlink is:

   hl:<resource-hash>:<optional-metadata>

   To enable existing applications utilizing historical URL schemes to
   provide content integrity protection, hashlinks may also be encoded
   using URL parameters:

   <url>?hl=<resource-hash>

   Implementers should take note that the URL parameter-based encoding
   mechanism is application specific and SHOULD NOT be used unless the
   URL resolver for the application cannot be upgraded to support the
   RECOMMENDED encoding.

2. Hashlink Data Model

   The hashlink data model is a simple expression of a cryptographic
   hash of the resource, one or more URLs, and a content type.

2.1. The Resource Hash

   The resource hash is the mechanism that enables content integrity
   protection for the associated data stream. The resource hash value
   MUST be provided in a hashlink.

2.2. The Optional Metadata

   All metadata associated with the hashlink is optional and is provided
   to enable a client to more easily discover data that matches the
   provided resource hash.

2.2.1. URLs

   A hashlink may be associated with a set of one or more URLs that,
   when dereferenced, result in data that matches the resource hash.

2.2.2. Content Type

   A hashlink may be associated with exactly one Content Type that may
   be used in protocols that support content types, such as HTTP's
   Accept header.

2.2.3. Experimental Metadata

   Application developers often need to express other important metadata
   related to their specific application. These developers MUST use
   this field to do so. Data expressed in this field MAY conflict with
   keys chosen by other developers in other applications. Experimental
   fields that become widely used are expected to be standardized and
   become core metadata fields.

3. Hashlink Serialization

   A hashlink may be serialized in one of two ways.
   The first is the
   RECOMMENDED method, called a "Hashlink URL", which is a compact URL
   representation of the Hashlink data model. The second is called a
   "Hashlink as a Parameterized URL", which MUST NOT be used unless
   there is no mechanism available to upgrade the application's URL
   resolver.

3.1. Hashlink URL

   A Hashlink URL always starts with the following three characters:

   hl:

   The remainder of the URL is a concatenation of the resource hash and,
   optionally, the Hashlink URL metadata.

3.1.1. Serializing the Resource Hash

   The value of the resource hash can be generated by utilizing the
   following algorithm:

   1. Generate the raw hash value by processing the resource data using
      the cryptographic hashing algorithm.

   2. Generate the multihash value by encoding the raw hash using the
      Multihash Data Format [multihash].

   3. Generate the multibase hash by encoding the multihash value using
      the Multibase Data Format [multibase].

   4. Output the multibase hash as the resource hash.

   The example below demonstrates the output of the algorithm above for
   a hashlink that expresses the data "Hello World!" processed using the
   SHA-2, 256 bit, 32 byte cryptographic algorithm which is then
   expressed using the base-58 Bitcoin base-encoding format:

   zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e

3.1.2. Serializing the Metadata

   To generate the value for the metadata, the metadata values are
   encoded in the CBOR Data Format [RFC7049] using the following
   algorithm:

   1. Create the raw output map (CBOR major type 5).

   2. If at least one URL exists, add a CBOR key of 15 (0x0f) to the
      raw output map with a value that is an array (CBOR major type 4).

      1. Encode each URL as a CBOR URI (CBOR tag 32) and place it
         into the array.

   3. If the content type exists, add a CBOR key of 14 (0x0e) to the
      raw output map with a value that is a UTF-8 byte string (0x6) and
      the value of the content type.

   4. If experimental metadata exists, add a CBOR key of 13 (0x0d) and
      encode it as a map by creating a raw output map (CBOR major type
      5). For each item in the map, serialize to CBOR where the CBOR
      major types, the key name, and the value are derived from the
      input data. For example, a key of "foo" and a value of 200 would
      be encoded as a CBOR major type of 2 for the key and a CBOR major
      type of 0 for the value.

   5. Generate the multibase value by encoding the raw output map using
      the Multibase Data Format.

   The example below demonstrates the output of the algorithm above for
   metadata containing a single URL ("http://example.org/hw.txt") with a
   content type of "text/plain" expressed using the base-58 Bitcoin
   base-encoding format:

   zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF

3.1.3. Deserializing the Metadata

   To deserialize the metadata, the "Serializing the Metadata" algorithm
   is reversed. Implementers MUST use the following table to
   deserialize keys to JSON:

            +-----------+----------------+------------------+
            | Key (hex) | JSON key       | JSON value       |
            +-----------+----------------+------------------+
            | 0x0f      | "url"          | Array of strings |
            | 0x0e      | "content-type" | string           |
            | 0x0d      | "experimental" | JSON Object      |
            +-----------+----------------+------------------+

                  Table 1: Metadata Key to JSON Mapping

   The example below demonstrates the output of the algorithm above for
   metadata containing a single URL ("http://example.org/hw.txt") with a
   content type of "text/plain", and an experimental metadata key of
   "foo" and value of 123:

   {
     "url": ["http://example.org/hw.txt"],
     "content-type": "text/plain",
     "experimental": {
       "foo": 123
     }
   }

3.1.4. A Simple Hashlink Example

   The example below demonstrates a simple hashlink that provides
   content integrity protection for the "http://example.org/hw.txt"
   file, which has a content type of "text/plain" (line breaks added for
   readability purposes):

   hl:
   zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
   zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF

3.2. Hashlink as a Parameterized URL

   An algorithm resulting in the same output as the one below MUST be
   used when encoding the hashlink data model as a set of parameters in
   a URL:

   1. Create an empty string and assign it to the output value.

   2. Append the first URL in the URL metadata array to the output
      value.

   3. Append a URL parameter with a key of "hl" and the value of the
      resource hash as generated in Section 3.1.1.

3.2.1. Hashlink as a Parameterized URL Example

   The example below demonstrates a simple hashlink that provides
   content integrity protection for the "http://example.org/hw.txt"
   file, which has a content type of "text/plain":

   http://example.org/hw.txt?hl=
   zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e

4. Hashlink Encoders and Decoders

   Hashlink encoders and decoders MUST support the following core
   algorithms:

   1. The SHA-2, 256 bit, 32 byte output cryptographic hashing
      algorithm and the associated Multihash Data Format.

   2. The Bitcoin base58-encoding and decoding algorithm and the
      associated Multibase Data Format.

   Implementations MAY support algorithms and data formats in addition
   to the ones listed above.

5. Security Considerations

   This section documents the security attacks that are out of scope for
   this specification as well as known attacks and mitigations against
   those attacks.

5.1. Insecure Hashing Functions

   There are a number of insecure cryptographic hashing functions in
   deployment today. Among these are MD5 and SHA-1.
   Implementers MUST
   throw an error by default when encoding or decoding these values.
   Implementers MAY provide a non-default library option to override the
   error.

6. References

6.1. Normative References

   [multibase]
              Benet, J. and M. Sporny, "The Multibase Data Format",
              December 2018, .

   [multihash]
              Benet, J. and M. Sporny, "The Multihash Data Format",
              August 2018, .

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              .

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013, .

6.2. URIs

   [1] https://w3c-dvcg.github.io/

   [2] https://w3c-ccg.github.io/

   [3] https://github.com/w3c-dvcg/hashlink/issues

   [4] mailto:public-credentials@w3.org

Appendix A. Security Considerations

   There are a number of security considerations to take into account
   when implementing or utilizing this specification: TBD

Appendix B. Test Values

   The following test values may be used to verify the conformance of
   Hashlink encoders and decoders.

B.1. Simple Hashlink URL

   The following Hashlink URL encodes the data "Hello World!" served
   from the "http://example.org/hw.txt" URL with a content type of
   "text/plain". The resource hash is generated using the SHA-2, 256
   bit, 32 byte cryptographic algorithm which is then encoded using the
   base-58 Bitcoin base-encoding format. The metadata options are
   encoded using the base-58 Bitcoin base-encoding format. The final
   Hashlink URL is (new lines added for readability purposes):

   hl:
   zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
   zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF

B.2. Multi-sourced Hashlink URL

   The following Hashlink URL encodes the data "Hello World!" served
   from three different networks.
   The first is a standard Web-based URL
   ("http://example.org/hw.txt"), the second is an IPFS-based URL
   ("ipfs:/ipfs/QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/hello"),
   and the third is a Tor-based URL ("http://c4m3g2upq6pkufl4.onion/
   hworld.txt"). The resource hash is generated using the SHA-2, 256
   bit, 32 byte cryptographic algorithm which is then encoded using the
   base-58 Bitcoin base-encoding format. The metadata options are
   encoded using the base-58 Bitcoin base-encoding format. The final
   Hashlink URL is (new lines added for readability purposes):

   hl:
   zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
   z333PdTakFeJueF2bim3PaaDqbtqjkpxUc8ETSWXe6dQLWXQWvqiUdw8TJrncx3uKhwfc
   88MtM5xZbR27FhVRUKv9ogekamVtdE3UbXnXpMRT1AseCtoBUt1NE8x2SsnJxGfiZN45V
   VSCp6jh4dgcufL16tWrHREiSYESEGP1J75yXCvAdvKPr7nb5aYujLeay8Ww

Appendix C. Acknowledgements

   The editors would like to thank the following individuals for
   feedback on and implementations of the specification (in alphabetical
   order): TBD

   Portions of the work on this specification have been funded by the
   United States Department of Homeland Security's Science and
   Technology Directorate under contract HSHQDC-17-C-00019. The content
   of this specification does not necessarily reflect the position or
   the policy of the U.S. Government and no official endorsement should
   be inferred.

Authors' Addresses

   Manu Sporny
   Digital Bazaar
   203 Roanoke Street W.
   Blacksburg, VA 24060
   US

   Phone: +1 540 961 4469
   Email: msporny@digitalbazaar.com
   URI:   http://manu.sporny.org/

   Leonard Rosenthol
   Adobe Systems
   345 Park Ave.
   San Jose, CA 95110-2704
   US

   Phone: +1 800 833 6687
   Email: lrosenth@adobe.com
   URI:   https://www.linkedin.com/in/lrosenthol/
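
Illustrative Sketch (not part of the draft)

   The following Python sketch is one possible reading of the resource
   hash algorithm in Section 3.1.1 and the parameterized form in
   Section 3.2, limited to the two core algorithms required by
   Section 4 (SHA-2 256 and base-58 Bitcoin). All function names are
   hypothetical and chosen for illustration. The multihash prefix bytes
   (0x12 for sha2-256, followed by the digest length) and the multibase
   prefix character "z" for base-58 Bitcoin are taken from the
   Multihash and Multibase formats, not from this draft. Metadata
   encoding is deliberately omitted.

```python
import hashlib

# Bitcoin base-58 alphabet (no 0, O, I, or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes using the Bitcoin base-58 alphabet."""
    # Each leading zero byte is represented by a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(B58_ALPHABET[remainder])
    return "1" * leading_zeros + "".join(reversed(digits))


def base58btc_decode(text: str) -> bytes:
    """Decode a Bitcoin base-58 string back to bytes."""
    leading_ones = len(text) - len(text.lstrip("1"))
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


def resource_hash(data: bytes) -> str:
    """Steps 1-4 of Section 3.1.1, assuming SHA-2 256 and base-58."""
    digest = hashlib.sha256(data).digest()            # step 1: raw hash
    multihash = bytes([0x12, len(digest)]) + digest   # step 2: multihash
    return "z" + base58btc_encode(multihash)          # steps 3-4: multibase


def to_parameterized_url(url: str, data: bytes) -> str:
    """Section 3.2: append the resource hash as an "hl" parameter."""
    return url + "?hl=" + resource_hash(data)


def verify(data: bytes, hashlink_url: str) -> bool:
    """Check fetched data against the resource hash of an "hl:" URL."""
    scheme, _, rest = hashlink_url.partition(":")
    if scheme != "hl":
        raise ValueError("not a Hashlink URL")
    # Optional metadata after the second colon is ignored here.
    expected = rest.split(":", 1)[0]
    return expected == resource_hash(data)


if __name__ == "__main__":
    print("hl:" + resource_hash(b"Hello World!"))
    print(to_parameterized_url("http://example.org/hw.txt", b"Hello World!"))
```

   Running the sketch on the data "Hello World!" gives a value that can
   be compared against the resource hash listed in Appendix B.1. A real
   implementation would also need to reject MD5 and SHA-1 multihash
   values by default, as required by Section 5.1.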