idnits 2.17.1 draft-sporny-hashlink-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([2], [3], [4], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (Nov 18, 2019) is 1614 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 388 -- Looks like a reference, but probably isn't: '2' on line 390 -- Looks like a reference, but probably isn't: '3' on line 392 -- Looks like a reference, but probably isn't: '4' on line 394 == Unused Reference: 'RFC2119' is defined on line 377, but no explicit reference was found in the text ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949) Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Sporny 3 Internet-Draft Digital Bazaar 4 Intended status: Standards Track Nov 18, 2019 5 Expires: May 21, 2020 7 Cryptographic Hyperlinks 8 draft-sporny-hashlink-04 10 Abstract 12 When using a hyperlink to fetch a resource from the Internet, it is 13 often useful to know if the resource has changed since the data was 14 published. Cryptographic hashes, such as SHA-256, are often used to 15 determine if published data has changed in unexpected ways. Due to 16 the nature of most hyperlinks, the cryptographic hash is often 17 published separately from the link itself. This specification 18 describes a data model and serialization formats for expressing 19 cryptographically protected hyperlinks. The mechanisms described in 20 the document enable a system to publish a hyperlink in a way that 21 empowers a consuming application to determine if the resource 22 associated with the hyperlink has changed in unexpected ways. 
24 Feedback 26 This specification is a work product of the W3C Digital Verification 27 Community Group [1] and the W3C Credentials Community Group [2]. 28 Feedback related to this specification should be logged in the issue 29 tracker [3] or be sent to public-credentials@w3.org [4]. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at https://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on May 21, 2020. 48 Copyright Notice 50 Copyright (c) 2019 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (https://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. 60 Table of Contents 62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 63 1.1. Multiple Encodings . . . . . . . . . . . . . . . . . . . 4 64 2. Hashlink Data Model . . . . . . . . . . . . . . . . . . . . . 4 65 2.1. The Resource Hash . . . . . . . . . . . . . . . . . . . . 4 66 2.2. The Optional Metadata . . . . . . . . . . . . . . . . . . 4 67 2.2.1. URLs . . . . . . . . . . . . . . . . . . . . . . . . 4 68 2.2.2. Content Type . . . . . . . . . . . . . . . . . . . . 4 69 2.2.3. Experimental Metadata . . . . . . . . 
. . . . . . . . 5 70 3. Hashlink Serialization . . . . . . . . . . . . . . . . . . . 5 71 3.1. Hashlink URL . . . . . . . . . . . . . . . . . . . . . . 5 72 3.1.1. Serializing the Resource Hash . . . . . . . . . . . . 5 73 3.1.2. Serializing the Metadata . . . . . . . . . . . . . . 6 74 3.1.3. Deserializing the Metadata . . . . . . . . . . . . . 6 75 3.1.4. A Simple Hashlink Example . . . . . . . . . . . . . . 7 76 3.2. Hashlink as a Parameterized URL . . . . . . . . . . . . . 7 77 3.2.1. Hashlink as a Parameterized URL Example . . . . . . . 8 78 4. Hashlink Encoders and Decoders . . . . . . . . . . . . . . . 8 79 5. Security Considerations . . . . . . . . . . . . . . . . . . . 8 80 5.1. Insecure Hashing Functions . . . . . . . . . . . . . . . 8 81 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 82 6.1. Normative References . . . . . . . . . . . . . . . . . . 8 83 6.2. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 9 84 Appendix A. Security Considerations . . . . . . . . . . . . . . 9 85 Appendix B. Test Values . . . . . . . . . . . . . . . . . . . . 9 86 B.1. Simple Hashlink URL . . . . . . . . . . . . . . . . . . . 9 87 B.2. Multi-sourced Hashlink URL . . . . . . . . . . . . . . . 10 88 Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . . 10 89 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 10 91 1. Introduction 93 Uniform Resource Locators (URLs) enable software developers to build 94 distributed systems that are able to publish information using 95 hyperlinks. When a client fetches a resource at the given hyperlink, 96 the result is typically a stream of data that the client may further 97 process. Due to the design of most hyperlinks, the data associated 98 with a hyperlink may change over time. This design feature is often 99 not an issue for systems that do not depend on static data. 101 Some software systems expect data published at a specific URL to not 102 change. 
For example, firmware files, operating system releases, 103 security upgrades, and other high-risk files are often distributed 104 with associated manifest files. These manifest files typically 105 utilize a cryptographic hash per URL to ensure that an attack that 106 modifies the files themselves will be detected: 108 b1a653e5...de5d3e8f3 https://example.com/operating-system.iso 109 7b23bf52...557a0902c https://example.com/firmware-v4.35.bin 111 An unfortunate downside of the manifest file approach is that a 112 system separate from the URL itself must be utilized to add this 113 level of content integrity protection. In addition, the 114 cryptographic hash formats for the files are often application 115 specific and are not easily upgradeable once newer and more advanced 116 cryptographic hash formats are standardized. 118 New types of distributed file storage networks have been deployed 119 over the past several decades. Examples include HTTP file mirrors 120 for the Debian Operating System, peer-to-peer file networks such as 121 BitTorrent, and content-addressed networks, such as the 122 InterPlanetary File System (IPFS). While each of these systems has 123 its own URL format, it is currently not possible to express a 124 content-addressed URL that associates the content address with a file 125 published on each one of these networks. 127 This specification provides a simple data model and serialization 128 formats for cryptographic hyperlinks that: 130 o Enable existing URLs to add content integrity protection. 132 o Provide a URL format for multi-sourced content integrity protected 133 data. 135 o Enable URL metadata to be discarded without having to re-encode 136 the URL. 138 o Enable algorithm agility for all data model components. 140 1.1. 
Multiple Encodings 142 A hashlink can be encoded in two different ways. The RECOMMENDED way 143 to express a hashlink is: 145 hl:<resource-hash>:<optional-metadata> 147 To enable existing applications utilizing historical URL schemes to 148 provide content integrity protection, hashlinks may also be encoded 149 using URL parameters: 151 <url>?hl=<resource-hash> 153 Implementers should take note that the URL parameter-based encoding 154 mechanism is application specific and SHOULD NOT be used unless the 155 URL resolver for the application cannot be upgraded to support the 156 RECOMMENDED encoding. 158 2. Hashlink Data Model 160 The hashlink data model is a simple expression of a cryptographic 161 hash of the resource, one or more URLs, and a content type. 163 2.1. The Resource Hash 165 The resource hash is the mechanism that enables content integrity 166 protection for the associated data stream. The resource hash value 167 MUST be provided in a hashlink. 169 2.2. The Optional Metadata 171 All metadata associated with the hashlink is optional and is provided 172 to enable a client to more easily discover data that matches the 173 provided resource hash. 175 2.2.1. URLs 177 A hashlink may be associated with a set of one or more URLs that, 178 when dereferenced, result in data that matches the resource hash. 180 2.2.2. Content Type 182 A hashlink may be associated with exactly one Content Type that may 183 be used in protocols that support content types, such as HTTP's 184 Accept header. 186 2.2.3. Experimental Metadata 188 Application developers often need to express other important metadata 189 related to their specific application. These developers MUST use 190 this field to do so. Data expressed in this field MAY conflict with 191 keys chosen by other developers in other applications. Experimental 192 fields that become widely used are expected to be standardized and 193 become core metadata fields. 195 3. Hashlink Serialization 197 A hashlink may be serialized in one of two ways. 
The first is the 198 RECOMMENDED method, called a "Hashlink URL", which is a compact URL 199 representation of the Hashlink data model. The second is called a 200 "Hashlink as a Parameterized URL", which MUST NOT be used unless 201 there is no mechanism available to upgrade the application's URL 202 resolver. 204 3.1. Hashlink URL 206 A Hashlink URL always starts with the following 207 three characters: 209 hl: 211 The remainder of the URL is the resource hash, optionally followed 212 by a colon and the Hashlink URL metadata. 214 3.1.1. Serializing the Resource Hash 216 The value of the resource hash can be generated by utilizing the 217 following algorithm: 219 1. Generate the raw hash value by processing the resource data using 220 the cryptographic hashing algorithm. 222 2. Generate the multihash value by encoding the raw hash using the 223 Multihash Data Format [multihash]. 225 3. Generate the multibase hash by encoding the multihash value using 226 the Multibase Data Format [multibase]. 228 4. Output the multibase hash as the resource hash. 230 The example below demonstrates the output of the algorithm above for 231 a hashlink that expresses the data "Hello World!" processed using the 232 SHA-2, 256 bit, 32 byte cryptographic algorithm which is then 233 expressed using the base-58 Bitcoin base-encoding format: 235 zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e 237 3.1.2. Serializing the Metadata 239 To generate the value for the metadata, the metadata values are 240 encoded in the CBOR Data Format [RFC7049] using the following 241 algorithm: 243 1. Create the raw output map (CBOR major type 5). 245 2. If at least one URL exists, add a CBOR key of 15 (0x0f) to the 246 raw output map with a value that is an array (CBOR major type 4). 248 1. Encode each URL as a CBOR URI (CBOR tag 32) and place it 249 into the array. 251 3. 
If the content type exists, add a CBOR key of 14 (0x0e) to the 252 raw output map with a value that is a UTF-8 byte string (0x6) and 253 the value of the content type. 255 4. If experimental metadata exists, add a CBOR key of 13 (0x0d) and 256 encode it as a map by creating a raw output map (CBOR major type 257 5). For each item in the map, serialize to CBOR where the CBOR 258 major types, the key name, and the value are derived from the 259 input data. For example, a key of "foo" and a value of 200 would 260 be encoded as a CBOR major type of 2 for the key and a CBOR major 261 type of 0 for the value. 263 5. Generate the multibase value by encoding the raw output map using 264 the Multibase Data Format. 266 The example below demonstrates the output of the algorithm above for 267 metadata containing a single URL ("http://example.org/hw.txt") with a 268 content type of "text/plain" expressed using the base-58 Bitcoin 269 base-encoding format: 271 zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF 273 3.1.3. Deserializing the Metadata 275 To deserialize the metadata, the "Serializing the Metadata" algorithm 276 is reversed. Implementers MUST use the following table to 277 deserialize keys to JSON: 279 +-----------+----------------+------------------+ 280 | Key (hex) | JSON key | JSON value | 281 +-----------+----------------+------------------+ 282 | 0x0f | "url" | Array of strings | 283 | 0x0e | "content-type" | string | 284 | 0x0d | "experimental" | JSON Object | 285 +-----------+----------------+------------------+ 287 Table 1: Multihash Algorithms Registry 289 The example below demonstrates the output of the algorithm above for 290 metadata containing a single URL ("http://example.org/hw.txt") with a 291 content type of "text/plain", and an experimental metadata key of 292 "foo" and value of 123: 294 { 295 "url": ["http://example.org/hw.txt"], 296 "content-type": "text/plain", 297 "experimental": { 298 "foo": 123 299 } 300 } 302 3.1.4. 
A Simple Hashlink Example 304 The example below demonstrates a simple hashlink that provides 305 content integrity protection for the "http://example.org/hw.txt" 306 file, which has a content type of "text/plain" (line breaks added for 307 readability purposes): 309 hl: 310 zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e: 311 zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF 313 3.2. Hashlink as a Parameterized URL 315 An algorithm resulting in the same output as the one below MUST be 316 used when encoding the hashlink data model as a set of parameters in 317 a URL: 319 1. Create an empty string and assign it to the output value. 321 2. Append the first URL in the URL metadata array to the output value. 323 3. Append to the output value a URL parameter with a key of "hl" and the value of the 324 resource hash as generated in Section 3.1.1. 326 3.2.1. Hashlink as a Parameterized URL Example 328 The example below demonstrates a simple hashlink that provides 329 content integrity protection for the "http://example.org/hw.txt" 330 file, which has a content type of "text/plain": 332 http://example.org/hw.txt?hl= 333 zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e 335 4. Hashlink Encoders and Decoders 337 Hashlink encoders and decoders MUST support the following core 338 algorithms: 340 1. The SHA-2, 256 bit, 32 byte output cryptographic hashing 341 algorithm and the associated Multihash Data Format. 343 2. The Bitcoin base58-encoding and decoding algorithm and the 344 associated Multibase Data Format. 346 Implementations MAY support algorithms and data formats in addition 347 to the ones listed above. 349 5. Security Considerations 351 This section documents the security attacks that are out of scope for 352 this specification as well as known attacks and mitigations against 353 those attacks. 355 5.1. Insecure Hashing Functions 357 There are a number of insecure cryptographic hashing functions in 358 deployment today. Among these are MD5 and SHA-1. 
Implementers MUST 359 throw an error by default when encoding or decoding these values. 360 Implementers MAY provide a non-default library option to override the 361 error. 363 6. References 365 6.1. Normative References 367 [multibase] 368 Benet, J. and M. Sporny, "The Multibase Data Format", 369 December 2018, . 372 [multihash] 373 Benet, J. and M. Sporny, "The Multihash Data Format", 374 August 2018, . 377 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 378 Requirement Levels", BCP 14, RFC 2119, 379 DOI 10.17487/RFC2119, March 1997, 380 . 382 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 383 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 384 October 2013, . 386 6.2. URIs 388 [1] https://w3c-dvcg.github.io/ 390 [2] https://w3c-ccg.github.io/ 392 [3] https://github.com/w3c-dvcg/hashlink/issues 394 [4] mailto:public-credentials@w3.org 396 Appendix A. Security Considerations 398 There are a number of security considerations to take into account 399 when implementing or utilizing this specification: TBD 401 Appendix B. Test Values 403 The following test values may be used to verify the conformance of 404 Hashlink encoders and decoders. 406 B.1. Simple Hashlink URL 408 The following Hashlink URL encodes the data "Hello World!" served 409 from the "http://example.org/hw.txt" URL with a content type of 410 "text/plain". The resource hash is generated using the SHA-2, 256 411 bit, 32 byte cryptographic algorithm which is then encoded using the 412 base-58 Bitcoin base-encoding format. The metadata options are 413 encoded using the base-58 Bitcoin base-encoding format. The final 414 Hashlink URL is (new lines added for readability purposes): 416 hl: 417 zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e: 418 zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF 420 B.2. Multi-sourced Hashlink URL 422 The following Hashlink URL encodes the data "Hello World!" served 423 from three different networks. 
The first is a standard Web-based URL 424 ("http://example.org/hw.txt"), the second is an IPFS-based URL 425 ("ipfs:/ipfs/QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/hello"), 426 and the third is a Tor-based URL ("http://c4m3g2upq6pkufl4.onion/ 427 hworld.txt"). The resource hash is generated using the SHA-2, 256 428 bit, 32 byte cryptographic algorithm which is then encoded using the 429 base-58 Bitcoin base-encoding format. The metadata options are 430 encoded using the base-58 Bitcoin base-encoding format. The final 431 Hashlink URL is (new lines added for readability purposes): 433 hl: 434 zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e: 435 z333PdTakFeJueF2bim3PaaDqbtqjkpxUc8ETSWXe6dQLWXQWvqiUdw8TJrncx3uKhwfc 436 88MtM5xZbR27FhVRUKv9ogekamVtdE3UbXnXpMRT1AseCtoBUt1NE8x2SsnJxGfiZN45V 437 VSCp6jh4dgcufL16tWrHREiSYESEGP1J75yXCvAdvKPr7nb5aYujLeay8Ww 439 Appendix C. Acknowledgements 441 The editors would like to thank the following individuals for 442 feedback on and implementations of the specification (in alphabetical 443 order): TBD 445 Portions of the work on this specification have been funded by the 446 United States Department of Homeland Security's Science and 447 Technology Directorate under contract HSHQDC-17-C-00019. The content 448 of this specification does not necessarily reflect the position or 449 the policy of the U.S. Government and no official endorsement should 450 be inferred. 452 Author's Address 454 Manu Sporny 455 Digital Bazaar 456 203 Roanoke Street W. 457 Blacksburg, VA 24060 458 US 460 Phone: +1 540 961 4469 461 Email: msporny@digitalbazaar.com 462 URI: http://manu.sporny.org/
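Illustrative sketch (not part of the draft). The resource hash algorithm of Section 3.1.1 and the parameterized URL algorithm of Section 3.2 might be implemented roughly as follows. This is a minimal sketch under stated assumptions: it supports only SHA-2 256 with base58btc, it uses the Multihash code 0x12 for sha2-256 and the Multibase prefix "z" for base58btc, and every function name (base58btc_encode, base58btc_decode, resource_hash, parameterized_hashlink) is hypothetical rather than taken from any Hashlink library. Placing "hl" after any existing query parameters is also an assumption; the draft only says to append the parameter.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Base58 alphabet used by Bitcoin (omits 0, O, I, and l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58btc; each leading zero byte becomes '1'."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = B58_ALPHABET[remainder] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def base58btc_decode(text: str) -> bytes:
    """Inverse of base58btc_encode (used here only for checking)."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    leading_zeros = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_zeros + body


def resource_hash(data: bytes) -> str:
    """Follow steps 1-4 of Section 3.1.1 for SHA-2 256 and base58btc."""
    digest = hashlib.sha256(data).digest()            # step 1: raw hash
    multihash = bytes([0x12, len(digest)]) + digest   # step 2: code, length, digest
    return "z" + base58btc_encode(multihash)          # steps 3-4: multibase output


def parameterized_hashlink(url: str, data: bytes) -> str:
    """Append an 'hl' query parameter to url, as outlined in Section 3.2."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("hl", resource_hash(data)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print("hl:" + resource_hash(b"Hello World!"))
    print(parameterized_hashlink("http://example.org/hw.txt", b"Hello World!"))
```

The printed values can be compared against the resource hash given in Section 3.1.1 and the test value in Appendix B.1. The metadata component is omitted because it requires a CBOR encoder.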