Network Working Group                                        P. Thiemann
Internet-Draft                                       Freiburg University
Category: Informational                                 4 September 2003
Expires: March 4, 2004

     A URN Namespace For Identifiers Based on Cryptographic Hashes
                     draft-thiemann-hash-urn-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 4, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   This document describes a URN namespace to identify immutable, typed
   resources using content-based unique identifiers. The naming scheme
   relies on an algorithm that computes identifiers from media types and
   cryptographic hashes without a central authority.

1. Conventions used in this document

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [RFC2119].

2. Introduction

   A URN serves as a unique name for a resource [RFC1630]. Most URN
   namespaces involve a central authority to ensure uniqueness of
   assigned names. This approach has its merits but it requires
   organizational structures for processing requests for naming and for
   bookkeeping about used names. Thus, acquiring a URN becomes an
   involved task not to be undertaken on a day-to-day basis.

   A URN namespace based on cryptographic hashes enables using and
   creating URNs on a day-to-day basis for storing and retrieving
   immutable resources. It relies on a decentralized, algorithmic
   assignment of identifiers by exploiting the uniqueness guarantees of
   (cryptographic) hashes. This document contains the assignment
   algorithm so that everyone can generate identifiers in this
   namespace.

   The namespace provides identifiers for typed resources with
   application/octet-stream as a default type.

   This namespace specification is for a formal namespace. The
   specification adheres to the guidelines given in "Uniform Resource
   Names (URN) Namespace Definition Mechanisms" [RFC3406].

3. Specification Template

   Namespace ID:

      "hash" requested.

   Registration Information:

      Registration Version Number: 1

      Registration Date: 2003-09-??

   Declared registrant of the namespace:

      The CBUID Project
      Institut fuer Informatik
      Universitaet Freiburg
      Georges-Koehler-Allee 079
      D-79110 Freiburg
      Germany

      Contact:
      Peter Thiemann
      info@cbuid.org

   Declaration of syntactic structure:

      The Namespace Specific Strings (NSS) of all URNs assigned by
      the schema described in this document will conform to the
      syntax defined in section 2.2 of RFC2141 [RFC2141]. The formal
      syntax of the NSS is defined by the following normative ABNF
      [RFC2234] rules for <hash-nss>:

         hash-nss    = [media-type] ":" [hash-scheme] ":" hash-value
         hash-scheme = "md5" / "sha1" / "sha256" / "sha384" / "sha512"
         hash-value  = 1*(ALPHA / DIGIT / ".")

      The following are comments and restrictions not captured by the
      above grammar.

      A <media-type> is any MIME media type [RFC2046] which is
      registered in the appropriate IANA registry [IANA-MT]. There
      is no default for the <media-type> specification. If omitted,
      then the media type is unspecified, thus leaving the
      application complete freedom to interpret the resource.

      If the <hash-scheme> specification is omitted, then the length
      of the <hash-value> unambiguously selects one of "sha1",
      "sha256", "sha384", or "sha512" according to the following
      table.

         length of <hash-value>  | implied <hash-scheme>
         ------------------------+----------------------
                    32           | "sha1"
                    56           | "sha256"
                    80           | "sha384"
                   104           | "sha512"

      A <hash-value> is a non-empty sequence of characters encoding a
      sequence of bits which must be a valid hash for the specified
      hash-scheme. The encoding depends on the <hash-scheme>. If
      <hash-scheme> is "md5", then <hash-value> is the base16
      encoding [RFC3548] of the 16 octets of the MD5 hash value of
      the resource (most significant octet first) so that the
      <hash-value> consists of 32 HEXDIG. If <hash-scheme> is "sha1",
      then <hash-value> is the base32 encoding [RFC3548] of the 20
      octets of the SHA1 hash value of the resource (most significant
      octet first) so that the <hash-value> consists of 32 BASE32DIG.
      The other "sha" <hash-scheme>s are handled analogously according
      to the above table.

      In any case, the <hash-value> MUST provide the correct number
      of bits for the chosen <hash-scheme>: 128 for "md5", 160 for
      "sha1", 256 for "sha256", 384 for "sha384", and 512 for
      "sha512".

      Examples:

         urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72

         urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL

         urn:hash:::JRBFASJWGY3EKRBSKFJVOVSEGNLFGTZVIJDTKURVGRKEKMRSKFGA====

         The implied <hash-scheme> for this identifier is "sha256"
         since the <hash-value> consists of 56 BASE32DIG and specifies
         256 bits.

         urn:hash:text/plain::LBPI666ED2QSWVD3VSO5BG5R54TE22QL

         The implied <hash-scheme> for this identifier is "sha1" since
         the <hash-value> consists of 32 BASE32DIG and specifies 160
         bits.

         urn:hash:message/rfc822:md5:5307d294b6ccd9854f2deed8c1628b72

   Relevant ancillary documentation:

      None as yet.

   Identifier uniqueness considerations:

      Each identifier contains a cryptographic hash value for the
      referenced resource. The probability that two different
      resources have the same hash value depends on the hash
      function. For the MD5 hash, where the hash value has 128 bits,
      it is conjectured [RFC1321] that the probability of a collision
      is on the order of 1/2^64 by reasoning with the birthday
      attack. For the SHA1 hash, where the hash value has 160 bits,
      the same attack yields a probability of 1/2^80 for a collision.

   Identifier persistence considerations:

      The binding between the identifier and the referenced resource
      is permanently established by the assignment algorithm that
      computes the identifier from the resource.

      The persistence of an identifier for some resource A might be
      compromised by coming up with a different resource B with the
      same identifier. However, this corresponds to solving the
      "second preimage problem" for either the MD5 algorithm or an
      algorithm of the SHA family. This problem turns out to be much
      harder than just producing a collision.
      In fact, the Handbook of Applied Cryptography [HAC] estimates
      that computing a second preimage takes on the order of 2^128
      steps for MD5 and 2^160 steps for SHA1.

   Process of identifier assignment:

      Assignment is completely open, following the algorithm below.

      The inputs of the algorithm are
      - the name of a hash function
      - a media type for <media-type>
      - a resource (a sequence of octets)

      The algorithm applies the hash function to the resource,
      converts the resulting bit sequence into a valid <hash-value>
      according to the <hash-scheme>, and constructs the URN by
      concatenating the <media-type>, the <hash-scheme>, and the
      <hash-value> using the syntax described above. Algorithms for
      computing the hash functions mentioned in this document are
      defined in the following references:

         md5     [RFC1321]
         sha1    [RFC3174]
         sha256  [FIPS180-2]
         sha384  [FIPS180-2]
         sha512  [FIPS180-2]

      The conversion of a bit sequence to a string in base16
      encoding proceeds as follows. The bits in the sequence
      are converted from most significant to least significant bit,
      four bits at a time, to their ASCII representation. Each
      sequence of four bits is represented by its hexadecimal digit
      from "0123456789abcdef". That is, binary 0000 is represented by
      the character '0', 0001 by '1', and so on up to the
      representation of 1111 as 'f'.

      The conversion of a bit sequence to a string in base32
      encoding proceeds as follows. The bits in the sequence
      are converted from most significant to least significant bit,
      five bits at a time, to their ASCII representation. Each
      sequence of five bits is represented by its base32 digit from
      "abcdefghijklmnopqrstuvwxyz234567" as defined in [RFC3548].
      That is, binary 00000 is represented by the character 'a',
      00001 by 'b', and so on up to the representation of 11111 as
      '7'. A value whose number of bits is not divisible by five is
      padded with zero bits to the next multiple of five.
      The length of a base32 encoded bit string is always
      divisible by eight. Padding of an incomplete 8 character group
      is done using the character '='.

   Process of identifier resolution:

      Not specified.

   Rules for Lexical Equivalence:

      Lexical equivalence is identity after normalization. An
      identifier in the hash URN namespace is normalized by
      converting all characters to lower case.

   Conformance with URN Syntax:

      There are no additional characters reserved.

   Validation mechanism:

      Each identifier in the namespace MUST conform with the syntax
      specified above.

   Scope:

      The namespace is global and public.

4. IANA Considerations

   This document includes a URN namespace registration that is to be
   entered into the IANA registry for URN NIDs.

5. Namespace Considerations

   Many URN namespaces are assigned to organizations and rely on a
   centralized registry to achieve uniqueness and persistence. In
   contrast, the hash namespace is not tied to any organization.
   Assignment of identifiers can be performed and verified individually,
   while uniqueness is still preserved (with a probability close to 1).

   The hard coding of the hashing schemes into the namespace definition
   is intentional. This is because a valid identifier should be able to
   act as a proxy for the named resource. That way, metainformation
   of descriptive or authoritative nature (such as endorsements,
   signatures, etc.) can be attached to the identifier and need not be
   bundled with the actual resource. Such a proxy functionality is only
   guaranteed as long as the underlying hashing scheme is not
   compromised, that is, as long as no collisions are found.

   The encoding of the hash value is also hard coded into the
   definition. We have chosen not to make the encoding an additional
   parameter of the URN scheme for two reasons:

   1.  it would make identifier normalization non-trivial;

   2.
       each hashing scheme has a standard encoding, which should be
       reflected in the identifier.

   One problem is the phasing out of compromised hash schemes. For
   instance, many believe that MD5 is "not sufficiently secure" on the
   grounds that it only provides 128-bit hashes and that colliding
   inputs have been constructed. However, the only known approach for
   solving the second preimage problem, which appears to be more
   relevant for the application as an identifier, is brute force search
   through on the order of 2^128 inputs.

   If a procedure for computing a second preimage in significantly fewer
   operations is ever published, then resolvers SHOULD refuse to resolve
   the compromised hash scheme. This is in line with the semantics of
   URNs, which need to identify a resource uniquely, but the resource
   need not be available forever (cf. the discussion in BCP 66
   [RFC3406]).

6. Community Considerations

   Similar URNs are in use in peer-to-peer file transfer systems. Most
   of them do not include a media type, although this practice can
   provide extra guarantees. For example, a provider of metainformation
   can state that the media type of the resource has been verified by
   including the media type in the published URN. For many formats, the
   media type provides an additional self-verifiable attribute.

   Some URI schemes in common use may be easily derived from the hash
   scheme.

   1.  The sha1 scheme

          urn:sha1:<hash-value>

       is equivalent to

          urn:hash::sha1:<hash-value>

       and even to

          urn:hash:::<hash-value>

   2.  Another proposed scheme is based on the data URL

          urn:data-hash:text/plain;sha1,<hash-value>

       which is equivalent to

          urn:hash:text/plain:sha1:<hash-value>

       In this case, the identifier from the hash namespace has a
       simpler, more regular structure.

7. Security Considerations

   The use of the namespace per se does not have security implications.
   However, it should be kept in mind that the uniqueness guarantee
   given by cryptographic hashes is only probabilistic and that no known
   procedure (save bitwise comparison) can provide a 100% guarantee of
   the identity of the hashed resource.

Normative References

   [FIPS180-2] National Institute of Standards and Technology,
   "Specifications for the SECURE HASH STANDARD", August 2002.
   http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf

   [RFC1321] Rivest, R. L., "The MD5 Message-Digest Algorithm", RFC
   1321, April 1992.

   [RFC2046] Freed, N., and Borenstein, N., "Multipurpose Internet Mail
   Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

   [RFC2119] Bradner, S., "Key Words for Use in RFCs to Indicate
   Requirement Levels", RFC 2119, March 1997.

   [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.

   [RFC2234] Crocker, D., Editor, and P. Overell, "Augmented BNF for
   Syntax Specifications: ABNF", RFC 2234, November 1997.

   [RFC3174] Eastlake, D., and Jones, P., "US Secure Hash Algorithm 1
   (SHA1)", RFC 3174, September 2001.

   [RFC3548] Josefsson, S. (Ed.), "The Base16, Base32, and Base64 Data
   Encodings", RFC 3548, July 2003.

Informational References

   [HAC] Menezes, Alfred J., van Oorschot, Paul C., and Vanstone, Scott
   A., Handbook of Applied Cryptography, CRC Press, 5th printing, August
   2001.

   [IANA-MT] IANA Registry of Media Types:
   ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/

   [RFC1630] Berners-Lee, T., "Universal Resource Identifiers in WWW,"
   RFC 1630, June 1994.

   [RFC3406] Daigle, L., van Gulik, D.W., Iannella, R., and Faltstrom,
   P., "Uniform Resource Names (URN) Namespace Definition Mechanisms",
   RFC 3406, October 2002.
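
Illustrative Sketch of Identifier Assignment (Non-Normative)

   The assignment algorithm and the implied hash-scheme table of the
   Specification Template can be sketched in a few lines of Python.
   This is a minimal sketch under stated assumptions, not a reference
   implementation: the names make_hash_urn, parse_hash_urn, normalize,
   SCHEMES, and IMPLIED_SCHEME are hypothetical and do not appear in
   the registration. The sketch emits upper-case base32 digits, as the
   examples in the template do; by the rules for lexical equivalence
   this is equivalent to the lower-case form.

```python
import base64
import hashlib

# Hash schemes admitted by the hash-scheme rule of the ABNF.
SCHEMES = ("md5", "sha1", "sha256", "sha384", "sha512")

# Length of a padded base32 hash-value -> implied hash-scheme
# (the table in the Specification Template).
IMPLIED_SCHEME = {32: "sha1", 56: "sha256", 80: "sha384", 104: "sha512"}


def make_hash_urn(resource: bytes, scheme: str = "sha1",
                  media_type: str = "") -> str:
    """Hash the octets, encode the digest, and concatenate the parts."""
    if scheme not in SCHEMES:
        raise ValueError(f"unsupported hash-scheme: {scheme!r}")
    digest = hashlib.new(scheme, resource).digest()
    if scheme == "md5":
        value = digest.hex()  # base16 with lower-case digits
    else:
        # Padded base32; b32encode emits upper-case letters.
        value = base64.b32encode(digest).decode("ascii")
    return f"urn:hash:{media_type}:{scheme}:{value}"


def parse_hash_urn(urn: str) -> tuple[str, str, str]:
    """Split a hash URN into (media-type, hash-scheme, hash-value).

    An omitted hash-scheme is inferred from the hash-value length.
    """
    prefix, nid, nss = urn.split(":", 2)
    if (prefix.lower(), nid.lower()) != ("urn", "hash"):
        raise ValueError("not a hash URN")
    # Media types contain no ":", so the last two colons delimit
    # the hash-scheme and the hash-value.
    media_type, scheme, value = nss.rsplit(":", 2)
    if not scheme:
        if len(value) not in IMPLIED_SCHEME:
            raise ValueError("hash-value length implies no hash-scheme")
        scheme = IMPLIED_SCHEME[len(value)]
    return media_type, scheme, value


def normalize(urn: str) -> str:
    """Normalize for lexical equivalence by lower-casing."""
    return urn.lower()
```

   For the empty resource and the "md5" scheme, make_hash_urn yields
   urn:hash::md5:d41d8cd98f00b204e9800998ecf8427e. Two identifiers
   are lexically equivalent exactly when normalize returns the same
   string for both.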

Contributors

   Stephanie Kollenz

   Matthias Neubauer

Author's Address

   Peter Thiemann
   Institut fuer Informatik
   Universitaet Freiburg
   Georges-Koehler-Allee 079
   D-79110 Freiburg
   Germany

   Phone: +49 761 203 8051
   EMail: thiemann@acm.org
   URL: http://www.informatik.uni-freiburg.de/~thiemann

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.