idnits 2.17.1

draft-hallambaker-udf-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 314 has weird spacing: '...a 47 cc ab fe...'

  == Line 317 has weird spacing: '...9 e0 bd ea 47...'

  -- The document date (September 19, 2016) is 2769 days in the past.  Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2119' is mentioned on line 123, but not defined

  == Missing Reference: 'TBS' is mentioned on line 165, but not defined

  ** Downref: Normative reference to an Informational RFC: RFC 1321

     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                    P. Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended status: Standards Track                      September 19, 2016
Expires: March 23, 2017


                     Uniform Data Fingerprint (UDF)
                        draft-hallambaker-udf-04

Abstract

   This document describes means of generating Uniform Data Fingerprint
   (UDF) values and their presentation as text sequences and as URIs.

   Cryptographic digests provide a means of uniquely identifying static
   data without the need for a registration authority.  A fingerprint is
   a form of presenting a cryptographic digest that makes it suitable
   for use in applications where human readability is required.  The UDF
   fingerprint format improves over existing formats through the
   introduction of a compact algorithm identifier affording an
   intentionally limited choice of digest algorithm and the inclusion of
   an IANA registered MIME Content-Type identifier within the scope of
   the digest input to allow the use of a single fingerprint format in
   multiple application domains.

   Alternative means of rendering fingerprint values are considered
   including machine-readable codes, word and image lists.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 23, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Definitions
     1.1.  Requirements Language
   2.  Introduction
     2.1.  Algorithm Identifier
     2.2.  Content Type Identifier
     2.3.  Representation
     2.4.  Truncation
   3.  Encoding
     3.1.  Binary Fingerprint Value
       3.1.1.  Version ID
     3.2.  Truncation
     3.3.  Base32 Representation
     3.4.  URI Representation
     3.5.  Examples
       3.5.1.  Using SHA-2-512 Digest
       3.5.2.  Using SHA-3-512 Digest
     3.6.  Key Improvement
     3.7.  Work Hardening
   4.  Content Types
     4.1.  PKIX keyInfo
     4.2.  OpenPGP Key
   5.  Additional UDF Renderings
     5.1.  Machine Readable Rendering
     5.2.  Word Lists
     5.3.  Image List
   6.  Security Considerations
     6.1.  Precision
     6.2.  Use of Truncated Digests
   7.  IANA Considerations
     7.1.  URI Registration
     7.2.  Content Type Registration
     7.3.  Version Registry
   8.  Normative References
   Author's Address

1.  Definitions

   Cryptographic Digest Function

   Digest

   Fingerprint

   Hash

   Presentation

   Fingerprint Strengthening

   Fingerprint Work Hardening

   Work Factor

   Content-Type

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Introduction

   The use of cryptographic digest functions to produce identifiers is
   well established as a means of generating a unique identifier for
   fixed data without the need for a registration authority.

   While the use of fingerprints of public keys was popularized by PGP,
   they are employed in many other applications including OpenPGP, SSH,
   BitCoin and PKIX.
   A cryptographic digest is a particular form of hash function that has
   the properties:

      It is easy to compute the digest value for any given message.

      It is infeasible to generate a message from its digest value.

      It is infeasible to modify a message without changing the digest
      value.

      It is infeasible to find two different messages with the same
      digest value.

   If these properties are met, the only way that two data objects can
   map to the same digest value is by random chance.  If the number of
   possible digest values is sufficiently large (i.e., the digest is a
   sufficiently large number of bits in length), this chance is reduced
   to an arbitrarily small probability.  Such values are described as
   being probabilistically unique.

   A fingerprint is a representation of a cryptographic digest value
   optimized for purposes of verification and in some cases data entry.

2.1.  Algorithm Identifier

   Although a secure cryptographic digest algorithm has properties that
   make it ideal for certain types of identifier use, several
   cryptographic digest algorithms have found widespread use, some of
   which have been demonstrated to be insecure.

   For example, the MD5 message digest algorithm [RFC1321] was widely
   used in IETF protocols until it was demonstrated to be vulnerable to
   collision attacks [TBS].

   The secure use of a fingerprint scheme therefore requires the digest
   algorithm to be either fixed or determined by the fingerprint value
   itself.  Otherwise an attacker may be able to use a weak, broken
   digest algorithm to generate a data object matching a fingerprint
   value generated using a strong digest algorithm.

2.2.  Content Type Identifier

   A secure cryptographic digest algorithm provides a digest value that
   is probabilistically unique for a particular byte sequence but does
   not fix the context in which that byte sequence is interpreted.
   While such ambiguity may be tolerated in a fingerprint format
   designed for a single specific field of use, it is not acceptable in
   a general purpose format.

   For example, the SSH and OpenPGP applications both use fingerprints
   as identifiers for public keys, but they use different digest
   algorithms and different data formats for representing the public
   key data.  While no such vulnerability has been demonstrated to
   date, it is certainly conceivable that a crafty attacker might
   construct an SSH key in such a fashion that OpenPGP interprets the
   data in an insecure fashion.  If the number of applications making
   use of a fingerprint format that permits such substitutions is
   sufficiently large, the probability of a semantic substitution
   vulnerability being possible becomes unacceptably large.

   A simple control that defeats such attacks is to incorporate a
   content type identifier within the scope of the data input to the
   hash function.

2.3.  Representation

   The representation of a fingerprint is the format in which it is
   presented to either an application or the user.

   Base32 encoding is used to produce the preferred text representation
   of a UDF fingerprint.  This encoding uses only the letters of the
   Latin alphabet and the digits 2, 3, 4, 5, 6 and 7, which are chosen
   to minimize the risk of ambiguity between numbers and letters.

   To enhance readability and improve data entry, characters are
   presented in groups of five.

2.4.  Truncation

   Different applications of fingerprints demand different tradeoffs
   between compactness of the representation and the number of
   significant bits.  A larger number of significant bits reduces the
   risk of collision but at a cost to convenience.

   Modern cryptographic digest functions such as SHA-2 produce output
   values of at least 256 bits in length.
   This is considerably larger than most uses of fingerprints require
   and certainly greater than can be represented in human readable form
   on a business card.

   Since a strong cryptographic digest function produces an output value
   in which every bit in the input value affects every bit in the output
   value with equal probability, it follows that truncating the digest
   value to produce a fingerprint is at least as strong as any other
   mechanism, provided the digest algorithm used is strong.

   Using truncation to reduce the precision of the digest function has
   the advantage that a lower precision fingerprint of some data content
   is always a prefix of a higher precision fingerprint of the same
   content.  This allows higher precision fingerprints to be converted
   to a lower precision without the need for special tools.

3.  Encoding

   A UDF fingerprint for a given data object is generated by calculating
   the Binary Fingerprint Value for the given data object and type
   identifier, truncating it to obtain the desired degree of precision
   and then converting the truncated value to a representation.

3.1.  Binary Fingerprint Value

   The binary encoding of a fingerprint is calculated using the formula:

      Fingerprint = <Version-ID> + H (<Content-ID> + ':' + H(<Data>))

   Where

      H(x) is the cryptographic digest function

      <Version-ID> is the fingerprint version and algorithm identifier.

      <Content-ID> is the MIME Content-Type of the data.

      <Data> is the binary data.

   The use of the nested hash function permits a fingerprint to be taken
   of data for which a digest value is already known without the need to
   calculate a new digest over the data.

   The inclusion of a MIME content type prevents message substitution
   attacks in which one content type is substituted for another.

3.1.1.
Version ID

   Two digest algorithm identifiers are specified in this document:

      SHA-2-512 = 96

      SHA-3-512 = 144

   These algorithm identifiers have been carefully chosen so that the
   first character in a SHA-2-512 fingerprint will always be 'M' and the
   first character in a SHA-3-512 fingerprint will always be 'S'.  These
   provide mnemonics for 'Merkle-Damgard' and 'Sponge' respectively.

3.2.  Truncation

   The Binary Fingerprint Value is truncated to an integer multiple of
   25 bits regardless of the intended output presentation.

   The Binary Fingerprint Value is truncated to a sequence of n bits by
   first selecting its first n/8 bytes, rounding down.  If n is an
   integer multiple of 8, no additional bits are required and this is
   the result.  Otherwise the remaining bits are taken from the most
   significant bits of the next byte and any unused bits are set to 0.

   For example, to truncate the byte sequence [a0, b1, c2, d3, e4] to 25
   bits: 25/8 = 3 bytes with 1 bit remaining.  The first three bytes of
   the truncated sequence are [a0, b1, c2] and the final byte is d3 AND
   80 = 80, which is appended to the previous result to obtain the final
   truncated sequence [a0, b1, c2, 80].

3.3.  Base32 Representation

   A modified version of Base32 [RFC4648] encoding is used to present
   the fingerprint in text form, grouping the output text into groups of
   five characters separated by a dash '-'.  This representation
   improves the accuracy of both data entry and verification.

3.4.  URI Representation

   Any UDF fingerprint MAY be encoded as a URI by prefixing the Base32
   text representation of the fingerprint with the string 'udf:'.

3.5.  Examples

   In the following examples, <Content-ID> is the UTF-8 encoding of the
   string "text/plain" and <Data> is the UTF-8 encoding of the string
   "UDF Data Value".

      Data = 55 44 46 20 44 61 74 61 20 56 61 6c 75 65

3.5.1.
Using SHA-2-512 Digest

   H(<Data>) =
      48 da 47 cc ab fe a4 5c 76 61 d3 21 ba 34 3e 58
      10 87 2a 03 b4 02 9d ab 84 7c ce d2 22 b6 9c ab
      02 38 d4 e9 1e 2f 6b 36 a0 9e ed 11 09 8a ea ac
      99 d9 e0 bd ea 47 93 15 bd 7a e9 e1 2e ad c4 15

   H(<Content-ID> + ':' + H(<Data>)) =
      45 e0 59 e0 39 34 ea b7 f6 5d 83 b2 d8 f9 b1 6d
      2a 6b 08 63 d9 3c c1 02 86 7b 83 49 f2 d9 f0 8f
      fe 07 87 30 c7 c9 05 74 ac a1 38 2b b3 14 4d c6
      39 f9 8c 12 c0 4a 3e b5 05 0b 3e 67 df 52 4b 57

   Text Presentation (100 bit): MB2GK-6DUF5-YGYYL-JNY5E

   Text Presentation (125 bit): MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ

   Text Presentation (150 bit): MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J

   Text Presentation (250 bit):
      MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J-C4OZQ-5GIN2-GQ7FQ-EEHFI

3.5.2.  Using SHA-3-512 Digest

   [This data intentionally omitted pending publication of the final
   SHA-3 standards document]

3.6.  Key Improvement

3.7.  Work Hardening

4.  Content Types

4.1.  PKIX keyInfo

4.2.  OpenPGP Key

5.  Additional UDF Renderings

   By default, a UDF fingerprint is rendered in the Base32 encoding
   described in this document.  Additional renderings MAY be employed to
   facilitate entry and/or verification of fingerprint values.

5.1.  Machine Readable Rendering

   The use of a machine-readable rendering such as a QR Code allows a
   UDF value to be input directly using a smartphone or other device
   equipped with a camera.

   A QR code fixed to a network capable device might contain the
   fingerprint of a machine readable description of the device.

5.2.  Word Lists

   The use of a Word List to encode fingerprint values was introduced by
   Patrick Juola and Philip Zimmermann for the PGPfone application.  The
   PGP Word List is designed to facilitate exchange and verification of
   fingerprint values in a voice application.  To minimize the risk of
   misinterpretation, two word lists of 256 values each are used to
   encode alternate fingerprint bytes.
   The compact size of the lists used allowed the compilers to curate
   them so as to maximize the phonetic distance of the words selected.

   The PGP Word List is designed to achieve a balance between ease of
   entry and verification.  Applications where only verification is
   required may be better served by a much larger word list, permitting
   shorter fingerprint encodings.

   For example, a word list with 16384 entries permits 14 bits of the
   fingerprint to be encoded at once; a list with 65536 entries permits
   16.  These encodings allow a 125 bit fingerprint to be encoded in 9
   and 8 words respectively.

5.3.  Image List

   An image list is used in the same manner as a word list, affording
   rapid visual verification of a fingerprint value.  For obvious
   reasons, this approach is not generally suited to data entry.

6.  Security Considerations

6.1.  Precision

6.2.  Use of Truncated Digests

7.  IANA Considerations

   [This will be extended later]

7.1.  URI Registration

   [Here a URI registration for the udf: scheme]

7.2.  Content Type Registration

   [PKIX KeyInfo]

   [PGP Key Packet]

7.3.  Version Registry

   96 = SHA-2-512

   144 = SHA-3-512

8.  Normative References

   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
              DOI 10.17487/RFC1321, April 1992.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

Author's Address

   Phillip Hallam-Baker
   Comodo Group Inc.

   Email: philliph@comodo.com
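
Illustrative sketch

   The encoding procedure of Section 3 and the word-count arithmetic of
   Section 5.2 can be sketched in Python as follows.  This is a minimal
   sketch under stated assumptions, not a reference implementation: it
   assumes the ':' separator and operand order of the formula in
   Section 3.1, the unmodified RFC 4648 Base32 alphabet with padding
   removed (the draft says only that the encoding is "modified"), and
   SHA-2-512 as provided by Python's hashlib.  The function names
   truncate_bits, udf_fingerprint and words_needed are hypothetical,
   and the sketch is not claimed to reproduce the values printed in
   Section 3.5.

```python
import base64
import hashlib

VERSION_SHA2_512 = 96  # Version ID for SHA-2-512 (Section 3.1.1)


def truncate_bits(data: bytes, n_bits: int) -> bytes:
    """Keep the first n_bits of data; unused bits of the last byte are 0."""
    full, rem = divmod(n_bits, 8)
    if rem == 0:
        return data[:full]
    mask = (0xFF << (8 - rem)) & 0xFF  # keep the 'rem' most significant bits
    return data[:full] + bytes([data[full] & mask])


def udf_fingerprint(content_type: str, data: bytes, n_bits: int = 125) -> str:
    """Sketch of Section 3: binary value, truncation, Base32 presentation."""
    if n_bits % 25 != 0 or not 25 <= n_bits <= 500:
        raise ValueError("precision must be a multiple of 25 bits, at most 500")
    inner = hashlib.sha512(data).digest()
    # Assumed layout: Content-ID, then ':', then the inner digest.
    outer = hashlib.sha512(content_type.encode("utf-8") + b":" + inner).digest()
    binary = bytes([VERSION_SHA2_512]) + outer
    truncated = truncate_bits(binary, n_bits)
    # Assumption: standard RFC 4648 alphabet, '=' padding dropped.
    text = base64.b32encode(truncated).decode("ascii").rstrip("=")
    text = text[: n_bits // 5]  # one Base32 character per 5 bits
    return "-".join(text[i:i + 5] for i in range(0, len(text), 5))


def words_needed(n_bits: int, list_size: int) -> int:
    """Number of words needed to encode n_bits with a list of list_size words."""
    bits_per_word = list_size.bit_length() - 1  # floor(log2(list_size))
    return -(-n_bits // bits_per_word)          # ceiling division
```

   With these definitions, truncate_bits(bytes.fromhex("a0b1c2d3e4"), 25)
   yields the bytes a0 b1 c2 80, matching the worked example in
   Section 3.2.  Because the version byte is 96, every fingerprint
   returned by udf_fingerprint begins with 'M', and a 100 bit
   fingerprint is a prefix of the 125 bit fingerprint of the same
   content.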