idnits 2.17.1

draft-hallambaker-udf-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([1]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     Since PKIX certificates and CLRs contain security policy information,
     UDF fingerprints used to identify certificates or CRLs SHOULD be
     presented with a minimum of 200 bits of precision.  PKIX applications
     MUST not accept UDF fingerprints specified with less than 200 bits of
     precision for purposes of identifying trust anchors.

  -- The document date (January 6, 2019) is 1908 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 974

     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                    P. Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended status: Informational                           January 6, 2019
Expires: July 10, 2019

                     Uniform Data Fingerprint (UDF)
                        draft-hallambaker-udf-12

Abstract

   This document describes means of generating Uniform Data Fingerprint
   (UDF) values and their presentation as text sequences and as URIs.
   Uses of UDF fingerprints include but are not limited to creating
   Strong Internet Names (SINs).

   Cryptographic digests provide a means of uniquely identifying static
   data without the need for a registration authority.  A fingerprint is
   a form of presenting a cryptographic digest that makes it suitable
   for use in applications where human readability is required.  The UDF
   fingerprint format improves over existing formats through the
   introduction of a compact algorithm identifier affording an
   intentionally limited choice of digest algorithm and the inclusion of
   an IANA registered MIME Content-Type identifier within the scope of
   the digest input to allow the use of a single fingerprint format in
   multiple application domains.

   Alternative means of rendering fingerprint values are considered
   including machine-readable codes, word and image lists.

   This document is also available online at
   http://mathmesh.com/Documents/draft-hallambaker-udf.html [1].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 10, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Algorithm Identifier
     1.2.  Content Type Identifier
     1.3.  Representation
     1.4.  Truncation
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Defined Terms
     2.3.  Related Specifications
     2.4.  Implementation Status
   3.  UDF Fingerprint
     3.1.  Binary Fingerprint Value
       3.1.1.  Version ID
     3.2.  Truncation
     3.3.  Base32 Representation
     3.4.  Example Encoding
       3.4.1.  Using SHA-2-512 Digest
       3.4.2.  Using SHA-3-512 Digest
     3.5.  Fingerprint Improvement
     3.6.  Compressed Presentation
       3.6.1.  Example of Compressed Encoding.
   4.  UDF Keyed Fingerprint
   5.  Content Types
     5.1.  PKIX Certificates and Keys
     5.2.  OpenPGP Key
     5.3.  DNSSEC
   6.  URI Scheme
     6.1.  Scheme Syntax
     6.2.  Scheme Semantics
     6.3.  Encoding considerations
     6.4.  Interoperability considerations
     6.5.  Security considerations
   7.  Additional UDF Renderings
     7.1.  Machine Readable Rendering
     7.2.  Word Lists
     7.3.  Image List
   8.  Security Considerations
     8.1.  Work Factor and Precision
     8.2.  Semantic Substitution
   9.  IANA Considerations
     9.1.  URI Registration
     9.2.  Version Registry
   10. References
     10.1.  Normative References
     10.2.  Informative References
     10.3.  URIs
   Author's Address

1.  Introduction

   The use of cryptographic digest functions to produce identifiers is
   well established as a means of generating a unique identifier for
   fixed data without the need for a registration authority.

   While the use of fingerprints of public keys was popularized by PGP,
   they are employed in many other applications including OpenPGP, SSH,
   Bitcoin and PKIX.

   A cryptographic digest is a particular form of hash function that has
   the properties:

   o  It is easy to compute the digest value for any given message.

   o  It is infeasible to generate a message from its digest value.

   o  It is infeasible to modify a message without changing the digest
      value.

   o  It is infeasible to find two different messages with the same
      digest value.

   If these properties are met, the only way that two data objects can
   map to the same digest value is by random chance.  If the number of
   possible digest values is sufficiently large (i.e. the digest is a
   sufficiently large number of bits in length), this chance is reduced
   to an arbitrarily small probability.  Such values are described as
   being probabilistically unique.

   A fingerprint is a representation of a cryptographic digest value
   optimized for purposes of verification and in some cases data entry.

1.1.  Algorithm Identifier

   Although a secure cryptographic digest algorithm has properties that
   make it ideal for certain types of identifier use, several
   cryptographic digest algorithms have found widespread use, some of
   which have been demonstrated to be insecure.

   For example, the MD5 message digest algorithm [RFC1321] was widely
   used in IETF protocols until it was demonstrated to be vulnerable to
   collision attacks [Dobertin95].

   The secure use of a fingerprint scheme therefore requires the digest
   algorithm to either be fixed or otherwise determined by the
   fingerprint value itself.  Otherwise an attacker may be able to use a
   weak, broken digest algorithm to generate a data object matching a
   fingerprint value generated using a strong digest algorithm.

   The two digest algorithms currently used in the UDF scheme are both
   believed to be strong.  These are SHA-2-512 [SHA-2] and SHA-3-512
   [SHA-3].  The most secure, 512 bit version of the algorithm is used
   in both cases although the output is almost invariably truncated to a
   shorter length.  Use of the strongest version of the algorithm in
   every circumstance eliminates the need to negotiate the algorithm
   strength.

1.2.  Content Type Identifier

   A secure cryptographic digest algorithm provides a digest value that
   is probabilistically unique for a particular byte sequence but does
   not fix the context in which a byte sequence is interpreted.  While
   such ambiguity may be tolerated in a fingerprint format designed for
   a single specific field of use, it is not acceptable in a general
   purpose format.

   For example, the SSH and OpenPGP applications both make use of
   fingerprints as identifiers for the public keys used, but use
   different digest algorithms and data formats for representing the
   public key data.  While no such vulnerability has been demonstrated
   to date, it is certainly conceivable that a crafty attacker might
   construct an SSH key in such a fashion that OpenPGP interprets the
   data in an insecure fashion.  If the number of applications making
   use of a fingerprint format that permits such substitutions is
   sufficiently large, the probability of a semantic substitution
   vulnerability being possible becomes unacceptably large.

   A simple control that defeats such attacks is to incorporate a
   content type identifier within the scope of the data input to the
   hash function.

1.3.  Representation

   The representation of a fingerprint is the format in which it is
   presented to either an application or the user.

   Base32 encoding is used to produce the preferred text representation
   of a UDF fingerprint.  This encoding uses only the letters of the
   Latin alphabet together with digits chosen to minimize the risk of
   ambiguity between digits and letters (2, 3, 4, 5, 6 and 7).

   To enhance readability and improve data entry, characters are
   presented in groups of five.

1.4.  Truncation

   Different applications of fingerprints demand different tradeoffs
   between compactness of the representation and the number of
   significant bits.  A larger number of significant bits reduces the
   risk of collision but at a cost to convenience.

   Modern cryptographic digest functions such as SHA-2 produce output
   values of at least 256 bits in length.  This is considerably larger
   than most uses of fingerprints require and certainly greater than can
   be represented in human readable form on a business card.

   Since a strong cryptographic digest function produces an output value
   in which every bit in the input value affects every bit in the output
   value with equal probability, it follows that truncating the digest
   value to produce a fingerprint is at least as strong as any other
   mechanism if the digest algorithm used is strong.

   Using truncation to reduce the precision of the digest function has
   the advantage that a lower precision fingerprint of some data content
   is always a prefix of a higher precision fingerprint of the same
   content.  This allows higher precision fingerprints to be converted
   to a lower precision without the need for special tools.

2.  Definitions

   This section presents the related specifications and standards, the
   terms that are used as terms of art within the documents and the
   terms used as requirements language.

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.2.  Defined Terms

   Cryptographic Digest Function

      A hash function that has the properties required for use as a
      cryptographic hash function.  These include collision resistance,
      first pre-image resistance and second pre-image resistance.

   Content Type  An identifier indicating how a Data Value is to be
      interpreted as specified in the IANA registry Media Types.

   Commitment  A cryptographic primitive that allows one to commit to a
      chosen value while keeping it hidden from others, with the ability
      to reveal the committed value later.

   Data Value  The binary octet stream that is the input to the digest
      function used to calculate a digest value.

   Data Object  A Data Value and its associated Content Type.

   Digest Algorithm  A synonym for Cryptographic Digest Function.

   Digest Value  The output of a Cryptographic Digest Function.

   Data Digest Value  The output of a Cryptographic Digest Function for
      a given Data Value input.

   Fingerprint  A presentation of the digest value of a data value or
      data object.

   Fingerprint Presentation  The representation of at least some part of
      a fingerprint value in human or machine readable form.

   Fingerprint Improvement  The practice of recording a higher precision
      presentation of a fingerprint on successful validation.

   Fingerprint Work Hardening  The practice of generating a sequence of
      fingerprints until one is found that matches criteria that permit
      a compressed presentation form to be used.  The compressed
      fingerprint is thus shorter than an uncompressed one but presents
      the same work factor.

   Hash  A function which takes an input and returns a fixed-size
      output.  Ideally, the output of a hash function is unbiased and
      not correlated to the outputs returned for similar inputs in any
      predictable fashion.

   Precision  The number of significant bits provided by a Fingerprint
      Presentation.

   Work Factor  A measure of the computational effort required to
      perform an attack against some security property.

2.3.  Related Specifications

   This specification makes use of Base32 [RFC4648] encoding, SHA-2
   [SHA-2] and SHA-3 [SHA-3] digest functions in the derivation of basic
   fingerprints.  The derivation of keyed fingerprints additionally
   requires the use of the HMAC [RFC2014] and HKDF [RFC5869] functions.

   UDFs are used in the definition of Strong Internet Names
   [hallambaker-sin].

2.4.  Implementation Status

   The implementation status of the reference code base is described in
   the companion document [draft-hallambaker-mesh-developer].

3.  UDF Fingerprint

   A UDF fingerprint for a given data object is generated by calculating
   the Binary Fingerprint Value for the given data object and type
   identifier, truncating it to obtain the desired degree of precision
   and then converting the truncated value to a representation.

3.1.  Binary Fingerprint Value

   The binary encoding of a fingerprint is calculated using the formula:

   Fingerprint = <Version-ID> + H (<Content-ID> + ':' + H(<Data>))

                                Figure 1

   Where

   H(x)          is the cryptographic digest function.
   <Version-ID>  is the fingerprint version and algorithm identifier.
   <Content-ID>  is the MIME Content-Type of the data.
   <Data>        is the binary data.

                                Figure 2

   The use of the nested hash function permits a fingerprint to be taken
   of data for which a digest value is already known without the need to
   calculate a new digest over the data.
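
   As a rough illustration of the formula in Figure 1, the following
   Python sketch computes the binary fingerprint value before any
   truncation.  It assumes the formula reads as the version byte
   followed by H(Content-ID + ':' + H(Data)), and it takes the version
   identifiers 96 (SHA-2-512) and 144 (SHA-3-512) from Table 1.  The
   function name binary_fingerprint and its parameters are hypothetical
   and are not part of this specification.

```python
import hashlib

# Version identifiers as listed in Table 1 of this document.
SHA2_512 = 96
SHA3_512 = 144


def binary_fingerprint(data: bytes, content_type: str,
                       version_id: int = SHA2_512) -> bytes:
    """Return version byte + H(content_type + ':' + H(data)), untruncated.

    Hypothetical helper: a sketch of Figure 1, not the reference code.
    """
    if version_id == SHA2_512:
        h = hashlib.sha512
    elif version_id == SHA3_512:
        h = hashlib.sha3_512
    else:
        raise ValueError("unsupported version identifier")
    inner = h(data).digest()  # digest of the data alone
    outer = h(content_type.encode("utf-8") + b":" + inner).digest()
    return bytes([version_id]) + outer
```

   Because the outer digest only consumes H(Data), a caller that already
   holds the inner digest could compute the outer step directly instead
   of rehashing the data.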

   The inclusion of a MIME content type prevents message substitution
   attacks in which one content type is substituted for another.

3.1.1.  Version ID

   A Version Identifier consists of a single byte.  The following digest
   algorithm identifiers are specified in this document:

   +------------+------------------------+-------------------+
   | Version ID | Algorithm              | Reference         |
   +------------+------------------------+-------------------+
   | 80         | HMAC-SHA-2-512         |                   |
   | 96         | SHA-2-512              |                   |
   | 97-101     | SHA-2-512 (compressed) |                   |
   | 136        | Random data            |                   |
   | 144        | SHA-3-512              |                   |
   | 145-149    | SHA-3-512 (compressed) |                   |
   +------------+------------------------+-------------------+

                                 Table 1

   These algorithm identifiers have been chosen so that the first
   character in a SHA-2-512 fingerprint will always be 'M' and the first
   character in a SHA-3-512 fingerprint will always be 'S'.  These
   provide mnemonics for 'Merkle-Damgard' and 'Sponge' respectively.

   The first character of a keyed fingerprint will be 'K', a mnemonic
   for 'Keyed'.

   The version id 136 is used to identify random data such as nonce
   values with a first character mnemonic of 'R'.  While such data is
   typically generated using a digest function, there is no need to
   specify which one was used.

3.2.  Truncation

   The Binary Fingerprint Value is truncated to an integer multiple of
   25 bits regardless of the intended output presentation.

   The output of the hash function is truncated to a sequence of n bits
   by first selecting the first n/8 bytes (rounded down) of the output.
   If n is an integer multiple of 8, no additional bits are required and
   this is the result.  Otherwise the remaining bits are taken from the
   most significant bits of the next byte and any unused bits are set
   to 0.

   For example, consider truncating the byte sequence [a0, b1, c2, d3,
   e4] to 25 bits.  Since 25/8 = 3 bytes with 1 bit remaining, the first
   three bytes of the truncated sequence are [a0, b1, c2] and the final
   byte is d3 AND 80 = 80, which is appended to the previous result to
   obtain the final truncated sequence [a0, b1, c2, 80].

3.3.  Base32 Representation

   A modified version of Base32 [RFC4648] encoding is used to present
   the fingerprint in text form, grouping the output text into groups of
   five characters separated by a dash '-'.  This representation
   improves the accuracy of both data entry and verification.

3.4.  Example Encoding

   In the following examples, <Content-ID> is the UTF8 encoding of the
   string "text/plain" and <Data> is the UTF8 encoding of the string
   "UDF Data Value".

   Data =
     55 44 46 20  44 61 74 61  20 56 61 6C  75 65

   ContentType =
     74 65 78 74  2F 70 6C 61  69 6E

                                Figure 3

3.4.1.  Using SHA-2-512 Digest

   H(<Data>) =
     48 DA 47 CC  AB FE A4 5C  76 61 D3 21  BA 34 3E 58
     10 87 2A 03  B4 02 9D AB  84 7C CE D2  22 B6 9C AB
     02 38 D4 E9  1E 2F 6B 36  A0 9E ED 11  09 8A EA AC
     99 D9 E0 BD  EA 47 93 15  BD 7A E9 E1  2E AD C4 15

   <Content-ID> + ':' + H(<Data>) =
     74 65 78 74  2F 70 6C 61  69 6E 3A 48  DA 47 CC AB
     FE A4 5C 76  61 D3 21 BA  34 3E 58 10  87 2A 03 B4
     02 9D AB 84  7C CE D2 22  B6 9C AB 02  38 D4 E9 1E
     2F 6B 36 A0  9E ED 11 09  8A EA AC 99  D9 E0 BD EA
     47 93 15 BD  7A E9 E1 2E  AD C4 15

   H(<Content-ID> + ':' + H(<Data>)) =
     C6 AF B7 C0  FE BE 04 E5  AE 94 E3 7B  AA 5F 1A 40
     5B A3 CE CC  97 4D 55 C0  9E 61 E4 B0  EF 9C AE F9
     EB 83 BB 9D  5F 0F 39 F6  5F AA 06 DC  67 2A 67 71
     4F FF 8F 83  C4 55 38 36  38 AE 42 7A  82 9C 85 BB

   Prefixed, compressed, trimmed =
     60 C6 AF B7  C0 FE BE 04  E5 AE 94 E3  7B AA 5F 1A
     40 ...

                                Figure 4

   The 125 bit fingerprint value is MDDK7-N6A72-7AJZN-OSTRX-XKS7D

   This fingerprint MAY be specified with higher or lower precision as
   appropriate.
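
   The truncation and Base32 steps of Sections 3.2 and 3.3 can be
   sketched in Python as follows.  This is a minimal illustration rather
   than the reference implementation: the helper names truncate_bits and
   present are hypothetical, and the sketch assumes the standard
   [RFC4648] Base32 alphabet with padding dropped and a dash inserted
   between groups of five characters.

```python
import base64


def truncate_bits(value: bytes, n: int) -> bytes:
    """Keep the first n bits of value; unused bits of the last byte are 0."""
    if n > 8 * len(value):
        raise ValueError("not enough input bits")
    full, rem = divmod(n, 8)
    out = bytearray(value[:full + (1 if rem else 0)])
    if rem:
        # Keep only the 'rem' most significant bits of the final byte.
        out[-1] &= (0xFF << (8 - rem)) & 0xFF
    return bytes(out)


def present(prefixed: bytes, bits: int = 125) -> str:
    """Render the first 'bits' bits (a multiple of 25) as grouped Base32.

    'prefixed' is the version byte followed by the digest output.
    """
    if bits <= 0 or bits % 25:
        raise ValueError("precision must be a positive multiple of 25 bits")
    text = base64.b32encode(truncate_bits(prefixed, bits)).decode("ascii")
    text = text.rstrip("=")[: bits // 5]  # 5 bits per Base32 character
    return "-".join(text[i:i + 5] for i in range(0, len(text), 5))
```

   Applied to the "Prefixed, compressed, trimmed" bytes of Figure 4,
   present(..., 125) yields MDDK7-N6A72-7AJZN-OSTRX-XKS7D, and
   truncate_bits reproduces the [a0, b1, c2, 80] result of the example
   in Section 3.2.  Since truncation only discards trailing bits, a
   lower precision presentation is always a prefix of a higher one.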

   100 bit precision  MDDK7-N6A72-7AJZN-OSTRX

   150 bit precision  MDDK7-N6A72-7AJZN-OSTRX-XKS7D-JAFXI

   200 bit precision  MDDK7-N6A72-7AJZN-OSTRX-XKS7D-JAFXI-6OZSL-U2VOA

   250 bit precision  MDDK7-N6A72-7AJZN-OSTRX-XKS7D-JAFXI-6OZSL-U2VOA-TZQ6J-MHPTS

3.4.2.  Using SHA-3-512 Digest

   H(<Data>) =
     6D 2E CF E6  93 5A 0C FC  F2 A9 1A 49  E0 0C D8 07
     A1 4E 70 AB  72 94 6E CC  BB 47 48 F1  8E 41 49 95
     07 1D F3 6E  0D 0C 8B 60  39 C1 8E B4  0F 6E C8 08
     65 B4 C4 45  9B A2 7E 97  74 7B BE 68  BC A8 C2 17

   <Content-ID> + ':' + H(<Data>) =
     74 65 78 74  2F 70 6C 61  69 6E 3A 6D  2E CF E6 93
     5A 0C FC F2  A9 1A 49 E0  0C D8 07 A1  4E 70 AB 72
     94 6E CC BB  47 48 F1 8E  41 49 95 07  1D F3 6E 0D
     0C 8B 60 39  C1 8E B4 0F  6E C8 08 65  B4 C4 45 9B
     A2 7E 97 74  7B BE 68 BC  A8 C2 17

   H(<Content-ID> + ':' + H(<Data>)) =
     8A 86 8A 06  1C 54 6E 7E  3F 75 5F 39  88 F9 FD 2F
     8E C8 45 93  1B 80 A8 2F  29 16 7B A3  BE 21 1F 8A
     75 61 88 A1  D5 7F 07 D5  9D 68 A4 2D  17 F4 4D 23
     F9 E4 0B B2  1A 8D B9 F5  8D FC EC BD  01 F4 37 7C

   Prefixed, compressed, trimmed =
     90 8A 86 8A  06 1C 54 6E  7E 3F 75 5F  39 88 F9 FD
     2F ...

                                Figure 5

   The 125 bit fingerprint value is SCFIN-CQGDR-KG47R-7OVPT-TCHZ7

3.5.  Fingerprint Improvement

   Since an application must always calculate the full fingerprint value
   as part of the verification process, an application MAY accept a low
   precision (e.g. 100 bit) fingerprint value from the user and replace
   it with a higher precision fingerprint (e.g. 250 bits) after
   verification.

   Applications are encouraged to make use of the practice of
   fingerprint improvement wherever possible.

3.6.  Compressed Presentation

   Fingerprint compression permits the use of shorter fingerprint
   presentation without a reduction in the attacker work factor by
   requiring the fingerprint value to match a particular pattern.

   UDF fingerprints MUST use compression if possible.  A compressed
   fingerprint uses a version identifier that specifies the form of
   compression used as follows:

   +------------+-----------+-------------------------+
   | Version ID | Algorithm | Compression             |
   +------------+-----------+-------------------------+
   | 96         | SHA-2-512 | None                    |
   | 97         | SHA-2-512 | First 24 bits are zeros |
   | 98         | SHA-2-512 | First 32 bits are zeros |
   | 99         | SHA-2-512 | First 40 bits are zeros |
   | 100        | SHA-2-512 | First 48 bits are zeros |
   | 101        | SHA-2-512 | First 56 bits are zeros |
   | 144        | SHA-3-512 | None                    |
   | 145        | SHA-3-512 | First 24 bits are zeros |
   | 146        | SHA-3-512 | First 32 bits are zeros |
   | 147        | SHA-3-512 | First 40 bits are zeros |
   | 148        | SHA-3-512 | First 48 bits are zeros |
   | 149        | SHA-3-512 | First 56 bits are zeros |
   +------------+-----------+-------------------------+

                                 Table 2

   The compression prefixes are all multiples of 8 bits for ease of
   implementation.

   Currently, 24 bit compression may be achieved on commodity machines
   with modest impact on key generation, allowing use of a 100 bit (i.e.
   20 character) presentation for a slight reduction in work factor.
   Use of 40 bit compression has a noticeable impact, but can still be
   achieved within hours without the use of special purpose hardware
   (e.g. a GPU).  Use of 48 bit compression is feasible with a GPU, and
   use of 56 bit compression, which would allow a fingerprint to be
   shortened by ten significant characters with increased work factor,
   is on the outer edge of practicality.  While support for even higher
   levels of compression is conceivable, it is probably not very
   sensible.

   Support for compression may introduce perverse incentives such as
   performing key generation on machines that are less secure but offer
   fast (or cheap) processing power.  An attacker might even offer to
   generate public key pairs for free using their 'ultra fast' machine.
   For this reason, it is probably desirable to at least support, if not
   mandate, the use of some sort of salting scheme when compression is
   in use.  This allows the key to be generated in secure, trusted
   hardware, with only the discovery of a salt providing the desired
   compression being performed on less trusted or untrusted devices.
   Such approaches are outside the scope of this specification and
   certain implementations may be subject to intellectual property
   claims.

3.6.1.  Example of Compressed Encoding.

   The string "290668103" has a SHA-2-512 UDF fingerprint with 29
   leading zero bits.  The inputs to the fingerprint are:

   Data =
     32 39 30 36  36 38 31 30  33

   ContentType =
     74 65 78 74  2F 70 6C 61  69 6E

   H (<Content-ID> + ':' + H(<Data>)) =

     AF ED 7C 65  22 CD 97 28  C9 1F AA D8  23 B6 0A 7C
     1F 5B DB 51  D1 25 FE 15  FB DC 13 4D  54 80 67 3E
     FA 91 5E F8  B1 57 AC A2  5A E5 EE D5  E9 AC B9 EE
     1B 43 F1 23  2B F8 2E 01  EA 7F 34 24  47 FF 2C 13

                                Figure 6

   Since the first three bytes of the final hash value are zeros, these
   are dropped and the version identifier increased by 1:

   H(<Data>) =
     61 31 26 83  80 72 8F BF  C5 86 94 57  28 E6 82 E9
     7F 7D 2D 99  34 88 C1 7B  5A 26 D8 C2  B0 22 45 07
     1C 2A 76 16  AC F7 C7 66  BA 66 26 E8  B4 65 84 51
     8C BE B3 87  DF B7 7B 05  B4 69 BE 9C  BB AF C9 F3

   <Content-ID> + ':' + H(<Data>) =
     74 65 78 74  2F 70 6C 61  69 6E 3A 61  31 26 83 80
     72 8F BF C5  86 94 57 28  E6 82 E9 7F  7D 2D 99 34
     88 C1 7B 5A  26 D8 C2 B0  22 45 07 1C  2A 76 16 AC
     F7 C7 66 BA  66 26 E8 B4  65 84 51 8C  BE B3 87 DF
     B7 7B 05 B4  69 BE 9C BB  AF C9 F3

   H(<Content-ID> + ':' + H(<Data>)) =
     00 00 00 3B  AD 4A E2 93  42 5C 6C E9  08 00 3D 4A
     84 95 34 BB  CE C6 6C AC  9E C1 E0 C0  72 E2 43 5D
     CA 91 F7 84  93 E5 66 BC  DE BC 93 F9  FA 52 27 98
     86 DA EC CE  A1 4D EE 55  C0 27 15 41  6B 30 8F 0E

   Prefixed, compressed, trimmed =
     61 3B AD 4A  E2 93 42 5C  6C E9 08 00  3D 4A 84 95
     34 ...

                                Figure 7

   The 125 bit fingerprint value is ME522-SXCSN-BFY3H-JBAAD-2SUES

   Note that the use of compression does not reduce the number of
   characters presented.  Compression increases the work factor that is
   achieved for a given fingerprint length but does not in itself cause
   the presentation to be changed.

   The 125 bit UDF of the string "44870804" using SHA-3-512 is
   SETHM-SHUAF-R7L7V-HRIEW-MQ5KT.

4.  UDF Keyed Fingerprint

   A Keyed UDF is a fingerprint derived using a Message Authentication
   Code rather than a digest.

   The inputs to the fingerprint function are:

   <Data>  The content data.

   <Content-ID>  The IANA content identifier.

   <Version-ID>  Identifies the content digest and MAC algorithms.

   <KeyText>  The key in text form.

   The fingerprint value is

   Fingerprint = <Version-ID> +
       MAC (<Key>, <Content-ID> + ':' + H(<Data>))

                                Figure 8

   Where the value <Key> is calculated as follows:

   IKM = UTF8 (Key)
   PRK = MAC (UTF8 ("KeyedUDFMaster"), IKM)
   OKM = HKDF-Expand(PRK, UTF8 ("KeyedUDFExpand"), HashLen)

                                Figure 9

   Where the function UTF8(string) converts a string to the binary UTF8
   representation, HKDF-Expand is as defined in [RFC5869] and the
   function MAC(k,m) is the HMAC function formed from the specified hash
   H(m) as specified in [RFC2014].

   Keyed UDFs are typically used in circumstances where user interaction
   requires a cryptographic commitment type functionality.

   In the following example, <Content-ID> is the UTF8 encoding of the
   string "text/plain" and <Data> is the UTF8 encoding of the string
   "Konrad is the traitor".  The randomly chosen key is
   RBQ26-MEZGP-4SVCU-RYOWO-QTURA.
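
   Before turning to the worked values, the keyed derivation of Figures
   8 and 9 can be sketched in Python.  The sketch rests on assumptions
   that the text does not state outright: that the key passed to the
   final MAC is the OKM value, that the key text is encoded with its
   dashes intact, and that HKDF-Expand with an output length equal to
   HashLen reduces to its first block T(1) as defined in [RFC5869].  The
   function names derive_key and keyed_fingerprint are hypothetical, and
   the output has not been checked against the values in Figure 11.

```python
import hashlib
import hmac

KEYED_SHA2_512 = 80  # Version ID for HMAC-SHA-2-512 in Table 1


def derive_key(key_text: str) -> bytes:
    """Assumed reading of Figure 9 for SHA-2-512 (HashLen = 64 bytes)."""
    ikm = key_text.encode("utf-8")
    # PRK = MAC(UTF8("KeyedUDFMaster"), IKM): the label is the HMAC key.
    prk = hmac.new(b"KeyedUDFMaster", ikm, hashlib.sha512).digest()
    # HKDF-Expand with L = HashLen is T(1) = HMAC(PRK, info || 0x01).
    return hmac.new(prk, b"KeyedUDFExpand" + b"\x01", hashlib.sha512).digest()


def keyed_fingerprint(data: bytes, content_type: str, key_text: str) -> bytes:
    """Return version byte + MAC(key, content_type + ':' + H(data))."""
    inner = hashlib.sha512(data).digest()
    message = content_type.encode("utf-8") + b":" + inner
    mac = hmac.new(derive_key(key_text), message, hashlib.sha512).digest()
    return bytes([KEYED_SHA2_512]) + mac
```

   The result would then be truncated and presented in the same manner
   as an unkeyed fingerprint; the leading byte 80 is what gives a keyed
   fingerprint its initial 'K'.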
648 Data = 649 4B 6F 6E 72 61 64 20 69 73 20 74 68 65 20 74 72 650 61 69 74 6F 72 652 ContentType = 653 74 65 78 74 2F 70 6C 61 69 6E 655 Key = 656 52 42 51 32 36 2D 4D 45 5A 47 50 2D 34 53 56 43 657 55 2D 52 59 4F 57 4F 2D 51 54 55 52 41 659 Figure 10 661 Processing is performed in the same manner as an unkeyed fingerprint: 663 H() = 664 93 FC DA F9 FA FD 1E 26 50 26 C3 C1 28 43 40 73 665 D8 BC 3D 62 87 73 2B 73 B8 EC 93 B6 DE 80 FF DA 666 70 0A D1 CE E8 F4 36 68 EF 4E 71 63 41 53 91 5C 667 CE 8C 5C CE C7 9A 46 94 6A 35 79 F9 33 70 85 01 669 + ':' + H() = 670 74 65 78 74 2F 70 6C 61 69 6E 3A 93 FC DA F9 FA 671 FD 1E 26 50 26 C3 C1 28 43 40 73 D8 BC 3D 62 87 672 73 2B 73 B8 EC 93 B6 DE 80 FF DA 70 0A D1 CE E8 673 F4 36 68 EF 4E 71 63 41 53 91 5C CE 8C 5C CE C7 674 9A 46 94 6A 35 79 F9 33 70 85 01 676 PRK(Key) = 677 77 0B FA BC 7D AB 3C EF 4F 13 3D 3F BC D8 CE 89 678 CC A2 89 10 F0 93 D4 44 D0 45 EA 23 AB AB C0 8E 679 9D 6F CB EB 37 EC EA DB B6 04 B3 1F 61 02 3B 9A 680 B8 29 48 45 36 9D 78 AC D6 DA 42 36 79 13 E9 51 682 HKDF(Key) = 683 7A 10 08 F5 9F 46 3C FF 09 7F 8E 59 41 FB 9B 22 684 28 FF 7E C5 A4 1D 01 11 18 A1 EC A9 DD A4 1D 48 685 29 6A B8 C9 98 7C 13 C9 15 74 C4 16 1A AA 6E 94 686 09 46 7F F7 88 84 15 A0 85 6F E5 19 82 06 20 58 688 MAC(, + ':' + H()) = 689 87 5F 7C 18 D7 D8 2C E4 CB D6 58 6D C0 7B 8B DC 690 C9 E4 7F 79 0B 7E 3E 13 63 EC 86 C4 AB 36 6D 78 691 74 D2 C0 D5 B9 A5 33 AB EE CA 4A 70 30 45 D9 D6 692 63 08 E0 5C 85 1B 1B C9 69 D0 55 6E 8A E0 2C 8D 694 Prefixed, compressed, trimmed = 695 50 87 5F 7C 18 D7 D8 2C E4 CB D6 58 6D C0 7B 8B 696 DC ... 698 Figure 11 700 The 125 bit fingerprint value is KCDV6-7AY27-MCZZG-L2ZMG-3QD3R 702 5. Content Types 704 While a UDF fingerprint MAY be used to identify any form of static 705 data, the use of a UDF fingerprint to identify a public key signature 706 key provides a level of indirection and thus the ability to identify 707 dynamic data. 
The content types used to identify public keys are 708 thus of particular interest. 710 As described in the security considerations section, the use of 711 fingerprints to identify a bare public key and the use of 712 fingerprints to identify a public key and associated security policy 713 information are very different. 715 5.1. PKIX Certificates and Keys 717 UDF fingerprints MAY be used to identify PKIX certificates, CRLs and 718 public keys in the ASN.1 encoding used in PKIX certificates. 720 Since PKIX certificates and CLRs contain security policy information, 721 UDF fingerprints used to identify certificates or CRLs SHOULD be 722 presented with a minimum of 200 bits of precision. PKIX applications 723 MUST not accept UDF fingerprints specified with less than 200 bits of 724 precision for purposes of identifying trust anchors. 726 PKIX certificates, keys and related content data are identified by 727 the following content types: 729 application/pkix-cert A PKIX Certificate 731 application/pkix-crl A PKIX CRL 733 application/pkix-keyinfo The KeyInfo structure defined in the PKIX 734 certificate specification 736 5.2. OpenPGP Key 738 OpenPGPv5 keys and key set content data are identified by the 739 following content types: 741 application/pgp-key-v5 An OpenPGP key 743 application/pgp-keys An OpenPGP key set. 745 5.3. DNSSEC 747 DNSSEC record data consists of DNS records which are identified by 748 the following content type: 750 application/dns A DNS resource record in binary format 752 6. URI Scheme 754 [RFC6920] . 756 6.1. Scheme Syntax 758 6.2. Scheme Semantics 760 6.3. Encoding considerations 762 6.4. Interoperability considerations 764 6.5. Security considerations 766 7. Additional UDF Renderings 768 By default, a UDF fingerprint is rendered in the Base32 encoding 769 described in this document. Additional renderings MAY be employed to 770 facilitate entry and/or verification of fingerprint values. 772 7.1. 
Machine Readable Rendering

   The use of a machine-readable rendering such as a QR Code allows a
   UDF value to be input directly using a smartphone or other device
   equipped with a camera.

   A QR code fixed to a network capable device might contain the
   fingerprint of a machine-readable description of the device.

7.2.  Word Lists

   The use of a Word List to encode fingerprint values was introduced by
   Patrick Juola and Philip Zimmermann for the PGPfone application.  The
   PGP Word List is designed to facilitate exchange and verification of
   fingerprint values in a voice application.  To minimize the risk of
   misinterpretation, two word lists of 256 values each are used to
   encode alternating fingerprint bytes.  The compact size of the lists
   used allowed the compilers to curate them so as to maximize the
   phonetic distance of the words selected.

   The PGP Word List is designed to achieve a balance between ease of
   entry and verification.  Applications where only verification is
   required may be better served by a much larger word list, permitting
   shorter fingerprint encodings.

   For example, a word list with 16384 entries permits 14 bits of the
   fingerprint to be encoded at once; one with 65536 entries permits 16.
   These encodings allow a 125 bit fingerprint to be encoded in 9 and 8
   words respectively.

7.3.  Image List

   An image list is used in the same manner as a word list, affording
   rapid visual verification of a fingerprint value.  For obvious
   reasons, this approach is not suited to data entry but is preferable
   for comparison purposes.

8.  Security Considerations

8.1.  Work Factor and Precision

   A given UDF data object has a single fingerprint value that may be
   presented at different precisions.
   The shortest legitimate precision
   with which a UDF fingerprint may be presented has 96 significant
   bits.

   A UDF fingerprint presents the same work factor as any other
   cryptographic digest function.  The difficulty of finding a second
   data item that matches a given fingerprint is 2^n and the difficulty
   of finding two data items that have the same fingerprint is 2^(n/2),
   where n is the precision of the fingerprint.

   For the algorithms specified in this document, n = 512 and thus the
   work factor for finding collisions is 2^256, a value that is
   generally considered to be computationally infeasible.

   Since the use of 512 bit fingerprints is impractical in the type of
   applications where fingerprints are generally used, truncation is a
   practical necessity.  The longer a fingerprint is, the less likely it
   is that a user will check every character.  It is therefore important
   to consider carefully whether the security of an application depends
   on second pre-image resistance or collision resistance.

   In most fingerprint applications, such as the use of fingerprints to
   identify public keys, the fact that a malicious party might generate
   two keys that have the same fingerprint value is a minor concern.
   Combined with a flawed protocol architecture, however, such a
   vulnerability may permit an attacker to construct a document such
   that the signature will be accepted as valid by some parties but not
   by others.

   For example, Alice generates keypairs until two are generated that
   have the same 100 bit UDF presentation (typically 2^48 attempts).
   She registers one keypair with a merchant and the other with her
   bank.  This allows Alice to create a payment instrument that will be
   accepted as valid by one and rejected by the other.
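
   As a rough illustration of the arithmetic in this section, the
   following Python sketch tabulates both work factors for a few
   precisions mentioned in this document.  The function name
   work_factors is a hypothetical helper, and the sketch assumes that
   every presented bit counts as digest output; it reports only the
   generic 2^n and 2^(n/2) bounds, not the cost of any specific attack.

```python
def work_factors(precision_bits: int) -> tuple[int, float]:
    """Return the base-2 exponents of the generic work factors for an
    n-bit fingerprint: (second preimage, collision) = (n, n / 2)."""
    return precision_bits, precision_bits / 2


if __name__ == "__main__":
    # 96 bits is the shortest precision permitted above; 200 bits is
    # the threshold used for data carrying security policy; 512 bits
    # is the full digest length.
    for n in (96, 125, 200, 512):
        second_preimage, collision = work_factors(n)
        print(f"n = {n:3d}: second preimage 2^{second_preimage}, "
              f"collision 2^{collision:g}")
```

   At the 96 bit minimum the collision bound is 2^48, while at 200 bits
   it is 2^100, which is why the choice of precision depends on whether
   collision resistance matters to the application.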
   The ability to generate two PKIX certificates with the same
   fingerprint and different certificate attributes raises very
   different and more serious security concerns.  For example, an
   attacker might generate two certificates with the same key and
   different use constraints.  This might allow an attacker to present a
   highly constrained certificate that does not present a security risk
   to an application for purposes of gaining approval and an
   unconstrained certificate to request a malicious action.

   In general, any use of fingerprints to identify data that has
   security policy semantics requires the risk of collision attacks to
   be considered.  For this reason short, "user friendly" fingerprint
   presentations (less than 200 bits) SHOULD only be used for public key
   values.

8.2.  Semantic Substitution

   Many applications record the fact that a data item is trusted; rather
   fewer record the circumstances in which the data item is trusted.
   This results in a semantic substitution vulnerability which an
   attacker may exploit by presenting the trusted data item in the wrong
   context.

   The UDF format provides protection against high level semantic
   substitution attacks by incorporating the content type into the input
   to the outermost fingerprint digest function.  The work factor for
   generating a UDF fingerprint that is valid in both contexts is thus
   the same as the work factor for finding a second preimage in the
   digest function (2^512 for the specified digest algorithms).

   It is thus infeasible to generate a data item such that some
   applications will interpret it as a PKIX key and others will accept
   it as an OpenPGP key.  While attempting to parse a PKIX key as an
   OpenPGP key is virtually certain to fail to return the correct key
   parameters, it cannot be assumed that the attempt is guaranteed to
   fail with an error message.
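
   The binding of content type to digest described above can be
   sketched in Python.  The sketch assumes the digest input layout shown
   in the keyed example earlier in this document (the content type, a
   ':' separator, then the digest of the data) and assumes SHA-2-512 for
   both the inner and outer digest.  The function names digest_input and
   udf_digest are hypothetical; version prefixing, compression,
   truncation and Base32 presentation are deliberately omitted.

```python
import hashlib


def digest_input(content_type: str, data: bytes) -> bytes:
    """Build the outer digest input: content type, ':', then H(data)."""
    inner = hashlib.sha512(data).digest()
    return content_type.encode("ascii") + b":" + inner


def udf_digest(content_type: str, data: bytes) -> bytes:
    """Apply the outer SHA-2-512 digest to the content-type-bound input."""
    return hashlib.sha512(digest_input(content_type, data)).digest()


if __name__ == "__main__":
    # The data bytes of Figure 10, decoded from hexadecimal to ASCII.
    data = b"Konrad is the traitor"
    as_text = udf_digest("text/plain", data)
    as_key = udf_digest("application/pkix-keyinfo", data)
    # Identical data declared under two content types gives two
    # unrelated digests, so a fingerprint accepted in one context
    # does not verify in the other.
    print("digests differ:", as_text != as_key)
```

   Because the content type is hashed together with the data digest, a
   verifier that recomputes the fingerprint under the content type it
   expects will reject data that was fingerprinted under another type.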
   The UDF format does not provide protection against semantic
   substitution attacks that do not affect the content type.

9.  IANA Considerations

9.1.  URI Registration

   Scheme name:  UDF

   Status:  Provisional

   Applications/protocols that use this scheme name:  Mathematical Mesh
      Service protocols (mmm)

   Contact:  Phillip Hallam-Baker mailto:phill@hallambaker.com

   Change controller:  Phillip Hallam-Baker

   References:  [This document]

9.2.  Version Registry

   [Here request creation of a version registry for the UDF prefix
   values]

   80 = HMAC and SHA-2-512
   81 = HMAC and SHA-3-512
   96 = SHA-2-512
   97 = SHA-2-512 with 24 leading zeros
   98 = SHA-2-512 with 32 leading zeros
   99 = SHA-2-512 with 40 leading zeros
   100 = SHA-2-512 with 48 leading zeros
   101 = SHA-2-512 with 56 leading zeros
   136 = Random nonce
   144 = SHA-3-512
   145 = SHA-3-512 with 24 leading zeros
   146 = SHA-3-512 with 32 leading zeros
   147 = SHA-3-512 with 40 leading zeros
   148 = SHA-3-512 with 48 leading zeros
   149 = SHA-3-512 with 56 leading zeros

                                 Figure 12

10.  References

10.1.  Normative References

   [RFC2014]  Weinrib, A. and J. Postel, "IRTF Research Group Guidelines
              and Procedures", BCP 8, RFC 2014, DOI 10.17487/RFC2014,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC5869]  Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
              Key Derivation Function (HKDF)", RFC 5869,
              DOI 10.17487/RFC5869, May 2010.

   [SHA-2]    NIST, "Secure Hash Standard", August 2015.

   [SHA-3]    Dworkin, M., "SHA-3 Standard: Permutation-Based Hash and
              Extendable-Output Functions", August 2015.

10.2.
Informative References

   [Dobertin95]
              Eurocrypt 1996, "Cryptanalysis of MD5 Compress".

   [draft-hallambaker-mesh-developer]
              Hallam-Baker, P., "Mathematical Mesh: Reference
              Implementation", draft-hallambaker-mesh-developer-07 (work
              in progress), April 2018.

   [hallambaker-sin]
              "[Reference Not Found!]".

   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
              DOI 10.17487/RFC1321, April 1992.

   [RFC6920]  Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
              Keranen, A., and P. Hallam-Baker, "Naming Things with
              Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013.

10.3.  URIs

   [1] http://mathmesh.com/Documents/draft-hallambaker-udf.html

Author's Address

   Phillip Hallam-Baker
   Comodo Group Inc.

   Email: phill@hallambaker.com