Network Working Group                                       Glenn Fowler
INTERNET-DRAFT                                        AT&T Labs Research
Intended Status: Informational                          Landon Curt Noll
                                                           Cisco Systems
                                                           Kiem-Phong Vo
                                                      AT&T Labs Research
                                                         Donald Eastlake
                                                     Huawei Technologies
Expires: March 23, 2013                               September 24, 2012


                The FNV Non-Cryptographic Hash Algorithm
                      <draft-eastlake-fnv-04.txt>


Abstract

   FNV (Fowler/Noll/Vo) is a fast, non-cryptographic hash algorithm with
   good dispersion.
   The purpose of this document is to make information on FNV and open
   source code performing FNV conveniently available to the Internet
   community.


Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Table of Contents

   1. Introduction

   2. FNV Basics
   2.1 FNV Primes
   2.2 FNV offset_basis
   2.3 FNV Endianism

   3. Other Hash Sizes and XOR Folding
   4. FNV Constants

   5. The Source Code
   5.1 FNV C Headers
   5.2 FNV C Code
   5.3 FNV Test Code

   6. Security Considerations
   6.1 Why is FNV Non-Cryptographic?

   7. IANA Considerations
   8. Acknowledgements

   9. References
   9.1 Normative References
   9.2 Informative References

   Appendix A: Work Comparison with SHA-1
   Appendix B: Previous IETF Reference to FNV
   Appendix C: A Few Test Vectors
   Appendix Z: Change Summary


1. Introduction

   The FNV hash algorithm is based on an idea sent as reviewer comments
   to the [IEEE] POSIX P1003.2 committee by Glenn Fowler and Phong Vo in
   1991. In a subsequent ballot round Landon Curt Noll suggested an
   improvement on their algorithm. Some people tried this hash and found
   that it worked rather well. In an EMail message to Landon, they named
   it the "Fowler/Noll/Vo" or FNV hash. [FNV]

   FNV hashes are designed to be fast while maintaining a low collision
   rate. The high dispersion of the FNV hashes makes them well suited
   for hashing nearly identical strings such as URLs, hostnames,
   filenames, text, IP addresses, etc. Their speed allows one to quickly
   hash lots of data while maintaining a reasonably low collision rate.
   However, they are generally not suitable for cryptographic use. (See
   Section 6.1.)

   The FNV hash is widely used, for example in DNS servers, database
   indexing hashes, major web search / indexing engines, netnews history
   file Message-ID lookup functions, anti-spam filters, a spellchecker
   programmed in Ada 95, flatassembler's open source x86 assembler -
   user-defined symbol hashtree, non-cryptographic file fingerprints,
   computing Unique IDs in DASM (DTN Applications for Symbian Mobile-
   phones), Microsoft's hash_map implementation for VC++ 2005, the
   realpath cache in PHP 5.x (php-5.2.3/TSRM/tsrm_virtual_cwd.c), and
   many other uses.

   FNV hash algorithms and source code have been released into the
   public domain. The authors of the FNV algorithm took deliberate steps
   to disclose the algorithm in a public forum soon after it was
   invented. More than a year passed after this public disclosure and
   the authors deliberately took no steps to patent the FNV algorithm.
   Therefore, it is safe to say that the FNV authors have no patent
   claims on the FNV algorithm as published.

   If you use an FNV function in an application, you are kindly
   requested to send an EMail about it to: fnv-mail@asthe.com


2. FNV Basics

   This document focuses on the FNV-1a function whose pseudo-code is as
   follows:

      hash = offset_basis
      for each octet_of_data to be hashed
          hash = hash xor octet_of_data
          hash = hash * FNV_Prime
      return hash

   In the pseudo-code above, hash is a power-of-two number of bits (32,
   64, ... 1024) and offset_basis and FNV_Prime depend on the size of
   hash. The multiplication is carried out modulo 2**n for an n-bit
   hash; that is, only the low-order n bits of the product are kept.

   The FNV-1 algorithm is the same, including the values of offset_basis
   and FNV_Prime, except that the order of the two lines with the "xor"
   and multiply operations is reversed. Operational experience
   indicates better hash dispersion for small amounts of data with
   FNV-1a. FNV-0 is the same as FNV-1 but with offset_basis set to zero.
   FNV-1a is suggested for general use.

2.1 FNV Primes

   The theory behind FNV_Primes is beyond the scope of this document
   but the basic property to look for is how an FNV_Prime would impact
   dispersion. Now, consider any n-bit FNV hash where n is >= 32 and
   also a power of 2.
   For each such n-bit FNV hash, an FNV_Prime p is defined as:

      When s is an integer and 4 < s < 11 (so that n = 2**s), then
      FNV_Prime is the smallest prime p of the form:

         256**int((5 + 2**s)/12) + 2**8 + b

      where b is an integer such that:

         0 < b < 2**8
         The number of one-bits in b is 4 or 5

      and where p mod (2**40 - 2**24 - 1) > (2**24 + 2**8 + 2**7).

   Experimentally, FNV_Primes matching the above constraints tend to
   have better dispersion properties. They improve the polynomial
   feedback characteristic when an FNV_Prime multiplies an intermediate
   hash value. As such, the hash values produced are more scattered
   throughout the n-bit hash space.

   The case where s < 5 is not considered because the resulting hash
   quality is too low. Such small hashes can, if desired, be derived
   from a 32 bit FNV hash by XOR folding (see Section 3). The case where
   s > 10 is not considered because of the doubtful utility of such
   large FNV hashes and because the criteria for such large FNV_Primes
   are more complex, due to the sparsity of such large primes, and would
   needlessly clutter the criteria given above.

   Per the above constraints, an FNV_Prime should have only 6 or 7 one-
   bits in it. Therefore, some compilers may seek to improve the
   performance of a multiplication with an FNV_Prime by replacing the
   multiplication with shifts and adds. However, note that the
   performance of this substitution is highly hardware-dependent and
   should be done with care. FNV_Primes were selected primarily for the
   quality of the resulting hash function, not for compiler
   optimization.

2.2 FNV offset_basis

   The offset_basis values for the n-bit FNV-1a algorithms are computed
   by applying the n-bit FNV-0 algorithm to the 32 octets representing
   the following character string in [ASCII]:

      chongo <Landon Curt Noll> /\../\

   The \'s in the above string are not C-style escape characters.
   In C-string notation, these 32 octets are:

      "chongo <Landon Curt Noll> /\\../\\"

2.3 FNV Endianism

   For persistent storage or interoperability between different hardware
   platforms, an FNV hash shall be represented in the little endian
   format. That is, the FNV hash will be stored in an array hash[N] with
   N bytes such that its integer value can be retrieved as follows:

      /* i must be a signed integer; value must hold at least N*8 bits */
      unsigned char hash[N];
      for ( i = N-1, value = 0; i >= 0; --i )
          value = ( value << 8 ) + hash[i];

   Of course, when FNV hashes are used in a single process or a group of
   processes sharing memory on processors with compatible endianness,
   the natural endianness of those processors can be used regardless of
   its type, little, big, or some other exotic form.


3. Other Hash Sizes and XOR Folding

   Many hash uses require a hash that is not one of the FNV sizes for
   which constants are provided in Section 4. If a larger hash size is
   needed, please contact the authors of this document.

   Most hash applications make use of a hash that is a fixed size binary
   field. Assume that k bits of hash are desired and k is less than 1024
   but not one of the sizes for which constants are provided in Section
   4. The recommended technique is to take the smallest FNV hash of size
   S, where S is larger than k, and calculate the desired hash using xor
   folding as shown below. The final bit masking operation is logically
   unnecessary if the size of hash is exactly the number of desired
   bits.

      temp = FNV_S ( data-to-be-hashed )
      hash = ( temp xor temp>>k ) bitwise-and ( 2**k - 1 )

   Hash functions are a trade-off between speed and strength. For
   example, a somewhat stronger hash may be obtained for exact FNV sizes
   by calculating an FNV twice as long as the desired output ( S = 2*k )
   and performing such data folding using a k equal to the size of the
   desired output.
   However, if a much stronger hash, for example one suitable for
   cryptographic applications, is wanted, algorithms designed for that
   purpose, such as those in [RFC6234], should be used.

   If it is desired to obtain a hash result that is a value between 0
   and max, where max is not a power of two, simply choose an FNV hash
   size S such that 2**S > max. Then calculate the following:

      FNV_S mod ( max+1 )

   The resulting remainder will be in the range desired but will suffer
   from a bias against large values with the bias being larger if 2**S
   is only a little bigger than max. If this bias is acceptable, no
   further processing is needed. If this bias is unacceptable, it can be
   avoided by retrying for certain high values of hash, as follows,
   before applying the mod operation above:

      X = ( int( ( 2**S - 1 ) / ( max+1 ) ) ) * ( max+1 )
      while ( hash >= X )
          hash = ( hash * FNV_Prime ) + offset_basis


4. FNV Constants

   The FNV Primes are as follows:

   32 bit FNV_Prime = 2**24 + 2**8 + 0x93 = 16,777,619
         = 0x01000193

   64 bit FNV_Prime = 2**40 + 2**8 + 0xB3 = 1,099,511,628,211
         = 0x00000100 000001B3

   128 bit FNV_Prime = 2**88 + 2**8 + 0x3B =
         309,485,009,821,345,068,724,781,371
         = 0x00000000 01000000 00000000 0000013B

   256 bit FNV_Prime = 2**168 + 2**8 + 0x63 =
   374,144,419,156,711,147,060,143,317,175,368,453,031,918,731,002,211 =
   0x0000000000000000 0000010000000000 0000000000000000 0000000000000163

   512 bit FNV_Prime = 2**344 + 2**8 + 0x57 = 35,
   835,915,874,844,867,368,919,076,489,095,108,449,946,327,955,754,392,
   558,399,825,615,420,669,938,882,575,126,094,039,892,345,713,852,759 =
   0x0000000000000000 0000000000000000 0000000001000000 0000000000000000
     0000000000000000 0000000000000000 0000000000000000 0000000000000157

   1024 bit FNV_Prime = 2**680 + 2**8 + 0x8D = 5,
   016,456,510,113,118,655,434,598,811,035,278,955,030,765,345,404,790,
   744,303,017,523,831,112,055,108,147,451,509,157,692,220,295,382,716,
   162,651,878,526,895,249,385,292,291,816,524,375,083,746,691,371,804,
   094,271,873,160,484,737,966,720,260,389,217,684,476,157,468,082,573 =
   0x0000000000000000 0000000000000000 0000000000000000 0000000000000000
     0000000000000000 0000010000000000 0000000000000000 0000000000000000
     0000000000000000 0000000000000000 0000000000000000 0000000000000000
     0000000000000000 0000000000000000 0000000000000000 000000000000018D

   The FNV offset_basis values are as follows:

   32 bit offset_basis = 2,166,136,261 = 0x811C9DC5

   64 bit offset_basis = 14,695,981,039,346,656,037 =
         0xCBF29CE4 84222325

   128 bit offset_basis =
         144,066,263,297,769,815,596,495,629,667,062,367,629 =
         0x6C62272E 07BB0142 62B82175 6295C58D

   256 bit offset_basis = 100,029,257,958,052,580,907,070,968,
   620,625,704,837,092,796,014,241,193,945,225,284,501,741,471,925,557 =
   0xDD268DBCAAC55036 2D98C384C4E576CC C8B1536847B6BBB3 1023B4C8CAEE0535

   512 bit offset_basis = 9,
   659,303,129,496,669,498,009,435,400,716,310,466,090,418,745,672,637,
   896,108,374,329,434,462,657,994,582,932,197,716,438,449,813,051,892,
   206,539,805,784,495,328,239,340,083,876,191,928,701,583,869,517,785 =
   0xB86DB0B1171F4416 DCA1E50F309990AC AC87D059C9000000 0000000000000D21
     E948F68A34C192F6 2EA79BC942DBE7CE 182036415F56E34B AC982AAC4AFE9FD9

   1024 bit offset_basis = 14,197,795,064,947,621,068,722,070,641,403,
   218,320,880,622,795,441,933,960,878,474,914,617,582,723,252,296,732,
   303,717,722,150,864,096,521,202,355,549,365,628,174,669,108,571,814,
   760,471,015,076,148,029,755,969,804,077,320,157,692,458,563,003,215,
   304,957,150,157,403,644,460,363,550,505,412,711,285,966,361,610,267,
   868,082,893,823,963,790,439,336,411,086,884,584,107,735,010,676,915 =
   0x0000000000000000 005F7A76758ECC4D 32E56D5A591028B7 4B29FC4223FDADA1
     6C3BF34EDA3674DA 9A21D90000000000 0000000000000000 0000000000000000
     0000000000000000 0000000000000000 0000000000000000 000000000004C6D7
     EB6E73802734510A 555F256CC005AE55 6BDE8CC9C6A93B21 AFF4B16C71EE90B3


5. The Source Code

   The following sub-sections are intended, in later versions, to
   include reference C source code and a test driver for FNV-1a.

5.1 FNV C Headers

   TBD

5.2 FNV C Code

   TBD

5.3 FNV Test Code

   TBD


6. Security Considerations

   This document is intended to provide convenient open source access by
   the Internet community to the FNV non-cryptographic hash. No
   assertion of suitability for cryptographic applications is made for
   the FNV hash algorithms.

6.1 Why is FNV Non-Cryptographic?

   A full discussion of cryptographic hash requirements and strength is
   beyond the scope of this document. However, here are three
   characteristics of FNV that would generally be considered to make it
   non-cryptographic:

   1. Work Factor - To make brute force inversion hard, a cryptographic
      hash should be computationally expensive, especially for a general
      purpose processor. But FNV is designed to be very inexpensive on a
      general-purpose processor. (See Appendix A.)

   2. Sticky State - A cryptographic hash should not have a state in
      which it can stick for a plausible input pattern. But, in the very
      unlikely event that the FNV hash variable becomes zero and the
      input is a sequence of zeros, the hash variable will remain at
      zero until there is a non-zero input byte and the final hash value
      will be unaffected by the length of that sequence of zero input
      bytes. Of course, for the common case of fixed length input, this
      would not be significant because the number of non-zero bytes
      would vary inversely with the number of zero bytes and, for some
      types of input, runs of zeros do not occur. Furthermore, the
      inclusion of even a little unpredictable input may be sufficient
      to stop an adversary from inducing a zero hash variable.

   3. Diffusion - Every output bit of a cryptographic hash should be an
      equally complex function of every input bit. But it is easy to see
      that the least significant bit of a direct FNV hash is the XOR of
      the least significant bits of every input byte and does not depend
      on any other input bit. While more complex, the second least
      significant bit of an FNV hash has a similar weakness. If these
      properties are considered a problem, they can be easily fixed by
      XOR folding (see Section 3).

   Nevertheless, none of the above have proven to be a problem in actual
   practice for the many applications of FNV.


7. IANA Considerations

   This document requires no IANA Actions. RFC Editor Note: please
   delete this section before publication.


8. Acknowledgements

   The contributions of the following are gratefully acknowledged:

      Frank Ellermann, Bob Moskowitz, and Stefan Santesson.


9. References

   Below are the normative and informative references for this document.

9.1 Normative References

   [ASCII] - American National Standards Institute (formerly United
         States of America Standards Institute), "USA Code for
         Information Interchange", ANSI X3.4-1968, 1968. ANSI X3.4-1968
         has been replaced by newer versions with slight modifications,
         but the 1968 version remains definitive for the Internet.

9.2 Informative References

   [FNV] - FNV web site:
         http://www.isthe.com/chongo/tech/comp/fnv/index.html

   [IEEE] - http://www.ieee.org

   [RFC3174] - Eastlake 3rd, D. and P. Jones, "US Secure Hash Algorithm
         1 (SHA1)", RFC 3174, September 2001.

   [RFC6194] - Polk, T., Chen, L., Turner, S., and P. Hoffman, "Security
         Considerations for the SHA-0 and SHA-1 Message-Digest
         Algorithms", RFC 6194, March 2011.

   [RFC6234] - Eastlake 3rd, D. and T. Hansen, "US Secure Hash
         Algorithms (SHA and SHA-based HMAC and HKDF)", RFC 6234, May
         2011.


Appendix A: Work Comparison with SHA-1

   This section provides a simplistic rough comparison of the level of
   effort required per input byte to compute FNV-1a and SHA-1 [RFC3174].

   Ignoring transfer of control and conditional tests and equating all
   logical and arithmetic operations, FNV requires 2 operations per
   byte, an XOR and a multiply.

   SHA-1 is a relatively weak cryptographic hash producing a 160-bit
   hash. It has been partially broken [RFC6194]. It is actually
   designed to accept a bit vector input although almost all computer
   uses apply it to an integer number of bytes. It processes blocks of
   512 bits (64 bytes) and we estimate the effort involved in SHA-1
   processing a full block. Ignoring SHA-1 initial set up, transfer of
   control, and conditional tests, but counting all logical and
   arithmetic operations, including counting indexing as an addition,
   SHA-1 requires 1,744 operations per 64-byte block or 27.25
   operations per byte. So by this rough measure, it is a little over 13
   times the effort of FNV for large amounts of data. However, FNV is
   commonly used for small inputs. Using the above method, for inputs of
   N bytes, where N is <= 55 so SHA-1 will take one block (SHA-1
   includes padding and an 8-byte length at the end of the data in the
   last block), the ratio of the effort for SHA-1 to the effort for FNV
   will be 1,744/(2*N) = 872/N. For example, with an 8 byte input, SHA-1
   will take 109 times as much effort as FNV.

   Stronger cryptographic functions than SHA-1 generally have an even
   higher work factor.


Appendix B: Previous IETF Reference to FNV

   FNV-1a was referenced in draft-ietf-tls-cached-info-08.txt that has
   since expired. It was later decided that it would be better to use a
   cryptographic hash for that application.
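
   As a cross-check on the 64-bit constants in Section 4 and the test
   vectors in Appendix C, 64-bit FNV-1a can also be sketched with Java's
   primitive long type, relying on the fact that long multiplication
   wraps modulo 2**64. This sketch is not taken from the TLS draft and
   is not the reference code planned for Section 5; the class name
   Fnv1a64Sketch and the method name hash are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names are not from any FNV reference code.
final class Fnv1a64Sketch {

    // 64 bit FNV_Prime and offset_basis as listed in Section 4.
    static final long FNV_PRIME = 0x00000100000001B3L;
    static final long OFFSET_BASIS = 0xCBF29CE484222325L;

    // FNV-1a: for each octet, xor it into the hash, then multiply.
    static long hash(byte[] data) {
        long h = OFFSET_BASIS;
        for (byte b : data) {
            h ^= (b & 0xFFL);  // treat the octet as unsigned
            h *= FNV_PRIME;    // overflow wraps, i.e. arithmetic mod 2**64
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] input = "foobar".getBytes(StandardCharsets.US_ASCII);
        // Print as unsigned hexadecimal.
        System.out.println(Long.toHexString(hash(input)));
    }
}
```

   Under these assumptions, running main should print 85944171f73967e8,
   the FNV64 value listed for "foobar" (without null termination) in
   Appendix C.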

   Below is the Java code for FNV64 from that TLS draft, included by the
   kind permission of the author:

   /**
    * Java code sample, implementing 64 bit FNV-1a
    * By Stefan Santesson
    */

   import java.math.BigInteger;

   public class FNV {

       static public BigInteger getFNV1aToByte(byte[] inp) {

           BigInteger m = new BigInteger("2").pow(64);
           BigInteger fnvPrime = new BigInteger("1099511628211");
           BigInteger fnvOffsetBasis =
                   new BigInteger("14695981039346656037");

           BigInteger digest = fnvOffsetBasis;

           for (byte b : inp) {
               digest = digest.xor(BigInteger.valueOf((int) b & 255));
               digest = digest.multiply(fnvPrime).mod(m);
           }
           return digest;

       }
   }


Appendix C: A Few Test Vectors

   Below are a few test vectors in the form of ASCII strings and their
   FNV32 and FNV64 hashes using the FNV-1a algorithm.

   Strings without null (zero byte) termination:

      String      FNV32        FNV64
      ""          0x811c9dc5   0xcbf29ce484222325
      "a"         0xe40c292c   0xaf63dc4c8601ec8c
      "foobar"    0xbf9cf968   0x85944171f73967e8

   Strings including null (zero byte) termination:

      String      FNV32        FNV64
      ""          0x050c5d1f   0xaf63bd4c8601b7df
      "a"         0x2b24d044   0x089be207b544f1e4
      "foobar"    0x0c1c9eb8   0x34531ca7168b8f38


Appendix Z: Change Summary

   RFC Editor Note: Please delete this appendix on publication.

From -00 to -01

   1. Add Security Considerations section on why FNV is non-
      cryptographic.

   2. Add Appendix A on a work factor comparison with SHA-1.

   3. Add Appendix B concerning previous IETF draft reference to FNV.

   4. Minor editorial changes.

From -01 to -02

   1. Correct FNV_Prime determination criteria and add note as to why s
      < 5 and s > 10 are not considered.

   2. Add acknowledgements list.

   3. Add a couple of references.

   4. Minor editorial changes.

From -02 to -03

   1. Replace direct reference to US-ASCII standard with reference to
      RFC 20.

   2. Update dates and version number.

   3. Minor editing changes.

From -03 to -04

   1. Change reference to RFC 20 back to a reference to the ANSI 1968
      ASCII standard.

   2. Minor addition to Section 6, point 3.

   3. Update dates and version number.

   4. Minor editing changes.


Authors' Addresses

   Glenn Fowler
   AT&T Labs Research
   180 Park Avenue
   Florham Park, NJ 07932 USA

   Email: gsf@research.att.com
   URL:   http://www.research.att.com/~gsf/


   Landon Curt Noll
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134 USA

   Telephone: +1-408-424-1102
   Email: fnv-rfc-mail@asthe.com
   URL:   http://www.isthe.com/chongo/index.html


   Kiem-Phong Vo
   AT&T Labs Research
   180 Park Avenue
   Florham Park, NJ 07932 USA

   Email: kpv@research.att.com
   URL:   http://www.research.att.com/info/kpv/


   Donald Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Telephone: +1-508-333-2270
   EMail: d3e3e3@gmail.com


Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License. This Internet-Draft is
   submitted to IETF in full conformance with the provisions of BCP 78
   and BCP 79.