Network Working Group                                        A. Biryukov
Internet-Draft                                                   D. Dinu
Intended status: Informational                           D. Khovratovich
Expires: September 26, 2017                     University of Luxembourg
                                                            S. Josefsson
                                                                  SJD AB
                                                          March 25, 2017


     The memory-hard Argon2 password hash and proof-of-work function
                        draft-irtf-cfrg-argon2-02

Abstract

   This document describes the Argon2 memory-hard function for password
   hashing and proof-of-work applications. We provide an
   implementer-oriented description together with sample code and test
   vectors. The purpose is to simplify adoption of Argon2 for Internet
   protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 26, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notation and Conventions
   3.  Argon2 Algorithm
     3.1.  Argon2 Inputs and Outputs
     3.2.  Argon2 Operation
     3.3.  Variable-length hash function H'
     3.4.  Indexing
       3.4.1.  Getting the 32-bit values J_1 and J_2
       3.4.2.
               Mapping J_1 and J_2 to reference block index
     3.5.  Compression function G
     3.6.  Permutation P
   4.  Parameter Choice
   5.  Example Code
   6.  Test Vectors
     6.1.  Argon2d Test Vectors
     6.2.  Argon2i Test Vectors
     6.3.  Argon2id Test Vectors
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
     9.1.  Security as hash function and KDF
     9.2.  Security against time-space tradeoff attacks
     9.3.  Security for time-bounded defenders
     9.4.  Recommendations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document describes the Argon2 memory-hard function for password
   hashing and proof-of-work applications. We provide an
   implementer-oriented description together with sample code and test
   vectors. The purpose is to simplify adoption of Argon2 for Internet
   protocols. This document corresponds to version 1.3 of the Argon2
   hash function.

   Argon2 summarizes the state of the art in the design of memory-hard
   functions. It is a streamlined and simple design.
   It aims at the highest memory filling rate and effective use of
   multiple computing units, while still providing defense against
   tradeoff attacks. Argon2 is optimized for the x86 architecture and
   exploits the cache and memory organization of the recent Intel and
   AMD processors.

   Argon2 has one primary variant: Argon2id, and two supplementary
   variants: Argon2d and Argon2i. Argon2d uses data-dependent memory
   access, which makes it suitable for cryptocurrencies and
   proof-of-work applications with no threats from side-channel timing
   attacks. Argon2i uses data-independent memory access, which is
   preferred for password hashing and password-based key derivation.
   Argon2id works as Argon2i for the first half of the first iteration
   over the memory, and as Argon2d for the rest, thus providing both
   side-channel attack protection and brute-force cost savings due to
   time-memory tradeoffs. Argon2i makes more passes over the memory to
   protect from tradeoff attacks.

   For further background and discussion, see the Argon2 paper [ARGON2].

2.  Notation and Conventions

   x**y --- x raised to the power y

   a*b --- multiplication of a and b

   c-d --- subtraction of d from c

   E_f --- variable E with subscript index f

   g / h --- g divided by h

   I(j) --- function I evaluated on parameter j

   K || L --- string K concatenated with string L

   a ^ b --- bitwise exclusive-or between a and b

   a mod b --- remainder of a modulo b, always in range [0, b-1]

   a >>> n --- rotation of a to the right by n bits

   trunc(a) --- the 64-bit value a truncated to the 32 least significant
   bits

   extract(a, i) --- the i-th set of 32 bits from a

   |A| --- the number of elements in set A

3.  Argon2 Algorithm

3.1.  Argon2 Inputs and Outputs

   Argon2 has the following input parameters:

   o  Message string P, which is a password for password hashing
      applications.
      May have any length from 0 to 2**32-1 bytes.

   o  Nonce S, which is a salt for password hashing applications. May
      have any length from 8 to 2**32-1 bytes. 16 bytes is recommended
      for password hashing. Salt must be unique for each password.

   o  Degree of parallelism p determines how many independent (but
      synchronizing) computational chains (lanes) can be run. It may
      take any integer value from 1 to 2**24-1.

   o  Tag length T may be any integer number of bytes from 4 to 2**32-1.

   o  Memory size m can be any integer number of kibibytes from 8*p to
      2**32-1. The actual number of blocks is m', which is m rounded
      down to the nearest multiple of 4*p.

   o  Number of iterations t (used to tune the running time
      independently of the memory size) can be any integer number from 1
      to 2**32-1.

   o  Version number v is one byte 0x13.

   o  Secret value K (serves as key if necessary, but we do not assume
      any key use by default) may have any length from 0 to 2**32-1
      bytes.

   o  Associated data X may have any length from 0 to 2**32-1 bytes.

   o  Type y of Argon2: 0 for Argon2d, 1 for Argon2i, 2 for Argon2id.

   The Argon2 output is a T-length string.

3.2.  Argon2 Operation

   Argon2 uses an internal compression function G with two 1024-byte
   inputs and a 1024-byte output, and an internal hash function H. Here
   H is the BLAKE2b [I-D.saarinen-blake2] hash function, and the
   compression function G is based on its internal permutation. A
   variable-length hash function H' built upon H is also used. G and H'
   are described in later sections.

   The Argon2 operation is as follows.

   1.  Establish H_0 as the 64-byte value as shown in the figure below.
       H is BLAKE2b and the non-strings p, T, m, t, v, y, length(P),
       length(S), length(K), and length(X) are treated as a 32-bit
       little-endian encoding of the integer.

           H_0 = H(p, T, m, t, v, y, length(P), P, length(S), S,
                   length(K), K, length(X), X)

   2.
       Allocate the memory as m' 1024-byte blocks where m' is derived
       as:

           m' = 4 * p * floor(m / (4 * p))

       For p lanes, the memory is organized in a matrix B[i][j] of
       blocks with p rows (lanes) and q = m' / p columns.

   3.  Compute B[i][0] for all i ranging from (and including) 0 to (not
       including) p.

           B[i][0] = H'(H_0, 0, i)

       Here integers are padded to 4 bytes and encoded in little endian.

   4.  Compute B[i][1] for all i ranging from (and including) 0 to (not
       including) p.

           B[i][1] = H'(H_0, 1, i)

       Here integers are padded to 4 bytes and encoded in little endian.

   5.  Compute B[i][j] for all i ranging from (and including) 0 to (not
       including) p, and for all j ranging from (and including) 2 to
       (not including) q. The block indices i' and j' are determined
       differently for Argon2d, Argon2i, and Argon2id.

           B[i][j] = G(B[i][j-1], B[i'][j'])

   6.  If the number of iterations t is larger than 1, we repeat step 5
       for each further pass, replacing the computations with the
       following expressions:

           B[i][0] = G(B[i][q-1], B[i'][j']) XOR B[i][0]
           B[i][j] = G(B[i][j-1], B[i'][j']) XOR B[i][j]

   7.  After t steps have been iterated, the final block C is computed
       as the XOR of the last column:

           C = B[0][q-1] XOR B[1][q-1] XOR ... XOR B[p-1][q-1]

   8.  The output tag is computed as H'(C).

3.3.  Variable-length hash function H'

   Let H_x be a hash function with x-byte output (in our case H_x is
   BLAKE2b, which supports x between 1 and 64 inclusive). Let V_i be a
   64-byte block, and A_i be its first 32 bytes, and T < 2**32 be the
   tag length in bytes, encoded in little-endian as 32-bit integer.
   Then we define:

           if T <= 64
               H'(X) = H_T(T||X)
           else
               r = ceil(T/32)-2
               V_1 = H_64(T||X)
               V_2 = H_64(V_1)
               ...
               V_r = H_64(V_{r-1})
               V_{r+1} = H_{T-32*r}(V_{r})
               H'(X) = A_1 || A_2 || ... || A_r || V_{r+1}

3.4.
Indexing

   To enable parallel block computation, we further partition the memory
   matrix into S = 4 vertical slices. The intersection of a slice and a
   lane is a segment of length q/S. Segments of the same slice are
   computed in parallel and may not reference blocks from each other.
   All other blocks can be referenced.

        slice 0    slice 1    slice 2    slice 3
        ___/\___   ___/\___   ___/\___   ___/\___
       /        \ /        \ /        \ /        \
      +----------+----------+----------+----------+
      |          |          |          |          | > lane 0
      +----------+----------+----------+----------+
      |          |          |          |          | > lane 1
      +----------+----------+----------+----------+
      |          |          |          |          | > lane 2
      +----------+----------+----------+----------+
      |         ...        ...        ...         | ...
      +----------+----------+----------+----------+
      |          |          |          |          | > lane p - 1
      +----------+----------+----------+----------+

           Single-pass Argon2 with p lanes and 4 slices

3.4.1.  Getting the 32-bit values J_1 and J_2

3.4.1.1.  Argon2d

   J_1 is given by the first 32 bits of block B[i][j-1], while J_2 is
   given by the next 32 bits of block B[i][j-1]:

           J_1 = extract(B[i][j-1], 1)
           J_2 = extract(B[i][j-1], 2)

3.4.1.2.  Argon2i

   Each application of the 2-round compression function G in the counter
   mode gives 128 64-bit values J_1 || J_2. The first input is the
   all-zero block and the second input is constructed as follows:

           ( r || l || s || m' || t || x || i || 0 ), where

           r  -- the pass number
           l  -- the lane number
           s  -- the slice number
           m' -- the total number of memory blocks
           t  -- the total number of passes
           x  -- the Argon2 type (0 for Argon2d, 1 for Argon2i,
                 2 for Argon2id)
           i  -- the counter (starts from 1 in each segment)

   The values r, l, s, m', t, x, i are each represented on 8 bytes in
   little-endian.

3.4.1.3.  Argon2id

   If the pass number is 0 and the slice number is 0 or 1, then compute
   J_1 and J_2 as for Argon2i, else compute J_1 and J_2 as for Argon2d.

3.4.2.
Mapping J_1 and J_2 to reference block index

   The value of l = J_2 mod p gives the index of the lane from which the
   block will be taken. For the first pass (r=0) and the first slice
   (s=0) the block is taken from the current lane.

   The set R contains the indices that can be referenced according to
   the following rules:

   1.  If l is the current lane, then R includes the indices of all
       blocks in the last S - 1 = 3 segments computed and finished, as
       well as the blocks computed in the current segment in the current
       pass excluding B[i][j-1].

   2.  If l is not the current lane, then R includes the indices of all
       blocks in the last S - 1 = 3 segments computed and finished in
       lane l. If B[i][j] is the first block of a segment, then the
       very last index from R is excluded.

   We are going to take a block from R with a non-uniform distribution
   over [0, |R|):

           J_1 in [0, 2**32) -> |R|(1 - J_1**2 / 2**64)

   To avoid floating point computation, the following approximation is
   used:

           x = J_1**2 / 2**32
           y = (|R| * x) / 2**32
           z = |R| - 1 - y

   The value of z gives the reference block index in R.

3.5.  Compression function G

   Compression function G is built upon the BLAKE2b round function P. P
   operates on the 128-byte input, which can be viewed as 8 16-byte
   registers:

           P(A_0, A_1, ... ,A_7) = (B_0, B_1, ... ,B_7)

   Compression function G(X, Y) operates on two 1024-byte blocks X and
   Y. It first computes R = X XOR Y. Then R is viewed as an 8x8 matrix
   of 16-byte registers R_0, R_1, ... , R_63. Then P is first applied
   rowwise, and then columnwise to get Z:

   ( Q_0,  Q_1,  Q_2, ... ,  Q_7) <- P( R_0,  R_1,  R_2, ... ,  R_7)
   ( Q_8,  Q_9, Q_10, ... , Q_15) <- P( R_8,  R_9, R_10, ... , R_15)
   ...
   (Q_56, Q_57, Q_58, ... , Q_63) <- P(R_56, R_57, R_58, ... , R_63)
   ( Z_0,  Z_8, Z_16, ... , Z_56) <- P( Q_0,  Q_8, Q_16, ... , Q_56)
   ( Z_1,  Z_9, Z_17, ... , Z_57) <- P( Q_1,  Q_9, Q_17, ...
   , Q_57)
   ...
   ( Z_7, Z_15, Z_23, ... , Z_63) <- P( Q_7, Q_15, Q_23, ... , Q_63)

   Finally, G outputs Z XOR R:

   G:   (X, Y) -> R = X XOR Y -P-> Q -P-> Z -> Z XOR R

                            +---+       +---+
                            | X |       | Y |
                            +---+       +---+
                              |           |
                              ---->XOR<----
                            --------|
                            |      \ /
                            |     +---+
                            |     | R |
                            |     +---+
                            |       |
                            |      \ /
                            |   P rowwise
                            |       |
                            |      \ /
                            |     +---+
                            |     | Q |
                            |     +---+
                            |       |
                            |      \ /
                            |  P columnwise
                            |       |
                            |      \ /
                            |     +---+
                            |     | Z |
                            |     +---+
                            |       |
                            |      \ /
                            ------>XOR
                                    |
                                   \ /

                  Argon2 compression function G.

3.6.  Permutation P

   Permutation P is based on the round function of BLAKE2b. The 8
   16-byte inputs S_0, S_1, ... , S_7 are viewed as a 4x4 matrix of
   64-bit words, where S_i = (v_{2*i+1} || v_{2*i}):

            v_0  v_1  v_2  v_3
            v_4  v_5  v_6  v_7
            v_8  v_9 v_10 v_11
           v_12 v_13 v_14 v_15

   It works as follows:

           G(v_0, v_4, v_8, v_12)
           G(v_1, v_5, v_9, v_13)
           G(v_2, v_6, v_10, v_14)
           G(v_3, v_7, v_11, v_15)

           G(v_0, v_5, v_10, v_15)
           G(v_1, v_6, v_11, v_12)
           G(v_2, v_7, v_8, v_13)
           G(v_3, v_4, v_9, v_14)

   G(a, b, c, d) is defined as follows:

           a <- (a + b + 2 * trunc(a) * trunc(b)) mod 2**64
           d <- (d ^ a) >>> 32
           c <- (c + d + 2 * trunc(c) * trunc(d)) mod 2**64
           b <- (b ^ c) >>> 24

           a <- (a + b + 2 * trunc(a) * trunc(b)) mod 2**64
           d <- (d ^ a) >>> 16
           c <- (c + d + 2 * trunc(c) * trunc(d)) mod 2**64
           b <- (b ^ c) >>> 63

   The modular additions in G are combined with 64-bit multiplications.
   Multiplications are the only difference to the original BLAKE2b
   design. This choice is made to increase the circuit depth and thus
   the running time of ASIC implementations, while having roughly the
   same running time on CPUs thanks to parallelism and pipelining.

4.  Parameter Choice

   Argon2d is optimized for settings where the adversary does not get
   regular access to system memory or CPU, i.e.,
   they cannot run side-channel attacks based on timing information,
   nor can they recover the password much faster using garbage
   collection. These settings are more typical for backend servers and
   cryptocurrency mining. In practice we suggest the following
   settings:

   o  Cryptocurrency mining, that takes 0.1 seconds on a 2 GHz CPU using
      1 core -- Argon2d with 2 lanes and 250 MB of RAM.

   Argon2id is optimized for more realistic settings, where the
   adversary possibly can access the same machine, use its CPU or mount
   cold-boot attacks. We suggest the following settings:

   o  Backend server authentication, that takes 0.5 seconds on a 2 GHz
      CPU using 4 cores -- Argon2id with 8 lanes and 4 GB of RAM.

   o  Key derivation for hard-drive encryption, that takes 3 seconds on
      a 2 GHz CPU using 2 cores -- Argon2id with 4 lanes and 6 GB of
      RAM.

   o  Frontend server authentication, that takes 0.5 seconds on a 2 GHz
      CPU using 2 cores -- Argon2id with 4 lanes and 1 GB of RAM.

   We recommend the following procedure to select the type and the
   parameters for practical use of Argon2.

   1.  Select the type y. If you do not know the difference between
       the types or you consider side-channel attacks a viable threat,
       choose Argon2id.

   2.  Figure out the maximum number h of threads that can be initiated
       by each call to Argon2.

   3.  Figure out the maximum amount m of memory that each call can
       afford.

   4.  Figure out the maximum amount x of time (in seconds) that each
       call can afford.

   5.  Select the salt length. 128 bits is sufficient for all
       applications, but can be reduced to 64 bits in the case of space
       constraints.

   6.  Select the tag length. 128 bits is sufficient for most
       applications, including key derivation. If longer keys are
       needed, select longer tags.

   7.  If side-channel attacks are a viable threat, enable the memory
       wiping option in the library call.

   8.
       Run the scheme of type y, memory m and h lanes and threads, using
       different numbers of passes t. Figure out the maximum t such that
       the running time does not exceed x. If it exceeds x even for
       t = 1, reduce m accordingly.

   9.  Hash all the passwords with the just determined values m, h, and
       t.

5.  Example Code

   void fill_block(const block *prev_block,
                   const block *ref_block,
                   block *next_block) {
       block blockR, block_tmp;
       unsigned i;

       copy_block(&blockR, ref_block);
       xor_block(&blockR, prev_block);
       copy_block(&block_tmp, &blockR);

       /* Now blockR = ref_block + prev_block and
          block_tmp = ref_block + prev_block (+ denotes XOR) */

       /* Apply Blake2 on columns of 64-bit words: (0,1,...,15),
          then (16,17,...,31), ... and finally (112,113,...,127) */
       for (i = 0; i < 8; ++i) {
           BLAKE2_ROUND_NOMSG(
               blockR.v[16 * i], blockR.v[16 * i + 1],
               blockR.v[16 * i + 2], blockR.v[16 * i + 3],
               blockR.v[16 * i + 4], blockR.v[16 * i + 5],
               blockR.v[16 * i + 6], blockR.v[16 * i + 7],
               blockR.v[16 * i + 8], blockR.v[16 * i + 9],
               blockR.v[16 * i + 10], blockR.v[16 * i + 11],
               blockR.v[16 * i + 12], blockR.v[16 * i + 13],
               blockR.v[16 * i + 14], blockR.v[16 * i + 15]);
       }

       /* Apply Blake2 on rows of 64-bit words: (0,1,16,17,...112,113),
          then (2,3,18,19,...,114,115), ...
          and finally (14,15,30,31,...,126,127) */
       for (i = 0; i < 8; i++) {
           BLAKE2_ROUND_NOMSG(
               blockR.v[2 * i], blockR.v[2 * i + 1],
               blockR.v[2 * i + 16], blockR.v[2 * i + 17],
               blockR.v[2 * i + 32], blockR.v[2 * i + 33],
               blockR.v[2 * i + 48], blockR.v[2 * i + 49],
               blockR.v[2 * i + 64], blockR.v[2 * i + 65],
               blockR.v[2 * i + 80], blockR.v[2 * i + 81],
               blockR.v[2 * i + 96], blockR.v[2 * i + 97],
               blockR.v[2 * i + 112], blockR.v[2 * i + 113]);
       }

       /* Output: the permuted block XORed with its pre-permutation
          copy */
       copy_block(next_block, &block_tmp);
       xor_block(next_block, &blockR);
   }

   void fill_block_with_xor(const block *prev_block,
                            const block *ref_block,
                            block *next_block) {
       block blockR, block_tmp;
       unsigned i;

       copy_block(&blockR, ref_block);
       xor_block(&blockR, prev_block);
       copy_block(&block_tmp, &blockR);

       /* Saving the next block contents for XOR over */
       xor_block(&block_tmp, next_block);

       /* Now blockR = ref_block + prev_block and
          block_tmp = ref_block + prev_block + next_block
          (+ denotes XOR) */

       /* Apply Blake2 on columns of 64-bit words: (0,1,...,15), then
          (16,17,...,31), ... and finally (112,113,...,127) */
       for (i = 0; i < 8; ++i) {
           BLAKE2_ROUND_NOMSG(
               blockR.v[16 * i], blockR.v[16 * i + 1],
               blockR.v[16 * i + 2], blockR.v[16 * i + 3],
               blockR.v[16 * i + 4], blockR.v[16 * i + 5],
               blockR.v[16 * i + 6], blockR.v[16 * i + 7],
               blockR.v[16 * i + 8], blockR.v[16 * i + 9],
               blockR.v[16 * i + 10], blockR.v[16 * i + 11],
               blockR.v[16 * i + 12], blockR.v[16 * i + 13],
               blockR.v[16 * i + 14], blockR.v[16 * i + 15]);
       }

       /* Apply Blake2 on rows of 64-bit words:
          (0,1,16,17,...112,113), then
          (2,3,18,19,...,114,115), ...
          and finally
          (14,15,30,31,...,126,127) */
       for (i = 0; i < 8; i++) {
           BLAKE2_ROUND_NOMSG(
               blockR.v[2 * i], blockR.v[2 * i + 1],
               blockR.v[2 * i + 16], blockR.v[2 * i + 17],
               blockR.v[2 * i + 32], blockR.v[2 * i + 33],
               blockR.v[2 * i + 48], blockR.v[2 * i + 49],
               blockR.v[2 * i + 64], blockR.v[2 * i + 65],
               blockR.v[2 * i + 80], blockR.v[2 * i + 81],
               blockR.v[2 * i + 96], blockR.v[2 * i + 97],
               blockR.v[2 * i + 112], blockR.v[2 * i + 113]);
       }

       copy_block(next_block, &block_tmp);
       xor_block(next_block, &blockR);
   }

   void generate_addresses(const argon2_instance_t *instance,
                           const argon2_position_t *position,
                           uint64_t *pseudo_rands) {
       block zero_block, input_block, address_block, tmp_block;
       uint32_t i;

       init_block_value(&zero_block, 0);
       init_block_value(&input_block, 0);

       if (instance != NULL && position != NULL) {
           input_block.v[0] = position->pass;
           input_block.v[1] = position->lane;
           input_block.v[2] = position->slice;
           input_block.v[3] = instance->memory_blocks;
           input_block.v[4] = instance->passes;
           input_block.v[5] = instance->type;

           for (i = 0; i < instance->segment_length; ++i) {
               if (i % ARGON2_ADDRESSES_IN_BLOCK == 0) {
                   /* Advance the counter and derive a fresh address
                      block with two applications of G */
                   input_block.v[6]++;
                   init_block_value(&tmp_block, 0);
                   init_block_value(&address_block, 0);
                   fill_block_with_xor(&zero_block, &input_block,
                                       &tmp_block);
                   fill_block_with_xor(&zero_block, &tmp_block,
                                       &address_block);
               }

               pseudo_rands[i] =
                   address_block.v[i % ARGON2_ADDRESSES_IN_BLOCK];
           }
       }
   }

   void fill_segment(const argon2_instance_t *instance,
                     argon2_position_t position) {
       block *ref_block = NULL, *curr_block = NULL;
       uint64_t pseudo_rand, ref_index, ref_lane;
       uint32_t prev_offset, curr_offset;
       uint32_t starting_index;
       uint32_t i;
       int data_independent_addressing;

       /* Pseudo-random values that determine the reference block
          position */
       uint64_t *pseudo_rands = NULL;

       if
           (instance == NULL) {
           return;
       }

       /* Only Argon2i is handled here. Section 3.4.1.3 also makes
          Argon2id data-independent in pass 0, slices 0 and 1; this
          excerpt does not cover that case. */
       data_independent_addressing = (instance->type == Argon2_i);
       pseudo_rands = (uint64_t *)malloc(sizeof(uint64_t) *
                                         (instance->segment_length));

       if (pseudo_rands == NULL) {
           return;
       }

       if (data_independent_addressing) {
           generate_addresses(instance, &position, pseudo_rands);
       }

       starting_index = 0;

       if ((0 == position.pass) && (0 == position.slice)) {
           /* we have already generated the first two blocks */
           starting_index = 2;
       }

       /* Offset of the current block */
       curr_offset = position.lane * instance->lane_length +
                     position.slice * instance->segment_length +
                     starting_index;

       if (0 == curr_offset % instance->lane_length) {
           /* Last block in this lane */
           prev_offset = curr_offset + instance->lane_length - 1;
       } else {
           /* Previous block */
           prev_offset = curr_offset - 1;
       }

       for (i = starting_index; i < instance->segment_length;
            ++i, ++curr_offset, ++prev_offset) {
           /* 1.1 Rotating prev_offset if needed */
           if (curr_offset % instance->lane_length == 1) {
               prev_offset = curr_offset - 1;
           }

           /* 1.2 Computing the index of the reference block */
           /* 1.2.1 Taking pseudo-random value from the previous block */
           if (data_independent_addressing) {
               pseudo_rand = pseudo_rands[i];
           } else {
               pseudo_rand = instance->memory[prev_offset].v[0];
           }

           /* 1.2.2 Computing the lane of the reference block */
           ref_lane = ((pseudo_rand >> 32)) % instance->lanes;
           if ((position.pass == 0) && (position.slice == 0)) {
               /* Can not reference other lanes yet */
               ref_lane = position.lane;
           }

           /* 1.2.3 Computing the number of possible reference blocks
              within the lane.
            */
           position.index = i;
           ref_index = index_alpha(instance, &position,
                                   pseudo_rand & 0xFFFFFFFF,
                                   ref_lane == position.lane);

           /* 2 Creating a new block */
           ref_block = instance->memory +
                       instance->lane_length * ref_lane + ref_index;
           curr_block = instance->memory + curr_offset;
           if (instance->version == ARGON2_OLD_VERSION_NUMBER) {
               /* version 1.2.1 and earlier: overwrite, not XOR */
               fill_block(instance->memory + prev_offset, ref_block,
                          curr_block);
           } else {
               if (0 == position.pass) {
                   fill_block(instance->memory + prev_offset, ref_block,
                              curr_block);
               } else {
                   fill_block_with_xor(instance->memory + prev_offset,
                                       ref_block, curr_block);
               }
           }
       }

       free(pseudo_rands);
   }

   uint32_t index_alpha(const argon2_instance_t *instance,
                        const argon2_position_t *position,
                        uint32_t pseudo_rand,
                        int same_lane) {
       /*
        * Pass 0:
        *   This lane   : all already finished segments plus already
        *                 constructed blocks in this segment
        *   Other lanes : all already finished segments
        * Pass 1+:
        *   This lane   : (SYNC_POINTS - 1) last segments plus
        *                 already constructed blocks in this segment
        *   Other lanes : (SYNC_POINTS - 1) last segments
        */

       uint32_t reference_area_size;
       uint64_t relative_position;
       uint32_t start_position, absolute_position;

       if (0 == position->pass) {
           /* First pass */
           if (0 == position->slice) {
               /* First slice */
               reference_area_size =
                   position->index - 1; /* all but the previous */
           } else {
               if (same_lane) {
                   /* The same lane => add current segment */
                   reference_area_size = position->slice *
                                         instance->segment_length +
                                         position->index - 1;
               } else {
                   reference_area_size = position->slice *
                                         instance->segment_length +
                                         ((position->index == 0) ?
                                         (-1) : 0);
               }
           }
       } else {
           /* Second and later passes */
           if (same_lane) {
               reference_area_size = instance->lane_length -
                                     instance->segment_length +
                                     position->index - 1;
           } else {
               reference_area_size = instance->lane_length -
                                     instance->segment_length +
                                     ((position->index == 0) ? (-1) : 0);
           }
       }

       /* 1.2.4. Mapping pseudo_rand to 0..(reference_area_size - 1)
          and producing the relative position */
       relative_position = pseudo_rand;
       relative_position = relative_position * relative_position >> 32;
       relative_position = reference_area_size - 1 -
                           (reference_area_size * relative_position >> 32);

       /* 1.2.5 Computing starting position */
       start_position = 0;

       if (0 != position->pass) {
           start_position = (position->slice == ARGON2_SYNC_POINTS - 1)
                                ? 0
                                : (position->slice + 1) *
                                      instance->segment_length;
       }

       /* 1.2.6. Computing absolute position */
       absolute_position = (start_position + relative_position) %
                           instance->lane_length;
       return absolute_position;
   }

   int fill_memory_blocks(argon2_instance_t *instance) {
       uint32_t r, s;
       argon2_thread_handle_t *thread = NULL;
       argon2_thread_data *thr_data = NULL;

       if (instance == NULL || instance->lanes == 0) {
           return ARGON2_THREAD_FAIL;
       }

       /* 1. Allocating space for threads */
       thread = calloc(instance->lanes, sizeof(argon2_thread_handle_t));
       if (thread == NULL) {
           return ARGON2_MEMORY_ALLOCATION_ERROR;
       }

       thr_data = calloc(instance->lanes, sizeof(argon2_thread_data));
       if (thr_data == NULL) {
           free(thread);
           return ARGON2_MEMORY_ALLOCATION_ERROR;
       }

       for (r = 0; r < instance->passes; ++r) {
           for (s = 0; s < ARGON2_SYNC_POINTS; ++s) {
               int rc;
               uint32_t l;

               /* 2.
                  Calling threads */
               for (l = 0; l < instance->lanes; ++l) {
                   argon2_position_t position;

                   /* 2.1 Join a thread if limit is exceeded */
                   if (l >= instance->threads) {
                       rc = argon2_thread_join(
                           thread[l - instance->threads]);
                       if (rc) {
                           free(thr_data);
                           free(thread);
                           return ARGON2_THREAD_FAIL;
                       }
                   }

                   /* 2.2 Create thread */
                   position.pass = r;
                   position.lane = l;
                   position.slice = (uint8_t)s;
                   position.index = 0;
                   /* preparing the thread input */
                   thr_data[l].instance_ptr = instance;
                   memcpy(&(thr_data[l].pos), &position,
                          sizeof(argon2_position_t));
                   rc = argon2_thread_create(&thread[l],
                                             &fill_segment_thr,
                                             (void *)&thr_data[l]);
                   if (rc) {
                       free(thr_data);
                       free(thread);
                       return ARGON2_THREAD_FAIL;
                   }

                   /* fill_segment(instance, position); */
                   /* Non-thread equivalent of the lines above */
               }

               /* 3. Joining remaining threads */
               for (l = instance->lanes - instance->threads;
                    l < instance->lanes; ++l) {
                   rc = argon2_thread_join(thread[l]);
                   if (rc) {
                       /* Release the bookkeeping arrays on this error
                          path too, as in steps 2.1 and 2.2 */
                       free(thr_data);
                       free(thread);
                       return ARGON2_THREAD_FAIL;
                   }
               }
           }
       }

       if (thread != NULL) {
           free(thread);
       }
       if (thr_data != NULL) {
           free(thr_data);
       }

       return ARGON2_OK;
   }

6.  Test Vectors

   This section contains test vectors for Argon2.

6.1.
Argon2d Test Vectors

   =======================================
   Argon2d version number 19
   =======================================
   Memory: 32 KiB
   Iterations: 3
   Parallelism: 4 lanes
   Tag length: 32 bytes
   Password[32]: 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
   Salt[16]: 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
   Secret[8]: 03 03 03 03 03 03 03 03
   Associated data[12]: 04 04 04 04 04 04 04 04 04 04 04 04
   Pre-hashing digest: b8 81 97 91 a0 35 96 60
                       bb 77 09 c8 5f a4 8f 04
                       d5 d8 2c 05 c5 f2 15 cc
                       db 88 54 91 71 7c f7 57
                       08 2c 28 b9 51 be 38 14
                       10 b5 fc 2e b7 27 40 33
                       b9 fd c7 ae 67 2b ca ac
                       5d 17 90 97 a4 af 31 09

   After pass 0:
   Block 0000 [  0]: db2fea6b2c6f5c8a
   Block 0000 [  1]: 719413be00f82634
   Block 0000 [  2]: a1e3f6dd42aa25cc
   Block 0000 [  3]: 3ea8efd4d55ac0d1
   ...
   Block 0031 [124]: 28d17914aea9734c
   Block 0031 [125]: 6a4622176522e398
   Block 0031 [126]: 951aa08aeecb2c05
   Block 0031 [127]: 6a6c49d2cb75d5b6

   After pass 1:
   Block 0000 [  0]: d3801200410f8c0d
   Block 0000 [  1]: 0bf9e8a6e442ba6d
   Block 0000 [  2]: e2ca92fe9c541fcc
   Block 0000 [  3]: 6269fe6db177a388
   ...
   Block 0031 [124]: 9eacfcfbdb3ce0fc
   Block 0031 [125]: 07dedaeb0aee71ac
   Block 0031 [126]: 074435fad91548f4
   Block 0031 [127]: 2dbfff23f31b5883

   After pass 2:
   Block 0000 [  0]: 5f047b575c5ff4d2
   Block 0000 [  1]: f06985dbf11c91a8
   Block 0000 [  2]: 89efb2759f9a8964
   Block 0000 [  3]: 7486a73f62f9b142
   ...
   Block 0031 [124]: 57cfb9d20479da49
   Block 0031 [125]: 4099654bc6607f69
   Block 0031 [126]: f142a1126075a5c8
   Block 0031 [127]: c341b3ca45c10da5
   Tag: 51 2b 39 1b 6f 11 62 97
        53 71 d3 09 19 73 42 94
        f8 68 e3 be 39 84 f3 c1
        a1 3a 4d b9 fa be 4a cb

6.2.
Argon2i Test Vectors

   =======================================
   Argon2i version number 19
   =======================================
   Memory: 32 KiB
   Iterations: 3
   Parallelism: 4 lanes
   Tag length: 32 bytes
   Password[32]: 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
   Salt[16]: 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
   Secret[8]: 03 03 03 03 03 03 03 03
   Associated data[12]: 04 04 04 04 04 04 04 04 04 04 04 04
   Pre-hashing digest: c4 60 65 81 52 76 a0 b3
                       e7 31 73 1c 90 2f 1f d8
                       0c f7 76 90 7f bb 7b 6a
                       5c a7 2e 7b 56 01 1f ee
                       ca 44 6c 86 dd 75 b9 46
                       9a 5e 68 79 de c4 b7 2d
                       08 63 fb 93 9b 98 2e 5f
                       39 7c c7 d1 64 fd da a9

   After pass 0:
   Block 0000 [  0]: f8f9e84545db08f6
   Block 0000 [  1]: 9b073a5c87aa2d97
   Block 0000 [  2]: d1e868d75ca8d8e4
   Block 0000 [  3]: 349634174e1aebcc
   ...
   Block 0031 [124]: 975f596583745e30
   Block 0031 [125]: e349bdd7edeb3092
   Block 0031 [126]: b751a689b7a83659
   Block 0031 [127]: c570f2ab2a86cf00

   After pass 1:
   Block 0000 [  0]: b2e4ddfcf76dc85a
   Block 0000 [  1]: 4ffd0626c89a2327
   Block 0000 [  2]: 4af1440fff212980
   Block 0000 [  3]: 1e77299c7408505b
   ...
   Block 0031 [124]: e4274fd675d1e1d6
   Block 0031 [125]: 903fffb7c4a14c98
   Block 0031 [126]: 7e5db55def471966
   Block 0031 [127]: 421b3c6e9555b79d

   After pass 2:
   Block 0000 [  0]: af2a8bd8482c2f11
   Block 0000 [  1]: 785442294fa55e6d
   Block 0000 [  2]: 9256a768529a7f96
   Block 0000 [  3]: 25a1c1f5bb953766
   ...
   Block 0031 [124]: 68cf72fccc7112b9
   Block 0031 [125]: 91e8c6f8bb0ad70d
   Block 0031 [126]: 4f59c8bd65cbb765
   Block 0031 [127]: 71e436f035f30ed0
   Tag: c8 14 d9 d1 dc 7f 37 aa
        13 f0 d7 7f 24 94 bd a1
        c8 de 6b 01 6d d3 88 d2
        99 52 a4 c4 67 2b 6c e8

6.3.
Argon2id Test Vectors

   =======================================
   Argon2id version number 19
   =======================================
   Memory: 32 KiB
   Iterations: 3
   Parallelism: 4 lanes
   Tag length: 32 bytes
   Password[32]: 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
                 01 01 01 01 01 01 01 01
   Salt[16]: 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
   Secret[8]: 03 03 03 03 03 03 03 03
   Associated data[12]: 04 04 04 04 04 04 04 04 04 04 04 04
   Pre-hashing digest: 28 89 de 48 7e b4 2a e5
                       00 c0 00 7e d9 25 2f 10
                       69 ea de c4 0d 57 65 b4
                       85 de 6d c2 43 7a 67 b8
                       54 6a 2f 0a cc 1a 08 82
                       db 8f cf 74 71 4b 47 2e
                       94 df 42 1a 5d a1 11 2f
                       fa 11 43 43 70 a1 e9 97

   After pass 0:
   Block 0000 [  0]: 6b2e09f10671bd43
   Block 0000 [  1]: f69f5c27918a21be
   Block 0000 [  2]: dea7810ea41290e1
   Block 0000 [  3]: 6787f7171870f893
   ...
   Block 0031 [124]: 377fa81666dc7f2b
   Block 0031 [125]: 50e586398a9c39c8
   Block 0031 [126]: 6f732732a550924a
   Block 0031 [127]: 81f88b28683ea8e5

   After pass 1:
   Block 0000 [  0]: 3653ec9d01583df9
   Block 0000 [  1]: 69ef53a72d1e1fd3
   Block 0000 [  2]: 35635631744ab54f
   Block 0000 [  3]: 599512e96a37ab6e
   ...
   Block 0031 [124]: 4d4b435cea35caa6
   Block 0031 [125]: c582210d99ad1359
   Block 0031 [126]: d087971b36fd6d77
   Block 0031 [127]: a55222a93754c692

   After pass 2:
   Block 0000 [  0]: 942363968ce597a4
   Block 0000 [  1]: a22448c0bdad5760
   Block 0000 [  2]: a5f80662b6fa8748
   Block 0000 [  3]: a0f9b9ce392f719f
   ...
   Block 0031 [124]: d723359b485f509b
   Block 0031 [125]: cb78824f42375111
   Block 0031 [126]: 35bc8cc6e83b1875
   Block 0031 [127]: 0b012846a40f346a
   Tag: 0d 64 0d f5 8d 78 76 6c
        08 c0 37 a3 4a 8b 53 c9
        d0 1e f0 45 2d 75 b6 5e
        b5 25 20 e9 6b 01 e6 59

7. Acknowledgements

   TBA

8. IANA Considerations

   None.

9. Security Considerations

9.1.
Security as hash function and KDF

   The collision and preimage resistance levels of Argon2 are equivalent
   to those of the underlying Blake2b hash function. To produce a
   collision, 2**256 inputs are needed. To find a preimage, 2**512
   inputs must be tried.

   The KDF security is determined by the key length and the size of the
   internal state of the hash function H'. To distinguish the output of
   keyed Argon2 from random, at least min(2**128, 2**length(K)) calls
   to Blake2b are needed.

9.2. Security against time-space tradeoff attacks

   Time-space tradeoffs allow computing a memory-hard function while
   storing fewer memory blocks, at the cost of more calls to the
   internal compression function. The advantage of a tradeoff attack is
   measured by the reduction factor of the time-area product, where
   memory and extra compression function cores contribute to the area,
   and time is increased to accommodate the recomputation of missing
   blocks. A high reduction factor may potentially speed up preimage
   search.

   The best attack on 1-pass and 2-pass Argon2i is the low-storage
   attack described in [CBS16], which reduces the time-area product
   (using the peak memory value) by a factor of 5. The best attack on
   Argon2i with 3 or more passes is [AB16], whose reduction factor is a
   function of the memory size and the number of passes. For 1 GiB of
   memory, the factor is 3 for 3 passes, 2.5 for 4 passes, and 2 for 6
   passes. The reduction factor grows by about 0.5 with every doubling
   of the memory size. To completely prevent the time-space tradeoffs
   from [AB16], the number t of passes must exceed the binary logarithm
   of the memory size minus 26.

   The best tradeoff attack on t-pass Argon2d is the ranking tradeoff
   attack, which reduces the time-area product by a factor of 1.33.
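
   As an illustration of the pass-count condition for [AB16] stated
   above, the following C sketch computes the smallest t satisfying
   t > log2(memory) - 26. The helper name min_passes_against_ab16 is
   hypothetical and is not part of the reference implementation. The
   text does not state the unit in which the memory size is measured,
   so the sketch takes the binary logarithm directly and leaves the
   choice of unit to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Argon2 reference code.
 *
 * log2_memory: binary logarithm of the memory size, in whatever unit
 *              the caller adopts for the rule (an assumption; the
 *              text does not name the unit).
 *
 * Returns the smallest pass count t such that
 *     t > log2_memory - 26.
 */
static uint32_t min_passes_against_ab16(uint32_t log2_memory) {
    /* A 64-bit size cannot have a binary logarithm of 64 or more. */
    assert(log2_memory < 64);

    if (log2_memory < 26) {
        /* The bound is negative; Argon2 still runs at least 1 pass. */
        return 1;
    }
    /* Smallest integer strictly greater than log2_memory - 26. */
    return log2_memory - 26 + 1;
}
```

   For example, a call with log2_memory = 30 returns 5, and any value
   below 26 returns 1, the minimum number of passes.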

   The best tradeoff attack on 1-pass Argon2id is the combined
   low-storage attack (for the first half of the memory) and ranking
   attack (for the second half), which together yield a reduction
   factor of about 2.1.

9.3. Security for time-bounded defenders

   A bottleneck in a system employing the password-hashing function is
   often the function latency rather than memory costs. A rational
   defender would then maximize the brute-force costs for an attacker
   equipped with a list of hashes, salts, and timing information, for a
   fixed computing time on the defender's machine. The attack cost
   estimates from [AB16] imply that for Argon2i, 3 passes is almost
   optimal for most reasonable memory sizes, and that for Argon2d and
   Argon2id, 1 pass maximizes the attack costs for a fixed defender
   time.

9.4. Recommendations

   The Argon2id variant with t=1 and maximum available memory is
   recommended as a default setting for all environments. This setting
   is secure against side-channel attacks and maximizes adversarial
   costs on dedicated brute-force hardware.

10. References

10.1. Normative References

   [I-D.saarinen-blake2]
              Saarinen, M. and J. Aumasson, "The BLAKE2 Cryptographic
              Hash and MAC", draft-saarinen-blake2-06 (work in
              progress), August 2015.

10.2. Informative References

   [ARGON2]   Biryukov, A., Dinu, D., and D. Khovratovich, "Argon2: the
              memory-hard function for password hashing and other
              applications", WWW , October 2015.

   [CBS16]    Corrigan-Gibbs, H., Boneh, D., and S. Schechter, "Balloon
              Hashing: Provably Space-Hard Hash Functions with
              Data-Independent Access Patterns", WWW , January 2016.

   [AB16]     Alwen, J. and J.
              Blocki, "Efficiently Computing Data-Independent
              Memory-Hard Functions", WWW , December 2015.

   [AB15]     Biryukov, A. and D. Khovratovich, "Tradeoff Cryptanalysis
              of Memory-Hard Functions", Asiacrypt'15 , December 2015.

Authors' Addresses

   Alex Biryukov
   University of Luxembourg

   Email: alex.biryukov@uni.lu

   Daniel Dinu
   University of Luxembourg

   Email: dumitru-daniel.dinu@uni.lu

   Dmitry Khovratovich
   University of Luxembourg

   Email: dmitry.khovratovich@uni.lu

   Simon Josefsson
   SJD AB

   Email: simon@josefsson.org
   URI: http://josefsson.org/