idnits 2.17.1 draft-irtf-cfrg-argon2-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (September 22, 2016) is 2772 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '0' on line 684 -- Looks like a reference, but probably isn't: '1' on line 604 -- Looks like a reference, but probably isn't: '2' on line 605 -- Looks like a reference, but probably isn't: '3' on line 606 -- Looks like a reference, but probably isn't: '4' on line 607 -- Looks like a reference, but probably isn't: '5' on line 608 -- Looks like a reference, but probably isn't: '32' on line 953 -- Looks like a reference, but probably isn't: '16' on line 957 -- Looks like a reference, but probably isn't: '8' on line 958 -- Looks like a reference, but probably isn't: '12' on line 959 -- Looks like a reference, but probably isn't: '124' on line 997 -- Looks like a reference, but probably isn't: '125' on line 998 -- Looks like a reference, but probably isn't: '126' on line 999 -- Looks like a reference, but probably isn't: '127' on line 1000 Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 16 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Biryukov 3 Internet-Draft D. Dinu 4 Intended status: Informational D. Khovratovich 5 Expires: March 26, 2017 University of Luxembourg 6 S. Josefsson 7 SJD AB 8 September 22, 2016 10 The memory-hard Argon2 password hash and proof-of-work function 11 draft-irtf-cfrg-argon2-01 13 Abstract 15 This document describes the Argon2 memory-hard function for password 16 hashing and proof-of-work applications. We provide an implementer 17 oriented description together with sample code and test vectors. The 18 purpose is to simplify adoption of Argon2 for Internet protocols. 20 Status of This Memo 22 This Internet-Draft is submitted in full conformance with the 23 provisions of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF). Note that other groups may also distribute 27 working documents as Internet-Drafts. The list of current Internet- 28 Drafts is at http://datatracker.ietf.org/drafts/current/. 30 Internet-Drafts are draft documents valid for a maximum of six months 31 and may be updated, replaced, or obsoleted by other documents at any 32 time. It is inappropriate to use Internet-Drafts as reference 33 material or to cite them other than as "work in progress." 35 This Internet-Draft will expire on March 26, 2017. 37 Copyright Notice 39 Copyright (c) 2016 IETF Trust and the persons identified as the 40 document authors. All rights reserved. 42 This document is subject to BCP 78 and the IETF Trust's Legal 43 Provisions Relating to IETF Documents 44 (http://trustee.ietf.org/license-info) in effect on the date of 45 publication of this document. Please review these documents 46 carefully, as they describe your rights and restrictions with respect 47 to this document. Code Components extracted from this document must 48 include Simplified BSD License text as described in Section 4.e of 49 the Trust Legal Provisions and are provided without warranty as 50 described in the Simplified BSD License. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 55 2. Notation and Conventions . . . . . . . . . . . . . . . . . . 3 56 3. Argon2 Algorithm . . . . . . . . . . . . . . . . . . . . . . 3 57 3.1. Argon2 Inputs and Outputs . . . . . . . . . . . . . . . . 3 58 3.2. Argon2 Operation . . . . . . . . . . . . . . . . . . . . 4 59 3.3. Variable-length hash function H' . . . . . . . . . . . . 5 60 3.4. Indexing . . . . . . . . . . . . . . . . . . . . . . . . 6 61 3.4.1. Getting the 32-bit values J_1 and J_2 . . . . . . . . 6 62 3.4.2. Mapping J_1 and J_2 to reference block index . . . . 7 63 3.5. Compression function G . . . . . . . . . . . . . . . . . 8 64 3.6. Permutation P . . . . . . . . . . . . . . . . . . . . . . 9 65 4. Parameter Choice . . . . . . . . . . . . . . . . . . . . . . 10 66 5. Example Code . . . . . . . . . . . . . . . . . . . . . . . . 11 67 6. Test Vectors . . . . . . . . . . . . . . . . . . . . . . . . 20 68 6.1. Argon2d Test Vectors . . . . . . . . . . . . . . . . . . 20 69 6.2. Argon2i Test Vectors . . . . . . . . . . . . . . . . . . 21 70 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 22 71 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 72 9. Security Considerations . . . . . . . . . . . . . . . . . . . 22 73 9.1. Security as hash function and KDF . . . . . . . . . . . . 22 74 9.2. Security against time-space tradeoff attacks . . . . . . 23 75 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 76 10.1. Normative References . . . . . . . . . . . . . . . . . . 23 77 10.2. Informative References . . . . . . . . . . . . . . . . . 23 78 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24 80 1. Introduction 82 This document describes the Argon2 memory-hard function for password 83 hashing and proof-of-work applications. We provide an implementer 84 oriented description together with sample code and test vectors. The 85 purpose is to simplify adoption of Argon2 for Internet protocols. 86 This document corresponds to version 1.3 of the Argon2 hash function. 88 Argon2 summarizes the state of the art in the design of memory-hard 89 functions. It is a streamlined and simple design. It aims at the 90 highest memory filling rate and effective use of multiple computing 91 units, while still providing defense against tradeoff attacks. 92 Argon2 is optimized for the x86 architecture and exploits the cache 93 and memory organization of the recent Intel and AMD processors. 94 Argon2 has two variants: Argon2d and Argon2i. Argon2d is faster and 95 uses data-depending memory access, which makes it suitable for 96 cryptocurrencies and proof-of-work applications with no threats from 97 side-channel timing attacks. Argon2i uses data-independent memory 98 access, which is preferred for password hashing and password-based 99 key derivation. Argon2i is slower as it makes more passes over the 100 memory to protect from tradeoff attacks. 102 For further background and discussion, see the Argon2 paper [ARGON2]. 104 2. Notation and Conventions 106 x**y --- x multiplied by itself y times 108 a*b --- multiplication of a and b 110 c-d --- substraction of c with d 112 E_f --- variable E with subscript index f 114 g / h --- g divided by h 116 I(j) --- function I evaluated on parameter j 118 K || L --- string K concatenated with string L 120 a ^ b --- bitwise exclusive-or between a and b 122 a mod b --- remainder of a modulo b, always in range [0, b-1] 124 a >>> n --- rotation of a to the right by n bits 126 trunc(a) --- the 64-bit value a truncated to the 32 least significant 127 bits 129 extract(a, i) --- the i-th set of 32-bits from a 131 |A| --- the number of elements in set A 133 3. Argon2 Algorithm 135 3.1. Argon2 Inputs and Outputs 137 Argon2 has the following input parameters: 139 o Message string P, which is a password for password hashing 140 applications. May have any length from 0 to 2**32 - 1 bytes. 142 o Nonce S, which is a salt for password hashing applications. May 143 have any length from 8 to 2**32-1 bytes. 16 bytes is recommended 144 for password hashing. Salt must be unique for each password. 146 o Degree of parallelism p determines how many independent (but 147 synchronizing) computational chains (lanes) can be run. It may 148 take any integer value from 1 to 2**24-1. 150 o Tag length T may be any integer number of bytes from 4 to 2**32-1. 152 o Memory size m can be any integer number of kibibytes from 8*p to 153 2**32-1. The actual number of blocks is m', which is m rounded 154 down to the nearest multiple of 4*p. 156 o Number of iterations t (used to tune the running time 157 independently of the memory size) can be any integer number from 1 158 to 2**32-1. 160 o Version number v is one byte 0x13. 162 o Secret value K (serves as key if necessary, but we do not assume 163 any key use by default) may have any length from 0 to 32 bytes. 165 o Associated data X may have any length from 0 to 2**32-1 bytes. 167 o Type y of Argon2: 0 for Argon2d, 1 for Argon2i. 169 The Argon2 output is a T-length string. 171 3.2. Argon2 Operation 173 Argon2 uses an internal compression function G with two 1024-byte 174 inputs and a 1024-byte output, and an internal hash function H. Here 175 H is the BLAKE2b [I-D.saarinen-blake2] hash function, and the 176 compression function G is based on its internal permutation. A 177 variable-length hash function H' built upon H is also used. G and H' 178 are described in later section. 180 The Argon2 operation is as follows. 182 1. Establish H_0 as the 64-bit value as shown in the figure below. 183 H is BLAKE2b and the non-strings p, T, m, t, v, y, length(P), 184 length(S), length(K), and length(X) are treated as a 32-bit 185 little-endian encoding of the integer. 187 H_0 = H(p, T, m, t, v, y, length(P), P, length(S), S, 188 length(K), K, length(X), X) 190 2. Allocate the memory as m' 1024-byte blocks where m' is derived 191 as: 193 m' = 4 * p * floor (m / 4p) 195 For p lanes, the memory is organized in a matrix B[i][j] of 196 blocks with p rows (lanes) and q = m' / p columns. 198 3. Compute B[i][0] for all i ranging from (and including) 0 to (not 199 including) p. 201 B[i][0] = H'(H_0, 0, i) 203 Here integers are padded to 4 bytes and encoded in little endian. 205 4. Compute B[i][1] for all i ranging from (and including) 0 to (not 206 including) p. 208 B[i][1] = H'(H_0, 1, i) 210 Here integers are padded to 4 bytes and encoded in little endian. 212 5. Compute B[i][j] for all i ranging from (and including) 0 to (not 213 including) p, and for all j ranging from (and including) 2) to 214 (not including) q. The block indices i' and j' are determined 215 differently for Argon2d and Argon2i. 217 B[i][j] = G(B[i][j-1], B[i'][j']) 219 6. If the number of iterations t is larger than 1, we repeat the 220 steps however replacing the computations with the following 221 expression: 223 B[i][0] = G(B[i][q-1], B[i'][j']) 224 B[i][j] = G(B[i][j-1], B[i'][j']) 226 7. After t steps have been iterated, the final block C is computed 227 as the XOR of the last column: 229 C = B[0][q-1] XOR B[1][q-1] XOR ... XOR B[p-1][q-1] 231 8. The output tag is computed as H'(C). 233 3.3. Variable-length hash function H' 235 Let H_x be a hash function with x-byte output (in our case H_x is 236 BLAKE2b, which supports x between 1 and 64 inclusive). Let V_i be a 237 64-byte block, and A_i be its first 32 bytes, and T < 2**32 be the 238 tag length in bytes, encoded in little-endian as 32-bit integer. 239 Then we define: 241 if T <= 64 242 H'(X) = H_T(T||X) 243 else 244 r = ceil(T/32)-2 245 V_1 = H_64(T||X) 246 V_2 = H_64(V_1) 247 ... 248 V_r = H_64(V_{r-1}) 249 V_{r+1} = H_{T-32*r}(V_{r}) 250 H'(X) = A_1 || A_2 || ... || A_r || V_{r+1} 252 3.4. Indexing 254 To enable parallel block computation, we further partition the memory 255 matrix into S = 4 vertical slices. The intersection of a slice and a 256 lane is a segment of length q/S. Segments of the same slice are 257 computed in parallel and may not reference blocks from each other. 258 All other blocks can be referenced. 260 slice 0 slice 1 slice 2 slice 3 261 ___/\___ ___/\___ ___/\___ ___/\___ 262 / \ / \ / \ / \ 263 +----------+----------+----------+----------+ 264 | | | | | > lane 0 265 +----------+----------+----------+----------+ 266 | | | | | > lane 1 267 +----------+----------+----------+----------+ 268 | | | | | > lane 2 269 +----------+----------+----------+----------+ 270 | ... ... ... | ... 271 +----------+----------+----------+----------+ 272 | | | | | > lane p - 1 273 +----------+----------+----------+----------+ 275 Single-pass Argon2 with p lanes and 4 slices 277 3.4.1. Getting the 32-bit values J_1 and J_2 279 3.4.1.1. Argon2d 281 J_1 is given by the first 32 bits of block B[i][j-1], while J_2 is 282 given by the next 32-bits of block B[i][j-1]: 284 J_1 = extract(B[i][j-1], 1) 285 J_2 = extract(B[i][j-1], 2) 287 3.4.1.2. Argon2i 289 Each application of the 2-round compression function G in the counter 290 mode gives 128 64-bit values J_1 || J_2. The first input is the all 291 zero block and the second input is constructed as follows: 293 ( r || l || s || m' || t || x || i || 0 ), where 295 r -- the pass number 296 l -- the lane number 297 s -- the slice number 298 m' -- the total number of memory blocks 299 t -- the total number of passes 300 x -- the Argon2 type (0 for Argon2d and 1 for Argon2i) 301 i -- the counter (starts from 1 in each segment) 303 The values r, l, s, m', t, x, i are represented on 8 bytes in little- 304 endian. 306 3.4.2. Mapping J_1 and J_2 to reference block index 308 The value of l = J_2 mod p gives the index of the lane from which the 309 block will be taken. For the firt pass (r=0) and the first slice 310 (s=0) the block is taken from the current lane. 312 The set R contains the indices that can be referenced according to 313 the following rules: 315 1. If l is the currnt lane, then R includes the indices of all 316 blocks computed in this lane that are not overwritten yet, 317 excluding B[i][j-1] 319 2. If l is not the current lane, then R includes the indices of all 320 blocks in the last S - 1 = 3 segments computed and finished in 321 lane l. If B[i][j] is the first block of a segment, then the 322 very last index from R is excluded. 324 We are going to take a block from R with a non-uniform distribution 325 over [0, |R|): 327 J_1 in [0, |R|) -> |R|(1 - J_1**2 / 2**64) 329 To avoid floating poit computation, the following approximation is 330 used: 332 x = J_1**2 / 2**32 333 y = (|R| * x) / 2**32 334 z = |R| - 1 - y 336 The value of z gives the reference block index in R. 338 3.5. Compression function G 340 Compression function G is built upon the BLAKE2b round function P. P 341 operates on the 128-byte input, which can be viewed as 8 16-byte 342 registers: 344 P(A_0, A_1, ... ,A_7) = (B_0, B_1, ... ,B_7) 346 Compression function G(X, Y) operates on two 1024-byte blocks X and 347 Y. It first computes R = X XOR Y. Then R is viewed as a 8x8 matrix 348 of 16-byte registers R_0, R_1, ... , R_63. Then P is first applied 349 rowwise, and then columnwise to get Z: 351 ( Q_0, Q_1, Q_2, ... , Q_7) <- P( R_0, R_1, R_2, ... , R_7) 352 ( Q_8, Q_9, Q_10, ... , Q_15) <- P( R_8, R_9, R_10, ... , R_15) 353 ... 354 (Q_56, Q_57, Q_58, ... , Q_63) <- P(R_56, R_57, R_58, ... , R_63) 355 ( Z_0, Z_8, Z_16, ... , Z_56) <- P( Q_0, Q_8, Q_16, ... , Q_56) 356 ( Z_1, Z_9, Z_17, ... , Z_57) <- P( Q_1, Q_9, Q_17, ... , Q_57) 357 ... 358 ( Z_7, Z_15, Z 23, ... , Z_63) <- P( Q_7, Q_15, Q_23, ... , Q_63) 360 Finally, G outputs Z XOR R: 362 G: (X, Y) -> R = X XOR Y -P-> Q -P-> Z -P-> Z XOR R 363 +---+ +---+ 364 | X | | Y | 365 +---+ +---+ 366 | | 367 ---->XOR<---- 368 --------| 369 | \ / 370 | +---+ 371 | | R | 372 | +---+ 373 | | 374 | \ / 375 | P rowwise 376 | | 377 | \ / 378 | +---+ 379 | | Q | 380 | +---+ 381 | | 382 | \ / 383 | P columnwise 384 | | 385 | \ / 386 | +---+ 387 | | Z | 388 | +---+ 389 | | 390 | \ / 391 ------>XOR 392 | 393 \ / 395 Argon2 compression function G. 397 3.6. Permutation P 399 Permutation P is based on the round function of BLAKE2b. The 8 400 16-byte inputs S_0, S_1, ... , S_7 are viewed as a 4x4 matrix of 401 64-bit words, where S_i = (v_{2*i+1} || v_{2*i}): 403 v_0 v_1 v_2 v_3 404 v_4 v_5 v_6 v_7 405 v_8 v_9 v_10 v_11 406 v_12 v_13 v_14 v_15 408 It works as follows: 410 G(v_0, v_4, v_8, v_12) 411 G(v_1, v_5, v_9, v_13) 412 G(v_2, v_6, v_10, v_14) 413 G(v_3, v_7, v_11, v_15) 415 G(v_0, v_5, v_10, v_15) 416 G(v_1, v_6, v_11, v_12) 417 G(v_2, v_7, v_8, v_13) 418 G(v_3, v_4, v_9, v_14) 420 G(a, b, c, d) is defined as follows: 422 a <- (a + b + 2 * trunc(a) * trunc(b)) mod 2**64 423 d <- (d ^ a) >>> 32 424 c <- (c + d + 2 * trunc(c) * trunc(d)) mod 2**64 425 b <- (b ^ c) >>> 24 427 a <- (a + b + 2 * trunc(a) * trunc(b)) mod 2**64 428 d <- (d ^ a) >>> 16 429 c <- (c + d + 2 * trunc(c) * trunc(d)) mod 2**64 430 b <- (b ^ c) >>> 63 432 The modular additions in G are combined with 64-bit multiplications. 433 Multiplications are the only difference to the original BLAKE2b 434 design. This choice is done to increase the circuit depth and thus 435 the running time of ASIC implementations, while having roughly the 436 same running time on CPUs thanks to parallelism and pipelining. 438 4. Parameter Choice 440 Argon2d is optimized for settings where the adversary does not get 441 regular access to system memory or CPU, i.e. he can not run side- 442 channel attacks based on the timing information, nor he can recover 443 the password much faster using garbage collection. These settings 444 are more typical for backend servers and cryptocurrency minings. For 445 practice we suggest the following settings: 447 o Cryptocurrency mining, that takes 0.1 seconds on a 2 Ghz CPU using 448 1 core -- Argon2d with 2 lanes and 250 MB of RAM. 450 o Backend server authentication, that takes 0.5 seconds on a 2 GHz 451 CPU using 4 cores -- Argon2d with 8 lanes and 4 GB of RAM. 453 Argon2i is optimized for more realistic settings, where the adversary 454 possibly can access the same machine, use its CPU or mount cold-boot 455 attacks. We use three passes to get rid entirely of the password in 456 the memory. We suggest the following settings: 458 o Key derivation for hard-drive encryption, that takes 3 seconds on 459 a 2 GHz CPU using 2 cores - Argon2i with 4 lanes and 6 GB of RAM. 461 o Frontend server authentication, that takes 0.5 seconds on a 2 GHz 462 CPU using 2 cores - Argon2i with 4 lanes and 1 GB of RAM. 464 We recommend the following procedure to select the type and the 465 parameters for practical use of Argon2. 467 1. Select the type y. If you do not know the difference between 468 them or you consider side-channel attacks as viable threat, 469 choose Argon2i. 471 2. Figure out the maximum number h of threads that can be initiated 472 by each call to Argon2. 474 3. Figure out the maximum amount m of memory that each call can 475 afford. 477 4. Figure out the maximum amount x of time (in seconds) that each 478 call can afford. 480 5. Select the salt length. 128 bits is sufficient for all 481 applications, but can be reduced to 64 bits in the case of space 482 constraints. 484 6. Select the tag length. 128 bits is sufficient for most 485 applications, including key derivation. If longer keys are 486 needed, select longer tags. 488 7. If side-channel attacks is a viable threat, enable the memory 489 wiping option in the library call. 491 8. Run the scheme of type y, memory m and h lanes and threads, using 492 different number of passes t. Figure out the maximum t such that 493 the running time does not exceed x. If it exceeds x even for t = 494 1, reduce m accordingly. 496 9. Hash all the passwords with the just determined values m, h, and 497 t. 499 5. Example Code 500 void fill_block(const block *prev_block, 501 const block *ref_block, 502 block *next_block) { 503 block blockR, block_tmp; 504 unsigned i; 506 copy_block(&blockR, ref_block); 507 xor_block(&blockR, prev_block); 508 copy_block(&block_tmp, &blockR); 510 /* Now blockR = ref_block + prev_block and bloc_tmp = ref_block + 511 prev_block */ 513 /* Apply Blake2 on columns of 64-bit words: (0,1,...,15), 514 then (16,17,..31)... finally (112,113,...127) */ 515 for (i = 0; i < 8; ++i) { 516 BLAKE2_ROUND_NOMSG( 517 blockR.v[16 * i], blockR.v[16 * i + 1], 518 blockR.v[16 * i + 2], blockR.v[16 * i + 3], 519 blockR.v[16 * i + 4], blockR.v[16 * i + 5], 520 blockR.v[16 * i + 6], blockR.v[16 * i + 7], 521 blockR.v[16 * i + 8], blockR.v[16 * i + 9], 522 blockR.v[16 * i + 10], blockR.v[16 * i + 11], 523 blockR.v[16 * i + 12], blockR.v[16 * i + 13], 524 blockR.v[16 * i + 14], blockR.v[16 * i + 15]); 525 } 527 /* Apply Blake2 on rows of 64-bit words: (0,1,16,17,...112,113), 528 then (2,3,18,19,...,114,115), ... and finally 529 (14,15,30,31,...,126,127) */ 530 for (i = 0; i < 8; i++) { 531 BLAKE2_ROUND_NOMSG( 532 blockR.v[2 * i], blockR.v[2 * i + 1], 533 blockR.v[2 * i + 16], blockR.v[2 * i + 17], 534 blockR.v[2 * i + 32], blockR.v[2 * i + 33], 535 blockR.v[2 * i + 48], blockR.v[2 * i + 49], 536 blockR.v[2 * i + 64], blockR.v[2 * i + 65], 537 blockR.v[2 * i + 80], blockR.v[2 * i + 81], 538 blockR.v[2 * i + 96], blockR.v[2 * i + 97], 539 blockR.v[2 * i + 112], blockR.v[2 * i + 113]); 540 } 542 copy_block(next_block, &block_tmp); 543 xor_block(next_block, &blockR); 544 } 545 void fill_block_with_xor(const block *prev_block, 546 const block *ref_block, 547 block *next_block) { 548 block blockR, block_tmp; 549 unsigned i; 551 copy_block(&blockR, ref_block); 552 xor_block(&blockR, prev_block); 553 copy_block(&block_tmp, &blockR); 555 /* Saving the next block contents for XOR over */ 556 xor_block(&block_tmp, next_block); 558 /* Now blockR = ref_block + prev_block and bloc_tmp = ref_block + 559 prev_block + next_block*/ 560 /* Apply Blake2 on columns of 64-bit words: (0,1,...,15) , then 561 (16,17,..31),... and finally (112,113,...127) */ 562 for (i = 0; i < 8; ++i) { 563 BLAKE2_ROUND_NOMSG( 564 blockR.v[16 * i], blockR.v[16 * i + 1], 565 blockR.v[16 * i + 2], blockR.v[16 * i + 3], 566 blockR.v[16 * i + 4], blockR.v[16 * i + 5], 567 blockR.v[16 * i + 6], blockR.v[16 * i + 7], 568 blockR.v[16 * i + 8], blockR.v[16 * i + 9], 569 blockR.v[16 * i + 10], blockR.v[16 * i + 11], 570 blockR.v[16 * i + 12], blockR.v[16 * i + 13], 571 blockR.v[16 * i + 14], blockR.v[16 * i + 15]); 572 } 574 /* Apply Blake2 on rows of 64-bit words: 575 (0,1,16,17,...112,113), then 576 (2,3,18,19,...,114,115), ... and finally 577 (14,15,30,31,...,126,127) */ 578 for (i = 0; i < 8; i++) { 579 BLAKE2_ROUND_NOMSG( 580 blockR.v[2 * i], blockR.v[2 * i + 1], 581 blockR.v[2 * i + 16], blockR.v[2 * i + 17], 582 blockR.v[2 * i + 32], blockR.v[2 * i + 33], 583 blockR.v[2 * i + 48], blockR.v[2 * i + 49], 584 blockR.v[2 * i + 64], blockR.v[2 * i + 65], 585 blockR.v[2 * i + 80], blockR.v[2 * i + 81], 586 blockR.v[2 * i + 96], blockR.v[2 * i + 97], 587 blockR.v[2 * i + 112], blockR.v[2 * i + 113]); 588 } 590 copy_block(next_block, &block_tmp); 591 xor_block(next_block, &blockR); 592 } 593 void generate_addresses(const argon2_instance_t *instance, 594 const argon2_position_t *position, 595 uint64_t *pseudo_rands) { 596 block zero_block, input_block, address_block,tmp_block; 597 uint32_t i; 599 init_block_value(&zero_block, 0); 600 init_block_value(&input_block, 0); 602 if (instance != NULL && position != NULL) { 603 input_block.v[0] = position->pass; 604 input_block.v[1] = position->lane; 605 input_block.v[2] = position->slice; 606 input_block.v[3] = instance->memory_blocks; 607 input_block.v[4] = instance->passes; 608 input_block.v[5] = instance->type; 610 for (i = 0; i < instance->segment_length; ++i) { 611 if (i % ARGON2_ADDRESSES_IN_BLOCK == 0) { 612 input_block.v[6]++; 613 init_block_value(&tmp_block, 0); 614 init_block_value(&address_block, 0); 615 fill_block_with_xor(&zero_block, &input_block, &tmp_block); 616 fill_block_with_xor(&zero_block, &tmp_block, &address_block); 617 } 619 pseudo_rands[i] = address_block.v[i % ARGON2_ADDRESSES_IN_BLOCK]; 620 } 621 } 623 void fill_segment(const argon2_instance_t *instance, 624 argon2_position_t position) { 625 block *ref_block = NULL, *curr_block = NULL; 626 uint64_t pseudo_rand, ref_index, ref_lane; 627 uint32_t prev_offset, curr_offset; 628 uint32_t starting_index; 629 uint32_t i; 630 int data_independent_addressing; 632 /* Pseudo-random values that determine the reference block 633 position */ 634 uint64_t *pseudo_rands = NULL; 636 if (instance == NULL) { 637 return; 638 } 640 data_independent_addressing = (instance->type == Argon2_i); 641 pseudo_rands = (uint64_t *)malloc(sizeof(uint64_t) * 642 (instance->segment_length)); 644 if (pseudo_rands == NULL) { 645 return; 646 } 648 if (data_independent_addressing) { 649 generate_addresses(instance, &position, pseudo_rands); 650 } 652 starting_index = 0; 654 if ((0 == position.pass) && (0 == position.slice)) { 655 /* we have already generated the first two blocks */ 656 starting_index = 2; 657 } 659 /* Offset of the current block */ 660 curr_offset = position.lane * instance->lane_length + 661 position.slice * instance->segment_length + 662 starting_index; 664 if (0 == curr_offset % instance->lane_length) { 665 /* Last block in this lane */ 666 prev_offset = curr_offset + instance->lane_length - 1; 667 } else { 668 /* Previous block */ 669 prev_offset = curr_offset - 1; 670 } 672 for (i = starting_index; i < instance->segment_length; 673 ++i, ++curr_offset, ++prev_offset) { 674 /*1.1 Rotating prev_offset if needed */ 675 if (curr_offset % instance->lane_length == 1) { 676 prev_offset = curr_offset - 1; 677 } 679 /* 1.2 Computing the index of the reference block */ 680 /* 1.2.1 Taking pseudo-random value from the previous block */ 681 if (data_independent_addressing) { 682 pseudo_rand = pseudo_rands[i]; 683 } else { 684 pseudo_rand = instance->memory[prev_offset].v[0]; 685 } 687 /* 1.2.2 Computing the lane of the reference block */ 688 ref_lane = ((pseudo_rand >> 32)) % instance->lanes; 689 if ((position.pass == 0) && (position.slice == 0)) { 690 /* Can not reference other lanes yet */ 691 ref_lane = position.lane; 692 } 694 /* 1.2.3 Computing the number of possible reference block 695 within the lane. */ 696 position.index = i; 697 ref_index = index_alpha(instance, &position, 698 pseudo_rand & 0xFFFFFFFF, 699 ref_lane == position.lane); 701 /* 2 Creating a new block */ 702 ref_block = instance->memory + 703 instance->lane_length * ref_lane + ref_index; 704 curr_block = instance->memory + curr_offset; 705 if (instance->version == ARGON2_OLD_VERSION_NUMBER) { 706 /* version 1.2.1 and earlier: overwrite, not XOR */ 707 fill_block(instance->memory + prev_offset, ref_block, 708 curr_block); 709 } else { 710 if(0 == position.pass) { 711 fill_block(instance->memory + prev_offset, ref_block, 712 curr_block); 713 } else { 714 fill_block_with_xor(instance->memory + prev_offset, 715 ref_block, curr_block); 716 } 717 } 718 } 720 free(pseudo_rands); 721 } 723 uint32_t index_alpha(const argon2_instance_t *instance, 724 const argon2_position_t *position, 725 uint32_t pseudo_rand, 726 int same_lane) { 727 /* 728 * Pass 0: 729 * This lane : all already finished segments plus already 730 * constructed blocks in this segment 731 * Other lanes : all already finished segments 732 * Pass 1+: 733 * This lane : (SYNC_POINTS - 1) last segments plus 734 * already constructed blocks in this segment 735 * Other lanes : (SYNC_POINTS - 1) last segments 736 */ 738 uint32_t reference_area_size; 739 uint64_t relative_position; 740 uint32_t start_position, absolute_position; 742 if (0 == position->pass) { 743 /* First pass */ 744 if (0 == position->slice) { 745 /* First slice */ 746 reference_area_size = 747 position->index - 1; /* all but the previous */ 748 } else { 749 if (same_lane) { 750 /* The same lane => add current segment */ 751 reference_area_size = position->slice * 752 instance->segment_length + 753 position->index - 1; 754 } else { 755 reference_area_size = position->slice * 756 instance->segment_length + 757 ((position->index == 0) ? (-1) : 0); 758 } 759 } 760 } else { 761 /* Second pass */ 762 if (same_lane) { 763 reference_area_size = instance->lane_length - 764 instance->segment_length + 765 position->index - 1; 766 } else { 767 reference_area_size = instance->lane_length - 768 instance->segment_length + 769 ((position->index == 0) ? (-1) : 0); 770 } 771 } 773 /* 1.2.4. Mapping pseudo_rand to 0.. 774 and produce relative position */ 775 relative_position = pseudo_rand; 776 relative_position = relative_position * relative_position >> 32; 777 relative_position = reference_area_size - 1 - 778 (reference_area_size * relative_position >> 32); 780 /* 1.2.5 Computing starting position */ 781 start_position = 0; 783 if (0 != position->pass) { 784 start_position = (position->slice == ARGON2_SYNC_POINTS - 1) 785 ? 0 786 : (position->slice + 1) * 787 instance->segment_length; 788 } 790 /* 1.2.6. Computing absolute position */ 791 absolute_position = (start_position + relative_position) % 792 instance->lane_length; /* absolute position */ 793 return absolute_position; 794 } 796 int fill_memory_blocks(argon2_instance_t *instance) { 797 uint32_t r, s; 798 argon2_thread_handle_t *thread = NULL; 799 argon2_thread_data *thr_data = NULL; 801 if (instance == NULL || instance->lanes == 0) { 802 return ARGON2_THREAD_FAIL; 803 } 805 /* 1. Allocating space for threads */ 806 thread = calloc(instance->lanes, sizeof(argon2_thread_handle_t)); 807 if (thread == NULL) { 808 return ARGON2_MEMORY_ALLOCATION_ERROR; 809 } 811 thr_data = calloc(instance->lanes, sizeof(argon2_thread_data)); 812 if (thr_data == NULL) { 813 free(thread); 814 return ARGON2_MEMORY_ALLOCATION_ERROR; 815 } 817 for (r = 0; r < instance->passes; ++r) { 818 for (s = 0; s < ARGON2_SYNC_POINTS; ++s) { 819 int rc; 820 uint32_t l; 822 /* 2. Calling threads */ 823 for (l = 0; l < instance->lanes; ++l) { 824 argon2_position_t position; 826 /* 2.1 Join a thread if limit is exceeded */ 827 if (l >= instance->threads) { 828 rc = argon2_thread_join(thread[l - instance->threads]); 829 if (rc) { 830 free(thr_data); 831 free(thread); 832 return ARGON2_THREAD_FAIL; 833 } 835 } 837 /* 2.2 Create thread */ 838 position.pass = r; 839 position.lane = l; 840 position.slice = (uint8_t)s; 841 position.index = 0; 842 /* preparing the thread input */ 843 thr_data[l].instance_ptr = instance; 844 memcpy(&(thr_data[l].pos), &position, 845 sizeof(argon2_position_t)); 846 rc = argon2_thread_create(&thread[l], &fill_segment_thr, 847 (void *)&thr_data[l]); 848 if (rc) { 849 free(thr_data); 850 free(thread); 851 return ARGON2_THREAD_FAIL; 852 } 854 /* fill_segment(instance, position); */ 855 /*Non-thread equivalent of the lines above */ 856 } 858 /* 3. Joining remaining threads */ 859 for (l = instance->lanes - instance->threads; l < instance->lanes; 860 ++l) { 861 rc = argon2_thread_join(thread[l]); 862 if (rc) { 863 return ARGON2_THREAD_FAIL; 864 } 865 } 866 } 867 } 869 if (thread != NULL) { 870 free(thread); 871 } 872 if (thr_data != NULL) { 873 free(thr_data); 874 } 876 return ARGON2_OK; 877 } 878 6. Test Vectors 880 This section contains test vectors for Argon2. 882 6.1. Argon2d Test Vectors 884 ======================================= 885 Argon2d version number 19 886 ======================================= 887 Memory: 32 KiB 888 Iterations: 3 889 Parallelism: 4 lanes 890 Tag length: 32 bytes 891 Password[32]: 01 01 01 01 01 01 01 01 892 01 01 01 01 01 01 01 01 893 01 01 01 01 01 01 01 01 894 01 01 01 01 01 01 01 01 895 Salt[16]: 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 896 Secret[8]: 03 03 03 03 03 03 03 03 897 Associated data[12]: 04 04 04 04 04 04 04 04 04 04 04 04 898 Pre-hashing digest: b8 81 97 91 a0 35 96 60 899 bb 77 09 c8 5f a4 8f 04 900 d5 d8 2c 05 c5 f2 15 cc 901 db 88 54 91 71 7c f7 57 902 08 2c 28 b9 51 be 38 14 903 10 b5 fc 2e b7 27 40 33 904 b9 fd c7 ae 67 2b ca ac 905 5d 17 90 97 a4 af 31 09 907 After pass 0: 908 Block 0000 [ 0]: db2fea6b2c6f5c8a 909 Block 0000 [ 1]: 719413be00f82634 910 Block 0000 [ 2]: a1e3f6dd42aa25cc 911 Block 0000 [ 3]: 3ea8efd4d55ac0d1 912 ... 913 Block 0031 [124]: 28d17914aea9734c 914 Block 0031 [125]: 6a4622176522e398 915 Block 0031 [126]: 951aa08aeecb2c05 916 Block 0031 [127]: 6a6c49d2cb75d5b6 918 After pass 1: 919 Block 0000 [ 0]: d3801200410f8c0d 920 Block 0000 [ 1]: 0bf9e8a6e442ba6d 921 Block 0000 [ 2]: e2ca92fe9c541fcc 922 Block 0000 [ 3]: 6269fe6db177a388 923 ... 924 Block 0031 [124]: 9eacfcfbdb3ce0fc 925 Block 0031 [125]: 07dedaeb0aee71ac 926 Block 0031 [126]: 074435fad91548f4 927 Block 0031 [127]: 2dbfff23f31b5883 929 After pass 2: 930 Block 0000 [ 0]: 5f047b575c5ff4d2 931 Block 0000 [ 1]: f06985dbf11c91a8 932 Block 0000 [ 2]: 89efb2759f9a8964 933 Block 0000 [ 3]: 7486a73f62f9b142 934 ... 935 Block 0031 [124]: 57cfb9d20479da49 936 Block 0031 [125]: 4099654bc6607f69 937 Block 0031 [126]: f142a1126075a5c8 938 Block 0031 [127]: c341b3ca45c10da5 939 Tag: 51 2b 39 1b 6f 11 62 97 940 53 71 d3 09 19 73 42 94 941 f8 68 e3 be 39 84 f3 c1 942 a1 3a 4d b9 fa be 4a cb 944 6.2. Argon2i Test Vectors 946 ======================================= 947 Argon2i version number 19 948 ======================================= 949 Memory: 32 KiB 950 Iterations: 3 951 Parallelism: 4 lanes 952 Tag length: 32 bytes 953 Password[32]: 01 01 01 01 01 01 01 01 954 01 01 01 01 01 01 01 01 955 01 01 01 01 01 01 01 01 956 01 01 01 01 01 01 01 01 957 Salt[16]: 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 958 Secret[8]: 03 03 03 03 03 03 03 03 959 Associated data[12]: 04 04 04 04 04 04 04 04 04 04 04 04 960 Pre-hashing digest: c4 60 65 81 52 76 a0 b3 961 e7 31 73 1c 90 2f 1f d8 962 0c f7 76 90 7f bb 7b 6a 963 5c a7 2e 7b 56 01 1f ee 964 ca 44 6c 86 dd 75 b9 46 965 9a 5e 68 79 de c4 b7 2d 966 08 63 fb 93 9b 98 2e 5f 967 39 7c c7 d1 64 fd da a9 969 After pass 0: 970 Block 0000 [ 0]: f8f9e84545db08f6 971 Block 0000 [ 1]: 9b073a5c87aa2d97 972 Block 0000 [ 2]: d1e868d75ca8d8e4 973 Block 0000 [ 3]: 349634174e1aebcc 974 ... 975 Block 0031 [124]: 975f596583745e30 976 Block 0031 [125]: e349bdd7edeb3092 977 Block 0031 [126]: b751a689b7a83659 978 Block 0031 [127]: c570f2ab2a86cf00 980 After pass 1: 981 Block 0000 [ 0]: b2e4ddfcf76dc85a 982 Block 0000 [ 1]: 4ffd0626c89a2327 983 Block 0000 [ 2]: 4af1440fff212980 984 Block 0000 [ 3]: 1e77299c7408505b 985 ... 986 Block 0031 [124]: e4274fd675d1e1d6 987 Block 0031 [125]: 903fffb7c4a14c98 988 Block 0031 [126]: 7e5db55def471966 989 Block 0031 [127]: 421b3c6e9555b79d 991 After pass 2: 992 Block 0000 [ 0]: af2a8bd8482c2f11 993 Block 0000 [ 1]: 785442294fa55e6d 994 Block 0000 [ 2]: 9256a768529a7f96 995 Block 0000 [ 3]: 25a1c1f5bb953766 996 ... 997 Block 0031 [124]: 68cf72fccc7112b9 998 Block 0031 [125]: 91e8c6f8bb0ad70d 999 Block 0031 [126]: 4f59c8bd65cbb765 1000 Block 0031 [127]: 71e436f035f30ed0 1001 Tag: c8 14 d9 d1 dc 7f 37 aa 1002 13 f0 d7 7f 24 94 bd a1 1003 c8 de 6b 01 6d d3 88 d2 1004 99 52 a4 c4 67 2b 6c e8 1006 7. Acknowledgements 1008 TBA 1010 8. IANA Considerations 1012 None. 1014 9. Security Considerations 1016 9.1. Security as hash function and KDF 1018 The collision and preimage resistance levels of Argon2 are equivalent 1019 to those of the underlying Blake2b hash function. To produce a 1020 collision, 2**256 inputs are needed. To find a preimage, 2**512 1021 inputs must be tried. 1023 The KDF security is determined by the key length and the size of the 1024 internal state of hash function H'. To distinguish the output of 1025 keyed Argon2 from random, minimum of (2**128,2**length(K)) calls to 1026 Blake2b is needed. 1028 9.2. Security against time-space tradeoff attacks 1030 Time-space tradeoffs allow computing a memory-hard function storing 1031 fewer memory blocks at the cost of more calls to the internal 1032 comression function. The advantage of tradeoff attacks is measured 1033 in the reduction factor to the time-area product, where memory and 1034 extra compression function cores contribute to the area, and time is 1035 increased to accomodate the recomputation of missed blocks. A high 1036 reduction factor may potentially speed up preimage search. 1038 The best attacks on the 1-pass and 2-pass Argon2i is the low-storage 1039 attack described in [CBS16], which reduces the time-area product 1040 (using the peak memory value) by the factor of 5. The best attack on 1041 3-pass Argon2i is the ranking tradeoff attack[AB15], which reduces 1042 the time-area product by the factor of 3. The best attack on 4 1043 passes and more of Argon2i is [AB16], but its reduction factor does 1044 not exceed 2 up to 32 GiB of memory. To completely prevent time- 1045 space tradeoffs from [AB16], number t of passes must exceed binary 1046 logarithm of memory minus 26. 1048 The best attack on t-pass Argon2d is the ranking tradeoff attack, 1049 which reduces the time-area product by the factor of 1.33. 1051 10. References 1053 10.1. Normative References 1055 [I-D.saarinen-blake2] 1056 Saarinen, M. and J. Aumasson, "The BLAKE2 Cryptographic 1057 Hash and MAC", draft-saarinen-blake2-06 (work in 1058 progress), August 2015. 1060 10.2. Informative References 1062 [ARGON2] Biryukov, A., Dinu, D., and D. Khovratovich, "Argon2: the 1063 memory-hard function for password hashing and other 1064 applications", 1065 WWW , 1066 October 2015. 1068 [CBS16] Corrigan-Gibbs, H., Boneh, D., and S. Schechter, "Balloon 1069 Hashing: Provably Space-Hard Hash Functions with Data- 1070 Independent Access Patterns", 1071 WWW , January 2016. 1073 [AB16] Alwen, J. and J. Blocki, "Efficiently Computing Data- 1074 Independent Memory-Hard Functions", 1075 WWW , December 2015. 1077 [AB15] Biryukov, A. and D. Khovratovich, "Tradeoff Cryptanalysis 1078 of Memory-Hard Functions", 1079 Asiacrypt'15 , 1080 December 2015. 1082 Authors' Addresses 1084 Alex Biryukov 1085 University of Luxembourg 1087 Email: alex.biryukov@uni.lu 1089 Daniel Dinu 1090 University of Luxembourg 1092 Email: dumitru-daniel.dinu@uni.lu 1094 Dmitry Khovratovich 1095 University of Luxembourg 1097 Email: dmitry.khovratovich@uni.lu 1099 Simon Josefsson 1100 SJD AB 1102 Email: simon@josefsson.org 1103 URI: http://josefsson.org/