The memory-hard Argon2 password hash and proof-of-work function
University of Luxembourg, alex.biryukov@uni.lu
University of Luxembourg, dumitru-daniel.dinu@uni.lu
ABDK Consulting, khovratovich@gmail.com
SJD AB, simon@josefsson.org, http://josefsson.org/

This document describes the Argon2 memory-hard function for
password hashing and proof-of-work applications. We provide an
implementer-oriented description together with sample code and
test vectors. The purpose is to simplify adoption of Argon2 for
Internet protocols.

This document describes the Argon2 memory-hard function for
password hashing and proof-of-work applications. We provide an
implementer-oriented description together with sample code and
test vectors. The purpose is to simplify adoption of Argon2 for
Internet protocols. This document corresponds to version 1.3 of the Argon2 hash
function.

Argon2 summarizes the state of the art in the design of
memory-hard functions. It is a streamlined and simple design.
It aims at the highest memory filling rate and effective use of
multiple computing units, while still providing defense against
tradeoff attacks. Argon2 is optimized for the x86 architecture
and exploits the cache and memory organization of the recent
Intel and AMD processors.

Argon2 has one primary variant, Argon2id, and two supplementary variants, Argon2d and
Argon2i. Argon2d uses data-dependent memory
access, which makes it suitable for cryptocurrencies and
proof-of-work applications with no threats from side-channel
timing attacks. Argon2i uses data-independent memory access,
which is preferred for password hashing and password-based key
derivation. Argon2id works as Argon2i for the first half of the first iteration over the
memory, and as Argon2d for the rest, thus providing both protection against side-channel attacks and
resistance to the brute-force cost savings that time-memory tradeoffs allow. Argon2i makes more passes over the
memory to protect from tradeoff attacks.

Argon2 can be viewed as a mode of operation over a fixed-input-length compression function G and
a variable-input-length hash function H. Even though Argon2 can potentially be used with an arbitrary function H,
as long as it provides outputs up to 64 bytes, in this document it MUST be BLAKE2b.

For further background and discussion, see the Argon2 paper.

x^y --- integer x multiplied by itself integer y times
a*b --- multiplication of integer a and integer b
c-d --- subtraction of integer d from integer c
E_f --- variable E with subscript index f
g / h --- integer g divided by integer h. The result is a rational number
I(j) --- function I evaluated on integer parameter j
K || L --- string K concatenated with string L
a XOR b --- bitwise exclusive-or between bitstrings a and b
a mod b --- remainder of integer a modulo integer b, always in range [0, b-1]
a >>> n --- rotation of 64-bit string a to the right by n bits
trunc(a) --- the 64-bit value a, truncated to the 32 least significant bits
floor(a) --- the largest integer not bigger than a
ceil(a) --- the smallest integer not smaller than a
extract(a, i) --- the i-th set of 32 bits from bitstring a, starting from the 0-th
|A| --- the number of elements in set A
LE32(a) --- 32-bit integer a converted to a bytestring in little endian. Example: 123456 (decimal) is 40 E2 01 00.
LE64(a) --- 64-bit integer a converted to a bytestring in little endian. Example: 123456 (decimal) is 40 E2 01 00 00 00 00 00.
int32(s) --- 32-bit string s converted to a non-negative integer in little endian
int64(s) --- 64-bit string s converted to a non-negative integer in little endian
length(P) --- the byte length of string P expressed as a 32-bit integer

Argon2 has the following input parameters:
Message string P, which is a password for password hashing applications. May have any length from 0 to 2^(32)-1 bytes.
Nonce S, which is a salt for password hashing applications. May have any length from 8 to 2^(32)-1 bytes. 16 bytes is recommended for password hashing. Salt SHOULD be unique for each password.
Degree of parallelism p determines how many independent (but synchronizing) computational chains (lanes) can be run. It may take any integer value from 1 to 2^(24)-1.
Tag length T may be any integer number of bytes from 4 to 2^(32)-1.
Memory size m can be any integer number of kibibytes from 8*p to 2^(32)-1. The actual number of blocks is m', which is m rounded down to the nearest multiple of 4*p.
Number of iterations t (used to tune the running time independently of the memory size) can be any integer number from 1 to 2^(32)-1.
Version number v is one byte 0x13.
Secret value K (serves as key if necessary, but we do not assume any key use by default) may have any length from 0 to 2^(32)-1 bytes.
Associated data X may have any length from 0 to 2^(32)-1 bytes.
Type y of Argon2: 0 for Argon2d, 1 for Argon2i, 2 for Argon2id.

The Argon2 output, or "tag", is a string T bytes long.

Argon2 uses an internal compression function G with two
1024-byte inputs and a 1024-byte output, and an internal hash
function H^x() with x being its output length in bytes. Here H^x() applied to string A is the BLAKE2b function, which takes (d,|dd|,kk=0,nn=x) as parameters where d is A padded to a multiple of 128 bytes
and partitioned into 128-byte blocks. The compression function G is based on its internal
permutation. A variable-length hash function H' built upon H
is also used. G is described in Section and H' is described in
Section .

The Argon2 operation is as follows.
1. Establish H_0 as the 64-byte value as shown below.

2. Allocate the memory as m' 1024-byte blocks where m' is derived as:

   m' = 4 * p * floor(m / (4 * p))

   For p lanes, the memory is organized in a matrix B[i][j] of blocks with p rows (lanes) and q = m' / p columns.

3. Compute B[i][0] for all i ranging from (and including) 0 to (not including) p.

4. Compute B[i][1] for all i ranging from (and including) 0 to (not including) p.

5. Compute B[i][j] for all i ranging from (and including) 0 to (not including) p, and for all j ranging from (and including) 2 to (not including) q. The block indices l and z are determined for each i, j differently for Argon2d, Argon2i, and Argon2id (Section ).

6. If the number of iterations t is larger than 1, we repeat the steps, however replacing the computations with the following expression:

7. After t steps have been iterated, the final block C is computed as the XOR of the last column:

   C = B[0][q-1] XOR B[1][q-1] XOR ... XOR B[p-1][q-1]

8. The output tag is computed as H'^T(C).

Let V_i be a 64-byte block, and W_i be its first 32 bytes. Then we define:
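
As an illustration, the variable-length hash H' and its use in initializing the memory can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation. It assumes that H' chains 64-byte BLAKE2b outputs, keeps the first 32 bytes of each and keeps the final output whole. It assumes that H_0 hashes the LE32-encoded parameters followed by the length-prefixed inputs P, S, K, X, and that the first two blocks of lane i are H'^(1024)(H_0 || LE32(j) || LE32(i)) for j = 0, 1. The helper names (le32, h, h_prime, h0, first_blocks) are illustrative only.

```python
import hashlib
import struct


def le32(n: int) -> bytes:
    """LE32(n): 32-bit integer n as 4 little-endian bytes."""
    return struct.pack("<I", n)


def h(data: bytes, out_len: int) -> bytes:
    """H^x: unkeyed BLAKE2b with an out_len-byte digest (1..64)."""
    return hashlib.blake2b(data, digest_size=out_len).digest()


def h_prime(data: bytes, tag_len: int) -> bytes:
    """H'^T: variable-length hash built on H (assumed construction)."""
    if tag_len <= 64:
        return h(le32(tag_len) + data, tag_len)
    r = -(-tag_len // 32) - 2            # ceil(T / 32) - 2
    v = h(le32(tag_len) + data, 64)      # V_1
    out = v[:32]                         # W_1
    for _ in range(r - 1):
        v = h(v, 64)                     # V_2 .. V_r
        out += v[:32]                    # W_2 .. W_r
    return out + h(v, tag_len - 32 * r)  # V_{r+1} is kept whole


def h0(lanes, tag_len, m_kib, passes, version, kind,
       password, salt, key=b"", ad=b"") -> bytes:
    """H_0: 64-byte pre-hash of all parameters and inputs (assumed layout)."""
    parts = [le32(lanes), le32(tag_len), le32(m_kib),
             le32(passes), le32(version), le32(kind)]
    for field in (password, salt, key, ad):
        parts += [le32(len(field)), field]
    return h(b"".join(parts), 64)


def first_blocks(h_0: bytes, lane: int):
    """B[lane][0] and B[lane][1], each 1024 bytes (assumed layout)."""
    return (h_prime(h_0 + le32(0) + le32(lane), 1024),
            h_prime(h_0 + le32(1) + le32(lane), 1024))
```

For tag lengths of at most 64 bytes the function reduces to a single BLAKE2b call over LE32(T) || A, so only the 1024-byte block initialization and long tags exercise the chaining branch.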
To enable parallel block computation, we further partition the
memory matrix into S = 4 vertical slices. The intersection of a
slice and a lane is a segment of length q/S. Segments of the
same slice are computed in parallel and may not reference blocks
from each other. All other blocks can be referenced.

J_1 is given by the first 32 bits of block B[i][j-1],
while J_2 is given by the next 32 bits of block B[i][j-1]:
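
A minimal Python sketch of this data-dependent extraction is shown here. It assumes the previous block is held as a 1024-byte string and reads the two values as int32(extract(B[i][j-1], 0)) and int32(extract(B[i][j-1], 1)). The function name argon2d_j1_j2 is illustrative.

```python
from typing import Tuple


def argon2d_j1_j2(prev_block: bytes) -> Tuple[int, int]:
    """Return (J_1, J_2) taken from the previous block B[i][j-1]."""
    if len(prev_block) != 1024:
        raise ValueError("an Argon2 block is 1024 bytes long")
    j1 = int.from_bytes(prev_block[0:4], "little")  # first 32 bits
    j2 = int.from_bytes(prev_block[4:8], "little")  # next 32 bits
    return j1, j2


# A block starting with 40 E2 01 00 | 01 00 00 00 gives J_1 = 123456, J_2 = 1.
example = bytes([0x40, 0xE2, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00]) + bytes(1016)
```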
Each application of the 2-round compression function G
in the counter mode gives 128 64-bit values X, which are viewed as X1||X2
and converted to J_1=int32(X1) and J_2=int32(X2).
The first input to G is the all-zero block and the second
input to G is constructed as follows:

The values r, l, s, m', t, x, i are represented as 8 bytes in
little-endian.

If the pass number is 0 and the slice number is 0 or 1, then compute J_1 and J_2 as
for Argon2i, else compute J_1 and J_2 as for Argon2d.

The value of l = J_2 mod p gives the index of the lane from
which the block will be taken. For the first pass (r=0) and
the first slice (s=0), the block is taken from the current lane.

The set W contains the indices that can be referenced
according to the following rules:
If l is the current lane, then W includes the indices of all blocks in the last S - 1 = 3 segments computed and finished, as well as the blocks computed in the current segment in the current pass excluding B[i][j-1].
If l is not the current lane, then W includes the indices of all blocks in the last S - 1 = 3 segments computed and finished in lane l. If B[i][j] is the first block of a segment, then the very last index from W is excluded.

We are going to take a block from W with a non-uniform
distribution over [0, |W|) using the mapping
To avoid floating point computation, the following approximation
is used:
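
The addressing rules and the integer-only approximation can be sketched in Python as follows. This is a hedged sketch: it assumes the approximation has the form x = J_1^2 / 2^32, y = (|W| * x) / 2^32, z = |W| - 1 - y, with all divisions truncating. The function names and the type constants are illustrative, and the intermediate values are called sq and scaled in the code to avoid clashing with the type parameter y.

```python
ARGON2D, ARGON2I, ARGON2ID = 0, 1, 2  # type y


def data_independent(y: int, r: int, s: int) -> bool:
    """True when J_1 and J_2 are generated as for Argon2i."""
    if y == ARGON2I:
        return True
    if y == ARGON2ID:
        return r == 0 and s in (0, 1)  # first half of the first pass
    return False


def reference_lane(j2: int, p: int, r: int, s: int, current_lane: int) -> int:
    """Lane l from which the reference block is taken."""
    if r == 0 and s == 0:
        return current_lane            # first slice of the first pass
    return j2 % p


def map_to_window(j1: int, w_size: int) -> int:
    """Map a 32-bit J_1 to a position z in [0, |W|) without floating point."""
    sq = (j1 * j1) >> 32               # J_1^2 / 2^32
    scaled = (w_size * sq) >> 32       # (|W| * x) / 2^32
    return w_size - 1 - scaled
```

Small values of J_1 map to the end of W and large values to its beginning, so recently computed blocks are referenced more often than old ones.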
The value of z gives the reference block index in W.

Compression function G is built upon the BLAKE2b round
function P. P operates on the 128-byte input, which can be
viewed as eight 16-byte registers:

Compression function G(X, Y) operates on two 1024-byte
blocks X and Y. It first computes R = X XOR Y. Then R is
viewed as an 8x8 matrix of 16-byte registers R_0, R_1, ... ,
R_63. Then P is first applied to each row, and then to each column to
get Z:
Finally, G outputs Z XOR R:
Permutation P is based on the round function of BLAKE2b. The 8
16-byte inputs S_0, S_1, ... , S_7 are viewed as a 4x4 matrix of
64-bit words, where S_i = (v_{2*i+1} || v_{2*i}):
It works as follows:
GB(a, b, c, d) is defined as follows:
The modular additions in GB are combined with 64-bit multiplications.
Multiplications are the only difference from the original BLAKE2b design.
This choice is made to increase the circuit depth and thus the running
time of ASIC implementations, while having roughly the same running
time on CPUs thanks to parallelism and pipelining.
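
The compression function and its permutation can be sketched in Python as follows. This is an illustrative sketch rather than the reference code. It assumes that GB uses the BLAKE2b rotation amounts 32, 24, 16 and 63, with each addition replaced by a + b + 2 * trunc(a) * trunc(b) mod 2^64. It assumes that P applies GB to the four columns and then the four diagonals of the 4x4 word matrix, and that each 16-byte register holds two consecutive little-endian 64-bit words. The names rotr64, mul_add, gb, permute and compress are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def rotr64(a: int, n: int) -> int:
    """a >>> n on a 64-bit word."""
    return ((a >> n) | (a << (64 - n))) & MASK64


def mul_add(a: int, b: int) -> int:
    """a + b + 2 * trunc(a) * trunc(b) mod 2^64."""
    return (a + b + 2 * (a & MASK32) * (b & MASK32)) & MASK64


def gb(a, b, c, d):
    """Quarter-round GB (assumed rotation amounts 32, 24, 16, 63)."""
    a = mul_add(a, b); d = rotr64(d ^ a, 32)
    c = mul_add(c, d); b = rotr64(b ^ c, 24)
    a = mul_add(a, b); d = rotr64(d ^ a, 16)
    c = mul_add(c, d); b = rotr64(b ^ c, 63)
    return a, b, c, d


# Columns first, then diagonals, of the 4x4 matrix v_0 .. v_15.
_QUADS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
          (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def permute(words):
    """P applied to eight 16-byte registers given as 16 64-bit words."""
    v = list(words)
    for i, j, k, l in _QUADS:
        v[i], v[j], v[k], v[l] = gb(v[i], v[j], v[k], v[l])
    return v


def compress(x: bytes, y: bytes) -> bytes:
    """G(X, Y) on two 1024-byte blocks."""
    r = [int.from_bytes(x[i:i + 8], "little") ^ int.from_bytes(y[i:i + 8], "little")
         for i in range(0, 1024, 8)]
    q = list(r)
    for row in range(8):                      # P on each row of registers
        q[16 * row:16 * row + 16] = permute(q[16 * row:16 * row + 16])
    for col in range(8):                      # P on each column of registers
        idx = [16 * row + 2 * col + k for row in range(8) for k in (0, 1)]
        for i, w in zip(idx, permute([q[i] for i in idx])):
            q[i] = w
    return b"".join((z ^ w).to_bytes(8, "little") for z, w in zip(q, r))
```

Because R = X XOR Y is the only input to the permutation layers, G(X, X) is the all-zero block and G(X, Y) equals G(Y, X). Both properties are convenient sanity checks for an implementation.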
Argon2d is optimized for settings where the adversary does
not get regular access to system memory or CPU, i.e., they cannot
run side-channel attacks based on the timing information, nor can
they recover the password much faster using garbage
collection. These settings are more typical for backend servers
and cryptocurrency mining. In practice we suggest the
following settings:
Cryptocurrency mining that takes 0.1 seconds on a 2 GHz CPU using 1 core: Argon2d with 2 lanes and 250 MB of RAM.

Argon2id is optimized for more realistic settings, where the
adversary can possibly access the same machine, use its CPU, or
mount cold-boot attacks. We suggest the following
settings:
Backend server authentication that takes 0.5 seconds on a 2 GHz CPU using 4 cores: Argon2id with 8 lanes and 4 GiB of RAM.
Key derivation for hard-drive encryption that takes 3 seconds on a 2 GHz CPU using 2 cores: Argon2id with 4 lanes and 6 GiB of RAM.
Frontend server authentication that takes 0.5 seconds on a 2 GHz CPU using 2 cores: Argon2id with 4 lanes and 1 GiB of RAM.

We recommend the following procedure to select the type and
the parameters for practical use of Argon2.
1. Select the type y. If you do not know the difference between the types or you consider side-channel attacks a viable threat, choose Argon2id.
2. Figure out the maximum number h of threads that can be initiated by each call to Argon2.
3. Figure out the maximum amount m of memory that each call can afford.
4. Figure out the maximum amount x of time (in seconds) that each call can afford.
5. Select the salt length. 128 bits is sufficient for all applications, but can be reduced to 64 bits in the case of space constraints.
6. Select the tag length. 128 bits is sufficient for most applications, including key derivation. If longer keys are needed, select longer tags.
7. If side-channel attacks are a viable threat, or if you're uncertain, enable the memory wiping option in the library call.
8. Run the scheme of type y, memory m and h lanes and threads, using different numbers of passes t. Figure out the maximum t such that the running time does not exceed x. If it exceeds x even for t = 1, reduce m accordingly.
9. Hash all the passwords with the just determined values m,
h, and t.

This section contains test vectors for Argon2.

We greatly thank the following people, who helped in preparing and reviewing this document: Jean-Philippe Aumasson,
Samuel Neves, Joel Alwen, Jeremiah Blocki, Bill Cox, Arnold Reinhold, Solar Designer, Russ Housley, Stanislav Smyshlyaev, Kenny Paterson, Alexey Melnikov.

None.

The collision and preimage resistance levels of Argon2 are equivalent to those of the underlying BLAKE2b hash function.
To produce a collision, 2^(256) inputs are needed. To find a preimage, 2^(512) inputs must be tried.

The KDF security is determined by the key length
and the size of the internal state of hash function H'.
To distinguish the output of keyed Argon2 from random, min(2^(128), 2^length(K)) calls to BLAKE2b are needed.

Time-space tradeoffs allow computing a memory-hard function storing fewer memory blocks at the cost of more calls to
the internal compression function. The advantage of tradeoff attacks is measured in the reduction factor to the time-area
product, where memory and extra compression function cores contribute to the area, and time is increased to accommodate the recomputation
of missed blocks. A high reduction factor may potentially speed up preimage search.
The best attack on 1-pass and 2-pass Argon2i is the low-storage
attack described in , which reduces the
time-area product (using the peak memory value) by a factor of 5.
For Argon2i with 3 or more passes, the reduction factor of the best attack is a function of
memory size and the number of passes. For 1 gibibyte of memory, it is 3 for 3 passes, 2.5 for 4 passes, and 2 for 6 passes. The reduction
factor grows by about 0.5 with every doubling of the memory size.
To completely prevent time-space tradeoffs from , the
number of passes must exceed the binary logarithm of memory minus 26.
Asymptotically, the best known attack on 1-pass Argon2i has a maximal adversary advantage
upper bounded by O(m^(0.233)), where m is the number of blocks. This attack is also asymptotically optimal, as an upper bound of O(m^(0.25)) on any attack has also been proved.

The best tradeoff attack on t-pass Argon2d is the ranking tradeoff attack,
which reduces the time-area product by a factor of 1.33.

The best attack on Argon2id can be obtained by complementing the best attack
on the 1-pass Argon2i with the best attack on a multi-pass Argon2d. Thus the best tradeoff attack on 1-pass Argon2id is the combined low-storage attack (for the first half of the memory) and
the ranking attack (for the second half), which together yield a factor of about 2.1. The best tradeoff attack on
t-pass Argon2id is the ranking tradeoff attack,
which reduces the time-area product by a factor of 1.33.

A bottleneck in a system employing the password-hashing function
is often the function latency rather than memory costs. A rational
defender would then maximize the brute-force costs for the attacker equipped
with a list of hashes, salts, and timing information, for fixed computing time
on the defender’s machine. The attack cost estimates from
imply that for Argon2i, 3 passes is almost optimal for most reasonable memory sizes,
and that for Argon2d and Argon2id, 1 pass maximizes the attack costs for constant defender time.

The Argon2id variant with t=1 and maximum available memory is recommended
as a default setting for all environments. This setting is secure against side-channel attacks
and maximizes adversarial costs on dedicated brute-force hardware.
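
The tuning loop from the parameter selection procedure (find the largest t that fits the time budget x, and reduce m if even t = 1 is too slow) can be sketched in Python as follows. This is a sketch under assumptions: run_hash is a hypothetical caller-supplied callable that performs one Argon2 computation with the given memory, passes and lanes, halving m is only one possible way to "reduce m accordingly", and max_t is an arbitrary cap introduced for the example.

```python
import time


def calibrate(run_hash, m_kib, lanes, budget, clock=time.perf_counter, max_t=16):
    """Return (m_kib, t) with the largest t whose run time stays within budget.

    run_hash(m_kib, t, lanes) is a hypothetical callable supplied by the caller.
    """
    while m_kib >= 8 * lanes:            # m must be at least 8*p kibibytes
        best_t = 0
        for t in range(1, max_t + 1):
            start = clock()
            run_hash(m_kib, t, lanes)
            if clock() - start > budget:
                break                    # too slow: keep the previous t
            best_t = t
        if best_t:
            return m_kib, best_t
        m_kib //= 2                      # assumption: halve m and retry
    raise ValueError("no parameters fit the time budget")
```

Passing the clock as a parameter lets the loop be exercised with a simulated timer before it is pointed at a real Argon2 binding.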

Argon2: the memory-hard function for password hashing and other applications
High Parallel Complexity Graphs and Memory-Hard Functions
The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC)
Balloon Hashing: Provably Space-Hard Hash Functions with Data-Independent Access Patterns
Efficiently Computing Data-Independent Memory-Hard Functions
On the Depth-Robustness and Cumulative Pebbling Cost of Argon2i
Argon2: New Generation of Memory-Hard Functions for Password Hashing and Other Applications
Tradeoff Cryptanalysis of Memory-Hard Functions