[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Cfrg] (no subject) (Randomness recommendations)



John Kelsey wrote:
...

As has been said before, a simple measure of a bit stream calculating
Shannon entropy is not good.  Min-entropy is difficult to compute for a
bit stream.  (What block size do you use?  And if you pick one large
enough to find many problems that might occur, it requires too much
space and time.)


I don't see why it's any harder to compute than Shannon entropy.  Entropy
as we're discussing is defined for a probability distribution.  For
Shannon entropy, you need to know the probabilities of all the symbols to
compute it; for Min-entropy, you need only know the highest probability.

You are correct: min-entropy is no harder than shannon entropy. Both have the same problems. You have to choose a block size. If want to catch correlations you need a block size big enough to encompass the correlated bits. Then you have to choose whether to use a sequence of adjacent blocks in the data stream, or a sliding window. The sequence of adjacent blocks will miss correlations that cross block boundaries. The sliding window will count the same pattern multiple times as different values. Thus both will overestimate the entropy but for different reasons.


You can avoid choosing a window by using an LZ compression scheme as in NIST 800-22 test 10 (Lempel-Ziv Compression Test). But
(a) NIST 800-22 test 10 has met with disfavor. (See Larry Bassam's slides at csrc.nist.gov/CryptoToolkit/RNG/Workshop/ValTestandSTS.pdf. By the way, what is the reason for the disfavor?)
(b) The test works by counting the number of nodes in the internal dictionary, and specifies an empirically determined mean and standard deviation that can be used to compute a P-value. However, the size of the tree does not relate directly to entropy, so getting an entropy measurment out of it is a bit of a research project.



Now, you *do* need to take into account any correlation between outputs, but that's true for both Shannon entropy and min-entropy. (The usual estimate of bits of Shannon entropy per letter in English text is based on conditional probabilities given previous letters seen, not just on the letter frequencies!)


And the purpose of min-entropy does not apply to a bit
stream that is going to be "mixed" or "accumulated".


What do you mean?

Consider a hyptothetical raw HW generator that outputs two bits each cycle

code  probability

00    1/2
10    1/4
11    1/4
10    0

Now, I'm not going to get into ascii art, but lets assume our mixer is an SHA1 hasher and a register (initially zero) with the register output fed back to the SHA1 input, with two of the bits XOR'd with the HW generator output.

Now the naive question is how many cycles of input do I need to "fill" the mixer to where all 2^160 states are reachable? A bit more accurately, it is probably not known that all 2^160 outputs are in fact reachable, and there will be multiple ways of reaching the same output code. So just for fun, lets consider the number of cycles so the most probable state occurs with probability 1/(2^(160-k)) for small k.

In the beginning there is exactly one state "reachable". After one cycle we are in one of 3 states, two with probabilty 1/4 and one with probabilty 1/2. After two cycles we one in one of 9 states. The number of reachable states grows exponentially a first, but pretty soon the probabilty of a collison becomes significant, so the growth in the number of reachable states slows, and the probabilities of the states equalizes.

I don't know how to solve this analytically, so I wrote a little simulation but with just 16 bits. I had an array of 65536 probabilities. At each stage, I just chose the 3 successor states for each state with a random number generator. One got half the probability and the other 2 got 1/4 of the probability. In reality the successor states are determined by the design of SHA1, and, of course, are the same on each round rather than chosen anew on each round as I did.

Here is the output of one run of my little simulation.

round	min-entropy	Shannon entropy

0       0.000000        0.000000
1       1.000000        1.500000
2       2.000000        3.000000
3       3.000000        4.500000
4       4.000000        6.000000
5       5.000000        7.500000
6       6.000000        8.993274
7       7.000000        10.467430
8       8.000000        11.896266
9       9.000000        13.218532
10      9.966577        14.284023
11      10.966577       14.964906
12      11.966577       15.309874
13      12.522495       15.456957
14      13.017274       15.521733
15      13.203137       15.545512
16      13.227718       15.554354
17      13.320790       15.562737
18      13.297072       15.560971
19      13.053438       15.563896
20      13.387674       15.563441


Yikes! I'm wrong. The min-entropy is always less than the sum of the min-entropies of the stirred in stuff. (The min-entropy of the generator is 1 bit per round, and the Shannon entropy is 1.5 bits.) I was thinking that the "equalization" due to collisions would equalize the probabilities, reducing the difference between means and maxima.



Look, what we really want to know is "how hard is it for the attacker to attack our PRNG by guessing our seed?" So, if we know the most probable value for the seed, that gives us a nice worst-case estimate of how hard it is: if the most likely seed has probability 2^{-100}, then we know that there's no way for an attacker to try N guesses and have better than a probability of N*2^{-100} of success.

Suppose you send in a sequence of 256-bit blocks to be accumulated, where
each block has a maximum probability, conditioned on all previous blocks,
of P[i].  After we send in 100 such blocks, what's the maximum probability
of the whole input sequence?  p[0]*p[1]*p[2]*...*p[99].  Since the
min-entropy of block i is ME[i] = -lg(p[i]), we can add those logs to
multiply the probabilities, so we get ME[whole sequence] =
ME[0]+ME[1]+...+ME[99].

Now, why is that important?  Because assuming the accumulator isn't losing
any entropy (it will start losing a lot, once the amount of entropy gets
close to the size of the memory of the accumulator), that tells us the
maximum probability of the accumulated state.

Let's make this concrete.  Suppose I have a ring oscillator source which
buffers 32 bits at a time.  Based on a lot of statistical analysis, I am
convinced that conditioned on all previous bits of output, the next 32-bit
block of output never has probability higher than 1/8=2^{-3}, so it has 3
bits of min-entropy.  Now, I need to accumulate enough to make a 128 bit
AES key for my PRNG.  So I know that I need to accumulate ceiling(128/3) =
43 samples.  When I've done that, I know that my sequence of samples has
maximum probability of less than 2^{-128}, meaning that an attacker who's
trying to attack my seed by guessing what samples I got from my ring
oscillator source is worse off than if he just tried to guess my AES key
from the start.

...

-- David Jacobson


--John Kelsey, NIST, September 2004

In summary, my main point was that there are enough difficulties that the document should contain a reference or two to literature on estimating entropy.


And, thank you, John, for a good educational experience.

  -- David Jacobson


_______________________________________________ Cfrg mailing list Cfrg at ietf.org https://www1.ietf.org/mailman/listinfo/cfrg