[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Cfrg] Answers to HKDF questions



Here is a list of questions people asked related to HKDF in the last two days
with some answers. I wrote it in this way as it seems more manageable than
answering 20 or so messages, many raising similar questions.
SEE ALSO MY LONG MESSAGE FROM EARLIER TODAY WITH MORE
GENERAL RATIONALE.

>Many comments ask "why we need yet another KDF, if we have so many".

Answer: Because we have too many! (*)
Designers of new protocols/applications do not have any guidance to choose
between them. No one took the effort to sort out what properties we are looking
for. What does each of these schemes provide and what they lack, etc etc.
Well, I did.
My work is far from a full taxonomy of properties, scenarios and existing
schemes. But it does provide the main guidelines, and it does show (I want to
believe) that analysis does favor some schemes over others. In particular, HKDF
is the closest I could get to something that is general purpose enough and
well-founded in analytics.

(*) The best answer to the above question is given by Peter's own question:
> In fact every protocol that I know of has invented its own KDF, incompatible
> with every other KDF out there, thus my question about why we need yet another
> one.
We do not need another one, we need one...

> Dan Harkins and others: How is HMAC(0,X) better than SHA256(X||X).

Merkle-Damgard (MD), the iteration mode underlying SHA256, does not preserve
"randomness" in the following sense: Even if the compression function that you
use as the underlying hash is purely random, the result you get from M-D
iterating this compression function is NOT random (actually not even a good
extractor). HMAC, on the other hand, if you start with a random compression
function you end with a random oracle (i.e., a function that looks random to an
adversary that can only query the function in a bounded number of points).
This is known in the literature as "random oracle indifferentiability".
It is not a result about actual hash functions but a strong result about the
STRUCTURAL STRENGTH OF HMAC (if it starts with something good it preserves the
goodness). There are several papers on this; the main ones are [23,17] in my paper.

>What is the value of HKDF when salt is set to 0.

In this case we can only say useful things if we assume the compression function
to be random. In this case, as said above in the context of indifferentiability,
HMAC implies a random oracle and then good extraction properties while in the
case of a plain M-D hash an underlying random compression function does not
imply randomness and not even extraction. Moreover, the fact that you apply HMAC
only once (with salt=0) to concentrate the input entropy in one key, and then
use this key with the PRF is much better than the naive (unsalted) repetition
SHA(X||1)||SHA(X||2)||... common to other KDFs -- for more on this see the paper.

>Dan: HKDF is a GENERALIZATION of IKE's HKDF.

True, but only for the unsalted case. Note that IKEv1 illustrates nicely the
versatility of HKDF. It can use public nonces for salt in the signature mode,
and it can use a secret salt in PK encryption mode. The latter gives you a much
stronger KDF. One that is only based on the PRF properties of HMAC (since the
salt acts as a secret key to the PRF). IKE is also a good example of how
applications may use/find salt. Most other protocols do not use salt even if
they have it available (e.g., SP 800-56A). Salting the KDF has important
benefits for randomization and independence.

>Uri repeatedly asks what makes HMAC appropriate for extraction/KDF.

You should read the paper and the results cited there (especially section 8).
Keep in mind that HMAC can only inherit its properties from the underlying hash
function. It cannot transform a bad hash function into a good one, but for some
functionalities (KDF is one of them) it can use weaker hash functions for
stronger purposes than alternative schemes using the plain hash.
If you are not willing to assume anything from the underlying hash function
other than collision resistance, then there is only one result I can offer to
you: Look at the result from [23] quoted in page 24 of my paper (second bullet).
That result assumes properties of the hash function implied by collision
resistance only. The downside: It applies only when you truncate the output of
the extract part (for example, you use SHA-512 for extraction and SHA-256 for
PRF). But it may not be totally foolish (smiley inserted) for you to make a
heuristic leap of faith and accept this as an indication of strength also when
you do not truncate.

>Peter Gutman and Simon J.: PBKDF.

PBKDF is quite specific to passwords since it has the iterations count that is
not needed in general and you certainly do now want to artificially slow down
your KDF when your source has decent entropy. I believe that PBKDF2 is a good
design for passwords: it uses an iteration counter, salt, and HMAC as the
underlying PRF (better in that sense than PBKDF1). You could make it general
purpose by having the option of using it with iteration-count=1. However, even
in this case the design is not great as a general purpose KDF since it uses the
source keying material directly as a PRF key. There is nothing we can say in
general if you key a PRF with a non-uniform key (some PRFs do not even accept
arbitrary-long keys). That's why you first extract (or concentrate) the entropy
of the input into a short key and then use that key for keying the PRF in the
extraction stage.

> Can we use HKDF for passwords?

Not as defined. For passwords you need to replace the entropy of the source with
slowing iterations of the function to slow down dictionary attacks (see page 25
in my paper). This can be added, but it is not included now. For example, for
the specific case of passwords one could replace the extract part of HKDF with
the F(P,S,c,1) function from PBKDF2. I am not sure if this is something we want
to include in our draft. I will discuss with Pasi.


>The input source is random (a strong cryptographic key), can we skip extract?

When your input is already a strong cryptographic key you may choose to skip the
extract part and go directly to expansion. For this any (variable output) PRF
will do. The expansion part of HKDF is an implementation of such a PRF.

>David McGrew raises the question that we may need different KDFs for different
>settings.

I believe that having one general-purpose KDF is IMPORTANT. It does not preclude
an application to choose another one or a variant. But if we start standardizing
a KDF for each usage scenario (including different types of key material sources)
we will have to leave in the user's or protocol designer's hands to choose what
is best for his setting. Not only this is not an easy task (understanding
entropy and extraction in full generality is hard) but what you may assume for
an application today may be different tomorrow. The one case where a special
variant may be needed is for passwords (see above).

>Are block ciphers better than hash for KDF?

Block ciphers are good as a basis for a PRF (e.g via a variable input PRF mode).
However, they are more problematic for extraction. For block ciphers as
extractors we can only say things under the assumption that the block cipher is
a truly random permutation. In the case of HMAC we have much richer results
including those that do not idealize the hash function.
(BTW, those that said to believe more on ideal ciphers than in random oracles,
should pay attention to the related-key attacks against AES-256!)

>David McGrew: Should we standardize on the extract part and allow for multiple
>expansion modules?
Good question. From a modularity point of view the answer is yes.
From the IETF point of view that wants to have as specific transforms as
possible this may be problematic. In other words, my academic side would prefer
modularity in the spec and independence between extraction and expansion
(as long as the latter uses a good appropriate PRF mode). But I think that the
IETF (well represented in this work by Pasi :) will not like that.

>Zooko: How about outputting a random (algebraic) group element instead of a
>(close to) uniform string?

Even a "general purpose" KDF has limits to its generality. To me, the main need
for applications is to have a KDF that produces uniform (or "close to uniform")
strings of a certain length(s). In particular, the problem with choosing a
random group element is that techniques to do so are group-dependent (you do it
differently in a finite field multiplicative group than in an RSA group than in
an elliptic curve; you may even have different techniques for different types of
curves). So I do not see that you can find a "fits all" solution here.
(Neither is HKDF a fit-all solution - for example it doesn't do what you want -
but it as close as it get to serve the most common applications).

> David McGrew: what is the precise security statement for HKDF?
> What assumptions does one need to make about the hash function used in HKDF in
> order that the security analysis applies? The paper says that "it is shown in
> [23] (see Section 8) that using HMAC with a truncated output as an extractor
> allows to prove security under considerably weaker assumptions on the
> underlying hash function." However, both of the Lemmas in that paper (and the
> implication in Section 8) make random oracle assumptions.

Section 8 contains a summary of results from [23,17] regarding the properties of
HMAC extractor. These results are quantitative, relating the strength of the
underlying hash function with the quality of extraction. They cover a variety of
situations depending on the entropy of the source and the existence of salt.
The truncated output result that does not require any form of idealized
assumption (uses almost universality and/or collision resistance) is quoted in
my paper page 24 second bullet and is proven in [23].

I apologize for questions I missed.

Hugo