[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Cfrg] Rationale for HKDF



There have been many interesting and relevant issues raised in these
discussions. Instead of responding to each message separately I will try to
answer as many of them as I can in this -- not so concise -- message.
(I will then answer some more specific messages separately.)

The issues raised in this discussion and many other aspects of KDF design and
analysis are much less trivial than they may seem and most do not have a simple
answer (except if one trivializes the problem assuming we have perfect,
unbreakable hash functions). For that reason I wrote a paper that should
hopefully provide well founded answers that cannot be argued in an email
exchange.  The paper is posted in:
http://webee.technion.ac.il/~hugo/kdf/

I must say that what motivated me to write this paper were the recurring
(and endless) discussions in this cfrg list about basic questions such as
"What is a KDF, What properties define this primitive, Can we achieve them,
analyze them, etc.?"  In spite of the apparent simplicity of KDF as a
cryptographic primitive there are a LOT of subtleties surrounding this notion,
especially if one is willing to come up with a design that is well-founded on
current crypto knowledge; can be applied in many different practical scenarios;
and USES HASH FUNCTIONS AS PRUDENTLY AS POSSIBLE.

The latter is a very  important point and it means that we want to trust our
hash functions as little as possible. For example, if one KDF scheme requires a
hash function to be fully collision resistant and another will only require
second preimage resistance then the latter should be considered a much better
design. If one design can only be justified assuming the hash function behaves
magically as a perfect random function (or random oracle) and another only
requires combinatorial properties of the hash function or the pseudorandom
properties of the compression function then the latter should be favored.
One of the goals of the above paper is to illustrate how common KDFs
(including NIST's, ANSI, etc) ABUSE the hash functions by designing the KDF
under the premise (explicit or implicit) that the hash is some magic object
with perfect randomness.

The HKDF design in turn is much more careful. It is geared to base the
(mathematical) rationale of the scheme on as weak as possible assumptions on the
underlying hash. True, in some cases (for example, where one is supposed to
extract ALL the entropy of an input without even having a salt value), also
HKDF requires some idealized assumptions. But in many other cases the
scheme can be based on much weaker and realistic assumptions (e.g., HKDF as used
in IKE with safe-prime DH groups does not require any such ideal modeling nor
are these assumptions needed in the many realistic cases where you have a source
of high entropy and a salt value). Even when assuming some random-oracle type
properties from the hash, we can make a much more sound analysis modeling the
compression function as random rather than the iterated hash (this is a prudent
approach as exemplified by extension attacks that work on the iterated hash but
not on the compression function).

Another very important aspect of HKDF and its analysis is that we are not asking
the individual user or engineer to study the specifics of his application and
then decide what KDF to use. We propose a single design that can be justified
with different assumptions in different scenarios, and does that in quite an
optimal way. It is true that in some particular cases a perfect combinatorial
extractor will do better analytically and it is fine if an application decides
to use such extractor; but we want one, as general as possible, scheme that does
not require the user or engineer to have a crisp understanding of the exact
"entropy gap" in his application or other such hard-to-figure parameters.
We want a general purpose design that is conservative yet practical. HKDF does
quite well in this regard, certainly at the analytical level (in particular much
better than the traditional repeated hash approach of ANSI, NIST, etc).

I know that many people are skeptical of the value of a complexity-theoretic
type analysis as the one I present for HKDF. I am not going to convince anyone
with such prejudice to change their minds. For those that value this approach, I
think that they should find the HKDF analysis valuable. For those in between I
would suggest thinking about the original HMAC design and its purposes (and
proof) and how this design and mathematical analysis helped the scheme to
resist the passage of time and the increasing hash function cryptanalysis.
Let me be illustrate this point with some "historical anecdotes".

At the time (ca. 1995) the proposals on the table for an "MD5-based MAC"
(message authentication code to be used in IPsec) were the prepended-key MAC:
MAC(K,MSG)=MD5(K || MSG), and the appended-key MAC: MAC(K,MSG)=MD5(MSG || K)).
The supporters of the former claimed that since every packet header had the
length of the IP packet then extension attacks would not be possible. Guess
what? After a short while ESP was introduced that did not include the header
under the MAC. Had we used that proposal it would have been broken after 6
months. The appended-key proposal was much better. But if we had adopted that
one, it would mean that today (and actually quite many years ago) we would
have a broken MAC. Indeed, the appended-key scheme is broken if collisions
against the hash function are possible. Even the combined prepend-and-append
scheme was found to have some surprising weaknesses [Preneel-van Oorschot].
HMAC instead had much weaker assumption on the (compression) hash function,
and that's why its use with SHA1 today (or even with MD5 to some extent) can
still considered secure. An even more important point to make here is that if
we treat the hash function as a random function all of the above designs are
equally good. That is why ideal modelings are "bad advisers"; they have to be
left only for the cases where we do not have any other advise. And that is a big
differentiator between HKDF and other KDFs.

What we are trying to do with HKDF is exactly the same as we did with HMAC:
Prudent design combining sound analysis with careful engineering considerations.
I see no reason to stick to "legacy" KDFs for new applications. I am not
claiming or implying that applications using other KDFs should necessarily
move to HKDF, but it would be a good idea for new ones to adopt this more
careful design. No reason to stick to "legacy designs".

And, by the way, those that prefer to stick to designs already used in practice
should have no problem with HKDF. As it is defined in our internet draft it is
exactly the KDF used in IKE (both v1 and v2) and, I believe, in some other
applications. In that sense, even NIST approves this function (the next step
would be not only to approve it but to recommend it...)

Some have argued that we have too many KDFs already (NIST, ANSI, IEEE, PBKDF,
etc.). That is true, but rather than being an argument against HKDF it is one
important reason for the IETF to recommend a specific one for new applications.
Things would be different if we just had one well-established KDF that fits all
cases.

I have not entered a more technical discussion of the results in the paper.
I feel that such discussion is needed as a response to some questions that
were raised in this thread. I will answer those separately.
One thing I can say in one sentence here: A first basic test for a KDF is to
check whether it becomes a strong PRF when the source key material is fuly
random; if it doesn't then the design is flawed...  (e.g.  "repeated-hash"
designs of the type NIST, IEEE, etc do NOT pass this test -- see the paper).

Sorry for the length. I hope it does help to position this work better and
respond to many of the messages in this list. I will try to respond to more
specific issues separately later.

Hugo