CFRG S. Halevi (IBM)
Internet-Draft H. Krawczyk (IBM)
Expires: November 12, 2005 12 May, 2005
Strengthening Digital Signatures via Randomized Hashing
draft-irtf-cfrg-rhash-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
The list of current Internet-Drafts can be accessed at http://
www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 12, 2005.
Copyright (C) The Internet Society (2005). All Rights Reserved.
Abstract
We propose to adopt randomized hashing as a mode of operation for
existing and future cryptographic hash functions. The idea is to
strengthen hash functions for use in the context of digital
signatures without requiring a change to the actual hashing and
signing algorithms or to their existing implementations.
We suggest that randomization can be achieved via the processing of
the input to the function, even if the hash function itself is not
randomized. Effective use of such mode of operation requires
changes to the standardization of the encoding and processing of
digital signatures (e.g., PKCS#1, FIPS186) but has no impact on
existing signature and hashing algorithms. We urge the standards
community to plan a transition towards these new mechanisms for
which we outline specific instantiations.
Halevi and Krawczyk [Page 1]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
1 Introduction
Recent cryptanalytical advances in the area of collision-resistant
hash functions (CRHF), specifically the attacks against MD5 and
SHA-1, have not only shaken our confidence in the security of
specific constructions but, more fundamentally, have cast doubts on
our ability to design collision-resistant hash functions that will
withstand attacks over a long period of time. These attacks remind
us that cryptography is founded on heuristic constructions whose
security may be endangered unexpectedly. In particular, this
highlights the importance of following two fundamental cryptographic
design principles:
(i) design protocols and applications such that the underlying
cryptographic pieces (e.g., hash functions) are easy to replace when
need arises (in particular, avoid hard-wiring of any specific
construction into the application), and
(ii) design as general as possible mechanisms with as little as
possible requirements from the basic cryptographic building blocks.
The present proposal is intended to address these points, especially
the second one.
Although many existing applications that use hash functions do not
actually require full collision resistance, and although the current
attack on SHA-1 is not quite practical yet, it is clear that we
cannot dismiss the recent attacks as theoretical only. Indeed there
are important applications today that do rely on full collision
resistance, in particular those that use standard signature schemes
to provide non-repudiation or certification services. And with the
expected cryptanalytical improvements in the near future, ignoring
these new attacks would be irresponsible. Some of the options
contemplated in the applied cryptography world for responding to
the recent attacks on MD5 and SHA-1 are the following:
(1) Modify applications that rely on collision resistance such that
the particular use of CRHF in these applications will be less
vulnerable to collision attacks.
(2) Upgrade the systems using SHA-1 and MD5 to use stronger hash
functions such as the SHA2 family (256- and 512-bit versions).
The hope is that these functions will provide for more robust CRHFs.
Option (1) could be applied to different settings, but it is very
application specific. In particular, note that even if one could set
precise assumptions on the way specific applications are used today,
these assumptions are likely to change or become obsolete over time.
To illustrate this point, consider modifying applications that use
signatures so that the messages to be signed are structured in a way
that is unpredictable to the attacker. This approach relies heavily
on the understanding of the semantics and structure of messages used
in the application. Therefore, while it may be viable for specific
applications (such as choosing unpredictable serial numbers in
Halevi and Krawczyk [Page 2]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
certificates) it is insufficient as a general measure.
Option (2) is more robust but its cost and complexity are
significant: it requires a multitude of applications, protocols and
implementations to instrument the transition to new functions.
Hopefully, the current attacks will improve at a mild enough pace so
that a relatively orderly transition can be implemented. And even if
we manage such a gradual transition, we must contemplate the
possibility that by the time the transition is completed the new
adopted functions will appear as weak as SHA-1 appears to be now.
The approach that we propose here takes elements from the above two
options. We suggest that we must plan for a transition to more
secure mechanisms, that this has to be done in an orderly way
(i.e., not as an uncontrolled panicking reaction), and that rather
than patching individual applications we re-engineer general
mechanisms in a way that provides for more robust cryptography,
specifically more secure signature schemes. To accomplish the
latter we propose to re-define the way hash functions are used in
the context of digital signatures so as not to rely so heavily on
the full collision resistance of our hash functions. This is likely
to result in a significantly longer useful life for SHA-1 itself
and, even more importantly, will result in significantly weaker
requirements from any hash family to be adopted or designed in the
future.
While this solution is not for free (see below), we show that it can
be done without having to change the basic signature algorithms in
use (e.g., RSA, DSS), without even changing the existing hash
functions (e.g., SHA-1, SHA2), and without the need to understand
the semantics of particular applications or messages. What needs to
be changed is the interface to the signing and hash algorithms. The
main tool for achieving all of the above, in particular lowering the
requirements on the security of hash functions, is the use of
randomized hashing, a well-studied notion in the cryptographic
literature (but seldom used in practice). Since our proposal
requires no change to the hashing algorithms themselves it can be
seen as a proposal for a "mode of operation" for existing and future
hash functions.
We end this introduction by noting that randomized hashing has
applications beyond the context of digital signatures. On the other
hand, it is important to also realize that randomized hashing is NOT
a replacement for CRHF in ALL possible applications. For example,
randomized hashing may not be appropriate in applications where
commitment is required or implied, e.g., a bidder committing to her
bid in an auction. So while we do not advocate abandoning CRHF as a
useful cryptographic tool, the important message we wish to convey
is that having a randomized mode of operation for CRHF for use in
digital signature applications, such as those requiring
non-repudiation, provides a substantial security gain and
significantly raises the bar against existing and future
cryptanalytical attacks on the underlying hash functions.
Halevi and Krawczyk [Page 3]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
2. Randomized Hashing and Signature Schemes
The idea behind the use of randomized hashing is simple. Instead of
using a single deterministic hash function such as SHA-1 one uses a
keyed hash function (equivalently, a family of hash functions where
each function in the family is determined by a key or index).
For example, SHA-1 itself can be converted into a family of hash
functions indexed through a variable IV (we mention this as an
illustration, not necessarily as the best way to transform SHA-1
into an indexed hash family for our purposes here).
Let us denote by SIG a secure signing algorithm (such as RSA or DSS),
by H a family of hash functions, and by H_r the function from this
family indexed by the value r. Now, for signing a message m the
signer chooses a random value r and computes SIG(r,H_r(m)).
Here, the pair (r,H_r(m)) represents a (standard) encoding of the
concatenation of the values r and H_r(m). The signature on message
m now consists of the pair (r,SIG(r,H_r(m)). Before discussing
implementation issues (such as the choice of the family H, the index
r, and the encoding function) let's see why this method reduces the
reliance on collision resistance of the hash function.
Consider an attacker, Alice, against a honest signer Bob that signs
using the scheme from above. Alice provides a message m to be signed
by Bob and she gets back the pair (r,SIG(r,H_r(m)) from Bob, where r
is a value chosen at random (or pseudo-randomly) by Bob anew with
each signature. How can Alice attack this scheme (short of breaking
the signature algorithm, say RSA, itself)?
What Alice needs to do is to find a message m that Bob is willing to
sign and hope that when she receives the pair (r,H_r(m)), for
random r chosen by Bob, she will be able to find another message m'
for which H_r(m)=H_r(m') (with the same index r chosen and signed by
Bob). If she could do that then the signature string SIG(r,H_r(m))
would also be a valid signature for m'.
We remark that Alice could do a bit better by asking to sign many
messages m1, m2,..., getting back many pairs (r1, SIG(r1,H_r1(m1)),
(r2,SIG(r2, H_r2(m2)),..., and then finding another m' such that for
some i it holds that H_ri(mi)=H_ri(m'). But note that the number of
pairs is limited by the number of signatures that Bob is willing to
generate, and that Alice needs to engage in an on-line interaction
with Bob for every such pair. It is therefore likely that in most
applications the number of pairs available to Alice would be quite
small (say, not more than 2^30 or 2^40). Below we analyze only the
case of a single pair, while keeping in mind this additional factor
when dealing with concrete parameters.
Returning to the single pair condition, we see that Alice can
produce a forged signature if she can do one of the following:
Halevi and Krawczyk [Page 4]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
(i) Cryptanalyze the family H to the point that for a random pair
(r,v) she can find m' such that H_r(m')=v.
(ii) Achieve (i) when in addition to the pair (r,v) Alice also
knows another value m for which H_r(m)=v.
(iii) Achieve (ii) when the value m is chosen by Alice herself
BEFORE learning r.
In other words, the collision finding task for Alice is not against
a fixed, known in advance, function as it is the case today with the
use of a fixed hash function, but against a random function in the
hash family H whose index r is revealed to Alice only after she
committed to the message m. In particular, being able to find
collisions against a fixed member of the family is useless; Alice
needs to be able to do so for a reasonably large fraction of hash
functions in the family.
Before we continue we note that the resistance to each of the above
forms of attacks is called, respectively:
(i) one-wayness (OW)
(ii) second-preimage resistance (SPR)
(iii) target-collision resistant (TCR)
The precise difference between SPR and TCR is that in the former the
first message m is chosen at random while in TCR the attacker gets
to choose m (but before learning r). We also remark that TCR
functions were first defined by Naor and Yung [NY89] where they were
called universal one-way hash function (UOWHF); the term TCR that we
use here is from [BR97].
Obviously, these tasks are harder to perform than a regular
collision-finding attack against a single CRHF function H (i.e. the
finding of two messages m,m' such that H(m)=H(m')).
More specifically, one can point to two essential differences
between a regular collision attack and any one of the above tasks.
First, a regular collision attack can be performed in a complete
off-line manner (i.e. ahead of the time when a signature is to be
issued) while each of (i)-(iii) depends on the choice of r and
therefore needs to be completed only after r is determined and
communicated to the adversary. Second, while collisions against a
single hash function that outputs k bits can be found by brute force
in time 2^{k/2}, a brute force TCR attack will take 2^k time.
And even if we recall the additional factor of 2^n pairs that Alice
can achieve via on-line interaction with Bob, a brute force attack
would still take her 2^{k-n} time (in the case of SHA-1 k=160,
while n would be no more than 40 in most reasonable applications).
Of course, none of the above says that SHA-1 (or any other specific
hash function) is sure to resist TCR attacks (or even SPR attacks).
But it clearly indicates that if an application uses a hash function
in a way that can only be broken under a successful TCR attack, then
the application is much more likely to remain secure in face of
cryptanalytical improvements than one that relies on full collision
resistance. This is true whether the hash function in use is a
Halevi and Krawczyk [Page 5]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
partially broken CRHF such as SHA-1, a (hopefully) better
collision-resistant family such as SHA2, or any hash function to be
designed in the future.
While the above indicates general relations between the strengths
and vulnerabilities of different hashing tasks it does not tell us
how to instantiate a TCR function. We discuss this in the next
section. Later, in section 4, we explain how to integrate
randomized hashing into signatures (specifically, how to sign and
transport the index r).
3. A TCR Construction for Iterated Hash Functions
We propose a specific way to convert a single hash function H
(e.g SHA-1 or SHA2) into a TCR function family. The design
principles that we follow are:
(1) Do not change H: randomization is applied to the hash input
before the hash function is called.
(2) Minimize performance impact.
(3) Increase (heuristically) the likelihood of resistance of the
family to TCR attacks.
In 3.1 we present a basic construction (with some heuristic
rationale in Appendix A). In 3.2 we list some variants which take
into account some further trade-offs between performance and
plausible security. We stress that these methods, although
plausible, need to be scrutinized further before they can be
adopted.
3.1 A simple randomized hash construction
Let H be a hash function that processes the message to be hashed in
512-bit blocks. For example, if H is an integrated hash function
a-la-Merkle-Damgard then the underlying compression function has as
inputs an IV and a 512-bit data input. (We use 512 bits as the
typical block size but other values are possible.) Let XOR denote
the bit-wise exclusive-or operation.
Given a message m to be hashed, the signer (or "hasher") chooses a
512-bit random value r, and XORs each 512-bit block of m with r.
(If m is not an exact multiple of 512-bit blocks then the shorter
last block is XORed with an appropriately truncated r.)
In other words, we concatenate r to itself until we get a string r*
of the same length of m, and then compute m XOR r*.
We define H_r(m) to be H(m XOR r*).
Note: By our definition the result of (m XOR r*) is of the same
length as m; therefore, the length padding defined by Merkle-Damgard
functions such as SHA-1 is applied to (m XOR r*). In other words,
the length padding is not subject to the XOR with r*.
Halevi and Krawczyk [Page 6]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
In Appendix A we provide some rationale on the choice of this
particular way of converting iterated hash functions into TCR.
Variants of this method are presented next.
3.2 Some Randomized Hash Variants
A possible strengthening of our construction from Sec 3.1 can be
obtained if, in addition to XORing each block of input with the
value r, one also prepends r to the input to H, i.e., the input to H
consists of the block r concatenated with (m XOR r*). This provides
a randomizing effect to the initial IV of H (in the spirit of the
HMAC construction).
An even more conservative variant could interleave the block r
between any two blocks of the original message, thus providing an IV
randomization feature for each application of the compression
function. The obvious drawback is the added computation (double the
cost of the original hash function).
Another natural idea is to add a layer of security by XORing a
different random pad to each block of the message. Clearly, this
adds a non-trivial computational cost (one would need to generate a
pad of the length of the message via some PRG). A midway strategy
could be to start with a pad of the length of a single block and
slightly (and inexpensively) change this pad for each new block of
input, for example by applying circular byte rotation to the
previous block pad. A similar idea would be to derive the pad from a
byte-oriented LFSR whose initial value is the key r.
Finally, if the generation of a 512-bit random (or pseudo-random)
quantity r for each signature is regarded as expensive (possibly
true for low-power devices, smart cards, etc.) then it is possible
to define r as the concatenation of a shorter pad. For example, in
order to define r one could choose a random 128-bit string and
concatenate it four times to create r. Given the heuristic nature of
our constructions this may be considered a reasonable trade-off.
4. TCR Hashing and Signature Encoding
Recall how randomized hashing is to be used in the context of digital
signatures. For signing a message m, the signer chooses at random a
value r and computes SIG(r,H_r(m)) where SIG represents a signing
algorithm (such as RSA or DSS). More precisely, the signer will use
a well-defined standard encoding of the concatenation of the values
r and H_r(m) and then apply algorithm SIG to this encoding.
The signature on message m consists of the pair (r,SIG(r,H_r(m)).
The above requires changing current signature schemes in four ways:
(1) Choosing a random (unpredictable) index r for each signature,
(2) Replacing the current hashing of a message m from H(m) to H_r(m),
(3) Signing r, and
(4) Transporting r as part of the signature.
Halevi and Krawczyk [Page 7]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
Here we discuss the required changes to existing message encodings
for implementing the last 3 points. We focus on the two main
algorithms in use: RSA and DSS. We note that while changing existing
encoding standards may be one of the possible obstacles to adopting
randomized hashing, this change is instrumental in allowing for more
secure and robust signature schemes not only in the short term but
in the farther future as well. We suggest that this change to the
standards be specified and adopted as soon as possible. As we see
below, these changes can be specified in a way that is independent
of the specific randomized hash function to be used.
We start with RSA. The most common encoding in use with RSA
signatures is PKCS#1 v1.5. It specifies that given a message m to be
signed, the input to the RSA signature function is a string composed
of the hash value H(m) (computed on the message m using a
deterministic hash function such as SHA-1) which is padded to the
length of the RSA modulus with a standard deterministic padding
(this padding contains information to identify the hash algorithm in
use). This encoding can be extended to deal with randomized hashing
as follows. First, the value H(m) is replaced with H_r(m) for r
chosen by the signer. Second, part of the deterministic padding
(which is currently filled with repeated 0xff octets) is replaced
with the value of r. In this way, r is signed and, at the same time,
it is made available to the verifier of the signature without any
increase in the size of signatures (r is recovered by the verifier
by inverting the signature operation).
Another RSA encoding, called EMSA-PSS, is standardized by PKCS#1 v2.1
and is based on the randomized signature scheme of Bellare and
Rogaway [BR96]. Unfortunately, the standard defines an encoding in
which the first step is to apply a deterministic hash function (say,
SHA-1) to the message m. Only then the randomized encoding scheme of
PSS is applied. As a result, the signature scheme that uses EMSA-PSS
is broken if the hash function is not fully collision resistant.
In order to use this scheme with randomized hashing, one would
replace the current H(m) value in the encoding with H_r(m) and the
value r would be encoded in a way that the verifier of a signature
can recover it before applying the randomized hashing. The original
PSS scheme from [BR96] can be used, or adapted, to achieve such an
encoding.
Two points to remark regarding the applicability of PSS here are:
first, the original PSS scheme is patented -- see US Patent 6266771
(which may or may not be an obstacle for adoption). Second, the
main analytical benefit of PSS is its provability based on the so
called "random oracle model". While this provides a good heuristic
backing to the construction, one has to take into account that here
we are dealing explicitly with lowering the security requirements
from the hash function, so it is questionable how random-like these
functions may be required to be. Formal proofs aside, the PSS scheme
offers good heuristic advantages over the PKCS#1 v1.5 in that it
better randomizes the input to the RSA signing algorithm.
Halevi and Krawczyk [Page 8]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
Regarding the DSS (or DSA) signature algorithm, the first thing to
note is that this is already a randomized signature scheme.
A DSS signature is composed of a pair of elements (R,S) where R is a
random element in the DSS group and S is a value computed as a
function of the private key of the signer, the discrete logarithm of
R (denoted k), and the value H(m) (where m is the message to be
signed and H a deterministic hash function). In order to convert
this scheme to use randomized hashing one can use R itself as the
index to the hash family, i.e., r=R (or to derive r from R in some
deterministic way). Then one would replace H(m) with H_r(m).
In this way the size of signatures is unchanged and no further
processing is required to generate r. Also note that while the
signature component R is not strictly "signed", the attacker cannot
control or choose this value (indeed, an attack that finds values R
and S for which (R,S) are a valid signature of H_R(m), for a value
H_R(m) not signed by the legitimate signer, would contradict the
basic security of DSS). One may argue that the use of H_r(m) instead
of H(m) can be viewed as an "implementation" of the random-oracle
version of DSS as analyzed by Pointcheval and Stern [PS96]; the same
caveats expressed in the case of PSS in relation to the use of the
random oracle model apply here as well.
One consideration in regards to using the component R of DSS
signatures as the index to the randomized hash family is that,
in order to ensure the TCR property, this index needs to be unknown
(unpredictable) to the attacker when the latter chooses the message
m to be signed. If the value of R is computed off-line by the signer
(which is possible in the case of DSS) and is leaked before the
attacker choses m then the benefit of randomized hashing is lost.
Hence, R=g^k should be kept secret together with k until the
signature is issued. This is not a fundamental limitation to the
practice of DSS since the DSS scheme already requires (in an
essential way) that k be kept secret, even if computed off-line,
since its discovery by the attacker is equivalent to finding the
secret private key of the signer!
5. Security Considerations
This document presents mechanisms that, if adopted by standard
bodies such as the IETF, will result in significant improvements to
our current and future digital signature systems. While this
document focuses on randomized modes of operation of hash functions
that provide randomized hashing without changing existing
algorithms, it is advisable that future hash families will be
designed with randomized hashing and TCR requirements in mind.
For example, new schemes that follow the Merkle-Damgard approach may
consider allowing for the masking of intermediate values with
optional user provided inputs (that is, such a mask could be set to
a default value, say 0, for deterministic uses of the hash function,
and to user-provided values when randomization is desired). The
important point is that implementations of the function will be
ready to accept such masks without having to change the function.
Halevi and Krawczyk [Page 9]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
We note that all references to "randomness" in this document should
be interpreted as "pseudo-randomness" provided one uses a
cryptographically strong pseudorandom generator (or pseudo-random
function) initialized with a strong unpredictable seed.
We also mention that using TCR hashing may mean that the legitimate
signer can find two messages with the same signature (since it is
the legitimate signer that is choosing the randomness r). One should
note, however, that this has no bearing on non-repudiation (as the
signer is still bound to both messages). Moreover, as shown in
[SPMS02], even if one uses CRHF some secure signature schemes (such
as ECDSA) may allow a signer to find two different messages whose
signature string is the same. Still, as mentioned at the end of the
Introduction, there may be OTHER applications of CRHF that cannot be
replaced with a TCR family.
Finally, the general approaches to randomized hashing and digital
signatures discussed here do not depend on the specifics of the
concrete constructions that we proposed here. Other forms of
randomized hashing and TCR schemes may be superior to the ones
proposed here and further proposals are encouraged.
ACKNOWLEDGMENT. We thank Ran Canetti for useful discussions and for
badgering us to write this document.
REFERENCES
[BR96] M. Bellare and P. Rogaway, "The Exact Security of Digital
Signatures -- How to Sign with RSA and Rabin", Eurocrypt'96,
LNCS 1070, 1996.
[BR97] M. Bellare and P. Rogaway, "Collision-Resistant Hashing:
Towards Making UOWHFs Practical", Crypto'97, LNCS 1294, 1997
[HPL04] D. Hong, B. Preneel, and S. Lee, "Higher Order Universal
One-Way Hash Functions", Asiacrypt'04, LNCS 3329, 2004.
[NY89] M. Naor and M. Yung, "Universal One-Way Hash Functions and
Their Cryptographic Applications", STOC'89, 1989.
[PS96] D. Pointcheval and J. Stern, "Security Arguments for Digital
Signatures and Blind Signatures", J.Cryptology, 13:361-396,
2000.
[S00] V. Shoup, "A Composite Theorem for Universal One-Way Hash
Functions", Eurocrypt'00, LNCS 1807, 2000.
[SPMS02] Jacques Stern, David Pointcheval, John Malone-Lee, and
Nigel P. Smart, "Flaws in Applying Proof Methodologies to
Signature Schemes", CRYPTO '2002, LNCS 2442, 2002.
Halevi and Krawczyk [Page 10]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
Appendix A - Rationale for the Proposed TCR Construction(s)
Our TCR proposals follow the following principles:
(1) Allow the use of existing functions such as SHA-1 and SHA2
(in particular, iterated hash functions a la Merkle-Damgard).
(2) Do not change the hash function but only the interface to it
(e.g., in our proposal randomization is achieved via the input
to the function and therefore implemented hash functions, in
either software or hardware, can be used without modification).
(3) Use as weak as possible properties of the compression function
underlying the hash construction.
Our construction is general enough to be used with any hash function
that processes the incoming data as blocks. Yet, we focus in our
discussion here on Merkle-Damgard (M-D) type of hash functions since
these are the most common schemes in practice.
While (1) and (2) are obvious properties of our suggested
construction we elaborate here on point (3). Ideally, we would have
liked to provide a mathematical theorem proving the security of our
construction using only relatively weak requirements from the
underlying compression function. While such theorems exist for some
specific constructions (e.g., [BR97,S00]), they all include
operations that violate the principle of using the existing hash
functions without any change (e.g., masking the intermediate value of
the compression function with each call to this function). We thus
settle for a heuristic rationale that should be scrutinized in light
of the evolving ideas in the area of hash function cryptanalysis.
Let H be a M-D function (the reader can think of SHA-1 for
concreteness) and h be the corresponding compression function.
That is, h acts on two inputs, a and b, where a represents an
intermediate value (IV) and b is a 512-bit block, and the output
of h is of the length of the IV (IV lengths vary with different
constructions, e.g., 160, 256, etc.). The function H itself is
defined for arbitrary inputs by iterating h over the successive
blocks of the input with each iteration using the IV computed by the
previous application of h (the first IV is set to some constant
defined by the specification of H).
Consider now a family of compression functions derived from h as
follows: for each 512-bit index r, define h_r(a,b)=h(a,b XOR r).
It is easy to see that iterating h_r as in a M-D construction one
obtains the function H_r that we defined in 3.1.
Merkle and Damgard showed that if a compression function h is
collision resistant with respect to fixed-length inputs, then the
function H obtained by iterating h is collision resistant on
arbitrary inputs. We would like to claim the same with respect to
the property of target-collision resistance (TCR), namely, that if
h is TCR so is H. This, however, is not necessarily the case.
Halevi and Krawczyk [Page 11]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
Yet, an "approximation" to this result was recently shown by Hong,
Preneel and Lee [HPL04]. They show that if the construction h_r has
a property called "n-order TCR" then the iterated family H_r is TCR
for messages of up to n blocks. The property of n-order TCR is
defined by the following game between Alice (the Attacker) and a
"hasher" Bob.
(1) Bob chooses an index r and keeps it secret.
(2) For i=1,...,n: Alice chooses a pair (a_i,b_i) and receives from
Bob the value h_r(a_i,b_i).
(3) Alice chooses a pair (a,b).
(4) Bob reveals r to Alice
Alice wins the game if she can find (a',b') different from (a,b)
such that h_r(a,b)=h_r(a',b').
In other words, Alice needs to carry a TCR attack but she is
allowed to query h_r on n inputs of her choice before committing to
the first colliding message and before learning the value of r.
Intuitively, the difference with a regular TCR attack is that Alice
has now an advantage in choosing (a,b) since she can first learn
something about r from the first n queries.
A family h is called n-order TCR if any efficient attacker (Alice)
can only find (a',b') as above with insignificant probability.
Before we continue it is important to clarify that the above game
defining n-order TCR functions is not a game that reflects an actual
interaction between an attacker and a victim in real life but it is
only a virtual game used to define the security of a function.
How much does the extra phase (2) in the game from above help Alice
to find collisions? This of course depends on the specific function,
and to some extent also on the value of n. Note that if one lets n
to be huge (say 2^80 in the case of SHA-1) then Alice can use this
"learning phase" to find colliding pair (a_i,b_i) and (a_j,b_j) that
she can then use as (a,b) and (a',b') respectively. But recall that
n represents the length in blocks of the messages to be hashed with
the iterated construction, so it will typically be quite small.
(I.e., n=4 or so in the case of certificates, and n < 2^30 even for
huge documents.) Hence one may hope that the learning phase will
not be sufficiently useful for Alice to choose the colliding pair.
In other words, while in order to break a collision-resistant hash
function an attacker can spend a HUGE amount of OFF-LINE computation
for finding collisions, for breaking an n-order TCR function the
attacker is limited to only MODERATE ON-LINE interaction with the
hasher after which it needs to commit to a first colliding value x.
Only then the attacker receives the actual value r for which it
needs to find x' such that h_r(x)=h_r(x').
We also comment that the common view of the compression function
h(a,b) as a block cipher with key b and input a gives rise to another
heuristic argument supporting the view of h_r as n-order TCR.
Viewing h(a,b) as a block cipher, phase (2) of the attack from above
Halevi and Krawczyk [Page 12]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
on h_r is just a chosen-plaintext related-key attack on the block
cipher h. If h resists such attacks with a moderate number of
queries, then phase (2) does not help the attacker learn much about
r. Hence, if h is both TCR and a sufficiently robust block cipher,
then it is also an n-order TCR.
As said, [HPL04] show that if a compression family h={h_r} is
n-order TCR then the family H={H_r} is TCR on n block inputs
(here H_r is a Merkle-Damgard iteration of h_r). Applying this
result to our case, we obtain that if the construction
h_r(a,b) = h(a, b XOR r) is an n-order TCR then the family H_r
described in 3.1 is TCR for n-block inputs.
In other words, any TCR attack against the family H_r that uses
n-block messages, translates into an n-order TCR attack against the
compression function family h_r with only n initial oracle queries.
This provides some foundation to the belief that even the existing
hash functions are significantly more secure in the sense of TCR
than for collision resistance when used as specified here.
In addition, one should examine the current attacks and see to what
extent they apply to the defined functions. In particular, we note
that the XORing of input blocks with a random block, while it
preserves differentials, it also destroys the ability of the
attacker to set some of the bits of the colliding messages to values
of its choice. It seems that an attack that takes advantage of
differentials in this setting would need to rely on universal
differentials that depend only on the hash function and for which
most pairs of messages with that difference would collide.
Finally, we point out to another "motivating" element in our design.
Remember that SPR (second pre-image resistant) functions are a weaker
(i.e., easier to accomplish) version of TCR functions where the
attacker cannot choose the first colliding value but rather this
value is determined at random. A straightforward way to transform
an SPR compression function h into a TCR family [S00] is to choose a
pair r=(s1,s2), where s1,s2 are random values of the length of the
IV and block size, respectively, and define
h_r(a,b)=h(a XOR s1, b XOR s2). Unfortunately, iterating such an
h_r is impractical as it requires modifying H such that the IV can
be XORed with S1 in each iteration of h. Therefore, instead of
using this full transformation of SPR into TCR we carry the
randomization only in the second input of h, namely, in our
construction in 3.1 we use h_r(a,b)=h(a,b XOR r) (when viewing h as
a block cipher, as mentioned before, the XORing with r provides for
randomization of the cipher key).
Halevi and Krawczyk [Page 13]
Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005
Authors' Addresses
Shai Halevi
shaih@alum.mit.edu
Hugo Krawczyk
hugo@ee.technion.ac.il
IBM T.J. Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532
USA
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR
IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Halevi and Krawczyk [Page 14]