Implementation Guidance for the PKCS #1 RSA Cryptography Specification

Red Hat, Inc.
Purkynova 115
Brno 61200
Czech Republic
hkario@redhat.com
General
Internet Engineering Task Force
RSA

This document specifies additions and amendments to
RFC 8017. Specifically, it provides guidance to implementers
of the standard to protect against side-channel attacks.
It also deprecates the RSAES-PKCS-v1_5 encryption scheme, but
provides an alternative depadding algorithm that protects against
side-channel attacks arising from users of vulnerable APIs.
The purpose of this specification is to increase the security of
RSA implementations.
The PKCS #1 specification (RFC 8017) describes the RSA cryptosystem,
providing guidance on implementing encryption schemes and signature
schemes.

Unfortunately, a straightforward implementation of the RSA
encryption schemes is vulnerable to side-channel attacks.
Protections against them are not documented in RFC 8017,
and the attacks are mentioned only in passing.

The RSAES-PKCS-v1_5 encryption scheme has been known to be problematic
since 1998, when Daniel Bleichenbacher published his attack.
Side-channel attacks against public key algorithms, including RSA,
have been known to be possible since 1996 thanks to work by Paul Kocher.

Despite those results, side-channel attacks against RSA implementations
kept proliferating over the following 25 years.
They include attacks against simple exponentiation implementations,
implementations that use the Chinese Remainder Theorem optimisation,
and implementations that use either base or exponent blinding
exclusively.

Similarly, side-channel free handling of the errors from
the RSAES-PKCS-v1_5 decryption operation is something that
implementations struggle with.

We thus provide guidance on how to implement those algorithms
in a way that should be secure against at least the simple timing
side-channel attacks.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

In this document we reuse the notation from RFC 8017. In addition,
we define the following:

AL

alternative message length, non-negative integer,
0 <= AL <= k - 11

AM

alternative encoded message, an octet string

bb

base blinding factor, a positive integer

bbInv

base un-blinding factor, a positive integer

D

octet string representation of d

DH

an octet string of a SHA-256 hash of D

KDK

an octet string containing a Key Derivation Key for a
specific ciphertext C

l

length in octets of the message M

b_i

an exponent blinding factor for the i-th prime,
a non-negative integer

g_i

a modulus blinding factor for the i-th prime,
a non-negative integer

Cryptographic implementations may provide many indirect signals
to the attacker that include information about the secret data
being processed. Depending on the type of information, those leaks
can be used to decrypt data or retrieve private keys.
The most common side channels that leak information about secret
data are:

   *  Different errors returned
   *  Different processing times of operations
   *  Different patterns of jump instructions and memory accesses
   *  Use of hardware instructions that take a different amount of
      time to execute depending on operands or result

Some of those leaks may be detectable over the network, while
others may require closer access to the attacked system.
With closer access, the attacker may be able to measure
power usage, electromagnetic emanations, or sounds and
correlate them with specific bits of secret information.

Recent research into network-based side-channel detection has shown
that even very small side channels (of just a few clock cycles) can
be reliably detected over the network. The detectability depends
on the sample size the attacker is able to collect, not on the size
of the side channel.
As a general rule, all operations that process secret information
(be it parts of the private key or parts of the encrypted message)
MUST be performed with code that doesn't have secret-data-dependent
branch instructions or secret-data-dependent memory accesses, and
that doesn't use non-constant-time machine instructions (which
instructions those are is architecture dependent,
but division is commonly non-constant time).
Special care should be taken with the code that handles
the conversion of the numerical representation to the octet string
representation in RSA decryption operations.

All operations that use private keys SHOULD additionally employ
both base blinding and exponent blinding as protections against
leaks inside modular exponentiation code.

The underlying modular exponentiation algorithm MUST be
constant time with regard to the exponent in all uses of the
private key.

For private key decryption the modular exponentiation algorithm
MUST be constant time with regard to the output of the
exponentiation.

In case the Chinese remainder theorem optimisation is used, the
modular exponentiation algorithm must also be constant time
with regard to the moduli used.

It's especially important to make sure that all values that
must stay secret from the attacker are stored in memory buffers that
have sizes determined by the public modulus.

For example, the private exponents should be stored in
memory buffers that have sizes determined by the public
modulus value, not the numerical values of the exponents
themselves.

Similarly, the size of the output buffer for multiplication
should always be equal to the sum of the buffer sizes of the
multiplicands. The output size of the modular reduction operation
should similarly be equal to the size of the modulus and not depend
on the bit size of the output.

For the modular exponentiation algorithm to be side-channel free,
every step of the calculation MUST NOT depend on the bits of
the exponent. In particular, use of the simple square-and-multiply
algorithm will leak information about bits of the exponent,
because the multiplication operation is skipped in some
exponentiation steps.

The recommended workaround is the use of the
Montgomery ladder construction.

While that approach ensures that both the square and multiply
operations are performed, the fact that their results are
placed in different memory locations based on bits of the secret
exponent will provide enough information for an attacker
to recover the bits of the exponent. To counteract it,
the implementation should ensure that both memory locations
are accessed and updated on every step.
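The ladder described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a hardened implementation: the names ct_swap and ladder_pow are hypothetical, exp_bits stands for an exponent length derived from the public modulus size, and Python integers are not constant time, so only the control flow (one multiplication and one squaring per exponent bit, with both working values read and written on every step) carries over to real code.

```python
def ct_swap(a: int, b: int, bit: int, width: int) -> tuple[int, int]:
    """Swap a and b when bit == 1; both values are touched either way."""
    mask = -bit & ((1 << width) - 1)   # all ones when bit == 1, else zero
    t = (a ^ b) & mask
    return a ^ t, b ^ t


def ladder_pow(base: int, exponent: int, modulus: int, exp_bits: int) -> int:
    """Montgomery ladder: one multiply and one square per exponent bit.

    exp_bits is a public value (assumed to come from the modulus size),
    so the number of iterations does not depend on the exponent value.
    Illustration only: Python big integers are not constant time.
    """
    width = modulus.bit_length()
    r0, r1 = 1 % modulus, base % modulus
    for i in reversed(range(exp_bits)):
        bit = (exponent >> i) & 1
        # Bring the value to be squared into r0 without branching.
        r0, r1 = ct_swap(r0, r1, bit, width)
        r1 = (r0 * r1) % modulus       # multiplication, every step
        r0 = (r0 * r0) % modulus       # squaring, every step
        r0, r1 = ct_swap(r0, r1, bit, width)
    return r0
```

For example, ladder_pow(c, d, n, n.bit_length()) processes the same number of bits for every private exponent d that fits the modulus size.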
As multiplication operations quickly make the intermediate
values in modular exponentiation large, performing a modular
reduction after every multiplication or squaring operation
is a common optimisation.

To further optimise the modular reduction, Montgomery
modular multiplication is used for performing the combined
multiply-and-reduce operation. The last step of that operation,
a subtraction, is conditional on the value of the output.
A side-channel free implementation should perform the subtraction
in all cases and then copy either the result or the first operand
of the subtraction, based on the sign of the result, in a
side-channel free manner.
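A minimal sketch of that final step, assuming the value t produced by the multiply-and-reduce operation lies in the range [0, 2n). The names ct_select and final_subtract are hypothetical, and Python integers are used only to show the masking logic; a real implementation would operate on fixed-size limb arrays.

```python
def ct_select(mask: int, a: int, b: int) -> int:
    """Return a when mask has all bits set, b when mask is zero."""
    return (a & mask) | (b & ~mask)


def final_subtract(t: int, n: int) -> int:
    """Map t from the range [0, 2n) into [0, n) without branching on t."""
    width = n.bit_length() + 2          # public: depends on the modulus only
    word = (1 << width) - 1
    diff = (t - n) & word               # the subtraction is always performed
    negative = diff >> (width - 1)      # sign bit: 1 when t < n
    mask = -negative & word             # all ones when the result is negative
    # Keep the first operand t when the result is negative, else the result.
    return ct_select(mask, t, diff)
```

Both candidate values are computed and read on every call; only the mask decides which one is copied to the output.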
As a protection against multiple attacks, it's RECOMMENDED to
perform all operations involving the private key with the use
of blinding.

It should be noted that for decryption operations the
unblinding operation MUST be performed using side-channel free
code that does not leak information about the result of this
multiplication and modular reduction.

To implement base blinding, select a number bb uniformly at random
such that it is relatively prime to n and smaller than n.

Compute the multiplicative inverse of bb modulo n. For the
unblinding step below to recover m, the un-blinding factor bbInv
has to be the inverse of bb^d modulo n, that is
bbInv = (bb^(-1))^d mod n.

In the RSADP() operation, after performing step 1, multiply c
by bb mod n. Use the result as the new c for all the remaining
operations.

Before returning the value m in step 3, multiply it by bbInv mod n.
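The base blinding steps above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the private key is in the first form (n, d), and the un-blinding factor is taken to be the inverse of bb^d modulo n, which is what multiplying c by bb before the exponentiation requires. Python's built-in pow() is not side-channel free, so the sketch shows the arithmetic only.

```python
import math
import secrets


def new_blinding_pair(n: int, d: int) -> tuple[int, int]:
    """Pick bb coprime to n and derive the matching un-blinding factor."""
    while True:
        bb = secrets.randbelow(n - 1) + 1      # uniform in [1, n - 1]
        if math.gcd(bb, n) == 1:
            break
    # Assumption: bbInv = (bb^-1)^d mod n, so that
    # (c * bb)^d * bbInv = c^d (mod n).
    bb_inv = pow(pow(bb, -1, n), d, n)
    return bb, bb_inv


def blinded_rsadp(n: int, d: int, c: int, bb: int, bb_inv: int) -> int:
    """RSADP for a key in the first form, with base blinding applied."""
    if not 0 <= c < n:
        raise ValueError("ciphertext representative out of range")
    c_blinded = (c * bb) % n            # after step 1: blind the input
    m_blinded = pow(c_blinded, d, n)    # exponentiation sees a random base
    # Before returning m: unblind. In real code this multiplication and
    # reduction must not leak information about m.
    return (m_blinded * bb_inv) % n
```

A caller would generate a pair with new_blinding_pair(n, d) and pass it to a single blinded_rsadp() call.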
Note: multiplication by bbInv and reduction modulo n MUST be
performed using code that is side-channel free with respect to
the value m.

As calculating the multiplicative inverse is expensive, implementations
MAY calculate new values of bb and bbInv by squaring them:

   new bb = bb * bb mod n
   new bbInv = bbInv * bbInv mod n
A given pair of blinding factors (bb, bbInv) MUST NOT be used for
more than one RSADP() operation.

Unless the multiplication (squaring) and modular reduction operations
are verified to be side-channel free, it's RECOMMENDED to generate
completely new blinding parameters
every few hundred private key operations.

To further protect against private key leaks, it's RECOMMENDED
to perform blinding of the exponents used.

When performing the RSADP() operation, the blinding depends on the
form of the private key.

If the key is in the first form, the pair (n, d), then the
exponent d should be modified by adding a multiple of Euler's
phi(n): m = c^(d + b*phi(n)) mod n, where b is a 64-bit
uniform random number.

A new value b MUST be selected for every RSADP() operation.

If the key is in the second form, the quintuple (p, q, dP, dQ,
qInv) with an optional sequence of triplets (r_i, d_i, t_i), i = 3, ...,
u, then each exponent used MUST be blinded individually:

   m_1 = c^(dP + b_1 * phi(p)) mod p
   m_2 = c^(dQ + b_2 * phi(q)) mod q
   If u > 2, then m_i = c^(d_i + b_i * phi(r_i)) mod r_i

where b_1, b_2, ..., b_u are all uniformly selected random numbers
at least 64 bits long (or at least 2 machine word sizes, whichever
is greater).

As Euler's phi(p) for a prime argument p is equal to p - 1,
it's simple to calculate in this case.

Note: the selection of random b_i values, multiplication of them
by the result of the phi() function, and addition to the exponent
MUST be performed with side-channel free code.

Use of a smaller blinding factor is NOT RECOMMENDED, as values shorter
than 64 bits have been shown to still be vulnerable to side-channel
attacks.

The b_1, b_2, ..., b_u factors MUST NOT be reused for multiple
RSADP() operations.

To protect against private key leaks, it's RECOMMENDED
to perform blinding of the modulus values used in CRT
implementations.

When the key is in the first form, the pair (n, d), then
the modulus used is public, thus no blinding is necessary.

If the key is in the second form, the quintuple (p, q, dP, dQ, qInv)
with an optional sequence of triplets (r_i, d_i, t_i), i = 3, ...,
u, then each modulus used MUST be blinded individually:

   m_1 = c^dP mod (g_1 * p)
   m_2 = c^dQ mod (g_2 * q)
   If u > 2, then m_i = c^d_i mod (g_i * r_i)

where g_1, g_2, ..., g_u are all uniformly selected random numbers
at least 64 bits long (or at least 2 machine word sizes, whichever
is greater).

Before step 3 of the original algorithm, reduce the returned
value m mod n.

In case of RSA-OAEP, the padding is self-verifying, thus the
depadding operation needs to follow the standard algorithm
to provide a safe API to users.

It MUST ignore the value of the very first octet of padding and
process the remaining bytes as if it was equal to zero.

The RSAES-PKCS-v1_5 encryption scheme is considered deprecated, and
should be used only to process legacy data. It MUST NOT be used as part
of online protocols or API endpoints.

For implementations that can't remove support for this padding
mode it's RECOMMENDED to implement an implicit rejection mechanism
that completely hides from the calling code whether the padding check
failed or not.

It should be noted that the algorithm MUST be implemented
as stated. Otherwise, in heterogeneous environments
where two implementations use the same key but implement the
implicit rejection differently, it may be possible for the
attacker to compare behaviour between the implementations to
guess if the padding check failed or not.

The basic idea of the implicit rejection is to prepare a random
but deterministic message to be returned in case the standard
RSAES-PKCS-v1_5 padding checks fail. To do that, use the private
key and the provided ciphertext to derive a static, but unknown to
the attacker, random value. It's a combination of the methods
documented in TLS 1.2 (RFC 5246) and in the deterministic
(EC)DSA signatures specification (RFC 6979).

For the calculation of the random message for implicit
rejection we define a Pseudo-Random Function (PRF) as follows:

IRPRF( KDK, label, length )

Input:

   KDK      the key derivation key
   label    a label making the output unique for a given KDK
   length   requested length of output in octets

Output: derived key, an octet string

Steps:

   If KDK is not 32 octets long, or if length is larger than
   8192, return error and stop.

   The returned value is created by concatenation of subsequent
   calls to a SHA-256 HMAC function with the KDK as the
   HMAC key and the following octet string as the message:

      I || label || bitLength

   where I is an iterator value encoded as a two-octet
   big-endian integer, label is the passed-in label, and bitLength
   is the length times 8 (to represent the number of bits of output)
   encoded as a two-octet big-endian integer. The iterator
   is initialised to 0 on the first call, and then incremented by
   1 for every subsequent HMAC call.

   The HMAC is iterated until the concatenated output is
   at least length octets long.

   The output is the length left-most octets of the
   concatenated HMAC output.

For implementations that cannot remove support for the
RSAES-PKCS-v1_5 encryption scheme
nor provide a usage-specific API,
it's possible to implement an implicit rejection
algorithm as a protection measure. It should be noted that
implementing it correctly is hard, thus it's RECOMMENDED
to disable support for RSAES-PKCS-v1_5 padding instead.

To implement implicit rejection, the
RSAES-PKCS1-V1_5-DECRYPT operation from section 7.2.2 of RFC 8017
needs to be implemented as follows:

Length checking: If the length of the ciphertext C is not k
octets (or if k < 11), output "decryption error" and stop.

RSA decryption:
   Convert the ciphertext C to an integer ciphertext
   representative c:

      c = OS2IP (C)

   Apply the RSADP decryption primitive to
   the RSA private key (n, d) and the ciphertext
   representative c to produce an integer message
   representative m:

      m = RSADP ((n, d), c)

   Note: the RSADP MUST be constant-time with respect to
   message m.

   If RSADP outputs "ciphertext representative out of range"
   (meaning that c >= n), output "decryption error" and stop.

   Convert the message representative m to an encoded message
   EM of length k octets:

      EM = I2OSP (m, k)

   Note: I2OSP MUST be constant-time with respect to m.
Derivation of alternative message

   Derive the Key Derivation Key (KDK)

      Convert the private exponent d to a string of length
      k octets:

         D = I2OSP (d, k)

      Hash the private exponent using the SHA-256 algorithm:

         DH = SHA256 (D)

      Note: This value MAY be cached between the decryption
      operations, but MUST be considered private-key
      equivalent.

      Use the DH as the SHA-256 HMAC key and the provided
      ciphertext C as the message. If the ciphertext C is not
      k octets long, it MUST be left padded with octets of value
      zero.

         KDK = HMAC (DH, C, SHA256)

   Create the candidate lengths and the random message

      Use the IRPRF with key KDK, "length" as a six-octet label
      encoded with UTF-8, to generate 256 octets of output.
      Interpret this output as 128 two-octet big-endian
      numbers.

         CL = IRPRF (KDK, "length", 256)

      Use the IRPRF with key KDK, "message" as a seven-octet
      label encoded with UTF-8, to generate k octets of output
      to be used as the alternative message:

         AM = IRPRF (KDK, "message", k)

   Select the alternative length for the alternative message.

      Note: this must be performed in a side-channel free way.

      Iterate over the 128 candidate lengths in CL. For each,
      zero out the high-order bits so that it has the same bit
      length as the maximum valid message size (k - 11).

      Select the last length that's not larger than k - 11;
      use 0 if none are. Save it as AL.

EME-PKCS1-v1_5 decoding: Separate the encoded message EM into
an octet string PS consisting of nonzero octets and a message
M as

   EM = 0x00 || 0x02 || PS || 0x00 || M

The check variable must record whether any of the following
checks failed: the first octet of EM has hexadecimal value 0x00,
the second octet of EM has hexadecimal value 0x02, there is an
octet with hexadecimal value 0x00 to separate PS from M, and the
length of PS is at least 8 octets.
Irrespective of the check variable value, the code should
also return the length of message M: L. If there is no octet with
hexadecimal value 0x00 to separate PS from M, then L should equal
0.

Note: All those checks MUST be performed irrespective of whether
previous checks failed or not. A common technique for that is to have
a check variable that is OR-ed with the results of subsequent
checks.
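The OR-ed check variable technique from the note above can be sketched as follows. This is an illustration only: the names ct_is_zero and check_padding are hypothetical, the encoded message is assumed to be k >= 11 octets long with k below 65536, and Python itself gives no constant-time guarantees. The point is the structure: every octet is inspected, no check is skipped, and nothing branches on the content of EM.

```python
def ct_is_zero(x: int) -> int:
    """Return 1 when the octet x (0..255) is zero, else 0, without branching."""
    return ((x - 1) >> 8) & 1


def check_padding(em: bytes) -> tuple[int, int]:
    """Return (bad, msg_len) for an encoded message EM.

    bad is 1 if any EME-PKCS1-v1_5 check failed; msg_len is the length
    of M, or 0 when no 0x00 separator exists.
    """
    k = len(em)
    bad = ct_is_zero(em[0]) ^ 1              # first octet must be 0x00
    bad |= ct_is_zero(em[1] ^ 0x02) ^ 1      # second octet must be 0x02
    found = 0                                # 1 once the separator was seen
    sep_index = 0                            # index of the first 0x00 after PS
    for i in range(2, k):                    # always walk the whole message
        is_sep = ct_is_zero(em[i]) & (found ^ 1)
        sep_index |= i & -is_sep             # record index only for first hit
        found |= is_sep
    bad |= found ^ 1                         # separator missing
    # PS shorter than 8 octets means the separator sits before index 10;
    # the shift extracts the sign of (sep_index - 10).
    bad |= ((sep_index - 10) >> 16) & 1
    msg_len = (k - 1 - sep_index) & -found   # 0 when there is no separator
    return bad, msg_len
```

For a well-formed message such as 0x00 || 0x02 || eight 0xff octets || 0x00 || "hi", the function returns (0, 2); shortening PS to seven octets yields (1, 2), because the length is still reported while the check variable is set.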
Decision which message to return: in case the check variable
is set, the code should return the last AL octets of AM;
in case the check variable is unset, the code should return
the last L octets of EM.
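The derivation of the alternative message and the final selection can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, the bad and msg_len inputs are the check variable and the length L from the decoding step, and the masking only mimics what side-channel free code would do, since Python integers and bytes operations are not constant time.

```python
import hashlib
import hmac


def irprf(kdk: bytes, label: bytes, length: int) -> bytes:
    """IRPRF: concatenated HMAC-SHA-256 outputs, truncated to length."""
    if len(kdk) != 32 or length > 8192:
        raise ValueError("invalid KDK size or requested length")
    # Note: a length of exactly 8192 gives 65536 bits, which does not fit
    # in two octets; to_bytes() raises OverflowError in that case.
    bit_length = (length * 8).to_bytes(2, "big")
    out, i = b"", 0
    while len(out) < length:
        msg = i.to_bytes(2, "big") + label + bit_length
        out += hmac.new(kdk, msg, hashlib.sha256).digest()
        i += 1
    return out[:length]


def derive_kdk(d: int, k: int, c: bytes) -> bytes:
    """KDK = HMAC(SHA-256(I2OSP(d, k)), C), with C left padded to k octets."""
    dh = hashlib.sha256(d.to_bytes(k, "big")).digest()
    return hmac.new(dh, c.rjust(k, b"\x00"), hashlib.sha256).digest()


def alternative_message(kdk: bytes, k: int) -> tuple[bytes, int]:
    """Return (AM, AL): the alternative message and its selected length."""
    cl = irprf(kdk, b"length", 256)
    am = irprf(kdk, b"message", k)
    max_len = k - 11
    mask = (1 << max_len.bit_length()) - 1   # keep only the low-order bits
    al = 0
    for i in range(0, 256, 2):               # all 128 candidates are visited
        cand = int.from_bytes(cl[i:i + 2], "big") & mask
        ok = (((max_len - cand) >> 16) & 1) ^ 1   # 1 when cand <= max_len
        al = (cand & -ok) | (al & (ok - 1))       # last valid candidate wins
    return am, al


def select_message(em: bytes, am: bytes, bad: int, msg_len: int, al: int) -> bytes:
    """Return the last AL octets of AM if bad is set, else the last L of EM."""
    length = (al & -bad) | (msg_len & (bad - 1))
    # Both EM and AM are read in full regardless of the outcome.
    out = bytes((a & -bad & 0xFF) | (e & (bad - 1) & 0xFF)
                for e, a in zip(em, am))
    return out[len(em) - length:]
```

Because KDK depends only on the private exponent and the ciphertext, the same invalid ciphertext always yields the same alternative message and length.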
Note: The decision which length to use MUST be performed in a
side-channel free manner. While the length of the returned
message is not considered sensitive, the memory location read is.
As such, when returning message M both the EM and AM
memory locations MUST be read.

Performing all actions in a way that doesn't leak the status
of the padding check includes the API provided to third-party code.
In particular, if the RSA decryption implementation doesn't
implement implicit rejection, then all three pieces of information
(the padding check result, the length of the returned message, and
the value of the message) are sensitive information, useful in
mounting an attack. As such, any API that returns an error in a
substantially different manner than a successful decryption (e.g.
raising an exception, returning a null pointer, returning a different
type of structure) is vulnerable to side-channel attacks.

While there are infinitely many ways to implement those algorithms
incorrectly, a few common ideas for working around side-channel
attacks keep being repeated. We list a few of them as examples of
approaches that don't work and thus MUST NOT be used.

A commonly proposed workaround for timing attacks is to add
a random delay to the processing of encrypted data.
For such a mitigation to be effective in practice
(to raise the attacker's work factor enough to gain a meaningful
safety margin) the delay would need to be on the order of at least
a few seconds, if not tens of seconds. Effective sizes and
distributions of delays have not been the subject of extensive
study.
It should also be noted that such a delay would need to be
applied in addition to the regular mitigation (generate a random
key and use it in case the RSAES-PKCS-v1_5 padding checks fail).
That is, it needs to be implemented in the calling code, not
in the depadding code.
At the same time, performing a simple wait masks only the
timing side channel. It doesn't mitigate other side-channels,
like ones that depend on power usage or memory access pattern.
A simpler variant of the proposed implicit rejection
algorithm, where the decryption API returns a random
message in case the padding check fails or the decrypted
message has unexpected length, as described for TLS 1.2 in
RFC 5246, isn't a universal mitigation.

In case of a Bleichenbacher-like attack, the attacker is
trying to differentiate two classes of ciphertexts:
ones that decrypt to a message of a specific size,
and ones that have invalid padding or decrypt to a message
of unexpected size. That translates to two kinds of values
being returned to the calling code: either the same
message for every decryption, or a different
message for every decryption.

If that message is later used in any operation that
is not side-channel free (key derivation, symmetric padding
removal, message parsing), then the attacker will be able
to observe two kinds of behaviour: a consistent one
for static messages and an erratic one for randomly generated
messages.

Use of such a message as a key with block cipher modes
that require padding, like AES-CBC with PKCS #7 padding,
is particularly leaky as it has about a 1 in 256 chance of
successful decryption with a random key.

The simple implicit rejection with a random message may be
implemented safely in TLS 1.2 because both static messages
and randomly generated messages are mixed together
with the random nonce generated by the server, thus the
attacker is unable to tell whether the randomness originates
in the nonce or in the randomisation of the message.

Current protocol deployments MUST NOT use
encryption with RSAES-PKCS-v1_5 padding.
Support for RSAES-PKCS-v1_5 SHOULD be disabled in the default
configuration of any implementation of the RSA cryptosystem.
All new protocols MUST NOT specify RSAES-PKCS-v1_5 as a valid
encryption padding for RSA keys.

This memo includes no request to IANA.

This whole document specifies security considerations for
RSA implementations.
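[RFC8017] PKCS #1: RSA Cryptography Specifications Version 2.2
[RFC2119] Key words for use in RFCs to Indicate Requirement Levels
[RFC5246] The Transport Layer Security (TLS) Protocol Version 1.2
[RFC6979] Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA)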
Attacking Exponent Blinding in RSA without CRT
Exponent Blinding Does Not Always Lift (Partial) SPA Resistance to Higher-Level Security
Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS#1
Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems
A Practical Implementation of the Timing Attack
Improving Divide and Conquer Attacks against Cryptosystems by Better Error Detection / Correction Strategies
A Timing Attack against RSA with the Chinese Remainder Theorem
Remote timing attacks are practical
Improving Brumley and Boneh timing attack on unprotected SSL implementations
A Major Vulnerability in RSA Implementations due to MicroArchitectural Analysis Threat
A Vulnerability in RSA Implementations Due to Instruction Cache Analysis and Its Demonstration on OpenSSL
Power attacks in the presence of exponent blinding
Return Of Bleichenbacher's Oracle Threat (ROBOT)
Everlasting ROBOT: the Marvin Attack

«provide test vectors here»

see also: https://github.com/tlsfuzzer/tlslite-ng/blob/master/unit_tests/test_tlslite_utils_rsakey.py#L1694 or OpenSSL 3.2.0.

proposed test vectors:

   *  Otherwise valid, but with wrong first byte of plaintext
   *  Otherwise valid, but with padding type specifying signature
   *  Otherwise valid, but with PS of 7 bytes
   *  Otherwise valid, but with PS of 0 bytes
   *  Otherwise valid, but with the message separator byte missing
   *  Invalid ciphertext that decrypts to a synthetic message of
      maximum size
   *  Invalid ciphertext that decrypts to a 0-byte long message
   *  Invalid ciphertext that needs to use the second-to-last
      synthetic length for the returned message
   *  Valid ciphertext that starts with a zero byte