Re: [TLS] Entropy of SHA-2 and the P_SHA256-based PRF

On 07/15/2010 01:26 AM, Marsh Ray wrote:

> I think the "H[i] + " in front of F changes everything such that it is
> no longer appropriate to model it as a pure iterated random function.
> The entropy loss is eliminated because it is literally "added back in".

I did some Monte Carlo testing and found that my intuition was mostly wrong. The entropy loss effect is also observable with the Davies-Meyer construction, where the chaining value from the previous block is added back into the result of the current block. In fact, under some circumstances it makes things worse!

A pure Merkle-Damgard hash built from a random permutation (e.g., a block cipher) would not lose entropy in this way, because no preimages go missing. However, if the chaining input is mixed back into the output, as in Davies-Meyer, the entropy loss of the random-function-based hash is reintroduced.
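
To see the mechanism in isolation, here is a minimal standalone sketch (mine, separate from the test program below): even when P is a bijection, the feed-forward map x -> (x + P(x)) mod 2^n is not, so some outputs have no preimage at all. In fact, with addition modulo a power of two it can never be a bijection: summing x + P(x) over all x gives twice the domain sum, which disagrees mod 2^n with the sum the outputs of a bijection would have to have.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <set>
#include <vector>

int main()
{
	// Tiny 3-bit domain so the effect is easy to eyeball.
	unsigned const n = 8;
	std::vector<unsigned> P(n);
	for (unsigned i = 0; i < n; ++i)
		P[i] = i;

	// Fisher-Yates shuffle; P becomes a random permutation of {0..7}.
	for (unsigned i = n - 1; i > 0; --i)
		std::swap(P[i], P[std::rand() % (i + 1)]);

	// Davies-Meyer-style feed-forward: x -> (x + P(x)) mod n.
	std::set<unsigned> images;
	for (unsigned x = 0; x < n; ++x)
		images.insert((x + P[x]) % n);

	// Always prints fewer than 8: the feed-forward destroys the
	// permutation property, so entropy loss returns.
	std::cout << "distinct images: " << images.size() << " of " << n << "\n";
	return 0;
}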

I was correct in thinking that D-M converts the worst case of a completely degenerate compression function (one which returns the same image every time) into a best case of no entropy loss: with C constant, h[i] = h[i - 1] + const is a bijection of the chaining value. Ironically, improving the compression function from that point drags the output back down to the performance of the random function.
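
The degenerate case in the same sketch style, again with my own toy parameters: when C ignores its input entirely, the D-M step collapses to x -> (x + c) mod 2^n, which is a permutation, so nothing goes missing.

#include <iostream>
#include <set>

int main()
{
	// Degenerate compression function: C(x) == c for every x.
	unsigned const n = 8, c = 5;
	std::set<unsigned> images;
	for (unsigned x = 0; x < n; ++x)
		images.insert((x + c) % n);

	// Prints 8 of 8: adding a constant merely relabels the states.
	std::cout << "distinct images: " << images.size() << " of " << n << "\n";
	return 0;
}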

This suggests that the optimal narrow-pipe hash design would be based on a pure M-D construction with a block cipher compression function, but I assume these design points were thoroughly researched ages ago.

- Marsh


implementation note: sizeof(XY_type) = 4
implementation note: XY_mask = 0x007FFFFF

----------------------------------------------------------------------
H is a 23-bit iterated hash function of [Merkle-Damgard] construction
using an ideal [random function] for the block compression function C:
    C : M -> X -> X
    M = {0, 1}^m  (considered fixed)
    X = {0, 1}^23 (8388608 elements)

The padded input message is split into blocks i = 1..N
      h[0] = IV
      h[i] = C(h[i - 1], m[i])
      H = h[N]  (output)

Preimage counts of the block function and their frequency
  0:  3085212  36.7786%
  1:  3087453  36.8053%
  2:  1542686  18.3903%
  3:  513732  6.12416%
  4:  128799  1.5354%
  5:  25759  0.307071%
  6:  4259  0.0507712%
  7:  627  0.00747442%
  8:  76  0.000905991%
  9:  5  5.96046E-05%

Entropy estimates:
   Shannon: 22.1729 bits, a 4% loss
       Min: 19.8301 bits, a 14% loss

----------------------------------------------------------------------
H is a 23-bit iterated hash function of [Merkle-Damgard] construction
using an ideal [random permutation] for the block compression function C:
    C : M -> X -> X
    M = {0, 1}^m  (considered fixed)
    X = {0, 1}^23 (8388608 elements)

The padded input message is split into blocks i = 1..N
      h[0] = IV
      h[i] = C(h[i - 1], m[i])
      H = h[N]  (output)

Preimage counts of the block function and their frequency
  1:  8388608  100%

Entropy estimates:
   Shannon: 23 bits, a 0% loss
       Min: 23 bits, a 0% loss

----------------------------------------------------------------------
H is a 23-bit iterated hash function of [Davies-Meyer] construction
using an ideal [random function] for the block compression function C:
    C : M -> X -> X
    M = {0, 1}^m  (considered fixed)
    X = {0, 1}^23 (8388608 elements)

The padded input message is split into blocks i = 1..N
      h[0] = IV
      h[i] = h[i - 1] + C(h[i - 1], m[i])
      H = h[N]  (output)

Preimage counts of the block function and their frequency
  0:  3085773  36.7853%
  1:  3085924  36.7871%
  2:  1543564  18.4007%
  3:  514089  6.12842%
  4:  128881  1.53638%
  5:  25315  0.301778%
  6:  4339  0.0517249%
  7:  636  0.00758171%
  8:  81  0.000965595%
  9:  5  5.96046E-05%
  11:  1  1.19209E-05%

Entropy estimates:
   Shannon: 22.1729 bits, a 4% loss
       Min: 19.5406 bits, a 15% loss

----------------------------------------------------------------------
H is a 23-bit iterated hash function of [Davies-Meyer] construction
using an ideal [random permutation] for the block compression function C:
    C : M -> X -> X
    M = {0, 1}^m  (considered fixed)
    X = {0, 1}^23 (8388608 elements)

The padded input message is split into blocks i = 1..N
      h[0] = IV
      h[i] = h[i - 1] + C(h[i - 1], m[i])
      H = h[N]  (output)

Preimage counts of the block function and their frequency
  0:  3087177  36.802%
  1:  3084271  36.7674%
  2:  1543368  18.3984%
  3:  514407  6.13221%
  4:  128403  1.53068%
  5:  25938  0.309205%
  6:  4330  0.0516176%
  7:  624  0.00743866%
  8:  80  0.000953674%
  9:  10  0.000119209%

Entropy estimates:
   Shannon: 22.1724 bits, a 4% loss
       Min: 19.8301 bits, a 14% loss
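
Two cross-checks on the numbers above. First, for an ideal random function on a large set, the fraction of images with exactly k preimages should follow a Poisson(1) distribution, e^-1/k!, and all three random-function-like tables above track it closely; the theoretical one-step Shannon loss, e^-1 * sum_{k>=2} log2(k)/(k-1)! ~= 0.827 bits, matches the measured 23 - 22.1729. Second, the min-entropy figures follow directly from the largest preimage count: 23 - log2(9) ~= 19.8301 bits and 23 - log2(11) ~= 19.5406 bits. A quick check of the Poisson prediction (my own snippet, not part of the test program):

#include <cmath>
#include <iostream>

int main()
{
	// Poisson(1) prediction for the fraction of images with exactly k
	// preimages: P(k) = e^-1 / k!. Compare with the tables above
	// (36.79%, 36.79%, 18.39%, 6.13%, 1.53%, ...).
	double p = std::exp(-1.0); // k = 0
	for (int k = 0; k <= 9; ++k)
	{
		std::cout << k << ": " << 100.0*p << "%\n";
		p /= k + 1; // advance to e^-1 / (k+1)!
	}
	return 0;
}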

//	mdvsmd.cxx
//
// Monte Carlo evaluation to compare entropy loss for constant message blocks in
//	Merkle-Damgard hash constructions with that of Davies-Meyer.

#include <algorithm>
#include <cassert>
#include <cmath>
#include <iomanip>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>
#include <boost/bind.hpp>
#include <boost/foreach.hpp>
#include <boost/function.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/ref.hpp>
#include <boost/integer.hpp>

typedef unsigned char uint8;
using namespace std;

//-	Configurables.

// Domain of h[i] : X = Y = {0, 1}^n, i.e. the count of bits in the chaining variables and
//	compression function output.
static size_t const XY_domain_n = 23;

enum hf_type_enum { enMerkleDamgard, enDaviesMeyer };
static char const * hf_type_str[] = {
	"Merkle-Damgard",
	"Davies-Meyer" };

enum cf_type_enum { enRandomFunction, enRandomPermutation };
static char const * cf_type_str[] = {
	"random function",
	"random permutation" };

typedef boost::mt19937 rng_type; // Mersenne twister

//	"The smallest, built-in, unsigned integral type with at least N bits."
typedef boost::uint_t<XY_domain_n + 1>::least XY_type;

static size_t XY_cardinality = 1 << XY_domain_n;
static size_t XY_mask = XY_cardinality - 1; // for reduction modulo 2^XY_domain_n

void test(hf_type_enum hf_type, cf_type_enum cf_type);

typedef boost::function<XY_type (XY_type x)> FN_C_type;
void report_entropy_propagation(FN_C_type block_fn, XY_type XY_cardinality, rng_type & rng, string const & description);

int main()
{
	cout <<
	"implementation note: sizeof(XY_type) = " << sizeof(XY_type) << "\n"
"implementation note: XY_mask = 0x" << hex << uppercase << setfill('0') << setw(8) << XY_mask << dec << "\n"
	"\n";

	if ( ! (XY_domain_n <= sizeof(XY_type)*8) )
		throw runtime_error("XY_type too small for full domain");

	for (size_t hft = 0; hft < sizeof(hf_type_str)/sizeof(hf_type_str[0]); ++hft)
		for (size_t cft = 0; cft < sizeof(cf_type_str)/sizeof(cf_type_str[0]); ++cft)
			test(hf_type_enum(hft), cf_type_enum(cft));

	return 0;
}

//	Returns a random i :
//		0 <= i < XY_cardinality    modulo == 0
//		0 <= i < modulo            modulo > 0
XY_type random_element(rng_type & rng, XY_type modulo = 0)
{
	XY_type elem = 0;
	if (!modulo)
	{
		//	Fold together enough Mersenne twister output to cover XY_type,
		//	then mask down to the 23-bit domain.
		size_t cnt_needed = sizeof(XY_type)*8*2;
		while (cnt_needed--)
		{
			elem = (elem << 1) ^ rng();
		}
		elem &= XY_mask;
	}
	else
	{
		//	Rejection sampling to avoid modulo bias: random_element(rng) is uniform
		//	on [0, XY_cardinality), so reject draws at or above the largest
		//	multiple of modulo in that range.
		XY_type max_mult = XY_type((XY_cardinality/modulo)*modulo);  assert(modulo <= max_mult);
		do
			elem = random_element(rng);
		while (! (elem < max_mult) );
		return elem%modulo;
	}

	if (elem & ~XY_mask)
		throw runtime_error("Doesn't make sense");

	return elem;
}

//	Function with a mapping of X -> Y implemented as a vector.
struct FN_bare_C
{
	FN_bare_C(vector<XY_type> const & mapping) : mapping_(mapping) { }
	vector<XY_type> const & mapping_;

	XY_type operator () (XY_type x)
	{
		if ( ! (x < mapping_.size()) )
			throw runtime_error("out of range");

		XY_type y = mapping_[x];

		//y |= 7;// make C "less good"

		return y;
	}
};

//	Davies-Meyer block function: adds the chaining value back into the output of the bare compression function.
struct FN_DaviesMeyer_block
{
	FN_DaviesMeyer_block(vector<XY_type> const & mapping) : bareC_(mapping) { }
	FN_bare_C bareC_;

	XY_type operator () (XY_type h_prev)
	{
		XY_type x = h_prev;
		XY_type y = bareC_(x);
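		// Feed-forward: add the chaining value back in, reducing modulo
		// 2^XY_domain_n via the mask.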
		return (h_prev + y) & XY_mask;
//		return y;
	}
};

void test(hf_type_enum hf_type, cf_type_enum cf_type)
{
	string lineSep(70, '-');

	boost::mt19937 rng; // Mersenne twister

	cout <<
	lineSep << "\n"
"H is a " << XY_domain_n <<"-bit iterated hash function of [" << hf_type_str[hf_type] << "] construction\n" "using an ideal [" << cf_type_str[cf_type] << "] for the block compression function C: \n"
	"    C : M -> X -> X\n"
	"    M = {0, 1}^m  (considered fixed)\n"
" X = {0, 1}^" << XY_domain_n << " (" << XY_cardinality << " elements)\n"
	"\n"
	"The padded input message is split into blocks i = 1..N\n"
	"      h[0] = IV\n"
	"      h[i] = " <<
(hf_type == enMerkleDamgard ? "" : hf_type == enDaviesMeyer ? "H[i - 1] + " : "WTF?!") <<
	"C(h[i - 1], m[i])\n"
	"      H = h[n]  (output)\n"
	"\n";

	//	Initialize the mapping which defines C.
	vector<XY_type> mapping(XY_cardinality);
	if (cf_type == enRandomFunction)
	{
		for (vector<XY_type>::iterator it = mapping.begin(); it != mapping.end(); ++it)
			*it = random_element(rng);
	}
	else if (cf_type == enRandomPermutation)
	{
		XY_type y = 0;
		for (vector<XY_type>::iterator it = mapping.begin(); it != mapping.end(); ++it)
			*it = y++;

		if (!mapping.empty())
		for (unsigned int n = 0; n < 10; ++n)
		{
			//	Dumb broken shuffle.
			for (size_t i = 0; i < mapping.size(); ++i)
				swap(mapping[i], mapping[random_element(rng)]);

			//	Durstenfeld variant of the Fisher-Yates shuffle, via Wikipedia.
			for (size_t i = mapping.size() - 1; i; --i)
				swap(mapping[i], mapping[random_element(rng, i + 1)]);
		}
	}
	else throw runtime_error("wtf");

	//	Dump the mapping if it's not too big
	if (XY_cardinality <= 64)
	{
		cout << "mapping:\n";
		unsigned elem = 0;
		BOOST_FOREACH(XY_type img, mapping)
			cout << "  " << elem++ << " -> " << 0 + img << "\n";
		cout << "\n";
	}

	FN_bare_C bare_C(mapping);
	FN_DaviesMeyer_block dm_block(mapping);

	boost::function<XY_type(XY_type x)> block_fn =
		hf_type == enMerkleDamgard ? bare_C :
		hf_type == enDaviesMeyer ?   dm_block :
		boost::function<XY_type(XY_type x)>();

	report_entropy_propagation(
		block_fn, // boost::function<XY_type(XY_type x)> block_fn
		XY_cardinality, // XY_type XY_cardinality
		rng,
		"block function" ); // string const & description
}

//	Supplied a function of X -> Y, evaluates trials to estimate effective entropy.
void report_entropy_propagation(FN_C_type block_fn, XY_type XY_cardinality, rng_type & rng, string const & description)
{
	//	Count the number of preimages for each possible image.
	//	Map of image -> preimage frequency.
	vector<size_t> image_cntPreimages(XY_cardinality);
#if 1
	//	One trial for each element.
	for (XY_type element = 0; element < XY_cardinality; ++element)
	{
		XY_type image = block_fn(element);
		++image_cntPreimages[image];
	}
#else
	//	A bunch of random trials.
	for (size_t n = 0; n < max<XY_type>(XY_cardinality*2, 1024*1024); ++n)
	{
		XY_type element = random_element(rng);
		XY_type image = block_fn(element);
		++image_cntPreimages[image];
	}
#endif

	//	Dump the preimage counts if it's not too big
	if (XY_cardinality <= 32)
	{
		cout << "preimage counts:\n";
		for (size_t image = 0; image < image_cntPreimages.size(); ++image)
			cout << "  " << image << " <- qty " << image_cntPreimages[image] << "\n";
		cout << "\n";
	}

	//	Construct a map of preimage count -> frequency of that count's occurrence among images in Y.
	typedef map<size_t, size_t> cnt_freqs_type;
	cnt_freqs_type cntPreimages_freqs;
	for (vector<size_t>::const_iterator cit = image_cntPreimages.begin(); cit != image_cntPreimages.end(); ++cit)
	{
		size_t cntPreimages = *cit;

		cnt_freqs_type::iterator it = cntPreimages_freqs.find(cntPreimages);
		if (it != cntPreimages_freqs.end())
			++it->second;
		else
			cntPreimages_freqs[cntPreimages] = 1;
	}

cout << "Preimage counts of the " << description << " and their frequency\n";
	BOOST_FOREACH(cnt_freqs_type::value_type pr, cntPreimages_freqs)
cout << " " << pr.first << ": " << pr.second << " " << (100.0*double(pr.second)/XY_cardinality) << "%\n";

	{
		size_t totalN = 0;
		BOOST_FOREACH(size_t cnt, image_cntPreimages)
			totalN += cnt;

		//	Shannon entropy of the output distribution: H = -sum_y p(y)*log2 p(y),
		//	where p(y) = (preimage count of y)/totalN.
		double H_shn = 0.0;
		size_t cntPreimages_max = 0;
		BOOST_FOREACH(size_t cnt, image_cntPreimages)
		{
			if (cnt)
			{
				double p = double(cnt)/totalN;
				H_shn -= p*log2(p);

				if (cntPreimages_max < cnt)
					cntPreimages_max = cnt;
			}
		}

//cout << "cntPreimages_max=" << cntPreimages_max << " totalN=" << totalN << "\n";

		//	Min-entropy: H_min = -log2(max_y p(y)), the uncertainty against an
		//	attacker who simply guesses the most likely output.
		double p = double(cntPreimages_max)/totalN;
		double H_min = -log2(p);

		cout << "\n"
		"Entropy estimates:\n"
" Shannon: " << H_shn << " bits, a " << int(100.0*(XY_domain_n - H_shn)/XY_domain_n + 0.5) << "% loss\n" " Min: " << H_min << " bits, a " << int(100.0*(XY_domain_n - H_min)/XY_domain_n + 0.5) << "% loss\n"
		"\n";
	}
}
