
[Q] Entropy reduction resulting from hashing


Nicol So

unread,
Oct 13, 2001, 10:50:15 AM10/13/01
to
The problem below came from an attempt to analyze the effect of
repeatedly applying a cryptographic hash function to an input string. I
spent a little time on it several months ago but couldn't find the
answer. I got a feeling that the *form* of the problem must be familiar
to mathematicians, regardless of whether the answer itself is known.

Here's a formalized description of the problem:

Let X be a uniformly distributed random variable whose value ranges over
{0,1}^n, i.e. binary strings of length n.

Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
the set of functions defined on {0,1}^n.

What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
(For concreteness, assume entropy is measured in bits).

--
Nicol So, CISSP // para...@engineer.com
Disclaimer: Views expressed here are casual comments and should
not be relied upon as the basis for decisions of consequence.

Mok-Kong Shen

unread,
Oct 13, 2001, 11:49:42 AM10/13/01
to

Nicol So wrote:
>

> Let X be a uniformly distributed random variable whose value ranges over
> {0,1}^n, i.e. binary strings of length n.
>
> Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
> the set of functions defined on {0,1}^n.
>
> What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
> (For concreteness, assume entropy is measured in bits).

If you allow a layman's (possibly wrong/silly) comment:
If X is from a perfectly random source, it has n bits of
entropy. H(f^t(X)) has also n bits. It can't have more
entropy. (Analogy: purifying some chemicals that are already pure
a hundred times doesn't result in anything 'more' pure.)

M. K. Shen

Corners

unread,
Oct 13, 2001, 1:03:21 PM10/13/01
to

"Mok-Kong Shen" <mok-ko...@t-online.de> wrote in message
news:3BC86296...@t-online.de...


You're right, it can't have more entropy, but it can have
less. For a random function, many values may not have a
"pre-image", but all will have an image. Some values cannot
be a result of f^t(x). Some values will have many
pre-images, and f^t(x) will take those values more often than
a value selected uniformly at random would. Both of these
things reduce entropy. HAC discusses random functions; there
might be enough info to answer the question there. If I
succeed, I'll post.


Nicol So

unread,
Oct 13, 2001, 1:20:05 PM10/13/01
to
Mok-Kong Shen wrote:
>
> Nicol So wrote:
> >
> > Let X be a uniformly distributed random variable whose value ranges over
> > {0,1}^n, i.e. binary strings of length n.
> >
> > Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
> > the set of functions defined on {0,1}^n.
> >
> > What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
> > (For concreteness, assume entropy is measured in bits).
>
> If you allow a layman's (possibly wrong/silly) comment:
> If X is from a perfectly random source, it has n bits of
> entropy. H(f^t(X)) has also n bits.

H(f^t(X)) will not be n bits, due to collision. H(f^t(X)) is monotone
decreasing in t; I don't think it converges to 0. I *speculate* that it
converges to a limit of roughly n/2, but that's based on (possibly
incorrect) intuition, not analysis.

I think there might be useful results from the study of random graphs.

Corners

unread,
Oct 13, 2001, 1:50:57 PM10/13/01
to

"Nicol So" <nob...@no.spam.please> wrote in message
news:3BC854A7...@no.spam.please...

The "Handbook of Applied Cryptography" page 54 ff. gives a
formula for the remaining number of "Th iterate image
points". But I cant find anything about whether some are
more likely than others, so I'm lost. As the number of
pre-images approaches infinity, the number of
connected components is 1/2 ln(n)
images on cycles is sqrt(Pi*n/2)

To establish an upper bound, we could assume that all images on
cycles are equally likely: sqrt(Pi*2^160/2) equally likely
possibilities means that if SHA is a random function, and if
you iterate enough times, you should end up with less than 81
bits of entropy. I have no idea how many times are enough
(less than 2^160, if that helps), or how much less than 81
bits you'll end up with.
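
Here's a quick back-of-the-envelope check of that bound (a rough
Python sketch, assuming the HAC/random-mapping cycle-point formula
and a 160-bit output; nothing here is a proof):

import math

# A random mapping on a set of size N has about sqrt(pi*N/2) points
# lying on cycles (HAC / random mapping statistics).
N = 2**160
cycle_points = math.sqrt(math.pi * N / 2)

# If the iterate were uniform over the cycle points (an upper bound):
print(math.log2(cycle_points))   # about 80.3 bits, i.e. "less than 81"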

Consider also that the properties of a particular hash
function, such as non-invertibility, collision resistance,
and a short description (in the Kolmogorov sense), may mean
that it doesn't behave like a random function at all. This is
still an interesting (to me) problem, as stated, but beware
in applications.

Oh, and somebody check my reasoning. I am not a
mathematician.


David Wagner

unread,
Oct 13, 2001, 3:41:04 PM10/13/01
to
Nicol So wrote:
>I think there might be useful results from the study of random graphs.

Absolutely. My favorite reference is
P. Flajolet and A. M. Odlyzko, ``Random mapping statistics,'' EUROCRYPT '89.
http://www.dtc.umn.edu/~odlyzko/doc/arch/random.mappings.pdf
This is a fantastic source of combinatorial results on random maps.

David Wagner

unread,
Oct 13, 2001, 3:38:29 PM10/13/01
to
Nicol So wrote:
>Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
>the set of functions defined on {0,1}^n.
>What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?

Well, a while ago I did some calculations on a related question.
Let a_t denote the fraction of n-bit values which can appear as the
result of iterating f t times. Then a_1 ~ 1 - 1/e, and
a_{t+1} ~ 1 - e^{-a_t}.
The first few numbers in the sequence are .632, .469, .374, .312,
.268, .235, .210, etc. Asymptotically, a_t ~ 2/(t + log t) is a
pretty good approximation. Let me know if you'd like to see the
calculations in this paragraph justified.

This doesn't tell you exactly what the entropy will be, but it
looks like it should give us a pretty good guess. I'd imagine that
the lost entropy is going to be on the order of -lg a_t bits, which
is not too large as long as t isn't too big. Asymptotically, this
is -lg a_t ~ lg(t + log t) - 1, so if you iterate t = 2^k times,
my expectation is that you'll only lose about k-1 bits of entropy.
Caveat: This latter paragraph is all guesswork; I have no proof.
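
For what it's worth, here is a small Python sketch that just iterates
the recurrence above and compares it with the 2/(t + log t)
approximation (a numerical illustration only, not part of the argument):

import math

a = 1 - math.exp(-1)                  # a_1 ~ 1 - 1/e
for t in range(1, 11):
    approx = 2 / (t + math.log(t))    # asymptotic; rough for small t
    print(t, round(a, 3), round(approx, 3))
    a = 1 - math.exp(-a)              # a_{t+1} ~ 1 - e^{-a_t}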

David Wagner

unread,
Oct 13, 2001, 3:55:34 PM10/13/01
to
Corners wrote:
>But I cant find anything about whether some are
>more likely than others, so I'm lost.

Yes, some are more likely than others. They have to be: if we have a
function with signature S->S that is not surjective, then by the
pigeonhole principle some point in the range must have at least two
pre-images.

Nicol So

unread,
Oct 13, 2001, 4:10:58 PM10/13/01
to
David Wagner wrote:
>
> Nicol So wrote:
> >Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
> >the set of functions defined on {0,1}^n.
> >What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
>
> Well, a while ago I did some calculations on a related question.
> Let a_t denote the fraction of n-bit values which can appear as the
> result of iterating f t times. Then a_1 ~ 1 - 1/e, and
> a_{t+1} ~ 1 - e^{-a_t}.
> The first few numbers in the sequence are .632, .469, .374, .312,
> .268, .235, .210, etc. Asymptotically, a_t ~ 2/(t + log t) is a
> pretty good approximation. Let me know if you'd like to see the
> calculations in this paragraph justified.

This doesn't seem right. According to your asymptotic approximation, a_t
can get arbitrarily close to 0 as t tends to infinity. For any given f,
there's a certain value of t at and beyond which the graph
representation of f^t will consist of disjoint cycles only. The image of
{0,1}^n under f^t should not further reduce from that point on. Did I
miss something?

David Wagner

unread,
Oct 13, 2001, 5:36:07 PM10/13/01
to
Nicol So wrote:
>This doesn't seem right. According to your asymptotic approximation, a_t
>can get arbitrarily close to 0 as t tends to infinity. For any given f,
>there's a certain value of t at and beyond which the graph
>representation of f^t will consist of disjoint cycles only. The image of
>{0,1}^n under f^t should not further reduce from that point on. Did I
>miss something?

No, you're right. It's an approximation, and one that I believe
is fairly good if 1 << t << 2^{n/2}, but is likely to be bad for
larger values of t.

Here's the heuristic justification for my expansion. Let the set
S denote the image of f^t and T the image of f^{t+1}, i.e., T = f(S).
We know the expected size of S is 2^n a_t. We calculate the
expected size of T, i.e., 2^n a_{t+1}, as follows by fixing an
arbitrary y in {0,1}^n and asking whether it is in T:
Pr[y not in T]
= \prod_{x in S} Pr[f(x) != y]
= \prod_{x in S} (1 - 1/2^n)
= (1 - 1/2^n)^|S|
= (1 - 1/2^n)^{2^n a_t}
~ e^{-a_t}
Thus the expected value of |T| is something like 2^n (1 - e^{-a_t}),
and a_{t+1} ~ 1 - e^{-a_t}.

We can note several questionable assumptions in the above argument.
For one, we're being loose about how we're taking expected values
(since |T| is not a linear function of |S|, the above calculation is
not entirely justified).

Probably more importantly, we're implicitly
assuming that the random variable f is independent of T, and yet this
assumption is clearly incorrect. The above calculation would be
more reasonable if we picked t independent random maps f_1,f_2,..,f_t
and asked for the expected size of the image of f_t o ... o f_2 o f_1.

The latter may sound like a fatal flaw in the argument, but I want to
argue that it is not a big deal so long as t << 2^{n/2}. Suppose we
consider any pair of elements x,x' and ask for p(x,x') = Pr[f^t(x) = f^t(x')].
If we follow x,f(x),...,f^t(x), we see that we get a sequence of values
that are different with high probability (so long as t << 2^{n/2}) and
thus is distributed exactly like x,g_1(x),g_2(x),..,g_t(x) where
g_i = f_i o ... o f_2 o f_1. This means that if we define q(x,x') =
Pr[g_t(x) = g_t(x')], we have q(x,x') ~ p(x,x'). Moreover, clearly
the value of p(x,x') is intimately connected to the size of the image
of f^t, and q(x,x') to the image of g_t. My heuristic calculation
essentially gives us a good estimate of the size of the image of g_t,
and thus a good estimate of q(x,x'), but since p(x,x') ~ q(x,x'),
this implicitly gives us a good estimate of p(x,x'), which thus transfers
over to give us a good estimate of the size of f^t. Did that make any
sense?
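
In case it helps, here is a rough Monte Carlo sketch (a toy domain
size, nothing like 2^160) that measures the image fraction of f^t for
random f and compares it against the recurrence; it is only meant to
illustrate the heuristic, not to prove anything:

import math, random

bits, trials, t_max = 12, 20, 8       # toy parameters
N = 1 << bits

fractions = [0.0] * (t_max + 1)
for _ in range(trials):
    f = [random.randrange(N) for _ in range(N)]   # a random function on N points
    image = set(range(N))
    for t in range(1, t_max + 1):
        image = {f[x] for x in image}             # image of f^t
        fractions[t] += len(image) / N / trials   # accumulate the average

a = 1.0
for t in range(1, t_max + 1):
    a = 1 - math.exp(-a)                          # the heuristic recurrence
    print(t, round(fractions[t], 3), round(a, 3))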

David Wagner

unread,
Oct 13, 2001, 5:58:45 PM10/13/01
to
Nicol So wrote:
>David Wagner wrote:
>> Nicol So wrote:
>> >Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
>> >the set of functions defined on {0,1}^n.
>> >What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
>>
>> Let a_t denote the fraction of n-bit values which can appear as the
>> result of iterating f t times. Then [...] a_t ~ 2/(t + log t) [...]

>
>This doesn't seem right. According to your asymptotic approximation, a_t
>can get arbitrarily close to 0 as t tends to infinity. For any given f,
>there's a certain value of t at and beyond which the graph
>representation of f^t will consist of disjoint cycles only. The image of
>{0,1}^n under f^t should not further reduce from that point on. Did I
>miss something?

I thought of another way to look at this. Again, this will be
heuristic in nature (and only valid for t << 2^{n/2}), but if you
don't need a proof, maybe it is interesting.

Instead of calculating a_t, let's calculate the collision probability:
if we pick two random elements x,x' in {0,1}^n, let p_t denote the
probability that they collide under f^t.

Note that if you're willing to use Renyi entropy of order two, instead of
the more usual Shannon entropy metric, then p_t immediately determines
the Renyi entropy, via H_2 = -lg p_t. (For more on entropy measures,
see <http://www.cs.berkeley.edu/~daw/my-posts/entropy-measures>.)

Now we can calculate p_t directly as follows. p_0 = Pr[x=x'] = 1/2^n.
Also p_1 = Pr[x=x'] + Pr[x!=x' and f(x)=f(x')] = p_0 + (1 - p_0) * 1/2^n
~ (2 - 1/2^n)/2^n. In general,
p_{t+1}
= p_t + Pr[f^t(x)!=f^t(x') and f^{t+1}(x)=f^{t+1}(x')]
~ p_t + (1 - p_t)/2^n
~ 1/2^n + (1 - 1/2^n) p_t.
For this calculation I assume that the sequence x,f(x),..,f^t(x) does
not enter a cycle, and similarly for x', so this calculation will only
be valid when t^2 << 2^n.

Solving the above recurrence relation gives
p_t = (1+a+a^2+...+a^t)/2^n where a = 1 - 1/2^n
= (1-a^{t+1})/((1-a) 2^n)
~ (t+1)/2^n.
We may now estimate the Renyi entropy as
H_2(f^t(X)) = -lg p_t ~ n - lg(t+1),
which is in rough agreement with my estimate of the size of the image
of {0,1}^n under f^t.

Does this seem helpful? Or does the estimate seem skewed?
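
To put numbers on it: the recurrence solves in closed form, so one can
compare it directly against the n - lg(t+1) estimate. A small Python
sketch (expm1/log1p are only there to keep 1 - 1/2^n from rounding to 1):

import math

n = 160
x = 2.0 ** (-n)
for t in (1, 10, 1000, 2**20):
    # p_t = 1 - (1 - 1/2^n)^(t+1), computed carefully for tiny x
    p_t = -math.expm1((t + 1) * math.log1p(-x))
    print(t, round(-math.log2(p_t), 2), round(n - math.log2(t + 1), 2))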

Mok-Kong Shen

unread,
Oct 13, 2001, 7:23:23 PM10/13/01
to

Nicol So wrote:
>
> Mok-Kong Shen wrote:
> >
> > Nicol So wrote:
> > >
> > > Let X be a uniformly distributed random variable whose value ranges over
> > > {0,1}^n, i.e. binary strings of length n.
> > >
> > > Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
> > > the set of functions defined on {0,1}^n.
> > >
> > > What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
> > > (For concreteness, assume entropy is measured in bits).
> >
> > If you allow a layman's (possibly wrong/silly) comment:
> > If X is from a perfectly random source, it has n bits of
> > entropy. H(f^t(X)) has also n bits.
>
> H(f^t(X)) will not be n bits, due to collision. H(f^t(X)) is monotone
> decreasing in t; I don't think it converges to 0. I *speculate* that it
> converges to a limit of roughly n/2, but that's based on (possibly
> incorrect) intuition, not analysis.
>
> I think there might be useful results from the study of random graphs.

Yes, I was wrong. But, if X has n bits of entropy, f^t(X),
which is also n bits long, may or may not have n bits of
entropy, I suppose. So, without knowing f, one couldn't
give a general statement on f^t(X), and hence also on
H(f^t(X)), I am afraid. (H(f^t(X)) applies H only once on
the argument f^t(X), not repeatedly applying H as you said,
in my understanding.)

M. K. Shen

Nicol So

unread,
Oct 13, 2001, 8:08:37 PM10/13/01
to
Mok-Kong Shen wrote:
>
> Nicol So wrote:
> >
> > H(f^t(X)) will not be n bits, due to collision. H(f^t(X)) is monotone
> > decreasing in t; I don't think it converges to 0. I *speculate* that it
> > converges to a limit of roughly n/2, but that's based on (possibly
> > incorrect) intuition, not analysis.
>
> Yes, I was wrong. But, if X has n bits of entropy, f^t(X),
> which is also n bits long, may or may not have n bits of
> entropy, I suppose. So, without knowing f, one couldn't
> give a general statement on f^t(X), and hence also on
> H(f^t(X)), I am afraid. (H(f^t(X)) applies H only once on
> the argument f^t(X), not repeatedly applying H as you said,
> in my understanding.)

In my original message, I stated that f is chosen uniformly at random
from all functions defined on {0,1}^n.

A random variable is just a distribution of values. X is a discrete
random variable. When f is chosen according to a specified distribution,
f(X) is another distribution of values--another random variable if you
will. The same f can be applied repeatedly to produce a series of random
variables, i.e. f^t(X).

H, the entropy function, is defined for discrete distributions. It is
certainly meaningful to talk about H(f^t(X)). What is being repeatedly
applied is f, not H.

Mok-Kong Shen

unread,
Oct 13, 2001, 8:16:45 PM10/13/01
to

Nicol So wrote:
>
> Mok-Kong Shen wrote:
> >
> > Nicol So wrote:
> > >
> > > H(f^t(X)) will not be n bits, due to collision. H(f^t(X)) is monotone
> > > decreasing in t; I don't think it converges to 0. I *speculate* that it
> > > converges to a limit of roughly n/2, but that's based on (possibly
> > > incorrect) intuition, not analysis.
> >
> > Yes, I was wrong. But, if X has n bits of entropy, f^t(X),
> > which is also n bits long, may or may not have n bits of
> > entropy, I suppose. So, without knowing f, one couldn't
> > give a general statement on f^t(X), and hence also on
> > H(f^t(X)), I am afraid. (H(f^t(X)) applies H only once on
> > the argument f^t(X), not repeatedly applying H as you said,
> > in my understanding.)
>
> In my original message, I stated that f is chosen uniformly at random
> from all functions defined on {0,1}^n.
>
> A random variable is just a distribution of values. X is a discrete
> random variable. When f is chosen according to a specified distribution,
> f(X) is another distribution of values--another random variable if you
> will. The same f can be applied repeatedly to produce a series of random
> variables, i.e. f^t(X).
>
> H, the entropy function, is defined for discrete distributions. It is
> certainly meaningful to talk about H(f^t(X)). What is being repeatedly
> applied is f, not H.

O.k., but you wrote 'repeatedly applying a cryptographic
hash function' and, since H presumably denotes this
hash function, that's how I interpreted the issue.

M. K. Shen

John Myre

unread,
Oct 15, 2001, 1:25:47 PM10/15/01
to
Mok-Kong Shen wrote:
<snip>

> O.k., but you wrote 'repeatedly applying a cryptographic
> hash function' and, since H presumably denotes this
> hash function, that's how I interpreted the issue.
<snip>

That's because you didn't read carefully. Look again at
the original post, and it is quite clear that H is *not*
a hash function.

JM

Mok-Kong Shen

unread,
Oct 15, 2001, 3:19:43 PM10/15/01
to

Perhaps I could take this opportunity to ask a question:
The original article said 'f:{0,1}^n -> {0,1}^n'. So the
hash function f has input and output of the same length.
How does one argue that such a function, when repeatedly
applied, continually enhances the entropy? Thanks.

M. K. Shen

Corners

unread,
Oct 15, 2001, 6:43:44 PM10/15/01
to

"Mok-Kong Shen" <mok-ko...@t-online.de> wrote in message
news:3BCB36CF...@t-online.de...
It can't increase the entropy. But it might decrease the
entropy. A trivial, non-random example would be to "AND"
the input stream with a constant. Each 0-bit in the
constant loses one bit of input entropy for the output
stream. (Under the usual assumptions that the input stream
was uncorrelated, uniformly distributed, etc.)

If a function is bijective (every image has one and only one
pre-image, and every pre-image has one and only one image),
then it preserves entropy. But the VAST majority of random
functions aren't bijective. Mapping 160 bits onto 160 bits,
there are:

2^(160*2^160) total functions = 10^(7.03929E+49)
but only (2^160)! bijective functions = 10^(6.97582E+49)
So ratio is 10^(-6.34722E+47). This is a very tiny
fraction.

Assume a random function with n inputs and outputs. An
output has a (1-1/n) chance that it won't be the image of a
given input. The chance that an output will have no
pre-image is (1-1/n)^n. For large values of n, this
approaches EXP(-1).
For n greater than thirty or so, this is approximately 37%.

(In the last paragraph I wrote "only (2^160)!"; now I'm
saying 30 is large. What a strange business we are in!)
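
In case anyone wants to re-check those figures, here is a rough Python
sketch (it uses lgamma for the factorial instead of computing it, and
the expected outputs are noted in the comments):

import math

N = 2**160                                     # number of 160-bit values

# log10 of the count of all functions (N^N) and of the bijections (N!)
log10_all = N * math.log10(N)                  # about 7.039e49
log10_bij = math.lgamma(N + 1) / math.log(10)  # about 6.976e49
print(log10_all, log10_bij, log10_bij - log10_all)   # ratio ~ 10^(-6.35e47)

# fraction of outputs with no pre-image: (1 - 1/n)^n -> 1/e as n grows
for n in (10, 30, 1000):
    print(n, (1 - 1/n)**n, math.exp(-1))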

So, about two-thirds of the potential outputs of a given
random function will actually be outputs. When a random
number is put through a given random function, the output is
more predictable than the input.

When the random function is iteratively applied, the math
gets a little more tricky. Knuth's "Art of Computer
Programming" and the "Handbook of Applied Cryptography" by
A. Menezes, P. van Oorschot and S. Vanstone do a better job
of explaining than I can.

According to my calculations, a random function, applied
repeatedly, will eventually lose more than half the original
Shannon entropy. (Thanks, David Wagner, for pointing out
different kinds of entropy)

SHA-1 may look like a random function, but is it? If
repeatedly applying a hash results in one of only a few
possibilities, then that would make a good attack for
finding collisions. I think good hash functions have
special characteristics that make them "non-random".

I think the fix from SHA-0 to SHA-1 could be interpreted as
preserving entropy. Can somebody who really knows help
here?

And, if somebody checks my math and finds an error, please
let me and everybody know, ok?


***
Corners (Ask me what time it is, I'll tell you how to build
a wrist watch!)


David Wagner

unread,
Oct 15, 2001, 7:42:15 PM10/15/01
to
Corners wrote:
>SHA-1 may look like a random function, but is it? If
>repeatedly applying a hash results in one of only a few
>possibilities, then that would make a good attack for
>finding collisions. I think good hash functions have
>special characteristics that make them "non-random".

I doubt it. I believe SHA-1 (like other cryptographic hash functions)
was designed to behave like a random function. I have no proof of this,
of course, but any deviation from "random" would likely be a publishable
result.

Corners

unread,
Oct 15, 2001, 9:43:24 PM10/15/01
to

"David Wagner" <d...@mozart.cs.berkeley.edu> wrote in message
news:9qfs8n$1o24$1...@agate.berkeley.edu...

It is non-random in the Kolmogorov sense, because there is a
description of it in less than 160*(2^160) bits = 2.92*10^49
bytes. That's the Shannon entropy of a random mapping of
160 bits onto 160 bits. An HTML version of the Fips-180-1
document is only 51,021 bytes long.

Sadly, this is like proving the digits of pi are non-random
because pi is the value of arc-cosine(-1).
Finding non-random behavior is a completely different story.
Probably way beyond my capabilities, even if it exists.
Still, maybeeeee....

David Wagner

unread,
Oct 15, 2001, 11:55:44 PM10/15/01
to
Corners wrote:
>It is non-random in the Kolmogorov sense, [...]

Yes, but I don't see why this would matter. Rijndael is also non-random
in that sense, but that's irrelevant to the security of Rijndael in
practice. I admit I didn't define what I meant by "looks random",
but this isn't what I meant. :-)

Sadly, I can't give you a good definition for security of a hash function
(this is something that seems harder to formalize than the corresponding
notion for, say, a block cipher), but I'm pretty sure Kolmogorov's sense
of randomness is not terribly relevant for this purpose.

Mok-Kong Shen

unread,
Oct 16, 2001, 4:43:35 AM10/16/01
to

Are there design rationales of the well-known hash
functions and test reports of satisfaction of requirements
available somewhere? Thanks.

M. K. Shen

Mok-Kong Shen

unread,
Oct 16, 2001, 4:59:16 AM10/16/01
to

David Wagner wrote:
>
[snip]


> Sadly, I can't give you a good definition for security of a hash function
> (this is something that seems harder to formalize than the corresponding
> notion for, say, a block cipher), but I'm pretty sure Kolmogorov's sense
> of randomness is not terribly relevant for this purpose.

In view of the difficulty of definition for security, how
do the crypto hash functions distinguish themselves from
the normal hash functions? Thanks.

M. K. Shen

Mok-Kong Shen

unread,
Oct 16, 2001, 12:53:07 PM10/16/01
to

Corners wrote:
>
> "Mok-Kong Shen" <mok-ko...@t-online.de> wrote:

> > Perhaps I could take this opportunity to ask a question:
> > The original article said 'f:{0,1}^n -> {0,1}^n'. So the
> > hash function f has input and output of the same length.
> > How does one argue that such a function, when repeatedly
> > applied, continually enhances the entropy? Thanks.
> >

> It can't increase the entropy. But it might decrease the
> entropy. A trivial, non-random example would be to "AND"
> the input stream with a constant. Each 0-bit in the
> constant loses one bit of input entropy for the output
> stream. (Under the usual assumptions that the input stream
> was uncorrelated, uniformly distributed, etc.)
>
> If a function is bijective, every image has one and only one
> pre-image, and every pre-image has one and only one image,
> then it preserves entropy. But the VAST majority of random
> functions aren't bijective. Mapping 160 bits onto 160 bits,
> there are:

[snip]

If a function permutes the bits, then it is certainly
bijective. Now, if the input of n bits doesn't have
full entropy, what could be said of the output?
Intuitively, there should in general be some 'increase',
shouldn't there? Or is it that the question is not well-posed?
Thanks.

M. K. Shen

Joseph Ashwood

unread,
Oct 16, 2001, 5:46:39 PM10/16/01
to
Nicol So <nob...@no.spam.please> wrote in message news:<3BC854A7...@no.spam.please>...
> The problem below came from an attempt to analyze the effect of
> repeatedly applying a cryptographic hash function to an input string. I
> spent a little time on it several months ago but couldn't find the
> answer. I got a feeling that the *form* of the problem must be familiar
> to mathematicians, regardless of whether the answer itself is known.
>
> Here's a formalized description of the problem:
>
> Let X be a uniformly distributed random variable whose value ranges over
> {0,1}^n, i.e. binary strings of length n.
>
> Let f:{0,1}^n -> {0,1}^n be a function chosen uniformly at random from
> the set of functions defined on {0,1}^n.
>
> What is the entropy of t iterated application of f on X, i.e. H(f^t(X))?
> (For concreteness, assume entropy is measured in bits).

As has already been noted, it depends highly on the random function
chosen; for example, the constant function f(x) = 000000....0000 obviously
loses all entropy very quickly. It's worth noting that there are fewer of
these (infinitely lossy) functions than there are lossless functions; in
fact there are only 2^n infinitely lossy functions, while there are
(2^n)! lossless functions (since we're dealing in binary). Obviously the
majority of functions are partially lossy, that is, there exist X, Y in
{0,1}^n where f(X)=f(Y), X =/= Y. It will take a significant amount
more thought on this to come up with a precise amount of average loss,
but an initial estimation is based on the observation that there seem to
be g(k) = (((2^n)-k)*(product(t=0 to k) of (2^n)-t))! functions
(please correct me if I'm wrong) with a given number k of collisions;
this gives us weights for building an arithmetic mean. It seems that
we will be taking the mean of (sum(k = 0 to 2^n) of
(((2^n)-k)/(2^n))*g(k)); this will give a value in [0,1] for the
fraction of the initial entropy at that stage that survives to the end
on average. If we label this function i(n), then (i(n))^x will give us
the amount of entropy preservation that will occur across x iterations.
Now all someone has to do is figure out some way to compute that beast
when n is more than a handful. This converges to 0 for infinite x, which
I believe is wrong. It should converge to a very small amount, but it
should converge more slowly than this one appears to, at least
intuitively.
Joe

David Wagner

unread,
Oct 16, 2001, 8:34:02 PM10/16/01
to
Joseph Ashwood wrote:

>Nicol So <nob...@no.spam.please> wrote:
>> Let X be a uniformly distributed random variable whose value ranges over
>> {0,1}^n, i.e. binary strings of length n. [...]

>
>As has already been noted it depends highly on the random function
>chosen, for example f(t) = 000000....0000 obviously loses all entropy
>very quickly.

Did you perhaps misunderstand the question? The function is a random
variable, and I believe Nicol So wants to know something about the
distribution of the entropy that results. It's like if someone asks
"What is the sum of i^2 from i=1 to i=10?", and you answer "well, the
answer depends on i". Does that make any sense?

Benjamin Goldberg

unread,
Oct 16, 2001, 11:54:14 PM10/16/01
to

Ordinary hash functions [the kind used for creating indices to buckets
of a hash table] only have to be reasonably collision resistant.

Crypto hash functions have to be preimage resistant and second preimage
resistant as well. There may be other required properties, but this is
a start, and if you look up those terms, you will likely discover links
to other requirements of crypto hash functions.

--
"What does stupid old man mean pidgin talk? Shampoo does not talk like a
bird."

Joseph Ashwood

unread,
Oct 17, 2001, 2:43:21 AM10/17/01
to
d...@mozart.cs.berkeley.edu (David Wagner) wrote in message news:<9qijlq$mp$3...@agate.berkeley.edu>...

There are a very large number of random functions. Given the way the
question is posed, every function is equally probable:
f(x)=000...000 is as likely as f(x)=SHA1(x), which is as likely as
any other appropriate function. Given that
possible range of functions, the question is more like "What is the sum of
f(i) from i=1 to i=10?". In this case we happen to have some extra
information about the function f(): its domain and range are
identical sets, but we also know that the output set is not
necessarily dense (in fact it's extremely unlikely that it will be).
Joe

David Wagner

unread,
Oct 17, 2001, 5:37:49 AM10/17/01
to
Joseph Ashwood wrote:
>Given that
>possible range of functions the question is more "What is the sum of
>f(i) from i=1 to i=10?" [...]

No, it's not. If you said "the sum of f(0) over all f", I'd agree.

I still think you are misunderstanding the question. I believe Nicol
So is asking for the *average*, over all f, of H(f^t(X)). If this is
indeed the correct question, saying that "the answer depends on f" is
not even wrong -- such a statement would be just a jumble of words to
which no meaningful mathematical interpretation can be assigned.

Do you agree?

Mok-Kong Shen

unread,
Oct 17, 2001, 8:14:15 AM10/17/01
to

Thanks. I would like, despite your last sentence, to repeat
what I wrote in another follow-up:

Are there design rationales of the well-known hash
functions and test reports of satisfaction of
requirements available somewhere?

I hope that someone would kindly give a few truly valuable
URLs or other sources on the topic.

M. K. Shen

Mok-Kong Shen

unread,
Oct 17, 2001, 8:35:52 AM10/17/01
to

From some of the follow-ups of Corners, where he stressed
the case of bijective functions, I conjecture that, averaged
over all f, H(f^t(X)) would be decreasing with t, since
most f are not bijective.

BTW, I like to take this opportunity to repeat my question
of whether H(f^t(X)), with f a permutation, would be
increasing with t, if X doesn't have full entropy. Thanks
in advance.

M. K. Shen

Bryan Olson

unread,
Oct 17, 2001, 11:38:35 PM10/17/01
to
Benjamin Goldberg:

> Ordinary hash functions [the kind used for creating indices to buckets
> of a hash table] only have to be reasonably collision resistant.
>
> Crypto hash fuctions have to be preimage resistant and second preimage
> resistant as well.

That's not really it. Ordinary hash functions are not "collision
resistant" as cryptologists use the term. Collision resistance implies
second pre-image resistance.

Cryptographic hash functions are collision resistant in the sense that
intelligent adversaries are unable to find collisions. Ordinary hash
functions merely minimize collisions in typical data sets.


--Bryan

Nicol So

unread,
Oct 17, 2001, 11:36:53 PM10/17/01
to
Mok-Kong Shen wrote:
>
> From some of the follow-ups of Corners, where he stressed
> the case of bijective functions, I conjecture that, averaged
> over all f, H(f^t(X)) would be decreasing with t, since
> most f are not bijective.

Your intuition is correct. The expected value of H(f^t(X)) is a
non-increasing function of t. What remain to be determined are the rate
of decrease and the limit to which the value converges as t tends to
infinity.

> BTW, I like to take this opportunity to repeat my question


> of whether H(f^t(X)), with f a permutation, would be
> increasing with t, if X doesn't have full entropy. Thanks
> in advance.

The answer is no. Applying a permutation defined on a set S to a random
variable whose values range over the same set does not change the
entropy of the latter. The following is an illustrative example.

Consider a probability distribution on the set {a, b, c}. Let the
probability mass function for this distribution be p(a) = 0.2, p(b) =
0.3, p(c) = 0.5. The (Shannon) entropy of this distribution is

-(0.2 * lg 0.2 + 0.3 * lg 0.3 + 0.5 * lg 0.5)
= 1.485 bits

Now, apply a permutation that maps a -> b, b -> c, c -> a. Let p' be the
resulting probability mass function. Now

p'(a) = 0.5, p'(b) = 0.2, p'(c) = 0.3

The entropy of this distribution is

-(0.5 * lg 0.5 + 0.2 * lg 0.2 + 0.3 * lg 0.3)
= 1.485 bits (same as before)

A couple things to note:

1. Entropy is a function of probability distribution, but is independent
of (the names of) the elements of the set. (That the elements are 'a',
'b', and 'c' does not enter into the calculation).

2. Applying a permutation to a set amounts to a consistent relabeling of
the elements.

3. If you compare the two entropy calculations, you'll notice that they
have the same expression, up to reordering of terms.
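
The same calculation in a few lines of Python, purely as an
illustration of the point:

import math

def entropy_bits(p):
    # Shannon entropy, in bits, of a probability mass function given as a dict
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

p = {'a': 0.2, 'b': 0.3, 'c': 0.5}
perm = {'a': 'b', 'b': 'c', 'c': 'a'}          # the permutation above
p_perm = {perm[x]: q for x, q in p.items()}    # p'(a)=0.5, p'(b)=0.2, p'(c)=0.3

print(entropy_bits(p))        # about 1.485 bits
print(entropy_bits(p_perm))   # the same value; relabeling changes nothing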

--
Nicol So, CISSP // paranoid 'at' engineer 'dot' com

Bryan Olson

unread,
Oct 17, 2001, 11:42:21 PM10/17/01
to
Mok-Kong Shen wrote:
> I conjecture that, averaged
> over all f, H(f^t(X)) would be decreasing with t, since
> most f are not bijective.

Didn't Wagner already answer this?

> BTW, I like to take this opportunity to repeat my question
> of whether H(f^t(X)), with f a permutation, would be
> increasing with t, if X doesn't have full entropy. Thanks
> in advance.

Isn't that trivial? If f is a permutation H(f(X)) = H(X).
So by induction, for all t H(f^t(X)) = H(X), assuming f and
t are known.


--Bryan

Mok-Kong Shen

unread,
Oct 18, 2001, 7:12:18 AM10/18/01
to

Bryan Olson wrote:
>
> Mok-Kong Shen wrote:

> > BTW, I like to take this opportunity to repeat my question
> > of whether H(f^t(X)), with f a permutation, would be
> > increasing with t, if X doesn't have full entropy. Thanks
> > in advance.
>
> Isn't that trivial? If f is a permutation H(f(X)) = H(X).
> So by induction, for all t H(f^t(X)) = H(X), assuming f and
> t are known.

I am apparently having some confusion stemming from common
sense reasoning. If for permutation f H(f(X))=H(X) always
holds, what use is the shuffling in card games? The
question may be very dumb. Would you please give an
answer nonetheless? Thanks in advance.

M. K. Shen

David Rush

unread,
Oct 18, 2001, 12:21:26 PM10/18/01
to

There is no use to shuffling in card games except:

1) it allows you to verify that the ordering of the deck is in
fact arbitrary (instead of specially chosen to favor a
particular player)
2) It allows the selection of a *different* arbitrary ordering
at each shuffle

david rush
--
Christianity has not been tried and found wanting; it has been found
difficult and not tried.
-- G.K. Chesterton

Mok-Kong Shen

unread,
Oct 18, 2001, 1:56:15 PM10/18/01
to

David Rush wrote:
>
> Mok-Kong Shen <mok-ko...@t-online.de> writes:
> > Bryan Olson wrote:
> > > Mok-Kong Shen wrote:
> > > Isn't that trivial? If f is a permutation H(f(X)) = H(X).
> > > So by induction, for all t H(f^t(X)) = H(X), assuming f and
> > > t are known.
> >
> > I am apparently having some confusion stemming from common
> > sense reasoning. If for permutation f H(f(X))=H(X) always
> > holds, what use is the shuffling in card games? The
> > question may be very dumb. Would you please give an
> > answer nonetheless? Thanks in advance.
>
> There is no use to shuffling in card games except:
>
> 1) it allows you to verify that the ordering of the deck is in
> fact arbitrary (instead of specially chosen to favor a
> particular player)
> 2) It allows the selection of a *different* arbitrary ordering
> at each shuffle

But it is a permutation that is considered by the players
to be useful, of value, isn't it? (Why is it of value?)

M. K. Shen

Joseph Ashwood

unread,
Oct 18, 2001, 8:36:13 PM10/18/01
to
d...@mozart.cs.berkeley.edu (David Wagner) wrote in message news:<9qjjhd$h5m$1...@agate.berkeley.edu>...

Yes I agree. Admittedly we have some limitations on f, but even fixing
the length to l we've got l^l functions which even for l=16 begins to
get a bit insane to compute, and since it's safe to assume we're
talking about 2^100+ the computer in front of me can't even begin to
compute the number of functions, let alone any behavior regarding
them. In short, yes I agree, with a question as broad as the one given
we cannot even begin to answer it in any meaningful way.
Joe

Joseph Ashwood

unread,
Oct 18, 2001, 8:41:44 PM10/18/01
to
Mok-Kong Shen <mok-ko...@t-online.de> wrote in message news:<3BCD7B28...@t-online.de>...

I'm going to answer this and your other question regarding shuffling
in one message for brevity.

If f is a permutation, then no, it does not add entropy to apply f
repeatedly. This requires certain assumptions, the important one being
that f is known to the analyzer of the entropy. If f is chosen
randomly (as in encryption) then there can be an argument that the
entropy in the selection process gets added into the total amount, but
in general this won't hold, and certainly won't hold for too long.

Shuffling adds entropy because the selection of the permutation is
unknown even to the shuffler, so the entropy of this selection is
added into the ordering of the deck of cards. There are exceptions: if
you come into contact with someone who can select their shuffles (any
card magician can do this), this no longer holds, because they (as the
attacker) have knowledge of which permutation was chosen.
Joe

Bryan Olson

unread,
Oct 18, 2001, 10:08:59 PM10/18/01
to
Mok-Kong Shen:

> Bryan Olson wrote:
>
> > Isn't that trivial? If f is a permutation H(f(X)) = H(X).
> > So by induction, for all t H(f^t(X)) = H(X), assuming f and
> > t are known.
>
> I am apparently having some confusion stemming from common
> sense reasoning. If for permutation f H(f(X))=H(X) always
> holds, what use is the shuffling in card games?

Note the last phrase: "assuming f and t are known". Your
notation seemed to indicate that f and t were not random.


--Bryan

David Wagner

unread,
Oct 19, 2001, 12:00:12 AM10/19/01
to
Joseph Ashwood wrote:
>[...] even fixing
>the length to l we've got l^l functions which even for l=16 begins to
>get a bit insane to compute, and since it's safe to assume we're
>talking about 2^100+ the computer in front of me can't even begin to
>compute the number of functions, let alone any behavior regarding
>them. In short, yes I agree, with a question as broad as the one given
>we cannot even begin to answer it in any meaningful way.

Cannot answer it in a meaningful way? I cannot agree.

You are confusing our ability to compute the answer by brute
force with our ability to compute the answer in some more sophisticated
mathematical way.

Example: What's the sum of 1+2+3+4+...+10^100? Exhaustive addition
on a computer is infeasible, but thanks to the power of mathematics,
we can answer the question nonetheless.
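
(For that example the closed form 1 + 2 + ... + N = N(N+1)/2 does the
work: with N = 10^100 the sum is 10^100 (10^100 + 1)/2, roughly
5 x 10^199.)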

wtshaw

unread,
Oct 19, 2001, 1:43:20 AM10/19/01
to
In article <3BCF17BF...@t-online.de>, Mok-Kong Shen
<mok-ko...@t-online.de> wrote:

Playing a card game is a parallel to cryptanalysis, where secrets are
hidden and in time will probably be revealed. If the cards are never
shown, it means someone has given up. The thrill of a card game is much
like solving a cipher, frustrating if you can't.

Honest shuffling is a way to create secrets, stack the action options in a
way beyond control, and reduce all to the same interlocked challenge to be
attacked partially through skill and events twisted by the luck of the
draw.

People like a challenge, especially one they might overcome. The value
for fun is diminished when something is crooked or players are
mismatched. On the other hand, crypto is most secure when there is a
mismatch between creator and breaker.
--
Dependable email means less reliance on surface mail. Local PO sould receive email and deliver it locally. Eliminate spread of disease and danger with electronic means. It is a soph to the ancients when physical presence is considered necessary.

Dabiker

unread,
Oct 19, 2001, 7:52:40 AM10/19/01
to

"David Rush" <ku...@bellsouth.net> skrev i meddelandet
news:okfsncg...@bellsouth.net...

> Mok-Kong Shen <mok-ko...@t-online.de> writes:
> > Bryan Olson wrote:
> > > Mok-Kong Shen wrote:
> > > Isn't that trivial? If f is a permutation H(f(X)) = H(X).
> > > So by induction, for all t H(f^t(X)) = H(X), assuming f and
> > > t are known.
> >
> > I am apparently having some confusion stemming from common
> > sense reasoning. If for permutation f H(f(X))=H(X) always
> > holds, what use is the shuffling in card games? The
> > question may be very dumb. Would you please give an
> > answer nonetheless? Thanks in advance.
>
> There is no use to shuffling in card games except:
>
> 1) it allows you to verify that the ordering of the deck is in
> fact arbitrary (instead of specially chosen to favor a
> particular player)
> 2) It allows the selection of a *different* arbitrary ordering
> at each shuffle


There is quite a difference between shuffling cards and shuffling bits.
If a deck of ordered cards is shuffled using three strokes, it is possible to
find out the way the cards fell into the counterpart of the deck (because
each card is unique), and so it is possible to find out the algorithm and a
function that describes the outcome.

However, if the same deck only had bit values 0's and 1's, that would not be
feasible, because they are inseparable from each other; therein rests the
strength of transpositions. As you can understand, the strength of
transpositions is dependent on the size of the deck: if you only have two cards
you can shuffle to kingdom come and it will not matter. The possibility of
finding out the original state of a shuffle depends solely upon lack of
size and bit balance.

Dabiker

unread,
Oct 19, 2001, 7:57:45 AM10/19/01
to

"Joseph Ashwood" <ash...@msn.com> skrev i meddelandet
news:aa0488fe.01101...@posting.google.com...

Wouldn't you say that my algorithm, which can shuffle a deck of 256 bits in
2^256 different ways, does a rather good job, Joe?

Am I a card magician?

Dabiker

unread,
Oct 19, 2001, 8:00:44 AM10/19/01
to

"Joseph Ashwood" <ash...@msn.com> skrev i meddelandet
news:aa0488fe.01101...@posting.google.com...

Does this also hold for Dabiker's shuffle, if bit-balanced?


Mok-Kong Shen

unread,
Oct 19, 2001, 10:03:43 AM10/19/01
to

Joseph Ashwood wrote:
>

> If f is a permutation, then no, it does not add entropy to apply f
> repeatedly. This requires certain assumptions, the important one being
> that f is known to the analyzer of the entropy. If f is chosen
> randomly (as in encryption) then there can be an argument that the
> entropy in the selection process gets added into the total amount, but
> in general this won't hold, and certainly won't hold for too long.
>
> Shuffling adds entropy because the selection of the permutation is
> unknown even to the shuffler, so the entropy of this selection is
> added into the ordering of the deck of cards. There are exceptions: if
> you come into contact with someone who can select their shuffles (any
> card magician can do this), this no longer holds, because they (as the
> attacker) have knowledge of which permutation was chosen.

Thanks. I suppose that this also helps to see why, e.g.,
in Terry Ritter's DT, shuffling the bit sequence twice
should be quite a bit better than shuffling it only once.

M. K. Shen

Joseph Ashwood

unread,
Oct 19, 2001, 6:24:44 PM10/19/01
to
"Dabiker" <jon...@bredband.net> wrote in message news:<VrUz7.10210$W2....@news1.bredband.com>...

> Admittedly we have some limitations on f, but even fixing
> > the length to l we've got l^l functions which even for l=16 begins to
> > get a bit insane to compute, and since it's safe to assume we're
> > talking about 2^100+ the computer in front of me can't even begin to
> > compute the number of functions, let alone any behavior regarding
> > them. In short, yes I agree, with a question as broad as the one given
> > we cannot even begin to answer it in any meaningful way.
> > Joe
>
> Does this count also for Dabikers shuffle, if bitbalanced?

It's going to sound a bit weird at first, but no, it doesn't. Because
you've chosen permutations, you've stepped into the realm of
invertibility, which does not lose entropy. The difficulty of the
question being discussed is that instead of having a small number of
possible functions (N!) we have a much larger universe of possible
functions (N^N). I've found one possible way of finding a solution, but
it is so solidly exponential that it simply cannot be expected to be
usable on cryptographically large numbers (it becomes infeasible to
compute somewhere around 20 values).
Joe

Joseph Ashwood

unread,
Oct 19, 2001, 6:29:03 PM10/19/01
to
d...@mozart.cs.berkeley.edu (David Wagner) wrote in message news:<9qo8gc$2qkv$2...@agate.berkeley.edu>...

> Cannot answer it in a meaningful way? I cannot agree.

Earlier I gave an attempt at it. Maybe I'm just not good enough at
finding efficient ways to compute the result, but that was a beast of
an equation to try to solve for any value, and it was exponentially
exponential, making it completely infeasible to compute directly for
values of worthwhile size. If you'd like to take a
crack at it you're welcome to, but I'm not even sure my equations
are correct (considering the complexity they are probably wrong), so you
should probably at least verify the argument for them, and may want to
just start over.
Joe

David Wagner

unread,
Oct 19, 2001, 7:41:04 PM10/19/01
to
There's an important difference between "I don't see how to solve this
equation" and "this equation is unsolvable". Your post seemed to conflate
the two. That was my only point. Asking me if I can solve it doesn't
change this.