
Entropy vs Work (Mistake in Crypto FAQ?)


pl...@ima.umn.edu
Nov 10, 1998

(please forgive possible repost)

I believe I have a very strong family of counterexamples to a
statement in the "Cryptography FAQ". Specifically, section 4.9
is eminently reasonable and widely believed. It is supported,
up to a point, by the asymptotic equipartition property (AEP)
of information theory. Section 4.9 of the FAQ reads:

> 4.9. What's a key-guessing attack? What's entropy?
>
> Say somebody is using the one-time pad---but isn't choosing
> keys randomly and uniformly from all m-bit messages, as
> he was supposed to for our security proof. In fact say
> he's known to prefer keys which are English words. Then a
> cryptanalyst can run through all English words as possible
> keys. This attack will often succeed, and it's much faster
> than a brute-force search of the entire keyspace.
>
> We can measure how bad a key distribution is by calculating
> its entropy. This number E is the number of ``real bits
> of information'' of the key: a cryptanalyst will typically
> happen across the key within 2^E guesses. E is defined as
> the sum of -p_K log_2 p_K, where p_K is the probability
> of key K.

As a general rule, this is dangerously wrong. A complete
discussion is given in a brief paper available online:

http://www.ima.umn.edu/~pliam/doc/pwe.ps

The crux of the matter turns on the following result, which is
easy enough to state:

There are probabilities which, when arranged

    p_1 >= p_2 >= ... >= p_n,

satisfy

    \sum_{i=1}^{2^H} p_i  =  arbitrarily small number.

The point being that if you search 2^H keys optimally (in
order of nonincreasing probability), you may have nearly *no*
chance of guessing correctly. This appears to contradict the
FAQ, no matter how you interpret it.

The probabilities are given as follows:

Let j,k be two positive integers, and let a = 2^j. The first
k probabilities are

    1/a, 1/a^2, 1/a^3, ..., 1/a^k.

After that, m copies of the final 1/a^k are used. The value
of m is easily chosen so that the probabilities sum to 1.
It is not hard to show that for fixed j, the above sum S of
the first 2^H most probable p_i satisfies:

    S  ->  1/(a - 1),

as k -> infinity. Because the sum S is the total probability
that the desired value falls in the search range, {1,...,2^H},
the assertion that you will "happen across the key" is wrong
for large a.
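
If you'd rather check this numerically than take the limit on faith,
here is a short Python sketch (the function name and the sample values
of j and k are mine, purely for illustration); it builds the
distribution for a = 2^j and reports how much probability mass the 2^H
most likely values cover:

import math

def mass_covered_by_2H_guesses(j, k):
    """Build the counterexample distribution for a = 2**j with k geometric
    steps, and return (H, mass covered by the 2**H best guesses, 1/(a-1))."""
    a = 2 ** j
    # the k most probable values: 1/a, 1/a^2, ..., 1/a^k
    head = [a ** -i for i in range(1, k + 1)]
    head_mass = sum(head)
    tail_p = a ** -k              # probability of each remaining value
    tail_mass = 1.0 - head_mass   # spread over m copies of tail_p

    # Shannon entropy in bits (the tail is summed analytically)
    H = sum(p * math.log2(1 / p) for p in head) + tail_mass * math.log2(1 / tail_p)

    # an optimal search of 2^H keys: the k head values plus (2^H - k) tail values
    covered = head_mass + (2.0 ** H - k) * tail_p
    return H, covered, 1 / (a - 1)

for k in (5, 10, 20, 40):
    H, S, limit = mass_covered_by_2H_guesses(j=2, k=k)
    print("k=%3d  H=%7.2f bits  mass in 2^H guesses = %.6f  (1/(a-1) = %.6f)"
          % (k, H, S, limit))

For j = 2 (so a = 4) the covered mass falls toward 1/3 as k grows, and
for larger j it falls toward a correspondingly smaller 1/(a-1).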

Entropy as a measure of uncertainty or redundancy of natural
language is well-founded. Wherever Cryptology makes use of
it in that context (such as in the formulation of unicity
distance), it is perfectly legitimate. However, there are
many notions of uncertainty. Economists use the Lorenz
curve and the Gini coefficient, while Political Scientists
use the minimum majority. They all have different properties
(see [1]). When it comes to estimating the work-factor of a
key-search as described in the Cryptography FAQ, entropy may
*not* be the best measure of uncertainty.
I personally believe that we should avoid using "entropy"
as a synonym for "randomness" or "work-factor".

John
pl...@ima.umn.edu

- - -

ref: [1] A.W. Marshall, Ingram Olkin, Inequalities: Theory
of Majorization and Its Applications, Academic Press, San
Diego, 1979.


Medical Electronics Lab
Nov 11, 1998

pl...@ima.umn.edu wrote:
> There are probabilities which, when arranged
>
>     p_1 >= p_2 >= ... >= p_n,
>
> satisfy
>
>     \sum_{i=1}^{2^H} p_i  =  arbitrarily small number.
> [.....]

> I personally believe that we should avoid using "entropy"
> as a synonym for "randomness" or "work-factor".

Given your assumption on the probability distributions I agree
that these are not synonymous. But for "random" I usually
think of the probabilities as being uniform. For this assumption,
    \sum_{i=1}^{n} (i-1) p_i  =  p (n^2 - n)/2,

with p = p_i = 1/N, where N is the total number of possible states.
This reaches its maximum of (N-1)/2 at n = N, which is not exactly small.
Different assumptions lead to different results, so we ought to use
different words. I'm not exactly sure what "random" is, but it's
straightforward to figure out what it isn't.

Entropy remains elusive though :-)

Patience, persistence, truth,
Dr. mike

Rick Heylen
Nov 11, 1998

John's paper may perhaps be clarified by a few examples.

Suppose we have a source of random samples that can be one of 77
values. This source is called "Source A", and two thirds of the time it
returns 77. When it doesn't return 77, it returns one of the remaining
76 possible values with equal probability. It is easy to show that such
a source has an entropy of about 3 bits.
A source which returns, on request, an integer from 1 to 8 inclusive with
equal probability will also have an entropy of 3 bits. This source is
called "Source B", and if you guess that it will return the number 8 you
will be right an eighth of the time. The same is true if you guess any
other integer in that range. In contrast, if you guess that "Source A"
will return 77, you will be right two thirds of the time. This is quite
an important difference considering that both sources have the same
entropy. Imagine that you had to generate a secret key using a few
values from one of the two sources. Obviously, Source B would be the
much better choice.
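
A quick numerical check of the two sources, as a Python sketch (the
helper name is mine):

import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Source A: 77 two thirds of the time, otherwise one of the other
# 76 values uniformly.
source_a = [2.0 / 3] + [(1.0 / 3) / 76] * 76

# Source B: uniform over the integers 1..8.
source_b = [1.0 / 8] * 8

print(shannon_entropy(source_a))      # ~3.0 bits
print(shannon_entropy(source_b))      # exactly 3.0 bits
print(max(source_a), max(source_b))   # best single guess: ~0.667 vs 0.125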

For a more realistic example, imagine that you get your 128-bit secret key
from a source, "Source C", that returns 128 bits at a time. Source C is not
perfect, but its entropy is guaranteed to be over 127 bits. This sounds
quite good! On average, one might think a cracker would have to search
2^126 of the keys. Suppose, however, that Source C returns one particular
128-bit value on average one time in 120, and the rest of the time returns
one of the other 128-bit values equiprobably. This source has an entropy a
shade over 127 bits, yet a cracker knows the key to about 0.833% of your
messages right away.
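
The same check for Source C (a hypothetical source, of course; the sum
over the 2^128 - 1 rare values is done analytically rather than by
looping over them):

import math

p_hot = 1.0 / 120                  # the one over-represented 128-bit value
n_rest = 2 ** 128 - 1
p_rest = (1.0 - p_hot) / n_rest    # every other value

# Shannon entropy in bits
H = -p_hot * math.log2(p_hot) - (1.0 - p_hot) * math.log2(p_rest)
print(H)        # ~127.0 bits
print(p_hot)    # ~0.00833, i.e. about 0.833% of keys fall on the first guess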

Of course, Sources A and C are designed to highlight the problem and do not
crop up "in the wild". The author is very emphatic that entropy should
never be equated with work, but in many situations entropy is the only
measure of work that can be found. When obtaining random bits from English
text using a hash function, the sort of analysis seemingly advocated by the
author is impossible, because the hash function is (hopefully) strong and
defies the necessary analysis.

The crux of the matter (according to John) is the existence of some
distributions with an interesting property. John outlines a distribution
with an entropy H. With a uniform distribution, trying 2^H values would be
sufficient to exhaust the possibilities. With John's distribution, the
probability that an optimal search of 2^H tries will be successful can
be made arbitrarily small. The problem is that this distribution has an
unimaginably vast number of possible values and is exceptionally
non-uniform. This endows it with a low entropy compared with a more uniform
distribution over the same number of values. Naturally, searching this
huge, low-entropy distribution is tricky.
Such distributions are mathematical curiosities only, as they could not
exist in the context of cryptography.

Rick

pl...@ima.umn.edu
Nov 12, 1998

In article <01be0dc4$81176de0$2500...@himalia.c-dilla.com>,
"Rick Heylen" <rick....@c-dilla.com> wrote:
> John's paper may perhaps be clarified by a few examples.

I think that everything you say is well thought-out, and thanks
for the examples.

> The author is very emphatic that entropy should never be
> equated with work but in many situations entropy is the only
> measure of work that can be found.

I am emphatic in part because in my opinion, there really *is*
an adequate measure of work, namely the area under the Lorenz
curve (after you flip it over). In the paper [1], I show
that this area is precisely the expected computation time
of an optimal brute-force attack. There are other suitable
measures of work depending on the situation.
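
In discrete terms (a toy sketch, not taken from the paper): the area in
question is just the expected number of guesses in an optimal search,
which can be compared directly against 2^H.

import math

def expected_optimal_guesses(probs):
    """Expected number of guesses when candidates are tried in order of
    non-increasing probability."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A skewed toy distribution: half the mass on one value, the rest uniform.
probs = [0.5] + [0.5 / 1024] * 1024

print(expected_optimal_guesses(probs))    # ~257 expected guesses
print(2 ** shannon_entropy(probs))        # 2^H = 64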

I should clarify that for the extremes of zero entropy and maximum
entropy, you may in fact equate entropy with the size, in bits,
of the optimal search space. In fact, whenever a probability
distribution is uniform on some subset of the sample space,
you may equate the two quantities. In particular, this is
true whenever the AEP is valid. But they are not, *in general*,
synonymous.

> When obtaining random bits from english text using a hash
> function, the sort of analysis seemingly advocated by the
> author is impossible because the hash function is (hopefully)
> strong and defies the necessary analysis.

There are several issues here. I think you are referring to
a common paradigm of crunching English down via a good hash
function into usable key material. If the source really is a
long string of English text, then the AEP saves the day and
the key-search space is reduced to 2^nH. But what if the
source is something else? The set of English words and the
set of pass-phrases should each be assumed to have some associated
probability distribution. Is either of these uniformly
distributed on some subset of all bit strings? Probably not.

> The problem is that this [my counterexample] distribution
> has an unimaginably vast number of possible values and
> is exceptionally non-uniform. This endows it with a low
> entropy compared with a more uniform distribution with
> the same number of values. Naturally, searching this huge,
> low entropy distribution is tricky.

Cryptology is filled with unimaginably vast sample spaces:
the set of 128-bit strings (size = 2^128), the set of all
permutations on those strings (size = 2^128 factorial), the
set of DES functions under the independent-key assumption (size
= 2^768). Anyway, I get a headache trying to imagine these...

> Such distributions are mathematical curiosities only as they
> could not exist in the context of cryptography.

Counterexamples are, sort of by definition, mathematical
curiosities. These, in fact, come from Huffman trees with
interesting properties (look at the tree for a=4). But that's
not the point.

There is a rather large leap being made here: that because
the family of counterexamples has some extreme properties,
you can dismiss the possibility that any distribution from the
realm of cryptology might display *any* disparity between work
and entropy. Your epsilon may vary.

If anything, the history of cracking Unix passwords shows us
two things:

1. Brute-force attacks evolve toward the optimal.

2. The distribution of passwords is sharply nonuniform.
The most probable passwords are words in a dictionary,
the next most probable passwords are of the form
"word.number", etc. The least probable passwords are
all *other* random 64-bit strings - these are equally
likely and constitute the majority of search space.

That sounds a lot like the counterexample, though empirical
data would be needed to verify the actual rate of probability
decrease.

John

[1] John Pliam, "The Disparity between Work and Entropy", 1998,
URL: http://www.ima.umn.edu/~pliam/doc/pwe.ps .

David Wagner
Nov 14, 1998

In article <72dcle$us7$1...@joseph.cs.berkeley.edu>,
John Pliam <john....@worldnet.att.net> wrote:
> I believe I have a very strong family of counterexamples to a
> statement in the "Cryptography FAQ".
>

> > We can measure how bad a key distribution is by calculating
> > its entropy. This number E is the number of ``real bits
> > of information'' of the key: a cryptanalyst will typically
> > happen across the key within 2^E guesses. E is defined as
> > the sum of -p_K log_2 p_K, where p_K is the probability
> > of key K.
>
> As a general rule, this is dangerously wrong.

Agreed. Good catch!

As a general rule, there are many measures of entropy (not just the
Shannon entropy defined above), which are useful in different contexts;
confusing them can be dangerous.

Here are several examples:
+ Shannon entropy: H(X) = - \sum_x Pr_X(x) \log_2 Pr_X(x)
Useful for analyzing one-time pads, secret sharing, compression,
asymptotics, etc.
+ Renyi entropy (of order 2): H_2(X) = - \log_2 \sum_x Pr_X(x)^2
Useful for analyzing the probability of collisions.
+ ``Guessing'' entropy: H_G(X) = \log_2 \sum_i i p_i
if the probabilities Pr_X(x) are arranged so that p_1 >= p_2 >= ...
Useful for analyzing the expected cost of keysearch, inverting hash
functions, etc.
I'm stealing liberally from Chapter 3 of [1] here.

The different measures of entropy don't necessarily agree, of course.
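
As a quick illustration (a toy sketch; the two distributions below are
arbitrary, not taken from [1]), here is how the three measures compare
on a uniform versus a skewed distribution:

import math

def shannon(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi2(probs):
    return -math.log2(sum(p * p for p in probs))

def guessing(probs):
    ordered = sorted(probs, reverse=True)
    return math.log2(sum(i * p for i, p in enumerate(ordered, start=1)))

uniform = [1.0 / 256] * 256                 # the three roughly coincide
skewed = [0.5] + [0.5 / 4096] * 4096        # ... and here they spread apart

for name, dist in (("uniform", uniform), ("skewed", skewed)):
    print(name, round(shannon(dist), 2), round(renyi2(dist), 2),
          round(guessing(dist), 2))
# uniform: H = 8.0, H_2 = 8.0, H_G ~ 7.01
# skewed:  H = 7.0, H_2 ~ 2.0, H_G ~ 10.0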

I think many people are often very sloppy about their use of the word
``entropy.'' Your paper seems to be a useful reminder of just how
misleading our intuition can be.

Perhaps an example would be useful. Let's take Renyi entropy, say.
Suppose we have a memoryless source which generates outputs according to
the distribution of the random variable X. Then H_2(X) is a good
measure of the collision properties of the source. The probability that
any two fixed outputs collide is 2^{-H_2(X)}; also, after n outputs, the
expected number of collisions is n(n-1)/2 * 2^{-H_2(X)}. See [1] for
other results.

As an application of Renyi entropy, suppose we encrypt plaintext under
ECB mode, and the random variable X represents the distribution of
blocks of plaintext. Then after n outputs the expected number of
collisions in ciphertext blocks which leak information on the plaintext
is n(n-1)/2 * 2^{-H_2(X)}. So when we are interested in collision
properties, it seems that the Renyi entropy roughly matches our
intuition.

As an example application of ``guessing'' entropy, imagine analyzing the
Unix password hashing scheme, where we are given the hash of the user's
password. If we let the r.v. X represent the distribution of users'
passwords, then the expected amount of work needed (in an optimal
dictionary search attack) to recover a single target's password is
2^{H_G(X)}.

You give a generalization, wf_p(X), which measures the expected work
required to gain a probability p of success. This seems useful in the
case that we have many cryptanalytic targets. Suppose we have a
password file containing n hashes; if we do wf_{c/n}(X) work, the
expected number of passwords broken is c (assuming independence of
password choices).
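
I'm guessing at your exact definition of wf_p(X) here, but one natural
reading (the smallest number of optimally ordered guesses whose combined
success probability reaches p) is easy to sketch:

def work_factor(probs, p):
    """Smallest number of guesses, taken in order of non-increasing
    probability, whose combined chance of success is at least p."""
    total, guesses = 0.0, 0
    for q in sorted(probs, reverse=True):
        total += q
        guesses += 1
        if total >= p:
            break
    return guesses

# Toy password-style distribution: ten very popular choices plus a long
# uniform tail of rare ones.
probs = [0.05] * 10 + [0.5 / 100000] * 100000

print(work_factor(probs, 0.5))    # 10 guesses cover half the users
print(work_factor(probs, 0.9))    # ... but 90% coverage needs 80010 guesses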

Note that [1] cites a paper by Massey [2] that apparently proves the
bound 2^{H_G(X)} >= 2^{H(X)-2} + 1. [1] also cites a paper by McEliece
and Yu [3] that apparently proves a weak bound in the other direction,
H(X) >= (2^{H_G(X)} - 1) 2 \log_2 |X| / (|X| - 1), where |X| represents
the number of different possible values X can take on. See [1] for more
results in this vein.

Note that for the uniform distribution, the measures of entropy roughly
agree [H(X) = H_2(X) = \log_2 |X|, H_G(X) = (\log_2 (|X|+1)) - 1].
Interestingly, [1] mentions that when X is nearly uniform in the sense
that H(X) = \log_2 |X| - c for small c, then the other measures of
entropy roughly agree too. Therefore, to prove security, it is
apparently enough to show that H(X) is close to \log_2 |X|.

[1] Christian Cachin, Entropy Measures and Unconditional Security in
Cryptography, PhD thesis, ETH Zurich, May 1997.
ftp://ftp.inf.ethz.ch/pub/publications/dissertations/th12187.ps.gz
[2] James Massey, ``Guessing and entropy,'' Proc. 1994 IEEE Int'l Symp.
on Information Theory, 1994, p. 204.
[3] Robert McEliece and Zhong Yu, ``An inequality on entropy,'' Proc.
1995 IEEE Int'l Symp. on Information Theory, 1995, p. 329.


jsa...@freenet.edmonton.ab.ca
Nov 16, 1998

Medical Electronics Lab (ros...@physiology.wisc.edu) wrote:
: Entropy remains elusive though :-)

Well, the meaning of entropy where it belongs - in thermodynamics - is
simple enough. A neat and tidy room is in a low-entropy state, a messy one
is in a high-entropy state, to use an informal example.

Entropy is low when a system is well ordered, so that the energy it
contains, instead of being in the form of heat, is in some form that is
easier to extract. (If the system has a hot part and a cold part, this
also lowers the entropy, as that allows the energy from some of the heat
to be extracted, whereas no energy can be extracted from heat from things
that are all at the same temperature.)

The reason entropy has a numerical expression that can be applied to
communications theory - and hence to cryptography - is this:

the number of states essentially equivalent to a particular low-entropy
state is smaller than the number of states essentially equivalent to a
high-entropy state, and in fact there is a formula for working out the
entropy from the number of states.

Thus, there are 26^100 100-letter random messages, but the number of
100-letter messages with normal English letter frequencies (say after a
transposition cipher is applied) is much lower, and the number of
100-letter English-language messages is much lower still.
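
To put rough numbers on that (the bits-per-letter figures below are the
commonly cited Shannon-style estimates, not exact values):

import math

# All 100-letter strings over a 26-letter alphabet:
print(100 * math.log2(26))    # ~470 bits, i.e. 26^100 messages

# Strings constrained only to normal English single-letter frequencies,
# at roughly 4.2 bits per letter:
print(100 * 4.2)              # ~420 bits

# Actual English-language messages, at roughly 1.3 bits per letter
# (published estimates range from about 1 to 1.5):
print(100 * 1.3)              # ~130 bits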

In cryptography, to a greater extent than in thermodynamics, what is meant
by an equivalent state is a matter of judgement, so that alone makes
entropy less precise than one might like in this sphere. But it can be
used, within limits. This also relates to questions like the amount of
redundancy in a plaintext message, and by how much it can be compressed.

John Savard

Medical Electronics Lab
Nov 17, 1998

It's not on my server anymore, but someone
posted a URL the other day for a thesis on entropy
measures, which I have printed out. I'm
only through the introductory portions, but it
looks really nice and seems pedagogical enough
for me to follow easily.

There are many ways of measuring entropy.
The thesis describes several of them and how
to use each one in different applications. While
it isn't aimed at hardware at all, the math should
apply equally well, and the limitations of each
measure can be used to determine the limitations
of a hardware RNG.

Thanks to whoever posted it!
