
Does shuffle() produce uniform result ?


tooru honda

Aug 24, 2007, 3:38:51 AM
Hi,

I have read the source code of the built-in random module, random.py.
After also reading Wiki article on Knuth Shuffle algorithm, I wonder if
the shuffle method implemented in random.py produces results with modulo
bias.

The reasoning is as follows: Because the method random() only produces
finitely many possible results, we get modulo bias when the number of
possible results is not divisible by the size of the shuffled list.

1. Does shuffle() produce uniform result ?

2. If not, is there a fast and uniform shuffle() available somewhere ?

Thanks !

-tooru honda

Steve Holden

Aug 24, 2007, 7:42:12 AM
to pytho...@python.org
tooru honda wrote:
> Hi,
>
> I have read the source code of the built-in random module, random.py.
> After also reading Wiki article on Knuth Shuffle algorithm, I wonder if
> the shuffle method implemented in random.py produces results with modulo
> bias.
>
> The reasoning is as follows: Because the method random() only produces
> finitely many possible results, we get modulo bias when the number of
> possible results is not divisible by the size of the shuffled list.
>
> 1. Does shuffle() produce uniform result ?
>
Given the cycle length of the Mersenne twister algorithm that generates
the underlying random numbers, it would have to be a pretty long list to
see a significant bias, don't you think? Have you done any calculations
on the length of list you propose to use?

> 2. If not, is there a fast and uniform shuffle() available somewhere ?
>

Frankly I don't think you need to worry.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Hrvoje Niksic

Aug 24, 2007, 8:54:14 AM
tooru honda <tooru...@fast-mail.org> writes:

> I have read the source code of the built-in random module,
> random.py. After also reading Wiki article on Knuth Shuffle
> algorithm, I wonder if the shuffle method implemented in random.py
> produces results with modulo bias.

It doesn't have modulo bias because it doesn't use modulo to produce a
random index; it multiplies the floating point value with the desired
range. I'm not sure if that method produces any measurable bias.
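For reference, the index selection in CPython 2.5's random.py boils down to
something like this (paraphrased as a plain function; the real shuffle is a
method and also accepts an optional random argument):

import random

def shuffle(x, random=random.random):
    # Fisher-Yates: j is chosen by scaling random(), not by a modulo
    for i in reversed(xrange(1, len(x))):
        j = int(random() * (i + 1))    # 0 <= j <= i
        x[i], x[j] = x[j], x[i]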

Mark Dickinson

Aug 24, 2007, 9:30:14 PM
On Aug 24, 8:54 am, Hrvoje Niksic <hnik...@xemacs.org> wrote:

It produces exactly the same level of bias as the 'modulo bias'
obtained by reducing a random integer in [0, 2**53). For example,
suppose we're trying to produce an integer x in the range 0 through 6
inclusive. If n is a random variable whose values are uniformly
distributed across range(0, 2**53) then:

x = n % 7

will produce 0, 1, 2 and 3 with probability (2**53//7+1)/2**53, and 4,
5 and 6 with probability (2**53//7)/2**53, while

x = floor((n/2**53)*7)

will produce 0, 1, 3 and 5 with probability (2**53//7+1)/2**53, and 2,
4 and 6 with probability (2**53//7)/2**53.

Either way, you'd have a very hard time detecting such a tiny bias.
At the other end of the scale, if you're trying to produce a value in
[0, 2**53-2] (for example) then it looks worse: with either method,
one of the values occurs exactly twice as often as all of the others.
But since there are now so many values, you'd again have problems
detecting any bias.
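The two mappings are easy to compare exhaustively at a smaller scale. A
sketch, using 2**20 in place of 2**53 so the loop finishes quickly:

from collections import defaultdict

BITS = 20                  # stand-in for the 53 bits of a C double
N = 2 ** BITS
K = 7

modulo = defaultdict(int)
scaled = defaultdict(int)
for n in xrange(N):
    modulo[n % K] += 1
    scaled[int((float(n) / N) * K)] += 1

# both methods show the same tiny imbalance, just on different values
print sorted(modulo.items())
print sorted(scaled.items())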

Steven Holden wrote:
> Frankly I don't think you need to worry.

What he said.

Mark

Paul Rubin

Aug 24, 2007, 9:53:30 PM
tooru honda <tooru...@fast-mail.org> writes:
> The reasoning is as follows: Because the method random() only produces
> finitely many possible results, we get modulo bias when the number of
> possible results is not divisible by the size of the shuffled list.
>
> 1. Does shuffle() produce uniform result ?

The nonuniformity is too small to matter. But what is the
application? If you are doing something like implementing online
poker for real money, you shouldn't use the built-in RNG. It is not
designed for what we call adversarial indistinguishability from true
randomness. Instead, use the random byte stream available from
os.urandom() and derive your random numbers from that.
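The standard library already wraps os.urandom for exactly this; a minimal
sketch:

import random

rng = random.SystemRandom()    # draws from os.urandom, not the Mersenne Twister
deck = range(52)
rng.shuffle(deck)              # in-place shuffle driven by OS randomness
print rng.randint(1, 6)        # a die roll from the same source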

Mark Dickinson

Aug 24, 2007, 10:03:39 PM
On Aug 24, 9:30 pm, Mark Dickinson <dicki...@gmail.com> wrote:

> x = floor((n/2**53)*7)
>
> will produce 0, 1, 3 and 5 with probability (2**53//7+1)/2**53, and 2,
> 4 and 6 with probability (2**53//7)/2**53.

Oops---I lied; I forgot to take into account the rounding implicit in
the (n/2**53)*7 multiplication. A bit of experimentation shows that
it's 0, 2, 4 and 6 that occur more often, with 1, 3 and 5 less likely
by a minuscule amount (at least on an IEEE-754 system).

Mark

tooru honda

Aug 24, 2007, 10:43:42 PM
Hi, First of all, my thanks to all of you who replied.

I am writing a gambling simulation to convince my friend that his
"winning strategy" doesn't work. I use the shuffle method of a
random.SystemRandom instance to shuffle 8 decks of cards.

As the number of cards is quite small (416 in total), the nonuniformity
doesn't matter, as most of you have already said. Just to head off
arguments from my friend, though, I am considering writing my own
randint and shuffle methods based on os.urandom().

-tooru honda

tooru honda

Aug 25, 2007, 8:35:52 AM
In the end, I think it is worthwhile to implement my own shuffle and
random methods based on os.urandom. Not only does the resulting code
get rid of the minuscule bias, but the program also runs much faster.

When using random.SystemRandom.shuffle, posix.open and posix.close from
calling os.urandom account for almost half of the total execution time
for my program. By implementing my own random and getting a much larger
chunk of random bytes from os.urandom each time, I am able to reduce the
total execution time by half.

-tooru honda

P.S. I use python 2.5.1 on MacOSX 10.4.10 (PowerPC).

Dan Bishop

Aug 25, 2007, 9:48:33 AM
On Aug 24, 2:38 am, tooru honda <tooru_ho...@fast-mail.org> wrote:
> Hi,
>
> I have read the source code of the built-in random module, random.py.
> After also reading Wiki article on Knuth Shuffle algorithm, I wonder if
> the shuffle method implemented in random.py produces results with modulo
> bias.
>
> The reasoning is as follows: Because the method random() only produces
> finitely many possible results, we get modulo bias when the number of
> possible results is not divisible by the size of the shuffled list.
>
> 1. Does shuffle() produce uniform result ?

Experimental evidence with a 5-card deck and 1,200,000 simulated
shuffles gives me

[9782, 9784, 9784, 9786, 9817, 9822, 9825, 9841, 9859, 9864, 9866,
9872, 9886, 9888, 9893, 9893, 9893, 9893, 9896, 9896, 9901, 9905,
9909, 9909, 9912, 9917, 9919, 9920, 9920, 9925, 9932, 9933, 9934,
9938, 9938, 9943, 9944, 9946, 9947, 9951, 9952, 9955, 9956, 9959,
9960, 9961, 9962, 9963, 9964, 9966, 9968, 9968, 9968, 9969, 9969,
9975, 9977, 9978, 9978, 9979, 9985, 9987, 9990, 9996, 9998, 9998,
10002, 10009, 10012, 10015, 10017, 10018, 10020, 10026, 10027, 10028,
10032, 10035, 10042, 10047, 10048, 10049, 10049, 10055, 10056, 10060,
10062, 10064, 10066, 10069, 10070, 10076, 10087, 10089, 10092, 10095,
10097, 10098, 10100, 10101, 10116, 10117, 10128, 10129, 10131, 10137,
10138, 10140, 10146, 10155, 10158, 10162, 10162, 10163, 10177, 10190,
10190, 10232, 10260, 10292]

frequencies of each of the 120 possible orderings. The chi-squared
test statistic (for H0: all 120 orderings are equally likely) is
132.9472 with 119 degrees of freedom. The p-value is 0.1804, which is
greater than typically-used significance levels, so there is
insufficient evidence to reject H0.
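A sketch of how an experiment like this can be run. The chi-squared p-value
below uses scipy.stats, and itertools.permutations needs Python 2.6+; both
are assumptions on my side, not something from the original post:

import random
from itertools import permutations
from scipy.stats import chisquare          # assumption: SciPy is available

TRIALS = 1200000
deck = range(5)
index = dict((p, i) for i, p in enumerate(permutations(deck)))

counts = [0] * len(index)                  # 120 possible orderings
for _ in xrange(TRIALS):
    d = deck[:]
    random.shuffle(d)
    counts[index[tuple(d)]] += 1

print chisquare(counts)                    # H0: all orderings equally likely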

Alex Martelli

Aug 25, 2007, 11:48:34 AM
tooru honda <tooru...@fast-mail.org> wrote:

> In the end, I think it is worthwhile to implement my own shuffle and
> random methods based on os.urandom. Not only does the resulting code
> get rid of the minuscule bias, but the program also runs much faster.
>
> When using random.SystemRandom.shuffle, posix.open and posix.close from
> calling os.urandom account for almost half of the total execution time
> for my program. By implementing my own random and getting a much larger
> chunk of random bytes from os.urandom each time, I am able to reduce the
> total execution time by half.

If I were in your shoes, I would optimize by subclassing
random.SystemRandom and overriding the random method to use os.urandom
with some large block size and then parcel it out, instead of the
_urandom(7) that it now uses. E.g., something like:

import binascii
import os
import random

class SystemBlockRandom(random.SystemRandom):

    def __init__(self):
        random.SystemRandom.__init__(self)
        def rand7():
            while True:
                randata = os.urandom(7*1024)
                for i in xrange(0, 7*1024, 7):
                    yield long(binascii.hexlify(randata[i:i+7]), 16)
        self.rand7 = rand7().next    # bound generator method: next 56-bit integer

    def random(self):
        """Get the next random number in the range [0.0, 1.0)."""
        return (self.rand7() >> 3) * random.RECIP_BPF    # keep 53 bits, scale to [0, 1)

(untested code). No need to reimplement anything else, it seems to me.
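A hypothetical usage, assuming the class above works as posted:

rng = SystemBlockRandom()
shoe = range(416)      # 8 decks of 52 cards, as in the original problem
rng.shuffle(shoe)      # inherited shuffle, now fed by buffered urandom output
print rng.random()     # a float in [0.0, 1.0)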


Alex

tooru honda

Aug 25, 2007, 11:30:45 PM
By incorporating Alex's code, I got another performance boost of 20%.
It is mostly due to Alex's more efficient implementation of block random
than my own version.

-tooru honda

Below is the code I have now:


import random
from binascii import hexlify
from os import urandom

class rcRandomC(random.SystemRandom):

    def __init__(self):
        random.SystemRandom.__init__(self)

        def rand2():
            while True:
                randata = urandom(2*1024)
                for i in xrange(0, 2*1024, 2):
                    yield int(hexlify(randata[i:i+2]), 16)   # integer in [0, 65535]

        self.rand2_M = rand2().next

    # modified from random._randbelow
    def randrange(self, startN, stopN):
        """Choose a random integer from range(startN, stopN).  widthN <= 65536."""

        widthN = stopN - startN

        left_over_N = 65536 % widthN
        upper_bound_N = 65535 - left_over_N

        # rejection sampling: discard draws above the largest multiple of
        # widthN, so every residue is equally likely
        random_number = self.rand2_M()
        while random_number > upper_bound_N:
            random_number = self.rand2_M()

        r = random_number % widthN
        return startN + r

    def shuffle(self, x):
        """Shuffle list x in place; return None."""

        randrange = self.randrange
        for i in reversed(xrange(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randrange(0, i+1)
            x[i], x[j] = x[j], x[i]
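A hypothetical quick check of the rejection sampling in randrange above:
tally a million draws from range(0, 7) and eyeball the counts.

rng = rcRandomC()

counts = [0] * 7
for _ in xrange(1000000):
    counts[rng.randrange(0, 7)] += 1
print counts     # each entry should be close to 1000000/7, i.e. about 142857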

Alex Martelli

Aug 26, 2007, 1:14:37 AM
tooru honda <tooru...@fast-mail.org> wrote:
...

> def rand2():
> while True:
> randata = urandom(2*1024)
> for i in xrange(0, 2*1024, 2):
> yield int(hexlify(randata[i:i+2]),16) # integer
> in [0,65535]

another equivalent possibility, which might probably be faster:

import array
...

def rand2():
    while True:
        x = array.array("H")              # unsigned 2-byte integers
        x.fromstring(urandom(2*4000))     # 4000 values per urandom call
        for i in x:
            yield i


Alex

tooru honda

Aug 28, 2007, 4:27:12 AM
Thanks to everyone who replied, (and special thanks to Alex Martelli,) I
was able to accomplish what I originally asked for: a shuffle() which is
both fast and without bias. It is the first time I try to optimize
python code, and it is definitely a very rewarding experience.


To bring closure to this thread, I will provide below some statistics
obtained from cProfile:


1. using random.SystemRandom

ncalls tottime percall cumtime percall filename:lineno(function)

4182879 35.154 0.000 211.006 0.000 os.py:724(urandom)
4182879 31.981 0.000 248.291 0.000 random.py:767(random)
40578 17.977 0.000 266.300 0.007 random.py:250(shuffle)
4182879 29.487 0.000 29.487 0.000 {posix.close}
4182879 99.460 0.000 99.460 0.000 {posix.open}
4182879 41.794 0.000 41.794 0.000 {posix.read}

2. my original optimization

8268 0.133 0.000 2.577 0.000 os.py:724(urandom)
4134322 15.094 0.000 21.606 0.000 baccarat.py:70(get2bytes)
4131441 17.221 0.000 41.722 0.000 baccarat.py:85(randrange)
40590 7.059 0.000 48.795 0.001 baccarat.py:112(shuffle)

3. using Alex's generator with string

4117 0.058 0.000 2.048 0.000 os.py:724(urandom)
4214795 10.186 0.000 14.880 0.000 baccarat.py:93(rand2)
4211790 8.883 0.000 23.763 0.000 baccarat.py:106(randrange)
40739 6.101 0.000 29.878 0.001 baccarat.py:131(shuffle)


4. using Alex's generator with array

2081 0.029 0.000 1.736 0.001 os.py:724(urandom)
4161569 1.724 0.000 3.473 0.000 baccarat.py:100(rand2new)
4158474 8.315 0.000 11.788 0.000 baccarat.py:113(randrange)
40514 5.726 0.000 17.528 0.000 baccarat.py:138(shuffle)
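For anyone wanting to reproduce this kind of table: the numbers come from
cProfile/pstats along the lines of the sketch below. run_simulation is a
hypothetical entry point standing in for the actual baccarat.py driver,
which is not shown in this thread:

import cProfile
import pstats

cProfile.run("run_simulation()", "profile.out")    # run_simulation: hypothetical driver
pstats.Stats("profile.out").sort_stats("time").print_stats(10)   # top 10 by own time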

Antoon Pardon

Sep 3, 2007, 6:31:12 AM
On 2007-08-26, tooru honda <tooru...@fast-mail.org> wrote:
> By incorporating Alex's code, I got another performance boost of 20%.
> It is mostly due to Alex's more efficient implementation of block random
> than my own version.

If I understand correctly that you are using urandom as a random
generator I wouldn't trust too much on this performance. Urandom
uses the systemwide entropy-pool. If other programs need this pool
too, your performance can drop spectacularly. If you are using a linux
machine just try to execute "od -x /dev/random" at the same time
as your program.

--
Antoon Pardon

Paul Rubin

Sep 3, 2007, 12:52:11 PM
Antoon Pardon <apa...@forel.vub.ac.be> writes:
> If I understand correctly that you are using urandom as a random
> generator I wouldn't trust too much on this performance. Urandom
> uses the systemwide entropy-pool. If other programs need this pool
> too, your performance can drop spectacularly.

No the idea is that once there's enough entropy in the pool to make
one encryption key (say 128 bits), the output of /dev/urandom is
computationally indistinguishable from random output no matter how
much data you read from it.

Antoon Pardon

Sep 4, 2007, 2:28:54 AM

If you were talking about /dev/random I would agree. But this is what
the man page on my system says about /dev/urandom

A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current non-classified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.

And reading from /dev/random can block if there is not enough entropy.

--
Antoon Pardon

Paul Rubin

Sep 4, 2007, 2:42:56 AM
Antoon Pardon <apa...@forel.vub.ac.be> writes:
> > No the idea is that once there's enough entropy in the pool to make
> > one encryption key (say 128 bits), the output of /dev/urandom is
> > computationally indistinguishable from random output no matter how
> > much data you read from it.
>
> If you were talking about /dev/random I would agree. But this is what
> the man page on my system says about /dev/urandom. ...

> the returned values are theoretically vulnerable to a
> cryptographic attack on the algorithms used by the driver.

Right. The idea is that those attacks don't exist and therefore the
output is computationally indistinguishable from random. Of course
whether the idea is correct is an unproven conjecture, but it looks
pretty good; certainly finding any problem with the specific
algorithms in urandom would be a significant research discovery and
not likely to affect the application being discussed. Finding a
general attack that couldn't be fixed with some simple tweak would be
a major crypto breakthrough that would probably reshape a lot of
complexity theory. This is unlike the situation with Mersenne
Twister, which was not designed for indistinguishability against an
opponent who knew what to look for.

In short, using /dev/random is fairly silly once you know there's
enough entropy in the randomness pool to make a good key. If
/dev/urandom's algorithms are broken then whatever you're doing with
the /dev/random output is probably also broken.

Steven D'Aprano

Sep 4, 2007, 5:13:21 PM
On Mon, 03 Sep 2007 23:42:56 -0700, Paul Rubin wrote:

> Antoon Pardon <apa...@forel.vub.ac.be> writes:
>> > No the idea is that once there's enough entropy in the pool to make
>> > one encryption key (say 128 bits), the output of /dev/urandom is
>> > computationally indistinguishable from random output no matter how
>> > much data you read from it.
>>
>> If you were talking about /dev/random I would agree. But this is what
>> the man page on my system says about /dev/urandom. ...
>> the returned values are theoretically vulnerable to a
>> cryptographic attack on the algorithms used by the driver.
>
> Right. The idea is that those attacks don't exist and therefore the
> output is computationally indistinguishable from random.

It is a huge leap from what the man page says, that they don't exist in
the unclassified literature at the time the docs were written, to what
you're saying, that they don't exist.

The man page is clear: there is a possible vulnerability in /dev/urandom.
Any cryptographer worth his salt (pun intended) would be looking to close
that vulnerability BEFORE an attack is made public, and not just wait for
the attack to trickle down from the NSA to the script kiddies. The time
to close the stable door is _before_ the horse gets away.


> Of course
> whether the idea is correct, an unproven conjecture, but it looks pretty
> good; certainly finding any problem with the specific algorithms in
> urandom would be a significant research discovery and not likely to
> affect the application being discussed.

I agree that this flaw doesn't sound like it will affect the application
being discussed, but a problem has already been found and a solution is
already known: block until there's enough entropy. That's what /dev/
random does.


[snip]



> In short, using /dev/random is fairly silly once you know there's enough
> entropy in the randomness pool to make a good key. If /dev/urandom's
> algorithms are broken then whatever you're doing with the /dev/random
> output is probably also broken.

That doesn't follow. Antoon is specifically warning that /dev/urandom is
non-blocking. If you knew there was enough entropy available, you
wouldn't need /dev/random -- but how do you know there's enough entropy?

(I suppose you could look in /proc/sys/kernel/random/entropy_avail.)

For this specific application, it probably doesn't matter -- using /dev/
urandom is surely overkill, and on a single-user Linux desktop you're
unlikely to have vast numbers of applications reading /dev/urandom
without your knowledge. But why not use /dev/random? What's the downside?

--
Steven.

Raymond Hettinger

Sep 4, 2007, 9:39:50 PM
On Aug 24, 12:38 am, tooru honda <tooru_ho...@fast-mail.org> wrote:
> 1. Does shuffle() produce uniform result ?

If you're worried about this microscopic bias (on the order of
2**-53), then shuffle more than once. That will distribute the bias
more evenly:

from random import shuffle

def super_shuffle(sequence):
    for i in range(10):
        shuffle(sequence)


Raymond Hettinger


Paul Rubin

Sep 5, 2007, 1:01:47 AM
Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> writes:
> > Right. The idea is that those attacks don't exist and therefore the
> > output is computationally indistinguishable from random.
>
> It is a huge leap from what the man page says, that they don't exist in
> the unclassified literature at the time the docs were written, to what
> you're saying, that they don't exist.

OK. /dev/random vs /dev/urandom is a perennial topic in sci.crypt and
there are endless long threads about it there, so I tried to give you
the short version, but will give a somewhat longer version here.

There can't be an actual security proof without also proving P!=NP;
however, the cryptographic part of /dev/urandom is designed pretty
conservatively and seems to be in good shape even after all these years.

> The man page is clear: there is a possible vulnerability in /dev/urandom.

There is the theoretical possibility of such a vulnerability, however
it also applies to /dev/random. Note that the man page is neither a
kernel internals document nor a cryptography textbook, so it doesn't
describe the algorithm or the theory behind it. For that you have to
look at the source code and the cryptographic literature, respectively.

For the random.c source code, see

http://www.cs.berkeley.edu/~daw/rnd/linux-rand

It's an old version but hasn't changed all that much since then AFAIK.
There is a long comment at the top explaining how it works. Short
version: /dev/random reads data from a bunch of stochastic sources
around the system (e.g. things like mouse motion) and makes a
seat-of-the-pants estimate of the amount of entropy coming from the
sources. The sources are not considered to be uniform or
uncorrelated; they're just supposed to contain entropy. The entropy
is distilled through a cryptographic hash function before being sent
out the API. /dev/random throttles its output so that the number of
bits emitted is always lower than the estimated input entropy.
/dev/urandom is the same but doesn't do this throttling.

For the theory of what we hope (but have not proved) that these hash
functions do, see for example the HMAC papers:

http://www-cse.ucsd.edu/~mihir/papers/hmac.html

> Any cryptographer worth his salt (pun intended) would be looking to close
> that vulnerability BEFORE an attack is made public, and not just wait for
> the attack to trickle down from the NSA to the script kiddies. The time
> to close the stable door is _before_ the horse gets away.

No really, these functions are very well studied, in fact there are
known ways to construct collisions in the md5 compression function but
that doesn't appear to be of any use against how random.c uses it.

> I agree that this flaw doesn't sound like it will effect the application
> being discussed, but a problem has already been found and a solution is
> already known: block until there's enough entropy. That's what /dev/
> random does.

No, having enough entropy does NOT guarantee indistinguishability from
randomness. Think of a 60-40 biased coin. Each flip gives you 0.97
bits of entropy, so if you want 64 bits of entropy, 66 flips gives it
to you. But it's very easy to spot the 60-40 bias and see that the
flips are not drawn uniformly from {heads,tails}. Running the flips
through MD5 experimentally appears to get rid of any correlation
(except for some very tricky adversarial constructions based on
controlling the input) so that is what /dev/random does. But that is
the exact same thing that urandom does. It's possible that some
as-yet-unknown attack can find correlations in the md5 output and also
in SHA256 or whatever.
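The 0.97 figure is just the Shannon entropy of a 60-40 coin:

from math import log

p = 0.6
H = -(p * log(p, 2) + (1 - p) * log(1 - p, 2))
print H          # ~0.971 bits per flip
print 64 / H     # ~65.9, so 66 flips carry 64 bits of entropy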

More recently there's been some theoretical advances, see these:

http://en.wikipedia.org/wiki/Extractor
http://citeseer.ist.psu.edu/barak03true.html

so maybe that stuff can be used in implementations. I haven't looked
into it enough to know if it makes sense. Chapter 10 of Schneier and
Ferguson's "Practical Cryptography" also discusses how to write these
system-level PRNG's.

> That doesn't follow. Antoon is specifically warning that /dev/urandom is
> non-blocking. If you knew there was enough entropy available, you
> wouldn't need /dev/random -- but how do you know there's enough entropy?

/dev/urandom can in fact screw you if you try to use it before the
system has gathered enough entropy to give good output, such as while
the system is still booting up. That is not a cryptographic algorithm
vulnerability. It could also screw you if your system is somehow
leaking information about its entropy pool (say through
electromagnetic emissions from the memory bus). That is also not a
cryptographic algorithm vulnerability. /dev/random does protect you
somewhat from both of these screws. /dev/random does NOT protect you
against sufficiently strong hypothetical attacks against the crypto
algorithm.

Overall, the situation is:

1) /dev/urandom is slightly bogus because it can produce output when
there's almost no entropy in the system, i.e. during boot or
immediately after boot. But once the system has been running
for a while, the urandom output (in a real PC) is pretty good.

2) /dev/random is somewhat more bogus because it blocks when it's
capable of producing good (computationally secure) output. Note
that /dev/random's entropy estimates could in fact be wrong--for
example, in a virtual PC (say a simulated one), there might be ZERO
actual entropy (restarting the simulation might get you back to
exactly the same state).

3) /dev/random and /dev/urandom are in principle BOTH vulnerable to
cryptographic attacks. Some people incorrectly assume that this
doesn't apply to /dev/random. So from an information-theoretic
point of view, /dev/random and /dev/urandom are both bogus, but
from a computational security point of view, the algorithms look
ok (as best as we can tell), and if anything is irreparably wrong
with them then all of cryptography is in trouble, so really neither
is bogus in this regard.

4) The man page is fairly seriously bogus because it doesn't explain
the real situation with either /dev/urandom or /dev/random.

5) Really, these modern algorithms are much harder to attack than
fans of pre-computer (e.g. WW2-era) cryptography seem to imagine.
These algorithms are very fast on modern computers but wouldn't
have been practical without computers. Computers have extended
the capabilities of both attackers and defenders, but it looks
like they've given a lot more to the defenders (see Kahn "The
Codebreakers" 2nd ed. note at the end: it says something like
"in the long-running battle between cryptographers and cryptanalysts,
it looks like the cryptographers have won").

6) For a sort of philosophical note on how cryptography fits in with
the P vs. NP problem, this article is pretty cool:

http://www.cs.ucsd.edu/users/russell/average.ps

A brief summary is at:

http://weblog.fortnow.com/2004/06/impagliazzos-five-worlds.html

> For this specific application, it probably doesn't matter -- using /dev/
> urandom is surely overkill, and on a single-user Linux desktop you're
> unlikely to have vast numbers of applications reading /dev/urandom
> without your knowledge. But why not use /dev/random? What's the downside?

/dev/random is in fact quite slow, like a few bits per second, not
practical for dealing zillions of poker hands or whatever this
application was. Once you exhaust the stored pool (a few hundred
bytes or whatever) it takes quite a long time to get any more output.
Also, it doesn't give the theoretical guarantees that some people
imagine that it does. Aside from that, it's indistinguishable from
/dev/urandom once /dev/urandom has enough entropy to get started.

The right way for /dev/*random to work is block after system boot
until there is some reasonable fixed amount of entropy (say 256 bits)
gathered, then produce unlimited output without ever blocking again.
Some ongoing folding of new entropy into the entropy pool (see the
Fortuna algorithm in Schneier and Ferguson) is a good practical
precaution, but really high security applications should consider the
whole PC memory and CPU buses to be an insecure broadcast medium
anyway, so such applications generally do their most sensitive
operations in dedicated secure hardware.

Sorry to go on for so long but it connects to a never-ending source of
bogosity on sci.crypt (the other newsgroup I mostly hang out in),
where several times a year someone shows up selling some bogus
one-time-pad product based on supposed information-theoretic security,
they're misguided 100% of the time and get shot down, but it always
takes a while. My guess is that one of them is responsible for the
ongoing sci.crypt sporge attack.

Steven D'Aprano

Sep 6, 2007, 1:38:08 AM
On Tue, 04 Sep 2007 22:01:47 -0700, Paul Rubin wrote:

> OK. /dev/random vs /dev/urandom is a perennial topic in sci.crypt and
> there are endless long threads about it there, so I tried to give you
> the short version, but will give a somewhat longer version here.

Thank you. Your points are taken, in particular:

> 4) The man page is fairly seriously bogus because it doesn't explain
> the real situation with either /dev/urandom or /dev/random.

--
Steven.

Lawrence D'Oliveiro

Sep 8, 2007, 10:41:04 PM
In message <13drijh...@corp.supernews.com>, Steven D'Aprano wrote:

> Any cryptographer worth his salt (pun intended) would be looking to close
> that vulnerability BEFORE an attack is made public, and not just wait for
> the attack to trickle down from the NSA to the script kiddies.

Except that the NSA's reputation has taken a dent since they failed to
anticipate the attacks on MD5 and SHA-1.

Paul Rubin

Sep 8, 2007, 11:33:57 PM
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
> Except that the NSA's reputation has taken a dent since they failed to
> anticipate the attacks on MD5 and SHA-1.

NSA had nothing to do with MD5, and it's to NSA's credit that SHA-1
held up for as long as it did.

Bryan Olson

Sep 9, 2007, 2:42:27 AM
Paul Rubin wrote:

> Lawrence D'Oliveiro writes:
>> Except that the NSA's reputation has taken a dent since they failed to
>> anticipate the attacks on MD5 and SHA-1.
>
> NSA had nothing to do with MD5, and it's to NSA's credit that SHA-1
> held up for as long as it did.

I haven't kept up. Has anyone exhibited a SHA-1 collision?


--
--Bryan

Lawrence D'Oliveiro

Sep 9, 2007, 2:53:32 AM
In message <7xhcm4p...@ruckus.brouhaha.com>, Paul Rubin wrote:

> Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
>
>> Except that the NSA's reputation has taken a dent since they failed to
>> anticipate the attacks on MD5 and SHA-1.
>

> NSA had nothing to do with MD5 ...

Nevertheless, it was their job to anticipate attacks on it. After all, they
call themselves the "National _Security_ Agency", don't they?

> ... and it's to NSA's credit that SHA-1 held up for as long as it did.

But they have no convincing proposal for a successor. That means the gap
between the classified and non-classified state of the art has shrunk down
to insignificance.

Steven D'Aprano

Sep 9, 2007, 9:03:22 AM
On Sun, 09 Sep 2007 18:53:32 +1200, Lawrence D'Oliveiro wrote:

> In message <7xhcm4p...@ruckus.brouhaha.com>, Paul Rubin wrote:
>
>> Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
>>
>>> Except that the NSA's reputation has taken a dent since they failed to
>>> anticipate the attacks on MD5 and SHA-1.
>>
>> NSA had nothing to do with MD5 ...
>
> Nevertheless, it was their job to anticipate attacks on it. After all,
> they call themselves the "National _Security_ Agency", don't they?

The NSA has many jobs, and doing public research in crypto is only one of
them -- and a particularly small one at that. For all we know, they had
an attack on MD5 ten years before anyone else and didn't tell anyone
because keeping it secret made it useful for one of their other jobs.


>> ... and it's to NSA's credit that SHA-1 held up for as long as it did.
>
> But they have no convincing proposal for a successor. That means the gap
> between the classified and non-classified state of the art has shrunk
> down to insignificance.

I don't see how that follows. But even if it does... maybe it's because
there is nowhere to go from here? You can't make mathematical
breakthroughs to order.


--
Steven.

Paul Rubin

Sep 9, 2007, 1:09:14 PM
Bryan Olson <fakea...@nowhere.org> writes:
> I haven't kept up. Has anyone exhibited a SHA-1 collision?

I don't think anyone has shown an actual collision, but apparently
there is now a known way to find them in around 2**63 operations. I
don't know if it parallelizes as well as a brute force attack does. If
If it does, then it's presumably within reach of the distributed
attacks like the ones used against DES in the late 1990's, given the
hardware speedups that have occurred since then. NIST is trying to
phase out SHA-1 by 2010.

http://en.wikipedia.org/wiki/SHA1#Cryptanalysis_of_SHA-1
http://csrc.nist.gov/hash_standards_comments.pdf

Paul Rubin

Sep 9, 2007, 1:09:51 PM

The successor is SHA-2.

Lawrence D'Oliveiro

Sep 9, 2007, 10:58:02 PM

According to this <http://en.wikipedia.org/wiki/SHA-1>, the family of
algorithms collectively described as "SHA-2" is by no means a definitive
successor to SHA-1.

Paul Rubin

Sep 9, 2007, 11:10:37 PM
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
> According to this <http://en.wikipedia.org/wiki/SHA-1>, the family of
> algorithms collectively described as "SHA-2" is by no means a definitive
> successor to SHA-1.

See <http://csrc.nist.gov/hash_standards_comments.pdf>:

However, due to advances in technology, NIST plans to phase out of
SHA-1 in favor of the larger and stronger hash functions (SHA-224,
SHA-256, SHA-384 and SHA-512) by 2010. SHA-1 and the larger hash
functions are specified in FIPS 180-2. For planning purposes by
Federal agencies and others, note also that the use of other
cryptographic algorithms of similar strength to SHA-1 will also be
phased out in 2010. SHA-1 and the stronger hash functions in FIPS
180-2 are all NIST approved.

This may also be of interest:

http://www.csrc.nist.gov/pki/HashWorkshop/index.html

Lawrence D'Oliveiro

Sep 10, 2007, 3:52:26 AM
In message <13e7roq...@corp.supernews.com>, Steven D'Aprano wrote:

> On Sun, 09 Sep 2007 18:53:32 +1200, Lawrence D'Oliveiro wrote:
>
>> In message <7xhcm4p...@ruckus.brouhaha.com>, Paul Rubin wrote:
>>
>>> Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> writes:
>>>
>>>> Except that the NSA's reputation has taken a dent since they failed to
>>>> anticipate the attacks on MD5 and SHA-1.
>>>
>>> NSA had nothing to do with MD5 ...
>>
>> Nevertheless, it was their job to anticipate attacks on it. After all,
>> they call themselves the "National _Security_ Agency", don't they?
>
> The NSA has many jobs, and doing public research in crypto is only one of
> them -- and a particularly small one at that. For all we know, they had
> an attack on MD5 ten years before anyone else and didn't tell anyone
> because keeping it secret made it useful for one of their other jobs.

Yes, but they're supposed to look after US _National_ security, not their
own security. Since people in strategic jobs make so much use of hash
functions in crypto, that means it is most certainly an important part of
the NSA's function to ensure that there are good hash functions available.
They've fallen down on that job.

>>> ... and it's to NSA's credit that SHA-1 held up for as long as it did.
>>
>> But they have no convincing proposal for a successor. That means the gap
>> between the classified and non-classified state of the art has shrunk
>> down to insignificance.
>
> I don't see how that follows.

Because previously, the NSA has done things that it took open researchers
years, even decades, to figure out. But not any more.
