Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Implementaion of random.shuffle

9 views
Skip to first unread message

Steve Holden

unread,
Jul 16, 2007, 8:53:40 AM7/16/07
to pytho...@python.org
shabda raaj wrote:
> I was going through the docs for module-random
> <http://docs.python.org/lib/module-random.html> And I came through this,
>
> * shuffle*( x[, random])
>
>
> Shuffle the sequence x in place. The optional argument random is a
> 0-argument function returning a random float in [0.0, 1.0); by
> default, this is the function random().
>
> Note that for even rather small |len(x)|, the total number of
> permutations of x is larger than the period of most random number
> generators; this implies that most permutations of a long sequence
> can never be generated.
>
> Why would we need to generate the total number of permutations for the
> list while trying to shuffle it? Should not we just use a knuth shuffle
> <http://en.wikipedia.org/wiki/Knuth_shuffle#Shuffling_algorithms> which
> does not require us to generate the total number of permutations. As
> long as len(x) is smaller than period of random number generator, a
> permutation can be generated.
>
> A quick first draft of implemetation might be,
>
> import random
>
> def shuffle(list_):
> for i in range(len(list_)):
> j = random.randint(i, len(list_)-1)
> list_[i], list_[j] = list_[j], list_[i]
> if __name__ == '__main__':
> a = range(100)
> shuffle(a)
> print a
>
> This is my first post to the list, I am not sure if this is the correct
> place to ask these questions. Please advise if this is not the correct
> place.
>
This is an excellent place to post the question. If it were to turn out
that your concerns were genuine that you might end up posting a patch to
the issue tracker and possibly then involving the developers. Until your
suspicions are confirmed, however, let the developers get on with
development.

You, like all (or most) Python users, have the source of the standard
library at your disposal. Had you looked at .../Lib/random.py you would
have found that the implementation of shuffle() reads as follows:

def shuffle(self, x, random=None, int=int):
"""x, random=random.random -> shuffle list x in place; return None.

Optional arg random is a 0-argument function returning a random
float in [0.0, 1.0); by default, the standard random.random.
"""

if random is None:
random = self.random
for i in reversed(xrange(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = int(random() * (i+1))
x[i], x[j] = x[j], x[i]

So it would appear that the developers chose the Knuth algorithm (with a
slight variation) for *their* implementation. Now you have to ask
yourself whether your surmise is genuinely correct (in which case the
documentation may contain a bug) or whether the documentation is indeed
correct and you are in error.

Note that there is no intent to generate all permutations of the list
and then choose one (that would indeed be a sub-optimal solution). The
documentation is only talking about whether the algorithm can in theory
produce all possible permutations of all possible input lists. It's
possible that the comment in the documentation is left over from an
earlier version. What do you think?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

shabda raaj

unread,
Jul 16, 2007, 10:45:48 AM7/16/07
to

Oh, I wasn't aware that I could see the source of all python modules.
I guess, then the implementation is correct(its just the knuth
shuffle,
moving downward), its just the doc which is in error.
Might be we should log a doc bug for this?

--Shabda


Hrvoje Niksic

unread,
Jul 16, 2007, 10:55:53 AM7/16/07
to
Steve Holden <st...@holdenweb.com> writes:

> So it would appear that the developers chose the Knuth algorithm
> (with a slight variation) for *their* implementation. Now you have
> to ask yourself whether your surmise is genuinely correct (in which
> case the documentation may contain a bug) or whether the
> documentation is indeed correct and you are in error.

That is a good question. The random module uses the Mersenne twister,
which has a repetition period of 2**19937. The number of n-sized
permutations of a list with n elements is n!, while each shuffle
requires n calls to the PRNG. This means that to be able to generate
all permutations, the PRNG must have a period of at least n! * n. In
the case of MT, it means that, regarding the period, you are safe for
lists with around 2079 elements. shuffle's documentation may have
been written before the random module was converted to use the MT.

2**19937 being a really huge number, it's impossible to exhaust the
Mersenne twister by running it in sequence. However, there is also
the question of the spread of the first shuffle. Ideally we'd want
any shuffle, including the first one, to be able to produce any of the
n! permutations. To achieve that, the initial state of the PRNG must
be able to support at least n! different outcomes, which means that
the PRNG must be seeded by at least log2(n!) bits of randomness from
an outside source. For reference, Linux's /dev/random stops blocking
when 64 bits of randomness are available from the entropy pool, which
means that, in the worst case, shuffling more than 20 elements cannot
represent all permutations in the first shuffle!

Steve Holden

unread,
Jul 16, 2007, 11:29:22 AM7/16/07
to pytho...@python.org

Thanks for this thoughtful analysis. I believe we can therefore leave
the documentation (which almost certainly *was* written before the
adoption of MT) alone for now.

Scott David Daniels

unread,
Jul 16, 2007, 10:30:19 PM7/16/07
to
shabda raaj wrote:
> ... Oh, I wasn't aware that I could see the source of all python modules....
Well, actually not _all_ (or is that __all__), but that is exactly why
so many of us love Python -- no magic (or at least as little as needed).

--Scott David Daniels
scott....@acm.org

Steven D'Aprano

unread,
Jul 16, 2007, 10:51:13 PM7/16/07
to
On Mon, 16 Jul 2007 16:55:53 +0200, Hrvoje Niksic wrote:

> 2**19937 being a really huge number, it's impossible to exhaust the
> Mersenne twister by running it in sequence.

"Impossible"?

Surely this will do it:

for n in xrange(2**19937 + 1):
random.random()

Admittedly, if each call to random() took a picosecond, it would still
take 1e5982 centuries to run through the lot. You might want to go make a
coffee or something while you're waiting...

--
Steven.

Steven D'Aprano

unread,
Jul 16, 2007, 10:52:33 PM7/16/07
to
On Mon, 16 Jul 2007 14:45:48 +0000, shabda raaj wrote:

> On Jul 16, 5:53 pm, Steve Holden <st...@holdenweb.com> wrote:
>> shabda raaj wrote:
>> > I was going through the docs for module-random
>> > <http://docs.python.org/lib/module-random.html> And I came through
>> > this,
>>
>> > * shuffle*( x[, random])
>>
>> > Shuffle the sequence x in place. The optional argument random is
>> > a 0-argument function returning a random float in [0.0, 1.0); by
>> > default, this is the function random().
>>
>> > Note that for even rather small |len(x)|, the total number of
>> > permutations of x is larger than the period of most random number
>> > generators; this implies that most permutations of a long
>> > sequence can never be generated.

[snip]

> Oh, I wasn't aware that I could see the source of all python modules. I
> guess, then the implementation is correct(its just the knuth shuffle,
> moving downward), its just the doc which is in error. Might be we should
> log a doc bug for this?


No, it is not a doc error, you are confused.

Suppose you have a random number generator that has the tiny period of
120. (Obviously that's a really lousy RNG, but it will do for an
example.) That means it can only produce 120 different results (the
current number, plus the state the generator it is in) before repeating.

Note that the period is not the same as the spread of values returned,
the two are independent, although naturally the period must be at least
equal to the number of unique values returned.

If you have a list of just six items, there are 6! = 720 possible
permutations of the list, so AT BEST the random number generator can
produce 120/720 or 17% of the permutations.

That's an example of the pigeonhole principle, one of the most simple yet
powerful mathematical principles around. If you've got more items than
pigeonholes to put them in, either some will double up or some will miss
out.

In the case of CPython, the current implementation uses the Mersenne
Twister, which has a huge period of 2**19937. However, 2081! is larger
than that number, which means that at best a list of 2081 items or longer
can't be perfectly shuffled (not every permutation can be selected by the
algorithm).


--
Steven.

"Martin v. Löwis"

unread,
Jul 17, 2007, 12:59:07 AM7/17/07
to Steven D'Aprano
>> 2**19937 being a really huge number, it's impossible to exhaust the
>> Mersenne twister by running it in sequence.
>
> "Impossible"?

Unfeasible.

Regards,
Martin

shabda raaj

unread,
Jul 17, 2007, 2:44:05 AM7/17/07
to

The code for shuffle is

if random is None:
random = self.random
for i in reversed(xrange(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = int(random() * (i+1))
x[i], x[j] = x[j], x[i]

With


j = int(random() * (i+1))

being the only call to random().
As long as (i + 1) is not large enough so that j is generated with
equal probability in range [0, i] we would get all permutations with
equal probability. With MT with period of 2**19937 (i+1) would have to
be really large before this the case.

Anyway, is there some test harness we can run to test the robustness
of shuffle? We can run that test harness for large values and see at
what point all permutations are not possible or come with unequal
probability.

shabda raaj

unread,
Jul 17, 2007, 4:35:15 AM7/17/07
to
Ah, thanks I get it now!

--Shabda

On Jul 17, 7:52 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:


> On Mon, 16 Jul 2007 14:45:48 +0000,shabdaraaj wrote:
> > On Jul 16, 5:53 pm, Steve Holden <st...@holdenweb.com> wrote:

Hrvoje Niksic

unread,
Jul 17, 2007, 4:47:49 AM7/17/07
to
Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> writes:

> In the case of CPython, the current implementation uses the Mersenne
> Twister, which has a huge period of 2**19937. However, 2081! is
> larger than that number, which means that at best a list of 2081
> items or longer can't be perfectly shuffled (not every permutation
> can be selected by the algorithm).

Note that each shuffle requires n calls to the PRNG, not just one,
which reduces the theoretically safe list size by 1.

Josiah Carlson

unread,
Jul 17, 2007, 12:13:25 PM7/17/07
to
shabda raaj wrote:
> The code for shuffle is
>
> if random is None:
> random = self.random
> for i in reversed(xrange(1, len(x))):
> # pick an element in x[:i+1] with which to exchange x[i]
> j = int(random() * (i+1))
> x[i], x[j] = x[j], x[i]
>
> With
> j = int(random() * (i+1))
> being the only call to random().
> As long as (i + 1) is not large enough so that j is generated with
> equal probability in range [0, i] we would get all permutations with
> equal probability. With MT with period of 2**19937 (i+1) would have to
> be really large before this the case.

Be careful of assuming too much about permutations and randomness. In
particular, you are assuming too much about what mersenne twister
guarantees, what that implies regarding random permutations, and what a
random permutation really is.

In particular, if we merely enumerate all possible permutations of n
elements, there are n! total permutations. Now, because the generation
of a permutation *is fundamentally indistinguishable from choosing a
number in the range [0...n!)*, then we can rely on the knowledge and
research regarding the generation of such numbers. In particular,
thanks to birthday collisions, you only need to generate O(sqrt(n!))
random permutations to have a 50% chance of having come upon a duplicate
permutation. For example:

>>> a = range(10)
>>> b = {}
>>> import random
>>> while 1:
... x = a[:]
... random.shuffle(x)
... x = tuple(x)
... if x not in b:
... b[x] = None
... else:
... break
...
>>> len(b)
3605
>>> x
(4, 3, 9, 8, 0, 2, 1, 7, 6, 5)
>>> x in b
True
>>>

I only generated 3605 permutations of 10 items before I got a collision.
According to your claims, I should have gone through all 3628800
possible permutations before that happened. Note that I could repeat
this exercise with any decent pseudo random number generator, or actual
random information generated from any good random source. As long as
the PRNG doesn't suck, it should offer some variant of the above.


> Anyway, is there some test harness we can run to test the robustness
> of shuffle? We can run that test harness for large values and see at
> what point all permutations are not possible or come with unequal
> probability.

Shuffle works as intended. Your assumptions about it's functionality
are incorrect. Testing will only prove that your assumptions are wrong.


- Josiah

Steve Holden

unread,
Jul 17, 2007, 4:30:49 PM7/17/07
to pytho...@python.org
Josiah Carlson wrote:
> shabda raaj wrote:
[...]

>
>> Anyway, is there some test harness we can run to test the robustness
>> of shuffle? We can run that test harness for large values and see at
>> what point all permutations are not possible or come with unequal
>> probability.
>
> Shuffle works as intended. Your assumptions about it's functionality
> are incorrect. Testing will only prove that your assumptions are wrong.
>
Which is alone a good justification for testing.

easier-than-proving-oneself-right-ly y'rs - steve

George Sakkis

unread,
Jul 18, 2007, 3:32:35 PM7/18/07
to
On Jul 16, 10:51 pm, Steven D'Aprano

Wow, can you make a coffee in.. 57ms ?

$ time python -c "for n in xrange(2**19937 + 1): random.random()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
OverflowError: long int too large to convert to int

real 0m0.057s
user 0m0.050s
sys 0m0.000s


Time-flies-like-an-arrow-ly y'rs ;-)

George

Steven D'Aprano

unread,
Jul 18, 2007, 11:47:01 PM7/18/07
to
On Wed, 18 Jul 2007 19:32:35 +0000, George Sakkis wrote:

> On Jul 16, 10:51 pm, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> On Mon, 16 Jul 2007 16:55:53 +0200, Hrvoje Niksic wrote:
>> > 2**19937 being a really huge number, it's impossible to exhaust the
>> > Mersenne twister by running it in sequence.
>>
>> "Impossible"?
>>
>> Surely this will do it:
>>
>> for n in xrange(2**19937 + 1):
>> random.random()
>>
>> Admittedly, if each call to random() took a picosecond, it would still
>> take 1e5982 centuries to run through the lot. You might want to go make a
>> coffee or something while you're waiting...
>
> Wow, can you make a coffee in.. 57ms ?

[snip demonstration of xrange raising an exception]

Of course! Can't you?

And if I use a microwave oven, the coffee is made so quickly that I
actually go backwards in time...

--
Steven.

Neil Cerutti

unread,
Jul 19, 2007, 8:05:14 AM7/19/07
to

But what happens if you use a microwave oven? ... What the!?!?

--
Neil Cerutti

Scott David Daniels

unread,
Feb 20, 2009, 6:15:42 PM2/20/09
to
[*** deleted section outlining Guido's time machine structure ***]

Steve, consider yourself warned!!!

-- The PSU

0 new messages