Perfect list shuffling: an interesting algorithm

Ben Rudiak-Gould

unread,

Jan 12, 2004, 4:30:40 AM1/12/04

to

This group holds occasional discussions of functional list shuffling
algorithms (i.e. algorithms which spit out one of the n! possible
permutations of their input with equal probability). A little over a
year ago Oleg posted a Theta(n log n) functional algorithm based on
the standard iterative algorithm.

One solution which often occurs to people is to pass the input to a
comparison-based sort, but with the comparison function replaced by a
source of random bits. This usually produces terrible results. But I
started thinking about a variation of quicksort in which there's no
pivot and the list is partitioned randomly, instead of by
magnitude--and I realized that, counterintuitively, this does produce
a perfect shuffle. (Handwavy proof at the end of this post.)

The average time required for a shuffle seems to be Theta(n log n),
which is optimal[1][2]. The worst-case time is unbounded, which is
also optimal (there's no way to perfectly shuffle more than two items
with any bounded number of random bits).

[1] The iterative algorithm seems to be Theta(n), but actually n is
bounded by the size of the numbers produced by the RNG. If n can grow
without bound, the algorithm becomes Theta(n log n).

[2] I haven't proven this average time, because the recurrence is kind
of nasty, but I've evaluated it numerically and it fits an n log n
curve very closely.

Also interesting is that the algorithm produces a perfect shuffle even
if the coin is biased, as long as the flips are independent of order
and of each other. The greater the bias, the more bits will be
consumed; I very much suspect that the ratio of average bits consumed
is given by Shannon's formula.

Whether or not it proves to be practical, I find this algorithm quite
delightful (especially the biased coin bit), and I was wondering if
there's any literature on it or anything similar. I haven't seen it
mentioned in the previous shuffling threads on this group.

Here's an implementation in Haskell:

shuffle :: (Monad m, ?flipCoin :: m Bool) => [a] -> m [a]

shuffle [] = return []
shuffle [x] = return [x]
shuffle xs = shuffle' xs [] []

shuffle' [] as bs = liftM2 (++) (shuffle as) (shuffle bs)
shuffle' (x:xs) as bs =
do r <- ?flipCoin
if r then shuffle' xs (x:as) bs
else shuffle' xs as (x:bs)

and in Scheme:

(define (shuffle flip-coin list)
(let loop1 ((list list))
(if (or (null? list) (null? (cdr list)))
list
(let loop2 ((in list) (out1 '()) (out2 '()))
(cond ((null? in)
(append (loop1 out1) (loop1 out2)) )
((flip-coin)
(loop2 (cdr in) (cons (car in) out1) out2) )
(else
(loop2 (cdr in) out1 (cons (car in) out2)) ))))))

For efficiency these should probably have an n = 2 base case also.

Handwavy proof by induction that this algorithm produces a perfect
shuffle:

The base cases of n = 0 and n = 1 are trivial.

For n >= 2, we will randomly partition the list into two sublists of
length k and n - k. If k = 0 or k = n, then we will loop with a list
of the same size. We might loop forever, but it's easy to see that
this happens with probability zero.

Assuming 0 < k < n, it's clear that any given element of the original
list has a k/n chance of going in the first sublist and a (n-k)/n
chance of going in the second. If it goes in the first it has a 1/k
chance of ending up in any particular list position after we've
shuffled the sublist, by the induction hypothesis, so it has a
(1/k) * (k/n) = 1/n chance overall of ending up in any final list
position on that side, and by the same argument, a 1/n chance of
ending in any final list position on the other side. QED.

Showing that it works with a biased coin is trickier, but I hope the
general idea is clear: the algorithm doesn't prefer any element over
another, so it has to produce a perfect shuffle.

-- Ben

Ross Paterson

unread,

Jan 12, 2004, 10:14:58 AM1/12/04

to

Ben Rudiak-Gould <be...@darkweb.not-this-part.com> writes:
>This group holds occasional discussions of functional list shuffling
>algorithms (i.e. algorithms which spit out one of the n! possible
>permutations of their input with equal probability). A little over a
>year ago Oleg posted a Theta(n log n) functional algorithm based on
>the standard iterative algorithm.
>
>One solution which often occurs to people is to pass the input to a
>comparison-based sort, but with the comparison function replaced by a
>source of random bits. This usually produces terrible results. But I
>started thinking about a variation of quicksort in which there's no
>pivot and the list is partitioned randomly, instead of by
>magnitude--and I realized that, counterintuitively, this does produce
>a perfect shuffle. (Handwavy proof at the end of this post.)

Another view: Certainly pairing each item with an infinite stream of
random bits, and sorting by the bit streams, gives a perfect shuffle.
If the sort is radix sort, no bit is examined twice, so need not be
remembered, and we can simulate multiple random bit streams with a single
stream of random bits. (Of course this assumes an infinite source of
random bits.)

George Russell

unread,

Jan 13, 2004, 6:59:47 AM1/13/04

to

Ross Paterson wrote (snipped):

> Another view: Certainly pairing each item with an infinite stream of
> random bits, and sorting by the bit streams, gives a perfect shuffle.
> If the sort is radix sort, no bit is examined twice, so need not be
> remembered, and we can simulate multiple random bit streams with a single
> stream of random bits. (Of course this assumes an infinite source of
> random bits.)

Interesting. Another way of describing this algorithm is to say that
given a list of n items, you produce a random permutation as follows:
(1) if n<=1 output the original list.
(2) if n>1, then toss a coin for each item in the list, using the results
to partition the list into LH (list of heads), and LT (list of tails).
Recursive permute LH and LT, and output the result.

Of course this algorithm may require an arbitrary number of bits, or indeed
given a maliciously-chosen bitstream (one that only outputs heads, for example)
fail to terminate. But the same could be said of any completely fair algorithm
for permuting n>=3 items starting with just a random bitstream, since n! does not
divide any power of 2.

You can optimise this algorithm just a little bit by adding a step
(1a) If n=2 sort the two items using a single coin-toss.

If what you want to do is to minimise the expected number of bits required, this
is case N=n! of the problem "randomly pick one of the numbers from 1 to N, using
the minimum number of expected bits". I've forgotten exactly how you solve it, but
I think it comes down to the binary representation of 1/N.

Tim Sweeney

unread,

Jan 13, 2004, 2:20:19 PM1/13/04

to

Can't one always do shuffling with O(n) element accesses, by forming a
new array one element at a time, at each point selecting and removing
(by index) a random element from the original array? For example, one
could implement this in C by starting with an array of n elements,
selecting a random item from among the remaining contiguous items,
appending that item onto the result array, and then replacing it (in
place) with the item at the end of the array, repeating until all
elements are exausted.

This appears to me to require O(n) element accesses, and a source of
O(n log n) random bits. Any source of O(n log n) computational steps
-- unless you are treating each random bit access as a step -- would I
think be an artefact of using functional lists instead of indexable,
updatable arrays.

Tim Sweeney

Ben Rudiak-Gould

unread,

Jan 13, 2004, 3:44:20 PM1/13/04

to

On 13 Jan 2004 11:20:19 -0800, t...@epicgames.com (Tim Sweeney) wrote:

>Can't one always do shuffling with O(n) element accesses, by forming a
>new array one element at a time, at each point selecting and removing
>(by index) a random element from the original array?
>

>This appears to me to require O(n) element accesses, and a source of
>O(n log n) random bits. Any source of O(n log n) computational steps
>-- unless you are treating each random bit access as a step -- would I
>think be an artefact of using functional lists instead of indexable,
>updatable arrays.

This common algorithm is really O(n log n), for two reasons. First,
each coin flip does have to be counted as a step. In fact, I'd expect
it to dominate the overall runtime in many cases, because producing
good pseudorandom bits is expensive, and truly random bits even more
so.

Second, array access is really O(log n), not O(1), because the larger
the array the more bits are needed to represent the indices. In fact,
any time bound which doesn't include a factor of log n is almost
certainly incorrect. This may sound overly pedantic, but I'm just
making the point that factors of log n tend to be no different from
constant factors in practice.

-- Ben

(Actually, array access is at best O(sqrt(n)) because of the
holographic bound, but that's another story.)

Ben Rudiak-Gould

unread,

Jan 13, 2004, 3:45:32 PM1/13/04

to

On 12 Jan 2004 15:14:58 GMT, Ross Paterson <ro...@city.ac.uk> wrote:

>Another view: Certainly pairing each item with an infinite stream of
>random bits, and sorting by the bit streams, gives a perfect shuffle.
>If the sort is radix sort, no bit is examined twice, so need not be
>remembered, and we can simulate multiple random bit streams with a single
>stream of random bits.

You're right, this is a much better way of looking at it. It also
clarifies why it works with a biased coin.

-- Ben

Neelakantan Krishnaswami

unread,

Jan 13, 2004, 5:41:39 PM1/13/04

to

In article <l4j800thjmslld0ma...@4ax.com>, Ben Rudiak-Gould
wrote:

>
> (Actually, array access is at best O(sqrt(n)) because of the
> holographic bound, but that's another story.)

What is the holographic bound? Whatever it is, it certainly is a
cool name. :)

--
Neel Krishnaswami
ne...@cs.cmu.edu

Peter Seibel

unread,

Jan 14, 2004, 3:31:24 PM1/14/04

to

Ben Rudiak-Gould <be...@darkweb.not-this-part.com> writes:

> Showing that it works with a biased coin is trickier, but I hope the
> general idea is clear: the algorithm doesn't prefer any element over
> another, so it has to produce a perfect shuffle.

So you lost me here. If I have a coin that returns heads with 2/3
probability, doesn't that mean that the first item in the list will
end up in the first half of the permutation 2/3's of the time? That
doesn't seem right.

-Peter

--
Peter Seibel pe...@javamonkey.com

Lisp is the red pill. -- John Fraser, comp.lang.lisp

Peter Seibel

unread,

Jan 14, 2004, 3:36:00 PM1/14/04

to

Peter Seibel <pe...@javamonkey.com> writes:

> Ben Rudiak-Gould <be...@darkweb.not-this-part.com> writes:
>
> > Showing that it works with a biased coin is trickier, but I hope the
> > general idea is clear: the algorithm doesn't prefer any element over
> > another, so it has to produce a perfect shuffle.
>
> So you lost me here. If I have a coin that returns heads with 2/3
> probability, doesn't that mean that the first item in the list will
> end up in the first half of the permutation 2/3's of the time? That
> doesn't seem right.

Bah, never mind. I'm an idiot. If the coin returns heads 2/3's of the
time, then there is a 2/3's probability that the first item will be in
the first 2/3's of the result. Which *does* seem right.

Ben Rudiak-Gould

unread,

Jan 15, 2004, 4:46:18 AM1/15/04

to

On Tue, 13 Jan 2004 22:41:39 +0000 (UTC), Neelakantan Krishnaswami
<ne...@cs.cmu.edu> wrote:

>In article <l4j800thjmslld0ma...@4ax.com>, Ben Rudiak-Gould
>wrote:
>>
>> (Actually, array access is at best O(sqrt(n)) because of the
>> holographic bound, but that's another story.)
>
>What is the holographic bound? Whatever it is, it certainly is a
>cool name. :)

Off-topic, but:

You can't pack more entropy into a spherical region of space than a
certain fixed multiple of the surface area (not volume) of the sphere.
This has obvious implications for the design of memory subsystems,
though I don't think we'll be hitting the limit for a while. It's also
called the "Bekenstein bound".

For more physical limits on computation than you can shake a stick at,
see http://www.cise.ufl.edu/~mpf/physlim/ . Another one that's quite
interesting is that you can't destroy information without producing
heat--that is, the bit bucket is real. This may someday lead to the
(heat) death of the von Neumann architecture. Graph-reduction
architectures are no better off, sad to say.

-- Ben

Jerzy Karczmarczuk

unread,

Jan 15, 2004, 6:41:39 AM1/15/04

to

Ben Rudiak-Gould wrote:

> For more physical limits on computation than you can shake a stick at,
> see http://www.cise.ufl.edu/~mpf/physlim/ . Another one that's quite
> interesting is that you can't destroy information without producing
> heat--that is, the bit bucket is real. This may someday lead to the
> (heat) death of the von Neumann architecture. Graph-reduction
> architectures are no better off, sad to say.

Ehm, ehm...
Reversible techniques of computation are a bit known now. The
Landauer principle may be a nuisance, yes. Bennett wrote a nice
article on this subject:

http://arxiv.org/abs/physics/0210005

But Bennet himself proposed some tricks.
For example it is known that irreversible, "standard" computations
may be rendered reversible, if one adds some additional garbage to
the result pool. One cannot discard this garbage. But one can - after
having finished one computation stage - reverse the process and to
"consume the garbage", to distil it into its original pool of nothing
interesting.

Now I HAVE A DREAM. I am convinced that functional languages and
functional architectures are *THE* way to implement those paradigms
and those tricks. So, perhaps in 50 years ...

(But, frankly, the Bekenstein bound, applicable rather to black holes
than to hamburgers, silicon-based or other, will - I think - have never
anything to do with real computers...)

Jerzy Karczmarczuk

Joachim Durchholz

unread,

Jan 15, 2004, 2:27:35 PM1/15/04

to

Ben Rudiak-Gould wrote:
> The average time required for a shuffle seems to be Theta(n log n),

<aside>
What is Theta(...)? I know asymptotic complexity such as O (n log n),
but not Theta.
Google gave me
f (n) is Theta (g (n))
iff f (n) is O (g (n)) and g (n) is O (f (n))
which seems to say that you can say Theta only if the complexity
function doesn't have any lower-order terms, but then I don't see that
*any* algorithm can be Theta (anything) since there's always a constant
term simply for starting up the algorithm... so what gives?
</aside>

> which is
> also optimal (there's no way to perfectly shuffle more than two items
> with any bounded number of random bits).

I think there are two entirely different kinds of unboundedness involved
here.

One is the unboundedness that you get if you insist on perfectly
distributed numbers in a given range that's not a multiple of the
underlying generator's range. I.e. for a source of random bits, this
would be a power of two, for a source of random numbers between 1 and 3,
this would be a power of three, for a source of random numbers between 1
and 6, this would be any product of the powers of two and three.
This one is easy to combat: You determine how precise your random
numbers really have to be, and simply ignore the slight bias. Modern
crypto-grade PRNGs have periods that are so huge that a bias is unlikely
to occur within the lifetime of the universe, which is enough for most
applications.

The other is the unboundedness involved in the algorithm itself, which
can't be combatted (at least not without changing the algorithm). There
is nonzero chance that the recursion will have another go at reshuffling
a list of the same length. Worse: the list that's going to be reshuffled
is a permutation of the original list; in other words, the algorithm
goes to great lengths to reshuffle parts of the input list, just to
decide it's going to shuffle it again.
I think this reshuffling is related to the imperviousness of the bias of
a randomness source, but I think there are simpler ways to remove that
bias: for example, for a biased two-bit source, one can count
consecutive runs of identical bits, compute the parity of the count, and
output these parity bits as an unbiased stream of random bits. As far as
I understood the algorithm and the comments made on it, this seems to
use up the same number of random bits, but it doesn't construct lots of
intermediate lists just for the purpose of shuffling them again.

In other words: one may regard the algorithm as a genial way to do two
things in one go, but to me, it looks like it's trying to do two things
at once and producing an unnecessarily inefficient,
difficult-to-understand, difficult-to-verify, hence unmaintainable
monster that would have been easy to write by separating the concers:
(a) unbiasing a source of randomness, (b) doing an ordinary perfect shuffle.
The resulting code would be far less intriguing and interesting, but it
would be more useful to the general public.
YMMV :-)

Regards,
Jo
--
Currently looking for a new job.

William Lovas

unread,

Jan 15, 2004, 3:29:35 PM1/15/04

to

In article <bu6pfs$kmg$1...@news.oberberg.net>, Joachim Durchholz wrote:
> Ben Rudiak-Gould wrote:
>> The average time required for a shuffle seems to be Theta(n log n),
>
><aside>
> What is Theta(...)? I know asymptotic complexity such as O (n log n),
> but not Theta.
> Google gave me
> f (n) is Theta (g (n))
> iff f (n) is O (g (n)) and g (n) is O (f (n))
> which seems to say that you can say Theta only if the complexity
> function doesn't have any lower-order terms, but then I don't see that
> *any* algorithm can be Theta (anything) since there's always a constant
> term simply for starting up the algorithm... so what gives?
></aside>

You have to start from the definition of O(f(n)).

f(n) is O(g(n)) iff there exist constants c and N such that
for any n > N, f(n) <= cg(n).

That is, f(n) is bounded from above by g(n), asymptotically. The
definition of Omega(f(n)) is similar, but it provides a bound from below.
Theta(f(n)) provides both -- meaning the algorithm will never exceed f(n)
asymptotically, but will never perform better than f(n) asymptotically.
Another definition of Theta(f(n)) is as a combination of O(f(n) and
Omega(f(n)).

For a more indepth treatment, consult an algorithms textbook, like Cormen,
Leiserson, Rivest, and Stein.

cheers,
William

Tomasz Zielonka

unread,

Jan 16, 2004, 2:10:16 AM1/16/04

to

William Lovas wrote:
>
> You have to start from the definition of O(f(n)).
>
> f(n) is O(g(n)) iff there exist constants c and N such that
> for any n > N, f(n) <= cg(n).
>
> That is, f(n) is bounded from above by g(n), asymptotically. The
> definition of Omega(f(n)) is similar, but it provides a bound from below.
> Theta(f(n)) provides both -- meaning the algorithm will never exceed f(n)
> asymptotically, but will never perform better than f(n) asymptotically.

I think this can be misleading. When you say that some algorithm's time
complexity is O(g(n)), you mean that one specific complexity function
f(n) is O(g(n)). Theta is not a way to say that every complexity
function of an algorithm is ...

If for a single 'n' you can have many inputs with different running
times, you have many complexity functions to consider. Most often we
focus on worst-case complexity and average complexity.

BubbleSort will sort an already sorted sequence of length n in a O(n)
steps, but it's worst-case complexity is still Theta(n^2).

QuickSort's average complexity is Theta(n log n), but still it's
worst-case complexity is Theta(n^2).

Best regards,
Tom

--
.signature: Too many levels of symbolic links

George Russell

unread,

Jan 16, 2004, 3:38:39 AM1/16/04

to

Joachim Durchholz wrote (snipped):

> One is the unboundedness that you get if you insist on perfectly
> distributed numbers in a given range that's not a multiple of the
> underlying generator's range. I.e. for a source of random bits, this
> would be a power of two, for a source of random numbers between 1 and 3,
> this would be a power of three, for a source of random numbers between 1
> and 6, this would be any product of the powers of two and three.
> This one is easy to combat: You determine how precise your random
> numbers really have to be, and simply ignore the slight bias.

To generate an unbiased random number between 1 and 3 given a coin you can,
for example, toss the coin twice, map HH, HT, TH to 1,2,3, and if you get TT
repeat. Of course this method has no guaranteed completion time, but its
average performance (8/3 coin tosses) is pretty good.

William Lovas

unread,

Jan 18, 2004, 1:37:19 PM1/18/04

to

In article <slrnc0f3in.2...@zodiac.mimuw.edu.pl>, Tomasz Zielonka
wrote:

> William Lovas wrote:
>>
>> You have to start from the definition of O(f(n)).
>>
>> f(n) is O(g(n)) iff there exist constants c and N such that
>> for any n > N, f(n) <= cg(n).
>>
>> That is, f(n) is bounded from above by g(n), asymptotically. The
>> definition of Omega(f(n)) is similar, but it provides a bound from below.
>> Theta(f(n)) provides both -- meaning the algorithm will never exceed f(n)
>> asymptotically, but will never perform better than f(n) asymptotically.
>
> I think this can be misleading. When you say that some algorithm's time
> complexity is O(g(n)), you mean that one specific complexity function
> f(n) is O(g(n)). Theta is not a way to say that every complexity
> function of an algorithm is ...

I concede: you are correct. Theta notation, then, is a way of describing a
*tight* asymptotic bound, yes?

For example, it is correct to say that merge sort, in the worst case, is
O(2^n), since it is O(n lg n), and n lg n is O(2^n). It would be incorrect
however to say that merge sort, in the worst case, is Theta(2^n), since 2^n
is not an appropriate lower bound for merge sort's worst-case behavior.

cheers,
William, not going on to study complexity theory :)

Tomasz Zielonka

unread,

Jan 18, 2004, 3:29:36 PM1/18/04

to

William Lovas wrote:
>
> I concede: you are correct. Theta notation, then, is a way of describing a
> *tight* asymptotic bound, yes?

But only as tight as asymptotic can be ;)

> For example, it is correct to say that merge sort, in the worst case, is
> O(2^n), since it is O(n lg n), and n lg n is O(2^n). It would be incorrect
> however to say that merge sort, in the worst case, is Theta(2^n), since 2^n
> is not an appropriate lower bound for merge sort's worst-case behavior.

Precisely.

Best regards,
Tomasz

Dylan Thurston

unread,

Jan 21, 2004, 1:06:01 AM1/21/04

to

On 2004-01-15, Jerzy Karczmarczuk <kar...@info.unicaen.fr> wrote:
> For example it is known that irreversible, "standard" computations
> may be rendered reversible, if one adds some additional garbage to
> the result pool. One cannot discard this garbage. But one can - after
> having finished one computation stage - reverse the process and to
> "consume the garbage", to distil it into its original pool of nothing
> interesting.

To be pedantic, this introduces an extra logarithmic factor, I believe
in both space and time.

> Now I HAVE A DREAM. I am convinced that functional languages and
> functional architectures are *THE* way to implement those paradigms
> and those tricks. So, perhaps in 50 years ...

When I thought about this, I didn't think it was true: functional
languages duplicate and discard data structures too much. Maybe one way
to say it is that garbage collection is a real problem.

Peace,
Dylan

Ben Rudiak-Gould

unread,

Jan 21, 2004, 3:31:34 AM1/21/04

to

On Thu, 15 Jan 2004 12:41:39 +0100, Jerzy Karczmarczuk
<kar...@info.unicaen.fr> wrote:

>Ehm, ehm...
>Reversible techniques of computation are a bit known now.

Certainly, and the page I linked to has information on the subject.

>Now I HAVE A DREAM. I am convinced that functional languages and
>functional architectures are *THE* way to implement those paradigms
>and those tricks. So, perhaps in 50 years ...

If you ever develop this beyond the dream stage I'm certainly
interested in hearing about it. I designed a toy reversible language a
couple of years ago (http://esoteric.sange.fi/essie2/download/kayak/)
and it ended up having a very imperative flavor, because I couldn't
figure out how to make it work any other way. (A lot more Kayak
programs have been written than you'll find there; maybe someday I'll
give it a proper home page.)

>(But, frankly, the Bekenstein bound, applicable rather to black holes
>than to hamburgers, silicon-based or other, will - I think - have never
>anything to do with real computers...)

Probably. I'm only pointing out that none of our "asymptotic bounds"
actually continue to apply as n->infinity. They're really "approximate
orders of growth for n in some finite range". Under this
interpretation Theta(n log n) may actually be better than Theta(n), if
the latter's constant factor is larger by at least a factor of
log(n_max).

-- Ben