--
Ryan Tarpine, rtar...@hotmail.com
"Technology is a queer thing. It brings you great gifts
in one hand, and it stabs you in the back with the other."
-- C.P. Snow
Easiest would be:
Put them in an array.
for i from 1 to N:
    exchange the i'th element with a randomly chosen one
    (exchanging it with itself results in a no-op)
(Sorry no OCaml. Should be easy enough to translate though.)
This means ca. 2N writes, on the average. Knuth's Art of Computer
Programming might have more efficient algorithms, but I can't imagine
one right now.
The algorithm is also very imperative; I'm not sure that it's possible
to write a functional version that matches it in both speed and memory use.
(I'd be happy to see such a function.)
Regards,
Joachim
--
This is not an official statement from my employer.
Post to a newsgroup and ask for help. ;-)
> e.g. I need 0 through 1599 in random order, each number appearing
> exactly once. Ocaml if possible. Thanks in advance
let swap a i j =
  let tmp = Array.get a i in
  (Array.set a i (Array.get a j);
   Array.set a j tmp)

let shuffled_array n =
  let a = Array.init n (fun i -> i) in
  (Random.init n;
   for i = 0 to (n - 1) do
     swap a i (Random.int n);
   done;
   a)
let a = shuffled_array 1600
Not really a very functional program, I admit, and probably not the best,
but I work best for money or love.
-- Brian
I believe that you will get better randomness if you make that:
swap a i (i + (Random.int (n - i)))
Al
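Putting Brian's function and this correction together gives the textbook Fisher-Yates shuffle. A minimal OCaml sketch (the inlined swap and the Random.self_init seeding are my changes, not taken from the posts above):

let shuffled_array n =
  let a = Array.init n (fun i -> i) in
  Random.self_init ();                (* seed from the system, not from n *)
  for i = 0 to n - 2 do
    (* pick j uniformly from i .. n-1 and swap; the last position is a no-op *)
    let j = i + Random.int (n - i) in
    let tmp = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- tmp
  done;
  a

Each position i is filled by an element drawn uniformly from the ones not yet placed, which is exactly the property proved further down the thread.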
> Brian Rogoff wrote:
>> ...
>> let swap a i j =
>> let tmp = Array.get a i in
>> (Array.set a i (Array.get a j);
>> Array.set a j tmp)
>>
>> let shuffled_array n =
>> let a = Array.init n (fun i -> i) in
>> (Random.init n;
>> for i = 0 to (n - 1) do
>> swap a i (Random.int n);
>> done;
>> a)
>>
>> let a = shuffled_array 1600
>> ...
>
> I believe that you will get better randomness if you make that:
>
> swap a i (i + (Random.int (n - i)))
> ...
Thanks to both of you. Just what I needed.
If you get an A, send us each 15%.
Al
Al Christians wrote:
But it amounts to slightly less than a... newline! Furthermore your
mailer would strip it as a white space :-(.
Alexander
> Easiest would be:
> Put them in an array.
> for i from 1 to N:
> exchange the i'th element with a randomly chosen one
> (exchanging it with itself resulting in a no-op)
I don't have the proof handy, but you can prove that swapping each element
with a random element is more likely to produce some final orderings than
others. You instead want to swap each element with a randomly chosen
element between it and the end of the list.
--
Russ Allbery (r...@stanford.edu) <http://www.eyrie.org/~eagle/>
I don't know. My first approach was to swap randomly chosen elements,
for N times, but I wasn't sure that that wouldn't create a bias.
> You instead want to swap each element with a randomly chosen
> element between it and the end of the list.
I do have a proof (well, sketch of it) that this algorithm will work.
It is by induction:
For one element, the algorithm will be trivially correct.
For N elements, there's a 1/N chance that the algorithm will swap the
first element with itself, leaving the others alone.
Further, there's a 1/N chance that the algorithm will swap the first
element with any given other element O. This gives O a 1/N chance to
finish at position 1, and an (N-1)/N chance to end up at any other
position, at which it was placed with probability 1/(N-1) in the
previous round (by induction); this gives a probability of
(N-1)/N * 1/(N-1) = 1/N, q.e.d.
Conclusion: I don't really know whether my algorithm is correct, but I
can't prove it and I *can* prove yours, so your algorithm is
preferable.
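The key step of the induction above, written out for a fixed element O and a position k:

\[
P(O \text{ ends at position } 1) = \frac{1}{N}, \qquad
P(O \text{ ends at position } k > 1) = \frac{N-1}{N} \cdot \frac{1}{N-1} = \frac{1}{N}.
\]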
> Ryan Tarpine <rtar...@hotmail.com> wrote:
> > What's the easiest way to get a shuffled list or array of a
> > sequence of numbers? e.g. I need 0 through 1599 in random order,
> > each number appearing exactly once. Ocaml if possible. Thanks in
> > advance
> Easiest would be:
in Haskell, (sorry)
factorial n = product [1..n]
permutations l = left as an exercise
then
do r <- newStdGen -- start off a random generator
let (index, nextgen) = randomIvalInteger (0, factorial 1600 - 1) r
-- generate a random integer between 0 and 1600!
return (permutations [0..1599] !! index)
gives you what you want
> The algorithm is also very imperative;
My algorithm is quite functional. Might be a bit slow, though ;-)
--
Jón Fairbairn Jon.Fa...@cl.cam.ac.uk
Yup, it's O(N!) in time and space, since permutations is generating all
of them. I don't think that current-day computers can do that with 1600
items.
Is there a functional, efficient algorithm for generating a random
permutation?
> > do r <- newStdGen -- start off a random generator
> > let (index, nextgen) = randomIvalInteger (0, factorial 1600 - 1) r
> > -- generate a random integer between 0 and 1600!
> > return (permutations [0..1599] !! index)
> Yup, it's O(N!) in time and space, since permutations is generating all
> of them.
Uh, doesn't laziness turn it into an O(n) space thing? So unless I'm
mistaken, you're left with only O(n!) time complexity. Might still be
enough to put some people off, though.
> Is there a functional, efficient algorithm for generating a random
> permutation?
Uh, given a list of elements, draw a random one and add to an
accumulator list until none are left? Gives you O(n^2), but that's
because you have to traverse lists - and the constant factor is low.
And space is O(n).
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
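For concreteness, a minimal OCaml sketch of that draw-into-an-accumulator idea (the names pick_nth and shuffle_list are mine, not from the thread):

(* Remove the k-th element (0-based) of a list, returning it and the rest. *)
let rec pick_nth k = function
  | [] -> invalid_arg "pick_nth"
  | x :: rest ->
      if k = 0 then (x, rest)
      else
        let (y, rest') = pick_nth (k - 1) rest in
        (y, x :: rest')

(* Draw a uniformly random remaining element and cons it onto the
   accumulator until nothing is left: O(n^2) time, O(n) space. *)
let shuffle_list l =
  let rec loop acc rest len =
    if len = 0 then acc
    else
      let (x, rest') = pick_nth (Random.int len) rest in
      loop (x :: acc) rest' (len - 1)
  in
  loop [] l (List.length l)

The pick_nth walk is what costs O(n) per draw and gives the O(n^2) total; the tree discussed further down is what brings this to O(n log n).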
I can prove that yours is wrong. Let N=3. Your algorithm generates
three random numbers in the course of its execution. Each of these
random numbers is between 1 and 3. The result, R, is a function of
these three random numbers and nothing else.
There are 27 possible triples of random numbers (1,1,1) .. (3,3,3),
and 6 possible results R. Since 27 is not an exact multiple of 6,
your algorithm must select some results more often than others.
In general, N^N is not a multiple of N! unless N<=2, so your algorithm
is biased except when N<=2.
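This is easy to check exhaustively for N=3. The OCaml sketch below (my code, not from the thread) enumerates all 27 triples of random choices, runs the swap-with-anyone algorithm on [|0; 1; 2|], and tallies the resulting orderings; three of the orderings come up 4 times out of 27 and the other three come up 5 times.

let swap a i j =
  let t = a.(i) in
  a.(i) <- a.(j);
  a.(j) <- t

let count_orderings () =
  let tally = Hashtbl.create 6 in
  for r0 = 0 to 2 do
    for r1 = 0 to 2 do
      for r2 = 0 to 2 do
        (* one possible run of "swap element i with a random element" *)
        let a = [| 0; 1; 2 |] in
        swap a 0 r0; swap a 1 r1; swap a 2 r2;
        let key = Printf.sprintf "%d%d%d" a.(0) a.(1) a.(2) in
        let seen = try Hashtbl.find tally key with Not_found -> 0 in
        Hashtbl.replace tally key (seen + 1)
      done
    done
  done;
  Hashtbl.iter
    (fun k n -> Printf.printf "%s appears %d times out of 27\n" k n)
    tally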
You can get O(n log n) by using a tree instead of a list.
--
Rémi Vanicat
van...@labri.u-bordeaux.fr
http://dept-info.labri.u-bordeaux.fr/~vanicat
Hmm... I'd think this heavily depends on the details of the
implementation of permutations.
Am I guessing right here?
> So unless I'm
> mistaken, you're left with only O(n!) time complexity. Might still be
> enough to put some people off, though.
Are you sure you meant O(n!) here? That's one of the worst complexities
that one encounters in practice!
> > Is there a functional, efficient algorithm for generating a random
> > permutation?
>
> Uh, given a list of elements, draw a random one and add to an
> accumulator list until none are left?
For example.
> Gives you O(n^2), but that's
> because you have to traverse lists - and the constant factor is low.
O(N^2) isn't "efficient" if a linear one exists. The worst I'd accept is
O(N log N) (and when it comes to log N factors, I'm more concerned about
constants than big-O complexity).
One could use a tree that stores the not-yet-assigned numbers. We'd have
to store the size of each subtree in its root to guide the random-number
generator. And, while we're at it, we can dispense with the actual numbers,
since each generated number can be inferred from the sequence of
left-right decisions, leaving us with a tree in which each node
stores the count of unused numbers in some power-of-two-sized interval.
This would have a time complexity of O(N log N) (and O(N) space), but with
a relatively high constant factor.
> And space is O(n).
Agreed.
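Here is a rough OCaml sketch of that counted-tree idea (the types and names are mine, and it simply splits each interval in half rather than insisting on power-of-two blocks; each node caches how many numbers below it are still unused):

type tree =
  | Leaf of bool ref                  (* true = number still unused *)
  | Node of int ref * tree * tree     (* count of unused leaves below *)

(* Build a tree covering the integer interval [lo, hi), with hi > lo. *)
let rec make lo hi =
  if hi - lo = 1 then Leaf (ref true)
  else
    let mid = (lo + hi) / 2 in
    Node (ref (hi - lo), make lo mid, make mid hi)

let unused = function
  | Leaf u -> if !u then 1 else 0
  | Node (c, _, _) -> !c

(* Return the k-th (0-based) still-unused number under t, which covers
   [lo, hi), and mark it used.  One root-to-leaf walk, so O(log N). *)
let rec pick_unused t lo hi k =
  match t with
  | Leaf used -> used := false; lo
  | Node (count, l, r) ->
      decr count;
      let mid = (lo + hi) / 2 in
      let in_left = unused l in
      if k < in_left then pick_unused l lo mid k
      else pick_unused r mid hi (k - in_left)

let tree_shuffle n =
  if n = 0 then [||]
  else begin
    let t = make 0 n in
    let out = Array.make n 0 in
    for i = 0 to n - 1 do
      out.(i) <- pick_unused t 0 n (Random.int (n - i))
    done;
    out
  end

Building the tree is O(N) and each of the N picks is O(log N), so the whole thing is O(N log N) time and O(N) space; it is only "functional" in the sense that the structure is a tree, since the counts are updated in place.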
> Ketil Z Malde <ke...@ii.uib.no> wrote:
> > "Joachim Durchholz" <joac...@gmx.de> writes:
> >
[I wrote]
> > > > do r <- newStdGen -- start off a random generator
> > > > let (index, nextgen) = randomIvalInteger (0, factorial 1600 - 1) r
> > > > -- generate a random integer between 0 and 1600!
> > > > return (permutations [0..1599] !! index)
> >
> > > Yup, it's O(N!) in time and space, since permutations is generating all
> > > of them.
> >
> > Uh, doesn't laziness turn it into an O(n) space thing?
>
> Hmm... I'd think this heavily depends on the details of the
> implementation of permutations.
I think so.
> Am I guessing right here?
>
> > So unless I'm
> > mistaken, you're left with only O(n!) time complexity. Might still be
> > enough to put some people off, though.
>
> Are you sure you meant O(n!) here?
He did. I'm not sure that people have fully appreciated the
implications of the smiley in my original post!
--
Jón Fairbairn Jon.Fa...@cl.cam.ac.uk
Code in Ocaml:
let _ = Random.self_init ()
(* shuffle the array in place *)
let shuffle (a:'a array) : unit =
  let cmp a b = (Random.int 3) - 1 in
  Array.sort cmp a
(* return a permuted array of ints of size l *)
let permutated_int (l:int) : int array =
  let a = Array.init l (fun ii -> ii) in
  shuffle a;
  a
and when you run it you get:
# permutated_int 10;;
- : int array = [|3; 5; 9; 4; 1; 6; 8; 7; 2; 0|]
# permutated_int 20;;
- : int array =
[|11; 18; 3; 14; 2; 8; 7; 17; 9; 10; 5; 16; 13; 1; 15; 19; 12; 6; 4; 0|]
# permutated_int 20;;
- : int array =
[|9; 4; 17; 5; 14; 18; 7; 6; 11; 15; 19; 10; 13; 12; 16; 3; 8; 2; 1; 0|]
# permutated_int 20;;
- : int array =
[|4; 8; 16; 7; 17; 6; 10; 14; 15; 18; 11; 19; 9; 3; 0; 5; 13; 2; 1; 12|]
#
--
Mattias Waldau, http://www.abc.se/~m10217/
> I would use sort to shuffle, and have a compare function that
> is random. You will get an n log n shuffle using this technique.
Argh! There is a risk that the sort function will loop, or run into an
"array out of bounds" error. All sorting algorithms rely on the comparison
function being a total order, which is certainly not the case for yours:
> let cmp a b = (Random.int 3) - 1
Better not do this -- unless you're competing for the "best abuse of a
library function" contest :-)
What you could do, however, is associate random integers to each element of
the array to be shuffled, then sort the (elt, random) pairs by increasing
"random" components. That should give you a good shuffle of the "elt"
components, in n log n time.
- Xavier Leroy
--
Valid e-mail address (without the underscores): Xavier.Leroy@i_n_r_i_a.f_r
This is a protection against junk mail. Apologies for the inconvenience.
Home page: http://pauillac.inria.fr/~xleroy/
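A minimal OCaml sketch of this tag-and-sort idea (the name shuffle_by_keys is mine; see the follow-ups below about what happens if two keys collide):

let shuffle_by_keys a =
  (* pair every element with a random key ... *)
  let tagged = Array.map (fun x -> (Random.bits (), x)) a in
  (* ... sort by the keys (a genuine total order this time) ... *)
  Array.sort (fun (k1, _) (k2, _) -> compare k1 k2) tagged;
  (* ... and throw the keys away *)
  Array.map snd tagged

For the original problem this would be shuffle_by_keys (Array.init 1600 (fun i -> i)).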
Yeah... but can you prove that the resulting distribution is truly uniform?
You'd have to prove that the sort algorithm, whatever it is, will place
an element in any array slot with equal probability if the comparison
operation is random. Quicksort is optimized for other purposes, so I
doubt it will have that property. OTOH if it does work, I suspect that
it will work for any sort algorithm.
Food for thought for algorithm researchers :)
> What you could do, however, is associate random integers to each element of
> the array to be shuffled, then sort the (elt, random) pairs by increasing
> "random" components. That should give you a good shuffle of the "elt"
> components, in n log n time.
That's what I'd do, though you need to make sure that the randoms are
all distinct, otherwise you get the n^n instead of n! problem
mentioned earlier in the thread in connexion with one of the more
imperative algorithms. Then the difficulty is to generate n /distinct/
random numbers in reasonable time/space.
--
Jón Fairbairn Jon.Fa...@cl.cam.ac.uk
Linear congruential generators have that property. Since they are of
the form x' <- (ax + b) mod m, you get a repeat only when the prng
loops. So a long-period generator will produce numbers without repeats.
This is an occasional justification for using them -- I can't think of
any other situation in which you would want to.
Neel
> That's what I'd do, though you need to make sure that the randoms are
> all distinct, otherwise you get the n^n instead of n! problem
> mentioned earlier in the thread in connexion with one of the more
> imperative algorithms. Then the difficulty is to generate n /distinct/
> random numbers in reasonable time/space.
As someone else responded, if your PRNG has a large period, this won't
happen. In the general case, it's also improbable to get two identical
random numbers if your sequence is short enough. E.g. assuming 64-bit
random numbers, collisions are unlikely if the sequence to shuffle has
much less than 2^32 elements, which should often be the case :-)
> In the general case, it's also improbable to get two identical
> random numbers if your sequence is short enough. E.g. assuming 64-bit
> random numbers, collisions are unlikely if the sequence to shuffle has
> much less than 2^32 elements, which should often be the case :-)
Well, yes, that's true. But it doesn't /feel/ right to rely on
it, somehow!
--
Jón Fairbairn Jon.Fa...@cl.cam.ac.uk
Quicksort and mergesort are the sorting algorithms that I would most
expect to make a good shuffle.
In quicksort, the call to partition will take one element at a time
and decide whether it should go into the left or the right subpart; this
is then repeated recursively for the subparts.
In mergesort, the merge step will take the first element from either
the left or the right subpart, and then continue recursively.
Regarding Xavier Leroy's fear that the sorting library might crash
if compare isn't a total ordering, of course I cannot argue, since
Xavier Leroy is probably the one who implemented the sorting in Ocaml.
A good and pure sorting function should never be able to detect that
the compare function is cheating, since the only way to do that
would be to compare the same two elements twice. That compare is a
total ordering is only needed in order to be sure that the result
is sorted.
Al
For shuffling, LCGs are no good, because they do not have enough state.
Say your LCG has N bits of state. You are trying to produce an
unbiased shuffle of K items. Since 2^N is not a multiple of K! (for K >= 3), there is bias.
For example, consider an LCG with 15 bits of state. You are trying to
shuffle 7 items. You have just put 32768 items (the 32768 possible
outputs of the LCG) into 5040 boxes (the shuffles). The best possible
distribution will assign 6 random seeds to some permutations and 7
seeds to some others; this means that even in the optimal case some
permutations will come up 17% more often than others. With this much
bias, it is not clear that it is even worth using an 'unbiased'
algorithm.
A 15-bit LCG is not random enough to shuffle more than about 5 items
effectively. A 64-bit LCG will suffice to shuffle a list of up to 12
items. As the size of the list K increases, the amount of state
necessary in the LCG to achieve a reasonably unbiased shuffle
increases as O(K log K).
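To make the O(K log K) growth concrete, a small OCaml helper (mine, not from the post) that computes the minimum number of state bits N with 2^N >= K!, i.e. N >= log2(K!):

let bits_for_unbiased_shuffle k =
  (* log2(k!) as a running sum of log2 i, rounded up *)
  let log2_fact = ref 0.0 in
  for i = 2 to k do
    log2_fact := !log2_fact +. log (float i) /. log 2.0
  done;
  int_of_float (ceil !log2_fact)

This gives 13 bits for 7 items (so the 15-bit LCG above barely covers the permutations at all), 29 bits for 12 items, and roughly 14,700 bits for the 1600 items of the original problem -- and covering the permutations is only a necessary condition, not a sufficient one.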
That's not really a disproof. It still depends on how often the random
comparison is applied to what elements.
I don't think that such abstract arguments would help much. Somebody
would have to either go through the gory details of analysing the effect
of a random sort order on a concrete sort algorithm, or be a genius and
give a *strict* argument why sorting by "random order" is a good or a
bad idea.
A better idea might be to apply the generate-next-random-number
algorithm to all numbers and use these numbers to establish a sort
order.
As Jon has pointed out, if you're truly worried about the quality of the
randomness, you'd have to use a PRNG with more bits of state than each
number alone can provide.
(Hope you can sort this out, it's not too clear what I'm writing here
but I don't have the time to reword this once again.)
If you check on this to see what the distribution of results is,
you will almost certainly find that it is very non-random looking.
Running with the original spec (1600 elements to sort, initially
arranged 0 to 1599), I find that the array returned has the first
elements too high by about 10%, and the last elements too low by
about 98%.
Look at the last two or three elements in the arrays returned below,
and you can see that this effect shows in your results, too:
> # permutated_int 10;;
> - : int array = [|3; 5; 9; 4; 1; 6; 8; 7; 2; 0|]
> # permutated_int 20;;
> - : int array =
> [|11; 18; 3; 14; 2; 8; 7; 17; 9; 10; 5; 16; 13; 1; 15; 19; 12; 6; 4; 0|]
> [|9; 4; 17; 5; 14; 18; 7; 6; 11; 15; 19; 10; 13; 12; 16; 3; 8; 2; 1; 0|]
> # permutated_int 20;;
> - : int array =
> [|4; 8; 16; 7; 17; 6; 10; 14; 15; 18; 11; 19; 9; 3; 0; 5; 13; 2; 1; 12|]
Al
Damn, you're right -- I should have thought of that. Thanks for
pointing out my error!
Neel
Well, you can check for repeats in n log n time, and repeat if
necessary. So you don't get a guaranteed time bound, but you get
guaranteed correctness and an infinitesimal chance of repeating
more than a constant number of times.
--
Jeffrey M. Vinocur * jm...@cornell.edu
http://www.people.cornell.edu/pages/jmv16/
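Building on the key-sorting sketch earlier in the thread, a rough version of this retry idea (the name is mine; with 30-bit keys and a short array the retry will almost never fire):

let rec shuffle_distinct_keys a =
  let tagged = Array.map (fun x -> (Random.bits (), x)) a in
  Array.sort (fun (k1, _) (k2, _) -> compare k1 k2) tagged;
  (* after sorting, duplicate keys would be adjacent *)
  let collision = ref false in
  for i = 1 to Array.length tagged - 1 do
    if fst tagged.(i) = fst tagged.(i - 1) then collision := true
  done;
  if !collision then shuffle_distinct_keys a   (* try again with fresh keys *)
  else Array.map snd tagged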
Ryan Tarpine wrote:
> What's the easiest way to get a shuffled list or array of a sequence of
> numbers? e.g. I need 0 through 1599 in random order, each number appearing
> exactly once. Ocaml if possible. Thanks in advance
>
i'm still new at functional programming, so here's a C++ algorithm. you'll
need a better way to seed the random variable though...
#include <iostream>
#include <cstdlib>
using namespace std;

void shuffle (int []);

int main()
{
    int arr[1600];
    for (int i = 0; i < 1600; i++)
        arr[i] = -1;              /* -1 marks a slot that is still empty */
    shuffle(arr);
    return 0;
}

/*
 * I need 0 through 1599 in random order, each number appearing exactly
 * once. Ocaml if possible.
 */
void shuffle (int shuffarr[])
{
    int rnum;
    for (int i = 0; i < 1600; i++) {
        do {
            rnum = rand() % 1600;     /* a random slot in 0 .. 1599 */
        } while (shuffarr[rnum] != -1);
        shuffarr[rnum] = i;
    }
}
hope this helps a little.
cheers,
christian
It does not, as long as the total number of comparisons is finite.
There is no way to add up a finite number of fractions of the form
2^(-k) and get 1/6.
>Somebody would have to either go through the gory details of
>analysing the effect of a random sort order on a concrete sort
>algorithm,
Fine. I'll use the following sort algorithm for three elements a,b,c:
Compare a and b.
  If a<b, compare b and c.
    If b<c, you are done: a<b<c
    If c<b, compare a and c.
      If a<c, you are done: a<c<b
      If c<a, you are done: c<a<b
  If b<a, compare b and c.
    If b<c, compare a and c.
      If a<c, you are done: b<a<c
      If c<a, you are done: b<c<a
    If c<b, you are done: c<b<a
If each 'comparison' is done at random with a 1/2 probability of going
either way, then the outcome is:
a<b<c 1/4
a<c<b 1/8
c<a<b 1/8
b<a<c 1/8
b<c<a 1/8
c<b<a 1/4
Some of the orderings appear twice as often as others.
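The bias is also easy to see empirically. The OCaml sketch below (mine) runs exactly this decision tree with a fair coin flip for each "comparison" and tallies the six outcomes; the frequencies settle near the 1/4 and 1/8 values in the table rather than a flat 1/6.

let () = Random.self_init ()

(* One run of the three-element decision tree with random "comparisons". *)
let random_sort3 () =
  let lt () = Random.bool () in          (* a fair coin per comparison *)
  if lt () then                          (* "a < b" *)
    if lt () then "abc"                  (* "b < c" *)
    else if lt () then "acb" else "cab"  (* compare a and c *)
  else                                   (* "b < a" *)
    if lt () then                        (* "b < c": compare a and c *)
      (if lt () then "bac" else "bca")
    else "cba"

let () =
  let trials = 600_000 in
  let tally = Hashtbl.create 6 in
  for _i = 1 to trials do
    let o = random_sort3 () in
    let seen = try Hashtbl.find tally o with Not_found -> 0 in
    Hashtbl.replace tally o (seen + 1)
  done;
  List.iter
    (fun o ->
       let seen = try Hashtbl.find tally o with Not_found -> 0 in
       Printf.printf "%s  %.3f\n" o (float seen /. float trials))
    ["abc"; "acb"; "cab"; "bac"; "bca"; "cba"]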
No sorting algorithm can do better than this in general, and most will
do exactly this. For instance, a straightforward implementation of
quicksort produces a table just like this one when it is run on a
3-element set. The pivot element is chosen. It is compared with the
other two elements. Half the time it turns out to be the middle
element and no more comparisons are done. The other half the time it
is either the largest or smallest, and the remaining two items must be
compared.
You can already see that something is wrong, because the pivot element
should be the middle element 1/3 of the time, not 1/2 of the time.
It's now trivial to prove that quicksort with a fair random comparison
function on any number of elements produces a biased shuffle. The
pivot element might be larger than all but three of the remaining
elements, and in that case the remaining three elements will be
recursively sorted with quicksort, which we've already seen will order
them in a biased way.
You can fix this by adjusting the random comparison function so that
it doesn't always choose between a<b and b<a equiprobably. But to do
that probably turns out to be equivalent to the simpler well-known
algorithms like Fisher-Yates.
>A better idea might be to apply the generate-next-random-number
>algorithm to all numbers and use these numbers to establish a sort
>order.
That works, if there is enough entropy in the PRNG. Fisher-Yates
will be more efficient, if your language has arrays.
That's less efficient than necessary, and has a *much* worse worst-case
efficiency than other algorithms that have been proposed!
Uh, sorry, somehow my post got threaded after the wrong post. I couldn't
agree more with your analysis (my fault, probably).
Anyway, the issue has been discussed far beyond the point of usefulness -
a good algorithm was found, and we've entered the hair-splitting stage.
First we need to decide what the perfect shuffle is.
Let's consider a sequence of n elements: (e1, e2, ...en). Intuitively,
the perfect random shuffle will be a permutation chosen uniformly at
random from the set of all possible n! permutations. Let (b1, b2,
... bn) be such a random permutation. Of all n! permutations, (n-1)!
of them will have e1 at the first position, (n-1)! more will have
element e2 at the first position, etc. Therefore, in the perfect
shuffle (b1, b2, ... bn) b1 is uniformly randomly chosen among {e1,
... en}. The second element b2 of the shuffled sequence is uniformly
randomly chosen among {e1, ... en} - {b1}. The third element of the
shuffle b3 is uniformly randomly chosen among {e1, ... en} - {b1, b2},
etc. Therefore, to perform a perfect shuffle, we need a sequence of
numbers (r1, r2, ... rn) where r1 is a sample of a random
quantity uniformly distributed within [0..n-1]. r2 is an independent
sample of a random quantity uniformly distributed within
[0..n-2]. Finally, rn is 0. It is easy to see that the joint probability
of (r1, r2, ... rn) is 1/n * 1/(n-1) * ... 1 = 1/n! -- which is to be
expected.
The imperative implementation of the algorithm is well known. Let's
arrange the input sequence (e1, e2, ...en) into an array 'a' so that
initially a[i] = ei, i=1..n. At step 1, we choose a random number r1
uniformly from [0..n-1]. Thus a[1+r1] will be the first element of the
permuted sample, b1. We swap a[1+r1] and a[1]. Thus a[1] will contain
b1. At step 2, we choose a random number r2 uniformly from
[0..n-2]. a[2+r2] will be a uniform sample from {e1..en} - {b1}. After
we swap a[2] and a[2+r2], a[2] will be b2, and a[3..n] will contain
the remaining elements of the original sequence, {e1..en} -
{b1,b2}. After n-1 steps, we're done. The imperative algorithm in
OCaml has been already posted on this thread.
Let's consider a functional implementation. The code will take as its
input a sequence (r1, r2, ... rn) where ri is an independent sample of
a random quantity uniformly distributed within [0..n-i]. We can obtain
r[i] as rng[i] mod (n-i+1) for i<n; r[n] is 0. Here, rng is a sequence of
samples independently drawn from a uniform random distribution
[0..M-1]. We can obtain rng from radioactive decay measurements or
from published tables of random numbers. We can compute rng by
applying a digest function (e.g., md5) to a big collection of events
considered random (keystroke presses, ethernet packet arriving times,
checksums in TCP packets from several IP addresses, etc). Some
systems have a /dev/random, which is a source of a cryptographically
strong random numbers. Finally, rng can be a suitable pseudo-random
number generator (PRNG). Note, PRNGs are specifically tested to ensure
that (rng mod k) is uniformly distributed. The latter phrase does not mean that
the histogram of (rng mod k) is perfectly flat. Even the "real"
random-number generators don't have the perfectly flat histogram (for
any finite number of samples). It's important that the histogram is
"statistically" flat (in the sense of chi-squared or
Kolmogorov-Smirnov criteria). A volume of Knuth discusses PRNG tests
in great detail.
First, a naive functional shuffle. Because in the sequence
(r1,...rn), rn is always zero, we pass to function 'shuffle' a
sequence of n-1 random numbers.
extract:: Integer -> [a] -> (a,[a])
-- given a list l, extract the n-th element and return the element and
-- the remaining list. We won't worry about the order of the list
-- of the remaining elements. n>=0. n==0 corresponds to the first element.
extract 0 (h:t) = (h,t)
extract n l = loop n l []
  where
    loop 0 (h:t) accum = (h, accum ++ t)
    loop n (h:t) accum = loop (n-1) t (h:accum)
-- given a sequence (e1,...en) to shuffle, and a sequence
-- (r1,...r[n-1]) of numbers such that r[i] is an independent sample
-- from a uniform random distribution [0..n-i], compute the
-- corresponding permutation of the input sequence.
shuffle:: [b] -> [Integer] -> [b]
shuffle [e] [] = [e]
shuffle elements (r:r_others) = let (b,rest) = extract r elements
                                in b:(shuffle rest r_others)
Obviously the running time of "extract n l" is
O(length(l)). Therefore, the running time of shuffle is O(n^2).
The following is a more sophisticated algorithm
-- A complete binary tree, of leaves and internal nodes.
-- Internal node: Node card l r
-- where card is the number of leaves under the node.
-- Invariant: card >=2. All internal tree nodes are always full.
data Tree a = Leaf a | Node Integer (Tree a) (Tree a) deriving Show
fix f = g where g = f g -- The fixed point combinator
-- Convert a sequence (e1...en) to a complete binary tree
build_tree = (fix grow_level) . (map Leaf)
  where
    grow_level self [node] = node
    grow_level self l = self $ inner l

    inner [] = []
    inner [e] = [e]
    inner (e1:e2:rest) = (join e1 e2) : inner rest

    join l@(Leaf _) r@(Leaf _) = Node 2 l r
    join l@(Node ct _ _) r@(Leaf _) = Node (ct+1) l r
    join l@(Leaf _) r@(Node ct _ _) = Node (ct+1) l r
    join l@(Node ctl _ _) r@(Node ctr _ _) = Node (ctl+ctr) l r
-- example:
Main> build_tree ['a','b','c','d','e']
Node 5 (Node 4 (Node 2 (Leaf 'a') (Leaf 'b'))
(Node 2 (Leaf 'c') (Leaf 'd')))
(Leaf 'e')
-- given a sequence (e1,...en) to shuffle, and a sequence
-- (r1,...r[n-1]) of numbers such that r[i] is an independent sample
-- from a uniform random distribution [0..n-i], compute the
-- corresponding permutation of the input sequence.
shuffle1 elements rseq = shuffle1' (build_tree elements) rseq
  where
    shuffle1' (Leaf e) [] = [e]
    shuffle1' tree (r:r_others) =
      let (b,rest) = extract_tree r tree
      in b:(shuffle1' rest r_others)
-- extract_tree n tree
-- extracts the n-th element from the tree and returns
-- that element, paired with a tree with the element
-- deleted.
-- The function maintains the invariant of the completeness
-- of the tree: all internal nodes are always full.
-- The collection of patterns below is deliberately not complete.
-- None of the missing cases can occur (and if they do,
-- that's an error).
extract_tree 0 (Node _ (Leaf e) r) = (e,r)
extract_tree 1 (Node 2 (Leaf l) (Leaf r)) = (r, Leaf l)
extract_tree n (Node c (Leaf l) r) =
    let (e,new_r) = extract_tree (n-1) r
    in (e, Node (c-1) (Leaf l) new_r)
extract_tree n (Node n1 l (Leaf e))
    | n+1 == n1 = (e,l)
    | otherwise = let (b,new_l) = extract_tree n l
                  in (b, Node (n1-1) new_l (Leaf e))
extract_tree n (Node c l@(Node cl _ _) r)
    | n < cl    = let (e,new_l) = extract_tree n l
                  in (e, Node (c-1) new_l r)
    | otherwise = let (e,new_r) = extract_tree (n-cl) r
                  in (e, Node (c-1) l new_r)
-- examples
Main> shuffle1 ['a','b','c','d','e'] [0,0,0,0]
"abcde"
Note that an rseq of all zeros leaves the sequence unperturbed.
Main> shuffle1 ['a','b','c','d','e'] [4,3,2,1]
"edcba"
An rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements.
Main> shuffle1 ['a','b','c','d','e'] [2,1,2,0]
"cbead"
Just some random shuffle.
The function build_tree builds a complete binary tree, of depth
ceil(log2(n)). The function 'extract_tree' traverses the tree and
rebuilds a truncated branch. This requires as many steps as the length
of the rebuilt branch, which is at most ceil(log2(n)). To be more
precise, the complexity of 'extract_tree' is ceil(log2(size(tree))),
because extract_tree keeps the tree complete. The function shuffle1'
invokes 'extract_tree' (n-1) times. Thus the overall complexity is
O(n*logn).
> > ...though you need to make sure that the randoms are all distinct,
> Linear congruential generators have that property. Since they are of
> the form x' <- (ax + b) mod m, you get a repeat only when the prng
> loops. So a long-period generator will produce numbers without repeats.
I'm afraid this is not true. Indeed, a linear congruential generator
repeats _completely_ after a period: that is sample[i] ==
sample[i+period] for _all_ i. That does not mean that a particular
sample cannot repeat within a period. Suppose we have a sequence of
random numbers uniformly distributed within [0..M-1]. If the i-th
sample has the value of x, the (i+1)-th sample may be x as well. The
probability of this event is the same as the probability of the
(i+1)-th sample being x+1 or any other given number within
[0..M-1]. Each sample in a sequence is chosen independently. The
sample of random numbers may contain subsequences (0,0,0,0) -- which
are just as likely as (1,2,3,4) or (3,1,4,1) or any other given
sequence of four values. In fact, this is one of the PRNG tests --
making sure that all the tuples (r1,r2) or (r1,r2,r3) appear equally
likely. The infamous rand() generator fails the triples test -- and
the generator was much maligned in the G.Forsythe, M.Malcolm, C.Moler
book. BTW, given a sequence of random numbers, the probability that
two consecutive numbers in the sequence are the same is 1/M. If we're
to shuffle a sequence of 1600 elements, we are interested in random
numbers distributed within [0..1599]. We need to draw at least 1599
elements to shuffle the sequence, i.e., 1598 pairs of consecutive
drawings. Given that the probability of two consecutive samples being
the same is 1/1600, we should rather expect to see one such pair.
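Spelled out, that expectation is just linearity of expectation over the 1598 consecutive pairs of reduced values:

\[
E\bigl[\#\{\, i : r_i = r_{i+1} \,\}\bigr] = 1598 \times \frac{1}{1600} \approx 1 .
\]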