Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Compare two methods of random permutations

42 views
Skip to first unread message

Mok-Kong Shen

unread,
May 8, 2013, 5:35:18 PM5/8/13
to

I want to compare in a practical sense two methods of random
permutations -- one theoretically perfect, namely that of Fisher and
Yates, and another ad hoc, let's call it X. A way of comparison I could
think of is the following:

One starts from the standard configuration of n objects [0, 1, 2,
...., n-1] and apply each method a fairly large number of times
successively to permute them and each time one computes the Hamming
distance of the result from the standard configuaration. One obtains
thus the frequency distribution of the Hamming distances for each
method. If the frequency distributions are fairly comparable to each
other, then X could be practically employed in place of the
theoretically perfect one.

Is this line of thought of mine correct? Does anyone have an idea of
a better method of comparison? Thanks in advance.

M. K. Shen

James Dow Allen

unread,
May 12, 2013, 12:31:11 AM5/12/13
to
On May 9, 4:35 am, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> I want to compare in a practical sense two methods of random
> permutations -- one theoretically perfect, namely that of Fisher and
> Yates, and another ad hoc ...

No one answered yet, so let me take a stab at a possible approach.

Pick a small set size, perhaps 9 or 10. Do the shuffle several
million times, preparing a histogram for the 9! (or 10!) possible
results. Prepare a histogram of the population counts in the 1st
histogram; calculate its statistical moments. Compare said
moments with the theoretically perfect moments.

James

Mok-Kong Shen

unread,
May 13, 2013, 2:01:43 AM5/13/13
to
Am 08.05.2013 23:35, schrieb Mok-Kong Shen:
>
> I want to compare in a practical sense two methods of random
[snip]

The OP was also posted to sci.stat.math (08.05.2013 11:29). There
have been some discussions, with one post of mine containing
certain computational results of mine. Please kindly take a
look there.

M. K. Shen

Mok-Kong Shen

unread,
May 13, 2013, 2:07:22 AM5/13/13
to
Please kindly take a look of my computational results reported
in the thread in sci.stat.math (08.05.2013 11:29).

M. K. Shen

James Dow Allen

unread,
May 13, 2013, 5:34:21 AM5/13/13
to
Not to nitpick, but this is why Usenet has the concept of
posting to multiple groups. You don't create two different
threads and then add posts referencing one to the other. ::whack::
You put two newsgroups on the Newsgroups line,
separated by a comma.

There are newbie groups to explain this further and help
you get started on Usenet.

James

bob

unread,
May 13, 2013, 11:26:17 AM5/13/13
to
Why would you not use the method of random permutation that is theoretically perfect? Isn't it pretty much trivial to implement and not very hard to understand?

You basically have a vector of stuff.

Then you pick one random item from the vector.

Make it the first in the new list.

Remove it from the vector.

Pick another random item from the vector.

Make it the second in the new list.

Remove it from the vector.

Repeat until your initial vector is empty.

Thanks.

James Dow Allen

unread,
May 13, 2013, 2:42:30 PM5/13/13
to
On May 13, 10:26 pm, bob <b...@coolfone.comze.com> wrote:
> On Wednesday, May 8, 2013 4:35:18 PM UTC-5, Mok-Kong Shen wrote:
> > I want to compare in a practical sense two methods of random
>
> Why would you not use the method of random permutation that is theoretically perfect?  Isn't it pretty much trivial to implement and not very hard to understand?

Good question. I'm going to guess he's worried about speed,
but still don't know how expects to improve on Fisher shuffle,
which only needs 51 small random integers for a 52-card deck.

Perhaps he doesn't know that you can get several independent
small integers from a single random-number, e.g. to get a
12-permutation from a 31-bit random:

int a, b;
do a = rand();
while (a >= 1916006400);
b = a % 12; a /= 12; doshufstep(12, b);
b = a % 11; a /= 11; doshufstep(11, b);
b = a % 10; a /= 10; doshufstep(10, b);
b = a % 9; a /= 9; doshufstep( 9, b);
b = a % 8; a /= 8; doshufstep( 8, b);
b = a % 7; a /= 7; doshufstep( 7, b);
b = a % 6; a /= 6; doshufstep( 6, b);
b = a % 5; a /= 5; doshufstep( 5, b);
b = a % 4; a /= 4; doshufstep( 4, b);
b = a % 3; a /= 3; doshufstep( 3, b);
b = a % 2; a /= 2; doshufstep( 2, b);

(A good compiler will remember the quotient
after doing the modulus.)

IIRC, I explained this to someone with a name
similar to OP's last decade.

James

Mok-Kong Shen

unread,
May 13, 2013, 3:09:36 PM5/13/13
to
Am 13.05.2013 17:26, schrieb bob:
> Why would you not use the method of random permutation that is
> theoretically perfect?

It was my fault that I din't cross-post to 2 groups (thus I later
requested you to kindly look up there). I explained in sci.stat.math
my motivation as follows (note also that n=52 is only an example case):

...... In particular I don't need to
generate the whole range of possible permutations but on the other hand
a not too limited small range. Consider the case n=52, one knows that
in-shuffle and out-shuffle have periods of 52 and 8 respectively, which
are negligibly small compared to the whole range (of size (52!)). On the
other hand, the method of Fisher and Yates needs n-1 (pseudo-)random
numbers. In one of my applications a couple of such random numbers are
already available for other reasons, so it would certainly be
conveninent and advantageous, if I could just use them for doing random
permutation without having to invoke a PRNG to acquire the n-1 random
numbers needed for F&Y. This was the motivation of my OP and, as it
turns out, my scheme runs also much faster than F&Y as reported further
below.

M. K. Shen



bob

unread,
May 13, 2013, 3:57:21 PM5/13/13
to
I think you may be overestimating the computational expense of creating pseudo-random numbers. In general, it is not that expensive as this code from Java shows:

/**
* Returns a pseudo-random uniformly distributed {@code int} value of
* the number of bits specified by the argument {@code bits} as
* described by Donald E. Knuth in <i>The Art of Computer Programming,
* Volume 2: Seminumerical Algorithms</i>, section 3.2.1.
*
* <p>Most applications will want to use one of this class' convenience methods instead.
*/
protected synchronized int next(int bits) {
seed = (seed * multiplier + 0xbL) & ((1L << 48) - 1);
return (int) (seed >>> (48 - bits));
}



Thanks.

Mok-Kong Shen

unread,
May 13, 2013, 4:59:30 PM5/13/13
to
There are of course lots of fast PRNGs, but how good these are
is not always clear. Anyway, I believe that for large n, the
efficiency issue may be something worthy of consideration.

M. K. Shen

bob

unread,
May 13, 2013, 6:28:41 PM5/13/13
to
If you want to learn more about how well different PRNG algorithms work, you may want to read the book

Seminumerical Algorithms

by Donald Knuth.

Here's an outline of it:

Chapter 3 – Random Numbers

3.1. Introduction
3.2. Generating Uniform Random Numbers
3.2.1. The Linear Congruential Method
3.2.1.1. Choice of modulus
3.2.1.2. Choice of multiplier
3.2.1.3. Potency
3.2.2. Other Methods
3.3. Statistical Tests
3.3.1. General Test Procedures for Studying Random Data
3.3.2. Empirical Tests
3.3.3. Theoretical Tests
3.3.4. The Spectral Test
3.4. Other Types of Random Quantities
3.4.1. Numerical Distributions
3.4.2. Random Sampling and Shuffling
3.5. What Is a Random Sequence?
3.6. Summary


(from http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming)

Thanks.

Mok-Kong Shen

unread,
May 14, 2013, 4:54:00 AM5/14/13
to
Am 14.05.2013 00:28, schrieb bob:

> If you want to learn more about how well different PRNG algorithms work,
> you may want to read the book Seminumerical Algorithms
> by Donald Knuth.
[snip]

It's nice that you post the reference. I am in possession of that
great book since over a decade. I am interested in the topic of
PRN generation and has myself attempted to design PRNGs several times.
Recently I designed one, which is the function genrandom() that is
contained in my encryption software source code named JADE
(http://s13.zetaboards.com/Crypto/topic/6948465/1/#new). I should
be very grateful, if interested readers of this group would kindly
look at that PRNG and eventually give me comments and critiques.
(My email address above is a valid one.)

M. K. Shen



James Dow Allen

unread,
May 14, 2013, 8:32:24 AM5/14/13
to
On May 14, 3:54 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> be very grateful, if interested readers of this group would kindly

Would be delighted to chit-chat.
When are you going to acknowledge my code fragment
which demonstrates finding a perfect random 12-permutation
with a single call to random()?
A 52-permutation can be formed from eight 32-bit randoms.

James

Mok-Kong Shen

unread,
May 14, 2013, 4:46:10 PM5/14/13
to
I am confused. In which follow-up did you mention your code?

M. K. Shen


James Dow Allen

unread,
May 14, 2013, 5:19:57 PM5/14/13
to
On May 15, 3:46 am, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> I am confused. In which follow-up did you mention your code?

Message-ID: <33a7d2bf-c504-471a-
b79e-93d...@oy9g2000pbb.googlegroups.com>

https://groups.google.com/group/comp.programming/msg/38c32bb27f26201a?dmode=source

James

James Dow Allen

unread,
May 14, 2013, 6:06:46 PM5/14/13
to
On May 15, 4:19 am, James Dow Allen <jdallen2...@yahoo.com> wrote:
> On May 15, 3:46 am, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
>
> > I am confused. In which follow-up did you mention your code?
> https://groups.google.com/group/comp.programming/msg/38c32bb27f26201a...
>
> James

And, the reason I've been brusque or snippy in this thread is that
I EXPLAINED THIS TO YOU FOUR YEARS AGO, with you refusing to
take a few seconds to understand it.
https://groups.google.com/group/comp.programming/msg/da8573d12cad698f?dmode=source

You claimed then the method didn't work.
You were wrong then. You're still wrong.
Shall be bet money?

James

Mok-Kong Shen

unread,
May 15, 2013, 4:16:49 AM5/15/13
to
Oh, you should have provided a tiny little bit of context when
referring to matters 4 years ago! I'll look at the matter again.
Please give me 10 day time, ok?

M. K. Shen



James Dow Allen

unread,
May 15, 2013, 6:54:15 AM5/15/13
to
On May 15, 3:16 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:

> Oh, you should have provided a tiny little bit of context when
> referring to matters 4 years ago! I'll look at the matter again.
> Please give me 10 day time, ok?

Fine. But just to clear up one more of your confusions:
The code snippet in question which I linked to just now,
WAS POSTED JUST A FEW HOURS AGO; it wasn't
from the 4-years-ago thread.

James

Patricia Shanahan

unread,
May 15, 2013, 10:11:30 AM5/15/13
to
The objection back then was "Anyway, I doubt the usefulness of % here,
since, if t is uniformly distributed in [0, n-1], then for m<n the
values t mod m wouldn't be uniformly distributed in [0, m-1]
in general, if I don't err."

That seems to me to be a good point. Suppose, for example that n is 16,
and m is 7. The results of the % for each value of t are:

0 -> 0
1 -> 1
2 -> 2
3 -> 3
4 -> 4
5 -> 5
6 -> 6
7 -> 0
8 -> 1
9 -> 2
10 -> 3
11 -> 4
12 -> 5
13 -> 6
14 -> 0
15 -> 1

Three inputs map to each of 0 and 1, but only two inputs map to each of
the other values. If t were uniformly distributed over [0,15] then t%7
would not be uniformly distributed.

The % method does work correctly if m is a factor of n.

java.util.Random contains code to correct for this effect in its
nextInt(int).

Patricia

James Dow Allen

unread,
May 15, 2013, 2:04:13 PM5/15/13
to
On May 15, 9:11 pm, Patricia Shanahan <p...@acm.org> wrote:
> The % method does work correctly if m is a factor of n.

Duh. Yeah. Note that the code snippet I posted assures, via
do a = rand();
while (a >= 1916006400);
that the requirements are met throughout.

> java.util.Random contains code to correct for this effect ...

So do my libraries.
I don't think this qualifies as "rocket science."

James

James Dow Allen

unread,
May 15, 2013, 2:06:47 PM5/15/13
to
On May 16, 1:04 am, James Dow Allen <jdallen2...@yahoo.com> wrote:

> Duh. ...

And, yes, I did explain this all to OP four years ago in
Message-ID: <34e58089-64de-4988-b50b-
bf12c7...@k13g2000prh.googlegroups.com>

HTH, James

Patricia Shanahan

unread,
May 15, 2013, 3:55:35 PM5/15/13
to
Ah, I see. You get a number that is uniformly distributed over the range
from 0 to 1916006400-1. 1916006400 is an integer multiple of 12!, so the
"m is a factor of n" condition is true for each remainder operation.

Patricia

Chris Uppal

unread,
May 16, 2013, 1:10:48 AM5/16/13
to
James Dow Allen wrote:

> Duh. Yeah. Note that the code snippet I posted assures, via
> do a = rand();
> while (a >= 1916006400);
> that the requirements are met throughout.

Missed that aspect of it myself. Shows the value of comments in code...

But, why 1916006400 ? I may be missing something but it seems to me that
239500800 (12! / 2) would work just as well and not have a confusing/misleading
factor of 8 mismatch with the code.

-- chris


Alan Curry

unread,
May 16, 2013, 2:57:48 AM5/16/13
to
In article <O-CdnaNSZ9AA9wnM...@bt.com>,
Whenever rand() returns something bigger than 1916006400 you have to call it
again. And keep calling it until it gives you something good. If you lower
the limit, you'll spend more time in the loop waiting for a usable random
number.

1916006400 is the largest multiple of 239500800 that fits in a signed 32-bit
integer, so it minimizes the number of rand() results that have to be thrown
away. If your rand() doesn't return a signed 32-bit integer you should adjust
the constant accordingly.

--
Alan Curry

Chris Uppal

unread,
May 16, 2013, 4:26:11 AM5/16/13
to
Alan Curry wrote:

> 1916006400 is the largest multiple of 239500800 that fits in a signed
> 32-bit integer, so it minimizes the number of rand() results that have to
> be thrown away.

Ah yes! Got it now, thanks.

-- chris


James Dow Allen

unread,
May 16, 2013, 4:35:43 AM5/16/13
to
On May 16, 12:10 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> James Dow Allen wrote:
> > Note that the code snippet I posted assures, via
> >          do a = rand();
> >         while (a >= 1916006400);
> > that the requirements are met throughout.
>
> Missed that aspect of it myself.  Shows the value of comments in code...

I could have written
// 5 * 12! would be too large -- overflows RANDMAX
while (a >= 4 * (12*11*10*9*8*7*6*5*4*3*2));
Sorry. Still, making excuses for the inexcusable, an
alert reader would have wondered why the "while a>= " is present
at all. Perhaps I forgot I wasn't posting in rec.puzzles. :-)

My "real code" for such is quite different:
1) One wants to handle arbitrary random sizes, not
just 12,11,10,9, ... specifically.
2) The sample code discards unused 11% of its random
inputs. Not only can that be reduced, but *some* of the
random "entropy" can be preserved even on the discards.

I don't know how important such coding is.
The twiddling sacrifices much of the time saved by fewer
calls to random(). Still I will post more general routines
if there's interest. Please tell me if most people use
31-bit generators (Posix?) or 32-bit generators (Marsaglia).
Yes, I know about "#ifdef" -- I just don't like to
post cluttered code.

James

bob

unread,
May 16, 2013, 4:46:24 PM5/16/13
to
I think what you are talking about is called "permutation unranking".

Basically, you take a number (a rank), and you use some math to convert it to a permutation. Each rank corresponds to a unique permutation.

Thanks.


Mok-Kong Shen

unread,
May 17, 2013, 8:11:48 AM5/17/13
to
I suppose one has first to know in which range is rand()
(assumed to be) a uniformly distributed random variable. Could
you tell? Further, could you at least give a sketch of what you
indicated to be more general constructs of your design?

M. K. Shen


James Dow Allen

unread,
May 17, 2013, 11:13:02 AM5/17/13
to
On May 17, 7:11 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> I suppose one has first to know in which range is rand()
> (assumed to be) a uniformly distributed random variable. Could
> you tell? Further, could you at least give a sketch of what you
> indicated to be more general constructs of your design?

The method maintains variables a, b
where a is known to be a uniform variate
on {0, 1, 2, ..., b-1}
Initially, a=0, b=1.

To replenish when b becomes small, perform
b <-- b * 2^16
a <-- a * 2^16 + (16 random bits)
(The "16" is arbitrary example. Details may vary
according to word size and random() spec.)

To output a uniform variate on {0, 1, ..., N-1}
Retry:
q <-- a / N
r <-- a % N
m <-- b / N
If m == q Then
b = b % N
a = r
Replenish
goto Retry
Else
a = q
b = m
Return r

(Sorry if I've slipped translating from C to pseudo. ::whack:: )
Note that much of the division overhead is necessary in
any method, at least for "perfect" randoms.

Applications for such a method may be rare (though
not non-existent).
There are very fast medium-quality PRNG's good enough
for most applications when crypto-security is unneeded.
No one need bother with the method unless the PRNG
is slowish.

James

Mok-Kong Shen

unread,
May 19, 2013, 5:06:05 AM5/19/13
to
Am 17.05.2013 17:13, schrieb James Dow Allen:
> On May 17, 7:11 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
>> I suppose one has first to know in which range is rand()
>> (assumed to be) a uniformly distributed random variable. Could
>> you tell? Further, could you at least give a sketch of what you
>> indicated to be more general constructs of your design?
>
> The method maintains variables a, b
> where a is known to be a uniform variate
> on {0, 1, 2, ..., b-1}
> Initially, a=0, b=1.
[snip]

My first question was: You use rand(), presumably from
a certain programming language and that is fixed/unique
at least for a given compiler. Does one know in which
range [0, b-1] is rand() a uniform variate?

M. K. Shen

James Dow Allen

unread,
May 19, 2013, 6:14:42 AM5/19/13
to
On May 19, 4:06 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> Am 17.05.2013 17:13, schrieb James Dow Allen:> On May 17, 7:11 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> > The method maintains variables a, b
> > where a is known to be a uniform variate
> > on {0, 1, 2, ..., b-1}
> > Initially, a=0, b=1.
>
> My first question was: You use rand(), presumably from
> a certain programming language and that is fixed/unique
> at least for a given compiler. Does one know in which
> range [0, b-1] is rand() a uniform variate?

From your question, I'm afraid you failed to understand
my method. :-(

Anyway, for POSIX systems, rand() is uniform on
(0, 1, ..., RANDMAX}
HOWEVER, simplest, when you want best control and flexibility
is to link a specific PRNG into your application.
There are several such with source code easily found on-line.
Most return from {0, 1, ..., 2^32 - 1}

HOWEVER, this is unrelated to {0, ..., b-1} in the description
of the method I posted. That method maintains a variate
of ever-diminishing range, replenishing it by calling a PRNG
when the range becomes too small.
Hence the b in that description is variable.

James

Mok-Kong Shen

unread,
May 19, 2013, 7:26:22 AM5/19/13
to
But you do first get a number, say, R, from rand(), don't you?
If so, I think (at least till now) that the range of R, in
which R is (assumed to be) a uniform random variate, is of
significance.

Suppose now that R is uniformly distributed in [0, 2^32-1]. Then
as clearly shown in the post of Patricia Shanahan, one couldn't
"simply" use R%m to get a value that is uniformly distributed in
[0, m-1] for arbitrary m (which was my point of argument 4 years ago).

M. K. Shen

James Dow Allen

unread,
May 19, 2013, 7:59:23 AM5/19/13
to
And you're still wrong. Examine, with both eyes, EITHER the snippet
I posted 4 days ago, or the pseudo-code I posted three days ago.
Either of them deals with your objection.
Patricia figured it out. See if you can.
... I'm losing patience.

James

Mok-Kong Shen

unread,
May 19, 2013, 8:13:28 AM5/19/13
to
Let's be in details. Do you mean the following code?

do a = rand();
while (a >= 4 * (12*11*10*9*8*7*6*5*4*3*2));

Do you still maintain that, if a is (assumed to be uniform in
[0, 2^32-1], one could (theoretically correctly) get a uniformly
distributed random value in [0, m-1] for arbitrary m (e.g. 11),
with a%m? (Patricia Shanahan showed namely that one can't.)

M. K. Shen

James Dow Allen

unread,
May 19, 2013, 8:52:08 AM5/19/13
to
On May 19, 7:13 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> Am 19.05.2013 13:59, schrieb James Dow Allen:
> > ... I'm losing patience.

One LAST try.

> with a%m? (Patricia Shanahan showed namely that one can't.)

No. Patricia overlooked the "while (a >= 4 * Factorial 15)
and revised her comment when it was pointed out.

This is my final effort. Pay attention.

You need a random number on {1,2,3,4,5} but all you have
for generator is a 6-sided die. What do you do?
Devise a solution before reading on.

.

.

.

..

Spoiler follows.

,

,

,

,

,

,

Roll the die. Accept any of {1,2,3,4,5}
But roll again if you get 6.

That's what my code does. That's what
my code did 4 years ago. That's how
everyone does it.

Does this help?

James

Mok-Kong Shen

unread,
May 19, 2013, 9:06:38 AM5/19/13
to
Am 19.05.2013 14:52, schrieb James Dow Allen:
> On May 19, 7:13 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
>> Am 19.05.2013 13:59, schrieb James Dow Allen:
>>> ... I'm losing patience.
>
> One LAST try.
>
>> with a%m? (Patricia Shanahan showed namely that one can't.)
>
> No. Patricia overlooked the "while (a >= 4 * Factorial 15)
> and revised her comment when it was pointed out.
>
> This is my final effort. Pay attention.
[skip]

I overlooked. I see your scheme. But for e.g. n=52 (cf. card games)
what kind of rand() would you need?

M. K. Shen

James Dow Allen

unread,
May 19, 2013, 9:50:23 AM5/19/13
to
Use the scheme I show in
Message-ID: <d40069e8-8b3d-4514-877f-
d50c7b...@pd6g2000pbc.googlegroups.com>
with N taking the values 52, 51, 50, ...
(Fisher-Yates).

ANY PRNG (rand()) can be used for the replenishing.
Most straightforward might be to grab a good 32-bit
PRNG (e.g. Mersenne Twister) off the Internet.
In that case you might use the pseudo-code as is,
with its 16-bit replenish. (Please don't tell me you
need help satisfying TWO invocations of 16-bit
replenish with ONE call to the 32-bit PRNG.)

James

Mok-Kong Shen

unread,
May 19, 2013, 10:14:22 AM5/19/13
to
But then you are returning to the common Fisher and
Yates scheme, which employs generally a good PRNG
returning a PRN in [0, 1) as real numbers, as explained
in Knuth's book. That's nothing new, isn't it?

M. K. Shen

James Dow Allen

unread,
May 19, 2013, 11:11:15 AM5/19/13
to
On May 19, 9:14 pm, Mok-Kong Shen <mok-kong.s...@t-online.de> wrote:
> But then you are returning to the common Fisher and
> Yates scheme, which employs generally a good PRNG
> returning a PRN in [0, 1) as real numbers, as explained
> in Knuth's book. That's nothing new, isn't it?
>
> M. K. Shen

Invoke the PRNG 8 times instead of 51 times.
What were you trying to do anyway, do
you even remember?

Mok-Kong Shen

unread,
May 19, 2013, 11:15:48 AM5/19/13
to
Do you have a "different" while-clause in the 8 invocations?
(You haven't anyway treated this, if I don't err.)

M. K. Shen

Mok-Kong Shen

unread,
May 19, 2013, 11:24:21 AM5/19/13
to
Am 19.05.2013 17:11, schrieb James Dow Allen:
[Addendum to my previous post] I forgot to answer your
question. I want to, if possible, use only 2 PRNs
in range [0,n-1] for e.g. n=52.

M. K. Shen


0 new messages