Constructing a random permutation on the fly

Tom Anderson

unread,

Jun 8, 2009, 6:57:22 PM6/8/09

to

Hello!

This question is not really about cryptography, but it has a
cryptographyish flavour, so i thought i'd ask it here. If there's a better
place to ask it, please let me know.

I have a list of N items, where N is about 100 000, and a few thousand or
more users. I would like to present each user with all the items in a
random order, where the order is different for each user. I only need to
go through each user's randomised list of items one by one, rather than
having random access into the list, but the process of going through them
could take days, weeks, perhaps even months.

I could put all the items in an array, shuffle them, and go through them
from first to last, storing the shuffled array for as long as it's needed.
Or i could put the numbers 0 ... N-1 in an array, shuffle and store that,
and go through that, using the numbers and indices into a master list of
items. However, i would like to avoid having to store great big lists of
numbers if at all possible.

So, what can i do?

My thinking is that i need three things. First, a counter for each user,
which starts at 0 and goes up to N-1. Secondly, a random(ish) key K, of
some form. Thirdly, a function which goes from a number in the range 0 ...
N-1 and a key to another number in the range 0 ... N-1, with the three
properties that (i) for any fixed value of the key, the mapping between
numbers is bijective, (ii) that mapping looks random (for some vague but
suitable definition of 'random'), and (iii) that for different values of
the key, the mapping looks different (for some vague but suitable
definition of 'different'). Then, when i need an item, i just put the
counter and the key through the function to get an index into a master
list of items, and then increment the counter. Thus, i can construct a
permutation of the items on the fly, storing only the counter and the key.

The problem is that i have no idea how to construct the function. Any
suggestions?

A simple one would be f(counter, key) = (counter + key) mod N, but that
fails criterion (ii), and possibly (iii) as well. If N+1 was prime, i
could do f(counter, key) = (((counter + 1) * (key + 1)) mod (N+1)) - 1,
which would be a bit better. If N was 2^64, i could use a block cipher
with a 64-bit block, which would be great.

But i'd really like a function that would work for any N.

The best i've come up with so far is something like (apologies if this
explanation is bad, or if the idea is bogus):

- Find the prime factors of N

- Express them as powers of primes (so 50 is 2^1 * 5^2)

- Construct Galois fields of order p^n for each

- Construct the Cartesian product of the Galois fields; note that the
order of this is N (right?)

- Define a reversible mapping from the numbers 0 ... N-1 to elements of
the Cartesian product set; it doesn't matter what it is, so do it by
going through the constituent Galois fields one by one, picking the element x
mod p^n for that field, and updating x to be x / p^n (like the fields
were digits in some really trippy positional digital notation)

- Choose a key from the range 0 ... N-1

- Combine two numbers in the range 0 ... N-1 by converting them to
elements of the Cartesian product, combining them by elementwise
addition, and converting the result back to a number

I started that description thinking i'd be able to use multiplication in
GF(p^n) at the climax, but then realised that that wouldn't be bijective
because of the zeroes. So i've used addition, which is much less
avalanchey. As it stands, that function would be stunningly bad for
numbers of the form p^n - it would reduce to the (counter + key) mod N
function i dismissed above.

Okay, so can i factorise N (mostly) into subprime factors, and then use
elementwise modular multiplication with a +1/-1 shimmy to avoid zeroes,
also as above? Does that sentence make any sense at all to anyone?

Anyway, as you can see, i've got nothing. Any and all suggestions welcome!

tom

--
The girlfriend of my friend is my enemy. -- old Arabic proverb

Tom Anderson

unread,

Jun 8, 2009, 7:12:05 PM6/8/09

to

On Mon, 8 Jun 2009, Tom Anderson wrote:

> The best i've come up with so far is something like (apologies if this
> explanation is bad, or if the idea is bogus):
>
> - Find the prime factors of N
>
> - Express them as powers of primes (so 50 is 2^1 * 5^2)
>
> - Construct Galois fields of order p^n for each
>
> - Construct the Cartesian product of the Galois fields; note that the
> order of this is N (right?)
>
> - Define a reversible mapping from the numbers 0 ... N-1 to elements of
> the Cartesian product set; it doesn't matter what it is, so do it by
> going through the constituent Galois fields one by one, picking the element
> x
> mod p^n for that field, and updating x to be x / p^n (like the fields
> were digits in some really trippy positional digital notation)
>
> - Choose a key from the range 0 ... N-1
>
> - Combine two numbers in the range 0 ... N-1 by converting them to
> elements of the Cartesian product, combining them by elementwise
> addition, and converting the result back to a number
>
> I started that description thinking i'd be able to use multiplication in
> GF(p^n) at the climax, but then realised that that wouldn't be bijective
> because of the zeroes. So i've used addition, which is much less avalanchey.
> As it stands, that function would be stunningly bad for numbers of the form
> p^n - it would reduce to the (counter + key) mod N function i dismissed
> above.

OKay, so what if i do something like that, but use the p^n numbers as
moduli for a system of congruences in Chinese Remainder style? I map from
numbers to tuples by simply taking x mod p^n for each element, and from
tuples to numbers by solving the Chinese Remainder problem. The Chinese
Remainder Theorem says that for any combination of moduli and coefficients
(or whatever you call them), al the solutions are congruent mod N, so if
we're only looking at numbers in 0 ... N-1, then there's only one
solution, so the mapping between numbers and coefficients is bijective
(which is well-known, right?). So that would work.

Still no idea if it would be a good randomiser, though!

Supplementary question: is there a name for the kind of function where if
you have f(x, y) = z, where x, y and z are all in the same set, then for
any fixed K, f(K, y) and f(x, K) are both bijective? Like modular
addition, xor, my crazy function above, etc. I've heard the term 'mixing
function', but is that right, and is there anything more proper?

tom

--
The revolution is here. Get against the wall, sunshine. -- Mike Froggatt

Bryan Olson

unread,

Jun 8, 2009, 9:27:14 PM6/8/09

to

Tom Anderson wrote:
> I have a list of N items, where N is about 100 000, and a few thousand
> or more users. I would like to present each user with all the items in a
> random order, where the order is different for each user. I only need to
> go through each user's randomised list of items one by one, rather than
> having random access into the list, but the process of going through
> them could take days, weeks, perhaps even months.
>
> I could put all the items in an array, shuffle them, and go through them
> from first to last, storing the shuffled array for as long as it's
> needed. Or i could put the numbers 0 ... N-1 in an array, shuffle and
> store that, and go through that, using the numbers and indices into a
> master list of items. However, i would like to avoid having to store
> great big lists of numbers if at all possible.
>
> So, what can i do?

With tiny numbers like 100,000, shuffling is trivial. Instead of storing
the permutations, store, per user, a key to initialize the pseudo-random
number generator that drives the shuffling, so you can re-generate that
user's permutation when needed.

> My thinking is that i need three things. First, a counter for each user,
> which starts at 0 and goes up to N-1. Secondly, a random(ish) key K, of
> some form. Thirdly, a function which goes from a number in the range 0
> ... N-1 and a key to another number in the range 0 ... N-1, with the
> three properties that (i) for any fixed value of the key, the mapping
> between numbers is bijective, (ii) that mapping looks random (for some
> vague but suitable definition of 'random'), and (iii) that for different
> values of the key, the mapping looks different

[...]

> If N was 2^64, i could use a block cipher
> with a 64-bit block, which would be great.
>
> But i'd really like a function that would work for any N.

That's kind of an interesting sci.crypt question, but the issues are
probably more complex than you want to take on. Consider, for example,
that the most popular 64-bit block cipher, DES (or triple-DES),
generates only even permutations. We don't think that's a problem for
DES because we don't expect anyone to nearly exhaust the mapping, but if
your function had the same flaw, a user approaching the end of his
permutation might be able to deduce adversarially useful information.

--
--Bryan

Paul Rubin

unread,

Jun 9, 2009, 1:38:43 AM6/9/09

to

Tom Anderson <tw...@urchin.earth.li> writes:
> I have a list of N items, where N is about 100 000, and a few thousand
> or more users. I would like to present each user with all the items in
> a random order, where the order is different for each user.

Short answer: look up "Hasty Pudding cipher".

Longer answer: let b = smallest even number so that 2**b >= N-1. Then
for k < N, represent k as a b-bit number. Chop that into two b/2-bit
halves. Use a generic Feistel cipher to encrypt the pair of
half-numbers with a user-specific key. Use a bunch of rounds to
ensure uniformity. Use the Hasty Pudding trick (look it up) to ensure
the result is in the range [0,N-1]. Use the encrypted number as
an index into your table of items. For each user, you have to
remember the user key (maybe derived from the user ID) and the
index of the last item sent.

Thomas Pornin

unread,

Jun 9, 2009, 6:56:31 AM6/9/09

to

According to Tom Anderson <tw...@urchin.earth.li>:

> The problem is that i have no idea how to construct the function. Any
> suggestions?

From a cryptography point of view, you want a block cipher, i.e. a
pseudo-random permutation, selected by the key and operating on the
[0..N-1] range.

The usual trick is to use a "standard" block cipher which operates on
n-bit blocks, such that 2^n is "not too large" with regards to N. To
encrypt a value (between 0 and N-1), you map that value into [0..2^n-1],
then apply the block cipher repeatedly until you get a result back in
the [0..N-1] range. The number of block cipher invocations will be, on
average, 2^n/N.

Unfortunately, usual block ciphers use large blocks (e.g. n = 64 or 128)
and there are good reasons for that (block cipher construction techniques
have some biases, e.g. a Feistel network produces only even permutations,
which can be neglected if n is sufficiently large). So, depending on
your security requirements, this may or may not be good enough for you.

For a more generic construction, which allows you to produce a "perfect"
block cipher on any input set size, see:
http://www.bolet.org/~pornin/2007-fse-granboulan+pornin.pdf
but it entails some sampling following the hypergeometric distribution,
which is CPU-intensive and somewhat complex to implement. Depending on
your security requirements, you may tolerate an approximate but simpler
sampling method.

--Thomas Pornin

Herman Rubin

unread,

Jun 9, 2009, 3:31:18 PM6/9/09

to

In article <alpine.DEB.1.10.0...@urchin.earth.li>,

Tom Anderson <tw...@urchin.earth.li> wrote:
>Hello!

>This question is not really about cryptography, but it has a
>cryptographyish flavour, so i thought i'd ask it here. If there's a better
>place to ask it, please let me know.

>I have a list of N items, where N is about 100 000, and a few thousand or
>more users. I would like to present each user with all the items in a
>random order, where the order is different for each user. I only need to
>go through each user's randomised list of items one by one, rather than
>having random access into the list, but the process of going through them
>could take days, weeks, perhaps even months.

>I could put all the items in an array, shuffle them, and go through them
>from first to last, storing the shuffled array for as long as it's needed.
>Or i could put the numbers 0 ... N-1 in an array, shuffle and store that,
>and go through that, using the numbers and indices into a master list of
>items. However, i would like to avoid having to store great big lists of
>numbers if at all possible.

>So, what can i do?

I do not see how you can do it without having an
array of length N for each user. If you do, there
is nothing else needed as far as memory. There are
probably ways of handling the space if the total is
excessive.

Each array starts out with the numbers from 0 to N-1
in order. There are ways of improving the efficiency
by combinations, but I will give the essential ideas.

At each time, the particular user array will be of
length k, occupying positions 0 to k-1. The number
k is known. Now select a number at random from 0
to k-1, exchange the number at that position with the
one at position k-1, and output the resulting number,
reducing k by 1.

Programmatically:

step: J is a random number from 0 to k-1;
q = n(J);
move n(k-1) to n(J);
output q;
k--;
end step

The array at any time will consist of the indices of the
items not already used in some order, not fully random.
But the item selected at each stage is a random item from
those not already selected, so it does not matter.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Ilmari Karonen

unread,

Jun 10, 2009, 11:32:32 AM6/10/09

to

Ding! We have a winner! :)

Seriously, this was the answer I was going to post. To expand a
little on it, the "Hasty Pudding trick" is to start with a number in
the desired range and iterate the encryption function until you get a
result that is also in the range. The fact that a block cipher is a
permutation guarantees that this procedure will terminate, and the
average number of iterations needed is simply B/N, where B (= 2**b
above) is the number of possible cipher outputs and N is size of the
desired range.

As for the number of rounds, there's a well-known result by Luby and
Rackoff saying that four rounds are enough (even for crypto purposes,
which your use case isn't) if your round function is random enough.
Since your application presumably isn't performance-critical, you
could just take any standard cryptographic hash function and truncate
its output to use as your round function.

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.

Paul Rubin

unread,

Jun 10, 2009, 11:42:29 AM6/10/09

to

Ilmari Karonen <use...@vyznev.invalid> writes:
> As for the number of rounds, there's a well-known result by Luby and
> Rackoff saying that four rounds are enough (even for crypto purposes,
> which your use case isn't) if your round function is random enough.

No, 4 rounds isn't enough for short word sizes like this, where the
probability of collision is non-negligible. I remember some paper
saying 7 rounds was enough but I don't remember the reason. I usually
use 10 or so rounds when I do stuff like this, but it's also been on
slightly larger inputs.

Paul Rubin

unread,

Jun 10, 2009, 11:47:48 AM6/10/09

to

Ilmari Karonen <use...@vyznev.invalid> writes:
> Seriously, this was the answer I was going to post. To expand a
> little on it, the "Hasty Pudding trick" is to start with a number in
> the desired range and iterate the encryption function until you get a
> result that is also in the range. The fact that a block cipher is a
> permutation guarantees that this procedure will terminate,

The process is actually not guaranteed to terminate, since the
permutation can have cycles. For n=100000 the probability of this may
be enough that you have to adjust the scheme to allow for it.

Thomas Pornin

unread,

Jun 10, 2009, 12:24:21 PM6/10/09

to

According to Paul Rubin <http://phr...@NOSPAM.invalid>:

> The process is actually not guaranteed to terminate, since the
> permutation can have cycles.

No, it will terminate. Permutations have cycles, actually every single
element in the permuted space is necessarily part of a unique cycle, and
those cycles do not intersect. Moreover, since you begin the process
with a value which is in your target range, you have the guarantee that
the cycle along which you are walking includes at least one value in the
target range, namely the one you begin with.

--Thomas Pornin

Scott Fluhrer

unread,

Jun 10, 2009, 12:31:37 PM6/10/09

to

"Paul Rubin" <http://phr...@NOSPAM.invalid> wrote in message
news:7x4ouod...@ruckus.brouhaha.com...

> Ilmari Karonen <use...@vyznev.invalid> writes:
>> Seriously, this was the answer I was going to post. To expand a
>> little on it, the "Hasty Pudding trick" is to start with a number in
>> the desired range and iterate the encryption function until you get a
>> result that is also in the range. The fact that a block cipher is a
>> permutation guarantees that this procedure will terminate,
>
> The process is actually not guaranteed to terminate, since the
> permutation can have cycles.

No, it will always terminate, because all members of a (finite) permutation
are a part of a cycle. That is, if you start with a value x_0, and have
x_{i+1} = perm( x_i ), then there will always be a value n s.t. x_n = x_0.
Hence, if x_0 is in range, then there will be a value x_m that will be in
range for m > 0, because even if x_i is not for 0<i<n, then it will be for
i=n.

--
poncho

Tom Anderson

unread,

Jun 10, 2009, 1:11:20 PM6/10/09

to

On Tue, 8 Jun 2009, Paul Rubin wrote:

> Tom Anderson <tw...@urchin.earth.li> writes:
>> I have a list of N items, where N is about 100 000, and a few thousand
>> or more users. I would like to present each user with all the items in
>> a random order, where the order is different for each user.
>
> Short answer: look up "Hasty Pudding cipher".

Short response: perfect, thanks!

> Longer answer: let b = smallest even number so that 2**b >= N-1. Then
> for k < N, represent k as a b-bit number. Chop that into two b/2-bit
> halves. Use a generic Feistel cipher to encrypt the pair of
> half-numbers with a user-specific key. Use a bunch of rounds to ensure
> uniformity. Use the Hasty Pudding trick (look it up) to ensure the
> result is in the range [0,N-1]. Use the encrypted number as an index
> into your table of items. For each user, you have to remember the user
> key (maybe derived from the user ID) and the index of the last item
> sent.

Longer response: the general structure of this idea is exactly what i had
in mind, but the idea of just using a Feistel cipher and then trimming the
result is what i was missing. Great! It took me a few minutes to figure
out why the Hasty Pudding trick works (and why it's never ambiguous), but
it's actually very simple to prove that it does.

Right, off to design an S-box ...

tom

--
People should not be afraid of their governments. Governments should be
afraid of their people. -- V

Paul Rubin

unread,

Jun 10, 2009, 1:21:36 PM6/10/09

to

Thomas Pornin <por...@bolet.org> writes:
> you have the guarantee that
> the cycle along which you are walking includes at least one value in the
> target range, namely the one you begin with.

Ah, right. Thanks.

Paul Rubin

unread,

Jun 10, 2009, 1:28:29 PM6/10/09

to

Tom Anderson <tw...@urchin.earth.li> writes:
> Right, off to design an S-box ...

Just use a standard crypto hash function H(key+text) unless you're
cpu-limited.

Tom Anderson

unread,

Jun 10, 2009, 6:28:09 PM6/10/09

to

Aha. After i posted, i started wondering about this, but of course, you
know there must be at least one in-range value, even if it's the one you
started with.

It did occur to me that another thing you could do would be, instead of
counting from 1 to N and applying a mapping from 1-to-M to 1-to-M
repeatedly to get to a value in 1-to-N, to count from 1 to M, apply the
mapping once, and just discard out-of-range values. You might have to
increment the counter more than once for any given randomised 1-to-N value
produced, but that isn't a problem here. Because the mapping is a
permutaion, you know that in going from 1 to M, you'll get every value
between 1 and M once, and thus every value between 1 and N once, plus all
those others greater than N which you ignore.

tom

--
Subvert normality.

Tom Anderson

unread,

Jun 10, 2009, 6:41:36 PM6/10/09

to

But i like S-boxes! :)

You're right. The beauty of the Feistel structure is that it doesn't
require any clever reversible functions, it just needs a good
shaker-upper, like a hash function.

Reading up on this, i came across the idea of unbalanced Feistel networks.
That got me thinking, and i realised you could make a sort of non-binary
Feistel structure which would work directly on numbers in the range 1 to N
- provided that N was composite. You pick some factor Q which divides N,
and then to split your plaintext number X into left and right parts, do
X/Q and X mod Q. Then you need a round function, and you can build an
algebraic equivalent of an unbalanced Feistel network. I haven't worked
the details out yet, though, so i could well have missed something
crucial. It's more of a stunt than a sensible idea anyway.

tom

--
Subvert normality.

Ilmari Karonen

unread,

Jun 11, 2009, 2:11:38 PM6/11/09

to

On 2009-06-10, Tom Anderson <tw...@urchin.earth.li> wrote:
>
> Reading up on this, i came across the idea of unbalanced Feistel networks.
> That got me thinking, and i realised you could make a sort of non-binary
> Feistel structure which would work directly on numbers in the range 1 to N
> - provided that N was composite. You pick some factor Q which divides N,
> and then to split your plaintext number X into left and right parts, do
> X/Q and X mod Q. Then you need a round function, and you can build an
> algebraic equivalent of an unbalanced Feistel network. I haven't worked
> the details out yet, though, so i could well have missed something
> crucial. It's more of a stunt than a sensible idea anyway.

Yes, BTDT. The only problem, as you note, is that it's not as general
as the Hasty Pudding Trick because N has to have a reasonably sized
proper divisor Q: primes don't work at all, and small multiples of
primes will produce an excessively unbalanced network.

Of course, you could combine the two tricks by using ceil(sqrt(N))^2
as the range of your (balanced) Feistel network and using the Hasty
Pudding Trick to reduce that range down to N, but I'm not convinced
that'd be any more efficient than just using a Feistel network with a
power-of-2 (or 4) range.

Tom Anderson

unread,

Jun 11, 2009, 2:33:04 PM6/11/09

to

On Thu, 11 Jun 2009, Ilmari Karonen wrote:

> On 2009-06-10, Tom Anderson <tw...@urchin.earth.li> wrote:
>
>> Reading up on this, i came across the idea of unbalanced Feistel networks.
>> That got me thinking, and i realised you could make a sort of non-binary
>> Feistel structure which would work directly on numbers in the range 1 to N
>> - provided that N was composite. You pick some factor Q which divides N,
>> and then to split your plaintext number X into left and right parts, do
>> X/Q and X mod Q. Then you need a round function, and you can build an
>> algebraic equivalent of an unbalanced Feistel network. I haven't worked
>> the details out yet, though, so i could well have missed something
>> crucial. It's more of a stunt than a sensible idea anyway.
>
> Yes, BTDT. The only problem, as you note, is that it's not as general
> as the Hasty Pudding Trick because N has to have a reasonably sized
> proper divisor Q: primes don't work at all, and small multiples of
> primes will produce an excessively unbalanced network.
>
> Of course, you could combine the two tricks by using ceil(sqrt(N))^2 as
> the range of your (balanced) Feistel network and using the Hasty Pudding
> Trick to reduce that range down to N,

Yes!

> but I'm not convinced that'd be any more efficient than just using a
> Feistel network with a power-of-2 (or 4) range.

LA LA LA NOT LISTENING.

tom

--
News flash: there's no deep meaning or hidden message BECAUSE DAVID
LYNCH IS INSANE

WTShaw

unread,

Jun 12, 2009, 6:00:22 AM6/12/09

to

On Jun 9, 2:31 pm, hru...@odds.stat.purdue.edu (Herman Rubin) wrote:
>
> At each time, the particular user array will be of
> length k, occupying positions 0 to k-1. The number
> k is known. Now select a number at random from 0
> to k-1, exchange the number at that position with the
> one at position k-1, and output the resulting number,
> reducing k by 1.
>

This is an efficient method that I have used, like it, "end around."

Another slides the rest of the elements in the array past the selected
element down to fill the void, "drop down."

Another needs many hits on the array until all elements are picked by
chance from the N possibilities, "shooting gallery."

There are others, some really obscure, but still the best is probably
the one you suggested, all based on a pRNG.

David Eather

unread,

Jun 12, 2009, 4:31:24 PM6/12/09

to

It seems to me that you know the elements, you know the way to randomly
shuffle them, so all you need to keep is the seed of a suitable CSPRNG
to recreate the list and hence a particular element whenever desired.

(Knuth has an algorithm for creating random permutations, but you could
also derive his method from the key setup of RC-4)

WTShaw

unread,

Jun 13, 2009, 12:43:12 PM6/13/09

to

I did post to this thread concerning making a permutation using a
random number generator but there is a better way, easy to use, an
uses the strengths of any set as a base.

You say..here we go again...but read carefully and you might learn
something.

A number of years ago, I ran across discussion of an old ACA algorithm
that hinted of its use as a stream cipher. Being prior computer, its
promise was too difficult and subject to clerical errors. The idea
was that a stream of characters previously assigned places in a
character set could be used to produce a subsequent series. I
wondered it that was to be a recursive series and it proved to be so.
This all predated binary emphasis.

The idea is simple in a set of 0 to n-1 characters, add the place
values of the last two characters mod n to produce a following
character. The next step would be to work with that last character
and the preceeding one to build a string that would ultimately repeat
itself. The earliest character is dropped from the current list to
keep it a constant length. You must start with at least two
characters

To improve matters, add a numeric constant of 0 to n-1 to the
formula. I tend to make the default value 3 from experience but
others might be as good, but not 0 or 2.

Before going into the next steps of permutation building, look at next
posted examples of series as I have them up and running. The base set
for convenience is the plain alphabet, n=3, and the starting series
seed remain constant length.

WTShaw

unread,

Jun 13, 2009, 12:56:13 PM6/13/09

to

On Jun 13, 11:43 am, WTShaw <lure...@gmail.com> wrote:

> Before going into the next steps of permutation building, look at next
> posted examples of series as I have them up and running. The base set
> for convenience is the plain alphabet, n=3, and the starting series
> seed remain constant length.

set = "abcdefghijklmnopqrstuvwxyz"

Cycle to regenerate <see> in a
repeating loop needed 1280 characters.

Cycle to regenerate <help> in a
repeating loop needed 10979 characters.

----

from the output field:

For SEEDS <got> & 3 & Set LC26: abcdefghijklmnopqrstuvwxyz
Cycle to regenerate <got> in a repeating loop needed 1280 characters.
xktkg gtpcl uqinb yrcsw xrwrq qkjdw pcout lqhea ohryb scwxb wbaae
dhknu akxnk naaqd twzsy utvqr okibv mzkom bdqhw agzji luwit hedok
ubhyl imwxl wlkky xlylm mabpe twasz vuxsu sppkh cumzj olaco ftwbs
awvzu xwuwt tspok gbtkx gkgtt cpyuq vnole csjxe jeqqx jqjcc ohtyd
ueabh elosc jxojo aardu xauax xaxaa addgj msyht ideok vbizm kozbq
duwat zwvyu wvtur qokhb ulyim jxyjy kklxy lymmn bcrgw afzih ksufp
cxucu zzwby acbfg josaj vmhkw ujtgf coktb gxkgk ttgpc yudva byecf
jkrwe qdxwd wccbh glque nbury ospjk bwoan rqhka unxkn kaand qtwms
lhgvq eoxvo vmmkb zodqu wntmj iyujv gheqo xhohy yizjk lwykx lklyy
mznop egwnf mvuks hfcpk uchzm joyap bstwo snjiz ukwhj gtsco xtotk
kgxtg tccyh dinoy epfwx ewedd kjqwc pbuty qurno heyof pwxow onned
ukahn kxaka nnqdg wmflu tiqeb xibim mxbmb qqujn gzwiy hjitu eqbxu
buyyv zwxyw yxxyx yyyzz abceg jnszi ukfhs pckup hmzwo ynpof gwofn
wvmuk jhwtg scbxg bgkkt xgtgc clhqv aoyrp sjkew rdqxw qwppo hgyqh
rabue ybfcj kowbn arquk nhaxk aknna dqgwz fyhgi qrbkv oimzx ozoqq
hjatm wilhw vgued bkhou ylvmj kywlx klkyy lzmno cetja fmiux fufcc
khpuz mwoln cbsgw bfaji muxju jggsp bktog kxtkt ggcpl udiao lrcfw
kejrq dkwqj pcbug ydhen ouelb spwko jbane quxnu nkkax nanqq gjzsl
ugidr oxioi zzkbm oqdhw ngmwv lujig urdox uoull izwky jlkxy kyllm
zaocr twnsm ihxsh sccxh chmmw blapo sgjbs nwimh xwhwg gfpox gogxx
gxggg ppyhq iable pswkr jedqk wdjcp ougld uraou rlofc wkbjo naeqh
xahak knxan aqqtj mfyug vdebk iovzm xomod dujag mjvyh wighr qbkuo
hlyvm wkljy xkykl lyzma oprgj asmvh kfusc pxupu mmjby ncost jofaw
izhkj uwgtf cbkgo t

Longer seeds produce longer series. This has promise but we are not
there yet.

WTShaw

unread,

Jun 13, 2009, 1:09:59 PM6/13/09

to

You are to find a way to go from the resultant series to a
permutation.

With simple work, one permutation to another can happen.

With a one step process, follow these evolving permutations with new
seeds acting on
previous data:

abcdefghijklmnopqrstuvwxyz
red
ykelrsfmaupxthobcqinvgwdjz
white
tamqdikxwbsovflzynhurcjpge
blue
tscmfwhgouvbnqdlyzexpkajir

-----

abcdefghijklmnopqrstuvwxyz
blue
pibwamzcrhuetlkngdxfqojysv
white
nlgpwmzfekbqcvshiajyrxudto
red
meitkabpwduhlncyqjzsgxofvr

---

Even with simple seeds, the process is not a set.

WTShaw

unread,

Jun 13, 2009, 1:13:37 PM6/13/09

to

On Jun 13, 11:56 am, WTShaw <lure...@gmail.com> wrote:

>
> Longer seeds produce longer series. This has promise but we are not
> there yet.

You are to find a way to go from the resultant series to a

David Eather

unread,

Jun 13, 2009, 3:45:44 PM6/13/09

to

To WTShaw,

In this thread, I did not post to you or in response to any of your
posts, but I posted to the OP. The OP was the first to mention creating
permutations, and in so doing he has almost solved his own problem. A
few tweaks and he is done.

I don't claim any great credit for my solution. The efficient method for
generating a permutation is given in Knuth and any other book on
computer algorithms and most of the posters on sci.crypt can come up
with an efficient PRNG with the required properties.

In hindsight my post (to the OP) was not as useful as it should be - I
believe I have a full and efficient solution and I intend to post it
when I have cleaned it up (particularity my pseudo code). An efficient
solution allows the generation of the specific required permutation/s at
any time, in any order, without requiring multiple iterations of other
permutations.

I do not wish to communicate with you any further.

WTShaw

unread,

Jun 14, 2009, 5:32:13 AM6/14/09

to

Sorry about the double posting but the process was sluggish and
thought maybe I pushed a wrong button or other. As for talking about
a group of protocols as not a set, syntax is confusing when words get
co-opted for specific meanings, so in the spirit of debate long ago
when it seems few had much of real substance at the time, the process
I sometimes use is not a "group."

Chaining permutations is most useful when extrapolating larger keys
made of many permutations. There are pitfalls but that is one reason
to survey such results to look for bad tendencies in any particular
process. Considering the redundancy qualities of English for example,
results that are more random tend to show that pattern masking even of
text can be effective in key generation.

Heads can be protected from receiving information by placing them
within a bucket of sand but usually those that feel that urge do not
need more that a rather small container, one left over after their
last haircut.

David Eather

unread,

Jun 15, 2009, 3:15:08 AM6/15/09

to

David Eather wrote:
> Tom Anderson wrote:
>> Hello!
>>
>> This question is not really about cryptography, but it has a
>> cryptographyish flavour, so i thought i'd ask it here. If there's a
>> better place to ask it, please let me know.
>>
>> I have a list of N items, where N is about 100 000, and a few thousand
>> or more users. I would like to present each user with all the items in
>> a random order, where the order is different for each user. I only
>> need to go through each user's randomised list of items one by one,
>> rather than having random access into the list, but the process of
>> going through them could take days, weeks, perhaps even months.
>>
>> I could put all the items in an array, shuffle them, and go through
>> them from first to last, storing the shuffled array for as long as
>> it's needed. Or i could put the numbers 0 ... N-1 in an array, shuffle
>> and store that, and go through that, using the numbers and indices
>> into a master list of items. However, i would like to avoid having to
>> store great big lists of numbers if at all possible.
>>
>> So, what can i do?

To create a random permutation of size N.

1. Find the number of minimum bits (n-bits) required for N
n-bits = int( log(N) / log(2) +1)
for N = 100000, n-bits = 17 bits

2. Find the required length of the user key
The MINIMUM length of user key (or seed) is
seed_length = int( log(N**2) / log(2) +1)
for N = 100000, seed_length = 34 bits

The length of key is to make sure that everyone has a unique seed.
Note that this is a minimum length and it means that the chance of a
matching key pair amongst all users is less than 50%.

I suggest that a more conservative seed length is at least
seed_length = int( log(N**3) / log(2) +1)
for N = 100000 this suggests seed_length = 50 bits

Use any convenient key length that is longer than minimum requirement
(eg. in the N=100000 example 40-bits would OK. But 64-bits would be a
good and more conservative choice.

3. Create a quality random number generator whose output is in the
range 0 to N-1 (pseudo code follows)

Get_Random:
increment iteration_counter
random_candidate = Hash( key || iteration_counter)
random_candidate = Random_Candidate AND (2**n-bits) -1
if random_candidate >= N-1 then Get_Random
Else return random_candidate

The variable iteration_counter is a stateful variable with local
scope. It ensure the hash function behaves like a random number
generator. Otherwise, if the hash function hit a fixed point it would
then output a never ending stream of identical numbers, which is not a
behavior expected of a random number generator.

Also, if the hash function gave two identical outputs for two
different keys, then from that matching point onwards, both keys would
produce a identical output. Using the iteration counter stops the above
from happening. An alternative is to use a bigger (and stronger) hash
function so these probabilities become so remote as to be negligible,
but the iteration counter is quicker and makes the probabilities of this
undesirable behavior zero.

The hash function is just a standard hash function that provides the
properties you require, such as size and/or cryptographic security. The
internal state of the hash function should be as large or larger than
the minimum seed_length.

Some examples of useful hash functions would be CRC-32, CRC-64, MD-4,
MD-5, SHA-1 or SHA-2. In the N=100000 example you could not use CRC-32
without possible problems (32-bits is too small an internal state when
for N = 100000, seed_length = 34 bits). My preferences would be to use
MD-4 for any non-cryptographic purpose (it is faster than CRC-32 but
with a much larger state of 128-bits) and to use SHA-2 using 256-bits,
384-bits or 512-bits as required for any generator that requires
cryptographic strength.

The || symbol simply means concatenation. The input to the hash
function is 0xKEY || 0xIteration_Counter. If required certain types of
off-line cryptographic attacks can be made more difficult by also
concatenating a ï¿½saltï¿½ at this point.

The AND function is a bit-wise AND which clears all unnecessary MSB's
in random_candidate variable. This limits the range of possible outputs
for random-candidate so that candidates will pass with better than a 50%
probability in the long run (76.3% in the 100000 sample case). This
method does not introduce any bias and is required for speed.

Note that the output sequence is exactly the same for each key.

4. To create the required permutation from the key.

dim permutation_array(N-1)

for i = 0 to n-1 step 1
permutation_array(i) = i.

for i = 0 to n-1 step 1
j = Get_Random(key)
swap permutation_array(i), permutation_array(j)

The first ï¿½for iï¿½ loop sets permutation_array into a known default
condition. In this case it is 0,1,2,3 ï¿½ N-1. If you didn't do this you
could not directly create the required permutation from just the key.

Completing the second ï¿½for iï¿½ loop produces the shuffled permutation.
Pointer i sequentially selects each element in the permutation_array and
the swap function exchanges that element with the element selected by
the pseudo random pointer j.

5. Done.

If I have not expressed something clearly or I have made an error then
please let me know.

David Eather

unread,

Jun 15, 2009, 3:41:45 AM6/15/09

to

> <snip> </snip>

Some edits required!

>
> To create a random permutation of size N.
>
> 1. Find the number of minimum bits (n-bits) required for N
> n-bits = int( log(N) / log(2) +1)
> for N = 100000, n-bits = 17 bits
>
> 2. Find the required length of the user key
> The MINIMUM length of user key (or seed) is
> seed_length = int( log(N**2) / log(2) +1)
> for N = 100000, seed_length = 34 bits
>
> The length of key is to make sure that everyone has a unique seed. Note
> that this is a minimum length and it means that the chance of a matching
> key pair amongst all users is less than 50%.
>
> I suggest that a more conservative seed length is at least
> seed_length = int( log(N**3) / log(2) +1)
> for N = 100000 this suggests seed_length = 50 bits
>
> Use any convenient key length that is longer than minimum requirement
> (eg. in the N=100000 example 40-bits would OK. But 64-bits would be a
> good and more conservative choice.
>

> 3. Create a quality *pseudo* random number generator whose output is in the

> range 0 to N-1 (pseudo code follows)
>
> Get_Random:
> increment iteration_counter
> random_candidate = Hash( key || iteration_counter)
> random_candidate = Random_Candidate AND (2**n-bits) -1
> if random_candidate >= N-1 then Get_Random
> Else return random_candidate
>
> The variable iteration_counter is a stateful variable with local scope.

> It ensures the hash function behaves like a random number generator.

> Otherwise, if the hash function hit a fixed point it would then output a

> never ending stream of identical numbers, which is not a behaviour

> expected of a random number generator.
>
> Also, if the hash function gave two identical outputs for two different

> keys, then from that matching point onwards, both keys would produce an

> identical output. Using the iteration counter stops the above from
> happening. An alternative is to use a bigger (and stronger) hash
> function so these probabilities become so remote as to be negligible,

> but using the iteration counter is quicker and makes the probabilities of this
> undesirable behaviour zero.

>
> The hash function is just a standard hash function that provides the

> properties you require, such as output size, internal state and/or cryptographic security. The

> internal state of the hash function should be as large or larger than
> the minimum seed_length.
>
> Some examples of useful hash functions would be CRC-32, CRC-64, MD-4,
> MD-5, SHA-1 or SHA-2. In the N=100000 example you could not use CRC-32
> without possible problems (32-bits is too small an internal state when
> for N = 100000, seed_length = 34 bits). My preferences would be to use
> MD-4 for any non-cryptographic purpose (it is faster than CRC-32 but

> with a much larger output of 128-bits) and to use SHA-2 using 256-bits,

> 384-bits or 512-bits as required for any generator that requires
> cryptographic strength.
>
> The || symbol simply means concatenation. The input to the hash
> function is 0xKEY || 0xIteration_Counter. If required certain types of
> off-line cryptographic attacks can be made more difficult by also
> concatenating a ï¿½saltï¿½ at this point.
>
> The AND function is a bit-wise AND which clears all unnecessary MSB's
> in random_candidate variable. This limits the range of possible outputs
> for random-candidate so that candidates will pass with better than a 50%
> probability in the long run (76.3% in the 100000 sample case). This
> method does not introduce any bias and is required for speed.
>
> Note that the output sequence is exactly the same for each key.

Yikes. Delete the above line and never think of it again.

Maaartin

unread,

Jun 15, 2009, 8:38:25 AM6/15/09

to graj...@seznam.cz

> for i = 0 to n-1 step 1
> j = Get_Random(key)
> swap permutation_array(i), permutation_array(j)

There's a problem with it. Your Get_Random is perfect, provided you
use a strong Hash. Nonetheless, your permutation is not perfectly
random, just try it out for n=3.

Without trying it, for n=3 you generate 3 numbers from the set {0, 1,
2} giving 3**3 = 27 possibilities, while here are 3! = 6 permutations.
The result can't be evenly distributed because 6 does not divide 27
evenly.

AFAIK, the perfect solution requires generating random numbers from
the sets {0, 1, 2}, {0, 1}, {0}.

David Eather

unread,

Jun 15, 2009, 12:37:08 PM6/15/09

to

Thanks for checking this.

Your argument and example are pretty compelling. I checked by brute
force. With your example, the 27 possible inputs result in the output
permutations 012, 201, 210 appearing 4 times each and the other
permutations appearing 5 times each, rather than the ideal 4.5 times each.

My suspicion is that the source of this bias is exactly the same as the
source of bias in RC-4 (from which I unashamedly ripped the method of
initializing the permutation). If that is correct, then the size of the
bias should typically head down as the size of the permutation increases
(That's not true for n=4).

I speculate that for the OP with his large permutations this bias is of
no concern?

Any thoughts? and thanks again.

Bryan Olson

unread,

Jun 15, 2009, 8:39:04 PM6/15/09

to

Right if you use Fisher-Yates (see:
http://en.wikipedia.org/wiki/Fisher-Yates_shuffle or Herman Rubin's post
in this thread), though selecting a random element from {0} strikes me
as vacuous. This raises the problem, oft-solved on sci.crypt, of
generating uniform trits from uniform bits. I'll repeat a solution below.

The Goldberg shuffler uses bits directly:
http://groups.google.com/group/sci.crypt.random-numbers/browse_frm/thread/1372022a6634b469/83d82f9bb7b0cdec

/* Return a random choice from the uniform distribution
of integers in [0..target_range), assuming rand_bit() returns
uniformly distributed random bits.
*/
int RandInRange(int target_range)
{
int rand = 0;
int current_range = 1; /* INVARIANT: rand is uniformly
distributed in [0..current_range) */
for(;;)
{
while (current_range < target_range)
{
current_range *= 2;
rand = 2 * rand + rand_bit();
/* Invariant true */
}
if (rand < target_range)
return rand;

/* rand uniform in [target_range..current_range) */
current_range -= target_range;
rand -= target_range;
/* Invariant true again */
}
}

--
--Bryan

Maaartin

unread,

Jun 16, 2009, 4:23:56 PM6/16/09

to

David Eather wrote:
> My suspicion is that the source of this bias is exactly the same as the
> source of bias in RC-4 (from which I unashamedly ripped the method of
> initializing the permutation). If that is correct, then the size of the
> bias should typically head down as the size of the permutation increases

I think, you're right.

> (That's not true for n=4).

Sorry, I don't understand this.

> I speculate that for the OP with his large permutations this bias is of
> no concern?

So do I, but I can give no proof.

> Any thoughts? and thanks again.

Generating a uniformly distributed number in range 0..n with n<2**32
using a random 32-bit number is quite easy, see
http://java.sun.com/javase/6/docs/api/java/util/Random.html#nextInt(int)

Shuffling using such a rng is easy as well:
for (int i=1; i<a.length; ++i) {
int ix = nextInt(i+1);
int tmp = a[i]; a[i] = a[ix]; a[ix] = tmp;
}
Be warned, I haven't tested it.

Bryan Olson wrote:
> Right if you use Fisher-Yates (see:
> http://en.wikipedia.org/wiki/Fisher-Yates_shuffle or Herman Rubin's post
> in this thread), though selecting a random element from {0} strikes me
> as vacuous.

That's a no-op just like multiplying by the "1" in 3! = 3 * 2 * 1,
just ignore it, or optimize it away; it corresponds with i=0 in the
above loop.

David Eather

unread,

Jun 16, 2009, 8:00:55 PM6/16/09

to

thanks Bryan

David Eather

unread,

Jun 16, 2009, 8:03:51 PM6/16/09

to

Maaartin wrote:
> David Eather wrote:
>> My suspicion is that the source of this bias is exactly the same as the
>> source of bias in RC-4 (from which I unashamedly ripped the method of
>> initializing the permutation). If that is correct, then the size of the
>> bias should typically head down as the size of the permutation increases
>
> I think, you're right.
>
>> (That's not true for n=4).
>
> Sorry, I don't understand this.

It's nothing very profound I'm afraid. I speculated that as n increased
the bias should decrease. I checked for a few "n's" including n=4. N=4
has larger biases than n=3.