Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Simple algorithm question - how to reorder a sequence economically

47 views
Skip to first unread message

Peter Brooks

unread,
May 24, 2013, 4:14:45 AM5/24/13
to
What is the easiest way to reorder a sequence pseudo-randomly?

That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.

I'm writing a simulation and would like to visit all the nodes in a
different order at each iteration of the simulation to remove the risk
of a fixed order introducing spurious evidence of correlation.

Chris Angelico

unread,
May 24, 2013, 4:37:49 AM5/24/13
to pytho...@python.org
Permuting a sequence iteratively to cover every possibility? Good fun.

Here's one way of looking at it. Imagine the indices of all elements
are in some special "base" like so:

[a, b, c, d] --> a*4+b*3+c*2+d*1

Then iterate up to the highest possible value (ie 4*3*2*1), picking
indices for each accordingly. I don't know how efficient this will be,
but here's a simple piece of code to do it:

>>> def permute(lst,pos):
lst=lst[:] # Take a copy
ret=[None]*len(lst)
for i in range(len(lst)):
pos,idx=divmod(pos,len(lst))
ret[i]=lst[idx]
del lst[idx]
return ret

>>> for i in range(4*3*2*1):
permute([10,20,30,40],i)


[10, 20, 30, 40]
[20, 10, 30, 40]
[30, 10, 20, 40]
[40, 10, 20, 30]
[10, 30, 20, 40]
[20, 30, 10, 40]
[30, 20, 10, 40]
[40, 20, 10, 30]
[10, 40, 20, 30]
[20, 40, 10, 30]
[30, 40, 10, 20]
[40, 30, 10, 20]
[10, 20, 40, 30]
[20, 10, 40, 30]
[30, 10, 40, 20]
[40, 10, 30, 20]
[10, 30, 40, 20]
[20, 30, 40, 10]
[30, 20, 40, 10]
[40, 20, 30, 10]
[10, 40, 30, 20]
[20, 40, 30, 10]
[30, 40, 20, 10]
[40, 30, 20, 10]

It works, it produces a unique list for any given index provided, but
it's not the cleanest or most efficient. But I know someone'll improve
on it... or tell me I'm an idiot for not taking a more obvious
approach :)

ChrisA

Fábio Santos

unread,
May 24, 2013, 4:47:07 AM5/24/13
to Chris Angelico, pytho...@python.org


On 24 May 2013 09:41, "Chris Angelico" <ros...@gmail.com> wrote:
>
> On Fri, May 24, 2013 at 6:14 PM, Peter Brooks
> <peter.h....@gmail.com> wrote:

> > What is the easiest way to reorder a sequence pseudo-randomly?
> >
> > That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
> > 2,1,4,3) that is different each time.
> >

...


> It works, it produces a unique list for any given index provided, but
> it's not the cleanest or most efficient. But I know someone'll improve
> on it... or tell me I'm an idiot for not taking a more obvious
> approach :)
>
> ChrisA

I think that is pretty much itertools.permutations from the standard library. The OP should check it out.

Chris Angelico

unread,
May 24, 2013, 5:11:11 AM5/24/13
to pytho...@python.org
On Fri, May 24, 2013 at 6:47 PM, Fábio Santos <fabiosa...@gmail.com> wrote:
>
> On 24 May 2013 09:41, "Chris Angelico" <ros...@gmail.com> wrote:
>>
>> On Fri, May 24, 2013 at 6:14 PM, Peter Brooks
>> <peter.h....@gmail.com> wrote:
>> > What is the easiest way to reorder a sequence pseudo-randomly?
>> >
>> > That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
>> > 2,1,4,3) that is different each time.
>> >
> ...
>
>> It works, it produces a unique list for any given index provided, but
>> it's not the cleanest or most efficient. But I know someone'll improve
>> on it... or tell me I'm an idiot for not taking a more obvious
>> approach :)
>>
>> ChrisA
>
> I think that is pretty much itertools.permutations from the standard
> library. The OP should check it out.

That works if all the permutations are wanted at once. Is there a way,
short of iterating over it N times, to request permutation #N? Or
maybe I'm misreading the OP and that's not a requirement.

ChrisA

Terry Jan Reedy

unread,
May 24, 2013, 6:10:26 AM5/24/13
to pytho...@python.org
random module has a shuffle function I believe


Steven D'Aprano

unread,
May 24, 2013, 6:52:18 AM5/24/13
to
On Fri, 24 May 2013 01:14:45 -0700, Peter Brooks wrote:

> What is the easiest way to reorder a sequence pseudo-randomly?

import random
random.shuffle(sequence)


The sequence is modified in place, so it must be mutable. Lists are okay,
tuples are not.


> That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
> 2,1,4,3) that is different each time.

You can't *guarantee* that it will be different each time. With a four-
item list, there are only 4! = 24 combinations, so on average you'll get
the original order one time in 24. For a ten-item list, that is once
every 3628800 times, and for a twenty-item list, once in
2432902008176640000 times. But of course these are *random*, and there's
always a chance of this:

http://dilbert.com/strips/comic/2001-10-25


--
Steven

Ned Batchelder

unread,
May 24, 2013, 7:26:00 AM5/24/13
to Steven D'Aprano, pytho...@python.org
Also, heed the note in the docs:  "Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated."  The default random number generator has a period of 2**19937-1, which means that lists longer than 2080 have more permutations than the period of the generator (because factorial(2081) > 2**19937).  Most shuffles involve lists far shorter than 2080, but it's still good to keep it in mind.

--Ned.

Peter Brooks

unread,
May 24, 2013, 9:23:14 AM5/24/13
to
Thank you all for those most helpful suggestions! random.shuffle does
precisely the job that I need quickly. Thank you for introducing me to
itertools, though, I should have remembered APL did this in a symbol
or two and I'm sure that itertools will come in handy in future.

Thanks for the warnings about random numbers too - I hope my lists
will be short enough for the permutations of the function to be
irrelevant. I don't need every single sequence to be unique, only that
the same sequence only occurs occasionally. My lists might get up to
the ~2k length one day, and I'll keep in mind the need, at that stage,
to use a different pseudo-random generator. Actually, thinking about
it, there is probably a source of non-algorithmically-derived 'random'
numbers somewhere on the net that would do the job nicely.

Steven D'Aprano

unread,
May 24, 2013, 9:57:21 AM5/24/13
to
On Fri, 24 May 2013 06:23:14 -0700, Peter Brooks wrote:


> Thanks for the warnings about random numbers too - I hope my lists will
> be short enough for the permutations of the function to be irrelevant. I
> don't need every single sequence to be unique, only that the same
> sequence only occurs occasionally. My lists might get up to the ~2k
> length one day, and I'll keep in mind the need, at that stage, to use a
> different pseudo-random generator. Actually, thinking about it, there is
> probably a source of non-algorithmically-derived 'random' numbers
> somewhere on the net that would do the job nicely.

That's massive overkill.

Take a closer look at what Ned wrote:

"The default random number generator has a period of 2**19937-1"

and consider the numbers.

For a list with 3000 items, there are 3000! possible permutations. That
is approximately 10**21024. That is, a number with 21024 digits, or
somewhere around a trillion trillion trillion ... trillion trillion,
where there are 1752 "trillions".

If you could generate a million permutations a second, it would take you
on average 10**210988 centuries before getting the original permutation
again. That's the expected result you would get with a source of true
randomness.

Instead, with Python's default pseudo-random number generator, you cannot
get the full 3000! permutations, but only 2**19937-1 of them. Using the
same calculation as above, that means that you will have to generate a
million permutations per second for "only" 10**13783 centuries before
getting the original permutation again.

There are uses where being able to generate any possible permutation is
important, and the default PRNG is not sufficient. But merely shuffling
your data around to avoid spurious correlations is not one of them. Save
yourself a lot of development effort, and debugging, and just use
random.shuffle.


--
Steven

duncan smith

unread,
May 24, 2013, 10:33:26 AM5/24/13
to
On 24/05/13 10:11, Chris Angelico wrote:
A long time ago I wrote some code to do that.


import gmpy

def LexPermFromIndex(items, index):
n = len(items)
inds = range(n)
perm = []
for i in range(1, n+1):
r, index = divmod(index, gmpy.fac(n-i))
r = int(r)
perm.append(inds[r])
inds = inds[:r] + inds[r+1:]

return [items[i] for i in perm]


>>> LexPermFromIndex([1,2,3,4], 0)
[1, 2, 3, 4]
>>> LexPermFromIndex([1,2,3,4], 1)
[1, 2, 4, 3]
>>> LexPermFromIndex([1,2,3,4], 10)
[2, 4, 1, 3]
>>>


I can't remember exactly why I wrote it. But I also have something for
generating a permutation's index and similar functions for combinations.

Duncan

Carlos Nepomuceno

unread,
May 24, 2013, 11:00:59 AM5/24/13
to pytho...@python.org
----------------------------------------
> Date: Fri, 24 May 2013 01:14:45 -0700
> Subject: Simple algorithm question - how to reorder a sequence economically
> From: peter.h....@gmail.com
> To: pytho...@python.org
> --
> http://mail.python.org/mailman/listinfo/python-list

I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?

Here's a snippet for creating a random shuffle by Fisher–Yates algorithm:

def FY_shuffle(l):
    from random import randint
    for i in range(len(l)-1,0,-1):
        j = randint(0,i)
        l[j],l[i] = l[i],l[j]

It looks just like random.shuffle() mentioned before, but you can change it as you see fit.

If you can afford to test all permutations you can iterate over it by doing:

>>> from itertools import permutations
>>> l=range(4)
>>> l
[0, 1, 2, 3]
>>> for i in permutations(l): print i
...
(0, 1, 2, 3)
(0, 1, 3, 2)
(0, 2, 1, 3)
[...]
(3, 1, 2, 0)
(3, 2, 0, 1)
(3, 2, 1, 0)
>>>

Note that 'i' is a tuple.

If you need a list of all permutations to make a selection:

>>> l=range(4)
>>> l
[0, 1, 2, 3]
>>> [list(i) for i in permutations(l)]
[[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3], [0, 2, 3, 1], [0, 3, 1, 2], [0, 3, 2, 1], [1, 0, 2, 3], [1, 0, 3, 2], [1, 2, 0, 3], [1, 2, 3, 0], [1, 3, 0, 2], [1, 3, 2, 0], [2, 0, 1, 3], [2, 0, 3, 1], [2, 1, 0, 3], [2, 1, 3, 0], [2, 3, 0, 1], [2, 3, 1, 0], [3, 0, 1, 2], [3, 0, 2, 1], [3, 1, 0, 2], [3, 1, 2, 0], [3, 2, 0, 1], [3, 2, 1, 0]]

This will produce big lists:
-for 10 elements (l=range(10)) the size of the list is about 30MB (on Windows).
-for 11 elements (l=range(11)) the size of the list is about 335MB (on Windows). It took more than 7GB for CPython 2.7.5 to create that list.

Didn't try after that.

John Ladasky

unread,
May 24, 2013, 1:33:47 PM5/24/13
to
On Friday, May 24, 2013 3:52:18 AM UTC-7, Steven D'Aprano wrote:
> On Fri, 24 May 2013 01:14:45 -0700, Peter Brooks wrote:
>
> > That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
> > 2,1,4,3) that is different each time.
>
> You can't *guarantee* that it will be different each time.

Well, within limits, you can guarantee a LONG TIME between repeats. And that may be good enough for Peter's purpose.

When the number of elements in your sequence is short enough (the right value of "short enough" is at least partially a matter of opinion, and the amount of RAM available, but I would guess n < 10 myself), use itertools.permutations to generate all the possibilities at once -- call this list p. Store p in memory. Then shuffle p, and use its elements one at a time. If you get to the end of p and need to keep working, it's up to you whether to shuffle p a second time. If you don't reshuffle p, it guarantees the maximum interval between reusing the same permutation.

Use random.shuffle on your sequence directly, when your sequence has a larger number of elements. This doesn't guarantee that two successive permutations will not be identical, but the probability is of course very low.

Peter Brooks

unread,
May 24, 2013, 3:01:35 PM5/24/13
to
On May 24, 5:00 pm, Carlos Nepomuceno <carlosnepomuc...@outlook.com>
wrote:
>
>
> I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
>
If I run the simulation with the same sequence, then, because event E1
always comes before event E2, somebody might believe that there is a
causative connection between them in the world that's being simulated,
when, in fact, they only correlate in this way because the sequence is
not being shuffled. That's what it means.

Actually it'll be a bit more subtle than that, because each iteration
of the simulation updates all nodes in one time interval, the events
will not usually show the order of iteration - but, where there are
any secondary effects, that are related to the order in which the
nodes are updated, these will always happen the same way, which is my
concern.

Carlos Nepomuceno

unread,
May 24, 2013, 5:33:01 PM5/24/13
to pytho...@python.org
----------------------------------------
> Date: Fri, 24 May 2013 12:01:35 -0700
> Subject: Re: Simple algorithm question - how to reorder a sequence economically
> From: peter.h....@gmail.com
> To: pytho...@python.org
>

> On May 24, 5:00 pm, Carlos Nepomuceno <carlosnepomuc...@outlook.com>
> wrote:
>>
>>
>> I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
>>
> If I run the simulation with the same sequence, then, because event E1
> always comes before event E2, somebody might believe that there is a
> causative connection between them in the world that's being simulated,
> when, in fact, they only correlate in this way because the sequence is
> not being shuffled. That's what it means.

Correlation does not imply causation. If "somebody" is an expert system and you want to avoid it's recognition and/or triggering of some kind, and you can't or don't want to change it's behavior, you may take the random way because it's cheaper.

> Actually it'll be a bit more subtle than that, because each iteration
> of the simulation updates all nodes in one time interval, the events
> will not usually show the order of iteration - but, where there are
> any secondary effects, that are related to the order in which the
> nodes are updated, these will always happen the same way, which is my
> concern.

You should have a more precise understanding of the dependence of the variables you taking in consideration before randomizing the series of events your are using for tests.

I suggest you start using PPMCC. If it's close to zero or negative you wouldn't have to mind about it! ;)

Peter Brooks

unread,
May 24, 2013, 8:28:07 PM5/24/13
to
On May 24, 11:33 pm, Carlos Nepomuceno <carlosnepomuc...@outlook.com>
wrote:
> ----------------------------------------
>
>
>
>
>
>
>
>
>
> > Date: Fri, 24 May 2013 12:01:35 -0700
> > Subject: Re: Simple algorithm question - how to reorder a sequence economically
> > From: peter.h.m.bro...@gmail.com
> > To: python-l...@python.org
>
> > On May 24, 5:00 pm, Carlos Nepomuceno <carlosnepomuc...@outlook.com>
> > wrote:
>
> >> I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
>
> > If I run the simulation with the same sequence, then, because event E1
> > always comes before event E2, somebody might believe that there is a
> > causative connection between them in the world that's being simulated,
> > when, in fact, they only correlate in this way because the sequence is
> > not being shuffled. That's what it means.
>
> Correlation does not imply causation. If "somebody" is an expert system and you want to avoid it's recognition and/or triggering of some kind, and you can't or don't want to change it's behavior, you may take the random way because it's cheaper.
>
> > Actually it'll be a bit more subtle than that, because each iteration
> > of the simulation updates all nodes in one time interval, the events
> > will not usually show the order of iteration - but, where there are
> > any secondary effects, that are related to the order in which the
> > nodes are updated, these will always happen the same way, which is my
> > concern.
>
> You should have a more precise understanding of the dependence of the variables you taking in consideration before randomizing the series of events your are using for tests.
>
If the scenario could be modelled mathematically, then there'd be no
point in writing the simulation.

John Ladasky

unread,
May 25, 2013, 9:25:16 PM5/25/13
to
On Friday, May 24, 2013 10:33:47 AM UTC-7, Yours Truly wrote:
> If you don't reshuffle p, it guarantees the maximum interval between reusing
> the same permutation.

Of course, that comes at a certain price. Given two permutations p[x] and p[x+1], they will ALWAYS be adjacent, in every repetition of the list, unless you reshuffle. So the "spurious correlation" problem that Peter is worrying about would still exist, albeit at a meta-level.

You could play clever games, such as splitting p in half once you get to the end of it, shuffling each half independently, and then concatenating the halves. This algorithm scrambles the p[x]'s and p[x+1]'s pretty well, at the cost of cutting the average interval between repeats of a given p[x] from len(p) to something closer to len(p)/2.

Because someone's got to say it... "The generation of random numbers is too important to be left to chance." — Robert R. Coveyou

Roy Smith

unread,
May 25, 2013, 9:49:28 PM5/25/13
to
In article <15a1bb3a-514c-454e...@googlegroups.com>,
John Ladasky <john_l...@sbcglobal.net> wrote:

> Because someone's got to say it... "The generation of random numbers is too
> important to be left to chance." � Robert R. Coveyou

Absolutely. I know just enough about random number generation to
understand that I don't really know anything about it :-)

That being said, people who really care about random numbers, tend to
rely on some sort of physical process instead of computer algorithms. A
classic example would be /dev/random. A somewhat more fun example is
http://www.youtube.com/watch?v=7n8LNxGbZbs. Something radioactive and a
geiger counter are a good source of randomness (time intervals between
decay events).

Robert Kern

unread,
Jun 12, 2013, 9:14:43 AM6/12/13
to pytho...@python.org
On 2013-05-24 14:43, Chris Angelico wrote:
> On Fri, May 24, 2013 at 11:23 PM, Peter Brooks
> <peter.h....@gmail.com> wrote:
>> Actually, thinking about
>> it, there is probably a source of non-algorithmically-derived 'random'
>> numbers somewhere on the net that would do the job nicely.
>
> True entropy is usually provided by a source such as /dev/random (on
> Unix systems). It's sometimes referred to as "cryptographic"
> randomness, due to its necessity in secure encryption work. There are
> various ways to get this in a cross-platform way.

os.random() and os.urandom(), particularly.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

0 new messages