Here's what I'm currently doing:
my %sections = (
    '0' => 'x',
    '1' => 't',
    '2' => 'm',
    '3' => 'v',
);
my $count_sect = keys %sections;    # counts the sections in the keys
do {
    my $rand_sect = int(rand($count_sect)); # random integer < $count_sect
    print 'key:' . $rand_sect . "\n";
    if (exists $sections{$rand_sect}) {
        my $sect = $sections{$rand_sect};
        print "SECTION:" . $sect . "\n";    # do things with the returned section
        delete $sections{$rand_sect};
    }
    else {
        print '-key non existent' . "\n";
    }
} until (!keys %sections);
Anyone have suggestions for improvement? Thanks.
>I have an array that I need to cycle through once, but in a random
>order.
perldoc -q shuffle
This is at least significantly less code:
my @arr = qw/x t m v/;
print splice(@arr, rand $_, 1) for reverse 1 .. @arr;
As Brian suggested, you should also read the applicable FAQ entry,
especially if your array is large.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Or even
print splice(@arr, rand @arr, 1) while @arr;
> As Brian suggested, you should also read the applicable FAQ entry,
> especially if your array is large.
There is also List::Util::shuffle, ready to use.
Anno
Let's not. Splicing out elements one-by-one of an array is very
inefficient. Shuffling can be done in linear time, while the splice
method takes time quadratic in the length of the array.
It's even worse than suggesting to use Bubblesort to sort an array.
See also http://www.perlmonks.org/index.pl?node_id=349969
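For reference, the linear-time algorithm in question is the Fisher-Yates shuffle; a minimal pure-Perl sketch (an illustration, not code taken from the node above):

```perl
# Fisher-Yates shuffle: runs in O(n) and, given a fair rand(),
# produces every permutation with equal probability.
sub fisher_yates {
    my @arr = @_;
    for my $i ( reverse 1 .. $#arr ) {
        my $j = int rand( $i + 1 );      # pick 0 <= $j <= $i
        @arr[ $i, $j ] = @arr[ $j, $i ]; # swap instead of splicing
    }
    return @arr;
}

my @shuffled = fisher_yates( 'x', 't', 'm', 'v' );
```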
Abigail
--
perl -wlpe '}$_=$.;{' file # Count the number of lines.
Yeah, that's what the FAQ says, too. OTOH, I measured the speed with
the help of the Benchmark module, and code that created the array
my @arr = 1 .. 5000;
and then spliced it randomly using those inefficient methods ran
about 15 times per second. So, if you don't have a really big array
and aren't about to run it very frequently, is there anything wrong
with keeping it simple?
> Shuffling can be done in linear time, while the splice
> method takes time quadratic in the length of the array.
I measured with the List::Util::shuffle() function, too, and it ran
about 70 times per second. However, when I included loading of the
module in the benchmark, the overall speed was reduced to 12 times per
second, i.e. slightly slower than the splice() method...
use Benchmark 'cmpthese';
my %oldINC = %INC;
cmpthese -5, {
    'splice()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled;
        push @shuffled, splice(@arr, rand @arr, 1) while @arr;
    },
    'shuffle()' => sub {
        my @arr = 1 .. 5000;
        %INC = %oldINC;
        require List::Util;
        my @shuffled = List::Util::shuffle(@arr);
    },
};
Output including module loading:
Rate shuffle() splice()
shuffle() 12.0/s -- -23%
splice() 15.5/s 30% --
Odd. I just tried the same thing (well, I added a third option - sort
{int(rand(3)) - 1} @arr - to double check):
use Benchmark 'cmpthese';
my %oldINC = %INC;
cmpthese -5, {
    'splice()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled;
        push @shuffled, splice(@arr, rand @arr, 1) while @arr;
    },
    'shuffle()' => sub {
        my @arr = 1 .. 5000;
        %INC = %oldINC;
        require List::Util;
        my @shuffled = List::Util::shuffle(@arr);
    },
    'rand()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled = sort { int(rand(3)) - 1 } @arr;
    },
};
and I got:
Rate rand() splice() shuffle()
rand() 23.1/s -- -47% -88%
splice() 43.3/s 87% -- -77%
shuffle() 189/s 719% 338% --
Where shuffle was the clear winner by a mile.... Normally I'd say YMMV, but this
seems like a slightly over the top difference. I'm using a slightly old perl
(5.8.1) - maybe something is different between our versions?
MB
Josh
I've seen variations of the third option before, being presented as a
way to shuffle a list. But I've yet to see any proof that this gives a
fair shuffle (fair being every possible permutation of the list has the
same chance of being produced, given a truly random 'rand()' function).
Abigail
--
perl -we '$_ = q ;4a75737420616e6f74686572205065726c204861636b65720as;;
for (s;s;s;s;s;s;s;s;s;s;s;s)
{s;(..)s?;qq qprint chr 0x$1 and \161 ssq;excess;}'
Given that it's also one of the most efficient ways of doing it, there's
no need for a change if the array grows.
Abigail
--
perl -we 'print split /(?=(.*))/s => "Just another Perl Hacker\n";'
<snip>
>> Output including module loading:
>>
>> Rate shuffle() splice()
>> shuffle() 12.0/s -- -23%
>> splice() 15.5/s 30% --
>
> Odd. I just tried the same thing (well, I added a third option -
> sort {int(rand(3)) - 1} @arr - to double check):
<snip>
> and I got:
>
> Rate rand() splice() shuffle()
> rand() 23.1/s -- -47% -88%
> splice() 43.3/s 87% -- -77%
> shuffle() 189/s 719% 338% --
>
> Where shuffle was the clear winner by a mile.... Normally I'd say
> YMMV, but this seems like a slightly over the top difference. I'm
> using a slightly old perl (5.8.1) - maybe something is different
> between our versions?
Well, I ran it using 5.8.0 on Windows 98. Just tried it with 5.8.1 on
Linux, and then the results were similar to yours. Can the explanation
possibly be that loading a compiled module is much slower on Windows?
Clearly it doesn't. Consider the simple case of a two element list (a,b).
There are two possible shuffle results: (a,b) and (b,a) each of which will
have equal probability of occurring in a fair shuffle.
Any sane implementation of sort will, when asked to sort a two-element
list, do a single comparison, so for the above there are three cases:
(int(rand(3)) -1)==-1 -> (a,b)
(int(rand(3)) -1)==0 -> (a,b)
(int(rand(3)) -1)==1 -> (b,a)
Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
it might improve with a larger list, but it doesn't bode well when the
algorithm fails for such a simple case.
Now the middle case, it could be argued, could result in (b,a) if the
sort was not stable - but even so if it did it always would and hence
(b,a) would be favoured and the problem remains. Well, for a sane
sort implementation anyway.
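The bias is easy to check empirically; a quick counting script (mine, for illustration) for the two-element case:

```perl
# Count how often sort { int(rand(3)) - 1 } yields each ordering
# of a two-element list. With perl's (stable) default mergesort,
# a comparator return of 0 keeps the original order, so 'ab'
# should turn up roughly twice as often as 'ba'.
my %count;
for ( 1 .. 30_000 ) {
    my @s = sort { int(rand(3)) - 1 } ( 'a', 'b' );
    $count{ join '', @s }++;
}
printf "%s => %d\n", $_, $count{$_} for sort keys %count;
```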
--
Sam Holden
> On 23 Aug 2004 07:27:36 GMT, Abigail <abi...@abigail.nl> wrote:
>> Matthew Braid (m...@uq.net.au.invalid) wrote on MMMMX September MCMXCIII in
>><URL:news:cgbd5q$j40$1...@bunyip.cc.uq.edu.au>:
>> ''
>> '' Odd. I just tried the same thing (well, I added a third option - sort
>> '' {int(rand(3)) - 1} @arr - to double check:
>>
>> I've seen variations of the third option before, being presented as a
>> way to shuffle a list. But I've yet to see any proof that this gives a
>> fair shuffle (fair being every possible permutation of the list has the
> same chance of being produced, given a truly random 'rand()' function).
>
> Clearly it doesn't. Consider the simple case of a two element list (a,b).
> There are two possible shuffle results: (a,b) and (b,a) each of which will
> have equal probability of occurring in a fair shuffle.
>
> Any sane implementation of sort will, when asked to sort a two-element
> list, do a single comparison, so for the above there are three cases:
>
> (int(rand(3)) -1)==-1 -> (a,b)
> (int(rand(3)) -1)==0 -> (a,b)
> (int(rand(3)) -1)==1 -> (b,a)
>
> Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
> it might improve with a larger list, but it doesn't bode well when the
> algorithm fails for such a simple case.
The naive solution to that would be disallowing '0' so that pairs are
always swapped.
However, this won't make the shuffling fair either. One problem, which
can be verified with some simple empirical tests, is that the fairness
depends on the sort algorithm used.
Taking this little program:
my @list = (1 .. 4);
my %perms;

shuffle() for 1 .. 10000;

while (my ($key, $val) = each %perms) {
    print "$key => $val\n";
}

sub shuffle {
    my @shuffled = sort { int(rand(2)) == 0 ? -1 : 1 } @list;
    $perms{ join "-", @shuffled }++;
}
On perl5.8.4 with merge-sort, this yields a distribution resembling:
2-1-4-3 => 620
1-2-4-3 => 606
3-2-1-4 => 313
1-4-2-3 => 331
2-4-3-1 => 300
1-3-4-2 => 330
2-1-3-4 => 583
4-1-2-3 => 320
1-2-3-4 => 678
3-4-2-1 => 619
2-3-4-1 => 313
3-1-4-2 => 301
4-3-1-2 => 646
2-3-1-4 => 325
4-3-2-1 => 627
1-3-2-4 => 317
4-2-1-3 => 296
1-4-3-2 => 323
2-4-1-3 => 289
3-2-4-1 => 303
3-4-1-2 => 609
4-1-3-2 => 332
4-2-3-1 => 303
3-1-2-4 => 316
You consistently get certain permutations that show up only around 300
times where others happen at least 600 times (2-1-4-3 is one of the latter).
Now running the same with 'use sort "_qsort"' gives a very different
picture. It's even more biased:
2-1-4-3 => 621
1-2-4-3 => 584
3-2-1-4 => 651
1-4-2-3 => 321
2-4-3-1 => 155
1-3-4-2 => 323
2-1-3-4 => 1242
4-1-2-3 => 299
1-2-3-4 => 1199
3-4-2-1 => 142
2-3-4-1 => 344
3-1-4-2 => 286
4-3-1-2 => 169
4-3-2-1 => 155
2-3-1-4 => 647
1-3-2-4 => 629
4-2-1-3 => 328
1-4-3-2 => 154
2-4-1-3 => 290
3-2-4-1 => 316
4-1-3-2 => 174
3-4-1-2 => 147
3-1-2-4 => 659
4-2-3-1 => 165
Permutations 2-1-3-4 and 1-2-3-4 consistently come in at around 1200.
In a strict sense, this doesn't prove anything and the randomized sort
could still be fair. But really, it's not.
It might be possible to write a sort algorithm in a way that it could be
used for shuffling. Perl's implementations however weren't designed that
way. Especially perl's quicksort has been optimized to death and I
wouldn't be surprised if those optimizations contribute to this strong
bias towards certain permutations.
Having said that, I tried it with these two sort-routines:
sub bubble (&@) {
    my ($cmp, @copy) = @_;
    for my $i (reverse 1 .. $#copy) {
        for my $j (1 .. $i) {
            local ($a, $b) = @copy[$j-1, $j];
            @copy[$j-1, $j] = @copy[$j, $j-1] if &$cmp == 1;
        }
    }
    return @copy;
}

sub selection (&@) {
    my ($cmp, @copy) = @_;
    my $min;
    for my $i (0 .. $#copy-1) {
        $min = $i;
        for my $j ($i+1 .. $#copy) {
            local ($a, $b) = @copy[$j, $min];
            $min = $j if &$cmp == -1;
        }
        @copy[$min, $i] = @copy[$i, $min];
    }
    return @copy;
}
in the hope that maybe those naive sort algorithms could be used (as
they tend to compare each element with all remaining ones). But I was
wrong. Selection sort has a similar bias to quicksort (this time
favoring 4-1-2-3 and 4-1-3-2). Bubble sort is slightly better, but still
more biased than perl's built-in mergesort.
There are other interesting things to note. When allowing the compare
function to return -1, 1 and 0 (as opposed to only -1 and 1), selection
sort gets less biased, whereas bubble sort gets more biased (4-3-2-1
only shows up around ten times and 1-2-3-4 1800 times on average).
Without being a formal proof for anything, these observations show that
the shuffle-by-sort approach is probably very rotten and should best be
left alone.
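The straightforward fair alternative remains List::Util's shuffle, mentioned earlier in the thread:

```perl
use List::Util 'shuffle';

# shuffle() implements a Fisher-Yates shuffle in linear time, so
# every permutation is equally likely (given a fair rand()).
my @deck     = ( 1 .. 52 );
my @shuffled = shuffle(@deck);
```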
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
True, though the quadratic term has a small constant factor. A time-linear
shuffler swaps list elements instead of extracting them. That saves the
time splice() needs to shift the remaining elements down to close the gap:
sub swapping {
    for ( reverse 1 .. $#_ ) {
        my $r = rand( 1 + $_ );
        @_[ $r, $_ ] = @_[ $_, $r ];
    }
    @_;
}
Benchmarking this against
sub splicing {
    map splice( @_, rand $_, 1 ), reverse 1 .. @_;
}
shows splicing() about twice as fast for arrays shorter than 1000.
The break-even point is with lists of length 12_000; from then on
swapping() wins out by an increasing margin. At 30_000, swapping
is twice as fast as splicing. On my machine.
That leaves quite a few applications (such as card shuffling) where
the splice solution is practical.
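The two subs can be raced directly with the Benchmark module; a sketch of such a harness (my reconstruction, not the one actually used):

```perl
use Benchmark 'cmpthese';

# The swapping() and splicing() subs from above, repeated here so
# the snippet is self-contained. Both work on @_ aliases, so they
# must be called with a real array, not a literal list.
sub swapping {
    for ( reverse 1 .. $#_ ) {
        my $r = rand( 1 + $_ );
        @_[ $r, $_ ] = @_[ $_, $r ];
    }
    @_;
}

sub splicing {
    map splice( @_, rand $_, 1 ), reverse 1 .. @_;
}

# Compare at one list size; vary 1000 to look for the break-even point.
cmpthese -1, {
    swapping => sub { my @a = 1 .. 1000; () = swapping(@a) },
    splicing => sub { my @a = 1 .. 1000; () = splicing(@a) },
};
```

The relative speeds will of course depend on the machine and perl version at hand.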
> It's even worse than suggesting to use Bubblesort to sort an array.
Even Bubblesort has found respectable applications. It is fast for small
lists that are almost in order and has been used to speed up another
sort algorithm. I think the example is documented in the _Programming
Pearls_ series, though the example I find [1] uses insertion sort (another
n**2 sorting algorithm).
Anno
[1] Bentley, _Programming Pearls_, p. 112f
The benchmark I posted showed that the execution time of
List::Util::shuffle() is much faster than that. Is the explanation,
then, that I was actually comparing apples and oranges, since
List::Util is a compiled module?
It is. It also contains a pure Perl implementation, but by default it
loads an XS version. That's why I wanted to add a benchmark against
a Perl implementation of the in-place shuffle.
Anno
[interesting results snipped]
One problem with this approach is that it violates the assumptions of
the underlying algorithm. Sort procedures assume that comparisons are
consistent with a linear ordering of the list elements. In particular,
they assume that comparing the same elements twice will have the same
outcome, but also transitivity of the order relation and more. Who
is to say what a sort algorithm is going to do? It may not even halt,
for instance if the random decisions indicate a cyclic ordering.
Nor are any predictions on running time valid, so the expectation
that it runs in n*log n time may not be true. Since that is worse
than the linear time of an in-place shuffle, there isn't much to
recommend the "random sort" at all.
Anno
So, it's fair to compare a splice() that does its work in C with a
shuffle written in pure Perl?
I'd say comparing the XS shuffle with the C splice is more fair.
Abigail
--
echo "==== ======= ==== ======"|perl -pes/=/J/|perl -pes/==/us/|perl -pes/=/t/\
|perl -pes/=/A/|perl -pes/=/n/|perl -pes/=/o/|perl -pes/==/th/|perl -pes/=/e/\
|perl -pes/=/r/|perl -pes/=/P/|perl -pes/=/e/|perl -pes/==/rl/|perl -pes/=/H/\
|perl -pes/=/a/|perl -pes/=/c/|perl -pes/=/k/|perl -pes/==/er/|perl -pes/=/./;
[...]
> One problem with this approach is that it violates the assumptions of
> the underlying algorithm. Sort procedures assume that comparisons are
> consistent with a linear ordering of the list elements. In particular,
> they assume that comparing the same elements twice will have the same
> outcome, but also transitivity of the order relation and more. Who
> is to say what a sort algorithm is going to do? It may not even halt,
> for instance if the random decisions indicate a cyclic ordering.
Indeed, the documentation for Perl's sort function says:
The comparison function is required to behave. If it returns
inconsistent results (sometimes saying $x[1] is less than $x[2]
and sometimes saying the opposite, for example) the results are
not well-defined.
-----ScottG.
I didn't want to imply any unfairness, just to add a data point.
> I'd say comparing the XS shuffle with the C splice is more fair.
Replacing n calls to a C-level function by a single one is fair?
We know that the overhead swamps a lot of what may be happening at the
low level, and these benchmarks show it again. So setting one
swap operation against one call to splice seems fair enough from
that point of view.
Anno
> Matthew Braid (m...@uq.net.au.invalid) wrote on MMMMX September MCMXCIII in
> <URL:news:cgbd5q$j40$1...@bunyip.cc.uq.edu.au>:
> ''
> '' Odd. I just tried the same thing (well, I added a third option - sort
> '' {int(rand(3)) - 1} @arr - to double check:
>
> I've seen variations of the third option before, being presented as a
> way to shuffle a list. But I've yet to see any proof that this gives a
> fair shuffle (fair being every possible permutation of the list has the
> same chance of being produced, given a truly random 'rand()' function).
>
>
> Abigail
I only included the rand() method out of curiosity (I knew it was gonna be
slow, I just put it there as a comparison). I'd never actually _use_ that method
in my code :)
MB
>Odd. I just tried the same thing (well, I added a third option - sort
>{int(rand(3)) - 1} @arr - to double check:
I've always heard that using an inconsistent sort function in perl,
like this one, is a recipe for disaster.
You could just generate a random number per entry, and sort according to
that list. Like:
@sort = map rand, 0 .. $#item;
@shuffled = @item[sort { $sort[$a] <=> $sort[$b] } 0 .. $#item];
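The same idea is often written as a single Schwartzian-transform-style chain (a common idiom; a sketch, not Bart's exact code):

```perl
my @item = ( 'a' .. 'e' );

# Pair each element with a random key, sort on the key, strip the key.
my @shuffled =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ rand, $_ ] } @item;
```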
--
Bart.
It may still be hard to prove that this generates an unbiased sequence.
There will usually be a few duplicates among the generated numbers.
For these, the sequence is determined by the sort algorithm, not by
the random generator.
Anno
>There will usually be a few duplicates among the generated numbers.
Not *exact* duplicates, in the numerical sense (though perhaps it looks
that way when stringified). Otherwise, you'd have reached the end of the
random generator's cycle, and would start getting the exact same sequence
again.
--
Bart.