my_array.sort { |a,b| rand(2) }
>
but the results weren't very random. I then tried the following:
class Array
> def random_copy
> b = Array.new
> while b.length < self.length
> random_value = self[rand self.length]
> b.push random_value unless b.include? random_value
> end
> b
> end
> end
This works but I'm sure there's some Ruby magic for doing this better.
thanks
I think the most common idiom is:
array.sort_by { rand }
David
--
David A. Black
dbl...@wobblini.net
"Ruby for Rails", forthcoming from Manning Publications, April 2006!
my_array.sort_by { rand }
--
Posted via http://www.ruby-forum.com/.
j.
On 12/1/05, Vincent Foley <vfo...@gmail.com> wrote:
>
> Maybe there should be Array#randomize...
>
>
>
--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"
Jeff Wood
> hammed wrote:
> > I'd like to sort collections randomly. This is what I tried first:
> >
> > my_array.sort { |a,b| rand(2) }
>
> my_array.sort_by { rand }
A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.
Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.
Reinder
> However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.
How about just doing this?
class Array
def shuffle
newarr = self
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
newarr
end
end
shuffled_array = my_array.shuffle
--Steve
Actually, sort_by caches every value passed and uses those for
comparison (aka a Schwartzian Transform), so it would essentially
return the same results as sorting a list of random numbers,
regardless of algorithm.
Sam
Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.
Here's a little experiment:
-------------------------------------------------
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
sorted = A.dup.sort_by { rand }
Score[sorted.join("")] += 1
end
Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end
-------------------------------------------------
Gives:
$ ruby sortdist.rb 100000
abc: 16693
acb: 16688
bac: 16752
bca: 16590
cab: 16475
cba: 16802
Thats a pretty good distribution for the number of iterations.
-- Jim Weirich
> On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:
>> reinder wrote:
>>>> my_array.sort_by { rand }
>>>
>>> A quick test seems to indicate that this works with the ruby 1.8.2
>>> implementation on my Mac. However, there is no guarantee that this call
>>> will select each of the n! permutations with equal probability.
>>
>> Actually, it will. sort_by uses a Schwartzian transform to do the
>> sorting. That means that rand is only called once for each element of
>> the array. As long as rand gives a decent distribution of random
>> numbers, the permutation will be random too.
>
> sort_by{ rand } is actually biased, since sort_by will preserve the
> relative order of the elements for which rand() returned the same value.
>
> %w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
> i = 0
> %w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]
>
> This means that permutations preserving the relative order of one (or
> more) pair of elements of the original array are a bit more probable.
>
> In praxis, the bias this introduces is often so small that the mere
> succinctness of sort_by{ rand } more than makes up for it.
does
%w( a b c d ).sort{ rand <=> rand }
help?
-a
--
===============================================================================
| ara [dot] t [dot] howard [at] noaa [dot] gov
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
===============================================================================
sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.
%w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]
This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.
In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.
--
Mauricio Fernandez
How frequently would rand return a same value? (In theory it would be
never but does anyone know the reality?)
> How frequently would rand return a same value? (In theory it would be
> never but does anyone know the reality?)
According to the Pickaxe, Kernel.rand is an implementation of the
[Mersenne Twister][1] algorithm. If that is still the case, then in
reality it will begin repeating exactly the same numbers after
2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the
period of the MT. The period is the point at which the exact same
series of PRNs will begin again. For example, if it started out as
1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3.
But, that's not what you really care about.
Numbers will certainly repeat within the period, since the period is
so much larger than our integer representation. For example, a 32
bit integer can only represent about 4e9. So, the likelihood of
getting any specific integer, in the 32 bit case, is 1/(2^32). This
is an extreme example:
5.times { p rand(2) }
5.times { p rand(1000) }
The bottom line is that the PRNs should be approximately uniformly
distributed. Apply that to your specific range, and you have your
answer.
--Steve
> Numbers will certainly repeat within the period, since the period
> is so much larger than our integer representation. For example, a
> 32 bit integer can only represent about 4e9. So, the likelihood of
> getting any specific integer, in the 32 bit case, is 1/(2^32).
I should add to this that you can approximate when you'll get
repeats, and it's likely to be much sooner than you'd guess. For
more information, see the [Birthday Problem][1].
--Steve
"Just" doing that
1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three. You will
first swap the first element with 3 possibilities, then the second and so
on; you're going through 3**3 possibilities. There are 3! possible
permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27,
thus you have a bias.
> 1) changes the receiver (use new_arr = self.dup instead)
> 2) definitely has a bias. Look e.g. at an array of length three.
> You will
> first swap the first element with 3 possibilities, then the
> second and so
> on; you're going through 3**3 possibilities. There are 3! possible
> permutations, but alas, 3! == 6 is not a perfect divisor of 3**3
> == 27,
> thus you have a bias.
Good points, thanks. How about this?
class Array
def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end
end
# test code lifted from Jim Weirich's earlier post
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
sorted = A.shuffle
Score[sorted.join("")] += 1
end
Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end
[~/Code/private/code/snippets] 382% ./shuffle.rb 100000
abc: 16596
acb: 16394
bac: 16700
bca: 16684
cab: 16841
cba: 16785
0:07.87 seconds, 96.4% CPU
This is really just meant to be a simple hack. I think a better
algorithm could assign a random index to each element, then sort on
those indices.
--Steve
I liked the Birthday Problem. Thanks.
I did some numbers and found the following probability estimates:
array size P(#rand() collision)
1000 5.54558e-11
10000 5.55056e-09
1000000 5.55096e-05
10000000 5.53574e-03
A collision implies that the associated pair of elements has got the same
ordering in the output array, instead of 50-50 chances of it being reversed.
More info, including the Ruby code I used, available at
http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
> does
>
> %w( a b c d ).sort{ rand <=> rand }
>
> help?
I'm not totally sure it's really unbiased, but at first sight it looks good.
When called with no arguments, rand will use
static double
genrand_real(void)
{
unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
return(a*67108864.0+b)*(1.0/9007199254740992.0);
}
so it would seem that two consecutive calls to rand() cannot return the
same value, but I still have to think about the effect of qsort.
It is however somewhat slower than sort_by{ rand }.
[100, 1000, 10000, 100000].each do |n|
GC.start
t0 = Time.new
(1..n).sort_by{rand}
a = (t1 = Time.new) - t0 # => 0.000425, 0.003002, 0.054392, 0.939692
(1..n).sort{rand <=> rand}
b = Time.new - t1 # => 0.001161, 0.026275, 0.289807, 3.791833
"%4.2f" % [b / a] # => "2.73", "8.75", "5.33", "404"
end
RUBY_VERSION # => "1.8.3"
RUBY_RELEASE_DATE # => "2005-09-24"
--
Mauricio Fernandez
It's considerably slower. Changing it to sort { rand(10000) <=> 5000 }
shows a 30% speedup on 10k element Arrays on my system. Don't know why
I chose those numbers, I suppose bigger numbers reduce the "0"
comparison, considering { rand(3) <=> 1 }.
Anyway, I put together a small benchmark to compare my (revised)
shuffle, to sort, and sort_by:
#!/usr/bin/env ruby
class Array
def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end
end
printf(" size shuffle sort sort_by\n")
[10, 100, 1000, 10000].each do |n|
# populate an Array size n
a = Array.new(n) { |i| i }
# my method
start = Time.new
100.times do
b = a.shuffle
end
a_time = Time.new - start
# Array's sort
start = Time.new
100.times do
b = a.sort { rand(10000) <=> 5000 }
end
b_time = Time.new - start
# Enumerable's sort_by
start = Time.new
100.times do
b = a.sort_by { rand }
end
c_time = Time.new - start
# results
printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time)
end
Which, on my P4/3.0GHz (Win32) emits this:
size shuffle sort sort_by
10 0.00 0.02 0.00
100 0.16 0.13 0.03
1000 1.61 2.03 0.44
10000 16.31 28.14 4.78
--Steve
> On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t....@noaa.gov wrote:
>> On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:
>>> sort_by{ rand } is actually biased, since sort_by will preserve the
>>> relative order of the elements for which rand() returned the same value.
>
> I did some numbers and found the following probability estimates:
>
> array size P(#rand() collision)
> 1000 5.54558e-11
> 10000 5.55056e-09
> 1000000 5.55096e-05
> 10000000 5.53574e-03
>
> A collision implies that the associated pair of elements has got the same
> ordering in the output array, instead of 50-50 chances of it being reversed.
>
> More info, including the Ruby code I used, available at
> http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
>
>> does
>>
>> %w( a b c d ).sort{ rand <=> rand }
>>
>> help?
>
> I'm not totally sure it's really unbiased, but at first sight it looks good.
the big difference is that a value will have rand called multiple times for it
- in otherwords an element compareds differently every time. this is bound to
help shuffle more uniformly it seems:
harp:~ > cat a.rb
%w( a b c d ).sort{|a,b| ra = rand; rb = rand; p a => ra; p b => rb; ra <=> rb }
harp:~ > ruby a.rb
{"a"=>0.181439380218727}
{"c"=>0.579403886629126}
{"c"=>0.713513989939958}
{"d"=>0.573687508540169}
{"a"=>0.851534740906118}
{"d"=>0.00156074151427565}
{"b"=>0.803591007691647}
{"a"=>0.494066110872563}
{"b"=>0.834676789626288}
{"c"=>0.792106134460754}
so, note that "a" was sorted using a different value each time.
can anyone who remembers stats and knows the sort algorithm weigh in?
|| > More info, including the Ruby code I used, available at
|| > http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
|| >
|| >> does
|| >>
|| >> %w( a b c d ).sort{ rand <=> rand }
|| >>
|| >> help?
Hi, I'm new to ruby, but I use Python for some years now.
So I suggest the following 'pythonic' solution ;-)
def shuffle(a)
b=[]
# decorate
a.each { |x| b << [rand,x]; }
# sort
b.sort! { |x,y| x[0] <=> y[0] }
# undecorate
b.collect { |x| x[1] }
end
a=%w{a b c d e f}
puts shuffle a
Is there a shorter or more elegant way to do this
'decorate-sort-undecorate' in ruby ?
In Python it is quite easy by using list comprehensions...
Greetings, Uwe
That might be the very reason why it is ultimately much more biased than
sort_by{ rand }... I'm keeping the latter.
If the limited precision of rand() became a problem, I could always use
arr.sort_by{ [rand, rand] } where collisions happen with a probability
around 6.16298e-19 for a 10_000_000 element array, or more generally
Array.new(N){rand} (but then we'd have to look into MT for another
possible source of residual bias).
> it - in otherwords an element compareds differently every time. this is
> bound to help shuffle more uniformly it seems:
We cannot trust our intuition in these cases; in the following example, the
tenth element is more likely to stay where it was (look at pos[9]):
a = (1..100); pos = Array.new(100, 0)
t = Time.new; 100000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 223.891608
pos
# => [1176, 1208, 1083, 1153, 1191, 1130, 1098, 1133, 1189, 1438, 1150, 1043,
========
# 1121, 1067, 1076, 1069, 1078, 1113, 1129, 1143,1115, 1074, 1081, 1069, 1057,
# [...]
t = Time.new; 10000000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 22781.664168
pos
# => [105944, 105861, 105546, 105681, 104059, 101196, 99080, 100675, 110851,
# 132223, 101670, 96018, 96147, 96688, 97709, 99051, 97823, 97345, 98799,
========
# [...]
The bias can be observed easily with smaller arrays:
a = (1..20); pos = Array.new(20, 0)
t = Time.new
100000.times{ pos[a.sort{rand <=> rand}.index(3)] += 1 }
Time.new - t # => 20.848851
pos
# => [5084, 5694, 6561, 4326, 4205, 4792, 4913, 4827, 4791, 4571, 4560, 4707,
# 4863, 4934, 5112, 5011, 5071, 5326, 5482, 5170]
a = (1..10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort{rand <=> rand}.index(5)] += 1 }
Time.new - t # => 77.438912
pos
# => [113836, 103238, 99442, 88682, 101306, 83294, 93296, 100830, 100744, 115332]
Compare the latter to
a = (1..10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort_by{rand}.index(5)] += 1 }
Time.new - t # => 32.163499
pos
# => [99822, 100020, 100328, 100040, 99603, 100098, 100195, 100023, 99875, 99996]
--
Mauricio Fernandez
> On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t....@noaa.gov wrote:
>
>>the big difference is that a value will have rand called multiple times for
>
>
> That might be the very reason why it is ultimately much more biased than
> sort_by{ rand }... I'm keeping the latter.
Indeed,
ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
[2, 0, 1]=>12285,
[1, 0, 2]=>12426,
[0, 1, 2]=>25111,
[1, 2, 0]=>12468,
[0, 2, 1]=>12475}
Michael
--
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: micha...@isis-papyrus.com
Visit our Website: www.isis-papyrus.com
arr.sort_by{rand}
> In Python it is quite easy by using list comprehensions...
What does that look like in Python?
Is it shorter than the above one? }:-)
--
Mauricio Fernandez
that is not equivalent to my solution.
the python equivalent to yours is
arr.sort(key = lambda x: random())
||
|| > In Python it is quite easy by using list comprehensions...
|| What does that look like in Python?
from random import random
def shuffle(a):
b = [ (random(), i) for i in a]
b.sort()
return [ x[1] for x in b ]
print shuffle(range(10))
|| Is it shorter than the above one? }:-)
||
|| --
|| Mauricio Fernandez
greetings, Uwe
||
> || arr.sort_by{rand}
>
> that is not equivalent to my solution.
> the python equivalent to yours is
>
> arr.sort(key = lambda x: random())
>
> ||
> || > In Python it is quite easy by using list comprehensions...
> || What does that look like in Python?
>
> from random import random
>
> def shuffle(a):
> b = [ (random(), i) for i in a]
> b.sort()
> return [ x[1] for x in b ]
>
> print shuffle(range(10))
>
I would have thought those two functions do the same thing
with the exception that shuffle does not sort in place.
I'm probably missing something obvious. Could you explain
the difference between the two versions?
> When called with no arguments, rand will use
>
> static double
> genrand_real(void)
> {
> unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
> return(a*67108864.0+b)*(1.0/9007199254740992.0);
> }
>
> so it would seem that two consecutive calls to rand() cannot return the
> same value, but I still have to think about the effect of qsort.
And that is a good thing? How can the sequence be "random" if it
never contains the same two numbers in row? (Remember Enigma...)
> Mauricio Fernandez
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org
|| > || > In Python it is quite easy by using list comprehensions...
|| > || What does that look like in Python?
|| >
|| > from random import random
|| >
|| > def shuffle(a):
|| > b = [ (random(), i) for i in a]
|| > b.sort()
|| > return [ x[1] for x in b ]
|| >
|| > print shuffle(range(10))
|| I would have thought those two functions do the same thing
|| with the exception that shuffle does not sort in place.
|| I'm probably missing something obvious. Could you explain
|| the difference between the two versions?
If you use random as sorting key, the result of a<=>b
will differ each time you evaluate a<=>b.
In my case I attach a random number to each item,
then I sort my list according to this number and remove
this 'decoration' afterwards.
Pythonprogrammers call this pattern "decorate-sort-undecorate".
As I do not now anything about the ruby interanls of sort,
I prefer my version.
Greetings, Uwe.
U> If you use random as sorting key, the result of a<=>b
U> will differ each time you evaluate a<=>b.
U> In my case I attach a random number to each item,
U> then I sort my list according to this number and remove
U> this 'decoration' afterwards.
run
ri Enum#sort_by
and see what it do.
Guy Decoux
From: "Uwe Schmitt" <sch...@num.uni-sb.de>
> ||
> || On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:
> || > Is there a shorter or more elegant way to do this
> || > 'decorate-sort-undecorate' in ruby ?
> ||
> || arr.sort_by{rand}
>
> that is not equivalent to my solution.
> the python equivalent to yours is
>
> arr.sort(key = lambda x: random())
No, Array#sort_by does the 'decorate-sort-undecorate', aka.
Schwartzian Transform.
I don't read Python very well, but I think the ruby equivalent
to 'arr.sort(key = lambda x: random())' might be:
arr.sort { rand <=> rand }
But that is different from arr.sort_by {rand}
Regards,
Bill
Thanks. As I told you before I'm new to ruby ;-)
Greetings, Uwe
Nah, I misread genrand_real; the above shows it is possible.
--
Mauricio Fernandez
Are you sure of what you say? AFAIK
sort(key=lambda x: somefunc..))
will do decorate/sort/undecorate under the hood, behaving like ruby's
#sort_by and like your version.
OTOH
sort(cmp=lambda x,y: something.. ) will behave like ruby's #sort.
h = Hash.new
self.each { |v| h[rand(1000000000)] = v }
h.keys.sort.collect { |k| h[k] }
Explanation..
* Hash works better than an Array of Arrays
[[rand,x[0]],[rand,x[1]],...] because Array#<=> seems to be slow.
* Got some speedup by avoiding Float#<=>, hence the Integer version of rand.
* On rand(1000000000); chose 1billion rather arbitrarily to keep things
out of the range of BigNum. Oddly, using 2^31-1 caused BigNums to show
up. Ideally we'd use the largest FixNum, but I don't know how to
retrieve that.
* Bias should be minimal for most applications, even at rand(1Billion).
* Can it be improved further???
Find my benchmark results, and script below.
--Steve
user system total real
shuffle (10) 0.016000 0.000000 0.016000 ( 0.016000)
shuffle2 (10) 0.000000 0.000000 0.000000 ( 0.000000)
shuffle3 (10) 0.000000 0.000000 0.000000 ( 0.000000)
sort (10) 0.000000 0.000000 0.000000 ( 0.000000)
sort_by (10) 0.000000 0.000000 0.000000 ( 0.000000)
user system total real
shuffle (100) 0.156000 0.000000 0.156000 ( 0.156000)
shuffle2 (100) 0.047000 0.000000 0.047000 ( 0.046000)
shuffle3 (100) 0.031000 0.000000 0.031000 ( 0.032000)
sort (100) 0.125000 0.000000 0.125000 ( 0.125000)
sort_by (100) 0.031000 0.000000 0.031000 ( 0.031000)
user system total real
shuffle (1000) 1.625000 0.000000 1.625000 ( 1.625000)
shuffle2 (1000) 0.656000 0.000000 0.656000 ( 0.672000)
shuffle3 (1000) 0.266000 0.000000 0.266000 ( 0.281000)
sort (1000) 2.094000 0.000000 2.094000 ( 2.093000)
sort_by (1000) 0.453000 0.000000 0.453000 ( 0.453000)
user system total real
shuffle (10000) 16.469000 0.032000 16.501000 ( 16.578000)
shuffle2 (10000) 8.156000 0.000000 8.156000 ( 8.203000)
shuffle3 (10000) 3.344000 0.047000 3.391000 ( 3.391000)
sort (10000) 25.453000 0.000000 25.453000 ( 25.593000)
sort_by (10000) 4.765000 0.000000 4.765000 ( 4.796000)
#!/usr/bin/env ruby
require 'benchmark'
class Array
def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end
def shuffle2
tmparr = self.collect { |x| [rand(1000000000),x] }
tmparr.sort.collect { |x| x[1] }
end
def shuffle3
h = Hash.new
self.each { |v| h[rand(1000000000)] = v }
h.keys.sort.collect { |k| h[k] }
end
end
[10, 100, 1000, 10000].each do |n|
# populate an Array size n
a = Array.new(n) { |i| i }
# run our benchmarks
Benchmark.bm(16) do |bb|
# my method
bb.report("shuffle ("+n.to_s+")") do
100.times do
a.shuffle
end
end
# decorate, sort, undecorate
bb.report("shuffle2 ("+n.to_s+")") do
100.times do
a.shuffle2
end
end
# decorate, sort, undecorate with a hash
bb.report("shuffle3 ("+n.to_s+")") do
100.times do
a.shuffle3
end
end
# Array's sort
bb.report("sort ("+n.to_s+")") do
100.times do
a.sort { rand(10000) <=> 5000 }
end
end
# Enumerable's sort_by
bb.report("sort_by ("+n.to_s+")") do
100.times do
a.sort_by { rand }
end
end
printf("\n")
end
end
OOPS!
I forgot to deal with collisions in the hash! :(
--Steve
Here's another shot.. still faster than sort_by{rand} on my system, and
now should be unbiased:
def shuffle3
h = Hash.new
self.each do |v|
k = rand(1000000000) until !h.has_key?(k)
h[k] = v
end
end
shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000)
sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000)
--Steve
This is not equivalent to
begin
k = rand(1000000000)
end until !h.has_key?
Compare
k = 1 until true
k # => nil
to
begin; k = 1 end until true
k # => 1
> h[k] = v
You forgot
h.keys.sort.collect { |k| h[k] }
> end
> end
>
> shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000)
> sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000)
After fixing shuffle3, I get:
module Enumerable
def shuffle3
h = Hash.new
k = rand(1000000000)
self.each do |v|
k = rand(1000000000) until !h.has_key?(k)
h[k] = v
end
h.keys.sort.collect { |k| h[k] }
end
def shuffle4
h = {}
k = rand(1000000000)
self.sort_by{ k = rand(1000000000) while h.has_key?(k); h[k] = k }
end
end
GC.disable
(1..10).shuffle4 # => [8, 3, 10, 2, 4, 6, 7, 5, 9, 1]
a = 1..100000
t0 = Time.new # => Tue Dec 06 22:13:56 CET 2005
a.shuffle4
(t1 = Time.new) - t0 # => 0.766411
a.shuffle3
(t2 = Time.new) - t1 # => 0.575967
a.sort_by{ rand }
Time.new - t2 # => 0.637288
--
Mauricio Fernandez
Oops.. I thought too much like C. Now I understand that the loop is
called 0 or more times, except in the case of "begin; end while" where
it's 1 or more. Thank you for teaching me!
> After fixing shuffle3, I get:
I'm getting slightly different results, still better than sort_by{rand}
and better than your new shuffle4, but now only on larger Arrays. I
wonder too how performance might vary when Enumerable is mixed into ADTs
other than Array.
--Steve
user system total real
shuffle3 (1000) 0.375000 0.000000 0.375000 ( 0.375000)
shuffle4 (1000) 0.484000 0.000000 0.484000 ( 0.485000)
sort_by{rand} (1000) 0.328000 0.016000 0.344000 ( 0.344000)
user system total real
shuffle3 (10000) 3.906000 0.031000 3.937000 ( 3.953000)
shuffle4 (10000) 5.531000 0.047000 5.578000 ( 5.594000)
sort_by{rand} (10000) 4.672000 0.109000 4.781000 ( 4.812000)
user system total real
shuffle3 (100000) 49.703000 0.328000 50.031000 ( 50.280000)
shuffle4 (100000) 71.938000 0.453000 72.391000 ( 73.061000)
sort_by{rand} (100000) 62.000000 0.782000 62.782000 ( 63.451000)
module Enumerable
def shuffle3
h = {}
self.each do |v|
begin
k = rand(1000000000)
end while h.has_key?(k)
? that way, we'd have NO colisions ever.
What are you all trying to achieve by variations of
rand <=> rand,
why not simply use
rand(3) - 1
?
which is not in danger of decreasing randomness by combining random numbers
What you have posted does not avoid collisions.
You probably mean something like
a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}
However, the problem of the sort {rand<=>rand}
methods are not collisions, but the skewed results
from the algorithm that assumes a consistent
comparison operator. Quoting from an earlier
post of mine:
ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
[2, 0, 1]=>12285,
[1, 0, 2]=>12426,
[0, 1, 2]=>25111,
[1, 2, 0]=>12468,
[0, 2, 1]=>12475}
You get basically the same result with your method
HTH,
and then he should do us a favour and write
> a.sort { [-1, 1][rand(2)] }
instead, which gives the same skewed results but at least does not do
it as complicated.
Brian
--
http://ruby.brian-schroeder.de/
Stringed instrument chords: http://chordlist.brian-schroeder.de/
http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it's unbiased.
http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.
martin
Thanks Martin!
--Steve