need some Ruby magic

Hammed Malik

unread,

Dec 1, 2005, 2:42:42 PM12/1/05

to

I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }
>

but the results weren't very random. I then tried the following:

class Array
> def random_copy
> b = Array.new
> while b.length < self.length
> random_value = self[rand self.length]
> b.push random_value unless b.include? random_value
> end
> b
> end
> end

This works but I'm sure there's some Ruby magic for doing this better.

thanks

David A. Black

unread,

Dec 1, 2005, 2:48:51 PM12/1/05

to

Hi --

I think the most common idiom is:

array.sort_by { rand }

David

--
David A. Black
dbl...@wobblini.net

"Ruby for Rails", forthcoming from Manning Publications, April 2006!

Jim Weirich

unread,

Dec 1, 2005, 2:49:04 PM12/1/05

to

hammed wrote:
> I'd like to sort collections randomly. This is what I tried first:
>
> my_array.sort { |a,b| rand(2) }

my_array.sort_by { rand }

--
Posted via http://www.ruby-forum.com/.

Hammed Malik

unread,

Dec 1, 2005, 3:07:02 PM12/1/05

to

excellent! thanks.

Vincent Foley

unread,

Dec 1, 2005, 4:08:30 PM12/1/05

to

Maybe there should be Array#randomize...

Jeff Wood

unread,

Dec 1, 2005, 4:36:58 PM12/1/05

to

Take a look @ the Facets project. There are extension methods for array to
have a number of randomized behaviors.

j.

On 12/1/05, Vincent Foley <vfo...@gmail.com> wrote:
>
> Maybe there should be Array#randomize...
>
>
>

--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood

Reinder Verlinde

unread,

Dec 3, 2005, 12:27:49 PM12/3/05

to

In article <dd3f270e4d20842e...@ruby-forum.com>,
Jim Weirich <jim-keyword-...@weirichhouse.org> wrote:

> hammed wrote:
> > I'd like to sort collections randomly. This is what I tried first:
> >
> > my_array.sort { |a,b| rand(2) }
>
> my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.

Reinder

Stephen Waits

unread,

Dec 3, 2005, 2:48:02 PM12/3/05

to

On Dec 3, 2005, at 9:32 AM, Reinder Verlinde wrote:

> However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.

How about just doing this?

class Array
def shuffle
newarr = self
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
newarr
end
end

shuffled_array = my_array.shuffle

--Steve

Sam Gentle

unread,

Dec 3, 2005, 4:22:08 PM12/3/05

to

Actually, sort_by caches every value passed and uses those for
comparison (aka a Schwartzian Transform), so it would essentially
return the same results as sorting a list of random numbers,
regardless of algorithm.

Sam

Jim Weirich

unread,

Dec 3, 2005, 6:48:11 PM12/3/05

to

reinder wrote:
>> my_array.sort_by { rand }
>
> A quick test seems to indicate that this works with the ruby 1.8.2
> implementation on my Mac. However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

Here's a little experiment:

-------------------------------------------------
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
sorted = A.dup.sort_by { rand }
Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end
-------------------------------------------------

Gives:

$ ruby sortdist.rb 100000
abc: 16693
acb: 16688
bac: 16752
bca: 16590
cab: 16475
cba: 16802

Thats a pretty good distribution for the number of iterations.

-- Jim Weirich

ara.t....@noaa.gov

unread,

Dec 3, 2005, 8:21:02 PM12/3/05

to

On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

> On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:
>> reinder wrote:
>>>> my_array.sort_by { rand }
>>>
>>> A quick test seems to indicate that this works with the ruby 1.8.2
>>> implementation on my Mac. However, there is no guarantee that this call
>>> will select each of the n! permutations with equal probability.
>>
>> Actually, it will. sort_by uses a Schwartzian transform to do the
>> sorting. That means that rand is only called once for each element of
>> the array. As long as rand gives a decent distribution of random
>> numbers, the permutation will be random too.
>

> sort_by{ rand } is actually biased, since sort_by will preserve the
> relative order of the elements for which rand() returned the same value.
>
> %w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
> i = 0
> %w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]
>
> This means that permutations preserving the relative order of one (or
> more) pair of elements of the original array are a bit more probable.
>
> In praxis, the bias this introduces is often so small that the mere
> succinctness of sort_by{ rand } more than makes up for it.

does

%w( a b c d ).sort{ rand <=> rand }

help?

-a
--
===============================================================================
| ara [dot] t [dot] howard [at] noaa [dot] gov
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
===============================================================================

Mauricio Fernández

unread,

Dec 3, 2005, 8:04:00 PM12/3/05

to

On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:

> reinder wrote:
> >> my_array.sort_by { rand }
> >
> > A quick test seems to indicate that this works with the ruby 1.8.2
> > implementation on my Mac. However, there is no guarantee that this call
> > will select each of the n! permutations with equal probability.
>
> Actually, it will. sort_by uses a Schwartzian transform to do the
> sorting. That means that rand is only called once for each element of
> the array. As long as rand gives a decent distribution of random
> numbers, the permutation will be random too.

sort_by{ rand } is actually biased, since sort_by will preserve the

relative order of the elements for which rand() returned the same value.

%w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]

This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.

In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.

--
Mauricio Fernandez

Brian Buckley

unread,

Dec 4, 2005, 10:59:18 AM12/4/05

to

> sort_by{ rand } is actually biased, since sort_by will preserve the
> relative order of the elements for which rand() returned the same value.

How frequently would rand return a same value? (In theory it would be
never but does anyone know the reality?)

Stephen Waits

unread,

Dec 4, 2005, 11:54:13 AM12/4/05

to

On Dec 4, 2005, at 7:59 AM, Brian Buckley wrote:

> How frequently would rand return a same value? (In theory it would be
> never but does anyone know the reality?)

According to the Pickaxe, Kernel.rand is an implementation of the
[Mersenne Twister][1] algorithm. If that is still the case, then in
reality it will begin repeating exactly the same numbers after
2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the
period of the MT. The period is the point at which the exact same
series of PRNs will begin again. For example, if it started out as
1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3.

But, that's not what you really care about.

Numbers will certainly repeat within the period, since the period is
so much larger than our integer representation. For example, a 32
bit integer can only represent about 4e9. So, the likelihood of
getting any specific integer, in the 32 bit case, is 1/(2^32). This
is an extreme example:

5.times { p rand(2) }
5.times { p rand(1000) }

The bottom line is that the PRNs should be approximately uniformly
distributed. Apply that to your specific range, and you have your
answer.

--Steve

[1]: http://en.wikipedia.org/wiki/Mersenne_twister

Stephen Waits

unread,

Dec 4, 2005, 12:16:59 PM12/4/05

to

On Dec 4, 2005, at 8:54 AM, Stephen Waits wrote:

> Numbers will certainly repeat within the period, since the period
> is so much larger than our integer representation. For example, a
> 32 bit integer can only represent about 4e9. So, the likelihood of
> getting any specific integer, in the 32 bit case, is 1/(2^32).

I should add to this that you can approximate when you'll get
repeats, and it's likely to be much sooner than you'd guess. For
more information, see the [Birthday Problem][1].

--Steve

[1]: http://mathworld.wolfram.com/BirthdayProblem.html

Kero

unread,

Dec 4, 2005, 3:27:55 PM12/4/05

to

> How about just doing this?
>
> class Array
> def shuffle
> newarr = self
> self.length.times do |x|
> y = rand(self.length)
> newarr[x], newarr[y] = newarr[y], newarr[x]
> end
> newarr
> end
> end
>
> shuffled_array = my_array.shuffle

"Just" doing that
1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three. You will
first swap the first element with 3 possibilities, then the second and so
on; you're going through 3**3 possibilities. There are 3! possible
permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27,
thus you have a bias.

Stephen Waits

unread,

Dec 4, 2005, 4:20:41 PM12/4/05

to

On Dec 4, 2005, at 12:32 PM, Kero wrote:

> 1) changes the receiver (use new_arr = self.dup instead)
> 2) definitely has a bias. Look e.g. at an array of length three.
> You will
> first swap the first element with 3 possibilities, then the
> second and so
> on; you're going through 3**3 possibilities. There are 3! possible
> permutations, but alas, 3! == 6 is not a perfect divisor of 3**3
> == 27,
> thus you have a bias.

Good points, thanks. How about this?

class Array
def shuffle
newarr = self.dup
2.times do

self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end

end
newarr
end
end

# test code lifted from Jim Weirich's earlier post

N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do

sorted = A.shuffle

Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end

[~/Code/private/code/snippets] 382% ./shuffle.rb 100000
abc: 16596
acb: 16394
bac: 16700
bca: 16684
cab: 16841
cba: 16785

0:07.87 seconds, 96.4% CPU

This is really just meant to be a simple hack. I think a better
algorithm could assign a random index to each element, then sort on
those indices.

--Steve

Brian Buckley

unread,

Dec 5, 2005, 9:49:04 AM12/5/05

to

> repeats, and it's likely to be much sooner than you'd guess. For
> more information, see the [Birthday Problem][1].
>
> --Steve
>
> [1]: http://mathworld.wolfram.com/BirthdayProblem.html

I liked the Birthday Problem. Thanks.

Mauricio Fernández

unread,

Dec 5, 2005, 3:23:17 PM12/5/05

to

On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t....@noaa.gov wrote:
> On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:
> >sort_by{ rand } is actually biased, since sort_by will preserve the
> >relative order of the elements for which rand() returned the same value.

I did some numbers and found the following probability estimates:

array size P(#rand() collision)
1000 5.54558e-11
10000 5.55056e-09
1000000 5.55096e-05
10000000 5.53574e-03

A collision implies that the associated pair of elements has got the same
ordering in the output array, instead of 50-50 chances of it being reversed.

More info, including the Ruby code I used, available at
http://eigenclass.org/hiki.rb?sort_by+rand+is+biased

> does
>
> %w( a b c d ).sort{ rand <=> rand }
>
> help?

I'm not totally sure it's really unbiased, but at first sight it looks good.

When called with no arguments, rand will use

static double
genrand_real(void)
{
unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
return(a*67108864.0+b)*(1.0/9007199254740992.0);
}

so it would seem that two consecutive calls to rand() cannot return the
same value, but I still have to think about the effect of qsort.

It is however somewhat slower than sort_by{ rand }.

[100, 1000, 10000, 100000].each do |n|
GC.start
t0 = Time.new
(1..n).sort_by{rand}
a = (t1 = Time.new) - t0 # => 0.000425, 0.003002, 0.054392, 0.939692
(1..n).sort{rand <=> rand}
b = Time.new - t1 # => 0.001161, 0.026275, 0.289807, 3.791833
"%4.2f" % [b / a] # => "2.73", "8.75", "5.33", "404"
end
RUBY_VERSION # => "1.8.3"
RUBY_RELEASE_DATE # => "2005-09-24"

--
Mauricio Fernandez

Stephen Waits

unread,

Dec 5, 2005, 5:11:43 PM12/5/05

to

Mauricio Fernández wrote:
>> does
>>
>> %w( a b c d ).sort{ rand <=> rand }
>>
>> help?
>

> It is however somewhat slower than sort_by{ rand }.

It's considerably slower. Changing it to sort { rand(10000) <=> 5000 }
shows a 30% speedup on 10k element Arrays on my system. Don't know why
I chose those numbers, I suppose bigger numbers reduce the "0"
comparison, considering { rand(3) <=> 1 }.

Anyway, I put together a small benchmark to compare my (revised)
shuffle, to sort, and sort_by:

#!/usr/bin/env ruby

class Array
def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end
end

printf(" size shuffle sort sort_by\n")

[10, 100, 1000, 10000].each do |n|

# populate an Array size n
a = Array.new(n) { |i| i }

# my method
start = Time.new
100.times do
b = a.shuffle
end
a_time = Time.new - start

# Array's sort
start = Time.new
100.times do
b = a.sort { rand(10000) <=> 5000 }
end
b_time = Time.new - start

# Enumerable's sort_by
start = Time.new
100.times do
b = a.sort_by { rand }
end
c_time = Time.new - start

# results
printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time)

end

Which, on my P4/3.0GHz (Win32) emits this:

size shuffle sort sort_by
10 0.00 0.02 0.00
100 0.16 0.13 0.03
1000 1.61 2.03 0.44
10000 16.31 28.14 4.78

--Steve

ara.t....@noaa.gov

unread,

Dec 5, 2005, 6:23:52 PM12/5/05

to

On Tue, 6 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

> On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t....@noaa.gov wrote:
>> On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:
>>> sort_by{ rand } is actually biased, since sort_by will preserve the
>>> relative order of the elements for which rand() returned the same value.
>
> I did some numbers and found the following probability estimates:
>
> array size P(#rand() collision)
> 1000 5.54558e-11
> 10000 5.55056e-09
> 1000000 5.55096e-05
> 10000000 5.53574e-03
>
> A collision implies that the associated pair of elements has got the same
> ordering in the output array, instead of 50-50 chances of it being reversed.
>
> More info, including the Ruby code I used, available at
> http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
>
>> does
>>
>> %w( a b c d ).sort{ rand <=> rand }
>>
>> help?
>
> I'm not totally sure it's really unbiased, but at first sight it looks good.

the big difference is that a value will have rand called multiple times for it
- in otherwords an element compareds differently every time. this is bound to
help shuffle more uniformly it seems:

harp:~ > cat a.rb
%w( a b c d ).sort{|a,b| ra = rand; rb = rand; p a => ra; p b => rb; ra <=> rb }

harp:~ > ruby a.rb
{"a"=>0.181439380218727}
{"c"=>0.579403886629126}
{"c"=>0.713513989939958}
{"d"=>0.573687508540169}
{"a"=>0.851534740906118}
{"d"=>0.00156074151427565}
{"b"=>0.803591007691647}
{"a"=>0.494066110872563}
{"b"=>0.834676789626288}
{"c"=>0.792106134460754}

so, note that "a" was sorted using a different value each time.

can anyone who remembers stats and knows the sort algorithm weigh in?

Uwe Schmitt

unread,

Dec 6, 2005, 5:21:49 AM12/6/05

to

||

|| > More info, including the Ruby code I used, available at
|| > http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
|| >
|| >> does
|| >>
|| >> %w( a b c d ).sort{ rand <=> rand }
|| >>
|| >> help?

Hi, I'm new to ruby, but I use Python for some years now.
So I suggest the following 'pythonic' solution ;-)

def shuffle(a)

b=[]
# decorate
a.each { |x| b << [rand,x]; }

# sort
b.sort! { |x,y| x[0] <=> y[0] }

# undecorate
b.collect { |x| x[1] }
end

a=%w{a b c d e f}
puts shuffle a

Is there a shorter or more elegant way to do this
'decorate-sort-undecorate' in ruby ?
In Python it is quite easy by using list comprehensions...

Greetings, Uwe

Mauricio Fernández

unread,

Dec 6, 2005, 5:10:33 AM12/6/05

to

On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t....@noaa.gov wrote:
> the big difference is that a value will have rand called multiple times for

That might be the very reason why it is ultimately much more biased than
sort_by{ rand }... I'm keeping the latter.
If the limited precision of rand() became a problem, I could always use
arr.sort_by{ [rand, rand] } where collisions happen with a probability
around 6.16298e-19 for a 10_000_000 element array, or more generally
Array.new(N){rand} (but then we'd have to look into MT for another
possible source of residual bias).

> it - in otherwords an element compareds differently every time. this is
> bound to help shuffle more uniformly it seems:

We cannot trust our intuition in these cases; in the following example, the
tenth element is more likely to stay where it was (look at pos[9]):

a = (1..100); pos = Array.new(100, 0)

t = Time.new; 100000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 223.891608

pos
# => [1176, 1208, 1083, 1153, 1191, 1130, 1098, 1133, 1189, 1438, 1150, 1043,
========
# 1121, 1067, 1076, 1069, 1078, 1113, 1129, 1143,1115, 1074, 1081, 1069, 1057,
# [...]

t = Time.new; 10000000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 22781.664168

pos
# => [105944, 105861, 105546, 105681, 104059, 101196, 99080, 100675, 110851,
# 132223, 101670, 96018, 96147, 96688, 97709, 99051, 97823, 97345, 98799,
========
# [...]

The bias can be observed easily with smaller arrays:

a = (1..20); pos = Array.new(20, 0)
t = Time.new
100000.times{ pos[a.sort{rand <=> rand}.index(3)] += 1 }
Time.new - t # => 20.848851
pos
# => [5084, 5694, 6561, 4326, 4205, 4792, 4913, 4827, 4791, 4571, 4560, 4707,
# 4863, 4934, 5112, 5011, 5071, 5326, 5482, 5170]

a = (1..10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort{rand <=> rand}.index(5)] += 1 }
Time.new - t # => 77.438912
pos
# => [113836, 103238, 99442, 88682, 101306, 83294, 93296, 100830, 100744, 115332]

Compare the latter to

a = (1..10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort_by{rand}.index(5)] += 1 }
Time.new - t # => 32.163499
pos
# => [99822, 100020, 100328, 100040, 99603, 100098, 100195, 100023, 99875, 99996]

--
Mauricio Fernandez

Michael Ulm

unread,

Dec 6, 2005, 8:24:01 AM12/6/05

to

Mauricio Fernández wrote:

> On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t....@noaa.gov wrote:
>
>>the big difference is that a value will have rand called multiple times for
>
>
> That might be the very reason why it is ultimately much more biased than
> sort_by{ rand }... I'm keeping the latter.

Indeed,

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
[2, 0, 1]=>12285,
[1, 0, 2]=>12426,
[0, 1, 2]=>25111,
[1, 2, 0]=>12468,
[0, 2, 1]=>12475}

Michael

--
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: micha...@isis-papyrus.com
Visit our Website: www.isis-papyrus.com

Mauricio Fernández

unread,

Dec 6, 2005, 8:34:46 AM12/6/05

to

On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:
> Hi, I'm new to ruby, but I use Python for some years now.
> So I suggest the following 'pythonic' solution ;-)

[...]

> Is there a shorter or more elegant way to do this
> 'decorate-sort-undecorate' in ruby ?

arr.sort_by{rand}

> In Python it is quite easy by using list comprehensions...

What does that look like in Python?
Is it shorter than the above one? }:-)

--
Mauricio Fernandez

Uwe Schmitt

unread,

Dec 6, 2005, 9:21:43 AM12/6/05

to

||
|| On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:
|| > Hi, I'm new to ruby, but I use Python for some years now.
|| > So I suggest the following 'pythonic' solution ;-)
|| [...]
|| > Is there a shorter or more elegant way to do this
|| > 'decorate-sort-undecorate' in ruby ?
||
|| arr.sort_by{rand}

that is not equivalent to my solution.
the python equivalent to yours is

arr.sort(key = lambda x: random())

||
|| > In Python it is quite easy by using list comprehensions...
|| What does that look like in Python?

from random import random

def shuffle(a):
b = [ (random(), i) for i in a]
b.sort()
return [ x[1] for x in b ]

print shuffle(range(10))

|| Is it shorter than the above one? }:-)
||
|| --
|| Mauricio Fernandez

greetings, Uwe

||

Michael Ulm

unread,

Dec 6, 2005, 10:06:23 AM12/6/05

to

Uwe Schmitt wrote:

> || arr.sort_by{rand}
>
> that is not equivalent to my solution.
> the python equivalent to yours is
>
> arr.sort(key = lambda x: random())
>
> ||
> || > In Python it is quite easy by using list comprehensions...
> || What does that look like in Python?
>
> from random import random
>
> def shuffle(a):
> b = [ (random(), i) for i in a]
> b.sort()
> return [ x[1] for x in b ]
>
> print shuffle(range(10))
>

I would have thought those two functions do the same thing
with the exception that shuffle does not sort in place.
I'm probably missing something obvious. Could you explain
the difference between the two versions?

Christian Neukirchen

unread,

Dec 6, 2005, 10:41:39 AM12/6/05

to

Mauricio Fernández <m...@acm.org> writes:

> When called with no arguments, rand will use
>
> static double
> genrand_real(void)
> {
> unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
> return(a*67108864.0+b)*(1.0/9007199254740992.0);
> }
>
> so it would seem that two consecutive calls to rand() cannot return the
> same value, but I still have to think about the effect of qsort.

And that is a good thing? How can the sequence be "random" if it
never contains the same two numbers in row? (Remember Enigma...)

> Mauricio Fernandez
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

Uwe Schmitt

unread,

Dec 6, 2005, 10:38:33 AM12/6/05

to

|| > || arr.sort_by{rand}
|| > that is not equivalent to my solution.
|| > the python equivalent to yours is
|| > arr.sort(key = lambda x: random())

|| > || > In Python it is quite easy by using list comprehensions...
|| > || What does that look like in Python?
|| >
|| > from random import random
|| >
|| > def shuffle(a):
|| > b = [ (random(), i) for i in a]
|| > b.sort()
|| > return [ x[1] for x in b ]
|| >
|| > print shuffle(range(10))

|| I would have thought those two functions do the same thing
|| with the exception that shuffle does not sort in place.
|| I'm probably missing something obvious. Could you explain
|| the difference between the two versions?

If you use random as sorting key, the result of a<=>b
will differ each time you evaluate a<=>b.
In my case I attach a random number to each item,
then I sort my list according to this number and remove
this 'decoration' afterwards.

Pythonprogrammers call this pattern "decorate-sort-undecorate".

As I do not now anything about the ruby interanls of sort,
I prefer my version.

Greetings, Uwe.

ts

unread,

Dec 6, 2005, 11:05:56 AM12/6/05

to

>>>>> "U" == Uwe Schmitt <sch...@num.uni-sb.de> writes:

U> If you use random as sorting key, the result of a<=>b
U> will differ each time you evaluate a<=>b.
U> In my case I attach a random number to each item,
U> then I sort my list according to this number and remove
U> this 'decoration' afterwards.

run

ri Enum#sort_by

and see what it do.

Guy Decoux

Bill Kelly

unread,

Dec 6, 2005, 11:12:14 AM12/6/05

to

Hi,

From: "Uwe Schmitt" <sch...@num.uni-sb.de>
> ||
> || On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:
> || > Is there a shorter or more elegant way to do this
> || > 'decorate-sort-undecorate' in ruby ?
> ||
> || arr.sort_by{rand}
>
> that is not equivalent to my solution.
> the python equivalent to yours is
>
> arr.sort(key = lambda x: random())

No, Array#sort_by does the 'decorate-sort-undecorate', aka.
Schwartzian Transform.

I don't read Python very well, but I think the ruby equivalent
to 'arr.sort(key = lambda x: random())' might be:

arr.sort { rand <=> rand }

But that is different from arr.sort_by {rand}

Regards,

Bill

Uwe Schmitt

unread,

Dec 6, 2005, 11:18:48 AM12/6/05

to

Thanks. As I told you before I'm new to ruby ;-)

Greetings, Uwe

Mauricio Fernández

unread,

Dec 6, 2005, 12:25:38 PM12/6/05

to

On Wed, Dec 07, 2005 at 12:41:39AM +0900, Christian Neukirchen wrote:
> Mauricio Fernández <m...@acm.org> writes:
>
> > When called with no arguments, rand will use
> >
> > static double
> > genrand_real(void)
> > {
> > unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
> > return(a*67108864.0+b)*(1.0/9007199254740992.0);
> > }
> >
> > so it would seem that two consecutive calls to rand() cannot return the
> > same value, but I still have to think about the effect of qsort.
>
> And that is a good thing? How can the sequence be "random" if it
> never contains the same two numbers in row? (Remember Enigma...)

Nah, I misread genrand_real; the above shows it is possible.

--
Mauricio Fernandez

gabriele renzi

unread,

Dec 6, 2005, 12:52:29 PM12/6/05

to

Uwe Schmitt ha scritto:

Are you sure of what you say? AFAIK
sort(key=lambda x: somefunc..))

will do decorate/sort/undecorate under the hood, behaving like ruby's
#sort_by and like your version.
OTOH
sort(cmp=lambda x,y: something.. ) will behave like ruby's #sort.

Stephen Waits

unread,

Dec 6, 2005, 2:52:21 PM12/6/05

to

Ok, I think I've beat sort_by{ rand } on large Arrays , by about 30%. I
ended up with this:

h = Hash.new
self.each { |v| h[rand(1000000000)] = v }
h.keys.sort.collect { |k| h[k] }

Explanation..

* Hash works better than an Array of Arrays
[[rand,x[0]],[rand,x[1]],...] because Array#<=> seems to be slow.

* Got some speedup by avoiding Float#<=>, hence the Integer version of rand.

* On rand(1000000000); chose 1billion rather arbitrarily to keep things
out of the range of BigNum. Oddly, using 2^31-1 caused BigNums to show
up. Ideally we'd use the largest FixNum, but I don't know how to
retrieve that.

* Bias should be minimal for most applications, even at rand(1Billion).

* Can it be improved further???

Find my benchmark results, and script below.

--Steve

user system total real
shuffle (10) 0.016000 0.000000 0.016000 ( 0.016000)
shuffle2 (10) 0.000000 0.000000 0.000000 ( 0.000000)
shuffle3 (10) 0.000000 0.000000 0.000000 ( 0.000000)
sort (10) 0.000000 0.000000 0.000000 ( 0.000000)
sort_by (10) 0.000000 0.000000 0.000000 ( 0.000000)

user system total real
shuffle (100) 0.156000 0.000000 0.156000 ( 0.156000)
shuffle2 (100) 0.047000 0.000000 0.047000 ( 0.046000)
shuffle3 (100) 0.031000 0.000000 0.031000 ( 0.032000)
sort (100) 0.125000 0.000000 0.125000 ( 0.125000)
sort_by (100) 0.031000 0.000000 0.031000 ( 0.031000)

user system total real
shuffle (1000) 1.625000 0.000000 1.625000 ( 1.625000)
shuffle2 (1000) 0.656000 0.000000 0.656000 ( 0.672000)
shuffle3 (1000) 0.266000 0.000000 0.266000 ( 0.281000)
sort (1000) 2.094000 0.000000 2.094000 ( 2.093000)
sort_by (1000) 0.453000 0.000000 0.453000 ( 0.453000)

user system total real
shuffle (10000) 16.469000 0.032000 16.501000 ( 16.578000)
shuffle2 (10000) 8.156000 0.000000 8.156000 ( 8.203000)
shuffle3 (10000) 3.344000 0.047000 3.391000 ( 3.391000)
sort (10000) 25.453000 0.000000 25.453000 ( 25.593000)
sort_by (10000) 4.765000 0.000000 4.765000 ( 4.796000)

#!/usr/bin/env ruby

require 'benchmark'

class Array

def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end

def shuffle2
tmparr = self.collect { |x| [rand(1000000000),x] }
tmparr.sort.collect { |x| x[1] }
end

def shuffle3
h = Hash.new
self.each { |v| h[rand(1000000000)] = v }
h.keys.sort.collect { |k| h[k] }
end

end

[10, 100, 1000, 10000].each do |n|

# populate an Array size n
a = Array.new(n) { |i| i }

# run our benchmarks
Benchmark.bm(16) do |bb|

# my method
bb.report("shuffle ("+n.to_s+")") do
100.times do
a.shuffle
end
end

# decorate, sort, undecorate
bb.report("shuffle2 ("+n.to_s+")") do
100.times do
a.shuffle2
end
end

# decorate, sort, undecorate with a hash
bb.report("shuffle3 ("+n.to_s+")") do
100.times do
a.shuffle3
end
end

# Array's sort
bb.report("sort ("+n.to_s+")") do
100.times do

a.sort { rand(10000) <=> 5000 }
end

end

# Enumerable's sort_by
bb.report("sort_by ("+n.to_s+")") do
100.times do
a.sort_by { rand }
end
end

printf("\n")

end

Stephen Waits

unread,

Dec 6, 2005, 2:57:31 PM12/6/05

to

Stephen Waits wrote:
>
> h = Hash.new
> self.each { |v| h[rand(1000000000)] = v }
> h.keys.sort.collect { |k| h[k] }

OOPS!

I forgot to deal with collisions in the hash! :(

--Steve

Stephen Waits

unread,

Dec 6, 2005, 3:04:13 PM12/6/05

to

Stephen Waits wrote:
> I forgot to deal with collisions in the hash! :(

Here's another shot.. still faster than sort_by{rand} on my system, and
now should be unbiased:

def shuffle3
h = Hash.new

self.each do |v|
k = rand(1000000000) until !h.has_key?(k)
h[k] = v
end
end

shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000)
sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000)

--Steve

Mauricio Fernández

unread,

Dec 6, 2005, 4:16:12 PM12/6/05

to

On Wed, Dec 07, 2005 at 05:04:13AM +0900, Stephen Waits wrote:
> Stephen Waits wrote:
> >I forgot to deal with collisions in the hash! :(
>
> Here's another shot.. still faster than sort_by{rand} on my system, and
> now should be unbiased:
>
> def shuffle3
> h = Hash.new
> self.each do |v|
> k = rand(1000000000) until !h.has_key?(k)

This is not equivalent to
begin
k = rand(1000000000)
end until !h.has_key?

Compare
k = 1 until true
k # => nil
to
begin; k = 1 end until true
k # => 1

> h[k] = v

You forgot
h.keys.sort.collect { |k| h[k] }

> end
> end
>
> shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000)
> sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000)

After fixing shuffle3, I get:

module Enumerable

def shuffle3
h = Hash.new

k = rand(1000000000)

self.each do |v|
k = rand(1000000000) until !h.has_key?(k)
h[k] = v
end

h.keys.sort.collect { |k| h[k] }
end

def shuffle4
h = {}
k = rand(1000000000)
self.sort_by{ k = rand(1000000000) while h.has_key?(k); h[k] = k }
end
end

GC.disable
(1..10).shuffle4 # => [8, 3, 10, 2, 4, 6, 7, 5, 9, 1]
a = 1..100000
t0 = Time.new # => Tue Dec 06 22:13:56 CET 2005
a.shuffle4
(t1 = Time.new) - t0 # => 0.766411
a.shuffle3
(t2 = Time.new) - t1 # => 0.575967
a.sort_by{ rand }
Time.new - t2 # => 0.637288

--
Mauricio Fernandez

Stephen Waits

unread,

Dec 6, 2005, 6:18:23 PM12/6/05

to

Mauricio Fernández wrote:
>
> Compare
> k = 1 until true
> k # => nil
> to
> begin; k = 1 end until true
> k # => 1

Oops.. I thought too much like C. Now I understand that the loop is
called 0 or more times, except in the case of "begin; end while" where
it's 1 or more. Thank you for teaching me!

> After fixing shuffle3, I get:

I'm getting slightly different results, still better than sort_by{rand}
and better than your new shuffle4, but now only on larger Arrays. I
wonder too how performance might vary when Enumerable is mixed into ADTs
other than Array.

--Steve

user system total real
shuffle3 (1000) 0.375000 0.000000 0.375000 ( 0.375000)
shuffle4 (1000) 0.484000 0.000000 0.484000 ( 0.485000)
sort_by{rand} (1000) 0.328000 0.016000 0.344000 ( 0.344000)

user system total real
shuffle3 (10000) 3.906000 0.031000 3.937000 ( 3.953000)
shuffle4 (10000) 5.531000 0.047000 5.578000 ( 5.594000)
sort_by{rand} (10000) 4.672000 0.109000 4.781000 ( 4.812000)

user system total real
shuffle3 (100000) 49.703000 0.328000 50.031000 ( 50.280000)
shuffle4 (100000) 71.938000 0.453000 72.391000 ( 73.061000)
sort_by{rand} (100000) 62.000000 0.782000 62.782000 ( 63.451000)

module Enumerable

def shuffle3
h = {}
self.each do |v|
begin
k = rand(1000000000)
end while h.has_key?(k)

hma...@gmail.com

unread,

Dec 7, 2005, 9:53:10 AM12/7/05

to

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we'd have NO colisions ever.

Brian Schröder

unread,

Dec 7, 2005, 10:12:30 AM12/7/05

to

What are you all trying to achieve by variations of
rand <=> rand,

why not simply use
rand(3) - 1
?

which is not in danger of decreasing randomness by combining random numbers

Michael Ulm

unread,

Dec 7, 2005, 10:19:48 AM12/7/05

to

hma...@gmail.com wrote:

What you have posted does not avoid collisions.
You probably mean something like

a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}

However, the problem of the sort {rand<=>rand}
methods are not collisions, but the skewed results
from the algorithm that assumes a consistent
comparison operator. Quoting from an earlier
post of mine:

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
[2, 0, 1]=>12285,
[1, 0, 2]=>12426,
[0, 1, 2]=>25111,
[1, 2, 0]=>12468,
[0, 2, 1]=>12475}

You get basically the same result with your method

HTH,

Brian Schröder

unread,

Dec 7, 2005, 10:51:55 AM12/7/05

to

On 07/12/05, Michael Ulm <micha...@isis-papyrus.com> wrote:
> hma...@gmail.com wrote:
>
> > Why not
> > a.dup.sort { 2*rand(12345) <=> 12345 }
> >
> > ? that way, we'd have NO colisions ever.
> >
>
> What you have posted does not avoid collisions.
> You probably mean something like
>
> a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}
>

> [snip]
>

and then he should do us a favour and write
> a.sort { [-1, 1][rand(2)] }
instead, which gives the same skewed results but at least does not do
it as complicated.

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Martin DeMello

unread,

Dec 7, 2005, 3:24:24 PM12/7/05

to

Stephen Waits <st...@waits.net> wrote:
>
>
> This is really just meant to be a simple hack. I think a better
> algorithm could assign a random index to each element, then sort on
> those indices.

http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it's unbiased.

http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.

martin

Stephen Waits

unread,

Dec 7, 2005, 4:16:11 PM12/7/05

to

Martin DeMello wrote:
> http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
> with a nice proof that it's unbiased.
>
> http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
> shuffling algorithms.

Thanks Martin!

--Steve