default rand(n::Int) repeating numbers for n>=383

270 views
Skip to first unread message

David van Leeuwen

unread,
Apr 18, 2014, 9:23:34 AM4/18/14
to juli...@googlegroups.com
Hi, 

I know making random number generators is a non-trivial thing. 

In debugging my own code I stumbled upon some strange behavior of the default rand(n::Int).  For n>=383 (but maybe earlier), the float numbers in the array can occur more than once.  The following code strips this behavior to the bone:

function randtest(n::Int)
    r
= rand(n)
   
for i=1:n
       
for j=(i+1):n
           
if r[i] == r[j]
                println
("Random match found at ", i, ",", j)
               
return(true)
           
end
       
end
   
end
   
return false
end


When called with 383 or higher, after a few tries, I find hits of the same number occurring twice.  It appears that the last number in the array is always involved, and there even is some pattern in the other number involved in consecutive calls.  I've tested this on 2 different compiles:

Version 0.3.0-prerelease+2519 (2014-04-07 02:50 UTC) Commit 11972c1 (11 days old master) (linux)

Version 0.3.0-prerelease+1561 (2014-02-13 16:38 UTC) Commit d15ce97* (63 days old master) (macos)

Cheers, 

---david

Jacob Quinn

unread,
Apr 18, 2014, 11:43:55 AM4/18/14
to juli...@googlegroups.com
I'm not really seeing this. Am I running this test right? I'm on windows 8.1, master built yesterday.

In  [253]: for i = 1:100
    randtest(5000)
end

I get no hits running the above.

-Jacob


On Fri, Apr 18, 2014 at 9:23 AM, David van Leeuwen <david.va...@gmail.com> wrote:
2


John Myles White

unread,
Apr 18, 2014, 11:49:53 AM4/18/14
to juli...@googlegroups.com
On an 10.9 OS X build from last night, I also don't reproduce this.

Alternative test that also works for me (under the assumption of reliable hashing):

x = rand(1_000_000); length(unique(x)) == length(x)

Maybe a brief bug in old specific build?

-- John

Matt Bauman

unread,
Apr 18, 2014, 12:02:59 PM4/18/14
to juli...@googlegroups.com
Very curious.  I'm seeing the behavior on the current OS X build… and it is the same regardless of the RNG seed.

julia> srand(0)

julia
> [randtest(383) for _=1:10]' #'
Random match found at 2,383
Random match found at 3,383
Random match found at 4,383
Random match found at 5,383
Random match found at 6,383
Random match found at 7,383
Random match found at 8,383
Random match found at 9,383
Random match found at 10,383
1x10 Array{Bool,2}:
 
false  true  true  true  true  true  true  true  true  true

julia
> srand(1)

julia
> [randtest(383) for _=1:10]' #'
Random match found at 2,383
Random match found at 3,383
Random match found at 4,383
Random match found at 5,383
Random match found at 6,383
Random match found at 7,383
Random match found at 8,383
Random match found at 9,383
Random match found at 10,383
1x10 Array{Bool,2}:
 
false  true  true  true  true  true  true  true  true  true

julia
> srand(rand(Uint64,1)[1])

julia
> [randtest(383) for _=1:10]' #'
Random match found at 2,383
Random match found at 3,383
Random match found at 4,383
Random match found at 5,383
Random match found at 6,383
Random match found at 7,383
Random match found at 8,383
Random match found at 9,383
Random match found at 10,383
1x10 Array{Bool,2}:
 
false  true  true  true  true  true  true  true  true  true

# John's test behaves similarly:
julia> srand(0)

julia> x = rand(383); length(unique(x)) == length(x) # First invocation after seeding is ok
true

julia> x = rand(383); length(unique(x)) == length(x) # Subsequent ones have duplicate numbers
false

Julia Version 0.3.0-prerelease+2658
Commit 752dc20 (2014-04-18 11:35 UTC)
Platform Info:
 
System: Darwin (x86_64-apple-darwin13.1.0)
  CPU
: Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
  WORD_SIZE
: 64
  BLAS
: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK
: libopenblas
  LIBM
: libopenlibm

Matt Bauman

unread,
Apr 18, 2014, 12:38:43 PM4/18/14
to juli...@googlegroups.com
A few more points as I play with this over lunch: 383 does indeed seem to be the smallest vector for which this occurs… after that it seems to be all odd-lengths.  The seeded value is irrelevant, but the first time after re-seeding always seems to be ok.

julia> f(from,to) = [(srand(0); rand(n); length(unique(rand(n))) != n) for n in from:to]
f (generic function with 1 method)

julia> find(f(1,400))
9-element Array{Int64,1}:
 383
 385
 387
 389
 391
 393
 395
 397
 399

julia> find(f(1_000_001,1_000_010))
5-element Array{Int64,1}:
 1
 3
 5
 7
 9

John Myles White

unread,
Apr 18, 2014, 12:43:48 PM4/18/14
to juli...@googlegroups.com
Interesting. 383 does reliably produce this, whereas 38300 doesn't. Is this a result of trying to cleverly reuse entropy at some point?

 -- John

Jacob Quinn

unread,
Apr 18, 2014, 12:50:52 PM4/18/14
to juli...@googlegroups.com
Confirmed that using 383 specifically results in duplicates on windows.

-Jacob

Matt Bauman

unread,
Apr 18, 2014, 1:24:28 PM4/18/14
to juli...@googlegroups.com, quinn....@gmail.com
Well, 383 makes sense.  for N <= 382 (which is Base.LibRandom.dsfmt_min_array_size), the array is iteratively filled by single rand() invocations.  And odd-length-only makes sense, as for N > 382, the array is filled two Float64s at a time, with the last element getting set by a single rand() call.  I that last rand() call is assuming that the state of the RNG is unchanged by the preceding call… and as such is *overwriting* the state instead of incrementing it.  Or there's a similar kind of race condition.  That explains why the seed is irrelevant.  librandom.jl:120

julia> srand(0); rand(383);

julia> x = rand(383);

julia> find(x .== rand())
1-element Array{Int64,1}:
 3

julia> find(x .== rand())
1-element Array{Int64,1}:
 4

julia> find(x .== rand())
1-element Array{Int64,1}:
 5

Jeff Bezanson

unread,
Apr 18, 2014, 2:43:16 PM4/18/14
to juli...@googlegroups.com
This is terrifying. Thank you David for discovering and reporting
this. Matt, could you copy your findings into an issue?

Matt Bauman

unread,
Apr 18, 2014, 3:01:04 PM4/18/14
to juli...@googlegroups.com

David van Leeuwen

unread,
Apr 18, 2014, 3:34:38 PM4/18/14
to juli...@googlegroups.com, quinn....@gmail.com
Hello,

On Friday, April 18, 2014 7:24:28 PM UTC+2, Matt Bauman wrote:
Well, 383 makes sense.  for N <= 382 (which is Base.LibRandom.dsfmt_min_array_size), the array is iteratively filled by single rand() invocations.  And odd-length-only makes sense, as for N > 382, the array is filled two Float64s at a time, with the last element getting set by a single rand() call.  I that last rand() call is assuming that the state of the RNG is unchanged by the preceding call… and as such is *overwriting* the state instead of incrementing it.  Or there's a similar kind of race condition.  That explains why the seed is irrelevant.  librandom.jl:120

Yes, I had forgotten to mention the odd-property of the behavior.  I am glad that that, as well as the 383, have been explained.  

A little background to the stumbling: I was debugging code that short-time Gaussiansizes data, and found that for some lengths the mean of the gaussianized numbers wasn't 0, for random numbers as input data.  It was a small range between 383 and 399---the latter number being the max width of the window I used.  My implementation counts the number of values in the window smaller than the present value, and warps that index to the inverse normal distribution.  But for some input lengths---the odd numbers above 383, the index found was different from invperm(sortperm(data))---and I finally attributed this to a number occurring twice.  Which I found a bit odd (pun intended). 

Cheers, 

---david

Viral Shah

unread,
Apr 28, 2014, 1:31:11 AM4/28/14
to juli...@googlegroups.com
This is now fixed, by disabling the fill_array generators. 

-viral
Reply all
Reply to author
Forward
0 new messages