randperm run time is slow


Brian Lucena

Jan 22, 2016, 7:30:34 PM
to julia-users
I am running a simulation that requires me to generate random permutations of size 300 million. I would ideally like to run this generation in a loop 10K or 100K times. I am surprised that randperm in Julia is not faster than it is. It seems to be considerably slower than the equivalent in R (and R is clearly not known for speed). :)

In Julia:

julia> @time asd = randperm(300000000)
 43.437829 seconds (6 allocations: 2.235 GB, 0.01% gc time)
300000000-element Array{Int64,1}:

In R


> start = Sys.time()
> asd = sample(300000000)
> Sys.time() - start
Time difference of 23.27244 secs


Julia seems to be about twice as slow!

Any thoughts on why that is? Does randperm use the Knuth shuffle, or some other algorithm?

Thanks,
Brian

cdm

Jan 22, 2016, 8:10:54 PM
to julia-users

Tim Holy

Jan 22, 2016, 8:23:38 PM
to julia...@googlegroups.com
Try
@edit randperm(10)
and see for yourself.

My bet is that you could speed it up by generating all the random numbers
you'll need in one go, rather than generating them one-by-one. Want to give it
a shot?

--Tim
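
Tim's batching idea could be sketched roughly as follows (a hypothetical helper, not Base's randperm): draw all the uniforms in one vectorized call up front, then consume them in a Fisher-Yates-style loop.

```julia
# Hypothetical sketch of the batching idea: pre-generate every uniform draw
# in one bulk rand() call instead of calling rand() inside the loop.
# Note this indexes via floats, which is subtly biased for astronomically
# large n (Float64 has only 53 bits of precision).
function randperm_batched(n::Integer)
    a = collect(1:n)
    u = rand(n)                       # one allocation for all the uniforms
    for i = n:-1:2
        j = floor(Int, u[i] * i) + 1  # uniform index into 1:i
        a[i], a[j] = a[j], a[i]       # swap: classic shuffle step
    end
    return a
end
```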


Viral Shah

Jan 22, 2016, 11:17:21 PM
to julia-users
Generating one by one or together is a small part of the execution time:

julia> @time rand(3*10^8);
  1.281946 seconds (10 allocations: 2.235 GB, 0.23% gc time)

julia> r(n) = for i=1:n; rand(); end;
julia> @time r(3*10^8)
  0.632389 seconds (6 allocations: 192 bytes)

Generating one by one is perhaps faster because it doesn't have the array allocation overhead. Either way, the random number generation itself is clearly not the bottleneck in randperm.


-viral

Viral Shah

Jan 22, 2016, 11:30:46 PM
to julia-users
If you look at the code as @timholy suggested, you will see the Knuth shuffle implementation. Let's capture this as a Julia performance issue on GitHub if we can't figure out an easy way to speed it up right away.

-viral

Rafael Fourquet

Jan 23, 2016, 12:26:49 AM
to julia...@googlegroups.com
> Let's capture this as a Julia performance issue on github,
> if we can't figure out an easy way to speed this up right away.

I think I remember having identified a potentially sub-optimal implementation of this function a few weeks back (perhaps no more than what Tim suggested) and had planned to investigate further (when time permits...).

Viral Shah

Jan 23, 2016, 9:30:07 AM
to julia-users
Unfortunately, this makes the implementation GPL. If you can describe the algorithm in an issue on GitHub, someone can do a clean-room implementation.

-viral

On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:
Another way to go about it is to look at R's implementation of random permutation and recreate it in Julia, hoping for similar performance. Having done so, the resulting Julia code is:

function nrandperm(r::AbstractRNG, n::Integer)
    res = Array(typeof(n), n)
    if n == 0
        return res
    end
    a = typeof(n)[i for i = 1:n]
    nn = n
    @inbounds for i = 1:n
        j = floor(Int, nn*rand(r)) + 1
        res[i] = a[j]
        a[j] = a[nn]
        nn -= 1
    end
    return res
end
nrandperm(n::Integer) = nrandperm(Base.Random.GLOBAL_RNG, n)

A permutation of size 1,000,000 was generated about 2x faster with this method.
But this method uses uniform floating-point random numbers, which might not be uniform over the integers for large enough n. In general, it should be possible to optimize a more correct implementation. But if R envy is the driver, R performance is recovered with a pure Julia implementation (the R implementation is in C).
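
The uniformity concern is concrete: a Float64 has a 53-bit significand, so not every integer above 2^53 is representable, and an index computed as floor(Int, n*rand()) + 1 cannot be exactly uniform over 1:n at that scale. A minimal illustration:

```julia
# Float64 cannot distinguish 2^53 + 1 from 2^53 (53-bit significand), so
# float-scaled index generation loses uniformity once n exceeds 2^53.
n = 2^53 + 1
@assert Float64(n) == Float64(2^53)     # n collapses to its representable neighbor
@assert Float64(n) + 1.0 == Float64(n)  # adding 1 is below the rounding granularity
```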

Kevin Squire

Jan 23, 2016, 9:43:36 AM
to julia-users
Hi Dan, 

It would also be good if you deleted that post (or edited it and removed the code), for the same reason Viral mentioned: if someone reads the post and then creates a pull request for changing the Julia implementation, it could be argued that that implementation was derived from GPL code, which makes it GPL.  A clean-room implementation (such as creating the patch from a higher level description of the code) is okay.

(Viral, it would probably be good for you to remove the code from your post as well.  I did so in this post below.)

Cheers,
   Kevin

On Saturday, January 23, 2016 at 6:30:07 AM UTC-8, Viral Shah wrote:
Unfortunately, this makes the implementation GPL. If you can describe the algorithm in an issue on github, someone can do a cleanroom implementation.

-viral

On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:
Another way to go about it is to look at R's implementation of random permutation and recreate it in Julia, hoping for similar performance. Having done so, the resulting Julia code is:

<GPL code removed>
A size 1,000,000 permutation was generated x2 faster with this method.

Rafael Fourquet

Jan 23, 2016, 10:08:36 AM
to julia...@googlegroups.com
The problem, I think, came essentially from the repeated creation of
RangeGenerator objects; cf.
https://github.com/JuliaLang/julia/pull/14772.

Dan

Jan 24, 2016, 10:51:00 AM
to julia-users
A good description of random permutation generation is given on Wikipedia: it's the Fisher-Yates algorithm and its variants.
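
The modern in-place variant described on Wikipedia is only a few lines in Julia. This sketch (a standalone illustration, not Base's implementation) samples integers directly with rand(1:i) rather than scaling floats:

```julia
# In-place Fisher–Yates (Knuth) shuffle, per the Wikipedia description:
# walk from the end, swapping position i with a uniformly chosen j in 1:i.
function fisher_yates!(a::AbstractVector)
    for i = length(a):-1:2
        j = rand(1:i)            # note: a fresh range object every iteration
        a[i], a[j] = a[j], a[i]
    end
    return a
end
```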

Stefan Karpinski

Jan 25, 2016, 10:19:45 AM
to julia...@googlegroups.com
The algorithm was fine, the issue was that sampling from a range of integers was heavily optimized for the case of sampling repeatedly from the same interval, rather than from different intervals. Since the Fisher-Yates shuffle changes the interval for every sample, it was effectively a worst case scenario. Rafael posted a change that gets around that worst case by not calling the version that's optimized for sampling from the same interval repeatedly.
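
On current Julia the same fast/slow distinction is exposed through Random.Sampler, which (roughly) absorbed the old RangeGenerator machinery: building the sampler for a range once and reusing it amortizes the per-range setup that a naive Fisher-Yates loop would otherwise pay on every iteration. A hedged sketch:

```julia
using Random

rng = Random.default_rng()
# Build the range sampler once and reuse it across many draws: this is the
# fast path that repeated rand(1:n) calls on the *same* range benefit from.
sp = Random.Sampler(rng, 1:100)
xs = [rand(rng, sp) for _ in 1:10_000]
```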

On Sunday, January 24, 2016, Dan <get...@gmail.com> wrote:
As requested, I deleted the post. But algorithms for creating permutations are standard and very much in the public domain (what does The Art of Computer Programming say?). If someone really invests a little effort into it, a more formally correct algorithm can be implemented.


On Saturday, January 23, 2016 at 4:43:36 PM UTC+2, Kevin Squire wrote: