Large randn overhead


Walking Randomly

Oct 7, 2012, 11:53:54 PM10/7/12
to juli...@googlegroups.com
Hello again

Sorry for so many messages in such a short time, but I'm really enjoying playing with Julia and hope I'm not bothering you all too much.  I'm wondering about the randn algorithm.  In MATLAB, there is an overhead of around 40% when generating normally distributed random numbers compared to uniform.  On my Sandy Bridge i7 running Linux with MATLAB 2012a, I get

>> tic;rand(1,100000000);toc
Elapsed time is 1.443943 seconds.
>> tic;randn(1,100000000);toc
Elapsed time is 2.017200 seconds.

Now, Julia soundly beats MATLAB in the generation of uniformly distributed random numbers:

tic();a=rand(1,100000000);toc()
elapsed time: 0.3639218807220459 seconds

Almost 4 times faster! Very impressive.  Your choice of SFMT has paid off here. 

Moving to randn, however, things are not so good:

tic();a=randn(1,100000000);toc()
elapsed time: 2.537461996078491 seconds

That's roughly 7x slower than rand, and slower than MATLAB too.

I found this difference while trying to work out why some Julia code is slower than MATLAB in a commonly used financial Monte Carlo simulation:

http://www.walkingrandomly.com/?p=4619

It's not the whole story but definitely a factor.

Cheers,
Mike

Patrick O'Leary

Oct 8, 2012, 12:39:46 AM10/8/12
to juli...@googlegroups.com
Interesting. It's a straight ccall to randmtzig_randn in librandom. MATLAB has also been using a Mersenne Twister ziggurat for randn() as of R2008b. Nothing jumps out. randn() should of course be somewhat slower (if only due to sample rejection), but I'm surprised it's that much slower.

julia> @time randn(1,10000000);
elapsed time: 0.38684988021850586 seconds

julia> @time rand(1,10000000);
elapsed time: 0.08682799339294434 seconds

You should probably go ahead and file an issue.

Elliot Saba

Oct 8, 2012, 2:39:21 AM10/8/12
to juli...@googlegroups.com
I've noticed that my performance is highly variable the first time I run it in a new julia environment, and it also seems to fluctuate quite a bit just because of the memory requirements this operation has.  Try running the following and see if the timing changes at all:

[@time (randn(1,10000000); gc()) for x = 1:10];

Running it 10 times to eliminate any first-time compilation effects, and putting a gc() afterward to make sure there's no "randn was slower because it had to allocate ~80MB of memory first" effect.

To make the point:  If I take the lowest of the ten scores for both randn() and rand(), I get only a factor of 2.1 difference. (0.1471s for randn() and 0.0701s for rand())
-E
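Elliot's take-the-minimum methodology translates directly to other stacks. Here is a hedged sketch in Python/NumPy (the helper name `best_of` is made up for illustration; the exact ratio is machine-dependent):

```python
import time
import numpy as np

def best_of(f, runs=10):
    """Return the minimum elapsed wall time over `runs` calls to f().

    Taking the minimum discards warm-up and GC noise, which is the
    point of running the benchmark ten times in the message above."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        f()
        times.append(time.perf_counter() - t0)
    return min(times)

n = 1_000_000
t_rand = best_of(lambda: np.random.random(n))
t_randn = best_of(lambda: np.random.standard_normal(n))
print("rand: %.4fs  randn: %.4fs  ratio: %.2f" % (t_rand, t_randn, t_randn / t_rand))
```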



Patrick O'Leary

Oct 8, 2012, 8:45:59 AM10/8/12
to juli...@googlegroups.com
Machine variation. I'm looking at 4.5:1 randn():rand() consistently, even with your code. (I'm used to throwing away the first one due to JIT effects.)

Patrick O'Leary

Oct 8, 2012, 9:07:47 AM10/8/12
to juli...@googlegroups.com
Jeff didn't feel like waiting; he filed https://github.com/JuliaLang/julia/issues/1348

Viral Shah

Oct 8, 2012, 1:27:07 PM10/8/12
to juli...@googlegroups.com
Here's one hypothesis for the performance difference. randmtzig originally used MT, which generates 32-bit ints. Now we are using dsfmt, which generates double-precision random numbers and converts them to 32-bit random integers, of which randmtzig consumes a couple to construct a 53-bit random integer, which it then uses to generate normally distributed random numbers.

In fact, the dsfmt manual even warns against using dsfmt to generate 32-bit random integers, apart from occasional use. Fixing this up might just give us the performance we want. The holy grail, of course, is to implement a ziggurat in julia that is just as fast. We already have an implementation of the ziggurat in perf2, but we still need to close the gap with C to use it.

If someone knows of a better implementation of ziggurat we can use, I'm up for trying it out.

-viral
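For intuition about why any rejection-based normal generator does strictly more work per sample than the uniform rng underneath it, here is Marsaglia's polar method sketched in Python. It is simpler (and slower) than the ziggurat Viral describes, but the shape of the cost is the same: consume uniforms, occasionally reject, then transform.

```python
import math
import random

def polar_normal_pair():
    """Marsaglia's polar method: draw uniform points in the square
    [-1,1]^2, reject those falling outside the unit disc (about 21%
    of them), and transform the survivors into two independent
    standard normals. A normal deviate therefore always costs more
    than a uniform one: extra uniforms, a log, a sqrt, and the
    occasional rejected draw."""
    while True:
        u = 2.0 * random.random() - 1.0
        v = 2.0 * random.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:
            f = math.sqrt(-2.0 * math.log(s) / s)
            return u * f, v * f

random.seed(42)
samples = [x for _ in range(50_000) for x in polar_normal_pair()]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples) - mean * mean
```

The ziggurat wins over this method by precomputing tables so that the common case needs only one uniform, one table lookup, and one compare.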

Viral Shah

Oct 11, 2012, 12:51:16 PM10/11/12
to juli...@googlegroups.com
On my core i5, I see an almost 5x difference, but even so, it is still faster than Matlab R2012a. I'll dig into the ziggurat algorithm further. Apart from the uniform rng, randmtzig_randn executes only about 5 lines of code in the common case to generate each normal random number.

julia> @time rand(1,10000000);
elapsed time: 0.02201104164123535 seconds

julia> @time randn(1,10000000);
elapsed time: 0.10574984550476074 seconds

matlab >> tic; rand(1,10000000); toc
Elapsed time is 0.167736 seconds.

matlab >> tic; randn(1,10000000); toc
Elapsed time is 0.196783 seconds.


-viral

Viral Shah

Oct 11, 2012, 1:02:53 PM10/11/12
to juli...@googlegroups.com
One major source of this performance difference is that when we generate rand(n), we are able to use dsfmt's fill_array routines, which are much more efficient than the routines that generate one random number at a time. This alone accounts for about a 2-3x performance difference.

-viral
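The fill-array effect is easy to reproduce in other stacks. A hedged NumPy sketch (the exact ratio is machine-dependent, but the batch call should always win comfortably):

```python
import time
import numpy as np

n = 1_000_000

# Batch fill: a single call hands the whole buffer to the generator's
# internal fill-array routine, analogous to dsfmt's fill_array.
t0 = time.perf_counter()
batch = np.random.random(n)
t_batch = time.perf_counter() - t0

# Scalar path: one call per number, paying per-call overhead n times.
t0 = time.perf_counter()
scalar = np.array([np.random.random() for _ in range(n)])
t_scalar = time.perf_counter() - t0

print("batch: %.4fs  scalar: %.4fs" % (t_batch, t_scalar))
```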

Viral Shah

Oct 12, 2012, 4:14:26 AM10/12/12
to juli...@googlegroups.com
Upon further examination, this turns out to be the routine "devectorization is faster" issue. The following code fragment runs as fast as the Matlab version for me.

function bench_eu(numPaths)
    # European put under geometric Brownian motion
    steps = 250
    r = 0.05
    sigma = 0.4
    T = 1
    dt = T/steps
    K = 100

    S = 100 * ones(numPaths,1)

    t1 = (r - 0.5*sigma^2)*dt
    t2 = sigma*sqrt(dt)
    for i = 1:steps
        for j = 1:numPaths
            S[j] *= exp(t1 + t2*randn())
        end
    end

    V = mean( exp(-r*T)*max(K - S, 0) )
end



-viral
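For comparison, here is a hedged NumPy port of the same benchmark, vectorized across paths rather than devectorized across elements (`bench_eu_numpy` is my name, not Viral's; the parameters match his fragment):

```python
import numpy as np

def bench_eu_numpy(num_paths, seed=0):
    """Price the same European put (S0 = 100, K = 100, r = 0.05,
    sigma = 0.4, T = 1, 250 steps) by simulating all GBM paths at
    once with array operations."""
    rng = np.random.default_rng(seed)
    steps, r, sigma, T, K = 250, 0.05, 0.4, 1.0, 100.0
    dt = T / steps
    S = np.full(num_paths, 100.0)
    t1 = (r - 0.5 * sigma**2) * dt
    t2 = sigma * np.sqrt(dt)
    for _ in range(steps):
        # one randn call per step fills a whole vector (the batch path)
        S *= np.exp(t1 + t2 * rng.standard_normal(num_paths))
    return np.mean(np.exp(-r * T) * np.maximum(K - S, 0.0))

price = bench_eu_numpy(100_000)
```

With 100,000 paths the estimate should land near the Black-Scholes value of about 13.15 for these parameters.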


Walking Randomly

Oct 13, 2012, 12:06:31 PM10/13/12
to juli...@googlegroups.com
Thanks for that, Viral.  On my Sandy Bridge i7, MATLAB 2012a still wins at 6 seconds compared to Julia's 8.5, but your version is much better.

I am very new to Julia, but can I infer from your comment about 'the routine "devectorization is faster" issue' that I should unlearn all of the vectorisation habits I have learned with MATLAB?

Do you think that there is still some mileage to be gained by optimising Julia's randn() function?  I was stunned by Julia's 4 fold speed increase over MATLAB's vanilla rand().  If Julia could get the rand vs randn overhead to be around 30% as it is in MATLAB, it would win on both counts.

Stefan Karpinski

Oct 13, 2012, 2:53:04 PM10/13/12
to juli...@googlegroups.com
See extensive discussion of exactly this issue here:


Tim Holy

Oct 13, 2012, 3:27:58 PM10/13/12
to juli...@googlegroups.com
On Saturday, October 13, 2012 09:06:31 AM Walking Randomly wrote:
> I am very new to Julia but can I infer from your comment 'the routine
> "devectorization is faster" issue' that I should unlearn all of the
> vectorisation habits I have learned with MATLAB?

When vectorization leads to more readable (or easier-to-write) code, it's
still fine. But it doesn't usually help performance (usually it helps only when
someone has written a particularly clever vectorized algorithm that you aren't
planning to duplicate yourself). Indeed, it often hurts, because devectorizing
often lets you reduce the number of temporaries needed.

What those of us who have migrated from MATLAB do need to get out of the habit
of is thinking "this will be horrible unless I vectorize it," because with
Julia that's just not true. While vectorization can be nice syntax sometimes,
other times you have to go through mental gymnastics to figure out some way to
vectorize an operation. It's nice not to have to worry about that anymore.

--Tim

Viral Shah

Oct 14, 2012, 1:00:45 AM10/14/12
to juli...@googlegroups.com
As Stefan mentioned, do take a look at the open issue. Also look at this pull request:


It would be great if you could pull this branch and try it out. It has a faster randn and will be integrated once we have some sanity tests. I'm not sure it will close the gap, but it should help.

The gap between julia's rand and randn looks worse than Matlab's because rand is just so much faster. The differences are additive rather than multiplicative: randn still has to do the same amount of extra work, while rand is doing much less work.

-viral

Viral Shah

Oct 14, 2012, 1:48:47 PM10/14/12
to juli...@googlegroups.com
I have committed a faster randn - about 25% faster for me. Does this improve performance for you?

-viral

Yann Le Du

Oct 14, 2012, 5:31:44 PM10/14/12
to juli...@googlegroups.com
I tried it in Matlab R2011a on Mac:

>> tic();for el=1:10000000;randn();end;toc()
Elapsed time is 0.694521 seconds.

Git pulled your commit in Julia and:

julia> tic();for el=1:10000000;randn();end;toc()
elapsed time: 0.11960291862487793 seconds

Hands down!

Elliot Saba

Oct 14, 2012, 5:56:56 PM10/14/12
to juli...@googlegroups.com
A slightly more in-depth look at things that hopefully rids us of weird "for loop" issues:

julia> tic();for el=1:10000000;randn();end;toc()
elapsed time: 0.09598183631896973 seconds

julia> tic(); randn(10000000); toc()
elapsed time: 0.21253013610839844 seconds

Now that's interesting: the for loop version is actually twice as fast as the single-function-call version!  Let's take a look at other languages:

MATLAB >> tic();for el=1:10000000;randn();end;toc()
Elapsed time is 0.757715 seconds.

MATLAB >> tic();randn(10000000,1);toc()
Elapsed time is 0.293145 seconds.


So here we can see that the MATLAB timing is actually pretty similar to Julia's when calling randn() with large numbers in the parentheses!  I feel that something must be confounding the results here for Julia.  My guess is memory allocation, as Julia doesn't have to allocate a large array in the for loop example.  However, we can see that the for loop is really killing MATLAB, so unless someone knows of an "in-place" randn() we can use to test MATLAB against Julia, I don't think we're really comparing apples to apples.

And finally, because I love pylab, I ran a similar program in python, and got the timings:
For loop: 3.88210296631 seconds
Single Function Call: 0.627640962601 seconds

OSX 10.8.2, MATLAB R2012b, 2.8GHz Intel Core 2 Duo, Julia commit f228bb8056
-E
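For what it's worth, NumPy's newer Generator API does expose an in-place fill via the `out` argument of `standard_normal`, which is exactly the kind of call that separates generation cost from allocation cost (this is NumPy, not MATLAB or Julia, so it only suggests what such a test would look like):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
buf = np.empty(n)  # preallocate the buffer once

# Fill the existing buffer in place: no fresh array is allocated per
# call, so repeated timings measure generation cost alone.
out = rng.standard_normal(out=buf)
assert out is buf  # filled in place, no new array
```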

Yann Le Du

Oct 14, 2012, 7:46:26 PM10/14/12
to juli...@googlegroups.com


On Sunday, October 14, 2012 11:57:38 PM UTC+2, Elliot Saba wrote:
Also, in Julia,

tic();for el=1:10000000;randn(1);end;toc()

is much slower than

tic();for el=1:10000000;randn();end;toc()

since each randn(1) call allocates a one-element array.

I also had a try at ScalaLab, which, inside Scala, overlaps substantially with Julia's goals. Here's the code and benchmark:

var start = System.currentTimeMillis()
var v=vrandn(10000000);
var end = System.currentTimeMillis()
var timeStatic = end-start
println("time static = "+timeStatic)

Gives me:

timeStatic: Long = 1821

So that is an order of magnitude slower than randn(10000000) in Julia.
 