The compute speed of Julia's matrix multiply versus that of numpy in Python


谭磊

unread,
Dec 5, 2013, 6:45:54 AM12/5/13
to julia...@googlegroups.com
Hi,all
    I compared the speed of Julia's matrix multiply with that of numpy in Python.
    To my surprise, it takes 27 seconds to multiply a 5000x10000 matrix by a 10000x5000 matrix in numpy, but 45 seconds in Julia. So I am confused about Julia's claimed high performance.
===============================================================
python code

import numpy
import cProfile
x=numpy.random.random((5000,10000))
y=numpy.random.random((10000,5000))
cProfile.run("numpy.dot(x,y)")

python result
4 function calls in 27.497 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.028    0.028   27.496   27.496 <string>:1(<module>)
        1   27.468   27.468   27.468   27.468 {built-in method dot}
        1    0.000    0.000   27.497   27.497 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of

=========================================
julia code
function myfun()
    x = rand(5000,10000)
    y = rand(10000,5000)
    x*y
end

@profile myfun()

Profile.print()

==========================================

Julia takes about 45 seconds to complete.

Am I doing something wrong in my code, or is Julia just a child that still has to grow up?





Matthias BUSSONNIER

unread,
Dec 5, 2013, 7:26:56 AM12/5/13
to julia...@googlegroups.com

On 5 Dec 2013, at 12:45, 谭磊 wrote:

> Hi,all
>
> x=numpy.random.random((5000,10000))
> y=numpy.random.random((10000,5000))
> cProfile.run("numpy.dot(x,y)")

The Python version generates the random matrices outside the timed call,

> ==============
> julia code
> function myfun()
> x = rand(5000,10000)
> y = rand(10000,5000)
> x*y
> end

while the Julia version generates them inside the timed function.

Is that a fair comparison?

I get numpy at ~12 sec (IPython %timeit -n1 -r1) and Julia at ~8 sec (@time).

Some random x or y took more than 45 sec on my machine, for reasons I don't know.
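To make the comparison fair, both sides should time only the multiplication. A minimal sketch of that (in Python, with illustrative small sizes so it runs quickly; the original thread used 5000x10000 and 10000x5000):

```python
import time

import numpy

# Illustrative small sizes; scale up to reproduce the thread's benchmark.
n = 200
numpy.random.seed(0)
x = numpy.random.random((n, 2 * n))
y = numpy.random.random((2 * n, n))

# Start the clock AFTER generation, so only the multiply is measured,
# matching what cProfile measured on the Python side of the thread.
t0 = time.perf_counter()
z = x.dot(y)
elapsed = time.perf_counter() - t0

print(z.shape, round(elapsed, 4))
```

The equivalent fix on the Julia side is to build x and y first and then time only `x*y`, as Ivar does further down in the thread.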
--
M

Ivar Nesje

unread,
Dec 5, 2013, 7:24:56 AM12/5/13
to julia...@googlegroups.com
I don't think this is an area where Julia has much potential to beat numpy. When you use arrays of this size, most of the time is spent in optimized libraries that try to do the calculation in the most efficient way for your processor, using every trick in the book. Julia and numpy use the same libraries.

I am not familiar with cProfile, and you don't print the timings, but it seems like you are including different things in the timing. The Julia version includes the generation of 858 MB of random numbers, which seems to be excluded from the Python version.
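A back-of-the-envelope check of that figure (the 858 MB reported above presumably also counts the result matrix and other allocations; this only counts the two random inputs, at 8 bytes per Float64 element):

```python
# Size in MB of one rows x cols matrix of 8-byte floats.
def matrix_mb(rows, cols):
    return rows * cols * 8 / 1e6

# The two inputs generated inside the Julia function.
total_mb = matrix_mb(5000, 10000) + matrix_mb(10000, 5000)
print(total_mb)  # → 800.0
```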

When I run the example (in global scope, which is acceptable here because * is itself a function call):

srand(10)
x = rand(5000,10000)
y = rand(10000,5000)
@time(x*y)
@profile(x*y)

ivarne~/tmp$ julia matmul.jl
elapsed time: 17.793986746 seconds (215458620 bytes allocated)
56 profile.jl; anonymous; line: 14
   56 linalg/matmul.jl; *; line: 83
      56 linalg/matmul.jl; gemm_wrapper; line: 223
         56 linalg/matmul.jl; gemm_wrapper; line: 239
            56 linalg/blas.jl; gemm!; line: 475

I get timings of the actual multiplication (not random number generation) of about 20 seconds, which is comparable to the 27 you report for numpy.

PS. When you do profiling/timing based on random numbers, you should ensure that you use the same numbers on each run by calling srand(). Some algorithms iterate until convergence, and then the values can have a lot to say about the performance.
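The numpy analogue of seeding with srand(10) looks like this (a minimal illustration; the seed value is arbitrary):

```python
import numpy

# Seeding pins down the "random" data, so every benchmark run
# operates on exactly the same inputs.
numpy.random.seed(10)
a = numpy.random.random(5)

numpy.random.seed(10)
b = numpy.random.random(5)

print((a == b).all())  # → True: same seed, same data
```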

Steven G. Johnson

unread,
Dec 5, 2013, 9:34:07 AM12/5/13
to julia...@googlegroups.com
> PS. When you do profiling/timing based on random numbers, you should ensure that you use the same numbers on each run by a call to srand(). Some algorithms might have iteration until convergence, and then the value might have a lot to say on the performance.


Matrix multiplication via any popular BLAS is the same cost regardless of the values (unless Inf or NaN are generated, since floating-point exceptions are slow).  You might as well benchmark with an array of zeros.
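That suggests an even simpler benchmark setup, sketched below: zeros are perfectly reproducible inputs, and for dense matmul the arithmetic cost is the same as for random data (sizes here are illustrative).

```python
import time

import numpy

# Dense BLAS matmul does the same work regardless of element values
# (barring Inf/NaN), so an all-zeros matrix is a valid benchmark input.
n = 300
zeros = numpy.zeros((n, n))

t0 = time.perf_counter()
z = zeros.dot(zeros)
elapsed = time.perf_counter() - t0

print(z.shape, round(elapsed, 4))
```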

Ivar Nesje

unread,
Dec 5, 2013, 9:50:08 AM12/5/13
to julia...@googlegroups.com
Thanks. I suspected I was being inaccurate in bringing up that point when the topic was matrix multiplication. Does that hold true for other operations, like A/b, as well? My point, anyway, was that it is a good idea to seed the random number generator to ensure that you are testing with the same data every time.

Stefan Karpinski

unread,
Dec 5, 2013, 11:25:42 AM12/5/13
to julia...@googlegroups.com
One operation where different random data can have a big impact is sorting. I believe that NumPy ships with a reference BLAS which is slower than OpenBLAS or MKL.
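The sorting point is easy to demonstrate with Python's built-in sort (Timsort), which detects existing runs, so pre-sorted input sorts much faster than shuffled input of the same size (a small sketch; sizes and seed are arbitrary):

```python
import random
import time

# Unlike BLAS matmul, sorting cost depends on the data itself.
n = 500_000
random.seed(0)
shuffled = [random.random() for _ in range(n)]
presorted = sorted(shuffled)

t0 = time.perf_counter()
sorted(shuffled)
t_shuffled = time.perf_counter() - t0

t0 = time.perf_counter()
sorted(presorted)  # Timsort sees one long run: roughly a single pass
t_presorted = time.perf_counter() - t0

print(round(t_shuffled, 4), round(t_presorted, 4))
```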