numba vs. julia performance

Arnim Bleier

Nov 23, 2013, 1:47:47 PM

to juli...@googlegroups.com

Hi,

I was interested in how fast Julia is compared to Numba.

Since I need a lot of sampling in a setting that is not easy to vectorize, I chose the classic pi estimation.

python/numba:

import time

from numba import autojit
from numba import random
from math import sqrt

state = random.state_p
PREC = 0xffffffff

@autojit
def myhypot(a, b):
    return sqrt(a*a + b*b)

@autojit
def rand():
    return random.rk_interval(PREC, state) / PREC


@autojit
def myPi (nt):
    count_inside = 0
    for c in range(0, nt):
        if myhypot(rand(), rand()) < 1:
            count_inside += 1
    return 4.0 * count_inside / nt



myPi(100)  # warm-up call so JIT compilation is not part of the timing
tic = time.clock()
print myPi(10000000)
print time.clock() - tic

julia:


function myPi(nt)
    count_inside = 0
    for c in 1:nt
        if hypot(rand(), rand()) < 1
            count_inside += 1
        end
    end
    return 4.0 * count_inside / nt
end

myPi(100)
@time myPi(10000000)

Both seem to perform comparably:

Numba: around 0.33 seconds

Julia: around 0.32 seconds

Best

Arnim

Arnim Bleier

Nov 23, 2013, 1:53:15 PM

to juli...@googlegroups.com

The Numba code above was optimized thanks to Siu Kwan Lam.

Steven G. Johnson

Nov 23, 2013, 2:21:17 PM

to juli...@googlegroups.com

The following Julia implementation (posted on Rosetta Code, http://rosettacode.org/wiki/Monte_Carlo_methods) is significantly faster:


function montepi(n)
    s = 0
    for i = 1:n
        s += rand()^2 + rand()^2 < 1
    end
    return 4*s/n
end

It saves the call to hypot, and avoids the branch as well.

@time myPi(10000000)
@time montepi(10000000)

gives 0.29s for myPi and 0.067s for montepi. You can get slightly faster still (0.063s) with:

function montepi2(n)
    s = 0
    for i = 1:n
        x = rand(); y = rand()
        s += x*x + y*y < 1
    end
    return 4*s/n
end

since Julia does not optimize ^ well for small integer powers yet (see https://github.com/JuliaLang/julia/issues/2741).
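For readers following along in Python, here is a minimal sketch of the same idea: adding the result of the comparison directly instead of branching on it. The function names and the `seed` parameter are hypothetical, introduced only for this illustration. In plain CPython, interpreter overhead dominates, so this is not expected to reproduce the speedup reported above; it only shows that the two loops count the same points.

```python
import random

def count_branch(n, seed=0):
    """Count points inside the quarter circle using an explicit if-branch."""
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            s += 1
    return s

def count_branchless(n, seed=0):
    """Same count, but add the comparison result (True counts as 1, False as 0)."""
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        s += x * x + y * y < 1
    return s

def estimate_pi(n, seed=0):
    """Monte Carlo estimate of pi: 4 * (fraction of points inside the quarter circle)."""
    return 4 * count_branchless(n, seed) / n

print(estimate_pi(100000))
```

Both counters draw the same random numbers for a given seed, so they agree exactly; only the way the hit is accumulated differs.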

Stefan Karpinski

Nov 23, 2013, 2:22:16 PM

to Julia Dev

Thanks for the report. Glad to see the performance is comparable. This is likely the same speed you would get from C code that calls the libm hypot function. Note that if you define your own naïve hypot function in Julia, as you did in Python, it gets faster (I'll show only a single representative timing, but I ran each one several times to let code generation settle):

julia> function myPi1(nt)
           count_inside = 0
           for c in 1:nt
               if hypot(rand(),rand()) < 1
                   count_inside += 1
               end
           end
           return 4.0 * count_inside / nt
       end
myPi1 (generic function with 1 method)

julia> @time myPi1(1000000)
elapsed time: 0.171592815 seconds (64 bytes allocated)
3.14142

julia> myhypot(x,y) = sqrt(x*x + y*y)
myhypot (generic function with 1 method)

julia> function myPi2(nt)
           count_inside = 0
           for c in 1:nt
               if myhypot(rand(),rand()) < 1
                   count_inside += 1
               end
           end
           return 4.0 * count_inside / nt
       end
myPi2 (generic function with 1 method)

julia> @time myPi2(1000000)
elapsed time: 0.155481895 seconds (64 bytes allocated)
3.144868

This speedup occurs because the libm hypot function does a fair amount of additional work to ensure that overflow and underflow don't occur for large and small values of its arguments, so the Numba code and myPi2 are actually doing less work than the original Julia version (i.e. myPi1).
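As a small illustration of that extra work, here is a sketch in Python comparing a naïve hypot with the library one at extreme magnitudes. `naive_hypot` is a hypothetical helper that mirrors myhypot; `math.hypot` is Python's standard-library function, used here as a stand-in for the libm behavior described above.

```python
import math

def naive_hypot(a, b):
    """Naive hypotenuse: the squares can overflow or underflow before the sqrt."""
    return math.sqrt(a * a + b * b)

big = 1e200    # big * big exceeds the largest double, so the sum becomes inf
tiny = 1e-200  # tiny * tiny is below the smallest subnormal, so it becomes 0.0

# Overflow: the naive version returns inf, the library version stays finite.
print(naive_hypot(big, big), math.hypot(big, big))

# Underflow: the naive version returns 0.0, the library version stays positive.
print(naive_hypot(tiny, tiny), math.hypot(tiny, tiny))
```

For arguments drawn from rand(), which lie in [0, 1), overflow cannot occur, and any underflow in the squares does not change the outcome of the comparison with 1, so the naïve version is safe in this benchmark.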

We can get a further speedup by manually inlining the naïve hypot calculation:

julia> function myPi3(nt)
           count_inside = 0
           for c in 1:nt
               if sqrt(rand()^2 + rand()^2) < 1
                   count_inside += 1
               end
           end
           return 4.0 * count_inside / nt
       end
myPi3 (generic function with 1 method)

julia> @time myPi3(1000000)
elapsed time: 0.154946413 seconds (64 bytes allocated)
3.142644

The speedup is very slight, but it's there. Finally, we can, of course, speed up the computation even further by observing that sqrt(x) < 1 if and only if x < 1 for nonnegative real x:

julia> function myPi4(nt)
           count_inside = 0
           for c in 1:nt
               if rand()^2 + rand()^2 < 1
                   count_inside += 1
               end
           end
           return 4.0 * count_inside / nt
       end
myPi4 (generic function with 1 method)

julia> @time myPi4(1000000)
elapsed time: 0.100211371 seconds (64 bytes allocated)
3.141872

Of course, you can do the same trick in Python too. Relative to the original myPi1, the timings are:

myPi1 = 0.171592815 = 1.00 * myPi1
myPi2 = 0.155481895 = 0.91 * myPi1
myPi3 = 0.154946413 = 0.90 * myPi1
myPi4 = 0.100211371 = 0.58 * myPi1
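The ratios above can be reproduced from the raw timings with a few lines of Python. This is only a sketch of the arithmetic; the `timings` dictionary and the `relative_times` helper are hypothetical names, and the numbers are the single representative timings quoted in this post.

```python
# Elapsed times in seconds, copied from the @time output above.
timings = {
    "myPi1": 0.171592815,
    "myPi2": 0.155481895,
    "myPi3": 0.154946413,
    "myPi4": 0.100211371,
}

def relative_times(timings, baseline):
    """Return each timing divided by the baseline timing, rounded to two decimals."""
    base = timings[baseline]
    return {name: round(t / base, 2) for name, t in timings.items()}

for name, ratio in relative_times(timings, "myPi1").items():
    print(f"{name} = {ratio:.2f} * myPi1")
```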

Steven G. Johnson

Nov 23, 2013, 2:27:18 PM

to juli...@googlegroups.com

My montepi is about 23% faster than myPi4, and montepi2 is about 30% faster. The benefit of avoiding the branch by simply adding the result of the comparison is noticeable.

Stefan Karpinski

Nov 23, 2013, 2:32:20 PM

to Julia Dev

julia> @time float(pi)
elapsed time: 2.201e-6 seconds (64 bytes allocated)
3.141592653589793

45,000x faster – and more accurate!

Arnim Bleier

Nov 23, 2013, 3:36:56 PM

to juli...@googlegroups.com

Thanks a lot for all your feedback.

The most important thing about Julia (for me) is probably its community.

100x more charming and supportive ;)

Best

Arnim

Randy Zwitch

Nov 25, 2013, 12:51:05 PM

to juli...@googlegroups.com

I did a similar test a few months ago comparing Julia, Python, and R with their various JIT options. Julia ended up being 10-30% faster than Numba on pretty similar looping code.

http://randyzwitch.com/python-pypy-julia-r-pqr-jit-just-in-time-compiler/

An interesting side result of this benchmarking: I was jokingly accused of "king-sized trolling" at the DataGotham reception for this post!
