numba vs. julia performance

Arnim Bleier

Nov 23, 2013, 1:47:47 PM
to juli...@googlegroups.com
Hi,

I was interested in how fast Julia is compared to Numba.
Since I need a lot of sampling in a setting that is not easy to vectorize, I chose the classic pi estimation.

python/numba:

import time

from numba import autojit
from numba import random
from math import sqrt

state = random.state_p
PREC = 0xffffffff

@autojit
def myhypot(a, b):
    return sqrt(a*a + b*b)

@autojit
def rand():
    return random.rk_interval(PREC, state) / PREC

@autojit
def myPi(nt):
    count_inside = 0
    for c in range(0, nt):
        if myhypot(rand(), rand()) < 1:
            count_inside += 1
    return 4.0 * count_inside / nt

myPi(100)
tic = time.clock()
print myPi(10000000)
print time.clock() - tic
 
julia:

function myPi(nt)
    count_inside = 0
    for c in 1:nt
        if hypot(rand(), rand()) < 1
            count_inside += 1
        end
    end
    return 4.0 * count_inside / nt
end

myPi(100)
@time myPi(10000000)


Both seem to perform comparably:
numba around 0.33 seconds
julia     around 0.32 seconds.
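
For readers on a current Python, where `time.clock` is no longer available (it was removed in Python 3.8), the warm-up-then-time pattern used above can be sketched with `time.perf_counter`. This is a plain-Python stand-in without Numba, so its timings are not comparable to the figures above; `estimate_pi` and `timed` are illustrative names, not part of the original code.

```python
import random
import time


def estimate_pi(nt):
    """Plain-Python Monte Carlo estimate of pi (no JIT; illustrative only)."""
    count_inside = 0
    for _ in range(nt):
        x, y = random.random(), random.random()
        if (x * x + y * y) ** 0.5 < 1:
            count_inside += 1
    return 4.0 * count_inside / nt


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call of func(*args)."""
    tic = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - tic


estimate_pi(100)  # warm-up call, mirroring myPi(100) above
value, elapsed = timed(estimate_pi, 200_000)
print(value, elapsed)
```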


Best
Arnim

Arnim Bleier

Nov 23, 2013, 1:53:15 PM
to juli...@googlegroups.com
The Numba code is optimized, thanks to Siu Kwan Lam.

Steven G. Johnson

Nov 23, 2013, 2:21:17 PM
to juli...@googlegroups.com
The following implementation in Julia (posted in Rosetta Code, http://rosettacode.org/wiki/Monte_Carlo_methods) is significantly faster:

function montepi(n)
    s = 0
    for i = 1:n
        s += rand()^2 + rand()^2 < 1
    end
    return 4*s/n
end

It saves the call to hypot, and avoids the branch as well.

@time myPi(10000000)
@time montepi(10000000)

gives 0.29s for myPi, 0.067s for montepi.  You can get slightly faster still (0.063s) with:

function montepi2(n)
    s = 0
    for i = 1:n
        x = rand(); y = rand()
        s += x*x + y*y < 1
    end
    return 4*s/n
end

since Julia does not optimize ^ well for small integer powers yet (see https://github.com/JuliaLang/julia/issues/2741).
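
The same two formulations can be written in Python, where a comparison result (`True`/`False`) also adds as 1/0. The sketch below only illustrates that the branching and branch-free counts agree; it uses the standard `random` module rather than Numba, the function names and the `seed` parameter are assumptions for illustration, and no speedup is claimed for the CPython interpreter.

```python
import random


def montepi_branch(n, seed=0):
    """Count hits with an explicit if-branch."""
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            s += 1
    return 4 * s / n


def montepi_sum(n, seed=0):
    """Count hits by adding the boolean comparison result directly."""
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        s += x * x + y * y < 1  # bool is added as 0 or 1
    return 4 * s / n


# Same seed, same sample stream, so both estimates are identical.
print(montepi_branch(100_000) == montepi_sum(100_000))
```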

Stefan Karpinski

Nov 23, 2013, 2:22:16 PM
to Julia Dev
Thanks for the report. Glad to see the performance is comparable. This is likely the same speed as you would get for writing the code in C, calling the libm hypot function. Note that if you define your own naïve hypot function in Julia like you do in Python, this gets faster (I'll only show a single representative timing, but I did several to let code gen settle in, etc.):

julia> function myPi1(nt)
         count_inside = 0
         for c in 1:nt
           if hypot(rand(),rand()) < 1
             count_inside += 1
           end
         end
         return 4.0 * count_inside / nt
       end
myPi1 (generic function with 1 method)

julia> @time myPi1(1000000)
elapsed time: 0.171592815 seconds (64 bytes allocated)
3.14142

julia> myhypot(x,y) = sqrt(x*x + y*y)
myhypot (generic function with 1 method)

julia> function myPi2(nt)
         count_inside = 0
         for c in 1:nt
           if myhypot(rand(),rand()) < 1
             count_inside += 1
           end
         end
         return 4.0 * count_inside / nt
       end
myPi2 (generic function with 1 method)

julia> @time myPi2(1000000)
elapsed time: 0.155481895 seconds (64 bytes allocated)
3.144868

This speedup occurs because the libm hypot function does a fair amount of additional work to ensure that overflow and underflow don't occur for large and small values of its arguments, so the Numba code and myPi2 are actually doing less work than the original Julia version (i.e. myPi1).
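
To make the overflow/underflow point concrete, here is a small Python sketch comparing a naive hypot with the standard library's `math.hypot`. The helper name `naive_hypot` and the test magnitudes are chosen for illustration; for inputs in [0, 1), as in the pi estimate, the extra care makes no practical difference.

```python
import math


def naive_hypot(a, b):
    """sqrt(a*a + b*b) with no protection against overflow or underflow."""
    return math.sqrt(a * a + b * b)


big = 1e200
tiny = 1e-200

# Squaring 1e200 exceeds the double range, so the naive version returns inf,
# while math.hypot returns a finite value close to big * sqrt(2).
print(naive_hypot(big, big), math.hypot(big, big))

# Squaring 1e-200 underflows to zero, so the naive version returns 0.0,
# while math.hypot returns a small positive value.
print(naive_hypot(tiny, tiny), math.hypot(tiny, tiny))
```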

We can get further speedup by manually inlining the naïve hypot calculation:

julia> function myPi3(nt)
         count_inside = 0
         for c in 1:nt
           if sqrt(rand()^2 + rand()^2) < 1
             count_inside += 1
           end
         end
         return 4.0 * count_inside / nt
       end
myPi3 (generic function with 1 method)

julia> @time myPi3(1000000)
elapsed time: 0.154946413 seconds (64 bytes allocated)
3.142644

The speedup is very slight, but it's there. Finally, we can, of course, speed up the computation even further by observing that sqrt(x) < 1 if and only if x < 1 for non-negative real x:

julia> function myPi4(nt)
         count_inside = 0
         for c in 1:nt
           if rand()^2 + rand()^2 < 1
             count_inside += 1
           end
         end
         return 4.0 * count_inside / nt
       end
myPi4 (generic function with 1 method)

julia> @time myPi4(1000000)
elapsed time: 0.100211371 seconds (64 bytes allocated)
3.141872

Of course, you can do the same trick in Python too. The times relative to the original myPi1 are:

myPi1 = 0.171592815 = 1.00 * myPi1
myPi2 = 0.155481895 = 0.91 * myPi1
myPi3 = 0.154946413 = 0.90 * myPi1
myPi4 = 0.100211371 = 0.58 * myPi1
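
As a rough Python check of the square-root step, the sketch below counts hits with and without the `sqrt` on the same seeded sample stream. It is plain Python rather than Numba, and the function names and `seed` parameter are assumptions made for illustration; it only demonstrates that dropping the square root leaves the count unchanged.

```python
import math
import random


def count_with_sqrt(nt, seed=0):
    """Count points with sqrt(x^2 + y^2) < 1."""
    rng = random.Random(seed)
    count_inside = 0
    for _ in range(nt):
        x = rng.random()
        y = rng.random()
        if math.sqrt(x * x + y * y) < 1:
            count_inside += 1
    return count_inside


def count_without_sqrt(nt, seed=0):
    """Count points with x^2 + y^2 < 1, skipping the square root."""
    rng = random.Random(seed)
    count_inside = 0
    for _ in range(nt):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            count_inside += 1
    return count_inside


nt = 100_000
# Both counts come from the same samples, so they should match.
print(count_with_sqrt(nt) == count_without_sqrt(nt))
print(4.0 * count_without_sqrt(nt) / nt)
```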

Steven G. Johnson

Nov 23, 2013, 2:27:18 PM
to juli...@googlegroups.com
My montepi is about 23% faster, and montepi2 is about 30% faster, than myPi4; the benefit of avoiding the branch by simply adding the result of the comparison is noticeable.

Stefan Karpinski

Nov 23, 2013, 2:32:20 PM
to Julia Dev
julia> @time float(pi)
elapsed time: 2.201e-6 seconds (64 bytes allocated)
3.141592653589793

45,000x faster – and more accurate!

Arnim Bleier

Nov 23, 2013, 3:36:56 PM
to juli...@googlegroups.com
Thanks a lot for all your feedback.
The most important thing about Julia (for me) is probably its community.

100x more charming and supportive ;)

Best
Arnim

Randy Zwitch

Nov 25, 2013, 12:51:05 PM
to juli...@googlegroups.com
I did a similar test a few months ago comparing Julia, Python and R with the various JIT options. Julia ended up being 10-30% faster than Numba on pretty similar looping code.


Interesting side result of this benchmarking: Jokingly being accused of "king-sized trolling" at the DataGotham reception for this post!