Julia computational efficiency vs C vs Java vs Python vs Cython


Przemyslaw Szufel

Jan 14, 2014, 4:32:16 PM
to julia...@googlegroups.com
Dear Julia users,

I am considering using Julia for computational projects.
As a first step, to get a feel for the new language, I tried to benchmark Julia's speed against other popular languages.
I used an example from the Cython tutorial: http://docs.cython.org/src/tutorial/cython_tutorial.html [the code for finding the first n prime numbers].

Rewriting the code in different languages and measuring the times on my Windows laptop gave me the following results:

Language | Time in seconds (less=better)

Python: 65.5
Cython (with MinGW): 0.82
Java : 0.64
Java (with -server option) : 0.64
C (with MinGW): 0.64
Julia (0.2): 2.1
Julia (0.3 nightly build): 2.1

All the code for my experiments is attached to this post (Cython and Python are both run starting from the prim.py file).

The thing that worries me is that Julia takes much, much longer than Cython...
I am a beginner in Julia and would like to kindly ask what I am doing wrong in my code.
I start the Julia console and use the command include("prime.jl") to execute it.

This code looks very simple, and I would think the compiler should be able to optimize it to at least the speed of Cython.
Maybe my code has been written in a non-Julia-style way and the compiler has problems with it?

I will be grateful for any answers or comments.

Best regards,
Przemyslaw Szufel
prime.jl
primes.pyx
prim.py
Primes.java
primes.c

Eric Davies

Jan 14, 2014, 5:05:19 PM
to julia...@googlegroups.com
Running test2() once before running @time test2() (to force compilation) results in a 13% performance improvement on my system.

Simon Kornblith

Jan 14, 2014, 5:29:40 PM
to julia...@googlegroups.com
With a 64-bit build, Julia integers are 64-bit unless otherwise specified. In C, you use ints, which are 32-bit. Changing them to long long makes the C code perform similarly to the Julia code on my system. Unfortunately, it's hard to operate on 32-bit integers in Julia, since + promotes to 64-bit by default (am I missing something?).

Simon

Przemyslaw Szufel

Jan 14, 2014, 5:31:31 PM
to julia...@googlegroups.com
Eric,
thanks for the reply!
To measure execution time I ran all these examples several times - these are averages. The standard deviation of the running time seems to be around 5%.
To calculate my measurements for Julia I also tried running test2() before @time test2(), because I hoped some compilation would happen in the first run.
Unfortunately I did not find any significant time difference.
Przemyslaw.

Przemyslaw Szufel

Jan 14, 2014, 5:46:04 PM
to julia...@googlegroups.com
Simon,
Thanks for the explanation!
In Java, int is 32-bit as well.
I have just replaced ints with longs in Java, and now Java's speed is also very similar to Julia's.

However I tried in Cython:
def primes_list(int kmax):
    cdef int k, i
    cdef long n
    cdef long p[20000]
...

and surprisingly the speed did not change... At first I thought that maybe something did not compile or was cached, but I made sure it's not the cache.
Cython's speed remains unchanged regardless of whether I use int or long?
I know this is now becoming a question about another language... but maybe someone can explain?

All best,
Przemyslaw Szufel

Simon Kornblith

Jan 14, 2014, 5:55:12 PM
to julia...@googlegroups.com
In C long is only guaranteed to be at least 32 bits (IIRC it's 64 bits on 64-bit *nix but 32-bit on 64-bit Windows). long long is guaranteed to be at least 64 bits (and is 64 bits on all systems I know of).

Simon

Przemyslaw Szufel

Jan 14, 2014, 6:04:23 PM
to julia...@googlegroups.com
Simon,
Thanks!
I changed in Cython to
def primes_list(int kmax):   
    cdef int k, i
    cdef long long n
    cdef long long p[20000]
and now I am getting 2.1 seconds - exactly the same time as Julia and Java with longs...

Since the performance difference between 64-bit longs and 32-bit ints is so high, is there any way to rewrite my toy example to force Julia to do 32-bit int calculations?

All best,
Przemyslaw Szufel

Földes László

Jan 14, 2014, 6:25:20 PM
to julia...@googlegroups.com
You can force the literals by enclosing them in int32():

p = [int32(0) for i=1:20000]
result = [int32(0) for i=1:20000]
k = int32(0)
n = int32(2)
while k < int32(20000)
    i = int32(0)

Simon Kornblith

Jan 14, 2014, 6:35:39 PM
to julia...@googlegroups.com
Unfortunately I think this will just slow things down, since +(Int32,Int32) converts its arguments to Int64 and returns an Int64 (presumably to avoid overflow):

julia> code_typed(+, (Int32,Int32))
1-element Array{Any,1}:
 :($(Expr(:lambda, {:x,:y}, {{},{{:x,Int32,0},{:y,Int32,0}},{}}, quote  # int.jl, line 16:
        return box(Int64,add_int(box($(Int64),sext_int($(Int64),x::Int32))::Int64,box($(Int64),sext_int($(Int64),y::Int32))::Int64))::Int64
    end)))

which means the types of variables that start out as Int32 will be unstable, since you are adding to them over the course of the loop. I am not actually sure how to get Julia to do 32-bit addition.

Simon

Przemyslaw Szufel

Jan 14, 2014, 6:37:07 PM
to julia...@googlegroups.com
Foldes,

I went with your solution and got a time increase from
2.1 seconds (64-bit integers) to 17.78 seconds (32-bit down-casting).
It seems that casting is not cheap...

Any other ideas or possibilities?

All best,
Przemyslaw

P.S.
Naturally I realize that this is a toy example, and in typical production code we would normally use real numbers for computations, not ints.
I am asking just out of curiosity ;-)

Stefan Karpinski

Jan 14, 2014, 8:44:32 PM
to Julia Users
It's primarily because you're using 64-bit integers in Julia and 32-bit integers in all the other cases. If you change the C code to use `long` instead of `int`, the timings are about the same. You're also just summing result in C but accumulating it in Python and Julia and summing later, which is less efficient – although it ends up not making much difference here.

It's difficult to use 32-bit integer arithmetic in Julia on a 64-bit machine, but you usually don't want to. It's sometimes a performance hit like it is here, but that's pretty rare, and 2 billion really just isn't big enough for a lot of pretty mundane things you might use integers for (Bill Gates can't represent their net worth with a 32-bit int!). 64-bit integers on the other hand are big enough for any mundane counting task – you don't hear people talking about quintillions of things, ever. Still, it would be nice to be able to compile and/or run Julia in 32-bit mode on a 64-bit machine and vice versa.

It's clearly not appropriate for benchmarking, but this Julia one-liner computes the same thing much more efficiently:

julia> test2(n=20000) = sum(primes(ifloor(n*log(n*log(n))))[1:n])
test2 (generic function with 2 methods)

julia> @time test2()
elapsed time: 0.004566614 seconds (363112 bytes allocated)

The primes function uses a BitArray and the Sieve of Atkin to find all primes up to the given number. Since ifloor(n*log(n*log(n))) is an upper bound on the nth prime, this always produces at least n prime values and returns their sum.

Jameson Nash

Jan 14, 2014, 9:35:26 PM
to julia...@googlegroups.com
If idiv (rem/mod) is that much more expensive on 64-bit numbers than
32-bit, should we do the sign-extend after the operation instead of
prior to the computation?

Stefan Karpinski

Jan 14, 2014, 10:02:20 PM
to Julia Users
That might be a very good idea.

Andy M

Jan 14, 2014, 10:09:37 PM
to julia...@googlegroups.com
On Wednesday, 15 January 2014 01:44:32 UTC, Stefan Karpinski wrote:
It's difficult to use 32-bit integer arithmetic in Julia on a 64-bit machine, but you usually don't want to. It's sometimes a performance hit like it is here, but that's pretty rare, and 2 billion really just isn't big enough for a lot of pretty mundane things you might use integers for (Bill Gates can't represent their net worth with a 32-bit int!).

I can see the sense in that, but couldn't you argue that if someone has explicitly declared a 32-bit int, it is probably for performance reasons, so they would probably prefer to think about the overflow issues in exchange for the performance boost? It just seems strange to forbid an operation that someone already has to opt into, especially when Julia is aiming to support highly optimized libraries with tight loops like the one benchmarked above.

Though I suppose it would be a bit difficult to deal with the interaction between integer literals and 32 bit operations, without causing lots of accidental type promotion.

Tobias Knopp

Jan 15, 2014, 3:17:25 AM
to julia...@googlegroups.com
@Stefan: Is there a good reason to promote Int32 operations to the native machine type? Sorry if this has already been discussed in depth, but to me it feels wrong to automatically upcast integer operations. I think the type should stay stable, and that this is more important than overflow issues.

Python also promotes this to int32, by the way.

Milan Bouchet-Valat

Jan 15, 2014, 3:43:23 AM
to julia...@googlegroups.com
On Wednesday, 15 January 2014 at 00:17 -0800, Tobias Knopp wrote:
> @Stefan: Is there a good reason to promote Int32 operations to the
> native machine type? Sorry if this has already been discussed in
> depth, but to me it feels wrong to automatically upcast integer
> operations. I think the type should stay stable, and that this is
> more important than overflow issues.
Yeah, it's hard to guess what to do by default. With Int32, preserving the type when summing might be reasonable (though not always), but if somebody uses e.g. Uint8 to save space when there are many observations with small counts, the sum should really use a different type.

But maybe it would be better to preserve the type, and let people use an accumulator of a different type if they need it. If they are using a non-standard integer length, they must know what they are doing.

Related to this discussion:
https://github.com/JuliaLang/julia/issues/5311


Regards

Tobias Knopp

Jan 15, 2014, 3:54:28 AM
to julia...@googlegroups.com
When using Uint8, one has a good reason for doing so.

I have found that arrays have a different promotion rule: Array{Int32} + Array{Int32} -> Array{Int32}

So this does not seem to be entirely consistent.

Miles Lubin

Jan 15, 2014, 5:32:20 AM
to julia...@googlegroups.com
Just to throw in my two cents, I don't think it's the right approach to brush off a class of performance optimizations that has a valid use case in practice and can lead to a 4x speedup. There should at least be *some* way to access nonpromoting integer operations, even if the default operators do promote.

Stefan Karpinski

Jan 15, 2014, 8:32:45 AM
to Julia Users
We already provide all the necessary intrinsics for 32-bit arithmetic, so it's pretty easy to write a module that redefines arithmetic operations on integers to do this, but it's definitely some work that would need to be done. I'd be in favor of having this option. The biggest issue is that it would only apply in lexical scope. I.e. even if Int16 + Int16 => Int16, you'd still have sum(Int16[]) => Int, since sum is defined in Base. So to get the full effect, this would probably need to be a global switch, at which point you're really just talking about running in 32-bit mode on a 64-bit system.

Stefan Karpinski

Jan 15, 2014, 8:34:03 AM
to Julia Users
We probably need a thorough FAQ entry on this subject. I've probably explained this before, but it bears writing down somewhere more permanent.

John Myles White

Jan 15, 2014, 10:30:21 AM
to julia...@googlegroups.com
The arguments against changing are pretty strong, but I’d really like it if Julia did a bit less automatic promotion. For example, it would be great if sum(x::T…) returned a value of type T.

— John

Steven G. Johnson

Jan 15, 2014, 12:03:57 PM
to julia...@googlegroups.com
The following version uses 32-bit integers in the arrays (to reduce cache pressure) and 32-bit integer remainder operations, with everything else 64-bit, and is nearly 3x faster than the original code on my machine:

rem32(x::Int32, y::Int32) = Base.box(Int32,Base.srem_int(Base.unbox(Int32,x),Base.unbox(Int32,y)))
function test3()
    p = Int32[0 for i=1:20000]
    result = Int32[0 for i=1:20000]  
    k = 0
    n = 2
    while k < int32(20000)
        i::Int32 = 0
        while (i < k) && ( rem32(int32(n), p[i+1]) != 0 )
            i = i + 1
        end
        if i == k
            k = k + 1
            p[k] = n           
            result[k] = n
        end
        n = n + 1
    end
    print(sum(result),"\r\n")
end

Steven G. Johnson

Jan 15, 2014, 12:07:58 PM
to julia...@googlegroups.com


On Wednesday, January 15, 2014 12:03:57 PM UTC-5, Steven G. Johnson wrote:
The following version uses 32-bit integers in the arrays (to reduce cache pressure) and 32-bit integer remainder operations, with everything else 64-bit, and is nearly 3x faster than the original code on my machine:

(Correction: I accidentally left i::Int32 in, so i is 32-bit.  Removing the ::Int32 qualifier from i doesn't seem to make a significant performance difference, though.)

Steven G. Johnson

Jan 15, 2014, 12:14:50 PM
to julia...@googlegroups.com
I've filed an issue for this:

    https://github.com/JuliaLang/julia/issues/5409

As Jameson suggested, not promoting the operands in 32-bit integer division/remainder operations looks like it will almost completely fix this performance problem without introducing any danger of overflow.

Marcus Urban

Jan 15, 2014, 1:44:01 PM
to julia...@googlegroups.com
Without SIMD, I don't see much use for 32-bit integers on 64-bit architectures. However, if Julia adds support for autovectorization using SIMD instructions, twice as many 32-bit integers can be packed into the same SSE/AVX register and operated on at the same time.

Földes László

Jan 15, 2014, 4:28:15 PM
to julia...@googlegroups.com
Sorry for the wrong info, I was switching between a 32 bit and a 64 bit machine (SSH terminal), and I just happened to run the script on the 32 bit machine...

Iain Dunning

Jan 15, 2014, 8:16:23 PM
to julia...@googlegroups.com
From a philosophical POV alone, I think it's inconsistent that we
a) don't save people from overflows, but
b) silently do Int32 math as Int64 behind the scenes, presumably to save them from themselves.

I think the overflow behaviour surprises some people, but only because they've been trained on Python etc. instead of C; the Int32 behaviour, however, would surprise pretty much everyone given how Julia normally acts (as the manual says, it falls into the more "no automatic conversion" family of languages).

John Myles White

Jan 15, 2014, 11:38:27 PM
to julia...@googlegroups.com
+1 for Iain’s point of view.

— John

Stefan Karpinski

Jan 16, 2014, 12:37:50 AM
to Julia Users
It's worth discussion, but there are a few significant differences. In general, our policy is not "you're on your own, kid", but rather "we'll do what we can, but not if it's going to cost too much". In other words, we tend towards safety except where safety is unacceptably slow. BigInt arithmetic everywhere? Not gonna happen. Int64 arithmetic everywhere? Doable. Using 64-bit integer ops usually does not cost that much – this is a particularly bad case – specifically because a 32-bit mod is *much* faster than a 64-bit mod. With the recent change to integer division, it only takes *one* type annotation beyond giving the storage type of `p` to get near-C speed:

function test2()
    p = zeros(Int32,20000)
    k = 0
    n::Int32 = 2
    result = 0
    @time while k < 20000
        i = 0
        while i < k && (n % p[i+1]) != 0
            i = i + 1
        end
        if i == k
            k = k + 1
            p[k] = n
            result += n
        end
        n = n + 1
    end
    println(result)
end

That's not bad. It's certainly no huge stretch of the imagination to think that some additional optimization could remove the need for even that annotation.

Tobias Knopp

Jan 16, 2014, 1:39:05 AM
to julia...@googlegroups.com
But why isn't Float32 promoted to Float64 in basic arithmetic then?

As Marcus pointed out, when we get SIMD support (in the form of the @simd macro or by autovectorization in LLVM) there can be a factor of two between the two integer computations.

P.S.: Hope you don't get this wrong. It's just a very minor nitpick on one of the most awesome type systems I have ever seen.

Milan Bouchet-Valat

Jan 16, 2014, 4:43:13 AM
to julia...@googlegroups.com
On Wednesday, 15 January 2014 at 22:39 -0800, Tobias Knopp wrote:
> But why isn't Float32 promoted to Float64 in basic arithmetic then?
> As Marcus pointed out, when we get SIMD support (in the form of the
> @simd macro or by autovectorization in LLVM) there can be a factor of
> two between the two integer computations.
>
> P.S.: Hope you don't get this wrong. It's just a very minor nitpick
> on one of the most awesome type systems I have ever seen.
Don't be sorry -- they only get what they deserve for creating Julia: we have become greedy too. ;-)

Stefan Karpinski

Jan 16, 2014, 8:36:18 AM
to Julia Users
On Thu, Jan 16, 2014 at 4:43 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
Don't be sorry -- they only get what they deserve for creating Julia: we have become greedy too. ;-)

I did literally ask for it.

Stefan Karpinski

Jan 16, 2014, 8:41:32 AM
to Julia Users
On Thu, Jan 16, 2014 at 1:39 AM, Tobias Knopp <tobias...@googlemail.com> wrote:
But why isn't Float32 promoted to Float64 in basic arithmetic then?

Well, it used to be. But there were enough people who really wanted to use Float32 arithmetic, so we changed it. The squeaky wheel gets oiled.

As Marcus pointed out, when we get SIMD support (in the form of the @simd macro or by autovectorization in LLVM) there can be a factor of two between the two integer computations.

This is a very legitimate concern, and we may have to revisit this decision.

Job van der Zwan

Jan 19, 2014, 11:58:35 AM
to julia...@googlegroups.com
I must say, I'm really surprised that adding two Int32 variables results in an Int64 - I naively assumed Julia to work like this:

a::T
b::T
(a+b)::T

If people explicitly type a variable as a 32-bit integer, I think it's safe to assume they want to keep it that way unless there is no other option - like some kind of "principle of least promotion/conversion."

Marcus Urban

Jan 19, 2014, 1:43:58 PM
to julia...@googlegroups.com
It would really be helpful if the conversion rules, especially the ones that might affect performance, were stated more clearly. Alternatively, a debug mode for the runtime or a separate ahead-of-time tool could be useful.

Given that Julia is a bit of a moving target, I'm sometimes not clear about which behaviors are a consequence of the language design and which simply occur because a certain optimization has not been created yet.

Ivan Tong

Jan 21, 2014, 8:17:36 PM
to julia...@googlegroups.com
I think the greedy solution is both behaviors? =D

Is it possible to give the option to the user?  Maybe a keyword argument for each variable or some other notation?  Example

a::Int32
b::_Int32_

so a & b are both Int32, but one allows promotion, the other doesn't?  (or some other symbol/keyword...)  Just a greedy suggestion...

(and some sort of rule when you do a+b?  idk haha)

I'm writing a Cython reader for a custom file converter (and wishing Julia were more mature, since I'm a long-time Matlab user). I need a lot of bit-masking, and I use pointers with a lot of explicit casting into the input mmap file. So for me personally, right now, I think I'd want or even need variables to be the type I specify. (For example, I've been accessing individual bytes instead of using shifts whenever possible, because it seemed a bit faster when I benchmarked it. On a big-endian arch, promotion might give me hard-to-track-down errors? Maybe?)

Job van der Zwan

Jan 23, 2014, 1:48:51 PM
to julia...@googlegroups.com
On Wednesday, 22 January 2014 02:17:36 UTC+1, Ivan Tong wrote:
I think the greedy solution is both behaviors? =D

Although I get the idea of "let's try to please everyone", I think in this particular case it would just lead to confusion and hard-to-find bugs.

Ivan Tong

Jan 23, 2014, 8:43:25 PM
to julia...@googlegroups.com
Actually, now that I think back on it, that was a silly suggestion.

Question... Does anyone want automatic promotion when using type annotations? Honest question. It seems like people have only asked to _not_ have the automatic promotion behavior (for Float32). Likewise, I can imagine anyone working on, say, reading/writing binary files or network comms wanting very specific integer types. So I totally agree with you, Job.

Just my two cents.