hey guys,

I plan to have a tinker with Julia now and then, being mainly a

Python/Cython hacker. Compiled Julia from git master today in advance

of Stefan's talk tonight.

I was curious if Julia does any optimization of array expressions, so

I set up a very simple benchmark: (Note, I am _not_ trolling even if

it seems like it and just looking to understand Julia's JIT and what's

going on)

function test1()

n = 1000000

a = randn(n)

b = randn(n)

c = randn(n)

for i=1:500

result = sum(a + b + c)

end

end

@time test1()

On my machine this takes about 14-15ms per iteration. OK. Time to fire

up IPython and see how NumPy (not actually super optimized, I've had

to reimplement loads of things in NumPy by hand in Cython) does:

In [6]: timeit (a + b + c).sum()

100 loops, best of 3: 11 ms per loop

OK, not bad. Time for NumExpr:

In [13]: import numexpr as ne

In [12]: timeit ne.evaluate('sum(a + b + c)')

100 loops, best of 3: 6.9 ms per loop

So, how much performance is actually left on the table? C function:

double add_things(double *a, double *b, double *c, int n) {

register double result = 0;

register int i;

for (i = 0; i < n; ++i)

{

result += *a++ + *b++ + *c++;

}

return result;

}

Wrapped in Cython and compiled:

cdef extern from "foo.h":

double add_things(double *a, double *b, double *c, int n)

def cython_test(ndarray a, ndarray b, ndarray c):

return add_things(<double*> a.data,

<double*> b.data,

<double*> c.data, len(a))

In [6]: timeit cython_test(a, b, c)

100 loops, best of 3: 2.26 ms per loop

Turns out doing C isn't even really necessary, straight Cython will do:

def cython_test2(ndarray[float64_t] a, ndarray[float64_t] b,

ndarray[float64_t] c):

cdef:

Py_ssize_t i, n = len(a)

float64_t result = 0

for i in range(n):

result += a[i] + b[i] + c[i]

return result

In [5]: timeit cython_test2(a, b, c)

100 loops, best of 3: 2.25 ms per loop

So here's the question: am I doing it wrong? Even in NumExpr above you

can get much better performance in array operations with no true JIT

(it has a VM that tries to eliminate temporaries). But NumExpr is

extremely limited. At minimum I was very surprised that vanilla

Python, temporaries and all, wins out over Julia in this simple

benchmark. Note that the %timeit function disables Python's GC which

may be having an effect in the timings.

cheers and looking forward to tonight's talk,

Wes