There is very little overhead calling Julia from C++


K leo

unread,
Sep 8, 2016, 12:43:30 AM9/8/16
to julia-users
I just did a test of calling a Julia function 100,000 times, both from Julia and from C++. The execution times are very close. The results are as follows. This is on Xubuntu 16.04, 64-bit.

***********    Julia   **********
Version 0.5.0-rc3+0 (2016-08-22 23:43 UTC)
Official http://julialang.org/ release
x86_64-unknown-linux-gnu

julia> include("speedTest.jl")
speedTest

julia> speedTest.TestLoop()
elapsed time: 3.21365718 seconds
3.21365718


***********   C++    ***********
> g++ -o test -fPIC -I$JULIA_DIR/include/julia test3.cpp -L$JULIA_DIR/lib/ -L$JULIA_DIR/lib/julia -lLLVM-3.7.1 -ljulia $JULIA_DIR/lib/julia/libstdc++.so.6
> ./test
3.22423

The codes are shown below:
Julia:
module speedTest

function TestFunc()
    f=0.
    for i=1:10000
        f += Float64(i*fld(3,2))*sqrt(rand()+1.)
    end
end

function TestLoop()
    tic()
    for i=1:100000
        TestFunc()
    end
    toc()
end

end

C++:
#include <julia.h>
#include <iostream>
#include <sys/time.h>
using namespace std;
typedef unsigned long long timestamp_t;

static timestamp_t get_timestamp ()
{
  struct timeval now;
  gettimeofday (&now, NULL);
  return  now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}

int main(int argc, char *argv[])
{
    jl_init(NULL);

    jl_load("speedTest.jl");
    jl_value_t * mod = (jl_value_t*)jl_eval_string("speedTest");
    jl_function_t * func = jl_get_function((jl_module_t*)mod,"TestFunc");

    timestamp_t t0 = get_timestamp();

    for(int i = 0; i < 100000; i++) {   // 100,000 calls, matching the Julia loop
        jl_call0(func);
    }

    timestamp_t t1 = get_timestamp();

    double secs = (t1 - t0) / 1000000.0L;
    cout<< secs << endl;

    jl_atexit_hook(0);
    return 0;
}

Tim Holy

unread,
Sep 8, 2016, 6:07:34 AM9/8/16
to julia...@googlegroups.com
Keep in mind that statements like "very little overhead" depend entirely on
what you're comparing against. Your TestFunc is quite expensive, so it's not
surprising that how it's called adds little overhead. If you called a much
cheaper function, you might come to a rather different conclusion. I'm not
saying you can't/shouldn't do this, but you should be aware that your
conclusions may not generalize to all usage patterns.

For example, much of what makes julia fun is the fact that you can build up
complicated functionality from "atomic" pieces that do very little work on
their own, and julia links them all together (using a great deal of inlining)
to deliver awesome performance. Presumably you'll lose those advantages when
calling individual functions from C++.

Best,
--Tim

K leo

unread,
Sep 8, 2016, 7:33:45 PM9/8/16
to julia-users
Stefan Karpinski's words (in https://groups.google.com/forum/#!searchin/julia-users/C$2B$2B$20call$20julia$20struct%7Csort:relevance/julia-users/KTMlJ15vzVA/2W3qOis7Kk8J) explained the results:

There's also the issue that we can and do turn Julia functions into C-callable function pointers that can be invoked from C as if they were C function pointers – this currently has zero overhead and if the Julia function is fast, then calling it from C will also be fast. If these require interpreter state, then that would need to be a function call argument to every C-callable function, which is at odds with many C APIs (although good libraries do allow for a void* data argument). Maybe this could be made to work, but my suspicion is that it would introduce too much overhead and destroy our current ability to do zero-cost two-way interop with C.

Steven G. Johnson

unread,
Sep 8, 2016, 7:40:32 PM9/8/16
to julia-users
Except that in your example code, you aren't calling the Julia code through a raw C function pointer.   You are calling it through jl_call0, which *does* have a fair amount of overhead (which you aren't seeing because the function execution is expensive enough to hide the call overhead).

To get it down to C overhead, you need to generate a C function pointer with cfunction (or by using the undocumented Base.@ccallable macro, see #9400).  This is how we create low-overhead callback functions to pass to C, but you can do it from the C side as well.  See http://julialang.org/blog/2013/05/callback
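A minimal sketch of what Steven describes, against the Julia 0.5 embedding API (untested; it assumes speedTest.jl is in the working directory, and since TestFunc returns nothing, the declared return type is Void):

#include <julia.h>

int main()
{
    jl_init(NULL);
    jl_load("speedTest.jl");

    // Ask Julia for a C-callable pointer to TestFunc (no arguments, no return value).
    jl_value_t *boxed_ptr =
        jl_eval_string("cfunction(speedTest.TestFunc, Void, ())");
    void (*test_func)() = (void (*)())jl_unbox_voidpointer(boxed_ptr);

    // Subsequent calls go through a plain C function pointer,
    // bypassing jl_call0's dispatch and boxing overhead.
    for (int i = 0; i < 100000; i++)
        test_func();

    jl_atexit_hook(0);
    return 0;
}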

Bart Janssens

unread,
Sep 11, 2016, 5:33:02 PM9/11/16
to julia...@googlegroups.com
On Fri, Sep 9, 2016 at 1:40 AM Steven G. Johnson <steve...@gmail.com> wrote:
> Except that in your example code, you aren't calling the Julia code through a raw C function pointer. You are calling it through jl_call0, which *does* have a fair amount of overhead (which you aren't seeing because the function execution is expensive enough to hide the call overhead).


To confirm this, I added it to my worst-case benchmark in CxxWrap.jl. Using jl_call from C++ is about 25 times slower than using ccall from Julia. The function here just divides a number by 2, so it needs boxing and unboxing. Test code here:
https://github.com/barche/CxxWrap.jl/blob/master/deps/src/examples/functions.cpp#L103-L111

Timings:
Pure Julia test:
  0.061723 seconds (4 allocations: 160 bytes)
ccall test:
  0.092434 seconds (4 allocations: 160 bytes)
CxxWrap.jl test:
  0.139052 seconds (4 allocations: 160 bytes)
Pure C++:
  0.057972 seconds (4 allocations: 160 bytes)
jl_call inside C++ loop (array is 100 times smaller than other tests):
  0.025484 seconds (1.00 M allocations: 15.259 MB, 4.84% gc time)
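The per-call boxing and unboxing mentioned above can be sketched like this (a hypothetical fragment against the Julia 0.5 embedding API; "half" is an assumed stand-in for the divide-by-2 function, not the actual name in the CxxWrap test):

// Hypothetical fragment (untested). Each jl_call1 round trip heap-allocates
// a box for the argument and another for the result, which is what produces
// the ~1M allocations in the jl_call timing above.
jl_eval_string("half(x::Float64) = x / 2");
jl_function_t *half = jl_get_function(jl_main_module, "half");

double x = 3.0;
jl_value_t *boxed_arg = jl_box_float64(x);       // allocates a boxed Float64
jl_value_t *boxed_ret = jl_call1(half, boxed_arg);
double y = jl_unbox_float64(boxed_ret);          // unbox the boxed result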

That said, if the test you did is representative of your real-world problem, it should be fine.

Cheers,

Bart

K leo

unread,
Sep 11, 2016, 10:56:25 PM9/11/16
to julia-users
Sorry, how can one tell from these numbers that using jl_call from C++ is about 25 times slower than using ccall from Julia?

Bart Janssens

unread,
Sep 12, 2016, 2:22:17 AM9/12/16
to julia...@googlegroups.com
The jl_call test is 100 times smaller, so its timing has to be multiplied by 100, giving roughly 2.5 s for jl_call vs. roughly 0.1 s for ccall. I reduced the size to avoid having the test suite take too long.

On Mon, Sep 12, 2016 at 04:56, K leo <cnbi...@gmail.com> wrote: