Fastest way to call C -> Java JNI


Todd Lipcon

Sep 4, 2013, 5:28:18 PM
to mechanica...@googlegroups.com
Hey folks,

I was helping a colleague last week who is working on a project that involves calling user-provided Java functions from a native program. The Java functions themselves are typically quite short (on the order of 10s or 100s of cycles), so the JNI overhead actually represents a significant amount of time relative to the execution itself.

We were able to get about a 2.5x speedup from where we started by using CallNonvirtualObjectMethodA rather than CallObjectMethod(), and by using Unsafe on a pre-allocated buffer to pass data back and forth (instead of having call arguments). But we still see a fair amount of time spent in the JNI code according to the gperftools CPU profiler.
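
A minimal sketch of the pre-allocated buffer idea, seen from the Java side. This is not the code used in this thread: it swaps in a direct ByteBuffer for Unsafe, and the class name, buffer size, and slot layout are all assumptions made for illustration. The native caller would presumably get the buffer's address once (for example with the JNI function GetDirectBufferAddress), write the arguments, invoke a no-argument method, and read the result back from the same memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical example: the names and the buffer layout are illustrative only.
final class BufferCall {
    // Allocated once and reused for every call. Direct buffers live outside
    // the Java heap, so native code can read and write the same memory.
    static final ByteBuffer SHARED =
            ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());

    // Assumed layout: bytes 0-7 hold argument a, 8-15 argument b, 16-23 the result.
    static final int ARG_A = 0;
    static final int ARG_B = 8;
    static final int RESULT = 16;

    // Stand-in for the short user function: it takes no parameters and returns
    // nothing, so the JNI call itself has no arguments to marshal.
    static void invoke() {
        long a = SHARED.getLong(ARG_A);
        long b = SHARED.getLong(ARG_B);
        SHARED.putLong(RESULT, a + b);
    }

    public static void main(String[] args) {
        // These two writes stand in for what the native side would do.
        SHARED.putLong(ARG_A, 40);
        SHARED.putLong(ARG_B, 2);
        invoke();
        System.out.println(SHARED.getLong(RESULT)); // prints 42
    }
}
```

The absolute getLong/putLong calls avoid touching the buffer's position, so the same buffer can be reused across calls without resetting any state.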

Any further tips and tricks to minimize the overhead as much as possible? Would creating a new class on the fly with a static function be appreciably faster compared to non-virtual dispatch on an instance method?

-Todd
Message has been deleted

ymo

Sep 4, 2013, 9:24:11 PM
to mechanica...@googlegroups.com, to...@lipcon.org
Also, passing primitive types would be easy, but Strings and other complicated objects would need some way of marshalling/de-marshalling over the ring buffers. The less you copy back and forth, the better.

Todd Lipcon

Sep 4, 2013, 9:31:11 PM
to ymo, mechanica...@googlegroups.com
Thanks, that's a clever idea. Unfortunately, in the particular architecture of this application, I don't think it will really be doable -- if we could do async, we could also probably just build bigger batches across which to amortize the JNI call overhead. Without async, you'd end up with a lot of extra CPU usage due to spinning, which wouldn't be acceptable for total system throughput (we're often CPU bound).

Thanks

-Todd

Nitsan Wakart

Sep 5, 2013, 3:19:03 AM
to mechanica...@googlegroups.com
If you can do async, you could avoid JNI altogether and use a shared lock-free off-heap queue to exchange messages.
This has been discussed elsewhere on this list.




ymo

Sep 5, 2013, 5:23:17 PM
to mechanica...@googlegroups.com, ymo, to...@lipcon.org
Mind you, even the async nature can be hidden behind an interface, meaning that when the C++ code calls a function, that function would block until the reply comes back from the ring buffer. That is probably what's already happening if you use JNI, but this might prove to be much faster. If you have calls from multiple C++ threads, I would have one ring buffer for each thread and one single Java thread to read from all these incoming ring buffers.
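
A rough sketch of what "hiding the async nature behind an interface" could look like. It is not code from this thread and makes several assumptions: both sides are Java threads (a real setup would put the rings in off-heap memory shared with the C++ side), messages are single long values, and the names SpscRing, BlockingCall, call, and serveOnce are hypothetical. Each ring has exactly one producer and one consumer, which is what lets it work without locks.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical single-producer/single-consumer ring of long values.
final class SpscRing {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next index to read
    private final AtomicLong tail = new AtomicLong(); // next index to write

    SpscRing(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Producer side only. Returns false if the ring is full.
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false;
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer side only. Returns false if the ring is empty.
    boolean poll(long[] out) {
        long h = head.get();
        if (h == tail.get()) return false;
        out[0] = slots[(int) (h & mask)];
        head.set(h + 1); // frees the slot for the producer
        return true;
    }
}

// One request ring and one response ring per calling thread.
final class BlockingCall {
    private final SpscRing requests = new SpscRing(8);
    private final SpscRing responses = new SpscRing(8);

    // Caller side: looks synchronous, but only exchanges messages.
    long call(long arg) {
        while (!requests.offer(arg)) Thread.yield();
        long[] out = new long[1];
        while (!responses.poll(out)) Thread.yield(); // busy-wait for the reply
        return out[0];
    }

    // Server side: handles at most one pending request, returns false if idle.
    boolean serveOnce(LongUnaryOperator fn) {
        long[] in = new long[1];
        if (!requests.poll(in)) return false;
        long result = fn.applyAsLong(in[0]);
        while (!responses.offer(result)) Thread.yield();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingCall channel = new BlockingCall();
        Thread server = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (!channel.serveOnce(x -> x * 2)) Thread.yield();
            }
        });
        server.setDaemon(true);
        server.start();
        System.out.println(channel.call(21)); // prints 42
        server.interrupt();
        server.join();
    }
}
```

With several calling threads, the single server thread would loop over one BlockingCall per caller and invoke serveOnce on each in turn. Note that both sides poll while waiting, so idle time still costs CPU.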

Todd Lipcon

Sep 5, 2013, 5:25:47 PM
to ymo, mechanica...@googlegroups.com
On Thu, Sep 5, 2013 at 2:23 PM, ymo <ymol...@gmail.com> wrote:
Mind you, even the async nature can be hidden behind an interface, meaning that when the C++ code calls a function, that function would block until the reply comes back from the ring buffer. That is probably what's already happening if you use JNI, but this might prove to be much faster. If you have calls from multiple C++ threads, I would have one ring buffer for each thread and one single Java thread to read from all these incoming ring buffers.

The thing I'd worry about, though, is that the Java thread would have to spin, given there's no futex() equivalent. And if the C++ side is only sending a single call at a time, there will be a lot of useless spinning on both sides.

For a latency sensitive application with a bunch of parallel tasks going across the boundary, it definitely makes sense. In the case where there's only one C++ thread trying to make calls against a single Java function (but you don't want to waste CPU which might be better used by other collocated tasks), I'm skeptical.
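
To make the spinning trade-off concrete, here is a hedged sketch of a polling loop that backs off from spinning to yielding to short sleeps. It is a common mitigation rather than anything proposed in this thread, it does not remove the polling, and the class name and all thresholds are arbitrary assumptions. The less CPU the waiter burns, the longer it may take to notice a new message.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical wait strategy; every threshold here is an illustrative guess.
final class BackoffWait {
    static final int SPIN_LIMIT = 100;      // polls spent in a tight loop
    static final int YIELD_LIMIT = 200;     // polls before starting to sleep
    static final long PARK_NANOS = 50_000L; // sleep per poll after that

    // Polls until ready returns true; returns how many polls came back false.
    static long await(BooleanSupplier ready) {
        long idle = 0;
        while (!ready.getAsBoolean()) {
            idle++;
            if (idle <= SPIN_LIMIT) {
                // Tight spin: lowest wake-up latency, keeps a core fully busy.
            } else if (idle <= YIELD_LIMIT) {
                Thread.yield(); // give other runnable threads a chance
            } else {
                LockSupport.parkNanos(PARK_NANOS); // frees the core, adds latency
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // Stand-in for "a message has arrived": becomes true after about 5 ms.
        long deadline = System.nanoTime() + 5_000_000L;
        long idlePolls = await(() -> System.nanoTime() >= deadline);
        System.out.println("idle polls before ready: " + idlePolls);
    }
}
```

In the single-caller case described above, most waits would end up in the sleeping stage, which saves CPU for collocated tasks but can add latency on the order of the sleep interval to each call.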

-Todd