Is there a Java interface for caffe classification?

Hao Chen

Jan 26, 2016, 5:51:58 AM
to Caffe Users
I've been using JNA to call a C++ program for Caffe classification on CPU devices for some time. I recently moved the program to a Tesla K80 GPU but do not see any performance improvement. I noticed that the model has been loaded into graphics memory; however, the GPU seems to be idle the whole time.

I read that JNA could be slow comparing to other Java C++ bridges, but I sure this is the reason.

The cpp code I'm using is very similar to the caffe example: https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp

My environment includes:
  • CUDA 7.5
  • cuDNN v4
  • Tesla K80
  • Java 1.7.0_64
  • JNA 4.2.1
Message has been deleted
Message has been deleted

Felix Abecassis

Jan 26, 2016, 11:28:20 PM
to Caffe Users
> So we finally made it work, but tying the parameter type of the C++ method to Caffe's GPU device usage is so counterintuitive. I really hope someone who has a similar need or has run into similar problems would be kind enough to help me here.

What do you mean?


On Tuesday, January 26, 2016 at 6:31:45 PM UTC-8, Hao Chen wrote:
Update:

1. My last post has a typo. It should read: I read that JNA could be slow compared to other Java-C++ bridges, but I am not sure this is the reason.
2. I switched to JNI. Now whether the program runs on the GPU somehow seems to depend on whether the image is passed as an OpenCV Mat or a byte array.

So we finally made it work, but tying the parameter type of the C++ method to Caffe's GPU device usage is so counterintuitive. I really hope someone who has a similar need or has run into similar problems would be kind enough to help me here.

Hao Chen

Jan 27, 2016, 3:04:21 AM
to Caffe Users
Sorry, wrong information. It is still not working. 

Jan C Peters

Jan 27, 2016, 4:04:11 AM
to Caffe Users
The Java-C++ bridge should not be a big overhead compared to the actual feed-forward process. Although, getting the data (images) through that bridge might be difficult (at least if you want to do it efficiently). Maybe try to avoid that, and provide the image data to caffe in a different way (like from a file or supported DB). Also I would advise you to refrain from using Java here, unless you absolutely have to for some external reason.
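
If the image data does have to cross the bridge, one common way to limit copying is a direct `ByteBuffer`, whose memory native code can read in place (through JNI's `GetDirectBufferAddress`, or by passing the buffer to JNA). The following is only a minimal sketch of the Java side. The class name, the 224x224 three-channel image size, and the native `classify` method mentioned in the comment are hypothetical and do not come from this thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectImageBuffer {
    /** Copies pixel bytes into a direct buffer that native code can read in place. */
    public static ByteBuffer toDirect(byte[] pixels) {
        ByteBuffer buf = ByteBuffer.allocateDirect(pixels.length)
                                   .order(ByteOrder.nativeOrder());
        buf.put(pixels);
        buf.flip(); // position = 0, limit = pixels.length: ready to be read
        return buf;
    }

    public static void main(String[] args) {
        // Hypothetical image: 224x224 pixels, 3 channels, one byte per channel.
        byte[] fakeImage = new byte[3 * 224 * 224];
        ByteBuffer buf = toDirect(fakeImage);
        System.out.println("direct=" + buf.isDirect() + " bytes=" + buf.remaining());
        // A hypothetical native method such as
        //   classify(ByteBuffer image, int width, int height)
        // could then read these bytes without an extra copy on the native side.
    }
}
```

The copy into the buffer still happens once on the Java side; the gain, if any, is that the native code does not need to copy the array again.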

The performance also depends on how you use the pipeline, of course. Feeding single images through one at a time is terribly inefficient; the difference between GPU and CPU then mostly vanishes (because of the significant data transfer overhead). The real strengths of GPUs are found in training large networks with large input databases and reasonably large batch sizes. If you are just feeding forward, and only a few samples (<1000), the GPU won't be much faster than the CPU.
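
The batching idea above can be sketched on the Java side as follows: group the inputs into fixed-size batches and make one native call per batch instead of one per image. This is an illustration under assumptions, not code from this thread. `BatchPlanner`, the batch size of 4, and the file names are hypothetical, and the native side would also have to accept several images per call, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPlanner {
    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<T>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<String>();
        for (int i = 0; i < 10; i++) {
            paths.add("img" + i + ".jpg"); // hypothetical file names
        }
        for (List<String> batch : partition(paths, 4)) {
            // One native call per batch would go here instead of one per image.
            System.out.println(batch.size() + " images: " + batch);
        }
    }
}
```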

Jan

Hao Chen

Jan 27, 2016, 10:27:24 PM
to Caffe Users
Thanks for the information. I measured the time cost of the Java-C++ bridge, and it is indeed not big.
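
The thread does not show how this was measured. One simple way, sketched here under assumptions, is to time the whole call from Java with `System.nanoTime()` and subtract the time the C++ side reports for the forward pass alone. `BridgeTimer` is a hypothetical helper, and the `Callable` in `main` is only a stand-in for the real JNA or JNI call.

```java
import java.util.concurrent.Callable;

public class BridgeTimer {
    /** Holds a call's result together with its wall-clock duration. */
    public static final class Timed<T> {
        public final T value;
        public final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    /** Runs the call once and records how long it took. */
    public static <T> Timed<T> time(Callable<T> call) {
        long start = System.nanoTime();
        try {
            T value = call.call();
            return new Timed<T>(value, System.nanoTime() - start);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real native call; replace with the JNA/JNI method.
        Timed<String> t = time(new Callable<String>() {
            @Override
            public String call() throws Exception {
                Thread.sleep(5); // simulates the native work
                return "label";
            }
        });
        System.out.println(t.value + " took " + (t.nanos / 1e6) + " ms");
    }
}
```

A single measurement is noisy, so in practice the call would be repeated many times after a warm-up and the durations averaged.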

I understand that my current classification process is not efficient at all; however, according to tests on the same machine, the GPU should cut the running time by about a factor of 10.

The problem now is that we cannot get the GPU to do the forward passes when calling from Java. We have tried JNI, but it was no better. I think there is a bug in our implementation, but sadly we have little knowledge of Java-C++ bridges or of how Caffe uses CUDA. We found a working example here with JavaCPP, which uses JNI too. We will study it and try to figure out our problem.
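
One possible cause worth checking, which nobody in this thread confirms, is thread affinity. Caffe appears to keep its CPU/GPU mode in per-thread state, so if `Caffe::set_mode(Caffe::GPU)` runs on one Java thread and the forward pass on another, the forward pass may fall back to the CPU. Under that assumption, a minimal sketch is to route every native call through one dedicated thread. `NativeThread` and the thread name `caffe-native` are hypothetical, and the `Callable` in `main` stands in for the real native calls.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class NativeThread {
    // Every native call is funnelled through this one worker thread.
    private static final ExecutorService EXEC =
        Executors.newSingleThreadExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "caffe-native");
                t.setDaemon(true); // do not keep the JVM alive on exit
                return t;
            }
        });

    /** Runs the call on the dedicated thread and waits for its result. */
    public static <T> T run(Callable<T> nativeCall) {
        try {
            return EXEC.submit(nativeCall).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Callable<String> whoAmI = new Callable<String>() {
            @Override
            public String call() {
                return Thread.currentThread().getName();
            }
        };
        String first = run(whoAmI);  // e.g. the call that selects GPU mode
        String second = run(whoAmI); // e.g. a later forward pass
        System.out.println(first + " / " + second);
    }
}
```

An alternative under the same assumption is to select the GPU mode at the start of every native entry point, so that the calling thread no longer matters.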

Jan C Peters

Jan 28, 2016, 4:39:06 AM
to Caffe Users
Sounds reasonable. Caffe should not actually show weird behavior just because you use a Java-C++ bridge. The error is probably somewhere in your code.

Though do take great care with statements such as "is ten times faster than". Such statements usually assume and depend on a large number of conditions, some explicit, most implicit. So it usually does not help anybody to make such claims, as they probably hold only for a very specific situation on a specific machine, dealing with a very tightly bounded problem. And if you do things slightly differently you might lose a lot of performance. PR guys like those kinds of statements of course. Well, they usually don't need to take responsibility...

Jan

Hao Chen

Jan 28, 2016, 10:23:46 PM
to Caffe Users
You are absolutely right about performance measurements. In this case, though, I carried out the experiment myself on exactly the same machine with the same code; the only difference is that I used only C++ instead of calling from Java. This is why I said that I think the GPU is not working properly.

Thanks for your insights. I think I have to turn to my code for answers, since it is very unlikely to be a problem with the choice of Java-C++ bridge.