The Java-C++ bridge itself should not add much overhead compared to the actual feed-forward pass. Getting the data (images) across that bridge, however, might be difficult, at least if you want to do it efficiently. Try to avoid that if you can, and provide the image data to Caffe in a different way (for example from a file or a supported DB). I would also advise against using Java here unless you absolutely have to for some external reason.
Performance also depends on how you use the pipeline, of course. Feeding single images through one at a time is terribly inefficient: the difference between GPU and CPU mostly vanishes because the data transfer overhead dominates. The real strength of GPUs shows when training large networks on large input databases with reasonably large batch sizes. If you are only feeding forward, and only a few samples (<1000), the GPU won't be much faster than the CPU.
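To make the batching point concrete, here is a rough sketch in plain Python. It is only a toy cost model, not a measurement and not Caffe code: the helper names (chunk, estimated_time_ms) and the numbers (2 ms fixed overhead per forward call, 0.5 ms compute per sample) are assumptions I made up for illustration. The idea is simply that every forward call pays a fixed transfer/launch cost, so fewer, larger calls spread that cost over more samples.

```python
import math


def chunk(samples, batch_size):
    """Split a list of samples into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def estimated_time_ms(n_samples, batch_size, overhead_ms, per_sample_ms):
    """Toy model: each forward call costs a fixed overhead plus per-sample compute.

    overhead_ms and per_sample_ms are hypothetical values, not benchmarks.
    """
    n_calls = math.ceil(n_samples / batch_size)
    return n_calls * overhead_ms + n_samples * per_sample_ms


if __name__ == "__main__":
    # Compare one-at-a-time feeding with batched feeding under the assumed costs.
    for batch_size in (1, 10, 100):
        total = estimated_time_ms(1000, batch_size, overhead_ms=2.0, per_sample_ms=0.5)
        print(f"batch_size={batch_size}: ~{total} ms for 1000 samples")
```

With these made-up numbers, the fixed overhead dominates at batch size 1 and becomes almost negligible at batch size 100. The real ratio depends on your network, hardware, and how the data gets to the device, so measure before drawing conclusions.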
Jan