Improving Java->Python transfer performance

Justin

Jun 12, 2017, 1:20:13 PM

to Py4J Support and Comments

Hi there!

I'm attempting to move a 512x512 float array from a JVM instance into a NumPy array. The operation takes a head-scratchingly long time (~14 seconds). Are there known tricks for improving the performance of such transfers? I've searched for clues about what I might be doing wrong, but I've come up fairly dry. The only suggestion I found is to do transfers using byte arrays, which implies some kind of encoding/decoding on either end of the transmission. I hope that I missed something. Any ideas?

Thanks!

-Justin

Justin

Jun 12, 2017, 5:45:15 PM

to Py4J Support and Comments

For the record, I moved to byte arrays, since Py4J has an optimized pathway built in for them, but the transfer is still slow, although much more reasonable at 1-2 seconds. Suggestions for better speed are still welcome!

Thanks again.

Peter A

Jun 12, 2017, 6:08:07 PM

to Justin, Py4J Support and Comments

I've used memory-mapped files in the past.

Jonah Graham

Jun 13, 2017, 4:48:26 AM

to Peter A, Justin, Py4J Support and Comments

Hi Justin,

Barthelemy has been working on improving the performance in the 0.11
version (as yet unreleased). You can follow his progress here:

https://github.com/bartdag/py4j/issues/249
https://github.com/bartdag/py4j/issues/237

I think Barthelemy has also published some benchmarks based on
https://github.com/bartdag/py4j-benchmark but I can't find them now.

While waiting for Py4J 0.11, another option is to transfer only
metadata over the Py4J pipe. Your total data size is about 2MB, so it
should be possible to transfer it much faster than 1-2 seconds. Probably a
lot of your remaining slowdown is due to the Base64 encoding[1,2].
Have the Java side write to a binary file in NumPy format and have
NumPy load from that file. The OS will generally keep that data cached,
so there is little performance overhead. See
https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
for the NumPy side and
https://github.com/DawnScience/scisoft-core/blob/master/uk.ac.diamond.scisoft.analysis/src/uk/ac/diamond/scisoft/analysis/io/NumPyFileSaver.java
for some Java code that stores Eclipse January datasets[3] as NumPy
data files.

[1] decode on Python side:
https://github.com/bartdag/py4j/blob/262a20477a8a302cae0ede597c93b953a3fc2702/py4j-python/src/py4j/protocol.py#L163
[2] encode on Java side:
https://github.com/bartdag/py4j/blob/9ab16a93461e6e374d729679343db58ebb922b5e/py4j-java/src/main/java/py4j/ReturnObject.java#L149
[3] Eclipse January is a set of libraries for handling numerical data
in Java. It is inspired in part by NumPy and aims to provide similar
functionality. https://www.eclipse.org/january/
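
A minimal sketch of the Python half of the file-based approach described above. It assumes the Java side has already written a valid .npy file and sent only the file path over Py4J; here np.save stands in for the Java writer so the example runs on its own. The helper name load_matrix and the file name frame.npy are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_matrix(path):
    """Load an array from a .npy file whose path was received over Py4J."""
    # allow_pickle=False: only plain numeric data is expected in the file.
    return np.load(path, allow_pickle=False)


# Stand-in for the Java writer (assumption: the real file would be produced
# by Java code such as the NumPyFileSaver class linked above).
original = np.random.rand(512, 512).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.npy")  # hypothetical file name
    np.save(path, original)
    # In a real setup, only this path string would travel over the Py4J pipe.
    loaded = load_matrix(path)

print(loaded.shape, loaded.dtype)
```

Only a short string crosses the Py4J connection in this arrangement, so the text protocol and its Base64 step never touch the bulk data.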

HTH,
Jonah
~~~
Jonah Graham
Kichwa Coders Ltd.
www.kichwacoders.com

Barthelemy Dagenais

Jun 13, 2017, 5:10:27 AM

to Py4J Support and Comments

Hi Justin

Your initial attempt at transferring a float matrix was probably slow
because Py4J was transferring references and you were accessing each
individual cell, making 512*512 network calls to Java. Your second
attempt is currently the best way to transfer a binary payload if you
want to rely only on Py4J.
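
As a rough sketch of the byte-array route on the Python side: once the whole matrix arrives as a single bytes payload, NumPy can reinterpret it in one step instead of one call per cell. This assumes the Java side packed 4-byte floats in row-major order with java.nio.ByteBuffer, whose default byte order is big-endian. The function name bytes_to_matrix and the entry-point method mentioned in the comment are hypothetical.

```python
import numpy as np

ROWS, COLS = 512, 512


def bytes_to_matrix(payload, rows=ROWS, cols=COLS):
    """Interpret a raw byte payload as a big-endian float32 matrix."""
    # ">f4" = big-endian 4-byte float (assumed layout of the Java buffer).
    flat = np.frombuffer(payload, dtype=">f4")
    if flat.size != rows * cols:
        raise ValueError("expected %d floats, got %d" % (rows * cols, flat.size))
    # Copy into native byte order so later arithmetic runs at full speed.
    return flat.reshape(rows, cols).astype(np.float32)


# Stand-in for the payload; with Py4J it might come from a hypothetical call
# such as gateway.entry_point.getMatrixBytes().
expected = np.arange(ROWS * COLS, dtype=np.float32).reshape(ROWS, COLS)
payload = expected.astype(">f4").tobytes()

matrix = bytes_to_matrix(payload)
print(matrix.shape, matrix.dtype)
```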

The strategy described by Jonah is the most popular one for serious
performance (while everyone is waiting for me to find the time to
complete 0.11 :-p). I personally also prefer to write a file on one
side and read it on the other side, but others have had success with
memory-mapped files (Peter) or a dedicated socket transferring binary
blobs (PySpark).
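
For the memory-mapped file variant, a sketch of what the reading side might look like with np.memmap. The layout is an assumption: raw big-endian float32 values in row-major order with no header, which is what a Java writer using FileChannel.map and a default-order buffer would plausibly produce. The file name matrix.bin is hypothetical, and NumPy stands in for the JVM writer so the example is self-contained.

```python
import os
import tempfile

import numpy as np

ROWS, COLS = 512, 512

source = np.linspace(0.0, 1.0, ROWS * COLS, dtype=np.float32).reshape(ROWS, COLS)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")  # hypothetical shared file

    # Stand-in for the JVM writer: raw big-endian float32, no header.
    source.astype(">f4").tofile(path)

    # Map the file read-only; no data is copied until it is accessed.
    view = np.memmap(path, dtype=">f4", mode="r", shape=(ROWS, COLS))

    # Copy into a native-order array so the mapping can be released.
    matrix = np.array(view, dtype=np.float32)
    del view  # drop the mapping before the temporary directory is removed

print(matrix.shape, matrix.dtype)
```

Unlike the .npy route, a raw mapped file carries no shape or dtype information, so both sides have to agree on the layout out of band (for example, by sending it as metadata over Py4J).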

Not sure where I posted the benchmark data, but I also couldn't find
it (!) so here is a link to the data:
https://docs.google.com/spreadsheets/d/14ljMYIESFbOBFe4o_Fy6WirI2P5iCQuTP9fA1BuLMAI/edit#gid=0

Finally, a small historical note about Py4J: when I created Py4J, the
goal was to mix Python and Java libs by calling methods and
maintaining state across the JVM and the Python interpreter. It
became somewhat popular in academia and industry because one could
write a Python script to control a JVM, and the use of sockets instead
of JNI meant that more integration scenarios were possible, at the
cost of performance. At the time, the text protocol on which Py4J is
based was even faster than protobuf (due to the poor performance of
the Python implementation of protobuf and the fact that Java
reflection is driven by strings so binary conversion can be wasteful).
The focus totally changed with the increasing popularity of "data
science" and now it seems that transferring large blobs from the JVM
to numpy/pandas is a major requirement. Py4J's architecture is
obviously not optimized for that as of now and workarounds must be
used.

Thanks to all who participated in this discussion! Much appreciated!

Barthelemy

Barthelemy Dagenais

Jun 13, 2017, 2:00:44 PM

to Scott Lewis, Py4J Support and Comments

Scott: my comment about protobuf's performance on Python does not apply
to recent versions. Additionally, the performance test I performed
many years ago was about calling Java methods from Python with
arguments passed by reference. For this use case, Py4J's text
protocol was faster because the Java reflection API works with Strings
and protobuf's binary encoding and decoding was doing unnecessary
work. Actually, even while working on 0.11, it was difficult to beat a
simple text protocol because passing strings over a socket is already
quite optimized by Java and Python. The problem obviously arises when
you are trying to pass large blobs by value: the text protocol just
doesn't scale, hence my slow work on a binary protocol.
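
To put rough numbers on passing a blob by value through a text protocol, here is a small sketch that measures what Base64 (the encoding referenced earlier in the thread) does to a 512x512 payload. It only exercises Python's own base64 module, not Py4J itself, so the timings it prints say nothing definitive about where Py4J spends its time. The helper name base64_cost is hypothetical.

```python
import base64
import time

import numpy as np


def base64_cost(raw):
    """Return (encoded length, seconds for one encode + decode round trip)."""
    start = time.perf_counter()
    encoded = base64.b64encode(raw)
    decoded = base64.b64decode(encoded)
    elapsed = time.perf_counter() - start
    assert decoded == raw  # sanity check: the round trip is lossless
    return len(encoded), elapsed


for name, dtype in (("float32", np.float32), ("float64", np.float64)):
    raw = np.zeros((512, 512), dtype=dtype).tobytes()
    encoded_len, elapsed = base64_cost(raw)
    print("%s: %d raw bytes -> %d Base64 bytes, round trip %.4f s"
          % (name, len(raw), encoded_len, elapsed))
```

Base64 emits 4 output bytes for every 3 input bytes, so the payload grows by about a third before it reaches the socket; the measured round-trip time will vary by machine.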

On Tue, Jun 13, 2017 at 1:42 PM, Scott Lewis <scott...@gmail.com> wrote:
> Thanks Jonah, I've joined the two issues you listed (249, 237).
>
> FWIW: I have the following observations:

>
>>Your total datasize is about 2MB, it
>>should be possible to transfer much faster than 1-2 seconds Probably a
>> lot of your remaining slowdown is due to the Base64 encoding[1,2].
>

> Although what Jonah says is certainly possible, based upon my own
> measurements I would be a little surprised if the Base64 encoding/decoding
> was responsible for this much time. I've found that the choice of
> serialization/deserialization can matter more than the Base64 encoding that
> Py4J is doing. For example, Java serialization is quite slow. As I
> mentioned, I've found protobuf can perform well between Java and Python at
> least. I'm not sure if Barthelemy's comment about Python protobuf is still
> true given more recent versions of protobuf, but if it is, then perhaps a custom
> serialization, or one of the others that are available and reputed to be
> performant (e.g. Cap'n Proto: https://capnproto.org/), could be used.
>
> FWIW, I think that the attempt to optimize the copy/pass-by-value of byte[]
> (as per 249 and 237) is right given Barthelemy's observations about the
> growing importance of performant data exchange between Java and Python. I
> also think it's important to separate the transport/Py4J optimization and
> the optimization of serialization into two distinct layers. That is,
> provide an API to plug in appropriate serialization/deserialization
> approaches (e.g. protobuf, Cap'n Proto, custom, etc.). That's the approach
> I've taken with layering protobuf on Py4J to create an OSGi remote services
> distribution provider for Java<->Python OSGi services.
>
> A related thought about usability: I've started using Python class and
> method decorators, and this is working well. For example, here's the
> source for a Python implementation of an IHello remote service:
>
> @protobuf_remote_service(objectClass=['org.eclipse.ecf.examples.protobuf.hello.IHello'])
> class HelloServiceImpl:
>
>     @protobuf_remote_service_method(arg_type=HelloMsgContent)
>     def sayHello(self, pbarg):
>         # ...do something with pbarg input
>         return result
>
> The @protobuf_remote_service decorator is responsible for passing any/all
> metadata (including the objectClass array of service interface names), and the
> @protobuf_remote_service_method decorator defines the argument type passed
> into the sayHello method implementation. Note that the Java "implements"
> declaration is not required, as @protobuf_remote_service adds it to the
> HelloServiceImpl class dynamically.
>
> Here's the whole example:
>
> https://github.com/ECF/Py4j-RemoteServicesProvider/tree/master/examples/org.eclipse.ecf.examples.protobuf.hello
>
> It includes the Java interfaces, proto file, protobuf-generated code, etc.
>
> The feedback so far is that the decorators have helped people with using
> pass-by-value Java->Python impl remote services (our use cases). There are
> other decorators that I've been working on that are not specifically bound
> to protobuf, and if interested I could contribute them or help with others.
