Scott: my comment about protobuf's performance on Python does not apply
to recent versions. Additionally, the performance test I performed
many years ago was about calling Java methods from Python with
arguments passed by reference. For this use case, Py4J's text
protocol was faster because the Java reflection API works with Strings
and protobuf's binary encoding and decoding was doing unnecessary
work. Actually, even while working on 0.11, it was difficult to beat a
simple text protocol because passing strings over a socket is already
quite optimized by Java and Python. The problem obviously arises when
you are trying to pass large blobs by value: the text protocol just
doesn't scale, hence my slow work on a binary protocol.
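
To put a rough number on the cost of carrying a blob as text, here is a minimal sketch that Base64-encodes a 2 MiB payload (about the size discussed in this thread) and reports the size inflation and the round-trip time. It uses only the Python standard library and does not go through Py4J, so it measures the encoding step alone, not socket or protocol overhead.

```python
import base64
import os
import time

# 2 MiB of random bytes, roughly the data size mentioned in the thread.
payload = os.urandom(2 * 1024 * 1024)

start = time.perf_counter()
encoded = base64.b64encode(payload)   # bytes -> ASCII text, as a text protocol needs
decoded = base64.b64decode(encoded)   # and back again on the receiving side
elapsed = time.perf_counter() - start

ratio = len(encoded) / len(payload)
print(f"raw: {len(payload)} bytes, base64: {len(encoded)} bytes ({ratio:.2f}x)")
print(f"encode + decode round trip: {elapsed * 1000:.1f} ms")
```

Base64 emits 4 output bytes for every 3 input bytes, so the payload always grows by about a third; the timing depends on the machine and is only an indication.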
On Tue, Jun 13, 2017 at 1:42 PM, Scott Lewis <
scott...@gmail.com> wrote:
> Thanks Jonah, I've joined the two issues you listed (249, 237).
>
> FWIW: I have the following observations:
>
>> Your total data size is about 2MB; it
>> should be possible to transfer much faster than 1-2 seconds. Probably a
>> lot of your remaining slowdown is due to the Base64 encoding[1,2].
>
> Although what Jonah says is certainly possible, based upon my own
> measurements I would be a little surprised if the Base64 encoding/decoding
> was responsible for this much time. I've found that the choice of
> serialization/deserialization can matter more than the B64 encoding that
> Py4j is doing. For example, Java serialization is quite slow. As I
> mentioned, I've found protobuf can perform well between Java and Python at
> least. I'm not sure if Barthelemy's comment about Python protobuf is still
> true given more recent versions of protobuf, but if so then perhaps a custom
> serialization, or one of the others that are available and are reputed to be
> performant, e.g. Cap'n Proto (https://capnproto.org/), could be used.
>
> FWIW, I think that the attempt to optimize the copy/pass-by-value of byte[]
> (as per 249 and 237) is right given Barthelemy's observations about the
> growing importance of performant data exchange between Java and Python. I
> also think it's important to separate into two distinct layers the
> transport/Py4j optimization and the optimization of serialization. That is,
> provide the API to plug in appropriate serialization/deserialization
> approaches (e.g. protobuf, Cap'n Proto, custom, etc). That's the approach
> I've taken with layering protobuf on Py4j to create an OSGi remote services
> distribution provider for Java<->Python OSGi services.
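
A minimal sketch of the two-layer split described above, assuming a plug-in interface with `dumps`/`loads` methods. Every name here (`Serializer`, `JsonSerializer`, `PickleSerializer`, `LoopbackTransport`, `call_by_value`) is hypothetical and is not part of Py4J or the ECF provider. The point is only that the transport sees opaque bytes while the serialization is swappable.

```python
import json
import pickle
from typing import Any, Protocol


class Serializer(Protocol):
    """Hypothetical plug-in interface: object -> bytes and bytes -> object."""

    def dumps(self, obj: Any) -> bytes: ...

    def loads(self, data: bytes) -> Any: ...


class JsonSerializer:
    def dumps(self, obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def loads(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


class PickleSerializer:
    def dumps(self, obj: Any) -> bytes:
        return pickle.dumps(obj)

    def loads(self, data: bytes) -> Any:
        return pickle.loads(data)


class LoopbackTransport:
    """Stand-in for the transport layer; it only ever handles raw bytes."""

    def send(self, data: bytes) -> bytes:
        # A real transport would write to a socket and read the reply.
        return bytes(data)


def call_by_value(transport: LoopbackTransport, serializer: Serializer, arg: Any) -> Any:
    wire = serializer.dumps(arg)     # serialization layer
    reply = transport.send(wire)     # transport layer, unaware of the format
    return serializer.loads(reply)   # serialization layer again


message = {"name": "hello", "values": [1, 2, 3]}
print(call_by_value(LoopbackTransport(), JsonSerializer(), message))
print(call_by_value(LoopbackTransport(), PickleSerializer(), message))
```

Swapping in protobuf or Cap'n Proto would then mean writing one more class with the same two methods, without touching the transport.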
>
> A related thought about usability: I've started using Python class and
> method decorators, and this is working well. For example, here's the
> source for a Python implementation of an IHello remote service:
>
> @protobuf_remote_service(objectClass=['org.eclipse.ecf.examples.protobuf.hello.IHello'])
> class HelloServiceImpl:
>
>     @protobuf_remote_service_method(arg_type=HelloMsgContent)
>     def sayHello(self, pbarg):
>         # ...do something with pbarg input
>         return result
>
> The @protobuf_remote_service is responsible for passing any/all metadata
> (including the objectClass array of service interface names), and the
> @protobuf_remote_service_method defines the argument type passed into the
> sayHello method implementation. Note that the Java 'implements' declaration
> is not required, as @protobuf_remote_service adds it to the HelloServiceImpl
> class dynamically.
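
For readers wondering how decorators of this kind can work, here is a minimal sketch. It is not the ECF implementation: `remote_service`, `remote_service_method`, and `TextMsg` are hypothetical names. `TextMsg` is a stand-in that mimics the `FromString`/`SerializeToString` methods of a protobuf-generated message so the example runs without protobuf installed. The class decorator attaches the inner `Java` class with an `implements` list, which is the convention Py4J uses for Python objects that implement Java interfaces.

```python
import functools
from dataclasses import dataclass


def remote_service(objectClass):
    """Class decorator (hypothetical): attach the Java interface names."""
    def decorate(cls):
        # Equivalent to writing `class Java: implements = [...]` by hand.
        cls.Java = type("Java", (), {"implements": list(objectClass)})
        return cls
    return decorate


def remote_service_method(arg_type):
    """Method decorator (hypothetical): decode the argument, encode the result."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(self, raw_bytes):
            pbarg = arg_type.FromString(raw_bytes)   # bytes -> message object
            result = func(self, pbarg)
            return result.SerializeToString()        # message object -> bytes
        return wrapper
    return decorate


@dataclass
class TextMsg:
    """Stand-in for a protobuf-generated message class."""
    text: str

    @staticmethod
    def FromString(data: bytes) -> "TextMsg":
        return TextMsg(data.decode("utf-8"))

    def SerializeToString(self) -> bytes:
        return self.text.encode("utf-8")


@remote_service(objectClass=["org.eclipse.ecf.examples.protobuf.hello.IHello"])
class HelloServiceImpl:

    @remote_service_method(arg_type=TextMsg)
    def sayHello(self, pbarg):
        return TextMsg("Hello, " + pbarg.text)


service = HelloServiceImpl()
print(service.Java.implements)
print(service.sayHello(b"Scott"))
```

The service method itself only ever sees message objects; the byte-level handling stays in the decorator.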
>
> Here's the whole example, including the Java interfaces, the proto file, and
> the protobuf-generated code:
>
> https://github.com/ECF/Py4j-RemoteServicesProvider/tree/master/examples/org.eclipse.ecf.examples.protobuf.hello
>
> The feedback so far is that the decorators have helped people with using
> pass-by-value Java->Python impl remote services (our use cases). There are
> other decorators that I've been working on that are not specifically bound
> to protobuf, and if interested I could contribute them or help with others.