How can I build my service as multiple python processes to 1 JVM?

Samuel Tan

Dec 7, 2016, 9:10:45 PM
to py...@py4j.org
Hi,

From the hints in Advanced Topics, it seems the Py4J Java gateway can support multiple Python processes by treating them as threads. Since the documentation does not state this explicitly, I am wondering whether I can build such a service to achieve an m:1 communication model.

Currently I am using JavaGateway as the broker to read messages from various sources. All messages are stored in local caches. Multiple Java threads are created to consume these caches, and each message triggers a method call in the Python process. Even though this service is not CPU bound, Python concurrency is quite slow because of the GIL: for example, 30 Java threads correspond to only 7 running Python threads.

I am wondering whether there is any way to improve the output throughput of Python. The ideal approach I can think of is to register multiple Python processes with a single Java gateway, on either the same or different ports. But a simple experiment failed: the second Python process I launched failed because the socket was unavailable. The first Python process then also failed with unknown errors, as did the JVM gateway.

If there is no such solution, I will have to launch multiple services with a 1:1 communication model using dynamic ports. I would rather not go this way, since a JVM is so expensive in memory.

Any suggestions? Any clue will be appreciated.

Thanks,
-Sam 

Barthelemy Dagenais

Dec 8, 2016, 5:43:15 AM
to Samuel Tan, Py4J Support and Comments
There are currently two threading models:

A) Py4J creates threads as needed. If the client concurrently calls
the server (python or java does not matter), a thread is created on
the server. Each side tries to optimise the number of threads and
sockets created by reusing client sockets (e.g., if two Python threads
always access Java in turn, only one socket and one Java thread would
be needed).

B) Py4J "pins" one Java thread to one Python thread. You are guaranteed
that code executed in Thread X on the Java side will execute on Thread
X on the Python side. If Thread Y from Java tries to access Python, a
new Thread Y is created in Python to handle the communication. This is
mainly useful in two scenarios: 1) you have deep recursion between
Python and Java. This model will be more efficient because only one
thread is used instead of one thread/socket per recursive call. 2) you
want to execute code synchronously on the Java UI thread.

Py4J does not support Python's multiprocessing out of the box in the
sense that sharing the same JavaGateway and CallbackServer instances
between processes will likely produce massive communication errors, if
you get past socket sharing issues.

Here is what you can do though:

1) Multiple Python processes accessing the same Java GatewayServer by
creating a new JavaGateway instance in each Python process pointing to
the same Java IP/Port. This will minimally create one Java thread per
Python process. If you want to call Python from Java though, this
solution won't work because you can only have one instance of
CallbackServer for each GatewayServer.

2) Multiple Python processes accessing a different Java GatewayServer.
Each Python process would be linked to a Java Thread with a pair of
GatewayServer/CallbackServer.
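
For option 1, a minimal sketch of the Python side could look like the
following. It assumes a Java GatewayServer is already listening on the
default port 25333; the helper names (connect, worker, run_workers) are
made up for illustration, and the System.currentTimeMillis() call is only
a placeholder for your own Java calls. Each process builds its own
JavaGateway rather than inheriting one from its parent.

```python
import multiprocessing as mp

JAVA_PORT = 25333  # Py4J's default GatewayServer port (assumed for this sketch)


def connect(port):
    """Open a fresh JavaGateway for the calling process."""
    # Imported lazily so that every process sets up Py4J on its own.
    from py4j.java_gateway import JavaGateway, GatewayParameters
    return JavaGateway(gateway_parameters=GatewayParameters(port=port))


def worker(worker_id, port=JAVA_PORT, connect_fn=connect):
    """Runs inside one Python process; never shares a gateway with another."""
    gateway = connect_fn(port)
    try:
        # Placeholder Java call; replace with calls on gateway.entry_point.
        return worker_id, gateway.jvm.System.currentTimeMillis()
    finally:
        # close() drops this process's connections without stopping the
        # Java GatewayServer (shutdown() would stop it for everyone).
        gateway.close()


def run_workers(n_workers):
    """Start n_workers processes that all talk to the same Java port."""
    with mp.Pool(n_workers) as pool:
        return pool.map(worker, range(n_workers))
```

Calling run_workers(4) from a script guarded by
if __name__ == "__main__": would start four processes. As noted above,
this covers Python-to-Java calls only, not callbacks from Java into
Python.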

This is how I would do it:

Java Thread 1: Start GatewayServer on port X. Set the readTimeout to
something other than 0.* This thread will be responsible for receiving
connection requests.

Python Process 1: Create a new JavaGateway on port X. Call Java to
request for a new port.

Java Thread 1: Start Java Thread 2, start a new GatewayServer, get the
listening port Y, and send it to Python.

Python Process 1: Get the port Y, close the connection
(gateway.close()), and start a new JavaGateway with the new port Y.
You can also start a CallbackServer and communicate its port to the
new GatewayServer.

Python Process 2: Repeat the process.

* This will make sure that each connection is correctly closed after
the read timeout expires.
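
The Python half of these steps can be sketched roughly as below. This is
only an illustration under assumptions: startNewGateway() is a
hypothetical method you would have to write on the Java entry point (it
would start a GatewayServer in a new thread and return its listening
port), and BROKER_PORT, connect, and open_private_gateway are
illustrative names, not part of Py4J.

```python
BROKER_PORT = 25333  # "port X" in the steps above; the value is an assumption


def connect(port):
    """Open a JavaGateway to the given Java port."""
    from py4j.java_gateway import JavaGateway, GatewayParameters
    return JavaGateway(gateway_parameters=GatewayParameters(port=port))


def open_private_gateway(broker_port=BROKER_PORT, connect_fn=connect):
    """Ask the broker GatewayServer for a dedicated port, then reconnect to it."""
    broker = connect_fn(broker_port)
    try:
        # Hypothetical entry-point method: starts a new GatewayServer on the
        # Java side and returns its listening port ("port Y").
        private_port = int(broker.entry_point.startNewGateway())
    finally:
        # close(), not shutdown(): the broker must stay up for other processes.
        broker.close()
    # From here on, this process talks only to its own GatewayServer.
    return connect_fn(private_port)
```

Each Python process would call open_private_gateway() once at startup and
keep the returned gateway for the rest of its life.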

HTH,
Barthélémy

Samuel Tan

Dec 8, 2016, 3:31:38 PM
to Barthelemy Dagenais, Py4J Support and Comments
Hi Barthelemy,

Thanks for your timely guidance. In your second solution, each GatewayServer runs in a persistent Java thread and I only need to launch one JVM, right? I remember reading some sample code built on the same idea, but I can't find it. If you could point me to the right place, it would help a lot.

Best,
-Sam

Barthelemy Dagenais

Dec 8, 2016, 8:12:50 PM
to Samuel Tan, Py4J Support and Comments
Hi,

With solution 2, there is indeed only one JVM, but multiple
GatewayServer instances. If you are referring to the documentation
about the pinned thread model (deterministic/synchronized thread model),
this is the relevant section:
https://www.py4j.org/advanced_topics.html#using-single-threading-model-pinned-thread.

Barthélémy

Samuel Tan

Dec 8, 2016, 8:15:29 PM
to Barthelemy Dagenais, Py4J Support and Comments
Thanks Barthelemy, I can now start trying to launch multiple JavaGateway instances across multiple threads. I will let you know if any further questions come up.

Best,
-Sam