multiprocessing issue with Python 3.6


Luca Sbardella

Dec 9, 2016, 5:57:59 AM
to python-tulip
Hi,

I'm trying to run pulsar in multiprocessing mode (using the multiprocessing module to create processes rather than asyncio subprocesses).
However, in Python 3.6 I have a small problem.
When the new process starts, it creates the event loop and starts it, but I get

raise RuntimeError('This event loop is already running')

The loop is not running, but the _running_loop global in the asyncio.events module has been inherited from the master process, so get_event_loop() is effectively broken in the child.

I resolved the issue by setting the running loop to None when the Process run method is called:

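def run(self):
    try:
        # _set_running_loop is a private helper; fall through if it is missing.
        from asyncio.events import _set_running_loop
        # Clear the running-loop marker inherited from the master process.
        _set_running_loop(None)
    except ImportError:
        pass
    ...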
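For readers who want something runnable, here is a minimal sketch of the same workaround. The names `reset_running_loop`, `child_main`, `run_fresh_loop` and `Worker` are hypothetical and not part of pulsar, and `_set_running_loop` is a private asyncio helper, so treat this as an illustration under those assumptions rather than a recommended API.

```python
import asyncio
import multiprocessing


def reset_running_loop():
    """Clear any running-loop marker inherited from the parent process."""
    try:
        # Private asyncio helper (assumption: it may be missing or change
        # between Python versions, hence the ImportError guard).
        from asyncio.events import _set_running_loop
    except ImportError:
        return
    _set_running_loop(None)


async def child_main():
    # Placeholder for the real work done in the child process.
    await asyncio.sleep(0)
    return "child loop ran"


def run_fresh_loop():
    """Reset the inherited state, then run a brand-new loop to completion."""
    reset_running_loop()
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(child_main())
    finally:
        asyncio.set_event_loop(None)
        loop.close()


class Worker(multiprocessing.Process):
    # Hypothetical Process subclass: the reset happens first thing in run().
    def run(self):
        run_fresh_loop()
```

Creating a new loop in the child, instead of reusing the inherited one, also avoids touching the parent's loop object by accident.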

Is that what I'm supposed to do? Or is there a better way?

Thanks!

Yury Selivanov

Dec 9, 2016, 4:25:02 PM
to Luca Sbardella, python-tulip
A better way is to never fork or spawn a multiprocessing.Process from a running coroutine. Ideally, you want to stop the loop, spawn the process, then resume the loop.

Yury

Luca Sbardella

Dec 9, 2016, 5:38:58 PM
to Yury Selivanov, python-tulip
Right, so if the forking is not done in a coroutine, it may work?

> Ideally, you want to stop the loop, spawn a process, resume the loop.

That does not sound like what I should be doing, but I'll test it.

Thanks



Yury Selivanov

Dec 9, 2016, 5:43:49 PM
to Luca Sbardella, python-tulip

> > Is that what I'm supposed to do? Or is there a better way?
>
> A better way is to never fork or spawn multiprocessing.Process from a running coroutine.
>
> right, so if the forking is not in a coroutine it may work?!?!

It should, because the running loop is set only when the loop is running :)
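A small probe can make this visible. The sketch below assumes the private helper `asyncio.events._get_running_loop` is available; the function name `running_loop_states` is made up for illustration.

```python
import asyncio
from asyncio.events import _get_running_loop  # private helper (assumption)


def running_loop_states():
    """Report the running-loop marker before, during and after a run."""
    loop = asyncio.new_event_loop()
    before = _get_running_loop()

    async def probe():
        # Inside a coroutine the marker points at the loop driving it.
        return _get_running_loop()

    during = loop.run_until_complete(probe())
    after = _get_running_loop()
    loop.close()
    return before, during is loop, after
```

A process created while the marker is unset has nothing stale to inherit, which is why spawning outside a coroutine avoids the error.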

>
> Ideally, you want to stop the loop, spawn a process, resume the loop.
>
> that does not sound like what I should be doing, but I'll test it

I find forking from within a coroutine or a callback function to be quite dangerous. It’s usually better to pre-fork or to use the approach I describe above (with any kind of asynchronous IO framework, not just asyncio).
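The stop, spawn, resume sequence could be sketched roughly as follows. `run_spawn_resume`, `phase` and the `spawn` parameter are hypothetical names; `spawn` stands in for whatever actually creates the process.

```python
import asyncio


async def phase(name, log):
    # Placeholder for real asynchronous work.
    await asyncio.sleep(0)
    log.append(name)


def run_spawn_resume(spawn):
    """Run the loop, spawn while it is stopped, then run it again."""
    log = []
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(phase("before", log))
        # run_until_complete() has returned, so the loop is stopped here.
        assert not loop.is_running()
        spawn()
        log.append("spawned")
        # Resume the same loop for the rest of the work.
        loop.run_until_complete(phase("after", log))
    finally:
        loop.close()
    return log
```

In real use, `spawn` might be something like the bound `start` method of a `multiprocessing.Process`. Pre-forking is the special case where the spawn happens before the loop is run for the first time.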

Yury

Luca Sbardella

Dec 9, 2016, 6:21:12 PM
to Yury Selivanov, python-tulip

>
>   Ideally, you want to stop the loop, spawn a process, resume the loop.
>
> that does not sound like what I should be doing, but I'll test it

> I find forking from within a coroutine or a callback function to be quite dangerous. It’s usually better to pre-fork or to use the approach I describe above (with any kind of asynchronous IO framework, not just asyncio).

I think I'll stick with my initial hack for now; it's simpler.

Thanks for the help.



Denis Costa

Dec 28, 2016, 4:58:34 AM
to python-tulip, luca.sb...@gmail.com
Hi Yury,


On Friday, December 9, 2016 at 11:43:49 PM UTC+1, Yury Selivanov wrote:
> I find forking from within a coroutine or a callback function to be quite dangerous. It’s usually better to pre-fork or to use the approach I describe above (with any kind of asynchronous IO framework, not just asyncio).

Could you elaborate on why this is quite dangerous?


Thanks

Denis Costa

Martin Richard

Dec 28, 2016, 5:29:28 AM
to Denis Costa, python-tulip, luca.sb...@gmail.com
Hi Denis,

We are talking about forking without exec-ing right after, so using subprocess coroutines is mostly fine.

It's dangerous because you may:
1/ run scheduled code (callback, task, etc) twice,
2/ interfere with the parent loop from the child by mistake.

1/ You can't really know whether the loop has other tasks or pending callbacks scheduled to run when you fork. If both the parent and the child run the same loop, some tasks will run in both processes. This is a problem because some side effects may be applied twice: both the master and the child will write the same buffer to a socket, or the child might steal data that should have been consumed by the parent.
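To illustrate point 1/, here is a minimal sketch, assuming a Unix system where `os.fork()` is available. The function name `callback_runs_in_both` is made up, and a pipe stands in for the socket so that both processes' side effects can be counted.

```python
import asyncio
import os


def callback_runs_in_both():
    """Count how often a callback scheduled before the fork gets executed."""
    r, w = os.pipe()
    loop = asyncio.new_event_loop()
    # Scheduled before the fork, so the pending callback is copied to the child.
    loop.call_soon(lambda: os.write(w, b"ran\n"))

    pid = os.fork()
    if pid == 0:
        # Child: run the inherited loop, then exit without any cleanup.
        try:
            loop.run_until_complete(asyncio.sleep(0))
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    # Parent: run the very same pending callback again.
    loop.run_until_complete(asyncio.sleep(0))
    loop.close()
    os.close(w)
    with os.fdopen(r, "rb") as reader:
        return reader.read().count(b"ran")
```

The side effect lands twice, once per process, even though the callback was scheduled only once.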

2/ At least on Linux, asyncio uses epoll, which is a structure owned by the kernel and identified by an fd. When forking, the child inherits this fd. This means that the list of events watched by the loop (for instance "a read is ready on socket X") is registered in the kernel and shared by both processes.

If one of the processes opens a socket and watches an event on it, the other process will receive the notification... but it doesn't know anything about this new file, and may try to read from the wrong fd (or a non-existent one).

Anyway, even if the loop is not running when the fork is performed, there is still a problem which requires monkey-patching.

If you choose to close the parent's loop in the child right after the fork, you will only prevent the 1st problem: when the loop is closed and disposed of, the cleanup will unregister all watched events, which will affect the parent loop (2nd problem). You *must* monkey-patch the loop's selector.
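The shared-epoll effect can be reproduced without asyncio at all. The sketch below is Linux-only (it needs `select.epoll` and `os.fork()`), uses a raw epoll object rather than the loop's selector, and the function name is hypothetical; the child's `unregister` call mimics what a loop cleanup would do.

```python
import os
import select


def child_unregisters_parent_watch():
    """Show that an unregister in the child removes the parent's watch."""
    r, w = os.pipe()
    ep = select.epoll()
    ep.register(r, select.EPOLLIN)
    os.write(w, b"x")
    # The pipe is readable, so the parent sees an event before the fork.
    seen_before = bool(ep.poll(0))

    pid = os.fork()
    if pid == 0:
        # Child: the inherited epoll fd refers to the same kernel object.
        try:
            ep.unregister(r)
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    # The data is still in the pipe, but the watch is gone for the parent too.
    seen_after = bool(ep.poll(0))
    ep.close()
    os.close(r)
    os.close(w)
    return seen_before, seen_after
```

This is why closing the inherited loop in the child is not enough on its own: the unregistration happens on the kernel object that the parent is still using.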

I've got a project which does that, and it's quite brittle as I've got to take care of the global state of the loop when forking. I am considering replacing this fragile implementation with one that starts a fresh Python process. The downside of that strategy is that spawning a process will take more time (initialization is quite slow in Python) and I will need an RPC mechanism to send data from the parent.

Maybe there are other problems I'm not aware of, but as I said, I fork a process with a running loop in something used in production, and it works fine, so in practice it's hard but doable.
--
Martin Richard
www.martiusweb.net