New Process doesn't find the current ActorSystem


Torsten Hencke

Jan 12, 2023, 12:15:49 PM
to thespian.py
Hi,

I'm trying to refactor my multiprocess program to use Thespian Actors (multiprocQueueBase) instead. As an intermediate step, I tried to first create a single Actor that is used by a couple of other processes. However, only the main process is able to communicate with that ActorSystem(); the other processes all seem to fall back to using simpleSystemBase.

The new processes are created from the main process and will eventually be handled as Actors themselves, but for the intermediate step: how can I tell a new process about the existing ActorSystem?

I already tried just sending the ActorAddress to the new process, because that's really all I need, but that just gave me a CannotPickle error.

Best regards,

Torsten

Kevin Quick

Jan 18, 2023, 11:50:54 AM
to Torsten Hencke, thespian.py
Hi Torsten,

Sorry for the delay in response: your email got flagged as spam by
gmail and so I only just found it.

To answer your primary question, if you are creating a new process
(e.g. via the subprocess library) then that process should use the
same parameters to the `ActorSystem(...)` call as the original
process, but this will only work for system bases that are capable of
supporting this. If your child processes are ending up using the
simpleSystemBase then it's likely that the arguments to
`ActorSystem(...)` are different for those processes.

Unfortunately, the multiprocQueueBase is not one of the bases capable
of supporting post-fork connections, and I apologize that this isn't
very clear from the documentation. The limitation on this base is
that the Python Queue functionality
(https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues)
does not support that methodology; instead, a Queue must be created
*before* the sub-process is forked. A multiprocQueueBase address
cannot be sent to an unrelated process, and this is why you are
receiving the CannotPickle error.
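The underlying Python restriction can be reproduced without Thespian
at all. The minimal sketch below assumes only the standard library:
pickling a `multiprocessing.Queue` outside of process creation raises
a `RuntimeError`, while a plain (host, port) tuple, used here as a
hypothetical stand-in for a network-based address, pickles without
complaint. It does not show how Thespian actually builds its
addresses; `try_to_pickle` is a hypothetical helper.

```python
import multiprocessing
import pickle


def try_to_pickle(obj):
    """Return (True, None) if obj can be pickled, else (False, error text)."""
    try:
        pickle.dumps(obj)
    except (RuntimeError, TypeError, pickle.PicklingError) as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # A Queue may only be handed to a child while that child is being created.
    print(try_to_pickle(multiprocessing.Queue()))
    # A hypothetical (host, port) address is just data, so it pickles fine.
    print(try_to_pickle(("127.0.0.1", 12345)))
```

On CPython the first call returns False with a message saying that
Queue objects should only be shared between processes through
inheritance, which is the same constraint described above.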

The good news is that there are two solutions available to you:

1. Use the multiprocUDPBase or (even better) the multiprocTCPBase.
Both of these bases allow new processes that haven't been
pre-configured to connect to the actor system.
2. Use `ActorSystem().createActor()` to create the sub-process
actors. When actors are created within the context of the current
actor system, Thespian ensures that there is an active Queue for
communicating with them.
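For the second option, the key point is that the communication
channel already exists when the child process starts. A minimal
stdlib sketch of that pattern follows; it assumes a Unix platform
where the "fork" start method is available, and the names `child`
and `run_with_inherited_queue` are hypothetical. It illustrates the
inheritance pattern described above and is not Thespian's own
`createActor()` implementation.

```python
import multiprocessing


def child(channel):
    # The Queue arrived as a Process argument at creation time (inherited),
    # not as a pickled address sent later to an unrelated process.
    channel.put("hello from child")


def run_with_inherited_queue():
    # Assumption: a Unix platform where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    channel = ctx.Queue()  # created *before* the child is started
    proc = ctx.Process(target=child, args=(channel,))
    proc.start()
    reply = channel.get(timeout=10)  # drain the queue before join(), as the docs recommend
    proc.join()
    return reply


if __name__ == "__main__":
    print(run_with_inherited_queue())
```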

Regards,
Kevin



--
-KQ

Torsten Hencke

Jan 24, 2023, 7:10:21 AM
to thespian.py
Well, I just managed to get past that intermediate step and now have the system running the way I want (still using multiprocQueueBase). However, I noticed some other inconveniences:

1. Requesting a "global" Actor a second time doesn't seem to work as expected:

from thespian.actors import ActorSystem

asys = ActorSystem()
logger = asys.createActor(LoggerAgent, globalName='process_logger')  # LoggerAgent: my own Actor class

It works fine the first time I call it, but when I call it a second time within the same process I don't get an answer: the creation call seems to get stuck somewhere in PeriodicExecutor._run(). I expected to just get the address of the existing Actor from that call.

2. Shutting down the ActorSystem is really slow (~10 seconds), even when there is only one Actor and no pending messages:

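import logging
from thespian.actors import ActorSystem, ActorExitRequest

logger = logging.getLogger(__name__)  # a standard Python logger

asys = ActorSystem()
logger.info('Exiting Actor System')
asys.tell(asys.systemAddress, ActorExitRequest())
logger.info(f'Shutting down internal processes: {asys.systemAddress}')
asys.shutdown()   # this part takes 10 seconds.
logger.info('Internal processes shut down.')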


Is there any way to bring that down to less than a second?

3. How do I handle an ActorExitRequest?

When the system shuts down, I want my Actors to do some cleanup work before they are terminated. However, neither receiveMsg_ActorExitRequest nor receiveUnrecognizedMessage seems to get called.

Best regards

Torsten

Torsten Hencke

Jan 24, 2023, 10:31:20 AM
to thespian.py
Regarding 3): Thespian seems to fail to handle my KeyboardInterrupt in MultiprocessQueueTransport.py, line 215. Catching the KeyboardInterrupt and continuing "solved" the issue for me in development, but I'd prefer to know a solution for production.

Kevin Quick

Jan 24, 2023, 4:22:03 PM
to Torsten Hencke, thespian.py
The combination of your issues 1 and 2 makes it sound like you
have either (a) an unresponsive Actor or (b) a blasting Actor, and
your third issue leads me to suspect the former.

In case (a), the Actor might be unresponsive because your
receiveMessage() is performing some blocking call that never returns.
The context for an Actor's receiveMessage() (and the Actor model
itself) is single-threaded within that receiveMessage, so until it
returns the Actor cannot handle any other messages. On the
ActorSystem shutdown() call, each Actor is sent an ActorExitRequest.
This is actually delivered to the Actor's receiveMessage() method
(thus, for your third question, you would match on this message and
perform whatever cleanup you desired), so if the Actor's
receiveMessage() cannot be called because it is blocked on a previous
message then the ActorExitRequest cannot be processed by that Actor.
When an Actor's receiveMessage() returns after handling an
ActorExitRequest, the Actor is then shut down; if the ActorExitRequest
cannot be delivered to receiveMessage(), the Actor will never complete
that processing and shut down. The ActorSystem has a 10-second
timeout, after which it uses more forceful methods to kill Actors
that did not comply with the ActorExitRequest, which is probably
your second issue.
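The following plain-Python model (threads and `queue.Queue`, not
Thespian) sketches why a blocked handler delays shutdown. `ToyActor`,
`ExitRequest` and `shutdown()` are hypothetical names, and the grace
period stands in for the ActorSystem's 10-second timeout, shortened
for the demo. In real Thespian code the handler is
`receiveMessage(self, message, sender)` and the cleanup branch would
test `isinstance(message, ActorExitRequest)`. Unlike Thespian, this
model cannot forcibly kill the stuck worker (Python threads cannot be
killed), so it only reports that the grace period expired.

```python
import queue
import threading


class ExitRequest:
    """Hypothetical stand-in for Thespian's ActorExitRequest."""


class ToyActor:
    """Handles one mailbox message at a time on its own thread."""

    def __init__(self, block_on_work=False):
        self.mailbox = queue.Queue()
        self.cleaned_up = False
        self._block_on_work = block_on_work
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            message = self.mailbox.get()
            self.receive_message(message)
            if isinstance(message, ExitRequest):
                return  # the "actor" stops only after the handler returns

    def receive_message(self, message):
        if isinstance(message, ExitRequest):
            self.cleaned_up = True  # cleanup work would go here
        elif self._block_on_work:
            threading.Event().wait()  # a blocking call that never returns


def shutdown(actor, grace_period):
    """Send the exit request; return True if the actor stopped in time."""
    actor.mailbox.put(ExitRequest())
    actor.thread.join(grace_period)
    return not actor.thread.is_alive()


if __name__ == "__main__":
    well_behaved = ToyActor()
    well_behaved.mailbox.put("work")
    print(shutdown(well_behaved, 2.0), well_behaved.cleaned_up)

    stuck = ToyActor(block_on_work=True)
    stuck.mailbox.put("work")
    print(shutdown(stuck, 0.5), stuck.cleaned_up)
```

The well-behaved actor sees the exit request, runs its cleanup and
stops; the stuck one never reaches the exit request, so its cleanup
never runs and the caller falls back to waiting out the timeout.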

The other case, (b), is a blasting Actor: an Actor that continuously
blasts out messages via send(). Thespian has back-propagation flow
control, so a send() call (which is normally asynchronous, in that it
completes in the sending Actor without requiring the receiver to have
handled the message) is blocked until the receiver handles messages.
A blasting Actor has two effects: it adds a large number of messages
that must propagate through the ActorSystem, which decreases the
bandwidth available for delivering other messages, and its blocked
send() calls look very much like the situation described above for
case (a).
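The effect of this flow control on a sender can be approximated with
a bounded `queue.Queue`: once the receiver falls behind, `put()`
blocks just as a throttled send() would. This is an analogy under
that assumption, not Thespian's actual flow-control mechanism;
`blast()` and `slow_receiver()` are hypothetical names, and the
`timeout` makes the blocking observable instead of hanging forever.

```python
import queue
import threading
import time


def blast(channel, count, timeout):
    """Try to send `count` messages; return how many were accepted in time."""
    accepted = 0
    for i in range(count):
        try:
            channel.put(i, timeout=timeout)  # blocks while the channel is full
        except queue.Full:
            break  # nobody is draining messages, so the sender is stuck
        accepted += 1
    return accepted


def slow_receiver(channel, expected, delay, received):
    """Drain `expected` messages, pausing `delay` seconds after each one."""
    for _ in range(expected):
        received.append(channel.get())
        time.sleep(delay)


if __name__ == "__main__":
    # No receiver: only maxsize messages get through before the sender blocks.
    print(blast(queue.Queue(maxsize=3), 10, timeout=0.1))

    # A slow receiver: every message is delivered, but the sender is throttled.
    channel, received = queue.Queue(maxsize=3), []
    worker = threading.Thread(target=slow_receiver, args=(channel, 10, 0.01, received))
    worker.start()
    print(blast(channel, 10, timeout=2.0))
    worker.join()
    print(received)
```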

I suspect that the second attempt to create a globalName Actor (your
first issue) blocks due to one of the above situations.

If this information doesn't help diagnose your issues, you could
post a rough implementation of your Actor here, which might help
with further diagnosis.

I'll respond to the KeyboardInterrupt issue on the issue you filed
there (https://github.com/thespianpy/Thespian/issues/72).

Regards,
Kevin



--
-KQ