The problem? The outer service (Process) is based on BasicService.
BasicService (so all services) uses a simple state machine to manage
state transitions as well as waiting for state changes (for example,
when serve_forever calls start and waits until stopped). Turns out, I
made AbstractStateMachine use threading.Event when the subject (a
service) does not have an async manager to provide some kind of Event.
The result? When you call Process.serve_forever, it will start
everything and then block on that threading.Event (Process is a
BasicService with no AsyncManager).
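To make the fallback concrete, here is a minimal sketch of the kind of
choice described above. It is not the real AbstractStateMachine code:
pick_event, BareService, the async_manager attribute, and the event()
factory are all names assumed for illustration.

```python
import threading


def pick_event(subject):
    """Return an event for `subject` (hypothetical helper, not ginkgo's API)."""
    # Assumption: a service exposes its manager as `async_manager`.
    manager = getattr(subject, "async_manager", None)
    if manager is not None:
        # Assumption: the manager offers an `event()` factory.
        return manager.event()
    # No async manager: fall back to a thread-level event. Its wait()
    # blocks the whole calling thread until another thread calls set().
    return threading.Event()


class BareService:
    """Stand-in for a BasicService-like object with no async manager."""


stopped = pick_event(BareService())
# With a timeout, wait() returns False because nothing ever set the event.
# Without a timeout, this call would block indefinitely.
timed_out = not stopped.wait(timeout=0.05)
```

The timeout is only there so the sketch terminates. A wait on
"stopped" with no timeout is what made the outer service hang.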
This was a rather scary realization. I thought maybe the entire
architecture was based on false assumptions. Especially since there
were some other bugs along the way. But it turns out that once those
bugs were fixed (for example, Service needs to install an AsyncManager
before BasicService installs a state machine, so I had to create a
pre_init hook to make that happen), the main change needed was this:
if the subject doesn't have an async manager, the state machine uses
a passthrough event object instead of threading.Event. Which means,
if you build something with BasicService, any calls to
self.state.wait() will just continue as if the event were set. But
you will almost never use BasicService directly because it's mostly
used for internal services (Process, Container, AsyncManager,
Service).
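A passthrough event can be very small. The following is a sketch under
assumptions, not the object from the commit: PassthroughEvent and
SketchState are hypothetical names, and the real state machine does
more than wait.

```python
class PassthroughEvent:
    """Hypothetical no-op event: wait() never blocks."""

    def set(self):
        pass

    def clear(self):
        pass

    def is_set(self):
        return True

    def wait(self, timeout=None):
        # Behave as if the event were already set.
        return True


class SketchState:
    """Tiny stand-in for the wait side of a state machine."""

    def __init__(self, event_factory=PassthroughEvent):
        self._event_factory = event_factory
        self._events = {}

    def wait(self, state, timeout=None):
        # Create one event per state name on first use.
        if state not in self._events:
            self._events[state] = self._event_factory()
        return self._events[state].wait(timeout)


state = SketchState()
passed_through = state.wait("stopped")  # returns immediately
```

Passing a different event_factory (for example one supplied by an
async manager) is how a service higher up would get a wait that really
blocks.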
The problem with this is that when you call serve_forever on the outer
service, which is a Process, it will pass straight through
self.state.wait("stopped"), and in the case of the runner, it will
then just exit. So the solution I came up with was to call
self.state.wait("stopped") recursively, assuming that some service in
the hierarchy will actually wait and yield to whatever concurrency
framework is in use. To avoid cluttering the interface of
BasicService/Service, I just made serve_forever run recursively. It
will now:
- Try to start self, but if already started, just pass
- Call ready_callback
- Call serve_forever on children
  - This will cause start to be called again, but if the earlier
    start worked, it has already been called. That's fine, it will
    just keep going
  - Then it will recurse deeper
  - Wait until stopped
- Then the outer service waits until stopped
So then when you call stop, it goes down the tree and calls stop on
all services, which triggers each stopped event. That bubbles back up
to the top, causing each serve_forever to continue, and in the case of
the outer service, the program exits. Yay.
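The recursion and the stop propagation described above can be sketched
roughly like this. It is an illustration only, not the code from the
commit: SketchService, PassthroughEvent, and the attribute names are
made up, and threading.Event stands in for whatever event a real async
manager would provide.

```python
import threading


class PassthroughEvent:
    """Hypothetical no-op event: wait() returns at once."""

    def set(self):
        pass

    def wait(self, timeout=None):
        return True


class SketchService:
    """Illustrative stand-in for a service; not ginkgo's BasicService."""

    def __init__(self, name, children=(), event=None):
        self.name = name
        self.children = list(children)
        self.started = False
        # Services without a real event fall through wait() immediately.
        self.stopped = event if event is not None else PassthroughEvent()

    def start(self):
        if self.started:
            return  # already started, just pass
        self.started = True
        for child in self.children:
            child.start()

    def stop(self):
        # Go down the tree first, then fire our own stopped event.
        for child in self.children:
            child.stop()
        self.stopped.set()

    def serve_forever(self, ready_callback=None):
        self.start()
        if ready_callback is not None:
            ready_callback()
        for child in self.children:
            # Blocks for as long as some descendant really waits.
            child.serve_forever()
        self.stopped.wait()


def demo():
    # Only the leaf has a real event; the outer "process" passes through.
    app = SketchService("app", event=threading.Event())
    process = SketchService("process", children=[app])
    # Call stop from another thread shortly after serving begins.
    timer = threading.Timer(0.05, process.stop)
    timer.start()
    process.serve_forever()  # returns only after stop reaches the leaf
    timer.join()
    return app.stopped.is_set()
```

In demo, the outer service has only a passthrough event, yet its
serve_forever still blocks, because the call into the child does not
return until the child's real event is set by stop.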
Here is the significant commit for review:
https://github.com/progrium/ginkgo/commit/1ff9441a8a52649e05dd07937c8a536f93975d27