Hi,
As some of you are probably aware, I'm playing around with Akka and domain-driven design. In another thread, we discussed the idea of a single actor per aggregate instance, as well as timing out an actor when it isn't being used.
I'm aware I can use ReceiveTimeout on a child. My thinking up to now has been for the child to then notify its parent coordinator (which supervises all instances of a given aggregate) via a StopMe message. The parent would then call context.stop(childRef) to stop the child. According to the docs, context.stop will immediately free up the actor name for further use. So, if the next message in the parent's mailbox is for the same aggregate instance, the parent will create a new actor and forward the message to it. All seems good. However, unless I'm mistaken, I can see a possible race condition here.
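To make the sequence concrete, here is a toy model of the parent side. It is plain Python, not Akka: Child, Coordinator, deliver and on_stop_me are hypothetical names, and on_stop_me simply assumes what the docs are described as saying above, i.e. that the name is free for reuse as soon as stop is called, even though the old child has not terminated yet.

```python
from collections import deque


class Child:
    """Hypothetical stand-in for one aggregate actor (not an Akka actor)."""

    def __init__(self, aggregate_id):
        self.aggregate_id = aggregate_id
        self.mailbox = deque()
        self.stopping = False    # stop requested; the current message may still finish
        self.terminated = False  # set once the child has really stopped


class Coordinator:
    """Hypothetical parent that keeps one named child per aggregate id."""

    def __init__(self):
        self.children = {}      # name -> Child, as seen by the parent
        self.all_children = []  # every child ever created, for inspection

    def deliver(self, aggregate_id, msg):
        child = self.children.get(aggregate_id)
        if child is None:  # no child under that name: create a fresh one
            child = Child(aggregate_id)
            self.children[aggregate_id] = child
            self.all_children.append(child)
        child.mailbox.append(msg)
        return child

    def on_stop_me(self, child):
        # Models StopMe -> context.stop(childRef): the name is released at once,
        # but the child itself has not terminated yet.
        child.stopping = True
        if self.children.get(child.aggregate_id) is child:
            del self.children[child.aggregate_id]

    def live_instances(self, aggregate_id):
        return [c for c in self.all_children
                if c.aggregate_id == aggregate_id and not c.terminated]


parent = Coordinator()
old = parent.deliver("order-42", "AddItem")
parent.on_stop_me(old)                         # child hit ReceiveTimeout and sent StopMe
new = parent.deliver("order-42", "Checkout")   # name is free, so a new child is created
print(len(parent.live_instances("order-42")))  # 2 until the old child terminates
```

The interesting window is the time between on_stop_me and the old child actually terminating; in the model that is simply the period during which old.terminated is still False.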
Calling context.stop() allows the existing child to finish processing the message it's currently working on (despite the ReceiveTimeout, it's feasible that it could have received another message), so, in theory at least, two actors representing the same aggregate instance could be alive at the same time for a very short period. With akka-persistence (EventsourcedProcessor) in the mix, one of several implications is that the "dying" instance could write an event to the journal after the new instance has finished recovering its events from the journal; the new instance would then be in an invalid state.
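The journal interleaving described above can be replayed with a toy event store. This is plain Python rather than akka-persistence: Journal and Counter are hypothetical, the "aggregate" just counts its events, and recover() assumes recovery replays whatever is in the journal at the moment it runs.

```python
class Journal:
    """Hypothetical append-only event journal shared by all instances."""

    def __init__(self):
        self.events = []

    def append(self, aggregate_id, event):
        self.events.append((aggregate_id, event))

    def replay(self, aggregate_id):
        return [e for (aid, e) in self.events if aid == aggregate_id]


class Counter:
    """Hypothetical event-sourced aggregate: state = number of events applied."""

    def __init__(self, aggregate_id, journal):
        self.aggregate_id = aggregate_id
        self.journal = journal
        self.count = 0

    def recover(self):
        # Replays only what has been written to the journal so far.
        self.count = len(self.journal.replay(self.aggregate_id))

    def persist_increment(self):
        # Write the event first, then apply it to in-memory state.
        self.journal.append(self.aggregate_id, "Incremented")
        self.count += 1


journal = Journal()
dying = Counter("order-42", journal)
dying.persist_increment()   # normal operation before the timeout
fresh = Counter("order-42", journal)
fresh.recover()             # new instance recovers and sees 1 event
dying.persist_increment()   # dying instance finishes its last command afterwards
print(fresh.count, len(journal.replay("order-42")))  # 1 2: the new instance is stale
```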
Another issue with this approach is that any further messages still in the "dying" child's mailbox when context.stop() is called will be sent to dead letters when, in fact, you'd want them to be processed by the new child _before_ any "new" messages destined for it.
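A toy model of the dead-letter problem, again in plain Python rather than Akka: ToyChild and stop_immediately are hypothetical, and stop_immediately assumes, as described above, that the child finishes at most the message it is already handling and everything still queued goes to dead letters instead of to its replacement.

```python
from collections import deque


class ToyChild:
    """Hypothetical stand-in for an aggregate actor with its own mailbox."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.processed = []

    def process_one(self):
        self.processed.append(self.mailbox.popleft())


def stop_immediately(child, dead_letters):
    """Models context.stop(): queued messages are never processed by this
    child and end up in dead letters."""
    while child.mailbox:
        dead_letters.append(child.mailbox.popleft())


dead_letters = []
old = ToyChild("order-42")
old.mailbox.extend(["AddItem", "RemoveItem", "Checkout"])
old.process_one()                    # "AddItem" is in flight when stop is called
stop_immediately(old, dead_letters)
new = ToyChild("order-42")
new.mailbox.append("Pay")            # arrives after the name was reused
print(old.processed, dead_letters, list(new.mailbox))
# ['AddItem'] ['RemoveItem', 'Checkout'] ['Pay']
```

Ideally "RemoveItem" and "Checkout" would reach the new child ahead of "Pay"; in this model they never reach it at all.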
I'm sorry if I've explained this badly. Is it at all clear what I'm getting at? If I'm not talking nonsense, can anyone suggest a better pattern that can:
1) Enforce the guarantee of a unique actor per aggregate instance
2) Ensure messages (commands) are neither lost nor delivered out of order
This is probably an edge case: in most systems, an actor that has reached its configured ReceiveTimeout is unlikely to receive further messages within the window in which the race condition exists.
Andrew