Hi,
I'm working through the failure scenarios to understand where we could lose data and I'm receiving a dead letter error unexpectedly.
The model is a Collector Actor and a set of Worker Actors. Collector is the parent to the Workers. A 'boss' is watching the Collector with a restart() SupervisorStrategy. Workers 'pull' work from the Collector, so when a Worker finishes, it sends a 'done' request then the Collector finds the next thing to do and sends it.
We are deliberately throwing an exception in the Collector's onReceive() method so it gets restarted to understand the flow. We are also making the Worker Actors sleep for 5 seconds to simulate activity when they receive work.
Frequently we see dead letter errors on the LAST message sent from the Collector before we throw the exception:
[INFO] [03/06/2014 08:59:04.801] [Main-akka.actor.default-dispatcher-2] [akka://Main/user/app/Supervisor/1394114329779_4] Message [com.silverpop.rnd.collector_failure.TestMessage] from Actor[akka://Main/user/app/Supervisor#-138733138] to Actor[akka://Main/user/app/Supervisor/1394114329779_4#1820908525] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
Based on the timestamps in the logs, it looks like the restart process is stopping the Worker (who is idle at this point) BEFORE the write to its Mailbox, so the message is getting send to the dead letter.
Sample code:
@Override
public void onReceive(Object msg) {
m_numProcessed++;
TestMessage test = (TestMessage) msg;
log.error("Received message from: " + sender().path().toString() + " " + ((TestMessage) msg).getMessageNumber());
ActorRef sender = sender();
if (sender.isTerminated()) {
log.error("Sender was terminated, so not sending the next message. " + sender.path());
} else {
int msgId = m_messageNumber++;
log.error("Sending message id: " + msgId);
sender().tell(new TestMessage(NUM_SECONDS, msgId), getSelf());
}
boolean blowup = true;
if (blowup && m_numProcessed % 20 == 0) {
throw new RuntimeException("Blowing up the Collector");
}
}
Full logs:
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Received message from: akka://Main/user/app/Supervisor/1394114329779_3 16
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Sending message id: 20
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Received message from: akka://Main/user/app/Supervisor/1394114329779_0 17
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Sending message id: 21
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Received message from: akka://Main/user/app/Supervisor/1394114329779_2 15
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Sending message id: 22
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Received message from: akka://Main/user/app/Supervisor/1394114329779_1 18
[ERROR] [03/06/2014 08:59:04.794] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor] Sending message id: 23
[ERROR] [03/06/2014 08:59:04.796] [Main-akka.actor.default-dispatcher-6] [akka://Main/user/app/Supervisor] Received message from: akka://Main/user/app/Supervisor/1394114329779_4 19
[ERROR] [03/06/2014 08:59:04.796] [Main-akka.actor.default-dispatcher-6] [akka://Main/user/app/Supervisor] Sending message id: 24
[ERROR] [03/06/2014 08:59:04.796] [Main-akka.actor.default-dispatcher-2] [akka://Main/user/app/Supervisor] Blowing up the Collector
java.lang.RuntimeException: Blowing up the Collector
at com.silverpop.rnd.collector_failure.CollectorFailure.onReceive(CollectorFailure.java:68)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[ERROR] [03/06/2014 08:59:04.796] [Main-akka.actor.default-dispatcher-2] [akka://Main/user/app/Supervisor] Collector: I got a preRestart
[ERROR] [03/06/2014 08:59:04.797] [Main-akka.actor.default-dispatcher-6] [akka://Main/user/app/Supervisor/1394114329779_4] akka://Main/user/app/Supervisor/1394114329779_4 I got a post-stop message? Why?
[INFO] [03/06/2014 08:59:04.801] [Main-akka.actor.default-dispatcher-2] [akka://Main/user/app/Supervisor/1394114329779_4] Message [com.silverpop.rnd.collector_failure.TestMessage] from Actor[akka://Main/user/app/Supervisor#-138733138] to Actor[akka://Main/user/app/Supervisor/1394114329779_4#1820908525] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[ERROR] [03/06/2014 08:59:09.798] [Main-akka.actor.default-dispatcher-9] [akka://Main/user/app/Supervisor/1394114329779_0] akka://Main/user/app/Supervisor/1394114329779_0 I got a post-stop message? Why?
[ERROR] [03/06/2014 08:59:09.798] [Main-akka.actor.default-dispatcher-10] [akka://Main/user/app/Supervisor/1394114329779_3] akka://Main/user/app/Supervisor/1394114329779_3 I got a post-stop message? Why?
looking at the timestamps, the other threads (akka://Main/user/app/Supervisor/1394114329779_0 for example) finish their 5 second sleep before getting the postShutdown message as you'd expect.
but akka://Main/user/app/Supervisor/1394114329779_4 gets the postShutdown message immediately and then the dead letter error.
Reading the documentation, shouldn't the inbox for that Worker still exist while Collector is doing it's restart logic? Or is the issue because the child Actor is destroyed and not doing its own 'restart' logic?
Java 1.7, Akka 2.3.0 running on a local machine from in IntelliJ.
Thanks,
Chris