Hi gang.
What is the process for a node to gracefully exit a cluster?
Nodes in our system are going through this sequence:
- JVM gets the shutdown signal
- node calls cluster.leave(cluster.selfAddress)
- node waits until it sees MemberRemoved with its own address
- node gives singletons a grace period to migrate
- actor system is shut down
- JVM exits
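To make the ordering concrete, here is a dependency-free Java sketch of that sequence. It is not our actual code: the class name GracefulLeave, the two injected Runnables (standing in for cluster.leave(cluster.selfAddress) and for shutting down the actor system), and the removal timeout are all assumptions made for illustration. The timeout is only there so the shutdown hook cannot block forever if MemberRemoved never arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Models the leave sequence only; the real Akka calls are injected as Runnables. */
final class GracefulLeave {
    private final CountDownLatch removed = new CountDownLatch(1);
    private final Runnable leaveCluster;    // stand-in for cluster.leave(cluster.selfAddress)
    private final Runnable shutdownSystem;  // stand-in for shutting down the actor system
    private final long removalTimeoutMillis; // assumption: upper bound on waiting for MemberRemoved
    private final long graceMillis;          // grace period for singletons to migrate

    GracefulLeave(Runnable leaveCluster, Runnable shutdownSystem,
                  long removalTimeoutMillis, long graceMillis) {
        this.leaveCluster = leaveCluster;
        this.shutdownSystem = shutdownSystem;
        this.removalTimeoutMillis = removalTimeoutMillis;
        this.graceMillis = graceMillis;
    }

    /** To be called by whatever observes MemberRemoved for this node's own address. */
    void onSelfRemoved() {
        removed.countDown();
    }

    /**
     * Runs leave, wait for removal, grace period, shutdown, in that order.
     * Returns true if the removal was observed before the timeout.
     */
    boolean run() {
        leaveCluster.run();
        boolean sawRemoval = false;
        try {
            sawRemoval = removed.await(removalTimeoutMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
            // Keep the interrupt flag and fall through so shutdown still happens.
            Thread.currentThread().interrupt();
        }
        shutdownSystem.run();
        return sawRemoval;
    }
}
```

Wired into a JVM shutdown hook this would be something like Runtime.getRuntime().addShutdownHook(new Thread(leave::run)), with onSelfRemoved() called from the actor that subscribes to cluster events. The JVM exits once the hook returns.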
This feels correct, but the docs are vague about when it is safe for the node to drop out.
Moreover, ClusterSingletonManager has a hard time with this flow. Especially in 1-node clusters, it tries to hand over to a non-existent peer, fails with ClusterSingletonManagerIsStuck, and then fails again when it is restarted, because the cluster service is no longer running ("Cluster node must not be terminated").
Is there a better way for nodes to leave the cluster?
Logs below.
INFO 12:19:41,356 com.kixeye.common.cluster.ClusterModule - member removed: leave completed!
INFO 12:19:41,362 com.kixeye.common.log.AkkaLogger - Cluster Node [akka.tcp://gh...@127.0.0.1:50570] - Shutting down...
INFO 12:19:41,371 com.kixeye.common.log.AkkaLogger - Cluster Node [akka.tcp://gh...@127.0.0.1:50570] - Successfully shut down
INFO 12:19:41,374 akka.contrib.pattern.ClusterSingletonManager - Exited [akka.tcp://gh...@127.0.0.1:50570]
INFO 12:19:41,376 akka.contrib.pattern.ClusterSingletonManager - Oldest observed OldestChanged: [akka.tcp://gh...@127.0.0.1:50570 -> None]
INFO 12:19:41,381 akka.contrib.pattern.ClusterSingletonManager - ClusterSingletonManager state change [Oldest -> WasOldest]
INFO 12:19:41,396 akka.actor.LocalActorRef - Message [akka.cluster.ClusterEvent$LeaderChanged] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,396 akka.actor.LocalActorRef - Message [akka.dispatch.sysmsg.Terminate] from Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524] to Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,397 akka.actor.LocalActorRef - Message [akka.cluster.ClusterEvent$RoleLeaderChanged] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,397 akka.actor.LocalActorRef - Message [akka.cluster.ClusterEvent$SeenChanged] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,398 akka.actor.LocalActorRef - Message [akka.cluster.InternalClusterAction$Unsubscribe] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not delivered. [5] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,398 akka.actor.LocalActorRef - Message [akka.cluster.InternalClusterAction$Unsubscribe] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not delivered. [6] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:42,395 akka.contrib.pattern.ClusterSingletonManager - Retry [1], sending TakeOverFromMe to [None]
INFO 12:19:43,415 akka.contrib.pattern.ClusterSingletonManager - Retry [2], sending TakeOverFromMe to [None]
INFO 12:19:44,435 akka.contrib.pattern.ClusterSingletonManager - Retry [3], sending TakeOverFromMe to [None]
INFO 12:19:45,455 akka.contrib.pattern.ClusterSingletonManager - Retry [4], sending TakeOverFromMe to [None]
INFO 12:19:46,475 akka.contrib.pattern.ClusterSingletonManager - Retry [5], sending TakeOverFromMe to [None]
ERROR 12:19:47,517 akka.actor.OneForOneStrategy - Expected hand-over to [None] never occured
akka.contrib.pattern.ClusterSingletonManagerIsStuck: Expected hand-over to [None] never occured
at akka.contrib.pattern.ClusterSingletonManager$$anonfun$10.applyOrElse(ClusterSingletonManager.scala:556) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.contrib.pattern.ClusterSingletonManager$$anonfun$10.applyOrElse(ClusterSingletonManager.scala:548) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) ~[scala-library.jar:?]
at akka.actor.FSM$class.processEvent(FSM.scala:603) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.contrib.pattern.ClusterSingletonManager.processEvent(ClusterSingletonManager.scala:336) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:597) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:569) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.Actor$class.aroundReceive(Actor.scala:465) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.contrib.pattern.ClusterSingletonManager.aroundReceive(ClusterSingletonManager.scala:336) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:491) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.invoke_aroundBody2(ActorCell.scala:462) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.invoke_aroundBody3$advice(ActorCell.scala:536) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.invoke(ActorCell.scala:1) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library.jar:?]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library.jar:?]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library.jar:?]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library.jar:?]
INFO 12:19:47,520 akka.actor.LocalActorRef - Message [akka.cluster.InternalClusterAction$Unsubscribe] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not delivered. [7] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:47,520 akka.actor.LocalActorRef - Message [akka.cluster.InternalClusterAction$Unsubscribe] from Actor[akka://ghost/deadLetters] to Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not delivered. [8] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:47,581 akka.actor.LocalActorRef - Message [akka.actor.PoisonPill$] from Actor[akka://ghost/user/$a/masterregion-DI_USA_1/queuemgr-DI_USA_1/$b#-1227772002] to Actor[akka://ghost/user/$a/masterregion-DI_USA_1/queuemgr-DI_USA_1/$b/$a#1401591931] was not delivered. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
INFO 12:19:47,611 akka.actor.LocalActorRef - Message [akka.actor.PoisonPill$] from Actor[akka://ghost/user/$a/masterregion-DI_USA_1/masterstats-DI_USA_1/$a#1216246889] to Actor[akka://ghost/user/$a/masterregion-DI_USA_1/masterstats-DI_USA_1/$a/$a#-146187624] was not delivered. [10] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
ERROR 12:19:47,616 akka.actor.OneForOneStrategy - requirement failed: Cluster node must not be terminated
akka.actor.PostRestartException: exception post restart (class akka.contrib.pattern.ClusterSingletonManagerIsStuck)
at akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:240) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:238) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:293) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:288) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) ~[scala-library.jar:?]
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) ~[scala-library.jar:?]
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) ~[scala-library.jar:?]
at akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:238) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:281) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:344) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:53) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:344) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:430) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.systemInvoke_aroundBody0(ActorCell.scala:453) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.systemInvoke_aroundBody1$advice(ActorCell.scala:477) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:1) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library.jar:?]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library.jar:?]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library.jar:?]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library.jar:?]
Caused by: java.lang.IllegalArgumentException: requirement failed: Cluster node must not be terminated
at scala.Predef$.require(Predef.scala:233) ~[scala-library.jar:?]
at akka.contrib.pattern.ClusterSingletonManager.preStart(ClusterSingletonManager.scala:389) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.Actor$class.postRestart(Actor.scala:547) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.contrib.pattern.ClusterSingletonManager.postRestart(ClusterSingletonManager.scala:336) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.Actor$class.aroundPostRestart(Actor.scala:485) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.contrib.pattern.ClusterSingletonManager.aroundPostRestart(ClusterSingletonManager.scala:336) ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
at akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:229) ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
... 15 more