Akka Cluster Singleton - Rejoining to existing cluster

Madabhattula Rajesh Kumar

Jun 26, 2016, 6:27:31 AM

to Akka User List

Hi Team,

I have two nodes (A and B) in an Akka cluster, and I am using Akka Cluster Singleton.

Initially, A is the oldest node and B is the younger node.

I manually killed node A with Ctrl+C. After some time I started node A again, but it did not rejoin the cluster; it started its own cluster instead.

Now I have two clusters, A and B.

I am getting the exceptions below.

How can I resolve this issue? Please help.

Exception 1:-

[WARN] [06/26/2016 00:28:07.842] [ClusterSystem-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSystem@127.0.0.1:2552/system/cluster/core/daemon] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://ClusterSystem@127.0.0.1:2551, status = Up)]
[INFO] [06/26/2016 00:28:17.851] [ClusterSystem-akka.actor.default-dispatcher-24] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Leader is auto-downing unreachable node [akka.tcp://ClusterSystem@127.0.0.1:2551]
[INFO] [06/26/2016 00:28:17.851] [ClusterSystem-akka.actor.default-dispatcher-19] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Marking unreachable node [akka.tcp://ClusterSystem@127.0.0.1:2551] as [Down]
[INFO] [06/26/2016 00:28:18.853] [ClusterSystem-akka.actor.default-dispatcher-15] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Leader is removing unreachable node [akka.tcp://ClusterSystem@127.0.0.1:2551]
[INFO] [06/26/2016 00:28:18.853] [ClusterSystem-akka.actor.default-dispatcher-24] [akka.tcp://ClusterSystem@127.0.0.1:2552/user/clusterSingleton] Member removed [akka.tcp://ClusterSystem@127.0.0.1:2551]
[WARN] [06/26/2016 00:28:19.007] [ClusterSystem-akka.remote.default-remote-dispatcher-6] [akka.tcp://ClusterSystem@127.0.0.1:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40127.0.0.1%3A2551-1] Association with remote system [akka.tcp://ClusterSystem@127.0.0.1:2551] has failed, address is now gated for [5000] ms. Reason: [Disassociated]

Exception 2:-

[WARN] [06/26/2016 03:48:34.827] [ClusterSystem-akka.remote.default-remote-dispatcher-19] [akka.tcp://ClusterSystem@127.0.0.1:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40127.0.0.1%3A2551-1] Association with remote system [akka.tcp://ClusterSystem@127.0.0.1:2551] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
[ERROR] [06/26/2016 03:48:34.832] [ClusterSystem-akka.remote.default-remote-dispatcher-20] [akka.remote.Remoting] Association to [akka.tcp://ClusterSystem@127.0.0.1:2551] with UID [1216631716] irrecoverably failed. Quarantining address.
java.lang.IllegalStateException: Error encountered while processing system message acknowledgement buffer: [0 {}] ack: ACK[1, {}]
    at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:299)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
    at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:198)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Highest SEQ so far was 0 but cumulative ACK is 1
    at akka.remote.AckedSendBuffer.acknowledge(AckedDelivery.scala:103)
    at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:295)
    ... 11 more

[WARN] [06/26/2016 03:48:34.833] [ClusterSystem-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSystem@127.0.0.1:2552/system/cluster/core/daemon] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Marking node as TERMINATED [akka.tcp://ClusterSystem@127.0.0.1:2551], due to quarantine. Node roles []
[INFO] [06/26/2016 03:48:34.835] [ClusterSystem-akka.actor.default-dispatcher-15] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Marking unreachable node [akka.tcp://ClusterSystem@127.0.0.1:2551] as [Down]
[INFO] [06/26/2016 03:48:35.182] [ClusterSystem-akka.remote.default-remote-dispatcher-20] [akka.remote.Remoting] Quarantined address [akka.tcp://ClusterSystem@127.0.0.1:2551] is still unreachable or has not been restarted. Keeping it quarantined.
[INFO] [06/26/2016 03:48:35.403] [ClusterSystem-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2552] - Leader is removing unreachable node [akka.tcp://ClusterSystem@127.0.0.1:2551]
(Member is Removed: {} after {},akka.tcp://ClusterSystem@127.0.0.1:2551,Down)
[INFO] [06/26/2016 03:48:35.408] [ClusterSystem-akka.actor.default-dispatcher-16] [akka.tcp://ClusterSystem@127.0.0.1:2552/user/clusterSingleton] Member removed [akka.tcp://ClusterSystem@127.0.0.1:2551]

Regards,
Rajesh

Guido Medina

Jun 26, 2016, 9:28:51 AM

to Akka User List

Hi Rajesh,

If you have only one seed node, then once it is restarted the other nodes won't try to rejoin; at least that's the behavior in Akka 2.4.7.

I see one exception that was probably fixed already, so you might want to use either the latest 2.4.x, which is 2.4.7, or the latest 2.3.x.

Try the latest release of whatever major version you are using, and also make both nodes seed nodes.

As a general rule, if you want any node to rejoin the cluster after a restart, there must always be at least one seed node alive; otherwise your cluster dies.

Edit: This is my knowledge of it, I can't be wrong, so the best way to confirm it is to read the cluster documentation.

HTH,

Guido.


Guido Medina

Jun 26, 2016, 9:30:05 AM

to Akka User List

Edit: I meant "I can be wrong", not "can't".

Madabhattula Rajesh Kumar

Jun 26, 2016, 10:03:12 AM

to Akka User List

Hi Guido,

I am using version 2.4.7. In my test, Node B stays active after I kill Node A.

When I kill Node A, Node B becomes the oldest. When I restart Node A, it starts its own cluster instead of rejoining the existing one.

Regards,
Rajesh

Guido Medina

Jun 26, 2016, 10:08:15 AM

to Akka User List

Like I said, make both nodes seed nodes. It seems to me you are restarting the ONLY seed node you have.
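
For illustration, a configuration along these lines would list both nodes as seed nodes. This is only a sketch, not the poster's actual configuration: the system name ClusterSystem, the host 127.0.0.1, and the ports 2551 and 2552 are taken from the logs above, the settings use the Akka 2.4 names for classic netty.tcp remoting, and everything else about the setup is assumed.

```hocon
# application.conf (sketch; identical on both nodes except for the port)
akka {
  actor.provider = "akka.cluster.ClusterActorRefProvider"

  remote.netty.tcp {
    hostname = "127.0.0.1"
    port = 2551   # assumed: node A listens on 2551, node B on 2552
  }

  cluster {
    # Same list, in the same order, on every node.
    seed-nodes = [
      "akka.tcp://ClusterSystem@127.0.0.1:2551",
      "akka.tcp://ClusterSystem@127.0.0.1:2552"
    ]
  }
}
```

With a single-entry list, the restarted node A is the first seed node and has no other seed node to contact, so it joins itself and forms a new cluster. With both entries present, a restarted node A should first try the other seed node (B) and join the cluster that is already running.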
