Clustering - New Incarnation unable to re-join


Drew Goya

May 26, 2016, 6:56:22 AM
to Akka User List
I'm having some difficulty with Akka clustering and I'm hoping someone here has solved this problem before.

I'm running a 30-node cluster in AWS, and quite often when several nodes restart, the cluster state refuses to converge and I get logs that look like this:

New incarnation of existing member [Member(address = akka.tcp://Bidder...@172.16.29.115:2552, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.


Followed by:
 Leader can currently not perform its duties, reachability status: ...


I've tried all manner of tuning in the akka.remote and akka.cluster sections of application.conf but nothing really seems to help.

I'm currently running Akka 2.4.6 (I saw there was a fix related to this in 2.4.3).

Relevant snippet of my application.conf:

remote {
  log-remote-lifecycle-events = on
  netty.tcp {
    hostname = ${?LOCAL_IP}
    bind-hostname = 0.0.0.0
    port = ${?AKKA_PORT}
  }
  use-dispatcher = "remote-dispatcher"
  retry-gate-closed-for = 5 s
}
cluster {
  seed-nodes = [${?SEED_NODE1}, ${?SEED_NODE2}, ${?SEED_NODE3}]
  use-dispatcher = "cluster-dispatcher"
  roles = ["backend"]
  auto-down-unreachable-after = 10s
  allow-weakly-up-members = on
  role {
    # Minimum required number of members of a certain role before the leader
    # changes member status of 'Joining' members to 'Up'. Typically used together
    # with 'Cluster.registerOnMemberUp' to defer some action, such as starting
    # actors, until the cluster has reached a certain size.
    backend.min-nr-of-members = 20
  }
}

Akka Team

Jun 3, 2016, 5:09:27 AM
to Akka User List
Hi Drew,

When a cluster has nodes marked unreachable, new nodes cannot be marked as Up until
those nodes are either reachable again or have been downed. This is what the "leader can currently
not perform its duties, reachability status ..." message informs you about. As soon as the
cluster has reached a state with no unreachable nodes, the re-joining node should transition
to the Up state.
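
To make the rule above concrete, here is a rough toy model in Python. It is only a sketch of the behaviour described in this reply, not Akka's actual implementation: the Member class, the leader_step function and the node-a/node-b/node-c addresses are all made up for illustration, and real Akka membership has more states and more conditions than this.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical, simplified view of a cluster member (not Akka's Member class).
    address: str
    status: str          # one of "Joining", "Up", "Down" in this toy model
    reachable: bool = True


def blocking_members(members):
    """Return the unreachable members that have not been downed.

    In this simplified model these are the members that stop the leader
    from performing its duties.
    """
    return [m for m in members if not m.reachable and m.status != "Down"]


def leader_step(members):
    """Move Joining members to Up, but only if nothing blocks the leader.

    Returns True if the leader was able to act, False otherwise.
    """
    if blocking_members(members):
        return False
    for m in members:
        if m.status == "Joining":
            m.status = "Up"
    return True


# Hypothetical three-node cluster: node-b is unreachable, node-c wants to join.
cluster = [
    Member("node-a", "Up"),
    Member("node-b", "Up", reachable=False),
    Member("node-c", "Joining"),
]

first = leader_step(cluster)    # blocked: node-b is unreachable and not downed
cluster[1].status = "Down"      # node-b gets downed (manually or by auto-down)
second = leader_step(cluster)   # no longer blocked: node-c can be moved to Up
```

In this model the first leader_step call does nothing because node-b is still unreachable; once node-b is marked Down, the second call moves node-c to Up. The same would happen if node-b became reachable again instead of being downed.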


-- 
Johan
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam