Akka cluster keeps storing wrong membership info on EC2/Docker

Unknown Unknown

Feb 22, 2018, 6:11:20 AM
to Akka User List

Hi All.


There are two hosts on EC2: H1 and H2. The sbt project has three modules: master, client and worker. Each module implements an Akka cluster node that subscribes to cluster events and logs them. Each node also logs the cluster state every minute (for debugging). The cluster nodes use the following ports:

master: 2551
worker: 3000
client: 5000
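A node of the kind described above (subscribe to cluster events, log them, and dump the cluster state once a minute) might look roughly like the sketch below. It assumes the classic Akka 2.5 cluster API (the akka.tcp addresses in the logs suggest classic remoting) and an application.conf that sets akka.actor.provider = cluster, the node's port and the seed nodes, none of which is shown here. The names ClusterStateLogger, LogState and Main are hypothetical and are not taken from the project on GitHub.

```scala
import akka.actor.{Actor, ActorLogging, ActorSystem, Cancellable, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._
import scala.concurrent.duration._

object ClusterStateLogger {
  // Hypothetical self-message that triggers a periodic dump of the cluster state.
  case object LogState
}

// Hypothetical listener; the real project code may differ.
class ClusterStateLogger extends Actor with ActorLogging {
  import ClusterStateLogger.LogState
  import context.dispatcher // ExecutionContext required by the scheduler

  private val cluster = Cluster(context.system)

  // Send LogState to ourselves every minute (classic Akka 2.5 scheduler API).
  private val ticker: Cancellable =
    context.system.scheduler.schedule(1.minute, 1.minute, self, LogState)

  override def preStart(): Unit =
    // Replay the current state as events, then receive membership and reachability changes.
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[ReachabilityEvent])

  override def postStop(): Unit = {
    ticker.cancel()
    cluster.unsubscribe(self)
  }

  def receive: Receive = {
    case LogState =>
      val state = cluster.state
      state.members.foreach { m =>
        log.info("Member(address = {}, status = {})", m.address, m.status)
      }
      // Reachability is tracked separately from membership status,
      // so unreachable members are logged on their own.
      state.unreachable.foreach(m => log.info("Unreachable: {}", m.address))
    case event: ClusterDomainEvent =>
      log.info("Cluster event: {}", event)
  }
}

object Main extends App {
  // Assumes the cluster settings (provider, port, seed nodes) come from application.conf.
  val system = ActorSystem("ClusterSystem")
  system.actorOf(Props[ClusterStateLogger], "clusterStateLogger")
}
```

Because membership status and reachability are separate in Akka cluster, logging cluster.state.unreachable next to the member list shows whether a member that is still listed as Up has actually been detected as unreachable.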

The project is available on GitHub.

More details about the infrastructure are in my previous question.

Any module can be redeployed to either H1 or H2 at random.

The Akka cluster behaves strangely when one of the nodes (for example, the worker) is redeployed. The following steps show the deployment history:

Initial state: the worker is deployed on H1, and the master and client are deployed on H2.

----[state-of-deploying-0]---  
H1 = [worker]
H2 = [master, client]

cluster status:    // cluster works correctly
  Member(address = akka.tcp://ClusterSystem@H1:3000, status = Up)
  Member(address = akka.tcp://ClusterSystem@H2:2551, status = Up)
  Member(address = akka.tcp://ClusterSystem@H2:5000, status = Up)
----------------

Then the worker module was redeployed to host H2:

----[state-of-deploying-1]---  
H1 = [-]
H2 = [master, client, worker (Redeployed)]

cluster status:    // WRONG cluster state!
  Member(address = akka.tcp://ClusterSystem@H1:3000, status = Up) // ???
  Member(address = akka.tcp://ClusterSystem@H2:2551, status = Up)
  Member(address = akka.tcp://ClusterSystem@H2:3000, status = WeaklyUp)
  Member(address = akka.tcp://ClusterSystem@H2:5000, status = Up)
----------------

This situation happens occasionally. When it does, the cluster keeps a wrong membership state and never repairs it:

Member(address = akka.tcp://ClusterSystem@H1:3000, status = Up) // ???

Host H1 no longer runs any worker instance, and telnet H1 3000 fails with "connection refused". So why does the Akka cluster keep storing this stale entry?


---

This question is also posted on Stack Overflow: https://stackoverflow.com/questions/48924863
