In Akka Cluster, why can a node whose state is Down be marked reachable?


lisheng...@gmail.com

May 31, 2016, 4:53:48 AM
to Akka User List, 2817...@qq.com
The log is as follows:
2016-05-31 07:40:54,053 | WARN  | lt-dispatcher-16 | ClusterCoreDaemon                | 167 - com.typesafe.akka.slf4j - 2.3.10 | Cluster Node [akka.tcp://opendaylight...@192.168.23.240:2550] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://opendaylight...@192.168.23.102:2550, status = Up)]
2016-05-31 07:41:08,785 | INFO  | lt-dispatcher-14 | kka://opendaylight-cluster-data) | 167 - com.typesafe.akka.slf4j - 2.3.10 | Cluster Node [akka.tcp://opendaylight...@192.168.23.240:2550] - Leader is auto-downing unreachable node [akka.tcp://opendaylight...@192.168.23.102:2550]
2016-05-31 07:41:11,267 | INFO  | lt-dispatcher-14 | kka://opendaylight-cluster-data) | 167 - com.typesafe.akka.slf4j - 2.3.10 | Cluster Node [akka.tcp://opendaylight...@192.168.23.240:2550] - Marking unreachable node [akka.tcp://opendaylight...@192.168.23.102:2550] as [Down]
2016-05-31 07:41:12,243 | INFO  | lt-dispatcher-14 | kka://opendaylight-cluster-data) | 167 - com.typesafe.akka.slf4j - 2.3.10 | Cluster Node [akka.tcp://opendaylight...@192.168.23.240:2550] - Marking node(s) as REACHABLE [Member(address = akka.tcp://opendaylight...@192.168.23.102:2550, status = Down)]
2016-05-31 07:41:12,243 | INFO  | lt-dispatcher-14 | kka://opendaylight-cluster-data) | 167 - com.typesafe.akka.slf4j - 2.3.10 | Cluster Node [akka.tcp://opendaylight...@192.168.23.240:2550] - Marking node(s) as REACHABLE [Member(address = akka.tcp://opendaylight...@192.168.23.102:2550, status = Down)]

After that, there is no log entry showing the leader removing node 192.168.23.102:2550.

The Akka version is 2.3.10.

Thanks

Mark Hatton

Jun 2, 2016, 9:09:16 AM
to Akka User List, 2817...@qq.com
The log output looks sensible to me given that you are using auto-downing.  I read it as follows:
  1. Node 102 is genuinely not reachable from node 240.  This may be due to a network partition, or too much GC, IO, CPU, etc.
  2. Node 240's failure detector fails to receive sufficient heartbeats from 102, marks it as unreachable and then auto-downs it
  3. Node 102 recovers (e.g. the network partition resolves itself)
  4. Node 240 detects 102 as reachable again, but since it is marked Down it is unable to rejoin the cluster
In this scenario, if you had disabled auto-downing or configured it to be less aggressive, node 102 could have successfully rejoined.
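
For illustration, here is a sketch of what that could look like, assuming auto-downing is enabled through the `akka.cluster.auto-down-unreachable-after` setting. Where your deployment sets it is not visible from the log, and the 120s value is only a placeholder, not a recommendation:

```hocon
akka.cluster {
  # "off" disables automatic downing; unreachable members then have to be
  # downed manually
  auto-down-unreachable-after = off

  # Alternatively, keep auto-downing but wait longer before the leader
  # downs an unreachable member (illustrative value only)
  # auto-down-unreachable-after = 120s
}
```

A longer timeout gives a node that is only briefly unreachable a chance to be seen as reachable again before the leader marks it Down.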

Relevant quote from the docs:

"unreachable is not a real member states but more of a flag in addition to the state signaling that the cluster is unable to talk to this node"

Hope that helps

Mark



weed

Jun 2, 2016, 10:39:25 AM
to Mark Hatton, Akka User List
Thanks, Mark. But why can't the leader remove the downed node in the end? In my case, this node first marked node 102 as unreachable and, a few seconds later, marked another node, 100, as unreachable. Then this node downed node 102, after which node 102 became reachable again. Node 100 also became Down; the leader removed node 100 but could not remove node 102.
------------------ Original Message ------------------
From: "Mark Hatton"<mark....@shazam.com>
Sent: Thursday, June 2, 2016, 9:05 PM
To: "Akka User List"<akka...@googlegroups.com>;
Cc: "281725287"<2817...@qq.com>;
Subject: Re: In akka cluster, why a node which state is down can be makes reachable

2817...@qq.com

Jun 3, 2016, 2:26:40 AM
to Akka User List, 2817...@qq.com
But why can a node in the Down state become reachable? And why can't the leader remove this node?


On Thursday, June 2, 2016 at 9:09:16 PM UTC+8, Mark Hatton wrote:

Patrik Nordwall

Jun 3, 2016, 5:40:01 AM
to Akka User List, 2817...@qq.com
Unreachable/reachable is orthogonal to the member status Up/Down/...

It just reflects what the failure detectors think about the node. It can flip back and forth independently of the member status.
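
To make that concrete, here is a toy model in Java of how the two pieces of information can be tracked separately. The names (`ReachabilitySketch`, `MemberView`, `Status`, `replayLog`) are made up for this sketch and are not Akka's classes or API. It only replays the order of events from the log excerpt in the first message:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical types, not Akka's implementation.
final class ReachabilitySketch {

    // Member lifecycle status, changed by membership decisions.
    enum Status { JOINING, UP, LEAVING, EXITING, DOWN, REMOVED }

    // One observer's view of another node: status and reachability
    // are stored in separate, independent fields.
    static final class MemberView {
        Status status = Status.UP;
        boolean reachable = true;
        final List<String> history = new ArrayList<>();

        // Failure detector verdict; never touches the status.
        void setReachable(boolean value) {
            reachable = value;
            record();
        }

        // Membership decision (e.g. auto-down); never touches reachability.
        void setStatus(Status value) {
            status = value;
            record();
        }

        private void record() {
            history.add(status + "/" + (reachable ? "reachable" : "unreachable"));
        }
    }

    // Replays the order of events from the log excerpt for node 102.
    static MemberView replayLog() {
        MemberView node102 = new MemberView();
        node102.setReachable(false);     // 07:40:54 marked UNREACHABLE, status = Up
        node102.setStatus(Status.DOWN);  // 07:41:11 unreachable node marked as [Down]
        node102.setReachable(true);      // 07:41:12 marked REACHABLE, status = Down
        return node102;
    }

    public static void main(String[] args) {
        // Print each intermediate state as "STATUS/reachability".
        for (String step : replayLog().history) {
            System.out.println(step);
        }
    }
}
```

In this sketch the node ends up Down and reachable at the same time, which is the combination shown in the last two log lines.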

/Patrik

