--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
Hi,

So the way the cluster currently works is that the unreachable node has to be removed (by doing a down on it) before a system with the same address/port is allowed to join the cluster. If you set auto-down to a low value and wait to restart the "crashed" node until you see the master mark it as DOWN, does it work then?

The thing that seems weird in your log is that 127.0.0.1:2552 suddenly marks the node as reachable again instead of just downing it. If the old node had been downed and removed correctly, then the new one with the same address/port should be allowed to connect. There might be an issue with the failure detector and a mismatch between addresses and unique addresses (address:port:uid). Would it be possible for you to package up a minimal project that we can use to reproduce this?
In my three-node cluster (Akka 2.3.6, Scala 2.10.4) with the config below:

cluster {
  seed-nodes = [
    "akka.tcp://a...@127.0.0.1:2552" // using one of the three as seed node
  ]
  auto-down-unreachable-after = 120s
}

when I press `Ctrl+C` on one of my nodes to simulate a crash/termination, I see:

Remoting - Tried to associate with unreachable remote address [akka.tcp://a...@127.0.0.1:2553]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:2553

But when I restart the process, its join is ignored and the nodes cannot interoperate, and I keep seeing the following message:
Cluster Node [akka.tcp://a...@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
13:36:18.964UTC INFO [adp-akka.actor.default-dispatcher-2] Cluster(akka://adp) - Cluster Node [akka.tcp://a...@127.0.0.1:2552] - Marking node(s) as REACHABLE [Member(address = akka.tcp://a...@127.0.0.1:2553, status = Up)]
Cluster Node [akka.tcp://a...@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
Cluster Node [akka.tcp://a...@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
Cluster Node [akka.tcp://a...@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
...

I'd expect the cluster to reconnect after one of my nodes restarts :(

When I decrease "auto-down-unreachable-after", my crashed node is marked down on my seed node, so it is quarantined and won't be able to rejoin after startup until both nodes restart.

I'm not sure what the correct pattern is for per-node restarts in a clustered deployment?
On 5 November 2014 at 00:22:55, richard (harold.ric...@gmail.com) wrote:
I am seeing something similar with this GitHub code, based on akka-datareplication, using Akka 2.3.6 (that might be a little too complex for a ticket). Note that auto-down-unreachable-after is commented out.

I started two instances, one on 2551 (the seed) and another on 1234. I entered text into each instance, and it was correctly replicated to the other. I killed and restarted the 1234 instance. The new 1234 instance receives the current state (from 2551) and continues to replicate in both directions!

The log on 2551 does indicate a problem:

[INFO] [11/04/2014 17:20:07.309] [ClusterSystem-akka.actor.default-dispatcher-20] [Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@localhost:2551] - Existing member [UniqueAddress(akka.tcp://ClusterSystem@localhost:1234,1772853420)] is trying to join, ignoring
[INFO] [11/04/2014 17:20:17.319] [ClusterSystem-akka.actor.default-dispatcher-17] [Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@localhost:2551] - Existing member [UniqueAddress(akka.tcp://ClusterSystem@localhost:1234,1772853420)] is trying to join, ignoring
[INFO] [11/04/2014 17:20:28.310] [ClusterSystem-akka.actor.default-dispatcher-3] [Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem@localhost:2551] - Existing member [UniqueAddress(akka.tcp://ClusterSystem@localhost:1234,1772853420)] is trying to join, ignoring
Hi Richard,

On 5 November 2014 at 00:22:55, richard (harold.ric...@gmail.com) wrote:
> Note that auto-down-unreachable-after is commented out

If the old node is never downed and removed from the cluster, then the new node can never join.
On 5 November 2014 at 09:53:00, Behrad (beh...@gmail.com) wrote:
2014-11-05 11:59 GMT+03:30 Björn Antonsson <bjorn.a...@typesafe.com>:
> If the old node is never downed and removed from the cluster, then the new node can never join.

Does this mean we should always set auto-down to a small value so that we can recover from (and reconnect after) cluster node crashes? What is the "unreachable" -> "reachable" state change for, then? I'd expect a node that went into the unreachable state to become reachable again when it is back up within the failure detection threshold. It also isn't happening for me, in either case.
Whether you want nodes to be downed automatically is a different issue from reachability. The reachable/unreachable states are for a node instance that experiences connection failures (network outages etc.) but is not restarted, while downing is necessary when a new node with the same address/port as the old one is joining (in effect, a restarted actor system).
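To make the reachability-versus-downing distinction concrete, here is a toy Scala model. It is hypothetical and is not Akka's implementation: `UniqueAddress`, `Membership`, `markUnreachable`, `down` and `removeDowned` are illustrative names only (Akka has its own `UniqueAddress` class, which this does not reproduce). It assumes, as described in this thread, that a join is checked by host:port without comparing the uid, so a restarted node (same host:port, new uid) is ignored until the old incarnation has been downed and removed, while reachability only changes for the same incarnation.

```scala
// Toy model (hypothetical, NOT Akka's code) of the join rule discussed in this thread.
// A node is identified by host:port plus a uid that changes on every ActorSystem restart,
// but a join is rejected while ANY member with the same host:port is still a member.

final case class UniqueAddress(hostPort: String, uid: Long)

sealed trait MemberStatus
case object Up extends MemberStatus
case object Down extends MemberStatus

final case class Member(node: UniqueAddress, status: MemberStatus, reachable: Boolean)

final case class Membership(members: Map[String, Member]) {

  // Failure detector loses contact: same incarnation, only reachability changes.
  def markUnreachable(hostPort: String): Membership =
    update(hostPort)(_.copy(reachable = false))

  // Heartbeats resume from the SAME incarnation (e.g. a network outage heals).
  def markReachable(hostPort: String): Membership =
    update(hostPort)(_.copy(reachable = true))

  // Manual down, or auto-down-unreachable-after firing.
  def down(hostPort: String): Membership =
    update(hostPort)(_.copy(status = Down))

  // Downed members are dropped from the membership (done by the leader in a real cluster).
  def removeDowned: Membership =
    Membership(members.filter { case (_, m) => m.status != Down })

  // Join check by host:port only; the uid of the joining node is never compared.
  def join(node: UniqueAddress): Either[String, Membership] =
    if (members.contains(node.hostPort))
      Left(s"Existing member [$node] is trying to join, ignoring")
    else
      Right(Membership(members + (node.hostPort -> Member(node, Up, reachable = true))))

  // Applies f to the member at hostPort, if present; otherwise leaves membership unchanged.
  private def update(hostPort: String)(f: Member => Member): Membership =
    members.get(hostPort) match {
      case Some(m) => Membership(members.updated(hostPort, f(m)))
      case None    => this
    }
}
```

In this model, a restarted node on 127.0.0.1:2553 with a new uid gets a `Left` from `join` whether the old member is reachable, unreachable, or merely marked `Down`; it gets a `Right` only after `down` followed by `removeDowned`. That mirrors why auto-down (or a manual down) is needed for restarts, while reachable/unreachable only covers the same incarnation losing and regaining connectivity.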
B/
Patrik Nordwall
Typesafe - Reactive apps on the JVM
Twitter: @patriknw
The funny and frustrating thing is that when I tested my code today, it was working! (As I said in my previous post, it also worked at the start, but lately I couldn't get it working.) I'm confused, since I haven't changed anything related to this :( So, am I missing some change of mine, or could it depend on:

1) bad termination of previous sbt runs during development? (But then how could the remoting port be opened if it hadn't been released?)
2) something in the network/configuration that leads to this misbehavior?

My concerns are two-fold:

1) I'm really eager to reproduce it, and will push a test case if I find one.
2) There are still unclear points for me in the Akka clustering philosophy: today I saw node B rejoin my seed node A after B restarted, but B did not become aware of seed node A's restart! Why is that happening?

Here is the config of both nodes:

remote {
  log-remote-lifecycle-events = off
  netty.tcp {
    hostname = "127.0.0.1"
    port = 2552
  }
  transport-failure-detector {
    heartbeat-interval = 30s
    acceptable-heartbeat-pause = 35s
  }
}
cluster {
  seed-nodes = ["akka.tcp://a...@127.0.0.1:2552"]
  auto-down-unreachable-after = 10s
}

P.S. Can we continue this topic on the GitHub issue page? It feels more comfortable for me there :)