Akka Cluster - nodes disagree on cluster size


Anders Båtstrand

Apr 21, 2015, 6:20:30 AM
to akka...@googlegroups.com
Dear group

I have a cluster of three nodes. Node 1 is having some memory trouble (which might be related), with only 30-70 MB free.

Node 1 is reporting the cluster to be of three nodes, with node 2 as leader.

The other two nodes agree on the leader, but do not include node 1 in the cluster.

How is this possible? If nodes 2 and 3 do not see node 1, how can node 1 see them?

How can I debug this?

Best regards,

Anders Båtstrand

Anders Båtstrand

Apr 22, 2015, 3:26:38 AM
to akka...@googlegroups.com
After some investigation I found that all the memory was used by the internal buffer in ClusterSingletonProxy. I will solve that (I guess) by using a bounded mailbox instead of the default one.

I am still confused by the cluster size issue, though. The node was obviously busy sending messages to the cluster singleton, but could this cause the system-messages needed for detecting cluster problems to wait? Is there a priority for messages here?

And even if the system-messages had to wait, should not the internal clock detect that it had not heard from the other nodes, and act on that?

Regards,

Anders

Patrik Nordwall

Apr 22, 2015, 8:42:25 AM
to akka...@googlegroups.com
On Wed, Apr 22, 2015 at 9:26 AM, Anders Båtstrand <ande...@gmail.com> wrote:
After some investigation I found that all the memory was used by the internal buffer in ClusterSingletonProxy.

Great, that explains it.
 
I will solve that (I guess) by using a bounded mailbox instead of the default.

It uses Stash, so you should be able to configure an appropriate stash-capacity.
Note that we have a ticket for making this buffering optional: https://github.com/akka/akka/issues/15110
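For reference, a minimal config sketch of that idea. This assumes the proxy actor runs on the default mailbox; the exact config path and a sensible capacity depend on your setup:

```hocon
# Sketch only: bound the Stash used by the actor, assuming it is on the
# default mailbox. Adjust the path if a dedicated mailbox is configured.
akka.actor.default-mailbox {
  stash-capacity = 20000
}
```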
 

I am still confused by the cluster size issue, though. The node was obviously busy sending messages to the cluster singleton, but could this cause the system-messages needed for detecting cluster problems to wait? Is there a priority for messages here?

And even if the system-messages had to wait, should not the internal clock detect that it had not heard from the other nodes, and act on that?

Node 2 and 3 detected that 1 was unreachable because they did not receive heartbeat replies from it. Probably you are also using auto-down, since you said that node 1 was removed from 2 and 3.

Node 1 was completely swamped with trying to free memory, garbage collecting, so it was not able to do anything else, i.e. it was not even able to detect that 2 and 3 were unreachable.

By running with DEBUG log level you can see when the heartbeat messages are sent and received.
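For anyone following along, the setting in question is just the log level (a minimal sketch; the cluster extension logs heartbeat traffic at DEBUG):

```hocon
akka {
  loglevel = "DEBUG"
}
```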

/Patrik
 

Regards,

Anders

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Anders Båtstrand

Apr 24, 2015, 4:25:33 AM
to akka...@googlegroups.com
Thank you for your explanation!

I tried to solve the problem by implementing my own mailbox, which deletes old messages when full (that is my requirement).

As I understand you, this will not work, as the stashing does not use the mailbox?

But how can I change the stash-capacity behaviour so that I remove the oldest messages instead of throwing a StashOverflowException?

If the stashing does not use the mailbox, it is maybe not possible to plug in my own implementation?

Best regards,

Anders

Patrik Nordwall

Apr 26, 2015, 2:01:06 PM
to akka...@googlegroups.com
On Fri, Apr 24, 2015 at 10:25 AM, Anders Båtstrand <ande...@gmail.com> wrote:
Thank you for your explanation!

I tried to solve the problem by implementing my own mailbox, which deletes old messages when full (that is my requirement).

As I understand you, this will not work, as the stashing does not use the mailbox?

Stash requires a deque mailbox. 

But how can I change the stash-capacity behaviour so that I remove the oldest messages instead of throwing a StashOverflowException?

I think that is difficult. This guy should probably not use stash. I have added a comment to the ticket.

If you need a fix now, I would recommend that you create your own copy of the ClusterSingletonProxy that does what you want instead of fighting this with a custom mailbox.
 

If the stashing does not use the mailbox, it is maybe not possible to plug in my own implementation?

You can replace the mailbox, but it must provide the deque semantics. 
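A hedged sketch of what that wiring could look like. MyDroppingDequeMailbox is a hypothetical class of your own implementing Akka's MailboxType with deque semantics; it does not ship with Akka:

```hocon
# Hypothetical custom mailbox that drops the oldest messages when full.
# It must still provide the deque semantics that Stash requires.
my-dropping-deque-mailbox {
  mailbox-type = "no.kantega.mailbox.MyDroppingDequeMailbox"
}
```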

Regards,
Patrik

Anders Båtstrand

Jun 2, 2015, 9:15:45 AM
to akka...@googlegroups.com
I have now encountered the problem again: the cluster (3 nodes) suddenly has two leaders, and only one of the nodes reported all the other nodes to be part of the cluster.

While it might have been triggered by high CPU load, I am not sure why it did not self-heal. Should the gossip not converge?

When I checked the system, all applications were running fine, with almost no load.

What I don't understand is the following:

If one node reports another node to be up, how can it be possible that the other node reports the first node to be down (I am using auto-down)?

Best regards,

Anders

Akka Team

Jun 7, 2015, 6:29:19 AM
to Akka User List
Hi Anders,

On Tue, Jun 2, 2015 at 3:15 PM, Anders Båtstrand <ande...@gmail.com> wrote:
I have now encountered the problem again: the cluster (3 nodes) suddenly has two leaders, and only one of the nodes reported all the other nodes to be part of the cluster.

While it might have been triggered by high CPU load, I am not sure why it did not self-heal. Should the gossip not converge?

There are two things here. First, nodes just mark other nodes as UNREACHABLE. This is a fully recoverable operation. DOWNING means that the node has been removed and cannot come back until it has been restarted. When you say only one of the nodes reported the other nodes to be part of the cluster, did you mean that the other nodes have seen it as UNREACHABLE, or have they downed it?
 

When I checked the system, all applications were running fine, with almost no load.

What I don't understand is the following:

If one node reports another node to be up, how can it be possible that the other node reports the first node to be down (I am using auto-down)?

Hmm, this reminds me of an older ticket: https://github.com/akka/akka/issues/16624

Which version of Akka are you using? Does this happen with 2.3.11?

-Endre
 

Best regards,

Anders




--
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam

Anders Båtstrand

Jun 9, 2015, 4:09:22 AM
to akka...@googlegroups.com

I thought I was using 2.3.11, but this system was still on 2.3.10. I have upgraded, and will see if it happens again.

I meant the nodes were downed, yes.

Thank you for the bug pointer, that might be a way to trigger it!

Anders


Anders Båtstrand

Jun 23, 2015, 9:23:59 AM
to akka...@googlegroups.com
This is happening a lot for me again, now on 2.3.11.

I am running JMeter tests, so I am pretty sure load is the trigger. Also, when I deploy other applications on the machines, the cluster sometimes gets into this state.

Is there some debug logging I can turn on to investigate this? It seems the cluster gossip does not converge...

Anders


Patrik Nordwall

Jun 24, 2015, 8:16:46 AM
to akka...@googlegroups.com
You can run with akka.loglevel=DEBUG, but INFO level should also show pretty well what the cluster is doing (if you know what to look for).

I have a hard time understanding what you are seeing and what you expect. The first thing you must clarify is whether members are downed and removed, or "just" unreachable.

You said that nodes were downed. What do you expect after that and what do you see?


Anders Båtstrand

Jun 24, 2015, 8:32:06 AM
to akka...@googlegroups.com
Sorry about the confusion, I am probably using some terminology wrong. I will try again.

This problem is happening on all my clusters under load, using Akka 2.3.11.

I am using auto-down-unreachable-after, so nodes are downed (that is what I called removed) automatically.

When I start the cluster, all three nodes have all three members in the set cluster.readView().members().

Under load, some members go into the set unreachableMembers(), and back again. In some cases they are unreachable for too long and are downed. Then they are no longer part of the set members(), and no longer part of the set unreachableMembers(). That is why I called it "removed".

After running the tests for a couple of hours, I see that TWO of the nodes have only each other in members(), but the third node has all three in members(). The third node in some cases also has a different leader.

On all the nodes, the set unreachableMembers() is empty.

What I don't understand is how the third node can have all three nodes in the members() set while the other nodes do not have it in theirs. This is a stable state; I have to restart the third node to fix it. I would expect that if one node sees another (has it in members()), that goes both ways.

Hope this was clearer! I am trying to reproduce this in a more controlled example, but I have not managed it so far. For now, our planned temporary workaround is to run clusters of size one... :-(

Anders

Patrik Nordwall

Jun 24, 2015, 10:46:28 AM
to akka...@googlegroups.com
On Wed, Jun 24, 2015 at 2:32 PM, Anders Båtstrand <ande...@gmail.com> wrote:
Sorry about the confusion, I am probably using some terminology wrong. I will try again.

This problem is happening on all my clusters under load, using Akka 2.3.11.

I am using auto-down-unreachable-after, so nodes are downed (that is what I called removed) automatically.

When I start the cluster, all three nodes have all three members in the set cluster.readView().members().

Under load, some members go into the set unreachableMembers(), and back again. In some cases they are unreachable for too long and are downed. Then they are no longer part of the set members(), and no longer part of the set unreachableMembers(). That is why I called it "removed".

Thanks. Now I got it, and you are using the right terminology.
 

After running the tests for a couple of hours, I see that TWO of the nodes have only each other in members(), but the third node has all three in members(). The third node in some cases also has a different leader.

On all the nodes, the set unreachableMembers() is empty.

What I don't understand is how the third node can have all three nodes in the members() set while the other nodes do not have it in theirs. This is a stable state; I have to restart the third node to fix it. I would expect that if one node sees another (has it in members()), that goes both ways.

Is the third node still operational? No OutOfMemoryErrors there?

We have this issue https://github.com/akka/akka/issues/17479 with the cluster.readView (cluster.state is the public api, by the way).

We should look at cluster logs (INFO level is enough), especially from the third node. Grep for log messages with "Cluster Node ".
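As a concrete example of that grep (the log file name here is a placeholder, not from the thread):

```shell
# Filter the cluster lifecycle lines; "application.log" is a hypothetical name.
grep "Cluster Node" application.log
```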
 

Hope this was clearer! I am trying to reproduce this in a more controlled example, but I have not managed it so far. For now, our planned temporary workaround is to run clusters of size one... :-(

Anders


Anders Båtstrand

Jun 24, 2015, 10:57:09 AM
to akka...@googlegroups.com
No OutOfMemory, the third node is running fine. Except it can be the leader, and in that case I have two leaders...

I think I have reproduced it in the following program (let me know if you want the complete maven setup or similar):

application.conf:

akka {
  actor.provider = "akka.cluster.ClusterActorRefProvider"
  remote.netty.tcp.hostname = "127.0.0.1"

  cluster {
    seed-nodes = [
      "akka.tcp://sys...@127.0.0.1:2551",
      "akka.tcp://sys...@127.0.0.1:2552"
    ]

    auto-down-unreachable-after = 10ms

    failure-detector {
      heartbeat-interval = 10 ms
      threshold = 2.0
      acceptable-heartbeat-pause = 10 ms
      expected-response-after = 5 ms
    }
  }
}


And the test:

package no.kantega.workshop.akka;

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Address;
import akka.cluster.Cluster;
import akka.cluster.Member;
import akka.contrib.pattern.DistributedPubSubExtension;
import akka.contrib.pattern.DistributedPubSubMediator;
import com.typesafe.config.ConfigFactory;
import org.junit.Test;
import scala.collection.Iterator;
import scala.collection.immutable.SortedSet;

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

import static com.jayway.awaitility.Awaitility.await;
import static com.typesafe.config.ConfigValueFactory.fromAnyRef;
import static org.assertj.core.api.Assertions.assertThat;

public class ClusterConvergenceTest {

    @Test
    public void test_many_times() throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            cluster_membership_is_symmetric();
        }
    }

    @Test
    public void cluster_membership_is_symmetric() throws InterruptedException {

        ActorSystem actorSystem1 = ActorSystem.apply("system", ConfigFactory.load()
                .withValue("akka.remote.netty.tcp.port", fromAnyRef("2551")));

        ActorSystem actorSystem2 = ActorSystem.apply("system", ConfigFactory.load()
                .withValue("akka.remote.netty.tcp.port", fromAnyRef("2552")));

        ActorSystem actorSystem3 = ActorSystem.apply("system", ConfigFactory.load()
                .withValue("akka.remote.netty.tcp.port", fromAnyRef("2553")));

        try {

            Cluster cluster1 = Cluster.get(actorSystem1);
            Cluster cluster2 = Cluster.get(actorSystem2);
            Cluster cluster3 = Cluster.get(actorSystem3);

            // Wait until all members can see all the others:
            await().atMost(20, TimeUnit.SECONDS).until(() -> cluster1.state().members().size() == 3);
            await().atMost(20, TimeUnit.SECONDS).until(() -> cluster2.state().members().size() == 3);
            await().atMost(20, TimeUnit.SECONDS).until(() -> cluster3.state().members().size() == 3);

            System.out.println("Generate some load (we should see cluster events in the console log)...");

            for (ActorSystem system : new ActorSystem[]{actorSystem1, actorSystem2, actorSystem3}) {
                ActorRef mediator = DistributedPubSubExtension.get(system).mediator();
                for (int i = 0; i < 1_000_000; i++) {
                    String event = "Message number " + i;
                    mediator.tell(new DistributedPubSubMediator.Publish("dummy stream", event), ActorRef.noSender());
                }
            }

            System.out.println("Wait for things to settle down (cluster events should stop)...");

            // Ideally I would have a way to know when all messages were processed...
            Thread.sleep(30_000L);

            System.out.println("Check that cluster membership is symmetric...");

            membershipIsSymmetric(cluster1, cluster2);
            membershipIsSymmetric(cluster2, cluster3);
            membershipIsSymmetric(cluster1, cluster3);

        } finally {
            actorSystem1.shutdown();
            actorSystem2.shutdown();
            actorSystem3.shutdown();
            actorSystem1.awaitTermination();
            actorSystem2.awaitTermination();
            actorSystem3.awaitTermination();
        }
    }

    /**
     * Here we check that cluster1 has cluster2 as a member iff cluster2 has cluster1 as a member.
     */
    private static void membershipIsSymmetric(Cluster cluster1, Cluster cluster2) {
        if (addresses(cluster1.state().members()).contains(cluster2.selfAddress())) {
            // node 1 sees node 2, check the opposite way
            assertThat(addresses(cluster2.state().members())).contains(cluster1.selfAddress());
        } else {
            // node 1 does not see node 2, check the opposite way
            assertThat(addresses(cluster2.state().members())).doesNotContain(cluster1.selfAddress());
        }
    }

    private static Set<Address> addresses(SortedSet<Member> members) {
        Set<Address> set = new HashSet<>();
        Iterator<Member> iterator = members.iterator();
        while (iterator.hasNext()) {
            set.add(iterator.next().address());
        }
        return set;
    }
}

You might have to change some values on your system in case it is faster than mine (or slower, but I doubt that).

Anders

Patrik Nordwall

Jun 24, 2015, 11:04:36 AM
to akka...@googlegroups.com
On Wed, Jun 24, 2015 at 4:57 PM, Anders Båtstrand <ande...@gmail.com> wrote:
No OutOfMemory, the third node is running fine. Except it can be the leader, and in that case I have two leaders...

What are you using the leader for? There is no guarantee that there will not be more than one leader.
For that you have to use the cluster singleton, or cluster sharding.
 

I think I have reproduced it in the following program (let me know if you want the complete maven setup or similar):

Thanks, I will try that.
 


Anders Båtstrand

Jun 24, 2015, 11:20:30 AM
to akka...@googlegroups.com
I am using the cluster singleton; my mistake. I somehow believed the leader always had the singleton...

Anyway, it might be that https://github.com/akka/akka/issues/17479 is related. I am not downing any node manually, however, and a node will never down itself, right? In any case, this bug gave me some pointers, and I will investigate further.

I avoid split-brain by calling System.exit on any system that has less than half the cluster size in its member list.
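A minimal sketch of that rule as a pure function (the class and method names are mine, not from the thread; I model it as a strict-majority check, which is slightly stricter than "less than half" and also resolves an even split):

```java
// Strict-majority sketch of the split-brain guard described above.
// visibleMembers: size of cluster.state().members() as seen locally;
// configuredSize: the full, configured cluster size (3 in this thread).
final class QuorumCheck {
    static boolean hasMajority(int visibleMembers, int configuredSize) {
        // Survive only when we can see more than half of the configured nodes.
        return visibleMembers * 2 > configuredSize;
    }

    public static void main(String[] args) {
        System.out.println(hasMajority(2, 3)); // the 2-node side of a 3-node cluster stays up
        System.out.println(hasMajority(1, 3)); // an isolated node would call System.exit
    }
}
```

In practice this would be evaluated from a cluster membership event listener, and the node would call System.exit when it returns false.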

Anders
...

Anders Båtstrand

Jun 24, 2015, 12:11:15 PM
to akka...@googlegroups.com
I have attached my logs showing the problem.

I now think that the problem is the same as the bug you mention. In the logs I can read the following:

2015-06-24 17:51:54,693 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-3 - Cluster Node [akka.tcp://my-system@machine2:15552] - Successfully shut down
2015-06-24 17:51:54,677 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-17 - Cluster Node [akka.tcp://my-system@machine2:15552] - Shutting down...
2015-06-24 17:51:54,603 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-20 - Cluster Node [akka.tcp://my-system@machine1:15552] - Marking unreachable node [akka.tcp://my-system@machine2:15552] as [Down]
2015-06-24 17:51:54,603 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-4 - Cluster Node [akka.tcp://my-system@machine1:15552] - Leader is auto-downing unreachable node [akka.tcp://my-system@machine2:15552]

The problem is that I am still finding three members in the cluster.state.members set on machine2, and all of them have status 'up'. And this is (as far as I can tell) exactly what the bug is about.

It seems the other problems I thought were related were caused by me not shutting down the system when this happens.

I will hook into the shutdown process of the cluster and call System.exit from there. That way the node will be restarted and re-join the cluster.

Anders
...
cluster.log

Patrik Nordwall

Jun 25, 2015, 8:19:12 AM
to akka...@googlegroups.com
Glad to see that you are on track. In 2.4 you might find registerOnMemberRemoved useful: http://doc.akka.io/docs/akka/snapshot/scala/cluster-usage.html

Cheers,
Patrik
