After some investigations I found that the memory was all used by the internal buffer in ClusterSingletonProxy.
I will solve that (I guess) by using a bounded mailbox instead of the default.
I am still confused by the cluster size issue, though. The node was obviously busy sending messages to the cluster singleton, but could this cause the system-messages needed for detecting cluster problems to wait? Is there a priority for messages here?
And even if the system-messages had to wait, should not the internal clock detect that it had not heard from the other nodes, and act on that?
Regards,
Anders--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
Patrik Nordwall
Typesafe - Reactive apps on the JVM
Twitter: @patriknw
Thank you for your explanation!
I tried to solve the problem by implementing my own mailbox, that
deletes old messages when full (that is my requierement).
As I understand you, this will not work, as the stashing is not using
the mailbox?
But how can I change the stash-capacity behaviour, so that I remove
the oldest messages, instead of throwing the StashOverflowException?
If the stashing is not using the mailbox, it is mayby not possible to
plug in my owm implementation?
I now encountered the problem again: The cluster (3 nodes) suddenly has two leaders, and only one of the nodes reported all the other nodes to be part of the cluster.
While it might have been triggered by high CPU, I am not sure why it did not self-heal. Should not the gossip converge?
When I checked the system, all applications were running fine, with almost no load.
What I don't understand is the following:
If one node reports another node to be up, how can it be possible that the other node reports the first node to be down (I am using auto-down)?
Best regards,
Anders
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
I thought I was using 2.3.11, but this system was still on 2.3.10. I have upgraded, and will see if it happens again.
I meant the nodes were downed, yes.
Thank you for the bug pointer, that might be a way to trigger it!
Anders
You received this message because you are subscribed to a topic in the Google Groups "Akka User List" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/7lZ_0Ukdeyo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to akka-user+...@googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscribe@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
--
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to a topic in the Google Groups "Akka User List" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/7lZ_0Ukdeyo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to akka-user+unsubscribe@googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
--
Akka Team
Typesafe - Reactive apps on the JVM
Blog: letitcrash.com
Twitter: @akkateam
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to a topic in the Google Groups "Akka User List" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/7lZ_0Ukdeyo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
Patrik Nordwall
Typesafe - Reactive apps on the JVM
Twitter: @patriknw
Sorry about the confusion, I am probably using some terminology wrong. I will try again.
This problem is happening on all my clusters under load, using Akka 2.3.11.
I am using auto-down-after-unreachable, so nodes are downed (that is what I called removed) automatically.
When I start the cluster, all three members has all three members in the set cluster.readView().members().
Under load, some members go to the set unreachableMembers(), and back again. In some cases, they are too long unreachable, and are downed. Then they are no longer part
of the set members(), and no longer part of the set unreachableMembers(). That is why I called it "removed".
After running the tests for a couple of hours, I see that TWO of the nodes only have each other in members(), but the third node has all three in members(). The third node in some cases also has a different leader.
On all the nodes, the set unreachableMembers() is empty.
What I don't understand is how the third node can have all three nodes in the members() set, but the other nodes does not have it in theirs. This is a stable state, I have to restart the third node to fix this. I would expect that if one node is seeing another (has the node in members()), that goes both ways.
Hope this was more clear! I am trying to reproduce this in a more controlled example, but I have not managed it so far. Our planned, temporary solution is to run clusters of size one so far... :-(
Anders
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
akka {
actor.provider = "akka.cluster.ClusterActorRefProvider"
remote.netty.tcp.hostname = "127.0.0.1"
cluster {
seed-nodes = [
"akka.tcp://sys...@127.0.0.1:2551",
"akka.tcp://sys...@127.0.0.1:2552"
]
auto-down-unreachable-after = 10ms
failure-detector {
heartbeat-interval = 10 ms
threshold = 2.0
acceptable-heartbeat-pause = 10 ms
expected-response-after = 5 ms
}
}
}
package no.kantega.workshop.akka;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Address;
import akka.cluster.Cluster;
import akka.cluster.Member;
import akka.contrib.pattern.DistributedPubSubExtension;
import akka.contrib.pattern.DistributedPubSubMediator;
import com.typesafe.config.ConfigFactory;
import org.junit.Test;
import scala.collection.Iterator;
import scala.collection.immutable.SortedSet;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import static com.jayway.awaitility.Awaitility.await;
import static com.typesafe.config.ConfigValueFactory.fromAnyRef;
import static org.assertj.core.api.Assertions.assertThat;
public class ClusterConvergenceTest {
@Test
public void test_many_times() throws InterruptedException {
for (int i = 0; i < 100; i++) {
cluster_membership_is_symmetric();
}
}
@Test
public void cluster_membership_is_symmetric() throws InterruptedException {
ActorSystem actorSystem1 = ActorSystem.apply("system", ConfigFactory.load()
.withValue("akka.remote.netty.tcp.port", fromAnyRef("2551")));
ActorSystem actorSystem2 = ActorSystem.apply("system", ConfigFactory.load()
.withValue("akka.remote.netty.tcp.port", fromAnyRef("2552")));
ActorSystem actorSystem3 = ActorSystem.apply("system", ConfigFactory.load()
.withValue("akka.remote.netty.tcp.port", fromAnyRef("2553")));
try {
Cluster cluster1 = Cluster.get(actorSystem1);
Cluster cluster2 = Cluster.get(actorSystem2);
Cluster cluster3 = Cluster.get(actorSystem3);
// Wait until all members can see all the others:
await().atMost(20, TimeUnit.SECONDS).until(() -> cluster1.state().members().size() == 3);
await().atMost(20, TimeUnit.SECONDS).until(() -> cluster2.state().members().size() == 3);
await().atMost(20, TimeUnit.SECONDS).until(() -> cluster3.state().members().size() == 3);
System.out.println("Generate some load (we should see cluster events in the console log)...");
for (ActorSystem system : new ActorSystem[]{actorSystem1, actorSystem2, actorSystem3}) {
ActorRef mediator = DistributedPubSubExtension.get(system).mediator();
for (int i = 0; i < 1_000_000; i++) {
String event = "Message number " + i;
mediator.tell(new DistributedPubSubMediator.Publish("dummy stream", event), ActorRef.noSender());
}
}
System.out.println("Wait for things to settle down (cluster events should stop)...");
// Ideally I would have a way to know when all messages was processed...
Thread.sleep(30_000L);
System.out.println("Check that cluster membership is reflexise...");
membershipIsSymmetric(cluster1, cluster2);
membershipIsSymmetric(cluster2, cluster3);
membershipIsSymmetric(cluster1, cluster3);
} finally {
actorSystem1.shutdown();
actorSystem2.shutdown();
actorSystem3.shutdown();
actorSystem1.awaitTermination();
actorSystem2.awaitTermination();
actorSystem3.awaitTermination();
}
}
/**
* Here we check that cluster1 has cluster2 as a member iff cluster2 has cluster1 as a member.
*/
private static void membershipIsSymmetric(Cluster cluster1, Cluster cluster2) {
if (addresses(cluster1.state().members()).contains(cluster2.selfAddress())) {
// node 1 sees node 2, check the opposite way
assertThat(addresses(cluster2.state().members())).contains(cluster1.selfAddress());
} else {
// node 1 does not see node 2, check the opposite way
assertThat(addresses(cluster2.state().members())).doesNotContain(cluster1.selfAddress());
}
}
private static Set<Address> addresses(SortedSet<Member> members) {
Set<Address> set = new HashSet<>();
Iterator<Member> iterator = members.iterator();
while (iterator.hasNext()) {
set.add(iterator.next().address());
}
return set;
}
}
No OutOfMemory, the third node is running fine. Except is can be the leader, and in that case I have two leaders...
I think I have reproduced it in the following program (let me know if you want the complete maven setup or similar):
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
...
2015-06-24 17:51:54,693 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-3 - Cluster Node [akka.tcp://my-system@machine2:15552] - Successfully shut down
2015-06-24 17:51:54,677 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-17 - Cluster Node [akka.tcp://my-system@machine2:15552] - Shutting down...
2015-06-24 17:51:54,603 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-20 - Cluster Node [akka.tcp://my-system@machine1:15552] - Marking unreachable node [akka.tcp://my-system@machine2:15552] as [Down]
2015-06-24 17:51:54,603 INFO Cluster(akka://my-system) my-system-akka.actor.default-dispatcher-4 - Cluster Node [akka.tcp://my-system@machine1:15552] - Leader is auto-downing unreachable node [akka.tcp://my-system@machine2:15552]...
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.