I have a 2-member Hazelcast cluster running on the same computer.
There are a few other computers (about 5-6 other developers' machines) on the same network with similar clusters (but different ones, i.e. they have different group names/passwords).
This seems to lead to a situation where the SplitBrainJoinMessage deque on my Hazelcast cluster's master is flooded with messages. My master node might be contributing to this situation, resulting in some kind of chain reaction, since judging by the code of MulticastJoiner, it sends another SplitBrainJoinMessage in response.
I can see thousands of similar messages in my log:
2017-10-25 07:45:45,799 odeMulticastListener [ster.MulticastThread] - [10.10.6.102]:5702 [my-cluster] [3.8.2] Dropped: SplitBrainJoinMessage{packetVersion=4, buildNumber=20170518, memberVersion=3.8.2, clusterVersion=3.8, address=[10.10.6.54]:5702, uuid='5c3a81a6-d31b-42b9-8b9d-19f73324ac6e', liteMember=false, memberCount=1, dataMemberCount=1}
Eventually, free heap runs out, CPU utilization goes to 100%, and "OutOfMemoryError: GC overhead limit exceeded" ensues.
When I use the tcp-ip join and restrict it to localhost, the problem is not reproduced.
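For reference, a tcp-ip join restricted to localhost might look like the sketch below. The member ports are assumptions based on Hazelcast's defaults (the log above shows port 5702, so a two-member cluster on one machine would typically use 5701 and 5702):

```xml
<network>
    <join>
        <!-- Disable multicast discovery entirely -->
        <multicast enabled="false"/>
        <!-- Only look for members on the local machine -->
        <tcp-ip enabled="true">
            <member>127.0.0.1:5701</member>
            <member>127.0.0.1:5702</member>
        </tcp-ip>
    </join>
</network>
```

With this configuration the members never see the other developers' multicast traffic, which is consistent with the problem disappearing.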
We use Hazelcast 3.8.2.
I would be grateful if someone could shed some light on this.
My hazelcast.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><hazelcast xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.hazelcast.com/schema/config https://hazelcast.com/schema/config/hazelcast-config-3.8.xsd">
<group>
<name>my-cluster</name>
<password>tcRT0FUAZOtqEo5yl2aIa2biN63z9fp7</password>
</group>
<network>
<join>
<multicast enabled="true"/>
</join>
</network>
<properties>
<!-- tried playing with these, but they seem to affect only whether the problem occurs sooner or later
<property name="hazelcast.merge.next.run.delay.seconds">60</property>
<property name="hazelcast.merge.first.run.delay.seconds">10</property>
-->
</properties>
</hazelcast>

I also looked at hazelcast.merge.next.run.delay.seconds. But since messages accumulate in the deque and are processed by the MulticastJoiner while there are still messages in the deque, this interval is effectively ignored, and in my opinion this creates the flooding and results in a chain reaction that only makes things worse. Is this a known consequence? Can it be improved somehow?

Reply:

When there are still messages in the deque, a Hazelcast member should try to process all of them regardless of hazelcast.merge.next.run.delay.seconds, in order to prevent the number of events in the deque from growing. In your case, your cluster members cannot keep up with the other clusters' searchForOtherClusters() runs. Most probably, some other cluster(s) are configured with a small hazelcast.merge.next.run.delay.seconds, and this causes the flooding in your cluster. With the correct (or default) configuration of all clusters in the multicast group, it is very hard to get an OOME from other clusters' SplitBrainJoinMessages.

Another option is to specify a dedicated multicast group in the Hazelcast configuration and use it.