Unable to join Hazelcast cluster with version 3.12, but it works with 3.11.4

Amandeep Singh

Jun 3, 2019, 9:21:22 AM

to Hazelcast

Hi All,

Has anyone noticed the following issue?

Hazelcast forms a cluster with version 3.11.4 but not with 3.12. I am using TCP/IP discovery. The configuration is:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config"
           xsi:schemaLocation="http://www.hazelcast.com/schema/config
           http://www.hazelcast.com/schema/config/hazelcast-config-3.9.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>trades</name>
    </group>
    <network>
        <public-address>10.0.32.6</public-address>
        <join>
            <tcp-ip enabled="true">
            </tcp-ip>
        </join>
    </network>
</hazelcast>

This same configuration works with 3.11.4 but not with 3.12.

I can't see a clear culprit message in the logs, but with DEBUG logging enabled I see the following:

2019-06-03 13:11:07.123 DEBUG 21 --- [ main] com.hazelcast.cluster.impl.TcpIpJoiner : [10.0.32.6]:5701 [trades] [3.12] Sending master question to [10.0.32.5]:5701

2019-06-03 13:11:07.125 DEBUG 21 --- [ration.thread-1] c.h.i.cluster.impl.ClusterJoinManager : [10.0.32.6]:5701 [trades] [3.12] Received a master question from [10.0.32.6]:5701, but this node is not master itself or doesn't have a master yet!
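For anyone trying to reproduce these DEBUG lines, a minimal Logback sketch that raises only the join-related loggers to DEBUG might look like the following. It assumes Logback is the SLF4J backend and that Hazelcast logs through SLF4J (the Hazelcast property `hazelcast.logging.type` set to `slf4j`); the pattern only approximates the log format above, and `com.hazelcast.internal.cluster.impl` is the full package behind the abbreviated `c.h.i.cluster.impl` logger name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- logback.xml sketch: assumes Logback is the SLF4J backend and Hazelcast
     is configured to log through SLF4J -->
<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger{36} : %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Joiner that logs "Sending master question to ..." -->
    <logger name="com.hazelcast.cluster.impl.TcpIpJoiner" level="DEBUG"/>

    <!-- Package of ClusterJoinManager, which logs "Received a master question ..." -->
    <logger name="com.hazelcast.internal.cluster.impl" level="DEBUG"/>

    <!-- Everything else stays at INFO -->
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>
```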

Any thoughts would be appreciated!!

regards,

Amandeep Singh

Kyle Rosier

Jun 3, 2019, 10:07:34 AM

to Hazelcast

Hello Amandeep,

Did you start all the 3.12 nodes fresh, or are 3.12 nodes trying to join with 3.11.x nodes?

Can you try moving the "<aws enabled="false"/>" element after the tcp-ip section? I tried your config with xsi:schemaLocation updated to point to the 3.12 schema, and when the aws section precedes tcp-ip, validation fails at the tcp-ip line with "Invalid content ... No child element expected at this point."

Also, please note that per our docs the public address should be given in the format host IP address:port number.
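As a sketch of what those two suggestions would look like together (not a confirmed fix): the schema location points at 3.12, `tcp-ip` comes before `aws`, and the public address carries a port. The port 5701 is taken from the logs; the explicit `multicast enabled="false"` line and the two member addresses (the ones that appear in the DEBUG output) are assumptions for illustration, since the original config shows no members.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.hazelcast.com/schema/config
           http://www.hazelcast.com/schema/config/hazelcast-config-3.12.xsd">
    <group>
        <name>trades</name>
    </group>
    <network>
        <!-- host IP address:port; 5701 is the port seen in the logs -->
        <public-address>10.0.32.6:5701</public-address>
        <join>
            <!-- assumption: multicast explicitly off when using TCP/IP discovery -->
            <multicast enabled="false"/>
            <!-- tcp-ip placed before aws to match the 3.12 schema ordering -->
            <tcp-ip enabled="true">
                <!-- hypothetical members: the two addresses from the DEBUG logs -->
                <member-list>
                    <member>10.0.32.5</member>
                    <member>10.0.32.6</member>
                </member-list>
            </tcp-ip>
            <aws enabled="false"/>
        </join>
    </network>
</hazelcast>
```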

Sincerely,

Kyle

Amandeep Singh

Jun 3, 2019, 11:11:46 AM

to haze...@googlegroups.com

Hi Kyle,

Thanks for the suggestions. I tried both, but neither worked. Surprisingly, there are no errors in the logs.

I tried multicast discovery and it works, so the problem seems specific to TCP/IP discovery. Since 3.11.4 works, that rules out a networking issue.
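For comparison, a multicast-based join section might look like the sketch below. The multicast group and port shown are Hazelcast's default values; the rest of the network section is omitted, and the exact config used in this test is not shown in the thread.

```xml
<network>
    <join>
        <!-- Multicast discovery, reported in this thread as working -->
        <multicast enabled="true">
            <multicast-group>224.2.2.3</multicast-group>
            <multicast-port>54327</multicast-port>
        </multicast>
        <!-- Other join mechanisms disabled so only multicast is used -->
        <tcp-ip enabled="false"/>
        <aws enabled="false"/>
    </join>
</network>
```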

I am starting all nodes fresh, so there is no mix of 3.11.4 and 3.12; all nodes are on 3.12.

regards,

Amandeep Singh

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at https://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/897257b9-aeea-4456-9bd5-f75af4374c5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amandeep Singh

Jun 4, 2019, 4:18:16 AM

to haze...@googlegroups.com

Hi,

I think I have identified the issue, but I don't have a solution yet.

It appears that 3.12 has a problem with IP ranges.

For example, with the configuration below, where the first node that comes up has the same IP as the start of the range, the cluster forms.

Working config in 3.12

<tcp-ip enabled="true">
<member>10.0.46.6-11</member>
</tcp-ip>
<aws enabled="false"/>
</join>
</network>

However, with the configuration below, Hazelcast never forms a cluster; it gets stuck in a loop and finally shuts itself down.

Failing config in 3.12

<tcp-ip enabled="true">
<member>10.0.46.4-11</member>
</tcp-ip>
<aws enabled="false"/>
</join>
</network>

Both configs work in 3.11.4. Do you see anything missing? Any suggestions?
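The range `10.0.46.4-11` stands for the addresses 10.0.46.4 through 10.0.46.11 (the range applies to the last octet). As a diagnostic sketch rather than a known fix, the same eight addresses can be listed one by one: if this explicit list also fails on 3.12, the range syntax itself is probably not the culprit, and the difference more likely lies in the extra addresses (.4 and .5) at the start of the range. The explicit `multicast enabled="false"` line is an assumption.

```xml
<join>
    <multicast enabled="false"/>
    <tcp-ip enabled="true">
        <!-- Same addresses as 10.0.46.4-11, written out individually -->
        <member-list>
            <member>10.0.46.4</member>
            <member>10.0.46.5</member>
            <member>10.0.46.6</member>
            <member>10.0.46.7</member>
            <member>10.0.46.8</member>
            <member>10.0.46.9</member>
            <member>10.0.46.10</member>
            <member>10.0.46.11</member>
        </member-list>
    </tcp-ip>
    <aws enabled="false"/>
</join>
```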

Please note that my IPs are not static; they are assigned dynamically. I use a template-based hazelcast.xml that is filled in at runtime. I am running this test on Docker Swarm.
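Since the file is generated from a template at runtime, note that Hazelcast's declarative configuration also supports `${...}` placeholders, which by default are resolved from Java system properties. The sketch below uses hypothetical property names `hz.public.address` and `hz.member.range`; it is an alternative to custom templating, not a fix for the join problem.

```xml
<!-- Sketch only: hz.public.address and hz.member.range are hypothetical
     system property names, supplied for example at container start with
     -Dhz.public.address=10.0.46.6:5701 -Dhz.member.range=10.0.46.4-11 -->
<network>
    <public-address>${hz.public.address}</public-address>
    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <member-list>
                <member>${hz.member.range}</member>
            </member-list>
        </tcp-ip>
        <aws enabled="false"/>
    </join>
</network>
```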

regards,

Amandeep Singh

Pugazhenthi Sonachalam

Jun 4, 2019, 6:45:55 AM

to haze...@googlegroups.com

It looks like the XML tags are different in 3.12; please check the GitHub samples.

Amandeep Singh

Jun 4, 2019, 9:04:57 AM

to haze...@googlegroups.com

Thanks again. I corrected the XML after referring to the Hazelcast example on GitHub (https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/resources/hazelcast-full-example.xml), but that didn't work either.

<tcp-ip enabled="true">
<member-list>
<member>10.0.54.11-31</member>
</member-list>

</tcp-ip>
<aws enabled="false"/>
</join>
</network>

The weird thing is that when the Hazelcast process dies by itself (after repeated attempts), Docker Swarm replaces the failed containers, and then Hazelcast discovers all nodes and successfully forms the cluster. So the same code (image) starts working.

I have noticed that in the failing scenario, the Hazelcast process does not add all the IPs in the range to the blacklist. I could not find a line like

2019-06-04 09:16:01.474 INFO 22 --- [cached.thread-3] com.hazelcast.cluster.impl.TcpIpJoiner : [10.0.48.17]:5701 [trades] [3.12] [10.0.48.18]:5701 is added to the blacklist.

for every IP except the ones we actually want.

I don't know why. I also noticed that blacklisting happens in a separate thread (not in main).

So I still don't understand the actual cause.

regards,

Amandeep Singh
