Not able to Join Hazelcast Cluster with 3.12 version but works with 3.11.4

Amandeep Singh

Jun 3, 2019, 9:21:22 AM
to Hazelcast
Hi All,

Has anyone noticed the following issue?

Hazelcast is able to form a cluster with version 3.11.4 but not with 3.12. I am using TCP/IP discovery. The configuration is:


<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config"
           xsi:schemaLocation="http://www.hazelcast.com/schema/config
                               http://www.hazelcast.com/schema/config/hazelcast-config-3.9.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">


    <group>
        <name>trades</name>
        <password>*****</password>
    </group>


    <network>
        <public-address>10.0.32.6</public-address>
        <port>5701</port>
        <join>
            <multicast enabled="false"/>
            <aws enabled="false"/>
            <tcp-ip enabled="true">
                <members>10.0.32.6-16</members>
            </tcp-ip>
        </join>
    </network>
   
</hazelcast>

This same configuration works with 3.11.4 but not with 3.12. 

I cannot find a clear error message in the logs, but the DEBUG logs show:

2019-06-03 13:11:07.123 DEBUG 21 --- [           main] com.hazelcast.cluster.impl.TcpIpJoiner   : [10.0.32.6]:5701 [trades] [3.12] Sending master question to [10.0.32.5]:5701
2019-06-03 13:11:07.125 DEBUG 21 --- [ration.thread-1] c.h.i.cluster.impl.ClusterJoinManager    : [10.0.32.6]:5701 [trades] [3.12] Received a master question from [10.0.32.6]:5701, but this node is not master itself or doesn't have a master yet!

Any thoughts would be appreciated!!

regards,
Amandeep Singh

Kyle Rosier

Jun 3, 2019, 10:07:34 AM
to Hazelcast
Hello Amandeep,

Did you start all the 3.12 nodes fresh, or are 3.12 nodes trying to join with 3.11.x nodes?

Can you try moving the "<aws enabled="false"/>" element after the tcp-ip section? I tried your config with xsi:schemaLocation updated to point to the 3.12 version, and when the aws section precedes tcp-ip I saw this error at the tcp-ip line: "Invalid content ... No child element expected at this point."

Also, please note that per our docs the public address value should be given in the format "host IP address:port number".
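
The two changes suggested above could look like the following. This is a sketch based on the config posted earlier, not a verified fix: the 3.12 schema URL and the xmlns attribute are assumptions that follow the usual naming pattern, and everything else is copied from the original post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.hazelcast.com/schema/config
                               http://www.hazelcast.com/schema/config/hazelcast-config-3.12.xsd">
    <group>
        <name>trades</name>
        <password>*****</password>
    </group>

    <network>
        <!-- public address written as host:port -->
        <public-address>10.0.32.6:5701</public-address>
        <port>5701</port>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <members>10.0.32.6-16</members>
            </tcp-ip>
            <!-- aws moved so that it follows tcp-ip -->
            <aws enabled="false"/>
        </join>
    </network>
</hazelcast>
```

Validating the file against the 3.12 schema before starting the node should show whether the "No child element expected" error is gone.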

Sincerely,
Kyle

Amandeep Singh

Jun 3, 2019, 11:11:46 AM
to haze...@googlegroups.com
Hi Kyle,

Thanks for the suggestions. I tried both, but neither worked. Surprisingly, the logs show no errors.
I also tried multicast mode, and that works, so the problem appears to be limited to TCP/IP. Since 3.11.4 works, a networking issue is ruled out.

I am starting fresh, so there is no mix of 3.11.4 and 3.12. All nodes are on 3.12.

regards,
Amandeep Singh

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at https://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/897257b9-aeea-4456-9bd5-f75af4374c5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amandeep Singh

Jun 4, 2019, 4:18:16 AM
to haze...@googlegroups.com
Hi,

I think I have identified the issue, but I have no solution yet.

It looks like 3.12 has a problem with IP ranges.

For example, with the configuration below, where the first node to come up has the same IP as the start of the range, the cluster forms.
Working config in 3.12:
<network>
    <public-address>10.0.46.6:5701</public-address>
    <port auto-increment="false">5701</port>

    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <member>10.0.46.6-11</member>
        </tcp-ip>
        <aws enabled="false"/>
    </join>
</network>

However, with the configuration below, Hazelcast never forms a cluster; it gets stuck in a loop and finally shuts itself down.
Failing config in 3.12:
<network>
    <public-address>10.0.46.6:5701</public-address>
    <port auto-increment="false">5701</port>

    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <member>10.0.46.4-11</member>
        </tcp-ip>
        <aws enabled="false"/>
    </join>
</network>

Both of these configs work in 3.11.4, though. Do you see anything missing? Any suggestions?

Please note that my IPs are not static; they are generated dynamically. My hazelcast.xml is template-based, and the template is filled in at runtime. I am running this test on Docker Swarm.
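
One way to check whether range handling is at fault would be to expand the failing range by hand and list every address separately. This is only a diagnostic sketch, not a confirmed fix: it assumes the same 10.0.46.4-11 range as the failing config above, and with dynamically assigned IPs the template would have to generate these lines at runtime.

```xml
<network>
    <public-address>10.0.46.6:5701</public-address>
    <port auto-increment="false">5701</port>

    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <!-- hand-expanded equivalent of 10.0.46.4-11;
                 an assumption used only to rule out range parsing -->
            <member>10.0.46.4</member>
            <member>10.0.46.5</member>
            <member>10.0.46.6</member>
            <member>10.0.46.7</member>
            <member>10.0.46.8</member>
            <member>10.0.46.9</member>
            <member>10.0.46.10</member>
            <member>10.0.46.11</member>
        </tcp-ip>
        <aws enabled="false"/>
    </join>
</network>
```

If the cluster forms with this list but not with the range, that would point at range handling in 3.12 rather than at the network.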

regards,
Amandeep Singh

Pugazhenthi Sonachalam

Jun 4, 2019, 6:45:55 AM
to haze...@googlegroups.com
It looks like the XML tags are different in 3.12. Please check the GitHub samples.


Amandeep Singh

Jun 4, 2019, 9:04:57 AM
to haze...@googlegroups.com
Thanks again. I corrected the XML after referring to the Hazelcast example on GitHub ( https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/resources/hazelcast-full-example.xml ), but that didn't work either.


<network>
    <public-address>10.0.54.21:5701</public-address>
    <port auto-increment="false">5701</port>

    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <member-list>
                <member>10.0.54.11-31</member>
            </member-list>
        </tcp-ip>
        <aws enabled="false"/>
    </join>
</network>

However, the weird thing is that when the Hazelcast process dies by itself (after continuous retries), Docker Swarm replaces the failing containers, and then Hazelcast discovers all nodes and successfully forms the cluster. So the same code (image) starts working.

I have noticed that in the failing scenario, the Hazelcast process does not add all the IPs in the range to the blacklist. I could not find a line like
 2019-06-04 09:16:01.474  INFO 22 --- [cached.thread-3] com.hazelcast.cluster.impl.TcpIpJoiner   : [10.0.48.17]:5701 [trades] [3.12] [10.0.48.18]:5701 is added to the blacklist.
for every IP other than those of the actual members.

I don't know why. I also noticed that adding to the blacklist happens in a separate thread (not in main).

So I still don't understand the actual cause.

regards,
Amandeep Singh
