[Longish] New to hazelcast: Questions about discovery and how to change to TCP discovery?


o haya

Feb 24, 2022, 8:28:26 AM
to Hazelcast
Hi,

We currently use Hazelcast in one of our apps, but I am really new to Hazelcast and am learning about it in order to change how discovery is configured.

I was told that the initial (current) implementation of the app uses what they call "automatic clustering". From reading about Hazelcast, I gather that what they mean by that is "auto detection".

They also said that they want to switch from that to "TCP discovery", because "automatic clustering" is not recommended for production environments, and I found a similar comment in the Hazelcast documentation:


" By default, Hazelcast tries to automatically detect the applicable discovery mechanism based on the runtime environment. Note that using Auto Detection is not recommended for production. Note also that if Hazelcast finds no applicable discovery mechanism, then it falls back to Multicast."

As I said, I am just starting to learn about Hazelcast, so I am curious: why is auto detection not recommended for production?

I am guessing that it might be because it is not reliable?

In any event, as I said, I have to modify our app so that it uses TCP discovery.

I've been looking at our code, and we have a class/method that configures Hazelcast. Here is the current main method in that class (FYI, this is a Spring Boot app, so Environment is the Spring Environment class):

    public Config hazelCastConfig(Environment env) {

        return new Config()
                .setInstanceName("app")
                .setClusterName("dev")
                .setNetworkConfig(
                        new NetworkConfig()
                                .setPort(env.getRequiredProperty("cache.port", Integer.class))
                                .setPortAutoIncrement(true)
                                .setPortCount(100)
                                .setJoin(new JoinConfig().setAutoDetectionConfig(new AutoDetectionConfig().setEnabled(false))))
                .addMapConfig(
                        new MapConfig()
                                .setName("myService")
                                .setTimeToLiveSeconds(env.getRequiredProperty("cache.ttl.app", Integer.class))
                                .setEvictionConfig(
                                        new EvictionConfig()
                                                .setEvictionPolicy(EvictionPolicy.valueOf(env.getRequiredProperty("cache.evictionPolicy")))
                                                .setMaxSizePolicy(MaxSizePolicy.valueOf(env.getRequiredProperty("cache.maxSizePolicy")))
                                                .setSize(env.getRequiredProperty("cache.size", Integer.class))))
                .addMapConfig(
                        new MapConfig()
                                .setName("attributes")
                                .setTimeToLiveSeconds(env.getRequiredProperty("cache.ttl.attributes", Integer.class))
                                .setEvictionConfig(
                                        new EvictionConfig()
                                                .setEvictionPolicy(EvictionPolicy.valueOf(env.getRequiredProperty("cache.evictionPolicy")))
                                                .setMaxSizePolicy(MaxSizePolicy.valueOf(env.getRequiredProperty("cache.maxSizePolicy")))
                                                .setSize(env.getRequiredProperty("cache.size", Integer.class))))
                .addMapConfig(
                        new MapConfig()
                                .setName("usergroups")
                                .setTimeToLiveSeconds(env.getRequiredProperty("cache.ttl.usergroups", Integer.class))
                                .setEvictionConfig(
                                        new EvictionConfig()
                                                .setEvictionPolicy(EvictionPolicy.valueOf(env.getRequiredProperty("cache.evictionPolicy")))
                                                .setMaxSizePolicy(MaxSizePolicy.valueOf(env.getRequiredProperty("cache.maxSizePolicy")))
                                                .setSize(env.getRequiredProperty("cache.size", Integer.class))));
    }

Can someone explain what in the above code needs to be modified so that the Hazelcast would be using TCP discovery rather than auto discovery/detection?

Also, besides the code modification itself, is there any other configuration that has to be done?

Thanks in advance,
Jim




o haya

Feb 24, 2022, 8:32:53 AM
to Hazelcast
Hi - Sorry I forgot to mention that we are using Hazelcast 4.2.2.

Jim

o haya

Feb 24, 2022, 9:27:10 AM
to Hazelcast
Hi,

I have a clarification/correction (as I said, I am just learning about all of this):

I noticed this line in the code I posted:

.setJoin(new JoinConfig().setAutoDetectionConfig(new AutoDetectionConfig().setEnabled(false)))

and I asked (internally): "Doesn't that DISABLE auto-detection?" and got the answer "yes, it does", and got further explanation:

Apparently they originally set up Hazelcast in the app I am working on with auto-detection enabled, but then they discovered that if another app on the same physical machine was also using Hazelcast with auto-detection enabled, the two apps would end up in the same cluster, which they do not want.

So, for now, they turned off auto-detection in the app I am working on (which is the current status), and that is also why they asked me to look into modifying this app to use TCP discovery.

Neil Stevenson

Feb 24, 2022, 12:21:44 PM
to haze...@googlegroups.com
The default discovery mechanism for Hazelcast is multicast.

The process essentially broadcasts its presence to the network, and tries to join with whichever processes respond.
This is great for development: zero configuration, and you're clustered with another process on your machine or **nearby**.
This is not so great for production, for the same reason: the uncontrolled nature of who you might join with.
There are also more subtle reasons, to do with multicast reach and control.
Multicast is frequently disabled by network teams, totally or partially.
For instance, you may not find a node on a different subnet when multicast can't cross subnets (i.e. not **nearby**).
Or you may find a node today, but the network team decides tomorrow to tighten up policies, and your cluster won't form at the next restart.
Hence, multicast use in production is not recommended.

Auto-discovery attempts to make this simpler still, trying to guess and do something appropriate if running on AWS, Azure, GCP, etc.
It may guess wrong, and wrong guesses aren't good for production.
It may pick multicast, and as above, multicast isn't good for production.

In code:
config.getNetworkConfig().getJoin().getAutoDetectionConfig().setEnabled(false);

config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);

config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true).setMembers(List.of("123.456.789.1:5701"));


If you just use the first two lines above, discovery is turned off. This process won't look for others to join
(but it will allow others to find it). Adding the third line will probe that IP address, and if that's a node in
a 10-node cluster, it'll tell you where the others are.

You don't need to list all nodes in the cluster for TCP discovery, as you may have a cluster that expands at peak time
and contracts later. So it's generally sufficient to list 3 nodes, unless you expand or contract by more than that.
So long as you get a response from one node, it'll tell you where all the others are.
If all nodes are down (a full restart), it doesn't matter how many you list.
If you're doing a rolling bounce, you'd only bounce one node at a time.

If you don't want others to join with the process, e.g. in an integration test, give the cluster a unique name.
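
Pulling those lines together, here is a minimal sketch of how the network part of the `hazelCastConfig` method from the original question could look with TCP/IP discovery. It assumes the Hazelcast 4.2 Java API already used in this thread and needs the Hazelcast jar on the classpath; the cluster name `myapp-prod`, the port, and the member addresses are placeholders, and the map configuration is omitted.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.config.NetworkConfig;

import java.util.Arrays;
import java.util.List;

public class TcpDiscoveryConfigSketch {

    // Builds a Config that joins only via an explicit TCP/IP member list.
    // "myapp-prod" and the addresses in main() are placeholders, not real values.
    static Config tcpDiscoveryConfig(int port, List<String> members) {
        Config config = new Config()
                .setInstanceName("app")
                // A cluster name unique to this application keeps other
                // Hazelcast apps on the same network from merging with it.
                .setClusterName("myapp-prod");

        NetworkConfig network = config.getNetworkConfig()
                .setPort(port)
                .setPortAutoIncrement(true)
                .setPortCount(100);

        JoinConfig join = network.getJoin();
        join.getAutoDetectionConfig().setEnabled(false); // no environment guessing
        join.getMulticastConfig().setEnabled(false);     // no multicast broadcasts
        join.getTcpIpConfig()
                .setEnabled(true)
                .setMembers(members);                    // addresses to probe when joining

        return config;
    }

    public static void main(String[] args) {
        Config config = tcpDiscoveryConfig(5701,
                Arrays.asList("10.0.0.1:5701", "10.0.0.2:5701"));
        System.out.println(config.getNetworkConfig().getJoin().getTcpIpConfig().getMembers());
    }
}
```

Giving each application its own cluster name should also stop an unrelated Hazelcast app on the same machine from joining, even if that app still relies on multicast or auto-detection.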

Neill


o haya

Feb 24, 2022, 1:12:17 PM
to Hazelcast
Hi Neill,

THANK YOU for the SUPER detailed response and information!

Jim

o haya

Feb 24, 2022, 4:16:06 PM
to Hazelcast
Hi Neill,

You suggested:

config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true).setMembers(List.of("123.456.789.1:5701"));

What class/method is "List.of()"?
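
For reference, `List.of(...)` is the static factory method on `java.util.List`, added in Java 9; it returns an unmodifiable list. On Java 8, `Arrays.asList(...)` or a plain `ArrayList` works just as well for `setMembers()`, which takes a `List<String>`. A small JDK-only comparison (the addresses are placeholders):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MemberListOptions {

    // Java 9+: List.of returns an unmodifiable list (null elements are rejected).
    static List<String> viaListOf() {
        return List.of("10.0.0.1:5701", "10.0.0.2:5701");
    }

    // Java 8 alternative: Arrays.asList returns a fixed-size list backed by an array.
    static List<String> viaArraysAsList() {
        return Arrays.asList("10.0.0.1:5701", "10.0.0.2:5701");
    }

    // Mutable alternative, built one element at a time.
    static List<String> viaArrayList() {
        List<String> members = new ArrayList<>();
        members.add("10.0.0.1:5701");
        members.add("10.0.0.2:5701");
        return members;
    }

    public static void main(String[] args) {
        // List.equals compares elements in order, so all three are equal.
        System.out.println(viaListOf().equals(viaArraysAsList())
                && viaArraysAsList().equals(viaArrayList())); // true

        try {
            viaListOf().add("10.0.0.3:5701");
        } catch (UnsupportedOperationException e) {
            System.out.println("List.of result is unmodifiable");
        }
    }
}
```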

Thanks,
Jim

o haya

Feb 24, 2022, 7:01:57 PM
to Hazelcast
Hi,

I made the suggested changes and deployed the new app.

I added the following code:

            // Configure for TCP-detection/clustering
            config.getNetworkConfig().getJoin().getAutoDetectionConfig().setEnabled(false);
            config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
           
            List<String> myList = new ArrayList<>();
            myList.add( "svc01.XXX.com:9001" );
            myList.add( "svc02.XXX.com:9001" );

            System.out.println("+++++++++++++++++++++ In HazelConfig: About to set CLUSTERING!!!");
            config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true).setMembers(myList);
           
            return config;

I did the following test:

Our app has an endpoint for clearing the cache and another endpoint for displaying the cache. I cleared the cache on both svc01 and svc02, then sent a test request to svc01.

I checked the cache (using the endpoint) on svc01 and saw the one request in the cache. I checked the cache on svc02, BUT the cache still appears to be empty :(!!

Shouldn't the entry appear in the cache on svc02 because clustering is enabled/active?

I also have a question: does the order of the hostnames in setMembers() matter? In my case, the order of the hostnames is the same on both servers, i.e., "SVC01" and then "SVC02".


Thanks,
Jim

o haya

Feb 24, 2022, 7:56:55 PM
to Hazelcast
Hi,

I just wanted to confirm which port should be used for the entries in the setMembers() list.

I am using 9001, because I was told that the caching config uses ports 9001-9101.

Also, when testing, if I run netstat, should I see connections between the two servers, SVC01 and SVC02? FYI, when I run netstat on either machine, I only see port 9001 in the LISTEN state.

o haya

Feb 24, 2022, 8:34:08 PM
to Hazelcast
Hi,

FYI, good news!  I was just able to get the TCP clustering working.  I realized that the ports 9001-9101 were not open on either SVC01 or SVC02, so I opened them in the firewall, bounced the app on both machines and tested again, and VOILA!!  The cache entry showed up on both sides when I tested!
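
Since the problem turned out to be the firewall, a quick way to check from one machine whether a member's port is reachable on the other, before involving Hazelcast at all, is a plain TCP connect. A minimal JDK-only sketch; the host and port would be whatever you put in `setMembers()`, and the one-second timeout is an arbitrary choice:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    // A false result usually means a firewall drop, nothing listening, or a bad address.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example usage (placeholder host): java PortCheck svc02.example.com 9001
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9001;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 1000));
    }
}
```

Run it from svc01 against svc02 and vice versa. It only shows that the port accepts connections, not that a Hazelcast member with the same cluster name is listening behind it.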

Thanks,
Jim

o haya

Feb 25, 2022, 6:19:16 AM
to Hazelcast
Hi,

I have another question: in the setMembers() call(s), should I include only the OTHER members of the cluster, OR BOTH the other members AND the hostname/IP of the machine on which the configuration is being set?

In other words, if the cluster is going to include svc01 and svc02, should setMembers() on the svc01 machine include only svc02, or both svc01 and svc02?

Thanks,
Jim

Josef Cacek

Feb 28, 2022, 3:17:36 AM
to haze...@googlegroups.com
Hi Jim,

the usual TCP discovery setup has the member list containing all
the cluster members. Then you can use the same config on all the
members without change.
Moreover, the member list is checked during member startup, and
Hazelcast searches it for a possible bind address. That's handy when
the server has more than one network interface.

Still, even if you list only some of the addresses, the cluster should
be properly formed.
E.g. suppose you have members A, B, and C, and member A doesn't list any
other member in its config. If B and C only list the address of member
A, the cluster should be formed eventually. Nevertheless, this is not
a very safe configuration, so I encourage you to list all possible
members in the configuration.
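
One way to follow that advice and keep the configuration identical on every machine is to read the full member list from a single comma-separated property instead of hard-coding it. A hedged sketch: the property name `cache.members` is hypothetical, and in the Spring Boot app the raw string would come from something like `env.getRequiredProperty("cache.members")`; the parsing itself is plain JDK code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MemberListProperty {

    // Splits a comma-separated member list such as "10.0.0.1:9001, 10.0.0.2:9001",
    // trimming whitespace and dropping empty entries.
    static List<String> parseMembers(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical value of cache.members, identical on svc01 and svc02.
        String raw = "10.0.0.1:9001, 10.0.0.2:9001,";
        System.out.println(parseMembers(raw)); // [10.0.0.1:9001, 10.0.0.2:9001]
    }
}
```

The resulting list, including the local machine's own address, would then be passed unchanged to `getTcpIpConfig().setMembers(...)` on every node.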

WRT your previous mail, I noticed that you use hostnames in your
configuration. That can be a source of trouble, and I suggest using IP
addresses instead.
The new Hazelcast version 5.1 will have a bunch of improvements in this
area, so using hostnames with the new version will be safe.
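
If the hostnames have to stay in the application's own configuration, one possible workaround on 4.2 is to resolve them to IP addresses once at startup and give Hazelcast the IP form. This is a JDK-only sketch of that idea, not something Hazelcast does for you; it assumes every entry is `host:port` (IPv6 literals are not handled), and a DNS change would only be picked up on restart.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResolveMembers {

    // Converts entries like "svc01.example.com:9001" into "<ip>:9001".
    // Assumes "host:port" with a single colon, so IPv6 literals are not handled.
    static List<String> toIpMembers(List<String> hostPortMembers) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (String member : hostPortMembers) {
            int colon = member.lastIndexOf(':');
            String host = colon >= 0 ? member.substring(0, colon) : member;
            String portSuffix = colon >= 0 ? member.substring(colon) : "";
            // getByName performs a DNS lookup, or just parses an IP literal.
            String ip = InetAddress.getByName(host).getHostAddress();
            result.add(ip + portSuffix);
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(toIpMembers(Arrays.asList("127.0.0.1:9001", "localhost:9002")));
    }
}
```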

Best regards,
-- Josef

o haya

Mar 1, 2022, 9:14:33 PM
to Hazelcast
Hi,

Thanks for that information!

Jim