About cluster ports, hz instances don't see each other


Joan Balagueró

Jun 16, 2023, 5:30:30 AM
to Hazelcast
Hello,

Hazelcast 4.2.6.

We start a Hazelcast instance programmatically in this way (below is just the network section):

cfg.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
cfg.getNetworkConfig().setPort(9975);
cfg.getNetworkConfig().setPortAutoIncrement(false);
cfg.getNetworkConfig().setReuseAddress(true);

This works perfectly in our test environment, but when we moved to production on Amazon the 3 instances didn't see each other. Port 9975 is open and reachable by all the instances, but I get 3 clusters of 1 node each instead of a single 3-node cluster.

So the question is: must I open any other port in addition to 9975?

And another question: do you see any issue with having just one port open for communication between instances?

Thanks!

Joan.


Joan Balagueró

Jun 16, 2023, 5:44:26 AM
to Hazelcast
Let me explain a bit more. There is no autodiscovery here: all the EC2 hostnames are stored in a MySQL table, and when a new hz instance starts it just reads all the hostnames from this table and adds them as cluster members. This is the code (the variable 'this.nodes' contains the list of hostnames):

 private void addMembers(Config cfg)
 {
  ArrayList<String> arrHosts = new ArrayList<>();

  // Resolve every hostname read from the MySQL table to its IP address
  for (dtoNode node : this.nodes) {
   arrHosts.add(node.getIPAddressFromHostname());
  }

  // TCP/IP discovery: join the members listed in the table
  TcpIpConfig tcpconfig = cfg.getNetworkConfig().getJoin().getTcpIpConfig();
  tcpconfig.setEnabled(true);
  tcpconfig.setMembers(arrHosts);

  // Interfaces: Hazelcast binds to the local address that matches one of these entries
  InterfacesConfig ic = cfg.getNetworkConfig().getInterfaces();
  ic.setEnabled(true);
  ic.setInterfaces(arrHosts);
 }

Josef Cacek

Jun 16, 2023, 10:43:46 AM
to haze...@googlegroups.com
Hi Joan,

I don't see the custom port used in the code: arrHosts.add(node.getIPAddressFromHostname());
Shouldn't it rather be: arrHosts.add(node.getIPAddressFromHostname() + ":9975"); ?

If the port is not provided for the member address (in the tcp-ip discovery), then three default ports are tried (5701, 5702, 5703).
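Applied to your addMembers() method, the change would look roughly like this (just a sketch; dtoNode and getIPAddressFromHostname() are taken from your snippet, and 9975 is the port you set with setPort()):

 private void addMembers(Config cfg)
 {
  ArrayList<String> arrHosts = new ArrayList<>();

  for (dtoNode node : this.nodes) {
   // append the custom cluster port so TCP/IP discovery does not fall back to 5701-5703
   arrHosts.add(node.getIPAddressFromHostname() + ":9975");
  }

  TcpIpConfig tcpconfig = cfg.getNetworkConfig().getJoin().getTcpIpConfig();
  tcpconfig.setEnabled(true);
  tcpconfig.setMembers(arrHosts);
 }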

Regards,
-- Josef


Joan Balagueró

Jun 17, 2023, 6:32:55 AM
to Hazelcast
Hi Josef,

Thanks a lot for your quick response. I will try this in production, but ... why is this working perfectly in my test environment? It's a 2-node cluster; below are the traces from both nodes when I stop and start the second node:

First node trace when second node stops
-----------------------------------------------------------------

2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Shutdown request of Member [10.1.0.4]:9975 - 2a0ba087-ec28-4025-b621-fac431ac3d00 is handled
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Repartitioning cluster data. Migration tasks count: 233
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] All migration tasks have been completed. (repartitionTime=Sat Jun 17 10:26:18 UTC 2023, plannedMigrations=233, completedMigrations=233, remainingMigrations=0, totalCompletedMigrations=700)
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Connection[id=1, /10.1.0.5:9975->/10.1.0.4:37933, qualifier=null, endpoint=[10.1.0.4]:9975, alive=false, connectionType=MEMBER, planeIndex=0] closed. Reason: Connection closed by the other side
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Connecting to /10.1.0.4:9975, timeout: 10000, bind-any: true
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Could not connect to: /10.1.0.4:9975. Reason: IOException[Connection refused to address /10.1.0.4:9975]
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Connecting to /10.1.0.4:9975, timeout: 10000, bind-any: true
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Could not connect to: /10.1.0.4:9975. Reason: IOException[Connection refused to address /10.1.0.4:9975]
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Connecting to /10.1.0.4:9975, timeout: 10000, bind-any: true
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Could not connect to: /10.1.0.4:9975. Reason: IOException[Connection refused to address /10.1.0.4:9975]
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Connecting to /10.1.0.4:9975, timeout: 10000, bind-any: true
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Could not connect to: /10.1.0.4:9975. Reason: IOException[Connection refused to address /10.1.0.4:9975]
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Removing connection to endpoint [10.1.0.4]:9975 Cause => java.io.IOException {Connection refused to address /10.1.0.4:9975}, Error-Count: 5
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Removing Member [10.1.0.4]:9975 - 2a0ba087-ec28-4025-b621-fac431ac3d00
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Partition balance is ok, no need to repartition.
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6]

Members {size:1, ver:3} [
        Member [10.1.0.5]:9975 - 4f3acadc-2e05-4198-bd4f-035c99e8b967 this
]

2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Committing/rolling-back live transactions of [10.1.0.4]:9975, UUID: 2a0ba087-ec28-4025-b621-fac431ac3d00


First node trace when second node starts
-------------------------------------------------------------------

2023-06-17 10:27:29     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Initialized new cluster connection between /10.1.0.5:9975 and /10.1.0.4:52409
2023-06-17 10:27:35     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6]

Members {size:2, ver:4} [
        Member [10.1.0.5]:9975 - 4f3acadc-2e05-4198-bd4f-035c99e8b967 this
        Member [10.1.0.4]:9975 - 27f888d8-72f8-4f7e-9005-cee7b8814448
]

2023-06-17 10:27:36     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Repartitioning cluster data. Migration tasks count: 467
2023-06-17 10:27:36     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] All migration tasks have been completed. (repartitionTime=Sat Jun 17 10:27:36 UTC 2023, plannedMigrations=467, completedMigrations=467, remainingMigrations=0, totalCompletedMigrations=1167)



Second node trace (starting)
----------------------------------------------

2023-06-17 10:27:27     [LOCAL] [ventusproxyCluster] [4.2.6] Interfaces is enabled, trying to pick one address matching to one of: [10.1.0.5, 10.1.0.5, 10.1.0.4, 10.1.0.4]
2023-06-17 10:27:27     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] Hazelcast 4.2.6 (20221125 - 622d299) starting at [10.1.0.4]:9975
2023-06-17 10:27:29     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] Using TCP/IP discovery
2023-06-17 10:27:29     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
2023-06-17 10:27:29     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2023-06-17 10:27:29     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] [10.1.0.4]:9975 is STARTING
2023-06-17 10:27:29     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] Initialized new cluster connection between /10.1.0.4:52409 and /10.1.0.5:9975
2023-06-17 10:27:35     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6]

Members {size:2, ver:4} [
        Member [10.1.0.5]:9975 - 4f3acadc-2e05-4198-bd4f-035c99e8b967
        Member [10.1.0.4]:9975 - 27f888d8-72f8-4f7e-9005-cee7b8814448 this
]

2023-06-17 10:27:36     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] [10.1.0.4]:9975 is STARTED


Thanks,

Joan.

Joan Balagueró

Jun 17, 2023, 6:44:53 AM
to Hazelcast
Hi again Josef,

OK, looking at the traces above, I now see that other ports are being used:

Port 37933:
2023-06-17 10:26:18     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Connection[id=1, /10.1.0.5:9975->/10.1.0.4:37933, qualifier=null, endpoint=[10.1.0.4]:9975, alive=false, connectionType=MEMBER, planeIndex=0] closed. Reason: Connection closed by the other side

Port 52409:
2023-06-17 10:27:29     [10.1.0.5]:9975 [ventusproxyCluster] [4.2.6] Initialized new cluster connection between /10.1.0.5:9975 and /10.1.0.4:52409
And:
2023-06-17 10:27:29     [10.1.0.4]:9975 [ventusproxyCluster] [4.2.6] Initialized new cluster connection between /10.1.0.4:52409 and /10.1.0.5:9975

The point is that all ports between the live servers are closed and only the necessary ones are open, in our case just 9975 (that's why it works in test: there are no port restrictions there).

So, must I also open 37933 and 52409? And where can I configure these ports? Are they always the same, or do they change on every stop/start?

Thanks,

Joan.

Joan Balagueró

Jun 17, 2023, 6:16:21 PM
to Hazelcast
Hi,

I think the issue is related to the outbound ports. I have fixed this port to just one value ... I will try it in production next Monday.
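For reference, this is roughly what the change looks like (a sketch; the port value below is only an example, and a small range could be used instead of a single port):

 // Restrict the local ports Hazelcast uses for outgoing member connections,
 // instead of letting the OS pick an ephemeral port (like 37933 or 52409 in the traces above).
 cfg.getNetworkConfig().addOutboundPortDefinition("9976");
 // or a range: cfg.getNetworkConfig().addOutboundPortDefinition("9976-9980");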

Thanks,

Joan.
