Hazelcast member discovery taking up to 9 minutes


Akashdeep kaur

Nov 22, 2022, 8:06:15 AM
to Hazelcast
Hello,

We are using Hazelcast 3.12.7 in embedded mode, with the embedded Hazelcast members managed by Kubernetes Deployments.

We use the Kubernetes API discovery mechanism for member discovery, and it is taking 8-9 minutes for the members to find each other.

Is that normal? And is there any way to reduce the discovery time?

Thanks!

Neil Stevenson

Nov 22, 2022, 11:04:08 AM
to Hazelcast
Hi,
A StatefulSet is recommended over a Deployment; see https://github.com/hazelcast/hazelcast-kubernetes#requirements-and-recommendations

I just tried using 5.2.0 and DNS. Pod-1 in a StatefulSet found pod-0 in 7 seconds.

An upgrade is worth doing if you can, as there has been other work on the Kubernetes plugin since 3.12.7, and what you're seeing might be a known and already fixed fault.
It's at least worth a try in Dev/Test to see if upgrading sorts out the discovery time.

Neil

Akashdeep kaur

Nov 24, 2022, 5:20:17 AM
to Hazelcast
Hi Neil,

Thanks a lot for your response!

We have tried Hazelcast 5.2.0 with a StatefulSet and DNS lookup as the discovery mechanism.

Still, the Hazelcast members took around 7 minutes to discover each other. I'm attaching the Hazelcast TRACE logs and the Hazelcast configuration for your reference.

Thanks!



TRACE-logs-pod-0.log
TRACE-logs-pod-1.log
hazelcast-configuration.xml

Neil Stevenson

Nov 24, 2022, 11:21:56 AM
to Hazelcast
You are running two Hazelcast instances in each container (a "DMPManager-manager" and a "dev"). If this is deliberate, it would be better to run them on different port ranges so they don't interfere with each other.

The 'service-dns' in your XML doesn't match the logs, but perhaps you're amending it in Java. Other than that, I can't spot anything immediately wrong.

Can you try without setting the namespace?
Since the slowness is in the DNS lookup, it may be something specific to namespaces; I use default.

It's also worth trying without those 4 properties from the config in case they are causing issues (though "hazelcast.logging.type" is pretty safe).
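
One rough way to check whether name resolution itself is slow is to time a single lookup of the service name from inside the pod. The sketch below uses only the JDK resolver, which is not necessarily the same code path the Hazelcast Kubernetes plugin uses for its own lookup, so treat the result as an indication only. The class name `DnsLookupTimer` is hypothetical, and the default name is a placeholder to be replaced with your own 'service-dns' value.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsLookupTimer {

    /** Resolves the name once, prints the addresses found, and returns the elapsed milliseconds. */
    static long timeLookupMillis(String name) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress[] addresses = InetAddress.getAllByName(name);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        for (InetAddress address : addresses) {
            System.out.println("  " + address.getHostAddress());
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder name (assumption): pass your real service-dns as the first argument.
        String serviceDns = args.length > 0 ? args[0] : "yourappname.default.svc.cluster.local";
        try {
            long elapsedMs = timeLookupMillis(serviceDns);
            System.out.println("Resolved " + serviceDns + " in " + elapsedMs + " ms");
        } catch (UnknownHostException e) {
            System.out.println("Could not resolve " + serviceDns + ": " + e.getMessage());
        }
    }
}
```

The JVM caches successful lookups, so only the first lookup in a fresh process is a meaningful measurement.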
 
Neil

Akashdeep kaur

Nov 29, 2022, 7:25:38 AM
to Hazelcast
I have removed the [dev] Hazelcast instance, so only one instance is now running in each container. The Hazelcast instances are still taking a long time to join the cluster. I have attached the logs and the Hazelcast configuration used.

We have used a StatefulSet instead of a Deployment, and DNS lookup instead of Kubernetes API discovery, as suggested above.

FYI: I have also tried removing these 4 properties from the configuration. With them removed, the members didn't join even after 15 minutes, so I added them back to the configuration:
<properties>
    <property name="hazelcast.mancenter.enabled">false</property>
    <property name="hazelcast.wait.seconds.before.join">0</property>
    <property name="hazelcast.logging.type">slf4j</property>
    <property name="hazelcast.discovery.enabled">true</property>
</properties>
Archive.zip

Neil Stevenson

Nov 29, 2022, 1:02:26 PM
to Hazelcast
Are the clocks on the Kubernetes nodes synchronised?

Based on the timestamps, 10.244.3.64 starts first:
2022-11-29 04:31:19,457 [localhost-startStop-1] INFO  c.h.l.Slf4jFactory$Slf4jLogger.65 - [10.244.3.64]
2022-11-29 04:32:56,138 [localhost-startStop-1] INFO  c.h.l.Slf4jFactory$Slf4jLogger.65 - [10.244.1.53]
but in the member list it's second:
Members {size:2, ver:2} [
        Member [10.244.1.53]:5701 - 57f807d4-943c-492d-bad5-77acb186dea3 this
        Member [10.244.3.64]:5701 - f8ef8854-9fc8-4734-8ddd-d1ef3405fa6a
]
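
To put a number on the gap between the two start-up lines quoted above, the timestamps can be parsed and subtracted. This is a minimal sketch, assuming both pods log in the same timezone with the `yyyy-MM-dd HH:mm:ss,SSS` layout seen in the excerpts; the class name `StartGap` is hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class StartGap {

    // Timestamp layout used in the log excerpts, with a comma before the milliseconds.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the number of milliseconds from the first timestamp to the second. */
    static long gapMillis(String first, String second) {
        LocalDateTime a = LocalDateTime.parse(first, LOG_TS);
        LocalDateTime b = LocalDateTime.parse(second, LOG_TS);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps copied from the two start-up log lines (10.244.3.64, then 10.244.1.53).
        long gap = gapMillis("2022-11-29 04:31:19,457", "2022-11-29 04:32:56,138");
        System.out.println("Gap between start-up log lines: " + gap + " ms");
    }
}
```

For the two lines quoted, the gap is 96681 ms, so the logs put the start of 10.244.1.53 roughly 97 seconds after 10.244.3.64. That comparison is only meaningful if the node clocks agree.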

The logs show other things running in the same process (Tomcat, Zipkin, etc.), and it's possible these are consuming resources so that Hazelcast doesn't get enough CPU.

The below would be the minimal networking configuration.
You shouldn't need properties to make it faster. If you do, that's a symptom of the fault, not the cure.
Note that it uses the default namespace instead of 'dmp-system', and that's something to explore.
network:
  join:
    auto-detection:
      enabled: false
    multicast:
      enabled: false
    kubernetes:
      enabled: true
      service-dns: 'yourappname.default.svc.cluster.local'
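
Since the members here are embedded and the configuration may be amended in Java, the same minimal join settings could also be expressed programmatically. This is a sketch against the Hazelcast 5.x `Config` API, assuming the Hazelcast jar is on the classpath; the class name `MinimalKubernetesJoin` is hypothetical and the service name is the same placeholder as in the YAML above.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class MinimalKubernetesJoin {

    public static void main(String[] args) {
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();

        // Mirror the YAML: switch off auto-detection and multicast.
        join.getAutoDetectionConfig().setEnabled(false);
        join.getMulticastConfig().setEnabled(false);

        // Enable Kubernetes discovery in DNS lookup mode.
        // Placeholder service name (assumption): replace with your own headless service.
        join.getKubernetesConfig()
            .setEnabled(true)
            .setProperty("service-dns", "yourappname.default.svc.cluster.local");

        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
        System.out.println(member.getCluster().getMembers());
    }
}
```

No extra properties are set here, in keeping with the advice that discovery should not need tuning properties to be fast.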

Neil
