Issue getting a Hazelcast cluster up and running and accessing it using DNS discovery mode


Alex

Mar 23, 2021, 4:55:01 AM
to vert.x

Hi there,
I am having trouble getting a cluster up and running on Azure Kubernetes Service: I am not able to reach the service from outside. I am using Vert.x 3.9.5 with hazelcast-kubernetes 1.5.1 on Java 11.
What I did so far:

1. Added the dependencies to the pom.xml and created a cluster.xml:

<dependency>
  <groupId>io.vertx</groupId>
  <artifactId>vertx-hazelcast</artifactId>
  <version>${vertx.version}</version>
</dependency>

<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast-kubernetes</artifactId>
  <version>${hazelcast-kubernetes.version}</version>
</dependency>

cluster.xml:

<hazelcast
  xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.12.xsd"
  xmlns="http://www.hazelcast.com/schema/config"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <properties>
    <property name="hazelcast.logging.type">slf4j</property>
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
  <network>
    <join>
      <multicast enabled="false"/>
      <tcp-ip enabled="false" />
      <discovery-strategies>
        <discovery-strategy enabled="true"
                            class="com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy">
          <properties>
            <property name="service-dns">service-hazelcast-server.default.svc.cluster.local</property>
          </properties>
        </discovery-strategy>
      </discovery-strategies>
    </join>
  </network>
</hazelcast>


2. Created a headless service (as described here: https://vertx.io/docs/3.9.6/vertx-hazelcast/java/#_configuring_for_kubernetes)
apiVersion: v1
kind: Service
metadata:
 namespace: default
 name: service-hazelcast-server
spec:
 selector:
   component: service-hazelcast-server
 clusterIP: None
 ports:
 - name: hz-port-name
   port: 5701
   targetPort: 5701
   protocol: TCP


3. Applied the headless service above to the Kubernetes cluster.

service/service-hazelcast-server ClusterIP None <none> 5701/TCP 3d23h

4. Created a deployment for a client
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
        component: service-hazelcast-server
    spec:
      containers:
      - name: api-gateway
        image: SOME_REGISTRY/api_gateway_service/api_gateway_service:16
      restartPolicy: Always


5. And a service for the client
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: api-gateway
  ports:
  - port: 8787
    targetPort: 8787
    name: rest-verticle-port
  - name: hazelcast-port
    port: 5701  
    targetPort: 5701

The client service is deployed correctly and gets an external IP. Let's assume this external IP is 12.34.56.78 with ports 8787:31236/TCP,5701:31667/TCP. My Dockerfile exposes port 8787.

Also, the log shows that the cluster consists of one member (I added the complete log, just in case):

2021-03-22 14:50:00.236 INFO 1 --- [ main] com.tsystems.aqm.gateway.Application : Starting Application v0.0.1-SNAPSHOT on api-gateway-deployment-6c95bc56df-qkw4r with PID 1 (/usr/verticles/api-gateway-service-0.0.1-SNAPSHOT.jar started by root in /usr/verticles)
2021-03-22 14:50:00.240 INFO 1 --- [ main] com.tsystems.aqm.gateway.Application : No active profile set, falling back to default profiles: default
2021-03-22 14:50:01.238 INFO 1 --- [ main] com.tsystems.aqm.gateway.Application : Started Application in 1.575 seconds (JVM running for 2.163)
2021-03-22 14:50:01.544 WARN 1 --- [worker-thread-0] c.h.instance.HazelcastInstanceFactory : Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results.
Arguments to be used: --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
2021-03-22 14:50:01.604 INFO 1 --- [worker-thread-0] com.hazelcast.instance.AddressPicker : [LOCAL] [dev] [3.12.5] Prefer IPv4 stack is true, prefer IPv6 addresses is false
2021-03-22 14:50:01.616 INFO 1 --- [worker-thread-0] com.hazelcast.instance.AddressPicker : [LOCAL] [dev] [3.12.5] Picked [10.244.2.14]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
2021-03-22 14:50:01.645 INFO 1 --- [worker-thread-0] com.hazelcast.system : [10.244.2.14]:5701 [dev] [3.12.5] Hazelcast 3.12.5 (20191210 - 294ff46) starting at [10.244.2.14]:5701
2021-03-22 14:50:01.645 INFO 1 --- [worker-thread-0] com.hazelcast.system : [10.244.2.14]:5701 [dev] [3.12.5] Copyright (c) 2008-2019, Hazelcast, Inc. All Rights Reserved.
2021-03-22 14:50:01.896 INFO 1 --- [worker-thread-0] c.h.s.i.o.impl.BackpressureRegulator : [10.244.2.14]:5701 [dev] [3.12.5] Backpressure is disabled
2021-03-22 14:50:02.372 INFO 1 --- [worker-thread-0] c.h.s.d.integration.DiscoveryService : [10.244.2.14]:5701 [dev] [3.12.5] Kubernetes Discovery properties: { service-dns: service-hazelcast-server.default.svc.cluster.local, service-dns-timeout: 5, service-name: null, service-port: 0, service-label: null, service-label-value: true, namespace: default, resolve-not-ready-addresses: false, use-node-name-as-external-address: false, kubernetes-api-retries: 3, kubernetes-master: https://kubernetes.default.svc}
2021-03-22 14:50:02.375 INFO 1 --- [worker-thread-0] c.h.s.d.integration.DiscoveryService : [10.244.2.14]:5701 [dev] [3.12.5] Kubernetes Discovery activated with mode: DNS_LOOKUP
2021-03-22 14:50:02.438 INFO 1 --- [worker-thread-0] com.hazelcast.instance.Node : [10.244.2.14]:5701 [dev] [3.12.5] Activating Discovery SPI Joiner
2021-03-22 14:50:02.683 INFO 1 --- [worker-thread-0] c.h.s.i.o.impl.OperationExecutorImpl : [10.244.2.14]:5701 [dev] [3.12.5] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
2021-03-22 14:50:02.686 INFO 1 --- [worker-thread-0] c.h.internal.diagnostics.Diagnostics : [10.244.2.14]:5701 [dev] [3.12.5] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2021-03-22 14:50:02.690 INFO 1 --- [worker-thread-0] com.hazelcast.core.LifecycleService : [10.244.2.14]:5701 [dev] [3.12.5] [10.244.2.14]:5701 is STARTING
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.hazelcast.internal.networking.nio.SelectorOptimizer (jar:file:/usr/verticles/api-gateway-service-0.0.1-SNAPSHOT.jar!/BOOT-INF/lib/hazelcast-3.12.5.jar!/) to field sun.nio.ch.SelectorImpl.selectedKeys
WARNING: Please consider reporting this to the maintainers of com.hazelcast.internal.networking.nio.SelectorOptimizer
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-03-22 14:50:03.117 WARN 1 --- [worker-thread-0] c.h.s.d.integration.DiscoveryService : [10.244.2.14]:5701 [dev] [3.12.5] Cannot fetch the current zone, ZONE_AWARE feature is disabled
2021-03-22 14:50:08.336 INFO 1 --- [worker-thread-0] c.h.internal.cluster.ClusterService : [10.244.2.14]:5701 [dev] [3.12.5] Members {size:1, ver:1} [ Member [10.244.2.14]:5701 - 2ef9b0fc-6d74-4df9-8c4d-7dc625dd4f76 this ]
2021-03-22 14:50:08.367 INFO 1 --- [worker-thread-0] com.hazelcast.core.LifecycleService : [10.244.2.14]:5701 [dev] [3.12.5] [10.244.2.14]:5701 is STARTED
2021-03-22 14:50:08.734 INFO 1 --- [worker-thread-3] c.h.i.p.impl.PartitionStateManager : [10.244.2.14]:5701 [dev] [3.12.5] Initializing cluster partition table arrangement...
2021-03-22 14:50:09.354 INFO 1 --- [ntloop-thread-0] com.tsystems.aqm.gateway.Application : Rest Controller Verticle started: e91e44bf-e754-40bc-b92b-168f23d11ef4

I should now be able to call 12.34.56.78:8787/someResource, but this just gives me a timeout. What am I doing wrong? Is there anything missing or badly configured? I would really appreciate any kind of hint.

Thank you in advance.
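
The DNS discovery mode configured in the cluster.xml above relies on the headless service name resolving to one address record per ready pod. The following is a minimal sketch for checking that resolution with the JDK only. The class name HeadlessServiceLookup is hypothetical, the default service name is the one from this post, and the check is only meaningful when run from inside a pod of the same Kubernetes cluster.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Vert.x or Hazelcast.
final class HeadlessServiceLookup {

    /** Resolves every address record for the given name and returns the IPs as strings. */
    static List<String> resolveAll(String dnsName) throws UnknownHostException {
        List<String> ips = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(dnsName)) {
            ips.add(address.getHostAddress());
        }
        return ips;
    }

    public static void main(String[] args) {
        // Default: the service-dns value used in this thread (override via first argument).
        String name = args.length > 0
                ? args[0]
                : "service-hazelcast-server.default.svc.cluster.local";
        try {
            System.out.println(name + " -> " + resolveAll(name));
        } catch (UnknownHostException e) {
            System.out.println("Could not resolve " + name + ": " + e.getMessage());
        }
    }
}
```

If the list contains one pod IP per running member, the discovery side is probably fine, and a cluster of size 1 simply means only one pod matches the selector of the headless service.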

Alex

Mar 23, 2021, 7:52:36 AM
to vert.x
Sorry for the bad formatting of the log in the previous post. Here is a hopefully better formatted log output:

2021-03-22 14:50:09.357  INFO 1 --- [worker-thread-5] c.t.aqm.gateway.APIGatewayVerticle       : API Gateway is running on port 8787

Thomas SEGISMONT

Mar 24, 2021, 5:36:15 AM
to vert.x
Hi

If you can't reach the app from the outside, it's most probably a problem with the ingress configuration, not with Vert.x clustering.

I can't see anything wrong in your clustering config. How many pods have you started? If you started one pod then what we see in the logs is expected.


Alex

Mar 24, 2021, 6:33:34 AM
to vert.x
Hi,
thanks for your reply.
Exactly, I have only one pod running the app. Do you have an idea what needs to be changed or updated in the ingress config? I cannot find anything that really helps me get this to work.
The thing is: if I deploy another app that is not part of the cluster, together with a LoadBalancer service, it is reachable as expected and works smoothly.

Thomas SEGISMONT

Mar 24, 2021, 9:24:30 AM
to vert.x
Sorry, I don't have any experience configuring Azure Kubernetes Service.

I can only recommend double-checking that the server is listening on the right port.
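
One way to double-check this is a plain TCP connect test, run from inside the pod against 127.0.0.1 and then from another pod against the pod IP. The sketch below uses only the JDK. The class name PortCheck is hypothetical, and the ports 8787 and 5701 are the ones mentioned earlier in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Vert.x or Hazelcast.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolvable host all count as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int[] ports = {8787, 5701}; // REST verticle port and Hazelcast port from this thread
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable: " + isListening(host, port, 2000));
        }
    }
}
```

A port that answers on 127.0.0.1 inside the container but not on the pod IP suggests the server is bound to the loopback interface only.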

Alex

Mar 24, 2021, 9:35:00 AM
to vert.x
Thank you anyway! Maybe someone else will read this and has a hint for me...

I suspect that it is somehow a privilege/access problem with Azure Kubernetes Service. My guess:
The api-gateway service is of type LoadBalancer, so it gets its external IP when the service is deployed. But the container was started in its pod before that, so maybe it is missing the clusterHost, which would be the external IP?

Alex

Apr 12, 2021, 11:27:25 AM
to vert.x
Hi there,
I finally managed to get the cluster running. The mistake was a missing port forwarding for Hazelcast in my container. The small things matter, don't they? :)
Unfortunately, I am facing another problem: I cannot exchange messages over the event bus.
Since I am using a separate main() method, I tried to specify the cluster host as follows in each cluster member:

import com.hazelcast.config.Config;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.eventbus.EventBusOptions;
import io.vertx.core.spi.cluster.ClusterManager;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;
import java.net.InetAddress;
import java.net.UnknownHostException;

public static void main(String[] args) {

    // Hazelcast discovery through the DNS name of the headless service
    Config hazelcastConfig = new Config();
    hazelcastConfig.setProperty("hazelcast.discovery.enabled", "true");
    hazelcastConfig.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(false);
    hazelcastConfig.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
    hazelcastConfig.getNetworkConfig().getJoin().getKubernetesConfig().setEnabled(true)
//      .setProperty("namespace", "default")
        .setProperty("service-dns", "clustered-app.default.svc.cluster.local");

    ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);

    // Try to use the local address as the event bus host
    String address = null;
    try {
        address = InetAddress.getLocalHost().getHostAddress();
    } catch (UnknownHostException e) {
        log.info("Unable to retrieve local host address and set it as event bus host. " + e.getMessage());
    }

    VertxOptions options = new VertxOptions()
        .setClusterManager(mgr)
        .setBlockedThreadCheckInterval(960000)
        .setClustered(true)
        // 5701 is also Hazelcast's default member port
        .setEventBusOptions(new EventBusOptions().setPort(5701).setHost(address));

    Vertx.clusteredVertx(options, ar -> {
        if (ar.failed()) {
            log.error("Cannot create vert.x instance : " + ar.cause());
        } else {
            Vertx vertx = ar.result();
            // deploy verticles here...
        }
    });
}


And I start the verticles with:

java -jar MY-FAT-VERTICLE-JAR.jar -cluster -conf config.json

It seems that the settings I applied through VertxOptions.setEventBusOptions(...) have no effect, since the console output on startup shows the pod IP:
INFO: No cluster-host specified so using address 10.244.1.37

What do I need to specify here instead, and how? Perhaps the name of the headless service?
Note: my Kubernetes cluster runs on 3 VMs.

I appreciate any hint, since I am so close to finally getting it up and running. Thanks in advance!

Thomas SEGISMONT

Apr 13, 2021, 4:34:11 AM
to vert.x
Hi,

From the output it seems that the main class for this jar is not your custom class but the Vert.x Launcher class.
You can verify this by inspecting the META-INF/MANIFEST.MF file.

Also, if you have your own main method, the command line args like -cluster or -conf are redundant.

Regards
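
Besides unzipping the jar and reading META-INF/MANIFEST.MF by hand, the Main-Class entry can be read with the JDK's java.util.jar API. This is a minimal sketch, and the class name MainClassInspector is hypothetical. For a fat jar started through the Vert.x launcher, the value is expected to be io.vertx.core.Launcher. With a custom main method, it should be the fully qualified name of your own class.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Vert.x.
final class MainClassInspector {

    /** Returns the Main-Class attribute of a manifest, or null if there is none. */
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Opens the jar at the given path and returns its Main-Class attribute, or null. */
    static String mainClassOfJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return mainClassOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java MainClassInspector.java <path-to-jar>");
            return;
        }
        System.out.println("Main-Class: " + mainClassOfJar(args[0]));
    }
}
```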

Alex

Apr 13, 2021, 8:35:53 AM
to vert.x
Hi back,
thank you so much. It's working now, finally!
Thank you all guys for vertx in general. It is just an amazing framework.
Best
Alex

Thomas SEGISMONT

Apr 13, 2021, 8:46:39 AM
to vert.x
Thanks for the kind words. I'm glad you like it

SONG

Jun 11, 2023, 7:31:14 PM
to vert.x
Is anyone here?

I have the same problem in 2023.

I set up clustering in Kubernetes and the members joined the cluster,
but the event bus instances cannot communicate with each other.

The only difference in my configuration is the event bus configuration, I think.

Here is my config:

Config hazelcastConfig = new Config();
hazelcastConfig.setProperty("hazelcast.discovery.enabled", "true");
hazelcastConfig.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(false);
hazelcastConfig.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
hazelcastConfig.getNetworkConfig().getJoin().getKubernetesConfig().setEnabled(true)
    .setProperty("namespace", "postgres-operator")
    .setProperty("service-dns", "fintech-cluster.postgres-operator.svc.cluster.local");


.setEventBusOptions(new EventBusOptions()
    .setSsl(false)
    .setClusterPublicHost("myapi")
)


The other modules differ only in the string passed to setClusterPublicHost,
which is the same as the pod name.

The reason I use setClusterPublicHost is that
when I use setHost, the address cannot be bound
   (is something else using the same address and port?)
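
A socket can only be bound to an address that belongs to one of the machine's own network interfaces, so one possible cause of such a bind failure is a host name that resolves to some other address (a service IP, for example). Whether that is what happened here is an assumption. The sketch below, with the hypothetical class name LocalAddressCheck and the "myapi" default taken from the config above, checks that with the JDK only.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Vert.x.
final class LocalAddressCheck {

    /** Returns true if the host resolves to the wildcard address or to an address of a local interface. */
    static boolean isLocal(String host) throws UnknownHostException, SocketException {
        InetAddress address = InetAddress.getByName(host);
        return address.isAnyLocalAddress()
                || NetworkInterface.getByInetAddress(address) != null;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "myapi";
        try {
            System.out.println(host + " is a local address: " + isLocal(host));
        } catch (UnknownHostException | SocketException e) {
            System.out.println("Could not check " + host + ": " + e);
        }
    }
}
```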



1. Hazelcast members

Members {size:2, ver:2} [
Member [10.1.104.57]:5701 - 8f0dd3f0-3af5-479f-b8a7-78ac0203499b
Member [10.1.219.107]:5701 - 96739ab1-39af-4360-a552-8164b1c29f69 this
]
These IPs are pod IPs, not service IPs.

2. Logs

15:04:36.999 [vert.x-eventloop-thread-8] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder -- Not connected to server e1202ab7-a4ee-484b-9fd7-4e6b74679d2b - starting queuing
15:04:37.020 [vert.x-eventloop-thread-8] DEBUG io.netty.resolver.dns.DnsQueryContext -- [id: 0x1cc8a94a] WRITE: UDP, [21854: /10.152.183.10:53], DefaultDnsQuestion(data.postgres-operator.svc.cluster.local. IN A)
15:04:37.033 [vert.x-eventloop-thread-8] DEBUG io.netty.resolver.dns.DnsNameResolver -- [id: 0x1cc8a94a] RECEIVED: UDP [21854: /10.152.183.10:53], DatagramDnsResponse(from: /10.152.183.10:53, to: /0:0:0:0:0:0:0:0:58444, 21854, QUERY(0), NoError(0), RD AA)
DefaultDnsQuestion(data.postgres-operator.svc.cluster.local. IN A)
DefaultDnsRawRecord(data.postgres-operator.svc.cluster.local. 5 IN A 4B)

15:05:37.073 [vert.x-eventloop-thread-8] WARN io.vertx.core.eventbus.impl.clustered.ConnectionHolder -- Connecting to server e1202ab7-a4ee-484b-9fd7-4e6b74679d2b failed


Please help.

Thomas SEGISMONT

Jun 19, 2023, 5:02:52 AM
to ve...@googlegroups.com
Looking at the logs it seems Hazelcast discovery works properly.

I would suggest removing:
.setClusterPublicHost("myapi")

Otherwise the eventbus cannot connect to the actual IP of the receiver pod.
Don't set clusterHost either.

SONG

Jun 21, 2023, 2:27:38 AM
to vert.x
Thank you.

I deleted it, and it works!


