Clustering Vertx

keith.h...@airwallex.com

Oct 17, 2018, 2:26:49 AM
to vert.x
Hi, I have two instances configured in a Hazelcast cluster.

I have also configured the VertxOptions (clustered, clusterHost and clusterPort).

On each instance I have one verticle (A) with a consumer listening on a topic to provide prices.

On each instance I also have one HTTP verticle (B) that calls vertx.eventBus().send(...), and one of the above verticles responds with a price.

When I shut down one of the instances and hit the HTTP endpoint on the other instance, I would have expected the verticle (A) that is still running on the same instance to pick up the event and respond.

Instead I get these two exceptions:

io.vertx.core.eventbus.ReplyException: Timed out after waiting 500(ms) for a reply. address: 1f6b27e9-b567-4531-b341-df44a47e5efb, repliedAddress: blah.blah.price

and 

Oct 17, 2018 5:17:23 PM io.vertx.core.eventbus.impl.clustered.ConnectionHolder
WARNING: Connecting to server localhost:61588 failed
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:61588

Port 61588 is the other instance's port, the one configured as 'clusterPort' in its VertxOptions.

I'm trying to test the resilience of the cluster in case one instance dies unexpectedly or is shut down for an upgrade.

Any advice?

Thanks.

keith.h...@airwallex.com

Oct 17, 2018, 2:57:10 AM
to vert.x
Actually, I lie... 50% of the time it works, the other 50% it does not.

It's as if the instance that is still running still thinks that there are two verticle (A) consumers listening on the topic, when really there is only one: the verticle (A) running on the same instance.
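
The 50/50 pattern is what you would expect if sends are distributed round-robin over a subscriber list that still contains the dead node. Below is a minimal sketch of that idea in plain Java. It is a simplified model, not Vert.x's actual implementation; the class names (StaleSubscriberDemo, Node, RoundRobin) and the node names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleSubscriberDemo {
    // Hypothetical stand-in for a cluster node that may or may not be reachable.
    static final class Node {
        final String name;
        final boolean alive;
        Node(String name, boolean alive) { this.name = name; this.alive = alive; }
    }

    // Round-robin pick over the subscribers registered for one address.
    static final class RoundRobin {
        private final List<Node> subscribers = new ArrayList<>();
        private int next = 0;

        void register(Node n) { subscribers.add(n); }

        Node choose() {
            Node n = subscribers.get(next % subscribers.size());
            next++;
            return n;
        }
    }

    // Sends `count` messages and returns how many were routed to a live node.
    static int deliver(RoundRobin rr, int count) {
        int delivered = 0;
        for (int i = 0; i < count; i++) {
            if (rr.choose().alive) {
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin();
        rr.register(new Node("local", true));
        rr.register(new Node("killed", false)); // stale entry that was never cleaned up
        System.out.println(deliver(rr, 10) + " of 10 sends reached a live consumer");
    }
}
```

In this model every second send is routed to the stale entry, which on a real cluster would show up as a connection failure followed by a reply timeout.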



keith.h...@airwallex.com

Oct 17, 2018, 4:04:24 AM
to vert.x
This seems similar to  https://github.com/vert-x3/vertx-hazelcast/issues/13 which was fixed in  https://github.com/eclipse-vertx/vert.x/pull/1848

Is this correct? But wouldn't that already be fixed in Vert.x version 3.5.4?



keith.h...@airwallex.com

Oct 18, 2018, 2:04:11 AM
to vert.x
Looking closer at the code:

The ConnectionHolder pings/polls for connectivity and removes the 'connection' from the event bus at this point: https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/eventbus/impl/clustered/ConnectionHolder.java#L108

But as the instance died unexpectedly, the subscriptions for the killed instance still exist in the 'subs' map found at: https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/eventbus/impl/clustered/ClusteredEventBus.java#L64

Then, at the following point in the code, it re-adds the serviceId that was just removed at the first link above: https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/eventbus/impl/clustered/ClusteredEventBus.java#L394

This 'subs' map is a Hazelcast distributed map (as mentioned in the issues linked earlier), but nothing has removed those subscriptions, because the instance died.

Also, the ClusterManager.nodeListener.nodeLeft() callback is not being invoked at: https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/impl/HAManager.java#L150

This callback looks like it cleans up the subs map.
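
To make the cleanup step concrete, here is a small plain-Java sketch of what a node-left callback has to do to a map of address to subscribing nodes. It is an assumption-laden model, not the Vert.x or Hazelcast code: SubsCleanupDemo and its methods are hypothetical, and a local TreeMap stands in for the distributed 'subs' map.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SubsCleanupDemo {
    // address -> IDs of nodes with a consumer on that address (local stand-in for 'subs')
    private final Map<String, Set<String>> subs = new TreeMap<>();

    void subscribe(String address, String nodeId) {
        subs.computeIfAbsent(address, a -> new LinkedHashSet<>()).add(nodeId);
    }

    // What a node-left callback needs to do: drop every entry owned by the departed node.
    void nodeLeft(String nodeId) {
        subs.values().forEach(nodes -> nodes.remove(nodeId));
        subs.values().removeIf(Set::isEmpty); // forget addresses with no subscribers left
    }

    Set<String> subscribers(String address) {
        return subs.getOrDefault(address, new LinkedHashSet<>());
    }

    public static void main(String[] args) {
        SubsCleanupDemo demo = new SubsCleanupDemo();
        demo.subscribe("blah.blah.price", "node-1");
        demo.subscribe("blah.blah.price", "node-2");
        System.out.println("before: " + demo.subscribers("blah.blah.price"));
        demo.nodeLeft("node-2"); // if this callback never fires, node-2 stays registered
        System.out.println("after:  " + demo.subscribers("blah.blah.price"));
    }
}
```

If the callback is never invoked, the entry for the dead node stays in the map and the sender keeps selecting it.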

Thomas SEGISMONT

Oct 21, 2018, 7:09:16 PM
to ve...@googlegroups.com
It can take some time before Hazelcast removes the dead node from the cluster view. Can you show the logs after you killed the other nodes?

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/e04866ee-5854-44ad-9d86-1a51301f8619%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

keith.h...@airwallex.com

Oct 21, 2018, 10:28:50 PM
to vert.x
OK, found the issue.

I just assumed that you could add multiple node listeners, but only one is allowed, and HAManager is (or should be) the only registered listener.

I can just add my own HZ member listener on the raw HZ instance to do what I was trying to do...
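
The pitfall described here can be sketched in plain Java: when a manager holds a single listener slot, a second registration silently replaces the first. This is only an illustration under that assumption. SingleSlotClusterManager and this trimmed-down NodeListener interface are hypothetical stand-ins, not the Vert.x types.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleListenerDemo {
    // Hypothetical, trimmed-down listener interface for this sketch.
    interface NodeListener {
        void nodeLeft(String nodeId);
    }

    // Hypothetical manager with one listener slot: registering again overwrites the old listener.
    static final class SingleSlotClusterManager {
        private NodeListener listener;

        void nodeListener(NodeListener l) { this.listener = l; }

        void fireNodeLeft(String nodeId) {
            if (listener != null) {
                listener.nodeLeft(nodeId);
            }
        }
    }

    // Registers two listeners, fires one node-left event, and returns which listeners ran.
    static List<String> run() {
        List<String> calls = new ArrayList<>();
        SingleSlotClusterManager mgr = new SingleSlotClusterManager();
        mgr.nodeListener(id -> calls.add("ha-cleanup:" + id));   // stands in for HAManager's listener
        mgr.nodeListener(id -> calls.add("app-listener:" + id)); // application code, replaces the above
        mgr.fireNodeLeft("node-2");
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(run()); // only the application listener is recorded
    }
}
```

In this model the cleanup listener never runs, which matches the symptom of the node-left callback not being invoked. Attaching a separate membership listener to the underlying Hazelcast instance, as described above, leaves that single slot untouched.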

All is good. Clustering is working nicely now.

Thanks Thomas.

Thomas SEGISMONT

Oct 22, 2018, 6:42:00 AM
to ve...@googlegroups.com
Thanks for letting us know. 
