Help with Vertx clustered deploy in AWS

Tomas Riha

Mar 14, 2014, 3:45:37 AM
to ve...@googlegroups.com
Hi,

I've just recently started to use Vert.x and so far I love it! But last night I ran into some problems with my deploy on AWS.

This is what I want to do.

I have two modules, some_module_storage and some_module_rest_service. I want to use the EventBus as the interface between the storage and my services; the REST service is just the first of many on the same storage. My plan is to deploy the service modules and the storage module on separate nodes so that they can be scaled horizontally and independently, so I really do need to cluster the EventBus. I also like the fatJar deployment, as it makes it easy for me to upgrade Vert.x and distribute. I bake my AWS images for each release of a new module, and it's very nice not to have to bother with an extra layer of dependency management between the module and Vert.x.

As I understand it, Amazon does not support multicast, so I need to reconfigure my cluster.xml. I have done this and am currently distributing the cluster.xml separately. My first question: can I put the cluster.xml inside the fatJar? If so, where?

When I run with a dedicated IP configuration in the cluster.xml, the application clusters itself. But I obviously don't want to do this, as I don't know the AWS node IPs; they change on each deploy.

So I want to use the aws configuration. I do this and turn the multicast and dedicated IP configs off by setting them to false. Now nothing happens.

What I realized next is that hazelcast-cloud.jar is not bundled with Vert.x.

I did add the hazelcast-cloud dependency with version 2.6.6. Is that the correct version to add? Where should it be added in the fatJar?

When I did this I got a class-not-found exception. I know I should say which one, but I don't have that log entry where I am at the moment.

So here is my next question: what else do I need to add in the form of dependencies in order to get this to work?

Also, I think it would be really nice if the Vert.x documentation covered how to deploy into AWS properly. So far I've had to search the documentation, this forum and the Hazelcast forum to get to where I am. That is IMHO way too fragmented for getting Vert.x to cluster in AWS, which can't really be that uncommon. If you help me get this up and running, I'll help compile some documentation on how I got it to work.

Thanks in advance.
Tomas

mor...@citizenme.com

Mar 14, 2014, 7:03:21 AM
to ve...@googlegroups.com
Hi Tomas,

What I think you need is to use a framework like Chef or Puppet for this sort of stuff (distribution and configuration management). That's what I have done - and it works really well. I have created a cookbook for vert.x (doesn't include fatjar - instead it downloads and installs the vertx binary separately, deploys the apps from my own Nexus using Chef "data bags" as the driver to configuration). You can find my Chef cookbook here: https://github.com/citizenme/chef-repo/tree/master/cookbooks/vertx.

For a quick entry into Chef, for instance, you can consider using http://aws.amazon.com/opsworks/.

Alternatively, simply consider scripting it (download of vert.x, deployment and fixing of configuration) as part of a deployment script that you distribute - use the AWS API to extract the information you need and update the configuration files. Bundling configuration for servers (and topology) that you effectively know nothing about at compile/build time is a really bad idea. Much better to use the right tools for this - like Chef or Puppet.

> I did add the hazelcast-cloud dependency with version 2.6.6. Is that the correct version to add? Where should it be added in the fatJar?

I have yet to build the automated download of the hazelcast-cloud jar into my vert.x cookbook - but it will happen real soon. If you look in the lib directory of a vert.x installation (I imagine that a fatjar "zip" contains a lib directory too) you will find a hazelcast-X.Y.Z.jar file. Make sure the version of hazelcast-cloud-X.Y.Z.jar matches it - in my case I'm on Vert.x 2.1RC1, which comes with hazelcast-2.6.7.jar, so it must be matched up with hazelcast-cloud-2.6.7.jar.
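The version matching described above can be checked mechanically before baking an image. The following is only a sketch using the Java standard library: the class and method names are made up for illustration, and it assumes the jars follow the `hazelcast-X.Y.Z.jar` / `hazelcast-cloud-X.Y.Z.jar` naming mentioned in this post.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares the versions embedded in two Hazelcast jar file names. */
final class HazelcastJarVersions {

    /** Returns the X.Y.Z part of "<artifact>-X.Y.Z.jar", or empty if the name does not fit. */
    static Optional<String> versionOf(String artifact, String jarFileName) {
        Pattern p = Pattern.compile("^" + Pattern.quote(artifact) + "-(\\d+(?:\\.\\d+)*)\\.jar$");
        Matcher m = p.matcher(jarFileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True only when both names parse and carry exactly the same version string. */
    static boolean versionsMatch(String coreJar, String cloudJar) {
        Optional<String> core = versionOf("hazelcast", coreJar);
        Optional<String> cloud = versionOf("hazelcast-cloud", cloudJar);
        return core.isPresent() && core.equals(cloud);
    }

    public static void main(String[] args) {
        // The pair from this post (matching) and the 2.6.6 jar from the question (mismatch).
        System.out.println(versionsMatch("hazelcast-2.6.7.jar", "hazelcast-cloud-2.6.7.jar")); // true
        System.out.println(versionsMatch("hazelcast-2.6.7.jar", "hazelcast-cloud-2.6.6.jar")); // false
    }
}
```

A build task could run such a check over the file names in the lib directory and fail the bake when the two versions differ.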

> So I want to use the aws configuration. I do this and turn multicast and dedicated ip configs off by setting them to false. Now nothing happens.

Yeah, that's because hazelcast-cloud.jar isn't on the class path - the log file kind of gives you a hint about this, but it's not entirely obvious.

I strongly suggest that you configure the AWS part such that you tag your cluster - in this way you can have multiple individual & independent clusters of vert.x apps running in AWS. Also, make sure you set the region to the correct string or hazelcast-cloud will not find other nodes.

For example, I'm in Ireland and use eu-west-1. The tag on my clustered vert.x instances is as follows:

vertx-cluster = app-cluster-1

This is reflected in cluster.xml as:

    <network>
        <port auto-increment="true">5701</port>
        <join>
            <aws enabled="true">
                <access-key>access-key</access-key>
                <secret-key>secret-key</secret-key>
                <region>eu-west-1</region>
                <tag-key>vertx-cluster</tag-key>
                <tag-value>app-cluster-1</tag-value>
            </aws>
        </join>
....
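Since the keys, region and tag are only known at deploy time, the `<aws>` element can be generated by the deployment script rather than baked into the build. Here is a rough sketch in plain Java that renders the same elements as the fragment above; the class name and the environment variable names are assumptions for illustration, not anything Vert.x or Hazelcast defines.

```java
/** Hypothetical helper: renders the <aws> join element of cluster.xml from deploy-time values. */
final class AwsJoinConfig {

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the same child elements as the fragment in this post, with the values filled in. */
    static String render(String accessKey, String secretKey, String region,
                         String tagKey, String tagValue) {
        return String.join("\n",
                "<aws enabled=\"true\">",
                "    <access-key>" + escape(accessKey) + "</access-key>",
                "    <secret-key>" + escape(secretKey) + "</secret-key>",
                "    <region>" + escape(region) + "</region>",
                "    <tag-key>" + escape(tagKey) + "</tag-key>",
                "    <tag-value>" + escape(tagValue) + "</tag-value>",
                "</aws>");
    }

    /** Reads a required environment variable and fails loudly when it is missing. */
    private static String env(String name) {
        String value = System.getenv(name);
        if (value == null) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Assumed variable names; use whatever your deploy tooling actually provides.
        System.out.println(render(env("HZ_AWS_ACCESS_KEY"), env("HZ_AWS_SECRET_KEY"),
                env("HZ_AWS_REGION"), env("HZ_AWS_TAG_KEY"), env("HZ_AWS_TAG_VALUE")));
    }
}
```

The output still has to be placed inside the `<join>` element of a complete cluster.xml, with the multicast and tcp-ip sections disabled as described earlier in the thread.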


> If you help me get this up and running I'll help compile some documentation on how I got it to work.

I have yet to complete the docs for my vert.x Chef cookbook, but once I have them in place I'll let the list know.

Hope this helps.

Best regards,
Morten Jensen

Tomas Riha

Mar 14, 2014, 1:57:48 PM
to ve...@googlegroups.com
Here is the exception I was talking about.

Vert.x 2.1RC1 and hazelcast-cloud.jar 2.6.7 built into the fatJar (the extra jar is in the root lib dir of the fatJar). The cluster.xml is outside and the -cp points to its directory.

Failed to run fat jar
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.vertx.java.platform.impl.FatJarStarter.go(FatJarStarter.java:197)
        at org.vertx.java.platform.impl.FatJarStarter.main(FatJarStarter.java:59)
Caused by: java.lang.NoClassDefFoundError: com/hazelcast/core/MembershipListener
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(Unknown Source)
        at java.security.SecureClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.access$100(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at org.vertx.java.spi.cluster.impl.hazelcast.HazelcastClusterManagerFactory.createClusterManager(HazelcastClusterManagerFactory.java:31)
        at org.vertx.java.core.impl.DefaultVertx.<init>(DefaultVertx.java:117)
        at org.vertx.java.platform.impl.DefaultPlatformManager.createVertxSynchronously(DefaultPlatformManager.java:135)
        at org.vertx.java.platform.impl.DefaultPlatformManager.<init>(DefaultPlatformManager.java:115)
        at org.vertx.java.platform.impl.DefaultPlatformManagerFactory.createPlatformManager(DefaultPlatformManagerFactory.java:33)
        at org.vertx.java.platform.impl.cli.Starter.createPM(Starter.java:211)
        at org.vertx.java.platform.impl.cli.Starter.startPM(Starter.java:255)
        at org.vertx.java.platform.impl.cli.Starter.runVerticle(Starter.java:275)
        at org.vertx.java.platform.impl.cli.Starter.<init>(Starter.java:91)
        at org.vertx.java.platform.impl.cli.Starter.main(Starter.java:55)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: com.hazelcast.core.MembershipListener
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 27 more

Tomas Riha

Mar 14, 2014, 2:22:40 PM
to ve...@googlegroups.com
Hi Morten,

Thanks for your answer.

I'm actually using Packer instead of Chef. It lets me do what I would do with Chef, plus build the AMI. Basically I build with Gradle and I've got two extra tasks, one to bake and one to launch. Bake uses Packer to build the AMI, and then I use Netflix Asgard to push the image into my auto scaling group. Works pretty sweet if you ask me.

Tomas

mor...@citizenme.com

Mar 14, 2014, 2:43:32 PM
to ve...@googlegroups.com
Hi Tomas,

This is a class inside hazelcast:

Caused by: java.lang.NoClassDefFoundError: com/hazelcast/core/MembershipListener

It exists in hazelcast-2.6.7.jar - did this file by any chance disappear from your fatJar (or is it not packaged in it?) - both hazelcast-2.6.7.jar and hazelcast-cloud-2.6.7.jar must be part of the class path.
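A quick way to test whether a given class is visible on a class path is to try loading it. The sketch below uses only the standard library; the class name `ClasspathCheck` is made up, and the default class to probe is simply the one named in the stack trace earlier in this thread.

```java
/** Hypothetical diagnostic: reports whether named classes can be loaded from the class path. */
final class ClasspathCheck {

    /** Returns true when the class can be found, without running its static initializers. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when a superclass is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class from the stack trace; pass other names as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"com.hazelcast.core.MembershipListener"};
        for (String name : names) {
            System.out.println((isPresent(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Running it with the same jars on `-cp` that the application uses should show whether the core Hazelcast jar dropped out of the packaging.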

Morten

Tomas Riha

Mar 14, 2014, 3:24:36 PM
to ve...@googlegroups.com
Hmm, so it seems that when I add the dependency on hazelcast-cloud, the hazelcast artifact disappears. Strange. Anyway, I solved it by adding the hazelcast-all jar.

Next question.

In my local cluster on my dev machine, sending and receiving messages over the EventBus works just fine. But now that I have finally clustered it correctly in AWS and the nodes do find each other, my registered handler does not pick the message up. Any thoughts?

Thanks again
Tomas

mor...@citizenme.com

Mar 14, 2014, 3:49:13 PM
to ve...@googlegroups.com
> Anyways so I solved it by adding the hazelcast-all jar

Yes, that's another way of doing it. A recent thread had a discussion with the maintainers about either adding hazelcast-cloud.jar or replacing the existing hazelcast.jar with hazelcast-all.jar - they were not eager because they feel few people run an AWS configuration. They want to keep libs at an absolute minimum to reduce the footprint. I can agree with this in general - but would have loved to see hazelcast-all.jar in there, as it would solve a few issues (for instance also for fatjars in AWS).

> my registered handler does not pick it up. 

Have you checked all nodes in your cluster?

Also, try reducing your AWS "cluster" to one node - simply shut them all down, start one up - and try again.

It works fine on my end... And I have experimented with having 1 verticle (a homebrew http-to-event-bus proxy) on one node and 1 verticle (an event-bus receiver) on another node - and communication works really well even between nodes.

Thanks.
Morten

Tomas Riha

Mar 16, 2014, 2:46:07 PM
to ve...@googlegroups.com
I'm still confused by this.

I have only two nodes in the cluster: the rest-service node and the backend-service node.

I do a "send" to the event bus. What I'm wondering (and should test, but would like some theory around) is whether the message I send from the rest-service node ends up "stuck" in that node and is not transported to the other node that's listening for it. If that's so, should I use "publish" instead?
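On the theory side: as far as I know, "send" on the Vert.x event bus is point-to-point (each message goes to one handler registered at the address, picked round-robin) while "publish" goes to every registered handler, and in a clustered setup both cross node boundaries. The toy model below only illustrates that difference in delivery semantics. It is NOT the Vert.x API; `ToyBus` and its methods are invented for this sketch and run in a single JVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory model of send vs publish. Not Vert.x code; names are made up. */
final class ToyBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    private final Map<String, Integer> sendCount = new HashMap<>();

    /** Registers a handler at an address; several handlers may share one address. */
    void register(String address, Consumer<String> handler) {
        handlers.computeIfAbsent(address, a -> new ArrayList<>()).add(handler);
    }

    /** Point-to-point: one handler gets the message, chosen round-robin. False if none registered. */
    boolean send(String address, String message) {
        List<Consumer<String>> list = handlers.get(address);
        if (list == null || list.isEmpty()) {
            return false;
        }
        int index = sendCount.merge(address, 1, Integer::sum) - 1;
        list.get(index % list.size()).accept(message);
        return true;
    }

    /** Publish/subscribe: every handler at the address gets the message. Returns how many did. */
    int publish(String address, String message) {
        List<Consumer<String>> list = handlers.getOrDefault(address, new ArrayList<>());
        list.forEach(h -> h.accept(message));
        return list.size();
    }
}
```

If that reading is right, switching to "publish" would not help a message that never reaches the other node; the handler count per address only matters once transport between the nodes works.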

Tomas

Tomas Riha

Mar 16, 2014, 3:01:46 PM
to ve...@googlegroups.com
Also, I'm wondering:

Could it be that my security group (SG) is poorly configured, so that the nodes can join the cluster but not communicate with each other?

What ports do you have open in your SG?

Thanks again
Tomas

mor...@citizenme.com

Mar 17, 2014, 5:24:12 AM
to ve...@googlegroups.com
Hi Tomas,

I run multiple services on my AWS nodes (Cassandra + Vert.x) so what I did was configure all nodes in my cluster to have full access to all ports to all other servers in that security group (only). This means that dynamically allocated port ranges like in the case of Vert.x port ranges (5701-57XX...) are not blocked out.

This should be a perfectly reasonable set-up - I have my own 3 private subnets spanning 3 availability zones in a VPC config. I have separate public subnets (with an IGW attached) for elastic load balancing (1 in each AZ - for access to the private subnets) with a separate SG that has access to ports 443/80 on the servers.

So... All Cassandra + Vert.x nodes in the same SG: sg-12345678 - and I have an inbound rule for the hosts in the same SG: 

Type: All traffic
Protocol: All
Port range: All
Source: sg-12345678 (<group tag>)
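To see whether a security group actually lets traffic through between two nodes, a plain TCP connect test from one node to the other is often enough. The following is a minimal sketch with the standard library; `PortProbe` is a made-up name, and which ports to probe is an assumption (for example the Hazelcast port 5701 from cluster.xml, plus whatever port the event bus uses on the other node).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic: checks whether a TCP connection to host:port can be opened. */
final class PortProbe {

    /** Returns true when a TCP connection succeeds within the timeout, false otherwise. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: from here they all mean "not usable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe <host> <port> [<port> ...]
        String host = args[0];
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            System.out.println(host + ":" + port + " "
                    + (canConnect(host, port, 2000) ? "reachable" : "NOT reachable"));
        }
    }
}
```

If the Hazelcast port is reachable but the event bus port is not, that would be one possible explanation for nodes joining the cluster while messages never arrive.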

Morten

Nick Scavelli

Mar 17, 2014, 9:53:07 AM
to ve...@googlegroups.com
Remember that there are also the -cluster-host and -cluster-port run options in vert.x. So if your nodes are joining the hazelcast cluster, you might want to make sure these options are set up correctly.
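On an EC2 node the value for -cluster-host would typically be the instance's private address, which other nodes in the same VPC can reach. As a rough sketch (standard library only, class name made up), the first site-local IPv4 address of the machine can be looked up like this; which interface is the right one on a given instance is an assumption you should verify.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

/** Hypothetical helper: finds a private (site-local) IPv4 address to use as the cluster host. */
final class ClusterHost {

    /** Returns the first site-local IPv4 address on an interface that is up, if there is one. */
    static Optional<String> firstSiteLocalIPv4() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return Optional.empty();
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address && addr.isSiteLocalAddress()) {
                        return Optional.of(addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            // Treat an unreadable interface list the same as "nothing found".
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstSiteLocalIPv4().orElse("no site-local IPv4 address found"));
    }
}
```

A start script could then pass the result along, for example `-cluster -cluster-host <address> -cluster-port <port>`, with a fixed port that the security group allows.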

Arik Bron

Apr 17, 2016, 12:38:01 PM
to vert.x
Was this problem resolved?

If yes - what was the solution?

I am running into a problem with clustering event bus on AWS...

Tomas Riha

Apr 18, 2016, 3:48:12 AM
to ve...@googlegroups.com
Yes, I did, but I don't remember how; I'm not using Vert.x at the moment.

But I think the key was to read the Hazelcast documentation on how Hazelcast should be clustered on AWS, and to remove all config other than the AWS-specific part. I also have some memory of having to tag my instances correctly, as Hazelcast uses the AWS APIs to find the IPs of cluster members.

Sorry for not being much help; it was some time ago. I still wish we had gone for Vert.x at our company, but politics won. 😕


Arik Bron

Apr 27, 2016, 9:20:00 AM
to vert.x
I recently redeployed Vert.x into AWS as a cluster. AWS supports auto-detection of cluster nodes. There is also a thread in which I posted some of my findings and others responded to it.

Let me know whether you still need help.

Jaye Jung

Sep 26, 2016, 5:47:23 AM
to vert.x
Hi, Arik.

I was trying to deploy Vert.x in AWS as a cluster. However, the Vert.x nodes in AWS couldn't detect each other automatically.
Would you help me? I couldn't find the solution you posted.

Thanks.

On Wednesday, April 27, 2016 at 10:20:00 PM UTC+9, Arik Bron wrote:

Clement Escoffier

Sep 26, 2016, 5:55:08 AM
to ve...@googlegroups.com
Hi,

If you are using the Hazelcast cluster manager, configure it as indicated in https://hazelcast.com/resources/amazon-ec2-deployment-guide/ and http://docs.hazelcast.org/docs/3.5/manual/html/ec2.html.

Alternatively, you can use the Zookeeper cluster manager. In this case, you just need an EC2 instance running Zookeeper. This instance must be known to all the other nodes (let's say its IP:port is set in the ZOOKEEPER_SERVICE_HOST env variable), which then use it:
java -Dvertx.zookeeper.hosts=${ZOOKEEPER_SERVICE_HOST} -jar your-fat-jar.jar -cluster

Clement



jklingsporn

Aug 21, 2020, 1:55:40 PM
to vert.x
Oldie but goldie. I am currently facing similar issues (nodes only detect each other after a while, the lite-member feature does not work). If anyone could share their configuration/setup I'd be very happy. I'm also in discussion with the Hazelcast guys: https://github.com/hazelcast/hazelcast-aws/issues/69#issuecomment-678344863