Vert.x in docker containers multi-node issues


Kristopher Cieplak

Mar 17, 2014, 1:35:18 PM3/17/14
to ve...@googlegroups.com
**Posted in Hazelcast as well.**

I am having issues with getting Vert.x, in a cluster configuration, working properly from within docker containers.  This appears to be with the usage of hazelcast to do cluster member detection.  The scenario is as follows:

I have a working situation when the docker containers are all running on a single node using the default multicast configuration.

-->DOCKER-CONTAINER-A(172.17.0.37)
-->DOCKER-CONTAINER-B(172.17.0.38)

In this scenario everything works as expected because multicast works across the docker host network interface (172.17.42.1 in this case). Both containers can reach each other's network interfaces across the host, i.e. 172.17.0.37 can talk to 172.17.0.38. Everyone is happy, and this can scale across one node.

The scenario that I am having difficulty with is where there are multiple hosts involved, thus not sharing the docker host interface.

HOST-A -->DOCKER-CONTAINER-A(172.17.0.37)

HOST-B -->DOCKER-CONTAINER-A(172.17.0.11)

I am using mesos to run containers, so there will also be another level of port redirection, but for my simple test case I am only running one docker container on each node. The containers are exposing port 5701 on the host, i.e. amazo1.ott.qnx.com:5701 is accessible. In this scenario multicast will not work, so I need to move to a static tcp-ip cluster setup. I have modified the cluster.xml configuration to add HOST-A/DOCKER-CONTAINER-A as a cluster member, i.e.

<network>
        <port auto-increment="false">5701</port>
        <join>
            <multicast enabled="false">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
            </multicast>
            <tcp-ip enabled="true">
                <interface>amazo1.ott.qnx.com:5701</interface>
            </tcp-ip>
        </join>
</network>
Here is where the issue happens.

** On HOST-A/DOCKER-CONTAINER-A the following occurs:
[172.17.0.37]:5701 [dev] Hazelcast Community Edition 2.6.6 (20140123) starting at Address[172.17.0.37]:5701
[172.17.0.37]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
[172.17.0.37]:5701 [dev] Address[172.17.0.37]:5701 is STARTING
[172.17.0.37]:5701 [dev]
Members [1] {
Member [172.17.0.37]:5701 this
}
[172.17.0.37]:5701 [dev] 5701 is accepting socket connection from /10.222.108.245:52965
[172.17.0.37]:5701 [dev] 5701 accepted socket connection from /10.222.108.245:52965
[172.17.0.37]:5701 [dev] Wrong bind request from Address[172.17.0.11]:5701! This node is not requested endpoint: Address[amazo1.ott.qnx.com]:5701
[172.17.0.37]:5701 [dev] Connection [/10.222.108.245:52965] lost. Reason: Explicit close

** On HOST-B/DOCKER-CONTAINER-A the following occurs:
[172.17.0.11]:5701 [dev] Hazelcast Community Edition 2.6.6 (20140123) starting at Address[172.17.0.11]:5701
[172.17.0.11]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
[172.17.0.11]:5701 [dev] Address[172.17.0.11]:5701 is STARTING
[172.17.0.11]:5701 [dev] Connecting to possible member: Address[amazo1.ott.qnx.com]:5701
[172.17.0.11]:5701 [dev] 52965 accepted socket connection from amazo1.ott.qnx.com/10.222.108.242:5701
[172.17.0.11]:5701 [dev] Connection [Address[amazo1.ott.qnx.com]:5701] lost. Reason: java.io.EOFException[null]

To me it looks like the "announced" (advertised is probably the correct term) address in Hazelcast is the local address within the docker container, i.e. 172.17.0.11, which is not reachable by HOST-A/DOCKER-CONTAINER-A.

I think for this scenario I need a way to bind to the local interface, i.e. 172.17.0.11, but announce my public address, i.e. amazo2.ott.qnx.com/10.222.108.245. I do not see any way in which that can be accomplished. I have tried, at the "vertx run" level, to use the -cluster-host option, but it expects an address that is local to the docker interfaces.

Any insight/experience would be greatly appreciated.
Kristopher

Nick Scavelli

Mar 17, 2014, 2:57:28 PM3/17/14
to ve...@googlegroups.com
Yeah, this seems very similar to what we experienced getting Vert.x working on OpenShift. I believe Vert.x now supports what you are trying to achieve.

1. Getting hazelcast working

Hazelcast supports a public address option which can be added to cluster.xml. Here is an example of a cluster.xml file used for a scaled vertx app on OpenShift: https://gist.github.com/nscavell/a7727f16c402a1e6e4fa. You will notice on line 19 that we specify the public IP and port. You probably don't have to specify the port since you are using 5701 publicly as well. We also had to set the 3 properties on lines 10-12. You may or may not have to set these; I can provide more details on this if you wish.
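
For reference, the relevant shape of such a cluster.xml is roughly the following. This is only a sketch based on the gist above: the `<public-address>` element name and the property values are taken from that gist and from later posts in this thread, and the addresses are placeholders for your own setup.

```xml
<hazelcast>
    <properties>
        <!-- force Hazelcast to bind only the local (container-internal) address -->
        <property name="hazelcast.local.localAddress">127.10.192.129</property>
        <property name="hazelcast.socket.server.bind.any">false</property>
        <property name="hazelcast.socket.client.bind">false</property>
    </properties>
    <network>
        <port auto-increment="true">5701</port>
        <!-- advertise this address to other members instead of the bind address -->
        <public-address>10.164.105.33:5701</public-address>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <interface>10.164.105.33</interface>
            </tcp-ip>
        </join>
    </network>
</hazelcast>
```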

2. Getting vert.x event bus clustered

Vert.x also supports configuring the local and public addresses so it can bind properly. Here is the relevant output from the vertx process running on OpenShift:

java -Dvertx.cluster.public.host=10.164.105.33 -Dvertx.cluster.public.port=60558 ... org.vertx.java.platform.impl.cli.Starter run server.js -cluster -cluster-port 9123 -cluster-host 127.10.192.129

You'll notice the use of the system properties "vertx.cluster.public.host" and "vertx.cluster.public.port". You can set these system properties by exporting VERTX_OPTS prior to running the vertx command. This tells Vert.x, just as we told Hazelcast, to bind to one address but publish another so other nodes can communicate with us.
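
As a sketch, setting it up looks like this; the addresses and ports are the ones from the OpenShift command above, so substitute your own public and container-local values:

```shell
# public address:port that other cluster members should use to reach this node
export VERTX_OPTS="-Dvertx.cluster.public.host=10.164.105.33 -Dvertx.cluster.public.port=60558"
echo "$VERTX_OPTS"
# then run, binding the event bus to the local (container-internal) address:
# vertx run server.js -cluster -cluster-port 9123 -cluster-host 127.10.192.129
```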

I hope this helps.

Kristopher Cieplak

Mar 17, 2014, 9:40:34 PM3/17/14
to ve...@googlegroups.com
Thanks Nick,

I have briefly tried the configuration options and they seem to have worked; Hazelcast seems happy now. I won't know until tomorrow whether the end-to-end flow works, but it looks very promising.

Kristopher


Kristopher Cieplak

Mar 19, 2014, 7:57:59 PM3/19/14
to ve...@googlegroups.com
Thanks Nick,
With that secret sauce, I am now running vert.x in docker containers on multiple mesos-slaves across multiple nodes, all via Marathon and can scale my instances and have vert.x clustering working.  Awesome!

Nick Scavelli

Mar 20, 2014, 11:26:00 AM3/20/14
to ve...@googlegroups.com
Good deal!

Alex Thurston

Jan 20, 2015, 9:14:27 AM1/20/15
to ve...@googlegroups.com
Nick, hopefully you can help me out as well. I also work with Kris. We still have the same situation, but now I want to run another vertx instance (a new JVM) on the same node. Is this something that is doable? Things appear to be clustered as far as HZ is concerned:

Jan 20 14:03:00 quarter001 quarter: #011Member [vertxmaster001.staging.qnx.altus.bblabs]:5701
Jan 20 14:03:00 quarter001 quarter: #011Member [quarter001.staging.qnx.altus.bblabs]:5701       <- The original vertx JVM
Jan 20 14:03:00 quarter001 quarter: #011Member [quarter001.staging.qnx.altus.bblabs]:5702       <- The new vertx JVM

My second JVM has the same public host but a different public port.

However, when I try to publish to the address that the second JVM verticle has registered, I get nothing.

Nick Scavelli

Jan 21, 2015, 4:54:00 PM1/21/15
to ve...@googlegroups.com
Usually this means that, while Hazelcast is clustered, the event bus net server isn't configured properly; hence you won't see any messages on the event bus.

You can configure this via the -cluster-host and -cluster-port options when you run vertx. Look at my original post where I list the java command that is running; those are all configurations for the event bus net server. Hazelcast configuration is done in cluster.xml.
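
To make that concrete, a sketch of running two JVMs on the same node (the hostname, file name, and ports here are placeholders, not Alex's actual setup): Hazelcast auto-increments 5701/5702 on its own, but each JVM also needs its own event bus bind port and a matching public port.

# first JVM: event bus on 9123
VERTX_OPTS="-Dvertx.cluster.public.host=quarter001.example.com -Dvertx.cluster.public.port=9123" \
  vertx run server.js -cluster -cluster-host 172.17.0.5 -cluster-port 9123

# second JVM on the same node: a different event bus port, and its own public port
VERTX_OPTS="-Dvertx.cluster.public.host=quarter001.example.com -Dvertx.cluster.public.port=9124" \
  vertx run server.js -cluster -cluster-host 172.17.0.5 -cluster-port 9124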

sANTo L

Jan 22, 2015, 5:20:58 AM1/22/15
to ve...@googlegroups.com
Hi Alex,

Maybe you should take a look at the tutorial I posted earlier: https://groups.google.com/forum/#!topic/vertx/zC8lB00PTTU
It describes in detail the steps needed to run Vert.x in a multi-node docker environment.

sANTo

Girish M

Feb 10, 2015, 12:24:26 AM2/10/15
to ve...@googlegroups.com
Hi Nick,

I have come up with a possible simpler approach to achieve multi-node clustering using docker. Described here: https://groups.google.com/forum/#!topic/vertx/i5zW_YVerCo

I have verified it with a simple notification client-server setup, and it is working well. For multi-node we need to add a route from the second vertx node to the first node running the "cluster master".

Request your thoughts.

Thank you,
Girish

Pieter Smit

Jun 8, 2016, 9:06:50 AM6/8/16
to vert.x
hi,

Just want to add my 2 cents on Vert.x 3 + Docker + Rancher. I also struggled to get it working, but here is my solution:

I start my Vertx Cluster in code like so (no extra config files):

        // imports needed (Vert.x 3 + Hazelcast):
        // import com.hazelcast.config.Config;
        // import com.hazelcast.config.InterfacesConfig;
        // import com.hazelcast.config.NetworkConfig;
        // import io.vertx.core.Vertx;
        // import io.vertx.core.VertxOptions;
        // import io.vertx.core.spi.cluster.ClusterManager;
        // import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

        String localIp = System.getProperty("local.ip");
        String publicIp = System.getProperty("public.ip");

        Config hazelcastConfig = new Config();
        hazelcastConfig.setProperty("hazelcast.local.localAddress", localIp);

        // bind Hazelcast to the container-local interface, but advertise the public one
        NetworkConfig networkConfig = hazelcastConfig.getNetworkConfig();
        InterfacesConfig networkInterface = networkConfig.getInterfaces();
        networkInterface.setEnabled(true).addInterface(localIp);
        networkConfig.setPublicAddress(publicIp);

        ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);

        // same split for the event bus: bind to the local address, cluster on port 9111
        VertxOptions options = new VertxOptions().setClusterManager(mgr)
                .setClustered(true).setClusterHost(localIp).setClusterPort(9111);

        Vertx.clusteredVertx(options, res -> {
            if (res.succeeded()) {
                vertx = res.result();
                System.out.println("Clustered on " + localIp + " and external " + publicIp);
                registerHandler();
            } else {
                res.cause().printStackTrace();
            }
        });
I'm running my fat jar from within the container with the following script (I need to wait a while for the interface to become available; I know this is not ideal or correct):

   sleep 5

   #docker's ip address for this container
   MY_LOCAL=`ip addr | grep 172 | awk '{print $2}' | cut -d/ -f1`
   MY_PUBLIC=`ip addr | grep 10.42 | awk '{print $2}' | cut -d/ -f1`

   java -jar -Dlocal.ip=$MY_LOCAL -Dpublic.ip=$MY_PUBLIC client.jar dockerclient


The grep expressions match my server setup, and 9111 is just a random port I chose. The event bus now publishes messages to all the containers in Rancher that have ports 9111 and 5701 exposed. At the moment this will also publish messages across different stacks in Rancher, which is not ideal.
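
To illustrate what those pipelines extract, here is the same awk/cut chain run over a sample `ip addr` line (the address is made up):

```shell
# a sample `ip addr` output line (hypothetical container address)
SAMPLE="    inet 172.17.0.5/16 brd 172.17.255.255 scope global eth0"
# same pipeline as MY_LOCAL above: take field 2, strip the /16 prefix length
MY_LOCAL=$(echo "$SAMPLE" | grep 172 | awk '{print $2}' | cut -d/ -f1)
echo "$MY_LOCAL"   # prints 172.17.0.5
```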

I hope this helps,
Pieter



Ajit Amitav Das

Nov 24, 2016, 3:09:49 AM11/24/16
to vert.x
Hi Nick - can you please describe the 3 properties below? Are they needed? Will there be any issue if the local address is set by default to the docker container IP, i.e. something like this: HOST=$(netstat -nr | grep '^0.0.0.0' | awk '{print $2}')?

I am facing a similar issue when I deploy in AWS with Hazelcast inside docker: though I can discover all the private IPs from docker with a given security-group and tags, it can't identify the peers, and hence the nodes are not joining the cluster across multiple EC2 instances.

<property name="hazelcast.local.localAddress">127.10.192.129</property>
<property name="hazelcast.socket.server.bind.any">false</property>
<property name="hazelcast.socket.client.bind">false</property>