best way to debug eventbus communications?


Travis Thurber

Jun 13, 2017, 11:56:59 AM
to vert.x
Hello,

I am attempting to deploy a cluster of vertx instances running inside docker containers on Amazon EC2 instances. I have read all the posts I can find here concerning this fun topic, and learned much along the way ( in particular, that eventbus communication does not use the cluster manager, but rather direct tcp connections ). My current situation is this:

By using the Hazelcast AWS discovery service, my instances successfully form a hazelcast cluster and can successfully perform all clustering operations that are managed by hazelcast, for instance sharing session information across the cluster.

All instances are running the same verticle, and make use of the SockJS eventbus client to establish socket connections with web clients, so as to communicate with clients over the eventbus. This is also working, but not quite in the way I would hope.

Say, for instance that I have three instances of my service deployed, let's call them A, B, and C. Hazelcast reports that each instance has successfully joined the cluster, and cluster communications are using port 5701. I have configured the eventbus to always use port 5702, and each instance reports that it is listening on port 5702. Now, web client X establishes a socket connection with instance A. Client X begins sending messages to the eventbus via A, and successfully receives replies. However, only instance A is consuming the eventbus messages, whereas I would hope that the work of consuming the messages would be shared across A, B, and C.

Here are my questions:
  • Say client X sends many many messages to the eventbus simultaneously over a socket connection with A ( send, not publish ). Is it fair to expect consumption of the messages to be shared across all instances on the cluster? Or should I truly expect A to consume all the messages since the socket is with A?
  • It's difficult for me to parse through all the generated logs, but as far as I can tell, none of the instances are reporting any issues with eventbus communication on port 5702. But, they are also not reporting ANYTHING about port 5702 aside from a single log message stating that a listener is established on port 5702 on the expected interface. Can anyone suggest log settings that would target very specifically messages concerning the eventbus?
  • My current hazelcast config very much restricts network settings to those required for AWS discovery. Do any hazelcast config settings inadvertently affect eventbus communications ( for instance, disabling tcp discovery )?

Many thanks for any help that anyone can provide!

Thomas SEGISMONT

Jun 13, 2017, 12:37:32 PM
to ve...@googlegroups.com
Hi

2017-06-13 17:56 GMT+02:00 Travis Thurber <travis....@gmail.com>:

Here are my questions:
  • Say client X sends many many messages to the eventbus simultaneously over a socket connection with A ( send, not publish ). Is it fair to expect consumption of the messages to be shared across all instances on the cluster? Or should I truly expect A to consume all the messages since the socket is with A?

If the same verticle is deployed on all machines, I assume the consumers listen on the same event bus address. So if Vert.x is started in clustered mode, messages should be load balanced.
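A minimal sketch of that pattern (assuming Vert.x 3.x; the address "work.queue" and the reply text are placeholders):

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class WorkerNode {
  public static void main(String[] args) {
    // Start Vert.x in clustered mode; the cluster manager (e.g. Hazelcast)
    // is picked up from the classpath.
    Vertx.clusteredVertx(new VertxOptions(), res -> {
      if (res.succeeded()) {
        Vertx vertx = res.result();
        // A clustered consumer: the subscription is recorded in the
        // cluster-wide __vertx.subs map, so a send() from any node can be
        // routed here, round-robined across all registered consumers.
        vertx.eventBus().consumer("work.queue", msg -> msg.reply("done"));
      }
    });
  }
}
```

If A, B and C each run this, successive send() calls to "work.queue" from any node should be spread across the three instances.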
 
  • It's difficult for me to parse through all the generated logs, but as far as I can tell, none of the instances are reporting any issues with eventbus communication on port 5702. But, they are also not reporting ANYTHING about port 5702 aside from a single log message stating that a listener is established on port 5702 on the expected interface. Can anyone suggest log settings that would target very specifically messages concerning the eventbus?

No, you won't get more info at DEBUG level.
 
  • My current hazelcast config very much restricts network settings to those required for AWS discovery. Do any hazelcast config settings inadvertently affect eventbus communications ( for instance, disabling tcp discovery )?

A custom Hazelcast config must include a few Vert.x object definitions, as described in http://vertx.io/docs/vertx-hazelcast/java/#_using_an_existing_hazelcast_cluster
 



There are two reasons I can think of for your problem:
- your consumers are defined as local consumers so the subscription is not shared between Vert.x nodes
- the EventBus TCP servers listen on an interface other nodes can't reach; have you tried to set the cluster public host option?
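For the first point, the difference is only which registration method is called; a quick sketch ("some.address" is a placeholder):

```java
import io.vertx.core.AbstractVerticle;

// Sketch of the local vs. clustered registration difference.
public class ConsumerVerticle extends AbstractVerticle {
  @Override
  public void start() {
    // Clustered consumer: registered in the cluster-wide subscription
    // map, so send() from any node in the cluster can reach it.
    vertx.eventBus().consumer("some.address", msg -> msg.reply("ok"));

    // Local consumer: NOT shared between nodes; it only receives
    // messages that originate on this node.
    vertx.eventBus().localConsumer("some.address", msg -> msg.reply("local"));
  }
}
```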
 


Travis Thurber

Jun 13, 2017, 1:13:10 PM
to vert.x
Thanks for your reply!

I have indeed included the Vert.x specific options in my custom Hazelcast config, and have confirmed that my consumers are not local consumers.

I have tried various settings with public host and port with no luck, but perhaps I have not done it correctly.

For the hazelcast config, I am able to dynamically write the host IP into the config just before the docker container is built. My understanding was that the eventbus will use the hazelcast public host by default ( is this the case, or must I specify it again in the vertx options? ). I am deploying a fat jar, so my current setup uses the Launcher class with these args: "run org.my.verticle.class -cluster -cluster-port 5702". I have tried extending the Launcher class so that I can dynamically set the "options.setClusterPublicHost()" option to the host machine interface, but did not notice any difference in cluster eventbus behavior ( are options set in the Launcher perhaps overwritten by the commandline argument settings? ).

For completeness, I should mention: My Dockerfile exposes ports 443, 5701, 5702. My EC2 instance uses nginx to reverse proxy traffic from its external ports to the docker exposed ports; 5701 and 5702 are streamed through unchanged. I can confirm that both the host instance and the docker container are listening on 5701 and 5702. My security groups allow traffic on 443, 5701, and 5702. I do however notice that there are no packets flowing into 5702 on the host instances, so clearly no negotiation is taking place.

Travis Thurber

Jun 13, 2017, 3:51:02 PM
to vert.x
Okay, I definitely needed to explicitly set the clusterPublicHost. Once I did that, I can see that traffic is attempting to transmit between instances on 5702. There is still a problem, but it is probably with my nginx reverse proxy configuration, as the connection is refused between nginx and the docker container.

Travis Thurber

Jun 13, 2017, 3:58:40 PM
to vert.x
Actually, I do notice that the kernel of the docker container running my verticle is not registering any listeners on 5702, based on a "netstat -tan"... I see only 5701 and 443.

Is there some additional configuration I need to cause the verticle to actually listen on 5702?

Travis Thurber

Jun 13, 2017, 6:18:31 PM
to vert.x
Well, this is strange... i found that in order for the verticle to listen on 5702, i HAD to use the commandline option -cluster-host 5702... programmatically setting the port with the EventBusOptions did not create a listener. I had removed my commandline options because i needed to set clusterPublicHost programmatically anyway. Is it perhaps a bug that using EventBusOptions to specify port isn't respected?

But it's all working now. Sorry for spamming this thread, but thanks for the sounding board!

Tim Fox

Jun 14, 2017, 3:50:52 AM
to vert.x


On Tuesday, 13 June 2017 23:18:31 UTC+1, Travis Thurber wrote:
Well, this is strange... i found that in order for the verticle to listen on 5702, i HAD to use the commandline option -cluster-host 5702

Do you mean -cluster-*port* 5702? 

5702 seems like a port number, not a host name.

There are two pairs of properties that determine what host/port the event bus listens at and which host/port is publicised to others.

ClusterHost, ClusterPort - this pair determines the local host/port that event bus listens at
ClusterPublicHost, ClusterPublicPort - this pair determines what host/port is advertised to other nodes as the correct host/port to connect back to.

In most normal cases you just need to specify ClusterHost/ClusterPort (or not specify anything at all if the default is fine), but in some cases, e.g. running in some containers, the host/port you should listen to locally is just a local address, and the container does some magic to proxy traffic from a different host/port to the local one. In this case the host/port the other nodes use to connect to your node is not the same host/port the local instance is listening on. This is what ClusterPublicHost/ClusterPublicPort are for - they hold the host/port that other nodes need to use to connect to you.

In your case you probably need to specify both pairs.
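For example, something like this (assuming Vert.x 3.x VertxOptions; the public host value is a placeholder for your environment):

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class BothPairs {
  public static void main(String[] args) {
    VertxOptions options = new VertxOptions()
      .setClustered(true)
      // Pair 1: where the event bus actually binds inside the container.
      .setClusterHost("0.0.0.0")
      .setClusterPort(5702)
      // Pair 2: what is advertised to the other nodes; with Docker port
      // mapping this can differ from the bind pair (placeholder value).
      .setClusterPublicHost("172.31.5.10")
      .setClusterPublicPort(5702);

    Vertx.clusteredVertx(options, res -> {
      // deploy verticles here once the cluster is up
    });
  }
}
```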

Travis Thurber

Jun 14, 2017, 10:00:16 AM
to vert.x
Yes, I typo-ed that, I meant -cluster-port 5702.

I've gotten everything working now, and all I ended up needing was ClusterPort and ClusterPublicHost.

The strange thing was that if I set ClusterPort programmatically, i.e. via a Launcher override as in the code block below, the verticle did not honor the port setting and instead listened for eventbus messages on a random port, as in the default behavior. To get the verticle listening on the specified port, I had to use the command line option.

Launcher override:
...

public class MyLauncher extends Launcher {

  private JsonObject config = null;

  public static void main(String[] args) {
    new MyLauncher().dispatch(args);
  }

  @Override
  public void afterConfigParsed(JsonObject config) {
    this.config = config;
  }

  @Override
  public void beforeStartingVertx(VertxOptions options) {
    options
      .setClustered(true)
      .setClusterPublicHost(config.getString("host")) // this worked and was necessary ("host" is the docker host machine's IP)
      .setClusterPort(5702) // this did not seem to be honored
      .setEventBusOptions(
        options.getEventBusOptions()
          .setClustered(true)
          .setPort(5702) // is this the same setting as above? also did not seem to be honored
      );
  }
}


And for reference, here are the salient parts of my build.gradle with the command line options:

...
task capsule(type: Jar, dependsOn: jar) {
  archiveName = "my-capsule.jar"
  from jar
  from { configurations.runtime }
  from (configurations.capsule.collect { zipTree(it) }) { include "Capsule.class" }
  manifest {
    attributes(
      "Main-Class" : "Capsule",
      "Application-Class" : "com.example.MyLauncher",
      "Args" : "run com.example.MyServer -cluster -cluster-port 5702",
      "Min-Java-Version" : "1.8.0",
      "JVM-Args" : run.jvmArgs.join(' '),
      "System-Properties" : run.systemProperties.collect { k,v -> "$k=$v" }.join(' '),
      "Java-Agents" : "${(++configurations.quasar.iterator()).getName()}"
    )
  }
}



Ian Andrews

Feb 2, 2018, 7:23:50 AM
to vert.x
This thread was very helpful in getting a Vert.x cluster working in Kubernetes. I was able to get hazelcast cluster discovery working using the hazelcast-kubernetes plugin, and the advice here helped me get the actual eventbus transport working.

Since I haven't seen a complete example of this case anywhere, here is the setup that worked for me:

I included this dependency in my pom to get the hazelcast-kubernetes plugin:
<dependency>
   <groupId>com.hazelcast</groupId>
   <artifactId>hazelcast-kubernetes</artifactId>
   <version>1.1.0</version>
</dependency>



kubernetes-service.yml:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: vertx-client
  labels:
    app: vertx-client
spec:
  selector:
    matchLabels:
      app: vertx-client
  template:
    metadata:
      labels:
        app: vertx-client
        vertx-cluster: yup
    spec:
      containers:
      - name: vertx-client
        image: sample/client:1.0
        ports:
        - containerPort: 5701
        - containerPort: 5702
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: vertx-service
  labels:
    app: vertx-service
spec:
  selector:
    matchLabels:
      app: vertx-service
  template:
    metadata:
      labels:
        app: vertx-service
        vertx-cluster: yup
    spec:
      containers:
      - name: vertx-service
        image: sample/service:1.0
        ports:
        - containerPort: 5701
        - containerPort: 5702
---
apiVersion: v1
kind: Service
metadata:
  name: vertx-cluster
  labels:
    app: vertx-cluster
spec:
  clusterIP: None
  ports:
  - port: 5701
    targetPort: 5701
    protocol: TCP
  selector:
    vertx-cluster: yup


cluster.xml:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <properties>
        <property name="hazelcast.mancenter.enabled">false</property>
        <property name="hazelcast.memcache.enabled">false</property>
        <property name="hazelcast.rest.enabled">false</property>
        <property name="hazelcast.wait.seconds.before.join">0</property>
        <property name="hazelcast.logging.type">jdk</property>
        <property name="hazelcast.shutdownhook.enabled">false</property>

        <!-- at the moment the discovery needs to be activated explicitly -->
        <property name="hazelcast.discovery.enabled">true</property>
    </properties>

    <network>
        <port auto-increment="true" port-count="10000">5701</port>
        <outbound-ports>
            <ports>0</ports>
        </outbound-ports>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="false"/>

            <discovery-strategies>
                <discovery-strategy enabled="true" class="com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy">
                    <properties>
                        <property name="service-name">vertx-cluster</property>
                        <property name="service-label-name">app</property>
                        <property name="service-label-value">vertx-cluster</property>
                        <property name="resolve-not-ready-addresses">true</property>
                        <property name="namespace">default</property>
                    </properties>
                </discovery-strategy>
            </discovery-strategies>
        </join>
    </network>

    <partition-group enabled="false"/>
    <executor-service name="default">
        <pool-size>16</pool-size>
        <queue-capacity>0</queue-capacity>
    </executor-service>

    <multimap name="__vertx.subs">
        <backup-count>1</backup-count>
    </multimap>
    <map name="__vertx.haInfo">
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <eviction-policy>NONE</eviction-policy>
        <max-size policy="PER_NODE">0</max-size>
        <eviction-percentage>25</eviction-percentage>
        <merge-policy>com.hazelcast.map.merge.LatestUpdateMapMergePolicy</merge-policy>
    </map>
    <semaphore name="__vertx.*">
        <initial-permits>1</initial-permits>
    </semaphore>
</hazelcast>


Command that the Docker containers ran:
java -cp .:service-1.0.jar io.vertx.core.Launcher -cluster -cluster-port 5702 -cluster-host $(hostname -i)