[New] Containers in Docker Swarm not forming single cluster across multiple docker hosts


Prateek Kumar

Sep 23, 2018, 2:42:11 AM
to jgroups-dev
Hi!

I wanted to check if there's any progress towards a feasible/working solution to support container services that are spread across separate docker hosts and can coalesce into a single cluster using JGroups 4.

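It seems the last update was in 2015, as below:
  • Not gone beyond single host - https://issues.jboss.org/browse/JGRP-1840?focusedCommentId=13011991&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13011991
  • Still need to investigate beyond single host - http://belaban.blogspot.com/2014/10/jgroups-and-docker.html
  • OOTB support is missing and might require changes in docker to
I have not tried some possible solutions, since I would first like to confirm whether it's possible OOTB (or not) before trying things like a Gossip router, separate clustering software like weave/pipework/libswarm, or the instructions at https://goldmann.pl/blog/2014/01/21/connecting-docker-containers-on-multiple-hosts/.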

So could someone please advise if this is possible?

Scenario:
  • On premise private LAN network (not in AWS yet).
  • 2 docker hosts in a swarm cluster
  • Containers deployed as a services stack (not as plain containers via docker run with/without --net=host), so they will form a bridge network and get distributed across multiple docker hosts.
  • JGroups 4 inside each container. For testing here we can use belaban/jgroups itself (but deployed using docker stack deploy instead), although our application has an embedded Infinispan cache running in distributed nodes and uses JGroups to discover multiple instances of our webapp.
What's working - multiple service containers on a single docker host can indeed form a cluster (this was achieved after following the instructions at https://stackoverflow.com/questions/50086557/infinispans-jgroups-not-joining-the-same-cluster-in-docker-services, cross-posted at https://groups.google.com/forum/#!searchin/jgroups-dev/jgroups$20docker|sort:date/jgroups-dev/2UOCSOHYkSc/Z_xZ_lfRCQAJ).

Not working - when deployed across the 2 (or multiple) docker hosts. Here is the log of what's been done by tweaking belaban/jgroups.

[root@localhost Downloads]# #create the following files in folder below

[root@localhost Downloads]# cd jgroups/
[root@localhost jgroups]# ls
docker-compose-jgroups.yml  dockerfile_jgroups  entrypoint.sh
[root@localhost jgroups]# dos2unix entrypoint.sh #necessary on windows to take care of line endings

[root@localhost jgroups]# cat entrypoint.sh #to ensure container does not exit when deployed as service stack in docker swarm
#!/bin/bash
set -x
echo "entrypoint"
tail -f /dev/null

[root@localhost jgroups]# cat dockerfile_jgroups
FROM belaban/jgroups
COPY entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
CMD ["run"]
[root@localhost jgroups]#


[root@localhost jgroups]# cat docker-compose-jgroups.yml
version: "3"

services:
  j1:
    image: pk_jgroups:latest
    networks:
      - jgnet

  j2:
    image: pk_jgroups:latest
    networks:
      - jgnet

networks:
  jgnet:


[root@localhost jgroups]# docker build --rm -t pk_jgroups:latest -f dockerfile_jgroups .

[root@localhost jgroups]# #docker image ls #shows pk_jgroups:latest

[root@localhost jgroups]# #either do a docker save and export this image as a tar onto other docker host in cluster, or build it there similarly too

[root@localhost jgroups]# #both hosts in docker swarm cluster must have this image and be in swarm cluster and can run services as workers



[root@localhost jgroups]# docker stack deploy -c docker-compose-jgroups.yml pk_jgroups

[root@localhost jgroups]# docker service ls

[root@localhost jgroups]# docker ps #run on each docker host to get container id



At this point, suppose containerId1 is on dockerHost1 and containerId2 went to dockerHost2. If we run the following commands on each host, the containers should form a cluster, but they fail to coalesce into a single cluster.
[root@localhost jgroups]# docker exec -it containerId1 /bin/bash

bash-4.3$ cd jgroups-docker/conf/
bash-4.3$ chat.sh -props udp.xml
** view: [containerId1 -38465|0] (1) [containerId1 -38465]

exit


If the above is done on a single docker host running in swarm mode (which is possible even for a single host), the view shows 2 members. Similarly, in multi-host mode with replicas, the containers form a cluster on each host but do not coalesce into a single cluster across the docker hosts.

Am hoping there's a solution here! Will provide any/more details if sought.
Thanks,
Prateek Kumar

Bela Ban

Sep 26, 2018, 8:38:43 AM
to jgrou...@googlegroups.com
I haven't yet looked into this, but according to [1], an overlay network
might solve this. If this is correct, then the only thing that would
need to be changed is the bind_addr in TCP.

Can you try this out?

[1]
https://docs.docker.com/network/overlay/#customize-the-docker_gwbridge-interface


--
Bela Ban | http://www.jgroups.org

Prateek Kumar

Sep 27, 2018, 1:31:43 AM
to jgroups-dev
Hi Bela,

I tried following the instructions in your link [1] but I'm not sure if I did it correctly.

Since we are using docker swarm in a cluster of 2 docker hosts, the services are already using an overlay network (ingress) and bridge (docker_gwbridge). The link [1] describes how to re-create customized overlay networks. 
I tried recreating the custom ingress and docker_gwbridge using our known default gateway (which was set in the TCP bind_addr="default gateway IP address") and subnet but still could not get it to work.

Separately, I did find a feasible workaround using the jgroups-gossip image https://hub.docker.com/r/jboss/jgroups-gossip/ by deploying it as a service in the docker compose .yml file.
services:
  gossip:
    image: jboss/jgroups-gossip:latest
    ports:
      - "12001:12001"
    networks:
      - webnet
    environment:
      - LogLevel=DEBUG
This adds the gossip router to the multi-host cluster, and services across the cluster did form a single coalesced cluster (in Infinispan) after I changed the jgroups.xml to use <TCPGOSSIP initial_hosts="gossip[12001]" />
The problem with this fix is that it seems to work only with the very specific docker CE versions 18.03 and 18.05 but not others (and it's failing to work in docker EE entirely). I'm still not sure why a minor version change can break compatibility.

_prateek

Bela Ban

Sep 27, 2018, 7:21:23 AM
to jgrou...@googlegroups.com


On 27/09/18 07:31, Prateek Kumar wrote:
> Hi Bela,
>
> I tried following the instructions in your link [1] but I'm not sure if
> I did it correctly.
>
> Since we are using docker swarm in a cluster of 2 docker hosts, the
> services are already using an overlay network (ingress) and bridge
> (docker_gwbridge). The link [1] describes how to re-create customized
> overlay networks.

Are you exposing the ports correctly (using -p)?

> I tried recreating the custom ingress and docker_gwbridge using our
> known default gateway (which was set in the TCP bind_addr="default
> gateway IP address") and subnet but still could not get it to work.
>
> Separately, I did find a feasible workaround using the jgroups-gossip
> image https://hub.docker.com/r/jboss/jgroups-gossip/ by deploying it as
> a service in the docker compose .yml file.
> services:
>   gossip:
>     image: jboss/jgroups-gossip:latest
>     ports:
>       - "12001:12001"
>     networks:
>       - webnet
>     environment:
>       - LogLevel=DEBUG
> This adds the gossip router in the multi host cluster and services
> across the cluster did form a single coalesced cluster (in infinispan)
> after I changed the jgroups.xml to use <TCPGOSSIP
> initial_hosts="gossip[12001]" />

That would work, but I'm actually going to look into how to run this via
docker swarm and an overlay network

> The problem with this fix is that it seems to work only with a very
> specific docker CE version 18.03 and 18.05 but not others (and its
> failing to work in docker EE entirely). I'm still not sure why a minor
> version change can break compatibility.

Strange; I would not know why such a small version change would break
this. I don't know docker EE.

Prateek Kumar

Sep 27, 2018, 9:08:03 AM
to jgroups-dev
I think so. Yes, our compose yml file exposes the specific ports that each service in the stack needs to expose outside. In addition, I've opened ports 7800 and 43366 in firewalld on both docker hosts, since the jgroups.xml has:

<TCP bind_addr="match-interface:eth2,match-interface:eth0,loopback"
bind_port="${jgroups.tcp.port:7800}"
... />
<MPING mcast_addr="${jgroups.mping.mcast_addr:228.2.4.6}"
mcast_port="${jgroups.mping.mcast_port:43366}"
ip_ttl="${jgroups.udp.ip_ttl:2}"/>

In swarm mode the services expose all ports to each other anyway.

Am hoping it can work on docker swarm and overlay network eventually then.
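
For readers following along: MPING discovers members through IP multicast, so the fragment posted in this message depends on multicast reaching the other docker host. A minimal sketch of the same fragment with discovery switched to the gossip router service described earlier in this thread could look as follows. The host name gossip and port 12001 come from that compose snippet; the attributes elided with ... in the post are left out rather than guessed, and whether this fits a given Infinispan stack is an assumption.

```xml
<!-- Sketch only: TCP transport kept as posted, MPING swapped for TCPGOSSIP. -->
<!-- "gossip" and 12001 are the service name and port from the compose snippet earlier in the thread. -->
<TCP bind_addr="match-interface:eth2,match-interface:eth0,loopback"
     bind_port="${jgroups.tcp.port:7800}" />
<TCPGOSSIP initial_hosts="gossip[12001]" />
<!-- The remaining protocols of the stack stay as they are. -->
```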

Bela Ban

Sep 28, 2018, 7:17:01 AM
to jgrou...@googlegroups.com
OK, so running default JGroups between nodes in a Docker swarm will
*not* work as IP multicasting between docker nodes is *not* supported;
see [1] for details (and vote for it!).

Default JGroups (udp.xml) uses IP multicasting for discovery, so it
won't work in a docker swarm for the time being. Also, UDP as transport
uses IP multicasting, too, so this needs to be disabled
(ip_mcast=false), or use TCP instead.

However, there are alternatives:

* Use a docker plugin (e.g. weave) to implement multicasting in a swarm
network

* Use a different discovery mechanism, e.g. TCPPING, TCPGOSSIP,
FILE_PING (to an NFS server), JDBC_PING (to a shared DB), NATIVE_S3_PING
or GOOGLE_PING (cloud store) etc...

Once you switch to AWS, you can use TCP/NATIVE_S3_PING, this has been
tested and works.

[1] https://github.com/docker/libnetwork/issues/552
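
As an illustration of the second alternative, a multicast-free transport and discovery pair could look roughly like the sketch below. It assumes an NFS share mounted at the same path in every container; the path /mnt/jgroups is hypothetical, and the rest of the stack (failure detection, NAKACK2, GMS and so on) is omitted.

```xml
<config xmlns="urn:org:jgroups">
    <!-- TCP transport: no IP multicast involved. -->
    <TCP bind_port="7800" />
    <!-- FILE_PING: each member writes its address into a shared directory
         and reads the other members' addresses from it.
         /mnt/jgroups is a hypothetical NFS mount visible to every container. -->
    <FILE_PING location="/mnt/jgroups" />
    <!-- Remaining protocols omitted from this sketch. -->
</config>
```

In a swarm, bind_addr would most likely still have to resolve to an address on the overlay network that the other containers can reach.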


Bela Ban

Sep 28, 2018, 7:23:19 AM
to jgrou...@googlegroups.com


On 27/09/18 07:31, Prateek Kumar wrote:
This should work for discovery, but *not* for sending of group messages
(= messages to all cluster members). I suggest replacing UDP with TCP or
changing UDP.ip_mcast to false.

Have you tried using the weave plugin?
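
If UDP is kept as the transport, the change suggested above comes down to one attribute. The fragment below is a sketch rather than a tested configuration: with ip_mcast set to false, JGroups sends a group message as separate unicast datagrams to each member instead of a single multicast packet. Discovery still has to use a non-multicast protocol.

```xml
<!-- Sketch: keep UDP as transport but turn off IP multicast for group messages. -->
<UDP ip_mcast="false" />
```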

Prateek Kumar

Sep 29, 2018, 12:36:07 AM
to jgroups-dev
All right, thanks. I have not tested weave yet (introducing a third-party plugin in the network would be a last resort).

But I will try with FILE_PING (with an nfs server) and retry TCPPING.

A fallback would be GossipRouter (TCPGOSSIP), mostly because it would also work with remote caches like https://hub.docker.com/r/jboss/infinispan-server/, which use JGroups too.

Thanks for getting back to us!