Question about TUNNEL + FD_SOCK/FD_ALL

Paul Luk

Dec 1, 2020, 10:56:10 PM

to jgroups-dev

Currently, due to a firewall issue, I need to cluster JBoss EAP 7.x servers via the Gossip Router and hence use the TUNNEL protocol.

A post at https://docs.jboss.org/jbossas/docs/Clustering_Guide/beta422/html/ch07s05s04.html suggests using FD + FD_SOCK.

For the TUNNEL transport, should I also include FD_SOCK?

As I am not an expert on JGroups, can anybody comment on my current settings? Thanks.

--------------------------------

<property name="gossip_router_hosts">gossip_router_host[port]</property>

</transport>

<socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>

<property name="use_mcast_xmit">false</property>

</protocol>

</stack>

---------------------------------

Bela Ban

Dec 2, 2020, 10:38:09 AM

to jgrou...@googlegroups.com

If you have a firewall and use TUNNEL, then FD_SOCK does not make any
sense, as it uses direct member-to-member TCP connections. This would go
around a firewall and thus most likely not work.
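
As a rough illustration of this advice, the failure-detection part of a TUNNEL stack in the EAP JGroups subsystem might look like the fragment below. This is only a sketch: it assumes the subsystem's `<protocol type="...">` syntax and shows FD_ALL plus VERIFY_SUSPECT (both of which appear in the logs later in this thread) with no FD_SOCK entry. The rest of the stack is omitted and no property values are implied.

```xml
<!-- Sketch only: failure detection for a TUNNEL-based stack. -->
<!-- FD_SOCK is intentionally left out: it needs direct member-to-member TCP connections. -->
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
```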

--
Bela Ban, JGroups lead (http://www.jgroups.org)

Paul Luk

Dec 3, 2020, 4:28:45 AM

to jgroups-dev

Bela,

Thanks for your opinion.

I am deploying my JBoss EAP 7.x application to OpenShift 3.11, which seems to have an unstable network/DNS...

The JBoss EAP node sometimes drops out of the JBoss cluster and auto-heals after around 15 minutes...

I want to check whether I can configure JGroups to reduce the time to auto-heal (as soon as possible after the network hiccup is resolved)...

Are there any optimizations or options you generally recommend?

E.g. I see that, in some JGroups posts, you have suggested turning on 'msg_counts_as_heartbeat' in FD_ALL. But the JGroups documentation states: "Treat messages received from members as heartbeats. Note that this means we're updating a value in a hashmap every time a message is passing up the stack through FD, which is costly."
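
For context, FD_ALL suspects a member once no heartbeat has been received from it for `timeout` milliseconds, with heartbeats sent every `interval` milliseconds. A minimal sketch of how these settings, including `msg_counts_as_heartbeat`, could be expressed in the subsystem configuration follows. The numeric values are illustrative assumptions, not recommendations from this thread.

```xml
<protocol type="FD_ALL">
    <!-- Assumed value: suspect a member after 20s without a heartbeat -->
    <property name="timeout">20000</property>
    <!-- Assumed value: send a heartbeat every 5s -->
    <property name="interval">5000</property>
    <!-- Count regular messages from a member as heartbeats (adds per-message cost, as the quoted documentation notes) -->
    <property name="msg_counts_as_heartbeat">true</property>
</protocol>
```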

Bela Ban

Dec 3, 2020, 9:07:31 AM

to jgrou...@googlegroups.com

If you can, use FD_ALL3 which suppresses sending out heartbeats if
messages are already sent.

Paul Luk

Dec 21, 2020, 1:06:53 AM

to jgroups-dev

Bela,

Thanks for your advice.

But, unluckily, I found that JBoss EAP 7.3.3 is not bundled with a JGroups version that has FD_ALL3 (JGroups version = jgroups-4.1.4.Final-redhat-00001.jar).

Anyway, I spent some days troubleshooting the issue... I found that, in my environment (with 2 or more firewalls), RHSSO nodes may drop out of the cluster because of their connection to the gossip router...

Some debug logs follow:

1. at time T, the RHSSO/JBoss EAP clustering failed (exception: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request xxx from yyy)....

2. at time T + <60s : FD_ALL: xxx: suspecting [yyy]

3. at time T + <60s: VERIFY_SUSPECT: VERIFY_SUSPECT: [yyy] is dead (passing up SUSPECT event)

4. at time T + 4mins: UNICAST3: xxx: removing expired connection for yyy (240047 ms old) from recv_table

5. at time T + 15mins, the JGroups TUNNEL logs:

- xxx: connection to host:port closed, trying to re-establish connection

- xxx: failed sending a message to yyy (router used host:port): java.lang.Exception: connection to host:port broken. Could not send message to yyy: java.net.SocketException: Socket closed

- xxx: re-established connection to host:port successfully for group ejb

(It takes less than 1s to reconnect to the gossip router.)

So, I don't know why TUNNEL only detected the clustering issue after 15 minutes... I checked that there is no long GC pause...

I suspect there is a network timeout or hang that makes TUNNEL slow (T + 15mins) to detect the issue.
