Question about TUNNEL + FD_SOCK/FD_ALL


Paul Luk

Dec 1, 2020, 10:56:10 PM
to jgroups-dev
Because of a firewall, I need to cluster JBoss EAP 7.x servers via the Gossip Router and therefore use the TUNNEL protocol.


With the TUNNEL transport, should I also include FD_SOCK?

I am not a JGroups expert, so could anybody comment on my current settings? Thanks.

--------------------------------

                <stack name="gossip-router">
                    <transport type="TUNNEL" socket-binding="jgroups-tcp">
                        <property name="gossip_router_hosts">gossip_router_host[port]</property>
                        <property name="reconnect_interval">3000</property>
                        <property name="port_range">0</property>
                    </transport>
                    <protocol type="PING" />               
                    <protocol type="MERGE3"/>
                    <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/> <!-- worth including with TUNNEL? -->
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="pbcast.NAKACK2">
                        <property name="use_mcast_xmit">false</property>
                    </protocol>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG3"/>
                </stack>

---------------------------------

Bela Ban

Dec 2, 2020, 10:38:09 AM
to jgrou...@googlegroups.com
If you have a firewall and use TUNNEL, then FD_SOCK does not make any
sense, as it uses direct member-to-member TCP connections. These
connections would have to go around the firewall and thus would most
likely not work.
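
As a rough sketch of what that could look like, here is the stack from the question with only the FD_SOCK line removed. Everything else, including the `gossip_router_host[port]` placeholder, is copied unchanged from the original post; whether the remaining settings suit a given environment is not something this sketch tries to answer.

```xml
<stack name="gossip-router">
    <transport type="TUNNEL" socket-binding="jgroups-tcp">
        <!-- placeholder from the original post; replace with the real router host[port] -->
        <property name="gossip_router_hosts">gossip_router_host[port]</property>
        <property name="reconnect_interval">3000</property>
        <property name="port_range">0</property>
    </transport>
    <protocol type="PING"/>
    <protocol type="MERGE3"/>
    <!-- FD_SOCK removed: it needs direct member-to-member TCP connections -->
    <protocol type="FD_ALL"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2">
        <property name="use_mcast_xmit">false</property>
    </protocol>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC"/>
    <protocol type="FRAG3"/>
</stack>
```

With FD_SOCK gone, failure detection in this stack rests on the heartbeat-based FD_ALL plus VERIFY_SUSPECT, both of which travel through the router like any other message.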

On 02.12.20 4:56 am, Paul Luk wrote:
> Currently, due to firewall issue, i need to cluster JBoss EAP 7.x
> server via the Gossip Router and hence use the TUNNEL protocol.
>
> from a post:
> https://docs.jboss.org/jbossas/docs/Clustering_Guide/beta422/html/ch07s05s04.html,

--
Bela Ban, JGroups lead (http://www.jgroups.org)

Paul Luk

Dec 3, 2020, 4:28:45 AM
to jgroups-dev
Bela,

  Thanks for your opinion.

  I am deploying my JBoss EAP 7.x application to OpenShift 3.11, which seems to have an unstable network/DNS.
An EAP node sometimes drops out of the cluster and rejoins on its own only after about 15 minutes.
I would like to know whether JGroups can be configured to shorten this recovery time, so that the node rejoins as soon as the network hiccup is resolved.

  Are there any optimizations or options you generally recommend?
  For example, in some JGroups posts you have suggested turning on 'msg_counts_as_heartbeat' in FD_ALL, but the JGroups documentation states: "Treat messages received from members as heartbeats. Note that this means we're updating a value in a hashmap every time a message is passing up the stack through FD, which is costly."
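
  For reference, a minimal sketch of how that option might be set in the stack above. The `timeout` and `interval` property names and their values are assumptions based on the JGroups 4.x FD_ALL documentation; the numbers are illustrative only, not recommendations.

```xml
<protocol type="FD_ALL">
    <!-- assumed property: ms without a heartbeat before a member is suspected (illustrative value) -->
    <property name="timeout">40000</property>
    <!-- assumed property: ms between heartbeats sent by each member (illustrative value) -->
    <property name="interval">8000</property>
    <!-- the option quoted above: any received message also counts as a heartbeat -->
    <property name="msg_counts_as_heartbeat">true</property>
</protocol>
```

  A shorter `timeout` would make suspicion faster but also more sensitive to brief network hiccups, which is the trade-off I am unsure about.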

Bela Ban

Dec 3, 2020, 9:07:31 AM
to jgrou...@googlegroups.com

Paul Luk

Dec 21, 2020, 1:06:53 AM
to jgroups-dev
Bela,

  Thanks for your advice.

  Unfortunately, JBoss EAP 7.3.3 is not bundled with a JGroups version that has FD_ALL3 (the bundled version is jgroups-4.1.4.Final-redhat-00001.jar).

  Anyway, I spent some days troubleshooting the issue. I found that in my environment (with 2 or more firewalls), RHSSO nodes may leave the cluster because of problems with the connection to the gossip router.

  Some debug logs follow:
  1. At time T, the RHSSO/JBoss EAP clustering failed (exception: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request xxx from yyy).
  2. At time T + <60s: FD_ALL: xxx: suspecting [yyy]
  3. At time T + <60s: VERIFY_SUSPECT: VERIFY_SUSPECT: [yyy] is dead (passing up SUSPECT event)
  4. At time T + 4mins: UNICAST3: xxx: removing expired connection for yyy (240047 ms old) from recv_table
  5. At time T + 15mins, TUNNEL logged:
     - xxx: connection to host:port closed, trying to re-establish connection
     - xxx: failed sending a message to yyy (router used host:port): java.lang.Exception: connection to host:port broken. Could not send message to yyy: java.net.SocketException: Socket closed
     - xxx: re-established connection to host:port successfully for group ejb
     (The reconnection to the gossip router took less than 1s.)

  So I don't understand why TUNNEL only noticed the clustering issue after 15 minutes. I checked that there was no long GC pause.

  I suspect a network timeout or a stuck connection makes TUNNEL slow (T + 15mins) to detect the issue.

  Do you have a hint about that, or any suggestion for what I should check further?

  Many thanks.
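
  In case it helps anyone reproduce the log lines above, here is a sketch of logger entries for the EAP/WildFly logging subsystem. It assumes the JGroups loggers are named after the protocol classes (e.g. `org.jgroups.protocols.TUNNEL`), and that the attached log handlers are set to a level low enough to let these messages through.

```xml
<!-- Sketch only: goes inside the logging subsystem of the server configuration.
     Category names are assumed to match the JGroups protocol class names. -->
<logger category="org.jgroups.protocols.TUNNEL">
    <level name="TRACE"/>
</logger>
<logger category="org.jgroups.protocols.FD_ALL">
    <level name="DEBUG"/>
</logger>
<logger category="org.jgroups.protocols.VERIFY_SUSPECT">
    <level name="DEBUG"/>
</logger>
```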