Bela,
Thanks for your advice.
Unfortunately, I found that JBoss EAP 7.3.3 is not bundled with a JGroups version that has FD_ALL3 (the bundled jar is jgroups-4.1.4.Final-redhat-00001.jar).
Anyway, I spent some days troubleshooting the issue. I found that in my environment (with 2 or more firewalls), RHSSO nodes may drop out of the cluster because of their connection to the gossip router.
Some debug logs follow:
1. At time T, the RHSSO/JBoss EAP clustering failed (exception: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request xxx from yyy).
2. At time T + <60s: FD_ALL: xxx: suspecting [yyy]
3. At time T + <60s: VERIFY_SUSPECT: [yyy] is dead (passing up SUSPECT event)
4. At time T + 4 mins:
UNICAST3: xxx: removing expired connection for yyy (240047 ms old) from recv_table
5. At time T + 15 mins, the JGroups TUNNEL logged:
- xxx: connection to host:port closed, trying to re-establish connection
- xxx: failed sending a message to yyy (router used host:port): java.lang.Exception: connection to host:port broken. Could not send message to yyy: java.net.SocketException: Socket closed
- xxx: re-established connection to host:port successfully for group ejb
(The reconnection to the gossip router took less than 1s.)
So I don't know why TUNNEL only detected the clustering issue after 15 mins. I checked, and there was no long GC pause.
I suspect a network timeout or hang is what makes TUNNEL so slow (T + 15 mins) to detect the issue.
Do you have any hints about that, or any suggestions for what I should check further?
Many thanks.