Hi Bela,
I am using the JGroups (4.1.10) TUNNEL protocol with a gossip router.
I hit an issue in an OpenShift environment: due to an unknown network hiccup or an OpenShift app node problem, TCP responses from the gossip router are being dropped on an established TCP stream.
As a result, the sender keeps retransmitting for about 15 minutes until TCP gives up. Only then does JGroups discard the TCP connection and open a new one to the gossip router, which succeeds, and the cluster resumes.
Having multiple gossip routers does not seem to help in this situation.
I think we could lower '/proc/sys/net/ipv4/tcp_retries2' at the OS level, but that affects all applications running in the same OpenShift cluster. Do you think this can be handled in JGroups instead (say, with a timeout of 1 minute after which the existing TCP connection is discarded and re-established)? I was not able to find a related setting in the documentation.
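As a rough cross-check on the 15-minute figure: the Linux kernel documentation for tcp_retries2 says the default of 15 yields a hypothetical timeout of 924.6 seconds. The sketch below reproduces that number under the assumption that the retransmission timeout starts at the 200 ms minimum, doubles on every retransmission, and is capped at 120 s. The class and method names are made up for illustration, and the real RTO depends on the measured round-trip time, so actual stalls will differ somewhat.

```java
// Hypothetical helper for illustration only; not part of JGroups or the kernel.
class TcpRetryEstimate {

    // Sums the wait intervals for the initial transmission plus `retries`
    // retransmissions, doubling the RTO each time and capping it at rtoMax.
    // Assumption: the RTO starts at rtoMin (a very low-latency link).
    static double estimateTimeoutSeconds(int retries, double rtoMin, double rtoMax) {
        double total = 0.0;
        double rto = rtoMin;
        for (int i = 0; i <= retries; i++) {
            total += rto;
            rto = Math.min(rto * 2, rtoMax);
        }
        return total;
    }

    public static void main(String[] args) {
        // Default tcp_retries2 = 15, RTO bounds of 0.2 s and 120 s
        System.out.printf("tcp_retries2=15: %.1f s%n", estimateTimeoutSeconds(15, 0.2, 120.0));
        // A lower value, for comparison
        System.out.printf("tcp_retries2=8:  %.1f s%n", estimateTimeoutSeconds(8, 0.2, 120.0));
    }
}
```

Under these assumptions the default works out to a little over 15 minutes, which is consistent with the stall I am seeing, while a value of 8 would give up after roughly 100 seconds.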
Thank you.
Below are the settings I use.
-------------------------------------
<stack name="tunnelStack">
    <transport type="TUNNEL" socket-binding="jgroups-tcp">
        <property name="gossip_router_hosts">host1[port],host2[port]</property>
        <property name="reconnect_interval">3000</property>
        <property name="port_range">0</property>
        <property name="ergonomics">false</property>
    </transport>
    <protocol type="PING">
        <property name="async_discovery">true</property>
        <property name="ergonomics">false</property>
    </protocol>
    <protocol type="MERGE3">
        <property name="max_interval">10000</property>
        <property name="min_interval">3000</property>
        <property name="ergonomics">false</property>
    </protocol>
    <protocol type="FD_ALL">
        <property name="timeout">9000</property>
        <property name="timeout_check_interval">2000</property>
        <property name="interval">3000</property>
        <property name="ergonomics">false</property>
    </protocol>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2">
        <property name="use_mcast_xmit">false</property>
    </protocol>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC"/>
    <protocol type="FRAG3"/>
</stack>