Thanks, Bela, for your email.
Firewall/SELinux should not be the issue: the problem only occurs with our 15-node JGroups cluster. When we created a separate 3-node JGroups cluster with version 4.0.10, it worked perfectly fine.
Here is an extract from the logs, in the order the entries were written:
2020-09-03 11:47:30.317 WARN org.jgroups.protocols.pbcast.GMS - vmc0198-27827: not member of view [vmc0208-48939|123]; discarding it
2020-09-03 11:47:32.316 WARN org.jgroups.protocols.pbcast.GMS - vmc0198-27827: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([vmc0208-48939|123]) doesn't match the current view-id ([vmc0208-48939|122]); discarding delta view [vmc0208-48939|124], ref-view=[vmc0208-48939|123], joined=[vmc0198-5504]
2020-09-03 11:47:32.323 WARN org.jgroups.protocols.pbcast.GMS - vmc0198-27827: not member of view [vmc0208-48939|124]; discarding it.
2020-09-03 11:49:07.160 WARN org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: vmc0198-63871: dropped message batch from non-member vmc0201-28703 (view=MergeView::[vmc0208-48939|140] (24) [
***REMOVING MACHINE NAME AND PORT ***] ])
2020-09-03 11:49:07.160 WARN org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: vmc0198-23411: dropped message batch from non-member vmc0201-28703 (view=[***REMOVING MACHINE NAME AND PORT FOR CLEAR VIEW ***] .])
2020-09-05 16:16:07.380 DEBUG org.jgroups.protocols.FD_ALL - haven't received a heartbeat from vmc0201-55458 for 12541 ms, adding it to suspect list
2020-09-05 16:16:07.535 DEBUG org.jgroups.protocols.FD_SOCK - vmc0198-24881: failed connecting to vmc0204-45403: connect timed out
2020-09-05 16:16:07.536 DEBUG org.jgroups.protocols.FD_SOCK - vmc0198-24881: broadcasting suspect(vmc0204-45403)
2020-09-05 16:16:07.536 DEBUG org.jgroups.protocols.FD_SOCK - vmc0198-24881: pingable_mbrs=[***REMOVING MACHINE NAME AND PORT ***], ping_dest=vmc0204-54485
2020-09-05 16:16:08.513 DEBUG org.jgroups.protocols.pbcast.GMS - vmc0198-52842: installing view [
***REMOVING MACHINE NAME AND PORT FOR CLEAR VIEW *** ]
2020-09-05 16:16:08.513 DEBUG org.jgroups.protocols.pbcast.GMS - vmc0198-24881: installing view [vmc0200-30543|2672] (184) [
***REMOVING MACHINE NAME AND PORT FOR CLEAR VIEW *** ]
I will try setting msg_counts_as_heartbeat to true in FD_ALL and see whether it makes any difference.
By default, the UDP thread pool max size is 100 (protected int thread_pool_max_threads=100). We are not passing any value for it at the moment:
new UDP().setValue("bind_addr",InetAddress.getByName(myBindAddress)).setValue("mcast_port", 10600).setValue("bind_port", 10601)
.setValue("port_range", 100).setValue("diagnostics_bind_interfaces", parInterfaceList).setValue("diagnostics_port", 10599),
Should I pass 200 as a UDP parameter, i.e. .setValue("thread_pool_max_threads", 200)?
Also, we currently pass timeout=12000 and interval=3000 to FD_ALL. Do I need to change these to the default values?
new FD_ALL().setValue("timeout", 12000).setValue("interval", 3000),
//In FD_ALL code
protected long interval=8000;
protected long timeout=40000;
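To put the two settings side by side, here is a rough sketch of the arithmetic. It assumes that FD_ALL sends a heartbeat every interval ms and suspects a member once nothing has been received from it for timeout ms, which is consistent with the "haven't received a heartbeat ... for 12541 ms" line above. The class and method names are made up for illustration and are not part of JGroups.

```java
// Hypothetical helper, not JGroups code: compares how many heartbeat
// intervals fit into the suspicion timeout for two FD_ALL settings.
final class FdAllTimings {

    // Number of whole heartbeat intervals that fit into the timeout window,
    // i.e. roughly how many consecutive heartbeats must be missed before
    // a member is suspected (under the assumption stated above).
    static long heartbeatsPerTimeout(long timeoutMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be > 0");
        }
        return timeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        // Values we currently pass to FD_ALL
        long current = heartbeatsPerTimeout(12000, 3000);
        // Defaults taken from the FD_ALL fields quoted above
        long defaults = heartbeatsPerTimeout(40000, 8000);

        System.out.println("current  (12000/3000): " + current + " intervals");
        System.out.println("defaults (40000/8000): " + defaults + " intervals");
    }
}
```

So our current settings tolerate about 4 missed heartbeats over 12 seconds, while the defaults tolerate about 5 over 40 seconds. The main difference is the much shorter absolute window, not the number of heartbeats.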
Please advise.