Lots of missing heartbeats in JGroups 4.0.10


abhije...@gmail.com

Sep 3, 2020, 6:57:27 AM
to jgroups-dev
Hi,

We just upgraded JGroups from 3.4.3 to 4.0.10.
As soon as the upgrade to 4.0.10 was in place, traffic started dropping.
With DEBUG logging enabled, we see many missed-heartbeat messages:

2020-09-01 07:31:57.817 DEBUG org.jgroups.protocols.FD_ALL - haven't received a heartbeat from XXXXXXX-65220 for 85596 ms, adding it to suspect list. 

WARN  org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: XXXXXX-65513: dropped message batch from non-member YYYYYY-53626 (view=...)

org.jgroups.protocols.UDP - JGRP000032: XXXXXXX-64804: no physical address for OJDNGVS15-3604, dropping message.

When we roll JGroups back to 3.4.3, the issue disappears and traffic flows normally.
When we upgrade again, it starts failing with the same logs.

My understanding is that something in our current network is incompatible with JGroups 4.0.10, but I cannot tell what exactly.
I have a few questions about this:
1. Which changes (related to the issue above) make JGroups 4.0.10 unstable in our current environment?
2. What are the network requirements for running the latest JGroups?
3. What should I check on my network in relation to this issue, and how (latency, NTP, etc.)?

Any pointers will be highly appreciated.

Bela Ban

Sep 4, 2020, 5:29:33 AM
to jgrou...@googlegroups.com
What's your config? I suggest copying udp.xml from 4.x (*don't reuse the
one from 3.6.x*), then making changes to it.
> --
> You received this message because you are subscribed to the Google
> Groups "jgroups-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to jgroups-dev...@googlegroups.com
> <mailto:jgroups-dev...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jgroups-dev/ba246e65-5be8-4ce3-a421-db92581db0e7n%40googlegroups.com
> <https://groups.google.com/d/msgid/jgroups-dev/ba246e65-5be8-4ce3-a421-db92581db0e7n%40googlegroups.com?utm_medium=email&utm_source=footer>.

--
Bela Ban, JGroups lead (http://www.jgroups.org)

abhije...@gmail.com

Sep 4, 2020, 6:36:19 AM
to jgroups-dev
Hi Bela,

Thanks for your email.
Below is the protocol list, with values, that we are using in JGroups 4.0.10 (mostly a replica of udp.xml, with changed values):
Protocol[] protocolStack={
                new UDP().setValue("bind_addr",InetAddress.getByName(myBindAddress)).setValue("mcast_port", 10600).setValue("bind_port", 10601)
                .setValue("port_range", 100).setValue("diagnostics_bind_interfaces", parInterfaceList).setValue("diagnostics_port", 10599),
                new PING(),
                new MERGE3(),
                new FD_SOCK().setValue("bind_addr", InetAddress.getByName(myBindAddress)),
                new FD_ALL().setValue("timeout", 12000).setValue("interval", 3000),
                new VERIFY_SUSPECT().setValue("bind_addr", InetAddress.getByName(myBindAddress)),
                new BARRIER(),
                new NAKACK2(),
                new UNICAST3(),
                new STABLE(),
                new GMS().setValue("print_local_addr", true),
                new UFC(),
                new MFC(),
                new FRAG2()};

The only change we made for the current version (4.0.10) is using UNICAST3 where we earlier used UNICAST2.

Kindly suggest.
I am also keen to know which changes introduced in the new JGroups version prevent it from working in the same environment in which the earlier version (3.4.3) runs smoothly.

Bela Ban

Sep 4, 2020, 7:27:35 AM
to jgrou...@googlegroups.com
Your config looks ok.

- Do you have any firewalls/SELinux etc enabled?

- Can you post the logs?

I also suggest the following:

* Set msg_counts_as_heartbeat to true in FD_ALL

* Look at your thread pool config in UDP, perhaps increase the max size
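
To illustrate the first suggestion, here is a minimal, self-contained sketch of a timestamp-based failure detector. It is not JGroups code: it only assumes that msg_counts_as_heartbeat means any message received from a member refreshes that member's last-seen time. The class ToyDetector and its method names are hypothetical; the 12000 ms timeout is the value from the configuration posted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a heartbeat-based failure detector; NOT the JGroups FD_ALL implementation. */
public class ToyDetector {
    private final long timeoutMs;
    // Assumption: models the effect of FD_ALL's msg_counts_as_heartbeat setting
    private final boolean msgCountsAsHeartbeat;
    private final Map<String, Long> lastSeen = new HashMap<>();

    public ToyDetector(long timeoutMs, boolean msgCountsAsHeartbeat) {
        this.timeoutMs = timeoutMs;
        this.msgCountsAsHeartbeat = msgCountsAsHeartbeat;
    }

    /** A heartbeat always refreshes the sender's timestamp. */
    public void onHeartbeat(String member, long nowMs) {
        lastSeen.put(member, nowMs);
    }

    /** A regular message refreshes the timestamp only when the flag is set. */
    public void onMessage(String member, long nowMs) {
        if (msgCountsAsHeartbeat)
            lastSeen.put(member, nowMs);
    }

    /** Returns the members whose last-seen time is older than the timeout. */
    public List<String> suspects(long nowMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet())
            if (nowMs - e.getValue() > timeoutMs)
                result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        for (boolean flag : new boolean[] {false, true}) {
            ToyDetector d = new ToyDetector(12000, flag);
            d.onHeartbeat("B", 0);     // last heartbeat that got through
            d.onMessage("B", 10000);   // data traffic from B still arrives
            System.out.println("flag=" + flag + " suspects at t=15000: " + d.suspects(15000));
        }
    }
}
```

In this toy model, member B is suspected at t=15000 when the flag is off and is not suspected when the flag is on, because the data message at t=10000 counts as a sign of life.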

abhije...@gmail.com

Sep 5, 2020, 11:26:47 AM
to jgroups-dev
Thanks, Bela, for your email.

Firewall/SELinux should not be the issue, as the problem only occurs in the 15-node cluster. When we created a separate 3-node JGroups cluster with version 4.0.10, it worked perfectly fine.

Log extract, in order of occurrence:
 2020-09-03 11:47:30.317 WARN  org.jgroups.protocols.pbcast.GMS - vmc0198-27827: not member of view [vmc0208-48939|123]; discarding it
  2020-09-03 11:47:32.316 WARN  org.jgroups.protocols.pbcast.GMS - vmc0198-27827: failed to create view from delta-view; dropping view: java.lang.IllegalStateException: the view-id of the delta view ([vmc0208-48939|123]) doesn't match the current view-id ([vmc0208-48939|122]); discarding delta view [vmc0208-48939|124], ref-view=[vmc0208-48939|123], joined=[vmc0198-5504]
  2020-09-03 11:47:32.323 WARN  org.jgroups.protocols.pbcast.GMS - vmc0198-27827: not member of view [vmc0208-48939|124]; discarding it.
2020-09-03 11:49:07.160 WARN  org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: vmc0198-63871: dropped message batch from non-member vmc0201-28703 (view=MergeView::[vmc0208-48939|140] (24) [ ***REMOVING MACHINE NAME AND PORT ***]  ])
  2020-09-03 11:49:07.160 WARN  org.jgroups.protocols.pbcast.NAKACK2 - JGRP000011: vmc0198-23411: dropped message batch from non-member vmc0201-28703 (view=[***REMOVING MACHINE NAME AND PORT FOR CLEAR VIEW ***]  .])
    2020-09-05 16:16:07.380 DEBUG org.jgroups.protocols.FD_ALL - haven't received a heartbeat from vmc0201-55458 for 12541 ms, adding it to suspect list
  2020-09-05 16:16:07.535 DEBUG org.jgroups.protocols.FD_SOCK - vmc0198-24881: failed connecting to vmc0204-45403: connect timed out
  2020-09-05 16:16:07.536 DEBUG org.jgroups.protocols.FD_SOCK - vmc0198-24881: broadcasting suspect(vmc0204-45403)
  2020-09-05 16:16:07.536 DEBUG org.jgroups.protocols.FD_SOCK - vmc0198-24881: pingable_mbrs=[***REMOVING MACHINE NAME AND PORT ***], ping_dest=vmc0204-54485
   2020-09-05 16:16:08.513 DEBUG org.jgroups.protocols.pbcast.GMS - vmc0198-52842: installing view [ ***REMOVING MACHINE NAME AND PORT FOR CLEAR VIEW ***  ]
  2020-09-05 16:16:08.513 DEBUG org.jgroups.protocols.pbcast.GMS - vmc0198-24881: installing view [vmc0200-30543|2672] (184) [ ***REMOVING MACHINE NAME AND PORT FOR CLEAR VIEW ***  ]
  
I will set msg_counts_as_heartbeat to true in FD_ALL and see whether it makes any difference.

By default, the UDP thread pool max size is 100 (protected int thread_pool_max_threads=100); we are not passing any value for it at the moment:
 new UDP().setValue("bind_addr",InetAddress.getByName(myBindAddress)).setValue("mcast_port", 10600).setValue("bind_port", 10601)
                .setValue("port_range", 100).setValue("diagnostics_bind_interfaces", parInterfaceList).setValue("diagnostics_port", 10599),

Shall I pass 200 as a UDP parameter, i.e. .setValue("thread_pool_max_threads", 200)?

Also, we are currently passing timeout=12000 and interval=3000 to FD_ALL. Do I need to change these to the default values?
new FD_ALL().setValue("timeout", 12000).setValue("interval", 3000),
//In FD_ALL code
protected long                                   interval=8000;
protected long                                   timeout=40000;

Please suggest.
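
As a rough way to compare the two settings, the ratio timeout / interval gives the number of heartbeat intervals that fit into the timeout, i.e. roughly how many consecutive heartbeats can be missed before a member is suspected. A minimal sketch using only the values quoted above; the class and method names are illustrative and not part of JGroups:

```java
public class HeartbeatBudget {
    /** Number of whole heartbeat intervals that fit into the timeout. */
    public static long missedHeartbeatsTolerated(long timeoutMs, long intervalMs) {
        return timeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        // Values currently passed to FD_ALL in this post
        System.out.println(missedHeartbeatsTolerated(12000, 3000)); // prints 4
        // Field defaults quoted from the FD_ALL source in this post
        System.out.println(missedHeartbeatsTolerated(40000, 8000)); // prints 5
    }
}
```

The ratios are similar, but the absolute window differs: with the current values a gap of just over 12 seconds is enough for suspicion (the log above shows 12541 ms), whereas the quoted defaults would tolerate a gap of up to 40 seconds.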

Bela Ban

Sep 8, 2020, 8:59:05 AM
to jgrou...@googlegroups.com
Hard to tell what's going on without more information.

It's clear that the cluster is falling apart, caused by FD_ALL and
FD_SOCK. The former missed some heartbeats; this can be caused by long
GC cycles, a network problem, a thread pool that's too small, etc.

The latter is unable to connect to other members (failed connecting to
vmc0204-45403), and this also causes issues.

What kind of load do you have on this system? 15-member clusters are
typically not an issue for JGroups...

Abhijeet Banerjee

Sep 8, 2020, 10:11:21 AM
to Bela Ban, jgrou...@googlegroups.com
Hi Bela,

Thanks for your email.
Below is the protocol list, with values, that we are using in JGroups 4.0.10 (mostly a replica of udp.xml, with changed values):
Protocol[] protocolStack={
                new UDP().setValue("bind_addr",InetAddress.getByName(myBindAddress)).setValue("mcast_port", 10600).setValue("bind_port", 10601)
                .setValue("port_range", 100).setValue("diagnostics_bind_interfaces", parInterfaceList).setValue("diagnostics_port", 10599),
                new PING(),
                new MERGE3(),
                new FD_SOCK().setValue("bind_addr", InetAddress.getByName(myBindAddress)),
                new FD_ALL().setValue("timeout", 12000).setValue("interval", 3000),
                new VERIFY_SUSPECT().setValue("bind_addr", InetAddress.getByName(myBindAddress)),
                new BARRIER(),
                new NAKACK2(),
                new UNICAST3(),
                new STABLE(),
                new GMS().setValue("print_local_addr", true),
                new UFC(),
                new MFC(),
                new FRAG2()};

The only change we made for the current version (4.0.10) is using UNICAST3 where we earlier used UNICAST2.

Kindly suggest.
I am also keen to know which changes introduced in the new JGroups version prevent it from working in the same environment in which the earlier version (3.4.3) runs smoothly.
Thanks,
Abhijeet Banerjee

abhije...@gmail.com

Sep 10, 2020, 5:04:38 AM
to jgroups-dev
Hi Bela,

Our distributed application uses a JGroups cluster in an n x n fashion, where n is the number of nodes in the cluster.
This means:

-> 3 node cluster with each node having 3 channels - 9 (3x3) view members per channel.
-> 5 node cluster with each node having 5 channels - 25 (5x5) view members per channel.
-> 7 node cluster with each node having 7 channels - 49 (7x7) view members per channel.
...
...
-> 15 node cluster with each node having 15 channels - 225 (15x15) view members per channel.

Our application logic for inter-node communication depends on a consistent full member view of every active channel in the jgroups cluster.
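
To put the n x n layout in perspective, here is a back-of-the-envelope sketch of the heartbeat volume. It assumes that every view member multicasts one heartbeat per FD_ALL interval and uses the 3000 ms interval from the configuration posted earlier in the thread; the class and method names are illustrative only.

```java
public class HeartbeatLoad {
    /** View members for n nodes with n channels each. */
    public static int members(int nodes) {
        return nodes * nodes;
    }

    /** Heartbeats each member receives per second from all other members. */
    public static double receivedPerSecond(int members, long intervalMs) {
        return (members - 1) * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 7, 10, 15}) {
            int m = members(n);
            System.out.printf("%2d nodes -> %3d members, %.1f heartbeats/s per member%n",
                              n, m, receivedPerSecond(m, 3000));
        }
    }
}
```

Under these assumptions a 15-node setup has 225 members, each receiving roughly 75 heartbeats per second, so a node hosting 15 channels would handle on the order of 1,100 heartbeats per second.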

We have upgraded from JGroups 3.4.3 to 4.0.10 and have observed the following change in how channel views are created and kept consistent between the JGroups 3.4.3 and 4.0.10 variants of our application.

We have simulated our application's JGroups cluster with the attached standalone applications JGroups3Test.jar (running on JGroups 3.4.3) and JGroups4Test.jar (running on JGroups 4.0.10).
NOTE: Due to a security check I had to change the .jar extension to .doc. Please revert the extension before examining the files.
Both applications take the IP addresses and the number of channels as arguments. We ran both applications in the following matrix and collected view data and timings.

Number of members (number of nodes x number of channels)   JGroups 3.4.3      JGroups 4.0.10

225 (15x15) Simultaneous start                             25 - 30 seconds*   15 minutes**
225 (15x15) Rolling start (view after 15th node start)     20 seconds*        10 minutes**
196 (14x14) Simultaneous start                             25 seconds*        4 minutes**
169 (13x13) Simultaneous start                             30 - 31 seconds*   7 minutes**
144 (12x12) Simultaneous start                             27 seconds*        5 minutes**
121 (11x11) Simultaneous start                             22 seconds*        2 minutes**
100 (10x10) Simultaneous start                             20 seconds*        5 minutes**
...
...
9 to 49 channels (3x3) to (7x7)                            almost immediate*  almost immediate*

*  Consistent full member view reached in approximately the given time.
** Inconsistent partial member view (momentarily reaching full member strength after approximately the given time).

Attached are the standalone application code for JGroups3Test.jar (running on JGroups 3.4.3) and JGroups4Test.jar (running on JGroups 4.0.10), and screenshots. The applications can be run in your environment to verify the behavior.

Please suggest:
1. Any configuration change that can be made in the JGroups 4.0.10 application to achieve behavior similar to JGroups 3.4.3.
2. The reason why JGroups 4.0.10 exhibits this behavior, and the underlying architectural change.
15NodeJgroups3After30Seconds.jpg
JGroups4TestApp.doc
15NodeJgroups4After10Minutes.jpg
15NodeJgroups3.jpg
15NodeJgroups4.jpg
JGroups3TestApp.doc

abhije...@gmail.com

Sep 11, 2020, 7:22:33 AM
to jgroups-dev
Hello Bela,

Could you please suggest any configuration tweaks that might fix the problem in JGroups 4.0.10?

As mentioned in the last mail, we have created a small program for both JGroups 3.4.3 and JGroups 4.0.10.
Both applications take the IP addresses and the number of channels as arguments. We ran both applications in the following matrix and collected view data and timings.
Below are the stats:

Number of members (number of nodes x number of channels)   JGroups 3.4.3      JGroups 4.0.10

225 (15x15) Simultaneous start                             25 - 30 seconds*   15 minutes**
225 (15x15) Rolling start (view after 15th node start)     20 seconds*        10 minutes**
196 (14x14) Simultaneous start                             25 seconds*        4 minutes**
169 (13x13) Simultaneous start                             30 - 31 seconds*   7 minutes**
144 (12x12) Simultaneous start                             27 seconds*        5 minutes**
121 (11x11) Simultaneous start                             22 seconds*        2 minutes**
100 (10x10) Simultaneous start                             20 seconds*        5 minutes**
...
...
9 to 49 channels (3x3) to (7x7)                            almost immediate*  almost immediate*
Note: Even after 15 minutes, the views are not stable; they keep fluctuating.

If possible, can we schedule a call in which we can present the issue and go through it in detail?
Please let me know a suitable time if you agree.

Abhijeet Banerjee

Sep 14, 2020, 1:19:30 AM
to Bela Ban, jgrou...@googlegroups.com
Thanks,
Abhijeet Banerjee
15NodeJgroups3.jpg
15NodeJgroups4.jpg
15NodeJgroups4After10Minutes.jpg
15NodeJgroups3After30Seconds.jpg
JGroups3TestApp.doc
JGroups4TestApp.doc

Chintan Mohan Rohila

Sep 14, 2020, 6:58:47 AM
to jgroups-dev
Hi Bela,

We have some further observations in addition to what has been tested so far.

The attached JGroups v4.0.10 test application does stabilize on view members in the absence of the following IPv4 config. But as soon as these parameters are set, the number of members in the views fluctuates between 300 and ~297 once the cluster reaches 10 nodes with 30 channels each.
Also, there is a considerable difference between the JGroups v3.4.3 and JGroups v4.0.10 apps in the time it takes a view to reach the full 300 members.

RHEL IPv4 properties:
net.ipv4.route.flush=1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.default.rp_filter = 1

Could you explain how these properties affect the exchange of heartbeats between the members?
Also, what could be the reason for the difference between the JGroups v3.4.3 and JGroups v4.0.10 apps in the time views take to reach full member strength in a setup of 10 nodes x 30 channels per node, and can this be fine-tuned using some JGroups/OS config?

Regards,
Chintan

Mayank Maheshwari

Sep 22, 2020, 1:43:11 AM
to jgroups-dev
Hi Bela,

We would appreciate your reply on this issue. We need your support to analyse the problem we are facing with JGroups version 4.

Thanks
Mayank

Bela Ban

Sep 25, 2020, 9:17:45 AM
to Abhijeet Banerjee, jgrou...@googlegroups.com
Why do you busy-loop in Receiver? Calling Thread.yield() continuously is
probably burning enough CPU to cause FD_ALL to suspect a member!

I modified your program slightly and was able to get more than 100
members (10x10) without problems...
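
The original Receiver source is only available as an attachment, so the following is a hypothetical sketch of the general idea rather than the modified code: instead of spinning with Thread.yield() until work arrives, a consumer thread can block on a queue and use no CPU while idle. The names BlockingReceiver, deliver and STOP are made up for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingReceiver implements Runnable {
    // Sentinel value that ends the loop (assumes no real message has this content)
    private static final String STOP = "__stop__";
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();

    /** Called by the producer (for example a message callback) to hand over work. */
    public void deliver(String msg) {
        queue.add(msg);
    }

    /** Asks the consumer loop to finish once earlier messages are processed. */
    public void stop() {
        queue.add(STOP);
    }

    public int processed() {
        return processed.get();
    }

    @Override
    public void run() {
        try {
            while (true) {
                String msg = queue.take(); // blocks while the queue is empty, no spinning
                if (STOP.equals(msg))
                    return;
                processed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag and exit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingReceiver r = new BlockingReceiver();
        Thread t = new Thread(r, "receiver");
        t.start();
        for (int i = 0; i < 5; i++)
            r.deliver("msg-" + i);
        r.stop();
        t.join();
        System.out.println("processed " + r.processed() + " messages");
    }
}
```

A blocked thread leaves the CPU free for other work, which would include the threads that send and process heartbeats.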

Cheers,

Bela Ban

Sep 29, 2020, 9:38:01 AM
to Abhijeet Banerjee, jgrou...@googlegroups.com
Forgot to attach the modified files...

Note that this is JGroups 5, but it should be simple to change the tests
to run on JGroups-4.x.

Cheers,


Reciever.java
Sender.java
TestJgroups4.java

Chintan Mohan Rohila

Sep 29, 2020, 9:38:17 AM
to Bela Ban, Abhijeet Banerjee, jgrou...@googlegroups.com
Hi Bela,

Thanks for your reply on this.

We have the same application code running on JGroups 3.4.3, and it produces a consistent view for all channels in a 15 channels x 15 nodes setup.

Moreover, this is just a crude test application that tries to verify the status of the channels' views; our real product code doesn't have any such CPU-intensive operations.

I would highly appreciate it if you could review the configuration of the JGroups 4.0.10 protocol stack and suggest protocol config parameters or OS settings that would help us reach a full, consistent member view of all channels quickly in a 15 channels x 15 nodes setup, similar to view creation with JGroups 3.4.3.

Also, please share the code you updated for the stable 10 x 10 setup. I'll test it on our 15 x 15 setup and share the results.

Thanks.



--
Best regards,
Chintan Rohila

Bela Ban

Sep 29, 2020, 9:38:24 AM
to Chintan Mohan Rohila, Abhijeet Banerjee, jgrou...@googlegroups.com
If I can't reproduce this, I can't suggest changes... I attached the 3
modified classes.

--
Bela Ban | http://www.jgroups.org

Reciever.java
Sender.java
TestJgroups4.java