Understanding the coordinator role and the FLUSH protocol


Liviu Ioan

Apr 15, 2024, 8:18:32 AM
to jgroups-dev

Hello,

 

We have gone through various sections of the JGroups manual a few times, but it would be great to get your input on the questions below.

We are using JGroups 3.6.20 over UDP.

 

The coordinator

Q1: What is the definition of the cluster coordinator, and what are the main actions it performs? We would really appreciate a detailed overview of the coordinator role.

 

Q2: When a view is generated, can we force the coordinator (the first member of the view) to be a specific node using a MembershipChangePolicy?

 

FLUSH

We understand FLUSH is a stop-the-world mechanism used for:

A. transferring state

B. having a coherent view when nodes join/leave

In our case, only 2 nodes transfer state between them. Most nodes are started at the beginning and (normally) do not leave the cluster; a small number of nodes (fewer than 6) are started manually and can join and leave the cluster at arbitrary times.

Q3: Do we need the FLUSH protocol in the protocol stack? Could you explain in detail what happens if we remove it?

 

Q4: Sometimes the time between the block() and unblock() callbacks of MembershipListener is very long (tens of minutes, sometimes over an hour). What could be the root cause?

 

Q5: In a test scenario (network partition), the merge fails systematically (100% reproducible) when both MERGE3 and FLUSH are enabled, but succeeds when MERGE3 is enabled and FLUSH is disabled. What could be the main issue?

When FLUSH is enabled, MERGE3 keeps repeating the failed merge attempt.

 

Q6: Do you think we would get better results if we made the effort to upgrade to JGroups 4.x/5.x? Because of API changes, we cannot currently do a "quick" migration.

 

Q7: If a network partition occurs, does the application using JGroups need to perform any operations related to merging views, or does it only have to merge state?

 

Thank you.

Liviu

Bela Ban

Apr 15, 2024, 8:51:03 AM
to jgrou...@googlegroups.com


On 15.04.2024 14:18, Liviu Ioan wrote:

The coordinator

Q1: What is the definition of the cluster coordinator, and what are the main actions it performs? We would really appreciate a detailed overview of the coordinator role.



The coordinator is the oldest member in a view; joining members are added at the end of the view. The coordinator admits new members and removes members that leave (or are suspected of having crashed) by installing new views.
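The ordering rule above can be modelled in a few lines. This is a toy Python sketch, not JGroups code: `View` and `next_view` are hypothetical names, and in JGroups the real view is computed by the GMS protocol and carries a `ViewId`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical stand-in for a cluster view: members ordered oldest-first."""
    view_id: int
    members: tuple

    @property
    def coordinator(self):
        # The coordinator is simply the oldest member, i.e. the first in the list.
        return self.members[0]


def next_view(view, joiners=(), leavers=()):
    """Model of the coordinator installing the next view: members that left
    (or were suspected) are dropped, and joiners are appended at the end."""
    gone = set(leavers)
    members = [m for m in view.members if m not in gone]
    members += [j for j in joiners if j not in members]
    return View(view.view_id + 1, tuple(members))


if __name__ == "__main__":
    v1 = View(1, ("A", "B", "C"))
    v2 = next_view(v1, joiners=["D"])
    v3 = next_view(v2, leavers=["A"])  # the coordinator itself leaves
    print(v2.coordinator, v3.coordinator)  # A B
```

When the coordinator leaves, the next-oldest member (B here) takes over the role.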



 

Q2: When a view is generated, can we force the coordinator (the first member of the view) to be a specific node using a MembershipChangePolicy?



Yes
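In JGroups the hook is the Java `MembershipChangePolicy` interface (set on the GMS protocol), whose `getNewMembership()` methods return the ordered member list for regular views and for merges. The Python sketch below only models the idea of pinning a preferred node first; `pin_first`, `new_membership`, `merge_membership` and the `preferred` argument are hypothetical names, not the JGroups API.

```python
def pin_first(members, preferred):
    """Move `preferred` to the front if it is a member; otherwise keep the
    default oldest-first order, so the oldest member stays coordinator."""
    members = list(members)
    if preferred in members:
        members.remove(preferred)
        members.insert(0, preferred)
    return members


def new_membership(current, joiners, leavers, suspects, preferred):
    """Regular view change: drop leavers and suspects, append joiners, then pin."""
    removed = set(leavers) | set(suspects)
    members = [m for m in current if m not in removed]
    members += [j for j in joiners if j not in members]
    return pin_first(members, preferred)


def merge_membership(subviews, preferred):
    """Merge: concatenate the subviews without duplicates, then pin."""
    members = []
    for sub in subviews:
        members += [m for m in sub if m not in members]
    return pin_first(members, preferred)
```

For example, `new_membership(["A", "B"], ["P"], [], [], "P")` returns `["P", "A", "B"]`: a freshly joined preferred node becomes coordinator immediately. Presumably every member should be configured with the same policy, since whichever member currently acts as coordinator computes the next view.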



FLUSH

We understand FLUSH is a stop-the-world mechanism used for:

A. transferring state

B. having a coherent view when nodes join/leave

In our case, only 2 nodes transfer state between them. Most nodes are started at the beginning and (normally) do not leave the cluster; a small number of nodes (fewer than 6) are started manually and can join and leave the cluster at arbitrary times.

Q3: Do we need the FLUSH protocol in the protocol stack? Could you explain in detail what happens if we remove it?



FLUSH implements virtual synchrony: a message sent in view v1 will be delivered by *everyone in the cluster* in v1, before the subsequent view v2 is installed. However, FLUSH is over 20 years old and hasn't seen much maintenance, as nobody is using it.
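The guarantee can be illustrated with a toy model (Python, not the FLUSH implementation): before the next view is installed, every surviving member delivers every message from the old view that any survivor has received. `flush_and_install` and the `(view_id, sender, seqno)` message tuples are hypothetical.

```python
def flush_and_install(view_id, delivered, new_members):
    """Toy virtual-synchrony flush.

    delivered:   member -> list of (view_id, sender, seqno) it has delivered
    new_members: members of the next view (survivors plus any joiners)
    Returns the new view id; `delivered` is updated in place.
    """
    survivors = [m for m in delivered if m in new_members]
    # Union of all old-view messages known to at least one survivor.
    seen = set()
    for m in survivors:
        seen.update(msg for msg in delivered[m] if msg[0] == view_id)
    # Each survivor delivers what it is missing (retransmission in a real stack).
    for m in survivors:
        delivered[m].extend(sorted(seen - set(delivered[m])))
    # Only now is the next view installed; joiners start with an empty history.
    for m in new_members:
        delivered.setdefault(m, [])
    return view_id + 1
```

With `{"A": [(1, "A", 1), (1, "C", 1)], "B": [(1, "A", 1)], "C": [(1, "C", 1), (1, "C", 2)]}` and next view `["A", "B", "D"]`, B also delivers `(1, "C", 1)`, while `(1, "C", 2)`, seen only by the departed C, is delivered by nobody, so A and B agree on the same set of messages for view 1.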

I've been thinking of deprecating FLUSH for quite a while now...

What are your requirements for using FLUSH? Some of what FLUSH provides can be done at the application level or with a protocol such as RSVP. Other use cases might be better served by switching to a CP system based on Raft (jgroups-raft)...
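RSVP differs from FLUSH in scope: the sender of a message blocks until all current members have acknowledged it (or a timeout expires), while the rest of the cluster keeps running. A toy Python model of that behaviour follows; `rsvp_send` and the `deliver` callback are hypothetical and are not the JGroups API.

```python
import queue
import threading
import time


def _deliver_and_ack(member, msg, deliver, acks):
    # `deliver` returns True if the member received the message and acked it.
    if deliver(member, msg):
        acks.put(member)


def rsvp_send(msg, members, deliver, timeout=1.0):
    """Send `msg` to all members and block until every one has acked it or
    `timeout` seconds have passed. Returns the members that did not ack."""
    acks = queue.Queue()
    for m in members:
        threading.Thread(target=_deliver_and_ack,
                         args=(m, msg, deliver, acks), daemon=True).start()
    pending = set(members)
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            pending.discard(acks.get(timeout=remaining))
        except queue.Empty:
            break
    return pending
```

An empty result means every member acknowledged; a non-empty one tells the caller exactly which members did not, without stopping the whole cluster.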



Q4: Sometimes the time between the block() and unblock() callbacks of MembershipListener is very long (tens of minutes, sometimes over an hour). What could be the root cause?


I don't know



Q5: In a test scenario (network partition), the merge fails systematically (100% reproducible) when both MERGE3 and FLUSH are enabled, but succeeds when MERGE3 is enabled and FLUSH is disabled. What could be the main issue?

When FLUSH is enabled, MERGE3 keeps repeating the failed merge attempt.



Possibly the same issue as above: FLUSH is waiting for messages which it never receives...
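If that hypothesis is right, the effect is easy to see in a toy model (Python; not how FLUSH works internally): a flush round can only complete once *every* member of the view has confirmed it, so a single member that never answers, for example because it sits in the other partition, blocks the whole cluster until a timeout, or indefinitely without one. `flush_outcome` and its parameters are hypothetical.

```python
import math


def flush_outcome(expected, reply_after, timeout=None):
    """Toy model of one flush round.

    expected:    members of the view that must confirm the flush
    reply_after: member -> seconds until its confirmation arrives,
                 or None if it never arrives
    timeout:     seconds to wait, or None to wait forever

    Returns (completed, seconds_blocked, missing_members).
    """
    limit = math.inf if timeout is None else timeout
    missing = {m for m in expected
               if reply_after.get(m) is None or reply_after[m] > limit}
    if missing:
        # Everyone stays blocked for the full wait because of the missing replies.
        return False, limit, missing
    return True, max((reply_after[m] for m in expected), default=0.0), set()
```

`flush_outcome(["A", "B", "C"], {"A": 0.1, "B": 0.2, "C": None})` reports the cluster as blocked forever because of C alone.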



Q6: Do you think we would get better results if we made the effort to upgrade to JGroups 4.x/5.x? Because of API changes, we cannot currently do a "quick" migration.



In general, most definitely! But not for FLUSH, which is old regardless! :-)


 

Q7: If a network partition occurs, does the application using JGroups need to perform any operations related to merging views, or does it only have to merge state?



http://www.jgroups.org/manual5/index.html#HandlingNetworkPartitions


-- 
Bela Ban | http://www.jgroups.org


Liviu Ioan

Apr 15, 2024, 11:38:11 AM
to jgroups-dev

Hello,

 

Thank you for your fast reply!

 

The FLUSH protocol

Thanks for the info regarding the deprecation of the FLUSH protocol.

As mentioned, in our case, we have 2 nodes sharing state (normally, both started at the beginning of a scenario).

Most nodes are started at the beginning. In addition, a small number of nodes can join the cluster at any time.

My understanding is that FLUSH provides distributed concurrency protection around state transfers and join operations (view updates).

Q8: Is my understanding correct? Is there a way to tell whether we really need the FLUSH functionality?

 

Q9: If we do need the FLUSH functionality, can we use RSVP as an alternative to FLUSH to protect view updates and state transfers?

 

Merge view done by JGroups

Thanks for the link on handling partitions.

If I understand correctly, the views are merged by JGroups, and the application is only responsible for merging the state.

Q10: Can you confirm?

 

Again, thanks.

Liviu

Bela Ban

Apr 15, 2024, 2:32:19 PM
to jgrou...@googlegroups.com


On 15.04.2024 17:38, Liviu Ioan wrote:

The FLUSH protocol

Thanks for the info regarding the deprecation of the FLUSH protocol.

As mentioned, in our case, we have 2 nodes sharing state (normally, both started at the beginning of a scenario).

Most nodes are started at the beginning. In addition, a small number of nodes can join the cluster at any time.

My understanding is that FLUSH provides distributed concurrency protection around state transfers and join operations (view updates).

Q8: Is my understanding correct? Is there a way to tell whether we really need the FLUSH functionality?



The description above is not very precise, but it *looks* as if you don't need FLUSH.



Q9: If we do need the FLUSH functionality, can we use RSVP as an alternative to FLUSH to protect view updates and state transfers?


What do you mean by 'protecting' view updates? State transfers don't need FLUSH: the digest shipped with the state ensures that messages not included in the state will be resent.
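A toy Python model of the digest mechanism described above: the digest records, per sender, the highest sequence number already reflected in the state, so the joiner skips older messages and applies only newer ones (which a real stack would retransmit if they were missed). `apply_state_and_digest` and the message tuple layout are hypothetical.

```python
def apply_state_and_digest(state, digest, incoming):
    """Toy model of a joiner applying a transferred state.

    state:    application state captured by the provider
    digest:   sender -> highest seqno already reflected in `state`
    incoming: (sender, seqno, key, value) messages received afterwards,
              possibly including some that are already in `state`
    Returns (new_state, highest_seqno_per_sender).
    """
    state = dict(state)
    highest = dict(digest)
    # Apply each sender's messages in seqno order.
    for sender, seqno, key, value in sorted(incoming, key=lambda m: (m[0], m[1])):
        expected = highest.get(sender, 0) + 1
        if seqno < expected:
            continue  # already reflected in the transferred state
        if seqno > expected:
            continue  # gap: a real stack would ask for retransmission first
        state[key] = value
        highest[sender] = seqno
    return state, highest
```

With state `{"x": 1}`, digest `{"A": 2}` and incoming `A#1, A#2, A#3, B#1`, only `A#3` and `B#1` are applied, so nothing is applied twice and nothing is lost.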



Merge view done by JGroups

Thanks for the link on handling partitions.

If I understand correctly, the views are merged by JGroups, and the application is only responsible for merging the state.

Q10: Can you confirm?



Yes
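On the application side, merging state typically starts from the merge view: JGroups passes a `MergeView` whose `getSubgroups()` lists the partitions that merged. One possible policy, sketched below as a toy Python model, treats the largest subgroup as primary and has every member outside it re-fetch state from that subgroup's first member (in JGroups presumably via `JChannel.getState()`). The function name and the tie-breaking rule are hypothetical.

```python
def choose_state_provider(my_address, subgroups):
    """Toy merge policy. Every member evaluates the same deterministic rule:
    the largest subgroup wins (ties broken by the smallest member name).

    Returns None if this member is in the primary subgroup and keeps its
    state, otherwise the member it should fetch state from."""
    primary = min(subgroups, key=lambda g: (-len(g), min(g)))
    if my_address in primary:
        return None
    return primary[0]  # e.g. the coordinator of the primary subgroup
```

Because the rule is deterministic, all members reach the same decision without extra coordination; the trade-off is that updates made on the non-primary side during the partition are discarded.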
