Weighted Quorum corner case


Nik

Sep 16, 2017, 11:50:53 AM
to codership
Hi,

So I believe I am hitting the corner case specified in the weighted quorum documentation page:

Note
Warning: If a group partitions at the moment when the weight change message is delivered, all partitioned components that deliver weight change messages in the transitional view will become non-primary components. Partitions that deliver messages in the regular view will go through quorum computation with the applied weight when the following transitional view is delivered.
In other words, there is a corner case where the entire cluster can become non-primary component, if the weight changing message is sent at the moment when partitioning takes place. Recovering from such a situation should be done either by waiting for a re-merge or by inspecting which partition is most advanced and by bootstrapping it as a new Primary Component.
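To make the "most advanced partition" step concrete, here is a minimal sketch of how one might pick the bootstrap candidate. It assumes the `wsrep_last_committed` status value has already been read from each reachable node by hand; the node names and sequence numbers below are made up for illustration, and `most_advanced` is a hypothetical helper, not part of Galera.

```python
from typing import Dict


def most_advanced(last_committed: Dict[str, int]) -> str:
    """Return the node with the highest last-committed sequence number.

    On a tie, the first node in the dict's insertion order wins.
    """
    if not last_committed:
        raise ValueError("no nodes to compare")
    return max(last_committed, key=last_committed.get)


# Hypothetical values, as if read from wsrep_last_committed on each node.
seqnos = {"node-a": 1042, "node-b": 1045, "node-c": 1044}
candidate = most_advanced(seqnos)
print(candidate)
```

The chosen node would then be the one to bootstrap as the new Primary Component (for example via the `pc.bootstrap` provider option), with the other nodes rejoining it.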


I believe I am hitting this case when networking issues between AWS regions knock 2 of my nodes out of a 5-node cluster. No network issues are reported between the remaining three nodes, but they sometimes lose the primary component/quorum when the partitioning occurs.
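For reference, this is why I would expect the three remaining nodes to stay primary. Below is a minimal sketch of the weighted quorum check as I understand it from the documentation: a component keeps quorum when its total weight is more than half of the weight of the last primary component, not counting nodes that left gracefully. The function and node names are hypothetical, and I am assuming every node has the default weight of 1.

```python
from typing import Dict, Iterable


def has_quorum(
    prev_primary: Iterable[str],
    left_gracefully: Iterable[str],
    current: Iterable[str],
    weights: Dict[str, int],
) -> bool:
    """Weighted quorum check: (sum(p) - sum(l)) / 2 < sum(m).

    p = members of the last seen primary component
    l = members known to have left gracefully
    m = members of the current component
    """
    def total(nodes: Iterable[str]) -> int:
        return sum(weights[n] for n in nodes)

    # Multiply through by 2 to stay in integer arithmetic.
    return 2 * total(current) > total(prev_primary) - total(left_gracefully)


# Assumed setup: five nodes, all with weight 1 (hypothetical names).
weights = {"n1": 1, "n2": 1, "n3": 1, "n4": 1, "n5": 1}
cluster = list(weights)

# n4 and n5 are cut off by the network (not a graceful leave).
majority_side = has_quorum(cluster, [], ["n1", "n2", "n3"], weights)
minority_side = has_quorum(cluster, [], ["n4", "n5"], weights)
print(majority_side, minority_side)
```

With these numbers the three-node side has weight 3 against a threshold of 2.5, so under the normal computation it should remain primary, which is why losing quorum there makes me suspect the corner case.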

If I am indeed hitting this corner case, is there anything else I can do to prevent this scenario from occurring during partitioning? Or is the only way to avoid it to avoid partitioning in the first place? I already increased my timeout values according to this page; do I just need to make those timeouts higher?