Disable failback to previous master in Galera cluster


Robert Dimsdale

Oct 14, 2014, 9:01:23 PM
to maxs...@googlegroups.com
We have configured MaxScale to route traffic to a three-node Galera MariaDB cluster and verified that it works with either connection-based (readconnroute) or statement-based (readwritesplit) routing when all nodes are healthy. We want all writes to go to a single node and have no particular preference regarding a read-write split; we are simply trying to avoid the case where writes can be routed to multiple nodes.

However, we observe that when a node fails, a new master is chosen and new connections (and hence writes) are routed to this new master. When the original node returns, it regains 'master' status and, regardless of the routing mechanism (connection- or statement-based), new connections are now routed to it. Thus we are able to write to two nodes simultaneously.

How can we configure MaxScale to prevent failback in this scenario? That is, once a new master is chosen, it should remain master until it fails, irrespective of whether the previous master returns.

Our config file is attached.
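For reference, a minimal configuration along the lines described above might look something like the following; the server names, addresses, and credentials here are illustrative, not copied from the attached file:

[Galera Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=maxmon
passwd=maxmon_pwd

[Write Service]
type=service
router=readconnroute
router_options=master
# for statement-based routing, router=readwritesplit would be used instead (without router_options=master)
servers=server1,server2,server3
user=maxuser
passwd=maxuser_pwd

[Write Listener]
type=listener
service=Write Service
protocol=MySQLClient
port=4306

[server1]
type=server
address=192.168.122.131
port=3306
protocol=MySQLBackend

(server2 and server3 are defined in the same way with their own addresses.)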

Thanks,
Rob Dimsdale & Jonathan Berkhahn
CF-Services
Pivotal Software
MaxScale.cnf

Mark Riddoch

Oct 15, 2014, 3:46:48 AM
to Robert Dimsdale, maxs...@googlegroups.com
Hi Rob,

good to hear that you are experimenting with MaxScale and Galera. Unfortunately, we currently have no way to prevent the fallback to the previous master. The master/slave concept is implemented in the mojito module for Galera; it would be fairly simple to modify the monitor to add a “sticky” option to the master selection for Galera.

Mark


sc...@pivotal.io

Oct 15, 2014, 3:38:24 PM
to maxs...@googlegroups.com, cf-serv...@pivotal.io
Mark, 

When you suggest modifying the monitor, I assume you mean a feature of MaxScale which monitors nodes of a Galera cluster? 

Where can I find more information about the mojito module you mentioned? Google only turned up results related to Yahoo UI. 

Thank you,
Shannon

Mark Riddoch

Oct 15, 2014, 6:01:08 PM
to sc...@pivotal.io, maxs...@googlegroups.com, cf-serv...@pivotal.io
Hi Shannon,

my apologies on two counts. Firstly, I was not very clear this morning when I answered your email. I think there are two things here that need to be resolved. One can probably best be classified as a bug: the existence of a small time window during which there can be two masters within the Galera cluster. The other is the addition of a feature to inhibit the automatic fallback of the master when the previous node recovers. Both of these would be within the galeramon module. I have been talking with the author of that module today and plan to resolve at least the first of these for the next beta refresh of MaxScale. It may well be possible to also include the second, as he believes it may be relatively straightforward.

The other thing I must apologise for is that I did not notice that the spell checker on my machine turned a typo of "monitor" into "mojito". Sorry for the extra confusion that has caused.

I plan to talk again tomorrow to the author of this module and will hopefully have some feedback for you then as to when we can make it available.

Regards
Mark

Shannon Coen

Oct 15, 2014, 7:22:46 PM
to Mark Riddoch, maxs...@googlegroups.com, cf-serv...@pivotal.io
That is great news, Mark. Thank you very much for your prompt responses. 

As my colleagues Rob and Jonathan mentioned, preventing fallback is important to avoid connections on multiple Galera nodes. I suppose this goal could also potentially be achieved by severing connections to node 1 on fallback to node 0, but this approach would force unnecessary client reconnects and introduce what I assume to be an anti-pattern: severing connections to a healthy node.


Shannon Coen
Product Manager, Cloud Foundry
Pivotal, Inc.

Cloud Foundry Services

Oct 21, 2014, 6:00:16 PM
to Mark Riddoch, maxscale, cf-serv...@pivotal.io
Hello Mark,

Have you had a chance to discuss this feature request [1] further with the author of the galeramon module? Do you have a better sense of whether this feature may make it into the next beta refresh? Also, do you have a ballpark timeline for the release of this next beta?

[1] "inhibit the automatic fallback of the master when the previous node recovers"

Thank you,

Shannon Coen
Product Manager, Cloud Foundry
Pivotal, Inc.

Mark Riddoch

Oct 21, 2014, 6:08:01 PM
to Cloud Foundry Services, maxscale
Hi Shannon,

yes, I discussed it with him this morning. His initial thoughts are that it is feasible to include this in the next beta refresh. We are planning on having that ready for the end of next week; however, the timing is slightly flexible depending on the feedback we get from the various testers. If anything critical comes up we will delay that refresh for further fixes. Hopefully that will not be the case; even so, I would be surprised if we were more than a few days off that planned date.

Regards
Mark

Cloud Foundry Services

Oct 21, 2014, 7:18:46 PM
to Mark Riddoch, Cloud Foundry Services, maxscale
More great news.
Thank you very much; looking forward to it.

Shannon Coen
Product Manager, Cloud Foundry
Pivotal, Inc.


Massimiliano Pinto

Oct 27, 2014, 12:53:01 PM
to sc...@pivotal.io, maxscale
Hi Shannon,


I've started the implementation of master stickiness in the Galera monitor module.

The monitor now detects and stores the current "master" server and avoids a new master selection, which is normally based on the node with MIN(wsrep_local_index).

When a node comes back and its 'wsrep_local_index' value is 0, the MaxScale monitor will not select it as master; it keeps using the saved master node (if still running) and adds an extra bit to that node's status field.


This is just an example of the output of "MaxScale> show servers":

Server 0x1a47a60 (server1)
Server: 192.168.122.131
Status:               Slave, Synced, Running
Protocol: MySQLBackend
Port: 3306
Server Version: 5.5.40-MariaDB-wsrep-log
Node Id: 0

Server 0x1a47960 (server2)
Server: 192.168.122.132
Status:               Slave, Synced, Running
Protocol: MySQLBackend
Port: 3306
Server Version: 5.5.40-MariaDB-wsrep-log
Node Id: 2

Server 0x1a47860 (server3)
Server: 192.168.122.133
Status:               Master, Synced, Master Stickiness, Running
Protocol: MySQLBackend
Port: 3306
Server Version: 5.5.40-MariaDB-wsrep-log
Node Id: 1


The example shows server3 as master (it was selected when server1 went down) with the Master Stickiness bit.

That bit is added to the status when the master is not selected via the MIN(wsrep_local_index) rule.

I'm still working on the option switch that makes this new behaviour optional, and I'm also running tests.
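To illustrate, assuming the monitor parameter ends up being named something like disable_master_failback (the final option name is not settled yet), enabling the sticky behaviour from MaxScale.cnf might look like this:

[Galera Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=maxmon
passwd=maxmon_pwd
# hypothetical option name: keep the selected master until it fails,
# instead of falling back to the node with MIN(wsrep_local_index)
disable_master_failback=1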

I'll share the repository branch with the code to test when the implementation is ready.



Massimiliano

Shannon Coen

Oct 27, 2014, 4:10:08 PM
to Massimiliano Pinto, maxscale
Hello Massimiliano,

It's great to hear this is being worked on. Could you please confirm that I correctly understand the expected behavior? 

Without stickiness, MaxScale always assigns master status to the healthy node with the lowest index id, even if the current master is still healthy.

Node 0 fails, master assigned to node 1.
Node 1 also fails, master assigned to node 2.
Node 1 returns to health, master assigned to node 1 (even though node 2 is still healthy)
Node 0 returns to health, master assigned to node 0 (even though node 1 is still healthy)

With stickiness, MaxScale should only reroute connections when the current master fails.

Node 0 fails, master assigned to node 1.
Node 1 also fails, master assigned to node 2.
Node 1 returns to health, master remains assigned to node 2
Node 0 returns to health, master remains assigned to node 2
Node 2 fails, master assigned to node 0

Is this the same as you've described?

Also, could you help me understand why a Galera operator would want non-sticky "fallback" behavior? Unless connections are closed by MaxScale on fallback (I don't think MaxScale is doing this?), it exposes the potential for deadlocking. Even if MaxScale severed connections on fallback, it seems like this only makes sense if node 0 is somehow more important than the other nodes in the cluster. Maybe all nodes are not given the same memory/CPU, or maybe other nodes are distributed in networks with higher latency? In these cases, failing over to another node results in a degraded service state. We have been thinking that all cluster nodes should be equivalent, such that every node can serve as master without service degradation until it fails, but I imagine other operators may have different use cases. I'd be interested to hear some examples.

Thank you,

 


Shannon Coen
Product Manager, Cloud Foundry
Pivotal, Inc.

Massimiliano Pinto

Oct 28, 2014, 10:56:19 AM
to Shannon Coen, maxscale
Hi Shannon,


On 27 Oct 2014, at 21:09, Shannon Coen <sc...@pivotal.io> wrote:

> [...]
> Is this the same as you've described?


Yes, that's exactly how it works.


> Also, could you help me understand why a Galera operator would want non-sticky "fallback" behavior?

Well, the ideal solution would be to have master stickiness by default, but we also have to think about users who already have experience with the Galera monitor and how it behaves.
Retaining the previous behaviour will not cause any surprises.

> Unless connections are closed by MaxScale on fallback (I don't think MaxScale is doing this?), it exposes the potential for deadlocking.

This is not implemented right now; we may look at it for a future version, i.e. 1.1.
Feedback from MaxScale users also matters for when development starts.

> Even if MaxScale severed connections on fallback, it seems like this only makes sense if node 0 is somehow more important than the other nodes in the cluster. Maybe all nodes are not given the same memory/CPU, or maybe other nodes are distributed in networks with higher latency?


Currently we have implemented a weighting mechanism for routing modules only.
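As an illustration, that routing-side weighting is expressed with a weightby parameter on the service pointing at a parameter defined on each server; the parameter name and numbers below are only examples:

[Read Service]
type=service
router=readconnroute
router_options=synced
servers=server1,server2,server3
user=maxuser
passwd=maxuser_pwd
weightby=serversize

[server1]
type=server
address=192.168.122.131
port=3306
protocol=MySQLBackend
# illustrative weight; connections are distributed in proportion to these values
serversize=1000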

It would be useful to add a similar approach in the monitor module, allowing master selection based not only on node id but also on other information.




Massimiliano