How to cancel a failover in progress when DelayMasterPromotionIfSQLThreadNotUpToDate is set to true?


Liviu Lintes

May 12, 2020, 8:27:50 AM
to orchestrator-mysql
Hi,
 
We are running Orchestrator with the following settings:
  - FailMasterPromotionIfSQLThreadNotUpToDate: false
  - DelayMasterPromotionIfSQLThreadNotUpToDate: true

We are running with semi-sync replication and a very large rpl_semi_sync_master_timeout value.
This means that while replication can be delayed, we should not have data loss. So, if the master crashes while replication is delayed, orchestrator will delay the failover until the fastest slave catches up.
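
For context, the relevant fragment of our orchestrator configuration (JSON) looks roughly like this; anything beyond the two flags above is omitted:

    {
      "FailMasterPromotionIfSQLThreadNotUpToDate": false,
      "DelayMasterPromotionIfSQLThreadNotUpToDate": true
    }

On the MySQL side, rpl_semi_sync_master_timeout is set very high so the master keeps waiting for a semi-sync ACK instead of silently degrading to asynchronous replication.
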
However, there might be cases where replication is delayed by hours or more, the master crashes, and we would like to manually promote one of the slaves as the new master and open it for writes (RW).
Is there a way to tell orchestrator to cancel the failover in a case like this?
So, again, the scenario is:
 - replication is delayed on all slaves by hours.
 - the master crashes.
 - due to the above config, orchestrator delays the failover until one replica catches up.
 - we don't want to wait that long; we would like to open one of the slaves for writing and cancel any further action from Orchestrator.

We can manually open one of the slaves for RW and point the app to it, but Orchestrator will still execute the post-failover hooks once replication catches up, and the results can be unexpected.
So we are looking for a way to tell Orchestrator to stop waiting for replication to catch up and to not execute the post-failover hooks at all.
Any advice for such a scenario? Is there an API endpoint that can help?

 Thank you,
 Liviu

Shlomi Noach

May 12, 2020, 3:03:58 PM
to Liviu Lintes, orchestrator-mysql
Right now that's not supported. However, I've laid the groundwork for a timeout-based variant, something like `DelayMasterPromotionIfSQLThreadNotUpToDateMaxSeconds` (int).

It's actually quite simple to add this new feature. Would you like to open an issue with a feature request?


Liviu Lintes

May 13, 2020, 9:53:34 AM
to orchestrator-mysql
We can open a feature request, but I am trying to better understand how this would help us.
As I understand it, DelayMasterPromotionIfSQLThreadNotUpToDateMaxSeconds would wait for the configured amount of time and then, even if the SQL threads are not up to date, go ahead, promote one of the slaves to master, and regroup the remaining slaves under the new master.
Our problem is that such a setting would apply to all MySQL instances monitored by this Orchestrator.
We would like the ability to make a distinction between MySQL clusters. For some clusters we are OK with waiting until replication catches up. For others we would like to be able to tell Orchestrator to stop waiting for the SQL thread to catch up and to pick and promote a slave right away.
We have tried manually opening one of the slaves for RW and regrouping the other slaves under it, thinking that Orchestrator would pick the only remaining direct slave for failover.
However, it does not pick the direct slave; it still picks a second-level slave of the direct slave.
For example:
- node1 is the master; node2, node3, and node4 are slaves and delayed.
- node1 crashes.
- orchestrator waits for one of node2, node3, node4 to catch up.
- While they are still delayed, we make node3 and node4 slaves of node2, while node2 is still a slave of node1.
- node2 is opened for writing, so data should propagate to node3 and node4, but relative to node1, node2 will have errant GTIDs.
- The expectation was that Orchestrator would pick node2 for the failover, but this is not the case. It always picks node3 or node4.

That is basically what we want to achieve. If, while Orchestrator is waiting for the SQL threads to catch up, we could signal it that we want to cancel the wait and just go ahead, pick a slave, and promote it, that would be good enough. We don't necessarily need to pick the promoted slave ourselves; letting Orchestrator pick it is good enough. But this should not be handled statically by DelayMasterPromotionIfSQLThreadNotUpToDateMaxSeconds; it should be something that can be signaled to Orchestrator for a specific MySQL instance.

Please let us know what you think about this request.
Thank you,
 Liviu

Shlomi Noach

May 14, 2020, 3:51:25 AM
to Liviu Lintes, orchestrator-mysql
Thank you for elaborating, I think I understand your problem better now.

I looked at the code to see what the failure paths would be for a `WaitForSQLThreadUpToDate` scenario. Allow me to elaborate a bit, and then suggest a couple hacks that should work right away.

When `DelayMasterPromotionIfSQLThreadNotUpToDate: true`, then upon a DeadMaster, orchestrator begins to run a failover, initially disregarding that setting. It will choose the candidate replica, regroup the replicas, etc. It will then have the "promoted replica" -- the one designated to become the master. Just before promoting it and finalizing the failover, it runs a set of checks, among which is `WaitForSQLThreadUpToDate`. If any of those checks fail, the promotion is aborted.

Now, there are two ways you can actively cause `WaitForSQLThreadUpToDate` to fail:

- on the designated replica, run `stop slave` -- operation should fail after `ReasonableReplicationLagSeconds` (default: 10 seconds)
- on the designated replica, run `stop slave; change master to master_sql_delay=1` -- operation should fail within 1 second

This will abort the active failover.
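
Concretely, on the designated replica, either of these should do (sketch only; `<the-designated-replica>` is a placeholder and credentials are omitted):

    # option 1: plain stop; the check fails once ReasonableReplicationLagSeconds (default 10s) has passed
    mysql -h <the-designated-replica> -e "STOP SLAVE"

    # option 2: add an artificial SQL delay; the check fails within about 1 second
    mysql -h <the-designated-replica> -e "STOP SLAVE; CHANGE MASTER TO MASTER_SQL_DELAY = 1"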

At that point you can expect to have a designated replica with all other servers replicating from it. If you then manually run `orchestrator-client -c reset-replica -i <the-designated-replica> && orchestrator-client -c set-writeable -i <the-designated-replica>`, it will take up to 1 minute for orchestrator to identify that this is now the new master, and orchestrator will advertise it to KV stores. But it will not run the failover hooks.

What I'll look into is how `orchestrator-client -c force-master-failover` should behave in this case. Right now, `force-master-failover` again takes `WaitForSQLThreadUpToDate` into consideration, but I believe this is the wrong thing to do; I just need to verify I haven't missed any hidden assumption.

Does that make sense? Would you like to try this and report back?


Liviu Lintes

May 14, 2020, 8:55:35 AM
to orchestrator-mysql
Thank you for the detailed response. One more question related to this scenario: how do I identify the promoted replica? I assume that as soon as the master crashes and the failover starts, orchestrator picks a slave to be the promoted replica. A few minutes or hours later, when we decide to stop the failover process, we need a way to identify which one is the promoted replica so we can apply the suggested methods for stopping the failover. Is there an API request that would allow me to identify it?

Thank you,
 Liviu

Shlomi Noach

May 14, 2020, 9:04:08 AM
to Liviu Lintes, orchestrator-mysql
If the failover process goes well, there will only be a single replica left, and that's your promoted replica.
But even if not, you can just stop replication on all of the master's immediate replicas:


orchestrator-client -c which-replicas -i <the-master>
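
For example, a rough sketch (this assumes which-replicas prints one host:port per line and that you can reach the replicas with the mysql client):

    # stop replication on every immediate replica of the dead master
    for replica in $(orchestrator-client -c which-replicas -i <the-master>); do
      mysql -h "${replica%:*}" -P "${replica#*:}" -e "STOP SLAVE"
    done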


Liviu Lintes

May 14, 2020, 11:52:21 AM
to orchestrator-mysql
Actually, the scenario I tested was the case where one very large transaction was applied on the primary and is being applied on the secondaries, generating the delay.
I will test the case where replication is delayed simply because the slaves cannot keep up with many small transactions. For that scenario, I should be able to apply the "fix" you provided.
But if you have a very large transaction that takes 2 hours to apply on the secondaries, then during those 2 hours the slaves are not regrouped and I cannot stop the replicas and/or set a delay on them. In this case you do not know which promoted replica Orchestrator picked until replication has caught up.
Here is an example of the recovery log for such a case, where I have a large transaction on the slaves:

2020-05-12T14:15:01Z RecoverDeadMaster: masterRecoveryType=MasterRecoveryGTID
2020-05-12T14:15:01Z RecoverDeadMaster: regrouping replicas via GTID

---- During this interval (14:15:01 to 14:17:59) there is no actual regrouping of slaves under the promoted replica, and I do not know which one is the promoted replica.
---- I tried to pick one and move the others below it via GTID (move-below-gtid), but at the end of this interval orchestrator would pick another replica to promote.
Is there a way, during this interval, to find out which one is the promoted replica?

2020-05-12T14:17:59Z RecoverDeadMaster: promotedReplicaIsIdeal(mysqd8-rr-002:4301)
2020-05-12T14:18:01Z RecoverDeadMaster: 0 postponed functions
2020-05-12T14:18:01Z checking if should replace promoted replica with a better candidate
2020-05-12T14:18:01Z + checking if promoted replica is the ideal candidate
2020-05-12T14:18:01Z + searching for an ideal candidate
2020-05-12T14:18:01Z + checking if promoted replica is an OK candidate
2020-05-12T14:18:01Z + searching for a candidate
2020-05-12T14:18:01Z + searching for a candidate
2020-05-12T14:18:01Z + found no server to promote on top promoted replica
2020-05-12T14:18:01Z replace-promoted-replica-with-candidate: promoted instance mysqd8-rr-002:4301 requires no further action


Thank you,
 Liviu

Shlomi Noach

May 14, 2020, 11:56:56 AM
to Liviu Lintes, orchestrator-mysql
So, I have to say, a 2-hour transaction is something I consider BadNews(TM) and should be avoided. There are various limitations and risks to such a long transaction; not being able to stop replication is one of them.
Expecting orchestrator to handle it is more challenging. I can work something out, but I'm not sure where that would leave you: even if you abort the failover, you're stuck with a bunch of replicas you can't interrupt. Or perhaps you have some flow that just works?


Liviu Lintes

May 14, 2020, 5:52:39 PM
to orchestrator-mysql
I have tested the scenario where there are no large transactions on the slaves, and cancelling the failover is easy as long as I am able to issue a stop slave.
Things are clear there. One more question about the case where I do have a large transaction on the slaves. Based on my testing, I noticed that as long as that transaction is in progress, all slaves remain under the crashed master; no regrouping is done until a stop slave can be issued and completed.
While orchestrator is waiting for the transaction to finish on the slaves, it seems that it has already picked a promoted replica; it is just waiting to be able to issue stop slave so it can regroup them.
Is there a way I can find out which one is the promoted replica during this time, while it is waiting for the large transaction to finish on the slaves? If I can identify it, I can just open that one for writes and let orchestrator do the regrouping etc. at the end. The other slaves will not be accessible, as they remain read_only and of course the apps are not redirected to them.

If I cannot identify it via API/orchestrator-client requests, my fix for cancelling this type of delayed failover would be (roughly sketched below):
- pick one of the slaves.
- kill MySQL on the remaining slaves (a graceful shutdown cannot happen while the large transaction is in progress).
- make the new master RW, set semi-sync master OFF, etc.
- restart the killed slaves and regroup them under the new master.
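
A rough sketch of that flow, reusing commands already mentioned in this thread (node names and ports are placeholders; the relocate step and the semi-sync variable name are my assumptions, not something prescribed above):

    # on the chosen slave (say node2): detach it from the dead master and open it for writes
    orchestrator-client -c reset-replica -i node2:3306
    orchestrator-client -c set-writeable -i node2:3306
    # stop the new master from waiting on semi-sync ACKs (standard semisync plugin variable)
    mysql -h node2 -e "SET GLOBAL rpl_semi_sync_master_enabled = OFF"

    # after killing and restarting mysqld on node3/node4, reattach them under node2
    orchestrator-client -c relocate -i node3:3306 -d node2:3306
    orchestrator-client -c relocate -i node4:3306 -d node2:3306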

It is a bit brutal, but it seems to work. Do you have any other suggestion for this case? We do have these large transactions occurring, especially in an environment where DBs are self-service, primary keys are not enforced, and one large delete can cause havoc. This is not PROD, but you still do not want to keep the instance unavailable when these things happen.

 Thank you,
 Liviu