Orchestrator stuck trying to failover mariadb

Robert

Feb 2, 2021, 5:34:50 PM

to orchestrator-mysql

We have a 3-node MariaDB cluster with orchestrator running on all 3 nodes in a raft configuration. Recently, one node in the cluster was in an Amazon availability zone that had an outage. The node in that AZ was both the MySQL master and the orchestrator leader. We see that orchestrator itself failed over to a different node as expected, but it never performed a MySQL failover. The following error message repeats a few times and seems to be the culprit, but we don't see any evidence, in a log or on the database, that a failover was attempted before this. Can anyone provide guidance on how we ended up in this state, and how to troubleshoot?

ERROR AttemptRecoveryRegistration: cluster NODE1 has recently experienced a failover (of NODE1) and is in active period. It will not be failed over again. You may acknowledge the failure on this cluster (-c ack-cluster-recoveries) or on NODE1 (-c ack-instance-recoveries) to remove this blockage

thanks,

Robert

Shlomi Noach

Feb 3, 2021, 1:05:13 AM

to Robert, orchestrator-mysql

Hi,

Can you check the API for recent recoveries (or _claimed_ recent recoveries)? See /api/audit-recovery on any of the nodes.

If, indeed, a recovery is listed, proceed to look into exactly what happened with that recovery, via /api/audit-recovery-steps/<uid>

Or, you can do all that from the web interface.

That'll give some starting point for finding out what went wrong.
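
A minimal sketch of the two API calls described above, in Python. It assumes orchestrator's HTTP API is reachable at http://localhost:3000 (the host and port are assumptions; use any of your nodes) and that each entry returned by /api/audit-recovery carries its identifier under a key named UID (also an assumption; check the JSON your version returns and adjust uid_key). The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: orchestrator's HTTP API listens here; adjust to your deployment.
BASE_URL = "http://localhost:3000"


def recovery_steps_url(base_url, uid):
    """Build the /api/audit-recovery-steps/<uid> URL for one recovery."""
    return f"{base_url.rstrip('/')}/api/audit-recovery-steps/{uid}"


def extract_uids(recoveries, uid_key="UID"):
    """Collect non-empty recovery identifiers from an audit-recovery response.

    uid_key is an assumption about the JSON field name.
    """
    return [r[uid_key] for r in recoveries if r.get(uid_key)]


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def print_recovery_steps(base_url=BASE_URL):
    """List recent recoveries, then print the recorded steps for each one."""
    recoveries = fetch_json(f"{base_url.rstrip('/')}/api/audit-recovery")
    for uid in extract_uids(recoveries):
        steps = fetch_json(recovery_steps_url(base_url, uid))
        print(uid, json.dumps(steps, indent=2))
```

Calling print_recovery_steps() against a live node prints each recovery identifier followed by whatever steps were recorded for it; an empty list means no steps were stored for that recovery.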

Robert Little

Feb 3, 2021, 11:59:38 AM

to Shlomi Noach, orchestrator-mysql

Hi Shlomi,

Thanks for the response. I tried the audit-recovery endpoint and did get a few instances, but the audit-recovery-steps endpoint is always returning []. Is this expected behavior?

Thanks,
