Manual master promotions with GTID replication

martin....@footballaddicts.com

Aug 11, 2017, 11:29:00 AM
to orchestrator-mysql
Hi,

I'm a little confused about how to promote a new master in a GTID-enabled cluster using Orchestrator, and I'm hoping someone can help make things a bit clearer :)

In short, I can promote a new master via make-co-master and reset-slave and all is well so far, but I'm left in a state where I'm unable to do another promotion because Orchestrator reverts to "classic" replication the next time I attempt to issue make-co-master.

For my testing setup, I'm using Orchestrator 2.1.5 to manage a cluster of three nodes (running Percona 5.6.36-82.1-log) with GTID enabled.

I'm trying to get from here:

db1 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db2 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db3 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]

To here:

db2 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db1 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db3 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]

And then finally, some time later, to here:

db3 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db1 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db2 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]

The first promotion is easily achieved by issuing make-co-master and reset-slave. However, at this point db2 has lost its GTID flag:

db2:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>]
+ db1:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db3:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
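A lost flag like this can be spotted in a script by scanning the topology listing for instances whose bracketed flags lack GTID. Below is a minimal Python sketch, assuming the listing format shown above; `instances_without_gtid` is a made-up name for illustration and is not part of Orchestrator.

```python
import re

# Matches one line of the topology listing shown above, e.g.
#   "+ db1:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]"
# The leading "+ " (and any indentation) marks a replica; the master has none.
LINE_RE = re.compile(r"^\s*(?:\+\s+)?(?P<host>\S+)\s+\[(?P<flags>[^\]]*)\]\s*$")


def instances_without_gtid(topology_text):
    """Return the hosts whose flag list does not contain GTID (hypothetical helper)."""
    missing = []
    for line in topology_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        flags = match.group("flags").split(",")
        if "GTID" not in flags:
            missing.append(match.group("host"))
    return missing


listing = """\
db2:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>]
+ db1:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
+ db3:3306 [0s,ok,5.6.36-82.1-log,ro,ROW,>>,GTID]
"""
print(instances_without_gtid(listing))
```

For the listing above this reports only db2:3306, the freshly promoted master.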

When I attempt the second promotion, again by issuing make-co-master to make db3 a co-master of db2, it fails with error 1776:

Error 1776: Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.
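For context, error 1776 is MySQL refusing a CHANGE MASTER TO that names binary log coordinates while auto-positioning stays active on the replica. The rule can be sketched in Python as follows. `build_change_master` is a hypothetical helper for illustration only: it is not what Orchestrator runs internally, it only builds the SQL string, and it does no escaping of the host name.

```python
def build_change_master(host, port, auto_position, log_file=None, log_pos=None):
    """Build a CHANGE MASTER TO statement (hypothetical helper, string only).

    Mirrors the MySQL rule behind error 1776: file/position coordinates
    cannot be combined with MASTER_AUTO_POSITION = 1.
    """
    has_coordinates = log_file is not None or log_pos is not None
    if auto_position and has_coordinates:
        raise ValueError(
            "MASTER_LOG_FILE/MASTER_LOG_POS cannot be set with MASTER_AUTO_POSITION = 1"
        )

    parts = [f"MASTER_HOST = '{host}'", f"MASTER_PORT = {int(port)}"]
    if auto_position:
        # GTID-based replication: the replica works out its own position.
        parts.append("MASTER_AUTO_POSITION = 1")
    else:
        # Classic replication: switch auto-positioning off explicitly and
        # give the coordinates in the same statement.
        if log_file is None or log_pos is None:
            raise ValueError("classic replication needs both log_file and log_pos")
        parts.append("MASTER_AUTO_POSITION = 0")
        parts.append(f"MASTER_LOG_FILE = '{log_file}'")
        parts.append(f"MASTER_LOG_POS = {int(log_pos)}")
    return "CHANGE MASTER TO " + ", ".join(parts)


print(build_change_master("db3", 3306, auto_position=True))
```

The point of the sketch is that a statement has to pick one mode: either auto-positioning alone, or explicit coordinates with auto-positioning switched off in the same statement.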


I have found that I can avoid this problem by using detach-replica-master-host rather than reset-slave, but that leaves me with a replication error on the demoted node and the risk of accidentally re-attaching it at any point in the future. Also, the option to detach from the master host is not present in Orchestrator's web interface, which leads me to believe it's not the ideal way to demote the node.

My assumption is that for any move operation on a node after a reset-slave, Orchestrator would prefer GTID replication if both the instance being moved and the destination instance have GTID enabled, but that does not seem to be the case.

Did I misunderstand how to perform a master promotion with GTID replication using Orchestrator? Is there a better way to do it that I'm missing?


Cheers,

Martin

Simon Mudd

Aug 12, 2017, 4:20:26 AM
to orchestrator-mysql
You don’t mention here whether you’re talking about “scheduled” master promotions or “unscheduled” ones (due to master failure).
Orchestrator is generally intended to handle the unexpected failure case.
Shlomi and I talked about handling “scheduled promotions” the other day. AFAIK most of the work that’s needed is already handled but
there are no hooks to ensure systems external to MySQL are aware of the change and I had proposed adding such hooks.

Our case may not be typical but we had a “scheduled failover script” written prior to using orchestrator so we still use that. This script
_is_ aware of our complete environment which is why we haven’t changed it, but a colleague asked me why we weren’t using orchestrator
for this process and to some extent I agree.  You can always make the master go away and orchestrator will recover but the process should
be more complete and faster if you do all the steps yourself and in the right order.

For me the missing functionality is to have external hooks to be aware of the fact the master has moved and also to tell the infrastructure
that the old master is dead (or at least no longer the primary master).  These are minor but important notifications on an active system
with a number of clients talking to the master.

Note: I don’t think that GTID has any real relation to your question; orchestrator shouldn’t care as long as you use MySQL GTID,
MariaDB GTID or Pseudo-GTID.

Shlomi may be able to clarify this better perhaps, but that’s my understanding of the current status.

Simon

martin....@footballaddicts.com

Aug 14, 2017, 4:51:07 AM
to orchestrator-mysql
Thanks, Simon.

It's definitely not an unscheduled promotion, but I think refactoring is probably a more accurate description of what I'm trying to do.

I'm testing this in a lab setup, but the real-world case I'm looking to solve is that I currently have a master running on old hardware and an older OS, and I want to promote a newer machine to master so that I can re-install the old one and use it as a read-only slave. In my testing, I can definitely achieve that (that's the first promotion in the test case I outlined), but it seems like promoting another node after that would fail.

I haven't tried the "unscheduled" promotion use case yet, but it would definitely be interesting to see if that results in the same error. I'll run some more tests today and report back.

/Martin

martin....@footballaddicts.com

Aug 14, 2017, 8:48:26 AM
to orchestrator-mysql, martin....@footballaddicts.com
I have tried a promotion as recovery from a master failure and it seems to work fine, but I realized while testing that it's not really relevant in this case since a recovery does not (and cannot) demote the failed master as it is unreachable by Orchestrator.

This will only be an issue when refactoring/manually promoting a new master. I can go from having master db1 to having co-masters db1/db2 to having master db2, but at that point I'm left unable to make a node co-master of db2.

Shlomi Noach

Aug 15, 2017, 12:43:25 AM
to martin....@footballaddicts.com, orchestrator-mysql
Sorry for the late response, I was out of office.

Martin, did you try:

orchestrator -c graceful-master-takeover?

as per documentation:

Gracefully discard master and promote another (direct child) instance instead, even if everything is running well.
This allows for planned switchover.
NOTE:
- Promoted instance must be a direct child of the existing master
- Promoted instance must be the *only* direct child of the existing master. It *is* a planned failover thing.
- Orchestrator will first issue a "set global read_only=1" on existing master
- It will promote candidate master to the binlog positions of the existing master after issuing the above
- There _could_ still be statements issued and executed on the existing master by SUPER users, but those are ignored.
- Orchestrator then proceeds to handle a DeadMaster failover scenario
- Orchestrator will issue all relevant pre-failover and post-failover external processes.
Examples:

orchestrator -c graceful-master-takeover -alias mycluster
Indicate cluster by alias. Orchestrator automatically figures out the master and verifies it has a single direct replica

orchestrator -c graceful-master-takeover -i instance.in.relevant.cluster.com
Indicate cluster by an instance. You don't strictly need to specify the master, orchestrator
will infer the master's identity.
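The two documented invocations can be wrapped in a small helper when scripting a planned switchover. This is a minimal Python sketch: `takeover_command` and `run_takeover` are hypothetical names, the `orchestrator` binary is assumed to be on the PATH, and only the `-c`, `-alias` and `-i` flags quoted above are used.

```python
import subprocess


def takeover_command(alias=None, instance=None):
    """Return the argv for a graceful-master-takeover (hypothetical helper).

    Exactly one of `alias` (cluster alias) or `instance` (any instance in
    the cluster) must be given, matching the two documented examples.
    """
    if (alias is None) == (instance is None):
        raise ValueError("pass exactly one of alias or instance")
    argv = ["orchestrator", "-c", "graceful-master-takeover"]
    if alias is not None:
        argv += ["-alias", alias]
    else:
        argv += ["-i", instance]
    return argv


def run_takeover(alias=None, instance=None):
    """Run the takeover and return (exit code, combined output)."""
    result = subprocess.run(
        takeover_command(alias, instance),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout


# Only builds and prints the command; nothing is executed here.
print(takeover_command(alias="mycluster"))
```

Keeping the command builder separate from the runner makes it possible to log or review the exact command before anything touches the cluster.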


---




--
You received this message because you are subscribed to the Google Groups "orchestrator-mysql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orchestrator-mysql+unsub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/orchestrator-mysql/ee962cda-73b2-40a9-862d-e8e29a8c5dc6%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Simon Mudd

Aug 16, 2017, 6:20:01 AM
to orchestrator-mysql, martin....@footballaddicts.com
Another thought:

Graceful takeover also doesn't quite fully resolve several issues:
* You "lose" the original master. This is not ideal, as the reason for the change may be to upgrade the master, and the old master could often be upgraded afterwards and put back into service without requiring a new server to replace it.
* There may be cases where you already have a master in one zone (with slaves) and an intermediate master in another zone (with its own slaves) and you want to "interchange roles". This may be part of an upgrade process or to allow you to do "topology" maintenance, where again you want to promote the intermediate master to be a master. We have an internal procedure which does this at the moment, called "reverse replication".

Consequently this new feature might be interesting in orchestrator.
I had discussed this with Shlomi previously, and one issue is that orchestrator may not have all the information it needs to carry out such a process. If that information were available, this added functionality would be very convenient.

Simon

Shlomi Noach

Aug 16, 2017, 6:35:31 AM
to Simon Mudd, orchestrator-mysql, martin....@footballaddicts.com
Graceful takeover does not lose the original master. It repoints it to replicate from the promoted master.


martin....@footballaddicts.com

Aug 16, 2017, 7:47:31 AM
to orchestrator-mysql, martin....@footballaddicts.com
Thanks, graceful-master-takeover seems to be close to what I want. I've had some success with it in my testing setup, but I have also run into a few weird issues (for example, it sometimes fails saying there are too many replicas when there is only one).

At this point, I don't trust the Vagrant setup I use for testing. I'm working on rebuilding it from scratch right now to verify that the issues I'm having aren't just coming from a bad replication setup. Will report back soon.

Thanks for all the pointers so far, much appreciated!

/Martin

Shlomi Noach

Aug 16, 2017, 8:09:32 AM
to martin....@footballaddicts.com, orchestrator-mysql
> it sometimes fails saying there are too many replicas when there is only one

It happens to me as well. For some reason it takes time for orchestrator to realize the master only has the one replica. I'll look into it. 
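Until that is resolved, one possible workaround is to retry the takeover after a short pause so orchestrator has time to refresh its view of the topology. This is an assumption, not something confirmed in this thread, and it only makes sense for this specific "too many replicas" failure. A minimal Python sketch, where `attempt` is any caller-supplied callable that runs the takeover and returns True on success; the retry count and delay are arbitrary:

```python
import time


def retry(attempt, tries=5, delay_seconds=10.0, sleep=time.sleep):
    """Call `attempt` until it returns True or `tries` is exhausted.

    `attempt` is a caller-supplied callable, e.g. one that shells out to
    `orchestrator -c graceful-master-takeover ...` and checks the exit code.
    Returns the number of attempts used, or raises RuntimeError.
    """
    for attempt_number in range(1, tries + 1):
        if attempt():
            return attempt_number
        if attempt_number < tries:
            sleep(delay_seconds)  # give orchestrator time to re-poll the topology
    raise RuntimeError(f"takeover did not succeed after {tries} attempts")


# Dry run with a stand-in that fails twice and then succeeds.
outcomes = iter([False, False, True])
print(retry(lambda: next(outcomes), sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.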


martin....@footballaddicts.com

Aug 16, 2017, 12:44:57 PM
to orchestrator-mysql, martin....@footballaddicts.com
On Wednesday, August 16, 2017 at 2:09:32 PM UTC+2, Shlomi Noach wrote:
> > it sometimes fails saying there are too many replicas when there is only one
>
> It happens to me as well. For some reason it takes time for orchestrator to realize the master only has the one replica. I'll look into it.


I see, that's good to know. I thought it might be related to something specific in my setup, but I can discard that theory then :)

I have a new test setup (available here: https://github.com/masv/orchestrator-sandbox) and I can confirm that graceful-master-takeover helps me achieve the ultimate goal of promoting a new master. It does require some manual cleanup afterwards: making the new master writeable, restarting replication on the old master, and issuing reset-slave on the new master to make it the only master of the cluster. I can work with that though.

However, I'm still confused by make-co-master. It's listed as a classic relocation command in the CLI help, so I shouldn't assume it works with GTID replication, but the web UI does allow it even when in GTID mode. When issued, the command fails with error 1776 as explained earlier.

Is there a way (other than graceful-master-takeover) to get from master-slave to master-master with GTID replication using Orchestrator, or is that not supported?

I have shared commands and debug logs for my exploration of make-co-master and graceful-master-takeover here, if they are of any use: https://gist.github.com/masv/73ef9cc07cf98ccf8f0aa28225c55689

/Martin

Shlomi Noach

Aug 16, 2017, 2:07:06 PM
to martin....@footballaddicts.com, orchestrator-mysql
> making the new master writeable, restarting replication on the old master and issuing reset-slave on the new master to make it the only master of the cluster

set 
"ApplyMySQLPromotionAfterMasterFailover": true

to make orchestrator run "set read_only=0;reset slave all" on promoted master.
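In the JSON configuration file this is a single top-level key. A minimal Python sketch of switching it on in an existing config follows; `enable_promotion` is a hypothetical helper, and the one-key sample stands in for a real config file, which has many more keys.

```python
import json


def enable_promotion(config_text):
    """Return config JSON with ApplyMySQLPromotionAfterMasterFailover set to true."""
    config = json.loads(config_text)
    config["ApplyMySQLPromotionAfterMasterFailover"] = True
    return json.dumps(config, indent=2)


# Placeholder config; a real orchestrator config file has many more keys.
sample = '{"ApplyMySQLPromotionAfterMasterFailover": false}'
print(enable_promotion(sample))
```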

I can check the m-m scenario later on.



martin....@footballaddicts.com

Aug 17, 2017, 10:36:53 AM
to orchestrator-mysql, martin....@footballaddicts.com

> set
> "ApplyMySQLPromotionAfterMasterFailover": true
>
> to make orchestrator run "set read_only=0;reset slave all" on promoted master.


Ah yes, that's the option I was looking for. Works as expected, thanks! 