MySQL Directory Files Corrupt on Node

Dean McDowell

Oct 3, 2019, 6:18:15 PM
to Percona Discussion

Hi oracles, I hope everyone is well!

 

I have a situation with a Galera cluster I am running (master-master-master), and I hope someone has the answer. I must apologise up front: this is my first Percona XtraDB Cluster, so I am treading carefully at every turn.

 

The issue is that the cluster is currently running with only 2 of its 3 nodes, because one node ran into a problem and its MySQL directory has since been emptied.

 

So! The MySQL directory on that node is now completely empty: all files were manually deleted because they were corrupted (thankfully, this didn't affect the cluster's data).

 

Am I right in saying that I can just start this node with the standard "systemctl start mysql" and the cluster will take care of the rest, i.e. the rejoining node will fetch all required files from the cluster before it is fully synced and writable?

 

Any advice or pointers would be great.

 

Thanks heroes!

Marco Shaw

Oct 3, 2019, 7:13:06 PM
to percona-d...@googlegroups.com
It looks like you sent this exact same message 10 days ago. I'm not enough of an expert to guide you through such a scenario from start to finish and be reasonably sure nothing goes wrong.

For starters, you may want to look at your config to determine how the SST settings are set.

I've had this exact thing happen before (mysql files are gone!) and panicked, but eventually everything did sync again. You should be particularly careful about which SST method you are using and how it's configured. Some transfer methods can take the donor node offline for the duration of the transfer, leaving you with only one DB accepting writes.
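As a rough illustration of the "check your SST settings" advice, here is a minimal Python sketch that reads `wsrep_sst_method` out of a my.cnf-style text and flags methods that are generally described as blocking the donor. The sample config, the host names in it, and the helper names `sst_method` and `donor_blocked` are all made up for the example, and the set of blocking methods is an assumption to verify against the documentation for your PXC version.

```python
import configparser

# Assumption: rsync and mysqldump block the donor while the transfer runs,
# whereas the xtrabackup-based methods mostly do not. Check your version's docs.
BLOCKING_SST_METHODS = {"rsync", "mysqldump"}

# Hypothetical config fragment; node1..node3 are placeholder host names.
SAMPLE_CNF = """
[mysqld]
wsrep_cluster_address=gcomm://node1,node2,node3
wsrep_sst_method=rsync
"""


def sst_method(cnf_text, default=None):
    """Return the wsrep_sst_method set under [mysqld], or default if absent."""
    # Drop !include / !includedir directives, which configparser cannot parse.
    lines = [ln for ln in cnf_text.splitlines()
             if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string("\n".join(lines))
    if not parser.has_section("mysqld"):
        return default
    for key, value in parser.items("mysqld"):
        # MySQL accepts both wsrep-sst-method and wsrep_sst_method.
        if key.replace("-", "_") == "wsrep_sst_method" and value:
            return value.strip().strip("\"'")
    return default


def donor_blocked(method):
    """True if the method is in the assumed set of donor-blocking methods."""
    return method in BLOCKING_SST_METHODS


if __name__ == "__main__":
    method = sst_method(SAMPLE_CNF)
    print(method, "blocks donor:", donor_blocked(method))
```

This only inspects one file's text; a real setup may spread settings across included files, so treat the result as a hint rather than the effective server configuration.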

--
You received this message because you are subscribed to the Google Groups "Percona Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to percona-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/percona-discussion/bd75e16d-9677-484f-9c53-d467222e546e%40googlegroups.com.

Dean McDowell

Oct 4, 2019, 7:54:57 AM
to Percona Discussion
Hi Marco,

Thanks for the reply.

No idea how it posted again so apologies for that.

You are correct though. I went digging deeper and read up on the SST sync again.

All I need to do is add the node with the previously corrupted MySQL directory back into the cluster (now with a completely blank MySQL directory).

It will then take a complete copy of what it needs from one of the two running nodes.

I tested this in a POC environment and it worked. I'd just like to get some real-world experience/input as well.
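One way to confirm that the rejoined node has finished its state transfer is to look at the wsrep status variables on it. The sketch below assumes you have already collected the output of `SHOW GLOBAL STATUS LIKE 'wsrep_%'` into a plain dict of strings; the function name `node_is_synced` and the sample dicts are hypothetical.

```python
def node_is_synced(status, expected_cluster_size):
    """Return True if the wsrep status dict describes a fully joined node.

    `status` maps wsrep status variable names to their string values.
    """
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_ready") == "ON"
        and int(status.get("wsrep_cluster_size", 0)) == expected_cluster_size
    )


# Hypothetical snapshots of a node in a three-node cluster.
STILL_JOINING = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Joined",
    "wsrep_ready": "OFF",
    "wsrep_cluster_size": "3",
}
FULLY_SYNCED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
    "wsrep_cluster_size": "3",
}

if __name__ == "__main__":
    print(node_is_synced(STILL_JOINING, 3))
    print(node_is_synced(FULLY_SYNCED, 3))
```

While the transfer is still running the node may not accept connections at all, so in practice a check like this would be retried until it passes.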

I will report back with any further findings after adding the node back in.

Thanks for your time Marco, appreciate it.

Justin Swanhart

Oct 4, 2019, 9:53:14 AM
to percona-d...@googlegroups.com
You should never use just two nodes, not even for testing, because it is not realistic, unless you are using an arbitrator node.

You need three nodes (three data or two data + one arbitrator) at a minimum because Galera must have a quorum.

If you use just two nodes (and no arbitrator), the failure of one stops writes on both: quorum is lost, so writes on the surviving node fail, which defeats the purpose of Galera.

If you do have an arbitrator, then when an offline node joins (even with an empty data dir) SST will proceed as normal, as long as the other data node and the arbitrator are available.

If you do not have an arbitrator, you must bootstrap the cluster to restore the failed node.
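The quorum reasoning above can be sketched as a small calculation. This is a simplified model, assuming every member (data node or arbitrator) carries the default weight of 1 and that quorum requires strictly more than half of the previous membership, after subtracting members that left cleanly; the function name `has_quorum` is made up for the example.

```python
def has_quorum(previous_members, surviving_members, left_gracefully=0):
    """Return True if the survivors can still form a primary component.

    Simplified model: all members have weight 1.
    """
    # Members that shut down cleanly no longer count toward the total.
    expected = previous_members - left_gracefully
    # Quorum needs strictly more than half of the expected members.
    return surviving_members * 2 > expected


if __name__ == "__main__":
    # Three data nodes, one crashes: the remaining two keep quorum.
    print(has_quorum(3, 2))
    # Two data nodes, one crashes: the survivor has exactly half, not more.
    print(has_quorum(2, 1))
    # Two data nodes plus an arbitrator, one data node crashes.
    print(has_quorum(3, 2))
```

In this model an arbitrator counts like any other member, which is why two data nodes plus one arbitrator behave like a three-node cluster for quorum purposes.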

See the Galera docs/website for more info on primary components, quorum, and arbitration.

--
The Doctor




Justin Swanhart

Oct 4, 2019, 10:38:03 PM
to percona-d...@googlegroups.com
Hi,

Sorry, I missed that you have three nodes. I saw "running with 2 nodes". In a three-node cluster, when you restart the third node it will do a state transfer and come online, as you saw. There isn't anything else you need to do. My advice about the arbitrator node was not necessary.

--
The Doctor