Split-brain due to network failure


Eamon Daly

Jun 19, 2013, 9:39:48 PM
to prm-d...@googlegroups.com
I'm new to corosync and pacemaker, so this might be a stunningly stupid question, but here goes: our (well-known company's) cloud network is somewhat spotty, and we've discovered that if the NIC flakes, we never recover from the split-brain. Specifically, if db1 is the master and I run "ifdown eth2" on db2 to simulate a connectivity failure, PRM immediately kills the slave thread like so:

130619 15:44:37 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
130619 15:44:37 [Note] Slave I/O thread killed while reading event
130619 15:44:37 [Note] Slave I/O thread exiting, read up to log 'mysql-bin-db1.002084', position 48735139
130619 15:44:37 [Note] Error reading relay log event: slave SQL thread was killed

That's expected behavior, of course: db2 has no way of knowing that its own network is the issue, so it immediately takes over the master role. Unfortunately, in doing so it resets the slave status* and prevents any chance of re-establishing its binlog position once the network comes back. What is the "right" way to mitigate transient ping issues?


* This may be naive, but why doesn't it update p_mysql_REPL_INFO when this happens? Shouldn't this be unique for each slave so it could theoretically pick up the thread again once it recovers from an outage?

Garrick Peterson

Jun 19, 2013, 9:59:02 PM
to prm-d...@googlegroups.com
Eamon,

I may be corrected, but the immediate solution to your problem is to add another node to your cluster (it does not have to run MySQL, just corosync/pacemaker) and enable quorum by setting the no-quorum-policy option to stop, which is the default.

What this does is prevent a simple network segmentation from causing both servers to go into master mode. Instead, the node that has no connection to the other two will shut down MySQL entirely and release any VIPs, preventing writes. When it reconnects to the remaining nodes, it will simply resume as a slave, just as it was before.

There is still the potential for data loss (e.g. if the master is the one that's segmented from the other two nodes, any un-replicated data will be lost), but you'll reduce the potential for split-brain.
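
In crm shell terms, that might look roughly like this (no-quorum-policy is a standard Pacemaker cluster property; the grep is just a quick sanity check, so treat this as a sketch rather than a recipe):

# with the third node joined, make loss of quorum stop resources
crm configure property no-quorum-policy=stop
# verify: the crm_mon header should read "partition with quorum"
crm_mon -1 | grep -i quorum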

-Garrick

-- 
Garrick Peterson, Remote DBA, Percona
garrick....@percona.com
skype: garrick-peterson
office: +1-888-401-3401 Ext 575
Belgrade, Montana, United States (GMT -7)

http://www.percona.com
http://www.mysqlperformanceblog.com/

Percona Live London MySQL Conference

Eamon Daly

Jun 20, 2013, 12:09:27 AM
to prm-d...@googlegroups.com
Thanks for the quick reply, Garrick, but unfortunately it looks like that doesn't work: the third (non-MySQL) node comes up as offline. I gave it a shot anyway, but sure enough, db2 immediately reset its slave status and tried to assume master. Is there something special that needs to happen so the cluster considers the third node online even when MySQL isn't running?

Florian Haas

Jun 20, 2013, 7:24:44 AM
to prm-d...@googlegroups.com
On 06/20/2013 06:09 AM, Eamon Daly wrote:
> Thanks for the quick reply, Garrick, but unfortunately it looks like
> that doesn't work: the third (non-MySQL) node comes up as offline. I
> gave it a shot anyway, but sure enough, db2 immediately reset its slave
> status and tried to assume master. Is there something special that needs
> to happen so the cluster considers the third node online even when MySQL
> isn't running?

A node popping up in the Pacemaker cluster, but never going beyond the
OFFLINE status, is usually a symptom of Corosync starting and joining
the Totem ring, but the Pacemaker master control process (pacemakerd)
never getting fired up.

When your three nodes are running, do

corosync-objctl | grep member

to figure out whether you have 2 or 3 nodes joined. Assuming you are
seeing 3, but crm_mon keeps reporting the third node as OFFLINE, chances
are you never did "service pacemaker start" on that third node.
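
For example (illustrative; service names vary by distro and init system):

service corosync status    # confirm corosync itself is up on the third node
service pacemaker start    # launch pacemakerd and its child daemons
crm_mon -1                 # one-shot status; the node should now show Online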

More on checking Corosync membership:
http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership

Cheers,
Florian



Yves Trudeau

Jun 20, 2013, 9:41:05 AM
to prm-d...@googlegroups.com
Hi,
   ifdown is not the best way to simulate network loss: corosync/pacemaker will react wildly, since they are bound to an interface that has just disappeared. I suggest you use iptables to block the traffic instead. Of course, the best option in your case would be redundant rings. Also, with decent token and token_retransmits_before_loss_const settings, the network issue must last long enough to cause such a problem: the default values require an outage of more than 4s before corosync reacts. If that is too sensitive for your network, you can set "token" to a larger value than its default of 1000 ms, though that will of course slow down dead-node detection.
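
As a sketch, blocking the Totem traffic could look like this (5405 is only corosync's default mcastport; check your corosync.conf):

iptables -A INPUT -p udp --dport 5405 -j DROP
iptables -A OUTPUT -p udp --dport 5405 -j DROP
# ... and later, to "restore" the network:
iptables -D INPUT -p udp --dport 5405 -j DROP
iptables -D OUTPUT -p udp --dport 5405 -j DROP

The timeouts go in the totem section of corosync.conf; the values below are only an example, not a recommendation:

totem {
        version: 2
        # default is 1000 ms; larger values tolerate longer hiccups
        # but slow down dead-node detection
        token: 5000
        # default is 4
        token_retransmits_before_loss_const: 10
}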

Since replication is, by design, asynchronous, there's of course the possibility of losing data, and since the network is down, it's almost impossible to extract that data without human intervention. In such cases, you have to choose between continuity of service and data integrity.

Regards,

Yves





Eamon Daly

Jun 20, 2013, 5:41:00 PM
to prm-d...@googlegroups.com
Yep, PEBCAK: I hadn't opened up the firewall on the third node. Including that node and setting no-quorum-policy to stop seems to be the combination I've been looking for! Thanks, everyone!
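
For reference, the fix amounted to allowing corosync traffic through on the third node, something like this (5404/5405 UDP are corosync's defaults; check the mcastport in your corosync.conf):

iptables -I INPUT -p udp --dport 5404 -j ACCEPT
iptables -I INPUT -p udp --dport 5405 -j ACCEPT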