mysqldump lock whole cluster

620 views
Skip to first unread message

James Wang

unread,
Aug 3, 2015, 5:18:08 AM8/3/15
to codership
Hi All,

A colleague did a mysqldump (without --single-transaction) on one the Galera nodes.  This locked the table on the node as expected.

However, the whole cluster seamed locked as well.

Why this behavior please?  Does the cluster need to wait till all nodes acknowledgements ?  I thought only wait for majority (Quorum). 

Please shed some light.
Thanks  a lot in advance

Philip Stoev

unread,
Aug 3, 2015, 5:58:34 AM8/3/15
to James Wang, codership
Hello,

What has happened is as follows:
1. mysqldump causes the table to be locked for updates while it is dumping
it.
2. the lock means that updates on that same table arriving from other nodes
can not be applied
3. the queue of updates waiting to be applied grows on the node and it
requests that the other nodes throttle (e.g. apply flow control to) their
activity until it can catch up
4. upon receipt of the flow-control message, the other nodes begin to block
transaction commits from their clients.

So it is not strictly about all nodes acknowledging a transaction, it is
about all nodes being able to apply it in time, and the node performing the
mysqldump can not do that.

To work around this problem, you could:
* try a non-blocking backup via xtrabackup
* temporarily remove the node from the cluster while the mysqldump is
running by running set global wsrep_desync=ON before starting xtrabackup and
then set global wsrep_desync=OFF after it is done.

Philip Stoev
--
You received this message because you are subscribed to the Google Groups
"codership" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to codership-tea...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James Wang

unread,
Aug 3, 2015, 6:18:03 AM8/3/15
to codership, jwang...@gmail.com

Can not a non-block "mysqldump --single-transaction ..."  do the trick as well as xtrabackup please?

Also, am I right to say that if any queries which need to lock a table shall lock the whole cluster?

Thanks a lot in advance

Philip Stoev

unread,
Aug 3, 2015, 6:28:58 AM8/3/15
to James Wang, codersh...@googlegroups.com
mysqldump --single-transaction could work however please note that it does
not provide a consistent snapshot of the database in face of
concurrently-running DDL statements:

https://dev.mysql.com/doc/refman/5.6/en/mysqldump.html#option_mysqldump_single-transaction

There are two distinct cases of table locking which can cause the cluster to
stop committing new transactions:
* if LOCK TABLE or FLUSH TABLES WITH READ LOCK is issued on a node, new
transactions will stop committing due to the flow control mechanism
described previously. This is worked around using wsrep_desync;
* if there is a long running ALTER TABLE. Since all nodes must execute the
ALTER in exactly the same logical moment of time, other transactions will be
blocked; This limitation is scheduled to be removed in the next Galera
release. In the meantime, the "RSU" method can be used to run the DDL one
node at a time.

James Wang

unread,
Aug 3, 2015, 6:52:29 AM8/3/15
to codership, jwang...@gmail.com
Thanks a lot for the clarification, expecting Version 4

James Wang

unread,
Aug 3, 2015, 7:00:38 AM8/3/15
to codership, jwang...@gmail.com
More questions please:

If there is a network issue and one node does not acknowledge, will the cluster has to wait please?

Thanks in advance

Philip Stoev

unread,
Aug 3, 2015, 7:11:31 AM8/3/15
to James Wang, codersh...@googlegroups.com
Hello,

The transaction will wait up to a specified configurable timeout for all
nodes to acknowledge it. If a node does not respond in time, it will be
evicted from the cluster and the transaction will commit on the remaining
nodes. As soon as the offending node is evicted from the cluster, the
cluster will continue to operate normally with the reduced number of nodes.
The evicted node can then rejoin in the future and will be brought up to
speed with the rest.

Philip Stoev

-----Original Message-----
From: James Wang
Sent: Monday, August 03, 2015 14:00
To: codership
Cc: jwang...@gmail.com
Subject: Re: [codership-team] mysqldump lock whole cluster


James Wang

unread,
Aug 3, 2015, 7:19:42 AM8/3/15
to codership, jwang...@gmail.com
Thanks a lot.

Can not the same principle apply to "ALTER TABLE; LOCK TABLE;" please?

If one node is engaged in above, just evict it instead of waiting?

Thanks

Philip Stoev

unread,
Aug 3, 2015, 7:32:32 AM8/3/15
to James Wang, codersh...@googlegroups.com
Eviction is reserved for situations that are outside of user control, e.g.
server failures or network outages. The evicted node will exit and must be
restarted manually to join the cluster.

ALTER TABLE and LOCK TABLE on the other hand are user-initiated actions -- a
system should no go against the explicit wishes of the user by creating an
error situation and shutting down nodes.

James Wang

unread,
Aug 3, 2015, 7:39:37 AM8/3/15
to codership, jwang...@gmail.com
Thanks a lot.

BTW, how does version address this (ALTER TABLE; LOCK TABLE; et al )then, if you know?

Thanks in advance

James Wang

unread,
Aug 3, 2015, 7:45:01 AM8/3/15
to codership, jwang...@gmail.com
I meant Version 4.

Also, timeout parameter name please?
mysql> show variables like '%timeout%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| connect_timeout             |     *   |
| delayed_insert_timeout      |   *    |
| have_statement_timeout      | YES      |
| innodb_flush_log_at_timeout | *        |
| innodb_lock_wait_timeout    | *       |
| innodb_rollback_on_timeout  | OFF      |
| interactive_timeout         | *   |
| lock_wait_timeout           | * |
| net_read_timeout            | *       |
| net_write_timeout           | *       |
| rpl_stop_slave_timeout      | * |
| slave_net_timeout           | *     |
| thread_pool_idle_timeout    | *       |
| wait_timeout                | *   |
+-----------------------------+----------+

I replaced my settings with * 

It would be great if you would recommend these timeout variable values.

Thanks a lot in advance

Philip Stoev

unread,
Aug 3, 2015, 9:01:49 AM8/3/15
to James Wang, codership, jwang...@gmail.com
When an ALTER TABLE is issued, all the nodes will only take whatever locks
are required to run the ALTER, allowing updates on other tables to proceed.
LOCK TABLE behavior will remain the same.

Philip Stoev

unread,
Aug 3, 2015, 9:03:36 AM8/3/15
to James Wang, codership, jwang...@gmail.com
Reply all
Reply to author
Forward
0 new messages