Entire MariaDB 10.1 Galera cluster locks up - wsrep slave threads stuck on "committed" (?)

Niels Hendriks

May 4, 2017, 8:35:34 AM
to codership
Hi,

We have a 3-node MariaDB 10.1 Galera cluster and have had 2 incidents where the entire cluster locked up. When we check the processlist, we see a lot of "committed <id>" threads that have been busy for a very long time. I think these are the wsrep_slave_threads, as there are 12 of them (which is what we have configured):

|       1 | system user          |                 | NULL                  | Sleep   |    880 | committed 108652861 | NULL                                                                                                 |    0.000 |
|       2 | system user          |                 | NULL                  | Sleep   | 711100 | wsrep aborter idle  | NULL                                                                                                 |    0.000 |
|       3 | system user          |                 | NULL                  | Sleep   |    880 | committed 108652860 | NULL                                                                                                 |    0.000 |
|       4 | system user          |                 | NULL                  | Sleep   |    882 | committed 108652854 | NULL                                                                                                 |    0.000 |
|       5 | system user          |                 | NULL                  | Sleep   |    882 | committed 108652855 | NULL                                                                                                 |    0.000 |
|       6 | system user          |                 | NULL                  | Sleep   |    881 | committed 108652859 | NULL                                                                                                 |    0.000 |
|       7 | system user          |                 | NULL                  | Sleep   |    879 | committed 108652880 | NULL                                                                                                 |    0.000 |
|       9 | system user          |                 | NULL                  | Sleep   |    883 | committed 108652848 | NULL                                                                                                 |    0.000 |
|      10 | system user          |                 | NULL                  | Sleep   |    882 | committed 108652853 | NULL                                                                                                 |    0.000 |
|      11 | system user          |                 | NULL                  | Sleep   |    881 | committed 108652858 | NULL                                                                                                 |    0.000 |
|      12 | system user          |                 | NULL                  | Sleep   |    881 | committed 108652856 | NULL                                                                                                 |    0.000 |
|      13 | system user          |                 | NULL                  | Sleep   |    882 | committed 108652852 | NULL                                                                                                 |    0.000 |
|      14 | system user          |                 | NULL                  | Sleep   |    881 | committed 108652857 | NULL                                                                                                 |    0.000 |
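To double-check that these really are the applier threads, comparing the setting with the number of "system user" rows should confirm it (the count should be wsrep_slave_threads plus the one "wsrep aborter" thread), something like:

SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';

SELECT COUNT(*) AS system_user_threads
FROM information_schema.processlist
WHERE user = 'system user';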


All the other queries are stuck on "query end" across all 3 nodes. Manually killing queries left them stuck in the "killed" state, and the only way to restart mysql was to kill -9 it. Once we did that on one of the nodes, everything went back to normal and we could bring the other node back up with an IST just fine.

The nodes have 12 cores / 24 CPU threads, so we set wsrep_slave_threads to 12. The data is on an NVMe disk and the nodes are interconnected with 10Gbit NICs.
We have the following Galera-related settings (with the SST user/password omitted):
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_node_address=10.0.0.3
wsrep_cluster_address='gcomm://10.0.0.1,10.0.0.2,10.0.0.3'
wsrep_sst_method=xtrabackup-v2
wsrep_retry_autocommit=4
wsrep_slave_threads=12
wsrep_provider_options="gcache.size=10G"
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
innodb_log_file_size=512M
innodb_flush_log_at_trx_commit=0
enforce_storage_engine=innodb
sql_mode="NO_AUTO_CREATE_USER"


I'm not sure what causes this. Is there anything I can check now or do to prevent it? Or anything in particular I should check if it happens again?

Thanks,
Niels

John Daniels

May 4, 2017, 10:14:13 AM
to codership
This was happening to us for quite a while; it turned out to be an InnoDB gap lock deadlock.

With error logging enabled you can verify whether it's the same thing by checking for log lines that end like:
lock_mode X locks rec but not gap

We fixed it by changing the isolation level from the default to READ-COMMITTED.
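For reference, the change itself is just the isolation level (the default you are coming off of is REPEATABLE-READ). In the [mysqld] section of my.cnf:

transaction-isolation = READ-COMMITTED

or at runtime, affecting new connections only:

SET GLOBAL tx_isolation = 'READ-COMMITTED';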

That being said, my troubleshooting method when it happened was something like:

SHOW ENGINE INNODB STATUS;
SHOW STATUS LIKE '%';
SHOW PROCESSLIST;
SHOW STATUS LIKE 'Threads%';
SELECT SUBSTRING_INDEX(host, ':', 1) AS host_name, state, COUNT(*)
  FROM information_schema.processlist GROUP BY state, host_name;
SHOW GLOBAL VARIABLES LIKE '%';
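When it happened I'd capture everything in one go from the mysql client so I could dig through it later, roughly like this (the tee file path is just an example, put it wherever suits you):

tee /tmp/galera-hang.log
SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS\G
SHOW GLOBAL STATUS LIKE 'wsrep_%';
SHOW STATUS LIKE 'Threads%';
SELECT SUBSTRING_INDEX(host, ':', 1) AS host_name, state, COUNT(*)
  FROM information_schema.processlist GROUP BY state, host_name;
notee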

- John

Niels Hendriks

May 4, 2017, 4:17:43 PM
to codership
Thanks John, I checked our logs and I do not see the errors you mention ("lock_mode X locks rec but not gap"). In fact, when it went unresponsive I did not see any logging from mysql at all that would indicate the cause.

I'll use the commands you mentioned to try to determine the cause the next time it happens. I also saw that wsrep has a debug option, so I'm thinking of enabling that too to see if it gives any additional information.
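For anyone following along, I believe wsrep_debug can be switched on at runtime without a restart, something like:

SET GLOBAL wsrep_debug = ON;   -- extra wsrep messages go to the error log (quite verbose)
SET GLOBAL wsrep_debug = OFF;  -- and back off again afterwards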

Cheers,
Niels

Niels Hendriks

Jun 23, 2017, 1:41:32 AM
to codership
This just happened to us again; it's still a pain to debug since it's so rare. I've got output from the commands John mentioned, but I don't really see a clear indicator in there. I also enabled wsrep_debug while it was happening and waited 1-2 minutes, but no messages appeared that gave a clear indication either.

What I did notice is that the last 2 times it happened, it coincided with our backup software running, which also issues a "flush tables" query (on a single node in the cluster). Could that be the cause?
The processlist looks like this (I changed the usernames / db names a bit since they contain some personal information):
+---------+----------------------+-----------------+-----------------------+---------+--------+-----------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id      | User                 | Host            | db                    | Command | Time   | State                       | Info                                                                                                 | Progress |
+---------+----------------------+-----------------+-----------------------+---------+--------+-----------------------------+------------------------------------------------------------------------------------------------------+----------+
|       1 | system user          |                 | NULL                  | Sleep   | 660096 | wsrep aborter idle          | NULL                                                                                                 |    0.000 |
|       2 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104091 | NULL                                                                                                 |    0.000 |
|       3 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104095 | NULL                                                                                                 |    0.000 |
|       4 | system user          |                 | NULL                  | Sleep   |    970 | Waiting for table flush     | NULL                                                                                                 |    0.000 |
|       5 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104090 | NULL                                                                                                 |    0.000 |
|       6 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104089 | NULL                                                                                                 |    0.000 |
|       7 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104092 | NULL                                                                                                 |    0.000 |
|       9 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104086 | NULL                                                                                                 |    0.000 |
|      10 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104088 | NULL                                                                                                 |    0.000 |
|      11 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104087 | NULL                                                                                                 |    0.000 |
|      12 | system user          |                 | NULL                  | Sleep   |    971 | Waiting for table flush     | NULL                                                                                                 |    0.000 |
|      13 | system user          |                 | NULL                  | Sleep   |    970 | Waiting for table flush     | NULL                                                                                                 |    0.000 |
|      14 | system user          |                 | NULL                  | Sleep   |    970 | applied write set 185104094 | NULL                                                                                                 |    0.000 |
| 2345718 | usera1234            | 127.0.0.1:46503 | useradb123            | Query   |  65721 | Sending data                | SELECT <query>                                                                                       |    0.000 |
| 2580876 | backup               | 127.0.0.1:44012 | NULL                  | Query   |   6826 | Waiting for table flush     | flush tables                                                                                         |    0.000 |
| 2581798 | usera1234            | 127.0.0.1:44698 | useradb123            | Query   |   6577 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2584284 | usera1234            | 127.0.0.1:46188 | useradb123            | Query   |   5892 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2587531 | usera1234            | 127.0.0.1:48241 | useradb123            | Query   |   5001 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2590650 | usera1234            | 127.0.0.1:50285 | useradb123            | Query   |   4138 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2594425 | usera1234            | 127.0.0.1:53210 | useradb123            | Query   |   3121 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2597340 | usera1234            | 127.0.0.1:54767 | useradb123            | Query   |   2309 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2599707 | usera1234            | 127.0.0.1:56133 | useradb123            | Query   |   1649 | Waiting for table flush     | SELECT <query>                                                                                       |    0.000 |
| 2602109 | userab12345678912345 | 127.0.0.1:57396 | userabb12345678912345 | Query   |    969 | query end                   | UPDATE <query>                                                                                       |    0.000 |
| 2602118 | userc12345           | 127.0.0.1:57399 | userdb12345           | Query   |    966 | query end                   | UPDATE <query>                                                                                       |    0.000 |
| 2602160 | userd1234567111      | 127.0.0.1:57420 | userdb1234567111      | Query   |    956 | query end                   | UPDATE <query>                                                                                       |    0.000 |
| 2602193 | usere12345111        | 127.0.0.1:57429 | usere1d2345111        | Query   |    945 | query end                   | DELETE <query>                                                                                       |    0.000 |
<many more lines with query_end>
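Looking at the processlist again, I wonder if the SELECT that has been running for 65721 seconds is what the "flush tables" is waiting on, with every later query on those tables then queueing up behind the flush. Next time I'll look for long runners before the backup window with something like this (the 600-second threshold is arbitrary):

SELECT id, user, db, time, state, LEFT(info, 100) AS query
FROM information_schema.processlist
WHERE command = 'Query'
  AND time > 600
ORDER BY time DESC;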


Another thing I noticed is that during normal operation the wsrep threads ("system user" in the processlist) are pretty much always showing "committed <id>". Is this an indication that we issue more writes than can be committed? Do I need to increase wsrep_slave_threads in that case?
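To get a feel for whether the appliers are actually falling behind, I'm planning to keep an eye on the receive queue and flow control counters, roughly:

SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_recv_queue', 'wsrep_local_recv_queue_avg',
   'wsrep_flow_control_paused', 'wsrep_cert_deps_distance');

My understanding is that "committed <id>" just shows the last write set each applier committed, so seeing it constantly is not by itself a sign of a backlog; a steadily non-zero wsrep_local_recv_queue_avg or a climbing wsrep_flow_control_paused would be.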

Cheers,
Niels