Galera: whole cluster locked due to a deadlock inside InnoDB on one node

Ilias Bertsimas

Oct 11, 2012, 10:56:35 AM

to codersh...@googlegroups.com

Hello,

I just hit a really weird lock on one node due to a deadlock, and it held back the whole cluster, making it unavailable to serve queries.

This is what I got in the log:

WSREP: BF lock wait long

and some extra InnoDB deadlock monitor output:

------------------------
LATEST DETECTED DEADLOCK
------------------------
121011 14:19:36
*** (1) TRANSACTION:
TRANSACTION 817CA1D73, ACTIVE 0 sec inserting
mysql tables in use 201, locked 201
LOCK WAIT 6 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 2
MySQL thread id 6060764, OS thread handle 0x7fbbd0119700, query id 2791554675 192.168.0.2 user update
INSERT INTO clipping_resume (idReport, flag, cnt) VALUES (NEW.idReport, NEW.flag, 1)
ON DUPLICATE KEY UPDATE cnt=cnt+1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 364 page no 14 n bits 408 index `PRIMARY` of table `maindb`.`clipping_resume` trx id 817CA1D73 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 817CA1D6F, ACTIVE 0 sec fetching rows, thread declared inside InnoDB 497
mysql tables in use 201, locked 201
32 lock struct(s), heap size 6960, 192 row lock(s), undo log entries 190
MySQL thread id 6025190, OS thread handle 0x7fb6a40d1700, query id 2791554664 192.168.0.2 user
DELETE FROM clipping where idReport = 3593 AND flag in ('S') LIMIT 100
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 364 page no 14 n bits 408 index `PRIMARY` of table `maindb`.`clipping_resume` trx id 817CA1D6F lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 212 page no 34892 n bits 600 index `flag` of table `maindb`.`clipping` /* Partition `p198` */ trx id 817CA1D6F lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)
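
As a reading aid for reports like the one above, here is a minimal Python sketch that pulls out which transaction holds or waits for which lock, and which one was rolled back. The function name `summarize_deadlock` and the regular expressions are illustrative assumptions based only on the lines shown in this report; this is not a complete parser for InnoDB monitor output.

```python
import re

# Lines trimmed from the deadlock report above, used as sample input.
REPORT = """\
*** (1) TRANSACTION:
TRANSACTION 817CA1D73, ACTIVE 0 sec inserting
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 364 page no 14 n bits 408 index `PRIMARY` of table `maindb`.`clipping_resume` trx id 817CA1D73 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 817CA1D6F, ACTIVE 0 sec fetching rows, thread declared inside InnoDB 497
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 364 page no 14 n bits 408 index `PRIMARY` of table `maindb`.`clipping_resume` trx id 817CA1D6F lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 212 page no 34892 n bits 600 index `flag` of table `maindb`.`clipping` /* Partition `p198` */ trx id 817CA1D6F lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)
"""

# Header lines such as "*** (2) HOLDS THE LOCK(S):" open a per-transaction section.
SECTION_RE = re.compile(
    r"^\*\*\* \((\d+)\) "
    r"(TRANSACTION|WAITING FOR THIS LOCK TO BE GRANTED|HOLDS THE LOCK\(S\)):"
)
# Index, schema and table names are backtick-quoted in RECORD LOCKS lines.
LOCK_RE = re.compile(r"index `([^`]+)` of table `([^`]+)`\.`([^`]+)`")
VICTIM_RE = re.compile(r"^\*\*\* WE ROLL BACK TRANSACTION \((\d+)\)")


def summarize_deadlock(text):
    """Return held locks, awaited locks and the rolled-back transaction number."""
    summary = {"holds": {}, "waits": {}, "victim": None}
    section = None  # ("holds" | "waits", transaction number) or None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate double-spaced copies of the report
        victim = VICTIM_RE.match(line)
        if victim:
            summary["victim"] = int(victim.group(1))
            section = None
            continue
        header = SECTION_RE.match(line)
        if header:
            txn = int(header.group(1))
            kind = header.group(2)
            if kind.startswith("HOLDS"):
                section = ("holds", txn)
            elif kind.startswith("WAITING"):
                section = ("waits", txn)
            else:
                section = None  # plain TRANSACTION section carries no lock lines
            continue
        lock = LOCK_RE.search(line)
        if section and lock and line.startswith("RECORD LOCKS"):
            key, txn = section
            index, schema, table = lock.groups()
            summary[key].setdefault(txn, []).append(f"{schema}.{table} ({index})")
    return summary


summary = summarize_deadlock(REPORT)
for key in ("holds", "waits"):
    for txn, locks in sorted(summary[key].items()):
        print(f"transaction {txn} {key}: {', '.join(locks)}")
print(f"rolled back: transaction {summary['victim']}")
```

For this report, the summary shows transaction 2 holding the `PRIMARY` record lock on `maindb.clipping_resume` that transaction 1 is waiting for, while transaction 2 itself waits on the `flag` index of `maindb.clipping`.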

Is there a way to avoid this with out-of-order commits, or any way to time out a node and make it leave the cluster so the others can continue?

Thanks!

Alex Yurchenko

Oct 11, 2012, 11:17:50 AM

to codersh...@googlegroups.com

Just to clarify:

Did the deadlock resolve automatically?

Did the cluster resume operation after it?

In other words, is the problem that the cluster was stuck for the
duration of the deadlock?

Thanks,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Ilias Bertsimas

Oct 11, 2012, 11:24:15 AM

to codersh...@googlegroups.com

Hi,

No, it did not resume automatically. It was locked for a long time, with the last message being:

WSREP: BF lock wait long

I had to take down the node with the deadlocks to allow the rest of the cluster to continue.

seppo....@codership.com

Oct 11, 2012, 11:27:23 AM

to codersh...@googlegroups.com

...and one more question: did the cluster hang at the same time that
was tagged in the deadlock report, 121011 14:19:36?

-seppo

Ilias Bertsimas

Oct 11, 2012, 11:41:18 AM

to codersh...@googlegroups.com

Hi Seppo,

The only thing I can say comes from the Nagios log, which indicated the first unavailability at 10-11-2012 15:57:22. So yes, you are right: it seems the deadlocks timed out a few times, but the backlog eventually made that impossible.

Thanks!

Ilias Bertsimas

Oct 11, 2012, 12:19:14 PM

to codersh...@googlegroups.com

It seems this deadlock is related to multi-master and a trigger we have on that table, which is the one shown holding the lock in the deadlock monitor output.

Are there any known issues with Galera and MySQL triggers?

seppo....@codership.com

Oct 12, 2012, 3:33:30 AM

to codersh...@googlegroups.com

There are no known trigger-related issues at the moment. Can you send your
trigger code and related table definitions? I could try to reproduce
the issue with your test scenario. My email is:
seppo....@codership.com

-seppo
