Running pt-online-schema-change on a busy production DB, hit by whole lotta deadlocks

1,216 views
Skip to first unread message

Yoni Levy

unread,
Jun 24, 2014, 10:54:16 AM6/24/14
to percona-d...@googlegroups.com
Hi,

TL;DR - pt-online-schema-change deadlocks like crazy - am i doing something wrong?

It's my first time using the Percona Toolkit so bear with me.

I'm attempting to alter a MySQL table (InnoDB engine) with ~30MM rows, that's being written to and read from quite actively.
From what I understand, pt-online-schema-change should be able to handle that nicely, but that doesn't seem to be the case.

For the record, here's what I ran:
pt-online-schema-change --execute --alter "ADD COLUMN xxx TINYINT(4) DEFAULT NULL" D=db,t=tbl

Immediately following that, our servers started complaining about deadlocked queries, and consequently many user requests (directly hitting that table) failed.
I had to Ctrl+C the pt-online-schema-change in order for the system to be functional again.

Here's a sample deadlock:
------------------------
LATEST DETECTED DEADLOCK
------------------------
140624 15:38:54
*** (1) TRANSACTION:
TRANSACTION C092E7AC, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1248, 1 row lock(s), undo log entries 2
MySQL thread id 1190466, OS thread handle 0x7f3273731700, query id 10521593868 ip-10-123-10-133.ec2.internal 10.123.10.133 admin update
REPLACE INTO `db`.`_tbl_new` (`stuff`) VALUES (NEW.`stuff`)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `db`.`_tbl_new` trx id C092E7AC lock mode AUTO-INC waiting
*** (2) TRANSACTION:
TRANSACTION C092E66D, ACTIVE 1 sec inserting
mysql tables in use 2, locked 2
12 lock struct(s), heap size 3112, 7 row lock(s), undo log entries 15
MySQL thread id 1190115, OS thread handle 0x7f327895a700, query id 10521592449 ip-10-6-126-45.ec2.internal 10.6.126.45 admin update
REPLACE INTO `db`.`_tbl_new` (`stuff`) VALUES (NEW.`stuff`)
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `db`.`_tbl_new` trx id C092E66D lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 277 page no 229622 n bits 336 index `id_auto_inc` of table `db`.`_tbl_new` trx id C092E66D lock_mode X waiting
Record lock, heap no 270 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 800000000cde28dc; asc       ( ;;
 1: len 8; hex 800000000006eb25; asc        %;;

MySQL version: 5.5.31
Percona Toolkit version: 2.2.8

Is there some configuration I'm missing or something else to help prevent those deadlocks? Is this a known issue?

Any help would be greatly appreciated, I really want to alter that table.
(╯°□°)╯︵ ┻━┻

Thanks,
Yoni

Bill Karwin

unread,
Jun 24, 2014, 11:57:28 AM6/24/14
to percona-d...@googlegroups.com
Yes, this is a known cause of deadlocks, but it is not specific to pt-online-schema-change. 

I would guess that your table has a secondary UNIQUE KEY in addition to the auto-increment primary key. What’s going on in the above deadlock is:

  1. Transaction 2 acquires its AUTO-INC lock, but hasn’t yet acquired the lock on the UNIQUE KEY in the table.
  2. Transaction 1 tries to acquire its AUTO-INC lock, but it’s already in use, so transaction 1 waits on it. It acquires an "intention lock" to indicate that it’s waiting.  
  3. Transaction 2 tries to acquire a exclusive lock on the UNIQUE KEY, but can’t acquire it because another session has an intention lock. Exclusive locks must be truly exclusive, no lock of any other type may be held by another session.

Thus both sessions are waiting on each other, and cause a deadlock. MySQL always has this risk of deadlocks on auto-inc locks, but normally it’s rare. I guess the chance of the race condition happening is increased while pt-online-schema-change is inserting rows as fast as it can.

I think you can work around this by using SET GLOBAL innodb_autoinc_lock_mode = 0; This will make MySQL revert to auto-inc locking behavior it used in the MySQL 5.0 time frame, which holds a table-level lock auto-inc lock for the full duration of the INSERT. See http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html for details on that.

That should avoid the risk deadlock, even though it limits concurrency temporarily. Transaction 2 will acquire the table lock. Transaction 1 will wait on the table lock, and not acquire an insert intention lock. Transaction 2 can finish its work.

You can set innodb_autoinc_lock_mode back to the default value of 1 (or other value you had been using) after your pt-online-schema-change is complete.

Any help would be greatly appreciated, I really want to alter that table.
(╯°□°)╯︵ ┻━┻

Unfortunately, pt-online-schema-change does not currently support the FLIP TABLE statement.
 ┬──┬ ノ(°—°ノ)


--
Bill Karwin
Senior Knowledge Manager

Yoni Levy

unread,
Jun 26, 2014, 10:18:31 AM6/26/14
to percona-d...@googlegroups.com
Thanks so much for look into this Bill!

Unfortunately, I've tried re-running with innodb_autoinc_lock_mode = 0, and experienced the same exact behaviour, here's a fresh deadlock in case I missed something:
------------------------
LATEST DETECTED DEADLOCK
------------------------
140626 16:01:16
*** (1) TRANSACTION:
TRANSACTION C3127E01, ACTIVE 0 sec inserting
mysql tables in use 2, locked 2
LOCK WAIT 10 lock struct(s), heap size 3112, 15 row lock(s), undo log entries 19
MySQL thread id 87, OS thread handle 0x7f6d2103e700, query id 2824608 ip-10-6-126-45.ec2.internal 10.6.126.45 admin update
REPLACE INTO `db`.`_tbl_new` (`stuff`) VALUES (NEW.`stuff`)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 281 page no 3129 n bits 400 index `id_auto_inc` of table `db`.`_tbl_new` trx id C3127E01 lock_mode X waiting
Record lock, heap no 325 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 800000000d05c9cc; asc         ;;
 1: len 8; hex 80000000000590e3; asc         ;;
*** (2) TRANSACTION:
TRANSACTION C3127DFA, ACTIVE 0 sec setting auto-inc lock
mysql tables in use 2, locked 2
5 lock struct(s), heap size 1248, 1 row lock(s), undo log entries 10
MySQL thread id 155, OS thread handle 0x7f6d20e23700, query id 2824746 ip-10-6-126-45.ec2.internal 10.6.126.45 admin update
REPLACE INTO `db`.`_tbl_new` (`stuff`) VALUES (NEW.`stuff`)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 281 page no 3129 n bits 400 index `id_auto_inc` of table `db`.`_tbl_new` trx id C3127DFA lock_mode X locks rec but not gap
Record lock, heap no 325 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 800000000d05c9cc; asc         ;;
 1: len 8; hex 80000000000590e3; asc         ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `db`.`_tbl_new` trx id C3127DFA lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)

Any ideas?

Thanks
Reply all
Reply to author
Forward
0 new messages