We have a 5 node galera cluster all running the following versions, and has been this version for 5 months:
MariaDB-Galera-server.x86_64 5.5.63-1.el6
galera.x86_64 25.3.25-1.rhel6.el6
We have been running Galera for a few years now, and this is the first time we have seen this error that caused our entire cluster to lock up.
One of the nodes in mysqld.log showed this error, and all queries started to back up on all of the nodes in the cluster. Basically the cluster was not able to commit any transactions until I stopped this one node that had the below error and the cluster resumed and was back to normal. After the cluster was normal, I started mysql back on the node that had an issue, and then brought the node back into the cluster, and everything was fine.
To sum up, this one node seemed to be having issues and caused the entire cluster to freeze. Here is the log of what happened at the time on this one node. Does anyone have any insight what might have caused this? We are researching to determine the root cause to prevent it from happening again.
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
WSREP: BF lock wait long
=====================================
190703 12:06:33 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 31 seconds
BACKGROUND THREAD
srv_master_thread loops: 14191917 1_second, 14189665 sleeps, 1419108 10_second, 1042 background, 1040 flush
srv_master_thread log flush and writes: 11054372
SEMAPHORES
OS WAIT ARRAY INFO: reservation count 6077701, signal count 374120954
Mutex spin waits 757196958, rounds 547825370, OS waits 2315758
RW-shared spins 56593013, rounds 203428691, OS waits 2674175
RW-excl spins 17202237, rounds 562381137, OS waits 960110
Spin rounds per wait: 0.72 mutex, 3.59 RW-shared, 32.69 RW-excl
LATEST FOREIGN KEY ERROR
190507 17:38:07 Transaction:
TRANSACTION 7735D2F61, ACTIVE 0 sec inserting, thread declared inside InnoDB 500
TRANSACTION 7735D2F61, ACTIVE 0 sec inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 104878631, OS thread handle 0x7f5ec64dc700, query id 1326995915 ipaddress username update
INSERT INTO table_name (column1, column2, column3, column4, column5, column6, column7, column8) VALUES (xxxxxx, xxxxx, xxxxxxx, x, x, x, NOW(), 'xxxxxxxxxx')
Foreign key constraint fails for table table_name:
,
CONSTRAINT table FOREIGN KEY (column1) REFERENCES tablename (column1) ON DELETE NO ACTION ON UPDATE NO ACTION
Trying to add in child table, in index column tuple:
DATA TUPLE: 3 fields;
0: len 4; hex 80010af7; asc ;;
1: len 4; hex 800053d2; asc S ;;
2: len 1; hex 80; asc ;;
But in parent table tablename, in index PRIMARY,
the closest match we can find is record:
PHYSICAL RECORD: n_fields 16; compact format; info bits 0
0: len 4; hex 80010afa; asc ;;
1: len 6; hex 00076e2767da; asc n'g ;;
2: len 7; hex 85006b00390110; asc k 9 ;;
3: len 30; hex 352f332f313920434d583a20537461722057617273204d61792074686520; asc 5/3/19 name ; (total 45 bytes);
4: SQL NULL;
5: len 4; hex 80000002; asc ;;
6: len 4; hex 4e6f6e65; asc None;;
7: len 6; hex 63656e746572; asc center;;
8: len 0; hex ; asc ;;
9: SQL NULL;
10: SQL NULL;
11: len 8; hex 8000125cf7b59c99; asc \ ;;
12: len 4; hex 5ccb7156; asc \ qV;;
13: len 4; hex 800554a7; asc T ;;
14: len 4; hex 80000001; asc ;;
15: len 4; hex 80000000; asc ;;
FILE I/O
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
46261180 OS file reads, 916766780 OS file writes, 576272661 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
INSERT BUFFER AND ADAPTIVE HASH INDEX
Ibuf: size 1, free list len 5, seg size 7, 5602494 merges
merged operations:
insert 11301289, delete mark 1701670, delete 408068
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 382495843, node heap has 795110 buffer(s)
0.00 hash searches/s, 1.81 non-hash searches/s
LOG
Log sequence number 7096693912410
Log flushed up to 7096693912410
Last checkpoint at 7096693912410
Max checkpoint age 5894923
Checkpoint age target 5710707
Modified age 0
Checkpoint age 0
0 pending log writes, 0 pending chkp writes
555402869 log i/o's done, 0.00 log i/o's/second
BUFFER POOL AND MEMORY
Total memory allocated 198369607680; in additional pool allocated 0
Total memory allocated by read views 10944
Internal hash tables (constant factor + variable factor)
Adaptive hash index 16087053200 (3059966744 + 13027086456)
Page hash 191248744 (buffer pool 0 only)
Dictionary cache 766901531 (764993104 + 1908427)
File system 166480 (82672 + 83808)
Lock system 478263432 (478120568 + 142864)
Recovery system 0 (0 + 0)
Dictionary memory allocated 1908427
Buffer pool size 11796479
Buffer pool size, bytes 193273511936
Free buffers 34
Database pages 11001335
Old database pages 4061019
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 151504888, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 46261099, created 3021519, written 351793841
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 11001335, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
ROW OPERATIONS
32 queries inside InnoDB, 321 queries in queue
1 read views open inside InnoDB
34 transactions active inside InnoDB
34 out of 1000 descriptors used
---OLDEST VIEW---
Normal read view
Read view low limit trx n:o 7AA63D947
Read view up limit trx id 7AA63D129
Read view low limit trx id 7AA63D947
LATEST DETECTED DEADLOCK
190609 18:09:10
*** (1) TRANSACTION:
TRANSACTION 79425C174, ACTIVE 0 sec starting index read
mysql tables in use 3, locked 3
LOCK WAIT 13 lock struct(s), heap size 3112, 21 row lock(s)
MySQL thread id 152298777, OS thread handle 0x7f5ec5edf700, query id 1910369064 ipaddress username Updating
UPDATE table_name SET column1 = xx, column2 = 'xxxxxxxxx, column3 = 'xxxxxxxx' WHERE column4='xxxxxxx' AND column IN('xxxxxxx')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 318 page no 5056896 n bits 528 index name of table table_name trx id 79425C174 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 79425C16C, ACTIVE 0 sec fetching rows, thread declared inside InnoDB 499
mysql tables in use 3, locked 3
61 lock struct(s), heap size 14776, 2794 row lock(s)
MySQL thread id 152298776, OS thread handle 0x7f5eb57c4700, query id 1910369068 ipaddress username Updating
UPDATE table_name SET column1 = xx, column2 = 'xxxxxxxxx', column3 = 'xxxxxxxxx' WHERE column4 ='xxxxxxx' AND column IN('xxxxxxxxxx')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 318 page no 5056896 n bits 528 index name of table table_name trx id 79425C16C lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 318 page no 6257198 n bits 360 index PRIMARY of table table_name trx id 79425C16C lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)