Galera cluster not responding

antonio falzarano

Nov 29, 2016, 4:39:53 AM

to codership

Hi,

I have a Galera cluster (MySQL 5.6.33-25.17-1precise, galera-3 25.3.18-1precise) with 5 nodes in EC2.

Today the whole cluster stopped responding because all nodes seemed to be waiting on a SELECT running on one node.
After I killed that SELECT, the whole cluster started working again.

After some time the cluster stopped responding again, and the same node was stuck, so I shut that node down and now everything seems to be working.

In the syslog of the node I shut down I find:

Nov 29 06:58:47 mysqld: 2016-11-29 06:58:47 11542 [Warning] WSREP: Failed to report last committed 1168794106, -4 (Interrupted system call)
Nov 29 07:08:41 mysqld: InnoDB: Warning: a long semaphore wait:
Nov 29 07:08:41 mysqld: --Thread 139839303853824 has waited at trx0undo.ic line 171 for 241.00 seconds the semaphore:
Nov 29 07:08:41 mysqld: X-lock (wait_ex) on RW-latch at 0x7f2e831683c0 created in file buf0buf.cc line 1069
Nov 29 07:08:41 mysqld: a writer (thread id 139839303853824) has reserved it in mode wait exclusive
Nov 29 07:08:41 mysqld: number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Nov 29 07:08:41 mysqld: Last time read locked in file buf0flu.cc line 1056
Nov 29 07:08:41 mysqld: Last time write locked in file /home/galera/mysql-wsrep-5.6.33-25.17/storage/innobase/trx/trx0rec.cc line 1295
Nov 29 07:08:41 mysqld: InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
Nov 29 07:08:41 mysqld: InnoDB: Pending preads 10, pwrites 0
Nov 29 07:08:42 mysqld:
Nov 29 07:08:42 mysqld: =====================================
Nov 29 07:08:42 mysqld: 2016-11-29 07:08:42 7f2de2a1f700 INNODB MONITOR OUTPUT
Nov 29 07:08:42 mysqld: =====================================
Nov 29 07:08:42 mysqld: Per second averages calculated from the last 48 seconds
Nov 29 07:08:42 mysqld: -----------------
Nov 29 07:08:42 mysqld: BACKGROUND THREAD
Nov 29 07:08:42 mysqld: -----------------
Nov 29 07:08:42 mysqld: srv_master_thread loops: 5581918 srv_active, 0 srv_shutdown, 294174 srv_idle
Nov 29 07:08:42 mysqld: srv_master_thread log flush and writes: 5876004
Nov 29 07:08:42 mysqld: ----------
Nov 29 07:08:42 mysqld: SEMAPHORES
Nov 29 07:08:42 mysqld: ----------
Nov 29 07:08:42 mysqld: OS WAIT ARRAY INFO: reservation count 235483700
Nov 29 07:08:42 mysqld: --Thread 139839303853824 has waited at trx0undo.ic line 171 for 242.00 seconds the semaphore:
Nov 29 07:08:42 mysqld: X-lock (wait_ex) on RW-latch at 0x7f2e831683c0 created in file buf0buf.cc line 1069
Nov 29 07:08:42 mysqld: a writer (thread id 139839303853824) has reserved it in mode wait exclusive
Nov 29 07:08:42 mysqld: number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Nov 29 07:08:42 mysqld: Last time read locked in file buf0flu.cc line 1056
Nov 29 07:08:42 mysqld: Last time write locked in file /home/galera/mysql-wsrep-5.6.33-25.17/storage/innobase/trx/trx0rec.cc line 1295
Nov 29 07:08:42 mysqld: OS WAIT ARRAY INFO: signal count 640964419
Nov 29 07:08:42 mysqld: Mutex spin waits 454540392, rounds 9329786601, OS waits 157704170
Nov 29 07:08:42 mysqld: RW-shared spins 154743432, rounds 2048028474, OS waits 45124355
Nov 29 07:08:42 mysqld: RW-excl spins 40505202, rounds 1513121851, OS waits 23773635
Nov 29 07:08:42 mysqld: Spin rounds per wait: 20.53 mutex, 13.23 RW-shared, 37.36 RW-excl
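For longer logs, the "long semaphore wait" entries can be summarised with a small script. This is only a sketch: it assumes the syslog line format shown above, and the names `WAIT_RE` and `longest_waits` are made up for illustration.

```python
import re

# Matches lines such as:
#   "--Thread 139839303853824 has waited at trx0undo.ic line 171 for 241.00 seconds the semaphore:"
WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) line (?P<line>\d+) "
    r"for (?P<seconds>\d+(?:\.\d+)?) seconds the semaphore"
)


def longest_waits(log_lines):
    """Return {(thread_id, file, line): longest wait in seconds} for each semaphore wait seen."""
    waits = {}
    for text in log_lines:
        m = WAIT_RE.search(text)
        if m is None:
            continue  # not a semaphore-wait line
        key = (int(m.group("thread")), m.group("file"), int(m.group("line")))
        waits[key] = max(waits.get(key, 0.0), float(m.group("seconds")))
    return waits


# Three lines copied from the syslog excerpt above.
sample = [
    "Nov 29 07:08:41 mysqld: --Thread 139839303853824 has waited at trx0undo.ic line 171 for 241.00 seconds the semaphore:",
    "Nov 29 07:08:41 mysqld: X-lock (wait_ex) on RW-latch at 0x7f2e831683c0 created in file buf0buf.cc line 1069",
    "Nov 29 07:08:42 mysqld: --Thread 139839303853824 has waited at trx0undo.ic line 171 for 242.00 seconds the semaphore:",
]

for (thread, src, line), seconds in longest_waits(sample).items():
    print(f"thread {thread} at {src}:{line} waited up to {seconds:.0f}s")
```

In practice the lines would come from the syslog file itself, for example `longest_waits(open("/var/log/syslog"))`; that path is an assumption about the host.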

I have left out the transactions section of the monitor output, but I can attach it if it is useful.

antonio falzarano

Nov 29, 2016, 4:43:40 AM

to codership

I forgot to mention that the cluster is set up to write to only one node through haproxy; the other 4 receive only SELECTs from the application.

The node I shut down was one of the reader nodes.
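One possible explanation for a single reader stalling the writer is Galera flow control: a node whose receive queue grows too long asks the rest of the cluster to pause replication. This is an assumption, not something the logs above prove. A rough sketch for spotting such a node, given the `wsrep_local_recv_queue` and `wsrep_flow_control_sent` values read from `SHOW GLOBAL STATUS LIKE 'wsrep_%'` on each node, might look like this. The node names, sample values, and the threshold of 100 are hypothetical.

```python
def lagging_nodes(status_by_node, max_recv_queue=100):
    """Return (node, recv_queue, flow_control_sent) for nodes with a long receive queue.

    status_by_node maps a node name to the wsrep status values read on that node
    (SHOW GLOBAL STATUS returns them as strings). The threshold is an illustrative
    assumption, not a recommended setting.
    """
    rows = []
    for node, status in status_by_node.items():
        recv_queue = int(status.get("wsrep_local_recv_queue", 0))
        fc_sent = int(status.get("wsrep_flow_control_sent", 0))
        if recv_queue > max_recv_queue:
            rows.append((node, recv_queue, fc_sent))
    # Longest queue first: the most likely candidate for holding the cluster back.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical values, only to show the expected input shape.
status = {
    "writer": {"wsrep_local_recv_queue": "0", "wsrep_flow_control_sent": "0"},
    "reader-1": {"wsrep_local_recv_queue": "3", "wsrep_flow_control_sent": "0"},
    "reader-2": {"wsrep_local_recv_queue": "4812", "wsrep_flow_control_sent": "37"},
}

for node, queue, sent in lagging_nodes(status):
    print(f"{node}: recv queue {queue}, flow control messages sent {sent}")
```

A node listed here would be worth inspecting first during the next stall, before killing queries or shutting anything down.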

James Wang

Nov 29, 2016, 4:59:21 AM

to codership

Which system do you use, please? PXC or MariaDB?

This happened to me as well. I cannot explain why.

Suggestion: set up your slow node (e.g. reporting, backup) as a replication slave of the cluster instead of a full cluster member.

antonio falzarano

Nov 29, 2016, 5:08:46 AM

to James Wang, codership

I use the Codership version of MySQL Galera (http://galeracluster.com), not Percona or MariaDB.

antonio falzarano

Nov 29, 2016, 6:21:54 AM

to codership

I found that both times the cluster was unresponsive, the node logged this warning:

[Warning] WSREP: Failed to report last committed , -4 (Interrupted system call)

I found this bug: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1434646

Is there a connection with this bug? Does anyone have experience with the problem I described?
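To check whether every stall lines up with this warning, its occurrences can be pulled out of syslog together with their timestamps. This is a minimal sketch that assumes the syslog prefix format seen in the excerpt earlier in the thread; `WARN_RE` and `report_failures` are illustrative names.

```python
import re

# Syslog prefix ("Nov 29 06:58:47") followed later on the line by the WSREP warning.
# The sequence number is optional because it may be omitted when the line is quoted.
WARN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*"
    r"WSREP: Failed to report last committed (?P<seqno>\d+)?\s*, (?P<code>-?\d+)"
)


def report_failures(log_lines):
    """Return (syslog timestamp, seqno or None, error code) for each matching warning."""
    hits = []
    for text in log_lines:
        m = WARN_RE.search(text)
        if m:
            seqno = int(m.group("seqno")) if m.group("seqno") else None
            hits.append((m.group("stamp"), seqno, int(m.group("code"))))
    return hits


# Two lines copied from the syslog excerpt earlier in the thread.
sample_log = [
    "Nov 29 06:58:47 mysqld: 2016-11-29 06:58:47 11542 [Warning] WSREP: Failed to report last committed 1168794106, -4 (Interrupted system call)",
    "Nov 29 07:08:41 mysqld: InnoDB: Warning: a long semaphore wait:",
]

for stamp, seqno, code in report_failures(sample_log):
    print(stamp, seqno, code)
```

Comparing the returned timestamps with the times the application stopped responding would show whether the warning precedes each outage or only appears afterwards.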

Thanks to all.

antonio falzarano

Dec 15, 2016, 12:09:51 PM

to codership

Can anyone from the Galera Cluster team answer this?

Thank you
Antonio

On Tuesday, November 29, 2016 at 10:39:53 UTC+1, antonio falzarano wrote:
