Strange “split-brain”-like behavior on m-m-m Percona


A N

Sep 16, 2016, 11:39:55 AM
to codership

Good day. We run Percona XtraDB Cluster 5.6.30-76.3-56-log (wsrep 25.16) as a three-node multi-master cluster, serving PHP applications through HAProxy.

We had an incident: we were unable to stop two of the three nodes, so we bootstrapped the cluster from the node that held the current data and then rejoined the other two nodes one after the other. We checked the cluster status and the standard parameters; all values looked right, so we let the application back in. Instead of normal operation, we got something resembling a split-brain condition: one of the two joiners produced a mass of duplicate primary key and deadlock errors. After investigating, we removed the faulty node from application serving.
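For reference, the kind of status check described above can be sketched roughly as follows. This is a minimal sketch, not a record of what was actually run: the variable names are real Galera status variables (as returned by SHOW GLOBAL STATUS LIKE 'wsrep_%'), but the choice of which ones count as "standard", the helper name check_node, and the sample values are assumptions made for illustration.

```python
# Expected values for a node that looks healthy. The names are real Galera
# status variables; treating exactly this set as "standard" is an assumption.
EXPECTED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_connected": "ON",
    "wsrep_ready": "ON",
}


def check_node(status, expected_size=3):
    """Return a list of problems found in one node's wsrep status.

    `status` is a dict of variable name -> value, e.g. built from the rows
    of SHOW GLOBAL STATUS LIKE 'wsrep_%'. An empty list means every check
    in this sketch passed; it does not prove the data sets are identical.
    """
    problems = []
    for name, want in EXPECTED.items():
        got = status.get(name)
        if got != want:
            problems.append(f"{name}: expected {want!r}, got {got!r}")
    size = int(status.get("wsrep_cluster_size", 0))
    if size != expected_size:
        problems.append(f"wsrep_cluster_size: expected {expected_size}, got {size}")
    return problems


# Hypothetical sample values for one node.
sample = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_connected": "ON",
    "wsrep_ready": "ON",
    "wsrep_cluster_size": "3",
}
print(check_node(sample))
```

Note that a node can pass all of these checks and still hold diverged data, which is what appears to have happened here.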

Somewhat strangely, wsrep_local_recv_queue_avg on the faulty node was very high compared with the other joiner, where it started at 1 and then slowly decreased to 0.4. On the faulty node it was 10, then 5, and then 3 for a long time. After we stopped serving the application from that node, it dropped to 0.5 and then 0.25. On the other two nodes it was around 0.4.
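The comparison above could be automated along these lines. This is only a sketch: the node names, the helper flag_lagging_nodes, and the threshold of 2.0 are hypothetical, and the sample values are the early readings quoted above. Keep in mind that wsrep_local_recv_queue_avg is an average since the last FLUSH STATUS, so it reacts slowly unless the counters are reset between samples.

```python
def flag_lagging_nodes(recv_queue_avg, threshold=2.0):
    """Return the names of nodes whose average receive queue exceeds threshold.

    `recv_queue_avg` maps a node name to its wsrep_local_recv_queue_avg value.
    The threshold is an arbitrary illustration value, not a recommended limit.
    """
    return sorted(node for node, avg in recv_queue_avg.items() if avg > threshold)


# Early readings from the incident; node names are hypothetical.
observed = {
    "bootstrap_node": 0.4,
    "joiner_ok": 1.0,
    "joiner_faulty": 10.0,
}
print(flag_lagging_nodes(observed))
```

A fixed threshold is the simplest choice; comparing each node against the median of the cluster would be an alternative when the normal load level varies.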

That leaves two major questions. First: how can we determine that the situation is completely back to normal? Second: is there any way to reduce the probability of this kind of incident, diagnose it early, and prevent it?