Hi,
CentOS 7.2 / MariaDB 10.1.19 + SST xtrabackup-v2 from Percona, test cluster of 3 nodes.
I'm facing a problem when trying to recover the cluster after a crash (e.g. all 3 nodes crashed), or when 1 node crashed and "wsrep_last_committed" has since changed on the other two nodes:
1) I'm unable to start the 1st node (galera recover and/or galera new cluster) due to the following error message:
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: failed to open gcomm backend connection: 131: invalid UUID: 00000000 (FATAL)
Nov 13 12:37:09 galera3 mysqld: at gcomm/src/pc.cpp:PC():271
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -131 (State not recoverable)
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1380: Failed to open channel 'elo' at 'gcomm://192.168.124.16,192.168.124.34,192.168.124.121': -131 (State not recoverable)
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: gcs connect failed: State not recoverable
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: wsrep::connect(gcomm://192.168.124.16,192.168.124.34,192.168.124.121) failed: 7
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] Aborting
See full logs here ->
http://pastebin.com/NycUa6yN
*** The only solution I have found is to remove the /var/lib/mysql/gvwstate.dat file; after that I can recover/start the cluster as normal. ***
2) This also happens when 1 node crashes (the other 2 are still up) and "wsrep_last_committed" has changed on the other 2 nodes.
I can't run recover/new cluster/anything else as long as the file /var/lib/mysql/gvwstate.dat exists.
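For reference, the workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: mysqld is already stopped on the node, the data directory is /var/lib/mysql, and the helper name set_aside_gvwstate is made up for this example. It renames gvwstate.dat instead of deleting it, so the file can still be inspected afterwards.

```python
import shutil
import time
from pathlib import Path


def set_aside_gvwstate(datadir="/var/lib/mysql"):
    """Move gvwstate.dat out of the way.

    Returns the path of the renamed file, or None if there was no
    gvwstate.dat in datadir. (Hypothetical helper, not part of Galera.)
    """
    state_file = Path(datadir) / "gvwstate.dat"
    if not state_file.exists():
        return None
    # Keep a timestamped copy rather than deleting the file outright.
    backup = state_file.with_name("gvwstate.dat.bak-%d" % int(time.time()))
    shutil.move(str(state_file), str(backup))
    return backup


if __name__ == "__main__":
    # Assumption: mysqld is stopped on this node before running this.
    print(set_aside_gvwstate())
```

After the file has been moved aside, the node is started (recover / new cluster) the same way as before.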
See my config (/etc/my.cnf) for reference:
http://pastebin.com/0eiWSjVa
Thanks,
Konrad