Galera cluster / WSREP: failed to open gcomm backend connection


konrad.zie...@gmail.com

Nov 13, 2016, 8:37:40 AM
to codership
Hi,

CentOS 7.2 / MariaDB 10.1.19 + SST xtrabackup-v2 from Percona, test cluster of 3 nodes.

I'm facing a problem when trying to recover the cluster after a crash (e.g. all 3 nodes crashed), or when 1 node crashed and "wsrep_last_committed" then changed on the other two nodes:

1) I'm unable to start the 1st node (galera recover and/or galera new cluster) due to the following error message:

Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: failed to open gcomm backend connection: 131: invalid UUID: 00000000 (FATAL)
Nov 13 12:37:09 galera3 mysqld: at gcomm/src/pc.cpp:PC():271
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -131 (State not recoverable)
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1380: Failed to open channel 'elo' at 'gcomm://192.168.124.16,192.168.124.34,192.168.124.121': -131 (State not recoverable)
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: gcs connect failed: State not recoverable
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] WSREP: wsrep::connect(gcomm://192.168.124.16,192.168.124.34,192.168.124.121) failed: 7
Nov 13 12:37:09 galera3 mysqld: 2016-11-13 12:37:09 140349567105216 [ERROR] Aborting

See full logs here -> http://pastebin.com/NycUa6yN

*** I found that the only solution is to remove the /var/lib/mysql/gvwstate.dat file; after that I can recover/start the cluster as normal. ***

2) This also happens when 1 node crashes (the other 2 are still up) and "wsrep_last_committed" has changed on the other 2 nodes.
I can't run recover/new cluster/whatever as long as the file /var/lib/mysql/gvwstate.dat exists.
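
The workaround above (getting gvwstate.dat out of the way before recovery) can be sketched in Python. This is a minimal sketch, not a verified procedure: it assumes the file contains a `my_uuid: <uuid>` line (the thread does not show the file's contents), it renames the file instead of deleting it so it can be restored, and the helper names are made up for illustration. It should only be run while mysqld is stopped on that node.

```python
import os
import time


def read_my_uuid(path):
    """Return the value of the 'my_uuid:' line, or None if there is none.

    Assumption: gvwstate.dat stores the node UUID as 'my_uuid: <uuid>'.
    """
    with open(path, encoding="ascii", errors="replace") as fh:
        for line in fh:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "my_uuid":
                return value.strip()
    return None


def is_nil_uuid(uuid):
    """True if the UUID is missing or consists only of zero digits."""
    if not uuid:
        return True
    return set(uuid.replace("-", "")) <= {"0"}


def set_aside_if_nil(path):
    """Rename the file to a timestamped backup if its UUID is nil.

    Returns the backup path, or None if the file was left in place.
    """
    if not os.path.exists(path) or not is_nil_uuid(read_my_uuid(path)):
        return None
    backup = "%s.%d.bak" % (path, int(time.time()))
    os.replace(path, backup)
    return backup
```

With mysqld stopped, calling `set_aside_if_nil("/var/lib/mysql/gvwstate.dat")` would move the file aside only when its UUID looks invalid. A file with a valid UUID is left untouched, so a healthy node's saved view is not discarded.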

See my config (/etc/my.cnf) for reference: http://pastebin.com/0eiWSjVa

Thanks,
Konrad


Doug Whitfield

Aug 3, 2021, 1:40:33 PM
to codership
On Sunday, November 13, 2016 at 7:37:40 AM UTC-6 konrad.zie...@gmail.com wrote:
then I can recover/start cluster as normal. ***

I deleted the file, but what steps did you then take to recover/start the cluster? I recognize this is from ~5 years ago, but I can't find instructions anywhere that work.

Thanks!

Colin Charles

Aug 18, 2021, 2:11:50 AM
to konrad.zie...@gmail.com, codership


> On 13 Nov 2016, at 21:11, konrad.zie...@gmail.com wrote:
>
> *** I found that the only one solution is to remove /var/lib/mysql/gvwstate.dat file and then I can recover/start cluster as normal. ***
>
> 2) Also this is happening when 1 node crashes (other 2 are still up) and "wsrep_last_committed" has changed on other 2 nodes.
> Can't run recover/new cluster/whatever as long as file var/lib/mysql/gvwstate.dat exists.
>
> See my config (/etc/my.cnf) for reference: http://pastebin.com/0eiWSjVa


Note that you’re using MariaDB Server 10.1.19, which is no longer supported and likely has bugs that were fixed in later releases.

Later releases, e.g. 10.2, include pc.recovery=on, which restores the primary component from the state stored in gvwstate.dat.

see: https://mariadb.com/docs/reference/mdb/wsrep_provider_options/pc.recovery/ and https://galeracluster.com/library/documentation/pc-recovery.html
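
To check whether pc.recovery is actually enabled on a node, one option is to look at the value of the `wsrep_provider_options` variable. The Python sketch below parses such a string. It assumes the usual `key = value; key = value` layout, and the sample string in the usage note is hypothetical, not taken from this cluster. The set of accepted "true" spellings is also an assumption.

```python
def parse_provider_options(raw):
    """Split a 'key = value; key = value' string into a dict."""
    options = {}
    for item in raw.split(";"):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            options[key.strip()] = value.strip()
    return options


def pc_recovery_enabled(raw):
    """Return True or False for pc.recovery, or None if it is absent."""
    value = parse_provider_options(raw).get("pc.recovery")
    if value is None:
        return None
    # Assumption: these spellings all mean "enabled".
    return value.lower() in ("true", "yes", "on", "1")
```

For example, `pc_recovery_enabled("base_dir = /var/lib/mysql/; pc.recovery = true; pc.weight = 1")` returns `True`, and a string without the option returns `None`.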

Your config is fine; it's just that you’re using a rather old version of MariaDB Server. I would recommend an upgrade.
--
Colin Charles, http://bytebot.net/blog/
twitter: @bytebot | skype: colincharles
"First they ignore you, then they laugh at you, then they fight you, then you win." -- Mohandas Gandhi
