failed to open gcomm backend connection


John Test

Feb 23, 2016, 9:39:36 AM
to codership
Hi folks,

Getting the following errors on secondary nodes and they are not joining the cluster. The first bootstrap node comes up ok.
I would very much appreciate some help.

2016-02-23 14:24:16 6919 [ERROR] WSREP: failed to open gcomm backend connection: 131: 1e6ca095 last prims not consistent (FATAL)
         at gcomm/src/pc_proto.cpp:is_prim():815
2016-02-23 14:24:16 6919 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -131 (State not recoverable)
2016-02-23 14:24:16 6919 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1379: Failed to open channel 'my_wsrep_cluster' at 'gcomm://172.18.230.71,172.18.230.72,172.21.230.71,172.21.230.72,172.27.1.161,172.18.230.73,172.21.230.73': -131 (State not recoverable)
2016-02-23 14:24:16 6919 [ERROR] WSREP: gcs connect failed: State not recoverable
2016-02-23 14:24:16 6919 [ERROR] WSREP: wsrep::connect(gcomm://172.18.230.71,172.18.230.72,172.21.230.71,172.21.230.72,172.27.1.161,172.18.230.73,172.21.230.73) failed: 7
2016-02-23 14:24:16 6919 [ERROR] Aborting



Thank you!

hunter86bg

Feb 24, 2016, 4:34:48 AM
to codership
Please check that you have the "nc" and rsync binaries. On CentOS 7 the packages are nmap-ncat and rsync, respectively.
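If either is missing, on CentOS 7 something like this should pull them in (assuming yum and the stock repositories):

yum install -y nmap-ncat rsync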
Check that the following settings are present and set up properly in my.cnf on all nodes:
wsrep_cluster_name="Your_Cluster_name"
wsrep_cluster_address="gcomm://node1,node2,node3,node4"
wsrep_node_name="your_node_name"
wsrep_node_address="your_node_dns_hostname_or_ip"
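For reference, a rough sketch of how these might sit together under [mysqld] in my.cnf; the node name, addresses and provider path below are placeholders, so adjust them to your environment:

[mysqld]
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so   # path depends on your Galera package
wsrep_cluster_name="my_wsrep_cluster"
wsrep_cluster_address="gcomm://172.18.230.71,172.18.230.72,172.18.230.73"
wsrep_node_name="node1"
wsrep_node_address="172.18.230.71"
wsrep_sst_method=rsync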

Also, check whether something is listening on TCP port 4567:
netstat -alpn | egrep '4567|4568|4444|3306'

For simple troubleshooting, stop the firewall temporarily and check whether this fixes it.
In RHEL/CentOS:
service iptables stop
or
systemctl stop firewalld
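If stopping the firewall helps, instead of leaving it off you could open just the ports Galera needs (a sketch for firewalld; adjust the zone if you are not using the default one):

firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=4444/tcp
firewall-cmd --permanent --add-port=4567/tcp
firewall-cmd --permanent --add-port=4568/tcp
firewall-cmd --reload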

Disable SELinux or AppArmor. On RHEL/CentOS:
setenforce 0
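Note that setenforce 0 only lasts until the next reboot; to make it persistent you would also switch SELINUX to permissive (or disabled) in /etc/selinux/config, for example:

sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config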

Please provide feedback once you have done these steps (try after each step), so anyone else can benefit from your experience.

hunter86bg

Feb 24, 2016, 4:38:01 AM
to codership
EDIT:
In section:
wsrep_cluster_address="gcomm://ca5-sm_db-01,ca5-sm_db-02,ca5-sm_db-03,ca5-sm_db-04"
you can add the ports on which the other cluster members are listening (if you are not using the default 4567), like:
wsrep_cluster_address="gcomm://ca5-sm_db-01:port_on_which_wsrep_listens,ca5-sm_db-02:port_on_which_wsrep_listens,ca5-sm_db-03:port_on_which_wsrep_listens,ca5-sm_db-04:port_on_which_wsrep_listens"
Do not confuse this port (4567 by default) with the port on which MySQL serves clients (3306 by default), as they are different things.

John Test

Feb 24, 2016, 3:18:34 PM
to hunter86bg, codership
Thank you for your response.

I have this
wsrep_cluster_address="gcomm://node1,node2,node3,node4"

I do not have these
wsrep_cluster_name="Your_Cluster_name"
wsrep_node_name="your_node_name"
wsrep_node_address="your_node_dns_hostname_or_ip"

The ports are all default.
Are these important to have?

Here is how the problem was fixed:
The SST files in /var/lib/mysql were owned by root:root.
I changed the ownership of everything to mysql:mysql.
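In other words, roughly this (assuming the default datadir):

chown -R mysql:mysql /var/lib/mysql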

I am guessing this happened because I had restored from an xtrabackup backup.

However, I did do this as part of the restore process, so how did the ownership end up changed on the other nodes? Not sure about this.

I also have skip-name-resolve in my.cnf and wonder if it is causing any issues.


hunter86bg

Feb 25, 2016, 8:01:11 AM
to codership, strahil...@gmail.com
You'd better define at least the wsrep_cluster_name.