WSREP: gcs connect failed: Connection timed out, on Galera with MariaDB

trupti mali

Aug 14, 2015, 2:42:06 AM
to codership

I have a two-node cluster. Both nodes are behind a firewall, using iptables. I have opened ports 3306, 4567, 4568 and 4444, as per the guideline on the official Galera site: http://galeracluster.com/documentation-webpages/firewallsettings.html

My first node's settings in my.cnf (IP address 1.2.3.4):


[mysqld]
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0

#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=galera_cluster
wsrep_cluster_address=gcomm://1.2.3.4,5.6.7.8
wsrep_sst_method=rsync
wsrep_provider_options="base_host=1.2.3.4"
wsrep_node_name=node1


My second node's settings in my.cnf (IP address 5.6.7.8):


[mysqld]
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0

#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=galera_cluster
wsrep_cluster_address=gcomm://1.2.3.4,5.6.7.8
wsrep_sst_method=rsync
wsrep_sst_method=rsync
wsrep_provider_options="base_host=5.6.7.8"
wsrep_node_name=node1


I start the first node with --wsrep-new-cluster and it starts fine. But when I start the second node, I get the error below.

I am using Galera 3.9(rXXXX) by Codership. Server version: 10.0.20-MariaDB-1~wheezy-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4144

Node 2 logs ----------

wsrep_provider_options="base_host=5.6.7.8"
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [Note] WSREP: view((empty))
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Aug 14 05:15:15 (none) mysqld: #011 at gcomm/src/pc.cpp:connect():161
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -110 (Connection timed out)
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'galera_cluster' at 'gcomm://192.168.xx.xx,192.168.xx.xx': -110 (Connection timed out)
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [ERROR] WSREP: gcs connect failed: Connection timed out
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [ERROR] WSREP: wsrep::connect() failed: 7
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [ERROR] Aborting
Aug 14 05:15:15 (none) mysqld:
Aug 14 05:15:15 (none) mysqld: 150814  5:15:15 [Note] WSREP: Service disconnected.
Aug 14 05:15:16 (none) mysqld: 150814  5:15:16 [Note] WSREP: Some threads may fail to exit.
Aug 14 05:15:16 (none) mysqld: 150814  5:15:16 [Note] /usr/sbin/mysqld: Shutdown complete.


I have been stuck on this for a day. Can someone please help?


Josten

Aug 14, 2015, 2:53:38 AM
to trupti mali, codership

Hello,

Can you confirm that all your nodes have netcat/nc installed? Can you confirm, by opening up a listener on those ports, that they're indeed open?
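
If the netcat variants on the nodes differ in their flags, the same listener test can be done with a few lines of Python. This is only a minimal sketch, assuming Python 3 is available on the node being tested; the helper names `open_listener` and `wait_for_connection` are made up for illustration, and mysqld has to be stopped first so that the ports are free.

```python
import socket


def open_listener(port, host="0.0.0.0"):
    """Bind a TCP socket to host:port and start listening on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick re-binding when the test is repeated.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock


def wait_for_connection(sock, timeout=30.0):
    """Wait for one incoming connection; return the peer IP, or None on timeout."""
    sock.settimeout(timeout)
    try:
        conn, peer = sock.accept()
    except socket.timeout:
        return None
    conn.close()
    return peer[0]


# Example, run on the node under test (one port at a time):
#   listener = open_listener(4567)
#   print(wait_for_connection(listener))  # then connect from the other node
#   listener.close()
```

If `wait_for_connection` returns None while the other node is trying to connect, the packets are most likely not getting through to this host on that port.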

trupti mali

Aug 14, 2015, 3:17:35 AM
to codership, trupt...@gmail.com

Thanks Johan,

I tried nc on node 2, which is the one having problems starting up.

Here is the netstat output:

tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      687/nc
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN      689/nc
tcp        0      0 0.0.0.0:4568            0.0.0.0:*               LISTEN      693/nc
tcp        0      0 0.0.0.0:4444            0.0.0.0:*               LISTEN      698/nc


On the primary node, here is the netstat output. There is no sign of ports 4568 and 4444 being used by mysql processes. Is this an issue?

tcp        0      0 0.0.0.0:4444            0.0.0.0:*               LISTEN      5701/nc
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      7866/mysqld
tcp        0      0 0.0.0.0:49933           0.0.0.0:*               LISTEN      3009/rpc.statd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      2976/rpcbind
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      19482/sshd
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN      7866/mysqld

Josten

Aug 14, 2015, 3:33:10 AM
to trupti mali, codership

Can you confirm that socat and pv are installed?

It's OK that you do not see these ports as they only open during SST. That's why I suggested using nc to create your own listener to confirm that there is no firewall issue.

Are these systems on the same LAN?

Can you confirm rsync works between them?

Do you have a mysql user set up on each server that you've confirmed to be working remotely from each host?

trupti mali

Aug 14, 2015, 4:13:48 AM
to codership, trupt...@gmail.com
Well, I didn't have those before, but I have installed socat and pv now. Can you suggest what traffic I need to monitor?
Yes, both nodes are on the same LAN.
About rsync: I tried transferring a file over rsync, using this syntax:

rsync -zvh testme ro...@192.168.node1ip.xxx:/home

but it gives me this error:

Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

I also tried an ssh transfer using a known user, but I get the same error there:

Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

Could this be the issue? I don't know which user is used to do rsync between the two nodes, but I have root login disabled on both nodes.

trupti mali

Aug 14, 2015, 5:39:15 AM
to codership
Another observation: I tried reinstalling rsync, and I see the following logs:

Aug 14 06:25:01 (none) rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="3324" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Aug 14 06:25:01 (none) rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="3324" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Aug 14 11:24:03 (none) rsyslogd-2177: imuxsock begins to drop messages from pid 27871 due to rate-limiting

It starts for a while and then fails again.

alexey.y...@galeracluster.com

Aug 14, 2015, 6:00:57 AM
to trupti mali, codership
1. Can you telnet from node2 to node1 on ports 4567 and 4568 while node1 is running?
2. If you have SELinux or AppArmor enabled, try disabling it and see if that helps.
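
If telnet is not installed, the connection check in point 1 can also be scripted. The following is a minimal sketch, assuming Python 3 on node2; `port_is_reachable` and `check_ports` are hypothetical helper names, and 1.2.3.4 is the placeholder address used for node1 earlier in the thread. It only tests whether a TCP connection can be established and does nothing Galera-specific.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}


# Example, run on node2 while node1 is up:
#   print(check_ports("1.2.3.4", [4567, 4568]))
```

As noted earlier in the thread, 4568 and 4444 only listen during a state transfer, so a False result for those is not by itself a problem; 4567 is the one that should be reachable whenever node1 is running.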