MySQL Galera node does not sync after reboot


Maurix

Jan 16, 2014, 2:21:31 PM
to codersh...@googlegroups.com
Hello.

I have a 3-node MySQL Galera cluster, using MySQL 5.5.23 and Galera 23.2.1(r129) x64 on CentOS 6.2 x64 servers. Yesterday, I shut down the third node for a scheduled maintenance task. I did not change anything on the MySQL Galera machines, and the other two nodes continued to work as usual.

Today, when I powered on the machine, it joined the cluster (or it seems that it did), but it did not synchronize with the other nodes.
The mysql-error log reports the following messages:

140116 19:28:46 [Note] WSREP: Node 0 (vmpr-mgc-03) requested state transfer from '*any*'. Selected 1 (vmpr-mgc-01)(SYNCED) as donor.
140116 19:28:46 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3558066)
140116 19:28:46 [Note] WSREP: Requesting state transfer: success, donor: 1
140116 19:28:49 [Warning] WSREP: 1 (vmpr-mgc-01): State transfer to 0 (vmpr-mgc-03) failed: -1 (Operation not permitted)
140116 19:28:49 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.

The warning indicates that node vmpr-mgc-03 tried to start synchronization with node vmpr-mgc-01, but the state transfer failed because the operation was not permitted. Before the shutdown, vmpr-mgc-03 was correctly synced.

The replication aborts with the following messages:

140116 19:52:38 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():148
140116 19:52:38 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
140116 19:52:38 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'wsrep_dbcluster' at 'gcomm://vmpr-mgc-02': -110 (Connection timed out)
140116 19:52:38 [ERROR] WSREP: gcs connect failed: Connection timed out
140116 19:52:38 [ERROR] WSREP: wsrep::connect() failed: 6
140116 19:52:38 [ERROR] Aborting

Can anyone help me?

Thanks in advance.

Regards,

    Maurix

PS: I attached the whole mysql-error.log file.

mysql-error.log

Alex Yurchenko

Jan 18, 2014, 6:09:41 PM
to codersh...@googlegroups.com
On 2014-01-16 21:21, Maurix wrote:
> Hello.
>
> I have a 3-node MySQL Galera cluster, using MySQL 5.5.23 and Galera
> 23.2.1(r129) x64 on CentOS 6.2 x64 servers. Yesterday, I shut down the
> third node for a scheduled maintenance task. I did not change anything
> on the MySQL Galera machines, and the other two nodes continued to work
> as usual.
>
> Today, when I powered on the machine, it joined the cluster (or it
> seems that it did), but it did not synchronize with the other nodes.
> The mysql-error log reports the following messages:
>
> 140116 19:28:46 [Note] WSREP: Node 0 (vmpr-mgc-03) requested state transfer from '*any*'. Selected 1 (vmpr-mgc-01)(SYNCED) as donor.
> 140116 19:28:46 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3558066)
> 140116 19:28:46 [Note] WSREP: Requesting state transfer: success, donor: 1
> 140116 19:28:49 [Warning] WSREP: 1 (vmpr-mgc-01): State transfer to 0 (vmpr-mgc-03) failed: -1 (Operation not permitted)

Most likely the wsrep_sst_mysqldump script on vmpr-mgc-01 could not
connect to vmpr-mgc-03, probably due to a wrong user/password. The log
from vmpr-mgc-01 should have more info.
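
To pull the relevant entries out of a long error log, a small filter along these lines may help. It is only a sketch: the regular expression assumes the exact line format quoted above, and the function name `find_sst_failures` is made up for illustration, not part of any Galera tooling.

```python
import re

# Assumed format, taken from the log line quoted in this thread:
# ... WSREP: 1 (vmpr-mgc-01): State transfer to 0 (vmpr-mgc-03) failed: -1 (Operation not permitted)
SST_FAILED = re.compile(
    r"WSREP: \d+ \((?P<donor>[^)]+)\): "
    r"State transfer to \d+ \((?P<joiner>[^)]+)\) "
    r"failed: (?P<code>-?\d+) \((?P<reason>[^)]+)\)"
)


def find_sst_failures(lines):
    """Return a (donor, joiner, code, reason) tuple for each failed state transfer."""
    failures = []
    for line in lines:
        match = SST_FAILED.search(line)
        if match:
            failures.append(
                (
                    match["donor"],
                    match["joiner"],
                    int(match["code"]),
                    match["reason"],
                )
            )
    return failures
```

Running it over the lines of the log on each node (for example `find_sst_failures(open("mysql-error.log"))`) shows which donor failed and when, so you know which part of the vmpr-mgc-01 log to read.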

> 140116 19:28:49 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
>
> The warning indicates that node vmpr-mgc-03 tried to start
> synchronization with node vmpr-mgc-01, but the state transfer failed
> because the operation was not permitted. Before the shutdown,
> vmpr-mgc-03 was correctly synced.
>
> The replication aborts with the following messages:
>
> 140116 19:52:38 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
>          at gcomm/src/pc.cpp:connect():148
> 140116 19:52:38 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
> 140116 19:52:38 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'wsrep_dbcluster' at 'gcomm://vmpr-mgc-02': -110 (Connection timed out)

So, what's happening at vmpr-mgc-02?
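
One quick thing to rule out is plain network reachability from vmpr-mgc-03. The sketch below only tests whether a TCP connection can be opened; it assumes the cluster uses Galera's default group communication port 4567 (no port is given in the gcomm:// address above), and the helper name `can_connect` is hypothetical. A successful connection does not prove the node is in a primary component, only that the port is reachable.

```python
import socket

# Assumption: default Galera group communication port; change it if
# your gcomm:// address or provider options specify another one.
GALERA_GROUP_PORT = 4567


def can_connect(host, port=GALERA_GROUP_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_connect("vmpr-mgc-02")` run on vmpr-mgc-03 returning False would point to a firewall, a stopped mysqld, or a name resolution problem rather than to Galera itself.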

> 140116 19:52:38 [ERROR] WSREP: gcs connect failed: Connection timed out
> 140116 19:52:38 [ERROR] WSREP: wsrep::connect() failed: 6
> 140116 19:52:38 [ERROR] Aborting
>
> Can anyone help me?
>
> Thanks in advance.
>
> Regards,
>
> Maurix
>
> PS: I attached the whole mysql-error.log file.

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011