Hi Laurent,
On 2012-05-10 11:56, Laurent MINOST wrote:
> Hi,
>
> This morning I had a node that unsync by itself or crash, I don't really
> know why, It automatically tried to resync from another Synced node on the
> cluster but resync stalled probably because of the current opened bug on
> IST (https://bugs.launchpad.net/galera/+bug/985747), so I have collected
> all the logs from each node and provide them here to know if it's possible
> to find the reason of this desync/crash ?
Thanks for your report. According to the logs it was not a crash but a
network partition. The "crashed" node became unresponsive and was
kicked out by the other two:
> 120510 9:30:09 [Warning] WSREP: last inactive check more than PT1.5S ago, skipping check
> 120510 9:30:10 [Note] WSREP: (468c977e-9995-11e1-0800-6f132903562b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.1:4567 tcp://192.168.0.5:4567
> 120510 9:30:10 [Warning] WSREP: last inactive check more than PT1.5S ago, skipping check
> 120510 9:30:11 [Note] WSREP: (468c977e-9995-11e1-0800-6f132903562b, 'tcp://0.0.0.0:4567') turning message relay requesting off
> 120510 9:30:11 [Note] WSREP: view(view_id(NON_PRIM,468c977e-9995-11e1-0800-6f132903562b,13) memb {
> 468c977e-9995-11e1-0800-6f132903562b,
> } joined {
> } left {
> } partitioned {
> 5d8c2fc7-976d-11e1-0800-2052394e626e,
> 9f9b7649-976d-11e1-0800-659e792e93dd,
> })
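If you want to spot this situation in the logs automatically, here is a
rough Python sketch that parses a view entry like the one above and flags
a node that ended up in a non-primary component with partitioned peers.
The function names and the regular expression are my own illustration,
written against this one log excerpt only; other Galera versions may
format the view differently.

```python
import re

# Pattern written against the single log excerpt above (an assumption,
# not an official description of the Galera log format).
VIEW_RE = re.compile(
    r"view\(view_id\((?P<kind>\w+),(?P<uuid>[0-9a-f-]+),(?P<seq>\d+)\)\s*"
    r"memb\s*\{(?P<memb>[^}]*)\}\s*"
    r"joined\s*\{(?P<joined>[^}]*)\}\s*"
    r"left\s*\{(?P<left>[^}]*)\}\s*"
    r"partitioned\s*\{(?P<partitioned>[^}]*)\}"
)


def _uuids(block):
    """Split a '{ ... }' body into a list of node UUID strings."""
    return [u.strip() for u in block.split(",") if u.strip()]


def parse_view(log_text):
    """Return the first view found in log_text as a dict, or None."""
    m = VIEW_RE.search(log_text)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "seq": int(m.group("seq")),
        "memb": _uuids(m.group("memb")),
        "joined": _uuids(m.group("joined")),
        "left": _uuids(m.group("left")),
        "partitioned": _uuids(m.group("partitioned")),
    }


def is_partitioned_off(view):
    """True if the view is non-primary and lists partitioned peers."""
    return view["kind"] == "NON_PRIM" and len(view["partitioned"]) > 0


SAMPLE = """120510 9:30:11 [Note] WSREP: view(view_id(NON_PRIM,468c977e-9995-11e1-0800-6f132903562b,13) memb {
468c977e-9995-11e1-0800-6f132903562b,
} joined {
} left {
} partitioned {
5d8c2fc7-976d-11e1-0800-2052394e626e,
9f9b7649-976d-11e1-0800-659e792e93dd,
})"""

if __name__ == "__main__":
    view = parse_view(SAMPLE)
    print(view["kind"], view["memb"], view["partitioned"])
    print("partitioned off:", is_partitioned_off(view))
```

A NON_PRIM view with the node alone in "memb" and the two other nodes
under "partitioned" is what tells a partition apart from a crash here.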
1) During IST the donor is not blocked at all, so it keeps on serving
clients as usual.
2) For non-blocking SST you should use Percona XtraDB Cluster, or
just take the xtrabackup SST script from it.
> - What is exactly the meaning of wsrep_provider_version please ? I cannot
> find any information on this parameter from
> http://www.codership.com/wiki/doku.php?id=mysql_options_0.8
> I saw that on my cluster I have a different wsrep_provider_version on one
> of my node, that seems pretty strange to me as I remember I used the same
> binaries for MySQL-wsrep : mysql-5.5.23_wsrep_23.5-linux and source file
> for Galera lib : galera-23.2.0-src.tar.gz ? Maybe it can have an impact on
> the cluster stability ?
> node1 : wsrep_provider_version 2.1dev(rXXXX)
> node2 : wsrep_provider_version 2.0(rXXXX)
> node3 : wsrep_provider_version 2.0(rXXXX)
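wsrep_provider_version is just for informational purposes: it reports
the version of the provider (the Galera library) that the node has loaded.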
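If you want to see at a glance which nodes report a different provider
version, a small sketch like the following may help. It assumes you have
already collected the value on each node (for example with
SHOW STATUS LIKE 'wsrep_provider_version') into a dictionary; the function
names are hypothetical and the sample values are the ones from your mail.

```python
from collections import defaultdict


def group_by_version(versions):
    """Map each reported version string to the list of nodes reporting it."""
    groups = defaultdict(list)
    for node, version in versions.items():
        groups[version].append(node)
    return dict(groups)


def odd_nodes_out(versions):
    """Return nodes whose version differs from the most common one."""
    groups = group_by_version(versions)
    if len(groups) <= 1:
        return []
    majority = max(groups.values(), key=len)
    return sorted(
        node
        for nodes in groups.values()
        if nodes is not majority
        for node in nodes
    )


if __name__ == "__main__":
    reported = {
        "node1": "2.1dev(rXXXX)",
        "node2": "2.0(rXXXX)",
        "node3": "2.0(rXXXX)",
    }
    print(group_by_version(reported))
    print(odd_nodes_out(reported))
```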
> - Is it possible to disable IST totally for all nodes temporarly until the
> bug with IST is fixed please and how ?
1) Yes, you should unset wsrep_node_address and ist.recv_addr
variables.
2) I have just pushed a fix for that bug in revision 127, you may want
to try that now.
Regards,
Alex
> Thanks.
> Regards,
>
> Laurent
--
Alexey Yurchenko,
Codership Oy,
www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011