Percona XtraDB Cluster 5.5.30-23.7.4 SST Problem


Laurent MINOST

May 6, 2013, 7:12:37 AM
to percona-d...@googlegroups.com
Hi,

I'm facing a strange problem this morning with my 3-node Percona XtraDB test cluster: one of my nodes cannot rejoin the cluster. Each time it tries, the SST fails and the node shuts itself down a few seconds after being started.

Error logs from joiner node and donor node are available on Pastebin :

Joiner node : http://pastebin.com/GqrnnHu8
Donor node : http://pastebin.com/gnru7T6W

I tried adding set -x to each script to debug, but without success: it does not provide any additional information that would help find the cause of the problem.

Also, as suggested in the donor error log, I checked the contents of innobackup.backup.log, but again without success, as I have no idea what the problem could be there:

------------------------------------------------------------------------------------------------
# cat /opt/mysql-galera/data//innobackup.backup.log

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona Ireland Ltd 2009-2012.  All Rights Reserved.

This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

130506 11:47:33  innobackupex: Starting mysql with options:  --defaults-file='/etc/my-galera.cnf' --password=xxxxxxxx --user='replicator' --socket='/opt/mysql-galera/data/mysqld.sock' --unbuffered --
130506 11:47:33  innobackupex: Connected to database with mysql child process (pid=12616)
130506 11:47:39  innobackupex: Connection to database server closed
IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

innobackupex: Using mysql  Ver 14.14 Distrib 5.5.30, for Linux (i686) using readline 5.1
innobackupex: Using mysql server version Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

innobackupex: Created backup directory /tmp
tar: -: Cannot write: Broken pipe
tar: Error is not recoverable: exiting now
innobackupex: Error: Failed to stream 'backup-my.cnf': Inappropriate ioctl for device at /usr/bin/innobackupex line 381.
------------------------------------------------------------------------------------------------
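The banner in the log above says a successful run ends with "completed OK!", so a quick way to tell a failed donor-side backup from a finished one is to look at the last line of the log. A minimal sketch, assuming the log path is passed as an argument (the function name backup_completed is made up for illustration):

```shell
#!/bin/sh
# Return 0 if the innobackupex log ends with "completed OK!", non-zero otherwise.
# Only the last line is checked, because the banner near the top of every log
# also contains the words "completed OK!".
backup_completed() {
    # $1: path to innobackup.backup.log
    tail -n 1 "$1" | grep -q 'completed OK!'
}

# Example:
#   backup_completed /opt/mysql-galera/data/innobackup.backup.log || echo "backup did not finish"
```

For the log shown above, whose last line is the "Failed to stream" error, this would report that the backup did not finish.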


Has anyone already run into this problem? Any clue or information that helps explain what is going wrong here would be much appreciated.
Thanks !

Regards,

Laurent

Alex Yurchenko

May 6, 2013, 7:47:19 AM
to percona-d...@googlegroups.com
So what's at /usr/bin/innobackupex line 381?

And have you checked if another xtrabackup, innobackupex or nc process
is already running there?
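
A quick way to run that check is to filter the process list for the tools involved in an SST. This is only a sketch: the function name sst_leftovers is invented here, and it reads a "pid command" listing on stdin so it can be fed from ps or from a saved file.

```shell
#!/bin/sh
# Keep only the lines of a "pid command" listing whose command is one of the
# tools used during an xtrabackup SST (xtrabackup, innobackupex, nc, ncat).
# The pattern requires the name to start after a space or "/" and to end at a
# space or end of line, so e.g. "rsync" or "mysqld" do not match "nc".
sst_leftovers() {
    grep -E '(^|[[:space:]/])(xtrabackup|innobackupex|nc|ncat)([[:space:]]|$)'
}

# Example:
#   ps -eo pid,args | sst_leftovers || echo "no leftover SST processes"
```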

>
> Anyone who already had this problem please ? Any clue/information
> would be
> very appreciated to understand what is the problem here.
> Thanks !
>
> Regards,
>
> Laurent

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Laurent MINOST

May 6, 2013, 8:05:04 AM
to percona-d...@googlegroups.com
Hi Alexey,

Thanks for your answer.

Below is the code around line 381; nothing there helped me understand the cause:

    364 #
    365 # Die subroutine kills all child processes and exits this process.
    366 # This subroutine takes the same argument as the built-in die function.
    367 #    Parameters:
    368 #       message   string which is printed to stdout
    369 #
    370 sub Die {
    371     my $message = shift;
    372     my $extra_info = '';
    373
    374     # kill all child processes of this process
    375     kill_child_processes();
    376
    377     if ($current_mysql_request) {
    378         $extra_info = " while waiting for reply to MySQL request:" .
    379             " '$current_mysql_request'";
    380     }
    381     die "$prefix Error: $message$extra_info";
    382 }
    383

As for your second question: yes, I already checked, and there is no frozen/stalled nc running, nor any other process related to mysql/innobackupex/xtrabackup ... :(

One more piece of information: I stopped my 3rd node, waited, and started it again, and it rejoined properly, but via IST rather than SST. I may try forcing an SST on it to see whether the 3rd node hits the same problem, which would point to the donor (2nd node) as the cause.

Regards,

Laurent

Laurent MINOST

May 6, 2013, 9:21:28 AM
to percona-d...@googlegroups.com
So it seems the problem is on the joiner side: I stopped the 3rd node, did a dirty rm -rf * in its data directory, then restarted it, and it synced back successfully from the single remaining node forming the cluster ...

By the way, I also found a similar thread: http://www.percona.com/forums/questions-discussions/percona-xtrabackup/9817-innobackupex-sporadic-failures?p=9884 but there is no answer there yet ...

Regards,

Laurent

Laurent MINOST

May 6, 2013, 10:15:08 AM
to percona-d...@googlegroups.com
OK, so I've finally found the "culprit":
Only on this 1st node, the nc/netcat package was recently upgraded as part of a batch of updates, and it seems Fedora has replaced the standard nc/netcat with the ncat that comes from nmap, which does not accept the same parameters:

New ncat on Fedora 18 :

[root@cygnus MAISON logs]# rpm -qf /usr/bin/nc
nmap-ncat-6.25-1.fc18.x86_64

[root@cygnus MAISON logs]# rpm -qf /usr/bin/ncat
nmap-ncat-6.25-1.fc18.x86_64

nmap-ncat.x86_64 : Nmap's Netcat replacement

and this nmap-ncat replacement does not interpret the -d passed by the wsrep_sst_xtrabackup script (     ${NC_BIN} -dl ${NC_PORT}  | tar xfi  - -C ${DATA}  1>&2 ) the same way "standard" netcat does:

Standard netcat :
     -d      Do not attempt to read from stdin.

Nmap's ncat replacement :
  -d, --delay <time>         Wait between read/writes

This explains the error logged in the joiner log:
Ncat: Invalid -d delay "l" (must be greater than 0). QUITTING.
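
One way to cope with both variants is to look at what the installed nc reports about itself and only pass -d to the traditional netcat. This is a rough sketch, not the fix used by wsrep_sst_xtrabackup: the function names nc_flavor and nc_listen_args are invented here, and it assumes that Nmap's ncat mentions "Ncat" (or "ncat") in its help output and that a plain -l PORT is enough for it to listen.

```shell
#!/bin/sh
# Classify a netcat binary from the text it prints for "nc -h".
# Assumption: Nmap's replacement identifies itself as "Ncat"/"ncat" there.
nc_flavor() {
    # $1: help text captured from the binary
    case "$1" in
        *Ncat*|*ncat*) echo ncat ;;
        *)             echo traditional ;;
    esac
}

# Build the listen arguments for the joiner side.
# Traditional netcat gets -d (do not read stdin); ncat does not, because
# there -d means "delay" and takes a time value.
nc_listen_args() {
    # $1: flavor from nc_flavor, $2: port to listen on
    case "$1" in
        ncat) echo "-l $2" ;;
        *)    echo "-dl $2" ;;
    esac
}

# Possible use on the joiner (untested against a real SST; port and path are examples):
#   flavor=$(nc_flavor "$(nc -h 2>&1)")
#   nc $(nc_listen_args "$flavor" 4444) | tar xfi - -C /path/to/datadir
```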

So sorry for the wasted time; this is purely a Fedora-specific problem caused by the replacement of netcat ...

Thanks Alexey for your time and answer.

Regards,

Laurent

Alex Yurchenko

May 6, 2013, 6:00:17 PM
to percona-d...@googlegroups.com
Great that you nailed it! And maybe this "time waste" will save time
for somebody else ;)

Laurent MINOST

May 7, 2013, 8:33:32 AM
to percona-d...@googlegroups.com
If it can be useful to someone else, then yes, it will not have been a waste of time :p

Moreover, I had to go back to the old nc binary to get SST working again; with the new netcat binary from nmap, SST does not work. Maybe it needs a different combination of parameters, but I have not found it so far ... I don't know whether this change is also planned for Debian-based distributions, but it will probably show up in Red Hat/CentOS in the more or less near future, as they are based on Fedora :p

Sanket Gupta

May 22, 2013, 3:33:24 AM
to percona-d...@googlegroups.com
Hi
I was facing a similar problem, where innobackup.backup.log on the donor (db3) was stuck at:
_________________________________________________________________________________

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona Ireland Ltd 2009-2012.  All Rights Reserved.
This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.
130522 10:50:39  innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/va$
130522 10:50:39  innobackupex: Connected to MySQL server

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".
_________________________________________________________________________________
 
Comparing with the logs from the other nodes, I found this line to be new:
"innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/va"
 
On the older nodes the line was:

+++++++++++++++++++++++++++++

Starting mysql with options: --defaults-file='/etc/my.cnf' --password=xxxxxxxx --user='root' --socket='/var/run/mysqld/mysqld.sock' --unbuffered --

130522 11:23:36 innobackupex: Connected to database with mysql child process (pid=22313)

++++++++++++++++++++++++++++++++

Comparing xtrabackup versions, I discovered that the node with the issue was running xtrabackup 2.1.2, while the working nodes were running 2.0.7.

Since my other nodes were using xtrabackup 2.0.7, I decided to downgrade to the same version and try again.
 
And it worked.
 
I'm not sure how to describe the bug, but it should be reported as a bug against xtrabackup 2.1.2.
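
A mismatch like this is easy to miss until an SST hangs. As a rough sketch, assuming you can collect one "host version" pair per node by whatever means suits your setup (that collection step is not shown, and the function name check_versions is made up here), the following groups nodes by xtrabackup version and returns non-zero when they differ:

```shell
#!/bin/sh
# Read "host version" pairs on stdin, print one line per version with the
# hosts running it, and exit 1 if more than one version is present.
check_versions() {
    # Sort by version, then host, so that awk sees each version as one run.
    sort -k2,2 -k1,1 | awk '
        $2 != prev { if (NR > 1) printf "\n"; printf "%s:", $2; prev = $2; n++ }
                   { printf " %s", $1 }
        END        { if (NR > 0) printf "\n"; exit (n > 1) }'
}

# Example with the versions from this thread (host names are illustrative):
#   printf '%s\n' 'db1 2.0.7' 'db2 2.0.7' 'db3 2.1.2' | check_versions
#   2.0.7: db1 db2
#   2.1.2: db3
```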

Laurent MINOST

May 22, 2013, 5:34:07 AM
to percona-d...@googlegroups.com
Hi Sanket,

Yes, I can also confirm there is a big problem with Percona XtraBackup 2.1.2. I was testing the new release this morning on a test cluster, and everything failed because of this SST hang; I finally understood the problem by coming to the group here. As the bug has now been confirmed by Percona, I'm downgrading to 2.0.x ... :(

I strongly think it is now necessary to let at least 1 or 2 versions "pass" before upgrading, to be sure of upgrading *safely*, given the lack of regression testing shown by all these failed Percona releases in recent weeks. IMO this is not acceptable and not professional, but this is only my opinion ...

Regards,

Laurent