Re: [codership-team] Possible bug in wsrep_sst_rsync (2.0b)


Alex Yurchenko

Feb 13, 2012, 2:24:35 PM
to codersh...@googlegroups.com
On 2012-02-13 21:41, Simon Balz wrote:
> Hi
>
> It's me again. Sorry for flooding the mailing list, but as you can
> see
> I'm currently testing Galera extensively :)
>
> The last thing I tried to test is the rsync SST method in 2.0 beta:
> wsrep_sst_method=rsync
> wsrep_sst_auth=user:pass
> (wsrep_sst_receive_address is not set because IP of first interface
> is
> fine).
>
> What I've seen, Galera is passing the arguments correctly to the
> wsrep_sst_rsync script:
> $ ps -ef | grep rsync
> sh -c wsrep_sst_rsync 'joiner' '172.29.155.198' 'user:pass'
> '/var/lib/mysql/' '27012' 2>sst.err
>
> After this call, nothing happens further.

Hi Simon,

What exactly went wrong (and apparently something did) should be
visible in the MySQL error log and in sst.err in the datadir. rsync SST
requires taking a global read lock on the server, which may take some
time.
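
A quick way to look at both places could be something like the sketch
below. The datadir default comes from the ps output above; the error log
path is only an assumption (it depends on log_error and the
distribution), and show_sst_logs is a made-up helper name.

```shell
#!/bin/bash
# Print the tail of sst.err and of the MySQL error log.
# Both default paths are assumptions; override them for your setup.
show_sst_logs() {
    local datadir="${1:-/var/lib/mysql}"
    local errlog="${2:-/var/log/mysqld.log}"   # assumed path, check log_error
    local f
    for f in "$datadir/sst.err" "$errlog"; do
        echo "== $f"
        if [ -s "$f" ]; then
            tail -n 50 "$f"
        else
            echo "(missing or empty)"
        fi
    done
}

# Usage: show_sst_logs /var/lib/mysql /path/to/mysql-error.log
```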

> When I have a look at the
> script '/usr/bin/wsrep_sst_rsync', I can see the sst credentials are
> passed as argument into the variable AUTH but this variable is never
> used again. In the end, the rsync call happens as user 'mysql', which
> is actually not what I wanted.

The authentication field is ignored by the default rsync SST script,
i.e. the transfer goes unprotected. This was done for performance and
simplicity: we needed a script that is generic enough, and we felt that
setting up stunnel or passwordless SSH access would complicate a matter
which is complex enough already.

> So my question:
> - Is wsrep_sst_rsync intended to work with a user other than 'mysql'?

I think it is a normal security practice that the child process cannot
elevate privileges. Since sst script is called by mysqld, it will run as
whatever user mysqld is running as.

> - If yes, is this a bug in the script so the credentials aren't used?

Nope, it is just how it was initially conceived. Note that from the
script interface POV the authentication credentials are just a string.
It can be anything, including SSH keys.
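
To illustrate, a custom script could pick up its arguments roughly like
this. The argument order is taken from the ps output quoted above; the
meaning of the fifth value is not discussed here, so it is simply called
"extra". Splitting the auth string on ':' is only one possible
convention, and parse_sst_args is a hypothetical name.

```shell
#!/bin/bash
# Sketch of how an SST script could read its positional arguments.
# Order as seen in the ps output: role, address, auth, datadir, one more value.
parse_sst_args() {
    local role="$1" addr="$2" auth="$3" datadir="$4" extra="${5:-}"
    # The auth string is opaque to the caller; "user:secret" is one convention.
    local user="${auth%%:*}"
    local secret="${auth#*:}"   # deliberately not printed
    echo "role=$role addr=$addr user=$user datadir=$datadir extra=$extra"
}

parse_sst_args 'joiner' '172.29.155.198' 'user:pass' '/var/lib/mysql/' '27012'
```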

> - Is it recommended to exchange SSH keys?

Frankly, I'm not exactly sure how SSH keys would work with 'mysql' user
which normally does not have a home directory. And I guess you will need
to rewrite the script somewhat, because currently joiner starts rsync in
server mode. You'll want to disable this.

Too bad rsync does not support SSL - we could have used the same keys
as for mysql client and Galera replication.

Regards,
Alex

> Thx
> Simon

Simon Balz

Feb 14, 2012, 3:43:32 AM
to codership
Hi Alex

Thanks for your reply. I wasn't aware that this script is only a
reference; it's now quite clear to me why it didn't run through
without customization.
It shouldn't be a problem for me to modify it to get it working.
Actually, the rsync call would copy the whole /var/lib/mysql
directory, which I think is too much. Are there any files other than
the InnoDB files which have to be copied to the new node? E.g. the
galera.cache or grastate.dat file?

My idea is to request a lock, create an LVM snapshot, release the
lock again and then use the LVM snapshot to transfer the InnoDB data
to the joining node.

Any further input?

Thx
Simon

Alex Yurchenko

Feb 14, 2012, 6:37:17 AM
to codersh...@googlegroups.com
On 2012-02-14 11:43, Simon Balz wrote:
> Hi Alex
>
> Thanks for your reply. I wasn't aware, this script is only a
> reference, it's now quite clear to me that it didn't run trough
> without customization.

Well, actually it was intentionally simplified so that it runs without
customization. The fact that it didn't means something is wrong in the
configuration or system setup, e.g. a firewall.

> It shouldn't be a problem for me to modify it to get it working.
> Actually, the rsync call would copy the whole /var/lib/mysql
> directory, which I think is too much. Are there any files more than
> the innodb files which have to be copied to the new node? E.g. the
> galera.cache or grastate.dat file?

I think you're mistaken: rsync should copy only the required files;
there is a filter for that in the script. In particular, only InnoDB
data and log files and schema directories should be copied.
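
Just to show the idea, such a filter could be expressed with rsync
include/exclude rules along these lines. This is an illustration and not
a copy of the rules in the shipped script; sst_filter_args is a made-up
name and the destination in the example is a placeholder.

```shell
#!/bin/bash
# Illustrative filter set, NOT copied from the shipped wsrep_sst_rsync:
# keep InnoDB data/log files at the top of the datadir plus the contents
# of the schema directories, and skip everything else (e.g. grastate.dat).
sst_filter_args() {
    printf '%s\n' \
        '--include=/ibdata*' \
        '--include=/ib_logfile*' \
        '--include=*/' \
        '--include=/*/*' \
        '--exclude=*'
}

# Example (dry run, nothing is copied):
#   mapfile -t FILTER < <(sst_filter_args)
#   rsync -a --dry-run "${FILTER[@]}" /var/lib/mysql/ /tmp/sst-preview/
```

rsync applies the first matching rule, so the final --exclude only
catches what none of the includes claimed.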

> My idea is to request a lock, then creating a lvm snapshot, releasing
> the lock again and using the lvm snapshot to transfer the innodb data
> to the joining node.
>
> Any further input?

I'd strongly recommend using the provided rsync script as a base
instead of writing your own from scratch. In particular, it demonstrates
how to request a lock (echo "flush tables") and how to release it (echo
"continue").

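For the snapshot idea, the donor side could be wrapped roughly as
follows. Only the two strings "flush tables" and "continue" come from
the script; the sketch assumes, as the echo suggests, that the server
reads the script's standard output. How the script learns that the lock
is actually held is not covered here, so that step is left to a
caller-supplied function. with_tables_flushed and the volume names are
hypothetical.

```shell
#!/bin/bash
# Donor-side sketch: ask the server for the lock, run one command while it
# is held, then let the server continue.
with_tables_flushed() {
    local wait_for_lock="$1"; shift   # caller-supplied: returns once the lock is held
    echo "flush tables"               # request the lock
    "$wait_for_lock"
    local rc=0
    "$@" >&2 || rc=$?                 # keep command output off stdout
    echo "continue"                   # always release, even if the command failed
    return "$rc"
}

# Possible use with an LVM snapshot (volume names are made up):
#   with_tables_flushed my_wait lvcreate --snapshot --size 1G \
#       --name mysql_sst /dev/vg0/mysql
```
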
Best regards,
Alex

>
> Thx
> Simon

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Simon Balz

Feb 14, 2012, 7:42:56 AM
to codership
Hi Alex

Thanks for clearing that up. I had completely misunderstood the rsync
SST implementation; now it's clear to me.
I completely agree with using the default script provided by
Galera and I just gave it a try, though without 100% success:

It seems that the rsync copy is running fine but afterwards, the
joining node doesn't really join the cluster and stays on state 1
(waiting).
There are some warnings but nothing which helps me to identify the
reason.
Please find the logs here:
Joiner: http://pastebin.com/vVtbbXtv
Donor: http://pastebin.com/9KeGrT5z

Thank you very much
Simon

Alex Yurchenko

Feb 14, 2012, 8:13:56 AM
to codersh...@googlegroups.com

According to the logs donor believes that it has successfully
transferred the data, and I have little doubt about it.

Joiner however seems to have its log cut too short. Normally you should
see InnoDB recovering after that. One suspicion I have is that the data
was transferred to the wrong directory, but I don't really see how that
is possible. You should also have sst.err files on both nodes; they may
contain additional information about what happened.

One more thing to worry about is

120214 13:33:13 [Warning] WSREP: last inactive check more than PT1.5S
ago, skipping check

during state transfer. This usually means that the node was swapping so
hard that Galera main loop could not get control for several seconds. Be
it swapping or just heavy reading, the node is pretty much
non-operational in this condition, it is very-very slow and probably
should be given some time to process events. Given that those messages
continue till the very end of the joiner log, did you give it enough
time to process SST before sending the log?

In general such messages are a sign of serious node misconfiguration,
and since the joiner has not yet initialized the storage engines, they
are probably caused by third-party programs. The cause should be found
and eliminated; running the cluster with such a node is pointless.
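
To see how often this happens, one could simply count the warnings in
the error log, for example with the small helper below. The function
name is made up and the log path in the usage line is an assumption; the
search string is the one from the warning above.

```shell
#!/bin/bash
# Count "last inactive check" warnings in a log file.
count_inactive_warnings() {
    local log="$1"
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'last inactive check more than' "$log" || true
}

# Usage: count_inactive_warnings /var/log/mysqld.log
```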

Regards,
Alex

> Thank you very much
> Simon

--

Simon Balz

Feb 15, 2012, 3:49:24 AM
to codership
Hi Alex

I've solved the "last inactive check" messages: there were too few
system resources available (I had tried to bring the systems to their
upper limit by removing cores and memory).

But I still have the same situation as before. This time I gave it
about 45 minutes to sync before I copied the logs, so there are no
more log entries to see.
Donor: http://pastebin.com/TFDzHKur
Joiner: http://pastebin.com/gUWAGz1F
garbd: http://pastebin.com/PXjhjW4b

Donor is in state 4 after this sequence and joiner state 1 (comment:
Waiting for SST (4) ).

sst.err is empty on both systems.

Thx

Alex Yurchenko

Feb 15, 2012, 2:12:45 PM
to codersh...@googlegroups.com

Hi Simon,

This is all very mysterious and I must say that there is not enough
information in logs to make any conclusions. Could you confirm that
state transfer is really happening by removing _all_ contents of
/var/lib/mysql on joiner and trying again? Is there anything
non-standard in joiner configuration, have you modified wsrep_sst_rsync
scripts on joiner or on donor in any way? Can you confirm that rsync is
of the same version on joiner and on donor (so far this was the sole
cause of problems with rsync SST)?

Kind regards,
Alex

Simon Balz

Feb 15, 2012, 3:08:55 PM
to codership
Hi Alex

Indeed it's very mysterious :(
I just tried it with a completely fresh installation
(rm -f /var/lib/mysql/*) on both sides (donor and joiner): same
situation. The rsync version is the same on both sides (3.0.6).
Further, I can confirm that the rsync process worked fine: ibdata has
the same file size on both sides and the frm files of the database are
there as well. I noticed that the rsync process shut down after
120215 20:58:02 [Note] WSREP: 1 (mysrv-dev-mysql1.p.mydomain.net):
State transfer to 0 (mysrv-dev-mysql3.p.mydomain.net) complete.
appears in the logfile.
I didn't touch the wsrep_sst_rsync script...
Is there a way to debug this process? Setting wsrep_debug=1 didn't
really give more information.
Note: In the state where the joiner is synced with the group but
nothing happens, I can't stop mysql properly, I always have to kill
it...

Thx
Simon

Alex Yurchenko

Feb 15, 2012, 6:06:04 PM
to codersh...@googlegroups.com
On 2012-02-15 23:08, Simon Balz wrote:
> Hi Alex
>
> Indeed it's very mysterious :(
> I just tried it with a complete fresh installation
> (rm -f /var/lib/mysql/*) on both sides (donor and joiner): same
> situation.
> rsync version is on both sides the same (3.0.6). Further I can confirm
> that the rsync process worked fine: ibdata has the same filesize on
> both sides and the frm files of the database are there as well. I
> noticed that the rsync process shut down after
> 120215 20:58:02 [Note] WSREP: 1 (mysrv-dev-mysql1.p.mydomain.net):
> State transfer to 0 (mysrv-dev-mysql3.p.mydomain.net) complete.
> appears in the logfile.

Can you see /var/lib/mysql/rsync_sst_complete file at either of the
nodes?

Is the rsync process still running on the joiner after the donor has
completed its part?

> I didn't touch the wsrep_sst_rsync script...
> Is there a way to debug this process? Setting wsrep_debug=1 didn't
> really give more informations.

You could change

#!/bin/bash -ue

to

#!/bin/bash -uex

- should log to sst.err. But there's not much chance that it'll help.
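
For those not familiar with it: the extra 'x' makes bash echo every
command to stderr before running it, and since mysqld starts the script
with 2>sst.err (see the ps output at the top of the thread), the trace
lands in that file. A stand-alone demonstration, with a made-up one-line
"script" and a hypothetical function name:

```shell
#!/bin/bash
# Run a tiny command string with the same flags and capture the trace.
trace_demo() {
    local errfile="$1"
    bash -uex -c 'ROLE=joiner; echo "role is $ROLE"' 2>"$errfile"
}

# Usage: trace_demo /tmp/trace.err; cat /tmp/trace.err
```

Normal output still goes to stdout; only the trace lines go to the
file.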

> Note: In the state where the joiner is synced with the group but
> nothing happens, I can't stop mysql properly, I always have to kill
> it...

Yes, that's the way it goes. It is very hard to shut down the server in
a "clean" way at that moment, so it is not implemented yet.

Regards,
Alex


> Thx

Simon Balz

Feb 16, 2012, 3:19:32 AM
to codership

Hi Alex

> Can you see /var/lib/mysql/rsync_sst_complete file at either of the
> nodes?
In the state where I'm stuck I can see rsync_sst_complete on the donor.
On the joiner, rsync_sst_complete gets removed when wsrep_sst_rsync
exits, but it was definitely there after the rsync process finished.

> Is rsync process still running on joiner after donor completed his
> part?
No, and not on the donor either.

> You could change
>
> #!/bin/bash -ue
>
> to
>
> #!/bin/bash -uex
>
> - should log to sst.err. But there's not much chance that it'll help.
At least it showed that the wsrep_sst_rsync script went through fine.

Would it help if you had direct access to the systems?

Thx and regards,
Simon

Simon Balz

Feb 17, 2012, 5:04:35 AM
to codership
Hi guys

Alex solved the problem. The cause was that I was setting
wsrep_cluster_address at runtime, when the InnoDB engine had already
been started.
To avoid starting the InnoDB engine before the SST happens, it's
mandatory to set wsrep_cluster_address in my.cnf.
Normally the storage engine should abort on SST. Alex told me this
should be fixed in the upcoming release.
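
For reference, a small check along those lines. It is only a sketch: the
default config path is an assumption, has_cluster_address is a made-up
name, and the address in the comment is a placeholder.

```shell
#!/bin/bash
# Check that wsrep_cluster_address is set (and not commented out) in a
# config file, so the node knows about the cluster at startup.
# The expected line looks like:  wsrep_cluster_address=gcomm://<donor-address>
has_cluster_address() {
    local cnf="${1:-/etc/my.cnf}"     # assumed location
    grep -Eq '^[[:space:]]*wsrep[_-]cluster[_-]address[[:space:]]*=' "$cnf"
}

# Usage: has_cluster_address /etc/my.cnf && echo "set in my.cnf"
```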

Best regards
Simon

Oli Sennhauser

Feb 19, 2012, 4:04:35 AM
to codership
Hello list,

We are pleased to announce that Codership is partnering with FromDual
to offer consulting and support services for Galera Cluster for MySQL.

Not only do you have a great product in Galera Cluster, you now also
have the appropriate services for it!

If you are interested in support, training or consulting related to
Galera Cluster for MySQL or MySQL itself, do not hesitate to contact us.

Regards,
Oli

--

FromDual - Vendor independent and neutral MySQL consulting.

Oli Sennhauser CEO / Senior Consultant
Phone: +41 44 940 24 82 Mobile: +41 79 830 09 33
oli.sen...@fromdual.com http://www.fromdual.com
Skype: fromdual Twitter: fromdual
