big db sync at node joining behaviour


alex

May 24, 2012, 4:31:35 AM
to codersh...@googlegroups.com
Hi! I set up replication on a two-node cluster. I changed the wsrep cluster address on each node to the other node's address and tested replication on small dbs. Everything worked perfectly.
I then tried the same scenario (first server down, changes made on the second one) with a larger database (approx. 40 GB). Now startup reports "failed" whenever I try to start the first node, and several rsync processes are running in the background. Is this normal? How should replication behave in this case, and what can I do now to start the second node? I also tried bootstrapping the cluster by changing the cluster address on the server that has the big db to "gcomm://" and then connecting with the other node, but the same thing happens.

Thanks in advance!

Henrik Ingo

May 24, 2012, 5:58:12 AM
to alex, codersh...@googlegroups.com
Hi alex

I've also had the rsync method fail on me for various reasons.
Sometimes it's a firewall, and it could be SELinux, but sometimes I
never figured out what was wrong. It is unfortunate that the rsync SST
method is not easy to debug when it doesn't work.

One thing I can help with though: When it fails, the rsync daemon is
left running. You need to kill it (or all of them) before retrying.
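
A minimal sketch of how the leftover daemons could be located before retrying. It assumes the process list comes from `ps -eo pid,args` and that the stale processes were started as `rsync --daemon`; the function name and the sample listing are hypothetical, not taken from a real node.

```python
from typing import List


def find_rsync_daemon_pids(ps_output: str) -> List[int]:
    """Return PIDs of rsync daemon processes found in `ps -eo pid,args` output."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        pid, command, args = int(parts[0]), parts[1], parts[2:]
        # Match the rsync binary itself (with or without a path), in daemon mode.
        if command.rsplit("/", 1)[-1] == "rsync" and "--daemon" in args:
            pids.append(pid)
    return pids


# Hypothetical sample listing, made up for illustration.
SAMPLE_PS = """\
  PID COMMAND
 1201 /usr/sbin/mysqld --wsrep_cluster_address=gcomm://
 4312 rsync --daemon --port 4444
 4400 /usr/bin/rsync --daemon --port 4444
 4501 grep rsync
"""

print(find_rsync_daemon_pids(SAMPLE_PS))
```

On a live host the listing could come from `subprocess.run(["ps", "-eo", "pid,args"], capture_output=True, text=True).stdout`, and each PID could then be passed to `os.kill(pid, signal.SIGTERM)`. Check the list by hand first so that an rsync daemon unrelated to the cluster is not stopped.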

Hopefully codership guys can help you with more details, just wanted
to share my experience.

henrik
> --
> You received this message because you are subscribed to the Google Groups
> "codership" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/codership-team/-/I8wK1wh8IfMJ.
> To post to this group, send email to codersh...@googlegroups.com.
> To unsubscribe from this group, send email to
> codership-tea...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/codership-team?hl=en.



--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

alex

May 24, 2012, 7:06:30 AM
to codersh...@googlegroups.com
hi henrik

It seems that this is normal. When one node is down and you make large changes on the active one, the servers synchronize the dbs at reconnection using your configured sync method (in my case rsync). Until that finishes, you cannot access the mysql console on the joiner node. If you look at the wsrep status on the active node, you will see the state "Donor (+)" somewhere. If you also look at the running processes, you will see CPU activity for rsync on both nodes. That means replication is doing its job and you have to let it finish. When it does, the rsync processes will disappear and you will be able to connect normally.

I hope that helps, and I would be grateful if a developer could offer some official information about this.
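
The status check described above could be scripted roughly as follows. This is only a sketch: it assumes the state text is the value of the `wsrep_local_state_comment` status variable, fetched separately with `SHOW STATUS LIKE 'wsrep_local_state_comment'`, and that it begins with "Donor", "Join" or "Synced". The exact strings differ between Galera versions, and the function name is hypothetical.

```python
def describe_node_state(state_comment: str) -> str:
    """Map a wsrep_local_state_comment value to a short explanation.

    The prefixes checked here are assumptions; exact strings vary by version.
    """
    state = state_comment.strip().lower()
    if state.startswith("donor"):
        return "donor: sending a state transfer to a joining node"
    if state.startswith("join"):
        # Covers values such as "Joining" and "Joined".
        return "joiner: receiving state or still catching up"
    if state == "synced":
        return "synced: ready for normal use"
    return "unknown state: " + state_comment


print(describe_node_state("Donor (+)"))
print(describe_node_state("Synced"))
```

Only the active node can be queried this way during a transfer, because the joiner does not accept connections until the sync has finished.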

alex

Henrik Ingo

May 24, 2012, 7:33:47 AM
to alex, codersh...@googlegroups.com
Ah yes. That is when everything works well. Of course, syncing 40G of
data will take some time so you have to be patient.

But sometimes the rsync failed on me, for various reasons (like
firewalls). Then the MySQL startup fails, and you can view the error
in MySQL error log. In those cases I've noticed the rsync daemon is
often left running, and I need to kill it before re-trying.

Also, when everything goes fine, you can follow the progress of the SST
in the MySQL error log.
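
To follow the transfer in the error log, one option is to filter it down to the relevant lines. The sketch below assumes that Galera messages carry a "WSREP" prefix and that transfer-related ones mention "SST" or "state transfer". The function name is hypothetical, and the sample lines are placeholders written for this example, not real server output.

```python
from typing import List


def sst_related_lines(log_text: str) -> List[str]:
    """Return log lines that mention WSREP together with SST or state transfer."""
    keywords = ("sst", "state transfer")
    matches = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if "wsrep" in lowered and any(word in lowered for word in keywords):
            matches.append(line)
    return matches


# Placeholder lines, invented for illustration only.
SAMPLE_LOG = """\
placeholder: InnoDB: example startup message
placeholder: WSREP: example message about SST
placeholder: WSREP: example message about group membership
placeholder: WSREP: example message about state transfer
"""

for line in sst_related_lines(SAMPLE_LOG):
    print(line)
```

The real input would be the contents of the MySQL error log. Its path depends on the installation, so it is left out here.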

henrik

Alexey Yurchenko

May 25, 2012, 2:36:29 PM
to codersh...@googlegroups.com
Hi

1. Since rsync needs to modify data directory on joiner directly, mysqld on joiner cannot really start until SST is finished. Hence it is impossible to connect to joiner to check the progress.
2. What is strange in your reports is that IST never kicks in (at least you don't mention it). Normally, if you shut down a node, make some modifications to the DB in the cluster and restart the node, IST should happen, and it should be much faster than rsync. By default the node caches 128M of transactions, so it takes a lot of transactions to overflow it. What version of the cluster are you using?
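
As a rough illustration of point 2, the check below estimates whether the changes a node missed would still fit in a 128M cache. It is a back-of-the-envelope sketch: the average write-set size and the transaction counts are made-up inputs, the function names are hypothetical, and the real cache has per-entry overhead that this ignores.

```python
GCACHE_SIZE_BYTES = 128 * 1024 * 1024  # the 128M default mentioned above


def missed_bytes(avg_writeset_bytes: int, transactions_missed: int) -> int:
    """Estimate the volume of write-sets produced while the node was down."""
    return avg_writeset_bytes * transactions_missed


def ist_is_plausible(missed: int, gcache_bytes: int = GCACHE_SIZE_BYTES) -> bool:
    """IST is only possible if the missed write-sets are still in the cache."""
    return missed <= gcache_bytes


# Made-up workload: 1 KiB per write-set on average.
small_gap = missed_bytes(1024, 100_000)  # about 98 MiB
large_gap = missed_bytes(1024, 200_000)  # about 195 MiB

print(ist_is_plausible(small_gap))  # True: fits within 128 MiB
print(ist_is_plausible(large_gap))  # False: cache overflowed, expect a full SST
```

If the missed changes regularly exceed the cache, the Galera provider option `gcache.size` (set through `wsrep_provider_options`) can be raised, at the cost of more disk space on each node.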

Regards,
Alex