advice on reference node failure / recovery


Debian Guy

May 30, 2012, 2:53:10 AM
to codersh...@googlegroups.com
Hello, everyone,

I would appreciate a bit of advice concerning the following scenario:
Galera cluster (Percona release) with 3 nodes (N1, N2, N3), of which N1 is considered the reference node.
N1 has wsrep_cluster_address='gcomm://', while N2 and N3 have it pointing to the IP of N1 (perhaps this is what I'm doing wrong, but I need a double check).

If I simulate a failure of N1 (a forced power-off or network failure), the remaining nodes seem to handle everything correctly and replicate changes between them.

When N1 returns, it will join the cluster, but does not receive any of the updates (due to the wsrep_cluster_address), which is somewhat normal for this setup. And if any of the other nodes restarts, it will most likely do a full resync from N1 and lose all the changes.

What's the best approach concerning the wsrep_cluster_address setting of each node, in order to avoid this issue? Have it unset in the configuration file and configure it dynamically after startup using some external component, or chain the nodes through wsrep_cluster_address (e.g. N2 => N1, N3 => N2, N1 => N3)?

TIA

Alex Yurchenko

May 30, 2012, 3:37:07 AM
to codersh...@googlegroups.com
On 2012-05-30 09:53, Debian Guy wrote:
> Hello, everyone,
>
> I would appreciate a bit of advice concerning the following scenario:
> Galera cluster (percona release) with 3 nodes (N1, N2, N3), out of
> which N1
> is considered as a reference node.
> N1 has wsrep_cluster_address='gcomm://', while N2 and N3 have it
> pointing to
> the IP of N1 (perhaps this is what I'm doing wrong, but need a double
> check).
>
> If I simulate a failure of N1 (a forced power off or network
> failure), the
> remaining nodes seem to handle everything correctly and replicate
> changes
> between them.
>
> When N1 returns, it will join the cluster,

No it won't. How will it know about the other guys if
wsrep_cluster_address=gcomm:// ?

It will create a new cluster.

> but does not receive any of the
> updates (due to the wsrep_cluster_address) - somewhat normal for this
> behavior. And if any of the other nodes is to restart, will most
> likely do
> a full resync from N1 and lose all the changes.
>
> What's the best approach concerning the wsrep_cluster_address setting
> of
> each node, in order to avoid such issue? Have it unset in the
> configuration
> file and configure it dynamically after startup using some external
> component, or do some form of chain on the nodes with the
> wsrep_cluster_address (e.g.: N2 => N1, N3 => N2, N1 => N3)?

Unfortunately dynamic address changes are limited to mysqldump SST
method only, since with anything else you have to join the cluster
_before_ mysqld is fully initialized.

The best approach for now is to remember that
wsrep_cluster_address=gcomm:// is a temporary setting to bootstrap a new
cluster and must be changed to some real address asap. (And the reason
to change it in my.cnf is that it is the only way to adjust the
configuration in a system with mysqld_safe)

Regards,
Alex

> TIA

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Debian Guy

May 30, 2012, 3:51:15 AM
to codersh...@googlegroups.com

Thank you for your reply, Alexey.

> > When N1 returns, it will join the cluster,
> No it won't. How will it know about the other guys if
> wsrep_cluster_address=gcomm:// ?
> It will create a new cluster.

You are right; that was a poor interpretation of the situation in my initial post.
 
> The best approach for now is to remember that
> wsrep_cluster_address=gcomm:// is a temporary setting to bootstrap a new
> cluster and must be changed to some real address asap. (And the reason
> to change it in my.cnf is that it is the only way to adjust the
> configuration in a system with mysqld_safe)

Thanks for clearing it up. SST is configured to use rsync in my case.

So, if I got this correctly, when N1 returns, to have it join the existing cluster I will need to
issue SET GLOBAL wsrep_cluster_address='gcomm://<ADDR>', ADDR being the IP of
any node that is a member of the cluster with Synced status (this could probably be
tuned a bit based on the wsrep_local_index value of the other nodes)?

Alex Yurchenko

May 30, 2012, 10:16:11 AM
to codersh...@googlegroups.com
Then it will see that it needs SST and for that it needs to restart
(because you have to run rsync before storage engine initialization) and
when it restarts, what will be the value of wsrep_cluster_address again?

Debian Guy

May 30, 2012, 10:37:02 AM
to codersh...@googlegroups.com


On Wednesday, May 30, 2012 5:16:11 PM UTC+3, Alexey Yurchenko wrote:
> Then it will see that it needs SST and for that it needs to restart
> (because you have to run rsync before storage engine initialization) and
> when it restarts, what will be the value of wsrep_cluster_address again?


I did some extra reading in the meantime, and it seems my approach was bad from
the beginning.
An empty wsrep_cluster_address (gcomm://) is only used either when you set up a new
cluster from scratch (e.g. load data on this first node, then link the others to
it so they'll sync up) or when doing a full recovery.
Otherwise, as soon as the follow-up nodes are in sync, the value in my.cnf
should be changed to one of the other nodes and the first node restarted with
the new setting. Live (test) and learn. I hope I got it right this time.

Thanks again for the heads up on this matter.
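
For anyone finding this thread later, the conclusion above can be sketched as a my.cnf fragment for each node. This is only an illustration of what was discussed, not a verified configuration: the IP addresses (192.0.2.11 for N1, 192.0.2.12 for N2) are made-up placeholders, and placing the option in the [mysqld] section is an assumption about a typical setup.

```ini
# --- N1, only while bootstrapping a brand-new cluster (or full recovery) ---
# An empty address tells the node to form a new cluster on its own.
# Temporary setting: change it as soon as the other nodes are in sync.
[mysqld]
wsrep_cluster_address=gcomm://

# --- N1, once N2 and N3 have joined and synced ---
# Point N1 at an existing member (here N2, placeholder IP) so that after a
# crash or restart it rejoins the running cluster instead of creating a new one.
[mysqld]
wsrep_cluster_address=gcomm://192.0.2.12

# --- N2 and N3 (as in the original post) ---
# Both point at N1 (placeholder IP).
[mysqld]
wsrep_cluster_address=gcomm://192.0.2.11
```

The change is made in my.cnf rather than at runtime because, as noted earlier in the thread, with rsync SST the node has to join the cluster before mysqld is fully initialized, so the value read at startup is the one that matters.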