Avoiding NON-PRIMARY situation in 2-node deployment

SyRenity

Apr 2, 2012, 11:50:25 AM

to codersh...@googlegroups.com

Hi.

Is there a safe way to designate a node as primary in a 2-node deployment, so that if the other node goes offline, the primary node won't drop into NON-PRIMARY status and will still allow writes and reads?

That way the split-brain situation would be avoided, as the 1st node is expected to have the up-to-date state and is considered the main one.

Or is the only safe solution to have arbitrators running on additional machines?

By the way, I recall that some related work-around for 2-node clusters was added in the past. Does anyone remember how it worked?

Thanks!

Alex Yurchenko

Apr 2, 2012, 1:59:40 PM

to codersh...@googlegroups.com

Hi,

In multi-master mode, running an arbitrator on one of the nodes will be
pretty much equivalent to declaring that node as "primary".
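
As a rough illustration of why the arbitrator placement has this effect, here is a minimal Python sketch. It assumes a simple unweighted strict-majority rule; the real quorum calculation in Galera is more involved (node weights, for example), and the member names below are hypothetical.

```python
def has_quorum(reachable, last_primary_members):
    """Strict-majority rule: a partition stays primary only if it still
    sees more than half of the members of the last primary component."""
    alive = len(set(reachable) & set(last_primary_members))
    return 2 * alive > len(last_primary_members)

# Hypothetical layout: the arbitrator ("garbd") shares a host with node1.
members = ["node1", "garbd", "node2"]

# Host of node2 goes offline: node1 still sees itself and the arbitrator.
print(has_quorum(["node1", "garbd"], members))    # True: node1 stays primary

# Host of node1 goes offline: the arbitrator goes down with it.
print(has_quorum(["node2"], members))             # False: node2 is non-primary

# Without an arbitrator, a lone survivor of 2 nodes sees exactly half.
print(has_quorum(["node1"], ["node1", "node2"]))  # False
```

In this model the node that shares a host with the arbitrator always carries two of the three votes, which is what makes it behave like a designated primary.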

You can also set pc.ignore_sb; if you use the cluster in master-slave
mode, that should be quite safe. That's how you can emulate the classic
2-node HA schemes with virtual IPs.
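
For reference, a small Python sketch of how such a setting could be written out. It assumes the option is passed through the `wsrep_provider_options` server variable as a semicolon-separated `key=value` list, and that `true` is the enabling value; the helper name is hypothetical.

```python
def render_provider_options(options):
    """Join key/value pairs into the 'key=value; key=value' form
    used for the wsrep_provider_options variable."""
    return "; ".join(f"{key}={value}" for key, value in options.items())

# pc.ignore_sb comes from the thread; "true" is assumed to be the enabling value.
options = {"pc.ignore_sb": "true"}

line = f'wsrep_provider_options="{render_provider_options(options)}"'
print(line)  # wsrep_provider_options="pc.ignore_sb=true"
```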

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

SyRenity

Apr 2, 2012, 4:01:05 PM

to codersh...@googlegroups.com

Hi.

> In multi-master mode, running an arbitrator on one of the nodes will be
> pretty much equivalent to declaring that node as "primary".

Good idea about the arbitrator on the 1st node - I presume that, as it runs on the same machine, there won't be any traffic and the processing will be negligible.

> You can also set pc.ignore_sb; if you use the cluster in master-slave
> mode, that should be quite safe. That's how you can emulate the classic
> 2-node HA schemes with virtual IPs.

With this setting, when the 2nd node restarts, will it discard any locally different data and just get the state from the 1st node?

Alex Yurchenko

Apr 2, 2012, 5:04:30 PM

to codersh...@googlegroups.com

On 2012-04-02 23:01, SyRenity wrote:
> Hi.
>
>> In multi-master mode, running an arbitrator on one of the nodes will be
>> pretty much equivalent to declaring that node as "primary".
>
> Good idea about the arbitrator on the 1st node - I presume that, as it runs
> on the same machine, there won't be any traffic and the processing will be
> negligible.

As a matter of fact, there will be additional traffic, but in most cases
it won't be noticeable. I have not yet seen a situation where the network
was a bottleneck for replication.

>> You can also set pc.ignore_sb; if you use the cluster in master-slave
>> mode, that should be quite safe. That's how you can emulate the classic
>> 2-node HA schemes with virtual IPs.
>
> With this setting, when the 2nd node restarts, will it discard any locally
> different data and just get the state from the 1st node?

You are not supposed to have different data in a master-slave cluster.
You may have missing transactions, but those will be covered by state
transfer. Never do this in a multi-master cluster. If your data diverges,
there is no way to detect it.

SyRenity

Apr 3, 2012, 4:44:32 AM

to codersh...@googlegroups.com

Hi.

> As a matter of fact, there will be additional traffic, but in most cases
> it won't be noticeable. I have not yet seen a situation where the network
> was a bottleneck for replication.

Will there be additional traffic even if it runs on the same machine?

> You are not supposed to have different data in a master-slave cluster.
> You may have missing transactions, but those will be covered by state
> transfer. Never do this in a multi-master cluster. If your data diverges,
> there is no way to detect it.

In case some data does get written to the slave (due to an application error, etc.), will the state transfer from the primary erase it and restore everything to the same data?

Regards.

Alex Yurchenko

Apr 3, 2012, 6:50:09 PM

to codersh...@googlegroups.com

On 2012-04-03 11:44, SyRenity wrote:
> Hi.
>
>> As a matter of fact, there will be additional traffic, but in most cases
>> it won't be noticeable. I have not yet seen a situation where the network
>> was a bottleneck for replication.
>
> Will there be additional traffic even if it runs on the same machine?

Yes, the arbitrator looks like a node to the rest of the cluster.

>> You are not supposed to have different data in a master-slave cluster.
>> You may have missing transactions, but those will be covered by state
>> transfer. Never do this in a multi-master cluster. If your data diverges,
>> there is no way to detect it.
>
> In case some data does get written to the slave (due to an application
> error, etc.), will the state transfer from the primary erase it and
> restore everything to the same data?

Suppose nodes A and B split at seqno 10, and after that node A was
updated to seqno 12 and node B to seqno 13.

1) they won't automatically remerge because each of them will believe
the other node to be dead and form its own primary component. We can't
have PC nodes keep on reconnecting to failed ones.

2) if you make node A join node B, it will receive IST and you'll have
data inconsistency.

3) if you make node B join node A, it will receive full SST and become
consistent with node A.
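
Cases 2 and 3 can be modelled with a toy Python sketch. This is only an illustration under simplifying assumptions: each node is a dictionary from seqno to write-set, `ist` copies just the write-sets above the joiner's seqno, and `sst` replaces the joiner's state wholesale. The function names and write-set labels are hypothetical, not Galera APIs.

```python
def ist(joiner, donor):
    """Incremental transfer: copy only the write-sets above the joiner's seqno."""
    merged = dict(joiner)
    for seqno in range(max(joiner) + 1, max(donor) + 1):
        merged[seqno] = donor[seqno]
    return merged

def sst(joiner, donor):
    """Full snapshot: the joiner's state is replaced by the donor's."""
    return dict(donor)

common = {n: f"trx{n}" for n in range(1, 11)}         # shared history up to seqno 10
node_a = {**common, 11: "A11", 12: "A12"}             # A advanced to seqno 12
node_b = {**common, 11: "B11", 12: "B12", 13: "B13"}  # B advanced to seqno 13

# Case 2: A joins B. A looks only one seqno behind, so it gets just seqno 13.
a_after = ist(node_a, node_b)
print(a_after == node_b)  # False: A keeps its own 11-12, which differ from B's

# Case 3: B joins A. B is ahead of the donor, so nothing incremental applies
# and this sketch falls back to a full snapshot.
b_after = sst(node_b, node_a)
print(b_after == node_a)  # True
```

The point of the model is that seqnos alone say nothing about content: after the split, seqno 11 on A and seqno 11 on B are different transactions, so an incremental catch-up silently keeps the divergent ones.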

Does this explain it?

Regards,
Alex

> Regards.

Henrik Ingo

Apr 3, 2012, 7:08:12 PM

to Alex Yurchenko, codersh...@googlegroups.com

On Wed, Apr 4, 2012 at 1:50 AM, Alex Yurchenko <alexey.y...@codership.com> wrote:

> Suppose nodes A and B split at seqno 10, and after that node A was updated to seqno 12 and node B to seqno 13.
>
> 1) they won't automatically remerge because each of them will believe the other node to be dead and form its own primary component. We can't have PC nodes keep on reconnecting to failed ones.
>
> 2) if you make node A join node B, it will receive IST and you'll have data inconsistency.
>
> 3) if you make node B join node A, it will receive full SST and become consistent with node A.

Ah, so with the introduction of IST, using pc.ignore_sb is even more dangerous than before!

Good time to remind everyone of a bug I once reported; suddenly it is not so unlikely to happen anymore:

https://bugs.launchpad.net/galera/+bug/843752

henrik

--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

SyRenity

Apr 4, 2012, 9:06:20 AM

to codersh...@googlegroups.com

Hi.

> Suppose nodes A and B split at seqno 10, and after that node A was
> updated to seqno 12 and node B to seqno 13.
>
> 1) they won't automatically remerge because each of them will believe
> the other node to be dead and form its own primary component. We can't
> have PC nodes keep on reconnecting to failed ones.
>
> 2) if you make node A join node B, it will receive IST and you'll have
> data inconsistency.
>
> 3) if you make node B join node A, it will receive full SST and become
> consistent with node A.
>
> Does this explain it?

Yes, I see now.

So as long as we make sure that node B connects to node A (and not the other way around) and gets a full SST, this option is safe to use. Correct?

Henrik Ingo

Apr 4, 2012, 2:22:39 PM

to SyRenity, codersh...@googlegroups.com

On Wed, Apr 4, 2012 at 4:06 PM, SyRenity <stas....@gmail.com> wrote:

> So as long as we make sure that node B connects to node A (and not the other way around) and gets a full SST, this option is safe to use. Correct?

But always remember that's quite a big if :-) The reason people want true synchronous replication like Galera provides (with 3 nodes) is that we make human errors all the time, and then A and B are screwed.

henrik

SyRenity

Apr 4, 2012, 2:32:28 PM

to codersh...@googlegroups.com, SyRenity, henri...@avoinelama.fi

> But always remember that's quite a big if :-) The reason people want true synchronous replication like Galera provides (with 3 nodes) is that we make human errors all the time, and then A and B are screwed.

You have a point, of course, and if it were not for the duplicated traffic required for each arbitrator, I would have one running on every application server.

Alexey Yurchenko

Apr 18, 2012, 11:09:37 AM

to codersh...@googlegroups.com, SyRenity, henri...@avoinelama.fi

On Wednesday, April 4, 2012 9:32:28 PM UTC+3, SyRenity wrote:

>> But always remember that's quite a big if :-) The reason people want true synchronous replication like Galera provides (with 3 nodes) is that we make human errors all the time, and then A and B are screwed.
>
> You have a point, of course, and if it were not for the duplicated traffic required for each arbitrator, I would have one running on every application server.

Note that replication traffic is far thinner than client-server traffic.

SyRenity

Apr 19, 2012, 4:37:00 PM

to codersh...@googlegroups.com, SyRenity, henri...@avoinelama.fi

Hi.

> Note that replication traffic is far thinner than client-server traffic.

Do you have any approximate numbers?

Alex Yurchenko

Apr 19, 2012, 5:02:53 PM

to codersh...@googlegroups.com

YMMV, but for sysbench it is about 10 times thinner.

Vadim Tkachenko

Apr 19, 2012, 11:06:31 PM

to Alex Yurchenko, codersh...@googlegroups.com

In our experiments it seems that sysbench, when running on a single
dedicated box and connecting to 3 nodes, already uses the whole 1Gb
network, and adding further nodes does not increase throughput
because the sysbench<->switch link is overloaded.

So it really seems that the client network becomes the bottleneck sooner.

--
Vadim Tkachenko, CTO, Percona Inc.
Phone +1-925-400-7377, Skype: vadimtk153
Schedule meeting: http://tungle.me/VadimTkachenko

SyRenity

Apr 22, 2012, 6:41:43 AM

to codersh...@googlegroups.com

On Friday, April 20, 2012 12:02:53 AM UTC+3, Alexey Yurchenko wrote:

> YMMV, but for sysbench it is about 10 times thinner.

Wow, then it is significant indeed.

Will definitely look into running the arbitrator on the same node as the primary.