Maintaining uptime for an even-numbered cluster


Mati

Jan 11, 2017, 10:33:11 AM
to codership
Hello,

We have a 2x2 Galera cluster (2 data centers, 2 servers per data center), which is obviously far from the optimal or recommended layout, but we are operating under budget constraints.
Yesterday and today one data center failed and only the 2 servers in data center B remained active. Because 2 of 4 servers is not a majority, the survivors lost quorum and the entire cluster became inactive.
Is it possible to configure the cluster so that if data center A is down but data center B is still up, the cluster remains operational?

Kind regards,
Mati Skiba

Philip Stoev

Jan 11, 2017, 11:09:55 AM
to Mati, codersh...@googlegroups.com
Hello,

You may wish to do one of the following:

1. Install a Galera Arbitrator (garbd) to run in datacenter A. It does not
require a full-blown database server with storage, but it will receive all
replication events from the rest of the cluster.
2. Look at the pc.weight wsrep provider option, which affects the way cluster
quorum is calculated. You may find it useful to simulate various failure
scenarios using a spreadsheet and determine whether the cluster will behave as
desired under modified values of pc.weight.
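
The spreadsheet exercise suggested above can also be done in a few lines of Python. This is only a rough sketch under a simplified assumption: a surviving group keeps quorum when it holds strictly more than half of the total weight of the cluster as it was before the failure. Graceful departures and successive failures are not modelled. The node names (a1, a2, b1, b2) and the choice of which node gets the higher weight are hypothetical, not taken from the original setup.

```python
# Simplified quorum model: the survivors stay primary only if they hold
# strictly more than half of the total weight of the previous cluster.
# Graceful departures and multi-step failures are ignored in this sketch.

def has_quorum(weights, survivors):
    total = sum(weights.values())
    alive = sum(weights[node] for node in survivors)
    return 2 * alive > total


def survives_dc_loss(weights, location, failed_dc):
    """True if the nodes outside failed_dc keep quorum after it goes down."""
    survivors = [node for node in weights if location[node] != failed_dc]
    return has_quorum(weights, survivors)


# Hypothetical node names; A and B are the two data centers from the thread.
location = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

scenarios = {
    "equal weights (pc.weight=1 everywhere)": {"a1": 1, "a2": 1, "b1": 1, "b2": 1},
    "b1 raised to pc.weight=2": {"a1": 1, "a2": 1, "b1": 2, "b2": 1},
}

for name, weights in scenarios.items():
    for dc in ("A", "B"):
        ok = survives_dc_loss(weights, location, dc)
        print(f"{name}: lose DC {dc} -> survivors keep quorum: {ok}")
```

Under this model, giving one node in data center B a higher weight lets B keep quorum when A fails, but the price is that A can no longer survive on its own when B fails. The weight itself would be set through wsrep_provider_options (for example pc.weight=2 on the chosen node); verify the behaviour against the Galera documentation for your version before relying on it.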

Thank you.

Philip Stoev

Mati

Jan 14, 2017, 8:33:25 AM
to codership, batt...@gmail.com
Great ideas.

Thank you very much,
Mati Skiba

Emil Petkov

Jan 22, 2017, 12:05:14 AM
to codership
Hi Mati

You can definitely play with pc.weight or run a Galera arbitrator in DC A or DC B; it is relatively lightweight.

But my advice would be to add another node in a third data center, DC C, which can arbitrate between DC A and DC B.

It can be a full-blown Galera node, and can also serve as a backup in case of major disruptions.

This is the preferred design for active-active, highly available 2-DC scenarios such as yours.
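
To illustrate why the third site helps, here is a minimal sketch that counts nodes per data center and checks whether the survivors are still a strict majority after any single data center is lost. It assumes every node has the default weight of 1 and that only one site fails at a time; the 2+2+1 split is an illustrative layout, not a measured configuration.

```python
# Each node counts as one vote (default weight assumed). The survivors keep
# quorum only if they are a strict majority of the original cluster.

def majority_survives(nodes_per_dc, failed_dc):
    total = sum(nodes_per_dc.values())
    alive = total - nodes_per_dc[failed_dc]
    return 2 * alive > total


two_dc = {"A": 2, "B": 2}             # layout described in the thread
three_dc = {"A": 2, "B": 2, "C": 1}   # assumed layout with one extra node in DC C

for label, layout in (("2+2", two_dc), ("2+2+1", three_dc)):
    for dc in layout:
        ok = majority_survives(layout, dc)
        print(f"{label}: lose DC {dc} -> survivors keep quorum: {ok}")
```

With the extra node in DC C, losing any one data center leaves at least 3 of 5 nodes, so the remaining sites keep quorum in this simplified count.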

--
Emil