Galera cluster + HAProxy + Split brain


Xavier De Arburn

Apr 29, 2014, 6:14:29 AM
to codersh...@googlegroups.com
Hi guys,

I set up a 3-node MariaDB Galera cluster.
I have a website that uses the cluster.
I use HAProxy for load balancing.

Everything is working great. 

If node 1 is down, HAProxy redirects SQL requests to node 2 or node 3, and vice versa.

I just want to know: when 2 nodes are down, why is the cluster still available? Isn't it supposed to be in a split-brain state (non-Primary) and refuse any requests?

If nodes 1 and 2 are down, wsrep_cluster_status on node 3 remains Primary and the website still works.
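
For reference, the quick way to check that status on any node is with the standard Galera status variables; wsrep_cluster_status should read 'Primary' on a healthy node, and wsrep_cluster_size shows how many members are in the current component:

mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"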

When nodes 1 and 2 come back, synchronization also works.

I am not using a Galera arbitrator, and everything keeps working even when only one node is up.

Is this normal?

Thank you

Jay Janssen

Apr 29, 2014, 8:11:19 AM
to Xavier De Arburn, codersh...@googlegroups.com
Did you set pc.weight in wsrep_provider_options on node3 by chance?

Xavier De Arburn

Apr 29, 2014, 8:36:42 AM
to codersh...@googlegroups.com, Xavier De Arburn
Did you set pc.weight in wsrep_provider_options on node3 by chance?

Nope. Here is my cluster.cnf, the same on all nodes:

[mysqld]
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=256M; gcache.page_size=128M"
#wsrep_cluster_address=gcomm://
wsrep_cluster_address=gcomm://192.168.0.2,192.168.0.3,192.168.0.4
wsrep_cluster_name="coueb-mdb-cluster"
wsrep_node_address="192.168.0.2"
wsrep_node_name="vm402"
wsrep_sst_method=xtrabackup
#wsrep_sst_auth="root:iop"
wsrep_node_incoming_address=192.168.0.2
wsrep_sst_receive_address=192.168.0.2
wsrep_slave_threads=16

yan.zhang

Apr 29, 2014, 8:42:33 AM
to codersh...@googlegroups.com
The primary component is selected by quorum; here is the detail on how quorum works: http://galeracluster.com/documentation-webpages/weightedquorum.html

If you don't set pc.weight, its default value is 1, so every node counts equally and the quorum calculation formula becomes

(sum(p_i) - sum(l_i)) / 2 < sum(m_i)

p_i: members of the last seen primary component
l_i: members that are known to have left gracefully
m_i: current component members

I think the tricky part is how you define a node as having "left gracefully". If you stop a node with a command like "stop_node" (giving the Galera instance the proper signal to exit), then the node left gracefully. But if you stop the node with "kill -9", then it did not leave gracefully.

So in your case, I think you stopped the nodes with a command like "stop_node", right? If so, let's look at the two steps.
a. Stop node1; node2 and node3 remain.
(3 - 1) / 2 < 2
So the remaining part is the primary component.

b. Stop node2; node3 remains.
(2 - 1) / 2 < 1
So the remaining node is still a primary component.

=====

If you really want to reproduce the split-brain case, at step b) you can kill node2 with 'kill -9'. Then

(2 - 0) / 2 == 1

which is not strictly less than 1, so node3 drops to non-Primary. The difference is that node2 did not leave gracefully.
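
A quick way to see the difference on a test cluster (assuming a typical Linux install where pidof can find the mysqld process):

# on node2: simulate a crash instead of a graceful shutdown
kill -9 $(pidof mysqld)
# on node3: the survivor should now report non-Primary and refuse application queries
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"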

Jay Janssen

Apr 29, 2014, 8:50:26 AM
to Xavier De Arburn, codersh...@googlegroups.com, Xavier De Arburn
How are the 2 nodes going down? Is it a crash, or are they simply being stopped? Note that nodes that are cleanly shut down exit the cluster gracefully, and this does not count as a quorum election. You can easily reduce the number of nodes in the primary component to 1 this way.
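
In command terms (assuming a standard init-script install, so the exact service name may differ), the two cases look like:

service mysql stop          # clean shutdown: the node leaves the group gracefully, no quorum vote
kill -9 $(pidof mysqld)     # crash: the node vanishes without saying goodbye, the survivors must recompute quorum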


erkan yanar

Apr 29, 2014, 2:08:42 PM
to codersh...@googlegroups.com
Get rid of the idea that there is a configured number of nodes in the cluster.
It is all about joining and leaving.
*Only* if a node leaves *without* saying goodbye is a quorum calculation done.
This happens when a node/mysqld crashes.
Shutting down does not provoke a quorum calculation.

Regards
Erkan


--
above the borders, freedom must surely be cloudless

Joe

Apr 30, 2014, 1:44:52 AM
to codersh...@googlegroups.com
Does anyone have an HAProxy config that they would be willing to share?

Kacper Gogół

Apr 30, 2014, 7:45:29 AM
to codersh...@googlegroups.com
You got an answer on the Galera list too, and there is a howto on the Percona site:

https://groups.google.com/forum/#!topic/codership-team/RO5ZyLnEWKo
http://www.percona.com/doc/percona-xtradb-cluster/5.5/howtos/haproxy.html
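
For the impatient, the approach in that Percona howto boils down to a sketch along these lines (IPs reused from earlier in this thread; it assumes the clustercheck script is exposed through xinetd on port 9200 on every node, as the howto describes):

listen mysql-galera 0.0.0.0:3306
    mode tcp
    balance leastconn
    option httpchk
    server node1 192.168.0.2:3306 check port 9200 inter 2000 rise 2 fall 3
    server node2 192.168.0.3:3306 check port 9200 inter 2000 rise 2 fall 3
    server node3 192.168.0.4:3306 check port 9200 inter 2000 rise 2 fall 3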


Xavier De Arburn

Apr 30, 2014, 11:25:25 AM
to codersh...@googlegroups.com, Xavier De Arburn
I'm beginning to understand: I simply stopped nodes 1 and 2, so this is not a split-brain state, because those two nodes exited the cluster gracefully.

One more question:

I have two nodes in datacenter 1 and another in datacenter 2.
Is it necessary to set pc.weight in wsrep_provider_options?
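
For reference, pc.weight is just another key inside wsrep_provider_options; the value below is purely illustrative, and whether weighting helps in a 2 + 1 layout depends on which side you want to keep Primary during a datacenter split:

wsrep_provider_options="gcache.size=256M; gcache.page_size=128M; pc.weight=2"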

Thank you

Sai Nuthalapati

Mar 14, 2016, 4:14:23 PM
to codership, dear...@gmail.com
I am looking at your post... We are in the process of setting up a Galera cluster in two data centers.

MarKus, did you get an answer to your question?


I would like to know the best practices for the number of nodes. Do you see any issues with having 2 nodes in one datacenter and 1 in another? When do we need to use the Arbitrator?
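
As a rough rule, the Galera Arbitrator (garbd) comes in when the vote count would otherwise be even, or as a tie-breaker in a third location when you only have two sites. A minimal invocation (cluster name and addresses reused from this thread, adjust to your setup) looks something like:

garbd --address "gcomm://192.168.0.2,192.168.0.3,192.168.0.4" \
      --group "coueb-mdb-cluster" \
      --daemon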