Galera High Availability


Soumaya Msallem

Nov 8, 2023, 1:54:45 AM
to codership
Hello Community,

I hope you are doing well.

I am a newbie to this fantastic world and community :-)

I have some questions. If someone could help me or point me to some useful links, I would be very grateful.

I understand that Galera is a plugin which handles synchronous, multi-master replication between a group of MariaDB servers in a cluster; this part is clear to me. However, in a lot of documentation I have seen Galera described as a High Availability solution for MariaDB. Does this mean that Galera is able to make a failed server rejoin the cluster? Is it able to track the health of failed nodes and have them leave and then rejoin the cluster? Does it have failover capability?

If someone has good documentation on this topic, I will be happy to read it.

Thanks in advance for your time.

My best regards,
Soumaya 


Private Citizen

Nov 8, 2023, 5:10:24 AM
to codership
I understand that Galera is a plugin which handles synchronous, multi-master replication between a group of MariaDB servers in a cluster; this part is clear to me. However, in a lot of documentation I have seen Galera described as a High Availability solution for MariaDB. Does this mean that Galera is able to make a failed server rejoin the cluster? Is it able to track the health of failed nodes and have them leave and then rejoin the cluster? Does it have failover capability?

The cluster is passive: if a node asks for updates, it answers. Think of it as a master-slave update relationship that is spider-webbed across all the nodes, instead of a single connection between two nodes or a hub with spokes.

The cluster keeps track of failed nodes only as far as knowing who is currently connected. If a node falls off, the cluster notes that there is one less node connected, and so one less node to send updates to. But it doesn't perform CPR on that node. It is up to each node to bring itself back online with the cluster, for example with local/systemd service monitoring and restarting.
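
As a rough illustration of that kind of local monitoring, here is a minimal Python sketch of a watchdog decision. It is not an official tool. It assumes you have already read the node's wsrep status variables (wsrep_cluster_status, wsrep_connected, wsrep_ready) into a dict, for example from SHOW GLOBAL STATUS LIKE 'wsrep_%'. The function names and the "mariadb" unit name are hypothetical and will differ per setup.

```python
import subprocess
from typing import Mapping


def node_is_healthy(status: Mapping[str, str]) -> bool:
    """True if the node is connected, in the primary component, and accepting queries."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_connected") == "ON"
        and status.get("wsrep_ready") == "ON"
    )


def next_action(process_running: bool, status: Mapping[str, str]) -> str:
    """Decide what a local watchdog should do for this one node."""
    if not process_running:
        # The server is down: bringing it back is the node's own job.
        return "restart"
    if node_is_healthy(status):
        return "ok"
    # Running but not ready (e.g. still joining, or no quorum):
    # a blind restart may not help, so only raise an alert here.
    return "alert"


def restart_service(unit: str = "mariadb") -> None:
    """Restart the database service; the unit name is an assumption."""
    subprocess.run(["systemctl", "restart", unit], check=True)
```

In practice a Restart= policy in the systemd unit can cover the "process died" case on its own; a script like this is only useful if you also want to look at the wsrep status.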

The one thing to keep in mind with all of this is split brain. If enough nodes fail before they are able to get back online, the surviving nodes can lose quorum and you end up with a locked-up database (split-brain protection), depending on how you balance the weights of each node.
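
To make the weight balancing concrete, here is a small Python sketch of a weighted-majority check. It is a simplification under my own assumptions, not Galera's actual code: the surviving group keeps quorum only if its total weight is strictly more than half the weight of the previous membership, not counting nodes that left gracefully. The function and node names are made up for the example.

```python
from typing import Iterable, Mapping


def has_quorum(
    previous_weights: Mapping[str, int],
    reachable: Iterable[str],
    left_gracefully: Iterable[str] = (),
) -> bool:
    """Weighted-majority check for the group of nodes that can still see each other."""
    left = set(left_gracefully)
    # Nodes that shut down cleanly no longer count toward the expected total.
    expected = sum(w for node, w in previous_weights.items() if node not in left)
    present = sum(
        previous_weights[node]
        for node in set(reachable) - left
        if node in previous_weights
    )
    # Strictly more than half: an exact 50/50 split loses quorum on both sides.
    return present > expected / 2


three_nodes = {"node1": 1, "node2": 1, "node3": 1}
print(has_quorum(three_nodes, ["node1", "node2"]))  # one node crashed
print(has_quorum(three_nodes, ["node1"]))           # two nodes crashed
```

With three equally weighted nodes, losing one still leaves a majority, while losing two at once does not. This is also why a two-node cluster with equal weights locks up as soon as either node crashes.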
