MariaDB Galera and Geo redundancy


Sergio Charrua

Oct 7, 2024, 2:39:29 PM
to codership
Hello all,

I am trying to simulate 2 clusters in different datacenters/locations, using VMware ESXi and 6 VMs: 3 are located in DC1 and the other 3 in DC2. All nodes run RHEL 9.
Both DCs communicate through the same VLAN and this works fine.
I have installed MariaDB 10.5.22 + Galera.

In DC1, all 3 nodes are data nodes; node #1 was bootstrapped using the galera_new_cluster script.
In DC2, 2 nodes are data nodes and I added 1 arbitrator.

When the nodes start, all 6 connect to the same cluster and share data, and this seems to be working fine.

The problem appears when I simulate a network outage: I modify the VLAN ID on DC2 and both DCs lose connectivity.
Looking at each node's logs, they do lose connectivity, but:
- all data nodes (3 in DC1 and 2 in DC2) become unavailable for queries
- the arbitrator keeps retrying to connect to nodes, e.g.:
Oct  7 20:08:16 ire-lab-se3 garb-systemd[8999]: 2024-10-07 20:08:16.562  INFO: (dd79df53-972e, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.20.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 358971429 cwnd: 1 last_queued_since: 359271583299206 last_delivered_since: 359271583299206 send_queue_length: 0 send_queue_bytes: 0

I was expecting that:
- DC1 would remain available and synced; after all, the bootstrap was on node #1 and there are 3 nodes available on this network
- the garbd arbitrator would handle the remaining 2 nodes in DC2, allowing applications in DC2 to use the current data


I have tried multiple test scenarios, and this one (network outage) is the only one that fails; I can't seem to solve the issue.
The configuration file on each Galera data node is similar to the following (only the node name and ID differ):

#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql/
#

# this is read by the standalone daemon and embedded servers
[server]
log_error=/var/log/mariadb/mariadb.err
log_warnings=9
#default_time_zone='UTC'
table_definition_cache=4096
table_open_cache=4096
#innodb_read_only_compressed=OFF # only for 10.6 and later

# this is only for the mysqld standalone daemon
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mysqld/mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid

# disabling symlinks is recommend to prevent assorted security risks
symbolic_links=0

#enable binary logging
log_bin=/var/log/mariadb/mariadb-bin
log_bin_index=/var/log/mariadb/mariadb-bin.index

#enable relay log files
relay_log=/var/log/mariadb/relay-bin
relay_log_index=/var/log/mariadb/relay-bin.index

log_slave_updates=1
performance_schema=ON
interactive_timeout=180
wait_timeout=180

max_connections=500

#
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_on=ON
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0

# this is only for embedded server
[embedded]

# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]

# This group is only read by MariaDB-10.5 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.5]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
wsrep_on=ON
query_cache_size=0
query_cache_type=0
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
# below parameter should not include Arbitrators, only MariaDB nodes
wsrep_cluster_address="gcomm://10.20.0.1,10.20.0.2,10.20.0.3,10.20.0.4,10.20.0.5"
wsrep_cluster_name='galera_cluster_LAB'
wsrep_node_address='10.20.0.1'
wsrep_node_name='galera_LAB_1'
wsrep_sst_method='rsync'
#'mariabackup'
#wsrep_sst_auth=backupuser;backupuser
server_id=2
wsrep_provider_options='gmcast.segment=1;gcache.size=2G'
#wsrep_sst_donor="galera_1"



and the garbd arbitrator is configured as:

# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.

# A comma-separated list of node addresses (address[:port]) in the cluster
 GALERA_NODES="10.20.0.4:4567 10.20.0.5:4567"

# Galera cluster name, should be the same as on the rest of the nodes.
 GALERA_GROUP="galera_cluster_LAB"

# Optional Galera internal options string (e.g. SSL settings)
# see https://galeracluster.com/library/documentation/galera-parameters.html
 GALERA_OPTIONS=""

# Log file for garbd. Optional, by default logs to syslog
# FOR SOME REASON IT DOESN'T WORK; RETURNS AN ERROR PROBABLY RELATED TO PERMISSIONS
# LOG_FILE="/var/log/mariadb/garbd.log"

# Where to persist necessary data
# WORK_DIR=""


So I think this is pretty much standard.

I googled a lot but could not find anything that makes this work as expected. The requirements are that, in production, both DCs communicate freely and, if a network outage happens, application servers in both DCs continue working with their local MariaDB Galera nodes (querying and inserting new rows, if required). Once the network is up again, all nodes should sync data with each other (new data from DC1 exported to DC2, new data from DC2 exported to DC1, and so on).

Am I missing something here? Could anyone help?

Thanks

Sergio



Matt Horwood

Oct 8, 2024, 3:05:39 AM
to codership
Hi Sergio,

We have used Galera in cross-DC mode. The first bit of magic is that any cluster needs an odd number of members. Otherwise you could end up with split brain, where no node knows who has the full data.
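
To make the odd-number point concrete, here is a minimal Python sketch of the majority rule. It assumes every member (the arbitrator included) carries the default vote weight of 1, and that a partition stays primary only if it holds strictly more than half of the votes of the last full membership. The function and variable names are made up for illustration and are not part of Galera; the real quorum calculation also takes `pc.weight` and gracefully departed nodes into account.

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """Return True if the partition holds strictly more than half of all votes."""
    return 2 * partition_votes > total_votes


# Layout described in the original post (assumption: weight 1 per member).
dc1_votes = 3          # three data nodes in DC1
dc2_votes = 2 + 1      # two data nodes plus the garbd arbitrator in DC2
total_votes = dc1_votes + dc2_votes

# State of each side after the link between the datacenters is cut.
dc1_primary = has_quorum(dc1_votes, total_votes)
dc2_primary = has_quorum(dc2_votes, total_votes)

print("DC1 keeps quorum:", dc1_primary)
print("DC2 keeps quorum:", dc2_primary)
```

With that layout the split is 3 votes against 3, so neither side passes the check, which is consistent with all data nodes refusing queries during the simulated outage.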

Also, each DC needs a different `gmcast.segment`; that way the cluster knows which nodes are local and which are remote.
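
As a rough way to check this, the following Python sketch parses `wsrep_provider_options` strings and verifies that each datacenter uses its own segment. The DC1 string is the one from the posted config; the DC2 string with `gmcast.segment=2` is a hypothetical value chosen for illustration, and the helper name is invented here. A missing `gmcast.segment` is treated as 0, which is the provider default.

```python
def parse_provider_options(options: str) -> dict[str, str]:
    """Split a 'key=value;key=value' provider options string into a dict."""
    parsed = {}
    for item in options.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


# DC1 value is taken from the posted my.cnf; the DC2 value is hypothetical.
options_per_dc = {
    "DC1": "gmcast.segment=1;gcache.size=2G",
    "DC2": "gmcast.segment=2;gcache.size=2G",
}

# Extract the segment configured for each datacenter (default is 0).
segments = {
    dc: parse_provider_options(opts).get("gmcast.segment", "0")
    for dc, opts in options_per_dc.items()
}

# Every datacenter should end up with its own segment number.
segments_distinct = len(set(segments.values())) == len(segments)
print(segments, "distinct:", segments_distinct)
```

If every node were started with the same options string, the check would report that the segments are not distinct.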

Sergio Charrua

Oct 8, 2024, 5:12:25 AM
to codership
Thanks Matt,

I will retry the setup, but this time with 3 data nodes in each DC and without an arbitrator, and see the results.
