Galera Cluster backup via SST


hunter86bg

Jan 27, 2016, 6:49:54 AM
to codership
Hello guys,

For the past 4-5 days I have been trying to take a backup on a test platform of 4 VMs: 3 CentOS 7 VMs (latest updates, minimal install) and 1 backup VM running rsyncd. I invoke the backup through the Galera Arbitrator and rsyncd with a custom script. It works, but only if I use the "wsrep_sst_rsync" script. Copying this script to "wsrep_sst_backup" in the same directory causes issues, although the two files are identical.
Trying to invoke an SST via xtrabackup-v2 failed with an error complaining about the joiner's version.
Here is a short guide to how I managed to create the Galera backup.

1. Prepare an rsync server:

A) Prepare the rsync daemon to run constantly on the receiving node(s):


Edit the config file:

# vim /etc/rsyncd.conf


lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[BACKUP]
path = /share2/galera_backup
uid = root
gid = root
# secrets file is not needed if you exchange the ssh keys (passwordless)
secrets file = /etc/rsync.secrets
read only = no
hosts allow = galera1,galera2,galera3,galera4



B) Modify the systemd service so that the rsync daemon is always listening, as we don't have an rsync cluster resource.

# cp /usr/lib/systemd/system/rsyncd.service /etc/systemd/system/
# vim /etc/systemd/system/rsyncd.service

[Unit]
Description=fast remote file copy program daemon
ConditionPathExists=/etc/rsyncd.conf

[Service]
PIDFile=/var/run/rsyncd.pid
Lockfile=/var/run/rsyncd.lock
EnvironmentFile=/etc/sysconfig/rsyncd
ExecStart=/usr/bin/rsync --daemon --no-detach "$OPTIONS"
ExecStop=/usr/bin/pkill rsync
ExecStopPost=/usr/bin/rm /var/run/rsyncd.pid
ExecStopPost=/usr/bin/rm /var/run/rsyncd.lock

RestartSec=5s
Restart=always

[Install]
WantedBy=default.target


Save the file, then apply the changes via "systemctl daemon-reload".


Set rsyncd to start on boot and start it:

# systemctl enable rsyncd && systemctl start rsyncd

Check that rsyncd is listening on its default port:

# netstat -alpn | grep rsync
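
If netstat is not available on the box (a minimal install may not ship net-tools), the same check can be sketched in plain bash. This is only a sketch: the helper name port_open is my own, and it assumes rsyncd is on its default port 873.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
port_open() {
    host=$1
    port=$2
    # /dev/tcp is a bash feature; timeout guards against filtered ports.
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (873 is the default rsync daemon port, "backup" is an example hostname):
# port_open backup 873 && echo "rsyncd reachable" || echo "rsyncd not reachable"
```

The advantage over grepping netstat is that it can be run from the donor, so it also catches firewall problems between the two hosts.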


C) Create the folder defined in the BACKUP section of the rsync config:

# mkdir -p /share2/galera_backup

And provide proper permissions according to the users that will be used for backup.


Now test that rsync can actually copy files by running this command from the donor node:

# rsync -av /some_test_directory rsync://root@rsync_server_hostname_or_ip/BACKUP

Note that BACKUP is the section defined in our rsync server's config (it's not a folder).

If it succeeds, we can proceed with the next step.
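
If you want to repeat that test from a script, a small wrapper along these lines might help. It is a sketch only: the function name and the marker file name are made up, and the destination is whatever rsync:// URL points at your [BACKUP] section.

```shell
# Hypothetical helper: push a small marker file to an rsync destination.
# $1 is the destination, e.g. rsync://root@backup/BACKUP
rsync_smoke_test() {
    dest=$1
    tmpdir=$(mktemp -d) || return 1
    date > "$tmpdir/rsync_smoke_test.txt"
    rsync -av "$tmpdir/rsync_smoke_test.txt" "$dest/"
    rc=$?
    # Clean up locally and hand back rsync's own exit code.
    rm -rf "$tmpdir"
    return $rc
}

# Usage:
# rsync_smoke_test rsync://root@backup/BACKUP || echo "rsync test failed"
```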


2. Prepare the donor for backup.

First, we need to select a node that will act as the donor for our platform.

The script calling the donor must check the donor's status! The status can be checked with the following query: "SHOW STATUS LIKE 'wsrep_local_state_comment';".

Create a user to monitor the server with:

CREATE USER monitor@'%';

The user does not need any grants at all.
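
As a rough sketch, the status check could be wrapped like this. The function names are my own; the only things taken from the guide are the query and the "Synced" state. It compares the value exactly instead of grepping the whole output.

```shell
# Hypothetical helper: read tab-separated "SHOW STATUS" output on stdin
# and print the value of the status variable named in $1.
wsrep_status_value() {
    awk -F '\t' -v name="$1" '$1 == name { print $2 }'
}

# Hypothetical helper: succeed only if the node reports exactly "Synced".
# $1 = mysql user, $2 = donor hostname
donor_is_synced() {
    state=$(mysql -u "$1" -h "$2" -N -B \
        -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" 2>/dev/null \
        | wsrep_status_value wsrep_local_state_comment)
    [ "$state" = "Synced" ]
}

# Usage:
# donor_is_synced monitor galera1 && echo "galera1 is in sync"
```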


Example of a Galera Arbitrator command:

garbd --address gcomm://galera1:4567 --group DbCluster --donor galera1 --sst 'rsync:galera4/BACKUP'

garbd -------> Galera Arbitrator process (daemon or one-time)

--address gcomm://galera1:4567 ----> "gcomm://hostname:default_port_for_node_communication"; this address is configured in the my.cnf file of every Galera node

--group DbCluster ---> the cluster's name, as defined in the my.cnf file of every Galera node

--donor galera1 ----> informs the cluster that the node with hostname galera1 will be the donor, and that it should not become a donor for any other node

--sst 'rsync:galera4/BACKUP' ----> the State Snapshot Transfer method; in our case it is rsync (":" is the separator), followed by the hostname where the rsync daemon is listening ("/" is the separator) and BACKUP, which is defined in rsyncd.conf as the [BACKUP] section with its parameters

Note: this example is from a host where mysqld is not running. If the garbd process is invoked from a running node, we need to tell the cluster which port our Galera Arbitrator will listen on, via this modified command:

garbd --address gcomm://galera1:4567?gmcast.listen_addr=tcp://0.0.0.0:4444 --group DbCluster --donor galera1 --sst 'rsync:galera4/BACKUP'
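
To avoid typing the two variants by hand, the command line can be assembled by a small function. This is just a sketch that prints the command (a dry run) rather than executing it. The function name and argument order are my own, and port 4567 is the one used in the examples above.

```shell
# Hypothetical helper: print the garbd command line for a backup SST.
# Arguments: donor, cluster name, rsync host, rsync section, optional listen port.
garbd_backup_cmd() {
    donor=$1
    group=$2
    receiver=$3
    module=$4
    listen_port=${5:-}
    address="gcomm://${donor}:4567"
    # Only add gmcast.listen_addr when garbd runs on a host with a running node.
    if [ -n "$listen_port" ]; then
        address="${address}?gmcast.listen_addr=tcp://0.0.0.0:${listen_port}"
    fi
    printf "garbd --address '%s' --group %s --donor %s --sst 'rsync:%s/%s'\n" \
        "$address" "$group" "$donor" "$receiver" "$module"
}

# Usage:
# garbd_backup_cmd galera1 DbCluster galera4 BACKUP        # host without mysqld
# garbd_backup_cmd galera1 DbCluster galera4 BACKUP 4444   # host with a running node
```

The address is printed in single quotes so that the "?" in it is never treated as a shell wildcard.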


3. Create a custom invoking script that checks whether the current node is the active node, checks that the donor is in sync with the rest of the cluster, checks the number of nodes (as we don't want to leave the cluster with fewer than 2 working nodes), checks that the receiving side is available, and only then invokes the Galera Arbitrator to initiate a backup via SST.

I have made a copy of the " /usr/bin/wsrep_sst_backup" and edited line 187 and removed "-log_dir".
Here is my custom script to invoke the backup:
#!/bin/bash -uex
# This script makes several checks before initiating a Galera backup
# Developed by: S.Nikolov
# ver.: 0.2

cluster_name=GALERA
donor=galera3
receiver=backup            # hostname where rsyncd will be listening (could be our host)
minimum_cluster_size=2
destination=BACKUP         # rsync section name
mysql_user=monitor         # cluster status user "CREATE USER monitor@'%';"

# check if the donor is in sync (the "if" tests grep's exit status, not cut's)
if mysql -u "$mysql_user" -h "$donor" -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" 2> /dev/null | grep -q Synced; then
    echo "Donor $donor is in sync"
else
    echo "Donor not in sync or we can't get its status. Cannot proceed with the backup. Exiting"
    exit 1
fi

# check the cluster size (we don't want to start a backup with only 1 node)
actual_cluster_size=$(mysql -u "$mysql_user" -h "$donor" -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2> /dev/null | grep size | cut -f 2)
if [ -z "$actual_cluster_size" ]; then    # if we didn't get the cluster size -> exit
    echo "Cluster size unknown"
    exit 1
elif [ "$actual_cluster_size" -lt "$minimum_cluster_size" ]; then    # check if cluster size is not too small
    echo "Cluster size is too small ($actual_cluster_size)"
    exit 1
else
    echo "Cluster size is enough for backup ($actual_cluster_size)"
fi

# check the rsyncd status on the backup server
if systemctl -H "$receiver" status rsyncd 2> /dev/null | grep -q running; then
    echo "Rsyncd is running. Proceeding with the backup."
else
    echo "Rsyncd not running. Trying to start it"
    # "|| true" keeps "set -e" from aborting before the re-check below
    systemctl -H "$receiver" start rsyncd || true
    if ! systemctl -H "$receiver" status rsyncd 2> /dev/null | grep -q running; then
        echo "We failed to start the rsync daemon. Aborting backup"
        exit 1
    fi
fi

# invoke the backup from the local node; use the gmcast.listen_addr form of the address if Galera is running here
garbd --address "gcomm://$donor:4567" --group "$cluster_name" --donor "$donor" --sst "rsync:$receiver/$destination" && echo "BACKUP PROCESS STARTED AT $(date '+%d-%b-%y %H:%M')"

You will need to edit it a little to fit your environment.
If you copy "wsrep_sst_rsync" as "wsrep_sst_backup" and invoke it with "--sst backup", it fails. With "--sst rsync", as in the script above, it works fine.

Note that I tested it on CentOS 7 with the latest RPMs from galeracluster.com, and adding a new node with the changed rsync script works fine. Sadly, I didn't have enough time to check that "rsync_sst_backup" exists and that it was created after the invoking script was started.
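
One way such a check could be sketched: create a marker file before starting the backup and afterwards look for anything newer than it in the backup directory. I have not run this against a real backup. The function name is my own, it looks for any newer file rather than one specific file name, and it has to run on the backup VM (for example over ssh), because that is where /share2/galera_backup lives.

```shell
# Hypothetical helper: succeed if at least one regular file under
# directory $1 is newer than the marker file $2.
backup_is_fresh() {
    backup_dir=$1
    marker=$2
    [ -n "$(find "$backup_dir" -type f -newer "$marker" -print 2>/dev/null | head -n 1)" ]
}

# Usage on the backup VM:
# marker=$(mktemp)          # create the marker before the backup starts
# ... run the invoking script on the Galera node ...
# backup_is_fresh /share2/galera_backup "$marker" && echo "backup files were updated"
```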
Please feel free to comment.


Best Regards,
Strahil Nikolov

hunter86bg

Jan 28, 2016, 9:30:30 AM
to codership
Due to a typo, please read "I have made a copy of the " /usr/bin/wsrep_sst_backup" and edited line 187 and removed "-log_dir"." as:
"I have made a backup of the " /usr/bin/wsrep_sst_rsync" and edited line 187 by removing "-log_dir"."

Ale C.

Jan 28, 2016, 9:51:08 AM
to codership
Amazing! 
Will try this one out and let you know!

hunter86bg

Feb 4, 2016, 2:37:18 AM
to codership
Hi,
Could you provide some feedback?