Memory Leak bringing down cluster after cluster

452 views
Skip to first unread message

Tomi T

unread,
Feb 8, 2021, 5:34:04 AM2/8/21
to codership
I'm setting up MariaDB Galera Cluster and ran into what seems to be a memory leak. I've been importing dbdumps of our production databases to test clusters repeatedly as a part of testing and _sometimes_ mariadb process in running cluster nodes starts to use excessive amount of memory during an import and doesn't release it afterwards. This happens maybe in one out of three or four imports while most of the time memory usage doesn't increase at all given that it is already at normal or above normal level.

Our cluster nodes are automatically launched and configured AWS t3a.large instances with 2 vcpu and 8g RAM running AWS Linux 2. Normally the memory usage of mariadb process stays at 6.4g and we have 2.2g free RAM. Sometimes while dbdump import is on, all db cluster nodes start reserving more memory. Initially we didn't have swap and this crashed all nodes in the cluster in a couple of test runs. With 8 GB swap, the cluster stays up after malignant import with ~95% RAM full and ~30% swap used in each node. We can continue running imports with this memory exhausted cluster and still most of the time it works ok. But when the next memory hungry import happens by, the cluster will grind to a halt and the import will hang. After that we need to shutdown the cluster and bootstrap again.

After adding the swap we can make the cluster healthy by launching new nodes and retiring the ones with exhausted memory so we could basically automate the replacement and live with this issue. But I must say I'm not 110% confident that this setup is ready to take over from our 1-node setup. I'd be happy to hear your thoughts and if there's some configuration changes that I could try. Also I'd be happy to co-operate with devs of Galera Cluster to help investigating the issue further. 

More Details: 

- I generally don't restart instances, I replace them with new ones instead
- I have started many new clusters and can reproduce the described issue in a new cluster
- All instances are in the same availability zone
- Performance of the cluster is good -- one or more nodes the import time is approx 23 minutes
- Nodes have 64g general purpose ssd root filesystem 
- There are 6 dbdump files (500M,48M,9M,400M,41M,303M) total uncompressed size 6.6g 
- rexec in code snippets below is a custom function to execute commands remotely through ssh

DB Dump Creation Command: 

sudo mysqldump [database-name] --flush-logs | gzip > [dbdump_filename]

DB Dump Import Command: 

echo "create database if not exists '[database-name]'" | sudo mysql
zcat '[dbdump_filename]' | sudo mysql --database='[database-name]'

DB Node Installation Code (python3):

def SetupDBNode(args,instance,use_private_ip): 
    server_private_ip = GetInstanceIP(instance,True)
    server_public_ip = GetInstanceIP(instance,False)
    server_ip = GetInstanceIP(instance,use_private_ip)
    db_cluster_instances = GetAWSInstances(True,security_group_id_db_servers)
    db_cluster_ips = GetClusterIPs(db_cluster_instances,True)

    if len(db_cluster_ips)==0: 
        err("ERROR: SetupDBNode - could not resolve even one cluster ip! Abort.")
        Exit(1)

    if len(db_cluster_ips)==1 and not (args.start_db_cluster or args.force_start_db_cluster): 
        err("ERROR: SetupDBNode - could resolve only one cluster node and [start-db-cluster or force-start-db-cluster] parameter was not given! Abort.")
        Exit(1)

    if not "Tags" in instance: 
        err("ERROR: SetupDBNode - could not find Tags from our instance "+json.dumps(instance)+"! Abort.")
        Exit(1)

    server_name = ResolveInstanceName(instance)
    conf_directory = ScriptDirectory()+"/conf/db-cluster"

    info("SetupDBNode - Distributing '"+conf_directory+"' to "+server_ip+":"+DistTargetDirectory()+".")
    rsync_send(conf_directory, DistTargetDirectory()+"/", server_ip, dist_user, DBServersPemFileName(), ["--exclude", ".git", "--exclude", ".vscode", "--exclude", ".gitignore"])

    info("SetupDBNode - Replacing variables in DB configurations.")
    rexec(server_ip, "find "+DistTargetDirectory()+" -type f | xargs -I repl sed -i 's/@@HOST-PRIVATE-IP@@/"+server_private_ip+"/g' 'repl'")
    rexec(server_ip, "find "+DistTargetDirectory()+" -type f | xargs -I repl sed -i 's/@@HOST-NAME@@/"+server_name+"/g' 'repl'")
    rexec(server_ip, "find "+DistTargetDirectory()+" -type f | xargs -I repl sed -i 's/@@DB-CLUSTER-NODES@@/"+",".join(db_cluster_ips)+"/g' 'repl'")

    info("SetupDBNode - Installing Python3.")
    rexec(server_ip, "sudo yum install -y python3")

    info("SetupDBNode - Installing EPEL.")
    rexec(server_ip, "sudo amazon-linux-extras install -y epel")

    info("SetupDBNode - Setting up config for Nagios remote monitoring.")
    SetupNagiosSSH(server_ip)

    info("SetupDBNode - Installing MariaDB.")
    rexec(server_ip, "sudo cp "+DistTargetDirectory()+"/yum.repos.d/MariaDB.repo /etc/yum.repos.d/")
    rexec(server_ip, "sudo yum -y install MariaDB-server MariaDB-client")

    info("SetupDBNode - Installing MariaDB Configuration.")
    rexec(server_ip, "sudo cp "+DistTargetDirectory()+"/mysql/my.cnf.d/* /etc/my.cnf.d/")

    info("SetupDBNode - Enabling mariadb service for automagic start after node reboot.")
    rexec(server_ip, "sudo systemctl enable mariadb")

    info("SetupDBNode - Installing Included Configuration files for mariadb.service")
    rexec(server_ip, "sudo mkdir -p /etc/systemd/system/mariadb.service.d")
    rexec(server_ip, "sudo cp "+DistTargetDirectory()+"/mysql/mariadb.service.d/* /etc/systemd/system/mariadb.service.d/")
    rexec(server_ip, "sudo systemctl daemon-reload")

    info("SetupDBNode - Creating logfiles for MariaDB.")
    rexec(server_ip, "sudo touch /var/log/mysqld.log && sudo chown mysql.mysql /var/log/mysqld.log && sudo touch /var/log/mysqld-slow.log && sudo chown mysql.mysql /var/log/mysqld-slow.log")

    start_time = datetime.now()

    if args.force_start_db_cluster: 
        info("SetupDBNode - force-start => setting safe_to_bootstrap.")
        rexec(server_ip, "sudo replace 'safe_to_bootstrap: 0' 'safe_to_bootstrap: 1' -- /var/lib/mysql/grastate.dat")

    if args.start_db_cluster or args.force_start_db_cluster: 
        info("SetupDBNode - Starting the Galera Cluster.")
        rexec(server_ip, "sudo galera_new_cluster")
    else: 
        info("SetupDBNode - Starting a new DB Cluster Node.")
        rexec(server_ip, "sudo systemctl start mariadb")

    info("SetupDBNode - Removing Distribution Directory from the node.")
    rexec(server_ip, "sudo rm -rf "+DistTargetDirectory())

    info("SetupDBNode - Updating Config and Content of web cluster [to take advantage of this new node]")
    UpdateWebClusterConfigAndContent(db_cluster_ips,False)

    info("SetupDBNode - Updating Nagios Config.")
    UpdateNagiosClusterConfig(GetAWSInstances(True,security_group_id_db_servers),GetAWSInstances(True,security_group_id_web_servers),GetAWSInstances(True,security_group_id_mgt_servers),True)

    info("SetupDBNode - Done setting up node "+server_ip+".")
    return


-- Configuration Files:

conf/db-cluster/mysql/mariadb.service.d/limits.conf
conf/db-cluster/mysql/my.cnf.d/server.cnf
conf/db-cluster/yum.repos.d/MariaDB.repo


-- limits.conf:

[Service]
LimitNOFILE=50000


-- server.cnf: 

[mysqld]

transaction-isolation=READ-COMMITTED
datadir=/var/lib/mysql
log-error = /var/log/mysqld.log
socket=/var/lib/mysql/mysql.sock
user=mysql
default_storage_engine=InnoDB
skip-name-resolve
slow_query_log = 1
slow-query_log_file = /var/log/mysqld-slow.log
long_query_time = 10
binlog_format = ROW
performance_schema = on
max_connections = 150
bind-address=@@HOST-PRIVATE-IP@@

innodb_buffer_pool_size=5G
innodb_autoinc_lock_mode=2
innodb_io_capacity = 200 
innodb_read_io_threads = 4
innodb_write_io_threads = 2
innodb_log_buffer_size = 128M
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 0 
innodb_flush_method = O_DIRECT_NO_FSYNC

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so

wsrep_node_name='@@HOST-NAME@@'
wsrep_node_address="@@HOST-PRIVATE-IP@@"
wsrep_cluster_name='services-db'
wsrep_cluster_address="gcomm://@@DB-CLUSTER-NODES@@"

wsrep_provider_options="gcache.size=1G; gcache.page_size=1G"
wsrep_slave_threads=4 
wsrep_sst_method=rsync

-- MariaDB.repo: 

# MariaDB 10.5 CentOS repository list - created 2020-12-17 23:42 UTC
[mariadb]
name = MariaDB
gpgcheck=1





Alexey Yurchenko

unread,
Feb 8, 2021, 8:10:13 AM2/8/21
to codership
While it is hard to say what causes such behavior, perhaps you should try to set up streaming replication for dump imports: https://galeracluster.com/library/documentation/using-sr.html

Also STATUS variables dump when this "leak" happens could have helped.

Tomi T

unread,
Feb 9, 2021, 4:34:27 AM2/9/21
to codership
Thanks a lot for the suggestion regarding streaming replication. I will try if it helps! 

I also wanted to provide you with those status variables so yesterday I imported dbdumps many times to a 2-node cluster but couldn't repro the issue. As the last task yesterday I replaced the older node by adding a new node to the cluster and shut down the old. Today morning I did another dbdump import and the issue was finally reproduced. Both new nodes are now out of RAM and using quite a chunk of their swaps.

I attached here mysqld.log and output of 'show global status' from the older node of the cluster.

20210209-db-cluster-oom.zip

Alexey Yurchenko

unread,
Feb 9, 2021, 6:16:44 AM2/9/21
to codership
> wsrep_cert_index_size    54163252

This is quite a number - by itself its about 2Gb of RAM. And you have very big transactions (well, db dumps). Streaming replication should help with it.
Reply all
Reply to author
Forward
0 new messages