GaleraCluster INSERT very slow

1,326 views
Skip to first unread message

Robin Theys

unread,
Jul 18, 2018, 5:51:01 AM7/18/18
to codership
Hi,

I have a galera cluster with 5 nodes, and a maxcsale in front with sperate read/write process.

I just write a script that INSERT 1000 record . It took 24 sec for just 1000 simple insert.

How can I improve write statement?

[galera]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
performance_schema = ON
wsrep_slave_threads= 2
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_size = 3G
innodb_log_buffer_size  = 1G
wsrep_provider_options="gcache.size=256M"
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="xxxxxxxxxxx"
wsrep_cluster_address="gcomm://xxxxxxxxxxxxxxxxxxxxx"
wsrep_node_address="172.16.1.215"
wsrep_node_name="xxxxxxxxxx"
wsrep_sst_method=rsync

Bojan F

unread,
Jul 18, 2018, 6:18:35 AM7/18/18
to codership

In short if you enable binary logs on all galera nodes (if not on all it will not be much better cause the ones without binary logs will be bottle neck), I noticed much better performances:
log_bin=/tmp/binlog

As I was investigating MariaDB source code few months ago I noticed that if you don't enable binary logs "sync" command is called for every record and so performances are many times worse than with enabled binary logs. Even I didn't need binary logs but just because of perfomances.

Please try above solution and reply if you noticed any improvement. Just to know that my case is not specific.

Robin Theys

unread,
Jul 18, 2018, 7:48:14 AM7/18/18
to codership
Hey Bojan,


thanks for reply I have set log_bin but no changes


[galera]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
wsrep_slave_threads= 8
innodb_flush_log_at_trx_commit = 0

innodb_buffer_pool_size = 3G
innodb_log_buffer_size  = 1G
max_connections=2000
sync_binlog=0
log_bin=/tmp/binlog
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="galera01"
wsrep_cluster_address="gcomm://xxxxxx"
wsrep_node_address="xxxxxxx"
wsrep_node_name="xxxxxx"
wsrep_sst_method=rsync

Bojan F

unread,
Jul 18, 2018, 8:12:50 AM7/18/18
to codership
I am not sure if it is ok that you have all parameters inside section [galera] if you really have so on your system?
Cause I afraid your command log_bin was ignored in this case and as I see also some other probably (innodb_flush_log_at_trx_commit).
As I know inside [galera] should be only wsrep parameters.

For example log_bin should be inside [server]. Did you get anything on /tmp/binlog in your case or it was empty? If empty then property was not used and so performace was for sure same. Anyway maybe you should create this folder /tmp/binlog before too.

Check example about sections from my link from previous post and try to define similar sections with appropriate parameters inside these sections.

Just to be sure that this is not the problem.

Robin Theys

unread,
Jul 18, 2018, 9:54:25 AM7/18/18
to codership
so I modify my config and here's:

# MariaDB database server configuration file.
#
# You can copy this file to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# escpecially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
#
# * Basic Settings
#
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /home/data/mysql
tmpdir          = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages     = en_US
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address           = 127.0.0.1
#
# * Fine Tuning
#
max_connections         = 2000
connect_timeout         = 5
wait_timeout            = 600
max_allowed_packet      = 64M
thread_cache_size       = 128
sort_buffer_size        = 4M
bulk_insert_buffer_size = 16M
tmp_table_size          = 32M
max_heap_table_size     = 32M
#
# * MyISAM
#
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched. On error, make copy and try a repair.
myisam_recover_options = BACKUP
key_buffer_size         = 128M
#open-files-limit       = 2000
table_open_cache        = 400
myisam_sort_buffer_size = 512M
concurrent_insert       = 2
read_buffer_size        = 2M
read_rnd_buffer_size    = 1M
#
# * Query Cache Configuration
#
# Cache only tiny result sets, so we can fit more in the query cache.
query_cache_limit               = 128K
query_cache_size                = 0
query_cache_type                = OFF
# for more write intensive setups, set to DEMAND or OFF
#query_cache_type               = DEMAND
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1
#
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#
# we do want to know about network errors and such
log_warnings            = 2
#
# Enable the slow query log to see queries with especially long duration
#slow_query_log[={0|1}]
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time = 10
#log_slow_rate_limit    = 1000
log_slow_verbosity      = query_plan

#log-queries-not-using-indexes
#log_slow_admin_statements
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.
#server-id              = 1
#report_host            = master1
#auto_increment_increment = 2
#auto_increment_offset  = 1
log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
# not fab for performance, but safer
sync_binlog             = 0
expire_logs_days        = 10
max_binlog_size         = 100M
# slaves
#relay_log              = /var/log/mysql/relay-bin
#relay_log_index        = /var/log/mysql/relay-bin.index
#relay_log_info_file    = /var/log/mysql/relay-bin.info
#log_slave_updates
#read_only
#
# If applications support it, this stricter sql_mode prevents some
# mistakes like inserting invalid dates etc.
#sql_mode               = NO_ENGINE_SUBSTITUTION,TRADITIONAL
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
default_storage_engine  = InnoDB
# you can't just change log file size, requires special procedure
#innodb_log_file_size   = 50M

innodb_buffer_pool_size = 3G
innodb_log_buffer_size  = 1G
innodb_file_per_table   = 1
innodb_open_files       = 4000
innodb_io_capacity      = 4000
innodb_flush_method     = O_DIRECT
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem

#
# * Galera-related settings
#
[galera]
binlog_format=ROW
default-storage-engine=innodb
wsrep_slave_threads= 8
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="galera01"
wsrep_cluster_address="gcomm://xxxxxxxx"
wsrep_node_address="xxxxx"
wsrep_node_name="xxxxx"
wsrep_sst_method=rsync
#innodb_flush_log_at_trx_commit=0

[mysqldump]
quick
quote-names
max_allowed_packet      = 64M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completion

[isamchk]
key_buffer              = 64M

#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#
!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/


and now it take 23 sec for 1000 insert.... no luck really.

On Wednesday, 18 July 2018 11:51:01 UTC+2, Robin Theys wrote:

Bojan F

unread,
Jul 18, 2018, 10:43:51 AM7/18/18
to codership
Do you run mariadb as service?
I guess you restarted it after changes of parameters?
Do you see now any binlog file inside: /var/log/mysql/mariadb-bin


Can you do the following:
1. restart the service again
2. start inserts
3. during the test execute in mysql console: show status like "wsrep%"
Just to see if this will give us any useful info.


Can you stop all nodes except one, so you will have only one "live" node? And then to try to insert if there is any performance change

Robin Theys

unread,
Jul 18, 2018, 10:56:47 AM7/18/18
to codership
ok no problem did it here s the output :

MariaDB [(none)]>  show status like "wsrep%";
+------------------------------+-------------------------------------------------------------------------------------------+
| Variable_name                | Value                                                                                     |
+------------------------------+-------------------------------------------------------------------------------------------+
| wsrep_apply_oooe             | 0.002297                                                                                  |
| wsrep_apply_oool             | 0.000000                                                                                  |
| wsrep_apply_window           | 1.002297                                                                                  |
| wsrep_causal_reads           | 0                                                                                         |
| wsrep_cert_deps_distance     | 63.769525                                                                                 |
| wsrep_cert_index_size        | 29                                                                                        |
| wsrep_cert_interval          | 0.000000                                                                                  |
| wsrep_cluster_conf_id        | 179                                                                                       |
| wsrep_cluster_size           | 5                                                                                         |
| wsrep_cluster_state_uuid     | 48138f93-6a4f-11e8-8e39-5787a7a70402                                                      |
| wsrep_cluster_status         | Primary                                                                                   |
| wsrep_commit_oooe            | 0.000000                                                                                  |
| wsrep_commit_oool            | 0.000000                                                                                  |
| wsrep_commit_window          | 1.000000                                                                                  |
| wsrep_connected              | ON                                                                                        |
| wsrep_desync_count           | 0                                                                                         |
| wsrep_evs_delayed            |                                                                                           |
| wsrep_evs_evict_list         |                                                                                           |
| wsrep_evs_repl_latency       | 0.0100037/0.0104955/0.0109873/0.000491793/2                                               |
| wsrep_evs_state              | OPERATIONAL                                                                               |
| wsrep_flow_control_paused    | 0.000000                                                                                  |
| wsrep_flow_control_paused_ns | 0                                                                                         |
| wsrep_flow_control_recv      | 0                                                                                         |
| wsrep_flow_control_sent      | 0                                                                                         |
| wsrep_gcomm_uuid             | 71d420f3-8a91-11e8-b084-67738639805b                                                      |
| wsrep_incoming_addresses     | 172.16.1.215:3306,172.16.1.211:3306,172.16.1.212:3306,172.16.1.213:3306,172.16.1.214:3306 |
| wsrep_last_committed         | 47000                                                                                     |
| wsrep_local_bf_aborts        | 0                                                                                         |
| wsrep_local_cached_downto    | 45695                                                                                     |
| wsrep_local_cert_failures    | 0                                                                                         |
| wsrep_local_commits          | 0                                                                                         |
| wsrep_local_index            | 2                                                                                         |
| wsrep_local_recv_queue       | 0                                                                                         |
| wsrep_local_recv_queue_avg   | 0.001515                                                                                  |
| wsrep_local_recv_queue_max   | 2                                                                                         |
| wsrep_local_recv_queue_min   | 0                                                                                         |
| wsrep_local_replays          | 0                                                                                         |
| wsrep_local_send_queue       | 0                                                                                         |
| wsrep_local_send_queue_avg   | 0.083333                                                                                  |
| wsrep_local_send_queue_max   | 2                                                                                         |
| wsrep_local_send_queue_min   | 0                                                                                         |
| wsrep_local_state            | 4                                                                                         |
| wsrep_local_state_comment    | Synced                                                                                    |
| wsrep_local_state_uuid       | 48138f93-6a4f-11e8-8e39-5787a7a70402                                                      |
| wsrep_protocol_version       | 8                                                                                         |
| wsrep_provider_name          | Galera                                                                                    |
| wsrep_provider_vendor        | Codership Oy <in...@codership.com>                                                         |
| wsrep_provider_version       | 25.3.23(r3789)                                                                            |
| wsrep_ready                  | ON                                                                                        |
| wsrep_received               | 1320                                                                                      |
| wsrep_received_bytes         | 1367776                                                                                   |
| wsrep_repl_data_bytes        | 0                                                                                         |
| wsrep_repl_keys              | 0                                                                                         |
| wsrep_repl_keys_bytes        | 0                                                                                         |
| wsrep_repl_other_bytes       | 0                                                                                         |
| wsrep_replicated             | 0                                                                                         |
| wsrep_replicated_bytes       | 0                                                                                         |
| wsrep_thread_count           | 9                                                                                         |
+------------------------------+-------------------------------------------------------------------------------------------+
58 rows in set (0.001 sec)


On Wednesday, 18 July 2018 11:51:01 UTC+2, Robin Theys wrote:

Robin Theys

unread,
Jul 19, 2018, 3:36:12 AM7/19/18
to codership
And for final test, I stop all node execpt one , and 1000 insert took 11sec....


On Wednesday, 18 July 2018 11:51:01 UTC+2, Robin Theys wrote:

Bojan F

unread,
Jul 19, 2018, 3:43:41 AM7/19/18
to codership
First this from my previous post: Do you see any binlog file inside: /var/log/mysql/mariadb-bin
Just to be sure log files are created.


My other suggestion would be if you can try excatly the same test with change that you have only one node that is not configured as Galera Cluster.
Just to see if without galera is much better or there is some other problem in the system.

Robin Theys

unread,
Jul 19, 2018, 3:52:06 AM7/19/18
to codership
yep see the binlog file:

d /var/log/mysql/mariadb-bin.
mariadb-bin.000042  mariadb-bin.000045  mariadb-bin.000048  mariadb-bin.000051  mariadb-bin.000054  mariadb-bin.000057  mariadb-bin.000060  mariadb-bin.000063  mariadb-bin.000066
mariadb-bin.000043  mariadb-bin.000046  mariadb-bin.000049  mariadb-bin.000052  mariadb-bin.000055  mariadb-bin.000058  mariadb-bin.000061  mariadb-bin.000064  mariadb-bin.index
mariadb-bin.000044  mariadb-bin.000047  mariadb-bin.000050  mariadb-bin.000053  mariadb-bin.000056  mariadb-bin.000059  mariadb-bin.000062  mariadb-bin.000065

Robin Theys

unread,
Jul 19, 2018, 3:54:01 AM7/19/18
to codership
and at the moment, I just use one node all other node are stopped..


MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | xxxxxxxxxx    |  3306 |           0 | Down
server2            | xxxxxxxxxx    |  3306 |           0 | Down
server3            | xxxxxxxxxx    |  3306 |           0 | Down
server4            | xxxxxxxxxx    |  3306 |           0 | Down
server5            | xxxxxxxxxxx    |  3306 |           0 | Master, Synced, Running

Bojan F

unread,
Jul 19, 2018, 4:04:07 AM7/19/18
to codership
But anyway as I know it is not same if one node is configured as Galera Cluster (as is in your case - with wsrep configuration) or as normal standalone MariaDB (without wsrep configuration).

With this test we will see if performance problem is for sure connected with Galera Cluster and not with anything else. That was my idea.

Robin Theys

unread,
Jul 19, 2018, 4:29:45 AM7/19/18
to codership
ok I have comment the galera config in my.cnf, start mariadb service and retry the insert, same issue, 1000 insert took 10sec.

/etc/mysql# cat my.cnf

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /home/data/mysql
tmpdir          = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages     = en_US
skip-external-locking
#bind-address           = 127.0.0.1

max_connections         = 2000
connect_timeout         = 5
wait_timeout            = 600
max_allowed_packet      = 64M
thread_cache_size       = 128
sort_buffer_size        = 4M
bulk_insert_buffer_size = 16M
tmp_table_size          = 32M
max_heap_table_size     = 32M
# * MyISAM

myisam_recover_options = BACKUP
key_buffer_size         = 128M
#open-files-limit       = 2000
table_open_cache        = 400
myisam_sort_buffer_size = 512M
concurrent_insert       = 2
read_buffer_size        = 2M
read_rnd_buffer_size    = 1M
# * Query Cache Configuration

query_cache_limit               = 128K
query_cache_size                = 0
query_cache_type                = 0
#

#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1
log_warnings            = 2

#slow_query_log[={0|1}]
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time = 10
#log_slow_rate_limit    = 1000
log_slow_verbosity      = query_plan

#log-queries-not-using-indexes
#log_slow_admin_statements
#
#server-id              = 1
#report_host            = master1
#auto_increment_increment = 2
#auto_increment_offset  = 1
log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
# not fab for performance, but safer
binlog_format           = ROW

sync_binlog             = 0
expire_logs_days        = 10
max_binlog_size         = 100M
# slaves
#relay_log              = /var/log/mysql/relay-bin
#relay_log_index        = /var/log/mysql/relay-bin.index
#relay_log_info_file    = /var/log/mysql/relay-bin.info
#log_slave_updates
#read_only
#
# If applications support it, this stricter sql_mode prevents some
# mistakes like inserting invalid dates etc.
#sql_mode               = NO_ENGINE_SUBSTITUTION,TRADITIONAL
#
# * InnoDB
default_storage_engine  = InnoDB
#innodb_log_file_size   = 50M

innodb_buffer_pool_size = 3G
innodb_log_buffer_size  = 1G
innodb_file_per_table   = 1
innodb_open_files       = 4000
innodb_io_capacity      = 4000
innodb_flush_method     = O_DIRECT
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit = 2
#
#[galera]
#wsrep_slave_threads= 50
#wsrep_on=ON
#wsrep_provider=/usr/lib/galera/libgalera_smm.so
#wsrep_cluster_name="galera01"
#wsrep_cluster_address="gcomm://xxxxxxxxx"
#wsrep_node_address="xxxxxx"
#wsrep_node_name="xxxxx"
#wsrep_sst_method=rsync


[mysqldump]
quick
quote-names
max_allowed_packet      = 64M

[mysql]

[isamchk]
key_buffer              = 64M

!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/

Bojan F

unread,
Jul 19, 2018, 6:35:59 AM7/19/18
to codership
Then it looks like Galera is not the problem but something else.

How do you execute "inserts"? What are you using?
Can you try to execute inserts directly on same machine where you have this the only one mariaDB node - locally? Just to see that there is not any network or maxscale issue.

Arjen Lentz

unread,
Jul 19, 2018, 7:24:13 PM7/19/18
to codersh...@googlegroups.com


Hi Robin

>innodb_flush_method = O_DIRECT

Try the one server with that line commented out.

If that yields a significant difference, you may wish to check your storage layer performance and latency with the HDlatency tool.

Regards,
Arjen.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Rick Ruggiero

unread,
Jul 19, 2018, 7:33:15 PM7/19/18
to Arjen Lentz, codersh...@googlegroups.com
Maybe looking at the SQL itself may help, can you upload it to google drive?


Rick Ruggiero
e: rick.r...@gmail.com

--
You received this message because you are subscribed to the Google Groups "codership" group.
To unsubscribe from this group and stop receiving emails from it, send an email to codership-tea...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Song

unread,
Oct 4, 2018, 2:47:11 AM10/4/18
to codership
Hello. Bojan. 

Wanted let you know that in my case enabling bin_log really helped. 
I have a Galera cluster with 3 nodes.

and adding log_bin to to my.cnf  file almost doubled the TPS from sysbench results.
from 100~200 TPS to 400~600 depending on the number of rows in the table. (  --table-size parameter in sysbench ) 

but still I can't figure out reason why enabling binary log gives better performance while is has additional task to do. 
 
 
2018년 7월 18일 수요일 오후 7시 18분 35초 UTC+9, Bojan F 님의 말:

Bojan F

unread,
Oct 4, 2018, 3:58:07 AM10/4/18
to codership
David, thanks for info and I am glad it helps.

It was not logical to me also that additional work (saving binary logs) improves performance.

So few months ago I was investigating source code of MariaDB and as I noticed in both cases function log_and_orde is called, but class is different (TC_LOG_BINLOG-binary logs enabled, TC_LOG_MMAP-binary log disabled):
TC_LOG_BINLOG::log_and_order
MariaDB has parameter innodb_flush_log_at_trx_commit that improves performances a lot if value is 0 or 2 (maybe also 3) because binary logs are not sync to disk for every record but only once per second. And this parameter is used only if binary logs are enabled.
So this is main reason why is so much better if binary logs are used and also this  innodb_flush_log_at_trx_commit is set to 0 or 2


If binary logs are not enabled TC_LOG_MMAP::log_and_order is called. In this case these functions are called:

log.cc – int TC_LOG_MMAP::log_and_order(THD *thd, my_xid xid, bool all,

log.cc – int TC_LOG_MMAP::log_one_transaction(my_xid xid)

                  log.cc – int TC_LOG_MMAP::sync()

           my_mmap.c – int my_msync(int fd, void *addr, size_t len, int flags)

                           my_mmap.c – msync(addr, len, flags);

                           my_mmap.c – my_sync(fd, MYF(0));

                                          my_sync.c – int my_sync(File fd, myf my_flags)


This my_sync is very slow because it is called for every record (not once per second as in above case). For parameter my_flags I tried both options MS_ASYNC or MS_SYNC but total waiting time was same, the difference was just where will wait inside this function.

Maybe I am wrong but these were my observations when I was investigating why is so slow without binary logs.
So after that I set binary logs on all three Galera Cluster nodes and it is much faster.

I was investigating on MariaDB 10.2.6, maybe there is some change in later versions.
Reply all
Reply to author
Forward
0 new messages