Help needed: bad benchmark results

Henrik Ingo

unread,

Aug 4, 2011, 4:40:38 PM8/4/11

to codersh...@googlegroups.com

Hi

So vacation is over and I have been benchmarking Galera. I don't like
the results.

I'm aware of your own previous benchmarks:
http://www.codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited

I also ran sysbench, but didn't get those results. I get sysbench oltp
transactions:

2 334.38 317.99 386.39 385.88
8 1063.6 420.27 458.36 397.63
16 1620.59 323.91 285.12 351.1
32 2016.24 340.68 409.62 344.82
64 1596.61 312.34 237.85 314.97
128 1256.25 368.59 459.13 383.44
512 708.4 359.83 452.22 343.76
1023 661.51 289.21 219.89 314.38

First column = sysbench threads
Second column = single mysql node (*)
3-5 column = 3 node galera cluster, writing to 1, 2 and 3 nodes respectively

*) Same mysql-galera binary, just no other nodes connected. Server has
bbu write cache, so results are pretty similar with or without
sync_binlog and flush_trx set to 0.

Essentially, with galera replication turned on, I don't get much
better performance than I get with single thread. There's quite a lot
of variability too.

So I'm not an expert in troubleshooting performance problems (I'm an
expert in just plain avoiding them :-) anyway, but this seems clearly
a galera issue, so I know nothing about this and need help.

A few things I noticed during the benchmark:

run top and press 1 to get per-cpu statistics. The state will
fluctuate between reasonable load on %us and %sy, totaling a MySQL
load of 100-300%

top - 10:42:44 up 85 days, 16:21, 1 user, load average: 1.49, 1.50, 1.45
Tasks: 240 total, 1 running, 239 sleeping, 0 stopped, 0 zombie
Cpu0 : 22.1%us, 3.7%sy, 0.0%ni, 63.2%id, 0.0%wa, 0.0%hi, 11.0%si, 0.0%st
Cpu1 : 8.0%us, 3.7%sy, 0.0%ni, 87.3%id, 0.3%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu2 : 9.3%us, 2.3%sy, 0.0%ni, 87.7%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu3 : 8.9%us, 2.6%sy, 0.0%ni, 60.3%id, 27.5%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu4 : 16.9%us, 6.6%sy, 0.0%ni, 75.1%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
Cpu5 : 5.3%us, 2.0%sy, 0.0%ni, 91.7%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu6 : 7.7%us, 2.0%sy, 0.0%ni, 89.7%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu7 : 9.0%us, 2.0%sy, 0.0%ni, 75.3%id, 12.7%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 32958664k total, 18286588k used, 14672076k free, 325192k buffers
Swap: 34996216k total, 0k used, 34996216k free, 10916132k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17656 root 18 0 16.6g 6.1g 7728 S 125.6 19.4 55:54.94 mysqld
28821 root 15 0 12868 1208 820 R 0.7 0.0 0:00.04 top
1 root 15 0 10348 696 584 S 0.0 0.0 0:04.22 init

And a state with almost no load on any cpu, except one cpu is 99% in
%wa state (very unusual and not good).

top - 10:44:02 up 85 days, 16:22, 1 user, load average: 0.97, 1.34, 1.40
Tasks: 240 total, 1 running, 239 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 98.3%id, 1.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.3%sy, 0.0%ni, 0.0%id, 99.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32958664k total, 18289516k used, 14669148k free, 325192k buffers
Swap: 34996216k total, 0k used, 34996216k free, 10916144k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17656 root 19 0 16.6g 6.1g 7728 S 0.3 19.4 56:51.15 mysqld
28843 root 15 0 12868 1208 820 R 0.3 0.0 0:00.04 top
1 root 15 0 10348 696 584 S 0.0 0.0 0:04.22 init

Interval between those two states is a few seconds. (Hard to say when
top sampling has its own interval too.)

Show processlist would typically show:

and all other 500+ threads waiting to commit:

Using poormansprofiler, I sometimes could see a state which seems analogous:

# /usr/local/bin/poormansprofiler
278
254 pthread_cond_wait@@GLIBC_2.3.2,gcs_repl,galera::ReplicatorSMM::replicate(galera::TrxHandle*),galera_pre_commit,wsrep_run_wsrep_commit(THD*,,??,ha_commit_trans(THD*,,end_trans(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
10 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
2 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_suspend_mysql_thread,row_mysql_handle_errors,row_search_for_mysql,ha_innodb::index_read,handler::read_range_first(st_key_range,handler::read_multi_range_first(st_key_multi_range**,,QUICK_RANGE_SELECT::get_next(),??,mysql_update(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone
1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone
1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
1 select,handle_connections_sockets,main
1 pthread_cond_wait@@GLIBC_2.3.2,wsrep_rollback_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gu_fifo_get_head,gcs_recv,galera::GcsActionSource::process(void*),galera::ReplicatorSMM::async_recv(void*),galera_recv,wsrep_replication_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gcs_set_last_applied,galera::ServiceThd::thd_func(void*),start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,??,gcs_core_recv,??,start_thread,clone
1 fsync,os_file_fsync,at,fil_flush,fil_flush_file_spaces,buf_flush_batch,srv_master_thread,start_thread,clone
1 epoll_wait,asio::detail::task_io_service<asio::detail::epoll_reactor>::run(asio::error_code&),gcomm::AsioProtonet::event_loop(gu::datetime::Period,GCommConn::run(),GCommConn::run_fn(void*),start_thread,clone
1 do_sigwait,sigwait,signal_hand,start_thread,clone
[root@esitbi130lab mysql-galera]# /usr/local/bin/poormansprofiler
278
157 read,??,my_net_read,do_command(THD*),handle_one_connection,start_thread,clone
98 pthread_cond_wait@@GLIBC_2.3.2,gcs_repl,galera::ReplicatorSMM::replicate(galera::TrxHandle*),galera_pre_commit,wsrep_run_wsrep_commit(THD*,,??,ha_commit_trans(THD*,,end_trans(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
10 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone
1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone
1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
1 select,handle_connections_sockets,main
1 pthread_cond_wait@@GLIBC_2.3.2,wsrep_rollback_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_suspend_mysql_thread,row_mysql_handle_errors,row_search_for_mysql,ha_innodb::index_read,handler::read_range_first(st_key_range,handler::read_multi_range_first(st_key_multi_range**,,QUICK_RANGE_SELECT::get_next(),??,mysql_update(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gu_fifo_get_head,gcs_recv,galera::GcsActionSource::process(void*),galera::ReplicatorSMM::async_recv(void*),galera_recv,wsrep_replication_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gcs_set_last_applied,galera::ServiceThd::thd_func(void*),start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,??,gcs_core_recv,??,start_thread,clone
1 fsync,os_file_fsync,at,fil_flush,fil_flush_file_spaces,buf_flush_batch,srv_master_thread,start_thread,clone
1 epoll_wait,asio::detail::task_io_service<asio::detail::epoll_reactor>::run(asio::error_code&),gcomm::AsioProtonet::event_loop(gu::datetime::Period,GCommConn::run(),GCommConn::run_fn(void*),start_thread,clone
1 do_sigwait,sigwait,signal_hand,start_thread,clone
[root@esitbi130lab mysql-galera]# /usr/local/bin/poormansprofiler
278
160 read,??,my_net_read,do_command(THD*),handle_one_connection,start_thread,clone
63 pthread_cond_wait@@GLIBC_2.3.2,gcs_repl,galera::ReplicatorSMM::replicate(galera::TrxHandle*),galera_pre_commit,wsrep_run_wsrep_commit(THD*,,??,ha_commit_trans(THD*,,end_trans(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
32 pthread_cond_wait@@GLIBC_2.3.2,galera::Monitor<galera::ReplicatorSMM::LocalOrder>::enter(galera::ReplicatorSMM::LocalOrder&),galera::ReplicatorSMM::cert(galera::TrxHandle*),galera::ReplicatorSMM::pre_commit(galera::TrxHandle*),galera_pre_commit,wsrep_run_wsrep_commit(THD*,,??,ha_commit_trans(THD*,,end_trans(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
10 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone
1 select,os_thread_sleep,srv_master_thread,start_thread,clone
1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone
1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
1 select,handle_connections_sockets,main
1 pthread_cond_wait@@GLIBC_2.3.2,wsrep_rollback_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gcs_set_last_applied,galera::ServiceThd::thd_func(void*),start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,??,gcs_core_recv,??,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,galera::Monitor<galera::ReplicatorSMM::LocalOrder>::enter(galera::ReplicatorSMM::LocalOrder&),galera::ReplicatorSMM::cert(galera::TrxHandle*),galera::ReplicatorSMM::process_trx(void*,,galera::GcsActionSource::dispatch(void*,,galera::GcsActionSource::process(void*),galera::ReplicatorSMM::async_recv(void*),galera_recv,wsrep_replication_process(THD*),start_wsrep_THD,start_thread,clone
1 my_hash_sort_simple,hp_rec_hashnr,hp_write_key,heap_write,ha_heap::write_row(unsigned,handler::ha_write_row(unsigned,??,??,sub_select(JOIN*,,??,JOIN::exec(),mysql_select(THD*,,handle_select(THD*,,??,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
1 epoll_wait,asio::detail::task_io_service<asio::detail::epoll_reactor>::run(asio::error_code&),gcomm::AsioProtonet::event_loop(gu::datetime::Period,GCommConn::run(),GCommConn::run_fn(void*),start_thread,clone
1 do_sigwait,sigwait,signal_hand,start_thread,clone
[root@esitbi130lab mysql-galera]# /usr/local/bin/poormansprofiler
278
254 pthread_cond_wait@@GLIBC_2.3.2,gcs_repl,galera::ReplicatorSMM::replicate(galera::TrxHandle*),galera_pre_commit,wsrep_run_wsrep_commit(THD*,,??,ha_commit_trans(THD*,,end_trans(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
10 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone
1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone
1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
1 select,os_thread_sleep,fil_flush,fil_flush_file_spaces,buf_flush_batch,srv_master_thread,start_thread,clone
1 select,os_thread_sleep,fil_flush,fil_flush_file_spaces,buf_flush_batch,log_preflush_pool_modified_pages,at,at,log_free_check,index=0x33ad44d8,,row_ins_index_entry,at,at,at,row_insert_for_mysql,ha_innodb::write_row,handler::ha_write_row(unsigned,write_record(THD*,,mysql_insert(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
1 select,handle_connections_sockets,main
1 pthread_cond_wait@@GLIBC_2.3.2,wsrep_rollback_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gu_fifo_get_head,gcs_recv,galera::GcsActionSource::process(void*),galera::ReplicatorSMM::async_recv(void*),galera_recv,wsrep_replication_process(THD*),start_wsrep_THD,start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,gcs_set_last_applied,galera::ServiceThd::thd_func(void*),start_thread,clone
1 pthread_cond_wait@@GLIBC_2.3.2,??,gcs_core_recv,??,start_thread,clone
1 fsync,os_file_fsync,at,fil_flush,fil_flush_file_spaces,buf_flush_batch,log_preflush_pool_modified_pages,at,at,log_free_check,at,at,row_update_for_mysql,ha_innodb::update_row,handler::ha_update_row(unsigned,mysql_update(THD*,,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_command(THD*),handle_one_connection,start_thread,clone
1 epoll_wait,asio::detail::task_io_service<asio::detail::epoll_reactor>::run(asio::error_code&),gcomm::AsioProtonet::event_loop(gu::datetime::Period,GCommConn::run(),GCommConn::run_fn(void*),start_thread,clone
1 do_sigwait,sigwait,signal_hand,start_thread,clone

Given that MySQL always seems to keep 10 threads ready for new
connections (which means, doing nothing), I interpret the above as
basically all threads (254) being in some call path inside galera
replication, with a long tail of threads each in a different state,
some being a galera call path, some not.

Since you can get good scalability with the same test, I suppose the
question is just what I've done wrong?

The servers are running CentOS 5.4, some kind of EMC storage and XFS filesystem.

henrik

--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

Alex Yurchenko

unread,

Aug 4, 2011, 5:08:52 PM8/4/11

to codersh...@googlegroups.com

Hi Henrik!

This obviously requires a good sleepover or two, but I just can't help
quick response:

This is surely very not like normal:

0) Are you sure you're running OLTP-complex benchmark? Other modes were
not tested much.

1) Single galera node cannot be so much slower than a stock server.
Single node does not even replicate anything.

2) Note that G performance peaks at 8 threads. Sharply.

3) Which brings us to: are you sure that stock mysql and galera mysql
are configure equally? The above two observations hint that you might
be using unmodified my.cnf that comes with the demo distribution and
which was made so minimal as to work on my old laptop with 756Mb RAM.

4) Also, I notice that you're running only one slave thread. It
normally makes little difference in sysbench (and tremendous difference
in dbt2), but still I'd advise to set it to 32 - just for fun.

5) If nothing of the above helps it may make sense to run a benchmark
against a standalone MySQL-wsrep server without Galera loaded (SET
GLOBAL wsrep_provider='none';)

6) A very important source for G troubleshooting is SHOW STATUS LIKE
'wsrep%'. Some stats there are differential, so it makes sense to do two
polls with a short (10-20 sec) interval to see the numbers corresponding
to this particular time.

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

seppo....@codership.com

unread,

Aug 4, 2011, 5:20:35 PM8/4/11

to codersh...@googlegroups.com

Hi Henrik,

Please show also the exact sysbench invocation (prepare and run) and
sysbench result summary.

Thanks, Seppo

Quoting Alex Yurchenko <alexey.y...@codership.com>:

> --
> You received this message because you are subscribed to the Google
> Groups "codership" group.
> To post to this group, send email to codersh...@googlegroups.com.
> To unsubscribe from this group, send email to
> codership-tea...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/codership-team?hl=en.

--
http://www.codership.com seppo....@codership.com
tel: +358 40 510 5938 skype: seppo_jaakola

Henrik Ingo

unread,

Aug 5, 2011, 1:44:50 AM8/5/11

to Alex Yurchenko, seppo....@codership.com, codersh...@googlegroups.com

On Fri, Aug 5, 2011 at 12:08 AM, Alex Yurchenko
<alexey.y...@codership.com> wrote:
> Hi Henrik!
>
> This obviously requires a good sleepover or two, but I just can't help quick
> response:
>
> This is surely very not like normal:

No!

> 0) Are you sure you're running OLTP-complex benchmark? Other modes were not
> tested much.

In fact, I compiled sysbench 0.5 and am running the oltp.lua test. I
believe that's the successor to the OLTP-complex.

> 1) Single galera node cannot be so much slower than a stock server. Single
> node does not even replicate anything.

Single galera node is good. The first column in my table (or second,
if you account for the column with #threads) is the same mysql-galera
binary running standalone.

> 2) Note that G performance peaks at 8 threads. Sharply.

You mean in general or in my tests? There's a small peak on 8 threads
in columns 3 to 5 which are the clustered benchmarks, but the
variability in the results is high enough to almost completely account
for that.

> 3) Which brings us to: are you sure that stock mysql and galera mysql are
> configure equally? The above two observations hint that you might be using
> unmodified my.cnf that comes with the demo distribution and which was made
> so minimal as to work on my old laptop with 756Mb RAM.

There's no stock mysql in this test, just a single galera node (good
performance) vs 3 node cluster (flat performance). The my.cnf is the
same for all.

I've increased the innodb buffer pool to 15000M, other than that it is
the my.cnf from galera demo package. The sysbench data set should be
around 3GB so fits in memory.

> 4) Also, I notice that you're running only one slave thread. It normally
> makes little difference in sysbench (and tremendous difference in dbt2), but
> still I'd advise to set it to 32 - just for fun.

Will try. One way to interpret the results is that no group commit is
happening and only one transaction is replicated at the time, so
suspecting something like this is one thing to explore. On the other
hand, I understood that the slave threads speed up applying
transactions on the slave, but do not help in the commit phase?

> 5) If nothing of the above helps it may make sense to run a benchmark
> against a standalone MySQL-wsrep server without Galera loaded (SET GLOBAL
> wsrep_provider='none';)

Standalone server, even with galera loaded, is fine.

> 6) A very important source for G troubleshooting is SHOW STATUS LIKE
> 'wsrep%'. Some stats there are differential, so it makes sense to do two
> polls with a short (10-20 sec) interval to see the numbers corresponding to
> this particular time.

I'll grab a few snapshots. But you'll have to explain to me what to
look at and what is expected/normal values.

On Fri, Aug 5, 2011 at 12:20 AM, <seppo....@codership.com> wrote:
> Hi Henrik,
>
> Please show also the exact sysbench invocation (prepare and run) and
> sysbench result summary.

$ cat sysbench_galera.sh
#!/bin/sh

# Run this manually to prepare some tables:
# sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=esitbi130lab \
# --mysql-user=root --mysql-password=rootpass --mysql-table-engine=InnoDB \
# --mysql-engine-trx=yes --oltp-table-size=2000000 --max-time=300 \
# --max-requests=0 --num-threads=1 --oltp-tables-count=10 \
# --oltp-auto-inc=off prepare \
# The above creates a dataset roughly
# 189,8*2000000*10/1024/1024 = 3620M = 3,5G

NUM_THREADS="1 2 4 8 16 32 64 128 256 512 1023"
#NUM_THREADS="512 1023"

for N in $NUM_THREADS
do
# --oltp-auto-inc=off is undocumented, but important when running galera.
# sysbench assumes that autoinc columns will be a continuos sequence 1,2,3...
# but with galera they're not (auto_increment_increment)
# --mysql-host=esitbi130lab,esitbi129lab,esitbi127lab \
sysbench/sysbench --test=sysbench/tests/db/oltp.lua \
--mysql-host=esitbi130lab,esitbi129lab \
--mysql-user=root --mysql-password=rootpass \
--mysql-table-engine=InnoDB --mysql-engine-trx=yes \
--oltp-table-size=2000000 --max-time=150 \
--max-requests=0 --num-threads=$N --oltp-tables-count=10 \
--oltp-auto-inc=off run
sleep 5
done

And a few sample results follow. First some good results from single
node runs with mysql-galera binary.

Running the test with following options:
Number of threads: 16
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:
read: 3403428
write: 972408
other: 486204
total: 4862040
transactions: 243102 (1620.59 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 4375836 (29170.55 per sec.)
other operations: 486204 (3241.17 per sec.)

General statistics:
total time: 150.0087s
total number of events: 243102
total time taken by event execution: 2399.0378s
response time:
min: 6.04ms
avg: 9.87ms
max: 646.63ms
approx. 95 percentile: 13.62ms

Threads fairness:
events (avg/stddev): 15193.8750/50.02
execution time (avg/stddev): 149.9399/0.00

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 32
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:
read: 4267564
write: 1219304
other: 609652
total: 6096520
transactions: 304826 (2016.24 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 5486868 (36292.38 per sec.)
other operations: 609652 (4032.49 per sec.)

General statistics:
total time: 151.1851s
total number of events: 304826
total time taken by event execution: 4836.2077s
response time:
min: 6.72ms
avg: 15.87ms
max: 5304.59ms
approx. 95 percentile: 22.90ms

Threads fairness:
events (avg/stddev): 9525.8125/137.69
execution time (avg/stddev): 151.1315/0.00

Running the test with following options:
Number of threads: 2
Random number generator seed is 0 and will be ignored

Threads started!

OLTP test statistics:
queries performed:
read: 702226
write: 200636
other: 100318
total: 1003180
transactions: 50159 (334.38 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 902862 (6018.90 per sec.)
other operations: 100318 (668.77 per sec.)

General statistics:
total time: 150.0044s
total number of events: 50159
total time taken by event execution: 299.8241s
response time:
min: 4.48ms
avg: 5.98ms
max: 24.08ms
approx. 95 percentile: 6.88ms

Threads fairness:
events (avg/stddev): 25079.5000/0.50
execution time (avg/stddev): 149.9121/0.00

And here some poor results from 3 node cluster and sysbench connecting
to 1 node only:

Running the test with following options:
Number of threads: 2
Random number generator seed is 0 and will be ignored

Threads started!

OLTP test statistics:
queries performed:
read: 667814
write: 190804
other: 95402
total: 954020
transactions: 47701 (317.99 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 858618 (5723.89 per sec.)
other operations: 95402 (635.99 per sec.)

General statistics:
total time: 150.0060s
total number of events: 47701
total time taken by event execution: 299.8352s
response time:
min: 4.73ms
avg: 6.29ms
max: 24.04ms
approx. 95 percentile: 7.32ms

Threads fairness:
events (avg/stddev): 23850.5000/1.50
execution time (avg/stddev): 149.9176/0.00

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:
read: 965412
write: 275832
other: 137916
total: 1379160
transactions: 68958 (420.27 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 1241244 (7564.82 per sec.)
other operations: 137916 (840.54 per sec.)

General statistics:
total time: 164.0812s
total number of events: 68958
total time taken by event execution: 1306.7396s
response time:
min: 4.75ms
avg: 18.95ms
max: 72592.95ms
approx. 95 percentile: 10.18ms

Threads fairness:
events (avg/stddev): 8619.7500/981.06
execution time (avg/stddev): 163.3425/1.81

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:
read: 699174
write: 199764
other: 99882
total: 998820
transactions: 49941 (323.91 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 898938 (5830.46 per sec.)
other operations: 99882 (647.83 per sec.)

General statistics:
total time: 154.1796s
total number of events: 49941
total time taken by event execution: 2435.1176s
response time:
min: 4.76ms
avg: 48.76ms
max: 20290.17ms
approx. 95 percentile: 14.62ms

Threads fairness:
events (avg/stddev): 3121.3125/278.44
execution time (avg/stddev): 152.1949/1.39

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 32
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:
read: 784378
write: 224108
other: 112054
total: 1120540
transactions: 56027 (340.68 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 1008486 (6132.16 per sec.)
other operations: 112054 (681.35 per sec.)

General statistics:
total time: 164.4585s
total number of events: 56027
total time taken by event execution: 5162.3920s
response time:
min: 4.78ms
avg: 92.14ms
max: 67708.54ms
approx. 95 percentile: 25.22ms

Threads fairness:
events (avg/stddev): 1750.8438/110.76
execution time (avg/stddev): 161.3247/1.00

Henrik Ingo

unread,

Aug 5, 2011, 3:39:40 AM8/5/11

to Alex Yurchenko, seppo....@codership.com, codersh...@googlegroups.com

Ok, I tried increasing wsrep_slave_threads to 32 and then also to 256,
but it doesn't change anything.

sysbench oltp transactions:

thr 32 256
1 203.34 204.02
2 387.25 375.9
4 446.01 402.98
8 253.79 309.35
16 438.92 389.13
32 284.44 301.81
64 401.27 383.11
128 287.86 314.16
256 390.81 365.83
512 294.24 330.12
1023 375.53 325.45

sysbench oltp 95% latency:

thr 32 256
1 4.99 4.97
2 5.37 5.35
4 7.53 7.54
8 9.09 9.11
16 11.37 11.34
32 18.34 18.31
64 222.53 223.86
128 2757.81 3619.14
256 4543.67 4519.25
512 6572.01 6227.27
1023 7697.31 8873.39

I was sampling SHOW STATUS LIKE "%wsrep%"; and the following values
seem bad, according to your documentation:

SET GLOBAL wsrep_slave_threads=32;

| wsrep_local_send_queue | 32 |
| wsrep_local_send_queue_avg | 1.353282 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 1.369707 |

| wsrep_local_send_queue | 34 |
| wsrep_local_send_queue_avg | 0.824589 |
| wsrep_local_recv_queue | 30 |
| wsrep_local_recv_queue_avg | 1.415460 |
| wsrep_flow_control_paused | 0.790793 |
| wsrep_flow_control_sent | 9 |
| wsrep_flow_control_recv | 52 |
| wsrep_cert_deps_distance | 105.201550 |

wsrep_slave_threads=256:

| wsrep_local_send_queue | 190 |
| wsrep_local_send_queue_avg | 52.787226 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 23.375606 |
| wsrep_flow_control_paused | 0.903926 |
| wsrep_flow_control_sent | 22 |
| wsrep_flow_control_recv | 68 |
| wsrep_cert_deps_distance | 137.518919 |
| wsrep_apply_oooe | 0.349014 |
| wsrep_apply_oool | 0.012709 |
| wsrep_apply_window | 2.714530 |

Initially, with only 1 or 2 sysbench threads the queue's are down at
zero, but then start rapidly increasing as seen above.

Again, sampling show processlist gives a picture of what's going on.

In this first one, system thread #3 is in the "post SQL applying"
phase, but time is zero ie it just arrived at that state. Most other
threads are sleeping and waiting for more work to arrive, 1 is already
processing SQL and 1 already committed.

+------+-------------+-----------------+--------+---------+-------+-----------------------------+------------------------------------------------------------------------+

But then you quickly get into the following state, where "post SQL
applying" has been blocked for 2 seconds, and also all client threads
are waiting for commit to complete.

esitbi129lab >show processlist;
+------+-------------+-----------------+--------+---------+-------+-----------------------------+------------------+

I've observed Time column grow to at least 5-6 seconds.

I should add that this doesn't prove it is the "Post SQL applying"
thread that is blocking everything, it could also be something else
that blocks all of these. In any case, it is the "Post SQL applying"
state that gets us into this trouble when viewing show processlist. I
assume this is a state for a galera thread?

henrik

seppo....@codership.com

unread,

Aug 5, 2011, 4:39:08 AM8/5/11

to codersh...@googlegroups.com

Hi Henrik,

I'm reproducing this test in my single physical server hosting 2
logical Galera nodes, and I can confirm that this SQL load profile
causes serious slown down in applying. I have not yet looked into the
actual SQL, but will do that later today.

Enabling parallel applying seems to help a lot, I tried with 4 and 16
applier threads, and the trend in performance gain was clearly
upwards. I believe that in your test runs, you were actually using
only 1 applier all the time. Unfortunately our documentation is not
clear about this, but you cannot launch more appliers simply by
issuing SET GLOBAL wsrep_slave_threads=# command, because appliers are
launched during replication provider change only. You'd better write
in your my.cnf wsrep_slave_threads=... and restart your slave. (it is
enough to set this is slaves only).
BTW, you can visualize slave appliers in slave nodes by 'SHOW
PROCESSLIST' command. You can add more appliers as long as they can
work effectively.

Looking at how master and slaves behave with respect to parallel
applying, it seems that this test load is very sensitive to innodb
buffer pool paging. Only way to overcome that is by adding more
appliers. I start analyzing the actual SQL just in a bit...

Thanks, Seppo

Quoting Henrik Ingo <henri...@avoinelama.fi>:

Henrik Ingo

unread,

Aug 5, 2011, 4:47:59 AM8/5/11

to seppo....@codership.com, codersh...@googlegroups.com

Thanks. I was suspecting that, but since show variables claims that
the wsrep_slave_threads has been set, I didn't actually set it via
my.cnf. It would perhaps be better if the setting of that variable
would fail.

Similarly, I tried setting
wsrep_provider_options="replicator.commit_order=1"; and that also
returns "Query ok" but when doing show variables you see that the
value isn't really changed.

I'll try again with setting parallel slaves from my.cnf.

One thing I don't understand though: If my data set fits in memory
(Innodb_buffer_pool_pages_free > 0), and I just recently loaded the
data with sysbench prepare, then all of it should be in memory? So I
shouldn't be constrained by disk seeks as in a disk based workload.

henrik

Henrik Ingo

unread,

Aug 5, 2011, 4:48:45 AM8/5/11

to seppo....@codership.com, codersh...@googlegroups.com

Ah, so what I wanted to say, it would be better if the variables that
can only be set in my.cnf would return an error when you try to set
them at runtime.

henrik

seppo....@codership.com

unread,

Aug 5, 2011, 4:55:57 AM8/5/11

to codersh...@googlegroups.com

Quoting Henrik Ingo <henri...@avoinelama.fi>:

> One thing I don't understand though: If my data set fits in memory
> (Innodb_buffer_pool_pages_free > 0), and I just recently loaded the
> data with sysbench prepare, then all of it should be in memory? So I
> shouldn't be constrained by disk seeks as in a disk based workload.
>

That's correct, my test environment had less BP and page replacement
was happening heavily. It may be that we were seeing different
problems in our tests.

Henrik Ingo

unread,

Aug 5, 2011, 5:50:24 AM8/5/11

to seppo....@codership.com, codersh...@googlegroups.com

Ok. So I now really have 32 slave threads, as I can see from SHOW PROCESSLIST;

+----+-------------+----------------------------------+--------+---------+------+-----------------------------+------------------------------------------------------------------------------------------------------+

|
+----+-------------+----------------------------------+--------+---------+------+-----------------------------+------------------------------------------------------------------------------------------------------+

| 665 | wsrep aborter idle | NULL

| 3 | post SQL applying (4404280) | NULL

|
+----+-------------+----------------------------------+--------+---------+------+-----------------------------+------------------------------------------------------------------------------------------------------+
38 rows in set (0.00 sec)

This snapshot is interesting. We only have 4 client threads (and 4
more connecting another node), but we already reach the problematic
state where everything is blocked, and no amount of slave threads will
be able to help us.

Anyway, could you elaborate on the meaning of this status counter:

| wsrep_flow_control_paused | 0.518608 |
| wsrep_flow_control_paused | 0.782890 |
| wsrep_flow_control_paused | 0.904079 |

...it seems at some intervals we are doing 90% nothing, which is kind
of what results show too :-) So what is this flow control and why
could it be stuck?

Alex Yurchenko

unread,

Aug 5, 2011, 6:04:43 AM8/5/11

to codersh...@googlegroups.com

On Fri, 5 Aug 2011 08:44:50 +0300, Henrik Ingo wrote:

>> 1) Single galera node cannot be so much slower than a stock server.
>> Single
>> node does not even replicate anything.
>
> Single galera node is good. The first column in my table (or second,
> if you account for the column with #threads) is the same mysql-galera
> binary running standalone.

Damnit! I knew it needed a sleep-over. Completely missed that part.

>> 2) Note that G performance peaks at 8 threads. Sharply.
>
> You mean in general or in my tests? There's a small peak on 8 threads
> in columns 3 to 5 which are the clustered benchmarks, but the
> variability in the results is high enough to almost completely
> account
> for that.

Looked rather sharp and consistent to me, but it is not important now
that you have wsrep stats:

> I've increased the innodb buffer pool to 15000M, other than that it
> is
> the my.cnf from galera demo package. The sysbench data set should be
> around 3GB so fits in memory.

Yes. But it will be IN the buffer pool only after either of:
- you ran sysbench prepare against the complete cluster
- the nodes are sufficiently warmed up (and I think it'd take at least
10 minutes)
Did your benchmark run comply to any of those conditions? If you start
with cold servers, 150 seconds is a too short time to warm them up
sufficiently. If you ran the benchmark over the warm nodes, then we have
a serious issue in slave applying (which should be no different from
stock MySQL slave), but so far the evidence points at cold database.

Note that good single node figures are no excuse here, because single
master is warmed up in parallel by many client threads. As you change to
multi-master you have the following effects: client concurrency on
individual nodes is much less, single slave threads kills it completely.

This looks interesting:
32 threads single node:

> response time:
> min: 6.72ms
> avg: 15.87ms
> max: 5304.59ms
> approx. 95 percentile: 22.90ms

32 threads cluster:

> response time:
> min: 4.78ms
> avg: 92.14ms
> max: 67708.54ms
> approx. 95 percentile: 25.22ms

Notice how close min and 95% values are, which essentially means that
most transactions go normally, but there are a few that spoil the whole
picture. This is a good (or, rather, bad?) indication that there is some
urgent buffer pool fetching going on. Read: slave threads are waiting
for IO.

> Ok, I tried increasing wsrep_slave_threads to 32 and then also to
> 256,
> but it doesn't change anything.
>

> I was sampling SHOW STATUS LIKE "%wsrep%"; and the following values
> seem bad, according to your documentation:
>
> SET GLOBAL wsrep_slave_threads=32;

Unfortunately it does not work this way. You need to reload provider
for this to take effect:

mysql> SET GLOBAL wsrep_provider='none';
mysql> SET GLOBAL wsrep_provider='<path to libgalera_smm.so>';
mysql> SET GLOBAL wsrep_cluster_address='gcomm://<peer node address>';

or as Seppo suggested, put it in my.cnf and restart nodes.

It is easy to see how many slave threads are running from the
processlist: the number of slave threads is the number of system user
threads - 1.

Now these numbers clearly state that the problem is in slave applying.
It is the _only_ reason wsrep_flow_control_paused can be non-zero. And
the most likely cause of it is buffer pool page fetching. And high wa
states suggest the same. However if you confirm that you run the
benchmark on warmed-up nodes, then we should seek another cause. Notice
that with a single slave thread we may be also facing CPU computational
limits - it can only utilize one core (and only between disk IOPs!).

> | wsrep_flow_control_paused | 0.903926 |
> | wsrep_flow_control_sent | 22 |
> | wsrep_flow_control_recv | 68 |

Also sent/recv numbers indicate that nodes are likely to be similar in
performance, so no "slow" node there that would slow down the rest.

As a side note - this is how many appliers might make sense:
> | wsrep_cert_deps_distance | 137.518919 |

> Initially, with only 1 or 2 sysbench threads the queue's are down at
> zero, but then start rapidly increasing as seen above.

This is quite in line with "single slave applier cannot handle all
replication events".

> Again, sampling show processlist gives a picture of what's going on.
>
> In this first one, system thread #3 is in the "post SQL applying"
> phase, but time is zero ie it just arrived at that state. Most other
> threads are sleeping and waiting for more work to arrive, 1 is
> already
> processing SQL and 1 already committed.

> But then you quickly get into the following state, where "post SQL

> applying" has been blocked for 2 seconds, and also all client threads
> are waiting for commit to complete.

This most likely only means that this node (as a whole) is waiting for
some other node to catch-up. Remember, replication (and therefore
commits) is paused in this benchmark for ~90% of the time.

"Post SQL applying" is just a slave thread ;) It reads "Post SQL
applying" because it has just applied an SQL statement, which in this
case (if you have binlog_format='ROW') is a COMMIT for a preceding RBR
binlog event. It is good to see it in this state, it means that it has
no pending work to do.

Hope this explains.

Regards,
Alex

P.S. I see that you have new results with multiple slave threads and
still high flow_control_paused numbers. Please try the benchmark after
10-20 minute heat-up period to rule out cold buffer pool.

Alex Yurchenko

unread,

Aug 5, 2011, 6:34:34 AM8/5/11

to codersh...@googlegroups.com

On Fri, 5 Aug 2011 12:50:24 +0300, Henrik Ingo wrote:

> Anyway, could you elaborate on the meaning of this status counter:
>
> | wsrep_flow_control_paused | 0.518608 |
> | wsrep_flow_control_paused | 0.782890 |
> | wsrep_flow_control_paused | 0.904079 |
>
> ...it seems at some intervals we are doing 90% nothing, which is kind
> of what results show too :-) So what is this flow control and why
> could it be stuck?

Henrik,

the only reason for the flow control to act up is the long recv queue
on one or more nodes. So the rest are waiting for it to catch up. Error
log should show the actual recv queue limits:

110803 0:14:31 [Note] WSREP: Flow-control interval: [12, 23]

they depend on the number of nodes in the cluster.

So it is pretty much pinpoints the problem to slow slave appliers. In
your processlist dumps we see however that slave appliers are idle
("Post SQL applying"). Is it the case on other nodes as well?

Could you please run the benchmark with, say, 32 clients in a purely
master-slave setup after sufficient warm-up and
- post _full_ output of SHOW STATUS LIKE 'wsrep%' from slave
- post top snapshot from slave
- post example processlist from slave
- attach output of SHOW VARIABLES LIKE '%' from slave (going to be long
and ugly, but we better have it all instead of guessing)

Henrik Ingo

unread,

Aug 5, 2011, 7:06:07 AM8/5/11

to Alex Yurchenko, codersh...@googlegroups.com

Thanks for your comments. This is very educational.

> P.S. I see that you have new results with multiple slave threads and still
> high flow_control_paused numbers. Please try the benchmark after 10-20
> minute heat-up period to rule out cold buffer pool.

So the answer is that I haven't been pedantic about using warm up
runs, but in most cases sysbench prepare did precede sysbench run.
Just for good measure I now did a warmup with 1 client per galera node
(ie 3 clients) running for 15 minutes. I then ran the same 2,5 min
tests against 2 nodes (3rd is slave only) and results are the same:

Number of threads: 16
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:

read: 819980
write: 234278
other: 117138
total: 1171396
transactions: 58568 (387.31 per sec.)
deadlocks: 2 (0.01 per sec.)
read/write requests: 1054258 (6971.76 per sec.)
other operations: 117138 (774.63 per sec.)

General statistics:
total time: 151.2183s
total number of events: 58568
total time taken by event execution: 2419.1916s
response time:
min: 5.65ms
avg: 41.31ms
max: 8974.65ms
approx. 95 percentile: 11.39ms

Btw, I think the response time statistics from sysbench are per query,
not per transaction. That would mean that queries generally complete
within 12 ms, however the commit's then hang when they wait on galera,
and therefore the max (and also average) response time is very high.
I'm not 100% sure sysbench does this, but would make sense.

I will now try your next suggestion and collect information you requested.

Henrik Ingo

unread,

Aug 5, 2011, 7:19:20 AM8/5/11

to Alex Yurchenko, codersh...@googlegroups.com

On Fri, Aug 5, 2011 at 1:34 PM, Alex Yurchenko
<alexey.y...@codership.com> wrote:
> So it is pretty much pinpoints the problem to slow slave appliers. In your
> processlist dumps we see however that slave appliers are idle ("Post SQL
> applying"). Is it the case on other nodes as well?

other nodes behave similarly, but hard for me to say what they look
like *at the same instance in time*.

To save people from scrolling, rest of this email contains just the
copy pasted information as requested, no comments from me.

henrik

> Could you please run the benchmark with, say, 32 clients in a purely
> master-slave setup after sufficient warm-up and
> - post _full_ output of SHOW STATUS LIKE 'wsrep%' from slave

esitbi129lab >show status like "%wsrep%";
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid | 5ec0abfe-be8b-11e0-0800-ced57c516764 |
| wsrep_protocol_version | 1 |
| wsrep_last_committed | 5552903 |
| wsrep_replicated | 551856 |
| wsrep_replicated_bytes | 883905522 |
| wsrep_received | 800250 |
| wsrep_received_bytes | 1231189503 |
| wsrep_local_commits | 551779 |
| wsrep_local_cert_failures | 7 |
| wsrep_local_bf_aborts | 260 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.262727 |
| wsrep_flow_control_paused | 0.609370 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 3 |
| wsrep_cert_deps_distance | 108.146154 |
| wsrep_apply_oooe | 0.418227 |
| wsrep_apply_oool | 0.000369 |
| wsrep_apply_window | 1.810714 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.188054 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced (6) |
| wsrep_cert_index_size | 402 |
| wsrep_cluster_conf_id | 3 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 5ec0abfe-be8b-11e0-0800-ced57c516764 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 1 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <in...@codership.com> |
| wsrep_provider_version | 0.8.1(r57) |
| wsrep_ready | ON |
+----------------------------+--------------------------------------+

38 rows in set (0.00 sec)

> - post top snapshot from slave

oscillates between 2 states:

[root@esitbi129lab mysql-galera]# top

top - 14:10:06 up 86 days, 19:48, 1 user, load average: 0.90, 1.64, 1.47
Tasks: 210 total, 1 running, 209 sleeping, 0 stopped, 0 zombie

Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu1 : 0.0%us, 0.3%sy, 0.0%ni, 0.0%id, 99.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu6 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 32958664k total, 21598964k used, 11359700k free, 318092k buffers
Swap: 34996216k total, 0k used, 34996216k free, 16420884k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

17475 root 19 0 16.6g 3.8g 7508 S 0.7 12.1 35:06.58 mysqld
1 root 15 0 10348 700 584 S 0.0 0.0 0:09.29 init

[root@esitbi129lab mysql-galera]# top

top - 14:11:16 up 86 days, 19:50, 1 user, load average: 1.11, 1.53, 1.44
Tasks: 210 total, 1 running, 209 sleeping, 0 stopped, 0 zombie
Cpu0 : 5.4%us, 2.0%sy, 0.0%ni, 79.3%id, 12.0%wa, 0.0%hi, 1.3%si, 0.0%st
Cpu1 : 2.3%us, 0.3%sy, 0.0%ni, 97.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 3.3%us, 0.7%sy, 0.0%ni, 80.8%id, 14.2%wa, 0.3%hi, 0.7%si, 0.0%st
Cpu3 : 2.3%us, 0.7%sy, 0.0%ni, 97.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 4.0%us, 1.3%sy, 0.0%ni, 64.1%id, 30.6%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 1.3%us, 0.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 1.3%us, 0.3%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32958664k total, 21599808k used, 11358856k free, 318092k buffers
Swap: 34996216k total, 0k used, 34996216k free, 16420892k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

17475 root 19 0 16.6g 3.8g 7508 S 26.3 12.2 35:18.58 mysqld
19798 root 15 0 12840 1180 820 R 0.3 0.0 0:00.12 top
21835 root 10 -5 0 0 0 S 0.3 0.0 0:05.68
xfsdatad/2
1 root 15 0 10348 700 584 S 0.0 0.0 0:09.29 init

> - post example processlist from slave

esitbi129lab >show processlist;
+------+-------------+-----------------+------+---------+------+-----------------------------------------------+------------------+

esitbi129lab >show processlist;
+------+-------------+-----------------+------+---------+------+-----------------------------+------------------+

esitbi129lab >show processlist;
+------+-------------+-----------------+------+---------+------+-----------------------------------------------+------------------+

> - attach output of SHOW VARIABLES LIKE '%' from slave (going to be long and
> ugly, but we better have it all instead of guessing)

esitbi129lab >SHOW VARIABLES LIKE '%'\G
*************************** 1. row ***************************
Variable_name: auto_increment_increment
Value: 3
*************************** 2. row ***************************
Variable_name: auto_increment_offset
Value: 2
*************************** 3. row ***************************
Variable_name: autocommit
Value: ON
*************************** 4. row ***************************
Variable_name: automatic_sp_privileges
Value: ON
*************************** 5. row ***************************
Variable_name: back_log
Value: 50
*************************** 6. row ***************************
Variable_name: basedir
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/
*************************** 7. row ***************************
Variable_name: big_tables
Value: OFF
*************************** 8. row ***************************
Variable_name: binlog_cache_size
Value: 32768
*************************** 9. row ***************************
Variable_name: binlog_direct_non_transactional_updates
Value: OFF
*************************** 10. row ***************************
Variable_name: binlog_format
Value: ROW
*************************** 11. row ***************************
Variable_name: bulk_insert_buffer_size
Value: 8388608
*************************** 12. row ***************************
Variable_name: character_set_client
Value: utf8
*************************** 13. row ***************************
Variable_name: character_set_connection
Value: utf8
*************************** 14. row ***************************
Variable_name: character_set_database
Value: latin1
*************************** 15. row ***************************
Variable_name: character_set_filesystem
Value: binary
*************************** 16. row ***************************
Variable_name: character_set_results
Value: utf8
*************************** 17. row ***************************
Variable_name: character_set_server
Value: latin1
*************************** 18. row ***************************
Variable_name: character_set_system
Value: utf8
*************************** 19. row ***************************
Variable_name: character_sets_dir
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/share/mysql/charsets/
*************************** 20. row ***************************
Variable_name: collation_connection
Value: utf8_general_ci
*************************** 21. row ***************************
Variable_name: collation_database
Value: latin1_swedish_ci
*************************** 22. row ***************************
Variable_name: collation_server
Value: latin1_swedish_ci
*************************** 23. row ***************************
Variable_name: completion_type
Value: 0
*************************** 24. row ***************************
Variable_name: concurrent_insert
Value: 1
*************************** 25. row ***************************
Variable_name: connect_timeout
Value: 10
*************************** 26. row ***************************
Variable_name: datadir
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/
*************************** 27. row ***************************
Variable_name: date_format
Value: %Y-%m-%d
*************************** 28. row ***************************
Variable_name: datetime_format
Value: %Y-%m-%d %H:%i:%s
*************************** 29. row ***************************
Variable_name: default_week_format
Value: 0
*************************** 30. row ***************************
Variable_name: delay_key_write
Value: ON
*************************** 31. row ***************************
Variable_name: delayed_insert_limit
Value: 100
*************************** 32. row ***************************
Variable_name: delayed_insert_timeout
Value: 300
*************************** 33. row ***************************
Variable_name: delayed_queue_size
Value: 1000
*************************** 34. row ***************************
Variable_name: div_precision_increment
Value: 4
*************************** 35. row ***************************
Variable_name: engine_condition_pushdown
Value: ON
*************************** 36. row ***************************
Variable_name: error_count
Value: 0
*************************** 37. row ***************************
Variable_name: event_scheduler
Value: OFF
*************************** 38. row ***************************
Variable_name: expire_logs_days
Value: 0
*************************** 39. row ***************************
Variable_name: flush
Value: OFF
*************************** 40. row ***************************
Variable_name: flush_time
Value: 0
*************************** 41. row ***************************
Variable_name: foreign_key_checks
Value: ON
*************************** 42. row ***************************
Variable_name: ft_boolean_syntax
Value: + -><()~*:""&|
*************************** 43. row ***************************
Variable_name: ft_max_word_len
Value: 84
*************************** 44. row ***************************
Variable_name: ft_min_word_len
Value: 4
*************************** 45. row ***************************
Variable_name: ft_query_expansion_limit
Value: 20
*************************** 46. row ***************************
Variable_name: ft_stopword_file
Value: (built-in)
*************************** 47. row ***************************
Variable_name: general_log
Value: OFF
*************************** 48. row ***************************
Variable_name: general_log_file
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/mysqld.log
*************************** 49. row ***************************
Variable_name: group_concat_max_len
Value: 1024
*************************** 50. row ***************************
Variable_name: have_community_features
Value: YES
*************************** 51. row ***************************
Variable_name: have_compress
Value: YES
*************************** 52. row ***************************
Variable_name: have_crypt
Value: YES
*************************** 53. row ***************************
Variable_name: have_csv
Value: YES
*************************** 54. row ***************************
Variable_name: have_dynamic_loading
Value: YES
*************************** 55. row ***************************
Variable_name: have_geometry
Value: YES
*************************** 56. row ***************************
Variable_name: have_innodb
Value: YES
*************************** 57. row ***************************
Variable_name: have_ndbcluster
Value: NO
*************************** 58. row ***************************
Variable_name: have_openssl
Value: NO
*************************** 59. row ***************************
Variable_name: have_partitioning
Value: NO
*************************** 60. row ***************************
Variable_name: have_query_cache
Value: YES
*************************** 61. row ***************************
Variable_name: have_rtree_keys
Value: YES
*************************** 62. row ***************************
Variable_name: have_ssl
Value: NO
*************************** 63. row ***************************
Variable_name: have_symlink
Value: YES
*************************** 64. row ***************************
Variable_name: hostname
Value: esitbi129lab.ntc.nokia.com
*************************** 65. row ***************************
Variable_name: identity
Value: 0
*************************** 66. row ***************************
Variable_name: ignore_builtin_innodb
Value: ON
*************************** 67. row ***************************
Variable_name: init_connect
Value:
*************************** 68. row ***************************
Variable_name: init_file
Value:
*************************** 69. row ***************************
Variable_name: init_slave
Value:
*************************** 70. row ***************************
Variable_name: innodb_adaptive_flushing
Value: ON
*************************** 71. row ***************************
Variable_name: innodb_adaptive_hash_index
Value: ON
*************************** 72. row ***************************
Variable_name: innodb_additional_mem_pool_size
Value: 8388608
*************************** 73. row ***************************
Variable_name: innodb_autoextend_increment
Value: 8
*************************** 74. row ***************************
Variable_name: innodb_autoinc_lock_mode
Value: 2
*************************** 75. row ***************************
Variable_name: innodb_buffer_pool_size
Value: 15728640000
*************************** 76. row ***************************
Variable_name: innodb_change_buffering
Value: inserts
*************************** 77. row ***************************
Variable_name: innodb_checksums
Value: ON
*************************** 78. row ***************************
Variable_name: innodb_commit_concurrency
Value: 0
*************************** 79. row ***************************
Variable_name: innodb_concurrency_tickets
Value: 500
*************************** 80. row ***************************
Variable_name: innodb_data_file_path
Value: ibdata1:10M:autoextend
*************************** 81. row ***************************
Variable_name: innodb_data_home_dir
Value:
*************************** 82. row ***************************
Variable_name: innodb_disallow_writes
Value: OFF
*************************** 83. row ***************************
Variable_name: innodb_doublewrite
Value: OFF
*************************** 84. row ***************************
Variable_name: innodb_fast_shutdown
Value: 1
*************************** 85. row ***************************
Variable_name: innodb_file_format
Value: Antelope
*************************** 86. row ***************************
Variable_name: innodb_file_format_check
Value: Barracuda
*************************** 87. row ***************************
Variable_name: innodb_file_per_table
Value: OFF
*************************** 88. row ***************************
Variable_name: innodb_flush_log_at_trx_commit
Value: 0
*************************** 89. row ***************************
Variable_name: innodb_flush_method
Value:
*************************** 90. row ***************************
Variable_name: innodb_force_recovery
Value: 0
*************************** 91. row ***************************
Variable_name: innodb_io_capacity
Value: 200
*************************** 92. row ***************************
Variable_name: innodb_lock_wait_timeout
Value: 50
*************************** 93. row ***************************
Variable_name: innodb_locks_unsafe_for_binlog
Value: ON
*************************** 94. row ***************************
Variable_name: innodb_log_buffer_size
Value: 8388608
*************************** 95. row ***************************
Variable_name: innodb_log_file_size
Value: 104857600
*************************** 96. row ***************************
Variable_name: innodb_log_files_in_group
Value: 2
*************************** 97. row ***************************
Variable_name: innodb_log_group_home_dir
Value: ./
*************************** 98. row ***************************
Variable_name: innodb_max_dirty_pages_pct
Value: 75
*************************** 99. row ***************************
Variable_name: innodb_max_purge_lag
Value: 0
*************************** 100. row ***************************
Variable_name: innodb_mirrored_log_groups
Value: 1
*************************** 101. row ***************************
Variable_name: innodb_old_blocks_pct
Value: 37
*************************** 102. row ***************************
Variable_name: innodb_old_blocks_time
Value: 0
*************************** 103. row ***************************
Variable_name: innodb_open_files
Value: 300
*************************** 104. row ***************************
Variable_name: innodb_read_ahead_threshold
Value: 56
*************************** 105. row ***************************
Variable_name: innodb_read_io_threads
Value: 4
*************************** 106. row ***************************
Variable_name: innodb_replication_delay
Value: 0
*************************** 107. row ***************************
Variable_name: innodb_rollback_on_timeout
Value: OFF
*************************** 108. row ***************************
Variable_name: innodb_spin_wait_delay
Value: 6
*************************** 109. row ***************************
Variable_name: innodb_stats_method
Value: nulls_equal
*************************** 110. row ***************************
Variable_name: innodb_stats_on_metadata
Value: ON
*************************** 111. row ***************************
Variable_name: innodb_stats_sample_pages
Value: 8
*************************** 112. row ***************************
Variable_name: innodb_strict_mode
Value: OFF
*************************** 113. row ***************************
Variable_name: innodb_support_xa
Value: ON
*************************** 114. row ***************************
Variable_name: innodb_sync_spin_loops
Value: 30
*************************** 115. row ***************************
Variable_name: innodb_table_locks
Value: ON
*************************** 116. row ***************************
Variable_name: innodb_thread_concurrency
Value: 0
*************************** 117. row ***************************
Variable_name: innodb_thread_sleep_delay
Value: 10000
*************************** 118. row ***************************
Variable_name: innodb_use_sys_malloc
Value: ON
*************************** 119. row ***************************
Variable_name: innodb_version
Value: 1.0.16
*************************** 120. row ***************************
Variable_name: innodb_write_io_threads
Value: 4
*************************** 121. row ***************************
Variable_name: insert_id
Value: 0
*************************** 122. row ***************************
Variable_name: interactive_timeout
Value: 28800
*************************** 123. row ***************************
Variable_name: join_buffer_size
Value: 131072
*************************** 124. row ***************************
Variable_name: keep_files_on_create
Value: OFF
*************************** 125. row ***************************
Variable_name: key_buffer_size
Value: 8384512
*************************** 126. row ***************************
Variable_name: key_cache_age_threshold
Value: 300
*************************** 127. row ***************************
Variable_name: key_cache_block_size
Value: 1024
*************************** 128. row ***************************
Variable_name: key_cache_division_limit
Value: 100
*************************** 129. row ***************************
Variable_name: language
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/share/mysql/english/
*************************** 130. row ***************************
Variable_name: large_files_support
Value: ON
*************************** 131. row ***************************
Variable_name: large_page_size
Value: 0
*************************** 132. row ***************************
Variable_name: large_pages
Value: OFF
*************************** 133. row ***************************
Variable_name: last_insert_id
Value: 0
*************************** 134. row ***************************
Variable_name: lc_time_names
Value: en_US
*************************** 135. row ***************************
Variable_name: license
Value: GPL
*************************** 136. row ***************************
Variable_name: local_infile
Value: ON
*************************** 137. row ***************************
Variable_name: locked_in_memory
Value: OFF
*************************** 138. row ***************************
Variable_name: log
Value: OFF
*************************** 139. row ***************************
Variable_name: log_bin
Value: ON
*************************** 140. row ***************************
Variable_name: log_bin_trust_function_creators
Value: OFF
*************************** 141. row ***************************
Variable_name: log_bin_trust_routine_creators
Value: OFF
*************************** 142. row ***************************
Variable_name: log_error
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/esitbi129lab.ntc.nokia.com.err
*************************** 143. row ***************************
Variable_name: log_output
Value: FILE
*************************** 144. row ***************************
Variable_name: log_queries_not_using_indexes
Value: OFF
*************************** 145. row ***************************
Variable_name: log_slave_updates
Value: OFF
*************************** 146. row ***************************
Variable_name: log_slow_queries
Value: OFF
*************************** 147. row ***************************
Variable_name: log_warnings
Value: 1
*************************** 148. row ***************************
Variable_name: long_query_time
Value: 10.000000
*************************** 149. row ***************************
Variable_name: low_priority_updates
Value: OFF
*************************** 150. row ***************************
Variable_name: lower_case_file_system
Value: OFF
*************************** 151. row ***************************
Variable_name: lower_case_table_names
Value: 0
*************************** 152. row ***************************
Variable_name: max_allowed_packet
Value: 1048576
*************************** 153. row ***************************
Variable_name: max_binlog_cache_size
Value: 18446744073709547520
*************************** 154. row ***************************
Variable_name: max_binlog_size
Value: 1073741824
*************************** 155. row ***************************
Variable_name: max_connect_errors
Value: 10
*************************** 156. row ***************************
Variable_name: max_connections
Value: 1024
*************************** 157. row ***************************
Variable_name: max_delayed_threads
Value: 20
*************************** 158. row ***************************
Variable_name: max_error_count
Value: 64
*************************** 159. row ***************************
Variable_name: max_heap_table_size
Value: 16777216
*************************** 160. row ***************************
Variable_name: max_insert_delayed_threads
Value: 20
*************************** 161. row ***************************
Variable_name: max_join_size
Value: 18446744073709551615
*************************** 162. row ***************************
Variable_name: max_length_for_sort_data
Value: 1024
*************************** 163. row ***************************
Variable_name: max_long_data_size
Value: 1048576
*************************** 164. row ***************************
Variable_name: max_prepared_stmt_count
Value: 16382
*************************** 165. row ***************************
Variable_name: max_relay_log_size
Value: 0
*************************** 166. row ***************************
Variable_name: max_seeks_for_key
Value: 18446744073709551615
*************************** 167. row ***************************
Variable_name: max_sort_length
Value: 1024
*************************** 168. row ***************************
Variable_name: max_sp_recursion_depth
Value: 0
*************************** 169. row ***************************
Variable_name: max_tmp_tables
Value: 32
*************************** 170. row ***************************
Variable_name: max_user_connections
Value: 0
*************************** 171. row ***************************
Variable_name: max_write_lock_count
Value: 18446744073709551615
*************************** 172. row ***************************
Variable_name: min_examined_row_limit
Value: 0
*************************** 173. row ***************************
Variable_name: multi_range_count
Value: 256
*************************** 174. row ***************************
Variable_name: myisam_data_pointer_size
Value: 6
*************************** 175. row ***************************
Variable_name: myisam_max_sort_file_size
Value: 9223372036853727232
*************************** 176. row ***************************
Variable_name: myisam_mmap_size
Value: 18446744073709551615
*************************** 177. row ***************************
Variable_name: myisam_recover_options
Value: OFF
*************************** 178. row ***************************
Variable_name: myisam_repair_threads
Value: 1
*************************** 179. row ***************************
Variable_name: myisam_sort_buffer_size
Value: 8388608
*************************** 180. row ***************************
Variable_name: myisam_stats_method
Value: nulls_unequal
*************************** 181. row ***************************
Variable_name: myisam_use_mmap
Value: OFF
*************************** 182. row ***************************
Variable_name: net_buffer_length
Value: 16384
*************************** 183. row ***************************
Variable_name: net_read_timeout
Value: 30
*************************** 184. row ***************************
Variable_name: net_retry_count
Value: 10
*************************** 185. row ***************************
Variable_name: net_write_timeout
Value: 60
*************************** 186. row ***************************
Variable_name: new
Value: OFF
*************************** 187. row ***************************
Variable_name: old
Value: OFF
*************************** 188. row ***************************
Variable_name: old_alter_table
Value: OFF
*************************** 189. row ***************************
Variable_name: old_passwords
Value: OFF
*************************** 190. row ***************************
Variable_name: open_files_limit
Value: 5120
*************************** 191. row ***************************
Variable_name: optimizer_prune_level
Value: 1
*************************** 192. row ***************************
Variable_name: optimizer_search_depth
Value: 62
*************************** 193. row ***************************
Variable_name: optimizer_switch
Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on
*************************** 194. row ***************************
Variable_name: pid_file
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/mysqld.pid
*************************** 195. row ***************************
Variable_name: plugin_dir
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/lib/mysql/plugin
*************************** 196. row ***************************
Variable_name: port
Value: 3306
*************************** 197. row ***************************
Variable_name: preload_buffer_size
Value: 32768
*************************** 198. row ***************************
Variable_name: profiling
Value: OFF
*************************** 199. row ***************************
Variable_name: profiling_history_size
Value: 15
*************************** 200. row ***************************
Variable_name: protocol_version
Value: 10
*************************** 201. row ***************************
Variable_name: pseudo_thread_id
Value: 1187
*************************** 202. row ***************************
Variable_name: query_alloc_block_size
Value: 8192
*************************** 203. row ***************************
Variable_name: query_cache_limit
Value: 1048576
*************************** 204. row ***************************
Variable_name: query_cache_min_res_unit
Value: 4096
*************************** 205. row ***************************
Variable_name: query_cache_size
Value: 0
*************************** 206. row ***************************
Variable_name: query_cache_type
Value: OFF
*************************** 207. row ***************************
Variable_name: query_cache_wlock_invalidate
Value: OFF
*************************** 208. row ***************************
Variable_name: query_prealloc_size
Value: 8192
*************************** 209. row ***************************
Variable_name: rand_seed1
Value:
*************************** 210. row ***************************
Variable_name: rand_seed2
Value:
*************************** 211. row ***************************
Variable_name: range_alloc_block_size
Value: 4096
*************************** 212. row ***************************
Variable_name: read_buffer_size
Value: 131072
*************************** 213. row ***************************
Variable_name: read_only
Value: OFF
*************************** 214. row ***************************
Variable_name: read_rnd_buffer_size
Value: 262144
*************************** 215. row ***************************
Variable_name: relay_log
Value:
*************************** 216. row ***************************
Variable_name: relay_log_index
Value:
*************************** 217. row ***************************
Variable_name: relay_log_info_file
Value: relay-log.info
*************************** 218. row ***************************
Variable_name: relay_log_purge
Value: ON
*************************** 219. row ***************************
Variable_name: relay_log_space_limit
Value: 0
*************************** 220. row ***************************
Variable_name: report_host
Value:
*************************** 221. row ***************************
Variable_name: report_password
Value:
*************************** 222. row ***************************
Variable_name: report_port
Value: 3306
*************************** 223. row ***************************
Variable_name: report_user
Value:
*************************** 224. row ***************************
Variable_name: rpl_recovery_rank
Value: 0
*************************** 225. row ***************************
Variable_name: secure_auth
Value: OFF
*************************** 226. row ***************************
Variable_name: secure_file_priv
Value:
*************************** 227. row ***************************
Variable_name: server_id
Value: 0
*************************** 228. row ***************************
Variable_name: skip_external_locking
Value: ON
*************************** 229. row ***************************
Variable_name: skip_name_resolve
Value: OFF
*************************** 230. row ***************************
Variable_name: skip_networking
Value: OFF
*************************** 231. row ***************************
Variable_name: skip_show_database
Value: OFF
*************************** 232. row ***************************
Variable_name: slave_compressed_protocol
Value: OFF
*************************** 233. row ***************************
Variable_name: slave_exec_mode
Value: STRICT
*************************** 234. row ***************************
Variable_name: slave_load_tmpdir
Value: /tmp
*************************** 235. row ***************************
Variable_name: slave_net_timeout
Value: 3600
*************************** 236. row ***************************
Variable_name: slave_skip_errors
Value: OFF
*************************** 237. row ***************************
Variable_name: slave_transaction_retries
Value: 10
*************************** 238. row ***************************
Variable_name: slow_launch_time
Value: 2
*************************** 239. row ***************************
Variable_name: slow_query_log
Value: OFF
*************************** 240. row ***************************
Variable_name: slow_query_log_file
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/mysqld-slow.log
*************************** 241. row ***************************
Variable_name: socket
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/mysqld.sock
*************************** 242. row ***************************
Variable_name: sort_buffer_size
Value: 2097144
*************************** 243. row ***************************
Variable_name: sql_auto_is_null
Value: ON
*************************** 244. row ***************************
Variable_name: sql_big_selects
Value: ON
*************************** 245. row ***************************
Variable_name: sql_big_tables
Value: OFF
*************************** 246. row ***************************
Variable_name: sql_buffer_result
Value: OFF
*************************** 247. row ***************************
Variable_name: sql_log_bin
Value: ON
*************************** 248. row ***************************
Variable_name: sql_log_off
Value: OFF
*************************** 249. row ***************************
Variable_name: sql_log_update
Value: ON
*************************** 250. row ***************************
Variable_name: sql_low_priority_updates
Value: OFF
*************************** 251. row ***************************
Variable_name: sql_max_join_size
Value: 18446744073709551615
*************************** 252. row ***************************
Variable_name: sql_mode
Value:
*************************** 253. row ***************************
Variable_name: sql_notes
Value: ON
*************************** 254. row ***************************
Variable_name: sql_quote_show_create
Value: ON
*************************** 255. row ***************************
Variable_name: sql_safe_updates
Value: OFF
*************************** 256. row ***************************
Variable_name: sql_select_limit
Value: 18446744073709551615
*************************** 257. row ***************************
Variable_name: sql_slave_skip_counter
Value:
*************************** 258. row ***************************
Variable_name: sql_warnings
Value: OFF
*************************** 259. row ***************************
Variable_name: ssl_ca
Value:
*************************** 260. row ***************************
Variable_name: ssl_capath
Value:
*************************** 261. row ***************************
Variable_name: ssl_cert
Value:
*************************** 262. row ***************************
Variable_name: ssl_cipher
Value:
*************************** 263. row ***************************
Variable_name: ssl_key
Value:
*************************** 264. row ***************************
Variable_name: storage_engine
Value: InnoDB
*************************** 265. row ***************************
Variable_name: sync_binlog
Value: 0
*************************** 266. row ***************************
Variable_name: sync_frm
Value: ON
*************************** 267. row ***************************
Variable_name: system_time_zone
Value: EEST
*************************** 268. row ***************************
Variable_name: table_definition_cache
Value: 256
*************************** 269. row ***************************
Variable_name: table_lock_wait_timeout
Value: 50
*************************** 270. row ***************************
Variable_name: table_open_cache
Value: 64
*************************** 271. row ***************************
Variable_name: table_type
Value: InnoDB
*************************** 272. row ***************************
Variable_name: thread_cache_size
Value: 0
*************************** 273. row ***************************
Variable_name: thread_handling
Value: one-thread-per-connection
*************************** 274. row ***************************
Variable_name: thread_stack
Value: 262144
*************************** 275. row ***************************
Variable_name: time_format
Value: %H:%i:%s
*************************** 276. row ***************************
Variable_name: time_zone
Value: SYSTEM
*************************** 277. row ***************************
Variable_name: timed_mutexes
Value: OFF
*************************** 278. row ***************************
Variable_name: timestamp
Value: 1312543083
*************************** 279. row ***************************
Variable_name: tmp_table_size
Value: 16777216
*************************** 280. row ***************************
Variable_name: tmpdir
Value: /tmp
*************************** 281. row ***************************
Variable_name: transaction_alloc_block_size
Value: 8192
*************************** 282. row ***************************
Variable_name: transaction_prealloc_size
Value: 4096
*************************** 283. row ***************************
Variable_name: tx_isolation
Value: REPEATABLE-READ
*************************** 284. row ***************************
Variable_name: unique_checks
Value: ON
*************************** 285. row ***************************
Variable_name: updatable_views_with_limit
Value: YES
*************************** 286. row ***************************
Variable_name: version
Value: 5.1.57
*************************** 287. row ***************************
Variable_name: version_comment
Value: wsrep_0.8.1.r3114
*************************** 288. row ***************************
Variable_name: version_compile_machine
Value: x86_64
*************************** 289. row ***************************
Variable_name: version_compile_os
Value: unknown-linux-gnu
*************************** 290. row ***************************
Variable_name: wait_timeout
Value: 28800
*************************** 291. row ***************************
Variable_name: warning_count
Value: 0
*************************** 292. row ***************************
Variable_name: wsrep_auto_increment_control
Value: ON
*************************** 293. row ***************************
Variable_name: wsrep_causal_reads
Value: OFF
*************************** 294. row ***************************
Variable_name: wsrep_certify_nonPK
Value: ON
*************************** 295. row ***************************
Variable_name: wsrep_cluster_address
Value: gcomm://cluster127
*************************** 296. row ***************************
Variable_name: wsrep_cluster_name
Value: my_wsrep_cluster
*************************** 297. row ***************************
Variable_name: wsrep_convert_LOCK_to_trx
Value: OFF
*************************** 298. row ***************************
Variable_name: wsrep_data_home_dir
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/mysql/var/
*************************** 299. row ***************************
Variable_name: wsrep_dbug_option
Value:
*************************** 300. row ***************************
Variable_name: wsrep_debug
Value: OFF
*************************** 301. row ***************************
Variable_name: wsrep_drupal_282555_workaround
Value: ON
*************************** 302. row ***************************
Variable_name: wsrep_local_cache_size
Value: 20971520
*************************** 303. row ***************************
Variable_name: wsrep_max_ws_rows
Value: 131072
*************************** 304. row ***************************
Variable_name: wsrep_max_ws_size
Value: 1073741824
*************************** 305. row ***************************
Variable_name: wsrep_node_incoming_address
Value: 172.21.212.26:3306
*************************** 306. row ***************************
Variable_name: wsrep_node_name
Value: esitbi129lab.ntc.nokia.com
*************************** 307. row ***************************
Variable_name: wsrep_notify_cmd
Value:
*************************** 308. row ***************************
Variable_name: wsrep_on
Value: ON
*************************** 309. row ***************************
Variable_name: wsrep_provider
Value: /data/c/mysql-5.1.57-galera-0.8.1-x86_64/galera/lib/libgalera_smm.so
*************************** 310. row ***************************
Variable_name: wsrep_provider_options
Value: evs.debug_log_mask = 0x1; evs.inactive_check_period =
PT1S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0;
evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S;
evs.keepalive_period = PT1S; evs.max_install_timeouts = 1;
evs.send_window = 4; evs.stats_report_period = PT1M;
evs.suspect_timeout = PT5S; evs.use_aggregate = true;
evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout =
PT5M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16;
gcs.fc_master_slave = NO; gcs.max_packet_size = 32616;
gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807;
gcs.recv_q_soft_limit = 0.25; gmcast.listen_addr = tcp://0.0.0.0:4567;
gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.time_wait = PT5S;
gmcast.version = 0; pc.checksum = true; pc.ignore_quorum = false;
pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version =
0; protonet.backend = asio; protonet.version = 0;
replicator.commit_order = 3
*************************** 311. row ***************************
Variable_name: wsrep_retry_autocommit
Value: 1
*************************** 312. row ***************************
Variable_name: wsrep_slave_threads
Value: 32
*************************** 313. row ***************************
Variable_name: wsrep_sst_auth
Value: ********
*************************** 314. row ***************************
Variable_name: wsrep_sst_donor
Value:
*************************** 315. row ***************************
Variable_name: wsrep_sst_method
Value: mysqldump
*************************** 316. row ***************************
Variable_name: wsrep_sst_receive_address
Value: AUTO
*************************** 317. row ***************************
Variable_name: wsrep_start_position
Value: 00000000-0000-0000-0000-000000000000:-1
*************************** 318. row ***************************
Variable_name: wsrep_ws_persistency
Value: OFF
318 rows in set (0.00 sec)

Henrik Ingo

unread,

Aug 5, 2011, 7:53:29 AM8/5/11

to Alex Yurchenko, codersh...@googlegroups.com

Hi Alex

Just to give an update from a different perspective. Since your
results are from sysbench 0.4, I downloaded that to give it a try.
This seems to scale well, I'm now at 16 threads (writing against 1
node in 3 node cluster). show processlist and show status seem very
healthy now, no stalls and no flow_control_paused.

So I guess the next step would be to investigate differences in the
SQL done by sysbench 0.5 and sysbench 0.4 test and learn what the
offending transactions might be.

This reminds me in so many ways how it was to learn MySQL cluster :-)
Seems like clustered database are just very sensitive creatures. Say
something wrong and they become really upset - much like beutiful
women!

henrik

sysbench 0.4.12: multi-threaded system evaluation benchmark

No DB drivers specified, using mysql

Running the test with following options:
Number of threads: 16

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations, 1 pct of values are
returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 15 times)
Done.

OLTP test statistics:
queries performed:

read: 3663044
write: 1308230
other: 523292
total: 5494566
transactions: 261646 (1744.23 per sec.)

deadlocks: 0 (0.00 per sec.)

read/write requests: 4971274 (33140.28 per sec.)
other operations: 523292 (3488.45 per sec.)

Test execution summary:
total time: 150.0070s
total number of events: 261646
total time taken by event execution: 2397.4358
per-request statistics:
min: 4.82ms
avg: 9.16ms
max: 255.02ms
approx. 95 percentile: 13.30ms

Threads fairness:
events (avg/stddev): 16352.8750/53.89
execution time (avg/stddev): 149.8397/0.00

Alex Yurchenko

unread,

Aug 5, 2011, 7:59:17 AM8/5/11

to henri...@avoinelama.fi, codersh...@googlegroups.com

Gotcha, you sneaky bastard!

Ehm! What we see in this process list is:
- slave thread 11 is trying to apply trx #5610196 (global transaction
sequence number) for good 2 seconds already and is currently in delete
row handler.
- all the rest 31 slave threads have applied their stuff (post RBR
applying, notice, their seqno's are all between 5610197 and 5610227) and
since by default we have in-order commit policy they are all waiting for
that thread to finish applying and commit.

Why the transaction #5610196 is so long-running is probably a question
to the storage engine, so I'd suggest to try this benchmark with innodb
plugin. I'll be checking variables for possible culprits.

Two important notes here:
1) single node sysbench stats also show some long running (few seconds)
queries (or transactions, I believe it is the latter, but not important
now), but since they are executed locally, they are not ordered yet and
so don't affect other threads. On the slave, however, we need to
preserve the order and when such long running trx happens on slave it
blocks everything.
2) with an average trx execution time of ~10ms, it is virtually
impossible to compensate for several seconds by adding more slaves.

In the above processlist the node is waiting for another node.

Again slow transaction

Regards,
Alex

Henrik Ingo

unread,

Aug 5, 2011, 8:30:07 AM8/5/11

to Alex Yurchenko, codersh...@googlegroups.com

Thanks. I always wanted to ask you what those numbers mean, now I know.

All of this is already run with innodb_plugin, anything else I'd
consider a waste of time as nobody should use stock mysql 5.1 innodb
if they are interested in performance.

I'm now running another round with both sysbench versions and
collecting query logs so I can analyze them.

henrik

--

Henrik Ingo

unread,

Aug 5, 2011, 9:19:27 AM8/5/11

to Alex Yurchenko, codersh...@googlegroups.com

Hi again. This is the last email for this week, I'll enjoy weekend
after this :-)

I captured some query logs both for sysbench 0.4 and sysbench 0.5,
browsed through them manually and with mk-query-digest. The weird
thing is... All of this looks fine.

As far as I can see, the queries are similar in both tests.
Distribution could have been a bit different, there seems to be
slightly more updates and deletes in 0.5. But these tests do roughly
the same thing.

All updates and deletes and inserts are single row (where id=?).

What's funnier, there is not a single query that takes seconds to
return, also commit statements are in the order of 1ms. So the problem
seems to be observable on the slave side only (as you have explained),
but then I'm running out of tools to see what precisely is happening.
Why does a single row delete block for seconds?

henrik

Alex Yurchenko

unread,

Aug 5, 2011, 9:58:22 AM8/5/11

to henri...@avoinelama.fi, codersh...@googlegroups.com

On Fri, 5 Aug 2011 16:19:27 +0300, Henrik Ingo wrote:
> Hi again. This is the last email for this week, I'll enjoy weekend
> after this :-)

Sure! Life is short and weekends are even shorter! ;)

> I captured some query logs both for sysbench 0.4 and sysbench 0.5,
> browsed through them manually and with mk-query-digest. The weird
> thing is... All of this looks fine.
>
> As far as I can see, the queries are similar in both tests.
> Distribution could have been a bit different, there seems to be
> slightly more updates and deletes in 0.5. But these tests do roughly
> the same thing.
>
> All updates and deletes and inserts are single row (where id=?).
>
> What's funnier, there is not a single query that takes seconds to
> return, also commit statements are in the order of 1ms. So the
> problem

Well, IIRC, sysbench has reported several seconds max time even for a
single server. It might be 1 transaction in a thousand.

> seems to be observable on the slave side only (as you have
> explained),
> but then I'm running out of tools to see what precisely is happening.
> Why does a single row delete block for seconds?

Honestly, no idea at the moment. So I'd go to suggest that there are
some catastrophic IO trashing events happening which periodically block
some of innodb IO threads. But this does not explain normal operation of
sysbench-0.4.

However, what we found with sysbench-0.4 is that it has very low page
replacement rate, effectively localizing its transactions around some
part of data. E.g. you could run with 6Gb buffer pool and 18Gb of data
and still not require parallel appliers.

Sysbench-0.5 might have changed this somehow, but since all 3Gb of data
should fit well into the 15Gb buffer pool, there should be no page
replacement going on at all.

So what does this leave us with? Dirty page flushing? Perhaps playing
with innodb_max_dirty_pages_pct could reveal something? Just guessing at
the moment.

Could it be a HW IO problem?

Regards,
Alex

Henrik Ingo

unread,

Aug 5, 2011, 9:58:47 AM8/5/11

to Alex Yurchenko, codersh...@googlegroups.com

Interesting. The final test inspired by reading query logs produces
good results (comparable to single node case and sysbench 0.4).

$ sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=esitbi130lab \
> --mysql-user=root --mysql-password=rootpass --mysql-table-engine=InnoDB \
> --mysql-engine-trx=yes --oltp-table-size=2000000 --max-time=300 \
> --max-requests=0 --num-threads=1 --oltp-tables-count=10 \
> --oltp-auto-inc=off cleanup

sysbench 0.5: multi-threaded system evaluation benchmark

Dropping table 'sbtest1'...
Dropping table 'sbtest2'...
Dropping table 'sbtest3'...
Dropping table 'sbtest4'...
Dropping table 'sbtest5'...
Dropping table 'sbtest6'...
Dropping table 'sbtest7'...
Dropping table 'sbtest8'...
Dropping table 'sbtest9'...
Dropping table 'sbtest10'...
$ sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=esitbi130lab
--mysql-user=root --mysql-password=rootpass
--mysql-table-engine=InnoDB --mysql-engine-trx=yes
--oltp-table-size=2000000 --max-time=300 --max-requests=0
--num-threads=1 --oltp-tables-count=1 --oltp-auto-inc=off prepare

sysbench 0.5: multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 2000000 records into 'sbtest1'
$ ./sysbench_galera.sh

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 32
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:

read: 4081392
write: 1166112
other: 583056
total: 5830560
transactions: 291528 (1943.40 per sec.)

deadlocks: 0 (0.00 per sec.)

read/write requests: 5247504 (34981.18 per sec.)
other operations: 583056 (3886.80 per sec.)

General statistics:
total time: 150.0094s
total number of events: 291528
total time taken by event execution: 4798.6997s
response time:
min: 7.10ms
avg: 16.46ms
max: 407.09ms
approx. 95 percentile: 24.65ms

Threads fairness:
events (avg/stddev): 9110.2500/23.64
execution time (avg/stddev): 149.9594/0.00

So what changed?
- one table instead of 10. (multiple tables is new feature in 0.5)
- resulting in 10x smaller data set.

Also, if I create just one table but increase amount of rows by 10x:

$ sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=esitbi130lab
--mysql-user=root --mysql-password=rootpass
--mysql-table-engine=InnoDB --mysql-engine-trx=yes
--oltp-table-size=2000000 --max-time=300 --max-requests=0
--num-threads=1 --oltp-tables-count=1 --oltp-auto-inc=off cleanup

sysbench 0.5: multi-threaded system evaluation benchmark

Dropping table 'sbtest1'...
$ sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=esitbi130lab
--mysql-user=root --mysql-password=rootpass
--mysql-table-engine=InnoDB --mysql-engine-trx=yes
--oltp-table-size=20000000 --max-time=300 --max-requests=0
--num-threads=1 --oltp-tables-count=1 --oltp-auto-inc=off prepare

sysbench 0.5: multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 20000000 records into 'sbtest1'
$ sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=esitbi130lab
--mysql-user=root --mysql-password=rootpass
--mysql-table-engine=InnoDB --mysql-engine-trx=yes
--oltp-table-size=20000000 --max-time=150 --max-requests=0
--num-threads=32 --oltp-tables-count=1 --oltp-auto-inc=off run

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 32
Random number generator seed is 0 and will be ignored

WARNING: timer was never started
Threads started!

OLTP test statistics:
queries performed:

read: 1366652
write: 390472
other: 195236
total: 1952360
transactions: 97618 (635.13 per sec.)

deadlocks: 0 (0.00 per sec.)

read/write requests: 1757124 (11432.31 per sec.)
other operations: 195236 (1270.26 per sec.)

General statistics:
total time: 153.6980s
total number of events: 97618
total time taken by event execution: 4917.3192s
response time:
min: 6.74ms
avg: 50.37ms
max: 107641.08ms
approx. 95 percentile: 25.24ms

Threads fairness:
events (avg/stddev): 3050.5625/75.01
execution time (avg/stddev): 153.6662/0.02

Which is the poor result again.

And no, it's not because the buffer pool isn't big enough:

----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 16081928192; in additional pool allocated 0
Dictionary memory allocated 45967
Buffer pool size 960000
Free buffers 608681
Database pages 347009
Old database pages 128075
Modified db pages 57369

(On all 3 servers.)

Btw, I can also spot similar state for update as we did earlier for delete:

Update_rows_log_event::ha_update_row(8572091) | NULL |

RBR applying (8572102) | NULL |

RBR applying (8572113) | NULL |

RBR applying (8572093) | NULL |

Spotted this too, what does the following mean (16):

Some pointer gone haywire? I got it with the sysbench run as given above.

So, there is something with the larger 3+GB data set that gives poor
performance, when same workload with 360 k data set works fine.

henrik

Alex Yurchenko

unread,

Aug 5, 2011, 5:51:28 PM8/5/11

to henri...@avoinelama.fi, codersh...@googlegroups.com

Hi Henrik!

I did some reading:

1)
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-performance-adaptive_flushing.html
This describes blocking of normal IO threads due to dirty page flushing
(and in that supports my initial suspicion), but claims that it had been
fixed in InnoDB plugin 1.0.4 with innodb_adaptive_flushing. I've seen
situations where adaptive flushing was harmful to performance. Maybe
there is something in slave load profile that breaks adaptive flushing
heuristics.

2)
http://dimitrik.free.fr/db_STRESS_MySQL_InnoDB_dirty_pages_and_log_size_impact_May2009.html
(a very long read)
This might be outdated, but it shows how buggy this could be.

Overall it looks like there are at least the following variables that
could drastically affect dirty page flushing and overall performance
when data fit into the buffer pool:

- innodb_log_file_size
in your tests it was 100M and it is rather low for 15G buffer pool. One
would expect that small log file size would result in more frequent and,
therefore, smooth flushing, but hell knows. At least 2) shows that
increasing it to 1G can significantly improve average latencies and
performance.

- innodb_adaptive_flushing
ON in your benchmarks. It may not work well in general and it may not
work well with this particular log file size to buffer pool size ratio.
Perhaps turning it off can change things.

- innodb_max_dirty_pages_pct
75 in your tests. Some people use 90, some 15. Trying these 2 settings
can change things.

- innodb_thread_concurrency
Unlimited in your tests and I think thats how it should be. However the
author of 2) got the best (and smoothest) results with
innodb_thread_concurrency=16 - but with a "master" load. So it may or
may not help, but I'd try it if nothing of the above helps.

These might seem like weak suggestions given that single server works
fine. Well, I already mentioned this but if you have 32 client
connections on master and one of them suddenly is delayed by several
seconds, other connections won't notice it unless they need to lock the
same rows. On a slave situation is drastically different: 1 second delay
is felt by all cluster because of transaction ordering. Also slave load
does not have selects and therefore its access pattern can differ from
master's. And this difference could be enough to cause flushing to
perform less efficiently.

Regards,
Alex

Henrik Ingo

unread,

Aug 8, 2011, 8:18:03 AM8/8/11

to Alex Yurchenko, codersh...@googlegroups.com

Hi Alex

This is a very interesting theory and the symptoms people report from
dirty page flushing are similar to what I see. It would also fit well
into the explanation why a larger data set creates the problem: it is
not the size of the data per se, but then the sysbench workload will
also end up hitting a larger number of pages.

I had noticed that the innodb_log_file_size was rather small, but
since I had no problem in the single node case, I didn't touch it. (I
wanted to change as little as possible.)

So I re-started this morning by setting innodb_log_file_size to
2x1500M a number I've seen used on some larger production servers. I'm
now getting good numbers. Will blog about them when finished. Thanks a
lot for your help!

I also re-ran some of my baseline single node tests, this is related
to the bug I filed earlier today:
https://bugs.launchpad.net/codership-mysql/+bug/822599

I tried 3 variations of single node, running mysql-galera binary, but
no other nodes connected:
- sync_binlog and innodb_flush_logs_at_trx_commit and
innodb_doublewrite all set to 1 (as you should with a single node
database, or DRBD for that matter).
- all set to zero (as you can do in a galera cluster)
- innodb_* parameters set to 1 but sync_binlog to 0 (commonly used
"compromise" between durability and performance)

As expected, results for the first configuration are great and
workload is pretty much cpu bound, while for the second results are
poor and limited by the iops of the server.

Results for the third configuration are similar to the first. This is
unexpected, I would expect them to be significantly better due to: 1)
no fsyncs from binlog, should double throughput and 2) un-breaks
InnoDB group commit, this should also give much better performance.

However, what I'm seeing is that performance is still as poor as with
sync_binlog=1. Since wsrep and galera plugins are enabled (even if no
other node is connected), is it possible this has similar effect as
sync_binlog=1 in that the transaction internally becomes an XA
transaction and this breaks group commit for InnoDB? If yes, could you
remind me how I can start the mysql-galera binary with wsrep/galera
completely disabled?

henrik

--

Henrik Ingo

unread,

Aug 8, 2011, 8:38:42 AM8/8/11

to Alex Yurchenko, codersh...@googlegroups.com

On Mon, Aug 8, 2011 at 3:18 PM, Henrik Ingo <henri...@avoinelama.fi> wrote:
> - sync_binlog and innodb_flush_logs_at_trx_commit and
> innodb_doublewrite all set to 1 (as you should with a single node
> database, or DRBD for that matter).
> - all set to zero (as you can do in a galera cluster)
> - innodb_* parameters set to 1 but sync_binlog to 0 (commonly used
> "compromise" between durability and performance)
>
> As expected, results for the first configuration are great and
> workload is pretty much cpu bound, while for the second results are
> poor and limited by the iops of the server.

Oops, that's of course backwards :-) First is poor, second is great.

> Results for the third configuration are similar to the first.

This one is correct, ie results from third configuration were also poor.

> transaction and this breaks group commit for InnoDB? If yes, could you
> remind me how I can start the mysql-galera binary with wsrep/galera
> completely disabled?

Ah, you already told me in this thread:

"it may make sense to run a benchmark against a standalone MySQL-wsrep
server without Galera loaded (SET GLOBAL wsrep_provider='none';)"

henrik

Alex Yurchenko

unread,

Aug 8, 2011, 9:55:51 AM8/8/11

to codersh...@googlegroups.com

Hi Henrik!

Great to know that benchmarking issue was sorted out!

As for disabling wsrep, there are two approaches:

1) as you mentioned, setting wsrep_provider='none'. This will however
only unload the provider, the MySQL-wsrep patch would still be there and
still be calling wsrep calls provided by a dummy provider (which are
stubs). But the server behaviour in this configuration should be
identical to standalone MySQL.

2) by setting wsrep_on=0. It is a session variable and would disable
most of wsrep patch even if provider is loaded.

However in practice I don't think there is much difference in
performance between them. And anyway, it is nothing compared to what
"safe" settings do to a standalone server.

Speaking of them, I think it is a matter of hardware and setup. In my
experience sync_binlog was by far the most evil setting, dropping
sysbench (0.4) performance by a factor of 10 (seriously, 7200RPM 500Gb
sata drive, XFS filesystem). However different combination of IO
throughput and latencies may cause different results. But frankly, I
don't know how sync_binlog works internally.

BTW, are you sure the binlog was actually written to disk?

Regards,
Alex

--

Henrik Ingo

unread,

Aug 10, 2011, 7:36:44 AM8/10/11

to Alex Yurchenko, codersh...@googlegroups.com

Hi Alex, and thanks again for all the help so far

I'm now compiling the data from the benchmarks and writing some
commentary. I realized there's something I'd still like you to
explain:

In debugging the poorly performing configuration, we saw a show
processlist *on the slave* side where it turns out InnoDB is stuck
purging its log file, 1 Galera thread is stuck waiting on InnoDB and
31 Galera threads are stuck waiting for the first Galera thread.

The question is, why does any of this cause poor performance *on the
master*? As you've explained Galera replication to me, this should
happen:

1) client does transaction against master.
2) on commit, the transaction is synchronously replicated to slaves,
but not applied to innodb.
3) after commit, each slave applies the transaction asynchronously.

From the above it doesn't make sense that the master is stuck when
InnoDB on slave side has an issue. So what is really happening?

henrik

Alex Yurchenko

unread,

Aug 10, 2011, 9:28:26 AM8/10/11

to henri...@avoinelama.fi, codersh...@googlegroups.com

On Wed, 10 Aug 2011 14:36:44 +0300, Henrik Ingo wrote:
> Hi Alex, and thanks again for all the help so far
>
> I'm now compiling the data from the benchmarks and writing some
> commentary. I realized there's something I'd still like you to
> explain:
>
> In debugging the poorly performing configuration, we saw a show
> processlist *on the slave* side where it turns out InnoDB is stuck
> purging its log file, 1 Galera thread is stuck waiting on InnoDB and
> 31 Galera threads are stuck waiting for the first Galera thread.
>
> The question is, why does any of this cause poor performance *on the
> master*? As you've explained Galera replication to me, this should
> happen:
>
> 1) client does transaction against master.
> 2) on commit, the transaction is synchronously replicated to slaves,
> but not applied to innodb.
> 3) after commit, each slave applies the transaction asynchronously.
>
> From the above it doesn't make sense that the master is stuck when
> InnoDB on slave side has an issue. So what is really happening?
>
> henrik

Good question!

Don't forget that we're striving to be synchronous. There are two
reasons for it:

1) Even in asynchronous replication you want your slave to be current
(so there's not much point in asynchronous replication anyway). It makes
no sense to have master consistently outperforming slave since, well,
slave will diverge from master more and more to the point of being
unusable. So you want the master to perform on average as SLOW (and
isn't it ironic?) as a slave. The question is only how far are you
willing to average. And so Galera has a flow control to address that.

2) For efficient Galera multi-master operation you want the nodes to be
as close as possible since the bigger the gap, the higher the
probability of certification failure, so being synchronous is not only
an HA requirement, it is a must for efficient multi-master. And so by
default Galera has a really tight flow control.

But it can be arbitrarily relaxed with certain parameters
(http://www.codership.com/wiki/doku.php?id=galera_parameters_0.8):

- gcs.fc_limit tells how tight (in terms of wsrep_recv_queue length)
flow control should be.
- gcs.fc_factor is a minor tuning hysteresis parameter. Should be
closer to 1.0 especially for large gcs.fc_limit.
- gcs.fc_master_slave is just tells not to adjust fc_limit if you add
more nodes (in multi-master it makes sense to increase it to compensate
for increased replication noise) but it does not make Galera
master-slave only. Just a flow control control ;)

So if you have trx rate of 3K trx/second then setting gcs.fc_factor=3K
will give you approximately 1 second tolerance.

Note that flow control in Galera is _reactive_. So in the above example
there is no guarantee that recv queue will never grow to 10K due to some
replication spike. But if it does, then the cluster will be paused until
recv queue goes below gcs.fc_limit*gcs.fc_factor.

Regards,
Alex

Reply all

Reply to author

Forward