sysbench test gets stuck on MariaDB 10.1?


Ljr Yang

May 2, 2016, 9:35:20 AM
to codership
Hi all,
Recently we used sysbench to test MariaDB 10.1.13. During the test, we found that the server locks up and cannot write.
 
Testing environment:
server1:  10.2.1.242    (HP OEM Fusion IO PCIE Card  3.2TB )
server2:  10.2.1.239    (SanDisk Fusion IO PCIE Card 3.2TB)

sysbench test script:
 sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --num-threads=16  --mysql-host=localhost --mysql-port=3306 --mysql-socket=/data/mysql2/data/mysql.sock --mysql-user=root --mysql-password=123456 --mysql-db=test --max-time=0 --max-requests=0  --report-interval=1 --oltp-tables-count=16 run

With wsrep_on = ON, after running for a few minutes or a few hours, both server1 and server2 got stuck: all writes stopped and only SELECT statements ran normally:
................
[ 815s] threads: 16, tps: 6176.00, reads: 86431.00, writes: 24699.00, response time: 3.37ms (95%), errors: 0.00, reconnects:  0.00
[ 816s] threads: 16, tps: 6064.99, reads: 84907.86, writes: 24256.96, response time: 3.66ms (95%), errors: 0.00, reconnects:  0.00
[ 817s] threads: 16, tps: 6097.00, reads: 85394.05, writes: 24394.02, response time: 3.54ms (95%), errors: 0.00, reconnects:  0.00
[ 818s] threads: 16, tps: 6131.97, reads: 85819.52, writes: 24529.86, response time: 3.65ms (95%), errors: 0.00, reconnects:  0.00
[ 819s] threads: 16, tps: 6143.04, reads: 86036.63, writes: 24575.18, response time: 3.44ms (95%), errors: 0.00, reconnects:  0.00
[ 820s] threads: 16, tps: 5819.80, reads: 81532.22, writes: 23332.20, response time: 3.60ms (95%), errors: 0.00, reconnects:  0.00
[ 821s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[ 822s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[ 823s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[ 824s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[ 825s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[ 826s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
................

There are no error messages and no locking issues reported in the MariaDB error.log file.
After executing pstack `pidof mysqld` > pstack.log, sysbench runs normally again and resumes writing data.

Every time this problem occurs, the pstack.log file contains a few threads waiting on a lock:
#0  0x00007f2b04fe0334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2b04fdb5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f2b04fdb4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
....................
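
A stall like the one in the report above shows up as consecutive report lines with tps: 0.00. As a minimal sketch, assuming the sysbench output was saved to a file (the file name sysbench.log and the function name first_stall are hypothetical), a small awk filter can print the first interval of such a run, which may help line the stall up with timestamps in other logs:

```shell
#!/bin/sh
# first_stall LOGFILE [N]
# Print the "[ NNNs]" marker of the first report line that begins a run of
# at least N consecutive intervals with "tps: 0.00" (N defaults to 3).
# Exit status is 0 if such a run was found, 1 otherwise.
# Hypothetical usage: first_stall sysbench.log 5
first_stall() {
    awk -v need="${2:-3}" '
        /tps: 0\.00,/ {
            if (run == 0) start = $0      # remember where the run began
            run++
            if (run == need) {
                sub(/\].*/, "]", start)   # keep only the "[ NNNs]" prefix
                print start
                found = 1
                exit
            }
            next
        }
        { run = 0 }                       # any other line ends the run
        END { exit !found }
    ' "$1"
}
```

Requiring several zero intervals in a row avoids flagging a single slow second as a hang.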

With wsrep_on = OFF, the sysbench test runs normally on MariaDB 10.1.13.

For comparison, we downloaded Percona-XtraDB-Cluster-5.6.28-rel76.1-25.14.1.Linux.x86_64.ssl101.tar.gz and ran the same sysbench test, and found the same problem.

We can't confirm whether this is a Fusion-io card issue or a Galera issue. Please help us analyze it. Thank you.

The MariaDB config file is attached, along with the processlist, InnoDB status, and pstack output that we saved when the problem occurred.

Attachments:
mariadb_my.cnf
show_processlist.txt
show_engine_innodb_status.txt
pstack.log

antonio falzarano

May 2, 2016, 2:00:05 PM
to codership
Hi,
In my experience, in a write-heavy environment Galera has problems with transaction locks, and sometimes the whole cluster blocks.
I suggest testing with writes going to a single node to see whether you have the same problem.

Regards
Antonio Falzarano

alexey.y...@galeracluster.com

May 2, 2016, 5:07:17 PM
to Ljr Yang, codership
It would be great if you could reproduce it with the original
Codership's binaries: http://galeracluster.com/downloads/

And if you can, then file an issue at
https://github.com/codership/mysql-wsrep and submit the following info
from both servers:

# cat /proc/$(pidof mysqld)/limits | grep "open files"
# cat /proc/sys/vm/dirty_ratio
# cat /proc/sys/vm/nr_hugepages
# cat /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/block/sda//queue/scheduler

mysql> SHOW FULL PROCESSLIST;
mysql> SHOW GLOBAL VARIABLES\G
mysql> SHOW STATUS LIKE 'wsrep%';
mysql> SHOW ENGINE InnoDB STATUS\G

Regards,
Alex



alexey.y...@galeracluster.com

May 2, 2016, 5:22:38 PM
to Ljr Yang, codership

And of course
# gdb $(which mysqld) -p $(pidof mysqld) -batch -ex "thr apply all bt" > stack.txt
but only after you collect the info above. pstack does not seem to print all
thread stacks.
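
The checklist from these two messages can be gathered in one pass. The following is only a sketch: the function names snapshot and collect_diag and the output file names are hypothetical, and it assumes a single mysqld process and a mysql client that can log in without prompting (for example via ~/.my.cnf). The stack dump is taken last, as requested, since attaching a debugger may disturb the hung state (running pstack un-stuck the server in the original report).

```shell
#!/bin/sh
# snapshot FILE...: print each file under a "== FILE ==" header and note
# files that cannot be read instead of aborting.
snapshot() {
    for f in "$@"; do
        echo "== $f =="
        if [ -r "$f" ]; then cat "$f"; else echo "(not readable)"; fi
    done
}

# collect_diag OUTDIR [BLOCKDEV]: gather the requested information into OUTDIR.
# BLOCKDEV defaults to sda; on the Fusion-io hosts in this thread it would be fioa.
# Hypothetical usage (as root, on each server): collect_diag /tmp/diag fioa
collect_diag() {
    out=$1
    dev=${2:-sda}
    pid=$(pidof mysqld) || { echo "mysqld is not running" >&2; return 1; }
    mkdir -p "$out" || return 1
    {
        echo "== open files limit =="
        grep "open files" "/proc/$pid/limits"
        snapshot /proc/sys/vm/dirty_ratio /proc/sys/vm/nr_hugepages \
                 /sys/kernel/mm/transparent_hugepage/enabled \
                 "/sys/block/$dev/queue/scheduler"
    } > "$out/system.txt"
    mysql -e 'SHOW FULL PROCESSLIST'       > "$out/processlist.txt"
    mysql -e 'SHOW GLOBAL VARIABLES\G'     > "$out/global_variables.txt"
    mysql -e "SHOW STATUS LIKE 'wsrep%'"   > "$out/wsrep_status.txt"
    mysql -e 'SHOW ENGINE InnoDB STATUS\G' > "$out/innodb_status.txt"
    # Stack traces of all threads come last, after everything else is saved.
    gdb "$(which mysqld)" -p "$pid" -batch -ex "thread apply all bt" \
        > "$out/stack.txt" 2>&1
}
```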

Ljr Yang

May 2, 2016, 9:56:37 PM
to codership
Thank you.
From beginning to end, we ran the sysbench test with writes going to a single node.
Lastly, we started the Galera cluster with only one node, which handles both reads and writes.

On Tuesday, May 3, 2016 at 2:00:05 AM UTC+8, antonio falzarano wrote:

Will Fong

May 2, 2016, 11:38:22 PM
to Ljr Yang, codership
Hi,

On Mon, May 2, 2016 at 9:35 PM, Ljr Yang <dbms...@gmail.com> wrote:
> sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --num-threads=16

My guess here is that you're using sysbench 0.4 that was included with
your distro, which had some issues running in a Galera cluster. Sorry,
I don't remember the exact reason, but you shouldn't have any problems
running the latest version, 0.5.

You can get the source here: https://github.com/akopytov/sysbench

Running your test, with some local modifications:

[root@server1 ~]# sysbench/sysbench/sysbench
--test=sysbench/sysbench/tests/db/oltp.lua --num-threads=16
--mysql-host=localhost --mysql-port=3306 --mysql-user=root
--mysql-password= --mysql-db=test --max-time=0 --max-requests=0
--report-interval=300 --oltp-tables-count=16 run
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Report intermediate results every 300 second(s)
Random number generator seed is 0 and will be ignored


Initializing worker threads...

Threads started!

[ 300s] threads: 16, tps: 525.87, reads: 7362.90, writes: 2103.60,
response time: 50.10ms (95%), errors: 0.01, reconnects: 0.00
[ 600s] threads: 16, tps: 508.29, reads: 7116.12, writes: 2033.15,
response time: 50.69ms (95%), errors: 0.01, reconnects: 0.00
[ 900s] threads: 16, tps: 503.63, reads: 7051.08, writes: 2014.57,
response time: 51.16ms (95%), errors: 0.00, reconnects: 0.00


Hope that helps!

-will


--
Will Fong, Senior Support Engineer
MariaDB Corporation

Ljr Yang

May 2, 2016, 11:39:44 PM
to codership, dbms...@gmail.com
Today we continued the sysbench test on MariaDB 10.1 (wsrep_on=ON, only one node). After running for 2262 s, sysbench could no longer write:
.............................................
[2256s] threads: 16, tps: 5311.88, reads: 74425.29, writes: 21299.51, response time: 3.68ms (95%), errors: 0.00, reconnects:  0.00
[2257s] threads: 16, tps: 3919.00, reads: 54800.07, writes: 15622.02, response time: 3.67ms (95%), errors: 0.00, reconnects:  0.00
[2258s] threads: 16, tps: 5430.13, reads: 76032.76, writes: 21718.50, response time: 3.70ms (95%), errors: 0.00, reconnects:  0.00
[2259s] threads: 16, tps: 5504.99, reads: 77060.86, writes: 22025.96, response time: 3.68ms (95%), errors: 0.00, reconnects:  0.00
[2260s] threads: 16, tps: 5537.01, reads: 77520.09, writes: 22142.03, response time: 3.62ms (95%), errors: 0.00, reconnects:  0.00
[2261s] threads: 16, tps: 5518.00, reads: 77242.99, writes: 22070.00, response time: 3.62ms (95%), errors: 0.00, reconnects:  0.00
[2262s] threads: 16, tps: 3351.91, reads: 46997.77, writes: 13461.65, response time: 3.60ms (95%), errors: 0.00, reconnects:  0.00
[2263s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2264s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2265s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2266s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2267s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2269s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2270s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2271s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[2272s] threads: 16, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
..................


cat /proc/$(pidof mysqld)/limits | grep "open files" 
Max open files            20005                20005                files 

cat /proc/sys/vm/dirty_ratio 
20

cat /proc/sys/vm/nr_hugepages 
40960

cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

cat /sys/block/sda//queue/scheduler 
noop anticipatory deadline [cfq]


mysql> SHOW FULL PROCESSLIST;
Attached file: 1. show_full_processlist.txt

mysql> SHOW GLOBAL VARIABLES\G
Attached file: 2.show_global_variables.txt

mysql> SHOW STATUS LIKE 'wsrep%';
Attached file: 3.show_status_wsrep.txt

mysql> SHOW ENGINE InnoDB STATUS\G
Attached file: 4.show_engine_innodb_status.txt

gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $(pidof mysqld) > gdb.log

Attached file: 5.gdb.log

Running the gdb command above printed some errors:
Unhandled dwarf expression opcode 0xf3
Unhandled dwarf expression opcode 0xf3
Unhandled dwarf expression opcode 0xf3
Unhandled dwarf expression opcode 0xf3
Unhandled dwarf expression opcode 0xf3
...................

This afternoon we will download Codership's binaries and test with them. Thanks.

On Tuesday, May 3, 2016 at 5:22:38 AM UTC+8, Alexey Yurchenko wrote:
1. show_full_processlist.txt
2.show_global_variables.txt
3.show_status_wsrep.txt
4.show_engine_innodb_status.txt
5.gdb.log

Ljr Yang

May 2, 2016, 11:45:56 PM
to codership, dbms...@gmail.com


On Tuesday, May 3, 2016 at 11:39:44 AM UTC+8, Ljr Yang wrote:
cat /sys/block/fioa/queue/scheduler   (HP OEM PCIe card; the sales representative told us that customers cannot modify this parameter)
none

Ljr Yang

May 9, 2016, 9:28:30 PM
to codership, dbms...@gmail.com
Hi Will Fong,
   Thank you. We are using sysbench-0.5-6.el6.x86_64, installed from the Percona yum repo:

   name = Percona-Release YUM repository - Source packages


On Tuesday, May 3, 2016 at 11:38:22 AM UTC+8, Will Fong wrote:

Ljr Yang

May 9, 2016, 9:47:04 PM
to codership

Additional information:
Our production environment is a 3-node MariaDB Galera Cluster 10.0.24. The I/O device is a Dell PowerEdge R720 with SAS RAID 10, 3.2 TB, and it has been running normally and continuously.

Online test environment: the I/O device is a Dell PowerEdge R610 with RAID 1, 100 GB. When we use sysbench 0.5 to test MariaDB Galera Cluster 10.0.24/25 and MariaDB 10.1.13 there, everything runs normally.

Recently we wanted to upgrade from MariaDB Galera Cluster 10.0.24 to MariaDB 10.1 on Fusion-io, but in the Fusion-io test we ran into the problem described above.

On Monday, May 2, 2016 at 9:35:20 PM UTC+8, Ljr Yang wrote:

Ljr Yang

May 24, 2016, 10:46:25 PM
to codership
Thank you.

We contacted another company's DBA yesterday and found out that it is a CentOS 6.6 bug.

Reference links:
https://www.infoq.com/news/2015/05/redhat-futex
https://groups.google.com/forum/#!msg/mechanical-sympathy/QbmpZxp6C64/0M4_EbzSLj4J


After we switched to CentOS 6.5, MariaDB 10.1 (Galera cluster) runs normally.

Ljr Yang

May 28, 2016, 7:04:36 AM
to codership
Additional information:

Our machines' CPU is an Intel(R) Xeon(R) CPU E5-2630 v3, which is Haswell-based, so we hit the bug when using CentOS 6.6.

Over the last two days, we tested MariaDB 10.1 master/slave and Galera cluster on CentOS 6.7, and everything runs normally.

For your reference. Thank you.
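
The pattern reported in this thread (CentOS 6.5 and 6.7 fine, CentOS 6.6 affected) can be turned into a quick check of the running kernel. This is a rough sketch under stated assumptions, not an authoritative test: it assumes the affected el6 kernels are those from 2.6.32-504 (the CentOS 6.6 base kernel) up to, but not including, 2.6.32-504.16.2, which is our reading of the linked discussion and should be verified against the vendor errata. The function names are hypothetical, and sort -V requires GNU coreutils.

```shell
#!/bin/sh
# version_lt A B: true if A sorts strictly before B in version order.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# futex_suspect RELEASE: true if an el6 kernel release string falls in the
# range ASSUMED to carry the futex bug (>= 2.6.32-504 and < 2.6.32-504.16.2).
# Hypothetical usage:
#   futex_suspect "$(uname -r)" && echo "kernel may be affected, consider updating"
futex_suspect() {
    rel=${1%%.el6*}                      # drop the ".el6.x86_64" style suffix
    case $rel in
        2.6.32-*) ;;                     # only el6-era kernels are considered
        *) return 1 ;;
    esac
    ! version_lt "$rel" "2.6.32-504" && version_lt "$rel" "2.6.32-504.16.2"
}
```

Comparing against a version range rather than the distribution name matters because a CentOS 6.6 host with an updated kernel would no longer be in the suspect range.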

 



On Wednesday, May 25, 2016 at 10:46:25 AM UTC+8, Ljr Yang wrote: