PXC node crashes after 30-40 minutes


Ovidiu Lixandru

Jun 27, 2013, 5:39:51 AM
to percona-d...@googlegroups.com
Hi,

I have a 3-node PXC cluster running 5.5.31-23.7.5.438.rhel6.x86_64 on CentOS 6.4, with 8 CPU cores and 32GB of RAM. Since the upgrade (from 5.5.30), one of the nodes crashes after a few tens of minutes with different mysqld signals (6 or 11). Here is a signal 6 crash:

07:32:11 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any

key_buffer_size=268435456
read_buffer_size=25165824
max_used_connections=26
max_threads=502
thread_count=10
connection_count=9
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 24942792 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x17b7a700
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f9245112d88 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7d2795]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6aae74]
/lib64/libpthread.so.0[0x37db80f500]
/lib64/libc.so.6(gsignal+0x35)[0x37db4328a5]
/lib64/libc.so.6(abort+0x175)[0x37db434085]
/usr/lib64/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x2d9)[0x7f925683be39]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM11post_commitEPNS_9TrxHandleE+0xd9)[0x7f9256859729]
/usr/lib64/libgalera_smm.so(galera_post_commit+0x5e)[0x7f925687116e]
/usr/sbin/mysqld(_Z25wsrep_cleanup_transactionP3THD+0x97)[0x65f7d7]
/usr/sbin/mysqld[0x6b0c96]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x4b9)[0x6b2189]
/usr/sbin/mysqld(_Z12trans_commitP3THD+0x47)[0x64ae87]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1580)[0x598a20]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x33b)[0x59cc5b]
/usr/sbin/mysqld[0x59dce2]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1af2)[0x59fee2]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x167)[0x5a04d7]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x14f)[0x63c40f]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x63c5f1]
/lib64/libpthread.so.0[0x37db807851]
/lib64/libc.so.6(clone+0x6d)[0x37db4e890d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f8e200009a0): is an invalid pointer
Connection ID (thread ID): 16776
Status: NOT_KILLED
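
As a sanity check, the worst-case memory estimate in the report can be reproduced from the variables it prints. sort_buffer_size is not shown, so the value below is only an assumption (the same 24M as read_buffer_size); the point is just that the worst case comes out around 24GB, which should still fit in the 32GB on this box:

# rough sketch; sort_buffer_size is assumed, not taken from the report
key_buffer_size=268435456
read_buffer_size=25165824
sort_buffer_size=25165824   # assumed
max_threads=502
echo $(( (key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads) / 1024 )) K
# prints about 24.9 million K, i.e. roughly 24GB worst case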

Another crash, with signal 11 this time:

08:43:22 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any

key_buffer_size=268435456
read_buffer_size=25165824
max_used_connections=27
max_threads=502
thread_count=7
connection_count=6
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 24942792 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7d2795]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6aae74]
/lib64/libpthread.so.0[0x37db80f500]
/usr/lib64/libgalera_smm.so(_ZN6gcache10RingBuffer14get_new_bufferEl+0x167)[0x7f80b695eb27]
/usr/lib64/libgalera_smm.so(_ZN6gcache10RingBuffer6mallocEl+0x39)[0x7f80b695ed29]
/usr/lib64/libgalera_smm.so(_ZN6gcache6GCache6mallocEl+0x97)[0x7f80b6960337]
/usr/lib64/libgalera_smm.so(gcs_defrag_handle_frag+0x92)[0x7f80b6a09462]
/usr/lib64/libgalera_smm.so(gcs_core_recv+0x489)[0x7f80b6a0ed09]
/usr/lib64/libgalera_smm.so(+0x151770)[0x7f80b6a15770]
/lib64/libpthread.so.0[0x37db807851]
/lib64/libc.so.6(clone+0x6d)[0x37db4e890d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

The other nodes haven't had any problems since the upgrade. The upgrade was needed for its fixes (PHP and an undefined symbol in the MySQL libs...).

Any ideas or hints on what I could do to debug this? The crashes are really annoying because it's a production server (of course :) ).

Thank you.

Ovidiu Lixandru

Jun 29, 2013, 5:15:35 PM
to percona-d...@googlegroups.com
The follow-up: I couldn't pinpoint the exact cause because a fix was needed fast, but I was able to solve it by deleting everything in the datadir and reinitializing the node.
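
For the record, the procedure was roughly the usual "blow away the node and let it pull a full SST" dance. A sketch, assuming the stock init script, the default /var/lib/mysql datadir, and that the other two nodes are healthy and can act as SST donors:

service mysql stop                        # stop the crashing node
mv /var/lib/mysql /var/lib/mysql.broken   # keep the old datadir aside, just in case
mkdir /var/lib/mysql
chown mysql:mysql /var/lib/mysql
service mysql start                       # node rejoins the cluster and pulls a full SST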

Cheers.

Raghavendra D Prabhu

Jun 30, 2013, 8:27:43 PM
to percona-d...@googlegroups.com
Hi,
Thanks for the follow-up. If this recurs, can you report it here: https://bugs.launchpad.net/percona-xtradb-cluster

Regards,
--
Raghavendra Prabhu | http://about.me/raghavendra.prabhu
Product Lead | Percona XtraDB Cluster (PXC)
Percona LLC. - http://www.percona.com / Blog: http://www.mysqlperformanceblog.com/
Contact: http://wnohang.net/contact | GPG: 0xD72BE977

Make plans to attend Percona Live London MySQL Conference 2013: http://www.percona.com/live/london-2013/

Maarten van Baarsel

Sep 12, 2013, 3:10:12 AM
to percona-d...@googlegroups.com
On 1/7/2013 02:27, Raghavendra D Prabhu wrote:
> Hi,
>
> * On Sat, Jun 29, 2013 at 02:15:35PM -0700, Ovidiu Lixandru
> <ovidiu....@gmail.com> wrote:
>> The followup: I couldn't pinpoint the exact cause because a fix was
>> needed
>> fast, but I was able to solve it by deleting everything in datadir and
>> reinitializing the node.
>
> Thanks for the followup. If this recurs, can you report it here -
> https://bugs.launchpad.net/percona-xtradb-cluster

I just had a crash which I think has the same backtrace as the second one Ovidiu reported:

2013-09-12T02:00:43.611686+02:00 database-1 mysqld: 00:00:43 UTC - mysqld got signal 11 ;
[...]
2013-09-12T02:00:43.611789+02:00 database-1 mysqld:
2013-09-12T02:00:43.611797+02:00 database-1 mysqld: key_buffer_size=3221225472
2013-09-12T02:00:43.611806+02:00 database-1 mysqld: read_buffer_size=1048576
2013-09-12T02:00:43.612495+02:00 database-1 mysqld: max_used_connections=10
2013-09-12T02:00:43.612536+02:00 database-1 mysqld: max_threads=130
2013-09-12T02:00:43.612549+02:00 database-1 mysqld: thread_count=5
2013-09-12T02:00:43.612559+02:00 database-1 mysqld: connection_count=5
2013-09-12T02:00:43.612569+02:00 database-1 mysqld: It is possible that mysqld could use up to
2013-09-12T02:00:43.612850+02:00 database-1 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3546721 K bytes of memory
2013-09-12T02:00:43.612888+02:00 database-1 mysqld: Hope that's ok; if not, decrease some variables in the equation.
2013-09-12T02:00:43.612897+02:00 database-1 mysqld:
2013-09-12T02:00:43.612906+02:00 database-1 mysqld: Thread pointer: 0x0
2013-09-12T02:00:43.612916+02:00 database-1 mysqld: Attempting backtrace. You can use the following information to find out
2013-09-12T02:00:43.612925+02:00 database-1 mysqld: where mysqld died. If you see no messages after this, something went
2013-09-12T02:00:43.612932+02:00 database-1 mysqld: terribly wrong...

2013-09-12T02:00:43.653178+02:00 database-1 mysqld: stack_bottom = 0 thread_stack 0x40000
2013-09-12T02:00:43.663492+02:00 database-1 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7dbebe]
2013-09-12T02:00:43.663962+02:00 database-1 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6b1fc4]
2013-09-12T02:00:43.663962+02:00 database-1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fde249dccb0]
2013-09-12T02:00:43.664666+02:00 database-1 mysqld: /usr/lib/libgalera_smm.so(_ZN6gcache10RingBuffer14get_new_bufferEl+0x13c)[0x7fde2184f96c]
2013-09-12T02:00:43.665243+02:00 database-1 mysqld: /usr/lib/libgalera_smm.so(_ZN6gcache10RingBuffer6mallocEl+0x2d)[0x7fde2184fb7d]
2013-09-12T02:00:43.665278+02:00 database-1 mysqld: /usr/lib/libgalera_smm.so(_ZN6gcache6GCache6mallocEl+0x72)[0x7fde21851f62]
2013-09-12T02:00:43.673047+02:00 database-1 mysqld: /usr/lib/libgalera_smm.so(gcs_defrag_handle_frag+0x92)[0x7fde218fc0d2]
2013-09-12T02:00:43.673075+02:00 database-1 mysqld: /usr/lib/libgalera_smm.so(gcs_core_recv+0x4e9)[0x7fde21902549]
2013-09-12T02:00:43.673903+02:00 database-1 mysqld: /usr/lib/libgalera_smm.so(+0x15d5c9)[0x7fde219075c9]
2013-09-12T02:00:43.673903+02:00 database-1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fde249d4e9a]
2013-09-12T02:00:43.673903+02:00 database-1 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fde240c8ccd]
2013-09-12T02:00:43.673903+02:00 database-1 mysqld: You may download the Percona Server operations manual by visiting
2013-09-12T02:00:43.673903+02:00 database-1 mysqld: http://www.percona.com/software/percona-server/. You may find information
2013-09-12T02:00:43.673903+02:00 database-1 mysqld: in the manual which will help you identify the cause of the crash.
2013-09-12T02:00:54.383735+02:00 database-1 mysqld_safe: Number of processes running now: 0
2013-09-12T02:00:54.387844+02:00 database-1 mysqld_safe: WSREP: not restarting wsrep node automatically
2013-09-12T02:00:54.392125+02:00 database-1 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

I'm on Ubuntu 12.04.

root@database-1:/home/mrten# dpkg -l | grep -i percona
ii libmysqlclient18 1:5.5.32-rel31.0-549.precise Percona Server database client library
ii percona-toolkit 2.2.4 Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.1.4-657-1.precise Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.5 5.5.31-23.7.5-438.precise Percona Server database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.31-23.7.5-438.precise Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii percona-xtradb-cluster-galera-2.x 152.precise Galera components of Percona XtraDB Cluster
ii percona-xtradb-cluster-server-5.5 5.5.31-23.7.5-438.precise Percona Server database server binaries

No issue created as there is already this one:

https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1152565

Maarten.

Maarten van Baarsel

Sep 12, 2013, 3:17:24 AM
to percona-d...@googlegroups.com
On 12/9/2013 09:10, Maarten van Baarsel wrote:

> No issue created as there is already this one:
>
> https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1152565

Reading that bug, how do I check for a corrupted 'gcache file'?

I haven't done anything to the server yet; it's not really production
but it's nearing that state rapidly.

This wasn't a restart of the server, and I haven't done an SST in (I think)
weeks. I did restart replication from the current master to the cluster
(which ends on another member), so there was a burst of queries happening.

thanks,
M.


Alex Yurchenko

Sep 12, 2013, 7:06:14 PM
to percona-d...@googlegroups.com
On 2013-09-12 10:17, Maarten van Baarsel wrote:
> On 12/9/2013 09:10, Maarten van Baarsel wrote:
>
>> No issue created as there is already this one:
>>
>> https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1152565
>
> Reading that bug, how do I check for a corrupted 'gcache file'?

There's no such tool at the moment.

> I haven't done anything to the server yet, it's not really production
> but nearing that state rapidly.
>
> This wasn't a restart of the server, I haven't done an SST in - I think
> - weeks, I did restart replication from the current master to the
> cluster (which ends on another member) so there was a burst of queries
> happening.

So far it does seem to be rather random (and rather rare).

> thanks,
> M.

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Maarten van Baarsel

Sep 13, 2013, 12:06:43 PM
to percona-d...@googlegroups.com
On 13-09-2013 01:06:14, Alex Yurchenko wrote:
> On 2013-09-12 10:17, Maarten van Baarsel wrote:
>> On 12/9/2013 09:10, Maarten van Baarsel wrote:
>>
>>> No issue created as there is already this one:
>>>
>>> https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1152565
>>
>> Reading that bug, how do I check for a corrupted 'gcache file'?
>
> There's no such tool at the moment.

OK, I'll save the gcache file for later and restart my server then.

M.

Alex Yurchenko

Sep 13, 2013, 1:09:57 PM
to percona-d...@googlegroups.com
Don't save the file. At the moment gcache files are not intended to be persistent;
they get reinitialized on every startup.
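
To see what that means in practice: on a default setup the gcache is a ring-buffer file in the datadir (galera.cache, sized by the gcache.size provider option, 128M by default) and it is simply recreated when mysqld starts. A quick sketch, assuming the default datadir and file name:

ls -lh /var/lib/mysql/galera.cache   # the gcache ring-buffer file
service mysql restart
ls -lh /var/lib/mysql/galera.cache   # recreated on startup; previous contents are gone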

> M.

Regards,
Alex