MySQL Master - MySQL-wsrep Slave replication issue


Oleksandr Drach

Feb 28, 2013, 4:39:43 AM
to codersh...@googlegroups.com
Dear Codership community!

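I have used Percona's guide <http://www.percona.com/doc/percona-xtrabackup/howtos/setting_up_replication.html> to create regular MySQL Master - MySQL-wsrep Slave replication. Eventually I am going to move to a wsrep cluster following Codership's guide <http://www.codership.com/content/5-tips-migrating-your-mysql-server-galera-cluster>.

At the moment I am still using the old regular MySQL server for production and the wsrep cluster as a slave in test mode.
I have tried both STATEMENT and ROW binary log formats, but eventually I get the following replication error again: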
>SHOW SLAVE STATUS \G
....
Last_SQL_Errno: 1047
Last_SQL_Error: Error 'Unknown command' on query. Default database: 'mydb'. Query: 'BEGIN'
....

What do you suggest to fix this Master-Slave replication issue?

Thanks in advance!

Ilias Bertsimas

Feb 28, 2013, 4:48:27 AM
to codersh...@googlegroups.com
Hello Oleksandr,

It might be the conversion of locks into a single block of multiple transactions: http://dev.mysql.com/doc/refman/5.5/en/begin-end.html
Do you have the option wsrep_convert_LOCK_to_trx enabled by any chance?

Kind Regards,
Ilias.

Oleksandr Drach

Feb 28, 2013, 4:53:50 AM
to codersh...@googlegroups.com
Hello Ilias,

This option was disabled in my configs (wsrep.cnf):
...
# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0
...
So probably there is another cause.

seppo....@codership.com

Feb 28, 2013, 5:05:10 AM
to codersh...@googlegroups.com
Does this error happen when you start your Galera node? During mysql
startup, the replication slave threads are launched automatically and
start pulling and applying new replication events. However, the node
might not yet have joined the Galera cluster, and Galera does not
allow clients to run any transactions until the node is in sync with the
cluster. This would result in an 'Unknown command' error for the slave
SQL thread.

To avoid this scenario, you should skip the slave start and start the slave
manually only after the node is in the Synced state in the cluster.
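
A minimal sketch of the check described above, assuming the server is started with the standard skip-slave-start option and that some script polls SHOW STATUS LIKE 'wsrep%' before issuing START SLAVE. The function names and the dict input shape are hypothetical; only the status variable names come from Galera, and fetching the rows is left to the caller.

```python
def safe_to_start_slave(wsrep_status):
    """Return True when the node reports that it can apply transactions.

    wsrep_status is a dict built from SHOW STATUS LIKE 'wsrep%' rows
    (hypothetical input shape, not part of any MySQL client library).
    """
    synced = wsrep_status.get("wsrep_local_state_comment") == "Synced"
    ready = wsrep_status.get("wsrep_ready") == "ON"
    primary = wsrep_status.get("wsrep_cluster_status") == "Primary"
    return synced and ready and primary


def next_statement(wsrep_status):
    # Issue START SLAVE only once the node is synced; otherwise the
    # caller should keep polling and try again later.
    if safe_to_start_slave(wsrep_status):
        return "START SLAVE"
    return None
```

With the status values posted later in this thread (Synced, ON, Primary) the check passes, so on its own it would not have prevented the failure seen here.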

-seppo



Oleksandr Drach

Feb 28, 2013, 5:15:15 AM
to codersh...@googlegroups.com
Hello, Seppo!

Let me explain in more detail. I am now talking only about the following part of the mixed cluster:

[MySQL]====>[MySQL-wsrep]
Master________Slave

So I think it is not related to the other wsrep nodes.
Initially replication starts normally and works for a while, but then it stops and the slave no longer receives updates from the master.

So I need to fix this first, before migrating fully to the wsrep cluster.

Ilias Bertsimas

Feb 28, 2013, 5:20:47 AM
to codersh...@googlegroups.com
For some reason I assumed that the slave was a plain MySQL node, since you said you followed the Percona guide for an easy migration from normal MySQL to Galera.
This looks like a symptom of Galera not being in the Synced state and therefore unable to process anything.
Have you checked the Galera slave node's status when it happens? Is it Synced, and is wsrep_on set to ON?

Oleksandr Drach

Feb 28, 2013, 9:14:32 AM
to codersh...@googlegroups.com
I am using xtrabackup for SST, and the node works well as a MySQL slave in the "Donor/Desynced" state.
It is also hard to capture the state of the node at the moment replication breaks.
Later, when the issue is detected and the node is in the Synced state, the slave still stops with the same error and replication does not start.

Ilias Bertsimas

Feb 28, 2013, 9:26:30 AM
to codersh...@googlegroups.com
Hello,

I assume you use the sync_donor option along with xtrabackup?
You can use pt-stalk, where you can set a custom trigger, such as the slave SQL error number, to collect diagnostic information for you when it fires.
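
As a rough illustration of the trigger idea, here is a sketch that reads the text of SHOW SLAVE STATUS \G and reports whether the watched error number has appeared. It is independent of pt-stalk and does not show pt-stalk's own options; the function names are hypothetical, and 1047 is simply the error number reported in this thread.

```python
import re


def slave_sql_errno(show_slave_status_text):
    """Extract Last_SQL_Errno from SHOW SLAVE STATUS output, or None."""
    match = re.search(
        r"^\s*Last_SQL_Errno:\s*(\d+)\s*$",
        show_slave_status_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def should_collect(show_slave_status_text, watched_errno=1047):
    # True only when the slave SQL thread reports the watched error;
    # the caller decides what diagnostics to gather at that point.
    return slave_sql_errno(show_slave_status_text) == watched_errno
```

A wrapper script could call should_collect in a loop and, when it returns True, save the wsrep status, the process list, and the tail of the error log.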

Kind Regards,
Ilias.

Oleksandr Drach

Feb 28, 2013, 10:31:21 AM
to codersh...@googlegroups.com
I have not made any changes to the "sync_donor" variable, so it uses the default value.
The problem really seems to be caused by wsrep: when I disable it (simply by removing the wsrep.cnf file and restarting the server), Master-Slave replication works well.
Once I add this config again, I get the replication error sooner or later.

Ilias Bertsimas

Feb 28, 2013, 10:46:17 AM
to codersh...@googlegroups.com
If you leave sync_donor at its default value, which is NO, then when you do an SST the donor node goes into the Desynced state and cannot process anything until it is done with the SST.
So if the state of your node was "Donor/Desynced", that might be what caused replication to fail with 'Unknown command'.

Oleksandr Drach

Feb 28, 2013, 11:00:19 AM
to codersh...@googlegroups.com
Not exactly...

I have synced a few wsrep nodes and they worked well for some time. So the issue does not occur when the Slave is in the Donor state.
The issue occurs unexpectedly later, when the whole cluster is running.

BTW, xtrabackup is a non-blocking SST method, so running "SHOW SLAVE STATUS \G" while the SST is being sent to the joining node shows that the node works normally as a Slave, and Seconds_Behind_Master is almost always 0.

Ilias Bertsimas

Feb 28, 2013, 11:33:21 AM
to codersh...@googlegroups.com
Okay, you had just mentioned the Donor/Desynced status and I did not understand where it fit into the issue.
Xtrabackup is non-blocking, but to keep the donor synced you need to have the sync_donor option enabled as well; otherwise it will not process requests while it is performing an SST.
I am not sure what happens in the case of MySQL replication, as I have not tried it during an SST, but I assumed it behaves the same as with normal requests.

Alex Yurchenko

Feb 28, 2013, 11:41:57 AM
to codersh...@googlegroups.com
So, Oleksandr, would you care to give us some real information finally?

When master-slave replication breaks:
- what is binlog_format on master and slave?
- what is SHOW STATUS LIKE 'wsrep%' on slave?
- what is SHOW FULL PROCESSLIST on slave?
- what is the last 5 minutes in the error log on slave?
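
When the status output is captured from the mysql client, it arrives as an ASCII table. The following is a small sketch, under the assumption that the table has exactly two columns (Variable_name and Value) and that values contain no "|" character, for turning such output into a dict so that snapshots taken before and after a failure can be compared. The function name is hypothetical.

```python
def parse_status_table(text):
    """Parse Variable_name / Value rows from mysql client table output."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, prompts, and row counts
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything that is not a pair
        status[cells[0]] = cells[1]
    return status
```

Comparing two such dicts key by key shows which wsrep counters moved between two samples.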

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Oleksandr Drach

Feb 28, 2013, 12:41:10 PM
to codersh...@googlegroups.com
Sure, Alex!
The issue is now easy to reproduce. After removing wsrep.cnf, the Slave works normally. After adding the wsrep component back, it works for a few seconds and then mysqld crashes with a dump.


On Thursday, February 28, 2013 6:41:57 PM UTC+2, Alexey Yurchenko wrote:
> So, Oleksandr, would you care to give us some real information finally?
>
> when master-slave replication breaks:
> - what is binlog_format on master and slave?
Binlog formats are described below
Master
...
server-id               = 1
log-bin                 = /var/lib/mysql/TheMaster-bin
binlog-format           = ROW
expire_logs_days       = 10
max_binlog_size         = 100M
...
Slave 
...
server-id               = 2
#replicate-wild-ignore-table = information_schema.%
#replicate-wild-ignore-table = performance_schema.%
log_bin                 = /var/log/mysql/mysql-bin.log
log-bin                 = /var/lib/mysql/TheSlave-bin
binlog-format           = ROW
expire_logs_days       = 10
max_binlog_size         = 100M
# Relay logs
log_slave_updates       = ON
relay_log_index         = relay-bin.index
relay_log               = relay-bin
...
> - what is SHOW STATUS LIKE 'wsrep%' on slave?
I ran SHOW STATUS LIKE 'wsrep%' on the slave repeatedly, and here is what I got.

mysql> show status like "wsrep%";
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 1c46b8ff-81c6-11e2-0800-13893d0f3134 |
| wsrep_protocol_version     | 4                                    |
| wsrep_last_committed       | 88                                   |
| wsrep_replicated           | 47                                   |
| wsrep_replicated_bytes     | 17236                                |
| wsrep_received             | 2                                    |
| wsrep_received_bytes       | 140                                  |
| wsrep_local_commits        | 47                                   |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_avg | 0.000000                             |
| wsrep_flow_control_paused  | 0.000000                             |
| wsrep_flow_control_sent    | 0                                    |
| wsrep_flow_control_recv    | 0                                    |
| wsrep_cert_deps_distance   | 20.829787                            |
| wsrep_apply_oooe           | 0.000000                             |
| wsrep_apply_oool           | 0.000000                             |
| wsrep_apply_window         | 0.000000                             |
| wsrep_commit_oooe          | 0.000000                             |
| wsrep_commit_oool          | 0.000000                             |
| wsrep_commit_window        | 0.000000                             |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_cert_index_size      | 87                                   |
| wsrep_causal_reads         | 0                                    |
| wsrep_incoming_addresses   | 10.0.0.89                            |
| wsrep_cluster_conf_id      | 1                                    |
| wsrep_cluster_size         | 1                                    |
| wsrep_cluster_state_uuid   | 1c46b8ff-81c6-11e2-0800-13893d0f3134 |
| wsrep_cluster_status       | Primary                              |
| wsrep_connected            | ON                                   |
| wsrep_local_index          | 0                                    |
| wsrep_provider_name        | Galera                               |
| wsrep_provider_vendor      | Codership Oy <in...@codership.com>    |
| wsrep_provider_version     | 23.2.2(r137)                         |
| wsrep_ready                | ON                                   |
+----------------------------+--------------------------------------+
40 rows in set (0.00 sec)

mysql> show status like "wsrep%";
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
ERROR: 
Can't connect to the server
 
> - what is SHOW FULL PROCESSLIST on slave?

Here it is at the moment of the crash:
mysql> SHOW FULL PROCESSLIST;
+----+------------------+-----------+------+---------+------+-----------------------------------------------------------------------------+-----------------------+
| Id | User             | Host      | db   | Command | Time | State                                                                       | Info                  |
+----+------------------+-----------+------+---------+------+-----------------------------------------------------------------------------+-----------------------+
|  1 | system user      |           | NULL | Sleep   |   25 | wsrep aborter idle                                                          | NULL                  |
|  2 | system user      |           | NULL | Sleep   |   25 | NULL                                                                        | NULL                  |
|  3 | system user      |           | NULL | Connect |   13 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                  |
|  4 | system user      |           | NULL | Connect |   22 | Waiting for master to send event                                            | NULL                  |
|  5 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
|  6 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
|  7 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
|  8 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
|  9 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 10 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 11 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 12 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 13 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 14 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 15 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 16 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 17 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 18 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 19 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 20 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 21 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 22 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 23 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 24 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 25 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 26 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
| 27 | mysql_replica    | localhost | NULL | Query   |    0 | sleeping                                                                    | SHOW FULL PROCESSLIST |
| 28 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
+----+------------------+-----------+------+---------+------+-----------------------------------------------------------------------------+-----------------------+
28 rows in set (0.00 sec)

mysql> SHOW FULL PROCESSLIST;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
ERROR: 
Can't connect to the server

> - what is the last 5 minutes in the error log on slave?
130228  9:53:55 [Note] Slave SQL thread initialized, starting replication in log 'TheMaster-bin.000109' at position 6095170, relay log './relay-bin.000010' position: 99693
16:53:55 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=1200
thread_count=17
connection_count=17
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2756409 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f81dc000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f82a808c6a8 thread_stack 0x2d800
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8287a5]
/usr/sbin/mysqld(handle_fatal_signal+0x36a)[0x6a18ca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f82bbc93cb0]
/usr/sbin/mysqld(wsrep_dict_foreign_find_index+0x15)[0x8cc965]
/usr/sbin/mysqld(wsrep_append_foreign_key+0x48e)[0x83592e]
/usr/sbin/mysqld[0x8656c0]
/usr/sbin/mysqld[0x868df5]
/usr/sbin/mysqld[0x869a69]
/usr/sbin/mysqld[0x854991]
/usr/sbin/mysqld[0x838777]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0x68)[0x6a6d68]
/usr/sbin/mysqld(_ZN21Delete_rows_log_event11do_exec_rowEPK14Relay_log_info+0xfe)[0x74e0ce]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x266)[0x7540d6]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x1bd)[0x5286ad]
/usr/sbin/mysqld[0x52de16]
/usr/sbin/mysqld(handle_slave_sql+0x8f4)[0x52f334]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f82bbc8be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f82bac614bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 4
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
130228 09:53:55 mysqld_safe Number of processes running now: 0
130228 09:53:55 mysqld_safe WSREP: sleeping 15 seconds before restart
130228 09:54:10 mysqld_safe mysqld restarted
130228 09:54:10 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.cwOJEDHjwd
130228 09:54:19 mysqld_safe WSREP: Recovered position 1c46b8ff-81c6-11e2-0800-13893d0f3134:41
130228  9:54:19 [Note] WSREP: wsrep_start_position var submitted: '1c46b8ff-81c6-11e2-0800-13893d0f3134:41'
130228  9:54:19 [Warning] options --log-slow-admin-statements, --log-queries-not-using-indexes and --log-slow-slave-statements have no effect if --log_slow_queries is not set
130228  9:54:19 [Note] WSREP: Read nil XID from storage engines, skipping position init
130228  9:54:19 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
130228  9:54:19 [Note] WSREP: wsrep_load(): Galera 23.2.2(r137) by Codership Oy <in...@codership.com> loaded succesfully.
130228  9:54:19 [Note] WSREP: Found saved state: 1c46b8ff-81c6-11e2-0800-13893d0f3134:-1
130228  9:54:19 [Note] WSREP: Reusing existing '/media/mysql//galera.cache'.
130228  9:54:19 [Note] WSREP: Passing config to GCS: base_host = 10.0.0.89; base_port = 4567; cert.log_conflicts = no; evs.consensus_timeout = PT1M; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.keepalive_period = PT3S; evs.send_window = 1024; evs.suspect_timeout = PT30S; evs.user_send_window = 512; gcache.dir = /media/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /media/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 128; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://10.0.0.89:4567; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
130228  9:54:19 [Note] WSREP: Assign initial position for certification: 41, protocol version: -1
130228  9:54:19 [Note] WSREP: wsrep_sst_grab()
130228  9:54:19 [Note] WSREP: Start replication

Alex Yurchenko

Feb 28, 2013, 1:19:36 PM
to codersh...@googlegroups.com
Now we see that it simply crashes:
https://bugs.launchpad.net/codership-mysql/+bug/1078346
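
For readers matching their own crash against this one, here is a sketch that pulls the named frames out of a mysqld backtrace such as the one posted in this thread. The regular expression and function names are illustrative assumptions, not part of any MySQL tooling. In the posted trace, the first wsrep frame after the signal handler is wsrep_dict_foreign_find_index, called from wsrep_append_foreign_key while a Delete_rows_log_event is applied, which suggests the crash is in the wsrep foreign-key handling.

```python
import re

# Matches frames like: /usr/sbin/mysqld(symbol+0x15)[0x8cc965]
# Frames without a symbol name, such as /usr/sbin/mysqld[0x8656c0]
# or libpthread.so.0(+0xfcb0)[...], are skipped.
FRAME = re.compile(
    r"\((?P<symbol>[A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]"
)


def named_frames(backtrace_text):
    """Return symbol names in the order they appear (innermost first)."""
    return [m.group("symbol") for m in FRAME.finditer(backtrace_text)]


def first_frame_matching(backtrace_text, prefix):
    # Find the innermost frame whose symbol starts with the given prefix.
    for symbol in named_frames(backtrace_text):
        if symbol.startswith(prefix):
            return symbol
    return None
```

Calling first_frame_matching(trace, "wsrep_") on an error log gives a short string to search for in the bug tracker.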


Oleksandr Drach

Feb 28, 2013, 5:49:37 PM
to codersh...@googlegroups.com
Good to know that the problem is already known and mostly fixed :)
Thanks, Alex!