Now we see that it simply crashes:
https://bugs.launchpad.net/codership-mysql/+bug/1078346
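
For anyone triaging from the error log quoted below: the frames that carry a symbol name are the useful part of the backtrace (the crash is in wsrep_dict_foreign_find_index, reached from wsrep_append_foreign_key while the slave SQL thread applies a Delete_rows event). Here is a minimal Python sketch for pulling those frames out of a log. The regex and the named_frames helper are only an illustration, not part of any MySQL or Galera tooling, and the sketch assumes glibc-style frame lines of the form binary(symbol+0xoffset)[0xaddress]:

```python
import re

# Assumed frame format (glibc backtrace_symbols style), for example:
#   /usr/sbin/mysqld(handle_slave_sql+0x8f4)[0x52f334]
# Frames without a symbol, such as /usr/sbin/mysqld[0x8656c0], do not match.
FRAME_RE = re.compile(
    r"(?P<binary>/\S+?)\((?P<symbol>[^()+]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]"
)


def named_frames(log_text):
    """Return (binary, symbol) pairs for frames that carry a symbol name."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        # Frames like libpthread.so.0(+0xfcb0) match with an empty symbol; skip them.
        if match.group("symbol"):
            frames.append((match.group("binary"), match.group("symbol")))
    return frames


# A few frames copied from the backtrace in this thread.
sample = """
/usr/sbin/mysqld(handle_fatal_signal+0x36a)[0x6a18ca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f82bbc93cb0]
/usr/sbin/mysqld(wsrep_dict_foreign_find_index+0x15)[0x8cc965]
/usr/sbin/mysqld(wsrep_append_foreign_key+0x48e)[0x83592e]
/usr/sbin/mysqld[0x8656c0]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0x68)[0x6a6d68]
"""

for binary, symbol in named_frames(sample):
    print(binary, symbol)
```

Mangled C++ names such as _ZN7handler13ha_delete_rowEPKh can then be passed through c++filt, which renders that one as handler::ha_delete_row(unsigned char const*).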
On 2013-02-28 19:41, Oleksandr Drach wrote:
> Sure, Alex!
> The issue is now easy to catch. With wsrep.cnf removed, the slave works
> normally. With the wsrep component added, it works for a few seconds and
> then exits, causing the mysql daemon to stop with a dump.
>
>
> On Thursday, February 28, 2013 6:41:57 PM UTC+2, Alexey Yurchenko
> wrote:
>>
>> So, Oleksandr, would you care to give us some real information
>> finally?
>>
>> when master-slave replication breaks:
>> - what is binlog_format on master and slave?
>>
> Binlog formats are described below:
> *Master*
> ...
> server-id = 1
> log-bin = /var/lib/mysql/TheMaster-bin
> binlog-format = ROW
> expire_logs_days = 10
> max_binlog_size = 100M
> ...
> *Slave*
> ...
> server-id = 2
> #replicate-wild-ignore-table = information_schema.%
> #replicate-wild-ignore-table = performance_schema.%
> log_bin = /var/log/mysql/mysql-bin.log
> log-bin = /var/lib/mysql/TheSlave-bin
> binlog-format = ROW
> expire_logs_days = 10
> max_binlog_size = 100M
> # Relay logs
> log_slave_updates = ON
> relay_log_index = relay-bin.index
> relay_log = relay-bin
> ...
>
>> - what is SHOW STATUS LIKE 'wsrep%' on slave?
>>
> I ran SHOW STATUS LIKE 'wsrep%' continuously on the slave, and here is
> what I got:
>
> mysql> show status like "wsrep%";
> +----------------------------+--------------------------------------+
> | Variable_name              | Value                                |
> +----------------------------+--------------------------------------+
> | wsrep_local_state_uuid     | 1c46b8ff-81c6-11e2-0800-13893d0f3134 |
> | wsrep_protocol_version     | 4                                    |
> | wsrep_last_committed       | 88                                   |
> | wsrep_replicated           | 47                                   |
> | wsrep_replicated_bytes     | 17236                                |
> | wsrep_received             | 2                                    |
> | wsrep_received_bytes       | 140                                  |
> | wsrep_local_commits        | 47                                   |
> | wsrep_local_cert_failures  | 0                                    |
> | wsrep_local_bf_aborts      | 0                                    |
> | wsrep_local_replays        | 0                                    |
> | wsrep_local_send_queue     | 0                                    |
> | wsrep_local_send_queue_avg | 0.000000                             |
> | wsrep_local_recv_queue     | 0                                    |
> | wsrep_local_recv_queue_avg | 0.000000                             |
> | wsrep_flow_control_paused  | 0.000000                             |
> | wsrep_flow_control_sent    | 0                                    |
> | wsrep_flow_control_recv    | 0                                    |
> | wsrep_cert_deps_distance   | 20.829787                            |
> | wsrep_apply_oooe           | 0.000000                             |
> | wsrep_apply_oool           | 0.000000                             |
> | wsrep_apply_window         | 0.000000                             |
> | wsrep_commit_oooe          | 0.000000                             |
> | wsrep_commit_oool          | 0.000000                             |
> | wsrep_commit_window        | 0.000000                             |
> | wsrep_local_state          | 4                                    |
> | wsrep_local_state_comment  | Synced                               |
> | wsrep_cert_index_size      | 87                                   |
> | wsrep_causal_reads         | 0                                    |
> | wsrep_incoming_addresses   | 10.0.0.89                            |
> | wsrep_cluster_conf_id      | 1                                    |
> | wsrep_cluster_size         | 1                                    |
> | wsrep_cluster_state_uuid   | 1c46b8ff-81c6-11e2-0800-13893d0f3134 |
> | wsrep_cluster_status       | Primary                              |
> | wsrep_connected            | ON                                   |
> | wsrep_local_index          | 0                                    |
> | wsrep_provider_name        | Galera                               |
> | wsrep_provider_vendor      | Codership Oy <in...@codership.com>   |
> | wsrep_provider_version     | 23.2.2(r137)                         |
> | wsrep_ready                | ON                                   |
> +----------------------------+--------------------------------------+
> 40 rows in set (0.00 sec)
>
> mysql> show status like "wsrep%";
> ERROR 2006 (HY000): MySQL server has gone away
> No connection. Trying to reconnect...
> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
> ERROR:
> Can't connect to the server
>
>
>> - what is SHOW FULL PROCESSLIST on slave?
>>
>
> Here it is at the moment of the crash:
> mysql> SHOW FULL PROCESSLIST;
> +----+------------------+-----------+------+---------+------+-----------------------------------------------------------------------------+-----------------------+
> | Id | User             | Host      | db   | Command | Time | State                                                                       | Info                  |
> +----+------------------+-----------+------+---------+------+-----------------------------------------------------------------------------+-----------------------+
> |  1 | system user      |           | NULL | Sleep   |   25 | wsrep aborter idle                                                          | NULL                  |
> |  2 | system user      |           | NULL | Sleep   |   25 | NULL                                                                        | NULL                  |
> |  3 | system user      |           | NULL | Connect |   13 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                  |
> |  4 | system user      |           | NULL | Connect |   22 | Waiting for master to send event                                            | NULL                  |
> |  5 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> |  6 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> |  7 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> |  8 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> |  9 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 10 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 11 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 12 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 13 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 14 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 15 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 16 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 17 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 18 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 19 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 20 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 21 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 22 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 23 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 24 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 25 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 26 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> | 27 | mysql_replica    | localhost | NULL | Query   |    0 | sleeping                                                                    | SHOW FULL PROCESSLIST |
> | 28 | system user      |           | NULL | Sleep   |   22 | NULL                                                                        | NULL                  |
> +----+------------------+-----------+------+---------+------+-----------------------------------------------------------------------------+-----------------------+
> 28 rows in set (0.00 sec)
>
> mysql> SHOW FULL PROCESSLIST;
> ERROR 2006 (HY000): MySQL server has gone away
> No connection. Trying to reconnect...
> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
> ERROR:
> Can't connect to the server
>
>> - what is the last 5 minutes in the error log on slave?
>>
> 130228 9:53:55 [Note] Slave SQL thread initialized, starting replication in log 'TheMaster-bin.000109' at position 6095170, relay log './relay-bin.000010' position: 99693
> 16:53:55 UTC - mysqld got signal 11 ;
> This could be because you hit a bug. It is also possible that this binary
> or one of the libraries it was linked against is corrupt, improperly built,
> or misconfigured. This error can also be caused by malfunctioning hardware.
> We will try our best to scrape up some info that will hopefully help
> diagnose the problem, but since we have already crashed,
> something is definitely wrong and this may fail.
>
> key_buffer_size=134217728
> read_buffer_size=131072
> max_used_connections=0
> max_threads=1200
> thread_count=17
> connection_count=17
> It is possible that mysqld could use up to
> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2756409 K bytes of memory
> Hope that's ok; if not, decrease some variables in the equation.
>
> Thread pointer: 0x7f81dc000990
> Attempting backtrace. You can use the following information to find out
> where mysqld died. If you see no messages after this, something went
> terribly wrong...
> stack_bottom = 7f82a808c6a8 thread_stack 0x2d800
> /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8287a5]
> /usr/sbin/mysqld(handle_fatal_signal+0x36a)[0x6a18ca]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f82bbc93cb0]
> /usr/sbin/mysqld(wsrep_dict_foreign_find_index+0x15)[0x8cc965]
> /usr/sbin/mysqld(wsrep_append_foreign_key+0x48e)[0x83592e]
> /usr/sbin/mysqld[0x8656c0]
> /usr/sbin/mysqld[0x868df5]
> /usr/sbin/mysqld[0x869a69]
> /usr/sbin/mysqld[0x854991]
> /usr/sbin/mysqld[0x838777]
> /usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0x68)[0x6a6d68]
> /usr/sbin/mysqld(_ZN21Delete_rows_log_event11do_exec_rowEPK14Relay_log_info+0xfe)[0x74e0ce]
> /usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x266)[0x7540d6]
> /usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x1bd)[0x5286ad]
> /usr/sbin/mysqld[0x52de16]
> /usr/sbin/mysqld(handle_slave_sql+0x8f4)[0x52f334]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f82bbc8be9a]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f82bac614bd]
>
> Trying to get some variables.
> Some pointers may be invalid and cause the dump to abort.
> Query (0): is an invalid pointer
> Connection ID (thread ID): 4
> Status: NOT_KILLED
>
> The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
> information that should help you find out what is causing the crash.
> 130228 09:53:55 mysqld_safe Number of processes running now: 0
> 130228 09:53:55 mysqld_safe WSREP: sleeping 15 seconds before restart
> 130228 09:54:10 mysqld_safe mysqld restarted
> 130228 09:54:10 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.cwOJEDHjwd
> 130228 09:54:19 mysqld_safe WSREP: Recovered position 1c46b8ff-81c6-11e2-0800-13893d0f3134:41
> 130228 9:54:19 [Note] WSREP: wsrep_start_position var submitted: '1c46b8ff-81c6-11e2-0800-13893d0f3134:41'
> 130228 9:54:19 [Warning] options --log-slow-admin-statements, --log-queries-not-using-indexes and --log-slow-slave-statements have no effect if --log_slow_queries is not set
> 130228 9:54:19 [Note] WSREP: Read nil XID from storage engines, skipping position init
> 130228 9:54:19 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
> 130228 9:54:19 [Note] WSREP: wsrep_load(): Galera 23.2.2(r137) by Codership Oy <in...@codership.com> loaded succesfully.
> 130228 9:54:19 [Note] WSREP: Found saved state: 1c46b8ff-81c6-11e2-0800-13893d0f3134:-1
> 130228 9:54:19 [Note] WSREP: Reusing existing '/media/mysql//galera.cache'.
> 130228 9:54:19 [Note] WSREP: Passing config to GCS: base_host = 10.0.0.89; base_port = 4567; cert.log_conflicts = no; evs.consensus_timeout = PT1M; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.keepalive_period = PT3S; evs.send_window = 1024; evs.suspect_timeout = PT30S; evs.user_send_window = 512; gcache.dir = /media/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /media/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 128; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://10.0.0.89:4567; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
> 130228 9:54:19 [Note] WSREP: Assign initial position for certification: 41, protocol version: -1
> 130228 9:54:19 [Note] WSREP: wsrep_sst_grab()
> 130228 9:54:19 [Note] WSREP: Start replication