'Slave SQL: Could not execute Delete_rows' causes WSREP_FATAL in 23.2.2(r137)

3,868 views
Skip to first unread message

Tom Maguire

unread,
Feb 20, 2013, 7:25:37 AM2/20/13
to codersh...@googlegroups.com
Hi all,

I was hoping for some help issue I recently experienced in which 2 of our galera nodes crashed. Our logs show the following:

[ERROR] Slave SQL: Could not execute Delete_rows event on table obfuscated_table_name.basket_line; Can't find record in 'basket_line', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 559, Error_code: 1032

This seems to be a similar issue to:

and

however those issues have been closed.

Our setup is a 5 node cluster - 3 x database nodes (galera 3.2.2(r137), mysql 5.5.23-23.6), 2 x abritrators (galera galera 3.2.2(r137)). 

I have attached the logs below along with annotations of what I believe is going on, but do not know why the Delete_rows failure would cause this fatal error on two of the nodes.

Any help would be much appreciated,
Thanks
Tom

Log:

-- 1) Fatal error caused by error one delete_rows

Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [ERROR] Slave SQL: Could not execute Delete_rows event on table obfuscated_table_name.basket_line; Can't find record in 'basket_line', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 559, Error_code: 1032
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 840116
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [ERROR] WSREP: Failed to apply trx: source: 7d0d9f8c-79d8-11e2-0800-0e368da98bb3 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 12328 trx_id: 63606907 seqnos (l: 52333, g: 840116, s: 840115, d: 839858, ts: 1361305728372486983)
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [ERROR] WSREP: Failed to apply app buffer: ۈ#Q#023, seqno: 840116, status: WSREP_FATAL
Feb 19 20:28:48 ubuntu mysqld: #011 at galera/src/replicator_smm.cpp:apply_wscoll():49
Feb 19 20:28:48 ubuntu mysqld: #011 at galera/src/replicator_smm.cpp:apply_trx_ws():120
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [ERROR] WSREP: Node consistency compromized, aborting...
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [Note] WSREP: Closing send monitor...
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [Note] WSREP: Closed send monitor.
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [Note] WSREP: gcomm: terminating thread

-- 2) Mysql attempts to restart

Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [Note] WSREP: gcomm: joining thread
Feb 19 20:28:48 ubuntu mysqld: 130219 20:28:48 [Note] WSREP: gcomm: closing backend
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: (463cfc89-79d8-11e2-0800-b5c56386e120, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.176.193.179:4567 
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: view(view_id(NON_PRIM,463cfc89-79d8-11e2-0800-b5c56386e120,20) memb {
Feb 19 20:28:50 ubuntu mysqld: #011463cfc89-79d8-11e2-0800-b5c56386e120,
Feb 19 20:28:50 ubuntu mysqld: } joined {
Feb 19 20:28:50 ubuntu mysqld: } left {
Feb 19 20:28:50 ubuntu mysqld: } partitioned {
Feb 19 20:28:50 ubuntu mysqld: #0117d0d9f8c-79d8-11e2-0800-0e368da98bb3,
Feb 19 20:28:50 ubuntu mysqld: #011a2d5652e-79d8-11e2-0800-9927af8c9cd7,
Feb 19 20:28:50 ubuntu mysqld: #011d5e54ccf-79d7-11e2-0800-5ed8cd887f42,
Feb 19 20:28:50 ubuntu mysqld: #011d8d85a8e-79d7-11e2-0800-a128aa8a2bf8,
Feb 19 20:28:50 ubuntu mysqld: })
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: view((empty))
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: gcomm: closed
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Flow-control interval: [16, 16]
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Received NON-PRIMARY.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 840116)
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Received self-leave message.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Flow-control interval: [0, 0]
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 840116)
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: RECV thread exiting 0: Success
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: recv_thread() joined.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Closing slave action queue.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Feb 19 20:28:50 ubuntu mysqld_safe: Number of processes running now: 0
Feb 19 20:28:50 ubuntu mysqld_safe: mysqld restarted
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Read nil XID from storage engines, skipping position init
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: wsrep_load(): Galera 23.2.2(r137) by Codership Oy <in...@codership.com> loaded succesfully.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Passing config to GCS: base_host = 198.101.155.56; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: wsrep_sst_grab()
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Start replication
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: protonet asio version 0
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: backend: asio
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: GMCast version 0
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: (f8b59949-7ad2-11e2-0800-e69380c792c8, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: (f8b59949-7ad2-11e2-0800-e69380c792c8, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: EVS version 0
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: PC version 0
Feb 19 20:28:50 ubuntu mysqld: 130219 20:28:50 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '10.176.194.90:4567,10.176.193.179:4567,10.177.29.92:4567,10.177.18.3:4567'
Feb 19 20:29:04 ubuntu mysqld: 130219 20:29:04 [Note] WSREP: declaring 7d0d9f8c-79d8-11e2-0800-0e368da98bb3 stable
Feb 19 20:29:04 ubuntu mysqld: 130219 20:29:04 [Note] WSREP: declaring d5e54ccf-79d7-11e2-0800-5ed8cd887f42 stable
Feb 19 20:29:04 ubuntu mysqld: 130219 20:29:04 [Note] WSREP: declaring d8d85a8e-79d7-11e2-0800-a128aa8a2bf8 stable
Feb 19 20:29:04 ubuntu mysqld: 130219 20:29:04 [Note] WSREP: declaring f8c14d61-7ad2-11e2-0800-1e76ffed10e5 stable

-- 3) Monit realises mysql isn't running

Feb 19 20:29:04 ubuntu monit[15278]: 'mysqld' process is not running
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: view(view_id(PRIM,7d0d9f8c-79d8-11e2-0800-0e368da98bb3,22) memb {
Feb 19 20:29:05 ubuntu mysqld: #0117d0d9f8c-79d8-11e2-0800-0e368da98bb3,
Feb 19 20:29:05 ubuntu mysqld: #011d5e54ccf-79d7-11e2-0800-5ed8cd887f42,
Feb 19 20:29:05 ubuntu mysqld: #011d8d85a8e-79d7-11e2-0800-a128aa8a2bf8,
Feb 19 20:29:05 ubuntu mysqld: #011f8b59949-7ad2-11e2-0800-e69380c792c8,
Feb 19 20:29:05 ubuntu mysqld: #011f8c14d61-7ad2-11e2-0800-1e76ffed10e5,
Feb 19 20:29:05 ubuntu mysqld: } joined {
Feb 19 20:29:05 ubuntu mysqld: } left {
Feb 19 20:29:05 ubuntu mysqld: } partitioned {
Feb 19 20:29:05 ubuntu mysqld: #011463cfc89-79d8-11e2-0800-b5c56386e120,
Feb 19 20:29:05 ubuntu mysqld: #011a2d5652e-79d8-11e2-0800-9927af8c9cd7,
Feb 19 20:29:05 ubuntu mysqld: })
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: gcomm: connected
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Opened channel 'my_wsrep_cluster'
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Waiting for SST to complete.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 3, memb_num = 5
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: sent state msg: 0136734e-7ad3-11e2-0800-c7c4f270819d
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: got state msg: 0136734e-7ad3-11e2-0800-c7c4f270819d from 0 (pwpdb2n2)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: got state msg: 0136734e-7ad3-11e2-0800-c7c4f270819d from 2 (garb)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: got state msg: 0136734e-7ad3-11e2-0800-c7c4f270819d from 1 (garb)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: got state msg: 0136734e-7ad3-11e2-0800-c7c4f270819d from 3 (pwpdb2n1)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: STATE EXCHANGE: got state msg: 0136734e-7ad3-11e2-0800-c7c4f270819d from 4 (pwpdb2n3)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: 'garb' demoted SYNCED->PRIMARY due to gap in history: 838099 - 840116
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: 'garb' demoted SYNCED->PRIMARY due to gap in history: 838099 - 840116
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Quorum results:
Feb 19 20:29:05 ubuntu mysqld: #011version    = 2,
Feb 19 20:29:05 ubuntu mysqld: #011component  = PRIMARY,
Feb 19 20:29:05 ubuntu mysqld: #011conf_id    = 20,
Feb 19 20:29:05 ubuntu mysqld: #011members    = 1/5 (joined/total),
Feb 19 20:29:05 ubuntu mysqld: #011act_id     = 840116,
Feb 19 20:29:05 ubuntu mysqld: #011last_appl. = -1,
Feb 19 20:29:05 ubuntu mysqld: #011protocols  = 0/4/2 (gcs/repl/appl),
Feb 19 20:29:05 ubuntu mysqld: #011group UUID = 9133fe08-59a7-11e2-0800-766eb736d615
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Flow-control interval: [36, 36]
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 840116)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: State transfer required: 
Feb 19 20:29:05 ubuntu mysqld: #011Group state: 9133fe08-59a7-11e2-0800-766eb736d615:840116
Feb 19 20:29:05 ubuntu mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: New cluster view: global state: 9133fe08-59a7-11e2-0800-766eb736d615:840116, view# 21: Primary, number of nodes: 5, my index: 3, protocol version 2
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Warning] WSREP: Gap in state sequence. Need state transfer.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Node 1 (garb) requested state transfer from '*any*'. Selected 0 (pwpdb2n2)(SYNCED) as donor.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: 1 (garb): State transfer from 0 (pwpdb2n2) complete.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Warning] WSREP: Rejecting JOIN message from 2 (garb): new State Transfer required.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Member 1 (garb) synced with group.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Warning] WSREP: SYNC message sender from non-JOINED 2 (garb). Ignored.
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Warning] WSREP: 0 (pwpdb2n2): State transfer to 1 (garb) failed: -1 (Operation not permitted)
Feb 19 20:29:05 ubuntu mysqld: 130219 20:29:05 [Note] WSREP: Member 0 (pwpdb2n2) synced with group.

-- 4) Monit tries to restart mysql

Feb 19 20:29:06 ubuntu monit[15278]: 'mysqld' trying to restart
Feb 19 20:29:06 ubuntu monit[15278]: 'mysqld' start: /etc/init.d/mysql
Feb 19 20:29:07 ubuntu mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Read nil XID from storage engines, skipping position init
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: wsrep_load(): Galera 23.2.2(r137) by Codership Oy <in...@codership.com> loaded succesfully.
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Passing config to GCS: base_host = 198.101.155.56; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: wsrep_sst_grab()
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Start replication
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: protonet asio version 0
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: backend: asio
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: GMCast version 0
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: (027649c0-7ad3-11e2-0800-0a3fac110d13, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: (027649c0-7ad3-11e2-0800-0a3fac110d13, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: EVS version 0
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: PC version 0
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '10.176.194.90:4567,10.176.193.179:4567,10.177.29.92:4567,10.177.18.3:4567'

-- Address already in use because monit trying to restart when mysql instance already running

Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] WSREP: Address already in use
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] WSREP: failed to open gcomm backend connection: 98: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'Address already in use': 98 (Address already in use)
Feb 19 20:29:07 ubuntu mysqld: #011 at gcomm/src/asio_tcp.cpp:listen():757
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -98 (Address already in use)
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_wsrep_cluster' at 'gcomm://10.176.194.90:4567,10.176.193.179:4567,10.177.29.92:4567,10.177.18.3:4567': -98 (Address already in use)
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] WSREP: gcs connect failed: Address already in use
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] WSREP: wsrep::connect() failed: 6
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [ERROR] Aborting
Feb 19 20:29:07 ubuntu mysqld: 
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Service disconnected.
Feb 19 20:29:07 ubuntu mysqld: 130219 20:29:07 [Note] WSREP: Running: 'wsrep_sst_rsync 'joiner' '198.101.155.56' 'root:' '/var/lib/mysql/' '/etc/mysql/conf.d/wsrep.cnf' '10882' 2>sst.err'
Feb 19 20:29:07 ubuntu rsyncd[11362]: rsyncd version 3.0.9 starting, listening on port 4444
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] WSREP: Node 4 (pwpdb2n3) requested state transfer from '*any*'. Selected 0 (pwpdb2n2)(SYNCED) as donor.
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] WSREP: Prepared SST request: rsync|198.101.155.56:4444/rsync_sst
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] WSREP: Assign initial position for certification: 840116, protocol version: 2

-- Failure to prepare for state transfer

Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9133fe08-59a7-11e2-0800-766eb736d615): 1 (Operation not permitted)
Feb 19 20:29:08 ubuntu mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] WSREP: Some threads may fail to exit.
Feb 19 20:29:08 ubuntu mysqld: 130219 20:29:08 [Note] /usr/sbin/mysqld: Shutdown complete
Feb 19 20:29:08 ubuntu mysqld: 
Feb 19 20:29:08 ubuntu mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
Feb 19 20:29:20 ubuntu /etc/init.d/mysql[11435]: 0 processes alive
Feb 19 20:29:20 ubuntu /etc/init.d/mysql[11435]: 
Feb 19 20:31:39 ubuntu mysqld: 130219 20:31:39 [Note] WSREP: 0 (pwpdb2n2): State transfer to 4 (pwpdb2n3) complete.
Feb 19 20:31:39 ubuntu mysqld: 130219 20:31:39 [Note] WSREP: Member 0 (pwpdb2n2) synced with group.
Feb 19 20:31:39 ubuntu mysqld: 130219 20:31:39 [Note] WSREP: Node 3 (pwpdb2n1) requested state transfer from '*any*'. Selected 0 (pwpdb2n2)(SYNCED) as donor.
Feb 19 20:31:39 ubuntu mysqld: 130219 20:31:39 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 840116)
Feb 19 20:31:39 ubuntu mysqld: 130219 20:31:39 [Note] WSREP: Requesting state transfer: success after 152 tries, donor: 0
Feb 19 20:31:39 ubuntu rsyncd[12737]: name lookup failed for 198.61.162.118: Name or service not known
Feb 19 20:31:39 ubuntu rsyncd[12737]: connect from UNKNOWN (198.61.162.118)
Feb 19 20:31:39 ubuntu rsyncd[12737]: rsync to rsync_sst/ from UNKNOWN (198.61.162.118)
Feb 19 20:31:39 ubuntu rsyncd[12737]: receiving file list
Feb 19 20:31:41 ubuntu mysqld: 130219 20:31:41 [Note] WSREP: 4 (pwpdb2n3): State transfer from 0 (pwpdb2n2) complete.
Feb 19 20:31:41 ubuntu mysqld: 130219 20:31:41 [Note] WSREP: Member 4 (pwpdb2n3) synced with group.

-- Sync completed

Feb 19 20:34:11 ubuntu rsyncd[12737]: sent 52508 bytes  received 1099036504 bytes  total size 1098711453
Feb 19 20:34:11 ubuntu rsyncd[13437]: name lookup failed for 198.61.162.118: Name or service not known
Feb 19 20:34:11 ubuntu rsyncd[13437]: connect from UNKNOWN (198.61.162.118)
Feb 19 20:34:11 ubuntu rsyncd[13437]: rsync to rsync_sst/ from UNKNOWN (198.61.162.118)
Feb 19 20:34:11 ubuntu rsyncd[13437]: receiving file list
Feb 19 20:34:11 ubuntu mysqld: 130219 20:34:11 [Note] WSREP: 0 (pwpdb2n2): State transfer to 3 (pwpdb2n1) complete.
Feb 19 20:34:11 ubuntu mysqld: 130219 20:34:11 [Note] WSREP: Member 0 (pwpdb2n2) synced with group.
Feb 19 20:34:11 ubuntu rsyncd[13437]: sent 54 bytes  received 179 bytes  total size 44
Feb 19 20:34:12 ubuntu rsyncd[11362]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [Receiver=3.0.9]
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 [Note] WSREP: SST complete, seqno: 840116
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: The InnoDB memory heap is disabled
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: Mutexes and rw_locks use GCC atomic builtins
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: Compressed tables use zlib 1.2.3.3
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: Using Linux native AIO
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: Initializing buffer pool, size = 128.0M
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: Completed initialization of buffer pool
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12 InnoDB: highest supported file format is Barracuda.
Feb 19 20:34:12 ubuntu mysqld: InnoDB: Log scan progressed past the checkpoint lsn 3243507631
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12  InnoDB: Database was not shut down normally!
Feb 19 20:34:12 ubuntu mysqld: InnoDB: Starting crash recovery.
Feb 19 20:34:12 ubuntu mysqld: InnoDB: Reading tablespace information from the .ibd files...
Feb 19 20:34:12 ubuntu mysqld: InnoDB: Restoring possible half-written data pages from the doublewrite
Feb 19 20:34:12 ubuntu mysqld: InnoDB: buffer...
Feb 19 20:34:12 ubuntu mysqld: InnoDB: Doing recovery: scanned up to log sequence number 3243507661
Feb 19 20:34:12 ubuntu mysqld: 130219 20:34:12  InnoDB: Waiting for the background threads to start
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 InnoDB: 1.1.8 started; log sequence number 3243507661
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] Event Scheduler: Loaded 0 events
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: Signalling provider to continue.
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: SST received: 9133fe08-59a7-11e2-0800-766eb736d615:840116
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: 3 (pwpdb2n1): State transfer from 0 (pwpdb2n2) complete.
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: Shifting JOINER -> JOINED (TO: 840116)
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: Member 3 (pwpdb2n1) synced with group.
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 840116)
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: Synchronized with group, ready for connections
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 19 20:34:13 ubuntu mysqld: 130219 20:34:13 [Note] /usr/sbin/mysqld: ready for connections.
Feb 19 20:34:13 ubuntu mysqld: Version: '5.5.23'  socket: '/var/run/mysqld/mysqld.sock'  port: 1470  Source distribution, wsrep_23.6.r3755

Henrik Ingo

unread,
Feb 20, 2013, 8:32:31 AM2/20/13
to Tom Maguire, codersh...@googlegroups.com
This simply means the data on your nodes had diverged. A row that was
deleted on the first node, was not present on the other nodes. Galera
regards this as a serious error and shuts down the conflicting nodes
to avoid further data inconsistencies. (So to be technically precise,
nodes did not crash.)

One known bug that currently is still there is if you do ALTER TABLE
and add an auto_increment column on a table with existing data. It can
be done, but if you just do it out of the box without some
precautions, you could experience something like this. (But this is a
bug, so in the future it should be fixed, at least for most common
cases.

It's also possible to manually create a situation where some nodes are
different from the others. But galera protects relatively well against
that, you have to try really hard to succeed in making such mistakes.

henrik
> --
> You received this message because you are subscribed to the Google Groups
> "codership" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to codership-tea...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>



--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

Tom Maguire

unread,
Feb 20, 2013, 8:46:04 AM2/20/13
to codersh...@googlegroups.com, Tom Maguire, henri...@avoinelama.fi
Hi Henrik,

Thanks for your input - it sounds like that is what has happened to us.

There was a migration run on the basket_line table (where the error occurred ) which ALTERed the primary key to be an auto_increment instead of the varchar it was before. This migration was run against 1 of the 3 database nodes within the cluster and then the error occurred on the first time we tried to run a DELETE query against that table.

Does this sound like the bug you spoke of?

Thanks
Tom 

Ilias Bertsimas

unread,
Feb 20, 2013, 9:30:18 AM2/20/13
to codersh...@googlegroups.com, Tom Maguire, henri...@avoinelama.fi
Hello Tom,

It seems that it is the same issue I faced a while back with ALTER on an auto_increment PK.

Kind Regards,
Ilias.

Henrik Ingo

unread,
Feb 20, 2013, 11:22:23 AM2/20/13
to Ilias Bertsimas, codersh...@googlegroups.com, Tom Maguire
Yes, I was too lazy to search for it :-)

Note that there is a workaround described in the comments.

henrik

Tom Maguire

unread,
Feb 20, 2013, 11:55:24 AM2/20/13
to codersh...@googlegroups.com, Ilias Bertsimas, Tom Maguire, henri...@avoinelama.fi
Thanks Henrick, Ilias :)

Laurent MINOST

unread,
Feb 21, 2013, 9:26:50 AM2/21/13
to codersh...@googlegroups.com, Ilias Bertsimas, Tom Maguire, henri...@avoinelama.fi
Hi everyone,

Sorry but I do not understand well then, so what is the purpose of the wsrep_auto_increment_control setting then as it causes as a result some data inconsistency ? Do we need to disable it and so its default value should be set to OFF ?
Thanks for your lights.

Regards,

Laurent

Ilias Bertsimas

unread,
Feb 21, 2013, 9:42:02 AM2/21/13
to codersh...@googlegroups.com, Ilias Bertsimas, Tom Maguire, henri...@avoinelama.fi
Hi Laurent,

The purpose of wsrep_auto_increment_control is to allow galera to control the auto_increment columns on insert across multiple nodes and make sure that no 2 rows are using the same auto_increment value. It automatically provides the offset and is usually the number of nodes in the cluster.
Disabling it is a workaround for the issue mentioned in the bug report and if you do not ALTER PK columns that use auto_increment while there is already data in the table you should be fine.
You can probably disable it if you only insert on a single node in a master-slave galera setup or if you can separate your workload across nodes in a way that tables with auto_increment columns have inserts only in a single node at any point in time.

Kind Regards,
Ilias.

Laurent MINOST

unread,
Feb 21, 2013, 10:36:20 AM2/21/13
to codersh...@googlegroups.com, Ilias Bertsimas, Tom Maguire, henri...@avoinelama.fi
Hi Ilias,

Thanks for this clear answer ! :)
So if I understood well, wsrep_auto_increment_control should be avoided in "multi-master" use due to this bug ? As we plan to use galera cluster with haproxy in front with a "3 way" setup ( traffic from application1 go to haproxy ip/port1 which have node1 as primary node and node2/3 as backup, traffic from application2 go to haproxy ip/port2 which have node2 as primary node and node1/3 as backup, ...), we will write only to one node at a time for a database, and so should not have this problem so I will disable wsrep_auto_increment_control.

Regards,

Laurent

Henrik Ingo

unread,
Feb 22, 2013, 2:52:10 PM2/22/13
to Laurent MINOST, codersh...@googlegroups.com, Ilias Bertsimas
On Thu, Feb 21, 2013 at 5:36 PM, Laurent MINOST <lol...@gmail.com> wrote:
> Hi Ilias,
>
> Thanks for this clear answer ! :)
> So if I understood well, wsrep_auto_increment_control should be avoided in
> "multi-master" use due to this bug ?

NO! This is not the point with this bug. Most people / anyone who
doesn't understand or want to think about it should keep
wsrep_auto_increment_control *enabled*, which is the default.

The bug happens if you have an existing table without an
auto_increment column, and then do an ALTER TABLE ...ADD COLUMN ....
auto_increment. Currently this doesn't work as expected, and the
comments in the bug describe how to deal with it. Note that this is a
bug and will be fixed.

Note that doing such an ALTER TABLE on existing data is rather rare.
For anything else that users do to their MySQL data there are no known
bugs related to auto_increment control. Also note that even with this
bug, Galera will actually recover from it without any data corruption
and even without downtime if you use xtrabackup sst method.

> As we plan to use galera cluster with
> haproxy in front with a "3 way" setup ( traffic from application1 go to
> haproxy ip/port1 which have node1 as primary node and node2/3 as backup,
> traffic from application2 go to haproxy ip/port2 which have node2 as primary
> node and node1/3 as backup, ...), we will write only to one node at a time
> for a database, and so should not have this problem so I will disable
> wsrep_auto_increment_control.
>

Sounds like you could, but I would still leave it to the default if I were you.

henrik

Laurent MINOST

unread,
Feb 25, 2013, 5:03:40 AM2/25/13
to codersh...@googlegroups.com, Laurent MINOST, Ilias Bertsimas, henri...@avoinelama.fi
Hi Henrik,

Thanks for this explanation, so it brings more details on the problem, I will finally follow your advise and keep wsrep_auto_increment_control as its default value !
I saw that this bug was reported by Alexey on 2010-05-29, so I think it is/was a low priority bug but as it seems to affect some more people on these forums then maybe the occurrence is a little bit higher and so it should be added to the "fix list" of a near future release, don't you think ? 

Regards,

Laurent

Henrik Ingo

unread,
Feb 25, 2013, 6:01:59 AM2/25/13
to Laurent MINOST, codersh...@googlegroups.com, Ilias Bertsimas
On Mon, Feb 25, 2013 at 12:03 PM, Laurent MINOST <lol...@gmail.com> wrote:
> I saw that this bug was reported by Alexey on 2010-05-29, so I think it
> is/was a low priority bug but as it seems to affect some more people on
> these forums then maybe the occurrence is a little bit higher and so it
> should be added to the "fix list" of a near future release, don't you think
> ?

It was apparently seen already in 2010, but really became actively
complained about only this winter (October).

Fixing this bug is certainly something that will happen. I suppose the
existence of a workaround has otoh made it less critical when compared
to other bugs we've fixed in the past months.
Reply all
Reply to author
Forward
0 new messages