Hi. During a large transaction of a few million rows on our primary node, it seems the secondary node became inconsistent somehow. We only read/write from/to the primary node. We are using the following:
This is the second time this has happened. A full SST was also initiated immediately after this error happened which eventually brought down the primary node due to another error I'm trying to troubleshoot.
121109 8:12:38 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000022 of size 134217728 bytes
121109 8:12:43 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000022
121109 8:12:43 [Note] WSREP: view(view_id(PRIM,c23bca00-1fb1-11e2-0800-306a616525db,6) memb {
c23bca00-1fb1-11e2-0800-306a616525db,
} joined {
} left {
} partitioned {
364f0c86-1fd5-11e2-0800-bab6a4443916,
})
121109 8:12:43 [Note] WSREP: forgetting 364f0c86-1fd5-11e2-0800-bab6a4443916 (tcp://
10.7.25.110:4567)
121109 8:12:43 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
121109 8:12:43 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 25992e04-2a6f-11e2-0800-351b63128070
121109 8:12:43 [Warning] WSREP: Last Applied Action message in non-primary configuration from member 0
121109 8:12:43 [Note] WSREP: STATE EXCHANGE: sent state msg: 25992e04-2a6f-11e2-0800-351b63128070
121109 8:12:43 [Note] WSREP: STATE EXCHANGE: got state msg: 25992e04-2a6f-11e2-0800-351b63128070 from 0 (maindb-node0)
121109 8:12:43 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 5,
members = 1/1 (joined/total),
act_id = 10182391,
last_appl. = 10182382,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = e459a41e-19ed-11e2-0800-28913924bd33
121109 8:12:43 [Note] WSREP: Flow-control interval: [253, 256]
121109 8:12:43 [Note] WSREP: New cluster view: global state: e459a41e-19ed-11e2-0800-28913924bd33:10182391, view# 6: Primary, number of nodes: 1, my index: 0, protocol version 2
121109 8:12:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121109 8:12:44 [Note] WSREP: Assign initial position for certification: 10182391, protocol version: 2
121109 8:12:44 [Note] WSREP: remote endpoint tcp://
10.7.25.110:4567 changed identity 364f0c86-1fd5-11e2-0800-bab6a4443916 -> 261706d0-2a6f-11e2-0800-48fc91a1cea0
121109 8:12:44 [Note] WSREP: declaring 261706d0-2a6f-11e2-0800-48fc91a1cea0 stable
121109 8:12:44 [Note] WSREP: view(view_id(PRIM,261706d0-2a6f-11e2-0800-48fc91a1cea0,7) memb {
261706d0-2a6f-11e2-0800-48fc91a1cea0,
c23bca00-1fb1-11e2-0800-306a616525db,
} joined {
} left {
} partitioned {
})
121109 8:12:44 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: sent state msg: 26649fda-2a6f-11e2-0800-e7466d8051c6
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: got state msg: 26649fda-2a6f-11e2-0800-e7466d8051c6 from 0 (mysql-db1-standby)
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: got state msg: 26649fda-2a6f-11e2-0800-e7466d8051c6 from 1 (maindb-node0)
121109 8:12:44 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 6,
members = 1/2 (joined/total),
act_id = 10182393,
last_appl. = 10182382,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = e459a41e-19ed-11e2-0800-28913924bd33
121109 8:12:44 [Note] WSREP: Flow-control interval: [253, 256]
121109 8:12:44 [Note] WSREP: New cluster view: global state: e459a41e-19ed-11e2-0800-28913924bd33:10182393, view# 7: Primary, number of nodes: 2, my index: 1, protocol version 2
121109 8:12:44 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121109 8:12:44 [Note] WSREP: Assign initial position for certification: 10182393, protocol version: 2
------------------------------------------------------------------------------------
121109 8:12:38 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000022 of size 134217728 bytes
121109 8:12:42 [ERROR] Slave SQL: Could not execute Delete_rows event on table p1.pos; Can't find record in 'pos', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 978, Error_code: 1032
121109 8:12:42 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 10182390
121109 8:12:43 [ERROR] WSREP: Failed to apply trx: source: c23bca00-1fb1-11e2-0800-306a616525db version: 2 local: 0 state: APPLYING flags: 1 conn_id: 1219339 trx_id: 45372703 seqnos (l: 1396875, g: 10182390, s: 10182389, d: 10182380, ts: 1352466709844062000)
121109 8:12:43 [ERROR] WSREP: Failed to apply app buffer: ^U^A<9d>P^Sö¦£, seqno: 10182390, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():49
at galera/src/replicator_smm.cpp:apply_trx_ws():120
121109 8:12:43 [ERROR] WSREP: Node consistency compromized, aborting...
121109 8:12:43 [Note] WSREP: Closing send monitor...
121109 8:12:43 [Note] WSREP: Closed send monitor.
121109 8:12:43 [Note] WSREP: gcomm: terminating thread
121109 8:12:43 [Note] WSREP: gcomm: joining thread
121109 8:12:43 [Note] WSREP: gcomm: closing backend
121109 8:12:43 [Note] WSREP: view(view_id(NON_PRIM,364f0c86-1fd5-11e2-0800-bab6a4443916,5) memb {
364f0c86-1fd5-11e2-0800-bab6a4443916,
} joined {
} left {
} partitioned {
c23bca00-1fb1-11e2-0800-306a616525db,
})
121109 8:12:43 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
121109 8:12:43 [Note] WSREP: view((empty))
121109 8:12:43 [Note] WSREP: gcomm: closed
121109 8:12:43 [Note] WSREP: Flow-control interval: [253, 256]
121109 8:12:43 [Note] WSREP: Received NON-PRIMARY.
121109 8:12:43 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 10182391)
121109 8:12:43 [Note] WSREP: Received self-leave message.
121109 8:12:43 [Note] WSREP: Flow-control interval: [253, 256]
121109 8:12:43 [Note] WSREP: Received SELF-LEAVE. Closing connection.
121109 8:12:43 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 10182391)
121109 8:12:43 [Note] WSREP: RECV thread exiting 0: Success
121109 8:12:43 [Note] WSREP: recv_thread() joined.
121109 8:12:43 [Note] WSREP: Closing slave action queue.
121109 8:12:43 [Note] WSREP: /usr/sbin/mysqld: Terminated.
121109 08:12:43 mysqld_safe Number of processes running now: 0
121109 08:12:43 mysqld_safe mysqld restarted
121109 8:12:44 [Note] Flashcache bypass: disabled
121109 8:12:44 [Note] Flashcache setup error is : ioctl failed
121109 8:12:44 [Note] WSREP: Read nil XID from storage engines, skipping position init
121109 8:12:44 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
121109 8:12:44 [Note] WSREP: wsrep_load(): Galera 2.2(r114) by Codership Oy <
in...@codership.com> loaded succesfully.
121109 8:12:44 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
121109 8:12:44 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
121109 8:12:44 [Note] WSREP: Passing config to GCS: base_host = 10.7.25.110; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0;
gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.99; gcs.fc_limit = 256; gcs.fc_master_slave = yes; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121109 8:12:44 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121109 8:12:44 [Note] WSREP: wsrep_sst_grab()
121109 8:12:44 [Note] WSREP: Start replication
121109 8:12:44 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121109 8:12:44 [Note] WSREP: protonet asio version 0
121109 8:12:44 [Note] WSREP: backend: asio
121109 8:12:44 [Note] WSREP: GMCast version 0
121109 8:12:44 [Note] WSREP: (261706d0-2a6f-11e2-0800-48fc91a1cea0, 'tcp://
0.0.0.0:4567') listening at tcp://
0.0.0.0:4567121109 8:12:44 [Note] WSREP: (261706d0-2a6f-11e2-0800-48fc91a1cea0, 'tcp://
0.0.0.0:4567') multicast: , ttl: 1
121109 8:12:44 [Note] WSREP: EVS version 0
121109 8:12:44 [Note] WSREP: PC version 0
121109 8:12:44 [Note] WSREP: gcomm: connecting to group 'maindb', peer '
10.7.25.111:'
121109 8:12:44 [Note] WSREP: declaring c23bca00-1fb1-11e2-0800-306a616525db stable
121109 8:12:44 [Note] WSREP: view(view_id(PRIM,261706d0-2a6f-11e2-0800-48fc91a1cea0,7) memb {
261706d0-2a6f-11e2-0800-48fc91a1cea0,
c23bca00-1fb1-11e2-0800-306a616525db,
} joined {
} left {
} partitioned {
})
121109 8:12:44 [Note] WSREP: gcomm: connected
121109 8:12:44 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121109 8:12:44 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121109 8:12:44 [Note] WSREP: Opened channel 'maindb'
121109 8:12:44 [Note] WSREP: Waiting for SST to complete.
121109 8:12:44 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
121109 8:12:44 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 26649fda-2a6f-11e2-0800-e7466d8051c6
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: sent state msg: 26649fda-2a6f-11e2-0800-e7466d8051c6
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: got state msg: 26649fda-2a6f-11e2-0800-e7466d8051c6 from 0 (mysql-db1-standby)
121109 8:12:44 [Note] WSREP: STATE EXCHANGE: got state msg: 26649fda-2a6f-11e2-0800-e7466d8051c6 from 1 (maindb-node0)
121109 8:12:44 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 6,
members = 1/2 (joined/total),
act_id = 10182393,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = e459a41e-19ed-11e2-0800-28913924bd33
121109 8:12:44 [Note] WSREP: Flow-control interval: [253, 256]
121109 8:12:44 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 10182393)
121109 8:12:44 [Note] WSREP: State transfer required:
Group state: e459a41e-19ed-11e2-0800-28913924bd33:10182393
Local state: 00000000-0000-0000-0000-000000000000:-1
121109 8:12:44 [Note] WSREP: New cluster view: global state: e459a41e-19ed-11e2-0800-28913924bd33:10182393, view# 7: Primary, number of nodes: 2, my index: 0, protocol version 2
121109 8:12:44 [Warning] WSREP: Gap in state sequence. Need state transfer.
121109 8:12:46 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '10.7.25.110' 'root:ft7' '/var/lib/mysql/' '/etc/mysql/wsrep.cnf' '20119' 2>sst.err'