Hello, I have a 3 node cluster accessed by a PHP application. All tables are Innodb. When load testing the application with apache-jmeter, the primary cluster node fails with the following error:
[ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
Aborted
121029 14:53:02 mysqld_safe Number of processes running now: 0
121029 14:53:02 mysqld_safe mysqld restarted
121029 14:53:02 [Note] Flashcache bypass: disabled
121029 14:53:02 [Note] Flashcache setup error is : ioctl failed
121029 14:53:02 [Note] WSREP: Read nil XID from storage engines, skipping position init
121029 14:53:02 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
121029 14:53:02 [Note] WSREP: wsrep_load(): Galera 2.1dev(r112) by Codership Oy <
in...@codership.com> loaded succesfully.
121029 14:53:02 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
121029 14:53:02 [Note] WSREP: Reusing existing '/data/mysql1//galera.cache'.
121029 14:53:02 [Note] WSREP: Passing config to GCS: base_host = 11.11.11.110; gcache.dir = /data/mysql1/; gcache.keep_pages_size = 0; gcache.mem_size = 0;
gcache.name = /data/mysql1//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121029 14:53:02 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121029 14:53:02 [Note] WSREP: wsrep_sst_grab()
121029 14:53:02 [Note] WSREP: Start replication
121029 14:53:02 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121029 14:53:02 [Note] WSREP: protonet asio version 0
121029 14:53:02 [Note] WSREP: backend: asio
121029 14:53:02 [Note] WSREP: GMCast version 0
121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://
0.0.0.0:4567') listening at tcp://
0.0.0.0:4567121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://
0.0.0.0:4567') multicast: , ttl: 1
121029 14:53:02 [Note] WSREP: EVS version 0
121029 14:53:02 [Note] WSREP: PC version 0
121029 14:53:02 [Note] WSREP: gcomm: connecting to group 'test_cluster2', peer '
11.11.11.111:4567'
121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://
0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://
11.11.11.112:4567121029 14:53:03 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://
0.0.0.0:4567') turning message relay requesting off
121029 14:53:07 [Note] WSREP: declaring 02b0f513-21dc-11e2-0800-7a0a8588f798 stable
121029 14:53:07 [Note] WSREP: declaring ebe80665-21f4-11e2-0800-bedffce635c8 stable
121029 14:53:07 [Note] WSREP: view(view_id(PRIM,02b0f513-21dc-11e2-0800-7a0a8588f798,11) memb {
02b0f513-21dc-11e2-0800-7a0a8588f798,
ddd40d58-21f9-11e2-0800-19fa5c7b5506,
ebe80665-21f4-11e2-0800-bedffce635c8,
} joined {
} left {
} partitioned {
92cdb47c-21f7-11e2-0800-262cdab68ee6,
})
121029 14:53:08 [Note] WSREP: gcomm: connected
121029 14:53:08 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121029 14:53:08 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121029 14:53:08 [Note] WSREP: Opened channel 'test_cluster2'
121029 14:53:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121029 14:53:08 [Note] WSREP: Waiting for SST to complete.
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: sent state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 0 (node112)
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 2 (node111)
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 1 (node110)
121029 14:53:08 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 9,
members = 2/3 (joined/total),
act_id = 8642,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 97be425f-21db-11e2-0800-d98563b1f16e
121029 14:53:08 [Note] WSREP: Flow-control interval: [14, 28]
121029 14:53:08 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8642)
121029 14:53:08 [Note] WSREP: State transfer required:
Group state: 97be425f-21db-11e2-0800-d98563b1f16e:8642
Local state: 00000000-0000-0000-0000-000000000000:-1
121029 14:53:08 [Note] WSREP: New cluster view: global state: 97be425f-21db-11e2-0800-d98563b1f16e:8642, view# 10: Primary, number of nodes: 3, my index: 1, protocol version 2
121029 14:53:08 [Warning] WSREP: Gap in state sequence. Need state transfer.
121029 14:53:10 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '11.11.11.110' 'xtrabackup:1234567' '/data/mysql1/' '/etc/mysql/my.cnf' '11904' 2>sst.err'
121029 14:53:10 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121029 14:53:10 [Note] WSREP: Assign initial position for certification: 8642, protocol version: 2
121029 14:53:10 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (97be425f-21db-11e2-0800-d98563b1f16e): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
121029 14:53:10 [Note] WSREP: Node 1 (node110) requested state transfer from '*any*'. Selected 0 (node112)(SYNCED) as donor.
121029 14:53:10 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8682)
121029 14:53:10 [Note] WSREP: Requesting state transfer: success, donor: 0
121029 14:58:25 [Note] WSREP: 0 (node112): State transfer to 1 (node110) complete.
121029 14:58:25 [Note] WSREP: Member 0 (node112) synced with group.
Can any insight be provided for this error? After it occurs, the cluster automatically fixes itself, but what is causing the issue in the first place?