Hello, I have a 3 node cluster accessed by a PHP application. All tables are Innodb. When load testing the application with apache-jmeter, the primary cluster node fails with the following error:
[ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK Aborted 121029 14:53:02 mysqld_safe Number of processes running now: 0 121029 14:53:02 mysqld_safe mysqld restarted 121029 14:53:02 [Note] Flashcache bypass: disabled 121029 14:53:02 [Note] Flashcache setup error is : ioctl failed
121029 14:53:08 [Note] WSREP: gcomm: connected 121029 14:53:08 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636 121029 14:53:08 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0) 121029 14:53:08 [Note] WSREP: Opened channel 'test_cluster2' 121029 14:53:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. 121029 14:53:08 [Note] WSREP: Waiting for SST to complete. 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: sent state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 0 (node112) 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 2 (node111) 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 1 (node110) 121029 14:53:08 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 9, members = 2/3 (joined/total), act_id = 8642, last_appl. = -1, protocols = 0/4/2 (gcs/repl/appl), group UUID = 97be425f-21db-11e2-0800-d98563b1f16e 121029 14:53:08 [Note] WSREP: Flow-control interval: [14, 28] 121029 14:53:08 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8642) 121029 14:53:08 [Note] WSREP: State transfer required: Group state: 97be425f-21db-11e2-0800-d98563b1f16e:8642 Local state: 00000000-0000-0000-0000-000000000000:-1 121029 14:53:08 [Note] WSREP: New cluster view: global state: 97be425f-21db-11e2-0800-d98563b1f16e:8642, view# 10: Primary, number of nodes: 3, my index: 1, protocol version 2 121029 14:53:08 [Warning] WSREP: Gap in state sequence. Need state transfer. 121029 14:53:10 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '11.11.11.110' 'xtrabackup:1234567' '/data/mysql1/' '/etc/mysql/my.cnf' '11904' 2>sst.err' 121029 14:53:10 [Note] WSREP: Prepared SST request: xtrabackup|11.11.11.110:4444/xtrabackup_sst 121029 14:53:10 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 121029 14:53:10 [Note] WSREP: Assign initial position for certification: 8642, protocol version: 2 121029 14:53:10 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (97be425f-21db-11e2-0800-d98563b1f16e): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable. 121029 14:53:10 [Note] WSREP: Node 1 (node110) requested state transfer from '*any*'. Selected 0 (node112)(SYNCED) as donor. 121029 14:53:10 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8682) 121029 14:53:10 [Note] WSREP: Requesting state transfer: success, donor: 0 121029 14:58:25 [Note] WSREP: 0 (node112): State transfer to 1 (node110) complete. 121029 14:58:25 [Note] WSREP: Member 0 (node112) synced with group.
At other times, the same load test causes this error:
[ERROR] WSREP: FSM: no such a transition REPLICATING -> REPLICATING
Can any insight be provided for this error? After it occurs, the cluster automatically fixes itself, but what is causing the issue in the first place?
On Monday, October 29, 2012 3:15:26 PM UTC-4, PaulO wrote:
> Hello, I have a 3 node cluster accessed by a PHP application. All tables > are Innodb. When load testing the application with apache-jmeter, the > primary cluster node fails with the following error:
> [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK > Aborted > 121029 14:53:02 mysqld_safe Number of processes running now: 0 > 121029 14:53:02 mysqld_safe mysqld restarted > 121029 14:53:02 [Note] Flashcache bypass: disabled > 121029 14:53:02 [Note] Flashcache setup error is : ioctl failed
> 121029 14:53:02 [Note] WSREP: Read nil XID from storage engines, skipping > position init > 121029 14:53:02 [Note] WSREP: wsrep_load(): loading provider library > '/usr/lib/libgalera_smm.so' > 121029 14:53:02 [Note] WSREP: wsrep_load(): Galera 2.1dev(r112) by > Codership Oy <i...@codership.com> loaded succesfully. > 121029 14:53:02 [Note] WSREP: Found saved state: > 00000000-0000-0000-0000-000000000000:-1 > 121029 14:53:02 [Note] WSREP: Reusing existing > '/data/mysql1//galera.cache'. > 121029 14:53:02 [Note] WSREP: Passing config to GCS: base_host = > 11.11.11.110; gcache.dir = /data/mysql1/; gcache.keep_pages_size = 0; > gcache.mem_size = 0; gcache.name = /data/mysql1//galera.cache; > gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; > gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; > gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit > = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; > replicator.causal_read_timeout = PT30S; replicator.commit_order = 3 > 121029 14:53:02 [Note] WSREP: Assign initial position for certification: > -1, protocol version: -1 > 121029 14:53:02 [Note] WSREP: wsrep_sst_grab() > 121029 14:53:02 [Note] WSREP: Start replication > 121029 14:53:02 [Note] WSREP: Setting initial position to > 00000000-0000-0000-0000-000000000000:-1 > 121029 14:53:02 [Note] WSREP: protonet asio version 0 > 121029 14:53:02 [Note] WSREP: backend: asio > 121029 14:53:02 [Note] WSREP: GMCast version 0 > 121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, > 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 > 121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, > 'tcp://0.0.0.0:4567') multicast: , ttl: 1 > 121029 14:53:02 [Note] WSREP: EVS version 0 > 121029 14:53:02 [Note] WSREP: PC version 0 > 121029 14:53:02 [Note] WSREP: gcomm: connecting to group 'test_cluster2', > peer '11.11.11.111:4567' > 121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, > 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: > tcp://11.11.11.112:4567 > 121029 14:53:03 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, > 'tcp://0.0.0.0:4567') turning message relay requesting off > 121029 14:53:07 [Note] WSREP: declaring > 02b0f513-21dc-11e2-0800-7a0a8588f798 stable > 121029 14:53:07 [Note] WSREP: declaring > ebe80665-21f4-11e2-0800-bedffce635c8 stable > 121029 14:53:07 [Note] WSREP: > view(view_id(PRIM,02b0f513-21dc-11e2-0800-7a0a8588f798,11) memb { > 02b0f513-21dc-11e2-0800-7a0a8588f798, > ddd40d58-21f9-11e2-0800-19fa5c7b5506, > ebe80665-21f4-11e2-0800-bedffce635c8, > } joined { > } left { > } partitioned { > 92cdb47c-21f7-11e2-0800-262cdab68ee6, > }) > 121029 14:53:08 [Note] WSREP: gcomm: connected > 121029 14:53:08 [Note] WSREP: Changing maximum packet size to 64500, > resulting msg size: 32636 > 121029 14:53:08 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0) > 121029 14:53:08 [Note] WSREP: Opened channel 'test_cluster2' > 121029 14:53:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = > no, my_idx = 1, memb_num = 3 > 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. > 121029 14:53:08 [Note] WSREP: Waiting for SST to complete. > 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: sent state msg: > e0eef247-21f9-11e2-0800-c2c66b6d352c > 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: > e0eef247-21f9-11e2-0800-c2c66b6d352c from 0 (node112) > 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: > e0eef247-21f9-11e2-0800-c2c66b6d352c from 2 (node111) > 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: > e0eef247-21f9-11e2-0800-c2c66b6d352c from 1 (node110) > 121029 14:53:08 [Note] WSREP: Quorum results: > version = 2, > component = PRIMARY, > conf_id = 9, > members = 2/3 (joined/total), > act_id = 8642, > last_appl. = -1, > protocols = 0/4/2 (gcs/repl/appl), > group UUID = 97be425f-21db-11e2-0800-d98563b1f16e > 121029 14:53:08 [Note] WSREP: Flow-control interval: [14, 28] > 121029 14:53:08 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8642) > 121029 14:53:08 [Note] WSREP: State transfer required: > Group state: 97be425f-21db-11e2-0800-d98563b1f16e:8642 > Local state: 00000000-0000-0000-0000-000000000000:-1 > 121029 14:53:08 [Note] WSREP: New cluster view: global state: > 97be425f-21db-11e2-0800-d98563b1f16e:8642, view# 10: Primary, number of > nodes: 3, my index: 1, protocol version 2 > 121029 14:53:08 [Warning] WSREP: Gap in state sequence. Need state > transfer. > 121029 14:53:10 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' > '11.11.11.110' 'xtrabackup:1234567' '/data/mysql1/' '/etc/mysql/my.cnf' > '11904' 2>sst.err' > 121029 14:53:10 [Note] WSREP: Prepared SST request: xtrabackup| > 11.11.11.110:4444/xtrabackup_sst > 121029 14:53:10 [Note] WSREP: wsrep_notify_cmd is not defined, skipping > notification. > 121029 14:53:10 [Note] WSREP: Assign initial position for certification: > 8642, protocol version: 2 > 121029 14:53:10 [Warning] WSREP: Failed to prepare for incremental state > transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not > match group state UUID (97be425f-21db-11e2-0800-d98563b1f16e): 1 (Operation > not permitted) > at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will > be unavailable. > 121029 14:53:10 [Note] WSREP: Node 1 (node110) requested state transfer > from '*any*'. Selected 0 (node112)(SYNCED) as donor. > 121029 14:53:10 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8682) > 121029 14:53:10 [Note] WSREP: Requesting state transfer: success, donor: 0 > 121029 14:58:25 [Note] WSREP: 0 (node112): State transfer to 1 (node110) > complete. > 121029 14:58:25 [Note] WSREP: Member 0 (node112) synced with group.
> At other times, the same load test causes this error:
> [ERROR] WSREP: FSM: no such a transition REPLICATING -> REPLICATING
> Can any insight be provided for this error? After it occurs, the cluster > automatically fixes itself, but what is causing the issue in the first > place?
You must be hitting a known and fixed bug. At least some bugs with the same symptoms have been identified and fixed some time ago. Unfortunately not all of these fixes are in PXC yet. Still, try the latest PXC release it may fix your particular problem.
> On Monday, October 29, 2012 3:15:26 PM UTC-4, PaulO wrote:
>> Hello, I have a 3 node cluster accessed by a PHP application. All >> tables
>> are Innodb. When load testing the application with apache-jmeter, >> the
>> primary cluster node fails with the following error:
>> [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
>> Aborted
>> 121029 14:53:02 mysqld_safe Number of processes running now: 0
>> 121029 14:53:02 mysqld_safe mysqld restarted
>> 121029 14:53:02 [Note] Flashcache bypass: disabled
>> 121029 14:53:02 [Note] Flashcache setup error is : ioctl failed
>> 121029 14:53:02 [Note] WSREP: Read nil XID from storage engines, >> skipping
>> position init
>> 121029 14:53:02 [Note] WSREP: wsrep_load(): loading provider library
>> '/usr/lib/libgalera_smm.so'
>> 121029 14:53:02 [Note] WSREP: wsrep_load(): Galera 2.1dev(r112) by
>> Codership Oy <i...@codership.com> loaded succesfully.
>> 121029 14:53:02 [Note] WSREP: Found saved state:
>> 00000000-0000-0000-0000-000000000000:-1
>> 121029 14:53:02 [Note] WSREP: Reusing existing
>> '/data/mysql1//galera.cache'.
>> 121029 14:53:02 [Note] WSREP: Passing config to GCS: base_host =
>> 11.11.11.110; gcache.dir = /data/mysql1/; gcache.keep_pages_size = >> 0;
>> gcache.mem_size = 0; gcache.name = /data/mysql1//galera.cache;
>> gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0;
>> gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO;
>> gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; >> gcs.recv_q_hard_limit
>> = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor >> = NO;
>> replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
>> 121029 14:53:02 [Note] WSREP: Assign initial position for >> certification:
>> -1, protocol version: -1
>> 121029 14:53:02 [Note] WSREP: wsrep_sst_grab()
>> 121029 14:53:02 [Note] WSREP: Start replication
>> 121029 14:53:02 [Note] WSREP: Setting initial position to
>> 00000000-0000-0000-0000-000000000000:-1
>> 121029 14:53:02 [Note] WSREP: protonet asio version 0
>> 121029 14:53:02 [Note] WSREP: backend: asio
>> 121029 14:53:02 [Note] WSREP: GMCast version 0
>> 121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506,
>> 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
>> 121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506,
>> 'tcp://0.0.0.0:4567') multicast: , ttl: 1
>> 121029 14:53:02 [Note] WSREP: EVS version 0
>> 121029 14:53:02 [Note] WSREP: PC version 0
>> 121029 14:53:02 [Note] WSREP: gcomm: connecting to group >> 'test_cluster2',
>> peer '11.11.11.111:4567'
>> 121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506,
>> 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive >> peers:
>> tcp://11.11.11.112:4567
>> 121029 14:53:03 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506,
>> 'tcp://0.0.0.0:4567') turning message relay requesting off
>> 121029 14:53:07 [Note] WSREP: declaring
>> 02b0f513-21dc-11e2-0800-7a0a8588f798 stable
>> 121029 14:53:07 [Note] WSREP: declaring
>> ebe80665-21f4-11e2-0800-bedffce635c8 stable
>> 121029 14:53:07 [Note] WSREP:
>> view(view_id(PRIM,02b0f513-21dc-11e2-0800-7a0a8588f798,11) memb {
>> 02b0f513-21dc-11e2-0800-7a0a8588f798,
>> ddd40d58-21f9-11e2-0800-19fa5c7b5506,
>> ebe80665-21f4-11e2-0800-bedffce635c8,
>> } joined {
>> } left {
>> } partitioned {
>> 92cdb47c-21f7-11e2-0800-262cdab68ee6,
>> })
>> 121029 14:53:08 [Note] WSREP: gcomm: connected
>> 121029 14:53:08 [Note] WSREP: Changing maximum packet size to 64500,
>> resulting msg size: 32636
>> 121029 14:53:08 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
>> 121029 14:53:08 [Note] WSREP: Opened channel 'test_cluster2'
>> 121029 14:53:08 [Note] WSREP: New COMPONENT: primary = yes, >> bootstrap =
>> no, my_idx = 1, memb_num = 3
>> 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: Waiting for state >> UUID.
>> 121029 14:53:08 [Note] WSREP: Waiting for SST to complete.
>> 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: sent state msg:
>> e0eef247-21f9-11e2-0800-c2c66b6d352c
>> 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg:
>> e0eef247-21f9-11e2-0800-c2c66b6d352c from 0 (node112)
>> 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg:
>> e0eef247-21f9-11e2-0800-c2c66b6d352c from 2 (node111)
>> 121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg:
>> e0eef247-21f9-11e2-0800-c2c66b6d352c from 1 (node110)
>> 121029 14:53:08 [Note] WSREP: Quorum results:
>> version = 2,
>> component = PRIMARY,
>> conf_id = 9,
>> members = 2/3 (joined/total),
>> act_id = 8642,
>> last_appl. = -1,
>> protocols = 0/4/2 (gcs/repl/appl),
>> group UUID = 97be425f-21db-11e2-0800-d98563b1f16e
>> 121029 14:53:08 [Note] WSREP: Flow-control interval: [14, 28]
>> 121029 14:53:08 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8642)
>> 121029 14:53:08 [Note] WSREP: State transfer required:
>> Group state: 97be425f-21db-11e2-0800-d98563b1f16e:8642
>> Local state: 00000000-0000-0000-0000-000000000000:-1
>> 121029 14:53:08 [Note] WSREP: New cluster view: global state:
>> 97be425f-21db-11e2-0800-d98563b1f16e:8642, view# 10: Primary, number >> of
>> nodes: 3, my index: 1, protocol version 2
>> 121029 14:53:08 [Warning] WSREP: Gap in state sequence. Need state
>> transfer.
>> 121029 14:53:10 [Note] WSREP: Running: 'wsrep_sst_xtrabackup >> 'joiner'
>> '11.11.11.110' 'xtrabackup:1234567' '/data/mysql1/' >> '/etc/mysql/my.cnf'
>> '11904' 2>sst.err'
>> 121029 14:53:10 [Note] WSREP: Prepared SST request: xtrabackup|
>> 11.11.11.110:4444/xtrabackup_sst
>> 121029 14:53:10 [Note] WSREP: wsrep_notify_cmd is not defined, >> skipping
>> notification.
>> 121029 14:53:10 [Note] WSREP: Assign initial position for >> certification:
>> 8642, protocol version: 2
>> 121029 14:53:10 [Warning] WSREP: Failed to prepare for incremental >> state
>> transfer: Local state UUID (00000000-0000-0000-0000-000000000000) >> does not
>> match group state UUID (97be425f-21db-11e2-0800-d98563b1f16e): 1 >> (Operation
>> not permitted)
>> at galera/src/replicator_str.cpp:prepare_for_IST():439. IST >> will
>> be unavailable.
>> 121029 14:53:10 [Note] WSREP: Node 1 (node110) requested state >> transfer
>> from '*any*'. Selected 0 (node112)(SYNCED) as donor.
>> 121029 14:53:10 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8682)
>> 121029 14:53:10 [Note] WSREP: Requesting state transfer: success, >> donor: 0
>> 121029 14:58:25 [Note] WSREP: 0 (node112): State transfer to 1 >> (node110)
>> complete.
>> 121029 14:58:25 [Note] WSREP: Member 0 (node112) synced with group.
>> At other times, the same load test causes this error:
>> [ERROR] WSREP: FSM: no such a transition REPLICATING -> REPLICATING
>> Can any insight be provided for this error? After it occurs, the >> cluster
>> automatically fixes itself, but what is causing the issue in the >> first
>> place?
-- Alexey Yurchenko,
Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011
On Monday, October 29, 2012 4:54:40 PM UTC-4, Alexey Yurchenko wrote:
> You must be hitting a known and fixed bug. At least some bugs with the > same symptoms have been identified and fixed some time ago. > Unfortunately not all of these fixes are in PXC yet. Still, try the > latest PXC release it may fix your particular problem.
> On 2012-10-29 21:42, PaulO wrote: > > Furthermore, I have the following configurations on my 3 nodes:
> > On Monday, October 29, 2012 3:15:26 PM UTC-4, PaulO wrote:
> >> Hello, I have a 3 node cluster accessed by a PHP application. All > >> tables > >> are Innodb. When load testing the application with apache-jmeter, > >> the > >> primary cluster node fails with the following error:
> >> [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK > >> Aborted > >> 121029 14:53:02 mysqld_safe Number of processes running now: 0 > >> 121029 14:53:02 mysqld_safe mysqld restarted > >> 121029 14:53:02 [Note] Flashcache bypass: disabled > >> 121029 14:53:02 [Note] Flashcache setup error is : ioctl failed
Do you have any other thoughts on what the issue might be? Is it possible that the load test is just writing to the nodes too quickly? The strange part is that on some runs of the load test with jMeter, it will not have the issue. However, on other identical runs, it will cause the issue almost immediately.
> Do you have any other thoughts on what the issue might be? Is it > possible
> that the load test is just writing to the nodes too quickly?
Of course if you used a single threaded test, there would be no problem. But you probably won't need a cluster then. ;)
> The strange
> part is that on some runs of the load test with jMeter, it will not > have
> the issue. However, on other identical runs, it will cause the issue > almost
> immediately.
This error is a result of a bad execution context cleanup (wrong state is preserved) and most likely involves cluster-wide deadlock (you should not be seeing that if you use the cluster in a master-slave way). So it is a race and may happen and may not.
> Any help would be appreciated.
You could
1) describe your test in sufficient detail so that we could reproduce it in our lab, then we could fix it or confirm that it is fixed. (if there is sensitive information involved, you can e-mail directly to i...@codership.com)
2) compile the latest head from lp:codership-mysql and test with it and Galera 2.2 (again from https://launchpad.net/galera/+download) and verify that it is fixed there.
Thanks,
Alex
-- Alexey Yurchenko,
Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011
Thanks for the insight so far. We have narrowed down the issue to a specific function in our application, so we will continue to dig there. However, one other thing that we have noticed during these tests is that sometimes, when 2 of the 3 nodes go down and only 1 is still up, that node still has a local_state_comment of "Synced (6)" and we are still able to access the data through that node. It seems that the Split brain detection is not functioning correctly. Have you seen an issue like this? Here is some of the output of "show global status like 'wsrep%';"
On Wednesday, October 31, 2012 10:28:28 AM UTC-4, Alexey Yurchenko wrote:
> On 2012-10-31 14:37, PaulO wrote: > > Do you have any other thoughts on what the issue might be? Is it > > possible > > that the load test is just writing to the nodes too quickly?
> Of course if you used a single threaded test, there would be no > problem. But you probably won't need a cluster then. ;)
> > The strange > > part is that on some runs of the load test with jMeter, it will not > > have > > the issue. However, on other identical runs, it will cause the issue > > almost > > immediately.
> This error is a result of a bad execution context cleanup (wrong state > is preserved) and most likely involves cluster-wide deadlock (you should > not be seeing that if you use the cluster in a master-slave way). So it > is a race and may happen and may not.
> > Any help would be appreciated.
> You could
> 1) describe your test in sufficient detail so that we could reproduce > it in our lab, then we could fix it or confirm that it is fixed. (if > there is sensitive information involved, you can e-mail directly to > in...@codership.com <javascript:>)
> 2) compile the latest head from lp:codership-mysql and test with it and > Galera 2.2 (again from https://launchpad.net/galera/+download) and > verify that it is fixed there.
On Wednesday, October 31, 2012 1:31:56 PM UTC-4, PaulO wrote:
> Thanks for the insight so far. We have narrowed down the issue to a > specific function in our application, so we will continue to dig there. > However, one other thing that we have noticed during these tests is that > sometimes, when 2 of the 3 nodes go down and only 1 is still up, that node > still has a local_state_comment of "Synced (6)" and we are still able to > access the data through that node. It seems that the Split brain detection > is not functioning correctly. Have you seen an issue like this? Here is > some of the output of "show global status like 'wsrep%';"
> Any idea why the primary node does not block access to the DB?
> On Wednesday, October 31, 2012 10:28:28 AM UTC-4, Alexey Yurchenko wrote:
>> On 2012-10-31 14:37, PaulO wrote: >> > Do you have any other thoughts on what the issue might be? Is it >> > possible >> > that the load test is just writing to the nodes too quickly?
>> Of course if you used a single threaded test, there would be no >> problem. But you probably won't need a cluster then. ;)
>> > The strange >> > part is that on some runs of the load test with jMeter, it will not >> > have >> > the issue. However, on other identical runs, it will cause the issue >> > almost >> > immediately.
>> This error is a result of a bad execution context cleanup (wrong state >> is preserved) and most likely involves cluster-wide deadlock (you should >> not be seeing that if you use the cluster in a master-slave way). So it >> is a race and may happen and may not.
>> > Any help would be appreciated.
>> You could
>> 1) describe your test in sufficient detail so that we could reproduce >> it in our lab, then we could fix it or confirm that it is fixed. (if >> there is sensitive information involved, you can e-mail directly to >> in...@codership.com)
>> 2) compile the latest head from lp:codership-mysql and test with it and >> Galera 2.2 (again from https://launchpad.net/galera/+download) and >> verify that it is fixed there.
Upon further investigation, the issue seems to be related to when we have multiple web servers attempting to connect (many times per second) at the same time. We are using KEMP hardware load balancers, not HAProxy, could this be part of the issue?
When I have load tested just 1 web server which writes to just 1 DB server, there is no issue. If I have 1 web server write to multiple DB servers, I get the error. If I have 2 web servers write to 1 DB server, I get the error, regardless of whether I use the KEMP balancer or not.
Does HAProxy somehow funnel the requests better than a hardware balancer would such that it makes it easier for the cluster to keep up?
> On Wednesday, October 31, 2012 1:31:56 PM UTC-4, PaulO wrote:
>> Thanks for the insight so far. We have narrowed down the issue to a >> specific function in our application, so we will continue to dig there. >> However, one other thing that we have noticed during these tests is that >> sometimes, when 2 of the 3 nodes go down and only 1 is still up, that node >> still has a local_state_comment of "Synced (6)" and we are still able to >> access the data through that node. It seems that the Split brain detection >> is not functioning correctly. Have you seen an issue like this? Here is >> some of the output of "show global status like 'wsrep%';"
>> Any idea why the primary node does not block access to the DB?
>> On Wednesday, October 31, 2012 10:28:28 AM UTC-4, Alexey Yurchenko wrote:
>>> On 2012-10-31 14:37, PaulO wrote: >>> > Do you have any other thoughts on what the issue might be? Is it >>> > possible >>> > that the load test is just writing to the nodes too quickly?
>>> Of course if you used a single threaded test, there would be no >>> problem. But you probably won't need a cluster then. ;)
>>> > The strange >>> > part is that on some runs of the load test with jMeter, it will not >>> > have >>> > the issue. However, on other identical runs, it will cause the issue >>> > almost >>> > immediately.
>>> This error is a result of a bad execution context cleanup (wrong state >>> is preserved) and most likely involves cluster-wide deadlock (you should >>> not be seeing that if you use the cluster in a master-slave way). So it >>> is a race and may happen and may not.
>>> > Any help would be appreciated.
>>> You could
>>> 1) describe your test in sufficient detail so that we could reproduce >>> it in our lab, then we could fix it or confirm that it is fixed. (if >>> there is sensitive information involved, you can e-mail directly to >>> in...@codership.com)
>>> 2) compile the latest head from lp:codership-mysql and test with it and >>> Galera 2.2 (again from https://launchpad.net/galera/+download) and >>> verify that it is fixed there.
> Upon further investigation, the issue seems to be related to when we > have
> multiple web servers attempting to connect (many times per second) at > the
> same time. We are using KEMP hardware load balancers, not HAProxy, > could
> this be part of the issue?
It may help the issue to surface, but there is nothing wrong with using hardware balancers.
> When I have load tested just 1 web server which writes to just 1 DB > server,
> there is no issue.
> If I have 1 web server write to multiple DB servers, I get the error.
Yes, this is when you get cluster-wide conflicts.
> If I have 2 web servers write to 1 DB server, I get the error, > regardless
> of whether I use the KEMP balancer or not.
Are you *absolutely* sure that they are writing to 1 DB server?
> Does HAProxy somehow funnel the requests better than a hardware > balancer
> would such that it makes it easier for the cluster to keep up?
HAProxy is a dumb TCP proxy, it does not understand what's in the connections. If anything it may be far slower which may reduce the contention. But if there is a bug, you'll hit it with HAProxy sooner or later.
About the 3rd node staying online: apparently the nodes didn't leave the cluster ungracefully (in most such cases we're trying to close cluster connection before aborting). This allowed the 3rd node to continue. Details in the logs.
>> On Wednesday, October 31, 2012 1:31:56 PM UTC-4, PaulO wrote:
>>> Thanks for the insight so far. We have narrowed down the issue to a
>>> specific function in our application, so we will continue to dig >>> there.
>>> However, one other thing that we have noticed during these tests is >>> that
>>> sometimes, when 2 of the 3 nodes go down and only 1 is still up, >>> that node
>>> still has a local_state_comment of "Synced (6)" and we are still >>> able to
>>> access the data through that node. It seems that the Split brain >>> detection
>>> is not functioning correctly. Have you seen an issue like this? >>> Here is
>>> some of the output of "show global status like 'wsrep%';"
>>> Any idea why the primary node does not block access to the DB?
>>> On Wednesday, October 31, 2012 10:28:28 AM UTC-4, Alexey Yurchenko >>> wrote:
>>>> On 2012-10-31 14:37, PaulO wrote:
>>>> > Do you have any other thoughts on what the issue might be? Is it
>>>> > possible
>>>> > that the load test is just writing to the nodes too quickly?
>>>> Of course if you used a single threaded test, there would be no
>>>> problem. But you probably won't need a cluster then. ;)
>>>> > The strange
>>>> > part is that on some runs of the load test with jMeter, it will >>>> not
>>>> > have
>>>> > the issue. However, on other identical runs, it will cause the >>>> issue
>>>> > almost
>>>> > immediately.
>>>> This error is a result of a bad execution context cleanup (wrong >>>> state
>>>> is preserved) and most likely involves cluster-wide deadlock (you >>>> should
>>>> not be seeing that if you use the cluster in a master-slave way). >>>> So it
>>>> is a race and may happen and may not.
>>>> > Any help would be appreciated.
>>>> You could
>>>> 1) describe your test in sufficient detail so that we could >>>> reproduce
>>>> it in our lab, then we could fix it or confirm that it is fixed. >>>> (if
>>>> there is sensitive information involved, you can e-mail directly >>>> to
>>>> in...@codership.com)
>>>> 2) compile the latest head from lp:codership-mysql and test with >>>> it and
>>>> Galera 2.2 (again from https://launchpad.net/galera/+download) and
>>>> verify that it is fixed there.