Re: WSREP FSM Error when load testing PHP application

165 views
Skip to first unread message

PaulO

unread,
Oct 29, 2012, 3:42:58 PM10/29/12
to percona-d...@googlegroups.com
Furthermore, I have the following configurations on my 3 nodes:

Node 1: 
wsrep_urls=gcomm://10.10.11.111:4567,gcomm://10.10.11.112:4567,gcomm://

Node 2: 
wsrep_urls=gcomm://10.10.11.110:4567,gcomm://10.10.11.112:4567

Node 3:
wsrep_urls=gcomm://10.10.11.110:4567,gcomm://10.10.11.111:4567


I am using Percona XtraDB Cluster 5.5.24-55.



On Monday, October 29, 2012 3:15:26 PM UTC-4, PaulO wrote:
Hello, I have a 3 node cluster accessed by a PHP application. All tables are Innodb. When load testing the application with apache-jmeter, the primary cluster node fails with the following error:

[ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
Aborted
121029 14:53:02 mysqld_safe Number of processes running now: 0
121029 14:53:02 mysqld_safe mysqld restarted
121029 14:53:02 [Note] Flashcache bypass: disabled
121029 14:53:02 [Note] Flashcache setup error is : ioctl failed

121029 14:53:02 [Note] WSREP: Read nil XID from storage engines, skipping position init
121029 14:53:02 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
121029 14:53:02 [Note] WSREP: wsrep_load(): Galera 2.1dev(r112) by Codership Oy <in...@codership.com> loaded succesfully.
121029 14:53:02 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
121029 14:53:02 [Note] WSREP: Reusing existing '/data/mysql1//galera.cache'.
121029 14:53:02 [Note] WSREP: Passing config to GCS: base_host = 11.11.11.110; gcache.dir = /data/mysql1/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql1//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121029 14:53:02 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121029 14:53:02 [Note] WSREP: wsrep_sst_grab()
121029 14:53:02 [Note] WSREP: Start replication
121029 14:53:02 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121029 14:53:02 [Note] WSREP: protonet asio version 0
121029 14:53:02 [Note] WSREP: backend: asio
121029 14:53:02 [Note] WSREP: GMCast version 0
121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
121029 14:53:02 [Note] WSREP: EVS version 0
121029 14:53:02 [Note] WSREP: PC version 0
121029 14:53:02 [Note] WSREP: gcomm: connecting to group 'test_cluster2', peer '11.11.11.111:4567'
121029 14:53:02 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://11.11.11.112:4567
121029 14:53:03 [Note] WSREP: (ddd40d58-21f9-11e2-0800-19fa5c7b5506, 'tcp://0.0.0.0:4567') turning message relay requesting off
121029 14:53:07 [Note] WSREP: declaring 02b0f513-21dc-11e2-0800-7a0a8588f798 stable
121029 14:53:07 [Note] WSREP: declaring ebe80665-21f4-11e2-0800-bedffce635c8 stable
121029 14:53:07 [Note] WSREP: view(view_id(PRIM,02b0f513-21dc-11e2-0800-7a0a8588f798,11) memb {
        02b0f513-21dc-11e2-0800-7a0a8588f798,
        ddd40d58-21f9-11e2-0800-19fa5c7b5506,
        ebe80665-21f4-11e2-0800-bedffce635c8,
} joined {
} left {
} partitioned {
        92cdb47c-21f7-11e2-0800-262cdab68ee6,
})
121029 14:53:08 [Note] WSREP: gcomm: connected
121029 14:53:08 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121029 14:53:08 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121029 14:53:08 [Note] WSREP: Opened channel 'test_cluster2'
121029 14:53:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121029 14:53:08 [Note] WSREP: Waiting for SST to complete.
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: sent state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 0 (node112)
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 2 (node111)
121029 14:53:08 [Note] WSREP: STATE EXCHANGE: got state msg: e0eef247-21f9-11e2-0800-c2c66b6d352c from 1 (node110)
121029 14:53:08 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 9,
        members    = 2/3 (joined/total),
        act_id     = 8642,
        last_appl. = -1,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 97be425f-21db-11e2-0800-d98563b1f16e
121029 14:53:08 [Note] WSREP: Flow-control interval: [14, 28]
121029 14:53:08 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8642)
121029 14:53:08 [Note] WSREP: State transfer required:
        Group state: 97be425f-21db-11e2-0800-d98563b1f16e:8642
        Local state: 00000000-0000-0000-0000-000000000000:-1
121029 14:53:08 [Note] WSREP: New cluster view: global state: 97be425f-21db-11e2-0800-d98563b1f16e:8642, view# 10: Primary, number of nodes: 3, my index: 1, protocol version 2
121029 14:53:08 [Warning] WSREP: Gap in state sequence. Need state transfer.
121029 14:53:10 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '11.11.11.110' 'xtrabackup:1234567' '/data/mysql1/' '/etc/mysql/my.cnf' '11904' 2>sst.err'
121029 14:53:10 [Note] WSREP: Prepared SST request: xtrabackup|11.11.11.110:4444/xtrabackup_sst
121029 14:53:10 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121029 14:53:10 [Note] WSREP: Assign initial position for certification: 8642, protocol version: 2
121029 14:53:10 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (97be425f-21db-11e2-0800-d98563b1f16e): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
121029 14:53:10 [Note] WSREP: Node 1 (node110) requested state transfer from '*any*'. Selected 0 (node112)(SYNCED) as donor.
121029 14:53:10 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8682)
121029 14:53:10 [Note] WSREP: Requesting state transfer: success, donor: 0
121029 14:58:25 [Note] WSREP: 0 (node112): State transfer to 1 (node110) complete.
121029 14:58:25 [Note] WSREP: Member 0 (node112) synced with group.



At other times, the same load test causes this error:

[ERROR] WSREP: FSM: no such a transition REPLICATING -> REPLICATING

Can any insight be provided for this error? After it occurs, the cluster automatically fixes itself, but what is causing the issue in the first place? 

Alex Yurchenko

unread,
Oct 29, 2012, 4:54:37 PM10/29/12
to percona-d...@googlegroups.com
You must be hitting a known and fixed bug. At least some bugs with the
same symptoms have been identified and fixed some time ago.
Unfortunately not all of these fixes are in PXC yet. Still, try the
latest PXC release it may fix your particular problem.
--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

PaulO

unread,
Oct 30, 2012, 4:39:51 PM10/30/12
to percona-d...@googlegroups.com
After upgrading each of the 3 nodes to the newest version, I am still getting this error:

121030 16:34:17 [ERROR] WSREP: FSM: no such a transition REPLICATING -> ROLLED_BACK
Aborted
121030 16:34:17 mysqld_safe Number of processes running now: 0
121030 16:34:17 mysqld_safe mysqld restarted
121030 16:34:17 [Note] Flashcache bypass: disabled
121030 16:34:17 [Note] Flashcache setup error is : ioctl failed

121030 16:34:17 [Note] WSREP: Read nil XID from storage engines, skipping position init
121030 16:34:17 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
121030 16:34:17 [Note] WSREP: wsrep_load(): Galera 2.2(r115) by Codership Oy <in...@codership.com> loaded succesfully.
121030 16:34:17 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
121030 16:34:17 [Note] WSREP: Reusing existing '/data/mysql1//galera.cache'.
121030 16:34:17 [Note] WSREP: Passing config to GCS: base_host = 10.10.11.110; gcache.dir = /data/mysql1/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql1//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; pc.ignore_sb = false; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121030 16:34:17 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121030 16:34:17 [Note] WSREP: wsrep_sst_grab()
121030 16:34:17 [Note] WSREP: Start replication
121030 16:34:17 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121030 16:34:17 [Note] WSREP: protonet asio version 0
121030 16:34:17 [Note] WSREP: backend: asio
121030 16:34:17 [Note] WSREP: GMCast version 0
121030 16:34:17 [Note] WSREP: (2d2525aa-22d1-11e2-0800-9da5e98fae00, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
121030 16:34:17 [Note] WSREP: (2d2525aa-22d1-11e2-0800-9da5e98fae00, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
121030 16:34:17 [Note] WSREP: EVS version 0
121030 16:34:17 [Note] WSREP: PC version 0
121030 16:34:17 [Note] WSREP: gcomm: connecting to group 'test_cluster2', peer '10.10.11.111:4567'
121030 16:34:17 [Note] WSREP: (2d2525aa-22d1-11e2-0800-9da5e98fae00, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.11.112:4567
121030 16:34:18 [Note] WSREP: (2d2525aa-22d1-11e2-0800-9da5e98fae00, 'tcp://0.0.0.0:4567') turning message relay requesting off
121030 16:34:22 [Note] WSREP: declaring 3f6142cf-22cf-11e2-0800-b481e890b368 stable
121030 16:34:22 [Note] WSREP: declaring 526375df-22cf-11e2-0800-3d47d06db344 stable
121030 16:34:22 [Note] WSREP: view(view_id(PRIM,2d2525aa-22d1-11e2-0800-9da5e98fae00,4) memb {
        2d2525aa-22d1-11e2-0800-9da5e98fae00,
        3f6142cf-22cf-11e2-0800-b481e890b368,
        526375df-22cf-11e2-0800-3d47d06db344,
} joined {
} left {
} partitioned {
        5970a776-22cf-11e2-0800-64b44ccf18ca,
})
121030 16:34:23 [Note] WSREP: gcomm: connected
121030 16:34:23 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121030 16:34:23 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121030 16:34:23 [Note] WSREP: Opened channel 'test_cluster2'
121030 16:34:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
121030 16:34:23 [Note] WSREP: Waiting for SST to complete.
121030 16:34:23 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 306ced91-22d1-11e2-0800-3368c23d357e
121030 16:34:23 [Note] WSREP: STATE EXCHANGE: sent state msg: 306ced91-22d1-11e2-0800-3368c23d357e
121030 16:34:23 [Note] WSREP: STATE EXCHANGE: got state msg: 306ced91-22d1-11e2-0800-3368c23d357e from 0 (node110)
121030 16:34:23 [Note] WSREP: STATE EXCHANGE: got state msg: 306ced91-22d1-11e2-0800-3368c23d357e from 1 (node112)
121030 16:34:23 [Note] WSREP: STATE EXCHANGE: got state msg: 306ced91-22d1-11e2-0800-3368c23d357e from 2 (node111)
121030 16:34:23 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 3,
        members    = 2/3 (joined/total),
        act_id     = 2016,
        last_appl. = -1,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 721fa11a-22b7-11e2-0800-28d019b4cb3f
121030 16:34:23 [Note] WSREP: Flow-control interval: [14, 28]
121030 16:34:23 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2016)
121030 16:34:23 [Note] WSREP: State transfer required:
        Group state: 721fa11a-22b7-11e2-0800-28d019b4cb3f:2016
        Local state: 00000000-0000-0000-0000-000000000000:-1
121030 16:34:23 [Note] WSREP: New cluster view: global state: 721fa11a-22b7-11e2-0800-28d019b4cb3f:2016, view# 4: Primary, number of nodes: 3, my index: 0, protocol version 2
121030 16:34:23 [Warning] WSREP: Gap in state sequence. Need state transfer.
121030 16:34:23 [Note] WSREP: (2d2525aa-22d1-11e2-0800-9da5e98fae00, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.11.111:4567
121030 16:34:23 [Note] WSREP: remote endpoint tcp://10.10.11.111:4567 changed identity 526375df-22cf-11e2-0800-3d47d06db344 -> 30f824fd-22d1-11e2-0800-3971ee83c1f9
121030 16:34:23 [Note] WSREP: (2d2525aa-22d1-11e2-0800-9da5e98fae00, 'tcp://0.0.0.0:4567') turning message relay requesting off
121030 16:34:25 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '10.10.11.110:4444' 'xtrabackup:123456789' '/data/mysql1/' '/etc/mysql/my.cnf' '16821' 2>sst.err'
121030 16:34:25 [Note] WSREP: Prepared SST request: xtrabackup|10.10.11.110:4444/xtrabackup_sst
121030 16:34:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121030 16:34:25 [Note] WSREP: Assign initial position for certification: 2016, protocol version: 2
121030 16:34:25 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (721fa11a-22b7-11e2-0800-28d019b4cb3f): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.
121030 16:34:28 [Note] WSREP: evs::proto(2d2525aa-22d1-11e2-0800-9da5e98fae00, GATHER, view_id(REG,2d2525aa-22d1-11e2-0800-9da5e98fae00,4)) suspecting node: 526375df-22cf-11e2-0800-3d47d06db344
121030 16:34:29 [Note] WSREP: declaring 30f824fd-22d1-11e2-0800-3971ee83c1f9 stable
121030 16:34:29 [Note] WSREP: declaring 3f6142cf-22cf-11e2-0800-b481e890b368 stable
121030 16:34:29 [Note] WSREP: view(view_id(PRIM,2d2525aa-22d1-11e2-0800-9da5e98fae00,5) memb {
        2d2525aa-22d1-11e2-0800-9da5e98fae00,
        30f824fd-22d1-11e2-0800-3971ee83c1f9,
        3f6142cf-22cf-11e2-0800-b481e890b368,
} joined {
} left {
} partitioned {
        526375df-22cf-11e2-0800-3d47d06db344,
})
121030 16:34:29 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
121030 16:34:29 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 34117026-22d1-11e2-0800-b17b27d9cef1
121030 16:34:29 [Note] WSREP: STATE EXCHANGE: sent state msg: 34117026-22d1-11e2-0800-b17b27d9cef1
121030 16:34:29 [Note] WSREP: STATE EXCHANGE: got state msg: 34117026-22d1-11e2-0800-b17b27d9cef1 from 0 (node110)
121030 16:34:29 [Note] WSREP: STATE EXCHANGE: got state msg: 34117026-22d1-11e2-0800-b17b27d9cef1 from 2 (node112)
121030 16:34:29 [Note] WSREP: STATE EXCHANGE: got state msg: 34117026-22d1-11e2-0800-b17b27d9cef1 from 1 (node111)
121030 16:34:29 [Note] WSREP: Quorum results:
        version    = 2,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 1/3 (joined/total),
        act_id     = 2049,
        last_appl. = 0,
        protocols  = 0/4/2 (gcs/repl/appl),
        group UUID = 721fa11a-22b7-11e2-0800-28d019b4cb3f
121030 16:34:29 [Note] WSREP: Flow-control interval: [14, 28]
121030 16:34:29 [Note] WSREP: Shifting PRIMARY -> PRIMARY (TO: 2049)
121030 16:34:29 [Note] WSREP: Node 0 (node110) requested state transfer from '*any*'. Selected 2 (node112)(SYNCED) as donor.
121030 16:34:29 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2049)
121030 16:34:29 [Note] WSREP: Requesting state transfer: success, donor: 2

PaulO

unread,
Oct 31, 2012, 8:37:26 AM10/31/12
to percona-d...@googlegroups.com
Do you have any other thoughts on what the issue might be? Is it possible that the load test is just writing to the nodes too quickly? The strange part is that on some runs of the load test with jMeter, it will not have the issue. However, on other identical runs, it will cause the issue almost immediately. 

Any help would be appreciated.

Alex Yurchenko

unread,
Oct 31, 2012, 10:28:26 AM10/31/12
to percona-d...@googlegroups.com
On 2012-10-31 14:37, PaulO wrote:
> Do you have any other thoughts on what the issue might be? Is it
> possible
> that the load test is just writing to the nodes too quickly?

Of course if you used a single threaded test, there would be no
problem. But you probably won't need a cluster then. ;)

> The strange
> part is that on some runs of the load test with jMeter, it will not
> have
> the issue. However, on other identical runs, it will cause the issue
> almost
> immediately.

This error is a result of a bad execution context cleanup (wrong state
is preserved) and most likely involves cluster-wide deadlock (you should
not be seeing that if you use the cluster in a master-slave way). So it
is a race and may happen and may not.

> Any help would be appreciated.

You could

1) describe your test in sufficient detail so that we could reproduce
it in our lab, then we could fix it or confirm that it is fixed. (if
there is sensitive information involved, you can e-mail directly to
in...@codership.com)

2) compile the latest head from lp:codership-mysql and test with it and
Galera 2.2 (again from https://launchpad.net/galera/+download) and
verify that it is fixed there.

Thanks,
Alex

PaulO

unread,
Oct 31, 2012, 1:31:56 PM10/31/12
to percona-d...@googlegroups.com
Thanks for the insight so far. We have narrowed down the issue to a specific function in our application, so we will continue to dig there. However, one other thing that we have noticed during these tests is that sometimes, when 2 of the 3 nodes go down and only 1 is still up, that node still has a local_state_comment of "Synced (6)" and we are still able to access the data through that node. It seems that the Split brain detection is not functioning correctly. Have you seen an issue like this? Here is some of the output of "show global status like 'wsrep%';"

| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced (6)                           |
| wsrep_cert_index_size      | 0                                    |
| wsrep_causal_reads         | 0                                    |
| wsrep_cluster_conf_id      | 9                                    |
| wsrep_cluster_size         | 1                                    |
| wsrep_cluster_state_uuid   | 721fa11a-22b7-11e2-0800-28d019b4cb3f |
| wsrep_cluster_status       | Primary                              |
| wsrep_connected            | ON                                   |
| wsrep_local_index          | 0                                    |
| wsrep_provider_name        | Galera                               |
| wsrep_provider_vendor      | Codership Oy <in...@codership.com>    |
| wsrep_provider_version     | 2.2(r115)                            |
| wsrep_ready                | ON                                   |


Any idea why the primary node does not block access to the DB? 

PaulO

unread,
Oct 31, 2012, 1:34:04 PM10/31/12
to percona-d...@googlegroups.com
Here is my "wsrep_provider_options" on the primary node with node 2 and node 3 down.

| wsrep_provider_options | base_host = 11.11.11.110; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /data/mysql1/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql1//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 11.11.11.110; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3 |


I confirmed that pc.ignore_sb=false;

PaulO

unread,
Oct 31, 2012, 3:06:11 PM10/31/12
to percona-d...@googlegroups.com
Upon further investigation, the issue seems to be related to when we have multiple web servers attempting to connect (many times per second) at the same time. We are using KEMP hardware load balancers, not HAProxy, could this be part of the issue? 

When I have load tested just 1 web server which writes to just 1 DB server, there is no issue. 
If I have 1 web server write to multiple DB servers, I get the error.
If I have 2 web servers write to 1 DB server, I get the error, regardless of whether I use the KEMP balancer or not.

Does HAProxy somehow funnel the requests better than a hardware balancer would such that it makes it easier for the cluster to keep up?

Alex Yurchenko

unread,
Oct 31, 2012, 6:15:40 PM10/31/12
to percona-d...@googlegroups.com
On 2012-10-31 21:06, PaulO wrote:
> Upon further investigation, the issue seems to be related to when we
> have
> multiple web servers attempting to connect (many times per second) at
> the
> same time. We are using KEMP hardware load balancers, not HAProxy,
> could
> this be part of the issue?

It may help the issue to surface, but there is nothing wrong with using
hardware balancers.

> When I have load tested just 1 web server which writes to just 1 DB
> server,
> there is no issue.
> If I have 1 web server write to multiple DB servers, I get the error.

Yes, this is when you get cluster-wide conflicts.

> If I have 2 web servers write to 1 DB server, I get the error,
> regardless
> of whether I use the KEMP balancer or not.

Are you *absolutely* sure that they are writing to 1 DB server?

> Does HAProxy somehow funnel the requests better than a hardware
> balancer
> would such that it makes it easier for the cluster to keep up?

HAProxy is a dumb TCP proxy, it does not understand what's in the
connections. If anything it may be far slower which may reduce the
contention. But if there is a bug, you'll hit it with HAProxy sooner or
later.

About the 3rd node staying online: apparently the nodes didn't leave
the cluster ungracefully (in most such cases we're trying to close
cluster connection before aborting). This allowed the 3rd node to
continue. Details in the logs.

Regards,
Alex
Reply all
Reply to author
Forward
0 new messages