Galera certification issue with Select For Update (expecting deadlocks, get inconsistency)?

427 views
Skip to first unread message

Brad House

unread,
Mar 18, 2016, 4:15:39 PM3/18/16
to codership
I've been doing some testing of my application against Galera (ok, really Percona-XtraDB-Cluster-server-56-5.6.28-25.14.1.el7.x86_64),
and have run across something I cannot explain, and it appears to only occur under load.

I've got a load balancer sitting in front of 3 nodes directing traffic in a round robin fashion (Linux IPVS/LVS using server load
balancing features in keepalived).  My application spawns 25 DB connections, resulting in 8-9 connections per DB node.

Essentially my application is doing something like:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT var FROM foo WHERE id = ? FOR UPDATE;
# Do some math on 'var'
UPDATE foo SET var=? WHERE id=?;
COMMIT;

"id" is the primary key in this case.

What I am observing is "N" concurrent committers get the same select result (expected with galera), do their math,
update with math result which succeeds (expected with galera), then commit and all succeed (*unexpected*).
I would have expected all but one commit to deadlock (1213) and I'd need to retry the transaction.  I do get
some occasional deadlocks reported, but not as many as I should.  This leads to inconsistent behavior since
the math performed is now off.

If I attempt to reproduce this with the mysql command line tool, I cannot, all but one case results in a deadlock as
expected.

A few other notes:
1) The math in this case may or may not result in the same value, it depends on the operation. (Does this
    matter?)
2) The number of committers ("N") here could be as high as the number of connections, in theory, but in
    general it is probably 1-3 ... but this statement during a load test is executed a few dozen times per
    *second*.
3) This works flawlessly if all traffic is directed to a single node, so it is definitely something introduced by
    using Galera.
4) The value of wsrep_sync_wait appears to have zero effect (values 0, 1, 3 tried)
5) I could attempt to change the "UPDATE" statement to include the original value of 'foo' that is expected
    to try to catch this condition, but I'm not actually sure if it would depending on how the record updates
    are transmitted.  I have not tried that, but wouldn't think it should be necessary.
6) I'm 90% sure we tested this same load scenario a year or so ago and this worked fine, could there be
    a regression in newer versions?  I haven't yet tried to roll back to something older.
7) I haven't yet tried to create a reduced test case, I wanted to first ask to see if anyone else had seen
    what I'm seeing.


And finally, I'm sure everyone wants my DB configs ...

/etc/my.cnf:
[mysqld]
datadir = /var/lib/mysql
# move tmpdir due to /tmp being a memory backed tmpfs filesystem, mysql uses this for on disk sorting 
tmpdir = /var/lib/mysql/tmp

[mysqld_safe]
pid-file = /run/mysqld/mysql.pid
syslog

!includedir /etc/my.cnf.d


/etc/my.cnf.d/base.cnf: [mysqld] bind-address = 0.0.0.0 key_buffer = 256M max_allowed_packet = 16M max_connections = 256 # Some optimizations thread_concurrency = 10 sort_buffer_size = 2M query_cache_limit = 100M query_cache_size = 256M log_bin binlog_format = ROW gtid_mode = ON log_slave_updates enforce_gtid_consistency = 1 group_concat_max_len = 102400 innodb_buffer_pool_size = 10G innodb_log_file_size = 64M innodb_file_per_table = 1 innodb_file_format = barracuda default_storage_engine = innodb # SSD Tuning innodb_flush_neighbors = 0 innodb_io_capacity = 6000

/etc/my.cnf.d/cluster.cnf: # Galera cluster [mysqld] wsrep_provider = /usr/lib64/libgalera_smm.so wsrep_sst_method = xtrabackup-v2 wsrep_sst_auth = "sstuser:s3cretPass" wsrep_cluster_name = cluster wsrep_slave_threads = 32 wsrep_max_ws_size = 2G wsrep_provider_options = "gcache.size = 5G; pc.recovery = true" wsrep_cluster_address = gcomm://10.30.30.11,10.30.30.12,10.30.30.13 wsrep_sync_wait = 0 innodb_autoinc_lock_mode = 2 innodb_locks_unsafe_for_binlog = 1 innodb_flush_log_at_trx_commit = 0 sync_binlog = 0 innodb_support_xa = 0 innodb_flush_method = ALL_O_DIRECT [sst] progress = 1 time = 1 streamfmt = xbstream

Thanks!
-Brad

alexey.y...@galeracluster.com

unread,
Mar 19, 2016, 12:37:01 PM3/19/16
to Brad House, codership
It does look like a bug - sounds like SELECT FOR UPDATE does not lock
the rows. If it is so you better report this to Percona.

BTW, why do you insist on using SERIALIZABLE isolation level (which
neither PXC nor MySQL/Galera support) and SELECT FOR UPDATE? At best it
does nothing when using several masters, at worst there may be issues -
maybe this is one of those.

Brad House

unread,
Mar 20, 2016, 3:36:05 PM3/20/16
to codership, bradley...@gmail.com
Using Serializable and SELECT FOR UPDATE is there for portability mostly, we don't know what
underlying system there might be, and for vanilla MySQL, it is necessary for this to
work reliably.  Also, if the same query is executed in parallel on the same node, and we didn't
use SELECT FOR UPDATE, we'd cause major issues since galera only protects this
cross-node.

Anyhow, most of my assumptions are based on this Galera blog posting:
http://galeracluster.com/2015/09/support-for-mysql-transaction-isolation-levels-in-galera-cluster/

There the CEO of CoderShip says to use SELECT FOR UPDATE and that combined
with the snapshot isolation provided by galera should be enough for my case.  Infact,
my case is almost identical to the example provided.  I guess the only real concern is if
for some reason setting "SERIALIZABLE" with Galera is actually *worse* than
"REPEATABLE READ", in which case I'd need to avoid SERIALIZABLE.

James Wang

unread,
Mar 21, 2016, 6:59:34 AM3/21/16
to codership, bradley...@gmail.com
mine: READ-COMMITTED

Do you use Galera in LAN environment?

On Sunday, 20 March 2016 19:36:05 UTC, Brad House wrote:
Using Serializable and SELECT FOR UPDATE is there for portability mostly, we don't know what

Brad House

unread,
Mar 21, 2016, 3:46:31 PM3/21/16
to codership, bradley...@gmail.com
Yes, it is a LAN environment, all nodes are interconnected via dual 1Gbps in this particular test.

I also neglected to mention that I actually get the server to deadlock a couple of connections, "SHOW PROCESSLIST;"
shows the connections as sleeping, but they never returned the result to the client!

I've attached a test case I wrote that reproduces this issue every time for me...

-Brad
galera_test.c
Reply all
Reply to author
Forward
0 new messages