MariaDb-Galera, inconsistent results from "insert on duplicate key update"

Nils Martin Sande

unread,

Mar 31, 2015, 5:00:57 PM3/31/15

to codersh...@googlegroups.com

Hi, I'm having a weird problem with MariaDb Galera Server 10.0.17 (on centos 7 test and RHEL6.5 production).

When writing a large list of rows to a table using an "insert on duplicate key update" statement committed data seems to be lost somewhere on the way. Basically my list may contain duplicate keys, and the update part of the statement simply checks the timestamp field of the two conflicting entries and updates fields when needed. To speed things up the inserts is done in batches of up to 10000 rows (one transaction per batch).

Explanation of the fields in the summary below:

time - seconds in took to write all records to the db.
listSize - size of the list to be written to db.
keys - number of unique keys in the list.
inserts - number of statements which resulted in an insert (returned value 1)
actualSize - the actual size of the snapshot in the database, using "select count(*) ...."
duplicates - number of statements which resulted in update or no-change (return values 0 and 2)

In short keys and inserts should be equal and inserts + duplicates should equal listSize, and actualSize should equal inserts.

Snapshot summary: time=34S, listSize=704762, keys=704696, inserts=704696, actualSize=704696, duplicates=66
Snapshot summary: time=32S, listSize=704762, keys=704696, inserts=704702, actualSize=674705, duplicates=60
Snapshot summary: time=32S, listSize=704762, keys=704696, inserts=704697, actualSize=694698, duplicates=65
Snapshot summary: time=39S, listSize=704762, keys=704696, inserts=704698, actualSize=694698, duplicates=64

In the first example all is god. In the other examples something is very wrong. I have found the following:

Sending all writes to the same node fixes the problem (MaxScale RW Split), even with many batches written in parallel.
Only writing 1 batch at a time solves the problem. Increasing the batch parallelism increases the chance of inconsistencies.
Very small batch sizes decreases the chance of errors.

The error is not consistent. One "god" setup with a particular batchsize and paralellism may succeed 9 out of 10 times, while an other setup may only succeed 1 out of 20 times. For the batches that fail, no error is returned.

All I can find of interest in the logs is dead lock issues caused by dispatching writes to multiple nodes, but this is handled by simple transaction handling, rollback and retry on dead lock looser or lock acquisition failure. Basically data which has been committed is lost while MariaDB cluster keeps running. This seems pretty fatal for a data base so I'm guessing there is something wrong at my end, but after 2 weeks I'm still in the dark. An other project using the same cluster has reported problems with committed updates which seems to go missing. Does anyone have any ideas?

Br

Nils Martin

alexey.y...@galeracluster.com

unread,

Apr 1, 2015, 6:38:39 AM4/1/15

to Nils Martin Sande, codersh...@googlegroups.com

Have you checked data consistency between the nodes?

On 2015-04-01 00:00, Nils Martin Sande wrote:
> Hi, I'm having a weird problem with MariaDb Galera Server 10.0.17 (on
> centos 7 test and RHEL6.5 production).
>
> When writing a large list of rows to a table using an "insert on
> duplicate
> key update" statement committed data seems to be lost somewhere on the
> way.
> Basically my list may contain duplicate keys, and the update part of
> the
> statement simply checks the timestamp field of the two conflicting
> entries
> and updates fields when needed. To speed things up the inserts is done
> in
> batches of up to 10000 rows (one transaction per batch).
>
> Explanation of the fields in the summary below:
>

> - time - seconds in took to write all records to the db.
> - listSize - size of the list to be written to db.
> - keys - number of unique keys in the list.
> - inserts - number of statements which resulted in an insert
> (returned
> value 1)
> - actualSize - the actual size of the snapshot in the database,
> using
> "select count(*) ...."
> - duplicates - number of statements which resulted in update or

> no-change (return values 0 and 2)
>
> In short keys and inserts should be equal and inserts + duplicates
> should
> equal listSize, and actualSize should equal inserts.
>

> 1. Snapshot summary: time=34S, listSize=704762, keys=704696,
> inserts=704696, actualSize=704696, duplicates=66
> 2. Snapshot summary: time=32S, listSize=704762, keys=704696,
> inserts=704702, actualSize=674705, duplicates=60
> 3. Snapshot summary: time=32S, listSize=704762, keys=704696,
> inserts=704697, actualSize=694698, duplicates=65
> 4. Snapshot summary: time=39S, listSize=704762, keys=704696,

> inserts=704698, actualSize=694698, duplicates=64
>
> In the first example all is god. In the other examples something
> is very wrong. I have found the following:
>
>

> 1. Sending all writes to the same node fixes the problem (MaxScale

> RW
> Split), even with many batches written in parallel.

> 2. Only writing 1 batch at a time solves the problem. Increasing the

> batch parallelism increases the chance of inconsistencies.

> 3. Very small batch sizes decreases the chance of errors.

Nils Martin Sande

unread,

Apr 1, 2015, 7:13:22 AM4/1/15

to codersh...@googlegroups.com, nil...@gmail.com

Yes, I also just double checked to be sure. I stopped all nodes, then bootstraped node1. I then did an "rm -Rf *" in /var/lib/mysql/ on node2 before starting it. Ran one instance of the test using these two nodes. The test failed with:

listSize=704769, keys=704696, inserts=704699, actualSize=684699, duplicates=63

I then ran select to file of the table in question on both nodes:

select * into outfile '/tmp/galera1.txt' fields terminated by ',' l
ines terminated by '' from SnapshotData order by keyField;

According to both diff and md5summ the two nodes had identical versions of the table:

[nms@test ~]$ md5sum galera0.txt galera1.txt
81325e82299c4d519e335bfe16eed801 galera0.txt
81325e82299c4d519e335bfe16eed801 galera1.txt
[nms@test ~]$ diff galera0.txt galera1.txt
[nms@test ~]$

Nils Martin Sande

unread,

Apr 7, 2015, 7:50:41 AM4/7/15

to codersh...@googlegroups.com, nil...@gmail.com

This problem seems to be caused by the transaction manager used (Jboss transaction manager). Switching to Spring DatasourceTransactionManager appears to have solved the problem. I need to to do more testing, but I think it's safe to say that this is a client side problem (not caused by mariadb or the load balancer). I stumbled across the solution while trying(and failing) to recreate the problem in a simple standalone Java application using Springs transaction management.

Reply all

Reply to author

Forward