
UPSERT with non-unique index


Stanimir Stamenkov

Aug 2, 2020, 12:55:44 PM
[I've posted this on Stack Overflow:
https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

I need to implement a concurrency-safe UPSERT using a non-unique key,
while avoiding unnecessary auto-increment of the ID.

Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so I'm
performing:

INSERT INTO table (col1, col2, col3, col4, col5)
SELECT 1, 2, 'value3', 'value4', 'value5'
WHERE NOT EXISTS (SELECT 1
                  FROM table
                  WHERE col3 = 'value3'
                    AND col4 = 'value4'
                    AND col5 = 'value5')

then, if no row was inserted, I'm performing:

UPDATE table
SET col1 = col1 + 1,
    col2 = GREATEST(col2, 2)
WHERE col3 = 'value3'
  AND col4 = 'value4'
  AND col5 = 'value5'

There's an index:

CREATE INDEX ON table (col3, col4, col5)

It is non-unique because there is legacy data that doesn't allow me to
declare it unique. Newer records, however, should not have duplicate
(col3, col4, col5) rows.

Unsurprisingly, using the given INSERT statement I get mixed results
when executing it concurrently from two sessions. I can see the second
session block until the first one commits its transaction, but the
second transaction is then sometimes still able to insert a duplicate
(col3, col4, col5) row (and sometimes it avoids the duplicate as
expected).

I'm currently performing a manual uniqueness check after the insert:

SELECT COUNT(1)
FROM table
WHERE col3 = 'value3'
  AND col4 = 'value4'
  AND col5 = 'value5'

but I've also tried:

INSERT INTO table (col1, col2, col3, col4, col5)
SELECT 1, 2, 'value3', 'value4', 'value5'
WHERE NOT EXISTS (SELECT 1
                  FROM table
                  WHERE col3 = 'value3'
                    AND col4 = 'value4'
                    AND col5 = 'value5'
                  FOR UPDATE)

which appears to work: with the same test cases I otherwise always end
up with a duplicate (col3, col4, col5) row. Is the given FOR UPDATE
usage reliable for ensuring no duplicate (col3, col4, col5) row will be
inserted?

I'm using READ-COMMITTED transaction isolation.

[1] https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

--
Stanimir

J.O. Aho

Aug 2, 2020, 2:15:06 PM
Maybe the blocking is more an artifact of the way you are testing things
than what happens in reality.

I would suggest you use

LOCK TABLES table WRITE;

<do what you did before>

UNLOCK TABLES;



https://www.mysqltutorial.org/mysql-table-locking/
https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

--

//Aho

Axel Schwenke

Aug 2, 2020, 4:39:06 PM
On 02.08.2020 18:55, Stanimir Stamenkov wrote:
>
> I need to implement concurrent-safe UPSERT
...
> Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me
...
> There's an index:
>
>     CREATE INDEX ON table (col3, col4, col5)
>
> It is non-unique as there are legacy data that does not allow me to declare
> it unique.

That is the point where I would solve it.

1. add a new column `is_legacy_data`, e.g. TINYINT DEFAULT 0, that will be 0
for all new rows

2. for legacy rows where (col3, col4, col5) is not unique, set the new
column to a value that makes (col3, col4, col5, `is_legacy_data`) unique

3. add the unique index on (col3, col4, col5, `is_legacy_data`)

4. be happy with INSERT ... ON DUPLICATE KEY ...
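
A sketch of those steps against the OP's table (the index name, flag
values, and backfill strategy are assumptions, not a worked migration):

```sql
-- 1. Flag column, 0 for every row inserted from now on
ALTER TABLE table ADD COLUMN is_legacy_data TINYINT NOT NULL DEFAULT 0;

-- 2. Backfill: give the legacy duplicates distinct non-zero values,
--    e.g. number the copies of each (col3, col4, col5) 1, 2, 3, ...
--    (how exactly depends on the data, so it is left out here)

-- 3. Unique index over the widened key
CREATE UNIQUE INDEX ux_col345_legacy ON table (col3, col4, col5, is_legacy_data);

-- 4. A plain upsert; it can only collide with non-legacy rows,
--    because every new row has is_legacy_data = 0
INSERT INTO table (col1, col2, col3, col4, col5)
VALUES (1, 2, 'value3', 'value4', 'value5')
ON DUPLICATE KEY UPDATE
    col1 = col1 + 1,
    col2 = GREATEST(col2, VALUES(col2));
```

Note the OP's other objection still applies: insert attempts that turn
into updates may still consume auto-increment values.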

The Natural Philosopher

Aug 2, 2020, 5:28:41 PM
I don't know how SQL manages it, but if you have two asynchronous
instances of read-modify-write, where what is written depends on what
was read, then you MUST make the read-modify-write cycle ATOMIC, that
is, uninterruptible by another process doing the same.

You need locking, with the attendant danger of a process crashing while
its locks are still set...

--
The biggest threat to humanity comes from socialism, which has utterly
diverted our attention away from what really matters to our existential
survival, to indulging in navel gazing and faux moral investigations
into what the world ought to be, whilst we fail utterly to deal with
what it actually is.

Lew Pitcher

Aug 2, 2020, 5:46:42 PM
I'd roll the whole thing in a transaction

START TRANSACTION;

-- your sql goes here

COMMIT;
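
A transaction by itself only groups the statements; combined with the
locking read the OP already tried, the whole cycle might look like this
sketch (with one caveat noted in the comments):

```sql
START TRANSACTION;

-- Locking read on the key; blocks other sessions touching the same rows
SELECT COUNT(*) INTO @n
FROM table
WHERE col3 = 'value3' AND col4 = 'value4' AND col5 = 'value5'
FOR UPDATE;

-- Caveat: under READ COMMITTED InnoDB takes no gap locks, so if no row
-- matches yet, nothing is locked and two sessions may still both insert.

-- application logic: INSERT when @n = 0, otherwise the UPDATE, then:
COMMIT;
```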


--
Lew Pitcher
"In Skills, We Trust"

The Natural Philosopher

Aug 3, 2020, 5:05:08 AM
I don't use SQL often enough to remember that it has transactions these
days :-(

Note that you appear to need InnoDB or another advanced engine, rather
than MyISAM, if you want to use transactions


--
"In our post-modern world, climate science is not powerful because it is
true: it is true because it is powerful."

Lucas Bergkamp

J.O. Aho

Aug 3, 2020, 5:38:17 AM
Yeah, it was quite recently that they introduced XA transactions in MySQL: 2005.


> Note that you appear to need to be using innoDB or another advanced
> engine and not MyISAM if you want to use transactions

Yes, you need a storage engine like InnoDB or XtraDB, which both support
XA transactions.

--

//Aho

The Natural Philosopher

Aug 3, 2020, 11:30:29 AM
On 03/08/2020 10:38, J.O. Aho wrote:
> On 03/08/2020 11.05, The Natural Philosopher wrote:
>> On 02/08/2020 22:46, Lew Pitcher wrote:
>
>>> I'd roll the whole thing in a transaction
>>>
>>> START TRANSACTION;
>>>
>>> -- your sql goes here
>>>
>>> COMMIT;
>>>
>>>
>> I don't use SQL often enough to remember that it has transactions
>> these days :-(
>
> Yeah, it's quite recently they introduced xa transactions in MySQL, 2005.
>
At my time of life that seems like yesterday... That was about when I
was needing to use them too. I remember looking at them and deciding
they weren't necessary for that application. And it was all legacy ISAM
files as well.
>
>> Note that you appear to need to be using innoDB or another advanced
>> engine and not MyISAM if you want to use transactions
>
> Yes, you need a db-engine like InnoDB or XtraDB which both support xa
> transactions.
>


--
Canada is all right really, though not for the whole weekend.

"Saki"

Stanimir Stamenkov

Aug 3, 2020, 1:29:28 PM
Sun, 2 Aug 2020 22:39:04 +0200, /Axel Schwenke/:
The current setup is similar: col5 is a new column with a default value
for applications that don't know about it. I can't think of an easy way
to make the index unique while still recognizing data from older
applications this way. There will be a transitional period where old-
and new-version applications will be pumping data into the table.

In the end, I'll normalize the data after the application upgrade to
ensure all (col3, col4, col5) are unique, and will finally make the
index unique; but I wanted some insurance that no unnecessary duplicates
appear in the meantime.

> 4. be happy with INSERT ... ON DUPLICATE KEY ...

Too many insert attempts end up as updates for me to waste sequence
keys on them.

--
Stanimir

Stanimir Stamenkov

Aug 3, 2020, 1:45:04 PM
Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

> I would suggest you use
>
> LOCK TABLES table WRITE;
>
> <do what you did before>
>
> UNLOCK TABLES;
>
>
> LOCK TABLES table WRITE;
>
> UNLOCK TABLES;
>
> https://www.mysqltutorial.org/mysql-table-locking/
> https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

Wouldn't this basically block concurrent inserts/updates? I don't want
to block everyone on the same table who is trying to insert/update
different sets of data.

--
Stanimir

The Natural Philosopher

Aug 3, 2020, 2:09:47 PM
It won't be for long...


--
“People believe certain stories because everyone important tells them,
and people tell those stories because everyone important believes them.
Indeed, when a conventional wisdom is at its fullest strength, one’s
agreement with that conventional wisdom becomes almost a litmus test of
one’s suitability to be taken seriously.”

Paul Krugman

J.O. Aho

Aug 3, 2020, 3:55:12 PM
Yes, it will; the good (or sometimes bad) thing is that the others will
wait for the lock to be released and then do their task.

Unless you do a lot in the transaction, the time the table stays locked
shouldn't be long. If you do a lot, then I would recommend breaking
things out, so that the only work done under the lock is the check for
existing values and, based on that, the update or insert. It could be
good to make an insert stored procedure that takes care of everything
(begin transaction (if InnoDB), lock table, check if the key already
exists and do an update, else an insert, unlock table, commit
transaction (if InnoDB)).
See https://dev.mysql.com/doc/refman/5.7/en/create-procedure.html
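
One wrinkle with this suggestion: MySQL does not permit LOCK TABLES /
UNLOCK TABLES inside stored programs, so a procedure along these lines
would have to fall back on a transaction plus a locking read. A sketch,
with the procedure name and parameter types assumed:

```sql
DELIMITER //

CREATE PROCEDURE upsert_row(IN p3 VARCHAR(64), IN p4 VARCHAR(64), IN p5 VARCHAR(64))
BEGIN
  DECLARE n INT;

  START TRANSACTION;

  -- LOCK TABLES is not allowed in stored programs, so lock the
  -- matching rows instead (with the FOR UPDATE caveats discussed above)
  SELECT COUNT(*) INTO n
  FROM table
  WHERE col3 = p3 AND col4 = p4 AND col5 = p5
  FOR UPDATE;

  IF n = 0 THEN
    INSERT INTO table (col1, col2, col3, col4, col5)
    VALUES (1, 2, p3, p4, p5);
  ELSE
    UPDATE table
    SET col1 = col1 + 1, col2 = GREATEST(col2, 2)
    WHERE col3 = p3 AND col4 = p4 AND col5 = p5;
  END IF;

  COMMIT;
END //

DELIMITER ;
```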

Without the begin transaction/commit transaction there's a risk that the
table never gets unlocked (which could happen if the locking thread dies
unexpectedly), so everyone else waiting for access to the table would
keep waiting forever.

Another alternative is to move the old data, which can have more than
one row per key, into a history table, and make the key a primary key in
the current table. This will complicate things when you need to fetch
data; you could of course make a view that selects from both tables with
a UNION, which makes it easier when you want to join things, but keep in
mind it may slow things down. Alternatively, have a switch telling you
whether you need current or historical data: if you need the historical
data, use the history table, otherwise the current table.

As I don't know your needs, I can't give you the best option, just some
options you can look at before deciding yourself which works for you.
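
The history-table split plus UNION view could be sketched like this
(table and view names are made up for illustration):

```sql
-- Legacy rows, possibly with duplicated keys, move to their own table
CREATE TABLE table_history LIKE table;
-- (copy the legacy rows over with INSERT ... SELECT, then delete them
--  from the current table)

-- The current table then gets the strict key
ALTER TABLE table ADD UNIQUE KEY ux_col345 (col3, col4, col5);

-- Convenience view for readers that need both old and new data
CREATE VIEW table_all AS
SELECT * FROM table
UNION ALL
SELECT * FROM table_history;
```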


--

//Aho
