deadlock on INSERT IGNORE ... SELECT ... WHERE NOT EXISTS ...

jimr

unread,

May 6, 2011, 6:02:16 PM5/6/11

to

Hi,

I'm hoping I can get some pointers to documentation that I should
review to understand the behavior of some SQL under MySQL.

I've got an InnoDB table 'resource' where I can't build a unique index
on a column 'path' due to size limitations:

CREATE TABLE resource (
id BIGINT NOT NULL,
path VARCHAR(1024) NOT NULL,
PRIMARY KEY (id),
INDEX path (path(128))
) ENGINE = InnoDB CHARACTER SET = utf8;

I was looking at using this prepared statment SQL to insert distinct
paths into the table:

INSERT IGNORE INTO resource (id, path)
SELECT COALESCE(MAX(id)+1, 0), ?
FROM resource
WHERE NOT EXISTS (SELECT id FROM resource WHERE path = ? LIMIT 1)

while this appears to work, I'm not at all sure if it's a bad way to
try and accomplish what I'm after. I was reluctant to use a separate
counter table (so I used COALESCE(MAX(id)+1, 0)), and thought the
subselect would do the right thing regarding duplicate paths.

I can run this serially over my test data and it does the right thing.
When I run it under heavy concurrent row (testing the unlikely
scenerio of many threads calling the statement w/ the same path
value), I occasionally see MySQL telling me the query deadlocked and
that I should try it again.

Coding up retry logic I do find that I consistently get the expected
results, but I am wondering: (a) is there a better "form" of this
statement, (b) why does it deadlock?

Given that this is the *only* sql executing on the database, I thought
it'd never be able to deadlock.

Thank you,

Jim

Luuk

unread,

May 7, 2011, 4:49:43 AM5/7/11

to

On 07-05-2011 00:02, jimr wrote:
> Hi,
>
> I'm hoping I can get some pointers to documentation that I should
> review to understand the behavior of some SQL under MySQL.
>
> I've got an InnoDB table 'resource' where I can't build a unique index
> on a column 'path' due to size limitations:
>
> CREATE TABLE resource (
> id BIGINT NOT NULL,
> path VARCHAR(1024) NOT NULL,
> PRIMARY KEY (id),
> INDEX path (path(128))
> ) ENGINE = InnoDB CHARACTER SET = utf8;
>
> I was looking at using this prepared statment SQL to insert distinct
> paths into the table:
>
> INSERT IGNORE INTO resource (id, path)
> SELECT COALESCE(MAX(id)+1, 0), ?
> FROM resource
> WHERE NOT EXISTS (SELECT id FROM resource WHERE path = ? LIMIT 1)
>

> while this appears to work, I'm not at all sure if it's a bad way to
> try and accomplish what I'm after. I was reluctant to use a separate
> counter table (so I used COALESCE(MAX(id)+1, 0)), and thought the
> subselect would do the right thing regarding duplicate paths.

I dont think you need the COALESCE here

because when you compare:
INSERT IGNORE INTO resource (id,path) values (0,'test');
and:
INSERT IGNORE INTO resource (id,path) values (NULL,'test');

The second on is better because the first one will work if you do not
have a record where id=0

>
> I can run this serially over my test data and it does the right thing.
> When I run it under heavy concurrent row (testing the unlikely
> scenerio of many threads calling the statement w/ the same path
> value), I occasionally see MySQL telling me the query deadlocked and
> that I should try it again.
>
> Coding up retry logic I do find that I consistently get the expected
> results, but I am wondering: (a) is there a better "form" of this
> statement,

There must be, but i would start using another language like PHP to
check if the path is already in the table before trying to insert it...

>(b) why does it deadlock?

Any errors in the error.log? Or, to ask the question differenlty how did
you determine there is a deadlock?

>
> Given that this is the *only* sql executing on the database, I thought
> it'd never be able to deadlock.
>
>
> Thank you,
>
> Jim

--
Luuk

jimr

unread,

May 7, 2011, 10:11:18 AM5/7/11

to

On Saturday, May 7, 2011 1:49:43 AM UTC-7, Luuk wrote:
>
> On 07-05-2011 00:02, jimr wrote:
> >
> > while this appears to work, I'm not at all sure if it's a bad way to
> > try and accomplish what I'm after. I was reluctant to use a separate
> > counter table (so I used COALESCE(MAX(id)+1, 0)), and thought the
> > subselect would do the right thing regarding duplicate paths.
>
> I dont think you need the COALESCE here
>
> because when you compare:
> INSERT IGNORE INTO resource (id,path) values (0,'test');
> and:
> INSERT IGNORE INTO resource (id,path) values (NULL,'test');
>
> The second on is better because the first one will work if you do not
> have a record where id=0

Does that assume I am using auto incrementing on the id table? I'm
not currently -- I had read a few bug reports about auto incrementing
not working properly, and wasn't positive of the state of the feature.

> > Coding up retry logic I do find that I consistently get the expected
> > results, but I am wondering: (a) is there a better "form" of this
> > statement,
>
> There must be, but i would start using another language like PHP to
> check if the path is already in the table before trying to insert it...

I had hoped to avoid using a multi-step transaction and locking the
entire table.

> >(b) why does it deadlock?
>
> Any errors in the error.log? Or, to ask the question differenlty how did
> you determine there is a deadlock?

The MySQL server reported the deadlock to the client:

"Deadlock found when trying to get lock; try restarting transaction"

It's discussed here

http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html

and I am wondering if the SUBSELECT is causing a shared lock and the
sort of issue described in that page. If so, I was hoping there was
something I could read up on that told me how to rewrite the
expression to avoid the problem.

Jim

Luuk

unread,

May 7, 2011, 12:20:51 PM5/7/11

to

On 07-05-2011 16:11, jimr wrote:
> On Saturday, May 7, 2011 1:49:43 AM UTC-7, Luuk wrote:
>>
>> On 07-05-2011 00:02, jimr wrote:
>>>
>>> while this appears to work, I'm not at all sure if it's a bad way to
>>> try and accomplish what I'm after. I was reluctant to use a separate
>>> counter table (so I used COALESCE(MAX(id)+1, 0)), and thought the
>>> subselect would do the right thing regarding duplicate paths.
>>
>> I dont think you need the COALESCE here
>>
>> because when you compare:
>> INSERT IGNORE INTO resource (id,path) values (0,'test');
>> and:
>> INSERT IGNORE INTO resource (id,path) values (NULL,'test');
>>
>> The second on is better because the first one will work if you do not
>> have a record where id=0
>
> Does that assume I am using auto incrementing on the id table? I'm
> not currently -- I had read a few bug reports about auto incrementing
> not working properly, and wasn't positive of the state of the feature.

no, this is not assuming auto increment...

>
>>> Coding up retry logic I do find that I consistently get the expected
>>> results, but I am wondering: (a) is there a better "form" of this
>>> statement,
>>
>> There must be, but i would start using another language like PHP to
>> check if the path is already in the table before trying to insert it...
>
> I had hoped to avoid using a multi-step transaction and locking the
> entire table.
>
>>> (b) why does it deadlock?
>>
>> Any errors in the error.log? Or, to ask the question differenlty how did
>> you determine there is a deadlock?
>
> The MySQL server reported the deadlock to the client:
>
> "Deadlock found when trying to get lock; try restarting transaction"
>
> It's discussed here
>
> http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html
>
> and I am wondering if the SUBSELECT is causing a shared lock and the
> sort of issue described in that page. If so, I was hoping there was
> something I could read up on that told me how to rewrite the
> expression to avoid the problem.
>
>
> Jim

Maybe you could create a new temporary table (resourceTMP), and insert
the new values there, with their correct id's, leaving out the existing
values

At the end you do a :
INSERT INTO resource VALUES (id, path) SELECT id, path FROM resourceTMP

This would avoid the shared lock, if it is created,

and only inserting the new record should be faster then what you are
doing now...

--
Luuk

Jerry Stuckle

unread,

May 7, 2011, 8:38:17 PM5/7/11

to

On 5/7/2011 10:11 AM, jimr wrote:
> On Saturday, May 7, 2011 1:49:43 AM UTC-7, Luuk wrote:
>>
>> On 07-05-2011 00:02, jimr wrote:
>>>
>>> while this appears to work, I'm not at all sure if it's a bad way to
>>> try and accomplish what I'm after. I was reluctant to use a separate
>>> counter table (so I used COALESCE(MAX(id)+1, 0)), and thought the
>>> subselect would do the right thing regarding duplicate paths.
>>
>> I dont think you need the COALESCE here
>>
>> because when you compare:
>> INSERT IGNORE INTO resource (id,path) values (0,'test');
>> and:
>> INSERT IGNORE INTO resource (id,path) values (NULL,'test');
>>
>> The second on is better because the first one will work if you do not
>> have a record where id=0
>
> Does that assume I am using auto incrementing on the id table? I'm
> not currently -- I had read a few bug reports about auto incrementing
> not working properly, and wasn't positive of the state of the feature.
>

I don't know where you got that idea. auto_increment has been working
just fine for years.

>>> Coding up retry logic I do find that I consistently get the expected
>>> results, but I am wondering: (a) is there a better "form" of this
>>> statement,
>>
>> There must be, but i would start using another language like PHP to
>> check if the path is already in the table before trying to insert it...
>
> I had hoped to avoid using a multi-step transaction and locking the
> entire table.
>
>>> (b) why does it deadlock?
>>
>> Any errors in the error.log? Or, to ask the question differenlty how did
>> you determine there is a deadlock?
>
> The MySQL server reported the deadlock to the client:
>
> "Deadlock found when trying to get lock; try restarting transaction"
>
> It's discussed here
>
> http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html
>
> and I am wondering if the SUBSELECT is causing a shared lock and the
> sort of issue described in that page. If so, I was hoping there was
> something I could read up on that told me how to rewrite the
> expression to avoid the problem.
>
>
> Jim

Use auto_increment values should resolve your problems.

I was going to say something else about your design, but you didn't
reply to the message - you started a new thread. So the old message
isn't available right now, and it's not worth my time to go back and
search for it. So I won't.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

Jerry Stuckle

unread,

May 7, 2011, 8:41:02 PM5/7/11

to

On 5/7/2011 10:11 AM, jimr wrote:

> On Saturday, May 7, 2011 1:49:43 AM UTC-7, Luuk wrote:
>>
>> On 07-05-2011 00:02, jimr wrote:
>>>
>>> while this appears to work, I'm not at all sure if it's a bad way to
>>> try and accomplish what I'm after. I was reluctant to use a separate
>>> counter table (so I used COALESCE(MAX(id)+1, 0)), and thought the
>>> subselect would do the right thing regarding duplicate paths.
>>
>> I dont think you need the COALESCE here
>>
>> because when you compare:
>> INSERT IGNORE INTO resource (id,path) values (0,'test');
>> and:
>> INSERT IGNORE INTO resource (id,path) values (NULL,'test');
>>
>> The second on is better because the first one will work if you do not
>> have a record where id=0
>
> Does that assume I am using auto incrementing on the id table? I'm
> not currently -- I had read a few bug reports about auto incrementing
> not working properly, and wasn't positive of the state of the feature.
>

I don't know where you got that idea. auto_increment has been working
just fine for years.

>>> Coding up retry logic I do find that I consistently get the expected

>>> results, but I am wondering: (a) is there a better "form" of this
>>> statement,
>>
>> There must be, but i would start using another language like PHP to
>> check if the path is already in the table before trying to insert it...
>
> I had hoped to avoid using a multi-step transaction and locking the
> entire table.
>
>>> (b) why does it deadlock?
>>
>> Any errors in the error.log? Or, to ask the question differenlty how did
>> you determine there is a deadlock?
>
> The MySQL server reported the deadlock to the client:
>
> "Deadlock found when trying to get lock; try restarting transaction"
>
> It's discussed here
>
> http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html
>
> and I am wondering if the SUBSELECT is causing a shared lock and the
> sort of issue described in that page. If so, I was hoping there was
> something I could read up on that told me how to rewrite the
> expression to avoid the problem.
>
>
> Jim

Use auto_increment values should resolve your problems.

Jerry Stuckle

unread,

May 7, 2011, 9:19:01 PM5/7/11

to

On 5/7/2011 8:41 PM, Jerry Stuckle wrote:

Sorry for the duplicate post. News reader said the first one failed due
to server problems.