Race condition in get_or_create

Ian Clelland

Sep 2, 2011, 2:04:39 PM
to django-d...@googlegroups.com
I'm seeing errors which I believe are due to a race condition in django.db.models.query.get_or_create, on a fairly high-traffic site. Our production servers are running Django 1.2.5, but I don't see any changes in the code in trunk that would affect this. (I'm totally willing to construct a test case against trunk, but I'm posting this here in case it's already a recognized bug, or an error on my part.)

If two requests make the same call to get_or_create() at roughly the same time, against a database server running at the REPEATABLE READ isolation level, then I believe that it's possible for the following sequence of events to occur:

1. Process 1 enters into a transaction as part of the default view middleware.
2. Process 1 calls QuerySet.get(**lookup), no result is returned. (DoesNotExist is raised)
----
3. Process 2 enters into a transaction as part of the default view middleware.
4. Process 2 calls QuerySet.get(**lookup), no result is returned. (DoesNotExist is raised)
5. Process 2 calls transaction.savepoint
6. Process 2 saves a new object
7. Process 2 commits and returns the object
----
8. Process 1 calls transaction.savepoint
9. Process 1 tries to save a new object; the insert blocks until step 7 above completes, and then fails with an IntegrityError
10. Process 1 rolls back to the savepoint, *but does not leave the outer transaction*
11. Process 1 calls QuerySet.get(**lookup) again, *but because we're still in the outer transaction, this returns nothing*
12. Process 1 raises an IntegrityError, rather than returning the new object.

Process 1 fails because it performed the initial read inside the transaction, but before the savepoint. In fact, inside the same transaction, I believe it is impossible for the initial self.get() and the self.get() in the exception handler to return different results.
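To make the interleaving concrete, here is a rough, self-contained sketch that replays it against a toy in-memory "database". Everything in it (ToyDB, ToyTransaction, the after_first_read hook, replay_race) is hypothetical and only models the assumptions stated in this message: a REPEATABLE READ transaction keeps answering reads from the snapshot taken at its first read, rolling back to a later savepoint does not refresh that snapshot, and the uniqueness check sees the latest committed rows. It is not Django's code and not a model of any particular engine.

```python
class IntegrityError(Exception):
    """Raised by the toy database on a duplicate key."""


class ToyDB:
    """Holds the committed rows, shared by every transaction."""

    def __init__(self):
        self.committed = set()


class ToyTransaction:
    """Toy transaction; with repeatable_read=True the first read fixes a snapshot."""

    def __init__(self, db, repeatable_read=True):
        self.db = db
        self.repeatable_read = repeatable_read
        self.snapshot = None   # taken lazily on the first read
        self.pending = set()   # rows inserted but not yet committed

    def get(self, key):
        # REPEATABLE READ reuses the first snapshot; READ COMMITTED refreshes it.
        if self.snapshot is None or not self.repeatable_read:
            self.snapshot = frozenset(self.db.committed)
        return key in self.snapshot or key in self.pending

    def savepoint(self):
        return set(self.pending)

    def rollback_to(self, saved):
        # Assumption: undoes writes only; the read snapshot is left untouched.
        self.pending = set(saved)

    def insert(self, key):
        # Assumption: the unique check sees current committed rows, not the snapshot.
        if key in self.db.committed or key in self.pending:
            raise IntegrityError(key)
        self.pending.add(key)

    def commit(self):
        self.db.committed |= self.pending
        self.pending = set()
        self.snapshot = None


def get_or_create(txn, key, after_first_read=lambda: None):
    """Read, savepoint, insert, and re-read on failure (the order in steps 1-12)."""
    if txn.get(key):
        return key, False
    after_first_read()  # hook so the replay can run another process here
    sid = txn.savepoint()
    try:
        txn.insert(key)
        return key, True
    except IntegrityError:
        txn.rollback_to(sid)
        if txn.get(key):
            return key, False
        raise


def replay_race(repeatable_read):
    """Run process 2 to completion between process 1's first read and savepoint."""
    db = ToyDB()
    p1 = ToyTransaction(db, repeatable_read)
    p2 = ToyTransaction(db, repeatable_read)

    def process_2():
        get_or_create(p2, "row")   # steps 3-6
        p2.commit()                # step 7

    try:
        return get_or_create(p1, "row", after_first_read=process_2)
    except IntegrityError:
        return "IntegrityError"
```

In this toy model, replay_race(True) ends with the IntegrityError escaping from process 1, while replay_race(False), where every read sees the latest committed rows, returns the existing row with created=False.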

Some SQL-shell testing shows that it's possible for this to work, as long as we set the savepoint before the initial read. That way, when we catch an IntegrityError and roll back to the savepoint, the lock is released, and Process 1 can actually see the object committed by Process 2.
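As a sketch of that reordering only: the function below takes the savepoint before the first read. The transaction object and its method names (savepoint, get, insert, rollback_to) are hypothetical stand-ins, and RecordingTxn is a fake that just logs the call order. Its get() returning True after the rollback encodes the behaviour reported from the SQL-shell test above; it is an assumption here, not something the sketch verifies against a real database.

```python
class IntegrityError(Exception):
    """Stand-in for the database driver's duplicate-key error."""


def get_or_create_savepoint_first(txn, key):
    """Take the savepoint first, then get, insert, and re-get on failure."""
    sid = txn.savepoint()          # moved ahead of the initial read
    if txn.get(key):
        return key, False
    try:
        txn.insert(key)
        return key, True
    except IntegrityError:
        txn.rollback_to(sid)       # rolls back past the initial read as well
        if txn.get(key):
            return key, False
        raise


class RecordingTxn:
    """Hypothetical fake: logs calls and simulates losing the race."""

    def __init__(self):
        self.calls = []
        self.rolled_back = False

    def savepoint(self):
        self.calls.append("savepoint")
        return 1

    def get(self, key):
        self.calls.append("get")
        # Assumption taken from the post: the row is visible after the rollback.
        return self.rolled_back

    def insert(self, key):
        self.calls.append("insert")
        raise IntegrityError(key)  # another process committed the row first

    def rollback_to(self, sid):
        self.calls.append("rollback_to")
        self.rolled_back = True


txn = RecordingTxn()
result = get_or_create_savepoint_first(txn, "row")
```

The sketch leaves out releasing the savepoint on the success paths, which a real implementation would need to handle.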

I expect to open up a ticket for this, unless someone can tell me "you're doing it wrong", or point me to another ticket. (I've scanned the Trac database, but didn't see anything identical. #15507 touches this, but won't actually do anything to solve it.)

--
Regards,
Ian Clelland
<clel...@gmail.com>

Ian Clelland

Sep 2, 2011, 2:06:06 PM
to django-d...@googlegroups.com
And, of course, immediately after posting this, I find http://code.djangoproject.com/ticket/13906, which seems to cover much of the same area.

Ian

Jeremy Dunck

Sep 2, 2011, 2:22:07 PM
to django-d...@googlegroups.com
On Fri, Sep 2, 2011 at 11:04 AM, Ian Clelland <clel...@gmail.com> wrote:
> I'm seeing errors which I believe are due to a race condition in
> django.db.models.query.get_or_create, on a fairly high traffic site. Our
> production servers are running Django 1.2.5, but I don't see any changes in
> the code in trunk that would affect this. (I'm totally willing to construct
> a test case against trunk, but I'm posting this here in case it's already a
> recognized bug, or an error on my part)
> If two requests make the same call to get_or_create(), at roughly the same
> time, with a database server in REPEATABLE_READ isolation level, then I
> believe that it's possible for the following sequence of events to occur:

I suspect you're using MySQL. Am I right? We just switched a MySQL
site from REPEATABLE READ (the default) to READ COMMITTED (PostgreSQL's
default, hence a better-tested path in Django) for this very reason.

Ian Clelland

Sep 2, 2011, 2:42:53 PM
to django-d...@googlegroups.com
We are. 5.0.51a, I believe.
READ COMMITTED would definitely solve this -- that seems to be the main point behind #13906, but there seems to be some resistance there. Have you encountered any other issues from making that switch?


Regards,
Ian Clelland
<clel...@gmail.com>

Jeremy Dunck

Sep 2, 2011, 2:44:47 PM
to django-d...@googlegroups.com
We've been running in prod without trouble under 'read committed' for
about a week, though not under heavy load -- it's a fairly new site.

I'm not sure how much assurance I can offer at higher load, sorry.
