|Race condition in get_or_create||Ian Clelland||9/2/11 11:04 AM|
I'm seeing errors on a fairly high-traffic site which I believe are due to a race condition in django.db.models.query.QuerySet.get_or_create. Our production servers are running Django 1.2.5, but I don't see any changes in the code in trunk that would affect this. (I'm totally willing to construct a test case against trunk, but I'm posting this here in case it's already a recognized bug, or an error on my part.)
If two requests make the same call to get_or_create() at roughly the same time, with a database server at the REPEATABLE READ isolation level, then I believe the following sequence of events can occur:
1. Process 1 enters into a transaction as part of the default view middleware.
2. Process 1 calls QuerySet.get(**lookup), no result is returned. (DoesNotExist is raised)
3. Process 2 enters into a transaction as part of the default view middleware.
4. Process 2 calls QuerySet.get(**lookup), no result is returned. (DoesNotExist is raised)
5. Process 2 calls transaction.savepoint
6. Process 2 saves a new object
7. Process 2 commits and returns the object
8. Process 1 calls transaction.savepoint
9. Process 1 tries to save a new object; if this happens before step 7, the insert blocks, and once step 7 commits it fails with an IntegrityError
10. Process 1 rolls back to the savepoint, *but does not leave the outer transaction*
11. Process 1 calls QuerySet.get(**lookup), again, *but because we're still in the outer transaction, this returns nothing*
12. Process 1 raises an IntegrityError, rather than returning the new object.
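The sequence above can be replayed with a deliberately simplified, single-threaded model of REPEATABLE READ. This is a sketch under stated assumptions, not InnoDB: each transaction takes its read snapshot at its first read, uniqueness is checked against committed rows only (blocking on another transaction's uncommitted insert is not modelled), and rolling back to a savepoint undoes writes but leaves the snapshot alone. All class and function names are hypothetical.

```python
class IntegrityError(Exception):
    """Stand-in for a unique-constraint violation (hypothetical, not Django's class)."""


class Database:
    """Holds the rows that become visible to new snapshots once committed."""

    def __init__(self):
        self.committed = set()


class Transaction:
    """Toy REPEATABLE READ transaction (assumption: simplified model)."""

    def __init__(self, db):
        self.db = db
        self.snapshot = None  # taken lazily at the first read
        self.pending = []     # rows written by this transaction, not yet committed
        self.mark = 0         # position of the most recent savepoint

    def get(self, key):
        if self.snapshot is None:
            self.snapshot = set(self.db.committed)
        return key if key in self.snapshot or key in self.pending else None

    def savepoint(self):
        self.mark = len(self.pending)

    def insert(self, key):
        # The unique check sees the latest committed data, not the snapshot.
        if key in self.db.committed or key in self.pending:
            raise IntegrityError(key)
        self.pending.append(key)

    def rollback_to_savepoint(self):
        # Undoes writes made after the savepoint; the snapshot is unchanged.
        del self.pending[self.mark:]

    def commit(self):
        self.db.committed.update(self.pending)
        self.pending = []


def replay(key="row-1"):
    """Replays steps 1-11; returns both reads by Process 1 and whether the row exists."""
    db = Database()
    p1 = Transaction(db)            # step 1
    first = p1.get(key)             # step 2: None, and p1's snapshot is now fixed
    p2 = Transaction(db)            # step 3
    p2.get(key)                     # step 4: None
    p2.savepoint()                  # step 5
    p2.insert(key)                  # step 6
    p2.commit()                     # step 7
    p1.savepoint()                  # step 8
    try:
        p1.insert(key)              # step 9: the row is already committed
    except IntegrityError:
        p1.rollback_to_savepoint()  # step 10: still inside p1's transaction
        second = p1.get(key)        # step 11: answered from the old snapshot
        return first, second, key in db.committed
    raise AssertionError("expected an IntegrityError in step 9")


first, second, committed = replay()
print(first, second, committed)  # None None True
```

Both reads by Process 1 come back empty even though the row is committed, which is the situation that makes get_or_create re-raise in step 12.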
Process 1 fails because it performed the initial read inside the transaction, but before the savepoint. In fact, within a single REPEATABLE READ transaction, I believe it is impossible for the initial self.get() and the self.get() in the exception handler to return different results.
Some SQL-shell testing shows that it's possible for this to work, as long as we set the savepoint before the initial read. That way, when we catch an IntegrityError and roll back to the savepoint, the lock is released, and Process 1 can actually see the object committed by Process 2.
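A sketch of the reordering proposed above: take the savepoint before the initial read, so that the IntegrityError handler rolls back to a point that precedes it. The `conn` interface (`savepoint()`, `savepoint_commit()`, `savepoint_rollback()`, `get()`) and the `create` callback are hypothetical stand-ins loosely modelled on Django's `transaction.savepoint*` helpers; this is not Django's code. The fake connection only records call order, so it illustrates the ordering, not the database semantics reported from the SQL shell.

```python
class DoesNotExist(Exception):
    """Stand-in for Model.DoesNotExist (hypothetical)."""


class IntegrityError(Exception):
    """Stand-in for a unique-constraint violation (hypothetical)."""


def get_or_create_savepoint_first(conn, lookup, create):
    """Proposed ordering: savepoint, then read, then insert."""
    sid = conn.savepoint()  # taken BEFORE the first read
    try:
        obj = conn.get(lookup)
        conn.savepoint_commit(sid)
        return obj, False
    except DoesNotExist:
        pass
    try:
        obj = create(lookup)
        conn.savepoint_commit(sid)
        return obj, True
    except IntegrityError as exc:
        # Per the SQL-shell testing described above, rolling back to a
        # savepoint that precedes the first read lets the re-read see the
        # row committed by the other process.
        conn.savepoint_rollback(sid)
        try:
            return conn.get(lookup), False
        except DoesNotExist:
            raise exc


class RecordingConn:
    """Fake connection: logs calls and simulates losing the race."""

    def __init__(self):
        self.calls = []
        self.gets = 0

    def savepoint(self):
        self.calls.append("savepoint")
        return "s1"

    def savepoint_commit(self, sid):
        self.calls.append("savepoint_commit")

    def savepoint_rollback(self, sid):
        self.calls.append("savepoint_rollback")

    def get(self, lookup):
        self.calls.append("get")
        self.gets += 1
        if self.gets == 1:
            raise DoesNotExist(lookup)  # first read: row not there yet
        return dict(lookup)             # re-read: the other process's row


def racing_create(conn):
    def create(lookup):
        conn.calls.append("create")
        raise IntegrityError(lookup)    # the other process inserted first
    return create


conn = RecordingConn()
obj, created = get_or_create_savepoint_first(conn, {"name": "x"}, racing_create(conn))
print(conn.calls)  # ['savepoint', 'get', 'create', 'savepoint_rollback', 'get']
```

The only change from the sequence in steps 1-12 is where the savepoint sits: the rollback now discards everything done since before the initial read.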
I expect to open a ticket for this, unless someone can tell me "you're doing it wrong" or point me to another ticket. (I've scanned the Trac database but didn't see anything identical; #15507 touches on this, but won't actually solve it.)
|Re: Race condition in get_or_create||Ian Clelland||9/2/11 11:06 AM|
|Re: Race condition in get_or_create||jdunck||9/2/11 11:22 AM|
On Fri, Sep 2, 2011 at 11:04 AM, Ian Clelland <clel...@gmail.com> wrote:
> I'm seeing errors which I believe are due to a race condition in
I suspect you're using MySQL. Am I right? We just switched a mysql
|Re: Race condition in get_or_create||Ian Clelland||9/2/11 11:42 AM|
We are. 5.0.51a, I believe.
READ COMMITTED would definitely solve this -- that seems to be the main point behind #13906, but there seems to be some resistance there. Have you encountered any other issues from making that switch?
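One way to move Django's MySQL connections to READ COMMITTED, as discussed here, is to have each connection run a `SET SESSION` statement when it is opened. This sketch assumes the MySQLdb driver, whose `connect()` accepts an `init_command` argument that Django's MySQL backend passes through from `OPTIONS`; the database name and user are placeholders. Setting `transaction-isolation = READ-COMMITTED` in the server's my.cnf is the server-wide alternative.

```python
# settings.py fragment (sketch): every new MySQL connection switches its
# session to READ COMMITTED before Django uses it.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",    # placeholder
        "USER": "myuser",  # placeholder
        "OPTIONS": {
            # Handed to MySQLdb.connect(); executed once per connection.
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    }
}
```

Because this is per session, it only affects connections Django opens; other clients of the same server keep whatever isolation level they were using.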
|Re: Race condition in get_or_create||jdunck||9/2/11 11:44 AM|
We've been running in prod without trouble under 'read committed' for
about a week, though not under heavy load -- it's a fairly new site.
I'm not sure how much assurance I can offer at higher load, sorry.