get_or_create can still cause IntegrityError

688 views
Skip to first unread message

Tomasz Zielinski

unread,
May 7, 2011, 6:28:27 AM5/7/11
to django-d...@googlegroups.com
I think that get_or_create is still broken, despite this fix: http://code.djangoproject.com/ticket/6641

I'm not sure if the solution is optimal, trying to INSERT first might be overally more expensive, but I think that something should be done with get_or_create.

Best,

Tomasz

David Cramer

unread,
May 7, 2011, 1:09:12 PM5/7/11
to Django developers
We hit this same issue in Postgres (it's definitely not MySQL
specific). I'm unsure of the solution or precise conditions we're
hitting it in, but I think by default we use READ COMMITTED.

On May 7, 3:28 am, Tomasz Zielinski <tomasz.zielin...@pyconsultant.eu>
wrote:
> I think that get_or_create is still broken, despite this fix:http://code.djangoproject.com/ticket/6641
>
> The details, along with a working solution, are in my answer here:http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-ra...

David Cramer

unread,
May 7, 2011, 2:03:40 PM5/7/11
to Django developers
Would it help (in PG-world) at least if the selects where in
savepoints as well?

Tomasz Zielinski

unread,
May 8, 2011, 7:30:58 AM5/8/11
to django-d...@googlegroups.com
W dniu sobota, 7 maja 2011, 20:03:40 UTC+2 użytkownik David Cramer napisał:
Would it help (in PG-world) at least if the selects where in
savepoints as well?

In case you asked me - I don't know, I don't use PG on a daily basis...


Anyway, let me list some possible solutions to launch the discussion:
1. Leave it as is, and update the documentation to warn users that get_or_create is problematic for concurrent writes unless used in READ COMMITED isolation level
2. Switch to my code and update the documentation to warn users that get_or_create commits the open transaction
3. Combine 1&2 and somehow allow two modes of operation for get_or_create, either based on a flag, or based on retrieved isolation level - or maybe add some get_or_create2 ?
4. Remove get_or_create, because it cannot be fixed to gracefully handle all common cases

Re savepoint-based solution, I don't list it because I don't "feel" how it should work.

Tomasz

Cal Leeming [Simplicity Media Ltd]

unread,
May 9, 2011, 1:05:17 PM5/9/11
to django-d...@googlegroups.com
Hi guys,

I spent literally *months* trying to find the best way to resolve this situation. On our high performance sites + backends (around 120qps, 50/50 split between read and write) we were getting IntegrityError raised by get_or_create (which we now refer to as Object Collisions).

We are able to trigger the issue by having two scripts attempt to get_or_create() a new object at the same time (it takes a few loop iterations until it reproduces itself).

When get_or_create() is called, it'll raise a duplicate key exception, but if you then attempt to do a .get() on the row using the same args/kwargs straight away afterwards, you'll notice it returns an empty result. This (we think) is because when you create a row within a transaction, although the row isn't there, it will still raise a duplicate key error until the row is either commited or rolled back.

Eventually, we created a modified get_or_create() which *reduce* the chance of it happening:


However, we are still left with a final problem, which is we either put a sleep() in the code (eww!) to increase the chance of a successful query, or you raise an ObjectNotReady exception (custom), and then add logic into your code to deal with this occurrence. The other option is that you re-factor your code so that the time between commits on large chunks of code are as small as possible, reducing the chance of this error. But, from what we can tell, you can only "reduce" the chance of raising an ObjectNotReady, you can't get rid of it entirely. (unless you get rid of transactions entirely lol)


2011/5/8 Tomasz Zielinski <tomasz.z...@pyconsultant.eu>

Tomasz

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Tomasz Zielinski

unread,
May 11, 2011, 6:40:23 PM5/11/11
to django-d...@googlegroups.com
W dniu poniedziałek, 9 maja 2011, 19:05:17 UTC+2 użytkownik SleepyCal napisał:
Hi guys,

I spent literally *months* trying to find the best way to resolve this situation. (...)

Have you checked my StackOverflow answer I linked in the top post? It *completely* removes the problem for MySQL/InnoDB,
on REPEATABLE READ isolation level. If you use Postgres with comparable isolation level then my solution might also work for you.
For READ COMMITED level the problem shouldn't occur anyway.

Tomasz

Tomasz Zielinski

unread,
May 17, 2011, 7:16:41 AM5/17/11
to django-d...@googlegroups.com
There's a related, ongoing discussion here: http://code.djangoproject.com/ticket/13906
It looks like the general attitude is that MySQL backend should switch to READ COMMITED isolation level,
which would be the simplest solution.

Tomasz
Reply all
Reply to author
Forward
0 new messages