--
Ticket URL: <https://code.djangoproject.com/ticket/20429>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 1
* stage: Unreviewed => Accepted
* needs_tests: => 1
* needs_docs: => 1
Comment:
Upsert is a fairly common idea; seems worthwhile to me.
Regarding implementation: there's a lot of common ground between
get_or_create and update_or_create, so I'm wondering if there should be
some common base implementation, rather than layering one over the other.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:1>
* owner: nobody => elektrrrus
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:2>
Comment (by tunixman):
I hadn't thought about that. That's a good idea I'd be willing to look
into if needed.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:3>
Comment (by elektrrrus):
Pull request created: https://github.com/django/django/pull/1132
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:4>
Comment (by tunixman):
Should we at all be concerned in the update code about limiting updated
attributes to just fields? It seems like it could potentially allow some
subtle and difficult to identify bugs if any updates are allowed.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:5>
Comment (by elektrrrus):
Yes, You have right. I'll add checking if updated fields are model
attributes.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:6>
Comment (by elektrrrus):
I've changed approach according to HonzaKral proposition, currently
utilizing update method on QuerySet.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:7>
Comment (by tunixman):
Using `update()` won't call `save` on the created or updated object
(https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.update),
which is slightly different from the `get_or_create()` API
(https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-
create). The difference may be something we should keep in mind when
deciding on the final implementation. My preference is that they behave
the same in this regard (i.e. `save()` is called with both functions).
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:8>
Comment (by elektrrrus):
Yes, i know. I will talk also tomorrow with other developers on sprints
about this approach.
Maybe the best way would be to choose through some kwarg to use wheater
update or save?
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:9>
Comment (by tunixman):
Maybe calling it `update_or_create` is misleading, too, and calling it
something like `upsert` is better. This would avoid the name collision
with `update` on `QuerySet`, and then we can follow the semantics of
`get_or_create` more closely. But I do think following the semantics of
`get_or_create` are more important than other considerations. Callers will
either expect `save` to have been called after updating to perform
application-level consistency checks, or people will need to remember to
call `save` after each `update_or_create` call, and being safe by default
is important to me.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:10>
Comment (by elektrrrus):
Here is django-developers thread:
https://groups.google.com/forum/?fromgroups#!topic/django-
developers/cE0oTNMZaKU
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:11>
Comment (by elektrrrus):
After short talk with @russellm decision is to follow get_or_create
approach with calling save on model.
I'll update pull request shortly.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:12>
Comment (by elektrrrus):
Pull request updated.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:13>
Comment (by russellm):
Review notes on the pull request.
Also - when submitting a patch that you want reviewed, it helps if you
uncheck the "Patch needs improvement" flag. The list of patches needing
review is essentially the list with a patch, that doesn't need
improvement, and doesn't need tests or docs. If you don't update the
flags, your patch will likely end up being missed for review.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:14>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:15>
Comment (by elektrrrus):
@russelm, waitng for Your second review, i've pushed new commit with
updates according Your sugessions(and also uncheck "Patch needs
improvement" flag).
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:16>
* needs_better_patch: 0 => 1
Comment:
Is the current patch still at:
https://github.com/django/django/pull/1132/files ?
Probably by accident documentation of the first and last method have been
removed.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:17>
* cc: timograham@… (added)
* needs_better_patch: 1 => 0
* needs_docs: 1 => 0
* needs_tests: 1 => 0
Comment:
Updated pull request to address outstanding issues:
https://github.com/django/django/pull/1248
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:18>
* needs_better_patch: 0 => 1
Comment:
Some backends like mysql can handle upsert natively, update_or_create
should use that as advantage.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:19>
Comment (by timo):
I imagine that would increase the complexity of the patch quite a bit. I
wonder if we need that optimization for the first version of this? Is the
anonymous commenter willing to implement it?
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:20>
* needs_better_patch: 1 => 0
Comment:
As suggested in IRC, I used [https://github.com/django/djangobench
djangobench] to test this change, but I didn't get any conclusive results
(this is my first time using djangobench, so I'm not certain how to
interpret the results). Sometimes djangobench report a "significant"
difference (both slower and faster), other times there was no difference.
Here are a couple results. I'm assuming the difference is negligible
enough to ignore.
{{{
$ djangobench --control=master --experiment=ticket_20429
query_get_or_create -t 500
Running 'query_get_or_create' benchmark ...
Min: 0.000931 -> 0.000944: 1.0141x slower
Avg: 0.000997 -> 0.000979: 1.0187x faster
Significant (t=3.474695)
Stddev: 0.00011 -> 0.00005: 2.1908x smaller (N = 500)
Running 'query_get_or_create' benchmark ...
Min: 0.000935 -> 0.000947: 1.0130x slower
Avg: 0.000975 -> 0.001007: 1.0335x slower
Significant (t=-7.464501)
Stddev: 0.00006 -> 0.00008: 1.3357x larger (N = 500)
Running 'query_get_or_create' benchmark ...
Min: 0.000931 -> 0.000939: 1.0087x slower
Avg: 0.000999 -> 0.001006: 1.0076x slower
Not significant
Stddev: 0.00014 -> 0.00013: 1.0402x smaller (N = 500)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:21>
Comment (by loic84):
As mentioned on IRC, I believe this method suffers from a race-condition
(especially if used in a view) which should be documented.
{{{
Warning:
This method is prone to race-condition and can result in multiple rows
being
inserted simultaneously if uniqueness is not enforced at the database
level
(see either ``unique`` or ``unique_together``).
During a race where uniqueness is inforced, all simultaneous calls to
``create_or_update`` but one will raise an ``IntegrityError``.
Without uniqueness inforcement, any ``create_or_update`` call with the
same set
of parameters that happens after a race will raise
``MultipleObjectsReturned``.
}}}
Like `get_or_create`, it might be good to recommend it as a convenience
for scripts rather than something to use in a view.
Refs #12579
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:22>
Comment (by tunixman):
If you're not enforcing uniqueness constraints in the database you're
going to have this race condition no matter what. Perhaps the
documentation around creating models and the basics of the tutorials could
be updated to reflect this. Documenting it as something endemic only to
these methods does nothing to improve the data consistency of Django
applications and would rather just discourage people from using these
methods which are basically the codification of the proper way to handle
these cases along side proper database implementation.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:23>
Comment (by loic84):
The integrity issue is just a side effect of the race-condition, even if
uniqueness is enforced at the database level, you still need to be aware
of the potential 500 you will be getting.
I agree that any `get()` followed by `create()` in user code suffers from
the same issue, but when you do it manually you have a chance to think
about the potential problems. There is a layer of opacity when you use a
queryset method like `update_or_create()` and you may assume that all
problems have been solved by the framework. Eventually you get to shoot
yourself in the foot, which results in tickets like #12579.
`update()` documents that it's free of race-condition,
`select_for_update()` is obviously free of race-condition on supported
backends, it wouldn't be too hard to assume that `update_or_create()`
provides such guarantee as well.
I agree that the race-condition for these common patterns should be
documented (if that's not already the case), but convenience methods like
`get_or_create()` and `update_or_create()` also need to be clear about
their implementation and potential consequences.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:24>
Comment (by timo):
I've updated the patch with the documentation suggested by @loic84 and
also to reflect the fact that this will have to go into 1.7 rather than
1.6.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:25>
Comment (by tunixman):
Race conditions are conditions that silently accept corruption, so
database constraints are the exact opposite of race conditions. They
inform the programmer that data integrity was preserved, rather than
silently corrupting it. Dealing with those exceptions at that point
properly is the right thing, because it prevents having to go through the
database much later trying to understand why there are duplicates of some
rows but not others, and why there are other more opaque exceptions (like
.get() failing), or why relations to one item are actually to several
different items. Those are race conditions, but loudly saying "I'm sorry I
can't corrupt your data for you" is not a race condition, and I strongly
think Django should be choosing the non-race-condition approach with loud
and immediate errors rather than the race-condition approach with no
errors at the time of corruption, but loud and opaque errors much later.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:26>
* stage: Accepted => Ready for checkin
Comment:
Replying to [comment:26 tunixman]:
> I strongly think Django should be choosing the non-race-condition
approach with loud and immediate errors rather than the race-condition
approach with no errors at the time of corruption, but loud and opaque
errors much later.
I'm not sure I understand how you would do that. If you are referring to
the backend specific upsert, it could be added in the future.
@timo: Left a comment on the PR, other than that I'm happy with the
current patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:27>
Comment (by tunixman):
Can you take me off this ticket then? I'm through and don't need the
notifications about it anymore.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:28>
* stage: Ready for checkin => Accepted
Comment:
I've updated the patch, changing the signature of `get_or_create` and
`update_or_create` to `(defaults=None, **kwargs)`. I'd like another person
to review the patch before committing it.
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:29>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"6272d2f155adb4f32ef129d57e9eb5493ebde6ed"]:
{{{
#!CommitTicketReference repository=""
revision="6272d2f155adb4f32ef129d57e9eb5493ebde6ed"
Fixed #20429 -- Added QuerySet.update_or_create
Thanks tunixman for the suggestion and Loic Bistuer for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:30>
Comment (by Tim Graham <timograham@…>):
In [changeset:"f7290581fe2106c08d97215ab93e27cf6b27e100"]:
{{{
#!CommitTicketReference repository=""
revision="f7290581fe2106c08d97215ab93e27cf6b27e100"
Fixed a regression with get_or_create and virtual fields.
refs #20429
Thanks Simon Charette for the report and review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:31>
Comment (by Andrew Godwin <andrew@…>):
In [changeset:"3f416f637918cc162877be95a59d50825b203089"]:
{{{
#!CommitTicketReference repository=""
revision="3f416f637918cc162877be95a59d50825b203089"
Fixed a regression with get_or_create and virtual fields.
refs #20429
Thanks Simon Charette for the report and review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/20429#comment:32>