Fwd: Improvement to objects.get_or_create and objects.update_or_create

Benjamin Scherrey

Aug 27, 2014, 4:04:45 PM
to django-d...@googlegroups.com
Apologies for the cross-post. I imagine this is actually where this proposal belongs. Would anyone be interested in an implementation of this, to be considered for inclusion in an upcoming or backported release of Django?

  -- Ben Scherrey

---------- Forwarded message ----------
From: Benjamin Scherrey <prote...@gmail.com>
Date: Mon, Aug 25, 2014 at 4:58 PM
Subject: Improvement to objects.get_or_create and objects.update_or_create
To: django-users <django...@googlegroups.com>


Just want to run an idea by the list for a feature improvement to the oft-used convenience functions get_or_create and update_or_create. I'll just talk about get_or_create from now on, but everything I say going forward applies to both.

It's a common idiom to populate a dictionary and simply pass it straight on to the function when creating or updating objects. get_or_create takes a named parameter, defaults, as the content to store/update, while the normal parameters are used for both search and content. The problem is that quite often you just have a dictionary of content you want to throw into the model table and don't want to separate it out into search content vs. default content. If you don't separate it out, you get a big nasty WHERE clause on the SELECT that tries to compare every bit of content for the item.

For the most part this behavior is just a nuisance and probably a sub-optimal database query. However, with the new JSON data type it will actually break your app, as you can't pass in a JSON structure as a search value in a WHERE clause.

My proposal, which I've written in my own code, is to have get_or_create automatically select the most appropriate data to use for the search clause and then treat the rest as "default" data to be applied to the found or newly created object. It first checks whether the primary key field is present in the data structure. If it is not, it goes through the model fields and looks for any field designated as unique that is present in the data structure. The result is a cleaner and faster SQL request and prevention of the above-mentioned error condition. If no primary key or unique field is present in the data structure, behavior continues as currently implemented.
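
As a rough illustration of the selection rule described above, here is a minimal pure-Python sketch. It is not the code from my project and not part of Django: the helper name split_lookup_and_defaults and its pk_name / unique_fields arguments are hypothetical stand-ins for what a real implementation would read from the model's field definitions.

```python
def split_lookup_and_defaults(data, pk_name, unique_fields):
    """Split `data` into (lookup, defaults) following the proposed heuristic.

    Hypothetical helper: `pk_name` and `unique_fields` stand in for
    information that would normally come from the model's metadata.
    """
    # 1. Prefer the primary key if the caller supplied it.
    if pk_name in data:
        key = pk_name
    else:
        # 2. Otherwise use the first unique field present in the data.
        key = next((name for name in unique_fields if name in data), None)

    if key is None:
        # 3. No pk or unique field: keep the current behaviour, where
        #    every value takes part in the search.
        return dict(data), {}

    lookup = {key: data[key]}
    defaults = {name: value for name, value in data.items() if name != key}
    return lookup, defaults
```

In a wrapper, the two results could then be handed to the existing API as get_or_create(defaults=defaults, **lookup).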

Is this a change that would be welcome in the Django project? If so, I'd be happy to create a pull request that implements this policy along with supporting unit test coverage.

thanx,

  -- Ben 

--
Chief Systems Architect Proteus Technologies
Personal blog where I am not your demographic.

This email intended solely for those who have received it. If you have received this email by accident - well lucky you!!

Marc Tamlyn

Aug 27, 2014, 4:55:50 PM
to django-d...@googlegroups.com
This would be fairly significantly backwards incompatible, and any heuristic-based code, I think, belongs in user space.

It should not be too problematic to write as an external module; if it gains a lot of traction we can reconsider.

Also worth noting that the JSON implementation in contrib.postgres is being delayed until at least Postgres 9.4 and will use the jsonb data type, which does have an equality operator.


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAHN%3D9D5-9GyYAKVn4j0xB7tvvDtUtYFFMrUxc2UxPm%3DvgebA3w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Benjamin Scherrey

Aug 27, 2014, 5:19:49 PM
to django-d...@googlegroups.com
I don't believe the functionality is backwards incompatible at all unless I'm missing something. The new behavior of automatically selecting the optimal search field (prioritized by pk first, then by any discovered field marked as unique) would only occur if the 'defaults' parameter was None. Then, if neither the primary key nor any unique attribute is present in the passed dictionary, the original behavior would still occur. Offhand I can't think of any good expected behavior that would be impacted, and certainly some bad unexpected behavior would be eliminated, with the bonus of a more optimal WHERE clause in the SQL in the majority of situations.

You're right, it wasn't hard to implement as external code as I've already done so. The main reason why I propose it for inclusion into Django's codebase is because the new behavior is actually what I would have expected these two functions to do and, so far, individuals that I've polled said the same thing. When they discover what the generated SQL actually looks like they're quite surprised.

I am indeed looking forward to Postgres 9.4's jsonb support and, apparently, 9.5 will provide individual json attribute access as well. Right now we're using the jsonfield library with 9.3 which has some good query support for Django 1.7.

-- Ben




Tom Evans

Aug 28, 2014, 6:35:19 AM
to django-d...@googlegroups.com
On Wed, Aug 27, 2014 at 10:19 PM, Benjamin Scherrey
<prote...@gmail.com> wrote:
> I don't believe the functionality is backwards incompatible at all unless
> I'm missing something. The new behavior of automatically selecting the
> optimal search field (prioritized by pk first then by any discovered field
> marked as unique) would only occur if the 'defaults' parameter was None.

So, if I passed in kwargs={'pk': 7, 'foo': 'bar', 'wibble': 'quuz'},
with "foo" and "wibble" being non-indexed fields, then the query that
would be run by get_or_create to determine whether the item exists
would differ in your new version.

This means it is not backwards compatible, as if I passed in those
arguments to get_or_create(), then I expect one of three things to
happen:

1) the object returned to have those values
2) an object is newly created that has those values
3) an error occurs because an object could not be created that has
those values.

The proposed change breaks that contract, e.g. it would simply return
whatever item has the pk 7, regardless of its 'foo' or 'wibble'
attributes.
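
A small in-memory sketch of the difference, assuming a table that already
holds a row with pk 7 but a different 'foo'. The find() helper and the row
data are hypothetical and only mimic how the lookup arguments are matched;
no Django code is involved.

```python
# Made-up existing data: pk 7 exists, but 'foo' is not 'bar'.
rows = [{"pk": 7, "foo": "other", "wibble": "quuz"}]


def find(rows, **lookup):
    """Return the first row matching every lookup value, or None."""
    for row in rows:
        if all(row.get(name) == value for name, value in lookup.items()):
            return row
    return None


kwargs = {"pk": 7, "foo": "bar", "wibble": "quuz"}

# Current behaviour: every kwarg takes part in the search, so the existing
# row does not match and get_or_create would go on to attempt a create.
current = find(rows, **kwargs)

# Proposed behaviour: only the primary key is searched on, so the existing
# row is returned even though its 'foo' differs from what was passed in.
proposed = find(rows, pk=kwargs["pk"])

print(current)   # None
print(proposed)  # {'pk': 7, 'foo': 'other', 'wibble': 'quuz'}
```

With a real table, the create attempt in the first case would most likely
fail on the duplicate primary key, which is outcome 3 above.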

Cheers

Tom

Babatunde Akinyanmi

Aug 28, 2014, 8:09:02 AM
to django-developers


On 27 Aug 2014 22:19, "Benjamin Scherrey" <prote...@gmail.com> wrote:
> You're right, it wasn't hard to implement as external code as I've already done so. The main reason why I propose it for inclusion into Django's codebase is because the new behavior is actually what I would have expected these two functions to do and, so far, individuals that I've polled said the same thing. When they discover what the generated SQL actually looks like they're quite surprised.

+1.

I agree on the 2 points. I'm not expert enough to comment on the backwards compatibility though.


Collin Anderson

Aug 28, 2014, 10:27:15 AM
to django-d...@googlegroups.com
This is a bit magical. It means adding or removing a unique constraint could have additional unintended side effects.

Benjamin Scherrey

Aug 28, 2014, 12:33:50 PM
to django-d...@googlegroups.com
You are, of course, correct, Tom. I've never once used it that way, but that is the published API/contract as you stated. Possibly it should have insisted that the programmer designate which terms are search terms, separate from those which are content only, rather than making it optional. But my proposal does indeed result in very different semantics ultimately. Thanx for the feedback. Guess I'll keep wrapping it outside of the core code.

-- Ben

