PostGres 9.5 Upsert

1,073 views
Skip to first unread message

bliy...@rentlytics.com

unread,
Jan 8, 2016, 7:13:26 PM1/8/16
to Django developers (Contributions to Django itself)
Hey Guys,

Postgres 9.5 has added the functionality for UPSERT aka update or insert.  Any interest in aligning UPSERT on the db layer with the get_or_create or update_or_create functionality in django?  Sounds like my company would be interested in doing the work if the PR will get the traction.

-Ben

Florian Apolloner

unread,
Jan 8, 2016, 7:15:31 PM1/8/16
to Django developers (Contributions to Django itself)
Absolutely!

Carl Meyer

unread,
Jan 9, 2016, 6:12:14 PM1/9/16
to django-d...@googlegroups.com
That'd be great! I can't see any reason why a good PR for that wouldn't
be accepted. There's no reason to be using the ugly algorithms in
`get_or_create` or `update_or_create` when native UPSERT is available at
the DB level.

Carl

signature.asc

Cristiano Coelho

unread,
Jan 9, 2016, 6:18:50 PM1/9/16
to Django developers (Contributions to Django itself)
I agree! Also, does this already happen for the MySQL backend? MySQL has the insert on conflict update, that could work the same way.
However, if I'm not wrong, the docs states that the above methods have a race condition (obvious since right now it does two operations), but if the code would actually use native database operations, the race conditions might be gone for those cases, so that should probably be documented as well.

Anssi Kääriäinen

unread,
Jan 10, 2016, 3:09:37 AM1/10/16
to django-d...@googlegroups.com
If I recall correctly, MySQL doesn't offer a way to specify on which index you want to do the conflict resolution. This leads to problems - the upsert might affect the wrong row if there are multiple unique indexes on the table.

PostgreSQL's version of upsert has a problem, too. It doesn't offer a direct way to know if the result of the upsert was an insert or update, but Django needs that knowledge, at least for save(). Maybe there is a way (the oid return value seems promising).

For get_or_create and update_or_create the problem is that the user is free to offer any condition to be used for matching, but PostgreSQL limits the upsert matching to columns in unique index. So, we can use upsert only for unique index cases.

The save() operation matches the semantics of upsert exactly - maybe we could use upsert there?

 - Anssi
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ae38ba8e-3e79-47fb-92b9-dd305176c58e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Florian Apolloner

unread,
Jan 10, 2016, 4:10:24 AM1/10/16
to Django developers (Contributions to Django itself)


On Sunday, January 10, 2016 at 9:09:37 AM UTC+1, Anssi Kääriäinen wrote:
The save() operation matches the semantics of upsert exactly - maybe we could use upsert there?

Not sure .save() would be a good candidate for that. In the best case, .save() should execute one INSERT or UPDATE query without extra junk for "error" handling. Ie if you set a pk and want to update then pass force_update into it. I know that we already try UPDATE followed by INSERT in save() but I am wondering how often that is actually used by people.

Cheers,
Florian

Sean Brant

unread,
Jan 10, 2016, 10:22:39 AM1/10/16
to django-d...@googlegroups.com


On Jan 10, 2016, at 2:09 AM, Anssi Kääriäinen <akaa...@gmail.com> wrote:

If I recall correctly, MySQL doesn't offer a way to specify on which index you want to do the conflict resolution. This leads to problems - the upsert might affect the wrong row if there are multiple unique indexes on the table.

PostgreSQL's version of upsert has a problem, too. It doesn't offer a direct way to know if the result of the upsert was an insert or update, but Django needs that knowledge, at least for save(). Maybe there is a way (the oid return value seems promising).

For get_or_create and update_or_create the problem is that the user is free to offer any condition to be used for matching, but PostgreSQL limits the upsert matching to columns in unique index. So, we can use upsert only for unique index cases.

I've always considered this a bug waiting to happen. I have seen many errors with the get operation failing because it returned more then one value. Usually you don't notice the error until you hit production. I always suggest the lookup use fields that have unique indexes.

Changing that would be backwards incompatible so maybe it's a docs issue.


The save() operation matches the semantics of upsert exactly - maybe we could use upsert there?

 - Anssi

On Sunday, January 10, 2016, Cristiano Coelho <cristia...@gmail.com> wrote:
I agree! Also, does this already happen for the MySQL backend? MySQL has the insert on conflict update, that could work the same way.
However, if I'm not wrong, the docs states that the above methods have a race condition (obvious since right now it does two operations), but if the code would actually use native database operations, the race conditions might be gone for those cases, so that should probably be documented as well.

El viernes, 8 de enero de 2016, 21:13:26 (UTC-3), bliy...@rentlytics.com escribió:
Hey Guys,

Postgres 9.5 has added the functionality for UPSERT aka update or insert.  Any interest in aligning UPSERT on the db layer with the get_or_create or update_or_create functionality in django?  Sounds like my company would be interested in doing the work if the PR will get the traction.

-Ben

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ae38ba8e-3e79-47fb-92b9-dd305176c58e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.

Anssi Kääriäinen

unread,
Jan 11, 2016, 1:40:47 AM1/11/16
to django-d...@googlegroups.com
Yes, it is likely that save() wouldn't benefit much from using upsert.
Upsert is likely a bit slower than plain update, and we end up doing
an update in almost all cases anyways. It could be an useful option
for some use cases, but it likely isn't a good default.

- Anssi

Anssi Kääriäinen

unread,
Jan 11, 2016, 1:49:27 AM1/11/16
to django-d...@googlegroups.com
On Sun, Jan 10, 2016 at 5:22 PM, Sean Brant <brant...@gmail.com> wrote:

> I've always considered this a bug waiting to happen. I have seen many errors
> with the get operation failing because it returned more then one value.
> Usually you don't notice the error until you hit production. I always
> suggest the lookup use fields that have unique indexes.
>
> Changing that would be backwards incompatible so maybe it's a docs issue.

If there are no valid use cases for using get_or_create() without an
unique index, then we could consider doing a backwards incompatible
change here. I believe there are cases where you want to do a
get_or_create() without unique index (though I can't come up with one
right now).

We could always check if there is a backing unique index and use
get_or_create only in those cases.

- Anssi

bliy...@rentlytics.com

unread,
Jan 12, 2016, 1:54:03 PM1/12/16
to Django developers (Contributions to Django itself)
After thinking about it a bit, I think the only function that would really benefit from this would be the update_or_create.  If you're doing get_or_create you still need a second query to get the actual row.

john....@plushrugs.com

unread,
Feb 1, 2016, 10:12:02 AM2/1/16
to Django developers (Contributions to Django itself)
It should be possible to use a "RETURNING" clause to get the new row even in the instance of a get_or_create.

I occasionally use an UPDATE ... RETURNING query with Manager.raw to update a table and get modified instances in one shot.

Ben Liyanage

unread,
Feb 1, 2016, 1:04:04 PM2/1/16
to django-d...@googlegroups.com
Hey--that's pretty slick.  I'm not sure when we're going to take a shot at this implementation, but I'll keep that in mind.

-Ben

--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/swBPqFi-Tdk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.

For more options, visit https://groups.google.com/d/optout.



--
Ben Liyanage | Software Engineer | Rentlytics, Inc.
Phone: (410) 336-2464 | Email: bliy...@rentlytics.com
1132 Howard Street, San Francisco CA 94107

bliy...@rentlytics.com

unread,
Sep 7, 2016, 3:53:48 PM9/7/16
to Django developers (Contributions to Django itself)
Hey,

I'm picking up this thread again--sorry for reviving a sleeper.

We're looking into doing this again for some much needed performance improvements.

After looking at the feedback it looks like we would be able to deal with the created flag using the query status.

https://www.postgresql.org/docs/9.5/static/sql-insert.html
Down in the Outputs section is says:
> If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. The single row must have been inserted rather than updated.

Should be able to detect the presence of the OID in the output and set the created flag correctly.

How exactly would I go about overwriting the get_or_create function?  It looks like the get_or create is in the QuerySet object (https://github.com/django/django/blob/3c97ba2a0d3a03e89b27a7a895562e5282018613/django/db/models/query.py#L462-L475). 

And that that is returned in the base manager: https://github.com/django/django/blob/ed0ff913c648b16c4471fc9a9441d1ee48cb5420/django/db/models/manager.py#L146-L151

How would I go about overriding the base manager for postgres only?  I don't see that class instantiated/implmented in the project anywhere.  I'll admit, the github search is kinda lousy so I may have missed the spot.

-Ben


On Friday, January 8, 2016 at 4:13:26 PM UTC-8, bliy...@rentlytics.com wrote:
Reply all
Reply to author
Forward
0 new messages