bulk_create on Postgresql: on conflict do nothing / post_save signal

1,256 views
Skip to first unread message

Дилян Палаузов

unread,
Sep 28, 2017, 1:11:47 PM9/28/17
to django-d...@googlegroups.com

Hello,

I want after a user request to be sure that certain objects are stored
in a Postgres database, even if before the request some of the objects
were there.

The only way I can do this with django, not talking about raw sql, is
with "for obj in objects: Model.objects.get_or_create(obj)". It works,
but creates several INSERTs, and is hence suboptimal.

I cannot use bulk_create(), which squeezes all the INSERTs to a single
one, as it does not work, if any of the to-be-inserted rows was already
in the database.

In Postgresql this can be achieved by sending "INSERT ... ON CONFLICT DO
NOTHING".

I propose changing the interface of QuerySet.bulk_create to accept one
more parameter on_conflict, that can be a string e.g. "DO NOTHING" or
Q-objects (which could be used to implement ON CONFLICT DO UPDATE SET
... WHERE ...

def bulk_create(self, objs, batch_size=None, on_conflict=None): ...

What are the arguments against or in favour?

The further, bulk_create() does not send post_save signal, because it is
difficult to implement with the standard backends, except with postgresql.

I propose extending the implementation to send the signal:

https://code.djangoproject.com/ticket/28641#comment:1

when Postgresql is used. I assume there a no users, who want to get a
(post_save) signal on save() but not on bulk_create().

Combining ON CONFLICT DO NOTHING with sending post_save gets however
nasty, as "INSERT ... ON CONFLICT DO NOTHING RETURNING id;" does not
return anything on unchanged rows, hence the system knows at the end how
much rows were changed, but not which, so it cannot determine for which
objects to send post_save. At least I have not found a way how to
figure out which rows were inserted/not inserted.

However, this can be achieved by RETURNING * and then comparing the
returned objects to the sent objects, eventually making bulk_create()
return the objects actually inserted in the database.

These changes will allow a switch to a single INSERT on Postgresql.

Regards
Дилян

Tom Forbes

unread,
Sep 28, 2017, 1:20:44 PM9/28/17
to django-d...@googlegroups.com
I've been in similar situations before, you can usually get away with using a single query to fetch existing records and only pass data that doesn't exist to bulk_create. This works great for a single identity column, but if you have multiple it gets messy.

It seems all supported databases offer at least ON CONFLICT IGNORE in some form or another, with pretty similar syntax.

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/daa88462-c095-dfcc-2ce7-6d34f6bbc2f6%40aegee.org.
For more options, visit https://groups.google.com/d/optout.

Дилян Палаузов

unread,
Oct 1, 2017, 1:45:39 PM10/1/17
to django-d...@googlegroups.com, Tom Forbes
Hello,

fetching 3GB of existing records to only pass afterwards to
bulk_create() some non-existent ones is not feasible.

I found a way to convince Postgresql to report which rows were not
inserted on INSERT ON CONFLICT DO UPDATE:

Consider this:
CREATE TABLE t (
id SERIAL PRIMARY KEY,
name VARCHAR(10) NOT NULL UNIQUE,
comment VARCHAR(10) NOT NULL);

And now the magic:

WITH
to_be_inserted AS (
SELECT 'name1' AS "name", 'comment1' as "comment" UNION ALL
SELECT 'name4', 'comment4' UNION ALL
SELECT 'name5', 'comment5'),
successfully_inserted AS (
INSERT INTO t ("name", "comment" ) SELECT *
FROM to_be_inserted ON CONFLICT DO NOTHING RETURNING *)
SELECT s.id FROM to_be_inserted AS b
LEFT JOIN successfully_inserted AS s ON (b.name = s.name AND
b.comment = s.comment);

Returns a column "id" where for each record from to_be_inserted the id
is NULL for already existing records, or the new identifier.

This way bulk_create() can be implemented, so that it sends post_save
signal for all records created, forwards ON CONFLICT DO NOTHING to the
database and returns only the objects from its input, which were
actually created.

Looking at the existing code, my feeling is that this query does not fit
anyhow in the current approaches, hence I will be very glad if somebody
gets expired from this idea and implements it in Django.

Greetings
Дилян

On 09/28/2017 07:20 PM, Tom Forbes wrote:
> I've been in similar situations before, you can usually get away with
> using a single query to fetch existing records and only pass data that
> doesn't exist to bulk_create. This works great for a single identity
> column, but if you have multiple it gets messy.
>
> It seems all supported databases offer at least ON CONFLICT IGNORE in
> some form or another, with pretty similar syntax.
>
> On 28 Sep 2017 18:11, "Дилян Палаузов" <dpa-d...@aegee.org
> send an email to django-develop...@googlegroups.com
> <mailto:django-developers%2Bunsu...@googlegroups.com>.
> To post to this group, send email to
> django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> <https://groups.google.com/group/django-developers>.
> <https://groups.google.com/d/msgid/django-developers/daa88462-c095-dfcc-2ce7-6d34f6bbc2f6%40aegee.org>.
> For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com
> <mailto:django-develop...@googlegroups.com>.
> To post to this group, send email to django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CAFNZOJOUGzYixc4cvPF2%2B_VTo2YwqwVDk%2BTkErxW3hjXvpaXbQ%40mail.gmail.com
> <https://groups.google.com/d/msgid/django-developers/CAFNZOJOUGzYixc4cvPF2%2B_VTo2YwqwVDk%2BTkErxW3hjXvpaXbQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
Reply all
Reply to author
Forward
0 new messages