{{{#!python
Comment.objects.create(name='test', text='test')
Comment.objects.bulk_create(Comment(name='test', text='test'),
on_conflict=IGNORE) # Does not throw an exception
}}}
All the databases we support have syntax for this, as well as
updating/replacing fields.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
> When using bulk_create it would be nice to support `ON CONFLICT DO
> NOTHING`, which allows existing rows to be included in the `bulk_create`
> call, e.g:
>
> {{{#!python
> Comment.objects.create(name='test', text='test')
> Comment.objects.bulk_create(Comment(name='test', text='test'),
> on_conflict=IGNORE) # Does not throw an exception
> }}}
>
> All the databases we support have syntax for this, as well as
> updating/replacing fields.
New description:
When using bulk_create it would be nice to support `ON CONFLICT DO
NOTHING`, which allows existing rows to be included in the `bulk_create`
call, e.g:
{{{#!python
Comment.objects.create(name='test', text='test')
Comment.objects.bulk_create(Comment(name='test', text='test'),
on_conflict=IGNORE) # Does not throw an exception
}}}
All the databases we support have syntax for this, as well as
updating/replacing fields.
This is a spin-off of https://code.djangoproject.com/ticket/28641, just
including the ON CONFLICT idea which is backwards-compatible and would be
easier to implement than the others.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:1>
* stage: Unreviewed => Accepted
Old description:
> When using bulk_create it would be nice to support `ON CONFLICT DO
> NOTHING`, which allows existing rows to be included in the `bulk_create`
> call, e.g:
>
> {{{#!python
> Comment.objects.create(name='test', text='test')
> Comment.objects.bulk_create(Comment(name='test', text='test'),
> on_conflict=IGNORE) # Does not throw an exception
> }}}
>
> All the databases we support have syntax for this, as well as
> updating/replacing fields.
>
> This is a spin-off of https://code.djangoproject.com/ticket/28641, just
> including the ON CONFLICT idea which is backwards-compatible and would be
> easier to implement than the others.
New description:
When using bulk_create it would be nice to support `ON CONFLICT DO
NOTHING`, which allows existing rows to be included in the `bulk_create`
call, e.g:
{{{#!python
Comment.objects.create(name='test', text='test')
Comment.objects.bulk_create(Comment(name='test', text='test'),
on_conflict=IGNORE) # Does not throw an exception
}}}
All the databases we support have syntax for this, as well as
updating/replacing fields.
This is a spin-off of #28641, just including the ON CONFLICT idea which is
backwards-compatible and would be easier to implement than the others.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:2>
* owner: nobody => Tom
* status: new => assigned
Comment:
If in the future we want to add support for `ON CONFLICT UPDATE` we should
design the API with that in mind.
Perhaps have `DO_NOTHING` as a sentinel object, so in the future we can
pass in something else to describe updating (like a dictionary or
somesuch?).
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:3>
* has_patch: 0 => 1
Comment:
Initial PR: https://github.com/django/django/pull/9192
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:4>
Comment (by Дилян Палаузов):
Add documentation what bulk_create() is supposed to return.
I assume this not going to work, as Postgresql has
can_return_ids_from_bulk_insert, but 'INSERT ON CONFLICT DO NOTHING
RETURNING id' returns only the ids of the inserted rows, so the assert
will fail. In particular not inserted rows are not returned.
{{{
- ids = self._batched_insert(objs_without_pk, fields,
batch_size)
+ ids = self._batched_insert(objs_without_pk, fields,
batch_size, on_conflict=on_conflict)
if connection.features.can_return_ids_from_bulk_insert:
assert len(ids) == len(objs_without_pk)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:5>
Comment (by Tom Forbes):
Good point, I'll fix it up tomorrow. There are some other issues I've
found tonight as well.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:6>
* cc: Дилян Палаузов (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:7>
Comment (by Tom Forbes):
I fixed that assertion by skipping returning ID's from the insert if
`on_conflict` is used. They are incompatible, as you cannot tell which
models where inserted and which where skipped.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:8>
Comment (by Дилян Палаузов):
For a table like {{{CREATE TABLE t (id SERIAL PRIMARY KEY, name
VARCHAR(10) UNIQUE, comment VARCHAR(10));}}} it is possible to tell which
models were inserted with a query like
{{{
WITH
to_be_inserted AS (SELECT * FROM (VALUES ('name12', 'comment12'),
('name5', 'comment5'), ('name6', 'comment6')) as g(name, comment)),
successfully_inserted AS (
INSERT INTO t ("name", "comment" ) SELECT *
FROM to_be_inserted ON CONFLICT DO NOTHING RETURNING *)
SELECT s.id FROM to_be_inserted AS b
LEFT JOIN successfully_inserted AS s ON (b.name = s.name AND b.comment =
s.comment);
}}}
where {{{to_be_inserted}}} contains the values that are going into the
database. The returned column contains NULL for values that were
presented in the database, and the id for the inserted rows.
With the proposed changes
{{{
- if connection.features.can_return_ids_from_bulk_insert:
+ if connection.features.can_return_ids_from_bulk_insert
and not on_conflict == 'ignore':
assert len(ids) == len(objs_without_pk)
for obj_without_pk, pk in zip(objs_without_pk, ids):
obj_without_pk.pk = pk
}}}
the for-loop will not work, as the amount of ids is not the same as the
amount of obj_without_pk.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:9>
Comment (by Chris Beck):
Is there any consideration of this being included in a future 1.11.x
release? I'd really like to be able to use bulk_create for a fire-hose of
data that we're getting from a 3rd party who don't guarantee uniqueness so
at the moment our only solution is to remove the unique_together attribute
and enforce it at the reporting level.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:10>
Comment (by Tim Graham):
No, new features aren't backported to stable branches. See our
[https://docs.djangoproject.com/en/dev/internals/release-process
/#supported-versions supported versions policy].
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:11>
* Attachment "on_conflict_postgresql.patch" added.
this adds INSERT ON CONFLICT IGNORE to Postgresql, where only IDs of the
actually inserted rows are returned
Comment (by Дилян Палаузов):
Let's start with the most complex case: calling
{{{.objects.bulk_create(..., on_conflict='ignore')}}} where the caller
wants to determine which of the supplied objects were actually inserted
(in order to call signals on them). We can start with the attached
minimalistic patch, which does this for Postgresql as only RDBMS allowing
efficient handling of this most complex case, and see how to implement it
in the other RDBMs.
Unfortunately, the sequence counter is automatically increased for each
object provided as parameter, even if at the end the object was already in
the database.
I propose extending the logic for bulk_create to send implicit post_save
signals for Postgresql.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:12>
* Attachment "on_conflict_postgresql.patch" added.
this adds INSERT ON CONFLICT IGNORE to Postgresql, where only IDs of the
actually inserted rows are returned
--
* Attachment "on_conflict_postgresql.patch" added.
this adds INSERT ON CONFLICT IGNORE to Postgresql, where only IDs of the
actually inserted rows are returned
--
* Attachment "on_conflict_postgresql_signals_ids.patch" added.
In addition to the other file, this one add support for sending post_save
for the newly created objects and retrieving the ids of the objects
provided as parameters to bulk_create, that already existed in the
database.
Comment (by Tom Forbes):
Hey, I recommend consolidating your patches into a single PR on Github and
linking it here, that's the development process Django now follows.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:13>
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544. I
have the feeling reset_sequence = True doesn't work as expected, as the
IDs are random. This patch integrated the changes of om Forbes and
previous changes of me.
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544.
--
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544.
--
* Attachment "on_conflict_ignore.2.patch" added.
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544.
--
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544.
--
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544.
--
Comment (by Дилян Палаузов):
How essential is it, that objects inserted in the database, have their
{{{._state.adding}}} and {{{._state.db}}} set? The code from
{{{bulk_create(..., on_conflict='ignore')}}} below is not called when
ids=[], which happens when database does not return the IDs.
{{{
for obj_without_pk, pk in zip(objs_without_pk, ids):
obj_without_pk.pk = pk
obj_without_pk._state.adding = False
obj_without_pk._state.db = self.db
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:14>
* Attachment "on_conflict_ignore.patch" added.
The on_conflict_ignore.patch adds on_conflict='ignore' to
QuerySet.bulk_create and empowers bulk_create optionally to send post_save
signals, when the ID of the inserted object is known to bulk_create. For
postgresql bulk_create offers in addition to retrieve the IDs of the newly
inserted objects, when using on_conflict='ignore' and, with a second
query, to find the PKs pf the supplied objs that were already in the
databse. When supported by the backend (so not Oracle),
on_conflict='ignore' is added to bulk_create() in
django.db.models.fields.create_forward_many_to_many_manager.ManyRelatedManager._add_items,
making ManyRelatedManager.add() thread-safe, possibly resolving #19544.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668>
* Attachment "on_conflict_ignore.2.patch" added.
Comment (by Tom Forbes):
Hey, please create a branch on Github and submit a Pull Request with these
changes if you want them to be reviewed and tested.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:15>
Comment (by shlomoa):
Replying to [comment:15 Tom Forbes]:
> Hey, please create a branch on Github and submit a Pull Request with
these changes if you want them to be reviewed and tested.
Anyway to contact the contributor to expedite this?
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:16>
Comment (by Дилян Палаузов):
The contributor does not work currently with Django. To expedite the
process you can follow what was stated in the last comment.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:17>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"f1fbef6cd171ddfae41fcc901f1f60ccad039f51" f1fbef6]:
{{{
#!CommitTicketReference repository=""
revision="f1fbef6cd171ddfae41fcc901f1f60ccad039f51"
Fixed #28668 -- Allowed QuerySet.bulk_create() to ignore insert conflicts.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:18>
Comment (by Дилян Палаузов):
It is good to see progress here but can this ticket be kept open until
(for Postgres) also the possibility is implemented to return the IDs, and
the possibility tonsend post_save?
To what I remember .add() had also concurrent deficiencies, which could be
solved with ON CONFLICT IGNORE, I am nkt sure they are addressed with the
fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:19>
Comment (by Tim Graham):
I think we should open new tickets for further enhancements.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:20>
Comment (by Simon Charette):
> ... also the possibility is implemented to return the IDs
How would you expect this to work? Would `pk` only be assigned to objects
that didn't cause conflict?
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:21>
Comment (by Дилян Палаузов):
When I wrote initially on this, I meant having the knkwledge of which
records were inserted or nkt, when possible. I don't see a point to open
a new ticket, proving all the history on the matter is already here.
I expect it to work as in tbe proposed patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:22>
* status: closed => new
* resolution: fixed =>
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:23>
* status: new => closed
* resolution: => fixed
Comment:
Hey Дилян, I think this ticket is considered fixed now. If you wish to add
backend specific support for returning the pk's of inserted records we
should do this as another ticket, and it no doubt will need a bit of
discussion around the implementation. There is not much of value here that
we could not carry over to a new ticket.
I'm happy to make a new ticket if you do not wish to?
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:24>
Comment (by Дилян Палаузов):
Please create a new ticket.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:25>
Comment (by ghoostqq):
Is this feature implemented?
I get exception:
{{{
bulk_create() got an unexpected keyword argument 'ignore_conflicts'
}}}
Django version: 2.1.1
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:26>
Comment (by Simon Charette):
ghoostqq, it should be part of Django 2.2.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:27>
* status: closed => new
* resolution: fixed =>
Comment:
Was a new ticket created?
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:28>
* status: new => closed
* resolution: => fixed
Comment:
Please don't reopen tickets for the sole purpose of leaving a comment.
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:29>
Comment (by Дилян Палаузов):
In https://code.djangoproject.com/ticket/28668#comment:24 you offered to
deviate from this ticket a new one. Where do i find it?
--
Ticket URL: <https://code.djangoproject.com/ticket/28668#comment:30>