--
Ticket URL: <https://code.djangoproject.com/ticket/19544>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* cc: Kronuz (added)
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:1>
Comment (by aaugustin):
Which database are you using, and under which isolation level are you
running?
Can you provide the full traceback?
Thanks.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:2>
Comment (by Kronuz):
This is the traceback I'm getting:
{{{
Task core.tasks._no_retry_task_generic[802b4b61-be07-41ed-a7fa-
f5f8c671b4d8] raised exception: IntegrityError('duplicate key value
violates unique constraint
"social_connect_socialcontact_entit_socialcontact_id_user_id_key"\nDETAIL:
Key (socialcontact_id, user_id)=(56151, 2146) already exists.\n',)
Stacktrace (most recent call last):
File "celery/task/trace.py", line 212, in trace_task
R = retval = fun(*args, **kwargs)
File "core/tasks.py", line 70, in _no_retry_task_generic
return _run_task(_no_retry_task_generic, *args, **kwargs)
File "core/tasks.py", line 34, in _run_task
return callback(*_args, **_kwargs) # do not pass **kwargs (as it has
"magic" stuff)??
File "social_connect/Facebook/views.py", line 246, in _callback
join_contacts(user, social_connect_contacts, site)
File "relationships/utils.py", line 122, in join_contacts
contact.entities.add(from_entity)
File "django/db/models/fields/related.py", line 625, in add
self._add_items(self.source_field_name, self.target_field_name, *objs)
File "django/db/models/fields/related.py", line 710, in _add_items
for obj_id in new_ids
File "django/db/models/query.py", line 421, in bulk_create
self.model._base_manager._insert(objs_without_pk, fields=[f for f in
fields if not isinstance(f, AutoField)], using=self.db)
File "django/db/models/manager.py", line 203, in _insert
return insert_query(self.model, objs, fields, **kwargs)
File "django/db/models/query.py", line 1581, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "django/db/models/sql/compiler.py", line 910, in execute_sql
cursor.execute(sql, params)
File "django/db/backends/util.py", line 41, in execute
return self.cursor.execute(sql, params)
File "django/db/backends/postgresql_psycopg2/base.py", line 52, in
execute
return self.cursor.execute(query, args)
}}}
As you can see, I'm using PostgreSQL, and I'm receiving this traceback
from a Celery task.
The `contact.entities.add(from_entity)` call is in a loop, which I'm now
changing to pass an array to add everything at once instead. Hopefully
that will fix the problem; maybe some cache in the `entities` manager is
preventing `_add_items()` from properly getting the primary keys that are
already in use.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:3>
* component: Uncategorized => Database layer (models, ORM)
* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Someday/Maybe
Comment:
Looks like there is a race condition. The ways to fix this seem to be:
- Locking the m2m table (or some portion of it). This will lead to
deadlocks so it is a no-go.
- Retrying until done: create a savepoint before bulk_create; if
IntegrityError is raised, roll back to the savepoint and run the select
and bulk_create again. The cost here is an additional savepoint, which can
double the cost of .add().
- Doing the insert using SQL like:
{{{
INSERT INTO m2mtbl(lhs_id, rhs_id)
    (SELECT tmp.lhs_id, tmp.rhs_id
     FROM (VALUES (1, 2), (1, 3), (1, 4)) AS tmp(lhs_id, rhs_id)
     LEFT JOIN m2mtbl
       ON m2mtbl.lhs_id = tmp.lhs_id AND m2mtbl.rhs_id = tmp.rhs_id
     WHERE m2mtbl.id IS NULL);
}}}
The last SQL is actually quite nice - we are guaranteed atomicity, and we
need to execute just a single SQL statement instead of the two done
currently. This is going to perform better unless the inserted set is huge
and most of the values already exist. However, the SQL used is *highly* DB
backend specific, so this isn't an easy addition to make.
I am going to mark this someday/maybe. Getting rid of the race condition
would be nice, but I don't see an easy and well performing way to do that.
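As a rough illustration of the single-statement idea above, here is a minimal sketch using Python's standard `sqlite3` module. The table layout and the `add_missing_pairs` helper are assumptions made for this example, and the `VALUES` list is wrapped in a CTE because SQLite does not accept the `AS tmp(lhs_id, rhs_id)` column alias form. It runs on a single connection, so it only shows the shape of the statement and does not demonstrate behaviour under concurrency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m2mtbl ("
    " id INTEGER PRIMARY KEY,"
    " lhs_id INTEGER NOT NULL,"
    " rhs_id INTEGER NOT NULL,"
    " UNIQUE (lhs_id, rhs_id))"
)
# One relation already exists before the bulk add.
conn.execute("INSERT INTO m2mtbl (lhs_id, rhs_id) VALUES (1, 2)")


def add_missing_pairs(conn, pairs):
    """Insert only the (lhs_id, rhs_id) pairs that are not stored yet."""
    if not pairs:
        return
    placeholders = ", ".join("(?, ?)" for _ in pairs)
    params = [value for pair in pairs for value in pair]
    conn.execute(
        f"""
        WITH tmp(lhs_id, rhs_id) AS (VALUES {placeholders})
        INSERT INTO m2mtbl (lhs_id, rhs_id)
        SELECT tmp.lhs_id, tmp.rhs_id
        FROM tmp
        LEFT JOIN m2mtbl
          ON m2mtbl.lhs_id = tmp.lhs_id AND m2mtbl.rhs_id = tmp.rhs_id
        WHERE m2mtbl.id IS NULL
        """,
        params,
    )


add_missing_pairs(conn, [(1, 2), (1, 3), (1, 4)])
stored = conn.execute(
    "SELECT lhs_id, rhs_id FROM m2mtbl ORDER BY lhs_id, rhs_id"
).fetchall()
```

The anti-join filters out `(1, 2)`, so the unique constraint is never hit in this single-connection run.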
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:4>
Comment (by Patrick Robertson <django@…>):
Just my 2¢ - I just got this error on my production server, which is
pretty nasty.
I'm using Postgres.
Any temporary solutions would be appreciated, as for me at least this is
quite a serious bug.
This is just from a simple `form.save()` call, and I haven't experienced
it once in my testing.
I'd argue that this should be marked with a higher priority, regardless of
how easy it is to implement... :)
Traceback:
{{{
File "/somewhere/webapps/ordersystem/ordersystem/ordersystem/views.py",
line 1188, in form_valid
customer = form.save()
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/forms/models.py", line 446, in save
construct=False)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/forms/models.py", line 100, in save_instance
save_m2m()
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/forms/models.py", line 96, in save_m2m
f.save_form_data(instance, cleaned_data[f.name])
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/fields/related.py", line 1529, in save_form_data
setattr(instance, self.attname, data)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/utils/functional.py", line 242, in __setattr__
setattr(self._wrapped, name, value)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/fields/related.py", line 842, in __set__
manager.add(*value)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/fields/related.py", line 583, in add
self._add_items(self.source_field_name, self.target_field_name, *objs)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/fields/related.py", line 672, in _add_items
for obj_id in new_ids
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/query.py", line 359, in bulk_create
self._batched_insert(objs_without_pk, fields, batch_size)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/query.py", line 838, in _batched_insert
using=self.db)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/manager.py", line 232, in _insert
return insert_query(self.model, objs, fields, **kwargs)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/query.py", line 1514, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/models/sql/compiler.py", line 903, in execute_sql
cursor.execute(sql, params)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/utils.py", line 99, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-
packages/django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
IntegrityError: duplicate key value violates unique constraint
"ordersystem_customer_disliked_vege_customer_id_vegetable_id_key"
DETAIL: Key (customer_id, vegetable_id)=(67, 37) already exists.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:5>
Comment (by Patrick Robertson <django@…>):
Looking at the code (very briefly), I think _add_items in
db/models/fields/related.py (line 672 ish) might be what's causing the
problem.
We're saving a list of ids `new_ids` which is later used for the bulk
creation. As you can see, if this method were called twice at the same
time for whatever reason (I think in my case it was probably a double
submit of the form) then a race condition could occur.
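The interleaving described here can be replayed deterministically. The following is a minimal sketch with the standard `sqlite3` module, not Django's code: the `m2m_through` table and both helpers are hypothetical stand-ins for the "select existing ids, then insert the rest" steps, and the two callers are simulated by ordering the calls by hand on one connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m2m_through ("
    " id INTEGER PRIMARY KEY,"
    " source_id INTEGER NOT NULL,"
    " target_id INTEGER NOT NULL,"
    " UNIQUE (source_id, target_id))"
)


def missing_ids(conn, source_id, target_ids):
    """Step 1: work out which target ids are not related yet."""
    marks = ", ".join("?" for _ in target_ids)
    rows = conn.execute(
        f"SELECT target_id FROM m2m_through"
        f" WHERE source_id = ? AND target_id IN ({marks})",
        [source_id, *target_ids],
    )
    existing = {row[0] for row in rows}
    return [t for t in target_ids if t not in existing]


def insert_ids(conn, source_id, target_ids):
    """Step 2: insert the ids computed in step 1."""
    conn.executemany(
        "INSERT INTO m2m_through (source_id, target_id) VALUES (?, ?)",
        [(source_id, t) for t in target_ids],
    )


# Both callers run step 1 before either of them runs step 2.
new_a = missing_ids(conn, 67, [37])
new_b = missing_ids(conn, 67, [37])

insert_ids(conn, 67, new_a)  # first caller succeeds
try:
    insert_ids(conn, 67, new_b)  # second caller works from a stale answer
    race_error = None
except sqlite3.IntegrityError as exc:
    race_error = exc
```

Both callers conclude that id 37 is missing, so the second insert violates the unique constraint, which matches the shape of the tracebacks in this ticket.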
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:6>
* status: new => closed
* resolution: => fixed
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:7>
Comment (by nihilifer):
I see that it's set as fixed and closed. Is that because this bug cannot
be reproduced, or because it was really fixed?
I'm asking because I have the same problem on Django 1.5.8.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:8>
* status: closed => new
* resolution: fixed =>
Comment:
Reopening because the ticket was closed anonymously without any detail.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:9>
Comment (by jdufresne):
I am running into the same issue using MySQL and Django 1.6.7. Looks like
it may be an issue across multiple database backends. Here is the
backtrace:
{{{
Traceback (most recent call last):
File "manage.py", line 13, in <module>
execute_from_command_line(sys.argv)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/core/management/__init__.py", line 399, in
execute_from_command_line
utility.execute()
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/core/management/__init__.py", line 392, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/core/management/base.py", line 242, in run_from_argv
self.execute(*args, **options.__dict__)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/core/management/base.py", line 285, in execute
output = self.handle(*args, **options)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/core/management/base.py", line 415, in handle
return self.handle_noargs(**options)
File
"/home/jon/devel/myproj/development/myapp/management/commands/thetest.py",
line 36, in handle_noargs
MyModel.objects.bulk_create(objs)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/models/manager.py", line 160, in bulk_create
return self.get_queryset().bulk_create(*args, **kwargs)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/models/query.py", line 359, in bulk_create
self._batched_insert(objs_without_pk, fields, batch_size)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/models/query.py", line 838, in _batched_insert
using=self.db)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/models/manager.py", line 232, in _insert
return insert_query(self.model, objs, fields, **kwargs)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/models/query.py", line 1514, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/models/sql/compiler.py", line 903, in execute_sql
cursor.execute(sql, params)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/backends/util.py", line 69, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/utils.py", line 99, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/django/db/backends/mysql/base.py", line 124, in execute
return self.cursor.execute(query, args)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/MySQLdb/cursors.py", line 205, in execute
self.errorhandler(self, exc, value)
File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
django.db.utils.IntegrityError: (1062, "Duplicate entry 'mytype-7-6' for
key 'type'")
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:10>
Comment (by jdufresne):
I have investigated this a bit more. For me the race occurs for the
following reason:
Suppose you have `MyModel` that has a unique constraint. Suppose you have
a view of the form:
{{{
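from django.db import transaction

@transaction.atomic
def my_view(request):
    # Delete the rows created by the previous request, then recreate them.
    MyModel.objects.some_long_filter().delete()
    objs = builds_list_of_unsaved_models_from_request(request)
    MyModel.objects.bulk_create(objs)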
}}}
That is, a view that builds a list of objects in bulk. The view must first
delete the objects that were created by the previous request to ensure the
unique constraint is not violated. From my investigation, the `delete()`
will not actually call an SQL `DELETE` statement, as I originally
expected, but instead caches a list of ids of objects that exist, then
calls a new SQL query with a `DELETE .. WHERE id IN (list of ids)`. This
all happens in `DeleteQuery.delete_qs()`, in
`django/db/models/sql/subqueries.py:52`.
Now suppose you have some original data. Then two requests occur at the
same time. The first request and the second request will both cache the
same ids from the original data to delete. Suppose the first request
finishes slightly earlier and commits the *new objects with new ids*. Now,
the second request will try to delete the cached ids from the original
data and *not* from the first request. Upon save there is a unique
constraint violation -- as the wrong ids were deleted -- causing the
`IntegrityError`.
Coming from an application with many raw SQL queries, this unique
constraint violation was a big surprise with the use of transactions. The
caching of ids in the `delete()` function seems a sneaky source of race
conditions.
Now, having investigated this, I am not sure if my case is the same bug
that was originally reported. If anyone thinks I should be filing this as
a separate ticket, I can move it.
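The sequence described in this comment can be replayed by hand. This is a minimal sketch with the standard `sqlite3` module, assuming a hypothetical `mymodel` table with a unique `type` column; the two requests are simulated by ordering the calls on one connection, so it illustrates the stale-id problem only and says nothing about how a particular isolation level would schedule real transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT guarantees that deleted ids are never reused.
conn.execute(
    "CREATE TABLE mymodel ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " type TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO mymodel (type) VALUES ('mytype-7-6')")  # id 1


def cache_ids(conn):
    """What delete() does first: remember the ids that currently exist."""
    return [row[0] for row in conn.execute("SELECT id FROM mymodel")]


def delete_cached(conn, ids):
    """Then delete by the remembered ids, not by the original filter."""
    conn.executemany("DELETE FROM mymodel WHERE id = ?", [(i,) for i in ids])


def create(conn, value):
    conn.execute("INSERT INTO mymodel (type) VALUES (?)", (value,))


# Both requests cache the ids of the original data.
ids_a = cache_ids(conn)
ids_b = cache_ids(conn)

# Request A finishes first: old row removed, new row stored with a new id.
delete_cached(conn, ids_a)
create(conn, "mytype-7-6")

# Request B deletes its stale ids, which no longer match anything.
delete_cached(conn, ids_b)
try:
    create(conn, "mytype-7-6")
    error = None
except sqlite3.IntegrityError as exc:
    error = exc

remaining = cache_ids(conn)
```

Request B's delete is a no-op because the row written by request A has a new id, so B's insert collides with it.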
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:11>
Comment (by akaariai):
I don't think comment:11 is talking about this ticket's issue (that is, it
is not about m2m add()).
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:12>
* cc: jon.dufresne@… (added)
Comment:
> I don't think comment:11 is talking about this ticket's issue (that is,
it is not about m2m add()).
I have moved my issue to ticket #23576.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:13>
Comment (by lechatpito):
I think I have the same error using 1.6.7. Any idea how to fix this?
Here is the trace:
{{{
Stacktrace (most recent call last):
File "celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "celery/app/trace.py", line 421, in __protected_call__
return self.run(*args, **kwargs)
File "itemizer/tasks.py", line 125, in itemize
write_entity_object_to_model(entity, ranking.top3_item,
ranking.category, ranking)
File "itemizer/tasks.py", line 453, in write_entity_object_to_model
e.categories.add(category)
File "django/db/models/fields/related.py", line 583, in add
self._add_items(self.source_field_name, self.target_field_name, *objs)
File "django/db/models/fields/related.py", line 672, in _add_items
for obj_id in new_ids
File "django/db/models/query.py", line 359, in bulk_create
self._batched_insert(objs_without_pk, fields, batch_size)
File "django/db/models/query.py", line 838, in _batched_insert
using=self.db)
File "django/db/models/manager.py", line 232, in _insert
return insert_query(self.model, objs, fields, **kwargs)
File "django/db/models/query.py", line 1514, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "django/db/models/sql/compiler.py", line 903, in execute_sql
cursor.execute(sql, params)
File "django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "django/db/utils.py", line 99, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "django/db/backends/mysql/base.py", line 124, in execute
return self.cursor.execute(query, args)
File "MySQLdb/cursors.py", line 205, in execute
self.errorhandler(self, exc, value)
File "MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:14>
Comment (by guettli):
This Stack Overflow question tries to find a workaround:
http://stackoverflow.com/questions/37654072/django-integrityerror-during-many-to-many-add/37968648
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:15>
Comment (by gavinwahl):
The error is now in ManyRelatedManager._add_items. We try to get a list of
existing ids in `vals`, then only create ones that don't exist already.
This of course causes a race when two threads do that at the same time.
I think the only real solution is ON CONFLICT IGNORE.
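To make the conflict-ignoring insert concrete, here is a minimal sketch with the standard `sqlite3` module. It uses SQLite's `INSERT OR IGNORE` spelling; PostgreSQL 9.5+ spells the same idea `INSERT ... ON CONFLICT DO NOTHING` and MySQL uses `INSERT IGNORE`. The `m2m_through` table and the helper name are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m2m_through ("
    " id INTEGER PRIMARY KEY,"
    " source_id INTEGER NOT NULL,"
    " target_id INTEGER NOT NULL,"
    " UNIQUE (source_id, target_id))"
)


def add_ignoring_conflicts(conn, source_id, target_ids):
    """Insert every pair; rows that already exist are skipped silently."""
    conn.executemany(
        "INSERT OR IGNORE INTO m2m_through (source_id, target_id)"
        " VALUES (?, ?)",
        [(source_id, t) for t in target_ids],
    )


# Two overlapping adds, as two racing callers might issue them.
add_ignoring_conflicts(conn, 56151, [2146])
add_ignoring_conflicts(conn, 56151, [2146, 2147])

rows = conn.execute(
    "SELECT source_id, target_id FROM m2m_through ORDER BY target_id"
).fetchall()
```

Because the database resolves the duplicate itself, no prior "which ids are missing" query is needed for correctness.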
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:16>
* cc: Andrey Fedoseev (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:17>
Comment (by Дилян Палаузов):
I rediscovered this phenomenon in #28514.
I think what needs to be done is to teach bulk_create() the {{{INSERT ...
ON CONFLICT (DO NOTHING|IGNORE)}}} semantic and use this logic in
{{{django.db.models.fields.related_descriptors.create_forward_many_to_many_manager.ManyRelatedManager._add_items()}}}.
#28668 is my proposal.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:18>
* cc: Дилян Палаузов (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:19>
* cc: Simon Charette (added)
* stage: Someday/Maybe => Accepted
Comment:
Now that `bulk_create(ignore_conflicts=True)` has landed in #28668 it
should be possible to rely on it to address this issue for auto-created
many-to-many relationships.
AFAIK it's only possible to do for such relationships because we know the
exact schema of these tables and that a conflict can only happen for
`(source_field_name, target_field_name)` tuples. This is not the case for
manually defined `through` models, which could define other
`unique_together` constraints and have been allowed to be `.add(...)`'ed
under certain circumstances since #9475 was merged.
Now that
[https://github.com/django/django/commit/39ebdf5a3c5e4e979c71a18b4bad73e3946b02d0
master dropped support for PostgreSQL 9.4] the issue should only remain on
[https://github.com/django/django/blob/b8c48d06fab3f1c9d52c28422a4a1b8350f5537f/django/db/backends/oracle/features.py#L56
Oracle which doesn't support this feature].
I guess the code in `_add_items` could also be adapted to use
`get_or_create` when `supports_ignore_conflicts=False` or to deal with
`through` models if we had some form of `IntegrityError` introspection in
place to determine which constraint was violated. I think we should keep
this ticket focused on dealing with the most common case that can be
addressed with `ignore_conflicts` though.
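For a backend without conflict-ignoring inserts, one possible fallback in the spirit of `get_or_create` is to insert row by row inside a savepoint and treat an `IntegrityError` as "already there". The sketch below uses the standard `sqlite3` module rather than Django; the `m2m_through` table and the `add_one` helper are hypothetical, and this is not what the patch for this ticket does.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# SAVEPOINT / RELEASE statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE m2m_through ("
    " id INTEGER PRIMARY KEY,"
    " source_id INTEGER NOT NULL,"
    " target_id INTEGER NOT NULL,"
    " UNIQUE (source_id, target_id))"
)


def add_one(conn, source_id, target_id):
    """Insert one relation; return True if created, False if it existed."""
    conn.execute("SAVEPOINT add_one")
    try:
        conn.execute(
            "INSERT INTO m2m_through (source_id, target_id) VALUES (?, ?)",
            (source_id, target_id),
        )
    except sqlite3.IntegrityError:
        # Undo only this insert, then close the savepoint.
        conn.execute("ROLLBACK TO SAVEPOINT add_one")
        conn.execute("RELEASE SAVEPOINT add_one")
        return False
    conn.execute("RELEASE SAVEPOINT add_one")
    return True


first = add_one(conn, 1, 2)
second = add_one(conn, 1, 2)
count = conn.execute("SELECT COUNT(*) FROM m2m_through").fetchone()[0]
```

The savepoint per row is the extra cost mentioned earlier in the ticket, which is why a single conflict-ignoring statement is preferable where the backend supports it.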
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:20>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/10995 PR] relying on
`ignore_conflicts` for auto-created intermediary models. The behaviour of
`IntegrityError` surfacing for user-defined `through` model conflicts
happens to be tested by `m2m_through.tests.M2mThroughTests`.
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:21>
* status: new => assigned
* owner: nobody => Simon Charette
Comment:
The patch should be ready for review. It also implements a fast-add
analogous to the fast-delete algorithm when the relation and the
backend allow it (e.g. when no `m2m_changed` receivers are connected).
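The conditions discussed in this thread can be written down as a small decision function. This is only a reading of the comments here, with hypothetical names; it is not the code from the pull request.

```python
def plan_add(auto_created_through, supports_ignore_conflicts,
             has_m2m_changed_receivers):
    """Return (can_ignore_conflicts, can_fast_add) for an m2m add().

    All three parameters are illustrative booleans:
    - auto_created_through: the intermediary model was generated
      automatically, so its only unique constraint is the id pair.
    - supports_ignore_conflicts: the backend can ignore insert conflicts.
    - has_m2m_changed_receivers: signal receivers need the list of ids
      that were actually added.
    """
    can_ignore_conflicts = auto_created_through and supports_ignore_conflicts
    # The single-query path skips the "which ids are missing" SELECT, so it
    # is only usable when nobody needs to be told which ids were new.
    can_fast_add = can_ignore_conflicts and not has_m2m_changed_receivers
    return can_ignore_conflicts, can_fast_add
```

When receivers are connected, conflicts can still be ignored on insert, but the missing ids have to be computed first so the signal can report them.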
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:22>
Comment (by Tim Graham <timograham@…>):
In [changeset:"dd32f9a3a21272e784d434a6f9ca9f07aeedb50a" dd32f9a]:
{{{
#!CommitTicketReference repository=""
revision="dd32f9a3a21272e784d434a6f9ca9f07aeedb50a"
Refs #19544 -- Extracted ManyRelatedManager.add() missing ids logic to a
method.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:23>
Comment (by Tim Graham <timograham@…>):
In [changeset:"de7f6b51b21747e19e90d9e3e04e0cdbf84e8a75" de7f6b51]:
{{{
#!CommitTicketReference repository=""
revision="de7f6b51b21747e19e90d9e3e04e0cdbf84e8a75"
Refs #19544 -- Added a fast path for through additions if supported.
The single query insertion path is taken if the backend supports inserts
that ignore conflicts and m2m_changed signals don't have to be sent.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:25>
Comment (by Tim Graham <timograham@…>):
In [changeset:"28712d8acfffa9cdabb88cb610bae14913fa185d" 28712d8a]:
{{{
#!CommitTicketReference repository=""
revision="28712d8acfffa9cdabb88cb610bae14913fa185d"
Refs #19544 -- Ignored auto-created through additions conflicts if
supported.
This prevents IntegrityError caused by race conditions between missing ids
retrieval and bulk insertions.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:24>
* has_patch: 1 => 0
Comment:
Simon, should the ticket be closed or is there additional work that could
be done?
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:26>
* version: 1.4 => master
Comment:
Tim, the issue still remains on Oracle since it doesn't support
`bulk_create(ignore_conflicts=True)`.
Also, now that `add(through_defaults)` support was added the optimization
cannot be used for these cases for the reasons detailed in comment:20.
I'll add to this comment that `get_or_create` cannot be used to work
around the issue unless it's somehow adapted not to send
`pre_save`/`post_save` signals.
To summarize, the issue is still present on Oracle, and on all backends
when using `add` on a relationship defined through a user-defined
`ManyToMany(through)`. I think we should favor renaming this ticket
instead of opening a new one since it contains a great discussion about
the reasoning behind the changes.
I went ahead and did just that but feel free to revert my change or adjust
the title to be more appropriate. Thanks!
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:27>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"379bf1a2d41494360d86bc3cf8adc482abca5d63" 379bf1a2]:
{{{
#!CommitTicketReference repository=""
revision="379bf1a2d41494360d86bc3cf8adc482abca5d63"
Fixed #8467 -- Prevented crash when adding existent m2m relation with an
invalid type.
This wasn't an issue anymore on backends that allow conflicts to be
ignored (Refs #19544) as long as the provided values were coercible to the
expected type. However, on the remaining backends that don't support
this feature, namely Oracle, this could still result in an
IntegrityError.
By attempting to coerce the provided values to the expected types in
Python beforehand we allow the existing value set intersection in
ManyRelatedManager._get_missing_target_ids to prevent the problematic
insertion attempts.
Thanks Baptiste Mispelon for triaging this old ticket against the
current state of the master branch.
}}}
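The effect of coercing values before the set comparison can be shown in plain Python. In this sketch `int` stands in for the target field's own conversion, and the variable names are made up for the example.

```python
# Ids already present in the through table for this source object.
existing_ids = {1, 2}

# Values passed to add(); "1" is the same relation as 1, just mistyped.
requested = ["1", 3]

# Without coercion "1" != 1, so it looks missing and would be inserted
# again, colliding with the existing row.
naive_missing = set(requested) - existing_ids

# Coercing to the expected type first lets the set difference do its job.
coerced = {int(value) for value in requested}
missing = coerced - existing_ids
```

Only id 3 is left to insert after coercion, so the duplicate insertion attempt never reaches the database.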
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:28>
* cc: Andrey Fedoseev (removed)
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:29>
* owner: Simon Charette => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:30>
* cc: Florian Demmer (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/19544#comment:31>