The [https://docs.djangoproject.com/en/1.8/ref/models/querysets/#bulk-
create documentation] for the Queryset method `bulk_create` states:
It does not work with child models in a multi-table inheritance scenario.
`bulk_create` also does not work with proxy inheritance, which does not
involve multiple tables. Rather than changing the documentation to reflect
that neither type of inheritance is supported, it would be better to allow
proxy inheritance to work.
= Origin of the problem =
It makes sense not to allow `bulk_create` with multiple tables. The
[https://github.com/django/django/blob/04e8d890aec8e996d568565555164a27a6a76057/django/db/models/query.py#L416
comment] in `bulk_create` explains it best:
So this case is fun. When you bulk insert you don't get the primary keys
back (if it's an autoincrement), so you can't insert into the child tables
which references this. There are two workarounds, 1) this could be
implemented if you didn't have an autoincrement pk, and 2) you could do it
by doing O(n) normal inserts into the parent tables to get the primary
keys back, and then doing a single bulk insert into the childmost table.
Some databases might allow doing this by using RETURNING clause for the
insert query. We're punting on these for now because they are relatively
rare cases.
`bulk_create` prevents inherited models from using it with
[https://github.com/django/django/blob/04e8d890aec8e996d568565555164a27a6a76057/django/db/models/query.py#L426
the following naive test]:
{{{#!python
if self.model._meta.parents:
raise ValueError("Can't bulk create an inherited model")
}}}
This test does not discriminate between multi-table inheritance and proxy
inheritance. However, simply modifying the code to read
{{{#!python
if self.model._meta.parents and not self.model._meta.proxy:
raise ValueError("Can't bulk create an inherited model with multiple
tables")
}}}
does not fix the problem because it results in more exceptions.
= Reproducing this error =
I did the following using Django 1.8.2 and Python 3.4.3 on Mac OS 10.9.5.
{{{#!bash
django-admin startproject bulk_create_proxy
cd bulk_create_proxy
./manage.py startapp app
}}}
Add `'app'` to the `INSTALLED_APPS` list in the resultant `settings.py`.
Save the following code in `app/models.py`.
{{{#!python
from django.db import models
class Parent(models.Model):
name = models.CharField(max_length=10)
class Child(Parent):
class Meta(object):
proxy = True
}}}
Save the following code in `app/tests.py`.
{{{#!python
from django.test import TestCase
from app.models import Parent, Child
class TestBulkCreate(TestCase):
def assert_can_bulk_create(self, model):
model.objects.all().delete()
model.objects.bulk_create([
model(name='a'), model(name='b'), model(name='c')])
self.assertEqual(model.objects.count(), 3)
def test_bulk_create_parent(self):
self.assert_can_bulk_create(Parent)
def test_bulk_create_child(self):
self.assert_can_bulk_create(Child)
}}}
Finish up with the following commands.
{{{#!bash
./manage.py makemigrations app
./manage.py migrate
./manage.py test
}}}
You should get the following test output. (I have elided file paths; ones
starting with `...django` come from Django.)
{{{
bulk_create_proxy ➜ ./manage.py test
Creating test database for alias 'default'...
E.
======================================================================
ERROR: test_bulk_create_child (app.tests.TestBulkCreate)
----------------------------------------------------------------------
Traceback (most recent call last):
File "...bulk_create_proxy/app/tests.py", line 18, in
test_bulk_create_child
self.assert_can_bulk_create(Child)
File "...bulk_create_proxy/app/tests.py", line 11, in
assert_can_bulk_create
model(name='a'), model(name='b'), model(name='c')])
File "...django/db/models/manager.py", line 127, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "...django/db/models/query.py", line 374, in bulk_create
raise ValueError("Can't bulk create an inherited model")
ValueError: Can't bulk create an inherited model
----------------------------------------------------------------------
Ran 2 tests in 0.003s
FAILED (errors=1)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24997>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_docs: => 0
* type: Bug => New feature
* needs_tests: => 0
* stage: Unreviewed => Accepted
Comment:
If it's feasible to implement, I don't see why we couldn't do it.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:1>
Comment (by wkschwartz):
I'll try to work this up into a real patch, but what I found was that the
following changes relative to the code at git tag 1.8.2 make the test
cases described in this ticket pass.
{{{#!python
--- django/db/models/query.py 2015-06-17 17:29:38.000000000 -0400
+++ django/db/models/query.py 2015-06-17 17:31:26.000000000 -0400
@@ -370,13 +370,13 @@
# this by using RETURNING clause for the insert query. We're
punting
# on these for now because they are relatively rare cases.
assert batch_size is None or batch_size > 0
- if self.model._meta.parents and not self.model._meta.proxy:
+ if self.model._meta.parents:
raise ValueError("Can't bulk create an inherited model")
if not objs:
return objs
self._for_write = True
connection = connections[self.db]
- fields = self.model._meta.concrete_fields
+ fields = self.model._meta.local_concrete_fields
objs = list(objs)
self._populate_pk_values(objs)
with transaction.atomic(using=self.db, savepoint=False):
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:2>
Comment (by akaariai):
One possible approach is that we change bulk_create()'s approach to be
"save the objects to the database using fastest method available". We
could add a couple of flags, too, for example signals=True/False and
with_pk=True/False. The defaults would be False and False.
Then we would check if the given database can do a fast bulk-insert, or if
we need to switch to a loop with obj.save(force_insert=True).
I guess the biggest problem is that it might be a bit surprising if
bulk_create doesn't do the insert in bulk. But even then, the current
alternative is to do the insert by a loop of save() calls, and that is
exactly what you will get from the updated bulk_create.
For example in fixture loading it would be very convenient if one could
always call bulk_create(), and get the fastest possible bulk insert for
the given model class and database backend combination.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:3>
Comment (by wkschwartz):
I think that as a user, I would be surprised if `bulk_create` did not
execute in single (or very small number of, depending on `batch_size`)
INSERTs. I can imagine stackoverflow threads asking "How to force
bulk_create to use single insert," with answers to the effect of, "Back in
the day, it always executed in a single INSERT but now it's a bit of a
guessing game". I'm not saying that it would be a guessing game, I'm just
saying that having `bulk_create` decide what to do will feel opaque to
users. The current way it works has the advantage of being fully explicit.
Further, I'm not sure I see such an expansion of `bulk_create`'s duties as
being germane to allowing proxy models to use the method, except if your
goal is also to allow multi-table models to use it. But the latter
consideration seems like it should be a separate ticket.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:4>
* owner: nobody => wkschwartz
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:5>
Comment (by wkschwartz):
Work ongoing at https://github.com/wkschwartz/django/tree/ticket_24997.
I've created a "practice" pull request at
https://github.com/wkschwartz/django/pull/1. Before submitting a real pull
request to Django, I'll squash my commits.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:6>
Comment (by wkschwartz):
I have also created a forward port for Django 1.9 (based on the master
branch) at https://github.com/wkschwartz/django/tree/ticket_24997_19.
(It's simply a cherry-pick from the other branch.) The documentation in
this version of the patch still shows the bug being fixed in Django 1.8.3.
The purpose of this version of the patch is merely to make it easy to
forward-port the fix.
I've created another "practice" pull request at
https://github.com/wkschwartz/django/pull/2.
All commits are now squashed.
Tests pass on Sqlite and Postgresql.
Unless someone says otherwise beforehand, I'll submit both pull requests
tomorrow morning.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:7>
Comment (by timgraham):
Per our [https://docs.djangoproject.com/en/dev/internals/release-process
/#supported-versions supported version policy], I don't think this would
qualify for a backport to 1.8. I'd suggest to take a look at our
[https://docs.djangoproject.com/en/dev/internals/contributing/writing-code
/submitting-patches/#patch-review-checklist patch review checklist] before
you submit the PR, but generally the patch looks okay.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:8>
* has_patch: 0 => 1
Comment:
Pull request submitted: https://github.com/django/django/pull/4886
Note I've updated it since @timgraham looked at it this morning. It now
correctly handles the case where a proxy model inherits from a multi-table
model.
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:9>
* version: 1.8 => master
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:10>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:11>
* status: assigned => closed
* resolution: => fixed
Comment:
Fixed in
https://github.com/django/django/commit/9a5cfa05a0d889db9daa7872a9697daa883f3609
--
Ticket URL: <https://code.djangoproject.com/ticket/24997#comment:12>