[Django] #27408: Multiple consecutive `bulk_create`s with FK.

30 views
Skip to first unread message

Django

unread,
Oct 30, 2016, 11:24:07 PM10/30/16
to django-...@googlegroups.com
#27408: Multiple consecutive `bulk_create`s with FK.
-------------------------------------+-------------------------------------
Reporter: Jarek | Owner: nobody
Glowacki |
Type: New | Status: new
feature |
Component: Database | Version: master
layer (models, ORM) | Keywords: bulk_create foreign
Severity: Normal | key id
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
As of dj110 `bulk_create` returns pks for the objects it has created.
Great! I want more.

Going to hop right into the use case:
{{{
class A(models.Model):
pass


class B(models.Model):
a = models.ForeignKey(A, on_delete=models.PROTECT)
}}}

{{{
a = A()
b = B(a=a) # doesn't set `a_id`, since `a` has no id assigned yet.
A.bulk_create([a]) # sets `a`'s id.
B.bulk_create([b]) # error!
}}}
The first `bulk_create` call sets the id on `a`, but `b.a_id` remains
`None`.
When running the second `bulk_create`, even though `b` has an `a` set, it
fails to save `a` onto the `b` instance, due to `a_id` being None.

So if you imagine the pregeneration of several related models in a big
loop, followed by several consecutive `bulk_create` calls we run into
trouble. It's nicer than having to manually feed in ids during the loop as
we needed to in the past, but it still needs some magic.

My current solution is running a little function between each
`bulk_create` call, to fill the foreignkey ids in:
{{{
def fill_foreignkey_ids(objects, fields):
for f in fields:
for o in objects:
setattr(o, '%s_id' % f, getattr(o, f).id)
}}}
Which makes:
{{{
a = A()
b = B(a=a) # doesn't set `a_id`, since `a` has no id assigned yet.
bulk_a = [a]
bulk_b = [b]
A.bulk_create(bulk_a) # sets `a`'s id.
fill_foreignkey_ids(bulk_b, ['a'])
B.bulk_create(bulk_b) # now it works. Lovely.
}}}

But this is quite ugly.

This is more of a brainstorm ticket, as I'm not sure how to solve this
nicely. Expecting every unsaved model instance to update its foreignkey
ids when a `bulk_create` inserts them sounds seems silly. But maybe we
could have the `bulk_create` method check whether `b.a.id` is set when
`b.a_id` isn't?

I can submit a PR if this sounds like a reasonable thing to do. It feels a
bit magic, but it would greatly improve the usefulness of this new
feature.

--
Ticket URL: <https://code.djangoproject.com/ticket/27408>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 31, 2016, 1:13:57 AM10/31/16
to django-...@googlegroups.com
#27408: Multiple consecutive `bulk_create`s with FK.
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I haven't tried it myself but I suspect this has little to do with
`bulk_create` itself. Does the following fails in a similar manner?

{{{#!python


a = A()
b = B(a=a)

a.save()
b.save()
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:1>

Django

unread,
Oct 31, 2016, 2:13:41 AM10/31/16
to django-...@googlegroups.com
#27408: Multiple consecutive `bulk_create`s with FK.
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Jarek Glowacki):

Yep, that fails too.

The difference is use case. For single `save`s, it's easy to pass the id
through. For `bulk_create`s it's a bit messier to do that.

I suggested only updating `bulk_create` (to automatically figure out what
to do in that second call) as I wasn't sure it was feasible to make this
work in the general case - but maybe by putting some more magic in the
foreignkey field's `__get__` could work?

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:2>

Django

unread,
Nov 1, 2016, 4:22:23 PM11/1/16
to django-...@googlegroups.com
#27408: Make QuerySet.bulk_create() populate fields on related models

-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Someday/Maybe
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* stage: Unreviewed => Someday/Maybe


Old description:

New description:

As of dj110 `bulk_create` returns pks for the objects it has created.
Great! I want more.

Going to hop right into the use case:
{{{
class A(models.Model):
pass


class B(models.Model):
a = models.ForeignKey(A, on_delete=models.PROTECT)
}}}

{{{
a = A()
b = B(a=a) # doesn't set `a_id`, since `a` has no id assigned yet.

A.objects.bulk_create([a]) # sets `a`'s id.
B.objects.bulk_create([b]) # error!


}}}
The first `bulk_create` call sets the id on `a`, but `b.a_id` remains
`None`.
When running the second `bulk_create`, even though `b` has an `a` set, it
fails to save `a` onto the `b` instance, due to `a_id` being None.

So if you imagine the pregeneration of several related models in a big
loop, followed by several consecutive `bulk_create` calls we run into
trouble. It's nicer than having to manually feed in ids during the loop as
we needed to in the past, but it still needs some magic.

My current solution is running a little function between each
`bulk_create` call, to fill the foreignkey ids in:
{{{
def fill_foreignkey_ids(objects, fields):
for f in fields:
for o in objects:
setattr(o, '%s_id' % f, getattr(o, f).id)
}}}
Which makes:
{{{
a = A()
b = B(a=a) # doesn't set `a_id`, since `a` has no id assigned yet.
bulk_a = [a]
bulk_b = [b]

A.objects.bulk_create(bulk_a) # sets `a`'s id.
fill_foreignkey_ids(bulk_b, ['a'])
B.objects.bulk_create(bulk_b) # now it works. Lovely.
}}}

But this is quite ugly.

This is more of a brainstorm ticket, as I'm not sure how to solve this
nicely. Expecting every unsaved model instance to update its foreignkey
ids when a `bulk_create` inserts them sounds seems silly. But maybe we
could have the `bulk_create` method check whether `b.a.id` is set when
`b.a_id` isn't?

I can submit a PR if this sounds like a reasonable thing to do. It feels a
bit magic, but it would greatly improve the usefulness of this new
feature.

--

Comment:

I'm not sure if this is a good idea due to magic/complexity, but if we had
a patch, we could better evaluate it. I guess your use case doesn't allow
constructing `B` after `A.objects.bulk_create()`?

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:3>

Django

unread,
Nov 1, 2016, 5:08:13 PM11/1/16
to django-...@googlegroups.com
#27408: Make QuerySet.bulk_create() populate fields on related models
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Someday/Maybe
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Jarek Glowacki):

That's correct, my use case requires `for`ing through a large blob of
data, parsing it and creating several different models per iteration. So
all the `bulk_creates` happen at the end.

I'll have a crack at a PR that solves this nicely (if such a thing exists)
and get back to you.

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:4>

Django

unread,
Nov 2, 2016, 9:31:49 AM11/2/16
to django-...@googlegroups.com
#27408: Make QuerySet.bulk_create() populate fields on related models
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Someday/Maybe
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

From what I remember there's already a ticket about keeping a foreign key
attribute and its underlying `_id` one synchronized which is the real
issue here as demonstrated by the failing example using save in comment:1.

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:5>

Django

unread,
Nov 2, 2016, 5:35:43 PM11/2/16
to django-...@googlegroups.com
#27408: Make QuerySet.bulk_create() populate fields on related models
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Someday/Maybe
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Jarek Glowacki):

I believe you're referring to this: #9553

The proposed solution there was to proactively update all instances that
had the just-saved instance as an FK. Big change, probably could hit
performance pretty hard in some situations.
If we want to solve this in the general case I'd be more inclined to just
use a fallback in the field descriptor's `__get__`, so that we're not
updating instances that might not end up getting accessed.

It's tricky, because the general case really wouldn't benefit much from
this, other than keeping things consistent. Hence I thought it might be
better to focus on just making it work for `bulk_create` (maybe with an
explicitly set flag even, to make it obvious).

I'll try both approaches over the weekend and see if I can come up with
something nice.

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:6>

Django

unread,
Apr 1, 2020, 4:23:49 AM4/1/20
to django-...@googlegroups.com
#27408: Make QuerySet.bulk_create() populate fields on related models
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk_create foreign | Triage Stage:
key id | Someday/Maybe
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Grant):

My solution to this problem is the following:

{{{
class BulkHelperMixin:
def bulk_create(self, objs, batch_size=None, ignore_conflicts=False):
foreign_key_fields = [field for field in
objs[0]._meta.get_fields() if type(field) == ForeignKey]
for obj in objs:
for foreign_key_field in foreign_key_fields:
parent_obj = getattr(obj, foreign_key_field.name)
if parent_obj and getattr(obj, foreign_key_field.column)
is None:
setattr(obj, foreign_key_field.name, parent_obj)
return super().bulk_create(objs, batch_size=batch_size,
ignore_conflicts=ignore_conflicts)


class BulkHelperManager(BulkHelperMixin, models.Manager):
pass

}}}

Then in my model I do something like:

{{{
objects = models.Manager()
bulk_manager = BulkHelperManager()
}}}

and then I just do bulk_create using my bulk_manager starting from the
parent going down to the children. It works beautifully!

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:7>

Django

unread,
May 5, 2020, 2:01:31 PM5/5/20
to django-...@googlegroups.com
#27408: Make QuerySet.bulk_create() populate fields on related models
-------------------------------------+-------------------------------------
Reporter: Jarek Glowacki | Owner: nobody
Type: New feature | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: wontfix

Keywords: bulk_create foreign | Triage Stage:
key id | Someday/Maybe
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* status: new => closed
* resolution: => wontfix


Comment:

Closing in favor of #31539.

--
Ticket URL: <https://code.djangoproject.com/ticket/27408#comment:8>

Reply all
Reply to author
Forward
0 new messages