[Django] #29568: Avoid trying to UPDATE a parent model when its child just got INSERT

16 views
Skip to first unread message

Django

unread,
Jul 16, 2018, 11:20:18 AM7/16/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François | Owner: nobody
Dupayrat |
Type: | Status: new
Cleanup/optimization |
Component: Database | Version: master
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Hello,

While sorting through queries, I noticed something unusual: let's say you
have non-abstract models inheritance:
{{{
class Parent(models.Model):
parent_field = models.BooleanField()

class DirectChild(Parent):
direct_child_field = models.BooleanField()

class IndirectChild(DirectChild):
indirect_child_field = models.BooleanField()
}}}

When trying to create IndirectChild, I would expect 3 queries to be done:
* INSERT Parent
* INSERT DirectChild
* INSERT IndirectChild

Instead, since they all share the same PK, 5 queries are done:
* INSERT Parent
* UPDATE DirectChild (since it has a PK)
* INSERT DirectChild (because previous UPDATE failed)
* UPDATE IndirectChild (since it has a PK)
* INSERT IndirectChild (because previous UPDATE failed)

I found a fix that worked for me by modifying _save_parents in
django/db/models/base.py to return if something was inserted (default to
fAlse if there is no parent), and force_insert if the parent was inserted
(because if there parent was inserted, the child can't possibly exist).

Being a beginner, I'm really wary of breaking something. Could someone if
it's something wanted by Django and check if there is obvious mistake?
Here is the modified method:
{{{
def _save_parents(self, cls, using, update_fields):
"""Save all the parents of cls using values from self."""
meta = cls._meta
inserted = False
for parent, field in meta.parents.items():
# Make sure the link fields are synced between parent and
self.
if (field and getattr(self, parent._meta.pk.attname) is None
and
getattr(self, field.attname) is not None):
setattr(self, parent._meta.pk.attname, getattr(self,
field.attname))
parent_inserted = self._save_parents(cls=parent, using=using,
update_fields=update_fields)
updated = self._save_table(cls=parent, using=using,
update_fields=update_fields, force_insert=parent_inserted)
if not updated:
inserted = True
# Set the parent's PK value to self.
if field:
setattr(self, field.attname,
self._get_pk_val(parent._meta))
# Since we didn't have an instance of the parent handy set
# attname directly, bypassing the descriptor. Invalidate
# the related object cache, in case it's been accidentally
# populated. A fresh instance will be re-built from the
# database if necessary.
if field.is_cached(self):
field.delete_cached_value(self)
return inserted
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/29568>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 16, 2018, 11:45:48 AM7/16/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by François Dupayrat:

Old description:

New description:

Hello,

While sorting through queries, I noticed something unusual: let's say you
have non-abstract models inheritance:
{{{
class Parent(models.Model):
parent_field = models.BooleanField()

class DirectChild(Parent):
direct_child_field = models.BooleanField()

class IndirectChild(DirectChild):
indirect_child_field = models.BooleanField()
}}}

When trying to create IndirectChild, I would expect 3 queries to be done:
* INSERT Parent
* INSERT DirectChild
* INSERT IndirectChild

Instead, since they all share the same PK, 5 queries are done:
* INSERT Parent
* UPDATE DirectChild (since it has a PK)
* INSERT DirectChild (because previous UPDATE failed)
* UPDATE IndirectChild (since it has a PK)
* INSERT IndirectChild (because previous UPDATE failed)

I found a fix that worked for me by modifying _save_parents and save_base
in django/db/models/base.py: _save_parents return if something was


inserted (default to fAlse if there is no parent), and force_insert if the

parent was inserted in both methods (because if there parent was inserted,


the child can't possibly exist).

Being a beginner, I'm really wary of breaking something. Could someone if
it's something wanted by Django and check if there is obvious mistake?

Here is the modified method (without irrelevant code, materialized as
[...]):
{{{
def _save_parents([...]):
[...]


inserted = False
for parent, field in meta.parents.items():

[...]


parent_inserted = self._save_parents(cls=parent, using=using,
update_fields=update_fields)
updated = self._save_table(cls=parent, using=using,
update_fields=update_fields, force_insert=parent_inserted)
if not updated:
inserted = True

[...]
return inserted
def save_base([...]):
[...]
with transaction.atomic(using=using, savepoint=False):
parent_inserted = False
if not raw:
parent_inserted = self._save_parents(cls, using,
update_fields)
updated = self._save_table(raw, cls, force_insert or
parent_inserted, force_update, using, update_fields)
[...]
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:1>

Django

unread,
Jul 16, 2018, 11:46:05 AM7/16/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by François Dupayrat:

Old description:

> Hello,


>
> While sorting through queries, I noticed something unusual: let's say you
> have non-abstract models inheritance:
> {{{
> class Parent(models.Model):
> parent_field = models.BooleanField()
>
> class DirectChild(Parent):
> direct_child_field = models.BooleanField()
>
> class IndirectChild(DirectChild):
> indirect_child_field = models.BooleanField()
> }}}
>
> When trying to create IndirectChild, I would expect 3 queries to be done:
> * INSERT Parent
> * INSERT DirectChild
> * INSERT IndirectChild
>
> Instead, since they all share the same PK, 5 queries are done:
> * INSERT Parent
> * UPDATE DirectChild (since it has a PK)
> * INSERT DirectChild (because previous UPDATE failed)
> * UPDATE IndirectChild (since it has a PK)
> * INSERT IndirectChild (because previous UPDATE failed)
>

> I found a fix that worked for me by modifying _save_parents and save_base

> in django/db/models/base.py: _save_parents return if something was


> inserted (default to fAlse if there is no parent), and force_insert if

> the parent was inserted in both methods (because if there parent was


> inserted, the child can't possibly exist).
>
> Being a beginner, I'm really wary of breaking something. Could someone if
> it's something wanted by Django and check if there is obvious mistake?

> Here is the modified method (without irrelevant code, materialized as
> [...]):
> {{{
> def _save_parents([...]):
> [...]

> inserted = False
> for parent, field in meta.parents.items():

> [...]


> parent_inserted = self._save_parents(cls=parent, using=using,
> update_fields=update_fields)
> updated = self._save_table(cls=parent, using=using,
> update_fields=update_fields, force_insert=parent_inserted)
> if not updated:
> inserted = True

> [...]
> return inserted
> def save_base([...]):
> [...]
> with transaction.atomic(using=using, savepoint=False):
> parent_inserted = False
> if not raw:
> parent_inserted = self._save_parents(cls, using,
> update_fields)
> updated = self._save_table(raw, cls, force_insert or
> parent_inserted, force_update, using, update_fields)
> [...]
> }}}

New description:

Hello,

While sorting through queries, I noticed something unusual: let's say you
have non-abstract models inheritance:
{{{
class Parent(models.Model):
parent_field = models.BooleanField()

class DirectChild(Parent):
direct_child_field = models.BooleanField()

class IndirectChild(DirectChild):
indirect_child_field = models.BooleanField()
}}}

When trying to create IndirectChild, I would expect 3 queries to be done:
* INSERT Parent
* INSERT DirectChild
* INSERT IndirectChild

Instead, since they all share the same PK, 5 queries are done:
* INSERT Parent
* UPDATE DirectChild (since it has a PK)
* INSERT DirectChild (because previous UPDATE failed)
* UPDATE IndirectChild (since it has a PK)
* INSERT IndirectChild (because previous UPDATE failed)

I found a fix that worked for me by modifying _save_parents and save_base
in django/db/models/base.py: _save_parents return if something was


inserted (default to fAlse if there is no parent), and force_insert if the

parent was inserted in both methods (because if there parent was inserted,


the child can't possibly exist).

Being a beginner, I'm really wary of breaking something. Could someone if
it's something wanted by Django and check if there is obvious mistake?

Here is the modified method (without irrelevant code, materialized as
[...]):
{{{
def _save_parents([...]):
[...]

inserted = False
for parent, field in meta.parents.items():

[...]


parent_inserted = self._save_parents(cls=parent, using=using,
update_fields=update_fields)
updated = self._save_table(cls=parent, using=using,
update_fields=update_fields, force_insert=parent_inserted)
if not updated:
inserted = True

[...]
return inserted

def save_base([...]):
[...]
with transaction.atomic(using=using, savepoint=False):
parent_inserted = False
if not raw:
parent_inserted = self._save_parents(cls, using,
update_fields)
updated = self._save_table(raw, cls, force_insert or
parent_inserted, force_update, using, update_fields)
[...]
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:2>

Django

unread,
Jul 16, 2018, 12:49:37 PM7/16/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

I guess a first experiment would be to make the change and run the test
suite to see the outcome.

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:3>

Django

unread,
Jul 16, 2018, 3:22:04 PM7/16/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* stage: Unreviewed => Accepted


Comment:

With`IndirectChild.objects.create(parent_field=True,
direct_child_field=True, indirect_child_field=True)`, I see four queries
at 4d48ddd8f93800a80330ec1dee7b7d4afe6ea95a:
{{{
1. INSERT INTO "t29568_parent" ("parent_field") VALUES (true) RETURNING
"t29568_parent"."id"
2. UPDATE "t29568_directchild" SET "direct_child_field" = true WHERE
"t29568_directchild"."parent_ptr_id" = 1
3. INSERT INTO "t29568_directchild" ("parent_ptr_id",
"direct_child_field") VALUES (1, true)
4. INSERT INTO "t29568_indirectchild" ("directchild_ptr_id",
"indirect_child_field") VALUES (1, true)
}}}
Accepting for investigation.

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:4>

Django

unread,
Jul 17, 2018, 4:00:43 AM7/17/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Old description:

> Hello,
>
> While sorting through queries, I noticed something unusual: let's say you
> have non-abstract models inheritance:
> {{{
> class Parent(models.Model):
> parent_field = models.BooleanField()
>
> class DirectChild(Parent):
> direct_child_field = models.BooleanField()
>
> class IndirectChild(DirectChild):
> indirect_child_field = models.BooleanField()
> }}}
>
> When trying to create IndirectChild, I would expect 3 queries to be done:
> * INSERT Parent
> * INSERT DirectChild
> * INSERT IndirectChild
>
> Instead, since they all share the same PK, 5 queries are done:
> * INSERT Parent
> * UPDATE DirectChild (since it has a PK)
> * INSERT DirectChild (because previous UPDATE failed)
> * UPDATE IndirectChild (since it has a PK)
> * INSERT IndirectChild (because previous UPDATE failed)
>

> I found a fix that worked for me by modifying _save_parents and save_base

> in django/db/models/base.py: _save_parents return if something was


> inserted (default to fAlse if there is no parent), and force_insert if

> the parent was inserted in both methods (because if there parent was


> inserted, the child can't possibly exist).
>
> Being a beginner, I'm really wary of breaking something. Could someone if
> it's something wanted by Django and check if there is obvious mistake?

> Here is the modified method (without irrelevant code, materialized as
> [...]):
> {{{
> def _save_parents([...]):
> [...]

> inserted = False
> for parent, field in meta.parents.items():

> [...]


> parent_inserted = self._save_parents(cls=parent, using=using,
> update_fields=update_fields)
> updated = self._save_table(cls=parent, using=using,
> update_fields=update_fields, force_insert=parent_inserted)
> if not updated:
> inserted = True

> [...]
> return inserted
>
> def save_base([...]):
> [...]
> with transaction.atomic(using=using, savepoint=False):
> parent_inserted = False
> if not raw:
> parent_inserted = self._save_parents(cls, using,
> update_fields)
> updated = self._save_table(raw, cls, force_insert or
> parent_inserted, force_update, using, update_fields)
> [...]
> }}}

New description:

Hello,

While sorting through queries, I noticed something unusual: let's say you
have non-abstract models inheritance:
{{{
class Parent(models.Model):
parent_field = models.BooleanField()

class DirectChild(Parent):
direct_child_field = models.BooleanField()

class IndirectChild(DirectChild):
indirect_child_field = models.BooleanField()
}}}

When trying to create IndirectChild, I would expect 3 queries to be done:
* INSERT Parent
* INSERT DirectChild
* INSERT IndirectChild

Instead, since they all share the same PK, when trying to save a
IndirectChild not yet persisted to DB, 5 queries are done:


* INSERT Parent
* UPDATE DirectChild (since it has a PK)
* INSERT DirectChild (because previous UPDATE failed)
* UPDATE IndirectChild (since it has a PK)
* INSERT IndirectChild (because previous UPDATE failed)

Note: when trying to use IndirectChild.objects.create, only 4 queries are
made, since QuerySet.create use force_insert=True (thanks to Tim Graham)

I found a fix that worked for me by modifying _save_parents and save_base

in django/db/models/base.py: _save_parents return if something was


inserted (default to fAlse if there is no parent), and force_insert if the

parent was inserted in both methods (because if there parent was inserted,


the child can't possibly exist).

Being a beginner, I'm really wary of breaking something. Could someone if
it's something wanted by Django and check if there is obvious mistake?

Here is the modified method (without irrelevant code, materialized as
[...]):
{{{
def _save_parents([...]):
[...]

inserted = False
for parent, field in meta.parents.items():

[...]


parent_inserted = self._save_parents(cls=parent, using=using,
update_fields=update_fields)
updated = self._save_table(cls=parent, using=using,
update_fields=update_fields, force_insert=parent_inserted)
if not updated:
inserted = True

[...]
return inserted

def save_base([...]):
[...]
with transaction.atomic(using=using, savepoint=False):
parent_inserted = False
if not raw:
parent_inserted = self._save_parents(cls, using,
update_fields)
updated = self._save_table(raw, cls, force_insert or
parent_inserted, force_update, using, update_fields)
[...]
}}}

--

Comment (by François Dupayrat):

Replying to [comment:3 Claude Paroz]:


> I guess a first experiment would be to make the change and run the test
suite to see the outcome.

Thanks, I'll get that done.

Replying to [comment:4 Tim Graham]:


> With`IndirectChild.objects.create(parent_field=True,
direct_child_field=True, indirect_child_field=True)`, I see four queries
at 4d48ddd8f93800a80330ec1dee7b7d4afe6ea95a:
> {{{
> 1. INSERT INTO "t29568_parent" ("parent_field") VALUES (true) RETURNING
"t29568_parent"."id"
> 2. UPDATE "t29568_directchild" SET "direct_child_field" = true WHERE
"t29568_directchild"."parent_ptr_id" = 1
> 3. INSERT INTO "t29568_directchild" ("parent_ptr_id",
"direct_child_field") VALUES (1, true)
> 4. INSERT INTO "t29568_indirectchild" ("directchild_ptr_id",
"indirect_child_field") VALUES (1, true)
> }}}
> Accepting for investigation.

Indeed, I didn't test with IndirectChild.objects.create, but with save()
on a IndirectChild not yet saved to DB, since it's done that way by a
library (django-tastypie), and outside my control.

I edited the description to reflect that.

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:5>

Django

unread,
Jul 18, 2018, 4:04:07 PM7/18/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by François Dupayrat):

* has_patch: 0 => 1


Comment:

Pull request created: https://github.com/django/django/pull/10197

Tests showed a different behavior than the one listed in the ticket:
instead of doing UPDATE queries, SELECT queries were made in-between
INSERT INTO queries. Editing Parent, Child and GrandChild to add a Boolean
field revert to UPDATE queries.

In both cases, only INSERT INTO queries should be made anyway, so I don't
think it's necessary to add fields to Parent, Child or GrandChild.


Summary of queries:
- with existing I see 6 queries on GrandChild.objects.create()
{{{
INSERT INTO "model_inheritance_grandparent" ("first_name", "last_name",
"email", "place_id") VALUES ('', '', '', NULL)
SELECT (1) AS "a" FROM "model_inheritance_parent" WHERE
"model_inheritance_parent"."grandparent_ptr_id" = 1 LIMIT 1
INSERT INTO "model_inheritance_parent" ("grandparent_ptr_id") SELECT 1
SELECT (1) AS "a" FROM "model_inheritance_child" WHERE
"model_inheritance_child"."parent_ptr_id" = 1 LIMIT 1
INSERT INTO "model_inheritance_child" ("parent_ptr_id") SELECT 1
INSERT INTO "model_inheritance_grandchild" ("child_ptr_id") SELECT 1
}}}
- by adding a BooleanField with a default value to each of Parent, Child
and GrandChild, I still see 6 queries
{{{
INSERT INTO "model_inheritance_grandparent" ("first_name", "last_name",
"email", "place_id") VALUES ('', '', '', NULL)
UPDATE "model_inheritance_parent" SET "parent_field" = 1 WHERE
"model_inheritance_parent"."grandparent_ptr_id" = 1
INSERT INTO "model_inheritance_parent" ("grandparent_ptr_id",
"parent_field") SELECT 1, 1
UPDATE "model_inheritance_child" SET "child_field" = 1 WHERE
"model_inheritance_child"."parent_ptr_id" = 1
INSERT INTO "model_inheritance_child" ("parent_ptr_id", "child_field")
SELECT 1, 1
INSERT INTO "model_inheritance_grandchild" ("child_ptr_id",
"grand_child_field") SELECT 1, 1
}}}
- with patch I see 4 queries, both for modified and unmodified Models,
provided the new fields have default values.
{{{
INSERT INTO "model_inheritance_grandparent" ("first_name", "last_name",
"email", "place_id") VALUES ('', '', '', NULL)
INSERT INTO "model_inheritance_parent" ("grandparent_ptr_id") SELECT 1
INSERT INTO "model_inheritance_child" ("parent_ptr_id") SELECT 1
INSERT INTO "model_inheritance_grandchild" ("child_ptr_id") SELECT 1
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:6>

Django

unread,
Jul 20, 2018, 9:00:03 AM7/20/18
to django-...@googlegroups.com
#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
Reporter: François Dupayrat | Owner: nobody
Type: | Status: closed

Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"861638a3074f289faf729d739edf6326332a484f" 861638a]:
{{{
#!CommitTicketReference repository=""
revision="861638a3074f289faf729d739edf6326332a484f"
Fixed #29568 -- Prevented unnecessary UPDATE queries creating child
models.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:7>

Reply all
Reply to author
Forward
0 new messages