[Django] #33191: Avoid unnecessary clear of cached reference


Django

Oct 12, 2021, 4:49:26 PM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: | Owner: nobody
callerbear |
Type: | Status: new
Cleanup/optimization |
Component: Database | Version: 3.2
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Consider this case of ORM models "Parent" and "Child", where Child has a
foreign key reference to Parent (and the database can return generated IDs
following insert operations):

{{{
parent = Parent(name='parent_object')
child = Child(parent=parent)
parent.save()
child.save()
print(child.parent.name)
}}}

The print statement will cause an unnecessary lazy read of the parent
object.

In the application where this behavior was first observed, the application
was creating thousands of parent and child objects using bulk_create().
The subsequent lazy reads occurred when creating log entries to record the
action, and added thousands of unwanted SELECT queries.

Closed ticket 29497 solved a problem with potential data loss in this
situation by essentially executing {{{child.parent_id = child.parent.pk}}}
while preparing the child object to be saved. However, when the child's
ForeignKeyDeferredAttribute "parent_id" changes value from None to the
parent's ID, the child's internal cache containing the reference to
"parent" is cleared. The subsequent reference to child.parent then must
do a lazy read and reload parent from the database.

A workaround to avoid this lazy read is to explicitly update both the
"parent_id" and "parent" cache entry by adding this non-intuitive
statement:
{{{child.parent = child.parent}}}
after executing parent.save()
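
The mechanism and the workaround described above can be reproduced with a small pure-Python model. This is only a sketch under stated assumptions: FakeTable, ParentIdAttr, ParentRelation and run() are hypothetical names invented for illustration, not Django classes, and the descriptors imitate the behavior described in this ticket rather than reproducing Django's source.

```python
class FakeTable:
    """Hypothetical stand-in for the parent table; counts SELECTs."""

    def __init__(self):
        self.rows = {}
        self.selects = 0

    def insert(self, obj):
        obj.pk = len(self.rows) + 1  # simulate a database-generated ID
        self.rows[obj.pk] = obj

    def select(self, pk):
        self.selects += 1
        return self.rows[pk]


PARENTS = FakeTable()


class Parent:
    def __init__(self, name):
        self.pk = None
        self.name = name

    def save(self):
        PARENTS.insert(self)


class ParentIdAttr:
    """Toy parent_id attribute: changing the raw ID drops the cached parent."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get("parent_id")

    def __set__(self, instance, value):
        if instance.__dict__.get("parent_id") != value:
            instance._cache.pop("parent", None)
        instance.__dict__["parent_id"] = value


class ParentRelation:
    """Toy parent attribute: keeps the raw ID and the cached object in sync."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if "parent" not in instance._cache:
            # Lazy read: the extra SELECT this ticket is about.
            instance._cache["parent"] = PARENTS.select(instance.parent_id)
        return instance._cache["parent"]

    def __set__(self, instance, value):
        instance.__dict__["parent_id"] = value.pk
        instance._cache["parent"] = value


class Child:
    parent_id = ParentIdAttr()
    parent = ParentRelation()

    def __init__(self, parent):
        self._cache = {}
        self.parent = parent

    def save(self):
        obj = self._cache.get("parent")
        # Reconcile the raw ID with the cached object's pk via setattr,
        # which goes through ParentIdAttr.__set__ and clears the cache.
        if obj is not None and self.parent_id is None:
            setattr(self, "parent_id", obj.pk)


def run(use_workaround):
    PARENTS.selects = 0
    parent = Parent(name="parent_object")
    child = Child(parent=parent)
    parent.save()
    if use_workaround:
        child.parent = child.parent  # re-assign after parent.save()
    child.save()
    _ = child.parent.name
    return PARENTS.selects


print(run(use_workaround=False))  # 1
print(run(use_workaround=True))   # 0
```

In this model the re-assignment copies the now-known pk into "parent_id" before save() runs, so save() has nothing left to reconcile and the cached object survives.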

But it appears that a simple change could avoid clearing the cache in this
narrow case.
Within Model._prepare_related_fields_for_save(), replace
{{{setattr(self, field.attname, obj.pk)}}}
with
{{{self.__dict__[field.attname] = obj.pk}}}

This suggested code has -not- been tested.

This change would set the associated "parent_id" attribute to the desired
value without affecting the cache. In this spot of the code, "obj" is
currently set to the cached parent object that we want to preserve, and
we're just reconciling the associated copy of the parent's primary key.
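
The proposal relies on a general Python rule: writing to an instance's __dict__ does not invoke a data descriptor's __set__, while setattr() does. The sketch below shows only that rule; Noisy and Record are hypothetical names and nothing here is taken from Django's source.

```python
class Noisy:
    """Data descriptor that records every assignment made through it."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.calls.append((self.name, value))
        instance.__dict__[self.name] = value


class Record:
    parent_id = Noisy()

    def __init__(self):
        self.calls = []


r = Record()
setattr(r, "parent_id", 1)   # goes through Noisy.__set__
r.__dict__["parent_id"] = 2  # bypasses Noisy.__set__ entirely

print(r.parent_id)  # 2
print(r.calls)      # [('parent_id', 1)]
```

Any side effect attached to __set__, such as clearing a cache, is therefore skipped by the direct __dict__ write.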

--
Ticket URL: <https://code.djangoproject.com/ticket/33191>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Oct 12, 2021, 4:58:55 PM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Barry Johnson:

Old description:

New description:

Consider this case of ORM models "Parent" and "Child", where Child has a
foreign key reference to Parent (and the database can return generated IDs
following insert operations):

{{{
parent = Parent(name='parent_object')
child = Child(parent=parent)
parent.save()
child.save()
print(child.parent.name)
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:1>

Django

Oct 13, 2021, 4:11:31 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Carlton Gibson:

Old description:

> Consider this case of ORM models "Parent" and "Child", where Child has a
> foreign key reference to Parent (and the database can return generated
> IDs following insert operations):
>
> {{{
> parent = Parent(name='parent_object')
> child = Child(parent=parent)

New description:

Consider this case of ORM models "Parent" and "Child", where Child has a
foreign key reference to Parent (and the database can return generated IDs
following insert operations):

{{{
parent = Parent(name='parent_object')
child = Child(parent=parent)
parent.save()
child.save()
print(child.parent.name)
}}}

The print statement will cause an unnecessary lazy read of the parent
object.

In the application where this behavior was first observed, the application
was creating thousands of parent and child objects using bulk_create().
The subsequent lazy reads occurred when creating log entries to record the
action, and added thousands of unwanted SELECT queries.

Closed ticket #29497 solved a problem with potential data loss in this
situation by essentially executing {{{child.parent_id = child.parent.pk}}}
while preparing the child object to be saved. However, when the child's
ForeignKeyDeferredAttribute "parent_id" changes value from None to the
parent's ID, the child's internal cache containing the reference to
"parent" is cleared. The subsequent reference to child.parent then must
do a lazy read and reload parent from the database.

A workaround to avoid this lazy read is to explicitly update both the
"parent_id" and "parent" cache entry by adding this non-intuitive
statement:
{{{child.parent = child.parent}}}
after executing parent.save()

But it appears that a simple change could avoid clearing the cache in this
narrow case.
Within Model._prepare_related_fields_for_save(), replace
{{{setattr(self, field.attname, obj.pk)}}}
with
{{{self.__dict__[field.attname] = obj.pk}}}

This suggested code has -not- been tested.

This change would set the associated "parent_id" attribute to the desired
value without affecting the cache. In this spot of the code, "obj" is
currently set to the cached parent object that we want to preserve, and
we're just reconciling the associated copy of the parent's primary key.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:2>

Django

Oct 14, 2021, 2:03:41 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: closed

Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: new => closed
* resolution: => wontfix


Comment:

Hi Barry. Thanks for the report.

For me, this is just how things are expected to work, so I'm going to say
wontfix. I'm happy if you want to take it to the DevelopersMailingList for
a wider discussion, but (in general) I'm suspicious of a ''"simple
change"'' adjusting a long-standing behaviour like this… (Perhaps there's
a case to be made on the mailing list though.)

To address the example, my first thought is that you should be saving
`parent` before creating child:

{{{
parent = Parent(name='parent_object')
parent.save()
child = Child(parent=parent)
...
}}}

My second is that you already have `parent`, so print the name from there:

{{{
print(parent.name)
}}}

I appreciate you've no doubt reduced this in order to show the issue more
clearly than in a real example, but I'd suspect those thoughts would carry
over to the more realistic case.

However, as I say, the DevelopersMailingList is a better venue to pursue
this kind of thing.
Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:3>

Django

Oct 18, 2021, 6:24:27 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: closed => new
* resolution: wontfix =>
* stage: Unreviewed => Accepted


Comment:

[https://groups.google.com/g/django-developers/c/JNCrmmSAU5Y/m/7NFUx9eTAAAJ
Follow up discussion on the mailing list].
Reopening based on the discussion there.

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:4>

Django

Oct 18, 2021, 8:30:25 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by John Speno):

* cc: John Speno (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:5>

Django

Oct 18, 2021, 11:30:24 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Barry Johnson:

Old description:

> Consider this case of ORM models "Parent" and "Child", where Child has a
> foreign key reference to Parent (and the database can return generated
> IDs following insert operations):
>
> {{{
> parent = Parent(name='parent_object')
> child = Child(parent=parent)
> parent.save()
> child.save()
> print(child.parent.name)
> }}}
>
> The print statement will cause an unnecessary lazy read of the parent
> object.
>
> In the application where this behavior was first observed, the
> application was creating thousands of parent and child objects using
> bulk_create(). The subsequent lazy reads occurred when creating log
> entries to record the action, and added thousands of unwanted SELECT
> queries.
>
> Closed ticket #29497 solved a problem with potential data loss in this
> situation by essentially executing {{{child.parent_id =
> child.parent.pk}}} while preparing the child object to be saved.
> However, when the child's ForeignKeyDeferredAttribute "parent_id" changes
> value from None to the parent's ID, the child's internal cache containing
> the reference to "parent" is cleared. The subsequent reference to
> child.parent then must do a lazy read and reload parent from the
> database.
>
> A workaround to avoid this lazy read is to explicitly update both the
> "parent_id" and "parent" cache entry by adding this non-intuitive
> statement:
> {{{child.parent = child.parent}}}
> after executing parent.save()
>
> But it appears that a simple change could avoid clearing the cache in
> this narrow case.
> Within Model._prepare_related_fields_for_save(), replace
> {{{setattr(self, field.attname, obj.pk)}}}
> with
> {{{self.__dict__[field.attname] = obj.pk}}}
>
> This suggested code has -not- been tested.
>
> This change would set the associated "parent_id" attribute to the desired
> value without affecting the cache. In this spot of the code, "obj" is
> currently set to the cached parent object that we want to preserve, and
> we're just reconciling the associated copy of the parent's primary key.

New description:

Consider this case of ORM models "Parent" and "Child", where Child has a
foreign key reference to Parent (and the database can return generated IDs
following insert operations):

{{{
parent = Parent(name='parent_object')
child = Child(parent=parent)
parent.save()
child.save()
print(child.parent.name)
}}}

The print statement will cause an unnecessary lazy read of the parent
object.

In the application where this behavior was first observed, the application
was creating thousands of parent and child objects using bulk_create().
The subsequent lazy reads occurred when creating log entries to record the
action, and added thousands of unwanted SELECT queries.

Closed ticket #29497 solved a problem with potential data loss in this
situation by essentially executing {{{child.parent_id = child.parent.pk}}}
while preparing the child object to be saved. However, when the child's
ForeignKeyDeferredAttribute "parent_id" changes value from None to the
parent's ID, the child's internal cache containing the reference to
"parent" is cleared. The subsequent reference to child.parent then must
do a lazy read and reload parent from the database.

A workaround to avoid this lazy read is to explicitly update both the
"parent_id" and "parent" cache entry by adding this non-intuitive
statement:
{{{child.parent = child.parent}}}
after executing parent.save()

But it appears that a simple change could avoid clearing the cache in this
narrow case.
Within Model._prepare_related_fields_for_save(), replace
{{{setattr(self, field.attname, obj.pk)}}}
with

{{{setattr(self, field.name, obj)}}}

This suggested code has -not- been tested.

This change would set the associated "parent_id" attribute while ensuring
that the currently referenced object remains referenced.
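
The difference between the two assignments can be sketched with a toy model. Everything below is an assumption made for illustration: SAVED, ParentIdAttr, ParentRelation, the assign_relation flag and demo() are hypothetical names, and the descriptors only imitate the behavior described in this ticket, not Django's actual implementation.

```python
SAVED = {}  # hypothetical stand-in for the parent table


class Parent:
    def __init__(self, name):
        self.pk = None
        self.name = name

    def save(self):
        self.pk = 1  # pretend the database generated this ID
        SAVED[self.pk] = self


class ParentIdAttr:
    """Toy parent_id attribute: changing the raw ID drops the cached parent."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get("parent_id")

    def __set__(self, instance, value):
        if instance.__dict__.get("parent_id") != value:
            instance._cache.pop("parent", None)
        instance.__dict__["parent_id"] = value


class ParentRelation:
    """Toy parent attribute: sets the raw ID and the cached object together."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if "parent" not in instance._cache:
            instance.lazy_reads += 1  # would be a SELECT in a real ORM
            instance._cache["parent"] = SAVED[instance.parent_id]
        return instance._cache["parent"]

    def __set__(self, instance, value):
        instance.__dict__["parent_id"] = value.pk
        instance._cache["parent"] = value


class Child:
    parent_id = ParentIdAttr()
    parent = ParentRelation()

    def __init__(self, parent):
        self._cache = {}
        self.lazy_reads = 0
        self.parent = parent

    def save(self, assign_relation):
        obj = self._cache.get("parent")
        if obj is not None and self.parent_id is None:
            if assign_relation:
                setattr(self, "parent", obj)        # by relation name
            else:
                setattr(self, "parent_id", obj.pk)  # by raw ID attribute


def demo(assign_relation):
    parent = Parent(name="parent_object")
    child = Child(parent=parent)
    parent.save()
    child.save(assign_relation)
    _ = child.parent.name
    return child.lazy_reads, child.parent_id


print(demo(assign_relation=False))  # (1, 1)
print(demo(assign_relation=True))   # (0, 1)
```

In both cases "parent_id" ends up holding the parent's pk; only the assignment by relation name keeps the already-loaded object in the cache.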

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:6>

Django

Oct 18, 2021, 11:47:11 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Barry Johnson):

Changed the suggested correction based on the discussion within the
mailing list.

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:7>

Django

Oct 19, 2021, 2:37:51 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

Replying to [comment:7 Barry Johnson]:
> Changed the suggested correction based on the discussion within the
> mailing list.

Would you like to prepare a patch?

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:8>

Django

Oct 22, 2021, 2:06:07 PM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: Vishal Pandey
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Vishal Pandey):

* owner: nobody => Vishal Pandey
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:9>

Django

May 31, 2022, 3:10:39 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: AllenJonathan
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* owner: nobody => AllenJonathan
* status: new => assigned
* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/15737 PR]

Django

May 31, 2022, 4:21:25 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: AllenJonathan
Type: | Status: assigned
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:10>

Django

May 31, 2022, 5:05:49 AM
to django-...@googlegroups.com
#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
Reporter: Barry Johnson | Owner: AllenJonathan
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"1058fc7023d04d07c22a5e667b6a446705119b14" 1058fc70]:
{{{
#!CommitTicketReference repository=""
revision="1058fc7023d04d07c22a5e667b6a446705119b14"
Fixed #33191 -- Avoided clearing cached reference when saving child after
parent.

Thanks Barry Johnson for the report.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:11>
