Re: [Django] #33937: Optimize m2m serialization to avoid loading full model instances (was: Serialize is loading full objects when serializing m2m fields.)

Django

Aug 18, 2022, 9:16:01 AM

to django-...@googlegroups.com

#33937: Optimize m2m serialization to avoid loading full model instances
--------------------------------------+------------------------------------
Reporter: Gordon Wrigley | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Core (Serialization) | Version: 4.0
Severity: Normal | Resolution:
Keywords: performance | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Adam Johnson):

* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Accepted

Old description:

> When not using natural keys, this function
> https://github.com/django/django/blob/main/django/core/serializers/python.py#L64
> loads the full object for every entry in the m2m, when it only actually
> wants the pks which it could get off the m2m intermediate table without
> even joining to the target table.
>
> In my case the table we are m2m'ing to has files in it, so that's a
> weighty fetch.
> We are using django-reversion which stores a serialized version of each
> save.
> On the workload that flagged this up enabling reversion incurs a 300x
> performance hit (from half a second to 2.5 minutes) and it's almost
> entirely because of this.

New description:

When not using natural keys, the `handle_m2m_field` function
([https://github.com/django/django/blob/aed60aee38215e293d6ec2f3c96ec55bb9a62fc2/django/core/serializers/python.py#L64
source]) loads the full object for every entry in the m2m relation, when it
only needs the pks. The pks can even be obtained from the m2m
intermediate table, without joining to the target table.
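
To illustrate the difference at the SQL level, here is a minimal sketch using
the standard-library `sqlite3` module rather than Django itself. The schema,
the table names and both helper functions are hypothetical and only stand in
for a model with an m2m to a table holding large payloads. This is not the
serializer's code and not a proposed patch.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not what Django would generate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE attachment (id INTEGER PRIMARY KEY, payload BLOB);
    CREATE TABLE article_attachments (
        article_id INTEGER REFERENCES article(id),
        attachment_id INTEGER REFERENCES attachment(id)
    );
""")
conn.execute("INSERT INTO article (id) VALUES (1)")
conn.executemany(
    "INSERT INTO attachment (id, payload) VALUES (?, ?)",
    [(pk, b"x" * 1024) for pk in (10, 11, 12)],
)
conn.executemany(
    "INSERT INTO article_attachments VALUES (1, ?)",
    [(10,), (12,)],
)


def pks_via_full_rows(article_id):
    # Joins the target table and fetches every column, including the blob,
    # only to throw everything away except the primary key.
    rows = conn.execute(
        "SELECT a.id, a.payload FROM attachment a "
        "JOIN article_attachments t ON t.attachment_id = a.id "
        "WHERE t.article_id = ? ORDER BY a.id",
        (article_id,),
    ).fetchall()
    return [row[0] for row in rows]


def pks_via_through_table(article_id):
    # Reads the related pks straight off the intermediate table, no join.
    rows = conn.execute(
        "SELECT attachment_id FROM article_attachments "
        "WHERE article_id = ? ORDER BY attachment_id",
        (article_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Both helpers return the same list of pks for a given article; only the second
avoids touching the `attachment` table and its payload column.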

In my case the table we are m2m'ing to has files in it, so that's a
weighty fetch.
We are using django-reversion, which stores a serialized version of each
save.
On the workload that flagged this up, enabling reversion incurs a 300x
performance hit (from half a second to 2.5 minutes), and it's almost
entirely because of this.

--
Ticket URL: <https://code.djangoproject.com/ticket/33937#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
