Repository with test case: https://github.com/mrfleap/django-db-test
The underlying query is admittedly repetitive, but I still feel it this
degradation in performance should not go overlooked. Through some basic
debugging I believe the issue may lie django/utils/tree.py on line 93
(https://github.com/django/django/blob/stable/3.2.x/django/utils/tree.py#L93)
where it checks to see if a node is within a list of children, one of my
basic tests found the operation to take 9 seconds alone, rather than the
fractions of a millisecond it usually takes. I've attached a screenshot of
my breakpoint to showcase that it's taking ~9 seconds to determine if one
node exists in an array of two.
I'd also like to point out that I am using Django Guardian quite heavily
in this scenario, which is leading to the complicated query in the first
place. However, as far as I am aware, there is no difference in the
generated query between both versions, and the difference in speed is
based on how Django is resolving the internal objects into a SQL
statement.
Please let me know if there's any other information I can provide to help
fix this issue.
--
Ticket URL: <https://code.djangoproject.com/ticket/32632>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* Attachment "slow_tree_resolution.png" added.
Screenshot showing breakpoint set during query building in a separate
repository
* cc: Simon Charette (added)
* severity: Normal => Release blocker
* stage: Unreviewed => Accepted
Comment:
Thanks for the report. Tentatively accepted, I'm not sure if it's feasible
to improve a performance and keep the fix for #32143.
Performance regression in bbf141bcdc31f1324048af9233583a523ac54c94.
Reproduced at 1351f2ee163145df2cf5471eb3e57289f8853512.
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:1>
* owner: nobody => Simon Charette
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:2>
* cc: Oskar Persson (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:3>
Comment (by Simon Charette):
Alright so I did a bit of digging in the past hour and here are my
findings
Before bbf141bcdc31f1324048af9233583a523ac54c94 `Query.__eq__` was wrongly
returning `True` due to making it a subclass `BaseExpression` in
35431298226165986ad07e91f9d3aca721ff38ec. Before these changes
`Query.__eq__` was not implemented which meant that
`Model.objects.filter(foo=bar).query !=
Model.objects.filter(foo=bar).query` as `object.__eq__` is based of
`id(self) == id(other)`. This is problematic for expressions such as
`Exists` and `Subquery` which are based off `sql.Query` and need to be
able to compare to each other to implement the ''expression'' API.
Since `sql.Query` is an immutable internal object we could possibly
`cached_property(identity)` and move along as that cuts the slowdown in
about half, that does feel like a bandaid solution though. It's
interesting to me that `Node.add` would compare all its children in the
first place as that seems like a costly operation but I do wonder if
After playing around with a few things here's a list of two proposed
solutions
1. Remove the `tree.Node.add` branch for `if data in self.children`. While
[https://djangoci.com/job/django-coverage/HTML_20Coverage_20Report/ it's
covered by the suite] only
[https://github.com/django/django/blob/ca9872905559026af82000e46cde6f7dedc897b6/tests/queries/tests.py#L1737-L1747
a single test in the whole suite reaches it] the branch under arguably
convoluted conditions while the check is performed 99% of the time when
`filter` and `exclude` are called to perform expensive comparisons of
contained expressions. I wouldn't be surprised if it was at the origin of
others recent slowdowns in the ORM query construction due to the
introduction of `BaseExpression.identity` to needs for deconstruc-ability
imposed by `Index(condition)` and `Constraint(condition)` for little
benefit.
2. Make `Subquery` (and `Exists`) explicitly non-deconstructible; in
practice they already are because `sql.Query` doesn't properly implement
`deconstruct` so there's little value in implementing `identity` for them
in the first place.
3. Remove the `sql.Query(BaseExpression)` subclassing as it's no longer
necessary in practice. That'll have the side effect of removing support
for `Model.objects.filter(foo=bar).query ==
Model.objects.filter(foo=bar).query` which is a nice property but since we
don't internally have a need for it ''yet'' there's little point in the
complexity it incurs.
I'd suggest we backport 2. and 3. to address this ticket and move the
discussion about 1. in a new ''Cleanup/optimization'' one. Let me know if
that makes sense to you and I'll commit to a PR before the end of April.
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:4>
Comment (by Mariusz Felisiak):
Thanks for the investigation!
2 & 3 make sense for me, however I would like to see an implementation
first. This can be too massive for backporting. I'm a bit afraid that we
can open the Pandora's box.
Will we be able to keep support for combining `Q()` objects
[https://github.com/django/django/blob/78eaae9bf16f455c3a35234e0a04c08d5c82f808/django/db/models/query_utils.py#L46-L53
with boolean expressions]?
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:5>
* has_patch: 0 => 1
Comment:
> 2 & 3 make sense for me, however I would like to see an implementation
first. This can be too massive for backporting. I'm a bit afraid that we
can open the Pandora's box.
Hopefully the proposed patch is not too invasive, I could try to figure
out another approach. From an invasivity reduction perspective removing
the `Node.add` branch seems like a clear choice for this particular issue
but I wouldn't be surprised if other code paths were affected by the
slowness introduced by `sql.Query.__eq__`.
> Will we be able to keep support for combining Q() objects with boolean
expressions?
Yep, it shouldn't be an issue. The main change is really that `Subquery`
doesn't partially implement the deconstruct API which is only really
useful for migrations anyway.
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:6>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:7>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"4f600673d71cd99918755805042b7c039645f712" 4f600673]:
{{{
#!CommitTicketReference repository=""
revision="4f600673d71cd99918755805042b7c039645f712"
Refs #32632 -- Added tests for returning a copy when combining Q()
objects.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:8>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"c8b659430556dca0b2fe27cf2ea0f8290dbafecd" c8b65943]:
{{{
#!CommitTicketReference repository=""
revision="c8b659430556dca0b2fe27cf2ea0f8290dbafecd"
Fixed #32632, Fixed #32657 -- Removed flawed support for Subquery
deconstruction.
Subquery deconstruction support required implementing complex and
expensive equality rules for sql.Query objects for little benefit as
the latter cannot themselves be made deconstructible to their reference
to model classes.
Making Expression @deconstructible and not BaseExpression allows
interested parties to conform to the "expression" API even if they are
not deconstructible as it's only a requirement for expressions allowed
in Model fields and meta options (e.g. constraints, indexes).
Thanks Phillip Cutter for the report.
This also fixes a performance regression in
bbf141bcdc31f1324048af9233583a523ac54c94.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:9>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"d5add5d3a26f98e961dfbcad67bb04d936f2f332" d5add5d3]:
{{{
#!CommitTicketReference repository=""
revision="d5add5d3a26f98e961dfbcad67bb04d936f2f332"
[3.2.x] Fixed #32632, Fixed #32657 -- Removed flawed support for Subquery
deconstruction.
Subquery deconstruction support required implementing complex and
expensive equality rules for sql.Query objects for little benefit as
the latter cannot themselves be made deconstructible to their reference
to model classes.
Making Expression @deconstructible and not BaseExpression allows
interested parties to conform to the "expression" API even if they are
not deconstructible as it's only a requirement for expressions allowed
in Model fields and meta options (e.g. constraints, indexes).
Thanks Phillip Cutter for the report.
This also fixes a performance regression in
bbf141bcdc31f1324048af9233583a523ac54c94.
Backport of c8b659430556dca0b2fe27cf2ea0f8290dbafecd from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32632#comment:10>