{{{
Book.objects.annotate(Count('chapters')).count()
}}}
It produces the same results as:
{{{
Book.objects.count()
}}}
Django could be more intelligent about what annotations to include in the
query produced by `queryset.count()`, stripping out any annotations that
are not referenced by filters, other annotations or ordering. This should
speed up calls to count() with complex annotations.
There seems to be precedent for this: select_related calls are ignored
with count() queries.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:1>
* owner: nobody => Tom
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:2>
Comment (by Tom):
Same can be done for `exists()`
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:3>
Comment (by Tom Forbes):
WIP PR: https://github.com/django/django/pull/8928/files
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:4>
* has_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:5>
* needs_better_patch: 0 => 1
Comment:
The PR is still marked [WIP] and there are test failures.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:6>
Comment (by Reupen Shah):
I have also run into problems with `QuerySet.count()` being very slow on
annotated query sets. Django uses a subquery for the count but injects a
group by into the subquery. This typically causes the database server to
deduplicate all matched rows using the group by columns which is, in
general, extremely slow when there are a large number of matched rows.
For example, consider the following model:
{{{
class Person(models.Model):
"""Person model."""
first_name = models.TextField()
last_name = models.TextField()
country = models.TextField(null=True, blank=True)
}}}
and query set:
{{{
from django.db.models.functions import Concat
from django.db.models import Value
queryset = Person.objects.annotate(full_name=Concat('first_name', Value('
'), 'last_name'))
}}}
`queryset.count()` generates the following query under PostgreSQL:
{{{
SELECT COUNT(*)
FROM
(SELECT "support_person"."id" AS Col1,
CONCAT("support_person"."first_name", CONCAT(' ',
"support_person"."last_name")) AS "full_name"
FROM "support_person"
GROUP BY "support_person"."id",
CONCAT("support_person"."first_name", CONCAT(' ',
"support_person"."last_name"))) subquery
}}}
`list(queryset)` generates:
{{{
SELECT "support_person"."id",
"support_person"."first_name",
"support_person"."last_name",
"support_person"."country",
CONCAT("support_person"."first_name", CONCAT(' ',
"support_person"."last_name")) AS "full_name"
FROM "support_person"
}}}
I am not entirely sure why the subquery for the count needs to be any
different from the query used when the query set itself is evaluated.
There are some relevant comments in the source code here:
https://github.com/django/django/blob/5deb7a86e8b54d052a3b1dbed1ae7142d362b1c5/django/db/models/sql/query.py#L404-L414
This has all been tested under Django 2.1.7.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:7>
Comment (by Simon Charette):
This is somewhat related to #30158 where the compiler is not smart enough
to determine if it can exclude subquery annotations from `GROUP BY` on
aggregation.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:8>
Comment (by Reupen Shah):
The behaviour is actually a bit more bizarre than I thought.
With the same person model, here are four count variations and the
generated queries:
1.
{{{
Person.objects.count()
}}}
{{{
SELECT COUNT(*) AS "__count" FROM "people_person"
}}}
2.
{{{
Person.objects.values('pk').count()
}}}
{{{
SELECT COUNT(*) AS "__count" FROM "people_person"
}}}
3.
{{{
Person.objects.annotate(full_name=Concat('first_name', Value(' '),
'last_name')).count()
}}}
{{{
SELECT COUNT(*) FROM (SELECT "people_person"."id" AS Col1,
CONCAT("people_person"."first_name", CONCAT(' ',
"people_person"."last_name")) AS "full_name" FROM "people_person" GROUP BY
"people_person"."id", CONCAT("people_person"."first_name", CONCAT(' ',
"people_person"."last_name"))) subquery
}}}
4.
{{{
Person.objects.annotate(full_name=Concat('first_name', Value(' '),
'last_name')).values('pk').count()
}}}
{{{
SELECT COUNT(*) FROM (SELECT "people_person"."id" AS Col1 FROM
"people_person") subquery
}}}
So there's a simple workaround for the case I gave in my last comment in
calling `.values('pk')` before `.count()`. However, that's not much help
if Django or other libraries are the ones calling `.count()`.
On the plus side, I have a better understanding of what the problem is
from looking at the Django source code.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:9>
* cc: Reupen Shah (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:10>
Comment (by Reupen Shah):
Just a note to say that the behaviour I was describing was rectified in
https://github.com/django/django/pull/11062.
The third case from above now executes the following SQL query:
{{{
SELECT COUNT(*) FROM (
SELECT CONCAT("people_person"."first_name", CONCAT(' ',
"people_person"."last_name")) AS "full_name" FROM "people_person"
) subquery
}}}
Although the original ticket was about eliminating unneeded annotations
completely, the result above is good enough for me, as the group by has
been eliminated which was the real performance killer.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:11>
* Attachment "ProductAndServiceUsage.json.zip" added.
* owner: Tom Forbes => Simon Charette
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:12>
* needs_better_patch: 1 => 0
Comment:
I submitted [https://github.com/django/django/pull/16263 a revised PR]
that strip unused annotations not only for `.count()` for but any usage of
`.aggregate`.
To take example from comment:7 and comment:9 the resulting query is now
{{{#!python
Person.objects.annotate(full_name=Concat('first_name', Value(' '),
'last_name')).count()
}}}
{{{#!sql
SELECT COUNT(*) FROM support_person
}}}
The adjusted logic goes along these lines.
Instead of systematically performing a subquery wrapping when there are
existing annotations only do so when the pre-existing annotation are
aggregate or window functions. In both cases, pre-existing
aggregation/window or not, strip annotations that are not referenced by
the aggregates.
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:13>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:14>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"59bea9efd2768102fc9d3aedda469502c218e9b7" 59bea9ef]:
{{{
#!CommitTicketReference repository=""
revision="59bea9efd2768102fc9d3aedda469502c218e9b7"
Fixed #28477 -- Stripped unused annotations on aggregation.
Also avoid an unnecessary pushdown when aggregating over a query that
doesn't
have aggregate annotations.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:15>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"a9d2d8d1c36a4338758a792c475965180629a59f" a9d2d8d]:
{{{
#!CommitTicketReference repository=""
revision="a9d2d8d1c36a4338758a792c475965180629a59f"
Refs #28477 -- Reduced complexity of aggregation over qualify queries.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:16>
Comment (by GitHub <noreply@…>):
In [changeset:"10037130c123cd747d32a14a9ba47e0c5c9a37d1" 1003713]:
{{{
#!CommitTicketReference repository=""
revision="10037130c123cd747d32a14a9ba47e0c5c9a37d1"
Refs #28477 -- Fixed handling aliased annotations on aggregation.
Just like when using .annotate(), the .alias() method will generate the
necessary JOINs to resolve the alias even if not selected.
Since these JOINs could be multi-valued non-selected aggregates must be
considered to require subquery wrapping as a GROUP BY is required to
combine duplicated tuples from the base table.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:17>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"9daf8b4109c3e133eb57349bb44d73cc60c5773c" 9daf8b41]:
{{{
#!CommitTicketReference repository=""
revision="9daf8b4109c3e133eb57349bb44d73cc60c5773c"
Fixed #34464 -- Fixed queryset aggregation over group by reference.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Refs #28477.
Thanks Ian Cubitt for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:18>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"511dc3db539122577aaba71f5a24d65d5adab092" 511dc3db]:
{{{
#!CommitTicketReference repository=""
revision="511dc3db539122577aaba71f5a24d65d5adab092"
[4.2.x] Fixed #34464 -- Fixed queryset aggregation over group by
reference.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Refs #28477.
Thanks Ian Cubitt for the report.
Backport of 9daf8b4109c3e133eb57349bb44d73cc60c5773c from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:19>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"e5c844d6f2a4ac6ae674d741b5f1fa2a688cedf4" e5c844d6]:
{{{
#!CommitTicketReference repository=""
revision="e5c844d6f2a4ac6ae674d741b5f1fa2a688cedf4"
Fixed #34551 -- Fixed QuerySet.aggregate() crash when referencing
subqueries.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Refs #28477.
Thanks Denis Roldán and Mariusz for the test.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:20>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"c78a4421de0fc3240b91d59e8f9028331777c624" c78a442]:
{{{
#!CommitTicketReference repository=""
revision="c78a4421de0fc3240b91d59e8f9028331777c624"
[4.2.x] Fixed #34551 -- Fixed QuerySet.aggregate() crash when referencing
subqueries.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Refs #28477.
Thanks Denis Roldán and Mariusz for the test.
Backport of e5c844d6f2a4ac6ae674d741b5f1fa2a688cedf4 from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:21>
Comment (by Dustin Lorres):
Is there a way to opt out of this behavior from application code? After
upgrading from 3.2 to 4.2 some of our pagination stopped working correctly
and it was traced back to `.count()` not providing the correct results,
likely from the updates in this ticket.
Here is an example of the failure (using PostgreSQL JSON query):
{{{
>>> qs = MyModel.objects.annotate(table_element=Func("data",
Value("$.MyArray[*]"),function="jsonb_path_query",output_field=JSONField())).filter(pk=1)
>>> qs.count()
1
>>> len(qs)
2
}}}
This assumes a very simple model `MyModel` with a JSON field `data` that
is has an instance (pk of 1) that has something the data field set to:
{{{
{
"MyArray": [{"id": 1, "name": "test"}, {"id": 2, "name": "test2"}]
}
}}}
The issue is now the `count` is not factoring in the annotation which will
actually increase the number of rows returned in the queryset (for this
example due to the jsonb_path_query which returns a set). It is only
counting the number of rows of `MyModel` which due to the `pk` filter will
only have one row returned.
Is there anyway to force the count operation to expand the query and
include the annotation?
I tried to filter on the count of the jsonb path query, but it failed:
{{{
>>> qs = MyModel.objects.annotate(my_count=Count(Func("data",
Value("$.MyArray[*]"),function="jsonb_path_query",output_field=JSONField()))).filter(pk=1,
my_count__gt=0)
>>> qs.count()
Exception:
django.db.utils.NotSupportedError: set-returning functions are not allowed
in HAVING
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:22>
Comment (by Simon Charette):
Dustin, [https://forum.djangoproject.com/t/django-4-2-behavior-change-
when-using-arrayagg-on-unnested-arrayfield-postgresql-specific/21547/2 I
think this answer is your way out].
Basically the optimization should be disabled for any set-returning
function but since Django only has single native one,`Subquery`, so the
optimization is only disabled when the `subquery = True` attribute is set.
In order to truly solve this issue I think we should introduce a new
documented `Expression.set_returning: bool` (better name welcome!) flag
that defaults to `False` but is set to `True` for `Subquery`.
The root of this problem is that the ORM simply doesn't support functions
that return rows in a generic way. Instead it branches out using
`getattr(expr, "subquery", False)` in all cases that it makes the most
sense to support them (e.g. `__in` looups).
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:23>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"68912e4f6f84f21322f92a2c7b6c77f68f91b9c9" 68912e4]:
{{{
#!CommitTicketReference repository=""
revision="68912e4f6f84f21322f92a2c7b6c77f68f91b9c9"
Fixed #34717 -- Fixed QuerySet.aggregate() crash when referencing window
functions.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Refs #28477.
Thanks younes-chaoui for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:24>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"7a67b065d7e5653f3af1cbd28882d33d2a088b02" 7a67b065]:
{{{
#!CommitTicketReference repository=""
revision="7a67b065d7e5653f3af1cbd28882d33d2a088b02"
[4.2.x] Fixed #34717 -- Fixed QuerySet.aggregate() crash when referencing
window functions.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Refs #28477.
Thanks younes-chaoui for the report.
Backport of 68912e4f6f84f21322f92a2c7b6c77f68f91b9c9 from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:25>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"3b4a571275d967512866012955eb0b3ae486d63c" 3b4a5712]:
{{{
#!CommitTicketReference repository=""
revision="3b4a571275d967512866012955eb0b3ae486d63c"
Fixed #34798 -- Fixed QuerySet.aggregate() crash when referencing
expressions containing subqueries.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7,
complements e5c844d6f2a4ac6ae674d741b5f1fa2a688cedf4.
Refs #28477, #34551.
Thanks Haldun Komsuoglu for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:26>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"4ccca9eedc2f453602a20f562399a835a24817c1" 4ccca9ee]:
{{{
#!CommitTicketReference repository=""
revision="4ccca9eedc2f453602a20f562399a835a24817c1"
[5.0.x] Fixed #34798 -- Fixed QuerySet.aggregate() crash when referencing
expressions containing subqueries.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7,
complements e5c844d6f2a4ac6ae674d741b5f1fa2a688cedf4.
Refs #28477, #34551.
Thanks Haldun Komsuoglu for the report.
Backport of 3b4a571275d967512866012955eb0b3ae486d63c from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:27>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"803caec60bed3b282b9f9961860a467160c0c8f1" 803caec]:
{{{
#!CommitTicketReference repository=""
revision="803caec60bed3b282b9f9961860a467160c0c8f1"
[4.2.x] Fixed #34798 -- Fixed QuerySet.aggregate() crash when referencing
expressions containing subqueries.
Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7,
complements e5c844d6f2a4ac6ae674d741b5f1fa2a688cedf4.
Refs #28477, #34551.
Thanks Haldun Komsuoglu for the report.
Backport of 3b4a571275d967512866012955eb0b3ae486d63c from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:28>