[Django] #35586: Queryset count and length do not match when annotating using JSONB_PATH_QUERY

16 views
Skip to first unread message

Django

unread,
Jul 9, 2024, 3:20:58 PM7/9/24
to django-...@googlegroups.com
#35586: Queryset count and length do not match when annotating using
JSONB_PATH_QUERY
----------------------------+-----------------------------------------
Reporter: devin13cox | Type: Bug
Status: new | Component: Uncategorized
Version: 5.0 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------+-----------------------------------------
There is a bug where the queryset count is inaccurate when doing an
annotation with JSONB_PATH_QUERY. The count is fixed after the queryset is
evaluated.

I discovered this bug via `PageNumberPagination` from
`rest_framework.pagination`, specifically in the function
`paginate_queryset`. This Paginator takes in an object_list (in this case
a Queryset) and determines the length by first checking if a .count()
method is available, otherwise by doing len(object_list). In this case, it
is determining length through Queryset.count().

When the Queryset is annotated, and the length of the results increases,
the count remains stale--only returning the original count.

Currently I have a workaround by adding a DISTINCT, INTERSECTION, UNION,
or DIFFERENCE to the queryset. Unfortunately, I now also need to have
ordering which makes this tricky to continue working around.

I believe this was a regression back in 4.2. This was previously asked
about here: https://code.djangoproject.com/ticket/28477#comment:22, and
this thread was referred: https://forum.djangoproject.com/t/django-4-2
-behavior-change-when-using-arrayagg-on-unnested-arrayfield-postgresql-
specific/21547. However, the `Unnest` approach does not work for our use
case.

Here are steps to reproduce:

**models.py:
**

{{{
class TestModel(models.Model):
test_json = models.JSONField(default=dict, blank=True)
id = models.IntegerField(primary_key=True)
}}}

**tests.py**

{{{
from .models import TestModel
from django.contrib.postgres.fields import JSONField
from django.db.models import Func, Value

def test_bug(self):
test_model = TestModel.objects.create(
test_json={
"test_key" : [
{
"id" : 1,
"name" : "test1"
},
{
"id" : 2,
"name" : "test2"
}
]
},
id=1
)
test_model.save()
qs = TestModel.objects.annotate(
table_element=Func(
"test_json",
Value("$.test_key[*]"),
function="jsonb_path_query",
output_field=JSONField(),
subquery=True
)
).filter(pk=1)

# qs.count() and len(qs) should be equal, but currently they are
not. Running qs.count() after len(qs) is currently equal because the
queryset was evaluated.

self.assertEqual(qs.count(), len(qs))

}}}

Thank you for any guidance and/or support!
--
Ticket URL: <https://code.djangoproject.com/ticket/35586>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 9, 2024, 3:23:40 PM7/9/24
to django-...@googlegroups.com
#35586: Queryset count and length do not match when annotating using
JSONB_PATH_QUERY
-------------------------------+--------------------------------------
Reporter: devin13cox | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Comment (by devin13cox):

Edit: the `subquery=True` component was from another recommendation I
found, however whether or not it is included has had no impact on the net
result.
--
Ticket URL: <https://code.djangoproject.com/ticket/35586#comment:1>

Django

unread,
Jul 9, 2024, 3:26:12 PM7/9/24
to django-...@googlegroups.com
#35586: Queryset count and length do not match when annotating using
JSONB_PATH_QUERY
-------------------------------------+-------------------------------------
Reporter: devin13cox | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by devin13cox):

* component: Uncategorized => Database layer (models, ORM)

--
Ticket URL: <https://code.djangoproject.com/ticket/35586#comment:2>

Django

unread,
Jul 9, 2024, 4:49:59 PM7/9/24
to django-...@googlegroups.com
#35586: Queryset count and length do not match when annotating using
JSONB_PATH_QUERY
-------------------------------------+-------------------------------------
Reporter: devin13cox | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)
* stage: Unreviewed => Accepted

Comment:

This one is tricky, the ORM and its aggregation logic was not built in a
way that accounts for annotations spawning rows themselves (instead of
through `JOIN`s) like some [https://www.postgresql.org/docs/current
/functions-srf.html Postgres set-returning functions] (e.g. `UNNEST`).
These do effectively throw a wrench in the optimization introduced in 4.2
if the set-returning annotation is not referenced by the aggregation.

What is really needed here is likely a `Expression.set_returning: bool =
False` attribute that the ORM can consider to enforce the subquery
pushdown and preventing of annotation eliding.

Something like

{{{#!diff
diff --git a/django/db/models/expressions.py
b/django/db/models/expressions.py
index 4ee22420d9..589fd30b0f 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -182,6 +182,8 @@ class BaseExpression:
allowed_default = False
# Can the expression be used during a constraint validation?
constraint_validation_compatible = True
+ # Does the expression possibly return more than one row?
+ set_returning = False

def __init__(self, output_field=None):
if output_field is not None:
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index 438bb5ddbd..0eb1a2476c 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -491,6 +491,10 @@ def get_aggregation(self, using, aggregate_exprs):
)
or having
)
+ has_set_returning_annotation = any(
+ getattr(annotation, "set_returning", False)
+ for annotation in self.annotations.values()
+ )
# Decide if we need to use a subquery.
#
# Existing aggregations would cause incorrect results as
@@ -510,6 +514,7 @@ def get_aggregation(self, using, aggregate_exprs):
or qualify
or self.distinct
or self.combinator
+ or has_set_returning_annotation
):
from django.db.models.sql.subqueries import AggregateQuery
}}}

That could then be used like

{{{#!python
class JSONPathQuery(Func):
function = "jsonb_path_query"
output_field = JSONField()
set_returning = True

JSONPathQuery("test_json", Value("$.test_key[*]"))
}}}

The challenge here is that Django doesn't have any ''core'' set-returning
function (which also explains why this slipped under the radar for so
long) so maybe we should also consider adding support for
`contrib.postgres.Unnest` which is a common one that would allow us to
ensure proper test coverage and document it as an example of when this
flag should be set?
--
Ticket URL: <https://code.djangoproject.com/ticket/35586#comment:3>
Reply all
Reply to author
Forward
0 new messages