[Django] #35833: Annotation yielding an empty set causes entire QuerySet to become empty

17 views
Skip to first unread message

Django

unread,
Oct 12, 2024, 3:59:07 PM10/12/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Type: Bug
Status: new | Component: Database
| layer (models, ORM)
Version: 5.1 | Severity: Normal
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
With a nullable JSONField in my model:

{{{
class TileModel(models.Model):
provisionaledits = JSONField(null=True)
}}}

If I then annotate with a set-returning function (e.g.
JSONB_ARRAY_ELEMENTS on postgres) that happens to return an empty set, the
entire QuerySet is zeroed out.

{{{
In [1]: from django.db.models import Func, F

In [2]: from arches.app.models.models import TileModel

In [3]: class JsonbArrayElements(Func):
...: arity = 1
...: contains_subquery = True # Django 5.2: set_returning = True
...: function = "JSONB_ARRAY_ELEMENTS"
...:

In [4]: TileModel.objects.count()
Out[4]: 9

In [5]:
TileModel.objects.annotate(empty_set=JsonbArrayElements(F("provisionaledits")))
Out[5]: <QuerySet []>
}}}

I distilled this from a more complicated failure. If it helps, here is a
query that fails more helpfully when you actually have JSON in the queried
field but still can't extract array elements because it's not an array:

{{{
In [14]: TileModel.objects.create(provisionaledits={})
Out[14]: <TileModel: TileModel object
(6ee450f7-b516-4c61-b6e7-52e876b878fe)>

In [15]:
TileModel.objects.annotate(no_longer_empty=JsonbArrayElements(F("provisionaledits")))
Out[15]: DataError: cannot extract elements from an object
}}}

I guess I would expect an error in both cases instead of an empty queryset
in the former. At the very least, I would only expect the annotation to
yield some sort of empty value, not for the whole queryset to silently
become empty.
--
Ticket URL: <https://code.djangoproject.com/ticket/35833>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 15, 2024, 3:19:54 AM10/15/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* cc: Devin Cox (added)
* stage: Unreviewed => Accepted

Comment:

Agree this looks like a bug, here's a rough testy thing in case folks find
it useful

{{{#!diff
--- a/tests/annotations/models.py
+++ b/tests/annotations/models.py
@@ -61,7 +61,7 @@ class Ticket(models.Model):


class JsonModel(models.Model):
- data = models.JSONField(default=dict, blank=True)
+ data = models.JSONField(null=True)
id = models.IntegerField(primary_key=True)

class Meta:
diff --git a/tests/annotations/tests.py b/tests/annotations/tests.py
index 29660a827e..31dfed6ca4 100644
--- a/tests/annotations/tests.py
+++ b/tests/annotations/tests.py
@@ -3,7 +3,7 @@ from decimal import Decimal
from unittest import skipUnless

from django.core.exceptions import FieldDoesNotExist, FieldError
-from django.db import connection
+from django.db import connection, DataError
from django.db.models import (
BooleanField,
Case,
@@ -1188,6 +1188,22 @@ class NonAggregateAnnotationTestCase(TestCase):

self.assertEqual(qs.count(), len(qs))

+ @skipUnless(connection.vendor == "postgresql", "PostgreSQL tests")
+ @skipUnlessDBFeature("supports_json_field")
+ def test_set_returning_functions(self):
+ class JsonbArrayElements(Func):
+ function = "JSONB_ARRAY_ELEMENTS"
+ arity = 1
+ set_returning = True
+
+ JsonModel.objects.create(id=1)
+ qs =
JsonModel.objects.annotate(empty_set=JsonbArrayElements(F("data")))
+ self.assertEqual(qs.count(), 0) # <-- expect to be a DataError
+
+ JsonModel.objects.create(id=2, data={})
+ with self.assertRaises(DataError):
+
JsonModel.objects.annotate(empty_set=JsonbArrayElements(F("data"))).count()
+

class AliasTests(TestCase):
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35833#comment:1>

Django

unread,
Oct 15, 2024, 9:36:18 AM10/15/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

Thanks for the draft test. Looking again, it seems providing `NULL` to
this function is fine, and doesn't raise an error:

{{{
% python manage.py dbshell
psql (16.1, server 15.4)
Type "help" for help.

my_project=# SELECT JSONB_ARRAY_ELEMENTS(NULL);
jsonb_array_elements
----------------------
(0 rows)
}}}

So I guess my expectation is that I can use the annotation as written
without interfering with my queryset as a whole. My use case was providing
a path lookup that happened to not exist (returning NULL) rather than just
querying the root of the field.
--
Ticket URL: <https://code.djangoproject.com/ticket/35833#comment:2>

Django

unread,
Oct 15, 2024, 10:17:45 AM10/15/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)

Comment:

I could not reproduce the difference of `count` and `len` locally with the
provided test case when `set_returning = True` is appropriately set; both
result in returning no rows when dealing with empty sequence references.
This is definitely a bug if we can reproduce the mismatch.

> So I guess my expectation is that I can use the annotation as written
without interfering with my queryset as a whole.

I'm not sure how much of a this is a bug over a misunderstanding of how
`JSONB_ARRAY_ELEMENTS` and set-returning functions actually works. AFAIK
when using a set-returning function in a `SELECT` clause (which is what
`annotate` does) Postgres performs
[https://en.wikipedia.org/wiki/Cartesian_product a Cartesian product] of
the existing rows and the each values returned for the row. The Cartesian
product of any set with the empty set is the empty set.

If you want to avoid the empty set product then you must avoid providing a
value to `JSONB_ARRAY_ELEMENTS` that results in one. In the case of `NULL`
this can be avoided using `Coalesce` but you also have to deal with the
empty array case. Something like the following is possibly what you're
after?

{{{#!python
class JsonbArrayLength(Func):
arity = 1
ouput_field = models.IntegerField()
function = "jsonb_array_length"


empty_set_condition = Q(provisionaledits=None) |
LesserThan(JsonbArrayLength("provisionaledits", 1))
empty_set_value = Value(["<empty>"], JSONField())

TileModel.objects.annotate(
annotation=JsonbArrayElements(
Case(
When(empty_set_condition, then=empty_set_value),
default="provisionaledits",
)
)
)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35833#comment:3>

Django

unread,
Oct 16, 2024, 9:35:27 AM10/16/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: closed
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* resolution: => invalid
* status: new => closed

Comment:

Thanks very much, Simon. This is user error, then. I guess rather than a
Cartesian product I was expecting an implicit subquery so that anything
that happened during annotate would not "filter" the QuerySet. I didn't
intend to suggest there was a count/len mismatch; I was just trying to
show that I had some number of model instances persisted in general, and I
expected to see all of them no matter what happened in `annotate()`. But
your explanation helps me see why that's wrong.

Thanks for the draft query, too. While this ticket was being triaged, I
ended up using an explicit subquery, along these lines:
{{{
# Start from an FK-related model
ResourceInstance.objects.annotate(
annotation=ArraySubquery(
TileModel.objects.filter(resourceinstance_id=OuterRef("resourceinstanceid"))
.annotate(as_array=JsonbArrayElements(F("more_complex_path_lookup"))
.values("as_array")
)
)
}}}

Then, in some queries where my json path lookup needed to be resilient
against missing keys in I ended up wrapping the expression given to the
`JsonbArrayElements` in a `Case` as you suggested.

I would have come back and closed the ticket if I could have put all these
pieces together more quickly (sorry!), but I guess I was still bewitched
by the assumption that whatever I did inside `annotate()` wouldn't
possibly filter down my queryset.
--
Ticket URL: <https://code.djangoproject.com/ticket/35833#comment:4>

Django

unread,
Oct 16, 2024, 9:44:20 AM10/16/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: closed
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* stage: Accepted => Unreviewed

--
Ticket URL: <https://code.djangoproject.com/ticket/35833#comment:5>

Django

unread,
Oct 16, 2024, 9:53:34 PM10/16/24
to django-...@googlegroups.com
#35833: Annotation yielding an empty set causes entire QuerySet to become empty
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: closed
Component: Database layer | Version: 5.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

The way Postgres set-returning annotations work is definitely unintuitive
if you are used to subqueries which forces you to have a single row and
implicitly use `OUTER` instead of `INNER` semantics so I wouldn't be too
harsh on yourself here.

What I mean is that the fact subqueries which are somewhat set-returning
constructs behave in an `OUTER JOIN LATERAL` manner

{{{#!sql
SELECT
author.id, (
SELECT post.id
FROM post
WHERE post.author_id = author.id
ORDER BY id LIMIT 1
) first_post_id
FROM author
}}}

[https://dbfiddle.uk/A8JxBxnF Is equivalent to]

{{{#!sql
SELECT
author.id,
first_post.id
FROM author
LEFT OUTER JOIN LATERAL (
SELECT post.*
FROM post
WHERE post.author_id = author.id
ORDER BY id
LIMIT 1
) first_post ON true
}}}

||= id =||= first_post_id =||
|| 1 || 1 ||
|| 2 || null ||


While other set-returning functions behave in a `INNER JOIN LATERAL`
manner

{{{#!sql
SELECT
author.id,
jsonb_array_elements(jsonb_path_query_array(post_ids, '$[0 to 0]'))
first_post_id
FROM author
}}}

[https://dbfiddle.uk/d-22xqgK Is equivalent to]

{{{#!sql
SELECT
author.id,
first_post.id AS first_post_id
FROM author
INNER JOIN LATERAL (
SELECT * FROM jsonb_array_elements(jsonb_path_query_array(post_ids, '$[0
to 0]'))
) first_post(id) ON true
}}}

||= id =||= first_post_id =||
|| 1 || 1 ||

Is definitely breaks the principle of least astonishment, at least it did
for me!
--
Ticket URL: <https://code.djangoproject.com/ticket/35833#comment:6>
Reply all
Reply to author
Forward
0 new messages