[Django] #33929: Field Reference in FilteredRelation Does Not Recognize Previously Defined FilteredRelation

20 views
Skip to first unread message

Django

unread,
Aug 16, 2022, 7:53:08 PM8/16/22
to django-...@googlegroups.com
#33929: Field Reference in FilteredRelation Does Not Recognize Previously Defined
FilteredRelation
-------------------------------------+-------------------------------------
Reporter: Matt | Owner: nobody
Type: Bug | Status: new
Component: Database | Version: 4.1
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I suspect this may be the same root cause as
https://code.djangoproject.com/ticket/33766, but the use-case here is
different enough I thought I'd log a new ticket.

All this is using Django 4.0 or 4.1, on PostgreSQL. I confess that I have
not checked if other DB layers might generate correct SQL.

It appears that I cannot reference one FilteredRelation from another's
condition without jumping through some hoops. Starting with the following
example models:
{{{
from django.db import models

class A(models.Model):
...

class B(models.Model):
a = models.ForeignKey("A", on_delete=models.CASCADE)
complete = models.BooleanField(default=False)

class C(models.Model):
a = models.ForeignKey("A", on_delete=models.CASCADE)
b = models.OneToOneField("B", blank=True, null=True,
on_delete=models.CASCADE)
complete = models.BooleanField(default=False)
}}}

Now suppose that I want a count of incomplete B, and also incomplete C,
but only when related to an incomplete B.
If I were writing SQL myself, I’d write this as:
{{{
SELECT COUNT(b.id) as b_count, COUNT(c.id) as c_count
FROM a
LEFT JOIN b ON b.a_id = a.id AND NOT b.complete
LEFT JOIN c ON c.a_id = a.id AND c.b_id = b.id AND NOT c.complete
}}}

Now, the below queryset very nearly works:
{{{
A.objects.annotate(
binc=FilteredRelation("b", condition=Q(b__complete=False)),
cinc=FilteredRelation("c", condition=Q(c__b=F("binc__pk"),
c__complete=False)),
b_count=Count("binc"),
c_count=Count("cinc"),
)
}}}

Unfortunately this uses an incorrect table alias into the `cinc`
FilteredRelation, where I tried to reference `F("binc__pk")`. If I try to
execute it, I get
{{{
django.db.utils.ProgrammingError: missing FROM-clause entry for table "t4"
LINE 1: ...("a"."id" = cinc."a_id" AND ((cinc."b_id" = (T4."id") A…
}}}

There is a workaround: I can force the correct identifier using RawSQL,
and use this, which provides correct results:

{{{
A.objects.annotate(
binc=FilteredRelation("b", condition=Q(b__complete=False)),
cinc=FilteredRelation("c", condition=Q(c__b=RawSQL("binc.id", ()),
c__complete=False)),
b_count=Count("binc"),
c_count=Count("cinc"),
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/33929>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Aug 16, 2022, 10:13:54 PM8/16/22
to django-...@googlegroups.com
#33929: Field Reference in FilteredRelation Does Not Recognize Previously Defined
FilteredRelation
-------------------------------------+-------------------------------------
Reporter: Matt | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Nick Vellios):

I hope to add some more context after looking this over. It appears as
though Matt is correct regarding being related to
https://code.djangoproject.com/ticket/33766. The fields are not being
properly aliased.

App name in my tests is `interface` which I left references of in the
console output, but I removed the `interface_` prefix from the formatted
SQL queries for readability.

{{{
>>> qs = A.objects.annotate(
... binc=FilteredRelation("b", condition=Q(b__complete=False)),
... cinc=FilteredRelation("c", condition=Q(c__b=F("binc__pk"),
c__complete=False)),
... b_count=Count("binc"),
... c_count=Count("cinc"),
... )
>>> qs.query.alias_map
{'interface_a': <django.db.models.sql.datastructures.BaseTable object at
0x11095fa60>, 'binc': <django.db.models.sql.datastructures.Join object at
0x11095c6a0>, 'cinc': <django.db.models.sql.datastructures.Join object at
0x11095de10>}
}}}

Invoking the query compiler adds the naive alias `T4` to the query which
also later shows up in the `Query.alias_map`.
{{{
>>> str(qs.query)

SELECT
"a"."id",
COUNT(binc."id") AS "b_count",
COUNT(cinc."id") AS "c_count"
FROM "a"
LEFT OUTER JOIN "b" binc
ON ("a"."id" = binc."a_id"
AND (NOT binc."complete"))
LEFT OUTER JOIN "c" cinc
ON ("a"."id" = cinc."a_id"


AND ((cinc."b_id" = (T4."id")

AND NOT cinc."complete")))
GROUP BY "a"."id"
}}}

Notice the incrementing `Tn` on the `Query.alias_map` but only the last
one is referenced in the resulting query:

{{{
>>> qs.query.alias_map
{'interface_a': <django.db.models.sql.datastructures.BaseTable object at
0x11095fa60>, 'binc': <django.db.models.sql.datastructures.Join object at
0x11095c6a0>, 'cinc': <django.db.models.sql.datastructures.Join object at
0x11095de10>, 'T4': <django.db.models.sql.dat
astructures.Join object at 0x1077991b0>, 'T5':
<django.db.models.sql.datastructures.Join object at 0x10778b400>}

>>> str(qs.query)
'SELECT "interface_a"."id", COUNT(binc."id") AS "b_count",
COUNT(cinc."id") AS "c_count" FROM "interface_a" LEFT OUTER JOIN
"interface_b" binc ON ("interface_a"."id" = binc."a_id" AND (NOT
binc."complete")) LEFT OUTER JOIN "interface_c" cinc ON
("interface_a"."id" =
cinc."a_id" AND ((cinc."b_id" = (T6."id") AND NOT cinc."complete")))
GROUP BY "interface_a"."id"'

>>> str(qs.query)
'SELECT "interface_a"."id", COUNT(binc."id") AS "b_count",
COUNT(cinc."id") AS "c_count" FROM "interface_a" LEFT OUTER JOIN
"interface_b" binc ON ("interface_a"."id" = binc."a_id" AND (NOT
binc."complete")) LEFT OUTER JOIN "interface_c" cinc ON
("interface_a"."id" =
cinc."a_id" AND ((cinc."b_id" = (T8."id") AND NOT cinc."complete")))
GROUP BY "interface_a"."id"'

>>> qs.query.alias_map
{'interface_a': <django.db.models.sql.datastructures.BaseTable object at
0x11095fa60>, 'binc': <django.db.models.sql.datastructures.Join object at
0x11095c6a0>, 'cinc': <django.db.models.sql.datastructures.Join object at
0x11095de10>, 'T4': <django.db.models.sql.dat
astructures.Join object at 0x1077991b0>, 'T5':
<django.db.models.sql.datastructures.Join object at 0x10778b400>, 'T6':
<django.db.models.sql.datastructures.Join object at 0x11099b640>, 'T7':
<django.db.models.sql.datastructures.Join object at 0x110834580>, 'T8':
<dj
ango.db.models.sql.datastructures.Join object at 0x110ad0610>, 'T9':
<django.db.models.sql.datastructures.Join object at 0x110ad03a0>}


>>> str(qs.query)

SELECT
"a"."id",
COUNT(binc."id") AS "b_count",
COUNT(cinc."id") AS "c_count"
FROM "a"
LEFT OUTER JOIN "b" binc
ON ("a"."id" = binc."a_id"
AND (NOT binc."complete"))
LEFT OUTER JOIN "c" cinc
ON ("a"."id" = cinc."a_id"
AND ((cinc."b_id" = (T9"id")
AND NOT cinc."complete")))
GROUP BY "a"."id"

}}}

The following works but selects and groups by one additional field
`binc_pk`. Postgres Query Planner output included. On a larger queryset
this could get expensive.

{{{
A.objects.annotate(
binc=FilteredRelation("b", condition=Q(b__complete=False)),

binc_pk=F('binc__pk'), # <-- Allows elimination of relying on raw
SQL, however...
cinc=FilteredRelation("c", condition=Q(c__b=F('binc_pk'),


c__complete=False)),
b_count=Count("binc"),
c_count=Count("cinc"),
)
}}}

{{{
SELECT
"a"."id",
binc."id" AS "binc_pk", /* <-- Ugly */
COUNT(binc."id") AS "b_count",
COUNT(cinc."id") AS "c_count"
FROM "a"
LEFT OUTER JOIN "b" binc
ON ("a"."id" = binc."a_id"
AND (NOT binc."complete"))
LEFT OUTER JOIN "c" cinc
ON ("a"."id" = cinc."a_id"
AND ((cinc."b_id" = (binc."id")
AND NOT cinc."complete")))
GROUP BY "a"."id",
binc."id" /* <-- Ugly */
}}}

Not ideal.

{{{
➜ ~ psql -d dj_issue_33929
psql (14.4)
Type "help" for help.

dj_issue_33929=# EXPLAIN SELECT
dj_issue_33929-# "interface_a"."id",
dj_issue_33929-# binc."id" AS "binc_pk",
dj_issue_33929-# COUNT(binc."id") AS "b_count",
dj_issue_33929-# COUNT(cinc."id") AS "c_count"
dj_issue_33929-# FROM "interface_a"
dj_issue_33929-# LEFT OUTER JOIN "interface_b" binc
dj_issue_33929-# ON ("interface_a"."id" = binc."a_id"
dj_issue_33929(# AND (NOT binc."complete"))
dj_issue_33929-# LEFT OUTER JOIN "interface_c" cinc
dj_issue_33929-# ON ("interface_a"."id" = cinc."a_id"
dj_issue_33929(# AND ((cinc."b_id" = (binc."id")
dj_issue_33929(# AND NOT cinc."complete")))
dj_issue_33929-# GROUP BY "interface_a"."id",
dj_issue_33929-# binc."id";
QUERY PLAN
---------------------------------------------------------------------------------------
HashAggregate (cost=162.74..185.34 rows=2260 width=32)
Group Key: interface_a.id, binc.id
-> Hash Left Join (cost=97.80..140.14 rows=2260 width=24)
Hash Cond: ((interface_a.id = cinc.a_id) AND (binc.id =
cinc.b_id))
-> Hash Right Join (cost=60.85..91.33 rows=2260 width=16)
Hash Cond: (binc.a_id = interface_a.id)
-> Seq Scan on interface_b binc (cost=0.00..28.10
rows=905 width=16)
Filter: (NOT complete)
-> Hash (cost=32.60..32.60 rows=2260 width=8)
-> Seq Scan on interface_a (cost=0.00..32.60
rows=2260 width=8)
-> Hash (cost=25.40..25.40 rows=770 width=24)
-> Seq Scan on interface_c cinc (cost=0.00..25.40
rows=770 width=24)
Filter: (NOT complete)
(13 rows)

dj_issue_33929=# EXPLAIN SELECT
dj_issue_33929-# "interface_a"."id",
dj_issue_33929-# COUNT(binc."id") AS "b_count",
dj_issue_33929-# COUNT(cinc."id") AS "c_count"
dj_issue_33929-# FROM "interface_a"
dj_issue_33929-# LEFT OUTER JOIN "interface_b" binc
dj_issue_33929-# ON ("interface_a"."id" = binc."a_id"
dj_issue_33929(# AND (NOT binc."complete"))
dj_issue_33929-# LEFT OUTER JOIN "interface_c" cinc
dj_issue_33929-# ON ("interface_a"."id" = cinc."a_id"
dj_issue_33929(# AND ((cinc."b_id" = (binc.id)
dj_issue_33929(# AND NOT cinc."complete")))
dj_issue_33929-# GROUP BY "interface_a"."id";
QUERY PLAN
---------------------------------------------------------------------------------------
HashAggregate (cost=157.09..179.69 rows=2260 width=24)
Group Key: interface_a.id
-> Hash Left Join (cost=97.80..140.14 rows=2260 width=24)
Hash Cond: ((interface_a.id = cinc.a_id) AND (binc.id =
cinc.b_id))
-> Hash Right Join (cost=60.85..91.33 rows=2260 width=16)
Hash Cond: (binc.a_id = interface_a.id)
-> Seq Scan on interface_b binc (cost=0.00..28.10
rows=905 width=16)
Filter: (NOT complete)
-> Hash (cost=32.60..32.60 rows=2260 width=8)
-> Seq Scan on interface_a (cost=0.00..32.60
rows=2260 width=8)
-> Hash (cost=25.40..25.40 rows=770 width=24)
-> Seq Scan on interface_c cinc (cost=0.00..25.40
rows=770 width=24)
Filter: (NOT complete)
(13 rows)
}}}

Instinctively I looked into the implementation of `OuterRef` for ideas,
but the implementation is quite contrasting to FilteredRelation.

--
Ticket URL: <https://code.djangoproject.com/ticket/33929#comment:1>

Django

unread,
Aug 17, 2022, 1:22:09 AM8/17/22
to django-...@googlegroups.com
#33929: Field Reference in FilteredRelation Does Not Recognize Previously Defined
FilteredRelation
-------------------------------------+-------------------------------------
Reporter: Matt | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate

Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => duplicate


Comment:

Thanks for the report! Agree, this is a duplicate of #33929.

--
Ticket URL: <https://code.djangoproject.com/ticket/33929#comment:2>

Django

unread,
Aug 17, 2022, 1:24:54 AM8/17/22
to django-...@googlegroups.com
#33929: Field Reference in FilteredRelation Does Not Recognize Previously Defined
FilteredRelation
-------------------------------------+-------------------------------------
Reporter: Matt | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I agree with Mariusz and I think this has the exact same root cause as
#33766 and detailed in
[https://code.djangoproject.com/ticket/33766#comment:5 this comment].

The gist of it is that filter relations
[https://github.com/django/django/blob/9dff316be41847c505ebf397e4a52a0a71e0acc4/django/db/models/query_utils.py#L376-L380
conditions are resolved very late in the process of SQL compilation]
(basically when the `FROM` clause is generated) and thus the resolving of
references to filtered relation induced joins alias can be off.

--
Ticket URL: <https://code.djangoproject.com/ticket/33929#comment:3>

Reply all
Reply to author
Forward
0 new messages