[Django] #34993: Unexpected behaviour when filtering across mult-value (aka tomany) joins.

18 views
Skip to first unread message

Django

unread,
Nov 23, 2023, 5:17:15 AM11/23/23
to django-...@googlegroups.com
#34993: Unexpected behaviour when filtering across mult-value (aka tomany) joins.
------------------------------------------+------------------------
Reporter: Gordon Wrigley | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 4.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------+------------------------
My overall questions here are:
1: does the below all look correct / expected
2: how should I be writing this queryset
3: what are the general guidelines I should follow when doing this kind of
thing

I have some models the important bit's of which look like this
{{{
DataExport
pass
DataImport
from_data_export = ForeignKey(
DataExport,
related_name="to_data_imports",
null=True,
)
Run
data_import = ForeignKey(
DataImport,
related_name="runs",
null=False,
)
wet = BooleanField(null=False)
succeeded = BooleanField(null=True)
}}}
The basic idea here is stuff gets exported out of one part of the system.
And then will be imported into another part of the system.
And those imports happen in runs that can be dry/wet and success/fail.

I'm trying to write a query to find exports that do not have successful
wet import runs and the SQL it's producing is surprising in a variety of
ways.

Below I'm going to show a series of querysets, with the associated SQL and
some comments.

This is all on Django 4.1.13, I couldn't find anything in release notes
that would suggest any of this has changed since then.

--------------------
{{{ #!python
DataExport.objects.exclude(
to_data_imports__runs__wet=True,
to_data_imports__runs__succeeded=True,
)
}}}

{{{ #!sql
SELECT <snip>
FROM "approval_dataexport"
WHERE NOT (
EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" U1
INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id")
WHERE (
U2."succeeded"
AND U1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
AND EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" U1
INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id")
WHERE (
U2."wet"
AND U1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
)
}}}

This is the most concise expression of intent. And so it's not working as
desired is a trap for beginners. But it is consistent with where the docs
say "conditions in a single exclude() call will not necessarily refer to
the same item".

--------------------
{{{ #!python
DataExport.objects.exclude(
Q(
to_data_imports__runs__wet=True,
to_data_imports__runs__succeeded=True,
)
)
}}}

{{{ #!sql
SELECT <snip>
FROM "approval_dataexport"
WHERE NOT (
EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" U1
INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id")
WHERE (
U2."succeeded"
AND U1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
AND EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" U1
INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id")
WHERE (
U2."wet" AND U1."from_data_export_id" =
("approval_dataexport"."id")
)
LIMIT 1
)
)
}}}

AFAIK the docs don't say anything either way on what this should do.


--------------------
{{{ #!python
DataExport.objects.filter(
~Q(
to_data_imports__runs__wet=True,
to_data_imports__runs__succeeded=True,
)
)
}}}

{{{ #!sql
SELECT <snip>
FROM "approval_dataexport"
WHERE NOT (
EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" U1
INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id")
WHERE (
U2."succeeded"
AND U1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
AND EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" U1
INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id")
WHERE (
U2."wet"
AND U1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
)
}}}

I am surprised this didn't have the desired result.

--------------------
{{{ #!python
DataExport.objects.exclude(
to_data_imports__in=DataImport.objects.filter(
runs__wet=True,
runs__succeeded=True,
)
)
}}}

{{{ #!sql
SELECT <snip>
FROM "approval_dataexport"
WHERE NOT (
EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" V1
WHERE (
V1."id" IN (
SELECT U0."id"
FROM "retain_dataimport" U0
INNER JOIN "data_import_run" U1
ON (U0."id" = U1."data_import_id")
WHERE (U1."succeeded" AND U1."wet")
)
AND V1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
)
}}}

This one does what I want but the nested query structure seems excessive,
particularly the dual `FROM "retain_dataimport"`, I'm not sure if the
planner will do a good job of that.

--------------------
{{{ #!python
DataExport.objects.exclude(
to_data_imports__runs__in=Run.objects.filter(
wet=True,
succeeded=True,
)
)
}}}

{{{ #!sql
SELECT <snip>
FROM "approval_dataexport"
WHERE NOT (
EXISTS(
SELECT 1 AS "a"
FROM "retain_dataimport" V1
INNER JOIN "data_import_run" V2 ON (V1."id" = V2."data_import_id")
WHERE (
V2."id" IN (
SELECT U0."id"
FROM "data_import_run" U0
WHERE (U0."succeeded" AND U0."wet")
)
AND V1."from_data_export_id" = ("approval_dataexport"."id")
)
LIMIT 1
)
)
}}}

Basically the same as the above except now the dual selects are on
data_import_run.

--------------------
{{{ #!python
DataExport.objects.exclude(
Exists(
Run.objects.filter(
data_import__from_data_export__id=OuterRef('id'),
wet=True,
succeeded=True,
)
)
)
}}}

{{{ #!sql
SELECT <snip>
FROM "approval_dataexport"
WHERE NOT (
EXISTS(
SELECT 1 AS "a"
FROM "data_import_run" U0
INNER JOIN "retain_dataimport" U1 ON (U0."data_import_id" =
U1."id")
WHERE (
U1."from_data_export_id" = ("approval_dataexport"."id")
AND U0."succeeded"
AND U0."wet"
)
LIMIT 1
)
)
}}}

This one is starting to look decent

--
Ticket URL: <https://code.djangoproject.com/ticket/34993>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 23, 2023, 5:17:47 AM11/23/23
to django-...@googlegroups.com
#34993: Unexpected behaviour when filtering across mult-value (aka tomany) joins.
-------------------------------------+-------------------------------------

Reporter: Gordon Wrigley | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution:

Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Gordon Wrigley):

* version: 4.2 => 4.1
* component: Uncategorized => Database layer (models, ORM)


--
Ticket URL: <https://code.djangoproject.com/ticket/34993#comment:1>

Django

unread,
Nov 23, 2023, 5:28:01 AM11/23/23
to django-...@googlegroups.com
#34993: Unexpected behaviour when filtering across mult-value (aka tomany) joins.
-------------------------------------+-------------------------------------
Reporter: Gordon Wrigley | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Julius Seporaitis):

* cc: Julius Seporaitis (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/34993#comment:2>

Django

unread,
Nov 23, 2023, 6:08:25 AM11/23/23
to django-...@googlegroups.com
#34993: Unexpected behaviour when filtering across mult-value (aka tomany) joins.
-------------------------------------+-------------------------------------
Reporter: Gordon Wrigley | Owner: nobody
Type: Uncategorized | Status: closed

Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate

Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => duplicate


Comment:

This was discussed many times, e.g. #27936. It's recommended and
documented to use subqueries in such cases. In the future, start by asking
questions on support channels before creating a ticket.

> AFAIK the docs don't say anything either way on what this should do.

Docs cannot describe every possible configuration, `exclude(filters)`
works as `exclude(Q(filters))`.

--
Ticket URL: <https://code.djangoproject.com/ticket/34993#comment:3>

Django

unread,
Nov 23, 2023, 6:47:55 AM11/23/23
to django-...@googlegroups.com
#34993: Unexpected behaviour when filtering across mult-value (aka tomany) joins.
-------------------------------------+-------------------------------------
Reporter: Gordon Wrigley | Owner: nobody
Type: Uncategorized | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Gordon Wrigley):

Could you direct me to that documention? Are you referring to the second
note here https://docs.djangoproject.com/en/4.2/topics/db/queries
/#spanning-multi-valued-relationships

Should my mental model be "always use subqueries when I want two fields
off the same multi value relation"?

You didn't address the deeply nested and redundant inner query behaviour,
would you like me to reraise that somewhere else?

--
Ticket URL: <https://code.djangoproject.com/ticket/34993#comment:4>

Reply all
Reply to author
Forward
0 new messages