Background: It's the same query as #31094. I tried upgrading to Django
3.0.2 and now I get duplicate results. Even tho they query should be
distinct.
A quick diff on the queries still reveals a different grouped by section:
This is the new query on 3.0.2:
{{{
SELECT DISTINCT "camps_offer"."id",
"camps_offer"."title",
"camps_offer"."slug",
"camps_offer"."is_active",
"camps_offer"."modified",
"camps_offer"."created",
"camps_offer"."provider_id",
"camps_offer"."activity_type",
"camps_offer"."description",
"camps_offer"."highlights",
"camps_offer"."important_information",
"camps_offer"."min_age",
"camps_offer"."max_age",
"camps_offer"."food",
"camps_offer"."video",
"camps_offer"."accommodation",
"camps_offer"."accommodation_type",
"camps_offer"."room_type",
"camps_offer"."room_size_min",
"camps_offer"."room_size_max",
"camps_offer"."external_url",
"camps_offer"."application_form",
"camps_offer"."caseload",
"camps_offer"."field_trips",
MIN(T4."retail_price") AS "min_retail_price",
(SELECT U0."id"
FROM "camps_servicepackage" U0
INNER JOIN "camps_region" U2 ON (U0."region_id"
= U2."id")
WHERE (U0."company_id" = 1 AND U0."option" =
"camps_offer"."activity_type" AND
ST_Contains(U2."locations", T4."position"))
LIMIT 1) AS "in_package",
"camps_provider"."id",
"camps_provider"."title",
"camps_provider"."slug",
"camps_provider"."is_active",
"camps_provider"."modified",
"camps_provider"."created",
"camps_provider"."logo",
"camps_provider"."description",
"camps_provider"."video",
"camps_provider"."external_url",
"camps_provider"."terms",
"camps_provider"."cancellation_policy",
"camps_provider"."privacy_policy",
"camps_provider"."application_form"
FROM "camps_offer"
LEFT OUTER JOIN "camps_bookingoption" ON ("camps_offer"."id" =
"camps_bookingoption"."offer_id")
INNER JOIN "camps_provider" ON ("camps_offer"."provider_id" =
"camps_provider"."id")
INNER JOIN "camps_bookingoption" T4 ON ("camps_offer"."id" =
T4."offer_id")
WHERE ("camps_offer"."is_active" = True AND "camps_provider"."is_active" =
True AND
T4."end" >= STATEMENT_TIMESTAMP() AND T4."is_active" = True AND
"camps_offer"."max_age" >= 5 AND
"camps_offer"."min_age" <= 13 AND (SELECT U0."id"
FROM "camps_servicepackage" U0
INNER JOIN
"camps_region" U2 ON (U0."region_id" = U2."id")
WHERE (U0."company_id" = 1 AND
U0."option" = "camps_offer"."activity_type" AND
ST_Contains(U2."locations", T4."position"))
LIMIT 1) IS NOT NULL)
GROUP BY "camps_offer"."id", T4."position", "camps_provider"."id"
ORDER BY "camps_offer"."created" ASC
}}}
And what it was (and should be) on 2.2.9:
{{{
SELECT DISTINCT "camps_offer"."id",
"camps_offer"."title",
"camps_offer"."slug",
"camps_offer"."is_active",
"camps_offer"."modified",
"camps_offer"."created",
"camps_offer"."provider_id",
"camps_offer"."activity_type",
"camps_offer"."description",
"camps_offer"."highlights",
"camps_offer"."important_information",
"camps_offer"."min_age",
"camps_offer"."max_age",
"camps_offer"."food",
"camps_offer"."video",
"camps_offer"."accommodation",
"camps_offer"."accommodation_type",
"camps_offer"."room_type",
"camps_offer"."room_size_min",
"camps_offer"."room_size_max",
"camps_offer"."external_url",
"camps_offer"."application_form",
"camps_offer"."caseload",
"camps_offer"."field_trips",
MIN(T4."retail_price") AS "min_retail_price",
(SELECT U0."id"
FROM "camps_servicepackage" U0
INNER JOIN "camps_region" U2 ON (U0."region_id"
= U2."id")
WHERE (U0."company_id" = 1 AND U0."option" =
("camps_offer"."activity_type") AND
ST_Contains(U2."locations", (T4."position")))
LIMIT 1) AS "in_package",
"camps_provider"."id",
"camps_provider"."title",
"camps_provider"."slug",
"camps_provider"."is_active",
"camps_provider"."modified",
"camps_provider"."created",
"camps_provider"."logo",
"camps_provider"."description",
"camps_provider"."video",
"camps_provider"."external_url",
"camps_provider"."terms",
"camps_provider"."cancellation_policy",
"camps_provider"."privacy_policy",
"camps_provider"."application_form"
FROM "camps_offer"
LEFT OUTER JOIN "camps_bookingoption" ON ("camps_offer"."id" =
"camps_bookingoption"."offer_id")
INNER JOIN "camps_provider" ON ("camps_offer"."provider_id" =
"camps_provider"."id")
INNER JOIN "camps_bookingoption" T4 ON ("camps_offer"."id" =
T4."offer_id")
WHERE ("camps_offer"."is_active" = True AND "camps_provider"."is_active" =
True AND
T4."end" >= (STATEMENT_TIMESTAMP()) AND T4."is_active" = True AND
(SELECT U0."id"
FROM "camps_servicepackage" U0
INNER JOIN "camps_region" U2 ON (U0."region_id" = U2."id")
WHERE (U0."company_id" = 1 AND
U0."option" = ("camps_offer"."activity_type") AND
ST_Contains(U2."locations", (T4."position")))
LIMIT 1) IS NOT NULL)
GROUP BY "camps_offer"."id",
(SELECT U0."id"
FROM "camps_servicepackage" U0
INNER JOIN "camps_region" U2 ON (U0."region_id" =
U2."id")
WHERE (U0."company_id" = 1 AND U0."option" =
("camps_offer"."activity_type") AND
ST_Contains(U2."locations", (T4."position")))
LIMIT 1), "camps_provider"."id"
ORDER BY "camps_offer"."created" ASC
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
New description:
Sadly there is more regression in Django 3.0.2 even after #31094.
Background: It's the same query as #31094. I tried upgrading to Django
3.0.2 and now I get duplicate results. Even tho they query should be
distinct. Where on 2.2 the queryset yields 490 results, it's 519 on 3.0.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:1>
Comment (by felixxm):
Johannes, I need to repeat my gentle request for a queryset (see
[https://code.djangoproject.com/ticket/31094#comment:2 comment:2] and
[https://code.djangoproject.com/ticket/31094#comment:6 comment:6] ) Can
you provide a queryset? It's really hard to restore the original queryset
from a raw SQL.
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:2>
* status: new => closed
* resolution: => needsinfo
Comment:
I really try but without a queryset I was not able to reproduce this
issue.
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:3>
* status: closed => new
* resolution: needsinfo =>
Comment:
@felixxm, it seems that `Subquery` annotation are omitted from the
grouping section in 3.0. I will try to create a test case today.
If you add the following test to 2.2.* anywhere in `tests.aggregation` it
will pass, in 3.0 it will fail, because the subquery is missing from the
grouping bit:
{{{
def test_filtered_aggregate_ref_subquery_annotation_e_31150(self):
aggs = Author.objects.annotate(
earliest_book_year=Subquery(
Book.objects.filter(
contact__pk=OuterRef('pk'),
).order_by('pubdate').values('pubdate__year')[:1]
),
).annotate(max_id=Max('id'))
print(aggs.query)
self.assertIn(str(aggs.query), """
SELECT "aggregation_author"."id", "aggregation_author"."name",
"aggregation_author"."age", (SELECT django_date_extract('year',
U0."pubdate") FROM "aggregation_book" U0 WHERE U0."contact_id" =
("aggregation_author"."id") ORDER BY U0."pubdate" ASC LIMIT 1) AS
"earliest_book_year", MAX("aggregation_author"."id") AS "max_id" FROM
"aggregation_author" GROUP BY "aggregation_author"."id",
"aggregation_author"."name", "aggregation_author"."age", (SELECT
django_date_extract('year', U0."pubdate") FROM "aggregation_book" U0 WHERE
U0."contact_id" = ("aggregation_author"."id") ORDER BY U0."pubdate" ASC
LIMIT 1)
""")
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:4>
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:5>
* stage: Unreviewed => Accepted
Comment:
OK, thank you for the test case Joe.
I added the import line so it pops straight into
,`aggregation.tests.AggregateTestCase` and then, yes, there's a change of
behaviour between 2.2 and 3.0.
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:6>
Comment (by Simon Charette):
The generated query effectively changed but the returned resultset should
be the same since the subquery is a function of the outer query's primary
key and the query is grouped by the outer query primary key.
I don't think asserting against the generated query string is a proper
test case; the generated SQL will change between Django versions but the
returned result set should be the same.
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:7>
Comment (by Carlton Gibson):
Yes. I didn’t take the test case as final. Just illustrating the issue. 😀
If it’s equivalent then there’s no issue, but I’m assuming Joe is seeing
what he thinks is a substantive regression?
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:8>
Comment (by Johannes Hoppe):
Replying to [comment:7 Simon Charette]:
> The generated query effectively changed but the returned resultset
should be the same since the subquery is a function of the outer query's
primary key and the query is grouped by the outer query primary key.
I came from a wrong result set and deduced the error from here on out. It
took me a while to figure it out to, but sadly grouping by the outer
reference is not the same as grouping by the query. Here an example:
{{{
with t_w_douplicates as (
select abs(n) as pk, n as val from generate_series(-2, 2, 1) n
)
select pk, (
select n
from generate_series(2, 2) n
where n = val
)
from t_w_douplicates
where (
select n from generate_series(2, 2) n where n = val
) is null
GROUP BY pk, (
select n
from generate_series(2, 2) n
where n = val
);
}}}
Which returns
{{{
(0, null)
(1, null)
(2, null)
}}}
And just using the outer ref:
{{{
with t_w_douplicates as (
select abs(n) as pk, n as val from generate_series(-2, 2, 1) n
)
select pk, (
select n
from generate_series(2, 2) n
where n = val
)
from t_w_douplicates
where (
select n from generate_series(2, 2) n where n = val
) is null
GROUP BY pk, val;
}}}
Which returns 4 results:
{{{
(2, null)
(1, null)
(0, null)
(1, null)
}}}
> I don't think asserting against the generated query string is a proper
test case; the generated SQL will change between Django versions but the
returned result set should be the same.
Certainly, haha, that was just to illustrate how to get from the ORM to
the SQL query in the description. The SQL sample now, shows how they
produce different result sets.
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:9>
Comment (by Simon Charette):
Thanks for the extra details, I think I have a better picture of what's
happening here.
In the cases where an outer query spawns a multi-valued relationship and
the subquery references one of these multi-valued relationship we must
keep grouping by the subquery.
In your original reports the `INNER JOIN "camps_bookingoption" T4` joins
spawns multiple rows for the same `camps.Offer` and then your subquery
filter against `(T4."position")` which
In ORM terms that means we must keep returning `self` in
`Subquery.get_group_by_cols` when any of our `OuterRef` include a `__`
which could point to a multi-valued relationship.
We could go a bit further and only disable the optimization if any of the
outer-ref in the `__` chain is multi-valued by relying on `getattr(rel,
'many_to_many', True)` introspection but in all cases that means adding a
way to ''taint'' `Col` instances with whether or not they are/could be
multi-valued so `get_group_by_cols` can `return self` if any of
`get_external_cols` is multi-valued
I could see this tainting as being useful to warn about #10060 in the
future since the logic could be based on this column or alias tainting.
I'm happy to submit a patch or discuss any alternatives but it feels like
the above solution solves the reported problem while maintaining the
optimization for most of the cases.
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:10>
Comment (by Simon Charette):
I give a quick shot at the above solution and came with this rough patch
https://github.com/django/django/compare/master...charettes:ticket-31150
Johannes, could you give it a shot and report whether or not it addresses
your problem? Do you confirm your original queryset was using an
`OuterRef` with a reference to an outer-query multi-valued relationship?
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:11>
Comment (by Johannes Hoppe):
I tested Simon's patch applied to 3.0.3 and it now works as expected. So
yes, I'd say go ahead!
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:12>
* owner: nobody => felixxm
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:13>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/12519 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:14>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:15>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"7b8fa1653fde578ab3a496d9974ab1d4261b8b26" 7b8fa16]:
{{{
#!CommitTicketReference repository=""
revision="7b8fa1653fde578ab3a496d9974ab1d4261b8b26"
Fixed #31150 -- Included subqueries that reference related fields in GROUP
BY clauses.
Thanks Johannes Hoppe for the report.
Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.
Co-authored-by: Simon Charette <chare...@gmail.com>
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:16>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"c5cfaad2f1f08b31ba04b9534f1a46a6ef1003bf" c5cfaad2]:
{{{
#!CommitTicketReference repository=""
revision="c5cfaad2f1f08b31ba04b9534f1a46a6ef1003bf"
[3.0.x] Fixed #31150 -- Included subqueries that reference related fields
in GROUP BY clauses.
Thanks Johannes Hoppe for the report.
Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.
Co-authored-by: Simon Charette <chare...@gmail.com>
Backport of 7b8fa1653fde578ab3a496d9974ab1d4261b8b26 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:17>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"277eea8fcced7f04f3800617f189beb349a3212e" 277eea8f]:
{{{
#!CommitTicketReference repository=""
revision="277eea8fcced7f04f3800617f189beb349a3212e"
Fixed #32478 -- Included nested columns referenced by subqueries in GROUP
BY on aggregations.
Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.
Refs #31094, #31150.
Thanks Igor Pejic for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:18>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"7a6ca01f4eb990772080e2978f19d83484184b04" 7a6ca01]:
{{{
#!CommitTicketReference repository=""
revision="7a6ca01f4eb990772080e2978f19d83484184b04"
[3.2.x] Fixed #32478 -- Included nested columns referenced by subqueries
in GROUP BY on aggregations.
Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.
Refs #31094, #31150.
Thanks Igor Pejic for the report.
Backport of 277eea8fcced7f04f3800617f189beb349a3212e from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:19>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"b7b28c7c189615543218e81319473888bc46d831" b7b28c7c]:
{{{
#!CommitTicketReference repository=""
revision="b7b28c7c189615543218e81319473888bc46d831"
Refs #31150 -- Enabled implicit GROUP BY aliases.
This ensures implicit grouping from aggregate function annotations
groups by uncollapsed selected aliases if supported.
The feature is disabled on Oracle because it doesn't support it.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/31150#comment:20>