--
Ticket URL: <https://code.djangoproject.com/ticket/34262>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* cc: Simon Charette (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:1>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:2>
* cc: David Wobrock (added)
Comment:
Hey there,
Took a look at what is happening and why MySQL is failing with
`ONLY_FULL_GROUP_BY`.
In short and simplified, this statement works:
{{{
mysql> SELECT GREATEST(pages, 600), MIN(pages) FROM aggregation_book GROUP
BY GREATEST(pages, 600) ORDER BY NULL;
+----------------------+------------+
| GREATEST(pages, 600) | MIN(pages) |
+----------------------+------------+
| 600 | 300 |
| 1132 | 1132 |
| 946 | 946 |
+----------------------+------------+
3 rows in set (0,01 sec)
}}}
And when you try to add a third expression, that uses the two first:
{{{
mysql> SELECT GREATEST(pages, 600), MIN(pages), LEAST(MIN(pages),
GREATEST(pages, 600)) AS least FROM aggregation_book GROUP BY
GREATEST(pages, 600) ORDER BY NULL;
ERROR 1055 (42000): Expression #3 of SELECT list is not in GROUP BY clause
and contains nonaggregated column
'test_django_tests.aggregation_book2.pages' which is not functionally
dependent on columns in GROUP BY clause; this is incompatible with
sql_mode=only_full_group_by
}}}
MySQL is not happy, even though it seems rather straightforward that this
query should work.
The resources I could find on the topic are here:
- [https://bugs.mysql.com/bug.php?id=90792 A ticket from the MySQL bug
tracker]
- [https://stackoverflow.com/q/49447622 A related StackOverflow thread]
- [https://dev.mysql.com/blog-archive/when-only_full_group_by-wont-see-
the-query-is-deterministic/ A MySQL blog post]
And the blog post explains in more depth why it's not working.
-------
That leaves us with a choice to make for Django's behavior I reckon :)
Some options:
== 1. Add an `ANY_VALUE` around the problematic expression
That solves the issue here for instance:
{{{
mysql> SELECT GREATEST(pages, 600), MIN(pages),
ANY_VALUE(LEAST(MIN(pages), GREATEST(pages, 600))) AS least FROM
aggregation_book2 GROUP BY GREATEST(pages, 600) ORDER BY NULL;
+----------------------+------------+-------+
| GREATEST(pages, 600) | MIN(pages) | least |
+----------------------+------------+-------+
| 600 | 300 | 300 |
| 1132 | 1132 | 1132 |
| 946 | 946 | 946 |
+----------------------+------------+-------+
3 rows in set (0,00 sec)
}}}
However, I fear that detecting when to wrap the expression with an
`ANY_VALUE` is a rabbit hole we don't want to go down, as we might end up
implementing what the MySQL team didn't want to implement.
== 2. Raise awareness
We could, firstly, document the potential issue, and secondly raise a
warning when such an error occurs when executing a query a Django.
That way, users are at least aware that's not entirely their or Django's
fault.
== 3. Generally change query generation
Another type of workaround suggested by the [https://dev.mysql.com/blog-
archive/when-only_full_group_by-wont-see-the-query-is-deterministic/ MySQL
blog post] is to use a subquery/derived table:
{{{
mysql> SELECT greatest_pages,
MIN(pages),
LEAST(MIN(pages), greatest_pages) AS least
FROM (SELECT GREATEST(pages, 600) greatest_pages,
pages
FROM aggregation_book2) AS t
GROUP BY greatest_pages
ORDER BY NULL;
+----------------+------------+-------+
| greatest_pages | MIN(pages) | least |
+----------------+------------+-------+
| 600 | 300 | 300 |
| 1132 | 1132 | 1132 |
| 946 | 946 | 946 |
+----------------+------------+-------+
3 rows in set (0,00 sec)
}}}
So that we always try to group on a column, and not an expression.
Even though, it might be worse in terms of performances, depending the DB
implementation I guess.
This change would then affect all databases I reckon, which is a much
larger change, and therefore riskier.
== 4. Any other option! :D
I hope all of this makes sense, happy to read any thoughts on this :)
See ya!
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:3>
Comment (by Simon Charette):
Thanks for the analysis David, option 1. and 2. were also the conclusion
of [https://github.com/django/django/pull/16460#discussion_r1071497499 my
limited investigation on the subject] so it's great to have cross peer
validation on the subject.
I think there might be a way to implement 3. by reusing
[https://github.com/django/django/blob/d3c93cdc597e0efc2815111c04dd5a427432ed37/django/db/models/sql/compiler.py#L669-L680
some of the logic used to implement masking of columns when filtering
against window functions] which requires two level of subquery wrapping.
A different of approaching 3. is to think that any form of ''masking'' of
annotations/aliases used for grouping purposes would result in a subquery
pushdown. So to reuse your example, instead of performing a subquery
pushdown to compute expressions used with aggregate queries we'd do it the
other way around to have MySQL group against top level `SELECT`
expressions which it's good at inferring dependencies from
{{{#!sql
SELECT LEAST(min_pages, greatest_pages) AS `least` FROM (
SELECT
GREATEST(`aggregation_book`.`pages`, 600) greatest_pages,
MIN(`aggregation_book`.`pages`) min_pages
FROM `aggregation_book`
GROUP BY 1 ORDER BY NULL
) masked
}}}
This should reduce the area of impact of this change to aggregating
queries that group against a value that isn't explicitly selected and
would also happen to solve the last remaining known issue with using
server-side parameters binding for Postgres (#34255).
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:4>
Comment (by Amir Karimi):
Replying to [comment:4 Simon Charette]:
I'm curios to know what happened with this issue. Any updates?
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:5>
Comment (by Mariusz Felisiak):
Replying to [comment:5 Amir Karimi]:
> I'm curios to know what happened with this issue. Any updates?
Feel-free to work on this issue. Please don't leave comments like `any
updates?` they don't help and cause unnecessary noise to the history.
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:6>
* owner: nobody => Jonny Park
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:7>
Comment (by Jonny Park):
I think @David Wobrock's query is easier to implement and covers more
cases.
For example, if we have a followinfg queryset:
{{{
books_qs = (
Book.objects.annotate(greatest_pages=Greatest("pages",
Value(600)))
.values(
"greatest_pages",
)
.annotate(
min_pages=Min("pages"),
least=Least("min_pages", "greatest_pages"),
)
)
}}}
Creating the following query that @David Wobrock presented seems like more
sense to me and covers many other cases.
{{{
SELECT greatest_pages,
MIN(pages),
LEAST(MIN(pages), greatest_pages) AS least
FROM (SELECT GREATEST(pages, 600) greatest_pages,
pages
FROM aggregation_book2) AS t
GROUP BY greatest_pages
ORDER BY NULL;
}}}
If we were to take @Simon Charette's query, it could be like this:
{{{
SELECT geatest_pages, min_pages, LEAST(min_pages, greatest_pages) AS
`least` FROM (
SELECT
GREATEST(`aggregation_book`.`pages`, 600) greatest_pages,
MIN(`aggregation_book`.`pages`) min_pages
FROM `aggregation_book`
GROUP BY 1 ORDER BY NULL
) masked
}}}
I think the position of "MIN(`aggregation_book`.`pages`) min_pages" look
award for me.
with `.values_list("least", flat=True)` clause present, there was a
obvious reason for "MIN(`aggregation_book`.`pages`) min_pages" to be
pushed down because it is a dependency for `least`, but without
`.values_list("least", flat=True)` it loses it's reason to be pushed down.
I am a bit suspicious that choosing which additional item to be pushed
down by looking at `values_list` worth it's effort considering frequency
of this use case is thought to be small.
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:8>
* owner: Jonny Park => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:9>
Comment (by Simon Charette):
FWIW this relates to #34992 where we had to disable
`allows_group_by_selected_pks` on MariaDB entirely as it doesn't implement
any form of functional dependence resolition.
--
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:10>