The summary: from r9838 on, I'm getting aggregation_regress test
failures in Oracle, apparently because annotations with extra selects
are adding the extra select aliases into the group by list. This
produces invalid SQL (at least in Oracle), since you can't use column
aliases defined at the same level within a group by clause.
The solution that I'm proposing is to use the extra select expression
itself in the group by, rather than the alias. This passes the tests
across all four included backends, and seems to work in general as
long as the expression is not a scalar subquery (in which case I think
a general solution would require advanced inspection of the
expression). But it bothers me a bit because it just feels fragile.
If there are no concerns or potential gotchas with this approach, then
I'll go ahead and commit this in a few days.
Thanks,
Ian
Sigh. You know, it's times like this that I'm really glad that we have
a standard language that is common across all relational database
implementations....
> The solution that I'm proposing is to use the extra select expression
> itself in the group by, rather than the alias. This passes the tests
> across all four included backends, and seems to work in general as
> long as the expression is not a scalar subquery (in which case I think
> a general solution would require advanced inspection of the
> expression). But it bothers me a bit because it just feels fragile.
We have to do a similar thing for the aggregates themselves; Postgres
can't use an alias in the HAVING clause. However, Postgres doesn't
seem to care about using aliases in the GROUP BY. In that case I erred
on the side of caution and just made all backends use the full clause.
If allowed alias usage is significantly different between backends (as
it appears to be), it may be worth refactoring this.
Out of interest - where does Oracle fall on this one? Can you use
aliases in a HAVING or ORDER BY?
> If there are no concerns or potential gotchas with this approach, then
> I'll go ahead and commit this in a few days.
My only major concern would be any potential performance hit from
double-computing the extra clause. Alex's suggestion to parameterize
this in the connection ops backend sounds like a good idea to me.
However, rather than just defining a boolean feature flag, my
preference would be to define a 'expand_extra()' function (or similar
- bikeshed builder gets to choose the colour) on the ops backend which
just returns the alias in the default case, but returns the full
expression under Oracle. I should probably do the same for alias usage
in HAVING/ORDER_BY clauses.
Yours,
Russ Magee %-)
Oracle only allows aliases in the ORDER BY clause.
>
>> If there are no concerns or potential gotchas with this approach, then
>> I'll go ahead and commit this in a few days.
>
> My only major concern would be any potential performance hit from
> double-computing the extra clause. Alex's suggestion to parameterize
> this in the connection ops backend sounds like a good idea to me.
> However, rather than just defining a boolean feature flag, my
> preference would be to define a 'expand_extra()' function (or similar
> - bikeshed builder gets to choose the colour) on the ops backend which
> just returns the alias in the default case, but returns the full
> expression under Oracle. I should probably do the same for alias usage
> in HAVING/ORDER_BY clauses.
That sounds like a good idea to me. I've already found one bug in my
patch, which is that I didn't account for the duplication of extra
select_params into the group by clause. Since we currently assume the
group by clause has no params, fixing that will already be
non-trivial.
Regards,
Ian
On Tue, 2009-02-17 at 16:05 -0700, Ian Kelly wrote:
[...]
> The solution that I'm proposing is to use the extra select expression
> itself in the group by, rather than the alias. This passes the tests
> across all four included backends, and seems to work in general as
> long as the expression is not a scalar subquery (in which case I think
> a general solution would require advanced inspection of the
> expression). But it bothers me a bit because it just feels fragile.
I'm pretty tired today and under a pile of past-urgent things to do, so
excuse the request for training wheels on this one... can you give a
short SQL example of what you're talking about here (before vs. after).
Regards,
Malcolm
No problem. Take the queryset
`Author.objects.extra(select={'sqrt_age':
'sqrt(age)'}).values('sqrt_age').annotate(Count('book'))`
Current SQL: SELECT (sqrt(age)) AS "SQRT_AGE",
COUNT("AGGREGATION_REGRESS_BOOK_AC81D"."BOOK_ID") AS "BOOK__COUNT"
FROM "AGGREGATION_REGRESS_AUTHOR" LEFT OUTER JOIN
"AGGREGATION_REGRESS_BOOK_AC81D" ON ("AGGREGATION_REGRESS_AUTHOR"."ID"
= "AGGREGATION_REGRESS_BOOK_AC81D"."AUTHOR_ID") GROUP BY "SQRT_AGE"
Proposed SQL: SELECT (sqrt(age)) AS "SQRT_AGE",
COUNT("AGGREGATION_REGRESS_BOOK_AC81D"."BOOK_ID") AS "BOOK__COUNT"
FROM "AGGREGATION_REGRESS_AUTHOR" LEFT OUTER JOIN
"AGGREGATION_REGRESS_BOOK_AC81D" ON ("AGGREGATION_REGRESS_AUTHOR"."ID"
= "AGGREGATION_REGRESS_BOOK_AC81D"."AUTHOR_ID") GROUP BY (sqrt(age))
Regards,
Ian
Ah, ok, I understand what you're talking about now. I don't see anything
horribly bad about this change.
To be SQL spec-compliant, we would have to do this. In addition to
Oracle, SQL Server can't handle aliases in GROUP BY. PostgreSQL can only
do it sometimes (see [1], which it turns out I'd bookmarked a couple of
years ago) and acknowledge it's a bit fragile to use aliases.
I'm not particularly worried about Alex's concern of backend optimisers
that for some reason support aliases, but can't do common sub-expression
optimisation in exactly the same case. They're just broken, if they
exist at all. So lacking concrete evidence, I'd choose not to be scared
by that right now (and keep the internals as simple as possible).
Any particular grounds for doing something spec-compliant feeling
fragile to you, Ian? Having let this bounce around for 24 hours, I can't
think of a strong reason not to do it.
[1] http://archives.postgresql.org/pgsql-sql/2004-02/msg00143.php
Regards,
Malcolm
Only the hazards of copying an arbitrary select expression into the
group by and keeping it syntactically correct. As already noted, we
need to be careful to duplicate the bind params as well, and even this
approach won't work if the expression is a scalar subquery (at least
in Oracle). I can't think of any other potential breakages.
Regards,
Ian