#10290: grouping with extra selects produces invalid SQL

4 views
Skip to first unread message

Ian Kelly

unread,
Feb 17, 2009, 6:05:00 PM2/17/09
to django-d...@googlegroups.com
I'd like to request some extra sets of eyes for the patch in ticket #10290.

The summary: from r9838 on, I'm getting aggregation_regress test
failures in Oracle, apparently because annotations with extra selects
are adding the extra select aliases into the group by list. This
produces invalid SQL (at least in Oracle), since you can't use column
aliases defined at the same level within a group by clause.

The solution that I'm proposing is to use the extra select expression
itself in the group by, rather than the alias. This passes the tests
across all four included backends, and seems to work in general as
long as the expression is not a scalar subquery (in which case I think
a general solution would require advanced inspection of the
expression). But it bothers me a bit because it just feels fragile.

If there are no concerns or potential gotchas with this approach, then
I'll go ahead and commit this in a few days.

Thanks,
Ian

Alex Gaynor

unread,
Feb 17, 2009, 6:07:40 PM2/17/09
to django-d...@googlegroups.com
As you mention it is fragile, and depending on how good the backends optimizer is it could require doing whatever computation 2x(which is undesirable), so I'd say make it an attribute on features and test for it when deciding which to use in group_by.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Russell Keith-Magee

unread,
Feb 17, 2009, 7:01:04 PM2/17/09
to django-d...@googlegroups.com
On Wed, Feb 18, 2009 at 8:05 AM, Ian Kelly <ian.g...@gmail.com> wrote:
>
> I'd like to request some extra sets of eyes for the patch in ticket #10290.
>
> The summary: from r9838 on, I'm getting aggregation_regress test
> failures in Oracle, apparently because annotations with extra selects
> are adding the extra select aliases into the group by list. This
> produces invalid SQL (at least in Oracle), since you can't use column
> aliases defined at the same level within a group by clause.

Sigh. You know, it's times like this that I'm really glad that we have
a standard language that is common across all relational database
implementations....

> The solution that I'm proposing is to use the extra select expression
> itself in the group by, rather than the alias. This passes the tests
> across all four included backends, and seems to work in general as
> long as the expression is not a scalar subquery (in which case I think
> a general solution would require advanced inspection of the
> expression). But it bothers me a bit because it just feels fragile.

We have to do a similar thing for the aggregates themselves; Postgres
can't use an alias in the HAVING clause. However, Postgres doesn't
seem to care about using aliases in the GROUP BY. In that case I erred
on the side of caution and just made all backends use the full clause.
If allowed alias usage is significantly different between backends (as
it appears to be), it may be worth refactoring this.

Out of interest - where does Oracle fall on this one? Can you use
aliases in a HAVING or ORDER BY?

> If there are no concerns or potential gotchas with this approach, then
> I'll go ahead and commit this in a few days.

My only major concern would be any potential performance hit from
double-computing the extra clause. Alex's suggestion to parameterize
this in the connection ops backend sounds like a good idea to me.
However, rather than just defining a boolean feature flag, my
preference would be to define a 'expand_extra()' function (or similar
- bikeshed builder gets to choose the colour) on the ops backend which
just returns the alias in the default case, but returns the full
expression under Oracle. I should probably do the same for alias usage
in HAVING/ORDER_BY clauses.

Yours,
Russ Magee %-)

Ian Kelly

unread,
Feb 17, 2009, 7:27:52 PM2/17/09
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 5:01 PM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> Out of interest - where does Oracle fall on this one? Can you use
> aliases in a HAVING or ORDER BY?

Oracle only allows aliases in the ORDER BY clause.

>
>> If there are no concerns or potential gotchas with this approach, then
>> I'll go ahead and commit this in a few days.
>
> My only major concern would be any potential performance hit from
> double-computing the extra clause. Alex's suggestion to parameterize
> this in the connection ops backend sounds like a good idea to me.
> However, rather than just defining a boolean feature flag, my
> preference would be to define a 'expand_extra()' function (or similar
> - bikeshed builder gets to choose the colour) on the ops backend which
> just returns the alias in the default case, but returns the full
> expression under Oracle. I should probably do the same for alias usage
> in HAVING/ORDER_BY clauses.

That sounds like a good idea to me. I've already found one bug in my
patch, which is that I didn't account for the duplication of extra
select_params into the group by clause. Since we currently assume the
group by clause has no params, fixing that will already be
non-trivial.

Regards,
Ian

Malcolm Tredinnick

unread,
Feb 17, 2009, 7:54:08 PM2/17/09
to django-d...@googlegroups.com
Hi Ian,

On Tue, 2009-02-17 at 16:05 -0700, Ian Kelly wrote:
[...]


> The solution that I'm proposing is to use the extra select expression
> itself in the group by, rather than the alias. This passes the tests
> across all four included backends, and seems to work in general as
> long as the expression is not a scalar subquery (in which case I think
> a general solution would require advanced inspection of the
> expression). But it bothers me a bit because it just feels fragile.

I'm pretty tired today and under a pile of past-urgent things to do, so
excuse the request for training wheels on this one... can you give a
short SQL example of what you're talking about here (before vs. after).

Regards,
Malcolm


Ian Kelly

unread,
Feb 17, 2009, 8:15:45 PM2/17/09
to django-d...@googlegroups.com

No problem. Take the queryset
`Author.objects.extra(select={'sqrt_age':
'sqrt(age)'}).values('sqrt_age').annotate(Count('book'))`

Current SQL: SELECT (sqrt(age)) AS "SQRT_AGE",
COUNT("AGGREGATION_REGRESS_BOOK_AC81D"."BOOK_ID") AS "BOOK__COUNT"
FROM "AGGREGATION_REGRESS_AUTHOR" LEFT OUTER JOIN
"AGGREGATION_REGRESS_BOOK_AC81D" ON ("AGGREGATION_REGRESS_AUTHOR"."ID"
= "AGGREGATION_REGRESS_BOOK_AC81D"."AUTHOR_ID") GROUP BY "SQRT_AGE"

Proposed SQL: SELECT (sqrt(age)) AS "SQRT_AGE",
COUNT("AGGREGATION_REGRESS_BOOK_AC81D"."BOOK_ID") AS "BOOK__COUNT"
FROM "AGGREGATION_REGRESS_AUTHOR" LEFT OUTER JOIN
"AGGREGATION_REGRESS_BOOK_AC81D" ON ("AGGREGATION_REGRESS_AUTHOR"."ID"
= "AGGREGATION_REGRESS_BOOK_AC81D"."AUTHOR_ID") GROUP BY (sqrt(age))

Regards,
Ian

Malcolm Tredinnick

unread,
Feb 19, 2009, 1:53:51 AM2/19/09
to django-d...@googlegroups.com

Ah, ok, I understand what you're talking about now. I don't see anything
horribly bad about this change.

To be SQL spec-compliant, we would have to do this. In addition to
Oracle, SQL Server can't handle aliases in GROUP BY. PostgreSQL can only
do it sometimes (see [1], which it turns out I'd bookmarked a couple of
years ago) and acknowledge it's a bit fragile to use aliases.

I'm not particularly worried about Alex's concern of backend optimisers
that for some reason support aliases, but can't do common sub-expression
optimisation in exactly the same case. They're just broken, if they
exist at all. So lacking concrete evidence, I'd choose not to be scared
by that right now (and keep the internals as simple as possible).

Any particular grounds for doing something spec-compliant feeling
fragile to you, Ian? Having let this bounce around for 24 hours, I can't
think of a strong reason not to do it.

[1] http://archives.postgresql.org/pgsql-sql/2004-02/msg00143.php

Regards,
Malcolm

Ian Kelly

unread,
Feb 19, 2009, 2:08:42 PM2/19/09
to django-d...@googlegroups.com

Only the hazards of copying an arbitrary select expression into the
group by and keeping it syntactically correct. As already noted, we
need to be careful to duplicate the bind params as well, and even this
approach won't work if the expression is a scalar subquery (at least
in Oracle). I can't think of any other potential breakages.

Regards,
Ian

Reply all
Reply to author
Forward
0 new messages