SQL Aggregations (ticket #3566) are almost ready for trunk!
This is a call for final testing. My aim is to have this committed for
the original v1.1 Jan 15 deadline, possibly sooner (depending on the
feedback received). If you want to help out with testing, the code is
available from my Github repository [1].
[1] http://github.com/freakboy3742/django/tree/aggregation
If you're not a Git user, you can download a snapshot tarball from the
Github page.
In particular, I need feedback on the following:
- Oracle support. I don't have any access to Oracle for testing
purposes, so I'm relying on the community (esp the Oracle backend
maintainers) to help me out here.
- Gnarly test cases that break things.
- The documentation. Any suggestions on clarifications that are
required, or places where mine English isn't gooderer enuf.
Thanks in advance,
Russ Magee %-)
[snip]
In particular, I need feedback on the following:
- Oracle support. I don't have any access to Oracle for testing
purposes, so I'm relying on the community (esp the Oracle backend
maintainers) to help me out here.
- Gnarly test cases that break things.
- The documentation. Any suggestions on clarifications that are
required, or places where mine English isn't gooderer enuf.
Karen,
Can you try changing line 230 of django.db.models.sql.query from:
row = row[:aggregate_start] + tuple(
to
row = tuple(row[:aggregate_start]) + tuple(
That should clear up most of the errors and seems to be the result of
Oracle returning a list instead of a tuple.
I've just added this fix to the branch. Thanks, Alex and Karen.
Russ %-)
The '...' change hides the problem, but doesn't fix it. Sum('price')
should return a decimal, since it is a non-computed, non-ordinal
aggregate over a decimal field. Max(closing_time) should return a
datetime.time(), not a datetime.datetime() - this looks like an error
in the Oracle backend.
The real solution for these problems is to write a backend-specific
coerce_aggregate_value() function as part of the DatabaseOperations
class. SQLite has one for exactly this purpose (and for exactly the
same data types).
The SQL errors are slightly more problematic - I'll need to defer to
someone with Oracle-fu for suggestions on the fix.
Russ %-)
By the way, I'm pretty sure I accidentally broke django.contrib.gis,
both in trunk and, consequently, the branch with the changes I committed
last night. I only remembered after going to bed that the whole reason
this was fiddly was because there's a custom WhereNode over there. I'll
look at that today.
Malcolm
Looks like Justin may have done this while you slept: [9702] Updated
GeoWhere to be compatible with the changes in r9700.
Russ %-)
The logic you have identified appears to be the right idea. My only
question is whether we should be needing to replicate this logic in
the aggregation coercion function - the code you identified is
executed as part of resolve_columns, which should be executed as a
precursor to aggregate coercion. This means the problem may lie in
resolve_columns not resolving enough columns, rather than extra
coercion being required.
I should be able to take a look at this tonight, unless someone with
an actual Oracle install beats me to it.
Yours,
Russ Magee %-)
The Decimal conversion might fail to happen if the corresponding field
passed to resolve_columns is not a DecimalField. This happens for
instance when using extra selects, where no field object is available.
My guess is that the fields for the aggregate columns are getting
excluded or misaligned somehow. This could also be the cause of the
datetime/time issue.
> The SQL command not properly ended appears to be the result of an annotate
> call followed by a call to aggregate(). I don't have a clue what causes
> that as I've used Oracle in my life :) .
I'll take a look at the queries tomorrow and see if I can straighten them out.
Ian
Do you have any particular fields in mind? Any custom field deriving
from the basic Django numeric and date/time types shouldn't require
any special handling, but I'd appreciate any bug report to the
contrary. If you have a use case for non numeric/date/time types, I'd
be interested to see what we can do to accomodate your needs.
Yours,
Russ Magee %-)
The "ORA-00972: identifier is too long" error is caused by the column
alias names not being truncated to an acceptable length
("book__publisher__num_awards__min" in this case). The call to
perform the truncation is django.db.backends.util.truncate_name(alias,
connection.ops.max_name_length()).
The "ORA-00923: FROM keyword not found where expected" is caused by
one of the column alias names being an unquoted keyword ("number" in
this case). Quoting the column alias names with
connection.ops.quote_name(alias) will fix it.
The "ORA-00933: SQL command not properly ended" errors are caused by
queries of the form "SELECT foo FROM (SELECT bar) AS subquery". The
"AS" keyword needs to be removed, because Oracle doesn't accept it for
subquery aliases.
I'll run the full Oracle test suite and see if anything else pops up.
Ian
Thanks Ian.
Russ %-)
I've just pushed some updates to Github that should hopefully fix the
problems that have been reported (ORA-00933, ORA-00923 and ORA-00972
and the decimal/datetime conversion problems). If someone with access
to an Oracle test machine can confirm that this has cleaned up the
errors, I'd be much obliged.
There is some potential for further refactoring here, especially with
regards to the decimal/datetime conversion routines. The Oracle
DatabaseOperations.convert_values() function that I added for
aggregate support is an almost perfect subset of the features in the
Oracle Query.convert_values() function.
Refactoring Query.convert_values() into the backend operations would
remove some code duplication, and would also serve to keep all the
data coercion/typecasting code in a common location rather than
spreading some of it into the Query class. However, there are two
things that prevented me from doing this refactor.
Firstly, Query.convert_values() uses a lot of isinstance() calls on
fields, rather than checking the internal type of the field.
Aggregates use the internal type because aggregates aren't actually
fields, but they do return database types. Ian - can you see a problem
with converting the isinstance(field, DecimalField) calls to
field.get_internal_type() == 'DecimalField' etc?
Secondly, contrib.GIS has a convert_values() function that invokes
super(GeoQuery, self).convert_values() in the case of an Oracle
spatial backend. Justin - is there any particular reason to call the
GeoQuery base class specifically, rather than invoking
self.connection.ops.convert_values()? Is GeoQuery likely to have a
base class other than OracleQuery in the Oracle case?
Thanks,
Russ %-)
I've just pushed some updates to Github that should hopefully fix the
problems that have been reported (ORA-00933, ORA-00923 and ORA-00972
and the decimal/datetime conversion problems). If someone with access
to an Oracle test machine can confirm that this has cleaned up the
errors, I'd be much obliged.
This appears to be an existing problem in the Oracle backend where
convert_values neglects to convert FloatField values (interesting that
nothing else in the test suite catches this). But is there a reason
that the aggregation field returns an internal type of FloatField when
the field was originally a DecimalField?
Another oddity with this query is that removing the aggregate call
from the above query and leaving only the annotate still results in
floats instead of decimals. It appears that ops.convert_values isn't
getting called in that case. But replace Avg with Max or Sum and it
does get called.
I noticed another bug while testing this: the annotate query above is
modified after the aggregate call, which can be observed by the
following code. I don't think this one is related to the Oracle
backend.
>>> qs = Publisher.objects.annotate(avg_price=Avg('book__price'))
>>> qs.query.as_sql()
>>> qs.aggregate(Max('avg_price'))
>>> qs.query.as_sql()
Ian
Yes - field could be None, resulting in an AttributeError. I see no
problem replacing them with (field and field.get_internal_type() ==
'DecimalField').
Currently we seem to be calling convert_values twice for each
aggregate value of an annotate query. First, query.convert_values
gets called from resolve_columns with a None field. Later,
ops.convert_values gets called with the actual field. The reason I
bring this up is that the former call might interfere with the latter
call. For example, if an aggregate value of a DateTimeField has a
time component of exactly midnight, the former call will incorrectly
assume it's meant to be a date value and cast it as such. The latter
call would then need to be intelligent enough to reverse it. In
refactoring these methods, would it be an extraordinary effort to also
refactor the calls so that the former call never happens?
Ian
The general principle that Justin's mentioning here: providing a general
method that possibly calls a backend method is something I like as a
rule. It allows one to provide some non-backend-specific extensions more
easily. So I agree with Justin in terms of how any refactoring might
happen: leave a Query-level entry point exposed.
> > Is GeoQuery likely to have a base class other than OracleQuery in the
> > Oracle case?
>
> No; unless, of course, the backend requires a custom Query class like
> Oracle does -- which would mean any future backends without LIMIT/
> OFFSET and spatial support would also subclass from the backend's
> Query class.
MS SQL Server support being one case that's likely to crop up as an
external project at some point.
Malcolm