#3566 Aggregations: Call for testing

1 view
Skip to first unread message

Russell Keith-Magee

unread,
Jan 5, 2009, 9:11:51 AM1/5/09
to Django Developers, ian.g...@gmail.com, ogg...@gmail.com
Hi all,

SQL Aggregations (ticket #3566) are almost ready for trunk!

This is a call for final testing. My aim is to have this committed for
the original v1.1 Jan 15 deadline, possibly sooner (depending on the
feedback received). If you want to help out with testing, the code is
available from my Github repository [1].

[1] http://github.com/freakboy3742/django/tree/aggregation

If you're not a Git user, you can download a snapshot tarball from the
Github page.

In particular, I need feedback on the following:

- Oracle support. I don't have any access to Oracle for testing
purposes, so I'm relying on the community (esp the Oracle backend
maintainers) to help me out here.

- Gnarly test cases that break things.

- The documentation. Any suggestions on clarifications that are
required, or places where mine English isn't gooderer enuf.

Thanks in advance,
Russ Magee %-)

Karen Tracey

unread,
Jan 5, 2009, 10:16:26 AM1/5/09
to django-d...@googlegroups.com
On Mon, Jan 5, 2009 at 9:11 AM, Russell Keith-Magee <freakb...@gmail.com> wrote:
[snip]

In particular, I need feedback on the following:

 - Oracle support. I don't have any access to Oracle for testing
purposes, so I'm relying on the community (esp the Oracle backend
maintainers) to help me out here.

I don't know Oracle but I do have a machine set up with it for testing purposes.  In case it's of any use, here are the errors from the aggregation test when run with Oracle:

http://dpaste.com/105594/

and the aggregation_regress errors:

http://dpaste.com/105596/

Some of them may need real Oracle knowledge to fix, but some may be things fixable just by seeing what Oracle is returning differently.
 

 - Gnarly test cases that break things.

 - The documentation. Any suggestions on clarifications that are
required, or places where mine English isn't gooderer enuf.

I'll play around with these, too, but that'll take a little longer than just running the tests on Oracle did.

Cheers,
Karen 

alex....@gmail.com

unread,
Jan 5, 2009, 10:51:41 AM1/5/09
to Django developers
Karen,

Can you try changing line 230 of django.db.models.sql.query from:
row = row[:aggregate_start] + tuple(
to
row = tuple(row[:aggregate_start]) + tuple(

That should clear up most of the errors and seems to be the result of
Oracle returning a list instead of a tuple.

Alex

On Jan 5, 9:16 am, "Karen Tracey" <kmtra...@gmail.com> wrote:
> On Mon, Jan 5, 2009 at 9:11 AM, Russell Keith-Magee
> <freakboy3...@gmail.com>wrote:

Karen Tracey

unread,
Jan 5, 2009, 11:14:29 AM1/5/09
to django-d...@googlegroups.com
On Mon, Jan 5, 2009 at 10:51 AM, alex....@gmail.com <alex....@gmail.com> wrote:

Karen,

Can you try changing line 230 of django.db.models.sql.query from:
                   row = row[:aggregate_start] + tuple(
to
                   row = tuple(row[:aggregate_start]) + tuple(

That should clear up most of the errors and seems to be the result of
Oracle returning a list instead of a tuple.

Yes, that gets rid of a bunch of errors.  Remaining is a quartet of "DatabaseError: ORA-00933: SQL command not properly ended" for aggregation_regress:

http://dpaste.com/105614/

and a few of those plus some other output differences, etc. for aggregation:

http://dpaste.com/105612/

(I'm assuming these are the only interesting tests to run.  The full test suite with Oracle on this machine will take 9 hours or so to run.  That can be improved to about one hour with the rollback testcases approach, but I haven't yet tried applying the latest patch for #8138 on top of the aggregate snapshot.)

Thanks,
Karen

Alex Gaynor

unread,
Jan 5, 2009, 11:21:25 AM1/5/09
to django-d...@googlegroups.com
The one's that are a result of Oracle not returning a Decimal can be solved be inserting "..." before and after the number, in placeess of explicitly saying Decimal(unless of course this is actually a typecasting issue in Django itself, in which case that should be fixed) this is done a few other places in aggregates regress.

The SQL command not properly ended appears to be the result of an annotate call followed by a call to aggregate().  I don't have a clue what causes that as I've used Oracle in my life :) .

Alex
--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Russell Keith-Magee

unread,
Jan 5, 2009, 5:53:50 PM1/5/09
to django-d...@googlegroups.com
On Tue, Jan 6, 2009 at 12:51 AM, alex....@gmail.com
<alex....@gmail.com> wrote:
>
> Karen,
>
> Can you try changing line 230 of django.db.models.sql.query from:
> row = row[:aggregate_start] + tuple(
> to
> row = tuple(row[:aggregate_start]) + tuple(
>
> That should clear up most of the errors and seems to be the result of
> Oracle returning a list instead of a tuple.

I've just added this fix to the branch. Thanks, Alex and Karen.

Russ %-)

Russell Keith-Magee

unread,
Jan 5, 2009, 6:02:52 PM1/5/09
to django-d...@googlegroups.com
On Tue, Jan 6, 2009 at 1:21 AM, Alex Gaynor <alex....@gmail.com> wrote:
> The one's that are a result of Oracle not returning a Decimal can be solved
> be inserting "..." before and after the number, in placeess of explicitly
> saying Decimal(unless of course this is actually a typecasting issue in
> Django itself, in which case that should be fixed) this is done a few other
> places in aggregates regress.

The '...' change hides the problem, but doesn't fix it. Sum('price')
should return a decimal, since it is a non-computed, non-ordinal
aggregate over a decimal field. Max(closing_time) should return a
datetime.time(), not a datetime.datetime() - this looks like an error
in the Oracle backend.

The real solution for these problems is to write a backend-specific
coerce_aggregate_value() function as part of the DatabaseOperations
class. SQLite has one for exactly this purpose (and for exactly the
same data types).

The SQL errors are slightly more problematic - I'll need to defer to
someone with Oracle-fu for suggestions on the fix.

Russ %-)

Malcolm Tredinnick

unread,
Jan 5, 2009, 6:11:40 PM1/5/09
to django-d...@googlegroups.com
On Mon, 2009-01-05 at 23:11 +0900, Russell Keith-Magee wrote:
> Hi all,
>
> SQL Aggregations (ticket #3566) are almost ready for trunk!
>
> This is a call for final testing.

By the way, I'm pretty sure I accidentally broke django.contrib.gis,
both in trunk and, consequently, the branch with the changes I committed
last night. I only remembered after going to bed that the whole reason
this was fiddly was because there's a custom WhereNode over there. I'll
look at that today.

Malcolm


Karen Tracey

unread,
Jan 5, 2009, 6:17:10 PM1/5/09
to django-d...@googlegroups.com
I think Justin already fixed it in trunk:

http://code.djangoproject.com/changeset/9702

Karen

Russell Keith-Magee

unread,
Jan 5, 2009, 6:20:31 PM1/5/09
to django-d...@googlegroups.com

Looks like Justin may have done this while you slept: [9702] Updated
GeoWhere to be compatible with the changes in r9700.

Russ %-)

Alex Gaynor

unread,
Jan 5, 2009, 9:16:28 PM1/5/09
to django-d...@googlegroups.com
The datetime issue looks like it's just an issue of porting this logic:
http://code.djangoproject.com/browser/django/trunk/django/db/backends/oracle/query.py#L72

over to the coerce aggregates function.  There's a bit of decimal logic above, but I don't think it's anything different than what we already have, is it?

Alex

Russell Keith-Magee

unread,
Jan 5, 2009, 9:40:21 PM1/5/09
to django-d...@googlegroups.com
On Tue, Jan 6, 2009 at 11:16 AM, Alex Gaynor <alex....@gmail.com> wrote:
> The datetime issue looks like it's just an issue of porting this logic:
> http://code.djangoproject.com/browser/django/trunk/django/db/backends/oracle/query.py#L72
>
> over to the coerce aggregates function. There's a bit of decimal logic
> above, but I don't think it's anything different than what we already have,
> is it?

The logic you have identified appears to be the right idea. My only
question is whether we should be needing to replicate this logic in
the aggregation coercion function - the code you identified is
executed as part of resolve_columns, which should be executed as a
precursor to aggregate coercion. This means the problem may lie in
resolve_columns not resolving enough columns, rather than extra
coercion being required.

I should be able to take a look at this tonight, unless someone with
an actual Oracle install beats me to it.

Yours,
Russ Magee %-)

Ian Kelly

unread,
Jan 6, 2009, 2:14:27 AM1/6/09
to django-d...@googlegroups.com
On Mon, Jan 5, 2009 at 9:21 AM, Alex Gaynor <alex....@gmail.com> wrote:
> The one's that are a result of Oracle not returning a Decimal can be solved
> be inserting "..." before and after the number, in placeess of explicitly
> saying Decimal(unless of course this is actually a typecasting issue in
> Django itself, in which case that should be fixed) this is done a few other
> places in aggregates regress.

The Decimal conversion might fail to happen if the corresponding field
passed to resolve_columns is not a DecimalField. This happens for
instance when using extra selects, where no field object is available.
My guess is that the fields for the aggregate columns are getting
excluded or misaligned somehow. This could also be the cause of the
datetime/time issue.

> The SQL command not properly ended appears to be the result of an annotate
> call followed by a call to aggregate(). I don't have a clue what causes
> that as I've used Oracle in my life :) .

I'll take a look at the queries tomorrow and see if I can straighten them out.

Ian

zvoase

unread,
Jan 6, 2009, 5:52:03 AM1/6/09
to Django developers
Just a question - how, if at all, is aggregation going to be supported
or worked around on custom field types?

On Jan 6, 8:14 am, "Ian Kelly" <ian.g.ke...@gmail.com> wrote:

Alex Gaynor

unread,
Jan 6, 2009, 5:55:09 AM1/6/09
to django-d...@googlegroups.com
Well the queries themselves will execute fine, and then you will just back whatever type your database returned, you'll need to do some coercion yourself.

Alex

Russell Keith-Magee

unread,
Jan 6, 2009, 6:14:35 AM1/6/09
to django-d...@googlegroups.com
On Tue, Jan 6, 2009 at 7:52 PM, zvoase <crac...@gmail.com> wrote:
>
> Just a question - how, if at all, is aggregation going to be supported
> or worked around on custom field types?

Do you have any particular fields in mind? Any custom field deriving
from the basic Django numeric and date/time types shouldn't require
any special handling, but I'd appreciate any bug report to the
contrary. If you have a use case for non numeric/date/time types, I'd
be interested to see what we can do to accomodate your needs.

Yours,
Russ Magee %-)

Ian Kelly

unread,
Jan 6, 2009, 2:19:57 PM1/6/09
to django-d...@googlegroups.com

The "ORA-00972: identifier is too long" error is caused by the column
alias names not being truncated to an acceptable length
("book__publisher__num_awards__min" in this case). The call to
perform the truncation is django.db.backends.util.truncate_name(alias,
connection.ops.max_name_length()).

The "ORA-00923: FROM keyword not found where expected" is caused by
one of the column alias names being an unquoted keyword ("number" in
this case). Quoting the column alias names with
connection.ops.quote_name(alias) will fix it.

The "ORA-00933: SQL command not properly ended" errors are caused by
queries of the form "SELECT foo FROM (SELECT bar) AS subquery". The
"AS" keyword needs to be removed, because Oracle doesn't accept it for
subquery aliases.

I'll run the full Oracle test suite and see if anything else pops up.

Ian

Russell Keith-Magee

unread,
Jan 6, 2009, 5:45:22 PM1/6/09
to django-d...@googlegroups.com

Thanks Ian.

Russ %-)

Russell Keith-Magee

unread,
Jan 7, 2009, 9:46:36 AM1/7/09
to django-d...@googlegroups.com, ian.g...@gmail.com, Justin Bronn

I've just pushed some updates to Github that should hopefully fix the
problems that have been reported (ORA-00933, ORA-00923 and ORA-00972
and the decimal/datetime conversion problems). If someone with access
to an Oracle test machine can confirm that this has cleaned up the
errors, I'd be much obliged.

There is some potential for further refactoring here, especially with
regards to the decimal/datetime conversion routines. The Oracle
DatabaseOperations.convert_values() function that I added for
aggregate support is an almost perfect subset of the features in the
Oracle Query.convert_values() function.

Refactoring Query.convert_values() into the backend operations would
remove some code duplication, and would also serve to keep all the
data coercion/typecasting code in a common location rather than
spreading some of it into the Query class. However, there are two
things that prevented me from doing this refactor.

Firstly, Query.convert_values() uses a lot of isinstance() calls on
fields, rather than checking the internal type of the field.
Aggregates use the internal type because aggregates aren't actually
fields, but they do return database types. Ian - can you see a problem
with converting the isinstance(field, DecimalField) calls to
field.get_internal_type() == 'DecimalField' etc?

Secondly, contrib.GIS has a convert_values() function that invokes
super(GeoQuery, self).convert_values() in the case of an Oracle
spatial backend. Justin - is there any particular reason to call the
GeoQuery base class specifically, rather than invoking
self.connection.ops.convert_values()? Is GeoQuery likely to have a
base class other than OracleQuery in the Oracle case?

Thanks,
Russ %-)

Karen Tracey

unread,
Jan 7, 2009, 10:55:22 AM1/7/09
to django-d...@googlegroups.com
On Wed, Jan 7, 2009 at 9:46 AM, Russell Keith-Magee <freakb...@gmail.com> wrote:
I've just pushed some updates to Github that should hopefully fix the
problems that have been reported (ORA-00933, ORA-00923 and ORA-00972
and the decimal/datetime conversion problems). If someone with access
to an Oracle test machine can confirm that this has cleaned up the
errors, I'd be much obliged.

The aggregation tests now pass.  aggregation_regress has one (minor?) problem:

Doctest: regressiontests.aggregation_regress.models.__test__.API_TESTS ... FAIL

======================================================================
FAIL: Doctest: regressiontests.aggregation_regress.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\u\kmt\django\aggregates.git\django\test\_doctest.py", line 2180, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for regressiontests.aggregation_regress.models.__test__.API_TESTS
  File "D:\u\kmt\django\aggregates.git\tests\regressiontests\aggregation_regress\models.py", line unknown line number, in API_TESTS

----------------------------------------------------------------------
File "D:\u\kmt\django\aggregates.git\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation_regress.models.__test__.API_TESTS
Failed example:
    Publisher.objects.annotate(avg_price=Avg('book__price')).aggregate(Max('avg_price'))
Expected:
    {'avg_price__max': 75.0...}
Got:
    {'avg_price__max': 75}


----------------------------------------------------------------------
Ran 1 test in 11.757s

FAILED (failures=1)
Destroying test database...

Ian Kelly

unread,
Jan 7, 2009, 3:48:45 PM1/7/09
to django-d...@googlegroups.com

This appears to be an existing problem in the Oracle backend where
convert_values neglects to convert FloatField values (interesting that
nothing else in the test suite catches this). But is there a reason
that the aggregation field returns an internal type of FloatField when
the field was originally a DecimalField?

Another oddity with this query is that removing the aggregate call
from the above query and leaving only the annotate still results in
floats instead of decimals. It appears that ops.convert_values isn't
getting called in that case. But replace Avg with Max or Sum and it
does get called.

I noticed another bug while testing this: the annotate query above is
modified after the aggregate call, which can be observed by the
following code. I don't think this one is related to the Oracle
backend.

>>> qs = Publisher.objects.annotate(avg_price=Avg('book__price'))
>>> qs.query.as_sql()
>>> qs.aggregate(Max('avg_price'))
>>> qs.query.as_sql()

Ian

Alex Gaynor

unread,
Jan 7, 2009, 3:54:09 PM1/7/09
to django-d...@googlegroups.com
The solution to the fact that aggregate mutates the query is to have aggregates() do self._clone() like all the other queryset methods do(not sure how we missed this). 

My guess as to why the odd typecasting occurs is because the `is_computed` check occurs because the check for DecimalField.  When it sees is_computed it just casts to float and returns, I have no idea if this behavior is correct or not.

Alex

Justin Bronn

unread,
Jan 7, 2009, 4:28:31 PM1/7/09
to Django developers
> Refactoring Query.convert_values() into the backend operations would
> remove some code duplication, and would also serve to keep all the
> data coercion/typecasting code in a common location rather than
> spreading some of it into the Query class. However, there are two
> things that prevented me from doing this refactor.
> ...
> Secondly, contrib.GIS has a convert_values() function that invokes
> super(GeoQuery, self).convert_values() in the case of an Oracle
> spatial backend. Justin - is there any particular reason to call the
> GeoQuery base class specifically, rather than invoking
> self.connection.ops.convert_values()?

Yes, because your subset implementation of `convert_values` does not
convert LOB values (which GeoQuery needs). Specifically, geometries
are wrapped in a function that returns a CLOB, and the WKT string is
obtained from a `.read()` call on the CLOB.

I'm -1 on moving conversion operations from the Query class to the
backend. However, I'd be +1 if Query.convert_values is kept -- even
if it's just a wrapper around self.connection.ops.convert_values. If
the conversion functions are _only_ available at the backend level,
then it would mean I would have to create distinct spatial database
backends every time I want to subclass the convert_values() method. I
don't want to have to create distinct database backends just so I can
override a single method (DATABASE_ENGINE='gis_postgresql_psycopg2'
looks pretty awful to me).

> Is GeoQuery likely to have a base class other than OracleQuery in the
> Oracle case?

No; unless, of course, the backend requires a custom Query class like
Oracle does -- which would mean any future backends without LIMIT/
OFFSET and spatial support would also subclass from the backend's
Query class.

-Justin

Ian Kelly

unread,
Jan 7, 2009, 5:05:07 PM1/7/09
to django-d...@googlegroups.com
On Wed, Jan 7, 2009 at 7:46 AM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> There is some potential for further refactoring here, especially with
> regards to the decimal/datetime conversion routines. The Oracle
> DatabaseOperations.convert_values() function that I added for
> aggregate support is an almost perfect subset of the features in the
> Oracle Query.convert_values() function.
>
> Refactoring Query.convert_values() into the backend operations would
> remove some code duplication, and would also serve to keep all the
> data coercion/typecasting code in a common location rather than
> spreading some of it into the Query class. However, there are two
> things that prevented me from doing this refactor.
>
> Firstly, Query.convert_values() uses a lot of isinstance() calls on
> fields, rather than checking the internal type of the field.
> Aggregates use the internal type because aggregates aren't actually
> fields, but they do return database types. Ian - can you see a problem
> with converting the isinstance(field, DecimalField) calls to
> field.get_internal_type() == 'DecimalField' etc?

Yes - field could be None, resulting in an AttributeError. I see no
problem replacing them with (field and field.get_internal_type() ==
'DecimalField').

Currently we seem to be calling convert_values twice for each
aggregate value of an annotate query. First, query.convert_values
gets called from resolve_columns with a None field. Later,
ops.convert_values gets called with the actual field. The reason I
bring this up is that the former call might interfere with the latter
call. For example, if an aggregate value of a DateTimeField has a
time component of exactly midnight, the former call will incorrectly
assume it's meant to be a date value and cast it as such. The latter
call would then need to be intelligent enough to reverse it. In
refactoring these methods, would it be an extraordinary effort to also
refactor the calls so that the former call never happens?

Ian

Malcolm Tredinnick

unread,
Jan 7, 2009, 8:06:57 PM1/7/09
to django-d...@googlegroups.com
On Wed, 2009-01-07 at 13:28 -0800, Justin Bronn wrote:
[...]

> I'm -1 on moving conversion operations from the Query class to the
> backend. However, I'd be +1 if Query.convert_values is kept -- even
> if it's just a wrapper around self.connection.ops.convert_values. If
> the conversion functions are _only_ available at the backend level,
> then it would mean I would have to create distinct spatial database
> backends every time I want to subclass the convert_values() method. I
> don't want to have to create distinct database backends just so I can
> override a single method (DATABASE_ENGINE='gis_postgresql_psycopg2'
> looks pretty awful to me).

The general principle that Justin's mentioning here: providing a general
method that possibly calls a backend method is something I like as a
rule. It allows one to provide some non-backend-specific extensions more
easily. So I agree with Justin in terms of how any refactoring might
happen: leave a Query-level entry point exposed.

> > Is GeoQuery likely to have a base class other than OracleQuery in the
> > Oracle case?
>
> No; unless, of course, the backend requires a custom Query class like
> Oracle does -- which would mean any future backends without LIMIT/
> OFFSET and spatial support would also subclass from the backend's
> Query class.

MS SQL Server support being one case that's likely to crop up as an
external project at some point.

Malcolm

Reply all
Reply to author
Forward
0 new messages