Custom Aggregate Objects

1,382 views
Skip to first unread message

nsitarz

unread,
Feb 2, 2009, 9:05:27 PM2/2/09
to Django users
Hey,

Back when the ORM aggregate support was a patch in trac I used to
create custom aggregate objects that looked like this:

class Date(Aggregate):
pass


class DateHour(Aggregate):
def __init__(self, lookup):
super(DateHour, self).__init__(lookup)
self.sql_template = '(DATE(${field}) + INTERVAL HOUR(${field})
HOUR)'

However now that aggregate support has made its way into trunk the api
seems to have changed a bit. After reading the aggregate module it
looks to me like I'd have to reimplement the Aggregate class myself in
order to have this same kind of functionality. Is there a supported
way of creating custom aggregate classes for use with the trunk ORM?

I'd appreciate any help that anybody can offer on this.

alex....@gmail.com

unread,
Feb 2, 2009, 9:16:30 PM2/2/09
to Django users
Aggregates were refactored so that they involve 2 classes, 1
implements the public API, and the other implements the SQL bit.
Basically look here:
http://code.djangoproject.com/browser/django/trunk/django/db/models/aggregates.py#L26

for what you're public API bit should be(instead of pulling the clsas
out of the aggreagetes module it can just refer to te specific class
it goes with). And then here:
http://code.djangoproject.com/browser/django/trunk/django/db/models/sql/aggregates.py
for what the SQL backened's API is.

Alex

Russell Keith-Magee

unread,
Feb 3, 2009, 6:34:34 AM2/3/09
to django...@googlegroups.com

Alex has correctly pointed you at the right bit of code for the new
aggregate implementation; however, I would like to add some extra
detail.

The new implementation is split across 2 classes to allow the
definition of aggregates to be independent of their eventual
implementation. This is useful for Django, because it leaves the door
open for supporting non-SQL data stores which don't implement literal
SQL, but may provide aggregate functionality.

However, just because Django splits aggregates over two classes, this
doesn't mean you have to do the same. There are essentially just two
interfaces - there's no reason a single class couldn't implement both.

The 'generic' Aggregate is little more than a placeholder that can
provide a default alias name on the default_alias property, and can
install itself into a query using the add_to_query method. The real
lifting is in the backend specific Aggregate, which keeps track of
column aliases, etc.

You should be able to write an site-specific SQL aggregate by
subclassing the backend-specific Aggregate class
(django.db.models.sql.aggregates.Aggregate), and adding to your
subclass an implementation of default_alias and add_to_query,
following the lead of django.db.models.aggregates.Aggregate. A
straight mixin probably won't work, because your implementation of
add_to_query needs to be slightly different to the version in the
generic Aggregate class, but you should be able to crib off the
generic version to see what needs to be done.

What you will end up with is a backend-specific aggregate class that
appears to be a generic aggregate for the purposes of annotate() and
aggregate() calls.

Yours,
Russ Magee %-)

Russell Keith-Magee

unread,
Feb 3, 2009, 7:04:52 AM2/3/09
to django...@googlegroups.com
On Tue, Feb 3, 2009 at 8:34 PM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> On Tue, Feb 3, 2009 at 11:16 AM, alex....@gmail.com
> <alex....@gmail.com> wrote:
>>
>> On Feb 2, 9:05 pm, nsitarz <nsit...@gmail.com> wrote:
>>> Hey,
>>>
>>> Back when the ORM aggregate support was a patch in trac I used to
>>> create custom aggregate objects that looked like this:
>>>
>>> class Date(Aggregate):
>>> pass
>>>
>>> class DateHour(Aggregate):
>>> def __init__(self, lookup):
>>> super(DateHour, self).__init__(lookup)
>>> self.sql_template = '(DATE(${field}) + INTERVAL HOUR(${field})
>>> HOUR)'
>>>
>>> However now that aggregate support has made its way into trunk the api
>>> seems to have changed a bit. After reading the aggregate module it
>>> looks to me like I'd have to reimplement the Aggregate class myself in
>>> order to have this same kind of functionality. Is there a supported
>>> way of creating custom aggregate classes for use with the trunk ORM?
>>>
>>> I'd appreciate any help that anybody can offer on this.
...

> What you will end up with is a backend-specific aggregate class that
> appears to be a generic aggregate for the purposes of annotate() and
> aggregate() calls.

Curiosity got the better of me... so here's a worked (and documented)
example that reimplements the Max() aggregate:

from django.db.models.sql.aggregates import Aggregate
class MyMax(Aggregate):
sql_function = 'MAX'
"""A base class to make it easy for end users to define their own
custom SQL aggregates.

The subclass should define the following two class properties:
* sql_function - the name of the SQL function to invoke

Optionally, you can define
* sql_template - a format string that is used to compose the
SQL that will be sent to the database. The template will be
provided with the following substitution variables:
- ``function``, the sql fuction that will be invoked
- ``field``, the resolved name of the column to be
operated on.
The template will also be provided with any keyword argument
provided to the aggregate when it was defined.
The default template is '%(function)s(%(field)s)'
* is_ordinal - a boolean, True if the result of the aggregate
will always be a count, regardless of the field on which the
aggregate is applied. False by default.
* is_computed - a boolean, True if the result of the aggregate
will always be a float, regardless of the field on which the
aggregate is applied. False by default.
"""
def __init__(self, lookup, **extra):
self.lookup = lookup
self.extra = extra

def _default_alias(self):
return '%s__%s' % (self.lookup, self.__class__.__name__.lower())
default_alias = property(_default_alias)

def add_to_query(self, query, alias, col, source, is_summary):
super(MyMax, self).__init__(col, source, is_summary, **self.extra)
query.aggregate_select[alias] = self

Usage:

Book.objects.annotate(MyMax('pubdate'))

Yours,
Russ Magee %-)

nsitarz

unread,
Feb 3, 2009, 12:59:51 PM2/3/09
to Django users
Wow that was fast and is exactly what I was looking for :D

Thanks,
Nick.


On Feb 3, 4:04 am, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> On Tue, Feb 3, 2009 at 8:34 PM, Russell Keith-Magee
>
>
>
> <freakboy3...@gmail.com> wrote:
> > On Tue, Feb 3, 2009 at 11:16 AM, alex.gay...@gmail.com
> > <alex.gay...@gmail.com> wrote:
>
> >> On Feb 2, 9:05 pm, nsitarz <nsit...@gmail.com> wrote:
> >>> Hey,
>
> >>> Back when the ORMaggregatesupport was a patch in trac I used to
> >>> createcustomaggregateobjects that looked like this:
>
> >>> class Date(Aggregate):
> >>>     pass
>
> >>> class DateHour(Aggregate):
> >>>     def __init__(self, lookup):
> >>>         super(DateHour, self).__init__(lookup)
> >>>         self.sql_template = '(DATE(${field}) + INTERVAL HOUR(${field})
> >>> HOUR)'
>
> >>> However now thataggregatesupport has made its way into trunk the api
> >>> seems to have changed a bit. After reading theaggregatemodule it
> >>> looks to me like I'd have to reimplement theAggregateclass myself in
> >>> order to have this same kind of functionality. Is there a supported
> >>> way of creatingcustomaggregateclasses for use with the trunk ORM?
>
> >>> I'd appreciate any help that anybody can offer on this.
> ...
> > What you will end up with is a backend-specificaggregateclass that
> > appears to be a genericaggregatefor the purposes of annotate() and
> >aggregate() calls.
>
> Curiosity got the better of me... so here's a worked (and documented)
> example that reimplements the Max()aggregate:
>
> from django.db.models.sql.aggregates importAggregate
> class MyMax(Aggregate):
>     sql_function = 'MAX'
>     """A base class to make it easy for end users to define their own
>    customSQL aggregates.
>
>     The subclass should define the following two class properties:
>      * sql_function - the name of the SQL function to invoke
>
>     Optionally, you can define
>       * sql_template - a format string that is used to compose the
>         SQL that will be sent to the database. The template will be
>         provided with the following substitution variables:
>           - ``function``, the sql fuction that will be invoked
>           - ``field``, the resolved name of the column to be
>             operated on.
>         The template will also be provided with any keyword argument
>         provided to theaggregatewhen it was defined.
>         The default template is '%(function)s(%(field)s)'
>       * is_ordinal - a boolean, True if the result of theaggregate
>         will always be a count, regardless of the field on which the
>        aggregateis applied. False by default.
>       * is_computed - a boolean, True if the result of theaggregate
>         will always be a float, regardless of the field on which the
>        aggregateis applied. False by default.

nsitarz

unread,
Feb 3, 2009, 1:29:07 PM2/3/09
to Django users
Well I've successfully subclassed the base aggregate class from
Russell above and it works as expected, however when I try and group
by my new aggregate field I get a FieldError. This makes sense to me
because when I call values on the queryset the new field hasn't been
created yet.

Here's an example of what I mean:
queryset.objects.values('date_group').annotate(date_group=Date
('datefield'))

I've solved this in the past with calls like this:
all_objects = queryset.all()
all_objects.query.group_by=['date_group']
annotated_objects = all_objects.annotate( date_group=Date
('datefield') )

Is there a better way than this to accomplish grouping by aggregate
fields?

alex....@gmail.com

unread,
Feb 3, 2009, 4:00:11 PM2/3/09
to Django users
Why don't you call values() after annotate()
Reply all
Reply to author
Forward
0 new messages