Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
Aggregate Support to the ORM
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 34 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Nicolas E. Lara G.  
View profile  
 More options Apr 22 2008, 4:24 pm
From: "Nicolas E. Lara G." <nicolasl...@gmail.com>
Date: Tue, 22 Apr 2008 13:24:19 -0700 (PDT)
Local: Tues, Apr 22 2008 4:24 pm
Subject: [GSoC] Aggregate Support to the ORM
Hello,
I have been selected to implement Aggregate support in the Django ORM
as a project for Google Summer of Code 2008 =) My mentor for this
project is Russell Keith-Magee.
I am very happy with both being selected (of course :p) and, in
general, with all the projects that were selected for Django on this
summer of code. I believe they are all very good additions to the
framework.

Though it has been discussed in this mailing list before I will post a
more concrete description of my project:

General Considerations
----------------------------------
When doing aggregation using the ORM we should continue to be
abstracted from SQL and think about easily expressing the solutions to
the problems we need to solve. The idea is not to provide a wrapper
around any SQL functionalities but to solve data aggregation problems
in a higher abstraction level (with the ORM usign the underlaying SQL
functionalities).

The proposed syntax includes two QuerySet modifiers. The first one
(annotate) would be used to extend the QuerySet with aggregated
information, this is, adding each object a piece of aggregated
information. The second one (aggregate) would be used to reduce the
aggregation over a QuerySet to a set of values for the whole model.

An example of a problem solved with aggregate would be:
  "Find out the average height of the kids in a school"
An example of a problem solved with annotate would be:
  "Find the average hight of the kids in every classroom"

(this are just examples, more complex things follow)

Proposed Syntax
-----------------------
The syntax for specifying the aggregation will be filter-like syntax

So a QuerySet might be:

>>> Students.objects.all().aggregate(height__avg='agerage_hight')

{'average_height' : 1.43}

with the possibility of using non-keyword arguments for un-aliased
aggregation, in which case the 'alias' would be automatically created
from the query made:
 >>> Students.objects.all().aggregate('height__avg')
{'height_avg' : 1.43}

For specifying the aggregates over a related model the syntax will be
the same as in filter:

>>> Buyer.objects.all().aggregate(purchases__quantity__sum='total_sales')

{'total_sales': 259L}

The same would stand for annotate... but it would yield a different
result:

>>>Buyer.objects.all().annotate(purchases__quantity__sum='total_sales')

[{'age': 35, 'total_sales': 93L, 'id': 2, 'name': u'John'}, {'age':
21, 'total_sales': 166L, 'id': 1, 'name': u'Peter'}]

Grouping
--------------
Grouping should be done internally. There should be no syntax for
directely asking the database to use GROUP BY.  How to group should be
derived from the object representation of what needs to be retrieved.

When necessary to specify the fields to be included in the agregation
(i. e. grouped) it could be done with values().
If we want to specify:
  "The minimum number of items sold for the same bulk price"
we could do:

>>> Purchase.objects.all().values('payment').annotate(quantity__min='num_adquis itions')

[{'num_adquisitions': 56, 'payment': 356841.0},
 {'num_adquisitions': 3, 'payment': 13521.0},
 {'num_adquisitions': 85, 'payment': 6185463.0}]

which results in grouping by price.

Filtering
------------
When querying for aggregates wwe might want to filter the results
according to information retrieved by the aggregation itself. We
should be able to simply use the filter modifier refering to the alias
of the aggregated field:

>>> Purchase.objects.all().values('payment').annotate(quantity__min='num_adquis itions').filter(num_adquisitions__lt=5)

[ {'num_adquisitions': 3, 'payment': 13521.0}]

This would result in a 'having' sql clause.

Extra
-------
That defines the basic aggregation functionality. There are certain
extra things that can be added but, as I see it, shouldn't be
addressed until the basic functionality is up and running.
One of those things is custom aggregate functions. A first idea to
approach them would be something like this:
      >>> my_own_aggregate = Aggregate( sum('field_1') * 'field2',
'default_name')
      >>>
queryset.aggregate(purchases__price__custom=my_own_aggregate('alias'))
Still, as I said, there is much to be done before addressing the
custom aggergate problem.

Another improvement that could be added after the basic functionality
is done is the, previously discussed, F syntax for refering to other
fields in the model. A basic idea of this would be something like:

>>> Purchase.objects.all().values('payment').annotate(quantity__min='num_adquis itions', quantity__avg='average_items').filter(num_adquisitions__lt=F('average_items '))

Again, this is to be addressed after the aggregate's basic
functionality is working.

Building Use Cases
----------------------------
As for now I believe this covers most of the use cases of database
aggregation. If anybody can think of a use case which  is not covered
by this syntax please post it (preferably with a plain-english
description of what the expected behaviour is). Any suggestion is more
than welcome!

I will keep you all posted in any change or advance on this project.

Regards =)

Nicolas Lara


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Matthias Kestenholz  
View profile  
 More options Apr 22 2008, 4:36 pm
From: Matthias Kestenholz <m...@spinlock.ch>
Date: Tue, 22 Apr 2008 22:36:14 +0200
Local: Tues, Apr 22 2008 4:36 pm
Subject: Re: [GSoC] Aggregate Support to the ORM
Hi.

this is very exciting! I've one suggestion/question though.

On Tue, 2008-04-22 at 13:24 -0700, Nicolas E. Lara G. wrote:

> with the possibility of using non-keyword arguments for un-aliased
> aggregation, in which case the 'alias' would be automatically created
> from the query made:
>  >>> Students.objects.all().aggregate('height__avg')
> {'height_avg' : 1.43}

Was the name-munging here done intentionally? I think, that it's a bad
choice to leave one of the underlines out. I think that it's too easy to
shoot oneself in the foot with such magic (Think denormalized table
schemas).

Apart from that: Great stuff, can't wait for it :-)

        Matthias

--
http://spinlock.ch/blog/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rajeev J Sebastian  
View profile  
 More options Apr 22 2008, 4:37 pm
From: "Rajeev J Sebastian" <rajeev.sebast...@gmail.com>
Date: Wed, 23 Apr 2008 02:07:26 +0530
Local: Tues, Apr 22 2008 4:37 pm
Subject: Re: [GSoC] Aggregate Support to the ORM
Hi Nicolas,

First, congratulations on your GSOC selection :)

I would to know if you will be building some way to use PG custom
aggregates. I have so far one custom aggregate and I wonder how it
could work with this scheme.

Specifically, will there be some way to create new aggregate
predicates just as the builtin sum, avg, etc ?

Regards
Rajeev J Sebastian

On Wed, Apr 23, 2008 at 1:54 AM, Nicolas E. Lara G.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
alex.gaynor@gmail.com  
View profile  
 More options Apr 22 2008, 4:55 pm
From: "alex.gay...@gmail.com" <alex.gay...@gmail.com>
Date: Tue, 22 Apr 2008 13:55:29 -0700 (PDT)
Local: Tues, Apr 22 2008 4:55 pm
Subject: Re: Aggregate Support to the ORM
Looks like good stuff, the one thing that would be nice would be to
have annotate return Model Objects instead of dicts.

On Apr 22, 3:37 pm, "Rajeev J Sebastian" <rajeev.sebast...@gmail.com>
wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nicolas E. Lara G.  
View profile  
 More options Apr 22 2008, 4:56 pm
From: "Nicolas E. Lara G." <nicolasl...@gmail.com>
Date: Tue, 22 Apr 2008 13:56:52 -0700 (PDT)
Local: Tues, Apr 22 2008 4:56 pm
Subject: Re: Aggregate Support to the ORM

On Apr 22, 4:36 pm, Matthias Kestenholz <m...@spinlock.ch> wrote:

non intentional at all.. justa typo... sorry for that..

> Apart from that: Great stuff, can't wait for it :-)

thanks =)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nicolas E. Lara G.  
View profile  
 More options Apr 22 2008, 5:10 pm
From: "Nicolas E. Lara G." <nicolasl...@gmail.com>
Date: Tue, 22 Apr 2008 14:10:14 -0700 (PDT)
Local: Tues, Apr 22 2008 5:10 pm
Subject: Re: Aggregate Support to the ORM

On Apr 22, 4:37 pm, "Rajeev J Sebastian" <rajeev.sebast...@gmail.com>
wrote:

> Hi Nicolas,

> First, congratulations on your GSOC selection :)

Thanks =)

> I would to know if you will be building some way to use PG custom
> aggregates. I have so far one custom aggregate and I wonder how it
> could work with this scheme.

To be honest I haven't thought about it. As I said, I'd like to get
the basic aggregation (semi)stable before digging into other stuff
simply because if the basic api changes then everything built on top
of it has to change also.
I do think it is a good idea as long as we don't have to provide PG-
only functionality. With what i've done so far everything is done with
the names provided by the DBMS so if you define a custom aggregate in
postgres, lets say, AGG; you should be able to use it as
some_field__agg='alias'...

> Specifically, will there be some way to create new aggregate
> predicates just as the builtin sum, avg, etc ?

When the I get the basic functionality on wheels I'll check of a way
to do this in a cross-database manner.

Thanks =)

Regards,


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Eratothene  
View profile  
 More options Apr 23 2008, 10:50 am
From: Eratothene <cd...@bk.ru>
Date: Wed, 23 Apr 2008 07:50:43 -0700 (PDT)
Local: Wed, Apr 23 2008 10:50 am
Subject: Re: [GSoC] Aggregate Support to the ORM
It looks that API lacks support for SQL HAVING clause, this one should
be distinct from filter, GROUP BY without HAVING is often useless.

Consider adding it to api (something like
Model.objects.aggregate(height__sum).having(sex='M'))

Also do not forget to add HAVING clause to .extra()  method.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Malcolm Tredinnick  
View profile  
 More options Apr 23 2008, 10:54 am
From: Malcolm Tredinnick <malc...@pointy-stick.com>
Date: Thu, 24 Apr 2008 00:54:56 +1000
Local: Wed, Apr 23 2008 10:54 am
Subject: Re: [GSoC] Aggregate Support to the ORM

On Wed, 2008-04-23 at 07:50 -0700, Eratothene wrote:
> It looks that API lacks support for SQL HAVING clause, this one should
> be distinct from filter, GROUP BY without HAVING is often useless.

> Consider adding it to api (something like
> Model.objects.aggregate(height__sum).having(sex='M'))

> Also do not forget to add HAVING clause to .extra()  method.

You may wish to go back and read the earlier thread(s) about this on
this group where this has been discussed before.

One goal in the design here has been to avoid leaking lots of SQL into
the API: "having" and "group by" crop up as a natural function in the
use of aggregates, but they aren't necessarily needed in isolation. The
idea is not to rewrite SQL in Python. Rather, we want to provide certain
pieces of functionality -- in this case, aggregations support -- that is
mapped to SQL. So the SQL will use "having" and "group by" but they
don't to be exposed at the Python level.

Regards,
Malcolm

--
Save the whales. Collect the whole set.
http://www.pointy-stick.com/blog/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nicolas Lara  
View profile  
 More options Apr 24 2008, 10:14 am
From: "Nicolas Lara" <nicolasl...@gmail.com>
Date: Thu, 24 Apr 2008 17:14:47 +0300
Local: Thurs, Apr 24 2008 10:14 am
Subject: Re: [GSoC] Aggregate Support to the ORM
Sorry ıf thıs goes on the wrong thread. I am wrıttıng at the moment
from turkey, where google groups ıs blocked.

In short: I belıeve Malcom's reply says ıt all except for the fact
that the 'havıng' clause would be expressed wıth a fılter modıfıer.
The ORM would take care of puttıng the requested fılter ın a 'havıng'
clause or ın a 'where' clause...

On the ıssue of returnıng model objects ınstead of values
(dıctıonarıes): I belıeve the problem wıth that ıs ınconsıstency,
sınce when usıng values to restrıct the elements that are grouped
together you cannot retrıeve an object. I am -0 on retrıevıng the
objects because ıt becomes complıated for the users, but wouldn't
opose much because values ıs specıfıed whenever a valuesQuerySet would
be returned.

On Wed, Apr 23, 2008 at 5:54 PM, Malcolm Tredinnick

--
Nicolas Lara
Linux user #380134
Public key id: 0x152e7713 at http://keyserver.noreply.org/
# Anti-Spam e-mail addresses:
python -c "print '@'.join(['nicolaslara', '.'.join([x for x in
reversed( 'com|gmail'.split('|') )])])"
python -c "print '@'.join(['nicolas', '.'.join([x for x in reversed(
've|usb|labf|ac'.split('|') )])])"

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Russell Keith-Magee  
View profile  
 More options Apr 25 2008, 1:05 am
From: "Russell Keith-Magee" <freakboy3...@gmail.com>
Date: Fri, 25 Apr 2008 13:05:15 +0800
Local: Fri, Apr 25 2008 1:05 am
Subject: Re: [GSoC] Aggregate Support to the ORM

On Thu, Apr 24, 2008 at 10:14 PM, Nicolas Lara <nicolasl...@gmail.com> wrote:
>  On the ıssue of returnıng model objects ınstead of values
>  (dıctıonarıes): I belıeve the problem wıth that ıs ınconsıstency,
>  sınce when usıng values to restrıct the elements that are grouped
>  together you cannot retrıeve an object. I am -0 on retrıevıng the
>  objects because ıt becomes complıated for the users, but wouldn't
>  opose much because values ıs specıfıed whenever a valuesQuerySet would
>  be returned.

I'm not sure why you see this as being complicated for users:

>>> authors = Author.objects.annotate(book__sum='books_published')
>>> for a in authors:

...    print a.name,'has published',a.books_published,'books.'

There is no problem making each a in authors an actual Author object;
the only difference is that each author is annotated. Sure, this means
that the author intances have attributes that aren't part of the
original definition of Author, but this is perfectly consistent with
Python - you can associate arbitrary attributes with any object
instances.

If a values() clause is involved in a query, then it makes sense that
what gets returned should be a value list rather than an object - but
that's perfectly consistent with the behaviour of values() currently.
However, in the simple case -  where you are grouping by full object
instances - I don't see why we shouldn't return object instances.

Yours,
Russ Magee %-)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Honza Král  
View profile  
 More options Apr 25 2008, 9:35 am
From: "Honza Král" <honza.k...@gmail.com>
Date: Fri, 25 Apr 2008 15:35:14 +0200
Local: Fri, Apr 25 2008 9:35 am
Subject: Re: [GSoC] Aggregate Support to the ORM
I would love to see something like AggregateModel returned from
.values().annotate()

it would work like:

for m in Model.objects.values( 'name', 'place' ).annotate(
event__count='number_or_events' )

  # the basic data would be available right away
  print m['number_or_events'], m['place'], m['name']
  # or some other type of access, it doesn't have to look like a dictionary
  # haven't really thought about this

  for o m.objects: # this would pull all models that have the
appropriate name and place
    print o.name, o.place, o.type

does this make sense?

the target is to be able to pull the aggregates and provide easy
access to all the objects in the group, by wrapping this into a class
of some sort, we would be able to work with the results more
effectively than when using dictionaries.

It could also maybe mimic the behavior of the model (with rest of the
fields set to None ??) itself (just playing with ideas, please
somebody tell me if this makes any sense to anybody).

The motivation behind all this is that it seems a waste to just return
the data and drop all the metadata in the process.

On Fri, Apr 25, 2008 at 7:05 AM, Russell Keith-Magee

plus we currently do this in .extra(select={ ... } )

I am +1 on returning actual objects - the use case would be that I
want to print the table of users with number of comments they made and
i really want to be able to access their profile, their absolute url
and any other attribute or method of the model involved

>  If a values() clause is involved in a query, then it makes sense that
>  what gets returned should be a value list rather than an object - but
>  that's perfectly consistent with the behaviour of values() currently.
>  However, in the simple case -  where you are grouping by full object
>  instances - I don't see why we shouldn't return object instances.

>  Yours,
>  Russ Magee %-)

--
Honza Král
E-Mail: Honza.K...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Simon Willison  
View profile  
 More options Apr 25 2008, 9:35 am
From: Simon Willison <si...@simonwillison.net>
Date: Fri, 25 Apr 2008 06:35:31 -0700 (PDT)
Local: Fri, Apr 25 2008 9:35 am
Subject: Re: Aggregate Support to the ORM
On Apr 22, 9:24 pm, "Nicolas E. Lara G." <nicolasl...@gmail.com>
wrote:

> So a QuerySet might be:
> >>> Students.objects.all().aggregate(height__avg='average_height')
> {'average_height' : 1.43}

This looks really great, but one thing that initially confused me was
the ordering of the arguments - it seems strange to ask for
(height_avg = 'average_height') and then get back {'average_height':
1.43} - the name in the output is in a different place from the name
in the input.

Have you considered syntax that looks like this instead? :

>>> Students.objects.all().aggregate({'average_height': 'height__avg})
> {'average_height' : 1.43}

That way the API feels like you are passing in an example of the data
structure you want, and Django is using your example and filling in
the gaps for you.

Cheers,

Simon Willison


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Russell Keith-Magee  
View profile  
 More options Apr 26 2008, 1:40 am
From: "Russell Keith-Magee" <freakboy3...@gmail.com>
Date: Sat, 26 Apr 2008 13:40:46 +0800
Local: Sat, Apr 26 2008 1:40 am
Subject: Re: Aggregate Support to the ORM

On Fri, Apr 25, 2008 at 9:35 PM, Simon Willison <si...@simonwillison.net> wrote:

>  On Apr 22, 9:24 pm, "Nicolas E. Lara G." <nicolasl...@gmail.com>
>  wrote:

> > So a QuerySet might be:
>  > >>> Students.objects.all().aggregate(height__avg='average_height')
>  > {'average_height' : 1.43}

>  This looks really great, but one thing that initially confused me was
>  the ordering of the arguments - it seems strange to ask for
>  (height_avg = 'average_height') and then get back {'average_height':
>  1.43} - the name in the output is in a different place from the name
>  in the input.

Hi Simon

I can see what you are saying here, but I'm not sure I agree. The
underlying idea for this syntax is to follow the example of the
'double underscore then operator' syntax used by filters. Yes, the
'assignment' in an aggregate is in the opposite order to what it would
be in normal code - but then, filter(height__lt=3) doesn't really
match perfectly with normal code either.

If you're looking for a way to reconcile the syntax with expectation -
one suggestion is that the syntax aggregate(height__avg='foobar') can
be read as "the average height has the alias foobar". Not ideal, I
know, but it is a start.

>  Have you considered syntax that looks like this instead? :
>  >>> Students.objects.all().aggregate({'average_height': 'height__avg})
>  > {'average_height' : 1.43}

My issue with this syntax is that every aggregate requires 4 extra
punctuation characters (two parentheses, two quotes), and all you
really get in return is to reverse the order so that you get
result=operation type syntax... except that you can't use the =
operator. It also doesn't have any particular analogs with existing
ORM syntax.

Yours,
Russ Magee %-)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ian Kelly  
View profile  
 More options Apr 26 2008, 2:34 am
From: "Ian Kelly" <ian.g.ke...@gmail.com>
Date: Sat, 26 Apr 2008 00:34:31 -0600
Local: Sat, Apr 26 2008 2:34 am
Subject: Re: Aggregate Support to the ORM
On Fri, Apr 25, 2008 at 11:40 PM, Russell Keith-Magee

<freakboy3...@gmail.com> wrote:
>  >  Have you considered syntax that looks like this instead? :
>  >  >>> Students.objects.all().aggregate({'average_height': 'height__avg})
>  >  > {'average_height' : 1.43}

>  My issue with this syntax is that every aggregate requires 4 extra
>  punctuation characters (two parentheses, two quotes), and all you
>  really get in return is to reverse the order so that you get
>  result=operation type syntax... except that you can't use the =
>  operator. It also doesn't have any particular analogs with existing
>  ORM syntax.

I would say it has a pretty good analog:

>>> Student.objects.all().extra(select={'average_height': 'avg(height)'})

Of course that query won't work, since you can't just toss an
aggregate function into a select list and expect to get anything but
an error message.  But compare it to this one, which does work:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Simon Willison  
View profile  
 More options Apr 26 2008, 6:32 am
From: Simon Willison <si...@simonwillison.net>
Date: Sat, 26 Apr 2008 03:32:18 -0700 (PDT)
Local: Sat, Apr 26 2008 6:32 am
Subject: Re: Aggregate Support to the ORM
On Apr 26, 7:34 am, "Ian Kelly" <ian.g.ke...@gmail.com> wrote:

That was my thinking exactly. I think the cognitive dissonance
involved in height__avg='average_height' is much greater than that in
({'average_height': 'height__avg}). Sure, it's new ORM syntax but it's
also a new ORM feature. Overloading the existing syntax to do
something that's actually quite different feels really strange to me.

name__contains='blah' is field-name__comparison-operator = input-to-
comparison-operator

height__avg='average_height' is field-name__aggregate-function = alias
under which the result should be known

I think the syntax looking similar here is misleading rather than
helpful.

Cheers,

Simon


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Russell Keith-Magee  
View profile  
 More options Apr 26 2008, 7:33 am
From: "Russell Keith-Magee" <freakboy3...@gmail.com>
Date: Sat, 26 Apr 2008 19:33:00 +0800
Subject: Re: Aggregate Support to the ORM

I can see that there might be a bit of cognitive dissonance; however,
it doesn't bug me personally as much as the syntax your are proposing.
Excessive parentheses and quotes like that increase the possibilities
for mistyping mistakes, and don't really serve to improve clarity in
the underlying expression.

That said, the cognitive dissonance is a reasonable enough objection
to warrant some more exploration of the syntax options.

So - here's a slightly modified proposal. Last time this topic came up
on django-dev, Justin Fagnani made an interesting suggestion which I
think will answer your objections, and leaves open some interesting
possibilities for userspace extension. Rather than the
column__aggregate=alias syntax:

>>> Student.objects.all().annotate(Avg('height'),

tallest=Max('height'), avg_friend_age=Avg('friend__age')

The raw idea - the aggregates themselves are tied up into functions
(Avg, Max, etc) that are constructed as an object. The argument to the
aggregate is the query column, using the standard double-underscore
notation if necessary to cross tables. In QS-RF, this syntax is being
used for cross-table order_by() clauses, so there is an analog there.

If the aggregate function is provided as an anonymous argument (e.g.,
Avg('height')), then the aggregate is asked to provide an appropriate
alias - for example, Avg() might use 'avg__%s' % field_name. If the
aggregate function is provided as a kwarg, the kwarg name is used as
the alias.

Userspace extension becomes a lot easier - If a particular database
has an interesting aggregate that isn't supported as part of the core
set, you can write your own aggregate and use it as you see fit.

It also leaves open the door to do nifty things like
annotate(foo=Avg('a') * Sum('b')) - i.e., alegebraic expressions
between aggregates. Obviously, this sort of API will require a whole
lot of work to get right, but the option is there.

Does this proposal sound any better to you?

Yours,
Russ Magee %-)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Simon Willison  
View profile  
 More options Apr 26 2008, 6:32 pm
From: Simon Willison <si...@simonwillison.net>
Date: Sat, 26 Apr 2008 15:32:34 -0700 (PDT)
Local: Sat, Apr 26 2008 6:32 pm
Subject: Re: Aggregate Support to the ORM
On Apr 26, 12:33 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

> So - here's a slightly modified proposal. Last time this topic came up
> on django-dev, Justin Fagnani made an interesting suggestion which I
> think will answer your objections, and leaves open some interesting
> possibilities for userspace extension. Rather than the
> column__aggregate=alias syntax:

> >>> Student.objects.all().annotate(
>    Avg('height'), tallest=Max('height'), avg_friend_age=Avg('friend__age')
> )
> The raw idea - the aggregates themselves are tied up into functions
> (Avg, Max, etc) that are constructed as an object. The argument to the
> aggregate is the query column, using the standard double-underscore
> notation if necessary to cross tables. In QS-RF, this syntax is being
> used for cross-table order_by() clauses, so there is an analog there.

I like this a lot - makes much more sense to me than the other syntax,
and I think I prefer it to my proposed dictionary based syntax as
well.

> If the aggregate function is provided as an anonymous argument (e.g.,
> Avg('height')), then the aggregate is asked to provide an appropriate
> alias - for example, Avg() might use 'avg__%s' % field_name. If the
> aggregate function is provided as a kwarg, the kwarg name is used as
> the alias.

Is it necessary to provide an anonymous argument syntax at all? Seeing
just annotate(Avg('height')) left me confused as to what the alias
would be until I read your further explanation. Is there any harm in
requiring the user of the API to always explicitly specify their alias
using keyword argument syntax? Unless there's a really good reason to
provide default aliases I think requiring explicit aliases would be
perfectly reasonable ("explicit is better than implicit").

Cheers,

Simon


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Russell Keith-Magee  
View profile  
 More options Apr 26 2008, 8:59 pm
From: "Russell Keith-Magee" <freakboy3...@gmail.com>
Date: Sun, 27 Apr 2008 08:59:54 +0800
Local: Sat, Apr 26 2008 8:59 pm
Subject: Re: Aggregate Support to the ORM

Its mostly there as a convenience for the really simple cases. There
some duplication in typing annotate(avg_height=Avg('height'), so the
shortcut is a convenience. IMHO, this is the sort of thing could be
easily explained in the documentation for each aggregate function.

However, it's hardly a mandatory feature - if public opinion is
against this idea, I won't lose any sleep about it.

Yours,
Russ Magee %-)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
alex.gaynor@gmail.com  
View profile  
 More options Apr 26 2008, 9:58 pm
From: "alex.gay...@gmail.com" <alex.gay...@gmail.com>
Date: Sat, 26 Apr 2008 18:58:33 -0700 (PDT)
Local: Sat, Apr 26 2008 9:58 pm
Subject: Re: Aggregate Support to the ORM
I like the lookup objects idea for aggregates, it's similar to Q
objects and I think it works well.  It is far more extensible then the
__ notation.

On Apr 26, 7:59 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Simon Willison  
View profile  
 More options Apr 27 2008, 3:21 am
From: Simon Willison <si...@simonwillison.net>
Date: Sun, 27 Apr 2008 00:21:31 -0700 (PDT)
Local: Sun, Apr 27 2008 3:21 am
Subject: Re: Aggregate Support to the ORM
On Apr 27, 1:59 am, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

> Its mostly there as a convenience for the really simple cases. There
> some duplication in typing annotate(avg_height=Avg('height'), so the
> shortcut is a convenience. IMHO, this is the sort of thing could be
> easily explained in the documentation for each aggregate function.

Seems reasonable. I think I'd like to see the explicit syntax
introduced first in the docs just so people aren't confused by the
very first example, but avoiding the duplication in
avg_height=Avg('height') is a worthy goal.

Cheers,

Simon


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nicolas Lara  
View profile  
 More options Apr 27 2008, 2:26 pm
From: "Nicolas Lara" <nicolasl...@gmail.com>
Date: Sun, 27 Apr 2008 14:26:22 -0400
Local: Sun, Apr 27 2008 2:26 pm
Subject: Re: Aggregate Support to the ORM

>  That was my thinking exactly. I think the cognitive dissonance
>  involved in height__avg='average_height' is much greater than that in
>  ({'average_height': 'height__avg}). Sure, it's new ORM syntax but it's
>  also a new ORM feature. Overloading the existing syntax to do
>  something that's actually quite different feels really strange to me.

Regarding this, I believe an option would be to use the reverse syntax:

aggregate(alias='field__function')
because it mimics what we normally do when assigning in code.
or simply  aggregate('field__function') and have the standard alias be
the same as the requested aggregate (it would be 'field__function')
I still prefer this approach over the lookup objects. But this is
probably because i've been thinking of aggregates with the filter-like
syntax for a while and I became fond of it. Anyway I do think the
lookup object syntax is a very good approach, so here are some
considerations:

Extending seems easy, but is it? (in practice). The way I see to
extend this is to create custom classes (like Max, Avg, etc). this
classes would have a __init__ function that receives 2 parameters:
self, of course, and  the field name. The newly defined class should,
then, call something like get_column_name('field') to be able to
translate itself into SQL. Pleas don't furiously reply there might be
other ways to do it, I know there are, this is just one of the ways I
could come up from the top of my head to ilustrate what I mean... A
proposition on this follows...

Though the objective syntax mimics Q objects, when using Q objects you
have one class, Q, that is used for all the queries. Having multiple
classes seems confusing. I would propose to have a single class (A?)
to do the queries. So you could do something like:

aggregate(height=A('max', 'height'), av_friend_age=A('avg', 'friend__age'))

This way A could receive the name of the function (or the actuall
function object) and the field to which it will be applied. A good
thing that could come from this is that extending (adding a new
function, or set of functions) would be really easy because, since the
heavy lifting would be done by A which would already be part of the
framework, the new function only needs to define how it transkates to
SQL given the column to which would be applied. Having the first
parameter be class/classname instead of function/function name is also
an option. This would allow us to add members to the class so that A
can act differentely. An example of this would be using a class that
has the 'alias_prefix' attribute to define what comes before the field
name in the default alias (A alias function could also be available to
receive the fieldname and return the alias).

class Custom:
         alias_prefix = 'my_aggregate'

         def as_sql(self, column):
                  return 'CUSTOM(%s)' % column

class WierdMax(Max):
         def as_sql(self, column):
                   return super(Max, self).as_sql(column) - 10

so

aggregate(A('custom', 'height'), wierd=A('wierdmax', 'friend__age'))

would result in something like:

{'my_aggregate__hieght': 324, 'wierdmax__friend_age':  17}

I don't like the automatic lowercasing... Its very too much magic, but
I found the camel case quiet ugly...

Any thoughs?

--
Nicolas Lara
Linux user #380134
Public key id: 0x152e7713 at http://keyserver.noreply.org/
# Anti-Spam e-mail addresses:
python -c "print '@'.join(['nicolaslara', '.'.join([x for x in
reversed( 'com|gmail'.split('|') )])])"
python -c "print '@'.join(['nicolas', '.'.join([x for x in reversed(
've|usb|labf|ac'.split('|') )])])"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Justin Fagnani  
View profile  
 More options Apr 28 2008, 2:55 pm
From: "Justin Fagnani" <justin.fagn...@gmail.com>
Date: Mon, 28 Apr 2008 11:55:54 -0700
Local: Mon, Apr 28 2008 2:55 pm
Subject: Re: Aggregate Support to the ORM

Hey everyone,

Good to see this has come up again. Congrats on getting the GSoC selection
Nicolas. The previous thread[1] has a lot of good discussion in it if anyone
here wasn't following that.

Obviously, I prefer the expression object API. I've actually been using it a
bit in an order processing app I'm working on. I've been keeping the patches
up to date with qs-rf and I just merged it with the new trunk. If anyone
wants to check them out, where should I put them, as an attachment on #3566?
(It works a bit differently than what's described here, folding all
the behavior into values(), which might not be ideal)

[1]:
http://groups.google.com/group/django-developers/browse_thread/thread...

I really like Honza's idea of an AggregateModel, at least for cases where
there's a 1-1 correspondence between results and actual instances, so that
the model will still behave as expected. To keep from cluttering the model's
attributes, aggregate values could be put into another object or dict:

>>>myproduct.aggregates['avg_price']

I like the idea less when the result would be a representative of a group.
There could be unexpected results from calling methods on an instance
because not all the data is there, or it has invalid values (averaging an
integer field, etc). In these cases, I don't think it's a bad idea to
require the use of values() and/or aggregate().

Also, there will probably be cases where we'd want to iterate over the
members of the groups, so maybe instead of a list of dicts, aggregate()
returns list of objects, so that a query like:

>>> people = Person.objects.values('age').aggregate(Avg('income'))

will return a list of objects that you can use like a dict:

>>>people[0]['age']

and get a queryset from:

>>>people[0].objects()

On Sun, Apr 27, 2008 at 11:26 AM, Nicolas Lara <nicolasl...@gmail.com>
wrote:

> Having multiple classes seems confusing.

I'm not sure why multiple classes would be confusing, since they do
represent different behaviors. If it has to do with dealing with many
classes, then it doesn't seem different than the many function names that
need to be passed to the A class.

> I would propose to have a single class (A?) to do the queries. So you
> could do something like:

> aggregate(height=A('max', 'height'), av_friend_age=A('avg',
> 'friend__age'))

At least for readability, I think this is clearer:

aggregate(height=Max('height'), av_friend_age=Avg('friend__age'))

In addition, some cases have special requirements and dealing with them in a
class is easy.

Cheers,
  Justin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nicolas E. Lara G.  
View profile  
 More options Apr 28 2008, 5:10 pm
From: "Nicolas E. Lara G." <nicolasl...@gmail.com>
Date: Mon, 28 Apr 2008 14:10:33 -0700 (PDT)
Local: Mon, Apr 28 2008 5:10 pm
Subject: Re: Aggregate Support to the ORM
Hi Justin,

> I really like Honza's idea of an AggregateModel, at least for cases where
> there's a 1-1 correspondence between results and actual instances, so that
> the model will still behave as expected. To keep from cluttering the model's
> attributes, aggregate values could be put into another object or dict:

> >>>myproduct.aggregates['avg_price']

> I like the idea less when the result would be a representative of a group.
> There could be unexpected results from calling methods on an instance
> because not all the data is there, or it has invalid values (averaging an
> integer field, etc). In these cases, I don't think it's a bad idea to
> require the use of values() and/or aggregate().

I didn't understand your point here. When is the data incomplete? and
I dont see the problem in averaging an integerField (probably in
averaging a CharField, but IMHO this kind of invalid values should
make the whole query fail).

> Also, there will probably be cases where we'd want to iterate over the
> members of the groups, so maybe instead of a list of dicts, aggregate()
> returns list of objects, so that a query like:

> >>> people = Person.objects.values('age').aggregate(Avg('income'))

> and get a queryset from:

> >>>people[0].objects()

I like this idea quite a lot. It is basically a shortcut for a common
use case that is generating a QuerySet for the objects that have been
aggregated.Still, doing this queryset manually is also very simple.

I aggre. An advantage (which I think might also be a double edge
sword) of the string function names is the possibility to fallback to
a standard convertion to SQL when no function is defined. With this we
wouldn't need to wrap every aggregate method in a python class that
translates roughly the same way and we could take advantage of the
direct usage of the DB defined aggregate functions. Also with the A
syntax you would do only one import instead of one for each function.
Still the point of readability is a very important one and I believe
it might come before other features.

> In addition, some cases have special requirements and dealing with them in a
> class is easy.

I can't think of a requirement that can't be dealt with in a method
also... got an example?

Regards,

Nicolas


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Justin Fagnani  
View profile  
 More options Apr 28 2008, 6:25 pm
From: "Justin Fagnani" <justin.fagn...@gmail.com>
Date: Mon, 28 Apr 2008 15:25:00 -0700
Subject: Re: Aggregate Support to the ORM

On Mon, Apr 28, 2008 at 2:10 PM, Nicolas E. Lara G. <nicolasl...@gmail.com>
wrote:

> I didn't understand your point here. When is the data incomplete?

An example would be grouping people by age and averaging income, what should
the values of the name field be? None might make sense at first, as Honza
suggests, but what happens to methods that depend on name? The whole point
of returning models is to have access to the methods and other class
metadata, but if data is missing or not meaningful, then those methods might
not be useful.

I dont see the problem in averaging an integerField (probably in

> averaging a CharField, but IMHO this kind of invalid values should
> make the whole query fail).

The problem with averaging an IntegerField is that the result is unlikely to
be an integer, yet parts of the model might assume that it is. At the very
least AggregateModels in this situation are unable to be validated, saved or
deleted, and it's possible that other methods don't work, so either they
shouldn't be models at all, or we need to disable lots of things and issue
big, big warnings.

> I aggre. An advantage (which I think might also be a double edge
> sword) of the string function names is the possibility to fallback to
> a standard convertion to SQL when no function is defined.

I'm not sure how wise this is. Considering the variation of available
functions, and their different behaviors,   just passing the function name
straight into SQL could easily encourage non-portable queries. I know some
users will want DB specific functions, but like extras, it should probably
be clearly advanced functionality.

With this we

> wouldn't need to wrap every aggregate method in a python class that
> translates roughly the same way and we could take advantage of the
> direct usage of the DB defined aggregate functions.

Defining most functions is quite easy. Here's Max:

class Max(Aggregate):
    _func = 'MAX'

I'm sure we can get all the standard functions in there.

> Also with the A
> syntax you would do only one import instead of one for each function.

Ideally expressions.py should be import * friendly

> In addition, some cases have special requirements and dealing with them in
> a
> > class is easy.

> I can't think of a requirement that can't be dealt with in a method
> also... got an example?

Count() is an easy example. First, it takes an extra kwarg, 'distinct', and
changes the output of as_sql() if set. Then it also has to do a special case
in relabel_aliases() and whatever method passes column names to
query.setup_joins() for COUNT(*)

Cheers,
 Justin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nicolas Lara  
View profile  
 More options Apr 29 2008, 3:45 am
From: "Nicolas Lara" <nicolasl...@gmail.com>
Date: Tue, 29 Apr 2008 10:45:11 +0300
Local: Tues, Apr 29 2008 3:45 am
Subject: Re: Aggregate Support to the ORM
On Tue, Apr 29, 2008 at 1:25 AM, Justin Fagnani

<justin.fagn...@gmail.com> wrote:

> On Mon, Apr 28, 2008 at 2:10 PM, Nicolas E. Lara G. <nicolasl...@gmail.com>
> wrote:
> > I didn't understand your point here. When is the data incomplete?

> An example would be grouping people by age and averaging income, what should
> the values of the name field be? None might make sense at first, as Honza
> suggests, but what happens to methods that depend on name? The whole point
> of returning models is to have access to the methods and other class
> metadata, but if data is missing or not meaningful, then those methods might
> not be useful.

In this case: should we actually return the actual objects? If a
values modifier is involved the expected behavior is to receive a list
of dicts. IMO, we should not allow aggregates to override this
behavior. Also, having some value for the fields wouldn't make sense
since we cannot ensure there is a unique value for a given field for
that group (unless it is specified in values, in which case it would
automatically appear).
I like the idea of having a QuerySet for retrieving the grouped
objects. This can be done simply by having something like: {'objects':
<QuerySet object>}

or they can be a list of Models only when no 'values' is specified.

> > I aggre. An advantage (which I think might also be a double edge
> > sword) of the string function names is the possibility to fallback to
> > a standard convertion to SQL when no function is defined.

> I'm not sure how wise this is. Considering the variation of available
> functions, and their different behaviors,   just passing the function name
> straight into SQL could easily encourage non-portable queries. I know some
> users will want DB specific functions, but like extras, it should probably
> be clearly advanced functionality.

I am not sure either. I see the problems it might bring.

This could all be handled in a count method. Still I see that using
classes poses and advantage.

> Cheers,
>  Justin

Regards,

--
Nicolas Lara


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 34   Newer >
« Back to Discussions « Newer topic     Older topic »