Fwd: Aggregate class: a first-attempt

12 views

Skip to first unread message

Honza Král

unread,

Feb 22, 2007, 9:34:17 PM2/22/07

to django-d...@googlegroups.com

I think we should really move this to the developers' list

On 2/23/07, Tim Chase <django...@tim.thechases.com> wrote:
>
> >> I haven't yet figured out a way to suppress the order_by portion,
> >> so what's currently in there is an ugly hack. But it would need
> >> to prevent the standard methods from inserting an ORDER BY clause
> >> against a non-aggregated field.
> >
> > if you add an empty call (no parameters) to order_by(), it will drop
> > the ORDER BY clause
>
> I thought I had tried this. I'll have to give it another whack
> on Monday when I get back from PyCon. I had problems with it
> trying to order the resultset by the Meta.ordering properties,
> when those fields weren't part of the aggregate. I'll try it
> again to see if I did something wrong or if I unearthed a bug.
>
> > I have another proposition, how about:
> >
> >>>> quseryset = Model.objects.all()
> >>>> queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
> > 'some_other_field' ), avg=( 'pay', 'age' ), count=True )
>
> I like this calling interface as an alternate method for its
> fine-tuned control, but there are times it would be nice to not
> have to have such tight coupling between the model and the
> aggregation. Otherwise, a change in the model requires not only
> a change in the template to make use of the results, but in the
> view (to request various aggregate functions) to add the desired
> aggregation functions.

yes, and that is how it should be in my opinion. Why have the database
calculate stuff you don't want?

>
> >>>> [
> > {
> > 'name' : 'some name',
> > 'city' : 'Prague',
> > 'sum' : { 'pay' : 30000, 'some_other_field' : 10000 },
> > 'avg' : { 'pay' : 10000, 'age' : 30 },
> > 'count' : 5,
> > },
> > ......
> > ]
>
> As long as the resulting data can be easily accessed directly by
> a template, any number of resulting formats would work just fine.

good point, the dictionary one wouldn't work very well in templates

> I'd see using it something like
>
> return render_to_response(
> 'test.html', {'items': Model.objects.all()}
> )
>
> and then within the template refer to
>
> <table>
> {% for item in items %}
> <tr><td>{{ item.pay }}</td></tr>
> {% endfor %}
> <tr><td>{{ items.aggregate.sum.pay }}</td></tr>
> </table>

this just seems wrong - doing something like this in a template just
doesn't feel right, you should calculate the aggregation in the view -
its an expensive operation you should always think through

>
> or possibly
>
> items = Model.objects.all()
> return render_to_response('test.html', {
> 'items': items,
> 'stats': items.aggregate(),
> })

I think this should be:

items = Model.objects.all()
return render_to_response('test.html', {
'stats_by_owner': items.aggregate( ('owner',), count=True,
sum=('visits',) max=('rating', ), min=('rating', 'visits') ),
})

{% for stat in stats_by_owner %}
Owner {{ stat.owner }} has written {{ stat.count }} articles wit
ratings between {{ stat.min.rating }} and {{ stat.max.rating }}. His
work was read {{ stat.sum.visits }} times in total and his least
popular article was read {{ stat.min.visits }} times.
{% endfor %}

the problem here is that if this should make any sense, owner should
be an object, not just a field value (user id in this case), I believe
though that this can be done...

>
> and then within the template refer to
>
> <table>
> {% for item in items %}
> <tr><td>{{ item.pay }}</td></tr>
> {% endfor %}
> <tr><td>{{ stats.sum.pay }}</td></tr>
> </table>
>
> > it seems reasonably clean interface to me:
> > fist argument is a tuple of fields on which to call GROUP BY, then
> > there are several optional kwargs: avg, sum, min, max and count

the first argument should also be optional - in case you want the total

> >
> > I am willing to code up a patch for this, it should be pretty
> > straightforward. If we would want to take it to the next level, we
> > could allow filter() on such aggregated queryset, that would manifest
> > itself in the HAVING clause...
> >
> > what do you think??
>
> I like the ideas you present, particularly your clean interface
> for the aggregated results ("results.sum.fieldname" rather than
> my munged "results.fieldname_sum") and the clean interface for
> specific aggregation. However, as stated above, I am hesitant to
> give up the simplicity of just asking for "aggregate everything
> possible" to prevent tight coupling. Both sound good to me ("but
> I want both specific *and* generic" :)

and what about performance and optimization? do you really want to
calculate even the things you have no intention on using? Tight
coupling with the model here makes sense - you cannot really ask for
aggregation, if you don't know, what you want.

>
> Feel free to swipe any of the code I provided for any patches you
> make.

thanks, I will wait for some more feedback though, I want to get the
interface right before I start coding

>
> I think it will be a valuable addition as evaluation can be made
> lazy (thus, it has no great impact on existing code), and I've
> seen it requested several times in the Django-Users ML archives
> where the answer was "Django doesn't do that, but you can drop to
> hand-rolled SQL to do it for you." I think it would help the
> folks that don't think in SQL.

or the guys that don't want to rely on one DB engine

>
> Just my musings...I appreciate your feedback.

same here ;)

>
> -tkc
>
>
>
>
> >
>

--
Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

Reply all

Reply to author

Forward

0 new messages