From: "Honza Král" <honza.k...@gmail.com>
Date: Sat, 14 Apr 2007 00:16:02 +0200
Local: Fri, Apr 13 2007 6:16 pm
Subject: Re: Aggregates: anything planned for 1.0?
On 4/13/07, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> On 4/13/07, Honza Král <honza.k...@gmail.com> wrote: > > see: > Ok; I've had a chance to look at this now. Although there are some > Comments: > GROUP BY = values() > Isn't the group_by clause duplicating what is already provided by the that should be two different things, only the same format of returned data doesn't seem to be enough to merge these two functions When the aggregates clause is > in use, a GROUP BY over the selected fields is implied. If you don't if you want to go with values() (which I am personally against), why not do: > provide a values clause, then you want to GROUP BY all the fields in > the model > So, your example would become: > >>> queryset = Model.objects.all() qset.values('field', 'field2', 'field3__sum', 'f4__count', ...) < = > I still don't like it, but seems better, and would solve issues with > Output format > [ > To me, the clause is returning a list of data groups (i.e., the when there is a simple way around this. it would also allow for easier generic processing of the results (for example when building a general statistics app, we will know without inspecting the model, which fields were in the GROUP BY etc. -- explicit is better than implicit ;) > Weaknesses in the function=tuple argument approach > While the function=tuple of field names approach does work, I feel it you could use something like order_by('sum(field3)'), but that seems weird... > - It doesn't handle joins in the same way as filter() true, but it should, that's why I want to keep the arguments as simple string containing only the field name (including optional foreign__or__other__key__lookups), so same logic could be used for filters and these > The approach I proposed was to mirror the filter syntax: name (field__sum) as a reference to the field: qset = qset.values('field', 'field2', 'field3__sum', 'f4__count', ...) qset.filter( field__contains='AA' ) # results in a WHERE clause > This approach mirrors the filter mechanism for attribute naming and > The only downside I can see is that the simple case requires the user > However, it should be possible to combine both approaches: > >>> Model.objects.aggregate(sum=('pay'), other_field__average='mean_value') > This means the sum of pay isn't aliased, so it can't be filtered, but > Ambiguities in COUNT > aggregate(count=True) is already covered with count(); I'm not sure .count() wouldn't cut it (in one select) and once you are doing aggregation anyway, its simple to throw in COUNT(*)... so if you a) change .count() to behave like .aggregate(count=True), it would break code b) don't allow count in aggregation, you will have more queries for no good reason. I see .count() as a very useful shortcut for .aggregate(count=True)['count'] There is also the problem that it > precludes counting other things: e.g., counting related objects - I see, we obviously don't understand each other: > return the number of books that each author has written. > Author.objects.aggregate(books_count='number_of_books_written') > On top of this, you can get the gross count from: > len(Author.objects.aggregate(...)) I don't want count=True to add count of groups produced by group by, I want it to add information on the group's size: SELECT autor_id, COUNT(*), AVG(rating) FROM articles GROUP BY article_id; which would return me list of authors with number of published articles and their average rating > Aggregate of results vs aggregates of related objects > If you run an aggregate over a field from the table you are querying, > - Get me the date of the earliest and latest published article: Return > The second case is relatively easy: > >>> Author.objects.aggregate(article__pub_date__min='earliest', code of the queryset doesn't say anything about a name. I am confused by this. > However, the first case: > >>> Article.objects.aggregate(pub_date__min='earliest', pub_date__max='latest')) didn't specify group_by so the SELECT should look something like: SELECT MIN(pub_date) as earliest, MAX(pub_date) as latest FROM articles; and why would it magically include 'title' in the result ? I am confused again. > while legal, isn't the most elegant way at getting at a unique min/max > >>> Article.objects.aggregate(min=('pub_date'), max=('pub_date')) > Whereas annotate() returns a list of annotated dictionary data. group by clause, so it groups the entire table and is missing the 'group_by' field in the result (or any other fields except the aggregates you asked for) > ================================ > I'd better stop now - this is getting longer than I originally > Yours, > [1] http://groups.google.com/group/django-developers/msg/2f91fa217bc465bb Honza Král E-Mail: Honza.K...@gmail.com ICQ#: 107471613 Phone: +420 606 678585 You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
| ||||||||||||||