I'm a little bit confused as to what you think is happening here -
what do you mean by "other than a simple column"? The default
aggregate grouping is effectively by object instance; the
implementation means that every non-aggregate field on the model that
is returned by the SELECT is included in the GROUP BY.
> db/sql/query.py set_group_by suggests that this might work, to (for
> example) group together dollar amounts:
That block of code is internal API - if you choose to use it, you do
so at your own risk, and backwards compatibility is not guaranteed.
set_group_by() is an internal setup call; I doubt it will do anything
useful in your case.
The public API for aggregates consists of annotate(), aggregate() and
the interaction of those calls with values().
> Table.objects.extra(select={"dollars": "cents/100"}).values
> ("dollars").aggregate(...)
>
> That doesn't throw an error (it does if the extra() isn't there), but
> it doesn't group as expected, either. (For some reason, it adds a
> GROUP BY containing every field in Table.)
The reason every field in the table is included is because that's how
GROUP BY works - any field that is in the query that isn't an
aggregate needs to be included in the group by. When you call
Table.objects...., this implies SELECT Table.id, Table.name, ... at
the database level. Some other fields are included in the GROUP BY to
accomodate the needs of order_by() and select_related().
The only exception to this is MySQL; MySQL allows for grouping by
primary key, which is an optimization for the simple case of grouping
by every non-aggregate field on a model.
I'd like to provide some more concrete advice, but it's not exactly
clear what you're trying to achieve. If you describe the actual
problem you are trying to solve, what you have tried, and the errors
or unexpected results you have seen, we might be able to provide
better advice.
Yours,
Russ Magee %-)
So, this is kind of an API extension request. Realise that Django
intentionally does not expose "GROUP BY" or anything like that, in the
ORM. Although the fact that we're over a relational storage backend
sometimes leaks through, the ORM API is fairly SQL-agnostic. Instead, we
expose particular pieces of functionality that happen to be implemented
using "GROUP BY" when it's turned into SQL (and could well be
implemented by some entirely different set of magical fairies with a
different storage system backend).
The aggregate support in Django provided an ORM API for aggregate
functionality. Not an exposure of grouping.
>
> Just experimenting, if I do:
> Table.objects.extra(select={"group": "func(value)"}).values
> ("group").annotate(Sum("dollars"))
> and then manually hack set_group_by to add "group" to self.group_by,
> it generates the result I'm expecting, eg. [{"dollars__sum": 10,
> "group": 1}, ...]. But, I havn't found any path in the real API to
> make that happen.
That's because it doesn't exist. Django isn't meant to completely
replace the need for SQL, for a start.
Also, given the relative infrequency with which this sort of thing crops
up across the entire set of all possible database interactions, it's not
unreasonable that it might, in fact, take more than one line of Python
to do it. You could construct the queryset, then look up the appropriate
alias in the "query" attribute (possibly adding the table, if
necessary), then set up the extra() bit. It would take three or four
line, which could be wrapped in a function if you needed to do it
regularly. Uses internal API all the way through, but simple enough to
write tests for so that you could tell if it broke due to internal
changes in the future.
Finally, if there's some use-case that pops up at the Python level that
might justify an API and if there's a sensible API, then it's something
to consider (not necessarily add, but consider) as an addition to Django
in the future (maybe in the Django 1.2 or 1.3 time-frame). You'll need
to express the problem in terms of something that is a reasonable
situation for a Python ORM, not as "I can do this in SQL, and so it must
be possible in Django as a single command", since that is in conflict
with the stated goals. There's always a slight problem with attempting
to create an API to pass through essentially arbitrary SQL (e.g. the
"func()" part of your raw SQL) via the ORM, since it means nothing at
the Python level, so requires exposing raw SQL as part of the API.
Extra() is already a fairly ugly hack in many respects -- it's
incredibly fragile, in the sense that once you add it to a queryset,
subsquent manipulations of that queryset can fail in ways we cannot hope
to detect or recover from.
I hope that puts the item you're asking about in some perspective. I've
tried to give an idea of why this is a complex area. What you're wanting
to do is probably already possible via a few lines of code that poke at
the internals and possibly won't get any more direct than that in the
near future.
Regards,
Malcolm
Yes - using values() before an annotate() call specifies that a subset
of the fields is required, which controls the grouping that is output.
> Just experimenting, if I do:
> Table.objects.extra(select={"group": "func(value)"}).values
> ("group").annotate(Sum("dollars"))
> and then manually hack set_group_by to add "group" to self.group_by,
> it generates the result I'm expecting, eg. [{"dollars__sum": 10,
> "group": 1}, ...]. But, I havn't found any path in the real API to
> make that happen.
There have been a couple of reports recently (#10113 and #10127 are
two examples that jump to mind) that describe errors in the exact
composition of the GROUP BY statement constructed by Django. I suspect
this may be one of this family of bugs. In your case, the culprit is a
select added in an extra(). I'll need to spend some time to check the
exact details and implications, but my initial reaction is that your
query:
Table.objects.extra(select={"group":
"func(value)"}).values("group").annotate(Sum("dollars"))
should work as you describe without requiring any hackery. If you
could open a ticket describing this problem (the query and what you
expect as SQL) it will ensure we don't forget about this before we
release v1.1.
Also, keep in mind that Django's SQL layer is an ORM - an _Object_
Relational Mapper. The API is designed to look like you're dealing
with Objects. The fact that it is a relational store underneath is
secondary. This means that there is a non-trival collection of easy
SQL queries that will be difficult (or impossible) to represent in the
ORM. This may not apply in this case specifically, but it is worth
remembering when you start looking for ways to represent esoteric SQL
as ORM statements. Sometimes the right solution is to just use SQL.
Yours,
Russ Magee %-)
> I don't care if it takes more than one line, though there is, in
> fact :), a big difference between taking more than one line and having
> to bypass the public API.
Only semantically, when you're trying to access something like "group
by", since, pretty much by definition, you're doing stuff that the
public ORM isn't intending to provide.
Regards,
Malcolm