GROUP BY `refunds_userprofile`.`ird_number` ,
`auth_user`.`first_name` , `auth_user`.`last_name` ,
`refunds_userprofile`.`user_id` , `auth_user`.`id` ,
`auth_user`.`username` , `auth_user`.`first_name` ,
`auth_user`.`last_name` , `auth_user`.`email` ,
`auth_user`.`password` , `auth_user`.`is_staff` ,
`auth_user`.`is_active` , `auth_user`.`is_superuser` ,
`auth_user`.`last_login` , `auth_user`.`date_joined`
to
GROUP BY `refunds_userprofile`.`ird_number`
(note this doesn't change the query results)
I figured I might have to use some undocumented methods to do this,
but I couldn't figure it out - I've tried explicitly setting the
query.group_by attribute with no success. Does anyone know how I could
do this?
What ORM query are you running to get the above generated SQL? Looks
like it could be a bug.
--
DR.
The short answer is "you don't". Django doesn't expose any way to
manually control GROUP BY because that is a relational construct, and
Django is attempting to provide an Object-like wrapper.
As for why dropping GROUP BY clauses is faster - remember, t's always
possible to get the wrong answer in O(1) time :-) Daniel has asked the
more important question - what are you trying to do? Django is
generally pretty careful about adding GROUP BY clauses; arbitrarily
dropping them make me a little nervous. It's possible you're found a
bug, but it's impossible to confirm that without more details.
Yours
Russ Magee %-)
First, some background - UserProfile is the
settings.AUTH_PROFILE_MODULE and the Application model has a
ForeignKey to User. There's ~90,000 users/userprofiles, ~110,000
applications (most have only one application, some have 2). The
database is mysql. If that's not enough info, let me know and I'll see
if I can put together a simple reproduce-able test case.
The queryset giving the problem is
UserProfile.objects.values(
'ird_number', 'user__first_name', 'user__last_name', 'user__pk',
).annotate(
Max('user__application__creation_date'),
).order_by('-user__application__creation_date__max')
I'm beginning to think it might be a bug - because UserProfile and
User are joined by a one-to-one relationship, it's sufficient to group
by one of their primary keys, and I think the ORM should know this.
Should I open a ticket for this?
Greg
> --
> You received this message because you are subscribed to the Google Groups "Django users" group.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to django-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
>
>
It looks to me like you are trying to get more info from the User
model then then UserProfile model, so maybe change your query to be
based on the User not the UserProfile
User.objects.values('first_name', 'last_name', 'pk').annotate(
Max('application__creation_date'),
).order_by('-application__creation_date__max').select_related()
User.objects.values('first_name', 'last_name', 'pk',
'userprofile__ird_number').annotate(
Max('application__creation_date'),
).order_by('-application__creation_date__max').select_related()
but that raises
FieldError: Invalid field name: 'userprofile__ird_number'
Greg
This is only true for MySQL. Every other database that Django supports
requires a complete list of non-aggregate fields in the GROUP BY
clause - which is the behaviour you are seeing.
> Should I open a ticket for this?
It's probably worth opening a ticket, but not because it's a bug. Like
I said, the 'group by pk' think is a MySQL eccentricity; the SQL you
describe is completely normal for other databases (mod the fact that
your values() clause has a lot less fields than the SQL sample you
gave, but I take it that this is was just truncated for clarity). The
query is that is being issued is *correct*, it's just not *optimal*.
However, there might be an improved optimization to be had. My first
impression is that we might be able to allow the 'group by pk'
optimization as long as the pk is included in the list of fields in a
values query. The current code only allows the optimization in the
case that the list of selected columns is the same as the list of
fields on the requesting model (i.e., the non-values() case). Beefing
up the optimization like this will require a bit of thought to make
sure it isn't catching cases it shouldn't (i.e., is there a query that
includes the PK for which the 'group by pk' optimization isn't valid),
but providing we aren't going to break anything,
Yours,
Russ Magee %-)