group by 3 fields

73 views
Skip to first unread message

Larry Martell

unread,
Feb 25, 2017, 11:53:20 PM2/25/17
to django...@googlegroups.com
I have a query set and I need to do the SQL equivalent of

GROUP BY col1, col2, col3

I have read many SO posts about using aggregate and count but I have
not been able to get this to work using 3 columns. Anyone know how to
do this?

chris rose

unread,
Feb 26, 2017, 5:30:09 AM2/26/17
to Django users
there is a model meta option called ordering. you can specify a list of fields to order by


i have only used this in the admin

Larry Martell

unread,
Feb 26, 2017, 6:11:39 AM2/26/17
to django...@googlegroups.com
Order by is not the same as group by. Group by does aggregation 

Daniel Roseman

unread,
Feb 26, 2017, 6:55:17 AM2/26/17
to Django users
On Sunday, 26 February 2017 11:11:39 UTC, Larry....@gmail.com wrote:
Order by is not the same as group by. Group by does aggregation 


You should explain what you want to achieve. Grouping is pointless on its own. In any case, when working with an ORM like Django's it is generally not helpful to think in terms of SQL.
-- 
DR. 

marcin....@gmail.com

unread,
Feb 26, 2017, 5:05:21 PM2/26/17
to Django users


On Sunday, February 26, 2017 at 12:55:17 PM UTC+1, Daniel Roseman wrote:

You should explain what you want to achieve. Grouping is pointless on its own. In any case, when working with an ORM like Django's it is generally not helpful to think in terms of SQL.
-- 
DR. 

Also forget window functions, triggers, conditionals, pubsub and many other great db features, just because of Django ORM...

Daniel is right - Django ORM will limit your expressions, but not every ORM works that way. SQLAlchemy, in contrast, allows you not only grouping, but also wrap anything into their expression API. 

With Django there is still possibility to do raw queries, nby Model.objects.raw(). I'm doing a complex queries that way, but I'm avoiding spaghetti code by using django-sqltemplate. It allows you to write big SQLs in separate files (also as templates),load using Django templates engine, and finally pass the result directly to Model.objects.raw(), if you want.

You can force grouping, AFAIR. Try to get query object from your queryset, i.e. q=Model.objects.all().query and play with q.group_by or something like that. But this is kind of nasty workaround. I would not recommend doing grouping that way in a project with long-term support.

Generally speaking Django ORM is quite simple, limited and does not work well for more complex projects. I don't want to talk more about this, because my point of view is generally opposite to the "Django Way" (tm). I can just say that I'm using Django ORM as simple mapper for most typical cases, leaving more complex things for better tools.

And, Larry, please never give up and always think about db and data first, then about application layer. Data is most important. It's lifetime is much longer than any app. Continue thinking about grouping, windowing, indexing, fts and so on, and find a way how to handle such things using tools like Django. 

Good luck!
Marcin

Larry Martell

unread,
Feb 26, 2017, 9:52:38 PM2/26/17
to django...@googlegroups.com
Of course grouping is pointless on its own. I need to do grouping and
aggregation (summing). But I need to group by 3 columns. I don't know
why you say it's not helpful to think in SQL. That is a language I
have worked in for over 20 years, and when I see a querying need that
is how I think, and then I see how can I do that with the ORM. Often I
cannot and have to use raw SQL. But I always do try and see if it can
be done in the ORM.

Larry Martell

unread,
Feb 26, 2017, 9:55:51 PM2/26/17
to django...@googlegroups.com
I agree with Marcin. The ORM is a tool and the tool should serve its
user, not the other way around. I have an app that has been developed
and deployed over 7 years. At the beginning it only used the ORM. But
over time, due to performance issues, and application requirements,
we've had to move to raw SQL more and more.

marcin....@gmail.com

unread,
Feb 27, 2017, 3:41:57 AM2/27/17
to Django users


On Monday, February 27, 2017 at 3:52:38 AM UTC+1, Larry....@gmail.com wrote:
[SQL] That is a language I
have worked in for over 20 years, and when I see a querying need that
is how I think, and then I see how can I do that with the ORM. 

So I shouldn't give advices for you. We have similar experience, I think.

With Django you may play with .aggregate() and .annotate() functions.
You should be able to do grouping you need Just tell a ORM about aggregates you want to use, and it will automagically add group by. 

Let's try group by col1, col2:

> from django.db.models import Count
> from django.contrib.auth.models import User

> print User.objects.values('is_staff', 'is_superuser').annotate(cnt=Count('*')).query
SELECT `auth_user`.`is_staff`, `auth_user`.`is_superuser`, COUNT(*) AS `cnt` FROM `auth_user` GROUP BY `auth_user`.`is_staff`, `auth_user`.`is_superuser` ORDER BY NULL

You must tell Django that you need some aggregate, and then ask for selecting other values explicitely. These columns will be added to a group by.
Please note that as a result you will get iterable of dicts instead of model instances, which is pretty reasonable.

BR,
Marcin


Larry Martell

unread,
Mar 1, 2017, 7:52:09 AM3/1/17
to django...@googlegroups.com
On Mon, Feb 27, 2017 at 3:41 AM, <marcin....@gmail.com> wrote:
>
>
> On Monday, February 27, 2017 at 3:52:38 AM UTC+1, Larry....@gmail.com wrote:
>>
>> [SQL] That is a language I
>> have worked in for over 20 years, and when I see a querying need that
>> is how I think, and then I see how can I do that with the ORM.
>
>
> So I shouldn't give advices for you. We have similar experience, I think.

No, I will always take advice. Because you don't know what you don't know.

> With Django you may play with .aggregate() and .annotate() functions.
> You should be able to do grouping you need Just tell a ORM about aggregates
> you want to use, and it will automagically add group by.
>
> Let's try group by col1, col2:
>
>> from django.db.models import Count
>> from django.contrib.auth.models import User
>
>> print User.objects.values('is_staff',
>> 'is_superuser').annotate(cnt=Count('*')).query
> SELECT `auth_user`.`is_staff`, `auth_user`.`is_superuser`, COUNT(*) AS `cnt`
> FROM `auth_user` GROUP BY `auth_user`.`is_staff`, `auth_user`.`is_superuser`
> ORDER BY NULL
>
> You must tell Django that you need some aggregate, and then ask for
> selecting other values explicitely. These columns will be added to a group
> by.
> Please note that as a result you will get iterable of dicts instead of model
> instances, which is pretty reasonable.

As is so often the case, the requirements changed. Now what I had to
do, if I was doing it in SQL would have been:

(CASE
WHEN TRIM(IFNULL(roiname, '')) IN ('', 'None') THEN CONCAT_WS('.',
CONVERT(roi_type_id, CHAR), roi_id)
WHEN CONCAT_WS('.', CONVERT(roi_type_id, CHAR), roi_id) = roiname THEN roiname
ELSE CONCAT_WS('.', CONVERT(roi_type_id, CHAR), roi_id, roiname)
END) REGEXP '%s'

But the table I am selecting from has 600,000 rows or more, and that
query would cause a table scan. So I did not add that to the existing
query, and instead iterated over the result set in python and did that
filtering.

marcin....@gmail.com

unread,
Mar 1, 2017, 8:01:26 AM3/1/17
to Django users


As is so often the case, the requirements changed. Now what I had to
do, if I was doing it in SQL would have been:

(CASE
 WHEN TRIM(IFNULL(roiname, '')) IN ('', 'None') THEN CONCAT_WS('.',
CONVERT(roi_type_id, CHAR), roi_id)
 WHEN CONCAT_WS('.', CONVERT(roi_type_id, CHAR), roi_id) = roiname THEN roiname
 ELSE CONCAT_WS('.', CONVERT(roi_type_id, CHAR), roi_id, roiname)
 END) REGEXP '%s'

But the table I am selecting from has 600,000 rows or more, and that
query would cause a table scan. So I did not add that to the existing
query, and instead iterated over the result set in python and did that
filtering.

Queries like that aren't "compatible" with Django ;)
You may also consider creating a view and map it to a unmanaged django model, but do not forget to add on_delete/on_update=DO_NOTHING for FKs.
But I have no idea what Django migration system do with such mapped view - I kicked off builtin migrations completely and I am using Liquibase to manage dbs.

Marcin

Larry Martell

unread,
Mar 1, 2017, 8:09:06 AM3/1/17
to django...@googlegroups.com
I had thought of using a view, but that would have been a lot of
overhead on such a large table. I also considered adding a column and
running a one time script to update the existing rows, and modifying
the script that loads data to populate the new column. But doing an
alter on such a large table takes longer then we can afford to have
the table locked for.

Also, we don't use django migrations on this project. We have found
them very hard to manage in an environment where you have 40
deployments, all with different versions of the code and database.

marcin....@gmail.com

unread,
Mar 1, 2017, 8:13:56 AM3/1/17
to Django users


On Wednesday, March 1, 2017 at 2:09:06 PM UTC+1, Larry....@gmail.com wrote:

I had thought of using a view, but that would have been a lot of
overhead on such a large table. I also considered adding a column and
running a one time script to update the existing rows, and modifying
the script that loads data to populate the new column. But doing an
alter on such a large table takes longer then we can afford to have
the table locked for.

It depends on your needs, of course. Your solution described earlier is widely adopted. 
 
Also, we don't use django migrations on this project. We have found
them very hard to manage in an environment where you have 40
deployments, all with different versions of the code and database.

Good decision. There are many other factors like unavailability of migrations after application's code changes (due to "freezing" class references).

BR,
Marcin
Reply all
Reply to author
Forward
0 new messages