Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Django aggregation equivalent of SUM(IF(field = x, 1, 0)) - 40 million row table
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  5 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Cal Leeming [Simplicity Media Ltd]  
View profile  
 More options Jun 16 2011, 7:20 am
From: "Cal Leeming [Simplicity Media Ltd]" <cal.leem...@simplicitymedialtd.co.uk>
Date: Thu, 16 Jun 2011 12:20:57 +0100
Local: Thurs, Jun 16 2011 7:20 am
Subject: Django aggregation equivalent of SUM(IF(field = x, 1, 0)) - 40 million row table

Hey guys,

I've picked through as much as the Django docs as possible, but can't seem
to find any reference to Sum() allowing any in-method conditionals (i.e. a
conditional to Sum() with an in-line IF(), without the use of a
WHERE/filter).

The original query in MySQL is:

mysql> select SUM(is_spam) as is_spam, SUM(is_image_blocked) as
is_image_blocked, SUM(IF(dl_job_state = 2, 1, 0)) as dl_job_success COUNT(*)
as total_rows from fourchan_post;
+---------+------------------+----------------+------------+
| is_spam | is_image_blocked | dl_job_success | total_rows |
+---------+------------------+----------------+------------+
|    9116 |           266516 |        5010939 |   38832166 |
+---------+------------------+----------------+------------+
1 row in set (3 min 13.14 sec)

In Django, I'm using (incomplete):

>>> Post.objects.aggregate(Count('id'), Sum('is_spam'),

Sum('is_image_blocked'), Sum('is_checked'))

So far, the only way I can see to do this, would be to do a filter() before
the aggregate, but this will affect the other Sum()'s which would mean
multiple queries would be necessary. Normally this wouldn't be an issue, but
the table has over 40 million rows lol (and it already takes well over 3
minutes to execute)

Ideally, I'd like to try and find a way (within the ORM), to specify an IF
conditional for the Sum(), thus only having to perform a single query. If
this isn't possible, I'll put in a feature request for it.

Thanks

Cal


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Russell Keith-Magee  
View profile  
 More options Jun 16 2011, 7:27 am
From: Russell Keith-Magee <russ...@keith-magee.com>
Date: Thu, 16 Jun 2011 19:27:44 +0800
Local: Thurs, Jun 16 2011 7:27 am
Subject: Re: Django aggregation equivalent of SUM(IF(field = x, 1, 0)) - 40 million row table
On Thu, Jun 16, 2011 at 7:20 PM, Cal Leeming [Simplicity Media Ltd]

I haven't looked into your specific use case in detail, but it *might*
be possible to do this with a custom aggregate. SUM, COUNT etc are all
just classes in Django, so you can extend and define your own if you
want.

Ticket #11305 [1] seems to describe almost exactly the feature you are
asking for (although they wanted COUNT, not SUM); in the discussion
for the ticket, Alex gives a code sample implementing conditional
aggregates.

There were also discussions on django-users back when aggregates were
introduced, describing how to define a custom aggregate [2].

Both these references are worth a look.

[1] https://code.djangoproject.com/ticket/11305
[2] http://groups.google.com/group/django-users/browse_thread/thread/bd5a...

Yours,
Russ Magee %-)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Cal Leeming [Simplicity Media Ltd]  
View profile  
 More options Jun 16 2011, 7:41 am
From: "Cal Leeming [Simplicity Media Ltd]" <cal.leem...@simplicitymedialtd.co.uk>
Date: Thu, 16 Jun 2011 12:41:23 +0100
Local: Thurs, Jun 16 2011 7:41 am
Subject: Re: Django aggregation equivalent of SUM(IF(field = x, 1, 0)) - 40 million row table

Hi Russ,

Thanks for getting back to me. Ticket 11305 looks like what I need, but the
code is extremely hacky. The second link you pasted seems to take the right
approach, and I agree with you that the F() object should be used.

If I create a sane/clean patch (with F() support) as a stand alone class,
would you consider having it merged into the core?

Cal

On Thu, Jun 16, 2011 at 12:27 PM, Russell Keith-Magee <


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Russell Keith-Magee  
View profile  
 More options Jun 16 2011, 8:11 am
From: Russell Keith-Magee <russ...@keith-magee.com>
Date: Thu, 16 Jun 2011 20:11:40 +0800
Local: Thurs, Jun 16 2011 8:11 am
Subject: Re: Django aggregation equivalent of SUM(IF(field = x, 1, 0)) - 40 million row table
On Thu, Jun 16, 2011 at 7:41 PM, Cal Leeming [Simplicity Media Ltd]

<cal.leem...@simplicitymedialtd.co.uk> wrote:
> Hi Russ,
> Thanks for getting back to me. Ticket 11305 looks like what I need, but the
> code is extremely hacky. The second link you pasted seems to take the right
> approach, and I agree with you that the F() object should be used.
> If I create a sane/clean patch (with F() support) as a stand alone class,
> would you consider having it merged into the core?

It depends a little on the exact use case your patch covers. If your
aggregate only covers an obscure edge case, I'd prefer to see it live
externally; but if you can make a case that it's a common use case (or
it's obvious that it is a common use case), then a suitably robust and
tested patch could be considered for core.

Yours,
Russ Magee %-)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Cal Leeming [Simplicity Media Ltd]  
View profile  
 More options Jun 16 2011, 8:14 am
From: "Cal Leeming [Simplicity Media Ltd]" <cal.leem...@simplicitymedialtd.co.uk>
Date: Thu, 16 Jun 2011 13:14:23 +0100
Local: Thurs, Jun 16 2011 8:14 am
Subject: Re: Django aggregation equivalent of SUM(IF(field = x, 1, 0)) - 40 million row table

Hmm, to be honest, most people don't really have the amount of data that we
have, and thus would be able to use multiple queries with a filter()
instead.

I've had a look through the Django sql aggregates and models aggregates,
however I realised that for this to be included in the core, I'd have to add
support for all the other databases too. Sadly, I haven't got enough spare
time to create/patch/test to this extent, so it looks like I'll have to use
a raw() for now.

Thanks for getting back to me anyway though, I'll update the ticket with
this discussion.

Cal

On Thu, Jun 16, 2011 at 1:11 PM, Russell Keith-Magee <


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »