[Feature Request] Shorthand syntax for filtering aggregations

107 views
Skip to first unread message

Tom Forbes

unread,
Apr 12, 2017, 5:31:44 PM4/12/17
to django-d...@googlegroups.com
Hello,
At the Djangocon sprints I wanted to add support for a postgres specific syntax for filtering aggregations, which is quite simple: MAX(something) FILTER (WHERE x=1).

During this the sprints I was told that it would be good to support this for all databases, and it can be done using the CASE syntax: MAX(CASE WHEN x=1 THEN something END).

Doing this with the ORM is quite verbose, it would be really nice to be able to have a shorthand syntax for filtering aggregates that can use database-specific syntax where available (the postgres syntax is faster than the CASE syntax). I've made a proof of concept merge request that implements this here:
I'd really like some feedback and to maybe have a discussion about what the API could look like. Currently I went for adding `.filter()` and `.exclude()` methods to the base Aggregate class. I like this approach as it's close to the familiar queryset syntax but I understand there are subtleties with combining them (I just AND them together currently). It's also better to be consistent - if we can't support all of the queryset filter() syntax then we shouldn't confuse people by having a filter method that acts differently.

An example of the API is this:
Mailboxes.objects.annotate(
   read_emails=Count('emails').filter(unread=False),
   unread_emails=Count('emails').filter(unread=True),
   recent_emails=Count('emails').filter(received_date__lt=one_week_ago)
)

I'd really like some feedback on what the API should look like. Is filter a good idea? Any feedback on the current merge request implementation is appreciated as well, I'm fairly new to the Django expression internals. I think I'm going the right way with having a separate FilteredAggregate expression but I'm not sure.

Many thanks,
Tom

Ian Foote

unread,
Apr 12, 2017, 6:07:32 PM4/12/17
to django-d...@googlegroups.com
Hi Tom,

I really like the look of Count('emails').filter(unread=False) in the
annotate call.

Ian
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com
> <mailto:django-develop...@googlegroups.com>.
> To post to this group, send email to django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CAFNZOJM0HDNCeLta6L9u7uJY7usJWn5u8ZYO4S8cDFOeOLYgGQ%40mail.gmail.com
> <https://groups.google.com/d/msgid/django-developers/CAFNZOJM0HDNCeLta6L9u7uJY7usJWn5u8ZYO4S8cDFOeOLYgGQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.


signature.asc

charettes

unread,
Apr 17, 2017, 12:45:39 AM4/17/17
to Django developers (Contributions to Django itself)
Hello Tom,

Thanks for working on adding filter support to aggregate. I think adding
support for the SQL:2003 features on all aggregates by emulating it on
backends that don't support it using CASE makes a lot of sense[0].

As I've mentioned on your PR this is syntactic sugar I've been missing from
Django's conditional aggregation API since I was using django-aggregate-if[1]
before #11305 was fixed[2].

Now for the proposed syntax I understand you want to mimic the Queryset's
API by allowing filter()/exclude() to be chained but I don't think filter chaining
on aggregate is common enough to warrant the extra effort required to make it
work. As I've mentioned previously I'd advocate for a simple `filter` kwarg that
accepts a Q instance instead as it makes it easier to implement and emulate
the Case() fallback. This also makes the filter() vs filter().filter() for multi-valued
relationships subtleties a non-issue.

Mailbox.objects.annotate(
   read_emails=Count('emails', filter=Q(unread=False)),
   unread_emails=Count('emails', filter=Q(unread=True)),
   recent_emails=Count('emails', filter=Q(received_date__lt=one_week_ago)),
)

Cheers,
Simon

Tom Forbes

unread,
Apr 17, 2017, 10:52:36 AM4/17/17
to django-d...@googlegroups.com
Hey Simon,
Thanks for your reply, I actually agree after thinking about it. Adding a filter/exclude method and getting it to work correctly in the way people would expect is be pretty complex. Adding an argument that takes a Q object would suffice for now at least.

I'm happy to implement this, but any advice (however brief) on how to would be much appreciated! The base Aggregate class uses the Func `as_sql` (and I'm assuming we want to keep it like that), but I'm not sure how to support this parameter without a custom `as_sql` on the Aggregate class. The reason being is that the FILTER syntax appears outside of the aggregate, whereas the CASE syntax appears inside.

Perhaps I am over thinking things though. I could override the `as_sql` in the Aggregate class to check if the database supports FILTER, and if so just tack `FILTER (WHERE ...)` on the end of the produced sql. If it doesn't support FILTER then I can do what I currently do in the pull request and munge the source expressions to be a CASE statement before calling `super().as_sql()`.

Thoughts?


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.

charettes

unread,
Apr 17, 2017, 12:10:22 PM4/17/17
to Django developers (Contributions to Django itself)
Hey Tom,

Overriding Aggregate.as_sql() to branch on connection.features to build the
FILTER clause shouldn't be an issue.

I think you could rely on super().as_sql(..., filter=filter_sql) extra parameter when the
connection supports the FILTER statement and fallback to self.clone(), clone.source_expressions[0]
wrapping in a Case(When) and return clone.as_sql(...) when it's not the case.

Don't hesitate to @mention me on Github if you'd like extra pointers. There is other
contributors more familiar with the expression API than I am but I think I have enough
knowledge to shepperd you through this patch.

Cheers,
Simon
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages