[Proposal] Support Aggregate Filters

90 views
Skip to first unread message

Zachary Daniel

unread,
Mar 27, 2018, 10:44:31 AM3/27/18
to elixir-ecto
It can really raise code clarity in SQL to use aggregate filters as opposed to using table level filters. Consider the example


"Get the number of active, inactive, deleted, and pending posts, and additionally the count of all the posts"

from post in Post,
  group_by
: post.status,
 
select: {post.status, count(post.id)}

result
= query |> Repo.all()

# I didn't think too hard about this, so there is probably a better way to write the code that does this,
# but the point is that the next example is abundantly clear in comparison
total
=
 result
 
|> Enum.map(&elem(1))
 
|> Enum.sum()

result
|> Enum.into(%{})
|> Map.take(["active", "inactive", "deleted", "pending"])
|> Map.put("total", total)


vs.

query =
from post in Post,
 
select:%{
    active
: count(:id, filter: post.status == "active"),
    inactive
: count(:id, filter: post.status == "inactive"),
    deleted
: count(:id, filter: post.status == "deleted"),
    pending
: count(:id, filter: post.status == "pending"),
    total
: count(:id)
 
}

Repo.all(query)


In many situations, using filters in counts can significantly raise code clarity.

The SQL that would need to be generated would be:


SELECT
  COUNT
(p.id) FILTER (WHERE p.status = "active") active,
 
COUNT(p.id) FILTER (WHERE p.status = "inactive") inactive,
 
COUNT(p.id) FILTER (WHERE p.status = "deleted") deleted,
 
COUNT(p.id) FILTER (WHERE p.status = "pending") pending,
  COUNT
(p.id) total
FROM posts




José Valim

unread,
Mar 27, 2018, 11:02:19 AM3/27/18
to elixi...@googlegroups.com
Thanks for the proposal. Questions:

1. How is the support across databases?
2. What other expressions can we pass to the aggregator besides FILTER? Can I compose FILTER and COUNT(DISTINCT foo)?
3. Which aggregators support the filter functionality?




José Valim
Founder and 
Director of R&D

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/b04293a3-ac01-4ea1-91f8-761507136440%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Zachary Daniel

unread,
Mar 27, 2018, 12:51:39 PM3/27/18
to elixir-ecto
1. Support across databases is non-existent at the moment, but it is backed by SQL:2003, as an optional feature, meaning we may see it in mysql/sqlite some day.
2. You can pass, `OVER` for windowing as a sibling to `FILTER (WHERE clause)`, but nothing more inside the `FILTER` IIRC. I think its just WHERE.
3. All of the aggregators in postgres support the filter functionality, including user-defined aggregates.

José Valim

unread,
Mar 27, 2018, 1:15:18 PM3/27/18
to elixi...@googlegroups.com
Looks good to me. My only concern is that `count/2` already supports `count(p.foo, :distinct)`. So we need to make sure they can all play along. Maybe we can deprecate :distinct in favor of distinct: true.

Also, should we call this `concat(p.bar, where: p.foo == true)` or `concat(p.bar, filter: p.foo == true)`? I like the former because it is less keywords to learn.

I think this would be a great PR for you to contribute since it touches in many different parts of Ecto. :)



José Valim
Founder and 
Director of R&D

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto+unsubscribe@googlegroups.com.

Zachary Daniel

unread,
Mar 27, 2018, 1:53:51 PM3/27/18
to elixir-ecto
yeah, this definitely seems like a good one to take on! I hesitate to deprecate `count(p.foo, :distinct)` as I'm sure there are like millions of references to that at this point.

Supporting all forms would be possible as well. I personally don't mind deprecating `:distinct` in favor of `distinct: true`, but I imagine others will.

My only hesitation against where is how it might look confusing in a big query.

from post in Post,
 
where: post.deleted == false,
  select_merge
: %{
    active
: count(post.id, where: post.status == "active")
 
},
 
where: post.special == true,
  select_merge
:%{
    inactive
: count(post.id, where: post.status == "inactive")
 
}

Honestly though, I don't think that we'd see something like this in real life, and also appreciate not adding new words to the language.

All in all, I find myself on the fence for both of those things, and could support them in any way. I leave it up to you/the community to decide on `where` vs `filter` and whether or not to deprecate `:distinct` in favor of `distinct: true`. I'll be able to get started on this tomorrow :)

Michał Muskała

unread,
Mar 27, 2018, 4:03:46 PM3/27/18
to elixir-ecto
I didn't really get it, but can distinct be combined with a filter? If not we could have `count(p.foo, :distinct)` and `count(p.foo, post.status == "active")`.

Michał.
--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/686b0d4d-d856-432e-a8c1-2352fc9b0dda%40googlegroups.com.

José Valim

unread,
Mar 27, 2018, 4:22:42 PM3/27/18
to elixi...@googlegroups.com
They can be combined, yes.
--

Zachary Daniel

unread,
Mar 27, 2018, 5:14:46 PM3/27/18
to elixir-ecto
We could solve for all variations really.

`count(p.foo, :distinct)` `count(p.foo, distinct: true)` `count(p.foo, where: p.thing == true)` and `count(p.foo, :distinct, where: p.thing == true)` are all distinguishable. I think leaving it as a keyword last argument there leaves us open to supporting window functions and things like that some day (not that I necessarily think that is time well spent), but aggregates do support more options *aside* from FILTER in postgres, so keywords leave us open for that.

José Valim

unread,
Mar 28, 2018, 4:37:44 AM3/28/18
to elixi...@googlegroups.com
The other idea here is to introduce filter as a separate function:

    filter(count(p.id), p.thing == true)

This is probably a better choice because it works with all aggregators, including user ones, and it doesn't need to deprecate count with distinct. Reference: https://www.postgresql.org/docs/9.4/static/sql-expressions.html



José Valim
Founder and 
Director of R&D

On Tue, Mar 27, 2018 at 11:14 PM, Zachary Daniel <zachary....@gmail.com> wrote:
We could solve for all variations really.

`count(p.foo, :distinct)` `count(p.foo, distinct: true)` `count(p.foo, where: p.thing == true)` and `count(p.foo, :distinct, where: p.thing == true)` are all distinguishable. I think leaving it as a keyword last argument there leaves us open to supporting window functions and things like that some day (not that I necessarily think that is time well spent), but aggregates do support more options *aside* from FILTER in postgres, so keywords leave us open for that.

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/781ead5a-8d8d-4b86-83b4-894be6ac47ee%40googlegroups.com.

Michał Muskała

unread,
Mar 28, 2018, 5:01:57 AM3/28/18
to elixi...@googlegroups.com

On 28 Mar 2018, 10:37 +0200, José Valim <jose....@gmail.com>, wrote:
The other idea here is to introduce filter as a separate function:

    filter(count(p.id), p.thing == true)

This is probably a better choice because it works with all aggregators, including user ones, and it doesn't need to deprecate count with distinct. Reference: https://www.postgresql.org/docs/9.4/static/sql-expressions.html
 

I like this approach.

Michał.

Zachary Daniel

unread,
Mar 28, 2018, 10:15:42 AM3/28/18
to elixir-ecto
Yes, I like that :)

Especially since the syntax there *is* very separate. like `agg()` and `agg() FILTER ()`, which means there are a lot more expressions than `count` that this plays nice with, including something like `filter(fragment("array_agg(?)", p.foo), p.thing == true)`

José Valim

unread,
Mar 28, 2018, 10:24:20 AM3/28/18
to elixi...@googlegroups.com
Beautiful, so I think we can move forward. :)



José Valim
Founder and 
Director of R&D

On Wed, Mar 28, 2018 at 4:15 PM, Zachary Daniel <zachary....@gmail.com> wrote:
Yes, I like that :)

Especially since the syntax there *is* very separate. like `agg()` and `agg() FILTER ()`, which means there are a lot more expressions than `count` that this plays nice with, including something like `filter(fragment("array_agg(?)", p.foo), p.thing == true)`

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto+unsubscribe@googlegroups.com.

Zachary Daniel

unread,
Mar 28, 2018, 10:25:31 AM3/28/18
to elixir-ecto
We could even make it more sql-y by allowing piping there (something I don't think we do anywhere else in the query syntax, so probably not worth it).

select: %{
  count_of_active
: count(p.foo) |> filter(p.active == true),
  count_of_inactive
: count(p.foo) |> filter(p.active == false)
}


José Valim

unread,
Mar 28, 2018, 10:32:59 AM3/28/18
to elixi...@googlegroups.com
We do allow pipe everywhere. It should just work.



José Valim
Founder and 
Director of R&D

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto+unsubscribe@googlegroups.com.

Zachary Daniel

unread,
Mar 28, 2018, 10:35:35 AM3/28/18
to elixir-ecto
Today I learned :D
Reply all
Reply to author
Forward
0 new messages