Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

Add Repo.group_aggregate

33 views
Skip to first unread message

José Valim

unread,
Jan 4, 2025, 11:41:47 AMJan 4
to elixir-ecto
Today, there is no easy API for computing the aggregation of groups.

For example, imagine I want to find the count of products in a given category. In order to do so, I need to build a query with group_by and a select:

from p in Product,
  group_by: p.category_id,
  select: {p.category_id, count(p.category_id)}

It would be nice to have a convenient API for this. We have three options:

1. we could use Repo.aggregate, given it already raises if a group_by is given, but then it could be confusing if we start suddenly returning a map as result for an input query

2. we could introduce Repo.group_aggregate and expect the groups to be given as arguments

3. we could introduce Repo.group_aggregate and expect the query with an existing group_by field set

Note that the implementation needs to deal with limits, offsets, and distinct, as done in Repo.aggregate today.

Thoughts?

Allen Madsen

unread,
Jan 4, 2025, 2:26:23 PMJan 4
to elixi...@googlegroups.com
Is the code you're trying to simplify?

from(p in Product,
  group_by: p.category_id,
  select: {p.category_id, count(p.category_id)})
|> Repo.all()
|> Enum.into(%{})

So, the api would be:

1. Repo.aggregate(Product, :count, group_by: :category_id)
2. Repo.group_aggregate(Product, :count, :category_id)
3. from(p in Product, group_by: p.category_id) |> Repo.group_aggregate(:count)

To me, 2 would be the most clear and least prone to error. It removes the possibility of grouping by multiple columns.



--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/elixir-ecto/fe1a724e-b94e-4c4c-8e66-fef241f57e50n%40googlegroups.com.

Austin Ziegler

unread,
Jan 4, 2025, 3:26:06 PMJan 4
to elixi...@googlegroups.com
On Sat, Jan 4, 2025 at 2:26 PM Allen Madsen <allen.c...@gmail.com> wrote:
To me, 2 would be the most clear and least prone to error. It removes the possibility of grouping by multiple columns.

Why would you want to do that? When producing certain reports I find *very* useful to group by multiple columns:

```sql
SELECT category_id, color_id, shape_id, COUNT(*)
  FROM product
 GROUP BY 1, 2, 3 
```

I do not believe that Ecto should restrict group aggregates to a single column unless it makes the API unwieldy and unstable to the point where dropping to SQL makes more sense.

-a
--

José Valim

unread,
Jan 4, 2025, 3:52:04 PMJan 4
to elixi...@googlegroups.com
I think supporting multiple keys would be a good add-on but it comes with the question: what happens if you pass multiple keys to group_by? We could use nested maps but that's very inefficient, so we should have tuples as keys? Therefore:

1. Single grouping column: the column is the key
2. Multiple grouping column: the columns are the keys (in a tuple)

?


--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.

Austin Ziegler

unread,
Jan 4, 2025, 4:08:19 PMJan 4
to elixi...@googlegroups.com
I think it would be unusual, but fine.

Another argument in favour of `Repo.group_aggregate` over `Repo.aggregate` accepting groups is that the *current* aggregate functions return a single result, but group aggregates can produce zero or more results based on the query.

-a

José Valim

unread,
Jan 4, 2025, 4:24:46 PMJan 4
to elixi...@googlegroups.com
> Another argument in favour of `Repo.group_aggregate` over `Repo.aggregate` accepting groups is that the *current* aggregate functions return a single result, but group aggregates can produce zero or more results based on the query.

Although in that case it should mean an empty map, right?


Austin Ziegler

unread,
Jan 4, 2025, 4:35:24 PMJan 4
to elixi...@googlegroups.com
Depends on what you want to return.

If we have the following query and results:

```sql
SELECT category, material, COUNT(*)
  FROM product
 GROUP BY 1, 2, 3;
```

| category | material | count |
| -------- | -------- | ----- |
| amateur  | plastic  | 25    |
| amateur  | iron     | 2     |
| pro      | plastic  | 2     |
| pro      | steel    | 25    |
| luxe     | plastic  | 1     |
| luxe     | gold     | 12    |

Then I would sort of expect the Ecto to be:

```elixir
Product
|> group_by([:category, :material])
|> Repo.aggregate_group(:count)
```

And the result would be something like:

```elixir
[{{"amateur", "plastic"}, 25},
{{"amateur", "iron"}, 2},
]
```

Or are you thinking

```elixir
%{
  {"amateur", "plastic"} => 25,
  {"amateur", "iron"} => 2,
  …
%}
```

-a

José Valim

unread,
Jan 4, 2025, 6:25:59 PMJan 4
to elixi...@googlegroups.com
I am thinking the latter:

%{
  {"amateur", "plastic"} => 25,
  {"amateur", "iron"} => 2,
  …
%}

:)




Greg Rychlewski

unread,
Jan 4, 2025, 7:05:56 PMJan 4
to elixir-ecto
I think there is a case for using the existing aggregate function. Other functions like update_all and delete_all change their return behaviour depending on whether select was provided or not.

To me it seems pretty normal/intuitive that aggregate would return  a map if group_by is in the query. And today the queries fail so it seems fair to consider it an add-on as opposed to a breaking change.

Christopher Keele

unread,
Jan 4, 2025, 7:10:33 PMJan 4
to elixir-ecto
>  I think supporting multiple keys would be a good add-on but it comes with the question: what happens if you pass multiple keys to group_by? We could use nested maps but that's very inefficient, so we should have tuples as keys?

Supporting multiple keys feels like table stakes, to me--at least in the multi-column natural keys/snowflake schemas I operate it a lot. If not part of the initial implementation, at least with an API that could support it in the future.

Tuples with grouped keys is very much the shape I'd want to get the data back in. It's fairly trivial to reduce that into nested maps and vice-versa, so I'd prefer the more efficient form regardless of preference, but that's also my preference.

José Valim

unread,
Jan 5, 2025, 5:08:51 AMJan 5
to elixi...@googlegroups.com
> To me it seems pretty normal/intuitive that aggregate would return  a map if group_by is in the query. And today the queries fail so it seems fair to consider it an add-on as opposed to a breaking change.

I am kind of thinking ahead here but, with the type system looming, there are benefits in knowing if something will return exclusively a map or not. Generally speaking we want functions with wide input types, narrow return types.


Greg Rychlewski

unread,
Jan 5, 2025, 11:25:19 AMJan 5
to elixi...@googlegroups.com
If going with a separate function maybe it makes sense to return a list of maps, one for each combination of the grouping columns? 

If we are considering  this a shortcut for a query then it would mimic the results of a real query (while being opinionated on the format of an individual row’s return value )

You received this message because you are subscribed to a topic in the Google Groups "elixir-ecto" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elixir-ecto/mhvOmX8EaIQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elixir-ecto...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/elixir-ecto/CAGnRm4L7h%2BpLxZkbS5LHUZ6mPoxMymbL9U9F5vU5vaL8%3DXrJMg%40mail.gmail.com.

José Valim

unread,
Jan 5, 2025, 11:26:56 AMJan 5
to elixi...@googlegroups.com
I was thinking more about mirroring the result of "Enum.group_by", which is a map with the group_by as keys and the aggregates as values :)


José Valim

unread,
Jan 5, 2025, 11:28:47 AMJan 5
to elixi...@googlegroups.com
Perhaps that's another API we could look into? What if we introduce:

Repo.groups(query)

For a query of:

from p in Post, group_by: :category, select: {count(p.id), average(p.visits)}

It would be equivalent to running:

Repo.all(from p in Post, group_by: :category, select: {p.category, {count(p.id), average(p.visits)}}) |> Map.new()

Greg Rychlewski

unread,
Jan 5, 2025, 12:40:56 PMJan 5
to elixi...@googlegroups.com
Ah ok got it. Using group by in the query is more flexible because we can use expressions and fields on join sources. But at the same time we have no way to aggregate over expressions or join fields so that might seem incongruous to some people. 

Maybe to make it the same we used a parameter for group by fields as well and if people really want join fields or expressions then they can aggregate over a subquery? That way it’s the same workaround for both the grouping fields and the aggregation field 

Reply all
Reply to author
Forward
0 new messages