Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

Fragment result transformations

37 views
Skip to first unread message

Jonáš Trantina

unread,
Nov 27, 2024, 7:03:41 AM11/27/24
to elixir-ecto
Hi,

when working on a specific problem I felt the need to have more powerful fragments. Since this gets a bit complicated I will first explain the problem and my solution so the motivation is clear.

Illustration of the problem

In Ecto I love that I can just select structs (and other complex types). This eliminates the need for post-processing functions.

from(p in Post, select: %Post{...})

However, when using array aggregation function from the database, it creates a barrier after which it is not possible use this Ecto feature. I.e. you cannot do something like this

from(p in Post, select: fragment("array_agg(?)", %Post{...}))

So you are left with aggregation some DB-native type, like JSON objects and having a post processing function. I think it's obvious why this is a downgrade compared to the snippet above (if it worked), or something similar.

My solution

To solve this problem wrote a macro called cast_agg to easily select and transform structs even after this aggregation barrier. It looks something like this:

from(
  p in Post,
  join: assoc(c, :comments),
  as: :comment,
  select: %Post{
    ...
    comments: cast_agg(%Comment{
      ...
      ratings: subquery(from(
        r in Rating,
        where: r.comment_id == parent_as(:comment).id,
        select: cast_agg(%Rating{...})
      ))
    })
  },
  group_by: [p.id]
)


(this also illustrates two typical uses, but more on that later)

The macro cast_agg is implemented using fragments to Postgres functions array_agg and jsonb_build_object. It creates JSON objects corresponding to the constructed structs, but it also stores the information about the struct. So cast_agg(%Rating{...}) actually returns something like this

{
  "_struct_module_name": "Elixir.MyApp.Rating",
  "data": [...rating objects data]
}


Then I need to convert this back into the struct after the query is executed. This is where I hoped I would be able to hook into the query and do this transformation automatically. But I didn't find any such mechanism. So I ended up with having a generic function that converts these objects back to the struct. So the fetch looks like this:

query
|> Repo.all()
|> transform_cast_agg()

It works but it's not that nice. Having the requirement to run another function after using cast_agg seems detrimental to the user experience. And it makes it more difficult to create a generic Ecto macro like this.

I feel this transformation is something that could be done internally by Ecto but I wasn't able to do it. It seems to me that remembering that a specific part of the result is a struct and then converting the data into the structs is something that Ecto is capable of. It just seems that I can't hook into this capability as a user.

One solution I envisioned was to be able to append some transformation to a fragment. The transformation would then be used when converting the data back into the Elixir world. Something like this:

defmacro to_atom(field) do
  quote do
    fragment("lower(?)", unquote(field), &String.to_existing_atom/1)
  end
end

This way my cast_agg macro could store the information about the struct into the query itself and would also do the transformation back into the struct automatically once the query is executed. No need to run additional transformation functions and all would be wrapped in a single macro.

Obviously this proposal is not yet fully refined. Fragments can be used in other places, not just in selects. I also don't know much about how Ecto works internally yet. I just feel like I found something that could be useful. I'd be happy for any feedback.

Why is cast_agg useful

This is a bit of a side note to illustrate why is the example macro cast_agg useful.

The example query above illustrates why it's handy. You can use it to simply gather associated entities in a single query. The entities can be added via joins or via subqueries (which have to return a single value when used in a select). The macro can be nested in itself allowing to gather even nested associations.

The first case can be solved with an Ecto preload. However, I think this only works for associations, while the cast_agg macro works for any struct.

The second case is probably more interesting, since it allows you to load the associations via subqueries and avoid joins.

Regards,
Jonas Trantina

Ruslan Doga

unread,
Nov 27, 2024, 7:40:22 AM11/27/24
to elixir-ecto
This is a bit silly but it works. I wonder if it would be possible to extend type syntax to allow custom mappers.

Mix.install([:postgrex, :ecto_sql])

defmodule Repo do
  use Ecto.Repo, adapter: Ecto.Adapters.Postgres, otp_app: :demo
end

Application.put_env(:demo, Repo, url: "ecto://postgres:postgres@localhost:5432/plausible_test")
Repo.start_link()

import Ecto.Query

Repo.query!("""
CREATE TABLE example_data (
    id SERIAL PRIMARY KEY,
    name TEXT,
    value INT
)
""")

Repo.query!("""
INSERT INTO example_data (name, value)
VALUES
    ('Alice', 10),
    ('Bob', 20),
    ('Alice', 15),
    ('Bob', 25)
""")

Repo.query!("""
SELECT
    name,
    array_agg(ROW(name, value)) AS aggregated_tuples
FROM
    example_data
GROUP BY
    name
""")

defmodule App do
  import Ecto.Query

  defmodule Example do
    use Ecto.Schema

    schema "example_data" do
      field(:name, :string)
      field(:value, :integer)
    end

    def type, do: :record

    def load({name, value}) do
      {:ok, %Example{name: name, value: value}}
    end
  end

  def query do
    from(
      e in "example_data",
      group_by: e.name,
      select: %{
        name: e.name,
        agg: type(fragment("array_agg(row(?,?))", e.name, e.value), {:array, Example})
      }
    )
    |> Repo.all()
  end
end

# SELECT e0."name", array_agg(row(e0."name",e0."value"))::record[] FROM "example_data" AS e0 GROUP BY e0."name" []
App.query()

Results in

[
  %{
    name: "Alice",
    agg: [
      %App.Example{
        name: "Alice",
        value: 10
      },
      %App.Example{
        name: "Alice",
        value: 15
      }
    ]
  },
  %{
    name: "Bob",
    agg: [
      %App.Example{
        name: "Bob",
        value: 20
      },
      %App.Example{
        name: "Bob",
        value: 25
      }
    ]
  }
]

Ruslan Doga

unread,
Nov 27, 2024, 7:41:05 AM11/27/24
to elixir-ecto
I also tried doing it with a parametirized type, but Postgres adapter kept casting it to ::any which didn't work.

Ruslan Doga

unread,
Nov 27, 2024, 7:49:56 AM11/27/24
to elixir-ecto
Actually, I think type syntax doesn't need to be extended. It seems to be possible to pretty much express any mapping with types and parameterized types.

If Postgres adapter stops casting parameterized types to ::any, I think the following would be possible:

defmodule StructArray do
  use Ecto.ParameterizedType

  def type(_params), do: {:array, :record}

  def init(opts) do
    {Keyword.fetch!(opts, :struct), Keyword.fetch!(opts, :value)}
  end

  def load(record, _loader, {struct, fields}) do
    struct!(struct, record |> Tuple.to_list() |> Enum.zip(fields))
  end
end

defmodule App do
  @example_array Ecto.ParameterizedType.init(StructArray, struct: Example, fields: [:name, :value])
  def query do
    from(
      e in "example_data",
      group_by: e.name,
      select: %{
        name: e.name,
        agg: type(fragment("array_agg(row(?,?))", e.name, e.value), ^@example_array)
      }
    )
    |> Repo.all()
  end
end

Jonáš Trantina

unread,
Nov 27, 2024, 9:10:21 AM11/27/24
to elixir-ecto
Interesting. I didn't know `type` can be used to achieve something similar.

Can this be nested like in my original example? I haven't found a simple way to do that. It also feels like the structs would need to be prepared for a specific use in queries, although the last example with parameterized types looks much more general.

Maybe the `type` is indeed the way to go.

Dne středa 27. listopadu 2024 v 13:49:56 UTC+1 uživatel dogar...@gmail.com napsal:

Jonáš Trantina

unread,
Nov 28, 2024, 10:31:59 AM11/28/24
to elixir-ecto
Today I tested the example with parameterized types and when I fixed few typos and details it works just fine. It doesn't even require the struct to be an Ecto schema. This is very close to what I need. I'll play with it some more and see if this can be nested. If it can then I`m all good :) 

defmodule Param do

  import Ecto.Query

  defmodule Example do
    defstruct [:name, :value]
  end


  defmodule StructArray do
    use Ecto.ParameterizedType

    def type(_params), do: {:array, :record}

    def init(opts) do
      {Keyword.fetch!(opts, :struct), Keyword.fetch!(opts, :fields)}
    end

    def load(records, _loader, {struct, fields}) do
      {:ok, Enum.map(records, &struct!(struct, Enum.zip(fields, Tuple.to_list(&1))))}
    end

    # required but do nothing now
    def cast(data, _params), do: {:ok, data}
    def dump(data, _, _), do: {:ok, data}
  end


  @example_array Ecto.ParameterizedType.init(StructArray,
                   struct: Example,
                   fields: [:name, :value]
                 )
  def query do
    from(
      e in "example_data",
      group_by: e.name,
      select: %{
        name: e.name,
        agg: type(fragment("array_agg(row(?,?))", e.name, e.value), ^@example_array)
      }
    )
  end

  def run do
    Repo.all(query())
  end
end
Dne středa 27. listopadu 2024 v 15:10:21 UTC+1 uživatel Jonáš Trantina napsal:

Ruslan Doga

unread,
Nov 28, 2024, 11:14:42 AM11/28/24
to elixir-ecto
I see two possible approaches:
- changing :fields from a list of atoms to kv list of {field, type}
- using Ecto.Schema's __schema__/1 function to introspect its fields and types

And then using something like https://hexdocs.pm/ecto/Ecto.Type.html#load/3 recursively.

Jonáš Trantina

unread,
Nov 29, 2024, 10:37:13 AM11/29/24
to elixir-ecto
Yes, I have come to the same conclusion after playing with it a bit today.

So far I like the first option, since it will work with any struct, not just Ecto schemas. It should actually work on any maps with a bit of work. I am working on a PoC and it looks promising.

Thanks for the help!

Dne čtvrtek 28. listopadu 2024 v 17:14:42 UTC+1 uživatel dogar...@gmail.com napsal:

Jonáš Trantina

unread,
Dec 2, 2024, 6:19:26 AM12/2/24
to elixir-ecto
Well, I found that this doesn't work for subqueries. Something like this doesn't work - the type is not applied

type =
  Ecto.ParameterizedType.init(NewCast.CastTypeArray,
    struct: Actor,
    fields: [:key]
  )

from(e in Event,
  as: :e,
  select: %Event{
    actors:
      subquery(
        from(a in Actor,
          where: a.event_id == parent_as(:e).id,
          select: type(fragment("array_agg(row(?))", a.key), ^type)
        )
      )
  }
)
|> Repo.all()

So I though it would be possible to move the type out of the subquery, but that doesn't work too

type(subquery(
  from(a in Actor,
    where: a.event_id == parent_as(:e).id,
    select: fragment("array_agg(row(?))", a.key)
  )
), ^type)

generates error

** (Ecto.Query.CompileError) the first argument of type/2 must be one of:
* interpolations, such as ^value * fields, such as p.foo or field(p, :foo) * fragments, such as fragment("foo(?)", value) * an arithmetic expression (+, -, *, /) * an aggregation or window expression (avg, count, min, max, sum, over, filter) * a conditional expression (coalesce) * access/json paths (p.column[0].field) * parent_as/1 (parent_as(:parent).field)
So it seems I cannot use the parameterized types to achieve this.
Dne pátek 29. listopadu 2024 v 16:37:13 UTC+1 uživatel Jonáš Trantina napsal:

Ruslan Doga

unread,
Dec 2, 2024, 6:21:38 AM12/2/24
to elixir-ecto
I'm not familiar with this syntax and it's not obvious to me what it does. If it's a JOIN, then you can express differently and it would probably work.

Ruslan Doga

unread,
Dec 2, 2024, 6:24:49 AM12/2/24
to elixir-ecto
Something like:

from e in Event,
  join: a in Actor, on: a.event_id == e.id,
  group_by: a.key,
  select: %Event{
    actors: type(fragment("array_agg(row(?))", a.key), ^type)
  }

Or was it doing a lateral join?

Jonáš Trantina

unread,
Dec 2, 2024, 6:53:31 AM12/2/24
to elixir-ecto
Sure, this particular query can be rewritten, but that is just an example. I want the macro to work in subqueries too, since that is what we use it for. Converting it to a join is not always a good option. It actually allows us to avoid joins in places where it causes problems.

If you are not familiar with subqueries in select and are interested to learn more, you can check it here https://www.postgresql.org/docs/17/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES. It's basically just a subquery, which returns a single result and can be thus directly used in a select instead of a value or column. From what I understand, such subqueries and lateral joins are very similar (maybe even identical). But, at least in Ecto, I am not able to rewrite the subquery above into lateral join, because of this error

** (Ecto.QueryError) subquery/cte must select a source (t), a field (t.field) or a map

Thanks though!

Dne pondělí 2. prosince 2024 v 12:24:49 UTC+1 uživatel dogar...@gmail.com napsal:

Ruslan Doga

unread,
Dec 2, 2024, 6:59:35 AM12/2/24
to elixir-ecto
Right, `select` expressions in subqueries and `select` expressions in the parent queries are handled differently AFAIK. Post-processing and thus `type` application only happens on the parent query `select`. I still think all subqueries can be rewritten as joins and thus use `type` for post-processing :)

Ruslan Doga

unread,
Dec 2, 2024, 7:00:46 AM12/2/24
to elixir-ecto
** (Ecto.QueryError) subquery/cte must select a source (t), a field (t.field) or a map

Just means that you need to select a map from the subquery, like: %{actors: actors} in your example.

Jonáš Trantina

unread,
Dec 2, 2024, 7:22:19 AM12/2/24
to elixir-ecto
Right :) This works great

from(e in Event,
  as: :e,
  left_lateral_join:
    a in subquery(

      from(a in Actor,
        where: a.event_id == parent_as(:e).id,
        select: %{actors: type(fragment("array_agg(row(?))", a.key), ^type)}
      )
    ),
  select: %Event{
    actors: a.actors

  }
)
Dne pondělí 2. prosince 2024 v 13:00:46 UTC+1 uživatel dogar...@gmail.com napsal:
Reply all
Reply to author
Forward
0 new messages