Proposal - values_from_columns


danie...@gmail.com

Oct 9, 2025, 5:51:55 PM
to elixir-ecto
Hey all. Recently I've been thinking about optimizing bulk inserts, and I came across this thread: https://groups.google.com/g/elixir-ecto/c/ucMvVCtYubM/m/bNjWGYQKAwAJ. It's about an optimization for insert_all; the concern there was that we don't always know the type of the data. I also know about the `values` function (it's really hard to find in the docs), where you can specify the data types. It can be used for inserts and other queries:
rows =
  Enum.map(
    1..1000,
    &%{
      id: &1,
      user_id: &1,
      event_type: "benchmark_event",
      score: 42,
      created_at: ~N[2024-01-01 00:00:00],
      batch_id: batch_id,
      status: "completed"
    }
  )

types = %{
  id: :integer,
  user_id: :integer,
  event_type: :string,
  score: :integer,
  created_at: :naive_datetime,
  batch_id: Ecto.UUID,
  status: :string
}

query = from(v in values(rows, types), select: v)
Repo.insert_all(DemoEvent, query, [])

I think we can generalize both of these and add a function to Ecto that creates a values-like structure using unnest. I prepared a POC and called it values_from_columns:

columns_large = [
  id: Enum.to_list(1..1000),
  user_id: Enum.to_list(1..1000),
  event_type: "benchmark_event",
  score: 42,
  created_at: ~N[2024-01-01 00:00:00],
  batch_id: Ecto.UUID.bingenerate(),
  status: "completed"
]

types_large = %{
  id: {:array, :integer},
  user_id: {:array, :integer},
  event_type: :string,
  score: :integer,
  created_at: :naive_datetime,
  batch_id: :uuid,
  status: :string
}

subquery_large =
  from(
    v in values_from_columns(
      columns_large,
      types_large
    ),
    select: v
  )

Repo.insert_all(DemoEvent, subquery_large, [])

This insert is way faster: 2.6 ms vs 17 ms with plain values.
The query generated is:
INSERT INTO "demo_events" ("id","user_id","event_type","score","created_at","batch_id","status")
(SELECT v0."id", v0."user_id", v0."event_type", v0."score", v0."created_at", v0."batch_id", v0."status" FROM
(SELECT unnest($1::integer[]) AS "id",unnest($2::integer[]) AS "user_id",$3::varchar AS "event_type",$4::integer AS "score",$5::timestamp AS "created_at",$6::uuid AS "batch_id",$7::varchar AS "status") AS v0
 ("id","user_id","event_type","score","created_at","batch_id","status"))

If you think it's OK, I can clean up the changes I made and prepare a proper PR.

José Valim

Oct 10, 2025, 1:30:50 AM
to elixi...@googlegroups.com
Hi Daniel, thanks for the proposal. Can you post the query emitted by the first construct?

Also, what would be the database support for this new operation?

Thanks!



--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/elixir-ecto/321b5a0c-0fe7-48b5-b875-876ad0541f0cn%40googlegroups.com.

danie...@gmail.com

Oct 10, 2025, 1:47:28 AM
to elixir-ecto
The first one looks like this:

INSERT INTO "demo_events" ("id","status","score","user_id","event_type","batch_id","created_at") (SELECT v0."id", v0."status", v0."score", v0."user_id", v0."event_type", v0."batch_id", v0."created_at" FROM (VALUES ($1::integer,$2::varchar,$3::integer,$4::integer,$5::varchar,$6::uuid,$7::timestamp),($8::integer,$9::varchar,$10::integer,$11::integer,$12::varchar,$13::uuid,$14::timestamp),($15::integer,$16::varchar,$17::integer,$18::integer,$19::varchar,$20::uuid,$21::timestamp)..........

Fully optimized, it would be Postgres-only; for MySQL there may be a way to use JSON trickery, as proposed in the mentioned PR. But so far I have only looked into Postgres, because I used it with fragments. The changes compared to that PR are using types (values already supports them) and allowing a placeholder that is a single value rather than a list.

José Valim

Oct 10, 2025, 1:57:54 AM
to elixi...@googlegroups.com
I wonder if we could do this by passing an optional "defaults" option as the third argument to "values", which different databases could then optimize?

In your case, they can even be added directly to the SELECT of the first query!


Greg Rychlewski

Oct 10, 2025, 8:24:20 AM
to elixir-ecto
I have an idea for how to support this by adding a couple of improvements to fragment splices. Those improvements would have use cases outside of this as well.

1) Allow a second `modifier` argument to splice, an atom naming one of the other fragment modifiers (`:identifier`, `:constant`). This would allow you to insert dynamic table names and select column lists. Something like `from f in fragment("select ? from ... as(?)", splice(^select_field_list, :identifier), splice(^all_field_list, :identifier))`
2) Allow spliced values to be cast to given types dynamically, instead of forcing the user to hard-code the casting into the fragment. Something like `from f in fragment("select ? from ...", splice({^list_of_values, ^list_of_types}))`

Then I think you could create a macro to support arbitrary unnest sources:

```elixir
defmacro unnest(values_list, types_list, select_fields, all_fields) do
  quote do
    fragment(
      "select ? from unnest(?) as f(?)",
      splice(^unquote(select_fields), :constant),
      splice({^unquote(values_list), ^unquote(types_list)}),
      splice(^unquote(all_fields), :constant)
    )
  end
end
```

So you get support for unnest sources using existing functionality without conflating it with VALUES. And you get support in fragments for dynamic column lists and dynamic casting of spliced parameters.

cevado

Oct 10, 2025, 10:14:54 AM
to elixir-ecto
There is an issue with unnest: it's recursive, so if you have a field that is an array you'll have problems. Unnest also forces you to unnest each and every field independently.
I think an approach with jsonb_to_recordset would work better. With your splice approach, I think we could handle explicitly typing all fields, and the resulting query could be just one subquery instead of two nested ones.

cevado

Oct 10, 2025, 10:18:57 AM
to elixir-ecto
If there is a PR with that approach, I can try to help with moving it to jsonb_to_recordset instead of unnest.

Greg Rychlewski

Oct 10, 2025, 10:57:58 AM
to elixir-ecto
I don't currently have a PR; I just kind of know where the changes would be, because I implemented the splicing behaviour. With the changes I proposed, I believe you can make both ways work (unnest and jsonb_to_recordset). Ecto would provide the tools and users would utilize them in their own macros.

danie...@gmail.com

Oct 11, 2025, 12:50:46 PM
to elixir-ecto
> I wonder if we can do this by passing an optional “defaults” option as third argument to “values”, which different databases could be optimize?
> In your case, they can even be added directly to the SELECT of the first query!

I think it should work exactly like placeholders currently work in insert_all.
My issue is more about unnest. I often need to emit events for a list of ids; in that case I already have the dynamic data as a single list, and the rest is static data like topic and trace_id, for which I'm currently using placeholders. With unnest there is no intermediate data created: I can send the list directly to the database.
I also found unnest useful for bulk updates: given a map of id to column value, such as %{1 => "John", 2 => "Jane"}, I can Enum.unzip() it into two lists, which can then be unnested and joined to the actual table for the update.

Greg, looking at the example you gave where splice takes different params, `splice({^values_list, ^types_list})`, it may be a bit confusing for the reader.
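For illustration, the bulk-update pattern above could be sketched today with a raw fragment join (a sketch only: the users table, its columns, and the fragment shape are invented for this example, not a proposed API):

```elixir
import Ecto.Query

# Hypothetical example: bulk-update users.name from a map of id => name.
updates = %{1 => "John", 2 => "Jane"}
{ids, names} = Enum.unzip(updates)

query =
  from(u in "users",
    # Unnest the two parameter arrays side by side into a derived table.
    join: v in fragment(
      "SELECT * FROM unnest(?::integer[], ?::varchar[]) AS v(id, name)",
      ^ids, ^names
    ),
    on: u.id == v.id,
    update: [set: [name: v.name]]
  )

Repo.update_all(query, [])
```

On Postgres this compiles to an UPDATE ... FROM with only two bind parameters, regardless of how many rows are updated.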

Greg Rychlewski

Oct 11, 2025, 9:10:37 PM
to elixir-ecto
It might be a matter then of finding better abstractions to let fragments handle these cases. It seems like right now it is missing two things: 

1. Dynamically set the names of columns as in `select ... from ...as(col1 col1_type, col2 col2_type, ...)`
2. Dynamically pass arguments into functions with casting. We support `select fun(arg1, arg2, ...)` but not `select fun(arg1::arg1_type, arg2::arg2_type, ....)`

The reason I am thinking of this direction is because it seems there are multiple table returning functions that would be useful. Both unnest and jsonb_to_recordset were mentioned in this thread. And who knows what else is out there/will be out there in the future. 

José Valim

Oct 12, 2025, 1:44:48 PM
to elixi...@googlegroups.com
Great breakdown of the root causes Greg!

> 1. Dynamically set the names of columns as in `select ... from ...as(col1 col1_type, col2 col2_type, ...)`

Didn't we recently allow giving a schema to fragment? If so, the fragment + schema should be enough for this, right?

> 2. Dynamically pass arguments into functions with casting. We support `select fun(arg1, arg2, ...)` but not `select fun(arg1::arg1_type, arg2::arg2_type, ....)`

I may be missing something, but shouldn't this be valid?

fragment("fun(?, ?)", type(^arg1, :integer), type(^arg2, :integer))

In any case, I think we could have an argument for both low level APIs and high-level APIs.

---

Daniel, could your original example also be done with an inner join for the constant values instead of unnest? For example:

SELECT *
FROM (VALUES ('constant_value')) AS c(shared_col)
CROSS JOIN (VALUES
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Carol', 28)
) AS people(id, name, age);

Which means you have the columns "id, name, age, constant_value". This way constant values are given only once. This should be possible today (but we may want to provide convenience for it, to be discussed).
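A rough Ecto sketch of that cross join using the existing values/2, for reference (the row data and column names are made up; note that the columns from both sources have to be selected explicitly):

```elixir
import Ecto.Query

# Dynamic rows, each map with the same keys.
rows = [
  %{id: 1, name: "Alice", age: 25},
  %{id: 2, name: "Bob", age: 30},
  %{id: 3, name: "Carol", age: 28}
]
row_types = %{id: :integer, name: :string, age: :integer}

# Constant values, given only once.
constants = [%{shared_col: "constant_value"}]
constant_types = %{shared_col: :string}

query =
  from(p in values(rows, row_types),
    cross_join: c in values(constants, constant_types),
    select: %{id: p.id, name: p.name, age: p.age, shared_col: c.shared_col}
  )
```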




Greg Rychlewski

Oct 12, 2025, 4:35:29 PM
to elixi...@googlegroups.com
>  Didn't we recently allow giving a schema to fragment? If so, the fragment + schema should be enough for this, right?

We might be able to do something with it, but currently it doesn't work, unfortunately. For example, this Ecto query: `from f in {fragment("select 1 as num"), Barebone}` turns into this SQL query: `SELECT f0."visits" FROM (select 1 as num) AS f0 []`.

So the schema controls how the columns are selected but it doesn't make sure the underlying fragment names its columns. I believe this would be tough to do automatically because we're not sure what is in the fragment string. Like we don't know if they have already named their columns.

> fragment("fun(?, ?)", type(^arg1, :integer), type(^arg2, :integer))

Oh, actually I think you are right, this can work. I'm not too sure how smart splice is right now, like whether it just expects literals, but we can probably make it work with this.


danie...@gmail.com

Oct 13, 2025, 4:08:18 PM
to elixir-ecto
In the original example it was just a single table, and the select worked fine: `from(v in values(rows, types), select: v)`.
But when I now cross join two tables, Ecto complains that I need to be explicit about the columns to select.

José Valim

Oct 13, 2025, 6:04:44 PM
to elixi...@googlegroups.com
That makes sense: you have to unify the columns by hand, which is definitely more work. Perhaps you could create an abstraction that does this for you using subqueries?

In any case, outside of Ecto, how well does the SQL query itself perform using this approach?


danie...@gmail.com

Oct 14, 2025, 1:16:13 AM
to elixir-ecto
With just ids, as in the original example (`Enum.map(1..1000, &%{user_id: &1, id: &1})`), joined to static values, it is around 4 ms.
But the formatted query is 38 lines long.
cross_join.livemd

José Valim

Oct 14, 2025, 3:04:25 AM
to elixi...@googlegroups.com
Ah, good point. So perhaps the best is to mix both approaches. Use values for the constants and then unnest for the collection? That would be something hard for us to optimize for.

In a nutshell:

1. unnest is recursive
2. jsonb_to_recordset requires jsonb conversion first
3. values alone generates long queries
4. values + unnest may give the best ROI

Assuming 4 works, the feature we would need is the ability to merge two bindings into one, so you don't have to repeat the unnest columns and the values columns?


Dario Heinisch

Oct 14, 2025, 3:27:43 AM
to elixi...@googlegroups.com

keep in mind that unnest flattens the array:

 select UNNEST(ARRAY[Array['bob'], Array['alice']]);
 unnest
--------
 bob
 alice

so with unnest you cannot insert into an array column; you would have to use jsonb instead. I usually do this when unnest doesn't work:

insert into TABLE
  select record_to_insert.* 
  from jsonb_array_elements($1::jsonb) json_element,
  jsonb_populate_record(null::TABLE, json_element) as record_to_insert;
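From Elixir, that pattern could be run as a raw query against the SQL adapter, bypassing the query builder entirely (a sketch only: the demo_events table is invented here, and it assumes Jason for JSON encoding):

```elixir
# Hypothetical rows matching the demo_events table's columns.
rows = [
  %{id: 1, status: "completed"},
  %{id: 2, status: "completed"}
]

# A single jsonb parameter carries all rows; jsonb_populate_record
# maps each JSON object onto the table's row type.
Repo.query!(
  """
  INSERT INTO demo_events
  SELECT record_to_insert.*
  FROM jsonb_array_elements($1::jsonb) AS json_element,
       jsonb_populate_record(null::demo_events, json_element) AS record_to_insert
  """,
  [Jason.encode!(rows)]
)
```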

danie...@gmail.com

Oct 14, 2025, 5:56:17 PM
to elixir-ecto
> Assuming 4 works, the feature we would need is the ability of merging two bindings into one, so you don't have to repeat the unnests columns and the values columns?
Ideally it would create a table-like structure, like values does now. That can be inserted into the database and also used with update_all; it's the most flexible structure.


> keep in mind that unnest flattens the array
Yes, but when we know the type of the data, we can choose whether it's possible to optimize it with unnest or whether we should just use jsonb. Storing arrays is not that popular compared to basic data types. The last time I had a requirement to store an array of ids, I ended up with a join table, because it has better support in Ecto.
Btw, I have another idea: since we know the data types, we could create an optimized serializer that encodes the JSON on the fly while sending it to the database, instead of first running JSON.encode on it.