[Proposal] Build better query for insert_all


cevado

Feb 14, 2024, 10:00:33 AM
to elixir-ecto
Currently ecto builds the following query when doing an insert_all:

insert into table (c1, c2, ...) values (?, ?, ...), (?, ?, ...), ...;

This has some issues when inserting a large number of rows (PostgreSQL limits a query to 65,535 parameters), and it's also a slow approach. Here is an article with a better approach:
https://klotzandrew.com/blog/postgres-passing-65535-parameter-limit
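
For context, the approach in that article replaces the per-row placeholders with one array parameter per column, roughly like this (table and column names are illustrative, not from the article):

```sql
-- current shape: one placeholder per cell, so parameters grow with row count
insert into users (id, name) values ($1, $2), ($3, $4), ($5, $6);

-- article's shape: one array parameter per column, unnested into rows
insert into users (id, name)
select * from unnest($1::int[], $2::text[]);
```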

José Valim

Feb 14, 2024, 10:22:02 AM
to elixi...@googlegroups.com
This is very nice, thanks for sharing. I would love to see a pull request exploring this.

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/5f1ca29c-f615-45be-b5b2-706adc5b9b4an%40googlegroups.com.

cevado

Feb 14, 2024, 11:35:03 AM
to elixir-ecto
I can try exploring this one for postgres.

Onorio Catenacci

Feb 14, 2024, 2:54:29 PM
to elixir-ecto
From the article it looks as if it would only be available for Postgres.

Ruslan Doga

Feb 15, 2024, 12:40:21 PM
to elixir-ecto
👋

I did a quick test in https://github.com/ruslandoga/insert_all_bench and it seems like it's indeed an improvement when lots of rows are being inserted. It almost works with insert_all(table, ecto_query) but I don't know how to index into unnest tuples.


Onorio Catenacci

Feb 15, 2024, 3:03:03 PM
to elixi...@googlegroups.com
On Thu, Feb 15, 2024 at 12:40 PM Ruslan Doga <dogar...@gmail.com> wrote:
👋

I did a quick test in https://github.com/ruslandoga/insert_all_bench and it seems like it's indeed an improvement when lots of rows are being inserted. It almost works with insert_all(table, ecto_query) but I don't know how to index into unnest tuples.



FWIW, while I don't see anything wrong with this it does strike me as a bit of a premature optimization. I get that it's faster for much smaller sets of parameters being passed--but it also strikes me as code that would violate the principle of least astonishment. Pretty atypical and unusual SQL; code that a DBA would look at and say "What?  Why are they passing parameters as an array?  That's odd."

But, again, I'm not objecting at all.  Just pointing out that performance is not the only concern we should be considering.  Just my $.02.

Onorio Catenacci

Feb 15, 2024, 3:04:10 PM
to elixi...@googlegroups.com
Oop--what I get for not carefully editing before sending.  

"I get that it's faster for much smaller sets of parameters (than the 65,535 that it was originally intended to address)"

Dario Heinisch

Feb 15, 2024, 3:16:56 PM
to elixi...@googlegroups.com

Single array parameters are faster than a long list of values that all need to be parsed. The parser is slower and more memory hungry when dealing with long lists compared to the array input function.

In our code base we mostly use the unnest trick for larger inserts into tables with lots of columns, instead of Repo.insert_all, for 2 reasons:

1) The query is faster, for the reason listed above
2) We don't have to pre-chunk the stream, so we don't hit the 65K parameter limit

If Repo.insert_all used the unnest trick, that would be a nice improvement.

Best regards,

Dario


José Valim

Feb 15, 2024, 4:07:57 PM
to elixi...@googlegroups.com
Hi Ruslan, a PR is welcome! Please submit one if interested. We don't need to support it with ecto_query, at least not initially, only with lists of values?


José Valim

Feb 16, 2024, 2:40:58 PM
to elixi...@googlegroups.com
>  It probably wouldn't be able to completely replace the current insert_all as we can't evaluate SQL in the arrays but can in VALUES so all of these would be unsupported

I was thinking more of an optimization that automatically emits the array if we know it is safe to do so.

> Single array parameters are faster than a long list of values that all need to be parsed. The parser is slower and more memory hungry when dealing with long lists compared to the array input function.

Both array and (?, ?) are sent as parameters, so no additional parsing in either case?

On Fri, Feb 16, 2024 at 8:38 PM Ruslan Doga <dogar...@gmail.com> wrote:

Hi Jose!

I meant that it already kind of works with the current insert_all implementation, but the query is not as clean as in the blog post, since we need to select the fields explicitly and unnest with multiple arrays doesn't seem to allow indexing into the tuples. My workaround (https://github.com/ruslandoga/insert_all_bench/blob/07bd1b4fce46944d09d81f00a68f6ede279d6193/lib/insert_all_bench.ex#L18-L38) uses a CTE with multiple unnest calls (one for each array); the performance seems to be the same as with the simpler query from the blog post:

    q =
      "input"
      |> with_cte("input",
        as: fragment("select unnest(?::int[]) as id, unnest(?::int[]) as name", ^ids, ^names)
      )
      |> select([i], %{id: i.id, name: i.name})

    :timer.tc(fn -> Repo.insert_all("users", q) end)

So I wonder if we should aim at making this approach easier and documenting it, or if it should be a new insert_all function. It probably wouldn't be able to completely replace the current insert_all, as we can't evaluate SQL in the arrays but can in VALUES, so all of these would be unsupported: https://github.com/elixir-ecto/ecto_sql/blob/629b6633bb46e4b3b966ba9fedcf8a2296ed7c10/lib/ecto/adapters/postgres/connection.ex#L303-L310

José Valim

Feb 20, 2024, 2:44:09 PM
to elixi...@googlegroups.com
Does the parameter have to be typed? What happens if we leave it up for Postgres to fill in the array types?

On Tue, Feb 20, 2024 at 8:42 PM cevado <flav...@gmail.com> wrote:
> Both array and (?, ?) are sent as parameters, so no additional parsing in either case?
The difference is the number of parameters. If you have a 10-column table and you're inserting 1,000 rows, the current way makes Postgres parse 10,000 parameters, while with the unnest/array approach Postgres has to parse only 10.
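
Scaled down to three columns, the contrast looks like this (table name hypothetical):

```sql
-- VALUES form: parameters scale as rows x columns,
-- e.g. 1,000 rows x 10 columns = 10,000 parameters to parse and bind
insert into t (a, b, c) values ($1, $2, $3), ($4, $5, $6);

-- unnest form: one array parameter per column, regardless of row count
insert into t (a, b, c)
select * from unnest($1::int[], $2::int[], $3::text[]);
```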


I was trying to do something directly with Ecto, but to do that in the Ecto.Adapters.SQL.Connection.insert/7 implementation for Postgres, we need the types of the columns in the headers list.
I don't see a way to change that without introducing a breaking change to Ecto's protocols.

Maybe having a helper or an easy way to do the query as Ruslan suggested would be a simple approach.

cevado

Feb 20, 2024, 10:52:54 PM
to elixir-ecto
> Does the parameter have to be typed? What happens if we leave it up for Postgres to fill in the array types?
For numerical, text, and datetime types it will cast according to the column type, applying coercion if needed.
For example, if it's an array of timestamps and the column is date, it coerces to date.
I didn't investigate precision on numeric types.
Trying with common types, the only two issues were with arrays and jsonb:
jsonb requires explicit casting;
arrays just don't work, since unnest will expand all dimensions of any given array.
All the tests I've done were with Postgres 16.1.
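
A sketch of the two problem cases described above (table and column names invented; behavior as reported in this thread):

```sql
-- jsonb needs an explicit cast on the array parameter:
insert into events (payload)
select * from unnest($1::jsonb[]);

-- array columns can't be passed through unnest this way:
-- unnest flattens every dimension, so unnest(ARRAY[[1,2],[3,4]])
-- yields 1, 2, 3, 4 instead of the two sub-arrays {1,2} and {3,4}.
```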

José Valim

Feb 21, 2024, 2:32:02 AM
to elixi...@googlegroups.com
Thank you. So unfortunately this means we indeed can't change insert, because a json column can also store regular integers and strings, and therefore it is not easy to know whether a column would be safe for this or not. Passing the Ecto types does not help either, because a single Ecto type may represent several database types.

The best way to implement this feature is indeed via a separate insert, maybe insert_columns, where you already expect the arguments to be given as columns instead of rows.


cevado

Feb 24, 2024, 1:28:06 PM
to elixir-ecto
I've explored some other approaches based on Ruslan's repo.

I've tried another approach using the jsonb_to_recordset function instead of unnest.
I've also added a raw SQL implementation of the unnest approach and one of jsonb_to_recordset.
Neither raw SQL approach required explicit casting.

One discovery from using jsonb_to_recordset: it supports both jsonb and array fields without issue.

Here are execution times for all approaches from my computer, inserting 30K rows (the upper limit because of the parameter issue with the usual insert_all):
usual insert_all approach: 388937
insert_all using unnest with CTE approach: 57431
insert_all using jsonb_to_recordset with CTE approach: 199934
raw SQL using unnest approach: 1589
raw SQL using jsonb_to_recordset approach: 736

I believe the jsonb_to_recordset with CTE approach is slower than unnest with CTE because it includes the time to encode the JSON, since that's done by Ecto.
Since Ecto internally builds raw SQL, I think it's possible to include an approach similar to the raw SQL jsonb_to_recordset one, generic enough to be used always.
A separate insert could be done just so it includes a limitation of not allowing inner queries, and other safeguards to avoid issues that might appear in the existing insert_all.
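
For reference, the jsonb_to_recordset shape looks roughly like this (table and column names are illustrative): the rows travel as a single jsonb parameter, and the column definition list supplies the types, including array and jsonb columns:

```sql
-- $1 is a json array of objects,
-- e.g. '[{"id":1,"name":"a","tags":["x","y"]}]'
insert into users (id, name, tags)
select id, name, tags
from jsonb_to_recordset($1::jsonb) as r(id int, name text, tags text[]);
```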

Ruslan Doga

Feb 24, 2024, 2:40:04 PM
to elixir-ecto
I wonder if the same insert_all could be used but with some extra accumulator flag like "can_be_optimized?" here https://github.com/elixir-ecto/ecto_sql/blob/629b6633bb46e4b3b966ba9fedcf8a2296ed7c10/lib/ecto/adapters/postgres/connection.ex#L301?

Ruslan Doga

Feb 24, 2024, 2:47:12 PM
to elixir-ecto
Ah no, that would mean unnecessary work to build VALUES.