Performance issues when bulk-inserting a lot of records with jOOQ

37 views
Skip to first unread message

Giovanni Zotta

unread,
May 28, 2024, 2:39:39 PMMay 28
to jOOQ User Group
Hello there,

This is not necessarily a jOOQ issue, but I wonder how you would best solve it when using jOOQ. If it's too off-topic, feel free to direct me somewhere else. 

We have been using jOOQ happily for a while in production; every day we have a couple of big bulk insert queries (inserting >200k records at once) that were taking ~2 minutes every time. Some days ago, we configured a parameter in our Postgres server (idle_in_transaction_session_timeout) that kills all sessions that are in a transaction, but haven't executed any query in 30s. After implementing this configuration, these inserts started to fail. 

On closer inspection of a past trace where the timeout is not implemented, these are the logs on our services:
  1. 23:31:53: Application starts the insert query (with returning clause)
  2. 23:31:56: SQL is rendered by jOOQ and logged (3s)
  3. 23:33:53: Postgres says query took 8s (after 2 minutes)
  4. 23:33:54: Application fetched all results, and transaction is committed in the same second
So, Postgres says the query took ~8s, but the application has been waiting for 2 minutes. We think this is because the SQL we render is very large (it's a query of the form `insert into table (column1, column2) values (1, 2), (3,4)`) and Postgres spends a lot of time parsing it, even though honestly ~2 minutes seems like a lot of time.

This is not a jOOQ issue per-se, but we speculated that splitting the query up into smaller batches would solve the issue. I'm curious what options we have to do such an insert, where we also need to return the IDs of the generated records. 

Regards,
Giovanni

Rob Sargent

unread,
May 28, 2024, 4:36:08 PMMay 28
to 'Sumner R Andrews Jr' via jOOQ User Group
Use PostgreSQL copy command.  You may have to send the (delimited) data to the server and let the server use copy.

Giovanni Zotta

unread,
May 28, 2024, 4:44:39 PMMay 28
to jOOQ User Group
AFAIK copy does not have a returning clause, which I need to retrieve the generated primary key IDs. Is there a way around that?

Rob Sargent

unread,
May 29, 2024, 12:54:51 AMMay 29
to 'Sumner R Andrews Jr' via jOOQ User Group

On May 28, 2024, at 1:41 PM, Giovanni Zotta <giovann...@teampicnic.com> wrote:

AFAIK copy does not have a returning clause, which I need to retrieve the generated primary key IDs. Is there a way around that?


Yes. But my way is to assign ids in the client.  If you’re using a sequence as Id generator then you have to be able to select the rows back out by input values. Perhaps roll raw data into temp table, join back to that.  This assumes there’s a “natural key” in the columns of the raw data.


Jason Bennett

unread,
May 29, 2024, 12:54:57 AMMay 29
to jOOQ User Group
Taking 2 minutes to insert 200k records is a little high, I see speeds of around 35 seconds for 300k rows (and 12 seconds for 100k rows) with jOOQ.

I'm not sure what transactions you are using, but if you are using Spring annotations, you can switch to using the TransactionTemplate and manually flush smaller batches, or just run smaller transactions as you suggest. I'm not sure about the IDs, but it would seem that you can just capture the return from each transaction?

Lukas Eder

unread,
May 29, 2024, 3:20:43 AMMay 29
to jooq...@googlegroups.com
Our loader API has multiple options to throttle insertions through jOOQ API:

But the loader API doesn't support the RETURNING clause. I'm assuming you're using an IDENTITY (or serial) column, so one workaround could be to pre-fetch all the IDs that you're going to need. jOOQ has a convenient nextvals() synthetic table valued function to do this:

And then indeed, using the COPY command heavily outperforms anything SQL based (if you're not using any other jOOQ feature for this insertion). If you must use jOOQ for some abstraction reason, then pre-fetching IDs and using the Loader API would be my go-to option here, or alternatively, chunk your data set into smaller sets manually.

I hope this helps,
Lukas

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jooq-user/0266d773-652a-4b46-80d0-4b5d86fbbff9n%40googlegroups.com.

Giovanni Zotta

unread,
May 29, 2024, 6:04:27 AMMay 29
to jOOQ User Group
Thanks all for the replies, very useful! I think getting the IDs on the client and then using COPY makes sense, and sounds like the best alternative.

The problem I have now is that I don't have an explicit sequence, but simply a serial value: `id bigserial primary key`, so I do not have a generates `Sequence` object in java. I tried adding the configuration suggested here, but only my explicitly created sequences (via explicit `CREATE SEQUENCE`) are generated, and not the primary key serial I'm interested in.

Do you have suggestions on how to I can have something like `ctx.fetchValues(MySerial.nextvals(10))`?

Lukas Eder

unread,
May 29, 2024, 6:25:44 AMMay 29
to jooq...@googlegroups.com
On Wed, May 29, 2024 at 12:04 PM Giovanni Zotta <giovann...@teampicnic.com> wrote:
Thanks all for the replies, very useful! I think getting the IDs on the client and then using COPY makes sense, and sounds like the best alternative.

The problem I have now is that I don't have an explicit sequence, but simply a serial value: `id bigserial primary key`, so I do not have a generates `Sequence` object in java. I tried adding the configuration suggested here, but only my explicitly created sequences (via explicit `CREATE SEQUENCE`) are generated, and not the primary key serial I'm interested in.

Serials are implemented using sequences in PostgreSQL. For convenience, jOOQ hides these by default (because you usually don't need them). You can re-enable them using includeSystemSequences

I hope this helps

Giovanni Zotta

unread,
May 29, 2024, 6:27:25 AMMay 29
to jOOQ User Group
That's exactly what I was looking for, thanks Lukas! I will go ahead and try both the loader API and copy, and will report back :)
Reply all
Reply to author
Forward
0 new messages