Extremely slow performance when calling batchInsert(Collection<Record>)


Aner Perez

Sep 10, 2015, 6:57:33 PM
to jOOQ User Group
Let me preface this by saying that I am on my second day of evaluating jOOQ so user error is a distinct possibility.

Environment:
Postgresql 9.4.4
Postgresql jdbc driver 9.4-1202-jdbc42
jOOQ 3.6.2
Java 1.8.31
Spring 4.1.7

I am implementing a bulk upload of records by parsing a CSV file and loading them into a table.  I have an existing script I execute using the psql command that can load 80K sample records in about a second using the "\copy" command provided by psql.

I have written code using jOOQ to load these same records in a Spring service function.  The code parses the CSV and loads the data into Record objects that were generated by the jOOQ code generator from a database designed in Vertabelo (Very nice integration by the way!!!).

My service method looks like this:

    @Autowired
    private DefaultDSLContext jooq;

    @Transactional
    public void loadActuals(Collection<JRawActualsRecord> records) {
        jooq.deleteFrom(RAW_ACTUALS).execute();
        jooq.batchInsert(records).execute();
    }

The delete runs in about 0.5 seconds but the insert of 80k records takes almost 15 minutes!  The following log snippet tells the story.

17:48:15.353 [http-apr-8084-exec-15] DEBUG org.jooq.tools.LoggerListener - Executing query          : delete from "raw_actuals"
17:48:15.366 [http-apr-8084-exec-15] DEBUG o.s.jdbc.datasource.DataSourceUtils - Fetching JDBC Connection from DataSource
17:48:15.367 [http-apr-8084-exec-15] DEBUG o.s.jdbc.datasource.DataSourceUtils - Registering transaction synchronization for JDBC Connection
17:48:15.369 [http-apr-8084-exec-15] DEBUG o.s.j.d.LazyConnectionDataSourceProxy - Connecting to database for operation 'prepareStatement'
17:48:15.773 [http-apr-8084-exec-15] DEBUG org.jooq.tools.LoggerListener - Affected row(s)          : 80223
17:48:15.773 [http-apr-8084-exec-15] DEBUG org.jooq.tools.StopWatch - Query executed           : Total: 465.477ms
17:48:15.774 [http-apr-8084-exec-15] DEBUG org.jooq.tools.StopWatch - Finishing                : Total: 466.433ms, +0.956ms
17:48:23.317 [http-apr-8084-exec-15] DEBUG org.jooq.tools.LoggerListener - Executing batch query    : insert into "raw_actuals" ("year", "month", "lr3", "lr4", "lr5", "customer_channel", "regional_product_brand", "regional_product_sub_brand", "regional_product_material_group", "trial_indicator", "net_direct_sales_amt", "net_direct_units", "team", "national_net_combined_sales_amt", "national_net_combined_revenue_units", "strategic_net_combined_sales_amt", "strategic_net_combined_revenue_units", "associate", "associate_number", "associate_desc", "sales_manager_number", "sales_manager_desc") values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
17:48:23.317 [http-apr-8084-exec-15] DEBUG o.s.j.d.LazyConnectionDataSourceProxy - Using existing database connection for operation 'prepareStatement'
18:03:08.812 [http-apr-8084-exec-15] DEBUG org.jooq.tools.StopWatch - Query executed           : Total: 14:45
18:03:08.812 [http-apr-8084-exec-15] DEBUG org.jooq.tools.StopWatch - Finishing                : Total: 14:45, +0.207ms
18:03:08.840 [http-apr-8084-exec-15] DEBUG o.s.jdbc.datasource.DataSourceUtils - Returning JDBC Connection to DataSource
18:03:08.906 [http-apr-8084-exec-15] DEBUG o.s.j.d.LazyConnectionDataSourceProxy - Using existing database connection for operation 'close'
18:03:08.906 [http-apr-8084-exec-15] DEBUG o.s.orm.jpa.JpaTransactionManager - Initiating transaction commit
18:03:08.906 [http-apr-8084-exec-15] DEBUG o.s.orm.jpa.JpaTransactionManager - Committing JPA transaction on EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2343b67e]
18:03:08.906 [http-apr-8084-exec-15] DEBUG o.h.e.t.spi.AbstractTransactionImpl - committing
18:03:08.906 [http-apr-8084-exec-15] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection

I can't explain what could cause such excessively slow performance.  Any pointers on where I can start looking for answers?

Thanks!

  - Aner

Lukas Eder

Sep 11, 2015, 12:42:57 PM
to jooq...@googlegroups.com
Hello,

2015-09-11 0:57 GMT+02:00 Aner Perez <aner....@gmail.com>:
Let me preface this by saying that I am on my second day of evaluating jOOQ so user error is a distinct possibility.

Environment:
Postgresql 9.4.4
Postgresql jdbc driver 9.4-1202-jdbc42
jOOQ 3.6.2
Java 1.8.31
Spring 4.1.7

I am implementing a bulk upload of records by parsing a CSV file and loading them into a table.  I have an existing script I execute using the psql command that can load 80K sample records in about a second using the "\copy" command provided by psql.

No jOOQ (or JDBC) based solution will beat \copy - or any other database-provided loading mechanism. If speed is really important and there is hardly any business logic, then you should stick to vendor-specific loader commands...
 
I have written code using jOOQ to load these same records in a Spring service function.  The code parses the CSV and loads the data into Record objects that were generated by the jOOQ code generator from a database designed in Vertabelo (Very nice integration by the way!!!).

Thanks, glad to hear that this integration is working for you
 
My service method looks like this:

    @Autowired
    private DefaultDSLContext jooq;

    @Transactional
    public void loadActuals(Collection<JRawActualsRecord> records) {
        jooq.deleteFrom(RAW_ACTUALS).execute();
        jooq.batchInsert(records).execute();
    }

The delete runs in about 0.5 seconds but the insert of 80k records takes almost 15 minutes! 

That's bad indeed!

First off, my immediate recommendation is for you to use jOOQ's loader API instead of batchInsert(). There's native support for loading CSV data:

You can play around with:

1. bulk sizes (how many rows per insert)
2. batch sizes (how many insert statements per JDBC batch)
3. commit sizes (how many statements per transaction)
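
For illustration, a rough sketch of what such a Loader call could look like (just a sketch: the CSV stream is a placeholder, the field list assumes the default generated names, the sizes are arbitrary starting points to tune, and the exact sizing methods available depend on your jOOQ version):

    jooq.loadInto(RAW_ACTUALS)
        .bulkAfter(500)      // bulk size: rows per multi-row INSERT
        .batchAfter(10)      // batch size: statements per JDBC batch
        .commitAfter(100)    // commit size: how often to commit
        .loadCSV(csvInputStream)
        .fields(RAW_ACTUALS.YEAR, RAW_ACTUALS.MONTH /* ..., remaining columns in CSV order */)
        .execute();          // may throw IOException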

Unlike the loader API, batchInsert() renders an individual SQL statement for each record and batches only those statements that are equal. Even if all 80k records render to the same SQL statement, that's still 80k renderings of the same SQL, which is a lot of useless work. We'll look into optimising this:

However, I'd still recommend using the loader API.

I hope this helps,
Lukas

 

Aner Perez

Sep 11, 2015, 4:41:21 PM
to jOOQ User Group
Thanks for your time!

Comments inline

On Friday, September 11, 2015 at 12:42:57 PM UTC-4, Lukas Eder wrote:

That's bad indeed!

Yes, 1000 times slower is always a cause for concern :-)
 
First off, my immediate recommendation is for you to use jOOQ's loader API instead of batchInsert(). There's native support for loading CSV data:

Awesome, I was not aware of that functionality!  Unfortunately, I just tried this and it did not make any perceptible difference, still around 15 minutes to upload!

I did have debug logging turned on and with loadInto().loadCSV() I see debug log messages for each of the 80k inserts that looked like this:

15:19:57.691 [http-apr-8084-exec-73] DEBUG org.jooq.tools.StopWatch - Query executed           : Total: 16.3ms
15:19:57.691 [http-apr-8084-exec-73] DEBUG org.jooq.tools.StopWatch - Finishing                : Total: 16.312ms, +0.011ms
15:19:57.691 [http-apr-8084-exec-73] DEBUG org.jooq.tools.LoggerListener - Executing query          : insert into "raw_actuals" ("year", "month", "sales_manager_number", "sales_manager_desc", "associate", "associate_number", "associate_desc", "customer_channel", "team", "lr3", "lr4", "lr5", "regional_product_material_group", "trial_indicator", "regional_product_sub_brand", "regional_product_brand", "net_direct_sales_amt", "net_direct_units", "national_net_combined_sales_amt", "national_net_combined_revenue_units", "strategic_net_combined_sales_amt", "strategic_net_combined_revenue_units") values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
15:19:57.691 [http-apr-8084-exec-73] DEBUG org.jooq.tools.LoggerListener - -> with bind values      : insert into "raw_actuals" ("year", "month", "sales_manager_number", "sales_manager_desc", "associate", "associate_number", "associate_desc", "customer_channel", "team", "lr3", "lr4", "lr5", "regional_product_material_group", "trial_indicator", "regional_product_sub_brand", "regional_product_brand", "net_direct_sales_amt", "net_direct_units", "national_net_combined_sales_amt", "national_net_combined_revenue_units", "strategic_net_combined_sales_amt", "strategic_net_combined_revenue_units") values (/snip DATA REMOVED /snip)
15:19:57.691 [http-apr-8084-exec-73] DEBUG o.s.j.d.LazyConnectionDataSourceProxy - Using existing database connection for operation 'prepareStatement'
15:19:57.701 [http-apr-8084-exec-73] DEBUG org.jooq.tools.LoggerListener - Affected row(s)          : 1

I thought maybe the debug messages were killing performance but taking them out only shaved around 20 seconds from the 15 minute total.

The message "Using existing database connection for operation 'prepareStatement'" makes it sound like each insert is running a separate prepareStatement.  Hopefully that is just confusing wording in the message?

You can play around with:

1. bulk sizes (how many rows per insert)
2. batch sizes (how many insert statements per JDBC batch)
3. commit sizes (how many statements per transaction)
 
Unfortunately, I'm currently using container-managed transactions, so I would not be able to use these.

Unlike the loader API, batchInsert() renders an individual SQL statement for each record and batches only those statements that are equal. Even if all 80k records render to the same SQL statement, that's still 80k renderings of the same SQL, which is a lot of useless work. We'll look into optimising this:

However, I'd still recommend using the loader API.

If I can get the loader api to perform reasonably well I will definitely use it since it does exactly what I need.
 
I hope this helps,
Lukas
 
Once again, thanks.  Your suggestions were very useful but I still have to figure out what is going on here.  I have also tried loading using a ByteArrayInputStream to rule out other I/O issues loading the file but that only shaved off about 10 seconds.

My next attempt will be to increase the memory on the JVM to rule out garbage collection issues, and if that does not help I guess I'll have to use a profiler.

Aner Perez

Sep 12, 2015, 4:34:32 PM
to jOOQ User Group
I had dismissed your recommendation for bulk and batch sizes because I was confusing them with the commit size.  I tried using bulkAll() but it did not make any difference.  Then I turned debug logging back on and noticed that with bulkAll() it was still logging 80k individual debug inserts.  I changed the call to use bulkAfter(1000), which took the load time down to 18 seconds!  This is still 35 times longer than the psql \copy command but a perfectly usable number for our use case!
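
In case it's useful, the shape of the call that got us down to ~18 seconds looks roughly like this (a sketch only: the input stream and field list are placeholders, and we leave committing to the surrounding Spring-managed transaction rather than using commitAfter()):

    @Transactional
    public void loadActuals(InputStream csv) throws IOException {
        jooq.deleteFrom(RAW_ACTUALS).execute();
        jooq.loadInto(RAW_ACTUALS)
            .bulkAfter(1000)     // 1000 rows per multi-row INSERT
            .loadCSV(csv)
            .fields(RAW_ACTUALS.YEAR, RAW_ACTUALS.MONTH /* , ... */)
            .execute();
    }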

Seems like bulkAll() is not doing what it should in this situation.

Any chance that jOOQ will support the postgresql CopyManager class (See http://stackoverflow.com/questions/6958965/how-to-copy-a-data-from-file-to-postgresql-using-jdbc ) in some future release?  Sounds like it can lead to some worthwhile speed improvements for people who need every last drop.

Thanks again for the help, loadInto()/loadCSV()/bulkAfter() was just what I needed!

  - Aner

Lukas Eder

Sep 15, 2015, 9:16:57 AM
to jooq...@googlegroups.com
Thank you very much for your feedback!

2015-09-12 22:34 GMT+02:00 Aner Perez <aner....@gmail.com>:
I had dismissed your recommendation for bulk and batch sizes because I was confusing them with the commit size.  I tried using bulkAll() but it did not make any difference.

That's very interesting. Perhaps, beyond a certain number of bind variables in a single statement, PostgreSQL switches execution modes to a static statement, or the parser requires excessive memory.

I've also found this post here that explains how PostgreSQL's JDBC driver behaves when it comes to batching. Batching doesn't seem to really help much:

There's also:

Quoting Craig Ringer:

If you can't use COPY consider using multi-valued INSERTs if practical. You seem to be doing this already. Don't try to list too many values in a single VALUES though; those values have to fit in memory a couple of times over, so keep it to a few hundred per statement.
 
 Then I turned debug logging back on and noticed that with bulkAll() it was still logging 80k individual debug inserts.

Are you sure? bulkAll() should just generate one huge INSERT statement. Do you still have the source code to reproduce this?
 
 I changed the call to use bulkAfter(1000), which took the load time down to 18 seconds!  This is still 35 times longer than the psql \copy command but a perfectly usable number for our use case!

You'll never beat COPY :)
 
Any chance that jOOQ will support the postgresql CopyManager class (See http://stackoverflow.com/questions/6958965/how-to-copy-a-data-from-file-to-postgresql-using-jdbc ) in some future release?  Sounds like it can lead to some worthwhile speed improvements for people who need every last drop.

We have a pending feature request for COPY:

We could certainly look into using COPY within the Loader API, if the Loader configuration permits it...
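
In the meantime, it is possible to drop down to the driver for just that one operation. A minimal sketch (not jOOQ API; it assumes the JDBC Connection can be unwrapped to org.postgresql.PGConnection, which pooled or proxied connections may complicate, and the class name is purely illustrative):

    import java.io.IOException;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.SQLException;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class RawActualsCopy {

        // Streams CSV rows into raw_actuals via COPY ... FROM STDIN,
        // returning the number of rows copied.
        public long load(Connection connection, Reader csv) throws SQLException, IOException {
            CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
            return copyManager.copyIn("COPY raw_actuals FROM STDIN WITH (FORMAT csv)", csv);
        }
    }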

Best Regards,
Lukas 


Aner Perez

Sep 17, 2015, 6:21:57 PM
to jOOQ User Group


On Tuesday, September 15, 2015 at 9:16:57 AM UTC-4, Lukas Eder wrote:
 
 Then I turned debug logging back on and noticed that with bulkAll() it was still logging 80k individual debug inserts.

Are you sure? bulkAll() should just generate one huge INSERT statement. Do you still have the source code to reproduce this?

You are right, I must have been running the wrong code when I ran my original test.  I just retested with bulkAll() and I get an error now, which is not surprising since it would be a single insert statement with 80,000 rows!

 Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 1764906

Seems like the number of bind values in a single JDBC request has to be under 64K (80,223 rows × 22 columns = 1,764,906 bind values) :-)

bulkAfter() is what we needed and it is working great, thanks!

Lukas Eder

Sep 21, 2015, 2:27:35 AM
to jooq...@googlegroups.com
2015-09-18 0:21 GMT+02:00 Aner Perez <aner....@gmail.com>:
You are right, I must have been running the wrong code when I ran my original test.  I just retested with bulkAll() and I get an error now, which is not surprising since it would be a single insert statement with 80,000 rows!

 Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 1764906

Seems like the number of bind values in a single JDBC request has to be under 64K (80,223 rows × 22 columns = 1,764,906 bind values) :-)

Ah, yes, true. There are various limitations like that. There's also a maximum number of bind variables in many databases...
 
bulkAfter() is what we needed and it is working great, thanks!

Great, thank you for your feedback!