Confusion with bulk/batch inserts and bind variables


Kevinjeet Gill

Jun 21, 2019, 2:34:37 PM
to jOOQ User Group
I'm trying to batch inserts to reduce the number of round trips while pushing millions (or billions) of records into MySQL 5.7 with jOOQ 3.11.9.

I was trying to compare two approaches against a table with the following schema:



CREATE TABLE `Table1` (
    `BytesKey` binary(16) NOT NULL,
    `StringData` varchar(2600) CHARACTER SET utf8 NOT NULL,
    UNIQUE KEY `UniqueKey` (`BytesKey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This first approach works correctly:

class BulkInsertBuffer {
    // One multi-row INSERT that grows by one VALUES row per add() call.
    public InsertValuesStep2 insertBuffer = insertInto(table("Table1"))
        .columns(field("BytesKey"), field("StringData"));

    public void add(Record record) {
        // The key arrives as a hex string, so wrap it in MySQL's unhex()
        // to produce the binary(16) value.
        insertBuffer = insertBuffer.values(
            field("unhex({0})", String.class, record.getHexStringKey()),
            (String) record.getData());
    }

    public int flush(DSLContext mysql) {
        return mysql.execute(insertBuffer);
    }
}
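
Roughly, the driving loop looks like this (a simplified sketch, not my exact code; the records iterable and the 2000-row flush size stand in for my real source and tuning):

class BulkInsertLoader {
    // Simplified sketch: accumulate rows and flush a fresh multi-row
    // INSERT every 2000 records, so each flush is one round trip.
    void load(DSLContext mysql, Iterable<Record> records) {
        BulkInsertBuffer buffer = new BulkInsertBuffer();
        int pending = 0;

        for (Record record : records) {
            buffer.add(record);

            if (++pending == 2000) {
                buffer.flush(mysql);
                buffer = new BulkInsertBuffer();
                pending = 0;
            }
        }

        if (pending > 0)
            buffer.flush(mysql);          // flush the final partial chunk
    }
}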


But I wanted to try the second approach, and I got various errors with the variations I tried:

static class BulkInsertBufferV2 {
    public BatchBindStep insertBuffer = mysql.batch(
        insertInto(table("Table1"), field("BytesKey"), field("StringData"))
        .values(null, null));

    public void add(Record record) {
        insertBuffer = insertBuffer.bind(
            field("unhex({0})", String.class, record.getHexStringKey()),
            (String) record.getData());
    }

    public int[] flush(DSLContext mysql) {
        return insertBuffer.execute();
    }
}


This version throws an org.jooq.exception.DataAccessException: SQL [insert into Table1 (BytesKey, StringData) values (?, ?)]; Data truncation: Data too long for column 'BytesKey' at row 1

Thanks!

Lukas Eder

Jun 24, 2019, 3:46:11 AM
to jOOQ User Group
Hi Kevinjeet,

In your second example, you cannot bind() an expression to your batch; you can only bind values to it. The expression itself has to be part of the INSERT statement that you pass to batch():

static class BulkInsertBufferV2 {
    public BatchBindStep insertBuffer = mysql.batch(
        insertInto(table("Table1"), field("BytesKey"), field("StringData"))
        // The (Object) cast avoids passing a null varargs array to field()
        .values(field("unhex({0})", String.class, (Object) null), null));

    public void add(Record record) {
        insertBuffer = insertBuffer.bind(
            record.getHexStringKey(),
            (String) record.getData());
    }

    public int[] flush(DSLContext mysql) {
        return insertBuffer.execute();
    }
}
 

Notice that those "null" values in the insertInto statement are "placeholders" for your bind variables. You have to put them there for syntactic reasons, even if that's a bit confusing. If you want to use an expression in your VALUES clause, you can do that, but you still have to place a bind variable placeholder somewhere.
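
For illustration, here is a quick standalone sketch (the main() method and the connection-less DSL.using(SQLDialect.MYSQL) are only there to render the SQL, not something you need in your buffer) that prints the statement the batch will reuse, so you can see where the placeholders end up:

import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

import org.jooq.DSLContext;
import org.jooq.Query;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class RenderPlaceholders {
    public static void main(String[] args) {
        // No connection needed; we only want to render the SQL string.
        DSLContext create = DSL.using(SQLDialect.MYSQL);

        Query query = create
            .insertInto(table("Table1"), field("BytesKey"), field("StringData"))
            .values(field("unhex({0})", String.class, (Object) null), null);

        // Prints something like:
        // insert into Table1 (BytesKey, StringData) values (unhex(?), ?)
        System.out.println(query.getSQL());
    }
}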

I hope this helps,
Lukas

Kevinjeet Gill

Jun 25, 2019, 5:26:44 AM
to jOOQ User Group
That worked for me. For future reference, I ended up using the first approach.

It wasn't a super scientific test, but my workload was 90 batches of 2000 rows, and keep in mind the test was doing a lot of application-specific work too.
The first strategy (composing one multi-row INSERT per batch) took about 150 seconds end to end, whereas the batch/bind strategy took 260 seconds.
As a baseline, I also no-oped those functions and found that it took about 130 s. So the batch/bind approach added roughly 130 s of overhead, versus only about 20 s for the first approach!

I didn't peek at the jOOQ source to see what was happening, so maybe this was to be expected.

Thanks!

Lukas Eder

Jun 26, 2019, 11:13:35 AM
to jOOQ User Group
Oh wow, we do have some performance issues in the batch API, but nothing that would warrant such a huge increase. This definitely sounds like something we should investigate. Would you be interested in providing an MCVE (Minimal, Complete, Verifiable Example) that helps reproduce the two measured times? We have a template that you could adapt relatively easily, plugging your logic into the tests so we can see exactly what you did:

Thanks,
Lukas 


Kevinjeet Gill

Jun 27, 2019, 4:26:54 PM
to jOOQ User Group
Just on a hunch, I was assuming that MySQL Connector/J's batching was going to end up costing multiple round trips anyway, because I'm incredibly pessimistic about MySQL "doing the right thing".
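
As far as I can tell, Connector/J only rewrites a JDBC batch of INSERTs into a single multi-row statement when rewriteBatchedStatements=true is set on the JDBC URL; otherwise each statement in the batch goes over the wire on its own. A rough sketch of the setting I mean, with placeholder host, schema, and credentials:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectionFactory {
    // Placeholder host, schema, and credentials.
    public static Connection open() throws SQLException {
        String url = "jdbc:mysql://localhost:3306/mydb"
                   + "?rewriteBatchedStatements=true";   // let Connector/J rewrite batched INSERTs
        return DriverManager.getConnection(url, "user", "password");
    }
}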

I haven't done further verification, but a few Google searches seem to back this up:

I already deleted that code, so I won't have a chance to try this out and see whether that's the issue until later.
Alternatively, is there a way to log which round trips are "actually" being made, either at the jOOQ or the Connector/J level? I think that would go a long way toward understanding whether it's just my shitty internet that's the problem. Thanks!

Lukas Eder

Jun 28, 2019, 4:02:45 AM
to jOOQ User Group
On the jOOQ level, you can use an ExecuteListener and listen to executeStart() events. On the JDBC level, you can use a JDBC logging proxy, such as https://github.com/arthurblake/log4jdbc, to log all the statements sent to the driver by *anyone*.
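
For example, here is a minimal sketch of the jOOQ side (the RoundTripLogger class name and the System.out logging are just placeholders, not anything built in):

import java.sql.Connection;

import org.jooq.DSLContext;
import org.jooq.ExecuteContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
import org.jooq.impl.DefaultConfiguration;
import org.jooq.impl.DefaultExecuteListener;
import org.jooq.impl.DefaultExecuteListenerProvider;

public class RoundTripLogger extends DefaultExecuteListener {

    @Override
    public void executeStart(ExecuteContext ctx) {
        // Fired just before jOOQ hands the statement to the JDBC driver.
        System.out.println("Executing: " + ctx.sql());
    }

    // Attach the listener to a DSLContext built on an existing connection.
    public static DSLContext create(Connection connection) {
        return DSL.using(new DefaultConfiguration()
            .set(connection)
            .set(SQLDialect.MYSQL)
            .set(new DefaultExecuteListenerProvider(new RoundTripLogger())));
    }
}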

I hope this helps
