Batch Insert / Updates with Upset

1,720 views
Skip to first unread message

Samir Faci

unread,
Oct 19, 2016, 9:34:02 PM10/19/16
to jooq...@googlegroups.com
Just for reference, simple upsert work fine, using something along these lines.  Snippet below is my preferred pattern, mergeInto is less verbose, but our PKeys don't actually mean primary key at times so being explicit is necessary.

this.ctx().insertInto(this.table())
    .set(DSL.field("id"), f.id())
    .set(DSL.field("last_modified"), Timestamp.from(Instant.now()))
    .onDuplicateKeyUpdate()
    .set(DSL.field("last_modified"), Timestamp.from(Instant.now())))

What I'm trying to do is take advantage of postgres upsert operation and use batch operations.

Ideally I would love something along these lines to work.

List<SomeTableRecord> records = Lists.newArrayList();
// build up a list of recrods...


dsl.batchInsert(records).onDuplicateKeyUpdate()
.set(Table.Field, TableRecord.getField)
.set(Table.AnotherField, TableRecord.getAnotherField)
.execute();

Failing that, is there support for batch operation with upserts? 

I was also trying to use this pattern:

create.batch(
	create.insertInto(AUTHOR, ID, FIRST_NAME, LAST_NAME).values(1, "Erich"  , "Gamma"    ),
	create.insertInto(AUTHOR, ID, FIRST_NAME, LAST_NAME).values(2, "Richard", "Helm"     ),
	create.insertInto(AUTHOR, ID, FIRST_NAME, LAST_NAME).values(3, "Ralph"  , "Johnson"  ),
	create.insertInto(AUTHOR, ID, FIRST_NAME, LAST_NAME).values(4, "John"   , "Vlissides"))
.execute();


It would be great if I can write something along the lines of:

List<InsertOnDuplicateSetMoreStep<TableRecord>> queries = Lists.newArrayList();
InsertOnDuplicateSetMoreStep<TableRecord> query = this.ctx().insertInto(this.table())
    .set(DSL.field("id"), f.id())
    .set(DSL.field("last_modified"), Timestamp.from(Instant.now()))
    .onDuplicateKeyUpdate()
    .set(DSL.field("last_modified"), Timestamp.from(Instant.now())))
queries.add(query);
Build up a List of these query and then simply execute something along the lines of:

create.batch( queryList).execute();


Either ways, my wish list aside, neither pattern seems to work with Jooq 3.8.1 (Or I'm passing in the wrong data type).  Is there a supported pattern that does work? 
Or can upserts not be done in conjunction with batch operations?


--
Thank you
Samir Faci

Lukas Eder

unread,
Oct 31, 2016, 10:00:41 AM10/31/16
to jooq...@googlegroups.com
Hi Samir,

I'm sorry for the delay. 

dsl.batchInsert(records).onDuplicateKeyUpdate()
.set(Table.Field, TableRecord.getField)
.set(Table.AnotherField, TableRecord.getAnotherField)
.execute();

Currently, this isn't possible out of the box. There's a pending feature request for this, here:

But you should be able to batch your explicit INSERT .. ON DUPLICATE KEY UPDATE statements

Either ways, my wish list aside, neither pattern seems to work with Jooq 3.8.1 (Or I'm passing in the wrong data type).  Is there a supported pattern that does work? 
Or can upserts not be done in conjunction with batch operations?

What went wrong?

Lukas

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Samir Faci

unread,
Oct 31, 2016, 3:11:50 PM10/31/16
to jooq...@googlegroups.com
Actually, the the second approach does work, though I was running into a different issue because the "primary key" in my case is serial ID which doesn't work well with upserts.

I was wondering if there was a way to explicitly list the columns I want to use for upserts.

Example:

insert into pinned_tweets (user_handle, tweet_id, pinned_at)
  values (
    'rey',
    1,
    clock_timestamp()
  )
on conflict (user_handle)
do update set (tweet_id, pinned_at) = (1, clock_timestamp())
where pinned_tweets.user_handle = 'rey';
(Pulled from a random blog)

But I believe any valid Index or constrain can be applied (Postgres).   Can we add a method that takes a .onConflict(Index_name or constraint_name) ? 

I'm assuming jooq can auto-gen indexes as well? 


Thanks for the issue listed, I'll watch that and can't wait to see it get resolved.

--
Samir Faci



Lukas Eder

unread,
Nov 1, 2016, 6:19:02 PM11/1/16
to jooq...@googlegroups.com
Hi Samir,

2016-10-31 20:11 GMT+01:00 Samir Faci <sa...@esamir.com>:
Actually, the the second approach does work, though I was running into a different issue because the "primary key" in my case is serial ID which doesn't work well with upserts.

I was wondering if there was a way to explicitly list the columns I want to use for upserts.

Example:

insert into pinned_tweets (user_handle, tweet_id, pinned_at)
  values (
    'rey',
    1,
    clock_timestamp()
  )
on conflict (user_handle)
do update set (tweet_id, pinned_at) = (1, clock_timestamp())
where pinned_tweets.user_handle = 'rey';
(Pulled from a random blog)

Hmm, PostgreSQL ON CONFLICT will be supported in jOOQ 3.9:

Prior to 3.9, ON CONFLICT is already emulated via MySQL's ON DUPLICATE KEY UPDATE syntax, in case of which the primary key (from your generated table) will be passed to the ON CONFLICT clause.

Let me know if this works for you.

But I overlooked that WHERE clause. Added another feature request for this:

But do you need it in your example?

But I believe any valid Index or constrain can be applied (Postgres).   Can we add a method that takes a .onConflict(Index_name or constraint_name) ? 

Yes, in jOOQ 3.9.
 
I'm assuming jooq can auto-gen indexes as well? 

No, we don't generate indexes yet as that information hasn't been useful to general jOOQ API usage (yet).

Best Regards,
Lukas

Samir Faci

unread,
Nov 1, 2016, 11:29:20 PM11/1/16
to jooq...@googlegroups.com
Inline...

On Tue, Nov 1, 2016 at 3:18 PM, Lukas Eder <lukas...@gmail.com> wrote:
Hi Samir,

2016-10-31 20:11 GMT+01:00 Samir Faci <sa...@esamir.com>:
Actually, the the second approach does work, though I was running into a different issue because the "primary key" in my case is serial ID which doesn't work well with upserts.

I was wondering if there was a way to explicitly list the columns I want to use for upserts.

Example:

insert into pinned_tweets (user_handle, tweet_id, pinned_at)
  values (
    'rey',
    1,
    clock_timestamp()
  )
on conflict (user_handle)
do update set (tweet_id, pinned_at) = (1, clock_timestamp())
where pinned_tweets.user_handle = 'rey';
(Pulled from a random blog)

Hmm, PostgreSQL ON CONFLICT will be supported in jOOQ 3.9:

Prior to 3.9, ON CONFLICT is already emulated via MySQL's ON DUPLICATE KEY UPDATE syntax, in case of which the primary key (from your generated table) will be passed to the ON CONFLICT clause.

Let me know if this works for you.


Yup.  That's what I was using.  The issue I was running into was that the PKey was not useful, so I needed to specify a unique constraint/index.  It looks like that will be supported in 3.9.  
 

But I overlooked that WHERE clause. Added another feature request for this:

But do you need it in your example?

I don't think I need to use the where clause.  Though I imagine it would be a useful pattern.  

The two main Upsert + Jooq features I would love to see are:

1.  batch operation working with upserts (which is partially supported by generating a list of custom insert statements) 
2.  the ability to specify the list of columns rather the .onKeyConflict which doesn't support all of my use cases.

But I believe any valid Index or constrain can be applied (Postgres).   Can we add a method that takes a .onConflict(Index_name or constraint_name) ? 

Yes, in jOOQ 3.9.

All these great features coming out in 3.9.  Can't wait.  
 
 
I'm assuming jooq can auto-gen indexes as well? 

No, we don't generate indexes yet as that information hasn't been useful to general jOOQ API usage (yet).

That's fine.  I suppose there's not much usability added by having java objects generated for them.
 

Best Regards,
Lukas

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lukas Eder

unread,
Nov 3, 2016, 8:35:13 AM11/3/16
to jooq...@googlegroups.com
2016-11-02 4:29 GMT+01:00 Samir Faci <sa...@esamir.com>:

Yup.  That's what I was using.  The issue I was running into was that the PKey was not useful, so I needed to specify a unique constraint/index.  It looks like that will be supported in 3.9.  

I'm curious about this to learn more about your use-case. Why was the primary key not useful in your case?
 
I don't think I need to use the where clause.  Though I imagine it would be a useful pattern.  

Yes indeed. Sometimes, it's safe to ignore failed inserts rather than performing some update. Especially when inserting a whole data set, this is a useful pattern, which is also available from the SQL standard MERGE statement.

I keep wondering why PostgreSQL didn't just implement MERGE, though.
 
The two main Upsert + Jooq features I would love to see are:

1.  batch operation working with upserts (which is partially supported by generating a list of custom insert statements) 
2.  the ability to specify the list of columns rather the .onKeyConflict which doesn't support all of my use cases.

Regarding 2), this is the jOOQ 3.9 API:

    /**
     * Add an <code>ON CONFLICT</code> clause to this insert query.
     */
    @Support({ POSTGRES_9_5 })
    InsertOnConflictDoUpdateStep<R> onConflict(Field<?>... keys);

    /**
     * Add an <code>ON CONFLICT</code> clause to this insert query.
     */
    @Support({ POSTGRES_9_5 })
    InsertOnConflictDoUpdateStep<R> onConflict(Collection<? extends Field<?>> keys);

No, we don't generate indexes yet as that information hasn't been useful to general jOOQ API usage (yet).

That's fine.  I suppose there's not much usability added by having java objects generated for them.

Well, there is now that we support DDL to create indexes or to restore a schema from jOOQ classes :)
There, I've added a feature request so this won't be forgotten (in particular: ON CONFLICT usage with index names):

Thanks,
Lukas

Samir Faci

unread,
Nov 3, 2016, 12:20:36 PM11/3/16
to jooq...@googlegroups.com
On Thu, Nov 3, 2016 at 5:35 AM, Lukas Eder <lukas...@gmail.com> wrote:


2016-11-02 4:29 GMT+01:00 Samir Faci <sa...@esamir.com>:

Yup.  That's what I was using.  The issue I was running into was that the PKey was not useful, so I needed to specify a unique constraint/index.  It looks like that will be supported in 3.9.  

I'm curious about this to learn more about your use-case. Why was the primary key not useful in your case?

Well, It was just a badly designed schema.  Where the PKey is a serial auto incrementing ID, while what uniquely identifies the record are other columns in the table.    

Let's take something like this as an example:

id, user_id, bonus_type, year, bonus_value.         

so the true key would be something like user_id_bonus_type, year but the actual primary key is id.

so the intended behavior is that if I insert a record with (user_id, bonus_type, year) and there is an existing record we should update the record rather then inserting another one.  But since 'id' is the actual key it won't ever conflict on ID since that is just a sequence.

in which case my SQL would end up having a constraint / index on (user_id, bonus_type, year) and my SQL would end up being something like

insert into bonus_table (id, user_id, bonus_type, year, bonus_value) values (....) on conflict (user_id, bonus_type, year) update bonus_value = $234234; 

(Or something along these lines, my syntax might be a bit off )
 
 
I don't think I need to use the where clause.  Though I imagine it would be a useful pattern.  

Yes indeed. Sometimes, it's safe to ignore failed inserts rather than performing some update. Especially when inserting a whole data set, this is a useful pattern, which is also available from the SQL standard MERGE statement.

I keep wondering why PostgreSQL didn't just implement MERGE, though.
 
The two main Upsert + Jooq features I would love to see are:

1.  batch operation working with upserts (which is partially supported by generating a list of custom insert statements) 
2.  the ability to specify the list of columns rather the .onKeyConflict which doesn't support all of my use cases.

Regarding 2), this is the jOOQ 3.9 API:

    /**
     * Add an <code>ON CONFLICT</code> clause to this insert query.
     */
    @Support({ POSTGRES_9_5 })
    InsertOnConflictDoUpdateStep<R> onConflict(Field<?>... keys);

    /**
     * Add an <code>ON CONFLICT</code> clause to this insert query.
     */
    @Support({ POSTGRES_9_5 })
    InsertOnConflictDoUpdateStep<R> onConflict(Collection<? extends Field<?>> keys);

No, we don't generate indexes yet as that information hasn't been useful to general jOOQ API usage (yet).

That's fine.  I suppose there's not much usability added by having java objects generated for them.

Well, there is now that we support DDL to create indexes or to restore a schema from jOOQ classes :)
There, I've added a feature request so this won't be forgotten (in particular: ON CONFLICT usage with index names):

I don't think this (indexes) will be necessary for my use case though if others find that useful it might be worth while.  The main limitation that I'm running into is that I couldn't list specific Fields for the onConflict which as you listed above address my blockers.

The caveat is that the Fields you're listing have to have a unique constraint or Index setup on the DB.  But that's more on the developer to be aware, not sure if Jooq could do anything to validate that at compile time.


Thank you  for all the work and help on this thread.  I'm looking forward to 3.9.  
 


Thanks,
Lukas

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lukas Eder

unread,
Nov 4, 2016, 7:32:54 AM11/4/16
to jooq...@googlegroups.com
OK, thank you very much for the additional info - that makes sense.
Reply all
Reply to author
Forward
0 new messages