Bulk DML


Tamás Gulácsi

Oct 29, 2018, 5:44:46 PM
to golang-sql
Hi,

Anthony Tuininga (cx_Oracle, ODPI-C author) suggested that maybe database/sql should add a Bulk DML (ExecManyContext) function:

"
Hi Tamás, I am back from OpenWorld 2018. :-) If you're interested in my Go presentation you can find it here: https://static.rainfocus.com/oracle/oow18/sess/1525791357522001Q5tc/PF/DEV5047%20-%20The%20Go%20Language_1540587475596001afdk.pdf You'll probably be most interested in the last two sections of the presentation as the first two sections are simply an introduction to the language.
While I was preparing to give this presentation I examined the database/sql and goracle specifically. I noted that there weren't a whole lot of examples and created some. Feel free to borrow heavily if you'd like. :-)

I also noted that database/sql doesn't support Bulk DML. I see that goracle did so -- which is great because the performance is dramatically better.
I was thinking that it wouldn't be too hard to add bulk DML to database/sql.
That would mean that you wouldn't have to do it yourself *and* you wouldn't need to pass goracle.PlSqlArrays through to db.Exec().

My suggestion would be as follows:
// if a Conn does not implement ExecManyer, the database/sql package will implement Exec() in a loop instead
type ExecManyer interface {
    ExecManyContext(ctx context.Context, query string, args [][]Value) (Result, error)
}

What do you think? Anthony
"

What are your thoughts?

At first this doesn't seem like a big addition, but it can give a HUGE performance gain when Bulk DML is supported on the database side!
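
For illustration, a minimal sketch of the fallback behaviour the quoted suggestion describes (ExecManyer is taken from the quote; the package name and the execMany helper are made up):

package bulkdml // hypothetical package, illustration only

import (
    "context"
    "database/sql/driver"
    "errors"
)

// ExecManyer is the interface from the quoted suggestion.
type ExecManyer interface {
    ExecManyContext(ctx context.Context, query string, args [][]driver.Value) (driver.Result, error)
}

// execMany uses ExecManyContext when the driver connection offers it,
// and otherwise falls back to executing the statement once per row.
func execMany(ctx context.Context, c driver.Conn, query string, args [][]driver.Value) error {
    if em, ok := c.(ExecManyer); ok {
        _, err := em.ExecManyContext(ctx, query, args)
        return err
    }
    ex, ok := c.(driver.ExecerContext)
    if !ok {
        return errors.New("driver supports neither ExecManyContext nor ExecContext")
    }
    for _, row := range args {
        named := make([]driver.NamedValue, len(row))
        for i, v := range row {
            named[i] = driver.NamedValue{Ordinal: i + 1, Value: v}
        }
        if _, err := ex.ExecContext(ctx, query, named); err != nil {
            return err
        }
    }
    return nil
}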

Thanks,
Tamás Gulácsi

Daniel Theophanes

Oct 29, 2018, 7:31:43 PM
to golang-sql
(apparently I have trouble hitting reply-all in gmail)

Hi Tamás,

MS SQL Server and PostgreSQL drivers also have a bulk insert function defined. With at least three major databases whose drivers share this need, I think there is sufficient precedent for it.

I'd rather not use that exact method signature. In particular, the "args [][]Value" will be a problem: it implies that you have all the data you wish to send to the database in memory at one time. For large data sets that seems bad.

How flexible is Oracle's Bulk DML? TDS (SQL Server) and PostgreSQL only support bulk insert operations. Thus they both have a signature like:
Then "(*Bulk) AddRow(row []Value)".

Thanks, -Daniel

Tamás Gulácsi

Oct 30, 2018, 2:10:15 AM
to golang-sql
On Tuesday, October 30, 2018 at 0:31:43 UTC+1, Daniel Theophanes wrote:
(apparently I have trouble hitting reply-all in gmail)

Hi Tamás,

MS SQL Server and PostgreSQL drivers also have a bulk insert function defined. With at least three major databases whose drivers share this need, I think there is sufficient precedent for it.

I'd rather not use that exact method signature. In particular, the "args [][]Value" will be a problem: it implies that you have all the data you wish to send to the database in memory at one time. For large data sets that seems bad.

How flexible is Oracle's Bulk DML? TDS (SQL Server) and PostgreSQL only support bulk insert operations. Thus they both have a signature like:
Then "(*Bulk) AddRow(row []Value)".

Thanks, -Daniel


You have to provide all the data you want to push at once, each bind parameter as a slice.
But for this, go-mssqldb's Bulk type is usable: it could execute the statement with the accumulated slices once the temporary buffers are full (i.e. reach the threshold specified in BulkOptions.RowsPerBatch), and push the rest at Done (Flush would be a more accurate name, but whatever).
Although it needs a statement to prepare, not a table name (not only INSERTs can be executed this way, but UPDATEs, DELETEs and PL/SQL blocks, too).

(*Conn) CreateBulkContext(ctx context.Context, query string) (_ *Bulk)

Mixing the different RDBMS methods won't be easy (how to pass the options - maybe in the query string?), but generally the Bulk type with AddRow and Done can work with Oracle.
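
For illustration, a rough sketch of the buffering behaviour described above; Bulk and BulkOptions here are illustrative names only (not go-mssqldb's actual types), and a real bulk-capable driver would send each full batch in a single round trip instead of looping:

package bulkdml // hypothetical, illustration only

import (
    "context"
    "database/sql"
)

type BulkOptions struct {
    RowsPerBatch int // flush threshold
}

// Bulk buffers rows for an already prepared statement and flushes them
// once the threshold is reached; Done flushes the remainder.
type Bulk struct {
    ctx  context.Context
    stmt *sql.Stmt
    opts BulkOptions
    rows [][]interface{}
}

func (b *Bulk) AddRow(row []interface{}) error {
    b.rows = append(b.rows, row)
    if len(b.rows) >= b.opts.RowsPerBatch {
        return b.flush()
    }
    return nil
}

// Done flushes whatever is still buffered ("Flush" might be the better name).
func (b *Bulk) Done() error { return b.flush() }

func (b *Bulk) flush() error {
    for _, row := range b.rows {
        // A bulk-capable driver would push the whole batch at once;
        // this sketch simply re-executes the prepared statement per row.
        if _, err := b.stmt.ExecContext(b.ctx, row...); err != nil {
            return err
        }
    }
    b.rows = b.rows[:0]
    return nil
}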

Thanks!

Ren Thraysk

Feb 15, 2019, 8:52:01 PM
to golang-sql

Hi,

MySQL now has a mechanism for minimising network round trips when executing multiple statements. For instance, preparing a statement, starting a transaction, executing the statement multiple times, and either committing or rolling back can all be done in one round trip.

I don't think the Bulk "constructor" should be a method on *Conn, just a normal package-level function. That means each driver can take whatever arguments make sense, but return an implementation of a common interface... something along these lines:

type Bulk interface {
   Add([]driver.NamedValue) error
   Done() error
}

I'm not a fan of AddRow(), as it implies this is only applicable to insertion; I'd rather have Add(), where each call is simply considered another execution of a prepared query with the provided parameters.
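
For illustration, hypothetical usage of such an interface; mydriver.NewBulk is a made-up, driver-specific constructor (each driver would choose its own arguments), and only the returned Bulk interface would be common:

package example // illustration only

import (
    "context"
    "database/sql"
    "database/sql/driver"

    "example.com/mydriver" // hypothetical driver package
)

func insertAll(ctx context.Context, conn *sql.Conn, rows [][2]interface{}) error {
    // Driver-specific constructor, common interface in return.
    b, err := mydriver.NewBulk(ctx, conn, "INSERT INTO t (a, b) VALUES (?, ?)")
    if err != nil {
        return err
    }
    for _, r := range rows {
        args := []driver.NamedValue{
            {Ordinal: 1, Value: r[0]},
            {Ordinal: 2, Value: r[1]},
        }
        // Each Add is just another execution of the prepared query.
        if err := b.Add(args); err != nil {
            return err
        }
    }
    return b.Done()
}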

Ren

INADA Naoki

Feb 15, 2019, 10:33:26 PM
to Ren Thraysk, golang-sql
On Sat, Feb 16, 2019 at 10:52, Ren Thraysk (renth...@gmail.com) wrote:

Hi,

MySQL now has a mechanism for minimising network round trips when executing multiple statements. For instance, preparing a statement, starting a transaction, executing the statement multiple times, and either committing or rolling back can all be done in one round trip.

Do you mean MariaDB?
Or do you mean the X Protocol, not the Client/Server protocol?
If neither, I didn't know about it. Please give me a URL for it.


Ren Thraysk

Feb 16, 2019, 7:32:02 AM
to golang-sql
It is MySQL X Protocol (how is it not a client/server protocol?). Prepared statement support was added in 8.0.14, so one can prepare & execute in a single round trip.

Ren
 

INADA Naoki

Feb 16, 2019, 10:06:15 PM
to Ren Thraysk, golang-sql
On Sat, Feb 16, 2019 at 9:32 PM Ren Thraysk <renth...@gmail.com> wrote:
>
>
> It is MySQL X Protocol (how is it not a client/server protocol?).

I see.
MySQL refers to its long-used protocol as the "Client/Server Protocol".
So it is not a generic term; it is a proper noun.

https://dev.mysql.com/doc/internals/en/client-server-protocol.html
https://dev.mysql.com/doc/internals/en/x-protocol-comparison-comparison-to-mysql-c-s-protocol.html



> Prepared statement support was added in 8.0.14, so one can prepare & execute in a single round trip.

There are "SQL prepare" and "protocol prepare".  As far as I know, MySQL 8.0.14 supports "SQL prepare" in X protocol.

At protocol level, X protocol used "prepare" before 8.0.14.
https://dev.mysql.com/doc/internals/en/x-protocol-messages-messages.html#x-protocol-messages-sql

And the X protocol supported bulk insert before 8.0.14, by pipelining.

FWIW, MariaDB extends the MySQL Client/Server protocol and supports "pipelining EXEC".

--
INADA Naoki  <songof...@gmail.com>

RenThraysk

Feb 17, 2019, 7:53:47 AM
to INADA Naoki, golang-sql
Yes, all these things combine to provide generic Bulk DML.

The X Protocol protobufs relating to a "protocol prepare" were added in 8.0.14.


Currently experimenting with pipelining in Go, using one goroutine to read responses while another is still sending statements.


Daniel Theophanes

Feb 26, 2019, 7:54:14 AM
to golang-sql
Speaking of MySQL, I'm familiar with SQL Server and PostgreSQL "bulk load", but not so much with MySQL. Would I be correct that there isn't a dedicated interface for it, just a more efficient insert?

INADA Naoki

Feb 26, 2019, 9:07:13 AM
to Daniel Theophanes, golang-sql
On Tue, Feb 26, 2019 at 9:54 PM Daniel Theophanes <kard...@gmail.com> wrote:
>
> Speaking of MySQL, I'm familiar with SQL Server and PostgreSQL "bulk load", but not so much with MySQL. Would I be correct that there isn't a dedicated interface for it, just a more efficient insert?
>

The answer is not simple.

MySQL has "LOAD LOCAL INFILE" statement. It uploads local csv file to server.
It is much faster than insert statement. This is the best way.

https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html

But if you expect a "prepare and batched execute" protocol, that is a different thing.

MySQL has two protocols: the C/S protocol and the X protocol.

The C/S protocol is the long-lived, standard protocol. MariaDB and PerconaDB support it.
There are "prepare" and "exec" commands in the protocol, but they require one round trip for each row inserted.
So we use the "query" command instead, which sends a text query. That means we need to build
SQL like "INSERT INTO ... VALUES (a1, a2, a3), (b1, b2, b3), ... (x1, x2, x3)".

While MariaDB uses the same protocol, it extends it, so there is a "batched exec" command. But that can't be used with MySQL. (I don't know much about PerconaDB.)

The new X protocol supports pipelined exec, but it is a very young protocol and under heavy development. MariaDB and PerconaDB don't support it. The X protocol doesn't support compression, which the C/S protocol does, so it is not widely used yet.

This is the current situation in the MySQL protocol world.
--
INADA Naoki <songof...@gmail.com>

Ren Thraysk

Feb 26, 2019, 9:09:08 AM
to golang-sql

Yes, no dedicated interface.

Just a mechanism to reduce network round trips: instead of one round trip per insert, you can perform multiple inserts in one round trip, and still retrieve things like auto-generated IDs (LastInsertId) for each insert.

Daniel Theophanes

Mar 19, 2019, 12:28:59 PM
to golang-sql
Hi Tamás,

I think it would be a good idea to come up with a "bulk insert" interface for database/sql. Then, for Oracle, let's find an escape hatch that still allows database/sql users to use database-specific functionality.

If we limit ourselves to insert, I think we could make a command like:


package sql // i.e. database/sql

type BulkInsert struct {
    Destination string      // Destination table name.
    Columns     []string    // Columns to insert into.
    Options     interface{} // Database-specific options.

    // InsertRow will be called repeatedly to fill in the next row's values;
    // returning io.EOF finishes the query, and any other error stops the insert.
    InsertRow func(values []interface{}) error
}

type Command interface {
    // To be determined.
}

// Shorthand for a Run method defined on each of *DB, *Conn, and the transaction type:
func (*DB, *Conn, *Tran) Run(cmd Command) error

Then programs could make a general assumption of being able to bulk insert using database/sql, but drivers could also define more specialized Commands.
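
For illustration, a hypothetical caller under this proposal; BulkInsert, Command and Run do not exist in today's database/sql, and the Person type and table are made up:

package example // illustration only; will not compile against today's database/sql

import (
    "database/sql"
    "io"
)

type Person struct {
    ID   int64
    Name string
}

func loadPeople(db *sql.DB, people []Person) error {
    i := 0
    cmd := sql.BulkInsert{
        Destination: "people",
        Columns:     []string{"id", "name"},
        // values is assumed to be sized to len(Columns); io.EOF means
        // "no more rows" and finishes the insert, any other error aborts it.
        InsertRow: func(values []interface{}) error {
            if i >= len(people) {
                return io.EOF
            }
            values[0] = people[i].ID
            values[1] = people[i].Name
            i++
            return nil
        },
    }
    return db.Run(cmd) // Run as proposed above
}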

The Command interface itself is something that I'm trying to develop as part of database/sql/v2. Thoughts appreciated.

Tamás Gulácsi

Mar 31, 2019, 1:33:37 AM
to golang-sql
Hi Daniel,

Yes, this could work.

So, there'd be a bulk Insert command in database/sql, and my driver could expose a bulk Exec command, too?

The Command should support:
1. defining the command to execute (INSERT, UPDATE, DELETE, Exec...),
2. allowing new arguments to be added,
3. closing/flushing/finishing the operation.

At point 2, the implementation may decide that the buffer is full and execute the command, possibly returning an error.
Point 3 is required to flush the accumulated buffer.

Although, after all, 1. is not needed in the Command interface!
What remains is:

type Command interface {
    Append(values []interface{}) error
    Close() error
}

And that means we can have:

func Insert(destination string, columns []string, options ...interface{}) Command {
    return BulkInsert{Destination: destination, Columns: columns, Options: options}
}

func (bi BulkInsert) Append(values []interface{}) error {
    // implement InsertRow(values)
}


But I must confess that I don't know much about other DBs' wire formats / bulk capabilities, so I don't understand the comment above InsertRow about io.EOF...

Jack Christensen

Apr 1, 2019, 10:25:50 AM
to Tamás Gulácsi, golang-sql
As far as PostgreSQL (and pgx) goes, it would be important to distinguish between a dedicated bulk insert command and an arbitrary query batching system. There are (at least) 3 different approaches PostgreSQL supports.

1. The COPY interface is the best approach for bulk inserts (https://godoc.org/github.com/jackc/pgx#Conn.CopyFrom). This is the fastest approach for inserting many rows.
2. Multiple SQL commands can be included in a single simple-protocol query. Any type of query can be included. But since it uses the simple protocol, it needs client-side SQL sanitization and uses string manipulation to bundle the queries.
3. It is also possible to batch multiple queries together and send them all at once over the wire (https://godoc.org/github.com/jackc/pgx#Batch). This latter approach allows bundling different types of queries using prepared statements and the extended protocol. In theory, this would be the best for batching heterogeneous queries. However, since this approach simply sends all the queries without waiting for a response, it is possible to deadlock with large batches due to network buffers being filled unless the results can be read concurrently with sending the queries. I have not come up with an interface I'm happy with that solves this problem.


ta...@gulacsi.eu

Apr 1, 2019, 2:13:48 PM
to Tamás Gulácsi, Jack Christensen, golang-sql
I don't see any problem with the proposed interface - it can use CopyFrom in the background in a goroutine, using a chan to adapt the writer side to the reader side.
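
For illustration, a rough sketch of that chan-based adaptation, assuming pgx's CopyFromSource interface (Next/Values/Err) and the CopyFrom method linked above; the table and column names are made up:

package example // illustration only

import (
    "github.com/jackc/pgx"
)

// chanSource adapts a push-style producer (send rows on a channel, then
// close it) to the pull-style CopyFromSource that CopyFrom reads from.
type chanSource struct {
    rows <-chan []interface{}
    cur  []interface{}
}

func (s *chanSource) Next() bool {
    row, ok := <-s.rows
    if !ok {
        return false
    }
    s.cur = row
    return true
}

func (s *chanSource) Values() ([]interface{}, error) { return s.cur, nil }
func (s *chanSource) Err() error                     { return nil }

// copyRows blocks until the producer goroutine closes the rows channel.
func copyRows(conn *pgx.Conn, rows <-chan []interface{}) (int, error) {
    return conn.CopyFrom(pgx.Identifier{"t"}, []string{"a", "b"}, &chanSource{rows: rows})
}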