Rhino ETL SqlBulkInsert, BatchSize?

527 views

Michael Gates

Jul 24, 2012, 3:09:11 PM
to rhino-t...@googlegroups.com
I am trying to use the SqlBulkInsert macro to insert records from a 250-million-line CSV. I set the "BatchSize" parameter to 10,000, expecting the engine to perform a bulk insert every 10,000 records, but that isn't happening. Looking at the Rhino.Etl source code, it appears that the SqlBulkInsert operation tries to insert all the records from the rows value at once, which won't work when there are this many rows.

Is there another way to batch out the bulk inserts?

Nathan Palmer

Jul 24, 2012, 4:57:07 PM
to rhino-t...@googlegroups.com
For a file of that size I would also recommend disabling the transaction; otherwise it will still commit all 250 million rows at once at the end.

BatchSize = 10000, UseTransaction = false
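Put together, that could look something like the sketch below. The connection-string name, table, and column mappings are made up for illustration, and it assumes SqlBulkInsertOperation exposes BatchSize and UseTransaction as settable properties, as discussed in this thread:

```csharp
using Rhino.Etl.Core.Operations;

// Hypothetical subclass; "etl" and "dbo.TargetTable" are placeholder names.
public class BulkLoadRecords : SqlBulkInsertOperation
{
    public BulkLoadRecords()
        : base("etl", "dbo.TargetTable")
    {
        BatchSize = 10000;       // flush to SQL Server every 10,000 rows
        UseTransaction = false;  // avoid one giant commit at the very end
    }

    protected override void PrepareSchema()
    {
        // Map incoming row columns to destination column types (made up here).
        Schema["Id"] = typeof(int);
        Schema["Name"] = typeof(string);
    }
}
```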

Nathan Palmer

--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To view this discussion on the web visit https://groups.google.com/d/msg/rhino-tools-dev/-/9nckkRpPWDAJ.
To post to this group, send email to rhino-t...@googlegroups.com.
To unsubscribe from this group, send email to rhino-tools-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.

Michael Gates

Jul 24, 2012, 6:08:40 PM
to rhino-t...@googlegroups.com, em...@nathanpalmer.com
I'm looking at the source code right now, and SqlBulkInsertOperation doesn't even appear to use the BatchSize parameter. Am I looking at the wrong class?



Simone Busoli

Jul 24, 2012, 6:20:33 PM
to rhino-t...@googlegroups.com
Probably so, this is the class you want.


Michael Gates

Jul 24, 2012, 6:37:53 PM
to rhino-t...@googlegroups.com
Man, what the heck am I looking at here?!!

Is this some old version of the code?

Nathan Palmer

Jul 24, 2012, 6:49:10 PM
to rhino-t...@googlegroups.com
Yeah, we moved over to the hibernating-rhinos organization on GitHub a year or so ago.

Nathan Palmer


Simone Busoli

Jul 24, 2012, 7:27:24 PM
to rhino-t...@googlegroups.com
@Oren, maybe you could change the description of your repo to note that the main one has moved.

oliwa

Sep 27, 2012, 10:38:00 AM
to rhino-t...@googlegroups.com, em...@nathanpalmer.com
I'm having the same issue. I'm reading data in from a data reader and sending it to a SqlBulkInsertOperation. I've set the batch size, turned off transactions, and even set LockTable to false, but Rhino ETL is still trying to bulk insert all the rows in one transaction.

I'm watching SQL Profiler, so I know my batch size has taken effect. This is rather frustrating.

Any ideas?

Miles Waller

Sep 27, 2012, 12:02:57 PM
to rhino-t...@googlegroups.com
Hi,

Are you setting UseTransaction=false on the whole pipeline or just the operation?

I noticed the other day that when you register the operation, the UseTransaction setting is set with the value from the _process_ itself (default true), overwriting anything you have applied to the actual operation:

https://github.com/hibernating-rhinos/rhino-etl/blob/master/Rhino.Etl.Core/EtlProcessBase.cs
(line 55)

Perhaps that is the cause of this behaviour?
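If that is the cause, a sketch of the fix would be to set UseTransaction on the process before registering the operations, so the value Register copies down is already false. The class and operation names below are placeholders, not part of Rhino.Etl itself:

```csharp
using Rhino.Etl.Core;

// Hypothetical process; ReadCsvOperation and BulkLoadRecords stand in
// for whatever source and bulk-insert operations you actually use.
public class LoadCsvProcess : EtlProcess
{
    protected override void Initialize()
    {
        // Process-level setting: Register() copies this onto each
        // operation, overwriting anything set on the operation itself.
        UseTransaction = false;

        Register(new ReadCsvOperation());
        Register(new BulkLoadRecords());
    }
}
```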

Miles

oliwa

Sep 27, 2012, 1:17:23 PM
to rhino-t...@googlegroups.com
I was setting it on the operation. I took your suggestion and it still didn't work. What did work was overriding the Execute method like so...

public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
{
    // Re-apply the settings here so they survive Register(), which
    // overwrites them with the process-level values.
    BatchSize = 5000;
    UseTransaction = false;
    LockTable = false;

    return base.Execute(rows);
}

This was the ONLY way I found to get it to load a table without blocking other users.