Re: [RavenDB] Bulk Insert Issues


Oren Eini (Ayende Rahien)

May 25, 2013, 6:25:59 AM
to ravendb
Are you running inside ASP.Net?


On Sat, May 25, 2013 at 1:15 PM, Lee <safe...@hotmail.com> wrote:
I am using the latest stable build (2360) and am playing around with the bulk insert feature.

Is there an implicit limit to the total size of the documents that you store within a single bulk insert operation?  The behaviour I am experiencing suggests that there is but I can't find much on the net to confirm this.

The behaviour I am seeing is as follows: -

I have 2 types of document, one larger in terms of size than the other and there are around 100000 of each.  If I use a single bulk insert operation it stores 27136 of the larger document and none of the other.  If I create a single bulk operation for each of the document types then it stores 27136 of the larger document and 64000 of the other.  The number saved is consistent each time.

I get no exception to tell me that anything bad has happened.


Lee

May 25, 2013, 6:27:08 AM
to rav...@googlegroups.com
No, it's a console application.

Oren Eini (Ayende Rahien)

May 25, 2013, 6:34:46 AM
to ravendb
Hm, so that should work?
What build?
Embedded / server?
What happens when you look at the logs?

Lee

May 25, 2013, 7:00:18 AM
to rav...@googlegroups.com
I upgraded the server to 2360 earlier and the client is using that build too.  It is using the server hosted in IIS.  After enabling the logs and re-running, I do see a couple of instances of the following exception: -

2013-05-25 11:52:13.2018,Raven.Database.DocumentDatabase,Info,Failed to execute background task 1,"System.AggregateException: One or more errors occurred. ---> System.Web.HttpException: Maximum request length exceeded.
   at System.Web.HttpBufferlessInputStream.ValidateRequestEntityLength()
   at System.Web.HttpBufferlessInputStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Stream.ReadByte()
   at Raven.Database.Util.Streams.PartialStream.Dispose(Boolean disposing) in c:\Builds\RavenDB-Stable\Raven.Database\Util\Streams\PartialStream.cs:line 77
   at System.IO.Stream.Close()
   at Raven.Database.Server.Responders.BulkInsert.<YieldBatches>d__6.System.IDisposable.Dispose() in c:\Builds\RavenDB-Stable\Raven.Database\Server\Responders\BulkInsert.cs:line 0
   at Raven.Database.DocumentDatabase.<>c__DisplayClass116.<BulkInsert>b__113(IStorageActionsAccessor accessor) in c:\Builds\RavenDB-Stable\Raven.Database\DocumentDatabase.cs:line 2147
   at Raven.Storage.Esent.TransactionalStorage.ExecuteBatch(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 558
   at Raven.Storage.Esent.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 516
   at Raven.Database.Server.Responders.BulkInsert.<>c__DisplayClass4.<Respond>b__2() in c:\Builds\RavenDB-Stable\Raven.Database\Server\Responders\BulkInsert.cs:line 75
   at System.Threading.Tasks.Task.Execute()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) System.Web.HttpException (0x80004005): Maximum request length exceeded.
   at System.Web.HttpBufferlessInputStream.ValidateRequestEntityLength()
   at System.Web.HttpBufferlessInputStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Stream.ReadByte()
   at Raven.Database.Util.Streams.PartialStream.Dispose(Boolean disposing) in c:\Builds\RavenDB-Stable\Raven.Database\Util\Streams\PartialStream.cs:line 77
   at System.IO.Stream.Close()
   at Raven.Database.Server.Responders.BulkInsert.<YieldBatches>d__6.System.IDisposable.Dispose() in c:\Builds\RavenDB-Stable\Raven.Database\Server\Responders\BulkInsert.cs:line 0
   at Raven.Database.DocumentDatabase.<>c__DisplayClass116.<BulkInsert>b__113(IStorageActionsAccessor accessor) in c:\Builds\RavenDB-Stable\Raven.Database\DocumentDatabase.cs:line 2147
   at Raven.Storage.Esent.TransactionalStorage.ExecuteBatch(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 558
   at Raven.Storage.Esent.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 516
   at Raven.Database.Server.Responders.BulkInsert.<>c__DisplayClass4.<Respond>b__2() in c:\Builds\RavenDB-Stable\Raven.Database\Server\Responders\BulkInsert.cs:line 75
   at System.Threading.Tasks.Task.Execute()<---

Lee

May 25, 2013, 7:05:03 AM
to rav...@googlegroups.com
There are actually two instances of this exception. 

I simplified the scenario in my original post for brevity. There are actually three types of document; there are only a small number of the third type, and they all bulk insert successfully. As I create a new bulk insert operation per type, I am guessing that for the two larger types too much data is being sent as part of the insert request, which is why I get one exception per bulk operation.

Oren Eini (Ayende Rahien)

May 26, 2013, 4:39:27 AM
to ravendb
Okay, that makes sense now.
Basically, when you are running in IIS you still have to respect the IIS limits.
You need to increase the IIS request size.

Lee

May 26, 2013, 5:11:17 AM
to rav...@googlegroups.com
I thought the request size would be related to the bulk insert batch size, not the size of all of the documents that you were trying to insert?

Even if I increase the request size, I will only know at runtime how many documents I will be bulk inserting.  So how do I guarantee that the request size covers an unknown number of documents of an unknown size?

I can batch up the documents on the client and use more bulk insert operations but that doesn't sound right.  Sounds like something that should be happening with the bulk operation itself.
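
For reference, the client-side workaround mentioned above could be sketched roughly as follows. This is only an illustration of splitting the documents into fixed-size chunks; the `ClientSideChunking` class and `InChunks` method are hypothetical names, not part of the RavenDB client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClientSideChunking
{
    // Hypothetical helper: splits a sequence into consecutive lists
    // holding at most chunkSize items each. The last list may be shorter.
    public static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        var chunk = new List<T>(chunkSize);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>(chunkSize);
            }
        }

        // Emit whatever is left over after the last full chunk.
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

Each chunk would then get its own bulk insert operation (its own `using` block around `BulkInsert()`), so no single request carries the whole data set. The chunk size that stays under the server's request limit depends on document size and is not something this sketch can determine.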

Oren Eini (Ayende Rahien)

May 26, 2013, 5:24:43 AM
to ravendb
Bulk Insert uses only a single request to do all the work.
And in order to ensure that the request size isn't limiting, set it to a really high number.
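
As a rough sketch of what raising the limit can look like when the server is hosted in IIS: the "Maximum request length exceeded" exception in the log comes from ASP.NET's `httpRuntime` `maxRequestLength` setting, and IIS 7+ request filtering has a separate `maxAllowedContentLength` limit. The fragment below is an assumption about a typical `web.config`, to be merged into the existing file rather than copied over it; the values are illustrative, not recommendations.

```xml
<configuration>
  <system.web>
    <!-- ASP.NET limit, in kilobytes (2097151 KB is roughly 2 GB) -->
    <httpRuntime maxRequestLength="2097151" />
  </system.web>
  <system.webServer>
    <security>
      <requestFiltering>
        <!-- IIS request filtering limit, in bytes -->
        <requestLimits maxAllowedContentLength="2147483647" />
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```

Note the different units: `maxRequestLength` is in kilobytes and `maxAllowedContentLength` is in bytes, so the same number means very different limits in each.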

Lee

May 26, 2013, 6:10:28 AM
to rav...@googlegroups.com
If bulk insert uses a single request why are a proportion of the documents inserted successfully?

I feel 'features' like this are like traps, ready to catch developers unaware.  The semantics of a bulk insert mean that you are going to get people inserting high quantities of documents.  You then have a batch size associated with a bulk insert, which gives a false impression that the inserts will be split into batches.

What compounds this is that the developer gets no immediate feedback about this issue at all, and some but not all documents are inserted.

I hope that anybody using this feature in a production environment is also aware of this gotcha.

Thanks for your help.

Oren Eini (Ayende Rahien)

May 26, 2013, 6:14:20 AM
to ravendb
Because they are batches. Basically, we flush a batch to the server, which is processed immediately, not at request end.

Kijana Woodard

May 26, 2013, 6:47:29 AM
to rav...@googlegroups.com
@Lee - each batch is sent as a unit, as you stated you expect.
 
For feedback of what's going on, make sure to tie into the "Report" event:
bulkInsert.Report += BulkInsertReport;
 
The only gotcha I'm aware of is that if you overwrite existing documents without the proper options, it fails silently in 2330. I believe this has already been addressed in the unstable 2.5 builds. I _want_ to overwrite existing docs, so I use:
using (var bulkInsert = _store.BulkInsert(options: new BulkInsertOptions() { CheckForUpdates = true, BatchSize = 2048 }))
 
I'm not in production, but I've had a dev instance running bulk insert for weeks without crashing. It has had no problem importing ~6M docs.
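
Putting the two snippets above together, a minimal sketch might look like the following. It assumes the RavenDB 2.x client library is referenced, that a server is reachable at the placeholder URL, and that the `Report` event hands the handler a string message; the `Product` class and the document count are hypothetical.

```csharp
using System;
using Raven.Abstractions.Data;   // BulkInsertOptions
using Raven.Client.Document;     // DocumentStore

// Hypothetical document type, for illustration only.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder URL: point this at your own server.
        using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
        using (var bulkInsert = store.BulkInsert(options: new BulkInsertOptions
        {
            CheckForUpdates = true, // allow overwriting documents that already exist
            BatchSize = 2048
        }))
        {
            // Assumed handler shape: one string message per progress report.
            bulkInsert.Report += message => Console.WriteLine(message);

            for (var i = 0; i < 10000; i++)
            {
                bulkInsert.Store(new Product { Id = "products/" + i, Name = "Product " + i });
            }
        } // disposing the operation completes the bulk insert
    }
}
```

The handler only prints progress messages; this sketch makes no claim about how server-side failures are surfaced.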

Lee

May 26, 2013, 7:08:55 AM
to rav...@googlegroups.com
The report event is certainly useful from a progress point of view but it still doesn't give any feedback that anything has gone wrong in my case.  In fact it gives the developer even more misplaced confidence that everything has gone ok.

Justin A

May 26, 2013, 7:14:09 AM
to rav...@googlegroups.com
>>  I _want_ to overwrite existing docs, so I use:
>> using (var bulkInsert = _store.BulkInsert(options: new BulkInsertOptions() { CheckForUpdates = true, BatchSize = 2048 }))
 

So CheckForUpdates means it will overwrite an existing document with the same Id? I thought it was the opposite: that CheckForUpdates would throw a concurrency error or something if the document exists.

  • CheckForUpdates - enables document updates (default: false)

Damn it! I've had it backwards.

/facepalm !
