Bulk Insert database commands question


Michael Hallock

unread,
Jul 29, 2014, 11:17:05 AM7/29/14
to rav...@googlegroups.com
If I am using a bulk insert operation to store a document, and then right after that using bulkInsert.DatabaseCommands.PutAttachment to put a related attachment into the database, in what order does that happen?

I.e., does a bulkInsert DatabaseCommand get executed immediately, or are DatabaseCommands ALSO handled in bulk?

Kijana Woodard

unread,
Jul 29, 2014, 11:24:44 AM7/29/14
to rav...@googlegroups.com
Do you mean after the Bulk Insert using block exits or within the using block?




Michael Hallock

unread,
Jul 29, 2014, 11:28:26 AM7/29/14
to rav...@googlegroups.com
Within. E.g.:

 using (var bulkInsert = @this.BulkInsert(database, new BulkInsertOptions { CheckForUpdates = true }))
 {
     bulkInsert.Store(document, DocumentType + "/" + document.Id);
     bulkInsert.DatabaseCommands.PutAttachment(attId, null, data, RavenJObject.FromObject(new FileMetadata
     {
         FileName = string.Format("{0}.{1}", document.Name, document.FileType.ToLower().TrimStart('.')),
         MimeType = Constants.MimeTypes[document.FileType.ToLower()],
         Disposition = "inline",
         LastModified = DateTime.Now
     }));
 }

Kijana Woodard

unread,
Jul 29, 2014, 12:10:21 PM7/29/14
to rav...@googlegroups.com
I've never used attachments.

However, bulk insert completes in batches, not after each store. I would expect that the PutAttachment will quite often arrive at the server before the document. That said, I don't know whether attachments care if the document is present or not. The semantics of English would make it seem that the document should be present first.

A couple of other possible approaches:
- Keep a list of the names/file types and do the attachments after the bulk insert is complete.
- Write the file metadata as separate real documents with bulk insert. Make their id be document.Id + "/fileMetadata".
- Put that FileMetadata either in the document or in the document's metadata. If in the document, you can use classes that don't have the file metadata to avoid inadvertent changes.
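The first suggestion could be sketched roughly as below. This is only a sketch, not a tested implementation: `AttachmentInfo` and `pendingAttachments` are hypothetical names, `@this` is the document store as in the earlier example, and the RavenDB calls mirror the ones already shown in the thread:

```csharp
// Collect attachment work during the bulk insert, flush it afterwards.
// AttachmentInfo is a hypothetical holder for whatever PutAttachment needs later.
var pendingAttachments = new List<AttachmentInfo>();

using (var bulkInsert = @this.BulkInsert(database, new BulkInsertOptions { CheckForUpdates = true }))
{
    foreach (var document in documents)
    {
        bulkInsert.Store(document, DocumentType + "/" + document.Id);

        // Don't touch DatabaseCommands inside the bulk insert scope;
        // just remember what to attach once the documents are flushed.
        pendingAttachments.Add(new AttachmentInfo { Id = attId, Data = data, Document = document });
    }
} // Dispose flushes the remaining batch to the server.

// The documents are now on the server, so the attachments can follow.
foreach (var att in pendingAttachments)
{
    @this.DatabaseCommands.PutAttachment(att.Id, null, att.Data,
        RavenJObject.FromObject(new FileMetadata { FileName = att.Document.Name }));
}
```

This sidesteps the ordering question entirely, at the cost of holding the attachment data (or at least handles to it) in memory until the bulk insert completes.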

Chris Marisic

unread,
Jul 29, 2014, 1:46:40 PM7/29/14
to rav...@googlegroups.com
I strongly recommend against raven file attachments. There is not enough tooling around them to appropriately administer a real application in the wild.

I recommend using Azure Blob Storage instead.

Also RavenDB works best on high performance SSD disks, it's a horrible waste to stick binary on them.

Kijana Woodard

unread,
Jul 29, 2014, 1:51:56 PM7/29/14
to rav...@googlegroups.com
The reason I didn't suggest physical files is the example attachment seemed to be metadata, not the actual file.

Michael Hallock

unread,
Jul 29, 2014, 1:52:08 PM7/29/14
to rav...@googlegroups.com
I got outvoted on this particular implementation detail.

Chris Marisic

unread,
Jul 29, 2014, 1:57:56 PM7/29/14
to rav...@googlegroups.com
I recommend getting it in writing explicitly if you haven't yet. It will be a great insurance policy to have at a later time.

Michael Hallock

unread,
Jul 29, 2014, 1:58:44 PM7/29/14
to rav...@googlegroups.com
Well, I guess it WOULD be ideal that they go in at the same time, but that's not really what I'm hinting at... this is sort of a preliminary question so I can ask another one.

We are running into an issue that looks an awful lot like a timeout issue of some sort with this approach, and I'm really looking for some insight into how Raven handles this so I can debug THAT issue. From the way the API looks, I would assume that the bulkInsert.DatabaseCommands stuff runs immediately instead of being inserted into the bulk insert operation (even if the latter would make sense...).

If it DOES work where they are put into the batch execution along with the other commands, it could help point me in the right direction to ask another question, specifically: what would make a bulk insert seemingly time out when pushing a couple hundred megs of attachments through DatabaseCommands.PutAttachment?

But we actually see this behavior in other places as well besides just this attachment loader, so this is more of an educational exercise at this point. I believe one of my other team members asked the question I was gearing up for already (may still be in moderation for the list though). 

I'm still curious though, just because I'd like to know how that particular aspect of this works.



Michael Hallock

unread,
Jul 29, 2014, 2:00:17 PM7/29/14
to rav...@googlegroups.com
Oh no, we are actually sending the attachment over too. See the "data" argument being passed in PutAttachment in my example.

Chris Marisic

unread,
Jul 29, 2014, 2:04:52 PM7/29/14
to rav...@googlegroups.com




Could be default ASP.NET / IIS connection timeouts, or data limits. They are very low out of the box, to limit the impact of DoS attacks; the more you crank up the limits, the easier it is to DDoS your endpoints.

Kijana Woodard

unread,
Jul 29, 2014, 2:08:56 PM7/29/14
to rav...@googlegroups.com
In that case, I would recommend storing the files somewhere else.

Maybe it will help [rekindle?] the argument to add the fact that attachments will be deprecated in favor of RavenFS in 3.0.


Michael Hallock

unread,
Jul 29, 2014, 11:09:09 PM7/29/14
to rav...@googlegroups.com
We tried upping the max request length to 2 gigs, and our current timeout is 1200 seconds (20 minutes) on the Raven server. We also tried setting the JSON timeout on the docstore, as we found an older topic that suggested that.

None of which worked, unfortunately.

Michael Hallock

unread,
Jul 29, 2014, 11:11:40 PM7/29/14
to rav...@googlegroups.com
Actually, that was one of the reasons I got outvoted. Oren told us that there would be an upgrade path for that. Either way, I don't really have the option to change that at this point...

Michael Hallock

unread,
Jul 29, 2014, 11:13:05 PM7/29/14
to rav...@googlegroups.com
Lol, no problem, I made my opinions widely known on the topic. I'm bought in with the team, so at this point I just need to find a way to make the chosen approach work well.

Oren Eini (Ayende Rahien)

unread,
Jul 30, 2014, 2:52:10 AM7/30/14
to ravendb
When you use Fiddler to capture the traffic...
1) what is the client side error?
2) What do you see in Fiddler?



Oren Eini

CEO


Mobile: + 972-52-548-6969

Office:  + 972-4-622-7811

Fax:      + 972-153-4622-7811






Chris Marisic

unread,
Jul 30, 2014, 9:53:08 AM7/30/14
to rav...@googlegroups.com
Remember that there are two sets of dials and knobs, one for IIS itself and one for ASP.NET. Make sure you've set both accordingly; if you only changed one set, the other might still be closing your connection.
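For reference, the two sets of limits live in different sections of web.config. The values below are illustrative only (they echo the 2 GB / 1200 second figures mentioned in this thread, not recommendations); note the unit mismatch between the two settings:

```xml
<!-- ASP.NET limits: maxRequestLength is in KB, executionTimeout in seconds -->
<system.web>
  <httpRuntime maxRequestLength="2097152" executionTimeout="1200" />
</system.web>

<!-- IIS 7+ request filtering: maxAllowedContentLength is in BYTES -->
<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="2147483648" />
    </requestFiltering>
  </security>
</system.webServer>
```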

Michael Hallock

unread,
Jul 30, 2014, 10:59:54 AM7/30/14
to rav...@googlegroups.com
We are setting the httpRuntime maxRequestLength and executionTimeout. Which other knobs are you talking about?

Oren Eini (Ayende Rahien)

unread,
Jul 30, 2014, 11:03:07 AM7/30/14
to ravendb
Can you check through Fiddler?




Michael Hallock

unread,
Jul 30, 2014, 11:37:06 AM7/30/14
to rav...@googlegroups.com
So just to be clear before posting the Fiddler logs:

These are attachments that are being uploaded inside of a bulkInsertOperation scope, e.g.:

 using (var bulkInsert = @this.BulkInsert(database, new BulkInsertOptions { CheckForUpdates = true }))
 {
     bulkInsert.Store(document, DocumentType + "/" + document.Id);

     bulkInsert.DatabaseCommands.PutAttachment(attId, null, data, RavenJObject.FromObject(new FileMetadata
     {
         FileName = string.Format("{0}.{1}", document.Name, document.FileType.ToLower().TrimStart('.')),
         MimeType = Constants.MimeTypes[document.FileType.ToLower()],
         Disposition = "inline",
         LastModified = DateTime.Now
     }));
 }


Also: We DO have authentication turned on and are using an API key for this operation.

This starts off and runs happily for a while, until:

System.Net.WebException: The request was aborted: The request was canceled.
    at System.Net.HttpWebRequest.BeginGetResponse(AsyncCallback callback, Object state)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncImpl(Func`3 beginMethod, Func`2 endFunction, Action`1 endAction, Object state, TaskCreationOptions creationOptions)
   at System.Threading.Tasks.TaskFactory`1.FromAsync(Func`3 beginMethod, Func`2 endMethod, Object state)
   at System.Net.WebRequest.<GetResponseAsync>b__8()
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Client.Connection.HttpJsonRequest.<RawExecuteRequestAsync>d__4b.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Client.Document.RemoteBulkInsertOperation.<DisposeAsync>d__18.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Client.Document.RemoteBulkInsertOperation.Dispose()
   at Raven.Client.Document.BulkInsertOperation.Dispose()


Now judging from THAT, it looks like the error is popping up on the Dispose of the BulkInsertOperation... which might imply that it's actually the bulk insert operation that is timing out here, but I'd have to assume the following is true for that:

1. I start a bulk insert operation. This opens a connection to the server and holds it open.
2. I run ONLY databaseCommands inside of that scope, which I THINK bypass the bulkInsert connection altogether and make their own requests, yes?
3. This long-running operation, with no activity on the bulkInsert connection, causes the connection to be closed from the remote side.
4. At the end, the bulkInsert operation attempts to dispose this connection, finds it already closed, and freaks out.

Is my logic sound here? I definitely see in Fiddler that this error finally happens on an "unwatch-bulk-operation" command, which appears to be making a request WITHOUT the authentication API key (I assume because it thinks its session is already authenticated somehow?) and then fails with a 401 Unauthorized.

Oren Eini (Ayende Rahien)

unread,
Jul 30, 2014, 11:40:43 AM7/30/14
to ravendb
BulkInsert uses its own connection, and the only way to interact with it is via Store(), nothing else.
What might happen here is that loading the attachment is taking so long, that you are actually timing out on the connection.

I don't think that the unwatch is the issue here, though.





Michael Hallock

unread,
Jul 30, 2014, 11:52:18 AM7/30/14
to rav...@googlegroups.com
That's what I'm saying. The unwatch just happens to be the last operation that the bulkInsert gets to call during its Dispose, but since its connection (or its authentication session) has already been dropped, that call fails. Alternatively, it could be happening as it tries to close an already abandoned connection. That was kind of why I started the original thread: to get some insight into the actual process behind a bulkInsertOperation.

Either way, I think this pinpoints our issue... The question would be this:

If this is due to an authentication session going away, is that expected behavior here? I'm just curious whether that would be seen as a bug in this case because we are using auth, or not. Having a little more insight into how the BulkInsertOperation works under the hood would have caught this potential bug in our code earlier, and I imagine it would take some changes to how you handle the connection on bulk inserts to close the loop on your side. I'm just curious whether this is expected behavior, or whether we should be talking about filing a bug report.

It sounds like our fix options are:
1. Keep the bulkInsert, but actually call a Store operation and store our own document as well as the attachment that said document references. This is possible, and maybe even advisable, since there's no queryability against attachments right now, but it makes our delete operations a bit more involved (not much, though).
2. Use docStore directly and abandon the BulkInsertOperation around this particular loader. It would require a little rearchitecting, since our base class for these loaders ASSUMES a wrapping bulkInsert, but it's certainly doable.
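Option 2 (the route taken at the end of the thread) could look roughly like this. A sketch only, under the assumptions that `store` is the IDocumentStore, `documents` is the loader's input, and `attId`/`data` are computed per document as in the earlier examples:

```csharp
// Sketch of option 2: skip BulkInsert entirely for the attachment-only loader
// and go straight through sessions/DatabaseCommands, so there is no idle
// bulk-insert connection sitting open while large attachments stream up.
foreach (var document in documents)
{
    // Store the document through a regular session.
    using (var session = store.OpenSession(database))
    {
        session.Store(document, DocumentType + "/" + document.Id);
        session.SaveChanges();
    }

    // Each PutAttachment is its own HTTP request, so no long-lived
    // connection can time out underneath us.
    store.DatabaseCommands.PutAttachment(attId, null, data,
        RavenJObject.FromObject(new FileMetadata { FileName = document.Name }));
}
```

The trade-off versus bulk insert is one round-trip per document instead of batched writes, which is usually acceptable when the attachments dominate the transfer time anyway.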

Oren Eini (Ayende Rahien)

unread,
Jul 30, 2014, 11:54:03 AM7/30/14
to ravendb
No, that isn't the case. The actual problem is the timeout.
Then you have a cascading failure on top of that, probably.

The auth session is not relevant. On a 401, the client should be retrying and generating a new token.

Michael Hallock

unread,
Jul 30, 2014, 11:57:59 AM7/30/14
to rav...@googlegroups.com
Ok. My "fixes" are still relevant though, yes? Sounds like either I need to make sure I'm actually using the bulkInsert.Store operation in that loop, or I need to drop the wrapping bulkInsert, correct?

Oren Eini (Ayende Rahien)

unread,
Jul 30, 2014, 12:01:15 PM7/30/14
to ravendb
Yes.

Oren Eini (Ayende Rahien)

unread,
Jul 30, 2014, 12:01:27 PM7/30/14
to ravendb
BulkInsert does absolutely nothing if you aren't using the store.

Michael Hallock

unread,
Jul 30, 2014, 12:51:48 PM7/30/14
to rav...@googlegroups.com
Yeah, we just went down that road because we had these generalized loaders that all used bulk insert, and this particular one only does attachments but inherits from the base loader class we had. I just changed it to use DocStore directly, and everything seems fine.

Thanks all.