RavenDB exception and crashing client IIS app

Ian Cross

unread,

Aug 10, 2017, 10:35:50 AM8/10/17

to RavenDB - 2nd generation document database

Hi guys,

We are seeing this error message when the system is attempting to do a 'bulk' load and then this appears to be having a knock-on impact of causing our application (running in IIS) to sometimes crash.

EXCEPTION: Raven.Abstractions.Connection.ErrorResponseException: A task was canceled.

at Raven.Client.Connection.Implementation.HttpJsonRequest.<>c__DisplayClass36_0.<<SendRequestInternal>b__0>d.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Raven.Client.Connection.Implementation.HttpJsonRequest.<RunWithAuthRetry>d__38`1.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Raven.Client.Connection.Async.AsyncServerClient.<DirectGetAsync>d__74.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Raven.Client.Connection.ReplicationInformerBase`1.<TryOperationAsync>d__34`1.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Raven.Client.Connection.ReplicationInformerBase`1.<ExecuteWithReplicationAsync>d__33`1.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Raven.Client.Connection.Async.AsyncServerClient.<ExecuteWithReplication>d__164`1.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Raven.Abstractions.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

at Raven.Abstractions.Util.AsyncHelpers.RunSync[T](Func`1 task)

at Raven.Client.Document.DocumentSession.LoadInternal[T](String[] ids)

Rest of stack trace is client app.

We have potentially seen some issues where the system is trying to connect with the DR Database server as part of this process. The DR server is in a completely different network and the RavenDB client in our application is unable to see that server so we are wondering whether this is causing IIS to crash?

The requested URL could not be retrieved. The following error was encountered while trying to retrieve the URL: <a href="http://DR_DB_IPADDRESS:8080/databases/xxxxxxx_dr//queries/?">http://DR_DB_IPADDRESS:8080/databasesxxxxxxx_dr//queries/?</a

"The remote host or network may be down. Please try the request again."

Why would the client be trying to contact the DR server as part of running a session.Load[] request?

Would splitting the Load[] request into multiple requests with less documents be sensible? We need to keep them in the same session though ideally.

Should we change the replication settings somehow to prevent this Raven client behaviour?

There are probably 2000 ish documents ranging in size but they can be big - 2.5MB is not unheard of, but normally 1MB for the particular documents we are loading.

Cheers,

Ian

Oren Eini (Ayende Rahien)

unread,

Aug 10, 2017, 10:43:05 AM8/10/17

to ravendb

This is typically a timeout

How many items are you loading here?

Hibernating Rhinos Ltd

Oren Eini l CEO l Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ian Cross

unread,

Aug 10, 2017, 10:50:34 AM8/10/17

to RavenDB - 2nd generation document database

Thanks for response Oren,

Sorry, I know it was a long email…. At the bottom, there are these comments, in particular see the comment about the RavenDB client attempting to contact a DR RavenDB server (that it can’t see on the network):

There are probably 2000 ish documents ranging in size but they can be big - 2.5MB is not unheard of, but normally 1MB for the particular documents we are loading.

Why would the client be trying to contact the DR server as part of running a session.Load[] request?

Would splitting the Load[] request into multiple requests with less documents be sensible? We need to keep them in the same session though ideally.

Should we change the replication settings somehow to prevent this Raven client behaviour?

Cheers,

Ian

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.

Oren Eini (Ayende Rahien)

unread,

Aug 10, 2017, 11:09:44 AM8/10/17

to ravendb

You are trying to load multiple GB in the same request.

That is not a good idea.

We are assuming that a load request is a pretty small one, and aren't streaming it, so that can be really inefficient.

You get a timeout from the main db, so we fail over to the DR, and also fail there, probably.

Splitting it would be much better, yes.

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Ian Cross

unread,

Aug 10, 2017, 11:21:44 AM8/10/17

to RavenDB - 2nd generation document database

Ok, makes sense - one last thing... can we control the client so that it does not attempt to try and contact DR? It is never going to be able to see the DR server.

Would either of these options in the Replication Settings UI stop it: changing 'Client Failover Behaviour' to 'Fail Immediately' or 'Failover' = 'Skip'

When do these changes in the Replication settings become active in the client?

Cheers,

Ian

Oren Eini (Ayende Rahien)

unread,

Aug 10, 2017, 11:23:06 AM8/10/17

to ravendb

You can set ReplicationBehavior = FailImmediately, yes.

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Ian Cross

unread,

Aug 12, 2017, 2:32:00 AM8/12/17

to RavenDB - 2nd generation document database

Ok, thanks. Sending the Load() requests in batches has made a difference, but the same issue then moved to the SaveChanges() call, I assume because the client is struggling to send a large volume of data to the server in a single transaction. We got around this using multiple sessions for now, although we do lose the ability to do this bulk load, update and save changes in a single transaction.

So, things are a bit better but IIS is still crashing and we are seeing this message regularly in the logs:

"System.TimeoutException: Over 45 seconds have passed since we got a server heartbeat, even though we should get one every 10 seconds or so.

This connection is now presumed dead, and will attempt reconnection"

I have searched the group and can only see one previous mention of this relating to BulkInsert, but that is not involved here. Keen to understand what this message is and what it might tell us. Could this also happen because of volume of data, even if we are batching reads / writes now?

We know there is a lot of data going to / from the client to the server. There do not seem to be any worrying alerts in the event log or in the Raven console on the database server.

Cheers,

Ian

Oren Eini (Ayende Rahien)

unread,

Aug 12, 2017, 7:41:15 AM8/12/17

to ravendb

A single SaveChanges with 2000 with 1.5 - 2.5 MB documents will do that. You should either break it apart to separate SaveChanges or use Bulk Insert to allow you to stream the results to the server.

What you are trying to do is a single transaction that spans multiple GBs, which is something pretty rare.

If you want to do that, you need to provision enough system resources to handle that, but what is the scenario that require so much ?

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Ian Cross

unread,

Aug 14, 2017, 6:33:51 PM8/14/17

to RavenDB - 2nd generation document database

The scenario is a very complex quote containing many line items. The majority of the line items have a seperate document containing the bespoke configuration of the item being sold and the items are being updated in bulk. An example would be to make a change that impacts all line items and therefore each line item needs to be retrieved, updated, calculated and the changes saved. Ideally this would happen within a transaction so that all succeed or fail together.

We have already changed the approach to Load() and SaveChanges() in batches and this has helped, but we still see this in our logs:

"System.TimeoutException: Over 45 seconds have passed since we got a server heartbeat, even though we should get one every 10 seconds or so.

This connection is now presumed dead, and will attempt reconnection"

I can see that this is related to the Changes API and no heartbeat coming from the server. What might this indicate and what resources are likely to be the contstraining factor? Memory and CPU utilization on the DB server are looking ok so I guess it might be network or storage?

The long-term approach will be to update the design to reduce the size of the documents, but in the meantime we need to make sure things continue to functon adequately.

Cheers, Ian

Oren Eini (Ayende Rahien)

unread,

Aug 15, 2017, 6:27:03 AM8/15/17

to ravendb

That is something that is usually auto recover. It is possible that the client / server / network is holding things up too much.

Can you send the full stack trace of this error?

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Ian Cross

unread,

Aug 15, 2017, 1:39:23 PM8/15/17

to RavenDB - 2nd generation document database

Hi Oren,

There's not really a stack trace here - from looking at the Raven Changes API client code, there is a background timer that runs to check if a heartbeat has been received recently and if not, it simply raises an exception which appears in our logs.

We have investigated with some perf counters today and Avg. Disk Sec / Read looks like the latency on the Raven DB data volume is having higher read latency under load. My assumption currently is that RavenDB is having to wait too long to get the data off the disk and stuff gets backed up, eventually causing the server to fail to send the heartbeat message for the Changes API. We are exploring this angle and seeing what we can do to increase SAN performance.

Cheers,

Ian

Oren Eini (Ayende Rahien)

unread,

Aug 15, 2017, 2:54:05 PM8/15/17

to ravendb

You are seeing this error _somewhere_, no? This should come with a full stack that explains what it is doing for this timeout.

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Ian Cross

unread,

Aug 15, 2017, 4:05:25 PM8/15/17

to RavenDB - 2nd generation document database

I may be being thick, but there is no stack trace. It is not happening as a result of us actually making a call to the Raven Client which then errors. I.e. if we did a Load() or a SaveChanges() then yes, we would get a Stack Trace.

However, this is happening because Raven Client has a background thread (on a timer) and a message has not been received from the server (reason unknown). The code below here is in Raven.Client.Changes:

private void ClientSideHeartbeat(object _)
        {
            TimeSpan elapsedTimeSinceHeartbeat = SystemTime.UtcNow - lastHeartbeat;
            if (elapsedTimeSinceHeartbeat.TotalSeconds < 45)
                return;
            OnError(new TimeoutException("Over 45 seconds have passed since we got a server heartbeat, even though we should get one every 10 seconds or so.\r\n" +
                                         "This connection is now presumed dead, and will attempt reconnection"));
        }

Am I making sense?

Cheers,

Ian

Oren Eini (Ayende Rahien)

unread,

Aug 16, 2017, 9:29:38 AM8/16/17

to ravendb

Okay, that isn't an issue, it is fine if this happens.

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Ian Cross

unread,

Aug 17, 2017, 4:50:26 AM8/17/17

to RavenDB - 2nd generation document database

Ok, thanks - we will ignore this

Reply all

Reply to author

Forward