Unexplained latency

Ido Ben Ari

unread,

Jul 22, 2016, 1:25:54 AM7/22/16

to RavenDB - 2nd generation document database

Hi,

I asked about this here few times in the last months. Tried few solutions but nothing really solved the problem, nor do I know what it really is.

Giving it another shot:

To make long story short:

C# MVC web service call the db server, fiddler shows the call is completed within 300 ms (for example) but it takes 1800 ms (for example) to get the results itself (after deseralization).

Happens once in few minutes.

Here is a link to my ANTS results session, the said method is "SearchDBCaap" (just in case someone want to look into it). https://drive.google.com/file/d/0B5BJ7O60hUr1ZjVIZmUxNEVQR3c/view?usp=sharing

Also attached the results in HTML format.

It doesn't seem like the problem is with deserialization. Even using dynamic didn't help here.

In short, I just see that most of the time is spent waiting. Can't figure out for what.

Seems like a lot of time is spent in RavenMethod called "BeginMessageLoop", and there in the line "this.workItemsWaiting.WaitOne();"

Any ideas are welcome....

Thank,

Ido

Results.htm

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 1:27:53 AM7/22/16

to ravendb

What happens when you use async?

Hibernating Rhinos Ltd

Oren Eini l CEO l Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ido Ben Ari

unread,

Jul 22, 2016, 1:37:58 AM7/22/16

to rav...@googlegroups.com

Same...

Sent from my iPhone

You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/PFrEq-asfCE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ravendb+u...@googlegroups.com.

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 1:40:17 AM7/22/16

to ravendb

And what does your document looks like?

Certain document structure can cost a lot to parse from strings to objects (regardless of the object type), see:

https://ayende.com/blog/175047/production-postmortem-the-insidious-cost-of-managed-memory?key=ea4150353c6c46e38f6559e08eb9481d

Ido Ben Ari

unread,

Jul 22, 2016, 2:06:19 AM7/22/16

to RavenDB - 2nd generation document database

Attaching example of the response I get from db. time in fiddler was 420ms overall_elapsed.

This is the results of using transformer, more than one document. I think my db structure is not ideal (hence the 400ms transformation) and I am changing it these days. But this does not seem to be the cause to the problem because what I'm trying to understand and resolve here is the difference between 400ms total request time in fiddler and 1.5 seconds to get the results in code (sometimes).

Thanks,

Ido

SearchResults.txt

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 2:13:47 AM7/22/16

to ravendb

It isn't this, it looks like something in your code is making a web request call. See:

System.ServiceModel.Dispatcher.DispatchOperationRuntime.DeserializeInputs (MessageRpc rpc)

Ido Ben Ari

unread,

Jul 22, 2016, 2:25:38 AM7/22/16

to RavenDB - 2nd generation document database

I have a WCF dispatch behavior attached to the service and it's called everytime an operation is called on this service.

I will remove it and see if it has any affect.

But, why would it affect the calls to raven db server?

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 2:31:59 AM7/22/16

to ravendb

It isn't, I'm looking at the trace and seeing what is expensive. I don't have ANTS Profiler. Do you have a dotTrace file?

Ido Ben Ari

unread,

Jul 22, 2016, 2:38:19 AM7/22/16

to RavenDB - 2nd generation document database

I see. I anyway wanted to remove this one, just to eliminate options.

I will generate a DotTrace profile session and will send it to you.

Thank you!

Ido Ben Ari

unread,

Jul 22, 2016, 1:32:28 PM7/22/16

to RavenDB - 2nd generation document database

Hi,

I put 2 DotTrace traces here: https://drive.google.com/folderview?id=0B5BJ7O60hUr1TUNrdTlVbkx1UFk&usp=sharing

it's 4 files , 2 for each trace (results and settings).

The problematic method name is "SearchDbCapp". In these snapshots there 71 and 137 calls. Not necessarily do these snapshot reflect the exact scenario, I see that the average time is 336ms. Yet they reflect the same flow.

The scenario is - SerchDbCapp calls the db, sometimes it takes >1.5 sec and on the same time fiddler shows ~400 ms.

Thanks,

Ido

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 1:50:17 PM7/22/16

to ravendb

This looks very much like we are waiting for a pooled connection, or something like that.

Basically, the reason you see the wait is the we are actually running async and wrap that in a sync method.

That is why you see the long wait in the message loop.

But the actual delay is in the execution of the query. You can use the HttpJsonRequestFactory.ConfigureRequest and LogRequest to time the requests from the client side.

My guess is that something is stalling the requests. You can compare the start time in your log to the fiddler time.

Maybe you have a limit on concurrent connections?

Ido Ben Ari

unread,

Jul 22, 2016, 2:24:01 PM7/22/16

to RavenDB - 2nd generation document database

According to what I saw yesterday, the unexplained gap is after the request was completed in fiddler already.

I will use these events to log query times.

Max connection was defined to be 50, I changed it as the default supposed to be 2 only according to what I read.

One more note, I don't think it's related by I have the following in the System.Net config:

<system.net>

</settings>

Thanks,

Ido

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 2:28:35 PM7/22/16

to ravendb

Can you clear _all_ config items?

Are you running this concurrently?

Ido Ben Ari

unread,

Jul 22, 2016, 2:32:38 PM7/22/16

to RavenDB - 2nd generation document database

You mean delete from the maxconnection and the 100continue from the configuration file? I will try that (I need the 2nd one, but will work for the test).

Running concurrently what? I took the traces from productivity server, and there is a concurrency at least in some of the cases.

Oren Eini (Ayende Rahien)

unread,

Jul 22, 2016, 2:45:29 PM7/22/16

to ravendb

Try this on a separate system, without concurrency.

You might just be seeing the I/O delay on the network

Ido Ben Ari

unread,

Jul 22, 2016, 4:22:15 PM7/22/16

to RavenDB - 2nd generation document database

I will do that, thought the problem happens from time to time in production and I couldn't reproduce locally so far.

UPDATE -

I used the LogRequest and ConfiureRequest events, had fiddler opened and waited for the problem to happen.

Findings:

Configure Request: 19:50:42.28

LogRequest : 19:50:44.68

Duration as appear in the log request: 2390 MS

Fiddler duration: 376 MS

Raven Statistic: 370 MS

Fiddler start: 19:50:42.285

Fiddler complete: 19:50.42.662

To summarize, this matches what I've seen so far..The unexplained gap is after the response completed. Clearly the code is waiting for something...But for what?

Could have been GC - but we are on server mode and this seems to consume to much time,

Could have been http connection - but doesn't see like there are that many connections.

Could have been thread - not sure what takes my thread then.

Any idea?

Ido Ben Ari

unread,

Jul 22, 2016, 7:51:17 PM7/22/16

to RavenDB - 2nd generation document database

In addition - noticed something weird, the LogRequest and ConfigureRequest events are called multiple times. 10-12 times per each call, in some of the cases. I double checked, there is a single call to db which generates multiple events.

Earlier I wrote:

I will do that, thought the problem happens from time to time in production and I couldn't reproduce locally so far.

UPDATE -

I used the LogRequest and ConfiureRequest events, had fiddler opened and waited for the problem to happen.

Findings:

Configure Request: 19:50:42.28

LogRequest : 19:50:44.68

Duration as appear in the log request: 2390 MS

Fiddler duration: 376 MS

Raven Statistic: 370 MS

Fiddler start: 19:50:42.285

Fiddler complete: 19:50.42.662

To summarize, this matches what I've seen so far..The unexplained gap is after the response completed. Clearly the code is waiting for something...But for what?

Could have been GC - but we are on server mode and this seems to consume to much time,

Could have been http connection - but doesn't see like there are that many connections.

Could have been thread - not sure what takes my thread then.

Any idea?

Oren Eini (Ayende Rahien)

unread,

Jul 23, 2016, 9:31:23 AM7/23/16

to ravendb

Huh.

Can you send us a way to reproduce this?

Ido Ben Ari

unread,

Jul 23, 2016, 11:46:01 AM7/23/16

to rav...@googlegroups.com

Do you refer the multipleLogRequest per call?

Sent from my iPhone

Oren Eini (Ayende Rahien)

unread,

Jul 23, 2016, 11:53:09 AM7/23/16

to ravendb

Yeah, and I general your issue

Ido Ben Ari

unread,

Jul 23, 2016, 12:02:02 PM7/23/16

to rav...@googlegroups.com

Ok, will find a way.

Sent from my iPhone

Ido Ben Ari

unread,

Jul 24, 2016, 1:05:51 AM7/24/16

to RavenDB - 2nd generation document database

Oren,

As for the multiple events generated per each call, while preparing reproduce code for this, I figured the issue -

I am opening a new session per each of those calls and then add the event handler.

using (var session = RavenDocumentStore.RavenDocumentStoreInstance.OpenSession())

{

session.Advanced.DocumentStore.JsonRequestFactory.LogRequest += JsonRequestFactory_LogRequest;

I didn't bother to remove the handler using -= " session.Advanced.DocumentStore.JsonRequestFactory.LogRequest -= JsonRequestFactory_LogRequest;"

I assume you pool the sessions/connections , not sure if this is a bug in your code or I should have expected this.

That way or another, that does not seem related to my original problem, I think.

Still not sure how to reproduce the real problem not in production in a reasonable way.

I was wondering, perhaps the face that the requests are pretty slow (between 50-400 ms) causes issues with the threads pool? I am using some caching so it happens that there are 10 parallel requests to the application which generate sometimes 5-20 db calls (some in parallel). However, in a 4 threads (logical cores) computer should it really be a problem?

I am attaching the test project since I already coded it. Here: https://drive.google.com/file/d/0B5BJ7O60hUr1bXpGWFI4a1RIM1k/view?usp=sharing

If you want to run it:

It assumes ravendb is installed in localhost:8080, it creates a new db "TestIdo"., You should first run the "FillDB()" test method in the unit test project, then run the console application.

Thanks,

Ido

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/PFrEq-asfCE/unsubscribe.

To unsubscribe from this group and all its topics, send an email to ravendb+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/PFrEq-asfCE/unsubscribe.

To unsubscribe from this group and all its topics, send an email to ravendb+unsubscribe@googlegroups.com.

Oren Eini (Ayende Rahien)

unread,

Jul 24, 2016, 2:43:33 AM7/24/16

to ravendb

If this isn't reproing, that isn't going to help.

Make sense on the LogRequest issue, that is global to the document store, you can do that once.

It is possible that this happens when the thread pool need to add more threads.

What happens if you add, in your app, ThreadPool.SetMinThreads(100,100) ?

Ido Ben Ari

unread,

Jul 25, 2016, 3:23:23 PM7/25/16

to RavenDB - 2nd generation document database

Yes, I get you about the LogRequest, my mistake.

The good news, ThreadPool.SetMinThreads(100,100) seems to resolve the problem! Thank you so much for helping about that!

I think it's relate to the requests pattern in my system. There are requests bursts from time to time , so we can have like 1 request every few seconds and then a burst of 15 concurrently every 5-10 minutes.. I think it confused the Thread pool's thread allocation algorithm.

I am tracking the behavior now to see the influence of this change for 24 hours.

Obviously part of the problem is the long duration in the db level which I still have to solve.

Thank,

Ido

Oren Eini (Ayende Rahien)

unread,

Jul 25, 2016, 4:19:52 PM7/25/16

to ravendb

Okay, if that is the case, it explains it. The wait is the ThreadPool waiting to spin new threads in response for the new work.

Reply all

Reply to author

Forward