40 CPUs, all at 100% usage by Raven


Media Lab

Mar 16, 2017, 4:56:00 PM
to RavenDB - 2nd generation document database
We're running Client Build #35191. When we send 20 calls per second to our web server, which ends up backing up IIS, we notice that the server running RavenDB starts spiking its CPU to 100%. We even bumped the server up to 40 CPUs, and all 40 CPUs spiked.

Normally it runs 12 CPUs, but even then, as soon as a few dozen requests come in at a time, Raven spikes.

None of the older threads about CPU usage seem to match, as they were for much, much older builds.

Can anyone shed any light on where to look?

Jahmai Lay

Mar 16, 2017, 6:27:33 PM
to RavenDB - 2nd generation document database
I'm sure Oren will save the day, but useful info is:

Has anything changed (new indexes, server config, size of data)?
Are the calls writes that affect indexes? What do they look like?
Do you have replication set up? What is the config, and is the status healthy?
Do you have a Debug log you can share?

Media Lab

Mar 16, 2017, 8:31:26 PM
to RavenDB - 2nd generation document database
Just to clarify one small mistake - we have 2 servers with 6 CPUs regularly, not 12 on one. My mistake.

Media Lab

Mar 16, 2017, 9:23:59 PM
to RavenDB - 2nd generation document database
We're still learning the ropes on Raven - how do I find indexes that could be causing issues?

Michael Yarichuk

Mar 17, 2017, 5:03:40 AM
to RavenDB - 2nd generation document database
Hi,

* Indexes: to see what the indexes are doing, take a look in the Studio, at database view --> Status --> Indexing.
* Replication: you can check the status of the replication cluster in the Studio (https://ravendb.net/docs/article-page/3.5/csharp/studio/management/server-topology)

Couple of questions:
* Do you have debug logging enabled? Do you see anything in the logs?
* Do you see anything in Windows Events logs?
* Did you have this issue before? When did it start?
* Is it possible to reproduce it if needed?
* Can you take a debug-info package when RavenDB spikes? I'd like to take a look. (https://ravendb.net/docs/article-page/3.5/csharp/studio/management/gather-debug-info)
You can send it to sup...@hibernatingrhinos.com


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Best regards,

 

Hibernating Rhinos Ltd

Michael Yarichuk l RavenDB Core Team 

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 

RavenDB paving the way to "Data Made Simple"   http://ravendb.net/  

Oren Eini (Ayende Rahien)

Mar 17, 2017, 7:58:07 AM
to ravendb
Is this something that you can reliably reproduce? 

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969


Media Lab

Mar 17, 2017, 8:18:22 AM
to RavenDB - 2nd generation document database
The worst part? No... Sometimes everything is fine, like this morning. I'm running stress tests, and while IIS is clogging up, Raven is running perfectly fine.

Here are the scenarios where we see the main problems.

We work in the home builder industry. Our clients need to update statuses on Units to things like "Sold" or "Available", or change price, square footage, etc. So we have a RESTful Web API endpoint where they can PUT the full unit object.

Before they do the PUT, they also call a GET to get the most current info on the Unit. They look the unit up by an id of their own, not ours, since they may not know it's Units/38213 and only know it as PH338aZ. So we query on unit.ClientUniqueIdentifier and unit.ClientId. There is an index for those two fields.

Then they send ~300 updated units as ~300 individual PUT requests. IIS backs up because Raven slows down. It's not that the requests themselves are slow: according to Raven, the calls complete internally in only a few ms, but with the CPU spike it's as if Raven can't even send the information back. These delays cause IIS to stack up its request queue, which only makes things worse. During this time, Raven's CPU usage simply spikes. My assumption is that a rogue index is being re-indexed on each PUT request.

We are looking at a batch update for them, but that's not going to happen in our current sprint or even the next one: the client doesn't work in 2-week sprints, and changes sometimes take them 6 weeks to get to.
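As a rough illustration of the batching idea (a sketch only: `chunked`, the payload fields, and the batch size of 50 are assumptions, not anything from the actual API), ~300 per-unit updates could be grouped into a handful of batch requests like this:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Hypothetical unit payloads; the field names are made up for this example.
units = [{"ClientUniqueIdentifier": f"PH{i}", "Status": "Sold"} for i in range(300)]

# Each batch would become one request body instead of 50 separate PUTs.
batches = list(chunked(units, 50))
print(len(batches), "batch requests instead of", len(units), "PUTs")
```

Whether batching would actually help depends on where the time is going, which this thread had not established at this point.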

Another odd spike that tends to happen, though not all the time, is when stress testing the server by requesting a single document, like Units/1. I'll make 200 requests per second and immediately the CPU jumps up to 80-90% usage and stays there, all the while IIS is clogging up and not returning data.
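For reference, a burst test of this kind can be sketched roughly as follows. This is not the tool used here; `run_burst` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for the real HTTP GET of Units/1 so the sketch runs on its own:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_burst(fetch, doc_id, requests, workers=20):
    """Call fetch(doc_id) `requests` times concurrently; return per-call latencies in ms."""

    def one_call(_):
        t0 = time.perf_counter()
        fetch(doc_id)
        return (time.perf_counter() - t0) * 1000.0

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_call, range(requests)))


def fake_fetch(doc_id):
    # Placeholder for the real call to the API; sleeps briefly to simulate work.
    time.sleep(0.001)
    return {"Id": doc_id}


latencies = run_burst(fake_fetch, "Units/1", requests=200)
print(f"{len(latencies)} calls, worst latency {max(latencies):.1f} ms")
```

Measuring latency on the caller's side like this includes any time a request spends queued before the handler runs, which a server-side timer may not see.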

On the API calls, I have a filter that starts a stopwatch as soon as the call comes in and, once the call has executed, writes an API log entry to Raven recording which URL was called, the user who called it, when it started, and how long the call took. Those entries still report only several milliseconds for the entire call, so it's not like Raven itself is being sluggish.
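The shape of such a timing filter, sketched in Python purely for illustration (the real filter lives in the Web API project; `timed`, `api_log`, and the field names below are assumptions, and the list stands in for the log documents written to Raven):

```python
import time
from functools import wraps

api_log = []  # stand-in for the API log entries stored in Raven


def timed(handler):
    """Wrap a handler so each call records URL, user, start time, and duration."""

    @wraps(handler)
    def wrapper(request):
        started_at = time.time()   # wall-clock start, for the log entry
        t0 = time.perf_counter()   # monotonic clock, for the duration
        try:
            return handler(request)
        finally:
            # Runs whether the handler returned or raised.
            api_log.append({
                "Url": request["url"],
                "User": request["user"],
                "StartedAt": started_at,
                "DurationMs": (time.perf_counter() - t0) * 1000.0,
            })

    return wrapper


@timed
def get_unit(request):
    # Hypothetical handler; a real one would load the unit from the database.
    return {"Id": "Units/1"}


get_unit({"url": "/api/units/1", "user": "demo"})
print(api_log[0]["Url"], round(api_log[0]["DurationMs"], 3), "ms")
```

A timer like this only covers the span between the wrapper starting and the handler finishing, so any waiting that happens before the wrapper is entered does not show up in the logged duration.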

I'm just trying to learn what is causing this so I can fix it (and try to get more confidence in Raven as a solution).

Oh, I can't generate any debugging information. I click on Create Info Package With Stack Traces and the button blinks quickly, then does nothing.

Oren Eini (Ayende Rahien)

Mar 17, 2017, 8:37:06 AM
to ravendb
Can you take a minidump of the process while it is in high CPU mode?
It is best if you can take 3 - 5 dumps a few seconds apart while the CPU is high.

Media Lab

Mar 17, 2017, 8:41:42 AM
to RavenDB - 2nd generation document database
Be glad to. How do I do that in Raven?

Oren Eini (Ayende Rahien)

Mar 17, 2017, 8:43:35 AM
to ravendb
The easiest way is to use Process Explorer: right-click on the process, then select dump -> Mini Dump.


Media Lab

Mar 17, 2017, 8:47:10 AM
to RavenDB - 2nd generation document database
Ah, I thought it was a Raven button that I couldn't find ;)

As soon as I can get it to clog back up, I'll take one.



Media Lab

Mar 22, 2017, 7:57:23 AM
to RavenDB - 2nd generation document database
We were able to resolve this issue. I cleared out all the auto-created indexes, created permanent ones, and the system started running just fine. We had other issues with the server itself as well, so it may not have been that one thing.

Regardless, under load, Raven performs quite well.

Oren Eini (Ayende Rahien)

Mar 22, 2017, 8:24:08 AM
to ravendb
Okay, great, although I can't see how this would be related, unless you had a very large number of indexes?


Media Lab

Mar 22, 2017, 8:47:17 AM
to RavenDB - 2nd generation document database
We were experiencing other issues at the same time, so it wasn't necessarily Raven alone. Whatever the combination of issues was, it's gone. Not only is Raven no longer running at 100%, its performance has been impressive under load.