Raven 3.5 RC 2 - IIS Worker Process maxing out CPU

134 views
Skip to first unread message

Jon van Doore

unread,
Sep 11, 2016, 9:01:57 PM9/11/16
to RavenDB - 2nd generation document database

I've observed this with both my local machine after the RC2 update and a server in Azure. My local machine is very beefy, 6 cores,32 gb ram on Windows 10. The server is pretty well specced Standard DS11 v2 (2 cores, 14 GB memory) on premium level 2 disks.

The system becomes pretty much unusable, the IIS Worker Process maxes out the CPU and stays there until an app pool recycle gets triggered. After a recycle it goes back to normal. 

This server isn't serving any traffic yet, there is nothing happening on that box, it's been set up in preparation for go live a week from now.

I managed to capture the logs (The names of the dbs have been altered for privacy) of it during the peak and than again after the restart. Nothing should be going on at all on this raven installation, as can be seen from the after reset logs, it's extremely quiet.

Thanks,

Jon
logs-after-reset.json
logs-during-peak.json

Maxim Buryak

unread,
Sep 12, 2016, 4:18:15 AM9/12/16
to rav...@googlegroups.com
Hi,
can you elaborate on the kind of databases you have there? databases amount, number of indexes in each database etc?



Best Regards,

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Maxim Buryak l Core Team Developer Mobile:+972-54-217-7751

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/  



--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Oren Eini (Ayende Rahien)

unread,
Sep 12, 2016, 4:19:22 AM9/12/16
to ravendb
Also, do you know if there is anything that would trigger this? 
Can you send the Debug Info Package when that happens?

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Jon van Doore

unread,
Sep 12, 2016, 4:46:48 AM9/12/16
to RavenDB - 2nd generation document database
Hi Maxim,

There are 11 databases on this server, it has a duplicate with master <> master replication going between them. There are very few indexes in the databases, most only have the 2 RavenDb default indexes and an NServiceBus Outbox index. The most indexes we have is a db with 2 user created indexes. Simple map indexes, no multi-maps or even map reduce.

There is no data in these dbs at all yet, only the replication destinations documents and the indexes as placeholders ready for go live.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.

Jon van Doore

unread,
Sep 12, 2016, 4:58:07 AM9/12/16
to RavenDB - 2nd generation document database

If I catch it in the act again I'll take a debug info dump from the admin portal, although it may not work if it's performance intensive as even Traffic Watch really struggled with the cpu capping out at 100%

I did do a server backup a few hours prior, I'm not sure how long it was peaking for but it may have been since that time. 


On Monday, September 12, 2016 at 6:19:22 PM UTC+10, Oren Eini wrote:
Also, do you know if there is anything that would trigger this? 
Can you send the Debug Info Package when that happens?

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


On Mon, Sep 12, 2016 at 11:17 AM, Maxim Buryak <ma...@ravendb.net> wrote:
Hi,
can you elaborate on the kind of databases you have there? databases amount, number of indexes in each database etc?



Best Regards,

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Maxim Buryak l Core Team Developer Mobile:+972-54-217-7751

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/  


On Mon, Sep 12, 2016 at 4:01 AM, Jon van Doore <jpv...@gmail.com> wrote:

I've observed this with both my local machine after the RC2 update and a server in Azure. My local machine is very beefy, 6 cores,32 gb ram on Windows 10. The server is pretty well specced Standard DS11 v2 (2 cores, 14 GB memory) on premium level 2 disks.

The system becomes pretty much unusable, the IIS Worker Process maxes out the CPU and stays there until an app pool recycle gets triggered. After a recycle it goes back to normal. 

This server isn't serving any traffic yet, there is nothing happening on that box, it's been set up in preparation for go live a week from now.

I managed to capture the logs (The names of the dbs have been altered for privacy) of it during the peak and than again after the restart. Nothing should be going on at all on this raven installation, as can be seen from the after reset logs, it's extremely quiet.

Thanks,

Jon

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.

Maxim Buryak

unread,
Sep 12, 2016, 6:13:50 AM9/12/16
to rav...@googlegroups.com
Also, when you encounter the high cpu problem again, please send us the memory dump, alongside with the debug info package. 
You may send it directly to sup...@ravendb.com for security



Best Regards,

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Maxim Buryak l Core Team Developer Mobile:+972-54-217-7751

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/  



To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Jahmai Lay

unread,
Sep 12, 2016, 6:14:08 AM9/12/16
to RavenDB - 2nd generation document database

I find its best to use Task Manager to reduce the CPU capped process priority to Below Normal or something before trying to take the dump image.

Jon van Doore

unread,
Sep 12, 2016, 10:03:55 PM9/12/16
to RavenDB - 2nd generation document database
Ok I've got the dump and the debug of it happening again. I think it's pretty safe to say it happens after a full database backup of all tenants.

I'll send the debug package and the script we're using to backup everything to that email address. The memory dump is about 800mb so emailing it isn't an option, do you have another preferred method for receiving these?

Thanks,

Jon

Jon van Doore

unread,
Sep 26, 2016, 9:16:39 PM9/26/16
to RavenDB - 2nd generation document database

To add a little more info to this after hitting it again with Build #13 of RC2, I have had logging turned on for a few days to trace down a different issue and this happened again.

I've attached 2 logs, one from the master that was maxing out and the other from it's duplicate master replication partner for reference. Hopefully this helps.
other-master.zip
cpu-maxout-master.zip

Grisha Kotler

unread,
Sep 27, 2016, 3:16:34 AM9/27/16
to rav...@googlegroups.com
What is the backup interval?

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Grisha Kotler l RavenDB Core Team Developer Mobile: +972-54-586-8647

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Jon van Doore

unread,
Sep 27, 2016, 3:22:47 AM9/27/16
to RavenDB - 2nd generation document database

The backup is scheduled to run every day once at 1pm UTC.

Grisha Kotler

unread,
Sep 27, 2016, 3:40:32 AM9/27/16
to rav...@googlegroups.com
How do you start this process?
Raven.Backup?

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Grisha Kotler l RavenDB Core Team Developer Mobile: +972-54-586-8647

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/


To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Jon van Doore

unread,
Sep 27, 2016, 3:46:47 AM9/27/16
to RavenDB - 2nd generation document database

The backup is triggered by using the powershell script in the package sent to ravendb support, that script is kicked off using a simple Task Scheduler task.

I've attached it again here for reference.
backupall.ps1

Oren Eini (Ayende Rahien)

unread,
Sep 27, 2016, 6:28:11 AM9/27/16
to ravendb
Can you try something, during the backup, make requests to the database as well?
Just /stats would do it.
We currently suspect it might be because we are shutting down during backup
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Jon van Doore

unread,
Sep 27, 2016, 10:13:37 PM9/27/16
to RavenDB - 2nd generation document database

Attach is a dump of the requests being made (to /stats) during a backup.

All requests kept serving through the full backup, cpu didn't go over 30% during the process, dropped back to 1-2% immediately after as usual.
stats during backup.PNG

Oren Eini (Ayende Rahien)

unread,
Sep 30, 2016, 11:07:23 AM9/30/16
to ravendb
Let us go back a bit.
Your issue is high CPU after backup, right?
30% during backup isn't high, it is expected given that we compress the data along the way
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Jon van Doore

unread,
Sep 30, 2016, 11:39:04 AM9/30/16
to RavenDB - 2nd generation document database

Not quite, the issue is I have one of my raven servers go to 100% cpu and stay there until an IIS reset is done, it will sit like that for hours until it's noticed and then killed.

During the max out nothing runs, no indexes are built, no queries get served.

I also thought the backup could be a red herring, the reason for suspecting it at all was simply that the boxes aren't serving traffic, it is literally the only thing running that hits raven on that box.

I took an IIS memory dump and sent it through previously to raven support which was taken during a max out, hopefully that yields some answers.

From the support email (September 13th):
The peak is on going and will continue until IIS is reset. I have a memory dump from IIS but it's 800 mb a thus too large for emailing.

The reason I suspect the backup is simply because it's the only thing actually running on the system. That said I did an IIS Reset, cpu went back to normal, then I ran the Task Scheduler that runs the backup, the backup finished successfully and the cpu during the backup didn't go past 30%. The backup took about 20 seconds to complete and then when finished the cpu dropped immediately back down to around 2-3% and stayed there. So it may be a red herring, so take the backup suggestion with a grain of salt.

The twin server of this one in the multi-master setup doesn't see this behavior. Both were set up at the same time with the same scripts, the only differences are the replication settings.Raven is the only website on these boxes, there are also several nservicebus services on these servers (both of them) but they're not processing any traffic at all yet (as can be seen in the screenshot of the services, nothing has been logged yet, and they've been on for days).

Jon van Doore

unread,
Oct 2, 2016, 8:44:31 PM10/2/16
to RavenDB - 2nd generation document database

It looks to me like the indexer is going insane, I've attached the Admin Log from the raven server during a spike.

The Admin log generated the attached in literally 2 seconds, 90,000 log entries. This server isn't serving any traffic and has a small amount of documents in it. It's duplicate, the other master in the multi-master pair, took 10s of minutes to fill it's admin log.

It looks like this is the culprit, I'm not sure what is triggering it though. It seems stuck in an infinite loop trying to index and then finding no work and then trying to index again.

This is the same as the RC 2 core log (this raven instance was updated/reinstalled to RC2 Build #13 as per the support email)
Ravendb_AdminLogs_DuringPeak.json

Oren Eini (Ayende Rahien)

unread,
Oct 3, 2016, 8:32:02 AM10/3/16
to ravendb
This looks like this is the culprit:

    "Message": "Incremented work counter to 113784507 because: WORK BY IndexingExecuter",

Something is causing us to think we still have indexing work, even though we don't have any.

Did you introduce a new index shortly before this?
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Jon van Doore

unread,
Oct 3, 2016, 7:30:46 PM10/3/16
to RavenDB - 2nd generation document database

I've now got both of the raven production boxes doing it, usually it's only one at a time but I'm guessing they just individually hit it overnight and haven't been reset as yet.

We have 3 indexes on that database, none of which are new or created explicitly by us.

The indexes are as follows:

Raven/ConflictDocuments - 0 entries   



OutboxRecordsIndex - 29 entries   



That Outbox records index is from nservicebus as is defined as follows:

from doc in docs.OutboxRecords
select new {
Dispatched = doc.Dispatched,
DispatchedAt = doc.DispatchedAt
}

No reduce, field storage or anything particularly special about it. You'll also note there isn't a ton of data in that db to index, about 80 documents total. This isn't serving traffic as yet, it is under extremely small load.

All of those indexes have been there since the database was first launched.

I have an IIS memory dump, the Admin Logs (already attached), the debug info output and logging enabled, if you need any of those just let me know and I'll drop them somewhere secure.

Jon van Doore

unread,
Oct 3, 2016, 7:52:35 PM10/3/16
to RavenDB - 2nd generation document database

To add a little more detail, This is the code within NServiceBus that is calling that other index:


This is just for completeness, I don't see anything here that is out of the ordinary. Other than that we don't use any custom indexes, just loading documents by Id.

ar...@ayende.com

unread,
Oct 4, 2016, 3:59:49 AM10/4/16
to RavenDB - 2nd generation document database
Hi Jon,

yes, please send the debug stuff you have grabbed to our support mail.

Regards,
Arek
Reply all
Reply to author
Forward
0 new messages