Major issues since #35202 upgrade

ZNS

Jun 6, 2017, 3:09:42 PM
to RavenDB - 2nd generation document database
Hello,

We have a RavenDB setup running multiple databases with replication on two separate servers, and we recently made two changes:

1. Upgraded from #35183 to #35202
2. All our databases were running master->slave: machine 01 replicated to 02, but 02 did not replicate anything back. However, one of our databases had some special needs, so we set only this database to master<->master, meaning 02 also replicates to 01 for this database only.

Since then we have had two incidents where our second server, the one that runs as slave for all databases (except one), has used 100% CPU for several hours. This causes the master server to throw A LOT of "Failed to replicate.." errors, for documents, indexes and transformers alike, due to timeouts. After several hours this seems to degrade the master server too, so that it also stops responding to requests; they time out. The strange thing is that the slave server running at 100% CPU does not log anything in the RavenDB logs.

Is it safe to downgrade back to #35183?

Oren Eini (Ayende Rahien)

Jun 7, 2017, 1:57:37 AM
to ravendb
Can you take a debug info package and maybe a minidump and send it to us?


Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ZNS

Jun 7, 2017, 4:18:25 AM
to RavenDB - 2nd generation document database

At what point should I take those so that they are of any use to you? Only during the 100% CPU load, or right after? During the incident the server is pretty much unreachable, so that can be hard. Also, would it be safe to downgrade to #35183? We might need to do that if this isn't resolved pretty soon.

Oren Eini (Ayende Rahien)

Jun 7, 2017, 4:37:03 AM
to ravendb
During the high CPU, yes.

Oren Eini (Ayende Rahien)

Jun 7, 2017, 4:37:16 AM
to ravendb
And I would rather not downgrade, we can look into this pretty easily.

ZNS

Jun 10, 2017, 7:08:58 PM
to RavenDB - 2nd generation document database

It happened again, but this time on the master server. I was unable to get any debug data because the server was completely unresponsive, running at 100% CPU. Most annoying is that failover doesn't seem to work: we use "ReadFromAllServers" for our API (which gets its data from RavenDB), but the application failed when the master went down, and it also failed when only the slave went down. We have no choice but to downgrade at this point.

Oren Eini (Ayende Rahien)

Jun 11, 2017, 1:50:23 AM
to ravendb
Can you take a minidump at this stage?

ZNS

Jun 13, 2017, 1:57:19 PM
to RavenDB - 2nd generation document database

I'm unable to reach the server with Remote Desktop. I'm looking into using procdump to see if that could work: https://technet.microsoft.com/en-us/sysinternals/dd996900.aspx
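
Since the server cannot be reached once the spike starts, procdump can be armed in advance so it waits for sustained high CPU and then writes the dumps on its own. The sketch below only assembles such a command line; `-accepteula`, `-ma`, `-c`, `-s` and `-n` are documented procdump switches, but the process name `Raven.Server.exe`, the thresholds and the `C:\dumps` folder are assumptions to adapt to your installation.

```python
def build_procdump_args(process_name="Raven.Server.exe",  # assumption: RavenDB 3.x server process name
                        cpu_threshold=90,
                        seconds=10,
                        dump_count=3,
                        output_dir=r"C:\dumps"):  # assumption: any writable folder on the host
    """Return a procdump command line that writes full dumps once CPU stays high."""
    return [
        "procdump",
        "-accepteula",            # skip the interactive EULA prompt
        "-ma",                    # full memory dump, so all thread stacks are included
        "-c", str(cpu_threshold), # CPU usage threshold in percent
        "-s", str(seconds),       # consecutive seconds above the threshold before dumping
        "-n", str(dump_count),    # number of dumps to write before exiting
        process_name,
        output_dir,
    ]


if __name__ == "__main__":
    # Print the command; on the RavenDB host it could be passed to subprocess.run(...).
    print(" ".join(build_procdump_args()))
```

Taking several dumps a few seconds apart makes it easier to see which threads stay busy across all of them.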

Oren Eini (Ayende Rahien)

Jun 14, 2017, 2:35:03 AM
to ravendb
That would work, yes

ZNS

Jun 22, 2017, 10:03:05 AM
to RavenDB - 2nd generation document database

Took a while longer this time, but it happened again. Got at least two dumps this time. I'm sending them to support.

Oren Eini (Ayende Rahien)

Jun 22, 2017, 3:41:52 PM
to ravendb
All the highly used threads seem to be request processing / thread pool stuff.

Do you have a high number of requests while this is going on?

ZNS

Jun 23, 2017, 6:10:44 AM
to RavenDB - 2nd generation document database

I haven't checked, but it seems unlikely. Almost all requests to RavenDB come from an API we host, and there is not an unusual amount of requests to the API at the time of the CPU load. Also, the CPU load goes from normal (avg. ~20%) to 100% almost instantly and then stays at 100% until the machine is rebooted.
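
Because the load jumps from ~20% to 100% and stays there, the onset time is a useful anchor for lining up request logs against the spike. A minimal sketch, assuming evenly spaced CPU samples from whatever monitoring is in place; the threshold, the run length and the sample values are hypothetical, not taken from the screenshot.

```python
def find_pinned_onset(samples, threshold=95.0, min_run=5):
    """Return the index where CPU first stays >= threshold for min_run samples, or None."""
    run_start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if run_start is None:
                run_start = i              # a possible start of a pinned period
            if i - run_start + 1 >= min_run:
                return run_start           # stayed high long enough
        else:
            run_start = None               # dropped below threshold, reset
    return None


cpu = [18, 22, 20, 21, 100, 100, 99, 100, 100, 100]  # hypothetical samples
print(find_pinned_onset(cpu))  # 4
```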

http://prntscr.com/fn7pji

Michael Yarichuk

Jun 25, 2017, 7:01:28 AM
to RavenDB - 2nd generation document database
Hi,
Just to be sure, can you check the server's Event Viewer for errors?

--
Best regards,

 

Hibernating Rhinos Ltd

Michael Yarichuk l RavenDB Core Team 

RavenDB paving the way to "Data Made Simple"   http://ravendb.net/  

ZNS

Jun 25, 2017, 1:01:39 PM
to RavenDB - 2nd generation document database

There's unfortunately nothing of interest in the Windows event logs...

ZNS

Jun 27, 2017, 7:02:24 AM
to RavenDB - 2nd generation document database

Without anything to back it up, it feels like some specific request triggers the CPU load, perhaps by causing an infinite loop. What supports that theory is that it happens at seemingly random times: most often several days apart, though not a fixed number of days, and in one case just a few hours apart. It happens on both the master and the slave server, never at the same time but seemingly at random, which could be explained by our use of the "ReadFromAllServers" failover behavior.
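
The theory in the last sentence can be illustrated with a toy model: if reads are spread across both servers, a single problematic request lands on whichever server is next in line, so incidents would alternate between machines without any pattern tied to one server. This assumes strict round-robin striping, which is a simplification and not necessarily the RavenDB client's exact selection logic; the server names and request lists are hypothetical.

```python
from itertools import cycle


def route_reads(requests, servers):
    """Assign each read request to a server in round-robin order (simplified model)."""
    ring = cycle(servers)
    return [(req, next(ring)) for req in requests]


def servers_hit_by(requests, servers, poison):
    """Return the servers that received the hypothetical problematic request."""
    return [srv for req, srv in route_reads(requests, servers) if req == poison]


servers = ["server-01", "server-02"]
day1 = ["q1", "q2", "bad", "q3"]
day2 = ["q1", "bad", "q2"]
print(servers_hit_by(day1, servers, "bad"))  # ['server-01']
print(servers_hit_by(day2, servers, "bad"))  # ['server-02']
```

If this model holds, the stack traces in the dumps from either server should point at the same request handler.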

Oren Eini (Ayende Rahien)

Jun 27, 2017, 1:21:15 PM
to ravendb
Assuming that this is the case, it should show up in the debug info package, which captures stack traces, or in a minidump, for the same reason.

Ian Cross

Nov 14, 2017, 5:07:07 AM
to RavenDB - 2nd generation document database
This seems to be very similar to the behaviour I am seeing. Is there anything we can learn from the experience here, please?

We have also setup procdump this morning to try and track it down. Did anything come from this thread?

Cheers,

Ian

Oren Eini (Ayende Rahien)

Nov 14, 2017, 10:51:05 AM
to ravendb
Not that I recall, no.