v3 performance issues in production


nickvane

May 27, 2015, 4:10:38 AM
to rav...@googlegroups.com
Hi,


Above are two charts of the average response time of queries and loads against RavenDB in our system. Since the upgrade to v3 we regularly (4-5 times a day) get these spikes. They go away without any intervention on our part.

Some info about the production environment:

  • Raven version 3635
  • RavenDB runs on an Azure DS3 VM (4 cores, 14 GB RAM). On average, Raven uses 30% CPU and 10 GB of RAM.
  • The database contains 5.5 million records.
  • The load on the system has stayed the same over the last half hour.
  • We have 17 indexes, almost none of which cover documents that are added regularly. Most newly added documents are retrieved by load and are not indexed.
  • We have a master-slave setup. The second server is the replication target and has the same specs.
  • No virus scanner is installed, no jobs are running, and there are no spikes in load or activity.
What could be the cause of this, and how can we find it?

Kind regards, Nick

Oren Eini (Ayende Rahien)

May 27, 2015, 4:40:26 AM
to ravendb
Can you check the I/O rates during the spikes?
Also, what is the metric for the charts?

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

nickvane

May 27, 2015, 6:35:06 AM
to rav...@googlegroups.com
The metric is the average response time for requests of the same type, measured by timing each query/load: the timer wraps the execution of the query.
The duration is then submitted to New Relic. We also have the Raven profiler built in, but it is disabled at the moment.
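The measurement described above (a timer around each query/load, with the elapsed time pushed to a metrics service) can be sketched roughly as follows. This is a minimal Python illustration, not the thread's production code, which is not shown: `timed`, `collect`, and the metric name are hypothetical, and `collect` stands in for the New Relic submission rather than using any New Relic API.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple


@contextmanager
def timed(metric_name: str, report: Callable[[str, float], None]) -> Iterator[None]:
    """Time the wrapped block and hand the elapsed wall-clock seconds to `report`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report even if the wrapped query/load raises, so slow failures still show up.
        report(metric_name, time.perf_counter() - start)


# Hypothetical stand-in for the metrics client: it only collects the samples.
recorded: List[Tuple[str, float]] = []


def collect(name: str, seconds: float) -> None:
    recorded.append((name, seconds))


# Usage: wrap the call whose latency should be tracked (a sleep stands in for a load).
with timed("RavenDB/Load", collect):
    time.sleep(0.05)
```

Because the charts plot an average per reporting interval, a handful of very slow calls can produce a spike just as easily as a general slowdown; reporting percentiles alongside the average would help tell those cases apart.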

Oren Eini (Ayende Rahien)

May 27, 2015, 7:02:29 AM
to ravendb
The duration is in ms, right?

Oren Eini (Ayende Rahien)

May 27, 2015, 7:03:03 AM
to ravendb
Do you also have I/O timing data from that machine? We know that Azure I/O rates fluctuate heavily.

Lars Brand

May 27, 2015, 7:11:02 AM
to rav...@googlegroups.com
Hello,

The incident today at around 8:50 (the first graph from Nick):
Master:

Slave:


Another noticeable incident happened yesterday.
At that time we opted to restart RavenDB on our master server (at about the time the memory usage dropped).
DB metrics:

Master:

Slave:

I hope these graphs provide sufficient information.
If you need anything, feel free to ask us.

Kind regards,
Lars

Oren Eini (Ayende Rahien)

May 27, 2015, 7:35:57 AM
to ravendb
You mentioned that you restarted the server. Did this happen before or after the restart?

Did it happen before? Can you take a mini dump, as well as the debug info package for the servers, when this happens?

Also, I really need to see the I/O _latency_ during this time.

Lars Brand

May 27, 2015, 9:10:11 AM
to rav...@googlegroups.com
The duration is in ms, right?

No, unfortunately these numbers represent good ol' seconds. When things are running smoothly, the "stack of metrics" stays under 1 second.

You mentioned that you restarted the server. Did this happen before or after the restart? Did it happen before?
 
We restarted RavenDB once we noticed calls were taking abnormally long. The incident happened before we restarted RavenDB; after a restart it runs fine for X hours.
It has happened before. At the moment it seems to happen about twice a day.

Do you also have I/O timing data from that machine? We know that azure I/O rates fluctuate heavily.
Can you take a mini dump, as well as debug package info for the servers when this happens?
Also, I really need to see the I/O _latency_ during this time.

I'm not versed in creating mini dumps; I will try to figure out how to provide you with these items.

Thank you for your time and help; I'll get back to you ASAP.
Lars

Michael Yarichuk

May 27, 2015, 9:19:34 AM
to rav...@googlegroups.com
Best regards,

 

Michael Yarichuk

RavenDB Core Team

Tel: 972-4-6227811

Fax:972-153-4-6227811

Email : michael....@hibernatingrhinos.com

 

RavenDB paving the way to "Data Made Simple" http://ravendb.net/  

Oren Eini (Ayende Rahien)

May 27, 2015, 9:31:54 AM
to ravendb
Also, can you try capturing I/O latency and running the I/O test on the relevant machine?
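A rough way to capture write latency on the affected disk is a small probe like the one below: a minimal Python sketch that times 4 KB writes, each followed by an fsync, and reports the median, 99th percentile, and maximum in milliseconds. It is not RavenDB's built-in I/O test; the function names, file path, block size, and iteration count are arbitrary assumptions. On Windows, the PhysicalDisk performance counters (Avg. Disk sec/Read, Avg. Disk sec/Write) are the usual way to record latency continuously, so the numbers are available for the moment a spike actually happens.

```python
import math
import os
import statistics
import tempfile
import time
from typing import Dict, List


def measure_write_latency(path: str, block_size: int = 4096, iterations: int = 100) -> List[float]:
    """Append `block_size` bytes `iterations` times, fsync-ing after each write.

    Returns one latency sample per write+fsync, in milliseconds.
    """
    payload = os.urandom(block_size)
    samples: List[float] = []
    # buffering=0 sends each write straight to the OS; fsync forces it to the device.
    with open(path, "wb", buffering=0) as f:
        for _ in range(iterations):
            start = time.perf_counter()
            f.write(payload)
            os.fsync(f.fileno())
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Median, nearest-rank 99th percentile, and max of the samples, in ms."""
    ordered = sorted(samples)
    p99_index = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


# Usage: point the probe at the disk holding the database (a temp dir stands in here).
with tempfile.TemporaryDirectory() as tmp:
    latencies = measure_write_latency(os.path.join(tmp, "io-probe.bin"), iterations=50)

stats = summarize(latencies)
print(stats)
```

The probe only measures writes; measuring reads meaningfully would require bypassing the OS file cache, which this sketch does not attempt.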

Chris Marisic

May 27, 2015, 11:32:56 AM
to rav...@googlegroups.com
I was going to suggest using Azure's SSD disks, but the DS3 already appears to use them.

Oren Eini (Ayende Rahien)

May 28, 2015, 5:11:59 AM
to ravendb
Try the I/O test output; it should give some indication of that.

Lars Brand

May 29, 2015, 10:37:51 AM
to rav...@googlegroups.com
Hello,

An update on the issues since our last exchange:
We've updated to the latest Raven version; this seems to have helped somewhat.
Yesterday one of the Azure VMs (our replication master) became unresponsive. We had to turn it off and are now running solely on our second VM (our replication target).
It's unclear what caused the unresponsiveness of the VM. 

After booting up the VM, we had to run a recovery using esentutl to get Raven started again:
 "Database was not shutdown cleanly. Recovery must first be run to properly complete database operations for the previous shutdown."

We've switched the replication direction, so our functioning environment now replicates to the misbehaving one.
This morning RavenDB became unresponsive on the misbehaving environment; we couldn't open the Studio.
It's unclear if this is the same issue or a new one.
RavenDB was using 12 GB of the 14 GB. Together with other processes, memory usage was at about 95%.

I tried creating the mini dump while Raven was unresponsive (right-click the process in Task Manager -> Create dump file).
This took about 5 hours and ultimately produced a 0-byte file, which I guess is of no use to you.

I could not run the much-requested I/O test while Raven was unresponsive, so I ran it after restarting RavenDB.
Likewise, I could not create the debug info package while Raven was unresponsive, so I created it after the restart.
Note that creating the dump took 5 hours, so there is a gap of several hours between the I/O test and the start of the issue.

I've got a bunch of I/O tests from yesterday, though I regret that I wasn't able to run one while we were having issues.
I hope these are informative nonetheless.

Today, 5 hours after the issue:




Yesterday, about 5 minutes apart:

And:

I would love to send you a proper dump. My understanding is that you need a dump of the RavenDB process while we're having issues; or would a dump taken at any time suffice?
Note that the scenario is not easy to reproduce, so it might take some time to send you a meaningful dump.

I've attached the debug info package.

Kind regards.
Lars
Admin-Debug-Info-2015-05-29_16-12-02.zip

Oren Eini (Ayende Rahien)

May 31, 2015, 5:06:31 AM
to ravendb
Can you try the following?

Set Raven/DynamicLoadBalancing = false

Then see what happens.

Note that the I/O rates we are seeing are _very_ slow, and might be related to the timing issues.
Can you tell us what the read I/O test numbers are?

nickvane

Jun 3, 2015, 2:56:07 PM
to rav...@googlegroups.com
It has been a little quiet on our side because nothing really happened, which is a good thing.
Last Wednesday we upgraded RavenDB on both servers to the latest version, 3690, and since then we haven't had any of those spikes.

So for now: installing the latest version fixed the issue.

Thanks for all the help!
Nick

Oren Eini (Ayende Rahien)

Jun 3, 2015, 3:25:51 PM
to ravendb
Okay, that is great to hear.