RavenDB build 2137 - Maxing Out Memory and Sluggish


Khalid Abuhakmeh

Nov 12, 2012, 9:05:51 AM
to rav...@googlegroups.com
Hello again,

I am running build 2137, and it seems to get sluggish at times. I checked my server and found that it is using 7 GB of my 8 GB of available memory. Is this normal? I don't have many documents (52K documents, 10 indexes).

I am continually adding documents from news feeds (10-20 documents at a time), so indexing is constantly happening.

I notice it in the response times of my application (on another server): a response sometimes takes 2-3 seconds, and I am not using WaitForNonStaleResults anywhere.

Thanks,

Khalid Abuhakmeh

Chris Marisic

Nov 12, 2012, 9:08:14 AM
Response times for queries, loads or both?

Oren Eini (Ayende Rahien)

Nov 12, 2012, 9:08:12 AM
That should not be happening. Can you test build 2140 as well?

Khalid Abuhakmeh

Nov 12, 2012, 9:44:26 AM
OK, I just upgraded. Here are before and after pictures:

[Before/after screenshots not preserved]

I'll keep watching and see if it freaks out again.

Oren Eini (Ayende Rahien)

Nov 12, 2012, 9:45:11 AM
Okay, great

Khalid Abuhakmeh

Nov 12, 2012, 10:31:33 AM
Just as a side question:

Where is the major bottleneck for RavenDB? If we had to upgrade one or two things on the server, what would give us the most bang for our buck: memory, disk (SSD), or processors?

Thanks,

Khalid

Oren Eini (Ayende Rahien)

Nov 12, 2012, 10:37:23 AM
IO is probably the most important thing to watch out for.
Some operations (fsync, writing indexes, etc.) are pretty much IO-bound.

One of the reasons the latest versions of RavenDB are significantly faster is that we implement pretty sophisticated prefetching and eager-loading strategies, so that we don't have to wait for IO.

Chris Marisic

Nov 12, 2012, 10:39:16 AM
And the IO is heavily seek-based, so SSDs or 15K RPM disks are going to have the largest impact, correct?

Oren Eini (Ayende Rahien)

Nov 12, 2012, 10:41:50 AM
Yes... except that you then get into the kind of optimizations we are already doing, which led us to things like multiple levels of caching that help deal with it.

But yeah, fast IO, especially for seeks, helps a lot.

Khalid Abuhakmeh

Nov 12, 2012, 11:18:57 AM

The issue is back, and I'm not sure what is going on.


Oren Eini (Ayende Rahien)

Nov 12, 2012, 11:23:33 AM
Okay, can you reproduce it?
What does the /stats endpoint show?

Khalid Abuhakmeh

Nov 12, 2012, 11:29:14 AM
Here are the stats; I'm not sure what I should be looking at.

I'm not sure how you would reproduce this issue, other than by spinning up a new environment and duplicating the database.
stats.txt

Oren Eini (Ayende Rahien)

Nov 12, 2012, 11:31:07 AM
I am not sure either.
Can you try to run this under a memory profiler, to see what is taking so much memory?

Oren Eini (Ayende Rahien)

Nov 12, 2012, 11:32:21 AM
I can certainly say that this isn't supposed to happen, so it is probably something we missed. Hopefully it will be obvious what it is once we have everything at hand.
We can do a Skype call tomorrow to resolve this.

Khalid Abuhakmeh

Nov 12, 2012, 11:50:57 AM
Which memory profiler would you like me to use?

Chris Marisic

Nov 12, 2012, 11:53:35 AM
I think they have JetBrains' profiler, and probably ANTS.

Oren Eini (Ayende Rahien)

Nov 12, 2012, 11:55:35 AM
JetBrains dotTrace is the one I routinely use.

Khalid Abuhakmeh

Nov 12, 2012, 11:58:46 AM
dotTrace it is. I am seeing memory slowly climb from 2 GB to 4 GB as I install the profiler; judging from Task Manager, it looks like a memory leak. I'll take some snapshots and see what happens.

Khalid Abuhakmeh

Nov 12, 2012, 2:21:42 PM
I had to use dotTrace 4 to take snapshots because of my environment. You can download the files below. I didn't see anything odd in terms of files escalating out of control.

https://docs.google.com/open?id=0B7jKgHd2UntSREFsRXV2UDNrQ1E

Thanks,

Khalid

Paul Hinett

Nov 13, 2012, 6:51:05 AM
I seem to be getting some kind of memory leak too. I haven't had a chance to get any hard evidence with tracing and such, but in more recent builds my server always seems to max out its memory (32 GB) and become unstable.

Oren Eini (Ayende Rahien)

Nov 13, 2012, 6:53:09 AM
We are investigating this now

Oren Eini (Ayende Rahien)

Nov 13, 2012, 7:07:24 AM
I tried to open them with dotTrace Memory, but it doesn't know how to deal with those files.

Khalid Abuhakmeh

Nov 13, 2012, 8:29:04 AM

Khalid Abuhakmeh

Nov 13, 2012, 1:59:29 PM
Getting anywhere with this issue?

My current workaround is to stop and start the RavenDB server once or twice a day. Luckily, restarts don't take too long.

Vlad K

Nov 13, 2012, 3:24:19 PM
Since you are having the same issues as me, I'll add to this thread:

In profiling, we see tons of System.LocalDataStoreElement instances in memory, and the number keeps growing.
Looking into it, I stumbled upon this: http://support.microsoft.com/kb/2540745

Oren, do you have that hotfix on your test boxes by any chance?

Khalid Abuhakmeh

Nov 13, 2012, 3:32:42 PM
That is very interesting, Vlad. Have you applied the hotfix to your server, and what effect has it had?

Vlad K

Nov 13, 2012, 3:39:01 PM
Didn't apply it. We have to contact MS to get it (#$%#$%#$%). I'll wait for Oren's reply :)

In theory, this issue should also be fixable by deploying .NET 4.5 on the server (I would assume the issue was fixed in 4.5), but we will wait a bit before doing that.

Khalid Abuhakmeh

Nov 13, 2012, 3:45:44 PM
Haha, nope. Bad assumption. I am running Windows Server 2012, which comes with .NET 4.5 installed out of the box.

I'm not sure why Microsoft hoards hotfixes like that.

Vlad K

Nov 13, 2012, 3:59:59 PM
When you installed 2012, you gave up all support rights :)
That's what MS told us about some of our IT problems: until SP1, we are SOL.

The issues we are having are on 2008 R2, IIS 7.5, RavenDB 2139, and .NET 4.

Chris Marisic

Nov 13, 2012, 4:08:16 PM
... I find that very unlikely.

Khalid Abuhakmeh

Nov 13, 2012, 4:08:57 PM
Well, my IT administrator is reaching out to Microsoft to try to get this patch and apply it, and we'll see what it does. In the meantime, I will keep everyone updated on the situation.

@Oren: There are only 36 references to ThreadLocal in the codebase, so you might be able to switch over to ThreadStatic as an alternative... maybe, possibly, hopefully.

Oren Eini (Ayende Rahien)

Nov 14, 2012, 2:59:42 AM
I am looking this over, and I am pretty sure that it is the deadlock issue that was also raised.
I'll do a build around midday, and it should fix this.

Oren Eini (Ayende Rahien)

Nov 14, 2012, 3:00:37 AM
I don't think we have that hotfix, and I know we don't call this directly.
Maybe something we call does this indirectly?

Oren Eini (Ayende Rahien)

Nov 14, 2012, 3:01:26 AM
We can't really switch to that; in many cases we assume that different instances will have different thread-local values.
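The difference Oren describes can be illustrated outside .NET. The following is a minimal Python sketch, not RavenDB code: it assumes Python's `threading.local` as a stand-in for both .NET mechanisms, and the class and function names are hypothetical. A single module-level `threading.local` behaves like a `[ThreadStatic]` field (one slot per thread, shared by every instance), whereas a `threading.local` held per object behaves like a `ThreadLocal<T>` instance field (one slot per instance per thread).

```python
import threading

# Analogue of a .NET [ThreadStatic] field: one module-level threading.local,
# so each thread gets a single slot that ALL instances share.
_shared_slot = threading.local()


class SharedSlotCache:
    """Hypothetical class: every instance reads and writes the same per-thread slot."""

    def set(self, value):
        _shared_slot.value = value

    def get(self):
        # A thread that never called set() has no attribute, so fall back to None.
        return getattr(_shared_slot, "value", None)


class PerInstanceCache:
    """Hypothetical class: analogue of a ThreadLocal<T> instance field,
    giving one slot per (instance, thread) pair."""

    def __init__(self):
        self._local = threading.local()

    def set(self, value):
        self._local.value = value

    def get(self):
        return getattr(self._local, "value", None)


def two_instances_same_thread(cls):
    """Set different values on two instances from the current thread, then read both back."""
    a, b = cls(), cls()
    a.set("A")
    b.set("B")
    return a.get(), b.get()


def value_seen_from_new_thread(obj):
    """Read obj's value from a freshly started thread."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(obj.get()))
    worker.start()
    worker.join()
    return seen[0]


if __name__ == "__main__":
    print(two_instances_same_thread(SharedSlotCache))   # ('B', 'B'): b overwrote a's value
    print(two_instances_same_thread(PerInstanceCache))  # ('A', 'B'): each instance keeps its own
    cache = PerInstanceCache()
    cache.set("main")
    print(value_seen_from_new_thread(cache))            # None: values never cross threads
```

Both patterns isolate values between threads; only the per-instance pattern also isolates values between objects on the same thread. Swapping it for the shared-slot pattern would make two instances on one thread overwrite each other's value, breaking the assumption that different instances keep different thread-local values.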

Khalid Abuhakmeh

Nov 14, 2012, 8:09:35 AM
The only way I can think of to reproduce this problem is to take a large dataset (IMDB) and feed in about 100 documents every minute, with a Map/Reduce index that does some aggregation. Memory will grow over several hours until it consumes the server.

Oren Eini (Ayende Rahien)

Nov 14, 2012, 8:22:19 AM
Okay, I am pretty sure it is the concurrent task issue, and we have fixed that.
We need to do a bunch more stuff, but the next build should resolve this.
Can you wait until tomorrow to test this?

Paul Hinett

Nov 14, 2012, 8:24:13 AM
I will test it tomorrow too. I have reverted to an older build for now, which is fine.

Vlad K

Nov 14, 2012, 9:17:10 AM
My other suspect for this would be NLog. What's the proper way to completely disable logging in RavenDB?

Also, whenever I play around with nlog.config, I get an HttpEndpoint error and have to remove that config, or the server doesn't work.
I used the examples on your site and those posted before in this forum; they all throw an HttpEndpoint exception.

Chris Marisic

Nov 14, 2012, 9:27:56 AM
This is why I choose to depend on RavenDB. I have never been involved with a project whose developers have such immediate feedback cycles. I'm used to issues languishing for days, weeks, or even months, even when failing tests or pull requests are provided, which then never seem to get merged into the project.

This even includes Microsoft, where I've seen dozens of users report and confirm issues with repeatable failures, only for the issue to be closed as either "can't reproduce" or "fixed in a previous version", even though it's still occurring.

Khalid Abuhakmeh

Nov 15, 2012, 9:05:44 AM
Hello Oren,

Has the build with your fixes been released yet?

Oren Eini (Ayende Rahien)

Nov 15, 2012, 9:09:56 AM
Not yet, we are merging from multiple sources, and it takes a bit of time.

Khalid Abuhakmeh

Nov 15, 2012, 10:31:00 AM
Alright, cool beans.

Vlad K

Nov 15, 2012, 3:07:27 PM
Upgraded to 2142; still the same memory profile (this was at about 3 GB, when we stopped the process).
Has anyone else done memory profiling? Do you also see LocalDataStoreElement as the largest memory user?


Oren Eini (Ayende Rahien)

Nov 15, 2012, 5:39:06 PM
I just pushed 2143, which I think should fix this. Can you test it?

Vlad K

Nov 15, 2012, 5:48:29 PM
I will update to 2143 tomorrow, and we'll run through our DB tests again.

Thanks.

Oren Eini (Ayende Rahien)

Nov 16, 2012, 2:14:39 AM
Great!

On Fri, Nov 16, 2012 at 3:43 AM, Dan Trotta <trott...@gmail.com> wrote:
Hi:

I work with Khalid; we just upgraded from 2140 to 2143.
So far the issue seems to have stabilized. Memory is holding between 1.6 and 1.8 GB.
On 2140 it would just climb until it maxed out.

We will post again tomorrow if it is still holding.

Dan

Khalid Abuhakmeh

Nov 16, 2012, 7:43:49 AM
Just to confirm what Dan said: I checked this morning, and memory is holding at the levels he mentioned last night. Now Dan can go back to playing Halo 4. :)

Thank you Oren and thank you to the RavenDB team.