Memory issue, progress

Oren Eini (Ayende Rahien)

Dec 20, 2012, 3:37:07 AM

to ravendb

Okay, we have made some progress.

a) We are 99.9% sure that the issue is with pre-fetching. This is supported by the fact that disabling pre-fetching appears to eliminate the memory usage issue entirely.

b) Fixing that seems pretty hard, since pre-fetching is heavily async and relies on a lot of heuristics.

c) So far, we have mostly been able to find this issue only in production.

d) In load tests, we don't see this issue AT ALL.

We finally managed to crack things to the point where we have a pretty reliable system for checking this.

See: https://github.com/ayende/ravendb/tree/mem-repro

In particular, the SimulatedWorkLoad will exhibit what I believe is a fairly heavy production load (a lot of writes, a lot of intermixed reads, introducing an index midway through the run, etc.).
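To make the shape of such a load concrete, here is a minimal sketch in Python. It is not the code from the mem-repro branch; the function name, the batch sizes, and the operation labels are all assumptions made for illustration. The document ids and the index name are borrowed from the request log quoted later in this thread.

```python
import random
from typing import Iterator, List, Tuple, Union

Payload = Union[str, List[str]]


def simulated_workload(total_docs: int, batch_size: int = 5,
                       reads_per_batch: int = 2,
                       seed: int = 0) -> Iterator[Tuple[str, Payload]]:
    """Yield (operation, payload) pairs for a write-heavy mixed workload.

    Hypothetical model only: batched writes, reads intermixed after each
    batch, and a single index introduced roughly halfway through the run.
    """
    rng = random.Random(seed)
    written = 0
    index_created = False
    while written < total_docs:
        # Write the next batch of new documents.
        count = min(batch_size, total_docs - written)
        ids = [f"users/{written + i + 1}" for i in range(count)]
        written += count
        yield ("bulk_write", ids)

        # Intermix reads of documents that already exist.
        for _ in range(reads_per_batch):
            yield ("read", f"users/{rng.randint(1, written)}")

        # Introduce an index once, midway through the run.
        if not index_created and written >= total_docs // 2:
            index_created = True
            yield ("create_index", "Users/Locations")
```

A driver would iterate over the generator and translate each pair into the matching request against the server under test.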

Using this method, we were able to reproduce the issue on our test environment in relatively short order.

Oren Eini (Ayende Rahien)

Dec 20, 2012, 4:10:17 AM

to ravendb

And the plot thickens: it appears that the thing that I suspected, the futureBatches, actually works properly now. Yet we can still see Raven taking WAY too much memory, and it is almost entirely in managed code.

Still investigating.

Matt Warren

Dec 20, 2012, 12:57:08 PM

to ravendb

I've been running this sample and looking at the memory usage using VMMap and Performance Monitor and nothing stands out.

However, I do consistently get "Version store out of memory" errors, as shown below.

Note that I've now run the SimulatedWorkLoad app against the same server 3 times (the 3rd time the app, not the server, crashed with the version store errors), so it has now inserted over half a million docs. But the SimulatedWorkLoad app will no longer complete its test, i.e. it can't insert all the docs without getting the version store error. It seems to insert about 100k before stopping.

Interestingly enough, when I go to http://localhost:8080/databases/load/stats or http://localhost:8080/admin/stats, DatabaseTransactionVersionSizeInMB always seems to be 0?

Request #193,264: GET - 440 ms - Load - 200 - /docs/users/182283

Request #193,265: GET - 33 ms - Load - 200 - /docs/users/168358

Request #193,266: GET - 616 ms - Load - 200 - /indexes/Users/Locations?query=City%253AManhattan&start=0&pageSize=128&aggregation=None

Query: City:Manhattan

Time: 595 ms

Index: Users/Locations

Results: 128 returned out of 400 total.

Error on request Version store out of memory (cleanup already attempted)

Request #193,267: POST - 2 ms - Load - 500 - /bulk_docs

PUT users/485800

PUT users/485801

PUT users/485802

PUT users/485803

PUT users/485804

Error on request Version store out of memory (cleanup already attempted)

Request #193,268: POST - 2 ms - Load - 500 - /bulk_docs

PUT users/485805

PUT users/485806

PUT users/485807

Error during idle operation run for Load Version store out of memory (cleanup already attempted)
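
For anyone wanting to tally output like the above, here is a rough Python sketch that counts requests by method and status code. The line format is inferred only from this excerpt, so the regular expression and the function name are assumptions rather than anything RavenDB defines.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern inferred from lines such as:
#   Request #193,267: POST - 2 ms - Load - 500 - /bulk_docs
REQUEST_LINE = re.compile(
    r"Request #(?P<number>[\d,]+): (?P<method>\w+)\s+-\s+(?P<ms>[\d,]+) ms\s+-\s+"
    r"(?P<database>\S+)\s+-\s+(?P<status>\d{3})\s+-\s+(?P<url>\S+)"
)


def summarize_requests(lines: Iterable[str]) -> Counter:
    """Count request lines by (HTTP method, status code).

    Lines that do not look like request lines (PUT details, query details,
    error messages) are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_LINE.search(line)
        if match:
            counts[(match["method"], int(match["status"]))] += 1
    return counts
```

Fed the console output of a run, the count under ("POST", 500) would show how many bulk inserts were rejected.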

Oren Eini (Ayende Rahien)

Dec 20, 2012, 4:13:42 PM

to rav...@googlegroups.com

Matt,

You need to run the server as admin to see the DatabaseTransactionVersionSizeInMB value.

I never got the version store out of memory error when running this.

Matt Warren

Dec 20, 2012, 4:56:00 PM

to ravendb

Oh yeah, I should've remembered that. I'll re-run it and see what the value of DatabaseTransactionVersionSizeInMB is before I get the Version Store error.
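
A small Python sketch of how that value could be read from the stats endpoint while the test runs. It assumes the endpoint returns a JSON object with DatabaseTransactionVersionSizeInMB as a top-level field, which this thread does not confirm; the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.request import urlopen


def version_store_size_mb(stats_json: str) -> Optional[float]:
    """Extract DatabaseTransactionVersionSizeInMB from a stats payload.

    Assumes (unverified) that the payload is a JSON object carrying the
    field at the top level. Returns None if the field is absent.
    """
    stats = json.loads(stats_json)
    value = stats.get("DatabaseTransactionVersionSizeInMB")
    return None if value is None else float(value)


def fetch_version_store_size_mb(
        url: str = "http://localhost:8080/databases/load/stats") -> Optional[float]:
    """Fetch the stats document from a running server and extract the field."""
    with urlopen(url, timeout=5) as response:
        return version_store_size_mb(response.read().decode("utf-8"))
```

Calling fetch_version_store_size_mb() in a loop with a short sleep would give a rough trace of the value leading up to the error.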

I only got the error when I ran the test app for the 3rd time against the same server. I didn't restart the server in between, and it still has the half million docs from the first 2 runs; I don't know if that makes a difference. But it's happening repeatedly now: I can restart the server and the app and I still get the error without the app even running through once.

Matt Warren

Dec 20, 2012, 5:01:43 PM

to ravendb

Just to add, it seems like the memory totals still don't add up; there's still something there that's not accounted for.

I think that the amount for managed memory is a bit low. The perf counter ".NET CLR Memory - # Total committed Bytes" always seems to be higher. And according to this page it's probably the one we want to use; see the last few paragraphs of http://blogs.msdn.com/b/maoni/archive/2006/12/12/difference-between-perf-data-reported-by-different-tools-2.aspx

Matt Warren

Dec 20, 2012, 5:34:53 PM

to ravendb

I've read a bit more into it and this is how I understand it:

- GC.GetTotalMemory() is the amount of memory the GC detected your program was using the last time it did a GC.

- "# Total committed Bytes" is the amount the CLR has allocated across all heaps. It allocates memory in chunks, so this will always be higher and is a truer measure of how much memory the CLR is using in your process.
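
The gap between the two numbers can be illustrated with a toy model in Python. This is only a sketch of the "allocates in chunks" idea as described above, not a model of the real CLR heap; the class name and the chunk size are assumptions, and the model deliberately never returns committed memory.

```python
class ChunkedHeap:
    """Toy allocator: live bytes vs. bytes committed in fixed-size chunks.

    Hypothetical illustration only. Unlike a real runtime, it never
    decommits chunks once they have been committed.
    """

    def __init__(self, chunk_size: int = 64 * 1024) -> None:
        self.chunk_size = chunk_size
        self.live_bytes = 0        # analogous to what the program is using
        self.committed_bytes = 0   # analogous to what the process holds

    def allocate(self, size: int) -> None:
        self.live_bytes += size
        # Commit whole chunks until the live data fits.
        while self.committed_bytes < self.live_bytes:
            self.committed_bytes += self.chunk_size

    def free(self, size: int) -> None:
        if size > self.live_bytes:
            raise ValueError("cannot free more than is live")
        # Freed objects reduce live bytes; committed chunks stay committed.
        self.live_bytes -= size
```

In this model committed_bytes is never below live_bytes, and it stays high after objects are freed, which is why the committed figure is the better guide to what the process actually holds.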

Oren Eini (Ayende Rahien)

Dec 21, 2012, 12:58:23 AM

to rav...@googlegroups.com

Can you change things to use that?

Oren Eini (Ayende Rahien)

Dec 21, 2012, 2:13:25 AM

to rav...@googlegroups.com

Note that what I used was clearing the db on every run.

I don't care about the version store for now; I care about the memory growth.

georgiosd

Dec 21, 2012, 3:26:47 AM

to rav...@googlegroups.com

Wouldn't a memory profiler help with this work, like Scitech .NET Memory Profiler and ANTS Memory Profiler?

Oren Eini (Ayende Rahien)

Dec 21, 2012, 5:09:16 AM

to rav...@googlegroups.com

We tried dotTrace and JustTrace; they don't tell us what is going on.

georgiosd

Dec 21, 2012, 7:16:08 AM

to rav...@googlegroups.com

Hm... perhaps it's worth trying on Mono - just in case it's related to a specific idiosyncrasy of the runtime in relation to your code and not of your code alone.

clayton collie

Dec 21, 2012, 7:31:05 AM

to rav...@googlegroups.com

No Esent on Mono, unfortunately. A Mono-compatible backend is planned post 2.0.

Robert Edin

Dec 21, 2012, 8:38:40 AM

to rav...@googlegroups.com

+1 on your idea regarding platform idiosyncrasy as one factor.

I used a trial of dotTraceMemory a few times to run the raven server. I noticed no bad behaviour then, even though I otherwise have severe problems with this issue.

Georgios Diamantopoulos

Dec 21, 2012, 11:05:36 AM

to rav...@googlegroups.com

Are there any alternative runtimes? Or perhaps you can try some different runtime memory management options, if any?
I remember from my Java days that someone had randomly found a JVM garbage collector bug while doing nothing particularly special.

Oren Eini (Ayende Rahien)

Dec 21, 2012, 11:59:36 AM

to rav...@googlegroups.com

No need, I think that I found it. And it is a doozy.

Will be able to provide a lot more detail once I figure out if this is actually it.

Carlos Mendes

Dec 21, 2012, 7:38:31 PM

to rav...@googlegroups.com

I just posted a comment on this thread about the results I got using MemProfiler from Scitech (which are aligned with what Ayende found out).

One of the most relevant features of MemProfiler is that we can import and compare .NET memory dumps (I haven't used the latest versions of dotTrace, JustTrace or ANTS, so I don't know if this feature is available nowadays).

That is a really useful feature, since we usually can't reproduce memory (or locking, etc.) issues while profiling, due to the changes the profiler introduces in the system's behavior.
