running into "Version store out of memory (cleanup already attempted)" repeatedly


fschwiet

unread,
Jan 10, 2013, 3:18:08 PM1/10/13
to rav...@googlegroups.com
  Hey, I've been abusing RavenDB recently, importing a lot of documents.  I've run into the "Version store out of memory (cleanup already attempted)" exception a couple of times.  The first time I increased Raven/Esent/MaxVerPages from the default (512) to 4096.  After running into it again I'm now trying 16384.  I currently have 100 million documents and 2 (stale) indexes.  The indexes are getting the exception, though I've also gotten it just trying to write.

  I'm not sure I really have a question.  Maybe someone has guidance on reasonable ranges for MaxVerPages.
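  (For reference, I'm just setting it in the server's config file - Raven.Server.exe.config for the standalone server, I believe - so right now that looks like this; the value is just what I'm currently trying, not a recommendation:)

    <add key="Raven/Esent/MaxVerPages" value="16384"/>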

  Another observation: when I hit 'q' at the server console it just stalls the process.  The database UI doesn't load at 8080, and the console is stuck until I hit Ctrl-C a couple of times.  The database restarts OK afterwards, so this hasn't been an issue.

Brett Nagy

unread,
Jan 10, 2013, 3:33:52 PM1/10/13
to rav...@googlegroups.com
Exactly the same issues for me.

I have an export/dump from our 960 build, and I'm trying to import it into 2230 using the Studio (64-bit, 8GB memory, 8-core). I can only import around 600K docs of a 1.4 million doc export before it stops with the Version store out of memory error. Memory is maxed out, CPU is low.

Same export will import into 960 build just fine.

Brett

Matt Warren

unread,
Jan 10, 2013, 5:33:15 PM1/10/13
to ravendb
> Another observation, when I hit 'q' from the server console it just stalls the process.  The database UI doesn't load 
> at 8080, and the console is just stuck until I hit ctrl-C a couple times.  The database restarts ok afterwards so this 
> hasn't been an issue.

It's not stuck or stalled, it's trying to shut down cleanly by waiting for any outstanding tasks to complete, flushing data, etc. The database UI doesn't work because, AFAIK, one of the first things it does is stop responding to requests, so that there isn't more work coming in.

RavenDB should be robust to abrupt shut-downs like this; for instance Esent is definitely able to handle it without losing data. But it does mean it might have to do a recovery step when it restarts. So in general you should wait rather than hit Ctrl-C, although I know it can sometimes take several minutes.

But I think there should be something printed to the console to tell you what it's doing; otherwise it does look like it's stuck.

Matt Warren

unread,
Jan 10, 2013, 5:36:25 PM1/10/13
to ravendb
For the other issue, you can also try playing with "Raven/MaxNumberOfItemsToIndexInSingleBatch"; setting it smaller should mean fewer items per transaction. However, take a look at the "/stats" or "/admin/stats" endpoint first, to see the current values.
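For example, something like this in the server's config (the right number depends on your documents, so treat the value purely as an illustration):

    <add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="8192"/>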

BTW what build are you using?



fschwiet

unread,
Jan 10, 2013, 6:27:34 PM1/10/13
to rav...@googlegroups.com
  I'm on build 2199.  I really should upgrade, probably the next time I need to restart it to update a setting.

Matt Warren

unread,
Jan 10, 2013, 6:28:45 PM1/10/13
to ravendb
Yeah it might be worth it, quite a few things related to memory usage got fixed in the most recent builds.

Brett Nagy

unread,
Jan 10, 2013, 7:50:56 PM1/10/13
to rav...@googlegroups.com
I still see the same issues on 2230.

Stein Arild Hoem

unread,
Jan 11, 2013, 3:46:42 AM1/11/13
to rav...@googlegroups.com
I have the same issues in 2230. Increasing Raven/Esent/MaxVerPages fixes the error message, but not the memory issues... see the thread "build 2196.. trouble with EsentVersionStoreOutOfMemoryException and stale indexes".
I've been trying different configurations for some days now...

Stein

Oren Eini (Ayende Rahien)

unread,
Jan 11, 2013, 7:27:39 AM1/11/13
to ravendb
Any way for us to reproduce things?

Stein Arild Hoem

unread,
Jan 11, 2013, 8:17:01 AM1/11/13
to rav...@googlegroups.com
I've found one potential problem in raven.

I noticed that memory is very quickly consumed due to what I think may be an error in the code that sets the default value for:

line 53 in InMemoryRavenConfiguration
AvailableMemoryForRaisingIndexBatchSizeLimit = Math.Min(768, MemoryStatistics.TotalPhysicalMemory / 2);

documentation states:
Raven/AvailableMemoryForRaisingIndexBatchSizeLimit The minimum amount of memory available for us to double the size of InitialNumberOfItemsToIndexInSingleBatch if we need to. 
Default: 50% of total system memory
Minimum: 768

I think the code should be Math.Max ?

Setting a reasonable value in config (4 GB) makes Raven behave as expected and stop increasing the index batch size when half of the memory is consumed...
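(i.e. something like this; the key name comes from the docs quoted above, and I'm assuming the value is in MB, so 4096 for 4 GB:)

    <add key="Raven/AvailableMemoryForRaisingIndexBatchSizeLimit" value="4096"/>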

Stein

Stein Arild Hoem

unread,
Jan 11, 2013, 11:00:22 AM1/11/13
to rav...@googlegroups.com
Setting AvailableMemoryForRaisingIndexBatchSizeLimit to a higher value only postpones my problems; memory eventually gets eaten and CPU goes high:

I've tried debugging in Visual Studio - version 2202 - from https://github.com/ayende/ravendb.git (if it's the same?)

With a 2.5 million document database and these settings:
    <add key="Raven/Esent/MaxVerPages" value="2048"/>
    <add key="Raven/DisableDocumentPreFetchingForIndexing" value="true"/>
    <add key="Raven/MaxNumberOfParallelIndexTasks" value="1"/>
    <add key="Raven/ResetIndexOnUncleanShutdown" value="true"/>

When I try to stop indexing from the Studio, the following happens:
It will wait for the indexing tasks to finish; all but one task quickly end, and the remaining one seems to be stuck in Database.Indexing.ReducingExecutor.MultistepReduce, line 118:

var persistedResults = actions.MapReduce.GetItemsToReduce(level: level, reduceKeys: keysToReduce, index: index.IndexName, itemsToDelete: itemsToDelete, loadData: true).ToList();

It has 18 {string} keysToReduce ... on my ObservationTagsIndex.

I have a breakpoint on the next line and can step through the do loop in 'GetItemsToReduce', so something is happening, but it 'never' finishes the loop and does not get to the breakpoint (above)...

When I let it continue to run, it keeps eating my memory and 'never' gets to finish the task ('never' as in: stuck inside there for 1 hour right now, while waiting for indexing to stop). I put an extra breakpoint with a counter on line 260 (yield mappedResultInfo;); it has passed 1,500,000 now and eaten 3 GB since I stopped the indexing tasks...

I do not know the RavenDB code very well; I am just describing what I see. Maybe this has nothing to do with the memory issues, but right now I'm not able to load my documents and use the indexes as I want.

(On another note: putting new documents while indexing, or changing/deleting some, seems to accelerate the memory/CPU issues.)

A dump of my larger database 6 mill docs with the same indexes here: http://folk.ntnu.no/steinho/

Stein


fschwiet

unread,
Jan 11, 2013, 2:20:40 PM1/11/13
to rav...@googlegroups.com
  Sorry, this is one of those things where it's hard to give you a repro.  It might be specific to the data, which I can't share.  I could suggest opening a database from the console and writing 100 million records of arbitrary data, but I do not know if that would really repro it.

Oren Eini (Ayende Rahien)

unread,
Jan 13, 2013, 6:36:19 AM1/13/13
to ravendb
The code is correct. This is used in a way that says:
If the available memory is _larger_ than AvailableMemoryForRaisingIndexBatchSizeLimit, we can raise the limit.
If the available memory is smaller than that amount, we cannot.

Oren Eini (Ayende Rahien)

unread,
Jan 13, 2013, 6:37:34 AM1/13/13
to ravendb
Any way for us to be able to reproduce it? Even seeing this live on your system and being able to debug it on your end would be useful.

Stein Arild Hoem

unread,
Jan 13, 2013, 6:55:13 AM1/13/13
to Oren Eini (Ayende Rahien), ravendb
Yes - and I saw that the code respected that value when evaluating whether or not to increase the batch size, but I think the default value is computed wrongly...
 
If you have a system with 1 GB of total memory:
AvailableMemoryForRaisingIndexBatchSizeLimit = Math.Min(768, MemoryStatistics.TotalPhysicalMemory / 2); will give the value 512.

If you have 16 GB, it will give you 768 (because of the Math.Min).

But the documentation states that the default is 50% of total system memory, with a minimum value of 768
(and 512 in the 1 GB case is less than 768).

I changed the code to use Math.Max and built the server.

Memory then more or less flattened out at about half of my free memory...
 
Stein
 
 

Oren Eini (Ayende Rahien)

unread,
Jan 13, 2013, 7:10:54 AM1/13/13
to Stein Arild Hoem, ravendb
Maybe it is a docs issue, but the idea is that the limit is the minimum amount of memory that we want to leave _free_ for the system.
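In rough C# terms, the intent is something like this (an illustration only, not the actual RavenDB source; the names are made up):

    // Values in MB. 'freeMemory' is how much physical memory is currently unused;
    // the setting is the headroom we want to keep free for the rest of the system.
    static int NextBatchSize(int currentBatchSize, int maxBatchSize,
                             int freeMemory, int availableMemoryForRaisingIndexBatchSizeLimit)
    {
        if (freeMemory > availableMemoryForRaisingIndexBatchSizeLimit)
            return Math.Min(currentBatchSize * 2, maxBatchSize); // enough headroom, grow the batch
        return currentBatchSize; // too close to the limit, don't grow
    }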

Brett Nagy

unread,
Jan 14, 2013, 2:17:27 PM1/14/13
to rav...@googlegroups.com, Stein Arild Hoem
I can send you (directly) a link to our DB dump, which I cannot import into 2230 due to this error. Until I can get this imported, we can't upgrade.

Oren Eini (Ayende Rahien)

unread,
Jan 15, 2013, 5:21:07 PM1/15/13
to ravendb, Stein Arild Hoem
I got the database, to confirm: 253 MB, right?

Brett Nagy

unread,
Jan 15, 2013, 5:58:43 PM1/15/13
to rav...@googlegroups.com, Stein Arild Hoem
Correct. Thanks.
Brett

Oren Eini

unread,
Jan 15, 2013, 11:48:18 PM1/15/13
to rav...@googlegroups.com, Stein Arild Hoem
Small question, how many docs are there in there?
We just fixed an issue that is probably related, and I am currently at 130K docs and the indexes are keeping up nicely.

Brett Nagy

unread,
Jan 15, 2013, 11:52:50 PM1/15/13
to rav...@googlegroups.com, Stein Arild Hoem
There are approx. 1.4 million documents. I haven't been able to import more than 400K without the above error, unless I disable indexing, in which case it got to around 600K.

Brett

Oren Eini (Ayende Rahien)

unread,
Jan 15, 2013, 11:53:57 PM1/15/13
to ravendb, Stein Arild Hoem
Okay, I'll let you know how it goes, thanks.

Oren Eini (Ayende Rahien)

unread,
Jan 16, 2013, 12:01:59 AM1/16/13
to ravendb, Stein Arild Hoem
Well, it is now at 434K with 2 stale indexes.
I'll let it run, but I have  a good feeling about it.

Oren Eini (Ayende Rahien)

unread,
Jan 16, 2013, 12:56:09 AM1/16/13
to ravendb, Stein Arild Hoem
1.35 million & 6 stale indexes.
I think we caught the issue :-)

Brett Nagy

unread,
Jan 16, 2013, 1:02:57 AM1/16/13
to rav...@googlegroups.com, Stein Arild Hoem
Awesome!!

I assume this was on comparable hardware? ~8gb ram, etc.

I'll test as soon as there is a build.

Brett

Oren Eini (Ayende Rahien)

unread,
Jan 16, 2013, 1:25:09 AM1/16/13
to ravendb, Stein Arild Hoem
8 GB RAM, yes.
It completed at 1,411,653 docs or so.
No errors.

Brett Nagy

unread,
Jan 16, 2013, 4:10:10 PM1/16/13
to rav...@googlegroups.com, Stein Arild Hoem
Ahh, same error for me on build 2233. It does get slightly further (~450K docs).

Checking again, the machine this is running on actually has 6GB memory and 8 cores.

I'll be able to test on many more machines tomorrow afternoon and will follow-up.

Brett

Oren Eini (Ayende Rahien)

unread,
Jan 16, 2013, 4:12:31 PM1/16/13
to ravendb, Stein Arild Hoem
Thank you, we found another issue that may be related; we will check that.
Did you make any config modifications?

Brett Nagy

unread,
Jan 16, 2013, 4:16:50 PM1/16/13
to rav...@googlegroups.com

The only config modification we make is to allow “All” for the Anon user. Everything else is straight out of the box, as unzipped from the download. No bundles are enabled, and a new DB is created that we import our dump into.

 

Brett

Brett Nagy

unread,
Jan 24, 2013, 9:11:26 PM1/24/13
to rav...@googlegroups.com
We've been doing a lot of testing over the last few days and found the success rate to be highly affected by drive performance. In order to test multiple configurations, we set up different EC2 instance types with different storage volume configurations, and this is what we've found.

---------------------------------------------------------

Import 1,411,653 docs, 23 indexes

EBS test configurations

4 x 200GB (each 2000 IOPS) as 1 striped volume
completed, 75 minutes

2 x 200GB (each 2000 IOPS) 2 volumes (1 data, 1 indexes) 
1st time, Esent exception at 416,000 docs
2nd time, Esent exception at 631,808 docs

1 x 200GB (2000 IOPS) volume
1st time, Esent exception at 926K docs (53 minutes)
2nd time, Esent exception at 757K docs (46 minutes)

2 x 200GB (each at 2000 IOPS) as 1 striped  volume 
completed, 75 minutes

2 x 100GB (each at 1000 IOPS) as 1 striped  volume
Raven/Esent/MaxVerPages 512
Esent exception at 1.114 million docs (72 minutes)

2 x 100GB (each at 1000 IOPS) as 1 striped volume
Raven/Esent/MaxVerPages 2048
completed, 90 mins

---------------------------------------------------------

Also worth noting, on my Windows 7 desktop (6GB memory, 2x drives in RAID 1, NOT a VM) I have not yet successfully imported our existing data set (1.4M docs). On my desktop, when setting MaxVerPages to 2048, the docs import, but all indexes are always stale.

Stein Arild Hoem

unread,
Jan 25, 2013, 3:22:30 AM1/25/13
to rav...@googlegroups.com
Sounds very similar to the issues we have...
 
I have noticed that /stats sometimes shows a very high value for DatabaseTransactionVersionSizeInMB - 2000 MB or more... is this uncommitted disk data? This usually happens when CPU/memory usage is very high and indexing gets into trouble...
 
Stein

Matt Warren

unread,
Jan 25, 2013, 3:49:41 AM1/25/13
to rav...@googlegroups.com
That stat is linked to MaxVerPages; it's the current value as opposed to the max value.

So it's not surprising that you see that value get high just before you get an Esent version store exception.

Can you increase MaxVerPages?

Stein Arild Hoem

unread,
Jan 25, 2013, 4:14:26 AM1/25/13
to ravendb
We have increased it first to 2048 and later to 4096, and thus no longer get Esent out-of-memory errors
- but we have problems with indexing that eats all memory/CPU until the server stops responding, stale indexes that never finish indexing, or indexing that simply stops, as I have reported in other threads.

Problems start at as few as 1,000,000 documents with 18 indexes, 6 of them map/reduce (the other indexes do not seem to cause problems).
Right now I have a database with 6,500,000 documents (the source SQL Server I'm migrating from has 14,000,000), which I filled with documents when build 2236 came, and then added the indexes after the documents.
After about a week of 'indexing stops and server restarts', a few of the map/reduce indexes have indexed 5,500,000 documents, and they will manage about another 200,000 before everything hangs and I have to restart Raven.

The dump I'm trying to import was exported from a build back in early December, when I was actually able to fill the database with 6,500,000 documents from the data source with indexes enabled.

Yesterday I started an import with my batch import job, with all indexes enabled, on a clean/empty 2237 installation on a different server. All went fine until about 900,000 documents; then the same problems arose (see my other threads).

As few as 390,000 documents in this database are actually involved in the indexes that do not finish indexing and stay stale...


One question:
If you hit an Esent out-of-memory exception, stop the server, increase MaxVerPages, and then start it again...
can I expect the server to behave OK? The one Esent out-of-memory exception has not corrupted the database in any way?

Stein



--
Best regards,

Stein Hoem
Chairman, Malvik IL Friidrett

Oren Eini (Ayende Rahien)

unread,
Jan 25, 2013, 5:06:55 AM1/25/13
to ravendb
Yes, if you get the OOM, and increase the size, it will behave. It does NOT corrupt the db.

I answered what I think is the root cause of your issues in a separate email.



Stein Arild Hoem

unread,
Jan 25, 2013, 6:03:25 AM1/25/13
to ravendb

Ok - will test right away..

Stein


--
Sincerely,

Stein Hoem
Senior Engineer Norwegian Biodiversity Information Centre

Brett Nagy

unread,
Jan 25, 2013, 12:31:05 PM1/25/13
to rav...@googlegroups.com
re: the root cause, is this something I could try changing on my setup, too? Happy to test / change anything at this point.
Thanks

Oren Eini (Ayende Rahien)

unread,
Jan 25, 2013, 2:27:40 PM1/25/13
to ravendb
That wasn't it, and we are still investigating.



Oren Eini (Ayende Rahien)

unread,
Feb 12, 2013, 1:14:05 PM2/12/13
to ravendb
I think we got a solution for that (we were over eager with optimizations and it killed us).
Can you check the latest?


On Tue, Feb 12, 2013 at 7:57 PM, Weston Binford <wbin...@gmail.com> wrote:
Using Build 2267, I am getting this issue as well. I am using stock RavenDB.Server running as a Windows service with no configuration changes other than setting the port to 20031. The test server is virtualized under VMWare. It has 16GB of RAM and 4 cores (2 Intel Xeon E5-2650 @2.0GHz with two cores each assigned to the VM) running Windows Server 2008 R2 Standard Edition.

The documents are Accounts, each with a list of Tax Records. The accounts have 34 properties, including a couple of addresses and a list of Tax Records. I am loading 43,829 accounts with 595,692 tax records. This is just one of 14 loads that I am running to test performance. In total, I have 509,921 accounts with 2,930,220 tax records (about half of the data in our Oracle database).

When I run locally (16GB of RAM with a 256GB SSD, not virtualized), I can run all 14 data loads individually (waiting for the indexes to update between each run) without error. However, if I use BulkInsert locally, I get the "Version store out of memory (cleanup already attempted)" error about halfway through all the loads. When bulk loading, I run each data load manually, but I don't wait for the indexes to finish.
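(For context, the bulk load is driven through the 2.x client's BulkInsert API, roughly like this; the Account type and the loading loop are simplified stand-ins for my actual code:)

    // Simplified sketch of the bulk-load path, using Raven.Client.Document.DocumentStore
    // and its BulkInsert() API; 'accounts' is the sequence built from the Oracle data.
    static void BulkLoad(IEnumerable<Account> accounts)
    {
        var store = new DocumentStore { Url = "http://localhost:20031" };
        store.Initialize();

        using (var bulkInsert = store.BulkInsert())
        {
            foreach (var account in accounts)
                bulkInsert.Store(account); // documents are batched and streamed to the server
        }
    }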

I am working on creating a github repository with a cut-down version of the project to reproduce the problem.

-Weston






Weston Binford

unread,
Feb 14, 2013, 5:22:21 PM2/14/13
to rav...@googlegroups.com
Okay, I upgraded to Build 2268 and it solved the problem with the errors being thrown. I still have a performance problem that I have not isolated, but I intend to solve it by "cutting the Gordian Knot". I intend to raise this issue in a separate thread. The short version is that I have a map/reduce index that uses a map to flatten a parent/child relationship (Accounts and Tax Records), and then reduces the results to get the total due for the account. Instead, I am going to have a calculated value for the total due on the Account and persist that in the JSON document.
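(For anyone curious, the index is shaped roughly like this; the class and property names are simplified stand-ins for my real model, using the client's AbstractIndexCreationTask:)

    // Simplified stand-in for the map/reduce index described above.
    public class Accounts_TotalDue : AbstractIndexCreationTask<Account, Accounts_TotalDue.Result>
    {
        public class Result
        {
            public string AccountId { get; set; }
            public decimal TotalDue { get; set; }
        }

        public Accounts_TotalDue()
        {
            // Map: flatten the parent/child relationship, one entry per tax record.
            Map = accounts => from account in accounts
                              from tax in account.TaxRecords
                              select new { AccountId = account.Id, TotalDue = tax.AmountDue };

            // Reduce: sum back up to one total per account.
            Reduce = results => from r in results
                                group r by r.AccountId into g
                                select new { AccountId = g.Key, TotalDue = g.Sum(x => x.TotalDue) };
        }
    }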

-Weston