I'm trying to migrate an existing application to RavenDB, and I've started migrating data to RavenDB. The database now contains about a quarter of a million documents, of varying types and sizes. The entire database is about 4 GB.

When I create an index on documents of a collection that contains only 600-ish documents, all of which are pretty small, RavenDB's memory usage starts ballooning until it consumes about 16 GB (the amount of RAM I have in my dev machine), Windows starts paging, and everything grinds to a screeching halt. Then, when I try to stop RavenDB (it runs as a Windows service), it takes literally tens of minutes for it to shut down and for my machine to recover. During shutdown, memory usage keeps increasing. More recently, this ballooning behavior has also started occurring when I just open the database and try to access the 'Collections' view from the studio.

I'm running RavenDB build 960 as a Windows service. It's basically an OOB installation, except that I've created a separate database for the application.

Can somebody point me in the right direction? At this point RavenDB is more a nuisance than a useful database system.
Hi,

This is expected: RavenDB needs to go over all of the documents in the database in order to index them. It attempts to balance between memory usage and indexing performance. You can control this by specifying:

Raven/MaxNumberOfItemsToIndexInSingleBatch (default is 131,072)

And:

Raven/AvailableMemoryForRaisingIndexBatchSizeLimit (default is 768 MB)

It shouldn't consume all RAM, however. And it certainly shouldn't be paging.

Note that this happening on collections is consistent with you not having the default Raven/DocumentsByEntityName index.
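For reference, in 1.0-era builds like 960 these settings normally go in the appSettings section of Raven.Server.exe.config (this is an assumption about your setup; the values shown are just the defaults mentioned above). A minimal sketch:

```xml
<configuration>
  <appSettings>
    <!-- Maximum number of documents pulled into a single indexing batch -->
    <add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="131072"/>
    <!-- Available-memory threshold (in MB) used when deciding whether to raise the batch size -->
    <add key="Raven/AvailableMemoryForRaisingIndexBatchSizeLimit" value="768"/>
  </appSettings>
</configuration>
```

You'd restart the RavenDB Windows service after editing the file for the change to take effect.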
Long shutdown times are likely due to the high batch size it uses during heavy indexing.

There is something strange going on, though; it should NOT be taking all RAM, and it should NEVER force paging. What is the size of your documents? How many documents do you have?
How does it behave when you are not introducing a new index?
Can you try taking a dump of the memory when it is using too much and sending it to us?
What was the crash?
That was very helpful. We managed to reproduce this locally and are investigating.

A quick fix would be to set Raven/MaxNumberOfItemsToIndexInSingleBatch to a value of 4096 or 8192.

We are currently checking to see why that isn't auto-tuned; we will update as we have more info.
Confirmed: you have some documents which are 100s of KB, and then some that are 5 MB+ (ArticleImports/118, ArticleGroups/83303). Then you have:

11,624 KB - ArticleImports/305
12,878 KB - ArticleImports/368
17,563 KB - ArticleImports/399
19,413 KB - ArticleImports/526
22,216 KB - ArticleImports/558
28,846 KB - ArticleImports/564
This explains it. It is actually _really_ bad because of the way they were saved. You have a few hundred thousand documents of relatively small sizes (100s of KB - 1 MB), and then you have a whole batch of very large items. RavenDB assumes consistent document sizes, so it tries to load a large batch of documents (assuming they are roughly 100s of KB - 1 MB), and suddenly this batch of dozens-of-MB documents all comes up at once. From the pattern of the data, you probably have documents even larger than that which you weren't able to send me.

Note that we also improved the smuggler as a result of this. At any rate, RavenDB now takes the live document sizes into account as they are indexed, not just heuristics. Hopefully that should fix this.

The workaround for your case is to specify a REALLY low max batch size, 128 - 256 or so. Assume that each doc is 20 MB in size; that should do it.
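Applying that workaround (assuming the same Raven.Server.exe.config appSettings mechanism; 256 is one of the values suggested above) would look something like:

```xml
<appSettings>
  <!-- Very low batch size for wildly varying document sizes:
       sized assuming each document could be up to ~20 MB -->
  <add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="256"/>
</appSettings>
```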
What is the best way to model, for example, the article import documents? For clarity: an article import can contain as few as 1 article and as many as 50k articles, each of which can contain anywhere between 0 and 50 properties.