Re: [RavenDB] RavenDB Memory Usage


Oren Eini (Ayende Rahien)

Jul 2, 2012, 5:17:44 AM
to rav...@googlegroups.com
Hi,
This is expected; RavenDB needs to go over all of the documents in the database in order to index them.
It attempts to balance memory usage and indexing performance.

You can control it by specifying 
Raven/MaxNumberOfItemsToIndexInSingleBatch (default is 131,072)
And:
Raven/AvailableMemoryForRaisingIndexBatchSizeLimit (default is 768 MB)

It shouldn't consume all RAM, however. And it certainly shouldn't be paging.
Note that seeing this on the Collections view is consistent with you not having the default Raven/DocumentsByEntityName index.

Long shutdown times are likely caused by the high batch size used during heavy indexing.

There is something strange going on, it should NOT be taking all RAM and it should NEVER force paging.

What is the size of your documents? How many documents do you have? 
How does it behave when you are not introducing a new index?




On Mon, Jul 2, 2012 at 11:54 AM, ErikHeemskerk <erik.he...@integrace.nl> wrote:
I'm trying to migrate an existing application to RavenDB, and I've started migrating data to RavenDB. The database now contains about a quarter of a million documents, of varying types and sizes. The entire database is about 4 GB. When I create an index on documents of a collection that contains only 600-ish documents, all of which are pretty small, RavenDB's memory usage starts ballooning, until it consumes about 16 GB (the amount of RAM I have in my dev machine), Windows starts paging and everything grinds to a screeching halt. Then, when I try to stop RavenDB (it runs as a Windows service), it takes literally tens of minutes for it to shut down and my machine to recover. During shutdown, memory usage keeps increasing. More recently, this ballooning behavior has also started occurring when I just open the database and try to access the 'Collections' view from the studio.

I'm running RavenDB build 960 as a Windows service. It's basically an OOB installation, except that I've created a separate database for the application.

Can somebody point me in the right direction? At this point RavenDB is more a nuisance than a useful database system.

Matt Warren

Jul 2, 2012, 5:23:55 AM
to rav...@googlegroups.com
> This is expected; RavenDB needs to go over all of the documents in the database in order to index them.
> It attempts to balance memory usage and indexing performance.

Slightly off topic, but this comes up from time-to-time. Could the doc type be stored in the esent dBase, so you can use an index to read just the docs that match the map in the index, rather than having to look at all of them?

ErikHeemskerk

Jul 2, 2012, 5:52:37 AM
to rav...@googlegroups.com
On Monday, July 2, 2012 11:17:44 AM UTC+2, Oren Eini wrote:
> Hi,
> This is expected; RavenDB needs to go over all of the documents in the database in order to index them.
> It attempts to balance memory usage and indexing performance.
>
> You can control it by specifying
> Raven/MaxNumberOfItemsToIndexInSingleBatch (default is 131,072)
> And:
> Raven/AvailableMemoryForRaisingIndexBatchSizeLimit (default is 768 MB)
>
> It shouldn't consume all RAM, however. And it certainly shouldn't be paging.
> Note that seeing this on the Collections view is consistent with you not having the default Raven/DocumentsByEntityName index.

I do have the Raven/DocumentsByEntityName index, and it is not stale. (Stats: Total results: 294,992, Status: Up to date, Last update of document: 12 hours ago)
 

> Long shutdown times are likely caused by the high batch size used during heavy indexing.
>
> There is something strange going on, it should NOT be taking all RAM and it should NEVER force paging.
>
> What is the size of your documents? How many documents do you have?

There are currently 294,994 documents in the database. I know some of them are pretty large, but I don't know of any way to find out how large they are. There are two collections with large documents. In one, the documents are meant to be that large; the modelling follows the transaction boundary. The other I plan to remodel, because its documents don't follow the transaction boundary and are currently probably too large.
 
> How does it behave when you are not introducing a new index?

Most recently, a couple of minutes ago, I fired up RavenDB, opened the studio, and as soon as I selected the correct database, memory started to grow. Before that, memory usage was around 70 MB. It's currently so bad that I'm even unable to view the server logs. Also, when I start the server not as a service, but with Start.cmd, I get no logging information at all.

Oren Eini (Ayende Rahien)

Jul 2, 2012, 5:54:05 AM
to rav...@googlegroups.com
Hard to do. What happens with multi map?

Oren Eini (Ayende Rahien)

Jul 2, 2012, 5:55:01 AM
to rav...@googlegroups.com
Can you try taking a dump of the memory when it is using too much and send it to us?

ErikHeemskerk

Jul 2, 2012, 6:59:57 AM
to rav...@googlegroups.com
Do you want a minidump or a full dump? And how would you prefer I send it to you?


On Monday, July 2, 2012 11:55:01 AM UTC+2, Oren Eini wrote:
> Can you try taking a dump of the memory when it is using too much and send it to us?

Oren Eini (Ayende Rahien)

Jul 2, 2012, 7:03:34 AM
to rav...@googlegroups.com
Full dump.
If you can provide the db export, that would be great.
You can send via Dropbox.

ErikHeemskerk

Jul 2, 2012, 8:39:05 AM
to rav...@googlegroups.com
Smuggler crashes after 294,912 documents with a time-out. Can I send a partial export? Also, by 'Dropbox', I assume you mean 'a shared folder in Dropbox' (sorry, I'm kind of a Dropbox n00b). With whom do I share the folder?

Fitzchak Yitzchaki

Jul 2, 2012, 8:46:26 AM
to rav...@googlegroups.com
sup...@hibernatingrhinos.com; if this requires a Dropbox account email, Oren's email would work: aye...@ayende.com

Oren Eini (Ayende Rahien)

Jul 2, 2012, 2:42:35 PM
to rav...@googlegroups.com
What was the crash?



ErikHeemskerk

Jul 2, 2012, 3:34:13 PM
to rav...@googlegroups.com
Unhandled Exception: System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at Raven.Abstractions.Connection.HttpRavenRequest.SendRequestToServer(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Abstractions\Connection\HttpRavenRequest.cs:line 183
   at Raven.Abstractions.Connection.HttpRavenRequest.ExecuteRequest(Action`1 action) in c:\Builds\RavenDB-Stable\Raven.Abstractions\Connection\HttpRavenRequest.cs:line 135
   at Raven.Smuggler.SmugglerApi.GetDocuments(Guid lastEtag) in c:\Builds\RavenDB-Stable\Raven.Smuggler\SmugglerApi.cs:line 66
   at Raven.Abstractions.Smuggler.SmugglerApiBase.ExportDocuments(SmugglerOptions options, JsonTextWriter jsonWriter, Guid lastEtag) in c:\Builds\RavenDB-Stable\Raven.Abstractions\Smuggler\SmugglerApiBase.cs:line 121
   at Raven.Abstractions.Smuggler.SmugglerApiBase.ExportData(SmugglerOptions options, Boolean incremental) in c:\Builds\RavenDB-Stable\Raven.Abstractions\Smuggler\SmugglerApiBase.cs:line 87
   at Raven.Smuggler.Program.Parse(String[] args) in c:\Builds\RavenDB-Stable\Raven.Smuggler\Program.cs:line 132
   at Raven.Smuggler.Program.Main(String[] args) in c:\Builds\RavenDB-Stable\Raven.Smuggler\Program.cs:line 70

On Monday, July 2, 2012 8:42:35 PM UTC+2, Oren Eini wrote:
> What was the crash?

Oren Eini (Ayende Rahien)

Jul 2, 2012, 5:57:23 PM
to rav...@googlegroups.com
That was very helpful.
We managed to reproduce this locally, and are investigating.

A quick fix would be to set Raven/MaxNumberOfItemsToIndexInSingleBatch to a value of 4096 or 8192.

We are currently checking to see why that isn't auto-tuned; we will update as we have more info.


ErikHeemskerk

Jul 3, 2012, 5:05:13 AM
to rav...@googlegroups.com
I've tried setting this in my Raven.Server.exe.config, but unfortunately it did not work. The file currently looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="Raven/Port" value="*"/>
    <add key="Raven/DataDir" value="~\Data"/>
    <add key="Raven/AnonymousAccess" value="Get"/>
    <add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="4096"/>
  </appSettings>
  <runtime>
    <loadFromRemoteSources enabled="true"/>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="Analyzers"/>
    </assemblyBinding>
  </runtime>
</configuration>

I've modified the file, then started Raven, but the moment I selected the database in question, the ballooning happened just as before.

On Monday, July 2, 2012 11:57:23 PM UTC+2, Oren Eini wrote:
> That was very helpful.
> We managed to reproduce this locally, and are investigating.
>
> A quick fix would be to set Raven/MaxNumberOfItemsToIndexInSingleBatch to a value of 4096 or 8192.
>
> We are currently checking to see why that isn't auto-tuned; we will update as we have more info.

Matt Warren

Jul 3, 2012, 8:16:06 AM
to rav...@googlegroups.com
Just out of interest, what happens if you decrease that value instead? I.e. set MaxNumberOfItemsToIndexInSingleBatch = 512 or 256?

Oren Eini (Ayende Rahien)

Jul 3, 2012, 9:42:36 AM
to rav...@googlegroups.com
Give me some time to try to track down exactly why this is happening.

Oren Eini (Ayende Rahien)

Jul 3, 2012, 7:17:30 PM
to rav...@googlegroups.com
I am hazarding a guess here, but it seems that there is a BIG disparity in the sizes of your documents. I haven't finished looking at all of the data, but I think that the mix of different sizes, along with the very large documents, is throwing off the heuristics RavenDB uses for memory usage.

At least it gave us much better smuggler output :-).

Oren Eini (Ayende Rahien)

Jul 3, 2012, 8:29:24 PM
to rav...@googlegroups.com
Confirmed: you have some documents which are 100s of KB, then you have some that are 5 MB+ (ArticleImports/118, ArticleGroups/83303).
Then you have:
11,624 kb - ArticleImports/305
12,878 kb - ArticleImports/368
17,563 kb - ArticleImports/399
19,413 kb - ArticleImports/526
22,216 kb - ArticleImports/558
28,846 kb - ArticleImports/564

This explains it. 
It is actually _really_ bad because of the way they were saved.
You have a few hundred thousand documents with relatively small sizes (100s of KB - 1 MB), and then you have a whole batch of very large items.
RavenDB assumes consistent document sizes, so it tries to load a large batch of documents (assuming they are roughly 100s of KB - 1 MB), and suddenly we have this batch of dozens of MB that all come up at once.

From the pattern of the data, you probably have documents even larger than that which you weren't able to send me.

Note that we also improved the smuggler as a result of this.

At any rate, RavenDB now takes into account live document sizes as they are indexed, not just heuristics.
Hopefully that should fix it.

The workaround for your case is to specify a REALLY low max batch size, 128 - 256 or so.
Assume that each doc is 20 MB in size; that should do it.
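The sizing arithmetic behind that advice can be sketched directly (an illustrative back-of-the-envelope check, not RavenDB code; it assumes the worst case where every document in a batch is at the maximum size):

```python
# Rough worst-case memory estimate for an indexing batch:
# batch_size * largest_document_size. Illustrative numbers only.

def worst_case_batch_mb(batch_size, max_doc_mb):
    """Upper bound on raw document bytes loaded for one indexing batch."""
    return batch_size * max_doc_mb

# Default batch of 131,072 docs at even ~1 MB each: far beyond 16 GB of RAM.
print(worst_case_batch_mb(131_072, 1))   # 131072 MB, ~128 GB

# Reduced batch of 256 docs at 20 MB each: ~5 GB, survivable.
print(worst_case_batch_mb(256, 20))      # 5120 MB, ~5 GB
```

With the default batch size, even modest per-document sizes overwhelm a 16 GB machine, which matches the ballooning described earlier in the thread.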

ErikHeemskerk

Jul 4, 2012, 12:40:31 AM
to rav...@googlegroups.com
I'll try this and hope it solves the problem. And yes, some documents are pretty large. The ArticleImports documents are actually expected to be large; they follow the transaction boundary, because they are, as the name sort of implies, the unprocessed, parsed data of an article import, which may contain virtually unlimited articles (the current max is about 50,000), each of which may contain an unlimited number of properties (the current max for a single import is 300,000). The article groups I modeled into a single document because my goal was to only need a single query to retrieve all of the data necessary to both display and edit an article group. An article group can contain up to a thousand articles, so I guess it is wise to model the articles as individual documents. I was already planning to do this, but the current state of the database made this impossible.

What is the best way to model, for example, the article import documents? So, for clarity, an article import can contain as little as 1 article, up to as much as 50k articles, each of which can contain anywhere between 0 and 50 properties.



ErikHeemskerk

Jul 4, 2012, 1:01:50 AM
to rav...@googlegroups.com
Update: I'm curious what the lower bound of Raven/MaxNumberOfItemsToIndexInSingleBatch is and if I'm setting it correctly. I've set it to 128, but then memory usage reached 11.2 GB before the system ground into a swapping frenzy. It ate up a single CPU core for about 3 minutes, then stopped; I guess because of all the swapping. I stopped RavenDB, set the limit to 32, started it again, and memory usage grew even faster to about 15.7 GB before I pulled the plug. Is there any way I can query what the largest documents in my database are? Can I delete the Raven/DocumentsByEntityName index (because I think that's the index that's immediately addressed when I select the database in Raven Studio) and will it help?

FYI, the current Raven.Server.exe.config looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="Raven/Port" value="*"/>
    <add key="Raven/DataDir" value="~\Data"/>
    <add key="Raven/AnonymousAccess" value="Get"/>
    <add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="32"/>
  </appSettings>
  <runtime>
    <loadFromRemoteSources enabled="true"/>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="Analyzers"/>
    </assemblyBinding>
  </runtime>
</configuration>

Also, I'm assuming it's not possible to set MaxNumberOfItemsToIndexInSingleBatch differently per database.

Oren Eini (Ayende Rahien)

Jul 4, 2012, 1:17:39 AM
to rav...@googlegroups.com
I would actually say that ArticleImport should be broken up.
Logically, it may be a single document, but physically, it needs to be broken up.

Oren Eini (Ayende Rahien)

Jul 4, 2012, 1:20:23 AM
to rav...@googlegroups.com
I just pushed it, so a new build should be out any minute now.
Not sure about the rapid growth, but it is probably because of not also setting
Raven/InitialNumberOfItemsToIndexInSingleBatch to an initial value smaller than the max.
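The interplay between the initial and max batch-size settings can be sketched as a simple growth loop (a hypothetical model for illustration only, not RavenDB's actual auto-tuning code; the 768 MB threshold mirrors the Raven/AvailableMemoryForRaisingIndexBatchSizeLimit default mentioned earlier in the thread):

```python
# Hypothetical sketch of batch-size auto-tuning: start at the initial
# value and double after each successful batch until the configured max
# is reached or memory headroom runs out. Not RavenDB's actual code.

def next_batch_size(current, max_batch, available_memory_mb, threshold_mb=768):
    """Double the batch size while memory headroom remains, capped at max."""
    if available_memory_mb < threshold_mb:
        return current  # no headroom: stay put (a real tuner might shrink)
    return min(current * 2, max_batch)

size = 512  # e.g. Raven/InitialNumberOfItemsToIndexInSingleBatch
sizes = []
for _ in range(6):
    sizes.append(size)
    size = next_batch_size(size, max_batch=4096, available_memory_mb=2048)
print(sizes)  # [512, 1024, 2048, 4096, 4096, 4096]
```

The point of a small initial value is that the first few batches stay cheap even when the configured max is large.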

At any rate, check the new build; it should behave better. You can actually set different values for different dbs.
Just open the database document on the default db and add the Settings there.
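For the per-database tuning Oren describes, the database document on the default database would carry the settings roughly like this. This is a hedged sketch of the shape implied by the thread; the Raven/DataDir value is an illustrative assumption:

```json
{
  "Settings": {
    "Raven/DataDir": "~\\Databases\\MyDatabase",
    "Raven/MaxNumberOfItemsToIndexInSingleBatch": "256",
    "Raven/InitialNumberOfItemsToIndexInSingleBatch": "128"
  }
}
```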

Tobi

Jul 4, 2012, 4:07:26 AM
to rav...@googlegroups.com
On 04.07.2012 06:40, ErikHeemskerk wrote:

> What is the best way to model, for example, the article import documents?
> So, for clarity, an article import can contain as little as 1 article, up
> to as much as 50k articles, each of which can contain anywhere between 0
> and 50 properties.

Depending on what you are actually doing with an article import
document, you might as well store this big chunk of data as an attachment.

Tobias

Daniel Lidström

Jul 4, 2012, 4:27:52 AM
to rav...@googlegroups.com
On Wednesday, July 4, 2012 6:40:31 AM UTC+2, ErikHeemskerk wrote:

What is the best way to model, for example, the article import documents? So, for clarity, an article import can contain as little as 1 article, up to as much as 50k articles, each of which can contain anywhere between 0 and 50 properties.

I suppose you could create an article import document. Then, add the individual articles as another type of document with a reference to the article import document. You are allowed to have inter-document references. This would probably bring down the size of the individual documents. Also, you can always create indexes that aggregate information if you want to show, for example, the number of articles in an article import.

/Daniel
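Daniel's suggestion can be sketched as JSON-style document shapes (shown as Python dicts here; every ID and field name is hypothetical):

```python
# Hypothetical document shapes for splitting a large ArticleImport into
# per-article documents, as suggested above. All IDs/field names invented.

article_import = {
    "Id": "ArticleImports/305",
    "Source": "supplier-feed.xml",   # metadata only; no embedded articles
    "ArticleCount": 50_000,
}

article = {
    "Id": "Articles/1",
    "ArticleImportId": "ArticleImports/305",  # reference, not embedding
    "Properties": {"Color": "red", "Weight": "2kg"},
}

# Each document now stays small regardless of how many articles an import
# contains; an index can aggregate counts per ArticleImportId.
assert article["ArticleImportId"] == article_import["Id"]
```

This keeps every document well under the batch-size heuristics discussed above, at the cost of loading articles by reference instead of in one read.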

ErikHeemskerk

Jul 4, 2012, 5:48:45 AM
to rav...@googlegroups.com
I've installed the new build; it behaves much better. Memory usage is still high (12.5 GB), but not so high that the machine grinds to a halt. I can now actually select the database and see documents and collections. I did notice it now says the Raven/DocumentsByEntityName index is stale, and it is now updating that index. I also dig the new Studio UI. :)

Oren Eini (Ayende Rahien)

Jul 4, 2012, 6:59:54 PM
to rav...@googlegroups.com
RavenDB attempts to make use of as much memory as possible.