The code below will cause an OutOfMemoryException with a database
containing a lot of Sale documents.
Where's the memory leak?
using (var documentStore = new EmbeddableDocumentStore())
{
    documentStore.Configuration.DataDirectory = "Data";
    documentStore.Configuration.DefaultStorageTypeName = "esent";
    documentStore.Initialize();

    var skip = 0;
    while (true)
    {
        QueryResult queryResult;
        do
        {
            queryResult = documentStore.DatabaseCommands.Query(
                "Raven/DocumentsByEntityName",
                new IndexQuery
                {
                    Query = "Tag:[[Sales]]",
                    Start = skip,
                    PageSize = 1024,
                },
                null);
            if (queryResult.IsStale) Thread.Sleep(100);
        } while (queryResult.IsStale);

        if (queryResult.Results.Count == 0) break;
        skip += queryResult.Results.Count;
    }
}
Tobias
Ryan
> documentStore is caching the results returned from the server?
It's running local/embedded, so it shouldn't do any caching at all
outside the session scope.
Tobias
> Are you running under 32 or 64-bit and how much memory is it consuming
> when it blows up?
The OS is 64-bit, the app is compiled for 32-bit. It blows up at
1.6 GB after about 30 seconds.
Tobias
> Raven generally doesn't run well under 32-bit, as it quite easily uses
> up the ~2GB of memory available.
So far that hasn't been a problem (besides some memory leaks in previous
releases), and I've been using Raven for about a year now on dozens of systems.
And 64-bit isn't really an option - a lot of the embedded systems we use
don't have a 64-bit-capable CPU, and usually there's no more than 2 GB of RAM
available. I have Raven running on 512 MB systems as well!
> You can try changing some of the
> memory settings outlined in http://ravendb.net/faq/low-memory-footprint
> and http://ravendb.net/documentation/configuration.
This won't help.
> However it may well just be down to high memory usage in Raven and/or
> Lucene as you are doing deep-paging which means that in effect all the
> docs are being queried from the index and read from the data store. So
> both Raven and Lucene will be doing a lot of caching.
I still think this is a bug. I'll check this against previous versions
next week, but I'm pretty sure it worked one or two stable builds back.
Raven shouldn't do this amount of useless heavy caching - not even in a
Client/Server scenario.
Tobias
> But is this test a scenario that you do regularly? i.e do you have a
> need to page through all the docs in the database or is it just a test
> that reproduces your issue more quickly?
I need to do this for the purpose of data migration. I made some
modifications to my document classes and now need to change the
stored docs. Until now I could do minor changes with patch commands, but
this time it's more complicated, so I need to touch all documents (not ALL
documents, just the docs of a specific type) myself.
Tobias
> You might want to get those docs via:
> documentStore.DatabaseCommands.StartsWith("sale", <page>, <size>)
Nice - I wasn't aware of this.
> This will pull docs directly from the Esent/Munin datastore, bypassing
> the Lucene index completely. You can also page through them in the
> same way as you would with the query.
>
> It might not fix the issue, but it's a more efficient way (as long as
> all the docs you want have the same prefix).
For the purpose of data migration this is definitely better than dealing
with indexes. I'll try this as soon as I'm back at work. Thanks for the hint!
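For anyone following along, the prefix-based paging loop might look roughly like this. This is a sketch, assuming the old embedded-client API where StartsWith(prefix, start, pageSize) returns an array of JsonDocument; the "sales/" prefix is an assumption - use whatever key convention your Sale documents actually have:

```csharp
// Sketch: page through docs by key prefix, bypassing the Lucene index.
// Assumes StartsWith(prefix, start, pageSize) returns JsonDocument[]
// as in the old client API; "sales/" is a placeholder prefix.
var start = 0;
const int pageSize = 1024;
while (true)
{
    var docs = documentStore.DatabaseCommands.StartsWith("sales/", start, pageSize);
    if (docs.Length == 0)
        break;
    foreach (var doc in docs)
    {
        // work with doc.DataAsJson here
    }
    start += docs.Length;
}
```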
Tobias
> I've done a bit of digging and Raven uses the
> System.Runtime.Caching.MemoryCache internally to cache docs after
> they've been read.
I don't think this is causing the problem. I did a quick check with a
memory profiler last Friday and it's a dictionary that sucks up all the
memory. I haven't had a chance to pinpoint this yet - such shit always
happens Friday afternoon :-)
Tobias
First: I was wrong in my assumption that this had worked with a
previous version. I've tried every stable release back to January and
always got an OOME.
Next I tried different ways to query the docs:
1. Using DatabaseCommands.Query("Raven/DocumentsByEntityName", ...):
   Retrieved docs: 20480
   OutOfMemoryException
   Used memory: 1680752640 bytes (~1.6 GB)
   Used time:   00:01:10.8783754
2. Using LuceneQuery<dynamic>("Raven/DocumentsByEntityName"):
   Retrieved docs: 24908
   Used memory: 1406980096 bytes (~1.3 GB)
   Used time:   00:02:35.3375286
3. Using GetDocumentsWithIdStartingWith():
   Retrieved docs: 24908
   Used memory: 434380800 bytes (~414 MB)
   Used time:   00:00:43.7233648
Only the IndexQuery via DatabaseCommands causes an OOME.
But the LuceneQuery will throw an OOME as well if I simply
read more docs.
GetDocumentsWithIdStartingWith() is by far the fastest method
and uses the least memory.
This is probably the best way for what I need to do (modify
the structure of all docs of a specific type). I'm just not
sure whether this will work if I modify the docs while paging
through them via GetDocumentsWithIdStartingWith().
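As a sketch, the migration loop could look like the following. It assumes the old IDatabaseCommands.Put(key, etag, document, metadata) signature and the "sales/" prefix is a placeholder; since the docs are only modified in place and their ids don't change, skip-based paging should still visit each doc exactly once:

```csharp
// Sketch of the migration loop (assumed old DatabaseCommands API, untested).
var start = 0;
const int pageSize = 128;
while (true)
{
    var docs = documentStore.DatabaseCommands.StartsWith("sales/", start, pageSize);
    if (docs.Length == 0)
        break;
    foreach (var doc in docs)
    {
        var data = doc.DataAsJson;
        // ... restructure the document here ...
        documentStore.DatabaseCommands.Put(doc.Key, doc.Etag, data, doc.Metadata);
    }
    start += docs.Length;
}
```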
I still don't like that RavenDB OOMs this easily.
Trying the following:
using (DocumentCacher.SkipSettingDocumentsInDocumentCache())
{
...
}
...makes even the first method work.
So I guess Matt was right and this is only a caching issue.
It would be nice to be able to configure the cache parameters
in code.
I'll try to provide a pull request for this soon.
Tobias
> With the original method add the following lines of code after each
> query:
> GC.Collect();
> GC.WaitForFullGCComplete(2000);
> You can see that the memory usage is more reasonable. (I know this
> isn't a fix or advisable in production code, it's just to show the
> point)
Setting the configuration for MemoryCache works too. All I need is
to decrease the polling interval, which is 2 minutes by default.
(Meaning the cache limits are only checked every 2 minutes, but I
reach the memory limit much earlier than that.)
http://msdn.microsoft.com/en-us/library/dd941875.aspx
The documentation just seems to be slightly wrong. This is a working
example:
<system.runtime.caching>
  <memoryCache>
    <namedCaches>
      <add name="Default"
           cacheMemoryLimitMegabytes="0"
           physicalMemoryLimitPercentage="50"
           pollingInterval="00:01:00" />
    </namedCaches>
  </memoryCache>
</system.runtime.caching>
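For comparison, the same limits can also be set when constructing a MemoryCache instance in code. Note the caveat: this only works for new instances - MemoryCache.Default can only be configured via app.config as above, and I haven't checked which instance Raven uses internally. The cache name "Documents" is just a placeholder:

```csharp
using System.Collections.Specialized;
using System.Runtime.Caching;

// Sketch: the same limits for a programmatically created cache instance.
// (MemoryCache.Default itself can only be configured via app.config.)
var settings = new NameValueCollection
{
    { "cacheMemoryLimitMegabytes", "0" },       // 0 = no absolute limit, percentage rules
    { "physicalMemoryLimitPercentage", "50" },  // trim above 50% of physical RAM
    { "pollingInterval", "00:01:00" }           // check the limits every minute
};
var cache = new MemoryCache("Documents", settings);
```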
But I prefer to be able to set this via the Raven configuration in
code, like this:
https://github.com/e-tobi/ravendb/tree/ExposeCachingParameters
It would be possible to expose the megabyte limit as well, but I think
the percentage limit is all anyone will ever need.
It would also be possible to configure the TTL of the cached items,
which might be a bit more helpful.
Tobias
> Tobi,
> Great job on figuring this out
Some credit goes to Matt as well - he guessed it was a memory cache
issue before I could track it down :-)
> but it seems that you figured it out on your own, this seems wonderful!
> I'll pull your changes.
I haven't done a pull request yet, because I was thinking about adding
a TTL setting for the cached entries as well. But if you think
that MemoryCacheLimitPercentage and MemoryCacheLimitCheckInterval are
enough, please go ahead.
Tobias