OutOfMemoryException with simple batch export loop


Tobias Sebring

Jan 9, 2013, 5:36:04 AM
to rav...@googlegroups.com
I'm attempting to export partial documents of a specific type from a large RavenDB database, built from the public IMDb data dumps, to a plain-text file. The client application is compiled for x86 and eventually runs out of memory and throws an OutOfMemoryException. Every iteration allocates around one megabyte of extra memory that is not garbage collected until I forcefully recreate the DocumentStore.

What am I missing here? Can I stop the DocumentStore from keeping references to these objects? I've attempted to add session.Advanced.DocumentStore.DisableAggressiveCaching() in order to stop the documents from being cached in memory.

Any chance of an optimized path for exports like the one that was introduced for bulk-imports recently?


The source code is here:

Screenshot from Visual Studio memory profiler hot-path for one of many object types that remain allocated in memory:
This is after about 50 iterations i.e. 50 * 1024 documents processed.

Matt Warren

Jan 9, 2013, 7:56:34 AM
to ravendb
You need to use code like this:

DocumentSession.Advanced.DocumentStore.DisableAggressiveCaching();
DocumentSession.Advanced.DocumentStore.Conventions.ShouldCacheRequest = url => false;

// Execute the query

// for each loaded doc
DocumentSession.Advanced.Evict(doc);

Chris Marisic

Jan 9, 2013, 8:59:05 AM
to rav...@googlegroups.com
I think the optimized path for this will be to take an index snapshot.

For today, I think you might be better off managing the memory yourself.


Use Session.Advanced.DatabaseCommands.StartsWith(keyPrefix, skip, take), which returns JsonDocuments.

Then you could either deserialize those to your physical objects, or you could read right out of the JSON:

jsonDocument.DataAsJson["MyProperty"].ToString()

This will give you very fine-grained control over memory usage, and for bulk export scenarios it won't add many lines of code.
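
A minimal sketch of the paging loop this approach implies, not the actual export code from the thread. PagedExport, fetchPage and formatLine are hypothetical names introduced for illustration; fetchPage(skip, take) is assumed to wrap whatever call returns one page of raw documents (for example the StartsWith call above), and formatLine is assumed to pull the wanted properties out of one document. Nothing here is part of the RavenDB client:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PagedExport
{
    // Hypothetical helper: writes one line per item, fetching one page at a time.
    // fetchPage(skip, take) is an assumed stand-in for the paged raw-document call.
    public static int Export<T>(
        Func<int, int, IList<T>> fetchPage,
        Func<T, string> formatLine,
        TextWriter output,
        int pageSize = 1024)
    {
        if (fetchPage == null) throw new ArgumentNullException("fetchPage");
        if (formatLine == null) throw new ArgumentNullException("formatLine");
        if (output == null) throw new ArgumentNullException("output");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        int written = 0;
        while (true)
        {
            // Only the current page is referenced here, so earlier pages
            // become eligible for garbage collection.
            IList<T> page = fetchPage(written, pageSize);
            if (page == null || page.Count == 0)
            {
                break; // an empty page means there is nothing left to export
            }

            foreach (T item in page)
            {
                output.WriteLine(formatLine(item));
            }

            written += page.Count;
        }

        return written;
    }
}
```

With the raw call from above, formatLine would be something like doc => doc.DataAsJson["MyProperty"].ToString(), and output a StreamWriter over the export file. The loop holds no reference to a page after it has been written, which is the point of managing the memory by hand.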

Chris Marisic

Jan 9, 2013, 9:07:34 AM
to rav...@googlegroups.com
One more thought: if all the data you want to export is stored directly in the index, reading from the index and projecting those values should also skip tracking and avoid full document deserialization and accumulation.

Looking at your code and seeing .Select(x => new { GlobalId = x.Id, ImdbFullTitle = x.ImdbFullTitle }); it really looks like something that you could either stream right off the index instead of getting the full documents, or easily do by reading directly from the JSON.

If the projection method results in caching-type memory growth, then you could use the relevant raw DatabaseCommands operation:


var indexQuery = new IndexQuery
{
    Query = query.ToString(),
    SkipTransformResults = true, // optional
    PageSize = 1024,
    FieldsToFetch = new[] { "Id", "ImdbFullTitle" },
    SortedFields = new[] { new SortedField("Created") } // if needed
};

var queryResult = Session.Advanced.DatabaseCommands.Query(typeof(MyIndex).Name, indexQuery, new string[0]);

This returns a class with .Results, which are RavenJObjects, and you can read directly out of those:

var ids = queryResult
    .Results
    .Select(x => x["Id"].Value<string>());

Tobias Sebring

Jan 9, 2013, 3:07:20 PM
to rav...@googlegroups.com
Matt:
Your solution works for getting rid of the residual memory. Thank you!

Chris:
I'm interested in optimizing this export as much as possible. I managed to implement the solution from your first post and the memory footprint is indeed much lower.

I cannot, however, get what you wrote in your second post to work. I've defined the variable query as: var query = session.Query<Entity, Index>().Skip(n).Take(batchSize); but queryResult.Results does not end up containing any data from the index. Each result instead contains this:
{
  "__document_id": "",
  "@metadata": {
    "Raven-Entity-Name": "",
    "Raven-Clr-Type": "",
    "Temp-Index-Score": 1.0
  }
}

Chris Marisic

Jan 9, 2013, 3:21:23 PM
to rav...@googlegroups.com
I wonder if this should be opened as an issue. My code is running on build 960; I haven't tested anything on 2.0 yet.

Tobias Sebring

Jan 10, 2013, 5:01:50 AM
to rav...@googlegroups.com
Below I've added some more information on what is happening.

This is the request being made to RavenDB (build 2230):

This is the response json:
{
   "Results":[
      {
         "__document_id":"ImdbMedias-a5a9be7a-70a1-4493-a7f5-4d1d887e3258",
         "@metadata":{
            "Raven-Entity-Name":"ImdbMedias",
            "Raven-Clr-Type":"Imdb.Data.ImdbMedia, Imdb.Data",
            "Temp-Index-Score":1.0
         }
      },
      {
         "__document_id":"ImdbMedias-b04691d6-2641-481b-bfc9-9271a242996e",
         "@metadata":{
            "Raven-Entity-Name":"ImdbMedias",
            "Raven-Clr-Type":"Imdb.Data.ImdbMedia, Imdb.Data",
            "Temp-Index-Score":1.0
         }
      },
      {
         "__document_id":"ImdbMedias-2390ddd8-0829-4766-ae80-d24b63a21811",
         "@metadata":{
            "Raven-Entity-Name":"ImdbMedias",
            "Raven-Clr-Type":"Imdb.Data.ImdbMedia, Imdb.Data",
            "Temp-Index-Score":1.0
         }
      },
 ...
   ],
   "Includes":[

   ],
   "IsStale":false,
   "IndexTimestamp":"2013-01-07T22:59:56.6822443Z",
   "TotalResults":2410032,
   "SkippedResults":0,
   "IndexName":"ImdbMedias/Search",
   "IndexEtag":"00000001-0000-0100-0000-000000498caf",
   "ResultEtag":"f5f2512a-23bb-2df7-9d0a-40e00ab59370",
   "NonAuthoritativeInformation":false,
   "LastQueryTime":"2013-01-10T08:53:46.5904892Z"
}

Updated source code here:

Chris Marisic

Jan 10, 2013, 10:12:12 AM
to rav...@googlegroups.com
ImdbFullTitle is stored in the index, correct?

Tobias Sebring

Jan 10, 2013, 12:51:03 PM
to rav...@googlegroups.com
Yep. Here is a screenshot from the Terms view:
https://dl.dropbox.com/u/6420016/imdbfulltitle.png

Chris Marisic

Jan 10, 2013, 3:48:35 PM
to rav...@googlegroups.com
Seems like a bug then.

Tobias Sebring

Jan 10, 2013, 6:55:00 PM
to rav...@googlegroups.com
Alright. I'll have to come back and try that optimization later then.

Thank you for your help Chris, much appreciated!

Chris Marisic

Jan 11, 2013, 8:23:34 AM
to rav...@googlegroups.com