But my more complex application still crashes with TimeoutExceptions
or OOMEs after running for several hours.
I had it running over the weekend with Ants, taking a snapshot every 30
min.
What I see is that the memory consumption grows exponentially. During
the first 5 hours, it stays below 85 MB (total size of objects). After
8 hours I have 180 MB, 379 MB after 10 hours, and finally after 13 hours
it's 712 MB (at this point my test loop breaks with a TimeoutException
from WaitForNonStaleResults()).
Ants tells me there's no memory fragmentation issue.
I would like to share the profiler results. Unfortunately, a snapshot
every 30 minutes is way too much data, and Ants does not seem to allow
me to delete snapshots.
But here's a CSV of the diff between the snapshots at 5h and 13h:
http://www.filedropper.com/snapshot-comparison
At the top I have a
Dictionary<TKey, TValue>+Entry<Object,
LinkedListNode<SimpleLRUCache+ListValueEntry>>[]
with an increase of about 380MB.
This seems to be something that happens inside Lucene.Net.
I hope to find some more time this week to dig deeper into this.
Tobias
> Does your "complex" app handle significantly more documents, or larger
> documents?
What's a good way to measure the document size?
I have basically two types of docs involved. "Product" translates to a
formatted JSON doc of about 3 KB, and "Sale" is about 40 KB.
And the real app has some more indexes involved (but just 8 indexes, no
map/reduce).
> Can you make a dump of that dictionary object, or rather the SimpleLRUCache
> ?
How can I create a dump with the Ants Memory Profiler?
> What I'm interested to see is the count of objects it holds references to.
> Its size should be 1024 - if it's significantly larger, then it doesn't do
> removals correctly. Otherwise it leaks memory.
It doesn't look like there's a SimpleLRUCache with more than 1024
entries. But there are a lot of SimpleLRUCache instances (13,016; this
was just 229 at the second snapshot).
What I also see are whopping 274,083 instances of
"ThreadLocal<T>+Boxed<Object>". This seems to relate to
CloseableThreadLocal in Lucene.Net. Looks like the ThreadLocal instance
created there is never disposed.
> Lastly, can you create a simple application that mimics this behavior you
> have in your project?
I'm working on this - it's not that easy if you have to wait hours to
see a significant increase in memory usage.
Tobias
On 15.08.2011 13:11, Itamar Syn-Hershko wrote the questions quoted above.
> I think that I know what is going on, although I am not so sure about why.
> If the number of SimpleLRUCache grows, it is likely that there is an issue
> with disposing them, the question is _why_ they are still being held.
> We recreate a search after every indexing batch, but I am pretty sure we
> properly dispose of them.
> Tobi,
> Can you give me the rooted information for all the SimpleLRUCache and for
> the Thread Local stuff ?
That's all I can give you right now:
http://www.filedropper.com/ants_1
I had too many snapshots and Ants couldn't save them and just died.
I'll have to start a new test today before leaving.
Could it be a problem that the ThreadLocal<> in CloseableThreadLocal
never gets disposed? (It's listed with >274,000 instances in Ants.)
Tobias
> I think that I know the bug, I wasn't disposing of the ThreadLocal instance
> properly,
Great, so my guess was right...
> I did a better code review of our usage of that and I think that I caught
> all of the places where this is happening.
> New build should be out in a few hours, and hopefully that would resolve
> that issue once and for all.
I eagerly await the commit!
> Thanks a lot for being so thorough in finding and tracking this. Hopefully we
> can now put it to rest.
I'll start another long-running test as soon as you've committed your
changes.
Tobias
Are you going to push the ThreadLocal-changes today?
Tobias
> New build is out with this fix
Thanks! - I'm already building it and will start a new test tonight, so I
can hopefully tell you tomorrow afternoon if this solves the memory trouble.
Tobias
> Thanks! - I'm already building it
... got an assertion error when building RavenDB:
http://e-tobi.net/stuff/ravendb/assertion1.png
(This is about accessing an already disposed ThreadLocal)
It didn't happen when I ran the build a second time.
Tobias
> ... got an assertion error when building RavenDB:
> http://e-tobi.net/stuff/ravendb/assertion1.png
> It didn't happen when I ran the build a second time.
But it happened again during the third build, at a different place:
http://e-tobi.net/stuff/ravendb/assertion2.png
On the fourth build I got:
Tobias
I still have an increasing number of ThreadLocal<T>+Boxed<Object>
instances (>17,000 right now), but I have no idea where they come
from. Ants shows them as children of object[] instances (which are
the GC root), but they seem to be created by Lucene (Value is
FieldsReader or TermInfosReader or just null).
I guess they will be collected at some point...
Tobias
> Tobi, thanks for that, I am going to fix that, but it will probably only
> be next week, since I am traveling currently
Pity! I have a lot of trouble with the memory leak issues, but don't trust
the current unstable build.
I tried to fix the ThreadLocal issues myself, but it's kinda hard to
understand what's going on there and why ThreadLocal members are accessed
after the object has been disposed.
Here's what I tried so far:
https://github.com/e-tobi/ravendb/commit/ec17e91e9a1bac4bcc5dad2f813545bb0f3ac623
The same needs to be done in the Esent storage. It kinda works, but I
don't know how to cover this with tests, and I'm not sure if there isn't a
better way to fix these problems. And there are more problems with
the ThreadLocal disposal in other places.
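The failure mode behind those assertion screenshots can be reproduced in a few lines (a minimal sketch, not RavenDB's actual code): any access to a ThreadLocal<T> after it has been disposed throws an ObjectDisposedException.

```csharp
using System;
using System.Threading;

// Minimal repro of the failure mode: touching ThreadLocal<T>.Value after
// Dispose() throws ObjectDisposedException.
var local = new ThreadLocal<object>(() => new object());
var before = local.Value;        // fine: lazily creates this thread's value
local.Dispose();

bool threwAfterDispose = false;
try
{
    var after = local.Value;     // any access after Dispose() blows up
}
catch (ObjectDisposedException)
{
    threwAfterDispose = true;
}
Console.WriteLine(threwAfterDispose); // prints True
```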
Tobias
> That is why it is called unstable
I know. But someone has to test unstable and figure out all those nasty
little problems :-)
BTW: What happened to the unstable build/commit logs posted to the mailing
list? There hasn't been one since build-428.
> Next week, this has the first prio
Ok. Looking forward to it!
I'm testing the current unstable now for >24 hours. I get some index
errors because of the ThreadLocal disposal issues, but I never got
this far before. Memory usage is now much more sane.
I still have a growing number of ThreadLocal<T>+Boxed<Object> instances
(>30,000 at the moment). They are rooted in object[] instances.
And I guess, running this for some more days, I would get an OOME again.
The only explanation I have is that CloseableThreadLocal.Close() isn't
called in all cases.
If time permits I'll try to track this down.
Tobias
On 18.08.2011 03:07, Ayende Rahien wrote the replies quoted above.
> I don't think that I follow why this change would work, since this is
> already covered by the lock, it shouldn't be any different.
No. The first access to current.Value in Batch() is NOT covered by the lock.
https://github.com/ayende/ravendb/blob/master/Raven.Storage.Managed/TransactionalStorage.cs#L88
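A sketch of why that matters and how the access could be guarded (all names here are illustrative, not RavenDB's actual code): every read of the ThreadLocal has to happen under the same lock that Dispose() takes, otherwise a thread can race the disposal and hit a disposed instance.

```csharp
using System;
using System.Threading;

// Hypothetical sketch (names are mine, this is not RavenDB's code): in the
// racy version, the first read of current.Value happens *before* the lock
// is taken, so Dispose() can run in between. In the guarded version, the
// disposed flag and the ThreadLocal are only touched under the lock.
var sync = new object();
var disposed = false;
var current = new ThreadLocal<int>(() => 0);

// Guarded read: lock first, check the disposed flag, only then touch the
// ThreadLocal.
Func<int> guardedRead = () =>
{
    lock (sync)
    {
        if (disposed) throw new ObjectDisposedException("TransactionalStorage");
        return current.Value;
    }
};

Action dispose = () =>
{
    lock (sync)                 // same lock: serializes with guardedRead
    {
        disposed = true;
        current.Dispose();
    }
};

int value = guardedRead();      // works before disposal
dispose();

bool threw = false;
try { guardedRead(); }
catch (ObjectDisposedException) { threw = true; }
Console.WriteLine(threw);       // prints True: fails cleanly instead of racing
```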
Tobias
> Actually, that is what 3 AM is for, right?
:-)
>You should have a new build in an hour or so.
Great!
> That is interesting, I am pretty sure that we are disposing of all of our
> thread local instances.
> Do you have any idea who created those thread local instances?
For about 1,700 of the 30,000 instances, Value is != null and contains
Lucene-related stuff like FieldsReader or TermInfosReader.
Ants doesn't show me who created the instances; I think dotTrace can do
this - maybe I'll have to try that one.
Tobias
> Okay, that probably means that we need to do a code review of Lucene and see
> if they aren't closing the thread local stuff.
At first sight, it looks like Lucene.Net.Analysis.Analyzer.Close()
isn't used at all.
Tobias
> At first sight, it looks like, Lucene.Net.Analysis.Analyzer.Close()
> isn't used at all.
Forget this - Analyzers are not instantiated from within Lucene.Net.
I had been looking at the QueryParser's Main() without noticing.
Tobias
> Okay, that probably means that we need to do a code review of Lucene and see
> if they aren't closing the thread local stuff.
> If they do close it, it might mean that we aren't properly disposing of
> Lucene resources.
Ok. I did some "brute-force" analysis of this problem by adding some
code to track when a CloseableThreadLocal is created and which instances
are not closed. The number of unclosed CloseableThreadLocals is
increasing significantly over time:
2011-08-18T17:28:40Z, 238
2011-08-18T17:30:42Z, 312
2011-08-18T17:32:45Z, 375
2011-08-18T17:34:47Z, 470
2011-08-18T17:36:49Z, 455
2011-08-18T17:38:52Z, 560
2011-08-18T17:40:55Z, 595
2011-08-18T17:42:57Z, 683
2011-08-18T17:44:58Z, 739
2011-08-18T17:46:59Z, 748
2011-08-18T17:49:01Z, 825
2011-08-18T17:51:05Z, 887
2011-08-18T17:53:07Z, 984
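The tracking code itself isn't shown above; a minimal sketch of how such "brute-force" leak accounting could look (the table and helper names are mine):

```csharp
using System;
using System.Collections.Generic;

// "Brute-force" leak accounting sketch: remember a stack trace for every
// created instance, forget it again on Close(), and treat whatever is left
// in the table as a leak candidate.
var liveInstances = new Dictionary<object, string>();
var tableLock = new object();

Action<object> trackCreate = instance =>
{
    lock (tableLock)
        liveInstances[instance] = Environment.StackTrace; // who created it
};

Action<object> trackClose = instance =>
{
    lock (tableLock)
        liveInstances.Remove(instance);                   // properly closed
};

// Simulate three creations and one Close(): two leak candidates remain.
var a = new object(); var b = new object(); var c = new object();
trackCreate(a); trackCreate(b); trackCreate(c);
trackClose(b);

Console.WriteLine(liveInstances.Count); // prints 2
```

Dumping the surviving entries (and their creation stack traces) periodically gives exactly the kind of growing count and stack-trace list shown here.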
Here's a list of stack traces from within the CloseableThreadLocal
ctor where no Close() has been called yet:
http://www.filedropper.com/dump1
The first one belongs to a SimpleAnalyzer created by
Raven.Database.Indexing.IndexStorage. This is created per
DocumentStore, so nothing to worry about - this is expected.
The second-oldest stack trace looks more interesting.
FieldsReader gets cloned here and assigned to a FieldsReaderLocal,
which itself is a CloseableThreadLocal - feels like having a knot in my
brain :-)
I guess FieldsReaderLocal should Close its wrapped value when being
closed itself:
+ public override void Close()
+ {
+     ((FieldsReader)Get()).Close();
+     base.Close();
+ }
This doesn't seem to hurt, but it doesn't help either. It's hard to
follow what's going on inside Lucene.Net :-(
I'll now try to test this in isolation somehow, maybe this brings me
closer to the root of all evil.
Tobias
PS: This is how I created the above stack traces - "brute force" - I
told you :-):
> I guess FieldsReaderLocal should Close its wrapped value when being
> closed itself:
>
> + public override void Close()
> + {
> + ((FieldsReader)Get()).Close();
> + base.Close();
> + }
Gotcha!
Took me some amount of thread-thinking, but this is indeed the "problem
zone". My solution above only works partially: because of the nature of
ThreadLocal, the code will only close the FieldsReader assigned to the
thread that calls FieldsReaderLocal.Close(), not the ones that have
been created by other threads.
So assigning any IDisposable (or, as in the Lucene example, something
where Close() needs to be called) to ThreadLocal.Value is generally a
bad idea if you can't make each thread responsible for
disposing/closing its value.
Currently I see no other solution than to track all created
FieldsReader instances across all threads and to close them explicitly
in FieldsReaderLocal.Close().
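A sketch of that idea (illustrative only, not Lucene.Net's actual implementation): keep a hard reference to every per-thread value next to the ThreadLocal, so a single Close() can reach the values created by all threads, not just the calling one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch: ThreadLocal<T>.Dispose() does not close the values other threads
// created, so remember a "close my value" action for each thread and run
// them all from Close().
var local = new ThreadLocal<object>();
var closeActions = new List<Action>();   // one per thread that set a value
var sync = new object();

Action<object, Action> set = (value, closeValue) =>
{
    local.Value = value;
    lock (sync) closeActions.Add(closeValue); // remembered across threads
};

Action closeAll = () =>
{
    lock (sync)
    {
        foreach (var close in closeActions) close(); // every thread's value
        closeActions.Clear();
    }
    local.Dispose();
};

// Two worker threads each install a value; Close() from the main thread
// still reaches both of them.
int closed = 0;
var t1 = new Thread(() => set(new object(), () => Interlocked.Increment(ref closed)));
var t2 = new Thread(() => set(new object(), () => Interlocked.Increment(ref closed)));
t1.Start(); t2.Start(); t1.Join(); t2.Join();

closeAll();
Console.WriteLine(closed); // prints 2
```

The cost of this design is that the hard references keep every per-thread value alive until Close(), which is exactly the trade-off a CloseableThreadLocal has to make.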
Hope this helps...
Tobias
> I modified our fork of Lucene.NET to reflect those behaviors, and it will
> now properly dispose of all of those values.
Thanks!
> There is a new build getting ready to pop now, and I would be grateful if
> you can run the same test cases on that.
That's the first thing I'll do tomorrow!
Tobias