Memory leak with Lucene?


Ken Britton

Jun 14, 2012, 2:04:59 PM
to ne...@googlegroups.com
I can't get Neo to stay up due to GC overhead and it looks like the heap is full of Lucene classes:

Object Histogram:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 8193809 196651416 org.apache.lucene.search.ScoreDoc
2: 4121303 131881696 java.util.HashMap$Entry
3: 4096000 131072000 org.neo4j.index.impl.lucene.HitDoc
4: 821260 111449680 byte[]
5: 4105009 98520216 java.lang.Long
6: 862241 92023072 char[]
7: 15547 59639568 java.lang.Object[]
8: 8934 42737760 java.util.HashMap$Entry[]
9: 1209678 29032272 org.apache.lucene.index.Term
10: 856545 27409440 java.lang.String
11: 792057 19009368 org.apache.lucene.util.BytesRef
12: 405110 16204400 org.apache.lucene.index.TermInfo
13: 405232 12967424 org.apache.lucene.util.PagedBytes$PagedBytesDataInput
14: 61414 9495976 * ConstMethodKlass
15: 61414 8361264 * MethodKlass
16: 407526 6520416 org.apache.lucene.index.TermInfosReader$CloneableTerm
17: 6153 6467232 * ConstantPoolKlass
18: 96071 5718064 * SymbolKlass
19: 6153 4678240 * InstanceKlassKlass
20: 5041 3743368 * ConstantPoolCacheKlass

Is this a memory leak?  Is there some way to reclaim these objects?

Thanks,
Ken.

RickBullotta

Jun 17, 2012, 5:40:14 AM
to ne...@googlegroups.com
Are you remembering to explicitly close your search results?

Michael Hunger

Jun 17, 2012, 5:45:10 AM
to ne...@googlegroups.com
I think I asked the same question about the shared Groovy script for the import.

IndexHits has a close() method that should be called if you:

# don't iterate the results fully
# want the resources to be reclaimed eagerly (i.e. don't want to wait for a GC cycle)
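The close-in-finally pattern Michael describes can be sketched as follows. This is a self-contained illustration: FakeIndexHits is a stand-in for org.neo4j.graphdb.index.IndexHits, which in a real application comes back from index.get(...) or index.query(...).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for Neo4j's IndexHits, for illustration only.
class FakeIndexHits implements Iterable<String> {
    private final List<String> hits = Arrays.asList("node-1", "node-2", "node-3");
    boolean closed = false;

    public Iterator<String> iterator() { return hits.iterator(); }

    // The real IndexHits.close() releases the underlying search resources.
    public void close() { closed = true; }
}

public class CloseHitsExample {
    public static void main(String[] args) {
        FakeIndexHits hits = new FakeIndexHits();
        String first = null;
        try {
            // Stop after the first match: the iterator is NOT exhausted,
            // so close() is the only thing that frees the resources eagerly.
            for (String hit : hits) {
                first = hit;
                break;
            }
        } finally {
            hits.close();
        }
        System.out.println(first + " closed=" + hits.closed);
    }
}
```

The finally block guarantees close() runs even if processing the hits throws, which matters most when the loop exits early.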

Ken Britton

Jun 18, 2012, 12:40:00 PM
to ne...@googlegroups.com
Thank you for replying.  I'm only using the IndexHits class in one location, and I'm explicitly closing it in a finally block.  However, I'm using GremlinGroovyPipeline in a lot of places.  I didn't see a method for closing the Gremlin pipeline, so I assumed this is done automatically...?

As long as Gremlin is behaving, I think I've identified what is going on.  I have a few queries which load a large number of nodes with a large number of string properties on them.  It looks like Neo attempts to load everything into heap memory (I've left the memory-mapped IO at its defaults) and simply cannot fit it all.  The JVM goes into full GCs every second and I eventually get a "GC overhead limit exceeded" error.
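For diagnosing a situation like this, standard HotSpot flags (not Neo4j-specific) can log the GC thrashing and capture a heap dump when the limit is hit; the heap size, jar name, and dump path below are illustrative assumptions.

```
# Standard HotSpot diagnostic flags; analyze the resulting .hprof
# with a heap analyzer such as jhat or Eclipse MAT.
java -Xmx4g \
  -verbose:gc -XX:+PrintGCDetails \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/tmp/neo4j.hprof \
  -jar my-neo4j-app.jar
```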

Is there anything built into Neo that detects this situation and throws an exception / provides a warning in the log / etc?

Thanks,
Ken.

Mattias Persson

Jun 19, 2012, 4:11:47 AM
to ne...@googlegroups.com
You could perhaps try the cache_type=weak configuration option and see if Neo4j releases its memory sooner. Otherwise there's the GCR cache (although only in the enterprise flavour), which deals very nicely with evicting entries from the cache at a given max threshold.
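For reference, Mattias's suggestion is a one-line change; conf/neo4j.properties is the usual location for this setting in the 1.x series, but treat the path as an assumption for other setups.

```
# Weakly-referenced cache entries let the JVM reclaim cached
# nodes/relationships under memory pressure (Neo4j 1.x setting).
cache_type=weak

# Or, with the enterprise edition, the size-bounded GCR cache:
# cache_type=gcr
```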

2012/6/18 Ken Britton <kenbr...@gmail.com>



--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com