Memory leak with Lucene?


Ken Britton

Jun 14, 2012, 2:04:59 PM
to ne...@googlegroups.com
I can't get Neo to stay up due to GC overhead and it looks like the heap is full of Lucene classes:

Object Histogram:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 8193809 196651416 org.apache.lucene.search.ScoreDoc
2: 4121303 131881696 java.util.HashMap$Entry
3: 4096000 131072000 org.neo4j.index.impl.lucene.HitDoc
4: 821260 111449680 byte[]
5: 4105009 98520216 java.lang.Long
6: 862241 92023072 char[]
7: 15547 59639568 java.lang.Object[]
8: 8934 42737760 java.util.HashMap$Entry[]
9: 1209678 29032272 org.apache.lucene.index.Term
10: 856545 27409440 java.lang.String
11: 792057 19009368 org.apache.lucene.util.BytesRef
12: 405110 16204400 org.apache.lucene.index.TermInfo
13: 405232 12967424 org.apache.lucene.util.PagedBytes$PagedBytesDataInput
14: 61414 9495976 * ConstMethodKlass
15: 61414 8361264 * MethodKlass
16: 407526 6520416 org.apache.lucene.index.TermInfosReader$CloneableTerm
17: 6153 6467232 * ConstantPoolKlass
18: 96071 5718064 * SymbolKlass
19: 6153 4678240 * InstanceKlassKlass
20: 5041 3743368 * ConstantPoolCacheKlass

Is this a memory leak?  Is there some way to reclaim these objects?

Thanks,
Ken.

RickBullotta

Jun 17, 2012, 5:40:14 AM
to ne...@googlegroups.com
Are you remembering to explicitly close your search results?

Michael Hunger

Jun 17, 2012, 5:45:10 AM
to ne...@googlegroups.com
I think I asked the same question about the shared Groovy script for the import.

IndexHits has a close() method that should be called if you:

# don't iterate the results fully
# want the resources to be reclaimed eagerly (i.e. don't want to wait for a GC cycle)
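The close-in-finally pattern Michael describes can be sketched as follows. This is a self-contained illustration: FakeIndexHits is a stand-in for org.neo4j.graphdb.index.IndexHits, which in a real application comes back from index.get(...) or index.query(...).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for Neo4j's IndexHits, for illustration only.
class FakeIndexHits implements Iterable<String> {
    private final List<String> hits = Arrays.asList("node-1", "node-2", "node-3");
    boolean closed = false;

    public Iterator<String> iterator() { return hits.iterator(); }

    // The real IndexHits.close() releases the underlying search resources.
    public void close() { closed = true; }
}

public class CloseHitsExample {
    public static void main(String[] args) {
        FakeIndexHits hits = new FakeIndexHits();
        String first = null;
        try {
            // Stop after the first match: the iterator is NOT exhausted,
            // so close() is the only thing that frees the resources eagerly.
            for (String hit : hits) {
                first = hit;
                break;
            }
        } finally {
            hits.close();
        }
        System.out.println(first + " closed=" + hits.closed);
    }
}
```

The finally block guarantees close() runs even if processing the hits throws, which matters most when the loop exits early.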

Ken Britton

Jun 18, 2012, 12:40:00 PM
to ne...@googlegroups.com
Thank you for replying.  I'm only using the IndexHits class in one location, and I'm explicitly closing it in a finally block.  However, I'm using GremlinGroovyPipeline in a lot of places.  I didn't see a method for closing the Gremlin pipeline, so I assumed this is done automatically...?

As long as Gremlin is behaving, I think I've identified what is going on.  I have a few queries which load a large number of nodes with a large number of string properties on them.  It looks like Neo attempts to load everything into heap memory (I've left the memory-mapped IO at its defaults) and simply cannot fit it all.  The JVM goes into full GCs every second and I eventually get a "GC overhead limit exceeded" error.
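For diagnosing a situation like this, standard HotSpot flags (not Neo4j-specific) can log the GC thrashing and capture a heap dump when the limit is hit; the heap size, jar name, and dump path below are illustrative assumptions.

```
# Standard HotSpot diagnostic flags; analyze the resulting .hprof
# with a heap analyzer such as jhat or Eclipse MAT.
java -Xmx4g \
  -verbose:gc -XX:+PrintGCDetails \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/tmp/neo4j.hprof \
  -jar my-neo4j-app.jar
```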

Is there anything built into Neo that detects this situation and throws an exception / provides a warning in the log / etc?

Thanks,
Ken.

Mattias Persson

Jun 19, 2012, 4:11:47 AM
to ne...@googlegroups.com
You could perhaps try the cache_type=weak configuration option and see if Neo4j releases its memory sooner. Otherwise there's the GCR cache (although only in the enterprise flavour), which deals very nicely with evicting entries from the cache at a given max threshold.
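For reference, Mattias's suggestion is a one-line change; conf/neo4j.properties is the usual location for this setting in the 1.x series, but treat the path as an assumption for other setups.

```
# Weakly-referenced cache entries let the JVM reclaim cached
# nodes/relationships under memory pressure (Neo4j 1.x setting).
cache_type=weak

# Or, with the enterprise edition, the size-bounded GCR cache:
# cache_type=gcr
```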

2012/6/18 Ken Britton <kenbr...@gmail.com>



--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com