fixed memory leak


Walter Raboch

unread,
Mar 15, 2011, 3:35:43 PM3/15/11
to OrientDB
Hi Luca,

we found another memory hole here. I know there is one left in the
index, but the first I found is in the cache.

Basically the problem is in Java's LinkedHashMap. It does not release
the records at all! So the heap is getting bigger and bigger all the time.

Below I send you a patch and a new implementation using Google's
ConcurrentLinkedHashMap.

It is not completely finished, but should help you as a starting
point. In my tests (without index) everything works fine and we create
50 million records in a batch without needing more than 4 MB of heap.

Additionally I detach all records before putting them into the cache -
I think that should help to avoid references between the records.

Then I added a flag to avoid more than one memory watchdog listener
instance, because you register it more than once.

Best regards,
Walter

1. add dependency:
==============

+ <dependency>
+   <groupId>com.googlecode.concurrentlinkedhashmap</groupId>
+   <artifactId>concurrentlinkedhashmap-lru</artifactId>
+   <version>1.1</version>
+ </dependency>


2. the patch:
=========
ignore my logging output - it helped to track the leak down, especially
printing the size of the cache before and after the memory cleanup.


Index: src/main/java/com/orientechnologies/orient/core/cache/OAbstractRecordCache.java
===================================================================
--- src/main/java/com/orientechnologies/orient/core/cache/OAbstractRecordCache.java (revision 2506)
+++ src/main/java/com/orientechnologies/orient/core/cache/OAbstractRecordCache.java (working copy)
@@ -22,6 +22,7 @@
import com.orientechnologies.orient.core.OMemoryWatchDog.Listener;
import com.orientechnologies.orient.core.Orient;
import com.orientechnologies.orient.core.id.ORID;
+import com.orientechnologies.orient.core.record.ORecordInternal;

/**
* Cache of documents.
@@ -31,23 +32,30 @@
*/
public abstract class OAbstractRecordCache extends OSharedResource {
protected int maxSize;
- protected ORecordCache entries;
+// protected ORecordCache entries;
+ protected GoogleRecordCache entries;

protected Listener watchDogListener;
protected String profilerPrefix;

+ protected String name;
+ protected boolean startupCompleted = false;
+
/**
* Create the cache of iMaxSize size.
*
* @param iMaxSize
* Maximum number of elements for the cache
*/
- public OAbstractRecordCache(final String iProfilerPrefix, final int iMaxSize) {
+ public OAbstractRecordCache(final String iProfilerPrefix, final int iMaxSize, final String name) {
profilerPrefix = iProfilerPrefix;
maxSize = iMaxSize;
+ this.name = name;

final int initialSize = maxSize > -1 ? maxSize + 1 : 1000;
- entries = new ORecordCache(maxSize, initialSize, 0.75f);
+// entries = new ORecordCache(maxSize, initialSize, 0.75f);
+ entries = new GoogleRecordCache(maxSize);
+
}

public boolean existsRecord(final ORID iRID) {
@@ -132,6 +140,7 @@
entries.clear();

Orient.instance().getMemoryWatchDog().removeListener(watchDogListener);
watchDogListener = null;
+ startupCompleted = false;

} finally {
releaseExclusiveLock();
@@ -143,13 +152,16 @@
}

public void startup() {
+ if (startupCompleted) return;
+
watchDogListener = Orient.instance().getMemoryWatchDog().addListener(new Listener() {
/**
* Auto reduce cache size of 50%
*/
public void memoryUsageLow(TYPE iType, final long usedMemory, final long maxMemory) {
if (iType == TYPE.JVM) {
- acquireExclusiveLock();
+ OLogManager.instance().info(this, "Low memory handler for cache %s called", name + ":" + System.identityHashCode(this));
+ acquireExclusiveLock();
try {
final int oldSize = entries.size();
if (oldSize == 0)
@@ -157,11 +169,21 @@
return;

final int threshold = (int) (oldSize * 0.5f);
-
- entries.removeEldestItems(threshold);
-
- OLogManager.instance().debug(this, "Low memory: auto reduce the record cache size from %d to %d", oldSize, threshold);
+
+ // reduce maxsize
+ if (maxSize > -1) {
+ maxSize = (int) (maxSize * 0.8f);
+ } else {
+ // if cache is unlimited, limit it now, because heap space is running out -> automatically find right size
+ maxSize = (int) (oldSize * 0.8f);
+ }
+ entries.setMaxSize(maxSize);
+// entries.removeEldestItems(threshold);
+
+ OLogManager.instance().info(this, "----- reduced cache from %d to %d entries, maxsize=%d", oldSize, entries.size(), entries.getMaxSize());
+ OLogManager.instance().info(this, "Low memory: auto reduce the record cache size from %d to %d", oldSize, threshold);
} catch (Exception e) {
+ e.printStackTrace();
OLogManager.instance().error(this, "Error while freeing resources", e);
} finally {
releaseExclusiveLock();
@@ -202,5 +224,8 @@
return maxSize;
}
});
+
+ OLogManager.instance().info(this, "watchDogListener listener created: %s:%s", this.toString(), System.identityHashCode(this));
+ startupCompleted = true;
}
}
Index: src/main/java/com/orientechnologies/orient/core/cache/ODatabaseRecordCache.java
===================================================================
--- src/main/java/com/orientechnologies/orient/core/cache/ODatabaseRecordCache.java (revision 2506)
+++ src/main/java/com/orientechnologies/orient/core/cache/ODatabaseRecordCache.java (working copy)
@@ -37,7 +37,7 @@
private String PROFILER_CACHE_NOTFOUND;

public ODatabaseRecordCache(final ODatabaseRecord iDatabase) {
- super("db." + iDatabase.getName(), OGlobalConfiguration.DB_CACHE_SIZE.getValueAsInteger());
+ super("db." + iDatabase.getName(), OGlobalConfiguration.DB_CACHE_SIZE.getValueAsInteger(), "ODatabaseRecordCache");
database = iDatabase;
}

@@ -59,6 +59,7 @@

acquireExclusiveLock();
try {
+ if (iRecord != null) iRecord.detach();
entries.put(iRecord.getIdentity(), iRecord);
} finally {
releaseExclusiveLock();
Index: src/main/java/com/orientechnologies/orient/core/cache/OStorageRecordCache.java
===================================================================
--- src/main/java/com/orientechnologies/orient/core/cache/OStorageRecordCache.java (revision 2506)
+++ src/main/java/com/orientechnologies/orient/core/cache/OStorageRecordCache.java (working copy)
@@ -38,7 +38,7 @@
}

public OStorageRecordCache(final OStorage iStorage) {
- super("storage." + iStorage.getName(), OGlobalConfiguration.STORAGE_CACHE_SIZE.getValueAsInteger());
+ super("storage." + iStorage.getName(), OGlobalConfiguration.STORAGE_CACHE_SIZE.getValueAsInteger(), "OStorageRecordCache");
storage = iStorage;

setStrategy(OGlobalConfiguration.STORAGE_CACHE_STRATEGY.getValueAsInteger());
}
@@ -51,8 +51,10 @@
acquireExclusiveLock();
try {
final ORecord<?> record = entries.get(iRecord.getIdentity());
- if (record == null || iRecord.getVersion() > record.getVersion())
+ if (record == null || iRecord.getVersion() > record.getVersion()) {
+ if (iRecord != null) iRecord.detach();
entries.put(iRecord.getIdentity(), iRecord);
+ }

} finally {
releaseExclusiveLock();



------
3. finally, the code of my GoogleRecordCache:
====================================

package com.orientechnologies.orient.core.cache;

import com.googlecode.concurrentlinkedhashmap.CapacityLimiter;
import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import com.googlecode.concurrentlinkedhashmap.Weigher;
import com.orientechnologies.orient.core.id.ORID;
import com.orientechnologies.orient.core.record.ORecordInternal;


/**
* @author Walter Raboch <wra...@ingen.at>
*
*/
public class GoogleRecordCache {
  private final ConcurrentLinkedHashMap<ORID, ORecordInternal<?>> map;

  /**
   * @param maxSize
   *          Maximum number of elements for the cache
   */
  public GoogleRecordCache(final int maxSize) {
    // TODO make size configurable (the bound below is still hard-coded to 10000)
    map = new ConcurrentLinkedHashMap.Builder<ORID, ORecordInternal<?>>()
        .weigher(new Weigher<ORecordInternal<?>>() {
          public int weightOf(final ORecordInternal<?> record) {
            return 1;
          }
        })
        .maximumWeightedCapacity(10000)
        .capacityLimiter(new CapacityLimiter() {
          public boolean hasExceededCapacity(final ConcurrentLinkedHashMap<?, ?> map) {
            return map.size() > 10000;
          }
        })
        .build();
  }

  public int getMaxSize() {
    return map.capacity();
  }

  public void setMaxSize(final int maxSize) {
    // TODO make size hot changeable
  }

  public boolean containsKey(final ORID iRID) {
    return map.containsKey(iRID);
  }

  public ORecordInternal<?> remove(final ORID iRID) {
    return map.remove(iRID);
  }

  public void clear() {
    map.clear();
  }

  public int size() {
    return map.size();
  }

  public ORecordInternal<?> get(final ORID identity) {
    return map.get(identity);
  }

  public void put(final ORID identity, final ORecordInternal<?> iRecord) {
    map.put(identity, iRecord);
  }

  public Iterable<ORecordInternal<?>> values() {
    return map.values();
  }
}
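
A side note on the setMaxSize() TODO above: if the concurrentlinkedhashmap-lru 1.1 API is what I think it is (an assumption - the only call confirmed by the code above is the capacity() getter used in getMaxSize()), the map can probably be resized in place, which would let the low-memory handler shrink the cache without rebuilding it:

```java
// Hypothetical completion of the TODO above. Assumes CLHM 1.1 exposes a
// setCapacity(int) mutator next to the capacity() getter; shrinking the
// capacity should evict least-recently-used entries until the map fits.
public void setMaxSize(final int maxSize) {
    map.setCapacity(maxSize);
}
```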

Luca Garulli

unread,
Mar 15, 2011, 4:11:49 PM3/15/11
to orient-database, Walter Raboch
On 15 March 2011 20:35, Walter Raboch <wra...@gmail.com> wrote:
Hi Luca,

Hi Walter
 
we found another memory hole here. I know there is one left in the
index, but the first I found is in the cache.

Basically the problem is in Java's LinkedHashMap. It does not release
the records at all! So the heap is getting bigger and bigger all the time.

What kind of bug?

Below I send you a patch and a new implementation using Google's
ConcurrentLinkedHashMap.

Your SVN patch can't be read since the email formatted it badly. Try to add it as an attachment.

In general I prefer to avoid adding 3rd party components unless they are necessary. I have to test why LinkedHashMap is bugged and why Google's is better.
 
It is not completely finished, but should help you as a starting
point. In my tests (without index) everything works fine and we create
50 million records in a batch without needing more than 4 MB of heap.

Additionally I detach all records before putting them into the cache -
I think that should help to avoid references between the records.
 
detach() was always called before calling pushRecord(), except when it makes no sense: loading from the network. However it's cleaner to put the detach() inside pushRecord(), so I've applied your patch here ;-)

Then I added a flag to avoid more than one memory watchdog listener
instance, because you register it more than once.

I register a watchdog per cache, so it's correct to have multiple registrations.
 
Best regards,
Walter

Thank you for the contribution,
Lvc@

Walter Raboch

unread,
Mar 15, 2011, 4:53:05 PM3/15/11
to OrientDB
Hi Luca,

LinkedHashMap does not remove the entry, neither via remove() nor via
the removeEldestEntry() method. removeEldestEntry() is called by the
put() method, and if true is returned, remove() is called too. So it
does not matter whether you call remove() inside removeEldestEntry()
yourself (and return false - this is required when you modify the map
directly) or simply return true. This seems to be a known bug - Michael
can send you the link to the bug description.
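
The two removeEldestEntry() idioms discussed above can be sketched with plain java.util classes (a minimal, self-contained example independent of OrientDB; the class and method names are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EldestEntryDemo {
    static final int MAX = 10;

    // Idiom 1: return true and let put() evict the eldest entry itself.
    static Map<Integer, Integer> byReturnTrue() {
        return new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > MAX;
            }
        };
    }

    // Idiom 2: remove the eldest entry directly; the Javadoc requires
    // returning false whenever the method modifies the map itself.
    static Map<Integer, Integer> byDirectRemove() {
        return new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                if (size() > MAX)
                    remove(eldest.getKey());
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, Integer> a = byReturnTrue();
        Map<Integer, Integer> b = byDirectRemove();
        for (int i = 0; i < 100000; ++i) {
            a.put(i, i);
            b.put(i, i);
        }
        // Both maps stay bounded at MAX entries.
        System.out.println(a.size() + " " + b.size());
    }
}
```

On a stock JVM both maps end up with exactly MAX entries, which suggests the access-ordered LinkedHashMap eviction contract itself behaves as documented; if something grows unbounded in the record cache, it might then come from how the map is used rather than from the map.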

So the problem is: the size of the map never gets smaller. If you try
it with my log output in the low-memory handler, you will see in the
end that oldSize and entries.size() are the same and that size() is
always bigger than maxSize.

Google's implementation is fast and working - I tested it with a bulk
insert of 50 million records. With Java's LinkedHashMap I did not reach
4 million before getting an OutOfMemoryError.

For the registration of the watchdog - I know you have more caches, but
you registered the listener for the same cache twice. I have it in my
logs somewhere.

Best regards,
Walter

Michael Widmann

unread,
Mar 15, 2011, 5:31:47 PM3/15/11
to orient-...@googlegroups.com
I think this was the very first time the database got up to more than 100 million entries without a problem.

So if an external library for the first time fixes the massive-insert problems - memory problems and OOMs (without index, actually) - then it should be fixed with the library ....

Just my 2 cents after several days / nearly weeks of searching for that bug ...

Michael / bayoda 

2011/3/15 Walter Raboch <wra...@gmail.com>



--
bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies

Luca Garulli

unread,
Mar 15, 2011, 6:09:16 PM3/15/11
to orient-database, Walter Raboch
On 15 March 2011 21:53, Walter Raboch <wra...@gmail.com> wrote:
Hi Luca,

LinkedHashMap does not remove the entry, neither via remove() nor via
the removeEldestEntry() method. removeEldestEntry() is called by the
put() method, and if true is returned, remove() is called too. So it
does not matter whether you call remove() inside removeEldestEntry()
yourself (and return false - this is required when you modify the map
directly) or simply return true. This seems to be a known bug - Michael
can send you the link to the bug description.

To test it I've written this test and it succeeds! It puts 1 million entries, but I set the size to 10 and after the test only 10 entries are in memory. The full source code is available here: http://code.google.com/p/orient/source/browse/trunk/tests/src/test/java/com/orientechnologies/orient/test/java/collection/LinkedHashMapTest.java

import java.util.LinkedHashMap;
import java.util.Map;

import org.testng.Assert; // assuming the TestNG Assert used elsewhere in the OrientDB test suite

public class LinkedHashMapTest {
  public static void main(String[] args) {
    LinkedHashMap<Integer, Integer> lhm = new LinkedHashMap<Integer, Integer>(100, 0.7f, true) {
      @Override
      protected boolean removeEldestEntry(final Map.Entry<Integer, Integer> iEldest) {
        return size() > 10;
      }
    };

    for (int i = 0; i < 1000000; ++i) {
      lhm.put(i, i);
    }
    Assert.assertEquals(lhm.size(), 10);
  }
}

The test ends with 10 entries. So the problem seems not to be in the LinkedHashMap itself; maybe it's in other places.

So the problem is: the size of the map never gets smaller. If you try
it with my log output in the low-memory handler, you will see in the
end that oldSize and entries.size() are the same and that size() is
always bigger than maxSize.

I've seen something like what you reported some days ago, but I don't know whether it still happens with the last changes. I'll keep it under control.
 
Google's implementation is fast and working - I tested it with a bulk
insert of 50 million records. With Java's LinkedHashMap I did not reach
4 million before getting an OutOfMemoryError.

If Google Collections are faster than Java Collections this seems a good reason to use them.

For the registration of the watchdog - I know you have more caches, but
you registered the listener for the same cache twice. I have it in my
logs somewhere.

Probably this was a bug of previous versions; please tell me if the same happens with the latest.

Best regards,
Walter

Lvc@

Luca Garulli

unread,
Mar 15, 2011, 6:09:19 PM3/15/11
to orient-database
On 15 March 2011 22:31, Michael Widmann <michael...@gmail.com> wrote:
I think this was the very first time the database got up to more than 100 million entries without a problem.

So if an external library for the first time fixes the massive-insert problems - memory problems and OOMs (without index, actually) - then it should be fixed with the library ....
 
I agree with you, but I need the evidence of the bug before including that lib.

Just my 2 cents after several days / nearly weeks of searching for that bug ...

I appreciate a lot Bayoda's contributions until now; I hope to fix all the remaining slowness with indexes.
 
Michael / bayoda 

Lvc@

Benjamin Manes

unread,
Mar 15, 2011, 11:53:31 PM3/15/11
to OrientDB
Hi,

I am the author of that library and found this discussion by a Google
Alert.

I am primarily replying as a request to not use the CapacityLimiter
interface, which is going away in v1.2. Neither the custom weigher nor
limiter are needed in the patch as those are the default settings on
the builder. The CapacityLimiter was provided due to a feature request
from LinkedIn's Voldemort project to allow bounding by the %-heap free
instead of the weighted size. However, since that was never
implemented and I am unaware of any valid usages, I am removing it due
to some concerns that I have with the API. That said, the library is
actively used in production environments such as by Apache Cassandra.
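
Taking Benjamin's point, the builder call in GoogleRecordCache would collapse to the defaults only. A sketch against the CLHM 1.1 builder API (assuming, as Benjamin says, that the unit weigher and default limiter need not be spelled out), keeping the same hard-coded 10000 bound as the patch:

```java
// The singleton weigher (weight 1 per entry) and the default capacity
// limiter are implied, so only the bound itself needs to be stated.
map = new ConcurrentLinkedHashMap.Builder<ORID, ORecordInternal<?>>()
    .maximumWeightedCapacity(10000)
    .build();
```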

As an alternative, you may wish to consider adopting Google Guava so
that your project can take advantage of its many useful utilities and
abstractions. A colleague and I have been working on incorporating the
algorithm and features found in CLHM into the MapMaker class, which as of
r08 supports a #maximumSize() setting. Our goal is to mature MapMaker
so that it subsumes all of CLHM's features and concurrency tricks by
the end of the year. An advantage of choosing that library is that it
has a dedicated team at Google, a broad community, and utilities that
can be used throughout your project.

Cheers,
Ben

On Mar 15, 3:09 pm, Luca Garulli <l.garu...@gmail.com> wrote:
> On 15 March 2011 22:31, Michael Widmann <michael.widm...@gmail.com> wrote:
>
> > Think that the very first time the database get up to more than 100mio
> > entries without a problem
>
> > So if an external library for the first time fixes massive inserts problems
> > - memory problems
> > OOMs at all (without index actually) - it should be fixed with the library
> > ....
>
> I agree with you, but I need the evidence of the bug before to include that
> lib.
>
> Just my 2 cents after several days / nearly weeks for searching that bug ...
>
>
>
> I appreciate a lot the Bayoda's contributions until now, hope to fix all the
> remaining slowness with indexes.
>
> > Michael / bayoda
>
> Lvc@
>
> 2011/3/15 Walter Raboch <wrab...@gmail.com>

Walter Raboch

unread,
Mar 16, 2011, 8:08:16 AM3/16/11
to OrientDB
Hi Benjamin,

I added the custom Weigher because of an idea to calculate a little bit
more in the future - I just didn't remove it in the first test version.
I will remove the CapacityLimiter and not use it anymore - thanks for
the advice.

I will take a look at Guava - sounds great.

Best regards,
Walter