Large amount of memory used by ehcache statistics

2,153 views
Skip to first unread message

daniel...@gmail.com

unread,
Sep 14, 2015, 11:59:51 AM9/14/15
to ehcache-users
  1. What version of Ehcache you are currently using; I see two .jar files being loaded, ehcache-2.7.2.jar and ehcache-core.2.6.2.jar
  2. Paste the configuration for the Cache/CacheManager you have an issue with; See below
  3. Add any name and version of other library or framework you use Ehcache with (e.g. Hibernate); Hibernate, Spring
  4. Providing JDK and OS versions maybe useful as well. Java 1.7u79 on RHEL 6 x64

  5. I have to preface this with the fact that I'm not a developer, just a server admin. I know some Java but not nearly enough to be able to find any possible culprits here.

    We had multiple servers start performing poorly, and noticed they were thrashing the garbage collector and almost maxing out their heaps. From looking at a heap dump, by far the largest objects were two instances of java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue totaling over 2.5 GB between them, and their queues were filled with nothing but net.sf.ehcache.statistics.extended.ExtendedStatisticsImpl tasks - some 60,000 of them. A 'healthy' server that didn't see unusual memory usage has 12 of these DelayedWorkQueue objects using around 100 to 200 bytes apiece, whereas this crashed server had 14, all of them ~100 bytes apart from the 2 goliaths. Also on the 'healthy' server, there were no instances of et.sf.ehcache.statistics.extended.ExtendedStatisticsImpl whatsoever, which surprised me as the servers have identical configurations. The servers that saw this memory issue had all been up for 8 days, whereas all servers that made it through unscathed had been restarted 2 days prior.

    Another thing I noticed is that some current live servers have many org.terracotta.statistics.archive.StatisticSampler classes loaded, whereas other servers in the same cluster have 0 instances of that, going by the output from jcmd GC.class_histogram after multiple forced GC runs. I found the ehcache configuration file, there are multiple entries but almost all have the same configuration, which I'll mark here as 'appSpecificCache':

    <defaultCache eternal="true"
     maxElementsInMemory="500"
     overflowToDisk="false"
     copyOnRead="true"
     copyOnWrite="true"
     memoryStoreEvictionPolicy="LFU" />

    <cache name="appSpecificCache1through15"
     maxElementsInMemory="10000"
     eternal="false"
     overflowToDisk="false"
     timeToIdleSeconds="0"
     timeToLiveSeconds="6000"
     copyOnRead="true"
     copyOnWrite="true"
     statistics="true"
     memoryStoreEvictionPolicy="LFU"></cache>

    One application cache additionally has inside its <cache> block:
    <searchable keys="false" values="false">
     <searchAttribute name="id" expression="value.getID().toString()"/>
    </searchable>

    Walking through the heap dump I see many cache objects with eternal set to true, which troubled me as I assumed nothing would be using the default cache since we have 15 or so independent app caches specified in the ehcache config xml. Again I don't have any developer input on this, but do you see an obvious issue here? My gut feeling is that we shouldn't have statistics enabled, as everything I'm reading says they should be disabled for performance reasons, and if we aren't using the monitor or developer console (which we certainly are not) there's no use to them. Is statistics really what's creating this enormous memory usage, or is it a side effect of something else?

    Thanks for any input you have, I would really like to be able to find a cause to this as it was quite a disruption and ehcache was the only common thread I could find.

Fabien Sanglier

unread,
Sep 14, 2015, 12:18:37 PM9/14/15
to ehcach...@googlegroups.com

The fact that you see 2 ehcache .jar files being loaded may be a part of the problem (ehcache-2.7.2.jar and ehcache-core.2.6.2.jar)

Between these 2 versions, statistics are handled a bit differently...with the 2.7.2 way being more efficient with stats (eg. in 2.7.x and above, statistics get enabled as and when you use them. If you stop looking at them they then disable themselves after a configurable delay).

With 2.6.2, if you have statistics=true in the ehcache.xml config, then all stats are enabled (all counters /timers enabled) wether you're actievly using/looking at those.

So my recommendation would be to clean up the unwanted ehcache jar in the classpath (i would recommend you keep the newer one) and also remove the "statistics=true" altogether in the ehcache.xml.

Let me know if that helps.

Thanks,

Fabien


--
You received this message because you are subscribed to the Google Groups "ehcache-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ehcache-user...@googlegroups.com.
To post to this group, send email to ehcach...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ehcache-users/636a5217-9bd0-4094-ae29-a1338a30b9a1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Fabien Sanglier
fabiens...@gmail.com

daniel...@gmail.com

unread,
Sep 14, 2015, 4:10:20 PM9/14/15
to ehcache-users
I double-checked the heap dump and all net.sf.ehcache.statistics.* classes that are loaded came from the 2.7.2 jar, 2.6.2 doesn't appear to have any active classes at all. Which is good, but confusing given what's happening here. I will definitely make the suggestion that statistics be disabled and the 2.6.2 jar be removed.

Can you think of any classes or variables I could check in the heap that could point to 2.6.x behavior? If nothing actively looked at the statistics, how could they have been enabled? Perhaps there is code in place that's enabling it manually or checking for unknown internal reasons.

daniel...@gmail.com

unread,
Sep 17, 2015, 6:04:45 PM9/17/15
to ehcache-users
On a hunch, in our test environment I loaded the application with only the 2.6.2 jar, and didn't see any statistics memory usage increasing over time. Using the 2.7.2 jar, I did see the statistics and especially the statistics archive use more and more memory over time. Substituting 2.10 had the same issue. It appears that the only real fix will be to set statistics to false.

cschanck

unread,
Sep 18, 2015, 10:42:09 AM9/18/15
to ehcache-users
I'd be really curious to see the impact of the removing the "statistics=true" in the config; in 2.7, no stats are recorded (as far avg/latency/history) until you ask for them (usually via the TMS/TMC), and there shouldn't be anything in those executors at all, certainly not 60k entries.

Chris

daniel...@gmail.com

unread,
Sep 18, 2015, 1:27:22 PM9/18/15
to ehcache-users
I have some additional details from looking into this more.

- we do have an exposed net.sf.ecache MBean that can be queried in jconsole for statistics, along with the caches and the cache managers. I'm assuming this is what's actually enabling the statistics/archive engine in ehcache to gather all this.
- the executor tasks all seem to be scanning org.terracotta.statistics.archive.StatisticsArchive objects every 300 seconds. We have over 6 million of these StatisticsArchive objects. We actually have the same number (6.5 million) of org.terracotta.statistics.archive.StatisticSampler, net.sf.ehcache.statistics.extended.SampleStatistic, terracotta StatisticArchive and StatisticSampler$SamplingTask

Why are all these archive objects accumulating? Is it really our MBean causing this retention? In both our CacheManager objects, the total amount cached only looks to be a couple hundred KB. Is there some way to flush the archive objects? Or are we somehow improperly telling ehcache to keep them forever? I just completed a scan of the heap dump, and all 6.5 million StatisticArchive objects are the same - buffer is null, size 30, overspill DevNull. Does that mean they're empty?

I looked through 2.6.2 and it doesn't appear to have statistics archive functionality. This gradual building up of archive objects and associated samplers looks to be happening with 2.7.2, and 2.10.

cschanck

unread,
Sep 18, 2015, 4:39:27 PM9/18/15
to ehcache-users
You said: "we do have an exposed net.sf.ecache MBean..."

Can you tell me exactly which MBean you are using?

Thanks,

Chris

daniel...@gmail.com

unread,
Sep 18, 2015, 4:52:12 PM9/18/15
to ehcache-users
I'm not 100% sure how to answer that as I'm not a developer. I see in the list of MBeans in JConsole, one named "net.sf.ehcache", and when expanded it has 4 subfolders labeled Cache, CacheConfiguration, CacheManager and CacheStatistics. Those 4 have 2 entries each corresponding to the 2 cache managers that appear to be configured by the application. I can see that from performing a class histogram before and after, that clearing the cache has no effect on the statistic archive count or size, but shutting down the cache does clear out the statistic archive classes.

daniel...@gmail.com

unread,
Sep 19, 2015, 2:13:26 PM9/19/15
to ehcache-users
We've been doing more tests, and while looking through 2.10 code I don't see how and when the StatisticsArchive objects are ever cleaned up. Can anyone help me with that? It looks to me like they just accumulate forever. All our caches have TTLs set and statistics=false (even though that's deprecated), and the number of org.terracotta.statistics.archive.StatisticsArchive objects just keeps growing and growing over time, even as users log in and log out and have their caches destroyed.

Is this expected? How can I track backwards from a StatisticsArchive object to the cache it was associated with? Surely when all caches on a server are destroyed, the StatisticsArchive objects that were tracking them will be destroyed as well?

daniel...@gmail.com

unread,
Sep 21, 2015, 4:42:32 PM9/21/15
to ehcache-users
I can only assume this is by design now, and that our app has been lucky all this time with its usage pattern. Instead of creating a limited number of caches upfront and using them throughout the JVM lifespan, we create and destroy several thousand caches per day as users log in and time out, with a common CacheManager for all these transitive caches. With 2.6.2 this was not an issue, but 2.7.2 through 2.10 create numerous archive and statistic objects whenever a Cache is added to a CacheManager object (via findStandardPassThruStatistics() in ExtendedStatisticsImpl constructor), which I assume will only ever be cleared out if we shutdown/destroy the CacheManager itself, which we don't do.

I made a simple test program which just creates an empty Cache object, adds it to a single permanent CacheManager, removes it again, then destroys the Cache, over and over again. Creating, adding and removing 50,000 Cache objects is enough to exhaust the heap on my 8gb MacBook. I would have assumed that any statistics associated with a Cache would be removed or invalidated when the Cache itself was removed from the CacheManager, but apparently not.

Sample code:

/* requires ehcache and slf4j jars to build.
 * run with 1 command line argument to override the default of 1000 caches created per run
 * i.e. java -jar ehcachetest.jar 100
 */

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

import java.io.IOException;

public class Main {
    static int defaultInstances = 1000;

    public static void main(String[] args) {
        int instances = defaultInstances;
        int totalInstances = 0;

        if (args.length > 0) try {
            int userInput = Integer.parseInt(args[0]);
            instances = userInput;
        } catch (NumberFormatException e) {
            System.err.printf("Couldn't parse command line argument '" + args[0] + "' as number, defaulting to %d%n", defaultInstances);
        }

        System.out.printf("Hit enter to begin test. Processing %d caches per run.%n", instances);

        CacheManager manager = CacheManager.create();

        while (true) {
            try {
                System.in.read();
            } catch (IOException e) {
                e.printStackTrace();
            }

            long timestamp = System.nanoTime();

            for (int i = 0; i < instances; ++i, ++totalInstances) {
                Cache cache = new Cache("newtestcache", instances, false, false, 60, 60, false, 60);

                manager.addCache(cache);
                manager.removeCache("newtestcache");

                cache = null;

                if (i > 0 && i % 1000 == 0) {
                    long backup = timestamp;
                    timestamp = System.nanoTime();
                    System.out.printf("completed cache no. %d in %d seconds%n", i, (timestamp - backup) / 1000000000);
                }
            }

            System.out.printf("Created, populated and destroyed %d caches, running total %d. Hit enter to add more%n", instances, totalInstances);

Fabien Sanglier

unread,
Sep 21, 2015, 9:15:17 PM9/21/15
to ehcach...@googlegroups.com
This is pretty interesting finding...seems to me that indeed cache stats should probably be cleaned up when a cache is removed...although I have rarely encountered an app creating and removing 1000s caches per day, so this may be a pretty isolated issue :)

But really, there has to be a (much?) better way for your app to achieve what it needs without creating thousands of caches per day, right? (At least I would think so)
Are you using a unique cache per session to store user session related data? (Eg as a replacement for the session object itself?)

Fabien

For more options, visit https://groups.google.com/d/optout.


--
Fabien Sanglier
fabiens...@gmail.com

Fabien Sanglier

unread,
Sep 21, 2015, 9:19:51 PM9/21/15
to ehcach...@googlegroups.com
And by the way, just an extra thought: if creation / removal of 1000s caches per day is really needed per your requirements, you should check ehcache3 which may have a good answer for you... It has the concept of user managed caches (eg not managed by a cache manager) which may be an good solution for these caches with shorter life span without incurring overhead added by cache manager ...
--
Fabien Sanglier
fabiens...@gmail.com

Chris Dennis

unread,
Sep 22, 2015, 3:03:03 PM9/22/15
to ehcach...@googlegroups.com
Thanks for the reproducible test case.  With the aid of this it was easy to identify the leak.  On cache disposal we’re neglecting to cancel any outstanding tasks running on the statistics executor.  This causes various bits of the stats framework to leak.  The correct fix for this is to ensure all this stuff gets cancelled and disposed of correctly.  A good short term workaround for you guys would probably be to periodically shutdown the cachemanager that owns these transient caches.  Shutting down the cache manager will in turn shutdown the statistics executor, and therefore allow all this stuff to be GC’ed.  I’ll file the necessary issues for this and reply back on the thread with links when I’m done.

Chris

daniel...@gmail.com

unread,
Dec 8, 2015, 3:58:04 PM12/8/15
to ehcache-users
Any word on this? I found what looks like the same problem in an issue from May - https://jira.terracotta.org/jira/browse/EHC-1105
To unsubscribe from this group and stop receiving emails from it, send an email to ehcache-users+unsubscribe@googlegroups.com.


--
Fabien Sanglier
fabiens...@gmail.com

niko.wi...@googlemail.com

unread,
Jul 26, 2018, 1:39:02 AM7/26/18
to ehcache-users
It's quite a pity that these topic is not continued for such a long time. As I've just stumbled upon the same issue having the same architecture (shutting down caches), I just want to point out that upgrading to ehCache 3.5 (maybe a 3.x is fine as well) seems to solve the issue, I'm not encountering memory leaks anymore.
Reply all
Reply to author
Forward
0 new messages