Typical CLHM sizes


cburroughs

Feb 24, 2011, 5:26:12 PM
to ConcurrentLinkedHashMap
CLHM is used for the row cache by Apache Cassandra. I'm having
somewhat unusual problems with unbounded memory growth [1] (i.e. the
memory use of the JVM itself as measured by the OS). My setup is as
boring and typical (RHEL5, Sun JDK 1.6.0_23) as I can manage.

One thing in my configuration that is unusual for the Apache Cassandra
community is that my row cache (which is a CLHM) is larger than
typical with a size of 200,000. While I think it's exceedingly
unlikely that this has anything to do with my problem, for my sanity I
wanted to see where 200,000 items in a CLHM fell between "totally
normal" and "oh, you might be the biggest in production".

[1] http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/reduced-cached-mem-resident-set-size-growth-td5967110.html

Ben Manes

Feb 24, 2011, 6:01:31 PM
to ConcurrentLinkedHashMap
My understanding from discussions with Jonathan Ellis is that
Cassandra caches are often sized in the millions of entries. I am not
intimately familiar with Cassandra's internals and its workload
patterns, but that would indicate that production environments are
working well at 200k+ sizes.

The design of v1.0 and beyond has no shortcomings in the eviction
policy with respect to assumptions about the cache's size. The much
older pre-release version used a SecondChance policy, which degraded
at large sizes, but Cassandra upgraded to an official release
(CLHM v1.1) in Cassandra 0.7.0. CLHM v1.x implements an LRU algorithm,
so it is agnostic to the cache's size; it differs from a LinkedHashMap
mainly by avoiding lock contention. Ideally you should be able to swap
in a synchronized LinkedHashMap and observe the same behavior, just
with lower performance.
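
For reference, a minimal sketch of the two setups being compared (not
code from either project; the String/byte[] types, the 200,000
capacity, and the concurrency level of 64 are taken from this thread
purely for illustration, and the Builder calls assume the v1.x API):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;

public class RowCacheSketch {
  static final int CAPACITY = 200000;

  // Concurrent bounded LRU, roughly how a row cache would be constructed.
  static Map<String, byte[]> clhm() {
    return new ConcurrentLinkedHashMap.Builder<String, byte[]>()
        .maximumWeightedCapacity(CAPACITY) // entry count, with the default unit weigher
        .concurrencyLevel(64)              // the value Cassandra uses per this thread
        .build();
  }

  // Single-lock LRU with equivalent behavior, for comparison.
  static Map<String, byte[]> synchronizedLru() {
    return Collections.synchronizedMap(
        new LinkedHashMap<String, byte[]>(CAPACITY, 0.75f, true) {
          @Override protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > CAPACITY;
          }
        });
  }
}

The LinkedHashMap version serializes every access on one lock, which is
exactly the contention CLHM is designed to avoid.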

The LRU's state is caught up periodically, either on a write (which may
cause an eviction) or when the number of reads exceeds a threshold.
The threshold is 16 x concurrencyLevel, and Cassandra sets the
concurrency level to 64, so after 1024 reads a calling thread takes a
small penalty to catch the LRU up. Each pending read task is wrapped
in a WeakReference, so there shouldn't be any lingering data issues.
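
A simplified, hypothetical sketch of that amortization (invented names,
not CLHM's actual internals):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustration only: reads are buffered and applied to the LRU chain in batches. */
class AmortizedLruSketch<K> {
  static final int READS_PER_CATCH_UP_MULTIPLIER = 16;

  final int threshold; // 16 x concurrencyLevel, e.g. 16 x 64 = 1024
  final AtomicInteger pendingReads = new AtomicInteger();
  final Queue<K> recencyQueue = new ConcurrentLinkedQueue<K>();

  AmortizedLruSketch(int concurrencyLevel) {
    this.threshold = READS_PER_CATCH_UP_MULTIPLIER * concurrencyLevel;
  }

  void recordRead(K key) {
    recencyQueue.add(key);
    if (pendingReads.incrementAndGet() >= threshold) {
      pendingReads.set(0);
      drain(); // the calling thread pays a small, amortized penalty here
    }
  }

  void drain() {
    for (K key; (key = recencyQueue.poll()) != null; ) {
      // move 'key' to the most-recently-used position in the LRU chain
    }
  }
}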

CLHM is a strong-reference cache, so it doesn't automatically evict
under memory pressure. The port into MapMaker allows combining the
bound with soft references as a fail-safe.
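
If a memory-pressure fail-safe is wanted, a hedged sketch of that
MapMaker combination might look like the following (this assumes a
Guava release whose MapMaker exposes maximumSize alongside softValues;
the types and sizes are illustrative):

import java.util.concurrent.ConcurrentMap;

import com.google.common.collect.MapMaker;

public class SoftRowCacheSketch {
  public static ConcurrentMap<String, byte[]> create() {
    return new MapMaker()
        .concurrencyLevel(64)  // matches the setting discussed in this thread
        .maximumSize(200000)   // approximate LRU bound, enforced per segment
        .softValues()          // values become eligible for GC under memory pressure
        .makeMap();
  }
}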

I think the tests would have caught any memory leak, but if you find
anything more concrete then I'm happy to investigate.

Cheers,
Ben

Patricio Echagüe

Feb 24, 2011, 6:12:03 PM
to concurrentl...@googlegroups.com, Ben Manes
Are your rows bounded in size? You can see that behavior (memory growth) even when caching a small number of rows, if those rows can grow in size.

I experienced that issue before.

cburroughs

Feb 25, 2011, 8:56:05 AM
to ConcurrentLinkedHashMap
Thanks Ben. That's what I thought. If I come up with something
concrete I'll let you know.

On Feb 24, 6:12 pm, Patricio Echagüe <patric...@gmail.com> wrote:
> Are your rows bounded in size? You can see that behavior  (memory growth)
> even caching a small amount of rows when those rows can grow in size.
>
> I experienced that issue before.


They are not formally bounded, but there is a fixed number of
columns whose values are things like URLs, and those only get so large.

cburroughs

Feb 25, 2011, 11:49:52 AM
to ConcurrentLinkedHashMap
I wanted to use MemoryLeakTest as something to leave running for
several days and observe RSS. I've hit an unexpected snag.

On RHEL5 with Sun Java 6u20 and u23 I tried running the
MemoryLeakTest on a physical box. I tried ITERATIONS set to both
100k and 200k, with both the default options and with the JVM args I
use with Cassandra [1]. In *all* cases I get an OOM, a "GC overhead
limit exceeded" error, or in one case a segfault, within a few minutes.
I was using the lru branch. I have a lot of large heap dumps if they
are of any use.

Example (I disabled constantly printing the warning):

   [testng] Running MemoryLeakTest...
   [testng]
   [testng] WARNING: This test will run forever and must be manually stopped
   [testng] Exception in thread "Thread-265" java.lang.OutOfMemoryError: GC overhead limit exceeded
   [testng]     at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.addToRecencyQueue(ConcurrentLinkedHashMap.java:413)
   [testng]     at com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.get(ConcurrentLinkedHashMap.java:787)
   [testng]     at com.googlecode.concurrentlinkedhashmap.MemoryLeakTest$1.run(MemoryLeakTest.java:42)
   [testng]     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   [testng]     at com.googlecode.concurrentlinkedhashmap.ConcurrentTestHarness$1.run(ConcurrentTestHarness.java:92)
   [testng] Exception in thread "Thread-661" java.lang.OutOfMemoryError: GC overhead limit exceeded



[1] <jvmarg value="-ea"/>
    <jvmarg value="-XX:+UseThreadPriorities"/>
    <jvmarg value="-XX:ThreadPriorityPolicy=42"/>
    <jvmarg value="-Xms1500M"/>
    <jvmarg value="-Xmx1500M"/>
    <jvmarg value="-Xmn400M"/>
    <jvmarg value="-XX:+HeapDumpOnOutOfMemoryError"/>
    <jvmarg value="-Xss128k"/>
    <jvmarg value="-XX:+UseParNewGC"/>
    <jvmarg value="-XX:+UseConcMarkSweepGC"/>
    <jvmarg value="-XX:+CMSParallelRemarkEnabled"/>
    <jvmarg value="-XX:SurvivorRatio=8"/>
    <jvmarg value="-XX:MaxTenuringThreshold=1"/>
    <jvmarg value="-XX:CMSInitiatingOccupancyFraction=75"/>
    <jvmarg value="-XX:+UseCMSInitiatingOccupancyOnly"/>
    <jvmarg value="-XX:+UseCMSInitiatingOccupancyOnly"/>
    <jvmarg value="-Dcom.sun.management.jmxremote.authenticate=false"/>
    <jvmarg value="-Dcom.sun.management.jmxremote.ssl=false"/>
    <jvmarg value="-Dcom.sun.management.jmxremote.port=10101"/>

Ben Manes

Feb 25, 2011, 2:24:01 PM
to ConcurrentLinkedHashMap
Unfortunately I have two other production incidents that popped up
today, so I'll have to defer investigating until this evening.

On Feb 25, 8:49 am, cburroughs <chris.burrou...@gmail.com> wrote:
> I wanted to use MemoryLeakTest as something to leave running for
> several days and observe RSS.  I've hit an unexpected snag.
>
> On RHEL5 with sun java 6_u20 and u23 I tried running the
> MemoryLeakTest. Physical box has  I tried with ITERATIONS set to both
> 100k and 200k and with both  the default options and with jvm args
> like I use with Cassandra [1]  in *all* cases I get an OOM, gc
> overhead limit, or once a segfault within a few minutes.    I was
> using the lru branch.  I have a lot of large heap dumps if they are of
> any use.
>
> Example (I disabled constantly printing the warning):
>    [testng] Running MemoryLeakTest...
>    [testng]
>    [testng] WARNING: This test will run forever and must be manually
> stopped
>    [testng] Exception in thread "Thread-265"
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>    [testng]     at
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.addToRecencyQueue(ConcurrentLinkedHashMap.java:
> 413)
>    [testng]     at
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.get(ConcurrentLinkedHashMap.java:

Ben Manes

Feb 25, 2011, 11:49:34 PM
to ConcurrentLinkedHashMap
You should be happy to hear that the test is bunk and its
configuration is far too extreme. I must have left it at a poor choice
of settings when experimenting. A configuration of
concurrencyLevel=1,000 means that 16,000 reorderings must be pending
before a drain is attempted. Due to the number of threads, the
draining thread gets descheduled and the arrival rate of new reads
exceeds its ability to catch up.

I rewrote the test so that it's clearer, more realistic, and uses a
reasonable configuration. The test shows that the draining can keep up
with concurrencyLevel=64 and numThreads=250. In a real-world
application the arrival rate of reads wouldn't be 250 concurrent
readers non-stop, so this should be safe.

The draining is a slight bottleneck because it must be done by a
single thread to maintain a strict LRU. It could be done concurrently
if the LRU were partitioned, e.g. one LRU per segment. That is what we
do in MapMaker, though it was done that way for simplicity, with plans
to enhance it to a single top-level LRU like CLHM uses.
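
A rough illustration of that per-segment alternative (hypothetical
code, not MapMaker's or CLHM's implementation): each segment owns an
independent LRU, so recency updates never funnel through a single
structure.

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: one independent LRU per segment instead of a single top-level LRU. */
class SegmentedLruSketch<K, V> {
  final Map<K, V>[] segments;

  @SuppressWarnings("unchecked")
  SegmentedLruSketch(int segmentCount, final int capacityPerSegment) {
    segments = new Map[segmentCount];
    for (int i = 0; i < segmentCount; i++) {
      segments[i] = Collections.synchronizedMap(
          new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
              return size() > capacityPerSegment;
            }
          });
    }
  }

  Map<K, V> segmentFor(K key) {
    return segments[(key.hashCode() & 0x7fffffff) % segments.length];
  }

  V get(K key)          { return segmentFor(key).get(key); }
  V put(K key, V value) { return segmentFor(key).put(key, value); }
}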

A pending enhancement is to optionally move the draining off of the
user threads and have it performed by an Executor. This avoids
exposing the amortized penalty to a caller. We've already added this
to MapMaker since it has much more clean-up work to do (this was so
that we could move the soft/weak reference clean-up onto user threads).
A dedicated thread would probably be able to drain CLHM even faster,
so an even more extreme configuration could be used.
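
A generic sketch of that pending enhancement (hypothetical; this is
not the CLHM API): the buffered reads are drained by a task handed to
an Executor, so the calling thread returns immediately.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustration only: drain work handed to a background executor rather than a caller. */
class ExecutorDrainSketch<K> {
  final Queue<K> recencyQueue = new ConcurrentLinkedQueue<K>();
  final AtomicBoolean drainScheduled = new AtomicBoolean();
  final ExecutorService drainer = Executors.newSingleThreadExecutor();

  void recordRead(K key) {
    recencyQueue.add(key);
    // Schedule at most one drain at a time; the caller never pays the penalty.
    if (drainScheduled.compareAndSet(false, true)) {
      drainer.execute(new Runnable() {
        public void run() {
          try {
            for (K k; (k = recencyQueue.poll()) != null; ) {
              // apply the recency update to the LRU chain
            }
          } finally {
            drainScheduled.set(false);
          }
        }
      });
    }
  }
}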

Sorry for the false positive. Can you experiment with the updated test
and see if its satisfactory?

Thanks!
Ben

cburroughs

Feb 28, 2011, 8:34:31 AM
to ConcurrentLinkedHashMap
Thanks, the latest code from the lru branch no longer OOMs on me. I'll
do the experiments I originally wanted to do and report back if I find
anything interesting.

Ben Manes

Feb 28, 2011, 5:30:48 PM
to ConcurrentLinkedHashMap
I will do the following in v1.2 to improve this situation.

- Determine the number of recency queues based on
Runtime.getRuntime().availableProcessors().
- Support a user-supplied Executor so that draining can be offloaded
from the caller threads (no user-facing latency).
- Put an upper bound on the number of reorder operations that can be
queued, and discard further ones once it is reached (see the sketch
below). This avoids a stampede effect causing a failure, since these
tasks are non-critical.

The first two should have a positive impact on performance, and the
last one provides a safety net that keeps this from ever becoming an
issue.
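
A minimal sketch of the third item, a bounded lossy buffer
(hypothetical code; the bound CLHM actually adopted is described in
the next message):

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

/** Illustration only: reorder operations are dropped once the buffer is full. */
class BoundedReorderBufferSketch<K> {
  // Hypothetical cap; the real value and its per-CPU scaling are discussed below.
  static final int MAX_PENDING_REORDERS = 1000000;

  final Queue<K> pending = new ArrayBlockingQueue<K>(MAX_PENDING_REORDERS);

  void recordRead(K key) {
    // offer() fails instead of blocking when the buffer is full, so a read
    // burst can never queue unbounded work; a lost reordering only makes
    // the LRU ordering slightly less accurate.
    pending.offer(key);
  }
}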

Ben Manes

Feb 28, 2011, 10:14:43 PM
to ConcurrentLinkedHashMap
An upper bound of 1M per CPU is now in. This fail-safe should keep any
runaway process from causing an issue w.r.t. CLHM.

On my 4-core machine I can see the cap being honored when running the
MemoryLeak test case. I removed the Thread.yield() statement to force
excessive reads prior to an eviction.

If you suspect that CLHM is causing a failure in Cassandra then you
may wish to canary with a snapshot build. The APIs have not changed,
so it should just be a JAR replacement.

cburroughs

Mar 2, 2011, 4:28:35 PM
to ConcurrentLinkedHashMap
Thanks, the resident set size tests all look good so far. If I come up
with something interesting I'll let you know.