ChronicleMap and memory efficiency vs CHM


Kevin Burton

Apr 17, 2015, 7:06:10 PM
to mechanica...@googlegroups.com
I'm considering porting ActiveMQ's memory store to ChronicleMap to get tighter in-memory representation and fewer GC pauses.

The main motivator is memory.  We're wasting a MASSIVE amount of memory hosting our queue.  I think it's on the order of 150GB of RAM right now, but only for a few million objects.

Chronicle is saying that, in their benchmarks, they're getting 18x memory savings.

In my experience this is probably realistic.  In-memory data structures in Java can be horribly inefficient.  

(I attached their performance numbers below).

I think the performance tradeoff here is probably acceptable.  While we have millions of messages, our scheduling throughput is more like 1-2k per second vs the 30M per second below.  (Of course my mileage may vary depending on encoding/decoding performance.)

Has anyone tested this in production?

I would post to the Chronicle list, but I couldn't find one.  If this works I could get this data under 10GB, which would be pretty amazing.

We're planning on scaling a LOT larger so this could easily become 1TB of memory if I'm not careful :-P  .. that would be pricey!


Number of entries | Chronicle* Throughput | Chronicle RSS | HashMap* Throughput | HashMap Worst GC pause | HashMap RSS
------------------|-----------------------|---------------|---------------------|------------------------|------------
10 million        | 30 Mupd/s             | ½ GB          | 155 Mupd/s          | 2.5 secs               | 9 GB
50 million        | 31 Mupd/s             | 3⅓ GB         | 120 Mupd/s          | 6.9 secs               | 28 GB
250 million       | 30 Mupd/s             | 14 GB         | 114 Mupd/s          | 17.3 secs              | 76 GB
1000 million      | 24 Mupd/s             | 57 GB         | OOME                | 43 secs                | NA
2500 million      | 23 Mupd/s             | 126 GB        | Did not test        | NA                     | NA

Peter Lawrey

Apr 17, 2015, 10:03:09 PM
to mechanica...@googlegroups.com, java-ch...@googlegroups.com

You are right that the mileage can vary *a lot* based on the size of the entry and how you do the serialization.

We offer a number of modes of use, and we benchmark the fastest option: not only is it the bigger number, it is also the situation where most of the time is spent in the Chronicle Map code rather than in serialization/deserialization, i.e. it is the map itself we are benchmarking.

The most compact option is to use an LZW-compressed Externalizable or our BytesMarshallable serialized object. Depending on the size and structure of the object you could get 1/18th the size with this option, but results range from 1/100th the size to worse than not compressing at all; the content of your data matters when using compression.
As LZW is expensive, we support Snappy and no compression as well. Snappy is much faster but uses more space than LZW.
For one use case we examined, with large String values, Chronicle Map alone halved the size of the Strings (mostly due to UTF-8 encoding); Snappy was 1/10th the size, and LZW was better than 1/20th the size of a ConcurrentHashMap storing the same Strings. If you serialize an object you get different results, as Snappy is optimised for compressing text fast, AFAIK.
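The point that the content of the data matters is easy to demonstrate. Here is a minimal sketch using the JDK's Deflater (DEFLATE) as a stand-in, since the LZW and Snappy codecs mentioned above don't ship with the JDK; repetitive text (like message headers) shrinks dramatically, while already-dense data barely shrinks at all:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionDemo {

    /** Returns the DEFLATE-compressed size of the input in bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Repetitive text (think message headers) compresses very well...
        byte[] repetitive = "header=value;".repeat(1000).getBytes();
        // ...while random bytes are essentially incompressible.
        byte[] random = new byte[13_000];
        new Random(42).nextBytes(random);

        System.out.println("repetitive: " + repetitive.length + " -> " + compressedSize(repetitive));
        System.out.println("random:     " + random.length + " -> " + compressedSize(random));
    }
}
```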

Finally, the fastest option is to avoid serialization entirely and use a direct reference into off-heap memory. This gives you access to native memory like a reference to a struct in C++, either via a Java Bean whose code we generate from an interface of getters and setters you provide, or by hand-crafting the code starting from our generated code.
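To make "a reference to a struct in C++" concrete: the pattern is a flyweight whose getters and setters read and write fixed offsets in native memory directly, with no serialization step. A minimal hand-rolled sketch over a direct ByteBuffer (the field layout here is our own illustration, not Chronicle's actual format or generated code):

```java
import java.nio.ByteBuffer;

// A flyweight that reads/writes fields at fixed offsets in (off-heap)
// memory, rather than serializing a whole object per access.
public class MessageFlyweight {
    private static final int ID_OFFSET = 0;        // long, 8 bytes
    private static final int PRIORITY_OFFSET = 8;  // int, 4 bytes
    public static final int SIZE = 12;             // bytes per entry

    private ByteBuffer buffer;
    private int base;

    // Point this flyweight at one entry's slot; no data is copied.
    public MessageFlyweight wrap(ByteBuffer buffer, int base) {
        this.buffer = buffer;
        this.base = base;
        return this;
    }

    public long getId()            { return buffer.getLong(base + ID_OFFSET); }
    public void setId(long id)     { buffer.putLong(base + ID_OFFSET, id); }
    public int getPriority()       { return buffer.getInt(base + PRIORITY_OFFSET); }
    public void setPriority(int p) { buffer.putInt(base + PRIORITY_OFFSET, p); }

    public static void main(String[] args) {
        // allocateDirect gives native memory, invisible to the GC.
        ByteBuffer mem = ByteBuffer.allocateDirect(SIZE * 100);
        MessageFlyweight msg = new MessageFlyweight().wrap(mem, 5 * SIZE);
        msg.setId(42L);
        msg.setPriority(7);
        System.out.println(msg.getId() + " / " + msg.getPriority());
    }
}
```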

As the Map is concurrent, we use multiple threads in our benchmarks. The benchmark was run on a machine with 16 cores and hyper-threading, so we had 32 threads.
The table is a bit old; we now get a little over 1 million updates/sec/thread when all the CPUs are busy, and about double that with just one thread running.

It is worth remembering that ChronicleMap supports persistence, ultra-fast restart, data sizes larger than main memory with a notional on-heap footprint, and LAN/WAN replication. (The WAN replication allows you to control the amount of network bandwidth you utilise.)

If you use Chronicle as more than just a replacement for an on-heap data structure, it can be a more compelling option.

Regards,
Peter.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kevin Burton

Apr 17, 2015, 11:20:46 PM
to java-ch...@googlegroups.com, mechanica...@googlegroups.com
Nice.  This is interesting.  At present, ActiveMQ supports compressing message bodies using LZW, but not the headers.  In our situation the headers are about 50% of a message's size.  I've actually thought about encoding a message entirely as headers, since this might actually be more efficient.

I think some of those tunables might need to be exposed, i.e. whether it should use Snappy or LZW, etc.

Can Chronicle be used as an LRU cache?  I couldn't find any documentation on this.  I imagine I could mirror part of the LRU inside the JVM and just keep key pointers, but that would still use memory.  ActiveMQ uses LRUs for topics (not sure why).

This would basically mean I could drop in Chronicle support by just changing the map constructor and literally using the same classes, so it would just be some glue.

Would be a big win: off-heap memory storage and much better compression.

Peter Lawrey

Apr 18, 2015, 1:33:38 AM
to java-ch...@googlegroups.com, mechanica...@googlegroups.com

If you needed LRU, you would need to create a Map facade which kept an LRU of the keys on heap while the entries were off heap. Maintaining a strict LRU is not simple to combine with concurrent access.
In the future we might support a mostly-LRU which is not strict, but we haven't had any clients ask for it.
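To sketch what such a facade could look like (all the names below are ours, and a plain ConcurrentHashMap stands in for ChronicleMap so the example is self-contained): an access-ordered LinkedHashMap tracks only the keys' recency on heap, while values live in the backing map. The coarse synchronization trades away the backing map's concurrency for a strict LRU, which is exactly the tension described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LruFacade<K, V> {
    // Values live here; in practice this would be the off-heap map.
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();
    // Only keys live here, in access order (least- to most-recently used).
    private final LinkedHashMap<K, Boolean> recency;

    public LruFacade(int maxEntries) {
        recency = new LinkedHashMap<K, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                if (size() > maxEntries) {
                    store.remove(eldest.getKey()); // evict from the backing map too
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized void put(K key, V value) {
        store.put(key, value);
        recency.put(key, Boolean.TRUE);
    }

    public synchronized V get(K key) {
        recency.get(key); // touch: moves the key to most-recently-used
        return store.get(key);
    }

    public synchronized int size() {
        return store.size();
    }
}
```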

Regards,
Peter.


--
You received this message because you are subscribed to the Google Groups "Chronicle" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-chronicl...@googlegroups.com.

Benjamin Manes

Apr 18, 2015, 7:03:59 PM
to mechanica...@googlegroups.com, java-ch...@googlegroups.com
You could easily fork ConcurrentLinkedHashMap and change the backing map, which is only required to be a ConcurrentMap. That would provide a fairly strict LRU cache with high concurrency. Many projects have opted to do something similar; the code is easy to adapt and has high test coverage. Cassandra uses this map as its on-heap cache as well as for managing off-heap entries. Supposedly that will soon be replaced with OHC, which is a strictly off-heap cache.

Oddly, OHC uses the linked-list strategy, and earlier prototype caches in the Cassandra bug tracker claim some inspiration from CLHM. If I were to design an off-heap cache I'd use a probabilistic policy, avoiding the linkage bloat and recognizing that very large caches don't need strict LRU management to achieve a high hit rate. A probabilistic LRU like Redis' could be augmented with TinyLFU if a higher hit rate was desired. As with CLHM, writes would be buffered for replay to improve concurrency and simplify management. I'm pretty sure an LRU policy could be built on top of Chronicle in a day or two. All of the problems involved are solved, just not well known.
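For a feel of the probabilistic approach (in the spirit of Redis' approximated LRU; the names and parameters below are our own, not any particular library's API): each entry carries a last-access stamp, and eviction removes the stalest of a small random sample, so there is no per-entry linked-list overhead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class SampledLruCache<K, V> {
    private static final int SAMPLE_SIZE = 5;

    static final class Entry<V> {
        final V value;
        volatile long lastAccess; // recency stamp, updated on each get
        Entry(V value) { this.value = value; this.lastAccess = System.nanoTime(); }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final int capacity;

    public SampledLruCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        e.lastAccess = System.nanoTime(); // touch
        return e.value;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value));
        while (map.size() > capacity) {
            evictOne();
        }
    }

    // Sample up to SAMPLE_SIZE entries and evict the stalest of the sample.
    private void evictOne() {
        // CHM iteration order is arbitrary; skipping a random number of
        // entries makes the sample roughly random.
        int skip = ThreadLocalRandom.current().nextInt(Math.max(1, map.size()));
        K victim = null;
        long oldest = Long.MAX_VALUE;
        int seen = 0;
        for (Map.Entry<K, Entry<V>> e : map.entrySet()) {
            if (skip-- > 0) continue;
            if (e.getValue().lastAccess < oldest) {
                oldest = e.getValue().lastAccess;
                victim = e.getKey();
            }
            if (++seen == SAMPLE_SIZE) break;
        }
        if (victim != null) map.remove(victim);
    }

    public int size() { return map.size(); }
}
```

The eviction is only approximately LRU: a recently-touched entry can occasionally be sampled and lose, but for very large caches the hit rate stays close to strict LRU.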

Jan van Oort

Apr 19, 2015, 1:35:45 AM
to mechanical-sympathy
For building an extremely fast LRU cache (whether on-heap or off-heap), see pages 166-169 in Henry S. Warren, "Hacker's Delight", 2nd edition.



Fortuna audaces adiuvat - hos solos ? 


Benjamin Manes

Apr 19, 2015, 3:13:53 AM
to mechanica...@googlegroups.com
I don't think the matrix method is applicable for software caches. He was musing about what it might look like in software, rather than making a serious proposal. The software versions described had serious shortcomings, such as not being O(1)-friendly. In software you want either a LinkedHashMap variant with concurrency control, or the probabilistic approach.

Jan van Oort

Apr 19, 2015, 6:53:28 AM
to mechanical-sympathy
I've built one recently, in Java. Use case: it caches byte[] elements, following Warren's approach. If a put is performed on a full cache, the least recently used entry is evicted. Works like a charm.



Fortuna audaces adiuvat - hos solos ? 


Peter Lawrey

Apr 19, 2015, 10:27:42 AM
to mechanica...@googlegroups.com
If you are going to implement an off-heap cache, you might want to be aware of this.


"Off-heap direct-memory data stores, methods of creating and/or managing off-heap direct-memory data stores, and/or systems including off-heap direct-memory data store "

I am no lawyer so this sounds insane to me, but it might not to others.

Peter Lawrey

Apr 19, 2015, 10:35:25 AM
to mechanica...@googlegroups.com
Though it does say

"It is believed that part of the reason for the long-felt need is that prior attempted solutions have tried to rely on either operating systems (OS) approaches, or programming language (PL) approaches, for solving these and related problems. The inventors of the instant application have realized, however, that what is needed is a more holistic approach that blends in elements from both of these art areas. "

I would like to meet the "inventors of the instant application", and I am amazed that no one thought to use the operating system and the programming language in a "blend"ed way before 2012.

While the operating system is used when allocating direct ByteBuffers, I have a feeling that operating systems have been used to allocate memory before, and that programs have used native memory to store data before.

Jan van Oort

Apr 19, 2015, 4:48:30 PM
to mechanical-sympathy
To my European ears this gives quite a new meaning to "insane".

Then again, the original patent filer was Software AG USA, a subsidiary of a German company. Frankie goes to Hollywood....





Fortuna audaces adiuvat - hos solos ? 

Kasper Nielsen

Apr 21, 2015, 2:47:55 AM
to mechanica...@googlegroups.com
They mention EHCache and Terracotta a couple of times, so it's probably a result of Software AG buying Terracotta a couple of years ago.

Martin Thompson

Apr 21, 2015, 4:17:43 AM
to mechanica...@googlegroups.com
What's quite annoying is that one of their group companies was a client of mine, and I helped them with off-heap development. Those guys were really sane... then the US got involved. What's wrong with that country??? Patents are so misused!!! Grrrhhhh!


Michael Barker

Apr 21, 2015, 4:37:45 AM
to mechanica...@googlegroups.com

Jan Kotek

Apr 21, 2015, 5:07:20 AM
to mechanica...@googlegroups.com
Hi Kevin,

Also consider MapDB's HashMaps; they support expiration too. The current stable 1.0 version probably won't beat CHM, but the new 2.0, which is under development, could. Space usage in MapDB is also pretty low.

Contact me directly and I'll give you a hand adapting the benchmarks. Most options in 2.0 are not well documented yet.

Jan