Hibernate L2 cache

Mindaugas Žakšauskas

Dec 17, 2010, 6:23:43 AM

to ConcurrentLinkedHashMap

Hi,

I was wondering if anybody has tried to use CLHM as a Hibernate
second-level cache provider? We are trying to migrate from OSCache to
another solution, and my first choice (EHCache) might be lacking some
of the flexibility CLHM has.

I would have tried this myself already, but writing an adapter for
Hibernate seems to have become more complex with the new per-region API.

Another question: the website says CLHM is now integrated into Guava,
but I have looked through the Guava javadocs and can't find a CLHM
class. Is it integrated as a publicly available class, or is it just
powering some other public Guava API?

Regards,
Mindaugas

Ben Manes

Dec 17, 2010, 12:08:06 PM

to ConcurrentLinkedHashMap

Hi Mindaugas,

I have not seen a Hibernate wrapper for CLHM, so unfortunately I can't
be of help. Unless you need a specific feature provided by CLHM, I'd
agree with the more pragmatic choice of adopting Ehcache for that
usage. Generally, the performance concern for those types of workloads
is the cache miss penalty (a database operation) and not the hit
penalty (lock contention). Even a naive synchronized LinkedHashMap is
therefore acceptable, and that is what Ehcache used for most of its
history. From a practical perspective, it may be better to leverage
Ehcache's integrations (Hibernate, Spring) and adopt CLHM for
high-performance scenarios or directly written code (cleaner APIs).
Ehcache may also allow you to specify a custom in-memory store, so
writing an Ehcache adapter for CLHM could give you the best of both
worlds.
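For reference, the "naive synchronized LinkedHashMap" mentioned above can be sketched with only the JDK. A LinkedHashMap in access order evicts its least recently used entry through removeEldestEntry, and Collections.synchronizedMap puts a single lock around every call. This is a minimal illustration with a hypothetical class name (LruMap). It is not Ehcache's actual store.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU map: accessOrder=true makes get() move an entry to the tail,
// and removeEldestEntry() drops the head (least recently used) when over capacity.
class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruMap(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict once an insert pushes us over the bound
    }
}

class LruDemo {
    public static void main(String[] args) {
        // One monitor guards every call, including get(), because get()
        // rewires the access-order links. This is the lock that can contend.
        Map<String, Integer> cache = Collections.synchronizedMap(new LruMap<String, Integer>(2));
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Every read takes the same lock as every write. That is harmless when database misses dominate the cost, and it is the bottleneck CLHM is designed to avoid.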

Regarding Guava, the changes inspired by CLHM will be available in
r08. The class is not directly in Guava. Instead, a significant amount
of code was ported and integrated into MapMaker. The algorithmic
techniques enabled MapMaker to provide concurrent #maximumSize() and
#expireAfterRead() [aka time-to-idle] implementations. These had been
longstanding feature requests, but the Guava team had not known how
to provide them without incurring lock contention. Some additional
features, such as bulk memoization, are planned, based on my tutorial
examples.

Once r08 is released, I would suggest defaulting to MapMaker unless
you need a specific feature from CLHM. That would allow you to
leverage the Guava community and MapMaker's wider range of features.
For specific cases there are advantages with CLHM, though, which is
why it was adopted by Cassandra (performance) and Grails (weighted
values). The implementation ported to MapMaker is a bit naive in
comparison, due to integration challenges (MapMaker forked
ConcurrentHashMap, whereas CLHM is a decorator). However, for most
purposes the difference is slight and the advantage of leveraging
Google's community is a big win.

I don't mean to discourage you from using CLHM and I'd love it if you
did, but from a pragmatic perspective there are good reasons to adopt
other solutions.

Best regards,
Ben

Mindaugas Žakšauskas

Dec 17, 2010, 6:21:32 PM

to ConcurrentLinkedHashMap

Thank you for the extensive response, Ben.

The problem I am solving is explained here:
http://stackoverflow.com/questions/4452242/specifying-global-ehcache-capacity
Long story short, I am not convinced that having a separate
cache/region for each entity type is the best way to go. On top of
that, we have other specific needs (e.g. we need to control how cache
messages are sent across the cluster), so a lot of the existing
EhCache code isn't very useful to us anyway.

This is why I thought, "wouldn't it be enough to have a single simple
LRU ConcurrentMap that we could decorate?". In fact, I spent some time
today on a test integration and got excellent results: I was able to
throw together a simple adapter class that implements
org.hibernate.cache.Cache. It seems to work without any glitches. I
haven't done much testing yet, but I can post further results if
anybody is interested.
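A single globally bounded map, decorated per region, might look roughly like the sketch below. It assumes each region is identified by a name and that cache keys implement equals/hashCode. RegionKey, RegionView and SharedCaches are hypothetical names. A synchronized LinkedHashMap stands in for CLHM so that the sketch needs only the JDK. A real adapter would also implement Hibernate's org.hibernate.cache.Cache methods, which are not reproduced here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: the region name keeps entity types apart,
// while all regions share (and compete for) one global capacity.
final class RegionKey {
    final String region;
    final Object key;

    RegionKey(String region, Object key) {
        this.region = region;
        this.key = key;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RegionKey)) {
            return false;
        }
        RegionKey other = (RegionKey) o;
        return region.equals(other.region) && key.equals(other.key);
    }

    @Override
    public int hashCode() {
        return Objects.hash(region, key);
    }
}

// Hypothetical per-region view decorating the single shared map.
final class RegionView {
    private final String region;
    private final Map<RegionKey, Object> shared;

    RegionView(String region, Map<RegionKey, Object> shared) {
        this.region = region;
        this.shared = shared;
    }

    Object get(Object key) {
        return shared.get(new RegionKey(region, key));
    }

    void put(Object key, Object value) {
        shared.put(new RegionKey(region, key), value);
    }

    void remove(Object key) {
        shared.remove(new RegionKey(region, key));
    }
}

final class SharedCaches {
    // One synchronized, access-ordered LinkedHashMap bounded to maxEntries in total.
    static Map<RegionKey, Object> globalLru(final int maxEntries) {
        return Collections.synchronizedMap(new LinkedHashMap<RegionKey, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<RegionKey, Object> eldest) {
                return size() > maxEntries;
            }
        });
    }
}
```

Every region writes into the same map, so entities of all types compete for one global capacity. No region has a separate limit.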

The only problem with this approach is that the legacy interface
(org.hibernate.cache.Cache) is now deprecated in favour of the
per-region approach (org.hibernate.cache.RegionFactory), which
corresponds almost 1:1 to how EhCache is designed. It might still be
reasonably easy to solve, though, as the Hibernate developers did some
of the legwork by providing relevant adapters (EntityRegionAdapter,
EhcacheCollectionRegion, etc.).

Could you elaborate on the performance concern a bit? I'm not sure I
completely got your point. My understanding is that once an object is
accessed often (the U in LRU), it has a higher chance of surviving
eviction (which is what we want anyway). Assuming database retrieval
costs are roughly equal across entity types, everything else is more
or less irrelevant, isn't it? I do not believe having synchronized
methods (like you suggest) is sufficient. I have seen too many
deadlocks and blocked threads in the caching layer (that's why we're
migrating off OSCache anyway). The problem gets even worse if the
database is underperforming. And how exactly can EhCache help on a
cache miss where CLHM couldn't?

Regarding Guava: we are using r7 at the moment, and I will definitely
switch to MapMaker (which we're happily using elsewhere) when r8 comes
out. I believe it will have everything I need to achieve the goal
above. I just wanted to make sure I'm not missing anything.

The other feature I would need is grouping. Something like:

map.put("one", 1, "numbers");
map.put("orange", orange, "fruits");
map.put("two", 2, "numbers");

map.flushGroup("numbers"); // would only leave orange entry

This seems easy to do by adding another map whose keys are group names
and whose values are lists of references to the relevant entries. A
simple eviction listener would also be needed to check these lists and
remove evicted entries, to avoid memory leaks. But that's well beyond
the scope of this project.
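The grouping idea described above can be sketched as follows. A second map from group name to member keys is kept consistent by an eviction hook. GroupedLruCache and its put/flushGroup methods are hypothetical and are modelled on the example calls above. LinkedHashMap#removeEldestEntry stands in for the eviction listener. Every method is synchronized, which favours simplicity over performance.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical LRU cache with named groups that can be flushed together.
// Every public method is synchronized: simple and correct, not high-performance.
final class GroupedLruCache<K, V> {
    private final Map<K, String> groupOf = new HashMap<>();      // key -> group
    private final Map<String, Set<K>> members = new HashMap<>(); // group -> keys
    private final LinkedHashMap<K, V> lru;

    GroupedLruCache(final int maxEntries) {
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxEntries) {
                    return false;
                }
                unindex(eldest.getKey()); // acts as the eviction listener
                return true;
            }
        };
    }

    synchronized void put(K key, V value, String group) {
        unindex(key); // the key may be moving to a different group
        groupOf.put(key, group);
        members.computeIfAbsent(group, g -> new HashSet<>()).add(key);
        lru.put(key, value); // may evict the eldest entry and unindex it
    }

    synchronized V get(K key) {
        return lru.get(key);
    }

    synchronized void flushGroup(String group) {
        Set<K> keys = members.remove(group);
        if (keys == null) {
            return;
        }
        for (K key : keys) {
            lru.remove(key);
            groupOf.remove(key);
        }
    }

    synchronized int size() {
        return lru.size();
    }

    // Drops a key from the group index so evicted entries don't leak.
    private void unindex(K key) {
        String group = groupOf.remove(key);
        if (group != null) {
            Set<K> keys = members.get(group);
            keys.remove(key);
            if (keys.isEmpty()) {
                members.remove(group);
            }
        }
    }
}

class GroupedDemo {
    public static void main(String[] args) {
        GroupedLruCache<String, Object> map = new GroupedLruCache<>(100);
        map.put("one", 1, "numbers");
        map.put("orange", "orange", "fruits");
        map.put("two", 2, "numbers");
        map.flushGroup("numbers");
        System.out.println(map.size()); // prints 1 (only "orange" remains)
    }
}
```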

m.

Ben Manes

Dec 17, 2010, 7:11:34 PM

to ConcurrentLinkedHashMap

> Could you elaborate on the performance concern a bit, not sure if I completely got your point.

My point was just to take the simplest, easiest to maintain solution
that fits your needs. I try not to advocate CLHM when alternative
solutions are acceptable and require less integration code.

My reasoning was that the dominant performance penalty is the database
access (a roughly uniform cost) incurred on a cache miss. Any LRU cache
(CLHM, Ehcache, LinkedHashMap) should perform well enough for the
in-memory operations. Even synchronizing a naive LinkedHashMap might
cost hundreds of nanoseconds, whereas a cache miss that goes to the
database costs milliseconds. So in common cases a simple approach is
acceptable, because the value lies in the LRU algorithm and not in the
concurrency characteristics. In cases where the synchronization
penalty is noticeable, CLHM would provide a substantial speed-up.
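The numbers below make this concrete. Let h be the hit ratio, t_hit the cost of a synchronized in-memory hit and t_miss the cost of a database miss. The average cost per lookup is then a weighted sum of the two. The values used (h = 0.9, t_hit = 500 ns, t_miss = 5 ms) are illustrative assumptions, not measurements.

```latex
\begin{aligned}
\bar{t} &= h\,t_{\text{hit}} + (1-h)\,t_{\text{miss}} \\
        &= 0.9 \times 500\,\text{ns} + 0.1 \times 5\,\text{ms} \\
        &= 450\,\text{ns} + 500{,}000\,\text{ns} \approx 0.5\,\text{ms}
\end{aligned}
```

Under these assumed numbers, the lock-protected hit path is about 450 / 500,450 ≈ 0.09% of the average lookup. Raising h through a good eviction policy matters far more than speeding up the hit path. The lock only becomes visible when hits dominate and many threads hammer the cache at once.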

In cases where the cache's in-memory performance is critical, the
difference between LinkedHashMap and CLHM should be night and day. The
difference between MapMaker and CLHM should be smaller, with a small
win for CLHM due to some better design choices. We may eventually
integrate those improvements into MapMaker, but design differences
make that complex, so we decided to hold off for now. Those aspects
may be niche enough that the current #maximumSize() implementation
will stay as is, since so far everyone internally at Google has been
quite happy with its concurrency performance.

> And how exactly EhCache can help if a cache miss occurs where CLHM couldn't?

Oh, I didn't mean that at all. Both can help and CLHM should perform
better.

> Regarding guava - we are using r7 at the moment and I will definitely switch to MapMaker... Just wanted to make sure I'm not missing anything.

There are some unique features in CLHM which may wind up in MapMaker
someday. If you don't need those then switching is a reasonable choice
since you'll have a wider support group (not just me). Realistically,
though, I'll probably be the one digging into fixing the MapMaker bugs
anyway. :)

> The other feature I would need is grouping.

CLHM supports "weighted" values, which would allow you to bound the map
by the total number of elements in the lists. For example,

  map = [
    key=1, value=List['a', 'b', 'c']
    key=2, value=List['d', 'e']
  ]

would have a weighted size of 5. If the capacity was reached and
eviction of key=1 was required, then the map would shrink to a
weighted size of 2.

Depending on your needs, that may provide the group-based capacity
bounding that you're interested in.
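The weighted example above can be reproduced with a small JDK-only sketch. Each entry's weight is the size of its list, and least recently used entries are evicted until the total weight fits. WeightedLruCache is a hypothetical name. The sketch illustrates the idea only and is not CLHM's implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU cache bounded by total weight, where weight = list size.
final class WeightedLruCache<K, E> {
    private final LinkedHashMap<K, List<E>> map = new LinkedHashMap<>(16, 0.75f, true);
    private final int maxWeight;
    private int weightedSize; // sum of weigh(value) over all entries

    WeightedLruCache(int maxWeight) {
        this.maxWeight = maxWeight;
    }

    private static int weigh(List<?> value) {
        return value.size();
    }

    synchronized void put(K key, List<E> value) {
        List<E> old = map.put(key, value); // re-putting a key also refreshes its recency
        if (old != null) {
            weightedSize -= weigh(old);
        }
        weightedSize += weigh(value);
        // Walk from the least recently used end, evicting until the total fits.
        Iterator<Map.Entry<K, List<E>>> it = map.entrySet().iterator();
        while (weightedSize > maxWeight && it.hasNext()) {
            weightedSize -= weigh(it.next().getValue());
            it.remove();
        }
    }

    synchronized List<E> get(K key) {
        return map.get(key);
    }

    synchronized int weightedSize() {
        return weightedSize;
    }
}

class WeightedDemo {
    public static void main(String[] args) {
        WeightedLruCache<Integer, Character> cache = new WeightedLruCache<>(5);
        cache.put(1, Arrays.asList('a', 'b', 'c'));
        cache.put(2, Arrays.asList('d', 'e'));
        System.out.println(cache.weightedSize()); // prints 5
        cache.put(3, Arrays.asList('f'));         // total would be 6, so key 1 is evicted
        System.out.println(cache.weightedSize()); // prints 3
    }
}
```

Adding key 3 (weight 1) pushes the total to 6, so key 1 is evicted. The remaining weight is 2 from key 2 plus 1 from the new entry.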

An open item in Guava is determining exactly how a Multimap cache
should behave, since there are a few choices (evict the list entirely,
or just a portion of it?). Once Kevin B. has determined the semantics,
I will probably help the team construct a concurrent implementation.

Cheers,
Ben
