EhCache deadlock problem hibernate.


Seamus McMorrow

Sep 9, 2015, 6:48:02 PM
to ehcache-users
Hi,

Can anyone help me with this question on stackoverflow please?



Thanks,
S

Fabien Sanglier

Sep 12, 2015, 3:08:20 PM
to ehcach...@googlegroups.com
I added an answer directly on Stack Overflow. This may be normal behavior, caused by the write locks and the threads that naturally wait while those locks are held. At least it is a known issue, as mentioned in the FAQ on this page: http://ehcache.org/documentation/2.6/apis/transactions#transaction-managers.

Aside from the exceptions, do you notice any inconsistencies in your app? It seems Hibernate should handle these timed-out threads and land on its feet either way...

--
You received this message because you are subscribed to the Google Groups "ehcache-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ehcache-user...@googlegroups.com.
To post to this group, send email to ehcach...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ehcache-users/7bf73d91-209c-41cb-9034-b0546f7d87de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Fabien Sanglier
fabiens...@gmail.com

Seamus McMorrow

Sep 12, 2015, 9:21:10 PM
to ehcache-users
Many thanks for the reply.

I have been trying to debug this for a few days now, and the app/server becomes really messed up when this happens.
It is only happening when the cache expires and 2 or more threads hit the server at the same time.

It seems both threads try to do a get and a put on the cache, and somehow they end up deadlocked. Things go very funny after that.
I tried adding a BlockingCache as a decorator, and verified by debugging that it was registered, but the same thing happened.
I have set timeToLiveSeconds to 60 so that entries expire quickly and I can replicate the deadlock.

If you want more thread dumps or anything let me know.

jvisualvm is able to catch the deadlock when I run 5 users against the server, so I can replicate the issue fairly easily. Maybe it's time to try to write a unit test that replicates the issue, and see if I can get some better help.
I am currently at a loss as to what to do. I thought ehcache would be able to handle concurrency pretty well, but at the moment I think I may try another provider.

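e.g.
<cache name="branch" maxEntriesLocalHeap="10000" eternal="false" timeToLiveSeconds="60" transactionalMode="xa">
    <cacheDecoratorFactory class="com.ardan1.core.ehcache.BlockingCacheDecoratorFactory"/>
    <persistence strategy="none" />
</cache>

package com.ardan1.core.ehcache;

import java.util.Properties;

import net.sf.ehcache.Ehcache;
import net.sf.ehcache.constructs.CacheDecoratorFactory;
import net.sf.ehcache.constructs.blocking.BlockingCache;

// Wraps every cache that references this factory in a BlockingCache.
public class BlockingCacheDecoratorFactory extends CacheDecoratorFactory {

    // Used for caches that are configured explicitly in ehcache.xml.
    @Override
    public Ehcache createDecoratedEhcache(Ehcache cache, Properties properties) {
        return new BlockingCache(cache);
    }

    // Used for caches that are created from the defaultCache configuration.
    @Override
    public Ehcache createDefaultDecoratedEhcache(Ehcache cache, Properties properties) {
        return new BlockingCache(cache);
    }
}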
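
For the unit test mentioned above, one possible starting point is to let the JVM report the deadlock itself, the same way jvisualvm does. The sketch below does not use Ehcache at all: the class name DeadlockDetectionSketch and the two plain ReentrantLocks are assumptions for illustration only. It forces two threads into a lock-order deadlock and polls ThreadMXBean.findDeadlockedThreads(), which returns null when no deadlock exists and may be unsupported on some JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class DeadlockDetectionSketch {

    /** Forces two threads into a lock-order deadlock and reports whether the JVM detects it. */
    static boolean detectsDeadlock() {
        ReentrantLock lockA = new ReentrantLock();
        ReentrantLock lockB = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirst));
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirst));
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        boolean detected = false;
        try {
            // Poll for up to about 5 seconds.
            for (int attempt = 0; attempt < 100 && !detected; attempt++) {
                long[] deadlockedIds = threadBean.findDeadlockedThreads();
                detected = deadlockedIds != null && deadlockedIds.length >= 2;
                if (!detected) {
                    Thread.sleep(50);
                }
            }
            // Break the deadlock so the worker threads can finish.
            t1.interrupt();
            t2.interrupt();
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the deadlock", e);
        }
        return detected;
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second, CountDownLatch bothHoldFirst) {
        try {
            first.lockInterruptibly();
            try {
                bothHoldFirst.countDown();
                bothHoldFirst.await();      // wait until each thread holds its first lock
                second.lockInterruptibly(); // blocks: the other thread holds this lock
                second.unlock();
            } finally {
                first.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + detectsDeadlock());
    }
}
```

In a real test the two worker threads would run the cache get/put calls instead of taking the stand-in locks, and the test would fail when the polling loop reports a deadlock.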

Fabien Sanglier

Sep 14, 2015, 11:11:51 AM
to ehcach...@googlegroups.com
I'm not quite sure where the hang-up is... 2 extra thoughts:
 - Everything works correctly if you disable transactional mode = XA, right?
 - Have you tried a newer version of ehcache? (current is 2.10.x) ... maybe something has been fixed since 2.6.6

Quick note: If you do end up testing with ehcache 2.10 and use maven, make sure you exclude the ehcache-core lib from the hibernate-ehcache dependency...

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-ehcache</artifactId>
    <version>${hibernate-core.version}</version>
    <exclusions>
        <exclusion>
            <groupId>net.sf.ehcache</groupId>
            <artifactId>ehcache-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Thanks,

Fabien



Seamus McMorrow

Sep 15, 2015, 12:19:23 PM
to ehcache-users
Hi,

Yes, everything works correctly if I disable transactional mode = XA. I double checked and ran a test, works fine.

I also tried the latest version of ehcache 2.10.0 with xa on, but I got the same deadlock again. 

I am digging through transaction logs, and debugging through it to try and figure it out. Nothing conclusive yet.

Thanks again,
S







So I tried the latest version of ehcache 2.10.0, and made it work with hibernate 4.2.6

Moises Ventura

Sep 16, 2015, 12:56:40 PM
to ehcache-users
I've updated the Stack Overflow post (http://stackoverflow.com/questions/32484813/ehcache-expiry-and-deadlock) with detailed information about where and how the lock happens, and how to reproduce it using breakpoints. Unfortunately my knowledge of the locking mechanisms is very basic. More specifically, I don't understand why Cache.tryRemoveImmediately(Object, boolean) involves two different locks, ReadWriteLockSync.tryLock(ReadWriteLockSync.java:57) and ReentrantLock.tryLock(long, TimeUnit), which set the exclusive owner thread without acquiring the lock (which they couldn't do, since it has already been acquired by the other thread).

I hope this additional information makes sense to somebody.

Thanks



Ludovic Orban

Sep 19, 2015, 1:06:18 AM
to ehcach...@googlegroups.com
It would be useful if you could try configuring your cache as eternal
and re-running your test to see if that helps. I suspect that
some lock might not get properly released when an element is being
expired.

Seamus McMorrow

Sep 19, 2015, 11:27:58 AM
to ehcache-users
For the performance tests I am currently doing, I have set all the caches to never expire and everything works properly. There are no deadlocks. It is definitely something related to cache expiry.

Ludovic Orban

Sep 22, 2015, 9:18:14 AM
to ehcache-users
After looking closely at your analysis, the thread dump and the source code, I believe I understand where the deadlock is coming from. My understanding is that there is a timeout mismatch between your transaction manager and the non-strict XA cache.

If my theory is right, the problem is that the XA cache's timeout is longer than the XA transaction's. Since it is not possible for non-strict XA to get the transaction timeout, the XA cache relies on the configured (or default) local transaction timeout.

The problem I see is that one element is being modified by one transaction while a concurrent transaction tries to read the same element and notices that it is expired. The second transaction then removes the element inline, which obviously will block since the element is locked by the 1st transaction. But the 2nd thread will only block until its transaction times out.

If my theory is right, that means lowering the default local transaction timeout should help the expirations happen within the XA transaction timeout.
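
To make the mismatch concrete, here is a minimal sketch that does not use Ehcache or a real transaction manager. The class name TimeoutMismatchSketch, the single ReentrantLock standing in for the locked element, and all millisecond values are assumptions for illustration. It measures how long a second thread stays blocked on a lock the first transaction never releases: the wait is bounded only by the waiter's own timeout, so if that timeout is longer than the XA timeout the thread is still blocked after the transaction manager has given up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TimeoutMismatchSketch {

    /** Returns how many milliseconds the waiter stayed blocked on a lock that is never released. */
    static long blockedMillis(long cacheTimeoutMillis) {
        ReentrantLock elementLock = new ReentrantLock(); // stand-in for the lock on the cached element
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // First transaction: takes the element lock and keeps it until told to let go.
        Thread firstTx = new Thread(() -> {
            elementLock.lock();
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                elementLock.unlock();
            }
        });
        firstTx.setDaemon(true);
        firstTx.start();

        try {
            locked.await();
            // Second transaction: waits for the element, but only for its own timeout.
            long start = System.nanoTime();
            boolean acquired = elementLock.tryLock(cacheTimeoutMillis, TimeUnit.MILLISECONDS);
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (acquired) {
                elementLock.unlock();
            }
            release.countDown();
            firstTx.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring the wait", e);
        }
    }

    public static void main(String[] args) {
        long xaTimeoutMillis = 200; // hypothetical XA transaction timeout
        long longWait = blockedMillis(500);
        long shortWait = blockedMillis(50);
        System.out.println("cache timeout 500 ms, still blocked past XA timeout: " + (longWait > xaTimeoutMillis));
        System.out.println("cache timeout 50 ms, still blocked past XA timeout: " + (shortWait > xaTimeoutMillis));
    }
}
```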

Moises Ventura

Sep 22, 2015, 1:47:39 PM
to ehcache-users
Ludovic:

I was able to solve my test deadlock by removing the syncForKey.tryLock parts in net.sf.ehcache.Cache:tryRemoveImmediately and leaving only the removeInternal call. This way the concurrent calls correctly handle the locks through the LocalTransactionStore softLock, with one executing the get-remove-put and the others waiting and restarting when the first one has completed.

It looks like the deadlock happens because of the conflict between the net.sf.ehcache.Cache syncForKey.tryLock, which sets the exclusive owner thread in the second thread, and the underlying net.sf.ehcache.transaction.local.LocalTransactionStore SoftLock, which sets the lock in the first thread. See the complete details in the updated Stack Overflow post.

Why does Cache:tryRemoveImmediately call syncForKey.tryLock, which is not done in either the put or the remove? Again, I don't know enough about ehcache; this seems to fix the deadlock issue, but it might be hiding the real cause or breaking other things.

Ludovic Orban

Sep 23, 2015, 4:37:43 AM
to ehcache-users
Removing the tryLock() call in tryRemoveImmediately() may have some undesired side-effects that may or may not affect you.

The idea behind this tryLock() is to only perform eviction if the element isn't locked, otherwise postpone that effort to another time. This prevents deadlocks in some specific features, one that comes to my mind is the BlockingCache pattern but there could be others that escape me at the moment.
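
A rough sketch of that "evict only if nobody holds the lock, otherwise try again later" idea, written against plain java.util.concurrent rather than Ehcache internals. The class TryLockEvictionSketch, its per-key lock map and the key "branch-1" are hypothetical names for illustration; this is not how Cache.tryRemoveImmediately is actually implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class TryLockEvictionSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    void put(String key, String value) {
        store.put(key, value);
    }

    /** Removes the entry only if its lock is free right now; otherwise the removal is postponed. */
    boolean tryEvict(String key) {
        ReentrantLock lock = lockFor(key);
        if (!lock.tryLock()) {
            return false; // another thread holds the key: skip instead of blocking
        }
        try {
            store.remove(key);
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns { evicted while another thread held the lock, evicted after it was released }. */
    static boolean[] demo() {
        TryLockEvictionSketch cache = new TryLockEvictionSketch();
        cache.put("branch-1", "expired value");
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            ReentrantLock lock = cache.lockFor("branch-1");
            lock.lock();
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        writer.setDaemon(true);
        writer.start();

        try {
            locked.await();
            boolean whileLocked = cache.tryEvict("branch-1");
            release.countDown();
            writer.join();
            boolean afterRelease = cache.tryEvict("branch-1");
            return new boolean[] { whileLocked, afterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during the demo", e);
        }
    }
}
```

The caller never blocks here: a false return simply means the expired entry stays in place until a later access retries the eviction.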

The transactional store has to play within the normal cache locking scheme while using its own specific locking scheme too. This makes deadlocks rather common but that's an aspect of all transactional systems and the transactional way to solve them is to cancel deadlocked transactions after their timeout. Said differently: the xa store has to handle this inconvenience for the benefit of other features.
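
As an illustration of cancelling a deadlocked transaction after its timeout, the following sketch has two threads take two locks in opposite order, each with its own timeout on the second lock. Everything here is an assumption for illustration: the class LockOrderTimeoutSketch, the two ReentrantLocks standing in for the transactional soft lock and the per-key cache sync, and the timeout values. It is not Ehcache code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderTimeoutSketch {

    /** Returns { tx1 completed, tx2 completed } for two threads that lock in opposite order. */
    static boolean[] run(long tx1TimeoutMillis, long tx2TimeoutMillis) {
        ReentrantLock softLock = new ReentrantLock(); // stand-in for the transactional soft lock
        ReentrantLock keySync = new ReentrantLock();  // stand-in for the per-key cache sync
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        boolean[] completed = new boolean[2];

        Thread tx1 = new Thread(() -> completed[0] = lockBoth(softLock, keySync, bothHoldFirst, tx1TimeoutMillis));
        Thread tx2 = new Thread(() -> completed[1] = lockBoth(keySync, softLock, bothHoldFirst, tx2TimeoutMillis));
        tx1.start();
        tx2.start();
        try {
            tx1.join();
            tx2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the transactions", e);
        }
        return completed;
    }

    private static boolean lockBoth(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst, long timeoutMillis) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // both threads now hold their first lock: a deadlock without timeouts
            if (!second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out: give up; the finally block releases the first lock
            }
            second.unlock();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run(100, 5000);
        System.out.println("tx1 completed: " + result[0] + ", tx2 completed: " + result[1]);
    }
}
```

With a short timeout for the first thread and a long one for the second, the first gives up and releases its lock, which lets the second finish. That is the trade-off in question: the deadlock is resolved, but only by failing one of the participants.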

Moises Ventura

Sep 23, 2015, 5:49:14 AM
to ehcache-users
Do you know why only Cache:tryRemoveImmediately calls syncForKey.tryLock, which is not done by either the put or the remove in the same class?

Some heavily used key tables are in the cache (branch, assets, etc.). When they expire, hundreds of threads request them concurrently, so the Cache syncForKey.tryLock vs LocalTransactionStore SoftLock.lock conflict happens systematically. If I understand your suggestion correctly, lowering the local transaction timeout will only make some threads fail fast, letting the others complete. If that's the case, it's not an acceptable solution.