Map in a 2-member cluster - apparent dead lock when deleting and adding objects with the same key, version 2.2.

302 views
Skip to first unread message

Danny

unread,
Aug 27, 2012, 6:24:38 PM8/27/12
to haze...@googlegroups.com
Gurus,
I am evaluating Hazelcast and have come to a problem. Would appreciate your insight.

Test setup:
Version 2.2. A 2-member cluster maintaining a simple map. Multi-threaded clients connect to the cluster as native java clients and would do all kind of add/modify/delete of the objects on the map. There is a fixed key space. Objects can be deleted and new objects can be added later on with the same keys.

The cluster is pre-populated with 10,000 objects.

The client thread in this test setup will repeat the following:
  1. Obtain a transaction from HazelcastClient.
  2. begin the transaction.
  3. Randomly pick a key, randomly do add/mod/delete. If delete, save the key for recycling. If add, add a new object with the recycled key.
  4. Commit transaction if succeeded, rollback otherwise.

Observations:
  1. Apparent deadlock observed on one of the cluster members. Logs from the cluster member and the client attached below.
  2. If I only do modification there is no deadlock and it works very well.
  3. It appears that new objects added with the same key may be causing the problem?
Thanks in advance.
Dan

=== Logs snippet from the trouble cluster memeber. Note the no-response error message would keep on printing with no end. ===

Aug 27, 2012 3:06:45 PM com.hazelcast.impl.ConcurrentMapManager
WARNING: [127.0.0.1]:5701 [dev] Still no response! Request{name='c:user',CONCURRENT_MAP_TRY_LOCK_AND_GET, callId='16458', redoCount='0', timeout='30000', lockThreadId='92', target='Address[127.0.0.1]:5702', local='true'}
Aug 27, 2012 3:06:45 PM com.hazelcast.impl.ConcurrentMapManager
WARNING: [127.0.0.1]:5701 [dev] Still no response! Request{name='c:user',CONCURRENT_MAP_TRY_LOCK_AND_GET, callId='16562', redoCount='0', timeout='300000', lockThreadId='94', target='Address[127.0.0.1]:5702', local='true'}
Aug 27, 2012 3:06:45 PM com.hazelcast.impl.ConcurrentMapManager
WARNING: [127.0.0.1]:5701 [dev] Still no response! Request{name='c:user',CONCURRENT_MAP_TRY_LOCK_AND_GET, callId='17002', redoCount='0', timeout='300000', lockThreadId='80', target='Address[127.0.0.1]:5702', local='true'}
Aug 27, 2012 3:06:45 PM com.hazelcast.impl.ConcurrentMapManager
WARNING: [127.0.0.1]:5701 [dev] Still no response! Request{name='c:user',CONCURRENT_MAP_TRY_LOCK_AND_GET, callId='17004', redoCount='0', timeout='30000', lockThreadId='78', target='Address[127.0.0.1]:5702', local='true'}
Aug 27, 2012 3:06:46 PM com.hazelcast.impl.ClientHandlerService
WARNING: [127.0.0.1]:5701 [dev] exception during handling CONCURRENT_MAP_PUT: [NONE] Could not acquire resource under transaction! Another thread holds a lock for the key : Data{partitionHash=-921146398} size= 10
com.hazelcast.core.OperationTimeoutException: [NONE] Could not acquire resource under transaction! Another thread holds a lock for the key : Data{partitionHash=-921146398} size= 10
at com.hazelcast.impl.BaseManager$TargetAwareOp.throwTxTimeoutException(BaseManager.java:937)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1775)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1759)
at com.hazelcast.impl.ConcurrentMapManager$MPut.put(ConcurrentMapManager.java:1628)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:618)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:597)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:170)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:121)
at com.hazelcast.impl.ClientHandlerService$MapPutHandler.processMapOp(ClientHandlerService.java:934)
at com.hazelcast.impl.ClientHandlerService$ClientMapOperationHandlerWithTTL.processCall(ClientHandlerService.java:1634)
at com.hazelcast.impl.ClientHandlerService$ClientOperationHandler.handle(ClientHandlerService.java:1586)
at com.hazelcast.impl.ClientRequestHandler$1.run(ClientRequestHandler.java:57)
at com.hazelcast.impl.ClientRequestHandler$1.run(ClientRequestHandler.java:54)
at com.hazelcast.impl.ClientRequestHandler.doRun(ClientRequestHandler.java:63)
at com.hazelcast.impl.FallThroughRunnable.run(FallThroughRunnable.java:22)
at com.hazelcast.impl.ClientHandlerService$Worker.run(ClientHandlerService.java:230)
at java.lang.Thread.run(Thread.java:680)
Aug 27, 2012 3:06:46 PM com.hazelcast.impl.ClientHandlerService
WARNING: [127.0.0.1]:5701 [dev] exception during handling CONCURRENT_MAP_PUT: [NONE] Could not acquire resource under transaction! Another thread holds a lock for the key : Data{partitionHash=-1362891295} size= 10
com.hazelcast.core.OperationTimeoutException: [NONE] Could not acquire resource under transaction! Another thread holds a lock for the key : Data{partitionHash=-1362891295} size= 10
at com.hazelcast.impl.BaseManager$TargetAwareOp.throwTxTimeoutException(BaseManager.java:937)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1775)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1759)
at com.hazelcast.impl.ConcurrentMapManager$MPut.put(ConcurrentMapManager.java:1628)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:618)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:597)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:170)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:121)
at com.hazelcast.impl.ClientHandlerService$MapPutHandler.processMapOp(ClientHandlerService.java:934)
at com.hazelcast.impl.ClientHandlerService$ClientMapOperationHandlerWithTTL.processCall(ClientHandlerService.java:1634)
at com.hazelcast.impl.ClientHandlerService$ClientOperationHandler.handle(ClientHandlerService.java:1586)
at com.hazelcast.impl.ClientRequestHandler$1.run(ClientRequestHandler.java:57)
at com.hazelcast.impl.ClientRequestHandler$1.run(ClientRequestHandler.java:54)
at com.hazelcast.impl.ClientRequestHandler.doRun(ClientRequestHandler.java:63)
at com.hazelcast.impl.FallThroughRunnable.run(FallThroughRunnable.java:22)
at com.hazelcast.impl.ClientHandlerService$Worker.run(ClientHandlerService.java:230)
at java.lang.Thread.run(Thread.java:680)
Aug 27, 2012 3:06:54 PM com.hazelcast.impl.ConcurrentMapManager
WARNING: [127.0.0.1]:5701 [dev] Still no response! Request{name='c:user',CONCURRENT_MAP_TRY_LOCK_AND_GET, callId='15007', redoCount='0', timeout='300000', lockThreadId='84', target='Address[127.0.0.1]:5702', local='true'}
Aug 27, 2012 3:06:54 PM com.hazelcast.impl.ConcurrentMapManager
WARNING: [127.0.0.1]:5701 [dev] Still no response! Request{name='c:user',CONCURRENT_MAP_TRY_LOCK_AND_GET, callId='15048', redoCount='0', timeout='300000', lockThreadId='86', target='Address[127.0.0.1]:5702', local='true'}

=== Logs snippet from the client process. Note those messages would keep on printing with no end, the time interval increments by 5 seconds. ===

INFO: There is no response for Call [177] operation=CONCURRENT_MAP_REMOVE in 35 seconds.
Aug 27, 2012 3:06:54 PM com.hazelcast.client.ProxyHelper
INFO: There is no response for Call [202] operation=CONCURRENT_MAP_REMOVE in 35 seconds.
Aug 27, 2012 3:06:54 PM com.hazelcast.client.ProxyHelper
INFO: There is no response for Call [209] operation=CONCURRENT_MAP_REMOVE in 35 seconds.
Aug 27, 2012 3:06:54 PM com.hazelcast.client.ProxyHelper
INFO: There is no response for Call [590] operation=CONCURRENT_MAP_REMOVE in 35 seconds.
Aug 27, 2012 3:06:55 PM com.hazelcast.client.ProxyHelper
INFO: There is no response for Call [660] operation=CONCURRENT_MAP_REMOVE in 35 seconds.
Aug 27, 2012 3:06:55 PM com.hazelcast.client.ProxyHelper
INFO: There is no response for Call [879] operation=CONCURRENT_MAP_REMOVE in 35 seconds.


Mehmet Dogan

unread,
Aug 28, 2012, 3:30:08 AM8/28/12
to haze...@googlegroups.com
Hi Dan,

Most probably this is related to issues 223 and 228Problem was that when a locked entry is removed and then unlocked, I mean when a key is locked-removed-unlocked, threads blocked while trying to acquire lock on that key were not able to get notification and were remaining blocked infinitely.

Can you build and try version 2.3 (master branch)? Or you can wait us to release 2.3 which will be this week. 

See also this thread;


@mmdogan






--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/GE1fxOzI5o8J.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.

Danny

unread,
Aug 28, 2012, 1:52:34 PM8/28/12
to haze...@googlegroups.com
Great insights Mehmet. Much appreciated.

I checked out version 2.3 released today! And yes the cluster recovers from the infinite blocking on acquiring lock on deleted keys. From the cluster's point of view the issue is fixed. I do see race conditions caused by my use cases. To be fair the cluster can't eliminate this problem but it should protect itself from the pitfalls by clients - which it does w/ this fix. Great work. Let me go back and think more on my use cases.

The evaluation on HZ has been good so far.
Thanks,
Dan
Reply all
Reply to author
Forward
0 new messages