Distributed lock cleanup (ILock.destroy())


Chris70
Aug 23, 2012, 4:18:55 PM
to haze...@googlegroups.com
If thread 1 acquires the distributed lock "foo", thread 2 attempts to acquire it and waits because thread 1 holds it, and thread 1 then calls destroy() on the lock, does thread 2 successfully acquire lock "foo", or does something more "sinister" happen? So, something like...

Thread 1:
ILock lock = Hazelcast.getLock("foo");
lock.lock();
// Do something
lock.destroy();

Thread 2:
ILock lock = Hazelcast.getLock("foo");
lock.lock();
// Do something
lock.destroy();

The point is, we'd like to clean up our locks as they're released. Our original implementation in 2.1.2 used an IMap-based lock, but this proved erroneous (we believe we saw two different threads acquiring the same lock), which we suspect is related to issue  https://github.com/hazelcast/hazelcast/issues/223
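For what it's worth, the cleanup we're after can be sketched like this (a hypothetical single-JVM stand-in using java.util.concurrent, not the Hazelcast API; the LockRegistry class and its methods are our own invention): reference-count each named lock and drop its registry entry only when the last interested thread releases it, so a waiter never has its lock destroyed out from under it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM sketch (not Hazelcast API): a named-lock
// registry that reference-counts each lock and removes its entry only
// when the last interested thread releases it.
public class LockRegistry {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int refs; // mutated only inside compute(), so updates are atomic
    }

    private final ConcurrentHashMap<String, Entry> locks = new ConcurrentHashMap<>();

    public void lock(String name) {
        // Atomically register interest, then block on the actual lock.
        Entry e = locks.compute(name, (k, v) -> {
            if (v == null) v = new Entry();
            v.refs++;
            return v;
        });
        e.lock.lock();
    }

    public void unlock(String name) {
        // Release, and drop the registry entry once nobody needs it.
        locks.compute(name, (k, v) -> {
            v.lock.unlock();
            return --v.refs == 0 ? null : v;
        });
    }

    public int size() {
        return locks.size();
    }

    public static void main(String[] args) throws InterruptedException {
        LockRegistry reg = new LockRegistry();
        reg.lock("foo");
        Thread t = new Thread(() -> {
            reg.lock("foo");  // blocks until main releases
            reg.unlock("foo");
        });
        t.start();
        reg.unlock("foo");
        t.join();
        System.out.println("entries after release: " + reg.size());
    }
}
```

The refcount is the piece that makes "destroy on release" safe: a plain remove-on-unlock would delete an entry that another thread is still blocked on.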

Enes Akar
Aug 24, 2012, 2:53:18 AM
to haze...@googlegroups.com
"Two different threads acquiring the same lock" is the more serious problem, and the one we should focus on first, since the Lock object uses the Map's lock mechanism in the background.

Issue 233 is fixed and closed.
Is there a scenario that can help us reproduce the problem?

Currently, destroy() does not release the lock; you can file an issue regarding this.

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/gAndsRBjgJ4J.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.

Chris70
Aug 24, 2012, 10:23:20 AM
to haze...@googlegroups.com
To be clear, we changed our code to use a Distributed Lock, whereas previously we were seeing issues with the IMap-based lock. We made no code change other than substituting the IMap-based lock with the Distributed Lock, but this was enough to fix the issue for us (or at least mask the root issue). What led us to try a Distributed Lock in lieu of an IMap-based lock is issue 223 (not issue 233), which we know directly fixes an issue with MultiMap, but the fix appeared to touch components that are shared with Map.

I'll attempt to describe below the scenario we think we're seeing. We don't know if the scenario is accurate in terms of the exact sequence of events. Obviously if the lock isn't respected, the events could occur in a different order. The issue is always reproducible in our system. 
- Thread 1 acquires lock "foo". Key "foo" is not yet an entry in the IMap, so it puts a value in for "foo", and then releases the lock.
- Thread 2, 3, 4 look to acquire lock "foo"
- Thread 2 acquires lock "foo", sees there's a value for "foo", registers an EntryListener, and then releases the lock.
- Thread 3 acquires lock "foo", sees there's a value for "foo", adds a listener to thread 2, and then releases the lock. 
- Thread 4 acquires lock "foo", removes "foo" from the map, and then unlocks "foo".

We haven't found evidence that any of the threads successfully acquire lock "foo" while another thread holds it, though we suspect it to be the case. We're still investigating. The issue we suspect is thread 4 successfully acquires lock "foo" while thread 3 holds it. Meanwhile, it's possible that thread 4 removes "foo" from the map before thread 3 can register its listener on thread 2 and thus thread 3 never gets a callback. We completely rely on the lock "foo" to synchronize these operations.
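The sequence above reduces to a check-then-act protocol that is only correct if the lock really is exclusive. A single-JVM sketch of what we rely on (ReentrantLock and a plain map standing in for the IMap lock and entries; the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM reduction of the four-thread sequence above.
// Every check-then-act runs while holding the lock, so thread 4's
// remove can never interleave with thread 2/3's check-then-register.
public class GuardedCheckThenAct {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, String> map = new HashMap<>();

    // thread 1: create the entry if absent
    void create(String key, String value) {
        lock.lock();
        try {
            map.putIfAbsent(key, value);
        } finally {
            lock.unlock();
        }
    }

    // threads 2/3: observe the entry and register interest, atomically
    boolean checkAndRegister(String key, Runnable registerListener) {
        lock.lock();
        try {
            if (!map.containsKey(key)) return false;
            registerListener.run(); // still under the lock
            return true;
        } finally {
            lock.unlock();
        }
    }

    // thread 4: remove under the lock, then release -- never the reverse
    void remove(String key) {
        lock.lock();
        try {
            map.remove(key);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        GuardedCheckThenAct g = new GuardedCheckThenAct();
        g.create("foo", "value");
        boolean seen = g.checkAndRegister("foo",
                () -> System.out.println("listener registered"));
        g.remove("foo");
        boolean seenAfter = g.checkAndRegister("foo", () -> {});
        System.out.println("before remove: " + seen + ", after remove: " + seenAfter);
    }
}
```

If the lock can be acquired by a second thread while held, the containsKey/register pair stops being atomic, which is exactly the failure mode we suspect: thread 4 removes the entry between thread 3's check and its registration.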

Finally, we haven't tried running w/ the fix for issue 223, so we have no reason to believe that patch fixes the underlying issue. It may not be relevant. Is it worth trying the fix for 223 or are you convinced it has no relevance to our use case?

Mehmet Dogan
Aug 24, 2012, 12:37:23 PM
to haze...@googlegroups.com
Both issue 223 and issue 228 are related to the removal of a locked entry, which applies to both IMap and MultiMap. The problem was that when a locked entry was removed and then unlocked, threads blocked trying to acquire that lock never received the notification and remained blocked indefinitely.

Since you are removing and unlocking an entry, it might be that you are encountering the same problem. Your fix (replacing IMap locks with external ILocks) looks to me like a workaround for thread 4's lock-remove-unlock operation.

If building your app against 2.3 (master branch) is not a hard process for you, and if this issue is easily reproducible in your system, then it is worth trying. It will show us whether 2.3 has fixed your issue, or whether we have a serious lock issue waiting to be solved before releasing 2.3.

@mmdogan





Chris70
Aug 29, 2012, 9:26:35 AM
to haze...@googlegroups.com
Hazelcast 2.3 appears to have fixed the issue we saw when using a map-based lock.

Mehmet Dogan
Aug 30, 2012, 11:38:09 AM
to haze...@googlegroups.com
Can you post a test case/app so we can reproduce the issue on our side? Or at least code snippets of the operations that use locks/maps?

@mmdogan




On Thu, Aug 30, 2012 at 5:54 PM, Chris70 <chris...@gmail.com> wrote:
Perhaps I spoke too soon...

We have some evidence that map-based locks are being acquired by other threads before they are released by the threads that initially acquired them. The issue is reproducible in a 4-node cluster of Hazelcast (running 2.3), and we do not see it as often (or perhaps not at all) with a 2-node cluster.

Separately, using Distributed Locks on Hazelcast 2.3 fails with an IllegalMonitorStateException, whereas we did not see this exception in 2.1.2. I am certain that in our code, the thread that releases the lock is the same thread that locked it. This seems to point to multiple threads acquiring the same lock. Is this possible?

2012-08-30 10:03:45,649 WARN  SystemError - Thu Aug 30 10:03:45 EDT 2012
java.lang.IllegalMonitorStateException: Current thread is not owner of the lock!
	at com.hazelcast.impl.MProxyImpl$MProxyReal.unlock(MProxyImpl.java:731)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:66)
	at $Proxy0.unlock(Unknown Source)
	at com.hazelcast.impl.MProxyImpl.unlock(MProxyImpl.java:412)
	at com.hazelcast.impl.LockProxyImpl$LockProxyBase.unlock(LockProxyImpl.java:202)
	at com.hazelcast.impl.LockProxyImpl.unlock(LockProxyImpl.java:116)
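For comparison, java.util.concurrent has the same ownership contract; here is a minimal stand-alone sketch (ReentrantLock, not Hazelcast) showing that unlock() from a non-owning thread throws this exact exception type:

```java
import java.util.concurrent.locks.ReentrantLock;

// Stand-alone demonstration (java.util.concurrent, not Hazelcast) that
// unlock() from a thread that does not own the lock throws
// IllegalMonitorStateException.
public class NonOwnerUnlock {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // owned by the main thread
        Thread other = new Thread(() -> {
            try {
                lock.unlock(); // this thread is not the owner
            } catch (IllegalMonitorStateException e) {
                System.out.println("non-owner unlock rejected");
            }
        });
        other.start();
        other.join();
        lock.unlock(); // the owner may unlock normally
        System.out.println("owner unlock ok");
    }
}
```

So if our code really does lock and unlock on the same thread, seeing this exception suggests the cluster no longer considers that thread the owner, which is consistent with another thread having been granted the lock in between.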
On Wednesday, August 29, 2012 9:26:35 AM UTC-4, Chris70 wrote:
Hazelcast 2.3 appears to have fixed the issue we saw when using a map-based lock.


Chris70
Aug 31, 2012, 8:29:33 AM
to haze...@googlegroups.com
I had retracted my reply. Will post back if I have more info on this.

Mehmet Dogan
Sep 7, 2012, 7:54:03 AM
to haze...@googlegroups.com
Can you try using version 2.3? If you can reproduce even using 2.3, can you post a test-case?

@mmdogan




On Fri, Sep 7, 2012 at 1:47 PM, Timur E <eblo...@gmail.com> wrote:

I have a similar issue, running Hazelcast 2.2 on Oracle JDK 1.7.0-05 on Linux x86-64.


java.lang.IllegalMonitorStateException: Current thread is not owner of the lock!
at com.hazelcast.impl.MProxyImpl$MProxyReal.unlock(MProxyImpl.java:717)
at sun.reflect.GeneratedMethodAccessor462.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:66)
at $Proxy514.unlock(Unknown Source)
at com.hazelcast.impl.MProxyImpl.unlock(MProxyImpl.java:405)

at com.hazelcast.impl.LockProxyImpl$LockProxyBase.unlock(LockProxyImpl.java:202)
at com.hazelcast.impl.LockProxyImpl.unlock(LockProxyImpl.java:116)

It needs a load test to reproduce.

The most interesting observation:
The very same application runs without this error (in production, under quite heavy load) on Oracle JDK 1.6.0-29 on Linux x86-64.


On Friday, August 31, 2012, 14:29:33 UTC+2, Chris70 wrote:

Mehmet Dogan
Sep 10, 2012, 9:29:23 AM
to haze...@googlegroups.com
Thanks for the info. 

I am able to reproduce this issue. As you said, this is quite an elusive race condition. (Increasing the execution frequency of the cleanup thread helps a lot here.) The problem appears on both JDK 6 and JDK 7.

I also have a fix and will push it after some testing. (I hope this will solve Chris's problem too.)

@mmdogan




On Sun, Sep 9, 2012 at 10:37 AM, <eblo...@gmail.com> wrote:
Filed an issue here
