The following code will cause a deadlock:
```java
import io.vertx.core.Context;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.shareddata.AsyncMap;

public class Test {

    public static void main(String[] args) {
        Vertx.clusteredVertx(new VertxOptions(), result -> doTest(result.result()));
    }

    private static void doTest(Vertx vertx) {
        vertx.sharedData().getClusterWideMap("testmap",
                result -> doTest(vertx, result.result()));
    }

    private static void doTest(Vertx vertx, AsyncMap map) {
        Context context = vertx.getOrCreateContext();
        context.runOnContext(v -> doTestOnContext(vertx, map));
        context.runOnContext(v -> doTestOnContext(vertx, map));
    }

    private static void doTestOnContext(Vertx vertx, AsyncMap map) {
        vertx.sharedData().getLockWithTimeout("lock1", 5000L, lockResult -> {
            if (lockResult.succeeded()) {
                map.put("key", "value", result -> lockResult.result().release());
            } else {
                lockResult.cause().printStackTrace();
            }
        });
    }
}
```

This is using Vert.x 3.2.0 with the Hazelcast cluster manager.
Both getLockWithTimeout and put are using executeBlocking with ordered = true under the covers.
This means that tasks get queued in this order:

1. Get lock
2. Get lock
3. Put
4. Put
Obviously, the second "get lock" is waiting for the first lock to be released. But that's never going to happen, because the lock is only released when the first "put" completes. And that's never going to happen either, because the first "put" is behind the second "get lock" in the queue.
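The effect can be reproduced without Vert.x or Hazelcast at all. In the sketch below (class and method names are mine, purely for illustration), a single-threaded executor stands in for the ordered executeBlocking queue and a Semaphore stands in for the cluster-wide lock; the four tasks are queued in the order listed above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class OrderedQueueDemo {

    // Stands in for getLockWithTimeout: block the worker thread until the
    // lock is free, or give up after the timeout.
    static boolean acquire(Semaphore lock) {
        try {
            return lock.tryAcquire(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static List<String> run() {
        // A single-threaded executor behaves like executeBlocking with ordered = true:
        // tasks run one at a time, strictly in submission order.
        ExecutorService ordered = Executors.newSingleThreadExecutor();
        Semaphore lock = new Semaphore(1); // the "cluster-wide" lock
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        ordered.submit(() -> log.add(acquire(lock) ? "get lock 1: acquired" : "get lock 1: timed out"));
        ordered.submit(() -> log.add(acquire(lock) ? "get lock 2: acquired" : "get lock 2: timed out"));
        ordered.submit(() -> { log.add("put 1"); lock.release(); });
        ordered.submit(() -> { log.add("put 2"); lock.release(); });

        ordered.shutdown();
        try {
            ordered.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The second acquisition can only time out: the "put 1" task that would release the lock cannot start until "get lock 2" gives the worker thread back.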
As you might imagine, our code doesn't look quite like the above. We're seeing the problem in a web server. It happens when two requests arrive at roughly the same time and are serviced by the same event loop.
We've worked around it by using ClusterManager.getSyncMap and wrapping calls to that map with executeBlocking with ordered set to false.
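In the same stand-in model as above (not our real code; class and names are mine), the workaround amounts to taking the blocking map writes off the ordered queue. Lock acquisitions still run one at a time, but each put goes to an unordered pool, so the release is no longer stuck behind the next acquisition:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class UnorderedPutDemo {

    static List<String> run() {
        ExecutorService orderedLocks = Executors.newSingleThreadExecutor(); // like ordered = true
        ExecutorService unorderedPuts = Executors.newCachedThreadPool();    // like ordered = false
        Semaphore lock = new Semaphore(1);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(2);

        Runnable request = () -> orderedLocks.submit(() -> {
            boolean acquired;
            try {
                acquired = lock.tryAcquire(2, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                acquired = false;
            }
            if (acquired) {
                log.add("lock acquired");
                // The put no longer waits its turn on the ordered queue...
                unorderedPuts.submit(() -> {
                    log.add("put");
                    lock.release(); // ...so the release happens while the other request waits
                    done.countDown();
                });
            } else {
                log.add("lock timed out");
                done.countDown();
            }
        });

        request.run(); // first concurrent request
        request.run(); // second concurrent request

        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        orderedLocks.shutdown();
        unorderedPuts.shutdown();
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Both requests now complete and neither lock acquisition times out; routing the puts back through the single-threaded executor reintroduces the timeout from the previous sketch.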
But I'm curious about the decision to use ordering in the Hazelcast cluster manager implementation. If a developer is using something called AsyncMap, and there's a handler to indicate completion, why is it necessary to ensure sequential execution under the covers?