Deadlock in HazelcastClusterManager when trying to get two shared locks in a cluster


lokshin...@gmail.com

Apr 18, 2017, 10:34:10 AM
to vert.x

Hi All.

We have a verticle with a consumer registered at the address "test". In the message handler we need to acquire two shared locks (lockA and lockB, for example) to do some work. Then we send two messages (message1 and message2) to the "test" address:

1. The message handler tries to get shared lock lockA for message1.
2. The message handler tries to get shared lock lockA for message2.
3. The message handler gets lockA for message1 and then tries to get lockB for message1.

At this point we get a deadlock in the io.vertx.spi.cluster.hazelcast.HazelcastClusterManager#getLockWithTimeout function, because it calls executeBlocking with a TaskQueue instance that "always run all tasks in order" (from the TaskQueue javadoc). message2's blocking task is waiting for lockA, and message1's request for lockB is queued behind it, so it never runs; but lockA is already held by message1, which will not release it until it gets lockB.


As far as I can see, even in a situation where message1 needs only lockA and message2 needs only lockB, message2 will wait until message1 gets lockA.
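
To make the ordering effect concrete, here is a minimal standalone sketch (assuming Vert.x 3.x APIs; this is my own illustration with a CountDownLatch standing in for the held lock, not the actual cluster manager code):

import io.vertx.core.Vertx;
import java.util.concurrent.CountDownLatch;

public class OrderedBlockingDemo {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        CountDownLatch lockA = new CountDownLatch(1); // stands in for a lock held elsewhere

        // Task 1: simulates message2 blocking inside getLockWithTimeout,
        // waiting for lockA, which is never released in this sketch.
        vertx.<Void>executeBlocking(fut -> {
            try {
                lockA.await();
            } catch (InterruptedException ignored) {
            }
            fut.complete();
        }, true /* ordered: tasks run one at a time, in order */, res -> {});

        // Task 2: simulates message1's request for lockB. It needs nothing
        // from task 1, but because the queue is ordered it cannot start
        // until task 1 finishes, so this line is never printed.
        vertx.<Void>executeBlocking(fut -> {
            System.out.println("got lockB");
            fut.complete();
        }, true, res -> {});
    }
}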

Is this an expected behavior, or I did a mistake somewhere?

In local mode everything works fine.

Here is a short test to reproduce this situation:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;
import io.vertx.core.shareddata.Lock;
import io.vertx.ext.unit.Async;
import io.vertx.ext.unit.TestContext;
import org.jetbrains.annotations.NotNull;
import org.junit.Test;

@Test
public void test(@NotNull TestContext context) throws InterruptedException {
    final Async async = context.async(2);
    getVertx().deployVerticle(new Vert1(async));
    Thread.sleep(1000); // give the consumer time to register

    // Send two messages so the handler runs twice and the lock requests interleave.
    getVertx().eventBus().send("test", "testmsg");
    getVertx().eventBus().send("test", "testmsg");
}

public static final class Vert1 extends AbstractVerticle {
    private final @NotNull Logger logger = LoggerFactory.getLogger(this.getClass());

    final @NotNull Async async;

    public Vert1(@NotNull Async async) {
        this.async = async;
    }

    @Override
    public void start() {
        vertx.eventBus().consumer("test", r ->
            vertx.sharedData().getLockWithTimeout("lock1", 100000, res -> {
                if (res.succeeded()) {
                    logger.info("lock1 acquired");
                    sleep();
                    final Lock lock1 = res.result();
                    // Nested acquisition: lock2 is requested while lock1 is still held.
                    vertx.sharedData().getLockWithTimeout("lock2", 100000, res2 -> {
                        if (res2.succeeded()) {
                            logger.info("lock2 acquired");
                            res2.result().release();
                            lock1.release();
                            async.countDown();
                        } else {
                            logger.info("lock2 acquisition failed");
                            lock1.release();
                        }
                    });
                } else {
                    logger.info("lock1 acquisition failed");
                }
            })
        );
    }

    // Blocks the event loop on purpose, to widen the window where lock1 is held.
    public static void sleep() {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

Tim Fox

Apr 18, 2017, 11:54:28 AM
to vert.x
I don't know if it's related to the issue, but you should never block the event loop in Vert.x, and your sleep() method does exactly that. If you want to simulate a delay, use vertx.setTimer() instead.
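
For example, the sleep() call in the posted handler could be replaced with something like this (a sketch against the Vert1 verticle above, keeping the 5-second delay):

// Instead of calling the blocking sleep() before acquiring lock2,
// schedule the second lock request on a timer so the event loop stays free.
vertx.setTimer(5000, timerId -> {
    vertx.sharedData().getLockWithTimeout("lock2", 100000, res2 -> {
        // ...same handling of res2 as in the original code
    });
});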

lokshin...@gmail.com

Apr 22, 2017, 6:34:21 AM
to vert.x
Unfortunately, the problem still occurs without any sleep() call in the event loop.

On Tuesday, April 18, 2017 at 18:54:28 UTC+3, Tim Fox wrote:

Tim Fox

Apr 22, 2017, 2:48:33 PM
to vert.x
Could you provide a reproducer?

sd1ver

Apr 24, 2017, 4:09:58 PM
to vert.x
Hello, Tim

I've run into the same problem, and I hope I'm just doing something wrong. Here is an example: https://github.com/sd1ver/vertx-test
If I run the example without a cluster it works well, but if I run it in a Hazelcast cluster it does not.
Could you tell me where my mistake is?

On Saturday, April 22, 2017 at 21:48:33 UTC+3, Tim Fox wrote:

Tim Fox

Apr 25, 2017, 10:07:27 AM
to vert.x
I'm guessing this is the issue https://github.com/vert-x3/vertx-hazelcast/issues/41, which was fixed in January?

sd1ver

Apr 27, 2017, 2:24:55 AM
to vert.x
Yes, it looks like that issue, but the problem is still not solved in the latest version of vertx-hazelcast (3.4.1).

On Tuesday, April 25, 2017 at 17:07:27 UTC+3, Tim Fox wrote:

Tim Fox

Apr 27, 2017, 3:15:58 AM
to vert.x
Best to comment on the issue, and re-open it if it's not fixed :)

Thomas SEGISMONT

May 3, 2017, 11:38:39 AM
to ve...@googlegroups.com
Please provide a reproducer on GitHub before reopening the issue. Thanks

