Queue hangup


Vladimir Kirchev

Feb 3, 2012, 10:39:57 AM
to Hazelcast
Hi,

We use Hazelcast 1.9.4.4.
For a test scenario we use a 4-node setup: two nodes produce to a
queue, and two consume.
When we restart the two consumers simultaneously, the two producers
block and stop working correctly. Once the consumers are up and
running again, we end up with two separate clusters.
The only way to fix this is to restart the two producers.

We see the following thread dump and log output on the producers:

"...EventHandler-3" daemon prio=10 tid=0x00002aaad7bda000 nid=0x6277 waiting on condition [0x000000004079f000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:574)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:511)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:372)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:455)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:368)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:455)
at com.hazelcast.impl.BlockingQueueManager.generateKey(BlockingQueueManager.java:563)
at com.hazelcast.impl.BlockingQueueManager.offer(BlockingQueueManager.java:185)
at com.hazelcast.impl.BlockingQueueManager.offer(BlockingQueueManager.java:181)
at com.hazelcast.impl.FactoryImpl$QProxyImpl$QProxyReal.offer(FactoryImpl.java:2462)
at com.hazelcast.impl.FactoryImpl$QProxyImpl$QProxyReal.offer(FactoryImpl.java:2451)
at com.hazelcast.impl.FactoryImpl$QProxyImpl.offer(FactoryImpl.java:2389)
at ...
at ...
at ...
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)



xxx xx, xxxx xx:xx:xx xx com.hazelcast.impl.BlockingQueueManager
INFO: /xxx.xxx.xxx.xxx:xxxx [asynch-group] ======= -1: BLOCKING_GENERATE_KEY ========
thisAddress= Address[xxx.xxx.xxx.xxx:xxxx], target= null
targetMember= null, targetConn=null, targetBlock=null
null Re-doing [2360] times! q:distributedBlaQueue : null


Any ideas on the cause of this problem?


Kind Regards,
Vladimir Kirchev

Fuad Malikov

Feb 6, 2012, 2:13:31 AM
to haze...@googlegroups.com
Hi Vladimir,

Can you enable debug logging and send us the logs from all 4 nodes, starting from the restart until the problem occurs?
-fuad



Vladimir Kirchev

Feb 10, 2012, 5:31:24 AM
to Hazelcast
Hi, Fuad,

Sorry for the delay in answering, but our test environment is used
by the QA team, and I do not have much time to test the problem.

I tried enabling debug logging with -Dhazelcast.logging.type=log4j and
-Dlog4j.logger.com.hazelcast=DEBUG, but I do not see anything new in
the log files. Am I doing this correctly?

One new observation: after the consumer nodes restart, they do join
the cluster, but the elements that the producers put on the queue
remain there until all nodes in the cluster are restarted. A MapStore
is used, and the elements in the queue are loaded from a database.

- Vladimir Kirchev

Vladimir Kirchev

Feb 15, 2012, 5:42:30 AM
to Hazelcast

Hi, Fuad,

Here is some more info.

We use a queue, which is backed by an IMap using MapStore.
They are configured in the following way:

QueueConfig mainQueueConfig = new QueueConfig();
mainQueueConfig.setName("DISTRIBUTED_QUEUE");
mainQueueConfig.setMaxSizePerJVM(0);
mainQueueConfig.setBackingMapRef("DISTRIBUTED_MAP");

MapStoreConfig mapStoreConfig = new MapStoreConfig();
mapStoreConfig.setEnabled(true);
mapStoreConfig.setWriteDelaySeconds(0); // write-through, synchronous with put
mapStoreConfig.setImplementation(mapStore);

MapConfig queueBackupMapConfig = new MapConfig();
queueBackupMapConfig.setMapStoreConfig(mapStoreConfig);
queueBackupMapConfig.setName(DISTRIBUTED_REGULATIONS_MAP);
queueBackupMapConfig.setBackupCount(0);
queueBackupMapConfig.setEvictionPolicy("NONE");
queueBackupMapConfig.getMaxSizeConfig().setMaxSizePolicy("cluster_wide_map_size");
queueBackupMapConfig.getMaxSizeConfig().setSize(0);
queueBackupMapConfig.setMergePolicy("hz.ADD_NEW_ENTRY");

Config config = new Config();

config.addQueueConfig(mainQueueConfig);
config.addMapConfig(queueBackupMapConfig);


So we have two nodes A and B.

On Node A, we have code that puts to the Queue, and on Node B we have
code that gets from the Queue.

Initially when the two nodes are started the puts and gets work
correctly.

But when Node B is restarted, it blocks on queue.take(), even though
the queue is not empty. It stays this way until Node A is stopped or
restarted. Only then is Node B able to process the elements in the
queue.

We have also seen the opposite behavior, where Node A blocks on
queue.offer(), but we have not yet been able to reproduce it.
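
One way to keep a thread from hanging indefinitely while diagnosing this might be to use the timed poll and offer variants from java.util.concurrent.BlockingQueue, which the Hazelcast queue interface extends. The following is only a minimal sketch: it uses a plain LinkedBlockingQueue as a stand-in for the distributed queue, the class and method names are hypothetical, and it is not confirmed that the timed calls would actually return during the redo loop seen in the producer thread dump.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hazelcast.
public class TimedQueueProbe {

    /** Polls with a timeout; logs and returns null instead of blocking forever. */
    public static <T> T pollOrReport(BlockingQueue<T> queue, long timeout, TimeUnit unit)
            throws InterruptedException {
        T item = queue.poll(timeout, unit);
        if (item == null) {
            System.err.println("poll timed out; queue size seen by this node = " + queue.size());
        }
        return item;
    }

    /** Offers with a timeout; logs and returns false instead of blocking forever. */
    public static <T> boolean offerOrReport(BlockingQueue<T> queue, T item,
                                            long timeout, TimeUnit unit)
            throws InterruptedException {
        boolean accepted = queue.offer(item, timeout, unit);
        if (!accepted) {
            System.err.println("offer timed out for item " + item);
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: a local queue stands in for the distributed queue here.
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        offerOrReport(queue, "job-1", 1, TimeUnit.SECONDS);
        System.out.println(pollOrReport(queue, 1, TimeUnit.SECONDS));
        // The queue is now empty, so this call times out and reports it.
        System.out.println(pollOrReport(queue, 100, TimeUnit.MILLISECONDS));
    }
}
```

A timeout together with a non-zero reported size on the consumer side would match the symptom described above, where take() blocks although the queue is not empty.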

As for the Hazelcast logs, I was not able to get it to log anything
other than INFO messages.

If you need more info, let me know.


- Vladimir Kirchev

Vladimir Kirchev

Feb 15, 2012, 8:25:30 AM
to Hazelcast
Additional info: So far I can reproduce this problem only on 1.9.4.x.
I have not been able to reproduce it on 2.0-RC1.

- Vladimir Kirchev