Strange deadlock when starting Hazelcast 3.0 with persistent maps

Lukas Blunschi

Aug 8, 2013, 5:00:08 AM
to haze...@googlegroups.com
Hi,

I'm running into a strange deadlock situation when starting Hazelcast 3.0 with persistent maps.

What I do is:

1. I start my application on 4 nodes at the same time.

2. To make sure Hazelcast knows all its cluster members, I use the property hazelcast.initial.min.cluster.size. This works fine.

3. Now I have to ensure that all persistent maps are fully loaded. I do this by polling map.size() on all persistent maps on all nodes until the map sizes no longer change. (This might not be the best way to do it; if you know a better one, please let me know :-)
This worked fine with Hazelcast 2.6.1, but with Hazelcast 3.0 the nodes block while loading the map entries.
After some waiting, I get exceptions like the following:

2013-08-08 10:45:44,030 WARN  [hz.appway.response] com.hazelcast.spi.Invocation - [blade1]:5713 [blade] Retrying invocation: InvocationImpl{ serviceName='hz:impl:mapService', op=com.hazelcast.map.operation.MapSizeOperation@29121a52, partitionId=0, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=100, callTimeout=60000, target=Address[blade2]:5713}, Reason: com.hazelcast.spi.exception.RetryableHazelcastException: Map is not ready.

I don't know the actual cause, but one possible candidate is a node calling map.size() on a map that is still in the process of being initialized...
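For reference, the polling described in step 3 can be sketched roughly as follows. This is only a minimal illustration, not the actual application code: `StableSizePoller`, `awaitStableSize` and all parameters are hypothetical names, and the size source is a plain `IntSupplier` so the sketch runs without a cluster. In the real setup it would presumably be something like `map::size` on each persistent map. Note that an unchanged size only suggests that loading has finished; it does not prove it.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper, not part of Hazelcast: waits until a size reading stops changing. */
final class StableSizePoller {

    /**
     * Polls sizeSource until it returns the same value requiredStableReads times in a row.
     * Returns the stable size, or -1 if maxPolls is used up (or the thread is interrupted) first.
     */
    static int awaitStableSize(IntSupplier sizeSource, int requiredStableReads,
                               int maxPolls, long pauseMillis) {
        int last = sizeSource.getAsInt();
        int stableReads = 1;
        for (int poll = 1; poll < maxPolls && stableReads < requiredStableReads; poll++) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return -1;
            }
            int current = sizeSource.getAsInt();
            // Any change resets the streak; an identical reading extends it.
            stableReads = (current == last) ? stableReads + 1 : 1;
            last = current;
        }
        return stableReads >= requiredStableReads ? last : -1;
    }

    public static void main(String[] args) {
        // Stand-in for a map that is still loading: the size grows, then settles.
        int[] readings = {10, 20, 30, 30, 30};
        int[] index = {0};
        IntSupplier fakeMapSize = () -> readings[Math.min(index[0]++, readings.length - 1)];
        System.out.println(awaitStableSize(fakeMapSize, 3, 20, 10L));
    }
}
```

Bounding the number of polls keeps the startup code from spinning forever if a map never settles.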

Can you have a look into this?

Thanks and best,
Lukas

Enes Akar

Aug 9, 2013, 4:07:44 PM
to haze...@googlegroups.com
Hi Lukas;

In version 3.0, operations are not executed until the initial load is completed. The waiting is implemented by retrying operation requests while the map is not yet loaded. Not every retry is logged, but once the number of retries exceeds a threshold (100, I guess), the retries are logged as warnings. So if your initial load takes a long time, you can see these messages.

But once the load process completes, the log messages should stop and map.size() should return the correct value. Is that what you experienced? Did you wait for the load operation to complete?
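To make the retry behaviour concrete, here is a small standalone model of it. It is an illustration only and not Hazelcast's actual invocation code: `RetryUntilReady`, `invoke`, `Result` and the `logThreshold` parameter are made-up names, and the exact threshold comparison is a guess. If the fields in the warning above mean what their names suggest, tryCount=250 with tryPauseMillis=500 would allow roughly 250 × 0.5 s = 125 s of retrying, with warnings starting after about 50 s (invokeCount=100).

```java
import java.util.function.BooleanSupplier;

/** Illustrative model only; this is not Hazelcast's actual invocation code. */
final class RetryUntilReady {

    /** Outcome of a simulated invocation. */
    static final class Result {
        final boolean succeeded;
        final int attempts;
        final int warnings;

        Result(boolean succeeded, int attempts, int warnings) {
            this.succeeded = succeeded;
            this.attempts = attempts;
            this.warnings = warnings;
        }
    }

    /**
     * Retries until mapReady reports true or tryCount attempts have been made.
     * Failed attempts before logThreshold are silent; later ones are logged as warnings.
     */
    static Result invoke(BooleanSupplier mapReady, int tryCount,
                         long tryPauseMillis, int logThreshold) {
        int warnings = 0;
        for (int attempt = 1; attempt <= tryCount; attempt++) {
            if (mapReady.getAsBoolean()) {
                return new Result(true, attempt, warnings);
            }
            if (attempt >= logThreshold) { // assumed threshold rule, for illustration
                warnings++;
                System.err.println("WARN Retrying invocation, attempt " + attempt
                        + ", Reason: Map is not ready.");
            }
            try {
                Thread.sleep(tryPauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new Result(false, attempt, warnings);
            }
        }
        return new Result(false, tryCount, warnings);
    }
}
```

In this model a caller sees nothing at all until the threshold is reached, which would fit an initial load that looks like a hang first and only later produces warnings.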


--
Enes Akar
Hazelcast | Open source in-memory data grid
Mobile: +90.507.150.56.71

Lukas Blunschi

Aug 13, 2013, 5:21:15 AM
to haze...@googlegroups.com
Hi Enes,

thanks for your answer.

I waited quite a long time (maybe a minute or so), but the initialization did not finish. Afterwards, I think some timeout happened and the system started, but without Hazelcast being initialized:-(

One option I should try is to remove my own polling and simply let the initialization happen. Maybe my polling (calling map.size() every second on every node on every map) was too extensive...
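A rough sketch of what that could look like: since operations in 3.0 are apparently held back until the initial load completes, a single blocking call per map, bounded by an overall timeout, might replace the once-per-second polling. `BoundedWait` and `callWithTimeout` are hypothetical names, and the `Callable` stands in for something like `map::size`; I have not tried this against a real cluster.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs one blocking call instead of polling repeatedly. */
final class BoundedWait {

    /**
     * Runs blockingCall on a worker thread and waits at most timeoutMillis for it.
     * Returns the call's result, or fallback if it times out, fails, or is interrupted.
     */
    static int callWithTimeout(Callable<Integer> blockingCall, long timeoutMillis, int fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = executor.submit(blockingCall);
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            // Interrupts the worker if it is still blocked, so no thread is left behind.
            executor.shutdownNow();
        }
    }
}
```

Usage would be along the lines of `BoundedWait.callWithTimeout(() -> map.size(), 120_000L, -1)`, with the timeout chosen to be longer than the expected load time.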

I switched back to 2.6.1 for the moment, but will let you know how it goes the next time I try HZ 3.0 :-)

Cheers,
Lukas