I have tested this with version 1.9.4.3 and it doesn't behave well in the case where the whole cluster is taken down. My MapLoader implementation uses disk files, one file per key. Merge policy on the map is "hz.ADD_NEW_ENTRY".
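For reference, my store follows roughly this pattern. This is a simplified, self-contained sketch, not my actual class: the real implementation implements com.hazelcast.core.MapLoader, while the class name, method signatures, and String key/value types below are illustrative stand-ins.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a file-per-key store backing a Hazelcast MapLoader:
// every key is persisted as one file in a single directory.
public class FileKeyStore {
    private final Path dir;

    public FileKeyStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Write path (MapStore-style): one file per key, file name == key.
    public void store(String key, String value) throws IOException {
        Files.write(dir.resolve(key), value.getBytes("UTF-8"));
    }

    // MapLoader.loadAllKeys() equivalent: every file name is a key.
    public Set<String> loadAllKeys() throws IOException {
        Set<String> keys = new HashSet<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                keys.add(p.getFileName().toString());
            }
        }
        return keys;
    }

    // MapLoader.loadAll(keys) equivalent: read only the requested subset.
    public Map<String, String> loadAll(Collection<String> keys) throws IOException {
        Map<String, String> result = new HashMap<>();
        for (String k : keys) {
            Path p = dir.resolve(k);
            if (Files.exists(p)) {
                result.put(k, new String(Files.readAllBytes(p), "UTF-8"));
            }
        }
        return result;
    }
}
```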
This is how to reproduce the problem:
1) Start 2 nodes, node 1 and node 2, with empty MapLoader storage. hazelcast> m.size returns 0.
2) On node 1 (or 2, doesn't matter): hazelcast> m.putmany 1000
The 1000 keys distribute 476/524 across the two nodes. hazelcast> m.size returns 1000. Good.
3) Stop node 1, stop node 2.
4) Start node 1.
Node 1 calls MapLoader.loadAllKeys(), and then MapLoader.loadAll(Collection keys) with a collection of 476 keys. Good.
5) Start node 2.
Node 2 calls MapLoader.loadAllKeys(), and then MapLoader.loadAll(Collection keys) with a collection of fewer than 524 keys. In most test runs a collection of 519 keys is supplied. hazelcast> m.size returns 985, so 15 keys have gone missing. Not good.
So I guess that on cluster join, node 2 gets a set of partitions to be responsible for, and any keys it loads through MapLoader.loadAllKeys() that don't map to one of its own partitions are ignored.
Node 2 has keys in its MapLoader storage that (after the "cluster restart") belong to node 1, but they are never sent to node 1.
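If that guess is right, the joining node effectively does something like the following with the keys it loads. This is only a toy simulation of the suspected filtering, not Hazelcast's actual code; the partition count, hash function, and ownership sets are simplified stand-ins for the internals.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the suspected behaviour: a joining node loads all keys from
// its local store, but silently keeps only those whose partition it now owns,
// instead of forwarding the rest to their new owner node.
public class PartitionDropDemo {
    static final int PARTITIONS = 271; // Hazelcast's default partition count

    static int partitionOf(String key) {
        // Simplified stand-in for Hazelcast's key-to-partition mapping.
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Keys the node keeps after loadAllKeys(): only those in owned partitions.
    static Set<String> keptKeys(Collection<String> loaded, Set<Integer> owned) {
        Set<String> kept = new HashSet<>();
        for (String k : loaded) {
            if (owned.contains(partitionOf(k))) {
                kept.add(k);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Say node 2 owns only the odd partitions after the rebalance
        // (an arbitrary choice for the demo).
        Set<Integer> owned = new HashSet<>();
        for (int p = 1; p < PARTITIONS; p += 2) {
            owned.add(p);
        }

        List<String> loaded = new ArrayList<>();
        for (int i = 0; i < 524; i++) {
            loaded.add("key-" + i);
        }

        // Keys in partitions this node no longer owns are dropped, which
        // would explain why fewer than 524 keys survive the restart.
        Set<String> kept = keptKeys(loaded, owned);
        System.out.println("loaded=" + loaded.size() + " kept=" + kept.size());
    }
}
```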
If I run hazelcast> m.putmany 200 50 1000 between steps 4) and 5), the number of keys missing after node 2 starts is even larger.
Am I doing something wrong, or is this a bug/"feature"?
-
Eivind