I have tested this with version 1.9.4.3 and it doesn't behave well in the case where the whole cluster is taken down. My MapLoader implementation uses disk files, one file per key. Merge policy on the map is "hz.ADD_NEW_ENTRY".
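For reference, my store follows roughly this pattern. This is a simplified, self-contained sketch, not my actual class: the real implementation implements com.hazelcast.core.MapLoader, while the class name, method signatures, and String key/value types below are illustrative stand-ins.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a file-per-key store backing a Hazelcast MapLoader:
// every key is persisted as one file in a single directory.
public class FileKeyStore {
    private final Path dir;

    public FileKeyStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Write path (MapStore-style): one file per key, file name == key.
    public void store(String key, String value) throws IOException {
        Files.write(dir.resolve(key), value.getBytes("UTF-8"));
    }

    // MapLoader.loadAllKeys() equivalent: every file name is a key.
    public Set<String> loadAllKeys() throws IOException {
        Set<String> keys = new HashSet<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                keys.add(p.getFileName().toString());
            }
        }
        return keys;
    }

    // MapLoader.loadAll(keys) equivalent: read only the requested subset.
    public Map<String, String> loadAll(Collection<String> keys) throws IOException {
        Map<String, String> result = new HashMap<>();
        for (String k : keys) {
            Path p = dir.resolve(k);
            if (Files.exists(p)) {
                result.put(k, new String(Files.readAllBytes(p), "UTF-8"));
            }
        }
        return result;
    }
}
```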
This is how to reproduce the problem:
1) Start 2 nodes, node 1 and node 2, with empty MapLoader storage. hazelcast> m.size returns 0.
2) On node 1 (or 2, doesn't matter): hazelcast> m.putmany 1000
The 1000 keys distribute 476/524 across the two nodes. hazelcast> m.size returns 1000. Good.
3) Stop node 1, stop node 2.
4) Start node 1.
Node 1 calls MapLoader.loadAllKeys(), and then MapLoader.loadAll(Collection keys) with a collection of 476 keys. Good.
5) Start node 2.
Node 2 calls MapLoader.loadAllKeys(), and then MapLoader.loadAll(Collection keys) with a collection of fewer than 524 keys. In most test runs a collection of 519 keys is supplied. hazelcast> m.size returns 985, so 15 keys have gone missing. Not good.
So I guess that on cluster join, node 2 gets a set of partitions to be responsible for, and any keys it loads through MapLoader.loadAllKeys() that don't map to one of its own partitions are ignored.
Node 2 has keys in its MapLoader storage that (after the "cluster restart") belong to node 1, but they are never sent to node 1.
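If that guess is right, the joining node effectively does something like the following with the keys it loads. This is only a toy simulation of the suspected filtering, not Hazelcast's actual code; the partition count, hash function, and ownership sets are simplified stand-ins for the internals.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the suspected behaviour: a joining node loads all keys from
// its local store, but silently keeps only those whose partition it now owns,
// instead of forwarding the rest to their new owner node.
public class PartitionDropDemo {
    static final int PARTITIONS = 271; // Hazelcast's default partition count

    static int partitionOf(String key) {
        // Simplified stand-in for Hazelcast's key-to-partition mapping.
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Keys the node keeps after loadAllKeys(): only those in owned partitions.
    static Set<String> keptKeys(Collection<String> loaded, Set<Integer> owned) {
        Set<String> kept = new HashSet<>();
        for (String k : loaded) {
            if (owned.contains(partitionOf(k))) {
                kept.add(k);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Say node 2 owns only the odd partitions after the rebalance
        // (an arbitrary choice for the demo).
        Set<Integer> owned = new HashSet<>();
        for (int p = 1; p < PARTITIONS; p += 2) {
            owned.add(p);
        }

        List<String> loaded = new ArrayList<>();
        for (int i = 0; i < 524; i++) {
            loaded.add("key-" + i);
        }

        // Keys in partitions this node no longer owns are dropped, which
        // would explain why fewer than 524 keys survive the restart.
        Set<String> kept = keptKeys(loaded, owned);
        System.out.println("loaded=" + loaded.size() + " kept=" + kept.size());
    }
}
```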
If I run hazelcast> m.putmany 200 50 1000 between steps 4) and 5), the number of keys missing after node 2 starts is even larger.
Am I doing something wrong, or is this a bug/"feature"?
-
Eivind