Some basic questions


syg6

Nov 10, 2009, 7:36:16 AM
to Hazelcast, joan.ba...@ventusproxy.com
We would like to plug Hazelcast into our existing application. Our
data structure looks like this:

ConcurrentHashMap<String, CallDocs> allCalls;

public class CallDocs
{
...
ConcurrentHashMap<String, XmlDoc> docCache;

}

allCalls contains n CallDocs, indexed by idCall. Each CallDocs
contains m XmlDocs.

XmlDoc is a POJO with only a couple of fields: a date and a String,
path, which tells us where the physical XML document is saved on disk.
We don't keep the XML files in memory because they are HUGE.

What the application does at startup is create allCalls and an empty
CallDocs for each call. Then, as it receives requests, it adds /
modifies / deletes XmlDoc entries within the docCache for each Call.

So if we were to use Hazelcast we'd have to create an IMap for each
Call. Something like this:

ConcurrentHashMap<String, Map<String, XmlDoc>> allCalls;

We organize it this way because we have to do a bunch of bulk ops, for
ALL entries of a given Call. So instead of having a flat structure
with XMLDocs for ALL calls at the same level, and having to iterate
over hundreds of thousands, even millions of entries looking for
entries for a given call, we have them grouped by Call.
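
A minimal sketch of the grouping described above, in plain Java with no
Hazelcast dependency. The names CallRegistry, registerCall, putDoc,
purgeOlderThan and docCount are hypothetical, and XmlDoc here is only a
stand-in for the real POJO (a date and a path). It shows why one inner
map per Call keeps a bulk operation from scanning every other Call's
entries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the XmlDoc POJO: a date and the path of the file on disk.
final class XmlDoc {
    final long savedAtMillis;
    final String path;

    XmlDoc(long savedAtMillis, String path) {
        this.savedAtMillis = savedAtMillis;
        this.path = path;
    }
}

// Hypothetical registry: one inner map per idCall.
final class CallRegistry {
    private final ConcurrentMap<String, ConcurrentMap<String, XmlDoc>> allCalls =
            new ConcurrentHashMap<>();

    // Create the (empty) inner map for a call, as done at startup.
    void registerCall(String idCall) {
        allCalls.putIfAbsent(idCall, new ConcurrentHashMap<>());
    }

    // Add or replace one document entry under the given call.
    void putDoc(String idCall, String docId, XmlDoc doc) {
        allCalls.computeIfAbsent(idCall, k -> new ConcurrentHashMap<>()).put(docId, doc);
    }

    // Bulk operation: drop every entry of ONE call saved before the cutoff.
    // Only that call's inner map is iterated. Returns the paths that were dropped.
    List<String> purgeOlderThan(String idCall, long cutoffMillis) {
        List<String> removedPaths = new ArrayList<>();
        ConcurrentMap<String, XmlDoc> docs = allCalls.get(idCall);
        if (docs == null) {
            return removedPaths;
        }
        for (Map.Entry<String, XmlDoc> e : docs.entrySet()) {
            XmlDoc doc = e.getValue();
            // remove(key, value) only succeeds if the entry was not replaced meanwhile.
            if (doc.savedAtMillis < cutoffMillis && docs.remove(e.getKey(), doc)) {
                removedPaths.add(doc.path);
            }
        }
        return removedPaths;
    }

    // Number of entries currently held for one call (0 if the call is unknown).
    int docCount(String idCall) {
        ConcurrentMap<String, XmlDoc> docs = allCalls.get(idCall);
        return docs == null ? 0 : docs.size();
    }
}
```

With a distributed map per Call, the inner ConcurrentMap would presumably
be the place where the IMap goes; that substitution is not shown or
tested here.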

So here are the questions:

1. Is it feasible to create an IMap for each Call within a Hazelcast
Cluster? We could have anywhere from 5-200 Calls.

2. Let's say we have a Cluster with 2 members, A and B. Each has a
couple hundred-thousand entries. Along comes member C. When C joins he
has to get data from A and B, and both A and B need to re-hash to
equally distribute data. But at the same time both A and B are
receiving new data. When C finally has its data and the Cluster is
ready, how do A and B send the new data to C? Is the cluster blocked
at any time?

3. Related to #2, is it safe on Cluster start-up to start one node
after the other in succession, or is it better to wait a while between
nodes?

4. Any way to know who is the oldest member?

5. We need to have some data (sums mainly) updated with each request.
Our idea is to save this data in memory on the oldest Cluster member
(A) and perhaps have a backup on the second-oldest member (B). It is
not persisted. Our idea was to use Distributed Execution to have all
members send messages to A and B to update the data. But what happens
when A or B dies? It could be a lot of work. Would some simple
Hazelcast Distributed Data Structure be a better idea?

6. Any chance of implementing a put that has time-to-live, that
automatically removes entries when time-to-live has expired?

7. Care to compare / contrast Hazelcast with Infinispan?

Many thanks for any advice!

Bob

Talip Ozturk

Nov 11, 2009, 11:18:34 PM
to haze...@googlegroups.com
> 1. Is it feasible to create an IMap for each Call within a Hazelcast
> Cluster? We could have anywhere from 5-200 Calls.

You can create as many IMap as you want. Creating 1000 maps is no big
deal for Hazelcast.

> 2. Let's say we have a Cluster with 2 members, A and B. Each has a
> couple hundred-thousand entries. Along comes member C. When C joins he
> has to get data from A and B, and both A and B need to re-hash to
> equally distribute data. But at the same time both A and B are
> receiving new data. When C finally has its data and the Cluster is
> ready, how do A and B send the new data to C? Is the cluster blocked
> at any time?

Hazelcast will make sure that every member will read the right value
for any given key at any given time, even when a new member joins or a
member leaves. This is one of the main jobs of Hazelcast anyway. As
of 1.7.1 there is almost no blocking even when a member joins. (There
used to be, but not anymore.)

> 3. Related to #2, is it safe on Cluster start-up to start one node
> after the other in succession, or is it better to wait a while between
> nodes?

Start any number of nodes at any time. It doesn't matter much for Hazelcast.

> 4. Any way to know who is the oldest member?

The first member in Hazelcast.getCluster().getMembers(). But you
shouldn't rely on that. I highly recommend avoiding member-specific
things.

> 5. We need to have some data (sums mainly) updated with each request.
> Our idea is to save this data in memory on the oldest Cluster member
> (A) and perhaps have a backup on the second-oldest member (B).  It is
> not persisted. Our idea was to use Distributed Execution to have all
> members send messages to A and B to update the data. But what happens
> when A or B dies? It could be a lot of work. Would some simple
> Hazelcast Distributed Data Structure be a better idea?

You can certainly store your data in a distributed map and not worry
about what happens if a member dies or how the data is redistributed.
If you have a backup configured (1 backup by default) then your
data is safe as long as you have at least one member left in the
cluster and no-more than 1 member dies at a time.
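
A minimal sketch of keeping a handful of running totals in a map, written
against java.util.concurrent.ConcurrentMap only. The class SharedTotals
and its methods are hypothetical. The idea is a compare-and-set retry
loop built from putIfAbsent and replace, so concurrent updaters never
lose an increment. Hazelcast's IMap implements ConcurrentMap, so the same
loop could in principle be pointed at a distributed map; that it behaves
atomically across the cluster is an assumption here, not something
verified in this thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: a few named totals stored in any ConcurrentMap.
final class SharedTotals {
    private final ConcurrentMap<String, Long> totals;

    SharedTotals(ConcurrentMap<String, Long> totals) {
        this.totals = totals;
    }

    // Atomically add delta to the named total and return the new value.
    long add(String name, long delta) {
        while (true) {
            Long current = totals.get(name);
            if (current == null) {
                // First writer wins; if someone else got in first, retry.
                if (totals.putIfAbsent(name, delta) == null) {
                    return delta;
                }
            } else {
                long updated = current + delta;
                // Succeeds only if the stored value is still 'current'.
                if (totals.replace(name, current, updated)) {
                    return updated;
                }
            }
        }
    }

    // Current value of the named total, or 0 if it was never updated.
    long get(String name) {
        Long value = totals.get(name);
        return value == null ? 0L : value;
    }
}
```

Locally this would be used as new SharedTotals(new ConcurrentHashMap<String, Long>());
with only 3-4 totals the map stays tiny, and no member has to be singled
out as the owner of the data.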

> 6. Any chance of implementing a put that has time-to-live, that
> automatically removes entries when time-to-live has expired?

Yes. Please check out
http://code.google.com/docreader/#p=hazelcast&s=hazelcast&t=MapEviction
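
To make the time-to-live semantics concrete, here is a small plain-Java
sketch. It is NOT how Hazelcast implements eviction; TtlMap and its
methods are hypothetical, and the clock is passed in explicitly so the
behaviour is deterministic. Each entry records when it expires, and an
expired entry is dropped the next time it is read:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration of time-to-live: entries vanish once their TTL has elapsed.
final class TtlMap<K, V> {
    // Value plus the instant (in millis) at which it stops being valid.
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;

        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<K, Timed<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlMap(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Store a value that expires ttlMillis after nowMillis.
    void put(K key, V value, long nowMillis) {
        entries.put(key, new Timed<>(value, nowMillis + ttlMillis));
    }

    // Return the value, or null if absent or expired (expired entries are removed lazily).
    V get(K key, long nowMillis) {
        Timed<V> timed = entries.get(key);
        if (timed == null) {
            return null;
        }
        if (nowMillis >= timed.expiresAtMillis) {
            entries.remove(key, timed); // no-op if the entry was refreshed meanwhile
            return null;
        }
        return timed.value;
    }

    // Entries still stored, including expired ones that have not been read yet.
    int size() {
        return entries.size();
    }
}
```

A real cache would also sweep expired entries in the background instead
of waiting for a read; this sketch leaves that out.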

> 7. Care to compare / contrast Hazelcast with Infinispan?

I am probably the wrong person to do that comparison, at least in a
public discussion, because I am biased. I highly recommend that you try
it out and see for yourself: create a 10-node cluster and do a stress test.

-talip

syg6

Nov 16, 2009, 7:43:04 AM
to Hazelcast
Hello again. Sorry for the delay in responding, flu ... :(

On Nov 12, 5:18 am, Talip Ozturk <ta...@hazelcast.com> wrote:
> > 1. Is it feasible to create an IMap for each Call within a Hazelcast
> > Cluster? We could have anywhere from 5-200 Calls.
>
> You can create as many IMap as you want. Creating 1000 maps is no big
> deal for Hazelcast.

Really? Awesome! I know that with Infinispan I tried creating a Cache
for about 100 Calls and there were literally thousands and thousands
of multicast messages being sent back and forth, and it seemed to
drag. The authors even told me that would probably be a bad idea.

I was testing with 90 IMaps on 2 machines and was seeing some
weirdness. I don't quite have the problem located yet but it does seem
that at a certain point, between 15-20 IMaps, my listener stops
firing, which I assume means that the remote put is not being
performed.

I am only doing simple puts in one of the 20 IMaps I create. When I
only create 1-15 IMaps, all is well. More than that, the listener
stops firing. Weird. Is there any way to turn on logging to see what's
going on?
>
> > 2. Let's say we have a Cluster with 2 members, A and B. Each has a
> > couple hundred-thousand entries. Along comes member C. When C joins he
> > has to get data from A and B, and both A and B need to re-hash to
> > equally distribute data. But at the same time both A and B are
> > receiving new data. When C finally has its data and the Cluster is
> > ready, how do A and B send the new data to C? Is the cluster blocked
> > at any time?
>
> Hazelcast will make sure that every member will read the right value
> for any given key at any given time, even when a new member joins or a
> member leaves. This is one of the main jobs of Hazelcast anyway. As
> of 1.7.1 there is almost no blocking even when a member joins. (There
> used to be, but not anymore.)
>

Amazing.

> > 3. Related to #2, is it safe on Cluster start-up to start one node
> > after the other in succession, or is it better to wait a while between
> > nodes?
>
> Start any number of nodes at any time. It doesn't matter much for Hazelcast.
>

Again, amazing.

> > 4. Any way to know who is the oldest member?
>
> The first member in Hazelcast.getCluster().getMembers(). But you
> shouldn't rely on that. I highly recommend avoiding member-specific
> things.
>

Ok.

> > 5. We need to have some data (sums mainly) updated with each request.
> > Our idea is to save this data in memory on the oldest Cluster member
> > (A) and perhaps have a backup on the second-oldest member (B).  It is
> > not persisted. Our idea was to use Distributed Execution to have all
> > members send messages to A and B to update the data. But what happens
> > when A or B dies? It could be a lot of work. Would some simple
> > Hazelcast Distributed Data Structure be a better idea?
>
> You can certainly store your data in a distributed map and not worry
> about what happens if a member dies or how the data is redistributed.
> If you have a backup configured (1 backup by default) then your
> data is safe as long as you have at least one member left in the
> cluster and no-more than 1 member dies at a time.
>
Yea ... we're just not sure if it's worth it to use a map. Keep in
mind these are just totals, simple numbers, and there are only 3-4 of
them. It seems like overkill to use a Map or other Distributed Object,
but I guess if we want them universally available and updateable,
we'll have to use one ...

> > 6. Any chance of implementing a put that has time-to-live, that
> > automatically removes entries when time-to-live has expired?
>
> Yes. Please check out
> http://code.google.com/docreader/#p=hazelcast&s=hazelcast&t=MapEviction
>
Great! I knew about eviction but thought it was only used when the
cache was too small to hold the data ...

> > 7. Care to compare / contrast Hazelcast with Infinispan?
>
> I am probably the wrong person to do that comparison, at least in a
> public discussion, because I am biased. I highly recommend that you try
> it out and see for yourself: create a 10-node cluster and do a stress test.
>
Fair enough. I am sure there will soon be comparisons out there, once
the world starts using both.

> -talip

Cheers,
Bob

syg6

Nov 16, 2009, 10:41:56 AM
to Hazelcast
I've done some more testing and I'm not sure what exactly the problem
is but it seems it probably is NOT Hazelcast, since a simple demo I
set up, with 100 IMaps on 2 machines, seems to work fine. If I could
get logging to work I might see what the issue is ...

Another little question - how can you debug multicast discovery? I've
had to use TCP/IP (tcp-ip enabled="true") and enter my ip addresses
manually (<interface>...</interface>) in order for 2 machines to find
each other. These same 2 machines run JGroups no problem. I've even
tried using the multicast group and port that JGroups uses and no go.

And lastly - what's the easiest way to configure the time-to-live
programmatically, instead of via hazelcast.xml? If I have 100 IMaps,
each with a different time-to-live, defined in the database, I can't
use an XML file to config.

Many thanks, Hazelcast is wicked!

Bob

syg6

Nov 17, 2009, 9:39:40 AM
to Hazelcast


On Nov 16, 4:41 pm, syg6 <syg...@gmail.com> wrote:
> I've done some more testing and I'm not sure what exactly the problem
> is but it seems it probably is NOT Hazelcast, since a simple demo I
> set up, with 100 IMaps on 2 machines, seems to work fine. If I could
> get logging to work I might see what the issue is ...
>
> Another little question - how can you debug multicast discovery? I've
> had to use TCP/IP (tcp-ip enabled="true") and enter my ip addresses
> manually (<interface>...</interface>) in order for 2 machines to find
> each other. These same 2 machines run JGroups no problem. I've even
> tried using the multicast group and port that JGroups uses and no go.
>
> And lastly - what's the easiest way to configure the time-to-live
> programmatically, instead of via hazelcast.xml? If I have 100 IMaps,
> each with a different time-to-live, defined in the database, I can't
> use an XML file to config.
>

I figured this out looking at the source:

Config cfg = new Config();
Map<String,MapConfig> mapConfigs = new HashMap<String,MapConfig>();

MapConfig mCfg = new MapConfig();
mCfg.setTimeToLiveSeconds(300); // 300 seconds, 5 minutes ...
mapConfigs.put("map1",mCfg);

cfg.setMapMapConfigs(mapConfigs);

HazelcastInstance h1 = Hazelcast.newHazelcastInstance(cfg);
map1 = h1.getMap("map1");
map1.addEntryListener(this,true);

This works fine. One thing though - why isn't the removeItem or
removeEntry Listener invoked when the item/entry is removed? Is this
possible?

Talip Ozturk

Nov 17, 2009, 10:15:54 AM
to haze...@googlegroups.com
> HazelcastInstance h1 = Hazelcast.newHazelcastInstance(cfg);
> map1 = h1.getMap("map1");
> map1.addEntryListener(this,true);
>
> This works fine. One thing though - why isn't the removeItem or
> removeEntry Listener invoked when the item/entry is removed? Is this
> possible?

If you have an EntryListener and an entry is removed, then your
EntryListener's entryRemoved(Event) method should be called. If not,
it is a bug. Make sure you add your entry listener before you call
map.remove(key). Can you please come up with a JUnit test that
reproduces it?

Regards,
-talip

syg6

Nov 18, 2009, 3:58:12 AM
to Hazelcast
Ok, I made a simple JUnit test. I wasn't sure whether the Remove
Listener or the Evict Listener is called when an entry is removed by
time-to-live, so you may have to modify the code. Here's the link:

http://code.google.com/p/hazelcast/issues/detail?id=171

Cheers,

Bob