1) I understand that I can achieve replication of state using Hazelcast just by using a simple distributed map or something similar. But I'm concerned about the performance and scalability of this solution. As I explained, all of the state of a given application session is currently maintained by a single node. And each node may maintain e.g. 100,000 application sessions. If there are no failures, 100% of all reads/writes from/to a given application session will happen on this single node. At the same time, if I understand correctly, Hazelcast will distribute the map (i.e. its keys/values) across the whole cluster, i.e. across all nodes in our deployment. This is probably unnecessary CPU and memory overhead, as we know in advance that only one node (the one handling the session) needs to access the data most of the time. It is also not clear whether it would slow down access to the data on this node, since it would now need to talk to other nodes to fetch data that it produced itself. So, is there a way to reduce this overhead?
2) In case of a failover, another node will take over the role of the failing one. But this failover node will need to fetch all the data about the session state from other nodes, as it owns only a (small) part of the data due to Hazelcast's distributed nature. This also seems not very efficient. Effectively, we'd like to have a replica of the whole application session state for one session on a single other node, instead of spreading it all over the cluster. Maybe we would even like to have one node act as the backup of another node, i.e. containing complete replicas of all application sessions controlled by the other node. The question here is: can this be achieved by means of Hazelcast, and if so, how?
3) If we use one distributed Map per application-specific session and we have 1,000,000 such sessions in our deployment, is it a performance problem for Hazelcast to have 1,000,000 distributed maps? Or can it scale nicely even in this situation? I ask because I know that some other frameworks (e.g. Infinispan) have issues with creating a lot of distributed maps dynamically.
Responses interspersed below.

On Fri, Nov 25, 2011 at 11:00 AM, mongonix <romi...@gmail.com> wrote:
> 1) I understand that I can achieve replication of state using Hazelcast just by using a simple distributed map or something similar. But I'm concerned about performance and scalability of this solution. [...] So, is there a way to reduce this overhead?

If you have a natural partition of your application into largely independent chunks of state -- in your case, application-specific sessions -- you can cut down on the communication between nodes by using DistributedTasks with keys drawn from the identifiers of your work units. It adds an additional couple of "hops" for each request, from the node that handles the original request to the node that executes the DistributedTask (and back again), but it means that session-specific state and its processing can be confined to a single node instead of having to pull in data from arbitrarily many other nodes.

So the generic request handler under such an approach has this kind of structure, leaving out many details:

    Response handle(Request req) throws InterruptedException, ExecutionException {
        Callable<Response> task = getSessionSpecificTask(req);
        Object key = getSessionKey(req);
        FutureTask<Response> futureTask = new DistributedTask<Response>(task, key);
        Hazelcast.getExecutorService().execute(futureTask);
        return futureTask.get();
    }

The real work is done on the target node derived from key.
> 2) In case of a failover, another node will take the role of the failing one. [...] The question here is: if and how this can be achieved by means of Hazelcast?

Serialize session-specific state (or break it into components and serialize those), and after each state change write the new state to a corresponding IMap using the same key that you used to dispatch the DistributedTask. Use transactions or not, according to your needs. This call to IMap.put will cause a local write for the primary copy and remote writes to whatever nodes are providing the backup copies. Depending on your requirements, you might be able to decrease latency by using IMap.putAsync and not waiting (or not waiting long).
On failure of the primary node for a given session key, the new primary node will have local copies (taken from one of the backup copies) of all the data it needs, and DistributedTasks for the given session will be dispatched to it.
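As a minimal self-contained sketch of this serialize-after-each-change pattern (an ordinary HashMap stands in here for the backing IMap so the snippet runs without a cluster; in the real setup the put would go to the distributed map, keyed by the same session key used for DistributedTask routing):

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class SessionBackupSketch {

    // Serialize the whole session state to the byte[] that would be
    // written to the backing map after each change.
    static byte[] serialize(Serializable state) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(state);
        }
        return bos.toByteArray();
    }

    // On failover, the new primary rebuilds the session from the backup copy.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (HashMap<String, String>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the distributed IMap<Object, byte[]>.
        Map<Long, byte[]> backupMap = new HashMap<>();

        HashMap<String, String> sessionState = new HashMap<>();
        sessionState.put("user", "leo");
        sessionState.put("step", "3");

        long sessionKey = 42L; // same key used to dispatch the DistributedTask
        backupMap.put(sessionKey, serialize(sessionState)); // IMap.put / putAsync in real code

        // Failover: restore the whole session from the backup copy.
        HashMap<String, String> restored = deserialize(backupMap.get(sessionKey));
        System.out.println(restored.get("user") + " " + restored.get("step"));
    }
}
```

The cost this makes visible is exactly Leo's later objection: every change re-serializes the entire state, which gets heavy once a session holds ~1000 entries.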
> 3) If we use one distributed Map per application-specific session and we have 1000000 such sessions in our deployment, is it a performance problem for Hazelcast to have 1000000 distributed maps? [...]

I don't have any pointers, but the Hazelcast team has in the past seemed to recommend against creating very large numbers of distributed data structures.
> Serialize session-specific state (or break it into components and serialize those), and after each state change write the new state to a corresponding IMap using the same key that you used to dispatch the DistributedTask. [...]

One problem here is the fact that our session-specific state is essentially a map that is changed quite often. And this map can have e.g. 1000 entries. Therefore, serializing it every time a change occurs to one of the entries and letting Hazelcast distribute this copy of the map can be a bit heavy. This is the reason why I was initially thinking about representing the state of each session as an IMap. But this has the other problems that were addressed above ;-)

Any ideas how to solve this problem efficiently?
--You received this message because you are subscribed to the Google Groups "Hazelcast" group.To view this discussion on the web visit https://groups.google.com/d/msg/hazelcast/-/v7_qzVzMd2IJ.
To post to this group, send email to haze...@googlegroups.com.
To unsubscribe from this group, send email to hazelcast+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hazelcast?hl=en.
Hi Leo,
First of all, let me clarify the cost of 1,000,000 maps. When a new map is created, the creation takes place on all nodes: each node creates a CMap object locally for each IMap. And the CMap object is a heavy one, taking around 100 KB. So having 1M maps will blow up your memory, and since each map is created on all nodes, this doesn't scale. You also have to destroy each map explicitly to clean up the resources it takes.
Putting this aside, whatever you are going to do with Hazelcast will be a bit of a hack, because Hazelcast will always distribute the data everywhere and you'll be hacking around this. Dealing with DistributedTasks is nice, but you still lose the locality of sessions that your current architecture has.
Maybe you should still keep the sessions locally for fast reads and put updates into Hazelcast to restore after failover. Assuming that you have a Session object which is essentially a Map with 1000 entries, and not all of them will be updated, the following approach may help.
Have one giant Map with the following Key:
    class MySessionKey implements PartitionAware, Serializable {
        long sessionId;       // the id of your session
        String sessionMapKey; // the key into the session map (the 1000-entry map)

        public Object getPartitionKey() {
            return sessionId; // co-locates all entries of one session on one node
        }
    }
And you'll store the sessions locally in your own data structure and put the updates into Hazelcast with the above key and value. Now you serialize only the changed attributes of the Session. And PartitionAware will make sure that all entries tied to a particular Session are stored on the same node.
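A self-contained sketch of this "serialize only what changed" idea: the session keeps its full map locally and records which attributes were touched, so only those (key, value) pairs need to be pushed to Hazelcast under a MySessionKey-style composite key. Class and method names here are illustrative, not Hazelcast API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Local session that tracks dirty attributes so only the changed
// entries need to be serialized and put into the backup map.
class LocalSession {
    final long sessionId;
    private final Map<String, String> attributes = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    LocalSession(long sessionId) { this.sessionId = sessionId; }

    void put(String key, String value) {
        attributes.put(key, value);
        dirty.add(key); // remember what changed since the last flush
    }

    String get(String key) { return attributes.get(key); }

    // Returns only the changed entries; in the real setup each entry
    // would be written to the giant IMap under (sessionId, attributeKey).
    Map<String, String> flushDirty() {
        Map<String, String> updates = new HashMap<>();
        for (String key : dirty) {
            updates.put(key, attributes.get(key));
        }
        dirty.clear();
        return updates;
    }
}
```

After a request mutates a handful of the ~1000 entries, flushDirty() yields just those entries, so each put to Hazelcast carries one small attribute value instead of the whole serialized session.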
And if the server crashes, you'll fail over to another node, and the session data will not be there locally. You can run a MultiTask that iterates over the local keys on every node. Using MySessionKey, it will be able to regenerate the Session objects and return them to the caller. So basically you will restore the Session that was backed up in Hazelcast.
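The restore step amounts to regrouping the flat (sessionId, attributeKey) -> value entries found on the surviving node back into per-session maps. A sketch of that regrouping, independent of how the entries are iterated (a MultiTask over local keys in the real setup); the key type mirrors MySessionKey from above:

```java
import java.util.HashMap;
import java.util.Map;

public class SessionRestoreSketch {

    // Mirrors MySessionKey from the discussion: (sessionId, attribute key).
    record EntryKey(long sessionId, String attributeKey) {}

    // Regroup the flat backup entries into one attribute map per session.
    static Map<Long, Map<String, String>> rebuild(Map<EntryKey, String> flat) {
        Map<Long, Map<String, String>> sessions = new HashMap<>();
        for (Map.Entry<EntryKey, String> e : flat.entrySet()) {
            sessions.computeIfAbsent(e.getKey().sessionId(), id -> new HashMap<>())
                    .put(e.getKey().attributeKey(), e.getValue());
        }
        return sessions;
    }
}
```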
Does it sound good?
Yes, keeping your own maps and using topics to replicate is definitely the way you should go. We also like this approach and have suggested it to others in the past. Somehow this time it came out the other way :-(
Even if you store things in Hazelcast, you can lose data anyway if you have two server crashes. In your approach you can have as many replicas as you feel safe with.
It really takes a lot of time on our end to digest the information and come up with something that doesn't suck :)
Honestly, when I said yes to Topics, I was remembering others replicating everywhere. That was a case with many reads, few updates, and a relatively small cluster. If that is not your case, then it will not be that dumb simple. There is probably no public code for it, but as you guess, replicating everywhere is quite straightforward.
I understand that you are trying to replicate only to certain nodes with a Topic. How do you think you'll be able to fail over to the next right node (the one that has the backup data) with your load balancer?
Well, even though internally Hazelcast does all of the dynamic backup handling, I don't think you'll be able to reuse that part. It is buried very deep in the internals, and right now, with 2.0, we are changing the way we back up entries. So relying on it wouldn't be a wise move. If you use Hazelcast, you should only use the goodies that are available in the public API.
It looks like there is no clean solution. You could use one Topic per instance; the topic name could be the instance's Address. And somehow you would define who is the backup of whom, and according to that plan every member would listen to the topic of the node it backs up, and so on.
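One simple way to "define who is the backup of whom" is a ring over the ordered member list: each member publishes its updates to its own topic and the next member in the list listens to it. A sketch of just the assignment logic, using plain strings for member addresses (the wiring to actual ITopic subscriptions is left out):

```java
import java.util.List;

public class BackupRing {
    // Returns the address of the member that backs up `member`:
    // the next member in the ordered cluster list, wrapping around.
    static String backupOf(List<String> members, String member) {
        int i = members.indexOf(member);
        if (i < 0) throw new IllegalArgumentException("unknown member: " + member);
        return members.get((i + 1) % members.size());
    }
}
```

On a membership change, the ring would be recomputed from Hazelcast.getCluster().getMembers(), and each member would re-subscribe to the topic(s) it is now responsible for. The load balancer question below is about making this same assignment known outside the cluster.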
While we communicate among the members (sending updates, etc.), everything is really under the hood. It doesn't use any listener-type mechanism. It just creates a binary packet, puts all the required information in, and sends it to the node through the TCP/IP layer. So again, reusing it wouldn't be possible.
It looks like the tools you have are: Hazelcast.getCluster().getMembers(), which gives you the member list; a MembershipListener to detect when a member crashes; and topics.
Honestly, I really don't know what my final suggestion would be. Based on your domain, you'll have to choose one of these approaches. And for sure, each of them has its downsides and upsides.
Hi Fuad,
On Friday, January 13, 2012 3:50:10 PM UTC+1, Fuad Malikov wrote:

> Well, even though internally Hazelcast does all of the dynamic backup handling, I don't think you'll be able to reuse that part. It is buried very deep in the internals, and right now, with 2.0, we are changing the way we back up entries. So relying on it wouldn't be a wise move. If you use Hazelcast, you should only use the goodies that are available in the public API.
I see. It is really a pity that it is not reusable in any form currently. But do you think it could be an idea to support some sort of reusability in the future, i.e. to support custom distributed data structures?
After all, according to my understanding, the "only" thing required to support my proposal is the ability to extend/replace the reactions to update/backup notifications, plus the ability to produce "custom" update notifications (binary packets with serialized updates, whatever) when e.g. a value implements a certain special interface, e.g. "DeltaAware".
BTW, IIRC, this is roughly how Infinispan implements/supports custom DeltaAware data structures. You just implement such an interface, which essentially handles receiving and producing update notifications and reacts to them. This is enough for Infinispan to plug a custom DeltaAware data structure into their standard replication framework. But maybe I'm wrong...
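To make the idea concrete, a hypothetical DeltaAware-style contract could look like the following. The names are illustrative only, not part of any Hazelcast or Infinispan API: the value emits a serializable delta of changes since the last replication, and the replica merges it.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: a value that can emit and consume incremental updates.
interface DeltaAware<D extends Serializable> {
    D delta();            // changes since the last call, to be shipped to replicas
    void applyDelta(D d); // merge a delta received from the primary
}

// A map value implementing the contract: only touched entries travel.
class DeltaMap implements DeltaAware<HashMap<String, String>> {
    private final Map<String, String> data = new HashMap<>();
    private HashMap<String, String> pending = new HashMap<>();

    void put(String k, String v) {
        data.put(k, v);
        pending.put(k, v); // record the change for the next delta
    }

    String get(String k) { return data.get(k); }

    public HashMap<String, String> delta() {
        HashMap<String, String> d = pending; // hand over accumulated changes
        pending = new HashMap<>();
        return d;
    }

    public void applyDelta(HashMap<String, String> d) {
        data.putAll(d);
    }
}
```

The replication framework would then serialize only what delta() returns, instead of the whole 1000-entry value, which is exactly the overhead discussed earlier in the thread.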
> It looks like there is no clean solution. You could use one Topic per instance; the topic name could be the instance's Address. And somehow you would define who is the backup of whom, and according to that plan every member would listen to the topic of the node it backs up, and so on.
I see your point. This way I only need as many topics as the number of nodes in the cluster and not as many as session-maps.
> While we communicate among the members (sending updates, etc.), everything is really under the hood. It doesn't use any listener-type mechanism. It just creates a binary packet, puts all the required information in, and sends it to the node through the TCP/IP layer. So again, reusing it wouldn't be possible.
Just out of curiosity: I understand that "listener" is maybe not the best name for it. But what do you do with the binary packets that you receive? You have some sort of HZ logic that decides what to do when a packet is received (this is what I called a "listener" or handler), but it is probably not exposed via a public API.
And BTW, is there any architectural write-up (besides the source code) that describes at a high or low level how replication, failover, etc. are implemented, and how update notifications are sent? It would be very interesting to better understand the implementation of these mechanisms in Hazelcast.
> It looks like the tools you have are: Hazelcast.getCluster().getMembers(), which gives you the member list; a MembershipListener to detect when a member crashes; and topics.
Yes. Looks like it. I'll implement Hazelcast on top of Hazelcast ;-)
> Honestly, I really don't know what my final suggestion would be. Based on your domain, you'll have to choose one of these approaches. And for sure, each of them has its downsides and upsides.
I'll think more about it and experiment. If I find any interesting solution, I'll let you know...
Thanks a lot for your answers! I really appreciate it.
Leo