Hazelcast Data Partition replica


Rakesh Sharma

Aug 25, 2021, 2:57:58 PM
to Hazelcast
By default, Hazelcast creates a single copy/replica of each partition. Is it possible to change this setting to have multiple replica partitions in a Hazelcast cluster?
The reason I am looking for the answer is the following scenario:

Suppose we have a 4-member cluster with 3 sync backup copies and the default 271 partitions:

node 1: P1 (primary)          node 2: backup of P1          node 3: P3 (primary)          node 4: P4 (primary)
 ______________________        ______________________        ______________________        ______________________
| [data1] - pri replica |     | [data1] - backup      |     | [data1] - backup      |     | [data1] - backup      |
| [data2] - pri replica |     | [data2] - backup      |     | [data2] - backup      |     | [data2] - backup      |
 ----------------------        ----------------------        ----------------------        ----------------------
In short:
m1 holds data1 and data2 in primary partition P1
m2 holds replica copies of data1 and data2 in its backup copy of P1
m3 holds replica copies of data1 and data2 under its primary partition P3
m4 holds replica copies of data1 and data2 under its primary partition P4

The data1 and data2 keys will always go to partition P1, as per the Hazelcast docs:
The result of this modulo - MOD(hash result, partition count) - is the partition in which the data will be stored, that is the partition ID. For ALL members you have in your cluster, the partition ID for a given key is always the same.

Now, if m1 and m2 are unexpectedly terminated, how will Hazelcast determine that data1 and data2 were part of partition P1, since the sync backup copies of these are stored on m3 under the primary partition of P3 and on m4 under the primary partition of P4?

I would really appreciate it if someone could explain what I am missing in my understanding of partition replicas versus the actual replicated data copies.

Thanks,
Rakesh Sharma


Tom OConnell

Aug 25, 2021, 4:12:59 PM
to Hazelcast

Hi –

Yes, Hazelcast can have multiple backup copies.

You’re correct on the partitioning – the ‘hash result’ you mention is the java hashCode() value from the serialized byte-array representation of the key. As this will not change, the partition assignment will not change – although the partition may migrate, based on members joining or leaving the cluster.
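That modulo step can be sketched as follows. This is an illustrative stand-in, not Hazelcast's implementation: the real hash is taken from the serialized byte form of the key, while this sketch just uses `hashCode()` to show why the partition ID for a given key comes out the same on every member.

```java
// Illustrative sketch only: the real implementation hashes the serialized
// byte form of the key, not the object's hashCode(). The point shown is
// the modulo step, which yields the same partition ID on every member.
public class PartitionIdSketch {
    static final int PARTITION_COUNT = 271; // Hazelcast's default

    // MOD(hash result, partition count) -> partition ID.
    // Math.floorMod keeps the result non-negative for negative hashes.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        // Every member computing this for "data1" gets the same ID.
        System.out.println(partitionId("data1"));
        System.out.println(partitionId("data2"));
    }
}
```

Because the ID depends only on the key's hash and the fixed partition count, the *assignment* never changes; only the *location* of that partition can move between members.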

There’s one error in your example – for each partition set, there’s a single primary and 0 to many backup copies (backup partitions). In your example below, either M1 or M3 would have the primary – not both.

You don’t need to – and probably shouldn’t – have backups on each member, although you may, should you so choose.

In your example – a backup count of 3 and 271 partitions across 4 members – each member will have 67 (or 68) primary partitions and 203 (or 204) backup partitions, since each primary is backed up on all 3 of the other members. With the loss of a member, you'd have 271 across the surviving 3 members, so 90 (or 91) primary partitions per member and double that in backups. The 3-member cluster, though, will only support 2 backup copies – unless/until the lost member rejoins, hence 'double', not triple.
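Back-of-envelope arithmetic for this 4-member, backup-count-3 example (a sketch of per-member averages, not an exact layout; note that with backup-count 3 each of the 3 other members holds one backup of every primary):

```java
// Back-of-envelope replica arithmetic: 271 partitions, 4 members,
// backup-count 3. With backup-count 3 in a 4-member cluster, every
// member holds a replica (primary or backup) of every partition.
public class ReplicaMath {
    // Even spread of primaries: some members get the lower count...
    static int primariesLow(int partitions, int members) {
        return partitions / members;
    }
    // ...and the rest get one more when the division isn't exact.
    static int primariesHigh(int partitions, int members) {
        return partitions / members + (partitions % members == 0 ? 0 : 1);
    }
    // Cluster-wide count of backup replicas.
    static int totalBackups(int partitions, int backupCount) {
        return partitions * backupCount;
    }

    public static void main(String[] args) {
        System.out.println(primariesLow(271, 4));        // 67
        System.out.println(primariesHigh(271, 4));       // 68
        // A member's backups = partitions whose primary lives elsewhere.
        System.out.println(271 - primariesHigh(271, 4)); // 203
        System.out.println(totalBackups(271, 3));        // 813
    }
}
```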

Each partition has one (by default) or the configured number of backups. Except in the case of a two-member cluster, no member is ever the mirror of another; the backup partitions are spread across the cluster. This helps minimize the impact of a member loss.

So, with the loss of a second member (of the original 4), there will be a single backup for each primary partition. This is the sole case where one member is the logical mirror of the other.

Note that as each key-value pair is written to a map, it will be copied – either sync or async, based on your config – to backup partition(s) on other members. For sync backups, all the backups exist before control is returned to the caller.

Note that you can configure either or both synchronous and asynchronous backups. Configuring both is rarely, if ever, required.
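For reference, both counts are set per map in the declarative configuration; a sketch (the map name `default` here is just an example):

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
    <map name="default">
        <!-- synchronous backups: the write returns only after these exist -->
        <backup-count>1</backup-count>
        <!-- asynchronous backups: copied in the background -->
        <async-backup-count>0</async-backup-count>
    </map>
</hazelcast>
```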

As members join and exit, though, the member-to-partition assignments will change. When a member terminates, each of its partitions will have a backup on another member promoted to primary, and a new backup will be created – if this is possible, given your cluster size and backup count. In no case will there ever be two replicas of a single partition on one member. So, with the simultaneous loss of two members from a cluster of four, the backup count of 3 will not be maintained until the members rejoin. Note that when the members rejoin, they are dynamically assigned partitions from the existing members; we would not expect the newly rejoined member to hold the same partitions as before the exit.

As each backup is a full copy of your data set, 3 backup copies (4 copies total) is kind of memory-expensive. This may be warranted, but consider the uptime of your JVMs – ideally from your own production history. Many clusters stay up for months between maintenance activities. In light of this, fewer backups may be more reasonable – Hazelcast does not require that each key reside on every member. Configuring persistence may be a good alternative to this backup count.

Please let me know if this helped.

Cheers

Tom

Rakesh Sharma

Aug 25, 2021, 5:47:41 PM
to Hazelcast
Hi Tom,
                I'd like to take a step back a little and just talk about data partitions first.

Let's assume we have a 4-member cluster:
m1, m2, m3, m4

As per docs:

By default, Hazelcast creates a single copy/replica of each partition.
I am just talking about the 271 partitions, and since Hazelcast creates a single replica of each partition, that means 271 replica partitions.

Here are the more elaborate details:

The following is an illustration of the partition replica distributions in a Hazelcast cluster with four members and default settings. Since we have 271 partitions, each member will have 67 or 68 primary partitions and 67 or 68 replica partitions.

m1 - Primary Partitions [P1-P68] and Replica Partitions of [P137-P204]

m2 - Primary Partitions [P69-P136] and Replica Partitions of [P205-P271]

m3 - Primary Partitions [P137-P204] and Replica Partitions of [P1-P68]

m4 - Primary Partitions [P205-P271] and Replica Partitions of [P69-P136]


What I am trying to understand is: is it possible to have 2 replica partitions on each member, and how do I achieve it? I want 2 replica partitions, i.e. 271 x 2 = 542 replica partitions across the cluster.

m1 - Primary Partitions [P1-P68] and Replica Partitions of [P137-P204], [P205-P271]

m2 - Primary Partitions [P69-P136] and Replica Partitions of [P205-P271], [P1-P68]

m3 - Primary Partitions [P137-P204] and Replica Partitions of [P1-P68], [P69-P136]

m4 - Primary Partitions [P205-P271] and Replica Partitions of [P69-P136], [P137-P204]

To be clear, I am just talking about memory chunks (partition replicas) first.


 I will get into Map item replicas next.


Thanks,

Rakesh


Tom OConnell

Aug 26, 2021, 8:53:03 AM
to Hazelcast

Hi Rakesh -

You're correct about 271 partitions across four members. The partition numbers may or may not be what you described, but we all understand that the actual partition numbers really don't matter.

Replicated partitions serve two purposes - they support data integrity upon the loss of a member, and they also serve - the far less used case - reads for server-side code, when so configured.

The read-backup-data property is interesting, not often used, and not germane to your question.

Setting the backup count to '2' will do what you're describing. Think of it this way - for each primary partition on any member, there will be two backup copies - on two separate members. This will cause each member to have their normal count of primary partitions and double that, of backups.

In the same way, a backup count of '3' would make 3 backup copies - this is the most that you could do in a 4 member cluster. There'd be the 271 primary partitions and 3x that - 813 backup partitions.
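As a sketch, the declarative config for the 2-backup case you described would look like this (the map name `default` is illustrative; each map can set its own count):

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
    <map name="default">
        <!-- two synchronous backup copies of each entry,
             giving 271 primary partitions plus 542 backup
             partitions spread across the cluster -->
        <backup-count>2</backup-count>
    </map>
</hazelcast>
```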

It may be useful to note that, when choosing the backup count, you need to anticipate your worst-case assumption. With no backups, any member loss causes loss of data. The backup count is the number of members that can exit simultaneously without data loss. The max value for this is '6' - so you can configure a cluster for the simultaneous loss of 6 members. Beyond this, HA/DR stability is provided by persistence and/or WAN replication, a Hazelcast Enterprise feature.

Cheers

Tom

Vishal Patil

Aug 30, 2021, 1:05:52 PM
to haze...@googlegroups.com
The very problem with your cluster is that you have 4 members, which is an even number. The number of nodes in a cluster should always be odd. If you fix that - that is, make a 3-node or 5-node cluster - this issue should not occur and you will get the answer to your problem.

Regards,
Vishal Patil


Neil Stevenson

Sep 1, 2021, 3:13:20 AM
to Hazelcast
Hi Rakesh

>  I will get into Map item replicas next.

 Can you confirm what problem you're trying to solve?

 Asking about higher partition replication counts is more about how a solution is framed than about the problem itself.

 Multiple copies increase data safety: if you have two copies of some data, you can lose one and not lose access to that data.
 Multiple copies don't improve read speed. For most scenarios only the primary copy is used for reads.
 Multiple copies make write speed worse: there are more copies to update.

Neil

Rakesh Sharma

Sep 2, 2021, 1:28:28 PM
to Hazelcast
Hi Neil,
              It was my mistake to think that partition replicas and IMap sync/async backups are two different settings. I was wondering what would happen if the number of partition backups were less than the total number of IMap sync and async backup copies.

Thanks,
Rakesh
