How does Scylla calculate which node the sub-replica data falls on?

223 views
Skip to first unread message

Xiang Zhou

<feixiang11010@gmail.com>
unread,
Feb 18, 2021, 3:45:56 AM2/18/21
to ScyllaDB users
Hi~

If I have a 6-node cluster, RF=3.
Insert a row of data, it can be calculated by the partition key, which node the **primary copy** of this row of data falls on.
So how is the calculation of which two nodes the remaining **two copies**(sub-replicas) fall on?

Kane Wilson

<k@raft.so>
unread,
Feb 18, 2021, 5:03:45 AM2/18/21
to scylladb-users@googlegroups.com
It depends on the replication strategy.

SimpleStrategy is very basic and will simply place the following two replicas on the next two nodes in the token ring.
For example if you had a token ring of 6 with 1 token per node and 6 nodes (tokens 1 - 6), and a primary replica for some data X that landed on token 5, X would have copies on node 5, 6, and 1. You should always avoid using SimpleStrategy in production (or just always).

NetworkTopologyStrategy allows you to specify the number of replicas per datacenter and also respects racks, such that a primary replica will be chosen via the partitioner, however copies will be placed based on the allocation of replicas to datacenters and within a datacenter on alternating racks.

For example, with a 2 DC cluster, 6 nodes in each DC, 1 token per node, token ring of 1-6, RF=3 in each DC, and 3 racks per DC:

X primary replica = 5

node | token | rack | DC
 1      1       a     north    - X
 2      2       b     north
 3      3       c     north    - X
 4      4       a     north
 5      5       b     north    - X
 6      6       c     north
 7      1       a     south    - X
 8      2       b     south
 9      3       c     south    - X
 10     4       a     south
 11     5       b     south    - X
 12     6       c     south

Worth noting that this is the simple case with #racks == RF and racks are alternating (which you should aim for), and as long as your number of racks is at least the same as your RF (for the DC) you will have one entire copy of the data on that rack.

raft.so - ScyllaDB consulting, support, and managed services.


--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/97046ea5-242d-4cb2-a9fd-4f234c3cbab2n%40googlegroups.com.

Nadav Har'El

<nyh@scylladb.com>
unread,
Feb 18, 2021, 8:15:58 AM2/18/21
to ScyllaDB users, Kamil Braun
On Thu, Feb 18, 2021 at 12:03 PM Kane Wilson <k...@raft.so> wrote:
It depends on the replication strategy.

SimpleStrategy is very basic and will simply place the following two replicas on the next two nodes in the token ring.
For example if you had a token ring of 6 with 1 token per node and 6 nodes (tokens 1 - 6), and a primary replica for some data X that landed on token 5, X would have copies on node 5, 6, and 1. You should always avoid using SimpleStrategy in production (or just always).

NetworkTopologyStrategy allows you to specify the number of replicas per datacenter and also respects racks, such that a primary replica will be chosen via the partitioner, however copies will be placed based on the allocation of replicas to datacenters and within a datacenter on alternating racks.

Note that using SimpleStrategy is not recommended, and if Xiang is creating tables via Alternator (the DynamoDB API) he's getting NetworkTopologyStrategy automatically.

I'm CCing Kamil which I have a vague recollection that he wrote a document on exactly this issue. Or if my recollection is wrong, maybe he remembers if such a document exists somewhere.

To make a fairly long story short, the idea of the NetworkTopologyStrategy is to ensure that replicas aren't just randomly placed in RF (replication factor - e.g., 3) copies inside the cluster but rather:
  1. If you have multiple DCs, each of them will have RF copies of the data - not RF copies globally.
  2. If you have multiple racks in a DC (Amazon calls this multiple "availability zones"  in a "region") the goal is to place copies of the same data in different racks - not all of the copies in the same rack. You'll usually want to have the same number of racks as RF (e.g., RF=3 and 3 racks per DC) and then Scylla guarantees that the three replicas of each piece of data will be in different racks.
When you have just one DC and one rack, NetworkTopologyStrategy does basically the same thing as the Simple strategy, but if there's a possibility you'll ever want to add more DCs or racks, use NetworkTopologyStrategy right from the start - it will be a mess to change this later, and you have nothing to lose. As I said, Alternator does this for you by default .

Kamil Braun

<kbraun@scylladb.com>
unread,
Feb 18, 2021, 9:47:54 AM2/18/21
to Nadav Har'El, ScyllaDB users
On Thu, Feb 18, 2021 at 2:15 PM Nadav Har'El <n...@scylladb.com> wrote:
On Thu, Feb 18, 2021 at 12:03 PM Kane Wilson <k...@raft.so> wrote:
It depends on the replication strategy.

SimpleStrategy is very basic and will simply place the following two replicas on the next two nodes in the token ring.
For example if you had a token ring of 6 with 1 token per node and 6 nodes (tokens 1 - 6), and a primary replica for some data X that landed on token 5, X would have copies on node 5, 6, and 1. You should always avoid using SimpleStrategy in production (or just always).

NetworkTopologyStrategy allows you to specify the number of replicas per datacenter and also respects racks, such that a primary replica will be chosen via the partitioner, however copies will be placed based on the allocation of replicas to datacenters and within a datacenter on alternating racks.

Note that using SimpleStrategy is not recommended, and if Xiang is creating tables via Alternator (the DynamoDB API) he's getting NetworkTopologyStrategy automatically.

I'm CCing Kamil which I have a vague recollection that he wrote a document on exactly this issue. Or if my recollection is wrong, maybe he remembers if such a document exists somewhere.
I've attached that document.
part-repl.pdf

Xiang Zhou

<feixiang11010@gmail.com>
unread,
Feb 19, 2021, 7:38:17 AM2/19/21
to ScyllaDB users
Thank you all very much!
I got exactly what I wanted.

> I’ve attended a presentation about Cassandra where the presenter said that each
> DC has a different ring. That is as far from the truth as it could be, there is a single
> token ring; the token ring doesn’t care about DCs and DCs don’t care about the
> token ring. These concepts are completely orthogonal.

This passage did subverted some of my cognition, but more of it solved some doubts.
Reply all
Reply to author
Forward
0 new messages