best way to transfer data to a new cluster


Michael

<micha-1@fantasymail.de>
Mar 15, 2022, 6:13:34 AM
to scylladb-users@googlegroups.com
Hi,

what is the preferred way to transfer all data (5 TB on each of 8 nodes)
from the old cluster to a new cluster with more nodes?

I found this:
https://github.com/scylladb/scylla/commit/4d32d0317248d7c84ba91a16bc3252b2c8d98428

Is this the workflow to do it:
- copy all sstables from one old node to one new node and run nodetool refresh,
- then copy all sstables from the next old node to another new node and run nodetool refresh,
- ... and so on.

This would mean copying 5 TB eight times and running nodetool refresh eight times?
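Roughly, per pair of old node and new node, I imagine something like this (just a sketch; the paths, keyspace and table names are placeholders, and I'm assuming the sstables have to land in the table's upload directory on the new node before the refresh):

# on the old node: copy the table's sstables to the new node's upload dir
rsync -av /var/lib/scylla/data/mykeyspace/mytable-<id>/ \
    new-node:/var/lib/scylla/data/mykeyspace/mytable-<id>/upload/

# on the new node: load the copied sstables into the table
nodetool refresh mykeyspace mytable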




I could also add the new cluster to the old cluster as a second
datacenter, and let scylla stream all data. After that, remove the old
DC, leaving only the new DC active.
This is described here:
https://docs.scylladb.com/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/

There are some issues here, I think:
Under point 9, in the "Before" paragraph, shouldn't it state:


CREATE KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', '<new_dc>' : 3};
(as this is the config of the new nodes)

instead of:
CREATE KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', '<exiting_dc>' : 3};


It's the new DC, so the replication points to the new nodes.
The "After" box is correct though:
CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class':
'NetworkTopologyStrategy', <exiting_dc>:3, <new_dc>: 3};


Then, I don't understand point 12 here:

"For each node in the existing data-center(s) and in the new data-center
with the newly promoted seed nodesU, update the``scylla.yaml`` file."

What should there be updated? The cluster is allready running.
Or does it mean adding the new seed nodes to the config of
the old/ex DC?
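If it does mean the seed list, I would expect something like this on each node (just a guess; the IPs are placeholders for the seed nodes of the old and the new DC):

# scylla.yaml -- assumed change: extend the seed list with the new DC's seeds
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2,10.1.0.1,10.1.0.2"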


Sometimes in the steps, DCs are named us-east and us-west, later on they
are named us-dc and asia-dc. That's a little inconsistent.

And maybe the old DC is "exiting", but I think "existing" describes it
better :-)

Also, re-enabling replication to the new DC is never mentioned,
even though step 1 disables it:

"In the existing datacenter(s) alter each Keyspace replication to use
class : NetworkTopologyStrategy and set the new DC replication factor to
zero. This will prevent writing to the new DC until explicitly enabled."

ALTER KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', 'us-east' : 3, 'us-west' : 0};
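I assume the missing step is just setting the new DC's replication factor back up before rebuilding/streaming, something like (my guess, reusing the example DC names from the docs):

ALTER KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', 'us-east' : 3, 'us-west' : 3};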




How do I know that the new DC has all the data, so that I can
safely remove the old DC from the cluster?



Another method seems to be:
https://docs.scylladb.com/operating-scylla/procedures/cluster-management/scale-up-cluster/

But that would mean streaming all data again and again.


Are there pros and cons?


Thanks
Michael

Avi Kivity

<avi@scylladb.com>
Mar 22, 2022, 11:47:04 AM
to scylladb-users@googlegroups.com, Michael, Asias He
One way is to add the new nodes and decommission the old nodes.


Another way is load_and_stream [1]. I don't think it is formally
documented yet.


[1] https://www.scylladb.com/2022/02/11/scylladb-open-source-release-4-6/
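Roughly, and from memory (the exact invocation may differ, the blog post has the details): you copy the sstables into the table's upload directory on any node of the new cluster and trigger the refresh with load-and-stream, something like

nodetool refresh mykeyspace mytable --load-and-stream

and the node streams the data to whichever replicas own it, so the old and new token layouts don't have to match.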

Felipe Mendes

<felipemendes@scylladb.com>
Mar 22, 2022, 1:27:39 PM
to scylladb-users@googlegroups.com, Avi Kivity, Michael, Asias He


On 22/03/2022 12:46, Avi Kivity wrote:
One way is to add the new nodes and decommission the old nodes.


Another way is load_and_stream [1]. I don't think it is formally documented yet.
--
Felipe Mendes
Solutions Architect
ScyllaDB

Shlomi Livne

<shlomi@scylladb.com>
Mar 22, 2022, 1:30:47 PM
to ScyllaDB users, Avi Kivity, Michael, Asias He
It's not officially out, since QA / Dev are still checking some items.





Michael

<micha-1@fantasymail.de>
Mar 22, 2022, 3:35:11 PM
to scylladb-users@googlegroups.com


On 22.03.2022 at 16:46, Avi Kivity wrote:
> One way is to add the new nodes and decommission the old nodes.

First add all new nodes, then decommission the old ones?
Or alternatively: add one node, decommission an old node, and repeat?
Which generates more streaming?

Is it safe to decommission more than one node at once?

>
> Another way is load_and_stream [1]. I don't think it is formally
> documented yet.
>
>
> [1] https://www.scylladb.com/2022/02/11/scylladb-open-source-release-4-6/

Sounds good, I will look at it.

Thanks for the answer,
Michael

Micha

<micha-1@fantasymail.de>
Mar 23, 2022, 4:38:04 AM
to Avi Kivity, scylladb-users@googlegroups.com
Another idea:

Is the following a good way to migrate to a new cluster?

Use the "replace dead node" instructions to replace each old node with
a new node.
After that, add the remaining nodes to the cluster.
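I.e., for each new node, something along these lines for its first boot (just my understanding of the replace procedure; the IP is a placeholder for the old node it replaces):

# scylla.yaml on the replacing node, set only for its first boot
replace_address_first_boot: 192.168.1.10   # address of the old node being replaced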


Cheers
Michael

Avi Kivity

<avi@scylladb.com>
Mar 23, 2022, 4:49:55 AM
to scylladb-users@googlegroups.com, Michael
On 3/22/22 21:35, Michael wrote:
>
>
> Am 22.03.2022 um 16:46 schrieb Avi Kivity:
>> One way is to add the new nodes and decommission the old nodes.
>
> First add all new nodes, then decommission the old ones?
> Or alternatively: add one node, decommission an old node, and repeat?
> Which generates more streaming?


I think it's similar. Best to first add all new nodes, then decommission
the old ones. Be careful not to run out of space; run nodetool cleanup to
recover space.
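E.g., on each node that stays in the cluster, once the topology changes are done (the keyspace name is just an example):

nodetool cleanup mykeyspace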


>
> Is it safe to decommission more than one node at once?


No.

Avi Kivity

<avi@scylladb.com>
Mar 23, 2022, 5:35:45 AM
to Micha, scylladb-users@googlegroups.com
This leaves the cluster vulnerable during the period the node is being
replaced. Otherwise it should work.

Micha

<micha-1@fantasymail.de>
Mar 23, 2022, 5:54:12 AM
to Avi Kivity, scylladb-users@googlegroups.com
Good point, thanks.