Hi,
what is the preferred way to transfer all data (5TB on each of 8 nodes)
from a now-old cluster to a new cluster with more nodes?
I found this:
https://github.com/scylladb/scylla/commit/4d32d0317248d7c84ba91a16bc3252b2c8d98428
Is this the workflow:
- copy all SSTables from one old node to one new node and run nodetool
  refresh,
- then copy all SSTables from the next old node to one new node and run
  nodetool refresh,
- ... and so on?
This would mean copying 5TB 8 times and running nodetool refresh 8 times?
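To make the question concrete, here is a dry-run sketch of the workflow as I understand it. It only prints the commands and runs nothing. The host names (old1..old8, new1), the table name, the snapshot tag and the directory layout are placeholders I made up, and I am assuming the files have to land in the table's upload directory before nodetool refresh picks them up.

```shell
#!/bin/sh
# Dry run: prints the copy/refresh commands, executes nothing.
# Every name and path below is a placeholder, not taken from a real cluster.
OLD_NODES="old1 old2 old3 old4 old5 old6 old7 old8"
NEW_NODE="new1"
KEYSPACE="mykeyspace"
TABLE="mytable"
TAG="migrate"                      # hypothetical snapshot tag
DATA_DIR="/var/lib/scylla/data"    # assumed data directory
# Assumed layout: <data dir>/<keyspace>/<table>-<uuid>/{snapshots/<tag>,upload}
SRC_DIR="$DATA_DIR/$KEYSPACE/$TABLE-UUID/snapshots/$TAG"
UPLOAD_DIR="$DATA_DIR/$KEYSPACE/$TABLE-UUID/upload"

plan_refresh() {
    for old in $OLD_NODES; do
        # Pull one old node's SSTables into the upload directory on the new node...
        echo "ssh $NEW_NODE rsync -a $old:$SRC_DIR/ $UPLOAD_DIR/"
        # ...then load them before copying the next node's files.
        echo "ssh $NEW_NODE nodetool refresh $KEYSPACE $TABLE"
    done
}

plan_refresh
```

With 8 old nodes this prints 16 commands, i.e. 8 copies of roughly 5TB each and 8 refreshes, and that is per table.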
I could also add the new cluster to the old cluster as a second
datacenter and let Scylla stream all data. After that, I would remove the
old DC, leaving only the new DC active.
This is described here:
https://docs.scylladb.com/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/
There are some issues here I think:
Under point 9, in the "Before" paragraph, shouldn't it state:
CREATE KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', '<new_dc>' : 3};
(as this is the config of the new nodes)
instead of:
CREATE KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', '<exiting_dc>' : 3};
It's the new DC, so the replication points to the new nodes.
The "After" box is correct though:
CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class':
'NetworkTopologyStrategy', <exiting_dc>:3, <new_dc>: 3};
Then, I don't understand point 12 here:
"For each node in the existing data-center(s) and in the new data-center
with the newly promoted seed nodes, update the scylla.yaml file."
What should be updated there? The cluster is already running.
Or does it mean adding the new seed nodes to the config of
the old/ex DC?
Sometimes in the steps, DCs are named us-east and us-west, later on they
are named us-dc and asia-dc. That's a little inconsistent.
And maybe the old DC is "exiting", but I think "existing" describes it
better :-)
Also, the procedure never mentions enabling replication to the new DC,
although step 1 disables it:
"In the existing datacenter(s) alter each Keyspace replication to use
class : NetworkTopologyStrategy and set the new DC replication factor to
zero. This will prevent writing to the new DC until explicitly enabled."
ALTER KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', 'us-east' : 3, 'us-west' : 0};
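My guess is that the missing step looks roughly like the sketch below: raise the new DC's replication factor, then run nodetool rebuild on every new node with the old DC as the source. This is only my assumption and exactly what I would like confirmed. The script is a dry run that prints the commands; the node names new1..new3 are placeholders, and the DC names are the ones from the docs example.

```shell
#!/bin/sh
# Dry run: prints the commands I assume are needed, executes nothing.
KEYSPACE="mykeyspace"
OLD_DC="us-east"
NEW_DC="us-west"
NEW_NODES="new1 new2 new3"   # placeholder host names in the new DC

plan_enable() {
    # Step A (assumed): set the new DC's replication factor from 0 to 3.
    echo "cqlsh -e \"ALTER KEYSPACE $KEYSPACE WITH replication = {'class': 'NetworkTopologyStrategy', '$OLD_DC': 3, '$NEW_DC': 3};\""
    # Step B (assumed): stream the existing data to each new node from the old DC.
    for node in $NEW_NODES; do
        echo "ssh $node nodetool rebuild -- $OLD_DC"
    done
}

plan_enable
```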
How do I know that the new DC has received all the data, so that I can
remove the old DC from the cluster?
Another method seems to be:
https://docs.scylladb.com/operating-scylla/procedures/cluster-management/scale-up-cluster/
But that would mean streaming all data again and again, once for every
node that is added or removed.
What are the pros and cons of each approach?
Thanks
Michael