How to re-shard the table while new data continues to be inserted to the table?

42 views
Skip to first unread message

Jun Li

unread,
Nov 14, 2019, 4:25:19 AM11/14/19
to ClickHouse

In our use case, we use the hash scheme to shard the distributed table,  and the table holds the data that continues to arrive in near real-time (say in seconds). Over time, the cluster to hold the table will have to grow, and because of hash-based sharding, data will have to be re-sharded. 

Supposed that the original table is Table 1, and we want to have the new table to be Table 2, located in the same cluster. Table 2 has 2x number of the shards, compared to Table 1.  


We can have ClickHouse Copier,

https://clickhouse.yandex/docs/en/operations/utils/clickhouse-copier/

 to copy data from Table 1 to Table 2, which takes care of re-sharding the data. It is easy if Table 1 is not changed at all. But since we have continuous incoming data, each copier will fetch the data based on the “select” queries on the partition of Table 1 at some time. Supposed at t1, the copier is started, and at t2, the copier finishes the copy of Table 1 to Table 2 . There is really no way to tell how much new data arrives from t1 to t2 has been copied over, and how much new data does not get copied. 


At the same time, the user needs to continue to query the table over the data that is up to date. Therefore, follow the above migration strategy, at t2, if we switch the incoming query to Table 2,  queries from now on will not get the correct result as some of the data arrives in (t1, t2) does not get copied. 


What would be the good strategy to handle such data migration from a source table to a destination table, while new data continues to arrive at the source table during data migration, and the user continues to issue queries over the data up to most recent data arrival time? 


Jun



Denis Zhuravlev

unread,
Nov 14, 2019, 9:11:56 AM11/14/19
to ClickHouse
I suppose your data partitioned by date.

You should start to ingest into both tables (somehow by inserting or by MV).
At some point you'll have equal data in the new partition e.g. December.
Then using clickhouse-copier you can copy all data before December.

Hint: CH server can be member of any number of cluster.

Jun Li

unread,
Nov 14, 2019, 1:07:14 PM11/14/19
to ClickHouse
Denis: thanks for the proposed solution. But it seems that the solution works only when the historical data does not changed. Supposed we create the two tables at Day-0. From now on, in Table 1 and Table 2, each day produces a partition, for Day-0, Day-1, ...Day-9. The historical data are in partitions named H-0, H-1, H-2, etc, only in Table 1 (the original table).  We start to do the copying of H-0, H-1, H-2,  at Day-5. What if during the data copying, we have new incoming data, which is to the update some historical data in some of H-0, H-1, H-2, etc. That is, we still have the problem that we can not guarantee that when we make a copy of the historical data to the new table, the historical data is completely frozen (no immutable). 

Jun

Denis Zhuravlev

unread,
Nov 14, 2019, 1:36:49 PM11/14/19
to ClickHouse
>But it seems that the solution works only when the historical data does not changed.
Yes, you can freeze the old table and attach frozen parts into another table -- snapshot. And copy data from snapshot. In this case your new table will ingest historical changes of data from inserts and old not changed data from the snapshot.

Jun Li

unread,
Nov 14, 2019, 5:40:56 PM11/14/19
to ClickHouse
Hi Denis: 

Unlike backup, I can not use simple copy from the snapshot data in Table 1 to Table 2, as  Table 2 has 2x number of the shards. So data re-sharding has to happen during data migration. The Click-House copier use: insert into ... select from remote(...). How can we change the query used in the ClickHouse copier, so that it will use the snapshot version of the table, rather than the regular table.  That is, how can we issue the query against the snapshot data? 

Jun

Denis Zhuravlev

unread,
Nov 14, 2019, 6:56:18 PM11/14/19
to ClickHouse
You don't understand.

imagine you have 2 shards and need to add one more.

1. stop ingestion.
2. freeze OLD table at 2 shards.
3. create NEW table at 3 shards.
4. start ingestion (2,3 takes seconds) into OLD(2shards) and NEW(3 shards).
5. create TEMP table at 2 shards as OLD
6. mv shadow/..../* TEMP/detached/
7. attach parts to TEMP
8. insert into NEW from TEMP

freeze - shadow - TEMP will not use any additional space because its hardlinks.

Jun Li

unread,
Nov 15, 2019, 2:11:23 AM11/15/19
to ClickHouse
Hi Denis:

Get it now! You use TEMP table to hold the snapshot of the table. 

Thank you very much!

Jun

Reply all
Reply to author
Forward
0 new messages