We are looking for a backup-and-restore solution for the data in our ClickHouse cluster.
We do not want to get involved with ZooKeeper, and we found it tedious to create the tables on each replica when using ReplicatedMergeTree tables.
So we want a cold-backup solution as an alternative to ReplicatedMergeTree.
Our requirements are as follows:
1. It would be good enough to back up all data older than 1 hour.
For the most recent hour of data, losing it is acceptable, as we can reload it from the source.
2. The solution should support incremental backup, so that we can add data to the backup day by day.
3. Restoring data from backup into the ClickHouse tables should be as automatic as possible.
4. We should be able to reconstruct tables from scratch using the backup, even if the table metadata is completely lost.
We are considering the following two candidate solutions:
1. Write a daily job that uploads the entire ClickHouse directory (all table parts, metadata, etc.) to HDFS, skipping parts that were already copied and parts that ClickHouse is still modifying.
2. Use clickhouse-copier to copy the data into a separate backup ClickHouse cluster. However, it looks like this solution cannot support incremental backup.
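For candidate 1, the part-selection step of the daily job could look roughly like the following minimal sketch. It assumes each table's parts are subdirectories of a single table directory (e.g. `<data dir>/<database>/<table>/<part_name>/`), that parts still being written have names starting with `tmp`, and that a `detached` subdirectory should be ignored. It also assumes a plain-text manifest of already-copied part names and uses the directory's modification time as a proxy for "older than 1 hour". The function names and the manifest format are hypothetical, and the actual upload to HDFS (for example with `hdfs dfs -put`) is not shown.

```python
import os
import time
from typing import Iterable, List, Optional, Set

# Assumption: names of unfinished parts start with "tmp" (e.g. tmp_insert_...),
# and "detached" is a helper directory, not a part. Both are skipped.
SKIP_PREFIXES = ("tmp",)
SKIP_NAMES = {"detached"}


def select_parts_to_backup(
    table_dir: str,
    already_copied: Set[str],
    min_age_seconds: float = 3600.0,
    now: Optional[float] = None,
) -> List[str]:
    """Return sorted names of part directories that are old enough and not yet backed up."""
    if now is None:
        now = time.time()
    cutoff = now - min_age_seconds
    selected = []
    for entry in os.scandir(table_dir):
        if not entry.is_dir():
            continue  # plain files (e.g. version markers) are not parts
        name = entry.name
        if name in SKIP_NAMES or name.startswith(SKIP_PREFIXES):
            continue
        if name in already_copied:
            continue  # incremental: copied on a previous day
        # Directories modified within the last hour may still be in flux; skip them.
        if entry.stat().st_mtime > cutoff:
            continue
        selected.append(name)
    return sorted(selected)


def load_manifest(path: str) -> Set[str]:
    """Read the (hypothetical) manifest: one already-copied part name per line."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def append_to_manifest(path: str, names: Iterable[str]) -> None:
    """Record part names after they have been uploaded successfully."""
    with open(path, "a", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")
```

One caveat of copying by part name: MergeTree parts are immutable, and a merge writes a new part that covers its source parts. A backup taken day by day can therefore contain both a merged part and the parts it replaced, so a restore would need to keep only the parts not covered by a newer merged part to avoid duplicate rows.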
Looking forward to hearing your suggestions and experiences.