What is a backup solution for all the data in ClickHouse (without ReplicatedMergeTree)?

Yj H

May 10, 2018, 8:08:48 AM
to ClickHouse
We are looking for a backup and restore solution for our data in the ClickHouse cluster.

We do not want to get involved with ZooKeeper, and we found it tedious to create tables for each replica with ReplicatedMergeTree.

So we want a cold backup solution as an alternative to ReplicatedMergeTree.

Our requirements are as follows:

1. It would be good enough to back up all data older than 1 hour.
    Losing the most recent hour of data is fine, as we can reload it from the source.

2. The solution should support incremental backup, so that we can add data to the backup day by day.

3. Restoring data from backup into the ClickHouse tables should be as automatic as possible.

4. We should be able to re-construct tables from scratch with the backup, even if the metadata about tables is completely lost.

We are considering the following two candidate solutions:

1. Write a daily job to upload the entire ClickHouse directory (including all table parts, metadata, etc.) into HDFS, skipping previously copied parts and parts that ClickHouse is still modifying.

2. Use clickhouse-copier to copy data into another backup ClickHouse cluster. However, it looks like this solution cannot support incremental backup.
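To make candidate 1 more concrete, here is a minimal sketch of how the daily job might pick which part directories to upload. It is only an illustration under several assumptions: the function name `parts_to_backup` and the `already_copied` set are hypothetical, the data directory is assumed to be laid out as `<data_dir>/<database>/<table>/<part>/`, and parts still being written are assumed to live in directories whose names start with `tmp`. The one-hour age filter mirrors requirement 1.

```python
import os
import time


def parts_to_backup(data_dir, already_copied, min_age_seconds=3600, now=None):
    """Return part directories (relative to data_dir) that look safe to copy.

    Assumptions (not confirmed by the thread): the layout is
    <data_dir>/<database>/<table>/<part>/, and in-progress parts
    use a "tmp" name prefix.
    """
    now = time.time() if now is None else now
    selected = []
    for db in sorted(os.listdir(data_dir)):
        db_path = os.path.join(data_dir, db)
        if not os.path.isdir(db_path):
            continue
        for table in sorted(os.listdir(db_path)):
            table_path = os.path.join(db_path, table)
            if not os.path.isdir(table_path):
                continue
            for part in sorted(os.listdir(table_path)):
                part_path = os.path.join(table_path, part)
                rel = os.path.join(db, table, part)
                if not os.path.isdir(part_path):
                    continue
                # Skip parts that are still being written, and the detached folder.
                if part.startswith("tmp") or part == "detached":
                    continue
                # Skip parts uploaded by an earlier run (incremental backup).
                if rel in already_copied:
                    continue
                # Skip parts younger than the cutoff (requirement 1).
                if now - os.path.getmtime(part_path) < min_age_seconds:
                    continue
                selected.append(rel)
    return selected
```

Each returned path could then be uploaded (for example with `hdfs dfs -put`) and recorded in `already_copied` for the next run. The table definitions would need to be copied separately to cover requirement 4.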

Looking forward to hearing your suggestions and experience.


Ashish Gaurav

Dec 9, 2018, 7:49:19 AM
to ClickHouse
For non-replicated tables we use ZFS to get live incremental backups.
A ZFS partition eats up extra memory, but in our experience it also gives better IOPS than ext4 or XFS, and it lets us take incremental snapshots of the entire disk without stopping any service that uses it.
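As a rough sketch of what an incremental snapshot cycle could look like, the helper below builds the `zfs snapshot` and `zfs send -i` command lines and optionally runs them. The dataset name, snapshot names, and output file are hypothetical placeholders, and this is an illustration of the general technique rather than our exact setup.

```python
import subprocess


def snapshot_commands(dataset, new_snapshot, prev_snapshot=None):
    """Build the zfs commands for one backup cycle.

    With prev_snapshot set, the send stream contains only the blocks
    changed since that snapshot (incremental); otherwise it is a full send.
    """
    snap = ["zfs", "snapshot", f"{dataset}@{new_snapshot}"]
    if prev_snapshot is None:
        send = ["zfs", "send", f"{dataset}@{new_snapshot}"]
    else:
        send = ["zfs", "send", "-i",
                f"{dataset}@{prev_snapshot}", f"{dataset}@{new_snapshot}"]
    return snap, send


def run_backup(dataset, new_snapshot, target_file, prev_snapshot=None):
    """Take the snapshot, then write the send stream to target_file."""
    snap, send = snapshot_commands(dataset, new_snapshot, prev_snapshot)
    subprocess.run(snap, check=True)
    with open(target_file, "wb") as out:
        subprocess.run(send, stdout=out, check=True)
```

For example, `run_backup("tank/clickhouse", "2018-12-09", "/backup/ch-2018-12-09.zfs", prev_snapshot="2018-12-08")` would store only the changes made since the previous day's snapshot (all names here are made up).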

Some relevant links I have bookmarked:
