We have a ClickHouse cluster with 4 shards and 4 replicas on 4 nodes, deployed with the Altinity ClickHouse operator. We experienced node failures, and because ZooKeeper was running on the same nodes, there was ZooKeeper downtime as well. The nodes belong to AWS Auto Scaling groups, so they all recovered automatically, and the pods were recreated once the nodes were up and running.
After all ClickHouse pods started running, there was a discrepancy in the databases and tables across the cluster. For example, on shard 1 I can see 20 databases, but on the replica I see only 17 (3 missing).
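To make the discrepancy concrete, here is a minimal Python sketch of how the per-replica database lists could be compared. It assumes the lists have already been collected by hand (for example by running SHOW DATABASES on each pod); the function name, pod names and database names below are hypothetical placeholders, not the real ones from our cluster.

```python
def missing_databases(databases_by_replica):
    """For each replica, list the databases that exist somewhere in the
    cluster but are absent on that replica."""
    # Union of every database name seen on any replica.
    all_dbs = set().union(*databases_by_replica.values())
    return {
        replica: sorted(all_dbs - set(dbs))
        for replica, dbs in databases_by_replica.items()
    }


# Hypothetical input: one replica has 20 databases, the other only 17.
example = {
    "replica-a": [f"db{i:02d}" for i in range(20)],
    "replica-b": [f"db{i:02d}" for i in range(17)],
}

for replica, missing in missing_databases(example).items():
    print(replica, "is missing", len(missing), "database(s):", missing)
```

This only reports which names differ; it does not touch ClickHouse or ZooKeeper and does not recover anything.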
Is there a way to recover the missing databases on all shards and replicas? All the data (ZooKeeper snapshots and clickhouse-server data) is stored in PVCs (EBS volumes). Alternatively, can we go back to a previous ZooKeeper state that has the metadata for all the tables?
ZooKeeper snapshots:
zookeeper@zookeeper-1:/var/lib/zookeeper/data/version-2$
-rw-rw-r-- 1 zookeeper zookeeper 300314114 Jul 3 04:45 snapshot.b00016a8c
-rw-rw-r-- 1 zookeeper zookeeper 299436536 Jul 3 05:33 snapshot.c00017305
-rw-rw-r-- 1 zookeeper zookeeper 300844525 Jul 5 23:26 snapshot.d0002a48d
-rw-rw-r-- 1 zookeeper zookeeper 300845078 Jul 5 23:32 snapshot.e000087f9
ClickHouse server data:
root@chi-ch-01-***-1-1-0:/var/lib/clickhouse/data#
Thanks in advance,
Naveen