Hi there,
We have a smallish Hadoop cluster dedicated to HBase and TSDB.
We take a daily HBase snapshot of each of the TSDB tables, and always keep 5 of them (5 days) on a rolling basis.
We have been using OpenTSDB for over a year, with compaction enabled. Currently we are on 2.3.
While investigating HDFS space usage, we were surprised to see that HBase's 'archive' HDFS folder is almost 70% of the size of the 'data' folder. That seems huge for 5 days of snapshots vs. over a year of data, even if the write rates have been increasing. Data is append-only, so you'd think the snapshots, and in turn the HBase archive, should be close to 5 days / 365 days of the HBase data (probably a bit bigger due to TSDB and HBase compactions, but not that much?!).
We initially thought that HBase cleanups might somehow be disabled or not working, but all the non-empty files in the archive are at most 5 days old, which seems consistent with the snapshots.
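To put numbers on that expectation, here is a rough back-of-the-envelope sketch. It assumes a uniform write rate over exactly 365 days (not quite true for us, since write rates have been increasing) and ignores files rewritten by compactions. The function name is only illustrative.

```python
def naive_archive_ratio(retention_days: int, history_days: int) -> float:
    """Share of 'data' that 'archive' would hold if writes were uniform
    and append-only, and compactions rewrote nothing (an assumption)."""
    return retention_days / history_days


naive = naive_archive_ratio(5, 365)   # 5 daily snapshots vs. one year of data
observed = 0.70                       # archive is almost 70% of data

print(f"naive expectation: {naive:.1%}")
print(f"observed:          {observed:.0%}")
print(f"observed / naive:  {observed / naive:.0f}x")
```

So what we see is roughly 50 times the naive estimate, which is why it looks suspicious to us.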
So perhaps this is normal behaviour after all... Has anyone played with snapshots and seen anything similar?
Would one of the list members with a good understanding of HBase be able to give clues as to why the archive is so big?
Thanks,
Thibault.
PS: this is an HBase question, but I suspect it may be related to how TSDB works with HBase, which is why I am posting it here.