HBase snapshots of TSDB tables: size on disk


Thibault Godouet

Sep 5, 2017, 3:31:05 AM
to OpenTSDB
Hi there,

We have a smallish Hadoop cluster dedicated to HBase and TSDB.
We take a daily HBase snapshot of each of the TSDB tables, and always keep 5 of them (5 days) on a rolling basis.
We have been using OpenTSDB for over a year, with compaction enabled.  Currently we are on 2.3.

While investigating HDFS space usage, we were surprised to see that HBase's 'archive' HDFS folder is almost 70% of the size of the 'data' folder.  That seems huge for 5 days of snapshots vs. over a year of data, even if our write rates have been increasing. The data is append-only, so you would expect the snapshots, and in turn the HBase archive, to be close to 5/365 of the size of the HBase data (probably a bit bigger due to TSDB and HBase compactions, but not by that much?!). We initially thought HBase cleanups might somehow be disabled or not working, but all the non-empty files in archive are at most 5 days old, which seems consistent with the snapshots.
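
For reference, a minimal sketch of how the two directories can be compared with the standard Hadoop FileSystem API (the /hbase/data and /hbase/archive paths assume the default HBase root dir; adjust if yours differs, and the class name is just illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchiveVsData {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());

        ContentSummary data = fs.getContentSummary(new Path("/hbase/data"));
        ContentSummary archive = fs.getContentSummary(new Path("/hbase/archive"));

        System.out.printf("data:    %,d bytes%n", data.getLength());
        System.out.printf("archive: %,d bytes (%.0f%% of data)%n",
                archive.getLength(), 100.0 * archive.getLength() / data.getLength());
    }
}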

So perhaps this is normal behaviour after all... Has anyone played with snapshots and seen anything similar?
Would one of the list members with a good understanding of HBase be able to give clues as to why the archive is so big?

Thanks,
Thibault.

PS: this is an HBase question, but I suspect it may be related to how TSDB works with HBase, which is why I'm posting it here.

Stack

Sep 5, 2017, 12:24:26 PM
to Thibault Godouet, OpenTSDB
On Tue, Sep 5, 2017 at 12:31 AM, Thibault Godouet <tib...@godouet.net> wrote:
So perhaps this is normal behaviour after all... Has anyone played with snapshots and seen anything similar?
Would one of the list members with a good understanding of HBase be able to give clues as to why the archive is so big?


Archive has WAL files that are no longer needed and 'removed' hfiles -- removed because a compaction ran and replaced the old with new compacted versions -- that are still referenced by snapshot manifests.

When the snapshot is removed, after some lag, the hfiles in archive are garbage collected.
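
If it helps to see the mechanism, here is a rough sketch with the client Admin API (2.x-style method names, so check them against your version; the 5-day cutoff just mirrors your rolling retention): list the snapshots that pin hfiles in archive, and delete the expired ones, after which the hfiles they referenced become eligible for cleanup.

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.SnapshotDescription;

public class SnapshotRetention {
    public static void main(String[] args) throws Exception {
        long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(5);
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            for (SnapshotDescription s : admin.listSnapshots()) {
                System.out.printf("%s created %d%n", s.getName(), s.getCreationTime());
                if (s.getCreationTime() < cutoff) {
                    // Once the snapshot is gone, the archived hfiles it referenced
                    // are garbage collected by the cleaner chore (after some lag).
                    admin.deleteSnapshot(s.getName());
                }
            }
        }
    }
}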

Could this explain the phenomenon you see?

S

Thibault Godouet

Sep 5, 2017, 6:23:39 PM
to st...@duboce.net, OpenTSDB
It does explain why snapshots take *some* space, but does it explain them taking that much space? (Snapshots of the last 5 days taking as much as 70% of over a year of data)
Does that mean that compaction would be applied to 70% of all the data in HBase every 5 days?  Does that feel right?

Stack

Sep 5, 2017, 6:31:30 PM
to Thibault Godouet, OpenTSDB
On Tue, Sep 5, 2017 at 3:23 PM, Thibault Godouet <tib...@godouet.net> wrote:
It does explain why snapshots take *some* space, but does it explain them taking that much space? (Snapshots of the last 5 days taking as much as 70% of over a year of data)
Does that mean that compaction would be applied to 70% of all the data in HBase every 5 days?  Does that feel right?


Compaction and write rates will determine what makes it out to archive.

Can you study rate of change in archive dir?

Sounds like this is a production system, else I'd suggest trying a period with compactions disabled to see how it affects the rate of change.
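
For the rate of change, a quick sketch along these lines (plain Hadoop FileSystem API, default /hbase root dir assumed) would show how many bytes land under archive/ per day, which you could then line up with compaction activity:

import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ArchiveChurn {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        TreeMap<LocalDate, Long> bytesPerDay = new TreeMap<>();
        // Walk everything under archive/ and bucket file sizes by modification day.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/hbase/archive"), true);
        while (it.hasNext()) {
            LocatedFileStatus f = it.next();
            LocalDate day = Instant.ofEpochMilli(f.getModificationTime())
                    .atZone(ZoneOffset.UTC).toLocalDate();
            bytesPerDay.merge(day, f.getLen(), Long::sum);
        }
        bytesPerDay.forEach((day, bytes) -> System.out.printf("%s  %,d bytes%n", day, bytes));
    }
}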

S

Thibault Godouet

Sep 7, 2017, 5:03:24 PM
to Stack, OpenTSDB
Indeed that is a production system.
What do you mean by 'rate of change'?  How its overall size varies, or how many files get created/deleted under archive/?


Stack

Sep 7, 2017, 5:26:29 PM
to Thibault Godouet, OpenTSDB
On Thu, Sep 7, 2017 at 2:01 PM, Thibault Godouet <tib...@godouet.net> wrote:
Indeed that is a production system.
What do you mean by 'rate of change'?  How its overall size varies, or how many files get created/deleted under archive/?



The latter would be more interesting. Do new entries in archive correlate with a compaction? Could you compact less? (That may mean somewhat higher latency in exchange for less disk used.)
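
If it is the HBase side you dial down, a sketch of what that could look like with the 2.x-style client API (the table name and the choice to switch automatic compactions off entirely, rather than merely tune them, are only illustrative):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CompactLess {
    public static void main(String[] args) throws Exception {
        TableName tsdb = TableName.valueOf("tsdb");  // hypothetical table name
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptor current = admin.getDescriptor(tsdb);
            // Turn off automatic compactions for this table; they can still be
            // triggered manually off-peak with admin.majorCompact(tsdb).
            admin.modifyTable(TableDescriptorBuilder.newBuilder(current)
                    .setCompactionEnabled(false)
                    .build());
        }
    }
}
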
M