How does OpenTSDB Compaction save space?

Raunaq Kochar

Jun 2, 2017, 5:53:01 PM

to OpenTSDB

Looking at the Compaction section of the documentation, my understanding is that OpenTSDB compacts each row by appending its column qualifiers together and its values together.

So won't it take the same amount of space as before, since the same qualifiers and values are simply appended into one column?

The docs also use the phrase, "Since we know each data point qualifier is 2 bytes". However, right before that, they have said that a column qualifier can be "The qualifier is comprised of 2 or 4 bytes that encode an offset from the row's base time and flags to determine if the value is an integer or a decimal value."

Can someone help clear this up too?

ManOLamancha

Jun 6, 2017, 3:49:38 PM

to OpenTSDB

On Friday, June 2, 2017 at 2:53:01 PM UTC-7, Raunaq Kochar wrote:

Looking at the Compaction section of the documentation, my understanding is that OpenTSDB compacts each row by appending its column qualifiers together and its values together.
So won't it take the same amount of space as before, since the same qualifiers and values are simply appended into one column?

TSDB compactions won't save space over TSDB appends; they're effectively equivalent. But the default config for OpenTSDB is to write an individual column per data point and enable compactions. Compactions and appends save a TON of space over individual columns because:

1) Each column has an 8-byte timestamp in storage associated with the write time (or the data point time in TSDB 2.4 with date-tiered HBase compactions).

2) During serialization, HBase returns the row key with every column, which is really inefficient for us.
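
To put rough numbers on the two points above, here is a back-of-the-envelope sketch in Python. It assumes HBase's classic KeyValue layout with no tags (4-byte key length, 4-byte value length, 2-byte row length, row key, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, value) and ignores block encoding and compression. The 19-byte row key, the family name `t` and one 4-byte integer per second are illustrative assumptions, and compaction is modeled simply as concatenating a row's qualifiers and values.

```python
from typing import List, Tuple

# Assumed fixed per-cell overhead in HBase's KeyValue layout (no tags):
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1

Cell = Tuple[bytes, bytes]  # (qualifier, value)


def cell_size(row_key: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate stored size of one HBase cell under the layout above."""
    return KV_FIXED_OVERHEAD + len(row_key) + len(family) + len(qualifier) + len(value)


def compact(cells: List[Cell]) -> Cell:
    """Simplified compaction: concatenate qualifiers and values in qualifier order."""
    ordered = sorted(cells)
    qualifier = b"".join(q for q, _ in ordered)
    value = b"".join(v for _, v in ordered)
    return qualifier, value


# Illustrative row: 19-byte row key (3-byte metric UID, 4-byte base time,
# two 3+3-byte tag pairs), family "t", one 4-byte integer per second for an hour.
row_key = bytes(19)
family = b"t"
# 2-byte qualifiers; their exact contents don't matter for sizing here.
cells = [(i.to_bytes(2, "big"), i.to_bytes(4, "big")) for i in range(3600)]

individual = sum(cell_size(row_key, family, q, v) for q, v in cells)
compacted = cell_size(row_key, family, *compact(cells))
print(individual, compacted)  # 165600 21640
```

Under these assumptions the hour of individual cells takes about 7.7 times the space of the single compacted cell, because each individual cell repeats 40 bytes of row key, family and fixed fields to carry 6 bytes of qualifier and value.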

So the recommended configurations depend on priorities:

A) If you need to save space as much as possible but have lots of CPU and IO, try OpenTSDB's appends with compaction disabled. But watch the region server's resources to make sure they aren't running out of IO.

B) If you need as much write throughput as possible, use OpenTSDB's default puts and disable compactions.

C) If you have a low write throughput and enough TSDs and region servers, then try appends or just use puts and compactions.
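
As a rough illustration, the three options above could map onto `opentsdb.conf` settings like this. The sketch assumes the OpenTSDB 2.x keys `tsd.storage.enable_appends` and `tsd.storage.enable_compaction`; the priority names and the `render_conf` helper are hypothetical, so check the configuration docs for your version before relying on it.

```python
# Hypothetical mapping of the A/B/C priorities to OpenTSDB 2.x settings.
RECOMMENDATIONS = {
    # A) Save as much space as possible; needs spare region server CPU and IO.
    "space": {
        "tsd.storage.enable_appends": "true",
        "tsd.storage.enable_compaction": "false",
    },
    # B) Maximum write throughput: default puts, no compactions.
    "throughput": {
        "tsd.storage.enable_appends": "false",
        "tsd.storage.enable_compaction": "false",
    },
    # C) Low write rate with spare TSDs and region servers: puts plus
    #    compactions (appends are also an option here).
    "balanced": {
        "tsd.storage.enable_appends": "false",
        "tsd.storage.enable_compaction": "true",
    },
}


def render_conf(priority: str) -> str:
    """Return opentsdb.conf lines for the given priority, sorted by key."""
    settings = RECOMMENDATIONS[priority]
    return "\n".join(f"{key} = {value}" for key, value in sorted(settings.items()))


print(render_conf("space"))
# tsd.storage.enable_appends = true
# tsd.storage.enable_compaction = false
```

Option C is shown with puts plus compactions because that is the OpenTSDB default; switching it to appends only changes the two values.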

Yahoo is working on a better append co-processor that gives us the space savings without the region server impact.

The docs also use the phrase, "Since we know each data point qualifier is 2 bytes". However, right before that, they have said that a column qualifier can be "The qualifier is comprised of 2 or 4 bytes that encode an offset from the row's base time and flags to determine if the value is an integer or a decimal value."
Can someone help clear this up too?

Yeah, I need to update that doc. The qualifier for data points was only two bytes when OpenTSDB only supported second timestamp resolution. When we added millisecond resolution, the qualifiers could be 2 or 4 bytes depending on whether the DP had second or millisecond resolution. (When we add nanoseconds it may be 6 or 8 bytes.)

So the offset and flag encoding is similar but the resolution is different.
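
To illustrate the two widths, here is a minimal sketch of the qualifier layouts, assuming the OpenTSDB 2.x bit layout: a second-resolution qualifier packs a 12-bit offset in seconds above 4 flag bits into 2 bytes, while a millisecond qualifier sets its top 4 bits to 1, followed by a 22-bit offset in milliseconds, 2 unused bits and the same 4 flag bits, for 4 bytes. In the flags, the low 3 bits hold the value length minus one and bit 3 marks a floating-point value. The function names are hypothetical, not OpenTSDB's API.

```python
FLAG_BITS = 4            # flag bits in a 2-byte (second) qualifier
MS_FLAG_BITS = 6         # 4 flag bits + 2 unused bits in a 4-byte (ms) qualifier
FLAG_FLOAT = 0x8         # flag bit set when the value is a float
MS_MARKER = 0xF0000000   # top nibble 0xF marks a millisecond qualifier


def flags(value_len: int, is_float: bool) -> int:
    """Low 3 bits: value length - 1; bit 3: float flag."""
    assert 1 <= value_len <= 8
    return (FLAG_FLOAT if is_float else 0) | (value_len - 1)


def second_qualifier(offset_s: int, value_len: int, is_float: bool) -> bytes:
    """2-byte qualifier: 12-bit offset in seconds from the row's base time."""
    assert 0 <= offset_s < 3600
    return ((offset_s << FLAG_BITS) | flags(value_len, is_float)).to_bytes(2, "big")


def ms_qualifier(offset_ms: int, value_len: int, is_float: bool) -> bytes:
    """4-byte qualifier: 22-bit offset in milliseconds from the row's base time."""
    assert 0 <= offset_ms < 3_600_000
    q = MS_MARKER | (offset_ms << MS_FLAG_BITS) | flags(value_len, is_float)
    return q.to_bytes(4, "big")


print(second_qualifier(1, 8, False).hex())  # 0017
print(ms_qualifier(1500, 4, True).hex())    # f001770b
```

The offset and flag fields are the same idea in both forms; only the offset's unit and width change, which is why the qualifier grows from 2 to 4 bytes.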

Hope that helps :)

Raunaq Kochar

Jun 14, 2017, 7:08:37 PM

to OpenTSDB

The first part helped!

The docs also use the phrase, "Since we know each data point qualifier is 2 bytes". However, right before that, they have said that a column qualifier can be "The qualifier is comprised of 2 or 4 bytes that encode an offset from the row's base time and flags to determine if the value is an integer or a decimal value."
Can someone help clear this up too?

Yeah, I need to update that doc. The qualifier for data points was only two bytes when OpenTSDB only supported second timestamp resolution. When we added millisecond resolution, the qualifiers could be 2 or 4 bytes depending on whether the DP had second or millisecond resolution. (When we add nanoseconds it may be 6 or 8 bytes.)

So here, I wanted to ask: can a single row have values in different resolutions and varying qualifier lengths? Or will the data format be consistent across a row?

Thanks,

Raunaq.

ManOLamancha

Jun 15, 2017, 4:17:49 PM

to OpenTSDB

On Wednesday, June 14, 2017 at 4:08:37 PM UTC-7, Raunaq Kochar wrote:

So here, I wanted to ask: can a single row have values in different resolutions and varying qualifier lengths? Or will the data format be consistent across a row?

Yup, you can store both ms and s resolution values in a single row. In fact, right now we default millisecond values that align on a second boundary to second resolution.
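
Mixed widths in one row work because the two forms can be told apart by their first nibble: a millisecond qualifier always starts with 0xF, while the largest second offset (3599 s, shifted left by 4 flag bits) only reaches 0xE0Fx. The sketch below assumes that OpenTSDB 2.x layout (2-byte qualifier: 12-bit second offset plus 4 flag bits; 4-byte qualifier: 0xF marker, 22-bit millisecond offset, 2 unused bits, 4 flag bits). The hypothetical `encode` mimics storing second-aligned millisecond values at second resolution, and the hypothetical `decode_offsets` walks a compacted qualifier of mixed widths.

```python
from typing import List

FLAG_BITS = 4            # flag bits in a 2-byte (second) qualifier
MS_FLAG_BITS = 6         # 4 flag bits + 2 unused bits in a 4-byte (ms) qualifier
MS_MARKER = 0xF0000000   # top nibble 0xF marks a millisecond qualifier


def encode(offset_ms: int, flags: int) -> bytes:
    """Second-aligned offsets get the 2-byte form, all others the 4-byte form."""
    if offset_ms % 1000 == 0:
        return (((offset_ms // 1000) << FLAG_BITS) | flags).to_bytes(2, "big")
    return (MS_MARKER | (offset_ms << MS_FLAG_BITS) | flags).to_bytes(4, "big")


def decode_offsets(qualifiers: bytes) -> List[int]:
    """Split a concatenated (compacted) qualifier into millisecond offsets."""
    offsets = []
    i = 0
    while i < len(qualifiers):
        if (qualifiers[i] & 0xF0) == 0xF0:
            # Millisecond qualifier: clear the marker nibble, drop the 6 low bits.
            q = int.from_bytes(qualifiers[i:i + 4], "big")
            offsets.append((q & 0x0FFFFFFF) >> MS_FLAG_BITS)
            i += 4
        else:
            # Second qualifier: drop the 4 flag bits and convert to ms.
            q = int.from_bytes(qualifiers[i:i + 2], "big")
            offsets.append((q >> FLAG_BITS) * 1000)
            i += 2
    return offsets


# One row mixing resolutions: 0 s, 1500 ms and 2000 ms (stored as 2 s).
row = encode(0, 0x7) + encode(1500, 0x7) + encode(2000, 0x7)
print(len(row), decode_offsets(row))  # 8 [0, 1500, 2000]
```

Because each qualifier announces its own width, a reader never needs the whole row to share one resolution.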
