Different TTLs for different data points?


John Humphreys

Aug 31, 2017, 12:01:38 PM
to OpenTSDB
Is there a way to automatically delete data at different times?
  • Have minute, hour, and day level data.
  • Want to keep minute data for 6 months, hour data for 3 years, day data forever.
I know I can set a TTL on the column family, but unfortunately that applies one retention period to all of my data.

I'm also aware that I can perform a scan to delete data. 

Are there any other options for deleting data?  If I have to use scans, are they safe in production and relatively performant for large amounts of data?

Jonathan Creasy

Aug 31, 2017, 12:12:31 PM
to John Humphreys, OpenTSDB
Well, the rollup support in 2.4.0 stores the rollups and the raw data in separate tables, which would let you set a different TTL on each.
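Concretely, that's the same column-family TTL you mentioned, just applied per table. A rough sketch with the HBase 1.x Java client (assuming the stock 't' data column family; the rollup table names are placeholders for whatever your rollup config points at):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PerTableTtl {
  public static void main(String[] args) throws Exception {
    // Table -> TTL in seconds, matching the retention you described:
    // raw/minute data 6 months, hourly rollups 3 years, daily rollups forever.
    Map<String, Integer> ttls = new LinkedHashMap<>();
    ttls.put("tsdb",           183 * 24 * 3600);
    ttls.put("tsdb-rollup-1h", 3 * 365 * 24 * 3600);
    ttls.put("tsdb-rollup-1d", HConstants.FOREVER);   // no expiry

    try (Connection conn = ConnectionFactory.createConnection();
         Admin admin = conn.getAdmin()) {
      for (Map.Entry<String, Integer> e : ttls.entrySet()) {
        TableName table = TableName.valueOf(e.getKey());
        // Fetch the existing family so its other settings are preserved.
        HTableDescriptor desc = admin.getTableDescriptor(table);
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("t"));
        cf.setTimeToLive(e.getValue());
        admin.modifyColumn(table, cf);
      }
    }
  }
}

Keep in mind that expired cells are only physically dropped at compaction time.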

This is hacky, but I think some people do this by storing the data with an offset. Say they have a 3-year TTL on the table; data they only want to keep for 6 months gets written as if it were 2 years and 6 months old rather than current, and they adjust the value when querying so it reports the correct time. I don't think this is a recommended path. I assume they keep metadata about each series to tell them which offset to apply.
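The arithmetic of the shift itself is simple; a hypothetical helper (nothing OpenTSDB ships) would just be something like this, with the per-series offset coming out of that metadata:

import java.util.concurrent.TimeUnit;

// Hypothetical helper for the timestamp-offset trick described above.
public class RetentionOffset {
  // Table-wide TTL of 3 years, desired retention of 6 months for this series.
  static final long TABLE_TTL_SEC = TimeUnit.DAYS.toSeconds(3 * 365);
  static final long DESIRED_RETENTION_SEC = TimeUnit.DAYS.toSeconds(183);

  // Shift applied to this series: 3 years - 6 months = 2.5 years.
  static final long OFFSET_SEC = TABLE_TTL_SEC - DESIRED_RETENTION_SEC;

  // Timestamp to store so the data looks 2.5 years older than it really is.
  static long writeTimestamp(long actualEpochSec) {
    return actualEpochSec - OFFSET_SEC;
  }

  // Undo the shift when reporting query results back to the user.
  static long reportTimestamp(long storedEpochSec) {
    return storedEpochSec + OFFSET_SEC;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis() / 1000L;
    System.out.println("write " + writeTimestamp(now) + " instead of " + now);
  }
}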

The scans should be no more taxing on a cluster than a query over the same data. I would recommend deleting smaller batches more often, so instead of a weekly run, maybe do it daily. If the dataset is large enough, maybe hourly.
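For the delete side, "smaller batches" basically means a bounded scan with deletes flushed in chunks. A generic HBase 1.x sketch (it knows nothing about OpenTSDB's row-key layout, so the start/stop rows are whatever your tooling computes for the window being purged):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class BatchedDelete {
  // Delete every row in [startRow, stopRow), flushing deletes in small batches.
  static void deleteRange(Connection conn, String tableName,
                          byte[] startRow, byte[] stopRow, int batchSize) throws IOException {
    try (Table table = conn.getTable(TableName.valueOf(tableName))) {
      Scan scan = new Scan(startRow, stopRow);
      scan.setCaching(batchSize);               // keep each RPC modest
      List<Delete> batch = new ArrayList<>(batchSize);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result row : scanner) {
          batch.add(new Delete(row.getRow()));
          if (batch.size() >= batchSize) {
            table.delete(batch);                // send this chunk
            batch = new ArrayList<>(batchSize); // start a fresh list
          }
        }
      }
      if (!batch.isEmpty()) {
        table.delete(batch);
      }
    }
  }
}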


John Humphreys

Aug 31, 2017, 7:03:53 PM
to OpenTSDB, johnwillia...@gmail.com
Thank you very much :).  The hacky option is quite interesting, actually.  We are planning to query the data mostly through a custom web app, in which case it would work quite well.  We are also going to use Grafana, but I could adjust its OpenTSDB plugin to handle this.

Out of curiosity, does HBase work off of the OpenTSDB timestamp when it comes to the TTL on a column family (which seems to be what you're implying)?  I assumed it used an internal timestamp from the actual insertion time.

ManOLamancha

Sep 1, 2017, 2:28:48 PM
to OpenTSDB, johnwillia...@gmail.com
On Thursday, August 31, 2017 at 4:03:53 PM UTC-7, John Humphreys wrote:
Out of curiosity, does HBase work off of the OpenTSDB timestamp when it comes to the TTL on a column family (which seems to be what you're implying)?  I assumed it used an internal timestamp from the actual insertion time.

Right now the TSD just writes the system timestamp as the column time. There's a patch in 2.4 to use the value's timestamp instead for date-tiered compaction, to help avoid extraneous compactions. Another company needs similar behavior: they'd like a different TTL per metric. The way to implement that would be to set the max TTL on the table, then look up the metric's TTL in the meta table and fudge the column timestamp so it expires earlier or later. We need the code for that.
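For the curious, "fudge the column timestamp" boils down to writing each cell with an explicit HBase timestamp instead of letting the region server assign the write time. A sketch with the raw HBase 1.x client (not TSD code; the shift would come from the metric's TTL in the meta table, and the row key, qualifier, and value are placeholders for the real UID/base-time encoding):

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FudgedTimestampPut {
  public static void main(String[] args) throws Exception {
    // Table TTL is the max (say 3 years) but this metric should only live
    // 6 months: write the cell as if it were already 2.5 years old, so the
    // column-family TTL expires it about 6 months from now.
    long shiftMs = (3L * 365 - 183) * 24 * 3600 * 1000L;
    long cellTs = System.currentTimeMillis() - shiftMs;

    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("tsdb"))) {
      Put put = new Put(Bytes.toBytes("placeholder-row-key"));
      // Explicit cell timestamp; HBase measures the family's TTL against this.
      put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("q"), cellTs, Bytes.toBytes(42L));
      table.put(put);
    }
  }
}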