questions about renaming a tagv and dropcaches


Huafeng Lyu

Jan 5, 2017, 3:37:43 AM
to OpenTSDB
Hi,

I need to rename a tagv (hostname, since sometimes our hostname changes). I read the discussion at https://groups.google.com/forum/#!topic/opentsdb/gBRuaYGUmAo and the OpenTSDB docs about 'dropcaches', and I still have some questions.

The doc says, "This endpoint purges the in-memory data cached in OpenTSDB. This includes all UID to name and name to UID maps for metrics, tag names and tag values." So the in-memory data purged by 'dropcaches' is only the UID<->name maps, not the datapoints that OpenTSDB has received but hasn't yet written into HBase?

Is there a method to list the content of the cache (the in-memory data mentioned above) to be purged by 'dropcaches'?

If 'dropcaches' is not used, will the cache be cleared/refreshed automatically after some timeout?

After the tagk is renamed but before a 'dropcaches' command is executed, if some agents still report datapoints using the old tagk, the (not yet refreshed) cache will still map the old tagk to the correct UID (that of the new tagk), so the datapoints will actually be written into OpenTSDB under the new tagk (or rather, the UID of the new tagk), right? My test on OpenTSDB 2.2 showed so, and I would like to get it confirmed by the experts here.

After the tagk is renamed, if 'dropcaches' is not executed and the agents start sending datapoints to OpenTSDB using the new tagk, I guess the new tagk will cause a cache miss; OpenTSDB will check the HBase uid table and refresh the cache, and thus these datapoints will be correctly written into OpenTSDB under the new tagk (or rather, the UID of the new tagk). Is that correct?

In the cases above, a rolling restart of the TSDs is not necessary, right?

Thanks for answering...
--huafeng

ManOLamancha

Jan 7, 2017, 8:00:46 PM
to OpenTSDB
On Thursday, January 5, 2017 at 12:37:43 AM UTC-8, Huafeng Lyu wrote:
Hi,

I need to rename a tagv (hostname, since sometimes our hostname changes). I read the discussion at https://groups.google.com/forum/#!topic/opentsdb/gBRuaYGUmAo and the OpenTSDB docs about 'dropcaches', and I still have some questions.

The doc says, "This endpoint purges the in-memory data cached in OpenTSDB. This includes all UID to name and name to UID maps for metrics, tag names and tag values." So the in-memory data purged by 'dropcaches' is only the UID<->name maps, not the datapoints that OpenTSDB has received but hasn't yet written into HBase?

Right, dropcaches only affects the TSD's memory and doesn't modify the data in storage at all.
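
For reference, purging is just an HTTP call against the TSD. A minimal sketch in Python, assuming a TSD listening on localhost:4242 (adjust the host and port for your setup):

    # Minimal sketch: ask a TSD to purge its in-memory UID caches.
    # Assumes a TSD listening on localhost:4242.
    from urllib.request import urlopen

    with urlopen("http://localhost:4242/api/dropcaches") as resp:
        print(resp.status, resp.read().decode())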
 

Is there a method to list the content of the cache (the in-memory data mentioned above) to be purged by 'dropcaches'?

Not right now, unfortunately. I'd like to be able to purge selected items at some point. Returning the cache data would be possible, but depending on the TSD it can be huge (ours are up to 10 GB in size).

If 'dropcaches' is not used, will the cache be cleared/refreshed automatically after some timeout?

The current code simply keeps expanding the cache forever and only purges on restarts or when dropcaches is called. I've started work on converting to an LRU cache, and that could also be time-based.
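
If it helps to picture it, here is a toy LRU name-to-UID map in Python. OpenTSDB itself is Java, so this only illustrates the eviction behavior, not the actual implementation:

    # Toy LRU map from names to UIDs; illustrative only, not OpenTSDB's code.
    from collections import OrderedDict

    class LruUidCache:
        def __init__(self, max_entries=100000):
            self.max_entries = max_entries
            self._map = OrderedDict()

        def get(self, name):
            uid = self._map.get(name)
            if uid is not None:
                self._map.move_to_end(name)  # mark as recently used
            return uid

        def put(self, name, uid):
            self._map[name] = uid
            self._map.move_to_end(name)
            if len(self._map) > self.max_entries:
                self._map.popitem(last=False)  # evict the least recently used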

After the tagk is renamed but before a 'dropcaches' command is executed, if some agents still report datapoints using the old tagk, the (not yet refreshed) cache will still map the old tagk to the correct UID (that of the new tagk), so the datapoints will actually be written into OpenTSDB under the new tagk (or rather, the UID of the new tagk), right? My test on OpenTSDB 2.2 showed so, and I would like to get it confirmed by the experts here.

Yes, you have that right, which is why renaming is a real pain. Alternatively you may want to look at tagging data with something that never changes, like an asset tag for a host. That way when a hostname changes, you can see new data after the change (since the old data really isn't relevant for the new role) or old data for the old role. But if you really want to look at the specific machine over time, just query for the asset tag. 
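
To make that concrete, a hedged sketch in Python that reports a datapoint carrying both the mutable hostname and an immutable asset tag over the TSD's telnet-style put interface (the metric, tag values and host below are placeholders, not anything from this thread):

    # Report one datapoint tagged with both a hostname and an asset tag.
    # Assumes a TSD on localhost:4242; metric and tag values are made up.
    import socket
    import time

    line = "put sys.cpu.user %d 42 host=web01 asset=A1234\n" % int(time.time())
    with socket.create_connection(("localhost", 4242)) as s:
        s.sendall(line.encode())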

After the tagk is renamed, if 'dropcaches' is not executed and the agents start sending datapoints to OpenTSDB using the new tagk, I guess the new tagk will cause a cache miss; OpenTSDB will check the HBase uid table and refresh the cache, and thus these datapoints will be correctly written into OpenTSDB under the new tagk (or rather, the UID of the new tagk). Is that correct?

Right.
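
In other words, the name-to-UID lookup is a read-through cache. A rough Python sketch of the behavior described above (not OpenTSDB's actual Java code; uid_table here stands in for the HBase tsdb-uid table):

    def name_to_uid(name, cache, uid_table):
        # Read-through lookup: check the in-memory map first, then HBase.
        uid = cache.get(name)
        if uid is None:                   # cache miss, e.g. a freshly renamed tagv
            uid = uid_table.lookup(name)  # read the name->UID row from HBase
            if uid is not None:
                cache[name] = uid         # refresh the cache with the new mapping
        return uid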
 
In the cases above, a rolling restart of the TSDs is not necessary, right?

Correct. :) 

Huafeng Lyu

Jan 12, 2017, 4:24:20 AM
to OpenTSDB
Thanks, ManOLamancha, for the explanation. Please see my responses inline below.

First, all these discussions (about tagv) also apply to the other UID types (metric and tagk), right?


On Sunday, January 8, 2017 at 9:00:46 AM UTC+8, ManOLamancha wrote:
On Thursday, January 5, 2017 at 12:37:43 AM UTC-8, Huafeng Lyu wrote:
Hi,

I need to rename a tagv (hostname, since sometimes our hostname changes). I read the discussion at https://groups.google.com/forum/#!topic/opentsdb/gBRuaYGUmAo and the OpenTSDB docs about 'dropcaches', and I still have some questions.

The doc says, "This endpoint purges the in-memory data cached in OpenTSDB. This includes all UID to name and name to UID maps for metrics, tag names and tag values." So the in-memory data purged by 'dropcaches' is only the UID<->name maps, not the datapoints that OpenTSDB has received but hasn't yet written into HBase?

Right, dropcaches only affects the TSD's memory and doesn't modify the data in storage at all.
 
I meant the datapoints that OpenTSDB has already received but hasn't yet written into storage and which are thus still in memory (I remember OpenTSDB will buffer those datapoints in memory for up to 1 second?). dropcaches won't affect such datapoints, right?
 

Is there a method to list the content of the cache (the in-memory data mentioned above) to be purged by 'dropcaches'?

Not right now unfortunately. I'd like to be able to purge selected items at some point. And returning the cache data would be possible but depending on the TSD, it can be huge. (Ours are up to 10GB in size) 
Understood and agreed. If the content is that huge, listing it is probably not a good idea.

If 'dropcaches' is not used, will the cache be cleared/refreshed automatically after some timeout?

The current code simply keeps expanding the cache forever and only purges on restarts or when dropcaches is called. I've started work on converting to an LRU cache, and that could also be time-based.
Actually, I thought about this more and found that an LRU cache may not be a good idea - you cannot control when the cache is purged.
With the current cache (which keeps expanding until dropcaches is called), we can actually rename a tagv easily: (1) rename the tagv in OpenTSDB, and then (2) have the agents report datapoints with the new tagv. The cache will handle the tagv change for us, as you confirmed below. If the cache is an LRU, then we're dead if the mapping of the tagv is purged/refreshed after step (1) but before step (2).
What do you think?


After the tagk is renamed but before a 'dropcaches' command is executed, if some agents still report datapoints using the old tagk, the (not yet refreshed) cache will still map the old tagk to the correct UID (that of the new tagk), so the datapoints will actually be written into OpenTSDB under the new tagk (or rather, the UID of the new tagk), right? My test on OpenTSDB 2.2 showed so, and I would like to get it confirmed by the experts here.

Yes, you have that right, which is why renaming is a real pain. Alternatively you may want to look at tagging data with something that never changes, like an asset tag for a host. That way when a hostname changes, you can see new data after the change (since the old data really isn't relevant for the new role) or old data for the old role. But if you really want to look at the specific machine over time, just query for the asset tag. 
This is actually good for me, not a pain. :) In my case, I want the old datapoints carrying the old tagv to end up under the new tagv, so that we still know these datapoints belong to the same item.

ManOLamancha

Jan 22, 2017, 5:20:05 PM
to OpenTSDB
On Thursday, January 12, 2017 at 1:24:20 AM UTC-8, Huafeng Lyu wrote:
Thanks, ManOLamancha, for the explanation. Please see my responses inline below.

First, all these discussions (about tagv) also apply to the other UID types (metric and tagk), right?

Yes :)

Right, dropcaches only affects the TSD's memory and doesn't modify the data in storage at all.
 
I meant the datapoints that OpenTSDB has already received but hasn't yet written into storage and which are thus still in memory (I remember OpenTSDB will buffer those datapoints in memory for up to 1 second?). dropcaches won't affect such datapoints, right?

Correct. If it's in the write queue for HBase and hasn't been flushed at the 1-second interval, then dropcaches will only affect incoming data and won't modify the UIDs assigned to values already in the queue. The HBase queue contains data *after* the metric and tag strings have been converted to UIDs.
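
(If I remember right, that flush interval is controlled by the tsd.storage.flush_interval setting in the TSD config, in milliseconds; double-check the docs for your version:)

    # opentsdb.conf - write buffer flush interval in ms (default 1000)
    tsd.storage.flush_interval = 1000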


If 'dropcaches' is not used, will the cache be cleared/refreshed automatically after some timeout?

The current code simply keeps expanding the cache forever and only purges on restarts or when dropcaches is called. I've started work on converting to an LRU cache, and that could also be time-based.
Actually, I thought about this more and found that an LRU cache may not be a good idea - you cannot control when the cache is purged.
With the current cache (which keeps expanding until dropcaches is called), we can actually rename a tagv easily: (1) rename the tagv in OpenTSDB, and then (2) have the agents report datapoints with the new tagv. The cache will handle the tagv change for us, as you confirmed below. If the cache is an LRU, then we're dead if the mapping of the tagv is purged/refreshed after step (1) but before step (2).
What do you think?

True, an LRU would make this a bit less deterministic if data is coming in from the agents with the old tag while you're making the changes. But if you can completely stop the old agents for a period of time, you can:

1) Kill agents
2) Rename tag in HBase
3) Flush caches on all writers
4) Start agents with new tag value

then you should be OK regardless of whether it's an LRU cache or not, as the TSD will load the new UID mapping from HBase.
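
A hedged Python sketch of steps 2 and 3, assuming the tsdb CLI is on your PATH and using placeholder hostnames and tag values:

    # Steps 2-3 from above: rename the tag value, then drop the UID caches
    # on every writer TSD. Hostnames and tag values are placeholders.
    import subprocess
    from urllib.request import urlopen

    OLD, NEW = "old-hostname", "new-hostname"
    WRITERS = ["tsd-writer-1", "tsd-writer-2"]

    # Step 2: rename the tagv's name->UID mapping in HBase via the CLI.
    subprocess.run(["tsdb", "uid", "rename", "tagv", OLD, NEW], check=True)

    # Step 3: flush the in-memory UID caches on all writer TSDs.
    for host in WRITERS:
        with urlopen("http://%s:4242/api/dropcaches" % host) as resp:
            print(host, resp.status)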