explicitTags efficient filters on big table

Uri Okrent

Apr 13, 2021, 10:34:26 AM

to OpenTSDB

Hi, general question. I stumbled upon this passage in the documentation:

"""

As of 2.3 and later, if you know all of the tag keys for a given metric query latency can be improved greatly by using the explicitTags feature.

...

Explicit tags will craft an underlying storage query that fetches only those rows with the given tag keys. That can allow the database to skip over irrelevant rows and answer in less time.

"""

(https://opentsdb.net/docs/build/html/user_guide/query/filters.html#explicit-tags)

The reference to "underlying storage" generally implies HBase, correct? Will I get similar performance improvements when using Cloud Bigtable as the underlying storage?

Thanks

Uri Okrent

Aug 17, 2022, 11:21:29 AM

to OpenTSDB

I've tested this out myself, and in my use case, enabling explicit tags seems to impact performance *negatively*. Is that expected? Is there any reason why that might be the case? Also, somewhat related: queries seem to take the same amount of time regardless of whether or not I have the fuzzy filter enabled. I am querying a metric with two tags, one filtered to an explicit value (literal_or with a single value) and the other a plain wildcard ("*").
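For reference, a query of the shape described above might look roughly like the following sketch. It builds a request body for the /api/query endpoint using the 2.3+ filter syntax. The metric name (sys.cpu.user), the tag keys (host, dc), the tag value (web01), and the time range are hypothetical placeholders, not taken from the actual setup.

```python
import json


def build_query(metric, host_value, explicit_tags):
    """Build a request body for POST /api/query (OpenTSDB 2.3+ filter syntax).

    The tag keys "host" and "dc" are hypothetical placeholders.
    """
    return {
        "start": "1h-ago",
        "queries": [
            {
                "aggregator": "sum",
                "metric": metric,
                # Tells OpenTSDB that the filters below name every tag key
                # present on the series we want.
                "explicitTags": explicit_tags,
                "filters": [
                    # One tag pinned to a single literal value.
                    {"type": "literal_or", "tagk": "host",
                     "filter": host_value, "groupBy": False},
                    # The other tag left as a plain wildcard.
                    {"type": "wildcard", "tagk": "dc",
                     "filter": "*", "groupBy": True},
                ],
            }
        ],
    }


body = build_query("sys.cpu.user", "web01", explicit_tags=True)
print(json.dumps(body, indent=2))
```

Sending the same body with explicit_tags=False gives a like-for-like comparison of the two modes.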

一个股民的自我修养

Apr 14, 2023, 3:26:59 PM

to OpenTSDB

It depends on the data your queries touch. If the data is not stored contiguously on disk, the OS has to read more pages than necessary, so performance suffers. So we have to look at how the data is physically laid out on disk. OpenTSDB is built on top of HBase, which orders data by row key. In OpenTSDB's design, a data point's row key looks something like this:

metric uid + partial timestamp + tag1 <key,value> + tag2 <key, value> + ....

So if your query contains a tag value of '*', the OS has to scan all of the related rows. If, unfortunately, the tag in your query is not the first tag in physical storage, then many of the scanned rows are unrelated to your query. As you can see, even with explicit tag values you will still have such performance issues.
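The ordering argument above can be illustrated with a toy model. This is only a sketch: it uses readable strings instead of the fixed-width UIDs a real row key contains, covers a single base hour, and says nothing about how HBase or Bigtable actually execute a scan. The metric, tag keys, and tag values are made up.

```python
from itertools import product

hosts = ["web01", "web02", "web03"]
dcs = ["east", "west"]

# Toy row keys in the layout described above:
# (metric, base hour, first tag pair, second tag pair).
# Sorting the tuples mimics the storage ordering by row key.
rows = sorted(
    ("sys.cpu.user", 0, ("dc", dc), ("host", host))
    for dc, host in product(dcs, hosts)
)


def scan_stats(rows, predicate):
    """Return (rows_spanned, rows_matched) for the tightest contiguous
    range of sorted rows that covers every matching row."""
    idx = [i for i, row in enumerate(rows) if predicate(row)]
    if not idx:
        return 0, 0
    return idx[-1] - idx[0] + 1, len(idx)


# Filter on the first tag in the key: matches are adjacent.
print(scan_stats(rows, lambda r: r[2] == ("dc", "east")))     # (3, 3)

# Filter on the second tag, first tag wildcarded: matches are scattered,
# so the covering range includes rows the query does not need.
print(scan_stats(rows, lambda r: r[3] == ("host", "web01")))  # (4, 2)
```

In the second case half of the rows in the covering range are irrelevant, which is the effect described above; with more distinct values for the first tag the gap grows.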

FYI, we are developing a TSDB, TickTockDB (https://github.com/ytyou/ticktock), compatible with the OpenTSDB APIs but with more than 50x better performance than OpenTSDB, after years of pain and frustration with OpenTSDB. Give it a try if you are interested.

thanks
