How to decide between metric names and tags as identifiers


Morten Lund

Jan 7, 2021, 8:17:35 AM
to KairosDB
Hi!

We are currently trying to store a very large number of metrics with timeseries data.

We currently have ~5000 metrics.
For each of those metrics we have millions of datapoints.
Each datapoint has two tags:
  • id - ID is unique within the metric
  • quality - 0 or 1 based on some criteria

I am starting to question our approach.

Would it be better if we named the metrics using <metric_name>.<id> instead?

This way we would have more metrics, but rely less on tags?
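To make the two layouts concrete, here is a minimal sketch of what a single datapoint could look like under each one. It assumes the JSON body shape of the KairosDB REST endpoint `POST /api/v1/datapoints` (a list of objects with `name`, `datapoints`, and `tags`); the metric name `temperature`, the ID `42`, and the helper function names are hypothetical and only serve the illustration.

```python
import json


def tagged_payload(metric_name, data_id, quality, timestamp_ms, value):
    """Layout A: one metric name, with the ID carried as a tag."""
    return {
        "name": metric_name,
        "datapoints": [[timestamp_ms, value]],
        "tags": {"id": str(data_id), "quality": str(quality)},
    }


def named_payload(metric_name, data_id, quality, timestamp_ms, value):
    """Layout B: the ID folded into the metric name, quality kept as a tag."""
    return {
        "name": f"{metric_name}.{data_id}",
        "datapoints": [[timestamp_ms, value]],
        "tags": {"quality": str(quality)},
    }


if __name__ == "__main__":
    ts = 1610007455000  # hypothetical timestamp in milliseconds
    # The request body is a JSON list of metric objects.
    print(json.dumps([tagged_payload("temperature", 42, 1, ts, 21.5)], indent=2))
    print(json.dumps([named_payload("temperature", 42, 1, ts, 21.5)], indent=2))
```

Layout B keeps the `quality` tag, so each metric still carries a tag, but the number of distinct tag values per metric drops from thousands to two.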

I am not sure what approach gives the best performance and storage/index usage.

Any suggestions are highly welcome! :)

Morten

Loic Coulet

Jan 8, 2021, 2:18:22 AM
to KairosDB
Hi,
IMO it really depends on your use case, but there are some clues.

Tag cardinality is manageable as long as it stays within the same order of magnitude, but it will slow down queries.

Now the question is: is it useful for you to group by ID instead of using the metric names?
You have too many series but not enough combinations to take advantage of this ID tag (unless you really want to query all 5000 series at once without bothering about ids).
If some of the series come from the same sources (e.g. location, system, version, provider...) then you may want to add more tags so you can run simple queries like "metric where provider=toto, grouped by id" to compare metrics that share similarities. Beware of a combinatorial explosion of series if you do so.

If you don't, having so many values for the id tag brings you no benefit, and you can get better query performance by using the <metric_name>.<id> pattern.
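As a rough sketch of how the query side differs, the functions below build request bodies for the KairosDB query endpoint (`POST /api/v1/datapoints/query`). The body shape (`start_relative`, `metrics`, `tags`, `group_by`) follows the REST API as I understand it; the function names and default time range are hypothetical.

```python
def query_by_tag(metric_name, data_id, hours=1):
    """Tag layout: select one ID by filtering on the `id` tag."""
    return {
        "start_relative": {"value": hours, "unit": "hours"},
        "metrics": [{"name": metric_name, "tags": {"id": [str(data_id)]}}],
    }


def query_grouped_by_id(metric_name, hours=1):
    """Tag layout: fetch every ID of a metric in one query, one group per ID."""
    return {
        "start_relative": {"value": hours, "unit": "hours"},
        "metrics": [
            {
                "name": metric_name,
                "group_by": [{"name": "tag", "tags": ["id"]}],
            }
        ],
    }


def query_by_name(metric_name, data_id, hours=1):
    """Name layout: the ID is part of the metric name, so no tag filter is needed."""
    return {
        "start_relative": {"value": hours, "unit": "hours"},
        "metrics": [{"name": f"{metric_name}.{data_id}"}],
    }
```

Only the tag layout supports the grouped query in a single request; with the name layout you would issue one metric entry per ID instead.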

Morten Lund

Jan 13, 2021, 4:37:57 AM
to KairosDB
Thank you for the reply!

Yes, we have "few" metric names (5000), few tag names (2), but a huge number of tag values (1000-30000) for each of the metric names.
And for each metric name we have millions of datapoints, where each point is tagged with its "data-id" and "data-quality" tags.

And I see that there is an endpoint for getting all metric names based on prefix, so this would also still give us the possibility to get all data point IDs from the metric names.

<location_id>.<data_point_id>
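A minimal sketch of that idea, assuming the metric name listing endpoint is `GET /api/v1/metricnames` with a `prefix` query parameter and that it returns the names in a `results` list; both should be checked against the KairosDB version in use. The helper names and the example location `plant7` are hypothetical.

```python
from urllib.parse import urlencode


def metricnames_url(base_url, prefix):
    """Build the URL that lists metric names starting with `prefix` (assumed endpoint)."""
    return f"{base_url.rstrip('/')}/api/v1/metricnames?{urlencode({'prefix': prefix})}"


def ids_from_names(names, location_id):
    """Recover the data point IDs from names shaped like <location_id>.<data_point_id>."""
    prefix = f"{location_id}."
    return [name[len(prefix):] for name in names if name.startswith(prefix)]


if __name__ == "__main__":
    print(metricnames_url("http://localhost:8080", "plant7."))
    # `names` stands in for the "results" list of the (assumed) JSON response.
    names = ["plant7.a1", "plant7.b2", "plant70.c3"]
    print(ids_from_names(names, "plant7"))
```

Including the trailing dot in the prefix keeps `plant70.c3` from being mistaken for an ID under `plant7`.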

Is there any difference between the approaches when it comes to how KairosDB stores the data?
Disk usage? Actual performance difference?

Which layout does KairosDB prefer for inserts and queries: many metrics, or few metrics with lots of tags?

Thank you!

Morten Lund

Jan 13, 2021, 4:39:45 AM
to KairosDB

By this:

"And I see that there is an endpoint for getting all metric names based on prefix, so this would also still give us the possibility to get all data point IDs from the metric names."

I meant if we chose to put the IDs in the metric names instead of in tags.

Brian Hawkins

Apr 23, 2021, 10:42:16 AM
to KairosDB
This question does get asked a lot so I've prepared a packaged answer: https://github.com/kairosdb/kairosdb/wiki/Query-Performance

The real gotcha is that Kairos will take the data either way and consume it just fine. The problem arises when you try to query it out. If you are always going to specify a data-id when querying the data, then put it in the metric name. This will make queries a lot faster. The upcoming release does have an index option, so you could index the data-id tag and the performance would be almost the same. The index does create an extra insert when writing data so that it can be maintained.
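A back-of-the-envelope sketch of why this matters, using the numbers from this thread (1000-30000 IDs per metric, two quality values). It assumes a simplified model in which, without a tag index, a query has to consider every series stored under the queried metric name before filtering on tags; the function name is hypothetical and the model ignores time bucketing.

```python
def series_considered(ids_per_metric, quality_values=2, id_in_name=False):
    """Upper bound on series sharing the queried metric name (simplified model)."""
    if id_in_name:
        # <metric_name>.<id>: only the quality tag distinguishes series.
        return quality_values
    # ID as a tag: every (id, quality) combination lives under one metric name.
    return ids_per_metric * quality_values


if __name__ == "__main__":
    for ids in (1_000, 30_000):
        print(ids, series_considered(ids), series_considered(ids, id_in_name=True))
```

Under this model a single-ID lookup touches at most two series with the name layout, versus up to tens of thousands with the tag layout.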

Brian
