Making the host-name the metric and the metric a tag?

John Humphreys

May 11, 2017, 9:47:17 AM
to OpenTSDB
Hey,

I need to handle 40,000 hosts, 325 metrics each, 1 point per minute.

I've managed to load the data at 240,000 data points per second pretty easily (18.7 billion data points, i.e. one day's worth, loaded at this rate). But the query speeds were not what I originally anticipated.
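For reference, the volume figures above can be checked with a few lines of arithmetic. This is only a sketch built from the numbers in this post; the variable names are illustrative.

```python
# Figures taken from the post; everything else is derived from them.
HOSTS = 40_000
METRICS_PER_HOST = 325
POINTS_PER_MINUTE = 1        # per host, per metric
INGEST_RATE = 240_000        # data points per second, as measured

series = HOSTS * METRICS_PER_HOST                      # distinct time series
points_per_day = series * POINTS_PER_MINUTE * 60 * 24  # one day's worth
steady_state_rate = series * POINTS_PER_MINUTE / 60    # points/sec produced live
hours_to_load_one_day = points_per_day / INGEST_RATE / 3600

print(f"distinct time series:        {series:,}")
print(f"data points per day:         {points_per_day:,}")
print(f"live production rate (pts/s): {steady_state_rate:,.0f}")
print(f"hours to load one day:       {hours_to_load_one_day:.1f}")
```

So one day of data is about 18.7 billion points, and the measured load rate sits a little above the roughly 217,000 points per second the hosts would produce live.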

Let's say I have 1 day's worth of data:
  • If I have 325 metrics (held as metrics) and 40,000 hosts (held as tags) -> Takes 12 seconds to query a new metric.
  • If I have 40,000 hosts (held as metrics) and 325 metrics (held as tags) -> Takes 50 milliseconds to query a new metric.
Since 325 * 40,000 = 13 million, and we'll likely end up with more than 325 metrics, I can't fold both the metric and the host into the metric name (I'd quickly pass the roughly 16 million available UIDs). So I think these are my only two options.
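The UID reasoning can be sketched as follows, assuming OpenTSDB's default 3-byte UID width (I haven't changed it; a wider UID would move the limit). Metric names, tag keys, and tag values each get their own UID space, so only the metric-name count matters here.

```python
UID_WIDTH_BYTES = 3                    # assumed: OpenTSDB default width
max_uids = 2 ** (8 * UID_WIDTH_BYTES)  # UIDs available per UID type

hosts, metrics = 40_000, 325

# Number of metric-name UIDs each layout would consume.
metric_uids_needed = {
    "metric as metric, host as tag": metrics,
    "host as metric, metric as tag": hosts,
    "host and metric combined in the metric name": hosts * metrics,
}

for layout, needed in metric_uids_needed.items():
    print(f"{layout}: {needed:,} of {max_uids:,} ({needed / max_uids:.1%})")

# Largest metrics-per-host count the combined layout could hold at 40,000 hosts.
max_metrics_combined = max_uids // hosts
print(f"combined layout overflows past {max_metrics_combined} metrics per host")
```

With these numbers the combined layout already uses about 77% of the metric UID space and would run out at 420 metrics per host.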

Is there a downside to using the host as the metric name in this manner in OpenTSDB?

I see that it messes with Grafana; but I can probably copy the OpenTSDB plugin for Grafana and modify it to work in this manner.

Extra notes:
  • Compaction disabled.
  • Appends disabled.
  • LZ4 compression (we're on MapR-DB).
  • Using 3 nodes, salting with 3 buckets.
  • Using random UID assignment.
  • Data block encoding set to FAST_DIFF, though I'm not sure if MapR-DB honors that.
Thanks!

-John

ManOLamancha

May 27, 2017, 6:04:45 PM
to OpenTSDB
Hi, there isn't anything inherently wrong with moving the host into the metric name; it just makes querying and aggregating uglier.

That said, if you're running OpenTSDB 2.3 or later, try making the same queries with "explicitTags" set to true, and make sure to add all the tags you expect to your query. That should help filter out most of the data during the scan.
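As a rough sketch, a request body for the HTTP /api/query endpoint with explicitTags enabled might be built like this. The metric name, tag key, and host value are made up for illustration, and the helper function is not part of OpenTSDB.

```python
import json

def build_query(metric, tags, start="1d-ago", aggregator="sum"):
    """Build a POST body for OpenTSDB's /api/query endpoint.

    Every tag key the series carries should appear in `tags`, because
    explicitTags only returns series whose tag keys match the filters.
    """
    filters = [
        {"type": "literal_or", "tagk": key, "filter": value, "groupBy": False}
        for key, value in sorted(tags.items())
    ]
    return {
        "start": start,
        "queries": [
            {
                "aggregator": aggregator,
                "metric": metric,
                "explicitTags": True,
                "filters": filters,
            }
        ],
    }

# Hypothetical names, used only for illustration.
body = build_query("cpu.user", {"host": "host00001"})
print(json.dumps(body, indent=2))
```

The resulting JSON would be POSTed to /api/query on the TSD. If the series carry more tag keys than "host", each one needs its own filter entry, otherwise the query will match nothing.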