Dashboards with high-cardinality tags


tr...@tuggle.org

Jul 8, 2014, 4:42:19 PM
to kairosd...@googlegroups.com
Hi all,

In my use of KairosDB, I end up with a number of high-cardinality tags. I'm not putting in IP addresses or anything, but I've got some metrics with up to 7 tags, some of which have maybe 200-2,000 possible values.

I've also been trying out the Grafana dashboard that was mentioned in the thread, "Meta chart API for docking in visualization libraries".  (The KairosDB-compatible one is at https://github.com/rdettai/grafana)  Grafana is a great start for the type of monitoring that I need.  

So, Grafana seems to have taken the approach of simply graphing the results of data source queries (essentially it is a thin wrapper around the underlying TSDB), so a number of pieces that would greatly help are missing. I'd like to discuss some of them here, because an argument could be made for solving them on either side: in Grafana or in Kairos.

Aliasing metrics.  Grafana simply names the graph traces by the name supplied by the data source.  Graphite provides an 'alias' function to help... it'd be super useful to translate metric names into English names for presenting human-friendly dashboards.

Limiting group_by tag values.  So when I have high-cardinality tags, I end up with thousands of traces on the graph.  But often I only need the top n metrics.  Wouldn't it be awesome to aggregate the rest and label them as value "(Other)"?  This feature would be so wonderfully useful that I put in a feature request.  https://github.com/kairosdb/kairosdb/issues/69
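A minimal client-side sketch of the "(Other)" idea, assuming the grouped results have already been parsed into a dict of tag value -> list of (timestamp, value) pairs, and assuming "top" means the largest total over the queried range (only one of several possible rankings). The function name and data shape are illustrative, not part of KairosDB:

```python
from collections import defaultdict


def top_n_with_other(series, n, other_label="(Other)"):
    """Keep the n series with the largest total and sum the rest into one series.

    series: dict mapping a tag value to a list of (timestamp, value) pairs.
    Returns a dict of the same shape with at most n + 1 entries.
    """
    # Rank tag values by the sum of their values over the whole range.
    ranked = sorted(series, key=lambda tag: sum(v for _, v in series[tag]), reverse=True)
    kept = {tag: series[tag] for tag in ranked[:n]}
    rest = ranked[n:]
    if rest:
        # Sum the remaining series timestamp by timestamp.
        totals = defaultdict(float)
        for tag in rest:
            for ts, value in series[tag]:
                totals[ts] += value
        kept[other_label] = sorted(totals.items())
    return kept
```

With `n=10`, the graph shows at most eleven traces no matter how many distinct tag values the group_by produces.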

Converting to pure time-series. My metrics are entirely data-driven. We don't have a fixed number of metrics with a fixed number of tags; the tag values change to reflect the underlying data. The metric values always represent a count of events. Unfortunately, that makes it infeasible for my metrics collection system to 'fill in' missing values with 0, so Grafana blows up when trying to stack graphs containing missing values. This would be easily overcome if Kairos had a way to fill in missing points with zero, the last observation, or the next observation. In other words, every result would be forced to have *some* value at each time step. I suppose for completeness interpolation could be an option, but it could end up eating more CPU than the other modes.
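The three fill modes can be sketched as follows, assuming a series is a dict of timestamp -> value whose timestamps already sit on a fixed step grid. `fill_missing` and its mode names are hypothetical, not KairosDB options:

```python
def fill_missing(points, start, end, step, mode="zero"):
    """Return one (timestamp, value) pair per step in [start, end), filling gaps.

    points: dict mapping timestamp -> value, timestamps already on the step grid.
    mode: "zero", "previous" (last observation) or "next" (next observation).
    Steps with nothing to carry forward/backward stay None.
    """
    grid = list(range(start, end, step))
    if mode == "zero":
        return [(ts, points.get(ts, 0)) for ts in grid]
    if mode == "previous":
        order = grid
    elif mode == "next":
        order = list(reversed(grid))  # walk backwards to carry the next value
    else:
        raise ValueError(f"unknown mode: {mode}")
    filled, carry = {}, None
    for ts in order:
        if ts in points:
            carry = points[ts]
        filled[ts] = carry
    return [(ts, filled[ts]) for ts in grid]
```

Zero-fill suits counts of events; the carry modes suit gauges, where a missing sample usually means "unchanged" rather than "nothing happened".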

Each of these "problems" could easily be solved in Grafana. However, any other dashboarding solution would end up implementing the same things, which is why I thought it worthwhile to ask here whether some of these should be considered as KairosDB features.

So how does everyone else use Kairos?  Does everyone end up building entirely custom dashboards for their data?  Unfortunately, I don't have the resources to do that.

Thanks in advance,

Trenton Tuggle

Kevin Burton

Jul 8, 2014, 5:59:42 PM
to kairosd...@googlegroups.com

> Aliasing metrics. Grafana simply names the graph traces by the name supplied by the data source. Graphite provides an 'alias' function to help... it'd be super useful to translate metric names into English names for presenting human-friendly dashboards.


Agreed. We're not going down the Grafana route just yet, but it does look amazingly interesting.

We have our own console that we're working on; it's mostly just a thin layer on top of the KairosDB queries.

What I did here is create a side index: basically just a JSON structure with a title/description and the related/important tags for a given metric.
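A side index like that might look as follows. This is a sketch under assumptions: the metric name is made up, the fields mirror the title/description/tags described above, and `display_name` is a hypothetical helper that turns a metric plus its group-by tags into a readable trace label:

```python
import json

# Hypothetical side index: metric name -> display metadata.
# Field names mirror the post (title, description, important tags);
# the metric itself is invented for illustration.
SIDE_INDEX_JSON = """
{
  "app.requests.count": {
    "title": "Requests served",
    "description": "Number of HTTP requests handled per interval",
    "tags": ["host", "status"]
  }
}
"""


def display_name(side_index, metric, group_tags=None):
    """Return a human-friendly label, falling back to the raw metric name."""
    entry = side_index.get(metric, {})
    label = entry.get("title", metric)
    if group_tags:
        # Show only the tags the side index marks as important;
        # with no entry, show every group-by tag.
        important = entry.get("tags", list(group_tags))
        parts = [f"{k}={group_tags[k]}" for k in important if k in group_tags]
        if parts:
            label += " (" + ", ".join(parts) + ")"
    return label


side_index = json.loads(SIDE_INDEX_JSON)
```

Metrics missing from the index still render, just under their raw names, so the index can be filled in gradually.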

This way 

 
> Limiting group_by tag values. So when I have high-cardinality tags, I end up with thousands of traces on the graph. But often I only need the top n metrics. Wouldn't it be awesome to aggregate the rest and label them as value "(Other)"? This feature would be so wonderfully useful that I put in a feature request. https://github.com/kairosdb/kairosdb/issues/69

I've talked about this too. I 100% agree that it would be valuable.

The argument against it though is that KairosDB isn't really a graphing framework.

But perhaps a LIMIT on the query makes sense... 

Also, in our implementation, I'm sorting the tags so that the "top N" come first, not just the first N. The difference is that I can apply an "interestingness" function to the data and return the most important metrics first.

So if you're an ops guy trying to track down a bug, you see the data anomalies first.
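The post doesn't say what the "interestingness" function is. As a stand-in, the sketch below scores each series by how many standard deviations its latest value sits from the series mean (a plain z-score) and returns the top N tag values by that score. The scoring choice and both function names are assumptions:

```python
import statistics


def interestingness(values):
    """Hypothetical score: distance of the latest value from the series mean,
    in population standard deviations. Flat or tiny series score 0."""
    if len(values) < 2:
        return 0.0
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return 0.0
    return abs(values[-1] - statistics.fmean(values)) / stdev


def rank_series(series, n):
    """Return the n tag values whose series score highest, most interesting first.

    series: dict mapping a tag value to a list of numeric values in time order.
    """
    return sorted(series, key=lambda tag: interestingness(series[tag]), reverse=True)[:n]
```

Any other score (rate of change, deviation from last week) can be dropped into `interestingness` without touching the ranking.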
 

> Converting to pure time-series. My metrics are entirely data-driven. We don't have a fixed number of metrics with a fixed number of tags; the tag values change to reflect the underlying data.

Same here.

I think this is going to be a trend with KairosDB: if you have a fixed number of metrics and you don't need tags, just use RRD or Ganglia or something else. They work pretty well, actually.

In our case, everything is driven by the data, so I'm using tags really aggressively.
 
> The metric values always represent a count of events. Unfortunately, that makes it infeasible for my metrics collection system to 'fill in' missing values with 0, so Grafana blows up when trying to stack graphs containing missing values. This would be easily overcome if Kairos had a way to fill in missing points with zero, the last observation, or the next observation. In other words, every result would be forced to have *some* value at each time step. I suppose for completeness interpolation could be an option, but it could end up eating more CPU than the other modes.


YEAH. There was a discussion in another thread about this. Essentially, what was discussed is a custom tag on EVERY metric saying that the metric is broadcast at a regular interval, say every 15 seconds. If there is no value for an interval, it is NOT zero but a gap.

The advantage of this is that in the UI you will see a gap.  Highcharts, Flot, and Google Charts all support this. Not sure about Grafana.
 
Note that you can do this in JavaScript: you just have to factor in the time range of the rollup and insert a 'null' value wherever a point is missing.
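A Python port of that client-side approach, assuming points arrive sorted as (timestamp, value) pairs and that `interval` is the rollup interval in the same units. Serialized to JSON, `None` becomes `null`. The function name is illustrative:

```python
def insert_gaps(points, interval):
    """Insert a None marker after any point followed by a gap longer than one interval.

    points: (timestamp, value) pairs sorted by timestamp. When serialized to JSON,
    None becomes null, which gap-aware chart libraries draw as a break in the line.
    """
    out = []
    for i, (ts, value) in enumerate(points):
        prev_ts = points[i - 1][0] if i > 0 else None
        if prev_ts is not None and ts - prev_ts > interval:
            # One null in the first missing slot is enough to break the line.
            out.append((prev_ts + interval, None))
        out.append((ts, value))
    return out
```

One marker per gap keeps the payload small; a chart that needs every slot filled would need the zero/carry fill instead.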
 
> So how does everyone else use Kairos? Does everyone end up building entirely custom dashboards for their data? Unfortunately, I don't have the resources to do that.


It depends on what you mean by 'dashboard'. We're building out a simple web interface that literally just embeds the charts. It's pretty simple, really.

It makes the API calls and then just embeds the chart.

For anything more complicated, it seems like Grafana or a more advanced console/dashboard would be required.  

We'll probably investigate KairosDB+Grafana+Tcollector for internal stats.

Kevin
 

Loic Coulet

Jul 9, 2014, 4:03:12 PM
to kairosd...@googlegroups.com
Hi,

1 - Aliasing metrics.

This is something we're also missing. Graphite and InfluxDB have the feature, and we deliberately kept the Grafana logic as close as possible to KairosDB's features. Having it in KairosDB itself (aliasing the metric in the query) would be nice, since all other clients would benefit from it.
Maybe another feature request?

2 - Limiting group_by tag values.


I also agree on this one. It can be worked around by defining filters, but building the query becomes too complicated.
The question is what you have in mind for ranking the top N metrics:
  • by the number of series using the same tag?
  • by the number of points in each group?
  • by the values of the points in each group?
  • or another idea?

For example, we have a custom version of KairosDB in which we implemented what we call "vertical aggregators": these aggregators perform calculations across distinct groups (the existing aggregators in KairosDB are the equivalent of OpenTSDB's downsampling aggregators). It is perfectly possible to imagine a vertical aggregator named TOP or BOTTOM that would keep the top or bottom groups, either by value or by tag cardinality.
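To make the TOP/BOTTOM idea concrete, here is a sketch of the three ranking criteria from the list above applied across groups. It is not the vertical aggregator described in the post (that code isn't shown in the thread); the data shape, function name, and parameter names are assumptions:

```python
def select_groups(groups, n, by="value", bottom=False):
    """Hypothetical TOP/BOTTOM selection across groups.

    groups: dict mapping a group key (tag value) to a list of series,
            each series being a list of numeric values.
    by: "series" (how many series share the tag value),
        "points" (total number of points in the group) or
        "value"  (sum of all point values in the group).
    """
    scores = {
        "series": lambda g: len(g),
        "points": lambda g: sum(len(s) for s in g),
        "value": lambda g: sum(sum(s) for s in g),
    }
    if by not in scores:
        raise ValueError(f"unknown criterion: {by}")
    score = scores[by]
    # Highest scores first for TOP, lowest first for BOTTOM.
    ranked = sorted(groups, key=lambda key: score(groups[key]), reverse=not bottom)
    return ranked[:n]
```

Each criterion can pick a different winner from the same data, which is why the choice of ranking matters for the feature request.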

 
3 - Converting to pure time-series.


Also a nice catch. We have done that by implementing two custom aggregators.
  • One is "interpolation": it does linear interpolation if a point is missing in a "bucket".
  • Another is "drops": it inserts null values when there is no value in a "bucket" (unfortunately, with the current design they cannot be chained together in a useful way).
  • A "bucket" is defined by the sampling rate provided to the aggregators.
  • I have published the drops aggregator on my GitHub; feel free to reuse it. I can also send a PR if you like.
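The interpolation aggregator itself isn't included in the thread, so the following is only an independent sketch of the behaviour described: given per-bucket values, fill each empty bucket by linear interpolation between its nearest non-empty neighbours, and leave leading or trailing empty buckets as `None` (like the "drops" behaviour). Names and data shape are hypothetical:

```python
import bisect


def interpolate_buckets(points, start, end, bucket):
    """Return one (bucket_start, value) pair per bucket in [start, end).

    points: dict mapping bucket start timestamp -> value (already aggregated
    per bucket). Only points inside [start, end) are used. Empty buckets
    before the first or after the last known point stay None, because there
    is nothing to interpolate between.
    """
    grid = list(range(start, end, bucket))
    known = sorted(ts for ts in points if start <= ts < end)
    result = []
    for ts in grid:
        if ts in points:
            result.append((ts, points[ts]))
            continue
        i = bisect.bisect_left(known, ts)  # known[i - 1] < ts < known[i]
        if i == 0 or i == len(known):
            result.append((ts, None))
            continue
        t0, t1 = known[i - 1], known[i]
        v0, v1 = points[t0], points[t1]
        result.append((ts, v0 + (v1 - v0) * (ts - t0) / (t1 - t0)))
    return result
```

Only the edges stay `None` here, which illustrates why the two behaviours are hard to chain: once interpolation has run, the interior gaps the "drops" step would mark are already gone.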
This is something we mostly use for some data analysis by getting a normalized sampling rate. It is also useful to get stacked charts working in Grafana (Grafana requires aligned sampling rates to stack charts).
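One way to get aligned series, assuming stacking needs points at identical timestamps as described above, is to snap every timestamp down to the start of its sampling bucket and sum points that share a bucket. A sketch with a hypothetical `align` helper and millisecond timestamps:

```python
def align(points, interval):
    """Snap timestamps down to the start of their sampling bucket.

    points: list of (timestamp_ms, value). Points landing in the same bucket
    are summed, so two series sampled a few ms apart end up on identical
    timestamps.
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % interval
        buckets[start] = buckets.get(start, 0) + value
    return sorted(buckets.items())
```

Two series sampled one millisecond apart, at 1000/61000 ms and 1001/61001 ms, both land on buckets 0 and 60000 with a one-minute interval.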

Otherwise, KairosDB's regular behaviour, which doesn't impose any sampling rate, is the best fit for 90% of our use cases.

4 - Feedback / other ideas?

It's good to have this discussion open to share use cases and best choices or implementations.
 
We can contribute some (or maybe all, if my boss is OK with it) of our customized features back to the community.


Loic

Kevin Burton

Jul 9, 2014, 4:47:03 PM
to kairosd...@googlegroups.com

> 1 - Aliasing metrics.

> This is something we're also missing. Graphite and InfluxDB have the feature, and we deliberately kept the Grafana logic as close as possible to KairosDB's features. Having it in KairosDB itself (aliasing the metric in the query) would be nice, since all other clients would benefit from it.
> Maybe another feature request?


I wonder if this could be as easy as having an "AS" clause in the query whose value is carried through into the result.

This way you could include an alias/title in the query.
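With such a clause, the query body might look like the sketch below. The `alias` key is hypothetical (it is a proposal in this thread, not an existing KairosDB query field); `start_relative`, `metrics`, and `name` follow the REST query JSON. Until a server supports it, a client could strip the key before sending and re-attach it to the results itself:

```python
import copy


def build_query(metric, alias, hours=1):
    """Build a KairosDB query body carrying a hypothetical "alias" field."""
    return {
        "start_relative": {"value": hours, "unit": "hours"},
        "metrics": [{"name": metric, "alias": alias}],
    }


def without_aliases(query):
    """Copy of the query with the hypothetical field removed, for servers
    that don't know about it."""
    stripped = copy.deepcopy(query)
    for metric in stripped["metrics"]:
        metric.pop("alias", None)
    return stripped


def apply_aliases(query, response):
    """Copy each metric's alias onto its results, falling back to the name.

    Assumes response["queries"][i] is the answer to query["metrics"][i].
    """
    for metric, answer in zip(query["metrics"], response["queries"]):
        for result in answer["results"]:
            result["alias"] = metric.get("alias") or result["name"]
    return response
```

Doing this server-side instead would let every client, not just one dashboard, show the friendly name.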
> • One is "interpolation": it does linear interpolation if a point is missing in a "bucket".
> • Another is "drops": it inserts null values when there is no value in a "bucket" (unfortunately, with the current design they cannot be chained together in a useful way).
This is one I really want as well.  
 
> • A "bucket" is defined by the sampling rate provided to the aggregators.
> • I have published the drops aggregator on my GitHub; feel free to reuse it. I can also send a PR if you like.

I think it should become part of the core, so I'd suggest a PR if you feel the code is ready for that.

Part of the problem is that the UI is very confusing when it just draws a straight line between the two points on either side of a gap.

So definitely a +1 from my perspective.
 
> This is something we mostly use for some data analysis by getting a normalized sampling rate. It is also useful to get stacked charts working in Grafana (Grafana requires aligned sampling rates to stack charts).


Oh... hm. So if two points are off by 1 ms, it won't work? That's interesting.
 
> We can contribute some (or maybe all, if my boss is OK with it) of our customized features back to the community.


Great. We're working on catching up here and will probably use Grafana for our own internal metrics, kind of like a replacement for the Ganglia UI.

Loic Coulet

Jul 11, 2014, 2:03:12 AM
to kairosd...@googlegroups.com
I realize I had already submitted a PR for the gaps aggregator: https://github.com/kairosdb/kairosdb/pull/55

Loic

Brian Hawkins

Jul 17, 2014, 6:38:38 PM
to kairosd...@googlegroups.com
I like the alias idea, that should be pretty simple to implement.  Make a feature request in github so it isn't lost.

Brian

Kevin Burton

Jul 18, 2014, 1:29:54 PM
to kairosd...@googlegroups.com
I'll create a ticket for it... So I assume the general idea is to have another field in the query like:

ALIAS='' or AS=''

which would be a title/description? I wonder if the alias should have fields, like title, description, and tags; then Grafana could process those as metadata too.