OpenTSDB Supported Grafana Dashboards

nwhitehead

Oct 13, 2016, 4:14:02 PM

to OpenTSDB

I was browsing the new[ish] Grafana.net dashboard repository at https://grafana.net/dashboards.

Sad to say, there are no OpenTSDB supported dashboards listed.

I am going to work on cleaning up some of the dashboards I use with OpenTSDB and will submit a small handful.

I figure some of you may have some Grafana dashboards you might be willing to contribute.

Here's a list of ideas I intend to start on [mostly driven by what I use]:

Host/Multi-Host Status driven by tcollector/scollector

Drill down into CPU and processes and memory
Drill down into disks and file systems
Drill down into Network and interface stats

TSDB/Multi-TSDB Status driven by OpenTSDB's internal metrics (drilldowns ?)
Basic HBase Status driven by OpenTSDB's internal metrics (drilldowns ?)
Advanced HBase Status driven by JMX and/or HBase's HTTP published JSON JMX metrics (drilldowns ?)
ZooKeeper Status driven by Quorum's published JMX stats (drilldowns ?)

Other ones in early development:

Oracle Instance and RAC / PostgreSQL Instance Status (currently driven by JDBC-accessible stats from a gcollector [like tcollector, but Groovy driven] script)

Tons of drill downs

Kafka instance and cluster Status driven by JMX stats

Many drill downs

WebSphere MQ instance Status driven by gcollector and PCF API

Many drill downs

DropWizard/JMX stats (custom collectors). Nowhere near a standard, since this could go anywhere, but there's some work going into more dynamic dashboards which might support some sort of useful "meta" dashboard packaging.

One of the challenges in creating standard re-usable dashboards is that the data collectors must use standard metric names and tags, so tcollector and scollector stats are good drivers, since they use opinionated and consistent naming conventions. A dashboard relying on non-standard data collectors would need to be accompanied by a minimal guide on how to implement the collection.
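To make the naming point concrete, here is a minimal Python sketch of how a shared dashboard could ship with a small "contract" listing the metrics and tag keys it expects, plus a check against collector output in the OpenTSDB-style `metric timestamp value tagk=tagv` line format. The contract, the function names, and the metric names in it are illustrative assumptions, not part of any collector or of Grafana.

```python
# Hypothetical contract a dashboard might publish: the metric names it
# queries and the tag keys each metric must carry. Illustrative names only.
DASHBOARD_CONTRACT = {
    "proc.loadavg.1min": {"host"},
    "df.bytes.used": {"host", "mount"},
}


def parse_line(line):
    """Parse 'metric timestamp value tagk=tagv ...' into a tuple."""
    parts = line.split()
    if len(parts) < 3:
        raise ValueError(f"too few fields: {line!r}")
    metric, timestamp, value = parts[0], int(parts[1]), float(parts[2])
    # Each remaining field is expected to look like tagk=tagv.
    tags = dict(p.split("=", 1) for p in parts[3:])
    return metric, timestamp, value, tags


def check_contract(lines, contract=DASHBOARD_CONTRACT):
    """Return a list of problems; an empty list means the data fits."""
    seen = {}  # metric name -> set of tag keys observed for it
    for line in lines:
        metric, _, _, tags = parse_line(line)
        seen.setdefault(metric, set()).update(tags)
    problems = []
    for metric, required in sorted(contract.items()):
        if metric not in seen:
            problems.append(f"missing metric: {metric}")
            continue
        missing = required - seen[metric]
        if missing:
            problems.append(f"{metric} lacks tags: {', '.join(sorted(missing))}")
    return problems
```

Running `check_contract` over a sample of a collector's output before importing a dashboard would show quickly whether the panels have any chance of finding data.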

Lastly, the Grafana auto-loading template variables (a feature that takes dashboards from interesting up to super-useful) require that tsd.core.meta.enable_realtime_ts be enabled, which I believe still has some performance implications. Interested to know: do you enable realtime_ts, or deliberately disable it?

Anyone else interested?

//Nicholas

Izak Marais

Oct 20, 2016, 6:20:52 AM

to OpenTSDB

Hi Nicholas,

We would be interested in dashboards (although we are still on Grafana 2.6).

We are currently load testing our distributed OpenTSDB after moving from a single node. The tsd.core.meta.enable_realtime_ts setting is really a problem: we cannot reliably insert more than 17k dps into our 4-node cluster with it enabled. After disabling it we easily handle three times as much. This is probably aggravated by the fact that we don't know how to pre-split the tsdb-meta table, while we could pre-split the tsdb table (the tsdb-meta key space is an undocumented black box). However, we really want those Grafana template variable features!

I will make a separate post to discuss the options (commandline tsdb uid metasync vs. tsd.core.meta.enable_realtime_ts) and hear people's suggestions.

Regards

Izak

nwhitehead

Nov 4, 2016, 8:09:51 PM

to OpenTSDB

I empathize. Saw the same issue. I am seeing some encouraging results switching to using an RTPublisher. Stay with me now.... The publisher asynchronously dispatches through a disruptor ring buffer, then checks a tsuid cache (a chronicle map) to see if the metric/tags are known. If not, the publication is redispatched and written to a chronicle queue, where it is picked up by an external process which indexes the metric/tags in a Postgres db. So it's basically a search plugin minus the overhead (excepting about 2-10 microseconds on each publish and some extra CPU utilization).

On top of that is a service that pretends to be a Graphite server but simply looks things up in the db; it handles the templating lookups from Grafana.
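For anyone trying to picture the flow, here is a rough Python sketch of the pattern only, not the actual plugin (which is Java and is not shown in this thread). Everything in it is a stand-in chosen so that it runs with the standard library: `queue.Queue` replaces the disruptor ring buffer and chronicle queue, a plain `set` replaces the chronicle-map tsuid cache, and `sqlite3` replaces Postgres. It also collapses the two dispatch stages and the external indexer into one worker thread. The class and method names are hypothetical.

```python
import queue
import sqlite3
import threading


class MetaIndexingPublisher:
    """Sketch: keep series indexing off the write path."""

    def __init__(self, db_path=":memory:"):
        self._pending = queue.Queue()   # stand-in for ring buffer + queue
        self._known = set()             # stand-in for the tsuid cache
        self._lock = threading.Lock()   # guards the shared db connection
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS series_tags "
            "(metric TEXT, tagk TEXT, tagv TEXT, UNIQUE(metric, tagk, tagv))"
        )
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, metric, tags):
        """Called on the write path: enqueue the series and return."""
        self._pending.put((metric, tuple(sorted(tags.items()))))

    def _run(self):
        while True:
            item = self._pending.get()
            try:
                # Only series not seen before reach the database.
                if item is not None and item not in self._known:
                    metric, tags = item
                    with self._lock, self._db:
                        self._db.executemany(
                            "INSERT OR IGNORE INTO series_tags VALUES (?, ?, ?)",
                            [(metric, k, v) for k, v in tags],
                        )
                    self._known.add(item)
            finally:
                self._pending.task_done()
            if item is None:  # sentinel from close()
                return

    def flush(self):
        """Block until everything published so far has been indexed."""
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._worker.join()

    def tag_values(self, metric, tagk):
        """The kind of lookup a templating front end would issue."""
        with self._lock:
            rows = self._db.execute(
                "SELECT DISTINCT tagv FROM series_tags "
                "WHERE metric = ? AND tagk = ? ORDER BY tagv",
                (metric, tagk),
            ).fetchall()
        return [r[0] for r in rows]
```

The design point is that `publish` only pays for an enqueue, and repeat series are filtered by the in-memory cache, so the database sees one write per new series rather than one per data point.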

So far so good. I am not handling a massive load, but it's decent: I have had about 200 Windows and Linux servers running scollector and JMX/MQ/Oracle samplers, and we're getting by OK with ridiculously low hardware resources (real hardware on order...)

ManOLamancha

Dec 20, 2016, 4:15:07 PM

to OpenTSDB

If you get a chance to share your plugin, that'd be awesome! We also have a Storm-based meta facility that does something similar, which I'm hoping to get our team to open source.
