Central OpenTSDB with async data from many InfluxDB instances


Ralf

Aug 12, 2016, 7:23:18 AM
to OpenTSDB
Hi,

I must admit that I am not yet very familiar with either OpenTSDB or InfluxDB.

I am currently in the architecture phase of a large farm of Debian VMs, each of which will host a number of Linux containers.

The VMs are spread across different hosting locations, different layer 2 subnets, etc., all without any common network infrastructure and with no VPN.

For monitoring certain metrics and logs, I am considering running InfluxDB (along with fluentd) locally as a container on each VM. This container would be the datastore for metrics and syslog data from the other containers on the same VM and from the VM itself. It will store application logs and performance data, system (OS) performance data, and also be the syslog target. Each data source (on the VM itself or in other local containers) will use an alias to reach the VM-local fluentd/InfluxDB container. The alias will always point to the local InfluxDB/fluentd instance, so if a container moves, its data may end up spread across different InfluxDB instances. The main reason for choosing InfluxDB here is that it is lightweight and easy to deploy and maintain compared to OpenTSDB. I do not want to analyze the data directly from there.

For graphing and central analysis, I want to ship the data from all InfluxDB instances to a central OpenTSDB with a Grafana frontend, backed by an existing large HBase cluster.

I just wanted to check whether this sounds reasonable and whether there are any existing solutions that allow an asynchronous pull (push is not an option!) of InfluxDB data into OpenTSDB, 1:1. From what I have read so far about both InfluxDB and OpenTSDB, the data models should be compatible enough to do something like this. I am thinking about letting a central instance pull the data from each InfluxDB instance into the locally available OpenTSDB.

Are there any gotchas I should be aware of, or can anyone point me to available solutions, technical documentation, etc., so that I do not have to reinvent the wheel?

Or is my basic idea completely crap, and if so, why?

thx
Ralf 

ManOLamancha

Dec 19, 2016, 8:28:48 PM
to OpenTSDB
Hi, sorry for the delay:

On Friday, August 12, 2016 at 4:23:18 AM UTC-7, Ralf wrote:
I must admit that I am not yet very familiar with either OpenTSDB or InfluxDB.

I am currently in the architecture phase of a large farm of Debian VMs, each of which will host a number of Linux containers.

The VMs are spread across different hosting locations, different layer 2 subnets, etc., all without any common network infrastructure and with no VPN.

For monitoring certain metrics and logs, I am considering running InfluxDB (along with fluentd) locally as a container on each VM. This container would be the datastore for metrics and syslog data from the other containers on the same VM and from the VM itself. It will store application logs and performance data, system (OS) performance data, and also be the syslog target. Each data source (on the VM itself or in other local containers) will use an alias to reach the VM-local fluentd/InfluxDB container. The alias will always point to the local InfluxDB/fluentd instance, so if a container moves, its data may end up spread across different InfluxDB instances. The main reason for choosing InfluxDB here is that it is lightweight and easy to deploy and maintain compared to OpenTSDB. I do not want to analyze the data directly from there.

I haven't finished my benchmarking of InfluxDB so you'll want to watch resource usage, but if you do need some flexible query capability and analysis on the local VM, then Influx is definitely the way to go. If the load grows too much you may look at setting up a stand-alone InfluxDB instance in each locality that the local VMs can write to.
 
For graphing and central analysis, I want to ship the data from all InfluxDB instances to a central OpenTSDB with a Grafana frontend, backed by an existing large HBase cluster.

I just wanted to check whether this sounds reasonable and whether there are any existing solutions that allow an asynchronous pull (push is not an option!) of InfluxDB data into OpenTSDB, 1:1. From what I have read so far about both InfluxDB and OpenTSDB, the data models should be compatible enough to do something like this. I am thinking about letting a central instance pull the data from each InfluxDB instance into the locally available OpenTSDB.

Are there any gotchas I should be aware of, or can anyone point me to available solutions, technical documentation, etc., so that I do not have to reinvent the wheel?

The data models are similar though I believe Influx supports multi-value tags whereas TSDB does not. If you're only looking at numeric data then you're good to go and this is a perfectly valid setup.
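
To make the mapping concrete, here is a rough sketch of the pull-and-convert step in Python. It is only an illustration under stated assumptions, not an existing tool from either project. It assumes InfluxDB 1.x, whose GET /query endpoint returns series as JSON with "name", "tags", "columns" and "values", and OpenTSDB's /api/put JSON format. The function names, the `<measurement>.<field>` metric naming, and the idea of keeping a per-instance timestamp checkpoint are all hypothetical choices made for this example. Non-numeric values are simply skipped, in line with the "numeric data only" caveat above.

```python
from numbers import Number


def build_pull_params(database, measurement, last_ts):
    """Build query parameters for InfluxDB 1.x GET /query (hypothetical helper).

    Selects rows newer than last_ts (epoch seconds). GROUP BY * makes
    InfluxDB return tags in a separate "tags" object per series, and
    epoch=s asks for integer timestamps in seconds.
    """
    query = f'SELECT * FROM "{measurement}" WHERE time > {int(last_ts)}s GROUP BY *'
    return {"db": database, "q": query, "epoch": "s"}


def influx_series_to_tsdb_points(series, default_tags=None):
    """Flatten one InfluxDB result series into OpenTSDB /api/put data points."""
    tags = {**(default_tags or {}), **(series.get("tags") or {})}
    if not tags:
        # OpenTSDB requires at least one tag on every data point.
        raise ValueError("at least one tag is required per data point")
    columns = series["columns"]
    time_idx = columns.index("time")
    points = []
    for row in series["values"]:
        for idx, field in enumerate(columns):
            value = row[idx]
            # Skip the time column and anything non-numeric (strings, nulls,
            # booleans); bool is checked first because it subclasses int.
            if idx == time_idx or isinstance(value, bool) or not isinstance(value, Number):
                continue
            points.append({
                "metric": f"{series['name']}.{field}",  # assumed naming scheme
                "timestamp": row[time_idx],
                "value": value,
                "tags": dict(tags),
            })
    return points


def next_checkpoint(points, last_ts):
    """Return the newest timestamp seen, or last_ts if nothing was pulled."""
    return max([p["timestamp"] for p in points] + [last_ts])
```

The central puller would send the parameters from `build_pull_params` to each InfluxDB instance, convert every returned series, POST the resulting list as JSON to OpenTSDB's /api/put, and store the value from `next_checkpoint` per instance for the next run. `default_tags` is a place to add something like the source instance name when a series has no tags of its own. OpenTSDB also restricts which characters may appear in metric and tag names, so real data may need sanitizing before the put.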

You can also look at InfluxDB for your central collection and if your load is large enough you would likely need clustered Influx, though that's a paid product now. At that point you'd also need a distributed HBase setup with OpenTSDB to handle the write load. It's all free + open source but the pain point is running an HBase and HDFS cluster. You can also opt for Google's Bigtable to escape the pain of HBase if you like.