Kairosdb and OpenTSDB


Greg Zapp

May 12, 2014, 12:49:36 AM
to metr...@googlegroups.com
What are the thoughts on KairosDB and OpenTSDB?  A lot of the metrics20.org copy appears to be geared around graphite's implementation of a time series database, and its API/ecosystem.  However, we do currently have at least 2 open source time series databases that have tagging at their core.

Dieter Plaetinck

May 13, 2014, 2:33:55 AM
to metr...@googlegroups.com
Do you see any specific graphite-isms that can be cleaned up?  Where we refer to graphite, it should basically be only as an example.
Metrics 2.0 of course extends well beyond just graphite.

Although the metrics 2.0 spec doesn't demand any specific tag retrieval features,
I would say that the tags database should support tag key/value equality checks and pattern matching, ideally even regex.
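
To make that requirement concrete, here is a minimal sketch in Python of the three kinds of lookup mentioned: tag equality, regex matching on a tag value, and a check that a tag key is present. The store, the metric ids and the function name `find` are all hypothetical and only for illustration; a real tags database would use an index rather than a linear scan.

```python
import re

# Hypothetical in-memory tag store: metric id -> tags (all names are illustrative).
METRICS = {
    "m1": {"server": "web1", "what": "cpu", "unit": "Pct"},
    "m2": {"server": "web2", "what": "cpu", "unit": "Pct"},
    "m3": {"server": "db1", "what": "disk", "unit": "B"},
}


def find(metrics, equals=None, regex=None, has_keys=()):
    """Return the sorted ids of metrics that satisfy every given condition."""
    equals = equals or {}
    compiled = {k: re.compile(p) for k, p in (regex or {}).items()}
    hits = []
    for mid, tags in metrics.items():
        # Equality: the tag must exist and have exactly this value.
        if any(tags.get(k) != v for k, v in equals.items()):
            continue
        # Regex: the tag must exist and its whole value must match the pattern.
        if any(k not in tags or not rx.fullmatch(tags[k]) for k, rx in compiled.items()):
            continue
        # Presence: the tag key must exist, whatever its value.
        if any(k not in tags for k in has_keys):
            continue
        hits.append(mid)
    return sorted(hits)


cpu_on_web = find(METRICS, equals={"what": "cpu"}, regex={"server": r"web\d+"})
```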

Kairosdb looks neat, but its tags system has 2 limitations, see below.
"
15:57 < codyaray> so when you add a new (metric-name/tag-pairs) combination, this becomes a new row in the main column family….
but one of the index CFs has row-keys corresponding to the metric names and columns corresponding to the tag-pairs. so each different tag-pair becomes a column
which quickly exhausts the number of columns (technically "cells") that cassandra supports.
15:59 < UICTamale> cassandra has a hard limit of 2 billion cells per partition
15:59 < UICTamale> we were approaching that limit - which made query times skyrocket
"
and also:
"
16:00 < Dieterbe> and can you easily search based on tags in kairos? like can you search for all metrics with server tag matching a regex *and*
another tag key must be present *nested or* tag must equal ...
16:00 < codyaray> no. only string equality (right now, at least)
16:00 < codyaray> flat key-value map
16:01 < Dieterbe> so it's not really a good fit for metrics 2.0.. unless you put ES in between like graph explorer does
16:01 < codyaray> or its a good place to contribute so it works better. its still a very young project. ;)
16:01 < Dieterbe> fair point
16:02 < UICTamale> yeah, the regex support is definitely something I was expecting to already be there
"

Bottom line: kairosdb supporting tags natively is not very useful here, although we could just identify metrics with a key (bypassing kairosdb's tag support) and maintain the tags in ES, like we currently do with graphite.

I have no experience with opentsdb. If it supports lots of tags and performant search on them (incl. regex), then that seems useful; otherwise, ES proves to be pretty great.

Pradeep

May 13, 2014, 5:06:21 AM
to Dieter Plaetinck, metr...@googlegroups.com
Hi Dieter,

I am fairly familiar with how OpenTSDB works.

Regarding the two points you raised above:

1. OpenTSDB keeps a table named "tsdb-uid", which is a
bi-directional mapping between UIDs and metric names, tag keys and
tag values. Each metric, tag key and tag value is assigned a UID in
its own namespace. So a new tag-key, tag-value pair means four
entries in the tsdb-uid table (since the mapping is bidirectional).
But since this table is relatively small, it is always kept in
memory. The other table, "tsdb", is the one containing the actual
datapoints. So for X datapoints, the number of rows in this table
will be the same with or without the tag-key, tag-value pairs.
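
As a rough illustration of the bidirectional mapping described in point 1, here is a toy sketch in Python. It is not OpenTSDB's actual storage code: the class `UidTable`, the namespace labels and the use of small integers as UIDs are assumptions made for the example. It only shows why one new tag key plus one new tag value adds four entries (a forward and a reverse entry for each).

```python
from collections import defaultdict


class UidTable:
    """Toy bidirectional name <-> UID map with one namespace per kind."""

    def __init__(self):
        self.forward = defaultdict(dict)  # kind -> name -> uid
        self.reverse = defaultdict(dict)  # kind -> uid -> name

    def get_or_create(self, kind, name):
        names = self.forward[kind]
        if name not in names:
            uid = len(names) + 1  # UIDs are assigned independently per namespace
            names[name] = uid
            self.reverse[kind][uid] = name
        return names[name]

    def name_of(self, kind, uid):
        return self.reverse[kind][uid]

    def entries(self):
        """Count forward and reverse entries across all namespaces."""
        fwd = sum(len(d) for d in self.forward.values())
        rev = sum(len(d) for d in self.reverse.values())
        return fwd + rev


table = UidTable()
table.get_or_create("metric", "cpu.idle")
before = table.entries()
key_uid = table.get_or_create("tagk", "host")
value_uid = table.get_or_create("tagv", "api.example.com")
added = table.entries() - before
```

Here `added` counts the entries created by the new tag pair, and both UIDs can be the same number because each lives in its own namespace.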

2. OpenTSDB supports some basic pattern matching, for example:

If you want to plot graphs for all hosts which are sending the
metric cpu.idle, then you can use the wildcard host = * , but you
can't use host = api.* ; that is on the OpenTSDB roadmap for 2.1.

If you want to plot graphs for two hosts, you can use tags
like host = api.example.com | mysql.example.com
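
The matching rules described in point 2 can be sketched in a few lines of Python. This is only a model of the behaviour as described in this message, not OpenTSDB code, and the function name `tag_matches` is made up for the example: a lone `*` matches any value, a `|`-separated list matches any of the listed literals, and anything else (including `api.*`) is treated as a plain literal.

```python
def tag_matches(filter_value, actual):
    """Match a tag value against '*' or a '|'-separated list of literals."""
    if filter_value.strip() == "*":
        return True  # wildcard: every value of this tag matches
    # Otherwise each alternative is compared literally, with no pattern expansion.
    alternatives = [part.strip() for part in filter_value.split("|")]
    return actual in alternatives


any_host = tag_matches("*", "api.example.com")
one_of_two = tag_matches("api.example.com | mysql.example.com", "mysql.example.com")
prefix_glob = tag_matches("api.*", "api.example.com")
```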

Thanks,

Brian Hawkins

May 26, 2014, 9:46:21 AM
to metr...@googlegroups.com
Thanks Dieter, I think you just solved my problem.  I created KairosDB and one of the sore points has been the row key index.  Trying to store the row keys (that include the tags) in cassandra so they can be retrieved in interesting ways has been a challenge.  After reading your post I realized that ES or Lucene is the obvious/ideal place to store such data.  I just need to make the backend pluggable so that these applications can be plugged in as index lookup mechanisms.

Thanks
Brian

Dieter Plaetinck

May 29, 2014, 2:09:25 PM
to metr...@googlegroups.com
Nice to hear but also sad in a way: a separate service is not ideal in terms of complexity, as well as performance, data locality and hence probably even scalability. I do it with graphite because there's no other way; it would have been nice if this wasn't as big of an issue in C* as it is. Ideally the TSDB would have proper tags support built-in to allow the use cases I typically demo with graph-explorer but in a more performant/scalable manner.
Or maybe I'm overthinking this and this will rarely become an issue.  Maybe a decoupled tags/metadata service is a good thing, I'm not quite clear yet on that.   It works well for Vimeo at least.   We have about 400k distinct timeseries, and ES returns quickly even for complicated searches with regex, although sometimes the targets become very long, because a simple query gets turned into an expression that mentions every used timeseries explicitly.  This is also a thing to consider with a decoupled tags database.
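
To illustrate the "very long targets" point, here is a small Python sketch of what happens when tags live in a separate index: a short tag query is first resolved to series keys, and the TSDB then receives an expression naming every one of them. The index contents, the key naming scheme and the `group(...)` wrapper are hypothetical and chosen only for the example; this is not how graph-explorer or ES actually build their requests.

```python
import re


def expand_query(index, tag, pattern):
    """Resolve a tag regex against the index and list every matching series key."""
    rx = re.compile(pattern)
    return sorted(
        key for key, tags in index.items()
        if tag in tags and rx.fullmatch(tags[tag])
    )


def build_target(keys):
    """Join the resolved keys into one explicit expression for the TSDB."""
    return "group(" + ",".join(keys) + ")"


# Hypothetical index: 1000 web servers, one cpu series each.
index = {
    f"servers.web{i}.cpu": {"server": f"web{i}", "what": "cpu"}
    for i in range(1000)
}

pattern = r"web\d+"
keys = expand_query(index, "server", pattern)
target = build_target(keys)
```

The query pattern is 6 characters long, while `target` names all 1000 series, so its length grows linearly with the number of matches.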