OpenTSDB Modifications

ManOLamancha

Jan 24, 2012, 9:52:43 AM

to OpenTSDB

Hi Everyone, I'm going to be working on upgrades to OpenTSDB over the
next 6 months as I'll be using it for a massive metrics storage system
at work. I pinged Tsuna with some of these ideas, incorporated his
suggestions, and now I'm bringing them to the list for everyone to
look at and constructively critique. If you're already working on any
part of this (like Esper integration and GUI stuff), please
ping me so we can coordinate. Thanks!

Ingest

In our network, we have multiple points-of-presence around the globe
with hundreds or thousands of machines in each location generating
data. Since HBase doesn’t provide real-time writing across data
centers (only replication) and I worry about the stability of
connections from a TSD to a central HBase cluster over a WAN (MySQL
doesn’t like this), I would like to create a scaled down ingest daemon
that can live in each DC. The ingester would be a TSD, configured for
ingest, that would intake the raw metric data from devices in the
local DC. Then it would perform basic checks to make sure the data is
valid (does it have the proper tags, is the data numeric?). It would
spool the information to a local disk via SQLite in case the DC
becomes partitioned and can’t talk to the central servers. If the DC
was partitioned off, the ingest daemon could replay the data once a
connection was re-established. Then it would push the data off to the
central brokers. This style is similar to Splunk (a really neat
product) where we install “forwarders” in different data centers that
collect data, bundle it up, and transport it to central processing
nodes.
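
A minimal sketch of the ingest-side checks and the spool-and-replay behavior described above. It assumes a point is valid when it has a metric name, at least one tag, and a finite numeric value. An in-memory deque stands in for the SQLite spool, so nothing here survives a restart. IngestSpool, DataPoint, offer and replay are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical data point shape: metric, epoch seconds, value, tags.
record DataPoint(String metric, long timestamp, double value, Map<String, String> tags) {}

class IngestSpool {
    // Stand-in for the on-disk SQLite spool; contents are lost on restart.
    private final Deque<DataPoint> spool = new ArrayDeque<>();

    // Basic validity checks: named metric, at least one tag, finite number.
    static boolean isValid(DataPoint dp) {
        return dp.metric() != null && !dp.metric().isEmpty()
                && dp.tags() != null && !dp.tags().isEmpty()
                && Double.isFinite(dp.value());
    }

    // Forwards a valid point, or spools it if there is a backlog or the central
    // servers are unreachable. Returns false if the point was rejected as invalid.
    boolean offer(DataPoint dp, Predicate<DataPoint> forwarder) {
        if (!isValid(dp)) return false;
        if (!spool.isEmpty() || !forwarder.test(dp)) spool.addLast(dp);
        return true;
    }

    // Replays spooled points in arrival order; stops at the first failed send.
    int replay(Predicate<DataPoint> forwarder) {
        int sent = 0;
        while (!spool.isEmpty() && forwarder.test(spool.peekFirst())) {
            spool.removeFirst();
            sent++;
        }
        return sent;
    }

    int pending() { return spool.size(); }
}
```

New points queue behind any existing backlog, so delivery order is preserved once the link comes back.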

A critical component for ingest is HTTP input, and I would like to use
a JSON format for it. JSON seems to be used all over the place now and
it’s very easy to work with. I would like to add a JSON input and
output format for OpenTSDB and try to get other folks to standardize
on it instead of having tons of slightly different, single-line
formats à la the telnet interface or Graphite’s input methods. We’ve
been using a format like this:

{"timestamp":epoch_time, "metric":"metric.name", "value":value, "tags":{"tagname1":"tagvalue1", "tagname2":"tagvalue2"}}

We’ve also used bulk formats where the metric name and tags are shared
to cut down on network traffic. That isn’t usually an issue, but when
you have millions of data points traveling from dozens of DCs, it
starts to become one.
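
For a concrete picture of the format, here is a minimal Java sketch that renders one data point in the shape above. It assumes integer values and sorts tags so the output is stable. JsonPoint is a hypothetical helper. In practice an established JSON library would be the more likely choice.

```java
import java.util.Map;
import java.util.TreeMap;

class JsonPoint {
    // Escapes the characters JSON requires to be escaped inside a string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Renders {"timestamp":...,"metric":...,"value":...,"tags":{...}} with tags sorted by key.
    static String toJson(long timestamp, String metric, long value, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"timestamp\":").append(timestamp)
          .append(",\"metric\":").append(quote(metric))
          .append(",\"value\":").append(value)
          .append(",\"tags\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : new TreeMap<>(tags).entrySet()) {
            if (!first) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }
}
```

The output is compact (no spaces after commas). JSON treats that whitespace as insignificant, so it matches the format above.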

Also, as part of this upgrade, we would like to include some simple
authentication for the API, including LDAP integration and shared-key
access. The shared key would be for communication between TSDs and
source tools, and LDAP would be for users. We need this because the
ingest daemons would be running on networks (and possibly machines)
open to the Net, so we need some way to lock them down a bit. It would
be opt-in, configured by users, so we don’t break existing setups.
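
For the shared-key piece, one common approach works like this. The source tool sends an HMAC-SHA256 of the request body, computed with the shared key, and the TSD recomputes and compares it. This scheme is an assumption; the proposal doesn't specify one. Below is a minimal sketch using the JDK's javax.crypto.Mac, with SharedKeyAuth as a hypothetical class name. A real scheme would also need a timestamp or nonce to reject replayed requests.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SharedKeyAuth {
    private static final String ALGORITHM = "HmacSHA256";

    // Computes a hex-encoded HMAC-SHA256 of the payload using the shared key.
    static String sign(byte[] key, String payload) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(key, ALGORITHM));
            return HexFormat.of().formatHex(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Recomputes the signature and compares in constant time to avoid timing leaks.
    static boolean verify(byte[] key, String payload, String signatureHex) {
        byte[] expected = sign(key, payload).getBytes(StandardCharsets.US_ASCII);
        byte[] given = signatureHex.getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, given);
    }
}
```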

Rollups

We would like to have separate rollup metrics written to HBase that
function in a fashion similar to RRDs where requests for wide
timespans can query fewer data points for quicker access and graphing.
It’s great that you can keep raw metrics forever, but if I want to
view the traffic on an interface over a year to see how it’s changed,
I don’t want to pull 8,760 hourly rows and down-sample 525,600 data
points (for a 1-minute interval metric). I’d rather fetch 1 row with 365 data points.

I’d break rollup configuration options into set intervals including:
• Hourly
• Daily
• Weekly
• Monthly
• Quarterly

For storage, what do you think is the best way to do this? I initially
thought of creating a separate HBase table for each rollup interval.
Then we’d use a similar naming schema as the raw table (minus the
timestamp) and create a new cell for each timestamp in that interval.
The API could look at a timespan and automatically determine which
table to pull from unless the user overrides the choice with some
flags.
Or would it be better to store all of the rollups in a single table
with a slightly different key naming convention, à la
metricID/rollup_type/tagIDs?
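
For the automatic choice between raw data and rollups, one possible policy is to pick the coarsest rollup that still yields at least some minimum number of points over the requested span. That policy is an assumption, not something specified above. Below is a minimal sketch. Resolution and minPoints are hypothetical, and months and quarters are approximated as 30 and 91 days for selection purposes only.

```java
import java.time.Duration;

enum Resolution {
    RAW(Duration.ZERO),
    HOURLY(Duration.ofHours(1)),
    DAILY(Duration.ofDays(1)),
    WEEKLY(Duration.ofDays(7)),
    // Months and quarters vary in length; fixed approximations are used for selection only.
    MONTHLY(Duration.ofDays(30)),
    QUARTERLY(Duration.ofDays(91));

    final Duration step;

    Resolution(Duration step) { this.step = step; }

    // Picks the coarsest rollup that still gives at least minPoints over the span,
    // falling back to raw data when no rollup is fine-grained enough.
    static Resolution choose(Duration span, long minPoints) {
        Resolution best = RAW;
        for (Resolution r : values()) {
            if (r == RAW) continue;
            if (span.dividedBy(r.step) >= minPoints) best = r;
        }
        return best;
    }
}
```

With a 300-point minimum, a one-year query resolves to DAILY and a six-hour query falls back to RAW. A user-supplied override flag would simply bypass choose().
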
For storing the actual values, we could use an 8|8|8 byte format where
the values are min|max|avg.
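
A minimal sketch of the 8|8|8 cell layout. It assumes all three values are stored as IEEE-754 doubles in big-endian order, which is ByteBuffer's default. RollupValue is a hypothetical name.

```java
import java.nio.ByteBuffer;

// One rollup cell: min, max and average packed as three 8-byte doubles (24 bytes).
record RollupValue(double min, double max, double avg) {
    static final int SIZE = 3 * Double.BYTES;

    byte[] encode() {
        return ByteBuffer.allocate(SIZE).putDouble(min).putDouble(max).putDouble(avg).array();
    }

    static RollupValue decode(byte[] cell) {
        if (cell.length != SIZE) throw new IllegalArgumentException("expected 24 bytes, got " + cell.length);
        ByteBuffer buf = ByteBuffer.wrap(cell);
        return new RollupValue(buf.getDouble(), buf.getDouble(), buf.getDouble());
    }

    // Builds the rollup for one interval from its raw values.
    static RollupValue of(double... values) {
        if (values.length == 0) throw new IllegalArgumentException("empty interval");
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new RollupValue(min, max, sum / values.length);
    }
}
```

An average on its own can't be merged into a coarser rollup (daily into monthly, say) without knowing how many points it covered. A count or sum field may therefore be worth reserving.
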
The metric meta data would store a cell with flags about which rollups
to perform, configured by the user. E.g. the user could, through the
API or GUI, tell it to rollup all “if.bytes.in” metrics on a daily and
monthly basis. A thread would have to walk the metadata and update
each metric of that type.
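
The per-metric rollup flags could be packed into a single byte, one bit per interval. Below is a minimal sketch. The Rollup enum and its bit assignment are hypothetical.

```java
import java.util.EnumSet;

enum Rollup {
    HOURLY, DAILY, WEEKLY, MONTHLY, QUARTERLY;

    // Packs a set of rollups into one byte: bit i set means the rollup with ordinal i is enabled.
    static byte toFlags(EnumSet<Rollup> set) {
        int flags = 0;
        for (Rollup r : set) flags |= 1 << r.ordinal();
        return (byte) flags;
    }

    // Unpacks the byte stored in the metric's meta data cell.
    static EnumSet<Rollup> fromFlags(byte flags) {
        EnumSet<Rollup> set = EnumSet.noneOf(Rollup.class);
        for (Rollup r : values()) {
            if ((flags & (1 << r.ordinal())) != 0) set.add(r);
        }
        return set;
    }
}
```

The bits follow enum ordinals, so the order of the constants would have to stay fixed once flags are written.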

Aggregates

We want to use OpenTSDB for capacity planning and properly doing so
requires that we be able to quickly and easily see a graph with the
total amount of some data for a service or platform. For example, we
may want to know the total traffic in and out of a specific platform
that has over 10,000 devices. Running a scan to fetch all of that
data, then down-sampling it and generating a graph, would take a looooong
time, so I think it would be better to have rules, acting on real-time
data, that perform an aggregation and save the data as another metric
to the TSDB. I would use the same raw and rollup schemas but add a
special tag, maybe “tsdagg” that lets us know it’s an aggregate value.
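
A minimal sketch of one such rule. It sums every device's value for the same metric and timestamp within a platform, then tags the result with the proposed special tag. The platform tag key and the tsdagg=sum value are hypothetical. The sketch works on a batch of samples; a real-time version would apply the same grouping to each incoming time window.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SumAggregator {
    // Hypothetical shape of one incoming sample.
    record Sample(String metric, long timestamp, double value, Map<String, String> tags) {}

    // Sums the values of `metric` reported by every device on `platform`, grouped by timestamp.
    // Each total is emitted under the same metric name with a tsdagg=sum tag so it can be
    // told apart from raw per-device data.
    static List<Sample> aggregate(List<Sample> samples, String metric, String platform) {
        SortedMap<Long, Double> totals = new TreeMap<>();
        for (Sample s : samples) {
            if (s.metric().equals(metric) && platform.equals(s.tags().get("platform"))) {
                totals.merge(s.timestamp(), s.value(), Double::sum);
            }
        }
        return totals.entrySet().stream()
                .map(e -> new Sample(metric, e.getKey(), e.getValue(),
                        Map.of("platform", platform, "tsdagg", "sum")))
                .toList();
    }
}
```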

Metadata

I would like to add meta data cells to the “metrics” and “tagk” rows
that can be displayed in the GUI or API. I would have the TSDs loading
from HBase every 5 minutes or so (configurable). Would it be better to
have a separate cell for each value or maybe have one extra cell that
includes a JSON string with the different values? A list of cells that
I would add appears in the Schema section. All of this information
would be accessible via the API and different interfaces could access
it. We’ll have a wide range of users (from smart Ops folks to new NOC
techs to management), so we need the metadata to help them make sense of
the flood of data we’ll be tossing at them.

Thresholding/Alerting

I know a few folks in the Google group have been asking for an
alerting mechanism and I saw the Nagios pull script that lets it query
the TSDB for data. I also saw some folks talking about Esper and after
reading some more about that engine, I think it would be a really good
and easy way of providing push alerts. The user would add Esper rules
via the API and a single TSD instance (single because complex Esper
queries need all of the data to function properly) would consume
messages and compare the data against the rules. When a threshold is
reached, it would perform one or more of the following actions:
• Send an email using some Java email library
• Launch a script. Users can write bridges to their own monitoring
infrastructure. We would create one off the bat to interface with
Zabbix.
• Log it to HBase for auditing later (this happens regardless)

Esper is mature, open source and written in Java. If users want, they
can plug in a different engine.
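
Esper rules are far richer than this. Still, the basic flow (match a metric, compare it against a threshold, fire the configured actions) can be sketched in plain Java. This is a stand-in, not Esper's API. ThresholdRule and Action are hypothetical names. The rule fires once per upward crossing and re-arms when the value drops back.

```java
import java.util.ArrayList;
import java.util.List;

class ThresholdRule {
    // One pluggable reaction: send an email, launch a script, write an audit row, etc.
    interface Action {
        void fire(String metric, long timestamp, double value);
    }

    private final String metric;
    private final double threshold;
    private final List<Action> actions = new ArrayList<>();
    private boolean breached = false;

    ThresholdRule(String metric, double threshold) {
        this.metric = metric;
        this.threshold = threshold;
    }

    ThresholdRule addAction(Action action) {
        actions.add(action);
        return this;
    }

    // Fires every action once when the value first rises above the threshold,
    // then re-arms after the value drops back to or below it.
    void onDataPoint(String m, long timestamp, double value) {
        if (!metric.equals(m)) return;
        if (value > threshold) {
            if (!breached) {
                breached = true;
                for (Action a : actions) a.fire(m, timestamp, value);
            }
        } else {
            breached = false;
        }
    }
}
```

Email, script and HBase-audit reactions would each be one Action implementation.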

GUI

It sounds like a few folks are working on different GUIs for OpenTSDB
so I was thinking of divorcing the HTTP GUI from the TSD and letting
it access data only via the API. That way everyone could write
whatever interface they want and easily integrate it into their
existing control panels.

The API calls I want to add include support for saving graphs
and creating dashboards for different users. I envision using Graphite
to let users design their dashboards and it would write the settings
back to HBase via the TSD API.

MQ

To maintain horizontal scalability and redundancy, I would like to
modularize the TSDs so that users can install the TSD on a box and
configure it to perform whatever role they want. MQ is an ideal
solution because it’s also horizontally scalable. The OpenTSDB modules
would hang off the MQ bus and each module (that works with data) would
set up a different queue (based on its role) to receive data in real
time. We’ve used RabbitMQ with a lot of success and I would start with
that for MQ work, but we could plug in any MQ engine. This means we
don’t need a ton of inter-process communication but for instances
where we do need it, we can push messages back into the MQ brokers for
delivery.
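
The queue-per-role layout can be sketched in-process. Each role declares its own queue, and every published data point is copied into all of them, much like binding one queue per role to a RabbitMQ fanout exchange. This stand-in uses java.util.concurrent, not the RabbitMQ client. MessageBus and the role names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class MessageBus {
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    // Each role declares its own queue, like binding a queue to a fanout exchange.
    BlockingQueue<String> declareQueue(String role) {
        return queues.computeIfAbsent(role, r -> new LinkedBlockingQueue<>());
    }

    // Delivers a copy of the message to every declared queue.
    void publish(String message) {
        for (BlockingQueue<String> q : queues.values()) q.offer(message);
    }
}
```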

Another possibility is using Storm, which looks very promising but I’m
not sure how it coordinates between distributed processes. Plus it
would require a good amount of work to convert the TSD into a
distributable JAR.

Modules

There are two ways to go about modularizing, and I’ll defer to y’all to
determine which is best. We could make a single TSD executable
containing the code for all of the different roles, with a local
config file determining how the TSD executes. Or we could break it up
into separate components that users would have to install in the
proper locations. I’m not sure which is better.

Roles include:
• Storage – The existing TSD that takes in data and writes it to disk
• Ingest – Just accepts data, temporarily stores it for safe keeping,
and forwards it on to the central Storage nodes.
• Rollup/Aggregation – These would pull a copy of the data stream off
the MQ bus and perform real-time rollup/aggregation and push the data
to the Storage nodes. This could be processor intensive, which is why I
think it would be best to (optionally) let these run separately from
the main TSDs.
• Thresholding – A single or HA pair of TSDs grabbing data from the MQ
feed and performing the threshold calculations and alerting. This can
be very memory and processor intensive and with Esper it has to be a
single node acting on all the data.
• API – These would be used for read/write access to the data,
metadata and configurations
• GUI – I’d kinda like to separate the built-in GUI so that it performs
API calls to get its data. We’re leaning towards splicing the
Graphite front-end by making it call the API.
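
If the single-executable option is chosen, the local config file could simply list the roles a TSD should run. Below is a minimal sketch. It assumes a Java properties file with a hypothetical roles key and defaults to the existing storage behavior. Role and fromConfig are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Properties;

enum Role {
    STORAGE, INGEST, ROLLUP, THRESHOLD, API, GUI;

    // Reads a comma-separated "roles" key from the config file's contents,
    // e.g. "roles = ingest, api". Defaults to STORAGE, the existing TSD behavior.
    static EnumSet<Role> fromConfig(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        EnumSet<Role> roles = EnumSet.noneOf(Role.class);
        for (String part : props.getProperty("roles", "storage").split(",")) {
            String name = part.trim();
            // Unknown role names throw IllegalArgumentException from valueOf.
            if (!name.isEmpty()) roles.add(Role.valueOf(name.toUpperCase(Locale.ROOT)));
        }
        return roles;
    }
}
```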

Real Time

We wouldn’t do anything with this for a while, but I know there was a
bit of interest on the Google group in using something like Cube to
create a real-time display. An RT server (or cluster) could consume MQ
messages just like the others and draw graphs quickly.

Reporter

Part of our use requires reports being generated every so often so I
would write a schema in HBase for storing configuration data. Then a
daemon would run and spit the report data out via email and/or scripts
for ingestion in other services.

General

Configuration – I would like to use HBase as a centralized
configuration system. The main API TSDs would need a config
file pointing them at the HBase cluster. On startup, these would pull
their config data directly from HBase. Then all the other modules
would simply require a config file with a list of API hosts (and
authentication details) to connect to in order to pull their config info.

JSON – Have y’all thought of using an existing Java JSON library
instead of rolling your own? It may make it easier if we can use more
JSON for settings (stored in HBase cells) and MQ transport.

Schemas

TABLE: tsdb-uid – Since the tags and metrics exist in the UID table,
I’d like to just add a cell or cells to the respective rows that can
be pulled into the TSD and displayed.

Full Metric UUID Row
• Maximum – the maximum value for this metric, useful in graphs to
calculate percentages or used/free values. E.g. a gigabit interface
would have a max value of 1000000000. This would be updated on an as-
received basis if a new value comes in. A value of 0 or missing means
“ignore me”. A non-zero value means it’s important.
• Interval – the TSD will track how often it receives this metric and
if it changes (after some hysteresis) it will update this value.
• First received – logged at metric creation
• Retention – how long (in days) to keep this metric in the raw
storage. Default is 0, which means it will never be deleted (for
backwards compatibility)
• Rollup rules – what kind of rollups to perform for this metric
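
One way to implement the Interval field, sketched under stated assumptions, is to smooth the gaps between received points with an exponentially weighted moving average. The stored value is rewritten only when the smoothed gap drifts more than 20% away from it. The smoothing factor, the 20% band and the IntervalTracker name are all hypothetical choices.

```java
class IntervalTracker {
    private static final double ALPHA = 0.2;      // EWMA smoothing factor (assumed)
    private static final double HYSTERESIS = 0.2; // relative drift needed before rewriting (assumed)

    private long lastTimestamp = Long.MIN_VALUE;  // MIN_VALUE = no point seen yet
    private double smoothedGap = Double.NaN;      // running estimate of seconds between points
    private long storedInterval = 0;              // value persisted in the meta data row; 0 = unknown

    // Feeds the epoch-seconds timestamp of one received point. Returns true when the
    // stored interval changed and the meta data row should be rewritten.
    boolean observe(long timestamp) {
        if (lastTimestamp == Long.MIN_VALUE) {
            lastTimestamp = timestamp;
            return false;
        }
        long gap = timestamp - lastTimestamp;
        if (gap <= 0) return false; // ignore duplicate and out-of-order points
        lastTimestamp = timestamp;
        smoothedGap = Double.isNaN(smoothedGap) ? gap : ALPHA * gap + (1 - ALPHA) * smoothedGap;
        long candidate = Math.round(smoothedGap);
        if (storedInterval == 0 || Math.abs(candidate - storedInterval) > HYSTERESIS * storedInterval) {
            storedInterval = candidate;
            return true;
        }
        return false;
    }

    long storedInterval() { return storedInterval; }
}
```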

Metric Name Row
• display name – Metrics may be stored as “if.bytes.in” but in a graph
we may want to show “Interface Bytes In”
• value type – The type of data (gauge, counter, delta) that is
recorded. Useful for generating graphs without having to click the
“Rate” button
• notes – a text field with notes about this type of metric
• date updated – the last time this metric metadata was updated by a
user
• updated by – integer ID of the user who updated the meta data

Tag Name Row
• display name – Same as the metrics, used for display purposes
• description – notes about this tag and what it represents
• date_added – the first time this tag name was created
• date updated – the last time the metadata was updated
• updated by – who updated the meta data last

More schemas to come...

tsuna

Jan 24, 2012, 12:36:22 PM

to ManOLamancha, OpenTSDB

On Tue, Jan 24, 2012 at 6:52 AM, ManOLamancha <clars...@gmail.com> wrote:
> Hi Everyone, I'm going to be working on upgrades to OpenTSDB over the
> next 6 months as I'll be using it for a massive metrics storage system
> at work. I pinged Tsuna with some of these ideas and incorporated his
> suggestions and now I'm bringing it to the list for everyone to take a
> look at and constructively critique. If you're already working on some
> portion of this idea (like Esper integration and GUI stuff) please
> ping me so we can coordinate. Thanks!

Although this discussion originally started off-list, I'd encourage
everyone involved to keep the discussions on the mailing list so that
everyone can contribute or just throw in their 2¢.

> For storage, what’s the best way to do this do you think? I initially
> thought of creating a separate HBase table for each rollup interval.
> Then we’d use a similar naming schema as the raw table (minus the
> timestamp) and create a new cell for each timestamp in that interval.
> The API could look at a timespan and automatically determine which
> table to pull from unless the user overrides the choice with some
> flags.
> Or would it be better to store all of the rollups in a single table
> with a slightly different key naming convention ala metricID/
> rollup_type/tagIDs?

From the off-list discussion earlier:

On Tue, Jan 24, 2012 at 6:58 AM, ManOLamancha <clars...@gmail.com> wrote:
> On Tue, Jan 24, 2012 at 1:24 AM, Benoit Sigoure <ts...@stumbleupon.com> wrote:
>> I would advise against using separate tables for rollups. HBase
>> generally performs better with a single table. Rollups could be
>> stored either under a different metric name (e.g. by appending
>> ".hourly" at the end of the metric for which we're doing hourly
>> rollups), or in a different column family, or simply by structuring
>> the key-space differently.
>>
>> I like the idea of introducing new metric names for rollups as it'll
>> keep the data locality good, and yet still distribute it well across
>> the table.
>
> Regarding the rollup storage, you definitely have more experience with Hbase than I do, but is it really faster to load the rollups from the same table as all the raw values? I thought that having a separate table with a smaller btree index on the rows would make for faster access when performing row scans. Or is there some other consideration like region distribution or caching that lets a single giant table perform better than split tables?

Given OpenTSDB's schema, there is virtually no performance difference
between using a single table or multiple tables. This is especially
true if you store a LOT of data. In practice the table for yearly
rollups will be much smaller, so accesses to it might be a bit faster,
but in the grand scheme of things, this will be insignificant.

Using a single table for all the data makes life a lot easier from an
operational standpoint. It's easier to manage, troubleshoot, account
for, capacity plan for. It will create fewer regions and utilize them
better. Region count becomes a real problem when you start running on
large data sets. The solution is typically to create fewer bigger
regions by increasing the max region size and then merging regions
together. Having a single table for all the data makes this process
simpler, should you have to go through it.

So even if there are small theoretical performance gains, which isn't
even guaranteed, I think the pros of using a single table outweigh the
cons.

Also, using a single table will keep the code a bit simpler as we
won't have to do "if (blah) then use this table; else if (blah) then
use that table; else if …"

> Also, for the config and other data that would have a pretty small footprint, is it alright to split those into separate tables or is it better to have one table for all of that data? I want to figure out how to design the schema. Thanks again for your feedback!

There is already a separate table for meta data, called "tsdb-uid".
Granted the name implies that it's only for UID mappings, but I would
recommend we put all the meta data in there anyway. A lot of the
meta-data applies per-metric/tag name/tag value, and that table is a
great fit to store it as it already has one row per metric/tag
name/tag value. We'd simply add more columns to these rows.

> For storing the actual values, we could use an 8|8|8 byte format where
> the values are min|max|avg.
> The metric meta data would store a cell with flags about which rollups
> to perform, configured by the user. E.g. the user could, through the
> API or GUI, tell it to rollup all “if.bytes.in” metrics on a daily and
> monthly basis. A thread would have to walk the metadata and update
> each metric of that type.

> Aggregates
>
> We want to use OpenTSDB for capacity planning and properly doing so
> requires that we be able to quickly and easily see a graph with the
> total amount of some data for a service or platform. For example, we
> may want to know the total traffic in and out of a specific platform
> that has over 10,000 devices. Running a scan to fetch all of that
> data, then downsample and generate a graph would take a looooong
> time, so I think it would be better to have rules, acting on real-time
> data, that perform an aggregation and save the data as another metric
> to the TSDB. I would use the same raw and rollup schemas but add a
> special tag, maybe “tsdagg” that lets us know it’s an aggregate value.
> Metadata

I would also implement this with a modified metric name, e.g. by
adding a ".sum" suffix if you're doing aggregation by sum.
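
Deriving rollup and aggregate metric names by suffix, as suggested, takes one line each. Below is a minimal sketch. DerivedMetrics and its enums are hypothetical.

```java
import java.util.Locale;

class DerivedMetrics {
    enum Interval { HOURLY, DAILY, WEEKLY, MONTHLY, QUARTERLY }
    enum Agg { SUM, MIN, MAX, AVG }

    // "if.bytes.in" + HOURLY -> "if.bytes.in.hourly"
    static String rollupName(String metric, Interval interval) {
        return metric + "." + interval.name().toLowerCase(Locale.ROOT);
    }

    // "if.bytes.in" + SUM -> "if.bytes.in.sum"
    static String aggregateName(String metric, Agg agg) {
        return metric + "." + agg.name().toLowerCase(Locale.ROOT);
    }
}
```

One catch to plan for: a user metric that already ends in something like ".sum" would be indistinguishable from a derived one. The suffixes may therefore need to be reserved.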

> I would like to add meta data cells to the “metrics” and “tagk” rows
> that can be displayed in the GUI or API. I would have the TSDs loading
> from HBase every 5 minutes or so (configurable). Would it be better to
> have a separate cell for each value or maybe have one extra cell that
> includes a JSON string with the different values?

Yes, meta data could be stored in separate cells. You could always
reconstruct the JSON object by doing a get on all the cells.

The TSD's GUI already does everything it does through an API. It's
not cheating. But I agree we need more APIs to make it easier to
build third party GUIs.

For the modules, I feel like it's simpler if it's the same binary that
can do everything, just behaves the way you want depending on how you
invoke it.

I like the idea of persisting config in HBase and loading it at
startup. Yes if we add more JSON stuff, we'll definitely want to use
a well-established JSON library such as GSON or Jackson.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

ManOLamancha

Jan 24, 2012, 1:00:59 PM

to OpenTSDB

> Given OpenTSDB's schema, there are virtually no performance difference
> between using a single table or multiple tables. This is especially
> true if you store a LOT of data. In practice the table for yearly
> rollups will be much smaller, so accesses to it might be a bit faster,
> but in the grand scheme of thing, this will be insignificant.
>
> Using a single table for all the data makes life a lot easier from an
> operational standpoint. It's easier to manage, troubleshoot, account
> for, capacity plan for. It will create fewer regions and utilize them
> better. Region count becomes a real problem when you start running on
> large data sets. The solution is typically to create fewer bigger
> regions by increasing the max region size and then merging regions
> together. Having a single table for all the data makes this process
> simpler, should you have to go through it.
>
> So even if there are small theoretical performance gains, which isn't
> even guaranteed, I think the pros of using a single table outweigh the
> cons.
>
> Also, using a single table will also keep the code a bit simpler as we
> won't have to do "if (blah) then use this table; else if (blah) then
> use that table; else if …"

OK, that makes sense then, just wanted to verify it :) I'll go forward
with a single ginormous table then.

> > Also, for the config and other data that would have a pretty small footprint, is it alright to split those into separate tables or is it better to have one table for all of that data? I want to figure out how to design the schema. Thanks again for your feedback!
>
> There is already a separate table for meta data, called "tsdb-uid".
> Granted the name implies that it's only for UID mappings, but I would
> recommend we put all the meta data in there anyway. A lot of the
> meta-data applies per-metric/tag name/tag value, and that table is a
> great fit to store it as it already has one row per metric/tag
> name/tag value. We'd simply add more columns to these rows.

Yeah, I'll keep the metadata stuff in the "tsdb-uid" table. But for
things like users I was thinking of creating "tsdb-users" and for
rollup/agg rules maybe "tsdb-aggroll-rules". Sort of mirroring a MySQL
schema. Does that make sense? Thanks!

tsuna

Jan 24, 2012, 1:05:29 PM

to ManOLamancha, OpenTSDB

On Tue, Jan 24, 2012 at 10:00 AM, ManOLamancha <clars...@gmail.com> wrote:
> Yeah, I'll keep the metadata stuff in the "tsdb-uid" table. But for
> things like users I was thinking of creating "tsdb-users" and for
> rollup/agg rules maybe "tsdb-aggroll-rules". Sort of mirroring a MySQL
> schema. Does that make sense? Thanks!

I would encourage you to put everything in the "tsdb-uid" table,
despite its name. As long as we carefully choose how to craft our
keys and column qualifiers, there won't be any collisions.

Rollup and aggregation rules will probably be a per-metric thing, so
it makes sense to store them along whatever other meta-data we'll add
to "tsdb-uid" for metrics.

Users are the only thing that would look weird in the "tsdb-uid"
table, but really at this point it would be better to not add a new
table.

David Prime

Jan 30, 2012, 12:09:36 PM

to OpenTSDB

If we're going to be overloading the uid table, would it not be
sensible to rename it? I really dislike unnecessary overloading or
poor naming, it makes traversing the system later very difficult.

ManOLamancha

Jan 30, 2012, 12:26:19 PM

to OpenTSDB

There are a number of things besides users that I'd like to throw into
storage such as the agg/rollup rules, threshold rules, permissions,
report rules, configs, audit logs and graph queries. So the uid table
could get pretty bloated with additional info. I originally thought to
store all of this in MySQL (which we'll be using at work anyway
because the Graphite front-end will need MySQL to store Django
objects) but Tsuna would like to keep external dependencies down,
which makes sense.

tsuna

Feb 4, 2012, 3:44:41 AM

to David Prime, OpenTSDB

This is true, I regret not calling this table "tsdb-meta". But
creating a new HBase table just for a dozen configuration knobs
doesn't make sense. So re-using "tsdb-uid" is fine.

Table names are configurable anyway. New installs could be deployed
with "tsdb-meta" as the table name, and existing installs that can't
undergo a table rename (as this requires downtime) can keep using
"tsdb-uid" until their next maintenance window.

Jonathan

Jul 5, 2012, 1:50:54 PM

to open...@googlegroups.com, David Prime

Is this effort still underway? My group is currently kicking off an OTSDB effort with similar requirements: distribution, aggregation, visualization, with plans for RabbitMQ, Esper CEP, etc. and we'd love to collaborate on this.

-Jonathan

ManOLamancha

Jul 5, 2012, 2:02:54 PM

to open...@googlegroups.com, David Prime

Hi Jonathan, absolutely. I was stuck with a higher-priority project that wraps up in the next week or two, so I'm about to start digging into this again. Let me know your ideas and we can coordinate. Thanks!

Pablo Chacin

Jul 9, 2012, 5:03:26 AM

to Jonathan, open...@googlegroups.com, David Prime

Same here! I proposed a similar architecture to my company. It was
"approved", but they haven't committed resources, so I'm still waiting for the
"green light" to start.

--
Pablo Chacin
R&D Engineer
SenseFields SL
Tlf (+34) 93 418 05 85
Baixada de Gomis 1,
08023 Barcelona (Spain)
http://www.sensefields.com/

m...@one.com

Jan 14, 2013, 8:03:28 AM

to open...@googlegroups.com

On Tuesday, 24 January 2012 15:52:43 UTC+1, ManOLamancha wrote:

> Ingest
>
> In our network, we have multiple points-of-presence around the globe
> with hundreds or thousands of machines in each location generating
> data. Since HBase doesn’t provide real-time writing across data
> centers (only replication) and I worry about the stability of
> connections from a TSD to a central HBase cluster over a WAN (MySQL
> doesn’t like this), I would like to create a scaled down ingest daemon
> that can live in each DC. The ingester would be a TSD, configured for
> ingest, that would intake the raw metric data from devices in the
> local DC. Then it would perform basic checks to make sure the data is
> valid (does it have the proper tags, is the data numeric?). It would
> spool the information to a local disk via SQLite in case the DC
> becomes partitioned and can’t talk to the central servers. If the DC
> was partitioned off, the ingest daemon could replay the data once a
> connection was re-established. Then it would push the data off to the
> central brokers. This style is similar to Splunk (a really neat
> product) where we install “forwarders” in different data centers that
> collect data, bundle it up, and transport it to central processing
> nodes.

IMHO, this could (and probably should) be a separate project. From my experience with Etsy's StatsD, such servers can be written in a fairly simple manner.

And why spool in SQLite? Wouldn't appending to plain files be simpler? (And, when all goes to hell, they could be bulk-imported to tsdb later on.)
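
For comparison, a plain-file spool along those lines needs only a few lines of Java. Each data point is appended as one line. On replay, lines are sent in order and the file is rewritten with whatever wasn't sent. Below is a minimal sketch with FileSpool as a hypothetical class. A crash between sending and rewriting would resend some lines, so delivery is at-least-once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Predicate;

class FileSpool {
    private final Path file;

    FileSpool(Path file) { this.file = file; }

    // Appends one line (e.g. a telnet-style "put" command) to the spool file.
    void append(String line) {
        try {
            Files.writeString(file, line + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends spooled lines in order until one fails, then rewrites the file with
    // the unsent remainder. Returns the number of lines sent.
    int replay(Predicate<String> sender) {
        try {
            if (!Files.exists(file)) return 0;
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            int sent = 0;
            while (sent < lines.size() && sender.test(lines.get(sent))) sent++;
            Files.write(file, lines.subList(sent, lines.size()), StandardCharsets.UTF_8);
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```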

> A critical component for ingest is to also add HTTP input and I would
> like to use a JSON format for this. JSON seems to be used all over the
> place now and it’s very easy to work with. I would like to add a JSON
> input and output format for OpenTSDB and try to get other folks to
> standardize on it instead of having tons of slightly different, single
> line formats à la the telnet interface or Graphite’s input methods.
> We’ve been using a format like this:
>
> {"timestamp":epoch_time, "metric":"metric.name", "value":value, "tags":{"tagname1":"tagvalue1", "tagname2":"tagvalue2"}}
>
> We’ve also used bulk formats where the metric name and tags are shared
> to cut down on network traffic (which isn’t usually an issue but when
> you have millions of data points traveling from dozens of DCs, it
> starts to become an issue)

As to what to send over HTTP, a client could easily set a 'Content-Type: application/json' header for JSON, and other types for whatever else the server accepts.
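
On the server side, dispatching on that header is mostly a matter of normalizing the media type and looking up a parser. Below is a minimal sketch. ContentTypes and the parser registry are hypothetical. An unknown type would naturally map to an HTTP 415 Unsupported Media Type response.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class ContentTypes {
    // Reduces e.g. "Application/JSON; charset=UTF-8" to "application/json".
    static String mediaType(String header) {
        int semi = header.indexOf(';');
        String base = semi >= 0 ? header.substring(0, semi) : header;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Looks up the input parser registered for the request's media type;
    // an empty result means the type is unsupported.
    static <T> Optional<T> select(Map<String, T> parsers, String header) {
        if (header == null) return Optional.empty();
        return Optional.ofNullable(parsers.get(mediaType(header)));
    }
}
```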

> Also as a part of this upgrade we would like to include some simple
> authentication to the API including LDAP integration and shared-key
> access. The shared key would be for communication between TSDs and
> source tools. LDAP for users. We need this because the ingest daemons
> would be running on networks (and possibly machines) open to the Net,
> so we need some way to lock them down a bit. It would be an opt-in
> option that users would config so we don’t break existing setups.

Authentication in the HTTP/presentation layer could easily be done via a proxy (and you need a perimeter host anyway, given Hadoop's lack of intra-cluster auth).

I might have missed something here, but I don't see any big difference between the rollups and aggregates parts; they both read like persisted/pre-warmed caches to me.

While caching most certainly would make sense for historical data, I see a lot of complexity in making sure the cache actually contains something useful.

Just for an example, I don't think min/max/avg rollups are enough; I want percentiles, standard deviation and medians. And for quite a few types of data, a plain sum just makes more sense - so that will either have to be configured somewhere, or use a good deal of disk storing all the un-used aggregates/rollups...
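
For reference, the extra statistics asked for here are straightforward to compute over a window of raw values. Below is a minimal sketch. It uses the nearest-rank percentile definition and the population standard deviation, both of which are assumptions. WindowStats is hypothetical.

```java
import java.util.Arrays;

class WindowStats {
    // Nearest-rank percentile: the smallest value with at least p% of the window at or below it.
    // Expects a non-empty, ascending array and 0 < p <= 100.
    static double percentile(double[] sorted, double p) {
        if (sorted.length == 0 || p <= 0 || p > 100) throw new IllegalArgumentException();
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("empty window");
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Population standard deviation of the window.
    static double stddev(double[] values) {
        double mean = Arrays.stream(values).average().orElseThrow();
        double sq = 0;
        for (double v : values) sq += (v - mean) * (v - mean);
        return Math.sqrt(sq / values.length);
    }
}
```

Min, max and sum combine across windows. Exact medians and percentiles can't be assembled from per-window results, which is part of why precomputing them is harder than it looks.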

> Metadata
>
> I would like to add meta data cells to the “metrics” and “tagk” rows
> that can be displayed in the GUI or API. I would have the TSDs loading
> from HBase every 5 minutes or so (configurable). Would it be better to
> have a separate cell for each value or maybe have one extra cell that
> includes a JSON string with the different values? A list of cells that
> I would add appears in the Schema section. All of this information
> would be accessible via the API and different interfaces could access
> it. We’ll have a wide range of users (from smart Ops folks to new NOC
> techs to management) so we need the metadata to help them make sense of
> the flood of data we’ll be tossing at them.

People seem to have converged on some good ideas on this elsewhere on the list, so I won't repeat it here.

> Thresholding/Alerting
>
> I know a few folks in the Google group have been asking for an
> alerting mechanism and I saw the Nagios pull script that lets it query
> the TSDB for data. I also saw some folks talking about Esper and after
> reading some more about that engine, I think it would be a really good
> and easy way of providing push alerts. The user would add Esper rules
> via the API and a single TSD instance (single because complex Esper
> queries need all of the data to function properly) would consume
> messages and compare the data against the rules. When a threshold is
> reached, it would perform one or more of the following actions:
> • Send an email using some Java email library
> • Launch a script. Users can write bridges to their own monitoring
> infrastructure. We would create one off the bat to interface with
> Zabbix.
> • Log it to HBase for auditing later (this happens regardless)
>
> Esper is mature, open source and written in Java. If users want, they
> can plugin a different engine.
> GUI

Couldn't Esper be another back-end/consumer, just like the HBase and Cassandra writers?

My only wish is that this sort of stuff is done as plug-ins of some sort, as I expect quite a few users will either have their own alerting infrastructure or wish to use something else.

> It sounds like a few folks are working on different GUIs for OpenTSDB
> so I was thinking of divorcing the HTTP GUI from the TSD and letting
> it access data only via the API. That way everyone could write
> whatever interface they want and easily integrate it into their
> existing control panels.

Yes, please!

> Part of the API calls I want to add include support for saving graphs
> and creating dashboards for different users. I envision using Graphite
> to let users design their dashboards and it would write the settings
> back to HBase via the TSD API.

Wait, what? You want to put a dashboard-saving API into OpenTSDB? Why not focus on giving OpenTSDB a kick-ass data API, and then let each dashboard/interface/consumer/whatever do their own storage?

I'm all for borrowing the Graphite web-interface (it is amazingly good on Carbon), but I think it should rather have a thin shim/proxy for serving data, rather than integrating it directly into OpenTSDB.

MQ

To maintain horizontal scalability and redundancy, I would like to
modularize the TSDs so that users can install the TSD on a box and
configure it to perform whatever role they want. MQ is an ideal
solution because it’s also horizontally scalable. The OpenTSDB modules
would hang off the MQ bus and each module (that works with data) would
set up a different queue (based on its role) to receive data in real
time. We’ve used RabbitMQ with a lot of success and I would start with
that for MQ work, but we could plug in any MQ engine. This means we
don’t need a ton of inter-process communication, but for instances
where we do need it, we can push messages back into the MQ brokers for
delivery.
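The queue-per-role idea can be sketched in-process with Java's `BlockingQueue`. This is only a stand-in for a broker, not the RabbitMQ client API: `publish` behaves roughly like a fanout exchange (every role gets a copy), while `send` targets one role's queue. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process stand-in for an MQ broker: one queue per module role.
final class RoleBus {
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    /** A module declares its role, which creates (or reuses) that role's queue. */
    BlockingQueue<String> subscribe(String role) {
        return queues.computeIfAbsent(role, r -> new LinkedBlockingQueue<>());
    }

    /** Copies a message to every role's queue, like a fanout exchange. */
    void publish(String message) {
        for (BlockingQueue<String> queue : queues.values()) {
            queue.offer(message);
        }
    }

    /** Delivers a message to one role only; false if nobody holds that role. */
    boolean send(String role, String message) {
        BlockingQueue<String> queue = queues.get(role);
        return queue != null && queue.offer(message);
    }

    /** Takes whatever is currently waiting on a role's queue, in FIFO order. */
    List<String> drain(String role) {
        List<String> out = new ArrayList<>();
        BlockingQueue<String> queue = queues.get(role);
        if (queue != null) {
            queue.drainTo(out);
        }
        return out;
    }
}
```

With a real broker the same split shows up as exchanges and bindings, and the queues survive a module restart, which this in-memory version does not.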

Another possibility is using Storm, which looks very promising but I’m
not sure how it coordinates between distributed processes. Plus it
would require a good amount of work to convert the TSD into a
distributable JAR.

Don't we have ZooKeeper for cluster coordination, or am I missing something here?

Or do you want to put all the actual data into RabbitMQ? If so, why not a back-end plug-in that just outputs to RabbitMQ? And I guess it wouldn't be too hard to write a small proxy that listens for RabbitMQ messages and forwards them to a TSD.
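The TSD-facing half of such a proxy could be as small as the sketch below. It formats data points as lines of the TSD's existing text protocol (`put <metric> <timestamp> <value> <tagk=tagv ...>`) and writes them to a TSD socket. The RabbitMQ consumer side is left out, and `TsdForwarder` with its methods is a hypothetical name; it also assumes, as in OpenTSDB 1.x, that every data point must carry at least one tag.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical forwarder: turns data points into TSD "put" lines and ships them.
final class TsdForwarder {

    /** Builds one line of the TSD's text "put" protocol, newline included. */
    static String putLine(String metric, long timestamp, String value, Map<String, String> tags) {
        if (tags.isEmpty()) {
            throw new IllegalArgumentException("a data point needs at least one tag");
        }
        StringBuilder sb = new StringBuilder("put ")
            .append(metric).append(' ')
            .append(timestamp).append(' ')
            .append(value);
        // Sort tags so the same data point always yields the same line.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            sb.append(' ').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return sb.append('\n').toString();
    }

    /** Sends already-formatted lines to a TSD at the given host and port. */
    static void forward(String host, int port, Iterable<String> lines) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
            }
            out.flush();
        }
    }
}
```

The value is passed as a string so integer and floating-point points keep whatever form the producer sent.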

Sticking to the UNIX philosophy, couldn't a lot of this be put in separate projects? Granted, deeply integrating such tools has its advantages, but it makes the whole project much more complex.

Reporter

Part of our use requires reports being generated every so often, so I
would write a schema in HBase for storing configuration data. Then a
daemon would run and spit the report data out via email and/or scripts
for ingestion by other services.

Provided OpenTSDB gets a better data API, it should be easy to write this as a separate program.

General

Configuration – I would like to use HBase as a centralized
configuration system. The main API TSDs would have a config
file pointing them at the HBase cluster. On startup, these would pull
their config data directly from HBase. All the other modules
would then simply require a config file with a list of API hosts (and
authentication) to connect to in order to pull their config info.
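On the module side, that bootstrap might look roughly like this: the local file only lists the API hosts, and the module asks each host in turn until one hands back its config. The `api.hosts` key, the class name and the `fetch` callback are all hypothetical, and authentication is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Function;

// Hypothetical bootstrap for a non-API module: the local file only lists API hosts.
final class ModuleBootstrap {

    /** Parses a comma-separated "api.hosts" entry (the key name is made up). */
    static List<String> apiHosts(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for an in-memory reader
        }
        List<String> hosts = new ArrayList<>();
        for (String host : props.getProperty("api.hosts", "").split(",")) {
            if (!host.trim().isEmpty()) {
                hosts.add(host.trim());
            }
        }
        return hosts;
    }

    /**
     * Asks each API host in order for this module's config and returns the first
     * answer. The fetch function returns null when a host is unreachable.
     */
    static String fetchConfig(List<String> hosts, Function<String, String> fetch) {
        for (String host : hosts) {
            String config = fetch.apply(host);
            if (config != null) {
                return config;
            }
        }
        throw new IllegalStateException("no API host answered: " + hosts);
    }
}
```

Listing several hosts is what keeps a single API TSD from becoming a startup dependency for every module.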

Again, I believe ZooKeeper is built for this exact purpose.

[...]

A lot of that sounds quite sensible to me.

All in all, I see a lot of stuff that would be solved by having some sort of pluggable front/back-ends.

- Keep "core" OpenTSDB small, lean and, hopefully, bug-free (and we bother tsuna less)

- No dependency-hell for simple installations.

- Allow for interfacing with proprietary stuff without having to open-source all of it.

Peter Speybrouck

Jan 14, 2013, 11:11:19 AM1/14/13

to open...@googlegroups.com

>> Why not focus on giving OpenTSDB a kick-ass data-API, and then let each dashboard/interface/consumer/whatever do their own storage.

I agree with this one.
In another discussion there was talk about canceling long-running queries. With such a kick-ass data API, you could run async or streaming queries and send a cancel for the request if it is taking too long.
The current HTTP API has no way of knowing which request you mean if you send an update (like a cancel) for an earlier request.
Another long-running query could be a full-text search on metric names (currently not a good idea if you have a lot of metrics, since HBase apparently does prefix searching).
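One way a server could know which request a cancel refers to is to hand every async query an ID up front. The sketch below keeps running queries in a map of `Future`s keyed by a random UUID, so a later cancel can interrupt the right one. `QueryRegistry` and its methods are hypothetical names; nothing like this exists in the current HTTP API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical: each async query gets an ID the client can later cancel by.
final class QueryRegistry {
    private final ExecutorService pool;
    private final Map<String, Future<?>> running = new ConcurrentHashMap<>();

    QueryRegistry(ExecutorService pool) {
        this.pool = pool;
    }

    /** Starts a query and returns the ID to hand back to the client. */
    String submit(Callable<?> query) {
        String id = UUID.randomUUID().toString();
        running.put(id, pool.submit(query));
        return id;
    }

    /** Cancels (and interrupts) the query; false if the ID is unknown or already done. */
    boolean cancel(String id) {
        Future<?> future = running.remove(id);
        return future != null && future.cancel(true);
    }

    /** Forgets a query once its results were delivered. */
    void complete(String id) {
        running.remove(id);
    }
}
```

Interrupting only helps if the query code actually checks for interruption (or an HBase scan is closed), so cancellation here is best-effort.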

m...@one.com

Jan 15, 2013, 6:55:22 AM1/15/13

to open...@googlegroups.com

On Monday, 14 January 2013 17:11:19 UTC+1, Peter Speybrouck wrote:

[...]

In another discussion there was talk about canceling long-running queries. With such a kick-ass data API, you could run async or streaming queries and send a cancel for the request if it is taking too long.
The current HTTP API has no way of knowing which request you mean if you send an update (like a cancel) for an earlier request.

I believe the accepted way of doing away with too-long HTTP requests is to close the underlying TCP connection. It does sound rather cumbersome to send another HTTP-request to ask the server to stop sending more data.

Another long-running query could be a full-text search on metric names (currently not a good idea if you have a lot of metrics, since HBase apparently does prefix searching).

If we stick to HTTP, I mainly see two options: long-polling (where the server sends chunked responses when it gets new data) and a decidedly stateful API built on top of HTTP (e.g. like scans in HBase's REST interface http://wiki.apache.org/hadoop/Hbase/HbaseRest#Scanning).
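The stateful option could work roughly like this: the client opens a scanner, fetches results in batches, and deletes the scanner if it gives up early, which also doubles as the cancel. This is loosely modeled on the create/next/delete flow of HBase's REST scanners; the class name and the HTTP paths in the comments are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stateful scan API: open a scanner, page through it, close it.
final class ScannerSessions<T> {
    private final Map<Long, Iterator<T>> scanners = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** e.g. POST /scanner: registers the results and returns the scanner ID. */
    long open(Iterable<T> results) {
        long id = nextId.getAndIncrement();
        scanners.put(id, results.iterator());
        return id;
    }

    /** e.g. GET /scanner/{id}: next batch; an empty batch means exhausted. */
    List<T> next(long id, int batchSize) {
        List<T> batch = new ArrayList<>();
        Iterator<T> it = scanners.get(id);
        if (it == null) {
            return batch;
        }
        while (batch.size() < batchSize && it.hasNext()) {
            batch.add(it.next());
        }
        if (!it.hasNext()) {
            scanners.remove(id);   // free server-side state as soon as we're done
        }
        return batch;
    }

    /** e.g. DELETE /scanner/{id}: the client gives up early. */
    boolean close(long id) {
        return scanners.remove(id) != null;
    }
}
```

The cost of this style is server-side state per client; a real version would also need a timeout to reap scanners whose clients vanished.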

ManOLamancha

Jan 18, 2013, 5:45:40 PM1/18/13

to open...@googlegroups.com

On Monday, January 14, 2013 8:03:28 AM UTC-5, m...@one.com wrote:

On Tuesday, 24 January 2012 15:52:43 UTC+1, ManOLamancha wrote:
Ingest

IMHO, this could (and probably should) be a separate project. From my experience with Etsy's StatsD, such servers can be written in a fairly simple manner.

And why spool in SQLite? Wouldn't appending to plain files be simpler? (And, when all goes to hell, they could be bulk-imported into tsdb later on.)

Yeah, this could be a separate project and just share the underlying OpenTSDB Java library.

I like SQLite since it gives you the ability to do a little work at the ingest point with some rudimentary querying and such, but it doesn't have to be. Flat files would work fine to start with.
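A flat-file spool really can be small. The sketch below appends one data point per line while the WAN link is down and later replays the lines in order, stopping at the first failed send and keeping the unsent tail on disk. `FileSpool` is a hypothetical name; it reopens the file per append for simplicity and ignores fsync and atomic rewrites, which a production spool would need.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical plain-file spool: append while the link is down, replay later.
final class FileSpool {
    private final Path file;

    FileSpool(Path file) {
        this.file = file;
    }

    /** Appends one data point per line, creating the file if needed. */
    void append(String line) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(line);
            out.newLine();
        }
    }

    /**
     * Replays spooled lines in order through send (false means the send failed).
     * Stops at the first failure and rewrites the file with the unsent tail.
     * Returns the number of lines delivered.
     */
    int replay(Predicate<String> send) throws IOException {
        if (!Files.exists(file)) {
            return 0;
        }
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        int sent = 0;
        while (sent < lines.size() && send.test(lines.get(sent))) {
            sent++;
        }
        if (sent == lines.size()) {
            Files.delete(file);
        } else {
            Files.write(file, lines.subList(sent, lines.size()), StandardCharsets.UTF_8);
        }
        return sent;
    }
}
```

If the spooled lines are already in the TSD's text format, the file can also be bulk-imported later, as suggested above.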

As to what to send over HTTP, a client could easily set a 'Content-Type: application/json' header for JSON, and other types for whatever else the server accepts.

I've already implemented Formatters that handle a lot of this, though there are other ways to do it, of course :)
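Picking a parser from the Content-Type header is mostly a lookup once any parameters such as `; charset=UTF-8` are stripped. The sketch below is hypothetical and is not the Formatter code mentioned above; unknown types are rejected with a message mirroring HTTP's 415 status.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical: choose a body parser from the request's Content-Type header.
final class FormatterRegistry {

    /** Turns a request body into whatever internal form the TSD uses. */
    interface Formatter {
        String parse(String body);
    }

    private final Map<String, Formatter> byMediaType = new LinkedHashMap<>();

    void register(String mediaType, Formatter formatter) {
        byMediaType.put(mediaType.toLowerCase(Locale.ROOT), formatter);
    }

    /** Strips parameters like "; charset=UTF-8" and matches case-insensitively. */
    Formatter forContentType(String contentType) {
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        Formatter formatter = byMediaType.get(mediaType);
        if (formatter == null) {
            throw new IllegalArgumentException("415 Unsupported Media Type: " + mediaType);
        }
        return formatter;
    }
}
```

The same registry could serve responses by looking at the Accept header instead.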

Authentication in the HTTP/presentation layer could easily be done via a proxy (and you need a perimeter host anyway, given Hadoop's lack of intra-cluster auth).

Yeah, to start off with, the proxy is fine, but eventually we may want some kind of auth/permission structure to control who can add/edit what. But for now it's not high on my list.

Rollups
Aggregates

I might have missed something here, but I don't see any big difference between rollups and the aggregates-parts; they both read like persisted/pre-warmed caches to me.

They are kinda like persisted caches, though since the data is different, it's not an actual cache. A rollup is simply a downsampling of a single time series, cached in storage, à la the query's downsampling algorithm. An aggregate would be a combination of many separate time series.
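The difference can be shown with two small helpers: a rollup averages one series into fixed time buckets, while an aggregate sums many series timestamp by timestamp. These are hypothetical helpers, not OpenTSDB code. Note that OpenTSDB's query-time aggregators interpolate when timestamps don't line up; this sketch simply sums whatever points share a timestamp.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helpers: rollup = one series, coarser buckets;
// aggregate = many series combined at each timestamp.
final class RollupVsAggregate {

    /** Rollup: average of each fixed-width bucket (timestamps in seconds). */
    static SortedMap<Long, Double> downsampleAvg(SortedMap<Long, Double> series, long bucketSeconds) {
        SortedMap<Long, double[]> acc = new TreeMap<>();   // bucket start -> {sum, count}
        for (Map.Entry<Long, Double> point : series.entrySet()) {
            long bucket = point.getKey() - Math.floorMod(point.getKey(), bucketSeconds);
            double[] sumCount = acc.computeIfAbsent(bucket, b -> new double[2]);
            sumCount[0] += point.getValue();
            sumCount[1] += 1;
        }
        SortedMap<Long, Double> out = new TreeMap<>();
        acc.forEach((bucket, sc) -> out.put(bucket, sc[0] / sc[1]));
        return out;
    }

    /** Aggregate: per-timestamp sum over all series, without interpolation. */
    static SortedMap<Long, Double> sumAcross(List<SortedMap<Long, Double>> seriesList) {
        SortedMap<Long, Double> out = new TreeMap<>();
        for (SortedMap<Long, Double> series : seriesList) {
            series.forEach((ts, value) -> out.merge(ts, value, Double::sum));
        }
        return out;
    }
}
```

Persisting the output of either function is what turns it into a rollup or aggregate table rather than a query-time computation.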

While caching most certainly would make sense for historical data, I see a lot of complexity in making sure the cache actually contains something useful.

Just for an example, I don't think min/max/avg rollups are enough; I want percentiles, standard deviation and medians. And for quite a few types of data, a plain sum just makes more sense, so that will either have to be configured somewhere, or use a good deal of disk storing all the unused aggregates/rollups...

You're absolutely right, which is why my goal is to develop a rule engine that folks can use to create whatever kind of rollups/aggs they want without doing unnecessary work.
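A rule that lists only the statistics it needs avoids storing unused rollups. The sketch below evaluates a requested list such as `sum`, `median`, `p95` or `stddev` over one non-empty bucket. The stat names and class are hypothetical; percentiles use the nearest-rank method (so the median of an even-sized bucket is the lower middle value), and the standard deviation is the population one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: compute only the statistics a rollup rule asks for.
final class RollupStats {

    /** Nearest-rank percentile of an already-sorted, non-empty array. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Population standard deviation. */
    static double stddev(double[] values) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double squares = 0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        return Math.sqrt(squares / values.length);
    }

    /** Evaluates the requested stats for one bucket, e.g. ["sum", "p95"]. */
    static Map<String, Double> compute(double[] bucket, List<String> wanted) {
        double[] sorted = bucket.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        Map<String, Double> out = new LinkedHashMap<>();
        for (String stat : wanted) {
            double value;
            if (stat.equals("sum")) {
                value = Arrays.stream(sorted).sum();
            } else if (stat.equals("min")) {
                value = sorted[0];
            } else if (stat.equals("max")) {
                value = sorted[sorted.length - 1];
            } else if (stat.equals("avg")) {
                value = Arrays.stream(sorted).average().orElse(Double.NaN);
            } else if (stat.equals("median")) {
                value = percentile(sorted, 50);
            } else if (stat.equals("stddev")) {
                value = stddev(sorted);
            } else if (stat.matches("p\\d{1,2}")) {
                value = percentile(sorted, Integer.parseInt(stat.substring(1)));
            } else {
                throw new IllegalArgumentException("unknown stat: " + stat);
            }
            out.put(stat, value);
        }
        return out;
    }
}
```

Exact percentiles need every raw value in the bucket; at scale a rollup engine would more likely store a sketch or histogram and accept approximate answers.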

Thresholding/Alerting

Couldn't Esper be another back-end/consumer, just like the HBase and Cassandra writers?

My only wish is that this sort of stuff is done as plug-ins of some sort, as I expect quite a few users will either have their own alerting infrastructure or wish to use something else.

That's pretty much the plan, I think, as we've discussed a bit elsewhere. The TSD would receive data, write it to persistent storage, and ship it onto a bus where separate applications, like Esper, can grab the data and do their thing. Then the output from Esper, to handle events, would be pluggable or scriptable so folks could use whatever they need for further handling.
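The receive, persist, then hand-off order could be expressed as a tiny pipeline with pluggable listeners. In this hypothetical sketch a storage failure propagates to the caller, while a failing plug-in is only logged so it cannot block ingest or the other plug-ins. The interface names are assumptions, not existing OpenTSDB types.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical TSD ingest path: persist first, then hand the point to plug-ins.
final class IngestPipeline {

    /** A pluggable consumer, e.g. a bridge to Esper or to a message bus. */
    interface DataPointListener {
        void onDataPoint(String metric, long timestamp, double value);
    }

    /** Stand-in for the storage backend (HBase, Cassandra, ...). */
    interface Storage {
        void write(String metric, long timestamp, double value) throws Exception;
    }

    private final Storage storage;
    private final List<DataPointListener> listeners = new CopyOnWriteArrayList<>();

    IngestPipeline(Storage storage) {
        this.storage = storage;
    }

    void addListener(DataPointListener listener) {
        listeners.add(listener);
    }

    /** Writes the point, then notifies every listener; plug-in errors are logged. */
    void ingest(String metric, long timestamp, double value) throws Exception {
        storage.write(metric, timestamp, value);   // storage errors reach the caller
        for (DataPointListener listener : listeners) {
            try {
                listener.onDataPoint(metric, timestamp, value);
            } catch (RuntimeException e) {
                System.err.println("listener failed: " + e);
            }
        }
    }
}
```

Calling listeners synchronously keeps the sketch short; a real TSD would likely publish to the bus asynchronously so a slow consumer cannot stall writes.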

Wait, what? You want to put a dashboard-saving API into OpenTSDB? Why not focus on giving OpenTSDB a kick-ass data-API, and then let each dashboard/interface/consumer/whatever do their own storage.

I'm all for borrowing the Graphite web interface (it is amazingly good on Carbon), but I think it should have a thin shim/proxy for serving data rather than being integrated directly into OpenTSDB.

I'm dumping the dashboard-saving idea :) After starting in on Graphite, I've found it can handle the dashboard stuff just fine.

MQ

Or do you want to put all the actual data into RabbitMQ? If so, why not a back-end plug-in that just outputs to RabbitMQ? And I guess it wouldn't be too hard to write a small proxy that listens for RabbitMQ messages and forwards them to a TSD.

We're arguing about this elsewhere, but I'd rather have the TSD accept data, normalize it, and then push it to RabbitMQ in a format other programs can work with.
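As an illustration of such a normalized format, the sketch below serializes one data point as a small JSON object with sorted tags. The field names (`metric`, `timestamp`, `value`, `tags`) are assumptions, not an agreed OpenTSDB wire format, and the hand-rolled escaping is only there to keep the example dependency-free; a real module would use a JSON library.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical normalized message a TSD could publish for other programs.
final class DataPointJson {

    /** Minimal JSON string quoting: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Field names are illustrative; tags are sorted for stable output. */
    static String toJson(String metric, long timestamp, double value, Map<String, String> tags) {
        if (!Double.isFinite(value)) {
            throw new IllegalArgumentException("JSON cannot carry NaN/Infinity: " + value);
        }
        StringBuilder sb = new StringBuilder("{\"metric\":").append(quote(metric))
            .append(",\"timestamp\":").append(timestamp)
            .append(",\"value\":").append(value)
            .append(",\"tags\":{");
        boolean first = true;
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(tag.getKey())).append(':').append(quote(tag.getValue()));
        }
        return sb.append("}}").toString();
    }
}
```

Because every module would consume the same shape, validation only has to happen once, at the TSD that normalizes the data.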

Modules

Sticking to the UNIX philosophy, couldn't a lot of this be put in separate projects? Granted, deeply integrating such tools has its advantages, but it makes the whole project much more complex.

Tsuna said at one point that he liked having all of these tools in one project, but we could separate them out, though they'll all depend on the core OpenTSDB library.

General

Configuration – I would like to use HBase as a centralized
configuration system. The main API TSDs would have a config
file pointing them at the HBase cluster. On startup, these would pull
their config data directly from HBase. All the other modules
would then simply require a config file with a list of API hosts (and
authentication) to connect to in order to pull their config info.

Again, I believe ZooKeeper is built for this exact purpose.

Good idea, but if we're using different backends, the config should be stored with the rest of the data, since some folks may not have ZooKeeper installed.

All in all, I see a lot of stuff that would be solved by having some sort of pluggable front/back-ends.

- Keep "core" OpenTSDB small, lean and, hopefully, bug-free (and we bother tsuna less)
- No dependency-hell for simple installations.
- Allow for interfacing with proprietary stuff without having to open-source all of it.

Sounds good to me.
