[RFC] Annotations implementation

tsuna

Jan 30, 2013, 1:34:21 PM

to OpenTSDB, ManOLamancha, Peter Götz

Issue #16 is about adding support for annotations
(https://github.com/OpenTSDB/opentsdb/issues/16). The idea is to be
able to store strings in TSDB, at a given point in time, and somehow
resurface them on graphs. This could be useful to annotate things
such as code pushes, known outages that start/stop, or anything else
you consider noteworthy.

Peter Götz took a first stab at implementing some support
(https://github.com/OpenTSDB/opentsdb/pull/95). He started off by
having a separate table to store them, and when I said I'd rather keep
all the data in the "tsdb" table to avoid having to hit two tables on
each query, he then refactored the code to move them to a new column
family of the "tsdb" table.

Chris merged Peter's work in his "Scratch" branch
(https://github.com/manolama/opentsdb/tree/Scratch) and he changed the
code to not use another family in the "tsdb" table; instead, the
annotations are stored in-line with the data. The discussion started
on this small pull request, where he was trying to make OpenTSDB
v1.1.0 forward compatible with the upcoming support for annotations
(https://github.com/OpenTSDB/opentsdb/pull/150).

I would like to get more feedback on how we should go about
implementing support for annotations. I'm in favor of storing
annotations in-line with the data. But there are still several
questions to be answered:
- Are annotations global or per time-series? Or possibly we support
both types of annotations?
- In either case, how do we write an annotation to TSDB?
- Are annotations just plain strings, or should they be more complex objects?
- [probably some other things I can't think of right now]?

--
Benoit "tsuna" Sigoure

Peter Speybrouck

Jan 30, 2013, 2:06:39 PM

to open...@googlegroups.com, ManOLamancha, Peter Götz

In my experience with OSIsoft PI, which is more oriented to industrial process monitoring, this is how they handle annotations:

- Annotations are stored in separate files, not in-line with the data (archives are files that contain a certain period of time, so it is not the same architecture or database system as HBase).
- Annotations are per time series and linked to individual events (datapoints).
- There is a configurable maximum number of annotations per archive file; usually there are not many annotations compared to the number of datapoints.
- Several annotations of different types (string, float, int, timestamp, ...) can be linked to a single datapoint, but at my company we use mainly strings.
- Fetching data can be done with or without the annotations; including annotations has a noticeable negative impact on speed.
- PI does not have the concept of tags like OpenTSDB does, so should an annotation be linked to a metric or to a metric+tag combination?

This is not much of an answer to the questions above, but other implementations can be a good inspiration.

HBase might impose a certain (optimal) strategy but my knowledge of HBase (or lack thereof) is not sufficient to comment on a specific implementation.

Peter

ManOLamancha

Jan 30, 2013, 2:19:35 PM

to open...@googlegroups.com, ManOLamancha, Peter Götz

Thanks Tsuna,

On Wednesday, January 30, 2013 1:34:21 PM UTC-5, tsuna wrote:

I would like to get more feedback on how we should go about
implementing support for annotations. I'm in favor of storing
annotations in-line with the data. But there are still several
questions to be answered:
- Are annotations global or per time-series? Or possibly we support
both types of annotations?

I think most annotations would be specific to a timeseries. That certainly makes it super fast to add to graphs when executing a retrieval. Support for global notes would be pretty simple: we could store them in a dedicated row in the 'tsdb-uid' table. But queries that should return global notes would then have to check both tables.

For our purposes, we just want to be able to tag events on a graph so we can correlate issues as they appear. Let's see what everyone else wants to use them for.

- In either case, how do we write an annotation to TSDB?

HTTP/Telnet API. I'd prefer to accept a JSON format as....

- Are annotations just plain strings, or should they be more complex objects?

... I think they should be complex objects. Peep: https://github.com/manolama/opentsdb/blob/Scratch/src/core/Annotation.java. Each note should have a short "description" to display on the graph without taking up much space, then folks could click on the flag in a GUI to get to the detailed notes. And each note should be able to have an end-time that users could set to say when the "event" was over.
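
A minimal sketch of what such a "complex object" could look like. It is not the Annotation.java linked above; every field name here is an assumption chosen for illustration, and the `custom` map mirrors the key/value idea raised elsewhere in this thread.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class Annotation:
    """Hypothetical annotation record; field names are illustrative only."""
    start_time: int                  # Unix seconds when the event began
    description: str                 # short text shown on the graph
    notes: str = ""                  # longer detail shown when the flag is clicked
    end_time: Optional[int] = None   # None means the event has no recorded end
    custom: Dict[str, str] = field(default_factory=dict)  # free-form key/value pairs

    def to_json(self) -> str:
        # sort_keys keeps the output stable, which helps when diffing payloads
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Annotation":
        return cls(**json.loads(raw))


note = Annotation(start_time=1359570861, description="deploy 1.2.3",
                  custom={"category": "deployment"})
note.end_time = 1359571461  # the user closes the event ten minutes later
```

Keeping the end time on the same record means a client can draw the start and end markers as one related pair without matching up two separate notes.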

- [probably some other things I can't think of right now]?

Searching :) My fork has Lucene for indexing meta and annotations so we'll have to spin up a discussion about that in another thread.

ManOLamancha

Jan 30, 2013, 2:24:33 PM

to open...@googlegroups.com, ManOLamancha, Peter Götz

On Wednesday, January 30, 2013 2:06:39 PM UTC-5, Peter Speybrouck wrote:

several annotations of different types (string, float, int, timestamp, ...) can be linked to a single datapoint but at my company we use mainly strings

Is it a complex object or are you limited to "store string X with data point Y"? What do y'all use the annotations for in PI? (sounds like a neat app, I'll have to read up on it)

fetching data can be done with or without the annotations, including annotations has a noticeable negative impact on speed

Storing these inline with the data points means there isn't any impact on the query speed (other than deserialization when a note is found). But if we needed to include global support, that could impact speed a bit since we'd have to perform a separate lookup on each query.

Frederik Kraus

Jan 30, 2013, 2:29:30 PM

to tsuna, OpenTSDB, ManOLamancha, Peter Götz

Our main use-case (and also why Peter worked on annotations) is annotating time series with events. Events could be things like a deployment and/or config change. To correlate such events with changes in metrics we select and display specific annotations together with our graphs. Descriptions would be super helpful.

Simon Matic Langford

Jan 30, 2013, 5:10:19 PM

to Frederik Kraus, tsuna, OpenTSDB, ManOLamancha, Peter Götz

Having written an annotations layer in our proxy over tsdb (albeit stored in a MySQL db which we want to dispose of), we've had some experience of using them with a production system.

We've certainly found that when an annotation (like an outage or deployment) is added, it's often hard to work out which metrics it relates to, so we would like to at least see the capability to store annotations globally. However, we often choose not to display annotations, so we'd also want to be able to choose whether or not to return global annotations on a particular query, which would save the extra query when we're not interested.

Additionally we've found it useful to store additional data rather than just a string with a timestamp, which is why we asked Peter to make the code accept something like a JSON string which could then be used to encode more complex data (for example we categorise annotations and often add a link to a wiki page with some more detail). JSON is certainly a very desirable format for us, as we have built a pure JS UI for OpenTSDB (something we'd like to open source at some point).

ManOLamancha

Jan 30, 2013, 5:55:08 PM

to open...@googlegroups.com, Frederik Kraus, tsuna, ManOLamancha, Peter Götz, si...@exemel.co.uk

Fred, Simon, would the data format I proposed fit your needs? I had pinged Peter a while ago about the changes and he said he'd forward the data on to his Ops folks to see if it worked. Thanks!

Simon Matic Langford

Jan 31, 2013, 2:12:18 AM

to ManOLamancha, open...@googlegroups.com, Frederik Kraus, tsuna, Peter Götz, si...@exemel.co.uk

Yep, the custom field map does fine for all our current needs and I suspect gives us all the flexibility we need for future enhancements.

tsuna

Jan 31, 2013, 3:04:02 AM

to ManOLamancha, open...@googlegroups.com, Peter Götz

On Wed, Jan 30, 2013 at 11:19 AM, ManOLamancha <clars...@gmail.com> wrote:
> I think most annotations would be specific to a timeseries. That certainly
> makes it super fast to add to graphs when executing a retrieval. Support for
> global notes would be pretty simply, we could store them in a dedicated row
> in the 'tsdb-uid' table. But queries that should return results would now
> have to check both tables.

I'd like to keep the "tsdb-uid" table as a meta-data table only. We
could store the global annotations as annotations on the metric ID
"0", since this metric ID isn't used at the moment.
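
To make the metric ID "0" idea concrete, here is a rough sketch of how row keys might be built. It assumes the usual "tsdb" row key layout of a 3-byte metric UID, a 4-byte hour-aligned base time, then 3-byte tag key/value UID pairs. The function and constant names are made up for illustration, and tags are treated as optional since the question of tags on global annotations is still open.

```python
import struct

UID_WIDTH = 3          # assumed UID width in bytes
MAX_TIMESPAN = 3600    # assumed number of seconds covered by one row
GLOBAL_METRIC_UID = 0  # the unused metric ID proposed above


def row_key(metric_uid: int, timestamp: int, tags=()) -> bytes:
    """Build metric UID + hour-aligned base time + sorted tag UID pairs."""
    base_time = timestamp - (timestamp % MAX_TIMESPAN)
    key = metric_uid.to_bytes(UID_WIDTH, "big") + struct.pack(">I", base_time)
    for tagk_uid, tagv_uid in sorted(tags):
        key += tagk_uid.to_bytes(UID_WIDTH, "big")
        key += tagv_uid.to_bytes(UID_WIDTH, "big")
    return key


# A global annotation row (no tags) and a regular time series row for the same hour.
global_key = row_key(GLOBAL_METRIC_UID, 1359619442)
metric_key = row_key(1, 1359619442, tags=[(1, 1)])
```

Under these assumptions the global rows sort before every real metric, so they can be scanned as one contiguous range of the same table.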

Do global annotations have tags?

> Each note should have a short "description" to display on the graph without
> taking up much space, then folks could click on the flag in a GUI to get to
> the detailed notes. And each note should be able to have an end-time that
> users could set to say when the "event" was over.

Good idea. The end time is interesting, I hadn't thought about that.
I was imagining you'd have another annotation to say when something
ended. So if you wanna track, say, an outage, you'd have one
annotation for when the outage started, and one for when it ended.
Potentially with different descriptions. Although I guess if you have
the end time in the annotation, this allows you to show that the two
lines on the graph are related to one another.

> Searching :) My fork has Lucene for indexing meta and annotations so we'll
> have to spin up a discussion about that in another thread.

Yup.

--
Benoit "tsuna" Sigoure

Peter Speybrouck

Jan 31, 2013, 6:56:47 AM

to open...@googlegroups.com, ManOLamancha, Peter Götz

For some extra inspiration...

A visual example of how OSIsoft PI and their display tool ProcessBook show annotations:
http://img521.imageshack.us/img521/5103/annot.png

At the top of the chart, you see that there is an annotation at that timestamp for one of the signals.
On the right you see the actual datapoints for one of the signals (type string in this case) with 3 bits for questionable, annotated, substituted. In this case, all datapoints have an annotation because it holds the username of the guy clicking on a button.
Bottom-right you see the annotation(s) of the selected datapoint. In this case only 1 string annotation (value, type, description).

Another thing that is more clearly visible in this example:
http://img20.imageshack.us/img20/8190/annot2.png

The dotted line between the annotation icons indicates that all datapoints between the indicated ones are annotated, which looks better than a bunch of overlapping icons.

Though, I haven't had the time to check what the current implementation looks like.

Simon Matic Langford

Jan 31, 2013, 1:10:36 PM

to Peter Speybrouck, OpenTSDB, ManOLamancha, Peter Götz

Similar, from our UI. It shows a recent HBase outage our TSDB suffered and all the changes to recover from it.

http://img856.imageshack.us/img856/7708/annotations.png

Ion Savin

Feb 1, 2013, 4:31:17 AM

to open...@googlegroups.com

On 01/30/2013 09:29 PM, Frederik Kraus wrote:
> Our main use-case (and also why Peter worked on annotations) is
> annotating time series with events. Events could be things like a
> deployment and/or config change. To correlate such events with changes
> in metrics we select and display specific annotations together with our
> graphs. Descriptions would be super helpful.

With global annotations it could be useful to be able to use annotations
when specifying the start/end points

start=a:deploy{id=123}-48h
end=a:deploy{id=123}+48h

and to be able to specify several annotation "classes" to be added to
the graph (or none) in a way similar to how metrics are specified:

m=proc.loadavg.5min{host=*}
a=deploy{id=*},issue_report{id=*,component=frontend}

(to get the changes deployed in this interval and the issues reported
for example).
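
One possible way to parse the proposed `a=` parameter, as a sketch only. The grammar is inferred from the two examples above (a name followed by an optional `{key=value,...}` block, with selectors separated by commas); the function names are hypothetical and nothing here reflects an existing OpenTSDB parser.

```python
import re
from typing import Dict, List, Tuple

# name, optionally followed by {k=v,...}; the character set for names is an assumption
SELECTOR = re.compile(r"^([\w.\-]+)(?:\{([^{}]*)\})?$")


def split_top_level(spec: str) -> List[str]:
    """Split on commas that are not inside a {...} block."""
    parts, depth, current = [], 0, []
    for ch in spec:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_selector(text: str) -> Tuple[str, Dict[str, str]]:
    match = SELECTOR.match(text)
    if match is None:
        raise ValueError(f"bad annotation selector: {text!r}")
    name, body = match.group(1), match.group(2)
    tags = dict(pair.split("=", 1) for pair in body.split(",")) if body else {}
    return name, tags


def parse_annotation_param(spec: str) -> List[Tuple[str, Dict[str, str]]]:
    return [parse_selector(part) for part in split_top_level(spec)]


selectors = parse_annotation_param(
    "deploy{id=*},issue_report{id=*,component=frontend}")
```

The brace-depth counter is what lets the comma inside `issue_report{...}` stay part of that selector instead of splitting it.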

Regards,
Ion Savin

ManOLamancha

Feb 1, 2013, 2:51:07 PM

to open...@googlegroups.com, ManOLamancha, Peter Götz

On Thursday, January 31, 2013 3:04:02 AM UTC-5, tsuna wrote:

I'd like to keep the "tsdb-uid" table as a meta-data table only. We
could store the global annotations as annotations on the metric ID
"0", since this metric ID isn't used at the moment.

Do global annotations have tags?

That'd work well for storage of the globals. If we use a metric row then we could easily add tags and filter like Ion mentioned though we'll have to come up with some examples. Otherwise, the annotation itself has a custom field which is a set of custom tag/value pairs.

tsuna

Feb 3, 2013, 6:17:38 AM

to Ion Savin, open...@googlegroups.com

On Fri, Feb 1, 2013 at 1:31 AM, Ion Savin <co...@gmx.net> wrote:
> With global annotations it could be useful to be able to use annotations
> when specifying the start/end points
>
> start=a:deploy{id=123}-48h
> end=a:deploy{id=123}+48h

In order to be able to use an annotation instead of a timestamp, we
would need to maintain a mapping between the annotation and the
timestamp it represents. For instance in your query above, how do we
look up what timestamp is represented by "deploy{id=123}"?
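
To illustrate the mapping problem, here is a small sketch that resolves a spec such as `a:deploy{id=123}-48h` once some index from annotation to timestamp exists. The index is a plain dict standing in for whatever lookup would really be needed; how to maintain it is exactly the open question. The unit letters and all names are assumptions.

```python
import re

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # assumed unit suffixes
TIME_SPEC = re.compile(
    r"^a:(?P<selector>[^{}]+\{[^{}]*\})(?P<sign>[+-])(?P<amount>\d+)(?P<unit>[smhd])$")

# Hypothetical index from selector string to the annotation's Unix timestamp.
annotation_index = {"deploy{id=123}": 1359570861}


def resolve(spec: str, index: dict) -> int:
    """Turn 'a:<selector><+|-><amount><unit>' into an absolute timestamp."""
    match = TIME_SPEC.match(spec)
    if match is None:
        raise ValueError(f"bad time spec: {spec!r}")
    anchor = index[match.group("selector")]  # KeyError if the annotation is unknown
    offset = int(match.group("amount")) * UNITS[match.group("unit")]
    return anchor - offset if match.group("sign") == "-" else anchor + offset


start = resolve("a:deploy{id=123}-48h", annotation_index)
end = resolve("a:deploy{id=123}+48h", annotation_index)
```

The parsing is the easy part; the dict lookup hides the real cost, since without a dedicated index the timestamp can only be found by scanning.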

--
Benoit "tsuna" Sigoure

Ion Savin

Feb 4, 2013, 5:24:47 AM

to tsuna, open...@googlegroups.com

Hi tsuna,

>> With global annotations it could be useful to be able to use annotations
>> when specifying the start/end points
>>
>> start=a:deploy{id=123}-48h
>> end=a:deploy{id=123}+48h
>
> In order to be able to use an annotation instead of a timestamp, we
> would need to maintain a mapping between the annotation and the
> timestamp it represents. For instance in your query above, how do we
> look up what timestamp is represented by "deploy{id=123}"?

Apologies, my post wasn't really on topic (annotations implementation)
and I was looking at the two cases from a user perspective.

Would it make sense to have two concepts with different implementations?
* annotations - data attached to a data point stored in the same cell
* (global) events - named timestamps with data attached

I'm not familiar with HBase schema design and performance considerations
so the below might not make sense:

Schema:
row(event_class^timestamp)->family(id):column(event_id)->cell(data)

For the two use-cases:
1) when used in start/end; this should be expensive, I guess, but rare
scan(deploy^0L -> deploy^Long.MAX_VALUE; family(id); column(123))

2) when overlaid on the graph
scan(deploy^start_time -> deploy^end_time; family(id); column(*))

If this doesn't make sense please ignore. The start/end time use-case is
in the "nice to have" category so no need to bend things for it.
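
The two scans can be tried out against a toy in-memory model of the proposed schema. This is only a sketch: a sorted list of `(event_class, timestamp)` tuples stands in for HBase's ordered row keys, the single `id` family is implied, and the scan is start-inclusive and stop-exclusive as HBase scans are. All names and sample data are invented.

```python
import bisect
from typing import Dict, List, Optional, Tuple

row_keys: List[Tuple[str, int]] = []               # kept sorted, like HBase rows
rows: Dict[Tuple[str, int], Dict[str, str]] = {}   # row key -> {event_id: data}


def put(event_class: str, timestamp: int, event_id: str, data: str) -> None:
    key = (event_class, timestamp)
    if key not in rows:
        bisect.insort(row_keys, key)
        rows[key] = {}
    rows[key][event_id] = data


def scan(event_class: str, start: int, stop: int,
         event_id: Optional[str] = None) -> List[Tuple[int, str, str]]:
    """Return (timestamp, event_id, data) for rows in [start, stop)."""
    lo = bisect.bisect_left(row_keys, (event_class, start))
    hi = bisect.bisect_left(row_keys, (event_class, stop))
    found = []
    for key in row_keys[lo:hi]:
        for column, data in rows[key].items():
            if event_id is None or column == event_id:
                found.append((key[1], column, data))
    return found


put("deploy", 100, "122", "release A")
put("deploy", 200, "123", "release B")
put("issue_report", 150, "7", "frontend errors")

# Use case 1: find one event by id; touches every row of the class.
lookup = scan("deploy", 0, 2**63 - 1, event_id="123")
# Use case 2: all events of a class inside the graph's time window.
overlay = scan("deploy", 50, 150)
```

The model shows why use case 1 is the expensive one: the event id is a column qualifier, so the scan cannot narrow the row range and must walk the whole event class.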

Regards,
Ion Savin

ManOLamancha

Feb 7, 2013, 8:53:20 PM

to open...@googlegroups.com, tsuna

On Monday, February 4, 2013 5:24:47 AM UTC-5, Ion Savin wrote:

If this doesn't make sense please ignore. The start/end time use-case is
in the "nice to have" category so no need to bend things for it.

I like Tsuna's idea of storing the "global" annotations in the uid=0 metric row in the same way you'd store one that is associated with a metric. Then when you build a query, you would normally get annotations associated with the metric(s), but you could specify something like "&global_annotations=true" and it would run a parallel query to scan the uid=0 row for annotations within the query timespan.
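
A rough sketch of that query flow, with an in-memory dict standing in for the annotation cells in the "tsdb" table. The store contents, function names, and the thread pool are all assumptions made for illustration; only the uid=0 convention and the `global_annotations` flag come from the discussion above.

```python
from concurrent.futures import ThreadPoolExecutor

GLOBAL_UID = 0  # metric UID reserved for global annotations

# Hypothetical stand-in for stored annotations: metric UID -> [(timestamp, text)].
STORE = {
    GLOBAL_UID: [(120, "datacenter failover"), (900, "holiday freeze")],
    42: [(100, "deploy 1.2.3"), (300, "config change")],
}


def scan_annotations(metric_uid, start, end):
    """Return annotations for one metric UID within [start, end]."""
    return [(ts, text) for ts, text in STORE.get(metric_uid, [])
            if start <= ts <= end]


def query_annotations(metric_uid, start, end, global_annotations=False):
    # The uid=0 scan is only issued when the caller asks for it.
    uids = [metric_uid] + ([GLOBAL_UID] if global_annotations else [])
    with ThreadPoolExecutor(max_workers=len(uids)) as pool:
        batches = list(pool.map(lambda uid: scan_annotations(uid, start, end), uids))
    return sorted(note for batch in batches for note in batch)


local_only = query_annotations(42, 0, 500)
with_global = query_annotations(42, 0, 500, global_annotations=True)
```

Because the flag defaults to off, queries that don't want global notes pay nothing extra, which matches the opt-in behaviour Simon asked for.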
