Adding percentile to OpenTSDB

netshade

unread,

Oct 3, 2011, 11:30:14 AM10/3/11

to OpenTSDB

I was thinking of adding a percentile() function to OpenTSDB, but as
I'm just getting to know the source code, figured I'd ask a few
questions.

Currently the aggregators count on a streaming arrival of data from
SpanGroup / DataPoints object - and for the metrics provided ( max/min/
sum/avg ), data ordering isn't important. For the percentile
functions I'm aware of, the way aggregators work would have to be
fundamentally changed to do a pre-run to generate the range of numbers
to sort to ascertain the percentile. The naive way would be to keep
the set in memory, which really just seems terrible to me. The less
naive way (though complicated) would seem to be to add a secondary
storage for metrics as received to bucket them accordingly and then
make the percentile calculation at call time by inspecting bucket
contents. Is there any work already ongoing here? Before I jump in,
would just be curious to know.

Also, one of the other messages in the mailing list seemed to indicate
that out of order data arrival may be something OpenTSDB may not
support in the future. That is to say, doing:

put <the-metric> <value> <2011 timestamp>
put <the-metric> <other-value> <2010 timestamp>

would be invalid at some near point in OpenTSDB's life due to planned
future optimizations. ( The discussion that led me to believe this
may be the case spoke of data-ordering; if I infer the point
incorrectly, I apologize ) Is this still the case?

Thanks,

Chris Z

tsuna

unread,

Oct 3, 2011, 12:50:12 PM10/3/11

to netshade, OpenTSDB

On Mon, Oct 3, 2011 at 8:30 AM, netshade <nets...@gmail.com> wrote:
> Currently the aggregators count on a streaming arrival of data from
> SpanGroup / DataPoints object - and for the metrics provided ( max/min/
> sum/avg ), data ordering isn't important. For the percentile
> functions I'm aware of, the way aggregators work would have to be
> fundamentally changed to do a pre-run to generate the range of numbers
> to sort to ascertain the percentile. The naive way would be to keep
> the set in memory, which really just seems terrible to me. The less
> naive way (though complicated) would seem to be to add a secondary
> storage for metrics as received to bucket them accordingly and then
> make the percentile calculation at call time by inspecting bucket
> contents. Is there any work already ongoing here? Before I jump in,
> would just be curious to know.

Two things you need to be aware of:
(1) As soon as the compaction branch is merged, my next big code
change will be to completely rewrite the read path to be fully
asynchronous / non-blocking, and work in a streaming fashion such that
the TSD will not have to hold all the data points of a query in
memory.
(2) As a consequence of (1), all the algorithms used in the
read-path have to be single-pass algorithms. In other words, they get
to look at each data point once and can only afford modest amounts of
extra memory storage.

To compute a percentile, the naive approach is, as you said, to sort
all the data points and then find the percentile from there. This
approach cannot be used in OpenTSDB for the two reasons above. The
second naive approach is to create a number of buckets and keep counts
of how many data points fall in each bucket. This approach only works
if you have a pretty good idea of what the distribution looks like
before you start, which isn't the case in OpenTSDB since the data is
arbitrary.

I too want to have percentile functions, but they need to be
implemented using state-of-the-hard streaming percentile methods. A
good starting point is to read "Quantiles on Streams" by Chiranjeeb
Buragohain and Subhash Suri, as I believe it does a good job at
summarizing the state of the art.
http://www.cs.ucsb.edu/~suri/psdir/ency.pdf

> Also, one of the other messages in the mailing list seemed to indicate
> that out of order data arrival may be something OpenTSDB may not
> support in the future. That is to say, doing:
>

> put <the-metric> <2011 timestamp> <value>
> put <the-metric> <2010 timestamp> <other-value>

>
> would be invalid at some near point in OpenTSDB's life due to planned

It's already "invalid" in the sense that if you send both data points
to the same TSD, the 2nd one will be rejected. If you send them both
to different TSDs, it'll work simply because the TSDs don't talk to
one another in order to keep things simple, and because technically
right now you can store out-of-order data if you're somewhat careful
and you know what you're doing. What I mean by that is that there's a
couple things to be aware of that can cause data problems when you
start storing data out of order. Most problems can be fixed
automatically by the "fsck" command, others might require you to go
and perform some manual surgery on the data table.

> future optimizations. ( The discussion that led me to believe this
> may be the case spoke of data-ordering; if I infer the point
> incorrectly, I apologize ) Is this still the case?

Yes.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

netshade

unread,

Oct 4, 2011, 4:36:51 PM10/4/11

to OpenTSDB

Cool, thanks for the pointers Benoit.

Chris Z

On Oct 3, 12:50 pm, tsuna <tsuna...@gmail.com> wrote:

> summarizing the state of the art.http://www.cs.ucsb.edu/~suri/psdir/ency.pdf

Reply all

Reply to author

Forward