Reset timers/histograms when reporting to Ganglia?


pablo fernandez

Oct 16, 2013, 4:53:03 PM
to metrics-user
Hi,

I'm reporting percentiles to Ganglia, just wanted to check if the Reservoir values are reset each time the reporter asks for them (seems like the logical thing to do). If not, why?

Also, is it a good idea to report these percentiles to Ganglia? My guess is that they are fine until the RRD starts to compact them, but I would like to hear some opinions.

Pablo

Coda Hale

Oct 16, 2013, 5:02:04 PM
to metric...@googlegroups.com
They are not reset.

It's a fine idea to report percentiles.


--
You received this message because you are subscribed to the Google Groups "metrics-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metrics-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Coda Hale
http://codahale.com

Pablo Fernandez

Oct 16, 2013, 5:49:56 PM
to metric...@googlegroups.com
But isn't it better to reset every read and then calculate percentiles? If you don't reset them, how can you make historic values not influence the more recent "real" values?

Coda Hale

Oct 16, 2013, 7:26:54 PM
to metric...@googlegroups.com
You may want to look at the documentation for and implementation of the reservoirs.

Long story short: there are no such things as "real values" unless you're sending all recorded values downstream for aggregation. You're using Ganglia, which isn't capable of that, which means something has to aggregate. That something is Metrics.
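To make "something has to aggregate" concrete, here is a minimal Python sketch of a fixed-size uniform sample (Vitter's Algorithm R). It is not the Metrics code, and the default reservoir discussed in this thread is the exponentially decaying one, which additionally favours recent values. The class name `UniformSample` and the seed are illustrative assumptions; the point is only that percentiles are computed from a bounded sample that stands in for the full stream.

```python
import random


class UniformSample:
    """Hypothetical fixed-size uniform sample of a stream (Algorithm R)."""

    def __init__(self, size, rng=None):
        self.size = size
        self.count = 0      # total number of values seen so far
        self.values = []    # the retained sample, at most `size` long
        self._rng = rng or random.Random()

    def update(self, value):
        self.count += 1
        if len(self.values) < self.size:
            # Still filling up: keep everything.
            self.values.append(value)
        else:
            # Keep the new value with probability size / count by
            # overwriting a uniformly chosen slot.
            slot = self._rng.randrange(self.count)
            if slot < self.size:
                self.values[slot] = value


sample = UniformSample(100, random.Random(42))
for latency in range(10_000):
    sample.update(latency)

# 10,000 values were recorded, but only 100 are held in memory.
print(sample.count, len(sample.values))
```

Because nothing is ever cleared, such a sample describes the whole lifetime of the process, which is the behaviour the question about resetting is concerned with.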

pablo fernandez

Oct 17, 2013, 6:58:57 AM
to metrics-user
Thanks Coda, I've just read the docs and was wondering: if I'm reporting to Ganglia exactly every minute, should I use a SlidingTimeWindowReservoir of 1 minute instead of the default?

Also (and maybe this is a question for the Ganglia mailing list), are percentiles useless once Ganglia (or RRDtool) starts compressing them due to lack of space? I can think of cases where taking a "percentile average" is not ideal. What do you guys use for compacting historic percentiles?

Thanks again.


--
You received this message because you are subscribed to a topic in the Google Groups "metrics-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/metrics-user/0O7Uyd2kx7c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to metrics-user...@googlegroups.com.

Justin Mason

Oct 17, 2013, 8:49:47 AM
to metric...@googlegroups.com
hi Pablo --

for certain reporters at least, I agree.  For example, when displaying a time-series chart of percentiles over time in Graphite, it makes sense to be able to report the percentile distribution for measurements which took place *within* a single time period in the graph, rather than using the decaying reservoir approach, which will report the percentile distribution for measurements *including* that time period.
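The difference between "within" and "including" can be sketched in a few lines of Python. This is not the class from the gist and not the Metrics API; `ResetOnReportHistogram` and `percentile` are hypothetical helpers, and the contrast case is a plain list that is never cleared. The default decaying reservoir is biased toward recent data, so its behaviour would fall somewhere between the two extremes shown here.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (q is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


class ResetOnReportHistogram:
    """Hypothetical helper: keeps every value seen since the last report."""

    def __init__(self):
        self._values = []

    def update(self, value):
        self._values.append(value)

    def report(self, q):
        """Return the q-th percentile for this interval, then start a new one."""
        if not self._values:
            return None
        result = percentile(self._values, q)
        self._values = []
        return result


per_interval = ResetOnReportHistogram()
never_reset = []  # contrast case: every value ever recorded

# Interval 1: a slow period (100 calls at 500 ms).
for _ in range(100):
    per_interval.update(500)
    never_reset.append(500)
first_report = per_interval.report(95)

# Interval 2: the service has recovered (100 calls at 20 ms).
for _ in range(100):
    per_interval.update(20)
    never_reset.append(20)
second_report = per_interval.report(95)
cumulative_p95 = percentile(never_reset, 95)

print(first_report, second_report, cumulative_p95)
```

The per-interval p95 drops to 20 ms as soon as the slow period ends, while the never-cleared p95 still reports 500 ms. Note that this sketch is unbounded in memory and not thread-safe.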

I've implemented this (sorry, still against Metrics 2.2.0) using the following class: https://gist.github.com/jmason/7024259

We've been using this in production for several months, and the other consumers of our metrics and I are generally happier -- IMO this is a more comprehensible behaviour for Graphite, and consistent with how percentile measurements work in e.g. Voldemort, Amazon's internal metrics, etc.

--j.



pablo fernandez

Oct 17, 2013, 9:53:09 AM
to metrics-user
Hi Justin,

Awesome stuff! Thanks for sharing. A few questions:

- Do you use the default reservoir for your timers, or do you switch to the SlidingTimeWindowReservoir with a certain period? Just thinking out loud here: what would be the difference between using a SlidingTimeWindowReservoir with 5 minutes and using the default reservoir, reporting it every 5 minutes and then clearing?

- I guess Graphite uses a round-robin database or something similar. How do you handle percentile compression? Do you take the "mean" of the percentiles and just accept the fact that it's not entirely correct, or is there an alternative?

Thanks so much.



Justin Mason

Oct 17, 2013, 10:06:56 AM
to metric...@googlegroups.com
On Thu, Oct 17, 2013 at 2:53 PM, pablo fernandez <fernande...@gmail.com> wrote:
> - Do you use the default reservoir for your timers, or do you switch to the SlidingTimeWindowReservoir with a certain period? Just thinking out loud here: what would be the difference between using a SlidingTimeWindowReservoir with 5 minutes and using the default reservoir, reporting it every 5 minutes and then clearing?


We use the default reservoir (since we're still on 2.2.0, which I don't think supports switching reservoirs).

I considered the approach of upgrading to 3.0.0 and using the SlidingTimeWindowReservoir with a 1 minute period, and reporting every minute, but as the docs note, 'While SlidingTimeWindowReservoir is easier to understand than ExponentiallyDecayingReservoir, it is not bounded in size, so using it to sample a high-frequency process can require a significant amount of memory. Because it records every measurement, it’s also the slowest reservoir type.'  These limitations didn't work well for some of our use-cases.
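The memory concern in that quote can be illustrated with a small Python sketch of a sliding time window. This is an assumption-laden stand-in, not the SlidingTimeWindowReservoir implementation: the class name `SlidingTimeWindow`, the explicit `now` argument (used instead of a real clock to keep the example deterministic), and the helper `retained_after` are all hypothetical.

```python
from collections import deque


class SlidingTimeWindow:
    """Hypothetical window that keeps every value from the last N seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def update(self, value, now):
        self._samples.append((now, value))
        self._trim(now)

    def snapshot(self, now):
        self._trim(now)
        return [value for _, value in self._samples]

    def _trim(self, now):
        # Drop everything recorded at or before the start of the window.
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()


def retained_after(rate_per_second, seconds=120, window_seconds=60):
    """Record `rate_per_second` values each second and count what is kept."""
    window = SlidingTimeWindow(window_seconds)
    for t in range(seconds):
        for _ in range(rate_per_second):
            window.update(1.0, now=t)
    return len(window.snapshot(now=seconds - 1))


print(retained_after(10), retained_after(100))
```

The number of retained values is the window length multiplied by the update rate, so a 1-minute window on a process recording thousands of values per second holds tens of thousands of entries per metric, and every update pays for an append plus trimming.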


> - I guess Graphite uses a round-robin database or something similar. How do you handle percentile compression? Do you take the "mean" of the percentiles and just accept the fact that it's not entirely correct, or is there an alternative?

Yes -- we accept it's not entirely correct.  Our retention settings keep 1-minute data for 24 hours, then compress it to 5 minutes per data point, using averaging; this averages the five 1-minute percentile measurements in each 5-minute period.  That's pretty unavoidable in Graphite, unfortunately, until they come up with a way to support percentiles natively.
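A small worked example shows why averaging percentiles is "not entirely correct". The numbers below are made up for illustration, and `percentile` is a hypothetical nearest-rank helper rather than anything from Graphite or Metrics: four quiet minutes at 10 ms, and one minute in which 10 of 100 requests take 1000 ms.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (q is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Five 1-minute windows of 100 latency measurements each (milliseconds).
minutes = [[10] * 100 for _ in range(4)]
minutes.append([10] * 90 + [1000] * 10)

# What gets stored at 1-minute resolution: one p99 per minute.
per_minute_p99 = [percentile(minute, 99) for minute in minutes]

# What averaging-based compaction keeps for the 5-minute data point.
averaged_p99 = sum(per_minute_p99) / len(per_minute_p99)

# The p99 of all 500 raw measurements in the same 5 minutes.
pooled_p99 = percentile([v for minute in minutes for v in minute], 99)

print(per_minute_p99, averaged_p99, pooled_p99)
```

The averaged value is 208 ms, a latency that no request experienced, while the p99 over the raw measurements is 1000 ms. The raw values are gone by the time compaction runs, so the pooled figure cannot be recovered from the stored per-minute percentiles.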

Graphite's compression causes problems in general, for all kinds of metrics.  It's suboptimal, but that's graphite :(

--j. 


pablo fernandez

Oct 17, 2013, 10:32:20 AM
to metrics-user
Cool, we're on Ganglia (not Graphite) and Metrics 3, so I'll just switch to the SlidingTimeWindowReservoir and set it to the reporting interval. If I run into any issues I'll chime in so you know before upgrading. Least I can do :)

Thanks again!

