I have a monitoring probe that currently generates its "/metrics" page
by hand using custom code, and which I'm converting to use a standard
library (by first writing said standard library for Perl; more on that
in a later email).
The main metric that this probe exports is a histogram of "send" and
"receive" roundtrips for a messaging system. As well as exposing the
count, sum, and bounded buckets, it also keeps track of a
locally-calculated "maximum over the past 1 minute", which I can then
use with the max_over_time() function in Prometheus or Grafana to plot
maximum roundtrip times over longer time ranges, alongside the average.
This feels like a useful-enough feature that I'm considering adding it
to my client library. Perhaps as an option that can be enabled on a
Summary or Histogram metric, allowing it to track the min, max, or both
of its observations over a short period of time, and add that to the output
format, perhaps looking something like:

    recv_rtt_count 3
    recv_rtt_sum 3.397684
    recv_rtt_bucket{le="0.01"} 0
    recv_rtt_bucket{le="0.1"} 1
    ...
    recv_rtt_max_1m 1.476539

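
As a rough illustration of how such a value might be tracked, here is a
minimal sketch in Python (not the Perl library itself). The class name
WindowedMax, its methods, and the sample values are all hypothetical;
timestamps are passed in explicitly only to keep the example
deterministic.

```python
from collections import deque


class WindowedMax:
    """Track the largest observation seen within the last `horizon` seconds."""

    def __init__(self, horizon=60.0):
        self.horizon = horizon
        # (timestamp, value) pairs. Values strictly decrease from left to
        # right, so the leftmost entry is always the current maximum.
        self._entries = deque()

    def _expire(self, now):
        # Drop entries that have fallen out of the window.
        while self._entries and self._entries[0][0] <= now - self.horizon:
            self._entries.popleft()

    def observe(self, value, now):
        # Older entries that are not larger can never be the maximum again.
        while self._entries and self._entries[-1][1] <= value:
            self._entries.pop()
        self._entries.append((now, value))
        self._expire(now)

    def current(self, now):
        """Return the windowed maximum, or None if the window is empty."""
        self._expire(now)
        return self._entries[0][1] if self._entries else None


# Made-up roundtrip times, one every 20 seconds.
rtt_max = WindowedMax(horizon=60.0)
rtt_max.observe(0.25, now=0.0)
rtt_max.observe(1.476539, now=20.0)
rtt_max.observe(0.08, now=40.0)
print(f"recv_rtt_max_1m {rtt_max.current(now=50.0):.6f}")
# prints: recv_rtt_max_1m 1.476539
```

Memory use here is bounded by the number of observations inside the
horizon, which is one reason to keep the horizon short.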
"Best Practice" would encourage keeping that horizon as short as
possible, as it blurs out the graphs and also uses more memory in the
exporter, having to remember those values; but not so short as to risk
missing a collection. I keep mine at 1 minute because with a scrape
interval of 20s, each window spans about 3 scrapes, which
I feel is appropriate.
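
The arithmetic behind that choice can be sketched as follows (Python,
with hypothetical helper names; it assumes scrapes arrive at a fixed
interval and that both durations are whole seconds):

```python
def scrapes_per_horizon(horizon_s, scrape_interval_s):
    """Number of scrapes that fall within one aggregation horizon."""
    return horizon_s // scrape_interval_s


def tolerated_missed_scrapes(horizon_s, scrape_interval_s):
    """Consecutive scrapes that can be missed before a peak goes unseen."""
    return max(scrapes_per_horizon(horizon_s, scrape_interval_s) - 1, 0)


print(scrapes_per_horizon(60, 20))       # prints: 3
print(tolerated_missed_scrapes(60, 20))  # prints: 2
```

A horizon equal to the scrape interval would leave no margin at all,
since a single missed scrape could hide a peak.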
Is there any precedent in existing client libraries for doing this?
Something I can steal naming ideas from?
Barring any other idea, I was thinking something along the lines of
two new constructor arguments, something like

    aggregate => "max"          # To request the aggregation at all
    aggregate_horizon => "1m"   # To set the duration of time that is stored

How does that sit with people?
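
To make the naming question concrete, here is a small sketch (in Python
rather than Perl) of how those two arguments might map onto output
series names. The function names, the accepted unit letters, and the
"both" value are assumptions for illustration, not an existing API.

```python
import re

# Assumed unit letters; a real library might accept more.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_horizon(text):
    """Convert a duration such as "1m" or "30s" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unrecognised horizon: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def aggregate_series(aggregate, aggregate_horizon):
    """Return (suffix, horizon_seconds) pairs for the requested aggregation."""
    kinds = {"min": ["min"], "max": ["max"], "both": ["min", "max"]}
    if aggregate not in kinds:
        raise ValueError(f"unrecognised aggregate: {aggregate!r}")
    seconds = parse_horizon(aggregate_horizon)
    # The horizon is kept in the suffix as written, e.g. "max_1m".
    return [(f"{kind}_{aggregate_horizon}", seconds) for kind in kinds[aggregate]]


for suffix, seconds in aggregate_series("max", "1m"):
    print(f"recv_rtt_{suffix} covers the last {seconds}s")
# prints: recv_rtt_max_1m covers the last 60s
```

Putting the horizon into the series name keeps the output
self-describing, at the cost of the name changing if the horizon is
reconfigured.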
--
Paul "LeoNerd" Evans
leo...@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/