building histograms from gauges

jbu...@gmail.com

Jul 19, 2018, 6:47:23 PM
to Prometheus Users

Is there no way in Prometheus to take a series of gauges (e.g. load average across all hosts) and then calculate a derived histogram from that at query time?


If not, what's the best workaround for this, outside of accompanying every histogram with a gauge just in case you might want to aggregate it in this way?

Brian Brazil

Jul 20, 2018, 2:45:45 AM
to jbu...@gmail.com, Prometheus Users


On Fri 20 Jul 2018, 00:47, <jbu...@gmail.com> wrote:

> is there no way in prometheus to take a series of gauges (e.g. load average across all hosts) and then calculate a derived histogram from that at query time?


Look at the quantile aggregator. Though I'm not sure the result is meaningful for load average.

Brian 


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To post to this group, send email to promethe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/6c2af7b5-816c-4661-a990-6e15c915a41e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthias Rampke

Jul 20, 2018, 3:23:42 AM
to Brian Brazil, Prometheus Users, jbu...@gmail.com
What would you like to do with the histogram?

For showing an overview of the state of a cluster, I like to throw collections of gauges into Grafana heat maps. This isn't very useful quantitatively (I can't get it to reliably count), but it is great for seeing at a glance if e.g. there is an outlier or load is rising across the cluster. Plotting quantiles or max/avg/min takes longer to interpret right.

This is not at query time, but I suppose you could construct the histogram with a series of recording rules like

hist{le="+Inf"} = count(gauge) or vector(0)
hist{le="100"} = count(gauge <= 100) or vector(0)

and so on, with one rule per bucket boundary. Please excuse the old syntax, it's too early to write YAML on a touch keyboard.

/MR
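
For anyone reading along later: in the YAML rule-file format used since Prometheus 2.0, the two rules above might be written roughly as follows. This is only a sketch. The group name and the extra le="50" bucket are made up for illustration, and `gauge` and `hist` are the same placeholder names used in the message above.

```yaml
groups:
  - name: gauge_histogram          # hypothetical group name
    rules:
      # +Inf bucket: total number of gauge series (0 if there are none).
      - record: hist
        expr: count(gauge) or vector(0)
        labels:
          le: "+Inf"
      # Cumulative bucket: number of series with a value of at most 100.
      - record: hist
        expr: count(gauge <= 100) or vector(0)
        labels:
          le: "100"
      # Further buckets follow the same pattern; this boundary is arbitrary.
      - record: hist
        expr: count(gauge <= 50) or vector(0)
        labels:
          le: "50"
```

Each bucket counts everything at or below its boundary, so the buckets are cumulative, which is the layout histogram_quantile() expects.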

Jonathan Burdge

Jul 20, 2018, 2:12:06 PM
to brian....@robustperception.io, promethe...@googlegroups.com
Hi Brian,

Load average was a bad example, you're right.  A better example might be the distribution of fleet cpu utilization.  I'm not sure what you mean by quantile aggregator.  Do you mean the histogram_quantile() function?  My reading is that this is only going to work on histograms (it requires the le label).  Am I correct in this?

Jonathan Burdge

Jul 20, 2018, 2:17:29 PM
to m...@soundcloud.com, brian....@robustperception.io, promethe...@googlegroups.com
Thanks Matthias,

What I'd like is to be able to determine e.g. the median or 95th percentile from arbitrary sets of gauges in an ad hoc way (sometimes because I don't control the system reporting the gauge.)  The heat map approach seems like a great visualization tool, but I definitely need more specific data as well. 

Recording a histogram from a gauge is an interesting approach. Are there any major drawbacks?  Obviously the collection itself will be more resource intensive; is there anything else I should be aware of?

Jonathan Burdge

Jul 20, 2018, 10:48:35 PM
to m...@soundcloud.com, brian....@robustperception.io, promethe...@googlegroups.com
Hi Matthias,

I tried this today, and it got me part of the way there, but it’s not giving me quite what I want.   I think I may be approaching this wrong. 

I recorded CPU utilization gauges into a histogram, as you suggested. 

When I take quantile(0.5, hist) I get the count of hosts that are less than 50% utilized, rather than the utilization of the median host. Can you help me understand what I’m doing wrong?

J




Julius Volz

Jul 21, 2018, 2:43:11 PM
to Jonathan Burdge, Matthias Rampke, Brian Brazil, Prometheus Users
quantile() is meant to be used on a set of *non*-histogram series in this way:

  quantile(0.95, my_cpu_usage_metric)

This would give you the 95th percentile CPU usage across your fleet. See https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators

When you have a histogram, you can compute quantiles from it using histogram_quantile(), see https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile()
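
Applied to the placeholder names used earlier in the thread (`gauge` for the raw series, `hist` for the recorded buckets), the two approaches might look like the sketch below. Keep in mind that histogram_quantile() only estimates the value by interpolating within a bucket, so its accuracy depends on where the bucket boundaries sit, whereas quantile() works directly on the raw values.

```promql
# Median straight from the raw gauges; no histogram needed.
quantile(0.5, gauge)

# Estimated median from the recorded buckets (relies on the le label).
histogram_quantile(0.5, hist)

# By contrast, quantile(0.5, hist) takes the median of the bucket counts
# themselves, so it returns a number of hosts rather than a utilization.
```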
