Counting observations per Histogram bucket within a range


José San Gil

Aug 27, 2021, 6:55:44 AM
to Prometheus Users
Hi,

We're recording the duration of completed operations within our system. Every time something succeeds, we observe the duration using a Prometheus histogram.

- I set up the buckets using an exponential distribution: 1000, 5000, 25000, 125000, 625000.
- Each observation includes around 6 labels e.g: operationType, organizationId

I want to count the number of observations per bucket for a specific operation type within a given range. I tried the following queries (using a Grafana Bar Gauge with Heatmap format):

sum(operation_x_bucket{operationType="TypeA"}) by (le)


I also tried using the `increase` function, but I'm clearly missing the point here (I also tried with a fixed range vector).

sum(increase(operation_x_bucket{operationType="TypeA"}[$__range])) by (le)

The results in both cases don't make sense. The values are way below the real number of successful operations.

I validated that operations are properly observed, so my guess is that I'm completely misunderstanding Prometheus _bucket usage. 

What's the proper way to count how many observations per bucket we have for a given label value?

Bjoern Rabenstein

Aug 30, 2021, 4:49:49 PM
to José San Gil, Prometheus Users
On 27.08.21 03:55, José San Gil wrote:
>
> We're recording the duration of completed operations within our system.
> Every time something succeeds, we observe the duration using a Prometheus
> histogram.
>
> - I set up the buckets using an exponential distribution: 1000, 5000,
> 25000, 125000, 625000.
> - Each observation includes around 6 labels e.g: operationType,
> organizationId
>
> I want to count the number of observations per bucket for a specific
> operation type within a given range. I tried the following queries (using a
> Grafana Bar Gauge with Heatmap format):
>
> sum(operation_x_bucket{operationType="TypeA"}) by (le)
>
>
> I also tried using the `increase` function, but I'm clearly missing the
> point here (I also tried with a fixed range vector).
>
> sum(increase(operation_x_bucket{operationType="TypeA"}[$__range])) by (le)
>
> The results in both cases don't make sense. *The values are way below the
> real number of successful operations*.
>
> I validated that operations are properly observed, so my guess is that I'm
> completely misunderstanding Prometheus _bucket usage.
>
> *What's the proper way to count how many observations per bucket we have
> for a given label value?*

Perhaps you are confused by the fact that the buckets are cumulative?
That means that the bucket marked by {le="25000"} contains all
observations of 25000 or below (not just those between 5000 and 25000).

If you want to get, let's say, the observations between 5000 and 25000
in the last 10m, you need something like

increase(operation_x_bucket{operationType="TypeA",le="25000"}[10m])
- increase(operation_x_bucket{operationType="TypeA",le="5000"}[10m])
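The same subtraction can be applied to every pair of adjacent buckets at once. Here is a minimal Python sketch of that arithmetic, assuming the per-`le` increases have already been fetched. The numbers and the helper name `per_bucket_counts` are made up for illustration and are not part of any Prometheus API.

```python
from typing import Dict, List, Tuple


def per_bucket_counts(cumulative: Dict[float, float]) -> List[Tuple[float, float, float]]:
    """Turn cumulative `le` counts into (lower, upper, count) per bucket."""
    result = []
    lower = float("-inf")
    previous = 0.0
    for upper in sorted(cumulative):
        count = cumulative[upper]
        # Each bucket's own count is its cumulative value minus the
        # cumulative value of the next smaller bucket.
        result.append((lower, upper, count - previous))
        lower, previous = upper, count
    return result


# Hypothetical increase() results per `le` boundary over some window.
increases = {1000: 40, 5000: 70, 25000: 90, 125000: 98, 625000: 100, float("inf"): 100}

for lower, upper, count in per_bucket_counts(increases):
    print(f"({lower}, {upper}]: {count}")
# Per-bucket counts come out as 40, 30, 20, 8, 2, 0 and add up to the
# +Inf value of 100.
```

The per-bucket counts always add up to the `le="+Inf"` value, which is a quick sanity check on the query results.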

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

José San Gil

Sep 9, 2021, 3:42:36 AM
to Bjoern Rabenstein, Prometheus Users
Hi,

Thanks for your reply.

I'm aware of the cumulative nature of the buckets. I assumed that if I do:

increase(operation_x_bucket{operationType="TypeA", le="+Inf"}[24h]), I'd be getting the total number of observations of the bucket for that specific operationType in the last 24h. Is that correct?

The problem is that the values I get are significantly lower than reality for a period of, let's say, 24 hours. I thought that maybe the buckets might not behave as a regular counter (it seems they do).

I do have multiple buckets (each k8s pod seems to create a new one), so I used sum(increase(operation_x_bucket{operationType="TypeA", le="+Inf"}[24h])) to sum the observations.

Thanks,
--

José San Gil
Senior Full-Stack Developer

Candis GmbH
Friedrichstraße 200, 10117 Berlin
CEOs: Christian Ritosek, Christopher Becker
Registry court: Berlin Charlottenburg
Register number: HRB 168078

 

Bjoern Rabenstein

Sep 16, 2021, 12:58:22 PM
to José San Gil, Prometheus Users
On 09.09.21 09:42, José San Gil wrote:
>
> I'm aware of the cumulative nature of the buckets. I assumed that if I do:
>
> increase(operation_x_bucket{operationType="TypeA", le="+Inf"}[24h]), I'd be
> getting the total number of observations of the bucket for that specific
> operationType in the last 24h. Is that correct?

Sounds about right. Note that you will get multiple time series if
there are other labels, including target labels. For example, if you
have multiple different instances that have served this metric
(concurrently or over time), you'll get a separate result for each. To
get them all, you can sum the final result, i.e.

sum(increase(operation_x_bucket{operationType="TypeA", le="+Inf"}[24h]))
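The order matters here: the increase is taken per series first and summed afterwards, so a counter reset in one pod (for example after a restart) is handled for that pod alone. The Python sketch below illustrates this with invented sample values. It deliberately ignores the extrapolation that the real `increase()` applies, and `raw_increase` is a hypothetical helper, not a Prometheus function.

```python
from typing import Dict, List


def raw_increase(samples: List[float]) -> float:
    """Increase across counter samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # After a reset the counter restarts from zero, so the whole
            # current value counts as new increase.
            total += cur
    return total


# Hypothetical samples of the le="+Inf" bucket from two pods;
# pod-b restarts between its second and third sample.
series: Dict[str, List[float]] = {
    "pod-a": [100, 120, 150, 170],
    "pod-b": [10, 30, 5, 25],
}

# Analogue of sum(increase(...)): per-series increase, then sum.
sum_of_increases = sum(raw_increase(s) for s in series.values())

# Analogue of increase(sum(...)): the reset is hidden inside the sum.
summed = [sum(values) for values in zip(*series.values())]
increase_of_sum = raw_increase(summed)

print(sum_of_increases)  # 115.0
print(increase_of_sum)   # 85.0
```

In this toy data the summed series never drops, so the reset in pod-b goes undetected and 30 observations are lost.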

> The problem is that the values I get are significantly lower than
> reality for a period of, let's say, 24 hours. I thought that maybe the buckets
> might not behave as a regular counter (it seems they do).

They do, and in fact, the `le="+Inf"` bucket should be identical to
the `operation_x_count` time series (which is something you could check
in your case).
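For reference, a Prometheus histogram exposes cumulative `_bucket` series plus `_count` (the number of observations) and `_sum` (the total of the observed values), and the `le="+Inf"` bucket always matches `_count`. The toy class below is a rough Python sketch of that bookkeeping, not the real client library; `ToyHistogram` and the observed values are invented for illustration.

```python
import math


class ToyHistogram:
    """Minimal stand-in for a cumulative histogram (illustration only)."""

    def __init__(self, bounds):
        # The +Inf bucket is always present and catches every observation.
        self.bounds = sorted(bounds) + [math.inf]
        self.buckets = {bound: 0 for bound in self.bounds}
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        # Cumulative buckets: increment every bucket whose upper bound
        # is at or above the observed value.
        for bound in self.bounds:
            if value <= bound:
                self.buckets[bound] += 1
        self.count += 1
        self.sum += value


h = ToyHistogram([1000, 5000, 25000, 125000, 625000])
for duration in [800, 4200, 4200, 30000, 700000]:
    h.observe(duration)

print(h.buckets[math.inf], h.count)  # 5 5
print(h.sum)                         # 739200.0
```

Comparing the increase of the `le="+Inf"` bucket with the increase of `operation_x_count` over the same range is therefore a cheap consistency check.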

> I do have multiple buckets (each k8s pod seems to create a new one), so
> I used sum(increase(operation_x_bucket{operationType="TypeA",
> le="+Inf"}[24h])) to sum the observations.

OK, so you did that already. If that gives unexpected results, there
must be something else happening. All the information you provided
looks OK to me.

Aliaksandr Valialkin

Sep 16, 2021, 1:37:42 PM
to José San Gil, Bjoern Rabenstein, Prometheus Users
On Thu, Sep 9, 2021 at 10:42 AM José San Gil <jo...@candis.io> wrote:
> increase(operation_x_bucket{operationType="TypeA", le="+Inf"}[24h]), I'd be getting the total number of observations of the bucket for that specific operationType in the last 24h. Is that correct?
>
> The problem is that the values I get are significantly lower than reality for a period of, let's say, 24 hours. I thought that maybe the buckets might not behave as a regular counter

The issue may be related to the fact that increase() in Prometheus may perform extrapolation in some cases, so the end result might be slightly different than expected. See https://github.com/prometheus/prometheus/issues/3746 for details.
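To give a feel for the effect, here is a deliberately simplified Python sketch of extrapolation: the raw difference between the first and last sample is scaled up to the full width of the query window. The actual Prometheus implementation has additional rules (for example about how far it will extrapolate toward the window boundaries), so treat `simplified_increase` and the sample data as assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def simplified_increase(
    samples: List[Tuple[float, float]], window_start: float, window_end: float
) -> Optional[float]:
    """Scale the observed delta to the whole window.

    `samples` is a time-ordered list of (timestamp, value) pairs
    without counter resets.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a delta
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    raw_delta = v_last - v_first
    covered = t_last - t_first
    # Extrapolate from the covered interval to the full window width.
    return raw_delta * (window_end - window_start) / covered


# Hypothetical 15s scrapes inside a 60s window starting at t=0.
samples = [(5, 10), (20, 11), (35, 12), (50, 13)]
print(simplified_increase(samples, 0, 60))  # 4.0, although the raw delta is 3
```

This is why `increase()` can return values that differ from the exact number of observations, including non-integer results.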

--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics