How to use histogram_quantile

stav alfi

Jul 12, 2021, 10:49:40 AM
to Prometheus Users
Hi,

I have a histogram which measures how much time a call to a specific REST API took, in seconds.

* The buckets of the histogram will be added at the end of the question.

I want to understand 2 things:

1. Can histogram_quantile return a value that differs from all the bucket boundaries? For example, if the buckets are 1, 2, 3, can it return 1.7? I'm not sure whether I did something wrong or whether it's just an approximation of the real value.

2. In the following screenshot (green = first query, call duration; yellow = second query, quantile), the 99% quantile is higher than the actual API call duration at the same time (look at 11:46: yellow (quantile) > green (call duration)). How can that be?

Screen Shot 2021-07-12 at 17.38.57.png

_______________________________________________________________

csm_punteam_execution_call_processing_duration_seconds:

# HELP csm_punteam_execution_call_processing_duration_seconds how many seconds it took to call punteam execution REST API
# TYPE csm_punteam_execution_call_processing_duration_seconds histogram
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.01",jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 0
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.05",jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 1
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.1",jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 4
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.2",jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 6
csm_punteam_execution_call_processing_duration_seconds_bucket{le="+Inf",jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_sum{jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 19.015002676
csm_punteam_execution_call_processing_duration_seconds_count{jobType="Run @ 5m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.01",jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 0
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.05",jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 1
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.1",jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 6
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.2",jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="+Inf",jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_sum{jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 0.861849852
csm_punteam_execution_call_processing_duration_seconds_count{jobType="Run @ 15m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.01",jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 0
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.05",jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 3
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.1",jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 8
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.2",jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="+Inf",jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_sum{jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 0.6730819760000001
csm_punteam_execution_call_processing_duration_seconds_count{jobType="Run @ 25m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.01",jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 0
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.05",jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 3
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.1",jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 7
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.2",jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="+Inf",jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_sum{jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 0.691336511
csm_punteam_execution_call_processing_duration_seconds_count{jobType="Run @ 35m",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.01",jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 0
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.05",jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 2
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.1",jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 6
csm_punteam_execution_call_processing_duration_seconds_bucket{le="0.2",jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 8
csm_punteam_execution_call_processing_duration_seconds_bucket{le="+Inf",jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 9
csm_punteam_execution_call_processing_duration_seconds_sum{jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 0.7996324570000001
csm_punteam_execution_call_processing_duration_seconds_count{jobType="firstHalfExtraTime",restStatusCode="200",executionStatusCode="0"} 9

Bjoern Rabenstein

Jul 14, 2021, 1:28:26 PM
to stav alfi, Prometheus Users
On 12.07.21 07:49, stav alfi wrote:
>
> 1. can histogram_quantile return a value which is different than all the
> bucket values? for example, if the buckets are: 1,2,3. can it return 1.7?

Yes, it can. See the documentation at
https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
:

"The histogram_quantile() function interpolates quantile values by
assuming a linear distribution within a bucket."

So if the bucket `le="2"` holds 10 observations between 1 and 2, and
the seventh of them happens to coincide with the median (or whatever
quantile you are calculating), histogram_quantile will return 1.7.
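
To make the interpolation concrete, here is a minimal Python sketch of
the idea. It is not the Prometheus source code: the function name
`estimate_quantile` and the example counts are made up for
illustration, and it assumes 0 < q <= 1, buckets sorted by upper
bound, a final +Inf bucket after at least one finite bucket, and at
least one observation.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    Simplified sketch under the assumptions stated in the text above.
    """
    total = buckets[-1][1]          # the +Inf bucket counts everything
    rank = q * total                # position of the quantile among observations
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank lands in the +Inf bucket, which has no finite upper
        # bound to interpolate towards: report the highest finite bound.
        return buckets[-2][0]
    upper = buckets[idx][0]
    # The lowest bucket is assumed to start at 0.
    lower, below = (0.0, 0) if idx == 0 else buckets[idx - 1]
    in_bucket = buckets[idx][1] - below
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket

# Hypothetical counts: 8 observations <= 1, 10 more in (1, 2], 12 more in (2, 3].
example_buckets = [(1.0, 8), (2.0, 18), (3.0, 30), (math.inf, 30)]
print(estimate_quantile(0.5, example_buckets))  # approximately 1.7
```

The median has rank 15 of 30, which is the seventh of the 10
observations in the (1, 2] bucket, so the estimate is 1 + 7/10.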

> 2. in the following screenshot (green=first query - call duration,
> yellow=second query - quantile), I'm seeing that the 99% quantile took more
> than the actual API call at the same time (look at 11:46 - yellow
> (quantile) > green (call duration)). *how can it be?*

Your 1st "green" query calculates _average_ duration (mean value of
all observed call durations). Your 2nd "yellow" query calculates the
99th percentile duration (99% of the observed calls had a duration
shorter than that). It's not surprising that the average is lower than
the 99th percentile.
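
As a rough illustration with the numbers you posted for
jobType="Run @ 15m", the Python sketch below computes both values by
hand. It works on the raw cumulative counters, whereas a dashboard
query would normally apply rate() over a time window first; the actual
queries are not shown in this thread, so treat this as an assumption
about what they compute.

```python
# Figures copied from the jobType="Run @ 15m" series in the original post.
duration_sum = 0.861849852   # ..._sum
duration_count = 9           # ..._count
le_0_1 = 6                   # cumulative count in bucket le="0.1"
le_0_2 = 9                   # cumulative count in bucket le="0.2"

# Mean duration: total observed seconds divided by number of calls.
mean = duration_sum / duration_count

# 99th percentile: rank 8.91 of 9 lies above the 6 observations <= 0.1
# and within the 9 observations <= 0.2, so interpolate inside (0.1, 0.2].
rank = 0.99 * duration_count
p99 = 0.1 + (0.2 - 0.1) * (rank - le_0_1) / (le_0_2 - le_0_1)

print(f"mean = {mean:.4f} s, p99 estimate = {p99:.4f} s")
```

The mean comes out at roughly 0.096 s and the p99 estimate at roughly
0.197 s, so the quantile line sitting above the average line is what
one would expect.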

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in