How to calculate request_time type basic stats in a time range

cubr...@gmail.com

May 4, 2017, 7:43:29 PM
to Prometheus Users
I instrumented different components of my pipeline to keep track of how long each takes to process. (In our case, with the Python client's `@time` decorator on a summary.) Let's say this is stored in a summary metric `execution_duration_seconds`, with a label `component` for the pipeline component. Because the Python client's summaries do not compute quantiles, I only have the `_sum` and `_count` series of the metric.

I would like to keep track of the average, minimum, and maximum execution times of each component over the preceding 1h, as well as their standard deviation. For the average, I've been using `increase(execution_duration_seconds_sum[1h]) / increase(execution_duration_seconds_count[1h])`, but I don't see how to compute the other stats. Various `<aggregation>_over_time` functions seem like they would work, but I get an error "range specification must be preceded by a metric selector, but follows a *promql.Call instead".

Davor
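For reference, the average query in the question divides the increase of the cumulative `_sum` by the increase of the cumulative `_count` over the same window. The following is a minimal Python sketch of that arithmetic, assuming hypothetical scraped samples given as `(timestamp, value)` pairs; `simple_increase` and `average_duration` are illustrative names, not Prometheus or client-library APIs, and unlike PromQL's `increase()` the sketch does not extrapolate to the window edges.

```python
import math
from typing import List, Tuple

# (timestamp_seconds, cumulative_value): hypothetical scraped samples
Sample = Tuple[float, float]


def simple_increase(samples: List[Sample]) -> float:
    """Total increase of a cumulative counter across the given samples.

    Simplified stand-in for PromQL's increase(): it handles counter
    resets, but does NOT extrapolate to the edges of the range window.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. the process restarted),
        # so the current value is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def average_duration(sum_samples: List[Sample],
                     count_samples: List[Sample]) -> float:
    """increase(..._sum[w]) / increase(..._count[w]) over the same window."""
    observations = simple_increase(count_samples)
    if observations == 0:
        return math.nan  # no executions in the window: 0/0, NaN in PromQL too
    return simple_increase(sum_samples) / observations
```

For example, `_sum` samples `[(0, 10.0), (1800, 13.0), (3600, 16.0)]` and `_count` samples `[(0, 100), (1800, 110), (3600, 120)]` give 6.0 / 20, about 0.3 s per execution. Only the mean falls out of this; the individual durations needed for a minimum, maximum, or standard deviation are not contained in `_sum` and `_count`.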

Björn Rabenstein

May 8, 2017, 8:20:36 AM
to cubr...@gmail.com, Prometheus Users
On 5 May 2017 at 01:43, <cubr...@gmail.com> wrote:
> I would like to keep track of average, minimum, and maximum execution times
> by each component over the preceding 1h, as well as their standard
> deviation. For average, I've been using
> `increase(execution_duration_seconds_sum[1h]) /
> increase(execution_duration_seconds_count[1h])`, but I don't see how to do
> other stats. Various `<aggregation>_over_time` functions seem like they
> would work, but I get an error "range specification must be preceded by a
> metric selector, but follows a *promql.Call instead".

There are two answers to this:

(1) Prometheus is not event-based. The query "give me the longest
individual execution time that happened over the last hour" cannot
be formulated in PromQL. max_over_time does not refer to events but
to the sampled values of a time series. You could, for example, ask
"what was the highest observed memory usage over the last hour?"

(2) max_over_time takes a range specification. However, you can only
create range specifications of vector selectors (i.e. just a metric
name, or a metric name with a label selector), not of arbitrary
expressions (cf. https://github.com/prometheus/prometheus/issues/1227).
The workaround is to create a recording rule for your expression. Then
you can create a range specification of the recorded series.

In combination, you could create the following recording rule:

component:execution_duration_seconds:avg_5m =
    sum by (job, component) (increase(execution_duration_seconds_sum[5m]))
  /
    sum by (job, component) (increase(execution_duration_seconds_count[5m]))
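As a rough sketch of what this rule computes in each evaluation, the Python below takes hypothetical per-series 5m increases keyed by `(job, component, instance)` and aggregates them the way `sum by (job, component)` followed by vector division would. The function name and the `instance` label are assumptions for illustration. Summing before dividing weights each instance by its number of executions, rather than averaging per-instance averages.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical label set of one input series: (job, component, instance)
Labels = Tuple[str, str, str]


def avg_by_job_component(
    sum_inc: Dict[Labels, float], count_inc: Dict[Labels, float]
) -> Dict[Tuple[str, str], float]:
    """Mimic: sum by (job, component) (increase(..._sum[5m]))
             / sum by (job, component) (increase(..._count[5m]))"""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    # Drop the instance label by summing within each (job, component).
    for (job, component, _instance), value in sum_inc.items():
        sums[(job, component)] += value
    for (job, component, _instance), value in count_inc.items():
        counts[(job, component)] += value
    result = {}
    # Vector division only pairs groups present on both sides.
    for key in sums.keys() & counts.keys():
        # No executions in the window means 0/0, which PromQL evaluates to NaN.
        result[key] = sums[key] / counts[key] if counts[key] else math.nan
    return result
```

With instance `a` contributing 1.0 s over 10 executions and instance `b` 6.0 s over 20 executions for the same `(job, component)`, the result is 7.0 / 30, about 0.233 s, whereas the mean of the per-instance averages (0.1 s and 0.3 s) would be 0.2 s.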

Then you can write

max_over_time(component:execution_duration_seconds:avg_5m[1h])

This is then the maximum over the last hour of the average execution
time (averaged over 5m). As said above, it's not the largest execution
time that has happened over the last hour.
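The difference between "maximum of the 5m averages" and "largest single execution" can be made concrete with an idealized Python simulation. It assumes hypothetical `(timestamp, duration)` events and a recording rule evaluated exactly every 5 minutes that stores the exact mean of the preceding 5 minutes; real rule evaluation works on scraped counters and `increase()` extrapolation, so this is only a sketch. `recorded_avgs` is an illustrative name.

```python
from typing import List, Tuple

# Hypothetical individual executions: (timestamp_seconds, duration_seconds)
Event = Tuple[float, float]


def recorded_avgs(events: List[Event], step: float = 300.0,
                  horizon: float = 3600.0) -> List[float]:
    """Idealized recording rule: every `step` seconds, store the mean
    duration of the executions in the preceding `step` seconds."""
    recorded = []
    t = step
    while t <= horizon:
        window = [d for ts, d in events if t - step < ts <= t]
        if window:  # skip empty windows here for simplicity
            recorded.append(sum(window) / len(window))
        t += step
    return recorded


# One slow 2.9 s execution among steady 0.2 s executions over one hour.
events = [(float(ts), 0.2) for ts in range(30, 3600, 60)] + [(100.0, 2.9)]
# max_over_time(...avg_5m[1h]) vs. the largest single execution:
print(round(max(recorded_avgs(events)), 2), max(d for _, d in events))  # prints 0.65 2.9
```

The slow execution is diluted by the five fast ones in its 5m window, so the maximum of the recorded averages (about 0.65 s) stays well below the 2.9 s outlier.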

--
Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B