Lookback delta different for recording rules?

Marek Skalicky

Jul 23, 2021, 4:04:50 AM
to Prometheus Users
Hi,
I'm running Prometheus with --query.lookback-delta=9m and scraping a metric every 5 minutes. I've seen an issue when evaluating 'increase(ifHCInOctets{network_key="xxx"}[10m]) / 2' in a recording rule.
I reported it as a bug in https://github.com/prometheus/prometheus/issues/9092, but I was told the lookback-delta should be 15m, so it's a configuration issue.
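
For context, the rule looks roughly like this (the group name and evaluation interval below are illustrative, not my exact config; the record name and expression are the real ones):

groups:
  - name: stats_api_aggregate_snmp
    interval: 5m
    rules:
      - record: ifHCInOctets:increase
        expr: increase(ifHCInOctets{network_key="xxx"}[10m]) / 2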

But when I run the same query in Grafana, it displays correctly. So does the lookback-delta have a special meaning for recording rules?
My understanding was that the lookback delta applies to individual data points, so the fact that the query/rule uses a 10m range vector has nothing to do with the lookback delta. I also have recording rules doing "the same" with 1h and 24h range vectors, and those work fine.

Could anyone please explain why I should use a 15m lookback-delta?

Many thanks,
Marek Skalicky

Brian Brazil

Jul 23, 2021, 4:26:09 AM
to Marek Skalicky, Prometheus Users
 


Julien Pivotto

Jul 23, 2021, 5:09:20 AM
to Marek Skalicky, Prometheus Users
In your case you have a very long scrape interval, 300s. That means that
if you query data on your Prometheus server with the default lookback
delta (5m), you will see some oddities: an instant query only considers
samples from the last 5 minutes, so a series scraped every 300s can fall
just outside that window and appear to have no data.

When I looked at the issue, I did not notice that you were using
increase. In that case, the lookback delta does not play a role. Instead, you should
use:

irate(ifHCInOctets{network_key="xxx"}[15m])

which seems a better option.

15 min should ensure there are at least 2 data points, and irate will do
the calculation between the last 2 data points.
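
Note that irate gives a per-second rate; if what you want is the increase over one scrape interval, you can multiply by the interval length (300s in your case):

irate(ifHCInOctets{network_key="xxx"}[15m]) * 300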



--
Julien Pivotto
@roidelapluie

Marek Skalicky

Jul 23, 2021, 7:59:51 AM
to Prometheus Users
Thanks for the suggestions.
The difference is that in case of a missed scrape, I want the rule to produce "nothing" for the missed instance, so there is a gap in the graph. I could use 'irate()[10m] * 300' as you suggest, but I don't see the difference from increase(). Also, I'm not using the default lookback-delta but an increased delta of 9m.

Any idea why the range query with the same datasource in Grafana works, but the rule fails to create metrics for some instances?

Marek

Julien Pivotto

Jul 23, 2021, 8:09:42 AM
to Marek Skalicky, Prometheus Users
On 23 Jul 04:59, Marek Skalicky wrote:
> Thanks for the suggestions.
> The difference is that in case of a missed scrape, I want the rule to
> produce "nothing" for the missed instance, so there is a gap in the
> graph. I could use 'irate()[10m] * 300' as you suggest, but I don't see
> the difference from increase(). Also, I'm not using the default
> lookback-delta but an increased delta of 9m.
>
> Any idea why the range query with the same datasource in Grafana works, but
> the rule fails to create metrics for some instances?

Grafana and Prometheus probably do not run the query at the same
timestamp.

You could enable the query log and provide us with one query from Grafana and
one from the Prometheus recording rules to get further assistance:

https://prometheus.io/docs/guides/query-log/
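
Enabling it is a matter of setting query_log_file in the global section of prometheus.yml and reloading the configuration, for example (the path here is only an example):

global:
  query_log_file: /var/log/prometheus/query.log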


--
Julien Pivotto
@roidelapluie

Marek Skalicky

Jul 23, 2021, 9:18:14 AM
to Prometheus Users
The query from Grafana is:
{"httpRequest":{"clientIP":"::1","method":"GET","path":"/api/v1/query"},"params":{"end":"2021-07-23T13:06:47.000Z","query":"increase(ifHCInOctets{network_key=\"xxx1\"}[10m]) / 2","start":"2021-07-23T13:06:47.000Z","step":0},"stats":{"timings":{"evalTotalTime":0.012282801,"resultSortTime":0,"queryPreparationTime":0.001716968,"innerEvalTime":0.010520843,"execQueueTime":0.000023908,"execTotalTime":0.01231981}},"ts":"2021-07-23T13:06:46.574Z"}

and from rule evaluation:
{"params":{"end":"2021-07-23T13:08:24.309Z","query":"increase(ifHCInOctets{network_key=\"xxx1\"}[10m]) / 2","start":"2021-07-23T13:08:24.309Z","step":0},"ruleGroup":{"file":"/etc/prometheus/rules-snmp.yml","name":"stats_api_aggregate_snmp_xxx1"},"stats":{"timings":{"evalTotalTime":0.006878878,"resultSortTime":0,"queryPreparationTime":0.000889663,"innerEvalTime":0.005962612,"execQueueTime":0.000007889,"execTotalTime":0.0068928}},"ts":"2021-07-23T13:08:24.446Z"}

and this is how I query the time series produced by the recording rule:
{"httpRequest":{"clientIP":"::1","method":"GET","path":"/api/v1/query"},"params":{"end":"2021-07-23T13:15:08.000Z","query":"sum by (network_key,instance) (ifHCInOctets:increase{network_key=\"xxx1\"})","start":"2021-07-23T13:15:08.000Z","step":0},"stats":{"timings":{"evalTotalTime":0.003954003,"resultSortTime":0,"queryPreparationTime":0.000565806,"innerEvalTime":0.003363043,"execQueueTime":0.000026367,"execTotalTime":0.00399693}},"ts":"2021-07-23T13:15:07.513Z"}

Marek Skalicky

Jul 23, 2021, 9:36:16 AM
to Prometheus Users
Also, if I do the instant query via the Prometheus HTTP API at the time the rule was evaluated, I get all the time series I expect.

I'm doing:
curl -g 'localhost:9090/api/v1/query?query=sum+by+(instance)+(increase(ifHCInOctets{network_key="xxx1"}[10m]))/2&time=1627045704.309' | python -m json.tool

What is also interesting is that when using just time=1627045704, I get rounded values ("3596645003222" vs "3596645003221.9995"). Is that expected?
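
That is, the same query with the truncated timestamp:

curl -g 'localhost:9090/api/v1/query?query=sum+by+(instance)+(increase(ifHCInOctets{network_key="xxx1"}[10m]))/2&time=1627045704' | python -m json.tool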
