Behavior of Prometheus with respect to scrape and evaluation intervals


akshay sharma

Jan 23, 2021, 2:14:47 AM
to Prometheus Users
Hi, 

I have some queries regarding Prometheus scrape and evaluation intervals.

My queries are as follows.
1) 
Say a Prometheus job with a scrape interval of 10 minutes and an evaluation interval of 5 minutes has scraped metrics at t1. When the metric is queried on the Prometheus UI as an instant vector, it is visible for around 3 minutes 30 seconds (note: here I am querying repeatedly for 3 minutes 30 seconds). At approximately the 4th minute, when I run the same query, I see no data. Even if I query it as a range vector, metric_name[5m], I don't see the metrics (note: here Prometheus says the last scrape was around 4 minutes ago), but I do see the metrics when I query metric_name[10m]. Please explain this behaviour.

No data when executing metric_name (instant vector):

Element    Value
no data


2) 

This is my Prometheus alert rule

throughput{instance="x.x.x.x:xxxx"} > 100

After the alert has fired once, if the metric data is missing for more than 1 minute, the alert gets resolved. Is there any way to avoid this? Basically, I do not want the alert to resolve when there are no metrics.


3)

Say there are 2 jobs. For one job I need almost continuous monitoring, so its scrape interval is 1 minute and the global evaluation interval is 1 minute. The other job scrapes metrics at an interval of 10 minutes. Since the evaluation interval is always global, I think rules over the second job's metrics are also evaluated every 1 minute, even though no new metrics have been scraped, and the evaluation results in no data. This will cause the alerting problem I mentioned above. Isn't it possible to have an evaluation interval specific to a job? Please suggest a solution.
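
For reference, a minimal sketch of the kind of prometheus.yml I mean (job names and targets are hypothetical); scrape_interval can be overridden per job, but evaluation_interval is global to the server (rule groups can set their own interval in the rules file, though not per scrape job):

global:
  scrape_interval: 1m        # default scrape interval for jobs that do not override it
  evaluation_interval: 1m    # single rule evaluation interval for the server

scrape_configs:
  - job_name: continuous-job         # hypothetical: needs near-continuous monitoring
    static_configs:
      - targets: ['host-a:9100']     # hypothetical target

  - job_name: slow-job               # hypothetical: data only produced every 10 minutes
    scrape_interval: 10m             # per-job override of the scrape interval
    static_configs:
      - targets: ['host-b:9100']     # hypothetical target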


Thanks,

Akshay

Ben Kochie

Jan 23, 2021, 4:19:25 AM
to akshay sharma, Prometheus Users
You need to scrape data more often than every 5 minutes. 2 minutes is the recommended maximum interval to avoid stale data. Prometheus also gets less efficient when you scrape less often than every 1 minute.


akshay sharma

Jan 23, 2021, 4:29:34 AM
to Ben Kochie, Prometheus Users
But my exporter exports the data every 15 minutes after some processing, and we delete the data as soon as Prometheus scrapes it, to avoid registering the same metric again after 15 minutes; this is a limitation of the Prometheus Go client. We need to unregister to avoid registering the same metric twice.

If Prometheus scrapes every 1 minute, then from the next minute onwards there will be no data until the next 15-minute mark.

How can we resolve this sync issue?

Stuart Clark

Jan 23, 2021, 5:32:25 AM
to promethe...@googlegroups.com, akshay sharma, Ben Kochie, Prometheus Users
Removing the metric data shouldn't be done.

The expectation is that you can query an exporter at any point and receive metrics. Ideally that would be instant data from the time of the scrape, but for processes that only happen occasionally, or queries which take a while to produce a result, you should return the latest result.

Prometheus needs to get data regularly (generally no less frequently than every 2 minutes, as previously mentioned). Scrapes which return the same value as the previous one are compressed heavily, so there are no real worries about storage space.

If a metric stops being scraped it becomes stale, and stops being returned as you have seen.

As a second point regarding exporters never removing state after a scrape: part of the design of Prometheus is that high availability is achieved by having both servers of an HA pair scrape each target. If exporters removed their state after a scrape, this would break. It is also very useful to be able to query an exporter manually when debugging potential metrics issues.

So in summary, the recommendation would be to set the scrape interval to no longer than 2 minutes and adjust your exporter to always return the latest set of metrics, with that state being updated every 15 minutes as you do currently.
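
As a rough sketch of that pattern in Go (the metric name and computeThroughput function are placeholders, not your actual code): the gauge is registered once at startup, a background goroutine refreshes its value every 15 minutes, and every scrape simply serves whatever the latest value is.

package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// throughput is registered once at startup and never unregistered.
var throughput = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "throughput", // placeholder metric name
	Help: "Latest throughput value from the periodic processing job.",
})

// computeThroughput stands in for the 15-minute processing you described.
func computeThroughput() float64 {
	return 123 // whatever the real processing produces
}

func main() {
	prometheus.MustRegister(throughput)

	// Refresh the cached value every 15 minutes; scrapes in between
	// return the most recent value rather than no data.
	go func() {
		for {
			throughput.Set(computeThroughput())
			time.Sleep(15 * time.Minute)
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}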

akshay sharma

Jan 23, 2021, 5:55:41 AM
to Stuart Clark, Prometheus Users, Ben Kochie
Thanks for your reply.

Actually, the issue we are facing is this:
Say the exporter exposes the metric every 15 minutes, and Prometheus scrapes data every 2 minutes.

Suppose at t1 Prometheus scrapes and gets the metric, say x, with some value y.

I have an alert rule: if metric >= y, an alert should be raised.

Now the alert is raised because the condition is true. Prometheus scrapes again at t3, but now the metric is not available, so it gets nothing.
Then, when the alert rule evaluates again, it gets no data. But why does the alert get resolved? If there is no value, how can the condition be true? metric >= y can never be true when there is no data, right? I need to understand this scenario.


Stuart Clark

Jan 23, 2021, 6:23:16 AM
to akshay sharma, Prometheus Users, Ben Kochie

Because at time=t3, as far as Prometheus is concerned, the metric no longer exists. The way alerts work is that the expression is evaluated, and if there is a resulting set of metrics (could be many, due to different label combinations) alerts are created. With your query the metric no longer exists, so metric >= y will not return anything, hence any pre-existing alert will be resolved.

The correct solution is to ensure the metric continues to exist, by scraping at least every 2 minutes and ensuring the exporter always returns data.
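
To illustrate with a rough sketch of a rules file (group and alert names are made up, not your actual configuration):

groups:
  - name: throughput-alerts          # hypothetical group name
    rules:
      - alert: HighThroughput        # hypothetical alert name
        # While the series exists and is above the threshold, the expression
        # returns one element and the alert fires. Once the series goes stale
        # because scrapes stop returning it, the expression returns an
        # empty vector, so the alert stops firing and is resolved.
        expr: throughput{instance="x.x.x.x:xxxx"} > 100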

akshay sharma

Jan 23, 2021, 8:15:32 AM
to Stuart Clark, Prometheus Users, Ben Kochie
Thanks for the clarification,

It makes sense, but the reason we delete the data as soon as Prometheus scrapes is that next time (at the 15-minute mark) we can't register the same metric, as it was already registered at time t1. We are using the Prometheus Go client, and that's the limitation.

And if we delete the data or unregister at the 15-minute mark, and then add new data or register again with new values, we will lose data which was registered by other goroutines and which hasn't been processed yet.

So how can we monitor dynamic metrics? Or how can we resolve the above issue?

Or can we just set the value (for the metric that is already registered) instead of unregistering and registering again, using the Go client library?

Stuart Clark

Jan 23, 2021, 8:34:48 AM
to akshay sharma, Prometheus Users, Ben Kochie

I'm not as directly familiar with the Go client library, but my understanding is that it operates very similarly to other ones, such as the Python one.

Broadly there are two ways to use the client library. The first is for direct instrumentation of an application. With this you create counter/gauge/etc. objects and then, when something happens (e.g. incrementing a counter on an event, adding a request time to a counter), you use those objects to adjust the current metric state. Separately you have the HTTP interface, which allows Prometheus (or any other system which can understand OpenMetrics or the Prometheus exposition format) to fetch the current metric state.

The second mechanism, which sounds like the one you actually want, is commonly used for exporters where the metric information isn't created by code running and adjusting objects directly, but instead by calling an external service and converting the returned data, or (as in your case) by querying a cache of such data. For that you register a custom collector which is called on every scrape. That collector is then responsible for doing whatever is needed to obtain the information (in your case, just looking at some local state storage) and then converting and returning the data in the correct Prometheus format.

The difference between the two can be easily seen with a counter. For direct instrumentation you would have an object that represents the counter and you would use its increment method to adjust its value based on the events that are occurring (e.g. adding the timing of an action to a counter). At no point does the main code know or care what the current value of the counter is; that is purely for the HTTP interface to use. For an exporter, the custom collector instead creates an object during the scrape process (rather than it being created on startup) and directly sets the value (not using increments).

The Go client library documentation on custom collectors and constant metrics explains this:
https://godoc.org/github.com/prometheus/client_golang/prometheus#hdr-Custom_Collectors_and_constant_Metrics

From what you describe I'm not sure if you are using the custom collector method or are trying to use the mechanism designed for direct instrumentation.
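
If the custom collector route is what you need, a rough sketch in Go might look like this (the cache struct and metric name are placeholders for your actual state):

package main

import (
	"net/http"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// resultCache stands in for the local state that your 15-minute
// processing job updates.
type resultCache struct {
	mu    sync.Mutex
	value float64
}

func (c *resultCache) get() float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value
}

// throughputCollector implements prometheus.Collector. It is registered
// once and called on every scrape; nothing is ever unregistered.
type throughputCollector struct {
	cache *resultCache
	desc  *prometheus.Desc
}

func newThroughputCollector(c *resultCache) *throughputCollector {
	return &throughputCollector{
		cache: c,
		desc: prometheus.NewDesc(
			"throughput", // placeholder metric name
			"Latest throughput computed by the periodic processing job.",
			nil, nil,
		),
	}
}

func (tc *throughputCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- tc.desc
}

func (tc *throughputCollector) Collect(ch chan<- prometheus.Metric) {
	// Build a constant gauge from whatever the cache currently holds,
	// so every scrape returns the latest known value.
	ch <- prometheus.MustNewConstMetric(tc.desc, prometheus.GaugeValue, tc.cache.get())
}

func main() {
	cache := &resultCache{}
	prometheus.MustRegister(newThroughputCollector(cache))
	// Your 15-minute processing goroutine would update cache.value here.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}

With this pattern the periodic job only has to update the cache; every scrape (including a second Prometheus server in an HA pair, or a manual check with curl) sees the latest value, and nothing needs to be unregistered.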

akshay sharma

Jan 23, 2021, 8:48:20 AM
to Stuart Clark, Prometheus Users, Ben Kochie
Thanks for the clarification,

We are using a gauge metric collector, which in turn converts the data into the Prometheus gauge format. We need a gauge to monitor the metric's value at a point in time; a counter won't work in our case.

This is the function we are using to create the metric in the Prometheus gauge format:

prometheus.NewGauge(prometheus.GaugeOpts)
Can you please shed some light on custom collectors, or point us to an example we can refer to, or an exporter where someone has used such a custom collector?

That would be a great help for us. 

Thanks and regards