Metrics are coming back empty most of the time


Srinivasa praveen

Apr 23, 2020, 2:59:06 PM
to Prometheus Users
My Setup
Hi, I am running one Prometheus instance inside my cluster and a second Prometheus instance in the cloud. The internal Prometheus scrapes metrics from different exporters, and a pusher process federates metrics from the internal Prometheus and pushes them to a Pushgateway sitting in the cloud. The cloud Prometheus scrapes from this Pushgateway.
The internal Prometheus scrape interval is 5 seconds for almost all exporters, except for one custom exporter. This custom exporter executes some queries against the database and exposes the query results as metrics. The scrape interval for this particular exporter endpoint is set to 30 minutes in the internal Prometheus. My pusher's interval is 1 minute, meaning every 60 seconds it federates from the internal Prometheus and pushes to the Pushgateway.
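For illustration, the scrape configuration looks roughly like this (job names and targets below are placeholders, not my real ones; only the intervals match what I described):

scrape_configs:
  - job_name: 'node'                  # one of the regular exporters
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'custom-db-exporter'    # the slow, query-based exporter
    scrape_interval: 30m
    scrape_timeout: 25m               # raised along with the interval so the slow scrape does not time out
    static_configs:
      - targets: ['custom-exporter:9200']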

Problem that I am facing
Metrics from all exporters are transmitted regularly by the pusher to the Pushgateway. But quite frequently, the metrics from the custom exporter are sent to the Pushgateway empty.

My Observation
From my investigation, I can confirm that the internal Prometheus is scraping my exporter regularly every 30 minutes (I can see this in my exporter logs). As part of my troubleshooting, when I query the internal Prometheus for the metrics from my custom exporter using /api/v1/query?query=<metric_name>, I get an empty result set most of the time. The exceptions are around the times when Prometheus actually scrapes my exporter, i.e. every 30 minutes. So I can see the metrics via the API only around scrape time, and even then only for a short window of roughly 20 to 30 seconds; after that I get an empty result set again.
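For example, the spot check I am doing looks roughly like this (hostname and metric name are placeholders):

curl 'http://internal-prometheus:9090/api/v1/query?query=my_db_metric'
# most of the time: {"status":"success","data":{"resultType":"vector","result":[]}}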
My understanding is this: the API I am using for troubleshooting returns instant values, i.e. it is not meant to return the latest sample no matter how old it is. The same thing happens to the pusher while it federates every 60 seconds. Since the Prometheus scrape interval is very long (30 minutes) and the scraped values are only treated as current for a short while, by the time the pusher federates, Prometheus returns an empty set for this particular exporter most of the time.

Now, I want to know: is there a way to configure Prometheus to keep treating the most recently scraped values as current? Or is there a way to federate with a range like [30m] rather than instant values?
Please help me out here and do let me know if my understanding is not correct.

Thanks

Stuart Clark

Apr 23, 2020, 4:32:45 PM
to Srinivasa praveen, Prometheus Users

While your mechanism for sending data between servers wouldn't be recommended (either plain federation or something like Thanos would be preferable), it sounds like the issue you are seeing is due to your long scrape interval. Because of staleness handling the maximum usable interval is around 2 minutes, so in your case Prometheus treats the time series as stale for the majority of the time and therefore returns nothing when queried.

This is by design, so the main answer would be to reduce the scraping interval for your custom exporter.

-- 
Stuart Clark

Srinivasa praveen

Apr 24, 2020, 1:57:08 AM
to Prometheus Users
Thanks for the response, Stuart. The reason for keeping the scrape interval so long is that, on receiving a scrape request from Prometheus, my exporter runs around 10 queries against the database and exposes the results as 10 metrics, which takes around 15 minutes to complete. The Prometheus scrape was timing out, so to increase scrape_timeout I had to increase scrape_interval as well.

So, from your response I understand that if the scrape interval is more than about 2 minutes, Prometheus mostly considers the data stale. I think that is why I see the metrics for only a short time when I query via the API, whereas when I use a range vector in the query (like <my_metric>[20m]) I do see some data. Is my understanding correct here?
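For example (placeholder names again), the two queries look roughly like this; -g stops curl from interpreting the square brackets:

curl    'http://internal-prometheus:9090/api/v1/query?query=my_db_metric'        # usually an empty result
curl -g 'http://internal-prometheus:9090/api/v1/query?query=my_db_metric[20m]'   # returns the samples from the last scrape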

So, do you suggest changing the exporter to periodically run the DB queries in the background and keep the results handy in memory or a file, so that it can respond immediately whenever Prometheus scrapes, instead of running the queries on each scrape request? Then the scrapes won't time out and I can configure a scrape interval of less than 2 minutes.

Julius Volz

Apr 24, 2020, 3:08:25 AM
to Srinivasa praveen, Prometheus Users
To be more precise about maximum scrape times: an instant vector selector in PromQL (e.g. foo{bar="baz"}) looks back a maximum of 5 minutes from the current evaluation timestamp to find the latest sample for each matched series before that timestamp. If the last sample is more than 5 minutes old, nothing is returned at that evaluation step. This lookback duration is configurable via the --query.lookback-delta command-line flag, though most people will want to keep it at 5m (to not return super old data points as "current"). So in practice, if you take into account that scrapes can also fail once or twice due to various errors, it's a good idea to keep the scrape interval significantly lower than the lookback delta (like 2m).
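For reference, that flag goes on the Prometheus command line; the value here is only an illustration:

prometheus --config.file=prometheus.yml --query.lookback-delta=10m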

And yeah, probably the best thing to do in your situation is to make the exporter cache the DB query results and scrape it more frequently. Or if really all your Prometheus does is scrape this exporter slowly, you could consider increasing the lookback delta. But in any case for such a slow computation in the backend it probably makes sense to decouple it from Prometheus scrapes, so that your scrapes don't take >15 minutes.
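A minimal sketch of that caching pattern, assuming the exporter is written in Python with the prometheus_client library (the metric name and the query placeholder are invented for illustration):

# cached_exporter.py -- refresh the cache in the background; /metrics answers instantly
import time
from prometheus_client import Gauge, start_http_server

RESULT = Gauge('dbquery_result', 'Cached result of a slow database query', ['query'])

def run_slow_queries():
    # placeholder: run the ten real database queries here and
    # return (query_name, value) pairs
    return [('orders_pending', 42.0)]

if __name__ == '__main__':
    start_http_server(9105)                       # serves /metrics from the cached gauges
    while True:
        for name, value in run_slow_queries():    # the ~15 minutes of work happen here,
            RESULT.labels(query=name).set(value)  # not during the Prometheus scrape
        time.sleep(1800)                          # refresh the cache every 30 minutes

With this in place each scrape returns in milliseconds, so a scrape interval well below 2 minutes works even though the underlying data only changes every 30 minutes.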


Brian Candler

Apr 24, 2020, 4:44:24 AM
to Prometheus Users
On Friday, 24 April 2020 06:57:08 UTC+1, Srinivasa praveen wrote:
Thanks for the response, Stuart. The reason for keeping the scrape interval so long is that, on receiving a scrape request from Prometheus, my exporter runs around 10 queries against the database and exposes the results as 10 metrics, which takes around 15 minutes to complete. The Prometheus scrape was timing out, so to increase scrape_timeout I had to increase scrape_interval as well.

I think a better option is: run your slow queries from cron every 30 minutes, and write the results into a metrics file which is picked up by the node_exporter textfile collector.
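For example (paths, schedule and script name are illustrative), with node_exporter pointed at a textfile directory:

# crontab entry: regenerate the metrics file every 30 minutes
*/30 * * * *  /usr/local/bin/run-sql-metrics.sh

# node_exporter started with the textfile collector reading that directory
node_exporter --collector.textfile.directory=/var/lib/node_exporter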

This means you can scrape it as often as you like, including from multiple Prometheus servers for HA.

Also, the textfile collector exposes a metric with the file's timestamp, so you can alert if the file isn't being updated for any reason: useful for spotting cron jobs that are persistently failing.

- name: Hourly
  interval: 1h
  rules:
  - alert: StaleTextFile
    # node_textfile_mtime_seconds is the mtime of the .prom file; this fires once the
    # file is more than 2 hours (7200s) old and has stayed that way for a further 2 hours
    expr: time() - node_textfile_mtime_seconds > 7200
    for: 2h
    labels:
      severity: warning
    annotations:
      summary: "textfile-collector file has not been updated for more than 4 hours"

I also suggest moving the metrics file into place only once your slow queries have completed successfully:

(
  # run the slow queries here, printing metrics in Prometheus exposition format
  ...
) >/var/lib/node_exporter/sqlmetrics.prom.new &&
  mv /var/lib/node_exporter/sqlmetrics.prom.new /var/lib/node_exporter/sqlmetrics.prom
# the rename only happens if the queries exited successfully

Srinivasa praveen

Apr 24, 2020, 1:19:21 PM
to Prometheus Users
Thanks, Brian Candler, for your valuable suggestion. We will try this as well.

Srinivasa praveen

Apr 24, 2020, 1:20:45 PM
to Prometheus Users
Thanks, Julius, for the info and for confirming the approach.