Support for --tsdb.too-far-in-future.time-window in Prometheus


Kesavanand Chavali

Dec 18, 2023, 1:41:41 AM
to promethe...@googlegroups.com, Kesavanand Chavali
Hello Experts,

Is there any plan to support this flag in Prometheus as well? I see that it is supported in Thanos but not in Prometheus. Without it, Prometheus does not accept metrics with future timestamps (it errors with a "too far in the future" message).


Context:
We have a node whose clock is not in sync. We implemented an exporter in Go that scrapes metrics from other exporters and corrects their timestamps. Prometheus scrapes this exporter but errors out with the message above.

--
Thanks and Regards,
Kesav

Brian Candler

Dec 18, 2023, 7:13:39 AM
to Prometheus Users
On Monday 18 December 2023 at 06:41:41 UTC Kesavanand Chavali wrote:
We have a node whose clock is not in sync. We implemented an exporter in Go that scrapes metrics from other exporters and corrects their timestamps. Prometheus scrapes this exporter but errors out with the message above.

I don't understand the problem.

If the node where the exporter is running has a bad clock, it shouldn't make any difference. Firstly, the exported metrics generally won't have a timestamp (it's an optional field). Secondly, even if they do, Prometheus ignores it anyway unless you set "honor_timestamps: true", which is almost always the wrong thing to do.

Do you mean you are using the remote_write protocol with the remote_write receiver, rather than scraping?

Perhaps you can sketch a diagram of what the components are, and how they are talking to each other.

Kesavanand Chavali

Dec 19, 2023, 1:03:25 AM
to Prometheus Users
Thanks for the quick response. Here is the complete scenario:

We use Prometheus 2.45, running in agent mode, remote-writing to Thanos. Thanos 0.32.5 is hosted in EKS as an HA deployment. The Thanos receiver is set up to accept metrics 24 hours into the past and 6 hours into the future, using the following flags:
--tsdb.too-far-in-future.time-window=6h
--tsdb.out-of-order.time-window=24h

All times are in UTC. Say the correct time is 10:00 AM. We have a system that is out of time sync, with no NTP configured, sitting at 9:00 AM. We use windows_exporter to get OS metrics. We have a custom exporter written in Go that scrapes metrics from windows_exporter and corrects the timestamps to the current UTC time, using NewMetricWithTimestamp to attach them. This custom exporter is scraped by Prometheus every 4 minutes. Our observations:
- If we scrape windows_exporter directly from Prometheus and remote-write to Thanos, Thanos accepts the metrics, as out-of-order ingestion is allowed.
- If we scrape our custom exporter, we see a "metrics too old or too far into the future" error in the Prometheus logs.
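
For reference, here is a minimal sketch of the pattern our exporter uses (the metric name, value, and fixed one-hour skew below are placeholders for illustration; the real exporter gets its values from windows_exporter):

package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// proxyCollector re-exposes an upstream sample with a corrected
// timestamp. Fetching from windows_exporter is elided; the value
// and the fixed +1h skew are placeholders.
type proxyCollector struct {
	desc *prometheus.Desc
	skew time.Duration // known clock error of this node
}

func (c *proxyCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.desc
}

func (c *proxyCollector) Collect(ch chan<- prometheus.Metric) {
	value := 42.0 // would come from the scraped windows_exporter
	m := prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, value)
	// Stamp the sample with "local clock + known skew". The scraper
	// only keeps this timestamp if it scrapes with honor_timestamps: true.
	ch <- prometheus.NewMetricWithTimestamp(time.Now().Add(c.skew), m)
}

func main() {
	prometheus.MustRegister(&proxyCollector{
		desc: prometheus.NewDesc("proxied_value", "upstream sample, re-timestamped", nil, nil),
		skew: time.Hour,
	})
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}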

Brian Candler

Dec 19, 2023, 2:59:24 AM
to Prometheus Users
On Tuesday 19 December 2023 at 06:03:25 UTC Kesavanand Chavali wrote:
We have a custom exporter written in Go that scrapes metrics from windows_exporter and corrects the timestamps to the current UTC time. This custom exporter is scraped by Prometheus every 4 minutes.

Still makes no sense to me. Can you show some examples of the actual scrapes, e.g. tested using curl?

- windows_exporter should not be adding timestamps to metrics (although I don't have a running instance to test with) so there should be nothing to change
- your custom exporter should not be adding timestamps to metrics
- prometheus by default records the scrape time, not the metric timestamp

(Aside: there may be some metrics whose value *is* a timestamp, like node_boot_time_seconds in node_exporter. But a metric is just a number, so whether it's "in the future" or "in the past" makes no difference for that kind of metric)
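
To be concrete, in the exposition format a sample line may optionally end with a millisecond timestamp, and that's the only place an exporter can inject one (hypothetical metric shown):

some_metric{instance="host1"} 42
some_metric{instance="host1"} 42 1702987200000

The first form is what nearly every exporter emits, and Prometheus stamps it with the scrape time.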
 
The custom exporter is written in Go and uses NewMetricWithTimestamp to add the correct timestamp

Why are you doing this? Why not just use [Must]NewConstMetric? And why are you proxying through a custom exporter, rather than just having the agent scrape windows_exporter directly?
 
If we scrape windows_exporter directly from Prometheus and remote-write to Thanos, Thanos accepts the metrics, as out-of-order ingestion is allowed.
If we scrape our custom exporter, we see a "metrics too old or too far into the future" error in the Prometheus logs.

Do you have "honor_timestamps: true" in Prometheus agent? If so, why?

To me, it seems like you're swimming against the current here. Just do what Prometheus does by default, which is to set the timestamp of every scrape as the time when it was scraped. The state of the clock on the scrape target is irrelevant.
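
Concretely, the knob lives in the scrape config; setting it to false makes Prometheus use the scrape time regardless of what the exporter sends (the job name and target below are hypothetical):

scrape_configs:
  - job_name: 'custom-exporter'
    honor_timestamps: false   # ignore exporter-supplied timestamps
    static_configs:
      - targets: ['windows-host:9183']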

Ben Kochie

Dec 19, 2023, 4:21:08 AM
to Brian Candler, Prometheus Users
I think the problem here is that they have intentionally set the system clock incorrectly. It's not tracking UTC with a local timezone configured; it's tracking local time, and the system thinks that is UTC.

So Prometheus thinks UTC is some random local timezone, not real UTC.

For the record, Prometheus always uses UTC for timestamps. On Linux systems, Prometheus can derive local time from UTC. I'm not sure about the Windows behavior here; this might be a Windows-specific UTC/timezone handling issue.


Brian Candler

Dec 19, 2023, 4:44:18 AM
to Prometheus Users
Hmm: are you saying that the problem is that Prometheus Agent is running on a system with the wrong clock, and Prometheus Agent is adding the wrong UTC timestamps to the scrapes, and then remote_write is carrying these wrong timestamps, and then the remote_write receiver is rejecting them for being in the future?

Ergh. Trying to apply compensating timestamps by adding OpenMetrics timestamps at scrape time sounds like it's doomed to failure. I'm not sure how you would get Prometheus Agent to run in an environment where the clock is wrong.

If they could arrange that the central Prometheus (with the correct clock) scrapes the exporter directly, via some sort of reverse tunnel if necessary, and get rid of Prometheus Agent entirely, that would work.

But really, the solution is to fix the underlying clock or timezone issue.

Kesavanand Chavali

Dec 19, 2023, 5:40:35 AM
to Brian Candler, Prometheus Users
Thanks again for the quick response. The issue is that the node on which Prometheus is running is not in time sync; it is 1 hour in the past. There is no easy way for us to correct the time, as it has many dependencies across many products. Here is our setup:

[Attachment: image.png - diagram of the setup]

We wrote a custom exporter in Go that scrapes metrics from other exporters and endpoints, adds timestamps, and the result is sent to Thanos via remote write.
Say the correct time is 10:00 AM.
Node time is 9:00 AM.
Prometheus runs at 9:00 AM, as an agent.
Now the custom Go exporter gets the metrics from windows_exporter, adds a timestamp of 10:00 AM, and gives these metrics to Prometheus.
The Prometheus log says the metrics are too far in the future. I don't know if this message comes from Prometheus or Thanos.

But consider the similar use case with the node time in the future:
Say the correct time is 10:00 AM.
Node time is 11:00 AM.
Prometheus runs at 11:00 AM, as an agent.
Now the custom Go exporter gets the metrics from windows_exporter, adds a timestamp of 10:00 AM, and gives these metrics to Prometheus.
Prometheus accepts these metrics and sends them to Thanos.


Brian Candler

Dec 19, 2023, 5:58:50 AM
to Prometheus Users
> The Prometheus log says the metrics are too far in the future. I don't know if this message comes from Prometheus or Thanos.

This must be coming from Prometheus, since it's refusing to accept the metrics *before* sending them out via remote_write. (You still haven't said whether or not you've set "honor_timestamps: true", but I guess you have.)

If you really cannot fix the system clock/timezone - and for many reasons that would be the right thing to do - then the only solution I can think of is to write a custom proxy for the remote_write protocol, which sits between Prometheus Agent and Thanos. That is, let Prometheus Agent scrape the metrics with the local system time (which is wrong) and then correct them on the way out.
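
A minimal sketch of what I mean, assuming the skew is fixed and known in advance (the listen port, upstream URL, and one-hour offset are all hypothetical, and error handling is minimal):

package main

import (
	"bytes"
	"io"
	"log"
	"net/http"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

// Known clock error of the agent's node: its clock runs 1h behind
// real UTC, so every timestamp it produces is shifted forward.
const offsetMs = int64(60 * 60 * 1000) // hypothetical, assumed fixed

// Hypothetical Thanos receive endpoint.
const upstream = "http://thanos-receive:19291/api/v1/receive"

func handler(w http.ResponseWriter, r *http.Request) {
	compressed, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	raw, err := snappy.Decode(nil, compressed)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	var req prompb.WriteRequest
	if err := proto.Unmarshal(raw, &req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Shift every sample's timestamp forward by the known skew.
	for i := range req.Timeseries {
		for j := range req.Timeseries[i].Samples {
			req.Timeseries[i].Samples[j].Timestamp += offsetMs
		}
	}
	out, err := proto.Marshal(&req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// Re-compress and forward to the real receiver.
	fwd, err := http.NewRequest(http.MethodPost, upstream,
		bytes.NewReader(snappy.Encode(nil, out)))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fwd.Header.Set("Content-Type", "application/x-protobuf")
	fwd.Header.Set("Content-Encoding", "snappy")
	fwd.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
	resp, err := http.DefaultClient.Do(fwd)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
}

func main() {
	// Point the agent's remote_write url at this proxy.
	http.HandleFunc("/api/v1/receive", handler)
	log.Fatal(http.ListenAndServe(":9999", nil))
}

But maintaining something like this is far more fragile than fixing the clock.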

Kesavanand Chavali

Dec 19, 2023, 6:06:53 AM
to Brian Candler, Prometheus Users
Yes, between the Go exporter and Prometheus we have set honor_timestamps to true.
Thanos supports --tsdb.too-far-in-future.time-window to ingest metrics from the future. Is there a way Prometheus can support that too? Is there a plan for it?
In our case, as the node is 1 hour in the past, if we correct the timestamp the metric will be 1 hour in the future from Prometheus's point of view...


Brian Candler

Dec 19, 2023, 6:30:34 AM
to Prometheus Users
I believe this is a fundamental characteristic of the TSDB. It appends data to a head chunk containing (I think) the last 2 hours of data. If you try to write data for "the future", you're trying to write into a chunk that doesn't exist yet.
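
Very roughly - and simplified well beyond the actual Prometheus source - the ingest path enforces a validity window along these lines:

package main

import (
	"errors"
	"fmt"
	"time"
)

var errOutOfBounds = errors.New("timestamp out of bounds")

// appendable is a toy illustration (not the real Prometheus code) of
// a head validity window: a sample must land between the window start
// and roughly "now", so a sample stamped an hour ahead of the local
// clock has nowhere to go.
func appendable(t, minValid, maxValid int64) error {
	if t < minValid || t > maxValid {
		return errOutOfBounds
	}
	return nil
}

func main() {
	now := time.Now().UnixMilli()
	inFuture := now + time.Hour.Milliseconds()
	// Prints "timestamp out of bounds".
	fmt.Println(appendable(inFuture, now-2*time.Hour.Milliseconds(), now))
}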

I wouldn't expect the Prometheus project to expend effort and resources to support a use case which only applies where the user's own system is misconfigured.