Get the last value of a metric with indefinite time to look back


Roman Dodin

Jan 5, 2021, 10:35:37 AM
to Prometheus Users
Hello community,
I have a telemetry system that sends data into Prometheus only when the data changes (i.e. it's a push-on-event mechanism, not a sample/interval push).
That means that if a system is stable, the last reported value may have been reported quite some time ago.

Let's say this metric is called "my_metric". How do I get its last reported value without manually specifying the time frame to look back?
So far I have come up with the following query:

my_metric{label="value"}[100y]

Is this the only way to get the last value of a series, or is there an alternative?

Stuart Clark

Jan 5, 2021, 11:14:51 AM
to Roman Dodin, Prometheus Users

Prometheus is designed to be used as a scrape (pull) system where metrics are regularly fetched and their values recorded in the TSDB (even if they haven't changed). As a result, any metric that doesn't have a recent value is seen as "stale" and isn't returned by instant queries. The default staleness period is 5 minutes, which leads to the generally suggested maximum scrape interval of 2 minutes (to allow for a single scrape failure without the series being marked as stale).
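To make the staleness behavior described above concrete, here is a minimal Python sketch of a look-back window. It is a simplified model, not Prometheus code: the function name `instant_value`, the sample layout, and the exact window boundaries are assumptions made for illustration, and it ignores staleness markers entirely.

```python
from typing import Optional, Sequence, Tuple

LOOKBACK_SECONDS = 5 * 60  # the default 5-minute window mentioned above

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def instant_value(samples: Sequence[Sample], eval_time: float,
                  lookback: float = LOOKBACK_SECONDS) -> Optional[float]:
    """Return the newest sample value inside the look-back window, or None."""
    newest = None
    for ts, value in samples:
        # Only samples that are recent enough relative to eval_time count.
        if eval_time - lookback < ts <= eval_time:
            if newest is None or ts >= newest[0]:
                newest = (ts, value)
    return None if newest is None else newest[1]


# A change-only series: one sample at t=0, nothing afterwards.
on_change = [(0.0, 42.0)]
# The same value re-exported on every 60-second scrape for an hour.
periodic = [(float(t), 42.0) for t in range(0, 3601, 60)]

print(instant_value(on_change, 120.0))   # sample is still inside the window
print(instant_value(on_change, 3600.0))  # an hour later nothing is returned
print(instant_value(periodic, 3600.0))   # regular scrapes keep the series visible
```

In this model the change-only series disappears from instant queries once its single sample is older than the window, while the regularly scraped series always has a recent sample to return.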

I'm not sure how you are pushing events into Prometheus, but it sounds like you are working against the design of the system. Additionally, Prometheus is a metrics system rather than an event storage system.

For event storage I would use a different database (possibly SQL-based or a generic time series database), which would have no problems with the sort of queries you want.

Roman Dodin

Jan 5, 2021, 11:36:44 AM
to Stuart Clark, Prometheus Users
Thank you for your comments, Stuart. Maybe I expressed myself a bit vaguely.
Let me be more precise; maybe then it will be easier to answer my question.

The metric I am talking about is the number of routes a network element has in its routing table. This integer is reported to the collector only when it changes (i.e. a new route has been added to the table and the number of routes is now X+1).
The collector receives the number of routes and exposes this metric for Prometheus to scrape. Prometheus successfully scrapes this metric and stores it in the TSDB.

If no more routes are added to the system for a period T, no metrics will be available for Prometheus to scrape. In my view, though, this is not an event but a metric; it is just not reported periodically, because there are no changes to it. We reduce the amount of data to transfer and store by not sending data that hasn't changed.

Does that make the original question clearer? Is it against the design to use Prometheus for such metrics with this particular scraping strategy?

Ben Kochie

Jan 5, 2021, 2:13:45 PM
to Roman Dodin, Stuart Clark, Prometheus Users
Yes, that's clearer. This data behavior is incompatible with Prometheus because of the way stale data is handled. If a thing exists, it should continue to be exported even if it's not changing.

Prometheus deals with repeated, unchanged values just fine, as it uses compression to store the data. For non-changing values, the storage requirements are trivial, on the order of a few bits per sample.

In fact, not repeating the value saves almost nothing, as there is a minimum of 120 samples stored in a 2 hour window.
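As a rough illustration of why unchanged values compress so well: Prometheus's TSDB uses a Gorilla-style encoding in which each value is XORed with the previous one, so an identical value produces an all-zero result that can be recorded with a single control bit. The Python sketch below only shows the XOR step, not the real chunk format; the helper names `float_bits` and `xor_with_previous` and the sample values are made up for this example.

```python
import struct


def float_bits(value: float) -> int:
    """Return the 64-bit IEEE-754 pattern of a float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_with_previous(values):
    """XOR each value's bit pattern with that of its predecessor."""
    bits = [float_bits(v) for v in values]
    return [cur ^ prev for prev, cur in zip(bits, bits[1:])]


unchanged = [1500.0] * 5              # route count stays the same across scrapes
changing = [1500.0, 1501.0, 1502.0]   # route count grows between scrapes

print(xor_with_previous(unchanged))   # all zeros: nothing new to encode
print([hex(x) for x in xor_with_previous(changing)])  # non-zero bit patterns
```

Timestamps get a similar treatment (delta-of-delta), so a value scraped at a steady interval and never changing adds very little to the stored chunk.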

I suggest changing your exporter behavior to always produce the "current state of the world".
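One way the "current state of the world" suggestion could look in a collector is a last-value cache that is updated on change but rendered in full on every scrape. This is a hedged Python sketch, not the poster's collector: the class `LastValueCache` and its methods are hypothetical, the HTTP endpoint is omitted, and label values are not escaped.

```python
import threading


class LastValueCache:
    """Remembers the latest value per label set and re-exposes it on every scrape."""

    def __init__(self, metric_name: str):
        self.metric_name = metric_name
        self._values = {}
        self._lock = threading.Lock()

    def update(self, labels: dict, value: float) -> None:
        """Called only when the device reports a change."""
        key = tuple(sorted(labels.items()))
        with self._lock:
            self._values[key] = value

    def render(self) -> str:
        """Build the text a scrape would receive: every known series, changed or not."""
        with self._lock:
            items = sorted(self._values.items())
        lines = [f"# TYPE {self.metric_name} gauge"]
        for key, value in items:
            # Note: label values are written as-is, without escaping.
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.metric_name}{{{label_text}}} {value}")
        return "\n".join(lines) + "\n"


cache = LastValueCache("my_metric")
cache.update({"label": "value"}, 1500)
first_scrape = cache.render()
second_scrape = cache.render()  # no update in between, same series is still exposed
print(second_scrape)
```

With this shape the device can keep reporting on change only, while Prometheus still sees a fresh sample at every scrape and the series never goes stale.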
 


Roman Dodin

Jan 5, 2021, 2:58:25 PM
to Ben Kochie, Stuart Clark, Prometheus Users
Thanks, Ben, it is clear now.
I will check whether it makes sense to add a "metric-replay" feature to the collector to periodically re-export the on_change-type values.

Roman Dodin

Jan 5, 2021, 3:22:14 PM
to Ben Kochie, Stuart Clark, Prometheus Users
One thing I forgot to ask:
How bad is the following query performance-wise? For now it does what is needed.

my_metric{label="value"}[100y]

