Prometheus remote write keeps sending data from Push Gateway Exporter


Samuel Stanley

Oct 9, 2020, 9:41:54 AM
to Prometheus Users
I have configured my Prometheus server to scrape my component for custom metrics. The Prometheus server has remote_write configured to send data to Sysdig. I notice that even when my component is not sending data to the Push Gateway exporter, data keeps flowing from the Prometheus server into my Sysdig instance. How do I get Prometheus to stop sending data when my component is not pumping metrics? Can someone please help?


Thanks in advance.

Brian Candler

Oct 9, 2020, 9:57:31 AM
to Prometheus Users
Pushgateway doesn't do what you think.

Pushgateway exists so that short-lived, one-shot scripts can write their result somewhere and then terminate.  When Prometheus scrapes it, it will always see the *most recent* value which was written to the pushgateway.

As long as Prometheus scrapes it, it will be ingesting data.  It will be the same value every time, so it will compress very efficiently, but the timeseries will still contain values, and they'll get written to disk.

Samuel Stanley

Oct 12, 2020, 2:28:16 AM
to Prometheus Users
Thank you. So how do I prevent Prometheus from relaying the same data? Should it not stop relaying the same data to Sysdig once it has sent it successfully? I have the retention time in my remote_write set to 5 minutes, so my assumption was that after this retention time Prometheus should not send me old data. Please let me know if I am missing some configuration.

Brian Candler

Oct 12, 2020, 3:16:55 AM
to Prometheus Users
> Should it not stop relaying the same data to sysdig once it has sent it successfully?

No, because it's not the same data.  Each scrape creates a new data point:
- The value of this metric was V at time T1
- The value of this metric was V at time T2
- The value of this metric was V at time T3
... etc

Those are different pieces of information.
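To make "each scrape creates a new data point" concrete, here is a toy simulation in Python. It is only a sketch under simplifying assumptions: LastValueCache is a hypothetical stand-in for Pushgateway (it just remembers the last pushed value), scrape_loop stands in for Prometheus scraping it, and the metric name notifications_sent is made up. One push followed by three scrapes yields three distinct samples that all carry the same value.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for Pushgateway: it only remembers the last pushed
# value per metric name and has no notion of "already delivered".
@dataclass
class LastValueCache:
    values: dict = field(default_factory=dict)

    def push(self, name: str, value: float) -> None:
        self.values[name] = value

    def scrape(self) -> dict:
        # Every scrape returns the current contents, unchanged until the next push.
        return dict(self.values)


def scrape_loop(cache: LastValueCache, times: list) -> list:
    """Simulate Prometheus scraping the cache at the given timestamps.

    Each scrape appends a new (metric, timestamp, value) sample, even when
    the value is unchanged since the previous scrape.
    """
    samples = []
    for t in times:
        for name, value in cache.scrape().items():
            samples.append((name, t, value))
    return samples


cache = LastValueCache()
cache.push("notifications_sent", 5)  # one push, then nothing more
print(scrape_loop(cache, [15, 30, 45]))
# [('notifications_sent', 15, 5), ('notifications_sent', 30, 5), ('notifications_sent', 45, 5)]
```

Deleting the value on scrape would not help either: a second Prometheus scraping the same cache would then miss it.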

Pushgateway is not a "relay" as such.  It is more of a cache, just storing the last-written value.

The model of Prometheus is:
- data is fetched via scraping
- you can have multiple Prometheus instances scraping the same exporters (e.g. for high availability, or so you can run a development instance on your laptop)

Therefore it would not be possible for pushgateway to delete a value once it has been scraped; that would stop a second Prometheus instance from seeing the same data.

But there are other reasons why this is a bad idea.  One of them is that any timeseries which has not had a write within the last 5 minutes is considered "stale", i.e. it has no current data.  If you ask for the value of a metric at a given instant, Prometheus only looks 5 minutes into the past.  So if the value of a metric is V, then you need to record the value repeatedly to be able to see that its current value is still V.  There is a difference between knowing that the value of something is the same as it was before, and not knowing its value at all.
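A rough sketch of that lookback behaviour in Python. It assumes a single series stored as (timestamp_seconds, value) pairs and ignores staleness markers and the exact window-boundary rules; 300 seconds corresponds to the default 5-minute lookback, which the Prometheus server lets you change with --query.lookback-delta. The helper instant_value is hypothetical, not a Prometheus API.

```python
LOOKBACK_SECONDS = 300  # default lookback window: 5 minutes


def instant_value(samples, query_time, lookback=LOOKBACK_SECONDS):
    """Return the value of the newest sample within `lookback` seconds
    before query_time, or None if there is none ("no data")."""
    recent = [(t, v) for t, v in samples if query_time - lookback < t <= query_time]
    if not recent:
        return None
    return max(recent, key=lambda s: s[0])[1]


pushed_once = [(0, 5)]                         # value written once at t=0
scraped = [(t, 5) for t in range(0, 601, 15)]  # same value recorded every 15s

print(instant_value(pushed_once, 120))  # 5: still inside the 5-minute window
print(instant_value(pushed_once, 600))  # None: no sample in the last 5 minutes
print(instant_value(scraped, 600))      # 5: repeated samples keep it current
```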

> So how do I prevent Prometheus from relaying the same data

You would need to let the timeseries become stale - i.e. "no data" rather than "same data as last time".  You can't do that with pushgateway: see https://github.com/prometheus/pushgateway#non-goals

The solution is to get rid of pushgateway, and write a proper exporter for your data: one which is scraped directly, and returns a list of all the current values of a metric.  If a metric no longer exists, then you can stop returning it, and it will become stale.  But as I said before, "a metric no longer exists" is different to "the metric still exists, but happens to have the same value as last time you saw it".

If you describe what your metric is and what it represents, people may be able to help you more.

If you are only using pushgateway because the source cannot be scraped directly (e.g. because of firewall restrictions), then there are other options, e.g. PushProx.

Samuel Stanley

Oct 12, 2020, 3:34:55 AM
to Brian Candler, Prometheus Users
Thank you Brian.

Here's my use case:

I use Pushgateway so that Prometheus can scrape a custom component in our service. The metric is the number of notifications sent for a given service instance. Every time a push notification is sent via the service instance, I increment the gauge by 1 and push it to Pushgateway. Prometheus then scrapes this and sends it to Sysdig. Currently, if the user sends 5 notifications at, say, 12:58 PM and then sends no more, I see the value 5 still being sent by the Prometheus server after 12:58 PM, and that continues until the user sends the next notification and the value changes.

So the way I would want it to work is: at 12:58 PM it should spike up to 5, indicating that 5 notifications were sent, and then drop back to 0 until the next value comes in.


Brian Candler

Oct 12, 2020, 4:09:42 AM
to Prometheus Users
On Monday, 12 October 2020 08:34:55 UTC+1, Samuel Stanley wrote:
> Here's my use case:
>
> I use Pushgateway so that Prometheus can scrape a custom component in our service. The metric is the number of notifications sent for a given service instance. Every time a push notification is sent via the service instance, I increment the gauge by 1 and push it to Pushgateway. Prometheus then scrapes this and sends it to Sysdig. Currently, if the user sends 5 notifications at, say, 12:58 PM and then sends no more, I see the value 5 still being sent by the Prometheus server after 12:58 PM, and that continues until the user sends the next notification and the value changes.

That is absolutely the correct way to use a counter in Prometheus.  This is how it should work.  The repeated value of 5 is confirming that no additional events occurred between T1 and T2, which is a valid and important piece of information.

Compare the following two timeseries:

(a) 1 2 4 5 5 5 5 5 5 5 6 6 6
(b) 1 2 4 5 . . . . . . 6 . .

In case (a) you know exactly when the counter went from 5 to 6.  In case (b) you don't know anything about the counter value where there is a dot.  What it's actually saying is that the metric has gone away.  In that case, it's impossible to calculate the rate:

(b) 1 2 4 5 . . . . . . 6 . .
           <vals unknown>

*Maybe* the counter just went from 5 to 6.  But maybe it went from 5 to 7, and then the counter was reset to zero, and then incremented back up to 6, all during that period where there is no data.  After the gap, it is effectively a completely new time series, which just happens to start at value 6.
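The ambiguity can be shown with a small Python sketch of counter arithmetic. counter_increase is a hypothetical helper that sums the increments between consecutive samples and treats any decrease as a reset to zero, which is the same basic reset handling PromQL's rate() and increase() rely on (they additionally extrapolate to the window edges, which is omitted here). With every sample present, as in (a), the answer is unambiguous; across the gap in (b), different hidden histories end at the same visible value.

```python
def counter_increase(values):
    """Total increase across consecutive counter samples.

    If a sample is lower than the one before it, the counter is assumed
    to have reset to zero, so the new value itself counts as the increase.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


series_a = [1, 2, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6]
print(counter_increase(series_a))      # 5

# Two histories that both look like "5 ... 6" if the middle is missing:
print(counter_increase([5, 6]))        # 1: one event
print(counter_increase([5, 7, 0, 6]))  # 8: up to 7, reset, back up to 6
```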
 
> So the way I would want it to work is: at 12:58 PM it should spike up to 5, indicating that 5 notifications were sent, and then drop back to 0 until the next value comes in.

That's abuse of the data model.  Prometheus *can* work with counters which occasionally reset, because this happens in real life when services are restarted and have no way to persist their counters, but you should not have counters resetting frequently as a matter of course.  At that point they are no longer counters, but they are not gauges either, and the data is useless.
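The per-interval "spike" view from the quoted request does not require resetting the counter: it can be derived from the ordinary, repeated counter samples by differencing consecutive values. A sketch in Python, assuming one sample per minute with made-up timestamps around 12:58; per_interval_increments is a hypothetical helper. In PromQL the comparable query would be increase() over a short range, which also handles resets but extrapolates, so its results are not always exact integers.

```python
def per_interval_increments(samples):
    """Convert (timestamp, counter_value) samples into (timestamp, events)
    pairs, where events is the increase since the previous sample.
    A decrease is treated as a counter reset."""
    out = []
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        out.append((t, cur - prev if cur >= prev else cur))
    return out


# Counter scraped once a minute: 5 notifications sent just before 12:58.
samples = [("12:56", 0), ("12:57", 0), ("12:58", 5), ("12:59", 5), ("13:00", 5)]
print(per_interval_increments(samples))
# [('12:57', 0), ('12:58', 5), ('12:59', 0), ('13:00', 0)]
```

The stored counter keeps repeating 5, while the derived series spikes at 12:58 and drops back to 0.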

If this were a local Prometheus server there would be no issue with the repeated counter values.  So I presume the driver here is that you are trying to micro-optimise sending to sysdig - maybe to do with the way sysdig charges you?  I'm sorry, but Prometheus doesn't support this use case, certainly not with remote_write.

It might be possible to do something with recording rules - that is, create a new timeseries with a stale gap where the data is not changing.  But I'm not going to help you with that, because in the long term it will bite you.