No Telemetry data is sending resolved alert


Ankur Nigam

Sep 19, 2021, 7:57:54 PM
to Prometheus Users
Hi Team,

I am trying to set up a host-down alert, and I have set send_resolved to true, which is a requirement for mapping alerts.

I can see that when there is no data for the metric at the time the host is down, Alertmanager sends a resolved notification.

Please let me know if there is a way to fix this.

Stuart Clark

Sep 20, 2021, 3:31:43 AM
to Ankur Nigam, Prometheus Users
I'm not quite clear what you are wanting to fix?

Alerts will resolve if the alert query stops returning anything, which
seems to be what you are saying is happening. What are you wanting to
happen?

--
Stuart Clark

Brian Candler

Sep 20, 2021, 3:33:43 AM
to Prometheus Users
That's simply how alerting works in Prometheus.  An alert is defined with an expression.  Often this will be a filter expression, e.g.

expr: bytesFree < 1000

This is *not* a boolean, i.e. it's not a true/false: it's a PromQL filter.  "bytesFree" returns all of the timeseries which have that metric name, and "< 1000" filters this set so that it includes only those timeseries where the value is less than 1000.  So the result is an instant vector containing zero or more timeseries.

If this filter expression returns an empty result set, then there is no alert.  If it returns one or more timeseries, then one or more alerts are generated.  And if later on it no longer returns any timeseries, then those alerts are resolved.

In short: the absence of the metric which caused the alert in the first place, causes the alert to be resolved - and that's the only way that alerts *can* be resolved.
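Concretely, the filter behaviour above can be sketched as a complete rule file (a minimal sketch; the group and alert names are made up for illustration):

```yaml
groups:
  - name: example             # illustrative group name
    rules:
      - alert: LowFreeSpace   # illustrative alert name
        # This is a filter, not a boolean: it returns only the
        # bytesFree series whose value is below 1000. Zero matching
        # series means no alert; each matching series fires one alert,
        # and an alert resolves when its series drops out of the result.
        expr: bytesFree < 1000
```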

This also explains some other behaviour which sometimes people complain about.  Suppose you add an annotation to an alert such as
   description: Free space warning, bytesFree={{ $value }}

When you have an instance of the bytesFree timeseries with value 500 you'll get an alert like
  description: Free space warning, bytesFree=500

but when the value of that metric increases to say 2000, the alert resolved message will also say
  description: Free space warning, bytesFree=500

It can't say "bytesFree=2000" here, because the expression "bytesFree < 1000" returns no timeseries any more.  The only value which is known is the value which caused the alert to fire.  It's not the new value 2000 which causes the alert to be resolved; it's the absence of any value from the expression "bytesFree < 1000".
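The annotation behaviour above can be shown in rule form (a sketch assuming the same hypothetical bytesFree metric):

```yaml
      - alert: LowFreeSpace    # illustrative alert name
        expr: bytesFree < 1000
        annotations:
          # $value is the value that made the expression match when the
          # alert fired (500 here). A later resolved notification re-uses
          # that value, since "bytesFree < 1000" no longer returns a series.
          description: 'Free space warning, bytesFree={{ $value }}'
```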

Julien Pivotto

Sep 20, 2021, 4:45:21 AM
to Stuart Clark, Ankur Nigam, Prometheus Users
Resolved notifications are hard.

Please read
https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
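If resolved notifications are not wanted at all, they can be switched off per receiver in the Alertmanager configuration (a sketch; the receiver name and address are illustrative):

```yaml
receivers:
  - name: ops-email             # illustrative receiver name
    email_configs:
      - to: 'ops@example.com'   # illustrative address
        # Defaults to true for email; setting it to false suppresses
        # "resolved" notifications for this receiver.
        send_resolved: false
```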


--
Julien Pivotto
@roidelapluie

Ankur Nigam

Sep 20, 2021, 10:37:37 AM
to Prometheus Users

Hi Julien, 

A host-down alert is the use case where we will not have the metric when the host is down, and in this case it should not send a resolved alert. If I need to write an exclusive condition for sending the resolved alert, is that possible in Alertmanager or Prometheus?

Thanks.

Julien Pivotto

Sep 20, 2021, 10:42:23 AM
to Ankur Nigam, Prometheus Users
On 20 Sep 07:37, Ankur Nigam wrote:
>
> Hi Julien,
>
> Host down alert is the use case where we will not have metric when host is
> down and in this case it should not send resolved alert. If I need to write
> exclusive condition for sending resolve alert , is it possible in alert
> manager or prometheus.

A host-down alert is expressed by up == 0. Why would you have no telemetry
data? The up metric should always be present.
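The pattern being described looks like this (a minimal sketch; the group and alert names and the `for` duration are illustrative):

```yaml
groups:
  - name: availability     # illustrative group name
    rules:
      - alert: HostDown    # illustrative alert name
        # "up" is recorded by Prometheus itself for every scrape:
        # 1 if the scrape succeeded, 0 if it failed. Unlike metrics
        # exposed by the target, it is present even when the host is down.
        expr: up == 0
        for: 5m            # illustrative: tolerate brief scrape failures
```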


--
Julien Pivotto
@roidelapluie

Ankur Nigam

Sep 20, 2021, 10:57:57 AM
to Prometheus Users
We have changed Prometheus to collect the data by push instead of pulling it.

Regards,
Ankur 

Brian Candler

Sep 20, 2021, 12:18:55 PM
to Prometheus Users
In what way are you getting data into Prometheus?  If you are using Pushgateway then the metric will persist indefinitely.  So you must be doing something else.
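With Pushgateway, a common complement is to alert on staleness instead, using the `push_time_seconds` metric Pushgateway records for each group (a sketch; the job selector and threshold are assumptions):

```yaml
      - alert: PushgatewayGroupStale   # illustrative alert name
        # push_time_seconds holds the Unix time of the last push for a
        # group, so this fires when nothing has been pushed for 10 minutes.
        expr: time() - push_time_seconds{job="node_agent"} > 600
```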

Ankur Nigam

Sep 20, 2021, 2:54:03 PM
to Prometheus Users
Actually, in our cloud environment we cannot pull the data from the server, so we created our own agent which pushes the data from the local server to Prometheus.

So locally we have two agents.

Brian Candler

Sep 20, 2021, 3:26:24 PM
to Prometheus Users
Pushing *how* exactly?  Prometheus scrapes are always pulls.  You're not using the remote_write endpoint, are you?