Prometheus not restoring Alert state on restart.

soha...@gmail.com

Jun 26, 2020, 4:40:00 AM

to Prometheus Users

Say an alert is in the firing state and, a few minutes later, I restart Prometheus. After the restart, Prometheus sends a resolved notification to Alertmanager (and Alertmanager forwards it to the webhook).

Is there any way to restore the alert state across a restart? For example, if an alert was firing, it should remain firing after the restart and no resolved notification should be sent.

Note: the alert rule uses for: 5m

Prometheus version: 2.17.2

Alertmanager version: 0.21.0

Env: running as stateful sets in Kubernetes

Prometheus runtime args:

rules.alert.for-grace-period=3m

rules.alert.for-outage-tolerance=1h

storage.tsdb.retention.time=2h
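
For readers reproducing this setup, the runtime args above correspond to Prometheus command-line flags, which take a leading "--". A minimal sketch of how they might appear in the container spec of a Kubernetes StatefulSet follows; the container name and the config file path are assumptions for illustration and do not come from the original post.

```yaml
# Excerpt of a StatefulSet pod template (illustrative only).
containers:
  - name: prometheus                      # assumed container name
    image: prom/prometheus:v2.17.2        # version stated in the post
    args:
      - --config.file=/etc/prometheus/prometheus.yml   # assumed path
      - --rules.alert.for-grace-period=3m
      - --rules.alert.for-outage-tolerance=1h
      - --storage.tsdb.retention.time=2h
```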

Prometheus config:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s

Please let me know if you need more information.

Brian Brazil

Jun 26, 2020, 4:57:53 AM

to soha...@gmail.com, Prometheus Users

The alert restoration at startup feature is intended for really long for clauses, in the hours-to-days range. With a short period such as 5m, something as ordinary as a slow restart is enough to cause issues, so you should look at making the alerts and your handling of them more robust. Keep in mind, too, that right after a restart Prometheus may not yet have scraped much data for the alert to evaluate.
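
As one possible reading of "more robust" (a sketch, not something spelled out in this thread), the alert expression can be written over a time window so that a brief gap in evaluation or scraping does not flip its result. The group name, alert name, job label, window, and threshold below are all hypothetical.

```yaml
groups:
  - name: example-robust-alerts           # hypothetical group name
    rules:
      - alert: InstanceMostlyDown         # hypothetical alert name
        # True when the target was down for more than half of the last
        # 10 minutes, rather than depending on a single latest sample.
        expr: avg_over_time(up{job="my-service"}[10m]) < 0.5
        for: 5m
        labels:
          severity: page
```

This does not by itself guarantee that no resolved notification is sent if Prometheus stays down for long. On the handling side, Alertmanager webhook receivers accept send_resolved: false, which stops resolved notifications from reaching the webhook at all; whether that trade-off is acceptable depends on the consumer.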

Brian


--

Brian Brazil

www.robustperception.io

Sohaib Omar

Jun 26, 2020, 6:53:18 AM

to Brian Brazil, Prometheus Users

Thanks for the quick response. I get your point.
