Alertmanager "cycles" alerts after prometheus restart

p.deja...@gmail.com

Jul 3, 2018, 3:53:49 AM

to Prometheus Users

Hi,

We're running a redundant Prometheus (2.3.1) / Alertmanager (0.15) setup. If we restart (not reload) Prometheus, every alert configured with the "for" option drops back to the "pending" state. Alertmanager then treats those alerts as resolved and sends out resolved webhooks. Once the "for" timer has elapsed, the alerts fire again and Alertmanager notifies for them again too.
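To make the sequence concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not Prometheus code: the `AlertRule` class, the `simulate` helper, and all timings are hypothetical, and it assumes the "for" state lives only in memory and is lost on restart, as reported in this thread for 2.3.1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlertRule:
    """Toy model of one alerting rule with a "for" duration (hypothetical)."""
    for_seconds: int
    active_since: Optional[int] = None  # assumption: kept in memory only

    def evaluate(self, now: int, condition: bool) -> str:
        """Return the alert state for one rule evaluation at time `now`."""
        if not condition:
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now  # the "for" timer starts here
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"

    def restart(self) -> None:
        """A restart drops the in-memory state, so the "for" timer is forgotten."""
        self.active_since = None


def simulate(for_seconds: int = 300, interval: int = 60,
             restart_at: int = 600, end: int = 1200) -> List[Tuple[int, str]]:
    """Evaluate a rule whose condition is always true, restarting once."""
    rule = AlertRule(for_seconds)
    timeline = []
    for now in range(0, end + 1, interval):
        if now == restart_at:
            rule.restart()
        timeline.append((now, rule.evaluate(now, condition=True)))
    return timeline


if __name__ == "__main__":
    for now, state in simulate():
        print(f"t={now:>4}s  {state}")
```

In this model the alert goes back to "pending" at the restart even though the underlying condition never cleared, and it only returns to "firing" one full "for" duration later. That gap is the window in which the resolved webhooks are sent.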

As we have quite a few alerts, this causes pain and grief for the operational teams. I'm wondering how we can prevent this (I'm assuming by alternating the restarts until things stabilize), and I'd like to hear what other people do about their alerts during restart scenarios.

Thanks,

Pieter

Simon Pasquier

Jul 3, 2018, 4:39:03 AM

to p.deja...@gmail.com, Prometheus Users

As you said, one way would be to restart your Prometheus servers in a rolling fashion.

Otherwise there's a pending PR [1] to alleviate this issue and make Prometheus remember the alerts across restarts. Hopefully it will be in for the next minor release of Prometheus.

[1] https://github.com/prometheus/prometheus/pull/4061
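The rolling restart mentioned above could be scripted roughly as follows. This is only a sketch under stated assumptions: `rolling_restart`, `restart`, and `is_stable` are hypothetical names, and how you restart a server or decide that its alerts are firing again depends entirely on your own tooling. The idea is that one replica keeps sending firing alerts to Alertmanager while the other works through its "for" timers.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: restarts one server
    is_stable: Callable[[str], bool],    # hypothetical: True once alerts fire again
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,     # assumption: at least the longest "for"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Restart servers one at a time, waiting for each to stabilize."""
    done: List[str] = []
    for server in servers:
        restart(server)
        deadline = clock() + timeout_seconds
        # Do not touch the next replica until this one has caught up.
        while not is_stable(server):
            if clock() >= deadline:
                raise TimeoutError(f"{server} did not stabilize in time")
            sleep(poll_seconds)
        done.append(server)
    return done
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. The obvious cost of this approach is that the whole deployment takes at least as long as the longest "for" duration per server.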



p.deja...@gmail.com

Jul 3, 2018, 10:20:44 AM

to Prometheus Users

The rolling restart isn't really workable, as we deploy with Ansible from a CI/CD pipeline and waiting 30 minutes isn't great.

That upcoming new feature looks awesome; looking forward to it.

Thanks,

Pieter
