Hello all,
Let's say we have >=2 Prometheus nodes scraping the same k8s metrics, with k8s service discovery running every 5 minutes. Then, imagine an alerting rule expression such as:
absent({pod="my-cool-pod"}) == 1
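For reference, such an expression would typically live in a rule file like the following. This is only a sketch; the group name, alert name, "for" duration, and labels are my own placeholders, not from any real setup:

```yaml
# Hypothetical rule file; names and durations are illustrative.
groups:
  - name: example
    rules:
      - alert: MyCoolPodAbsent
        expr: absent({pod="my-cool-pod"}) == 1
        # A "for" duration makes each Prometheus replica wait before
        # firing, which can absorb short scrape/SD skew between replicas.
        for: 10m
        labels:
          severity: warning
```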
What happens in practice is that the alert quickly flaps firing -> resolved -> firing -> resolved. AFAICT one Prometheus node sends the alert to Alertmanager with the state "resolved", and some seconds later the 2nd node still sends it as "firing" because, from its point of view, the metric is still absent. Only once that node also sends "resolved" does the alert finally settle. Seems like the magic happens here:
https://github.com/prometheus/prometheus/blob/master/rules/alerting.go#L103-L106. I would imagine that in such a scenario we should depend on AlertManager resolving the alert automatically for us after some time to get a "consistent" state.
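On that last point: Alertmanager does have a global resolve_timeout after which an alert whose EndsAt has not been refreshed is treated as resolved. A minimal config sketch (note that Prometheus normally sets EndsAt on the alerts it sends, so whether this helps here is exactly the question):

```yaml
# Alertmanager config sketch; only resolve_timeout is relevant here.
global:
  # How long to wait before declaring an alert resolved when no
  # fresh EndsAt is received. 5m is the documented default.
  resolve_timeout: 5m
```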
Any thoughts on this or perhaps I am missing something?
BR,
Giedrius