Expire silence when alert becomes resolved


John Smith

Nov 27, 2020, 4:21:16 AM
to Prometheus Users
Hi, I'm building an Alertmanager-like system. Alerts are sent through an IM called DX, and I placed some buttons in the alert message, such as "silence 10m", "silence 1h", and "silence 6h". Clicking a button creates a silence that mutes that alert for the corresponding period (one silence per alert).
Now some of my colleagues want a feature where, when the alert is resolved, the corresponding silence (which covers only that one alert) is expired, if they choose to enable it.


I am not sure whether it's a good idea or not.
What do you think about this? :)

   Thanks~

b.ca...@pobox.com

Nov 27, 2020, 6:03:47 AM
to Prometheus Users
During maintenance, alerts tend to bounce up and down, so for scheduled maintenance I wouldn't want a resolved alert to clear the silence automatically.  Rather, there should be a process to remove the silences when the maintenance has been confirmed as complete (by the engineers).  For example, you can include the ticket number in the silence (or the silence ID in the ticket), and delete it when the ticket is closed.

I do sympathise with the use case: "X has gone down, a ticket has been raised for X, don't bug me about X again".  This might be done by setting a long-duration silence, say 2 weeks - but if the problem is fixed before then, you do want to start raising alarms again.  Again, if the silence is explicitly linked to a ticket, then closing the ticket can delete the silence.
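
As a rough illustration of the ticket linkage, here is a minimal Python sketch. It assumes the ticket reference is stored as a `ticket:<id>` token in the silence's `comment` field; that convention and the helper names `build_silence` and `silences_for_ticket` are made up for this example. The payload fields follow what I understand the Alertmanager v2 silence format to be (`matchers`, `startsAt`, `endsAt`, `createdBy`, `comment`). Posting the payload and deleting the returned IDs when the ticket closes are left out.

```python
from datetime import datetime, timedelta, timezone


def build_silence(labels, ticket, created_by, now, duration=timedelta(days=14)):
    """Build a silence payload that carries the ticket reference in its comment."""
    return {
        # One equality matcher per label, sorted for a stable order.
        "matchers": [
            {"name": k, "value": v, "isRegex": False}
            for k, v in sorted(labels.items())
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + duration).isoformat(),
        "createdBy": created_by,
        # Assumed convention: a "ticket:<id>" token links silence and ticket.
        "comment": f"ticket:{ticket}",
    }


def silences_for_ticket(silences, ticket):
    """Return IDs of silences whose comment contains the exact ticket token."""
    tag = f"ticket:{ticket}"
    # Split on whitespace so that "ticket:OPS-12" does not match "ticket:OPS-123".
    return [s["id"] for s in silences if tag in s.get("comment", "").split()]
```

When the ticket is closed, whatever closes it could call `silences_for_ticket` on the current silence list and expire each returned ID.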

If you don't have this ticket linkage, then it might be useful to have an external program which monitors the alertmanager, and notices if a given silence has been covering zero alerts for an extended period of time (say 6 hours), and either flags it up for attention, or expires it automatically.
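
The core check of such an external program might look like the sketch below. It assumes silences and alerts arrive as dicts shaped like the Alertmanager v2 API responses (a silence has `id` and `status.state`; an alert lists the silences muting it in `status.silencedBy`), and that the caller polls periodically and keeps the `last_active` dict between polls. The function name `idle_silences` and the polling design are assumptions for illustration; fetching the data and acting on the result are left out.

```python
from datetime import datetime, timedelta, timezone


def idle_silences(silences, alerts, last_active, now, max_idle=timedelta(hours=6)):
    """Return IDs of active silences that have covered no alert for max_idle.

    last_active maps silence ID -> time it was last seen covering an alert
    (or first seen at all); it is updated in place on every call.
    """
    # Every silence ID that is currently muting at least one alert.
    covering = {
        sid
        for alert in alerts
        for sid in alert.get("status", {}).get("silencedBy", [])
    }
    idle = []
    for silence in silences:
        if silence.get("status", {}).get("state") != "active":
            continue  # pending or expired silences are not of interest
        sid = silence["id"]
        if sid in covering:
            last_active[sid] = now
        elif now - last_active.setdefault(sid, now) >= max_idle:
            # A silence seen for the first time starts its idle clock at `now`.
            idle.append(sid)
    return idle
```

The returned IDs can then be flagged for attention or expired, whichever policy is preferred.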

Remember that depending on how it's created, one silence can cover multiple active alerts.  If it covers several alerts, and a subset have resolved, you couldn't just delete the silence.  You'd have to replace it with a more specific matching silence to cover the remaining alerts which are still active.
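
One possible way to compute the replacement is sketched below: for each alert that is still firing and matched by the broad silence, build a narrow silence that matches that alert's full label set. The helper names `matches` and `narrower_silences` are hypothetical, and the matching logic is a simplification (equality and fully anchored regex matchers only, with a missing label treated as an empty string).

```python
import re


def matches(matchers, labels):
    """Return True if every matcher accepts the given label set."""
    for m in matchers:
        value = labels.get(m["name"], "")  # missing label counts as ""
        if m.get("isRegex"):
            ok = re.fullmatch(m["value"], value) is not None
        else:
            ok = value == m["value"]
        if not ok:
            return False
    return True


def narrower_silences(silence, firing_alerts):
    """Build one exact-match silence per still-firing alert the silence covers."""
    replacements = []
    for alert in firing_alerts:
        labels = alert["labels"]
        if not matches(silence["matchers"], labels):
            continue
        replacements.append({
            "matchers": [
                {"name": k, "value": v, "isRegex": False}
                for k, v in sorted(labels.items())
            ],
            # Keep the original time window and attribution.
            "startsAt": silence["startsAt"],
            "endsAt": silence["endsAt"],
            "createdBy": silence["createdBy"],
            "comment": silence["comment"],
        })
    return replacements
```

The broad silence would be expired only after the narrower ones have been created, so the remaining alerts are never left uncovered.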

l.mi...@gmail.com

Nov 27, 2020, 6:13:35 AM
to Prometheus Users

John Smith

Nov 27, 2020, 7:48:34 AM
to Prometheus Users
Thanks for your reply. I don't think it's necessary to expire a silence that covers many alerts; in that case endsAt is enough.
What I'm unsure about is the case where the silence mutes only one alert and was created by a user on purpose. This kind of silence is matched by the alert's fingerprint rather than by labels :)
I suppose it is reasonable now...
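
A minimal in-memory sketch of that per-alert behaviour, with expiry on resolve as an opt-in flag, could look like this. Everything here (`SilenceStore`, `expire_on_resolve`, `on_resolved`) is a hypothetical design for a custom system, not part of Alertmanager.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Silence:
    ends_at: datetime
    expire_on_resolve: bool


class SilenceStore:
    """Keeps at most one silence per alert fingerprint."""

    def __init__(self):
        self._by_fingerprint = {}

    def silence(self, fingerprint, now, duration, expire_on_resolve=False):
        """Create or replace the silence for a single alert."""
        self._by_fingerprint[fingerprint] = Silence(now + duration, expire_on_resolve)

    def is_silenced(self, fingerprint, now):
        s = self._by_fingerprint.get(fingerprint)
        return s is not None and now < s.ends_at

    def on_resolved(self, fingerprint, now):
        """Expire the silence early if the user opted in; return True if expired."""
        s = self._by_fingerprint.get(fingerprint)
        if s is not None and s.expire_on_resolve and now < s.ends_at:
            s.ends_at = now
            return True
        return False
```

With the flag left at its default, a resolved alert keeps its silence until endsAt, which preserves the behaviour wanted for alerts that bounce up and down.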

Ben Kochie

Nov 27, 2020, 8:03:08 AM
to John Smith, Prometheus Users
We have a system called helicopter that updates our issues/tickets with silence information. When silences are put in place, we put the issue URLs in them.

