How to delay de-duplication of alerts in AlertManager

Dhiman Barman

Aug 14, 2019, 2:14:08 PM
to Prometheus Developers
Hi,

We have multiple Prometheus instances and the same number of AlertManager instances forming a peer mesh.
In production, we are observing that de-duplication fails for 15-20% of alerts, that is, JIRA creates new tickets. This happens
when duplicate alerts are sent by AlertManager in quick succession.

Is there a way to configure AlertManager so that the first alert is sent to JIRA right away and subsequent duplicate alerts are delayed
by a configurable amount of time?

And yes, the first alert and its duplicates may be handled by different AlertManager instances.


Thanks,
Dhiman

Matthias Rampke

Aug 14, 2019, 4:40:02 PM
to Dhiman Barman, prometheus-developers
The second notification is normally only sent after the group interval, which defaults to 5 minutes. If you're getting duplicates in under a minute, they're caused by a clustering failure. By design, if the Alertmanager instances can't tell each other that they sent the notification, the next instance in the cluster will send it too.

This should be traceable from the Alertmanager logs if you set verbosity high enough.

The main question then is why they can't communicate reliably. It could be something in your environment.

/MR
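A toy Python model of the failure mode described above may make it concrete. This is a simplified sketch, not Alertmanager's actual code. It assumes each peer waits its cluster position times a peer timeout before notifying (15 s here, assumed to match the default of Alertmanager's `--cluster.peer-timeout` flag) and skips the notification if its copy of the gossiped notification log already shows a send within the group interval. `Peer`, `simulate`, and the constants are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

PEER_TIMEOUT = 15.0     # seconds; assumed default of --cluster.peer-timeout
GROUP_INTERVAL = 300.0  # seconds; default group_interval (5m)


@dataclass
class Peer:
    position: int
    # Send times this peer *knows about* for one alert group
    # (its local copy of the gossiped notification log). Hypothetical model.
    known_sends: list = field(default_factory=list)

    def should_send(self, now: float) -> bool:
        # Skip if any known send happened within the group interval.
        return not any(now - t < GROUP_INTERVAL for t in self.known_sends)


def simulate(num_peers: int, gossip_ok: bool, flush_time: float = 0.0) -> list:
    """Return (peer position, send time) for every notification sent to the receiver."""
    peers = [Peer(i) for i in range(num_peers)]
    sent = []
    for peer in peers:
        # Each peer acts position * PEER_TIMEOUT seconds after the group flush.
        now = flush_time + peer.position * PEER_TIMEOUT
        if peer.should_send(now):
            sent.append((peer.position, now))
            peer.known_sends.append(now)
            if gossip_ok:
                # Working gossip propagates the send to every other peer in time.
                for other in peers:
                    if other is not peer:
                        other.known_sends.append(now)
    return sent


print(simulate(3, gossip_ok=True))   # [(0, 0.0)]
print(simulate(3, gossip_ok=False))  # [(0, 0.0), (1, 15.0), (2, 30.0)]
```

With gossip working, only the first peer notifies. With gossip broken, every peer notifies within `(n - 1) * PEER_TIMEOUT` seconds (30 s for three peers), which matches the "duplicates in under a minute" symptom and points at cluster communication rather than at a missing delay setting.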

Dhiman Barman

Feb 11, 2020, 5:52:03 PM
to Matthias Rampke, prometheus-developers
Hi,

We have the following metrics in our Prometheus instances. There are three Prometheus instances and a similar number of Alertmanager instances, which form a mesh. Each Prometheus instance sends its alerts to all the Alertmanager instances. The metrics we have in Prometheus are as follows:

metric_name{label1=a, label2=b}  
metric_name{label1=b, label2=a}

Is it possible to de-duplicate the two alerts based on the above (label, value) combinations? The metric name is the same, and we want to create a single JIRA ticket when the label1 and label2 values are related, as in the swapped pair above.
For example, label1 could be "from" and label2 could be "to".
If this can be done in Alertmanager without generating a combined label in Prometheus, could someone show an example configuration?

Thanks,
Dhiman
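The core of the question is that both series need to map to the same key, regardless of which value sits in `label1` and which in `label2`. A minimal Python sketch of that idea follows: sorting the two values gives an order-independent key. This only illustrates the technique; it is not Alertmanager configuration, and `pair_key` and `group_symmetric` are hypothetical helpers.

```python
from collections import defaultdict


def pair_key(alert: dict, a: str = "label1", b: str = "label2") -> tuple:
    """Order-independent key: {label1=a, label2=b} and {label1=b, label2=a} collide."""
    return (alert["__name__"],) + tuple(sorted((alert[a], alert[b])))


def group_symmetric(alerts: list) -> dict:
    # Bucket alerts by their canonical (sorted) label pair.
    groups = defaultdict(list)
    for alert in alerts:
        groups[pair_key(alert)].append(alert)
    return dict(groups)


alerts = [
    {"__name__": "metric_name", "label1": "a", "label2": "b"},
    {"__name__": "metric_name", "label1": "b", "label2": "a"},
]
groups = group_symmetric(alerts)
print(len(groups))   # 1
print(list(groups))  # [('metric_name', 'a', 'b')]
```

Whatever component computes such a key (for example, a combined label produced upstream), the sort step is what makes the "from"/"to" direction irrelevant to grouping.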

Matthias Rampke

Feb 12, 2020, 4:13:00 AM
to Dhiman Barman, prometheus-developers
Hey,

This is a new question, and not really development-related. You're more likely to get an answer if you open a new thread on the prometheus-users mailing list.

Thank you!
Matthias