Alert Duplication in HA Prometheus & Alertmanager setup.

888 views
Skip to first unread message

yagyans...@gmail.com

unread,
Nov 6, 2020, 11:02:31 AM11/6/20
to Prometheus Users
Hi. I have a HA Prometheus setup, with 2 instances(x.x.x.x and y.y.y.y) scraping exactly the same targets. On the respective machines, Alertmanager is also running and a mesh is created. But I am observing that all the alerts are getting duplicated and I am receiving every alert twice.
Alertmanager Version - 0.21.0.
/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --storage.path /mnt/vol2/alertmanager --data.retention=120h --log.level=debug --web.listen-address=x.x.x.x:9093 --cluster.listen-address=x.x.x.x:9094 --cluster.peer=y.y.y.y:9094

Oh, one thing that just popped into my head, for the temporary testing period I am running different versions of Prometheus in the instances. 2.12.0 in one and 2.20.1 in the other one. Could this also cause this?

Thanks in advance!

Nolan Crooks

unread,
Mar 23, 2021, 6:11:31 PM3/23/21
to Prometheus Users
I am also having this issue. I am running identical instances of Prometheus on the same version though.

Julius Volz

unread,
Mar 23, 2021, 6:38:00 PM3/23/21
to Nolan Crooks, Prometheus Users
Do you have a replica label (in external_labels in Prometheus) that distinguishes the two replicas so that the alerts no longer look the same to Alertmanager?

In that case, you would have to drop that first, see "Removing HA Replica Labels from Alerts" under https://training.promlabs.com/training/relabeling/writing-relabeling-rules/keeping-and-dropping-labels

If that is not the problem, maybe your AM instances aren't talking to each other correctly. If you create a silence in one of the AM replicas, does it appear in the other? There should also be log messages about the peer discovery, as well as on the /status page of Alertmanager.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/c8245b99-376b-44c2-9a6f-433912a294den%40googlegroups.com.


--
Julius Volz
PromLabs - promlabs.com

Nolan Crooks

unread,
Mar 25, 2021, 3:11:49 PM3/25/21
to Prometheus Users

I don't know about the original poster, but in my case, I have an 2 instances of Prometheus using the same config file, with an external label set to "prom" on each. I have both pointed to 2 alertmanagers which are clustered, and if I create a silence in one I can see it appear in the other. However, I still am getting duplicate alerts. The first one will fire and the second will fire between 15-20s after. My group wait in Alertmanager is 45s, and the group interval is 1m. Scrape interval and evaluation intervals are both 5s. What could be causing this behaviour?
Reply all
Reply to author
Forward
0 new messages