alertmanager instance failure


Dhiman Barman

Feb 18, 2020, 9:26:00 PM
to promethe...@googlegroups.com
Hi,

We have a setup with multiple Prometheus instances and the same number of (Alertmanager + webhook) instances.
Each Docker container runs both the Alertmanager and the webhook processes.
If the webhook process goes down but the Alertmanager process does not, how catastrophic is this event?
What if both go down? How catastrophic is that? Note that if the VM gets rebooted, it might take a long time for the
instances to come up. How much will clustering help to avoid dropping alerts?

Thanks,
Dhiman

Simon Pasquier

Feb 21, 2020, 9:50:20 AM
to Dhiman Barman, Prometheus Users
You need to run Alertmanager instances on different machines and set up
HA as described in the README.md [1].
This way your setup will be resilient to (N-1) instances going down.
If you want to detect a failure in your monitoring pipeline, you need
to set up something like a dead man's snitch integration [2].
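
A dead man's snitch inverts the usual alerting logic: the monitoring pipeline keeps sending a heartbeat (for example, an always-firing alert routed through Alertmanager to an external service), and the external service pages someone when the heartbeat stops arriving. That is how a whole-pipeline outage, such as the VM reboot described in the question, still gets noticed. Below is a minimal Python sketch of the watchdog side, assuming a fixed timeout; the class `DeadMansSwitch`, the `timeout_s` parameter and the injectable `clock` are hypothetical names for illustration, not part of Alertmanager or PagerDuty.

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Hypothetical watchdog that trips when heartbeats stop arriving.

    In a real setup the heartbeat would be an always-firing alert that
    Alertmanager forwards to an external service; here it is a method call.
    """

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self) -> None:
        # Record when the monitoring pipeline last proved it was alive.
        self.last_heartbeat = self.clock()

    def is_tripped(self) -> bool:
        # No heartbeat seen yet, or the last one is older than the timeout:
        # Prometheus, Alertmanager or the webhook may be down.
        if self.last_heartbeat is None:
            return True
        return self.clock() - self.last_heartbeat > self.timeout_s


# Usage with a fake clock instead of waiting in real time.
now = [0.0]
switch = DeadMansSwitch(timeout_s=300, clock=lambda: now[0])
switch.heartbeat()
now[0] = 120.0
print(switch.is_tripped())  # False: heartbeat is 120 s old
now[0] = 400.0
print(switch.is_tripped())  # True: no heartbeat for 400 s
```

The important design point is that the check runs outside the pipeline it watches; a watchdog hosted on the same VM as Alertmanager would go down with it.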

[1] https://github.com/prometheus/alertmanager#high-availability
[2] https://www.pagerduty.com/docs/guides/dead-mans-snitch-integration-guide/
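
The HA model in the README relies on the alert source sending every alert to all Alertmanager instances (not to a single load-balanced endpoint), while the clustered Alertmanagers gossip among themselves so that notifications are deduplicated. The simplified Python model below illustrates why this tolerates N-1 failures: an alert is delivered as long as at least one instance accepts it. The `fan_out` function and the `up`/`down` senders are hypothetical stand-ins for illustration, not Prometheus or Alertmanager code.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for "send this alert to one Alertmanager instance".
# Returns True if the instance accepted the alert.
Sender = Callable[[Dict[str, str]], bool]


def fan_out(alert: Dict[str, str], senders: List[Sender]) -> int:
    """Send the alert to every Alertmanager and count how many accepted it."""
    accepted = 0
    for send in senders:
        try:
            if send(alert):
                accepted += 1
        except ConnectionError:
            # An unreachable instance does not count, but the others still get the alert.
            pass
    return accepted


def up(alert: Dict[str, str]) -> bool:
    return True


def down(alert: Dict[str, str]) -> bool:
    raise ConnectionError("instance unreachable")


alert = {"alertname": "InstanceDown", "instance": "vm-1"}
print(fan_out(alert, [up, down, down]))    # 1: delivered despite N-1 failures
print(fan_out(alert, [down, down, down]))  # 0: all instances down, alert lost
```

This only models delivery into Alertmanager. If every instance shares one VM, a single reboot lands in the all-down case, which is why the instances should run on different machines.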
