alertmanager instance failure

Feb 18, 2020, 9:26:00 PM

to promethe...@googlegroups.com

Hi,

We have a setup with multiple Prometheus instances and the same number of (Alertmanager + webhook) instances.

Each Docker container runs both the Alertmanager and the webhook processes.

If the webhook process goes down but the Alertmanager process does not, how catastrophic is that?

What if both go down? Note that if the VM gets rebooted, it might take a long time for the instances to come back up. How much will clustering help in not dropping alerts?

Thanks,

Dhiman

Feb 21, 2020, 9:50:20 AM

to Dhiman Barman, Prometheus Users

You need to run the Alertmanager instances on different machines and set up
HA as described in the README.md [1].
This way your setup will be resilient to (N-1) instances going down.
If you want to detect a failure in the monitoring pipeline itself, you need
to set up something like a dead man's snitch integration [2].

[1] https://github.com/prometheus/alertmanager#high-availability
[2] https://www.pagerduty.com/docs/guides/dead-mans-snitch-integration-guide/
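
To make the dead man's snitch idea a bit more concrete: the usual pattern is
an alert that fires all the time and is routed through Alertmanager to an
external service, which raises an alarm only when those notifications stop
arriving. The sketch below shows just the receiving side of that check in
Python. It is an illustration under assumptions, not how any particular
service implements it; the class name, the method names and the 300 second
timeout are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeadMansSwitch:
    """Hypothetical receiver-side check: trips when heartbeats stop arriving."""

    timeout_seconds: float                  # assumed value, tune to your repeat interval
    last_heartbeat: Optional[float] = None  # timestamp of the last notification seen

    def heartbeat(self, now: float) -> None:
        # Call this whenever the always-firing alert's notification arrives.
        self.last_heartbeat = now

    def is_expired(self, now: float) -> bool:
        # No heartbeat ever seen is treated as a failure (a design choice).
        if self.last_heartbeat is None:
            return True
        return (now - self.last_heartbeat) > self.timeout_seconds


if __name__ == "__main__":
    import time

    switch = DeadMansSwitch(timeout_seconds=300)
    switch.heartbeat(time.monotonic())
    print("pipeline silent:", switch.is_expired(time.monotonic()))
```

The important part is that this check runs outside the monitored setup, so
that a reboot of the VM hosting Prometheus, Alertmanager and the webhook
cannot take the check down with it.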
