The main purpose of this kind of configuration would be to address the following:
- to have a single cluster in which silences are managed
- to ensure global redundancy if one region should become unavailable
If the region has completely failed, then presumably there's nothing within that region that is worth alerting on anyway. You can monitor the Alertmanager cluster in one region from a Prometheus in another region, so that you are alerted to that failure mode.
However, the simplest solution would be to have a single Alertmanager cluster, spread across AZs in a single region; all the other Prometheus servers send their alerts to this cluster. Alerting is low traffic and I don't see a particular reason to have a separate Alertmanager cluster in every region. You can test that you can reach Prometheus in every other region, and then you have high confidence that Prometheus in those regions will be able to contact the central Alertmanager.
With this approach, multiple AZs, which are typically each hosted within a single DC, still run the risk of being inaccessible should the link to the DC go down. So let's say you have datacenters in 3 regions (AMER, EMEA and APAC) and you've chosen to have a single AM cluster in EMEA. Should the link between AMER and EMEA and/or between EMEA and APAC go down, the Prometheus instances located in AMER or APAC won't be able to send alert notifications. If you instead had 2 or 3 Alertmanager instances in each of these regions, wouldn't that still allow alerts to be received and actioned within each of those regions?
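To make the scenario above concrete, here is a rough Python sketch of it. It is only a toy model, not anything Prometheus or Alertmanager provides: the function names and the set of failed links are made up for illustration, it only considers direct region-to-region links (no rerouting via a third region), and it says nothing about whether Alertmanager can then reach the notification receiver.

```python
# Toy model: which regions can still hand alerts to an Alertmanager
# when inter-region links fail. Region names come from the thread;
# the helper names and the failed-link set are hypothetical.

REGIONS = ["AMER", "EMEA", "APAC"]


def can_deliver(region, am_regions, down_links):
    """Return True if Prometheus in `region` can reach at least one
    region hosting Alertmanager, using only direct links."""
    for am in am_regions:
        if am == region:
            return True  # local Alertmanager, no inter-region link needed
        if frozenset((region, am)) not in down_links:
            return True  # direct link to that Alertmanager region is up
    return False


def coverage(am_regions, down_links):
    """Map each region to whether its alerts can still be delivered."""
    return {r: can_deliver(r, am_regions, down_links) for r in REGIONS}


# Assumed failure: both links into EMEA are down.
down = {frozenset(("AMER", "EMEA")), frozenset(("EMEA", "APAC"))}

central = coverage(["EMEA"], down)      # single AM cluster in EMEA
per_region = coverage(REGIONS, down)    # AM instances in every region

print(central)     # {'AMER': False, 'EMEA': True, 'APAC': False}
print(per_region)  # {'AMER': True, 'EMEA': True, 'APAC': True}
```

Under these assumptions the central layout loses AMER and APAC while the per-region layout keeps delivering locally, which is the trade-off being asked about. The per-region layout would still need something to address the first goal in the thread, a single place to manage silences.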
--
To view this discussion visit https://groups.google.com/d/msgid/prometheus-users/ec7b1e1f-d1af-4e0c-ad59-1f238e661737n%40googlegroups.com.