Alertmanager Gossip Cluster and Silences


Dj Fox

Nov 29, 2021, 12:55:15 PM
to Prometheus Users
Hello!
I'm wondering how to correctly use Alertmanager at scale.

I have 10 regions. In each region, a pair of Prometheus servers scrapes exactly the same set of applications (which are also local to that region).
Each region also has a pair of HA Alertmanagers, which gossip with each other.
Each Prometheus is connected to the 2 Alertmanagers of its region.
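Concretely, each Prometheus in a region has something like this in its configuration (hostnames are just placeholders):

    # prometheus.yml in region A
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - alertmanager-1.region-a.example:9093
                - alertmanager-2.region-a.example:9093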

In order to benefit from a global metrics view + object storage, we are using Thanos.
It works great.

But with that kind of architecture, how am I supposed to silence an alert?
I want silences to be propagated to all Alertmanagers worldwide. But if they are split into 10 clusters of 2 members each, this doesn't happen automatically.

How am I supposed to use the silencing system at scale? Creating a silence only in the one region where the alert is firing isn't workable: I would have no global view of all my silences and could forget where they are. It becomes hard to manage, and sometimes I want to mute something globally across several regions.

The memberlist library used by Alertmanager seems to have been designed exactly for this: exchanging information between many nodes of a large cluster while keeping good performance.

So I then tried connecting all 20 Alertmanagers to the same gossip cluster, the goal being that they propagate their silences automatically.
While doing so, I made sure that each pair of Prometheus servers stays connected ONLY to the 2 Alertmanagers of its own region.
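Roughly, each instance is started with the standard cluster flags, something like this (placeholder hostnames; only a few --cluster.peer seeds are needed per instance, the remaining members being discovered through gossip once joined):

    alertmanager \
      --config.file=alertmanager.yml \
      --cluster.listen-address=0.0.0.0:9094 \
      --cluster.peer=am-1.region-a.example:9094 \
      --cluster.peer=am-1.region-b.example:9094 \
      --cluster.peer=am-1.region-c.example:9094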

=> It works well and it does what I want:
- Silences are propagated everywhere
- Alerts are gossiped to all nodes, but the other regions never do anything with an alert they receive only via gossip and not from a Prometheus.
(If I understand correctly, an Alertmanager will never take responsibility for notifying on an alert it has not received from a Prometheus.)

But then I noticed that in the Alertmanager implementation there is a timer that depends on each node's position in the memberlist cluster: an Alertmanager receiving an alert from Prometheus waits 5s times its position in the cluster before notifying.
It means that if one region's Alertmanagers end up at positions 18 and 19 out of 20, I introduce a delay of up to 19x5 = 95s before the notification can be sent.
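Roughly, the staggering looks like this (a simplified sketch from memory, not the exact Alertmanager source; the multiplier corresponds to the cluster peer timeout, and 5s is the value assumed above):

    package main

    import (
        "fmt"
        "time"
    )

    // clusterWait mirrors, roughly, how Alertmanager staggers notifications:
    // each peer waits its position in the memberlist cluster multiplied by a
    // fixed peer timeout before it is allowed to notify, giving lower-positioned
    // peers the chance to send (and gossip the notification log) first.
    func clusterWait(position int, peerTimeout time.Duration) time.Duration {
        return time.Duration(position) * peerTimeout
    }

    func main() {
        peerTimeout := 5 * time.Second
        for _, pos := range []int{0, 1, 18, 19} {
            fmt.Printf("peer at position %d waits %s before notifying\n",
                pos, clusterWait(pos, peerTimeout))
        }
    }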

In the official README of the GitHub project, it's clearly stated:
Important: Do not load balance traffic between Prometheus and its Alertmanagers, but instead point Prometheus to a list of all Alertmanagers. The Alertmanager implementation expects all alerts to be sent to all Alertmanagers to ensure high availability.

Do you have advice on how to handle "silencing at scale" with Alertmanager?

Usually we say that Prometheus itself does not scale beyond one node, because it focuses on doing its job correctly and very efficiently (a single Prometheus can ingest millions of samples and be very good at it).
That's why responsibilities are split across separate tools, and Thanos/Cortex can come to the rescue in that case.
But in Thanos, I see no component designed to make Alertmanager scalable.

Connecting ALL 20 Prometheus servers to ALL 20 Alertmanagers seems a bit overkill to me.
I think it would make the cluster less robust: I would be more exposed to network partitions, and therefore to failed alert deduplication (a higher probability of being notified twice for the same alert, because a partition is more likely to occur somewhere).

Is it a good idea to connect all Alertmanagers of the different regions to the same memberlist cluster, while keeping only 2 Prometheus servers connected to each Alertmanager?

Thank you for your advice!
Regards

Brian Candler

Nov 29, 2021, 1:05:53 PM
to Prometheus Users
I use Karma as an alert dashboard.  It can combine multiple alertmanagers (or alertmanager clusters) into a single view, and push silences out to all of them.
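Something like this in karma's configuration (placeholder names and URIs, from memory of the config format; check the karma docs for the exact schema) gives a single view over all regions:

    alertmanager:
      servers:
        - name: region-a
          uri: http://am-1.region-a.example:9093
          timeout: 20s
        - name: region-b
          uri: http://am-1.region-b.example:9093
          timeout: 20s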

alerta.io is another tool which I believe can do this, but I've not tried it.

I'd then suggest keeping it nice and simple:
- one alertmanager cluster in each region
- prometheus in each region talks to its own alertmanagers only
- don't attempt any gossip or any other interconnection between regions

Matthias Rampke

Nov 30, 2021, 4:29:30 PM
to Dj Fox, Prometheus Users
Alertmanager clustering works best if you send all alerts to all Alertmanagers – in effect, have a single, flat, global Alertmanager cluster where all members do the same work.

In that sense,

> Is it a good idea to [keep] only 2 Prom connected to each Alertmanager?

In my experience, no.

When sending all alerts across regions, all Alertmanagers can pick up all the work, providing redundancy. This means you can remove the doubling of AM within regions, reducing the number of instances to 10. You can also go further, and deploy AM to only a subset of regions. I would consider "assuming they are all in separate regions, how many alertmanager instances are enough for the reliability I desire?"

/MR


Dj Fox

Dec 2, 2021, 5:35:55 AM
to Prometheus Users
@Brian
Do you confirm that one of the main reasons an Alertmanager cluster needs to handle the same set of alerts as the others (and hence be plugged into all the same Prometheus servers) is the way deduplication works?
It's because of the 5s delay multiplied by the Alertmanager's position in the gossip cluster, right?
- It would be cool to be able to tell each Alertmanager: this is my "alert-family". Deduplication would then only occur among the members sharing the same "alert-family" value. That way, we would no longer be forced to connect 20 Prometheus servers to every Alertmanager, and all of them could still gossip the valuable silences.
- And/or the ability to give amtool a list of clusters so that it can handle several clusters as well.

@Matthias
Yes, the main benefits I see with your proposal are:
- Silences are automatically propagated to all nodes
- And group_by becomes a "real" global group_by (instead of 10 sub-optimal group_bys that can only group within a region); see the sketch below
So this is very appealing.
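For illustration, with a single global cluster a route like this could group the same alertname across regions into one notification (the receiver name is made up):

    # alertmanager.yml sketch
    route:
      receiver: ops-team
      group_by: ['alertname']   # no 'region' label here, so regions group together
    receivers:
      - name: ops-team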
However, I'm worried about being more exposed to network partitions with this solution. With far-away Alertmanagers, there is a higher probability that one node becomes totally isolated, stops deduplicating, and sends duplicate alerts.
With 2 Alertmanagers in the same region, such partitions are still possible (misconfiguration, firewall, local issue), but much less likely.
As I understand it, the memberlist library handles network partitions very well, but I guess only when a node is partially isolated?
Does someone have more information to share about the behaviour of that gossip protocol during network partitions? There is low-level documentation in the source code of the library, but I'm struggling to find higher-level documentation for it.
For now, I know there are awareness/suspicion mechanisms, but I'm not sure exactly how they work.
If Alertmanager (through the memberlist library) becomes highly aware that it is not working properly, will it decide to stop sending alerts?

Brian Candler

Dec 2, 2021, 5:56:54 AM
to Prometheus Users
On Thursday, 2 December 2021 at 10:35:55 UTC Dj Fox wrote:
@Brian
Do you confirm that one of the main reasons an Alertmanager cluster needs to handle the same set of alerts as the others (and hence be plugged into all the same Prometheus servers) is the way deduplication works?

Kind of, although I think it's more for redundancy.  If Prometheus only sent to one alertmanager and that one failed, then you wouldn't get your alert.  So it sends to all of them, and then those which are up should have a consistent view quickly.
 
- It would be cool to be able to tell each Alertmanager: this is my "alert-family". Deduplication would then only occur among the members sharing the same "alert-family" value. That way, we would no longer be forced to connect 20 Prometheus servers to every Alertmanager, and all of them could still gossip the valuable silences.

That's basically having multiple alertmanager clusters.
 
- And/or the ability to give amtool a list of clusters so that it can handle several clusters as well.

Well, it is scriptable, and tools like karma will do this for you.
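For example, something along these lines creates the same silence in each cluster (URLs are placeholders; one member per cluster is enough, since silences gossip within a cluster):

    for am in http://am-1.region-a.example:9093 http://am-1.region-b.example:9093; do
      amtool silence add --alertmanager.url="$am" \
        --comment="planned maintenance" --duration=2h \
        alertname=HighLatency
    done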

Instead you could follow MR's suggested approach and build a single global alertmanager cluster, say three nodes in total, and point all the other prometheus servers at those three.  I think that's reasonable: if a region becomes so isolated that it can't talk to that central alertmanager cluster, then it's probably so isolated that it couldn't send out an alert via E-mail either.
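i.e. something like this in every prometheus.yml, whatever the region (placeholder hostnames):

    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - alertmanager-1.global.example:9093
                - alertmanager-2.global.example:9093
                - alertmanager-3.global.example:9093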