Hi Prometheus folks,
I have a question about Alertmanager.
There is a one-year-old issue about several HA Alertmanager clusters merging into one big cluster over time:
https://github.com/prometheus/alertmanager/issues/2250 I managed to reproduce it on my local k8s kind cluster. It seems there is a small discrepancy between the list of peers reported by the gossip library and the list of peers from the Alertmanager config file.
We can work around it with a k8s network policy. However, a more proper fix would be on the Alertmanager side: keep an eye on the number of peers and compare it with the desired number. If the state is unexpected, clear the peer table, run DNS resolution once more, and form a new peer table. Maybe there is a better solution. What do you think?
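
To make the idea concrete, here is a rough sketch of the check in Go. It does not use any real Alertmanager or memberlist API: `resolveFunc`, `unexpectedPeers`, `reconcile`, and the peer addresses are all hypothetical names made up for illustration, and it assumes peer addresses are unique strings.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveFunc is a hypothetical stand-in for re-running DNS resolution
// of the peers listed in the config file.
type resolveFunc func() ([]string, error)

// unexpectedPeers returns the members reported by gossip that are not
// part of the desired (freshly resolved) peer set.
func unexpectedPeers(gossip, desired []string) []string {
	want := make(map[string]struct{}, len(desired))
	for _, p := range desired {
		want[p] = struct{}{}
	}
	var extra []string
	for _, p := range gossip {
		if _, ok := want[p]; !ok {
			extra = append(extra, p)
		}
	}
	sort.Strings(extra)
	return extra
}

// reconcile compares the gossip view with the desired peers. If the count
// differs or an unknown peer is present, it drops the gossip view and
// returns a new peer table built only from the resolved addresses.
func reconcile(gossip []string, resolve resolveFunc) ([]string, bool, error) {
	desired, err := resolve()
	if err != nil {
		// Keep the current table if DNS fails; do not reset blindly.
		return gossip, false, err
	}
	if len(gossip) == len(desired) && len(unexpectedPeers(gossip, desired)) == 0 {
		return gossip, false, nil
	}
	fresh := append([]string(nil), desired...)
	sort.Strings(fresh)
	return fresh, true, nil
}

func main() {
	// Hypothetical addresses: three configured peers plus one peer that
	// leaked in from another cluster.
	resolve := func() ([]string, error) {
		return []string{"am-0:9094", "am-1:9094", "am-2:9094"}, nil
	}
	gossip := []string{"am-0:9094", "am-1:9094", "am-2:9094", "other-am-0:9094"}

	peers, reset, err := reconcile(gossip, resolve)
	fmt.Println(peers, reset, err)
}
```

In a real patch this check would presumably run periodically next to the existing peer refresh, and the "reset" step would have to go through whatever the gossip library actually offers for dropping members, which I have not verified.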
I could probably even open a PR if we can agree on a way to fix it and someone can support me with review :)