Hi,
We have an environment with poor connectivity between the Prometheus server and the Alertmanager (i.e. the Prometheus server is in a remote location).
When an alert fires from the rules installed on the local Prometheus server (as best practices advise), we receive a notification and all is well. However, when we lose connectivity for more than 5 minutes (the time configured in Prometheus), the alert is marked "resolved", only to begin firing again when connectivity returns.
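To make the timing concrete, here is a rough Python model of the behaviour we are seeing. It is only a sketch of our understanding, not Alertmanager's actual logic: the names `ReceivedAlert`, `state` and `RESOLVE_TIMEOUT`, the alert name, and the timestamps are all made up for illustration.

```python
from dataclasses import dataclass

# Assumption: the 5-minute window described above, in seconds.
RESOLVE_TIMEOUT = 5 * 60


@dataclass
class ReceivedAlert:
    """Hypothetical view of an alert as seen on the receiving side."""
    name: str
    last_seen: float  # time (s) of the last update received from Prometheus


def state(alert: ReceivedAlert, now: float, timeout: float = RESOLVE_TIMEOUT) -> str:
    """Treat the alert as firing while updates are recent, resolved once they stop."""
    return "firing" if now - alert.last_seen <= timeout else "resolved"


alert = ReceivedAlert("DiskFull", last_seen=0.0)
timeline = []

# The link is up until t=60s, down until t=600s, then up again.
# The underlying condition never clears on the Prometheus side.
for now in (0, 60, 300, 420, 600):
    link_up = now <= 60 or now >= 600
    if link_up:
        alert.last_seen = now  # an update gets through only while the link is up
    timeline.append((now, state(alert, now)))

for entry in timeline:
    print(entry)
```

In this model the alert flips to "resolved" at t=420s, purely because no update arrived for more than 5 minutes, and then returns to "firing" at t=600s once the link is back.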
Does anyone know of a way to resolve this? It sounds like there should perhaps be an option for an "explicit" resolution from Prometheus to Alertmanager, i.e. a request from Prometheus to Alertmanager that specifically triggers a resolution, with the alert otherwise assumed to be still firing. Or should this be covered by an inhibition rule based on whether the Prometheus server is responding?
Thanks in advance :)