I recently set up a single Alertmanager instance for multiple Prometheus instances in different datacenters. One datacenter has both an Alertmanager instance and a Prometheus instance (DC1); the other datacenters are running only a Prometheus instance (DC2, DC3, DC4). Each datacenter runs the same exporters under different job names and has the same set of alert rules. Labels are different for the metrics in each of the DCs.

A space metric hit the alert criteria in DC1 and an alert was sent out. Prometheus in DCs2-4 also detected DC1's space problem and fired an alert with the metric labels from DC1 but with DC2-4's job names.

Taking it a step further, the alert condition in DC1 was fixed but the alert is still firing in DCs2-4. Looking at this in more detail, I found the metrics for DC1 are now in DCs2-4. DC1 does not have DC2-4's metrics. This all started when I pointed DC2-4 to the DC1 Alertmanager.

Why would the Prometheus metrics from the DC running Alertmanager show up in the other DCs? How are the metrics from DC1 getting into the other DCs' databases? The other DCs do not scrape targets in DC1.

Prometheus is running version 2.17.0-rc.4, build date 20200321-19:08:16.
Alertmanager is running version 0.20.0, build date 20191211-14:13:14.

Alertmanager config section in prometheus.yml:

# Alertmanager configuration
alerting:
  alertmanagers:
    - scheme: http
      path_prefix: /alertmanager1/
      static_configs:
        - targets: ['<alertmanager server>:9093']
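
For context, here is a rough sketch of how each DC's prometheus.yml is laid out; the job name, rule file name, and targets below are placeholders for illustration, not the real values:

# Rough per-DC layout (DC2 shown); job name, rule file, and targets are placeholders
global:
  scrape_interval: 45s          # most jobs; a few jobs override this

alerting:
  alertmanagers:
    - scheme: http
      path_prefix: /alertmanager1/
      static_configs:
        - targets: ['<alertmanager server>:9093']   # the single Alertmanager in DC1

rule_files:
  - 'alert_rules.yml'           # the same rule set is deployed in every DC

scrape_configs:
  - job_name: 'node_dc2'        # same exporter as the other DCs, DC-specific job name
    static_configs:
      - targets: ['<dc2 node exporter host>:9100']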
Thanks,
Kevin
Stuart,
I found the metrics for DC1 are now in DCs2-4. DC1 does not have DC2-4's metrics. DCs2-4 only have metrics for their DC and DC1.
Scrape interval varies by job. Most are 45 seconds, some are 3 minutes and some are 1 minute.
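
To illustrate, the per-job intervals are set roughly like this; the job names and targets here are just placeholders, not the actual config:

scrape_configs:
  - job_name: 'node_dc2'            # hypothetical job; uses the common 45s interval
    scrape_interval: 45s
    static_configs:
      - targets: ['<node exporter host>:9100']
  - job_name: 'slow_exporter_dc2'   # hypothetical job; one of the 3-minute jobs
    scrape_interval: 3m
    static_configs:
      - targets: ['<slow exporter host>:<port>']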
What is the actual configuration in your Prometheus?
-- Stuart Clark