Re: [prometheus-users] Single Alertmanager for Multiple Prometheus Instances - Metrics Appearing In Other Instances


Stuart Clark

Jun 8, 2020, 12:50:15 PM
to Kevin Kruppa, Prometheus Users
On 08/06/2020 17:31, Kevin Kruppa wrote:
> I recently set up a single Alertmanager instance for multiple
> Prometheus instances in different datacenters.
>
> One datacenter has both an Alertmanager instance and a Prometheus
> instance (DC1). The other datacenters are running only a Prometheus
> instance (DC2, DC3, DC4).
>
> Each datacenter runs the same exporters under different job names and
> has the same set of alert rules.  Labels are different for the metrics
> in each of the DCs.
>
> A space metric hit the alert criteria in DC1 and an alert was sent
> out. Prometheus in DC2-4 also detected DC1's space problem and fired
> an alert with the metric labels from DC1 but with DC2-4's job name.


What do you mean by this? "Prometheus in DC2-4 also detected DC1's
space problem"


--
Stuart Clark

Kevin Kruppa

Jun 8, 2020, 1:08:50 PM
to Prometheus Users
Stuart -

Since the DC1 metrics are now in DC2-4's Prometheus databases and each DC has the same exporters and alert rules, the Prometheus instances in DC2-4 evaluated the DC1 metrics (which should not be there), found the space condition in DC1, and fired an alert on the DC1 metrics with their own job names, essentially alerting three extra times, each with a different job name.

DC1 metrics should not be in DC2-4.
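
(To illustrate the mechanics: with identical rule files in every DC, a rule shaped like the sketch below carries no DC scoping at all, so it matches any series in the local TSDB, no matter which DC the series was originally scraped in. The rule name and threshold here are hypothetical, not our actual rules:)

groups:
- name: disk-space
  rules:
  - alert: FilesystemSpaceLow
    # This selector matches every node_filesystem_* series in the local
    # TSDB, including any that arrived from another DC.
    expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.10
    for: 10m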



On Monday, June 8, 2020 at 11:31:17 AM UTC-5, Kevin Kruppa wrote:
I recently set up a single Alertmanager instance for multiple Prometheus instances in different datacenters.  

One datacenter has both an Alertmanager instance and a Prometheus instance (DC1). The other datacenters are running only a Prometheus instance (DC2, DC3, DC4).

Each datacenter runs the same exporters under different job names and has the same set of alert rules.  Labels are different for the metrics in each of the DCs.

A space metric hit the alert criteria in DC1 and an alert was sent out.  Prometheus in DC2-4 also detected DC1's space problem and fired an alert with the metric labels from DC1 but with DC2-4's job name.

Taking it a step further, the alert condition in DC1 was fixed but the alert is still firing in DC2-4.

Looking at this in more detail, I found the metrics for DC1 are now in DC2-4.  DC1 does not have DC2-4's metrics.

This all started when I pointed DC2-4 to DC1 Alertmanager.

Why would the Prometheus metrics from the DC running Alertmanager show up in the other DCs?  

How are the metrics from DC1 getting to the other DCs' databases?  The other DCs do not scrape targets in DC1.
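
(As far as I know, the only built-in mechanisms that copy series between Prometheus servers are federation scrapes and remote write/read. A federation job shaped like the hypothetical sketch below in DC2-4's config would produce exactly this symptom, but nothing like it is configured there:)

scrape_configs:
- job_name: 'federate-dc1'   # hypothetical; DC2-4 have no such job
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]': ['{job=~".+"}']
  static_configs:
  - targets: ['<dc1 prometheus>:9090']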

Prometheus is running version 2.17.0-rc.4, build date 20200321-19:08:16

Alertmanager is running version 0.20.0, build date 20191211-14:13:14

Alertmanager config section in prometheus.yml:

# Alertmanager configuration
alerting:
  alertmanagers:
  - scheme: http
    path_prefix: /alertmanager1/
    static_configs:
    - targets: ['<alertmanager server>:9093']
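
(A side note on the shared-Alertmanager setup, separate from the duplication problem: the usual way to keep alerts from several Prometheus servers distinguishable at one Alertmanager is a per-server external_labels entry along these lines; the label name "dc" is only an example:)

global:
  external_labels:
    dc: dc1   # dc2, dc3, dc4 on the other servers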


Thanks,

Kevin




Stuart Clark

Jun 8, 2020, 1:34:29 PM
to Kevin Kruppa, Prometheus Users
On 08/06/2020 18:08, Kevin Kruppa wrote:
> Stuart -
>
> Since the DC1 metrics are now in DC2-4's Prometheus databases and each
> DC has the same exporters and alert rules, the Prometheus instances in
> DC2-4 evaluated the DC1 metrics (which should not be there), found the
> space condition in DC1, and fired an alert on the DC1 metrics with
> their own job names, essentially alerting three extra times, each with
> a different job name.
>
> DC1 metrics should not be in DC2-4.
>

So if you do a query in DC2-4 you can see the metric from DC1? What
about seeing other metrics (e.g. DC2 in DC1 or DC4 in DC2)?

What is your scrape configuration?

--
Stuart Clark


Stuart Clark

Jun 8, 2020, 1:54:31 PM
to Kevin Kruppa, Prometheus Users
On 08/06/2020 18:53, Kevin Kruppa wrote:
> Stuart
>
> I found the metrics for DC1 are now in DC2-4. DC1 does not have
> DC2-4's metrics. DC2-4 only have metrics for their own DC and DC1.
>
> Scrape interval varies by job. Most are 45 seconds, some are 3 minutes,
> and some are 1 minute.


What is the actual configuration in your Prometheus?

-- 
Stuart Clark

Kevin Kruppa

Jun 8, 2020, 1:56:46 PM
to Prometheus Users
Stuart 

I found the metrics for DC1 are now in DC2-4.  DC1 does not have DC2-4's metrics.  DC2-4 only have metrics for their own DC and DC1.

Scrape interval varies by job.  Most are 45 seconds, some are 3 minutes, and some are 1 minute.
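
(For reference, the per-job intervals are set roughly like this; the job names and targets below are placeholders, not the real config:)

scrape_configs:
- job_name: '<45s job>'
  scrape_interval: 45s
  static_configs:
  - targets: ['<local exporter>:9100']
- job_name: '<3m job>'
  scrape_interval: 3m
  static_configs:
  - targets: ['<local exporter>:9273']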



Kevin Kruppa

Jun 8, 2020, 4:31:38 PM
to Prometheus Users
Problem resolved