I have multiple datacenters, each running its own Prometheus instance. During maintenance, I'd like to inhibit all alerts in datacenter A, and only those alerts in datacenter B that are related to A being down. One such scenario is avoiding alerts for broken DB replication between the datacenters.
I am thinking about creating these rules in each DC:
    - alert: MaintenanceMode
      expr: maintenance_mode == 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: This is a maintenance alert for {{ $labels.instance }}.
    - alert: SatelliteMaintenanceMode
      expr: maintenance_mode == 2
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: This is a satellite maintenance alert for {{ $labels.instance }}.
Then, in the Alertmanager inhibition section, I would try:
    - source_match:
        alertname: MaintenanceMode
      target_match_re:
        severity: 'warning|critical'
    - source_match:
        alertname: SatelliteMaintenanceMode
      target_match_re:
        alertname: 'MySQLReplicationNotRunning'
        channel_name: .*
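One thing to keep in mind: if both Prometheus instances send to a shared Alertmanager, an inhibit rule without `equal` applies to every matching target alert, whichever datacenter it came from. A minimal sketch of scoping the rules per datacenter follows. It assumes every alert carries a `datacenter` label, which is a hypothetical name that is not in the rules above (it could be set through `external_labels` in each Prometheus).

```yaml
inhibit_rules:
  # Sketch only: `datacenter` is an assumed label name.
  # `equal` requires the source and target alerts to have the same
  # value for each listed label before the target is inhibited.
  - source_match:
      alertname: MaintenanceMode
    target_match_re:
      severity: 'warning|critical'
    equal: ['datacenter']
  - source_match:
      alertname: SatelliteMaintenanceMode
    target_match_re:
      alertname: 'MySQLReplicationNotRunning'
    equal: ['datacenter']
```

With this shape, a `MaintenanceMode` alert from one datacenter would only silence alerts labelled with that same datacenter, and likewise for the satellite rule.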
The metric values would be set mutually exclusively, so that the MaintenanceMode alert fires in only one datacenter at a time. My concern is whether both rules should use the same metric to trigger the inhibition, or whether I should create a separate metric for each.
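For comparison, the separate-metric variant would look roughly like the sketch below. `satellite_maintenance_mode` is a placeholder name used for illustration, not an existing metric.

```yaml
# Sketch only: one metric per mode instead of one metric with values 1 and 2.
- alert: MaintenanceMode
  expr: maintenance_mode == 1
  for: 1m
  labels:
    severity: warning
- alert: SatelliteMaintenanceMode
  # Hypothetical second metric, set to 1 in the satellite datacenter.
  expr: satellite_maintenance_mode == 1
  for: 1m
  labels:
    severity: warning
```

The inhibit rules would stay the same in either case, since they match on `alertname` rather than on the metric behind it.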