Hi,
We are using the blackbox exporter at a remote location to monitor gateway routers, hypervisors and virtual machines (router -> hypervisor -> virtual machines). We would like alerts to be suppressed based on this dependency chain, as in the examples below.
Example 1:
If a gateway router is down and its alert is firing, Alertmanager should stop alerting on the hypervisor hosts and servers behind it.
Example 2:
If a hypervisor is down, Alertmanager should not alert on its virtual machines.
In Prometheus, we group the routers into one job, the hypervisors into another, and the virtual machines into a third.
Example:
job_name: 'blackbox_icmp-routers'
job_name: 'blackbox_icmp-hypervisors'
job_name: 'blackbox_icmp-virtualmachines'
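A job of this kind usually follows the standard blackbox exporter scrape pattern. The following is only a minimal sketch under assumptions: the target address, the `icmp` module name, the exporter address `blackbox:9115` and the `site` label are all placeholders and not taken from our real config. The `site` label is a hypothetical example of a shared label that could tie a router to the machines behind it.

```yaml
scrape_configs:
  - job_name: 'blackbox_icmp-routers'
    metrics_path: /probe
    params:
      module: [icmp]                 # assumed module name in blackbox.yml
    static_configs:
      - targets: ['192.0.2.1']       # placeholder router address
        labels:
          site: 'remote-a'           # hypothetical label, not in our current setup
    relabel_configs:
      # Pass the target address to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox exporter itself (placeholder address)
      - target_label: __address__
        replacement: 'blackbox:9115'
```

The hypervisor and virtual machine jobs would look the same apart from the job name and targets.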
Alerting rules are defined per job:
groups:
  - name: RouterDown
    rules:
      - alert: R-InstanceDown
        expr: probe_success{job="blackbox_icmp-routers"} == 0
        for: 1m
  - name: HypervisorDown
    rules:
      - alert: H-InstanceDown
        expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
        for: 1m
  - name: VirtualMachinesDown
    rules:
      - alert: V-InstanceDown
        expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
        for: 1m
The Alertmanager config is as follows:
route:
  group_by: ['alertname']
  receiver: ms-teams
  repeat_interval: 5m
receivers:
  - name: ms-teams
    webhook_configs:
      - url: 'http://monitoring:2000/alertmanager'
        send_resolved: false
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
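To make the goal concrete, the behaviour in the two examples could presumably be expressed with inhibit rules keyed on the alert names rather than on severity. This is an untested sketch, not something we have running: the `site` and `hypervisor` labels are hypothetical and would first have to be attached to the targets, since they are not in our current config.

```yaml
inhibit_rules:
  # Example 1: a router alert mutes hypervisor and VM alerts
  # that carry the same (hypothetical) site label.
  - source_match:
      alertname: 'R-InstanceDown'
    target_match_re:
      alertname: 'H-InstanceDown|V-InstanceDown'
    equal: ['site']
  # Example 2: a hypervisor alert mutes VM alerts that carry
  # the same (hypothetical) hypervisor label.
  - source_match:
      alertname: 'H-InstanceDown'
    target_match:
      alertname: 'V-InstanceDown'
    equal: ['hypervisor']
```

As far as we understand, if the labels listed in `equal` are missing on both the source and the target alert, they count as equal, so without such labels a single router going down would silence the hypervisor and VM alerts everywhere.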
Any help is much appreciated.
Thanks
Sandosh