Query on Inhibit Rules

Sandosh Kumar P

Aug 24, 2022, 9:04:52 AM
to Prometheus Users

Hi,


We are running the blackbox exporter at a remote location to monitor gateway routers, hypervisors and virtual machines, which depend on each other in that order (router -> hypervisor -> virtual machines). We are looking for behaviour like the following.

Example 1:

If a gateway router is down and its alert is firing, Alertmanager should stop alerting on the hypervisor hosts and servers behind it.

Example 2:

If a hypervisor is down, Alertmanager should not alert on the virtual machines running on it.


In Prometheus, routers, hypervisors and virtual machines are each scraped as a separate job.

Example:

job_name: 'blackbox_icmp-routers'
job_name: 'blackbox_icmp-hypervisors'
job_name: 'blackbox_icmp-virtualmachines'


The Prometheus alerting rules are defined per job:

- name: RouterDown
  rules:
  - alert: R-InstanceDown
    expr: probe_success{job="blackbox_icmp-routers"} == 0
    for: 1m

- name: HypervisorDown
  rules:
  - alert: H-InstanceDown
    expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
    for: 1m

- name: VirtualMachinesDown
  rules:
  - alert: V-InstanceDown
    expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
    for: 1m


Our Alertmanager config is as follows:

route:
  group_by: ['alertname']
  receiver: ms-teams
  repeat_interval: 5m

receivers:
- name: ms-teams
  webhook_configs:
    - url: 'http://monitoring:2000/alertmanager'
      send_resolved: false

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
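
For reference, what we are after would look roughly like the untested sketch below. It assumes two labels that do not exist in our setup yet: a hypothetical `site` label shared by a router and everything behind it, and a hypothetical `hypervisor` label shared by a hypervisor and its virtual machines. The existing rule above matches on `severity`, which none of the alert rules shown here set.

```yaml
inhibit_rules:
  # Router down: mute hypervisor and VM alerts behind the same router.
  - source_match:
      alertname: 'R-InstanceDown'
    target_match_re:
      alertname: 'H-InstanceDown|V-InstanceDown'
    equal: ['site']        # hypothetical label, not in our current config

  # Hypervisor down: mute alerts for the VMs running on that hypervisor.
  - source_match:
      alertname: 'H-InstanceDown'
    target_match:
      alertname: 'V-InstanceDown'
    equal: ['hypervisor']  # hypothetical label, not in our current config
```

Both labels would have to be attached to the targets in the Prometheus scrape config or through relabelling, and we are not sure whether this is the recommended approach.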


Any help is much appreciated.



Thanks

Sandosh
