Repeat interval and absent hours on instance.


Sebastian Glock

Feb 25, 2021, 8:27:18 AM
to Prometheus Users
Hi, I have a problem with repeat_interval. I want to set a different repeat interval for each route.

alertmanager.yml

```
global:
route:
  receiver: alert-emailer-default-30m

  group_by: ['alertname', 'priority', 'instance']
  group_wait: 1m
  group_interval: 1m
  repeat_interval: 30s

  routes:
  - receiver: alert-emailer-default-30m
    group_by: ['alertname', 'priority', 'instance']
    group_wait: 1m
    group_interval: 1m
    repeat_interval: 5m

    match:
      severity: "[Disaster] {{ $labels.instance }}"

  - receiver: alert-emailer-default-1h
    group_by: ['alertname', 'priority', 'instance']
    group_wait: 1m
    group_interval: 1m
    repeat_interval: 1m

    match:
      severity: "[High] {{ $labels.instance }}"


  - receiver: alert-emailer-default-3h
    group_by: ['alertname', 'priority', 'instance']
    group_wait: 1m
    group_interval: 1m
    repeat_interval: 3m
   
    match:
      severity: "[Average] {{ $labels.instance }}"

```



alert.rules.yml:

```
groups:
- name: alert.rules
  rules:
#Windows
#CPU

  - alert: CPU load is more than 70%!
    expr: 100 - (1 -avg(irate(windows_cpu_time_total{mode="user"}[10m])) by (instance)) * 100 >= 40
    for: 30s
    labels:
      severity: "[Average] {{ $labels.instance }}"
    annotations:
      summary: "CPU load is more than 70%!"
      description: "{{ humanize $value }}%"

  - alert: CPU load is more than 80%!
    expr: 100 - (1 -avg(irate(windows_cpu_time_total{mode="user"}[10m])) by (instance)) * 100 >= 40
# AND ON() absent(hour() >= 0 < 18{instance="10.16.155.150"})
    for: 10s
    labels:
      severity: "[High] {{ $labels.instance }}"
    annotations:
      summary: "CPU load is more than 80%!"
      description: "{{ humanize $value }}%"

  - alert: CPU load is more than 90%!
    expr: 100 - (1 -avg(irate(windows_cpu_time_total{mode="user"}[10m])) by (instance)) * 100 >= 40
    for: 50s
    labels:
      severity: "[Disaster] {{ $labels.instance }}"
    annotations:
      summary: "CPU load is more than 90%!"
      description: "{{ humanize $value }}%"

```

However, the main route's repeat_interval is still the one being applied: I get a reminder every 30 seconds, which is the value set on the main route. How can I make the e-mails arrive at the different intervals defined in the child routes?


My second question is about absent hours:

```
100 - (1 -avg(irate(windows_cpu_time_total{mode="user"}[10m])) by (instance)) * 100 AND ON() absent(hour() >0 <12) AND ON() absent(nonexistent{instance="10.16.22.22"})

```
How can I specify absent hours (no alerts) for given instances between 8 pm and 6 am?
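One possible way to express this is to drop the alert with `unless on(instance)` whenever a quiet-hours condition holds for the chosen instance. The sketch below reuses the CPU expression and the instance value from this post; the alert name is hypothetical, and it assumes that the standard `up` series for that target carries the same `instance` label value. Note that `hour()` returns the hour in UTC, so the bounds may need shifting for a local time zone.

```yaml
groups:
- name: alert.rules
  rules:
  # Hypothetical rule name, for illustration only.
  - alert: CpuLoadHighOutsideQuietHours
    # Left side: the CPU expression from the post.
    # Right side: selects the given instance only while the UTC hour
    # is >= 20 or < 6; "unless on(instance)" then removes that instance
    # from the result during those hours. Other instances are unaffected.
    expr: |
      (
        100 - (1 - avg(irate(windows_cpu_time_total{mode="user"}[10m])) by (instance)) * 100 >= 40
      )
      unless on(instance)
      (
        up{instance="10.16.22.22"}
        and on() (hour() >= 20 or hour() < 6)
      )
    for: 30s
    labels:
      severity: "Average"
```

To cover several instances, the `instance` matcher on `up` could be turned into a regex matcher such as `instance=~"10.16.22.22|10.16.155.150"`.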

Thanks for all your help!


Sebastian Glock

Feb 25, 2021, 1:16:22 PM
to Prometheus Users

OK, I have solved the first problem:

```
severity: "[Average] {{ $labels.instance }}" 

``` 
The severity value can't contain {{ $labels.instance }}. Without it, everything works great.
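The likely reason is that Alertmanager does not expand templates in route matchers: `match` compares the label against the literal text `[Average] {{ $labels.instance }}`, while Prometheus has already expanded the rule label to something like `[Average] 10.16.155.150`. No child route matches, so every alert falls through to the main route and its 30s repeat_interval. A minimal sketch of the working layout, assuming the rules set plain label values such as `severity: "Average"` and using the receiver names from the first post:

```yaml
# alertmanager.yml (fragment)
route:
  receiver: alert-emailer-default-30m
  group_by: ['alertname', 'priority', 'instance']
  group_wait: 1m
  group_interval: 1m
  repeat_interval: 30s   # used only when no child route matches

  routes:
  - receiver: alert-emailer-default-30m
    repeat_interval: 5m
    match:
      severity: "Disaster"   # plain string, no template

  - receiver: alert-emailer-default-1h
    repeat_interval: 1m
    match:
      severity: "High"

  - receiver: alert-emailer-default-3h
    repeat_interval: 3m
    match:
      severity: "Average"
```

The instance is still available to the notification, because `instance` is already a label on the alert and is part of `group_by`. Child routes inherit `group_by`, `group_wait` and `group_interval` from the main route, so they do not need to be repeated.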