How alertmanager works with group_wait and group_interval

californi...@gmail.com

Feb 26, 2018, 6:59:10 AM
to Prometheus Users
Hi,

I would like to clarify how Alertmanager works with group_wait and group_interval.

The very first alert (alert A), which creates a new alert group, waits for group_wait before its notification is sent. After the first notification has been sent, subsequent alerts (alert B) wait for group_interval. Suppose all firing alerts in the group (alerts A and B) are resolved, and then a new alert (alert C) with the same labels as A and B is created. Which timer applies to alert C? I expected group_wait, because all firing alerts had been resolved, but it seems that group_interval is applied to alert C. When will group_wait be used again?

---------------------

What happens:
I set group_wait and group_interval as shown below. I created an alert that matches both routes, for account1 and account2. Then I resolved the alert and recreated the same alert. I expected group_wait (1s) to apply to the recreated alert. Account1 receives the notification within 1s as expected, but account2 unexpectedly receives it 35-50s later.

Alertmanager version: 0.14.0

Alertmanager configuration:
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 1s
  group_interval: 60s
  repeat_interval: 30d
  receiver: account1@localhost
  routes:
  - match_re:
      severity: ^.*$
    continue: true
    group_wait: 1s
    group_interval: 1s
    receiver: account1@localhost
  - match:
      severity: ALERT
    continue: false
    receiver: account2@localhost

inhibit_rules:

receivers:
- name: 'account1@localhost'
  email_configs:
  - from: 'alertmanager@localhost'
    to: account1@localhost
    smarthost: localhost:25
    require_tls: false
    send_resolved: true
    html: '{{ template "template1.html" . }}'
- name: account2@localhost
  email_configs:
  - from: alertmanager@localhost
    to: account2@localhost
    smarthost: localhost:25
    require_tls: false
    send_resolved: true
    html: '{{ template "template1.html" . }}'

- Alertmanager and the postfix instance that receives its notification emails run on the same host.
- To measure the time lag, I compared the StartsAt of an alert with the Date field in the mail spool that postfix uses.
- The unexpected interval of 35-45s gets longer, to around 4m40s, when I set group_interval: 5m at the top level of the route configuration. This suggests that group_interval is applied instead of group_wait.
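
One detail of the configuration above may help explain the difference between the two receivers: a child route inherits group_wait and group_interval from its parent unless it sets them itself, and the account2 route sets neither. The sketch below is a simplified Python model of that inheritance, not Alertmanager code. The dictionary layout and the function name effective_timers are made up for illustration, and durations are written as plain seconds.

```python
# Simplified model of route option inheritance (illustrative only,
# not Alertmanager's implementation). Durations are in seconds.
ROOT = {
    "group_wait": 1,
    "group_interval": 60,
    "receiver": "account1@localhost",
    "routes": [
        # First child route overrides both timers.
        {"receiver": "account1@localhost", "group_wait": 1, "group_interval": 1},
        # Second child route sets no timers, so it falls back to the parent.
        {"receiver": "account2@localhost"},
    ],
}

INHERITED_KEYS = ("group_wait", "group_interval")


def effective_timers(root):
    """Return {receiver: timers} for each child route, filling gaps from the parent."""
    result = {}
    for child in root["routes"]:
        result[child["receiver"]] = {
            key: child.get(key, root[key]) for key in INHERITED_KEYS
        }
    return result


timers = effective_timers(ROOT)
for receiver, values in timers.items():
    print(receiver, values)
```

Under this model the account2 route ends up with a group_interval of 60s, which would be consistent with the longer delay reported for that receiver.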

anuthan...@gmail.com

Feb 26, 2018, 9:45:16 AM
to Prometheus Users
I want to monitor Python services and APIs. Which exporter is best for monitoring them with Prometheus, and what configuration does it need?

ravikhand...@gmail.com

May 27, 2019, 10:36:43 PM
to Prometheus Users
Group in wait

benny...@gmail.com

May 31, 2019, 2:03:28 PM
to Prometheus Users
Exactly. Since alerts A and B are resolved, group_interval is used for alert C. group_wait is there to collect similar alerts, most notably for inhibition.
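
To make the reply above concrete, here is a toy Python model of a single alert group's flush timer. It assumes the group object still exists when alert C arrives (which is what the observed delay suggests), so C simply waits for the next group_interval tick. The class AlertGroup, its method names, and the timestamps are hypothetical and are not Alertmanager's actual implementation.

```python
class AlertGroup:
    """Toy model of one alert group's flush timer; times are seconds."""

    def __init__(self, created_at, group_wait, group_interval):
        self.group_interval = group_interval
        # The first flush after the group is created uses group_wait.
        self.next_flush = created_at + group_wait

    def notify_time(self, alert_time):
        """Return the time of the first flush at or after alert_time."""
        while self.next_flush < alert_time:
            # Every later flush is scheduled group_interval after the previous one.
            self.next_flush += self.group_interval
        return self.next_flush


# Assumed scenario: alert A creates the group at t=0, alert C arrives at t=75.
existing = AlertGroup(created_at=0, group_wait=1, group_interval=60)
first_notification = existing.notify_time(0)       # alert A waits group_wait
delay_existing = existing.notify_time(75) - 75     # alert C joins the old group

# For contrast: if the old group were gone, C would create a fresh group.
fresh = AlertGroup(created_at=75, group_wait=1, group_interval=60)
delay_fresh = fresh.notify_time(75) - 75

print(first_notification, delay_existing, delay_fresh)
```

In this model, alert C is delayed by up to one group_interval while the old group is still around, and only by group_wait once a new group has to be created.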
