group_wait parameter not working

Valentin Bogdan

Nov 22, 2022, 10:33:24 AM
to Prometheus Users
I'm trying to use the group_wait parameter so that Alertmanager waits for all the alerts received from Prometheus, groups them, and sends a single notification.

I have the following configuration:

route:
  receiver: default-receiver
  group_by:
  - alertname
  - environment
  continue: false
  group_wait: 5m
  group_interval: 20m
  repeat_interval: 1d


receivers:
- name: default-receiver
  email_configs:
  - send_resolved: true
    to: mye...@exmaple.com
    from: alertm...@example.com
    hello: localhost
    smarthost: smptserver:25



Although group_wait is set to 5 minutes, Alertmanager flushes the alerts and sends a notification to the configured receiver as soon as it receives them from Prometheus. I would expect Alertmanager to hold the notification and send it only after 5 minutes (the value of group_wait). Debug log:

ts=2022-11-22T12:37:19.367Z caller=cluster.go:705 level=info component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000781422s
ts=2022-11-22T12:37:21.368Z caller=cluster.go:702 level=debug component=cluster msg="gossip looks settled" elapsed=4.001197371s
ts=2022-11-22T12:37:23.368Z caller=cluster.go:702 level=debug component=cluster msg="gossip looks settled" elapsed=6.001883916s
ts=2022-11-22T12:37:25.369Z caller=cluster.go:702 level=debug component=cluster msg="gossip looks settled" elapsed=8.00222292s
ts=2022-11-22T12:37:27.369Z caller=cluster.go:697 level=info component=cluster msg="gossip settled; proceeding" elapsed=10.002782746s
ts=2022-11-22T12:37:42.811Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=file_not_processed[c0e2772][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=file_not_processed[64605a5][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=file_not_processed[e70ae18][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=file_not_processed[7325965][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:517 level=debug component=dispatcher aggrGroup="{}:{alertname=\"file_not_processed\", environment=\"ACC\"}" msg=flushing alerts=[file_not_processed[c0e2772][active]]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:517 level=debug component=dispatcher aggrGroup="{}:{alertname=\"file_not_processed\", environment=\"DEV\"}" msg=flushing alerts="[file_not_processed[64605a5][active] file_not_processed[e70ae18][active] file_not_processed[7325965][active]]"
ts=2022-11-22T12:37:42.883Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:42.914Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.031Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=email[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.031Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=email[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.660Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=locked_oracle_accounts[bcc49ac][active]
ts=2022-11-22T12:37:43.660Z caller=dispatch.go:517 level=debug component=dispatcher aggrGroup="{}:{alertname=\"locked_oracle_accounts\", environment=\"DEV\"}" msg=flushing alerts=[locked_oracle_accounts[bcc49ac][active]]
ts=2022-11-22T12:37:43.704Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.840Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=email[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:58.355Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=sdl_critical_services_down[7b9c988][active]
ts=2022-11-22T12:37:58.355Z caller=dispatch.go:517 level=debug component=dispatcher aggrGroup="{}:{alertname=\"sdl_critical_services_down\", environment=\"TST\"}" msg=flushing alerts=[sdl_critical_services_down[7b9c988][active]]
ts=2022-11-22T12:37:58.398Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T12:37:58.416Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=sdl_critical_services_down[7b9c988][active]
ts=2022-11-22T12:37:58.494Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=email[0] msg="Notify success" attempts=1
ts=2022-11-22T12:38:02.724Z caller=dispatch.go:165 level=debug component=dispatcher msg="Received alert" alert=edl_instance_down[49003d1][active]
ts=2022-11-22T12:38:02.724Z caller=dispatch.go:517 level=debug component=dispatcher aggrGroup="{}:{alertname=\"edl_instance_down\", environment=\"ACC\"}" msg=flushing alerts=[edl_instance_down[49003d1][active]]
ts=2022-11-22T12:38:02.765Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T12:38:02.876Z caller=notify.go:743 level=debug component=dispatcher receiver=default-receiver integration=email[0] msg="Notify success" attempts=1


I expect Alertmanager to group the alerts from Prometheus and, after 5 minutes (the group_wait value), send a single notification that contains all the grouped alerts. In my case the group_wait parameter seems to be ignored: as soon as an alert is received from Prometheus, a notification is sent to the receiver. Because of this, Alertmanager has no time to group all the alerts of the same type (based on my group_by labels), and I get multiple notifications for the same alerts at the next interval (group_interval).
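
To make the gap concrete, here is a minimal Python sketch that compares the flush time I would expect with the one in the log above. It is only arithmetic on two timestamps copied from the log (the first "Received alert" line and the first "flushing" line for the ACC group); the helper name parse_ts is illustrative and nothing here comes from Alertmanager itself.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # Alertmanager log timestamps look like 2022-11-22T12:37:42.811Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


# group_wait as configured on the route above
GROUP_WAIT = timedelta(minutes=5)

# Timestamps copied from the log for file_not_processed[c0e2772]
first_received = parse_ts("2022-11-22T12:37:42.811Z")  # "Received alert"
flushed = parse_ts("2022-11-22T12:37:42.812Z")         # "flushing"

# What I would expect: no flush before first_received + group_wait
expected_flush = first_received + GROUP_WAIT
# What the log shows: time between receiving the alert and flushing it
observed_delay = flushed - first_received

print("expected flush no earlier than:", expected_flush.isoformat())
print("observed delay (seconds):", observed_delay.total_seconds())
print("flushed before group_wait elapsed:", observed_delay < GROUP_WAIT)
```

With these values the group is flushed about one millisecond after the alert arrives, instead of roughly five minutes later.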