Good morning!
I'm new to Prometheus and am trying to work out what is causing a delay in receiving resolved Slack alerts. I enabled debug logging and have tried changing resolve_timeout, yet there is still a delay beyond what the configuration specifies. I receive the initial InstanceDown Slack alert quickly, when expected, but you can see from the debug logs that Prometheus sends the resolved alert to Alertmanager promptly, while Alertmanager delays flushing it and sending it to Slack. I tried peering into the Alertmanager data, but couldn't tell whether Prometheus is sending EndsAt. resolve_timeout seems to have no effect: the resolved notification always goes out about 5m after the active alert is triggered.
######### alertmanager version #########
root@alertmanager1:/etc/alertmanager# alertmanager --version
alertmanager, version 0.20.0 (branch: HEAD, revision: f74be0400a6243d10bb53812d6fa408ad71ff32d)
build user: root@00c3106655f8
build date: 20191211-14:13:14
go version: go1.13.5
######### alertmanager.yml #########
global:
  resolve_timeout: 15s

templates:
- '/etc/alertmanager/*.tmpl'

route:
  repeat_interval: 1h
  receiver: critical

receivers:
- name: 'critical'
  slack_configs:
  - api_url: https://hooks.slack.com/services/XXXXXXX/XXXXXXXX/XXXXXXXX
    channel: '#alerts'
    send_resolved: true
    title: '{{ template "title" . }}'
    text: '{{ template "slack_message" . }}'
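For reference, my route block above does not set group_wait or group_interval, so (if I understand the docs correctly) the defaults apply. Written out explicitly, I believe the effective route is roughly this — the 30s and 5m values are my assumption of the documented defaults, not anything in my config:

```yaml
route:
  receiver: critical
  repeat_interval: 1h
  group_wait: 30s      # assumed default: wait before sending the first notification for a group
  group_interval: 5m   # assumed default: minimum gap between successive flushes of the same group
```

If that reading is right, group_interval (not resolve_timeout) would govern how soon a resolved notification can follow the initial one. To check whether Prometheus is actually setting EndsAt, I believe the alerts can be inspected with `curl -s http://localhost:9093/api/v2/alerts`, which should show an `endsAt` field per alert.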
######### prometheus alerts config #########
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable
  - alert: InstanceDown
    expr: up == 0
    labels:
      severity: page
    annotations:
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}} is down!"
######### alertmanager debug logs #########
Apr 16 13:13:06 alertmanager1-mon-prd-cle alertmanager[28061]: level=debug ts=2020-04-16T13:13:06.458Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=InstanceDown[bd6501b][active]
Apr 16 13:13:06 alertmanager1-mon-prd-cle alertmanager[28061]: level=debug ts=2020-04-16T13:13:06.458Z caller=dispatch.go:465 component=dispatcher aggrGroup={}:{} msg=flushing alerts=[InstanceDown[bd6501b][active]]
Apr 16 13:14:36 alertmanager1-mon-prd-cle alertmanager[28061]: level=debug ts=2020-04-16T13:14:36.456Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=InstanceDown[bd6501b][active]
Apr 16 13:15:36 alertmanager1-mon-prd-cle alertmanager[28061]: level=debug ts=2020-04-16T13:15:36.456Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=InstanceDown[bd6501b][resolved]
Apr 16 13:17:06 alertmanager1-mon-prd-cle alertmanager[28061]: level=debug ts=2020-04-16T13:17:06.455Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=InstanceDown[bd6501b][resolved]
Apr 16 13:18:06 alertmanager1-mon-prd-cle alertmanager[28061]: level=debug ts=2020-04-16T13:18:06.458Z caller=dispatch.go:465 component=dispatcher aggrGroup={}:{} msg=flushing alerts=[InstanceDown[bd6501b][resolved]]
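If I'm reading the timestamps right, the resolved flush at 13:18:06 lands exactly 5m after the initial flush at 13:13:06 — which matches a 5m default group_interval rather than my 15s resolve_timeout. A quick sanity check of that arithmetic, with the timestamps copied from the logs above:

```python
from datetime import datetime

# Flush timestamps taken from the dispatcher "flushing" log lines above
first_flush = datetime.fromisoformat("2020-04-16T13:13:06")
resolved_flush = datetime.fromisoformat("2020-04-16T13:18:06")

print(resolved_flush - first_flush)  # 0:05:00 -- exactly 5 minutes
```

So my question is essentially: is this delay coming from group_interval, and if so, is that the right knob to tune for faster resolved notifications?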