Hi,
Recently, I've been debugging an issue where an alert resolves in Alertmanager even though Prometheus shows it as firing.
So the cycle is firing -> resolved -> firing.
After going through some documents and blog posts, I found that Alertmanager resolves an alert if Prometheus doesn't resend it within "resolve_timeout".
However, if Prometheus sends the endsAt field to Alertmanager, that timestamp tells Alertmanager when it may mark the alert as resolved. This overrides the resolve_timeout setting in Alertmanager and produces the firing -> resolved -> firing behavior if Prometheus does not resend the alert before endsAt passes.
Is that understanding correct?
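To make the timing concrete, here is a small sketch of my understanding. The formula endsAt = evaluation time + 4 × max(resend_delay, evaluation_interval) is only my guess from the observed +4m gap, not something I have confirmed in the docs or code:

```python
from datetime import datetime, timedelta

# ASSUMPTION: the 4x factor and the max() are my guess, inferred from the
# logs below (endsAt is 4 minutes after the last received alert); they are
# not confirmed against the actual Prometheus implementation.
def ends_at(eval_time, resend_delay, evaluation_interval):
    return eval_time + 4 * max(resend_delay, evaluation_interval)

def is_resolved(now, last_ends_at):
    # Alertmanager treats the alert as resolved once endsAt has passed.
    return now > last_ends_at

last_eval = datetime(2021, 8, 29, 12, 36, 40)  # last "Received alert" below
e = ends_at(last_eval, timedelta(minutes=1), timedelta(minutes=1))
print(e)  # 2021-08-29 12:40:40 -- matches the endsAt in the API output below
print(is_resolved(datetime(2021, 8, 29, 12, 41, 0), e))  # True
```

If that guess is right, an alert that is not resent within 4 × max(resend_delay, evaluation_interval) would flip to resolved even while still firing in Prometheus.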
My questions are as follows:
1) How is the endsAt time calculated? Is it derived from resend_delay?
2) Say an alert is sent to Alertmanager with resend_delay set to 100h, and at the next evaluation interval (1m) the alert condition clears. Will Prometheus send the resolved notification to Alertmanager immediately, or will it wait for the 100h resend_delay?
3) Are the msg="Received alert" lines logged when Prometheus sends alerts to Alertmanager? And when are the msg=flushing lines logged? (Logs below.)
4) With evaluation_interval: 1m and scrape_interval: 1m, why is there a 2m gap between the alert received at 12:34 and the alert received at 12:36?
Also, when I query Alertmanager for alerts, the endsAt time is 4 minutes after the last received alert. Why is that? Is my resend_delay 4m? I never set the resend_delay value.
Below are the debug logs from Alertmanager:
level=debug ts=2021-08-29T12:34:40.342Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=disk_utilization[6356c43][active]
level=debug ts=2021-08-29T12:34:40.342Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=disk_utilization[1db5352][active]
level=debug ts=2021-08-29T12:34:40.381Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" msg=flushing alerts="[disk_utilization[6356c43][active] disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:35:10.381Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" msg=flushing alerts="[disk_utilization[6356c43][active] disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:35:40.382Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" msg=flushing alerts="[disk_utilization[6356c43][active] disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:36:10.382Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" msg=flushing alerts="[disk_utilization[6356c43][active] disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:36:40.345Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=disk_utilization[6356c43][active]
level=debug ts=2021-08-29T12:36:40.345Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=disk_utilization[1db5352][active]
GET request to Alertmanager:
curl http://10.233.49.116:9092/api/v1/alerts

{"status":"success","data":[
  {"labels":{"alertname":"disk_utilization","device":"xx.xx.xx.xx:/media/test","fstype":"nfs4","instance":"xx.xx.xx.xx","job":"test-1","mountpoint":"/media/test","node_name":"test-1","severity":"critical"},
   "annotations":{"summary":"Disk utilization has crossed x%. Current Disk utilization = 86.823044624783"},
   "startsAt":"2021-08-29T11:28:40.339802555Z",
   "endsAt":"2021-08-29T12:40:40.339802555Z",
   "generatorURL":"x",
   "status":{"state":"active","silencedBy":[],"inhibitedBy":[]},
   "receivers":["test-1"],
   "fingerprint":"1db535212ea6dcf6"},
  {"labels":{"alertname":"disk_utilization","device":"test","fstype":"ext4","instance":"xx.xx.xx.xx","job":"Node_test-1","mountpoint":"/","node_name":"test-1","severity":"critical"},
   "annotations":{"summary":"Disk utilization has crossed x%. Current Disk utilization = 94.59612027578963"},
   "startsAt":"2021-08-29T11:28:40.339802555Z",
   "endsAt":"2021-08-29T12:40:40.339802555Z",
   "generatorURL":"x",
   "status":{"state":"active","silencedBy":[],"inhibitedBy":[]},
   "receivers":["test-1"],
   "fingerprint":"6356c43dc3589622"}]}
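For reference, here is how I computed the 4-minute gap I mentioned above, using the last "Received alert" log timestamp and the endsAt from the API response (fractional seconds dropped):

```python
from datetime import datetime

# Timestamps copied from the logs and API response pasted above.
api_ends_at   = datetime(2021, 8, 29, 12, 40, 40)  # endsAt in the API response
last_received = datetime(2021, 8, 29, 12, 36, 40)  # last "Received alert" log line
print(api_ends_at - last_received)  # 0:04:00
```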
thanks,
Akshay