Duplicate Slack notifications from Alertmanager (my misunderstanding?)


William Hargrove

Dec 2, 2021, 5:20:21 PM

to Prometheus Users

alertmanager: 0.21.0
prometheus: 2.30.3

I am trying to get my head around some unexpected alertmanager behaviour.

I am alerting on the following metrics:

client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="3",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="4",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="5",compid="CLIENT-A"} 0

and have the rule below defined:

- alert: Client Disconnect
  expr: client_disconnect == 1
  for: 2s
  labels:
    severity: critical
    notification: slack
  annotations:
    summary: "Appenv {{ $labels.appenv }} on connection {{ $labels.conn }} compid {{ $labels.compid }} down"
    description: "{{ $labels.instance }} disconnect: {{ $labels.appenv }} on connection {{ $labels.conn }} compid {{ $labels.compid }}"

My alertmanager config is as below:

global:
  slack_api_url: 'https://hooks.slack.com/services/REDACTED'

route:
  group_wait: 5s
  group_interval: 5s
  group_by: ['section','env']
  repeat_interval: 10m
  receiver: 'default_receiver'
  routes:
  - match:
      notification: slack
    receiver: slack_receiver
    group_by: ['appenv','compid']

receivers:
- name: 'slack_receiver'
  slack_configs:
  - channel: 'monitoring'
    send_resolved: true
    title: '{{ template "custom_title" . }}'
    text: '{{ template "custom_slack_message" . }}'
- name: 'default_receiver'
  webhook_configs:
  - url: http://pi4-1.home:5000
    send_resolved: true

templates:
- /etc/alertmanager/notifications.tmpl
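As a rough illustration of how this routing tree treats the three firing alerts, the Go sketch below models only equality matching, first-match child selection and `group_by`. The `Route` type, `resolve`, `groupKey` and `alertLabels` are hypothetical, and inheritance of unset parent settings and `continue` are deliberately left out. Under these assumptions every alert carries `notification: slack` from the rule's labels, so all three land on `slack_receiver` in a single group keyed by `appenv` and `compid`, and are therefore notified together.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Route is a hypothetical, heavily simplified model of an Alertmanager route.
// Parent-setting inheritance and `continue` are not modelled.
type Route struct {
	Receiver string
	Match    map[string]string
	GroupBy  []string
	Routes   []Route
}

func (r Route) matches(labels map[string]string) bool {
	for k, v := range r.Match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// resolve returns the receiver and group_by of the first matching child
// route, falling back to this route when no child matches.
func (r Route) resolve(labels map[string]string) (string, []string) {
	for _, child := range r.Routes {
		if child.matches(labels) {
			return child.resolve(labels)
		}
	}
	return r.Receiver, r.GroupBy
}

// groupKey builds a key from the group_by label values; alerts with equal
// keys end up in the same notification group.
func groupKey(labels map[string]string, groupBy []string) string {
	names := append([]string(nil), groupBy...)
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", n, labels[n]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// postRoute mirrors the route tree from the config above.
func postRoute() Route {
	return Route{
		Receiver: "default_receiver",
		GroupBy:  []string{"section", "env"},
		Routes: []Route{{
			Receiver: "slack_receiver",
			Match:    map[string]string{"notification": "slack"},
			GroupBy:  []string{"appenv", "compid"},
		}},
	}
}

// alertLabels combines the series labels with the labels the rule adds.
func alertLabels(conn string) map[string]string {
	return map[string]string{
		"alertname": "Client Disconnect", "severity": "critical", "notification": "slack",
		"appenv": "testbed", "compid": "CLIENT-A", "conn": conn,
	}
}

func main() {
	root := postRoute()
	for _, conn := range []string{"2", "3", "4"} {
		recv, by := root.resolve(alertLabels(conn))
		fmt.Println(conn, recv, groupKey(alertLabels(conn), by))
	}
}
```

Running it prints the same receiver and group key, `{appenv="testbed",compid="CLIENT-A"}`, for conn 2, 3 and 4.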

My custom template results in a message formatted as below being displayed in Slack:

As expected, this repeats every 10 minutes.

If one of these client_disconnects subsequently resolves, such that the metric now looks like this:

client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="3",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="4",compid="CLIENT-A"} 0
client_disconnect{appenv="testbed",conn="5",compid="CLIENT-A"} 0

Then I receive the following messages:

When the repeat interval comes round (10 mins later) I receive the following messages:

The second firing line comes in at 22:02 and the third firing line at 22:03 (sorry, the timestamps only show when hovering over the messages in Slack).

I can't understand this behaviour. I am running single, unclustered instances of Prometheus and Alertmanager.

Is anyone able to explain this behaviour to me? I see very similar behaviour if I use the webhook receiver instead of Slack.

The subsequent repeat (after the last message) shows the current state:

Many thanks.

For reference, my slack templates are below:

{{ define "__single_message_title" }}{{ range .Alerts.Firing }}{{ .Labels.alertname }} on {{ .Annotations.identifier }}{{ end }}{{ range .Alerts.Resolved }}{{ .Labels.alertname }} on {{ .Annotations.identifier }}{{ end }}{{ end }}

{{ define "custom_title" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ if or (and (eq (len .Alerts.Firing) 1) (eq (len .Alerts.Resolved) 0)) (and (eq (len .Alerts.Firing) 0) (eq (len .Alerts.Resolved) 1)) }}{{ template "__single_message_title" . }}{{ end }}{{ end }}

{{ define "custom_slack_message" }}{{ if or (and (eq (len .Alerts.Firing) 1) (eq (len .Alerts.Resolved) 0)) (and (eq (len .Alerts.Firing) 0) (eq (len .Alerts.Resolved) 1)) }}
{{ range .Alerts.Firing }}{{ .Annotations.description }}{{ end }}{{ range .Alerts.Resolved }}{{ .Annotations.description }}{{ end }}
{{ else }}
*Alerts Firing:*
Client disconnect: {{ .CommonLabels.appenv }} for {{ .CommonLabels.compid }}. Connections: {{ range .Alerts.Firing }}{{ .Labels.conn }} {{ end }}have failed.
*Alerts Resolved:*
Client disconnect: {{ .CommonLabels.appenv }} for {{ .CommonLabels.compid }}. Connections: {{ range .Alerts.Resolved }}{{ .Labels.conn }} {{ end }}have failed.
{{ end }}{{ end }}
