Resolved Alerts Still Generating Notifications


matthe...@gmail.com

Mar 29, 2018, 6:11:06 PM
to Prometheus Users
This message seems to have been deleted, or at least that is what I am seeing when viewing it in this group. I am not sure why it would have been deleted, so I am reposting. Apologies if this is a duplicate.

I am trying to figure out grouping and inhibition rules and am struggling a little.

Here is my basic config:

global:
  resolve_timeout: 5m
  slack_api_url: .....
templates:
- '/etc/alertmanager/template/*.tmpl'

route:
  group_by: [ 'alertname', 'cluster' ]
  # how long to initially wait to send a notification
  # (allows time for an inhibiting alert to arrive)
  group_wait: 30s
  # how often to send notifications for this group
  group_interval: 5m
  # how long to wait before re-sending a given alert that
  # has already been sent
  repeat_interval: 5m
  receiver: default-receiver

inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # apply inhibition only when all of these labels match
  equal: [ 'alertname', 'cluster', 'namespace', 'deployment' ]

receivers:
- name: 'default-receiver'
  slack_configs:
  - channel: '#tps-reports'
    title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Prometheus Event Notification'
    text: >-
        {{ range .Alerts }}
           *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
           *Description:* {{ .Annotations.description }}
           *Details:*
           {{ range .Labels.SortedPairs }}  - {{ .Name }}: `{{ .Value }}`
           {{ end }}
        {{ end }}
    send_resolved: true

And here are the alerting rules I am currently experimenting with. As you can see, I have two alerts with the same name: one with a severity of critical and the other with a severity of warning.

groups:
- name: pod.rules
  rules:
  - alert: DeploymentReplicasUnavailable
    expr: kube_deployment_status_replicas - kube_deployment_status_replicas_available > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      description: Deployment {{ $labels.deployment }} has {{ $value }} unavailable replicas
      summary: "{{ $labels.deployment }} : {{ $value }} replicas unavailable"
  - alert: DeploymentReplicasUnavailable
    expr: kube_deployment_status_replicas_available == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      description: Deployment {{ $labels.deployment }} has no available replicas
      summary: "{{ $labels.deployment }} : no replicas available"

So, if I am understanding this correctly, any alerts that come in with the same alertname and cluster get sent as one notification. This part appears to be working: multiple alerts across multiple deployments get sent as a single notification to Slack. Great.
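For example, with group_by: [ 'alertname', 'cluster' ] the grouping key is just the (alertname, cluster) pair, so two alerts like these (hypothetical label values, not from my cluster) land in the same notification:

{alertname="DeploymentReplicasUnavailable", cluster="prod", deployment="deployment1", severity="warning"}
{alertname="DeploymentReplicasUnavailable", cluster="prod", deployment="deployment2", severity="warning"}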

The inhibiting rule even appears to be working: if I have a deployment with 0 replicas available, I get just the critical alert in the notification, not the warning alert.
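In that case (hypothetical label values again) both rules fire, and because the critical alert matches the warning alert on every label listed in equal, the warning is suppressed:

source: {alertname="DeploymentReplicasUnavailable", cluster="prod", namespace="default", deployment="deployment1", severity="critical"}
target: {alertname="DeploymentReplicasUnavailable", cluster="prod", namespace="default", deployment="deployment1", severity="warning"}   <- inhibited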

The problem is that when an alert resolves, I am not seeing a notification for it. Within Prometheus it no longer shows as FIRING (and in Alertmanager it no longer shows under alerts). When another notification is then sent for this group (because there are still FIRING alerts), the previously cleared, now-resolved alerts are included in the notification. This continues for about 2 notification periods, and then the resolved alerts no longer appear in subsequent notifications.
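One thing I have been sketching (untested, so treat it as an assumption) is splitting the batch with Alertmanager's .Alerts.Firing and .Alerts.Resolved template fields, so that resolved alerts riding along in a group notification are at least labeled as such:

    text: >-
        {{ if .Alerts.Firing }}*Firing:*
        {{ range .Alerts.Firing }} {{ .Annotations.summary }} - `{{ .Labels.severity }}`
        {{ end }}{{ end }}
        {{ if .Alerts.Resolved }}*Resolved:*
        {{ range .Alerts.Resolved }} {{ .Annotations.summary }}
        {{ end }}{{ end }}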

Further testing shows that a resolved notification is sent only when all alerts in a group have cleared. Is that normal behavior? By that I mean: say I have two DeploymentReplicasUnavailable alerts, one for deployment1 and the other for deployment2. If the deployment2 alert resolves (a replica comes online), I don't get a resolved notification until the deployment1 alert resolves as well.

Simon Pasquier

Mar 30, 2018, 3:55:26 AM
to matthe...@gmail.com, Prometheus Users
Yes, Alertmanager will send the resolved notification only when all alerts in the group are resolved.
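If you want resolved notifications per deployment, one option (a sketch; adjust to the labels you actually have) is to make the grouping finer:

route:
  group_by: [ 'alertname', 'cluster', 'deployment' ]

Each deployment then forms its own group, so its resolved notification goes out as soon as that deployment's alerts clear, at the cost of more (smaller) notifications.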
 
