Resolved alerts are grouped into firing alerts

Ivan

Jan 13, 2023, 6:08:37 AM

to Prometheus Users

I'm seeing strange behavior where resolved alerts are sent along with firing ones. I have this rule: kube_pod_container_status_ready{namespace="default"} == 0. What happens: when a pod goes down, the alert is sent and everything is fine; then the pod comes back up and the alert is resolved. But if the pod fails again within a short period and is recreated by the deployment under a different name, the firing notification mentions both the previous pod and the new one. I also noticed that if I wait about 20 minutes after the alert is resolved and then kill a pod again, only one pod appears in the alert.
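For context, a rule along these lines would produce the notifications quoted below. This is only a sketch reconstructed from the alert name, severity label and summary text in those notifications: the group name is hypothetical, and the `for: 30s` duration and the `description` annotation key are assumptions inferred from the "not ready for 30 seconds" message, not copied from the actual rule file.

```yaml
groups:
  - name: container-readiness        # hypothetical group name
    rules:
      - alert: KubeContainerNotReady
        # Expression quoted in the post: fires for each container that is not ready.
        expr: kube_pod_container_status_ready{namespace="default"} == 0
        for: 30s                     # assumed from "not ready for 30 seconds"
        labels:
          severity: warning
        annotations:
          summary: Container is not ready for too long.
          # The annotation key "description" is an assumption.
          description: >-
            Container {{ $labels.container }} in pod {{ $labels.pod }}
            is not ready for 30 seconds.
```

Because the expression returns one series per pod and container, each recreated pod becomes a separate alert with its own `pod` and `uid` labels, even though the `alertname` is the same.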

This is expected:
first alert 12:24
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30 seconds.

Prometheus Alert (Firing)

summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a

second alert 12:27

Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30 seconds.

Prometheus Alert (Resolved)

summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a

Then I kill the pod and this happens (it's a single alert):

third alert: 12:32

Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30 seconds.

Prometheus Alert (Firing)

summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a

Container ubuntu in pod test-ubuntu-5579c5f49c-sjlrk is not ready for 30 seconds.

summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-sjlrk
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid aba2b39c-a4b1-4d02-a532-4ca39ef8c0da

Here's my config:

config:
  global:
    resolve_timeout: 5m
  route:
    group_by: ['alertname']
    group_interval: 30s
    repeat_interval: 24h
    group_wait: 30s
    receiver: 'prometheus-msteams'
  receivers:
    - name: 'prometheus-msteams'
      webhook_configs: # https://prometheus.io/docs/alerting/configuration/#webhook_config
        - send_resolved: true
          url: "http://prometheus-msteams:2000/prometheus-msteams"

Now, I know I can just group by pod or some other labels, or even turn off grouping, but I want to figure out what exactly happens here. I also can't figure out what happens to an alert that lacks the label I am grouping by. For example, if I group by pod name, how will alerts without a pod label be treated?
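The two workarounds mentioned above could look roughly like this in the route section. This is a sketch that reuses the timing values from my config; the choice of `pod` as the extra grouping label is just an example, and `'...'` is the special value the Alertmanager documentation describes for grouping by all labels, which effectively disables aggregation.

```yaml
route:
  receiver: 'prometheus-msteams'
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 24h
  # Option 1: one group (and so one notification) per alertname/pod pair.
  group_by: ['alertname', 'pod']
  # Option 2: group by every label, which effectively disables aggregation.
  # group_by: ['...']
```

Either option would keep the old pod's alert and the new pod's alert in separate groups, but neither explains why the resolved alert reappears under the current `group_by: ['alertname']` setting, which is what I am trying to understand.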
