I'm seeing strange behavior where resolved alerts are sent alongside firing ones. My rule is kube_pod_container_status_ready{namespace="default"} == 0. What happens: when a pod goes down, an alert is sent and everything is fine; then the pod comes back up and the alert is resolved. But if the pod fails again within a short period and is recreated by the deployment under a different name, the next alert mentions both the previous pod and the new one. I also noticed that if I wait about 20 minutes after the alert is resolved and then kill a pod again, only one pod appears in the alert.
This is expected:
first alert 12:24
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30 seconds.
Prometheus Alert (Firing)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
second alert 12:27
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30 seconds.
Prometheus Alert (Resolved)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
Then I kill the pod and this happens (it's a single alert):
third alert 12:32
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30 seconds.
Prometheus Alert (Firing)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
Container ubuntu in pod test-ubuntu-5579c5f49c-sjlrk is not ready for 30 seconds.
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-sjlrk
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid aba2b39c-a4b1-4d02-a532-4ca39ef8c0da
Now, I know I can just group them by pod or some other label, or even turn off grouping, but I want to understand what exactly happens here. I also can't figure out what happens to an alert that lacks the label I'm grouping by. For example, if I group by pod name, how will alerts without a pod label be treated?
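To make the grouping question concrete, here is a minimal Python sketch of one plausible model of label-based grouping. It is not Alertmanager's code. It assumes that only the group-by labels actually present on an alert contribute to its group key, so alerts missing the label end up sharing one key. The function group_alerts and the NodeDown and DiskFull alerts are hypothetical, made up for illustration; the two pod names come from the notifications above.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts (label dicts) by the group_by labels they carry."""
    groups = defaultdict(list)
    for labels in alerts:
        # Assumption: a label that is absent from the alert is simply left
        # out of the key, so an alert without "pod" gets a key with no
        # "pod" entry rather than being dropped.
        key = tuple(sorted((name, labels[name]) for name in group_by if name in labels))
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "KubeContainerNotReady", "pod": "test-ubuntu-5579c5f49c-rsb8v"},
    {"alertname": "KubeContainerNotReady", "pod": "test-ubuntu-5579c5f49c-sjlrk"},
    {"alertname": "NodeDown"},  # hypothetical alert with no pod label
    {"alertname": "DiskFull"},  # hypothetical alert with no pod label
]

# Grouping by alertname puts both pods into one group, as in the third alert.
by_alertname = group_alerts(alerts, ["alertname"])

# Grouping by pod separates the two pods; under the stated assumption the
# two alerts without a pod label share the empty key.
by_pod = group_alerts(alerts, ["pod"])

for key, members in by_pod.items():
    print(key, [a["alertname"] for a in members])
```

Under this model, grouping by pod would separate the old and new pods into their own notifications, while every alert lacking a pod label would land together in a single group. Whether the real behavior matches should be checked against the Alertmanager documentation for group_by.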