Alert Resetting on Every Evaluation Cycle

Alexander Diyakov

Jan 20, 2025, 5:02:12 AM

to Prometheus Users

Hello Prometheus Users,

I'm facing an issue with my alert rules where the alerts are resetting on every evaluation cycle. I have simplified the setup as much as possible, but the problem persists. Here's the context:

Metric:

The rail_temp metric changes continuously (it increases or decreases) and is always greater than 0.

The metric is exposed via an HTTP server using the start_http_server function from prometheus_client. It updates every second.

Alert Rule:

groups:
  - name: rail_temp_alerts
    rules:
      - alert: rail_temp_Warning
        annotations:
          description: rail_temp is above the warning threshold (rail_temp_th_W_G)
          summary: rail_temp exceeded warning threshold
        expr: rail_temp > 0
        for: 10s
        labels:
          metric: rail_temp
          severity: warning
          threshold: 0
          threshold_type: global
          value: '{{ $value }}'

Prometheus Global Configuration:

global:
  scrape_interval: 7s
  evaluation_interval: 4s
  # scrape_timeout is set to the global default (10s).

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "pushgateway"
    scrape_interval: 1s
    static_configs:
      - targets: ["localhost:9091"] # URL Pushgateway

Observations:

The rail_temp metric has no gaps and updates correctly, as seen in the screenshot.

However, the alert resets on each evaluation cycle (evaluation_interval: 4s), even though the for duration is set to 10 seconds. The alert never goes to Firing unless for is set to 0.

The attached screenshots include two views: a graph of the internal Prometheus ALERTS metric and the Alerts tab.

What I've Tried:

Verified that the metric updates correctly without any gaps.
Used both push_to_gateway and start_http_server to expose metrics, but the behavior remains the same.
Increased the for duration and adjusted the scrape_interval and evaluation_interval, but it didn't help.

Expected Behavior:

The alert should transition to firing after the for duration is met without resetting on each evaluation cycle.

Current Behavior:

The alert resets to pending every 4 seconds (matching the evaluation_interval) instead of transitioning to firing.

I believe this could be a bug or misconfiguration, but I'm not sure how to further debug this. Any insights or suggestions on resolving this would be greatly appreciated.

Thank you in advance!

Best regards,

Alexander

Screenshot 2025-01-20 122907.png

Screenshot 2025-01-20 123042.png

Screenshot 2025-01-20 123108.png

Brian Candler

Jan 20, 2025, 9:26:27 AM

to Prometheus Users

I can see from your ALERTS graph that your alerts are all different (they have a different combination of labels), which in turn comes from here:

labels:
  metric: rail_temp
  severity: warning
  threshold: 0
  threshold_type: global
  value: '{{ $value }}'   # <<< HERE

Just remove that label, and you should be good. You can use $value in annotations, but you should not use it in labels, for this very reason.

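What's happening is that $value changes on each evaluation, and labels are part of an alert's identity. So the old alert (with value="old") resolves, and a brand-new alert (with value="new") is created in pending state with its "for" timer starting from zero. It never stays around long enough to fire.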
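As a rough sketch, the rule could look something like this with the value moved out of the labels. Everything is copied from your rule except the current_value annotation, whose name is only an illustration and not something from your config:

```yaml
groups:
  - name: rail_temp_alerts
    rules:
      - alert: rail_temp_Warning
        expr: rail_temp > 0
        for: 10s
        labels:
          # Labels are part of the alert's identity, so keep them constant.
          metric: rail_temp
          severity: warning
          threshold: "0"
          threshold_type: global
        annotations:
          summary: rail_temp exceeded warning threshold
          description: rail_temp is above the warning threshold (rail_temp_th_W_G)
          # Annotations are not part of the identity, so a changing value is fine here.
          # "current_value" is a hypothetical annotation name.
          current_value: '{{ $value }}'
```

With a constant label set, the same alert instance persists across evaluations, so the "for" timer can actually run out.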

Alexander Diyakov

Jan 20, 2025, 2:28:04 PM

to Prometheus Users

Thank you so much for pointing this out!

I completely overlooked the fact that including value in the labels would create distinct alerts for every change in the metric's value. Your explanation about $value causing the old alert to resolve and a new one to fire makes perfect sense.

I’ve now removed the value label and kept it only in the annotations, as you suggested. After testing, the alert behaves exactly as expected and no longer resets on each evaluation cycle.
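For anyone who wants to check this offline, a promtool unit test along the following lines should confirm that the alert reaches firing even while the value keeps changing. This is only a sketch under assumptions: the synthetic input series is made up for illustration, and it assumes alert_rules.yml contains the rule from my first post with only the value label removed. If an annotation uses {{ $value }}, the rendered value would also have to appear under exp_annotations.

```yaml
rule_files:
  - alert_rules.yml

# Same evaluation interval as in the Prometheus config above.
evaluation_interval: 4s

tests:
  - interval: 1s
    input_series:
      # Synthetic data: starts at 1 and grows by 1 every second for 30 steps,
      # so the value differs at every evaluation.
      - series: 'rail_temp'
        values: '1+1x30'
    alert_rule_test:
      # Pending from the first evaluation; 20s is well past the 10s "for" duration.
      - eval_time: 20s
        alertname: rail_temp_Warning
        exp_alerts:
          - exp_labels:
              metric: rail_temp
              severity: warning
              threshold: "0"
              threshold_type: global
            exp_annotations:
              description: rail_temp is above the warning threshold (rail_temp_th_W_G)
              summary: rail_temp exceeded warning threshold
```

Saved under a name of your choice (for example rail_temp_test.yml, a hypothetical file name), it can be run with promtool test rules rail_temp_test.yml.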

This was an invaluable insight—thank you again for taking the time to help me resolve this issue!

Best regards,

Alexander

On Monday, January 20, 2025 at 17:26:27 UTC+3, Brian Candler wrote:
