Unnecessary/wrong alerts

Amol Nagotkar

Mar 24, 2025, 8:17:44 AM

to Prometheus Users

Hi all,

I got this alert from Alertmanager:

[{
  "status": "firing",
  "labels": {
    "alertname": "Instance Down",
    "instance": "IP_ADDRESS:15692",
    "job": "XYZ",
    "severity": "Critical"
  },
  "annotations": {
    "description": "IP_ADDRESS:15692 of job XYZ has been down for more than 30 sec.",
    "summary": "Endpoint IP_ADDRESS:15692 down"
  },
  "startsAt": "2025-03-23T11:52:01.891Z",
  "endsAt": "0001-01-01T00:00:00Z",
  "generatorURL": "http://XYZ-container-pr:9091/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
  "fingerprint": "bf26da1f76eab5ca"
}]

But if you look at the graph, there are no alerts. Also, in reality the instance wasn't down.

How can I debug/solve this?

Thank you in advance.

Brian Candler

Mar 24, 2025, 12:10:06 PM

to Prometheus Users

You are looking at the graph over a 4 week time window. A short burst of up==0 will not be visible. For example, if the graph had 1000 points on the X axis, then there would only be one sample roughly every 40 minutes.
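
To make the arithmetic above concrete, here is a minimal Python sketch. It is a simplification: it treats the graph as checking the series only at evenly spaced tick times and ignores Prometheus's lookback behaviour. The function names and the 2-minute example outage are made up for illustration and are not taken from the original post.

```python
import math

def graph_step_seconds(window_seconds: float, points: int) -> float:
    """Spacing between plotted points when a window is drawn with `points` samples."""
    return window_seconds / points

def visible_at_resolution(outage_start: float, outage_end: float, step: float) -> bool:
    """True if at least one tick k*step lands inside [outage_start, outage_end).

    Simplified model: a point is plotted only at multiples of `step`.
    """
    first_tick = math.ceil(outage_start / step) * step
    return first_tick < outage_end

FOUR_WEEKS = 4 * 7 * 24 * 3600                 # 2,419,200 seconds
step = graph_step_seconds(FOUR_WEEKS, 1000)    # 2419.2 s, i.e. about 40 minutes

# Hypothetical 2-minute outage starting 1000 s into the window.
missed_when_zoomed_out = not visible_at_resolution(1000, 1120, step)
seen_when_zoomed_in = visible_at_resolution(1000, 1120, 15)  # 15 s step

if __name__ == "__main__":
    print(f"step = {step:.1f} s ({step / 60:.2f} min)")
    print("missed at 4-week zoom:", missed_when_zoomed_out)
    print("seen at 15 s step:", seen_when_zoomed_in)
```

Under this model, a short outage falls between two plotted points at the 4-week zoom level, but shows up once the step is small enough.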

Zoom in to a shorter time window, or use a query like this:

min_over_time(up[4h]) == 0
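
If you would rather run that query from a script than from the graph UI, the following is a rough sketch using the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The base URL is an assumption taken from the `generatorURL` in the alert, and the helper names are hypothetical; adjust both to your setup.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumption: host and port copied from the alert's generatorURL.
PROM_URL = "http://XYZ-container-pr:9091"

def build_instant_query_url(base_url: str, expr: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    params = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"

def failed_targets(response_body: str) -> List[Tuple[str, str]]:
    """Extract (job, instance) pairs from an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (series["metric"].get("job", ""), series["metric"].get("instance", ""))
        for series in payload["data"]["result"]
    ]

def query_failed_targets(base_url: str = PROM_URL, window: str = "4h") -> List[Tuple[str, str]]:
    """Return targets that had at least one up == 0 sample within `window`."""
    expr = f"min_over_time(up[{window}]) == 0"
    url = build_instant_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return failed_targets(resp.read().decode("utf-8"))

if __name__ == "__main__":
    for job, instance in query_failed_targets():
        print(f"{job}\t{instance}")
```

The window is relative to the time the query runs, so make it wide enough to reach back to the alert's `startsAt` (for example `2d` instead of `4h`).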

> Also, in reality the instance wasn't down.

You didn't show the alerting rule, but if it was "up == 0", then I can assure you that you got up == 0 at least once. Prometheus doesn't lie. This means a failed scrape. It doesn't necessarily mean the target was down: it could have been a network problem between Prometheus and the target, or a resource limitation that prevented the target from responding.
