Unnecessary/wrong alerts


Amol Nagotkar

Mar 24, 2025, 8:17:44 AM
to Prometheus Users
Hi all,

I got this alert from Alertmanager:
[{
    "status": "firing",
    "labels": {
        "alertname": "Instance Down",
        "instance": "IP_ADDRESS:15692",
        "job": "XYZ",
        "severity": "Critical"
    },
    "annotations": {
        "description": "IP_ADDRESS:15692 of job XYZ has been down for more than 30 sec.",
        "summary": "Endpoint IP_ADDRESS:15692 down"
    },
    "startsAt": "2025-03-23T11:52:01.891Z",
    "endsAt": "0001-01-01T00:00:00Z",
    "fingerprint": "bf26da1f76eab5ca"
}
]

Screenshot from 2025-03-24 17-19-16.png
But if you look at the graph, there are no alerts. Plus, in reality the instance wasn't down.
How can I debug/solve this?

Thank you in advance.

Brian Candler

Mar 24, 2025, 12:10:06 PM
to Prometheus Users
You are looking at the graph over a 4-week time window, so a short burst of up == 0 will not be visible. For example, if the graph had 1000 points on the X axis, there would be only one point roughly every 40 minutes.
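
As a rough check of that arithmetic, here is a minimal Python sketch. The 1000-point figure comes from the example above; the 15-second scrape interval is purely an assumption for illustration, since the real interval was not given in this thread.

```python
from datetime import timedelta


def graph_step(window: timedelta, points: int) -> timedelta:
    """Approximate spacing between plotted points for a given graph window."""
    return window / points


step = graph_step(timedelta(weeks=4), 1000)
step_minutes = step.total_seconds() / 60
print(f"one point every {step_minutes:.2f} minutes")

# Assumed scrape interval (hypothetical, not stated in the thread).
scrape_interval = timedelta(seconds=15)
scrapes_per_point = step / scrape_interval
print(f"about {scrapes_per_point:.0f} scrapes fall between two plotted points")
```

Under those assumptions, a failure lasting one or two scrapes is very likely to fall between two plotted points.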

Zoom in to a shorter time window, or use a query like this:

min_over_time(up[4h]) == 0
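
To illustrate why a query of that shape catches what the plain graph misses, here is a small Python simulation. It is only a sketch: the series, the 15-second scrape interval, and the helper functions are hypothetical, and the helpers only approximate PromQL's behaviour (they ignore staleness and lookback).

```python
SCRAPE = 15          # assumed scrape interval in seconds (hypothetical)
STEP = 2400          # graph step: one point every 40 minutes
END = 8 * 3600       # simulate 8 hours of data

# Hypothetical `up` series: all scrapes succeed except two around t = 9000 s.
samples = [(ts, 0.0 if 9000 <= ts < 9030 else 1.0)
           for ts in range(0, END + 1, SCRAPE)]


def value_at(samples, t):
    """Latest sample at or before t, roughly what a plain `up` graph point shows."""
    earlier = [v for ts, v in samples if ts <= t]
    return earlier[-1] if earlier else None


def min_over_time(samples, t, window):
    """Minimum of all samples in the window ending at t (approximation of PromQL)."""
    in_range = [v for ts, v in samples if t - window < ts <= t]
    return min(in_range) if in_range else None


graph_times = range(STEP, END + 1, STEP)
plain = [value_at(samples, t) for t in graph_times]
windowed = [min_over_time(samples, t, STEP) for t in graph_times]

print("plain graph shows a 0:", 0.0 in plain)
print("min_over_time shows a 0:", 0.0 in windowed)
```

In this made-up series, none of the plain graph points lands on the 30-second dip, while the windowed minimum reports it because every sample in each window is inspected.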

> Plus, in reality the instance wasn't down.

You didn't show the alerting rule, but if it was "up == 0", then I can assure you that you got up == 0 at least once - Prometheus doesn't lie. This means a failed scrape. It doesn't necessarily mean the target was down: it could also be a network problem between Prometheus and the target, or a resource limitation which was preventing the target from responding.
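
One way to look for the failed scrape is to query a narrow window around the alert's startsAt time through the Prometheus HTTP API (/api/v1/query_range). The Python sketch below only builds the request URL and does not send it. The server address, the 10-minute margin, and the 15-second step are assumptions; the job label and timestamp are taken from the alert above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumed Prometheus address (hypothetical); adjust to your own server.
PROMETHEUS_URL = "http://localhost:9090"


def range_query_url(query, center, margin, step_seconds):
    """Build a query_range URL covering center +/- margin at the given step."""
    params = {
        "query": query,
        "start": (center - margin).timestamp(),  # Unix timestamps are accepted
        "end": (center + margin).timestamp(),
        "step": step_seconds,
    }
    return f"{PROMETHEUS_URL}/api/v1/query_range?{urlencode(params)}"


# startsAt from the alert: 2025-03-23T11:52:01.891Z
starts_at = datetime(2025, 3, 23, 11, 52, 1, 891000, tzinfo=timezone.utc)

url = range_query_url('up{job="XYZ"}', starts_at, timedelta(minutes=10), 15)
print(url)
```

Opening the printed URL, or entering the same query and time range in the Prometheus web UI, should show whether any up sample was 0 shortly before the alert started.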