Up server alert

Tamar

Mar 16, 2021, 3:10:19 AM
to Prometheus Users
Hi,

I'd like to get an alert if the host is down.

Currently I am using the following rule:
up{instance="host:port", job="node-exporter"}

However, this assumes that if Node Exporter is unavailable then the whole host is down, which is not always the case (I am running Node Exporter in a container).

Is there another way to check if the host itself is indeed down?


Thanks,

Tamar

Stuart Clark

Mar 16, 2021, 5:23:05 AM
to Tamar, Prometheus Users

It really depends on what "down" means to you.

Prometheus scrapes metrics from an instrumented application or exporter, so knowledge depends on being able to scrape those endpoints (or inferring something if you cannot do so). There are many reasons why you might not be able to perform a scrape (or get particular "down" type metric values) when the host isn't actually down.

For example, as you mentioned, if the node exporter isn't running on the host (or isn't accessible due to firewalls/network) the "up" metric would have a value of 0 even though the host might otherwise be fine. Similarly using the Blackbox Exporter to ping/port check could show issues due to network or services not running even though the host is ok.

In general I'd suggest that the host running or not isn't actually important. Instead the host is there to achieve some outcome - running networked services, being accessible via SSH, processing data using a cron job, etc. so it is those things which you should monitor. If those fail it could be due to the host being down (or network issues or a service being stopped) but the cause can be quickly discovered by your on-call support person - why it broke is less important for the alerting than what broke.

If you care about a networked service (a web application, database, SSH access, etc.) you can use direct instrumentation or an exporter if possible, but as a basic check you can use the Blackbox Exporter to make HTTP/TCP probes. If you care about a cron type processing job you can use the textfile collector in the Node Exporter as well as metrics that indicate the outcome of that processing. For example, for an ETL database job you could produce metrics from database queries in addition to those written via the Node Exporter, giving you the ability to alert if the job doesn't appear to have run or doesn't appear to have added records.
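
As a rough sketch of what alerting on outcomes could look like, here is a rule file with one alert on a Blackbox Exporter probe and one on a cron job's last success time. The job name blackbox-http, the metric etl_job_last_success_timestamp_seconds (which the cron job would have to write out for the textfile collector), the thresholds and the severity labels are all assumptions for illustration, not something from your setup.

```yaml
groups:
  - name: service-outcomes
    rules:
      # probe_success is exported by the Blackbox Exporter: 1 if the probe passed, 0 if not.
      # Assumes a scrape job called "blackbox-http" (hypothetical name) already exists.
      - alert: ServiceProbeFailing
        expr: probe_success{job="blackbox-http"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Probe of {{ $labels.instance }} has been failing for 5 minutes"

      # Hypothetical metric written by the cron job via the textfile collector,
      # holding the Unix timestamp of the last successful run.
      - alert: EtlJobNotRunning
        expr: time() - etl_job_last_success_timestamp_seconds > 2 * 3600
        labels:
          severity: page
        annotations:
          summary: "ETL job on {{ $labels.instance }} has not succeeded in over 2 hours"
```

Either alert firing tells you what broke; whether the host, the network or the service is the cause is then something to investigate.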

-- 
Stuart Clark

Ben Kochie

Mar 16, 2021, 5:32:42 AM
to Stuart Clark, Tamar, Prometheus Users
Part of the Prometheus philosophy here is that if the metrics target, like the node_exporter, is down, the node may as well be down because you're flying blind.

Even if the node itself is fine, the fact that the node_exporter is down is critical to being able to observe problems on the host.

So, a down node_exporter, or any similar exporter, should be treated as critical even if the service itself is fine.
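
A minimal sketch of such a rule, assuming a 5 minute grace period and a severity label of critical (both are placeholders to adapt). Note the explicit == 0: an alert fires for every series its expression returns, so a bare up selector would also match healthy targets whose value is 1.

```yaml
groups:
  - name: exporter-availability
    rules:
      # up is set by Prometheus itself for every scrape target:
      # 1 if the last scrape succeeded, 0 if it failed.
      - alert: ExporterDown
        expr: up == 0
        for: 5m   # assumed grace period to ride out brief scrape failures
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} cannot be scraped"
```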


Tamar

Mar 16, 2021, 8:41:33 AM
to Prometheus Users
Thank you both for the explanation, it is clearer now.

BR,
Tamar

On Tuesday, March 16, 2021 at 11:32:42 UTC+2, sup...@gmail.com wrote: