However, this assumes that if Node Exporter is unavailable then the whole host is down, which is not always the case (I am running node-exporter as a container).
Is there another way to check whether the host itself is actually down?
Thanks,
Tamar
It really depends on what "down" means to you.
Prometheus scrapes metrics from an instrumented application or exporter, so knowledge depends on being able to scrape those endpoints (or inferring something if you cannot do so). There are many reasons why you might not be able to perform a scrape (or get particular "down" type metric values) when the host isn't actually down.
For example, as you mentioned, if the node exporter isn't running on the host (or isn't accessible due to firewalls/network) the "up" metric would have a value of 0 even though the host might otherwise be fine. Similarly using the Blackbox Exporter to ping/port check could show issues due to network or services not running even though the host is ok.
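To make the limitation concrete, here is a minimal sketch of what a port check boils down to. It is not the Blackbox Exporter itself, just a plain Python illustration; the function name tcp_probe and the default timeout are my own choices. The only thing it can report is whether a TCP connection succeeded, so a False result cannot tell you whether the host, the network or the service is at fault.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A False result means
    "could not connect", not "the host is down": a firewall, a routing
    problem or a stopped service all look the same from here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling tcp_probe("myhost.example", 22) would tell you whether SSH is reachable from wherever the probe runs, which is usually the thing you actually care about.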
In general I'd suggest that whether the host is running or not isn't actually what matters. Instead, the host is there to achieve some outcome (running networked services, being accessible via SSH, processing data using a cron job, etc.), so it is those things which you should monitor. If they fail, it could be because the host is down (or because of network issues or a stopped service), but your on-call support person can quickly find the cause. For alerting, what broke matters more than why it broke.
If you care about a networked service (a web application, database, SSH access, etc.) you can use direct instrumentation or an exporter if possible, but as a basic check you can use the Blackbox Exporter to make HTTP/TCP probes. If you care about a cron type processing job you can use the textfile collector in the Node Exporter, together with metrics that indicate the outcome of that processing. For example, for an ETL database job you could produce metrics via the Node Exporter and also from database queries, giving you the ability to alert if the job doesn't appear to have run or doesn't appear to have added any records.
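As a rough sketch of the textfile collector approach, the cron job could finish by writing a small .prom file into the directory that the Node Exporter is started with via --collector.textfile.directory. The metric names (etl_job_*), the task label and the function name below are made up for illustration, so pick whatever fits your own naming scheme. The file is written to a temporary name and then renamed so the Node Exporter never reads a half-written file.

```python
import os
import tempfile
import time


def write_job_metrics(directory: str, task: str, success: bool, records: int) -> str:
    """Write outcome metrics for a batch job in Prometheus text format.

    Hypothetical helper: metric names and the "task" label are assumptions,
    not anything defined by the Node Exporter.
    """
    labels = f'{{task="{task}"}}'
    lines = [
        "# HELP etl_job_last_run_timestamp_seconds Unix time the job last finished.",
        "# TYPE etl_job_last_run_timestamp_seconds gauge",
        f"etl_job_last_run_timestamp_seconds{labels} {int(time.time())}",
        "# HELP etl_job_last_success Whether the last run succeeded (1) or failed (0).",
        "# TYPE etl_job_last_success gauge",
        f"etl_job_last_success{labels} {1 if success else 0}",
        "# HELP etl_job_last_records_added Records added by the last run.",
        "# TYPE etl_job_last_records_added gauge",
        f"etl_job_last_records_added{labels} {records}",
    ]
    final_path = os.path.join(directory, f"{task}.prom")
    # Write to a temp file in the same directory, then rename atomically.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    # mkstemp creates the file as 0600; make it readable by the exporter's user.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, final_path)
    return final_path
```

With something like this in place, an alert on time() - etl_job_last_run_timestamp_seconds exceeding the expected interval would catch a job that has stopped running, whatever the underlying cause turns out to be.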
-- Stuart Clark