If you're looking to determine whether a target is reachable, you can use the "up" metric, which is automatically added to every scrape of a given target (see the docs). The alerting condition could look something like this:
alert: TargetIsUnreachable
expr: up == 0
for: 3m
labels:
  severity: warning
annotations:
  title: Instance {{ $labels.instance }} is unreachable
  description: Prometheus is unable to scrape {{ $labels.instance }}. This could indicate the target being down or a network issue.
This will trigger the alert if the "up" metric is continuously equal to 0 (in other words, the instance is unreachable) for a period of 3 minutes. The value of the "for" parameter should probably be at least 2 to 3 times your scrape_interval setting (see the docs for reference). It's often advised to add the "for" parameter to alerting conditions to avoid noise from flapping alerts; you wouldn't necessarily want to be notified if a single scrape fails, say due to a transient network connectivity problem.
absent" function (see
docs) which you can use to determine if series (aka samples) exist for a given metric name and label combination. You would use that in cases like where you might want to be notified if a given metric disappears due to the target itself disappearing from the
service discovery.
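As a sketch of that last case (the job label "my-service" and the 5m window are assumptions, not values from your setup), an alert built on "absent" could look like this:

alert: MetricIsAbsent
expr: absent(up{job="my-service"})
for: 5m
labels:
  severity: warning
annotations:
  title: No "up" series found for job my-service
  description: No series match up{job="my-service"}. This can mean the target was removed from service discovery rather than simply failing its scrapes, so the up == 0 alert above would never fire for it.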