Is up metric reliable?

Steve

Mar 16, 2020, 4:09:26 PM
to Prometheus Users

Hi

I have been playing with the up metric to see if it is reliable.

So far, I conclude it is not. Do you have the same results?


This is what I have done:


- I create the exporter. It runs as a pod.

- At scrape interval T, the Prometheus server discovers it.

- The target is visible and its state is UP (up=1).

- Now I delete the pod.

- At the next scrape interval, T+1, the Prometheus server detects that the exporter is down (up=0).

- So far so good… but...

- At the next scrape interval, T+2, the Prometheus server removes the target, and I no longer have any information that the target existed (the up metric for the target disappears).
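The sequence above can be modelled in a few lines. This is a minimal Python sketch under simplifying assumptions, not how Prometheus is implemented: `simulate` and the target name `exporter` are hypothetical, each scrape interval is reduced to the set of targets service discovery currently returns plus the set that actually answer the scrape, and an up sample is produced only for targets Prometheus currently knows about.

```python
from typing import Dict, List, Set


def simulate(discovered: List[Set[str]], reachable: List[Set[str]]) -> List[Dict[str, int]]:
    """Hypothetical model of the up series, one dict per scrape interval.

    discovered[i]: targets returned by service discovery at interval i.
    reachable[i]:  targets that answer the scrape at interval i.
    """
    history = []
    for known, alive in zip(discovered, reachable):
        # Only targets that service discovery still returns are scraped at all,
        # so only they get an up sample: 1 if the scrape succeeded, 0 otherwise.
        history.append({target: 1 if target in alive else 0 for target in known})
    return history


# T: pod running; T+1: pod deleted but still discovered; T+2: pod dropped by discovery.
timeline = simulate(
    discovered=[{"exporter"}, {"exporter"}, set()],
    reachable=[{"exporter"}, set(), set()],
)
print(timeline)  # [{'exporter': 1}, {'exporter': 0}, {}]
```

In this model the up=0 sample at T+1 exists only because a scrape happened before discovery dropped the deleted pod; once the target is gone there is no series left to report, which matches what Steve observed.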


Please let me know what you think.


Regards

Steve

Christian Hoffmann

Mar 16, 2020, 4:21:47 PM
to Steve, Prometheus Users
Hi Steve,

On 3/16/20 9:09 PM, Steve wrote:
> I have been playing with the up metric to see if it is reliable.
>
> So far, I conclude it is not. Do you have the same results?

I guess I would have the same (technical) results, but my conclusion
would be that "up" works as expected and is reliable. ;)

> This is what I have done:
>
>
> - I create the exporter. It runs as a pod.
>
> - At scrape interval T, the Prometheus server discovers it.
>
> - The target is visible and its state is UP (up=1).
>
> - Now I delete the pod.
>
> - At the next scrape interval, T+1, the Prometheus server detects that
> the exporter is down (up=0).
>
> - So far so good… but...
>
> - At the next scrape interval, T+2, the Prometheus server removes the
> target, and I no longer have any information that the target existed
> (the up metric for the target disappears).

This is expected. When your service discovery removes a target,
Prometheus forgets about it and stops scraping it.

You can build alerts for such a situation if it is unexpected in your
case (e.g. "up offset 10m unless up" would alert on all targets
which existed 10 minutes ago but are no longer there).
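To see which series that expression returns, here is a minimal Python sketch of PromQL's `unless` set operator, assuming matching on the job and instance labels that both sides share. Instant vectors are modelled as dicts from label sets to sample values; `unless`, `labels` and the instance addresses are hypothetical stand-ins, not a Prometheus API.

```python
from typing import Dict, FrozenSet, Tuple

# A label set such as {job="exporter", instance="10.0.0.2:9100"}, made hashable.
LabelSet = FrozenSet[Tuple[str, str]]
# An instant vector: one sample value per label set.
InstantVector = Dict[LabelSet, float]


def labels(**pairs: str) -> LabelSet:
    """Build a hashable label set (hypothetical helper)."""
    return frozenset(pairs.items())


def unless(left: InstantVector, right: InstantVector) -> InstantVector:
    """Keep the left-hand elements whose label set has no match on the right."""
    return {ls: value for ls, value in left.items() if ls not in right}


a = labels(job="exporter", instance="10.0.0.1:9100")  # hypothetical addresses
b = labels(job="exporter", instance="10.0.0.2:9100")

up_10m_ago = {a: 1.0, b: 1.0}  # stands in for `up offset 10m`
up_now = {a: 0.0}              # b has been removed by service discovery

missing = unless(up_10m_ago, up_now)
print(sorted(dict(ls)["instance"] for ls in missing))  # ['10.0.0.2:9100']
```

Target `a` is not returned even though its current up value is 0: `unless` only checks whether a matching series exists, not its value, so a target that is down but still discovered needs a separate condition such as `up == 0`. Also, if the instance label is derived from the pod IP, a pod that is merely rescheduled elsewhere would appear as a "missing" old instance as well.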

However, in most cases, such a behavior is wanted. If you remove a
target from a static file service discovery, this is usually by
intention (e.g. because a server has been removed).
If a target is removed by the Kubernetes service discovery, it might
be because the container was destroyed and redeployed elsewhere. In both
cases you would usually want monitoring of the old target to stop.

So, if this behavior is a problem in your case, try describing why and
maybe there is a solution for that. :)

Kind regards,
Christian

Steve

Mar 16, 2020, 9:01:11 PM
to Christian Hoffmann, Prometheus Users
Hi Christian 
Thanks for the suggestion.
I stand corrected.

Regards
Steve