Hello,
I’ve noticed that when Prometheus runs out of storage space, it logs errors (such as "no space left on device") but continues to operate as if it were healthy. Both "/-/healthy" and "/-/ready" return a 200 status code. Furthermore, it sometimes continues to evaluate rules/alerts, and it keeps serving "/api/v1" with potentially incomplete or incorrect data. This condition can persist for some time before anyone detects it.
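To illustrate the gap, here is a minimal sketch of an external check that combines the two endpoints with a free-space test on the data directory. The base URL, the data directory path, the 1 GiB threshold, and all function names are hypothetical and chosen for illustration only; this is not something Prometheus provides.

```python
import shutil
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; report their status code.
        return err.code
    except OSError:
        # Covers URLError, connection errors, and timeouts.
        return None


def free_bytes(path):
    """Return the free space, in bytes, of the filesystem containing path."""
    return shutil.disk_usage(path).free


def assess(healthy_status, ready_status, free, min_free):
    """Return a list of problems; an empty list means nothing was detected."""
    problems = []
    if healthy_status != 200:
        problems.append(f"/-/healthy returned {healthy_status}")
    if ready_status != 200:
        problems.append(f"/-/ready returned {ready_status}")
    if free < min_free:
        problems.append(f"low disk space: {free} bytes free, want {min_free}")
    return problems


def check(base_url, data_dir, min_free=1 << 30):
    """Probe both endpoints and the data directory (assumed values, see above)."""
    return assess(
        probe(base_url + "/-/healthy"),
        probe(base_url + "/-/ready"),
        free_bytes(data_dir),
        min_free,
    )
```

For example, `check("http://localhost:9090", "/prometheus")` would report a disk problem in the situation described above even though both endpoints return 200. It has to run on the host (or in the container) that holds the data directory.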
Is this the intended behavior, that is, does Prometheus deliberately prioritize availability over consistency (even though the data isn’t really distributed)?
I apologize if this topic has been previously discussed. If so, I would appreciate being directed to that conversation.
Regards,