Prometheus hangs without any error log


Ravindra Singh

Aug 11, 2020, 1:47:09 AM

to Prometheus Users

We are using Prometheus operator https://github.com/helm/charts/tree/master/stable/prometheus-operator 0.38.1 and Prometheus 2.18.2.

We are seeing a weird problem with this setup: intermittently, Prometheus's static endpoints such as /-/ready, /-/healthy, and /-/metrics become unresponsive and never recover.

Apparently the kubelet runs its readiness probes against these endpoints and, after some failures, sends a SIGTERM to Prometheus. That is OK, but because of these intermittent restarts we lose metrics for the restart period, roughly 10 minutes each time.
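To pin down exactly when the endpoints stop answering, they can be polled from outside the pod with a hard timeout. The following is only a minimal sketch, not part of our setup: it assumes Prometheus is reachable at http://localhost:9090 (for example through kubectl port-forward), and the names probe and watch are hypothetical helpers made up for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request

# Assumption: Prometheus is reachable here, e.g. via `kubectl port-forward`.
BASE_URL = "http://localhost:9090"
PATHS = ["/-/ready", "/-/healthy"]

# Ignore proxy environment variables, since the target is expected to be local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=3.0):
    """Issue one GET with a hard timeout; return (ok, detail, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with _opener.open(url, timeout=timeout) as resp:
            ok, detail = True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status.
        ok, detail = False, f"HTTP {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # Timeouts, refused connections, resets, malformed responses.
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, detail, time.monotonic() - start


def watch(base_url=BASE_URL, paths=PATHS, interval=5.0, rounds=None):
    """Poll every path each `interval` seconds; loop forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        for path in paths:
            ok, detail, elapsed = probe(base_url + path)
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(f"{stamp} {path} ok={ok} {detail} {elapsed:.2f}s", flush=True)
        done += 1
        time.sleep(interval)
```

Calling watch() in a terminal next to the pod logs gives a timestamp for the first timed-out check, which can then be compared with the last debug log line before the SIGTERM.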

We run 2 replicas of the Prometheus pod, and we have seen the restarts (SIGTERM) hit both pods at the same time.

The CPU and memory requests are 8 cores and 24 GB, and we do not have much load on Prometheus in terms of time series (only 307K).

Attaching the debug-mode logs of a pod that received a SIGTERM, along with the Prometheus configuration.

It's been a headache for us; any help is highly appreciated.

Thanks

Ravindra Singh

prometheus_config.yaml

prometheus.log
