Prometheus hangs without any error log

Ravindra Singh

Aug 11, 2020, 1:47:09 AM
to Prometheus Users
We are using Prometheus Operator 0.38.1 (https://github.com/helm/charts/tree/master/stable/prometheus-operator) and Prometheus 2.18.2.

We are seeing a strange problem with this setup: intermittently, Prometheus's static endpoints such as /-/ready, /-/healthy, and /metrics become unresponsive and never recover.
Apparently the kubelet runs its probes against these endpoints and, after several failures, sends a SIGTERM to Prometheus. That in itself is fine, but because of these intermittent restarts we lose metrics for the duration of each restart, roughly 10 minutes.
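
One way to pin down when the endpoints stop answering, independently of the kubelet probes, is to poll them from outside with a short timeout. The following is only a rough sketch under assumptions that are not from the setup above: the function names are made up for illustration, and the base URL (for example http://localhost:9090 through a port-forward) has to be supplied by whoever runs it.

```python
import socket
import urllib.error
import urllib.request

# Paths taken from the description above; adjust as needed.
PATHS = ("/-/healthy", "/-/ready", "/metrics")


def probe(base_url, path, timeout=3.0):
    """Return the HTTP status code as a string, or a short failure label."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server did answer, just not with a 2xx (e.g. 503 while not ready).
        return str(exc.code)
    except OSError as exc:
        # URLError wraps the underlying error in .reason; a read timeout
        # is raised directly as socket.timeout.
        reason = getattr(exc, "reason", exc)
        if isinstance(reason, socket.timeout):
            return "timeout"
        return "unreachable"


def probe_all(base_url, paths=PATHS, timeout=3.0):
    """Probe every path once and return {path: result}."""
    return {path: probe(base_url, path, timeout) for path in paths}


def unresponsive(results):
    """List the paths that produced no HTTP response at all."""
    return [path for path, result in results.items() if not result.isdigit()]
```

Calling `probe_all("http://localhost:9090")` in a loop and logging the result with a timestamp would show whether all endpoints hang together and whether the hang begins before the first probe failure that the kubelet reports.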

We run 2 replicas of the Prometheus pod, and both pods have been restarted (sent a SIGTERM) at the same time.
The CPU and memory requests are 8 cores and 24 GB, and Prometheus is not under much load in terms of time series (only 307K).


Attaching the debug-mode logs of a pod that received a SIGTERM, along with the Prometheus configuration.

It's been a headache for us, so any help is highly appreciated.

Thanks 
Ravindra Singh
prometheus_config.yaml
prometheus.log