Dear friends,
I'm experiencing a very strange situation with Prometheus on two identical servers.
We are running Prometheus v1.7.1 on two VMs (Debian 8.7, 4 cores, 32GB RAM) with local SSD storage and 168h retention. Prometheus runs in Docker (official image) with no resource limits.
Command-line flags: -storage.local.retention=168h -storage.local.target-heap-size=22906492245
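For reference, the heap target above works out to two thirds of physical memory. A quick sanity check of that arithmetic, assuming the VM has exactly 32 GiB of RAM (the variable names are only for illustration):

```python
# Assumption: "32GB RAM" means exactly 32 GiB (32 * 1024^3 bytes).
ram_bytes = 32 * 1024**3

# Two thirds of physical memory, rounded down to whole bytes.
target_heap = ram_bytes * 2 // 3

# Value to compare against -storage.local.target-heap-size.
print(target_heap)
```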
CPU usage is constantly at 100%. I have tried the following:
1) Removed all targets except Prometheus itself
2) Removed all recording rules
3) Made sure there's just one dashboard being accessed (Prometheus stats, refreshed by me every few minutes)
4) Performed a clean shutdown (kill -SIGTERM) and started again; CPU spiked to 100% and stayed there
The only thing I haven't tried is purging the data dir (I'd rather not lose my history).
I can't quite understand what's going on. Based on a comment here, I ran a profile on the running instance (svg and pprof attached). I couldn't find any clues, but I don't have experience with Go profiling.
Any ideas?
Danny