Compaction process almost caused an OOM kill


Shay Berman

Jan 11, 2021, 3:18:00 PM
to Prometheus Users
During compaction, Prometheus's memory spiked from 5GB to 20GB, according to the process_resident_memory_bytes metric. Since the Prometheus container has a 21GB memory limit (the VM has 23GB of RAM), we almost hit an OOM kill.
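
To put those numbers in perspective, here is a quick headroom calculation using the figures quoted above. It assumes "GB" means decimal gigabytes (10**9 bytes); the post does not say whether GB or GiB is meant.

```python
# Figures quoted in the post; units assumed to be decimal GB (10**9 bytes).
GB = 10**9

baseline_rss = 5 * GB       # process_resident_memory_bytes before compaction
peak_rss = 20 * GB          # process_resident_memory_bytes during compaction
container_limit = 21 * GB   # memory limit on the Prometheus container
vm_ram = 23 * GB            # physical RAM of the VM


def headroom(peak: int, limit: int) -> tuple[int, float]:
    """Return the bytes left below the limit and the fraction of the limit used."""
    return limit - peak, peak / limit


remaining, used = headroom(peak_rss, container_limit)
print(f"spike factor: {peak_rss / baseline_rss:.1f}x")
print(f"remaining below limit: {remaining / GB:.1f} GB ({used:.1%} of limit used)")
```

With these figures the spike is 4x the baseline and leaves only about 1 GB (under 5%) below the container limit.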

Full details about the issue (spike graphs and environment details) can be found here: https://github.com/prometheus/prometheus/issues/8357

Questions:
1. Is process_resident_memory_bytes the right metric for monitoring and alerting on Prometheus memory usage, or should we use the Kubernetes metric container_memory_working_set_bytes for the Prometheus pod?

2. Can process_resident_memory_bytes go above the VM's physical RAM, or will the OOM kill hit first? (Is any of the memory counted in this metric evictable by the kernel to avoid an OOM kill?)

3. Given that Prometheus compaction can cause huge memory spikes (as described above, and as in issue1 and issue2), is there a way to avoid such spikes during compaction (e.g. by tuning Prometheus settings or by increasing the instance's RAM)?
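
For questions 1 and 2 it may help to see what each metric is built from. This is a minimal sketch, assuming Linux and a readable /proc and cgroup filesystem. process_resident_memory_bytes corresponds roughly to the process's resident set (VmRSS), which the kernel (4.5 and later) splits into RssAnon, RssFile and RssShmem in /proc/<pid>/status. container_memory_working_set_bytes, as computed by cAdvisor, is the cgroup's memory usage minus its inactive file-backed pages. Clean file-backed pages (for example memory-mapped TSDB block files) can be reclaimed by the kernel under pressure, while anonymous memory cannot be reclaimed without swap. The helper names and the cgroup v2 path used here are illustrative assumptions, not part of Prometheus or cAdvisor.

```python
"""Sketch: split a process's RSS into anonymous vs file-backed parts and
compute a cAdvisor-style working set from a cgroup memory.stat file.

Assumptions (not from the original post): Linux >= 4.5 for the RssAnon /
RssFile / RssShmem fields, and cgroup v2 file names (memory.current, and
memory.stat with an "inactive_file" key). On cgroup v1 the equivalents are
memory.usage_in_bytes and the "total_inactive_file" key in memory.stat.
"""
from pathlib import Path


def parse_status_kb(text: str) -> dict[str, int]:
    """Parse 'Key:   1234 kB' lines from /proc/<pid>/status into bytes."""
    out: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            out[key] = int(parts[0]) * 1024
    return out


def rss_breakdown(status_text: str) -> dict[str, int]:
    """Return total RSS and its components; clean RssFile pages are reclaimable."""
    s = parse_status_kb(status_text)
    return {k: s.get(k, 0) for k in ("VmRSS", "RssAnon", "RssFile", "RssShmem")}


def working_set(usage_bytes: int, memory_stat_text: str,
                key: str = "inactive_file") -> int:
    """cAdvisor-style working set: usage minus inactive file pages, floored at 0."""
    stats: dict[str, int] = {}
    for line in memory_stat_text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[name] = int(value)
    return max(usage_bytes - stats.get(key, 0), 0)


if __name__ == "__main__":
    status_path = Path("/proc/self/status")
    if status_path.exists():
        print(rss_breakdown(status_path.read_text()))
    cg = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount, read from inside the container
    if (cg / "memory.current").exists() and (cg / "memory.stat").exists():
        usage = int((cg / "memory.current").read_text())
        print("working set:", working_set(usage, (cg / "memory.stat").read_text()))
```

Comparing RssAnon against RssFile during a compaction spike would show how much of the RSS growth is reclaimable page cache rather than heap.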
    
Thanks 
Shay