Compaction process almost caused OOM Kill

Shay Berman

Jan 11, 2021, 3:18:00 PM

to Prometheus Users

During compaction, Prometheus's memory spiked from 5GB to 20GB according to the process_resident_memory_bytes metric. Since the Prometheus container has a 21GB memory limit (the VM has 23GB of RAM), we almost hit an OOM kill.
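To put the numbers above in proportion, here is a small sketch of the headroom arithmetic. It uses only the figures quoted in this post, treats "GB" as 10**9 bytes (an assumption; the graphs may use GiB, which changes the absolute margin but not the ratios), and treats the process RSS as if it were exactly what the container limit enforces, which is an approximation and part of what question 1 below asks about.

```python
# Headroom arithmetic for the figures quoted in the post.
# Assumption: "GB" means 10**9 bytes; ratios are unaffected if GiB was meant.
GB = 10**9

baseline_rss = 5 * GB      # process_resident_memory_bytes before compaction
peak_rss = 20 * GB         # process_resident_memory_bytes during compaction
container_limit = 21 * GB  # memory limit of the Prometheus container
vm_ram = 23 * GB           # physical RAM of the VM


def headroom(peak: int, limit: int) -> tuple[float, int]:
    """Return (fraction of the limit used at peak, bytes left before the limit)."""
    return peak / limit, limit - peak


spike_factor = peak_rss / baseline_rss
used_fraction, margin = headroom(peak_rss, container_limit)

print(f"spike factor: {spike_factor:.1f}x")
print(f"peak uses {used_fraction:.1%} of the container limit")
print(f"margin before limit: {margin / GB:.1f} GB")
```

In other words, memory grew about fourfold and left only around 1GB below the container limit.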

Full details about the issue (spike graphs and environment details) can be found at https://github.com/prometheus/prometheus/issues/8357

Questions:

1. Is process_resident_memory_bytes the right metric for monitoring and alerting on Prometheus memory usage, or should we use the Kubernetes pod metric container_memory_working_set_bytes for the Prometheus pod?

2. Can process_resident_memory_bytes go above the VM's physical RAM, or will the OOM kill hit first? (Is any of the memory counted in this metric reclaimable by the kernel, so that an OOM kill can be avoided?)

3. Assuming Prometheus compaction can cause large memory spikes (as described above, and as in issue1 and issue2), is there a way to avoid such spikes during compaction (e.g. by tuning Prometheus settings or by increasing the instance's RAM)?

Thanks

Shay