Prometheus High RAM Investigation

Shubham Shrivastav

Jan 26, 2022, 12:57:55 AM
to Prometheus Users
Hi all, 

I've been investigating Prometheus memory utilization over the last couple of days.

Based on the pprof output, I see a lot of memory used by the getOrSet function, but according to the docs it is only involved in creating new series, so I'm not sure what I can do about it.


Pprof "top" output: 
https://pastebin.com/bAF3fGpN
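
For context, here is a minimal sketch of how a profile like this can be collected, assuming Prometheus is reachable on localhost:9090 (this is standard Go pprof usage, nothing specific to our setup):

  # fetch the live heap profile from Prometheus and open the interactive pprof shell
  go tool pprof http://localhost:9090/debug/pprof/heap
  # then, at the pprof prompt:
  (pprof) top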

Also, to figure out whether there are any metrics I can remove, I ran ./tsdb analyze (output here: https://pastebin.com/twsFiuRk).

I did find some metrics with higher cardinality than others, but the differences were not massive.
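
For anyone else checking cardinality, this is the kind of query I've been cross-checking with in the Prometheus expression browser; it can be expensive on a large server, so treat it as a sketch:

  # top 10 metric names by active series count
  topk(10, count by (__name__) ({__name__=~".+"}))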

With ~100 nodes, Prometheus uses around 15 GiB of RAM.

We're seeing an average of 8,257 metrics per node.


We expect to grow to around 200 nodes, which will push RAM usage through the roof.
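
A rough back-of-envelope with those numbers (assuming they're representative and ignoring query and compaction overhead):

  100 nodes x 8,257 metrics ≈ 825,700 active series
  15 GiB / 825,700 series   ≈ 19 KiB per series
  200 nodes ≈ 1.65M series  ≈ 30 GiB at the same per-series cost

so memory looks like it will scale roughly linearly with node count unless something changes.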

Apart from distributing our load over multiple Prometheus nodes, are there any alternatives?

TIA,
Shubham

Ben Kochie

Jan 26, 2022, 1:55:11 AM
to Shubham Shrivastav, Prometheus Users
How are you getting the value of 15GiB?

What do you get for process_resident_memory_bytes and go_memstats_alloc_bytes for Prometheus?
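
For example, assuming Prometheus is listening on localhost:9090, both can be read straight off its own /metrics endpoint (or queried in the expression browser if Prometheus scrapes itself):

  curl -s http://localhost:9090/metrics | grep -E '^(process_resident_memory_bytes|go_memstats_alloc_bytes) '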


Shubham Shrivastav

Jan 27, 2022, 4:29:13 PM
to Prometheus Users
Hi Ben,

The Prometheus container was restarted due to OOM and currently has fewer targets (~6). That's probably why the numbers below look low, but the metrics pulled per target are the same; I've been trying to recognize the pattern from this reduced setup.
Below are the metrics you requested:

process_resident_memory_bytes{instance="localhost:9090", job="prometheus"} 1536786432

go_memstats_alloc_bytes{instance="localhost:9090", job="prometheus"} 908149496
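
(For readability: 1,536,786,432 bytes is roughly 1.43 GiB resident, and 908,149,496 bytes is roughly 866 MiB of live Go heap, i.e. about 60% of the RSS is currently live Go allocations.)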

