Prometheus OOM, Memory Sizing


Random Person

May 25, 2022, 3:56:51 AM
to Prometheus Users
I am attempting to use the memory calculator formula from https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion as we have been getting OOM-killed.

Is there a query that will provide the number of unique label pairs and the average bytes per label pair? Looking at the stats and trying to get this data out of our Prometheus has been quite difficult so far.

Thanks!

l.mi...@gmail.com

May 25, 2022, 5:39:30 AM
to Prometheus Users
In general this is a tricky topic; the linked blog post helped me in the past, but it won't give you a magic number that's always true.
What I have found is that you really need to worry about one thing: the number of time series scraped.
With that in mind you can calculate how much memory is needed per time series with a simple query: go_memstats_alloc_bytes / prometheus_tsdb_head_series.
This gives you the per-time-series memory cost before GC. Unless you set a custom GOGC env variable for your Prometheus instance, you usually need to double that to get the RSS memory cost.
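For example, to sanity-check that estimate against what the kernel actually sees (process_resident_memory_bytes comes from the process metrics Prometheus exposes about itself; add label matchers if you scrape more than one Prometheus):

  # estimated RSS per head series: roughly 2x the pre-GC heap cost (default GOGC)
  2 * go_memstats_alloc_bytes / prometheus_tsdb_head_series

  # actual RSS per head series, to compare against the estimate
  process_resident_memory_bytes / prometheus_tsdb_head_series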
Then we need to add other memory costs to the mix, and these are less easy to quantify: other parts of Prometheus use memory too, and queries will eat more or less memory depending on how complex they are, so it gets more fuzzy from there. But in general memory usage will scale with (since it's mostly driven by) the number of time series you have in Prometheus, and prometheus_tsdb_head_series tells you that.

Now another complication is that all time series stay in memory for a while, even if you scrape them only once. If you plot prometheus_tsdb_head_series over a range of a few hours you should see it go down every now and then: there is a garbage collection of metrics that happens (which you can see in the logs), and in-memory data is also written out as blocks every 2h by default (AFAIR). This is an important thing to remember - if you have a lot of "event-like" metrics that are exported only for a few seconds, for example because some services put things like user IDs or request paths into labels so the label set keeps changing, then all of those series accumulate in memory until the GC / block write happens. Again, prometheus_tsdb_head_series will show you that - if it just keeps growing all the time, then so will your memory usage.
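If you want to put a rough number on that churn, a sketch (assuming your Prometheus version exposes prometheus_tsdb_head_series_created_total, a counter of every series ever added to the head):

  # approximate rate of new series being created (series churn)
  rate(prometheus_tsdb_head_series_created_total[5m])

  # head series count over time; it should drop at head GC / block writes
  prometheus_tsdb_head_series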

tl;dr keep an eye on prometheus_tsdb_head_series and you'll see how many time series you're able to fit into your instance
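And if you want to turn that into a very rough capacity number, something like this (the 64 is just a placeholder for your memory limit in GB, and it reuses the 2x GC factor from above, so treat it as a ballpark only):

  # roughly how many head series would fit in ~64GB of RSS
  (64 * 1024 * 1024 * 1024) / (2 * go_memstats_alloc_bytes / prometheus_tsdb_head_series)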

Random Person

May 25, 2022, 8:45:09 AM
to Prometheus Users
Thanks for this info; so, if I am understanding correctly:

prometheus_tsdb_head_series returns 6,181,063; when graphing this it looks almost vertical.

go_memstats_alloc_bytes / prometheus_tsdb_head_series returns 6741

Doing the calculation on that comes out to roughly 41GB, prior to any other memory utilization requirements of Prometheus. You then double it, so it looks more like 82GB of memory plus allocation for Prometheus overhead. Does this seem about accurate?
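Spelling out the arithmetic: 6,181,063 series * 6,741 bytes/series ≈ 41.7e9 bytes, i.e. roughly 41GB of pre-GC heap, which doubled gives the ~82-83GB RSS figure.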

Thanks!


l.mi...@gmail.com

May 25, 2022, 8:55:34 AM
to Prometheus Users
Sounds about right.
I do have a number of instances with 6-6.4M time series and they use around 20-35GB of go_memstats_alloc_bytes and around 66-96GB of RSS memory.
That's all with GOGC=40.
go_memstats_alloc_bytes / prometheus_tsdb_head_series is between 4-8KB on most of them.

Ben Kochie

May 25, 2022, 9:50:22 AM
to l.mi...@gmail.com, Prometheus Users
Those numbers seem to be well within the expected normal range.

One thing we noticed is that the Kubernetes kubelet/cAdvisor exposes a lot of less-useful/duplicate metrics.

This is the metric filter we ended up with for "/metrics/cadvisor":

https://gist.github.com/SuperQ/d61e405e3464da4a766808f6329c5c1e
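In Prometheus terms, a filter like that is typically a metric_relabel_configs drop rule on the cAdvisor scrape job; a sketch of the shape (the metric names in the regex are only illustrative - see the gist for the actual, much longer list):

  # under the kubelet/cAdvisor scrape_config
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: "container_tasks_state|container_memory_failures_total"
      action: drop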
