Scaling Prometheus


kvr

Oct 11, 2020, 1:14:18 AM
to Prometheus Users
Hello,

We are hitting some limits with our current setup of Prometheus. I have read a lot of posts here as well as blogs and videos but still need some guidance.

Our current setup is at its limit. Head series count is regularly around 15M during pod churn. Each app exports between 5,000 and 8,000 metric series, so 1,000 pods cause about 8M new series in the head block.
Prometheus currently has access to 300 GB of memory, but in practice it can't use more than 200 GB; it starts degrading around the 150 GB mark.
- Scrape time for Prometheus scraping itself is 5+ seconds and config reloads fail.
- We verified that this is not a cardinality explosion from a misbehaving app; the degradation is simply due to load.
- We eliminated bad queries as a cause by spinning up an additional Prometheus which just scrapes targets and nothing else. So the bottleneck is just ingestion. 

So the next step for us is to shard and run namespace-level Prometheus instances. But I expect a similar level of usage at the namespace level again in about a year, with multiple apps in a single namespace scaling to 1000s of pods exporting 5K metrics each. And I won't be able to shard further, because I don't want to go below namespace granularity.

How have others dealt with this situation, where the bottleneck is going to be ingestion and not queries?

Thanks for your time,
KVR

Ben Kochie

Oct 11, 2020, 3:08:27 AM
to kvr, Prometheus Users
If all of the 1000s of pods in a namespace are instances of the same thing, you can use the hashmod feature to scale horizontally.

You can have several Prometheus instances per namespace, each responsible for a fraction of the pods.

Just to be sure, are you keeping up to date on the latest releases? 200 GB of memory seems like a lot for 15M series.

Are you using Thanos or a remote write service?


kvr

Oct 11, 2020, 4:14:20 PM
to Prometheus Users

There are different services, and each could scale to 1000+ pods in a given namespace.
But even then, managing a Prometheus instance pair per set of apps is not tenable; the management overhead would be too great when there are several such apps.

Version-wise, we are keeping up, but not aggressively.
We are on 2.18.2, and the instance under test does not have Thanos. It only scrapes and does some rule evaluation (memory usage is the same even with rule evaluation disabled).
We are using the Prometheus Operator to reload config.

Yeah, I read that ~2 GB of memory is sufficient per million series, so I am surprised that it consumes such a large amount. Would having diverse scrape intervals have such an effect?

Our stats at peak:
~15M head series
~45M head chunks
~475K samples/s ingested
~7000 pods scraped

Thanks!

l.mi...@gmail.com

Oct 12, 2020, 5:29:21 AM
to Prometheus Users
I found the formula from https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion to be pretty accurate for estimating memory usage. It doesn't cover querying, but in my experience the memory needed for queries usually hides in the extra "double for GC" multiplication.
Try running your numbers there and see if it reflects what you are seeing.
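As a back-of-envelope check, the rules of thumb mentioned in this thread (~2 GB per million head series, doubled for Go GC headroom) can be run against the reported peak numbers. This is a rough sketch, not the article's exact formula; the constants are the thread's own estimates:

```python
# Rough memory estimate from rules of thumb mentioned in this thread:
# ~2 GB resident per million head series, doubled for GC headroom.
# NOT the Robust Perception formula; use their calculator for a
# careful estimate covering ingestion rate and retention.

def estimate_memory_gb(head_series_millions: float,
                       gb_per_million: float = 2.0,
                       gc_factor: float = 2.0) -> float:
    """Return a ballpark resident-memory figure in GB."""
    return head_series_millions * gb_per_million * gc_factor

# The thread's peak: ~15M head series.
print(estimate_memory_gb(15))  # -> 60.0
```

By this rule of thumb ~15M series should need on the order of 60 GB, well below the 150-200 GB observed, which is consistent with the suggestion that a version-specific issue rather than raw cardinality is at fault.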

Ben Kochie

Oct 12, 2020, 5:55:14 AM
to kvr, Prometheus Users
Thanks, knowing what Prometheus version you're on helps a lot. There are two things that will help setups like yours quite a lot.

First, Prometheus 2.19 introduced memory management improvements that mostly eliminate memory growth from pod churn. It also greatly improves memory use for high scrape frequencies.

Second, 2.18.2 was the first official Prometheus version built with Go 1.14, which introduced an issue that affected compression, and hence the memory use, of Prometheus. See https://github.com/prometheus/prometheus/pull/7976.

Once 2.22.0 is out, upgrading would be highly recommended.

You might want to look at this Prometheus Operator issue about hashmod sharding:
https://github.com/prometheus-operator/prometheus-operator/issues/2590
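For context, the approach discussed in that issue later landed as a shards field on the Operator's Prometheus custom resource, which automates the hashmod relabelling across per-shard StatefulSets. A hedged sketch (the field was not yet released at the time of this thread, and the values are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  shards: 4     # targets are hash-distributed across 4 shards
  replicas: 2   # HA pair per shard
```

Until that field is available, the manual hashmod relabelling described earlier achieves the same effect.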

Karthik Vijayaraju

Oct 13, 2020, 7:23:53 AM
to Ben Kochie, Prometheus Users
Thank you! 
I will try this out with a newer version and experiment with hashmod.

Aliaksandr Valialkin

Oct 17, 2020, 4:12:46 AM
to Karthik Vijayaraju, Prometheus Users
Hi Karthik,

There is another option: substitute Prometheus with the VictoriaMetrics stack, which includes vmagent for data scraping and vmalert for alerting and recording rules. It is optimized for high load, so it should require fewer resources than Prometheus. See, for example, this case study.



--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

Karthik Vijayaraju

Oct 20, 2020, 1:33:12 PM
to Aliaksandr Valialkin, Prometheus Users
Hi Aliaksandr,

Thank you! Those numbers look interesting; we will give it a shot as well. 

Thanks,
Karthik