Resource limits


Tomer Leibovich

unread,
Jun 17, 2020, 3:34:55 AM6/17/20
to Prometheus Users
I’m using the Prometheus Operator in my cluster and ran into an issue where the Prometheus pod consumed 20GB of RAM once the cluster grew to 400 pods. Eventually Prometheus choked the server and I had to terminate it.
How much memory should I allocate to the pod to keep it running and stop it from growing like that?

Stuart Clark

unread,
Jun 17, 2020, 3:37:39 AM6/17/20
to Tomer Leibovich, Prometheus Users
The amount of memory needed depends on the scrape interval, number of
timeseries being ingested and the query load.

--
Stuart Clark
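
To see where you currently stand, Prometheus exposes its own head-series count as a metric. Querying the server about itself gives you the two numbers that drive memory use (these are standard TSDB metrics; no setup assumed beyond Prometheus self-scraping, which the operator enables by default):

```promql
# Current number of series held in the TSDB head block
prometheus_tsdb_head_series

# Approximate ingestion rate (samples/second) over the last 5 minutes
rate(prometheus_tsdb_head_samples_appended_total[5m])
```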

Tomer Leibovich

unread,
Jun 17, 2020, 3:48:09 AM6/17/20
to Prometheus Users
Thanks, so if I cannot reduce the number of pods, is it better to change the scrape interval from the default of 30s to 60s?

Brian Candler

unread,
Jun 17, 2020, 4:29:44 AM6/17/20
to Prometheus Users
There's a calculator here:
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion

You can see from this how much difference increasing the scrape interval would make.
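
Since you're running the Prometheus Operator, the interval is set on the Prometheus custom resource rather than in prometheus.yml directly. A minimal sketch (the `name` and `namespace` here are placeholders; the field comes from the `monitoring.coreos.com/v1` API):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s            # placeholder name
  namespace: monitoring
spec:
  scrapeInterval: 60s  # default is 30s; doubling it roughly halves samples/s ingested
```

Note that individual ServiceMonitors can override this with their own `interval`, so check those too.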

Ben Kochie

unread,
Jun 17, 2020, 7:24:15 AM6/17/20
to Tomer Leibovich, Prometheus Users
The standard approach for larger setups is to start sharding Prometheus. In Kubernetes it's common to have a Prometheus-per-namespace.

You may also want to look into how many metrics each of your pods is exposing. 20GB of memory suggests you probably have over 1M series in prometheus_tsdb_head_series.
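
To find out which metrics dominate, a standard cardinality query run against the Prometheus server is (adjust the 10 as needed):

```promql
# Top 10 metric names by number of series
topk(10, count by (__name__) ({__name__=~".+"}))

# Series count per scrape job, to see which targets contribute most
count by (job) ({__name__=~".+"})
```

These are expensive queries on a large server, so run them once rather than on a dashboard refresh.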

Changing the scrape interval is probably not going to help as much as reducing your cardinality per Prometheus.

For example, we run a couple of different shards. One uses 33GB of memory for 1.5M series; the other uses 38GB for 2.5M series. We allocate 64GB memory instances for these servers.

If you don't want to go down the sharding route, you'll likely need some larger nodes to run Prometheus on.
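
In the meantime, to keep the pod from taking the whole node down again, you can set resource requests and limits on the Prometheus custom resource and the operator will pass them through to the pod. A sketch (the figures are only illustrative; size them from observed usage plus headroom, e.g. via the calculator linked above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s  # placeholder name
spec:
  resources:
    requests:
      memory: 20Gi
    limits:
      memory: 24Gi  # illustrative; leave headroom above observed peak usage
```

Be aware that a limit doesn't reduce what Prometheus needs; if the working set exceeds it, the pod is OOM-killed rather than the node.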

On Wed, Jun 17, 2020 at 9:48 AM Tomer Leibovich <tomer.l...@gmail.com> wrote:
> Thanks, so if I cannot reduce the amount of pods, it’s better to change the scraper interval from default of 30s to 60s?
