Are there any limits for Prometheus monitoring?


wang dong

Jun 24, 2020, 9:23:51 AM
to Prometheus Users
Hi Prometheus experts,

We have a production cluster with 5 masters and 20 workers, and we run our service in this cluster.
We installed Prometheus 2.8.0 with a Helm chart.
After about a year of running, the Prometheus pod has recently kept getting OOM-killed. From the Prometheus stats dashboard,
we see a peak RSS of 20 GB when clients access our service.
We have kept increasing the memory again and again. The container's limits are now 32 GB of memory and 1 CPU.

I am not sure how much further we will have to increase the resources, but 32 GB is really big for a pod/container.


So I wonder: is this a limit of Prometheus, and have we hit it? Or are there any best practices we should follow
to keep our service available to our clients? Thanks in advance.

Stuart Clark

Jun 24, 2020, 9:41:36 AM
to wang dong, Prometheus Users
Memory usage is driven both by the targets you scrape and by the queries
you perform.

To reduce the memory used for scraping, lengthen the scrape interval
(scrape less often) or reduce the number of targets/metrics being ingested.

To reduce query memory, look at your recording rules and API queries:
a query that has to process many time series or a long time range will
use more memory.
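To illustrate the recording-rule point: a rule can precompute an expensive aggregation once per evaluation interval, so dashboards query one cheap series per group instead of re-aggregating thousands of raw series on every refresh. A minimal sketch (the rule name and metric are illustrative, not from this thread):

```yaml
# Hypothetical rules file, loaded via `rule_files:` in prometheus.yml.
# Precomputes per-job CPU usage so dashboards read one series per job
# instead of running the sum/rate over all raw series on each refresh.
groups:
  - name: example-recording-rules
    interval: 1m
    rules:
      - record: job:container_cpu_usage_seconds:rate5m
        expr: sum by (job) (rate(container_cpu_usage_seconds_total[5m]))
```

Dashboards would then graph job:container_cpu_usage_seconds:rate5m directly.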

Julien Pivotto

Jun 26, 2020, 4:06:10 AM
to Stuart Clark, wang dong, Prometheus Users

1 CPU is also too low; we generally expect 1 CPU to be available for the
TSDB itself.
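In Kubernetes terms that means giving the Prometheus container more than one CPU. A sketch of what the container's resource section might look like (the numbers are purely illustrative and depend on your series count and query load, not a recommendation from this thread):

```yaml
# Illustrative resources for the Prometheus container in the pod spec.
# Request what Prometheus needs steadily; leave limits at or above that.
resources:
  requests:
    cpu: "2"
    memory: 24Gi
  limits:
    cpu: "2"
    memory: 32Gi
```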

--
Julien Pivotto
@roidelapluie

Ben Kochie

Jun 26, 2020, 4:13:41 AM
to Stuart Clark, wang dong, Prometheus Users
I would also recommend upgrading to 2.19.1. There have been many memory optimizations since 2.8.0, including the new head chunk mmap code, which largely removes the need to scrape less frequently just to save memory.
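Whether upgrading or not, it helps to check where the memory is actually going. These PromQL queries against Prometheus's own metrics (a rough sketch; the match-everything selector in the second query is itself expensive, so run it sparingly) show the head series count and the biggest cardinality offenders:

```promql
# Number of series currently in the TSDB head (a good memory proxy).
prometheus_tsdb_head_series

# Top 10 metric names by series count, to spot cardinality offenders.
topk(10, count by (__name__) ({__name__=~".+"}))
```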


Adso Castro

Jul 15, 2020, 4:40:15 PM
to Prometheus Users
Seconded. Clark's right. My current scenario is much the same (pod memory keeps floating between 12 and 16 GB), and I'm currently working on identifying why I have so many targets (17k at the moment) and writing all the recording rules I can to relieve the stress from dashboards and rules.

Stuart Clark

Jul 15, 2020, 4:48:35 PM
to Adso Castro, Prometheus Users
17k targets sounds like a lot for a single Prometheus server. It is worth
looking at sharding into multiple Prometheus instances, for example by
namespace or by function.
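Namespace sharding can be done with a `keep` relabel rule, so each Prometheus instance only scrapes targets from its assigned namespaces. A minimal sketch (the job name and namespaces are hypothetical):

```yaml
# Hypothetical scrape config for one shard. A second Prometheus
# instance would run the same config with a different namespace regex.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only targets in this shard's namespaces; drop the rest.
      - source_labels: [__meta_kubernetes_namespace]
        regex: "(team-a|team-b)"
        action: keep
```

For sharding that doesn't follow any label, a `hashmod` relabel action on `__address__` followed by a `keep` on the resulting bucket achieves the same split.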


Adso Castro

Jul 15, 2020, 5:08:04 PM
to Prometheus Users
Good to know. I was actually thinking about sharding by namespace. Do you know of any docs or guides on how to do that? I'm running Prometheus Operator, by the way, and they're still implementing a sharding mechanism (https://github.com/coreos/prometheus-operator/pull/3241).