Prometheus RAM usage investigation


Victor H

Jan 24, 2023, 4:20:44 AM
to Prometheus Users
Hi,

We are running multiple Prometheus instances in Kubernetes (deployed using the Prometheus Operator) and hope that someone can help us understand why the RAM usage of a few of our instances is unexpectedly high (we suspect cardinality, but we're not sure where to look).

In Prometheus A, we have the following stats:

Number of Series: 56486
Number of Chunks: 56684
Number of Label Pairs: 678

tsdb analyze has the following result:

/bin $ ./promtool tsdb analyze /prometheus/
Block ID: 01GQGMKZAF548DPE2DFZTF1TRW
Duration: 1h59m59.368s
Series: 56470
Label names: 26
Postings (unique label pairs): 678
Postings entries (total label pairs): 338705

This instance uses roughly 4-5 GB of RAM (measured by Kubernetes).

From our reading, each time series should use around 8 KB of RAM, so 56k series should be using a mere ~500 MB.
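
As a rough sanity check on that estimate, I was planning to divide the resident memory by the head series count using Prometheus's own metrics. A sketch (the address is a placeholder for our instance, and it assumes the instance scrapes itself):

```sh
# Rough bytes-per-series figure from Prometheus's own metrics
# (http://prometheus-a:9090 is a placeholder; assumes self-scraping)
PROM=http://prometheus-a:9090
curl -s "$PROM/api/v1/query" \
  --data-urlencode 'query=process_resident_memory_bytes / prometheus_tsdb_head_series' \
  | jq -r '.data.result[0].value[1] // "no data"'
```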

On a different Prometheus instance (let's call it Prometheus Central) we have 1.1M series and it's using 9-10 GB, which is roughly what we'd expect.

We're puzzled by this instance and we believe cardinality is the cause; we have a lot more targets in Prometheus A. I also note that the postings entries (total label pairs) figure is 338k, but I'm not sure where to look to investigate this.

The top entries from tsdb analyze are at the bottom of this post. The "most common label pairs" entries have alarmingly high counts; I wonder if this contributes to the high "total label pairs" and consequently the higher-than-expected RAM usage.
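
To get a feel for how much churn we actually have, I'm also thinking of watching the rate at which new head series are created; another sketch against the same placeholder address:

```sh
# New head series created per second over the last hour; a sustained high
# rate indicates churn (placeholder address; assumes self-scraping)
PROM=http://prometheus-a:9090
curl -s "$PROM/api/v1/query" \
  --data-urlencode 'query=rate(prometheus_tsdb_head_series_created_total[1h])' \
  | jq -r '.data.result[0].value[1] // "no data"'
```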

When calculating the expected RAM usage, is "total label pairs" the number we should use rather than "total series"?

Thanks,
Victor


Label pairs most involved in churning:
296 activity_type=none
258 workflow_type=PodUpdateWorkflow
163 __name__=temporal_request_latency_bucket
104 workflow_type=GenerateSPVarsWorkflow
95 operation=RespondActivityTaskCompleted
89 __name__=temporal_activity_execution_latency_bucket
89 __name__=temporal_activity_schedule_to_start_latency_bucket
65 workflow_type=PodInitWorkflow
53 operation=RespondWorkflowTaskCompleted
49 __name__=temporal_workflow_endtoend_latency_bucket
49 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
49 __name__=temporal_workflow_task_execution_latency_bucket
49 __name__=temporal_workflow_task_replay_latency_bucket
39 activity_type=UpdatePodConnectionsActivity
38 le=+Inf
38 le=0.02
38 le=0.1
38 le=0.001
38 activity_type=GenerateSPVarsActivity
38 le=5

Label names most involved in churning:
734 __name__
734 job
724 instance
577 activity_type
577 workflow_type
541 le
177 operation
95 datname
53 datid
31 mode
29 namespace
21 state
12 quantile
11 container
11 service
11 pod
11 endpoint
10 scrape_job
4 alertname
4 severity

Most common label pairs:
23012 activity_type=none
20060 workflow_type=PodUpdateWorkflow
12712 __name__=temporal_request_latency_bucket
8092 workflow_type=GenerateSPVarsWorkflow
7440 operation=RespondActivityTaskCompleted
6944 __name__=temporal_activity_execution_latency_bucket
6944 __name__=temporal_activity_schedule_to_start_latency_bucket
5100 workflow_type=PodInitWorkflow
4140 operation=RespondWorkflowTaskCompleted
3864 __name__=temporal_workflow_task_replay_latency_bucket
3864 __name__=temporal_workflow_endtoend_latency_bucket
3864 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
3864 __name__=temporal_workflow_task_execution_latency_bucket
3080 activity_type=UpdatePodConnectionsActivity
3004 le=0.5
3004 le=0.01
3004 le=0.1
3004 le=1
3004 le=0.001
3004 le=0.002

Label names with highest cumulative label value length:
8312 scrape_job
4279 workflow_type
3994 rule_group
2614 __name__
2478 instance
1564 job
434 datname
248 activity_type
139 mode
128 operation
109 version
97 pod
88 state
68 service
45 le
44 namespace
43 slice
31 container
28 quantile
18 alertname

Highest cardinality labels:
138 instance
138 scrape_job
84 __name__
75 workflow_type
71 datname
70 job
19 rule_group
14 le
10 activity_type
9 mode
9 quantile
6 state
6 operation
5 datid
4 slice
2 container
2 pod
2 alertname
2 version
2 service

Highest cardinality metric names:
12712 temporal_request_latency_bucket
6944 temporal_activity_execution_latency_bucket
6944 temporal_activity_schedule_to_start_latency_bucket
3864 temporal_workflow_task_schedule_to_start_latency_bucket
3864 temporal_workflow_task_replay_latency_bucket
3864 temporal_workflow_task_execution_latency_bucket
3864 temporal_workflow_endtoend_latency_bucket
2448 pg_locks_count
1632 pg_stat_activity_count
908 temporal_request
690 prometheus_target_sync_length_seconds
496 temporal_activity_execution_latency_count
350 go_gc_duration_seconds
340 pg_stat_database_tup_inserted
340 pg_stat_database_temp_bytes
340 pg_stat_database_xact_commit
340 pg_stat_database_xact_rollback
340 pg_stat_database_tup_updated
340 pg_stat_database_deadlocks
340 pg_stat_database_tup_returned







Ben Kochie

Jan 24, 2023, 4:29:47 AM
to Victor H, Prometheus Users
When you say "measured by Kubernetes", what metric specifically?

There are several misleading metrics. What matters is `container_memory_rss` or `container_memory_working_set_bytes`. `container_memory_usage_bytes` is misleading because it includes page cache.
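
If cAdvisor metrics are being scraped into one of your instances, something like this sketch should show the difference between the three (the address and the pod/container selectors are just examples; adjust them to your setup):

```sh
# Compare the three cAdvisor memory metrics for the Prometheus container
# (address and pod/container selectors are examples)
PROM=http://prometheus:9090
for m in container_memory_working_set_bytes container_memory_rss container_memory_usage_bytes; do
  printf '%s: ' "$m"
  curl -s "$PROM/api/v1/query" \
    --data-urlencode "query=${m}{pod=~\"prometheus-a-.*\",container=\"prometheus\"}" \
    | jq -r '.data.result[0].value[1] // "no data"'
done
```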


Brian Candler

Jan 24, 2023, 5:03:01 AM
to Prometheus Users
Also, what version(s) of Prometheus are these two instances running? Different versions of Prometheus are compiled with different versions of Go, which in turn differ in how aggressively they return unused RAM to the operating system. Also remember that Go is a garbage-collected language.

The RAM usage of Prometheus depends on a number of factors. There's a calculator embedded in this article, but it's pretty old now: https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
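
One way to see how much of the RSS is live heap versus memory the Go runtime is still holding on to is to compare Prometheus's own Go metrics; a sketch (the address and the job label are examples, and it assumes the instance scrapes itself):

```sh
# Live heap vs. idle/released heap vs. RSS as the OS sees it
# (address and job label are examples; assumes self-scraping)
PROM=http://prometheus-a:9090
for m in go_memstats_heap_inuse_bytes go_memstats_heap_idle_bytes \
         go_memstats_heap_released_bytes process_resident_memory_bytes; do
  printf '%s: ' "$m"
  curl -s "$PROM/api/v1/query" \
    --data-urlencode "query=${m}{job=\"prometheus\"}" \
    | jq -r '.data.result[0].value[1] // "no data"'
done
```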

Victor H

Jan 24, 2023, 5:41:16 AM
to Prometheus Users
> When you say "measured by Kubernetes", what metric specifically?

I'm not entirely sure. In Kubernetes you can specify a container's memory request and limit. The request is used for scheduling (to ensure that your pod is placed on a node with enough memory) and the limit is used to kill your pod if it is breached.

In GCP's GKE the metric is simply shown as "memory". I can see that when it breaches the limit (the limit line is shown on the chart) the pod is killed, so I'm assuming this is the entirety of the pod's RAM usage.

The GCP documentation says of the "memory" metric: "The metrics quantify non-evictable memory." And about non-evictable memory we only get this rather vague description: "Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot."
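
For what it's worth, I'm also cross-checking with kubectl, which as far as I understand reports the working set; a sketch (namespace and pod/container names are examples for ours):

```sh
# Working-set memory as reported by metrics-server, plus the configured limit
# (namespace/pod/container names are examples)
kubectl -n monitoring top pod prometheus-a-0
kubectl -n monitoring get pod prometheus-a-0 \
  -o jsonpath='{.spec.containers[?(@.name=="prometheus")].resources.limits.memory}{"\n"}'
```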

Regards,
Victor

Victor Hadianto

Jan 24, 2023, 5:44:34 AM
to Brian Candler, Prometheus Users
> Also, what version(s) of prometheus are these two instances?

They are both the same:
prometheus, version 2.37.0 (branch: HEAD, revision: b41e0750abf5cc18d8233161560731de05199330)

> The RAM usage of Prometheus depends on a number of factors. There's a calculator embedded in this article, but it's pretty old now: https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion

Thanks for this, I'll read & play around with that calculator for our Prometheus instances (we have 9 in various clusters now).

Regards,
Victor



Omero Saienni

Feb 1, 2023, 3:28:27 AM
to Prometheus Users
Hi,

We are seeking to monitor nearly a thousand rapidly expanding Postgres databases using Prometheus.
Currently, we have divided the targets into two Prometheus instances.
One instance is monitoring the `pg_up` metric with instance labels only, and with metrics from Postgres and Operator disabled.
However, we have noticed a significant increase in memory usage as we add more targets.
`go tool pprof` shows that the majority of memory consumption is in the `labels.(*Builder).Labels` function.

The measurements show a much faster than linear increase in memory usage, with a large portion of the memory consumed by labels.
For example, with 2091 time series and 360 label pairs, memory usage has reached 8028 MiB, of which 4392 MiB is attributed to labels.

We are unsure if this is normal behavior for Prometheus.

Here are the measurement values:

```
Number of SMons,Memory Used,PProf Memory Used for Labels,Number of Series,Number of Chunks,Number of Label Pairs
0,45 MiB,-,0,0,0
1,64 MiB,-,9,9,13
2,67 MiB,0.5 MiB (12%),15,15,14
5,80 MiB,6.2 MiB (19%),33,33,17
10,103 MiB,10 MiB (25%),63,63,22
15,123 MiB,20 MiB (39%),93,93,27
20,130 MiB,25 MiB (40%),123,123,32
30,189 MiB,30 MiB (42%),183,183,42
46,297 MiB,55 MiB (48%),273,273,57
348,8028 MiB,4392 MiB (82%),2091,2091,360
```

These were measured using `kubectl top pods` and `go tool pprof https://prom-shard/debug/pprof/heap`.
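
More concretely, the measurements were along these lines (the namespace and the prom-shard address are placeholders for our setup; `-inuse_space -top` just prints the heaviest in-use allocation sites):

```sh
# Pod memory as seen by metrics-server, plus the heaviest in-use heap
# allocation sites (namespace and prom-shard address are placeholders)
kubectl -n monitoring top pods
go tool pprof -inuse_space -top http://prom-shard:9090/debug/pprof/heap
```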

The second instance, which we used for comparison, is currently using approximately 9981 MiB.

Here are its measurement values:

```
Number of SMons,Memory Used,PProf Memory Used for Labels,Number of Series,Number of Chunks,Number of Label Pairs
77,9981 MiB,728 MiB (17%),1124830,2252751,47628
```

Here it makes sense where the memory is going, as there is a large number of label pairs and time series in the head.

We would appreciate recommendations on the best way to set up Prometheus for this scenario.
Is this expected behaviour for Prometheus?

Thanks,
Omero

Julien Pivotto

Feb 1, 2023, 3:48:48 AM
to Victor Hadianto, Brian Candler, Prometheus Users
On 24 Jan 21:43, Victor Hadianto wrote:
> > Also, what version(s) of prometheus are these two instances?
>
> They are both the same:
> prometheus, version 2.37.0 (branch: HEAD, revision:
> b41e0750abf5cc18d8233161560731de05199330)

Please update to 2.37.5. There has been a memory leak fixed in 2.37.3.

--
Julien Pivotto
@roidelapluie

Ben Kochie

Feb 1, 2023, 4:35:00 AM
to Victor Hadianto, Brian Candler, Prometheus Users
Or upgrade to 2.42.0. :)

Brian Candler

Feb 1, 2023, 5:00:29 AM
to Prometheus Users
Aside: is 2.42.0 going to be an LTS version?

Julien Pivotto

Feb 1, 2023, 5:45:34 AM
to Brian Candler, Prometheus Users
On 01 Feb 02:00, Brian Candler wrote:
> Aside: is 2.42.0 going to be an LTS version?

Hello,

I have not updated the website yet, but 2.42 will not be an LTS version.

My feeling is that we still need a few releases for native histograms and OOO ingestion to "stabilize". It is not about waiting for those features themselves to be stable, but more about making sure that any bugs those two major changes introduced into the codebase are noticed and fixed.


--
Julien Pivotto
@roidelapluie

Brian Candler

Feb 1, 2023, 6:07:29 AM
to Prometheus Users
That makes sense. Hopefully the LTS support for 2.37 can be extended in the meantime.

Omero Saienni

Feb 16, 2023, 8:28:59 PM
to Prometheus Users
I will upgrade to the LTS release.

I did upgrade to the latest Helm chart and saw very little difference, but I will send you all some metrics and see how we can proceed.

Thanks

Omero Saienni

Feb 19, 2023, 7:45:34 PM
to Prometheus Users
I upgraded Prometheus from 2.37.0 to 2.37.5 and I see a negligible difference in memory consumption.

Constants:

Number of label pairs in prometheus-prometheus-my-namespace-0: 455
Number of Targets in prometheus-prometheus-my-namespace-0: 392

What do you suggest we do?

# Analysis

## Version: v2.37.0 - Trough

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME CPU(cores) MEMORY(bytes)
prometheus-prometheus-my-namespace-0 31m 8748Mi
```

## Version: v2.37.0 - Peak

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME CPU(cores) MEMORY(bytes)
prometheus-prometheus-my-namespace-0 31m 12160Mi
```

## Version: v2.37.5 - Trough

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME CPU(cores) MEMORY(bytes)
prometheus-prometheus-my-namespace-0 31m 8338Mi
```

## Version: v2.37.5 - Peak

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME CPU(cores) MEMORY(bytes)
prometheus-prometheus-my-namespace-0 241m 11698Mi
```