Hi,
We are seeking to monitor nearly a thousand rapidly expanding Postgres databases using Prometheus.
Currently, we have divided the targets into two Prometheus instances.
One instance monitors only the `pg_up` metric with instance labels only, and has the metrics from Postgres and the Operator disabled.
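For context, the scrape configuration for that instance looks roughly like the sketch below (the job name and relabel rules are illustrative, not our exact config); it keeps only the `pg_up` series and drops every label except the metric name, `instance`, and `job`:

```yaml
scrape_configs:
  - job_name: postgres-up-only          # illustrative job name
    metric_relabel_configs:
      # Drop every scraped series except pg_up.
      - source_labels: [__name__]
        regex: pg_up
        action: keep
      # Keep only the metric name plus the instance and job labels.
      - regex: __name__|instance|job
        action: labelkeep
```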
However, we have noticed a significant increase in memory usage as we add more targets.
`go tool pprof` shows that the majority of memory is allocated in the `labels.(*Builder).Labels` function.
The measurements show memory usage climbing far out of proportion to the number of targets, with labels accounting for an increasingly large share.
For example, with 2091 time series and 360 label pairs, memory usage has reached 8028 MiB, of which 4392 MiB is attributed to labels.
We are unsure if this is normal behavior for Prometheus.
Here are the measurement values:
| SMons | Memory used | pprof memory used for labels | Series | Chunks | Label pairs |
|------:|------------:|-----------------------------:|-------:|-------:|------------:|
| 0     | 45 MiB      | -                            | 0      | 0      | 0           |
| 1     | 64 MiB      | -                            | 9      | 9      | 13          |
| 2     | 67 MiB      | 0.5 MiB (12%)                | 15     | 15     | 14          |
| 5     | 80 MiB      | 6.2 MiB (19%)                | 33     | 33     | 17          |
| 10    | 103 MiB     | 10 MiB (25%)                 | 63     | 63     | 22          |
| 15    | 123 MiB     | 20 MiB (39%)                 | 93     | 93     | 27          |
| 20    | 130 MiB     | 25 MiB (40%)                 | 123    | 123    | 32          |
| 30    | 189 MiB     | 30 MiB (42%)                 | 183    | 183    | 42          |
| 46    | 297 MiB     | 55 MiB (48%)                 | 273    | 273    | 57          |
| 348   | 8028 MiB    | 4392 MiB (82%)               | 2091   | 2091   | 360         |
These were measured using `kubectl top pods` and `go tool pprof https://prom-shard/debug/pprof/heap`.
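Roughly, the workflow looked like this (pod name and port are placeholders, not our exact setup):

```sh
# Container memory as reported by the kubelet (placeholder pod name).
kubectl top pods prometheus-shard-0

# Forward the Prometheus port locally and inspect the heap profile.
kubectl port-forward pod/prometheus-shard-0 9090:9090 &
go tool pprof -top http://localhost:9090/debug/pprof/heap
```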
The second instance, which we used for comparison, is currently using approximately 9981 MiB.
Here are its measurement values:
| SMons | Memory used | pprof memory used for labels | Series  | Chunks  | Label pairs |
|------:|------------:|-----------------------------:|--------:|--------:|------------:|
| 77    | 9981 MiB    | 728 MiB (17%)                | 1124830 | 2252751 | 47628       |
Here the memory consumption makes sense, since there are a large number of label pairs and time series in the head block.
We would appreciate recommendations on the best way to set up Prometheus for this scenario.
Is this expected behaviour for Prometheus?
Thanks,
Omero