I have a Prometheus container running on Kubernetes whose memory usage keeps growing until it reaches the memory limit. I keep raising the limit to leave a small buffer so that it is not OOM-killed, but the memory usage then immediately grows to match the new limit.
Right now it is using 38Gi of memory. I looked at the /metrics endpoint, and it reports ~7 million time series:
# HELP prometheus_tsdb_head_series Total number of series in the head block.
# TYPE prometheus_tsdb_head_series gauge
prometheus_tsdb_head_series 6.847937e+06
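To track this number over time instead of reading /metrics by hand, a minimal Python sketch like the following pulls a single unlabelled sample out of the Prometheus text exposition format. It assumes the server is reachable at a base URL such as http://localhost:9090 (for example through `kubectl port-forward`); `fetch_metrics` and `read_gauge` are hypothetical helper names, and the parser only handles plain `name value` lines, not labelled series.

```python
import urllib.request


def fetch_metrics(base_url: str, timeout: float = 10.0) -> str:
    """Download the raw text exposition from <base_url>/metrics (hypothetical helper)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def read_gauge(exposition: str, name: str) -> float:
    """Return the value of an unlabelled sample called `name`.

    Assumes sample lines of the form `<name> <value> [timestamp]`;
    lines with a `{...}` label set are not matched.
    """
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0] == name:
            return float(parts[1])
    raise KeyError(name)
```

For example, `read_gauge(fetch_metrics("http://localhost:9090"), "prometheus_tsdb_head_series")` returns the current head series count.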
I used pprof to get a memory profile, but nothing stands out; a lot of memory is used in relabel and (*Head).getOrCreateWithID (see inuse.svg).
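To put the 38Gi and ~6.85 million series in perspective, dividing one by the other gives a rough per-series figure. This is a crude sketch, not a measurement: it assumes all of the container's memory is attributable to head series, whereas the process also holds memory for queries and rule evaluation, Go runtime and garbage-collector overhead, and other state. `bytes_per_series` is a hypothetical helper name.

```python
GIB = 2 ** 30  # Kubernetes "Gi" is a binary unit: 2^30 bytes


def bytes_per_series(memory_bytes: float, head_series: float) -> float:
    """Crude average: total memory divided by the number of head series."""
    return memory_bytes / head_series


if __name__ == "__main__":
    # Figures from the question: 38Gi in use, prometheus_tsdb_head_series = 6.847937e+06.
    per_series = bytes_per_series(38 * GIB, 6.847937e6)
    print(f"~{per_series:,.0f} bytes (~{per_series / 1024:.1f} KiB) per head series")
```

With these figures it works out to roughly 5,958 bytes (about 5.8 KiB) per series, which is an upper bound rather than the true per-series cost.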
Containers:
  prometheus:
    Container ID: docker://31246323d73dc751298a7322d56731a35e8fed827f817561cd4fe1970ee13b02
    Image: quay.io/prometheus/prometheus:v2.16.0
    Image ID: docker-pullable://quay.io/prometheus/prometheus@sha256:e4ca62c0d62f3e886e684806dfe9d4e0cda60d54986898173c1083856cfda0f4
    Port: 9090/TCP
    Host Port: 0/TCP
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --storage.tsdb.retention.size=30GB
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --storage.tsdb.retention.time=4h
      --web.enable-lifecycle
      --storage.tsdb.no-lockfile
      --web.external-url=https://thanos-system-querier.xing.io
      --web.route-prefix=/
      --storage.tsdb.min-block-duration=2h
      --storage.tsdb.max-block-duration=2h
    State: Running
      Started: Thu, 27 Feb 2020 02:09:37 +0100
    Last State: Terminated
      Reason: OOMKilled
      Exit Code: 137
      Started: Thu, 27 Feb 2020 00:10:42 +0100
      Finished: Thu, 27 Feb 2020 02:09:36 +0100
    Ready: True
    Restart Count: 2
    Limits:
      memory: 38Gi
    Requests:
      cpu: 2
      memory: 30Gi
    Liveness: http-get http://:web/-/healthy delay=180s timeout=5s period=30s #success=1 #failure=5
    Readiness: http-get http://:web/-/ready delay=180s timeout=5s period=60s #success=1 #failure=5
    Environment: <none>
    Mounts:
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/rules/prometheus-thanos-system-infrastructure-rulefiles-0 from prometheus-thanos-system-infrastructure-rulefiles-0 (rw)
      /prometheus from pv-thanos-prometheus-infrastructure (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from thanos-prometheus-system-token-kv7tp (ro)
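The pod description above contains a few numbers worth working out. A minimal sketch, assuming only the simple integer quantity forms used here (no decimals or exponent notation): `quantity_to_bytes` converts Kubernetes memory quantities such as `38Gi` to bytes, and `run_time_seconds` measures how long the previous container ran before it was OOM-killed, using the `Last State` timestamps. Both function names are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

# Binary (Ki, Mi, ...) and decimal (k, M, ...) Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "": 1,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a simple integer Kubernetes memory quantity (e.g. '38Gi') to bytes."""
    match = re.fullmatch(r"(\d+)([KMGTPE]i|[kMGTPE]|)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2)]


def run_time_seconds(started: str, finished: str) -> float:
    """Seconds between two RFC 2822 timestamps as printed by kubectl describe."""
    return (parsedate_to_datetime(finished) - parsedate_to_datetime(started)).total_seconds()


if __name__ == "__main__":
    limit = quantity_to_bytes("38Gi")
    request = quantity_to_bytes("30Gi")
    print(f"limit - request: {(limit - request) / 2**30:.0f} GiB")
    secs = run_time_seconds("Thu, 27 Feb 2020 00:10:42 +0100",
                            "Thu, 27 Feb 2020 02:09:36 +0100")
    print(f"previous container ran {secs:.0f} s before being OOM-killed")
```

For this pod, the previous container ran for 7,134 seconds (about 1h59m) before the OOM kill, and the limit sits 8Gi above the request.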
