Prometheus Memory Usage?

Praveen Maurya

Apr 3, 2017, 12:58:28 PM
to Prometheus Users

I am running Prometheus in a Docker container with a 2 GB memory limit:

docker run --rm -m 2048m
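
The full invocation is along these lines (the image name and config mount shown here are placeholders, not my exact command):

docker run --rm -m 2048m -p 9061:9061 \
    -v /my/prometheus/config:/etc/prometheus \
    prom/prometheus \
    -config.file=/etc/prometheus/prometheus.yml \
    -web.listen-address=:9061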

Following are the command-line flags:


alertmanager.notification-queue-capacity  10000
alertmanager.timeout  10s
alertmanager.url
config.file  /etc/prometheus/prometheus.yml
log.format  "logger:stderr"
log.level  "info"
query.max-concurrency  20
query.staleness-delta  5m0s
query.timeout  2m0s
storage.local.checkpoint-dirty-series-limit  5000
storage.local.checkpoint-interval  5m0s
storage.local.chunk-encoding-version  1
storage.local.dirty  false
storage.local.engine  persisted
storage.local.index-cache-size.fingerprint-to-metric  10485760
storage.local.index-cache-size.fingerprint-to-timerange  5242880
storage.local.index-cache-size.label-name-to-label-values  10485760
storage.local.index-cache-size.label-pair-to-fingerprints  20971520
storage.local.max-chunks-to-persist  104857
storage.local.memory-chunks  209715
storage.local.num-fingerprint-mutexes  4096
storage.local.path  /prometheus
storage.local.pedantic-checks  false
storage.local.retention  720h0m0s
storage.local.series-file-shrink-ratio  0.1
storage.local.series-sync-strategy  adaptive
storage.remote.graphite-address
storage.remote.graphite-prefix
storage.remote.graphite-transport  tcp
storage.remote.influxdb-url
storage.remote.influxdb.database  prometheus
storage.remote.influxdb.retention-policy  default
storage.remote.influxdb.username
storage.remote.opentsdb-url
storage.remote.timeout  30s
version  false
web.console.libraries  /usr/share/prometheus/console_libraries
web.console.templates  /usr/share/prometheus/consoles
web.enable-remote-shutdown  false
web.external-url  http://someurl:9061/
web.listen-address  :9061
web.max-connections  512
web.read-timeout  30s
web.route-prefix  /
web.telemetry-path  /metrics
web.user-assets




So basically, as suggested in some blogs, "configure storage.local.memory-chunks to be 1/3, or to be safer 1/5, of the total RAM". I went overboard and configured it to 1/10, and max-chunks-to-persist to half of that. Following are the 3 setups with different rates of sample ingestion (3 different Prometheus instances scraping three different targets):
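
For concreteness, here is the arithmetic behind those two numbers in my flag list (it assumes the 1.x local storage's 1024-byte chunks, so the 2 GiB limit corresponds to 2 * 1024 * 1024 chunks):

echo $(( 2 * 1024 * 1024 / 10 ))   # 209715 -> storage.local.memory-chunks (1/10 of the limit)
echo $(( 209715 / 2 ))             # 104857 -> storage.local.max-chunks-to-persist (half of that)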



Also, here are the other graphs: resident memory (as exposed by "process_resident_memory_bytes"), in-memory chunks ("prometheus_local_storage_memory_chunks"), and mem_usage (this one is from docker stats <container-id>).

Following are my queries:

1) AFAIK, the docker stats command reads memory usage from the cgroups folder, so it is "cgroup-aware", and it does not take the disk cache into account (https://github.com/docker/docker/issues/10824). So in that case, why is my memory (mem-used graph) increasing continuously even though the in-memory chunks plateaued long ago? The only queries being made are the ones plotting the graphs in Grafana (their number is fixed and minimal).

2) What does resident memory bytes depict? (I recollect reading somewhere that it is somewhat related to Prometheus's RAM usage.) Which of the graphs should I follow if I want to establish something like: "for this rate of sample ingestion and this configuration of Prometheus (as above), you will require this much RAM"?

Björn Rabenstein

Apr 4, 2017, 11:34:07 AM
to Praveen Maurya, Prometheus Users
On 3 April 2017 at 18:58, Praveen Maurya <maury...@gmail.com> wrote:

> 1)AFAIK, docker stats command reads mem-usage form cgroups folder, so Its
> "cgroup-aware" stats, also it does not take into account disk-cache
> (https://github.com/docker/docker/issues/10824) . So in that case why my
> memory(mem-used graph) is increasing continuously, even if chunks in memory
> have plateaued long back. The number of queries made is only for plotting
> the graphs in grafana (its count is fixed and minimal) .

I don't know what exact memory metric the docker-stats output is coming from.
Looking at the value, I would guess it is the virtual memory size, which has
very little practical relevance. (You could compare it to the metric
`process_virtual_memory_bytes`.)
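
For example, a quick way to compare the two by hand (this assumes a cgroup v1 host, and uses the 9061 port from your flags):

cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # inside the container: the cgroup's own accounting
curl -s http://localhost:9061/metrics | grep -E '^process_(resident|virtual)_memory_bytes'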

> 2) What does resident memory bytes depict (I recollect somewhere, i read
> it's somewhat related to RAM usage by prometheus) ,So Which of the graphs to
> follow, if I want to establish something like this "for this rate of sample
> ingestion and and this configuration of prometheus (as above) you will be
> requiring this much amount of RAM "

Resident memory bytes is determined by reading the RSS for Prometheus's own PID
from the /proc filesystem. It's the best approximation of "physical memory actually
used". It plateaus in your graph, so all looks good.

In general, I'd expect the RAM usage to depend more on the number of time
series than on the ingestion rate. But it will also depend on many other
circumstances.

With v1.6, there will be a new command-line flag to just tell Prometheus what
heap size it may grow to.
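
As a rough sketch of what that will look like (assuming the flag ships under the currently planned name storage.local.target-heap-size and takes a value in bytes, e.g. 1 GiB to leave headroom under your 2 GiB container limit):

prometheus -storage.local.target-heap-size=1073741824 -config.file=/etc/prometheus/prometheus.yml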

--
Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B