Prometheus with remote_write VM dies with OOM: remote_write settings problem?


Olga Chukanova

Jan 15, 2021, 4:18:15 AM
to Prometheus Users
Hello!
I use Prometheus as the monitoring system in Kubernetes, and I am trying to set up remote_write to VictoriaMetrics. But I have one serious problem: my Prometheus gets killed by OOM.
I've tested two versions of Prometheus (v2.11.0 and v2.23.0) and had the same problem on both.
The average value of rate(prometheus_remote_storage_samples_in_total[5m]) is ~75k, the Prometheus pod limits are cpu '4' and memory 6144M, and prometheus_remote_storage_shards averages 1.
Settings in remote_write are:
        queue_config:
          capacity: 100
          max_samples_per_send: 10000
          max_shards: 10
          min_shards: 1
Global scrape settings:
    global:
      scrape_interval: 10s
      scrape_timeout: 10s
      evaluation_interval: 10s
In the logs (with debug mode enabled) I didn't find anything that could explain the problem.
I think I'm doing something wrong in the remote_write settings, but I don't understand what, or which metrics I should use to tune them.
Thank you for any help!

Aliaksandr Valialkin

Jan 15, 2021, 5:52:53 AM
to Olga Chukanova, Prometheus Users
Try increasing `capacity` to 3x max_samples_per_send, i.e. to 20000 for your case according to https://prometheus.io/docs/practices/remote_write/ .

Prometheus may require up to 30% more memory after enabling remote_write according to production measurements. Make sure that your Prometheus instance runs on a host with at least 30% of free memory before enabling remote_write on it.
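
The capacity rule of thumb above can be worked through numerically. What follows is a minimal Python sketch, not anything taken from Prometheus itself: the helper names `suggested_capacity` and `max_buffered_samples` are hypothetical, and it assumes `capacity` counts samples per shard, so the worst-case in-memory queue is roughly `capacity * max_shards`. Note that 3 x 10000 is 30000; the 20000 quoted above corresponds to a factor of 2.

```python
def suggested_capacity(max_samples_per_send: int, factor: int = 3) -> int:
    """Queue capacity as a multiple of the batch size (rule of thumb from the reply)."""
    if max_samples_per_send <= 0 or factor <= 0:
        raise ValueError("max_samples_per_send and factor must be positive")
    return max_samples_per_send * factor


def max_buffered_samples(capacity: int, max_shards: int) -> int:
    """Upper bound on queued samples, ASSUMING capacity applies per shard."""
    return capacity * max_shards


if __name__ == "__main__":
    batch = 10000   # max_samples_per_send from the original post
    shards = 10     # max_shards from the original post
    for factor in (2, 3):
        cap = suggested_capacity(batch, factor)
        worst_case = max_buffered_samples(cap, shards)
        print(f"factor {factor}: capacity={cap}, worst-case buffered samples={worst_case}")
```

The worst case only applies if Prometheus actually scales up to `max_shards`; with a single active shard, as reported in the original post, the buffered sample count would be bounded by one shard's `capacity`.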

--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

Olga Chukanova

Jan 19, 2021, 8:08:01 AM
to Aliaksandr Valialkin, Prometheus Users
First of all, thank you for your answer and advice!
I've increased capacity to 20000, but it didn't help. I needed to change it anyway, since I had misread the documentation, so thank you.
To calculate the required memory, I used this calculator: https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
With max_over_time(prometheus_tsdb_head_series[1d]) = 769255 and 1154 unique label pairs (count(kube_pod_labels{app=~".*"})), it gives Combined Memory = 3,868 MiB.
Adding 30% for remote_write gives about 5,028 MiB (~4.9 GiB).
My pod has a memory limit of 6 GB. Could it be that about 1 GB of headroom is not enough? I know that increasing the memory limit to 8 GB may solve my problem; I just want to be sure that this is the only way.
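
The memory arithmetic in this message can be checked with a short sketch. It is only illustrative: the function names are hypothetical, the 30% overhead is the figure quoted earlier in the thread, and it assumes the pod limit is written as `6144M`, which Kubernetes interprets as decimal megabytes (10^6 bytes) rather than MiB.

```python
MIB = 2**20  # bytes in one mebibyte
MB = 10**6   # bytes in one decimal megabyte (Kubernetes "M" suffix)


def with_remote_write_overhead(base_mib: float, overhead: float = 0.30) -> float:
    """Estimated memory in MiB after adding a fractional remote_write overhead."""
    return base_mib * (1.0 + overhead)


def limit_in_mib(limit_megabytes: int) -> float:
    """Convert a limit given in decimal megabytes (e.g. 6144M) to MiB."""
    return limit_megabytes * MB / MIB


if __name__ == "__main__":
    estimate = with_remote_write_overhead(3868)  # calculator result from the post
    limit = limit_in_mib(6144)                   # pod limit from the first post
    print(f"estimate: {estimate:.0f} MiB ({estimate / 1024:.2f} GiB)")
    print(f"limit:    {limit:.0f} MiB")
    print(f"headroom: {limit - estimate:.0f} MiB")
```

Under these assumptions the headroom comes out at roughly 830 MiB, somewhat less than a full 1 GB, and the estimate itself is only as good as the calculator inputs and the 30% figure.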

On Fri, Jan 15, 2021 at 13:52, Aliaksandr Valialkin <val...@gmail.com> wrote:


--
Best regards, Olga.
