Hello!
I have a Prometheus-based monitoring system in Kubernetes, and I am trying to set up remote_write to VictoriaMetrics. But I have one serious problem: my Prometheus pod gets killed by the OOM killer.
I've tested two versions of Prometheus (v2.11.0 and v2.23.0) and had the same problem on both.
The average value of rate(prometheus_remote_storage_samples_in_total[5m]) is ~75k, the Prometheus pod limits are cpu '4' and memory 6144M, and the average value of the prometheus_remote_storage_shards metric is 1.
My remote_write settings are:
queue_config:
  capacity: 100
  max_samples_per_send: 10000
  max_shards: 10
  min_shards: 1
Global scrape settings:
global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 10s
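To put these numbers in context, here is a rough back-of-the-envelope sketch in Python. It only uses the values quoted above and rests on two assumptions that may not hold exactly: that every series is scraped once per scrape_interval, and that capacity counts samples buffered per shard (as I understand the Prometheus remote write documentation). The function name summarize is just something I made up for this illustration.

```python
# Values copied from the post; nothing here is measured by this script.
samples_per_second = 75_000    # rate(prometheus_remote_storage_samples_in_total[5m])
scrape_interval_s = 10         # global.scrape_interval
capacity = 100                 # queue_config.capacity (assumed: samples per shard)
max_samples_per_send = 10_000  # queue_config.max_samples_per_send
max_shards = 10                # queue_config.max_shards


def summarize(rate, interval, cap, batch, shards):
    """Derive a few rough figures from the remote_write settings."""
    return {
        # Assumes each active series yields one sample per scrape interval.
        "approx_active_series": rate * interval,
        # Upper bound on queued samples if every shard's buffer is full.
        "max_buffered_samples": cap * shards,
        # True when one shard's buffer is smaller than a single batch.
        "capacity_below_batch_size": cap < batch,
    }


summary = summarize(samples_per_second, scrape_interval_s,
                    capacity, max_samples_per_send, max_shards)
for name, value in summary.items():
    print(f"{name}: {value}")
```

If these assumptions are right, the per-shard buffer (100 samples) is much smaller than one batch (10000 samples), and the ingestion rate corresponds to roughly 750k active series. I am not sure whether either of these is related to the OOM.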
In the logs (with debug mode enabled) I didn't find anything that could explain the problem.
I think I'm doing something wrong in the remote_write settings, but I don't understand what, or which metrics I should base the configuration on.
Thank you for any help!