Hello,
I have a cluster of 3 RabbitMQ nodes. Two of the nodes run out of memory about once a week. Strangely, it happens very suddenly: the memory of the beam.smp process was around 200 MB, then jumped to 16 GB within a few seconds, and the process was killed by the system. My Prometheus scrapes metrics every 30 s and didn't even catch the increase; I only see the OOM kill in the system log. What can I do about this OOM issue?
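Since a 30-second scrape interval can easily miss a spike that builds in a few seconds, one thing I could try is a tight polling loop on the node itself, so the per-category breakdown is on disk right before the kill. This is only a sketch: `rabbitmq-diagnostics memory_breakdown` and its `--unit` option exist in 3.8, but the log path and the 2-second interval here are arbitrary choices.

```shell
# Sketch: record RabbitMQ's memory breakdown every 2 seconds so a sudden
# spike leaves a trace even when Prometheus (30 s scrapes) misses it.
# /tmp/rabbitmq-mem.log and the 2 s interval are arbitrary; adjust as needed.
while true; do
    {
        date
        rabbitmq-diagnostics memory_breakdown -q --unit "MB"
    } >> /tmp/rabbitmq-mem.log 2>&1
    sleep 2
done
```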
My environment:
System: CentOS Linux release 7.9.2009 (Core), 8 cores / 32 GB RAM
Kernel: 3.10.0-1160.6.1.el7.x86_64
RabbitMQ: 3.8.18
Erlang: 24.0
node: all 3 nodes are disc nodes
queue: mirrored queues with "ha-mode":"all"
My config
vm_memory_high_watermark.relative = 0.2 # there are also some Java processes on these nodes, so I set it to 0.2
vm_memory_high_watermark_paging_ratio = 0.5
disk_free_limit.absolute = 500MB
management.tcp.port = 15672
management.tcp.ip = 0.0.0.0
log.file.level = info
log.file.rotation.size = 10485760
log.file.rotation.count = 7
consumer_timeout = 86400000
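As a sanity check on this config (an illustration, assuming the node sees the machine's full 32 GB): a relative watermark of 0.2 puts the memory alarm at about 6.4 GB, yet the OOM log shows beam.smp at roughly 16 GB RSS, so the spike apparently blew past the alarm faster than RabbitMQ could block publishers.

```shell
# Quick check of where the memory alarm sits with this config.
# Assumes the node sees the machine's full 32 GB (no cgroup limit).
total_mb=$((32 * 1024))                                  # 32768 MB
alarm_mb=$(awk "BEGIN { print $total_mb * 0.2 }")        # vm_memory_high_watermark.relative
paging_mb=$(awk "BEGIN { print $total_mb * 0.2 * 0.5 }") # watermark * paging_ratio
echo "alarm at ${alarm_mb} MB, paging from ${paging_mb} MB"
```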
System OOM kill log
Killed process 30128 (beam.smp), UID 1001, total-vm:20165616kB, anon-rss:16865564kB, file-rss:1156kB, shmem-rss:0kB
The metrics at the time of the OOM