We have a 3-node cluster running quorum queues on AWS Linux instances with 8 GB RAM. We are experiencing problems with memory going over the limit, both when pushing messages into a queue and when growing a quorum queue after a crashed node is replaced by a new one.
Our messages are quite big, about 5 MB, published at a rate of about 4/s.
The situation is that producers push messages into a queue without any consumer attached, so messages accumulate in the queue (e.g. 5000).
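For context, a load of roughly this shape can be generated with PerfTest; the command below is only a sketch of our setup (queue name, URI and exact flags are placeholders, not the literal commands we use):

    # ~5 MB persistent messages at ~4 msg/s into a quorum queue, no consumers
    perf-test --uri amqp://user:pass@node1 \
      --producers 1 --consumers 0 \
      --size 5000000 --rate 4 \
      --queue qq.repro --quorum-queue \
      --flag persistent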
Occasionally, memory on some of the nodes goes over the limit and does not come back down, which blocks the producers.
Another problem: if there are e.g. 5000 messages of 5 MB each in the quorum queue and one of the nodes crashes, we start a new node, add it to the cluster and then run the grow command to add it as a quorum member (sketched below). This always results in memory on the joining node going far over the limit (e.g. 3-4 times higher than the limit) and usually crashes that node, probably because its memory is exhausted.
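For reference, the replacement procedure we follow looks roughly like this (node names are placeholders):

    # on the new node, after installing RabbitMQ with the same Erlang cookie
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@node1
    rabbitmqctl start_app

    # remove the crashed node from the cluster (run on a surviving node)
    rabbitmqctl forget_cluster_node rabbit@crashed-node

    # add the new node as a member of all quorum queues
    rabbitmq-queues grow rabbit@new-node all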
We tried tuning what we think are the memory-related parameters, without luck. Currently, the following parameters are used in rabbitmq.conf:
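(The exact values are omitted here; purely for illustration, the kind of memory-related keys we have been experimenting with look like this, with placeholder values:)

    # placeholder values for illustration only, not our exact settings
    vm_memory_high_watermark.relative = 0.4
    # Raft / quorum queue WAL tuning (a smaller WAL is flushed to disk more often)
    raft.wal_max_size_bytes = 67108864
    raft.segment_max_entries = 2048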
I replicated the problem in a simplified scenario with 3 nodes running on a single computer.
RabbitMQ 3.11.2, Erlang 25.1.1, 32 GB RAM, Windows 10, rabbitmq.conf as specified above.
Scenario 1:
Scenario 2:
During a high-memory situation, when I open the Memory details of the node in the management UI, I can see that almost all of the memory is in Binaries, and when I open Binary references, "Quorum Queues / quorum" is consuming almost everything (e.g. 2.2 GiB).
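The same per-category breakdown can also be checked from the CLI (this is just how I look at it; the node name is a placeholder):

    # per-category memory breakdown on a node
    rabbitmq-diagnostics memory_breakdown --unit MB -n rabbit@node1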
Are there any other parameters besides those stated above that affect memory consumption and could be used to tune RabbitMQ to resolve these problems, or is this a possible bug? I searched the mailing list as well as GitHub issues but did not find a description of a similar problem, especially one connected to the quorum grow command (scenario 2).
Any help is appreciated.
Stano