quorum queues "locking" nodes in cluster


Sillyfrog

Feb 8, 2023, 2:26:54 AM
to rabbitmq-users
Hi,

I'm starting to look at the use of quorum queues in Kubernetes. I've got the cluster operator going (using `helm install rabbit -n rabbitmq-system --create-namespace bitnami/rabbitmq-cluster-operator`), and installed a 3 node cluster with a single quorum queue (see attached Kubernetes YAML).

Things have generally been OK (testing killing nodes, performance, etc.).

One of my tests was to fill up the queues to understand what happens to the cluster and to the processes working with the queue. I can regularly get a node in the cluster to hit a memory alarm (which is fine), but it then never releases the memory, even after the queue is emptied by another process reading from it. Screenshots are attached showing the empty queue, the memory alarm, and the node memory usage.

In the node memory usage I can see that most of the memory is used by Binaries, but the Binary references are practically zero, which is the confusing part.

I've tried running `rabbitmqctl force_gc` on the node, but it didn't help.

I've also tried using the `bitnami/rabbitmq:3.11-debian-11` image (which is not the default); however, it has the same issue.

The discussion at https://groups.google.com/g/rabbitmq-users/c/rTZIqOqHUxM looks similar, however the linked PR was not merged, and I'm not sure how to actually apply the proposed settings.

I've also attached the simple scripts I'm using for testing. Sometimes you need to run `big-sender.py` twice to trigger it, but generally the first run won't make it to 1,000 messages. (Passwords etc. will need to be updated; the pika package can be installed with `pip install pika`.)
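For readers without access to the attachments, a hypothetical minimal version of such a "big sender" test script might look like this (the queue name, credentials, and message count here are assumptions, not the contents of the attached `big-sender.py`):

```python
# Hypothetical sketch of a quorum-queue load tester: publishes 1 MB
# messages until the broker's memory alarm blocks the publisher.

def make_payload(size: int) -> bytes:
    """Build a payload of exactly `size` bytes."""
    return b"x" * size

if __name__ == "__main__":
    import pika  # pip install pika

    params = pika.ConnectionParameters(
        host="localhost",
        credentials=pika.PlainCredentials("user", "password"),  # update these
    )
    connection = pika.BlockingConnection(params)
    channel = connection.channel()
    # Quorum queues must be durable and declared with the queue-type argument.
    channel.queue_declare(
        queue="big-queue",
        durable=True,
        arguments={"x-queue-type": "quorum"},
    )
    payload = make_payload(1024 * 1024)  # 1 MB per message
    for i in range(1000):
        channel.basic_publish(exchange="", routing_key="big-queue", body=payload)
        print(f"sent {i + 1}")
    connection.close()
```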

Please let me know if there's any other information I can provide, or if I've missed something obvious :)

Thanks!

big-sender.py
rabbitmq-testing.yaml
memory-alarm.png
big-consumer.py
node-memory-usage.png

Michal Kuratczyk

Feb 8, 2023, 3:09:14 AM
to rabbitm...@googlegroups.com
Hi,

Quorum queues have a "buffer" (the write-ahead log) that accumulates data (and therefore memory usage) until it's full, at which point the data is flushed and the memory released.
We need to update the first paragraph of this section (since 3.10, messages are no longer kept in queue memory) but most of this still applies:
https://www.rabbitmq.com/quorum-queues.html#resource-use

With the default WAL size of 512 MB, you haven't allocated sufficient memory for your pod. Realistically, you should just add RAM, and you should then see a "saw-tooth"
pattern when you look at the memory usage graph. Something like this:
Screenshot 2023-02-08 at 09.07.01.png

Alternatively, if you want a tiny instance for development, you can reduce the WAL size.
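Per the quorum-queues documentation linked above, the WAL size limit is controlled by `raft.wal_max_size_bytes` in `rabbitmq.conf`; for example, to lower it from the 512 MB default to 64 MB:

```ini
# rabbitmq.conf — flush the WAL (and release its memory) after 64 MB
# instead of the default 512 MB
raft.wal_max_size_bytes = 67108864
```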

Moreover, your messages are 1MB in size. While technically supported (and I know some people use RabbitMQ with even larger messages), this is generally not what messaging systems are designed/optimised for.
You will have a much better experience if you can design your system to send (kilo)bytes, not megabytes.

Best,



--
Michał
RabbitMQ team

Sillyfrog

Feb 8, 2023, 6:27:15 AM
to rabbitmq-users
Perfect, that's it, thank you!

Setting the WAL to a smaller value (64 MB) fixed the issue. I've attached the Kubernetes config I used, for reference for anyone else.
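For anyone who can't see the attachment, with the cluster operator the setting can be passed via the RabbitmqCluster resource's `additionalConfig` field; a hypothetical minimal sketch (the cluster name and replica count here are assumptions, not the attached file):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbit-test
spec:
  replicas: 3
  rabbitmq:
    additionalConfig: |
      raft.wal_max_size_bytes = 67108864
```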

This is very much a test setup with an exaggerated workload, to let us see what happens when things go bad, and it will help us tweak the config on our test clusters (which are much more resource constrained).

Thanks again for your quick response!
rabbitmq-testing.yaml