Confusion regarding memory utilisation in quorum queues


Angshu Mukherjee

Apr 10, 2025, 3:50:29 PM
to rabbitmq-users
Hi Team,

I am having some trouble understanding how memory management works in a cluster running quorum queues. I am just curious to understand how the system works under the hood, so please bear with me 🙂.

From the docs, my understanding is that all quorum queue operations are kept both in memory and on disk (written there first) in the WAL. The WAL does not contain the messages themselves. Once it reaches a certain limit (configured by raft.wal_max_size_bytes), its contents are written to disk in segment files.

This is done to ensure that messages that are acknowledged quickly are never written to segment files.

My first question is: do the entries in the segment files contain the messages themselves as well? The raft.segment_max_entries setting documented here does hint at that.
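
For reference, these are the two settings I am referring to, as they would appear in rabbitmq.conf. The values below are only placeholders to illustrate the knobs, not what we run and not a recommendation:

    # WAL size limit at which entries are flushed into segment files
    # (536870912 bytes = the 512 MB default mentioned below)
    raft.wal_max_size_bytes = 536870912

    # maximum number of entries per segment file (placeholder value)
    raft.segment_max_entries = 4096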

When we ran our system with quorum queues, instead of seeing a zig-zag pattern with gradual falls as illustrated in the docs, we saw very sharp falls.

Pattern in the docs:
[image: Quorum Queues memory usage pattern]

Pattern we observed:
[image: graph.jpg]

To try to explain this pattern, I produced a graph of the memory breakdown as mentioned here, using the REST API. Please note that I have kept only the major contributors in the graph.
[image: image (1).png]
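
In case it is useful, here is a minimal Python sketch of how such a per-node breakdown can be pulled from the management HTTP API (it assumes the management plugin is enabled; the node name and credentials are placeholders):

    # Minimal sketch: fetch the per-category memory breakdown of one node
    # from the management HTTP API and print the largest contributors.
    import requests

    NODE = "rabbit@my-host"  # placeholder node name
    resp = requests.get(
        f"http://localhost:15672/api/nodes/{NODE}/memory",
        auth=("guest", "guest"),  # placeholder credentials
    )
    resp.raise_for_status()

    breakdown = resp.json()["memory"]
    # keep only the biggest contributors, as in the graph above
    top = sorted(
        ((k, v) for k, v in breakdown.items()
         if k != "total" and isinstance(v, int)),
        key=lambda kv: kv[1],
        reverse=True,
    )[:5]
    for category, used_bytes in top:
        print(f"{category:>20}: {used_bytes / 1024 ** 2:8.1f} MiB")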
Based on my understanding, with raft.wal_max_size_bytes at its default value of 512 MB, I was expecting quorum_ets to reach roughly that size, since it contains the WAL and other quorum-related data. Instead I found that binary, which contains the actual message data, takes up most of the memory.

Trying this with different values of raft.wal_max_size_bytes gives a similar pattern.

Why is this the case? Also, why was 512 MB chosen as the default value, and how low can we set it without major performance throttling? I see this discussion in this forum saying that 256 MB is perfectly fine, so I just wanted to check how we can decide on an appropriate value for this config.

Thank you in advance.

Angshu

Michal Kuratczyk

Apr 11, 2025, 12:29:51 PM
to rabbitm...@googlegroups.com
Hey,

First of all, things will change next week :)

I think the short explanation for the difference between these pictures is simply the time scale: to produce the graph for the docs, I ran a workload that filled the WAL in just a few seconds, while in your case it seems to have taken about 3 hours. Such a difference, especially combined with a likely different Prometheus scrape interval, exact Prometheus query and Grafana settings, can make the graphs look quite different.

Additionally, Erlang dynamically grows and shrinks the memory of an Erlang process (not to be confused with an operating system process), and the increments/decrements depend on how quickly things happen. Specifically, since I was publishing very quickly, as soon as the process released memory it was already using more of it again. In your case, with slower publisher(s), once the WAL memory was released the process was pretty much empty, so Erlang released more / didn't immediately allocate again.

The memory breakdown is harder for me to interpret. Without additional data I can't really tell why binary memory dominates, but RabbitMQ indeed mostly deals with binaries (in Erlang terms), so they will show up there. The allocated-but-unused part (purple) is related to what I mentioned before: memory was released from the WAL/ETS but not returned to the operating system, since the Erlang runtime assumed it might need it again.

As for "the appropriate value of raft.wal_max_size_bytes" - if there was the perfect value, we'd just set it and not make it configurable. ;)
We believe 512MB is a reasonable value. I wouldn't change it at all without a specific reason. And remember that things will change
in RabbitMQ 4.1 and likely again in the future. My recommendation would be not to touch it - such "magic values", even if they help
a bit in a given scenario, tend to outlive whatever problems they were solving. RabbitMQ might work very differently in a few years,
and you will still have this value set to some, by then perhaps completely inadequate, value "because it's always been like that". ;)

If you have a specific issue that you are trying to address, please share what it is.

Best,




--
Michal
RabbitMQ Team


Michal Kuratczyk

Apr 14, 2025, 8:49:49 AM
to rabbitm...@googlegroups.com
Regarding the "binary" memory growing/dominating the graph: the reason is that the Raft ETS table only references the binaries,
while the actual binary values are stored on the binary heap. Therefore, you can see ETS and binary memory usage growing
and then falling together, when the write-ahead log rolls over.

Best,