rabbitmq spending most of the time garbage collecting on large(r) queues

Tomáš Dubec

Mar 14, 2023, 7:46:22 AM
to rabbitmq-users
hello guys,
we are experiencing a rather peculiar issue with our RMQ cluster. If one queue grows rather large (~20M-30M msgs of ~250B each), consuming it gets really slow. If it grows to ~60M, it's almost impossible to consume the queue in a reasonable time.
Further investigation shows that RMQ spends almost all of its time garbage collecting (a whole CPU core):

$ rabbitmq-diagnostics runtime_thread_stats --sample-interval 30
Average thread real-time    : 30010755 us
Accumulated system run-time : 25298319 us
Average scheduler run-time  :   129877 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
...
dirty_cpu_( 3)    0.00%    0.00%    0.00%   80.58%    0.00%    0.00%   19.42%
...
Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.06%    0.02%    0.00%    0.00%    0.00%    0.00%   99.92%
dirty_cpu_sche    0.00%    0.00%    0.00%   10.07%    0.00%    0.00%   89.93%
dirty_io_sched    0.00%    0.00%    0.02%    0.00%    0.00%    0.00%   99.98%
          poll    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
     scheduler    0.02%    0.01%    0.28%    0.03%    0.02%    0.07%   99.57%


this can be further verified by running observer or even `perf top`.
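For reference, something along these lines (the pgrep pattern assumes a single beam.smp process on the node):

$ rabbitmq-diagnostics observer
$ perf top -p "$(pgrep -f beam.smp | head -n 1)"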
My question is: why is GC triggered so often? There is plenty of free RAM, we are nowhere near the memory watermark, and there is no other load on the RMQ cluster (I have a cluster just for testing this).
Also, it looks like the GC is not really achieving anything: I can see memory usage doubling during GC (that's by design) and then returning to roughly the original size, so nothing is freed.
Is there a way to tune this? None of the `_gc_`-related configuration directives seem to have any effect.

We are using quorum queues with group size 3. Cluster has 5 nodes (8GB RAM each).
RMQ 3.11.9, erlang 25.2.3.

Any advice is much appreciated!
Thanks

Tomas Dubec

Michal Kuratczyk

Mar 14, 2023, 8:35:25 AM
to rabbitm...@googlegroups.com
Hi,

We've just merged two changes that can help; they will be included in 3.12.

It'd be great if you could test it. The details of how to try 3.12 depend on how you install/run RabbitMQ; using the Docker image pivotalrabbitmq/rabbitmq:v3.12.x-otp-max-bazel is probably the easiest way.
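For a quick local test, something along these lines should be enough to spin it up (container name and port mappings are just examples):

docker run -it --rm --name rabbitmq-312 -p 5672:5672 -p 15672:15672 pivotalrabbitmq/rabbitmq:v3.12.x-otp-max-bazel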

Also, if you can provide https://perftest.rabbitmq.com/ commands that reproduce your scenario, that would be helpful as well.

Having said that, can you explain why you need tens of millions of messages in the queue? If you expect long queues,
perhaps a stream is what you really need?

Best,

--
Michał
RabbitMQ team

Tomáš Dubec

Mar 14, 2023, 9:04:26 AM
to rabbitmq-users
Hi Michal,
thanks for the response. Perftest commands to reproduce are:
publisher:
java -jar perf-test-2.19.0.jar -h ${URI} --use-millis -x4 --routing-key rk11 -y0 --queue-pattern 'q-%d' --queue-pattern-from 1 --queue-pattern-to 1 --quorum-queue -f persistent --id test10 -s 250 -r20000 -c 500 --multi-ack-every 1000 -qa x-quorum-initial-group-size=3 -b 10
consumer:
java -jar perf-test-2.19.0.jar -h ${URI} --use-millis -x0 --routing-key rk11 -y20 --queue-pattern 'q-%d' --queue-pattern-from 1 --queue-pattern-to 1 --quorum-queue -f persistent --id test10 -s 250 -r20000 -c 500 --multi-ack-every 1000 -qa x-quorum-initial-group-size=3 -b 10

First I run the publisher until ~30M messages pile up (GC is already being triggered a lot, and publishing slows down over time), then I stop the publisher and start the consumer. At first it takes several tens of seconds to get any messages at all; after that the consume rate is ~1000 msgs/sec (and I know the cluster is capable of delivering two orders of magnitude more).

The use case is not a standard situation but rather a corner case: a consumer for some reason goes haywire and stops consuming (or slows down significantly). It's then really complicated to get out of such a situation, since it's almost impossible to consume the backlog (no matter how many consumers you add, it's throttled by RMQ itself). This did happen to us in production; the perf-test scenario above simulates it (rather perfectly).

We'll try to compile/package RMQ with the mentioned PRs and test it.

Thanks
T.D.

Michal Kuratczyk

Mar 14, 2023, 9:12:06 AM
to rabbitm...@googlegroups.com
Cool, sounds exactly like the scenario we wanted to address in these PRs. They've been merged, so you can just use the main branch (or v3.12.x).

I will give it a quick test in a moment to see how it goes.

Best,



--
Michał
RabbitMQ team

Michal Kuratczyk

Mar 14, 2023, 10:27:42 AM
to rabbitm...@googlegroups.com
So here are my observations:

1. First of all, 3.12 doesn't slow down when publishing at full speed to a long queue. This is the main issue we wanted to address in Ra 2.5.0 (to be included in RabbitMQ 3.12) and it does indeed help in your case as well.
2. Consumption, while not great, is also better.
3. I've tested with 12M messages. I had a quick look at a longer queue and it was definitely worse, but honestly, I don't think we are going to prioritize this, since queues have never been meant to get this long (those who have such use cases are welcome to speak up, try streams, or contribute improvements).
4. Using many consumers to speed up draining a very long queue is pointless unless the time to process each message is significant (in which case we are really speeding up message processing, not queue consumption). perf-test has a large default prefetch of 2000 messages (you can change it with -q). Many consumers multiplied by such a prefetch means a lot of data loaded into memory, which leads to excessive garbage collection.

Here are the results with a somewhat modified test case (most importantly, only one consumer):
    perf_test publish -x 4 -y 0 -u qq -qq -r 20000 -c 500 -b 10 -C 3000000
    perf_test consume -x 0 -y 1 -u qq -qq --multi-ack-every 1000 -b 10

[attached: Screenshot 2023-03-14 at 15.25.20.png (consumption results)]
I didn't wait for 3.11 to finish consuming the queue but it'd get faster over time, as the queue got shorter.
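As a side note on point 4: with the stock perf-test jar the prefetch can be lowered with -q, so a single-consumer drain with a modest prefetch might look roughly like this (queue name and numbers are illustrative, not a recommendation):

    java -jar perf-test-2.19.0.jar -h ${URI} -x 0 -y 1 --queue q-1 --quorum-queue --multi-ack-every 100 -q 100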

Best,
--
Michał
RabbitMQ team

Jérôme Loyet

Jun 22, 2023, 1:19:51 PM
to rabbitmq-users
Hello,

we have a use case where we can pile up millions of messages. Here is the context: we manage a huge storage cluster of mechanical HDDs (thousands of them) and we'd like to use RabbitMQ to store file-deletion events. Sometimes (depending on our customers) we receive thousands, occasionally several million, deletions at once in a very short time. Since the HDDs are busy reading and writing files and are often 100% utilized, the consumers of those events take some time to work through the deletion queue.

In a normal situation the events should be consumed quite quickly, but there are plenty of situations where events could pile up for several hours or days. While consumption speed is not a key factor for us, the reliability of the process is. On a 3-node test cluster, I used perf-test to put 80M messages of 5KB into a single quorum queue and cleanly stopped a follower node during the process. Less than 2 minutes later I started it again; it's now been more than 2 hours and the rabbitmq-queues quorum_status command still shows the restarted node in the timeout state.

Status of quorum queue bench-queue-single on node rabbit@admin-25729 ...
│ Node Name          │ Raft State │ Log Index │ Commit Index │ Snapshot Index │ Term │ Machine Version │
│ rabbit@admin-26586 │ leader     │ 137130040 │ 137130040    │ 50001135       │ 1    │ 3               │
│ rabbit@admin-25729 │ follower   │ 137130040 │ 137130040    │ 50001135       │ 1    │ 3               │
│ rabbit@admin-25726 │ timeout    │           │              │                │      │                 │
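For reference, the table above comes from something like the following (queue name taken from the status header, default vhost assumed):

rabbitmq-queues quorum_status bench-queue-single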


From what you said, I understand that we should NOT go to production with this workload. What are our options if we want to be able to temporarily store, let's say, 100M events?

- splitting the single 100M-message queue into several queues (but how many? 100 with 1M each, or 1000 with 100k each?)
- using a stream instead of a quorum queue. But does our use case match the spirit of streams?

Thanks for the help!! :-)

Regards
++ Jerome

Michal Kuratczyk

Jun 22, 2023, 2:41:15 PM
to rabbitm...@googlegroups.com
80 million messages in a quorum queue is ~10x more than the max we normally test with, so there could be dragons. ;)

A couple of questions:
* What are the ordering requirements? Is there a global order for all events? A partial order per customer or some device?
* When you stopped the node, how many messages were in the queue? How many were published when the node was down? (roughly)
* Is the 5kb set in stone? Is this a real value you have in your system and something you can't change?

Best,



--
Michał
RabbitMQ team

Jérôme Loyet

Jun 22, 2023, 3:01:28 PM
to rabbitmq-users
On Thursday, June 22, 2023 at 8:41:15 PM UTC+2, Michal Kuratczyk wrote:
80 million messages in a quorum queue is ~10x more than the max we normally test with, so there could be dragons. ;)
I guess so (and I had figured that out too) :-D
 

A couple of questions:
* What are the ordering requirements? Is there a global order for all events? A partial order per customer or some device?
There is no ordering requirement for those events; they can be consumed in any order.
 
* When you stopped the node, how many messages were in the queue? How many were published when the node was down? (roughly)
There were something like 60-70M messages; I don't remember the exact number.
 
* Is the 5kb set in stone? Is this a real value you have in your system and something you can't change?
It is the average we have now; some are a bit smaller, some a bit bigger. One message contains between 9 and 12 URLs (plus some data) corresponding to 9-12 files to be deleted on different disks. So we could reduce the size by splitting each message, but in that case we would multiply the number of messages by 9-12x.

kjnilsson

Jun 23, 2023, 4:23:10 AM
to rabbitmq-users
A couple of questions:

1. What version of RabbitMQ are you running?
2. What disks are used for the RabbitMQ cluster? I assume not HDDs, but with that kind of data the fastest NVMe drives would be recommended.

But really this feels like a streams use case.

Cheers

Jérôme Loyet

Jun 23, 2023, 5:02:46 AM
to rabbitm...@googlegroups.com
- we are using RabbitMQ 3.12.0 and Erlang 25.0.4
- the RabbitMQ servers use Samsung enterprise PCIe 4.0 NVMe drives

kjnilsson

Jun 23, 2023, 6:21:29 AM
to rabbitmq-users
I still think streams would be a good way to deal with this kind of data. Alternatively, you can spread the events over multiple queues, which may well behave better.

Also, you could try increasing the number of entries per disk segment. This may allow the recovery to proceed somewhat faster.

raft.segment_max_entries = 65536

Michal Kuratczyk

Jun 30, 2023, 5:52:12 AM
to rabbitm...@googlegroups.com
We have discussed this use case briefly in a video we recorded yesterday: https://www.youtube.com/watch?v=vFmPWz8n_rI

It's an interesting one and certainly requires a bit of trial and error to find the best approach. Personally I'd expect multiple queues to handle this use case best. Since these are tasks, destructive consumption is better - once performed, the operation should no longer be in the queue (it'd remain in the stream).
A single very long quorum queue is indeed not great due to quorum queue recovery and other issues.

Given no ordering requirements and no strict latency requirements (I mean that it's ok if a task takes a bit of time to execute),
I'd expect a random exchange with a bunch of classic queues (v2 of course ;) ) to work well:
* classic queues v2 can handle very long queues quite well, but with messages split between many queues, they wouldn't even have to
* if a node with a given queue is unavailable for some time, the tasks from that queue (or queues) will take longer to execute, but that shouldn't be a huge problem
* local random exchange (https://github.com/rabbitmq/rabbitmq-server/pull/8334, discussed in the video) could further improve this solution by making sure that messages are always published locally (where the publishing connection is) and with a bit of work, that they are consumed locally as well
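A rough sketch of such a topology, using the existing rabbitmq_random_exchange plugin (rather than the local random exchange from the PR above) and rabbitmqadmin; exchange/queue names and the number of queues are purely illustrative:

    rabbitmq-plugins enable rabbitmq_random_exchange
    rabbitmqadmin declare exchange name=deletions type=x-random durable=true
    # one classic queue v2 per "slot"; repeat for as many queues as testing suggests
    rabbitmqadmin declare queue name=deletions-1 durable=true arguments='{"x-queue-version": 2}'
    rabbitmqadmin declare queue name=deletions-2 durable=true arguments='{"x-queue-version": 2}'
    rabbitmqadmin declare binding source=deletions destination=deletions-1
    rabbitmqadmin declare binding source=deletions destination=deletions-2

Publishers then send everything to the deletions exchange, and each message ends up in exactly one of the bound queues, picked at random.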

As for how many queues - that'd certainly require trial and error, but I'd expect the number to be a few per node, not more.
If we split even 100M messages between 3 nodes and then further into a few queues, that'd be a few million messages per queue.
This should be handled quite well with classic queues (or even quorum queues, but there would be replication overhead for sure).

Lastly, keep in mind that classic queues (both v1 and v2) have two storage mechanisms: messages embedded in the index and a shared message store.
By default, the message store is used for messages above 4kb (https://www.rabbitmq.com/persistence-conf.html#index-embedding).
So this is another dimension worth testing/investigating: there could be a significant difference based on whether the messages are below this threshold or not (changing the value of queue_index_embed_msgs_below is an alternative).
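For example, to test with your ~5kb messages on the embedded-in-index side of that threshold, the rabbitmq.conf knob would look like this (the value is purely illustrative; the default is 4096 bytes, and 0 would send everything to the message store):

# rabbitmq.conf
queue_index_embed_msgs_below = 8192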

Best,




--
Michał
RabbitMQ team