Starting RabbitMQ times out with many durable queues


jacobh...@gmail.com

Mar 6, 2022, 4:21:36 PM
to rabbitmq-users
We are using RabbitMQ with many durable queues (over 200K). When we restart the service, it times out and fails to start; the only fix we have found is to delete the mnesia folder.
Is there a configuration setting that would let us keep this many durable queues and still restart the service successfully?

Wes Peng

Mar 6, 2022, 4:34:21 PM
to rabbitm...@googlegroups.com
1. Keep queues small; consume messages as soon as possible
2. Use a fast disk such as NVMe. No RAID is needed if you have multiple replicas.
3. Upgrade RMQ to the latest version

Thanks 


jacobh...@gmail.com

Mar 6, 2022, 4:59:36 PM
to rabbitmq-users
Thank you for the quick response

1. Our queues are very small, just a few KB or less
2. We are using AWS r5.4xlarge with an EBS volume; we will try switching to SSD or NVMe
3. We are using RMQ 3.9.8

Michal Kuratczyk

Mar 7, 2022, 4:17:50 AM
to rabbitm...@googlegroups.com
Hi,

Would you mind sharing:
1. your definitions file (rabbitmqctl export_definitions) or at least one queue definition if they are all the same?
2. node startup logs, ideally at debug level

Also, if you have a definitions JSON file specified in your config file (load_definitions) then:
1. hopefully you can remove it to avoid reimporting queues you already have defined
2. In 3.10, there will be an option to skip importing a previously imported definitions file (based on a checksum, i.e. "skip the import if the JSON file didn't change"; see: https://github.com/rabbitmq/rabbitmq-server/blob/master/release-notes/3.10.0.md)

Best,



--
Michał
RabbitMQ team

jacobh...@gmail.com

Mar 7, 2022, 8:39:17 AM
to rabbitmq-users

1. Queue definition (all are the same):
"queues": [
    {
        "arguments": {},
        "auto_delete": false,
        "durable": true,
        "name": "someid",
        "type": "classic",
        "vhost": "/"
    },

2. We don't have a definitions JSON file.

3. I can't share the log files at the moment. Is there something specific I should look for?

Michal Kuratczyk

Mar 7, 2022, 9:02:08 AM
to rabbitm...@googlegroups.com
Thanks, I'll define 200k of these and see what I get. Just to confirm:
1. Are there any policies in effect for these queues?
2. These queues are pretty much empty, correct?
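
For anyone who wants to try a similar reproduction on a test node, here is a minimal Python sketch that builds a definitions document with many queues shaped like the one posted above. It is not necessarily how the test in this thread was set up. The `build_definitions` and `write_definitions` helpers, the `queue-` name prefix, and the output path are all hypothetical, and a real definitions file may need further sections (users, vhosts, permissions) depending on the target node.

```python
import json


def build_definitions(count, vhost="/", prefix="queue-"):
    """Return a dict with `count` durable classic queues.

    Each entry mirrors the queue definition posted in this thread;
    the generated names (prefix + index) are made up for illustration.
    """
    queues = [
        {
            "arguments": {},
            "auto_delete": False,
            "durable": True,
            "name": f"{prefix}{i}",
            "type": "classic",
            "vhost": vhost,
        }
        for i in range(count)
    ]
    return {"queues": queues}


def write_definitions(path, count):
    """Serialize the generated definitions to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_definitions(count), fh)
```

Calling `write_definitions("definitions.json", 200_000)` produces a file that could then be loaded on a disposable test node, for example with `rabbitmqctl import_definitions`. Never point this at a production node.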

The logs would show which steps of the boot process consume the time. If you can at least have a look at the logs and tell us which lines are "far apart" (time-wise), that'd be helpful.
At debug level, you should also see lines such as
- Time to start RabbitMQ ...
- Recovering N queues of type rabbit_classic_queue took Xms
- rabbit_binding:recover/2 for vhost / completed in...
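
One way to spot lines that are "far apart" without reading the whole log is a small script like the Python sketch below. It assumes each log entry starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS.ffffff` followed by a UTC offset, and that all entries use the same offset; adjust the pattern if your log format differs. The `find_gaps` helper and its threshold are hypothetical. Debug logging itself can be enabled with `log.file.level = debug` in `rabbitmq.conf`.

```python
import re
from datetime import datetime

# Assumed prefix of each entry, e.g. "2022-03-07 09:02:08.123456+00:00 [debug] ..."
# The UTC offset is ignored, so all lines must share the same offset.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})")


def find_gaps(lines, min_gap_seconds=30.0):
    """Return (gap_seconds, previous_line, next_line) tuples, largest gap first.

    Only consecutive timestamped lines are compared; lines without a
    leading timestamp (e.g. multi-line stack traces) are skipped.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev_time is not None:
            gap = (current - prev_time).total_seconds()
            if gap >= min_gap_seconds:
                gaps.append((gap, prev_line.rstrip(), line.rstrip()))
        prev_time, prev_line = current, line
    return sorted(gaps, reverse=True)
```

Feeding it an open log file (`find_gaps(open(path))`, where `path` is wherever your node writes its log) lists the pairs of lines between which the boot spent the most time.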

Lastly, how long does it take before RabbitMQ gives up booting?

Best,



--
Michał
RabbitMQ team

jacobh...@gmail.com

Mar 7, 2022, 9:55:07 AM
to rabbitmq-users
Thanks
1. We don't have policies on these queues
2. Most are empty; some hold just a few KB (is there a simple way to check this at runtime for so many queues?)

I don't have the exact information because this is a production environment and reproducing means downtime...
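
On the runtime check asked about in point 2: one possible approach is to summarize the output of `rabbitmqctl list_queues name messages message_bytes`. The Python sketch below assumes tab-separated rows in that column order and skips banner and header lines. The `parse_list_queues`, `summarize`, and `fetch_queue_listing` helpers are hypothetical, and with 200K queues the listing itself may be slow or hit the command's timeout.

```python
import subprocess


def parse_list_queues(output):
    """Parse rows of `name<TAB>messages<TAB>message_bytes`.

    Lines that do not have three tab-separated fields with numeric
    counts (banners, the header row) are ignored.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        name, messages, size = parts
        if not (messages.isdigit() and size.isdigit()):
            continue
        rows.append((name, int(messages), int(size)))
    return rows


def summarize(rows):
    """Aggregate the parsed rows into a few totals."""
    return {
        "queues": len(rows),
        "non_empty": sum(1 for _, messages, _ in rows if messages > 0),
        "total_messages": sum(messages for _, messages, _ in rows),
        "total_bytes": sum(size for _, _, size in rows),
    }


def fetch_queue_listing():
    """Run rabbitmqctl on the local node and return its raw output."""
    cmd = ["rabbitmqctl", "list_queues", "name", "messages", "message_bytes"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

`summarize(parse_list_queues(fetch_queue_listing()))` then gives the number of non-empty queues and the total bytes held, which is enough to confirm whether the queues really are almost empty.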

Michal Kuratczyk

Mar 8, 2022, 9:20:56 AM
to rabbitm...@googlegroups.com
Hi,

On my test machine, a node with 100k classic queues starts in about 10 minutes. With the upcoming v2 of the classic queues, it goes down to 6 minutes.
How long does it take for you? Any chance of some info from your logs? Bindings, especially topic bindings, can also take a long time to import/load on boot, so the culprit may be the bindings rather than the queues.
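
As a rough back-of-the-envelope check, the figures above can be extrapolated to 200K queues. The Python sketch below assumes startup time scales linearly with queue count, which is only an assumption, and the numbers come from a different machine than the one in question. The `estimate_startup_seconds` helper is hypothetical.

```python
def estimate_startup_seconds(queue_count, measured_queues, measured_seconds):
    """Linear extrapolation from one measurement (assumes cost per queue is constant)."""
    return queue_count * measured_seconds / measured_queues


# Figures quoted above: 100k classic queues in ~10 min, ~6 min with classic queues v2.
classic_v1 = estimate_startup_seconds(200_000, 100_000, 10 * 60)
classic_v2 = estimate_startup_seconds(200_000, 100_000, 6 * 60)
print(f"{classic_v1 / 60:.0f} min, {classic_v2 / 60:.0f} min")  # prints "20 min, 12 min"
```

If the estimate is longer than whatever start timeout applies in your setup, the node may simply still be recovering queues when the start is reported as failed.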

RabbitMQ team