RabbitMQ takes 50 minutes to start

Nacho Vargas
Mar 9, 2026, 4:22:03 PM
to rabbitmq-users
Hello,

An on-premises, single-node RabbitMQ deployment takes around 50 minutes to restart. It hosts about 4,000 streams, totaling 1 terabyte (TB) of data. It runs RabbitMQ 3.13.7 with Erlang 26.5.3.2 on Windows Server 2022, with a low ingress rate. In the log, almost all of the startup time is spent on messages like:

osiris_writer:init/1: name: <STREAM_NAME> last offset: X committed chunk id: Y epoch: Z

CloudAMQP support engineers have reproduced this behavior in a 3-node cluster on AWS, using the latest RabbitMQ and Erlang versions that CloudAMQP supports. To reproduce the issue, they restart RabbitMQ on all cluster nodes simultaneously. They observe that increasing IOPS helps, and they mention that, according to some metrics, IOPS is the limiting factor.

I don't understand why there is so much I/O on restart; could this be a bug? I would expect RabbitMQ to read only the last segment of each stream, or perhaps just some metadata, but it feels like it is reading the whole terabyte.

Please help. Even if this is the expected behavior by design, I would like to understand what it is doing on restart and why.

Thank you,

- Nacho Vargas.

Luke Bakken
Mar 10, 2026, 11:38:59 AM
to rabbitmq-users
Hello,

Your first step must be to use the latest version of RabbitMQ, which is 4.2.4, as well as the latest compatible version of Erlang, which is 27.3.4.8. You don't mention which versions were used by CloudAMQP, other than to say "latest ... as supported", which isn't precise.

Your next step, after reproducing with the latest version of RabbitMQ, is to provide a project, script, or Docker Compose project that reproduces the issue every time. Once you have that, post a discussion, and Team RabbitMQ may be able to find time to assist you with this:

"if this is the expected behavior by design, I would like to please understand what is doing on restart and why."

Finally, let me mention this - in the age of open-source software and agentic AI, it has never been easier to answer your own questions using the source code of the software you run.

Thanks,
Luke

Nacho Vargas
Mar 10, 2026, 4:19:23 PM
to rabbitmq-users
Thank you for your reply.

Publishing at 50 messages/second with 50 KB per message, it takes almost 5 days to reach 1 terabyte, unless I have made a most embarrassing mistake in the math: 50 msg/s * 50 KB/msg = 2.5 MB/s => 150 MB/minute => 9 GB/hour => 0.216 TB/day => 1 TB / 0.216 TB/day ≈ 4.6 days
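The same arithmetic can be double-checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes decimal (SI) units and reads "50 KB" as 50 kilobytes per message; with binary units (KiB, TiB) the result is still just under 5 days. The per-stream figure assumes, hypothetically, that the 1 TB is spread evenly across the 4,000 streams.

```python
"""Back-of-the-envelope check of the ingest math, assuming decimal (SI) units."""

MSG_PER_SEC = 50
MSG_BYTES = 50_000        # "50 KB" read as 50 kilobytes (assumption)
TARGET_BYTES = 10**12     # 1 TB
NUM_STREAMS = 4000        # stream count from the first message

bytes_per_sec = MSG_PER_SEC * MSG_BYTES   # 2.5 MB/s
bytes_per_min = bytes_per_sec * 60        # 150 MB/minute
bytes_per_hour = bytes_per_min * 60       # 9 GB/hour
bytes_per_day = bytes_per_hour * 24       # 216 GB/day = 0.216 TB/day

days_to_target = TARGET_BYTES / bytes_per_day         # about 4.63 days
avg_bytes_per_stream = TARGET_BYTES // NUM_STREAMS    # if spread evenly (assumption)

print(f"{bytes_per_sec / 1e6:.1f} MB/s, {bytes_per_day / 1e12:.3f} TB/day")
print(f"{days_to_target:.2f} days to reach 1 TB")
print(f"{avg_bytes_per_stream / 1e6:.0f} MB per stream on average")
```

Running it prints 2.5 MB/s, 0.216 TB/day, 4.63 days to reach 1 TB, and 250 MB per stream on average, which agrees with the hand calculation.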

I am not sure what the most practical way to share this with you is. I could build a Docker image with the RabbitMQ and Erlang versions you mention, plus a script that creates an arbitrary exchange, declares 4,000 streams each bound to that exchange, and then publishes 50 messages/second at 50 KB per message, but you would have to keep it running for 5 days. Alternatively, I could do the publishing myself and then share a Docker image that is already populated with data, but the image would be over 1 terabyte. What would be the best way to share this with you?

I do not know which versions CloudAMQP used. I believe it was RabbitMQ 4.2.2 with Erlang 28, but I don't know for sure.

I am not well versed on agentic AI but I will give it a try, thank you.