RabbitMQ takes 50 minutes to start

Nacho Vargas

Mar 9, 2026, 4:22:03 PM
to rabbitmq-users
Hello,

An on-prem, single-node RabbitMQ instance takes around 50 minutes to restart. It holds about 4,000 streams, totalling roughly 1 terabyte of data. It runs RabbitMQ 3.13.7 with Erlang 26.5.3.2 on Windows 2022, and the ingress rate is low. In the log, almost all of the time is spent on messages like:

osiris_writer:init/1: name: <STREAM_NAME> last offset: X committed chunk id: Y epoch: Z
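To see how the restart time is distributed across streams, the gaps between consecutive osiris_writer:init/1 lines can be measured. The following is a minimal sketch, not a definitive tool. It assumes each log line starts with a timestamp such as "2026-03-09 16:22:03.123456+00:00" followed by the level and process id; that prefix format is an assumption and may need adjusting to the actual log configuration. The names LINE_RE and init_gaps are hypothetical.

```python
import re
from datetime import datetime

# Assumed line prefix: "<date> <time>.<fraction><utc offset> ...".
# Only the message part after the prefix comes from the log excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}) .*"
    r"osiris_writer:init/1: name: (?P<name>.+?) last offset:"
)

def init_gaps(lines):
    """Return (stream_name, seconds_since_previous_init_line) pairs.

    Lines that are not osiris_writer:init/1 messages are skipped.
    The first matching line has no predecessor, so it yields no pair.
    """
    gaps = []
    prev = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f%z")
        if prev is not None:
            gaps.append((m.group("name"), (ts - prev).total_seconds()))
        prev = ts
    return gaps
```

Sorting the result by the second element would show whether a few streams dominate the 50 minutes or whether the time is spread evenly across all of them.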

This behavior has been reproduced by CloudAMQP support engineers on a 3-node cluster in AWS, using the latest versions of RabbitMQ and Erlang supported by CloudAMQP. To reproduce the issue, they restart RabbitMQ on all cluster nodes simultaneously. They observe that increasing IOPS helps and report that, according to some metrics, IOPS is the limiting factor.

I don't understand why there is so much I/O on restart. Could this be a bug? I would expect RabbitMQ to read only the last segment of each stream, or perhaps just some metadata, but it behaves as if it were reading the whole terabyte.
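A rough back-of-the-envelope check of the figures above, as a sketch only. It assumes a decimal terabyte, that streams are initialised one after another, and that the full 50 minutes is spent on stream initialisation; none of these is confirmed. The function name restart_estimates is hypothetical.

```python
# Figures taken from the post; a decimal terabyte (1e12 bytes) is assumed.
TOTAL_BYTES = 1e12          # about 1 TB across all streams
STREAMS = 4_000             # about 4,000 streams
RESTART_SECONDS = 50 * 60   # about 50 minutes

def restart_estimates(total_bytes, streams, seconds):
    """Derive simple averages from the reported totals."""
    return {
        # Average time per stream if streams are initialised sequentially.
        "seconds_per_stream": seconds / streams,
        # Sustained read rate needed if all data were read during restart.
        "mb_per_second_if_full_read": total_bytes / seconds / 1e6,
        # Average amount of data held per stream.
        "avg_mb_per_stream": total_bytes / streams / 1e6,
    }

estimates = restart_estimates(TOTAL_BYTES, STREAMS, RESTART_SECONDS)
```

Under these assumptions, the restart averages 0.75 seconds per stream, and reading the entire terabyte within 50 minutes would require a sustained rate of about 333 MB/s. Comparing that rate with the throughput the disk can actually deliver would help distinguish a full read from many small reads that are bounded by IOPS.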

Please help. Even if this behavior is expected by design, I would like to understand what RabbitMQ is doing on restart and why.

Thank you,

- Nacho Vargas.