Hello Michal,
I ran several tests today, during which testing volume from our business teams was lower than usual. The lower volume seemed to have no impact on startup time: each restart still took more than 40 seconds to recover.
1) Running the list_queues command before and after restarting the rabbit service, I see 0 queued messages in either scenario. I also checked most of the other list_queues commands, all of which yielded similar results.
2) I'm having difficulty obtaining exact message sizes, despite several attempts with the various list_queues message_bytes commands. Some of these queues pass smaller messages, but we also use them to pass larger XML bodies. I'll try to get better numbers from our Production environment, but from what I've seen so far most messages are under 4KB.
3) No, we're using 100% durable queues.
4) Do you need to see the physical files? Our application handles sensitive data, so while sharing data from the test environment would probably be fine, I would prefer not to if possible. If you do need the files, I'll get them to you tomorrow. That said, every queue directory contains a single .queue_name file, and the msg_store_transient folder contains a single 0KB 0.rdq file. The msg_store_persistent directory is much more interesting, currently containing 38 .rdq files comprising 128MB of data on this server:
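If the files themselves cannot be shared, the .rdq count and total size can be gathered without exposing message contents. The following is only a minimal sketch: the function name `summarize_rdq` is hypothetical, and the directory path must be supplied by whoever runs it, since the actual data directory location is not stated here.

```python
import sys
from pathlib import Path


def summarize_rdq(store_dir):
    """Return (file_count, total_bytes) for .rdq files directly inside store_dir."""
    files = [
        p for p in Path(store_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".rdq"
    ]
    return len(files), sum(p.stat().st_size for p in files)


# Hypothetical usage: pass the msg_store_persistent path as the only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    count, total = summarize_rdq(sys.argv[1])
    print(f"{count} .rdq files, {total / (1024 * 1024):.1f} MB")
```

Running this before and after a restart would show whether the persistent store shrinks or stays the same, using only file metadata.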
Our typical process that reproduces this issue is as follows:
1) Shut down the message-consuming services in the Windows Services app
2) Shut down or restart the RabbitMQ service in the Windows Services app
3) Start the RabbitMQ service in the Windows Services app, then wait for the system to start responding (the RabbitMQ startup process now typically brings the server to its knees)
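To get repeatable recovery-time numbers, the restart steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure for this environment: it assumes the Windows service is named "RabbitMQ", that `rabbitmqctl.bat` is on the PATH, and that a zero exit code from `rabbitmqctl status` is an acceptable (possibly optimistic) definition of "responding". The helper names are hypothetical, and the consuming services from step 1 would still need to be stopped first.

```python
import subprocess
import time


def wait_until_ready(probe, timeout=300.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds, or None if the timeout is reached first.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


def broker_is_up(ctl="rabbitmqctl.bat"):
    # Assumption: the ctl script is on PATH and 'status' exits non-zero
    # while the node cannot be reached.
    try:
        result = subprocess.run([ctl, "status"], capture_output=True, timeout=60)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def time_restart(service="RabbitMQ"):
    # Assumption: the Windows service name is "RabbitMQ".
    # 'net stop' may fail if the service is already stopped, so it is not checked.
    subprocess.run(["net", "stop", service], check=False)
    subprocess.run(["net", "start", service], check=True)
    return wait_until_ready(broker_is_up)
```

Calling `time_restart()` from an elevated prompt would return the number of seconds between the service start and the first successful status check, or `None` if the node never responded within the timeout.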
I greatly appreciate your time and insight into this matter.
Thanks,
Patrick