RabbitMQ Server Slow Startup

2,677 views
Skip to first unread message

Simon

unread,
Jul 19, 2018, 4:30:23 AM7/19/18
to rabbitmq-users
We frequently see problem with RabbitMQ Server taking a long time to start. This causes problems with other services that rely on RabbitMQ.

Typical Startup
2018-07-17 12:44:09.589 [info] <0.33.0> Application lager started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:09.620 [info] <0.5.0> Log file opened with Lager
2018-07-17 12:44:10.543 [info] <0.33.0> Application crypto started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:10.543 [info] <0.33.0> Application xmerl started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:10.543 [info] <0.33.0> Application jsx started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.049 [info] <0.33.0> Application inets started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.417 [info] <0.33.0> Application mnesia started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.417 [info] <0.33.0> Application cowlib started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.472 [info] <0.33.0> Application os_mon started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.472 [info] <0.33.0> Application recon started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.473 [info] <0.33.0> Application asn1 started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.473 [info] <0.33.0> Application public_key started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.734 [info] <0.33.0> Application ssl started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.743 [info] <0.33.0> Application ranch started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.748 [info] <0.33.0> Application cowboy started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.748 [info] <0.33.0> Application ranch_proxy_protocol started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.748 [info] <0.33.0> Application rabbit_common started on node 'rabbit@WIN-KFOQRCTIIHC'
2018-07-17 12:44:11.769 [info] <0.196.0> 
 Starting RabbitMQ 3.7.3 on Erlang 20.3
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/
2018-07-17 12:44:11.774 [info] <0.196.0> 
 node           : rabbit@WIN-KFOQRCTIIHC
 home dir       : C:\Windows\system32\config\systemprofile
 config file(s) : c:/Users/vagrant/AppData/Roaming/RabbitMQ/advanced.config
                : c:/Users/vagrant/AppData/Roaming/RabbitMQ/rabbitmq.conf
 cookie hash    : ajGQKrVtBmORnWipc9i0+Q==
 log(s)         : C:/Users/vagrant/AppData/Roaming/RabbitMQ/log/RABBIT~1.LOG
                : C:/Users/vagrant/AppData/Roaming/RabbitMQ/log/rabbit@WIN-KFOQRCTIIHC_upgrade.log
 database dir   : c:/Users/vagrant/AppData/Roaming/RabbitMQ/db/RABBIT~1
2018-07-17 12:44:44.451 [info] <0.215.0> Memory high watermark set to 1228 MiB (1288301772 bytes) of 3071 MiB (3220754432 bytes) total
2018-07-17 12:44:44.495 [info] <0.217.0> Enabling free disk space monitoring
2018-07-17 12:44:44.495 [info] <0.217.0> Disk free limit set to 50MB
2018-07-17 12:44:44.595 [info] <0.219.0> Limiting to approx 8092 file handles (7280 sockets)


Slow Startup
2018-07-17 09:32:35.707 [info] <0.33.0> Application lager started on node 'rabbit@WIN-QGP85F7NV82'
2018-07-17 09:32:36.376 [info] <0.5.0> Log file opened with Lager
2018-07-17 09:46:26.737 [info] <0.33.0> Application crypto started on node 'rabbit@WIN-QGP85F7NV82'
2018-07-17 09:46:26.737 [info] <0.33.0> Application xmerl started on node 'rabbit@WIN-QGP85F7NV82'
2018-07-17 09:46:26.738 [info] <0.33.0> Application inets started on node 'rabbit@WIN-QGP85F7NV82'

As you can clearly see, crypto apparently takes about 14 minutes to start when we have a Slow Startup situation.

Any help with resolving this would be much appreciated.

Many thanks,
Simon

Daniil Fedotov

unread,
Jul 19, 2018, 5:46:16 AM7/19/18
to rabbitmq-users
Hi,

There is not that much info - will be hard to reproduce that. I can see that lager is the application which starts before crypto, can this be related to the log file size somehow?

Simon

unread,
Jul 19, 2018, 6:41:18 AM7/19/18
to rabbitmq-users
Thanks for the quick reply.

The log files are generally quite small, only about 100KB. I agree there's not much information. If you can tell me how to get more, I'll gladly turn it on.

Luke Bakken

unread,
Jul 19, 2018, 8:26:24 AM7/19/18
to rabbitmq-users
Hi Simon,

I see that you're using Windows and Vagrant. How many virtual CPUs have you assigned to the virtual machine? I frequently test RabbitMQ issues on Windows and haven't seen this behavior, but I usually assign more than one vCPU to the Windows guest.

Also, what hypervisor are you using, and what host OS?

Thanks,
Luke

Simon Gooch

unread,
Jul 19, 2018, 11:37:54 AM7/19/18
to rabbitm...@googlegroups.com
Hi Luke,

Yes that could well be a factor but most of the time RabbitMQ gets past this stage in a few seconds. It's the inconsistency that's the biggest worry, as I don't know what to change to fix it.

Host OS: Windows 7
Hypervisor: Virtual Box
vCPUs: 2

Thanks,
Simon

--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/6Z_HCO1ENgk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Luke Bakken

unread,
Jul 19, 2018, 5:18:07 PM7/19/18
to rabbitmq-users
Is the guest OS that runs RabbitMQ also Windows 7?

Simon Gooch

unread,
Jul 19, 2018, 5:29:44 PM7/19/18
to rabbitm...@googlegroups.com
Sorry I should have included that information.

No the guest OS is Windows Server 2016

Thanks

--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/6Z_HCO1ENgk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.

Luke Bakken

unread,
Jul 20, 2018, 8:23:37 PM7/20/18
to rabbitmq-users
Hi Simon,

Sorry to say but I don't have a lot of ideas with regard to this issue, mainly because I have never seen it. I run VirtualBox 5.2.16 on a Linux host and test RabbitMQ using VMs that run Windows 8.1, Windows 10, Windows Server 2008R2 and 2012.

My main thought is to run VirtualBox using a more modern host OS than Windows 7. Using a more modern host OS would allow using Hyper-V rather than VirtualBox.

Luke

Simon Gooch

unread,
Jul 21, 2018, 2:32:53 AM7/21/18
to rabbitm...@googlegroups.com
Hi Luke,

Thanks for trying to help. Unfortunately I can't upgrade the host OS at the moment.

What I'd really like is to enable some more debug to find out what's going on during the long period where it appears to be doing nothing.

Thanks,
Simon

--

Dave Cottlehuber

unread,
Jul 21, 2018, 4:43:33 PM7/21/18
to Simon Gooch, rabbitm...@googlegroups.com

On Sat, 21 Jul 2018, at 08:32, Simon Gooch wrote:
> Hi Luke,
>
> Thanks for trying to help. Unfortunately I can't upgrade the host OS at the
> moment.
>
> What I'd really like is to enable some more debug to find out what's going
> on during the long period where it appears to be doing nothing.

Have you read the windows system event log on both guest and host? This looks like you’re running out of suitable entropy.
> You received this message because you are subscribed to the Google
> Groups "rabbitmq-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> To post to this group, send an email to rabbitm...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



  Dave Cottlehuber
  +43 67 67 22 44 78
  Managing Director
  Skunkwerks, GmbH
  ATU70126204
  Firmenbuch 410811i


Simon

unread,
Jul 23, 2018, 5:06:29 AM7/23/18
to rabbitmq-users
Hi Dave,

Thanks for that. I think you're on to something as some people have reported seeing erlsrv using a lot of CPU time while it's in this state. Unfortunately I can't reproduce this on demand so I'll have to wait for it to happen again before I can check the event logs.

However, this does raise some questions in my head:
1. Is it possible to get some debugging from crypto?
2. Do I need to have crypto running as I'm not interested in security as it's only running on a test VM?
3. Is there a way I can generate entropy from the host machine to enable crypto to start up quicker, if you see what I mean?

Many thanks for your help,
Simon

On Thursday, 19 July 2018 09:30:23 UTC+1, Simon wrote:

Chris

unread,
Feb 19, 2019, 10:25:55 PM2/19/19
to rabbitmq-users
Is there a solution for this problem?

I'm having the same issue, where it gets stuck after "Log file opened with Lager" for anywhere from 1 second to 20 minutes, during this time the ERL.EXE process is running at 100% of one core.

I don't think it's related to crypto, as the modules appear in a different order for me, for example:

2019-02-20 02:07:30.725 [info] <0.33.0> Application lager started on node 'rabbit@EC2AMAZ-I06P2HT'
2019-02-20 02:07:31.209 [info] <0.5.0> Log file opened with Lager
2019-02-20 02:24:33.515 [info] <0.33.0> Application xmerl started on node 'rabbit@EC2AMAZ-I06P2HT'
2019-02-20 02:24:33.531 [info] <0.33.0> Application os_mon started on node 'rabbit@EC2AMAZ-I06P2HT'
2019-02-20 02:24:33.531 [info] <0.33.0> Application inets started on node 'rabbit@EC2AMAZ-I06P2HT'
2019-02-20 02:24:33.609 [info] <0.33.0> Application mnesia started on node 'rabbit@EC2AMAZ-I06P2HT'
2019-02-20 02:24:33.609 [info] <0.33.0> Application crypto started on node 'rabbit@EC2AMAZ-I06P2HT'

I'm using ERL 20.2, Rabbit 3.7.2 on Windows Server 2019 VM (AWS EC2)

Any progress on how to solve this? It's a fairly major issue as it means that if we need to restart the machine or RabbitMQ, sometimes it will come back immediately, but sometimes we have to wait up to 20 minutes for it to start working...

Regards,
Chris

Michael Klishin

unread,
Feb 20, 2019, 2:11:12 AM2/20/19
to rabbitm...@googlegroups.com
In environments with multiple virtual hosts [1] speeds up node startup sometimes up to x5
(roughly proportional to the number of virtual hosts to recover but this is one real world workload
we were using to compare).


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Luke Bakken

unread,
Feb 20, 2019, 9:56:55 AM2/20/19
to rabbitmq-users
Hi Chris,

You may be running into this issue, though I've only seen it on Erlang 21 and higher: https://bugs.erlang.org/browse/ERL-716

Any chance you're only using one virtual CPU?

Thanks -
Luke

Simon

unread,
Feb 20, 2019, 10:05:23 AM2/20/19
to rabbitmq-users
Hi Chris,

Unfortunately we haven't come up with a solution and our investigations confirm what you've seen that it's probably nothing to do with crypto. It just seems to get stuck loading the plugins but we haven't managed to get further than that.
If we could enable some debug that would help us to diagnose the problem that would be useful.

Hope you manage to get to the bottom of this as we've been living with it for the last year.

All the best,
Simon

Chris

unread,
Feb 20, 2019, 6:17:18 PM2/20/19
to rabbitmq-users
Thanks for the replies, guys.

In environments with multiple virtual hosts [1] speeds up node startup sometimes up to x5
(roughly proportional to the number of virtual hosts to recover but this is one real world workload
we were using to compare).
 
I don't think it's related to recovery because this problem happens even on a brand new server that has just had Rabbit installed, with no exchanges or queues configured.

Any chance you're only using one virtual CPU?

The machine does have one vCPU with 2 cores, yes... increasing that is something to try.


Some good news (I think), I just tried downgrading Erlang to 19.3 and so far it seems a lot better, perhaps ERL 19.3 is not affected by this issue. I'm going to keep an eye on it but this looks like it might be a workaround for now...

Luke Bakken

unread,
Feb 21, 2019, 7:37:53 PM2/21/19
to rabbitmq-users
Hi everyone,

When you upgrade next, be sure to use Erlang 21.2. This issue appears to be fixed on Windows with that version:


Thanks,
Luke

Nicholas Randal

unread,
Jan 24, 2020, 1:21:03 PM1/24/20
to rabbitmq-users
TL;DR you must have more than 1 vCPU.

I went through the process of setting up 4 new servers. Erlang 22.2 (10.6) and RabbitMQ 3.8.2 on Windows 2019 servers. As soon as I enabled a couple plugins erlang slammed the CPU and for the next two hours never came down again. I coworker found this thread and I verified that I had only 1 vCPU.

I went back to my test VM that I had initially tested my setup and it had 4 vCPUs. dropped it to 1 and ERLANG slammed the processor. Brought it back up to 2 vCPUs and all was well.

After that I added a PowerShell script check to make sure the system under install has at least 2 logical processors.
Reply all
Reply to author
Forward
0 new messages