RabbitMQ 3.7.x startup issues with erl.exe high cpu

Ralph Kramden

Apr 12, 2018, 12:14:34 PM
to rabbitmq-users
I'm running into an issue starting RabbitMQ where the service reports that it is running, but CPU usage for erl.exe stays high (40-60%) for upwards of 15 minutes. During this time it does not process messages, and the management console says it cannot connect. This is not related to the hibernation/fast boot issue, since it can also happen when I manually stop and start the service. The strange thing is that it doesn't happen every time; it seems to happen on roughly every 3rd or 4th service restart. Most of the time startup takes about 5-10 seconds. This has happened on 2 different machines.

Setup:
- Windows 2012 R2
- RabbitMQ 3.7.2 or 3.7.4

I've checked the rabbit log, and it has a huge time gap between the statements "[info] <0.5.0> Log file opened with Lager" and "<0.33.0> Application inets started on node". I've changed the log level to debug, but that hasn't shown anything different. There are no crash dumps when this happens. I also used Process Monitor to see what erl.exe is doing during that time, and there doesn't seem to be any difference other than the amount of time it takes. It does seem like a lot of file operations are happening during this time.
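
To put a number on the gap between the two log statements mentioned above, something like the following Python sketch could scan the log and report the widest pause between consecutive entries. It assumes each entry starts with a timestamp of the form `2018-04-12 12:14:34.567`; that prefix format and the function name `largest_gap` are assumptions for illustration, so adjust the pattern if your log looks different.

```python
import re
from datetime import datetime

# Assumed timestamp prefix, e.g. "2018-04-12 12:14:34.567 [info] <0.5.0> ..."
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")

def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the widest gap, or None."""
    best = None
    prev_ts = prev_line = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # lines without a leading timestamp are skipped
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line.rstrip(), line.rstrip())
        prev_ts, prev_line = ts, line
    return best
```

Calling `largest_gap(open(path))` with the path to the rabbit log returns the two entries on either side of the longest pause, which makes it easy to compare a slow start with a fast one.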

Has anyone experienced anything like this before?

Ralph Kramden

Apr 12, 2018, 1:38:31 PM
to rabbitmq-users
As an update, I've tried Erlang OTP 20.1 and 20.3, and both exhibit the issue.

Luke Bakken

Apr 12, 2018, 1:52:19 PM
to rabbitmq-users
Hi Ralph,

Have you seen this on multiple Windows 2012 R2 servers or just the one? Is it a VM or bare metal?

Enabling rabbitmq-top and using that to investigate may shed some light on what's going on during this time: https://github.com/rabbitmq/rabbitmq-top

If that doesn't work, let me know and I'll figure out the equivalent command to run from the command line.

Thanks,
Luke

Ralph Kramden

Apr 12, 2018, 2:29:34 PM
to rabbitmq-users
Thanks for the suggestion; however, while this is happening the management console is unavailable. Once Erlang finishes whatever it is doing, the console becomes available, but by then it's no longer helpful.

We're seeing this on multiple Windows 2012 R2 servers. They are in VirtualBox; unfortunately we don't have bare metal servers at this time. We tried using fixed disks instead of dynamic disks, but it made no difference. We're using the latest release of VirtualBox. Our development machines, which run Windows 10, do not exhibit this behavior: RabbitMQ becomes available within seconds. I'm going to test using a Hyper-V VM as well.

Does Rabbit/Erlang have any known issues running in a VM?

Thanks!

Luke Bakken

Apr 12, 2018, 4:18:21 PM
to rabbitmq-users
Hi Ralph -

I regularly test RabbitMQ features using Windows 8.1 Pro and Windows 10 Pro on VirtualBox (hosted by Arch Linux) and have not experienced this issue.

You could poke at the erl.exe process using procinfo64.exe - I would be interested to see if there is high I/O or high thread count.

Could you also test with Erlang 19.3? I would be curious to know if that version exhibits this issue.

On my Windows 10 system, I can run etop (http://erlang.org/doc/man/etop.html) against RabbitMQ this way:

* Open an administrative cmd.exe prompt

* Change to the bin directory under the "erts-XXX" subdirectory of your Erlang installation. Mine is C:\erlang\19.3\erts-8.3\bin

* Run hostname to see what your hostname should be

* Run .\epmd.exe -names to ensure that rabbit is running on port 25672

* Figure out what the value of your Erlang cookie is. It is in a file named .erlang.cookie and will be in one of the locations mentioned here - https://www.rabbitmq.com/clustering.html#erlang-cookie

* Run this command (WIN81 is what the hostname command outputs for me - substitute with yours):

.\erl.exe -sname test@WIN81 -setcookie TEXTVALUEOFCOOKIE

* You'll get a shell. To run etop, enter the following function call (again, using your own host name). Note that you'll have to surround the node name with single-quotes since it is an atom:

etop:start([{node, 'rabbit@WIN81'}]).

You'll see information scroll by. Hopefully we can get information about what's going on during startup. Of course there's a chance that whatever is taking so long will prevent etop from running, too.
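
If etop cannot attach, it may help to know whether the node is accepting TCP connections at all during the slow start. Here is a minimal Python sketch for that check; the helper name `port_open` is made up for illustration, and the ports in the usage note assume default settings (4369 for epmd, 25672 for the rabbit node as in the steps above).

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or bad host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("localhost", 4369)` and `port_open("localhost", 25672)` can be called in a loop while the service is starting to see when each listener comes up. A successful connection only shows that the port is listening, not that the node is fully booted.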

Thanks,
Luke