Problem with Rabbitmqctl Windows 2008

Justin Wombat

Oct 22, 2014, 10:08:55 AM
to rabbitm...@googlegroups.com
Hi,

we deployed RabbitMq 3.3.5 with 64bit Erlang 17 on Windows Server 2008 yesterday.

Yesterday I configured clustering and all our user accounts using rabbitmqctl.

Today after going live with RabbitMq I tried to get status information using rabbitmqctl -n rabbit@SVR1 status

This failed saying that no node is running as follows:

C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin>rabbitmqctl -n rabbit@SVR1 status

Status of node 'rabbit@SVR1' ...
Error: unable to connect to node 'rabbit@SVR1': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@SVR1']

rabbit@MESSAGE-SVR1:
  * connected to epmd (port 4369) on SVR1
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on SVR1
  * suggestion: start the node

current node details:
- node name: 'rabbitmqctl2954469@SVR1'
- home dir: C:\Users\mccarj
- cookie hash: <deleted>

The node is up and operational according to the web console and our application is obviously working.

I have used exactly the same node name (including case sensitivity) and also the erlang cookie is correct for my account as it was working yesterday.

I was able to reproduce the same scenario on test servers by killing epmd.exe after starting up Rabbit.  After I did this, rabbitmqctl, which was previously working, failed to work.  It started working again only after restarting the RabbitMQ Windows service.

Anyone got any ideas?

I cannot go restarting the RabbitMq servers on production as they are in use and the next maintenance window is far off.

Thanks,

Justin

Michael Klishin

Oct 22, 2014, 10:11:38 AM
to rabbitm...@googlegroups.com, Justin Wombat
On 22 October 2014 at 18:09:01, Justin Wombat (justin.mcca...@gmail.com) wrote:
> I was able to reproduce the same scenario on test servers by killing
> the epmd.exe after starting up Rabbit. After I did this rabbitmqctl
> which was previously working failed to work. It started working
> again only after restarting the RabbitMq Windows service.
>
> Anyone got any ideas?
>
> I cannot go restarting the RabbitMq servers on production as
> they are in use and the next maintenance window is far off.

Justin,

This sounds quite similar to https://groups.google.com/d/msg/rabbitmq-users/cDTOGWXNe0A/gHQr4kBXPLYJ.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Simon MacMullen

Oct 22, 2014, 11:21:24 AM
to Justin Wombat, rabbitm...@googlegroups.com
On 22/10/14 15:08, Justin Wombat wrote:
> I was able to reproduce the same scenario on test servers by killing the
> epmd.exe after starting up Rabbit. After I did this rabbitmqctl which
> was previously working failed to work. It started working again only
> after restarting the RabbitMq Windows service.

Yes, so killing epmd will look like this; the new epmd will not know
that the service is there. So two questions present themselves:

A) Why is epmd being killed?

B) How can we recover from it?

Re A), my guess would be that epmd is getting started as the local user
because someone or something is invoking rabbitmqctl.bat *before* the
service is started or anything else has created epmd. Then when that
user logs out, epmd is killed by the OS. If someone who has access to a
windows machine can check that, that would be useful (yeah, I need to
get a Windows VM again...).

Re B), hopefully a future release of RabbitMQ can automatically
re-register as needed. The bug number to look for in release notes is 26426.

But for now, to get you back on the road, I've been able to recover from
epmd death with the following rather annoying process:

1) Make sure epmd is running again. The easy way to do that is invoke
rabbitmqctl.bat, but this will presumably start it up again as the local
user, so you'll face the problem again when logging out. I think the
only way to start epmd as the right user is to use rabbitmq-service.bat
to create a second RabbitMQ service, with a different service name, node
name, port and so on, and then start and stop it. Ugh.

2) Get erl.exe onto your path, or change to the Erlang bin directory. I
think cd %ERLANG_HOME%\bin should do this.

3) Start the Erlang VM interactively with "erl".

4) At the prompt, invoke

erl_epmd:start().

then

erl_epmd:register_node(rabbit, 25672).

The first command should reply {ok, SomeProcessIdentifier}. The second
should reply {ok, SomeNumber} if successful, and {error, SomeError}
otherwise.

(Replace "rabbit" with your node name and "25672" with your distribution
port if you have changed either.)

At this point, epmd knows about the name and port again (so rabbitmqctl
should work again). However, it's also monitoring the erl.exe process
you have open; as soon as that closes it will unregister the name again.

However, now we have access to "rabbitmqctl eval", so...

5) In another window, invoke

rabbitmqctl eval 'spawn(fun()->timer:sleep(10000),erl_epmd:register_node(rabbit,25672)end).'

(again substituting "rabbit" and "25672" as needed.)

This will sleep for 10,000ms and then get the server to re-register. But
only one process can claim a name at once, so we need to quit the
erl.exe process before the timer goes off, so...

6) In the window with erl.exe, type

halt().

(or close the window, hammer on ctrl-C etc)

And you're done. Finally.

I've been able to use that procedure to re-register a running broker
with epmd on Unix, so I assume it *should* work for Windows.
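
One way to check whether the re-registration has taken effect, without relying on rabbitmqctl itself, is to ask epmd directly which names it knows about. The following is a minimal Python sketch, not part of the original procedure. It assumes epmd listens on the default port 4369 (as shown in the diagnostics earlier in the thread) and uses the NAMES_REQ request from the Erlang distribution protocol. The function names are made up for illustration.

```python
import socket
import struct

EPMD_PORT = 4369  # default epmd port; adjust if yours differs


def build_names_request() -> bytes:
    """NAMES_REQ: 2-byte big-endian length (1), then request tag 110 ('n')."""
    return struct.pack(">HB", 1, 110)


def parse_names_response(data: bytes) -> dict:
    """Parse the reply: 4-byte epmd port, then lines 'name <node> at port <port>'."""
    if len(data) < 4:
        raise ValueError("response too short to contain the epmd port")
    nodes = {}
    for line in data[4:].decode("latin-1").splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            nodes[parts[1]] = int(parts[4])
    return nodes


def registered_nodes(host="localhost", port=EPMD_PORT, timeout=5.0) -> dict:
    """Connect to epmd and return {node_name: distribution_port}."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_names_request())
        chunks = []
        while True:  # epmd closes the socket once it has sent the list
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_names_response(b"".join(chunks))
```

Calling registered_nodes() on the broker host should return a mapping that includes "rabbit" (or your node name) once the broker is registered again. An empty mapping corresponds to the "node 'rabbit' not running at all" diagnostic.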

Cheers, Simon

Justin Wombat

Oct 23, 2014, 5:20:40 AM
to rabbitm...@googlegroups.com, justin.mcca...@gmail.com
Hi,

first off thanks very much for the support, if you're ever in Belfast I will buy you pints :>

The reason for the failure is exactly as you state; I tested it on Windows myself:
  • Stop Rabbit service.
  • Run rabbitmqctl status, this starts epmd as my local user account.
  • Start Rabbit service
  • Run rabbitmqctl status, notice that it works.
  • Log off
  • Log back in
  • epmd has exited due to being killed during logoff
  • Run rabbitmqctl status, notice that it no longer works.
Also, this seems to have a negative effect on a server joining a cluster.  For instance, Server1 is the master and its epmd is not running correctly.  If I restart Server2, it will not join the cluster and the Server2 RabbitMQ service will fail to start.

Here is a link to how to run a process as the System account: http://mikehowells.wordpress.com/2011/02/12/running-a-command-prompt-as-nt-authoritysystem/

You can use this tool (psExec) to run "rabbitmqctl status" and it will remain running after a logoff.  You need to run it directly and not create a command prompt to run it "second hand" as it were.

I have tried your instructions but when using rabbitmqctl to run the Erlang that spawns a process that delays and registers it does not work.  There is no error reported.

When I run this I can see the Erlang process starting, but it exits immediately.  I have copied and pasted the code exactly as you suggested, but it does not seem to work for me.

Thanks again,

Justin

Simon MacMullen

Oct 23, 2014, 6:11:13 AM
to Justin Wombat, rabbitm...@googlegroups.com
On 23/10/14 10:20, Justin Wombat wrote:
> The reason for the failure is exactly as you state I tested it on
> Windows myself.

Thanks for confirming, that should be a big help going forward.

> Also this seems to have a negative effect on a server joining a cluster.

Yes, any Erlang process that needs to find the node will not be able to
do so. Primarily that means rabbitmqctl and clustering (and
rabbitmq-plugins in 3.4.0+).
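
To illustrate why the lookup fails, here is a rough Python sketch of the request a peer sends to epmd to find a node's distribution port (PORT_PLEASE2_REQ in the Erlang distribution protocol). It is an illustration only, not how rabbitmqctl is implemented. The function names are hypothetical, and the default epmd port 4369 and node name "rabbit" are assumptions taken from the thread.

```python
import socket
import struct


def build_port_request(node_name: str) -> bytes:
    """PORT_PLEASE2_REQ: 2-byte length, tag 122 ('z'), then the short node name."""
    payload = bytes([122]) + node_name.encode("latin-1")
    return struct.pack(">H", len(payload)) + payload


def parse_port_response(data: bytes):
    """Return the distribution port, or None if epmd does not know the name."""
    # PORT2_RESP: tag 119 ('w'), result byte (0 means found), then a 2-byte port.
    if len(data) < 2 or data[0] != 119:
        raise ValueError("not a PORT2_RESP reply")
    if data[1] != 0:
        return None
    if len(data) < 4:
        raise ValueError("reply truncated before the port number")
    (port,) = struct.unpack(">H", data[2:4])
    return port


def lookup_port(node_name="rabbit", host="localhost", epmd_port=4369, timeout=5.0):
    """Ask epmd on `host` for the distribution port of `node_name`."""
    with socket.create_connection((host, epmd_port), timeout=timeout) as sock:
        sock.sendall(build_port_request(node_name))
        data = b""
        while len(data) < 4:  # a "not found" reply is 2 bytes, then epmd closes
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    return parse_port_response(data)
```

If a freshly started epmd has no record of the broker, lookup_port("rabbit") returns None even though the broker process is still running, which matches the symptoms described for rabbitmqctl and clustering.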

> Here is a link to how to run a process as the System
> account: http://mikehowells.wordpress.com/2011/02/12/running-a-command-prompt-as-nt-authoritysystem/
>
> You can use this tool (psExec) to run "rabbitmqctl status" and it will
> remain running after a logoff. You need to run it directly and not
> create a command prompt to run it "second hand" as it were.

Unfortunately I don't think this is a goer - epmd is started
automatically by the Erlang runtime, we don't have any control over it.

However, what we are able to do is prevent rabbitmqctl or
rabbitmq-plugins from starting epmd if it is not already running. Then
if the only thing that starts epmd is the service we should be good.

> I have tried your instructions but when using rabbitmqctl to run the
> Erlang that spawns a process that delays and registers it does not work.
> There is no error reported.

I realise now that I was assuming Unix quoting conventions. I think you
need to substitute double quotes for single ones in step 5.

Cheers, Simon

Justin Wombat

Oct 24, 2014, 4:08:24 AM
to rabbitm...@googlegroups.com, justin.mcca...@gmail.com
Hi

just to say thanks, I got everything working on production.

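I was suggesting psexec only to start the epmd as the System user so it would not exit. I documented the full procedure I followed here:

https://codingbadgers.wordpress.com/2014/10/23/rabbitmq-failure-of-rabbitmqctl-after-previously-working/
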
Justin

Michael Klishin

Oct 24, 2014, 4:42:33 AM
to rabbitm...@googlegroups.com, Justin Wombat
On 24 October 2014 at 12:08:29, Justin Wombat (justin.mcca...@gmail.com) wrote:
> I was suggesting psexec only to start the epmd as the System user
> so it would not exit and I documented the full procedure I followed
> here:
>
> https://codingbadgers.wordpress.com/2014/10/23/rabbitmq-failure-of-rabbitmqctl-after-previously-working/

Thank you, Justin! We appreciate the post and will refer folks to it.