On 22/10/14 15:08, Justin Wombat wrote:
> I was able to reproduce the same scenario on test servers by killing
> epmd.exe after starting up Rabbit. After I did this, rabbitmqctl, which
> was previously working, failed to work. It started working again only
> after restarting the RabbitMQ Windows service.
Yes, killing epmd produces exactly this symptom: the new epmd will not
know that the service's node is there. So two questions present themselves:
A) Why is epmd being killed?
B) How can we recover from it?
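You can see this for yourself by asking epmd which names it currently knows. `epmd -names` does exactly that from a command prompt. The Python sketch below is illustrative and not part of RabbitMQ. It sends the same NAMES_REQ query described in the Erlang distribution protocol documentation and assumes epmd is listening on localhost at its default port 4369. Against a healthy broker the result should include `rabbit`. After epmd has been killed and restarted, it should not.

```python
import socket
import struct

EPMD_PORT = 4369  # epmd's default TCP port (assumed unchanged)

def names_request() -> bytes:
    # NAMES_REQ: 2-byte big-endian length (always 1) + request code 110 ('n')
    return struct.pack(">HB", 1, 110)

def parse_names_response(data: bytes) -> dict:
    # Reply: 4-byte epmd port number, then one line per registered name,
    # e.g. "name rabbit at port 25672\n"
    names = {}
    for line in data[4:].decode("latin-1").splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            names[parts[1]] = int(parts[4])
    return names

def query_epmd(host: str = "127.0.0.1", port: int = EPMD_PORT) -> dict:
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(names_request())
        chunks = []
        # epmd closes the connection once it has sent the whole list
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return parse_names_response(b"".join(chunks))

if __name__ == "__main__":
    # Maps each registered short node name to its distribution port
    print(query_epmd())
```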
Re A), my guess would be that epmd is getting started as the local user
because someone or something is invoking rabbitmqctl.bat *before* the
service is started or anything else has created epmd. Then when that
user logs out, epmd is killed by the OS. If someone who has access to a
Windows machine could check that, it would be useful (yeah, I need to
get a Windows VM again...).
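On Windows, one way to check which account owns epmd.exe is `tasklist /V`, whose verbose output includes a "User Name" column. The Python sketch below runs it with CSV output and pulls that column out. The column position is an assumption about tasklist's CSV layout, so look at the raw output if the result seems odd. If the owner turns out to be a logged-in user rather than the account the RabbitMQ service runs as (LocalSystem, shown as NT AUTHORITY\SYSTEM, unless the service was configured otherwise), that would support the guess above.

```python
import csv
import io
import subprocess

USER_NAME_COLUMN = 6  # assumption: 7th column of `tasklist /V /FO CSV` is "User Name"

def parse_tasklist_csv(output: str) -> list:
    """Return (image name, PID, user name) for each CSV row tasklist printed."""
    owners = []
    for row in csv.reader(io.StringIO(output)):
        # Skips blank lines and the "INFO: No tasks are running..." message
        if len(row) > USER_NAME_COLUMN:
            owners.append((row[0], int(row[1]), row[USER_NAME_COLUMN]))
    return owners

def epmd_owners() -> list:
    result = subprocess.run(
        ["tasklist", "/V", "/FO", "CSV", "/NH", "/FI", "IMAGENAME eq epmd.exe"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)

if __name__ == "__main__":
    for image, pid, user in epmd_owners():
        print(f"{image} (PID {pid}) is running as {user}")
```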
Re B), hopefully a future release of RabbitMQ can automatically
re-register as needed. The bug number to look for in release notes is 26426.
But for now, to get you back on the road, I've been able to recover from
epmd death with the following rather annoying process:
1) Make sure epmd is running again. The easy way to do that is to invoke
rabbitmqctl.bat, but this will presumably start it up again as the local
user, so you'll face the problem again when logging out. I think the
only way to start epmd as the right user is to use rabbitmq-service.bat
to create a second RabbitMQ service, with a different service name, node
name, port and so on, and then start and stop it. Ugh.
2) Get erl.exe onto your PATH, or change to the Erlang bin directory. I
think cd /d "%ERLANG_HOME%\bin" should do this.
3) Start the Erlang VM interactively with "erl".
4) At the prompt, invoke
    erl_epmd:start().
then
    erl_epmd:register_node(rabbit, 25672).
The first command should reply {ok, SomeProcessIdentifier}. The second
should reply {ok, SomeNumber} if successful, and {error, SomeError}
otherwise. (Substitute your node name for "rabbit" and your distribution
port for "25672" if you have changed either.)
At this point, epmd knows about the name and port again (so rabbitmqctl
should work again). However, it is also watching the connection from the
erl.exe process you have open; as soon as that closes, it will unregister
the name again. But now we have access to "rabbitmqctl eval", so...
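5) In another window, invoke
    rabbitmqctl eval "spawn(fun()->timer:sleep(10000),erl_epmd:register_node(rabbit,25672)end)."
(again substituting "rabbit" and "25672" as needed). In cmd.exe, keep the
double quotes: cmd does not treat single quotes as quoting, and an
unquoted -> would be read as an output redirection.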
This spawns a process inside the broker that sleeps for 10,000 ms and then
re-registers the node with epmd. But epmd lets only one process hold a
name at a time, and erl.exe still holds this one. We need to quit it
before the timer goes off, so...
6) In the window with erl.exe, type
    halt().
(or close the window, hammer on Ctrl-C, etc.)
And you're done. Finally.
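To confirm the re-registration from outside the broker, you can ask epmd for the node's port with a PORT_PLEASE2_REQ. That is the lookup a connecting Erlang node, rabbitmqctl included, performs before it can reach the broker. The Python sketch below is illustrative and follows the documented epmd protocol. It assumes epmd on localhost at its default port 4369 and the default node name "rabbit". Once the fix has taken, it should print 25672 (or whatever distribution port you registered). It prints None if epmd still doesn't know the name.

```python
import socket
import struct

EPMD_PORT = 4369  # epmd's default TCP port (assumed unchanged)

def port_please2_request(name: str) -> bytes:
    # PORT_PLEASE2_REQ: 2-byte length, request code 122 ('z'), then the bare
    # node name (the part before the '@', e.g. "rabbit")
    body = bytes([122]) + name.encode("latin-1")
    return struct.pack(">H", len(body)) + body

def parse_port2_response(data: bytes):
    """Return the registered distribution port, or None if epmd has no such name."""
    if len(data) < 2 or data[0] != 119:  # PORT2_RESP code is 119 ('w')
        raise ValueError(f"unexpected reply from epmd: {data!r}")
    if data[1] != 0:  # non-zero result: name not registered
        return None
    return struct.unpack(">H", data[2:4])[0]

def lookup_port(name: str, host: str = "127.0.0.1"):
    with socket.create_connection((host, EPMD_PORT), timeout=5) as sock:
        sock.sendall(port_please2_request(name))
        data = b""
        # epmd closes the connection after answering
        while chunk := sock.recv(4096):
            data += chunk
    return parse_port2_response(data)

if __name__ == "__main__":
    print(lookup_port("rabbit"))
```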
I've been able to use that procedure to re-register a running broker
with epmd on Unix, so I assume it *should* work for Windows.
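Closing erl.exe undoes the registration because of how the registration itself works. According to the Erlang distribution protocol documentation, erl_epmd:register_node/2 opens a TCP connection to epmd and sends an ALIVE2_REQ carrying the short node name and distribution port. epmd keeps that name only while the connection stays open. The Python sketch below mirrors that exchange and is illustrative only. It assumes epmd on localhost at port 4369 and protocol version 5; newer Erlang releases negotiate a later version and may get a different reply format. The creation number it returns is, as far as I can tell, the SomeNumber in the {ok, SomeNumber} reply from step 4.

```python
import socket
import struct

EPMD_PORT = 4369  # epmd's default TCP port (assumed unchanged)

def alive2_request(name: str, port: int, version: int = 5) -> bytes:
    node = name.encode("latin-1")
    extra = b""
    body = (
        # code 120 ('x'), listen port, node type 77 (normal node),
        # protocol 0 (TCP/IPv4), highest and lowest protocol version
        struct.pack(">BHBBHH", 120, port, 77, 0, version, version)
        + struct.pack(">H", len(node)) + node
        + struct.pack(">H", len(extra)) + extra
    )
    return struct.pack(">H", len(body)) + body  # 2-byte length prefix

def parse_alive2_response(data: bytes) -> int:
    """Return the 'creation' number from an ALIVE2_RESP, or raise on failure."""
    code, result, creation = struct.unpack(">BBH", data[:4])
    if code != 121 or result != 0:
        raise RuntimeError(f"epmd refused registration (code={code}, result={result})")
    return creation

def recv_exact(sock: socket.socket, n: int) -> bytes:
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("epmd closed the connection")
        data += chunk
    return data

def register(name: str, port: int, host: str = "127.0.0.1"):
    sock = socket.create_connection((host, EPMD_PORT), timeout=5)
    try:
        sock.sendall(alive2_request(name, port))
        creation = parse_alive2_response(recv_exact(sock, 4))
    except Exception:
        sock.close()
        raise
    # Keep `sock` open: epmd drops the name as soon as this connection closes
    return sock, creation
```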
Cheers, Simon