RabbitMQCtl status shows nodedown after logout/login Windows

Brian Locke

Aug 21, 2014, 5:08:40 PM
to rabbitm...@googlegroups.com
Moving conversation here from Legacy list:

I have 3 RabbitMQ servers running as a service (local system acct) on 3 Windows servers. They are clustered and are up and running. The management console shows all of them up, etc.
I have the same .erlang.cookie in both my c:\windows and c:\users\username folders.

This morning I was using rabbitmqctl normally and all was good. I logged out of the machine instead of my usual disconnect (via RDP).
When I logged back in (as the same user) and ran rabbitmqctl status, I got the nodedown error, as if the node were not running. To verify, I did the same on a second server and got the same result. I have left the third server as-is (meaning I didn't log out), and rabbitmqctl status works just fine there. It does, however, fail when I run rabbitmqctl report.

What else am I missing in troubleshooting this?


> On 20/08/14 18:57, Brian Locke wrote:
> > What else am I missing in troubleshooting this?
>
> Which version of RabbitMQ are you running? What error messages are you seeing? 3.3.x should give reasonably sensible diagnostics in this situation.

3.3.4

> > I have the same .erlang.cookie in my c:\windows and c:\users\username folder.
>
> You might check that it's getting picked up - check the log file for a hash of the cookie used by the server, check the output of a rabbitmqctl failure for a hash of the cookie used by rabbitmqctl.

They are the same:
------------------------- Start Server Log ------------------------
"=INFO REPORT==== 18-Aug-2014::19:56:21 ===
node           : rabbit@RABBITMQ2
home dir       : C:\Windows
config file(s) : c:/rabbitmq_data/rabbitmq.config
cookie hash    : 3NIVc4nT0YGsAURF7ilx1g==
log            : c:/rabbitmq_data/log/rab...@RABBITMQ2.log
sasl log       : c:/rabbitmq_data/log/rab...@RABBITMQ2-sasl.log
database dir   : c:/rabbitmq_data/db/rabbit@RABBITMQ2-mnesia"
------------------------- End Server Log ------------------------
------------------------- Start Cmd Prompt ------------------------
C:\RabbitMQ Server\rabbitmq_server-3.3.4\sbin>rabbitmqctl.bat status
Status of node rabbit@RABBITMQ2 ...
Error: unable to connect to node rabbit@RABBITMQ2: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@RABBITMQ2]

rabbit@RABBITMQ2:
  * connected to epmd (port 4369) on RABBITMQ2
  * epmd reports: node 'rabbit' not running at all
                  other nodes on RABBITMQ2: [rabbitmqctl1379248]
  * suggestion: start the node

current node details:
- node name: rabbitmqctl1379248@RabbitMQ2
- home dir: C:\Users\LBWebUser
- cookie hash: 3NIVc4nT0YGsAURF7ilx1g==
------------------------- End Cmd Prompt ------------------------ 

> Cheers, Simon
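
The cookie comparison above can also be done by hand when no log is available. The following is a minimal sketch, assuming the "cookie hash" printed by the server and by rabbitmqctl is the Base64 encoding of the MD5 digest of the cookie string (the 24-character values ending in "==" are consistent with a 16-byte digest). The function names are hypothetical, and stripping surrounding whitespace from the file is also an assumption.

```python
import base64
import hashlib
from pathlib import Path


def cookie_hash(cookie: bytes) -> str:
    """Base64 of the MD5 digest of the cookie bytes (assumed hash scheme)."""
    return base64.b64encode(hashlib.md5(cookie).digest()).decode("ascii")


def cookie_hash_from_file(path: str) -> str:
    """Hash the cookie stored in a file, ignoring surrounding whitespace (assumption)."""
    return cookie_hash(Path(path).read_bytes().strip())


def cookies_match(path_a: str, path_b: str) -> bool:
    """True when both cookie files produce the same hash."""
    return cookie_hash_from_file(path_a) == cookie_hash_from_file(path_b)
```

For example, cookies_match(r"c:\windows\.erlang.cookie", r"c:\users\username\.erlang.cookie") would compare the two files mentioned in the post. Matching hashes rule out a cookie mismatch, which is why the rest of the thread looks at epmd instead.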

Simon MacMullen

Aug 22, 2014, 6:42:37 AM
to Brian Locke, rabbitm...@googlegroups.com
On 21/08/14 22:08, Brian Locke wrote:
> Moving conversation here from Legacy list:

Hi, thanks.

> I have 3 RabbitMQ servers running on 3 windows servers as a service
> (local system acct) . They are clustered and are up and running. The
> management console shows all up, etc.
> I have the same .erlang.cookie in my c:\windows and c:\users\username
> folder.
>
> This morning I was using rabbitmqctl normally and all was good. I logged
> out of the machine instead of my usual disconnect (via RDP).
> When I logged back in (as same user) and run rabbitmqctl status I get
> the nodedown error like it's not running.

<snip>

> DIAGNOSTICS
> ===========
>
> attempted to contact: [rabbit@RABBITMQ2]
>
> rabbit@RABBITMQ2:
> * connected to epmd (port 4369) on RABBITMQ2
> * epmd reports: node 'rabbit' not running at all
> other nodes on RABBITMQ2: [rabbitmqctl1379248]
> * suggestion: start the node

The interesting bit there is:

> * epmd reports: node 'rabbit' not running at all

But you said that the RabbitMQ instance is actually running - you can
connect over AMQP and the management plugin.

So epmd is somehow confused. epmd is a small program started by Erlang
apps to manage the mapping of node names (e.g. "rabbit") to Erlang
distribution ports (e.g. 25672). This mechanism is used by rabbitmqctl
and to establish new cluster connections.
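
To see that mapping directly, epmd can be asked which names it currently has registered. The sketch below is an illustration only, assuming the standard epmd "names" request: a two-byte length, then the request byte 110, answered by a four-byte epmd port number followed by text lines of the form "name rabbit at port 25672". The helper names are hypothetical.

```python
import re
import socket
import struct

EPMD_PORT = 4369   # the epmd port shown in the diagnostics
NAMES_REQ = 110    # request code for "list registered names" (ASCII 'n')

_LINE = re.compile(r"^name (\S+) at port (\d+)$")


def build_names_request() -> bytes:
    """Frame the names request: big-endian 2-byte length, then the request byte."""
    body = bytes([NAMES_REQ])
    return struct.pack(">H", len(body)) + body


def parse_names_response(data: bytes) -> dict:
    """Skip the 4-byte epmd port prefix and map each node name to its port."""
    nodes = {}
    for line in data[4:].decode("latin-1").splitlines():
        match = _LINE.match(line.strip())
        if match:
            nodes[match.group(1)] = int(match.group(2))
    return nodes


def query_epmd(host: str = "localhost", port: int = EPMD_PORT, timeout: float = 5.0) -> dict:
    """Ask a running epmd for its registered nodes; epmd closes the socket when done."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_names_request())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_names_response(b"".join(chunks))
```

On an affected server, query_epmd() would be expected to return a dictionary without a "rabbit" key even though the service is running, which matches the "node 'rabbit' not running at all" line in the diagnostics.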

My hypothesis is that epmd is getting killed when you log out, although
the RabbitMQ service continues to run. It is started again the next time
you invoke rabbitmqctl, but at that point it does not know about the
RabbitMQ service.

Can you verify this with Task Manager or whatever? I don't have access
to a Windows machine today.

If this is happening, it might be worth investigating more:

* Does it depend on whether you start the RabbitMQ service first or
rabbitmqctl? (I am picturing epmd getting started as a different user or
in a different context somehow?)

* Does it depend on the Erlang version used? (I am suspicious that you
and someone else both reported the issue near-simultaneously - why
haven't we heard about this before?)

Sorry to not investigate further myself but I'm about to go on holiday...

Cheers, Simon

Ahmed Alani

Aug 22, 2014, 12:28:59 PM
to rabbitm...@googlegroups.com, brian...@datascan.com
Go ahead and add me as a third person experiencing this. I installed on Windows and am seeing the same behavior when I log out. Some general information:
* Windows Server 2012 R2 VM in Azure
* I installed Erlang 17.0 (64-bit) as my ***local user account***, which is an admin
* I installed RabbitMQ as my ***local user account***, which is an admin. <---Could this be a problem?
* I changed the permissions and locations of conf/logs/db using environment variables.
* I modified the Windows service to run as an AD service account.

Questions first:
* Are you suggesting epmd is killed when we log out, then restarted on login? Or the next time we run rabbitmqctl status?
* If this confusion does happen, could this disrupt clustering and lead to a partition?

Answering your questions:

> Can you verify this with Task Manager or whatever? I don't have access 
> to a Windows machine today. 

I do not see 'epmd' in Task Manager, only erl and erlsrv.

> * Does it depend on whether you start the RabbitMQ service first or 
> rabbitmqctl? (I am picturing epmd getting started as a different user or 
> in a different context somehow?) 

I stopped the service and logged out of the box. I logged back in, ran rabbitmqctl status, and got 'node not running at all'. I started the service, checked the status, and saw that it was up. I logged out of the box and back in, and rabbitmqctl status reported 'node not running at all' again.

Simon MacMullen

Aug 22, 2014, 1:19:23 PM
to rabbitm...@googlegroups.com
On 22/08/2014 17:28, Ahmed Alani wrote:
> Question first:
> * Are you suggesting epmd is killed when we log out, then restarted on
> login? Or the next time we run rabbitmqctl status?

I'm guessing it's killed at logout. It's started again the next time you
run an Erlang program, which typically means running rabbitmqctl or
starting the service.

> * If this confusion does happen, could this disrupt clustering and lead
> to a partition?

It shouldn't; epmd is only used to figure out how to initially make the
clustering connection; existing connections should continue to work. If
you *do* experience a partition though it might be hard to end it until
epmd can

> Answering your questions:
>
> > Can you verify this with Task Manager or whatever? I don't have access
> > to a Windows machine today.
>
> I do not see 'epmd' in task manager. Only erl and erlsrv

That backs up the hypothesis at least.

> > * Does it depend on whether you start the RabbitMQ service first or
> > rabbitmqctl? (I am picturing epmd getting started as a different user or
> > in a different context somehow?)
>
> I stopped the service, logged out of the box. Logged back in, ran
> rabbitmqctl status, got 'node not running at all', started service,
> checked status and saw that it was up. Logged out of the box and back
> in, rabbitmqctl status reports 'node not running at all' again.

I meant that both running rabbitmqctl and starting the Windows service
should start epmd; but the difference might lie in which of those things
you do *first* when epmd is down.

Cheers, Simon

Justin Wombat

Oct 22, 2014, 10:42:59 AM
to rabbitm...@googlegroups.com
Hi

I have the same problem, which I just noticed today on some production servers. Restarting the Rabbit service is not really an option, as these servers are in use.

Is there no way round this issue on Windows?

I saw the problem with Erlang 17 64bit, Rabbit 3.3.5 and Windows 2008.

Thanks,

Justin

Tony

Oct 23, 2014, 10:51:11 AM
to rabbitm...@googlegroups.com
Woot! I figured this out!!!!!!

Pull up a chair....
  1. Say you are logged in as "bob" when you install RabbitMQ. During the installation epmd gets started, but it runs under "bob" (use tasklist /svc followed by tasklist /v /fi "pid eq XXX" to see the account)
  2. 'rabbitmqctl status' works
  3. When bob logs out, epmd dies
  4. When bob logs back in and runs 'rabbitmqctl status', it fails, but it DOES start a new instance of epmd (running under bob)
  5. restart the rabbitmq service; apparently epmd is able to locate the restarted service  (<-- I don't know what is happening here as I don't know the relationship between epmd and erlang/rabbit)
  6. 'rabbitmqctl status' works again
  7. bob logs out, epmd dies
  8. bob logs in and restarts the rabbitmq service BEFORE running rabbitmqctl
  9. this time epmd is started but is running under the local system account
  10. 'rabbitmqctl status' works
  11. bob can log out/in and rabbitmqctl continues to work because epmd doesn't die.
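
The sequence above can be reproduced with a toy model. This is a sketch of the hypothesis in this thread, not real epmd or RabbitMQ code: it assumes epmd belongs to whichever account starts it first, that it dies when that account logs out, and that a node registers with epmd only when the node starts. The class and method names are hypothetical.

```python
class Host:
    """Toy model of one Windows host running the RabbitMQ service."""

    def __init__(self):
        self.epmd_owner = None    # account epmd runs under, or None if epmd is dead
        self.registered = set()   # node names the current epmd knows about

    def _ensure_epmd(self, account):
        # Any Erlang program starts epmd if none is running. The new epmd
        # belongs to whoever started it and begins with an empty registry.
        if self.epmd_owner is None:
            self.epmd_owner = account
            self.registered = set()

    def install(self, user):
        # Step 1: the installer starts epmd under the installing user.
        self._ensure_epmd(user)
        self.start_service()

    def start_service(self):
        # The service runs as local system; the node registers only at startup.
        self._ensure_epmd("SYSTEM")
        self.registered.add("rabbit")

    def restart_service(self):
        self.registered.discard("rabbit")
        self.start_service()

    def rabbitmqctl_status(self, user):
        self._ensure_epmd(user)
        return "ok" if "rabbit" in self.registered else "nodedown"

    def logout(self, user):
        # Processes owned by the user die at logout; the service keeps running.
        if self.epmd_owner == user:
            self.epmd_owner = None
            self.registered = set()


host = Host()
host.install("bob")
print(host.rabbitmqctl_status("bob"))   # step 2
host.logout("bob")                      # step 3
print(host.rabbitmqctl_status("bob"))   # step 4
host.restart_service()                  # step 5
print(host.rabbitmqctl_status("bob"))   # step 6
host.logout("bob")                      # step 7
host.restart_service()                  # steps 8-9
print(host.rabbitmqctl_status("bob"))   # step 10
```

In this model the only thing that changes between step 4 and step 10 is which account owns epmd when the service registers, which is the point of steps 8 and 9.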

This needs to be filed as a bug against erlang/rabbit, something needs to be done so that the epmd runs under local system and not as the local user during the first install.  Or possibly "fix" epmd so that it can talk to an already running rabbitmq (might be a permission issue as epmd is running as local user, whereas erlang/rabbit is local system). 

In the meantime, I suggest rebooting rabbitmq after installing so the epmd process will be launched under local system account and not as the local user.
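
The account check from step 1 can be scripted. The sketch below is Windows-only and hypothetical: it runs tasklist with verbose CSV output filtered to epmd.exe and reads the "User Name" column. It assumes an English-language Windows, since tasklist localizes its column headers.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text: str) -> list:
    """Return (image name, PID, user name) for each row of tasklist CSV output."""
    rows = csv.DictReader(io.StringIO(text))
    return [(row["Image Name"], int(row["PID"]), row["User Name"]) for row in rows]


def epmd_owners() -> list:
    """Ask tasklist which account each running epmd.exe belongs to."""
    result = subprocess.run(
        ["tasklist", "/v", "/fo", "csv", "/fi", "imagename eq epmd.exe"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

An empty list from epmd_owners() means epmd is not running at all. An entry owned by an interactive user rather than the local system account is the situation described in steps 1 to 4.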

Joshua Keel

Oct 24, 2014, 10:01:08 AM
to rabbitm...@googlegroups.com
I had independently reached the same conclusions as you, Tony. Works for me. :)

Josh

Eric Aldinger

Jan 15, 2016, 4:45:17 PM
to rabbitmq-users
This was very helpful to me today. My issue was that service accounts with different HOMEDRIVE settings were using different Erlang cookies. For us, epmd remained stable on the server.

I wanted to add a further diagnostic step here. We use LDAP security as part of our configuration. To rule out SASL security issues, I ran rabbitmqadmin.py list exchanges with valid credentials. If it had been an LDAP issue, I would have seen *** Access refused: /api/exchanges from the Python API call. It is not our primary admin API, but it is nice to use for debugging.
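
The same check can be made without rabbitmqadmin by calling the management HTTP API directly. This is a minimal sketch, assuming the management plugin is enabled and that rejected credentials come back as HTTP 401 on /api/exchanges, the path from the error message above. The function names are hypothetical.

```python
import base64
import urllib.error
import urllib.request


def build_request(base_url: str, user: str, password: str, path: str = "/api/exchanges"):
    """Build a GET request for the management API using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", "Basic " + token)
    return request


def classify(status: int) -> str:
    """Map an HTTP status code to a short diagnosis."""
    if status == 200:
        return "ok"
    if status == 401:
        return "access refused"
    return f"unexpected HTTP {status}"


def check_credentials(base_url: str, user: str, password: str) -> str:
    """Call the API once and report whether the credentials were accepted."""
    try:
        with urllib.request.urlopen(build_request(base_url, user, password), timeout=10) as response:
            return classify(response.status)
    except urllib.error.HTTPError as error:
        return classify(error.code)
```

A call such as check_credentials("http://localhost:15672", "someuser", "somepassword") would be a typical use; the URL and port here are assumptions and depend on how the management listener is configured. Note that this path does not go through epmd or the Erlang cookie, so it helps separate authentication problems from the nodedown problem discussed earlier.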