management plugin suddenly stopped responding on all nodes in the cluster

da...@weheartit.com

Jul 17, 2014, 3:35:16 PM
to rabbitm...@googlegroups.com
I am running version 3.3.4 in a 4-node cluster, and all of a sudden the management interface on all 4 nodes is no longer responding.
The queue is still running fine, and the admin tools that don't use the management interface, like rabbitmqctl, still work.
I tried restarting rabbitmq-server, then disabling the plugin and restarting, then enabling it and restarting again.
There is nothing in the logs.
Are there any tools available to help me debug this?

Michael Klishin

Jul 17, 2014, 4:18:20 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
 On 17 July 2014 at 23:35:19, da...@weheartit.com (da...@weheartit.com) wrote:
> > Are there any tools available to help me debug this?

what do

curl http://[rabbitmq-host]:15672/
curl -u [username]:[password] http://[rabbitmq-host]:15672/api/whoami

return?
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

da...@weheartit.com

Jul 17, 2014, 6:09:06 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
daveo@sidekiq03:/etc/rabbitmq$ curl http://localhost:15672/
<html>
  <head>
    <title>RabbitMQ Management</title>
    <script src="js/ejs.js" type="text/javascript"></script>
    <script src="js/jquery-1.6.4.min.js" type="text/javascript"></script>
    <script src="js/jquery.flot.min.js" type="text/javascript"></script>
    <script src="js/jquery.flot.time.min.js" type="text/javascript"></script>
    <script src="js/sammy-0.6.0.min.js" type="text/javascript"></script>
    <script src="js/json2.js" type="text/javascript"></script>
    <script src="js/base64.js" type="text/javascript"></script>
    <script src="js/global.js" type="text/javascript"></script>
    <script src="js/main.js" type="text/javascript"></script>
    <script src="js/prefs.js" type="text/javascript"></script>
    <script src="js/help.js" type="text/javascript"></script>
    <script src="js/formatters.js" type="text/javascript"></script>
    <script src="js/charts.js" type="text/javascript"></script>

    <link href="css/main.css" rel="stylesheet" type="text/css"/>
    <link href="favicon.ico" rel="shortcut icon" type="image/x-icon"/>

<!--[if lte IE 8]>
    <script src="js/excanvas.min.js" type="text/javascript"></script>
    <link href="css/evil.css" rel="stylesheet" type="text/css"/>
<![endif]-->
  </head>
  <body>
    <div id="outer"></div>
    <div id="debug"></div>
    <div id="scratch"></div>
  </body>
</html>
daveo@sidekiq03:/etc/rabbitmq$ curl -u guest:guest http://localhost:15672/api/whoami
{"name":"guest","tags":"administrator","auth_backend":"rabbit_auth_backend_internal"}

Michael Klishin

Jul 17, 2014, 6:13:30 PM
to rabbitm...@googlegroups.com, da...@weheartit.com


On 18 July 2014 at 02:09:09, da...@weheartit.com (da...@weheartit.com) wrote:
> > daveo@sidekiq03:/etc/rabbitmq$ curl -u guest:guest http://localhost:15672/api/whoami
> {"name":"guest","tags":"administrator","auth_backend":"rabbit_auth_backend_internal"}

This means the API responds (at least for guest/guest from localhost). When you say
"no longer responds", what exactly do you observe? TCP timeouts? 

da...@weheartit.com

Jul 17, 2014, 6:18:09 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
It no longer loads in the browser.
None of the rabbitmqadmin commands respond.
The nagios plugin check_rabbitmq_queue which uses the admin interface no longer responds.

Michael Klishin

Jul 17, 2014, 6:21:40 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
 On 18 July 2014 at 02:18:11, da...@weheartit.com (da...@weheartit.com) wrote:
> > It no longer loads in the browser.
> None of the rabbitmqadmin commands respond.
> The nagios plugin check_rabbitmq_queue which uses the admin
> interface no longer responds.

Do you use rabbitmqadmin from a different host? Does Nagios run on
a different host? Do you always use the same user?

RabbitMQ 3.3 no longer accepts connections from the default user (guest)
from hosts other than localhost. However, that should show up as a clear
authentication failure.

See

http://www.rabbitmq.com/blog/2014/04/02/breaking-things-with-rabbitmq-3-3/

If that's not the case, I'd expect some kind of firewall to be the cause.

da...@weheartit.com

Jul 17, 2014, 6:26:28 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
Nagios is making the request locally on each machine.
Rabbitmqadmin is also connecting locally.
Also, when we connect to the admin interface via browser, it's always through an SSH tunnel, which is a local connection.
The strange thing is that the queue is functioning fine and rabbitmqctl works fine as well.
Also the management port is listening, just not responding to any requests.

Michael Klishin

Jul 17, 2014, 6:31:28 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
 On 18 July 2014 at 02:26:29, da...@weheartit.com (da...@weheartit.com) wrote:
> > Also the management port is listening, just not responding
> to any requests.

Again, what exactly does "not responding" mean?

 * Is the TCP connection refused?
 * Does TCP connection time out?
 * Is there an HTTP error?

The curl http://[rabbitmq-host]:15672/ request fetches the Web UI home page.
It responds with 200 OK and the correct content.

Does curl -u guest:guest http://127.0.0.1:15672/api/queues/ respond with 200 OK
and a JSON body?
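
To tell those three cases apart from the command line, one option is to give curl a hard time limit and look at its exit status. The following is only a sketch, not something taken from the RabbitMQ docs: the function names classify_curl_exit and probe, the PROBE_TIMEOUT variable and its 10-second default are made up for illustration, and guest:guest is assumed as the fallback credentials. Exit codes 7, 22 (with --fail) and 28 are the ones listed in curl's man page.

```shell
# Map a curl exit status to one of the failure modes asked about above.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;              # an HTTP response below 400 was received
    7)  echo "connect-failed" ;;  # could not connect, e.g. connection refused
    22) echo "http-error" ;;      # status >= 400 (only reported with --fail)
    28) echo "timeout" ;;         # time limit hit before the transfer finished
    *)  echo "other:$1" ;;        # anything else: look it up in curl's man page
  esac
}

# Probe one URL with a hard time limit so a hung request cannot block forever.
# $1 = URL, $2 = optional user:password (assumed default: guest:guest).
probe() {
  probe_rc=0
  curl --silent --fail --max-time "${PROBE_TIMEOUT:-10}" \
       --output /dev/null --user "${2:-guest:guest}" "$1" || probe_rc=$?
  classify_curl_exit "$probe_rc"
}
```

For example, probe http://127.0.0.1:15672/api/queues/ should print "timeout" for a request that hangs, rather than blocking until it is interrupted by hand.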

da...@weheartit.com

Jul 17, 2014, 7:04:28 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
Not responding means that when I run a command, e.g.
rabbitmqadmin list queues
or
curl -u guest:guest http://127.0.0.1:15672/api/queues/
the commands hang and there is zero response.
Is there a log I can turn on, or can I at least increase the verbosity of the logging?

Michael Klishin

Jul 17, 2014, 7:08:08 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
On 18 July 2014 at 03:04:30, da...@weheartit.com (da...@weheartit.com) wrote:
> > Not responding means when I run a command
> ie.
> rabbitmqadmin list queues
> or
> curl -u guest:guest http://127.0.0.1:15672/api/queues/
> the commands hang and there is zero response.

Does it eventually respond if you wait for a long time? Possibly something takes a while to load, e.g. you
have 10s of millions of queues.

> Is there a log I can turn on or at least increase the verbosity of
> the logging?

You can turn on HTTP API request logging:
http://www.rabbitmq.com/management.html#configuration
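
For reference, a complete rabbitmq.config that does nothing except enable request logging might look like the output of the sketch below. Treat it as an illustration under assumptions: print_example_config is a hypothetical helper name, the directory is just an example path, and the http_log_dir key under rabbitmq_management is assumed to be as described on the page linked above. The file is a single Erlang term list ending in a period, so a trailing comma after an entry is only valid when another entry follows it.

```shell
# Print a minimal example config to stdout so it can be reviewed before use.
print_example_config() {
  cat <<'EOF'
%% Example only: point the directory at one the rabbitmq user can write to.
[
  {rabbitmq_management, [
    {http_log_dir, "/var/log/rabbitmq/"}
  ]}
].
EOF
}
```

Running print_example_config and comparing the result with the existing file is a quick way to spot a missing bracket or final period. As far as I know, the setting is only picked up after the node is restarted.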

da...@weheartit.com

Jul 17, 2014, 7:36:05 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
I created another user other than guest with the administrator tag and still no luck.
I also attempted to set up logging with the following line in rabbitmq.conf
{rabbitmq_management,       [ {http_log_dir,          "/var/log/rabbitmq/"} ] },
and no requests are being logged.

Michael Klishin

Jul 17, 2014, 7:46:21 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
On 18 July 2014 at 03:36:07, da...@weheartit.com (da...@weheartit.com) wrote:
> > I also attempted to set up logging with the following line in
> rabbitmq.conf
> {rabbitmq_management, [ {http_log_dir, "/var/log/rabbitmq/"}
> ] },
> and no requests are being logged.

File name should be rabbitmq.config. Please see http://www.rabbitmq.com/configure.html.

Also, you haven't replied to my question about waiting on the response (if it eventually
succeeds) or having a really high number of queues or other entities which may take a while
to load.

da...@weheartit.com

Jul 17, 2014, 7:51:56 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
Yes, sorry, the filename I edited was rabbitmq.config.
I only have 2 queues with a small number of messages in them, which I can get using rabbitmqctl:
rabbitmqctl list_queues
Listing queues ...
aliveness-test 0
prod_dashboard 72
...done.

As far as timeouts go, when I run rabbitmqadmin commands I've waited for at least 5 minutes with zero response on 2 different servers.
In each case nothing was returned and I killed it with Ctrl+C.

Michael Klishin

Jul 17, 2014, 8:01:16 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
On 18 July 2014 at 03:51:58, da...@weheartit.com (da...@weheartit.com) wrote:
> > I only have 2 queues with a small number of messages in them which
> I can get using rabbitmqctl
> rabbitmqctl list_queues
> Listing queues ...
> aliveness-test 0
> prod_dashboard 72
> ...done.
>
> As far as timeouts when I run rabbitmqadmin commands I've waited
> for at least 5 minutes with zero on 2 different servers.
> In each case nothing was returned and I killed it with the control
> + c

Try

rabbitmqctl eval 'application:stop(rabbitmq_management).'
rabbitmqctl eval 'application:start(rabbitmq_management).'

this will restart the management plugin without affecting the core of RabbitMQ.
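
In a cluster, the same stop/start pair can be pointed at each node with rabbitmqctl's -n option. A rough sketch follows; restart_mgmt and the DRY_RUN variable are hypothetical names, and the node names in the usage note are placeholders. By default it only prints the commands so they can be reviewed first.

```shell
# Stop and start the management plugin on every node name passed as an argument.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
restart_mgmt() {
  for node in "$@"; do
    for action in stop start; do
      expr="application:${action}(rabbitmq_management)."
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "rabbitmqctl -n $node eval '$expr'"
      else
        # Stop at the first failure so the failing node is easy to identify.
        rabbitmqctl -n "$node" eval "$expr" || {
          echo "failed: $action on $node" >&2
          return 1
        }
      fi
    done
  done
}
```

For example, restart_mgmt rabbit@node1 rabbit@node2 prints four commands, and setting DRY_RUN=0 beforehand runs them instead. Keeping whatever is written to stderr on failure makes it easier to report which node misbehaved.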

da...@weheartit.com

Jul 17, 2014, 8:19:06 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
That command works; the logs show that the plugin is successfully unloaded and reloaded.
Unfortunately, it's still not responsive.
Is there a way to blow away its database of statistics?
I'm wondering if that is the problem, as it's surely a shared resource that had to have caused all 4 nodes to fail at the same time.

Michael Klishin

Jul 17, 2014, 8:29:09 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
On 18 July 2014 at 04:19:07, da...@weheartit.com (da...@weheartit.com) wrote:
> > Is there a way to blow away its database of statistics?
> I'm wondering if that is the problem as it's a shared resource for
> sure that had to have caused all 4 nodes to fail at the same time.

It can be an issue with the statistics DB. Can you post output of

rabbitmqctl report

(note that it will list queue names and such, so you may want to edit sensitive
info out). 

Michael Klishin

Jul 17, 2014, 8:32:43 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
On 18 July 2014 at 04:29:08, Michael Klishin (mkli...@gopivotal.com) wrote:
> > It can be an issue with the statistics DB. Can you post output
> of
>
> rabbitmqctl report

 and maybe rabbitmqctl eval 'process_info(whereis(rabbit_mgmt_db)).'

Michael Klishin

Jul 17, 2014, 8:41:55 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
On 18 July 2014 at 04:32:42, Michael Klishin (mkli...@gopivotal.com) wrote:
> > and maybe rabbitmqctl eval 'process_info(whereis(rabbit_mgmt_db)).'

Since you've mentioned you have a cluster (something I did not realise before), see "statistics_db_node" in the output of curl -u guest:guest http://127.0.0.1:15672/api/overview/.

The stats DB runs on a single node in the cluster at a time. Try restarting
the management plugin on that node. 
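
One way to pull that field out of the overview response without extra tooling is a small sed filter. This is a sketch: stats_db_node is a made-up helper name, and it assumes the field appears as a plain JSON string in a compact, single-line response. A real JSON parser would be more robust.

```shell
# Read an /api/overview JSON body on stdin and print the statistics_db_node value.
# Prints nothing if the field is absent.
stats_db_node() {
  sed -n 's/.*"statistics_db_node"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage would be along the lines of: curl -s -u guest:guest http://127.0.0.1:15672/api/overview | stats_db_node. The printed node name is the one to try restarting the management plugin on. If the request itself hangs on one node, the same call can be tried against another node in the cluster.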

da...@weheartit.com

Jul 17, 2014, 8:51:26 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
Problem solved.
When I went to unload/load the management plugin on all of the nodes, 3 ran successfully but one of the nodes threw an error.
So on that host I performed a stop/start of the rabbitmq-server process.
Now I have access to the plugin again.
Thanks for your help.

Michael Klishin

Jul 17, 2014, 8:52:56 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
 On 18 July 2014 at 04:51:27, da...@weheartit.com (da...@weheartit.com) wrote:
> > When I went to unload/load the management plugin on all of the
> nodes, 3 ran successfully but one of the nodes threw an error.

Do you still have the error message? Can you post it here?

da...@weheartit.com

Jul 17, 2014, 9:24:27 PM
to rabbitm...@googlegroups.com, da...@weheartit.com
Sorry, I didn't save the output, but instead of just unloading the plugin it produced an error message.
Thanks for your help.