Rabbitmq-Server Fails to Stop


Brad Jorgensen

Nov 12, 2015, 2:25:17 PM
to rabbitmq-users
I have a cluster of 5 RabbitMQ nodes with all queues mirrored for HA. I often (at least weekly) have to restart all of the nodes due to errors that cause the entire cluster or individual nodes to stop working. When such a problem occurs, "service rabbitmq-server stop" usually fails to stop the server. The main log shows "Stopping RabbitMQ" in an info report and the shutdown log shows "Stopping and halting node rabbit@web1 ..." just as it does when it is able to stop, but nothing ever happens. The CPU and memory usage of RabbitMQ do not change significantly while it is trying to stop. I have let the command run for more than an hour in one instance, and I usually give it at least one minute. After that, I have to kill the main RabbitMQ process. Nothing is logged in the sasl log or the shutdown_err log. I've been using RabbitMQ for more than a year with the same setup and across several versions. Any suggestions on what to do? Let me know if any more information is needed.

Environment:
RabbitMQ-Server 3.5.6
Erlang 17.3
CentOS 6.7

The configs on all of the nodes are basically the same and look like this:
[
  {mnesia, [
    {dump_log_write_threshold, 1000}
  ]},
  {rabbit, [
    {auth_backends, [rabbit_auth_backend_internal, rabbit_auth_backend_http]},
    {log_levels, [
      {connection, info},
      {mirroring, info}
    ]},
    {heartbeat, 10},
    {collect_statistics, coarse},
    {collect_statistics_interval, 1000},
    {delegate_count, 32},
    {cluster_partition_handling, pause_minority},
    {vm_memory_high_watermark, 0.2},
    {disk_free_limit, 1000000000}
  ]},
  {rabbitmq_management, [
    {sample_retention_policies, [
      {global, [{3600, 5}, {86400, 60}, {604800, 600}]},
      {basic, [{3600, 5}, {86400, 60}, {604800, 600}]},
      {detailed, [{60, 1}, {3600, 5}]}
    ]},
    {http_log_dir, "/var/log/rabbitmq/mgmt"}
  ]},
  {kernel, [
    {net_ticktime, 10}
  ]},
  {rabbitmq_web_stomp, [
    {port, 15674}
  ]},
  {rabbitmq_auth_backend_http, [
    {user_path,     "http://localhost/auth/user.php"},
    {vhost_path,    "http://localhost/auth/vhost.php"},
    {resource_path, "http://localhost/auth/resource.php"}
  ]}
].

Michael Klishin

Nov 12, 2015, 2:28:57 PM
to rabbitm...@googlegroups.com, Brad Jorgensen
On 12 November 2015 at 22:25:20, Brad Jorgensen (br...@debtpaypro.com) wrote:
> Any suggestions on what to do? Let me know if any more information
> is needed.

rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

Most likely this is one of a couple of known problems where channels wait on
operations without a timeout. So, feel free to kill the node. It will rebuild indices
of persistent messages in durable queues on next start as needed.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
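
The advice above (check for stuck processes, then kill the node if the stop hangs) could be wrapped in a small shell helper along the following lines. This is only a sketch and not something posted in the thread: it assumes GNU coreutils `timeout` is available, and the 60-second and 30-second limits, the function names, and the pid file path /var/run/rabbitmq/pid are all illustrative assumptions that should be checked against the actual installation.

```shell
#!/bin/sh
# Sketch: bound the graceful stop, collect diagnostics, then kill the node.
# Assumptions (not from the thread): GNU coreutils `timeout` is installed,
# the time limits below, and the pid file location.

STOP_TIMEOUT="${STOP_TIMEOUT:-60}"             # seconds to wait for a clean stop
DIAG_TIMEOUT="${DIAG_TIMEOUT:-30}"             # seconds to wait for diagnostics
PID_FILE="${PID_FILE:-/var/run/rabbitmq/pid}"  # assumed pid file path

# Run a command under a time limit. Returns the command's own exit status,
# or 124 if the limit expired (that is how `timeout` reports expiry).
run_with_limit() {
    limit="$1"; shift
    timeout "$limit" "$@"
}

# Try a normal stop first; if it hangs or fails, print the stuck-process
# report and then kill the main process.
stop_rabbit() {
    if run_with_limit "$STOP_TIMEOUT" service rabbitmq-server stop; then
        echo "stopped cleanly"
        return 0
    fi

    # Record which Erlang processes look stuck before the evidence is lost.
    run_with_limit "$DIAG_TIMEOUT" \
        rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().' || true

    # Fall back to killing the node, as suggested in the reply above.
    if [ -r "$PID_FILE" ]; then
        kill -KILL "$(cat "$PID_FILE")"
        echo "killed after failed stop"
    else
        echo "pid file not found: $PID_FILE" >&2
        return 1
    fi
}
```

With the functions loaded, running `stop_rabbit` as root attempts the normal stop and only falls back to the kill when that does not finish in time. Saving the `maybe_stuck` output from several hung nodes makes it easier to tell whether the same processes are blocked each time.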

