service rabbitmq-server restart command hangs forever


Shridhar Sahukar

Nov 4, 2014, 5:28:13 PM
to rabbitm...@googlegroups.com
Hi All,

On some of our setups, I am noticing that when we issue 'service rabbitmq-server restart', the command hangs forever. 'ps fax' output shows that it is stuck in '/usr/lib/rabbitmq/bin/rabbitmqctl "wait" "/var/run/rabbitmq/pid"'. The command gets unblocked if I issue 'rabbitmqctl start_app' in another window. So it seems like the 'restart' operation is only starting the Erlang virtual machine but not the RabbitMQ application.

I don't see any errors in the RabbitMQ logs that indicate the problem. I have sometimes noticed an erl_crash.dump file, but not always. Also, I am not sure what gets the machine into this state, but once it is in this state, nothing seems to get it out. For example, I tried removing the entire /var/lib/rabbitmq/mnesia/ directory and restarting RabbitMQ, but subsequent restarts hang again.

Please suggest how to gather more information on why the command fails to start the rabbitmq app. Also please let me know if this is a known issue.

Thanks for your help in advance.

Regards,
Shridhar 

Michael Klishin

Nov 4, 2014, 5:30:21 PM
to rabbitm...@googlegroups.com, Shridhar Sahukar
What version do you run? I can recall a few bugs in 3.1.x and 3.2.x releases that could
cause a deadlock on server shutdown.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Shridhar Sahukar

Nov 4, 2014, 5:38:54 PM
to Michael Klishin, rabbitm...@googlegroups.com
Hi MK,

Thanks for the quick response.

I am seeing the problem on version 3.1.1-1 of RabbitMQ. However, on one of the machines that had the problem, I installed version 3.4.1-1, stopped RabbitMQ, deleted the /var/lib/rabbitmq/mnesia directory and started it again, and I still see the problem on subsequent restarts.

Thanks,
Shridhar

Simon MacMullen

Nov 5, 2014, 5:17:12 AM
to Shridhar Sahukar, Michael Klishin, rabbitm...@googlegroups.com
On 04/11/14 22:38, Shridhar Sahukar wrote:
> I am seeing the problem on 3.1.1-1 version of rabbitmq. However, on one
> of the machines that had the problem, I installed 3.4.1-1 version,
> stopped rabbitmq, deleted /var/lib/rabbitmq/mnesia directory and started
> it back, and I still see the problem on subsequent restarts.


If you are seeing this on a modern version, could you post the output of

# rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'

please?

Cheers, Simon

Shridhar Sahukar

Nov 5, 2014, 1:23:31 PM
to Simon MacMullen, Michael Klishin, rabbitm...@googlegroups.com
Hello Simon,

Please find the output of the command below:

$ sudo rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'
There are 38 processes.
Error: {badarg,[{erlang,process_info,[<5056.0.0>,current_stacktrace]},
                {rabbit_diagnostics,looks_stuck,1},
                {rabbit_diagnostics,'-maybe_stuck/2-lc$^1/1-1-',1},
                {rabbit_diagnostics,maybe_stuck,2},
                {erl_eval,do_apply,5},
                {rpc,'-handle_call_call/6-fun-0-',5}]}


Let me know if you need any other information.

Thanks,
Shridhar

Simon MacMullen

Nov 7, 2014, 8:07:09 AM
to Shridhar Sahukar, Michael Klishin, rabbitm...@googlegroups.com
Well damn. It looks like there's a race condition in the diagnostic
which is causing it to fail. This will be fixed in the next release,
but: does it return this badarg every time you invoke it?

Cheers, Simon

Gary B

Apr 26, 2016, 10:34:03 AM
to rabbitmq-users, shridhar...@cyaninc.com, mkli...@pivotal.io, si...@rabbitmq.com
I'm running RabbitMQ 3.5.7 and it's still happening. It means that I can't enable the service to start at boot, because the boot then never finishes. Running

bash -x /etc/init.d/rabbitmq-server start

shows it getting this far:

...
+ ensure_pid_dir
++ dirname /var/run/rabbitmq/pid
+ PID_DIR=/var/run/rabbitmq
+ '[' '!' -d /var/run/rabbitmq ']'
+ set +e
+ /usr/sbin/rabbitmqctl wait /var/run/rabbitmq/pid
+ RABBITMQ_PID_FILE=/var/run/rabbitmq/pid
+ daemon /usr/sbin/rabbitmq-server

and it hangs there forever.
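
One way to keep a script from blocking indefinitely at a step like this is to put a deadline on the wait. The following is a minimal sketch, assuming GNU coreutils `timeout` is installed; the function name `wait_with_deadline` and the 60-second value in the example are illustrative and are not part of the packaged init script.

```shell
# Run a command, but give up after a deadline instead of blocking forever.
# Assumption: GNU coreutils `timeout` is available.
# The function name is hypothetical, not part of the RabbitMQ init script.
wait_with_deadline() {
    local seconds="$1"
    shift
    local rc=0
    timeout "$seconds" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU `timeout` exits with status 124 when the deadline expires.
        echo "gave up after ${seconds}s: $*" >&2
    fi
    return "$rc"
}

# Example (not run here):
#   wait_with_deadline 60 /usr/sbin/rabbitmqctl wait /var/run/rabbitmq/pid
```

A non-zero exit status then gives the caller a chance to report a failure rather than hang.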

Michael Klishin

Apr 26, 2016, 10:48:04 AM
to Gary B, rabbitmq-users
What's in the RabbitMQ log files, and in the startup stdout/stderr files?

Michael Klishin

Apr 26, 2016, 10:59:01 AM
to Gary, rabbitm...@googlegroups.com
+rabbitmq-users

So the issue ended up being with PID file/directory permissions?

On Tue, Apr 26, 2016 at 9:56 AM, Gary <theto...@gmail.com> wrote:
Interesting:
$ tail -f startup_* *.log

tail: startup_log: file truncated
tail: startup_err: file truncated
/usr/lib/rabbitmq/bin/rabbitmq-server: line 42: /var/run/rabbitmq/pid: Permission denied

==> startup_log <==
                                                           [FAILED]


It's capturing the normal output of the init script and putting it into the RabbitMQ logs.

I've fixed that error and now it's not hanging. So if there is an error, it hangs... It may be specific to this particular error.

Thanks.
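
Since the only trace of the failure was in `startup_err`, it can help to check that file whenever a start appears to hang. A minimal sketch, assuming the `startup_log` and `startup_err` files live in the directory passed as the first argument (often /var/log/rabbitmq on packaged installs, but that location is an assumption here); the function name `show_startup_errors` is hypothetical.

```shell
# Print whatever the server wrote to stderr during startup.
# Assumption: startup_err lives in the directory given as $1.
# The function name is hypothetical.
show_startup_errors() {
    local errfile="$1/startup_err"
    if [ -s "$errfile" ]; then
        # A non-empty startup_err means something went to stderr on startup.
        echo "startup errors found in $errfile:"
        cat "$errfile"
        return 1
    fi
    echo "no startup errors recorded in $errfile"
}

# Example (not run here):
#   show_startup_errors /var/log/rabbitmq
```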

Michael Klishin

Apr 26, 2016, 11:29:31 AM
to Gary, rabbitm...@googlegroups.com
Please CC the list.

I'm pretty sure our packages already take care of directory permissions. What distribution do you use?

On Tue, Apr 26, 2016 at 10:02 AM, Gary <theto...@gmail.com> wrote:
Correct. Our Puppet was set up to change the ownership of /var/run/rabbitmq to root:root, so the rabbitmq user could no longer write to it. The script has a 'wait' in there to wait for the PID file to be created. As the file was never created, the script waits forever.
Maybe in a future version you can test for the correct permissions on that folder?

Anyway, it's sorted now. Thanks for your help.
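
The permission test suggested above could look roughly like the following. This is a sketch only, not something the RabbitMQ packages are known to ship: the function name `check_pid_dir` is hypothetical, and the check only reflects the user who runs it, so it needs to be run as the rabbitmq user to be meaningful (root can write almost anywhere).

```shell
# Check that the directory holding the PID file exists and is writable
# by the user running this check. The function name is hypothetical.
check_pid_dir() {
    local pid_file="$1"
    local pid_dir
    pid_dir=$(dirname "$pid_file")
    if [ ! -d "$pid_dir" ]; then
        echo "PID directory $pid_dir does not exist" >&2
        return 1
    fi
    if [ ! -w "$pid_dir" ]; then
        echo "PID directory $pid_dir is not writable by $(id -un)" >&2
        return 1
    fi
    return 0
}

# Example (not run here), executed as the rabbitmq user:
#   check_pid_dir /var/run/rabbitmq/pid
```

Failing early with a message like this avoids waiting on a PID file that can never be created.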

Gary

Apr 26, 2016, 11:34:57 AM
to Michael Klishin, rabbitm...@googlegroups.com

The packages do, yes. But someone configured our Puppet to change the permissions. It was done some time ago, and they probably thought they were doing it right. I would guess others might do something similar... There's no accounting for humans at times...

Gary

Apr 27, 2016, 3:42:59 AM
to Michael Klishin, rabbitm...@googlegroups.com
Just to confirm that setting the permission correctly on /var/run/rabbitmq in puppet fixed the issue.