RabbitMQ server randomly fails to start after reboot


User2013

Jun 5, 2018, 6:13:55 AM
to rabbitmq-users
I have RabbitMQ server version 3.5.7-1ubuntu0.16.04.2 installed on an Ubuntu 16.04 server.
When the server is rebooted, RabbitMQ randomly fails to start.
The RabbitMQ server is installed from the official Ubuntu repository.
There is no clustering; it is a single independent server.

When the problem happens, this is what gets logged to the journal (journalctl) on reboot:

Jun 03 09:50:52 hostname systemd[1]: Starting RabbitMQ Messaging Server...
...
Jun 03 09:51:23 hostname rabbitmq[1067]: Waiting for 'rabbit@hostname' ...
Jun 03 09:51:24 hostname rabbitmq[1067]: pid is 1188 ... 

At 09:51:29, an application that tries to use RabbitMQ is unable to connect: [Errno 111] Connection refused

Less than a minute later, RabbitMQ has started according to /var/log/rabbitmq/rab...@hostname.log:

=INFO REPORT==== 3-Jun-2018::09:52:22 ===
Server startup complete; 6 plugins started.
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * webmachine
 * rabbitmq_management_agent
 * mochiweb
 * amqp_client

But at the same moment, the service goes into a failed state because the start-post operation times out:

Jun 03 09:52:22 hostname systemd[1]: rabbitmq-server.service: Start-post operation timed out. Stopping.
Jun 03 09:52:22 hostname systemd[1]: Failed to start RabbitMQ Messaging Server.
Jun 03 09:52:22 hostname systemd[1]: rabbitmq-server.service: Unit entered failed state.
Jun 03 09:52:22 hostname systemd[1]: rabbitmq-server.service: Failed with result 'timeout'.
 
Logs in /var/log/rabbit don't show any errors or warnings.
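
The timestamps in the journal excerpt are worth a quick check. A minimal sketch, assuming GNU date and using only the two timestamps quoted above (the variable names are illustrative), computes how long systemd waited before giving up:

```shell
#!/bin/sh
# Timestamps copied from the journal excerpt above. UTC is assumed only so
# that the subtraction is not affected by the local time zone.
start_ts="2018-06-03 09:50:52"    # "Starting RabbitMQ Messaging Server..."
timeout_ts="2018-06-03 09:52:22"  # "Start-post operation timed out. Stopping."

# date -d is a GNU date extension; +%s prints seconds since the epoch.
start_s=$(date -u -d "$start_ts" +%s)
timeout_s=$(date -u -d "$timeout_ts" +%s)

elapsed=$((timeout_s - start_s))
echo "elapsed: ${elapsed}s"
```

The difference is 90 seconds, which equals systemd's stock default start timeout (DefaultTimeoutStartSec=90s). If the unit does not set its own TimeoutStartSec, that would suggest the node needed slightly longer than the allowed window on this boot rather than failing outright. Whether the Ubuntu unit overrides the timeout is not shown here, so this is an inference to verify, for example with systemctl show rabbitmq-server -p TimeoutStartUSec.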

Looking at the systemd unit file, I see that the start-post operation (ExecStartPost) runs the script /usr/lib/rabbitmq/bin/rabbitmq-server-wait, which contains the following code:

. `dirname $0`/rabbitmq-env
/usr/lib/rabbitmq/bin/rabbitmqctl wait $RABBITMQ_PID_FILE

After this, running the command "service rabbitmq-server start" starts the service without problems.

I am unable to understand what the purpose of this start-post operation is and why it is timing out.
What could be causing the problem, and how should I proceed with investigating it?
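
If the start-post step is merely running out of time, one thing to try is a systemd drop-in that raises TimeoutStartSec for this unit. The sketch below is an illustration under assumptions, not a confirmed fix: the helper name, the 300-second value and the override.conf file name are arbitrary choices, and the directory is passed in so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# write_timeout_dropin is a hypothetical helper for illustration.
# Usage: write_timeout_dropin DROPIN_DIR SECONDS
# Writes DROPIN_DIR/override.conf setting TimeoutStartSec for the unit.
write_timeout_dropin() {
    dropin_dir=$1
    seconds=$2

    mkdir -p "$dropin_dir" || return 1
    # TimeoutStartSec bounds start-up; ExecStartPost commands are subject to it.
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" \
        > "$dropin_dir/override.conf"
}

# Dry run against a scratch directory; nothing under /etc is touched.
scratch=$(mktemp -d)
write_timeout_dropin "$scratch" 300
cat "$scratch/override.conf"
```

On a real host the drop-in directory would be /etc/systemd/system/rabbitmq-server.service.d, followed by systemctl daemon-reload. A longer timeout only buys time; it does not explain why start-up is slow on some boots.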

Michael Klishin

Jun 5, 2018, 6:16:47 AM
to rabbitm...@googlegroups.com
See server and OS logs (on both nodes if there are multiple) for clues.

Generally the only piece of advice we have for 3.5.x these days is: upgrade [1][2].


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

User2013

Jun 6, 2018, 3:30:23 AM
to rabbitmq-users
Thanks for the response.

Everything I could find in the logs is in my post.
Are there any other log sources I might be missing?

Unfortunately I cannot upgrade to a newer version at this time, but I will look into it when possible.

Michael Klishin

Jun 6, 2018, 1:47:28 PM
to rabbitm...@googlegroups.com
There isn't much to work with and like I said, 3.5.x is out of support.

Connectivity issues can be genuine (there are two guides that describe the methodology that will help narrow the problem down efficiently: [1][2])
or due to known Erlang issues (and 3.5.7 cannot run on the first known version to contain a fix, 19.3.6.5) [3].



User2013

Jun 7, 2018, 3:17:56 AM
to rabbitmq-users
Thanks, I will look into those troubleshooting steps.

One more question:
Is there any documentation about the start-post script (/usr/lib/rabbitmq/bin/rabbitmq-server-wait)?
I would like to understand what it is doing, but I was unable to find information about it.
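
Going by the two lines of the script quoted in the first post, it only sources rabbitmq-env and then calls rabbitmqctl wait with the PID file. The rabbitmqctl manual describes wait as blocking until the PID file exists, the process named in it is running, and the RabbitMQ application has started in that process. A rough shell sketch of the first two phases follows; the function name and timeout parameter are hypothetical, and it does not reproduce the real implementation, which also checks the application itself:

```shell
#!/bin/sh
# wait_for_pid_file is a hypothetical helper, not part of RabbitMQ.
# Usage: wait_for_pid_file PID_FILE TIMEOUT_SECONDS
# Returns 0 once PID_FILE exists and names a live process, 1 on timeout.
wait_for_pid_file() {
    pid_file=$1
    remaining=$2

    while [ "$remaining" -gt 0 ]; do
        # Phase 1: the PID file must exist and be non-empty.
        if [ -s "$pid_file" ]; then
            pid=$(cat "$pid_file")
            # Phase 2: kill -0 sends no signal; it only tests that the
            # process exists and can be signalled by the caller.
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

In the journal excerpt from the first post, the "pid is 1188" line suggests the first two phases had already passed by 09:51:24, so the remaining wait was most likely for the application to finish booting inside that process.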

Michael Klishin

Jun 7, 2018, 5:55:33 AM
to rabbitm...@googlegroups.com
There is no script with that name in RabbitMQ or our standard packages as far as I know, at least not in the modern
release series.

RabbitMQ is 100% open source. You can find both the server's own scripts [1] and the package ones [2]
on GitHub. Pay attention to which version you are looking at, since 3.5.7 is from December 2015.

