RabbitMQ - Root Cause for Broker Forced Connection Closures


path...@apps.disney.com

Apr 2, 2021, 5:00:13 PM
to rabbitmq-users
Hi, 

As part of applying security patches to our infrastructure, we rotate our RabbitMQ clusters on a regular basis, without changing the RabbitMQ cluster configuration. We noticed the following message in the (crash) log and are trying to identify the root cause of the error: 

{"log":" operation none caused a connection exception connection_forced: \"broker forced connection closure with reason 'shutdown'\"\n","stream":"stdout","time":"2021-04-02T09:43:31.411122261Z"} 

The message above is logged hundreds of times, but neither the application-specific logs nor the RabbitMQ logs hint at the source of the problem. When this issue occurs, the affected RabbitMQ process restarts and recovers, but the corresponding application has to be restarted manually. Specifically, we would like to confirm whether RabbitMQ itself is shutting down the broker or whether an application communicating with RabbitMQ is causing the closures. Could you point us in the right direction for determining the root cause? 
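The log line above is in the JSON-lines format written by Docker's json-file logging driver (one `log`, `stream` and `time` field per line). As a rough first step, a minimal Python sketch, assuming logs in exactly that format, can count the `connection_forced` closures per minute. If the hundreds of messages arrive as a single burst, that may point to one node-level event rather than a steady stream of client-side problems. The helper name `closures_per_minute` is hypothetical, and the location of the log file depends on your setup.

```python
import json
from collections import Counter
from typing import Iterable

# Substring that identifies the forced-closure messages (from the log line above).
CLOSURE_MARKER = "connection_forced"


def closures_per_minute(lines: Iterable[str]) -> Counter:
    """Count forced-closure log entries per minute (YYYY-MM-DDTHH:MM)."""
    counts: Counter = Counter()
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        if CLOSURE_MARKER in entry.get("log", ""):
            # RFC 3339 timestamps start with "YYYY-MM-DDTHH:MM", so the
            # first 16 characters bucket entries by minute.
            counts[entry.get("time", "")[:16]] += 1
    return counts


if __name__ == "__main__":
    sample = json.dumps({
        "log": " operation none caused a connection exception connection_forced: "
               "\"broker forced connection closure with reason 'shutdown'\"\n",
        "stream": "stdout",
        "time": "2021-04-02T09:43:31.411122261Z",
    })
    print(closures_per_minute([sample, sample, "not json"]))
    # prints Counter({'2021-04-02T09:43': 2})
```

To run it against a real file, pass the open file object (iterating a file yields lines) instead of the sample list.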

If it helps, here are a few important details for our environment: 
RabbitMQ version: 3.7.11 
Number of RabbitMQ nodes in the cluster: 3 
Hosting location: Kubernetes (i.e., Docker) 
 
Additionally, we see similar behavior with two more RabbitMQ clusters that use the same configuration.  

I am happy to retrieve and share any additional logs if those are required. Any help is greatly appreciated. 

- Tarpan Pathak

Johan Rhodin

Apr 3, 2021, 10:01:08 PM
to rabbitm...@googlegroups.com
Hi,

What is logged in the RabbitMQ logs and syslog at that time (2021-04-02T09:43)? It looks like RabbitMQ is shutting down and closing the connection. The RabbitMQ logs should give you more information about why that is happening.

As you probably know, 3.7.11 is now a very old release and has been out of support for quite some time, so the first recommendation would be to upgrade.

/Johan


path...@apps.disney.com

Apr 5, 2021, 8:26:31 PM
to rabbitmq-users
Hi Johan, 

Thanks for the prompt response. 

For one of the clusters, I see a ton of messages in the crash log like so:

=SUPERVISOR REPORT====
     Supervisor: {<0.9265.531>,amqp_channel_sup_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[direct,<0.9288.531>,<<"<rabbit@<sensitive>.3.9288.531>">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]
=SUPERVISOR REPORT====
     Supervisor: {<0.10689.532>,amqp_channel_sup_sup}
     Context:    shutdown_error
     Reason:     noproc
     Offender:   [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[direct,<0.10711.532>,<<"<rabbit@<sensitive>.3.10711.532>">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]

For another cluster, I see a ton of messages in the crash log like so:

 =ERROR REPORT====
 Mnesia('rabbit@<sensitive>'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@<sensitive>'}
 =SUPERVISOR REPORT====
      Supervisor: {<0.19761.266>,amqp_channel_sup_sup}
      Context:    shutdown_error
      Reason:     noproc
      Offender:   [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[direct,<0.19779.266>,<<"<rabbit@<sensitive>.3.19779.266>">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]

Based on the messages above, I am still not sure whether RabbitMQ itself or an "external" application is shutting down the connection. Our log level is currently set to debug; I plan to bump this up to investigate further. 

You bring up a very good point regarding the version. This is also something I plan to bring up with the team soon. Would you recommend going to the latest version (3.8.14 at this time), or would any 3.8.x release suffice? 

M K

Apr 6, 2021, 5:37:52 AM
to rabbitmq-users
Nothing in those messages tells us that the node had to shut down.

Whether it is a `rabbitmqctl shutdown` or a node stopping due to the pause_minority partition handling strategy, the reason would still be reported to applications as "shutdown". Full server logs should contain more details that will help you understand the root cause.
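Because applications see the same "shutdown" reason in both cases, one way to narrow things down is to check whether partition-related entries (such as the `inconsistent_database` / `starting_partitioned_network` Mnesia event in the crash log above) appear before the first forced closure. Below is a rough Python sketch, assuming the same Docker JSON-lines log format as the original error message. The marker strings come from this thread and are not an exhaustive list of what RabbitMQ logs during a partition, and `first_events` is a hypothetical helper. Ordering alone does not prove causation; the full server logs remain the authoritative source.

```python
import json
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Markers taken from the crash-log excerpts in this thread; extend them to
# match what your own server logs contain (assumed, not exhaustive).
PARTITION_MARKERS = ("inconsistent_database", "partitioned_network")
CLOSURE_MARKER = "connection_forced"

Stamp = Tuple[datetime, int]  # (second-resolution time, nanoseconds)


def parse_docker_time(ts: str) -> Stamp:
    """Parse an RFC 3339 UTC timestamp such as 2021-04-02T09:43:31.411122261Z."""
    base, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Pad/truncate the fraction to 9 digits so it always means nanoseconds.
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return dt, nanos


def first_events(lines: Iterable[str]) -> Tuple[Optional[Stamp], Optional[Stamp]]:
    """Return the earliest partition-marker and forced-closure timestamps."""
    first_partition: Optional[Stamp] = None
    first_closure: Optional[Stamp] = None
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(entry, dict) or not entry.get("time"):
            continue
        text = entry.get("log", "")
        stamp = parse_docker_time(entry["time"])
        if any(m in text for m in PARTITION_MARKERS):
            if first_partition is None or stamp < first_partition:
                first_partition = stamp
        if CLOSURE_MARKER in text:
            if first_closure is None or stamp < first_closure:
                first_closure = stamp
    return first_partition, first_closure


if __name__ == "__main__":
    demo = [
        json.dumps({"log": "mnesia_event got {inconsistent_database, "
                           "starting_partitioned_network, 'rabbit@<sensitive>'}",
                    "time": "2021-04-02T09:43:30.9Z"}),
        json.dumps({"log": "connection_forced: broker forced connection "
                           "closure with reason 'shutdown'",
                    "time": "2021-04-02T09:43:31.411122261Z"}),
    ]
    partition, closure = first_events(demo)
    if partition is not None and closure is not None and partition <= closure:
        print("partition markers appear before the first forced closure")
```

Timestamps are compared as `(datetime, nanoseconds)` tuples rather than as strings because Docker trims trailing zeros from the fractional part, so equal-looking timestamps can have different lengths and would not sort correctly as text.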