AMQP close-reason, initiated by Library, code=541, text="Unexpected Exception"


Oren Luzzatto

Jul 13, 2017, 11:02:07 AM
to rabbitmq-users
Hi,

We have been experiencing outages with RabbitMQ (RMQ) 3.6.5 on Erlang 19.0, and we receive the following error:

AMQP close-reason, initiated by Library, code=541, text="Unexpected Exception", classId=0, methodId=0, cause=System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at RabbitMQ.Client.Impl.Frame.ReadFrom(NetworkBinaryReader reader)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ReadFrame()
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoop()

This happens for both publisher and consumer.
At the same time, the RMQ logs show:

=WARNING REPORT==== 
closing AMQP connection <0.9126.669> 
client unexpectedly closed TCP connection

This happens sporadically on multiple servers, both when the publisher and consumer are on the same server as RMQ and when they communicate with RMQ through a load balancer in front of a 2-server cluster.

We are working with Windows Server 2012. We use the default RMQ configuration with the following changes:
  1. Enabled Delayed exchange plugin
  2. Disk alarm changed from 50MB to 2GB  

Has anyone experienced such outages? Any idea how to find the root cause of this issue?
Any help will be appreciated.


Thanks,
Oren 





Michael Klishin

Jul 13, 2017, 11:10:10 AM
to rabbitm...@googlegroups.com
Both client and server indicate that a TCP connection from a client was interrupted.

We've seen this once on localhost with .NET Core on Linux. I don't think there
was a resolution but it was discussed on this list.

When a load balancer is involved, it can close both client and server connections
if it finds that they are "inactive". Such inactivity timeouts are configurable
and can be worked around by lowering the heartbeat timeout interval.

As in every scenario involving network connectivity loss, a traffic capture
on all machines involved will give you a lot of information to work with.
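
A rough way to sanity-check the heartbeat workaround mentioned above is a minimal Python sketch like the one below. It assumes heartbeat frames are sent at about half the negotiated heartbeat timeout (as the RabbitMQ heartbeat documentation describes) and that the load balancer drops connections after a fixed idle period. The function name and the example timeout values are hypothetical and are not taken from this thread.

```python
def heartbeat_beats_idle_timeout(heartbeat_timeout_s: float, lb_idle_timeout_s: float) -> bool:
    """Return True if heartbeat traffic should reach the balancer before it
    considers the connection idle.

    Assumption: peers send a heartbeat frame roughly every
    heartbeat_timeout / 2 seconds; a timeout of 0 disables heartbeats.
    """
    if heartbeat_timeout_s <= 0:
        return False  # heartbeats disabled, nothing keeps the connection active
    send_interval_s = heartbeat_timeout_s / 2
    return send_interval_s < lb_idle_timeout_s


if __name__ == "__main__":
    # Hypothetical values: a 60 s heartbeat timeout against a 50 s idle timeout.
    print(heartbeat_beats_idle_timeout(60, 50))
    # A 600 s heartbeat timeout against the same balancer would be too slow.
    print(heartbeat_beats_idle_timeout(600, 50))
```

If the check fails for the values in use, either the heartbeat timeout can be lowered on the client or the idle timeout raised on the load balancer.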


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Oren Luzzatto

Jul 16, 2017, 6:45:41 AM
to rabbitmq-users
Thanks for the quick response. We will look into this.

One more thing, we are caching the publisher object between publish calls. 
Could this be the reason for our issue?



Oren

Michael Klishin

Jul 16, 2017, 10:29:09 AM
to rabbitm...@googlegroups.com
If your publisher uses an IModel instance as a dependency, yes.
It's hard to give an informed answer since you haven't posted your code.

Oren Luzzatto

Jul 16, 2017, 11:56:43 AM
to rabbitmq-users
We just found out that we were using RMQ client version 3.6.6, while our code works with DLLs at version 3.6.0.
We are now checking whether this could be the source of our problems.

Michael Klishin

Jul 16, 2017, 7:57:18 PM
to rabbitm...@googlegroups.com
If you mean that your server version doesn't match that of the client,
you can use any 3.6.x or 4.x client version with RabbitMQ 2.0+.


Oren Luzzatto

Jul 19, 2017, 8:33:14 AM
to rabbitmq-users
Hi,

Going over the RabbitMQ log files, we find lots of entries similar to this:


=SUPERVISOR REPORT==== 19-Jul-2017::02:57:14 ===
     Supervisor: {<0.19421.837>,amqp_channel_sup_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{nb_children,1},
                  {name,channel_sup},
                  {mfargs,
                      {amqp_channel_sup,start_link,
                          [direct,<0.19311.837>,
                           <<"<rab...@SERVER1.1.19311.837>">>]}},
                  {restart_type,temporary},
                  {shutdown,brutal_kill},
                  {child_type,supervisor}]


What is causing this type of error?
Is this related to our issue?


Thanks,
Oren

Michael Klishin

Jul 19, 2017, 11:45:21 AM
to rabbitm...@googlegroups.com
It's not related to your issue. It's a channel process tree shutdown timeout.


Oren Luzzatto

Jul 24, 2017, 3:08:15 AM
to rabbitmq-users
Hi,

We've removed the caching of the publishers from our code, and the problem stopped.

1. What is the correct design to support caching of the publisher connection?
2. Why were the consumer connections aborted when the publisher connections aborted?

Thanks,
Oren

Oren Luzzatto

Jul 30, 2017, 3:24:15 AM
to rabbitmq-users
Sorry, we're not following;
We're experiencing communication outages between the client and RMQ.
Doesn't this error indicate communication shutdown?

Michael Klishin

Jul 30, 2017, 4:16:04 AM
to rabbitm...@googlegroups.com
5xx codes suggest there was an unhandled exception (much like 5xx status codes in HTTP), traces of which should be in the server log files. That's why the connection is closed.
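
When triaging many log lines like the one in the original post, it can help to pull out the initiator and reply code mechanically. The following is a minimal Python sketch that assumes the message layout quoted in this thread (`initiated by ..., code=..., text="..."`). The names `CloseReason`, `parse_close_reason`, and `is_5xx` are hypothetical helpers, not part of any RabbitMQ client library.

```python
import re
from typing import NamedTuple, Optional


class CloseReason(NamedTuple):
    initiator: str  # e.g. "Library" in the log line from this thread
    code: int       # AMQP reply code, e.g. 541
    text: str       # reply text, e.g. "Unexpected Exception"


# Accepts either double or single quotes around the reply text, since both
# forms appear in the messages quoted in this thread.
_CLOSE_REASON = re.compile(
    r"""initiated by (?P<initiator>\w+), code=(?P<code>\d+), text=(?P<q>["'])(?P<text>.*?)(?P=q)"""
)


def parse_close_reason(message: str) -> Optional[CloseReason]:
    """Extract initiator, reply code, and reply text, or return None if absent."""
    match = _CLOSE_REASON.search(message)
    if match is None:
        return None
    return CloseReason(match.group("initiator"), int(match.group("code")), match.group("text"))


def is_5xx(code: int) -> bool:
    """True for reply codes in the 5xx range discussed above."""
    return 500 <= code <= 599
```

For example, `parse_close_reason` applied to the first line of the error in the original post yields the initiator `Library` and the code `541`, which the AMQP 0-9-1 specification names `internal-error`.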

Allard Poldermans

Feb 8, 2019, 7:36:20 AM
to rabbitmq-users
Thanks so much for sharing this. I experienced exactly the same problem. It could be reproduced very easily: create a channel, and then use a foreach loop to post 1000 messages with BasicPublish with a sleep interval. Within a minute we already had the same error for about 5 seconds.

I assume that since the load balancer swaps between the available nodes, some messages fail to arrive. My conclusion is that it is necessary to re-create the channel every time it is used, and not leave a channel open for a long time.

After re-declaring the channel each time within the loop, the problem no longer occurred.

Michael Klishin

Feb 8, 2019, 9:22:09 AM
to rabbitm...@googlegroups.com
If it was so easy to reproduce, this list and GitHub issues would be flooded with similar threads but that's not the case.
I'm sorry but I feel you are leaving out something important that's going on in your test.

Channels are most certainly *not* meant to be recreated every time. It would be incredibly wasteful.
Your application must be prepared to handle scenarios where a channel is closed and various channel exceptions (e.g. try consuming from a non-existent queue),
and scenarios where connection is down/recovering [1] and a new channel cannot be immediately opened.

Instead of opening a new channel per operation (again, not a recommended practice, terribly wasteful and so on), consider taking a traffic capture [2]
and looking at server logs [3] to see what's really going on. If you need help with interpreting a new data, start a new thread and share as much as possible there.
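
The pattern described above (keep one long-lived channel, and reopen it only when it has actually been closed) can be sketched as follows. This is a minimal Python sketch with stand-in types, not the RabbitMQ .NET client API: `open_channel`, `ChannelClosed`, `ConnectionUnavailable`, and the `is_open`/`publish` members are hypothetical placeholders for whatever the client library in use provides.

```python
import time
from typing import Any, Callable, Optional


class ChannelClosed(Exception):
    """Hypothetical: raised when publishing on a channel that has been closed."""


class ConnectionUnavailable(Exception):
    """Hypothetical: raised when a channel cannot be opened (connection down or recovering)."""


class Publisher:
    """Reuses one channel across publishes and reopens it only after a failure."""

    def __init__(self, open_channel: Callable[[], Any], attempts: int = 3, delay_s: float = 1.0) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._open_channel = open_channel
        self._attempts = attempts
        self._delay_s = delay_s
        self._channel: Optional[Any] = None

    def _channel_or_new(self) -> Any:
        # Open a channel only if there is none or the current one was closed.
        if self._channel is None or not self._channel.is_open:
            self._channel = self._open_channel()
        return self._channel

    def publish(self, body: bytes) -> None:
        last_error: Optional[Exception] = None
        for attempt in range(self._attempts):
            try:
                self._channel_or_new().publish(body)
                return
            except (ChannelClosed, ConnectionUnavailable) as exc:
                last_error = exc
                self._channel = None  # force a fresh channel on the next attempt
                if attempt + 1 < self._attempts:
                    time.sleep(self._delay_s)  # give the connection time to recover
        assert last_error is not None
        raise last_error
```

With this shape, a loop that publishes 1000 messages opens a single channel, and a new one is opened only after the existing channel reports that it is closed or a publish fails.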

Michael Klishin

Feb 8, 2019, 9:36:08 AM
to rabbitm...@googlegroups.com
The above should read "If you need help with interpreting that data, start a new thread…"

Nitin Jain

Feb 26, 2020, 12:37:19 AM
to rabbitmq-users
Hi,

We are experiencing the same issue with our services running on .NET Core 3.0 on a Linux VM. The error message is the same: "Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=541, text='Unexpected Exception'"

We have enabled automatic recovery and are using the default heartbeat timeout.

Is anyone else facing a similar issue, or does anyone have a resolution for it?

Thanks,
Nitin Jain

Luke Bakken

Feb 26, 2020, 2:08:43 PM
to rabbitmq-users
Hello,

Please start a new discussion. You must include more details, such as:

* RabbitMQ and Erlang version
* Operating system you are using to run RabbitMQ
* RabbitMQ .NET client version
* Very important: RabbitMQ log messages from the time of this exception
* Complete source code or some code we can review for programming errors

Without the above information, it is impossible for anyone to assist you.

Thanks,
Luke