Closing AMQP connection frequently with errors {writer,send_failed,{error,enotconn}} and timeout


Rajesh D N

Jul 18, 2017, 2:35:19 AM7/18/17
to rabbitmq-users
Hi All,

I am using amqp-client-4.1.0 (Java client) with RabbitMQ 3.6.9 and Erlang 19.2. The application and RabbitMQ run on the same server, with no clustering, and I access RabbitMQ directly via localhost.

I recently updated a few of the production systems from RabbitMQ 3.6.6 to 3.6.9.

Since then I am seeing many connection-related errors in the RabbitMQ logs (timeout, enotconn) that cause connections to close abruptly.

Is anyone else facing the same problem?

Can someone help me identify the issue here?

=ERROR REPORT==== 10-Jul-2017::03:52:18 ===
closing AMQP connection <0.26922.26> (127.0.0.1:63254 -> 127.0.0.1:5672):
{writer,send_failed,{error,timeout}}


=ERROR REPORT==== 17-Jul-2017::03:52:18 ===
closing AMQP connection <0.26922.26> (127.0.0.1:63254 -> 127.0.0.1:5672):
{writer,send_failed,{error,enotconn}}


=WARNING REPORT==== 16-Jul-2017::23:05:55 ===
Could not find handle.exe, please install from sysinternals

=ERROR REPORT==== 16-Jul-2017::23:55:17 ===
** Generic server rabbit_mgmt_external_stats terminating 
** Last message in was emit_update
** When Server state == {state,8192,
                               [{{io_file_handle_open_attempt,count},
                                 364099758},
                                {{io_file_handle_open_attempt,time},
                                 1010448639},
                                {{io_read,bytes},925822257},
                                {{io_read,count},112},
                                {{io_read,time},22406960},
                                {{io_reopen,count},0},
                                {{io_seek,count},21819},
                                {{io_seek,time},32251159},
                                {{io_sync,count},1468942},
                                {{io_sync,time},27026253881},
                                {{io_write,bytes},69869961042},
                                {{io_write,count},1468942},
                                {{io_write,time},25759640733},
                                {{mnesia_disk_tx,count},23},
                                {{mnesia_ram_tx,count},175048},
                                {{msg_store_read,count},0},
                                {{msg_store_write,count},0},
                                {{queue_index_journal_write,count},362608386},
                                {{queue_index_read,count},111},
                                {{queue_index_write,count},10975}],
                               {set,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                               undefined,5000}
** Reason for termination == 
** {badarg,
       [{erlang,port_command,
            [#Port<0.2000398>,[]],
            [{file,"erlang.erl"},{line,2990}]},
        {os,cmd,1,[{file,"os.erl"},{line,233}]},
        {rabbit_mgmt_external_stats,get_used_fd,1,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,132}]},
        {rabbit_mgmt_external_stats,get_used_fd,0,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,60}]},
        {rabbit_mgmt_external_stats,'-infos/2-lc$^0/1-0-',2,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,174}]},
        {rabbit_mgmt_external_stats,emit_update,1,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,368}]},
        {rabbit_mgmt_external_stats,handle_info,2,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,355}]},
        {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,601}]}]}

Michael Klishin

Jul 18, 2017, 6:50:10 AM7/18/17
to rabbitm...@googlegroups.com
Those messages are not related.

The external stats server terminating means the node won't be able to retrieve the number of open file handles,
free disk space, and similar node-wide stats from the OS. In recent versions it's somewhat more resilient (it stops
after a few failed attempts). This does not take the node down and is not related to client connections.

Failed writes with `enotconn` mean that a client TCP socket wasn't in the connected state when RabbitMQ attempted
to perform a write on it. `timeout` is fairly self-explanatory and in this case also comes from a client socket write operation.

In environments where clients can disappear at any moment, this may be expected.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Rajesh D N

Jul 31, 2017, 5:50:08 AM7/31/17
to rabbitmq-users
Thanks for the reply; it was very useful. Since the default heartbeat value is 60 seconds, does explicitly setting it to 60 versus not setting it at all make any difference?

I have changed from QueueingConsumer to DefaultConsumer for connection recovery.

But now I am facing some problems:
I see too many {writer,send_failed,{error,timeout}} entries in the RabbitMQ logs.
When the connection is recovered, I see more than one consumer per channel, as opposed to the one consumer per channel that I have configured.

Any thoughts on avoiding the timeout error?


Michael Klishin

Jul 31, 2017, 8:10:34 AM7/31/17
to rabbitm...@googlegroups.com
I see too many {writer,send_failed,{error,timeout}} in rabbitmq logs. 

This means socket writes fail due to timeout (from the RabbitMQ end).

Automatic connection recovery generally recovers the topology it knows about.
There are known edge cases where a hostname resolves to more than one IP, which can sometimes lead to more connections than expected.

We are not aware of any other recovery issues in the most recent release (4.2.0).
You can use Wireshark to take a traffic capture and see what's going on on the wire.

Rajesh D N

Oct 2, 2017, 11:46:37 PM10/2/17
to rabbitmq-users
My earlier understanding that this might be due to Kaspersky was wrong.

I uninstalled Kaspersky and I still see this error in the logs. I have set the heartbeat to 10 seconds. What's causing this?

And even though I have connection auto-recovery set to true, this error causes the connection not to get recovered.

Michael Klishin

Oct 3, 2017, 9:56:28 AM10/3/17
to rabbitm...@googlegroups.com
There can be a lot of reasons that lead to TCP socket write timeouts.

This error does not "cause client connection to not get recovered". It merely indicates a client network connection
is not in a healthy state. Before the client detects it — which takes time, see http://rabbitmq.com/heartbeats.html — automatic
recovery will not begin.

Socket write timeouts and connection recovery are not directly related. It is fair to say that
they do have a common root cause, however.
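As a sketch, tuning heartbeats and recovery on the Java client is done on the ConnectionFactory. This is a configuration fragment with illustrative values, not recommendations, and it assumes amqp-client is on the classpath:

```java
import com.rabbitmq.client.ConnectionFactory;

public class HeartbeatConfig {
    public static void main(String[] args) {
        // Illustrative values; tune for your environment.
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setRequestedHeartbeat(30);         // seconds; lower detects dead peers sooner
        factory.setAutomaticRecoveryEnabled(true); // reconnect after a detected failure
        factory.setNetworkRecoveryInterval(5000);  // ms to wait between recovery attempts
    }
}
```

A lower heartbeat interval shortens the window before a dead connection is detected and recovery can begin, at the cost of more background traffic.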


Rajesh D N

Oct 4, 2017, 3:11:35 AM10/4/17
to rabbitmq-users
I tried setting the heartbeat to 10 seconds and to 30 seconds.

But now I see too many "{inet_error,enotconn}" entries in the RabbitMQ logs.

Now I am running into a weird issue after this error.

I am getting a "Connection reset by peer: socket write error" when I try to acknowledge a message.

This is the line that throws the exception: recoverableChannel.basicAck(deliveryTag, false);

I am using DefaultConsumer, RecoverableChannel, and AutorecoveringConnection on Java client 4.1.0.

Above this line I have tried printing the channel number and it prints fine, so I suspect the basicAck method is causing this issue.

Any thoughts on this?



Michael Klishin

Oct 4, 2017, 6:27:40 AM10/4/17
to rabbitm...@googlegroups.com
"enotconn" means a TCP socket is "no longer connected" (in an unusable state). It's very likely a yet
another variation of the same root cause: TCP connections from clients time out or error.

If you observe this over and over, something is up with the quality of the network connection
between your clients and RabbitMQ. Such errors cannot be avoided entirely but in many cases
are not the norm.
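Since a recovered connection redelivers unacknowledged messages anyway, one way to make the symptom less disruptive on the client side is to tolerate a dead socket at ack time. A hedged sketch (the queue name and class are illustrative; assumes amqp-client on the classpath and a reachable broker):

```java
import com.rabbitmq.client.*;
import java.io.IOException;

public class AckGuardExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setAutomaticRecoveryEnabled(true);
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Manual acks (autoAck = false) on an illustrative queue name.
        channel.basicConsume("my-queue", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body)
                    throws IOException {
                try {
                    getChannel().basicAck(envelope.getDeliveryTag(), false);
                } catch (AlreadyClosedException e) {
                    // The connection died between delivery and ack. Recovery
                    // will redeliver this message, so the ack can be skipped.
                }
            }
        });
    }
}
```

This does not fix the underlying network problem; it only keeps the consumer from failing on the ack of a message that will be redelivered after recovery.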


Rajesh D N

Oct 4, 2017, 7:48:49 AM10/4/17
to rabbitmq-users
Can you help me solve/troubleshoot the "Connection reset by peer: socket write error" when acking with recoverableChannel.basicAck(deliveryTag, false);?

If you are available, I can send a meeting invite to look into it.


Michael Klishin

Oct 4, 2017, 8:14:07 AM10/4/17
to rabbitm...@googlegroups.com
The ack operation here is not the root cause; it merely happens to be called often enough in your app to run into a socket write failure.

You have enough evidence on both ends that TCP connections
between RabbitMQ and your clients fail often and repeatedly.
Client connection recovery can help to some extent, but ultimately the question is why that happens and how it can be made less frequent.

Please take it from here.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ