Closing AMQP connection frequently with errors {writer,send_failed,{error,enotconn}} and timeout


Rajesh D N

Jul 18, 2017, 2:35:19 AM7/18/17
to rabbitmq-users
Hi All,

I am using amqp-client-4.1.0 (Java client) with RabbitMQ 3.6.9 and Erlang 19.2. The application and RabbitMQ run on the same server, with no clustering, and I access RabbitMQ directly via localhost.

I recently updated a few of the production systems from RabbitMQ 3.6.6 to 3.6.9.

Since then I am seeing many connection-related errors in the RabbitMQ logs (timeout, enotconn) that cause connections to close abruptly.

Is anyone else facing the same problem?

Can someone help me identify the issue here?

=ERROR REPORT==== 10-Jul-2017::03:52:18 ===
closing AMQP connection <0.26922.26> (127.0.0.1:63254 -> 127.0.0.1:5672):
{writer,send_failed,{error,timeout}}


=ERROR REPORT==== 17-Jul-2017::03:52:18 ===
closing AMQP connection <0.26922.26> (127.0.0.1:63254 -> 127.0.0.1:5672):
{writer,send_failed,{error,enotconn}}


=WARNING REPORT==== 16-Jul-2017::23:05:55 ===
Could not find handle.exe, please install from sysinternals

=ERROR REPORT==== 16-Jul-2017::23:55:17 ===
** Generic server rabbit_mgmt_external_stats terminating 
** Last message in was emit_update
** When Server state == {state,8192,
                               [{{io_file_handle_open_attempt,count},
                                 364099758},
                                {{io_file_handle_open_attempt,time},
                                 1010448639},
                                {{io_read,bytes},925822257},
                                {{io_read,count},112},
                                {{io_read,time},22406960},
                                {{io_reopen,count},0},
                                {{io_seek,count},21819},
                                {{io_seek,time},32251159},
                                {{io_sync,count},1468942},
                                {{io_sync,time},27026253881},
                                {{io_write,bytes},69869961042},
                                {{io_write,count},1468942},
                                {{io_write,time},25759640733},
                                {{mnesia_disk_tx,count},23},
                                {{mnesia_ram_tx,count},175048},
                                {{msg_store_read,count},0},
                                {{msg_store_write,count},0},
                                {{queue_index_journal_write,count},362608386},
                                {{queue_index_read,count},111},
                                {{queue_index_write,count},10975}],
                               {set,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                               undefined,5000}
** Reason for termination == 
** {badarg,
       [{erlang,port_command,
            [#Port<0.2000398>,[]],
            [{file,"erlang.erl"},{line,2990}]},
        {os,cmd,1,[{file,"os.erl"},{line,233}]},
        {rabbit_mgmt_external_stats,get_used_fd,1,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,132}]},
        {rabbit_mgmt_external_stats,get_used_fd,0,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,60}]},
        {rabbit_mgmt_external_stats,'-infos/2-lc$^0/1-0-',2,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,174}]},
        {rabbit_mgmt_external_stats,emit_update,1,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,368}]},
        {rabbit_mgmt_external_stats,handle_info,2,
            [{file,"src/rabbit_mgmt_external_stats.erl"},{line,355}]},
        {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,601}]}]}

Michael Klishin

Jul 18, 2017, 6:50:10 AM7/18/17
to rabbitm...@googlegroups.com
Those messages are not related.

The external stats server terminating means the node won't be able to retrieve the number of open file handles,
free disk space, and similar node-wide stats from the OS. In recent versions it's somewhat more resilient (it stops
after a few failed attempts). This does not take the node down and is not related to client connections.

Failed writes with `enotconn` mean that a client TCP socket wasn't in the connected state when RabbitMQ attempted
to perform a write on it. `timeout` is fairly self-explanatory and in this case also comes from a client socket write operation.

In environments where clients can disappear at any moment, this may be expected.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Rajesh D N

Jul 31, 2017, 5:50:08 AM7/31/17
to rabbitmq-users
Thanks for the reply; it was very useful. Since the default heartbeat value is 60 seconds, does explicitly setting it to 60 versus not setting it at all make any difference?

I have changed from QueueingConsumer to DefaultConsumer for connection recovery.

But now I am facing some problems:
I see too many {writer,send_failed,{error,timeout}} entries in the RabbitMQ logs.
When the connection is recovered, I see more than one consumer per channel, as opposed to the one consumer per channel that I have configured.

Any thoughts on avoiding the timeout error?


Michael Klishin

Jul 31, 2017, 8:10:34 AM7/31/17
to rabbitm...@googlegroups.com
I see too many {writer,send_failed,{error,timeout}} in rabbitmq logs. 

This means socket writes fail due to timeout (from the RabbitMQ end).

Automatic connection recovery generally recovers the topology it knows about.
There are known edge cases where a hostname resolves to more than one IP, which can sometimes lead to more connections than expected.

We are not aware of any other recovery issues in the most recent release (4.2.0).
You can use Wireshark to take a traffic capture and see what's going on on the wire.

Rajesh D N

Oct 2, 2017, 11:46:37 PM10/2/17
to rabbitmq-users
My earlier understanding that this might be due to Kaspersky was wrong.

I uninstalled Kaspersky and I still see this error in the logs. I have set the heartbeat to 10 seconds. What's causing this?

And even though I have connection auto-recovery set to true, this error causes the connection not to get recovered.

Michael Klishin

Oct 3, 2017, 9:56:28 AM10/3/17
to rabbitm...@googlegroups.com
There can be a lot of reasons that lead to TCP socket write timeouts.

This error does not "cause client connection to not get recovered". It merely indicates a client network connection
is not in a healthy state. Before the client detects it — which takes time, see http://rabbitmq.com/heartbeats.html — automatic
recovery will not begin.

Socket write timeouts and connection recovery are not directly related. It is fair to say that
they do have a common root cause, however.
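As a sketch, tuning heartbeats and recovery on the Java client is done on the ConnectionFactory. This is a configuration fragment with illustrative values, not recommendations, and it assumes amqp-client is on the classpath:

```java
import com.rabbitmq.client.ConnectionFactory;

public class HeartbeatConfig {
    public static void main(String[] args) {
        // Illustrative values; tune for your environment.
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setRequestedHeartbeat(30);         // seconds; lower detects dead peers sooner
        factory.setAutomaticRecoveryEnabled(true); // reconnect after a detected failure
        factory.setNetworkRecoveryInterval(5000);  // ms to wait between recovery attempts
    }
}
```

A lower heartbeat interval shortens the window before a dead connection is detected and recovery can begin, at the cost of more background traffic.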


Rajesh D N

Oct 4, 2017, 3:11:35 AM10/4/17
to rabbitmq-users
I tried setting the heartbeat to 10 seconds and to 30 seconds.

But now I see too many "{inet_error,enotconn}" entries in the RabbitMQ logs.

Now I am running into a weird issue after this error.

I am getting a "Connection reset by peer: socket write error" when I try to acknowledge a message.

This is the line that throws the exception: recoverableChannel.basicAck(deliveryTag, false);

I am using DefaultConsumer, RecoverableChannel, and AutorecoveringConnection on Java client 4.1.0.

Above this line I have tried printing the channel number and it prints fine, so I suspect the basicAck method is causing this issue.

Any thoughts on this?



Michael Klishin

Oct 4, 2017, 6:27:40 AM10/4/17
to rabbitm...@googlegroups.com
"enotconn" means a TCP socket is "no longer connected" (in an unusable state). It's very likely a yet
another variation of the same root cause: TCP connections from clients time out or error.

If you observe this over and over, something is up with the quality of the network connection
between your clients and RabbitMQ. Such errors cannot be avoided entirely but in many cases
are not the norm.
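Since a recovered connection redelivers unacknowledged messages anyway, one way to make the symptom less disruptive on the client side is to tolerate a dead socket at ack time. A hedged sketch (the queue name and class are illustrative; assumes amqp-client on the classpath and a reachable broker):

```java
import com.rabbitmq.client.*;
import java.io.IOException;

public class AckGuardExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setAutomaticRecoveryEnabled(true);
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Manual acks (autoAck = false) on an illustrative queue name.
        channel.basicConsume("my-queue", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body)
                    throws IOException {
                try {
                    getChannel().basicAck(envelope.getDeliveryTag(), false);
                } catch (AlreadyClosedException e) {
                    // The connection died between delivery and ack. Recovery
                    // will redeliver this message, so the ack can be skipped.
                }
            }
        });
    }
}
```

This does not fix the underlying network problem; it only keeps the consumer from failing on the ack of a message that will be redelivered after recovery.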


Rajesh D N

Oct 4, 2017, 7:48:49 AM10/4/17
to rabbitmq-users
Can you help me solve/troubleshoot the "Connection reset by peer: socket write error" when acking with recoverableChannel.basicAck(deliveryTag, false);?

If you are available, I can send a meeting invite to look into it.


Michael Klishin

Oct 4, 2017, 8:14:07 AM10/4/17
to rabbitm...@googlegroups.com
The ack operation here is not the root cause; it merely happens to be called often enough in your app to run into a socket write failure.

You have enough evidence on both ends that TCP connections
between RabbitMQ and your clients fail often and repeatedly.
Client connection recovery can help to some extent, but ultimately the question is why that happens and how it can be made less frequent.

Please take it from here.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ