"silent" disconnecting

959 views
Skip to first unread message

duus...@gmail.com

unread,
Feb 18, 2016, 8:45:21 AM2/18/16
to rabbitmq-users

In my application, I'm trying to connect to RabbitMQ broker as a consumer (via RabbitMQ Java Client). And in most cases connection succeed. But sometimes it fails, and I get these exceptions: java.io.IOException, java.io.EOFException. Why this error appears?

Also, when app connecting to broker as a consumer, this connection appears in RabbitMQ manager. After some time this connection dissapears. But client "thinks" that is still connected. Function AMQConnection.isOpen, returns true. I think this may be related to this exceptions, I mentioned earlier.

Acctually exceptions are not that big problem. Problem starts when client have not acctual information about connection.

Connecting to RabbitMQ:

mqConnFactory = new ConnectionFactory();
String uri = "<url to RabbitMQ>";
mqConnFactory
.setExceptionHandler(exceptionHandler);
mqConnFactory
.setRequestedHeartbeat(120);
mqConnFactory
.setConnectionTimeout(30000);
mqConnection
= (AMQConnection) mqConnFactory.newConnection();



Michael Klishin

unread,
Feb 18, 2016, 8:48:28 AM2/18/16
to rabbitm...@googlegroups.com, duus...@gmail.com
On 18 February 2016 at 16:45:23, duus...@gmail.com (duus...@gmail.com) wrote:
> In my application, I'm trying to connect to RabbitMQ broker as
> a consumer (via RabbitMQ Java Client). And in most cases connection
> succeed. But sometimes it fails, and I get these exceptions:
> java.io.IOException, java.io.EOFException. Why this error
> appears?

EOFException is very generic. It means peer closed TCP connection or it was lost.

> Also, when app connecting to broker as a consumer, this connection
> appears in RabbitMQ manager. After some time this connection
> dissapears. But client "thinks" that is still connected. Function
> AMQConnection.isOpen, returns true. I think this may be related
> to this exceptions, I mentioned earlier.

It takes time to detect TCP connection loss in many cases. See
http://rabbitmq.com/heartbeats.html.

Inspecting RabbitMQ log is a must when investigating connectivity issues. So is asking yourself
what kind of intermediaries (proxies, load balancers) can be involved. 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


duus...@gmail.com

unread,
Feb 18, 2016, 1:25:04 PM2/18/16
to rabbitmq-users
I had this issue today again. And in log I have only 2 entries:

=INFO REPORT==== 18-Feb-2016::18:02:50 ===
accepting AMQP connection <0.8913.0> (192.168.0.20:36208 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::18:08:50 ===
closing AMQP connection <0.8913.0> (192.168.0.20:36208 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 120s
When client calls isOpen, it returns true. It's strange. I had this error about missed heartbeats already. And to fix it, I was changing heartbeat value. I fially set heartbeat to 120s.

duus...@gmail.com

unread,
Feb 18, 2016, 5:07:23 PM2/18/16
to rabbitmq-users, duus...@gmail.com
Except that logs I already pasted, these logs also appeared:


=INFO REPORT==== 18-Feb-2016::21:56:42 ===
accepting AMQP connection <0.25224.3> (192.168.0.20:54013 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:02:42 ===
closing AMQP connection <0.25224.3> (192.168.0.20:54013 -> 192.168.0.13:5672):

missed heartbeats from client, timeout: 120s

=INFO REPORT==== 18-Feb-2016::22:03:46 ===
accepting AMQP connection <0.28703.3> (192.168.0.20:57415 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:09:46 ===
closing AMQP connection <0.28703.3> (192.168.0.20:57415 -> 192.168.0.13:5672):

missed heartbeats from client, timeout: 120s

=INFO REPORT==== 18-Feb-2016::22:13:57 ===
accepting AMQP connection <0.932.4> (192.168.0.20:58957 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:19:57 ===
closing AMQP connection <0.932.4> (192.168.0.20:58957 -> 192.168.0.13:5672):

missed heartbeats from client, timeout: 120s

=INFO REPORT==== 18-Feb-2016::22:27:09 ===
accepting AMQP connection <0.7423.4> (192.168.0.20:50481 -> 192.168.0.13:5672)

=WARNING REPORT==== 18-Feb-2016::22:29:29 ===
closing AMQP connection <0.7423.4> (192.168.0.20:50481 -> 192.168.0.13:5672):
client unexpectedly closed TCP connection

=INFO REPORT==== 18-Feb-2016::22:33:36 ===
accepting AMQP connection <0.10508.4> (192.168.0.20:35190 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:33:36 ===
closing AMQP connection <0.10508.4> (192.168.0.20:35190 -> 192.168.0.13:5672):
{handshake_timeout,handshake}

=INFO REPORT==== 18-Feb-2016::22:34:19 ===
accepting AMQP connection <0.10950.4> (192.168.0.20:41237 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:40:19 ===
closing AMQP connection <0.10950.4> (192.168.0.20:41237 -> 192.168.0.13:5672):

missed heartbeats from client, timeout: 120s

 
When client disconnects, I expect to handleShutdownSignal is called. And sometimes it is. But not always. And thats the problem.
 

Michael Klishin

unread,
Feb 19, 2016, 12:07:48 AM2/19/16
to rabbitmq-users, duus...@gmail.com


On Friday, 19 February 2016 01:07:23 UTC+3, duustin89 wrote:
Except that logs I already pasted, these logs also appeared:


=INFO REPORT==== 18-Feb-2016::21:56:42 ===
accepting AMQP connection <0.25224.3> (192.168.0.20:54013 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:02:42 ===
closing AMQP connection <0.25224.3> (192.168.0.20:54013 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 120s

This message is pretty self-explanatory.
 
=INFO REPORT==== 18-Feb-2016::22:03:46 ===
accepting AMQP connection <0.28703.3> (192.168.0.20:57415 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:09:46 ===
closing AMQP connection <0.28703.3> (192.168.0.20:57415 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 120s

=INFO REPORT==== 18-Feb-2016::22:13:57 ===
accepting AMQP connection <0.932.4> (192.168.0.20:58957 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:19:57 ===
closing AMQP connection <0.932.4> (192.168.0.20:58957 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 120s

=INFO REPORT==== 18-Feb-2016::22:27:09 ===
accepting AMQP connection <0.7423.4> (192.168.0.20:50481 -> 192.168.0.13:5672)

=WARNING REPORT==== 18-Feb-2016::22:29:29 ===
closing AMQP connection <0.7423.4> (192.168.0.20:50481 -> 192.168.0.13:5672):
client unexpectedly closed TCP connection


This means client closed TCP connection without closing AMQP 0-9-1 connection.
 
=INFO REPORT==== 18-Feb-2016::22:33:36 ===
accepting AMQP connection <0.10508.4> (192.168.0.20:35190 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:33:36 ===
closing AMQP connection <0.10508.4> (192.168.0.20:35190 -> 192.168.0.13:5672):
{handshake_timeout,handshake}


This means a client connected and then never performed AMQP 0-9-1 handshake.
 
=INFO REPORT==== 18-Feb-2016::22:34:19 ===
accepting AMQP connection <0.10950.4> (192.168.0.20:41237 -> 192.168.0.13:5672)

=ERROR REPORT==== 18-Feb-2016::22:40:19 ===
closing AMQP connection <0.10950.4> (192.168.0.20:41237 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 120s

 
When client disconnects, I expect to handleShutdownSignal is called. And sometimes it is. But not always. And thats the problem.
 

I believe http://rabbitmq.com/heartbeats.html explains why. 120 seconds is a pretty high heartbeat interval,
perhaps you want something more like 10 seconds.

duus...@gmail.com

unread,
Feb 20, 2016, 8:18:01 AM2/20/16
to rabbitmq-users, duus...@gmail.com
Ok. I changed heartbeat to 10 sec. And now situation is much better - when connection is closed, handler is called, and I can handle that case.
But now, app loosing connection more ofthen than before. Sometimes it's 10 minutes, sometimes is less than 30 sec:


=INFO REPORT==== 20-Feb-2016::13:45:13 ===
accepting AMQP connection <0.13061.19> (192.168.0.20:45018 -> 192.168.0.13:5672)

=ERROR REPORT==== 20-Feb-2016::13:48:14 ===
closing AMQP connection <0.13061.19> (192.168.0.20:45018 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 10s

=INFO REPORT==== 20-Feb-2016::13:48:16 ===
accepting AMQP connection <0.14574.19> (192.168.0.20:59819 -> 192.168.0.13:5672)

=ERROR REPORT==== 20-Feb-2016::13:49:06 ===
closing AMQP connection <0.14574.19> (192.168.0.20:59819 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 10s

=INFO REPORT==== 20-Feb-2016::13:49:10 ===
accepting AMQP connection <0.15021.19> (192.168.0.20:41576 -> 192.168.0.13:5672)

=ERROR REPORT==== 20-Feb-2016::13:49:40 ===
closing AMQP connection <0.15021.19> (192.168.0.20:41576 -> 192.168.0.13:5672):
missed heartbeats from client, timeout: 10s

Can I improve this situation, to lost connection less frequent? I read that article from link you sent. And I know, that connection lost may happen. But now it's happens very frequent, I think.
I didn't mentioned yet, but this application runs on Android. This information may be important.


I believe http://rabbitmq.com/heartbeats.html explains why. 120 seconds is a pretty high heartbeat interval,
perhaps you want something more like 10 seconds.
Should I also change connectionTimeout in connection factory? Now, it's set to 60000 (propably calculated from heartbeat).

Michael Klishin

unread,
Feb 20, 2016, 8:51:16 AM2/20/16
to rabbitm...@googlegroups.com, duus...@gmail.com
In Android and iOS apps are completely hibernated when they go into background, and they can only transfer data at particular intervals, or never. So this is expected.

There can be a value that makes it less frequent but still
makes connection loss to be detected reasonably quickly. It's a matter of tweaking the heartbeat interval until you find the value.

Or simply re-connect when your app goes into foreground, and disconnect when it is hibernated.
This is how mobile apps are supposed to be designed,
as far as I understand.

duus...@gmail.com

unread,
Feb 22, 2016, 5:29:10 AM2/22/16
to rabbitmq-users, duus...@gmail.com
In my app, message delivery time is important. Message make sense only in short time - less than minute or so. Of course if app has internet access. So I need way, to consume messages constantly. And now, when I decrease heartbeat time it looks better. I do reconnect right after app lost connection. But I think that reconnecting over and over again is not eficient. I'm just wondering how apps like whatapp or fb messanger work. Both of them deliver message even if phone is hibernated.
Maybe someone had similar problem already?

Michael Klishin

unread,
Feb 22, 2016, 5:41:42 AM2/22/16
to rabbitm...@googlegroups.com, duus...@gmail.com
On 22 February 2016 at 13:29:13, duus...@gmail.com (duus...@gmail.com) wrote:
> I think that reconnecting over and over again is not eficient.

It’s not but if the OS hibernates our process, what can a client library do about it?
 
> I'm just wondering how apps like whatapp or fb messanger work.
> Both of them deliver message even if phone is hibernated.

Before you consider my suggestions below, note: I have done virtually no mobile development
in my life, so I can be entirely wrong. 

First of all, there are push notifications which in both cases probably have special privileges
compared to arbitrary TCP connections that apps open.

At least in the past, mobile Facebook are known to do pretty hacky things to make the OS believe it needs
time to run in the background before hibernating [1].

Like a lot of things on mobile devices, this is a matter of finding the best balance between
battery life and latency.

1. https://blog.hagga.net/archives/iphone-ipod-touch/3805-facebook-app-is-draining-your-iphones-battery
Reply all
Reply to author
Forward
0 new messages