I'd like to understand a heartbeat timeout error


dan boston

Oct 6, 2015, 12:28:39 PM
to rabbitmq-users
Hello,

I am experiencing a heartbeat timeout that I don't understand, and I am wondering if someone could give me some hints.

Below are my settings:

- I am running RabbitMQ 3.1.0 on my Mac laptop.
- My application (a message consumer that uses the Java client) takes a long time (multiple hours) to process a message.
- I did not set a heartbeat.  (I verified this by printing ConnectionFactory.getRequestedHeartbeat(), which returns 0.)

The problem is that I got a heartbeat timeout error (see below) on the RabbitMQ server 2 hours 45 minutes after the message was sent to the consumer, while the consumer was still processing it.

=ERROR REPORT==== 6-Oct-2015::01:15:26 ===
closing AMQP connection <0.4303.0> (127.0.0.1:56818 -> 127.0.0.1:5672):
{heartbeat_timeout,running}

My question is: what caused the RabbitMQ server to throw the heartbeat timeout error?  I don't understand why I still got this error even though a heartbeat was not set.

Thanks,
Dan

Michael Klishin

Oct 6, 2015, 12:37:36 PM
to rabbitm...@googlegroups.com
A heartbeat is set on the server by default, and the two peers negotiate the value. What client version do you use?

Michael Klishin

Oct 6, 2015, 12:39:44 PM
to rabbitm...@googlegroups.com
Also: there is no reason to run 3.1.0. It is not supported in any way; please upgrade to 3.5.5 (via Homebrew or the standalone Mac build).

FTR, the 3.5.5 Java client can work with a 3.1.0 server.

dan boston

Oct 6, 2015, 1:17:12 PM
to rabbitmq-users
Hi Michael,

Thanks for your reply.

If it is set on the server by default, what is the default time after which the server times out if the message sent to the consumer has not been acknowledged?  I just want to understand what caused the heartbeat timeout, since I could not find any documents that describe how this works.

Also, do I need to enable heartbeats?  And why?

I am using Java client 3.1.2.

Thanks,

Dan

Michael Klishin

Oct 6, 2015, 1:59:53 PM
to rabbitm...@googlegroups.com, dan boston
 On 6 Oct 2015 at 20:17:16, dan boston (danbos...@gmail.com) wrote:
> Thanks for your reply.
>
> If it is set for the server by default, what is the default time
> for the server to timeout if the message sent out to consumer has
> not been acknowledged? I just want understand what caused heartbeat
> timeout since I could not find any documents that describe how
> this works.

Default heartbeat is 580 seconds.

> Also, do I need to set enable heartbeat? And why?

> I am using Java client 3.1.2.

That client should already use a thread pool (an ExecutorService, to be specific),
so I/O loop blocking by slow consumers, which has been an issue e.g.
in the .NET client up to 3.5.0, shouldn’t be applicable.

It could be heartbeat sender thread starvation.

In any case, there is no reason to use Java client 3.1.x, and we *highly* recommend
moving on from 3.1.x server, which is dozens of releases behind:
http://www.rabbitmq.com/changelog.html
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


dan boston

Oct 6, 2015, 2:35:37 PM
to rabbitmq-users, danbos...@gmail.com
Hi Michael,

Two questions:

1 - I don't see the default heartbeat timeout you mentioned in your previous post taking effect.  In my case the RabbitMQ server threw the "heartbeat timeout" error 2 hours and 45 minutes after the message was sent to the consumer.  If the default heartbeat is 580 seconds, as you said, shouldn't the server throw this error after about 10 minutes?

2 - I am confused by this "default heartbeat". This document (https://www.rabbitmq.com/heartbeats.html) says "Heartbeats can be disabled by setting the timeout interval to 0", which is exactly what I did, so how come there is still a "default heartbeat"?  This does not make sense to me.

Yes, I will move to a later version as you suggested, but I am still not clear about what caused the "heartbeat timeout" in my tests.

Thanks,
Dan

Michael Klishin

Oct 6, 2015, 2:40:46 PM
to rabbitm...@googlegroups.com, dan boston
On 6 Oct 2015 at 21:35:39, dan boston (danbos...@gmail.com) wrote:
> 1 - I don't see this default heartbeat time as you said in previous
> post. Because in my case rabbitmq server thrown out the "heartbeat
> timeout error" 2 hours and 45 minutes after the message was sent
> to consumer. If default heartbeat is 580 seconds like you said,
> shouldn't the server throw this error after about 10 minutes?

That’s 580 seconds since the most recent connection activity, not since the moment
the connection was set up.
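
As a rough illustration of "time since the most recent activity", here is a minimal sketch in which every frame seen on the connection (heartbeat frames included) resets the clock. It is not RabbitMQ's implementation; the class `IdleMonitor` and its methods are hypothetical names, and the exact rule the server applies for missed heartbeats may differ.

```java
// Hypothetical illustration only; this is not RabbitMQ's code.
public class IdleMonitor {
    private final long timeoutMillis;
    private long lastActivityMillis;

    public IdleMonitor(long timeoutSeconds, long nowMillis) {
        this.timeoutMillis = timeoutSeconds * 1000L;
        this.lastActivityMillis = nowMillis;
    }

    // Call whenever any frame (data or heartbeat) is seen on the connection.
    public void recordActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    // True once more than the timeout has elapsed since the last recorded activity.
    public boolean isTimedOut(long nowMillis) {
        return nowMillis - lastActivityMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        IdleMonitor monitor = new IdleMonitor(580, 0L);
        monitor.recordActivity(9_000_000L);                   // last frame seen 2.5 hours in
        System.out.println(monitor.isTimedOut(9_500_000L));   // checked after 500 s of silence
        System.out.println(monitor.isTimedOut(9_600_000L));   // checked after 600 s of silence
    }
}
```

Under this model a connection can stay open for hours as long as frames keep arriving, and it is closed only once the peer has been silent for longer than the timeout.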

> 2 - I am confused with this "default heartbeat", as this document
> (https://www.rabbitmq.com/heartbeats.html) describes,
> "Heartbeats can be disabled by setting the timeout interval
> to 0", which is what I did exactly, then how come there is still
> "default heartbeat"? This does not make sense to me.

I thought you were using all defaults in the client. 

It could be a bug. This is one of the reasons why I recommend not using 3.1.x:
we don’t support it, and we don’t remember all that well what known issues
those releases had.

dan boston

Oct 6, 2015, 3:14:11 PM
to rabbitmq-users, danbos...@gmail.com
Sorry, I still have one more question (I hope this will be the last).

> That’s 580 seconds since most recent connection activity, not the moment 
> when connection was set up. 

I think there are only two connection activities in my test: first, the message was posted to the queue, and second, the message was picked up from the queue by the consumer.  After the consumer picks up the message it starts processing, and it does NOT access the queue at all while processing, so there should be no more connection activity after the consumer picks up the message.  The "two hours and 45 minutes" I have been mentioning is the time between the message being posted to the queue and the RabbitMQ server throwing the "heartbeat timeout" error. I have checked that the consumer picks up the message quickly after it is posted to the queue. Therefore "two hours and 45 minutes" is really the time from when the consumer picked up the message until the RabbitMQ server threw the "heartbeat timeout" error.  This is why I really don't see the "default heartbeat time of 580 seconds" taking any effect here.

Thanks,
Dan

Michael Klishin

Oct 6, 2015, 3:16:19 PM
to rabbitm...@googlegroups.com, dan boston
On 6 Oct 2015 at 22:14:15, dan boston (danbos...@gmail.com) wrote:
> After the consumer picks up the message it starts its process,
> and it does NOT access to the queue at all during its process, and
> therefore there should be no more connection activity after
> the consumer picks up the message. The "two hours and 45 minutes"
> I have been mentioning is between the time the message was post
> to the queue, and time the rabbitmq thrown out "heartbeat timeout"
> error. I have checked that the consumer picks up the message quickly
> after the message was post to the queue. Therefore "two hours
> and 45 minutes" is really the time from the consumer picked up
> the message until the time the rabbitmq server thrown out "heartbeat
> timeout error". This is why I really don't see "default heartbeat
> time 580 seconds" takes any effects here.

If heartbeats are enabled at all (you can see this in the management UI),
clients and server periodically exchange heartbeat frames: those are connection
activity. 

Luis Santos

Oct 6, 2015, 5:11:16 PM
to rabbitmq-users, danbos...@gmail.com
Hi guys,

Since I'm having some trouble with heartbeats too, I will ask another question that may help other people.

Is the heartbeat exchanged at the connection level or at the channel level? Does the
heartbeat thread use an existing channel or create a new one? Or is the heartbeat handled in a totally different way?


Thanks.

Luis Santos

Michael Klishin

Oct 6, 2015, 5:25:43 PM
to rabbitm...@googlegroups.com, Luis Santos, danbos...@gmail.com
 On 7 Oct 2015 at 00:11:19, Luis Santos (lu...@luissantos.pt) wrote:
> Is the heartbeat exchanged at connection or at channel level?
> Does the
> heartbeat thread uses an existing channel or creates a new one?
> Or Is the heartbeat handled in totally different way?

Connection.

Heartbeat communication happens on channel 0 — a special channel that
must not be used by applications. I think it’s fair to say that
heartbeats are handled differently from most protocol methods used
by apps.
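
To make the "channel 0" point concrete, here is a small sketch that builds the bytes of an AMQP 0-9-1 heartbeat frame: frame type 8, channel number 0, an empty payload, and the frame-end octet 0xCE. The class name `HeartbeatFrame` is hypothetical, and this only illustrates the wire layout; it is not how the Java client is structured internally.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the AMQP 0-9-1 heartbeat frame layout.
public class HeartbeatFrame {
    public static final byte FRAME_HEARTBEAT = 8;         // frame type for heartbeats
    public static final byte FRAME_END = (byte) 0xCE;     // frame-end octet

    // Layout: type (1 byte), channel (2 bytes), payload size (4 bytes),
    // payload (empty for heartbeats), frame-end (1 byte).
    public static byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(8);   // big-endian by default
        buf.put(FRAME_HEARTBEAT);
        buf.putShort((short) 0);                   // channel 0, reserved for connection-level traffic
        buf.putInt(0);                             // payload size is zero
        buf.put(FRAME_END);
        return buf.array();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode()) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Because the frame carries no channel of its own, applications never open or close anything to take part in heartbeating.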

dan boston

Oct 6, 2015, 5:58:58 PM
to rabbitmq-users, danbos...@gmail.com
Ok thanks Michael, now I see the heartbeat is set to 600 seconds (from connection.getHeartbeat()).

I have two more questions:

1 Could you explain the difference between the two heartbeat values returned by the following two methods?  In my tests, the first one returns 0 and the second returns 600.  If I understood this document (https://www.rabbitmq.com/heartbeats.html) correctly, heartbeats are DISABLED if the "requested heartbeat" is set to 0, which is why I always thought that heartbeats were disabled in my tests.
   - connectionFactory.getRequestedHeartbeat()
   - connection.getHeartbeat()

2 Now it looks like heartbeats are enabled.  Then my next question is: what is the cause of the "heartbeat timeout" error (see below) that I got on the RabbitMQ server? And how do I solve this problem?
=ERROR REPORT==== 6-Oct-2015::01:15:26 ===
closing AMQP connection <0.4362.0> (127.0.0.1:56823 -> 127.0.0.1:5672):
{heartbeat_timeout,running}


Thanks,
Dan

Michael Klishin

Oct 6, 2015, 6:14:37 PM
to rabbitm...@googlegroups.com, dan boston
On 7 Oct 2015 at 00:59:00, dan boston (danbos...@gmail.com) wrote:
> 1 Could you explain the difference between the two heartbeats
> that are returned from the following two methods? From my tests,
> the first one returns 0 and second returns 600. If I understood
> this document (https://www.rabbitmq.com/heartbeats.html)
> correctly, heartbeat is DISABLED if the "requested heartbeat"
> is set it to 0, what is why I always thought that heartbeats are
> disabled in my tests.
> - connectionFactory.getRequestedHeartbeat()
>
> - connection.getHeartbeat()

The former returns the heartbeat you requested (configured). The latter
is the *effective* heartbeat for a connection, that is, the value after it’s
been negotiated with the server.
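
A minimal sketch of how a requested value of 0 can still end up as an effective value of 600. It assumes the negotiation rule as I understand it from the heartbeats guide: if either side's value is 0, the larger of the two is used, otherwise the smaller. Whether the 3.1.x client and server apply exactly this rule is an assumption, and `HeartbeatNegotiation.negotiate` is a hypothetical helper, not a client API.

```java
// Hypothetical illustration of heartbeat negotiation; not the client's actual code.
public class HeartbeatNegotiation {
    // If either side asks for 0, the larger value is used; otherwise the smaller one.
    public static int negotiate(int clientRequested, int serverProposed) {
        if (clientRequested == 0 || serverProposed == 0) {
            return Math.max(clientRequested, serverProposed);
        }
        return Math.min(clientRequested, serverProposed);
    }

    public static void main(String[] args) {
        // Client left at its default of 0, server proposes 600 seconds.
        System.out.println(negotiate(0, 600));
        // Client explicitly requests 30 seconds, server proposes 600 seconds.
        System.out.println(negotiate(30, 600));
    }
}
```

Under this rule a client that requests 0 only ends up with heartbeats disabled when the server's value is 0 as well.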

> 2 Now looks like that heartbeats are enabled. Then my next question
> is, what is cause of "heartbeat timeout error" (see below) that
> I got on the rabbitmq server? And how do I solve this problem?
> =ERROR REPORT==== 6-Oct-2015::01:15:26 ===
> closing AMQP connection <0.4362.0> (127.0.0.1:56823 -> 127.0.0.1:5672):
> {heartbeat_timeout,running}

I’ve mentioned some possible issues earlier in this thread. Either you have

 * A genuine TCP connection failure
 * A case where a resource-constrained client couldn’t send a heartbeat frame in a while

The former is more likely.

Heartbeats are fairly well explained in the docs:
http://www.rabbitmq.com/heartbeats.html

dan boston

Oct 6, 2015, 6:36:04 PM
to rabbitmq-users, danbos...@gmail.com
Thanks Michael for your explanation of my first question; it makes sense.

However, regarding your answer below to my second question: sorry, I don't see that you have mentioned a solution in this thread for the "heartbeat timeout" problem.  You did recommend upgrading to a later version, which I will do; is that the solution?

> I’ve mentioned some possible issues earlier in this thread . Either you have 

> * A genuine TCP connection failure 
> * A case where a resource-constrained client couldn’t send a heartbeat frame in a while

> The former is more likely. 

Thanks,
Dan

Michael Klishin

Oct 6, 2015, 6:48:55 PM
to rabbitm...@googlegroups.com, dan boston
On 7 Oct 2015 at 01:36:07, dan boston (danbos...@gmail.com) wrote:
> However for your answer below for my second question, sorry
> I don't see you have mentioned a solution in this thread for solving
> this "heartbeat timeout" problem. You did recommend to upgrade
> to later version which I will do, is that the solution?

First thing would be: please make sure you understand what heartbeats *are for*.

IIRC in 3.5.6 the Java client will start automatic connection recovery [1] upon
missed heartbeats. With earlier versions, recovering connections and so on is up to you.

Upgrading the client *will not* magically keep TCP connections from going down; it does,
however, offer automatic connection recovery which works very well in practice.
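
For client versions where recovery is up to the application, one common shape is a retry loop around whatever code opens the connection. The sketch below is a generic, hypothetical helper (`ReconnectLoop.connectWithRetry`); it does not use the RabbitMQ client API, and the `connect` callback stands in for the application's own code that opens a connection and re-registers consumers.

```java
import java.util.concurrent.Callable;

// Hypothetical manual-recovery helper; not part of the RabbitMQ Java client.
public class ReconnectLoop {
    // Calls `connect` until it succeeds or `maxAttempts` is exhausted,
    // sleeping `delayMillis` between failed attempts.
    public static <T> T connectWithRetry(Callable<T> connect, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;                          // remember the failure and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;                                // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for real connection code: fails twice, then succeeds.
        String result = connectWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("connection refused");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real consumer would also need to decide what to do with the message it was processing when the connection dropped, since an unacknowledged delivery is normally requeued by the broker.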

Upgrading the server has nothing to do with heartbeats or automatic recovery.
3.1.x is dozens of releases behind and is not supported in any way.

I don’t have much to add beyond this. Please put some effort into understanding
why heartbeats exist. 

1. http://www.rabbitmq.com/api-guide.html

dan boston

Oct 6, 2015, 8:29:44 PM
to rabbitmq-users, danbos...@gmail.com
> IIRC in 3.5.6 the Java client will start automatic connection recovery [1] upon 
> missed heartbeats. With earlier versions, recovering connections and so on is up to you. 

Sorry for not being familiar with your terms; what is IIRC?  I ask because I checked rabbitmq.com and noticed that the latest version of both the server and the Java client is 3.5.5.  I also checked the JavaDoc for Java client 3.5.5 and see the method setAutomaticRecoveryEnabled() in class ConnectionFactory.  Are you saying that calling connectionFactory.setAutomaticRecoveryEnabled(true) in 3.5.5 will still not recover connections lost due to missed heartbeats?

Thanks,
Dan

Michael Klishin

Oct 7, 2015, 1:21:46 AM
to rabbitm...@googlegroups.com, danbos...@gmail.com
If I remember correctly.

3.5.6 ships any day now.

dan boston

Oct 7, 2015, 10:10:35 AM
to rabbitmq-users, danbos...@gmail.com
Ok thanks Michael.  But how about 3.5.5?  Could you please confirm whether connectionFactory.setAutomaticRecoveryEnabled() recovers connections that are lost due to missed heartbeats?

Thanks,
Dan

Michael Klishin

Oct 7, 2015, 10:13:03 AM
to rabbitm...@googlegroups.com, dan boston
On 7 October 2015 at 17:10:39, dan boston (danbos...@gmail.com) wrote:
> But how about 3.5.5? could you please confirm if connectFactory.setAutomaticRecoveryEnabled()
> recover the connections that are lost due to missing heartbeats?

No:
https://github.com/rabbitmq/rabbitmq-java-client/issues/57

dan boston

Oct 7, 2015, 10:24:46 AM
to rabbitmq-users, danbos...@gmail.com
Michael, thanks for the confirmation.