Connection gets closed due to missed heartbeats

285 views
Skip to first unread message

Nanda Kumar

unread,
Jul 24, 2018, 5:51:01 PM7/24/18
to Pika
Hi,

I just discovered this group so posting my questions here as well. I already created an issue at https://github.com/pika/pika/issues/1104.

We are currently seeing an issue with our application which uses pika to send/consume messages to/from Rabbit MQ. I don't think it is a bug but we are certainly missing something here so wanted to get some help or feedback from this forum.

We are using pika 0.11.2 and Rabbit 3.7.4, Python 2.7.6.
We recently upgraded both pika & Rabbit and since then we started seeing an issue with our application so not sure what else need to be changed.
Also, we are using SelectConnection for sending or consuming messages in and out of Rabbit MQ.
When sending/publishing messages after opening a connection, at times, our application can do some processing in between sending messages so it doesn't queue in a message until after few minutes from the last message was sent.
Issue is when that happens we are seeing the Rabbit is closing the connection with the following error in its log:
"missed heartbeats from client, timeout: 60s"

From tcpdump, it appears, Rabbit server is sending a heartbeat(not sure) frame every 60 secs and I see a reply back immediately. But after 2 attempts, I do not see a reply and that's exactly when I see the error message in Rabbit logs.

We have 'heartbeat' set to None when initiating SelectConnection object. And Rabbit is using all default configurations, nothing has been changed by us.

Here is tcpdump output:

sudo tcpdump -i lo port 54436
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
18:44:28.797612 IP localhost.amqp > localhost.54436: Flags [P.], seq 1356704746:1356704754, ack 2440025265, win 32779, options [nop,nop,TS val 668711718 ecr 668696729], length 8
18:44:28.797639 IP localhost.54436 > localhost.amqp: Flags [.], ack 8, win 350, options [nop,nop,TS val 668711718 ecr 668711718], length 0
18:45:28.799615 IP localhost.amqp > localhost.54436: Flags [P.], seq 8:16, ack 1, win 32779, options [nop,nop,TS val 668726718 ecr 668711718], length 8
18:45:28.799645 IP localhost.54436 > localhost.amqp: Flags [.], ack 16, win 350, options [nop,nop,TS val 668726718 ecr 668726718], length 0
18:46:28.798834 IP localhost.amqp > localhost.54436: Flags [R.], seq 16, ack 1, win 32779, options [nop,nop,TS val 668741718 ecr 668726718], length 0

In above instance, connection was opened at 18:43:28, last successful message was sent at 18:43:48. It took me few seconds to start the tcpdump after opening the connection so it could capture the packets only from 18:44:28.
At 18:46:28, there was no reply back to the server, I am not sure what that means but that caused the connection to be closed by the server.


In the Rabbit logs:
2018-07-23 18:46:28.798 [warning] <0.11822.4> closing AMQP connection <0.11822.4> (127.0.0.1:54436 -> 127.0.0.1:5672):
missed heartbeats from client, timeout: 60s


Questions are -

  • From https://www.rabbitmq.com/heartbeats.html#heartbeats-timeout, "Heartbeat frames are sent about every timeout / 2 seconds. ". But in our case, I do not see any traffic at 60/2=30sec interval but the traffic is seen only at 60 sec interval. Is there something that need to be configured to make it work according to the Rabbit documentation ?
  • Also from docs, "After two missed heartbeats, the peer is considered to be unreachable". But it looks in this case, it is closing the connection after one missed heartbeat. Am I correct ?
  • Finally, how can we solve this issue ? Is there anything I am missing ?
    I'd appreciate the help. Thanks.

lba...@pivotal.io

unread,
Jul 29, 2018, 10:16:55 AM7/29/18
to Pika
Hi Nanda,

I have followed up via the GitHub issue and will continue discussion there.

Thanks,
Luke
Reply all
Reply to author
Forward
0 new messages