RabbitMQ consumers getting disconnected automatically


Milind Utsav

Jun 2, 2016, 5:11:57 AM
to rabbitmq-users
We have a bunch of consumers running behind a queue and they are getting disconnected after some time randomly. Interestingly, they don't get disconnected when they are idle, but when they are processing messages.
The error thrown is "Socket closed when the connection was open". Please find attached a screenshot of the full trace. This error started happening only recently. We are on Google Cloud, using RabbitMQ version 3.3.5 and pika 0.9.14, with three RAM nodes and one disk/stats node. Each message takes about 30-40 seconds to process. The heartbeat value is the default, 580 seconds. We are using a topic exchange to route our messages to various queues, and the prefetch count for every consumer is 20. Acknowledgements are enabled, and there is a dedicated channel for every consumer.

Any thoughts on why the consumers are getting disconnected randomly?

Thanks,
Milind.
gcloudrabbitmq.png

Karl Nilsson

Jun 2, 2016, 5:19:29 AM
to rabbitm...@googlegroups.com
Is there anything in the rabbitmq server logs at the corresponding time?

Cheers
Karl

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Karl Nilsson

Staff Software Engineer, Pivotal/RabbitMQ

Milind Utsav

Jun 2, 2016, 5:52:04 AM
to rabbitmq-users
There are heartbeat_timeout messages in the server log, but we cannot pinpoint the time when the consumer got disconnected to match it with the corresponding message.

Why would heartbeats be a problem if a message takes at most 40 seconds to process and the interval is set to 580 seconds by default?

Karl Nilsson

Jun 2, 2016, 6:24:56 AM
to rabbitm...@googlegroups.com
Heartbeats are negotiated during the connection handshake. Can you check the actual heartbeat value of the connection in the management interface?
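
As a rough illustration of what "negotiated" means here, the sketch below follows the rule the RabbitMQ documentation describes for its officially supported clients: if either side proposes 0, the larger value is used, otherwise the smaller one is. The function name is hypothetical, and pika 0.9.14 may not follow exactly this rule, so the value shown in the management interface remains the thing to check.

```python
def negotiate_heartbeat(client_timeout_s: int, server_timeout_s: int) -> int:
    """Return the heartbeat timeout both peers would end up using.

    Assumption: the rule documented for RabbitMQ's officially supported
    clients. A value of 0 means that side did not ask for heartbeats.
    """
    if client_timeout_s == 0 or server_timeout_s == 0:
        # One side has no preference, so the other side's value wins.
        return max(client_timeout_s, server_timeout_s)
    # Both sides proposed a value: the lower (stricter) one is used.
    return min(client_timeout_s, server_timeout_s)


if __name__ == "__main__":
    # The server proposes 580 s (the 3.3.x default mentioned in this thread).
    for client in (0, 60, 900):
        print(client, "->", negotiate_heartbeat(client, 580))
```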


Milind Utsav

Jun 2, 2016, 9:03:11 AM
to rabbitmq-users
The management interface shows 580 seconds as the default timeout. However, on digging deeper, we found some log statements that might be of interest. The errors are {inet_error, etimedout} and connection_closed_abruptly. I forgot to mention that our cluster is load balanced, and a quick Google search suggests that the cause might be a network firewall or the load balancer. Can that be the cause of the inet_error? What could cause the second error?

Karl Nilsson

Jun 2, 2016, 9:16:52 AM
to rabbitm...@googlegroups.com
Yes, this sounds like it could be caused by the load balancer. I'd investigate its logs and settings next.

Cheers
Karl

Michael Klishin

Jun 2, 2016, 9:43:19 AM
to rabbitm...@googlegroups.com
"connection closed abruptly" means a client connection was closed. In case of a load balancer, that means the TCP connection
between the load balancer and a RabbitMQ node. Either reduce the heartbeat timeout to 60 seconds (the default in 3.6.x) or make
sure that your load balancer's connection inactivity timeout is at least 300 seconds (heartbeats are sent at half the timeout value, so roughly every 290 seconds with a 580-second timeout).
Otherwise the proxy will close connections it considers to be inactive.
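
The arithmetic behind that advice can be sketched in a few lines. This is only an illustration that assumes heartbeat frames go out at about half the negotiated timeout, as stated above. The function names and the 60-second idle timeout in the example are hypothetical, not values taken from this cluster.

```python
def heartbeat_send_interval(heartbeat_timeout_s: float) -> float:
    """Approximate gap between heartbeat frames: half the negotiated timeout."""
    return heartbeat_timeout_s / 2.0


def survives_idle_timeout(heartbeat_timeout_s: float, lb_idle_timeout_s: float) -> bool:
    """True if heartbeats alone arrive often enough to keep the proxy from
    treating an otherwise quiet connection as inactive."""
    if heartbeat_timeout_s <= 0:
        # Heartbeats disabled: nothing keeps a quiet connection alive.
        return False
    return heartbeat_send_interval(heartbeat_timeout_s) < lb_idle_timeout_s


if __name__ == "__main__":
    # (heartbeat timeout, load balancer idle timeout), both in seconds.
    for hb, idle in [(580, 60), (580, 300), (60, 60)]:
        verdict = "ok" if survives_idle_timeout(hb, idle) else "proxy may drop it"
        print(f"heartbeat={hb}s idle_timeout={idle}s: {verdict}")
```

With a 580-second heartbeat timeout, a frame is sent about every 290 seconds, which is why an idle timeout of at least 300 seconds is suggested. With a 60-second heartbeat timeout, a frame is sent about every 30 seconds.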

Inspecting proxy metrics and logs is a very good idea, of course.
MK

Staff Software Engineer, Pivotal/RabbitMQ