rabbitmq heartbeat_timeout,running


Qingchuan Hao

Sep 19, 2014, 10:23:29 AM
to rabbitm...@googlegroups.com
First of all, I really appreciate the quick replies from MK and others here.
I am still using RabbitMQ 3.1.5 with two nodes in a cluster for messaging.
After a shutdown error, a heartbeat-related error, and a restart, the RabbitMQ clients connected to the server but did not seem to exchange heartbeat messages successfully. All clients then closed their connections with heartbeat_timeout, which the server logged. The server also showed high memory and CPU usage. Is this a heartbeat listener error, or did the high CPU and memory load (a leak?) keep heartbeats from working properly?

Qingchuan Hao

Sep 19, 2014, 10:35:33 AM
to rabbitm...@googlegroups.com
The rabbitmqctl command hung and never printed a result.

Michael Klishin

Sep 19, 2014, 3:23:32 PM
to Qingchuan Hao, rabbitm...@googlegroups.com
On 19 September 2014 at 18:23:36, Qingchuan Hao (haoqin...@gmail.com) wrote:
> It seems to be a heartbeat listener error, or high cpu and memory(leak?)
> load resulted in the heartbeat not functioning well?

It should not be, it is very likely to be something else. Try upgrading to 3.3.5 and Erlang 17 first. Monitoring the number of channels/queues and message rates
prior to when the problem happens may give you some clues.
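
One way to collect those numbers over time is sketched below in Python. It rests on assumptions not stated in this thread: that the management plugin is enabled and serves its HTTP API on port 15672, and that the URL and credentials passed in are valid for your broker. The helper names `fetch_overview` and `summarize_overview` are made up for this illustration. The fields read come from the `object_totals` and `message_stats` sections of `GET /api/overview`; they may be absent on an idle or older broker, so missing values default to zero.

```python
import base64
import json
from urllib.request import Request, urlopen


def summarize_overview(overview):
    """Pick out object counts and message rates from an /api/overview document."""
    totals = overview.get("object_totals", {})
    stats = overview.get("message_stats", {})

    def rate(key):
        # Rates live under "<key>_details": {"rate": ...}; absent when there is no traffic.
        return stats.get(key + "_details", {}).get("rate", 0.0)

    return {
        "connections": totals.get("connections", 0),
        "channels": totals.get("channels", 0),
        "queues": totals.get("queues", 0),
        "publish_rate": rate("publish"),
        "deliver_get_rate": rate("deliver_get"),
    }


def fetch_overview(base_url, user, password, timeout=5.0):
    """Fetch /api/overview with HTTP basic auth (base_url is an assumption, e.g. http://host:15672)."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    request = Request(
        base_url.rstrip("/") + "/api/overview",
        headers={"Authorization": "Basic " + token},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `summarize_overview(fetch_overview(...))` from a cron job or a loop and appending the result to a log with a timestamp would give a history to compare against the moment the heartbeat timeouts start.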

Also, `rabbitmqctl report` 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Sep 19, 2014, 3:28:26 PM
to Qingchuan Hao, rabbitm...@googlegroups.com
On 19 September 2014 at 23:23:32, Michael Klishin (mic...@rabbitmq.com) wrote:
> It should not be, it is very likely to be something else. Try upgrading
> to 3.3.5 and Erlang 17 first. Monitoring the number of channels/queues
> and message rates
> prior to when the problem happens may give you some clues.

Actually, this and your other thread combined make me think it may be bug 26069,
which is fixed in 3.3.0:

http://www.rabbitmq.com/release-notes/README-3.3.0.txt

But this is just a hypothesis.

Another issue may be a storm of clients reconnecting all at once — do you have many of them? 
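
If a reconnection storm does turn out to be a factor, a common client-side mitigation is to randomize the delay before each reconnect attempt. The Python sketch below shows exponential backoff with full jitter. It is a general technique, not something RabbitMQ or any particular client library is confirmed to do here, and the function name `reconnect_delay` and its `base` and `cap` defaults are illustrative assumptions.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a random delay in seconds in [0, min(cap, base * 2**attempt)].

    attempt is the number of failed reconnects so far (0 for the first retry).
    Spreading delays randomly keeps clients from all reconnecting at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Usage (hypothetical client loop): sleep before each retry, for example
#     time.sleep(reconnect_delay(attempt))
# and reset attempt to 0 once a connection succeeds.
```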

Qingchuan Hao

Sep 20, 2014, 11:19:06 AM
to rabbitm...@googlegroups.com, haoqin...@gmail.com
No, we do not have many clients connecting to the server: no more than 300.

On Saturday, September 20, 2014 at 3:28:26 AM UTC+8, Michael Klishin wrote:

Michael Klishin

Sep 20, 2014, 5:34:20 PM
to Qingchuan Hao, rabbitm...@googlegroups.com
On 20 September 2014 at 19:19:12, Qingchuan Hao (haoqin...@gmail.com) wrote:
> No, we did not have too many, no more than 300, client connecting
> to the server.

OK, then my next hypothesis is bug 26069. Please try 3.3.5.

Qingchuan Hao

Sep 21, 2014, 6:52:28 AM
to rabbitm...@googlegroups.com, haoqin...@gmail.com
OK, thank you very much, MK. An upgrade is indeed necessary, but the product with the older version has already been handed over to our customer. I am also trying to read the RabbitMQ source code.

On Sunday, September 21, 2014 at 5:34:20 AM UTC+8, Michael Klishin wrote: