Memory Leak in the latest version of rabbitmq (stable)


CP

Feb 3, 2015, 3:54:26 PM
to rabbitm...@googlegroups.com
We had a serious outage on our cluster this past week.

Even though there was no backlog on our queues, RAM on one of the nodes ballooned to 90% (about 60 GB). 

I had to restart the rabbitmq server to clear the alarms.

I am wondering if anyone has seen this behavior with the latest version?

Michael Klishin

Feb 3, 2015, 9:29:54 PM
to CP, rabbitm...@googlegroups.com
Search the list; there were similar discussions in the last month. It would be great to have rabbitmqctl status or rabbitmqctl report output if you see higher-than-usual RAM use. The management statistics DB is a common suspect.
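
For example, something along these lines captures both the memory breakdown and a full report while the node is still in the bad state (the output file name here is just an example):

rabbitmqctl status                              # includes a per-category memory breakdown
rabbitmqctl report > /tmp/rabbitmq-report.txt   # full server report, suitable for attaching to a reply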

MK

Simon MacMullen

Feb 4, 2015, 6:47:24 AM
to CP, rabbitm...@googlegroups.com
On 03/02/15 20:54, CP wrote:
> We had a serious outage on our cluster this past week.
>
> Even though there was no backlog on our queues, RAM on one of the nodes
> ballooned to 90% (about 60 GB).
>
> I had to restart the rabbitmq server to clear the alarms.

It is really worth looking at the memory diagnostics in "rabbitmqctl
report" or the management plugin node details page if this happens
again. It should give you a half-decent idea of where memory is being
used. See http://www.rabbitmq.com/memory-use.html

> I am wondering if anyone has seen this behavior with the latest version?

There's one known issue in 3.4.3. It's been there since forever though.

We recently determined that the lack of flow control in the master ->
slave part of queue mirroring can lead to excessive memory use in some
circumstances. The symptoms are:

* Messages are being sent through mirrored queues at a high rate
* Queue length can be close to zero
* Memory use assigned to "queues" increases, especially memory used by
slaves
* When the message rate drops to zero, slaves continue to use CPU as
they work through the backlog
* Eventually memory use returns to normal and CPU to 0

Flow control will be added for this message route in 3.5.0.
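
A rough way to check whether a node matches this pattern is to compare per-queue memory against queue depth for the mirrored queues; as a sketch, assuming the default vhost:

# near-empty queues that are still holding a lot of memory, and their mirrors
rabbitmqctl list_queues -p / name policy messages memory slave_pids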

Cheers, Simon

Noah

Feb 5, 2015, 12:13:51 PM
to rabbitm...@googlegroups.com, chi...@simplyhired.com
Hi,

I am happy to hear you have identified this bug. It has been long-standing for us through several versions, and we don't believe it to be a regression, either.

In fact, we just saw that exact issue this morning with 3.4.2. On a queue with one mirror, using the Top Processes plugin, we saw that the slave process was taking a huge amount of RAM despite the queue containing near-zero messages. We saw similar memory consumption on the master "gm" queue processes as well. This state lasted several hours, until we finally decided to remove the HA policy from the queue, thus destroying the mirror queue processes and freeing the memory.
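
For reference, the workaround amounted to roughly the following; the policy name and queue pattern are placeholders, not our real ones:

# drop mirroring for the affected queues; the slave processes are destroyed
rabbitmqctl clear_policy ha-example
# re-apply the policy later, once memory has been reclaimed
rabbitmqctl set_policy ha-example "^example\." '{"ha-mode":"all"}' --apply-to queues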

-N  

Chirayu Patel

Feb 6, 2015, 1:35:31 PM
to Noah, rabbitm...@googlegroups.com
Will there be a fix for this in the next version?

Chirayu Patel
Site Reliability Engineer
chi...@simplyhired.com


www.simplyhired.com

Michael Klishin

Feb 7, 2015, 2:19:22 AM
to Noah, Chirayu Patel, rabbitm...@googlegroups.com
 On 6 February 2015 at 21:35:31, Chirayu Patel (chi...@simplyhired.com) wrote:
> Will there be a fix for this in next version?

It is scheduled for 3.5.0 as far as I can see. Late February to early March. 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Chirayu Patel

Feb 23, 2015, 9:59:29 PM
to Michael Klishin, Noah, rabbitm...@googlegroups.com
It's happening again; here's the output from rabbitmqctl status:

root@queue-101:~# rabbitmqctl status
Status of node 'rabbit@queue-101' ...
[{pid,41248},
 {running_applications,
     [{rabbitmq_federation_management,"RabbitMQ Federation Management",
          "3.4.2"},
      {rabbitmq_management,"RabbitMQ Management Console","3.4.2"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.4.2"},
      {webmachine,"webmachine","1.10.3-rmq3.4.2-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.4.2-git680dba8"},
      {rabbitmq_federation,"RabbitMQ Federation","3.4.2"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.4.2"},
      {rabbit,"RabbitMQ","3.4.2"},
      {os_mon,"CPO  CXC 138 46","2.2.14"},
      {inets,"INETS  CXC 138 49","5.9.7"},
      {mnesia,"MNESIA  CXC 138 12","4.11"},
      {amqp_client,"RabbitMQ AMQP Client","3.4.2"},
      {xmerl,"XML parser","1.3.5"},
      {sasl,"SASL  CXC 138 11","2.3.4"},
      {stdlib,"ERTS  CXC 138 10","1.19.4"},
      {kernel,"ERTS  CXC 138 10","2.16.4"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:32:32] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,38605654960},
      {connection_readers,1848552},
      {connection_writers,432464},
      {connection_channels,1366145520},
      {connection_other,2144056},
      {queue_procs,17506274880},
      {queue_slave_procs,7722683144},
      {plugins,606752},
      {other_proc,35804560},
      {mnesia,256312},
      {mgmt_db,11496},
      {msg_index,337189264},
      {other_ets,39207400},
      {binary,11557388208},
      {code,19894172},
      {atom,703377},
      {other_system,15064803}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.8},
 {vm_memory_limit,54008391270},
 {disk_free_limit,50000000},
 {disk_free,333488336896},
 {file_descriptors,
     [{total_limit,924},
      {total_used,88},
      {sockets_limit,829},
      {sockets_used,77}]},
 {processes,[{limit,1048576},{used,936}]},
 {run_queue,3},
 {uptime,1817814}]
root@queue-101:~#


Chirayu Patel
Site Reliability Engineer
chi...@simplyhired.com


www.simplyhired.com

Michael Klishin

Feb 24, 2015, 12:42:18 AM
to Chirayu Patel, Noah, rabbitm...@googlegroups.com
Please upgrade to 3.4.4.

MK

Michael Klishin

Feb 24, 2015, 4:54:07 AM
to Chirayu Patel, rabbitm...@googlegroups.com, Noah
On 24 February 2015 at 08:42:06, Michael Klishin (mkli...@pivotal.io) wrote:
> Please upgrade to 3.4.4.

Said too soon: the fix is scheduled for 3.5.0, which is a few weeks away.

We have nightly builds you can try:
https://www.rabbitmq.com/nightly-builds.html

Chirayu Patel

Feb 24, 2015, 12:11:20 PM
to Michael Klishin, rabbitm...@googlegroups.com, Noah
I'll set up a test environment with a nightly build and run some tests.

I do have a question: do you guys think this particular issue can impact the overall throughput of the cluster?

Appreciate all the responses from you guys!

Chirayu Patel
Site Reliability Engineer
chi...@simplyhired.com


www.simplyhired.com

Michael Klishin

Feb 24, 2015, 1:41:08 PM
to Chirayu Patel, rabbitm...@googlegroups.com, Noah
It can on the mirror nodes. By how much is impossible to tell without conducting a test.
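
If you want a number for your own workload, one option is to drive a mirrored test queue with the Java client's PerfTest tool before and after the upgrade; a sketch, assuming the Java client tools are unpacked locally and an HA policy matches the test queue name:

# 2 producers, 2 consumers, 1 kB messages, autoack, run for 60 seconds
sh runjava.sh com.rabbitmq.examples.PerfTest -h amqp://queue-101 -u ha.perftest -x 2 -y 2 -s 1000 -a -z 60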

MK