RabbitMQ Excessive Memory Utilization


conz...@gmail.com

Jul 23, 2014, 9:16:03 PM
to rabbitm...@googlegroups.com
Hello;

I am encountering memory resource limit alarms in RabbitMQ 3.3.4 (Erlang R15B03) running on Linux x86_64. I have configured vm_memory_high_watermark to 0.8, which works out to 6.4GB of memory on this host. The broker reaches this limit and blocks all publishers until the condition resolves. After the memory alarm fires, examining the channels shows that I have no unacknowledged messages in any channel.
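
For reference, the watermark setting in my rabbitmq.config looks roughly like this (all other settings omitted):

[
  {rabbit, [
    %% 0.8 of this host's 8GB of RAM ~= 6.4GB, as described above
    {vm_memory_high_watermark, 0.8}
  ]}
].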

When I examine the node's memory usage in the management plugin or with 'rabbitmqctl status', around 6GB of memory is attributed to the "Plugins" category. I have the following plugins enabled: rabbitmq_management, rabbitmq_shovel, rabbitmq_shovel_management, and rabbitmq_stomp. I typically have about 140 connections to the broker, all Perl clients using the STOMP 1.0 protocol; 135 are publishers and 5 are subscribers to the messages. I use 3 exchanges in one vhost, all topic exchanges.

How can I determine the cause of this memory consumption? Do you have any advice? Thanks in advance.

Con

 

Michael Klishin

Jul 23, 2014, 9:23:57 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
On 24 July 2014 at 05:16:05, conz...@gmail.com (conz...@gmail.com) wrote:
> How can I determine the cause of this memory consumption? Do you have any
> advice? Thanks in advance.

Please post what rabbitmqctl status outputs (or a rabbitmqctl report, unless it is enormous).

You can reduce the amount of RAM used by the stats DB (used by the management plugin)
by tweaking stats retention policy and emission interval. See

http://www.rabbitmq.com/management.html,

in particular Fine-grained statistics, Statistics interval, and
Sample retention policies.
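
For illustration only (the numbers below are not recommendations), the relevant
knobs live in rabbitmq.config along these lines:

[
  {rabbit, [
    %% how often stats events are emitted, in milliseconds
    {collect_statistics_interval, 30000}
  ]},
  {rabbitmq_management, [
    %% {maximum age in seconds, sample interval in seconds} pairs
    {sample_retention_policies,
      [{global,   [{605, 5}, {3600, 60}]},
       {basic,    [{605, 5}]},
       {detailed, [{10, 5}]}]}
  ]}
].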

With STOMP, the plugin itself shouldn't use a lot of RAM. If the apps left any
unused non-empty queues behind, those wouldn't show up in the Plugins section,
but they may still be worth inspecting.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

conz...@gmail.com

Jul 23, 2014, 9:31:50 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
Is there a way to determine which plugin is consuming the memory, to confirm that it's the management plugin?

Michael Klishin

Jul 23, 2014, 9:37:20 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
On 24 July 2014 at 05:31:53, conz...@gmail.com (conz...@gmail.com) wrote:
> Is there a way to determine which plugin is consuming the memory, to confirm
> that it's the management plugin?

You can go to the Overview page, then to an individual node's page, and use the
Memory Details section there.

It will display the amount of memory used by the management database separately.
There is no per-plugin breakdown.

conz...@gmail.com

Jul 23, 2014, 9:47:18 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
The management database memory usage was very low, on the order of 10 megabytes. The management database was not what was using 6GB of memory; that usage was under the Plugins category.

I will generate the report once the problem reoccurs. I end up restarting the broker to get things working again when this happens, because the memory never frees up on its own.

Con

Michael Klishin

Jul 23, 2014, 9:51:00 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
On 24 July 2014 at 05:47:20, conz...@gmail.com (conz...@gmail.com) wrote:
> > never frees up on its own.

What kind of workload do you have over STOMP? Do your apps use acknowledgements or
transactions?

Can you reproduce the issue with a couple of scripts, or at least imitate your
workload?

conz...@gmail.com

Jul 23, 2014, 10:24:59 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
I do not use transactions. I use auto-acknowledgement in my consumers. I have around 135 publishers using STOMP, and 5 consumers using STOMP. All of the connections are long-lived for the life of the publisher/consumer process.

What commands should I run to gather debug information the next time this happens (other than the rabbitmqctl report)? Should I gather an Erlang crash dump? Is it possible to look at something to determine where the memory is being used?

I can't reproduce the issue on demand since I don't know what's causing it. It occurs within about 24 hours of restarting my broker, so by tomorrow I should see the elevated memory usage again.

Con

Simon MacMullen

Jul 24, 2014, 7:13:33 AM
to conz...@gmail.com, rabbitm...@googlegroups.com
On 24/07/14 03:24, conz...@gmail.com wrote:
> What commands should I run to gather debug information the next time
> this happens (other than the rabbitmqctl report)? Should I gather an
> Erlang crash dump? It is possible to look at something to determine
> where the memory is being used?

This command:

$ rabbitmqctl eval 'lists:sublist(lists:reverse(lists:sort(
    [{process_info(Pid, memory), Pid, process_info(Pid)} || Pid <- processes()])), 30).'

will print information about the 30 most memory-heavy processes. That is
likely to help narrow things down.

Cheers, Simon

conz...@gmail.com

Jul 24, 2014, 9:16:17 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
Hello all;

Since my last broker restart at 2014-07-23T19:23:16-05:00 (about 12 hours ago) the "Plugins" memory consumption has steadily grown. This .png shows a chart of the growth.



You can see that Plugins memory consumption grew to 1.7GB overnight. That is the problem I'm trying to solve. Eventually memory will hit the high water mark and halt the publishers.

I've posted the output of the command Simon provided in this gist:

https://gist.github.com/conzyor34/7fffd7ce1ed6c2667867

Indeed, the top process is using 1.7GB. 

This gist has rabbitmqctl status, for reference:


I checked all channels using this command:

rabbitmqadmin list channels name messages_unacknowledged messages_uncommitted messages_unconfirmed

There are zero unacknowledged messages. Thanks again for your help.

Con

conz...@gmail.com

Jul 24, 2014, 9:23:53 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
Here's an update to the chart that shows the other component of memory growth, the "Binaries" category:


Con

Simon MacMullen

Jul 24, 2014, 9:32:14 AM
to conz...@gmail.com, rabbitm...@googlegroups.com
On 24/07/2014 14:16, conz...@gmail.com wrote:
> I've posted the output of the command Simon provided in this gist:
>
> https://gist.github.com/conzyor34/7fffd7ce1ed6c2667867
>
> Indeed, the top process is using 1.7GB.

Thank you. So that's a STOMP reader process. It has no excuse to be using that
much memory, of course.

I wonder if it is not GCing enough. What does

$ rabbitmqctl eval '[garbage_collect(P) || P <- processes()].'

do when so much memory is in use?

Cheers, Simon

conz...@gmail.com

Jul 24, 2014, 10:19:46 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
I ran the manual garbage collection. It reduced the memory consumed somewhat, but STOMP is still using 1.37GB.

Plugins:  2.42GB -> 1.37GB
Binaries: 1.53GB -> 1.08GB

Here is the new rabbitmqctl status output after manual garbage collection:

https://gist.github.com/conzyor34/680d8b4ec95eed59f35a

Con

Simon MacMullen

Jul 25, 2014, 7:25:30 AM
to conz...@gmail.com, rabbitm...@googlegroups.com
Hi.

I have been playing with STOMP somewhat and I have some idea how this
could be happening. In STOMP (unlike AMQP) we don't have outgoing flow
control for subscribers; if a subscriber consumes with no prefetch count
then we just throw messages at the socket as fast as we can. If the
network or the client can't keep up, memory use can balloon with
messages that have left the queue but haven't yet left the broker. If
your consumers use client acknowledgements then you will be able to see
this as large queues with a small number of ready messages and large
numbers of unacknowledged messages.
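
If they do use client acknowledgements, a quick way to spot that pattern is:

$ rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

(add "-p <vhost>" if your queues are not in the default vhost).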

If this is what's happening to you then you can fix it by:

* Ensuring your consumers are in client-acknowledgement mode ("ack: client")
* Setting some limit on the prefetch count (e.g. "prefetch-count: 1000"), as in the frame sketched below
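
As a rough sketch (the destination is just a placeholder), a subscriber's
SUBSCRIBE frame would then look something like:

SUBSCRIBE
id:sub-0
destination:/exchange/your.topic.exchange/your.routing.key
ack:client
prefetch-count:1000

(followed, as usual, by the blank line and NULL byte that terminate a STOMP frame).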

I will file a bug to add outgoing flow control to STOMP.

Cheers, Simon

On 24/07/14 15:19, conz...@gmail.com wrote:
> I ran the manual garbage collection. Garbage Collection reduced memory
> consumed somewhat, but STOMP is still using 1.37G.
>
> plugins 2.42G -> 1.37G
> binaries 1.53G -> 1.08G
>
> Here is new rabbitmqctl status output after manual garbage collection
>
> https://gist.github.com/conzyor34/680d8b4ec95eed59f35a
>
> Con
>

conz...@gmail.com

Jul 25, 2014, 9:01:52 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
I've located the particular client that was causing the problem. When I shut it down, all memory was immediately freed. 

It is a Perl consumer using STOMP 1.0, consuming 200 messages/s. When we restarted the consumer the problem reappeared, but after we rebooted the Linux machine it was running on, the problem did not reappear, even after restarting the consumer. We believe the consumer machine was swapping heavily prior to the reboot. I will add a prefetch count the next time this problem reoccurs.

Thank you very much for your assistance in debugging this.

Con

conz...@gmail.com

Jul 26, 2014, 3:32:28 PM
to rabbitm...@googlegroups.com, conz...@gmail.com
Interestingly or not, while debugging this issue I also discovered a process leak. The number of Erlang processes is slowly and steadily creeping upward at a rate of around 75 per hour, but that can be a topic for another thread.
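
For anyone following along, a quick way to sample the raw Erlang process count is something like:

$ rabbitmqctl eval 'length(processes()).'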



Con

Simon MacMullen

Jul 28, 2014, 5:52:18 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
On 26/07/14 20:32, conz...@gmail.com wrote:
> Interestingly or not, while debugging this issue I also discovered a
> process leak. The number of Erlang processes is slowly and steadily
> creeping upward at a rate of around 75 per hour, but that can be a topic
> for another thread.

Just to be sure: you're not seeing something leak connections / channels /
queues at that rate, are you?
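
Comparing counts over time should tell you, e.g.:

$ rabbitmqctl -q list_connections | wc -l
$ rabbitmqctl -q list_channels | wc -l
$ rabbitmqctl -q list_queues | wc -l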

Cheers, Simon

conz...@gmail.com

Jul 29, 2014, 11:04:28 AM
to rabbitm...@googlegroups.com, conz...@gmail.com


Yes, I found that I'm leaking channels. Thanks again.

Con

conz...@gmail.com

Jul 29, 2014, 11:14:53 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
A little more information: I am leaking channels for STOMP publishers that are in a DMZ, with a stateful firewall between the publishers and the broker. When the firewall times out a connection, the publisher detects it and opens a new connection, but the broker is apparently not using TCP keep-alives, so it leaves its end of the connection open.

I am still researching whether there is an option to change this. TCP keep-alives are enabled at the Linux level for the host running the broker, but something's not working.
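
If there is a broker-side knob for this, I would guess it looks something like
the following in rabbitmq.config (untested on my side, and the actual keepalive
timing would still come from the kernel's net.ipv4.tcp_keepalive_* sysctls):

[
  {rabbitmq_stomp, [
    {tcp_listen_options, [{keepalive, true}]}
  ]}
].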

Con

Michael Klishin

Aug 13, 2014, 11:04:26 AM
to rabbitm...@googlegroups.com, conz...@gmail.com
On 25 July 2014 at 15:25:31, Simon MacMullen (si...@rabbitmq.com) wrote:
> I have been playing with STOMP somewhat and I have some idea how this
> could be happening. In STOMP (unlike AMQP) we don't have outgoing flow
> control for subscribers; if a subscriber consumes with no prefetch count
> then we just throw messages at the socket as fast as we can. If the
> network or the client can't keep up, memory use can balloon with
> messages that have left the queue but haven't yet left the broker. If
> your consumers use client acknowledgements then you will be able to see
> this as large queues with a small number of ready messages and large
> numbers of unacknowledged messages.

To follow up on this issue: it has been resolved and will be in 3.4.0.

conz...@gmail.com

Oct 14, 2014, 3:58:15 PM
to rabbitm...@googlegroups.com, conz...@gmail.com

On Wednesday, August 13, 2014 10:04:26 AM UTC-5, Michael Klishin wrote:
> To follow up on this issue: it has been resolved and will be in 3.4.0.

Very nice news. Thank you very much.

Con 