Understanding memory usage


Joshua Rendek

Sep 11, 2016, 8:29:44 PM
to rabbitmq-users
I have a 3-node cluster (8 cores and 62 GB of RAM per node).

After a day or two the main node (in this case rmq-1) starts ballooning in memory, and nothing on my end makes it apparent what to tweak to reduce it (when the queue page loads, the memory per queue is usually very low, under 500 MB, since these are lazy queues).

All queues are using lazy mode. Stats DB is on a different server, and all servers have this config:

[
  {rabbit, [
    {vm_memory_high_watermark, 0.65},
    {collect_statistics_interval, 60000}
  ]}
].

For some background on the type of tasks: we have nightly jobs that kick off data to be processed, as well as other types of batch jobs that are sent to RMQ and then processed (sometimes leaving 20-30 million messages in the system).

I've attached 3 screenshots of the admin dashboard, node stats, and memory detail pages.

What I don't understand is why there is a binary section taking up 32 GB of RAM (split again under references into 16 GB for queues and 16 GB for system/other). From the material I have been able to find on binary memory, this appears to come from RMQ internals or Erlang. Is this something I can control? Can I force more aggressive garbage collection?

The solution right now is to just restart the node when it hits the memory watermark and I'd like to find a more stable solution.
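
As a rough back-of-the-envelope check, the configured watermark can be turned into an absolute figure. This is only a sketch: it assumes the broker detects the full 62 GB as total RAM, takes the 32 GB binary figure above at face value, and uses illustrative variable names.

```python
# Rough arithmetic only: assumes RabbitMQ sees the full 62 GB as total RAM.
total_ram_gb = 62
vm_memory_high_watermark = 0.65   # value from the config above
binary_gb = 32                    # binary section from the memory detail page

watermark_gb = total_ram_gb * vm_memory_high_watermark
binary_share = binary_gb / watermark_gb

print(f"watermark at about {watermark_gb:.1f} GB")
print(f"binaries alone use about {binary_share:.0%} of that budget")
```

Under those assumptions the binary section by itself accounts for most of the memory the node may use before the watermark is hit.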

Let me know if I can provide any more information to help diagnose or troubleshoot this. Thanks in advance.

- Josh



memory_detail.png
node_stats.png
dashboard.png

Michael Klishin

Sep 12, 2016, 5:34:40 AM
to rabbitm...@googlegroups.com
Binary memory is primarily message payloads (this is an oversimplification but still). It is reported by
the runtime and we cannot get a more detailed breakdown. Some processes either hold messages
in memory (with lazy queues, this means they were loaded for delivery) or still hold references
to message payloads.

We reproduced something similar earlier today when consumers were very slow.

If that's not the case, it was reported previously that increasing rabbit.credit_flow_default_credit [1]
to {2000, 100} or {2000, 125} might help.
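
To make the credit flow pair above more concrete, here is a toy model. It is a sketch, not RabbitMQ's implementation: it assumes the first number is the credit a sender starts with and the second is how many messages the receiver processes before granting that many credits back. The function name, parameters, and per-tick rates are all hypothetical.

```python
def simulate_credit_flow(initial_credit, more_credit_after,
                         send_per_tick, process_per_tick, ticks):
    """Toy model: the sender spends one credit per message and blocks at zero;
    the receiver grants `more_credit_after` credits back each time it has
    processed that many messages."""
    credit = initial_credit
    in_flight = 0        # sent but not yet processed by the receiver
    since_grant = 0      # processed since the last credit grant
    max_in_flight = 0
    blocked_ticks = 0

    for _ in range(ticks):
        # Sender: send as much as the remaining credit allows.
        sent = min(send_per_tick, credit)
        credit -= sent
        in_flight += sent
        if sent < send_per_tick:
            blocked_ticks += 1
        max_in_flight = max(max_in_flight, in_flight)

        # Receiver: process a bounded number of messages, grant credit in bulk.
        processed = min(process_per_tick, in_flight)
        in_flight -= processed
        since_grant += processed
        while since_grant >= more_credit_after:
            since_grant -= more_credit_after
            credit += more_credit_after

    return max_in_flight, blocked_ticks


if __name__ == "__main__":
    for setting in [(2000, 100), (2000, 125)]:
        peak, blocked = simulate_credit_flow(*setting, send_per_tick=500,
                                             process_per_tick=100, ticks=200)
        print(setting, "peak in flight:", peak, "blocked ticks:", blocked)
```

In this model the number of unprocessed messages between one sender and one receiver can never exceed the initial credit, so a larger first value allows more data in flight before the sender is blocked.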

In your case 16 GB is used by queue processes, which definitely suggests that there are messages
loaded from disk and held in memory.

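See consumer utilization; it may be that your consumers cannot keep up:
https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/

1. https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit.app.src#L101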

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Joshua Rendek

Sep 12, 2016, 8:16:18 AM
to rabbitmq-users
Are there other settings related to HA that could trigger the memory growth? I'm able to toggle this behavior by modifying the queue policy for the large queues:

No memory bloat:
rabbitmqctl set_policy --priority 1 -p vh QPol "ha.qnamel" '{"queue-mode":"lazy"}' --apply-to all

Memory bloat:
rabbitmqctl set_policy -p vh Lazy "^ha\." '{"queue-mode":"lazy", "ha-mode":"all", "ha-sync-mode": "automatic"}' --apply-to all

While I do have consumers that can't keep up with the publish rate, the idea is to use it as a work queue and finish as fast as the workers are able to (there are other constraints downstream, such as another Postgres database). Is the idea with RMQ that you should never publish faster than you can consume?

I will try those default credit flow settings if this recurs.

Thanks,
Josh

Michael Klishin

Sep 12, 2016, 8:32:35 AM
to rabbitm...@googlegroups.com
The idea is that if publishers outpace consumers you
have 4 options:

1. Limit queue length or use TTL 
2. Throttle publishers 
3. Move things to disk and worry about not affecting throughput 
4. Some combination of the above 

RabbitMQ has features that cover all of the above. We may make lazy queues more efficient over time but each option has downsides.

We are aware of a couple of issues that seem relevant.

Joshua Rendek

Sep 12, 2016, 8:36:23 AM
to rabbitmq-users
> We are aware of a couple of issues that seem relevant.

Do you have the issue list by any chance?

It looks like, if this recurs with HA disabled and we're already using lazy queues, the only option we have left is #2, throttling the publishers? TTL wouldn't be an option since we need all the jobs to be processed.

Michael Klishin

Sep 12, 2016, 8:53:28 AM
to rabbitm...@googlegroups.com
RabbitMQ will throttle publishers automatically when the VM memory high watermark is reached. You can also try "regular"
mirrored queues or simply over-provision consumers.

Noah

Sep 12, 2016, 12:37:24 PM
to rabbitmq-users
Hi, 

We have seen this same case in testing. Lazy queues on 3.6.4 holding many (millions of) small (~1.5k) messages, when converted to mirrored queues via ha-mode:all, will cause nodes to hit their memory high watermarks and eventually crash. It was unclear whether the master or the slave queues were causing the memory exhaustion, as both are distributed across the cluster in our test setup. We will try to isolate this in future testing.

Also note that this happens reliably during queue synchronization when no consumers or publishers are connected.


Best,

-N

Once we have a reliable test case we will submit 

Michael Klishin

Sep 12, 2016, 1:15:37 PM
to rabbitm...@googlegroups.com
Master has to load all messages and replicate them when a queue becomes mirrored, regardless of whether
there are any client connections.

Eager queue synchronization can be disabled as documented in http://www.rabbitmq.com/ha.html.
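
One way to read "disabling eager sync" is to set ha-sync-mode to manual in the mirroring policy. The sketch below only builds and prints such a command rather than running it. The vhost, policy name, and pattern are copied from the commands earlier in this thread and may not match your setup.

```python
import json
import shlex

# Names below (vhost "vh", policy "Lazy", pattern "^ha\.") are taken from the
# commands earlier in this thread; adjust them for your own setup.
definition = {
    "queue-mode": "lazy",
    "ha-mode": "all",
    "ha-sync-mode": "manual",   # mirrors are not synchronised eagerly
}

command = [
    "rabbitmqctl", "set_policy",
    "-p", "vh",
    "--apply-to", "all",
    "Lazy", r"^ha\.", json.dumps(definition),
]

# Print a copy-pasteable shell command instead of executing it.
print(shlex.join(command))
```

With manual mode a queue can still be synchronised explicitly later (rabbitmqctl sync_queue), for example at a time when the node has memory headroom.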


Noah Magram

Sep 12, 2016, 1:49:33 PM
to rabbitm...@googlegroups.com
> Master has to load all messages and replicate them when a queue becomes mirrored, regardless of whether
> there are any client connections.

I assume that all the messages in the master queue would be read at some point as they are batched and sent to the slave(s) - but are you saying that all the messages are loaded at once instead of in batches?  

Also once the node reaches this state, the memory is never recovered.


-N

--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/c0Pq4yEavmw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-users+unsubscribe@googlegroups.com.

Joshua Rendek

Sep 12, 2016, 1:54:00 PM
to rabbitmq-users
Is there any documentation on needing to load all messages? The doc page you referenced actually says: "By default queues will synchronise one message at a time but since RabbitMQ 3.6.0 we can tell masters to synchronise messages in batches."

Michael Klishin

Sep 12, 2016, 2:02:39 PM
to rabbitm...@googlegroups.com
"all messages" doesn't mean "all messages at once". It can be done one message at a time or in batches of N
but it still has to be all messages the master has. Which potentially means loading tens of GBs of data.


Michael Klishin

Sep 12, 2016, 2:03:24 PM
to rabbitm...@googlegroups.com
No, I am not saying that they are loaded all at once but they can still be loaded
faster than they are sent and acknowledged, or even attempted to be sent.

Josh Rendek

Sep 12, 2016, 2:10:15 PM
to rabbitm...@googlegroups.com
If the master is able to batch them quicker than they are received then
how would changing batch sizes alleviate that? It seems like there
should be some flow control for the internal syncing, or am I
misunderstanding something?

Thanks again for your help so far


Michael Klishin

Sep 12, 2016, 2:16:02 PM
to rabbitm...@googlegroups.com
Changing batch size won't eliminate the problem (maybe only delay it). Disabling eager sync will.

Flow control is applied in many areas internally, so it would be great to have a way to reproduce this.



Noah Magram

Sep 12, 2016, 2:35:34 PM
to rabbitm...@googlegroups.com

Our test case is pretty straightforward:

1. Set up 3 node cluster
2. Create lazy queue, no HA policy
3. Shut down consumer, allow 10M messages to accrue in queue
4. Shut down publisher
5. Apply ha-mode: all policy
6. Watch memory usage climb on nodes(?) during sync and never recover
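
For a sense of scale, the amount of payload the master has to read during sync in this test can be estimated from the figures in this thread (10M messages, roughly 1.5 kB each). This is a rough sketch that assumes 1,500 bytes per message and ignores per-message overhead and index data.

```python
# Figures from this thread: ~10M messages of roughly 1.5 kB each.
# Payload only; per-message overhead and index data are ignored (assumption).
messages = 10_000_000
payload_bytes = 1_500

total_gb = messages * payload_bytes / 1e9
print(f"about {total_gb:.0f} GB of payload to read and replicate")
```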


Michael Klishin

Sep 12, 2016, 3:05:05 PM
to rabbitm...@googlegroups.com
Thanks, we will try to reproduce it post [1] which may indirectly fix this
(this is a guess at this point).



Noah Magram

Sep 12, 2016, 3:10:12 PM
to rabbitm...@googlegroups.com

Sounds good.

Please also note that in our test case all messages are published with the
persistent bit set, and we are using a topic exchange, if it matters.

