Can someone help me reconcile memory usage?


Ryan Moore

Jul 30, 2018, 7:37:58 PM
to rabbitmq-users
I'm having a terrible time getting a RabbitMQ HA config to work properly in Kubernetes. It keeps either blowing through the memory limits I set (which on k8s gets the container OOMKilled) or blocking/flapping on memory alarms at usage levels that shouldn't even be close to the limit.

My entire queue holds 269 messages, approx. 22 kB. There are 10 Celery workers attached, each with its own empty queue.



That node is showing a memory alarm that looks like this:


The usage is steady: over a few minutes, roughly one message is added for each one pulled off (the 1.2/s incoming and 2.4/s ack rates balance out over a longer period). Memory bounces between ~280 MB (alarm) and ~200 MB (clear).

I originally set the container memory limit to 256 MB, but (I believe) due to GC it was using up to double that and getting OOMKilled pretty regularly. It's now set to 512 MB; it no longer gets OOMKilled, but it continuously raises and clears memory alarms.

My config:
    total_memory_available_override_value = 250MiB
    vm_memory_calculation_strategy = allocated
    vm_memory_high_watermark.absolute = 250MiB
    vm_memory_high_watermark_paging_ratio = 0.4
    background_gc_enabled = true
    background_gc_target_interval = 60000
    cluster_formation.k8s.address_type = ip
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.node_cleanup.interval = 3600
    cluster_formation.node_cleanup.only_log_warning = false
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_partition_handling = autoheal
    cluster_formation.randomized_startup_delay_range.min = 0
    cluster_formation.randomized_startup_delay_range.max = 2
    loopback_users.guest = false
    queue_master_locator = min-masters
    management.load_definitions = /etc/rabbitmq/ha-policy.json
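The memory thresholds implied by this config can be worked out directly. Below is a minimal sketch using hypothetical helpers (`parse_conf`, `to_bytes` and `memory_thresholds` are illustrative, not part of RabbitMQ) that reads an excerpt of the `key = value` lines and computes the alarm threshold and the paging threshold. It assumes paging starts at `vm_memory_high_watermark_paging_ratio` times the watermark, and that `MiB` means 2^20 bytes; only the `KiB`/`MiB`/`GiB` suffixes are handled.

```python
import re

# Binary size suffixes; only these units are handled in this sketch (assumption).
UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

# Excerpt of the rabbitmq.conf above (memory-related keys only).
CONF = """
total_memory_available_override_value = 250MiB
vm_memory_high_watermark.absolute = 250MiB
vm_memory_high_watermark_paging_ratio = 0.4
"""


def parse_conf(text):
    """Parse sysctl-style 'key = value' lines into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def to_bytes(value):
    """Convert values like '250MiB' to bytes; plain integers pass through."""
    m = re.fullmatch(r"(\d+)\s*([KMG]iB)?", value)
    if m is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = m.groups()
    return int(number) * UNITS.get(unit, 1)


def memory_thresholds(conf):
    """Alarm threshold = absolute watermark; paging threshold = watermark * ratio."""
    watermark = to_bytes(conf["vm_memory_high_watermark.absolute"])
    paging_ratio = float(conf["vm_memory_high_watermark_paging_ratio"])
    return {"alarm": watermark, "paging": round(watermark * paging_ratio)}


t = memory_thresholds(parse_conf(CONF))
print({k: v / 2**20 for k, v in t.items()})  # {'alarm': 250.0, 'paging': 100.0}
```

For comparison, the `allocated` total reported later in the thread (297,193,472 bytes, about 283 MiB) is above the 250 MiB alarm threshold, which is consistent with the alarm firing.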


The ha-policy.json looks like this, along with some generic permission/user config:

    "policies": [
      {
        "vhost": "/",
        "name": "ha",
        "pattern": "",
        "apply-to": "all",
        "definition": {
          "ha-mode": "exactly",
          "ha-params": 2,
          "ha-sync-mode": "automatic",
          "queue-mode": "lazy"
        },
        "priority": 0
      }
    ]
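The same policy can also be expressed as the body of a request to the management HTTP API's `PUT /api/policies/{vhost}/{name}` endpoint, which can be handy for checking what the definitions file is expected to load. A minimal sketch, assuming the default vhost `/` (which must be URL-encoded as `%2F`) and a hypothetical management listener at `http://localhost:15672`; `policy_request` is an illustrative helper name. It only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical management endpoint; adjust host/port for your cluster.
BASE_URL = "http://localhost:15672"

HA_POLICY = {
    "pattern": "",  # empty regex: matches every name
    "apply-to": "all",
    "definition": {
        "ha-mode": "exactly",
        "ha-params": 2,
        "ha-sync-mode": "automatic",
        "queue-mode": "lazy",
    },
    "priority": 0,
}


def policy_request(vhost, name, policy, base_url=BASE_URL):
    """Build (but do not send) a PUT request that declares the policy."""
    # safe="" makes quote() encode "/" too, so the default vhost becomes %2F.
    path = "/api/policies/{}/{}".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(name, safe="")
    )
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(policy).encode("utf-8"),
        method="PUT",
        headers={"content-type": "application/json"},
    )


req = policy_request("/", "ha", HA_POLICY)
print(req.get_method(), req.full_url)
# Sending it requires credentials (e.g. an HTTPBasicAuthHandler) and urlopen(req).
```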


What am I doing wrong, or misunderstanding, such that fewer than 300 queued messages totalling under 30 kB add up to more than 256 MB of memory used? It shouldn't even come close to 100 MB used, let alone trigger the high watermark. I've tried several tricks, like lazy queues and explicit background GC, all of which should reduce memory overhead, but nothing seems to make a difference.

 - Ryan

Michael Klishin

Jul 30, 2018, 7:42:51 PM
to rabbitm...@googlegroups.com
There is a guide for that [1]. Queue and published message properties also matter
and weren't mentioned. Please help others help you.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Jul 30, 2018, 7:45:32 PM
to rabbitm...@googlegroups.com
Also, each client connection by default consumes over 200 KB (most of it TCP buffers) [1],
regardless of the number of messages enqueued, whether the queues are lazy, and so on.

Channels also consume memory. [2] explains how to find the key contributing area (there are usually just one or two).



Ryan Moore

Jul 30, 2018, 8:42:35 PM
to rabbitmq-users
I have 10-20 connections; 200 KB * 20 connections = 4 MB, and this is using 50x that.
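The connection estimate, spelled out as a quick calculation. This sketch assumes Michael's ~200 KB per connection and takes ~200 MB (the low end of the alarm/clear range from the first message) as the observed usage; 1 MB is treated as 1000 KB, and the helper name is illustrative.

```python
# Rough per-connection footprint quoted in the thread (~200 KB, mostly TCP buffers).
PER_CONNECTION_KB = 200
# Approximate observed usage: low end of the ~200-280 MB alarm/clear range.
OBSERVED_MB = 200


def connection_budget_mb(connections, per_connection_kb=PER_CONNECTION_KB):
    """Estimated memory used by client connections, in MB (1 MB = 1000 KB here)."""
    return connections * per_connection_kb / 1000


low, high = connection_budget_mb(10), connection_budget_mb(20)
print(f"connections: {low:.0f}-{high:.0f} MB")              # connections: 2-4 MB
print(f"observed / worst case: {OBSERVED_MB / high:.0f}x")  # observed / worst case: 50x
```

Connections alone therefore account for only a few MB, so the bulk of the usage has to come from somewhere else.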

I've been through the memory guide and it doesn't help: it tells me "yup, you're using a lot of RAM" but not what is actually using it. I end up with a generic "binaries" answer that tells me nothing about what I'm doing wrong. (To be clear: I'm 100% sure I'm doing something wrong, I just don't know what.)

rabbitmqctl status memory section:

     {memory,
         [{connection_readers,269224},
          {connection_writers,172672},
          {connection_channels,885912},
          {connection_other,462824},
          {queue_procs,3540912},
          {queue_slave_procs,914432},
          {plugins,260088},
          {other_proc,14944048},
          {metrics,329008},
          {mgmt_db,2254080},
          {mnesia,209232},
          {other_ets,2330208},
          {binary,189140848},
          {msg_index,29232},
          {code,28660602},
          {atom,1131721},
          {other_system,23465029},
          {allocated_unused,28193400},
          {reserved_unallocated,0},
          {strategy,allocated},
          {total,[{erlang,269000072},{rss,103153664},{allocated,297193472}]}]},


This breakdown is pretty consistent. I'm using the "allocated" strategy because the others seem to massively undercount RAM usage compared with what the outside world sees. Binary is 189 MB, so that's my primary consumer.
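To see how lopsided the breakdown is, the categories can be ranked by their share of the `allocated` total. A small sketch with the byte counts copied from the `rabbitmqctl status` output above; only the larger entries are included, and `ranked_shares` is an illustrative helper.

```python
# Values (bytes) copied from the `rabbitmqctl status` memory section above.
BREAKDOWN = {
    "binary": 189_140_848,
    "code": 28_660_602,
    "allocated_unused": 28_193_400,
    "other_system": 23_465_029,
    "other_proc": 14_944_048,
    "queue_procs": 3_540_912,
    "other_ets": 2_330_208,
    "mgmt_db": 2_254_080,
}
ALLOCATED = 297_193_472  # {allocated, ...} from the `total` entry


def ranked_shares(breakdown, total):
    """Return (category, bytes, fraction of total) tuples, largest first."""
    rows = [(name, size, size / total) for name, size in breakdown.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, size, share in ranked_shares(BREAKDOWN, ALLOCATED):
    print(f"{name:18} {size / 2**20:8.1f} MiB {share:6.1%}")
```

`binary` comes out at roughly 64% of the allocated total, with every other category an order of magnitude smaller.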

From the memory guide: "Binaries: Most of this section is usually message bodies and properties (metadata)."

I'm not sure how that squares with the other page that says I have around 200 messages. Is that implying ~1 MB of body + metadata per message?
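The per-message question can be checked with simple division. This sketch divides the `binary` figure by the message counts mentioned in the thread and compares the result with the ~22 kB queue size from the first message (assuming 1 kB = 1000 bytes). It only quantifies the gap; it does not explain it.

```python
BINARY_BYTES = 189_140_848  # `binary` from rabbitmqctl status
QUEUE_BYTES = 22_000        # "approx. 22kb" for the whole queue (1 kB = 1000 B assumed)


def bytes_per_message(total_bytes, messages):
    """Average bytes per message if all of total_bytes belonged to the queue."""
    return total_bytes / messages


for count in (200, 269):
    implied = bytes_per_message(BINARY_BYTES, count)
    actual = bytes_per_message(QUEUE_BYTES, count)
    print(f"{count} msgs: binary implies {implied / 1e6:.2f} MB/msg, "
          f"queue stats say {actual:.0f} B/msg")
```

With 200 messages the `binary` figure would indeed imply close to 1 MB per message, several thousand times what the queue statistics report.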

The Memory Details breakdown is likewise inscrutable: Binaries = 168 MB, but Binary references says "231kb Total referenced binaries" at last update.



Queues and queue properties say that the total usage should be < 1MB. Largest queue (including largest message body bytes):


Other queues all look about like this:



The only metric that looks a little odd is GC operations and bytes reclaimed, which show about 600 GC operations and 5 MB reclaimed per second, pretty much at steady state. That feels high, but I don't know whether it's a result of the alarm flapping or expected. CPU usage is reasonable, and the background GC running every 60 seconds doesn't cause any noticeable spikes in latency or throughput.

Am I missing any other metrics to be able to debug this, or is there anything I should set in config / flags to get more details?

 - Ryan





Luke Bakken

Jul 30, 2018, 10:19:59 PM
to rabbitmq-users
Hi Ryan,

I re-read your previous messages but I don't see where you mention the RabbitMQ and Erlang version you're using.

Also, can you reproduce this memory usage on a single node outside of k8s?

Using Erlang 20.3.8.2 and RabbitMQ 3.7.7 on OS X, a freshly started RMQ server with only rabbitmq_management enabled, no other config changes and no clustering uses about 85-90 MiB. Running PerfTest with the following args over a long period shows memory increasing only very slowly, as expected while messages are queued in memory:

--producers 1 --consumers 1 --rate 2 --consumer-rate 1
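With those arguments the producer publishes faster than the consumer consumes, so the backlog grows steadily, which matches memory creeping up slowly. A sketch that rebuilds the same argument list and estimates the backlog, assuming `--rate` and `--consumer-rate` are per-producer and per-consumer limits in messages per second; `perftest_args` and `backlog_growth` are illustrative helpers, not part of PerfTest.

```python
def perftest_args(producers, consumers, rate, consumer_rate):
    """PerfTest flags matching the run described above."""
    return [
        "--producers", str(producers),
        "--consumers", str(consumers),
        "--rate", str(rate),
        "--consumer-rate", str(consumer_rate),
    ]


def backlog_growth(producers, consumers, rate, consumer_rate, seconds):
    """Messages accumulated after `seconds`, assuming per-client rate limits."""
    net_per_second = producers * rate - consumers * consumer_rate
    return max(net_per_second, 0) * seconds


print(" ".join(perftest_args(1, 1, 2, 1)))
print(backlog_growth(1, 1, 2, 1, seconds=3600), "messages queued after one hour")
```

At a net 1 message/second, an hour of this run queues about 3600 messages.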

So, in your case, I would "start simple" and see what change increases memory usage.

Thanks,
Luke

Ryan Moore

Jul 30, 2018, 10:57:15 PM
to rabbitm...@googlegroups.com
Running rabbitmq:3.7.7 container, so whatever Erlang comes with that.

I'll take a look at a single node, but the entire point of my setup is to get it to run in k8s with hard memory limits. I had a non-HA setup that worked, but I couldn't convince it to run happily in less than a gig of RAM. That's the main reason for all the "optimizations" here: to reduce memory usage.

PerfTest would be a good data point; I'll see what it shows when I run it without any other producers / consumers on the same config that's blowing up.

 - Ryan




Luke Bakken

Jul 31, 2018, 8:42:14 AM
to rabbitmq-users
Hi Ryan,

For what it's worth, the RabbitMQ core team doesn't maintain or contribute to Docker or other containers for RabbitMQ. Since you're using RMQ 3.7.7, I will also test with Erlang 21, which that version supports.

I'm very surprised to hear that you had a non-HA setup that required that much RAM. Is your statement ("less than a gig of RAM") referring to what RabbitMQ required or the whole thing, container included?

Either way, I'll see what memory use is like in my environment with two and three nodes.

Thanks,
Luke

Luke Bakken

Aug 7, 2018, 12:50:34 PM
to rabbitmq-users
Hi Ryan,

For what it's worth, memory and process usage with more than one node is hardly more than with a single node.

Ryan Moore

Aug 7, 2018, 3:02:22 PM
to rabbitmq-users
I haven't been able to reproduce the workload properly in isolation.

The HA config with no producers or consumers uses about what you indicated, 80-90MB per node, with a small spike when they first sync up.

Running bare PerfTest (no parameters other than host) chugs along at several thousand messages per second. I didn't grab memory usage, but it wasn't much more than baseline, similar to the test below.

Running PerfTest on the HA config with 5 producers and 5 consumers at a rate of 5 each (25 messages/second) for a few minutes spikes memory to 120 MB on 2 of the 3 nodes (the HA policy mirrors each queue to exactly 2 nodes), but it doesn't throttle or block on the high watermark. The memory layout looks normal, much closer to the expected / documented usage than my 90% "generic binaries" usage.

How can I see the content of messages in flight? I tried `rabbitmqadmin get queue=celery`, but even with 228 messages showing in that queue in `rabbitmqadmin list queues`, I always get "No items".

 - Ryan

Michael Klishin

Aug 7, 2018, 3:28:53 PM
to rabbitm...@googlegroups.com
`rabbitmqadmin` has to be given a virtual host to use.


Michael Klishin

Aug 7, 2018, 3:29:35 PM
to rabbitm...@googlegroups.com
How do we define "in flight" here? There is no way to list messages that have been parsed
but not yet routed anywhere, for example. You can only list what's enqueued.


Ryan Moore

Aug 7, 2018, 4:56:29 PM
to rabbitmq-users
http://www.rabbitmq.com/management-cli.html says I should be able to get a message via
rabbitmqadmin get queue=test requeue=false

In my case it's queue=celery and I don't care about requeue. There is no mention of a vhost - is the documentation incorrect?

root@rabbitmq-2:/# ./rabbitmqadmin get queue=celery
No items

root@rabbitmq-2:/# ./rabbitmqadmin list queues
+--------------------------------------------------+----------+
|                       name                       | messages |
+--------------------------------------------------+----------+
| celery                                           | 226      |
| celery@celery-REDACTED-g737d.celery.pidbox       | 0        |
| celery@celery-REDACTED-r35hs.celery.pidbox       | 0        |
| celeryev.REDACTED-4f46-453e-a250-1906d01fafc2    | 0        |
| celeryev.REDACTED-8509-45bf-bd70-3e6e2da0fdf8    | 0        |
+--------------------------------------------------+----------+


I don't know what terminology I should be using: I want to see one of the 226 "messages" shown in the celery queue.

It doesn't matter if it's destructive and consumes / drops the message once it shows it to me, I just want to see what kind of message is being produced such that these 226 messages exhaust a 300MB instance. 

 - Ryan

Michael Klishin

Aug 7, 2018, 5:05:58 PM
to rabbitm...@googlegroups.com
Doc guides do not cover every single option a CLI tool supports. `rabbitmqadmin --help` provides
a more detailed list.

You can consume the messages and requeue them, either manually or using the Firehose mechanism [1][2].
The latter puts some additional load on the cluster and is not meant to be enabled at all times.
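One way to do the manual consume-and-requeue is the management HTTP API's `POST /api/queues/{vhost}/{name}/get` endpoint, where the vhost is part of the URL (the default vhost `/` must be URL-encoded as `%2F`). A minimal sketch, assuming the default vhost, the queue name `celery` from the listing above, a hypothetical management listener at `http://localhost:15672`, and placeholder `guest`/`guest` credentials. The `ackmode` value `ack_requeue_true` asks the broker to requeue the messages it returns; as an assumption, this field name follows the RabbitMQ 3.7 HTTP API reference, and older versions used a different request body.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:15672"  # hypothetical management listener


def get_messages_request(vhost, queue, count=1, user="guest", password="guest",
                         base_url=BASE_URL):
    """Build a POST to /api/queues/{vhost}/{queue}/get that requeues what it reads."""
    path = "/api/queues/{}/{}/get".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(queue, safe="")
    )
    body = {
        "count": count,
        "ackmode": "ack_requeue_true",  # put messages back after returning them
        "encoding": "auto",             # UTF-8 payloads as strings, otherwise base64
        "truncate": 50000,              # truncate payloads larger than this many bytes
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"content-type": "application/json",
                 "authorization": "Basic " + token},
    )


def peek(vhost, queue, count=1):
    """Send the request and return the decoded list of messages."""
    with urllib.request.urlopen(get_messages_request(vhost, queue, count)) as resp:
        return json.loads(resp.read())


# Example (requires a reachable node):
# for msg in peek("/", "celery"):
#     print(msg["payload_bytes"], msg["properties"])
```

Each returned message includes its payload size, which is the number that matters for reconciling the queue contents with the `binary` memory figure.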

