Rabbit MQTT memory exhaustion


Victor Martin

May 30, 2019, 6:51:50 AM
to rabbitm...@googlegroups.com
Hi, 

We have a dedicated Windows server running RabbitMQ 3.7.15 / Erlang 22.0 handling around 9k MQTT connections.

Every other day it crashes with the message "eheap_alloc: Cannot allocate 18446744073471590968 bytes of memory (of type "heap") ...". It seems there is a memory leak somewhere:

Captura de pantalla 2019-05-30 a las 9.54.21.png

We've tried the "tuning for a large number of connections" settings in the MQTT plugin config, but it hasn't improved things:

{tcp_listen_options, [
 {backlog,       4096},
 {nodelay,       true},
 {linger,        {true,0}},
 {exit_on_close, false},
 {buffer,        1024},
 {sndbuf,        1024},
 {recbuf,        1024}
]}
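
For context, those listener options go under the rabbitmq_mqtt application in the classic Erlang-terms config; the nesting looks roughly like this (a sketch only, values as above):

%% advanced.config / rabbitmq.config sketch: tcp_listen_options apply to the
%% MQTT listeners when nested under the rabbitmq_mqtt application.
[
 {rabbitmq_mqtt, [
   {tcp_listen_options, [
     {backlog,       4096},
     {nodelay,       true},
     {linger,        {true,0}},
     {exit_on_close, false},
     {buffer,        1024},
     {sndbuf,        1024},
     {recbuf,        1024}
   ]}
 ]}
].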

We also enabled the background GC collector, with no luck either:

 {background_gc_enabled, true},
 {background_gc_target_interval, 60000}
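
Those GC switches sit under the core rabbit application in the same file, roughly like this (sketch):

%% Sketch: background GC settings belong to the core "rabbit" application.
[
 {rabbit, [
   {background_gc_enabled, true},
   {background_gc_target_interval, 60000}   %% milliseconds
 ]}
].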

This is what the memory looks like in the management UI:

Captura de pantalla 2019-05-30 a las 12.50.26.png

Any ideas on how to reduce memory consumption?

Thanks,
Victor

Luke Bakken

May 30, 2019, 10:24:46 AM
to rabbitmq-users
Hi Victor,

The memory graph screenshot you posted is very hard to read due to its size. It looks like I can see "26GiB plugins" in the output; is that correct?

If that is the case, can you elaborate on how you're using MQTT? Large messages? Are your consumers keeping up?

As memory increases, does disabling the MQTT plugin cause memory usage to drop and possibly return to normal?
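
Disabling and re-enabling the plugin is a quick test from the sbin directory, roughly like this on Windows (note that disabling it will drop your existing MQTT client connections):

.\rabbitmq-plugins disable rabbitmq_mqtt
# observe memory for a while, then:
.\rabbitmq-plugins enable rabbitmq_mqtt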

Thanks -
Luke

Victor Martin

May 30, 2019, 11:14:45 AM
to rabbitm...@googlegroups.com
Yes, 

Consumers are keeping up great; no messages are piling up on any queue.

We use MQTT to process messages sent by 9k IoT devices. The messages are very small (<1 KB each), and the rate is maybe 10 per minute on average. Find attached the JSON info for one of these 9k MQTT clients; it looks pretty good to me (file queue_info.json).

This is the original screenshot in a better resolution:

Captura de pantalla 2019-05-30 a las 12.50.26.png
And here is how it looks now (8 hours later). See how memory consumption is growing steadily (+10 GB):

Captura de pantalla 2019-05-30 a las 17.10.25.png



Potentially related to this, we've seen that the web UI freezes when accessing the queues section (I guess due to the 9k queues there). The same thing happens when we query the /api/queues REST API, which we do every few seconds to see which devices have lost connectivity with RabbitMQ... Could this be related to the memory issue?
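
For reference, the connectivity check is essentially a call like this every few seconds (host, credentials and the column filter are placeholders, not our exact query):

curl -s -u monitor_user:monitor_pass "http://rabbit-host:15672/api/queues?columns=name,consumers"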

Thanks,
Victor


queue_info.json

Luke Bakken

May 30, 2019, 11:33:28 AM
to rabbitmq-users
Hi Victor,

Thanks for providing those images. I suggest disabling your /api/queues query for a day or so to see if that stabilizes memory usage.

Also, could you run this command and provide the output? If running the command reduces memory use, please let us know.

rabbitmqctl eval 'recon:bin_leak(10).'

Thanks,
Luke


Victor Martin

May 30, 2019, 11:49:40 AM
to rabbitm...@googlegroups.com
Hi Luke, 

Thanks for your help. This is the output for the bin_leak command:

PS C:\Program Files\RabbitMQ Server\rabbitmq_server-3.7.15\sbin> .\rabbitmqctl eval 'recon:bin_leak(10).'
[{<10308.1792.0>,-2587656,
  [rabbit_mgmt_db_cache_queues,
   {current_function,{orddict,find,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.534.0>,-39403,
  [channel_queue_exchange_metrics_metrics_collector,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.539.0>,-33256,
  [queue_metrics_metrics_collector,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.540.0>,-26995,
  [queue_coarse_metrics_metrics_collector,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.536.0>,-17813,
  [channel_exchange_metrics_metrics_collector,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.535.0>,-12606,
  [channel_queue_metrics_metrics_collector,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.666.0>,-8000,
  [{current_function,{gen_server2,process_next_msg,1}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.23971.0>,-1979,
  [{current_function,{gen_server2,process_next_msg,1}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.473.0>,-585,
  [rabbit_web_dispatch_registry,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<10308.23974.0>,-508,
  [{current_function,{gen_server2,process_next_msg,1}},
   {initial_call,{proc_lib,init_p,5}}]}]
PS C:\Program Files\RabbitMQ Server\rabbitmq_server-3.7.15\sbin>

It seems to have freed around 10 GB:

Captura de pantalla 2019-05-30 a las 17.45.37.png

We could run a scheduled task to call it every 2 hours or so, if that makes sense.

Thanks,
Victor


Luke Bakken

May 30, 2019, 12:35:25 PM
to rabbitmq-users
Hi Victor,

You can see that the command freed 2,587,656 binaries held by the rabbit_mgmt_db_cache_queues cache. This cache is used to speed up the /api/queues query, but in your case it is causing issues because of the number of queues you have and how long the query takes to run. The cache times how long it takes to fetch the data, then stores the result with an expiration timeout related to that fetch time. The cache key is the list of queues, which in your case is probably changing; that compounds the problem, since each change invalidates the cache.

You can disable this caching by using the attached advanced.config file. Please note that this will affect how long your queue queries take, so I suggest running the /api/queues query much less frequently. If you can track connectivity some other way that would be preferable.
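
The relevant part of that file looks roughly like this (classic Erlang-terms format); a multiplier of 0 means cached results expire immediately, effectively disabling the cache:

[
 {rabbitmq_management, [
   %% cache TTL is derived from the time taken to fetch /api/queues data
   %% multiplied by this value; 0 effectively disables the cache.
   {management_db_cache_multiplier, 0}
 ]}
].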


Thanks,
Luke
advanced.config

Victor Martin

May 30, 2019, 12:43:15 PM
to rabbitm...@googlegroups.com
Hi Luke, 

Thank you very much for your help. We'll try to disable this caching and find another way to check connectivity. 

Best,
Victor


Victor Martin

Jun 3, 2019, 10:45:49 AM
to rabbitm...@googlegroups.com
Hi, 

Unfortunately setting management_db_cache_multiplier = 0 didn't help with erl.exe memory consumption:

Captura de pantalla 2019-06-03 a las 16.42.27.png

Any other ideas on where to look? We have stopped using the web UI and querying the REST API, but memory consumption is still sky-high.

Victor

Luke Bakken

Jun 3, 2019, 10:51:31 AM
to rabbitmq-users
Hi Victor,

I am assuming that when you re-visit the management UI, the "plugins" memory stat still shows the greatest value. Could you take another screenshot of that part of the UI?

It would be very helpful to re-run this command during a memory spike and provide the output:

rabbitmqctl eval 'recon:bin_leak(10).'

The reason is that the output of that command tells us what is taking up memory.

Could you also run this command, redirected to a file, and attach the file to your response? (don't paste the output)

rabbitmqctl environment
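
On Windows, from the sbin directory you used earlier, that would be something like this (the file names are only suggestions):

.\rabbitmqctl eval 'recon:bin_leak(10).' > rabbit_binleak_output.txt
.\rabbitmqctl environment > rabbitmq_environment.txt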

Finally, do you monitor other RabbitMQ statistics? https://www.rabbitmq.com/monitoring.html

We may be able to correlate these memory spikes with some other statistic, like queue length.


Victor Martin

Jun 3, 2019, 11:08:08 AM
to rabbitm...@googlegroups.com
Hi Luke, 

Thanks again for your help, very much appreciated :)

This is what the memory looked like BEFORE running the rabbitmqctl eval 'recon:bin_leak(10).' command:

Captura de pantalla 2019-06-03 a las 16.54.36.png
and this is what it looks like right AFTER executing the command (output attached):

Captura de pantalla 2019-06-03 a las 17.00.51.png
Please find attached the rabbitmqctl environment output as requested. We do monitor RabbitMQ, and as far as we can tell traffic is pretty stable and not correlated at all with memory consumption:

Captura de pantalla 2019-06-03 a las 17.04.00.png

Thanks,
Victor

rabbit_binleak_output.txt
rabbitmq_environment.txt

Luke Bakken

Jun 3, 2019, 11:37:33 AM
to rabbitmq-users
Hi Victor,

Yes, it still appears that the management database is taking up this memory. Would it be possible for you to run this environment with the rabbitmq_management plugin disabled for a while? It would be great to confirm that as the source of this issue.
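
For reference, that should just be a one-liner from the sbin directory (and it can be reversed the same way afterwards):

.\rabbitmq-plugins disable rabbitmq_management
# later, to turn the UI back on:
.\rabbitmq-plugins enable rabbitmq_management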

Victor Martin

Jun 3, 2019, 11:43:49 AM
to rabbitm...@googlegroups.com
Yes, I will disable the management plugin for a few days and let you know.

Best,
Victor


Victor Martin

Jun 5, 2019, 9:39:58 AM
to rabbitm...@googlegroups.com
Just for the record, disabling the rabbitmq_management plugin has been the solution to the huge memory consumption.

After disabling the plugin, memory is super stable and even CPU usage has been vastly improved:

Captura de pantalla 2019-06-05 a las 15.12.35.png

The only downside is that we need to change our monitoring... we're trying the RabbitMQ CLI tools, as explained in https://www.rabbitmq.com/monitoring.html, but that's a different story :)
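
Concretely, we're experimenting with checks along these lines via the CLI (a sketch, not our final monitoring setup):

.\rabbitmqctl status
.\rabbitmqctl list_queues name messages consumers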


Thanks so much, Luke, for your world-class support!

Cheers,
Victor

Luke Bakken

Jun 5, 2019, 11:30:44 AM
to rabbitmq-users
Hi Victor,

We appreciate you testing this out and following up.

Just to be certain: you see this memory growth even if nothing is querying the management plugin's data, via either the management UI or the API, correct?

If that's the case, I'm surprised by this memory growth. Do you have a lot of "churn" in your system: new queues, bindings, or MQTT connections created / destroyed frequently? Anything else you can think of? I'm trying to figure out how I could reproduce this.

Is this a single-node environment? It seems that way but I would like to confirm.

Thanks -
Luke
