Experiencing high memory use immediately after starting Rabbit service (Windows) that never resolves

Ryan Zink

Aug 11, 2016, 7:04:48 PM
to rabbitmq-users
I have a RabbitMQ Windows server where the memory usage grows immediately to the configured memory limit and then stops accepting connections with the "Publishers will be blocked until this alarm clears" log entry. It takes about 2 minutes for the system to go from service start to 90% memory saturation.

I think I have ruled out the Rabbit install and configuration (I'm using the default rabbit config file) by uninstalling both Erlang and Rabbit and upgrading both. I see the same results when the new Rabbit install attempts to start with the existing mnesia database.

Here's all I get in the RabbitMQ log:

=INFO REPORT==== 11-Aug-2016::11:25:09 ===
Starting RabbitMQ 3.5.4 on Erlang 17.5
Copyright (C) 2007-2015 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 11-Aug-2016::11:25:09 ===
node           : rabbit@COMPUTER
home dir       : C:\Windows
config file(s) : e:/RabbitMQ/rabbitmq.config
cookie hash    : /VDlrgwkq54joF+mO+xxrQ==
log            : E:/RabbitMQ/log/rab...@COMPUTER.log
sasl log       : E:/RabbitMQ/log/rab...@COMPUTER-sasl.log
database dir   : e:/RabbitMQ/db/rabbit@COMPUTER-mnesia

=WARNING REPORT==== 11-Aug-2016::11:25:09 ===
Kernel poll (epoll, kqueue, etc) is disabled. Throughput and CPU utilization may worsen.

=INFO REPORT==== 11-Aug-2016::11:25:10 ===
Memory limit set to 26213MB of 32767MB total.

=INFO REPORT==== 11-Aug-2016::11:25:10 ===
Disk free limit set to 50MB

=INFO REPORT==== 11-Aug-2016::11:25:10 ===
Limiting to approx 8092 file handles (7280 sockets)

=INFO REPORT==== 11-Aug-2016::11:25:10 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 11-Aug-2016::11:25:10 ===
Management plugin: using rates mode 'basic'

=INFO REPORT==== 11-Aug-2016::11:25:11 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 11-Aug-2016::11:25:11 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 11-Aug-2016::11:25:11 ===
msg_store_persistent: rebuilding indices from scratch

=INFO REPORT==== 11-Aug-2016::11:27:12 ===
vm_memory_high_watermark set. Memory used:27499918896 allowed:27487361433

=WARNING REPORT==== 11-Aug-2016::11:27:32 ===
memory resource limit alarm set on node rabbit@COMPUTER.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

And the alarm never clears and memory usage stays maxed out.

Attempting to perform any rabbitmqctl commands crashes the instance of erl.exe that opens, so I can't easily query state when the node gets to this point.

Is there some way to tell what is causing this resource utilization and how to resolve it so the service can start? The mnesia database is not that large... it is around 8 GB, and the available memory on the machine is 32GB (with vm_memory_high_watermark set to .8).
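
For a sense of scale, the figures in the log above can be checked with a few lines of Python. This is only a rough sketch: the constants are copied by hand from the log excerpt, the helper name is made up for illustration, and nothing here queries a running node.

```python
from datetime import datetime

# Values copied from the log excerpt above.
BOOT_TIME = "11-Aug-2016::11:25:09"    # "Starting RabbitMQ 3.5.4 on Erlang 17.5"
ALARM_TIME = "11-Aug-2016::11:27:12"   # "vm_memory_high_watermark set"
MEMORY_USED = 27_499_918_896           # bytes, from the alarm entry
MEMORY_ALLOWED = 27_487_361_433        # bytes, from the alarm entry

LOG_TIME_FORMAT = "%d-%b-%Y::%H:%M:%S"
MIB = 1024 * 1024


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two timestamps written in the log's format."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


elapsed = seconds_between(BOOT_TIME, ALARM_TIME)
overshoot_mib = (MEMORY_USED - MEMORY_ALLOWED) / MIB
# Average rate, assuming memory use started near zero at boot.
growth_mib_per_s = MEMORY_USED / MIB / elapsed

print(f"boot to alarm: {elapsed:.0f} s")
print(f"over the limit by: {overshoot_mib:.1f} MiB")
print(f"average growth: {growth_mib_per_s:.0f} MiB/s")
```

The average rate is a crude upper-level estimate, since it assumes growth was steady from the first log entry to the alarm.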

Michael Klishin

Aug 12, 2016, 3:36:50 AM
to rabbitm...@googlegroups.com
Hi Ryan,

There is, and it comes up regularly on this list: `rabbitmqctl status` and rabbitmq-top.

You are running a version that forces an in-process file system cache and has other issues that can lead to a lot of
data being loaded on node boot, in particular if it has to sync from other cluster members.


Consider upgrading to at least 3.5.7.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Ryan Zink

Aug 12, 2016, 8:06:16 AM
to rabbitmq-users
Hey Michael,

Thanks for the response.

A couple of issues here... because the memory usage ramps up so quickly, we are never able to load and get into the management plugin, so I doubt the top plugin would be able to help us out. Also, all calls to rabbitmqctl crash.

We have tried upgrading to 3.6.2, with nearly identical results -- memory immediately going to maximum and being throttled. I also tried setting up the lazy queues policy on all queues to see if that would help with the memory usage on startup, with no visible effect.

Do you have any other thoughts on how to resolve this? I'll try to see if I can get into the top plugin quickly while it is ramping up.

Michael Klishin

Aug 12, 2016, 8:45:04 AM
to rabbitm...@googlegroups.com
Skip 3.6.2, go to 3.6.5.

Does `rabbitmqctl status` not produce anything after boot? I don't think I've seen this with any
of the known issues.


Michael Klishin

Aug 12, 2016, 9:43:59 AM
to rabbitm...@googlegroups.com
There was a relevant bug fix in 3.5.3 but you are on a later version :/

What does `rabbitmqctl environment` output?

Ryan Zink

Aug 12, 2016, 10:43:27 AM
to rabbitmq-users
We aren't able to go to 3.6.5, as it is a bit too "bleeding-edge" for us to consider going to production.

I was able to get rabbitmqctl status and rabbitmqctl environment to run, but enabling rabbitmq_top is failing.

Here's the output of rabbitmqctl environment:
Application environment of node rabbit@COMPUTER ...
[{amqp_client,[{prefer_ipv6,false},{ssl_options,[]}]},
 {compiler,[]},
 {crypto,[]},
 {inets,[]},
 {kernel,[{error_logger,tty},
          {inet_default_connect_options,[{nodelay,true}]},
          {inet_dist_listen_max,25672},
          {inet_dist_listen_min,25672}]},
 {mnesia,[{dc_dump_limit,40},
          {dir,"E:/RabbitMQ/db/rabbit@COMPUTER-mnesia"},
          {dump_log_write_threshold,2000}]},
 {os_mon,[{start_cpu_sup,false},
          {start_disksup,false},
          {start_memsup,false},
          {start_os_sup,false}]},
 {rabbit_common,[]},
 {ranch,[]},
 {sasl,[{errlog_type,error},{sasl_error_logger,false}]},
 {stdlib,[]},
 {syntax_tools,[]},
 {xmerl,[]}]

And here's rabbitmqctl status:
Status of node rabbit@COMPUTER ...
[{pid,6052},
 {running_applications,[{amqp_client,"RabbitMQ AMQP Client","3.6.2"},
                        {rabbit_common,[],"3.6.2"},
                        {os_mon,"CPO  CXC 138 46","2.4"},
                        {compiler,"ERTS  CXC 138 10","6.0"},
                        {syntax_tools,"Syntax tools","1.7"},
                        {inets,"INETS  CXC 138 49","6.0"},
                        {crypto,"CRYPTO","3.6"},
                        {xmerl,"XML parser","1.3.8"},
                        {mnesia,"MNESIA  CXC 138 12","4.13"},
                        {ranch,"Socket acceptor pool for TCP protocols.",
                               "1.2.1"},
                        {sasl,"SASL  CXC 138 11","2.5"},
                        {stdlib,"ERTS  CXC 138 10","2.5"},
                        {kernel,"ERTS  CXC 138 10","4.0"}]},
 {os,{win32,nt}},
 {erlang_version,"Erlang/OTP 18 [erts-7.0] [64-bit] [smp:8:8] [async-threads:64]\n"},
 {memory,[{total,219835920},
          {connection_readers,0},
          {connection_writers,0},
          {connection_channels,0},
          {connection_other,2712},
          {queue_procs,0},
          {queue_slave_procs,0},
          {plugins,0},
          {other_proc,37560560},
          {mnesia,1854968},
          {mgmt_db,0},
          {msg_index,465184},
          {other_ets,1593840},
          {binary,140513680},
          {code,27746983},
          {atom,992409},
          {other_system,9105584}]},
 {alarms,[]},
 {listeners,[]},
 {vm_memory_high_watermark,0.8},
 {vm_memory_limit,27487361433},
 {disk_free_limit,50000000},
 {disk_free,24806293504},
 {file_descriptors,[{total_limit,32668},
                    {total_used,16},
                    {sockets_limit,29399},
                    {sockets_used,0}]},
 {processes,[{limit,1048576},{used,144}]},
 {run_queue,0},
 {uptime,36},
 {kernel,{net_ticktime,60}}]
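
To read the memory section of the status output above more easily, the categories can be ranked by size. The sketch below is illustrative only: the numbers are copied by hand from that output, and `rank_categories` is a made-up helper, not part of any RabbitMQ tooling.

```python
# Memory breakdown copied from the `rabbitmqctl status` output above (bytes).
MEMORY_TOTAL = 219_835_920
MEMORY_BREAKDOWN = {
    "connection_readers": 0,
    "connection_writers": 0,
    "connection_channels": 0,
    "connection_other": 2_712,
    "queue_procs": 0,
    "queue_slave_procs": 0,
    "plugins": 0,
    "other_proc": 37_560_560,
    "mnesia": 1_854_968,
    "mgmt_db": 0,
    "msg_index": 465_184,
    "other_ets": 1_593_840,
    "binary": 140_513_680,
    "code": 27_746_983,
    "atom": 992_409,
    "other_system": 9_105_584,
}


def rank_categories(breakdown, total):
    """Return (name, bytes, percent of total) tuples, largest first."""
    ranked = sorted(breakdown.items(), key=lambda item: item[1], reverse=True)
    return [(name, size, 100.0 * size / total) for name, size in ranked]


ranked = rank_categories(MEMORY_BREAKDOWN, MEMORY_TOTAL)
for name, size, percent in ranked[:3]:
    print(f"{name:<12} {size:>12,d} bytes  {percent:5.1f}%")
```

At 36 seconds of uptime the total is still far below the configured limit, so this snapshot predates the growth; the same ranking would be more informative on a snapshot taken while memory is climbing.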

I am only able to get status and environment calls to run immediately after starting the service. After that, memory usage goes through the roof and all future calls fail:

Michael Klishin

Aug 12, 2016, 10:47:02 AM
to rabbitm...@googlegroups.com
Well, RabbitMQ wasn't even running at the time `rabbitmqctl status` was executed.
What does your config look like?

I see no justification for staying on a 3.5.x version that's not 3.5.7.


Michael Klishin

Aug 12, 2016, 10:49:50 AM
to rabbitm...@googlegroups.com
Err, so this is with 3.6.2 but one of the earlier logs says 3.5.4. So what version are you on?

We *highly* recommend avoiding 3.6.2 as it has a really bad mirror failover bug and 2 known memory leaks
(that are slow and cannot consume 20 GB in a minute).

So this sounds like runaway loading of on-disk data, but you claim that there isn't enough
data on disk to consume 32 GB. I'm out of ideas then.

Ryan Zink

Aug 12, 2016, 10:52:08 AM
to rabbitmq-users
We're already planning on moving from 3.5.4 to 3.6.2. This issue became apparent when we did the upgrade in a dev environment.

The config file is pretty standard... it looks like this:
[
  {rabbit, [
    {log_levels, [{connection, error}, {channel, error}]},
    {vm_memory_high_watermark, 0.8},
    {vm_memory_high_watermark_paging_ratio, 0.75},
    {collect_statistics_interval, 60000}
  ]},
  {mnesia, [
    {dump_log_write_threshold, 2000},
    {dc_dump_limit, 40}
  ]},
  {rabbitmq_management, [
    {rates_mode, basic}
  ]}
].

When trying to run rabbitmqctl status after the node has reached high memory usage, all I get is the nodedown error.
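
The two memory settings in the config above can be turned into absolute numbers. This is a minimal sketch, assuming the paging ratio is applied to the watermark limit (the point at which queues begin paging messages to disk); the limit in bytes is taken from the `rabbitmqctl status` output earlier in the thread, and the function name is hypothetical.

```python
# Settings copied from the config file above.
VM_MEMORY_HIGH_WATERMARK = 0.8
PAGING_RATIO = 0.75
# Limit reported by `rabbitmqctl status` earlier in the thread (bytes).
VM_MEMORY_LIMIT = 27_487_361_433

GIB = 1024 ** 3


def paging_threshold(limit_bytes: float, paging_ratio: float) -> float:
    """Bytes of use at which paging to disk would start (assumed semantics)."""
    return limit_bytes * paging_ratio


# Total memory the node must have detected for the reported limit to be 0.8 of it.
implied_total = VM_MEMORY_LIMIT / VM_MEMORY_HIGH_WATERMARK
threshold = paging_threshold(VM_MEMORY_LIMIT, PAGING_RATIO)

print(f"implied total memory: {implied_total / GIB:.2f} GiB")
print(f"watermark limit:      {VM_MEMORY_LIMIT / GIB:.2f} GiB")
print(f"paging threshold:     {threshold / GIB:.2f} GiB")
```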

Michael Klishin

Aug 12, 2016, 10:55:52 AM
to rabbitm...@googlegroups.com
Again, I'd like to strongly advise against using 3.6.2 even in development environments. Use 3.6.3 or 3.6.4 (in case
Erlang 19.0 support or /api/overview rates aren't important) if 3.6.5 is too "cutting edge".

What's the size of the node data directory? Does it start if you lower VM memory watermark (even if an alarm goes into effect)?


Michael Klishin

Aug 12, 2016, 10:56:36 AM
to rabbitm...@googlegroups.com
Another thing to try is Erlang 18.3.4. Since this is a development environment, it shouldn't take too long to compare.

Michael Klishin

Aug 12, 2016, 11:00:35 AM
to rabbitm...@googlegroups.com
Erlang has an app called Observer which is very similar to e.g. Visual VM. It provides memory
usage breakdown for individual processes.

It can be started with

rabbitmqctl eval 'observer:start().'

and requires Erlang to be compiled with wxWidgets support. I haven't tried it on Windows.

Ryan Zink

Aug 12, 2016, 11:32:35 AM
to rabbitmq-users
What specific changes in 3.6.3 or 3.6.4 or a newer version of Erlang would improve this? I see 3.6.3 includes rabbitmq-top, which will simplify our packaging. Is there a critical bug in 3.6.2 that would cause you not to recommend its promotion to production?

Michael Klishin

Aug 12, 2016, 12:01:23 PM
to rabbitm...@googlegroups.com
server#812.

Ryan Zink

Aug 12, 2016, 1:43:10 PM
to rabbitmq-users
OK, an upgrade to 3.6.3 had no impact.

Ryan Zink

Aug 12, 2016, 2:22:17 PM
to rabbitmq-users
When the service is stopped, the only data in the mnesia database is in the queues folder, where there is 8.66 GB of data.

I dropped the config file memory watermark down to .4 and still see the memory usage creep up to 80% after a service restart. Is a service reinstall required to make those changes take effect?
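
As a rough cross-check of these numbers, the limit that each watermark value would produce can be compared with the size of the queue data on disk. Two assumptions are baked into this sketch: total memory is inferred from the 0.8 limit reported by `rabbitmqctl status` earlier in the thread, and the 8.66 GB figure is treated as binary gigabytes. `watermark_limit` is a hypothetical helper.

```python
GIB = 1024 ** 3

# Assumption: total memory implied by the 0.8 limit in the earlier status output.
TOTAL_MEMORY = 27_487_361_433 / 0.8
# Assumption: the reported 8.66 GB is measured in binary gigabytes.
QUEUE_DATA = 8.66 * GIB


def watermark_limit(total_bytes: float, watermark: float) -> float:
    """Absolute memory limit in bytes for a relative watermark."""
    return total_bytes * watermark


for watermark in (0.4, 0.8):
    limit = watermark_limit(TOTAL_MEMORY, watermark)
    ratio = limit / QUEUE_DATA
    print(f"watermark {watermark}: limit {limit / GIB:.1f} GiB, "
          f"{ratio:.1f}x the on-disk queue data")
```

Under these assumptions the on-disk data is smaller than even the 0.4 limit. If the node still climbs to roughly 80% of RAM with 0.4 configured, that would suggest either that the new value was not picked up or that the growth is not bounded by the watermark; the thread does not settle which.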

Michael Klishin

Aug 13, 2016, 12:48:19 AM
to rabbitm...@googlegroups.com
Can you please provide a directory listing for your database directory?
There certainly should be stuff other than the message store directories.

An answer to your server reconfiguration question is available in the docs.