RabbitMQ cluster nodes constantly crashing after several hours of uptime due to out of memory


Marcus Kröger

Mar 7, 2016, 11:26:54 AM
to rabbitmq-users
Hi

we are running RabbitMQ 3.6.0 with Erlang 18 in a cluster with 2 nodes.

Configuration is as follows:

[{rabbit, [
  {cluster_partition_handling, autoheal},
  {disk_free_limit, 100000000},
  {ssl_listeners, [{"0.0.0.0",50000}]},
  {collect_statistics_interval, 10000},
  {vm_memory_high_watermark, {absolute, "8000MiB"}},
  {vm_memory_high_watermark_paging_ratio, 0.6},
  {auth_backends, [rabbit_auth_backend_internal,rabbit_auth_backend_ldap]},
  {ssl_options, [
   {cacertfile, "/srv/rabbitmq/config/xxx/ssl.crt/xxx.pem"},
   {certfile, "/srv/rabbitmq/config/xxx/ssl.crt/xxxcrt"},
   {keyfile, "/srv/rabbitmq/config/xxx/ssl.key/xxx.key"},
   {verify,verify_peer},
   {fail_if_no_peer_cert,true}
  ]}
 ]},
 {rabbitmq_management, [
  {http_log_dir, "/var/log/rabbitmq/xxx/management"},
  {listener, [{port, 15672}]},
  {redirect_old_port, false},
  {rates_mode,none}
 ]}
].

Even though there is not much load on the system, we very often see the message
"The management statistics database currently has a queue of x events to process. If this number keeps increasing, so will the memory used by the management plugin." in the admin GUI. The number stays in the range of 1000 to 5000 for hours, but then it grows to 2 million and above. When this happens, the cluster node no longer responds. There is basically no change in load.

We have been seeing this issue since we migrated from 3.4.2 to 3.6.0. Before that, the same setup ran for many years without any issue.

We have already set rates_mode to none, without success. Is there anything else we could look at specifically which could cause this kind of issue?
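Since the symptom is node memory growth, one way to watch for it from outside the admin GUI is to poll the management HTTP API for per-node memory and alarm state. A minimal sketch, assuming the `/api/nodes` endpoint and the `mem_used`/`mem_limit`/`mem_alarm` fields of the 3.6.x management API; host and credentials are placeholders:

```python
import json
import urllib.request

def summarize_nodes(payload: str):
    """Extract name, memory use, and alarm state from an /api/nodes response body."""
    return [(n["name"], n["mem_used"], n["mem_limit"], n["mem_alarm"])
            for n in json.loads(payload)]

def fetch_nodes(base_url: str, auth_header: str) -> str:
    """GET /api/nodes from a running management API; returns the raw JSON body."""
    req = urllib.request.Request(base_url + "/api/nodes")
    req.add_header("Authorization", auth_header)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Canned response in the same shape, for illustration (values from this thread's logs):
sample = ('[{"name": "rabbit@node1", "mem_used": 9229658632, '
          '"mem_limit": 8388608000, "mem_alarm": true}]')
for name, used, limit, alarm in summarize_nodes(sample):
    print(f"{name}: {used} of {limit} bytes, mem_alarm={alarm}")
```

Polling this from a cron job would catch the runaway growth before the node becomes unresponsive.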

Michael Klishin

Mar 7, 2016, 11:29:42 AM
to rabbitm...@googlegroups.com, Marcus Kröger
On 7 March 2016 at 19:26:56, Marcus Kröger (marcus....@gmail.com) wrote:
> Is there anything else we could look at specifically which could
> cause this kind of issue?

https://github.com/rabbitmq/rabbitmq-management/issues/41, which will be in 3.6.2.

If you can tell me what type of package you use, I’d be happy to build one from stable branch
and send you off-list.

A 3.6.2 preview release should ship this week.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Marcus Kröger

Mar 7, 2016, 12:14:14 PM
to rabbitmq-users, marcus....@gmail.com
Hi Michael

we are compiling from source, as we are using zLinux on a mainframe.

cheers
Marcus

Marcus Kröger

Mar 7, 2016, 12:15:36 PM
to rabbitmq-users, marcus....@gmail.com
would the disabling of the management plugin be a workaround for now?

We would still be able to collect monitoring data via rabbitmqctl.

cheers
Marcus
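
The rabbitmqctl route can be scripted; a minimal sketch, assuming `rabbitmqctl` is on PATH and using the standard `list_queues` columns (the parsing helper is ours, for illustration):

```python
import subprocess

def parse_list_queues(output: str):
    """Parse tab-separated output of `rabbitmqctl list_queues name messages memory`."""
    rows = []
    for line in output.strip().splitlines():
        name, messages, memory = line.split("\t")
        rows.append((name, int(messages), int(memory)))
    return rows

def list_queues():
    """Run rabbitmqctl with -q to suppress the banner; requires a running node."""
    out = subprocess.run(
        ["rabbitmqctl", "-q", "list_queues", "name", "messages", "memory"],
        capture_output=True, text=True, check=True).stdout
    return parse_list_queues(out)

# Canned output in the same format, for illustration:
print(parse_list_queues("orders\t12\t34816\nevents\t0\t9048"))
# [('orders', 12, 34816), ('events', 0, 9048)]
```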


Michael Klishin

Mar 7, 2016, 12:28:40 PM
to rabbitm...@googlegroups.com, marcus....@gmail.com

> On 7 mar 2016, at 20:15, Marcus Kröger <marcus....@gmail.com> wrote:
>
> would the disabling of the management plugin be a workaround for now?

It would. Can you run a generic UNIX binary build?

Marcus Kröger

Mar 7, 2016, 12:31:08 PM
to rabbitmq-users, marcus....@gmail.com
Running a generic UNIX binary build in our environment is not possible.

So, by disabling the management plugin we would not run into this bug at all and we could wait for 3.6.2 being officially released?

cheers
Marcus

Marcus Kröger

Mar 7, 2016, 12:46:18 PM
to rabbitmq-users, marcus....@gmail.com
And in addition, is this only happening in a cluster setup?

Michael Klishin

Mar 7, 2016, 12:52:15 PM
to rabbitm...@googlegroups.com, marcus....@gmail.com

> On 7 mar 2016, at 20:31, Marcus Kröger <marcus....@gmail.com> wrote:
>
> So, by disabling the management plugin we would not run into this bug at all and we could wait for 3.6.2 being officially released?

Yes. Or you could build the tip of stable from source
if that's what you do.

Michael Klishin

Mar 7, 2016, 12:52:51 PM
to rabbitm...@googlegroups.com, marcus....@gmail.com

> On 7 mar 2016, at 20:46, Marcus Kröger <marcus....@gmail.com> wrote:
>
> And in addition, is this only happening in a cluster setup?

Please see rabbitmq/rabbitmq-management#41.

Marcus Kröger

Mar 8, 2016, 4:46:28 AM
to rabbitmq-users, marcus....@gmail.com
Hi Michael,

thx for the fast replies. We would need the src package for

rabbitmq-server-generic-unix

We will take this package and compile a server package ourselves using

Erlang 18.2.1 - http://www.erlang.org/download/otp_src_18.2.1.tar
RabbitMQ 3.6.2 (pre final) - rabbitmq-server-generic-unix-3.6.2.tar.xz

Michael Klishin

Mar 8, 2016, 4:49:51 AM
to rabbitm...@googlegroups.com, Marcus Kröger
On 8 March 2016 at 12:46:31, Marcus Kröger (marcus....@gmail.com) wrote:
> thx for the fast replies. We would need the src package for
>
> rabbitmq-server-generic-unix
>
> We will take this package a compile a server packge ourselves
> using
>
> Erlang 18.2.1 - http://www.erlang.org/download/otp_src_18.2.1.tar
> RabbitMQ 3.6.2 (pre final) - rabbitmq-server-generic-unix-3.6.2.tar.xz

Why do you need to build your own binary package if I may ask? 

Marcus Kröger

Mar 8, 2016, 5:11:49 AM
to rabbitmq-users, marcus....@gmail.com
Hi

because we run on a mainframe using

SUSE Linux Enterprise Server 11 (s390x)
VERSION = 11
PATCHLEVEL = 3

with kernel

"Linux 3.0.101-0.31-default  s390x s390x s390x GNU/Linux"

regards
Marcus

Michael Klishin

Mar 8, 2016, 5:15:27 AM
to rabbitm...@googlegroups.com, Marcus Kröger
On 8 March 2016 at 13:11:51, Marcus Kröger (marcus....@gmail.com) wrote:
> because we run on a mainframe using
>
> SUSE Linux Enterprise Server 11 (s390x)
> VERSION = 11
> PATCHLEVEL = 3

RabbitMQ has no native code, so provided you have a supported Erlang version, the generic UNIX binary
package should work just fine.

We’ll publish a preview of 3.6.2, including source tarballs, later this week. 

Marcus Kröger

Mar 8, 2016, 5:23:31 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi Michael,

well, our zLinux department would like to stick to the current process of using a source package and creating their own "installation" package.

Would it be possible that you send me a src package in advance?

regards
Marcus

Marcus Kröger

Mar 8, 2016, 5:59:07 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi Michael,

we "disabled" the cluster by only running one node.

After running with this single node, that node crashed as well, which indicates that the issue

https://github.com/rabbitmq/rabbitmq-management/issues/41

cannot be the reason for this shutdown.

What do we see:

The rabbit node behaves "normally" for several hours and then, all of a sudden, it uses up all memory and becomes unresponsive. It then crashes with out of memory.

This is what we see in the logs

=INFO REPORT==== 8-Mar-2016::11:21:13 ===
vm_memory_high_watermark set. Memory used:9229658632 allowed:8388608000

=WARNING REPORT==== 8-Mar-2016::11:21:13 ===
memory resource limit alarm set on node 'xxxx@xxx'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 8-Mar-2016::11:21:14 ===
vm_memory_high_watermark clear. Memory used:4829491912 allowed:8388608000

=WARNING REPORT==== 8-Mar-2016::11:21:14 ===
memory resource limit alarm cleared on node 'xxx@xxx'

=WARNING REPORT==== 8-Mar-2016::11:21:14 ===
memory resource limit alarm cleared across the cluster

=INFO REPORT==== 8-Mar-2016::11:26:44 ===
Starting RabbitMQ 3.6.0 on Erlang 18
Copyright (C) 2007-2015 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 8-Mar-2016::11:26:44 ===
node           : xxx@xxx
home dir       : /home/rabbitmq
config file(s) : /srv/rabbitmq/config/ComXervPower_B_production/rabbitmq.config
cookie hash    : +Pemgog0Dm+Lv7C9ZOD5dQ==
log            : /var/log/rabbitmq/ComXervPower_B_production/ComXervPower_B_production.log
sasl log       : /var/log/rabbitmq/ComXervPower_B_production/ComXervPower_B_production-sasl.log
database dir   : /srv/rabbitmq/data/ComXervPower_B_production

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
Memory limit set to 8000MB of 15083MB total.

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
Disk free limit set to 100MB

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
Limiting to approx 29900 file handles (26908 sockets)

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
FHC read buffering:  OFF
FHC write buffering: ON

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
Management plugin: using rates mode 'none'

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 8-Mar-2016::11:26:47 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 8-Mar-2016::11:26:47 ===
msg_store_persistent: rebuilding indices from scratch

=WARNING REPORT==== 8-Mar-2016::11:26:47 ===
Mnesia('xxx@xxx'): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

=WARNING REPORT==== 8-Mar-2016::11:26:47 ===
Mnesia('xxx@xxx'): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
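
As an aside, the alarm numbers above are consistent with the configured absolute watermark: "8000MiB" is 8000 × 1024² bytes, exactly the "allowed" figure in the log, and the "used" figure is roughly 800 MiB above it. A quick check:

```python
MIB = 1024 ** 2

allowed = 8000 * MIB   # vm_memory_high_watermark, {absolute, "8000MiB"}
used = 9229658632      # "Memory used" from the alarm log line above

print(allowed)                  # 8388608000, matching "allowed:" in the log
print((used - allowed) // MIB)  # 802 -> roughly 802 MiB over the watermark
```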


any information would be appreciated

regards
Marcus

Marcus Kröger

Mar 8, 2016, 6:00:17 AM
to Michael Klishin, rabbitm...@googlegroups.com
config file(s) : /srv/rabbitmq/config/xxx/rabbitmq.config
cookie hash    : +Pemgog0Dm+Lv7C9ZOD5dQ==
log            : /var/log/rabbitmq/xxx/xxx.log
sasl log       : /var/log/rabbitmq/xxx/xxx-sasl.log
database dir   : /srv/rabbitmq/data/xxx

Michael Klishin

Mar 8, 2016, 6:07:03 AM
to Marcus Kröger, rabbitm...@googlegroups.com
On 8 March 2016 at 14:00:15, Marcus Kröger (marcus....@gmail.com) wrote:
> we "disabled" the cluster by only running one node.
>
> After running with this single node only this node crashed as
> well which indicates that the issue
>
> https://github.com/rabbitmq/rabbitmq-management/issues/41
>
> cannot be the reason for this shutdown.

Marcus,

"The management statistics database currently has a queue of x events to process.” says, which was
in your original report, IS an indication of management#41. There is no ambiguity in the message.

I did not recommend going from 2 nodes to 1. I clarified that disabling management UI is what you want.

The log provided has no evidence that the node terminated, or why it might have happened. See the SASL log and syslog
for clues. 

Michael Klishin

Mar 8, 2016, 6:19:38 AM
to Marcus Kröger, rabbitm...@googlegroups.com
On 8 March 2016 at 14:00:15, Marcus Kröger (marcus....@gmail.com) wrote:
> =WARNING REPORT==== 8-Mar-2016::11:21:13 ===
> memory resource limit alarm set on node 'xxxx@xxx'.
>
> **********************************************************
> *** Publishers will be blocked until this alarm clears ***
> **********************************************************

What protocols do you use? Are STOMP and MQTT among them?

Marcus Kröger

Mar 8, 2016, 6:26:49 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi Michael,

no, we don't use STOMP or MQTT.

There are no entries in the SASL log or in syslog, which indeed is very weird.

We also cannot take down the management plugin, as we need it for our HTTP requests towards the rabbit node.

Currently we are intending either to go back to the old version or to the latest version 3.6.2, if you could provide us with the source package.

regards
Marcus

Michael Klishin

Mar 8, 2016, 6:27:53 AM
to Marcus Kröger, rabbitm...@googlegroups.com
 On 8 March 2016 at 14:26:47, Marcus Kröger (marcus....@gmail.com) wrote:
> no, we don't use stomp nor mqtt.
>
> there are no entries in sasl log nor syslog which indeed is very
> weard.
>
> We also cannot take down the management plugin as we need it for
> our http request towards the rabbit node.

Then going back to 1 node won’t change anything.

Marcus Kröger

Mar 8, 2016, 6:31:30 AM
to Michael Klishin, rabbitm...@googlegroups.com
we are currently trying to find out the reason by reducing the complexity of the setup.

So, the issue we are facing is definitely not caused by the cluster setup.

Either we try 3.6.2, even though it is a pre-release, to check whether the issue is fixed, or we go back to RabbitMQ 3.2.3.

regards
Marcus