RabbitMQ crashing: Cannot allocate XXXX bytes of memory (of type "heap")

Michal Medvecky

Mar 4, 2016, 11:54:22 AM
to rabbitmq-users
Hello,

I'm using rabbitmq 3.6.1-1 on Ubuntu 14.04 and when Rabbit has too many messages (30M in this case) in queue it crashes with a pretty weird message:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 6801972448 bytes of memory (of type "heap").

(Yes, trying to allocate 6.8G)
 
It's easy to reproduce:

1) apt-get install rabbitmq-server
2) use this config:
[
  {rabbit, [
     {ssl_listeners, [5671]},
     {ssl_allow_poodle_attack, true},
     {vm_memory_high_watermark, 0.85},
     {ssl_options, [
        {cacertfile,"/etc/ssl/certs/ssl-cert-snakeoil.pem"},
        {certfile,"/etc/ssl/certs/ssl-cert-snakeoil.pem"},
        {keyfile,"/etc/ssl/private/ssl-cert-snakeoil.key"},
        {verify,verify_none},
        {fail_if_no_peer_cert,false}]}
   ]}
].
3) declare one durable exchange "oops", one durable queue "oops", bind them with routing-key "ble"
4) run this script in a loop:

#!/usr/bin/env python
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost',credentials=pika.PlainCredentials('admin', 'admin')))
channel = connection.channel()

message = "Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!"
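# Publish 100,000 transient messages per run. The commented-out properties
# argument would mark them persistent (delivery_mode=2).
for i in range(100000):
    channel.basic_publish(exchange='oops', routing_key='ble', body=message)  # , properties=pika.BasicProperties(delivery_mode=2))
connection.close()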

5) wait until "enough" messages are published
6) rabbitmq crashes
7) when you start rabbit again, after a while (with no publishers) it crashes again.
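
Step 3 above can be sketched with the same pika library the publisher script uses. This is only an illustration, not the poster's actual setup: the thread does not say which exchange type was used, so "direct" is an assumption, and the helper name declare_topology is hypothetical.

```python
def declare_topology(channel, exchange="oops", queue="oops", routing_key="ble"):
    """Declare the durable exchange and queue from step 3 and bind them."""
    # Assumption: a direct exchange. The original post does not state the type.
    channel.exchange_declare(exchange=exchange, exchange_type="direct", durable=True)
    # Durable queue, as described in step 3.
    channel.queue_declare(queue=queue, durable=True)
    # Route messages published with routing key "ble" to the queue.
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)

# Usage against a live broker (credentials as in the publisher script):
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters(
#       host='localhost', credentials=pika.PlainCredentials('admin', 'admin')))
#   declare_topology(connection.channel())
#   connection.close()
```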

I'm getting this on an AWS m4.large instance (4 cores, 16GB RAM). Attaching some collectd graphs from time of crash and before.
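
To put the numbers from this post side by side, here is a rough back-of-the-envelope calculation. It assumes that "16GB" means 16 GiB and that the whole machine is available to RabbitMQ, neither of which the post confirms. Note also that vm_memory_high_watermark is the threshold at which RabbitMQ starts blocking publishers, not a hard cap on what the Erlang VM may allocate.

```python
GIB = 1024.0 ** 3

total_ram = 16 * GIB        # assumption: "16GB RAM" read as 16 GiB
high_watermark = 0.85       # vm_memory_high_watermark from the config above
failed_alloc = 6801972448   # bytes, from the eheap_alloc crash message

# Memory level at which the alarm fires, and what is left above it.
watermark_bytes = high_watermark * total_ram
headroom_bytes = total_ram - watermark_bytes

print("failed allocation: %.2f GiB" % (failed_alloc / GIB))
print("high watermark:    %.2f GiB" % (watermark_bytes / GIB))
print("headroom above it: %.2f GiB" % (headroom_bytes / GIB))
print("allocation exceeds headroom: %s" % (failed_alloc > headroom_bytes))
```

Under these assumptions the single failed allocation (about 6.3 GiB) is larger than the space left above the watermark, so a node that is already near the watermark cannot satisfy it.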

Am I doing something wrong or is it a rabbit bug?

Some useful info:

Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]
Linux test-rabbit.aws.px 3.13.0-74-generic #118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Thank you for any help,

Michal
Attachments: disk.png, memory.png, messages.png

Michael Klishin

Mar 4, 2016, 12:12:20 PM
to rabbitm...@googlegroups.com, Michal Medvecky
On 4 March 2016 at 19:54:26, Michal Medvecky (medv...@pexe.so) wrote:
> I'm using rabbitmq 3.6.1-1 on Ubuntu 14.04 and when Rabbit has
> too many messages (30M in this case) in queue it crashes with a
> pretty weird message:
>
> Crash dump was written to: erl_crash.dump
> eheap_alloc: Cannot allocate 6801972448 bytes of memory (of
> type "heap").

That's a message from the Erlang VM. Make sure you use the most recent release,
and better yet, if you expect 30M messages to be enqueued, use lazy queues:
http://rabbitmq.com/lazy-queues.html 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
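
For readers following along, a minimal sketch of the lazy-queue suggestion above using pika. It relies on the "x-queue-mode" queue argument described on the linked lazy queues page (available from RabbitMQ 3.6.0). The helper name declare_lazy_queue is hypothetical, and the queue name is taken from the original post.

```python
def lazy_queue_arguments():
    """Optional queue arguments that ask RabbitMQ to run the queue in lazy mode."""
    return {"x-queue-mode": "lazy"}

def declare_lazy_queue(channel, queue="oops"):
    """Declare a durable queue whose messages are moved to disk as early as possible."""
    channel.queue_declare(queue=queue, durable=True,
                          arguments=lazy_queue_arguments())

# Usage against a live broker (credentials as in the original script):
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters(
#       host='localhost', credentials=pika.PlainCredentials('admin', 'admin')))
#   declare_lazy_queue(connection.channel())
#   connection.close()
```

A queue that already exists without this argument cannot be redeclared with it; the broker rejects the mismatch. In that case either delete and recreate the queue, or apply the mode through a policy with the "queue-mode" key instead.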


Michael Klishin

Mar 4, 2016, 12:33:14 PM
to Michal Medvecky, rabbitm...@googlegroups.com
+rabbitmq-users

On 4 March 2016 at 20:16:22, Michal Medvecky (medv...@pexe.so) wrote:
> I use ubuntu-prepacked Erlang and I'm not sure if I would be able
> to upgrade, but will check my options. Thanks.

Well, it's a runtime issue. Ubuntu 14.04 packages R16B03, while we generally recommend
17.x or 18.x these days. You can get Ubuntu packages of 18.x from Erlang Solutions:
https://www.erlang-solutions.com/resources/download.html

> and better yet, if you expect 30M messages to be enqueued, use
> lazy queues:
> > http://rabbitmq.com/lazy-queues.html
>
> Actually on my production machines it crashes with only 2-3M
> messages in total (in more than one queue). They are usually consumed
> within 2-3 hours and I don't want them to be saved on disk (my intention
> is to make consumption even faster with more workers)

You can let RabbitMQ use more RAM but that likely won't help with
problems like this when it's a VM issue.

You'd trade off a few % of throughput for drastically lower RAM use with lazy queues,
and probably a lot lower probability of running into the issue. The choice is yours. 

Michael Klishin

Mar 4, 2016, 12:58:01 PM
to Michal Medvecky, rabbitm...@googlegroups.com
 +rabbitmq-users, please CC the list. I do not offer 1-on-1 support of any kind.

On 4 March 2016 at 20:53:43, Michal Medvecky (medv...@pexe.so) wrote:
> It's much worse with Rabbit 3.5.4 on Erlang 18 (Ubuntu 15.10).

Prior to 3.5.6, the message paging algorithm had significant inefficiencies. So it can be that
your node simply consumes more RAM and/or the runtime has to allocate bigger chunks
or do it more frequently with 3.5.4.

Again, the right thing to do if you expect that many messages to be enqueued for periods of time
is to use lazy queues. Expecting millions of messages enqueued and not wanting them to be stored
on disk is an opinionated decision that carries the risk of running into similar issues, if you ask me.
So why not avoid it entirely?

Michal Medvecky

Mar 4, 2016, 2:39:40 PM
to Michael Klishin, rabbitm...@googlegroups.com
Hello,

On Fri, Mar 4, 2016 at 6:57 PM, Michael Klishin <mkli...@pivotal.io> wrote:
> +rabbitmq-users, please CC the list. I do not offer 1-on-1 support of any kind.

Sorry, I did not notice it replies to sender instead of the whole group. 

> On 4 March 2016 at 20:53:43, Michal Medvecky (medv...@pexe.so) wrote:
> > It's much worse with Rabbit 3.5.4 on Erlang 18 (Ubuntu 15.10).
>
> Prior to 3.5.6, the message paging algorithm had significant inefficiencies. So it can be that
> your node simply consumes more RAM and/or the runtime has to allocate bigger chunks
> or do it more frequently with 3.5.4.

I tried with Erlang 18.2 + rabbitmq 3.6.1. This time rabbit crashed after ~7M messages without _any_ message in the logs.

Don't tell me it's a feature :-(
 
> Again, the right thing to do if you expect that many messages to be enqueued for periods of time
> is to use lazy queues. Expecting millions of messages enqueued and not wanting them to be stored
> on disk is an opinionated decision that carries the risk of running into similar issues, if you ask me.
> So why not avoid it entirely?

I'll try this as a workaround.

Michal

Itay Adler

Feb 1, 2018, 11:48:43 AM
to rabbitmq-users
Hey Michal,

I'm experiencing a similar issue with RabbitMQ 3.6.6 and Erlang 18.3. Just before the node
crashed, I saw in the syslog that it tried to allocate 900 MB of RAM.
Did switching to lazy queues help? Or simply upgrading to a newer Erlang/RabbitMQ?

Cheers,
Itay

Michael Klishin

Feb 1, 2018, 11:51:57 AM
to rabbitm...@googlegroups.com
We don’t have enough information to suggest much but please upgrade to 3.6.15
and Erlang/OTP 20.x as 18.x has known issues that prevent nodes from shutting down.

Note the 3.6.7 and other release notes:


Starting with 3.6.12 you will have a more accurate and detailed memory usage reporting, which is critically important for monitoring scenarios like this: