Lazy queue setup crashes with "Cannot allocate <...> bytes of memory" on ~190 GB RAM machine


Linas Valiukas

May 31, 2016, 6:45:17 AM
to rabbitmq-users
Hi,

Our RabbitMQ instance keeps its memory usage stable for an unpredictable amount of time (from 30 minutes to 8 hours) and then suddenly crashes with:

eheap_alloc: Cannot allocate 972288 bytes of memory (of type "old_heap").


(The number of bytes to be allocated varies between 1 MB and 2 GB, and the heap type varies too.)


We run RabbitMQ on a machine with 193406 MB of RAM, of which at least 50% is free at any given moment, and there is plenty of swap space too.


Queue lengths can reach hundreds of thousands of messages, so we chose lazy queues for our workload to reduce the memory footprint. We have about 400 queues: most of them transient, a couple persistent, all of them lazy. The consumer / provider connection count is around 300. Messages seem to be kept on disk and fetched only when needed, and memory usage stays around 1-2 GB, but then the server crashes with the message above.
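For readers unfamiliar with lazy queues: a queue can be made lazy either through a policy or by passing the `x-queue-mode` argument when it is declared. The post does not say which route was used, so the following is only a minimal sketch of the second option. The helper name `lazy_queue_arguments`, the queue name `jobs`, and the use of the pika client in the commented-out call are all assumptions for illustration.

```python
def lazy_queue_arguments(extra=None):
    """Build queue-declare arguments that request lazy mode.

    `extra` is an optional dict of other queue arguments to keep.
    (Hypothetical helper, not part of any RabbitMQ client library.)
    """
    arguments = dict(extra or {})
    # "x-queue-mode": "lazy" asks the broker to keep messages on disk
    # and load them into memory only when consumers need them.
    arguments["x-queue-mode"] = "lazy"
    return arguments


if __name__ == "__main__":
    print(lazy_queue_arguments({"x-max-length": 500000}))
    # With a client such as pika (an assumption, not from the post),
    # the arguments would be passed at declaration time, e.g.:
    # channel.queue_declare(queue="jobs", durable=True,
    #                       arguments=lazy_queue_arguments())
```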


Both consumers and providers are rather slow (20-60 messages/s); some providers might decide to publish some more messages about once an hour but this behavior doesn't coincide with RabbitMQ crashes.


vm_memory_high_watermark is set to 8192 MB, which seems to be enough for our workload (until it's not and the server crashes). I've tried reducing vm_memory_high_watermark_paging_ratio to 0.3 so that any memory purging would happen sooner, but that didn't help.
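To put these two settings in numbers: the paging ratio is applied to the high watermark, so paging to disk is expected to begin at roughly watermark × ratio. A small sketch of that arithmetic, using the figures from this post (the function names are hypothetical):

```python
def paging_threshold_mb(high_watermark_mb, paging_ratio):
    """Memory use (in MB) at which paging to disk should start."""
    return high_watermark_mb * paging_ratio


def watermark_fraction(high_watermark_mb, total_ram_mb):
    """The absolute watermark expressed as a fraction of installed RAM."""
    return high_watermark_mb / total_ram_mb


if __name__ == "__main__":
    # Values taken from the post: 8192 MB watermark, ratio 0.3,
    # 193406 MB of installed RAM.
    print(f"paging starts near {paging_threshold_mb(8192, 0.3):.1f} MB")
    print(f"watermark is {watermark_fraction(8192, 193406):.1%} of RAM")
```

With the reported 1-2 GB of steady memory usage, the broker would sit below both the paging threshold and the watermark itself, which is consistent with the crash not being preceded by a memory alarm.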


I should note that we have a per-process virtual memory limit of 32 GB:


$ ulimit -v

33554432
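In bash, `ulimit -v` reports the limit in 1024-byte units, so 33554432 corresponds to 32 GiB. This limit caps the virtual address space of a process rather than its resident memory, so an allocation can fail against it even while physical RAM is free. A minimal sketch for converting the value and for reading the same limit from inside a process (Unix only; the helper names are hypothetical):

```python
import resource

KIB_PER_GIB = 1024 * 1024


def ulimit_v_to_gib(kib):
    """Convert a `ulimit -v` value (KiB) to GiB."""
    return kib / KIB_PER_GIB


def describe_limit(limit_bytes):
    """Format an RLIMIT_AS value, which getrlimit reports in bytes."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit_bytes / 1024 ** 3:.1f} GiB"


if __name__ == "__main__":
    print(ulimit_v_to_gib(33554432), "GiB")
    # RLIMIT_AS is the address-space limit that `ulimit -v` controls.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("soft:", describe_limit(soft), "hard:", describe_limit(hard))
```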


Some more debugging info, package versions etc.: http://p.defau.lt/?Sf62woGRfo0oTBcVHvdd9A


What could be going wrong? What else could we try?


Regards,

Michael Klishin

May 31, 2016, 7:45:01 AM
to rabbitmq-users
RabbitMQ does not allocate memory directly and has no native code. Try a different Erlang/OTP version, e.g. 17.5 or 18.2 or 18.3.

Linas Valiukas

Jun 2, 2016, 2:08:01 PM
to rabbitmq-users
Thanks Michael, downgrading Erlang to 17.5.3 seems to have helped.

Do you think this is something that should be reported to the Erlang developers?