eheap_alloc: Cannot allocate 972288 bytes of memory (of type "old_heap").
(The number of bytes to be allocated varies between 1 MB and 2 GB, and the heap type varies too.)
We run RabbitMQ on a machine with 193406 MB of RAM, of which at least 50% is free at any given moment; there is plenty of swap space too.
Queue lengths can reach hundreds of thousands of messages, so we chose lazy queues for our workload to reduce the memory footprint. We have about 400 queues: most of them transient, a couple persistent, all of them lazy. The consumer/provider connection count is around 300. Messages seem to be kept on disk and fetched only when needed, and memory usage stays around 1-2 GB, but then the server crashes with the message above.
Both consumers and providers are rather slow (20-60 messages/s). Some providers may publish some more messages about once an hour, but this behavior doesn't coincide with the RabbitMQ crashes.
vm_memory_high_watermark is set to 8192MB, which seems to be enough for our workload (until it's not and the server crashes). I've tried reducing vm_memory_high_watermark_paging_ratio to 0.3 so that any memory purging would happen sooner, but that didn't help.
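For reference, the two memory settings described above would look roughly like this. This is only a sketch that assumes the newer sysctl-style rabbitmq.conf format; the classic Erlang-term rabbitmq.config spells the same settings differently, and the values are simply the ones quoted above.

```ini
# Sketch only: assumes the sysctl-style rabbitmq.conf format.
# Absolute memory threshold at which publishers get blocked.
vm_memory_high_watermark.absolute = 8192MB
# Fraction of the watermark at which queues start paging messages to disk.
vm_memory_high_watermark_paging_ratio = 0.3
```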
I should note that we have a per-process virtual memory limit of 32 GB:
$ ulimit -v
33554432
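As a sanity check on the units, here is a minimal sketch that converts the value above, assuming the shell reports `ulimit -v` in KiB (as bash does). The variable names are just for illustration.

```shell
#!/bin/sh
# Value printed by `ulimit -v` above; assumed to be in KiB (1024-byte units).
limit_kib=33554432

# Convert to GiB and to bytes for comparison with the sizes in the crash message.
limit_gib=$((limit_kib / 1024 / 1024))
limit_bytes=$((limit_kib * 1024))

echo "virtual memory limit: ${limit_gib} GiB (${limit_bytes} bytes)"
# prints: virtual memory limit: 32 GiB (34359738368 bytes)
```

Note that this limit applies to the process's virtual address space, not to its resident memory, so it is a separate number from the 1-2 GB of memory usage mentioned above.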
Some more debugging info, package versions etc.: http://p.defau.lt/?Sf62woGRfo0oTBcVHvdd9A
What could be going wrong? What else could we try?
Regards,