Bug report: Dead-letter exchange causes OOM kill on classic queue mirror nodes


Rafal Goslawski

Aug 10, 2021, 11:50:01 AM
to rabbitmq-users
Hi, I believe we have found a bug in rabbitmq-server. It seems to be related to dead-letter exchanges and classic mirrored queues.

What's strange is that the mirror nodes can use all available RAM (32 GB), even though the queues hold only 3 GB of data.

It seems to be reproducible with the following scenario:

Setup:
* 3-node RabbitMQ cluster (Amazon MQ mq.m5.2xlarge instances: 8 vCPUs, 32 GB RAM each; it also happens on k8s and on bare metal).

* HA policy: ha-mode: all

* 2 classic durable queues

  * input queue:
      classic, durable, lazy,
      x-dead-letter-exchange: "",
      x-dead-letter-routing-key: "output"

  * output queue:
      classic, durable, lazy
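The setup above can be written down as queue arguments plus a mirroring policy. This is a minimal sketch, not the reporter's actual scripts (those were not posted in the thread). It assumes the routing-key argument is `x-dead-letter-routing-key` (the name RabbitMQ recognizes), that laziness is set per queue with `x-queue-mode` (a `queue-mode` policy would work too), and that the policy name `ha-all`, the `.*` pattern and the `set_policy_command` helper are illustrative, hypothetical choices.

```python
import json
from typing import List

# "input" queue: lazy, dead-letters to the default exchange ("") with
# routing key "output", so rejected messages are routed to the "output" queue.
INPUT_QUEUE_ARGS = {
    "x-queue-mode": "lazy",
    "x-dead-letter-exchange": "",
    "x-dead-letter-routing-key": "output",
}

# "output" queue: lazy only, no dead-letter settings.
OUTPUT_QUEUE_ARGS = {
    "x-queue-mode": "lazy",
}

# Classic mirroring is configured through a policy, not queue arguments.
# "ha-mode": "all" mirrors every matching queue to all cluster nodes.
HA_POLICY_DEFINITION = {"ha-mode": "all"}


def set_policy_command(name: str = "ha-all", pattern: str = ".*") -> List[str]:
    """Hypothetical helper: argv for `rabbitmqctl set_policy` applying the HA policy."""
    return [
        "rabbitmqctl",
        "set_policy",
        "--apply-to",
        "queues",
        name,
        pattern,
        json.dumps(HA_POLICY_DEFINITION),
    ]
```

With aio-pika, the dicts would be passed as `arguments=` to `channel.declare_queue("input", durable=True, arguments=INPUT_QUEUE_ARGS)`, and the argv list could be handed to `subprocess.run` on a cluster node.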

Steps to reproduce:

1. Publish 3M messages of 1 KB each to the input queue, for a total of about 3 GB of data.

2. Start 10 consumers that reject all messages with requeue=false, so they get dead-lettered to the output queue.

3. After a while, memory usage climbs on 1 or 2 of the mirror nodes for the "output" queue until the node(s) are OOM-killed. The memory is held by the mirror queue process and by binaries, and it far exceeds the total size of the data in the queues.
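The reproduction steps can be sketched with aio-pika, the Python library mentioned below. This is not the reporter's script; it assumes the queues and policy from the setup already exist, a broker URL such as `amqp://guest:guest@localhost/`, a recent aio-pika (9.x, where `reject()` is a coroutine), and an arbitrary prefetch of 100 and publish batch of 1,000. All function names are hypothetical.

```python
import asyncio

MESSAGE_COUNT = 3_000_000  # step 1: 3M messages
MESSAGE_SIZE = 1024        # 1 KB each, about 3 GB in total
CONSUMER_COUNT = 10        # step 2: 10 rejecting consumers
BATCH = 1000               # publishes awaited together (assumption)


def make_payload(size: int = MESSAGE_SIZE) -> bytes:
    """Fixed-size dummy body; the content does not matter for the report."""
    return b"x" * size


async def publish_all(url: str, count: int = MESSAGE_COUNT) -> None:
    import aio_pika  # imported lazily so the helpers above work without it

    connection = await aio_pika.connect_robust(url)
    async with connection:
        channel = await connection.channel()
        body = make_payload()
        for start in range(0, count, BATCH):
            n = min(BATCH, count - start)
            # Persistent messages routed to "input" via the default exchange.
            await asyncio.gather(*(
                channel.default_exchange.publish(
                    aio_pika.Message(
                        body=body,
                        delivery_mode=aio_pika.DeliveryMode.PERSISTENT,
                    ),
                    routing_key="input",
                )
                for _ in range(n)
            ))


async def reject_until_cancelled(url: str) -> None:
    import aio_pika

    connection = await aio_pika.connect_robust(url)
    async with connection:
        channel = await connection.channel()
        await channel.set_qos(prefetch_count=100)
        # Passive declare: the queue must already exist with its DLX arguments.
        queue = await channel.declare_queue("input", passive=True)
        async with queue.iterator() as messages:
            async for message in messages:
                # requeue=False makes the broker dead-letter the message
                # to "output" using the input queue's DLX arguments.
                await message.reject(requeue=False)


async def main(url: str) -> None:
    await publish_all(url)
    await asyncio.gather(
        *(reject_until_cancelled(url) for _ in range(CONSUMER_COUNT))
    )
```

Run it with `asyncio.run(main("amqp://guest:guest@localhost/"))` and watch per-node memory (for example with `rabbitmq-diagnostics memory_breakdown`) while the consumers drain the input queue. The consumers keep running until cancelled.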

This scenario is an isolated edge case from a larger application that uses a DLX and nack(requeue=false) to move messages back to a different queue on shutdown. The original application uses the AMQP-CPP C++ library, while this example uses the Python aio-pika library, which makes me believe the bug is in rabbitmq-server itself.

[Attached screenshots: "Screenshot from 2021-08-10 16-56-38.png", "Screenshot from 2021-08-10 16-58-44.png"]
Does anyone have an idea what could be causing this and how to fix it?

PS: I have scripts to reproduce this and more screenshots on Imgur, but the mailing list seems to remove my post if I add links here.


mkura...@gmail.com

Aug 11, 2021, 7:07:03 AM
to rabbitmq-users
Hi,

If the mailing list prevents you from sharing more details (it blocked even this message, but I've approved it), can you please share them in a GitHub issue or join our Slack channel and share them there?
Those scripts would be very helpful.

Thanks,

Luke Bakken

Jul 22, 2022, 12:19:22 PM
to rabbitmq-users