Message TTL seems to stop working


Alex Puschinsky

Mar 16, 2018, 11:22:45 AM
to rabbitmq-users
Hi,

We're having trouble with some queues that have message TTL defined on them.
The queues are used for exponential backoff. Queues TTL1 to TTL5 are each defined with a different TTL value, and all share a common DLX:

rabbitmqctl set_policy TTL1 "TTL1" '{"message-ttl":10000,"ha-mode":"exactly","ha-params":1,"ha-sync-mode":"automatic"}' --priority 1 --apply-to queues
rabbitmqctl set_policy TTL2 "TTL2" '{"message-ttl":60000,"ha-params":1,"ha-mode":"exactly","ha-sync-mode":"automatic"}' --priority 1 --apply-to queues
rabbitmqctl set_policy TTL3 "TTL3" '{"message-ttl":900000,"ha-params":1,"ha-mode":"exactly","ha-sync-mode":"automatic"}' --priority 1 --apply-to queues
rabbitmqctl set_policy TTL4 "TTL4" '{"message-ttl":3600000,"ha-mode":"exactly","ha-sync-mode":"automatic","ha-params":1}' --priority 1 --apply-to queues
rabbitmqctl set_policy TTL5 "TTL5" '{"message-ttl":21600000,"ha-mode":"exactly","ha-sync-mode":"automatic","ha-params":1}' --priority 1 --apply-to queues

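To see what this ladder adds up to, here is a small sketch that takes the message-ttl values from the five policies above (in milliseconds) and computes the cumulative delay a message accumulates by the time it leaves each queue. It assumes, as a simplification, that messages expire exactly on schedule and that the routing consumer forwards them instantly, which is precisely the behaviour that is failing here.

```python
# message-ttl values from the TTL1..TTL5 policies, in milliseconds
TTL_MS = {
    "TTL1": 10_000,
    "TTL2": 60_000,
    "TTL3": 900_000,
    "TTL4": 3_600_000,
    "TTL5": 21_600_000,
}


def cumulative_delays_s(ttl_ms: dict[str, int]) -> dict[str, int]:
    """Total seconds waited by the time a message leaves each queue.

    Assumes instant re-routing between queues (a simplification).
    """
    total_ms = 0
    out = {}
    for name, ttl in ttl_ms.items():  # dicts keep insertion order (TTL1 first)
        total_ms += ttl
        out[name] = total_ms // 1000
    return out


if __name__ == "__main__":
    for name, secs in cumulative_delays_s(TTL_MS).items():
        print(f"{name}: {secs} s ({secs / 3600:.2f} h)")
```

Under these assumptions a message that goes through all five queues waits 26,170 s, roughly 7.3 hours.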

All the queues are durable and persistent and have been defined with a DLX. On the dead letter queue we have a consumer that routes messages to the next TTL queue.
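A routing consumer like the one described needs to know which TTL queue a dead-lettered message came from. RabbitMQ records this in the `x-death` header (a list of entries, most recent first, each with fields such as `queue`, `reason` and `count`). The sketch below is a hypothetical version of that decision only; the ladder order, the `next_ttl_queue` name, the choice to start unknown messages at TTL1, and returning `None` after TTL5 are all assumptions, and it assumes the client library exposes headers as a plain dict.

```python
from typing import Any, Optional

# Hypothetical ladder order; queue names assumed to match the TTL1..TTL5 policies.
LADDER = ["TTL1", "TTL2", "TTL3", "TTL4", "TTL5"]


def next_ttl_queue(headers: Optional[dict[str, Any]]) -> Optional[str]:
    """Pick the queue a dead-lettered message should be republished to.

    Reads the most recent x-death entry (index 0). Messages with no history,
    or from a queue outside the ladder, start at TTL1 (an assumption).
    Messages that expired from the last queue return None, meaning the
    caller decides whether to park or drop them (also an assumption).
    """
    deaths = (headers or {}).get("x-death") or []
    if not deaths:
        return LADDER[0]
    last_queue = deaths[0].get("queue")
    if last_queue not in LADDER:
        return LADDER[0]
    idx = LADDER.index(last_queue)
    return LADDER[idx + 1] if idx + 1 < len(LADDER) else None
```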

We have a cluster of 4 EC2 machines of type m5.2xlarge (8 vCPUs, 32 GB RAM).

The situation is this:
The TTL queues receive messages, but nothing happens: the TTL does not move the messages from the queue to the DLX.

It gets worse: when we try to manually reject the messages from a TTL queue, the RAM usage of its host node immediately starts to climb until it reaches the server's limit and the node crashes. A similar effect happens when I try to use the Shovel plugin to move the messages to a different queue. There are several million messages in the queues.

I could not find any relevant errors in the logs.

Can anyone speculate about the cause of the issue? How can we empty the queues? We'd rather not lose the messages.
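One common way to empty a queue without losing messages is to move them in bounded batches and acknowledge each source message only after its copy has been published, so a failure part-way through duplicates a message rather than losing it. The sketch below shows only that ordering; `drain_in_batches` and its `get_one`, `publish` and `ack` callbacks are hypothetical stand-ins for whatever client library is used (for example a basic.get with manual acks, ideally combined with publisher confirms), and the batch size of 100 is an arbitrary assumption.

```python
from typing import Callable, Optional, Tuple

# A fetched message as (delivery_tag, body); hypothetical shape, adapt to your client.
Message = Tuple[int, bytes]


def drain_in_batches(
    get_one: Callable[[], Optional[Message]],
    publish: Callable[[bytes], None],
    ack: Callable[[int], None],
    batch_size: int = 100,
    max_batches: Optional[int] = None,
) -> int:
    """Move messages one bounded batch at a time; return how many were moved.

    At most batch_size fetched messages are unacknowledged at any moment.
    Each message is acked only after publish() returns for its copy.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    moved = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = []
        while len(batch) < batch_size:
            msg = get_one()
            if msg is None:  # source queue is empty
                break
            batch.append(msg)
        if not batch:
            break
        for tag, body in batch:
            publish(body)  # copy first...
            ack(tag)       # ...then remove from the source
            moved += 1
        batches += 1
    return moved
```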

Thanks,
Alex.
 

Alex Puschinsky

Mar 16, 2018, 11:29:18 AM
to rabbitmq-users
Forgot to mention the versions: RabbitMQ 3.7.3, Erlang 20.2, Ubuntu 16.04 servers.

Michael Klishin

Mar 16, 2018, 11:29:25 AM
to rabbitm...@googlegroups.com
Are you running Shovels with a prefetch and manual acknowledgements?
If not, the Shovel will consume as many messages as it can, and with millions
of messages that will definitely lead to a RAM blowup.
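For reference, a dynamic Shovel can be declared with `rabbitmqctl set_parameter shovel <name> '<json>'`. The sketch below only builds such a command line with a bounded prefetch and `ack-mode` set to `on-confirm` (source messages are acknowledged after the destination confirms them). Assumptions to check against the Shovel documentation for 3.7.3: the prefetch key name (`prefetch-count` here; newer releases use `src-prefetch-count`), the value 100, the `amqp://` local-broker URI, and the hypothetical names `drain-ttl1` and `TTL1-drained`.

```python
import json
import shlex


def shovel_command(name: str, src_queue: str, dest_queue: str,
                   prefetch: int = 100, uri: str = "amqp://") -> str:
    """Build a `rabbitmqctl set_parameter shovel ...` command line.

    The prefetch limit caps how many messages are unacknowledged in flight;
    "on-confirm" acks at the source only after the destination confirms.
    """
    definition = {
        "src-uri": uri,
        "src-queue": src_queue,
        "dest-uri": uri,
        "dest-queue": dest_queue,
        "prefetch-count": prefetch,  # key name assumed; verify for your version
        "ack-mode": "on-confirm",
    }
    return " ".join([
        "rabbitmqctl", "set_parameter", "shovel",
        shlex.quote(name),
        shlex.quote(json.dumps(definition)),
    ])


if __name__ == "__main__":
    # Hypothetical shovel and destination queue names.
    print(shovel_command("drain-ttl1", "TTL1", "TTL1-drained"))
```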

Michael Klishin

Mar 16, 2018, 11:34:09 AM
to rabbitm...@googlegroups.com
Of course, it wouldn’t hurt to collect some data
on what actually consumes the RAM:

http://www.rabbitmq.com/memory-use.html.


If you can consume messages, I don’t see a reason for Shovel not to work with a low prefetch value.


I cannot speculate as to what's going on with nacks, as you haven't provided any code, but if the number of unacknowledged messages keeps growing, it will lead to a lot more RAM being used. That's a known limitation.
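A toy model makes the link between prefetch and unacknowledged messages concrete. In AMQP 0-9-1 `basic.qos`, a prefetch count of 0 means "no limit", so the broker may push the whole backlog to the consumer at once. The simulation below is a hypothetical tick-based model, not how the broker actually schedules deliveries: each tick the broker fills the prefetch window and the consumer acks a fixed number of messages, and the function reports the peak number of unacked messages.

```python
def peak_unacked(total: int, prefetch: int, acks_per_tick: int) -> int:
    """Peak number of unacknowledged messages in a simple tick-based model.

    prefetch == 0 means unlimited, as with AMQP 0-9-1 basic.qos.
    """
    if acks_per_tick <= 0:
        raise ValueError("acks_per_tick must be positive")
    remaining, unacked, peak = total, 0, 0
    while remaining or unacked:
        # Broker side: deliver until the prefetch window is full.
        window = remaining if prefetch == 0 else max(0, prefetch - unacked)
        sent = min(window, remaining)
        remaining -= sent
        unacked += sent
        peak = max(peak, unacked)
        # Consumer side: acknowledge a bounded number of messages per tick.
        unacked -= min(acks_per_tick, unacked)
    return peak


if __name__ == "__main__":
    print(peak_unacked(1_000_000, 0, 500))    # unlimited prefetch
    print(peak_unacked(1_000_000, 100, 500))  # prefetch of 100
```

In this model the unlimited consumer ends up holding all one million messages unacknowledged at once, while a prefetch of 100 keeps the peak at 100 regardless of backlog size.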



Alex Puschinsky

Mar 16, 2018, 11:47:29 AM
to rabbitmq-users
I ran the shovel using the interface on the queue page of the management console, so I'm not sure how it's configured. The same memory explosion occurred when I rejected 100 or so messages (requeue=false, also from the management console queue page). My theory is that rejecting the messages somehow "awakens" the TTL and the queue tries to dead-letter millions of messages at once. Can this be true? Can the queue be limited in how many messages it rejects at once?

I don't see what relevant code I can provide, as my issue is that queues with no consumers and only a TTL configuration do not dead-letter their messages as they should.

By the way, I've tried configuring the queues to be lazy to conserve RAM, but it does not seem to have a positive effect. Can you speculate about why this happens?
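One detail worth checking when making these queues lazy: at most one policy applies to a given queue (the highest-priority match), so a separate "lazy" policy does not combine with the TTL policies above. The `queue-mode` key would have to go into the same definition. The sketch below regenerates the five `set_policy` commands with `"queue-mode":"lazy"` merged in; it reuses the values from the original policies, and `policy_commands` is a hypothetical helper name.

```python
import json
import shlex

# message-ttl per policy, copied from the TTL1..TTL5 policies above (ms).
TTL_POLICIES = {
    "TTL1": 10_000,
    "TTL2": 60_000,
    "TTL3": 900_000,
    "TTL4": 3_600_000,
    "TTL5": 21_600_000,
}
# HA settings shared by all five original policies.
HA = {"ha-mode": "exactly", "ha-params": 1, "ha-sync-mode": "automatic"}


def policy_commands(lazy: bool = True) -> list[str]:
    """One set_policy command per TTL queue.

    queue-mode is merged into the same definition because only one
    policy applies to a queue at a time.
    """
    cmds = []
    for name, ttl in TTL_POLICIES.items():
        definition = {"message-ttl": ttl, **HA}
        if lazy:
            definition["queue-mode"] = "lazy"
        cmds.append(" ".join([
            "rabbitmqctl", "set_policy", name, shlex.quote(name),
            shlex.quote(json.dumps(definition)),
            "--priority", "1", "--apply-to", "queues",
        ]))
    return cmds


if __name__ == "__main__":
    print("\n".join(policy_commands()))
```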