Delivery acknowledgement timeouts after upgrading to 3.8.17

James Armstrong

Jun 28, 2021, 2:36:44 PM
to rabbitmq-users
Hi, 

I've been using RabbitMQ 3.8.9 on a Windows server for several months with no problems. We recently upgraded our server hardware and I did a completely fresh install of RMQ, upgrading to 3.8.17 in the process.

I have one long running process that used to run fine, but is now generating a "delivery acknowledgement timeout" error. I know I can increase this timeout in the config files, but I am trying to figure out why this process never threw these timeout errors when using 3.8.9.

Looking through the release notes, it looks like the timeout default was actually increased from 15 to 30 minutes in a recent release, which makes even less sense! The log message I am getting looks like this:

2021-06-28 13:41:22.310 [warning] <0.9987.0> Consumer amq.ctag-UiHtX6AUIwPD6stE8nh64g on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2021-06-28 13:41:22.310 [error] <0.9987.0> Channel error on connection <0.9916.0> ([::1]:54147 -> [::1]:5672, vhost: '/', user: 'redacted'), channel 1:
operation none caused a channel exception precondition_failed: delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2021-06-28 13:45:17.282 [warning] <0.9928.0> closing AMQP connection <0.9928.0> ([::1]:54151 -> [::1]:5672 - SecuritiesData.Service, vhost: '/', user: ' redacted'):
client unexpectedly closed TCP connection

Before I just increase the timeout and call it a day, I'd love to understand why this just started happening. Nothing has changed with my consumers or the processes that are sending messages over RMQ - in fact, if I point my programs to the old server, they run just fine.

Thanks in advance for any info you can provide!

-Jim

Allan

Jul 13, 2021, 2:17:47 AM
to rabbitmq-users
"Starting with RabbitMQ 3.5.5, the broker’s default heartbeat timeout decreased from 580 seconds to 60 seconds" see "https://pika.readthedocs.io/en/stable/examples/heartbeat_and_blocked_timeouts.html".

Alex Simenduev

Aug 12, 2021, 4:28:52 PM
to rabbitmq-users
I'm experiencing the exact same issue. We never had issues with the delivery ack timeout; after upgrading to 3.8.17 we started to get sporadic errors like that, and our AMQP clients become zombies until we restart them.
Can someone from the RabbitMQ team help with this?

Some details:
version before upgrade: 3.8.14
version after upgrade: 3.8.18 (Erlang 24)


Nahum Litvin

Aug 26, 2021, 7:25:28 AM
to rabbitmq-users
Hello, we are suffering from the same issue. Did anyone find out how to resolve or mitigate this?

Michal Kuratczyk

Aug 27, 2021, 3:50:09 AM
to rabbitm...@googlegroups.com
This timeout was introduced in 3.8.15 with the default value of 15 minutes and then in 3.8.17, the default was changed to 30 minutes.
In general, messages should simply be acknowledged within the timeout. If you really can't do that, then you can increase the timeout.
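
As a rough sketch of what raising it involves: the timeout is expressed in milliseconds (the 1800000 ms in the log above is 30 minutes). Assuming the `consumer_timeout` key in rabbitmq.conf, the hypothetical helper below only does the unit conversion and prints the line to add. The helper name and the 60-minute figure are illustrative and do not come from this thread.

```python
def consumer_timeout_line(minutes: float) -> str:
    """Render a rabbitmq.conf line for the delivery acknowledgement timeout.

    Assumption: the setting is named `consumer_timeout` and takes milliseconds.
    """
    if minutes <= 0:
        raise ValueError("timeout must be positive")
    millis = int(minutes * 60 * 1000)  # minutes -> milliseconds
    return f"consumer_timeout = {millis}"


print(consumer_timeout_line(30))  # consumer_timeout = 1800000
print(consumer_timeout_line(60))  # consumer_timeout = 3600000
```

The node has to pick up the new configuration before the value takes effect, and a larger value only postpones the error if a consumer truly never acknowledges.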

--
Michał
RabbitMQ team

Arthour Martirosyan

Nov 10, 2021, 10:50:58 AM
to rabbitmq-users
Hello, thanks for sharing your experience and knowledge. I have come across this issue many times.

The problem arises because the consumer holds a message that has gone unacknowledged for more than 30 minutes. The broker then drops the consumer, considering it buggy (stuck), to avoid unnecessary resource consumption, because such consumers can affect the node's on-disk data compaction and potentially drive nodes out of disk space. Whenever this happens, the broker takes the initiative to close the AMQP connection with the consumer (by destroying the connection's channel over TCP and also the TCP connection with the client), and reconnection handling is up to the client.

First of all, increasing the heartbeat timeout does not solve the problem permanently (there is still the risk of having unacknowledged messages), and on the other hand disabling the heartbeat timeout is highly discouraged (in that case you need the TCP keepalive mechanism). So to solve the problem you should reconfigure your PREFETCH setting and monitor your nodes' performance. If needed, set a TTL for every received message and regularly check whether it is close to the 30-minute limit; if it is, urgently prioritize its execution and acknowledgement.

If you are developing high-load services on RabbitMQ, you also need to detect connection changes, find orphaned messages, and flush them. Otherwise, after a connection change you will hit a 406 PRECONDITION_FAILED error with an unknown delivery tag when acknowledging those messages on the newly created channel. It is also possible that your PREFETCH value is very high and your node's code executes very slowly, so your messages get stuck in the execution phase and the 30 minutes have already elapsed by the time they are acknowledged.

In any case, the solution is not simple and is not only a matter of configuration; you need to investigate your consumers' behavior in depth.
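
The per-message deadline check described above could be sketched roughly as follows. This is a minimal, client-library-agnostic illustration, not code from any RabbitMQ client: the class name, the 80% warning threshold, and the method names are all hypothetical. It only records when each delivery arrived and reports the delivery tags that are getting close to the timeout, so the application can prioritize or acknowledge them. Prefetch itself is requested by the consumer on its channel (AMQP `basic.qos`), so that part lives in the client code as well.

```python
import time
from typing import Dict, List, Optional

ACK_TIMEOUT_S = 30 * 60  # 30 minutes, the default mentioned in this thread
WARN_FRACTION = 0.8      # hypothetical threshold: flag at 80% of the timeout


class AckDeadlineTracker:
    """Remembers when each delivery arrived so slow ones can be spotted."""

    def __init__(self, timeout_s: float = ACK_TIMEOUT_S,
                 warn_fraction: float = WARN_FRACTION) -> None:
        self.warn_after_s = timeout_s * warn_fraction
        self._received: Dict[int, float] = {}  # delivery tag -> arrival time

    def delivered(self, delivery_tag: int, now: Optional[float] = None) -> None:
        """Call when a message is handed to the consumer."""
        self._received[delivery_tag] = time.monotonic() if now is None else now

    def acked(self, delivery_tag: int) -> None:
        """Call once the message has been acknowledged."""
        self._received.pop(delivery_tag, None)

    def at_risk(self, now: Optional[float] = None) -> List[int]:
        """Unacknowledged delivery tags past the warning threshold, oldest first."""
        now = time.monotonic() if now is None else now
        risky = [(arrived, tag) for tag, arrived in self._received.items()
                 if now - arrived >= self.warn_after_s]
        return [tag for _, tag in sorted(risky)]

    def reset(self) -> None:
        """Forget all tags, e.g. after the channel is reopened: delivery tags
        are scoped to a channel and must not be acked on a new one."""
        self._received.clear()


tracker = AckDeadlineTracker()
tracker.delivered(1, now=0.0)
tracker.delivered(2, now=1000.0)
print(tracker.at_risk(now=1500.0))  # [1]
```

Calling `reset()` on reconnect is one way to avoid acknowledging a stale tag on a new channel, which is the situation that produces the unknown delivery tag error mentioned above.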

Best regards
Arthur Martirosyan

Sravani Cheruvu

Nov 16, 2021, 11:17:13 PM
to rabbitmq-users
Hi,

Could you let us know if there is a command to check the timeout value currently configured on the server?

What is the command to configure PREFETCH on the server?

What can we do with the messages that are already stuck (around 5,000) on the server right now (in production)?

Is this delivery ack timeout not present in version 3.8.2?

Please let us know

Thanks,
Lakshmi

Sravani Cheruvu

Nov 16, 2021, 11:41:53 PM
to rabbitmq-users
Hi Michal,

So before version 3.8.15, how was this handled?

Was there no timeout for the delivery acknowledgement?

We had 3.8.2 earlier and did not face this error; now, after upgrading to 3.9.5, we see it.

Michal Kuratczyk

Nov 17, 2021, 3:29:55 AM
to rabbitm...@googlegroups.com
Hi,

See the PR for the rationale of this change: https://github.com/rabbitmq/rabbitmq-server/pull/2990
You can check the value like any other configuration option, for example `rabbitmqctl environment`.
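
If you want to pull just that one value out of the output, a small helper along these lines might do. It is a sketch under assumptions: it expects the setting to appear as an Erlang-style tuple such as `{consumer_timeout,1800000}` in the text printed by `rabbitmqctl environment`, and both function names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Assumed shape of the entry in the environment dump: {consumer_timeout,<value>}
_TIMEOUT_RE = re.compile(r"\{consumer_timeout,\s*(\w+)\}")


def parse_consumer_timeout(environment_text: str) -> Optional[str]:
    """Return the configured value as text (e.g. '1800000'), or None if absent."""
    match = _TIMEOUT_RE.search(environment_text)
    return match.group(1) if match else None


def read_environment() -> str:
    """Run `rabbitmqctl environment` on this host and return its output."""
    result = subprocess.run(["rabbitmqctl", "environment"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On a host where `rabbitmqctl` is on the PATH, `parse_consumer_timeout(read_environment())` would then give the current setting, or `None` if the entry is not listed.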



--
Michał
RabbitMQ team