Frequent Timeout and Stability Issues Observed in RabbitMQ


niveddha raja

Oct 8, 2025, 10:44:02 AM
to rabbitmq-users

Dear Team, 

To address a RabbitMQ vulnerability, we upgraded RabbitMQ from 3.12.x (Erlang 25.3) to 3.13.x (Erlang 26.2) and then to 4.1.0 (Erlang 27.3.4). Since the upgrade, we have been encountering frequent timeout and stability issues.

We found that similar problems were discussed and reportedly resolved in later 4.1.x releases, as referenced in the following links:


 

We need clarification on the points below and would appreciate your input as soon as possible:

    • Are the timeout and stability issues (including unexpected node stops and queue crashes) fully resolved in RabbitMQ 4.1.4?
    • Can we use Erlang 27.3.4 with RabbitMQ 4.1.4 for stable, production-safe operation?
    • When executing the rabbitmqctl batch files, which .erlang.cookie file do they depend on: the one in the user profile path or the one in the system configuration path?
    • We observed that after upgrading RabbitMQ from 3.12.x → 3.13.x → 4.1.0, the cookie values sometimes differ. We want to understand how RabbitMQ determines which cookie file to use in such upgrade scenarios.
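
One way to check whether two cookie files have diverged is to compare their contents directly. The sketch below is only an illustration in Python. The two paths in the `__main__` block are assumptions based on a common Windows layout (the current user's profile and the service's system profile) and should be replaced with the locations that apply to your installation; `cookie_digest` and `cookies_match` are hypothetical helper names.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional


def cookie_digest(path: Path) -> Optional[str]:
    """Return a SHA-256 digest of the cookie file, or None if it cannot be read."""
    try:
        data = path.read_bytes()
    except OSError:
        return None
    # A digest is compared instead of the raw value so the secret is never printed.
    return hashlib.sha256(data).hexdigest()


def cookies_match(first: Path, second: Path) -> bool:
    """True only when both files are readable and hold byte-identical contents."""
    a, b = cookie_digest(first), cookie_digest(second)
    return a is not None and a == b


if __name__ == "__main__":
    # Assumed locations; adjust both to your environment.
    user_cookie = Path(os.path.expanduser("~")) / ".erlang.cookie"
    system_cookie = Path(r"C:\Windows\System32\config\systemprofile\.erlang.cookie")
    print("cookies match:", cookies_match(user_cookie, system_cookie))
```

A result of `False` means either that the files differ or that one of them could not be read by the account running the script, so it is worth running the check under the same account that runs rabbitmqctl.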

Configuration details used in our application:

  • Queue type: Classic
  • Handshake timeout: 30,000 ms
  • Heartbeat: Default

Message Flow

  1. Receive field message → Publish to queue1
    • If publish fails → NACK to field
    • If successful → ACK to field
  2. Consume message from queue1 (using BasicGet) → Convert for application → Publish to queue2
    • If publish fails → NACK to queue1
    • If successful → ACK to queue1
  3. Consume message from queue2 (using BasicGet) → Store in database
    • If DB operation fails → NACK to queue2
    • If successful → ACK to queue2

We are using BasicGet to ensure synchronous processing and prevent data loss.
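
The acknowledgement logic of step 2 can be sketched as follows. This is a Python illustration against a toy in-memory queue, not the RabbitMQ.Client API: `InMemoryQueue`, `relay_one`, and the requeue-on-NACK behaviour are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class InMemoryQueue:
    """Toy stand-in for a broker queue; NOT a RabbitMQ client."""

    def __init__(self) -> None:
        self._ready: Deque[bytes] = deque()
        self._unacked: Dict[int, bytes] = {}
        self._next_tag = 1

    def publish(self, body: bytes) -> None:
        self._ready.append(body)

    def basic_get(self) -> Optional[Tuple[int, bytes]]:
        """Pull one message, or return None when the queue is empty."""
        if not self._ready:
            return None
        tag, self._next_tag = self._next_tag, self._next_tag + 1
        body = self._ready.popleft()
        self._unacked[tag] = body
        return tag, body

    def ack(self, tag: int) -> None:
        del self._unacked[tag]

    def nack(self, tag: int) -> None:
        """Put the message back at the head of the queue (assumed requeue)."""
        self._ready.appendleft(self._unacked.pop(tag))

    def __len__(self) -> int:
        return len(self._ready)


def relay_one(source: InMemoryQueue,
              convert: Callable[[bytes], bytes],
              publish: Callable[[bytes], None]) -> bool:
    """Get one message, convert and publish it; ACK on success, NACK on failure."""
    got = source.basic_get()
    if got is None:
        return False
    tag, body = got
    try:
        publish(convert(body))
    except Exception:
        source.nack(tag)
        return False
    source.ack(tag)
    return True
```

In this sketch the source message is acknowledged only after the downstream publish returns without raising, which is the property the flow above relies on.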


 

Observed Behavior in 4.1.0

We often see timeout errors and cases where RabbitMQ stops unexpectedly without manual intervention, as noted in the logs from multiple customer environments.

Example Log Snippet

[error] <0.446.0> ** Reason for termination ==
[error] <0.446.0> ** {{badmatch,{error,eacces}},
[error] <0.446.0>     [{rabbit_classic_queue_index_v2,new_segment_file,3,
[{file,"rabbit_classic_queue_index_v2.erl"},{line,594}]},
...
[error] <0.950.0> Restarting crashed queue 'xx-xx-queue' in vhost '/'.
[info] <0.98.0> RabbitMQ is asked to stop..


We are using RabbitMQ.Client v6.2.2 on the client side. Our application and RabbitMQ run on the same single server node. When the exceptions below occur, we dispose of the existing connection and try to create a new one, but the attempt does not succeed.

 

Example Exceptions

1. BrokerUnreachableException

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable
---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed
---> System.Net.Sockets.SocketException: A non-recoverable error occurred during a database lookup
...
at RabbitMQ.Client.ConnectionFactory.CreateConnection(String clientProvidedName)

This leads to application disruption and inability to process field messages reliably. Even after properly disposing and recreating the connection, the re-connection often fails.

2. DNS Exception

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable
 ---> System.AggregateException: One or more errors occurred.
 ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed
 ---> System.Net.Sockets.SocketException: A non-recoverable error occurred during a database lookup
   at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
   at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at RabbitMQ.Client.Impl.TcpClientAdapter.<ConnectAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at RabbitMQ.Client.Impl.TaskExtensions.<TimeoutAfter>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, TimeSpan timeout)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, TimeSpan timeout)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, TimeSpan timeout, AddressFamily family)
   at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, TimeSpan connectionTimeout, TimeSpan readTimeout, TimeSpan writeTimeout)
   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.Init(IEndpointResolver endpoints)
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(String clientProvidedName)
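
The innermost frames of this trace are in System.Net.Dns, so the failure happens while resolving the broker host name, before any TCP connection is attempted. A minimal Python sketch for checking name resolution on the same machine is shown below. The host names probed in the `__main__` block are assumptions; substitute whatever HostName the application actually configures. `resolve_host` is a hypothetical helper, and 5672 is the default AMQP port.

```python
import socket
from typing import List


def resolve_host(host: str, port: int = 5672) -> List[str]:
    """Return the distinct addresses the OS resolver gives for host, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Assumed candidates: the loopback name and this machine's own host name.
    for name in ("localhost", socket.gethostname()):
        print(name, "->", resolve_host(name) or "resolution failed")
```

If the configured name fails to resolve intermittently while a literal IP address works, that would point at the resolver rather than at the broker.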


Luke Bakken

Oct 8, 2025, 3:52:59 PM
to rabbitmq-users
Are you running RabbitMQ on Windows servers? I ask because {error,eacces} can be caused by virus scanners or other "security" software (like CrowdStrike) interfering with RabbitMQ. It's best to disable all such software.
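
To gauge how often the broker hits this error, the log can be scanned for `eacces` entries. The following Python sketch is only an illustration: `eacces_by_process` is a hypothetical helper, and it assumes log lines carry an Erlang process id in the `<0.446.0>` form seen in the snippet earlier in this thread.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the process id in lines such as
# "[error] <0.446.0> ** {{badmatch,{error,eacces}},"
PID = re.compile(r"<\d+\.\d+\.\d+>")


def eacces_by_process(lines: Iterable[str]) -> Counter:
    """Count log lines mentioning eacces, grouped by the process id on the line."""
    counts: Counter = Counter()
    for line in lines:
        if "eacces" in line:
            match = PID.search(line)
            counts[match.group(0) if match else "unknown"] += 1
    return counts
```

It can be fed an open log file directly, for example `eacces_by_process(open(path, encoding="utf-8", errors="replace"))`, where `path` is wherever your node writes its log.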

You have the ability to research these issues yourself as well -


You can see that some code was added in 4.1.2 to help with this case.

Before asking any more questions on this mailing list, you must upgrade to version 4.1.4 to receive free support from Team RabbitMQ.

Thanks,
Luke

niveddha raja

Dec 15, 2025, 3:44:23 AM
to rabbitmq-users
Hello Luke,

Based on the recommendations provided, we have migrated to RabbitMQ 4.1.4 with Erlang/OTP 27.3.4.3. And yes, we are running RabbitMQ on Windows servers. We noticed that some related issues have been addressed, according to the GitHub links and the discussions on this Google Group.

Despite this, we are still encountering frequent “operation timed out” issues in our test environment.

While investigating, we found a suggestion from ChatGPT to decrease RequestHeartbeatTimeout from its default of 60 seconds to 30 seconds. We would like to know whether decreasing this value is a recommended approach and whether it could help mitigate the timeout issues. We would also like to understand the root cause of these timeouts.
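
As background for this question: as I understand the RabbitMQ heartbeat guide, heartbeat frames are sent roughly every timeout / 2 seconds, and a peer is considered unreachable after two missed heartbeats. The Python sketch below just works through that arithmetic for the two values mentioned; `heartbeat_timing` is a hypothetical helper, and the figures are approximations rather than guarantees.

```python
def heartbeat_timing(timeout_seconds: float) -> dict:
    """Derive the approximate send interval and failure-detection window."""
    interval = timeout_seconds / 2   # frames are sent about every timeout / 2
    missed_allowed = 2               # peer is treated as down after two misses
    return {
        "send_interval_s": interval,
        "detection_window_s": interval * missed_allowed,
    }


for timeout in (60, 30):
    print(timeout, heartbeat_timing(timeout))
```

Under that reading, a smaller value mainly changes how quickly a dead connection is noticed; it does not by itself say why the connection stopped responding.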

On the application side, when "operation timed out" exceptions occur, we handle them by attempting to recreate connections that have entered a faulted state. However, in certain cases, the recovery process fails even after five retry attempts.
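
A retry loop of this kind is often written with a growing delay between attempts. The Python sketch below is a generic illustration, not the RabbitMQ.Client recovery mechanism: `connect_with_backoff`, its default delays, and the jitter range are all assumptions chosen for the example, and `connect` stands for whatever function creates the connection.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(connect: Callable[[], T],
                         attempts: int = 5,
                         base_delay: float = 1.0,
                         max_delay: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, waiting longer after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:  # a real client would catch narrower exception types
            last_error = exc
            if attempt == attempts - 1:
                break
            # Double the delay each time, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so they do not all fire at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

With the defaults, five attempts span at most about 15 seconds of waiting (1 + 2 + 4 + 8), so a broker that takes longer than that to come back would still exhaust the retries.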

For context, our application is expected to process approximately 45 events per second.

We would appreciate your guidance and recommendations on how to address this issue effectively.

Best Regards,
Usharani K

Vilius Šumskas

Dec 15, 2025, 4:59:24 AM
to rabbitm...@googlegroups.com

Hi,

 

as this issue is related to your environment, you need to provide the specs of the environment where your RabbitMQ instance is running. In particular, for `eacces` errors: what kind of storage do you use to store queue data? How many IOPS can it handle? Is it by any chance shared network storage?

 

--

    Vilius


Luke Bakken

Dec 15, 2025, 11:27:14 AM
to rabbitmq-users
You may be running into this; check your logs: https://github.com/rabbitmq/rabbitmq-server/discussions/15134

In addition, you should have upgraded to the latest version of RabbitMQ, 4.2.1. There's no reason to use an older version.

Did you disable and remove ALL security and anti-virus software as I recommended?