Frequent Timeout and Stability Issues Observed in RabbitMQ


niveddha raja

Oct 8, 2025, 10:44:02 AM
to rabbitmq-users

Dear Team, 

As part of addressing a RabbitMQ vulnerability, we upgraded RabbitMQ from 3.12.x (Erlang 25.3) to 3.13.x (Erlang 26.2) and then to 4.1.0 (Erlang 27.3.4). Since the upgrade, we have been frequently encountering timeout and stability issues.

We found that similar problems were discussed and reportedly resolved in later 4.1.x releases, as referenced in the following links:


 

We need clarification on the points below and would appreciate your input as soon as possible:

    • Are the timeout and stability issues (including unexpected node stop and queue crash) fully resolved in RabbitMQ 4.1.4?
    • Can we use Erlang 27.3.4 version with RabbitMQ 4.1.4 for stable and production-safe operation?
    • When executing the rabbitmqctl batch files, which .erlang.cookie file do they depend on: the one in the user profile path or the one in the system configuration path?
    • We observed that after upgrading RabbitMQ from 3.12.x → 3.13.x → 4.1.0, the cookie values sometimes differ. We want to understand how RabbitMQ determines which cookie file to use in such upgrade scenarios.
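One way to confirm whether the cookie files really differ is to compare them directly. The sketch below is only an illustration: the list of paths to check is an assumption you must supply yourself (for example, the cookie in the profile of the user who runs rabbitmqctl and the cookie in the profile of the account that runs the RabbitMQ service). It compares short hashes so the secret itself is never printed.

```python
import hashlib
from pathlib import Path


def cookie_fingerprints(paths):
    """Map each candidate cookie path to a short SHA-256 fingerprint.

    Missing or unreadable files map to None. Fingerprints are compared
    instead of raw values so the secret itself is never displayed.
    """
    result = {}
    for path in paths:
        try:
            data = Path(path).read_bytes().strip()
        except OSError:
            result[str(path)] = None
            continue
        result[str(path)] = hashlib.sha256(data).hexdigest()[:12]
    return result


def cookies_match(paths):
    """True only if every path is readable and all fingerprints are equal."""
    prints = list(cookie_fingerprints(paths).values())
    return None not in prints and len(set(prints)) == 1
```

If `cookies_match` returns False for the two locations, the CLI tools and the node are likely authenticating with different cookies, which would be consistent with the symptom described in the last bullet.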

Configuration details used in our application:

  • Queue type: Classic
  • Handshake timeout: 30,000 ms
  • Heartbeat: Default

Message Flow

  1. Receive field message → Publish to queue1
    • If publish fails → NACK to field
    • If successful → ACK to field
  2. Consume message from queue1 (using BasicGet) → Convert for application → Publish to queue2
    • If publish fails → NACK to queue1
    • If successful → ACK to queue1
  3. Consume message from queue2 (using BasicGet) → Store in database
    • If DB operation fails → NACK to queue2
    • If successful → ACK to queue2

We are using BasicGet to ensure synchronous processing and prevent data loss.
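Step 2 of the flow above can be sketched as follows. This is a simulation only: `InMemoryQueue` and `relay_one` are hypothetical names invented for illustration, not part of RabbitMQ.Client or any broker API. The point it shows is the ordering: the source message is acknowledged only after the publish to the target has succeeded, and is negatively acknowledged (requeued) otherwise.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a broker queue with get/ack/nack semantics."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}
        self._next_tag = 1

    def publish(self, body):
        self._ready.append(body)

    def get(self):
        """Return (delivery_tag, body), or None when the queue is empty."""
        if not self._ready:
            return None
        tag = self._next_tag
        self._next_tag += 1
        body = self._ready.popleft()
        self._unacked[tag] = body
        return tag, body

    def ack(self, tag):
        del self._unacked[tag]

    def nack(self, tag, requeue=True):
        body = self._unacked.pop(tag)
        if requeue:
            self._ready.appendleft(body)

    def __len__(self):
        return len(self._ready)


def relay_one(source, target, convert):
    """Get one message from source, convert it, publish it to target.

    Acks the source message only after the publish succeeded; on any
    failure the message is nacked and requeued. Returns True when a
    message was relayed.
    """
    item = source.get()
    if item is None:
        return False
    tag, body = item
    try:
        target.publish(convert(body))
    except Exception:
        source.nack(tag, requeue=True)
        return False
    source.ack(tag)
    return True
```

With a real broker, the publish in the middle would normally also wait for a publisher confirm before the source message is acknowledged; the toy queue has no equivalent of that step.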


 

Observed Behavior in 4.1.0

We often see timeout errors and cases where RabbitMQ stops unexpectedly without manual intervention, as noted in the logs from multiple customer environments.

Example Log Snippet

[error] <0.446.0> ** Reason for termination ==

[error] <0.446.0> ** {{badmatch,{error,eacces}},

[error] <0.446.0>     [{rabbit_classic_queue_index_v2,new_segment_file,3,

[{file,"rabbit_classic_queue_index_v2.erl"},{line,594}]},

...

[error] <0.950.0> Restarting crashed queue 'xx-xx-queue' in vhost '/'.

[info] <0.98.0> RabbitMQ is asked to stop..


On the client side we are using RabbitMQ.Client v6.2.2. Our application and RabbitMQ run on the same single server node. We have observed the exceptions listed below. When they occur, we properly dispose of the existing connection and try to create a new one, but the attempt does not succeed.

 

Example Exceptions

1. BrokerUnreachableException

RabbitMQ.Client.Exceptions.BrokerUnreachableException:

None of the specified endpoints were reachable

---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed

---> System.Net.Sockets.SocketException: A non-recoverable error occurred during a database lookup

...

at RabbitMQ.Client.ConnectionFactory.CreateConnection(String clientProvidedName)

This leads to application disruption and inability to process field messages reliably. Even after properly disposing and recreating the connection, the re-connection often fails.
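For reference, a common way to structure the reconnection attempt is a bounded retry loop with exponential backoff, so that a broker that is still restarting is not hit with back-to-back connection attempts. The sketch below is generic and makes no claim about how the application in this thread is written: `connect` is a hypothetical callable standing in for whatever creates the connection, and the attempt count and delays are arbitrary example values.

```python
import time


def connect_with_backoff(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    The delay doubles after each failure, capped at max_delay. The last
    exception is re-raised when every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists only so the loop can be exercised without real waiting; in normal use the default `time.sleep` applies.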

2. DNS Exception

 RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> System.AggregateException: One or more errors occurred. ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.Net.Sockets.SocketException: A non-recoverable error occurred during a database lookup

   at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)

   at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)

   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)

--- End of stack trace from previous location where exception was thrown ---

   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

   at RabbitMQ.Client.Impl.TcpClientAdapter.<ConnectAsync>d__2.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

   at RabbitMQ.Client.Impl.TaskExtensions.<TimeoutAfter>d__0.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, TimeSpan timeout)

   --- End of inner exception stack trace ---

   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, TimeSpan timeout)

   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, TimeSpan timeout, AddressFamily family)

   at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, TimeSpan connectionTimeout, TimeSpan readTimeout, TimeSpan writeTimeout)

   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)

   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)

   --- End of inner exception stack trace ---

   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)

   at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.Init(IEndpointResolver endpoints)

   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)

   --- End of inner exception stack trace ---

   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)

   at RabbitMQ.Client.ConnectionFactory.CreateConnection(String clientProvidedName)
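The innermost frames of this trace are in System.Net.Dns, which suggests the failure happens while resolving the broker host name, before any TCP connection to RabbitMQ is attempted. A quick way to test name resolution on its own, outside the client library, is sketched below. The host name to check is whatever the application is configured with; port 5672 is used only because it is the default AMQP port and is an assumption here.

```python
import socket


def resolve_host(host, port=5672, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses for host, or [] if lookup fails."""
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address.
    return sorted({info[4][0] for info in infos})
```

An empty result for the configured host name while the broker is running would point at the resolver or hosts configuration on the machine rather than at RabbitMQ itself.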


Luke Bakken

Oct 8, 2025, 3:52:59 PM
to rabbitmq-users
Are you running RabbitMQ on Windows servers? I ask because {error,eacces} can be caused by virus scanners or other "security" software (such as CrowdStrike) interfering with RabbitMQ. It's best to disable all software like that.
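As a rough first check for the permission side of this, the sketch below tries to create and delete a scratch file in a given directory. It is an illustration under stated assumptions: the directory to probe is a placeholder you supply (for example, the node's data directory), and the probe should run under the same account as the RabbitMQ service to be meaningful. A single successful probe does not rule out intermittent interference from scanning software.

```python
import os
import tempfile


def can_create_file(directory):
    """Try to create and delete a scratch file in directory.

    Returns (True, None) on success or (False, error_text) when the OS
    refuses, for example with a permission error (EACCES).
    """
    try:
        fd, path = tempfile.mkstemp(prefix="probe-", dir=directory)
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    os.close(fd)
    os.remove(path)
    return True, None
```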

You have the ability to research these issues yourself as well -


You can see that some code was added in 4.1.2 to help with this case.

Before asking any more questions on this mailing list, you must upgrade to version 4.1.4 to receive free support from Team RabbitMQ.

Thanks,
Luke