RabbitMQ inconsistently fails to publish messages from .NET client on Windows

Abhilash Kumar K

Sep 12, 2019, 12:17:14 PM
to rabbitmq-users

Hi,

We are experiencing two problems in our production cluster.

Our setup:

RabbitMQ cluster:
3 nodes in Azure (3 different availability zones within the same subnet), all Windows Server
Partition handling strategy: pause_minority
Load balanced (Standard Azure LB) with a probing interval of 5 seconds (for HA)
All configuration is default except for the partition handling strategy etc.

Clients:
All clients are .NET Core 2.2 (uses MassTransit)

Problems:
1. When there is a network glitch, the node pauses, but it takes some time for the probe (a simple probe on port 5672) to notice, so messages published through this node (because of LB routing) time out.
2. For some reason, message publishing was failing continuously. One machine in our cluster was in pause mode at the time, and publishing failed with the log below. Retries did help in this case. After we restarted the machines, we could not reproduce the issue. We are worried that it could occur again.

On the server we found this log when the issue occurred (one instance):
-----------------------------------------------------------------------------------------
2019-09-11 13:10:32.063 [info] <0.22400.1> accepting AMQP connection <0.22400.1> (x.x.x.x:57291 -> y.y.y.y:5672)

2019-09-11 13:10:32.095 [info] <0.22400.1> connection <0.22400.1> (x.x.x.x:57291 -> y.y.y.y:5672): user 'myadmin' authenticated and granted access to vhost '/'


Please see the client log for the second issue:
-----------------------------------------------------------
2019-09-11 13:10:57.652 +08:00 [Error] One or more errors occurred. (Broker unreachable: mya...@x.x.x.x:5672/)
System.AggregateException: One or more errors occurred. (Broker unreachable: mya...@x.x.x.x:5672/) ---> MassTransit.RabbitMqTransport.RabbitMqConnectionException: Broker unreachable: mya...@x.x.x.x:5672/ ---> RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> System.AggregateException: One or more errors occurred. (Connection failed) ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.TimeoutException: The operation has timed out.
   at RabbitMQ.Client.Impl.TaskExtensions.TimeoutAfter(Task task, Int32 millisecondsTimeout)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout, AddressFamily family)
   at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 connectionTimeout, Int32 readTimeout, Int32 writeTimeout)
   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   at MassTransit.RabbitMqTransport.Integration.ConnectionContextFactory.CreateConnection(ISupervisor supervisor)
   --- End of inner exception stack trace ---
   at MassTransit.RabbitMqTransport.Integration.ConnectionContextFactory.CreateConnection(ISupervisor supervisor)
   at MassTransit.RabbitMqTransport.Integration.ConnectionContextFactory.CreateSharedConnection(Task`1 context, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at MassTransit.RabbitMqTransport.Integration.ModelContextFactory.CreateSharedModel(Task`1 context, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at MassTransit.RabbitMqTransport.Transport.RabbitMqSendTransport.MassTransit.Transports.ISendTransport.Send[T](T message, IPipe`1 pipe, CancellationToken cancellationToken)
   at MassTransit.Transports.PublishEndpoint.Publish[T](CancellationToken cancellationToken, T message, PublishPipeContextAdapter`1 adapter)
   at MassTransit.Transports.PublishEndpoint.Publish[T](CancellationToken cancellationToken, T message, PublishPipeContextAdapter`1 adapter)
   at MessageSender.Lib.Publisher.PublisherMessageService.SendAsync(PublisherMessage message)
   at MessageSender.WebAPIs.PublisherProject.Services.PublisherProjectService.InitiatePublisherMessageHandlerAsync(PublisherProjectResult result, PublisherProject project) in C:\Agent2\_work\30\s\MessageSender.WebAPIs\MessageSender.WebAPIs.PublisherProject\Services\PublisherProjectService.cs:line 457
   at MessageSender.WebAPIs.PublisherProject.Services.PublisherProjectService.SavePublisherProjectAsync(PublisherProject project) in C:\Agent2\_work\30\s\MessageSender.WebAPIs\MessageSender.WebAPIs.PublisherProject\Services\PublisherProjectService.cs:line 124
   at MessageSender.Lib.Api.Framework.Framework.Controllers.BaseApiController.<>c__DisplayClass21_0`1.<<ExecuteGatewayCall>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at MessageSender.Lib.Framework.DataAccess.DbRetryManager.ExecuteAsync(Int32 maxAttempts, TimeSpan retryInterval, Func`1 executeMethod)
   at MessageSender.Lib.Framework.DataAccess.DbRetryManager.ExecuteAsync(Int32 maxAttempts, TimeSpan retryInterval, Func`1 executeMethod)
   at MessageSender.Lib.Api.Framework.Framework.Controllers.BaseApiController.ExecuteGatewayCall[TReturn](Func`1 methodToCall, TransactionScopeOption transactionScopeOption, Int32 maxRetryAttempts)
   at MessageSender.Lib.Api.Framework.Framework.Controllers.BaseApiController.ExecuteGatewayCall[TReturn](Func`1 methodToCall)
   at MessageSender.WebAPIs.PublisherProject.Controllers.PublisherProjectController.SavePublisherProjectAsync(PublisherProject PublisherProject) in C:\Agent2\_work\30\s\MessageSender.WebAPIs\MessageSender.WebAPIs.PublisherProject\Areas\V1\Controllers\PublisherProjectController.cs:line 104
   at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Rethrow(ResourceExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
   at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Cors.Infrastructure.CorsMiddleware.Invoke(HttpContext context)
   at MessageSender.Lib.Api.Framework.Middlewares.CustomExceptionMiddleware.Invoke(HttpContext context)
---> (Inner Exception #0) MassTransit.RabbitMqTransport.RabbitMqConnectionException: Broker unreachable: mya...@x.x.x.x:5672/ ---> RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> System.AggregateException: One or more errors occurred. (Connection failed) ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.TimeoutException: The operation has timed out.
   at RabbitMQ.Client.Impl.TaskExtensions.TimeoutAfter(Task task, Int32 millisecondsTimeout)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout, AddressFamily family)
   at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 connectionTimeout, Int32 readTimeout, Int32 writeTimeout)
   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   at MassTransit.RabbitMqTransport.Integration.ConnectionContextFactory.CreateConnection(ISupervisor supervisor)
   --- End of inner exception stack trace ---
   at MassTransit.RabbitMqTransport.Integration.ConnectionContextFactory.CreateConnection(ISupervisor supervisor)
   at MassTransit.RabbitMqTransport.Integration.ConnectionContextFactory.CreateSharedConnection(Task`1 context, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)
   at GreenPipes.Agents.PipeContextSupervisor`1.GreenPipes.IPipeContextSource<TContext>.Send(IPipe`1 pipe, CancellationToken cancellationToken)<---

Can somebody please help?

Michael Klishin

Sep 17, 2019, 2:16:30 PM
to rabbitmq-users

Arabela P

Oct 6, 2019, 12:37:55 PM
to rabbitmq-users
Hi Michael,

I am facing the same issue in another setup (aks cluster - kubernetes cluster).

Below is the stack trace:

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> System.AggregateException: One or more errors occurred. (Connection failed) ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.Net.Sockets.SocketException: Connection refused
   at System.Net.Sockets.Socket.BeginConnectEx(EndPoint remoteEP, Boolean flowContext, AsyncCallback callback, Object state)
   at System.Net.Sockets.Socket.UnsafeBeginConnect(EndPoint remoteEP, AsyncCallback callback, Object state, Boolean flowContext)
   at System.Net.Sockets.Socket.BeginConnect(EndPoint remoteEP, AsyncCallback callback, Object state)
   at System.Net.Sockets.Socket.BeginConnect(IPAddress address, Int32 port, AsyncCallback requestCallback, Object state)
   at System.Net.Sockets.Socket.ConnectAsync(IPAddress address, Int32 port)
   at RabbitMQ.Client.TcpClientAdapter.ConnectAsync(String host, Int32 port)
   at RabbitMQ.Client.Impl.TaskExtensions.TimeoutAfter(Task task, Int32 millisecondsTimeout)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout, AddressFamily family)
   at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 connectionTimeout, Int32 readTimeout, Int32 writeTimeout)
   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.Init(IEndpointResolver endpoints)
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)

I took a tcp dump and I see no traffic. The endpoint name is correct and there are no networking problems.
I enabled log level debug on the service and I see no events that could help me.

The behavior is inconsistent and I really need some help. I am stuck. I followed all the steps mentioned in the troubleshooting area.

Luke Bakken

Oct 6, 2019, 1:34:11 PM
to rabbitmq-users
RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> System.AggregateException: One or more errors occurred. (Connection failed) ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.Net.Sockets.SocketException: Connection refused
 
I took a tcp dump and I see no traffic. The endpoint name is correct and there are no networking problems.
I enabled log level debug on the service and I see no events that could help me.

The behavior is inconsistent and I really need some help. I am stuck. I followed all the steps mentioned in the troubleshooting area.

There is nothing here to suggest a bug in either the .NET client or RabbitMQ.

"Connection refused" means that either RabbitMQ is not running or listening on the port you expect, or a network device between your application and RabbitMQ is blocking connections. Since this error is inconsistent, the latter is the more likely explanation.

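The two failure modes seen in this thread (a connect timeout in the first client log, "Connection refused" in the second) can be told apart with a single TCP connection attempt. The following is a minimal Python sketch and not something from the original posts; the function name classify_connect and the 5-second default timeout are illustrative assumptions.

```python
import socket


def classify_connect(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection and report how it ended.

    Returns "open", "refused", "timeout", or "error: <detail>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The attempt was actively rejected: nothing is listening on that
        # port, or something on the path refused it.
        return "refused"
    except socket.timeout:
        # No answer within the deadline.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or unreachable network.
        return f"error: {exc}"
```

Running classify_connect against the broker address and port 5672 from the affected client, and again from a client that works, would show whether both see the same result. A "refused" result usually comes back quickly, while "timeout" only appears after the full deadline.
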
A stack trace alone is not enough for us to help you. Before we can start, we need the following information:
  • Version of Erlang and RabbitMQ
  • Operating system and version
  • Better description of your environment. "aks cluster - kubernetes cluster" provides no useful information
  • RabbitMQ configuration files
  • RabbitMQ log files, or log file entries corresponding to this error
Thanks,
Luke

Arabela P

Oct 7, 2019, 4:09:15 AM
to rabbitmq-users
We are using the following Dockerfile:

FROM rabbitmq:3.7.17

# Update apt-get, install curl and unzip
RUN apt-get update && apt-get install -y curl unzip

# Download and unzip delayed messages plugin, also remove zip and move the plugin to the plugins folder
unzip rabbitmq_delayed_message_exchange-20171201-3.7.x.zip && \
rm -f rabbitmq_delayed_message_exchange-20171201-3.7.x.zip && \
mv rabbitmq_delayed_message_exchange-20171201-3.7.x.ez plugins/

COPY enabled_plugins /etc/rabbitmq/enabled_plugins
COPY rabbitmq.conf /etc/rabbitmq/rabbitmq.conf

# Enable the delayed message plugin
RUN rabbitmq-plugins enable rabbitmq_delayed_message_exchange

EXPOSE 1883/tcp
EXPOSE 15672/tcp
EXPOSE 5672/tcp 

rabbitmq.conf file content:

loopback_users.guest = false
listeners.tcp.default = 5672
hipe_compile = false
log.console = true
log.console.level = debug

We are consuming the service from a .NET Core 2.2 application using <PackageReference Include="RabbitMQ.Client" Version="5.1.1" />

For hosting we are using an Azure Kubernetes cluster (Kubernetes version 1.14.6) with 4 nodes (Linux machines). Initially I thought it was a Kubernetes issue, so I opened a support ticket with Microsoft.
They suggested collecting a tcpdump capture to see all the traffic.
I ran the following command on the pod that calls the service (container) and sent the capture to Microsoft: tcpdump -i eth0 -s 0 -w -c 500 /tmp/mqtttrace.pcap
They saw no traffic, and when I asked whether it could be a firewall issue, the answer was:
"No, firewall/SELinux should not be an issue, because tcpdump uses libpcap and libpcap processes packets before they get processed by the firewall. If it was dropped by the firewall or SELinux, then we should see that in the tcpdump file."

Michael Klishin

Oct 7, 2019, 10:26:30 AM
to rabbitmq-users
We don't have much to add to [1]. Networking protocols are layered and so are problems that can affect connectivity.
Only you have access to your environment and can realistically narrow the problem down. [1] offers a methodology
that very likely will save you time.

As a side note, you are enabling remote access for the user with default, well-known credentials. This is a terrible security practice that we recommend against [2].


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/cdca3991-af5c-49e0-9e5c-b1a4cf51f71e%40googlegroups.com.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Arabela P

Oct 8, 2019, 2:38:03 AM
to rabbitmq-users
I've tried the steps from the troubleshooting page (and I've attached the results of the steps suggested).
I saw no issue with TCP connections, and the fact that I see no records in the tcpdump capture is very strange. It seems the connection is not made at all (the stack trace shows System.Net.Sockets.SocketException: Connection refused, and no firewall rules are configured).
The client that tries to connect to the RabbitMQ service has the IP address 10.244.7.10.
As you can see in the list_connections output, I already have 2 clients connected in the same environment. One day ago this client could also connect to RabbitMQ, but now, suddenly, it can't.
troubleshooting.txt

Michael Klishin

Oct 8, 2019, 9:07:57 AM
to rabbitmq-users
All connections that reach RabbitMQ are logged [1]. A traffic capture can tell you more than any other source of information [2]
as to what is going on with the connection.

Don't guess, guessing is too expensive.
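
Since every connection that reaches the broker is logged, one way to check whether a failing client ever got that far is to scan the log for accepted connections from its address. The sketch below is a hedged Python example, assuming log lines in the format shown earlier in this thread ("accepting AMQP connection <pid> (client:port -> server:port)") and IPv4 client addresses; the function name accepted_clients is hypothetical.

```python
import re
from collections import Counter

# Matches the "accepting AMQP connection" entries and captures the client
# address and port that precede the "->" arrow. IPv4 addresses are assumed.
ACCEPT_RE = re.compile(
    r"accepting AMQP connection <[^>]+> "
    r"\((?P<client>[^:\s]+):(?P<cport>\d+) -> "
)


def accepted_clients(log_lines):
    """Count accepted AMQP connections per client address."""
    counts = Counter()
    for line in log_lines:
        match = ACCEPT_RE.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts
```

The input can be any iterable of lines, for example a saved copy of the broker's console output. If the count for 10.244.7.10 is zero over the period when that client was failing, its attempts did not reach RabbitMQ, which points at something in between.
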



Luke Bakken

Oct 8, 2019, 12:11:22 PM
to rabbitmq-users
Hello,

Thanks for the comprehensive amount of information. All the details you have provided suggest a network problem that is preventing connections from being established.

From the 10.244.7.10 machine, what happens when you run this command?

telnet 10.244.7.62 5672

If you don't have telnet, you can install a similar tool like netcat. My guess is that you will see "connection refused".
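
Where neither telnet nor netcat is available, a few lines of Python can perform the same check and go one step further by sending the AMQP 0-9-1 protocol header and looking at the first bytes of the reply. This is only a sketch under assumptions: the function name amqp_greeting and the default timeout are made up for illustration, and the address to pass in is the one from the telnet command above.

```python
import socket

# Protocol header a client sends first under AMQP 0-9-1.
AMQP_091_HEADER = b"AMQP\x00\x00\x09\x01"


def amqp_greeting(host: str, port: int = 5672, timeout: float = 5.0) -> str:
    """Connect, send the AMQP 0-9-1 header, and describe the first reply bytes.

    Connection errors (refused, timed out) are not caught here; they
    propagate to the caller as the usual socket exceptions.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(AMQP_091_HEADER)
        first = sock.recv(8)
    if not first:
        return "closed without reply"
    if first[:1] == b"\x01":
        # Frame type 1 is a method frame; a broker that accepts the header
        # is expected to open with connection.start.
        return "amqp method frame"
    if first.startswith(b"AMQP"):
        # The peer answered with its own protocol header, which signals
        # that it does not accept the version that was sent.
        return "protocol header (version mismatch)"
    return "unexpected reply"
```

A ConnectionRefusedError or socket.timeout raised by the call corresponds to the connection never being established, which is the case suspected here.
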

Thanks,
Luke