ServiceBus Error with Competing Consumers and Queue Creation


Adam Tybor

May 2, 2014, 12:36:19 PM5/2/14
to particula...@googlegroups.com
Helpful information to include
Product name: NServiceBus.Azure.Transports.WindowsAzureServiceBus
Version: 5.3.0
Stacktrace:
Microsoft.ServiceBus.Messaging.MessagingException: The remote server returned an error: (409) Conflict. SubCode=40901. Another conflicting operation is in progress..TrackingId:c7a40d9a-ac31-49e1-844a-b4d48362b418_G8
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at Microsoft.ServiceBus.Messaging.ServiceBusResourceOperations.CreateOrUpdateAsyncResult`1.<GetAsyncSteps>b__1f(CreateOrUpdateAsyncResult`1 thisPtr
   at Microsoft.ServiceBus.Messaging.IteratorAsyncResult`1.StepCallback(IAsyncResult result)
   --- End of inner exception stack trace ---
Server stack trace: 
Exception rethrown at [0]: 
   at Microsoft.ServiceBus.Common.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.ServiceBus.NamespaceManager.CreateOrUpdateQueueAsyncResult.CreateDescription(CreateOrUpdateQueueAsyncResult thisPtr
   at Microsoft.ServiceBus.Messaging.IteratorAsyncResult`1.StepCallback(IAsyncResult result)
Exception rethrown at [1]: 
   at Microsoft.ServiceBus.Common.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.ServiceBus.NamespaceManager.CreateQueue(QueueDescription description)
   at NServiceBus.Azure.Transports.WindowsAzureServiceBus.AzureServiceBusQueueCreator.Create(Address address) in y:\BuildAgent\work\ba77a0c29cee2af1\src\NServiceBus.Azure.Transports.WindowsAzureServiceBus\Creation\Resources\AzureServiceBusQueueCreator.cs:line 56
   at NServiceBus.Azure.Transports.WindowsAzureServiceBus.QueueAutoCreation.Run() in y:\BuildAgent\work\ba77a0c29cee2af1\src\NServiceBus.Azure.Transports.WindowsAzureServiceBus\Creation\QueueAutoCreation.cs:line 33
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at NServiceBus.Configure.Initialize() in y:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Configure.cs:line 368
   at NServiceBus.Configure.CreateBus() in y:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Configure.cs:line 297
   at NServiceBus.Hosting.GenericHost.Start() in y:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Hosting\GenericHost.cs:line 72

Description:

We are launching multiple instances of our worker roles in Azure using a competing-consumer pattern, so they all try to initialize the same queue.  Depending on timing, some of our nodes hit the error above.  This exception type is actually considered a transient error: the IsTransient flag on the exception is true.  Can we get better error handling / retries around queue creation to prevent these exceptions from bubbling up?
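
For illustration, this is the kind of guard we'd like to see around queue creation. This is a sketch only: `EnsureQueueExists` is our own naming, not NServiceBus code, and it assumes the Microsoft.ServiceBus SDK types (`NamespaceManager`, `MessagingException`, `MessagingEntityAlreadyExistsException`).

```csharp
// Sketch: retry transient conflicts when competing consumers race to create
// the same queue. EnsureQueueExists is illustrative, not NServiceBus API.
static void EnsureQueueExists(NamespaceManager namespaceManager, string queueName)
{
    const int maxAttempts = 5;
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            if (!namespaceManager.QueueExists(queueName))
                namespaceManager.CreateQueue(queueName);
            return;
        }
        catch (MessagingEntityAlreadyExistsException)
        {
            return; // another instance won the race; that's fine
        }
        catch (MessagingException ex)
        {
            // e.g. 409 SubCode=40901 "Another conflicting operation is in progress"
            if (!ex.IsTransient || attempt == maxAttempts)
                throw;
            Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt))); // simple backoff
        }
    }
}
```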

Also, is there a way to gracefully handle initialization errors such that we can try and recover ourselves from these types of things?

Thanks,
Adam

Yves Goeleven

May 10, 2014, 6:55:20 AM5/10/14
to particula...@googlegroups.com
I would have expected a MessagingEntityAlreadyExistsException in this case rather than the base MessagingException, but I'll add a check for transient MessagingExceptions (Issue 137)

You can always catch any unhandled exception by registering for the AppDomain.CurrentDomain.UnhandledException event and handling it there.
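
For example (a sketch; note that by the time UnhandledException fires the runtime is already tearing the process down, so this is a place to log and clean up, not a recovery point):

```csharp
// Last-chance hook: observe the failure before the process goes down.
AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
{
    var ex = args.ExceptionObject as Exception;
    Console.Error.WriteLine("Unhandled exception, process will terminate: {0}", ex);
};
```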

Adam Tybor

May 13, 2014, 5:44:10 PM5/13/14
to particula...@googlegroups.com
If we handle it there, is there any way to re-initialize the bus or the transport?  My assumption is this error gets thrown in the transport, and the transport then basically shuts down, so now that I think about it I'm not sure we could actually do anything with the exception.

Are there any plans to make NSB lightweight enough that we can spin it up and shut it down safely in an AppDomain, or run multiple buses in an AppDomain?

Adam

Yves Goeleven/Particular Software

May 14, 2014, 1:52:58 AM5/14/14
to particula...@googlegroups.com

I see what you're getting at; I don't think there is an easy way to start over initializing the bus. But isn't your role instance recycling on this exception? Or are you running in the shared host?

I need to check with the team for the multi bus support per host process.



Adam Tybor

May 14, 2014, 8:46:44 AM5/14/14
to particula...@googlegroups.com, Yves Goeleven/Particular Software
This is a problem in general, regardless of host: if the transport crashes, and it can crash for any reason, our whole AppDomain needs to be recycled.  So apps running in IIS will constantly be recycled for every unhandled exception, and our Windows services will be recycled too.  I am less concerned about the Windows services, since those are truly async and won't have a user on the other end.  But for web apps it's a big problem if we kick people out of the app every time there is a hiccup in the transport.

I think there are two options here.
1) Fix NSB to get rid of statics, allow multiple buses per process, and give the user control over starting, stopping, and re-initializing the bus within an AppDomain.

2) Make the ASB transport far more resilient to failures.  Currently it attempts to retry, and after x retries it throws on a background thread, crashing the bus.  This is really bad behavior, since it should be expected that the ASB service, an entity, or even a single partition goes down for several seconds and up to a couple of minutes.  It also gets interesting when you think about all the ancillary connections involved in a typical service.  It's not just the worker queue I need to worry about crashing my process; it's also things like timeouts, the SLR, and other satellites.

Adam

Yves Goeleven

May 14, 2014, 8:54:36 AM5/14/14
to particula...@googlegroups.com, Yves Goeleven/Particular Software
nr 2 is on my list (https://github.com/Particular/NServiceBus.Azure/issues/136), but the problem is that there is no reliable alternative place to store the messages if ASB is gone. It's probably a networking issue, which also means we can't rely on disk (disks are not persisted in cloud services, and are persisted remotely for VMs, so they're subject to the same networking issues), we can't rely on storage, and putting messages in memory is not a good idea either... so besides crashing the process, we don't see many alternatives

Yves Goeleven

May 14, 2014, 8:57:14 AM5/14/14
to particula...@googlegroups.com, Yves Goeleven/Particular Software
PS: this issue is not limited to Azure Service Bus or Azure in general; it happens in every big virtualized environment. So please don't conclude that running with MSMQ on Amazon EC2 disks backed by EBS is going to be any better, it has the same issue...

Adam Tybor

May 14, 2014, 9:42:09 AM5/14/14
to particula...@googlegroups.com, Yves Goeleven/Particular Software
:) Fully aware that this is a problem everywhere.

There are two paths that need to be considered when handling transport errors.  The easy path is the send/publish path.  Since sending doesn't happen on a background thread, it is safe to throw here and let the user handle it.  If they want recovery, they can easily catch the exception.  Each time send/publish is called, a new connection to the transport can be tried; I don't think it's worth crashing the process here.

The harder path is what to do for the background listeners.  Throwing here forces the process to be recycled, which is the only way to recover, and I think that is a really bad idea.  It would be better to wrap this in a constant retry of some kind and keep logging exceptions, unless it's something fatal like the queue not existing and not being creatable, or invalid permissions to receive.  I think John did some work like this on the RabbitMQ transport to make it more resilient.
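
Roughly the shape I have in mind for the receive side (a sketch under assumptions: `receiveOne` stands in for the transport's message pump and is not actual transport code):

```csharp
// Sketch: keep the background listener alive across transient failures,
// backing off between attempts; only fatal errors escape the loop.
void ReceiveLoop(Action receiveOne, CancellationToken token)
{
    var delay = TimeSpan.FromSeconds(1);
    var maxDelay = TimeSpan.FromMinutes(1);
    while (!token.IsCancellationRequested)
    {
        try
        {
            receiveOne();                     // pump one message
            delay = TimeSpan.FromSeconds(1);  // healthy again: reset backoff
        }
        catch (MessagingException ex)
        {
            if (!ex.IsTransient)
                throw; // fatal: e.g. queue gone and not creatable, bad permissions
            // log the exception, then back off instead of crashing the process
            Thread.Sleep(delay);
            delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
        }
    }
}
```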

Adam

Yves Goeleven

May 14, 2014, 10:00:41 AM5/14/14
to particula...@googlegroups.com, Yves Goeleven/Particular Software
I'll consider a 'keep trying' approach, but it is quite the opposite of the general cloud-services mantra, which says that failure is normal, so just crash and let the instance recycle...


Adam Tybor

May 14, 2014, 10:19:52 AM5/14/14
to particula...@googlegroups.com, Yves Goeleven/Particular Software
I don't know that you'd retry indefinitely, and I don't think the cloud mantra is 'crash'; it's 'design for failure'.  Recycling a worker role can be very expensive, so I think we just need more control over the retry behavior here.

Adam