Azure Service Bus - Active Replication

Mike Gore

Mar 2, 2017, 12:02:19 AM
to Particular Software

Hi,

This ties into a previous topic. I am trying to get a partitioning strategy to behave like the active replication example in the Azure Service Bus docs. That is:
  1. Attempt to send a message to both namespaces.
  2. If one send fails but the other succeeds, treat it as a success (this is the behavior I'm struggling to mimic).
  3. If both sends fail, throw an exception.
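
For illustration, the three steps above can be sketched independently of the transport. This is a minimal sketch, not the transport's API: `ActiveReplicationSender`, `SendToAllAsync`, and the per-namespace sender delegates are hypothetical names, and each delegate is assumed to wrap a send to one namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of NServiceBus or the ASB transport.
public static class ActiveReplicationSender
{
    // Sends the same message through every delegate (one per namespace).
    // Returns the number of successful sends; throws only if every send fails.
    public static async Task<int> SendToAllAsync<T>(T message, IReadOnlyList<Func<T, Task>> senders)
    {
        // Start all sends, capturing each failure instead of letting it escape.
        var attempts = senders.Select(async send =>
        {
            try
            {
                await send(message).ConfigureAwait(false);
                return (Exception)null;
            }
            catch (Exception ex)
            {
                return ex;
            }
        }).ToList();

        var results = await Task.WhenAll(attempts).ConfigureAwait(false);
        var failures = results.Where(ex => ex != null).ToList();

        // Step 3: a failure on every namespace is reported to the caller.
        if (failures.Count == senders.Count)
        {
            throw new AggregateException("Send failed on every namespace", failures);
        }

        // Step 2: at least one send succeeded, so the operation counts as a success.
        return senders.Count - failures.Count;
    }
}
```

Because each failure is captured per namespace, the caller knows exactly which sends failed, so no message needs to be sent twice to the healthy namespace.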

We want this rather than the out-of-the-box strategies so that, by default, we always send messages to both namespaces. This minimizes the risk of message loss and ensures we can still process all messages during Service Bus downtime. The best I've been able to come up with is the code below, which sets all active namespaces to also be passive namespaces. The main issue I can see is that we can't determine whether the initial send failed, so we can't optimize which secondaries to return. I'm just reversing the primary list at the moment, but there's no guarantee that the primary would be the failing namespace. I will therefore end up with two sends to the namespace that hasn't gone down, which feels wasteful.

If I remove the secondary namespaces, Bus.Send throws an exception, which I'm trying to avoid. Are there any better approaches I could take? A potential alternative is to spin up a third namespace that is always passive while the current two namespaces are set to active. In the case of a namespace outage (assuming each namespace is in a separate region), we should still have at least two namespaces that can accept messages.

    public class ActiveReplicationNamespacePartitioningStrategy : INamespacePartitioningStrategy
    {
        private readonly NamespaceConfigurations namespaces;

        public ActiveReplicationNamespacePartitioningStrategy(ReadOnlySettings settings)
        {
            if (settings.TryGet("AzureServiceBus.Settings.Topology.Addressing.Namespaces", out namespaces) && namespaces.Count == 2)
            {
                return;
            }

            throw new Exception("The 'Active Replication' namespace partitioning strategy requires two namespaces");
        }

        public IEnumerable<RuntimeNamespaceInfo> GetNamespaces(PartitioningIntent partitioningIntent)
        {
            if (partitioningIntent == PartitioningIntent.Sending)
            {
                var activePassive = new List<RuntimeNamespaceInfo>();

                // Every namespace is returned as active so that each send targets both.
                activePassive.AddRange(namespaces.Select(ns => new RuntimeNamespaceInfo(ns.Alias, ns.ConnectionString, NamespacePurpose.Partitioning, NamespaceMode.Active)));
                // The same namespaces, in reverse order, are returned again as passive fallbacks.
                activePassive.AddRange(namespaces.Reverse().Select(ns => new RuntimeNamespaceInfo(ns.Alias, ns.ConnectionString, NamespacePurpose.Partitioning, NamespaceMode.Passive)));

                return activePassive;
            }
            else
            {
                return namespaces.Select(ns => new RuntimeNamespaceInfo(ns.Alias, ns.ConnectionString, NamespacePurpose.Partitioning));
            }
        }
    }

Thanks,

Mike.

Sean Feldman

Mar 10, 2017, 2:03:56 PM
to Particular Software
Mike,

Could you elaborate on how this would be different from the FailOver partitioning provided by the ASB transport? https://docs.particular.net/nservicebus/azure-service-bus/multiple-namespaces-support#fail-over-namespace-partitioning

I'm trying to understand the difference between what FailOver partitioning does and what you describe in the 3 steps above.

Thank you,
Sean

Mike Gore

Mar 18, 2017, 1:00:44 AM
to Particular Software
Hi Sean,

The key difference is that we always attempt to deliver to two namespaces. If there is an outage, messages that would typically be stuck in one namespace can still be processed because they exist in the other. The concern is that the send was successful but the message isn't received in a timely fashion or, in the worst case, is lost because of an issue within ASB (depending on the nature of the outage). Of course, the trade-off is having to handle idempotency and process more messages. However, the messages contain information about customer payments, so we do not want to risk losing them.
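
To make the idempotency trade-off concrete, here is a minimal sketch of a receiver-side duplicate filter. It assumes both copies of a message carry the same message ID. `DuplicateFilter` and `HandleOnce` are hypothetical names, and the in-memory dictionary stands in for what would need to be a durable store updated together with the business data.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of NServiceBus or the ASB transport.
public sealed class DuplicateFilter
{
    // In-memory record of processed message IDs (assumption: a real system persists these).
    private readonly ConcurrentDictionary<string, bool> seen = new ConcurrentDictionary<string, bool>();

    // Runs the handler only the first time a given message ID is observed.
    // Returns true if the handler ran, false if the message was a duplicate.
    public bool HandleOnce(string messageId, Action handler)
    {
        if (!seen.TryAdd(messageId, true))
        {
            return false;
        }

        try
        {
            handler();
            return true;
        }
        catch
        {
            // Forget the ID so that the copy from the other namespace can still be processed.
            seen.TryRemove(messageId, out _);
            throw;
        }
    }
}
```

With a filter like this in front of the handler, the second copy of a payment message arriving from the other namespace is skipped rather than applied twice.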

With the fail over strategy we only attempt to deliver to one namespace, and an outage in that namespace switches which namespace is used. So whilst we should always be able to send (assuming at least one namespace is available), the impact of an outage differs: messages already sent would be stuck, and there is a risk (albeit small) of data loss.

An alternative would be to introduce a buffer between the sender and the bus, so messages could be replayed if required and the fail over strategy could be used here.

Yves Goeleven

Mar 18, 2017, 12:06:59 PM
to Particular Software
Sounds like you're looking for replication instead of failover?

Maybe this sample can help and act as a starting point to get replication going: https://docs.particular.net/samples/azure/custom-partitioning-asb/

Sean Feldman

Mar 20, 2017, 12:37:06 PM
to particula...@googlegroups.com
Mike,

One thing to clarify about ASB.

> This means if there is an outage then messages that would typically be stuck in one namespace can still be processed as they exist in the other, the concern here is more that the send was successful but the message isn't received in a timely fashion, or even worst case lost because of an issue within ASB (depending on the nature of their outage)

If a message is reported by the client as successfully sent to the broker, the broker will never lose that message. If an outage is taking place, your message either won't be sent successfully, or it will be on the broker, inaccessible until the outage is resolved, but never lost. In a scenario where the entire data center hosting the namespace goes down (all 3 replicas), the message would not be accessible until the data center is back. But again, there is no loss, just a temporary inability to access it. Only in the unlikely case of a natural disaster where the DC is wiped out would the message be lost. That is no different, BTW, from Azure SQL or Storage blobs.