Publish hangs if RabbitMq confirmation acknowledgement is lost


George

Oct 18, 2013, 12:53:08 PM
to masstrans...@googlegroups.com
Hi,
I am evaluating MassTransit / RabbitMq and testing the scenario where the ack from Rabbit is lost before it reaches the publishing application.
Specifically, I am trying to emulate a network connection that gets dropped after the publish but before the ack is received by the caller.
 
I can see in RabbitMqProducer that a task is started and is either completed (when the ack is received) or an exception is raised (when the nack is received).
OutboundRabbitMqTransport will initiate the task using the producer and then Wait for the task to complete.
 
I am trying to simulate a network cable becoming unplugged so that the ack and/or nack never arrive.
My first attempt to simulate this was to simply comment out the RabbitMqProducer code that gets executed when the ack is received (i.e. simulate a lost ack).
Of course, when I do this my publisher hangs, because OutboundRabbitMqTransport waits forever for the task to complete.
I am going to run another simulation where I actually unplug the network cable to see what happens (i.e. whether the channel gets broken and an error is raised, or whether the publisher simply hangs).
 
Has anyone run into this situation before? If so, how did you deal with it? Ideally I would like some kind of configurable timeout on the Wait.
 
Thanks
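
A minimal sketch of the kind of timeout asked for above, assuming the confirmation is exposed as a plain .NET Task. The helper name WaitForConfirm is hypothetical and is not part of MassTransit; a TaskCompletionSource that is never completed stands in for the lost ack.

```csharp
using System;
using System.Threading.Tasks;

public static class ConfirmWait
{
    // Hypothetical helper (not a MassTransit API): waits for a confirm task,
    // but gives up once the timeout elapses instead of blocking forever.
    public static void WaitForConfirm(Task confirm, TimeSpan timeout)
    {
        // Task.Wait(TimeSpan) returns false when the task did not finish in time.
        // If the task faulted (e.g. a nack), it throws AggregateException instead.
        if (!confirm.Wait(timeout))
            throw new TimeoutException("No ack or nack received within " + timeout);
    }

    public static void Main()
    {
        // Never completed, so it behaves like a confirm whose ack was lost.
        var lostAck = new TaskCompletionSource<bool>();
        try
        {
            WaitForConfirm(lostAck.Task, TimeSpan.FromMilliseconds(200));
        }
        catch (TimeoutException ex)
        {
            Console.WriteLine("Publish timed out: " + ex.Message);
        }
    }
}
```

With a bounded wait the publisher gets a TimeoutException it can handle, although it still cannot tell whether the broker actually stored the message.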

George

Oct 18, 2013, 2:11:12 PM
to masstrans...@googlegroups.com
I did a little more digging, and it appears that once RabbitMqProducer creates a Task, the task will either complete (when an ack is received) or fail (when a nack is received).
OutboundRabbitMqTransport will then wait on the Task indefinitely.

To fake a lost ack (due to network connectivity issues) I commented out the code in the RabbitMqProducer ack handler, so even though the RabbitMq broker sends the ack, the producer never processes it.

While my publisher was waiting for the ack, I shut down the RabbitMq service to see what would happen. My publisher was still waiting for the ack (i.e. shutting down RabbitMq and breaking the connections did not cause my publisher to fail).

So it appears that if an ack is lost, my publisher will hang.

I could wrap my publishing code in its own task and then use a timeout, but that seems like coding around the root problem.
I know this is a corner case, but the system we are developing could suffer from intermittent network problems.
Has anyone else encountered this, and if so, how did you deal with it?
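
A rough sketch of the workaround mentioned above (wrapping the publish in its own task and waiting with a timeout). TryPublish is a hypothetical helper, and the hanging delegate is only a stand-in for a Publish call that never returns.

```csharp
using System;
using System.Threading.Tasks;

public static class PublishWithTimeout
{
    // Hypothetical helper: runs a blocking publish on a thread-pool thread and
    // stops waiting after the timeout. Returns true if the publish finished in time.
    // If the publish throws, Wait rethrows it wrapped in an AggregateException.
    public static bool TryPublish(Action publish, TimeSpan timeout)
    {
        Task work = Task.Run(publish);
        return work.Wait(timeout);
    }

    public static void Main()
    {
        // Stand-in for a Publish call that hangs on a lost ack.
        Action hangingPublish = () => new TaskCompletionSource<bool>().Task.Wait();

        bool ok = TryPublish(hangingPublish, TimeSpan.FromMilliseconds(200));
        Console.WriteLine(ok ? "published" : "gave up waiting for confirmation");
    }
}
```

The caller is released, but the worker thread stays blocked for as long as the publish hangs, which is why this only codes around the problem rather than fixing it.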

Chris Patterson

Oct 18, 2013, 2:20:53 PM
to masstrans...@googlegroups.com
Thanks for pointing this out. The notifications should be canceling the task when the connection breaks. I will use your modification to reproduce the issue.
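
One way the behaviour described above could look, as a sketch only: pending confirms are tracked per sequence number and all of them are canceled when the connection goes away. The PendingConfirms class and its method names are hypothetical and are not taken from the MassTransit source.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical tracker for outstanding publisher confirms.
public class PendingConfirms
{
    readonly ConcurrentDictionary<ulong, TaskCompletionSource<bool>> _pending =
        new ConcurrentDictionary<ulong, TaskCompletionSource<bool>>();

    // Called at publish time; the returned task is what the transport waits on.
    public Task Track(ulong sequenceNumber)
    {
        var source = new TaskCompletionSource<bool>();
        _pending[sequenceNumber] = source;
        return source.Task;
    }

    // Broker acked the message: complete the task.
    public void Ack(ulong sequenceNumber)
    {
        TaskCompletionSource<bool> source;
        if (_pending.TryRemove(sequenceNumber, out source))
            source.TrySetResult(true);
    }

    // Broker nacked the message: fault the task.
    public void Nack(ulong sequenceNumber)
    {
        TaskCompletionSource<bool> source;
        if (_pending.TryRemove(sequenceNumber, out source))
            source.TrySetException(new InvalidOperationException("Message " + sequenceNumber + " was nacked"));
    }

    // Connection or channel shut down: no ack or nack can arrive any more,
    // so cancel everything that is still outstanding.
    public void CancelAll()
    {
        foreach (ulong key in _pending.Keys)
        {
            TaskCompletionSource<bool> source;
            if (_pending.TryRemove(key, out source))
                source.TrySetCanceled();
        }
    }
}

public static class PendingConfirmsDemo
{
    public static void Main()
    {
        var confirms = new PendingConfirms();
        Task confirm = confirms.Track(1);

        confirms.CancelAll(); // simulate the connection breaking before the ack arrives

        try { confirm.Wait(); }
        catch (AggregateException ex)
        {
            // The waiter is released with a TaskCanceledException instead of hanging.
            Console.WriteLine("Released with: " + ex.InnerException.GetType().Name);
        }
    }
}
```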


--
You received this message because you are subscribed to the Google Groups "masstransit-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to masstransit-dis...@googlegroups.com.
To post to this group, send email to masstrans...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/masstransit-discuss/11d11daf-5d13-4c0a-acf7-9bf2c7409a68%40googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

George

Oct 18, 2013, 3:45:17 PM
to masstrans...@googlegroups.com
Thanks Chris,
 
I also noticed that if a nack is received, an AggregateException is caught in OutboundRabbitMqTransport and rethrown as an InvalidConnectionException.
 
The InvalidConnectionException is then handled by DefaultConnectionPolicy.Execute (I have not added any policies of my own) and the message is resent.
 
The net result is that if the first send resulted in a nack, the framework will try to resend one more time before propagating the exception.  My assumption is that this was not by design, but more of a side effect.
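
To illustrate the side effect described above, here is a simplified sketch of a retry-once policy. It is not the real DefaultConnectionPolicy; FakeConnectionException and RetryOnce are hypothetical stand-ins used only to show why a nack that is wrapped as a connection failure leads to a second send.

```csharp
using System;

// Hypothetical stand-in for a connection-level exception type.
public class FakeConnectionException : Exception
{
    public FakeConnectionException(string message, Exception inner) : base(message, inner) { }
}

public static class RetryOnce
{
    // Hypothetical policy: on a connection-type failure, run the action one more time.
    // A real policy would typically reconnect before retrying.
    public static void Execute(Action action)
    {
        try
        {
            action();
        }
        catch (FakeConnectionException)
        {
            action(); // second attempt; a failure here propagates to the caller
        }
    }

    public static void Main()
    {
        int sends = 0;
        Action send = () =>
        {
            sends++;
            // Every send is nacked, and the nack is reported as a connection failure.
            throw new FakeConnectionException("send failed", new InvalidOperationException("nack"));
        };

        try { Execute(send); }
        catch (FakeConnectionException) { }

        Console.WriteLine("send attempts: " + sends); // prints "send attempts: 2"
    }
}
```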
 


George

Oct 18, 2013, 3:55:22 PM
to masstrans...@googlegroups.com
I modified the "catch(AggregateException)" handler in OutboundRabbitMqTransport.Send to throw the InnerException rather than a new InvalidConnectionException.  So now when I receive a nack from Rabbit my publisher is notified immediately of the exception rather than going through the reconnect scenario I described in my earlier post.
 
Again, thanks for looking at this!
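
A sketch of the kind of change described above, under the assumption that the transport waits on a Task and catches AggregateException. WaitAndUnwrap is a hypothetical helper, not the actual OutboundRabbitMqTransport.Send code, and InvalidOperationException stands in for whatever exception the nack handler sets on the task.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading.Tasks;

public static class NackHandling
{
    // Hypothetical helper: waits for the confirm and surfaces the underlying
    // failure directly, instead of wrapping it in a connection-level exception
    // that a retry policy would treat as a reason to resend.
    public static void WaitAndUnwrap(Task confirm)
    {
        try
        {
            confirm.Wait();
        }
        catch (AggregateException ex)
        {
            // Rethrow the inner exception while keeping its original stack trace.
            ExceptionDispatchInfo.Capture(ex.InnerException).Throw();
        }
    }

    public static void Main()
    {
        // A faulted task stands in for a confirm that was nacked by the broker.
        var nacked = new TaskCompletionSource<bool>();
        nacked.SetException(new InvalidOperationException("nack received"));

        try
        {
            WaitAndUnwrap(nacked.Task);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("Publisher notified immediately: " + ex.Message);
        }
    }
}
```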

Chris Patterson

Oct 19, 2013, 11:00:03 PM
to masstrans...@googlegroups.com
Perfect, do you have a pull request so I can test it on my end?



George

Oct 21, 2013, 9:52:10 AM
to masstrans...@googlegroups.com
Hi Chris,
I am a newbie to GitHub. I simply grabbed the zip file for the source and was modifying it locally on my machine.
I will look into what is involved in creating a pull request tonight.


Chris Patterson

Oct 21, 2013, 6:29:30 PM
to masstrans...@googlegroups.com
Well, you could also mail me a ZIP of the files you've changed and I could compare them. I'm curious what changes you've made.



George

Oct 22, 2013, 10:34:08 AM
to masstrans...@googlegroups.com
Here you go!
 
I modified the code so that a nack is returned immediately to the publishing code.
 
I modified RabbitMqProducer to simulate a nack (every second ack is treated as a nack).
I modified OutboundRabbitMqTransport so that it throws a TransportException instead of an InvalidConnectionException when an AggregateException is handled. This prevents a second send attempt when a nack is received.
 
I have not done anything about the case where a missing ack/nack causes the Publish to hang. To simulate a missing ack I simply commented out the HandleAck implementation in RabbitMqProducer.
 
MassTransit.Transports.RabbitMq.zip