Incomplete Rollback


Michael Sinnott

Sep 26, 2014, 10:07:48 AM
to nginn-me...@googlegroups.com
Hi, 

I have a long-running message handler, at the end of which I call CurrentMessageBus.Reply. I have noticed a System.Transactions.TransactionException: The operation is not valid for the state of the transaction. ---> System.TimeoutException: Transaction Timeout. This rolled back the handler and marked the message for retry, but I noticed that the reply message was successfully placed onto the queue anyway.

This caused duplicates further down the process when the original failed message was retried.

Do you have any suggestions as to what may have caused this?

Cheers, 

Mike

Rafal Gwizdala

Sep 26, 2014, 10:52:42 AM
to nginn-me...@googlegroups.com
Mike, can you show some code that reproduces that? Is the behavior consistent, i.e. does the reply message get sent every time, even when the transaction is rolled back? Are you sure your code runs inside a TransactionScope? AFAIK no inserts should be performed at all before the transaction is committed (nginn-messagebus caches outgoing messages until transaction commit), which is why I find this strange.
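
To illustrate the expected behavior (outgoing messages held back until commit), here is a toy model in Python. It is only a sketch of the idea, not nginn-messagebus code: the class BufferedBus and its methods reply, commit and rollback are hypothetical names made up for this example.

```python
class BufferedBus:
    """Toy model: outgoing messages are held in memory until commit."""

    def __init__(self):
        self.queue = []     # stands in for the durable message queue
        self._pending = []  # messages sent inside the open transaction

    def reply(self, message):
        # Nothing reaches the queue yet; the message is only buffered.
        self._pending.append(message)

    def commit(self):
        # On commit the buffered messages are flushed to the queue.
        self.queue.extend(self._pending)
        self._pending.clear()

    def rollback(self):
        # On rollback the buffered messages are simply discarded.
        self._pending.clear()


bus = BufferedBus()
bus.reply("print-finished")
bus.rollback()  # handler failed: the reply should never become visible
```

Under this model a reply sent from a handler whose transaction rolls back never appears on the queue, which is why a reply surviving a rollback is unexpected.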

R

--
You received this message because you are subscribed to the Google Groups "nginn-messagebus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nginn-messageb...@googlegroups.com.
To post to this group, send email to nginn-me...@googlegroups.com.
Visit this group at http://groups.google.com/group/nginn-messagebus.

Michael Sinnott

Sep 26, 2014, 10:57:38 AM
to nginn-me...@googlegroups.com
Hi Rafal, 

We are not doing anything with TransactionScopes ourselves and we have not tried to roll back the transaction manually. We have only seen the timeout exception once and we can't make it time out on demand, so we can't say whether it is consistent or not. I'll try to write something to reproduce this, but it is hard as it does not time out every time.

I'll be in touch when I have something you can look at. 

Cheers, 

Mike

Michael Sinnott

Sep 29, 2014, 7:51:32 AM
to nginn-me...@googlegroups.com
Hi Rafal, 

I have attached a sample app which demonstrates an issue we have found. It is not the same as above but I will try my best to explain it. 

We have a long-running message which is being handled on Thread 1. The handler for this message has some custom code to log that an action has timed out. There was a bug in our code which allowed the handler on Thread 1 to keep retrying indefinitely, until the transaction timed out. The transaction timeout was set to 10 minutes using the following in the message bus config: .SetReceiveTransactionTimeout(TimeSpan.FromMinutes(10)).

Looking through the logs, we noticed that after 10 minutes Thread 2 picks up the same message while Thread 1 is still processing it. Thread 2 then completes successfully first and marks the message as a success. However, Thread 1 now fails (nginn rolls back correctly) and marks the message as a Retry, overwriting the Success ('X') from Thread 2. At a later time a new thread picks up the message and successfully processes it again.

We end up with 2 responses for 1 message. This, together with a few bugs in our code, caused the duplicates.
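
The sequence above can be replayed as a small Python sketch. This is a deterministic model without real threads, written only to show the ordering problem; it is not nginn-messagebus code and not the attached sample app. 'X' and 'R' are the status codes we see in the queue, while the "pending" status, the finish function and the message id are made up for the illustration.

```python
# Deterministic replay of the sequence described above (no real threads).
status = {"msg-1": "pending"}  # "pending" is a hypothetical initial status
replies = []                   # replies that reached the outgoing queue


def finish(message_id, succeeded):
    """Record the outcome of one processing attempt."""
    if succeeded:
        replies.append(message_id)  # committed attempt: reply is sent
        status[message_id] = "X"
    else:
        # Unconditional write: this is the step that loses the success.
        status[message_id] = "R"


finish("msg-1", succeeded=True)   # Thread 2 completes first
finish("msg-1", succeeded=False)  # Thread 1 times out and rolls back
if status["msg-1"] == "R":        # a later thread picks up the retry
    finish("msg-1", succeeded=True)

print(len(replies))  # number of replies sent for the single message
```

The model ends with two entries in replies for one message, matching the two responses we observed.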

We think the message shouldn't have been given to a new thread while it was still being processed by another thread, and if the 2nd thread succeeds, the 1st thread should not mark the message for retry.

The sample app should demonstrate this to you. 

Cheers, 

Mike
PrintTest.zip

Rafal Gwizdala

Sep 29, 2014, 8:34:17 AM
to nginn-me...@googlegroups.com
Hi Michael, thank you for this analysis, I think you have nailed the problem (although I haven't had a look at the code yet). It shouldn't be difficult to add protection against this scenario (overwriting the status of an already processed message), but I'm currently working on Oracle support and it will take a while until the next version is ready.
In the meantime, I'd like to suggest some solutions:

Option #1 is to use just 1 receiving thread. This way no other thread will step in and process the message before the first transaction completes.
Option #2 is to redesign the process so that the message handler does not keep a transaction open for 10 minutes. That is IMHO too long for a database transaction. Maybe you could just initiate the lengthy operation in a message handler, let it run for 10 (or more) minutes, and publish a 'completed' message after completion? You would also have to add a timeout message to handle the case where the operation doesn't complete at all or is interrupted in the middle, but this way you wouldn't keep the message 'busy' and you wouldn't have to worry about the database transaction timing out. It's a standard approach for handling long-running transactions, and 10 minutes is certainly long-running. I'd recommend this even once the fix for the original problem is ready.
I don't have a #3 yet. It could be 'use a saga', but that is basically the same as #2.
Please let me know if this works for you.
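
Option #2 could look roughly like the following Python sketch. It is a simplified, single-threaded model of the message flow only, not nginn-messagebus code: the in-memory queue, the message kinds "print", "completed" and "timeout", and all function names are assumptions made for the example. A real bus would deliver the timeout message after a delay rather than immediately.

```python
import queue

bus = queue.Queue()  # stand-in for the message bus
jobs = {}            # job id -> "running", "completed" or "timed-out"


def handle_print_request(job_id):
    """Short handler: record the job, schedule a timeout message, return."""
    jobs[job_id] = "running"
    bus.put(("timeout", job_id))  # a real bus would delay this delivery


def handle_completed(job_id):
    """Published by the lengthy operation when it finishes."""
    if jobs.get(job_id) == "running":
        jobs[job_id] = "completed"


def handle_timeout(job_id):
    """Only acts if the job never reported completion."""
    if jobs.get(job_id) == "running":
        jobs[job_id] = "timed-out"


HANDLERS = {
    "print": handle_print_request,
    "completed": handle_completed,
    "timeout": handle_timeout,
}


def dispatch_all():
    """Process queued messages one at a time, each in a short handler."""
    while not bus.empty():
        kind, job_id = bus.get()
        HANDLERS[kind](job_id)


bus.put(("print", "job-1"))
bus.put(("completed", "job-1"))  # the operation reports back in time
dispatch_all()
```

Each handler returns quickly, so no transaction stays open while the operation runs, and the timeout message covers the case where no 'completed' message ever arrives.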

Best regards
R
 

Michael Sinnott

Sep 29, 2014, 10:16:03 AM
to nginn-me...@googlegroups.com
Hi Rafal, 

Thanks for your response, much appreciated. We have a few comments on your suggestions:

Option #1 - We can't do this as it is a printing process: we wait for the printer to give us a response (success or fail), and we also want to send other messages to the printer in the meantime, as handling just one at a time is too slow.

Option #2 - We have made changes to our code which should stop the process before the timeout is reached. This should help us in the short term. If we used another background process, we would lose the resilience (retry etc.) of the message bus, and we don't want that.

Option #3 - Can we suggest that the message bus doesn't overwrite a success ('X') with a retry ('R')?
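
As a rough illustration of what we mean by Option #3, here is a Python sketch using an in-memory SQLite table. The table name messages, its columns and the function mark_retry are hypothetical; we don't know how the message bus stores or updates its status internally, so this only shows the idea of a guarded update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
)
# Message 1 has already been processed successfully by another thread.
conn.execute("INSERT INTO messages (id, status) VALUES (1, 'X')")


def mark_retry(conn, message_id):
    """Mark a message for retry unless it has already succeeded."""
    cur = conn.execute(
        "UPDATE messages SET status = 'R' WHERE id = ? AND status <> 'X'",
        (message_id,),
    )
    # rowcount is 0 when the guard prevented the update.
    return cur.rowcount == 1


changed = mark_retry(conn, 1)
status = conn.execute(
    "SELECT status FROM messages WHERE id = 1"
).fetchone()[0]
```

With the extra condition in the WHERE clause, a late failure from the first thread leaves the 'X' in place and the message is not picked up again.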

Thanks again, 

Mike

Rafal Gwizdala

Sep 29, 2014, 11:48:42 AM
to nginn-me...@googlegroups.com
#3 is the intended fix. I'll let you know as soon as the new version is ready (I need about 1 week). Or is it urgent?

R

Michael Sinnott

Sep 29, 2014, 12:25:56 PM
to nginn-me...@googlegroups.com
We should be ok to wait.

Thanks Rafal


nginn-messagebus

Oct 6, 2014, 3:51:44 PM
to nginn-me...@googlegroups.com
Hi Michael,

There's a new version, 1.1.4, that includes a fix for the bug you reported.
It also includes support for Oracle, but this shouldn't affect you.
Unfortunately, I didn't have time to perform detailed tests, so please check if the update goes smoothly and let me know if something breaks.

Best regards
Rafal


Michael Sinnott

Oct 6, 2014, 3:58:29 PM
to nginn-me...@googlegroups.com
Hi Rafal,

Thanks for this. I'll be in touch once I've completed some testing.

Cheers,

Mike

Michael Sinnott

Oct 10, 2014, 9:14:26 AM
to nginn-me...@googlegroups.com, nginn-me...@googlegroups.com
Hi Rafal, 

I have done some testing and everything seems to be fine; the new version fixes this bug.

Thanks for your help and the new version!

Cheers, 

Mike

