Strange nHibernate command timeout behavior

861 views
Skip to first unread message

Bill Matson

unread,
Sep 10, 2014, 10:05:16 AM9/10/14
to particula...@googlegroups.com

Hello,

 

We are running version 3.3.7 of nServiceBus, and using the nHibernate timeout manager.  We had the attached failure in a pretty much idle server (it is a test server which had nothing going on at the time).  After a bunch of command timeouts the nservicebus host process just shuts down.  There was nothing in either the system logs, or SQL server logs to explain what happened, and we have 7 other processes also working against the database at the same time, none of them had errors at the same time (i.e. some sort of network or server/SQL glitch)

 

What is really odd is that the times recorded in the log are <1 second apart, however for a command timeout, you’d expect these to be spaced out by some reasonable timeout period (i.e. the default of 30 seconds, or some other value set in the nHibernate config)

 

We’ve seen similar behavior before, and it usually seems to be related to the nHibernate saga or timeout persistence, we don’t see similar errors in our other SQL commands.  Any idea where we should start looking to troubleshoot further?

 

Thanks,

Bill

 

 

log-warning.log

Bill Matson

unread,
Sep 11, 2014, 7:48:01 AM9/11/14
to particula...@googlegroups.com

Bump…Anyone?

John Simons

unread,
Sep 11, 2014, 6:04:53 PM9/11/14
to particula...@googlegroups.com
Hi Bill,

I looked at the log file and here is what I think it may have happened.

Your idle endpoint has timeouts enabled and because of that we poll the timeouts database every so often, during one of these polls, the database server was not contactable (it could be a comms interruption or something else, it happens) and because of that we tried a few more times (in your version we try 10 times), and after all those attempts failed, NServiceBus determines that it cannot recover from this situation and shutsdown intentionally.
When this happens, we also log an error to the window event viewer.

So do you think this is a plausible scenario?

Cheers
John

Bill Matson

unread,
Sep 12, 2014, 3:15:25 AM9/12/14
to particula...@googlegroups.com

John,

 

I agree that there are 5 timeout exceptions there (there is one log entry from nhibernate, followed by one for the timeoutpersisterreceiver), but what’s really odd is that the entire duration of the log is 500 ms, and it’s all happening on the same thread, number 11.  So the timeout persister took 5 tries at reading the table, which appear to have all timed out with little to no delay between them.

 

I may have found a cause, the stack trace isn’t spot-on, but I think the difference is just the fact that it’s opening a distributed transaction in nServiceBus.  Turns out when connecting to mirrored SQL servers (which ours is), there’s a bug in the retry logic that results in a MUCH shorter than expected connection timeout (default is 15 seconds).  Instead of getting a 15 second timeout, you get 1.2 seconds, and if you don’t connect in that time, as an added bonus, the connection gets thrown back into the pool in an invalid state that causes the subsequent failures.  See this KB: http://support.microsoft.com/kb/2605597/en-us  I’ve gone on the server this was happening and we have 4.0.30319.1 of System.data.dll, not the .545 that is referenced in the article, so I’m off to find some updates.

 

Yesterday I realized that this happened while we were doing our ETL run for our DW on the server…..so a slightly longer than “normal” connect time is not highly surprising.

 

Just an FYI in case someone else sees “unexplained” connection timeouts in SQL server from nServiceBus!

 

Bill

--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
Visit this group at http://groups.google.com/group/particularsoftware.
To view this discussion on the web visit https://groups.google.com/d/msgid/particularsoftware/a47d0101-64d2-4f47-922d-dee8b1c6d560%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages