Hello,
We are running version 3.3.7 of nServiceBus, and using the nHibernate timeout manager. We had the attached failure in a pretty much idle server (it is a test server which had nothing going on at the time). After a bunch of command timeouts the nservicebus host process just shuts down. There was nothing in either the system logs, or SQL server logs to explain what happened, and we have 7 other processes also working against the database at the same time, none of them had errors at the same time (i.e. some sort of network or server/SQL glitch)
What is really odd is that the times recorded in the log are <1 second apart, however for a command timeout, you’d expect these to be spaced out by some reasonable timeout period (i.e. the default of 30 seconds, or some other value set in the nHibernate config)
We’ve seen similar behavior before, and it usually seems to be related to the nHibernate saga or timeout persistence, we don’t see similar errors in our other SQL commands. Any idea where we should start looking to troubleshoot further?
Thanks,
Bill
Bump…Anyone?
John,
I agree that there are 5 timeout exceptions there (there is one log entry from nhibernate, followed by one for the timeoutpersisterreceiver), but what’s really odd is that the entire duration of the log is 500 ms, and it’s all happening on the same thread, number 11. So the timeout persister took 5 tries at reading the table, which appear to have all timed out with little to no delay between them.
I may have found a cause, the stack trace isn’t spot-on, but I think the difference is just the fact that it’s opening a distributed transaction in nServiceBus. Turns out when connecting to mirrored SQL servers (which ours is), there’s a bug in the retry logic that results in a MUCH shorter than expected connection timeout (default is 15 seconds). Instead of getting a 15 second timeout, you get 1.2 seconds, and if you don’t connect in that time, as an added bonus, the connection gets thrown back into the pool in an invalid state that causes the subsequent failures. See this KB: http://support.microsoft.com/kb/2605597/en-us I’ve gone on the server this was happening and we have 4.0.30319.1 of System.data.dll, not the .545 that is referenced in the article, so I’m off to find some updates.
Yesterday I realized that this happened while we were doing our ETL run for our DW on the server…..so a slightly longer than “normal” connect time is not highly surprising.
Just an FYI in case someone else sees “unexplained” connection timeouts in SQL server from nServiceBus!
Bill
--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
Visit this group at http://groups.google.com/group/particularsoftware.
To view this discussion on the web visit https://groups.google.com/d/msgid/particularsoftware/a47d0101-64d2-4f47-922d-dee8b1c6d560%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.