Thanks again for your response, Krishnan.
ChromeDriver driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options, TimeSpan.FromMinutes(3));
// Assign the timeout; TimeSpan.Add() only returns a new value and leaves PageLoad unchanged.
driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(60);
Also, Daniel Charles' response had a link to http://jimevansmusic.blogspot.com/2012/11/net-bindings-whaddaymean-no-response.html, which talks about the challenges of deciding what to do when an application just stops responding. That prompted me to dig in a little more, and I was able to isolate my hung-session issue to a specific application in our lower environments. For whatever reason, that application goes into a really weird mode where it stalls while sending data back. It doesn't crash or send an error code. You can watch it in F12 Dev Tools: the HTML for the page is partially output and then the content quits coming in. The status icon keeps spinning on the page but nothing else arrives.
So the test in Test Explorer would eventually fail and allow another test to kick off, but the session would still be active in the Running Sessions view in Grid, and the browser would still be open on that VM, still trying to download content. This would happen a bunch of times and use up half my slots, but Test Explorer was told to use 16 parallel threads, so it kept sending requests.
Once I made the change to abandon a request after 60 seconds, my queued sessions went from 40-50 down to 5-6, and most of them would then finish.
The only issue that I am running across now is that I have a flag for whether or not to use Grid...
if (config.UseSeleniumGrid)
{
    driver = new RemoteWebDriver(new Uri(config.SeleniumGridHubUrl), chromeOptions);
}
else
{
    // Local Selenium WebDriver
    driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), chromeOptions, TimeSpan.FromSeconds(70));
}
// Assign the timeout; calling PageLoad.Add(...) would not change the setting.
driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(50);
The catch is that RemoteWebDriver has no constructor that accepts both DriverOptions and a commandTimeout, so I can't pass my chromeOptions and a commandTimeout together when using Grid. Dropping PageLoad to 50 seconds seems to work better, though. I think the default commandTimeout is 60 seconds, and something I read implied that it is better to keep PageLoad shorter than the commandTimeout so that the failure gets reported up to the Grid properly. These are the constructors available:
public RemoteWebDriver(DriverOptions options);
public RemoteWebDriver(ICapabilities capabilities);
public RemoteWebDriver(Uri remoteAddress, DriverOptions options);
public RemoteWebDriver(Uri remoteAddress, ICapabilities capabilities);
public RemoteWebDriver(ICommandExecutor commandExecutor, ICapabilities capabilities);
public RemoteWebDriver(Uri remoteAddress, ICapabilities capabilities, TimeSpan commandTimeout);
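The last overload in that list does take a commandTimeout, just with ICapabilities instead of DriverOptions. One possible workaround, sketched here and not verified against a Grid, is to convert the options with ToCapabilities() and use that overload. The DriverFactory class, the CreateGridDriver name, and the 70/50-second values are assumptions for illustration only.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

public static class DriverFactory
{
    // Hypothetical factory method; hubUrl would come from config.SeleniumGridHubUrl.
    public static IWebDriver CreateGridDriver(Uri hubUrl, ChromeOptions chromeOptions)
    {
        // Illustrative value, chosen to be longer than the PageLoad timeout below.
        TimeSpan commandTimeout = TimeSpan.FromSeconds(70);

        // ToCapabilities() turns the DriverOptions into the ICapabilities
        // that the three-argument RemoteWebDriver constructor expects.
        var driver = new RemoteWebDriver(hubUrl, chromeOptions.ToCapabilities(), commandTimeout);

        // Keep PageLoad shorter than commandTimeout so the page-load failure
        // is reported before the HTTP command itself times out.
        driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(50);
        return driver;
    }
}
```

With something like this, both branches of the UseSeleniumGrid check could share the same timeout values.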
I also found that my Click() helper method had a very naive implementation of the Retry pattern and was swallowing the ElementClickInterceptedException, which might have been causing issues as well by preventing the TearDown method from being called. These two articles helped me implement Microsoft's recommended version of the pattern:
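For anyone with a similar Click() helper, the general idea looks roughly like the sketch below. This is not the code from those articles or my actual helper: the Retry class, the Execute name, and the fixed delay are assumptions for illustration. The point is that only one exception type is retried, the number of attempts is bounded, and the last failure is rethrown instead of swallowed so the test fails and TearDown still runs.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Hypothetical helper: runs the action up to maxAttempts times,
    // retrying only when TException is thrown.
    public static void Execute<TException>(Action action, int maxAttempts, TimeSpan delay)
        where TException : Exception
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return; // success, stop retrying
            }
            catch (TException) when (attempt < maxAttempts)
            {
                // Transient failure with attempts remaining: wait, then try again.
                // On the final attempt the filter is false, so the exception propagates.
                Thread.Sleep(delay);
            }
        }
    }
}
```

A click would then look something like Retry.Execute<ElementClickInterceptedException>(() => element.Click(), 3, TimeSpan.FromMilliseconds(500)); where element is whatever IWebElement the helper was given.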
Okay, I'm making progress, and I think I can call this specific issue closed: the PageLoad duration has made it mostly go away, and I've identified the root cause as a hanging application on our side.
Still lots of other things to work through but I'll bring those up on other threads.
Thanks!