Hi Will,
For any operation / action, Gauge core requests the language runners and waits for a response. If it doesn't receive any response within a particular time limit, it times out. How long the Gauge core should wait is something that users can configure by setting the runner_request_timeout property in gauge.properties.
Looking at the logs that you have shared, it looks like these errors have occurred in the validation phase. What Gauge core does is, for each spec file, it sends a validation request to language runner. Gauge aggregates all the validation errors (including time outs) and prints it when validation is done. And you are seeing these errors after 2 hours because that's the time Gauge takes to finish the validation.
I see 3 issues separate here:
- It could be a connection failure between Gauge core and language runner. This looks very likely. We are investigating this.
- The fact that this occurs intermittently could be an environment issue. The most likely cause could be the way OS schedules various processes. Maybe you can check this happens consistently on the same VM?
- If each validation action takes 10 secs then that is concerning and we NEED to improve the performance. However, I doubt this is what is happening. I think the validation take way less than 10 secs, but at some points the runner loses contact with the core which is why these time out.
Can you help us with some more information - How many spec files you have and what is the combined size of those files on disk? Also, do you always see the same set of specs timing out?
Thanks
Mahendra