Maybe I'm doing something wrong, but yesterday I ran into an interesting situation:
1. CloudRetentionStrategy is in use, and the node has been offline for
quite some time.
2. During that time, ResponseTimeMonitor keeps gathering results and
interprets them as -1.
3. A job arrives for the node, so the node starts up.
4. ResponseTimeMonitor kicks in, collects a result before the channel
to the node is established, and terminates the node at whatever stage
it happens to be in. That is mostly harmless while the node is still
offline or not yet connected, but yesterday it actually killed a
working node.
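
To make the sequence above concrete, here is a minimal standalone sketch
of the race as I understand it. It is a simplified model, not Jenkins
code: the class ResponseTimeGuardSketch, the NodeState enum, the History
class, the window of 5 samples, and both check methods are hypothetical
names and values invented for illustration. The naive check records a
timeout (-1) whatever state the node is in, while the guarded check only
counts samples from a node that is fully online.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the race; none of these types are Jenkins APIs.
class ResponseTimeGuardSketch {
    enum NodeState { OFFLINE, CONNECTING, ONLINE }

    static final long TIMEOUT = -1L; // sample value recorded when no response arrives
    static final int WINDOW = 5;     // assumed number of samples kept per node

    // Sliding window of the most recent response-time samples for one node.
    static final class History {
        private final Deque<Long> samples = new ArrayDeque<>();

        void record(long millis) {
            if (samples.size() == WINDOW) {
                samples.removeFirst();
            }
            samples.addLast(millis);
        }

        boolean allTimeouts() {
            return samples.size() == WINDOW
                    && samples.stream().allMatch(s -> s == TIMEOUT);
        }

        void clear() {
            samples.clear();
        }
    }

    // Naive check: ignores the node state, so timeouts gathered while the
    // node was offline still count once the node starts launching.
    static boolean naiveShouldDisconnect(History h, NodeState state, long sample) {
        h.record(sample);
        return h.allTimeouts();
    }

    // Guarded check: samples taken while the node is offline or connecting
    // are discarded, so only an online node can be disconnected.
    static boolean guardedShouldDisconnect(History h, NodeState state, long sample) {
        if (state != NodeState.ONLINE) {
            h.clear();
            return false;
        }
        h.record(sample);
        return h.allTimeouts();
    }

    public static void main(String[] args) {
        History naive = new History();
        History guarded = new History();
        // The node is offline for a long time: every sample is a timeout.
        for (int i = 0; i < WINDOW; i++) {
            naiveShouldDisconnect(naive, NodeState.OFFLINE, TIMEOUT);
            guardedShouldDisconnect(guarded, NodeState.OFFLINE, TIMEOUT);
        }
        // A job arrives and the node is launching; the channel is not up yet.
        System.out.println("naive:   "
                + naiveShouldDisconnect(naive, NodeState.CONNECTING, TIMEOUT));
        System.out.println("guarded: "
                + guardedShouldDisconnect(guarded, NodeState.CONNECTING, TIMEOUT));
    }
}
```

In this model the naive check asks for a disconnect while the node is
still connecting, because the stale offline samples fill the window; the
guarded check does not. Whether discarding the history or merely skipping
the sample is the better choice in Jenkins itself is an open question.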
I've forked Jenkins and made some changes to avoid this situation, but
that is a quick and dirty solution. I would like to hear ideas and
recommendations on how to proceed with this, and how it is expected to
work in general.
My changes:
https://github.com/TheIndifferent/jenkins/commit/d8d93ccedd42a36cfe08548671a8b81470f77a1d